CAD Seminar Series: Information-Theoretic Generalization Bounds for Deep Neural Networks
When:
October 2, 2024, 2:30 p.m. to 3:30 p.m.
Where:
Hybrid
Event category:
Seminar
Speaker: Haiyun He, PhD, postdoctoral associate at Cornell University, Center for Applied Math
Time: Wednesday, October 2, from 2:30 to 3:30 pm
Location for in-person participants: 1146 FAB
Zoom link for online audience: https://wayne-edu.zoom.us/j/92845590121?pwd=CpRA5Wa5gzSMn2xiVkR2abD83O5nrH.1
Title: Information-Theoretic Generalization Bounds for Deep Neural Networks
Abstract:
Deep neural networks (DNNs) exhibit an exceptional capacity for generalization in practical applications. This work aims to capture the effect and benefits of depth for supervised learning via information-theoretic generalization bounds. We first derive two hierarchical bounds on the generalization error in terms of the Kullback-Leibler (KL) divergence or the 1-Wasserstein distance between the train and test distributions of the network's internal representations. The KL divergence bound shrinks as the layer index increases, while the Wasserstein bound implies the existence of a layer that serves as a generalization funnel, attaining a minimal 1-Wasserstein distance. Analytic expressions for both bounds are derived under the setting of binary Gaussian classification with linear DNNs. To quantify the contraction of the relevant information measures when moving deeper into the network, we analyze the strong data processing inequality (SDPI) coefficient between consecutive layers of three regularized DNN models: $\mathsf{Dropout}$, $\mathsf{DropConnect}$, and Gaussian noise injection. This enables refining our generalization bounds to capture the contraction as a function of the network architecture parameters. Specializing our results to DNNs with a finite parameter space and the Gibbs algorithm reveals that deeper yet narrower network architectures generalize better in those examples, although how broadly this conclusion holds remains an open question.
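For context, the bounds described above belong to the family of information-theoretic generalization bounds pioneered by Xu and Raginsky (2017). A minimal sketch of that classical input-output mutual-information bound, included here only as background and not as the speaker's exact statement, is the following. Let $S = (Z_1, \dots, Z_n) \sim \mu^{\otimes n}$ be the training sample and $W$ the learned hypothesis; if the loss $\ell(w, Z)$ is $\sigma$-sub-Gaussian under $Z \sim \mu$ for every $w$, then
\[
  \bigl|\mathbb{E}[\operatorname{gen}(S, W)]\bigr| \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(S; W)}.
\]
The hierarchical bounds in the talk appear to follow a similar template but, per the abstract, measure a divergence (KL or 1-Wasserstein) between the train and test distributions of the layer-$\ell$ internal representations, a quantity that contracts as $\ell$ increases.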
Bio:
Haiyun He is currently a postdoctoral associate in the Center for Applied Mathematics at Cornell University, working with Prof. Ziv Goldfeld and Prof. Christina Lee Yu. She earned her PhD in Electrical and Computer Engineering (ECE) from the National University of Singapore (NUS) in September 2022, advised by Prof. Vincent Y. F. Tan. She obtained her master's degree in ECE from NUS in 2017 and her bachelor's degree in Electronics and Information Engineering from Beihang University, China, in 2016. Her research focuses on the intersection of information theory (IT) and machine learning (ML), aiming to develop fundamental theoretical analyses for statistical and machine learning challenges using information-theoretic tools. Her work has been published in top-tier IT and ML journals and conferences. She was selected as an EECS Rising Star by UT Austin in 2022.