Mathematics Data Science Seminar: Prof. Long Nguyen, Hierarchical clustering and mixture modeling for heterogeneous data


When:
February 28, 2024
2:30 p.m. to 3:30 p.m.
Where:
Faculty/Administration #1146
656 W. Kirby
Detroit, MI 48202
Event category: Seminar
In-person

Speaker: Long Nguyen, University of Michigan

Time: Wednesday, Feb 28, 2:30pm-3:30pm

Place: Nelson room


Title: Hierarchical clustering and mixture modeling for heterogeneous data

Abstract:

Agglomerative hierarchical clustering is a well-known method for exploratory
data analysis and visualization, but it has very little theoretical support.
Mixture modeling provides strong theoretical guarantees for learning
heterogeneous data populations, but it requires strong model assumptions
and can be brittle if the model is misspecified or only weakly identifiable.
This work bridges mixture modeling and agglomerative hierarchical clustering
by following a mixture-model-based approach. Starting from a finite
mixture model fitted to a heterogeneous data set with more components
than needed, we construct a hierarchical clustering tree (also known as
the dendrogram) in a way analogous to an agglomerative hierarchical
clustering algorithm that sequentially merges clusters.
The merging rule is derived from an optimal-transport-based theory of
convergence for mixing measures, in which competing atoms that support
the estimated mixing measure are merged via a suitable projection
under the L2 Wasserstein metric.
With this algorithm we can consistently select the true number of components
and obtain a pointwise optimal convergence rate for parameter estimation
from the hierarchical tree, even when the model parameters are only weakly identifiable.
Theoretically, the approach also explains how to choose the optimal number
of clusters in hierarchical clustering. In practice, the dendrogram reveals
more information about the hierarchy of subpopulations than traditional
ways of summarizing mixture models. Illustrations on simulated data and
a single-cell RNA sequencing data set will be discussed. This is joint work
with Dat Do, Linh Do, Scott McKinley and Jonathan Terhorst.
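The merging idea in the abstract can be illustrated with a minimal sketch. It assumes that each fitted component is summarized only by a weight and a location vector in Euclidean space (scale parameters are ignored), and that "merging two atoms" means projecting the two-atom measure w1*delta(m1) + w2*delta(m2) onto a single atom under the squared L2 Wasserstein cost. That projection keeps the total weight w1 + w2, places the new atom at the weighted mean, and costs w1*w2/(w1 + w2) * |m1 - m2|^2. The function names and the greedy "merge the cheapest pair" loop are hypothetical illustrations, not the speakers' published algorithm.

```python
import numpy as np

# Hypothetical helpers for illustration only. Atoms are (weight, location)
# pairs in Euclidean space; scale/covariance parameters are ignored.


def merge_cost(w1, m1, w2, m2):
    """Squared W2 cost of replacing w1*delta(m1) + w2*delta(m2) by one atom.

    With the new atom at the weighted mean m, the cost is
    w1*|m1 - m|^2 + w2*|m2 - m|^2 = w1*w2/(w1 + w2) * |m1 - m2|^2.
    """
    diff = np.asarray(m1, dtype=float) - np.asarray(m2, dtype=float)
    return (w1 * w2 / (w1 + w2)) * float(diff @ diff)


def merge_atoms(w1, m1, w2, m2):
    """W2 projection of a two-atom measure onto a single atom."""
    w = w1 + w2
    m = (w1 * np.asarray(m1, dtype=float) + w2 * np.asarray(m2, dtype=float)) / w
    return w, m


def dendrogram_of_atoms(weights, atoms):
    """Greedily merge the cheapest pair of atoms until one remains.

    Returns the merge history as a list of (member indices, merge cost).
    """
    # Each node: (weight, location, tuple of original atom indices)
    nodes = [
        (float(w), np.asarray(a, dtype=float), (i,))
        for i, (w, a) in enumerate(zip(weights, atoms))
    ]
    history = []
    while len(nodes) > 1:
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                c = merge_cost(nodes[i][0], nodes[i][1], nodes[j][0], nodes[j][1])
                if best is None or c < best[0]:
                    best = (c, i, j)
        cost, i, j = best
        w, m = merge_atoms(nodes[i][0], nodes[i][1], nodes[j][0], nodes[j][1])
        members = tuple(sorted(nodes[i][2] + nodes[j][2]))
        history.append((members, cost))
        # Pop the larger index first so index i stays valid.
        nodes.pop(j)
        nodes.pop(i)
        nodes.append((w, m, members))
    return history


if __name__ == "__main__":
    # Four atoms forming two well-separated pairs.
    hist = dendrogram_of_atoms([0.3, 0.2, 0.25, 0.25], [[0.0], [0.1], [5.0], [5.2]])
    for members, _ in hist:
        print(members)  # merge order: (0, 1), then (2, 3), then (0, 1, 2, 3)
```

In this toy run the two nearby pairs merge first at small cost, and the final merge is far more expensive; a jump of that kind in merge cost is what a dendrogram makes visible when judging how many components to keep. The cost formula coincides with Ward's linkage criterion, which is one way to see the analogy with agglomerative clustering.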

Contact

Rohini Kumar
rohini.kumar@wayne.edu

Cost

Free