Seminar • Machine Learning • Unsupervised Learning: Validation Beyond Visualization

Thursday, November 23, 2023 10:30 am - 11:30 am EST (GMT -05:00)

Please note: This seminar will take place in DC 1304.

Marina Meila
Department of Statistics, University of Washington
Senior Fellow, University of Washington’s eScience Institute

While machine learning is many times faster than humans at finding patterns in scientific data, the task of validating these patterns as “meaningful” is still left to the scientist, or to ad-hoc methods such as visualization. To effectively accelerate scientific discovery with machine learning, human validation must be replaced with automated validation to the extent possible. Otherwise, instead of drowning in data, one risks drowning in hypotheses. In this talk, I will present instances in which unsupervised learning tasks can be augmented with data-driven guarantees of reproducibility and correctness.

In the case of clustering, I will introduce a new framework for proving that a clustering is approximately “correct”. This framework does not require a user to know anything about the data distribution. Unlike the PAC bounds in supervised learning, the bounds for clustering can be calculated exactly by solving a convex program and can be of direct practical utility.

In the case of non-linear dimension reduction by manifold learning, I will demonstrate some of my group’s contributions to making the output of ML algorithms reproducible and interpretable. Surprisingly, some of the results bring us back to familiar machine learning methods such as sparse recovery.

Joint work with Dominique Perrault-Joncas, James McQueen, Yu-Chia Chen, Samson Koelle, Hanyu Zhang, Weicheng Wu, and Ioannis Kevrekidis.


Bio: Marina Meila is Professor of Statistics at the University of Washington and Senior Fellow of the University of Washington’s eScience Institute. Her long-term interest is in statistical learning, particularly the discovery of geometric and combinatorial structure in data, efficient algorithms, and the development of guarantees and validation methods for unsupervised learning with minimal or no assumptions about the data-generating process.

She has collaborated with scientists in applied inverse problems, materials science and theoretical chemistry. Meila holds an MS degree in Electrical Engineering from the Polytechnic Institute of Bucharest, and a PhD in Computer Science and Electrical Engineering from the Massachusetts Institute of Technology.