Seminar • Algorithms and Complexity — Optimal Sub-Gaussian Mean Estimation in R | Cheriton School of Computer Science

Please note: This seminar will be given online.

Jasper Lee, Department of Computer Science
Brown University

We revisit and settle one of the most fundamental problems in statistics: given access to independent samples from a 1D random variable (with finite but unknown mean and variance), what is the best way to estimate the mean in the high probability regime, in terms of error convergence with respect to sample size? The conventional wisdom is to use the empirical mean as our estimate. However, it is known that the empirical mean can in fact have exponentially sub-optimal convergence for certain heavy-tailed distributions. On the other hand, the median-of-means estimator (invented and reinvented in various literature) does have sub-Gaussian convergence for all finite-variance distributions, albeit only in the big-O sense with a sub-optimal multiplicative constant. The natural remaining question then, is whether it is possible to bridge the gap, to have an estimator that has optimal sub-Gaussian concentration with an optimal constant, for all finite-variance distributions.

In this talk, we answer the question affirmatively by giving an estimator that converges with the optimal constant inside the big-O, up to a 1+o(1) multiplicative factor. Our estimator is furthermore computable in time linear in the sample size. The convergence analysis involves deriving tail bounds using linear and convex-concave programming dualities, which may be of independent interest.

This is joint work with Paul Valiant.

To join this seminar on Zoom, please go to https://zoom.us/j/97232757708?pwd=bHhPNHBSUUpUVFE4OHBkYkNJalkwdz09.