Priyank
Jaini,
PhD
candidate
David
R.
Cheriton
School
of
Computer
Science
Multivariate density estimation is a central problem in unsupervised machine learning that has been studied immensely in both statistics and machine learning. Several methods have thus been proposed for density estimation including classical techniques like histograms, kernel density estimation methods, mixture models, and more recently neural density estimation that leverages the recent advances in deep learning and neural networks to tractably represent a density function. In today's age when large amounts of data are being generated in almost every field it is of paramount importance to develop density estimation methods that are cheap both computationally and in memory cost. The main contribution of this thesis is in providing a principled study of parametric density estimation methods using mixture models and triangular maps for neural density estimation.
The first part of the thesis focuses on the compact representation of mixture models using deep architectures like latent tree models, hidden Markov models, tensorial mixture models, hierarchical tensor formats and sum-product networks and provides a unifying view of representation of mixture models using such deep architectures. The unifying view allows us to prove exponential separation between deep mixture models and mixture models represented using shallow architectures demonstrating benefits of depth in their representation. In a surprising result thereafter, we prove that a deep mixture model can be approximated using the conditional gradient algorithm by a shallow architecture of polynomial size w.r.t. the inverse of the approximation accuracy.
Next, we address the more practical problem of density estimation of mixture models for streaming data by proposing an online Bayesian Moment Matching algorithm for Gaussian mixture models that can be distributed over several processors for fast computation. Exact Bayesian learning of mixture models is intractable because the number of terms in the posterior grow exponentially w.r.t. to the observations. We circumvent this problem by projecting the exact posterior on to a simple family of densities by matching a set of sufficient moments. Subsequently, we extend this algorithm for sequential data modeling using transfer learning by learning a hidden Markov model over the observations with Gaussian mixtures. We apply this algorithm on three diverse applications of activity recognition based on smartphone sensors, sleep stage classification for predicting neurological disorders using electroencephalography data and network size prediction for telecommunication networks.
In the second part, we focus on neural density estimation methods where we provide a unified framework for estimating densities using monotone and bijective triangular maps represented using deep neural networks. Using this unified framework we study the limitations and representation power of recent flow based and autoregressive methods. Based on this framework, we subsequently propose a novel Sum-of-Squares polynomial flow that is interpretable, universal and easy to train.