Please note: This master’s thesis presentation will take place in DC 3317.
Alex Bie, Master's candidate
David R. Cheriton School of Computer Science
Supervisors: Professors Gautam Kamath and Shai Ben-David
We study the problem of private distribution learning with access to public data. In this setup, a learner is given both public and private samples drawn from an unknown distribution p belonging to a class Q, and must output an estimate of p while satisfying privacy constraints (here, pure differential privacy) with respect to the private samples only.
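For reference, the standard definition of pure differential privacy (a known definition, not specific to this thesis): a randomized algorithm A is ε-differentially private if, for every pair of datasets S and S' differing in a single record, and every set of outcomes E,

    Pr[A(S) ∈ E] ≤ e^ε · Pr[A(S') ∈ E].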
Our setting is motivated by the privacy-utility tradeoff: algorithms satisfying the mathematical definition of differential privacy offer provable privacy guarantees for the data they operate on, but owing to this constraint they exhibit degraded accuracy. In particular, there are classes Q that are learnable when privacy is not a concern, but on which any algorithm satisfying pure differential privacy must fail.
We show that in several scenarios, we can use a small amount of public data to evade such impossibility results. Additionally, we complement these positive results with an analysis of how much public data is necessary to see such improvements. Our main result is that to learn the class of all Gaussians in R^d under pure differential privacy, d+1 public samples suffice, while d public samples are necessary.
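To make the role of the public data concrete, here is a minimal, hypothetical Python sketch of the general recipe behind such results: use a handful of public samples to roughly localize the unknown distribution, then run a standard pure-DP estimator (here, clipping plus the Laplace mechanism) on the private samples. This illustrates the recipe only and is not the thesis's algorithm; the function name, the radius parameter, and all constants are assumptions made for the example.

    import numpy as np

    def public_assisted_private_mean(public, private, eps, radius):
        # public:  (m, d) array of public samples; no privacy constraint.
        # private: (n, d) array of private samples; the output must be
        #          pure eps-DP with respect to these.
        # radius:  assumed bound on the distance from the public estimate
        #          to the true mean (hypothetical, for illustration only).
        n, d = private.shape
        center = public.mean(axis=0)  # rough localization from public data

        # Clip each private sample into a ball of radius `radius` around the
        # public estimate, so swapping one sample shifts the empirical mean
        # by at most 2*radius/n in L2 norm, hence 2*radius*sqrt(d)/n in L1.
        diffs = private - center
        norms = np.linalg.norm(diffs, axis=1, keepdims=True)
        clipped = center + diffs * np.minimum(1.0, radius / np.maximum(norms, 1e-12))

        # Laplace mechanism: per-coordinate noise scaled to the L1
        # sensitivity yields pure eps-DP with respect to the private data.
        l1_sensitivity = 2.0 * radius * np.sqrt(d) / n
        noise = np.random.laplace(scale=l1_sensitivity / eps, size=d)
        return clipped.mean(axis=0) + noise

For instance, when the mean lies far from the origin and no a priori bound is available (a regime where purely private learning can fail), a few public samples restore accuracy in this sketch:

    rng = np.random.default_rng(0)
    d = 5
    true_mean = 100.0 * rng.standard_normal(d)          # unbounded a priori
    public = rng.normal(true_mean, 1.0, size=(d + 1, d))
    private = rng.normal(true_mean, 1.0, size=(10000, d))
    estimate = public_assisted_private_mean(public, private, eps=1.0, radius=10.0)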