Please note: This presentation will take place in DC 2314 and online.
Sabrina Mokhtari, Master’s candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Gautam Kamath
Differential privacy (DP) is a standard framework for protecting the privacy of individual data points. By limiting how much any single point can influence a trained model, DP curbs memorization of training data and thus reduces the risk of data leakage. While DP has been effective in machine learning (ML), there are growing concerns about common practices in the literature, particularly the use of existing ML benchmarks to measure progress in DP ML. Recent critiques highlight that popular benchmarks such as CIFAR-10, although widely used in non-private settings, may not adequately reflect the complexities of more sensitive domains such as medical data. Moreover, the benefits of pre-training on similar public datasets may not carry over to scenarios where the private data is substantially different. This thesis addresses these concerns by evaluating DP methods across a variety of privacy-sensitive datasets and training scenarios. We focus on medical datasets, where privacy is crucial, and evaluate a thorough set of techniques spanning a range of settings: pre-training on public data, training without public data, full and last-layer fine-tuning, linear probing, and different privacy levels.
To attend this presentation in person, please go to DC 2314. You can also attend virtually using Zoom.