Please note: This presentation will take place in DC 2310 and online.
Mushi Wang, Master’s candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Xi He
This paper evaluates privacy metrics for synthetic tabular data, focusing on black-box privacy metrics, which assess privacy without requiring detailed knowledge of the data generation process. We investigate the effectiveness of two prominent metrics: Density Overfitting Membership Inference Attack with Synthetic Data (DOMIAS) and Distance to Closest Record (DCR). Using six diverse datasets from the UCI Machine Learning Repository, we compare the performance of different synthetic data generation models, including diffusion models such as TabDDPM and traditional models such as PrivBayes. Our results indicate that DOMIAS demonstrates limited sensitivity across datasets and configurations, whereas DCR effectively measures the similarity between synthetic and real data and so provides valuable insight into privacy preservation. We also examine the Step-wise Error Comparing Membership Inference (SECMI) attack, which analyzes prediction errors at each generation step to infer membership status. Overall, diffusion models such as TabDDPM generally achieve a better balance of utility and privacy than traditional models such as PrivBayes. This study underscores the need for robust and adaptable privacy metrics that reliably assess the privacy risks of synthetic data, so that it can be applied safely across diverse fields, fostering innovation while safeguarding individual privacy.
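For readers unfamiliar with DCR, the idea can be illustrated with a minimal sketch: for each synthetic record, compute the distance to its nearest real record. The sketch assumes purely numeric, already-scaled features and Euclidean distance; the function name and toy data are hypothetical, and the abstract does not state which distance or preprocessing the study itself uses.

```python
import numpy as np

def distance_to_closest_record(synthetic, real):
    """For each synthetic row, return the distance to its nearest real row.

    Assumption: features are numeric and already scaled, and Euclidean
    distance is used. This is an illustrative choice, not necessarily
    the one made in the study.
    """
    synthetic = np.asarray(synthetic, dtype=float)
    real = np.asarray(real, dtype=float)
    # Pairwise differences, shape (n_synthetic, n_real, n_features)
    diffs = synthetic[:, None, :] - real[None, :, :]
    # Pairwise Euclidean distances, shape (n_synthetic, n_real)
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Keep only the closest real record for each synthetic record
    return dists.min(axis=1)

# Toy data (hypothetical): the first synthetic row copies a real row exactly
real = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]])
synthetic = np.array([[0.0, 0.0], [1.0, 2.0], [4.0, 3.0]])

dcr = distance_to_closest_record(synthetic, real)
print(dcr)
```

A DCR of zero means a synthetic record exactly reproduces a real one; small values across many records suggest the generator may be memorizing its training data. Building the full pairwise matrix is only practical for small tables; a nearest-neighbour index would be the usual substitute at scale.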
To attend this presentation in person, please go to DC 2310. You can also attend virtually using Zoom.