Overview:
Data-driven companies, and organizations in general, are often unable or unwilling to adopt privacy-preserving solutions in their products because standards and resources are lacking. Existing solutions, such as those developed by Google and the U.S. Census Bureau, require a team of privacy experts and are not directly transferable to other domains. As the market for diverse smart devices and systems has rapidly emerged, the need for privacy-preserving solutions across a wide range of domains and applications has become ever more pressing. However, ad hoc approaches lead to drastic and unavoidable costs for companies and individuals. To address these challenges, our long-term goal is to design and build a toolbox called the Cost-Aware Privacy Engine (CAPE) that (i) can be seamlessly integrated into existing systems and environments and (ii) can ensure provable and customizable privacy based on the cost requirements (e.g., utility cost or computation cost) of different applications. With CAPE, data curators can deliver on their privacy promises with little overhead while minimizing the cost of privacy.
Projects:
- Privacy engine for data exploration systems:
Data exploration is highly valuable for supporting and enabling large-scale data integration and analytics work (e.g., cleaning and studying medical records, or recommending services based on users' behavior), especially as data becomes increasingly complex. Through interactive queries on a dataset, a data analyst can better understand its underlying semantics and plan for subsequent applications. However, many of these datasets mix public data with private, sensitive data about individuals. Allowing external analysts (and even internal analysts) to explore these datasets without any provable guarantees can lead to a myriad of privacy issues, such as the Facebook-Cambridge Analytica scandal. Existing data exploration systems [KSV+16, AMP+13] that provide accuracy and latency guarantees do not offer any privacy guarantee to data owners. On the other hand, systems [McS09, PGM14, ZMK+18, JNS17] that offer strong privacy guarantees are not designed for data exploration and require data analysts to have a privacy background. Hence, this research project aims to enable general data analysts to explore private, sensitive data with accuracy and latency guarantees while offering strong privacy guarantees to data owners.
We will explore the following sub-projects.
- Supporting Accuracy-aware Differentially Private Data Exploration
- Enabling Private Sampling and Online Aggregation
- Customizing Privacy Protection for Different Domains
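As a rough illustration of what "accuracy-aware" could mean for a single counting query, the sketch below translates an analyst's accuracy requirement (error at most alpha, except with probability beta) into the smallest privacy parameter epsilon under which the standard Laplace mechanism meets that requirement. It is not the CAPE implementation or any of the systems listed on this page; the function names, the choice of the Laplace mechanism, and the sensitivity of 1 are all assumptions made for the example.

```python
import math
import random


def epsilon_for_accuracy(alpha: float, beta: float, sensitivity: float = 1.0) -> float:
    """Smallest epsilon for which Laplace noise of scale sensitivity/epsilon
    exceeds alpha in absolute value with probability at most beta.

    Uses the Laplace tail bound P(|noise| > t) = exp(-t / scale).
    (Hypothetical helper, for illustration only.)
    """
    if alpha <= 0 or not 0 < beta < 1:
        raise ValueError("alpha must be positive and beta must lie in (0, 1)")
    return sensitivity * math.log(1.0 / beta) / alpha


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, alpha: float, beta: float, rng: random.Random):
    """Answer a counting query (assumed sensitivity 1) with the least epsilon
    that satisfies the (alpha, beta) accuracy requirement.

    Returns the noisy answer and the epsilon that was spent.
    """
    epsilon = epsilon_for_accuracy(alpha, beta, sensitivity=1.0)
    return true_count + laplace_noise(1.0 / epsilon, rng), epsilon


if __name__ == "__main__":
    rng = random.Random(0)
    answer, spent = noisy_count(true_count=1000, alpha=10.0, beta=0.05, rng=rng)
    print(f"noisy answer: {answer:.2f}, epsilon spent: {spent:.4f}")
```

A looser accuracy requirement (larger alpha or beta) yields a smaller epsilon, so in this simplified setting the analyst pays only the privacy cost that the requested accuracy actually demands.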
Related research publications:
- "DProvDB: Differentially Private Query Processing with Multi-Analyst Provenance", with Shufan Zhang, SIGMOD'24
- "Cache Me If You Can: Accuracy-Aware Inference Engine for Differentially Private Data Exploration", with Miti Mazmudar, Thomas Humphries, Jiaxiang Liu, Matthew Rafuse, VLDB'23
- "MIDE: Accuracy Aware Minimally Invasive Data Exploration for Decision Support", with Sameera Ghayyur, Dhrubajyoti Ghosh, Sharad Mehrotra, VLDB'22
- "Visualizing Privacy-Utility Trade-Offs in Differentially Private Data Releases", with Priyanka Nanayakkara, Johes Bater, Jessica Hullman, and Jennie Rogers, PETS'22
- "Cache Me If You Can: Accuracy-Aware Inference Engine for Differentially Private Data Exploration", with Miti Mazmudar, Thomas Humphries, and Matthew Rafuse, TPDP 2020
- "Differentially Private Sublinear Average Degree Approximation", with Harry Sivasubramaniam and Haonan Li, TPDP 2020
- "Linear and Range Counting under Metric-based Local Differential Privacy", with Zhuolun Xiang, Bolin Ding, and Jingren Zhou, ISIT 2020
- "Towards Accuracy Aware Minimally Invasive Monitoring (MiM)", with Sameera Ghayyur, Dhrubajyoti Ghosh, and Sharad Mehrotra, TPDP 2019
- "PrivateSQL: A Differentially Private SQL Query Engine", with Ios Kotsogiannis, Yuchao Tao, Ashwin Machanavajjhala, Michael Hay and Gerome Miklau, VLDB 2019
- "APEx: Accuracy-Aware Differentially Private Data Exploration", with Chang Ge, Ihab Ilyas, and Ashwin Machanavajjhala, SIGMOD 2019
- "Provably privacy for mobility data", with Ashwin Machanavajjhala, Springer Handbook on Mobile Data Privacy, 2017
- "A Demonstration of VisDPT: visual exploration of differentially private trajectories", with Nisarg Raval and Ashwin Machanavajjhala, VLDB 2016 (BEST DEMO AWARD)
- "DPT: Differentially private trajectory synthesis using hierarchical reference systems", with Graham Cormode, Ashwin Machanavajjhala, Cecilia M. Procopiuc, and Divesh Srivastava, VLDB 2015
- "Blowfish Privacy: Tuning Privacy-Utility Trade-offs using Policies", with Ashwin Machanavajjhala and Bolin Ding, SIGMOD 2014