CS848: PRIVACY & FAIRNESS IN DATA SCIENCE

Course Overview

This course focuses on the design of data science algorithms and on characterizing their privacy and fairness properties. Our daily lives are actively being monitored through web browsers, social networks, wearable devices, and even robots. These data are routinely analyzed by statistical and machine learning tools to infer aggregate patterns of our behavior (with applications to science, medicine, advertising, IoT, etc.). However, these data also contain private information about us that we would not like revealed to others (e.g., medical history, sexual orientation, locations visited). Disclosure of such data breaches our privacy. Moreover, acting on data containing such sensitive information can result in physical or financial harm and in discrimination, raising issues of fairness.

A grand challenge that pervades all of computer science (and, more generally, scientific) research is: how to learn from data collected from individuals while provably ensuring that (a) private properties of individuals are not revealed by the results of the learning process, and (b) the decisions taken as a result of data analysis are fair.

In this course, we will study recent work in computer science that mathematically formulates these societal constraints. For privacy, we will study differential privacy, a breakthrough privacy notion: an algorithm is differentially private if its output is insensitive to (small) changes in the input. Differentially private algorithms provide provable privacy guarantees while learning from data in varied domains (e.g., social science, medicine, communications) and varied modalities (e.g., tables, graphs, streams), and are used by government organizations and by internet corporations such as Google and Apple to collect and analyze data. Differential privacy has also been shown to help design better learning algorithms. We will also investigate how to mathematically formulate the notion of fairness in data analysis. We will study both the theory and practice of designing private and fair data analysis algorithms and their applications to data arising from real-world systems.
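To make the privacy definition concrete: an algorithm M is ε-differentially private if, for every pair of datasets D and D' that differ in one individual's record and every set S of possible outputs, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D') ∈ S]. As a minimal illustration (this sketch is not part of the course materials; the function name and parameter choices here are illustrative), the Laplace mechanism achieves this guarantee for counting queries by adding calibrated noise:

    import numpy as np

    def laplace_mechanism(true_count, epsilon, sensitivity=1.0):
        # A counting query changes by at most `sensitivity` when one
        # person's record is added or removed, so Laplace noise with
        # scale sensitivity/epsilon makes the output distributions on
        # neighboring datasets differ by at most a factor of e^epsilon.
        noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
        return true_count + noise

    # Example: privately release the number of matching records.
    # Smaller epsilon means more noise and a stronger privacy guarantee.
    noisy_count = laplace_mechanism(1042, epsilon=0.5)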

Prerequisites: The course is open to interested graduate students with sufficient mathematical maturity. Basic knowledge of algorithms, proof techniques, and probability will be assumed. Familiarity with databases and machine learning would help but is not necessary.

The course is currently listed in the Databases area.

Format:

Graded Student Work:

Grading:

Schedule

DATE TOPIC READING
Sep 10 Introduction (slides)
Module I: Intro to Privacy
Privacy attacks practicum (slides) + in-class exercise (pdf)
Differential privacy & basic algorithms (slides)
Machanavajjhala, Kifer, "Designing Statistical Privacy For Your Data", CACM 2015 (PDF)
Dinur, Nissim, "Revealing Information while Preserving Privacy", PODS 2003 (PDF)
Dwork, Roth, "The Algorithmic Foundations of Differential Privacy", Foundations and Trends in Theoretical Computer Science 9(3-4), 2014 (PDF), Chapters 2 & 3 (henceforth called the Privacy Textbook)
Sep 17 Designing complex algorithms & composition (slides)
In-class mini project 1 (handout, files)
(mini project 1 report due by Sep 23 noon)
DPComp.org
Hay, Rastogi, Miklau, Suciu, "Boosting the Accuracy of Differentially Private Histograms Through Consistency", PVLDB 3(1) 2010 (PDF)
Sep 24 Discussion of mini project 1
Module II: Intro to Fairness
Fairness practicum (slides) + in-class exercise (code)
(choice of project due by Oct 1 noon)
Angwin, Julia, "Machine Bias", ProPublica 2016 (Article)
Oct 1 Fairness in ML I (slides)
In-class mini project 2 (part i) (files)
Dwork, Cynthia, "Fairness Through Awareness", ITCS 2012 (PDF)
Feldman, Michael, "Certifying and Removing Disparate Impact", KDD 2015 (PDF)
Oct 8 Fairness in ML II (slides)
In-class mini project 2 (part ii) (files)
(mini project 2 report due by Oct 14 noon)
Paper Reading Instructions
Hardt, Moritz, "Equality of Opportunity in Supervised Learning", NIPS 2016 (PDF)

Guidelines for Paper Reading & Writing a Good Critique (LEARN)
Oct 15 (reading week)

(project proposal due by Oct 21)
Oct 22 Topic 1: privacy vs. fairness
(privacy and fairness, trade-offs, etc.)
Miklau et al, "Fair Decision Making using Privacy-Protected Data" (PDF) (nanantha)
Fain, Brandon, "The Core of the Participatory Budgeting Problem", WINE 2017 (PDF) (kknopf)
Optional Reading: Ghodsi et al., "Dominant Resource Fairness: Fair Allocation of Multiple Resource Types", NSDI 2011 (PDF)

Kleinberg et al, "Inherent Trade-Offs in the Fair Determination of Risk Scores", ITCS 2017 (PDF) (v3menon, s34agarw)
Kifer, Machanavajjhala, "No Free Lunch in Differential Privacy", SIGMOD 2011 (PDF) (bkacsmar)
Optional Reading: Kifer, Machanavajjhala, "Pufferfish", ACM TODS 2014 (PDF); He et al, "Blowfish Privacy", SIGMOD 2014 (PDF)
Oct 29 Topic 2: sources of bias
Pierson et al, "A large-scale analysis of racial disparities in police stops across the United States" (PDF) (jb2alawo)
Caliskan et al, "Semantics Derived Automatically from Language Corpora Contain Human-like Biases", Science 2017 (PDF) (utdas)
Torralba & Efros, "Unbiased Look at Dataset Bias", CVPR 2011 (PDF) (C447CHEN, jpremkum)
Optional Reading: Bakshy et al. "Exposure to ideologically diverse news and opinion on Facebook" (PDF)
Nov 05 Topic 3: fairness mechanisms

Zemel et al, "Learning Fair Representations", JMLR 2013 (PDF) (z97hu)
Bolukbasi et al., "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings", NIPS 2016 (PDF) (D35KUMAR)
Kusner et al., "Counterfactual Fairness", NIPS 2017 (PDF) (mniknami, mkaregar)
Optional Reading: Kilbertus, Niki, "Avoiding Discrimination through Causal Reasoning", NIPS 2017 (PDF)
Nov 12 (mid-term report due by Nov 12 noon)
Project day
Nov 19 Topic 4: private machine learning
Chaudhuri et al, "Differentially Private Empirical Risk Minimization", JMLR 2011 (PDF) (S3MOHAPA)
Abadi et al, "Deep Learning with Differential Privacy", ACM CCS 2016 (PDF) (mrafuse, r5akhava)
Optional Reading: Mironov, "Rényi Differential Privacy", IEEE CSF 2017 (PDF)
Papernot et al, "Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data", ICLR 2017 (PDF) (t3humphr)
Nov 26 Topic 5: deployments of DP
Erlingsson et al, "RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response", ACM CCS 2014 (PDF) (eeaton)
Optional Reading: Bittau et al, "Prochlo: Strong Privacy for Analytics in the Crowd", SOSP 2017 (PDF); Erlingsson et al, "Amplification by Shuffling: From Local to Central Differential Privacy via Anonymity", SODA 2019 (PDF)
Kotsogiannis et al, "PrivateSQL", VLDB 2019 (PDF) (m2mazmud, harry)
Optional Reading: Ge et al, "APEx: accuracy-aware DP data exploration", SIGMOD 2019 (PDF)
Bater et al, "Shrinkwrap: Differentially-Private Query Processing in Private Data Federations", VLDB 2019 (PDF) (akassis)
Dec 03 Final presentation
Dec 10 (deadline for submitting final report)

Academic Integrity

Note that students are generally not permitted to submit the same work for credit in multiple classes. For example, a student who has reviewed or presented one of the papers in another seminar class should not review or present it again for this class.

The general Faculty and University policy:

Note for Students with Disabilities

AccessAbility Services, located in Needles Hall, Room 1401, collaborates with all academic departments to arrange appropriate accommodations for students with disabilities without compromising the academic integrity of the curriculum. If you require academic accommodations to lessen the impact of your disability, please register with AccessAbility at the beginning of each academic term.