More than 94% of current enterprises – including 83% of healthcare organizations – rely on cloud services, especially for their infrastructure and data storage needs. Organizations outsource their storage to third party cloud providers because of the high cost associated with owning and maintaining an on-premise storage or compute fleet. However, outsourcing an application’s data in plaintext can reveal sensitive information to a potentially nontrustworthy cloud provider. While encrypting the data forms the first obvious solution to ensure data privacy, a growing body of attacks exploit side-channel information such as access frequency or duration on encrypted data to uncover plaintext data. Such attacks are called inference or access pattern attacks. For example, the duration and frequency with which an oncologist accesses (encrypted) data can reveal the type of a patient’s cancer (e.g., based on the frequency and intervals of chemotherapy treatments).
In this course, we will study the design and implementation of private data systems that protect applications against access pattern attacks. We begin the course by reviewing the basic cryptographic encryption schemes, followed by understanding an encrypted database and the potential leakages in it as a motivation to learn about systems with stronger privacy guarantees. This course will cover data systems that use four types of cryptographic primitives: oblivious RAM, trusted hardware enclaves, secure multi party computation, and private information retrieval. Note that since this course is offered under data systems category, the discussions will focus primarily of the database design concepts.
The final grade for the course will be based on the following components:
Reviews (10%): Each week will study two research papers. Every student is expected to write at least one review in the duration of the course. Each review should be no more than 1000 words and contain the following sections, following the typical format of reviews in database conferences:
Week | Dates | Topic | Speaker | Readings |
---|---|---|---|---|
1 | Sep 3rd | No Class! | 2 | Sep 10th | Course Intro & Background on crypto and DB basics. | Sujaya Maiyya | Required: Textbook Chapter 0 and Chapter 2. Recommend reading other chapters as well. Slides |
3 | Sep 17th | Intro to ORAM and TEEs. | Sujaya Maiyya | Reading required but review not required: (1) PathORAM up to Section 4. (2) RingORAM. Slides |
4 | Sep 24th - 1 | CrypDB | Student 1 | Required: CryptDB: protecting confidentiality with encrypted query processing |
Sep 24th - 2 | Attack on CryptDB | Student 2 | Required: Inference Attacks on Property-Preserving Encrypted Databases | |
5 | Oct 1st - 1 | Parallel, scalable, and fault tolerant ORAM | Student 3 | Required: Treebeard: A Scalable and Fault Tolerant ORAM Datastore |
Oct 1st - 2 | Transactions in ORAM | Student 4 | Required: Obladi: Oblivious Serializable Transactions in the Cloud | |
6 | Oct 8th - 1 | Alternate techniques to ORAM (1) | Student 5 | Required: Pancake: Frequency Smoothing for Encrypted Data Stores |
Oct 8th - 2 | Alternate techniques to ORAM (2) | Student 6 | Required: Waffle: An Online Oblivious Datastore for Protecting Data Access Patterns | |
7 | Oct 13th | No Class - reading week. | ||
8 | Oct 22nd - 1 | Scalable oblivious relational queries (1) | Student 7 | Required: SEAL: Attack Mitigation for Encrypted Databases via Adjustable Leakage |
Oct 22nd - 2 | Scalable oblivious relational queries (2) | Student 8 | Required: OasisDB: An Oblivious and Scalable System for Relational Data | |
9 | Oct 29th - 1 | TEE-based (hierarchical) ORAM | Student 9 | Required: H2O2RAM: A High-Performance Hierarchical Doubly Oblivious RAM |
Oct 29th - 2 | TEE-based oblivious scalable DB | Student 10 | Required: Snoopy: Surpassing the Scalability Bottleneck of Oblivious Storage | |
10 | Nov 5th - 1 | TEE-based relational DB (1) | Student 11 | Required: Menhir: An Oblivious Database with Protection against Access and Volume Pattern Leakage |
Nov 5th - 2 | TEE-based relational DB (2) | Student 12 | Required: OBLIVIATOR: OBLIVIous Parallel Joins and other OperATORs in Shared Memory Environments | |
11 | Nov 12th - 1 | Circuit based encrypted DB | Student 13 | Required: Arx: An Encrypted Database using Semantically Secure Encryption |
Nov 12th - 2 | Circuit based federated db | Student 14 | Required: Alchemy: A Query Optimization Framework for Oblivious SQL | |
12 | Nov 19th - 1 | Secret sharing based DB | Student 15 | Required: Information-Theoretically Secure and Highly Efficient Search and Row Retrieval |
Nov 19th - 2 | PIR based kv store | Student 16 | Required: Pantheon: Private Retrieval from Public Key-Value Store | |
13 | Nov 26th - 1 | PIR based semantic search | Student 17 | Required: Private Web Search with Tiptoe |
Nov 26th - 2 | ORAM based semantic search | Student 16 | Required: Compass: Encrypted Semantic Search with High Accuracy | |
14 | Dec 3rd | Project presentations |
Mental Health: If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support.
On-campus Resources
Off-campus Resources
Diversity: It is our intent that students from all diverse backgrounds and perspectives be well served by this course, and that students’ learning needs be addressed both in and out of class. We recognize the immense value of the diversity in identities, perspectives, and contributions that students bring, and the benefit it has on our educational environment. Your suggestions are encouraged and appreciated. Please let us know ways to improve the effectiveness of the course for you personally or for other students or student groups. In particular:
MOSS (Measure of Software Similarities) is used in this course as a means of comparing students' assignments to ensure academic integrity. We will report suspicious activity, and penalties for plagiarism/cheating are severe. Please read the available information about academic integrity very carefully.
Discipline cases involving any automated marking system such as Marmoset include, but are not limited to, printing or returning values in order to match expected test results rather than making an actual reasonable attempt to solve the problem as required in the assignment question specification.