Please note: This master’s research paper presentation will take place online.
Muhammad Arsalan Khan, Master’s candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Shane McIntosh
Version control is a key tool in a developer’s arsenal; however, common operations, such as cloning, may inadvertently expose user requests to the remote server or to other entities capable of monitoring network traffic. This exposure reveals which repository is being accessed, raising privacy concerns for users who must, or would prefer to, keep their development activities confidential.
To address this concern, we propose GIT-PIR, a Private Information Retrieval (PIR) solution that enables users to privately clone Git repositories without disclosing the details of their request to the server. Our design and implementation of GIT-PIR features (1) an updated version of Git for the client side and (2) the SimplePIR scheme. Through our experimental evaluation, we show that GIT-PIR users can clone repositories without loss of accuracy. When cloning a 5 MB repository from a 1.8 GB PIR database (i.e., a matrix representation of the hosted repositories), the server incurs a mean overhead of 3,862 ms. Although this is a considerable relative overhead (mean of 735%), the absolute overhead remains below four seconds, and the PIR execution time remains constant at approximately 2,700 ms. Given the improvements to privacy, this one-time cost during the cloning operation is likely tolerable. While larger experimental workloads are needed to scale our observations up to those of modern social coding platforms (e.g., GitLab and GitHub), our initial results indicate that GIT-PIR offers a promising solution for enhancing the privacy of Git repository retrieval without imposing an unrealistic amount of overhead.