Please note: This PhD defence will take place online.
Saber Malekmohammadi, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Yaoliang Yu
Machine learning (ML) technologies have made remarkable progress over the past decade. Federated learning (FL) is a decentralized ML setting in which clients collaborate to train a model under the orchestration of a central server. Clients may be, for instance, mobile devices (cross-device FL) or organizations (cross-silo FL). In FL, clients' training data remains decentralized, mitigating many of the systemic privacy risks and costs of centralized ML. This thesis investigates three main sub-problems of FL:

1. Convergence analysis of FL algorithms
2. Providing formal data privacy guarantees to clients through differentially private FL (DPFL)
3. Performance fairness for clients in DPFL systems under structured data heterogeneity
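For readers unfamiliar with the setting, the sketch below shows one FedAvg-style training round on a toy least-squares objective. It is a minimal illustration of the client/server split, not the thesis's implementation; the function names, the local optimizer, and all hyperparameters are assumptions.

```python
import numpy as np

def local_update(w_global, X, y, lr=0.01, steps=5):
    """One client's local training: a few gradient steps on its own data."""
    w = w_global.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # least-squares gradient
        w -= lr * grad
    return w

def fedavg_round(w_global, client_data, weights):
    """Server side: broadcast the model, collect local updates,
    and return their weighted average (weights sum to 1)."""
    updates = [local_update(w_global, X, y) for X, y in client_data]
    return sum(a * u for a, u in zip(weights, updates))
```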
In recent years, the FL community has produced a plethora of new algorithms, yet they have rarely been compared thoroughly, largely because our theoretical understanding of them remains fragmented. This is the main focus of the first chapter, where we show that many existing FL algorithms can be understood through the lens of operator splitting theory. This unification allows us to compare different algorithms easily, to refine previous convergence results, and to propose new algorithmic variants. Our new theoretical findings reveal the remarkable role played by the step size in FL algorithms. Furthermore, the unification yields a simple and economical way to accelerate the convergence of different algorithms, which is vital, as communication between the server and clients is often a significant overhead in FL systems.
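To give a flavour of the operator-splitting view, the sketch below treats each client's local solver as a proximal (resolvent) operator applied to the broadcast model, with server averaging playing the role of the consensus step, so one FL round becomes one iteration of a splitting scheme. This is a hedged illustration of the general idea with assumed names and a FedProx-style local objective, not the chapter's actual analysis.

```python
import numpy as np

def prox_step(grad_f, center, rho=1.0, lr=0.05, steps=50):
    """Approximate prox_{f/rho}(center) = argmin_w f(w) + (rho/2)||w - center||^2
    via gradient descent; this is a FedProx-style local solver."""
    w = center.copy()
    for _ in range(steps):
        w -= lr * (grad_f(w) + rho * (w - center))
    return w

def splitting_round(w_server, client_grads, rho=1.0):
    """One FL round as operator splitting: clients apply resolvent
    operators to the broadcast point; server averaging is the consensus
    step. The step size (here 1/rho) governs the scheme's behaviour."""
    return np.mean([prox_step(g, w_server, rho) for g in client_grads], axis=0)
```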
Although FL provides clients with some informal data privacy by operating on their decentralized data, the orchestrating server or other malicious third parties can still attack the privacy of participating clients. Consequently, FL has been augmented with differential privacy (DP) to provide clients with rigorous data privacy guarantees. In DPFL, the focus of the second chapter, clients' privacy requirements often differ, depending on their privacy policies. This heterogeneity, together with heterogeneity in clients' batch and/or dataset sizes, causes the DP noise level to vary across clients' model updates. Hence, straightforward aggregation strategies, e.g., assigning aggregation weights proportional to clients' privacy parameters, lead to low utility for the system. We propose a DPFL algorithm that efficiently estimates the true noise level in clients' model updates and uses an aggregation strategy that reduces the noise level of the aggregated model update. Our method improves utility and convergence speed while remaining robust to clients that maliciously send falsified privacy parameters to the server to attack the system's utility.
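As a rough illustration of the aggregation idea, the sketch below weights each client's update by its estimated precision (inverse noise variance), which minimizes the variance of the average. The noise estimator here is a deliberately crude placeholder; the thesis estimates noise from the updates themselves rather than from client-reported privacy parameters, which is what makes falsified reports harmless.

```python
import numpy as np

def noise_aware_aggregate(updates):
    """Weight clients' noisy updates by estimated inverse variance.
    Estimating noise from the updates (not from reported privacy
    parameters) keeps the aggregation robust to falsified reports."""
    U = np.stack(updates)                      # shape: (n_clients, dim)
    ref = U.mean(axis=0)                       # rough reference direction
    sigma = np.array([np.std(u - ref) for u in U]) + 1e-12  # crude estimate
    w = 1.0 / sigma**2                         # precision weighting
    w /= w.sum()
    return (w[:, None] * U).sum(axis=0)
```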
Finally, in the third chapter, we investigate the intersection of FL, DP, and performance fairness under structured data heterogeneity across clients. This type of heterogeneity is often addressed by clustering clients (clustered FL). However, existing clustered FL methods remain error-prone, and the DP noise in the system further exacerbates their errors. We therefore propose a robust algorithm for differentially private clustered FL. Our approach uses large batch sizes in the first round and clusters clients based on both their model updates and their training loss values. Furthermore, when clustering clients' model updates at the end of the first round, it accounts for the server's uncertainty by employing Gaussian Mixture Models (GMMs), reducing the impact of DP and stochastic noise and avoiding potential clustering errors. This approach is especially effective in privacy-sensitive scenarios with higher DP noise and achieves high accuracy in clustering clients.
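A minimal sketch of the soft-clustering step, using scikit-learn's GaussianMixture: clients are clustered from their noisy first-round updates together with loss values, and the GMM's posterior responsibilities retain the server's uncertainty instead of forcing hard assignments. The feature construction, covariance choice, and function name are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def soft_cluster_clients(updates, losses, n_clusters):
    """Cluster clients from (noisy) model updates and training losses.
    Soft posterior assignments absorb DP and stochastic noise better
    than a hard k-means-style partition would."""
    feats = np.hstack([np.stack(updates),    # (n_clients, dim)
                       np.asarray(losses)])  # (n_clients, n_loss_feats)
    gmm = GaussianMixture(n_components=n_clusters,
                          covariance_type="diag", random_state=0).fit(feats)
    return gmm.predict_proba(feats)  # per-client cluster responsibilities
```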