Canadian AI Federated Learning Workshop

Location: Sheraton Hotel, 123 Queen St W, Toronto, ON M5H 2M9

The workshop will be held in the Chestnut room and Pine ballroom on the Mezzanine floor of the Sheraton hotel. All presentations will be in the Chestnut room and all breakfasts, lunches and coffee breaks will be in the Pine ballroom. Dinner on Monday night will be at Hy's Steakhouse & Cocktail Bar, 365 Bay St., Toronto, ON M5H 2V1.

Monday, October 24

Time	Speaker	Theme	Title and Abstract
8:00-9:00	Breakfast (Pine ballroom)
9:00-9:15	Randolph Goebel (University of Alberta)	Introduction
9:15-9:30	Wen Tong (Huawei)	Introduction
9:30-10:10	Xiaoxiao Li (UBC)	Algorithms (heterogeneity)	Theory-driven federated learning algorithms for heterogeneous data Federated learning (FL) is a trending framework to enable multi-institutional collaboration in machine learning without sharing raw data. This presentation will discuss our ongoing progress in designing FL algorithms that embrace the data heterogeneity properties for multi-institutional data analysis in the FL setting. I will present our algorithms for tackling feature and label heterogeneity, motivated by our previous theoretical foundation. I will also show the promising results of applying our FL algorithms in healthcare applications.
10:10-10:40	Coffee break (Pine ballroom)
10:40-11:20	Pascal Poupart (University of Waterloo)	Algorithms (uncertainty)	Uncertainty Aware Federated Learning Federated Learning (FL) has gained popularity for combining models trained on edge devices without the data leaving its owner’s device. Since each edge device may not have a lot of data, several questions arise: How do we avoid overfitting? How do we quantify model uncertainty? How can the uncertainty of local models be taken into account during their aggregation into a global model? How can we calibrate the predictive uncertainty? In this talk, I will describe two Bayesian FL techniques that quantify uncertainty and then use this uncertainty to obtain robust global models with improved predictions. The first technique is based on Gaussian processes, which express a distribution over the parameters of the last layer of a predictor. The second technique directly computes a distribution over predictions while reducing the amount of communication between edge devices and the server to a single round of messages.
11:20-12:00	Nidhi Hegde (University of Alberta)	Algorithms (graphs)	Federated learning on non-disjoint graphs Distributed collaborative paradigms for machine learning, such as federated learning, have shown great promise in building high-performing machine learning models from separate data sources when combining them is prohibitive due to privacy and security concerns. In much of the prior work, each dataset is assumed to contain independent data points. However, in reality, there is often an underlying graph that structures the data points. Such structures emerge in social network data, bank transaction data, healthcare data, and other data where there is a notion of similarity or relation that links data points. Standard federated learning frameworks do not consider graph data, miss key insights from the graph structure, and require too much communication overhead for graph learning models. We consider this scenario of learning a combined machine learning model from graph-based data distributed at separate sources. We assume the presence of a small set of data points (nodes) existing at multiple sources. We propose a federated learning model where clients learn on their graphs separately but share a minimal set of intermediate model information to allow for a more accurate global model. We show through experiments on real data that our model provides more accurate node classification compared to the baseline federated learning algorithm, even with minimal communication and computation overhead.
12:00-13:00	Lunch (Pine ballroom)
13:00-13:40	Xi (Alex) Chen (Huawei)	Application (telecommunications)	AI in 6G In this talk, we will share Huawei’s vision on how AI and 6G will revolutionize each other in the near future. On one hand, we will talk about the six key capabilities of 6G, and how AI could help provide and enhance these capabilities. On the other hand, we will share a new design of communications in 6G to realize a world of connected intelligence. Federated learning, as a promising candidate for learning in the 6G paradigm, has been explored by different Huawei businesses. Some application scenarios and challenges of real-world FL will be shared as well.
13:40-14:20	Julien Cohen-Adad (Ecole Polytechnique)	Application (vision)	Federated learning for medical image analysis: Fully-integrated solution across Quebec hospitals with the CODA-19 infrastructure Federated learning has gained significant momentum for applications in medical image analysis tasks such as segmentation. While a few proof-of-concept examples have appeared in the past few years (e.g., https://www.med.upenn.edu/cbica/fets/), most of them focused on the specific assessment of an FL strategy, without proposing a sustainable and fully-integrated solution inside the clinical environment. Here we will discuss the CODA-19 infrastructure, which is a collaborative effort between hospitals in Quebec and Ontario to build a standardized database of demographic data, biological signals, and images, directly accessed from the hospital PACS. Each site can train models and exchange model parameters with other sites using secured communication protocols.
14:20-14:50	Coffee break (Pine ballroom)
14:50-15:30	Petr Musilek (University of Alberta)	Application (energy)	Federated Learning for Short-Term Electrical Load Forecasting Electrical load forecasting is an integral part of power system operation. However, sharing electricity consumption data of individual households for load prediction may compromise user privacy and can be expensive in terms of communication resources. Federated learning methods can take advantage of the data without centrally storing it. We discuss the advantages and disadvantages of the federated learning approach for load forecasting by comparing it to centralized and local learning schemes. We also evaluate its forecasting performance for individual house loads as well as the aggregate load of a distribution circuit. In addition, we describe a novel client clustering method to reduce the convergence time.
15:30-17:00	Brief introductions and panel discussion (moderator: Randy Goebel): Euijin Choo (University of Alberta), Joseph Ross Mitchell (University of Alberta), Steve Drew (University of Calgary), Gauthier Gidel (Universite de Montreal), Chris Pal (Ecole Polytechnique), Michael Brudno (University of Toronto), Hongyang Zhang (University of Waterloo)
18:00-20:00	Dinner (Hy's Steakhouse & Cocktail Bar, 365 Bay St., Toronto, ON M5H 2V1)

Tuesday, October 25

Time	Speaker	Theme	Title and Abstract
8:00-9:00	Breakfast (Pine ballroom)
9:00-9:40	Gautam Kamath (University of Waterloo)	Privacy/application (NLP)	Differentially Private Fine-tuning of Language Models We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pre-trained language models, which achieve state-of-the-art privacy versus utility tradeoffs on many standard NLP tasks. We propose a meta-framework for this problem, inspired by the recent success of highly parameter-efficient methods for fine-tuning. Our experiments show that differentially private adaptations of these approaches outperform previous private algorithms in three important dimensions: utility, privacy, and the computational and memory cost of private training. On many commonly studied datasets, the utility of private models approaches that of non-private models. For example, on the MNLI dataset we achieve an accuracy of 87.8% using RoBERTa-Large and 83.5% using RoBERTa-Base with a privacy budget of ϵ=6.7. In comparison, absent privacy constraints, RoBERTa-Large achieves an accuracy of 90.2%. Our findings are similar for natural language generation tasks. When privately fine-tuned on the DART dataset, GPT-2-Small, GPT-2-Medium, GPT-2-Large, and GPT-2-XL achieve BLEU scores of 38.5, 42.0, 43.1, and 43.8 respectively (privacy budget of ϵ=6.8, δ=1e-5), whereas the non-private baseline is 48.1. All our experiments suggest that larger models are better suited for private fine-tuning: while they are well known to achieve superior accuracy non-privately, we find that they also better maintain their accuracy when privacy is introduced.
9:40-10:20	Yuhong Guo (Carleton University)	Privacy	Information Sharing with Privacy Preservation Information sharing is a tool for reducing the cost of data annotation, which is a bottleneck for supervised machine learning. Meanwhile, data privacy has become an important issue in real-world applications where personal or commercial data are involved. In this talk, I will introduce the source-free domain adaptation problem and methods as a solution to support information sharing across domains, while respecting data privacy. This study can be extended into federated learning.
10:20-10:50	Coffee break (Pine ballroom)
10:50-11:30	Bei Jiang (University of Alberta)	Privacy	A general differentially private learning framework for decentralized data Decentralized consensus learning, which minimizes a finite sum of expected objective functions over a network of nodes, has been hugely successful. However, the local communication across neighboring nodes in the network may lead to the leakage of private information. To address this challenge, we propose a general differentially private (DP) learning framework for decentralized data that applies to many non-smooth learning problems. We show that the proposed algorithm retains the performance guarantee in terms of stability, generalization, and finite sample performance. We investigate the impact of local privacy-preserving computation on the global DP guarantee. Further, we extend the discussion by adopting a new class of noise-adding DP mechanisms based on generalized Gaussian distributions to improve the utility-privacy trade-offs. Our numerical results demonstrate the effectiveness of our algorithm and its better performance over the state-of-the-art baseline methods in various decentralized settings.
11:30-12:10	Mohamed Amine Merzouk (Ecole Polytechnique)	Privacy	The threat of malicious clients in federated learning The anonymity of participating clients is a foundation of federated learning, especially in cross-device settings. However, the lack of trust in clients raises serious security issues. In this talk, we will explore the security challenges that a malicious client introduces in federated learning.
12:10-13:10	Lunch (Pine ballroom)
13:10-13:50	Yaoliang Yu (University of Waterloo)	Algorithms (optimization)	A Unifying Framework for Federated Learning Multiple federated learning (FL) algorithms have been proposed in the FL community in recent years. However, a thorough comparison of these algorithms is largely missing, and our understanding of the theory of FL is still limited. The lack of a unifying view in practice has also led to the reinvention of the same algorithms under different names. Motivated by this gap, we develop a unifying scheme for FL and demonstrate that many of the algorithms that exist in the FL literature are special cases of this scheme. The unification allows us to get a deeper understanding of different FL algorithms, to compare them more easily, to improve the previous convergence analysis and to find new algorithmic variants. In particular, we demonstrate the important role that step size plays in the convergence of FL algorithms. Furthermore, based on our unifying scheme, we propose an efficient and economical method for accelerating FL algorithms. This streamlined acceleration method does not incur any communication overhead. We evaluate our findings by performing extensive experiments on both nonconvex and convex problems.
13:50-14:30	Linglong Kong (University of Alberta)	Algorithms (optimization)	Sample Averaging Approximation for Conditional Stochastic Optimization with Non-IID Data and Its Application in Federated Learning Sample average approximation (SAA), a novel approach for tractably solving stochastic optimization problems, enjoys strong asymptotic performance guarantees in settings with independent training samples. However, these guarantees are not known to hold generally with dependent samples, such as in machine learning tasks with time series data or distributed computing with Markovian training samples. In this paper, we focus on the statistical challenge of SAA with non-i.i.d. data through Conditional Stochastic Optimization (CSO), which finds a wide spectrum of applications including federated learning, peer-to-peer optimization, reinforcement learning, and causal inference. We derive exponentially decaying error bounds using rigorous probabilistic error analysis. In addition, we show that SAA for CSO remains tractable when the distribution of unknown parameters is only observable through dependent instances and still enjoys asymptotic consistency and finite sample guarantees. In the same setting, we also establish the sample complexity of SAA for CSO under a variety of structural assumptions, such as Lipschitz continuity. The theoretical results are verified by numerical experiments in the dependent setting.
14:30-15:00	Coffee break (Pine ballroom)
15:00-15:40	Guojun Zhang (Huawei)	Algorithms (fairness)	Proportional Fairness in Federated Learning With the increasingly broad deployment of federated learning (FL) systems in the real world, it is critical but challenging to ensure fairness in FL, i.e., reasonably satisfactory performance for each of the numerous diverse clients. In this work, we introduce and study a new fairness notion in FL, called Proportional Fairness (PF), which is based on the relative change of each client's performance. Drawing on its connection with bargaining games, we propose PropFair, a novel and easy-to-implement algorithm for finding proportionally fair solutions in FL, and prove its convergence. Through extensive experiments on vision and language datasets, we demonstrate that PropFair can approximately achieve PF solutions. Moreover, it consistently achieves a noticeable improvement in the worst 10% accuracy over state-of-the-art fair FL algorithms, while the overall performance remains competitive.
15:40-17:00	Brief introductions and panel discussion (moderator: Pascal Poupart): Xi He (University of Waterloo), Jun Chen (McMaster University), Roozbeh Razavi Far (University of Windsor), Eugene Belilovski (Concordia University), Martin Carrier-Vallieres (McGill University), Mathias L'Ecuyer (UBC)