Professor Yaoliang Yu has received the 2026 Faculty of Mathematics Golden Jubilee Research Excellence Award. Established in 2017 to commemorate the 50th anniversary of the Faculty of Mathematics, the $2,500 award recognizes early- and mid-career faculty members for outstanding research contributions. Professor Yu won the award in the mid-career category.
“Congratulations to Yaoliang on this much-deserved award,” said Raouf Boutaba, University Professor and Director of the Cheriton School of Computer Science. “He has made significant contributions to the design and analysis of secure, private and interpretable AI systems. In particular, his recent work on trustworthiness of AI is a major advancement that reflects a rare combination of creativity, technical ability and visionary thinking in research.”

Yaoliang Yu is an Associate Professor at the Cheriton School of Computer Science, a Canada CIFAR AI Chair, and a faculty member at the Vector Institute. His research focuses on developing efficient, scalable and robust algorithms for modern machine learning models and applications, with formal theoretical guarantees and analyses. He is also interested in applying machine learning techniques to vision and natural language applications.
Professor Yu held a Cheriton Faculty Fellowship from 2020 to 2023. In 2024, he received an Ontario Early Researcher Award to support his work on deep generative machine learning models. As of June 2026, his research publications have been cited more than 5,900 times with an h-index of 37 according to Google Scholar.
About Professor Yaoliang Yu’s research
Modern deep learning–based AI systems have transformed many areas of science and engineering, but their rapid adoption has also raised societal concerns. The most powerful AI systems today are typically trained on massive datasets from the internet, sometimes without sufficient consideration of data quality, copyright ownership, privacy protections or susceptibility to manipulation. These concerns have led to growing scrutiny of the deployment of AI systems, especially in safety-critical areas where robustness, transparency and explainability are as important as accuracy.
Professor Yaoliang Yu’s recent research advances understanding of several core challenges in trustworthy AI, including data poisoning, and privacy and copyright protection. His work has clarified the limitations of existing tools, identified previously unrecognized vulnerabilities, and developed new methods with theoretical guarantees.
Data poisoning
Modern machine learning models are increasingly trained on data at web scale, creating opportunities for adversaries to poison models by injecting malicious data into training pipelines. While data poisoning attacks have been demonstrated on simple models, Professor Yu’s TMLR 2022 work showed that modern deep architectures are surprisingly robust against data poisoning.
One plausible explanation is that the optimization involved is much more difficult than existing algorithms can solve. Professor Yu’s ICML 2023 work proved that there is another fundamental bottleneck: the amount of poisoned data. With his collaborators, he proposed an easily computable index to characterize the minimum amount of poisoned data needed by any algorithm to induce abnormal behaviour on the target model. For the first time, this work made it possible to check algorithmically whether the proportion of poisoned data is large enough to degrade a model.
This work also showed that even for linear models such as logistic regression, an adversary would need an exceedingly large amount of poisoned data before the attacks become effective. Professor Yu’s work gave a formal explanation of why data poisoning can be difficult in practice and offered solutions to protect against existing poisoning attacks.
Generative AI copyright protection
Protection of copyright in generative AI systems has emerged as a major societal concern. Professor Yu’s research has revealed important vulnerabilities in existing protection methods and proposed new approaches that strengthen privacy safeguards.
In his ICML 2024 work, Professor Yu demonstrated that copyrighted content can be scrambled and hidden within seemingly innocent training data. As a result, auditing training datasets alone may not be sufficient to determine whether copyrighted material was used during model development. Instead, more sophisticated analysis of a model’s internal representations may be required. His NeurIPS 2025 work further showed that copyright protection tools may provide a false sense of security. If an adversary has API access to these tools and is able to gather as few as five original data samples, the existing protections can become vulnerable.
Professor Yu’s research has also identified promising solutions. His ICML 2025 work demonstrated that diffusion-based generative AI models can be trained successfully on mostly noise-corrupted data. Because the training algorithm is restricted to largely noisy data with details significantly obscured, it provides a strong, rigorous privacy guarantee. A model trained on only 4% clean images and 96% noisy images generated clean images with quality on par with that of models trained exclusively on clean images, representing a significant advance in the privacy and security of modern image-generative models.
Model interpretability
As deep learning models grow in complexity, understanding how they make decisions has become increasingly challenging. While these models are incredibly powerful, they can operate as black boxes, making it difficult to explain their predictions. As a result, there is a strong interest in developing methods that make AI systems more transparent and accountable.
Game-theoretic methods, such as those based on the Shapley value, serve this purpose and are particularly popular among practitioners. However, computing these values is often computationally intractable, particularly for the large models and datasets used in modern AI.
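To make the cost concrete, the following is a minimal Python sketch, not any of Professor Yu's algorithms. It computes Shapley values exactly by enumerating every coalition, which takes time exponential in the number of players n, and then estimates them with the standard textbook approach of averaging marginal contributions over randomly sampled player orderings. The function names and the toy utility `toy_value` are hypothetical and chosen only for illustration.

```python
import itertools
import math
import random


def exact_shapley(n, value):
    """Exact Shapley values; enumerates all 2^(n-1) coalitions per player."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Probability weight the Shapley value assigns to coalitions of this size.
            weight = (
                math.factorial(size)
                * math.factorial(n - size - 1)
                / math.factorial(n)
            )
            for coalition in itertools.combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of player i when joining coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def sampled_shapley(n, value, num_permutations, seed=0):
    """Generic Monte Carlo estimate: average marginal contributions over
    random player orderings (a textbook baseline, assumed for illustration)."""
    rng = random.Random(seed)
    phi = [0.0] * n
    players = list(range(n))
    for _ in range(num_permutations):
        rng.shuffle(players)
        coalition = frozenset()
        previous = value(coalition)
        for i in players:
            coalition = coalition | {i}
            current = value(coalition)
            phi[i] += current - previous
            previous = current
    return [total / num_permutations for total in phi]


def toy_value(coalition):
    """Hypothetical utility: squared sum of fixed per-player weights."""
    weights = [1.0, 2.0, 3.0, 4.0]
    return sum(weights[i] for i in coalition) ** 2


if __name__ == "__main__":
    print("exact:  ", exact_shapley(4, toy_value))
    print("sampled:", sampled_shapley(4, toy_value, num_permutations=2000))
```

For this toy utility, each player's exact Shapley value works out to its weight times the total weight, so the exact routine returns approximately 10, 20, 30 and 40. The sampled estimate approaches those values as the number of orderings grows, at a cost of n utility evaluations per ordering rather than an exponential number overall.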
Through a series of works, Professor Yu has developed randomized approximation algorithms that make these methods substantially more efficient while maintaining provable accuracy guarantees. His ICLR 2024 work introduced an algorithm that can estimate a large subset of probabilistic values, including all those used in practice, with time complexity n log(n), improving on the previous baseline by a factor of n. His NeurIPS 2024 work proposed a second algorithm that can simultaneously estimate all probabilistic values with average time complexity n log(n) and quadratic space complexity.
Most recently, his ICML 2026 work achieved optimal linear time and space complexity for a large subset of probabilistic values, a new result even for the widely used Shapley value, by finally removing the log(n) factor present in previous work.
Broader contributions
Professor Yu’s other significant contributions include the first dynamic early-exiting architecture, DeeBERT (presented at ACL 2020), which is widely used to accelerate inference in transformer-based large language models; the theoretical framework ProxConnect (presented at NeurIPS 2021 and NeurIPS 2023) for understanding and analyzing neural network quantization heuristics; and the classification and unification of gradient-based multi-objective optimization algorithms in machine learning (presented at IEEE NSE 2022 and ICLR 2025).