Data Systems Seminar Series • Invisible Yet Powerful: Watermarking to Protect Datasets and Models in Machine Learning | Cheriton School of Computer Science

Monday, July 14, 2025 10:30 am - 11:30 am EDT (GMT -04:00)

Please note: This seminar will take place in DC 1304.

Lingyang Chu, Assistant Professor
McMaster University

The rapid advancement of AI has transformed both datasets and models into valuable assets, yet they remain vulnerable to unauthorized use, theft, and replication. Watermarking offers a promising solution by embedding verifiable signals that establish ownership. Traditional database watermarking techniques assume that attackers seek to preserve query utility, which inherently restricts the extent of modifications they can apply to the data. However, this assumption does not hold for machine learning, where models can maintain predictive performance even when trained on significantly altered datasets. As a result, adversaries can heavily modify a dataset or distill a model while preserving its learning utility, which enables much stronger watermark removal attacks than those considered in traditional database watermarking.

How can we design watermarking methods that safeguard AI-related assets against these threats while maintaining their usability?

This talk presents our recent research on addressing the novel challenges in watermarking tabular datasets and deep learning models in the context of machine learning. First, I will introduce TabularMark, a non-blind watermarking framework that embeds verifiable ownership signals into tabular datasets while ensuring that models trained on watermarked data retain high predictive performance. Second, I will discuss blind watermarking for numerical tabular datasets, which enables watermark verification without requiring access to the original data, making it more practical for real-world data-sharing scenarios. Third, I will introduce a robust model watermarking approach that embeds ownership signals into deep neural networks to withstand ensemble distillation attacks. Finally, I will conclude with open challenges and future directions.

Bio: Lingyang Chu is an Assistant Professor in the Department of Computing and Software at McMaster University. He received his Ph.D. in Computer Science from the University of Chinese Academy of Sciences. Before joining McMaster University, he was a postdoctoral fellow at Simon Fraser University and a Principal Researcher at Huawei Technologies Canada.

His research focuses on data mining, explainable machine learning, and trustworthy computing, with a growing emphasis on data security in database systems. Some of his recent work explores AI-related data watermarking techniques to ensure data integrity and provenance in large-scale systems and data markets.

He is an Associate Editor of ACM Transactions on Knowledge Discovery from Data (TKDD) and has served as a program committee member and reviewer for conferences and journals including SIGMOD, VLDB, KDD, ICDE, ICDM, CIKM, CVPR, NeurIPS, ICML, ICLR, ACM Multimedia, TKDE, and TMM.

Location Information

Location Address: DC - William G. Davis Computer Research Centre
200 University Avenue West
DC 1304
Waterloo, ON, CA N2L 3G1
