DSG Seminar Series • A Vision for Data Alignment and Integration in Data Lakes

Thursday, June 23, 2022 11:30 am - 12:30 pm EDT (GMT -04:00)

Please note: This seminar will be given in person in DC 1302 and livestreamed.

Renée Miller, University Distinguished Professor of Computer Science
Khoury College of Computer Sciences, Northeastern University

The requirements for integration over massive, heterogeneous table repositories (also known as data lakes) are fundamentally different from those for federated data integration (where the data owned by an enterprise is integrated into a cohesive whole) or data exchange (where data is exchanged and shared among a small set of autonomous peers).

In this talk, I will outline a vision for data alignment and integration in data lakes. Data lakes create opportunities to apply new methods, drawn from network science and other areas, to discover emergent semantics in large, heterogeneous collections of data sets. I will illustrate these ideas by discussing the problem of data lake disambiguation, work that received the Best Paper Award at EDBT 2021.

Bio: Renée J. Miller is a University Distinguished Professor of Computer Science at Northeastern University. She is a Fellow of the Royal Society of Canada and received the US Presidential Early Career Award for Scientists and Engineers (PECASE), the highest honor bestowed by the United States government on outstanding scientists and engineers beginning their careers. She received an NSF CAREER Award, the Ontario Premier’s Research Excellence Award, and an IBM Faculty Award. She formerly held the Bell Canada Chair of Information Systems at the University of Toronto and is a fellow of the ACM.

Her work has focused on the long-standing open problem of data integration and has achieved the goal of building practical data integration systems. She and her colleagues received the ICDT Test-of-Time Award and the 2020 Alonzo Church Award for Outstanding Contributions to Logic and Computation for their influential work establishing the foundations of data exchange. In 2020, she received the CS-Can/Info-Can Lifetime Achievement Award in Computer Science.

Professor Miller is an Editor-in-Chief of the VLDB Journal and former president of the Very Large Data Base (VLDB) Foundation. She received her PhD in Computer Science from the University of Wisconsin, Madison and bachelor’s degrees in Mathematics and Cognitive Science from MIT.