Seminar • Data Systems • Towards Intelligent, Scalable and User-friendly Data Pipelines

Monday, February 24, 2025 10:30 am - 11:30 am EST (GMT -05:00)

Please note: This seminar will take place in DC 1304.

Jin Wang, Research Scientist
Megagon Labs

Nowadays, data-driven approaches have become a mainstream research methodology in multiple communities. To support effective and scalable data science applications on ever-growing datasets, researchers in both academia and industry have made great efforts to build end-to-end data pipelines.

In this talk, I will present my work on improving data pipelines from three aspects. First, I will present my efforts to boost result quality. Specifically, I will introduce a new paradigm for data preparation based on self-supervised learning and its application to several downstream tasks such as dataset discovery, table annotation and entity matching. I will then give a high-level introduction to my efforts to enhance user experience and reduce the processing time of data pipelines. Finally, I will conclude with a vision for future work on data pipelines.


Biography: Jin Wang is a research scientist at Megagon Labs. Before that, he obtained his PhD in computer science from the University of California, Los Angeles in July 2020 under the supervision of Professor Carlo Zaniolo.

His research interests lie in the broad area of data management and data science. In particular, his work focuses on database systems, Datalog, data integration and table representation learning. His work has appeared in top data management venues such as SIGMOD, VLDB, ICDE and the VLDB Journal.

He regularly serves as a PC member for leading conferences in data management and data mining and won the Distinguished Reviewer Award of PVLDB Vol. 17 (2024). More information can be found on his website: https://www.jinwang18.net/