Seminar • Data Systems • Query Evaluation for Big Data

Monday, March 28, 2022, 11:30 am EDT (GMT -04:00)

Please note: This seminar will be given online.

Xiao Hu, Visiting Researcher
Discrete Algorithm Group, Google Research

Query evaluation has been one of the core problems in databases for more than 40 years, and the need to process and analyze big data has invigorated this long-standing research area with fresh challenges. Massively parallel data systems, such as MapReduce and Spark, have become effective tools for handling large volumes of data, but query evaluation algorithms in these systems must be designed to scale to thousands of machines running in parallel. Moreover, data is generated at very high speeds, which requires the query engine to deliver timely answers over dynamic databases. In addition, privacy has drawn much more attention recently as more sensitive data is being analyzed, since query answers, or even the evaluation process itself, can leak information about the input data. Beyond the traditional goal of efficiency, my research has also aimed to equip query evaluation algorithms in modern data analytics systems with new features, such as scalability, timeliness, and privacy.

In this talk, I will focus on evaluating join queries, the most fundamental and practically important class of queries, in massively parallel systems. I will describe the intrinsic relationship between the structure of a join query and its parallel computational cost. In addition to a homogeneous parallel model, I will also discuss some new challenges that arise when the underlying communication network has an arbitrary topology. Finally, I will briefly discuss some interesting open questions on query evaluation and conclude with exciting connections between query evaluation and other fields, such as machine learning, differential privacy, and high-performance computing.


Bio: Xiao Hu is a visiting researcher in the Discrete Algorithm group at Google Research, working with Badih Ghazi and Ravi Kumar. Before that, she was a postdoctoral associate in the Department of Computer Science at Duke University, co-supervised by Prof. Pankaj Agarwal and Prof. Jun Yang. She received her Ph.D. in Computer Science and Engineering from HKUST in 2019 and her B.E. in Computer Software from Tsinghua University in 2014.

Her research has focused on fundamental problems in database theory and their implications for practical systems. Her work on massively parallel join algorithms was invited to ACM Transactions on Database Systems as a research paper and to SIGMOD Record as a feature article in the Database Principles Column.