Colloquium – Xiao Hu
April 8, 2021 @ 10:00 am - 11:00 am
Host: Jeff Phillips
Query Evaluation for Big Data
Abstract: Query evaluation has been one of the core problems in databases for more than 40 years, and the need to process and analyze big data has invigorated this long-standing research area with fresh challenges. Massively parallel data systems, such as MapReduce and Spark, have become effective tools for handling large volumes of data, but query evaluation algorithms in these systems must be designed to scale to thousands of machines running in parallel. In addition, data is generated at very high speeds, which requires the query engine to deliver timely answers over dynamic databases and to ensure that those answers are of robust quality. Beyond the traditional goal of efficiency, my research has also aimed at equipping query evaluation algorithms in modern data analytics systems with new features, such as scalability, timeliness, and veracity.
In this talk, I will focus on query evaluation in massively parallel systems for join queries, the most fundamental and practically important class of queries. I will describe the intrinsic relationship between the join structure and its parallel computational cost. In addition to a homogeneous parallel model, I will also discuss some new challenges that arise when the underlying communication model has an arbitrary topology. Finally, I will briefly discuss some interesting open questions on query evaluation over dynamic databases, and conclude with exciting connections between query evaluation and other fields, such as machine learning, differential privacy, and high-performance computing.
Xiao Hu is a postdoctoral associate in the Department of Computer Science at Duke University, co-supervised by Prof. Pankaj Agarwal and Prof. Jun Yang. Prior to that, she received her Ph.D. in Computer Science and Engineering from HKUST and her B.E. degree in Computer Software from Tsinghua University. Her research has focused on fundamental problems in database theory and their implications for practical systems. Her work on massively parallel join algorithms has been invited to ACM Transactions on Database Systems as a research paper, and as a feature article in the Database Principles Column of SIGMOD Record.