Yongjoo Park, Research Fellow
Computer Science and Engineering, University of Michigan
Despite advances in computing power, the cost of large-scale analytics and machine learning remains daunting for small and large enterprises alike. This has created a pressing demand to reduce infrastructure costs and query latencies. To meet these goals, data analysts and applications are often willing to tolerate a slight but controlled loss of accuracy in exchange for substantial gains in cost and performance, which we refer to as statistical tradeoffs. This is particularly true in the early stages of data exploration, and it stands in stark contrast to traditional tradeoffs, in which infrastructure costs must increase to achieve higher performance.
My research builds large-scale data systems that make these statistical tradeoffs in a principled manner. In this talk, I will focus on two specific directions. First, I will present VerdictDB, a system that enables quality-guaranteed statistical tradeoffs without any changes to the backend infrastructure, thus offering a universal solution for off-the-shelf query engines. Second, I will introduce Database Learning, a new query execution paradigm that allows existing query engines to continually learn from their past executions and become “smarter” over time without any user intervention. I will conclude by briefly discussing other promising directions for emerging workloads beyond SQL, including visualization and machine learning.
Bio: Yongjoo Park is a Research Fellow in Computer Science and Engineering at the University of Michigan, Ann Arbor. His research interests are in software systems for fast data analytics and machine learning. He received his Ph.D. from the University of Michigan, advised by Michael Cafarella and Barzan Mozafari. He is a recipient of the 2018 ACM SIGMOD Jim Gray Dissertation Award (runner-up), the Kwanjeong Ph.D. Fellowship, and the Jeongsong Graduate Study Fellowship.