Master’s Thesis Presentation • Data Systems — Scaling Machine Learning Data Repair Systems for Sparse Datasets

Friday, December 11, 2020 10:00 am EST (GMT -05:00)

Please note: This master’s thesis presentation will be given online.

Omar Attia, Master’s candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Ihab Ilyas

Machine learning data repair systems (e.g., HoloClean) have achieved state-of-the-art performance for the data repair problem on many datasets. However, these systems still face significant challenges when applied to sparse datasets.

In this work, we study the challenges that such datasets pose to machine learning data repair systems. We propose dataset-independent methods to mitigate the effects of data sparseness. Finally, we present our results on a large, sparse real-world dataset: Census.


To join this master’s thesis presentation on Zoom, please go to https://us04web.zoom.us/j/9515296655?pwd=c2NOYTUzS3I3QU1GQlRndmN3dXNJQT09.