Please note: This PhD seminar will be given online.
Alireza Heidarikhazaei, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Ihab Ilyas
We introduce a learning framework for the problem of unifying conflicting data into an original representation, which we call “record fusion”. This approach expresses record fusion as a learning problem over probabilistic models. This design relaxes some assumptions of two familiar problems, “data fusion” and “golden record”, so that it can solve both at the same time. Unlike previous approaches, our method achieves high performance with or without the records’ source information and, in both cases, outperforms state-of-the-art baselines. Furthermore, we show how our approach addresses the scarcity of training data for the model. Our framework fuses records with an average precision of ~98% when source information is available and ~94% without it, across a diverse array of datasets that exhibit various properties, such as labeled data and record source information. We compare our approach to a comprehensive collection of data fusion and entity consolidation methods, ranging from methods that rely on source information to approaches that do not need any. Our approach achieves an average improvement of ~20 precision points with source information and ~45 without, and it can also generate high-quality artificial training data when labeled data is insufficient.
To join this PhD seminar on Zoom, please go to https://us02web.zoom.us/j/83326411204?pwd=Z3dNVUxIK01PMXY3MTlXaHNVckJqdz09.