Please note: This master’s thesis presentation will be given online.
Mohammadali Niknamian, Master’s candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Ihab Ilyas
Knowledge graphs are among the most important sources of information in many applications, such as question answering and social networks. These knowledge graphs, however, are often far from complete, as many properties and links between entities are missing. This greatly reduces their usefulness in the applications that rely on them. Many methods have been proposed to alleviate this problem. Among the most prominent and widely studied are graph embedding and link prediction methods. However, these methods consider only the relations between entities in knowledge graphs and completely ignore their literal values and properties, which account for 41% of the facts in the knowledge graph YAGO4. They also do not scale to large knowledge graphs, and their inference process for imputing missing links is by nature quadratic in the number of entities in the knowledge graph. Furthermore, the embedding vectors that represent entities and relations might not capture the information needed for inference over the millions of entities in large-scale knowledge graphs.
We present a novel method based on the framework of HoloClean, a powerful cleaning tool for relational data. Our system builds on the open-source HoloClean and can integrate multiple, diverse signals from various knowledge graph completion methods, which allows us to tackle this problem holistically. We ran a thorough experiment on the YAGO4 dataset, with 5M entities and 20M facts, and were able to enlarge the knowledge graph by roughly 12% with an average precision of 0.81 across 162 different classes.