Master’s Thesis Presentation • Machine Learning — Cross-Lingual Entity Matching for Knowledge Graphs

Tuesday, December 8, 2020 11:00 am EST (GMT -05:00)

Please note: This master’s thesis presentation will be given online.

Hsiu-Wei Yang, Master’s candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Jimmy Lin

Multilingual knowledge graphs (KGs), such as YAGO and DBpedia, represent entities in different languages. The task of cross-lingual entity matching is to align entities in a source language with their counterparts in target languages. 

In this thesis, we investigate embedding-based approaches that encode entities from multilingual KGs into a shared vector space in which equivalent entities lie close to each other. Specifically, we apply graph convolutional networks (GCNs) to learn entity embeddings that combine multi-aspect information about entities, including topological connections, relations, and attributes. To exploit the literal descriptions of entities expressed in different languages, we propose two uses of a pre-trained multilingual BERT model to bridge cross-lingual gaps. We further propose two strategies for integrating the GCN-based and BERT-based modules to boost performance. Extensive experiments on two benchmark datasets demonstrate that our method significantly outperforms existing systems. We also introduce a new dataset that covers 15 low-resource languages and includes unlinkable cases, bringing the evaluation closer to real-world challenges.
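As a rough illustration of what matching in a shared vector space can look like, the sketch below pairs each source entity with its nearest target entity by cosine similarity and leaves it unmatched when no candidate is similar enough. It is not the method from the thesis: the toy embeddings, the entity names, the function names, and the 0.8 threshold are all assumptions made up for this example, and real embeddings would come from trained models such as the GCN-based and BERT-based modules described above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_entities(source, target, threshold=0.8):
    """Map each source entity to its most similar target entity.

    A source entity maps to None when no target reaches the threshold,
    which is one simple way to model an unlinkable case.
    The threshold value is an assumption chosen for this toy example.
    """
    matches = {}
    for s_name, s_vec in source.items():
        best_name, best_score = None, threshold
        for t_name, t_vec in target.items():
            score = cosine(s_vec, t_vec)
            if score >= best_score:
                best_name, best_score = t_name, score
        matches[s_name] = best_name
    return matches


# Hypothetical 3-dimensional embeddings; real ones would be learned.
source = {
    "en:Berlin": [0.9, 0.1, 0.0],
    "en:Toronto": [0.0, 0.2, 0.9],
    "en:Atlantis": [0.1, 0.9, 0.1],  # has no counterpart in the target KG
}
target = {
    "de:Berlin": [0.88, 0.12, 0.02],
    "de:Toronto": [0.05, 0.15, 0.92],
}

matches = match_entities(source, target)
for name, counterpart in matches.items():
    print(name, "->", counterpart)
```

In this toy setup, the two city entities pair with their German counterparts, while the third entity stays unmatched because none of its similarities reach the threshold.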


To join this master’s thesis presentation on MS Teams, please go to https://teams.microsoft.com/l/meetup-join/19%3ameeting_MmE1MDhlZDktNGExNy00NjdkLTlmZTMtNWMyYWQwZGUxNTlk%40thread.v2/0?context=%7b%22Tid%22%3a%22723a5a87-f39a-4a22-9247-3fc240c01396%22%2c%22Oid%22%3a%22278677d6-38dd-41ef-bb10-77a7826c65e3%22%7d.