Mina Farid, PhD candidate
David R. Cheriton School of Computer Science
One challenge facing most information extraction tools is the long tail of information. Entities in the long tail have too few mentions in the text, which limits their relevant context. Without enough repetition, property values cannot be extracted with high confidence.
In this talk, we present an approach to estimate property values of long tail entities. Our approach does not rely on directly extracting property values from the text. Instead, we simulate how humans draw on background knowledge to reach conclusions and extrapolate knowledge to unknown entities. For example, a knowledgeable reader might infer that a boxer in the middleweight division weighs approximately 165 pounds, even if this information is not explicitly mentioned in the text. By associating the unknown boxer entity with a relevant community of head entities, and using background knowledge about the weights of entities in that community, we produce a probability distribution over the value of the weight property for the unknown boxer. Our approach leverages the few features available in the text to infer other features that help in estimating the target property.
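To make the idea concrete, here is a minimal sketch, not the method presented in the talk. It rests on several assumptions. Entities are described by small sets of textual features. An entity is associated with the community whose feature profile it overlaps most, measured by Jaccard similarity. The estimated property is a normal distribution fitted to that community's known values. The names `COMMUNITIES`, `jaccard`, `associate`, and `estimate_property` are hypothetical, and every number is a toy value chosen for illustration.

```python
from __future__ import annotations

from statistics import NormalDist

# Hypothetical background knowledge: for each community of well-known ("head")
# entities, the textual features that characterize it and the known values of
# the target property (weight in pounds). All numbers are toy values.
COMMUNITIES = {
    "middleweight boxers": {
        "features": {"boxer", "middleweight", "knockout"},
        "values": [158.0, 160.0, 162.0, 165.0, 168.0],
    },
    "heavyweight boxers": {
        "features": {"boxer", "heavyweight", "knockout"},
        "values": [225.0, 240.0, 250.0, 260.0],
    },
}


def jaccard(a: set[str], b: set[str]) -> float:
    """Feature overlap in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def associate(entity_features: set[str]) -> str:
    """Pick the community whose feature profile best matches the entity."""
    return max(
        COMMUNITIES,
        key=lambda name: jaccard(entity_features, COMMUNITIES[name]["features"]),
    )


def estimate_property(entity_features: set[str]) -> tuple[str, NormalDist]:
    """Estimate the target property as a normal distribution fitted to the
    known values of the matched community (a simplifying assumption)."""
    community = associate(entity_features)
    return community, NormalDist.from_samples(COMMUNITIES[community]["values"])


if __name__ == "__main__":
    # A long tail entity with only two features extracted from the text.
    community, dist = estimate_property({"boxer", "middleweight"})
    print(community)  # middleweight boxers
    print(round(dist.mean, 1))  # 162.6
    print(dist.inv_cdf(0.05), dist.inv_cdf(0.95))  # rough 90% interval
```

Returning a distribution instead of a single number keeps the uncertainty of the estimate visible. A richer variant could weight several matching communities by similarity instead of committing to the single best match.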
To join this PhD seminar on Zoom, please go to https://zoom.us/j/98492300181?pwd=b1hlaC9ITTk3bTdYZVpvOEd5VnF4UT09.
Meeting ID: 984 9230 0181
200 University Avenue West
Waterloo, ON N2L 3G1