Please note: This master’s thesis presentation will take place in DC 3317.
Linh Nhi Phan Minh, Master’s candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Mark Smucker
Preference judging has been proposed as an effective method to identify the most relevant documents for a given search query. In this thesis, we investigate the degree to which assessors using a preference judging system are able to consistently find the same top documents and how consistent they are in their own preferences. We also examine to what extent variability in assessor preferences affects the evaluation of information retrieval systems. We designed and conducted a user study in which 40 participants were recruited to preference judge 30 topics taken from the 2021 TREC Health Misinformation track.
The study found that the number of judgments needed for preference judging a topic is about twice the number of documents in that topic. It also suggested that relying on just one non-professional assessor to do preference judging is not sufficient for evaluating information retrieval systems. Additionally, the study showed that preference judging to find the top-10 documents does significantly change the rankings of runs as compared to the rankings reported in the TREC 2021 Health Misinformation track, with most changes happening among the lower-ranked runs rather than the top-ranked runs.
Overall, this thesis provides insights into assessor behaviour and assessor agreement when using preference judgments for evaluating information retrieval systems.