Master’s Thesis Presentation • Data Systems • Test Collections for Web-scale Datasets Using Dynamic Sampling

Friday, December 10, 2021 10:00 am EST (GMT -05:00)

Please note: This master’s thesis presentation will be given online. Please also note the new date: Friday, December 10.

Anmol Singh, Master’s candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Gordon Cormack

Dynamic Sampling is a non-uniform sampling strategy used to construct statistical test collections for evaluating information retrieval systems. It has been shown to yield test collections comparable to or better than those built with pooling methods, at a fraction of the assessment effort.

We adapt a high-recall retrieval system to run a Dynamic Sampling protocol for web-scale datasets, and use this to create relevance assessments for 30 topics from the TREC 2019 Medical Misinformation Track. We compare our relevance assessments to qrels created using two pooling-based approaches. We also compare the official NIST qrels, which were based on ClueWeb12B (7% of the full dataset), to qrels based on the full ClueWeb12 dataset.


To join this master’s thesis presentation on Zoom, please go to https://uwaterloo.zoom.us/j/99934447820?pwd=ZkJwOGRSV2hpbThIajNPb01HUVRwQT09.

Please email Joe Petrik if you require the passcode for this presentation.