Please note: This PhD seminar will take place in DC 3301.
Dake Zhang, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Mark Smucker
Online news readers often need help assessing trustworthiness. Such help can take the form of investigative questions and evidence-grounded context. The TREC 2025 DRAGUN Track studies this assistive setting through two tasks: generating critical questions that a careful reader should ask about a news article, and generating a concise, well-attributed report that summarizes what the reader should know. In this seminar, I will briefly recap the DRAGUN task design and its rubric-based evaluation, in which TREC assessors conducted open-web research and created importance-weighted rubrics of investigative questions and expected short answers. These rubrics were used to evaluate how well system-generated questions and reports covered expert investigative priorities.
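To make the idea of an importance-weighted rubric concrete, here is a minimal Python sketch of one plausible coverage score: the share of total rubric weight carried by the items a system output addresses. This is an illustration only and not the official DRAGUN metric. The names `RubricItem` and `weighted_coverage` and the weight scale are assumptions, and the per-item coverage judgments are taken as given inputs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RubricItem:
    """One rubric entry (hypothetical structure, not the track's schema)."""
    question: str
    expected_answer: str
    weight: float  # assessor-assigned importance; the scale is assumed


def weighted_coverage(rubric: List[RubricItem], covered: List[bool]) -> float:
    """Fraction of total rubric weight carried by the covered items."""
    if len(rubric) != len(covered):
        raise ValueError("need exactly one coverage judgment per rubric item")
    total = sum(item.weight for item in rubric)
    if total == 0:
        return 0.0
    hit = sum(item.weight for item, ok in zip(rubric, covered) if ok)
    return hit / total
```

Under this scheme, missing a single high-weight question costs more than missing several low-weight ones, which is the behaviour an importance-weighted rubric is meant to capture.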
The main focus of the seminar will be the iterative multi-agent RAG system I developed for the DRAGUN Track. The system simulates a lateral reader by repeatedly generating investigative queries, retrieving and filtering evidence from the MS MARCO V2.1 Segmented Corpus, evaluating whether the collected information is sufficient, and then producing both final investigative questions and a 250-word attributed report. Its components include a query generator; a three-stage segment retriever combining BM25+RM3, neural reranking, and LLM-based selection; an information evaluator; a question generator; and a report generator. I will discuss the system architecture, implementation choices, official DRAGUN results, and lessons learned. In the official evaluation, the GPT-4.1 version of the system achieved the highest mean supportive score for report generation among 28 submitted runs from 8 teams while maintaining a low contradictory score. I will conclude with remaining challenges, especially aligning systems with expert fact-checking priorities, prioritizing evidence under a strict word budget, and coupling question generation more tightly with report generation.
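The control flow of such an iterative loop can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the system's actual implementation. Every name here (`Segment`, `retrieve_three_stage`, `investigate`, the `toy_*` helpers) is hypothetical, the toy functions are stand-ins for BM25+RM3, the neural reranker, and the LLM selector, and question and report generation are omitted.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    doc_id: str
    text: str


def retrieve_three_stage(query, first_stage, rerank, select,
                         k_first=100, k_rerank=20) -> List[Segment]:
    """Three-stage retrieval: recall, then rerank, then final selection."""
    candidates = first_stage(query, k_first)         # e.g. BM25+RM3
    reranked = rerank(query, candidates)[:k_rerank]  # e.g. a neural reranker
    return select(query, reranked)                   # e.g. LLM-based selection


def investigate(article: str,
                generate_queries: Callable,
                retrieve: Callable,
                is_sufficient: Callable,
                max_rounds: int = 3) -> List[Segment]:
    """Collect evidence until the evaluator is satisfied or rounds run out."""
    evidence: List[Segment] = []
    seen = set()
    for _ in range(max_rounds):
        for query in generate_queries(article, evidence):
            for seg in retrieve(query):
                if seg.doc_id not in seen:  # skip segments already collected
                    seen.add(seg.doc_id)
                    evidence.append(seg)
        if is_sufficient(article, evidence):
            break
    return evidence


# Toy stand-ins so the sketch runs without a search index or an LLM.
CORPUS = [
    Segment("d1", "city council budget vote"),
    Segment("d2", "mayor budget statement"),
    Segment("d3", "weather report"),
]


def toy_first_stage(query, k):
    terms = set(query.lower().split())
    return [s for s in CORPUS if terms & set(s.text.split())][:k]


def toy_rerank(query, segs):
    terms = set(query.lower().split())
    return sorted(segs, key=lambda s: -len(terms & set(s.text.split())))


def toy_select(query, segs):
    return segs[:1]  # keep only the top-ranked segment


def toy_retrieve(query):
    return retrieve_three_stage(query, toy_first_stage, toy_rerank, toy_select)


def toy_queries(article, evidence):
    # Ask a follow-up query once some evidence has been collected.
    return ["budget vote"] if not evidence else ["mayor statement"]


def toy_sufficient(article, evidence):
    return len(evidence) >= 2
```

Calling `investigate("some article", toy_queries, toy_retrieve, toy_sufficient)` runs two rounds on the toy corpus and stops once two distinct segments have been collected. In a real system the collected evidence would then be passed to the question and report generators.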