PhD Seminar • Machine Learning | Information Retrieval • Advancing Multilingual RAG Systems: Retrieval, Relevance, and Generation Evaluation

Wednesday, February 5, 2025 12:00 pm - 1:00 pm EST (GMT -05:00)

Please note: This PhD seminar will take place online.

Nandan Thakur, PhD candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Jimmy Lin

As Retrieval-Augmented Generation (RAG) systems gain prominence for grounding large language models (LLMs) in external knowledge, building evaluation frameworks is critical to accelerating progress across diverse languages.

This talk introduces a comprehensive multilingual RAG evaluation pipeline comprising three key components: retrieval, relevance assessment, and generation. Each component is covered by a dedicated resource: MIRACL, a multilingual retrieval dataset with high-quality relevance judgments annotated by native speakers; NoMIRACL, a benchmark for assessing relevance in multilingual RAG, designed to measure LLM robustness against retrieval errors; and MIRAGE-Bench, an arena-based multilingual RAG evaluation framework that combines heuristic metrics with surrogate judge models for multilingual generation evaluation. Together, these resources provide a foundation for advancing multilingual information access and enhancing the robustness of RAG systems. The talk highlights key findings from each component, along with open challenges and future work for multilingual RAG research.


Attend this PhD seminar online using Zoom.