PhD Seminar • Data Systems | Artificial Intelligence • How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense RetrievalExport this event to calendar

Friday, April 5, 2024 — 10:00 AM to 11:00 AM EDT

Please note: This PhD seminar will take place in DC 3317 and online.

Sheng-Chieh (Jack) Lin, PhD candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Jimmy Lin

Various techniques have been developed in recent years to improve dense retrieval (DR), such as unsupervised contrastive learning and pseudo-query generation. Existing DRs, however, often suffer from effectiveness tradeoffs between supervised and zero-shot retrieval, which some argue was due to the limited model capacity. We contradict this hypothesis and show that a generalizable DR can be trained to achieve high accuracy in both supervised and zero-shot retrieval without increasing model size. In particular, we systematically examine the contrastive learning of DRs, under the framework of Data Augmentation (DA).

Our study shows that common DA practices such as query augmentation with generative models and pseudo-relevance label creation using a cross-encoder, are often inefficient and sub-optimal. We hence propose a new DA approach with diverse queries and sources of supervision to progressively train a generalizable DR. As a result, DRAGON, our Dense Retriever trained with diverse AuGmentatiON, is the first BERT-base-sized DR to achieve state-of-the-art effectiveness in both supervised and zero-shot evaluations and even competes with models using more complex late interaction.

The paper is available at https://aclanthology.org/2023.findings-emnlp.423/.


To attend this PhD seminar in person, please go to DC 3317. You can also attend virtually using Zoom at https://uwaterloo.zoom.us/j/94259213051.

Location 
Hybrid: DC 3317 | Online PhD seminar
200 University Avenue West

Waterloo, ON N2L 3G1
Canada
Event tags 

S M T W T F S
31
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
1
2
3
4
  1. 2024 (121)
    1. May (5)
    2. April (39)
    3. March (27)
    4. February (25)
    5. January (25)
  2. 2023 (296)
    1. December (20)
    2. November (28)
    3. October (15)
    4. September (25)
    5. August (30)
    6. July (30)
    7. June (22)
    8. May (23)
    9. April (32)
    10. March (31)
    11. February (18)
    12. January (22)
  3. 2022 (245)
  4. 2021 (210)
  5. 2020 (217)
  6. 2019 (255)
  7. 2018 (217)
  8. 2017 (36)
  9. 2016 (21)
  10. 2015 (36)
  11. 2014 (33)
  12. 2013 (23)
  13. 2012 (4)
  14. 2011 (1)
  15. 2010 (1)
  16. 2009 (1)
  17. 2008 (1)