PhD Seminar • Machine Learning • MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets | Cheriton School of Computer Science

Wednesday, September 11, 2024 1:00 pm - 2:00 pm EDT (GMT -04:00)

Please note: This PhD seminar will take place online.

Hossein Aboutalebi, PhD candidate
David R. Cheriton School of Computer Science

Supervisors: Professors Justin Wan and Alex Wong

Development of multimodal interactive systems is hindered by the lack of rich multimodal (text and image) conversational data, which large language models (LLMs) need in large quantities. Previous approaches augment textual dialogues with retrieved images, which imposes privacy, diversity, and quality constraints. In this work, we introduce Multimodal Augmented Generative Images Dialogues (MAGID), a framework that augments text-only dialogues with diverse, high-quality images. Once the text to be illustrated has been identified, a diffusion model generates corresponding images, ensuring alignment with that text. Finally, MAGID incorporates an innovative feedback loop between an image description generation module (a textual LLM) and image quality modules (addressing aesthetics, image-text matching, and safety), which work in tandem to produce high-quality multimodal dialogues. We compare MAGID with other state-of-the-art (SOTA) baselines on three dialogue datasets using both automated and human evaluation. Our results show that MAGID performs comparably to or better than the baselines, with significant improvements in human evaluation, especially against retrieval baselines when the image database is small.
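The abstract describes a generate-and-check loop but does not specify interfaces, score thresholds, or how many feedback rounds are allowed. The following is a minimal sketch of such a loop under those unknowns, not MAGID's actual implementation. Every name here (`augment_utterance`, `QualityReport`, `min_aesthetic`, `min_match`, `max_rounds`, and the `fake_*` stubs) is hypothetical. The stubs stand in for the textual LLM, the diffusion model, and the aesthetic, image-text matching, and safety modules.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class QualityReport:
    """Hypothetical scores from the image quality modules."""
    aesthetic: float  # aesthetic score in [0, 1] (assumed scale)
    match: float      # image-text matching score in [0, 1] (assumed scale)
    safe: bool        # result of a safety check

    def failures(self, min_aesthetic: float, min_match: float) -> List[str]:
        """Return textual feedback for every check that did not pass."""
        problems = []
        if self.aesthetic < min_aesthetic:
            problems.append("low aesthetic score")
        if self.match < min_match:
            problems.append("weak image-text match")
        if not self.safe:
            problems.append("unsafe content")
        return problems


def augment_utterance(
    utterance: str,
    describe: Callable[[str, List[str]], str],        # stand-in for the LLM description module
    generate: Callable[[str], object],                # stand-in for the diffusion model
    assess: Callable[[object, str], QualityReport],   # stand-in for the quality modules
    min_aesthetic: float = 0.5,
    min_match: float = 0.5,
    max_rounds: int = 3,
) -> Optional[object]:
    """Regenerate an image until all quality checks pass or the round budget runs out."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        description = describe(utterance, feedback)  # feedback from the last round guides the LLM
        image = generate(description)
        report = assess(image, utterance)
        feedback = report.failures(min_aesthetic, min_match)
        if not feedback:
            return image
    return None  # no acceptable image; the utterance stays text-only


# Toy stubs so the loop runs end to end (purely illustrative).
def fake_describe(utterance: str, feedback: List[str]) -> str:
    base = f"photo illustrating: {utterance}"
    return base + " (refined)" if feedback else base


def fake_generate(description: str) -> dict:
    return {"prompt": description}  # a dict stands in for an image


def fake_assess(image: dict, utterance: str) -> QualityReport:
    refined = "(refined)" in image["prompt"]
    return QualityReport(aesthetic=0.8 if refined else 0.3, match=0.7, safe=True)


result = augment_utterance("I just adopted a puppy!", fake_describe, fake_generate, fake_assess)
print(result)
```

With these stubs, the first round fails the aesthetic check. The resulting feedback makes the description stub produce a refined prompt, which passes in the second round. Returning `None` after `max_rounds` is one possible design choice: discarding the image rather than keeping a low-quality one.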

Attend this PhD seminar on Microsoft Teams.

Location Information

Location Address: DC - William G. Davis Computer Research Centre
200 University Avenue West
Waterloo, ON, CA N2L 3G1
Online PhD seminar
