PhD Defence • Artificial Intelligence | Machine Learning • Towards Foundation Models for Text-Rich Multimodal Tabular Data

Tuesday, July 21, 2026 9:00 am - 12:00 pm EDT (GMT -04:00)

Please note: This PhD defence will take place in DC 2314 and online.

William Loh, PhD candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Pascal Poupart

Tabular data has long been a foundational format in statistics, yet many of its challenges remain underexplored in modern machine learning. This dissertation investigates practical approaches for advancing tabular models through improved input representation and information transfer. First, it addresses continual adaptation by formulating tabular learning as a multi-armed bandit problem, proposing a framework based on Nadaraya-Watson kernel regression and Thompson sampling that enables models to adapt at inference time while guiding future data collection. The framework is supported by finite-sample theoretical guarantees and empirical improvements on bandit and news recommendation benchmarks.
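
For readers unfamiliar with the two ingredients named above, the following is a minimal illustrative sketch of how Nadaraya-Watson kernel regression can be paired with Thompson-style sampling in a contextual bandit. It is not the dissertation's framework. The function names, the Gaussian kernel, the bandwidth, and the heuristic sampling variance (which shrinks as kernel weight accumulates near the query point) are all assumptions made for illustration.

```python
import numpy as np

def nw_estimate(x, X, y, bandwidth=1.0):
    """Nadaraya-Watson estimate of the reward at context x.

    Returns (kernel-weighted mean, total kernel weight). The Gaussian
    kernel is an assumption made for this sketch.
    """
    if len(X) == 0:
        return 0.0, 0.0
    sq_dist = np.sum((np.asarray(X) - x) ** 2, axis=1)
    weights = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    total = float(weights.sum())
    if total < 1e-12:
        return 0.0, 0.0
    return float(weights @ np.asarray(y) / total), total

def choose_arm(x, history, rng, bandwidth=1.0, noise_scale=1.0):
    """Thompson-style arm selection.

    history is a list with one (contexts, rewards) pair per arm.
    Each arm's plausible reward is drawn from a Gaussian centred on its
    Nadaraya-Watson estimate. The standard deviation is a heuristic
    (assumed here, not derived): it shrinks as nearby data accumulates.
    """
    samples = []
    for X, y in history:
        mean, weight = nw_estimate(x, X, y, bandwidth)
        std = noise_scale / np.sqrt(1.0 + weight)
        samples.append(rng.normal(mean, std))
    return int(np.argmax(samples))

if __name__ == "__main__":
    # Toy problem (hypothetical): arm 0 pays x, arm 1 pays 1 - x, plus noise.
    rng = np.random.default_rng(0)
    history = [([], []), ([], [])]
    for _ in range(200):
        x = rng.uniform(0.0, 1.0, size=1)
        arm = choose_arm(x, history, rng, bandwidth=0.2)
        reward = (x[0] if arm == 0 else 1.0 - x[0]) + rng.normal(0.0, 0.1)
        history[arm][0].append(x)
        history[arm][1].append(reward)
    print([len(rewards) for _, rewards in history])
```

Because arms with little data near the current context receive wide sampling distributions, they are occasionally selected, which is how this kind of scheme both adapts at inference time and steers which data gets collected next.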

Second, the dissertation examines the limitations of existing tabular representations and introduces Basis Transformers, a transformer-based architecture designed specifically for the heterogeneous and context-rich nature of tabular data. The model demonstrates strong performance across multi-task and related-task regression settings, outperforming gradient-boosted decision trees, fine-tuned large language models, and comparable deep tabular models.

Finally, the dissertation explores the development of a multimodal tabular foundation model that incorporates column names and diverse data types to produce holistic representations. Experiments show promising zero-shot inference capabilities and improved performance on text-rich tabular datasets, contributing toward more scalable and generalizable tabular learning systems.


To attend this PhD defence in person, please go to DC 2314. You can also attend virtually on Zoom.