PhD Defence • Artificial Intelligence | Machine Learning • Efficient Inference-time Control and Alignment

Monday, April 6, 2026 10:30 am - 1:30 pm EDT (GMT -04:00)

Please note: This PhD defence will take place in DC 2310 and online.

Ahmad Rashid, PhD candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Pascal Poupart

Modern foundation models are typically trained in three broad stages. First, large-scale pre-training is performed using self-supervised learning on massive corpora. Second, models are adapted through mid-training using supervised fine-tuning or instruction tuning on labeled datasets. Finally, a post-training stage is often applied using preference data and reinforcement learning to align the model and improve its safety, reliability, and usefulness.

Although effective, post-training methods can be computationally expensive and inflexible once large models are deployed. This thesis explores an alternative paradigm: enforcing behavioral objectives at inference time rather than modifying model parameters during post-training. In this approach, smaller modular control models are combined with a base model to shape predictions during the decision process. Our aim is to design alignment mechanisms that are both mathematically grounded and empirically strong while remaining computationally efficient and easy to deploy.

We apply this perspective of inference-time control to three problems. First, we address reliability in neural classifiers. We introduce PreLoad, an inference-time mechanism that mitigates pathological overconfidence on inputs that lie outside the training support. PreLoad provably prevents arbitrarily confident predictions while preserving accuracy and efficiency.

Second, we study reward-guided text generation (RGTG) in large language models as a form of inference-time alignment. We show that stable reward-guided decoding requires carefully designed token-level reward models and propose two algorithms, PARGS and FaRMA, that enable effective reward-guided generation.
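The general recipe behind reward-guided decoding can be sketched in a few lines: at each step, candidate tokens are scored by combining the base model's log-probability with a weighted token-level reward. The sketch below is illustrative only; the toy vocabulary, scores, and function names are assumptions, and it does not reproduce the PARGS or FaRMA algorithms themselves.

```python
# Toy vocabulary and a stand-in "base model": fixed next-token
# log-probabilities. In practice these come from a large language model.
VOCAB = ["good", "bad", "ok", "<eos>"]

def base_logprobs(prefix):
    """Hypothetical base model with a mild preference for 'bad'."""
    return {"good": -1.5, "bad": -1.0, "ok": -1.6, "<eos>": -2.0}

def token_reward(prefix, token):
    """Hypothetical token-level reward model favouring 'good'."""
    return {"good": 1.0, "bad": -1.0, "ok": 0.2, "<eos>": 0.0}[token]

def reward_guided_step(prefix, beta=1.0):
    """Pick the token maximising log p_base + beta * reward."""
    logp = base_logprobs(prefix)
    return max(VOCAB, key=lambda t: logp[t] + beta * token_reward(prefix, t))

def decode(max_len=5, beta=1.0):
    """Greedy reward-guided decoding until <eos> or max_len."""
    prefix = []
    for _ in range(max_len):
        tok = reward_guided_step(prefix, beta)
        if tok == "<eos>":
            break
        prefix.append(tok)
    return prefix
```

With beta set to 0 this reduces to ordinary greedy decoding from the base model; increasing beta shifts generation toward high-reward tokens, which is the trade-off the thesis studies.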

Third, we address the computational cost of RGTG and propose an efficient algorithm that adds only a minor overhead during inference while preserving the performance and benefits of reward-guided decoding.

Together, these results demonstrate that inference-time control provides a flexible and computationally efficient framework for shaping the behavior of modern neural systems. By decoupling representation learning from decision-time objectives, this work introduces new tools for improving the reliability, alignment, and efficiency of large-scale machine learning models without retraining them.


To attend this PhD defence in person, please go to DC 2310. You can also attend virtually on MS Teams.