Please note: This master’s thesis presentation will take place in DC 2310 and online.
Matina Mahdizadeh Sani, Master’s candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Olga Veksler
Semantic segmentation is one of the core tasks in computer vision, with applications ranging from autonomous driving to medical image analysis. Recent advances have produced increasingly powerful segmentation models built on both convolutional and transformer-based architectures. Despite these advances, segmentation performance depends not only on model architecture but also on a range of design choices made during training and inference. In practice, different components are often adopted from prior work without systematic analysis. As a result, the contribution of individual design choices to final segmentation performance is not sufficiently understood. This thesis presents a systematic experimental analysis of key components in modern semantic segmentation frameworks, focusing on loss functions, inference strategies, and matching techniques.
Motivated by the observation that transformer-based segmentation models often rely on inference strategies that differ from the classic approaches commonly used in CNN-based architectures, this thesis first investigates the effect of transferring inference strategies across model families. Specifically, classic inference methods that are well established for CNNs are applied to a transformer-based model, while inference strategies originally developed for transformers are evaluated within a CNN-based framework. This analysis aims to determine whether inference strategies are inherently architecture-specific or can be interchanged across model classes.
Following the same cross-architecture perspective, the thesis further explores matching strategies by replacing bipartite matching, commonly used in transformer-based models, with fixed matching inspired by CNN-based formulations, and also by applying bipartite matching to a CNN-based model.
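To illustrate the distinction between the two matching schemes, here is a minimal toy sketch (the cost matrix, values, and function names are illustrative assumptions, not taken from the thesis): bipartite matching finds the one-to-one assignment of predicted queries to ground-truth segments that minimizes the total matching cost, while fixed matching simply pairs query i with segment i.

```python
from itertools import permutations

# Hypothetical toy cost matrix: cost[q][g] is the matching cost
# (e.g. 1 - mask IoU) between predicted query q and ground-truth
# segment g; lower is better. Values are made up for illustration.
cost = [
    [0.9, 0.1, 0.8],
    [0.2, 0.7, 0.9],
    [0.8, 0.9, 0.3],
]

def bipartite_match(cost):
    """Optimal one-to-one assignment by brute force (fine for tiny n;
    real frameworks use the Hungarian algorithm)."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[q][p[q]] for q in range(n)))
    return [(q, g) for q, g in enumerate(best)]

def fixed_match(cost):
    """Fixed matching: query i always supervises segment i,
    regardless of cost (the CNN-style formulation)."""
    return [(i, i) for i in range(len(cost))]

print(bipartite_match(cost))  # [(0, 1), (1, 0), (2, 2)]
print(fixed_match(cost))      # [(0, 0), (1, 1), (2, 2)]
```

In this toy example the two schemes disagree on queries 0 and 1: bipartite matching reassigns them to cheaper ground-truth segments, whereas fixed matching keeps the predetermined pairing, which is the trade-off the experiments above compare.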
Building on these analyses, the thesis systematically studies the role of loss functions by evaluating a wide range of loss formulations, both in isolation and in combination. This includes examining how different losses interact with inference strategies and how their effectiveness varies across architectures. These experiments disentangle the individual and combined effects of loss design choices on segmentation performance.
The study evaluates both a CNN-based and a transformer-based model, using DeepLabV3+ and Mask2Former as representative frameworks, and conducts experiments on the Cityscapes and Pascal VOC 2012 datasets under controlled settings. The results show that inference strategies exhibit architecture-dependent behavior: classic inference generally performs better for CNN-based models, while semantic inference is more effective for transformer-based architectures. The experiments further demonstrate that fixed matching can consistently outperform bipartite matching. Additionally, the effect of class-level supervision is found to be configuration-dependent, providing benefits in some settings but offering limited or inconsistent gains in others.
By examining inference, loss, and matching components through systematic cross-architecture transfer, this thesis clarifies which segmentation design choices are architecture-specific and which can be effectively borrowed across model families. The findings provide practical guidance for reusing well-performing components from one architecture in another, enabling effective semantic segmentation systems.