Please note: This master’s thesis presentation will take place in DC 2314 and online.
Nima Jamali, Master’s candidate
David R. Cheriton School of Computer Science
Supervisors: Professors Olga Veksler, Yuri Boykov
Despite notable advances in network architectures and representation learning, most semantic segmentation pipelines continue to rely on hard ground-truth labels and evaluation metrics originally designed for binary masks. This assumption is misaligned with real-world data, where object boundaries are often ambiguous, annotations are noisy, spatial downsampling aggregates multiple semantic classes, and uncertainty is frequently encoded through void labels that are ignored during training and evaluation. As a result, neither the learning objectives nor the evaluation criteria faithfully reflect the underlying semantic structure and uncertainty of the data.
In this thesis, we study semantic segmentation in settings where the ground-truth labels are not necessarily hard, but instead may be uncertain. We refer to this formulation as soft-label semantic segmentation. We treat this problem in a unified, end-to-end manner encompassing label generation, training, and evaluation.
To generate soft labels, we propose a geometry-aware downsampling strategy called Weighted Average Pooling (WAP) for semantic segmentation masks. WAP produces smooth and probabilistically valid soft labels at arbitrary resolutions by constructing spatially varying weights based on geometric relationships and spatial proximity. As a result, the generated soft labels are resolution-agnostic, preserve the underlying probabilistic structure of the annotations, and avoid artifacts commonly introduced by conventional downsampling methods.
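For readers unfamiliar with soft labels, the sketch below shows how downsampling a hard mask can yield a class distribution per output pixel. It uses plain uniform average pooling over non-overlapping blocks, which is a conventional baseline and not the WAP method; the abstract does not specify WAP's weights. The function name `average_pool_soft_labels` and the toy mask are assumptions made for illustration only.

```python
def average_pool_soft_labels(mask, num_classes, factor):
    """Downsample a hard label mask into per-class soft labels.

    Illustrative baseline only (uniform weights), not the thesis's WAP.
    mask: list of rows of integer class ids; height and width must be
    divisible by factor.
    Returns a grid of size (H/factor) x (W/factor); each cell is a list of
    num_classes probabilities that sum to 1.
    """
    height, width = len(mask), len(mask[0])
    if height % factor or width % factor:
        raise ValueError("mask size must be divisible by factor")
    total = factor * factor
    soft = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            # Count how many pixels of each class fall in this block.
            counts = [0] * num_classes
            for y in range(top, top + factor):
                for x in range(left, left + factor):
                    counts[mask[y][x]] += 1
            # Normalize the counts into a probability distribution.
            row.append([c / total for c in counts])
        soft.append(row)
    return soft


# Toy 4x4 mask with three classes, downsampled by a factor of 2.
mask = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [2, 2, 1, 1],
    [2, 2, 2, 1],
]
soft = average_pool_soft_labels(mask, num_classes=3, factor=2)
```

In this toy case the top-left output pixel mixes classes 0 and 1 rather than being forced to a single class, which is the kind of information that hard downsampling discards.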
In addition, this work motivates the need for evaluation metrics that operate directly on probabilistic segmentation outputs. To this end, we introduce several principled relaxations of the soft intersection-over-union (soft IoU) metric that provide faithful extensions of standard IoU to soft-label settings. We further introduce void replacement strategies that assign soft class distributions to void pixels based on spatial context, enabling uncertain regions such as ambiguous boundaries and thin structures to be incorporated directly into the supervision signal.
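As a point of reference, one widely used way to relax IoU to probabilistic inputs replaces set intersection and union with pixel-wise minimum and maximum. The sketch below illustrates that generic relaxation for a single class; it is not necessarily one of the relaxations introduced in the thesis, and the function name `soft_iou` is an assumption.

```python
def soft_iou(pred, target):
    """Min/max relaxation of IoU for a single class.

    Illustrative only; the thesis's own relaxations may differ.
    pred, target: flat lists of per-pixel probabilities in [0, 1].
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Pixel-wise minimum plays the role of intersection, maximum of union.
    intersection = sum(min(p, t) for p, t in zip(pred, target))
    union = sum(max(p, t) for p, t in zip(pred, target))
    # Convention: two empty masks agree perfectly.
    return 1.0 if union == 0 else intersection / union


# With binary inputs the relaxation reduces to standard IoU.
hard_score = soft_iou([1, 0, 1, 0], [1, 1, 0, 0])

# With soft inputs it measures overlap between probability maps.
soft_score = soft_iou([0.75, 0.25], [0.25, 0.25])
```

On binary masks this formulation coincides with standard IoU, which is the minimal property any extension to soft labels would be expected to satisfy.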
Extensive experiments on the PASCAL VOC 2012 dataset demonstrate that WAP produces more faithful soft labels than conventional approaches, particularly in scenarios involving thin structures and complex spatial arrangements. The proposed soft IoU relaxations offer improved interpretability and better alignment with hard-label evaluation, while the void replacement strategies perform comparably to hard-label baselines, indicating that incorporating soft supervision in uncertain regions does not compromise segmentation quality. Together, these contributions establish a principled framework for generating, training, and evaluating soft labels in semantic segmentation.
To attend this master’s thesis presentation in person, please go to DC 2314. You can also attend virtually on Zoom.