Please note: This master’s thesis presentation will be given online.
Rishav Agarwal, Master’s candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Lukasz Golab
Object detection is a popular task in computer vision with applications ranging from pedestrian detection to face detection. Following the success of Convolutional Neural Networks (CNNs), many CNN-based object detectors have been proposed. Early CNN-based detectors suggested using deeper networks to detect objects in images. However, deeper networks cannot capture objects of varied sizes and aspect ratios with high accuracy. Thus, CNN-based detectors face two main challenges: scale invariance (detecting objects at multiple scales) and aspect-ratio invariance (detecting objects at various aspect ratios).
Modern CNN-based object detectors have two main components: a backbone network that learns features from an image, and an output network that uses these features to make predictions. Scale and aspect-ratio invariance are typically added by changing either the backbone or the output network. Adding scale awareness to the output network is often computationally expensive. A popular alternative is to change the backbone instead, using Feature Pyramid Networks (FPNs). FPNs create a hierarchy of features at different scales and implicitly capture objects at various resolutions.
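To make the idea of a feature hierarchy concrete, here is a minimal sketch that only computes the spatial size of each level in a pyramid. The function name pyramid_shapes and the default strides (8 through 128) are assumptions chosen for illustration; they are not taken from the thesis, and no actual features are computed.

```python
def pyramid_shapes(height, width, strides=(8, 16, 32, 64, 128)):
    """Return the (height, width) of each feature map in a hypothetical pyramid.

    Each level downsamples the input image by its stride, so coarser levels
    cover larger objects with fewer spatial positions. The strides are an
    illustrative assumption, not values from the thesis.
    """
    shapes = []
    for stride in strides:
        # Ceiling division, so partial cells at the image border are kept.
        level_height = -(-height // stride)
        level_width = -(-width // stride)
        shapes.append((level_height, level_width))
    return shapes


if __name__ == "__main__":
    for stride, shape in zip((8, 16, 32, 64, 128), pyramid_shapes(512, 512)):
        print(f"stride {stride:>3}: feature map {shape}")
```

Note that every level shrinks height and width by the same factor, so the grid cells stay square at every scale.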
However, FPNs have a square bias: they favour square objects over asymmetric ones. One way to alleviate this bias is to add template anchor boxes of various sizes, which shifts the bias towards non-square objects. However, anchor boxes are set as hyperparameters and add computational overhead to the network. Newer architectures have thus moved towards anchor-free techniques, but they still rely on FPNs, which are square-biased. Recently, MatrixNets has been proposed as a general-purpose, aspect-ratio-aware extension of FPNs that can explicitly model aspect ratios better than anchor boxes while keeping the model anchor-free. While MatrixNets has been shown to improve keypoint-based object detectors significantly, the existing implementation makes significant changes to the architecture, making it difficult to isolate the impact of MatrixNets alone.
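As an illustration of why anchor boxes act as hyperparameters, the following sketch enumerates template box shapes from a base size, a set of scales, and a set of aspect ratios. The function name anchor_shapes, the default values, and the convention that a ratio means height divided by width are all assumptions made for this example, not details from the thesis.

```python
import math


def anchor_shapes(base_size, scales=(1.0, 2.0), aspect_ratios=(0.5, 1.0, 2.0)):
    """Return (width, height) templates for hypothetical anchor boxes.

    Assumed convention: aspect ratio = height / width, and each template
    keeps the area of the square with side base_size * scale.
    """
    shapes = []
    for scale in scales:
        side = base_size * scale
        for ratio in aspect_ratios:
            # Dividing and multiplying by sqrt(ratio) preserves the area.
            width = side / math.sqrt(ratio)
            height = side * math.sqrt(ratio)
            shapes.append((width, height))
    return shapes


if __name__ == "__main__":
    for width, height in anchor_shapes(32):
        print(f"{width:7.2f} x {height:7.2f}")
```

Every added scale or ratio multiplies the number of templates evaluated at each position, which is one source of the overhead mentioned above.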
In this thesis, we explore MatrixNets as a viable method to add aspect-ratio awareness. Specifically, we study MatrixNets along three axes: 1) does MatrixNets make anchor-based detectors anchor-free, 2) does MatrixNets add aspect-ratio awareness to object detectors, and 3) can MatrixNets be used for other, more complicated computer vision tasks such as instance segmentation? We explore these questions via three case studies. First, we demonstrate the effectiveness of MatrixNets by replacing the anchor boxes in RetinaNet with our MatrixNets module, showing better performance on skewed boxes while making the detector anchor-free. Then, we extend the anchor-free CornerNet to x-CornerNet to support multiple output heads and smaller backbones. We apply MatrixNets to x-CornerNet and demonstrate a similar improvement on skewed boxes, leading to an overall 5.6% mAP improvement on MS COCO and achieving competitive results. Finally, we add MatrixNets to Mask R-CNN to tackle the instance segmentation task. We propose a new loss function, Mask Edge Loss (MEL), that leverages mask contours to reduce coarseness in predicted masks, thereby achieving higher accuracy. Together, these three case studies demonstrate the effectiveness of MatrixNets for adding aspect-ratio awareness to object detectors. The codebase for our implementation will be made public.
To join this master’s thesis presentation on Zoom, please go to https://zoom.us/j/91941954965?pwd=bmROOWkrZGlzNjltSU02S0FEem9wZz09.
200 University Avenue West
Waterloo, ON N2L 3G1