Michael Cormier

PhD student at the University of Waterloo

Research Interests

Broadly speaking, my research is in the area of computer vision, usually including a machine learning component. I particularly enjoy working with "unusual" image classes, outside of the traditional "conventional camera viewing a scene composed of discrete, opaque objects" scenario.

My current research focus is on the use of computer vision to analyse the structure of web pages, with the aim of supporting assistive technology solutions such as screen reader programs. The ultimate goal of this research is to create a system capable of parsing the structure of the content of a page at a high level using the same visual information available to users (i.e., using an image of the rendered page rather than the page source code). I take this approach rather than the more typical analysis of the page source code for several reasons:

Additionally, web pages are an interesting class of image to analyse from a computer vision perspective. A rendered web page is a designed image, but one designed to convey information to human users rather than to a computer vision system. Thus, they form a restricted but nontrivial domain, intermediate between toy problems and natural scenes, in which to study computer vision methods.

I have also worked with images formed by projection (in the sense of integration of density along a line of sight), such as telescopic imagery of galaxies or X-ray images, and the reconstruction of density functions from small numbers of projections under assumptions about density function structure.

Reviewed Publications

Other Publications


In Progress: PhD (University of Waterloo)

Title: Visual Document Understanding for Assistive Technology (tentative title)
Supervisors: Prof. Robin Cohen and Prof. Richard Mann
I am presently studying the application of computer vision techniques to the problem of understanding the structure of documents. This research has led to a paper describing the use of computer vision to interpret the high-level semantic structure of web pages; the primary objective is to improve the screen reader programs that visually impaired users need to use the Internet. Future possibilities for research include empirical studies of our framework for understanding web page organization, generalization to other types of document, and methods of presenting complex structures to visually impaired users.

2013: MMath (University of Waterloo)

Title: 3-D Reconstruction from Single Projections, with Applications to Astronomical Images
Supervisors: Prof. Daniel J. Lizotte and Prof. Richard Mann
In my MMath thesis, I developed a framework for the reconstruction of three-dimensional data from images formed by projection (i.e., by integration along a line of sight of some value at each point in a volume). This framework was designed for the reconstruction of the distribution of light-emitting matter (stars, for the most part) in a galaxy from a single projected image formed by integration of luminosity along the line of sight from each pixel.
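As a minimal illustration of what "formed by projection" means here, the sketch below sums a small voxel volume along one axis to produce a 2-D image. The volume, its size, and the function name `project` are made up for illustration and are not taken from the thesis.

```python
import numpy as np

def project(volume, axis=2):
    """Integrate (sum) voxel values along one axis to form a 2-D image."""
    return volume.sum(axis=axis)

# Hypothetical 4x4x4 volume of emitted light (illustrative values only).
volume = np.zeros((4, 4, 4))
volume[1, 2, :] = 1.0   # a column of emitting voxels along the line of sight
volume[3, 0, 1] = 2.5   # one isolated bright voxel

image = project(volume)
print(image.shape)   # (4, 4)
print(image[1, 2])   # 4.0: four unit voxels lie behind this pixel
print(image[3, 0])   # 2.5
```

Each pixel records only the total along its line of sight, so the depth at which the light was emitted is lost in the image.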

For simplicity, assume that the image is square, and the volume of the galaxy is divided into voxels (“volume pixels”) in a cube with a side length equal to the side length of the image. Since each pixel provides only one linear constraint, an image with N pixels per side yields N² constraints on N³ voxel values, so the voxel values are underdetermined. By making certain physically reasonable assumptions about structural properties such as symmetry, however, additional constraints can be found that allow the system to be solved, thus reconstructing the original distribution insofar as it is consistent with the structural assumptions made. Furthermore, the projected image of the reconstructed distribution shows which aspects of the original image can be explained by the structure assumed. This allows the isolation of structures which are not consistent with the assumptions made.
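The counting argument and the effect of a structural assumption can be sketched in a few lines. The example below is a toy under stated assumptions, not the method of the thesis: it assumes the density depends only on the rounded distance from the centre of the cube (a crude form of spherical symmetry), and the names `A`, `B`, `shells`, and `true_profile` are illustrative.

```python
import numpy as np

N = 6
# Projection along z written as a matrix: each of the N*N pixels sums N voxels.
A = np.kron(np.eye(N * N), np.ones((1, N)))       # shape (N^2, N^3)
print(A.shape, np.linalg.matrix_rank(A))          # (36, 216) 36 -> underdetermined

# Assumed structure: density depends only on the rounded distance to the centre.
coords = np.arange(N) - (N - 1) / 2
gx, gy, gz = np.meshgrid(coords, coords, coords, indexing="ij")
radius = np.rint(np.sqrt(gx**2 + gy**2 + gz**2)).astype(int).ravel()
shells, labels = np.unique(radius, return_inverse=True)
B = np.eye(len(shells))[labels]                   # shape (N^3, K): shell value -> voxels

# Synthetic ground truth that satisfies the assumption (illustrative profile).
true_profile = 10.0 / (1.0 + shells)
image = A @ (B @ true_profile)

# With one unknown per shell, the system A @ B has far more equations than unknowns.
profile, *_ = np.linalg.lstsq(A @ B, image, rcond=None)
volume = (B @ profile).reshape(N, N, N)

# Residual image: whatever the assumed structure cannot explain.
residual = image - A @ volume.ravel()
```

In this toy case the image was generated from a volume that satisfies the assumption, so the residual is zero up to rounding. For an image that also contains structure violating the assumption, the residual is where that structure would show up.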

A wide range of structural assumptions can be expressed easily in this framework using a combination of reparametrization of the reconstruction problem and the addition of regularization terms. Since the framework uses three-dimensional reconstruction, the structural constraints are also three-dimensional. Three-dimensional constraints better reflect physical reality than constraints on the two-dimensional image, and are often both simpler and more flexible.
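One common way to combine a reparametrization with a regularization term is a stacked least-squares problem. The sketch below is a hedged illustration only: the shell-based reparametrization, the first-difference smoothness penalty `D`, the weight `lam`, and the helper `solve_regularized` are all assumptions chosen for this example and are not taken from the thesis.

```python
import numpy as np

def solve_regularized(M, y, D, lam):
    """Minimise ||M c - y||^2 + lam * ||D c||^2 as one stacked least-squares problem."""
    stacked_M = np.vstack([M, np.sqrt(lam) * D])
    stacked_y = np.concatenate([y, np.zeros(D.shape[0])])
    c, *_ = np.linalg.lstsq(stacked_M, stacked_y, rcond=None)
    return c

N = 6
A = np.kron(np.eye(N * N), np.ones((1, N)))       # projection along z

# Reparametrization (assumed): one unknown per rounded-radius shell.
coords = np.arange(N) - (N - 1) / 2
gx, gy, gz = np.meshgrid(coords, coords, coords, indexing="ij")
radius = np.rint(np.sqrt(gx**2 + gy**2 + gz**2)).astype(int).ravel()
shells, labels = np.unique(radius, return_inverse=True)
B = np.eye(len(shells))[labels]
M = A @ B                                         # maps shell values to pixels

# Regularization (assumed): penalise differences between neighbouring shells.
D = np.diff(np.eye(len(shells)), axis=0)

def roughness(c):
    """Value of the penalty term ||D c||^2 for a shell profile c."""
    return float(np.sum((D @ c) ** 2))

# Synthetic noisy image generated from an illustrative radial profile.
rng = np.random.default_rng(0)
true_profile = 10.0 / (1.0 + shells)
noisy_image = M @ true_profile + rng.normal(scale=0.5, size=M.shape[0])

plain = solve_regularized(M, noisy_image, D, lam=0.0)
smooth = solve_regularized(M, noisy_image, D, lam=100.0)
print(roughness(plain), roughness(smooth))
```

Both the basis `B` and the penalty `D` are defined on the three-dimensional shells rather than on image pixels, which is the sense in which the constraints are three-dimensional. Increasing `lam` never increases the penalty term, so the second printed value is no larger than the first.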

2011: BSc with First-Class Honours (St. Francis Xavier University)

Title: Strong Image Segmentation using Learned Regions and Spatial Relationships
Supervisor: Prof. Iker Gondra
In my B.Sc. Honours thesis, I developed an algorithm to segment an image so as to isolate an object of interest (OOI), which may consist of many distinct regions, each internally homogeneous with respect to low-level image features. The algorithm uses multiple instance learning to find prototypical representations of each region and a naive Bayesian classifier to determine whether a given block of pixels in the image is part of the OOI or part of the background. The features considered include spatial relationships between regions and the color and texture of the block. The results of this thesis showed that these techniques can be used to learn characteristics of the OOI that are useful in segmentation.
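The block-classification step can be illustrated with a small naive Bayes classifier. This is a sketch under several assumptions rather than the thesis algorithm: it uses Gaussian likelihoods, two made-up features per block (mean intensity and a texture measure), hand-written training values, and it omits the multiple instance learning and spatial-relationship features entirely.

```python
import numpy as np

class GaussianNaiveBayes:
    """Naive Bayes with an independent Gaussian per feature and class."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small constant keeps variances strictly positive.
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-6
        self.log_prior_ = np.log(np.array([(y == c).mean() for c in self.classes_]))
        return self

    def predict(self, X):
        # log p(c) + sum_j log N(x_j | mean_cj, var_cj), evaluated for every class c.
        diff = X[:, None, :] - self.mean_[None, :, :]
        log_lik = -0.5 * (np.log(2 * np.pi * self.var_)[None] + diff**2 / self.var_[None])
        scores = log_lik.sum(axis=2) + self.log_prior_
        return self.classes_[np.argmax(scores, axis=1)]

# Hypothetical block features: [mean intensity, texture measure].
# Label 1 = block belongs to the OOI, label 0 = background.
X_train = np.array([[0.80, 0.30], [0.90, 0.25], [0.85, 0.35],
                    [0.20, 0.05], [0.10, 0.02], [0.15, 0.04]])
y_train = np.array([1, 1, 1, 0, 0, 0])

model = GaussianNaiveBayes().fit(X_train, y_train)
predictions = model.predict(np.array([[0.82, 0.28], [0.12, 0.03]]))
print(predictions)   # [1 0]
```

The "naive" part is the assumption that features are independent given the class, which is what lets the per-feature log-likelihoods simply be summed.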