Michael Cormier

PhD student at the University of Waterloo

Research Interests

Broadly speaking, my research is in the area of computer vision, usually including a machine learning component. I particularly enjoy working with "unusual" image classes, outside of the traditional "conventional camera viewing a scene composed of discrete, opaque objects" scenario.

My current research focus is on the use of computer vision to analyse the structure of web pages, with the aim of supporting assistive technology solutions such as screen reader programs. The ultimate goal of this research is to create a system capable of parsing the structure of the content of a page at a high level using the same visual information available to users (i.e. using an image of the rendered page rather than the page source code). I take this approach rather than the more typical analysis of the page source code for several reasons:

Web pages often include dynamic content, images, Flash objects, and other objects that are difficult to analyse at a source code level
The visual layout of a web page is designed to clearly convey its semantic structure to the users, while the source code is simply designed to cause the browser's rendering engine to produce a page with the visual layout specified by the designer
While web frameworks and associated technologies will inevitably shift over time, it is reasonable to assume that visual layout cues will remain broadly consistent to avoid confusing users

Additionally, web pages are an interesting class of image to analyse from a computer vision perspective. A rendered web page is a designed image, but one designed to convey information to human users rather than to a computer vision system. Thus, they form a restricted but nontrivial domain, intermediate between toy problems and natural scenes, in which to study computer vision methods.

I have also worked with images formed by projection (in the sense of integration of density along a line of sight), such as telescopic imagery of galaxies or X-ray images, and the reconstruction of density functions from small numbers of projections under assumptions about density function structure.

Reviewed Publications

Cormier, M., Mann, R., Moffatt, K., and Cohen, R. (2017). "Towards an Improved Vision-based Web Page Segmentation Algorithm". Computer and Robot Vision (CRV) 2017. (conference; refereed; paper with poster presentation)
Cormier, M., Mann, R., Cohen, R., and Moffatt, K. (2016). "Classification via Hidden Markov Trees for a Vision-Based Approach to Conveying Webpages to Users with Assistive Needs". 2016 IEEE/WIC/ACM International Conference on Web Intelligence. (conference; refereed; paper with oral presentation)
Cormier, M. (2016). "Computer Vision-based Analysis of Web Page Structure for Assistive Interfaces". Proceedings of the 13th Web for All Conference, 24:1-24:2. (conference; selected by organizers as one of only six students for the Doctoral Consortium, and funded by Google; paper with oral presentation)
Cormier, M., Moffatt, K., Cohen, R., and Mann, R. (2016). "Purely vision-based segmentation of web pages for assistive technology". Computer Vision and Image Understanding (Special Issue on Assistive Computer Vision and Robotics), 148, 46-66. (journal; refereed)
Cormier, M., Lizotte, D. J., and Mann, R. (2015). "Reconstruction of 3-D Density Functions from Few Projections: Structural Assumptions for Graceful Degradation". In CRV 2015. (conference; refereed; oral presentation)
Cormier, M., Cohen, R., Mann, R., Rahim, K., and Wang, D. (2014). "A Robust Vision-Based Framework for Screen Readers". Second Workshop on Assistive Computer Vision and Robotics (ACVR 2014) at ECCV 2014. (Workshop; refereed; poster presentation)
Cohen, R., Lam, D. Y., Agarwal, N., Cormier, M., Jagdev, J., Jin, T., Kukreti, M., Liu, J., Rahim, K., Rawat, R., Sun, W., Wang, D., and Wexler, M. (2014). "Using computer technology to address the problem of cyberbullying". ACM Special Interest Group on Computers and Society (SIGCAS) Newsletter, pp. 52-61. (newsletter; reviewed by editors)
Gondra, I., Xu, T., Chiu, D. K. Y., and Cormier, M. (2014). "Object Segmentation Through Multiple Instance Learning". International Conference on Image and Signal Processing (ICISP 2014). (Conference; refereed)
Cormier, M. and Gondra, I. (2011). "Supervised Object Segmentation using Visual and Spatial Features". Proceedings of the 2011 International Conference on Image Processing, Computer Vision, & Pattern Recognition. Vol. 2, pp. 557-563 (Conference; refereed; oral presentation)
Hatchard, T. D., George, A. E., Farrell, S. P., Steinitz, M. O., Adams, C. P., Cormier, M., & Dunlap, R. A. (2010). "Production and characterization of <100> textured magnetostrictive Fe-Ga rods". Journal of Alloys and Compounds. 494(1-2) 420-425.

Other Publications

Cormier, M., Lizotte, D. J., and Mann, R. (2014). "3-D Reconstruction from Few Projections: Structural Assumptions for Graceful Degradation". University of Waterloo Cheriton School of Computer Science Technical Report CS-2014-07. (technical report; not refereed; expanded and published in CRV 2015 as "Reconstruction of 3-D Density Functions from Few Projections: Structural Assumptions for Graceful Degradation")
Cormier, M. and Gondra, I. (2010). "Strong image segmentation using learned regions and spatial relationships". In APICS Annual Computer Science Conference, St. Mary's University, Halifax, Nova Scotia, Canada. (Student conference; non-refereed; oral presentation)
Cormier, M. and Gondra, I. (2009). "Strong image segmentation through non-homogeneous region merging". In APICS Annual Computer Science Conference, Dalhousie University, Halifax, Nova Scotia, Canada. (Student conference; non-refereed; oral presentation)

Education

In Progress: PhD (University of Waterloo)

Title: Visual Document Understanding for Assistive Technology (tentative title)
Supervisors: Prof. Robin Cohen and Prof. Richard Mann
Description:
I am presently studying the application of computer vision techniques to the problem of understanding the structure of documents. This research has led to a paper describing the use of computer vision to interpret the high-level semantic structure of web pages; the primary objective is to improve the screen reader programs that visually impaired users need to use the Internet. Future possibilities for research include empirical studies of our framework for understanding web page organization, generalization to other types of document, and methods of presenting complex structures to visually impaired users.

2013: MMath (University of Waterloo)

Title: 3-D Reconstruction from Single Projections, with Applications to Astronomical Images
Supervisors: Prof. Daniel J. Lizotte and Prof. Richard Mann
Description:
In my MMath thesis, I developed a framework for the reconstruction of three-dimensional data from images formed by projection (i.e., by integration along a line of sight of some value at each point in a volume). This framework was designed for the reconstruction of the distribution of light-emitting matter (stars, for the most part) in a galaxy from a single projected image formed by integration of luminosity along the line of sight from each pixel.

For simplicity, assume that the image is square, and the volume of the galaxy is divided into voxels (“volume pixels”) in a cube with a side length equal to the side length of the image. Each pixel provides only one linear constraint, so an image with n pixels per side gives n² constraints on n³ voxel values, and the voxel values are underdetermined. By making certain physically reasonable assumptions about structural properties such as symmetry, however, additional constraints can be found that allow the system to be solved, thus reconstructing the original distribution insofar as it is consistent with the structural assumptions made. Furthermore, the projected image of the reconstructed distribution shows which aspects of the original image can be explained by the structure assumed. This allows the isolation of structures which are not consistent with the assumptions made.

A wide range of structural assumptions can be expressed easily in this framework using a combination of reparametrization of the reconstruction problem and the addition of regularization terms. Since the framework uses three-dimensional reconstruction, the structural constraints are also three-dimensional. Three-dimensional constraints better reflect physical reality than constraints on the two-dimensional image, and are often both simpler and more flexible.
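The idea can be illustrated with a small sketch. This is not the thesis code; it is a simplified two-dimensional analogue written for this page, in which an n-by-n grid is projected onto a one-dimensional image of n pixels. The structural assumption (circular symmetry about the grid centre, with zero density outside the inscribed circle), the function names `projection_matrix`, `radial_basis`, and `reconstruct`, and the small ridge weight `lam` are all hypothetical choices made for illustration. The symmetry is imposed by reparametrization (one unknown per radial bin), and the ridge term stands in for a regularization term.

```python
import numpy as np

def projection_matrix(n):
    """Matrix that sums a row-major flattened n-by-n grid along axis 0."""
    # Pixel j of the image is the sum of column j of the grid.
    return np.kron(np.ones((1, n)), np.eye(n))

def radial_basis(n):
    """Map radial-bin values to grid cells (hypothetical symmetry assumption).

    Cells whose rounded distance from the centre exceeds the half-width
    get no basis entry, i.e. their density is assumed to be zero.
    """
    c = (n - 1) / 2.0
    ii, jj = np.indices((n, n))
    bins = np.rint(np.hypot(ii - c, jj - c)).astype(int).ravel()
    k = int(c) + 1
    basis = np.zeros((n * n, k))
    inside = bins < k
    basis[np.arange(n * n)[inside], bins[inside]] = 1.0
    return basis

def reconstruct(image, lam=1e-6):
    """Least-squares reconstruction of the grid from a 1-D projection."""
    n = image.size
    basis = radial_basis(n)
    system = projection_matrix(n) @ basis  # n constraints, one unknown per bin
    k = basis.shape[1]
    # Stack a small ridge penalty under the projection constraints.
    lhs = np.vstack([system, np.sqrt(lam) * np.eye(k)])
    rhs = np.concatenate([image, np.zeros(k)])
    radial_values, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return (basis @ radial_values).reshape(n, n)

if __name__ == "__main__":
    n = 9
    density = (radial_basis(n) @ np.array([5.0, 4.0, 3.0, 2.0, 1.0])).reshape(n, n)
    image = density.sum(axis=0)
    image[2] += 3.0  # add a feature that breaks the assumed symmetry
    residual = image - reconstruct(image).sum(axis=0)
    print(np.round(residual, 3))
```

Without the reparametrization, the n pixel values could not determine the n² cell values; with it, the number of unknowns drops to one per radial bin. The residual between the image and the re-projected reconstruction is the part of the image that the assumed structure cannot explain.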

2011: BSc with First-Class Honours (St. Francis Xavier University)

Title: Strong Image Segmentation using Learned Regions and Spatial Relationships
Supervisor: Prof. Iker Gondra
Description:
In my B.Sc. Honours thesis, I developed an algorithm to segment an image so as to isolate an object of interest (OOI), which may consist of many distinct regions, each internally homogeneous with respect to low-level image features. The algorithm uses multiple instance learning to find prototypical representations of each region and a naive Bayesian classifier to determine whether a given block of pixels in the image is part of the OOI or part of the background. The features considered include spatial relationships between regions and the color and texture of the block. The results of this thesis showed that these techniques can be used to learn characteristics of the OOI that are useful in segmentation.
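The block-classification step can be sketched roughly as follows. This is an illustration written for this page, not the thesis implementation: it covers only the naive Bayes step, omits the multiple instance learning stage, and assumes Gaussian likelihoods for each feature. The function names `fit_gaussian_nb` and `predict`, and the two-dimensional feature vectors in the usage example, are hypothetical.

```python
import numpy as np

def fit_gaussian_nb(features, labels):
    """Estimate a log prior and per-feature mean and variance for each class."""
    model = {}
    for c in np.unique(labels):
        rows = features[labels == c]
        model[int(c)] = (
            np.log(len(rows) / len(features)),  # log prior of the class
            rows.mean(axis=0),
            rows.var(axis=0) + 1e-6,  # small floor avoids division by zero
        )
    return model

def predict(model, features):
    """Label each block with the class of highest posterior log-probability."""
    classes = sorted(model)
    scores = []
    for c in classes:
        log_prior, mean, var = model[c]
        # Naive assumption: features are independent given the class.
        log_lik = -0.5 * (np.log(2 * np.pi * var) + (features - mean) ** 2 / var)
        scores.append(log_prior + log_lik.sum(axis=1))
    return np.array(classes)[np.argmax(np.vstack(scores), axis=0)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic block features (e.g. a colour value and a texture value).
    ooi = rng.normal([0.8, 0.2], 0.05, size=(50, 2))
    background = rng.normal([0.2, 0.7], 0.05, size=(50, 2))
    features = np.vstack([ooi, background])
    labels = np.array([1] * 50 + [0] * 50)  # 1 = OOI, 0 = background
    model = fit_gaussian_nb(features, labels)
    print(predict(model, np.array([[0.8, 0.2], [0.2, 0.7]])))
```

Each block is labelled independently here; in the thesis, spatial relationships between regions also entered the feature set.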