CS484/684 - Computational Vision

CS484/684 (F11) - Computational Vision (*)

(*) Pending calendar changes, the F11 course is offered as CS489-001, CS698-003 (digits 8/9 swapped).

(**) Open to graduate students; please sign up at the first lecture.

Administrivia:

Time: Fall 2011; Lectures T,Th 13:00-14:20, EV1 132. First lecture, Tues 13 September. Office hours TBA. Tutorial time(s) TBA.

Instructor: Richard Mann, DC2510, x33006, mannr@uwaterloo.ca, http://www.cs.uwaterloo.ca/~mannr

Audience: This course is intended for advanced undergraduate and beginning graduate students interested in pursuing research in AI, Vision or related areas. Students should expect to do a fair amount of independent study, both in following the material and completing a course project.

The grades are based on a small number of assignments (4) and a project. For the project, students will choose a vision problem or application, implement one or more algorithm(s), and prepare a final report. The grade awarded will depend on the difficulty of the problem selected, the implementation effort, and the report. The report should provide sufficient experiments and analysis, including, in particular, situations where the algorithm(s) work and where they fail.

CS684: Graduate students will take on a more advanced project, either from a recent publication, or vision-related topics from their thesis area.

Grading: 50% assignments, 50% project. Assignments will typically have both a written and a (small) programming component.

Presentation: This is a lecture-based course. Material will be presented on the blackboard and supplemented with images and video. For those who do not wish to copy notes, I will appoint a note taker who will make notes available after each lecture.

Software: Matlab will be used for assignments, and is encouraged for the project. Matlab is available in the Undergrad Computing Environment. Matlab is an interactive language, with a C-like syntax, that is optimized for numerical computation and plotting/graphics.

Software Licensing: The Matlab license is expensive and restrictive. For those interested in Open Source software, Octave provides a decent substitute for the assignments, albeit with a bit of extra user effort. The main difference between Octave and Matlab is in plotting (Matlab is nicer). Matlab also has several specialized "toolboxes" (image processing, signal processing, etc.) that Octave does not have. These change frequently, and require additional licensing, so assignments in this course will avoid toolboxes to maximize portability of code.

Prerequisites: There are no formal prerequisites for this course; however, it is advisable to have some exposure to numerical computation, especially linear algebra (e.g., CS370), and some basic programming experience.

References: All required material will be provided in lectures. Possible references include:

Introductory Techniques for 3D Computer Vision, E. Trucco and A. Verri, Prentice-Hall, 1998.

A concise treatment, focussing on geometric approaches to vision. The course loosely follows this book, with lots of other stuff thrown in.

Computer Vision: Algorithms and Applications, R. Szeliski, Springer (2010). See also, Web Publication (2010).

New book, well presented, lots of computational details. Good source of algorithms for projects.

Computer Vision, A modern approach, D. A. Forsyth and J. Ponce, Prentice Hall, 2003.

Comprehensive treatment of many areas of computational vision. A good resource for background reading and choosing project ideas. Notable omission: optical flow.

Vision, D. Marr. Freeman, 198x.

Classic text. Still recommended reading showing the origins of many vision problems.

Digital Image Processing, K. R. Castleman, Prentice Hall, 1996.

Excellent reference on signal processing and Fourier analysis. Very accessible to both Computer Scientists and Engineers.

Robot Vision, B. K. P. Horn, MIT Press, 1986.

Three-Dimensional Computer Vision, O. Faugeras, MIT Press, 1993.

I will put all of these books on reserve in the library (DC). You may also purchase Trucco and Verri's book in the University bookstore.

General Information:

Course outline (includes list of readings)

Assignments:

Assignment 1: Image formation and lighting. Out: 20 September, Due: 4 October, 1pm (before class). Note: Assignment 1 code (photometric.m) changed. Please reload above.
Assignment 2: Linear Systems and Feature Detection. Out: 11 October, Due: 25 October, 1pm (before class). Required software: Linear Filtering and Pyramid Tools (iseTools/pyrTools, [Jepson, Fleet, and Simoncelli])
Assignment 3: Robust and Mixture Models For Optical Flow. Out: 28 October, Due: 15 November *extended*, 1pm (before class).
Assignment 4: Stereo: Block Matching and Dynamic Programming. Out: 8 November, Due: 22 November, 1pm (before class).

Lectures:

Tues Sep 13. Vision overview (half of first lecture); Image formation, optics. Radiometry (Horn Ch10, Trucco and Verri Ch2)
Thur Sep 15. Combining images of different exposure. References: Mann S. and Picard W., IS&T's 48th annual conference, Cambridge, MA, May 1995; Debevec P.E. and Malik J., Siggraph 1997.

Tues Sep 20. Perspective projection. Radiometric image formation, Lambertian surface (Horn Ch10). A1 out.
Thur Sep 22. Gradient space, photometric stereo (Horn Ch10). Shading and reflectance. Reference: Adelson and Pentland.

Tues Sep 27. Linear systems and filtering. (Horn Ch6; Szeliski Section 3.2).
Thur Sep 29. Fourier analysis. (Castleman Ch10).

Tues Oct 4. Fourier analysis, part II. A1 due (in class). Project proposal due (in class).
Thur Oct 6. Feature detection (edges).

Tues Oct 11. Feature detection, part II (corners, SIFT). A2 out.
Thur Oct 13. Data fitting, least squares.

Tues Oct 18. Hough transform, RANSAC algorithm. Robust fitting (lines).
Thur Oct 20. Mixture models.

Tues Oct 25. Optical flow. A2 due (in class).
Thur Oct 27. Mixture models for optical flow. Reference: Allan Jepson, Optical flow notes. A3 out.

Tues Nov 1. Fourier methods for motion analysis, optical snow. Reference: Langer and Mann, Optical Snow.
Thur Nov 3. Stereo (baseline case), block matching.

Tues Nov 8. Stereo, dynamic programming. Begin epipolar geometry. A4 out.
Thur Nov 10. Stereo, epipolar geometry.

Tues Nov 15. Structure from motion, differential methods. A3 due (in class).
Thur Nov 17. Structure from motion, factorization method.

Tues Nov 22. Object recognition: overview. Eigenfaces. A4 due (in class).
Thur Nov 24. Object recognition: model-based methods.

Tues Nov 29.
Thur Dec 1.

Mon Dec 5. End of term. Projects due (undergrad).

Reference material:

Course software:

Project ideas:

(Undergrads) Implementation and further study of algorithms described in class:
- Mixture models,
- Optical flow,
- Image alignment for composites,
- Structure from motion, etc.
(Grads) Thesis-related vision project (consult instructor). Note: I recommend starting with a small, well-defined problem, even if you plan an ambitious project. You can always add to it later.

Additional References (to be updated):

The following resources are from Allan Jepson's computer vision course at the University of Toronto. These are not required for this course, but you might find them useful.

CMU course on Image-based representation and rendering. This page has a very good set of resources about warping, image compositing, structure from motion, etc.

Review of projective geometry (Appendix from a book by Zisserman and Mundy.) Please let me know if you get through this. I got stuck near the beginning.

Wearcam: Steve Mann's webpage on wearable computers and cameras.

Gerhard Roth (NRC): Software for 3D scene reconstruction (uses epipolar geometry). Also, some tutorials on reconstruction using projective geometry.