(*) CS484/684 (F11) - Computational Vision
(*) Pending calendar changes, the F11
course is offered as CS489-001, CS698-003 (digits 8/9 swapped).
(**) Open to graduate students; please sign up at the first lecture.
Administrivia:
Time: Fall 2011; Lectures T,Th 13:00-14:20, EV1 132. First lecture:
Tues 13 September. Office hours TBA. Tutorial time(s) TBA.
Instructor: Richard Mann, DC2510, x33006, mannr@uwaterloo.ca,
http://www.cs.uwaterloo.ca/~mannr
Audience: This course is intended for advanced undergraduate and
beginning graduate students interested in pursuing research in AI,
Vision or related areas. Students should expect to do a fair amount of
independent study, both in following the material and completing a
course project.
Grades are based on four assignments and a project. For the project,
students will choose a vision problem or application, implement one or
more algorithm(s), and prepare a final report. Project grades will
depend on the difficulty of the problem selected, the implementation
effort, and the report. The report should include sufficient
experiments and analysis, in particular showing situations where the
algorithm(s) work and where they fail.
CS684: Graduate students will take on a more advanced project, based
either on a recent publication or on a vision-related topic from their
thesis area.
Grading: 50% assignments, 50% project. Assignments will typically have
both a written and a (small) programming component.
Presentation: This is a lecture-based course. Material will be
presented on the blackboard and supplemented with images and video.
For those who do not wish to copy notes, I will appoint a note taker
who will make the lecture notes available after each lecture.
Software: Matlab will be used for assignments, and is encouraged for
the project. Matlab is available in the Undergrad Computing
Environment. Matlab is an interactive language, with a C-like syntax,
that is optimized for numerical computation and plotting/graphics.
- Software Licensing: The Matlab license is expensive and restrictive.
For those interested in Open Source software, Octave provides a decent
substitute for the assignments, albeit with a bit of extra user
effort. The main difference between Octave and Matlab is in plotting
(Matlab is nicer). Matlab also has several specialized "toolboxes"
(image processing, signal processing, etc.) that Octave does not have.
These change frequently and require additional licensing, so
assignments in this course will avoid toolboxes to maximize
portability of code.
Prerequisites: There are no formal prerequisites for this course;
however, it is advisable to have some exposure to numerical
computation, especially linear algebra (e.g., CS370), and some basic
programming experience.
References: All required material will be provided in lectures.
Possible references include:
- Introductory Techniques for 3D Computer Vision, E.
Trucco and A. Verri, Prentice-Hall, 1998.
A concise treatment, focussing on geometric approaches
to vision. The course loosely follows this book, with lots of
other stuff thrown in.
New book, well presented, lots of
computational details. Good source of algorithms for projects.
- Computer Vision: A Modern Approach, D. A. Forsyth and
J. Ponce, Prentice Hall, 2003.
Comprehensive treatment of many areas of computational vision. A good
resource for background reading and choosing project ideas. Notable
omission: optical flow.
- Vision, D. Marr, Freeman, 198x.
Classic text. Still recommended reading; it shows the origins of many
vision problems.
- Digital Image Processing, K. R. Castleman, Prentice
Hall, 1996.
Excellent reference on signal
processing and Fourier analysis. Very accessible to both Computer
Scientists and Engineers.
- Robot Vision, B. K. P. Horn, MIT Press, 1986.
- Three-Dimensional Computer Vision, O. Faugeras, MIT Press, 1993.
I will put all of these books on reserve in the library (DC). You may
also purchase Trucco and Verri's book at the University bookstore.
General Information:
Assignments:
- Assignment 1: Image formation and lighting. Out: 20 September,
Due: 4 October, 1pm (before class). Note: Assignment 1 code
(photometric.m) changed. Please reload above.
- Assignment 2: Linear Systems and Feature Detection. Out: 11 October,
Due: 25 October, 1pm (before class). Required software: Linear
Filtering and Pyramid Tools (iseTools/pyrTools, [Jepson, Fleet, and
Simoncelli])
- Assignment 3: Robust and Mixture Models for Optical Flow.
Out: 28 October, Due: 15 November *extended*, 1pm (before class).
- Assignment 4: Stereo: Block Matching and Dynamic Programming.
Out: 8 November, Due: 22 November, 1pm (before class).
Lectures:
- Tues Sep 20. Perspective projection. Radiometric image
formation, Lambertian surface (Horn Ch10). A1 out.
- Thur Sep 22. Gradient space, photometric stereo (Horn
Ch10). Shading and reflectance. Reference: Adelson and Pentland.
- Tues Sep 27. Linear systems and filtering. (Horn Ch6;
Szeliski Section 3.2).
- Thur Sep 29. Fourier analysis. (Castleman Ch10).
- Tues Oct 4. Fourier analysis, part II. A1 due (in class). Project
proposal due (in class).
- Thur Oct 6. Feature detection (edges).
- Tues Oct 11. Feature detection, part II (corners,
SIFT). A2 out.
- Thur Oct 13. Data fitting, least squares.
- Tues Oct 18. Hough transform, RANSAC algorithm.
Robust fitting (lines).
- Thur Oct 20. Mixture models.
- Tues 1 Nov. Fourier methods for motion analysis, optical
snow. Reference: Langer & Mann, Optical Snow.
- Thur 3 Nov. Stereo (Baseline case), Block matching.
- Tues 8 Nov. Stereo, Dynamic programming. Begin
Epipolar geometry. A4 out.
- Thur 10 Nov. Stereo, Epipolar geometry.
- Tues 15 Nov. Structure from motion, differential methods. A3 due (in class).
- Thur 17 Nov. Structure from motion, factorization method.
- Tues 22 Nov. Object recognition: overview. Eigenfaces. A4 due (in
class).
- Thur 24 Nov. Object recognition: model-based methods.
- Mon 5 Dec. End of term. Projects due (undergrad).
Reference material:
Course software:
Project ideas:
- (Undergrads) Implementation and further study of algorithms
described in class:
  - Mixture models,
  - Optical flow,
  - Image alignment for composites,
  - Structure from motion, etc.
- (Grads) Thesis-related vision project (consult instructor).
Note: I recommend starting with a small, well-defined problem,
even if you plan an ambitious project. You can always add to it later.
Additional References (to be updated):
The following resources are from Allan Jepson's
computer vision course at University of Toronto.
These are not required for this course, but you might find
them useful.
CMU course on Image-based representation and rendering. This page has
a very good set of resources about warping, image compositing,
structure from motion, etc.
Review of projective geometry (appendix from a book by Zisserman and
Mundy). Please let me know if you get through this. I got stuck near
the beginning.
Wearcam: Steve Mann's webpage on wearable computers and cameras.
Gerhard Roth (NRC): Software for 3D scene reconstruction (uses
epipolar geometry). Also includes some tutorials on reconstruction
using projective geometry.