CS486/686 Project Suggestions
The following list outlines possible projects for CS486/686.
However, you should feel free to come up with your own project.
Intelligent Systems Challenge: Protecting Canada's Coastal Borders
- Challenge website: http://www.intelligent-systems-challenge.ca
- National programming competition
- Joint initiative by the Canadian
Artificial Intelligence Association and Precarn Inc
- The 2009 challenge problem is provided by Vancouver-based MacDonald, Dettwiler and
Associates (MDA), the company
that developed Canadarm, Radarsat-2, and the satellite image processing
systems used for Google Earth.
- Problem: can we detect a cargo ship
rendezvous with another vessel at sea? Such rendezvous are seldom
necessary to meet legitimate commercial objectives. The
core problem is that the actual rendezvous will seldom be observed
directly - it must be inferred or ruled out based on the tracks of the
ships before and after they intersect. This is a problem of model
reconstruction from sparse data in one time and two space
dimensions.
- Full problem description and practice dataset are available at
http://www.intelligent-systems-challenge.ca/challenge2009/index.html
- Possibility to win prizes and attend the Canadian AI Conference
in Kelowna (BC) in May 2009
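As a starting point for the rendezvous problem, the sketch below interpolates two sparse ship tracks and reports when they come closest. It is only an illustration and not part of the challenge material. It assumes each track is a time-sorted list of hypothetical (time, x, y) fixes in flat-plane coordinates, and that a ship moves in a straight line at constant speed between fixes. The function names and the sample tracks are invented for this example.

```python
from bisect import bisect_right
from math import hypot


def position_at(track, t):
    """Linearly interpolate the (x, y) position of a ship at time t.

    track is a time-sorted list of (time, x, y) fixes. Returns None when t
    lies outside the observed time span.
    """
    times = [fix[0] for fix in track]
    if t < times[0] or t > times[-1]:
        return None
    i = bisect_right(times, t) - 1
    if i == len(track) - 1:  # t is exactly the last fix
        return track[i][1], track[i][2]
    t0, x0, y0 = track[i]
    t1, x1, y1 = track[i + 1]
    w = (t - t0) / (t1 - t0)
    return x0 + w * (x1 - x0), y0 + w * (y1 - y0)


def closest_approach(track_a, track_b, step=1.0):
    """Return (min_distance, time) over the shared time span, or None.

    The shared span is sampled every `step` time units, so the result is
    only as fine-grained as the step.
    """
    start = max(track_a[0][0], track_b[0][0])
    end = min(track_a[-1][0], track_b[-1][0])
    best = None
    t = start
    while t <= end:
        ax, ay = position_at(track_a, t)
        bx, by = position_at(track_b, t)
        d = hypot(ax - bx, ay - by)
        if best is None or d < best[0]:
            best = (d, t)
        t += step
    return best


# Hypothetical tracks: a cargo ship heading east and a vessel heading south.
cargo = [(0, 0.0, 0.0), (10, 10.0, 0.0)]
other = [(0, 5.0, 5.0), (5, 5.0, 0.0), (10, 5.0, -5.0)]
print(closest_approach(cargo, other))
```

A small minimum distance only says that a rendezvous cannot be ruled out. A real solution would also need to account for position uncertainty and for ships slowing down or stopping between fixes, which constant-speed interpolation ignores.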
Smart Walker Project
- A multidisciplinary research team at the University of Waterloo is developing
a smart walker to monitor and assist older adults. The smart
walker is a regular walker instrumented with sensors (load sensors,
accelerometers, odometer and video cameras) that can monitor various
health indices (e.g., gait, stability, mobility, behaviours) of users
and detect obstacles.
- Possible problems (mostly in machine learning and computer
vision):
- brake detection: detect when the brakes are applied based on
video images of the wheels and brake pads (computer vision problem)
- user recognition: recognize the owner of a walker based on
usage pattern (machine learning and/or vision problem)
- behaviour recognition: recognize the specific activity
performed by the user at any time based on sensor data (machine learning
problem)
- limb tracking: extract the position of the feet and legs of the
user from video images (computer vision problem)
- gait monitoring: extract heel-strike and toe-off events from
video images (computer vision problem)
- obstacle detection: detect obstacles from video images
(computer vision problem)
- stability monitoring: detect moments of instability from sensor
data (machine learning problem)
- Contact Farheen Omar (cs486@student.cs.uwaterloo.ca) for more
details.
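To give a feel for the stability monitoring problem, here is a minimal baseline sketch. It flags a sample as a possible moment of instability when it deviates strongly from the recent past of a single sensor signal. This is a simple statistical heuristic, not a learned model and not the project's actual method. The function name, the window and threshold values, and the sample load readings are all hypothetical.

```python
from statistics import mean, stdev


def unstable_moments(signal, window=5, threshold=3.0, min_sigma=1e-6):
    """Return indices of samples that deviate from the trailing window.

    A sample is flagged when it lies more than `threshold` standard
    deviations from the mean of the previous `window` samples. `min_sigma`
    avoids a zero standard deviation on perfectly flat stretches.
    """
    flagged = []
    for i in range(window, len(signal)):
        history = signal[i - window:i]
        mu = mean(history)
        sigma = max(stdev(history), min_sigma)
        if abs(signal[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


# Hypothetical load-sensor readings with one sudden spike at index 7.
readings = [50.0, 50.5, 49.5, 50.2, 49.8, 50.1, 49.9, 62.0, 50.0, 50.3]
print(unstable_moments(readings))
```

A project would likely replace this rule with a classifier trained on labeled sensor data from several sensors, using a baseline of this kind only for comparison.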
Document Clustering and Analysis
- Several companies (e.g., Google, Yahoo, Microsoft, AideRSS) are
interested in clustering and analyzing news articles, blogs, RSS feeds,
emails, etc. There are several interesting problems at the
intersection of machine learning and natural language processing:
- document clustering: cluster documents by topics
- story deduplication: detect when two articles describe the same
news, event or story, but perhaps using different words
- topic labeling: label a cluster of documents with a
synthesizing phrase
- topic detection: detect when a new topic arises in a stream of
news articles, blogs, RSS feeds or emails
- sentiment analysis: detect the overall sentiment of a document
(e.g., positive or negative). This is particularly useful for
movie recommendation and the analysis of product reviews.
- entity extraction: extract the main entities (e.g., people,
companies, objects) in a document
- relation extraction: extract the relations between entities
(e.g., which companies are customers/suppliers of which companies in
press releases)
- Contact Pascal for more details
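As an illustration of story deduplication, the sketch below represents each document as a TF-IDF weighted bag of words and reports pairs whose cosine similarity exceeds a threshold. This is one common baseline, chosen here for illustration only. The function names, the smoothed IDF formula, the threshold, and the sample headlines are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (a dict) per document.

    Uses a smoothed inverse document frequency so that terms occurring in
    every document still receive a positive weight.
    """
    tokenized = [tokenize(d) for d in documents]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def near_duplicates(documents, threshold=0.4):
    """Return index pairs (i, j), i < j, with similarity >= threshold."""
    vectors = tfidf_vectors(documents)
    return [
        (i, j)
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
        if cosine(vectors[i], vectors[j]) >= threshold
    ]


# Hypothetical headlines: the first two describe the same story.
docs = [
    "Central bank raises interest rates to fight inflation",
    "Interest rates raised by the central bank to curb inflation",
    "Local team wins hockey championship in overtime",
]
print(near_duplicates(docs))
```

Word overlap alone misses stories told with different vocabulary (for instance, "raises" and "raised" count as unrelated terms here), which is exactly where the project becomes interesting.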
Semi-supervised learning
- Machine learning works well when we have complete labeled
data. However, in practice, data is usually
incomplete/unlabeled. For instance, consider spam filtering,
which can be viewed as a machine learning problem where you'd like to
train your computer to classify incoming emails as spam or
legitimate. If you have a training set of emails that are already
labeled as spam or legitimate then you can use a supervised learning
technique. However, you'd like to train your filter with
recent emails that may not already be labeled to make sure that your
filter is up to date with recent spam patterns. In that case, you
could hand-label a few emails and train your filter with many more
unlabeled emails, which corresponds to semi-supervised learning.
There are several fundamental questions that are unresolved regarding
semi-supervised learning:
- robust optimization for Bayes net semi-supervised learning:
learning the parameters of a Bayes net with incomplete/unlabeled data
leads to a non-convex optimization problem. This means that only
a local optimum can be guaranteed. Explore techniques to find
better optima.
- tradeoff between labeled and unlabeled data: too much unlabeled
data can misguide a semi-supervised learning algorithm. An
assumption must be made to use unlabeled data (e.g., classes correspond
to clusters) since we don't have the labels. In practice, the
assumption made is rarely correct, so care must be taken not to use too
much unlabeled data. Explore techniques to balance the amount of
labeled and unlabeled data that should be used for best performance.
- Contact Pascal for more details
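The spam example can be sketched with a naive Bayes classifier (a very simple Bayes net) trained by expectation-maximization (EM) on labeled and unlabeled emails. This is a minimal illustration under stated assumptions, not a reference implementation. The function names, the toy emails, and the `unlabeled_weight` parameter are hypothetical. The weight is one simple way to limit the influence of unlabeled data, touching on the tradeoff described above.

```python
import math
from collections import Counter


def train_nb(docs, soft_labels, weights, vocab, alpha=1.0):
    """Fit a two-class multinomial naive Bayes model from soft labels.

    soft_labels[i][c] is the probability that document i belongs to class c,
    and weights[i] scales how much document i counts. alpha is the
    Laplace smoothing constant.
    """
    prior = [alpha, alpha]
    counts = [Counter(), Counter()]
    for tokens, probs, w in zip(docs, soft_labels, weights):
        for c in range(2):
            prior[c] += w * probs[c]
            for tok in tokens:
                counts[c][tok] += w * probs[c]
    log_prior = [math.log(p / sum(prior)) for p in prior]
    log_lik = []
    for c in range(2):
        denom = sum(counts[c].values()) + alpha * len(vocab)
        log_lik.append(
            {tok: math.log((counts[c][tok] + alpha) / denom) for tok in vocab}
        )
    return log_prior, log_lik


def posterior(model, tokens):
    """Return [P(class 0 | tokens), P(class 1 | tokens)]."""
    log_prior, log_lik = model
    scores = [
        lp + sum(ll[t] for t in tokens if t in ll)
        for lp, ll in zip(log_prior, log_lik)
    ]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    return [e / sum(exps) for e in exps]


def semi_supervised_nb(labeled, unlabeled, unlabeled_weight=0.5, iterations=10):
    """Run EM: labeled is a list of (tokens, class), unlabeled a list of tokens."""
    docs_l = [tokens for tokens, _ in labeled]
    hard = [[1.0 if c == y else 0.0 for c in range(2)] for _, y in labeled]
    vocab = {t for d in docs_l + unlabeled for t in d}
    # Start from the model trained on labeled data only.
    model = train_nb(docs_l, hard, [1.0] * len(docs_l), vocab)
    weights = [1.0] * len(docs_l) + [unlabeled_weight] * len(unlabeled)
    for _ in range(iterations):
        soft = [posterior(model, d) for d in unlabeled]  # E-step
        model = train_nb(docs_l + unlabeled, hard + soft, weights, vocab)  # M-step
    return model


# Hypothetical toy data: class 1 is spam, class 0 is legitimate.
labeled = [
    ("win free prize now".split(), 1),
    ("meeting agenda for monday".split(), 0),
]
unlabeled = [
    "free prize claim now".split(),
    "claim your free lottery prize".split(),
    "monday meeting notes".split(),
    "notes for project meeting".split(),
]
model = semi_supervised_nb(labeled, unlabeled)
print(posterior(model, "lottery claim".split()))
```

The words "lottery" and "claim" never appear in a labeled email, so any evidence about them comes from the unlabeled emails. EM only converges to a local optimum, and the result depends on the initial model and on `unlabeled_weight`, which are the two open questions listed above.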