CS486/686 Project Suggestions

The following list outlines possible projects for CS486/686. However, you should feel free to come up with your own project.

Intelligent Systems Challenge: Protecting Canada's Coastal Borders

Challenge website: http://www.intelligent-systems-challenge.ca
National programming competition
Joint initiative by the Canadian Artificial Intelligence Association and Precarn Inc
The 2009 challenge problem is provided by Vancouver-based MacDonald, Dettwiler and Associates (MDA), the company that developed Canadarm, Radarsat-2, and the satellite image processing systems used for Google Earth.

Problem: can we detect a cargo ship rendezvous with another vessel at sea? Such rendezvous are seldom necessary to meet legitimate commercial objectives. The core problem is that the actual rendezvous will seldom be observed directly - it must be inferred or ruled out based on the tracks of the ships before and after they intersect. This is a problem of model reconstruction from sparse data in one time and two space dimensions.
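
As a starting point, one simple baseline is to interpolate each ship's track between its sparse position fixes and look for the time at which two ships come closest. The sketch below is only an illustration under strong assumptions that are not part of the challenge description: tracks are lists of hypothetical (time, x, y) fixes in planar coordinates, and ships move in a straight line at constant speed between fixes. A real rendezvous probably involves a detour or a stop, so a small interpolated separation is at best a first filter for candidate pairs.

```python
from bisect import bisect_right
from math import hypot


def position_at(track, t):
    """Linearly interpolate a track [(time, x, y), ...] sorted by time."""
    times = [fix[0] for fix in track]
    if t <= times[0]:
        return track[0][1:]
    if t >= times[-1]:
        return track[-1][1:]
    i = bisect_right(times, t)  # times[i - 1] <= t < times[i]
    t0, x0, y0 = track[i - 1]
    t1, x1, y1 = track[i]
    w = (t - t0) / (t1 - t0)
    return (x0 + w * (x1 - x0), y0 + w * (y1 - y0))


def closest_approach(track_a, track_b, step=0.25):
    """Return (time, distance) of the smallest interpolated separation,
    sampled every `step` time units over the interval both tracks cover.
    Returns None if the tracks do not overlap in time."""
    start = max(track_a[0][0], track_b[0][0])
    end = min(track_a[-1][0], track_b[-1][0])
    if end < start:
        return None
    best = None
    for k in range(int((end - start) / step) + 1):
        t = start + k * step
        ax, ay = position_at(track_a, t)
        bx, by = position_at(track_b, t)
        d = hypot(ax - bx, ay - by)
        if best is None or d < best[1]:
            best = (t, d)
    return best


# Hypothetical fixes: (hours, km east, km north).
cargo_ship = [(0, 0, 0), (10, 100, 0)]
other_vessel = [(0, 50, 50), (10, 50, -50)]
print(closest_approach(cargo_ship, other_vessel))
```

Pairs whose closest approach falls below some chosen distance could then be passed to a more careful model that accounts for speed changes and gaps in coverage.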

Full problem description and practice dataset are available here: http://www.intelligent-systems-challenge.ca/challenge2009/index.html

Possibility to win prizes and attend the Canadian AI Conference in Kelowna (BC) in May 2009

A multidisciplinary research team at U of Waterloo is developing a smart walker to monitor and assist older adults. The smart walker is a regular walker instrumented with sensors (load sensors, accelerometers, odometer and video cameras) that can monitor various health indices (e.g., gait, stability, mobility, behaviours) of users and detect obstacles.
Possible problems (mostly in machine learning and computer vision):

brake detection: detect when the brakes are applied based on video images of the wheels and brake pads (computer vision problem)
user recognition: recognize the owner of a walker based on usage pattern (machine learning and/or vision problem)
behaviour recognition: recognize the specific activity performed by the user at any time based on sensor data (machine learning problem)
limb tracking: extract the position of the feet and legs of the user from video images (computer vision problem)
gait monitoring: extract heel-strike and toe-off events from video images (computer vision problem)
obstacle detection: detect obstacles from video images (computer vision problem)
stability monitoring: detect moments of instability from sensor data (machine learning problem)
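
To give a feel for the sensor-based problems above, here is a minimal sketch of one possible baseline for stability monitoring: flag time windows in which a sensor signal varies much more than usual. The signal values, the window length and the threshold factor are all hypothetical and are not taken from the walker project. A real solution would likely combine several sensors and learn the decision rule from labeled data.

```python
from statistics import median, pstdev


def unstable_windows(signal, window=5, factor=3.0):
    """Return start indices of sliding windows whose standard deviation
    exceeds `factor` times the median standard deviation of all windows."""
    spreads = [
        pstdev(signal[i:i + window])
        for i in range(len(signal) - window + 1)
    ]
    if not spreads:
        return []
    baseline = median(spreads)
    return [i for i, s in enumerate(spreads) if s > 0 and s > factor * baseline]


# Hypothetical load-sensor readings with a sudden jolt at samples 10 and 11.
readings = [10.0, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0,
            14.0, 6.0,
            10.0, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0, 10.1]
print(unstable_windows(readings))
```

Every window that overlaps the jolt is flagged, so consecutive flagged windows would normally be merged into a single event.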

Several companies (e.g., Google, Yahoo, Microsoft, AideRSS) are interested in clustering and analyzing news articles, blogs, RSS feeds, emails, etc. There are several interesting problems at the intersection of machine learning and natural language processing:

document clustering: cluster documents by topics
story deduplication: detect when two articles describe the same news, event or story, but perhaps using different words
topic labeling: label a cluster of documents with a synthesizing phrase
topic detection: detect when a new topic arises in a stream of news articles, blogs, RSS feeds or emails
sentiment analysis: detect the overall sentiment of a document (e.g., positive or negative). This is particularly useful for movie recommendations and product reviews.
entity extraction: extract the main entities (e.g., people, companies, objects, etc.) in a document
relation extraction: extract the relations between entities (e.g., which companies are customers/suppliers of which companies in press releases)
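
As an illustration of a baseline for story deduplication, the sketch below represents each document as a bag of words and treats two documents as the same story when the cosine similarity of their word counts exceeds a threshold. The tokenizer, the threshold of 0.6 and the example headlines are arbitrary assumptions. This baseline only catches stories that share most of their vocabulary, so handling articles that describe the same event with different words is where the actual project would begin.

```python
import re
from collections import Counter
from math import sqrt


def term_counts(text):
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def near_duplicates(docs, threshold=0.6):
    """Return index pairs (i, j), i < j, whose similarity reaches the threshold."""
    vectors = [term_counts(d) for d in docs]
    return [
        (i, j)
        for i in range(len(docs))
        for j in range(i + 1, len(docs))
        if cosine_similarity(vectors[i], vectors[j]) >= threshold
    ]


# Hypothetical headlines.
headlines = [
    "Coast guard intercepts cargo ship near Vancouver",
    "Cargo ship intercepted by coast guard near Vancouver",
    "New smart walker helps older adults stay mobile",
]
print(near_duplicates(headlines))
```

The pairwise comparison is quadratic in the number of documents, which is fine for a small experiment but would need indexing or hashing tricks on a real news stream.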

Machine learning works well when we have complete labeled data. However, in practice, data is usually incomplete/unlabeled. For instance, consider spam filtering, which can be viewed as a machine learning problem where you'd like to train your computer to classify incoming emails as spam or legitimate. If you have a training set of emails that are already labeled as spam or legitimate then you can use a supervised learning technique. However, you'd like to train your filter with recent emails that may not already be labeled to make sure that your filter is up to date with recent spam patterns. In that case, you could hand-label a few emails and train your filter with many more unlabeled emails, which corresponds to semi-supervised learning. There are several fundamental questions that are unresolved regarding semi-supervised learning:

robust optimization for Bayes net semi-supervised learning: learning the parameters of a Bayes net with incomplete/unlabeled data leads to a non-convex optimization problem. This means that only a local optimum can be guaranteed. Explore techniques to find better optima.
tradeoff between labeled and unlabeled data: too much unlabeled data can misguide a semi-supervised learning algorithm. An assumption must be made to use unlabeled data (e.g., classes correspond to clusters) since we don't have the labels. In practice, the assumption made is rarely correct, so care must be taken not to use too much unlabeled data. Explore techniques to balance the amount of labeled and unlabeled data that should be used for best performance.
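
To make the two questions above concrete, here is a minimal sketch of semi-supervised learning for the spam example, using expectation-maximization (EM) with a naive Bayes classifier (a very simple Bayes net). It is one common approach and not necessarily the one a project should adopt. The function names, the toy emails and the unlabeled_weight parameter are all hypothetical. EM only climbs to a local optimum, and unlabeled_weight scales down the influence of the unlabeled emails, which is one simple way to experiment with the labeled/unlabeled tradeoff.

```python
from math import exp, log


def m_step(docs, resp, vocab, classes, alpha=1.0):
    """Estimate log-priors and log word likelihoods from soft class counts.
    resp[i][c] is the (possibly fractional) weight of document i in class c."""
    prior = {c: alpha for c in classes}
    counts = {c: {w: alpha for w in vocab} for c in classes}  # Laplace smoothing
    for words, r in zip(docs, resp):
        for c in classes:
            prior[c] += r[c]
            for w in words:
                counts[c][w] += r[c]
    total_prior = sum(prior.values())
    log_prior = {c: log(prior[c] / total_prior) for c in classes}
    log_like = {}
    for c in classes:
        total = sum(counts[c].values())
        log_like[c] = {w: log(counts[c][w] / total) for w in vocab}
    return log_prior, log_like


def posterior(words, model, classes):
    """Class probabilities for one document; unknown words are ignored."""
    log_prior, log_like = model
    scores = {
        c: log_prior[c] + sum(log_like[c][w] for w in words if w in log_like[c])
        for c in classes
    }
    top = max(scores.values())
    z = sum(exp(s - top) for s in scores.values())
    return {c: exp(scores[c] - top) / z for c in classes}


def semi_supervised_nb(labeled, unlabeled, classes, unlabeled_weight=0.5, iterations=10):
    """EM: start from the labeled data, then alternate between guessing soft
    labels for the unlabeled documents (E-step) and refitting (M-step)."""
    lab_docs = [words for words, _ in labeled]
    lab_resp = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    vocab = {w for words in lab_docs + unlabeled for w in words}
    model = m_step(lab_docs, lab_resp, vocab, classes)
    for _ in range(iterations):
        unl_resp = []
        for words in unlabeled:
            p = posterior(words, model, classes)
            unl_resp.append({c: unlabeled_weight * p[c] for c in classes})
        model = m_step(lab_docs + unlabeled, lab_resp + unl_resp, vocab, classes)
    return model


# Hypothetical toy data: each email is a list of words.
classes = ["spam", "legitimate"]
labeled = [(["free", "money"], "spam"), (["meeting", "agenda"], "legitimate")]
unlabeled = [["free", "prize"], ["money", "prize"],
             ["agenda", "notes"], ["meeting", "notes"]]
model = semi_supervised_nb(labeled, unlabeled, classes)
print(posterior(["prize"], model, classes))
```

In this toy data the word "prize" never appears in a labeled email, so a purely supervised filter has no evidence about it, while the unlabeled emails let EM associate it with spam. Setting unlabeled_weight to 0 recovers the supervised filter, which gives a convenient baseline when studying how much unlabeled data helps or hurts.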