CS 856 
Internet-Scale Distributed Data Management

Guidelines for Term Projects

Instructor: M. Tamer Özsu

Project Objectives and Scope

The project consists of picking a research problem related to data management on the Internet and Web and working on its solution for the duration of this term. Any topic that is covered in the course is acceptable as the domain from which a problem can be selected.

What I expect in the project is a good understanding of the problem (resulting in a survey part), insight into its solution, and a well-defined strategy for solving it. You should treat the term project as if you were doing the initial background study for further in-depth research. In other words, the report should demonstrate an understanding of and an insight into the problem such that, given enough time, you could carry it to its logical conclusion and complete the research. How far you go into the solution, together with the difficulty of the problem, will determine your final mark.

The research part of the project will be done in teams of two. The survey parts will be done in groups whose size will be determined once we know the class size.

Deliverables

There are two deliverables of the term project (note that their deadlines overlap):
  1. Survey Report. The report will describe the problem domain, with proper problem definition, and a survey of existing work. This should be about 25-30 typed pages (using ACM Computing Surveys style -- use LaTeX). This report, as indicated in the schedule below, will be handed in at the end of the 9th week of classes. What I want is to have one survey for each area that we will cover. Therefore, once you choose one of the three areas that you will work on, everyone who is going to work in that area will form a group and will write a single survey paper. Generally, every member of the group is likely to get the same mark, but I will try to have interviews to determine how much each member has contributed and I will differentiate the marks accordingly.
  2. Research Report. This report will describe your attempt to either solve a problem in this domain or go a long way towards its solution. What I minimally expect is a good solution approach such that if I gave you 2-3 more months, you could complete the solution, conduct all the experiments and produce a publishable paper. This report should be 12 typed pages using ACM SIGMOD Conference style (use LaTeX), including a summary of related work (this should not be a significant part of your paper).
The first part is something that every group should do well; the second part is going to vary with the abilities of each group's members. It is possible that some may not do well on the second part; you should, therefore, make sure that you do really well on the survey part to get a decent mark.

Note that the report is very important. It should be written carefully so that I can understand it easily. There will probably not be enough time for me to spend an inordinate amount of effort trying to understand what you mean. So include the necessary introductory material and make sure that the presentation is good. All reports should be typewritten. Use a word processor of your choice.

Schedule

The following is the schedule that we will follow. We may have to change the dates a bit based on the number of people in the class (which will determine the presentation schedule).
 
October 28:

You should have finished reading the literature in the area by now. You'll have two weeks to write the survey report.

November 1:

I would like to receive a detailed, annotated outline of the survey report that you will write. I will meet with each group during this week to discuss the outline and make sure that things are in good shape. The outline is due by 5PM (send me the PDF).

October 26:

I would like to receive a problem definition that is 1-3 typed pages. This document should include a clear description of the problem you are going to work on; the problem should come from the area in which you are writing your survey. If necessary, talk to me, my graduate students, and other graduate students working on these topics.

November 13:

Your survey report should be in by midnight. Send me the PDF file as well as a zipped directory of all the source files.

December 30:

Absolute deadline for handing in final reports (by 4PM). This is one month after the end of lectures to give you time to work on it, if you need the extra time. Send me the PDF file as well as a zipped directory of all the source files.

Resources

You should be looking at the proceedings of conferences such as ACM SIGMOD, VLDB, ICDE, WWW, ICDCS, WISE, ... and at journals such as ACM TODS, IEEE TKDE, VLDB Journal, Distributed and Parallel Databases Journal, Journal of Intelligent Information Systems, World Wide Web, and many others.

Most of these publications can be obtained through the University of Waterloo Library's TRELLIS System, the ACM Digital Library, the VLDB Endowment web page, or Michael Ley's Computer Science Bibliography. Michael's Bibliography is probably the best place to start since it incorporates many of the papers. In a few cases where I thought you might have difficulty finding the paper, I have included a link.
 

