CS 748T 
Topics in Databases - Distributed Database Management

Guidelines for Term Projects

Instructor: M. Tamer Özsu

Project Objectives and Scope

The project consists of picking a distributed database research problem from among the ones listed below and working on its solution for the duration of this term. I would be willing to entertain other topics that may be of interest to you, but you should contact me immediately.

What I expect in the project is a good understanding of the problem (resulting in a survey part), insight into its solution, and a well-defined strategy for its solution. You should treat the term project as if you were doing the initial background study for further in-depth research. In other words, the report should demonstrate an understanding of and an insight into the problem such that, given enough time, you could carry it to its logical conclusion and complete the research.

The term project that you will prepare for this course will contribute towards 50% of your final grade. Half of this will be for the survey part, and the other half for the research part. These could be individual projects, or group projects of two people. For group projects, both members of the group will receive the same grade. Therefore, it is incumbent upon you to make sure that both partners share in the work (and let me know very quickly if the partnership is not working).

Deliverables

There are two deliverables of the term project:
  1. A survey report that describes the problem domain, with a proper problem definition and a survey of existing work. This should be about 25 typed pages (12pt type with 1.5 spacing). This report, as indicated in the schedule below, will be handed in at the end of the 6th week of classes. Each person will then be responsible for presenting the field to the class and leading a discussion. This will start in Week 7 and each topic will get about 1.5 hours of class time.
  2. A research report that describes your own attempt to either solve a problem in this domain or go a long way towards its solution. What I minimally expect is a good solution approach such that, if I gave you 2-3 more months, you could complete the solution, conduct the experiments, and produce a publishable paper. This report should be about 30 typed pages, including a summary survey (i.e., a summary of part 1) of about 5 pages.
The first part is something that every individual (group) should do well; the second part is going to vary with each individual's (group's) abilities. It is possible that some groups may not do well on the second part; you should, therefore, make sure that you do really well on the survey part to get a decent mark.

Note that the report is very important. It should be written carefully so that I can understand it easily. There will probably not be enough time for me to spend an inordinate amount of effort trying to understand what you mean. So include the necessary introductory material and make sure that the presentation is good. All reports should be typewritten. Use a word processor of your choice.

Schedule

The following is the schedule that we will follow.
January 3:
We decide if this is an individual or a joint project. If it is joint, your first task is to find a partner.
January 17:
Find a problem that you want to work on. The list below gives some problems that I am interested in and would like to cover in this course. If you wish to pick up another topic, make an appointment and see me first.
January 31:
You should have finished reading the literature in the area by now. You'll have ten days to write the survey report.
February 2:
I would like to receive a problem definition that is 1-3 typed pages. This document should include a clear description of the problem you are going to work on within the area covered by your survey.
February 9:
Your survey report should be in by 4PM.
April 11:
Absolute deadline for handing in final reports.

Research Topics

Here are a few research topics that you might wish to consider. Note that the papers listed here are only to get you started. They are not meant to be a complete list, and they may not even be the most important ones from your perspective. My hope is that by the time you complete the project, you'll have a good idea of what each area is about and what the more important publications are. You should be looking at the proceedings of conferences such as ACM SIGMOD, VLDB, ICDE, ICDCS, ... and at journals such as ACM TODS, IEEE TKDE, VLDB Journal, Distributed and Parallel Databases Journal, Journal of Intelligent Information Systems, World Wide Web, and many others.

Most of these publications can be obtained through the University of Waterloo Library's TRELLIS System, the ACM Digital Library, the VLDB Endowment web page, or Michael Ley's Computer Science Bibliography. Michael's Bibliography is probably the best place to start since it incorporates many of the papers. In a few cases where I thought you might have difficulty finding the paper, I have included a link.
 

Data Replication

Student:

Most distributed database systems are replicated. Replication has even gained importance outside the database domain as a means of improving system access performance and system reliability. There are some classical proposals, but there are also newer ones that push the envelope significantly in how replicated data are managed. The following is a combination of new and classic publications for you to consider.

New networking architectures and their impact on DDBMS

Student: Ning Zhang

This research deals with the study of the effects of changing computer network characteristics on the architecture and algorithms of DDBMSs. The infrastructure on which DDBMSs are built is undergoing dramatic and rapid changes. An important part of this infrastructure is the computer network over which DDBMSs operate. A major change in this infrastructure is the emergence of high-bandwidth, high-speed broadband networks coupled with lightweight (low overhead) protocols. The characteristics of these networks raise questions about the fundamental DDBMS assumption that the network is the bottleneck. In this project, I want to investigate exactly what these changes are and which database architecture components, protocols, and so on need to be revisited.

Web Data Management

Student:

This is a loaded topic. It means many things and it means different things to different people. I have separated the topic into a number of segments. Under this heading, I am more interested in modeling and querying Web data. In addition to Abiteboul, Buneman, and Suciu's book Data on the Web, the following references are relevant.

Web Search Engines/Crawlers

Student: Sunny Lam

This topic focuses on an interesting aspect of Web data management that overlaps significantly with information retrieval. Therefore, I suggest that you also look at the information retrieval literature. The first reference is a good one to start from.

Web Mining

Student: Yan Wang

Web mining can focus on three major areas: Web usage mining, Web content mining, and Web structure mining. The first paper below focuses on Web usage mining. The second one is more general, and the third one deals with exploiting links in the mining process. The other papers are not classified this way.
 

Web Caching

Student: Leon Cao

This is a very wide topic. One can talk about static versus dynamic Web caching algorithms, or one can focus on Web cache consistency algorithms, or even proxy caching systems. The literature is so wide that it is hard to give a representative sample. However, I have the following list for now and will be adding to it soon. You can also find a ton of them if you do a Web search.

Pervasive and Mobile Computing

Student:

This topic is also quite wide, so you'll need to focus. Some literature makes a distinction between mobile computing and pervasive computing, so you'll need to be careful. Also, note that I am interested in data management problems in this domain, not general systems issues.

Distributed View Management

Student: Lubomir Stanchev

The title of the topic is perhaps not very informative. There are two ways to look at this. One is the materialized view maintenance issue. The other is the maintenance of distributed virtual views. There is more material on the first topic, which is more popular these days. So, the following is the suggested literature on that topic:

Information Integration

Student: Meng He

There are a number of ways of approaching this topic. One can look at information integration using views (references 1-4), at techniques to deal with schematic and data discrepancies (references 5-9), at integration/componentization architectures using distributed platforms such as CORBA or DCOM/OLE, or at the use of XML for data integration. You should choose one of these dimensions to work on.

Hybrid Query Execution Models

Student: Ivan Bowman

The issue here can be looked at either within the context of client-server object DBMSs, or client-server relational DBMSs. In the former, the usual execution model is data shipping from the server to the client, which is then responsible for execution of queries on the cached data. In the latter, the usual execution model is function shipping where the client ships the SQL query to the server without processing and receives the result. Hybrid query execution considers the possibility of executing queries at both the client and the server. The project should focus either on relational or object DBMSs (at least originally).
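
As a rough illustration of the three execution models described above, the following Python sketch contrasts data shipping, function shipping, and a simple hybrid. All names here (Server, Client, fetch_all, run_query, rows_received) are hypothetical, and the hybrid rule (run at the client if the data is already cached, otherwise ship the query to the server) is only one assumed policy among many that a project might study. It is not taken from any particular system.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


class Server:
    """Toy server holding one in-memory table (illustrative only)."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows

    def fetch_all(self) -> List[Row]:
        # Data shipping: the server sends the data, not the answer.
        return list(self.rows)

    def run_query(self, predicate: Predicate) -> List[Row]:
        # Function shipping: the server evaluates the query itself.
        return [r for r in self.rows if predicate(r)]


class Client:
    """Toy client that counts how many rows cross the 'network'."""

    def __init__(self, server: Server) -> None:
        self.server = server
        self.cache: Optional[List[Row]] = None
        self.rows_received = 0

    def query_data_shipping(self, predicate: Predicate) -> List[Row]:
        # Pull the data once, then evaluate queries on the cached copy.
        if self.cache is None:
            self.cache = self.server.fetch_all()
            self.rows_received += len(self.cache)
        return [r for r in self.cache if predicate(r)]

    def query_function_shipping(self, predicate: Predicate) -> List[Row]:
        # Send the query to the server and receive only the result.
        result = self.server.run_query(predicate)
        self.rows_received += len(result)
        return result

    def query_hybrid(self, predicate: Predicate) -> List[Row]:
        # Assumed policy: use the client cache when present,
        # otherwise let the server execute the query.
        if self.cache is not None:
            return [r for r in self.cache if predicate(r)]
        return self.query_function_shipping(predicate)
```

A real hybrid optimizer would decide per operator, using cost estimates for network transfer, client load, and server load, rather than the single cache check used here.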