Weekly Schedule
The following is the weekly schedule for the course. We will complete the
study of "classical" distributed database systems in six weeks. The remainder
of the time will be devoted to discussing more recent topics and your projects.
In the weekly schedule, I have indicated the material that you need
to read. Whenever there is a reference to the textbook, I will be
lecturing. The papers will be presented by students. Each presentation
will be 30 minutes, followed by 15 minutes of discussion (sometimes we may
extend these times), and each presenter will be required to submit, at the
time of the presentation, a 5-7 page critique of the paper. This will count
towards one of the two paper critiques. Requirements for the paper critiques
will follow.
Each of these papers is available electronically through the TRELLIS
system of the University of Waterloo Library. Whenever a paper is easier
to obtain elsewhere or has not yet been published, I have placed a link
to it from the paper's title.
Week 1 - January 3, 2001
Introduction and architectural issues
- Chapters 1-4 from the textbook.
Week 2 - January 10, 2001
Data distribution/distributed query processing
- Sections 5.1-5.2 and Chapters 7 & 8 from the textbook.
Week 3 - January 17, 2001
Distributed query optimization
- Chapter 9 from the textbook.
- Chapter 3 in C. Yu and W. Meng, Principles of Query Processing for Advanced Database Applications, Morgan Kaufmann, 1997.
- G. Graefe, "Query Evaluation Techniques for Large Databases", ACM Computing Surveys, June 1993.
Presenter: Ivan Bowman
- D. Kossmann, "The State of the Art in Distributed Query Processing", to appear in ACM Computing Surveys. (PDF format)
Presenter: Sunny Lam
Week 4 - January 24, 2001
Multi-Database Query Processing
- Section 15.2 of the textbook.
- Chapter 4 in C. Yu and W. Meng, Principles of Query Processing for Advanced Database Applications, Morgan Kaufmann, 1997.
- L.M. Haas, D. Kossmann, E. Wimmers, and J. Yang, "Optimizing queries across diverse data sources", Proc. Int'l Conf. on VLDB, 1997. (PDF format)
Presenter: Hui Zhang
- A. Tomasic, L. Raschid, and P. Valduriez, "Scaling Access to Heterogeneous Data Sources with DISCO", IEEE Trans. on Knowledge and Data Eng., 10(5): 808-823, 1998.
Presenter: Yan Wang
- M.T. Roth and P. Schwarz, "Don't Scrap It, Wrap It! A Wrapper Architecture for Legacy Data Sources", Proc. Int'l. Conf. on VLDB, 1997. (PDF format)
Presenter: Lubomir Petrov Stanchev
Week 5 - January 31, 2001
Transaction Processing and Concurrency Control
- Chapters 10 & 11 of the textbook.
- D. Georgakopoulos, M. Hornick, and A. Sheth, "An Overview of Workflow Management: From Process Modeling to Workflow Automation Infrastructure", Distributed and Parallel Databases, 3: 119-153, 1995.
Presenter: Yuxin Cao
- G. Weikum, "Principles and Realization Strategies of Multilevel Transaction Management", ACM Trans. on Database Systems, 16(1): 132-180, 1991.
Presenter: Ning Zhang
Week 6 - February 7, 2001
Distributed Database Reliability
- Chapter 12 of the textbook.
- C. Mohan and I. Narang, "ARIES/CSA: A method for database recovery in client-server architectures", Proc. ACM SIGMOD Conference, 1994, pages 55-66. This requires knowledge of the following:
- C. Mohan et al., "ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging", ACM Trans. on Database Systems, 17(1): 94-162, 1992.
Presenter: Meng He
Week 7 - February 14, 2001 - Survey Talks
- Hybrid Query Execution Models, Ivan Bowman (Presentation slides)
Abstract: The client-server relational execution model has proven to be a very
effective architecture for partitioning the functionality of distributed
data-intensive applications. This model partitions execution costs into a
procedural portion, executed at the client, which sends queries to a relational
portion executed at the server. This partitioning does not take full advantage
of available resources: client resources cannot be used to perform relational
processing, and server resources cannot be used to execute procedural client
code. Practitioners have used several ad hoc mechanisms to distribute the
execution cost of client-server systems. Stored procedures, user-defined
functions, and advanced data types allow application code to execute on the
server. Further, some applications compute relational expressions such as
filters, joins, and grouping operations on the client in custom application
code. While these mechanisms can significantly improve application performance,
they are cumbersome to use and require partitioning choices to be made
statically, early in the development process.
In this talk, I will discuss extending the client-server model to allow an
optimizer to select the execution site for both relational and procedural code.
In this hybrid model, clients can perform local processing of relational
operators, and servers can execute fragments of procedural code on behalf of
the client application. Such a model requires the following (a minimal sketch
of the optimizer's choice follows the list):
- a query processing engine on the client, either accessing a local cache
  (with associated consistency issues) or retrieving data from the server;
- a mechanism to execute procedural code on the server (possibly requiring
  state information from the client), and
- an optimizer that can choose a reasonable execution plan considering these
  alternatives.
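To make the last requirement concrete, here is a minimal sketch (in Python) of
how an optimizer might choose where to evaluate a filter predicate in such a
hybrid model. The cost model, the per-row cost figures, and the function name
are illustrative assumptions, not taken from the talk; the example assumes the
server is more heavily loaded than the client, so its per-row CPU cost is higher.

    # Illustrative sketch: choosing the execution site for a filter predicate
    # in a hybrid client/server model. All costs and names are assumptions.

    def choose_filter_site(rows_on_server, selectivity,
                           server_cpu_cost_per_row=2.0,   # server is shared, so costlier per row
                           client_cpu_cost_per_row=1.0,   # idle client CPU is cheap
                           transfer_cost_per_row=5.0):    # shipping one row over the network
        """Return 'server' or 'client', whichever has the lower estimated cost."""
        # Evaluate on the server: pay server CPU for every row, then ship only
        # the qualifying rows to the client.
        cost_server = (rows_on_server * server_cpu_cost_per_row +
                       rows_on_server * selectivity * transfer_cost_per_row)
        # Evaluate on the client: ship every row, then pay client CPU for each.
        cost_client = (rows_on_server * transfer_cost_per_row +
                       rows_on_server * client_cpu_cost_per_row)
        return 'server' if cost_server <= cost_client else 'client'

    print(choose_filter_site(100000, selectivity=0.01))  # 'server': few rows survive, ship those
    print(choose_filter_site(100000, selectivity=0.99))  # 'client': almost everything ships anyway

With these assumed costs, a highly selective filter is pushed to the server,
while a non-selective one is evaluated on the less loaded client.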
- Data Synchronization in a Distributed Database Environment, Lubomir Stanchev (Presentation slides)
Abstract: In a distributed database, the stored data is often related. For
example, the results of frequently executed queries may be stored at one or
more sites in order to reduce query response time. We will refer to such
stored query results as materialized views, and to the data on which the
queries are posed as the underlying data. The main concern in such a model is
synchronizing the related data. For example, when the underlying data is
updated, we would like the updates to be propagated to the materialized views;
this problem is called view maintenance. We might also want to allow a
restricted type of materialized view update, and when such updates occur, the
underlying data may have to be updated accordingly.
An important problem is imposing constraints on which data updates are allowed
and to what extent the data should be synchronized. Possible constraints on
data updates may include that materialized views may not be updated directly,
or may be updated only if certain predicates on the data hold. We may also
require that local and global integrity constraints on the data hold, and
allow only updates that preserve those integrity constraints. Examples of
synchronization constraints include specifying which data items should always
be up to date, and which may lag behind the most up-to-date data, but by no
more than a specified amount of time.
In this survey talk we will explore relevant research in the areas of view
maintenance, view update, and data integration. Time permitting, we will
discuss how the existing theory applies to the problem of data synchronization
and how we can exploit integrity constraints and auxiliary data to improve the
performance of existing algorithms.
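As a concrete illustration of view maintenance, here is a minimal sketch that
keeps a materialized aggregate view synchronized by propagating deltas from
updates to the underlying data, instead of recomputing the view from scratch.
The table, attribute, and function names are made up for illustration and are
not taken from the talk.

    # Illustrative sketch of incremental view maintenance: the materialized view
    # totals_by_product (SUM(amount) grouped by product) is kept in sync with the
    # underlying sales data by applying each update's delta directly to the view.

    from collections import defaultdict

    sales = []                              # underlying data: (product, amount) pairs
    totals_by_product = defaultdict(float)  # materialized view

    def insert_sale(product, amount):
        """Update the underlying data and propagate the delta to the view."""
        sales.append((product, amount))
        totals_by_product[product] += amount    # incremental maintenance

    def delete_sale(product, amount):
        sales.remove((product, amount))
        totals_by_product[product] -= amount    # propagate the negative delta

    insert_sale("widget", 10.0)
    insert_sale("widget", 5.0)
    insert_sale("gadget", 7.5)
    delete_sale("widget", 10.0)
    print(dict(totals_by_product))              # {'widget': 5.0, 'gadget': 7.5}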
February 21, 2001 (Study week, no class)
Week 8 - February 28, 2001 - Survey Talks
- The Overview of Web Search Engines, Sunny Lam (Presentation slides)
Abstract: The World Wide Web allows people to share information globally. The amount of information grows without bound. In order to extract the information we are interested in, we need a tool to search the Web; such a tool is called a search engine. This survey covers the different components of a search engine and how a search engine really works. It provides a background understanding of information retrieval, discusses different search engines that are commercially available, and investigates how search engines find information on the Web and how they rank pages for a given query. The paper also provides guidelines for users on how to use search engines.
- Consistency Control Algorithms for Web Caching, Leon Cao (Presentation slides)
Abstract: The World Wide Web is increasing exponentially in size, which leads
to rapidly increasing traffic on the Internet. Thus, reducing the volume of
network traffic produced by web clients and servers and improving response
time for users have become critical issues. The continued growth of the World
Wide Web has increased both network load and server load. Hence there is a
growing concern for effectively managing the bandwidth demands of users and
reducing the latency experienced by clients. Over the years, web caching has
become an increasingly important topic. The use of web caches has become a
cheap and effective way to improve performance for all Internet users. A web
cache sits between web servers and one or more clients, watches requests for
HTML pages, images, and files (generally, objects) go by, and saves a copy of
each for itself. Then, if there is another request for the same object, it
uses the copy it has instead of asking the original server for it again.
There are two main reasons that web caches are used:
- To reduce latency - Because the request is satisfied from the cache (which
  is closer to the client) instead of the original server, it takes less time
  for the client to get the object and display it. This makes web sites seem
  more responsive.
- To reduce traffic - Because each object is only retrieved from the server
  once, it reduces the amount of bandwidth used by a client. This saves money
  if the client is paying by traffic, and keeps their bandwidth requirements
  lower and more manageable.
However, web caching technology still has a lot of open issues. One of them is
that many web caches do not satisfactorily keep cached contents consistent
with web servers. How to ensure consistency between the contents in the cache
and those on the actual web server, how to check whether a cached page is
still fresh, and when it should be checked and refreshed if necessary: these
questions lead to the topic of cache consistency. Cache consistency protocols
for client/server database systems have been the subject of much study in
recent years, and at least a dozen different algorithms have been proposed and
studied in this area.
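As a concrete illustration of the simplest of these approaches, here is a
minimal sketch of time-to-live (TTL) expiration: a cached object is served
while it is considered fresh, and the cache goes back to the origin server once
the TTL lapses. The class, parameter, and callback names are illustrative
assumptions, not taken from the talk.

    # Illustrative sketch of TTL-based cache consistency: serve a cached copy
    # while it is fresh; refetch from the origin server once it has expired.

    import time

    class TTLCache:
        def __init__(self, ttl_seconds, fetch_from_origin):
            self.ttl = ttl_seconds
            self.fetch = fetch_from_origin      # callback that contacts the origin server
            self.store = {}                     # url -> (object, time it was cached)

        def get(self, url):
            entry = self.store.get(url)
            if entry is not None:
                obj, cached_at = entry
                if time.time() - cached_at < self.ttl:
                    return obj                  # still fresh: serve from the cache
            # Stale or missing: refresh the copy from the origin server.
            obj = self.fetch(url)
            self.store[url] = (obj, time.time())
            return obj

    cache = TTLCache(ttl_seconds=60,
                     fetch_from_origin=lambda url: "<html>page at %s</html>" % url)
    print(cache.get("http://example.org/index.html"))   # miss: fetched from the origin
    print(cache.get("http://example.org/index.html"))   # hit: served from the cache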
Week 9 - March 7, 2001 - No class
Week 10 - March 14, 2001 - Survey Talks
- Web Mining and Knowledge Discovery of Usage Patterns, Yan Wang (Presentation slides)
Abstract: Web mining is a very active research topic combining two active
research areas: data mining and the World Wide Web. Web mining research
relates to, and is therefore attractive to, several research communities, such
as databases, information retrieval, and artificial intelligence. Although
there is some confusion about the web mining process, the most widely
recognized approach is to categorize web mining into three areas: web content
mining, web structure mining, and web usage mining. Web content mining focuses
on discovering and retrieving useful information from web contents, data, and
documents, while web structure mining emphasizes discovering how to model the
underlying link structure of the Web. There isn't a very clear distinction
between these two categories. Web usage mining is a relatively independent,
but not isolated, category, which mainly covers techniques for discovering
users' usage patterns and trying to predict users' behaviour.
My talk will focus on web usage mining, encompassing its three phases:
pre-processing (either mapping the web server's usage data into relational
tables before applying data mining technology, or using the usage logs
directly after applying pre-processing techniques), pattern discovery, and
pattern analysis. I will give a brief introduction to an example of a web
usage mining system, and talk generally about some current work in this
research area.
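As a concrete illustration of the pre-processing phase, here is a minimal
sketch that groups raw web-server log entries into user sessions using the
common heuristic of a 30-minute inactivity timeout. The log format and field
names are assumptions made for illustration, not taken from the talk.

    # Illustrative sketch of sessionizing a web-server log: requests from the same
    # client are grouped into one session until a 30-minute gap starts a new one.

    from datetime import datetime, timedelta

    SESSION_TIMEOUT = timedelta(minutes=30)

    def sessionize(log_entries):
        """log_entries: (client_ip, timestamp, url) tuples sorted by timestamp.
        Returns a list of sessions, each a list of URLs requested by one client."""
        sessions = []
        last_seen = {}   # client_ip -> (timestamp of last request, index of open session)
        for ip, ts, url in log_entries:
            prev = last_seen.get(ip)
            if prev is None or ts - prev[0] > SESSION_TIMEOUT:
                sessions.append([])              # start a new session for this client
                last_seen[ip] = (ts, len(sessions) - 1)
            idx = last_seen[ip][1]
            sessions[idx].append(url)
            last_seen[ip] = (ts, idx)
        return sessions

    log = [
        ("10.0.0.1", datetime(2001, 3, 14, 9, 0),  "/index.html"),
        ("10.0.0.1", datetime(2001, 3, 14, 9, 5),  "/courses.html"),
        ("10.0.0.2", datetime(2001, 3, 14, 9, 10), "/index.html"),
        ("10.0.0.1", datetime(2001, 3, 14, 10, 0), "/index.html"),  # > 30 min: new session
    ]
    print(sessionize(log))
    # [['/index.html', '/courses.html'], ['/index.html'], ['/index.html']]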
- New networking architectures and their impact on DDBMS, Ning Zhang (Presentation slides)
Abstract: As I/O-intensive software, DBMSs have focused on how to make
efficient use of the cache, main memory, and disk storage in the storage
hierarchy. In the past decade, Distributed Database Management Systems
(DDBMSs) have introduced the computer network as another access medium.
Traditionally, computer networks were placed at a lower layer than disk
storage in the hierarchy. With the advent of high-speed networks and
low-overhead protocols, computer networks can have higher bandwidth than hard
disks. How to reflect these fundamental changes in DDBMS, or even traditional
DBMS, architecture remains an open question. In this report, I shall survey
the current status of high-speed networks, especially gigabit/gigabyte
networks, and the low-overhead protocols and architectures Fibre Channel and
InfiniBand. Different approaches for incorporating high-speed networks into
operating systems are also introduced.
Week 11 - March 21, 2001 - Research Presentations
- Exploiting Networks in Distributed Sorting and Relational Operations, Ning Zhang
- Temporal Analysis in Usage Pattern Discovery, Yan Wang
- Lease-Augmented Cache Consistency Algorithm and its Performance Estimation, Leon Cao
Week 12 - March 28, 2001 - Research Presentations
- The MultiText Query and Answering System, Sunny Lam
- Automatically Partitioning Client-Server Applications, Ivan Bowman
- Proposal for a System for Semantic Data Control in Distributed Database Environment, Lubomir Stanchev