Weekly Schedule

The following is the weekly schedule for the course. We will complete the study of "classical" distributed database systems in the first two weeks. These are the only weeks when I will be lecturing. The remainder of the time will be devoted to discussing more recent topics and your projects.

Please note: I have prepared a list of students taking the course. As people pick topics, we will fill these. To find a partner,

Lecture contents

The course will be divided into three parts. In the first part, I will lecture on the fundamentals of distributed data management. This will be only for two weeks or so. The second part of the course will be paper reviews and discussions. The third part of the course will be devoted to presentation of research projects.

Mechanics

Each of you is responsible for picking a paper from the readings that you wish to present. The presentation should go beyond the paper, giving the background and showing how it relates to other work. Note that the last thing I am looking for is a linear presentation of the sections in the papers. These presentations should be 20-30 minutes. You are responsible for preparing the presentation slides and making them available to me by noon on the Monday of the week you will be presenting. These slides will be put online for everyone.

This presentation will be followed by a discussion of the paper (for about 30 minutes). Everyone is expected to actively participate in the debate (note that part of the final mark is devoted to class participation). Consequently, everyone should come to class prepared with questions, counterexamples, and even suggested improvements. With any luck, this will set up a debate-like atmosphere in which we can argue about the pros and cons of the basic technologies.

You might find the short brochure "Efficient Reading of Papers in Science and Technology" by Michael J. Hanson and updated by D. McNamee very useful.

Paper Critiques

Everyone will write one paper critique per week. You can choose which paper you write a critique on (of course, presenters are expected to write critiques of the papers they present). I will set up an on-line system for entering these critiques. Primarily, you should think of these as paper reviews for a conference and try to identify the strengths and weaknesses of the paper and how it can be improved. The reviews/critiques should be about 2 pages in length.

For paper critiques, the following (relatively old) paper should be quite useful: A.J. Smith, The Task of the Referee, IEEE Computer, April 1990.

Week 1 - September 14, 2005

Course organization

"Classical" distributed database issues

Week 2 - September 21, 2005

Review of topics covered in past term

Week 3 - September 28, 2005 - Paper Discussions (Web Data Integration)

Paper W1: K. C.-C. Chang, B. He, and Z. Zhang. Toward Large Scale Integration: Building a MetaQuerier over Databases on the Web, In Proc. 2nd Conf. on Innovative Data Systems Research (CIDR 2005), January 2005.

Paper W2: S. Melnik, H. Garcia-Molina, and E. Rahm. Similarity Flooding: A Versatile Graph Matching Algorithm, In Proc. Int. Conf. on Data Engineering, 2002.

Paper W3: W. Wu, C. Yu, A. Doan, and W. Meng. An Interactive Clustering-based Approach to Integrating Source Query Interfaces on the Deep Web, In Proc. ACM SIGMOD Int. Conf. on Management of Data, 2004.

Week 4 - October 5, 2005 - Paper Discussions (Data Streams)

Paper S1: R. Motwani, J. Widom, A. Arasu, B. Babcock, S. Babu, M. Datar, G. Manku, C. Olston, J. Rosenstein, and R. Varma. Query processing, approximation, and resource management in a data stream management system. In Proc. 1st Biennial Conf. on Innovative Data Syst. Res., pages 245-256, 2003.

Paper S2: D. Abadi, D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, and S. Zdonik. Aurora: A new model and architecture for data stream management. The VLDB Journal, 12(2):120-139, Aug 2003.

Paper S3: J. Li, D. Maier, K. Tufte, V. Papadimos, and P. Tucker. Semantics and evaluation techniques for window aggregates in data streams. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 311-322, 2005.

Week 5 - October 12, 2005 - Paper Discussions (Web Data Integration)

Paper W4: A. Kementsietsidis, M. Arenas, and R. J. Miller. Mapping Data in Peer-to-Peer Systems: Semantics and Algorithmic Issues, In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 325-336, 2003.

Paper W5: B. He, K. C.-C. Chang, and J. Han. Discovering Complex Matchings across Web Query Interfaces: A Correlation Mining Approach, In Proc. ACM SIGKDD Conference (KDD 2004), pages 148-157, 2004.

Paper W6: A. Doan, P. Domingos, and A. Halevy. Learning to Match the Schemas of Databases: A Multistrategy Approach, Machine Learning, 50(3):279-301, 2003.

Week 6 - October 19, 2005 - Paper Discussions (Data Streams)

Paper S4: L. Golab and M. T. Ozsu. Processing sliding window multi-joins in continuous queries over data streams. In Proc. 29th Int. Conf. on Very Large Data Bases, pages 500-511, 2003.

Paper S5: A. Arasu and J. Widom. Resource sharing in continuous sliding-window aggregates. In Proc. 30th Int. Conf. on Very Large Data Bases, pages 336-347, 2004.

Paper S6: B. Babcock, S. Babu, M. Datar, and R. Motwani. Chain: Operator scheduling for memory minimization in data stream systems. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 253-264, 2003.

Week 7 - October 26, 2005 - Paper Discussions (Web Data Integration)

Paper W7: R. McCann, B. K AlShebli, Q. Le, H. Nguyen, L. Vu, A. Doan. Mapping Maintenance for Data Integration Systems. In Proc. 31st Int. Conf. on Very Large Data Bases, 2005.

Paper W8: E. Rahm, and P. A. Bernstein. A survey of approaches to automatic schema matching, The VLDB Journal, 10(3): 334-350, 2001.

Paper W9: Zachary G. Ives, Daniela Florescu, Marc Friedman, Alon Levy, Daniel S. Weld. An Adaptive Query Execution System for Data Integration, In Proc. ACM SIGMOD Int. Conf. on Management of Data, 1999.

Week 8 - November 2, 2005 - Paper Discussions (Data Streams)

Paper S7: R. Avnur and J. Hellerstein. Eddies: Continuously adaptive query processing. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 261-272, 2000.

Paper S8: A. Ayad and J. Naughton. Static optimization of conjunctive queries with sliding windows over unbounded streaming information sources. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 419-430, 2004.

Paper S9: S. Chandrasekaran and M. Franklin. Streaming queries over streaming data. In Proc. 28th Int. Conf. on Very Large Data Bases, pages 203-214, 2002.

Week 9 - November 9, 2005 - Paper Discussions (Web Data Integration)

Paper W10: A. Y. Levy, A. Rajaraman, and J. J. Ordille. Querying Heterogeneous Information Sources Using Source Descriptions, In Proc. Int. Conf. on Very Large Data Bases, 1996.

Paper W11: M. Lenzerini. Data Integration - A Theoretical Perspective, In Proc. ACM Symp. on Principles of Database Systems, 2002.

Paper W12: S. Chaudhuri, K. Ganjam, V. Ganti, and R. Motwani. Robust and Efficient Fuzzy Match for Online Data Cleaning. In Proc. ACM SIGMOD Int. Conf. on Management of Data, 2003.

Week 10 - November 16, 2005 - Paper Discussions (Data Streams)

Paper S10: S. Babu, R. Motwani, K. Munagala, I. Nishizawa, and J. Widom. Adaptive ordering of pipelined stream filters. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 407-418, 2004.

Paper S11: J-H. Hwang, M. Balazinska, A. Rasin, U. Cetintemel, M. Stonebraker, and S. Zdonik. High-availability algorithms for distributed stream processing. In Proc. 21st Int. Conf. on Data Engineering, pages 779-790, 2005.

Paper S12: N. Shivakumar and H. Garcia-Molina. Wave-indices: indexing evolving databases. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 381-392, 1997.

Week 11 - November 23, 2005 - Paper Discussions (Web Data Integration)

Paper W13: L. M. Haas, D. Kossmann, E. L. Wimmers, and J. Yang. Optimizing Queries Across Diverse Data Sources, In Proc. Int. Conf. on Very Large Data Bases, pages 276-285, 1997.

Paper W14: R. Fagin, A. Lotem, M. Naor. Optimal aggregation algorithms for middleware, In Proc. 20th ACM SIGACT-SIGMOD-SIGART Symp. Principles of Database Systems, 2001.

Paper W15: A. Y. Halevy, O. Etzioni, A. Doan, Z. G. Ives, J. Madhavan, L. McDowell, and I. Tatarinov. Crossing the Structure Chasm, In Proc. Conf. on Innovative Data Systems Research (CIDR), 2003.

Week 12 - November 30, 2005 - Paper Discussions (Data Streams)

Paper S13: M. Balazinska, H. Balakrishnan, S. Madden, and M. Stonebraker. Fault tolerance in the Borealis distributed stream processing system. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 13-24, 2005.

Paper S14: C. Estan and G. Varghese. New directions in traffic measurement and accounting. In Proc. SIGCOMM Conference, pages 323-336, 2002.

Paper S15: D. Kifer, S. Ben-David, J. Gehrke. Detecting change in data streams. In Proc. 30th Int. Conf. on Very Large Data Bases, pages 180-191, 2004.


University of Waterloo
Computer Science
M. Tamer Özsu
CS 856 Home Page