Weekly Schedule

The following is the weekly schedule for the course. We will complete the study of "classical" distributed database systems in the first two weeks. These are the only weeks when I will be lecturing. The remainder of the time will be devoted to discussing more recent topics and your projects.

Please note: I have prepared a list of students taking the course. As people pick topics, we will fill them in. To find a partner,

Lecture contents

The course will be divided into three parts. In the first part, I will lecture on the fundamentals of distributed data management. This part will last only about two weeks. The second part of the course will consist of paper reviews and discussions. The third part of the course will be devoted to presentations of research projects.

Mechanics

Each of you is responsible for picking a paper from the readings that you wish to present. The presentation should go beyond the paper, giving the background and explaining how the paper relates to other work. Note that the last thing I am looking for is a linear presentation of the sections in the paper. These presentations should be 20-30 minutes long. You are responsible for preparing the presentation slides and making them available to me by noon on the Monday of the week you will be presenting. These slides will be put online for everyone.

This presentation will be followed by a discussion of the paper (for about 30 minutes). Everyone is expected to participate actively in the debate (note that part of the final mark is devoted to class participation). Consequently, everyone should come to class prepared with questions, counterexamples, and even suggested improvements. With any luck, this will set up a debate-like atmosphere in which we can argue about the pros and cons of the basic technologies.

You might find the short brochure "Efficient Reading of Papers in Science and Technology" by Michael J. Hanson and updated by D. McNamee very useful.

Paper Critiques

Everyone will write one paper critique per week. You can choose which paper you write a critique on (of course, presenters are expected to write critiques of the papers they present). I will set up an online system for entering these critiques. Think of these primarily as conference paper reviews: try to identify the strengths and weaknesses of the paper and how it can be improved. The reviews/critiques should be about 2 pages in length.

For paper critiques, the following (relatively old) paper should be quite useful: A.J. Smith, The Task of the Referee, IEEE Computer, April 1990.

Week 1 - January 5, 2005

Course organization

Week 2 - January 12, 2005

"Classical" distributed database issues

Week 3 - January 19, 2005

Web data management fundamentals

Week 4 - January 26, 2005 - Paper Discussions (Web Querying/Searching)

Paper 1: S. Raghavan and H. Garcia-Molina, Representing Web Graphs, In Proc. Int. Conf. Data Eng. (ICDE), 2003.

Paper 2: M. Kobayashi, K. Takeda, Information Retrieval on the Web, ACM Computing Surveys, 32(2), June 2000.

Paper 3: A. Arasu, J. Cho, H. Garcia-Molina, A. Paepcke, and S. Raghavan, Searching the Web, ACM Trans. Internet Tech., 1(1), 2001.

Week 5 - February 2, 2005 - Paper Discussions

I am out of town, so no class this week.

Week 6 - February 9, 2005 - Paper Discussions (Web Querying/Searching)

Paper 4: J. Cho, H. Garcia-Molina, and L. Page, Efficient crawling through URL ordering, In Proc. 7th World Wide Web Conference (WWW7), 1998. Published as Computer Networks, 30(1-7), April 1998.

Paper 5: S. Chakrabarti, M. van den Berg, and B. Dom, Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery, Computer Networks, 31(11-16), 1999.

Paper 6: P. G. Ipeirotis, L. Gravano, Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection, In Proc. 28th Int. Conf. on Very Large Data Bases (VLDB), 2002.

Week 7 - February 16, 2005 - Paper Discussions (Web Querying/Searching)

Paper 7: R. Braumandl, M. Keidl, A. Kemper, D. Kossmann, A. Kreutz, S. Seltzsam, K. Stocker, ObjectGlobe: Ubiquitous query processing on the Internet, VLDB Journal, 10(1): 48-71, 2001.

Paper 8: S. Lam and M.T. Özsu, Querying Web Data - The WebQA Approach, Proc. 3rd Int. Conf. on Web Information Systems Engineering (WISE), 2002.

Paper 9: E. Agichtein, S. Lawrence, and L. Gravano, Learning to find answers to questions on the Web, ACM Trans. Internet Tech., 4(2), May 2004.

Week 8 - February 23, 2005 - Paper Discussions (Web Querying/Searching)

Paper 10: M. Ouzzani and A. Bouguettaya, Query Processing and Optimization on the Web, Distributed and Parallel Databases, 15, 2004.

Paper 11: A. Kemper, C. Wiesner, Hyperqueries: Dynamic Distributed Query Processing on the Internet, In Proc. 27th Int. Conf. on Very Large Data Bases (VLDB), 2001.

Paper 12: J-M. Bremer and M. Gertz, Integrating Document and Data Retrieval Based on XML, VLDB Journal, to appear in 2005.

Week 9 - March 2, 2005 - Paper Discussions (P2P)

Paper 13: Y. Chawathe, S. Ratnasamy, L. Breslau, N. Lanham, S. Shenker, "Making Gnutella-like P2P Systems Scalable", In Proc. ACM SIGCOMM Conf. on Applications, Technologies, Architectures, and Protocols for Computer Communication, 2003.

Paper 14: B. Yang, H. Garcia-Molina, Comparing Hybrid Peer-to-Peer Systems, In Proc. of 27th International Conference on Very Large Data Bases (VLDB), 2001.

Paper 15: I. Stoica, R. Morris, D. Karger, M. Frans Kaashoek, H. Balakrishnan, Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications, In Proc. ACM SIGCOMM Conf. on Applications, Technologies, Architectures, and Protocols for Computer Communication, 2001. Expanded version appears in IEEE/ACM Trans. Networking, 11(1), February 2003.

Week 10 - March 9, 2005 - Paper Discussions (P2P)

Paper 16: S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker, A Scalable Content-Addressable Network, In Proc. ACM SIGCOMM Conf. on Applications, Technologies, Architectures, and Protocols for Computer Communication, 2001.

Paper 17: N.J.A. Harvey, M.B. Jones, S. Saroiu, M. Theimer, and A. Wolman, SkipNet: A Scalable Overlay Network with Practical Locality Properties, In Proc. 4th USENIX Symp. on Internet Tech. and Syst. (USITS), 2003.

Paper 18: A. Kementsietsidis, M. Arenas, and R. Miller, Mapping Data in Peer-to-Peer Systems: Semantics and Algorithmic Issues, In Proc. ACM SIGMOD Int. Conf. on Management of Data, 2003.

Week 11 - March 16, 2005 - Paper Discussions (P2P)

Paper 19: R. Huebsch, J.M. Hellerstein, N. Lanham, B.T. Loo, S. Shenker, I. Stoica, Querying the Internet with PIER, In Proc. 29th Int. Conf. on Very Large Data Bases (VLDB), 2003.

Paper 20: B. Gedik and L. Liu, PeerCQ: A Decentralized and Self-Configuring Peer-to-Peer Information Monitoring System, In Proc. 23rd Int. Conf. on Distributed Computing Systems (ICDCS), 2003.

Paper 21: W.S. Ng, B. C. Ooi, K-L Tan, and A. Zhou, PeerDB: A P2P-based System for Distributed Data Sharing, In Proc. 19th Int. Conf. on Data Eng. (ICDE), 2003.

Week 12 - March 23, 2005 - Paper Discussions (P2P)

Paper 22: I. Brunkhorst, H. Dhraief, A. Kemper, W. Nejdl, C. Wiesner, Distributed Queries and Query Optimization in Schema-Based P2P-Systems, In Proc. Int. Workshop on Databases, Information Systems and Peer-to-Peer Computing, September 2003.

Paper 23: L. Galanis, Y. Wang, S.R. Jeffery, D.J. DeWitt, Locating Data Sources in Large Distributed Systems, In Proc. 29th Int. Conf. on Very Large Data Bases (VLDB), 2003.

Paper 24: T. Stading, P. Maniatis, and M. Baker, Peer-to-peer caching schemes to address flash crowds, In Proc. 1st Int. Workshop on Peer-to-Peer Systems (IPTPS), 2002.

Week 13 - March 30, 2005 - Paper Discussions (P2P) & Research Presentations

Paper 25: M. Harren, J. M. Hellerstein, R. Huebsch, B. T. Loo, S. Shenker, and I. Stoica, Complex Queries in DHT-based Peer-to-Peer Networks, In Proc. 1st Int. Workshop on Peer-to-Peer Systems (IPTPS), 2002.

Research Presentation 1: Aseem Chema & Amr El-Helw

Research Presentation 2: Nabeel Ahmed & David Hadaller

Research Presentation 3: Issam Al-Azzoni & E. Cem Sozgen

Research Presentation 4: Rolando Blanco & Mohamed Ali Saliman

Research Presentation 5: Herman Li & Alex Sung

Research Presentation 6: Ali Taleghani & Yasemin Ugur-Ozekinci

University of Waterloo
Computer Science
M.T. Özsu
CS 856 Home Page