Please note: I have prepared a list of students taking the course. As people pick topics, we will fill it in. To find a partner,
The course will be divided into three parts. In the first part, I will lecture on the fundamentals of distributed data management; this will last only about two weeks. The second part of the course will consist of paper reviews and discussions. The third part will be devoted to presentations of research projects.
Each of you is responsible for picking a paper from the readings that you wish to present. The presentation should go beyond the paper, giving the background and explaining how the paper relates to other work. Note that the last thing I am looking for is a linear walk through the sections of the paper. Presentations should last 20 to 30 minutes. You are responsible for preparing the presentation slides and making them available to me by noon on the Monday of the week in which you present. These slides will be put online for everyone.
This presentation will be followed by a discussion of the paper lasting about 30 minutes. Everyone is expected to participate actively in the debate (note that part of the final mark is devoted to class participation). Consequently, everyone should come to class prepared with questions, counterexamples, and even suggested improvements. With any luck, this will set up a debate-like atmosphere in which we can argue about the pros and cons of the basic technologies.
You might find the short brochure "Efficient Reading of Papers in Science and Technology" by Michael J. Hanson, updated by D. McNamee, very useful.
Everyone will write one paper critique per week. You can choose which paper to critique (of course, presenters are expected to write critiques of the papers they present). I will set up an online system for entering these critiques. You should think of these primarily as conference paper reviews: identify the strengths and weaknesses of the paper and how it can be improved. Each review/critique should be about 2 pages long.
For paper critiques, the following (relatively old) paper should be quite useful: A.J. Smith, The Task of the Referee, IEEE Computer, April 1990.
Course organization
"Classical" distributed database issues
- Chapters 1, 4, 5 (5.1 and 5.2), 7, 8, 9, 10-12 from the principal reference.
- Course slides (PDF)
- Slides in 3-up handout format (PDF)
Web data management fundamentals
- Read the "General Reading" material and be prepared to discuss these as the background.
Paper 1: S. Raghavan and H. Garcia-Molina, Representing Web Graphs, In Proc. Int. Conf. Data Eng. (ICDE), 2003.
- Presenter: Amr El-Helw (Presentation Slides)
Paper 2: M. Kobayashi, K. Takeda, Information Retrieval on the Web, ACM Computing Surveys, 32(2), June 2000.
- Presenter: Vahid Karimi (Presentation Slides)
Paper 3: A. Arasu, J. Cho, H. Garcia-Molina, A. Paepcke, and S. Raghavan, Searching the Web, ACM Trans. Internet Tech., 1(1), 2001.
- Presenter: Ali Taleghani (Presentation Slides)
I am out of town, so no class this week.
Paper 4: J. Cho, H. Garcia-Molina and L. Page, Efficient crawling through URL ordering, In Proc. 7th World Wide Web Conference (WWW7), 1998. Published as Computer Networks, 30(1-7), April 1998.
- Presenter: Issam Al-Azzoni (Presentation Slides)
Paper 5: S. Chakrabarti, M. van den Berg, and B. Dom, Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery, Computer Networks, 31(11-16), 1999.
- Presenter: Mohamed Ali Soliman (Presentation Slides)
Paper 6: P. G. Ipeirotis, L. Gravano, Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection, Proc. 28th International Conference on Very Large Data Bases (VLDB), 2002.
- Presenter: Amr El-Helw (Presentation Slides)
Paper 7: R. Braumandl, M. Keidl, A. Kemper, D. Kossmann, A. Kreutz, S. Seltzsam, K. Stocker, ObjectGlobe: Ubiquitous query processing on the Internet, VLDB Journal, 10(1): 48-71, 2001.
- Presenter: Yasemin Ugur-Ozekinci (Presentation Slides)
Paper 8: S. Lam and M.T. Özsu, Querying Web Data - The WebQA Approach, Proc. 3rd Int. Conf. on Web Information Systems Engineering (WISE), 2002.
- Presenter: E. Cem Sozgen (Presentation Slides)
Paper 9: E. Agichtein, S. Lawrence, and L. Gravano, Learning to find answers to questions on the Web, ACM Trans. Internet Tech., 4(2), May 2004.
- Presenter: Aseem Chema (Presentation Slides)
Paper 10: M. Ouzzani and A. Bouguettaya, Query Processing and Optimization on the Web, Distributed and Parallel Databases, 15, 2004.
- Presenter: Issam Al-Azzoni (Presentation Slides)
Paper 11: A. Kemper, C. Wiesner, Hyperqueries: Dynamic Distributed Query Processing on the Internet, Proc. 27th Int. Conference on Very Large Data Bases (VLDB), 2001.
- Presenter: Yasemin Ugur-Ozekinci (Presentation Slides)
Paper 12: J-M. Bremer and M. Gertz, Integrating Document and Data Retrieval Based on XML, VLDB Journal, to appear in 2005.
- Presenter: E. Cem Sozgen (Presentation Slides)
Paper 13: Y. Chawathe, S. Ratnasamy, L. Breslau, N. Lanham, S. Shenker, Making Gnutella-like P2P Systems Scalable, In Proc. ACM SIGCOMM Conf. on Applications, Technologies, Architectures, and Protocols for Computer Communication, 2003.
- Presenter: Herman Li (Presentation Slides)
Paper 14: B. Yang, H. Garcia-Molina, Comparing Hybrid Peer-to-Peer Systems, In Proc. of 27th International Conference on Very Large Data Bases (VLDB), 2001.
- Presenter: Alex Sung (Presentation Slides)
Paper 15: I. Stoica, R. Morris, D. Karger, M. Frans Kaashoek, H. Balakrishnan, Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications, In Proc. ACM SIGCOMM Conf. on Applications, Technologies, Architectures, and Protocols for Computer Communication, 2001. Expanded version appears in IEEE/ACM Trans. Networking, 11(1), February 2003.
- Presenter: Nabeel Ahmed (Presentation Slides)
Paper 16: S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker, A Scalable Content-Addressable Network, In Proc. ACM SIGCOMM Conf. on Applications, Technologies, Architectures, and Protocols for Computer Communication, 2001.
- Presenter: Alex Sung (Presentation Slides)
Paper 17: N.J.A. Harvey, M.B. Jones, S. Saroiu, M. Theimer, and A. Wolman, SkipNet: A Scalable Overlay Network with Practical Locality Properties, In Proc. 4th USENIX Symp. on Internet Tech. and Syst. (USITS), 2003.
- Presenter: David Hadaller (Presentation Slides)
Paper 18: A. Kementsietsidis, M. Arenas, and R. Miller, Mapping Data in Peer-to-Peer Systems: Semantics and Algorithmic Issues, In Proc. ACM SIGMOD Int. Conf. on Management of Data, 2003.
- Presenter: Rolando Blanco (Presentation Slides)
Paper 19: R. Huebsch, J.M. Hellerstein, N. Lanham, B.T. Loo, S. Shenker, I. Stoica, Querying the Internet with PIER, In Proc. 29th Int. Conf. on Very Large Data Bases (VLDB), 2003.
- Presenter: Nabeel Ahmed (Presentation Slides)
Paper 20: B. Gedik and L. Liu, PeerCQ: A Decentralized and Self-Configuring Peer-to-Peer Information Monitoring System, In Proc. 23rd Int. Conf. on Distributed Computing Systems (ICDCS), 2003.
- Presenter: Herman Li (Presentation Slides)
Paper 21: W.S. Ng, B.C. Ooi, K.-L. Tan, and A. Zhou, PeerDB: A P2P-based System for Distributed Data Sharing, In Proc. 19th Int. Conf. on Data Eng. (ICDE), 2003.
- Presenter: Aseem Chema (Presentation Slides)
Paper 22: I. Brunkhorst, H. Dhraief, A. Kemper, W. Nejdl, C. Wiesner, Distributed Queries and Query Optimization in Schema-Based P2P-Systems, In Proc. Int. Workshop On Databases, Information Systems and Peer-to-Peer Computing, September 2003.
- Presenter: Mohamed Ali Soliman (Presentation Slides)
Paper 23: L. Galanis, Y. Wang, S.R. Jeffery, D.J. DeWitt, Locating Data Sources in Large Distributed Systems, In Proc. 29th Int. Conf. on Very Large Data Bases (VLDB), 2003.
- Presenter: Rolando Blanco (Presentation Slides)
Paper 24: T. Stading, P. Maniatis, and M. Baker, Peer-to-peer caching schemes to address flash crowds, In Proc. 1st Int. Workshop on Peer-to-Peer Systems (IPTPS), 2002.
- Presenter: David Hadaller (Presentation Slides)
Paper 25: M. Harren, J.M. Hellerstein, R. Huebsch, B.T. Loo, S. Shenker, and I. Stoica, Complex Queries in DHT-based Peer-to-Peer Networks, In Proc. 1st Int. Workshop on Peer-to-Peer Systems (IPTPS), 2002.
- Presenter: Ali Taleghani (Presentation Slides)
Research Presentation 1: Aseem Chema & Amr El-Helw
Research Presentation 2: Nabeel Ahmed & David Hadaller
Research Presentation 3: Issam Al-Azzoni & E. Cem Sozgen
Research Presentation 4: Rolando Blanco & Mohamed Ali Soliman
Research Presentation 5: Herman Li & Alex Sung
Research Presentation 6: Ali Taleghani & Yasemin Ugur-Ozekinci
University of Waterloo | Computer Science | M.T. Özsu | CS 856 Home Page