Please note: I have prepared a list of students taking the course. As people pick topics, we will fill these. To find a partner,
Each week, we will tackle about three "lines of research". "Line of research" is a deliberately vague term that I use to represent, roughly, one approach to solving a particular problem. For example, different projects may propose different approaches to the same problem, and each project represents one "line of research". The objective is to study multiple aspects of a topic by considering different perspectives.
For each line of research, I will indicate the material that you need to read. Each topic will be presented by a group of two students (choose your partner yourself). Each presentation will be 30 minutes followed by 30 minutes of discussion (we may have to adjust these times based on enrollment). Here's how the mechanics will work (these have been adapted - sometimes taken verbatim - from the requirements of a course that Prof. Stan Zdonik taught at Brown):
One team will be responsible for presenting a summary of the topic based on the readings. This can largely be derived from the assigned readings, but you are encouraged to go beyond these to discover other interesting work within the same "line of research". Remember that the last thing in the world that we are looking for is a linear presentation of the sections in the papers. Part of the message should be a description of how you think that the topic at hand relates to Web data management (very broadly defined). This team will try to present the area in the best possible light. You guys are the cheerleaders for the approach.
Another team will be assigned the job of being the discussants. Discussants will present a short rebuttal to the presenters' talk. They will also come to class prepared with questions, counterexamples, and a generally crabby attitude toward the work. With any luck, this will set up a debate-like atmosphere in which we can argue about the pros and cons of the basic technologies.
The rest of you are not off the hook. You are expected to actively participate in the debate. Also, in order to ensure that you read the papers and think about the issues before coming to class, everyone who is not a presenter or a discussant will write a brief position paper that captures your own thoughts about the readings. These should be no longer than 2 pages and should reflect your views on the paper(s), not rehash their contents. Please note that I am expecting one position paper per presented paper (not one 2-page position paper for all three papers presented in one week).
It is unlikely that we will be able to accommodate everyone as a presenter; so each of you will either be a presenter or a discussant. If you are a presenter or a discussant, you will write a critique (see the guidelines) of the area/paper(s) and this will count towards one of your paper critique requirements.
Distributed data management fundamentals (architectures, data placement, query optimization)
- Chapters 1, 4, 5 (5.1 and 5.2), 7, 8, 9 from the principal reference.
- Course slides (PDF)
- Slides in 2-up handout format (PDF)
Distributed transaction processing, concurrency control, recovery, interoperability
- Chapters 10-12, 14 from the principal reference.
- Course slides (PDF)
- Slides in 2-up handout format (PDF)
Web data management fundamentals
- The first part of the class will be on fundamentals. Course slides. Slides in 2-up handout format (both in PDF).
- The second part of the class will be a guest lecture by Prof. Alberto Mendelzon. Talk slides (PDF).
Talk 1:
- Presenter: Lukasz Golab (Presentation notes)
- Paper: S. Chandrasekaran and M.J. Franklin, "Streaming Queries over Streaming Data", Proc. 28th International Conference on Very Large Data Bases, 2002. Download
- Discussant: Mohammad Ahmad Munawar (Presentation notes)
Talk 2:
- Presenter: Pei Man James She (Presentation notes)
- Paper: A. Datta, K. Dutta, H. Thomas, D. VanderMeer, Suresha, K. Ramamritham, "Proxy-Based Acceleration of Dynamically Generated Content on the World Wide Web: An Approach and Implementation", Proceedings of the ACM International Conference on Management of Data (SIGMOD), 2002. Download
- Discussant: Mohammed Abouzour (Presentation notes)
Talk 3:
- Presenter: Yiwen Huang (Presentation notes)
- Paper: Q. Luo, S. Krishnamurthy, C. Mohan, H. Pirahesh, H. Woo, B. G. Lindsay, and J. F. Naughton, "Middle-tier Database Caching for e-Business", Proceedings of the ACM International Conference on Management of Data (SIGMOD), 2002. Download
- Discussant: Joon Wang (Presentation notes)
Talk 4:
- Presenter: Hossein Sheikh Attar (Presentation notes)
- Paper: K. Yagoub, D. Florescu, V. Issarny, P. Valduriez, "Caching Strategies for Data-Intensive Web Sites", Proceedings of 26th International Conference on Very Large Data Bases (VLDB), pages 188-199, 2000. Download
- Discussant: Aziz Kara (Presentation notes)
Talk 5:
- Presenter: Milenko Petrovic (Presentation notes)
- Paper: F. Fabret, H. A. Jacobsen, F. Llirbat, J. Pereira, K. A. Ross, and D. Shasha, "Filtering Algorithms and Implementation for Very Fast Publish/Subscribe Systems", Proceedings of the ACM International Conference on Management of Data (SIGMOD), 2001. Download
- Discussant: Ioana Burcea (Presentation notes)
Talk 6:
- Presenter: Fei Yuan (Presentation notes)
- Paper: C. Bobineau, L. Bouganim, P. Pucheral, P. Valduriez, "PicoDBMS: Scaling Down Database Techniques for the Smartcard", The VLDB Journal, 10(2-3): 120-132, 2001. Download
- Discussant: Hesham Fahmy (Presentation notes)
Talk 7:
- Presenter: Weimin Li (Presentation notes)
- Paper: M. Altinel, M. J. Franklin, "Efficient Filtering of XML Documents for Selective Dissemination of Information", Proceedings of 26th International Conference on Very Large Data Bases (VLDB), pages 53-64, 2000. Download
- Discussant: Yutao Guo (Presentation notes)
Talk 8:
- Presenter: Huaxin Zhang (Presentation notes)
- Paper: P. Kalnis, W. S. Ng, B. C. Ooi, D. Papadias, K. L. Tan, "An Adaptive Peer-to-Peer Network for Distributed Caching of OLAP Results", Proceedings of the ACM International Conference on Management of Data (SIGMOD), 2002. Download
- Discussant: Yaya Yang (Presentation notes)
Talk 9:
- Presenter: Jin Xiao (Presentation notes)
- Paper: B. Yang, H. Garcia-Molina, "Comparing Hybrid Peer-to-Peer Systems", Proceedings of 27th International Conference on Very Large Data Bases (VLDB), 2001. Download
- Discussant: Wenli Liu (Presentation notes)
Talk 10:
- Presenter: Qiang Wang (Presentation notes)
- Paper: "Query Relaxation by Structure and Semantics for Retrieval of Logical Web Documents", IEEE Trans. Knowledge and Data Engineering, 14(4), July/August 2002. Download
- Discussant: Robert Neugebauer (Presentation notes)
Talk 11:
- Presenter: Yongjuan Zou (Presentation notes)
- Paper: M. Rodriguez and N. Roussopoulos, "MOCHA: A Self-Extensible Database Middleware System for Distributed Data Sources", Proceedings of the ACM International Conference on Management of Data (SIGMOD), 2000. Download
- Discussant: Yuhui Wen (Presentation notes)
Talk 12:
- Presenter: Catalin Visinescu (Presentation notes)
- Paper: D. Aksoy, M. J. Franklin, S. B. Zdonik, "Data Staging for On-Demand Broadcast", Proc. 27th International Conference on Very Large Data Bases (VLDB), 2001. Download
- Discussant: Philip Tilker (Presentation notes)
Talk 13:
- Presenter: Shabnam Surjintsingh Sobti (Presentation notes)
- Paper: O. Etzioni, "The World Wide Web: Quagmire or Gold Mine", Communications of the ACM, 39(11): 65-68, 1996. Download
- Discussant: We don't have a discussant for this paper.
Talk 14:
- Presenter: Jack Ng (Presentation notes)
- Paper: S. Raghavan, H. Garcia-Molina, "Crawling the Hidden Web", Proc. 27th International Conference on Very Large Data Bases (VLDB), 2001. Download
- Discussant: Xuhui Li (Presentation notes)
Talk 15:
- Presenter: Wei Xie (Presentation notes)
- Paper: A. Crespo, H. Garcia-Molina, "Routing Indices For Peer-to-Peer Systems", Proceedings of International Conference on Distributed Computing Systems (ICDCS), 2002. Download
- Discussant: Cyrus Tishan Mills (Presentation notes)
- Presentation 1: Join ordering heuristics for continuous queries over data streams, Lukasz Golab and Weimin Li
- Presentation 2: Towards the Scalability of Web Crawling Using Genetic Algorithms and Intelligent Agents, Shabnam S. Sobti and Mohammed Abouzour
- Presentation 3: Scalable Data Searching and Routing Strategies for Internet P2P Systems, Fei Yuan and Xuhui Li
- Presentation 4: Data Summary Techniques for Streaming Data in the Context of Non-Blocking Aggregate Operators, Aziz Kara and Tishan Cyrus Mills
- Presentation 5: Distributed multimedia on-demand streaming using P2P networks: Credit based management, Jin Xiao and Wenli Liu
- Presentation 6: A Hint Based Approach to Cache Dynamically Generated Content in Forward Proxies, Wei Xie, Yuhui Wen, and Mohammad Munawar
- Presentation 7: Data Redistribution in a Web-Based Distributed Database - The Spring Approach, Catalin Visinescu and Pei Man James She
- Presentation 8: Providing strong consistency in dynamic Web caching, Yaya Yang and Hossein S. Attar
- Presentation 9: Adding Semantic Capabilities to Publish/Subscribe Systems, Ioana Burcea and Milenko Petrovic
- Presentation 10: A Mobile Transactional Model for Mobile Hybrid Peer to Peer Networks, Yiwen Huang and Hesham Fahmy
- Presentation 11: Volatility-based Indexing Strategy for the Hidden Web, Jack H.W. Ng and Joon Wong
- Presentation 12: Semantic Relevance Feedback for Question Answering System, Huaxin Zhang and Yongjuan Zou
- Presentation 13: A Caching Strategy on Web Question Answering Systems, Qiang Wang and Yutao Guo
- Presentation 14: Aggregation and Resources in a Distributed Environment, Robert Neugebauer and Philip Tilker
University of Waterloo |
Computer Science |
M.T. Özsu |
CS 856 Home Page |