CS 848
Advanced Topics in Databases -
Big Data Management Platforms
(Winter 2019)
M. Tamer Özsu
tamer.ozsu@uwaterloo.ca
DC 3350
Lecture times: Tuesday 9:00-11:50AM
Lecture location: DC 2568
Calendar Description
This is a seminar course that will cover the advances in big data processing and the platforms that have been designed for this purpose.
References
There is no required textbook, but I will provide (online) some chapters of the upcoming 4th edition of Principles of Distributed Database Systems by M. Tamer Özsu and Patrick Valduriez.
For the most part, we will read papers from the literature. For access to papers, you can consult the DBLP online bibliography, the ACM Digital Library, and IEEE Xplore.
Classes
This is a seminar course, and as such most of our time will be spent on reading, presenting and discussing recent publications. The list of papers is here, but you can suggest other papers you might be interested in -- just let me know. The schedule will be roughly as follows (week numbers include the reading week):
- Week 1: I will give one lecture to establish a common ground and to give us time to get organized.
- There is a draft chapter of a forthcoming book that would be good background. Log in to LEARN and you will see it on the home page under "About this course". I welcome feedback on the chapter, but please do not distribute it or post it anywhere.
- The slides I will use are here.
- Weeks 2-11: We discuss three papers per week (total 27 papers) -- see below for details.
- Weeks 12-13: Project presentations
Workload and Evaluation (tentative and may change depending on class size)
- Class presentation (20%)
- Each student will present two papers in class, critique each one, and lead the Q&A on it (25 minutes of presentation plus 20 minutes of Q&A). Each of these presentations is worth 10% of the mark.
- We will spend 9 weeks on paper presentations and discussion, which gives us 27 papers to discuss. If the class size does not allow two presentations per student, some students will be asked to undertake another task to account for that 10%.
- Each talk is supposed to be an in-depth description and analysis of the paper. Make sure you read and understand the paper fully, as well as any background you need. The presentations should not be hand-wavy but deep and instructive, demonstrating that you have fully understood all aspects of the paper.
- The evaluation form (see below) is a good guide in preparing your presentations.
- It is the responsibility of the presenter to fully explain each paper. Therefore, you are expected to know and understand all the aspects of the material, which may require you to do additional background reading.
- Please submit the presentation slides by 7pm on the Monday prior to your presentation.
- See schedule (which will be filled in as we go along)...
- Presentation evaluations (10%)
- Each student has to fill out an evaluation form for each presentation by 5pm on the Thursday of the week in which the presentation is given. You can download the PDF or Word version of the form.
- Paper reviews (20%)
- All students are expected to read all three papers before the class and to submit a review for two of them (of the student's choice) by 7pm on the Monday before the lecture.
- Class participation (10%)
- Participation in the discussions in class (we'll have about 20 minutes of discussion for each paper).
- Since this is a seminar course, this discussion is very important. Although the presenter will lead the discussion, my expectation is that everyone in class will participate.
- Term project (40%)
- Done in groups of two.
- Each group needs to find a topic to research and write a report.
- See here for further details.
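As a rough illustration of how the weights above combine, here is a minimal sketch in Python. The component names and the `final_mark` helper are made up for this example, and it assumes each component is scored out of 100; the weights themselves are the tentative ones listed above and may change.

```python
# Tentative weights (percent of the final mark), as listed above.
# The dictionary keys are illustrative names, not official ones.
WEIGHTS = {
    "presentations": 20,   # two presentations at 10% each
    "evaluations": 10,
    "reviews": 20,
    "participation": 10,
    "project": 40,
}


def final_mark(scores):
    """Combine per-component scores (each assumed to be out of 100)
    into a final mark out of 100 using the weights above."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    example = {
        "presentations": 85,
        "evaluations": 100,
        "reviews": 90,
        "participation": 80,
        "project": 75,
    }
    print(final_mark(example))
```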
Submission of Course material
- I have set up a dropbox folder for you to drop things into; drop everything here.
- Everything should be submitted in PDF format.
- Please name your files as follows: <week_no>-<last_name>-<type><number>.pdf where <type> is one of {eval, review, slides}. "review" is for paper reviews, "eval" is for presentation evaluations, and "slides" is for presentation slides when it is your turn; <number> is obvious.
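The naming rule above can be sketched as a small Python helper. This is only an illustration: the function names are hypothetical, and it assumes that <week_no> and <number> are plain integers without zero padding and that <number> counts your items of that type within the week.

```python
import re

# Allowed values for <type>, taken from the naming rule above.
ALLOWED_TYPES = ("eval", "review", "slides")

# <week_no>-<last_name>-<type><number>.pdf
# Assumption: <week_no> and <number> are plain integers.
_NAME_PATTERN = re.compile(r"(\d+)-(.+)-(eval|review|slides)(\d+)\.pdf")


def submission_filename(week_no, last_name, kind, number):
    """Build a file name that follows the convention above."""
    if kind not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {ALLOWED_TYPES}, got {kind!r}")
    return f"{week_no}-{last_name}-{kind}{number}.pdf"


def is_valid_submission_name(name):
    """Return True if the file name matches the convention above."""
    return _NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    print(submission_filename(3, "smith", "review", 1))
    print(is_valid_submission_name("3-smith-review1.pdf"))
```

For example, under these assumptions a student named Smith submitting a first paper review in week 3 would name the file 3-smith-review1.pdf.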
Course Outline
- Big Data platform review
- Distributed data stores
- Main memory systems
- MapReduce-based data management
- Streaming data management
- Graph processing
- Machine learning for big data analytics
- RDF data management
Schedule
The weekly schedule is here.
Administrative details
Please review the materials concerning academic integrity and academic honesty.