The course focuses on assessing and enforcing the quality of large data sets. Topics include constraint discovery, data cleaning, supervised and unsupervised repair algorithms, and various pragmatic data quality issues.
The course starts with background lecture(s) that give an overview of the topics of interest. The rest of the course consists of in-class presentations of key papers on each topic. Projects will be demonstrated in a poster/demo session at the end of the term.
You will write a review for three papers every week, unless you are the presenter for one of the papers we are discussing. I will assign the papers to the students for review. You should send your review to me as the body of a plain-text e-mail message (not an HTML message, and not an attachment) by 11:59 PM on the Monday before we discuss the paper. The subject line of your message should be "CS848 - Review of paper k", where k is the paper number (e.g., k = 1.2.1 is paper number 1 in section 1.2). We will use this review form.
You will come up with a research-oriented project related to the topics discussed in class. The project can be: (1) a proposed solution to a new research problem inspired by the work and subjects covered in class, or (2) an experimental case study of one or more previously proposed solutions. Teams of one or two students can be formed.