Topics in Database Systems

CS 848, Waterloo, Spring '10


  Instructor:   David Toman (david@uwaterloo.ca)
  Lectures:     Wednesday 4:00-6:30 pm MC2036A
  Office:       DC 3344, x34777
  Class Info:   http://cs.uwaterloo.ca/~david/cs848s10/


Summary:

The class will focus on issues connected with conceptual vs. physical database design and on how to execute queries/updates over advanced physical designs of databases.


Lecture Outlines:

  • Week 1: Introduction, Organization, etc.

    Reading for weeks 1 and 2:

  • Week 2: no class (presenting paper at KR 2010)

  • Week 3: Introduction, Goals, and Current Practice.

    Current approaches to physical design closely follow the conceptual design: e.g., creating base files for all tables, adding indices, etc. New applications and performance requirements have led to the introduction of additional physical structures, e.g., materialized views, but query optimization technology has fallen behind, typically using only ad-hoc techniques for including materialized views in query plans. The lecture will survey current practices, identify their weaknesses, and outline possible solutions. It will also introduce the unifying theme for the remaining lectures: the development of a uniform and integrated approach to physical design that is decoupled from conceptual schemes, and to query compilation and optimization in this setting.

  • Week 4. Physical Design and Schema Languages.

    How do we describe actual physical designs and how do we link them to a conceptual view of the data? The lecture will review data models and integrity constraints with the help of Description Logic and show how such a development relates to classical database constraints such as functional and inclusion dependencies. Furthermore, it will discuss additional annotations, such as binding patterns, and their use to describe physical designs, possibly down to the level of (sets of) main-memory records connected by pointers. The theoretical underpinnings will be accompanied by examples of fine-grained descriptions of physical designs, elaborating on traditionally monolithic data structures (such as B+ trees) via constraints. It will also consider the trade-off between reasoning complexity (decidability) and expressive power in schema languages: what the right trade-off might be, and how it affects query languages, query evaluation, and query "safety" issues.
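
    As a small illustration of the classical constraints mentioned above, the following sketch checks a functional dependency and an inclusion dependency over relation instances given as lists of dictionaries. It is not taken from the course material: the function names, the relations, and the attribute names are all hypothetical and chosen only for this example.

```python
def satisfies_fd(rows, lhs, rhs):
    """True if any two rows that agree on `lhs` also agree on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The first row with this key fixes the value every later row must match.
        if seen.setdefault(key, val) != val:
            return False
    return True


def satisfies_ind(r_rows, r_attrs, s_rows, s_attrs):
    """True if the projection of r on r_attrs is contained in that of s on s_attrs."""
    s_proj = {tuple(row[a] for a in s_attrs) for row in s_rows}
    return all(tuple(row[a] for a in r_attrs) in s_proj for row in r_rows)


# Hypothetical sample data (an assumption for illustration only).
employee = [
    {"id": 1, "name": "Ann", "dept": "db"},
    {"id": 2, "name": "Bob", "dept": "ai"},
]
department = [{"dname": "db"}, {"dname": "ai"}]

fd_holds = satisfies_fd(employee, ["id"], ["name"])                # id -> name
ind_holds = satisfies_ind(employee, ["dept"], department, ["dname"])  # employee[dept] ⊆ department[dname]
```

    Checking constraints on a given instance, as done here, is much simpler than reasoning about their logical consequences, which is where the complexity vs. expressiveness trade-off arises.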

    Projects:

  • Week 5. How do we execute queries? (take 1: conjunctive queries)

    The lecture will study chase-based approaches to query rewriting and their limitations (e.g., the inability to rewrite conjunctive queries over conjunctive views), the impact of binding patterns for accessing indices, and the integration with (simple) cost models. Other issues discussed in this lecture relate to handling duplicates and the order of data, and to approaches for accommodating these crucial features in query plans.
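
    Chase-based rewriting rests on testing containment of conjunctive queries. The sketch below covers only the constraint-free case, where the chase of a query is simply its own body treated as a database, so containment reduces to finding a homomorphism between query bodies. The query encoding (a pair of head variables and a list of atoms, with all terms treated as variables) and the function name are assumptions made for this example.

```python
def contained_in(q1, q2):
    """True if q1 is contained in q2, i.e. a homomorphism maps q2 into q1.

    A query is (head_vars, body); each body atom is a tuple (pred, arg1, ...).
    All terms are treated as variables (no constants) to keep the sketch short.
    """
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False

    # The homomorphism must send the head of q2 to the head of q1.
    start = {}
    for v2, v1 in zip(head2, head1):
        if start.setdefault(v2, v1) != v1:
            return False

    facts = set(body1)  # body of q1, read as a canonical database

    def extend(i, h):
        """Try to map atoms body2[i:] into `facts`, extending mapping h."""
        if i == len(body2):
            return True
        atom = body2[i]
        for fact in facts:
            if fact[0] != atom[0] or len(fact) != len(atom):
                continue
            h2 = dict(h)  # copy, so a failed attempt leaves h untouched
            if all(h2.setdefault(a, t) == t for a, t in zip(atom[1:], fact[1:])):
                if extend(i + 1, h2):
                    return True
        return False

    return extend(0, start)


# ans(x) :- R(x, y), R(y, x)
q_cycle = (("x",), [("R", "x", "y"), ("R", "y", "x")])
# ans(x) :- R(x, y)
q_edge = (("x",), [("R", "x", "y")])
```

    Here `contained_in(q_cycle, q_edge)` holds while `contained_in(q_edge, q_cycle)` does not. With integrity constraints present, the body of the first query would have to be chased before the homomorphism test; that step is omitted here.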

    Projects:

    Interpolant construction: tableaux and resolution.

  • Week 6. How do we execute queries? (take 2: first-order queries)

    The lecture will first discuss shortcomings of existing approaches to rewriting complex queries based on ad-hoc techniques, such as the query graph model (QGM), and then introduce a novel technique based on the application of Craig's Interpolation Theorem to the query rewriting problem. In particular, it will show how to extract rewritings from refutation proofs. In addition, it will discuss the usual extensions needed for efficient query processing, e.g., binding patterns, duplicates, and ordering.

    Projects:

  • Week 7. Querying Databases through Ontologies: the "open world" approach.

    The presentation will consider an alternative to query rewriting (equivalent under constraints): the computation of certain answers. It will introduce the approach and discuss its computational price in terms of the expressive power of the query and constraint languages used. It will show that such an approach is computationally feasible only for relatively weak languages. It will then discuss the possibility of generating certain answers based on first-order rewritability in such a setting, i.e., for conjunctive queries over ontologies formulated in families of suitably restricted description logics, such as ELH and DL-Lite.
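
    To give a flavour of first-order rewritability, the sketch below handles a deliberately tiny fragment: the ontology contains only inclusions between concept names (A ⊑ B), and the query asks for all instances of a single concept. This is far weaker than DL-Lite or ELH, and the function names, concept names, and data are hypothetical. The query is rewritten into a union of atomic queries, which is then evaluated over the explicit data alone.

```python
def rewrite_atomic(concept, inclusions):
    """Concept names whose explicit instances are certain instances of `concept`.

    `inclusions` is a list of pairs (sub, sup) standing for sub ⊑ sup.
    The result corresponds to the disjuncts of the rewritten (union) query.
    """
    result = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for sub, sup in inclusions:
            if sup == current and sub not in result:
                result.add(sub)
                frontier.append(sub)
    return result


def certain_answers(concept, inclusions, abox):
    """Evaluate the rewritten query over the data; the ontology is no longer needed."""
    names = rewrite_atomic(concept, inclusions)
    return {individual for (c, individual) in abox if c in names}


# Hypothetical ontology and data (assumptions for illustration only).
tbox = [("Professor", "Teacher"), ("Teacher", "Employee")]
abox = {("Professor", "ann"), ("Teacher", "bob"), ("Student", "eve")}

employees = certain_answers("Employee", tbox, abox)
```

    In this example `employees` contains `ann` and `bob`, although neither is explicitly asserted to be an `Employee`. The point of the design is that all reasoning is compiled into the query, so a standard relational engine can evaluate it.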

    KR 2010 Presentation

    Projects:

  • Week 8. Look into the Future

    The lecture will conclude with an overview of topics for future investigation and research. The topics relate to studying constraints and queries beyond first-order logic, such as Datalog/inductive data types, and their impact on physical design and query processing; the issue of updates in decoupled conceptual and physical designs; and the impact of transactions.

    Projects:

  • Week 9. Updates and Constraints

    constant complement approach

  • Week 10. Inconsistent Data

    consistent query answering (1), (2)

  • Weeks 11-13: Presentations/Projects

Class Resources:

    Announcements, class notes, etc. are/will be available here; you are expected to look them up.

Assessment:

  • class participation (20%)
  • projects (80%) [deliverables: report (pdf), source code (if applicable)]