Instructor: David Toman (david@uwaterloo.ca)
Lectures: Wednesday 4:00-6:30 pm, MC2036A
Office: DC 3344, x34777
Class Info: http://cs.uwaterloo.ca/~david/cs848s10/
Lecture Outlines:
Reading for weeks 1 and 2:
Current approaches to physical design closely follow conceptual design: e.g., creating base files for all tables, adding indices, etc. New applications and performance requirements have led to the introduction of additional physical structures, such as materialized views, but query optimization technology has not kept pace and typically uses only ad-hoc techniques to incorporate materialized views into query plans. The lecture will survey current practices, identify their weaknesses, and outline possible solutions. It will also introduce the unifying theme for the remaining lectures: the development of a uniform and integrated approach to physical design that is decoupled from conceptual schemas, together with query compilation and optimization in this setting.
How do we describe actual physical designs, and how do we link them to a conceptual view of the data? The lecture will review data models and integrity constraints with the help of Description Logic and show how this development relates to classical database constraints such as functional and inclusion dependencies. It will also discuss additional annotations, such as binding patterns, and their use in describing physical designs, possibly down to the level of (sets of) main-memory records connected by pointers. The theoretical underpinnings will be accompanied by examples of fine-grained descriptions of physical designs that describe traditionally monolithic data structures (such as B+ trees) in detail via constraints. Finally, it will consider the trade-off between reasoning complexity (decidability) and expressive power in schema languages: what the right trade-off might be, and what impact it has on query languages, query evaluation, and query "safety".
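To make the two classical constraint types concrete, here is a minimal Python sketch that checks them on a small instance. It assumes relations are lists of dicts keyed by attribute name; the function names `satisfies_fd` and `satisfies_ind` are hypothetical and not part of the course material. A functional dependency X → Y holds when no two tuples agree on X but differ on Y; an inclusion dependency R[X] ⊆ S[Y] holds when every X-projection of R also appears as a Y-projection of S.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Assumed representation: a relation is a list of rows, each a dict
# from attribute name to value.
Row = Dict[str, object]

def project(rows: List[Row], attrs: Sequence[str]) -> Set[Tuple]:
    """Set of value tuples obtained by projecting rows onto attrs."""
    return {tuple(r[a] for a in attrs) for r in rows}

def satisfies_fd(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """True iff the functional dependency lhs -> rhs holds in rows."""
    seen: Dict[Tuple, Tuple] = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        # A second row with the same lhs values must carry the same rhs values.
        if seen.setdefault(key, val) != val:
            return False
    return True

def satisfies_ind(r_rows: List[Row], r_attrs: Sequence[str],
                  s_rows: List[Row], s_attrs: Sequence[str]) -> bool:
    """True iff the inclusion dependency R[r_attrs] ⊆ S[s_attrs] holds."""
    return project(r_rows, r_attrs) <= project(s_rows, s_attrs)
```

For example, with `emp = [{"eno": 1, "dept": "db", "mgr": 7}, {"eno": 2, "dept": "db", "mgr": 7}]` and `dept = [{"dname": "db"}, {"dname": "ai"}]`, the dependency dept → mgr holds on `emp`, mgr → eno does not, and emp[dept] ⊆ dept[dname] holds.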
Projects:
The lecture will study chase-based approaches to query rewriting and their limitations (e.g., the inability to rewrite conjunctive queries over conjunctive views), the impact of binding patterns on accessing indices, and the integration with (simple) cost models. Other issues discussed in this lecture include handling duplicates and the order of data, and approaches to accommodating these crucial features in query plans.
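Binding patterns restrict how a relation may be accessed: an index can only be probed once its key is known. A minimal Python sketch of the resulting feasibility check for a conjunctive query follows. It assumes each relation has a single access pattern written as a string of `b` (input must be bound) and `f` (free) positions, and that variables start with an uppercase letter, as in Datalog; `executable_order` and the sample relations are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

# Assumed representation: an atom is (relation, terms). Terms starting with
# an uppercase letter are variables; any other term is a constant.
Atom = Tuple[str, Tuple[str, ...]]

def is_var(term: str) -> bool:
    return term[:1].isupper()

def executable_order(atoms: Sequence[Atom],
                     patterns: Dict[str, str],
                     bound: Iterable[str] = ()) -> Optional[List[Atom]]:
    """Order atoms so that every 'b' (input) position of each access pattern
    holds a constant or a variable bound by an earlier atom (or given in
    `bound`). Returns None when no such order exists."""
    bound = set(bound)
    remaining = list(atoms)
    order: List[Atom] = []
    while remaining:
        for atom in remaining:
            rel, terms = atom
            inputs_ok = all(not is_var(t) or t in bound
                            for t, mode in zip(terms, patterns[rel])
                            if mode == "b")
            if inputs_ok:
                # Executing this access binds all of its variables.
                order.append(atom)
                remaining.remove(atom)
                bound.update(t for t in terms if is_var(t))
                break
        else:
            return None  # every remaining atom still lacks a bound input
    return order
```

With `patterns = {"emp": "bf", "dept": "ff"}` (emp reachable only through an index on its first column) and the query `[("emp", ("D", "E")), ("dept", ("D", "M"))]`, the sketch returns the dept scan first and the emp index probe second. The greedy choice is safe because the set of bound variables only grows, so executing an available atom early never blocks a later one.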
Interpolant construction: tableaux and resolution.
The lecture will first discuss the shortcomings of existing ad-hoc approaches to rewriting complex queries, such as those based on the query graph model (QGM), and then introduce a novel technique that applies Craig's Interpolation Theorem to the query rewriting problem; in particular, it will show how to extract rewritings from refutation proofs. In addition, it will discuss the usual extensions needed for efficient query processing, e.g., binding patterns, duplicates, and ordering.
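Craig's Interpolation Theorem says that if A ⊨ B, there is a formula I using only the symbols shared by A and B such that A ⊨ I and I ⊨ B. The lecture extracts such interpolants from refutation proofs in first-order logic; the sketch below does something much simpler, purely to illustrate the notion: in propositional logic it computes the strongest interpolant by brute force, existentially projecting away the variables that occur only in A. Formulas are assumed to be Python predicates over truth assignments, and all function names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Sequence, Set, Tuple

Env = Dict[str, bool]
Formula = Callable[[Env], bool]
Table = Set[Tuple[Tuple[str, bool], ...]]

def assignments(vs: Sequence[str]) -> Iterator[Env]:
    """Enumerate all truth assignments to the variables vs."""
    for bits in product([False, True], repeat=len(vs)):
        yield dict(zip(vs, bits))

def interpolant(a: Formula, a_vars: Sequence[str],
                b_vars: Sequence[str]) -> Tuple[List[str], Table]:
    """Assignments to the shared variables that extend to a model of A,
    i.e. A with its A-only variables existentially projected away.
    If A |= B, this truth table is the strongest Craig interpolant."""
    shared = sorted(set(a_vars) & set(b_vars))
    a_only = sorted(set(a_vars) - set(b_vars))
    table: Table = set()
    for s in assignments(shared):
        if any(a({**s, **x}) for x in assignments(a_only)):
            table.add(tuple(sorted(s.items())))
    return shared, table

def as_formula(shared: List[str], table: Table) -> Formula:
    """Turn the truth table back into a predicate over assignments."""
    return lambda env: tuple(sorted((v, env[v]) for v in shared)) in table

def entails(f: Formula, g: Formula, vs: Sequence[str]) -> bool:
    """Check f |= g over all assignments to vs."""
    return all(g(e) for e in assignments(vs) if f(e))
```

With A = p ∧ (p → q) over {p, q} and B = q ∨ r over {q, r}, the only shared variable is q, and the computed interpolant is simply q. In the query-rewriting reading, roughly, the shared vocabulary plays the role of the symbols a plan is allowed to mention, so the interpolant is a query over those symbols alone.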
The presentation will consider an alternative to query rewriting that is equivalent under constraints: the computation of certain answers. It will introduce the approach and discuss its computational cost relative to the expressive power of the query and constraint languages used, showing that the approach is computationally feasible only for relatively weak languages. It will then discuss the possibility of generating certain answers based on first-order rewritability in such a setting, i.e., for conjunctive queries over ontologies formulated in families of suitably restricted description logics, such as ELH and DL-Lite.
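One textbook way to compute certain answers of a conjunctive query under inclusion-style constraints is to chase the data, inventing labelled nulls for existentially required values, then evaluate the query and keep only answers free of nulls. The Python sketch below shows this for rules with a single body atom and a single head atom (enough for inclusion dependencies and simple axioms such as "every employee works in some department"). It is not the first-order rewriting technique discussed in the lecture; the chase can fail to terminate for cyclic rules (one reason rewriting is attractive), hence the `max_rounds` guard. All names are hypothetical, and variables are assumed to start with an uppercase letter.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Iterator, List, Optional, Set, Tuple

@dataclass(frozen=True)
class Null:
    """Labelled null invented by the chase for an existential variable."""
    id: int

Atom = Tuple[str, Tuple[str, ...]]  # (relation, terms); capitalized = variable
Fact = Tuple[str, tuple]            # (relation, values)

def is_var(t) -> bool:
    return isinstance(t, str) and t[:1].isupper()

def match(atom: Atom, fact: Fact, env: Dict) -> Optional[Dict]:
    """Extend env so that atom maps onto fact, or return None."""
    rel, terms = atom
    if rel != fact[0] or len(terms) != len(fact[1]):
        return None
    env = dict(env)
    for t, v in zip(terms, fact[1]):
        if is_var(t):
            if env.setdefault(t, v) != v:
                return None
        elif t != v:
            return None
    return env

def homs(atoms: List[Atom], facts: Set[Fact], env: Dict) -> Iterator[Dict]:
    """All extensions of env mapping every atom into facts (naive join)."""
    if not atoms:
        yield env
        return
    for f in facts:
        e = match(atoms[0], f, env)
        if e is not None:
            yield from homs(atoms[1:], facts, e)

def chase(facts: Set[Fact], rules: List[Tuple[Atom, Atom]],
          max_rounds: int = 100) -> Set[Fact]:
    """Restricted chase: add a head fact only if no matching one exists."""
    facts, fresh = set(facts), count()
    for _ in range(max_rounds):
        new: Set[Fact] = set()
        for body, head in rules:
            for env in homs([body], facts, {}):
                if next(homs([head], facts | new, env), None) is None:
                    ext = {t: Null(next(fresh)) for t in head[1]
                           if is_var(t) and t not in env}
                    new.add((head[0], tuple(ext.get(t, env.get(t, t))
                                            for t in head[1])))
        if not new:
            return facts
        facts |= new
    raise RuntimeError("chase did not terminate within max_rounds")

def certain_answers(query: List[Atom], answer_vars: Tuple[str, ...],
                    facts: Set[Fact], rules: List[Tuple[Atom, Atom]]) -> Set[tuple]:
    """Answers over the chased instance that contain no labelled nulls."""
    chased = chase(facts, rules)
    answers = {tuple(env[v] for v in answer_vars)
               for env in homs(query, chased, {})}
    return {a for a in answers if not any(isinstance(v, Null) for v in a)}
```

For instance, with facts emp(ann), emp(bob), worksIn(bob, db) and the rules emp(X) → worksIn(X, Y) and worksIn(X, Y) → dept(Y), the query asking for employees who work in some department returns both ann and bob, while the query asking for (employee, department) pairs returns only (bob, db), since ann's department is known to exist but not which one it is.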
KR 2010 Presentation
The lecture will conclude with an overview of topics for future investigation and research: studying constraints and queries beyond first-order logic, such as Datalog and inductive data types, and their impact on physical design and query processing; the issue of updates when conceptual and physical designs are decoupled; and the impact of transactions.
Projects:
constant complement approach
consistent query answering (1), (2)