David Toman's Research Statement

Research Statement

David Toman

Relational database systems enjoy wide-spread acceptance and provide the basis for many every day applications. Their success is based on two integral properties of relational technology:
  • sound and well-understood foundations, and
  • efficient implementation techniques suitable for large amounts of data.
  • On the other hand, there are many limitations of the classical relational model and its implementations, in particular
  • its inability to deal elegantly with interpreted data, e.g., time and space; and
  • its close ties with large, monolithic DBMS implementations that cannot be used in embedded systems;
  • My research focuses on extensions of the relational model that alleviate the above shortcomings.

    Research Contributions (Temporal Data):

    While virtually every database application has to handle time in one form or another (and usually does so using ad-hoc techniques), past research in this area failed to produce a widely accepted and available solution. This failure can be attributed to the use of complicated approaches without solid theoretical foundations. My research proposes an alternative way of adding temporal capabilities to a relational system. The main contributions in this area are:
  • A separation (with respect to expressive power) of first-order temporal logic from two-sorted first-order logic (independently Abiteboul et al., PODS'96). Therefore, first-order temporal queries cannot be composed of simpler subqueries over the universe of single- (or fixed-) dimensional temporal relations; a common choice in most of the current proposals for temporal extensions of the relational model and/or SQL.
  • A definition of a sound and natural temporal extension of SQL including an efficient query evaluation procedure over a compactly encoded temporal databases. The proposal provides both a clean and sound theoretical foundation and a well-defined implementation path.
  • The work on a compact representation of large and potentially infinite sets of time instants in temporal databases connects my research with the area of Constraint Databases. I have studied a representation that uses periodic sets in conjunction with order constraints and efficient query evaluation procedures for deductive queries over such a representation. I have built an experimental prototype of a query evaluation engine to measure the relative performance of the individual query evaluation procedures.

    The results obtained for temporal databases feed directly into the exploration of expiration and aging of data in such systems. I have developed a new technique for query-driven expiration of database histories that, for an arbitrary fixed set o first-order (SQL) queries, compacts the existing history in such a way that the answers to the queries are preserved while the size of the result is independent of the length of the history (it only depends on the size of the active domain of the history; this was conjectured not to be possible.on logactive features of the INGRES and IBM's Starburst Database Management Systems.

    Research Contributions (Database Technology for Embedded Systems):

    An embedded control program can be often viewed as a main-memory database system tailored to suit the needs of a particular application. For performance reasons, the program will usually define concrete low level data structures to encode the database, which in turn must be understood by anyone who needs to develop or modify the program. This is in contrast with the data independence that can be achieved by using a database system. However, because of space and performance requirements, the use of current database technology is not likely to be feasible in this setting. To explore one obstacle to this, we have developed a query optimizer that compiles queries on a conceptual schema to native Java or C code that navigates low level data structures. Of crucial significance is that an arbitrary collection of such structures, perhaps already devised for an earlier version of the control program, can be given as a part of the input to the optimizer. The main contributions in this area are:
  • A query optimizer that, with the help of schematic information that describes main-memory data structures and/or interfaces of a embedded control system, transforms a high-level declarative query (e.g., in SQL) to C or JAVA code that can be executed within the embedded system. The DEMO system based on this approach has been verified by experimental means on top of the Linux operating system.
  • The development of a schema definition language based on description logic that is used by the DEMO system, study of its computational properties (in particular, the relationship between various dialects of description logics and formulas with the Ackermann quantifier prefix), and its use for query optimization.
  • While the embedded control programs themselves provide a rich application area, the DEMO system can be easily adopted to other uses, e.g., to handling network management or to processing XML documents. (the work on the DEMO project this is joint work with Grant E. Weddell at the University of Waterloo; the DEMO project is funded by NORTEL Networks and NSERC, CITO, and CFI/OIT grants).

    Future Research Plans

    My current research plans are closely tied to the above results, in particular to resolving several open questions in the area of temporal data:
  • what are the tradeoffs between expressive power of a temporal query language and the amount of data needed to answer a fixed set of queries over a database history?
  • how is the above situation affected by retroactive updates?
  • is there a relationship with approaches used for garbage collection in programming languages (or vice versa)?
  • can schematic information be used to reduce the amount of data to remember even further?
  • and in the area of embedded systems:
  • what data structures can be described by a schema language? (e.g., can the DEMO system be extended to handling inductive data types)
  • what are the limits on expressive power of the queries?
  • what is the impact of ordering of results? (this is particularly pertinent to handling XML queries).
  • In all cases I am more interested in the foundational principles of the problems and in the development of techniques to support and promote correct implementation, rather than in quick-n-dirty solutions.

    March 2002