From Data Independence to Ontology Based Data Access (and back)

Among the most commonly cited features of the ontology based data access (OBDA) approach to accessing data sources is its ability to offer a high-level, user-friendly interface based on a conceptual understanding of the data (an ontology), while still using low-level but efficient ways of representing the data in computer storage. The aim of this tutorial is to compare and contrast the OBDA approach with approaches centered on the concept of data independence, which has been under development in the area of database systems since the early 1970s. The tutorial focuses on the lessons shared by all approaches, and on how each can benefit from lessons learned from the others.

Location and Time:	June 4 afternoon, LPAR 2023, Manizales, Colombia (room W-134 of Building W, Faculty of Exact and Natural Sciences, Campus la Nubia)
Tutorial Slides:	itb-tutorial.pdf (still being updated)

Tutorial Synopsis

Accessing information using a high-level data model or ontology has been a long-standing objective of research communities in several areas. In work based on knowledge representation in artificial intelligence (AI), this objective commonly falls under the heading of OBDA and of ontology mediated querying (OMQ), and has fostered the development of approaches using query rewriting or using variants of the so-called combined approach. However, the underlying idea of separating an ontological view of how information must be understood by users from a physical view of the layout of data in data structures, called data independence, has been the focus of work in the area of information systems for more than fifty years. This tutorial explores how the original idea of data independence evolved and ultimately culminated in logic-based approaches to information management that have enabled high-level ontological views of information entirely devoid of any low-level physical views of concrete data layout. An integral part of the tutorial is to explore the relationship between the high-level ontologies that users see and the understanding of the physical representation of information in computer systems that is necessary to attain acceptable performance. The tutorial addresses the latter by showing how ontologies produced by ontology design in AI can be used to capture the physical encoding of information at a sufficiently fine-grained level that the code ultimately executed to satisfy users' information requests can be competitive with solutions hand-written in low-level programming languages such as C.

Detailed Outline

Ontologies, Logical Theories, and Data Independence. We start by introducing the idea of data independence itself and show how logic-based AI technologies can be used to formally capture this idea. We also survey the key developments and barriers to full adoption of the idea in information systems;
Physical Design as Logical Design. We continue with representative examples of what can be achieved by full adoption of this idea. We focus on the link between the conceptual/logical understanding of the information and its physical representation in a computer system (called physical design). We also show how knowledge representation can be used to account for various intricacies of a physical design;
Supporting Technology. We discuss AI technologies needed to make the idea of data independence viable in practice, focusing on issues relating to generation of efficient code that can be subsequently integrated in applications and information systems;
Open Problems. We conclude the tutorial with an outline of directions for further research, and with a list of open issues related to physical data independence in ontology-based information systems.
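
As a rough illustration of the data independence idea outlined above, the following minimal Python sketch answers one logical query over two different physical designs. It is not taken from the tutorial materials: the `employee(name, dept)` schema, the sample data, and all class and function names are assumptions invented for this example.

```python
from collections import defaultdict

# Hypothetical conceptual schema: employee(name, dept).
# The data below is invented purely for illustration.
EMPLOYEES = [("ann", "sales"), ("bob", "tools"), ("eve", "sales")]


class HeapLayout:
    """Physical design 1: an unordered list of records, answered by a full scan."""

    def __init__(self, rows):
        self.rows = list(rows)

    def names_in_dept(self, dept):
        return sorted(name for name, d in self.rows if d == dept)


class DeptIndexLayout:
    """Physical design 2: a hash index from department to employee names."""

    def __init__(self, rows):
        self.index = defaultdict(list)
        for name, d in rows:
            self.index[d].append(name)

    def names_in_dept(self, dept):
        # .get avoids creating an empty entry for an unknown department.
        return sorted(self.index.get(dept, []))


def query_names_in_dept(store, dept):
    """Logical query {n | employee(n, dept)}, stated without reference to layout."""
    return store.names_in_dept(dept)


heap_answer = query_names_in_dept(HeapLayout(EMPLOYEES), "sales")
index_answer = query_names_in_dept(DeptIndexLayout(EMPLOYEES), "sales")
```

Both layouts return the same answers to the same logical request; only the access path differs (a scan versus an index lookup). In this sketch the access path is chosen by hand, whereas the tutorial is concerned with deriving such low-level plans automatically from a logical description of the physical design.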

Audience and Background

The topics covered in the tutorial are of interest to a wide range of AI researchers and to members of the general public with an interest in knowledge representation. In particular, the tutorial targets the following groups:

Undergraduate and graduate students and junior researchers: the tutorial introduces this group to state-of-the-art approaches to addressing issues connected with representation, storage, and manipulation of information and to modern techniques that address these issues;
Researchers in the area of knowledge representation and other areas of AI: the tutorial provides bridges to many areas of AI where large data sets are used, ranging from approaches to knowledge representation and, in particular, implementation of such systems, to managing information for Semantic Web systems;
Industry practitioners and developers: the tutorial provides ideas on how the development of software systems can be improved, in particular in the critical phase of conceptual modelling and its mapping to physical computer storage, and on what tools are available to support this goal;
Members of the general public with an interest in the logical underpinnings of information management and in technologies based on these ideas.

The tutorial assumes the audience is familiar with the basics of first-order logic and of conceptual modelling formalisms (such as ER or UML) at the level of an introductory university course. No knowledge of particular ontology/KR languages, such as description logics, is assumed.

Relevance to LPAR 2023

The tutorial focuses on foundational issues relating to the use of high-level languages, such as the relational calculus and its extensions, to query and modify data stored in computer systems. This includes knowledge bases, ontologies, and information systems based on ontologies. It shows how issues relating both to user appreciation of the information and to effective use of the underlying computing infrastructure can be comprehensively addressed. Every design of an information system faces decisions about how external entities will be represented within the system (in addition to representing various properties of such entities). A general approach to this problem is therefore of interest both to ontology/knowledge base/database schema developers/engineers and to the programmers/DBAs who create the so-called physical designs, that is, the ways information is actually represented in low-level data structures to facilitate efficient data manipulation. The tutorial also shows how automated reasoning can serve as the basis for tools supporting automated translation of high-level information requests to low-level accesses to performance-tailored data structures. Thus the proposed tutorial fits with LPAR's focus on combining automated reasoning, computational logic, and programming languages and on their practical applications. Moreover, the approach to the representation and storage of information discussed in the tutorial complements standard approaches in conceptual and ontology design methodologies. The tutorial is thus of interest both to researchers in knowledge representation and to practitioners in the wide area of information management.

About the Authors

Dr. David Toman and Dr. Grant Weddell are professors of Computer Science at the University of Waterloo, Canada. They have published and presented results in the area of knowledge representation over the last 20 years at premier AI conferences (including a Reiter Prize at KR 2010 and Best Paper Prize at ISWC 2013); Dr. Toman has also given tutorials in the area of temporal representation and reasoning and temporal databases and information systems that have led to an invited chapter in the Handbook of Temporal Reasoning in Artificial Intelligence.

Presenters' Background in the Area of the Tutorial

The authors have a longstanding interest in the area of the tutorial, and have published a monograph on this topic, Fundamentals of Physical Design and Query Compilation. They are also experts on OBDA/OMQ approaches to query answering in knowledge representation systems and were awarded (with coauthors) the Ray Reiter Prize at KR 2010 for their work on the combined approach to OBDA, and later the Best Paper Prize at ISWC 2013. They are authors of many other papers on this topic and have been developing an experimental system that validates the general approach to data independence discussed in the tutorial.

The authors have presented numerous tutorials on topics ranging from management of temporal data to query compilation (this tutorial) and to the use of referring expressions in knowledge representation and information systems. The latter tutorials were based on results developed together with Alexander Borgida (Rutgers), for which they were awarded the Ray Reiter Best Paper Prize at KR 2016. Subsequently, with their coauthors, they were awarded the 2018 Bob Wielinga Best Paper Award for a paper furthering the use of referring expressions in conceptual modelling. The tutorials were as follows:

Referring Expressions in Artificial Intelligence and Knowledge Representation Systems. 19th International Conference on Principles of Knowledge Representation and Reasoning (part of The Federated Logic Conference FLoC 2022), Haifa, Israel;
David Toman, Grant E. Weddell: Projective Beth Definability and Craig Interpolation for Relational Query Optimization (Material to Accompany Invited Talk). Second-Order Quantifier Elimination and Related Topics, SOQE@KR 2021, online, 1-13, 2021.
From Data Independence to Ontology Based Data Access (and back). 29th International Joint Conference on Artificial Intelligence, IJCAI (2020), online, Japan.
Referring Expressions in Knowledge Representation Systems. 24th European Conference on Artificial Intelligence, ECAI (2020), online, Spain;
Referring Expressions in Knowledge Representation Systems. 28th International Joint Conference on Artificial Intelligence, IJCAI (2019), Macao, China;
Referring Expressions in Knowledge Representation Systems. The 16th Pacific Rim International Conference on Artificial Intelligence, PRICAI (2019), Cuvu, Yanuca Island, Fiji;
Managing and Communicating Object Identities in Knowledge Representation and Information Systems. Australasian Joint Conference on Advances in Artificial Intelligence, AI (2018), Wellington, New Zealand;
Referring Expressions in Ontologies and Query Answering. Formal Ontology in Information Systems (FOIS/JOWO) 2018, Cape Town, South Africa.
Expiration of Data (invited keynote) TIME 2002, IEEE International Symposium on Temporal Representation and Reasoning, Manchester, UK, July 2002.
Temporal Databases (invited tutorial with Jan Chomicki) International Conference Advances in Database Technology EDBT'98, Valencia, Spain, March 1998.
Temporal Databases (invited tutorial with Jan Chomicki) International Conference on Temporal Logic, ICTL'97, Manchester, UK, July 1997.

Moreover, as professors of Computer Science in one of the top-ranked CS programs, the authors have each been lecturing at both the graduate and undergraduate levels for more than twenty years, on topics ranging from introductory lectures on Logic, to specialized graduate lectures on Description Logic and Knowledge Representation, and to senior-year lectures on Programming Languages and on Database Systems Implementation.

Contact

Name:	David Toman and Grant Weddell
Affiliation:	Cheriton School of Computer Science, University of Waterloo
Address:	200 University Ave W., Waterloo, ON N2L 3G1, Canada
E-mail:	`{david,gweddell}@uwaterloo.ca`
WWW:	`cs.uwaterloo.ca/~{david,gweddell}`

Resources

Tutorial Materials

Tutorial Slides: itb-tutorial.pdf