From Data Independence to Ontology Based Data Access (and back)

Among the most commonly cited features of the ontology-based data access (OBDA) approach to accessing data sources is its ability to offer a high-level, user-friendly interface to a conceptual understanding of the data (an ontology), while still utilizing low-level but efficient ways of representing the data in a computer store. The aim of this tutorial is to compare and contrast the OBDA approach with approaches centered around the concept of data independence, which has been under development in the area of database systems since the early 1970s. The tutorial focuses on the lessons common to all of these approaches, and on how each can benefit from lessons learned from the others.

Location and Time:	TBA, IJCAI 2020, Yokohama, Japan.
Tutorial Slides:	itb-tutorial.pdf

Tutorial Synopsis

Accessing information using a high-level data model or ontology has been a long-standing objective of research communities in several areas. In work based on knowledge representation in artificial intelligence (AI), this objective commonly falls under the heading of OBDA and of ontology mediated querying (OMQ), and has fostered the development of approaches using query rewriting or using variants of the so-called combined approach. However, the underlying idea of separating an ontological view of how information must be understood by users from a physical view of the layout of data in data structures, called data independence, has been the focus of work in the area of information systems for more than fifty years. This tutorial explores how the original idea of data independence evolved and ultimately culminated in logic-based approaches to information management by systems that have enabled high-level ontological views of information entirely devoid of any low-level physical views of concrete data layout. An integral part of the tutorial is to explore the relationship between the high-level ontologies that users see and the understanding of the physical representation of such information in computer systems that is necessary to attain acceptable performance. The tutorial will address the latter by showing how ontologies derived by ontology design in AI can be used to capture the physical encoding of information at a sufficiently fine grain that the code ultimately executed to satisfy users' information requests can be competitive in performance with solutions hand-written in low-level programming languages such as C.
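To make the notion of data independence concrete, the following is a minimal illustrative sketch in Python. It is not the authors' system or method: the names `EmployeeStore`, `RowStore`, `HashIndexedColumns`, and `names_for`, as well as the sample data, are hypothetical and chosen only for illustration. The sketch shows one "user query" written solely against a conceptual view, answered unchanged over two different physical layouts of the same data.

```python
from typing import Iterable, List, Optional, Protocol, Tuple

Row = Tuple[int, str]  # hypothetical record: (employee id, employee name)


class EmployeeStore(Protocol):
    """Conceptual view: every employee has an id and a name."""

    def name_of(self, emp_id: int) -> Optional[str]: ...


class RowStore:
    """Physical layout 1: an unsorted list of records, scanned linearly."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows = list(rows)

    def name_of(self, emp_id: int) -> Optional[str]:
        for rid, name in self._rows:
            if rid == emp_id:
                return name
        return None


class HashIndexedColumns:
    """Physical layout 2: separate column arrays plus a hash index on id."""

    def __init__(self, rows: Iterable[Row]) -> None:
        rows = list(rows)
        self._names = [name for _, name in rows]
        self._index = {rid: pos for pos, (rid, _) in enumerate(rows)}

    def name_of(self, emp_id: int) -> Optional[str]:
        pos = self._index.get(emp_id)
        return None if pos is None else self._names[pos]


def names_for(store: EmployeeStore, emp_ids: List[int]) -> List[Optional[str]]:
    """A user query that refers only to the conceptual view."""
    return [store.name_of(i) for i in emp_ids]


DATA = [(1, "Ada"), (2, "Grace"), (3, "Edsger")]

if __name__ == "__main__":
    for store in (RowStore(DATA), HashIndexedColumns(DATA)):
        print(type(store).__name__, names_for(store, [2, 9]))
```

In this toy setting the choice of layout is made by hand. The approaches surveyed in the tutorial aim instead to have the link between the conceptual view and the physical design described in logic, so that efficient code can be generated from it.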

Detailed Outline

Ontologies, Logical Theories, and Data Independence. We start by introducing the idea of data independence itself and show how logic-based AI technologies can be used to formally capture this idea. We also survey the key developments and barriers to full adoption of the idea in information systems;
Physical Design as Logical Design. We continue with representative examples of what can be achieved by full adoption of this idea. We focus on the link between the conceptual/logical understanding of the information and its physical representation in a computer system (called physical design). We also show how knowledge representation can be used to account for various intricacies of a physical design;
Supporting Technology. We discuss AI technologies needed to make the idea of data independence viable in practice, focusing on issues relating to generation of efficient code that can be subsequently integrated in applications and information systems;
Open Problems. We conclude the tutorial with an outline of directions for further research, and with a list of open issues related to physical data independence in ontology-based information systems.

Audience and Background

The topics covered in the tutorial are of interest to a wide range of AI researchers and to members of the general public with an interest in knowledge representation. In particular, the tutorial targets the following groups:

Undergraduate and graduate students and junior researchers: the tutorial introduces this group to state-of-the-art approaches to addressing issues connected with representation, storage, and manipulation of information and to modern techniques that address these issues;
Researchers in the area of knowledge representation and other areas of AI: the tutorial provides bridges to many areas of AI where large data sets are used, ranging from approaches to knowledge representation and, in particular, implementation of such systems, to managing information for Semantic Web systems;
Industry practitioners and developers: the tutorial provides ideas on how the development of software systems can be improved, in particular in the critical phase of conceptual modelling and its mapping to physical computer storage, and on what tools are available to aid this goal;
Members of the general public, with an interest in logical underpinnings of logic-based information management and in technologies based on these ideas.

The tutorial assumes the audience is familiar with the basics of first-order logic and of conceptual modelling formalisms (such as ER or UML) at the level of an introductory university course. No knowledge of particular ontology/KR languages, such as description logics, is assumed.

Relevance to IJCAI 2020

The tutorial focuses on foundational issues relating to representation of information in computer systems, including knowledge bases, ontologies, and information systems based on ontologies, and on how issues relating both to user appreciation of the information and to efficient use of the underlying computing infrastructure can be comprehensively addressed. Since every design of an information system faces decisions relating to how external entities will be represented within such a system (in addition to representing various properties of such entities), a general approach to this problem is of interest to ontology developers/engineers and data scientists. Interestingly, the approach to the representation and storage of information discussed in the tutorial naturally and seamlessly complements standard approaches in conceptual and ontology design methodologies. The tutorial is thus of interest both to researchers in knowledge representation and to practitioners in the wide area of information management.

About the Authors

Dr. David Toman and Dr. Grant Weddell are professors of Computer Science at the University of Waterloo, Canada. They have published and presented results in the area of knowledge representation over the last 20 years at premier AI conferences (including a Reiter Prize at KR 2010 and Best Paper Prize at ISWC 2013). Dr. Toman has also given tutorials in the area of temporal representation and reasoning and temporal databases and information systems, which led to an invited chapter in the Handbook of Temporal Reasoning in Artificial Intelligence.

Presenters' Background in the Area of the Tutorial

The authors have a longstanding interest in the area of the tutorial, and have published a monograph on this topic, Fundamentals of Physical Design and Query Compilation. They are also experts on OBDA/OMQ approaches to query answering in knowledge representation systems and were awarded (with coauthors) the Ray Reiter Prize at KR 2010 for their work on the combined approach to OBDA, and later the Best Paper Prize at ISWC 2013. They are authors of many other papers on this topic and have been developing an experimental system that validates the general approach to data independence discussed in the tutorial.

The authors have recently presented tutorials on the topic of referring expressions in knowledge representation and information systems, based on results developed together with Alexander Borgida (Rutgers) for which they were awarded the Ray Reiter Best Paper Prize at KR 2016. Subsequently, with their coauthors, they were awarded the 2018 Bob Wielinga Best Paper Award for a paper furthering the use of referring expressions in conceptual modelling. The tutorials were as follows:

Referring Expressions in Ontologies and Query Answering at the 10th International Conference on Formal Ontology in Information Systems, FOIS 2018 (in Cape Town, South Africa, September 2018);
Managing and Communicating Object Identities in Knowledge Representation and Information Systems at the 31st Australasian Joint Conference on Artificial Intelligence, AI 2018 (in Wellington, New Zealand, December 2018); and
Referring Expressions in Knowledge Representation Systems at the 28th International Joint Conference on Artificial Intelligence, IJCAI 2019 (in Macao, SAR China, August 2019).

Moreover, as Professors of Computer Science in one of the top-ranked CS programs, the authors have each been lecturing at both the graduate and undergraduate levels for more than twenty years, on topics ranging from introductory lectures on Logic, to specialized graduate lectures on Description Logic and Knowledge Representation, and to senior-year lectures on Programming Languages and on Database Systems Implementation.

Contact

Name:	David Toman and Grant Weddell
Affiliation:	Cheriton School of Computer Science, University of Waterloo
Address:	200 University Ave W., Waterloo, ON N2L3G1, Canada
E-mail:	`{david,gweddell}@uwaterloo.ca`
WWW:	`cs.uwaterloo.ca/~{david,gweddell}`

Resources and Bibliography

Tutorial Materials

Tutorial Slides: itb-tutorial.pdf