For a list of students supervised, refer to Students.
See also Publications.

Overview of research program

The long-range objective of my research has been (and will continue to be) to develop a unified methodology for designing data structures from the individual users' models through the enterprise model to the storage structures. This has involved the development of formal models, the development and analysis of effective algorithms, and the application of the ideas to solving large, practical problems.

We pursued this objective first as it applies to conventional record-oriented databases. Doctoral students working with me concentrated on defining and analyzing properties of "normal forms" as part of the design of a conceptual model [Osborn PhD:77, Ling PhD:78]. Collaborating in part with Professor Gaston Gonnet, we examined formalisms for describing data structures to support efficient algorithms [Tompa 1977, Gonnet-Tompa 1983], concentrating particularly on the specification of their abstract structures [Santoro PhD:79, Tompa 1980] and on the design of efficient storage structures and policies [Ramirez PhD:80, Ziviani-Tompa 1982]. More recently, students working with me have concentrated on supporting user models of the data: examining the problems of processing database updates that are expressed in terms of a partial view of the data [Medeiros PhD:85, Brodnik-Tompa 1993] and of keeping a materialized view up-to-date in the presence of changes to the underlying stored data [Blakeley PhD:87]. We have also examined algorithms to process users' queries by the most efficient means available [Icaza PhD:87].

Since 1981, we have examined database concerns for non-standard databases, first concentrating on videotex databases. Because the fundamental assumptions about the nature of the data and its uses distinguish videotex databases from conventional ones, we developed a page-oriented database model that includes query and update facilities [Tanguay MMath:86]. Because of videotex's orientation towards large public-access systems for casual users, several students working with me considered the support of individualized views. We investigated powerful browsing facilities [Raymond MMath:84], graphical query languages [Böggild MMath:86], and the effectiveness of users' classification systems [Raymond-Cañas-Tompa-Safayeni 1989]. Although declining interest in videotex reduced the impact of our work, it served well as a precursor for ongoing work with hypertexts [Tompa 1990, Tompa-Raymond 1991, Tompa-Blake-Raymond 1993].

Since 1984, we have concentrated on more general text-dominated databases. The thrust of this research has been directed towards the development of a database system that will be capable of supporting the needs of text creators (such as the editors at the Oxford University Press who will maintain and enhance the Oxford English Dictionary), while simultaneously supporting the needs of text users (writers, editors, humanist scholars) who will access such machine-readable texts interactively. Again, the conventional assumptions about data were found to be inappropriate (even the fundamental concept of "entity" does not apply) and, in close collaboration with Professor Gaston Gonnet, we developed two new models of text data [Blake-Bray-Tompa 1992, Salminen-Tompa 1994]. Because of the great potential of this work, an Ontario company, Open Text Corporation, began operations in July 1989 to develop and market products based on our research. Open Text, which currently employs over 12,000 individuals worldwide, has developed the Livelink Intranet application suite including Livelink Search, which has evolved from our research.

Later research includes the design and development of two systems: a text/relational database management system and a system for data resource discovery for deployment on the internet. The database management system is based on a federated model that provides a hybrid query processor, which supports extensions to SQL to accommodate structured text such as that described using SGML.

The following list of major collaborations is indicative of the value of the research to industry: