ACM Computing Surveys 28(4es), December 1996, http://www.acm.org/pubs/citations/journals/surveys/1996-28-4es/a85-ozsu/. Copyright © 1996 by the Association for Computing Machinery, Inc. See the permissions statement below. This article derives from a position statement prepared for the Workshop on Strategic Directions in Computing Research.


Future of Database Systems:
Changing Applications and Technological Developments


M. Tamer Özsu

University of Alberta
ozsu@cs.ualberta.ca


Database management systems (DBMSs) are at a crossroads, perhaps the first since their successful entry into the information-processing marketplace. On the one hand, relational systems have been enormously successful, creating a multibillion-dollar industry over the last two decades. On the other, current technological developments and application demands are severely testing the limits of current commercial systems. Failure to address these changes and demands may result in the marginalization of database management with more and more data stored elsewhere and managed by systems without typical database functionality (e.g., querying, transactional support, integrity enforcement). The following are some of my views on the challenges and the directions that seem to be important to investigate.

Changing Applications

The unsuitability of the existing (relational) DBMSs in servicing the data management requirements of complex (aka `advanced') application domains is a well known (and much repeated) fact. Most of the problems arise from the multiplicity and complexity of data types and the uncertainty of accessing them. Existing systems are optimized to manage structured data of a few relatively simple types. Users are expected to pose well-formed queries and/or simple transactions to access the database. All of these issues change in new applications that now demand DBMS services. Most of these applications deal with multiple types of quite complex data that are not well structured; user access to these data is ill-defined, requiring partial match searches, and the transactional access to the data is at workflow complexity. Since many of these applications exhibit multimedia characteristics, I'll use multimedia information systems to demonstrate some of the issues. What are the characteristics of multimedia data that differentiate these databases from traditional ones?

Current relational DBMSs cannot meet these requirements for a number of model and architectural reasons. This is perhaps why DBMS technology has not had a significant impact on the data management needs of applications with similar requirements. The main use of database technology has been as the holder of metadata, but the actual multimedia data, in particular continuous-media data such as audio and video, are stored in ordinary files. This unfortunately eliminates the possibility of posing queries on these data. Ultimately, we should be able to pose queries such as ``Find all multimedia documents that show Bill Clinton standing next to Jean Chrétien and uttering the words `Canada is our best neighbor''' or ``Show me all the images that contain an object that looks like O (depicting a shape)'' and have these queries executed efficiently.

What needs to be done to let DBMSs fulfill the requirements of these applications? The needs are both architectural and model-dependent. For example, since audio and video streams (assuming they are delivered on different streams) are both time-dependent and need to be synchronized with each other, either the communication between the client and the server has to be adjusted to meet these real-time synchronization requirements, or the server interface of the systems needs to be opened to enable synchronization routines to access multimedia objects at the server buffer. From a modeling perspective, more sophisticated models are necessary to capture the application objects properly. Object DBMSs are, in my view, the most promising systems for meeting these requirements. However, many problems in engineering high-performance, full-functionality object DBMSs have yet to be resolved. Existing systems, by and large, are persistent object repositories with limited DBMS functionality following simple distribution strategies. It is hard to extend them (how do you store JPEG encoded images in native mode in an object DBMS so that you can interpret the encoded images?), they don't offer query models that can be extended with multimedia constructs, their query-optimization capabilities are severely limited, and I don't know how they would scale with increasing data volume and user community. Furthermore, querying these databases is significantly more complicated as one has to deal with quality-of-service concerns and handle fuzzy queries that can be answered by partial matches and those that need the discovery of ill-defined (or undefined) patterns in the data. Thus, for example, data-mining techniques can be used on image databases to answer some of the queries.
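To make the notion of a fuzzy query answered by partial matches more concrete, here is a minimal Python sketch. It assumes, purely for illustration, that each image has already been reduced to a short numeric feature vector, and it ranks images by cosine similarity to the query shape. The catalog contents, the feature vectors, the threshold, and the function names are all hypothetical; this is one simple ranking technique, not a description of how any existing DBMS evaluates such queries.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector matches nothing
    return dot / (norm_a * norm_b)


def partial_match_query(catalog, query_features, threshold=0.8, limit=10):
    """Return (name, score) pairs whose similarity meets the threshold.

    Unlike an exact-match predicate, the answer is a ranked list:
    the best partial matches come first.
    """
    scored = [
        (cosine_similarity(features, query_features), name)
        for name, features in catalog.items()
    ]
    matches = [(score, name) for score, name in scored if score >= threshold]
    matches.sort(key=lambda m: (-m[0], m[1]))  # best score first, then name
    return [(name, score) for score, name in matches[:limit]]


# Hypothetical catalog: image name -> made-up shape features.
catalog = {
    "circle.jpg": [1.0, 0.0, 0.0],
    "ellipse.jpg": [0.9, 0.1, 0.0],
    "square.jpg": [0.0, 1.0, 0.0],
}

# "Show me all the images that contain an object that looks like O."
results = partial_match_query(catalog, [1.0, 0.0, 0.0])
for name, score in results:
    print(f"{name}: {score:.3f}")
```

The point of the sketch is the shape of the answer: a ranked, thresholded list rather than a set of exact matches, which is what a query model extended with multimedia constructs would have to express and optimize.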

As indicated above, multimedia systems are representative (in terms of their data management requirements) of many other applications such as electronic commerce, digital libraries, and engineering design environments. I believe distributed object management [2,3] will be the major technology in addressing these requirements and R&D in this technology is a fundamental strategic direction. Research in this area is in its infancy.

The foregoing discussion may have left the impression that it is essential (or, at least, desirable) to collect all of this data under the control of a DBMS. This is not my claim, since I recognize that most of this data is already stored in various other places. What I propose, however, is that DBMS-like access to this data be provided in an interoperable environment. This raises the second important strategic issue, namely interoperable systems. Early research in this area concentrated on multidatabases (or federated databases). More recent research has started to address the problem in more generality with emphasis on wrapper-mediator systems [4]. The wrapper-mediator approach, coupled with object orientation, seems to be the correct paradigm to deal with interoperability. However, there is currently no well-defined methodology for constructing these systems and there is little or no support for (semi-)automatic generation of wrappers and mediators for different functions.
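The wrapper-mediator idea can be sketched in a few lines of Python. In this hedged illustration, each wrapper translates one source's native record layout into a common form, and the mediator sends the same predicate to every wrapper and merges the answers. The class names, record layouts, and sample data are assumptions made for the example; they are not taken from [4] or from any particular system.

```python
class TupleSourceWrapper:
    """Wraps a source that stores records as (title, year) tuples."""

    def __init__(self, rows):
        self._rows = rows

    def query(self, predicate):
        for title, year in self._rows:
            record = {"title": title, "year": year}
            if predicate(record):
                yield record


class DocumentSourceWrapper:
    """Wraps a source that stores records as dicts with different field names."""

    def __init__(self, docs):
        self._docs = docs

    def query(self, predicate):
        for doc in self._docs:
            # Translate the native layout into the common form.
            record = {"title": doc["name"], "year": int(doc["published"])}
            if predicate(record):
                yield record


class Mediator:
    """Presents one query interface over several wrapped sources."""

    def __init__(self, wrappers):
        self._wrappers = list(wrappers)

    def query(self, predicate):
        results = []
        for wrapper in self._wrappers:
            results.extend(wrapper.query(predicate))
        return sorted(results, key=lambda r: (r["year"], r["title"]))


# Hypothetical data living outside any DBMS.
file_store = TupleSourceWrapper([("Video archive", 1995), ("Old memo", 1988)])
library = DocumentSourceWrapper([{"name": "Image catalog", "published": "1996"}])

mediator = Mediator([file_store, library])
recent = mediator.query(lambda r: r["year"] >= 1990)
print(recent)
```

Even in this toy form, each wrapper is hand-written for its source, which is where (semi-)automatic generation would help.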

Technological Developments

Perhaps the most important technological development that is affecting database management is the emergence of distributed and parallel computing as a mainstream computing paradigm. Stonebraker claimed in 1988 that in the subsequent decade centralized database managers would become an antique curiosity as more organizations move toward distributed database managers [5]. By and large, this turned out to be an accurate forecast. Most, if not all, of the commercial DBMSs provide some sort of distribution. Practically every product can be configured as a single-server client-server system and some go beyond that as well. In my view, this trend will continue at an accelerated pace in the future and the only obstacle to this growth that I can see is our inability to manage highly distributed systems effectively. One might question whether this is a computer science problem or an issue that organizational and management science should tackle, but it remains an issue.

We claimed in a 1991 paper [6] that we do not have a handle on the effects of computer network protocols on the performance of distributed DBMSs. This remains largely true today as well, and the problem is becoming more serious. There is a convergence of communications and data management whose synergistic effect provides both challenges and opportunities. There are three major developments in networking that will have profound effects on database management, and I am not convinced that we know how to deal with the effects of these developments. These are (1) the emergence of high-bandwidth, high-speed broadband networks, (2) mobile computing environments, and (3) the explosion of the Internet.

Broadband networks violate almost all of the assumptions that we used to make in designing distributed database systems. The network is no longer the bottleneck, since network speeds can exceed I/O speeds. Some have suggested that the emergence of broadband networks signals the death of distributed databases by making access to a remote centralized database feasible. These arguments miss the point, in my view, since bandwidth and latency are different things and there are motivating factors other than bandwidth and speed for distribution of storage and maintenance of data. However, there is no question that important architectural reevaluation is necessary. There is some work, for example, that investigates the tradeoffs of accessing data from a `neighbor's' cache rather than retrieving it from disk if network speeds make this advantageous. More work on issues such as this, which might turn some of the underlying assumptions of DBMSs on their heads, is necessary.
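The distinction between bandwidth and latency, and the neighbor's-cache tradeoff, can be illustrated with a simple cost model in which fetching an object costs a fixed latency plus its size divided by the bandwidth. The Python sketch below is only a back-of-the-envelope aid: every number in it is a made-up assumption chosen for illustration, not a measurement, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    """A place data can be fetched from, described by two parameters."""

    name: str
    latency_s: float              # fixed delay before the first byte arrives
    bandwidth_bytes_per_s: float  # sustained transfer rate

    def fetch_time(self, size_bytes):
        return self.latency_s + size_bytes / self.bandwidth_bytes_per_s


def fastest(size_bytes, channels):
    """Pick the channel with the smallest estimated fetch time."""
    return min(channels, key=lambda c: c.fetch_time(size_bytes))


# Illustrative (assumed) parameters.
LOCAL_DISK = Channel("local_disk", latency_s=0.010, bandwidth_bytes_per_s=5e6)
NEIGHBOR_CACHE = Channel("neighbor_cache", latency_s=0.001, bandwidth_bytes_per_s=15e6)
REMOTE_SITE = Channel("remote_site", latency_s=0.050, bandwidth_bytes_per_s=15e6)

page = 8 * 1024  # one 8 KB page

choice_lan = fastest(page, [LOCAL_DISK, NEIGHBOR_CACHE])
choice_wan = fastest(page, [LOCAL_DISK, REMOTE_SITE])
print(choice_lan.name, choice_wan.name)
```

Under these assumed numbers a nearby cache beats the local disk, while a distant site with the same bandwidth but higher latency does not. High bandwidth alone therefore does not make a remote centralized database equivalent to local data.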

Mobility is emerging as a major force in the marketplace. Most mobile data management research assumes an environment where data are located in computers on the wireline network with the mobile stations, with limited capabilities, `downloading' data as needed. This is a realistic scenario for a limited number of applications and one that poses no major challenges for data management, since data resides primarily on wireline computers. What is more interesting is the environment in which mobile stations are more powerful and store native data that may need to be shared by others (the so-called `walkstation' case [7]). This case poses significant data management difficulties due to the characteristics of the mobile environment. Mobile computing environments are characterized by three issues [8]: communication characteristics, mobility and portability. Communication is over wireless networks that are prone to disconnections, noise, echo, and low bandwidth. Mobility of some of the equipment on the network causes static data in wireline networks to become dynamic and volatile in wireless networks. Mobility raises issues such as address migration, maintenance of directories and difficulty in locating stations. Finally, portability places restrictions on the type of equipment that can be used in these environments. For example, easy portability and the desire for long operation between battery recharges usually restrict the possible type and size of storage. Dealing with the effects of these is a major R&D issue.

A particular difficulty to be handled is that these two technologies are entering the fray at the same time. Thus, networks of tomorrow will likely be broadband backbones with wireless networks connected to them. Furthermore, some of the broadband backbone may be wireless, going over satellite channels. These networks pose other difficulties since the bandwidth availability is offset by communication latency between earth stations and satellites. In this case, query processing, for example, must take into account quality-of-service considerations. This environment is not too far in the future; even today, Canada is equipped with a country-wide ATM-based broadband test network. There are many such trials all over the world and the large-scale emergence of these networks will make distributed data management over wide-area networks both feasible and an R&D challenge.
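A rough calculation shows how satellite latency can offset bandwidth. The Python sketch below assumes a geostationary satellite directly overhead (altitude about 35,786 km) and an illustrative link rate of 155 Mb/s; both are assumptions for the example, since the text names neither an orbit nor a rate. It computes the minimum propagation delay and the amount of data in flight during one round trip.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
GEO_ALTITUDE_KM = 35_786.0         # assumed: geostationary orbit, satellite overhead


def hop_delay_s(altitude_km=GEO_ALTITUDE_KM):
    """Minimum propagation delay: earth station -> satellite -> earth station."""
    return 2 * altitude_km / SPEED_OF_LIGHT_KM_S


def bits_in_flight(bandwidth_bits_per_s, round_trip_s):
    """Bandwidth-delay product: data sent before the first reply can arrive."""
    return bandwidth_bits_per_s * round_trip_s


one_way = hop_delay_s()   # request reaches the other earth station
round_trip = 2 * one_way  # request plus reply

# Assumed link rate of 155 Mb/s, chosen only for illustration.
in_flight_mb = bits_in_flight(155e6, round_trip) / 8 / 1e6

print(f"one-way delay:  {one_way:.3f} s")
print(f"round trip:     {round_trip:.3f} s")
print(f"data in flight: {in_flight_mb:.2f} MB")
```

Under these assumptions every request-reply exchange costs close to half a second regardless of bandwidth, so a query plan that needs many round trips pays that delay many times over. This is one reason query processing in such a setting has to consider quality of service and not just data volume.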

The explosion of the Internet is now the topic of daily newspaper articles and TV programs. Putting aside the hype, Internet activity is important from a database management perspective simply because of the diversity of repositories that it introduces. Most existing Internet access tools are browsing-based. However, there is a demand to perform complex queries over Internet sites and this poses significant challenges. One of the fundamental problems is the inherent heterogeneity of the information sources and the lack of a schema to guide the querying process. The other difficulty is the variance in the capabilities of the various sites in processing these queries.

Conclusion

In my view, strategic directions for database system research and development efforts can be summarized as follows: We should be addressing the requirements of new application domains by building DBMSs with sufficiently powerful models and flexible and extensible architectures that can exploit and adapt to the technological changes. This is a generic statement that requires fleshing out. The specifics of an R&D agenda along these lines should in my view include the following:

  1. We should be spending time learning the specific requirements of the application domains whose needs we try to serve with our systems. These requirements will, we hope, be generalizable and abstractable to a certain extent, allowing the development of generalized systems. I believe distributed object technology provides a significant handle on the problem. However, this technology needs to be extended to provide full DBMS functionality.
  2. Systems need to be developed with an open architecture that allows for `easy' extension and fine-tuning as well as scalability. The extensibility property is essential to being able to incorporate into a base system the specific requirements of the application domains in which it will be deployed (e.g., temporality, new query primitives, etc.). Scalability is one of the fundamental concerns in meeting the challenges of increasing volumes of data and increasing numbers of users.
  3. Systems need to be developed to exploit high-speed/high-bandwidth networks and need to incorporate mobility as a fundamental design criterion.
  4. Methodologies for interoperability need to be developed. In this regard, distributed object technology can again provide some solutions via industry standards such as CORBA and OLE.

References

[1] E.A. Fox. "Advances in Interactive Digital Multimedia Systems," IEEE Computer, 24(10): 9-21, October 1991.

[2] M.T. Özsu, U. Dayal, and P. Valduriez. Distributed Object Management, Morgan Kaufmann, 1994.

[3] R. Orfali, D. Harkey, and J. Edwards. The Essential Distributed Objects Survival Guide, John Wiley, 1996.

[4] G. Wiederhold. "Mediators in the Architecture of Future Information Systems," IEEE Computer, March 1992, 38-49.

[5] M. Stonebraker. Readings in Database Systems, Morgan Kaufmann, 1988.

[6] M.T. Özsu and P. Valduriez. "Distributed Database Systems: Where Are We Now?," IEEE Computer, August 1991, 68-78.

[7] T. Imielinski and B.R. Badrinath. "Data Management Issues in Mobile Computing," Communications of the ACM, October 1994.

[8] G.H. Forman and J. Zahorjan. "The Challenges of Mobile Computing," IEEE Computer, April 1994.


Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org.