XBench - A Family of Benchmarks for XML DBMSs

XML (eXtensible Markup Language), a subset of SGML (Standard Generalized Markup Language), is a specification proposed by the World Wide Web Consortium (W3C) to complement HTML (Hypertext Markup Language) for electronic data representation and exchange on the Web. Because XML is self-describing, it is increasingly used in application domains such as chemistry, biology, medicine and e-business. As a result, large numbers of XML documents are being generated, which raises the demand for managing them efficiently. Researchers in both industry and academia have been working on efficient ways to store, manipulate, and retrieve XML documents. Both the performance characteristics of individual approaches and the relative performance of different systems remain an ongoing concern.

XBench is a family of benchmarks that capture the characteristics of different XML applications. These applications are categorized as data-centric or text-centric, and the corresponding databases can consist of a single document or of multiple documents. In data-centric (DC) applications, the database stores data that are captured in XML even though the original data may not be in XML. Examples include e-commerce catalog data or transactional data captured as XML. Text-centric (TC) applications manage actual text documents and use a database of native XML documents. Examples include book collections in a digital library or news article archives. The single-document (SD) case covers databases, such as an e-commerce catalog, that consist of one document with a complex structure (deeply nested elements). The multiple-document (MD) case covers databases that contain a set of XML documents, such as an archive of news articles or a collection of transactional data. The database generator therefore has to handle four classes: DC/SD, DC/MD, TC/SD, and TC/MD. The XBench database generator can generate databases in any of these classes, ranging from 10MB to 10GB in size.
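The four database classes are simply the cross product of two independent choices (data-centric vs. text-centric, single vs. multiple documents). The sketch below enumerates them and checks a requested size against the 10MB to 10GB range stated above. It is illustrative only: the names `Content`, `Layout`, `database_classes`, and `check_request` are hypothetical and are not part of the XBench generator's actual interface, and treating "MB"/"GB" as decimal units is an assumption.

```python
from enum import Enum
from itertools import product


class Content(Enum):
    """Application category (hypothetical names for this sketch)."""
    DC = "data-centric"
    TC = "text-centric"


class Layout(Enum):
    """Database layout: one document or a set of documents."""
    SD = "single document"
    MD = "multiple documents"


# Size range stated on this page; decimal units (10**6, 10**9) are an assumption.
MIN_BYTES = 10 * 10**6   # 10 MB
MAX_BYTES = 10 * 10**9   # 10 GB


def database_classes():
    """Return the four class labels in the order DC/SD, DC/MD, TC/SD, TC/MD."""
    return [f"{c.name}/{l.name}" for c, l in product(Content, Layout)]


def check_request(content, layout, size_bytes):
    """Validate a hypothetical generation request and return its class label."""
    if not MIN_BYTES <= size_bytes <= MAX_BYTES:
        raise ValueError(f"size {size_bytes} bytes is outside 10MB..10GB")
    return f"{content.name}/{layout.name}"


print(database_classes())  # ['DC/SD', 'DC/MD', 'TC/SD', 'TC/MD']
```

Modelling the two axes as separate enums keeps each class label derivable rather than hard-coded, so the four combinations stay consistent with the classification described above.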

The workload specification covers the functionality of XQuery as captured in the XML Query Use Cases. Each query is slightly varied to fit the specifics of the application domain.

The early phases of the project were carried out jointly with the IBM Toronto Laboratory and were funded by the Centers for Advanced Studies (CAS).

University of Waterloo

School of Computer Science

Database Research Group

Benjamin Bin Yao

Last modified: Fri Sep 5 04:10:00 EST 2003