XBench - A Family of Benchmarks for XML DBMSs
XML (eXtensible Markup Language), a subset of SGML
(Standard Generalized Markup Language), is a
specification proposed by the World Wide Web Consortium (W3C) to
complement HTML (Hypertext Markup Language) for
electronic data representation and exchange on the Web. Because it
is self-describing, it is beginning to be extensively used in
various application domains such as chemistry, biology, medicine
and e-business. As a result, large numbers of XML documents are
being generated, which has raised the demand for their efficient
management. Researchers in both industry and academia have been
focusing on efficiently storing, manipulating, and retrieving XML
documents. The individual performance characteristics of different
approaches as well as the relative performance of various systems
are an ongoing concern.
XBench is a family of benchmarks that capture different
XML application characteristics. These applications are
categorized as data-centric or text-centric and the
corresponding databases can consist of single documents or
multiple documents. In data-centric (DC) applications, the
database stores data that are captured in XML even
though the original data may not be in XML. Examples include
e-commerce catalog data or transactional data that is
captured as XML. Text-centric (TC) applications manage actual
text documents and use a database of native XML
documents. Examples include book collections in a digital library,
or news article archives. The single document (SD)
case covers those databases, such as an e-commerce catalog, that
consist of a single document with complex
structures (deeply nested elements), while the multiple document (MD)
case covers those databases that contain a set of
XML documents, such as an archive of news documents or
transactional data. The result is a requirement for a
database generator that can handle four cases: DC/SD, DC/MD,
TC/SD, and TC/MD. The XBench database generator
can generate databases in any of these classes ranging from
10 MB to 10 GB in size.
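The four database classes are simply the pairings of the two content types with the two document layouts. A minimal Python sketch of that classification follows; the enum and function names are hypothetical illustrations and are not part of the XBench generator itself.

```python
from enum import Enum
from itertools import product


class Content(Enum):
    """Application content type (hypothetical names, for illustration only)."""
    DC = "data-centric"
    TC = "text-centric"


class Layout(Enum):
    """Database layout (hypothetical names, for illustration only)."""
    SD = "single document"
    MD = "multiple documents"


def database_classes():
    """Return the class labels formed by pairing each content type with each layout."""
    return [f"{content.name}/{layout.name}"
            for content, layout in product(Content, Layout)]


if __name__ == "__main__":
    for label in database_classes():
        print(label)
```

Running the script lists the same four labels named above: DC/SD, DC/MD, TC/SD, and TC/MD.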
The workload specification covers the functionality of XQuery
as captured in the XML Query Use Cases. Each of these queries
is slightly varied to fit the specifics of the application domain.
The early phases of the project were carried out jointly with the IBM Toronto Laboratory and were funded by
the Centers for Advanced Studies (CAS).