XBench - A Family of Benchmarks for XML DBMSs


Data Gathering Methodology

text-centric/single document

text-centric/multiple documents

data-centric/single document

data-centric/multiple documents

Analyze real XML documents and extract statistical data;
There is a sufficient number of real text-centric XML documents available for analysis. For the data-centric classes, however, real XML data is hard to obtain: most data-centric XML documents are currently relational data that is translated into XML only for communication. Therefore, the schema of the TPC-W benchmark is used and mapped to XML.
Generalize the characterizations of XML documents in each category;
Create synthetic data to simulate the XML documents in each category.
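To make the TPC-W mapping in step 1 more concrete, here is a minimal Python sketch of one way to turn relational rows into nested XML by following a foreign key. The rows are hypothetical and simplified, loosely modeled on TPC-W's ORDERS and ORDER_LINE tables with only a few columns shown, and the element names are illustrative assumptions; XBench's actual mapping may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified rows loosely modeled on TPC-W's ORDERS and
# ORDER_LINE tables; the real schema has many more columns.
orders = [
    {"O_ID": 1, "O_C_ID": 42, "O_DATE": "2002-12-09"},
]
order_lines = [
    {"OL_ID": 1, "OL_O_ID": 1, "OL_I_ID": 7, "OL_QTY": 2},
    {"OL_ID": 2, "OL_O_ID": 1, "OL_I_ID": 9, "OL_QTY": 1},
]


def orders_to_xml(orders, order_lines):
    """Map each order row to an <order> element and nest its order lines
    under it by matching the OL_O_ID foreign key (one possible mapping)."""
    root = ET.Element("orders")
    for row in orders:
        order = ET.SubElement(root, "order", id=str(row["O_ID"]))
        ET.SubElement(order, "customer_id").text = str(row["O_C_ID"])
        ET.SubElement(order, "date").text = row["O_DATE"]
        lines = ET.SubElement(order, "order_lines")
        for ol in order_lines:
            if ol["OL_O_ID"] == row["O_ID"]:
                line = ET.SubElement(lines, "order_line", id=str(ol["OL_ID"]))
                ET.SubElement(line, "item_id").text = str(ol["OL_I_ID"])
                ET.SubElement(line, "quantity").text = str(ol["OL_QTY"])
    return root


print(ET.tostring(orders_to_xml(orders, order_lines), encoding="unicode"))
```

Nesting child rows under their parent, rather than emitting flat per-table elements, is a design choice that produces the deeper trees typical of data-centric XML.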

Document Characterization

Element types: The collection of all element types that appear in XML documents.
Tree structure of element types: The relationships among all element types in the collection, indicating the parent/child relationship between each pair of element types, if such a relationship exists.
Distribution of children to elements: For each element type, the probability distribution of the instance occurrences of each of its child element (direct sub-element) types.
Distribution of element values to types: The probability distribution of the values of each element type.
Attribute names: The collection of all attribute names in an XML document.
Distribution of attribute values to names: The probability distribution of the values of each attribute.
Distribution of attributes to elements: The probability distribution of the attributes of each element type.
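As an illustration of the characterization items above, the following Python sketch collects raw frequency counts for each of them from a single XML document using the standard library's `xml.etree.ElementTree`. It is not XBench's own analysis tool. The function names `characterize` and `to_probabilities` are hypothetical, and only non-blank element text is counted as an element value.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def characterize(xml_text):
    """Return raw frequency counts for the characterization items of one document."""
    root = ET.fromstring(xml_text)
    elements = list(root.iter())

    # Element types, and the tree structure as parent type -> set of child types.
    element_types = {e.tag for e in elements}
    tree = defaultdict(set)
    for e in elements:
        for child in e:
            tree[e.tag].add(child.tag)

    # Distribution of children to elements: for each (parent, child) type pair,
    # how many parent instances have exactly k children of that type (k may be 0).
    child_dist = defaultdict(Counter)
    for e in elements:
        counts = Counter(child.tag for child in e)
        for child_tag in tree.get(e.tag, ()):
            child_dist[(e.tag, child_tag)][counts[child_tag]] += 1

    # Element values per type (non-blank text only), attribute values per
    # attribute name, and attribute names per element type.
    value_dist = defaultdict(Counter)
    attr_values = defaultdict(Counter)
    attrs_per_element = defaultdict(Counter)
    for e in elements:
        if e.text and e.text.strip():
            value_dist[e.tag][e.text.strip()] += 1
        for name, value in e.attrib.items():
            attr_values[name][value] += 1
            attrs_per_element[e.tag][name] += 1

    return {
        "element_types": element_types,
        "tree": dict(tree),
        "child_dist": dict(child_dist),
        "value_dist": dict(value_dist),
        "attribute_names": set(attr_values),
        "attr_values": dict(attr_values),
        "attrs_per_element": dict(attrs_per_element),
    }


def to_probabilities(counter):
    """Normalize a Counter of raw frequencies into a probability distribution."""
    total = sum(counter.values())
    return {key: n / total for key, n in counter.items()}


sample = """<catalog>
  <item id="1" lang="en"><title>A</title><author>X</author><author>Y</author></item>
  <item id="2"><title>B</title></item>
</catalog>"""
stats = characterize(sample)
print(to_probabilities(stats["child_dist"][("item", "author")]))  # {2: 0.5, 0: 0.5}
```

Counting zero occurrences matters here: the second `item` has no `author`, and dropping that case would overstate how often authors appear.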

Database Generator

ToXgene
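XBench uses ToXgene to generate its databases; ToXgene's own input format is not reproduced here. Purely to illustrate the idea behind step 3 of the methodology (creating synthetic data that follows the extracted distributions), here is a minimal Python sketch. It is not ToXgene. The distributions and value pools (`CHILD_DIST`, `VALUES`) are invented for the example, and the generator assumes the parent/child structure is acyclic.

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical characterization (invented for illustration): for each parent
# type, its child types and a probability distribution over how many
# instances of that child to generate. Must describe an acyclic tree.
CHILD_DIST = {
    "catalog": {"item": {3: 0.5, 4: 0.5}},
    "item": {"title": {1: 1.0}, "author": {1: 0.6, 2: 0.3, 3: 0.1}},
}
# Hypothetical value pools for leaf element types.
VALUES = {
    "title": ["XML Basics", "Query Processing"],
    "author": ["Smith", "Lee", "Chen"],
}


def sample_count(dist, rng):
    """Draw a child-instance count k with probability dist[k]."""
    counts = list(dist)
    return rng.choices(counts, weights=[dist[k] for k in counts])[0]


def generate(tag, rng):
    """Recursively build a synthetic element of the given type."""
    elem = ET.Element(tag)
    children = CHILD_DIST.get(tag)
    if not children:
        # Leaf element: draw its text value from the pool for this type.
        elem.text = rng.choice(VALUES.get(tag, [""]))
        return elem
    for child_tag, dist in children.items():
        for _ in range(sample_count(dist, rng)):
            elem.append(generate(child_tag, rng))
    return elem


rng = random.Random(0)  # fixed seed so the output is reproducible
doc = generate("catalog", rng)
print(ET.tostring(doc, encoding="unicode"))
```

Seeding the random generator makes a generated database reproducible, which is useful when the same benchmark data must be regenerated on different systems.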

Downloads

Database Size

Please note that ToXgene currently takes a long time to generate "large" databases and cannot generate "huge" databases. These limitations affect XBench data generation as well. We will inform registered users when these issues are resolved.

University of Waterloo

School of Computer Science

Database Research Group

Benjamin Bin Yao

Last modified: Mon Dec 9 09:13:17 EST 2002