CS 741 Non-Traditional Databases

Fall 2009: Text Databases


Frank Tompa
DC 1313, x34675


The Fall 2009 offering covers the management of text databases. At the end of the course, students will be able to design or evaluate a database subsystem capable of supporting the needs of text creators and users who wish to access machine-readable text interactively or use XML for data interchange. Students will be familiar with structured text standards, including XML. They will be able to design wrappers, storage structures, and index methods appropriate for text, and will understand traditional text applications, such as information retrieval.


Students are expected to understand the fundamentals of database systems, programming language specifications, and data structures and algorithms, each at least at the level of an introductory course.



J. Melton and S. Buxton, Querying XML: XQuery, XPath, and SQL/XML in Context. Morgan Kaufmann, 2006.

See also:

Workload and Evaluation

Notice on Absences

There will be no tests or exams.


Please review the materials concerning plagiarism and academic honesty. You must complete and sign the Academic Integrity Acknowledgement Form and hand it in by class time on Thursday, October 1.


Special seminar: Dirk Van Gucht, The Duality Between Query Languages and Index Structures, Thursday, November 19, 9:30-10:30, DC 1304.

Note: non-standard start time; non-standard room

Tuesdays and Thursdays 10-11:20 am
RCH 106

Office hours: Mondays 2-5 pm

Outline (tentative)

 1.   Introduction

Text-dominated databases. Overview of W3C's XML specifications: XML core, query language, schema, and transformations. Common applications of XML.

 2.   Data Model

Structured text data model(s) and properties. DOM and SAX. XPath and other path expressions.

 3.   Text DDLs

DTDs, XML Schema, RELAX NG.

 4.   Text DMLs

XQuery FLWOR expressions. Full-text facilities.

 5.   Data Storage and Indexing

Techniques for storing structured text, including graph and interval encodings. Text indexing techniques. Indexing semi-structured data.

 6.   Updates and Transformations

Support for updates. Transaction management. XSLT.

 7.   XQuery Formal Semantics

XQuery core. Static typing. Dynamic semantics.

 8.   Query Processing and Optimization

Region algebras. Algorithms for native processing of queries. Query optimization techniques.

 9.   Relational Encodings

Mappings to and from the relational model. SQL/XML.

10.  View Matching

Using materialized SQL/XML views to answer SQL/XML queries.

11.  Web Services

Internationalization. SOAP. WSDL.

12.  Streaming Text

Publish-subscribe systems. Pre-filtering documents.