Structured Text

Text models

Stream: a sequence of characters, words, paragraphs, or page images.

Document: monolithic stream, fielded record, nested fields (tree or forest), graph.

Document collection: sequence, set, graph.

Text extents

Mechanism: fixed width, superimposed (external) structures, markup (inline tags).

Terminators: explicit vs. implicit.

Tag semantics: presentation, procedural, descriptive.
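The three extent mechanisms can be contrasted on a toy example. This is a minimal sketch under stated assumptions: the column layout, tag names, and the helpers `parse_fixed_width`, `apply_superimposed`, and `strip_markup` are hypothetical. It also assumes that extents do not overlap and that inline tags always use explicit terminators.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical fixed-width layout: (field name, start column, end column).
FIXED_LAYOUT = [("surname", 0, 10), ("year", 10, 14)]

# An extent is (start offset, end offset, tag) over a plain string.
Extent = Tuple[int, int, str]

TAG = re.compile(r"<(/?)(\w+)>")


def parse_fixed_width(line: str) -> Dict[str, str]:
    """Fixed width: field extents are implied by column positions alone."""
    return {name: line[start:end].strip() for name, start, end in FIXED_LAYOUT}


def apply_superimposed(text: str, extents: List[Extent]) -> str:
    """Turn external (start, end, tag) extents into inline tags.

    Assumes the extents do not overlap; every tag gets an explicit terminator.
    """
    out, pos = [], 0
    for start, end, tag in sorted(extents):
        out.append(text[pos:start])
        out.append(f"<{tag}>{text[start:end]}</{tag}>")
        pos = end
    out.append(text[pos:])
    return "".join(out)


def strip_markup(marked: str) -> Tuple[str, List[Extent]]:
    """Inverse direction: remove inline tags and record their extents externally.

    Assumes well-nested tags with explicit end tags and no repeated open tag name.
    """
    plain: List[str] = []
    extents: List[Extent] = []
    open_at: Dict[str, int] = {}
    pos, length = 0, 0
    for m in TAG.finditer(marked):
        chunk = marked[pos:m.start()]
        plain.append(chunk)
        length += len(chunk)
        is_end, tag = m.group(1) == "/", m.group(2)
        if is_end:
            extents.append((open_at.pop(tag), length, tag))
        else:
            open_at[tag] = length
        pos = m.end()
    plain.append(marked[pos:])
    return "".join(plain), sorted(extents)


text = "Alice wrote the report in 1999."
extents = [(0, 5, "author"), (26, 30, "year")]
marked = apply_superimposed(text, extents)
print(marked)
print(strip_markup(marked) == (text, extents))
print(parse_fixed_width("Smith     1999"))
```

The superimposed form leaves the text itself untouched and keeps the structure in a separate list. Inline markup, by contrast, embeds the structure directly in the character stream.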

Constraints

Purpose: consistency (on creation or update) and efficiency (of storage and access).

Forms:

None (self-defining structure)

Grammars: regular expression, context-free language, regular right-part grammar

Algebraic: data types, relational
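To make the grammar form concrete: a regular right-part grammar gives each nonterminal (here, each element name) a production whose right-hand side is a regular expression. DTD content models constrain XML in essentially this way. The following is a minimal sketch that assumes a hypothetical bibliography grammar and a tree encoding invented for illustration. `GRAMMAR` and `valid` do not come from any library.

```python
import re
from typing import Dict, List, Tuple

# A node is (tag, children); text content is omitted to keep the sketch short.
Node = Tuple[str, List["Node"]]

# Hypothetical grammar: one production per element name whose right-hand side is
# a regular expression over child tag names (each name followed by a space).
# Roughly the DTD rules   bib: entry*   entry: author+, title, year?
GRAMMAR: Dict[str, "re.Pattern[str]"] = {
    "bib": re.compile(r"(entry )*"),
    "entry": re.compile(r"(author )+title (year )?"),
    "author": re.compile(r""),  # leaves: no child elements allowed
    "title": re.compile(r""),
    "year": re.compile(r""),
}


def valid(node: Node, grammar=GRAMMAR) -> bool:
    """Check a tree against a regular right-part grammar, top-down."""
    tag, children = node
    model = grammar.get(tag)
    if model is None:  # undeclared element name
        return False
    # Encode the child sequence as "name name ... " and match the whole string.
    sequence = "".join(child[0] + " " for child in children)
    return model.fullmatch(sequence) is not None and all(
        valid(child, grammar) for child in children
    )


good = ("bib", [
    ("entry", [("author", []), ("author", []), ("title", []), ("year", [])]),
    ("entry", [("author", []), ("title", [])]),
])
bad = ("bib", [("entry", [("title", []), ("author", [])])])
print(valid(good), valid(bad))
```

A single regular expression cannot check arbitrarily deep nesting on its own. Here the recursion over the tree (the context-free part) handles nesting, and the regular expressions handle each element's sequence of children.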

Semi-structured data

Lore data model

Based on the OEM (Object Exchange Model) directed-graph model, developed as part of the TSIMMIS project at Stanford.

No schema: no constraints whatsoever are placed on possible instances.
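An OEM instance can be pictured as a set of labeled edges between object identifiers (OIDs), with atomic values at the leaves. The sketch below is a minimal illustration built on a hypothetical two-book instance, not the encyclopedia bibliography example, and on hypothetical helpers `follow` and `values`. The two `book` objects have different subobjects, which is allowed precisely because nothing constrains instances.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical OEM-style instance. Complex objects are OIDs ("&1", ...),
# each edge is (source OID, label, target OID), and atomic objects map to values.
EDGES: List[Tuple[str, str, str]] = [
    ("&1", "book", "&2"),
    ("&1", "book", "&3"),
    ("&2", "title", "&4"),
    ("&2", "author", "&5"),
    ("&3", "title", "&6"),
    ("&3", "editor", "&7"),  # this book has an editor and a year, the other does not
    ("&3", "year", "&8"),
]
ATOMIC: Dict[str, object] = {
    "&4": "Title A", "&5": "Author X", "&6": "Title B", "&7": "Editor Y", "&8": 1997,
}


def follow(roots: Set[str], path: List[str]) -> Set[str]:
    """Return every OID reachable from `roots` by the given sequence of labels."""
    current = set(roots)
    for label in path:
        current = {t for s, lab, t in EDGES if s in current and lab == label}
    return current


def values(oids: Set[str]) -> List[object]:
    """Atomic values of the reached objects (complex objects are skipped)."""
    return sorted((ATOMIC[o] for o in oids if o in ATOMIC), key=str)


print(values(follow({"&1"}, ["book", "title"])))
print(values(follow({"&1"}, ["book", "year"])))
```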

 

Example: encyclopedia bibliography

DataGuide

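An abstraction (summary) of the label-path structure actually found in the data, derived from the instance rather than declared in advance as a schema.

Construction is similar to converting an NFA to a DFA (the subset construction); as a result: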

  1. For every path, the set of all OIDs (or values) reachable from a root in the database instance by that path will be represented by a unique node in the DataGuide.
  2. For every edge in the DataGuide with label L connecting N1 to N2, there will be some edge <S,L,T> in the instance such that the OID for S is in the set corresponding to N1 and the OID for T is in the set corresponding to N2.
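The subset construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: edges are `(source OID, label, target OID)` triples, the database has a single root, and `build_dataguide` and the instance are hypothetical. Each DataGuide node is represented directly by its set of OIDs, so equal target sets are merged automatically.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source OID, label, target OID)
DGNode = FrozenSet[str]      # a DataGuide node = a set of database OIDs


def build_dataguide(
    edges: List[Edge], root: str
) -> Tuple[DGNode, Dict[DGNode, Dict[str, DGNode]]]:
    """Strong DataGuide by subset construction (like NFA -> DFA).

    Each DataGuide node is the target set of the label paths that reach it;
    equal target sets are merged, which keeps the result finite even when
    the database has cycles.
    """
    out: Dict[str, List[Tuple[str, str]]] = {}
    for source, label, target in edges:
        out.setdefault(source, []).append((label, target))

    start: DGNode = frozenset([root])
    guide: Dict[DGNode, Dict[str, DGNode]] = {}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in guide:  # already expanded (same target set seen before)
            continue
        # Group the outgoing database edges of every OID in this node by label.
        by_label: Dict[str, Set[str]] = {}
        for oid in node:
            for label, target in out.get(oid, []):
                by_label.setdefault(label, set()).add(target)
        # One DataGuide edge per label: deterministic, like a DFA transition.
        guide[node] = {label: frozenset(ts) for label, ts in by_label.items()}
        queue.extend(guide[node].values())
    return start, guide


# Hypothetical instance with shared objects and a cycle (2 -b-> 3 -a-> 2).
EDGES: List[Edge] = [
    ("r", "a", "1"), ("r", "a", "2"),
    ("1", "b", "3"), ("2", "b", "3"),
    ("2", "c", "4"), ("3", "a", "2"),
]
start, guide = build_dataguide(EDGES, "r")
for node, labelled in guide.items():
    for label, target in sorted(labelled.items()):
        print(sorted(node), label, sorted(target))
```

`{1, 2}` and `{2}` become different DataGuide nodes because the label paths `a` and `a.b.a` reach different target sets, even though both sets contain object 2.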

 

Properties of the DataGuide:

  1. Every path starting from a root in the database has one and only one corresponding path in the DataGuide with the same sequence of labels.
  2. Cycles in the database correspond to cycles in the DataGuide.
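Both properties can be checked on a small example. The sketch below is a minimal illustration that assumes a hypothetical instance and hypothetical helper names. It rebuilds the DataGuide compactly, using the same target-set idea, and compares the label paths of the database and the DataGuide up to a fixed depth. This is a bounded check on one instance, not a proof.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source OID, label, target OID)
Path = Tuple[str, ...]


def build_dataguide(edges: List[Edge], root: str):
    """Compact strong-DataGuide construction: nodes are target sets of OIDs."""
    start = frozenset([root])
    guide: Dict[FrozenSet[str], Dict[str, FrozenSet[str]]] = {}
    stack = [start]
    while stack:
        node = stack.pop()
        if node in guide:
            continue
        labels = {lab for s, lab, _ in edges if s in node}
        # A dict allows one successor per label, so each label path has one DataGuide path.
        guide[node] = {
            lab: frozenset(t for s, l2, t in edges if s in node and l2 == lab)
            for lab in labels
        }
        stack.extend(guide[node].values())
    return start, guide


def db_label_paths(edges: List[Edge], root: str, depth: int) -> Set[Path]:
    """Label sequences (length <= depth) of all paths from the root in the database."""
    paths: Set[Path] = set()
    frontier: List[Tuple[Path, str]] = [((), root)]
    for _ in range(depth):
        frontier = [(p + (lab,), t) for p, oid in frontier for s, lab, t in edges if s == oid]
        paths.update(p for p, _ in frontier)
    return paths


def dg_label_paths(start, guide, depth: int) -> Set[Path]:
    """Label sequences (length <= depth) of all paths from the DataGuide root."""
    paths: Set[Path] = set()
    frontier: List[Tuple[Path, FrozenSet[str]]] = [((), start)]
    for _ in range(depth):
        frontier = [(p + (lab,), n) for p, node in frontier for lab, n in guide[node].items()]
        paths.update(p for p, _ in frontier)
    return paths


# Hypothetical instance containing the cycle 2 -b-> 3 -a-> 2.
EDGES: List[Edge] = [
    ("r", "a", "1"), ("r", "a", "2"),
    ("1", "b", "3"), ("2", "b", "3"),
    ("2", "c", "4"), ("3", "a", "2"),
]
start, guide = build_dataguide(EDGES, "r")

# Property 1 (bounded): the database and the DataGuide have the same label paths.
for depth in range(1, 7):
    assert db_label_paths(EDGES, "r", depth) == dg_label_paths(start, guide, depth)

# Property 2: the guide is finite yet has label paths of any length, so it has a cycle.
print(len(guide), max(len(p) for p in dg_label_paths(start, guide, 10)))
```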

XML data models

XML interfaces

SAX: streaming, event-based

DOM: tree-structured, object-oriented (types and methods); “live” document
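Python's standard library provides both interfaces, which makes the contrast easy to see. The following is a minimal sketch with a hypothetical document and hypothetical helper names. `xml.sax` delivers start-tag, character, and end-tag events to a handler and never builds a tree. `xml.dom.minidom`, a minimal rather than complete DOM implementation, parses the whole document into an in-memory tree that can be navigated and modified.

```python
import xml.sax
from typing import List
from xml.dom import minidom

# Hypothetical document used only for illustration.
DOC = """<bib>
  <book year="1999"><title>Title A</title></book>
  <book year="2005"><title>Title B</title></book>
</bib>"""


class TitleCollector(xml.sax.ContentHandler):
    """SAX: the parser pushes events; the handler keeps only the state it needs."""

    def __init__(self) -> None:
        super().__init__()
        self.in_title = False
        self.buffer: List[str] = []
        self.titles: List[str] = []

    def startElement(self, name, attrs):
        if name == "title":
            self.in_title = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so buffer it until the end tag.
        if self.in_title:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.in_title = False
            self.titles.append("".join(self.buffer))


def titles_sax(text: str) -> List[str]:
    handler = TitleCollector()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.titles


def titles_dom(text: str) -> List[str]:
    """DOM: build the whole tree first, then navigate it with methods."""
    dom = minidom.parseString(text)
    return [t.firstChild.data for t in dom.getElementsByTagName("title")]


def titles_after_removing_first_book(text: str) -> List[str]:
    """DOM trees can be updated in place; later navigation sees the change."""
    root = minidom.parseString(text).documentElement
    root.removeChild(root.getElementsByTagName("book")[0])
    return [t.firstChild.data for t in root.getElementsByTagName("title")]


print(titles_sax(DOC), titles_dom(DOC), titles_after_removing_first_book(DOC))
```

The SAX handler's memory use depends only on what it chooses to remember, while the DOM version holds the entire document at once.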

XPath expressions
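A few XPath-style location paths, evaluated with Python's `xml.etree.ElementTree`, give the flavour of path expressions. ElementTree supports only a limited subset of XPath 1.0 syntax: child steps, `//`, `.`, `..`, `*`, and simple predicates. For the complete language, a library such as lxml is needed. The document below is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document used only for illustration.
DOC = """<bib>
  <book year="1999"><title>Title A</title><author>Author X</author></book>
  <book year="2005"><title>Title B</title><editor>Editor Y</editor></book>
</bib>"""

root = ET.fromstring(DOC)

# Child steps: titles of all books directly under the root.
all_titles = [t.text for t in root.findall("./book/title")]

# Attribute-value predicate.
titles_1999 = [t.text for t in root.findall("./book[@year='1999']/title")]

# Descendant abbreviation (//) with an existence predicate on a child element.
years_with_editor = [b.get("year") for b in root.findall(".//book[editor]")]

# Positional predicate (1-based, as in XPath).
second_title = root.find("./book[2]/title").text

print(all_titles, titles_1999, years_with_editor, second_title)
```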

References and related reading

Querying XML, Chapters 2, 3, 6, 9

Tompa89

Raymond96

Coombs87

 

Suciu97

Garcia-Molina97

Goldman97

Nestorov97b

Consens08

W3C Data Model

XPath

Brownell

W3C DOM