Structured Text
Stream: of characters, words, paragraphs, page images.
Document: monolithic stream, fielded record, nested fields
(tree or forest), graph.
Document collection: sequence, set, graph.
Mechanism: fixed width, superimposed (external) structures,
markup (inline tags).
Terminators: explicit vs. implicit.
Tag semantics: presentation, procedural, descriptive.
Purpose: consistency (creation or update), efficiency
(storage, access)
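A minimal sketch contrasting two of the mechanisms listed above: fixed-width fields (implicit terminators) and inline markup (explicit terminators). The record layout, field names, and sample values are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical record layout: author in 10 columns, year in 4 columns.
FIELDS = [("author", 10), ("year", 4)]

def parse_fixed(line):
    """Fixed width: field boundaries are implied by column position."""
    record, pos = {}, 0
    for name, width in FIELDS:
        record[name] = line[pos:pos + width].rstrip()
        pos += width
    return record

def parse_markup(text):
    """Inline markup: each field is closed by an explicit end tag."""
    return {m.group(1): m.group(2)
            for m in re.finditer(r"<(\w+)>(.*?)</\1>", text)}

fixed_line = "Knuth".ljust(10) + "1968"
marked_line = "<author>Knuth</author><year>1968</year>"

print(parse_fixed(fixed_line))
print(parse_markup(marked_line))
```

Both calls recover the same record; the difference is whether the structure lives in the positions or in the tags.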
Forms:
None (self-defining structure)
Grammars: regular expression, context-free language, regular right-part grammar
Algebraic: data types, relational
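As an illustration of the grammar form, the sketch below treats a regular expression over field names as the schema for a record. The content model (one title, one or more authors, an optional year) is an assumption made up for this example.

```python
import re

# Hypothetical content model: title, then author+, then an optional year.
# Each field name is followed by a space so names cannot run together.
CONTENT_MODEL = re.compile(r"(title )(author )+(year )?")

def conforms(field_names):
    """Return True if the sequence of field names matches the content model."""
    sequence = "".join(name + " " for name in field_names)
    return CONTENT_MODEL.fullmatch(sequence) is not None

print(conforms(["title", "author", "author", "year"]))
print(conforms(["author", "title"]))
```

A context-free grammar would be needed once fields can nest to arbitrary depth; a regular expression only constrains one level of siblings.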
Based on the OEM directed graph model, developed as part of
the Tsimmis project at Stanford.
No schema: no
constraints whatsoever are placed on possible instances.
Example: encyclopedia bibliography
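A schema-less bibliography can be sketched as a small edge-labeled graph. This is a simplified variant assumed for illustration, not the exact OEM definition; the object identifiers, labels, and values are invented. Note that the two entries carry different fields, which nothing forbids.

```python
# Simplified edge-labeled graph: oid -> atomic value,
# or a list of (label, child oid) edges for a complex object.
oem = {
    "&1": [("entry", "&2"), ("entry", "&3")],
    "&2": [("title", "&4"), ("author", "&5")],   # has author, no year
    "&3": [("title", "&6"), ("year", "&7")],     # has year, no author
    "&4": "Algorithms",
    "&5": "Smith",
    "&6": "Databases",
    "&7": 1997,
}

def follow(graph, start, path):
    """Return the set of oids reachable from start along a label path."""
    current = {start}
    for label in path:
        reached = set()
        for oid in current:
            value = graph[oid]
            if isinstance(value, list):  # only complex objects have edges
                reached.update(child for edge_label, child in value
                               if edge_label == label)
        current = reached
    return current

titles = sorted(oem[oid] for oid in follow(oem, "&1", ["entry", "title"]))
print(titles)
```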
Abstraction of relationships that are found in the data
Construction similar to converting NFA to DFA, and as a result
Properties of the DataGuide:
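The NFA-to-DFA analogy can be sketched as a subset construction over an edge-labeled graph: each DataGuide node stands for the set of source objects reached by one label path. The graph encoding and sample data below are assumptions for illustration, and this is not presented as the original system's implementation.

```python
def build_dataguide(graph, root):
    """Subset construction: states are frozensets of source oids."""
    start = frozenset([root])
    guide = {}                 # state -> {label: target state}
    worklist = [start]
    while worklist:
        state = worklist.pop()
        if state in guide:     # already expanded (also handles cycles)
            continue
        targets = {}
        for oid in state:
            value = graph[oid]
            if isinstance(value, list):          # complex object
                for label, child in value:
                    targets.setdefault(label, set()).add(child)
        guide[state] = {label: frozenset(children)
                        for label, children in targets.items()}
        worklist.extend(guide[state].values())
    return start, guide

# Hypothetical source graph: oid -> atomic value or list of (label, oid).
source = {
    "&1": [("entry", "&2"), ("entry", "&3")],
    "&2": [("title", "&4"), ("author", "&5")],
    "&3": [("title", "&6"), ("year", "&7")],
    "&4": "Algorithms", "&5": "Smith", "&6": "Databases", "&7": 1997,
}

start, guide = build_dataguide(source, "&1")
entries = guide[start]["entry"]
print(sorted(guide[entries]))
```

In the result every label path from the root appears exactly once, even though the source has two `entry` edges. As with subset construction in general, the number of states can grow exponentially in the worst case.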
InfoSet: tree-structured, with nodes (information items) for Document, Element, Attribute, Processing Instruction, Unexpanded Entity Reference, Character, Comment, DTD, Unparsed Entity (declaration), Notation (declaration), Namespace. Each node type has defined properties (e.g., children, parent, value).
XPath data model: tree-structured, with nodes for Document, Element, Attribute, Processing Instruction, Text, Comment, Namespace. The properties are similar to, but not identical to, those defined for the corresponding nodes in the InfoSet.
SAX: stream, event-based
DOM: tree-structured, object-oriented (types and methods);
"live" document
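The two API styles can be compared with Python's standard `xml.sax` and `xml.dom.minidom` modules. The sample document and the `TitleHandler` class are hypothetical; the last few lines show one reading of "live", namely that a change made through the tree is immediately visible through references obtained earlier.

```python
import xml.sax
from xml.dom import minidom

XML = "<bib><entry year='1997'><title>Databases</title></entry></bib>"

class TitleHandler(xml.sax.ContentHandler):
    """SAX: the parser pushes events; the handler keeps its own state."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def startElement(self, name, attrs):
        if name == "title":
            self.in_title = True
            self.titles.append("")

    def characters(self, content):
        if self.in_title:            # text may arrive in several chunks
            self.titles[-1] += content

    def endElement(self, name):
        if name == "title":
            self.in_title = False

handler = TitleHandler()
xml.sax.parseString(XML.encode("utf-8"), handler)
print(handler.titles)

# DOM: the whole tree is built first, then navigated through objects.
doc = minidom.parseString(XML)
entry = doc.getElementsByTagName("entry")[0]
children = entry.childNodes          # reference taken before the update
count_before = len(children)
entry.appendChild(doc.createElement("author"))
count_after = len(children)          # same reference, sees the new child
print(count_before, count_after)
```

SAX needs memory only for the handler's own state, while DOM holds the full document but allows random access and in-place updates.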
axes: child, descendant, parent, ancestor, self, descendant-or-self, ancestor-or-self, following-sibling, preceding-sibling, following, preceding, attribute, namespace
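A few of these axes can be sketched as plain tree walks over `xml.dom.minidom` nodes. The function names and the sample document are made up for illustration; this is not an XPath evaluator, only the navigation each axis implies. Reverse axes are returned nearest node first.

```python
from xml.dom import minidom

def descendant(node):
    """All nodes below node, in document order."""
    out = []
    for c in node.childNodes:
        out.append(c)
        out.extend(descendant(c))
    return out

def ancestor(node):
    """Parent, grandparent, ... up to and including the document node."""
    out = []
    p = node.parentNode
    while p is not None:
        out.append(p)
        p = p.parentNode
    return out

def following_sibling(node):
    out = []
    s = node.nextSibling
    while s is not None:
        out.append(s)
        s = s.nextSibling
    return out

def preceding_sibling(node):
    out = []
    s = node.previousSibling
    while s is not None:
        out.append(s)
        s = s.previousSibling
    return out

def names(nodes):
    return [n.nodeName for n in nodes]

doc = minidom.parseString("<a><b><c/></b><d/><e/></a>")
a = doc.documentElement
b = a.firstChild
c = b.firstChild
e = a.lastChild

print(names(descendant(a)))
print(names(ancestor(c)))
print(names(following_sibling(b)))
print(names(preceding_sibling(e)))
```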
Querying XML, Chapters 2, 3, 6, 9