Text Data Manipulation Languages

Background

Conventional languages

SQL and OQL

XPath

Selection of components, but no ability to reshape them or create new ones
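A minimal sketch of that limitation, using Python's standard xml.etree.ElementTree module, which supports a subset of XPath. The sample document and its element names are invented for illustration. The path expression only returns nodes that already exist in the tree. Building a differently shaped result is left to the host language (or to a language such as XQuery).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element and attribute names are assumptions.
doc = ET.fromstring(
    "<bib>"
    "<ref><cite><title date='1965'>Work A</title></cite></ref>"
    "<ref><cite><title date='1950'>Work B</title></cite></ref>"
    "</bib>"
)

# Selection: the path picks out existing title elements, unchanged.
titles = doc.findall(".//cite/title")
selected_texts = [t.text for t in titles]

# The selected nodes are the very objects stored in the tree, not copies.
same_nodes = all(any(t is node for node in doc.iter("title")) for t in titles)

# Reshaping (here, wrapping each title in a new <work> element) has to be
# done by the host language, outside the path expression.
works = ET.Element("works")
for t in titles:
    work = ET.SubElement(works, "work", year=t.get("date"))
    work.text = t.text
reshaped = ET.tostring(works, encoding="unicode")
```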

Lorel query language

For use with the Lore semi-structured data model (OEM, the Object Exchange Model: a labeled directed graph)

Recall: EB bibliography example

What works written after 1960 appear in references that include at least two citation segments?

 

     select distinct m.title.t
     from eb_bib.para.ref r, r.cite m, r.cite n
     where (m != n) and (m.title.date > 1960)

 

Why a new language?

Roots in conventional database querying + information retrieval + Web search engines

XML vs. relations

Requirements

XQuery Data Model

FLWOR expressions

For-Let-Where-Order By-Return clauses
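One way to read the five clauses is as a loop pipeline. The sketch below mimics them in Python over a small invented document. It is an analogy only, not a description of how an XQuery processor evaluates a query. The element names (books, book, title, price) and the function name flwor_demo are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; names and values are assumptions.
SAMPLE = """
<books>
  <book><title>B</title><price>30</price></book>
  <book><title>A</title><price>55</price></book>
  <book><title>C</title><price>70</price></book>
</books>
"""

def flwor_demo(xml_text, min_price):
    root = ET.fromstring(xml_text)
    tuples = []
    for book in root.findall("book"):            # for   $b in /books/book
        price = float(book.findtext("price"))    # let   $p := $b/price
        if price > min_price:                    # where $p > min_price
            tuples.append((price, book))
    # order by $p descending
    tuples.sort(key=lambda pair: pair[0], reverse=True)
    # return: construct new elements from the surviving bindings
    result = ET.Element("expensive")
    for price, book in tuples:
        item = ET.SubElement(result, "item")
        item.text = book.findtext("title")
    return ET.tostring(result, encoding="unicode")

output = flwor_demo(SAMPLE, 50)
```

Unlike a path expression on its own, the return step builds new elements, which is the reshaping ability that XPath lacks.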

Other expressions

Functions and Operators

User-defined functions

Example, using EB bibliography

What works written after 1960 appear in references that include at least two citation segments?

 

<biblio>
 {
 (: $t ranges over title elements; distinct-values() would atomize them,
    leaving no @date or @edition to navigate to :)
 for $r in doc("eb-bib.xml")//ref,
     $t in $r/cite/title
 let $c := $r/cite
 where $t/@date > 1960
   and count($c) > 1
 order by number($t/@date) descending
 return
    <citation>
         {
         for $a in $r//author
         return $a
         }
         <work>{ $t/text() }, Ed. { data($t/@edition) }</work>
         <date>{ data($t/@date) }</date>
    </citation>
 }
</biblio>

For more examples, examine some of the XML Use Cases or examples available through MonetDB. A list of implementations is maintained by W3C.

Full-text support

Extensions for keyword search, including boolean combinations, stemming, proximity, thesaurus expansion, stopwords, ordering:

ft-selection → ft-boolean-expr ft-filter*
ft-boolean-primary → ft-primary ft-match-option*
ft-match-option → ft-language | ft-wildcard | ft-thesaurus | ft-stem | ft-case | ft-diacritics | ft-stopwords | ft-extension
ft-filter → ft-order | ft-window | ft-distance | ft-scope | ft-anchor

     for $book in doc("http://bstore1.example.com/full-text.xml")/books/book
     let $cont := $book/content
     where $cont contains text "software" ftand "developer" using stemming
           distance at most 3 words
     return $book
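The stemming and proximity options in the query above can be approximated in a few lines of Python. This is a rough sketch under stated assumptions, not the matching algorithm of the full-text specification. The stemmer toy_stem is a hypothetical crude suffix stripper standing in for a real one, both terms are stemmed for simplicity, and distance is taken to be the number of words lying between two matches (so adjacent words are at distance 0).

```python
import re

def toy_stem(word):
    # Crude suffix stripping; a stand-in for a real stemmer (assumption).
    word = word.lower()
    for suffix in ("ers", "er", "ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def positions(tokens, term):
    # Token positions whose stem equals the stem of the search term.
    target = toy_stem(term)
    return [i for i, tok in enumerate(tokens) if toy_stem(tok) == target]

def ft_and_within(text, term_a, term_b, max_words):
    # True if both terms occur with at most max_words words between them.
    tokens = re.findall(r"\w+", text)
    pos_a = positions(tokens, term_a)
    pos_b = positions(tokens, term_b)
    return any(
        i != j and abs(i - j) - 1 <= max_words
        for i in pos_a
        for j in pos_b
    )

close = ft_and_within("Expert software developers wanted", "software", "developer", 3)
far = ft_and_within("software for the very large team of developers", "software", "developer", 3)
```

Here close is true because "developers" stems to the same form as "developer" and sits next to "software", while far is false because six words separate the two matches.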

For more examples, examine some of the XML Full Text Use Cases.

References and related reading

Querying XML, Chapters 10, 11, 12, 13.2, 14, C.2

Maier98

W3C Query Requirements

W3C Data Model

W3C XQuery

W3C Operators

W3C Full Text