Updating and Transforming Structured Text

Updates in XQuery

Simple update operators

Let XQueryExpr be any expression, XQueryExpr1 be an expression that evaluates to a single node, and QName be an expression that evaluates to a single qualified name:

o       insert (node | nodes) XQueryExpr (as last | as first)? into XQueryExpr1

o       insert (node | nodes) XQueryExpr (before | after) XQueryExpr1

o       delete (node | nodes) XQueryExpr

o       replace (value of)? node XQueryExpr1 with XQueryExpr

o       rename node XQueryExpr1 as QName

o       ( )

Note: updates must obey the XQuery data model:

1.      Insertion can only be into an element or document node, and before or after an element, comment, or processing instruction node.

2.      Before inserting or replacing, sequences of atomic values must be cast to text with intervening blanks and inserted as a single text node. After updating, adjacent text nodes must be coalesced with no intervening blanks.

3.      Replacement of the value of an element or document node must constitute well-formed element content.
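Rule 2 above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: text nodes are modelled as Python strings, other nodes as tuples, and the helper names atomics_to_text and coalesce_text are made up for this sketch (they are not part of any XQuery API).

```python
def atomics_to_text(values):
    """Cast a sequence of atomic values to one string, separated by single blanks."""
    return " ".join(str(v) for v in values)


def coalesce_text(children):
    """Merge adjacent text items (str) in a child list; other items are kept as-is."""
    merged = []
    for item in children:
        if isinstance(item, str) and merged and isinstance(merged[-1], str):
            merged[-1] += item          # coalesce, adding no intervening blank
        else:
            merged.append(item)
    return merged


# Child list after an insertion: strings stand for text nodes, tuples for elements.
children = ["Morton, ", atomics_to_text([1957, "rev.", 2]), ("cite",), "a", "b"]
print(coalesce_text(children))
```

The atomic values are joined with blanks before insertion, while the later coalescing step concatenates neighbouring text nodes without adding any.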

FLWOR updates

Update statements may appear in the return clause of any FLWOR expression.

Incompatible updates result in an error.

Update statements can also be used in conditional expressions and in function definitions that are declared to be "updating".
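A minimal sketch of how incompatible updates might be detected, assuming that each update has already been reduced to a (kind, target) pair. The set of kinds treated as mutually exclusive on one node (rename, replace node, replace value) is an assumption of this sketch, and find_conflicts is a hypothetical helper name.

```python
from collections import Counter

# Kinds assumed to be allowed at most once per target node within one snapshot.
EXCLUSIVE = {"rename", "replace node", "replace value"}


def find_conflicts(pending):
    """pending: list of (kind, target_id) pairs. Return the conflicting pairs, sorted."""
    counts = Counter((kind, target) for kind, target in pending if kind in EXCLUSIVE)
    return sorted(key for key, n in counts.items() if n > 1)


pending = [
    ("rename", "n7"),
    ("delete", "n3"),
    ("rename", "n7"),   # second rename of the same node: incompatible
    ("delete", "n3"),   # a repeated delete is harmless and is not reported
]
print(find_conflicts(pending))
```

A non-empty result would correspond to raising an error before any update is applied.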

Examples, using EB bibliography

Delete references that include two citations, at least one of which is to a work written before 1960.

 

 for $r in doc("eb-bib.xml")//ref
 let $c := $r/cite
 where $r//title/@date < 1960
   and count($c) > 1
 return delete node $r

 

Replace W.L. Morton by William Lewis Morton wherever it appears as the name of an author.

 

 for $r in doc("eb-bib.xml")//ref//author[text()="W.L. Morton"]
 return replace value of node $r with "William Lewis Morton"

 

For each use of "Ibid." insert the appropriate author or authors.

 

 for $r in doc("eb-bib.xml")//ref,
     $c in $r/cite[@type="ibid"]
 return (
     insert nodes $r/cite/author as first into $c,
     replace value of node $c/@type with "full"
 )

Semantics

"Snapshot semantics"

1.      Pre-update processing

a.       Bind the variables declared in the for and let clauses

b.      Evaluate each update in the list of simple updates, and append to a "pending update" list

2.      Perform semantic checking for validity (aborting if result would be invalid)

3.      Apply the updates sequentially

4.      Re-validate, if desired, to re-establish data types
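The phases above can be sketched with Python's xml.etree.ElementTree. This is an illustration only: the sample document, the function names, and the validity rule used in phase 2 (the root must keep at least one ref) are all assumptions made for the sketch. The point is that every update is computed against the unchanged snapshot and recorded on a pending list before anything is modified.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<bib>"
    "<ref id='r1'><cite/><cite/></ref>"
    "<ref id='r2'><cite/></ref>"
    "<ref id='r3'><cite/><cite/><cite/></ref>"
    "</bib>"
)


def collect_updates(root):
    """Phase 1: evaluate against the snapshot and only record what should be done."""
    pending = []
    for ref in root.findall("ref"):
        if len(ref.findall("cite")) > 1:
            pending.append(("delete", root, ref))
    return pending


def is_valid_after(root, pending):
    """Phase 2: hypothetical constraint, <bib> must keep at least one <ref>."""
    doomed = {id(target) for kind, _, target in pending if kind == "delete"}
    return any(id(ref) not in doomed for ref in root.findall("ref"))


def apply_updates(pending):
    """Phase 3: apply the recorded updates one after another."""
    for kind, parent, target in pending:
        if kind == "delete":
            parent.remove(target)


pending = collect_updates(doc)
if is_valid_after(doc, pending):
    apply_updates(pending)
print([ref.get("id") for ref in doc.findall("ref")])
```

Because the targets are fixed in phase 1, deleting r1 cannot change which other references are selected.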

Incremental validity checking

o       Consider tree structure corresponding to XML (attributes treated as children before subelements)

o       Construct a bottom-up tree automaton that matches nodes from leaves to the root

1.      Content model for an element (regular expression) converted to FSA

2.      Symbol table for an element includes bag of IDs and bag of IDREFs used in subtree

o       XML is valid if

1.      automaton for each node matches list of children

2.      no IDs repeated in root's symbol table

3.      set of IDREFs is a subset of set of IDs in root's symbol table

o       Construct similar tree automaton for insertion or replacement value and check for "local validity" (only 1 and 2)

o       Check for compatibility of update

1.      reexecute automaton at insertion node against updated list of children

o       Note: can start from state corresponding to point of insertion

2.      check IDs(modRoot) = IDs(root)-IDs(deletion)+IDs(insertion) for duplicate values

3.      check IDREFs(root)-IDREFs(deletion)+IDREFs(insertion)-IDs(modRoot) for dangling values
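Compatibility checks 2 and 3 are bag arithmetic, which can be sketched with collections.Counter. The function name check_update and the sample values are assumptions made for illustration; each argument is a bag of ID or IDREF values taken from the relevant symbol table.

```python
from collections import Counter


def check_update(root_ids, root_refs, del_ids, del_refs, ins_ids, ins_refs):
    """Return (duplicate_ids, dangling_refs) for the document after the update."""
    # IDs(modRoot) = IDs(root) - IDs(deletion) + IDs(insertion)
    mod_ids = root_ids - del_ids + ins_ids
    # IDREFs(root) - IDREFs(deletion) + IDREFs(insertion)
    mod_refs = root_refs - del_refs + ins_refs
    duplicates = sorted(v for v, n in mod_ids.items() if n > 1)
    dangling = sorted(v for v in mod_refs if v not in mod_ids)
    return duplicates, dangling


duplicates, dangling = check_update(
    root_ids=Counter(["a", "b", "c"]),
    root_refs=Counter(["a", "c", "c"]),
    del_ids=Counter(["c"]),        # the deleted subtree declared ID c ...
    del_refs=Counter(["a"]),       # ... and referred to a
    ins_ids=Counter(["b"]),        # the inserted subtree redeclares b ...
    ins_refs=Counter(["d"]),       # ... and refers to an unknown d
)
print(duplicates, dangling)
```

Only the symbol tables of the root, the deleted subtree, and the inserted subtree are consulted, so the rest of the document does not have to be rescanned.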

Transaction support

o       Merely wrap statements in begin transaction / end transaction commands

o       Atomic (all or nothing)

o       Consistent (preserve DB constraints)

o       Isolated (independent of other transactions, but relaxed by considering ANSI isolation levels)

(1)     read uncommitted

(2)     read committed

(3)     repeatable read

(4)     serializable

o       Durable (changes guaranteed upon commit)

o       How to apply locks at the XML level?

o       based on strict 2PL locking for trees (lock paths from the root)

o       account for predicates on content as well

 

 

 

                      Granted
Requested    None    IS    IX    S     SIX   X
IS           +       +     +     +     +     P
IX           +       +     +     P     P     P
S            +       +     P     +     P     P
SIX          +       +     P     P     P     P
X            +       P     P     P     P     P

(+ = the requested lock is compatible with the lock already granted; P = the two locks conflict)

o       For an individual update command

1.      determine all nodes (from the root) on each specified path, and predicates for each node

2.      for each node to be read, acquire IS locks on all ancestors (in order), then S on node

3.      for each node to be updated, acquire IX locks on all ancestors (in order), then X on node

 

o       apply locks to DataGuide, rather than to data itself

o         appropriate even when data is not stored as a graph

o         (usually) smaller than data graph
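The compatibility matrix and the root-to-node locking steps above can be sketched as follows. Nodes are identified by path strings, as they would be in a DataGuide; the names COMPATIBLE, can_grant, lock_plan and first_conflict, and the sample paths, are assumptions made for this sketch.

```python
# For each requested mode, the granted modes it can coexist with ("+" in the matrix).
COMPATIBLE = {
    "IS":  {"IS", "IX", "S", "SIX"},
    "IX":  {"IS", "IX"},
    "S":   {"IS", "S"},
    "SIX": {"IS"},
    "X":   set(),
}


def can_grant(requested, granted):
    """True if the requested mode is compatible with every lock already granted."""
    return all(mode in COMPATIBLE[requested] for mode in granted)


def lock_plan(path, update=False):
    """Locks to acquire, root first: intention locks on ancestors, then S or X."""
    steps = path.strip("/").split("/")
    prefixes = ["/" + "/".join(steps[:i]) for i in range(1, len(steps) + 1)]
    intent, final = ("IX", "X") if update else ("IS", "S")
    return [(p, intent) for p in prefixes[:-1]] + [(prefixes[-1], final)]


def first_conflict(plan, held):
    """Return the first (node, mode) that cannot be granted, or None."""
    for node, mode in plan:
        if not can_grant(mode, held.get(node, [])):
            return node, mode
    return None


# Another transaction is reading /bib/ref: IS on the ancestor, S on the node.
held = {"/bib": ["IS"], "/bib/ref": ["S"]}
writer = lock_plan("/bib/ref/cite", update=True)
print(writer)
print(first_conflict(writer, held))
```

The writer is stopped at /bib/ref, because its IX request conflicts with the reader's S lock, before it ever reaches the node it wants to change.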

Transformations

o         Grammar-preserving (simple changes to content)

o         Local structural modifications (simple insertions, deletions, or rearrangements)

o         Global rearrangements (including inversions)

o         Multi-document segmentation and integration

Using XQuery

Usually needs user-defined functions to reconstruct nested structure

The XQuery Update Facility defines a transformation operator (again, usable in a return clause):

o       copy XQueryVar := XQueryExpr1
             ( , XQueryVar := XQueryExpr1 )*
        modify XQueryUpdateExpr1
        return XQueryExpr1

Transformations do not update persistent data.
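The copy / modify / return pattern can be imitated in Python with a deep copy. This is an analogy rather than the XQuery operator itself; the helper names transform and expand_name and the sample reference are made up for illustration.

```python
import copy
import xml.etree.ElementTree as ET


def transform(source, modify, result):
    """copy: clone the source; modify: update the clone; return: build the result."""
    working = copy.deepcopy(source)
    modify(working)
    return result(working)


# Stands in for persistent data; the content is a made-up sample.
stored = ET.fromstring("<ref><author>W.L. Morton</author><title>Sample</title></ref>")


def expand_name(ref):
    for author in ref.iter("author"):
        if author.text == "W.L. Morton":
            author.text = "William Lewis Morton"


out = transform(stored, expand_name, lambda ref: ET.tostring(ref, encoding="unicode"))
print(out)
print(stored.find("author").text)   # the stored document is unchanged
```

All updates are applied to the copy, so the stored tree still holds the original author name after the transformation.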

XSLT

o         Evolution from XML Stylesheet Language (XSL) for producing HTML

o         Functional language, converting source tree into result tree

o         "stylesheet" = set of templates

o         Push-pull model based on matching patterns using XPath

 

<xsl:template match="class/student">

   <xsl:apply-templates/>

   <newNode>

      <xsl:value-of select="instructor/firstName"/>

   </newNode>

</xsl:template>

 

o         Types of templates:

 

<xsl:template match=pattern name=qname priority=number mode=qname>

   ... possibly including call/apply with other templates ...

</xsl:template>

<xsl:apply-templates select=sequence-expression mode=qname>

   ... provide sorting criteria or parameters if applicable ...

</xsl:apply-templates>

<xsl:call-template name=qname>

   ... provide template parameters if applicable ...

</xsl:call-template>

<xsl:value-of select=sequence-expression />

<xsl:for-each select=sequence-expression>

   ... provide sorting criteria if applicable ...

</xsl:for-each>

<xsl:if test=expression>
   ...
</xsl:if>

<xsl:choose>
   xsl:when +

   xsl:otherwise ?
</xsl:choose>

<xsl:when test=expression>
   ...
</xsl:when>

<xsl:otherwise>
   ...
</xsl:otherwise>

 

Default templates:

 

<xsl:template match="*|/" mode="#all">

   <xsl:apply-templates/>

</xsl:template>

<xsl:template match="text()|@*" mode="#all">

   <xsl:value-of select="."/>

</xsl:template>

<xsl:template match="processing-instruction()|comment()" mode="#all"/>
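The way the default templates cooperate with user templates can be sketched in Python. The sketch makes strong simplifying assumptions: templates are matched on the element name only (no XPath patterns, modes, or priorities), attributes are not modelled, and comments and processing instructions are already discarded by the ElementTree parser. All function names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def apply_templates(node, templates, out):
    """Use the template registered for this element, else the built-in rule."""
    rule = templates.get(node.tag)
    if rule is not None:
        rule(node, templates, out)
    else:
        process_children(node, templates, out)


def process_children(node, templates, out):
    """Built-in behaviour: copy text through and recurse into child elements."""
    if node.text:
        out.append(node.text)
    for child in node:
        apply_templates(child, templates, out)
        if child.tail:
            out.append(child.tail)


def student_rule(node, templates, out):
    """A user template: process the children first, then emit extra output."""
    process_children(node, templates, out)          # like <xsl:apply-templates/>
    out.append("[" + node.findtext("name", default="") + "]")


doc = ET.fromstring(
    "<class><title>XML</title><student><name>Ann</name></student></class>"
)
out = []
apply_templates(doc, {"student": student_rule}, out)
print("".join(out))
```

Elements without a template of their own (class, title, name) fall through to the built-in rule, which is why their text still reaches the output.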

 

o         Example (adapted from http://www.topxml.com/xsltstylesheets/)

<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Wrap the whole result in a new customers element -->
  <xsl:template match="/">
    <customers>
      <xsl:apply-templates select="/customers" />
    </customers>
  </xsl:template>

  <!-- Descend into the children of the source customers element -->
  <xsl:template match="customers">
    <xsl:apply-templates />
  </xsl:template>

  <!-- Turn each customer's attributes into child elements -->
  <xsl:template match="customer">
    <customer>
      <CompanyName>
        <xsl:value-of select="@CompanyName" />
      </CompanyName>
      <CustomerID>
        <xsl:value-of select="@CustomerID" />
      </CustomerID>
      <Country>
        <xsl:value-of select="@Country" />
      </Country>
    </customer>
    <xsl:apply-templates />
  </xsl:template>

</xsl:stylesheet>
