This section contains examples of several important classes of queries that can be expressed using XQuery. The applications described here include joins across multiple data sources, grouping and aggregation, queries based on sequential relationships, recursive transformations, and selection of distinct combinations of values.
Note:
The new features of XQuery 3.0 and XQuery 3.1 can significantly simplify some of these queries.
Joins, which combine data from multiple sources into a single result, are a very important type of query. In this section we will illustrate how several types of joins can be expressed in XQuery. We will base our examples on the following three documents:
A document named parts.xml
that contains many part
elements; each part
element in turn contains partno
and description
subelements.
A document named suppliers.xml
that contains many
supplier
elements; each supplier
element in turn
contains suppno
and suppname
subelements.
A document named catalog.xml
that contains information about the
relationships between suppliers and parts. The catalog document contains many
item
elements, each of which in turn contains
partno
, suppno
, and price
subelements.
A conventional ("inner") join returns information from two or more related sources, as illustrated by the following example, which combines information from three documents. The example generates a "descriptive catalog" derived from the catalog document, but containing part descriptions instead of part numbers and supplier names instead of supplier numbers. The new catalog is ordered alphabetically by part description and secondarily by supplier name.
<descriptive-catalog> { for $i in fn:doc("catalog.xml")/items/item, $p in fn:doc("parts.xml")/parts/part[partno = $i/partno], $s in fn:doc("suppliers.xml")/suppliers /supplier[suppno = $i/suppno] order by $p/description, $s/suppname return <item> { $p/description, $s/suppname, $i/price } </item> } </descriptive-catalog>
The previous query returns information only about parts that have suppliers and suppliers that have parts. An outer join is a join that preserves information from one or more of the participating sources, including elements that have no matching element in the other source. For example, a left outer join between suppliers and parts might return information about suppliers that have no matching parts.
The following query demonstrates a left outer join. It returns names of all the suppliers in alphabetic order, including those that supply no parts. In the result, each supplier element contains the descriptions of all the parts it supplies, in alphabetic order.
for $s in fn:doc("suppliers.xml")/suppliers/supplier order by $s/suppname return <supplier> { $s/suppname, for $i in fn:doc("catalog.xml")/items/item [suppno = $s/suppno], $p in fn:doc("parts.xml")/parts/part [partno = $i/partno] order by $p/description return $p/description } </supplier>
The previous query preserves information about suppliers that supply no parts. Another
type of join, called a full outer join, might be used to preserve
information about both suppliers that supply no parts and parts that have no supplier.
The result of a full outer join can be structured in any of several ways. The following
query generates a list of supplier
elements, each containing nested
part
elements for the parts that it supplies (if any), followed by a
list of part
elements for the parts that have no supplier. This might be
thought of as a "supplier-centered" full outer join. Other forms of outer join queries
are also possible.
<master-list> { for $s in fn:doc("suppliers.xml")/suppliers/supplier order by $s/suppname return <supplier> { $s/suppname, for $i in fn:doc("catalog.xml")/items/item [suppno = $s/suppno], $p in fn:doc("parts.xml")/parts/part [partno = $i/partno] order by $p/description return <part> { $p/description, $i/price } </part> } </supplier> , (: parts that have no supplier :) <orphan-parts> { for $p in fn:doc("parts.xml")/parts/part where fn:empty(fn:doc("catalog.xml")/items/item [partno = $p/partno] ) order by $p/description return $p/description } </orphan-parts> } </master-list>
The previous query uses an element constructor to enclose its output inside a
master-list
element. The concatenation operator (",") is used to
combine the two main parts of the query. The result is an ordered sequence of
supplier
elements followed by an orphan-parts
element that
contains descriptions of all the parts that have no supplier.
XQuery uses the <<
and >>
operators to compare
nodes based on document order. Although these operators are quite simple, they can be
used to express complex queries for XML documents in which sequence is meaningful. The
first two queries in this section involve a surgical report that contains
procedure
, incision
, instrument
,
action
, and anesthesia
elements.
The following query returns all the action
elements that occur between the
first and second incision
elements inside the first procedure. The original
document order among these nodes is preserved in the result of the query.
let $proc := /report/procedure[1] for $i in $proc//action where $i >> ($proc//incision)[1] and $i << ($proc//incision)[2] return $i
It is worth noting here that document order is defined in such a way that a node is
considered to precede its descendants in document order. In the surgical report, an
action
is never part of an incision
, but an
instrument
is. Since the >>
operator is based on
document order, the predicate $i >> ($proc//incision)[1]
is true for
any instrument
element that is a descendant of the first
incision
element in the first procedure.
For some queries, it may be helpful to declare a function that can test whether a node
precedes another node without being its ancestor. The following function returns
true
if its first operand precedes its second operand but is not an
ancestor of its second operand; otherwise it returns false
:
declare function local:precedes($a as node(), $b as node()) as boolean { $a << $b and fn:empty($a//node() intersect $b) };
Similarly, a local:follows
function could be written:
declare function local:follows($a as node(), $b as node()) as boolean { $a >> $b and fn:empty($b//node() intersect $a) };
Using the local:precedes
function, we can write a query that finds
instrument
elements between the first two incisions, excluding from the
query result any instrument
that is a descendant of the first
incision
:
let $proc := /report/procedure[1] for $i in $proc//instrument where local:precedes(($proc//incision)[1], $i) and local:precedes($i, ($proc//incision)[2]) return $i
The following query reports incisions for which no prior anesthesia was recorded in the
surgical report. Since an anesthesia
is never part of an
incision
, we can use <<
instead of the
less-efficient local:precedes
function:
for $proc in /report/procedure where some $i in $proc//incision satisfies fn:empty($proc//anesthesia[. << $i]) return $proc
In some documents, particular sequences of elements may indicate a logical hierarchy.
This is most commonly true of HTML. The following query returns the introduction of an
XHTML document, wrapping it in a div
element. In this example, we assume
that an h2
element containing the text "Introduction" marks the beginning
of the introduction, and the introduction continues until the next h2
or
h1
element, or the end of the document, whichever comes first.
let $intro := //h2[text()="Introduction"], $next-h := //(h1|h2)[. >> $intro][1] return <div> { $intro, if (fn:empty($next-h)) then //node()[. >> $intro] else //node()[. >> $intro and . << $next-h] } </div>
Note that the above query makes explicit the hierarchy that was implicit in the original
document. In this example, we assume that the h2
element containing the
text "Introduction" has no subelements.
Occasionally it is necessary to scan over a hierarchy of elements, applying some transformation at each level of the hierarchy. In XQuery this can be accomplished by defining a recursive function. In this section we will present two examples of such recursive functions.
Suppose that we need to compute a table of contents for a given document by scanning over
the document, retaining only elements named section
or title
,
and preserving the hierarchical relationships among these elements. For each
section
, we retain subelements named section
or
title
; but for each title
, we retain the full content of
the element. This might be accomplished by the following recursive function:
declare function local:sections-and-titles($n as node()) as node()? { if (fn:local-name($n) = "section") then element { fn:local-name($n) } { for $c in $n/* return local:sections-and-titles($c) } else if (fn:local-name($n) = "title") then $n else ( ) };
The "skeleton" of a given document, containing only its sections and titles, can then be
obtained by invoking the local:sections-and-titles
function on the root
node of the document, as follows:
local:sections-and-titles(fn:doc("cookbook.xml"))
As another example of a recursive transformation, suppose that we wish to scan over a
document, transforming every attribute named color
to an element named
color
, and every element named size
to an attribute named
size
. This can be accomplished by the following recursive function
(note that the element constructor in case $e
generates attributes before
child elements):
declare function local:swizzle($n as node()) as node() { typeswitch($n) case $a as attribute(color) return element color { fn:string($a) } case $es as element(size) return attribute size { fn:string($es) } case $e as element() return element { fn:local-name($e) } { for $c in ($e/@* except $e/@color, (: attr -> attr :) $e/size, (: elem -> attr :) $e/@color, (: attr -> elem :) $e/node() except $e/size ) (: elem -> elem :) return local:swizzle($c) } case $d as document-node() return document { for $c in $d/* return local:swizzle($c) } default return $n };
The transformation can be applied to a whole document by invoking the
local:swizzle
function on the root node of the document, as
follows:
local:swizzle(fn:doc("plans.xml"))
It is sometimes necessary to search through a set of data to find all the distinct
combinations of a given list of properties. For example, an input data set might consist
of a large set of order
elements, each of which has the same basic
structure, as illustrated by the following example:
<order> <date>2003-10-15</date> <product>Dress Shirt</product> <size>M</size> <color>Blue</color> <supplier>Fashion Trends</supplier> <quantity>50</quantity> </order>
From this data set, a user might wish to find all the distinct combinations of
product
, size
, and color
that occur together
in an order
. The following query returns this list, enclosing each distinct
combination in a new element named option
:
for $p in fn:distinct-values(/orders/order/product), $s in fn:distinct-values(/orders/order/size), $c in fn:distinct-values(/orders/order/color) order by $p, $s, $c return if (fn:exists(/orders/order[product eq $p and size eq $s and color eq $c])) then <option> <product>{$p}</product> <size>{$s}</size> <color>{$c}</color> </option> else ()