14 Functions and operators on sequences

A sequence is an ordered collection of zero or more items. An item is either a node or an atomic value. The terms sequence and item are defined formally in [XQuery 4.0: An XML Query Language] and [XML Path Language (XPath) 4.0].

14.1 General functions and operators on sequences

The following functions are defined on sequences. These functions work on any sequence, without performing any operations that are sensitive to the individual items in the sequence.

Function Meaning
fn:empty Returns true if the argument is the empty sequence.
fn:exists Returns true if the argument is a non-empty sequence.
fn:foot Returns the last item in a sequence.
fn:head Returns the first item in a sequence.
fn:identity Returns its argument value.
fn:insert-before Returns a sequence constructed by inserting an item or a sequence of items at a given position within an existing sequence.
fn:intersperse Inserts a separator between adjacent items in a sequence.
fn:items-at Returns a sequence containing the items from $input at positions defined by $at, in the order specified.
fn:remove Returns a new sequence containing all the items of $input except the item at position $position.
fn:replicate Produces multiple copies of a sequence.
fn:reverse Reverses the order of items in a sequence.
fn:slice Returns a sequence containing selected items from a supplied input sequence based on their position.
fn:subsequence Returns the contiguous sequence of items in $input beginning at the position indicated by $start and continuing for the number of items indicated by $length.
fn:tail Returns all but the first item in a sequence.
fn:trunk Returns all but the last item in a sequence.
fn:unordered Returns the items of $input in an ·implementation-dependent· order.

As in the previous section, for the illustrative examples below, assume an XQuery or transformation operating on a non-empty Purchase Order document containing a number of line-item elements. The variable $seq is bound to the sequence of line-item nodes in document order. The variables $item1, $item2, etc. are bound to separate, individual line-item nodes in the sequence.

14.1.1 fn:empty

Summary

Returns true if the argument is the empty sequence.

Signature
fn:empty(
$input as item()*
) as xs:boolean
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

If $input is the empty sequence, the function returns true; otherwise, the function returns false.

Examples

The expression fn:empty((1,2,3)[10]) returns true().

The expression fn:empty(fn:remove(("hello", "world"), 1)) returns false().

The expression fn:empty([]) returns false().

The expression fn:empty(map{}) returns false().

The expression fn:empty("") returns false().

In the following expression, $break is bound to an empty element node:

               let $break := <br/>
               return fn:empty($break)
            

The result is false().

14.1.2 fn:exists

Summary

Returns true if the argument is a non-empty sequence.

Signature
fn:exists(
$input as item()*
) as xs:boolean
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

If $input is a non-empty sequence, the function returns true; otherwise, the function returns false.

Examples

The expression fn:exists(fn:remove(("hello"), 1)) returns false().

The expression fn:exists(fn:remove(("hello", "world"), 1)) returns true().

The expression fn:exists([]) returns true().

The expression fn:exists(map{}) returns true().

The expression fn:exists("") returns true().

In the following expression, $break is bound to an empty element node:

               let $break := <br/>
               return fn:exists($break)
            

The result is true().

14.1.3 fn:foot

Summary

Returns the last item in a sequence.

Signature
fn:foot(
$input as item()*
) as item()?
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

The function returns the value of the expression $input[position() = last()]

Notes

If $input is the empty sequence, the empty sequence is returned.

Examples

The expression fn:foot(1 to 5) returns (5).

The expression fn:foot(()) returns ().

History

Proposed for 4.0; not yet reviewed.

14.1.4 fn:head

Summary

Returns the first item in a sequence.

Signature
fn:head(
$input as item()*
) as item()?
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

The function returns the value of the expression $input[1]

Notes

If $input is the empty sequence, the empty sequence is returned. Otherwise the first item in the sequence is returned.

Examples

The expression fn:head(1 to 5) returns 1.

The expression fn:head(("a", "b", "c")) returns "a".

The expression fn:head(()) returns ().

The expression fn:head([1,2,3]) returns [1,2,3].

14.1.5 fn:identity

Summary

Returns its argument value.

Signature
fn:identity(
$input as item()*
) as item()*
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

The function returns $input.

Notes

The function is useful in contexts where a function must be supplied, but no processing is required.

Examples

The expression fn:identity(0) returns (0).

The expression fn:identity(1 to 10) returns (1, 2, 3, 4, 5, 6, 7, 8, 9, 10).

The expression fn:identity(/) is / returns true().

The expression fn:identity(()) returns ().

History

New in 4.0. Accepted 2022-09-20.

14.1.6 fn:insert-before

Summary

Returns a sequence constructed by inserting an item or a sequence of items at a given position within an existing sequence.

Signature
fn:insert-before(
$input as item()*,
$position as xs:integer,
$insert as item()*
) as item()*
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

The value returned by the function consists of all items of $input whose 1-based position is less than $position, followed by all items of $insert, followed by the remaining items of $input, in that order.

Notes

If $input is the empty sequence, $insert is returned. If $insert is the empty sequence, $input is returned.

If $position is less than 1, its effective value is 1. If $position is greater than the number of items in $input, its effective value is the number of items in $input plus 1.

The value of $input is not affected by the sequence construction.
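The clamping described in these Notes can be illustrated with a minimal Python model. It assumes XDM sequences are represented as Python lists. The name insert_before is hypothetical and does not belong to any real library; this is a sketch of the rule, not a reference implementation.

```python
def insert_before(input_seq, position, insert):
    """Hypothetical Python model of fn:insert-before (XDM sequences as lists)."""
    # Clamp $position to the range 1 .. count($input) + 1, as the Notes describe.
    effective = min(max(position, 1), len(input_seq) + 1)
    # Items before the effective position, then $insert, then the remaining items.
    # Slicing builds a new list, so input_seq itself is never modified.
    return input_seq[:effective - 1] + list(insert) + input_seq[effective - 1:]


abc = ["a", "b", "c"]
print(insert_before(abc, 0, ["z"]))   # ['z', 'a', 'b', 'c']
print(insert_before(abc, 2, ["z"]))   # ['a', 'z', 'b', 'c']
print(abc)                            # ['a', 'b', 'c']
```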

Examples
let $abc := ("a", "b", "c")

The expression fn:insert-before($abc, 0, "z") returns ("z", "a", "b", "c").

The expression fn:insert-before($abc, 1, "z") returns ("z", "a", "b", "c").

The expression fn:insert-before($abc, 2, "z") returns ("a", "z", "b", "c").

The expression fn:insert-before($abc, 3, "z") returns ("a", "b", "z", "c").

The expression fn:insert-before($abc, 4, "z") returns ("a", "b", "c", "z").

14.1.7 fn:intersperse

Summary

Inserts a separator between adjacent items in a sequence.

Signature
fn:intersperse(
$input as item()*,
$separator as item()*
) as item()*
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

The function returns the value of head($input), tail($input) ! ($separator, .).
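The rule above can be read as "emit the head, then for each item of the tail emit the separator followed by that item". The following Python model is a sketch under the assumption that sequences are Python lists. The function name intersperse is hypothetical.

```python
def intersperse(input_seq, separator):
    """Hypothetical Python model of fn:intersperse."""
    if not input_seq:
        return []
    result = [input_seq[0]]          # head($input)
    for item in input_seq[1:]:       # tail($input) ! ...
        result.extend(separator)     # ... ($separator, .): the whole separator sequence
        result.append(item)
    return result


print(intersperse([1, 2, 3], ["|"]))   # [1, '|', 2, '|', 3]
```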

Notes

If $input contains fewer than two items, it is returned unchanged.

If $separator is the empty sequence then $input is returned unchanged.

For example, in XQuery, fn:intersperse(para, <hr/>) would insert an empty hr element between adjacent paragraphs.

Examples

The expression fn:intersperse(1 to 5, "|") returns (1, "|", 2, "|", 3, "|", 4, "|", 5).

The expression fn:intersperse((), "|") returns ().

The expression fn:intersperse("A", "|") returns "A".

The expression fn:intersperse(1 to 5, ()) returns (1, 2, 3, 4, 5).

The expression fn:intersperse(1 to 5, ("⅓", "⅔")) returns (1, "⅓", "⅔", 2, "⅓", "⅔", 3, "⅓", "⅔", 4, "⅓", "⅔", 5).

History

New in 4.0. Accepted 2022-09-27.

14.1.8 fn:items-at

Summary

Returns a sequence containing the items from $input at positions defined by $at, in the order specified.

Signature
fn:items-at(
$input as item()*,
$at as xs:integer*
) as item()*
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

Returns the value of $at ! fn:subsequence($input, ., 1)

Notes

The effect of the function is to return those items from $input at the positions given by the integers in $at, in the order represented by the integers in $at.

In the simplest case where $at is a single integer, fn:items-at($input, 3) returns the same result as $input[3].

Compared with a simple positional filter expression, the function is useful because:

  1. It can select items at multiple positions, and unlike fn:subsequence, these do not need to be contiguous.

  2. The $at expression can depend on the focus.

  3. The order of the returned items can differ from their order in the $input sequence.

If any integer in $at is outside the range 1 to count($input), that integer is effectively ignored: no error occurs.

If either of the arguments is an empty sequence, the result is an empty sequence.
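A minimal Python model of these rules follows. It assumes sequences are Python lists and that $at is a list of integers. The name items_at is hypothetical. Positions outside 1 to count($input) contribute nothing, which mirrors fn:subsequence($input, ., 1) returning the empty sequence for them.

```python
def items_at(input_seq, at):
    """Hypothetical Python model of fn:items-at: $at ! fn:subsequence($input, ., 1)."""
    result = []
    for pos in at:
        # Out-of-range positions are silently ignored; no error occurs.
        if 1 <= pos <= len(input_seq):
            result.append(input_seq[pos - 1])   # convert the 1-based position to a 0-based index
    return result


print(items_at(list(range(11, 21)), [7, 3]))   # [17, 13]
```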

Examples

The expression fn:items-at(11 to 20, 4) returns 14.

The expression fn:items-at(11 to 20, 4 to 6) returns (14, 15, 16).

The expression fn:items-at(11 to 20, (7, 3)) returns (17, 13).

The expression fn:items-at(11 to 20, fn:index-of(("a", "b", "c"), "b")) returns 12.

The expression fn:items-at(fn:characters("quintessential"), (4, 8, 3)) returns ("n", "s", "i").

The expression fn:items-at((), 832) returns ().

The expression fn:items-at((), ()) returns ().

History

Proposed for 4.0 in issue 213

14.1.9 fn:remove

Summary

Returns a new sequence containing all the items of $input except the item at position $position.

Signature
fn:remove(
$input as item()*,
$position as xs:integer
) as item()*
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

The function returns a sequence consisting of all items of $input whose 1-based position is less than $position, followed by all items of $input whose 1-based position is greater than $position.

Notes

If $position is less than 1 or greater than the number of items in $input, $input is returned.

If $input is the empty sequence, the empty sequence is returned.
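The rule amounts to keeping every item whose 1-based position differs from $position. The Python model below is a sketch that assumes sequences are lists. The name remove is hypothetical. Out-of-range positions never match, so the input comes back unchanged, as the Notes state.

```python
def remove(input_seq, position):
    """Hypothetical Python model of fn:remove."""
    # Keep items before and after $position; enumerate from 1 to match XPath positions.
    return [item for pos, item in enumerate(input_seq, start=1) if pos != position]


print(remove(["a", "b", "c"], 1))   # ['b', 'c']
```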

Examples
let $abc := ("a", "b", "c")

The expression fn:remove($abc, 0) returns ("a", "b", "c").

The expression fn:remove($abc, 1) returns ("b", "c").

The expression fn:remove($abc, 6) returns ("a", "b", "c").

The expression fn:remove((), 3) returns ().

14.1.10 fn:replicate

Summary

Produces multiple copies of a sequence.

Signature
fn:replicate(
$input as item()*,
$count as xs:nonNegativeInteger
) as item()*
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

The function returns the value of (1 to $count) ! $input.

Notes

If $input is the empty sequence, the empty sequence is returned.

The $count argument is declared as xs:nonNegativeInteger, which means that a type error occurs if it is called with a negative value.

If the input sequence contains nodes, these are not copied: instead, the result sequence contains multiple references to the same node. So, for example, fn:count(fn:replicate(/, 6)|()) returns 1, because the fn:replicate call creates duplicates, and the union operation eliminates them.

[TODO: the use of type xs:nonNegativeInteger for the second argument assumes we will accept the proposal to allow downcasting in the coercion rules for function arguments. MHK 2022-10-04.]
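The point about nodes not being copied can be sketched in Python. This model assumes that Python object identity plays the role of node identity and that a plain object() stands in for a node. The name replicate is hypothetical, and TypeError is only a stand-in for the type error the Notes describe.

```python
def replicate(input_seq, count):
    """Hypothetical Python model of fn:replicate: (1 to $count) ! $input."""
    if count < 0:
        # Stand-in for the type error raised when $count is not an xs:nonNegativeInteger.
        raise TypeError("count must be non-negative")
    result = []
    for _ in range(count):
        # The same objects are appended each time; nothing is copied.
        result.extend(input_seq)
    return result


node = object()  # stands in for a node
copies = replicate([node], 6)
print(len(copies))                          # 6
print(len({id(item) for item in copies}))   # 1, analogous to fn:count(fn:replicate(/, 6)|())
```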

Examples

The expression fn:replicate(0, 6) returns (0, 0, 0, 0, 0, 0).

The expression fn:replicate(("A", "B", "C"), 3) returns ("A", "B", "C", "A", "B", "C", "A", "B", "C").

The expression fn:replicate((), 5) returns ().

The expression fn:replicate(("A", "B", "C"), 1) returns ("A", "B", "C").

The expression fn:replicate(("A", "B", "C"), 0) returns ().

History

New in 4.0. Accepted 2022-10-04.

14.1.11 fn:reverse

Summary

Reverses the order of items in a sequence.

Signature
fn:reverse(
$input as item()*
) as item()*
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

The function returns a sequence containing the items in $input in reverse order.

Notes

If $input is the empty sequence, the empty sequence is returned.

Examples
let $abc := ("a", "b", "c")

The expression fn:reverse($abc) returns ("c", "b", "a").

The expression fn:reverse(("hello")) returns ("hello").

The expression fn:reverse(()) returns ().

The expression fn:reverse([1,2,3]) returns [1,2,3]. (The input is a sequence containing a single item, namely the array.)

The expression fn:reverse(([1,2,3],[4,5,6])) returns ([4,5,6],[1,2,3]).

14.1.12 fn:slice

Summary

Returns a sequence containing selected items from a supplied input sequence based on their position.

Signature
fn:slice(
$input as item()*,
$start as xs:integer? := (),
$end as xs:integer? := (),
$step as xs:integer? := ()
) as item()*
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

If $input is the empty sequence, the function returns the empty sequence.

Let $S be the first of the following that applies:

  • If $start is absent, empty, or zero, then 1.

  • If $start is negative, then fn:count($input) + $start + 1.

  • Otherwise, $start.

Let $E be the first of the following that applies:

  • If $end is absent, empty, or zero, then fn:count($input).

  • If $end is negative, then fn:count($input) + $end + 1.

  • Otherwise, $end.

Let $STEP be the first of the following that applies:

  • If $step is absent, empty, or zero, then:

    • If $E ge $S, then +1

    • Otherwise -1

  • Otherwise, $step.

If $STEP is negative, the function returns $input => fn:reverse() => fn:slice(-$S, -$E, -$STEP).

Otherwise the function returns the result of the expression:

$input[position() ge $S and position() le $E and (position() - $S) mod $STEP eq 0]
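The normalization of $S, $E and $STEP, together with the reversal used for a negative step, can be traced with a Python model of the rules above. It assumes sequences are lists and that None stands for an absent or empty argument. The name slice_ is hypothetical; the trailing underscore only avoids shadowing Python's built-in slice. Note that under these rules a $start of zero is treated as 1.

```python
def slice_(input_seq, start=None, end=None, step=None):
    """Hypothetical Python model of fn:slice; None means absent or empty."""
    n = len(input_seq)
    if n == 0:
        return []
    # $S: absent, empty or zero -> 1; negative values count from the end.
    s = 1 if not start else (n + start + 1 if start < 0 else start)
    # $E: absent, empty or zero -> count($input); negative values count from the end.
    e = n if not end else (n + end + 1 if end < 0 else end)
    # $STEP: absent, empty or zero -> +1 if $E ge $S, otherwise -1.
    st = step if step else (1 if e >= s else -1)
    if st < 0:
        # $input => fn:reverse() => fn:slice(-$S, -$E, -$STEP)
        return slice_(input_seq[::-1], -s, -e, -st)
    return [item for pos, item in enumerate(input_seq, start=1)
            if s <= pos <= e and (pos - s) % st == 0]


letters = ["a", "b", "c", "d", "e"]
print(slice_(letters, start=5, end=2, step=-2))   # ['e', 'c']
```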
Notes

The function is inspired by the slice operators in JavaScript and Python, but it differs in detail to accommodate the tradition of 1-based addressing in XPath. The end position is inclusive rather than exclusive, so that in the simple case where $start and $end are positive and $end > $start, fn:slice($in, $start, $end) returns the same result as $in[position() = $start to $end].

Examples
let $in := ('a', 'b', 'c', 'd', 'e')

The expression fn:slice($in, start:2, end:4) returns ("b", "c", "d").

The expression fn:slice($in, start:2) returns ("b", "c", "d", "e").

The expression fn:slice($in, end:2) returns ("a", "b").

The expression fn:slice($in, start:3, end:3) returns ("c").

The expression fn:slice($in, start:4, end:3) returns ("d", "c").

The expression fn:slice($in, start:2, end:5, step:2) returns ("b", "d").

The expression fn:slice($in, start:5, end:2, step:-2) returns ("e", "c").

The expression fn:slice($in, start:2, end:5, step:-2) returns ().

The expression fn:slice($in, start:5, end:2, step:2) returns ().

The expression fn:slice($in) returns ("a", "b", "c", "d", "e").

The expression fn:slice($in, start:-1) returns ("e").

The expression fn:slice($in, start:-3) returns ("c", "d", "e").

The expression fn:slice($in, end:-2) returns ("a", "b", "c", "d").

The expression fn:slice($in, start:2, end:-2) returns ("b", "c", "d").

The expression fn:slice($in, start:-2, end:2) returns ("d", "c", "b").

The expression fn:slice($in, start:-4, end:-2) returns ("b", "c", "d").

The expression fn:slice($in, start:-2, end:-4) returns ("d", "c", "b").

The expression fn:slice($in, start:-4, end:-2, step:2) returns ("b", "d").

The expression fn:slice($in, start:-2, end:-4, step:-2) returns ("d", "b").

The expression fn:slice(("a", "b", "c", "d"), 0) returns ("a", "b", "c", "d").

History

Proposed for 4.0; not yet reviewed. The design depends on having functions with keyword arguments.

14.1.13 fn:subsequence

Summary

Returns the contiguous sequence of items in $input beginning at the position indicated by $start and continuing for the number of items indicated by $length.

Signatures
fn:subsequence(
$input as item()*,
$start as xs:double
) as item()*
fn:subsequence(
$input as item()*,
$start as xs:double,
$length as xs:double
) as item()*
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

In the two-argument case, returns:

$input[fn:round($start) le position()]

In the three-argument case, returns:

$input[fn:round($start) le position() 
         and position() lt fn:round($start) + fn:round($length)]
Notes

The first item of a sequence is located at position 1, not position 0.

If $input is the empty sequence, the empty sequence is returned.

In the two-argument case, the function returns a sequence comprising those items of $input whose 1-based position is greater than or equal to $start (rounded to an integer). No error occurs if $start is zero or negative.

In the three-argument case, the function returns a sequence comprising those items of $input whose 1-based position is greater than or equal to $start (rounded to an integer), and less than the sum of $start and $length (both rounded to integers). No error occurs if $start is zero or negative, or if $start plus $length exceeds the number of items in the sequence, or if $length is negative.

As a consequence of the general rules, if $start is -INF and $length is +INF, then fn:round($start) + fn:round($length) is NaN; since position() lt NaN is always false, the result is an empty sequence.

The reason the function accepts arguments of type xs:double is that many computations on untyped data return an xs:double result; and the reason for the rounding rules is to compensate for any imprecision in these floating-point computations.
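The rounding and NaN behavior can be checked with a Python model. It makes two assumptions: sequences are lists, and xpath_round reproduces fn:round on doubles, rounding halves towards positive infinity and leaving NaN and infinities unchanged. That second point matters because Python's built-in round() rounds halves to even. The names xpath_round and subsequence are hypothetical. As in XPath, any comparison with NaN is false in Python.

```python
import math


def xpath_round(x):
    """Hypothetical model of fn:round for xs:double values."""
    if math.isnan(x) or math.isinf(x):
        return x                      # NaN and +/-INF are returned unchanged
    lower = math.floor(x)
    # Round half towards positive infinity: 2.5 -> 3, -2.5 -> -2.
    return float(lower + 1 if x - lower >= 0.5 else lower)


def subsequence(input_seq, start, length=None):
    """Hypothetical Python model of fn:subsequence."""
    r_start = xpath_round(float(start))
    if length is None:
        return [item for pos, item in enumerate(input_seq, start=1) if r_start <= pos]
    limit = r_start + xpath_round(float(length))   # -INF + INF gives NaN
    return [item for pos, item in enumerate(input_seq, start=1)
            if r_start <= pos and pos < limit]


seq = ["item1", "item2", "item3", "item4", "item5"]
print(subsequence(seq, 3, 2))                            # ['item3', 'item4']
print(subsequence(seq, float("-inf"), float("inf")))     # []
```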

Examples
let $seq := ("item1", "item2", "item3", "item4", "item5")

The expression fn:subsequence($seq, 4) returns ("item4", "item5").

The expression fn:subsequence($seq, 3, 2) returns ("item3", "item4").

14.1.14 fn:tail

Summary

Returns all but the first item in a sequence.

Signature
fn:tail(
$input as item()*
) as item()*
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

The function returns the value of the expression subsequence($input, 2)

Notes

If $input is the empty sequence, or a sequence containing a single item, then the empty sequence is returned.

Examples

The expression fn:tail(1 to 5) returns (2, 3, 4, 5).

The expression fn:tail(("a", "b", "c")) returns ("b", "c").

The expression fn:tail("a") returns ().

The expression fn:tail(()) returns ().

The expression fn:tail([1,2,3]) returns ().

14.1.15 fn:trunk

Summary

Returns all but the last item in a sequence.

Signature
fn:trunk(
$input as item()*
) as item()*
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

The function returns the value of the expression fn:remove($input, count($input))

Notes

If $input is the empty sequence, or a sequence containing a single item, then the empty sequence is returned.
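Defining fn:trunk in terms of fn:remove handles both edge cases in these Notes without special rules. For an empty input, count($input) is 0, and removing position 0 leaves the (empty) input unchanged. The Python sketch below assumes sequences are lists. The names remove and trunk are hypothetical.

```python
def remove(input_seq, position):
    """Hypothetical model of fn:remove: drop the item at the 1-based position, if any."""
    return [item for pos, item in enumerate(input_seq, start=1) if pos != position]


def trunk(input_seq):
    """Hypothetical model of fn:trunk: fn:remove($input, count($input))."""
    return remove(input_seq, len(input_seq))


print(trunk([1, 2, 3, 4, 5]))   # [1, 2, 3, 4]
```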

Examples

The expression fn:trunk(1 to 5) returns (1, 2, 3, 4).

The expression fn:trunk(("a", "b", "c")) returns ("a", "b").

The expression fn:trunk("a") returns ().

The expression fn:trunk(()) returns ().

The expression fn:trunk([1,2,3]) returns ().

History

Proposed for 4.0.

14.1.16 fn:unordered

Summary

Returns the items of $input in an ·implementation-dependent· order.

Signature
fn:unordered(
$input as item()*
) as item()*
Properties

This function is ·nondeterministic-wrt-ordering·, ·context-independent·, and ·focus-independent·.

Rules

The function returns the items of $input in an ·implementation-dependent· order.

Notes

Query optimizers may be able to do a better job if the order of the output sequence is not specified. For example, when retrieving prices from a purchase order, if an index exists on prices, it may be more efficient to return the prices in index order rather than in document order.

Examples

The expression fn:unordered((1, 2, 3, 4, 5)) returns some permutation of (1, 2, 3, 4, 5).

14.2 Functions that compare values in sequences

The functions in this section rely on comparisons between the items in one or more sequences.

Function Meaning
fn:starts-with-sequence Determines whether one sequence starts with another, using a supplied callback function to compare items.
fn:ends-with-sequence Determines whether one sequence ends with another, using a supplied callback function to compare items.
fn:contains-sequence Determines whether one sequence contains another as a contiguous subsequence, using a supplied callback function to compare items.
fn:distinct-values Returns the values that appear in a sequence, with duplicates eliminated.
fn:index-of Returns a sequence of positive integers giving the positions within the sequence $input of items that are equal to $search.
fn:deep-equal This function assesses whether two sequences are deep-equal to each other. To be deep-equal, they must contain items that are pairwise deep-equal; and for two items to be deep-equal, they must either be atomic values that compare equal, or nodes of the same kind, with the same name, whose children are deep-equal, or maps with matching entries, or arrays with matching members.
fn:differences This function compares two sequences and returns information about their differences.

14.2.1 fn:starts-with-sequence

Summary

Determines whether one sequence starts with another, using a supplied callback function to compare items.

Signature
fn:starts-with-sequence(
$input as item()*,
$subsequence as item()*,
$compare as function(item(), item()) as xs:boolean := fn:deep-equal#2
) as xs:boolean
Properties

The two-argument form of this function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on collations, and implicit timezone.

The three-argument form of this function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

Informally, the function returns true if $input starts with $subsequence, when items are compared using the supplied (or default) $compare function.

More formally, the function returns the value of the expression:

fn:count($input) ge fn:count($subsequence) 
and fn:all(fn:for-each-pair($input, $subsequence, $compare))
Notes

There is no requirement that the $compare function should have the traditional qualities of equality comparison. The result is well-defined, for example, even if $compare is not transitive or not symmetric.
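The formal definition can be modeled in Python, where zip() stops at the shorter sequence just as fn:for-each-pair does. The model assumes sequences are lists. Python == is only a loose stand-in for the default fn:deep-equal#2 and is adequate here only for simple atomic values. The names starts_with_sequence and deep_equal_stand_in are hypothetical. The comparison callback need not be an equivalence relation, as the last example shows.

```python
def deep_equal_stand_in(x, y):
    # Assumption: Python == approximates fn:deep-equal#2 for simple atomic values only.
    return x == y


def starts_with_sequence(input_seq, subsequence, compare=deep_equal_stand_in):
    """Hypothetical Python model of fn:starts-with-sequence."""
    if len(input_seq) < len(subsequence):   # fn:count($input) ge fn:count($subsequence)
        return False
    # fn:all(fn:for-each-pair($input, $subsequence, $compare))
    return all(compare(x, y) for x, y in zip(input_seq, subsequence))


print(starts_with_sequence(list(range(10, 21)), [1, 2, 3], lambda x, y: x > y))   # True
```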

Examples

The expression fn:starts-with-sequence((), ()) returns true().

The expression fn:starts-with-sequence(1 to 10, 1 to 5) returns true().

The expression fn:starts-with-sequence(1 to 10, ()) returns true().

The expression fn:starts-with-sequence(1 to 10, 1 to 10) returns true().

The expression fn:starts-with-sequence(1 to 10, 1) returns true().

The expression fn:starts-with-sequence(1 to 10, 101 to 105, ->($x, $y){$x mod 100 = $y mod 100}) returns true().

The expression fn:starts-with-sequence(("A", "B", "C"), ("a", "b"), ->($x, $y){fn:compare($x, $y, "http://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive") eq 0}) returns true().

The expression let $p := parse-xml("<doc><chap><p/><p/></chap></doc>")//p[2] return fn:starts-with-sequence($p!ancestor::node(), $p!root(), op("is")) returns true().

The expression fn:starts-with-sequence(10 to 20, 1 to 5, op("gt")) returns true().

The expression fn:starts-with-sequence(("Alpha", "Beta", "Gamma"), ("A", "B"), fn:starts-with#2) returns true().

The expression fn:starts-with-sequence(("Alpha", "Beta", "Gamma", "Delta"), 1 to 3, ->($x, $y){fn:ends-with($x, 'a')}) returns true(). (True because the first three items in the input sequence end with "a".)

History

Accepted 2022-11-01

14.2.2 fn:ends-with-sequence

Summary

Determines whether one sequence ends with another, using a supplied callback function to compare items.

Signature
fn:ends-with-sequence(
$input as item()*,
$subsequence as item()*,
$compare as function(item(), item()) as xs:boolean := fn:deep-equal#2
) as xs:boolean
Properties

The two-argument form of this function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on collations, and implicit timezone.

The three-argument form of this function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

Informally, the function returns true if $input ends with $subsequence, when items are compared using the supplied (or default) $compare function.

More formally, the function returns the value of the expression:

fn:starts-with-sequence(fn:reverse($input), fn:reverse($subsequence), $compare)
Notes

There is no requirement that the $compare function should have the traditional qualities of equality comparison. The result is well-defined, for example, even if $compare is not transitive or not symmetric.
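Because the formal definition reverses both sequences and delegates to fn:starts-with-sequence, a Python model can follow it directly. The block is self-contained, so it repeats the starts-with model. It assumes sequences are lists, and Python == loosely stands in for the default fn:deep-equal#2. All names are hypothetical.

```python
def deep_equal_stand_in(x, y):
    # Assumption: Python == approximates fn:deep-equal#2 for simple atomic values only.
    return x == y


def starts_with_sequence(input_seq, subsequence, compare=deep_equal_stand_in):
    if len(input_seq) < len(subsequence):
        return False
    return all(compare(x, y) for x, y in zip(input_seq, subsequence))


def ends_with_sequence(input_seq, subsequence, compare=deep_equal_stand_in):
    """Hypothetical Python model of fn:ends-with-sequence."""
    # fn:starts-with-sequence(fn:reverse($input), fn:reverse($subsequence), $compare)
    return starts_with_sequence(list(reversed(input_seq)), list(reversed(subsequence)), compare)


print(ends_with_sequence(list(range(1, 11)), [8, 9, 10]))   # True
```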

Examples

The expression fn:ends-with-sequence((), ()) returns true().

The expression fn:ends-with-sequence(1 to 10, 5 to 10) returns true().

The expression fn:ends-with-sequence(1 to 10, ()) returns true().

The expression fn:ends-with-sequence(1 to 10, 1 to 10) returns true().

The expression fn:ends-with-sequence(1 to 10, 10) returns true().

The expression fn:ends-with-sequence(1 to 10, 108 to 110, ->($x, $y){$x mod 100 = $y mod 100}) returns true().

The expression fn:ends-with-sequence(("A", "B", "C"), ("b", "c"), ->($x, $y){fn:compare($x, $y, "http://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive") eq 0}) returns true().

The expression let $p := parse-xml("<doc><chap><p/><p/></chap></doc>")//p[2] return fn:ends-with-sequence($p!ancestor::*, $p!parent::*, op("is")) returns true().

The expression fn:ends-with-sequence(10 to 20, 1 to 5, op("gt")) returns true().

The expression fn:ends-with-sequence(("Alpha", "Beta", "Gamma"), ("B", "G"), fn:starts-with#2) returns true().

The expression fn:ends-with-sequence(("Alpha", "Beta", "Gamma", "Delta"), 1 to 2, ->($x, $y){fn:string-length($x) eq 5}) returns true(). (True because the last two items in the input sequence have a string length of 5.)

History

Accepted 2022-11-01

14.2.3 fn:contains-sequence

Summary

Determines whether one sequence contains another as a contiguous subsequence, using a supplied callback function to compare items.

Signature
fn:contains-sequence(
$input as item()*,
$subsequence as item()*,
$compare as function(item(), item()) as xs:boolean := fn:deep-equal#2
) as xs:boolean
Properties

The two-argument form of this function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on collations, and implicit timezone.

The three-argument form of this function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

Informally, the function returns true if $input contains a consecutive subsequence matching $subsequence, when items are compared using the supplied (or default) $compare function.

More formally, the function returns the value of the expression:

if (fn:starts-with-sequence($input, $subsequence, $compare))
then true()
else if (fn:empty($input))
     then false()
     else fn:contains-sequence(fn:tail($input), $subsequence, $compare)
Notes

There is no requirement that the $compare function should have the traditional qualities of equality comparison. The result is well-defined, for example, even if $compare is not transitive or not symmetric.
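The recursive definition tries $input, then fn:tail($input), and so on, until either a match is found or the input is exhausted. The Python model below unrolls that recursion into a loop over every suffix, including the empty one. It assumes sequences are lists, and Python == loosely stands in for the default fn:deep-equal#2. All names are hypothetical.

```python
def deep_equal_stand_in(x, y):
    # Assumption: Python == approximates fn:deep-equal#2 for simple atomic values only.
    return x == y


def starts_with_sequence(input_seq, subsequence, compare=deep_equal_stand_in):
    if len(input_seq) < len(subsequence):
        return False
    return all(compare(x, y) for x, y in zip(input_seq, subsequence))


def contains_sequence(input_seq, subsequence, compare=deep_equal_stand_in):
    """Hypothetical Python model of fn:contains-sequence."""
    # Offset k corresponds to applying fn:tail k times; the final offset is the empty suffix.
    for offset in range(len(input_seq) + 1):
        if starts_with_sequence(input_seq[offset:], subsequence, compare):
            return True
    return False


print(contains_sequence(list(range(1, 11)), [2, 4, 6]))   # False
```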

Examples

The expression fn:contains-sequence((), ()) returns true().

The expression fn:contains-sequence(1 to 10, 3 to 6) returns true().

The expression fn:contains-sequence(1 to 10, (2, 4, 6)) returns false().

The expression fn:contains-sequence(1 to 10, ()) returns true().

The expression fn:contains-sequence(1 to 10, 1 to 10) returns true().

The expression fn:contains-sequence(1 to 10, 5) returns true().

The expression fn:contains-sequence(1 to 10, 103 to 105, ->($x, $y){$x mod 100 = $y mod 100}) returns true().

The expression fn:contains-sequence(("A", "B", "C", "D"), ("b", "c"), ->($x, $y){fn:compare($x, $y, "http://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive") eq 0}) returns true().

The expression let $chap := parse-xml("<doc><chap><h1/><p/><p/><footnote/></chap></doc>")//chap return fn:contains-sequence($chap!child::*, $chap!child::p, op("is")) returns true(). (True because the p children of the chap element form a contiguous subsequence.)

The expression fn:contains-sequence(10 to 20, (5, 3, 1), op("gt")) returns true().

The expression fn:contains-sequence(("Alpha", "Beta", "Gamma", "Delta"), ("B", "G"), fn:starts-with#2) returns true().

The expression fn:contains-sequence(("Zero", "Alpha", "Beta", "Gamma", "Delta", "Epsilon"), 1 to 4, ->($x, $y){fn:ends-with($x, 'a')}) returns true(). (True because there is a run of 4 consecutive items ending in "a".)

History

Accepted 2022-11-01

14.2.4 fn:distinct-values

Summary

Returns the values that appear in a sequence, with duplicates eliminated.

Signatures
fn:distinct-values(
$values as xs:anyAtomicType*
) as xs:anyAtomicType*
fn:distinct-values(
$values as xs:anyAtomicType*,
$collation as xs:string
) as xs:anyAtomicType*
Properties

The one-argument form of this function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on collations, and implicit timezone.

The two-argument form of this function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on collations, and static base URI, and implicit timezone.

Rules

The function returns the sequence that results from removing from $values all but one of a set of values that are considered equal to one another. Two items $J and $K in the input sequence (after atomization, as required by the function signature) are considered equal if fn:deep-equal($J, $K, $coll) is true, where $coll is the collation selected according to the rules in 5.3.5 Choosing a collation. This collation is used when string comparison is required.

The order in which the sequence of values is returned is ·implementation-dependent·.

Which value of a set of values that compare equal is returned is ·implementation-dependent·.

Notes

If $values is the empty sequence, the function returns the empty sequence.

Values of type xs:untypedAtomic are compared as if they were of type xs:string.

Values that cannot be compared, because the eq operator is not defined for their types, are considered to be distinct.

For xs:float and xs:double values, positive zero is equal to negative zero and, although NaN does not equal itself, if $values contains multiple NaN values a single NaN is returned.

If xs:dateTime, xs:date or xs:time values do not have a timezone, they are considered to have the implicit timezone provided by the dynamic context for the purpose of comparison. Note that xs:dateTime, xs:date or xs:time values can compare equal even if their timezones are different.

In previous versions of this specification, problems could arise when the input sequence contained a mix of different numeric types, due to non-transitivity of the eq operator in edge cases. This problem has been fixed by changes to the behavior of op:numeric-equal: see 4.3 Comparison operators on numeric values.
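The NaN and signed-zero rules in these Notes can be illustrated with a rough Python model. This sketch makes several simplifying assumptions. Python == stands in for the eq-based comparison, and values that cannot be compared (for example a string and an integer) are simply unequal and therefore distinct. Collations, timezones and xs:untypedAtomic are not modeled. The model keeps the first of each group of equal values, in first-occurrence order, which is one of the implementation-dependent choices the Rules permit. The name distinct_values is hypothetical.

```python
import math


def distinct_values(values):
    """Hypothetical, simplified Python model of fn:distinct-values."""
    result = []
    nan_seen = False
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            # NaN is not equal to itself, but only a single NaN is returned.
            if not nan_seen:
                nan_seen = True
                result.append(v)
        elif not any(v == kept for kept in result):
            # 0.0 == -0.0 and 2 == 2.0 in Python, matching the numeric rules above.
            result.append(v)
    return result


print(distinct_values([1, 2.0, 3, 2]))   # [1, 2.0, 3]
```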

Examples

The expression fn:distinct-values((1, 2.0, 3, 2)) returns some permutation of (1, 3, 2.0). (The result may include either the xs:integer 2 or the xs:decimal 2.0).

The expression fn:distinct-values((xs:untypedAtomic("cherry"), xs:untypedAtomic("plum"), xs:untypedAtomic("plum"))) returns some permutation of (xs:untypedAtomic("cherry"), xs:untypedAtomic("plum")).

14.2.5 fn:index-of

Summary

Returns a sequence of positive integers giving the positions within the sequence $input of items that are equal to $search.

Signatures
fn:index-of(
$input as xs:anyAtomicType*,
$search as xs:anyAtomicType
) as xs:integer*
fn:index-of(
$input as xs:anyAtomicType*,
$search as xs:anyAtomicType,
$collation as xs:string
) as xs:integer*
Properties

The two-argument form of this function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on collations, and implicit timezone.

The three-argument form of this function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on collations, and static base URI, and implicit timezone.

Rules

The function returns a sequence of positive integers giving the positions within the sequence $input of items that are equal to $search.

The collation used by this function is determined according to the rules in 5.3.5 Choosing a collation. This collation is used when string comparison is required.

The items in the sequence $input are compared with $search under the rules for the eq operator. Values of type xs:untypedAtomic are compared as if they were of type xs:string. Values that cannot be compared, because the eq operator is not defined for their types, are considered to be distinct. If an item compares equal, then the position of that item in the sequence $input is included in the result.

The first item in a sequence is at position 1, not position 0.

The result sequence is in ascending numeric order.
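The following is a minimal Python sketch of these rules. It assumes that Python numbers and strings stand in for atomic values and that the codepoint collation applies. `index_of` and its helper are hypothetical names, not part of any library. Items whose types cannot be compared with `$search` are skipped, which mirrors the rule that non-comparable values are treated as distinct.

```python
from decimal import Decimal

def _comparable(a, b):
    # Simplified: numbers compare with numbers, strings with strings,
    # booleans only with booleans
    def family(v):
        if isinstance(v, bool):
            return "boolean"
        if isinstance(v, (int, float, Decimal)):
            return "numeric"
        return type(v).__name__
    return family(a) == family(b)

def index_of(input_seq, search):
    # Positions are 1-based and produced in ascending order;
    # non-comparable items are skipped rather than raising an error
    return [pos for pos, item in enumerate(input_seq, start=1)
            if _comparable(item, search) and item == search]
```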

Notes

If $input is the empty sequence, or if no item in $input matches $search, then the function returns the empty sequence.

No error occurs if non-comparable values are encountered. So when comparing two atomic values, the effective boolean value of fn:index-of($a, $b) is true if $a and $b are equal, false if they are not equal or not comparable.

Examples

The expression fn:index-of((10, 20, 30, 40), 35) returns ().

The expression fn:index-of((10, 20, 30, 30, 20, 10), 20) returns (2, 5).

The expression fn:index-of(("a", "sport", "and", "a", "pastime"), "a") returns (1, 4).

The expression fn:index-of(current-date(), 23) returns ().

The expression fn:index-of([1, [5, 6], [6, 7]], 6) returns (3, 4). (The array is atomized to a sequence of five integers.)

If @a is an attribute of type xs:NMTOKENS whose string value is "red green blue", and whose typed value is therefore ("red", "green", "blue"), then fn:index-of(@a, "blue") returns 3. This is because the function calling mechanism atomizes the attribute node to produce a sequence of three xs:NMTOKEN values.
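The array example depends on atomization flattening nested arrays. The following Python sketch models that step with a hypothetical `Array` class. `Array` is a `list` subclass whose elements are members, and each member is assumed to be a single item. Nodes and multi-item members are not modeled.

```python
class Array(list):
    """Hypothetical stand-in for an XDM array; each element is one member,
    assumed here to be a single item."""

def atomize(items):
    # Atomizing an array atomizes each of its members in order;
    # atomic values are returned unchanged (nodes are not modeled here)
    out = []
    for item in items:
        if isinstance(item, Array):
            out.extend(atomize(item))
        else:
            out.append(item)
    return out

atomized = atomize([Array([1, Array([5, 6]), Array([6, 7])])])
positions = [pos for pos, v in enumerate(atomized, start=1) if v == 6]
```

Positions 3 and 4 of the atomized sequence hold 6, which matches the expected result (3, 4).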

14.2.6 fn:deep-equal

Summary

This function assesses whether two sequences are deep-equal to each other. To be deep-equal, they must contain items that are pairwise deep-equal; and for two items to be deep-equal, they must either be atomic values that compare equal, or nodes of the same kind, with the same name, whose children are deep-equal, or maps with matching entries, or arrays with matching members.

Signatures
fn:deep-equal(
$input1 as item()*,
$input2 as item()*
) as xs:boolean
fn:deep-equal(
$input1 as item()*,
$input2 as item()*,
$collation as xs:string
) as xs:boolean
Properties

The two-argument form of this function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on collations, and implicit timezone.

The three-argument form of this function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on collations, and static base URI, and implicit timezone.

Rules

The $collation argument identifies a collation which is used at all levels of recursion when strings are compared (but not when names are compared), according to the rules in 5.3.5 Choosing a collation.

If the two sequences are both empty, the function returns true.

If the two sequences are of different lengths, the function returns false.

If the two sequences are of the same length, the function returns true if and only if every item in the sequence $input1 is deep-equal to the item at the same position in the sequence $input2. The rules for deciding whether two items are deep-equal follow.

Call the two items $i1 and $i2 respectively.

If $i1 and $i2 are both atomic values, they are deep-equal if and only if ($i1 eq $i2) is true, or if both values are NaN. If the eq operator is not defined for $i1 and $i2, the function returns false.

If $i1 and $i2 are both ·maps·, the result is true if and only if all the following conditions apply:

  1. Both maps have the same number of entries.

  2. For every entry in the first map, there is an entry in the second map that:

    1. has the ·same key· (note that the collation is not used when comparing keys), and

    2. has the same associated value (compared using the fn:deep-equal function, under the collation supplied in the original call to fn:deep-equal).

If $i1 and $i2 are both arrays, the result is true if and only if all the following conditions apply:

  1. Both arrays have the same number of members (array:size($i1) eq array:size($i2)).

  2. Members in the same position of both arrays are deep-equal to each other, under the collation supplied in the original call to fn:deep-equal: that is, every $p in 1 to array:size($i1) satisfies deep-equal($i1($p), $i2($p), $collation).
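The rules for atomic values, maps, and arrays can be sketched in Python as shown below. This is a model, not an implementation:

* Sequences are Python lists.
* Maps are dicts whose values are lists.
* Arrays use a hypothetical `Array` class whose elements are member sequences.

Collations are not modeled, so strings compare by code point. Python dict key lookup only approximates the ·same key· relation.

```python
import math
from decimal import Decimal

class Array(list):
    """Hypothetical stand-in for an XDM array: each element is one member (a list = sequence)."""

def _family(v):
    # Simplified model of which atomic values the eq operator can compare
    if isinstance(v, bool):
        return "boolean"
    if isinstance(v, (int, float, Decimal)):
        return "numeric"
    return type(v).__name__

def items_deep_equal(i1, i2):
    if isinstance(i1, dict) and isinstance(i2, dict):
        # Same number of entries; every key of i1 maps to a deep-equal value in i2
        return len(i1) == len(i2) and all(
            k in i2 and deep_equal(v, i2[k]) for k, v in i1.items())
    if isinstance(i1, Array) and isinstance(i2, Array):
        # Same size; members at the same position are deep-equal sequences
        return len(i1) == len(i2) and all(
            deep_equal(m1, m2) for m1, m2 in zip(i1, i2))
    if isinstance(i1, (dict, Array)) or isinstance(i2, (dict, Array)):
        return False              # a map or array never equals an item of another kind
    if _family(i1) != _family(i2):
        return False              # eq not defined: false, not an error
    if isinstance(i1, float) and isinstance(i2, float) and math.isnan(i1) and math.isnan(i2):
        return True               # NaN is deep-equal to NaN
    return i1 == i2

def deep_equal(seq1, seq2):
    # Same length, and items at the same position are deep-equal
    return len(seq1) == len(seq2) and all(
        items_deep_equal(a, b) for a, b in zip(seq1, seq2))
```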

If $i1 and $i2 are both nodes, they are compared as described below:

  1. If the two nodes are of different kinds, the result is false.

  2. If the two nodes are both document nodes then they are deep-equal if and only if the sequence $i1/(*|text()) is deep-equal to the sequence $i2/(*|text()).

    Note:

    This rule was designed to ensure that comments and processing instructions are ignored in the comparison. Unfortunately, however, it fails to merge text nodes that are separated by a comment or processing instruction. This oversight has been corrected in the new fn:differences function.

  3. If the two nodes are both element nodes then they are deep-equal if and only if all of the following conditions are satisfied:

    1. The two nodes have the same name, that is (node-name($i1) eq node-name($i2)).

    2. Either both nodes are annotated as having simple content or both nodes are annotated as having complex content. For this purpose "simple content" means either a simple type or a complex type with simple content; "complex content" means a complex type whose variety is mixed, element-only, or empty.

      Note:

      It is a consequence of this rule that validating a document D against a schema will usually (but not necessarily) result in a document that is not deep-equal to D. The exception is when the schema allows all elements to have mixed content.

    3. The two nodes have the same number of attributes, and for every attribute $a1 in $i1/@* there exists an attribute $a2 in $i2/@* such that $a1 and $a2 are deep-equal.

    4. One of the following conditions holds:

      • Both element nodes are annotated as having simple content (as defined in condition 2 above), and the typed value of $i1 is deep-equal to the typed value of $i2.

      • Both element nodes have a type annotation that is a complex type with variety element-only, and the sequence $i1/* is deep-equal to the sequence $i2/*.

      • Both element nodes have a type annotation that is a complex type with variety mixed, and the sequence $i1/(*|text()) is deep-equal to the sequence $i2/(*|text()).

        Note:

        This rule was designed to ensure that comments and processing instructions are ignored in the comparison. Unfortunately, however, it fails to merge text nodes that are separated by a comment or processing instruction. This oversight has been corrected in the new fn:differences function.

      • Both element nodes have a type annotation that is a complex type with variety empty.

  4. If the two nodes are both attribute nodes then they are deep-equal if and only if both the following conditions are satisfied:

    1. The two nodes have the same name, that is (node-name($i1) eq node-name($i2)).

    2. The typed value of $i1 is deep-equal to the typed value of $i2.

  5. If the two nodes are both processing instruction nodes, then they are deep-equal if and only if both the following conditions are satisfied:

    1. The two nodes have the same name, that is (node-name($i1) eq node-name($i2)).

    2. The string value of $i1 is equal to the string value of $i2.

  6. If the two nodes are both namespace nodes, then they are deep-equal if and only if both the following conditions are satisfied:

    1. The two nodes either have the same name or are both nameless, that is fn:deep-equal(node-name($i1), node-name($i2)).

    2. The string value of $i1 is equal to the string value of $i2 when compared using the Unicode codepoint collation.

  7. If the two nodes are both text nodes or comment nodes, then they are deep-equal if and only if their string-values are equal.

In all other cases the result is false.

Error Conditions

A type error is raised [err:FOTY0015] if either input sequence contains a function item that is not a map or array.

Notes

The two nodes are not required to have the same type annotation, and they are not required to have the same in-scope namespaces. They may also differ in their parent, their base URI, and the values returned by the is-id and is-idrefs accessors (see Section 5.5 is-id AccessorDM40 and Section 5.6 is-idrefs AccessorDM40). The order of children is significant, but the order of attributes is insignificant.

The contents of comments and processing instructions are significant only if these nodes appear directly as items in the two sequences being compared. The content of a comment or processing instruction that appears as a descendant of an item in one of the sequences being compared does not affect the result. However, the presence of a comment or processing instruction, if it causes a text node to be split into two text nodes, may affect the result.

Comparing items of different kinds (for example, comparing an atomic value to a node, a map to an array, or an integer to an xs:date) returns false; it does not raise an error. So the result of fn:deep-equal(1, current-dateTime()) is false.

Comparing a function (other than a map or array) to any other value raises a type error.

Examples
let $at := <attendees>
             <name last='Parker' first='Peter'/>
             <name last='Barker' first='Bob'/>
             <name last='Parker' first='Peter'/>
           </attendees>

The expression fn:deep-equal($at, $at/*) returns false().

The expression fn:deep-equal($at/name[1], $at/name[2]) returns false().

The expression fn:deep-equal($at/name[1], $at/name[3]) returns true().

The expression fn:deep-equal($at/name[1], 'Peter Parker') returns false().

The expression fn:deep-equal(map{1:'a', 2:'b'}, map{2:'b', 1:'a'}) returns true().

The expression fn:deep-equal([1, 2, 3], [1, 2, 3]) returns true().

The expression fn:deep-equal((1, 2, 3), [1, 2, 3]) returns false().
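For untyped (schema-less) elements, the type annotation xs:untyped is treated as mixed content. The element rules then reduce to three checks: the names match, the attributes match as an unordered set, and the sequences of element and text children are deep-equal. The following Python sketch applies that reduced rule to the attendees example using `xml.etree.ElementTree`. It assumes the trees contain no comments, processing instructions, or namespace-sensitive content. `node_deep_equal` and `nodes_deep_equal` are hypothetical names.

```python
import xml.etree.ElementTree as ET

def _children(e):
    # Text and element children in document order; ElementTree stores text
    # in .text (before the first child) and .tail (after each child)
    out = [e.text] if e.text else []
    for child in e:
        out.append(child)
        if child.tail:
            out.append(child.tail)
    return out

def node_deep_equal(a, b):
    if isinstance(a, str) or isinstance(b, str):
        # Text nodes: equal iff both are text with equal string values
        return isinstance(a, str) and isinstance(b, str) and a == b
    return (a.tag == b.tag                     # same expanded name
            and a.attrib == b.attrib           # dict equality ignores attribute order
            and nodes_deep_equal(_children(a), _children(b)))

def nodes_deep_equal(seq1, seq2):
    # Sequences: same length and pairwise deep-equal items
    return len(seq1) == len(seq2) and all(
        node_deep_equal(x, y) for x, y in zip(seq1, seq2))

at = ET.fromstring("<attendees> <name last='Parker' first='Peter'/> "
                   "<name last='Barker' first='Bob'/> "
                   "<name last='Parker' first='Peter'/> </attendees>")
names = list(at)
```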

14.2.7 fn:differences

Summary

This function compares two sequences and returns information about their differences.

Signatures
fn:differences(
$input1 as item()*,
$input2 as item()*
) as map(*)*
fn:differences(
$input1 as item()*,
$input2 as item()*,
$options as map(*)
) as map(*)*
fn:differences(
$input1 as item()*,
$input2 as item()*,
$options as map(*),
$collation as xs:string
) as map(*)*
Properties

The two-argument form of this function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on collations, and implicit timezone.

The three-argument form of this function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on collations, and static base URI, and implicit timezone.

The four-argument form of this function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on collations, and static base URI, and implicit timezone.

Rules

Calling the 2-argument version of the function has the same effect as calling the 3-argument version with an empty map as the third argument.

Calling the 3-argument version of the function has the same effect as calling the 4-argument version with the default collation as the fourth argument.

The $collation argument identifies a collation which is used at all levels of recursion when general strings are compared (but not when node names or map keys are compared), according to the rules in 5.3.5 Choosing a collation.

The behavior of the function is described in terms of a sequence of rules, which are applied in order. Each rule takes two values as input, and produces either a difference or nothing as its result. The final result of the fn:differences function is a sequence of differences, represented as maps, and is empty if no differences were found.

The specification is recursive: recursion is used both when comparing node trees and when comparing trees composed of maps and arrays. When a difference is found at any level, the information returned to identify it includes a path, in the form of a path expression, indicating how the relevant item was reached from the value passed as an argument. A single path is sufficient to identify the data in both input sequences. The path is a string, in the general form of a path expression, and the specification indicates for each level of recursion how this path is built up. For example: the path $input identifies the input sequence supplied in the call to the function; $input[3] identifies the third item in the input sequence; if this is an array, $input[3](2)[1] identifies the first item in the second member of that array; if this is a node, then $input[3](2)[1]/node()[3] identifies its third child node, and $input[3](2)[1]/node()[3]/@Q{}name identifies a named attribute of that node.

Each rule has the following properties:

  • Name: the name of the rule. This is used in two ways. In the result of the function, it identifies which rule was not satisfied. In the $options argument to the function, it can be used to suppress checking of a particular rule by setting the corresponding option to false. For example, setting map{"ATTRIBUTES":false()} means that the test named "ATTRIBUTES" is not applied, so attributes are not considered when comparing two elements.

  • Condition: indicates what kind of values the rule applies to. The condition applies to both values, and the rule is applied only if the condition is satisfied for both values. In most cases the condition is expressed simply as a SequenceType, and the rule is applicable to items that are instances of that SequenceType.

  • Test: the test that is applied to the two values. In many cases the test is expressed as an XPath expression, in which the two values are denoted as $A and $B. The test is satisfied if the expression returns true; it fails if the expression returns false or fails with a dynamic error.

    If the test is satisfied, no output is generated. If the test fails, a difference record is appended to the output of the function.

    Some tests invoke recursive application of the rules. The recursive call appends a string to the current path information, so that the location of differences can be determined. The result of the recursive call is a sequence of difference records, possibly empty, which is appended to the function result.

    If a rule for comparing two values fails, no further rules for comparing those two values are evaluated. This includes rules that invoke recursion. Comparison of other pairs of values continues, until either all values have been processed, or a limit is reached on the number of differences found.

The result of the function is a sequence of maps, each map holding information about one difference (that is, a failure to satisfy a rule). The map contains the following entries:

Key Type Value
A item()* The first value being compared
B item()* The second value being compared
rule xs:string The name of the rule that was not satisfied
description xs:string A description of the mismatch, intended for the human reader. The content is ·implementation-dependent·.
path xs:string

Path to the items from the root. This captures how the failing values were reached from the original input to the function, as a sequence of selection steps. The steps recorded are as follows:

  • When selecting an item within a sequence, the 1-based position of the item within the sequence.

  • When selecting a member within an array, the 1-based position of the member within the array. Note that this applies whenever a value is processed as a sequence, even if it is actually a singleton sequence. For example, the difference between arrays [1, 2, 4] and [1, 2, 5] will have the path (1, 3, 1) because the difference is in the first item of the sequences passed to fn:differences, and it is in the third member of these arrays, and it is in the first item of these members (array members, in general, are sequences).

  • When selecting a key-value pair within a map, the key value.

  • When selecting the children of a document or element node, the 1-based position of the child node on the child axis. This is the position in the normalized tree (see below).

  • When selecting the attributes of an element node, the node-name of the attribute as an xs:QName.

  • When selecting the namespaces of an element node, the namespace prefix as an NCName or as a zero-length string.

Note:

If the two arguments to fn:differences are singleton items, then the path will always start with the integer 1 (one), that being the position of these items within the containing sequence. Similarly, if the members of an array are singleton items, then the path will contain first an integer representing the position of the member within the array, and then the integer 1 (one) to indicate that the difference is in the first item of the respective array members (members of an array are in general sequences.)

The option limit in the $options argument may be set to an integer indicating the maximum number of differences that should be reported. This is advisory only. The default is 100.

The rules for comparing sequences (at any level of recursion) are as follows:

Name Condition Test

COUNT

item()*

count($A) eq count($B)

ITEMS

item()*

For each pair of items in corresponding positions in the two sequences, apply the rules for comparing items recursively, appending [N] to the current path where N is the 1-based position of the item in the sequence.

The rules for comparing items are given in the next table. Most of the rules are checked by default; those that are not are marked using the symbol † after the name. A rule that is normally checked can be suppressed by including an entry in $options whose key matches the rule name, and whose value is false(). Conversely, a rule that is not checked by default can be activated by means of an entry whose value is true().

Name Condition Test

KIND

item()

Either both items are atomic, or both items are nodes, or both items are functions.

VALUE

xs:anyAtomicType

$A eq $B (using the appropriate collation).

ATOMIC-TYPE†

xs:anyAtomicType

$A and $B have the same type annotation.

NODE-KIND

node()

$A and $B are nodes of the same kind (for example, both elements, or both attributes)

NODE-NAME

node()

fn:deep-equal(fn:node-name($A), fn:node-name($B))

PREFIX†

element(*) or attribute(*)

fn:codepoint-equal(fn:prefix-from-QName(fn:node-name($A)), fn:prefix-from-QName(fn:node-name($B)))

NODE-TYPE-ANNOTATION†

element(*) or attribute(*)

$A and $B have the same type annotation

BASE-URI†

document-node() or element(*)

fn:codepoint-equal(fn:base-uri($A), fn:base-uri($B))

CONTENT

element(*) or document-node()

Apply the rules for comparing sequences recursively to the sequence of child nodes, appending "/node()" to the path.

ATTRIBUTES

element(*)

Construct maps representing the attributes of the two elements by applying the function map:group-by($A/@*, fn:node-name#1) to each of them, and compare these two maps by recursively invoking the rules for comparing items, appending "/@" to the path.

NAMESPACES†

element(*)

Construct maps representing the namespaces of the two elements by applying the function fn:in-scope-namespaces() to each of them, and compare these two maps by recursively invoking the rules for comparing items, appending "/namespace::" to the path.

STRING-VALUE

node()

string($A) eq string($B) (using the appropriate collation)

TYPED-VALUE†

node()

Compare the typed values of the two nodes by recursively invoking the rules for comparing sequences, appending /data() to the path.

Note:

The typed value of a node is, in general, a sequence.

FUNCTION-KIND

function(*)

$A instance of map(*) eq $B instance of map(*) and $A instance of array(*) eq $B instance of array(*)

ARRAY-SIZE

array(*)

array:size($A) eq array:size($B)

ARRAY-CONTENT

array(*)

for every $i in (1 to min((array:size($A), array:size($B)))), compare $A?$i and $B?$i, by recursively applying the rules for comparing sequences, appending ?N to the path where N is the value of $i.

MAP-SIZE

map(*)

map:size($A) eq map:size($B)

MAP-KEYS

map(*)

map:size(map:remove($A, map:keys($B)))=0 and map:size(map:remove($B, map:keys($A)))=0

Note:

That is, the two maps have the same set of key values.

MAP-ENTRIES

map(*)

For every key $k in map:keys($A), compare $A?$k and $B?$k by recursively applying the rules for comparing sequences, appending to the path as follows:

  • If the current path ends with "/@" (that is, if we are comparing attribute values), append the local name of the attribute node if it is in no namespace, or a string in the form Q{uri}local otherwise.

  • If the current path ends with "/namespace::" (that is, if we are comparing namespace nodes), append the local name of the namespace node if it has a name, or *[name()=''] otherwise.

  • If $k is an instance of xs:string, xs:anyURI, or xs:untypedAtomic, then append ? followed by the value of $k, surrounded in double quotes if and only if it is not a valid NCName.

  • If $k is an instance of xs:QName, append

  • If $k is any other value, append the key value $k in the form ?(type("value")) where type is the (unprefixed) local name of its primitive type, and value is the result of casting to string: for example ?(date("2020-12-31")).

The values for each key are thus assessed by applying fn:differences to the two sequences, using the same options, and retaining $k in the path.

FUNCTION-NAME

function(*)

fn:deep-equal(fn:function-name($A), fn:function-name($B))

FUNCTION-ARITY

function(*)

fn:function-arity($A) eq fn:function-arity($B)

FUNCTION-SIGNATURE†

function(*)

The signatures of the two functions are identical (that is, the types of the arguments and the type of the result, but ignoring the names of arguments).
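The following deliberately reduced Python sketch shows how the sequence rules, the item rules, rule suppression, the limit option, and path construction interact. It makes many assumptions and is not the specified algorithm:

* It handles only atomic values and arrays. The hypothetical `Array` class holds member sequences as lists.
* It omits nodes, maps, functions, normalization, collations, and the description entry.
* It records the path as a tuple of 1-based steps, following the step list given for the path entry.

```python
from decimal import Decimal

class Array(list):
    """Hypothetical stand-in for an XDM array; each element is one member (a list = sequence)."""

def _family(v):
    # Simplified model of which atomic values the eq operator can compare
    if isinstance(v, bool):
        return "boolean"
    if isinstance(v, (int, float, Decimal)):
        return "numeric"
    return type(v).__name__

def differences(seq1, seq2, options=None):
    options = options or {}
    limit = options.get("limit", 100)   # advisory maximum number of records
    out = []

    def enabled(rule):
        # The rules modeled here are on by default; an option set to False suppresses one
        return options.get(rule, True) is not False

    def report(rule, a, b, path):
        if len(out) < limit:
            out.append({"rule": rule, "A": a, "B": b, "path": path})

    def compare_seqs(a, b, path):
        if enabled("COUNT") and len(a) != len(b):
            report("COUNT", a, b, path)
            return                      # a failed rule stops further rules for this pair
        if not enabled("ITEMS"):
            return
        for n, (x, y) in enumerate(zip(a, b), start=1):
            compare_items(x, y, path + (n,))   # ITEMS: 1-based position in the sequence

    def compare_items(x, y, path):
        if isinstance(x, Array) != isinstance(y, Array):
            if enabled("KIND"):
                report("KIND", x, y, path)
            return
        if not isinstance(x, Array):
            if enabled("VALUE") and not (_family(x) == _family(y) and x == y):
                report("VALUE", x, y, path)
            return
        if enabled("ARRAY-SIZE") and len(x) != len(y):
            report("ARRAY-SIZE", x, y, path)
            return
        if enabled("ARRAY-CONTENT"):
            for i, (m1, m2) in enumerate(zip(x, y), start=1):
                compare_seqs(m1, m2, path + (i,))   # 1-based member position

    compare_seqs(seq1, seq2, ())
    return out
```

In this model, the arrays [1, 2, 4] and [1, 2, 5] produce a single VALUE difference at path (1, 3, 1), as described for the path entry.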

Prior to comparison, the supplied sequences may be normalized. By default, no normalization is performed. The following normalizations are defined. Each is performed only if there is an entry in $options whose key matches the name of the normalization rule, and whose corresponding value is true().

Normalization Rule Action

ignore-comments

Comment nodes are removed within a tree (but not if they appear as top-level items). Following removal of a comment node, adjacent text nodes are merged.

ignore-processing-instructions

Processing instruction nodes are removed within a tree (but not if they appear as top-level items). Following removal of a processing instruction node, adjacent text nodes are merged.

ignore-whitespace-nodes

Whitespace text nodes are removed within a tree (but not if they appear as top-level items).

normalize-space

Any string values that are compared using a collation are first processed using the fn:normalize-space() function.

normalize-unicode

Any string values that are compared using a collation are first processed using the fn:normalize-unicode() function. This is applied after normalize-space.
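The two string normalizations can be approximated in Python as shown below. The sketch uses codepoint comparison in place of a collation. It also uses NFC, the default normalization form of fn:normalize-unicode. `strings_equal` and its option keys are modeled on the table above and are not an existing API.

```python
import re
import unicodedata

def normalize_space(s):
    # Like fn:normalize-space: collapse runs of XML whitespace (space, tab,
    # CR, LF) to a single space and strip it from both ends
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

def strings_equal(a, b, options):
    if options.get("normalize-space"):
        a, b = normalize_space(a), normalize_space(b)
    if options.get("normalize-unicode"):
        # Applied after normalize-space, using NFC
        a, b = unicodedata.normalize("NFC", a), unicodedata.normalize("NFC", b)
    return a == b     # codepoint comparison; a real collation may differ
```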

Notes

The function is primarily designed to enable testing of the results of queries and stylesheets by comparing actual results with expected results. In this scenario, it is useful to know not only whether the actual results match the expected results, but also what the differences are, if any. It is also useful to be able to control which properties of the results are compared, for example, whether namespace prefixes, in-scope namespaces, and whitespace text nodes are considered significant.

Broadly speaking, the function returns an empty sequence in situations where fn:deep-equal returns true. However, the two functions differ slightly in what properties of the supplied input values are considered significant. A reasonably close (but not exact) approximation to the rules for fn:deep-equal is achieved by setting the normalization options ignore-comments and ignore-processing-instructions to true, and by suppressing the tests ATOMIC-TYPE, PREFIX, NODE-TYPE-ANNOTATION, and NAMESPACES.

The function is specified to achieve a high level of interoperability between implementations, but it is to be expected that some differences in results will arise because different implementations perform the same tests in a different order.

History

Proposed for 4.0; not yet reviewed.

14.3 Functions that test the cardinality of sequences

The following functions test the cardinality of their sequence arguments.

Function Meaning
fn:zero-or-one Returns $input if it contains zero or one items. Otherwise, raises an error.
fn:one-or-more Returns $input if it contains one or more items. Otherwise, raises an error.
fn:exactly-one Returns $input if it contains exactly one item. Otherwise, raises an error.

The functions fn:zero-or-one, fn:one-or-more, and fn:exactly-one, defined in this section, check that the cardinality of a sequence is in the expected range. They are particularly useful with regard to static typing. For example, the function call fn:remove($seq, fn:index-of($seq2, 'abc')) requires the result of the call on fn:index-of to be a singleton integer, but the static type system cannot infer this. Writing the expression as fn:remove($seq, fn:exactly-one(fn:index-of($seq2, 'abc'))) provides a suitable static type at query analysis time. It also ensures, through a dynamic check at query execution time, that the sequence has the correct length.

The type signatures for these functions deliberately declare the argument type as item()*, permitting a sequence of any length. A more restrictive signature would defeat the purpose of the function, which is to defer cardinality checking until query execution time.

14.3.1 fn:zero-or-one

Summary

Returns $input if it contains zero or one items. Otherwise, raises an error.

Signature
fn:zero-or-one(
$input as item()*
) as item()?
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

Except in error cases, the function returns $input unchanged.

Error Conditions

A dynamic error is raised [err:FORG0003] if $input contains more than one item.

14.3.2 fn:one-or-more

Summary

Returns $input if it contains one or more items. Otherwise, raises an error.

Signature
fn:one-or-more(
$input as item()*
) as item()+
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

Except in error cases, the function returns $input unchanged.

Error Conditions

A dynamic error is raised [err:FORG0004] if $input is an empty sequence.

14.3.3 fn:exactly-one

Summary

Returns $input if it contains exactly one item. Otherwise, raises an error.

Signature
fn:exactly-one(
$input as item()*
) as item()
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

Except in error cases, the function returns $input unchanged.

Error Conditions

A dynamic error is raised [err:FORG0005] if $input is an empty sequence or a sequence containing more than one item.
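The three cardinality checks are simple enough to model directly. In this Python sketch, sequences are lists and `XPathDynamicError` is a hypothetical exception class that carries the error code. The sketch only illustrates the pass-through and error behavior; it does not model static typing.

```python
class XPathDynamicError(Exception):
    """Hypothetical error type carrying the spec's error code."""
    def __init__(self, code):
        super().__init__(code)
        self.code = code

def zero_or_one(input_seq):
    if len(input_seq) > 1:
        raise XPathDynamicError("err:FORG0003")
    return input_seq              # returned unchanged

def one_or_more(input_seq):
    if len(input_seq) == 0:
        raise XPathDynamicError("err:FORG0004")
    return input_seq

def exactly_one(input_seq):
    if len(input_seq) != 1:
        raise XPathDynamicError("err:FORG0005")
    return input_seq
```

In this model, exactly_one(positions) returns positions only when it holds a single value. That is the dynamic check fn:exactly-one performs in the fn:remove example at the start of 14.3.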

14.4 Aggregate functions

Aggregate functions take a sequence as argument and return a single value computed from values in the sequence. Except for fn:count, the sequence must consist of values of a single type or one of its subtypes, or they must be numeric. xs:untypedAtomic values are permitted in the input sequence and handled by special conversion rules. The type of the items in the sequence must also support certain operations.

Function Meaning
fn:count Returns the number of items in a sequence.
fn:avg Returns the average of the values in the input sequence $values, that is, the sum of the values divided by the number of values.
fn:max Returns a value that is equal to the highest value appearing in the input sequence.
fn:min Returns a value that is equal to the lowest value appearing in the input sequence.
fn:sum Returns a value obtained by adding together the values in $values.
fn:all-equal Returns true if all items in a supplied sequence (after atomization) are equal.
fn:all-different Returns true if no two items in a supplied sequence are equal.

14.4.1 fn:count

Summary

Returns the number of items in a sequence.

Signature
fn:count(
$input as item()*
) as xs:integer
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

The function returns the number of items in $input.

Notes

Returns 0 if $input is the empty sequence.

Examples
let $seq1 := ($item1, $item2)
let $seq2 := (98.5, 98.3, 98.9)
let $seq3 := ()

The expression fn:count($seq1) returns 2.

The expression fn:count($seq3) returns 0.

The expression fn:count($seq2) returns 3.

The expression fn:count($seq2[. > 100]) returns 0.

The expression fn:count([]) returns 1.

The expression fn:count([1,2,3]) returns 1.

14.4.2 fn:avg

Summary

Returns the average of the values in the input sequence $values, that is, the sum of the values divided by the number of values.

Signature
fn:avg(
$values as xs:anyAtomicType*
) as xs:anyAtomicType?
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

If $values is the empty sequence, the empty sequence is returned.

If $values contains values of type xs:untypedAtomic they are cast to xs:double.

Duration values must either all be xs:yearMonthDuration values or must all be xs:dayTimeDuration values. For numeric values, the numeric promotion rules defined in 4.2 Arithmetic operators on numeric values are used to promote all values to a single common type. After these operations, $values must satisfy the following condition:

There must be a type T such that:

  1. every item in $values is an instance of T.

  2. T is one of xs:double, xs:float, xs:decimal, xs:yearMonthDuration, or xs:dayTimeDuration.

The function returns the average of the values as sum($values) div count($values); but the implementation may use an otherwise equivalent algorithm that avoids arithmetic overflow.
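The following Python sketch covers the numeric part of these rules. It uses these assumptions:

* xs:integer and xs:decimal are modeled as int and Decimal.
* xs:float and xs:double are both modeled as float.
* xs:untypedAtomic is modeled by a hypothetical `UntypedAtomic` subclass of str.

Averaging durations is not modeled, and neither is the distinction between xs:float and xs:double.

```python
from decimal import Decimal

class UntypedAtomic(str):
    """Hypothetical marker for xs:untypedAtomic values."""

def avg(values):
    if not values:
        return None                          # models the empty sequence
    # xs:untypedAtomic values are cast to xs:double (modeled as float)
    vals = [float(v) if isinstance(v, UntypedAtomic) else v for v in values]
    for v in vals:
        if isinstance(v, bool) or not isinstance(v, (int, Decimal, float)):
            raise TypeError("err:FORG0006: incompatible value %r" % (v,))
    if any(isinstance(v, float) for v in vals):
        vals = [float(v) for v in vals]      # promote everything to double
    else:
        vals = [Decimal(v) for v in vals]    # integers and decimals average to xs:decimal
    return sum(vals) / len(vals)
```

A plain float sum can overflow to infinity for very large inputs. The rule above explicitly allows an equivalent algorithm that avoids overflow.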

Error Conditions

A type error is raised [err:FORG0006] if the input sequence contains items of incompatible types, as described above.

Examples
let $d1 := xs:yearMonthDuration("P20Y")
let $d2 := xs:yearMonthDuration("P10M")
let $seq3 := (3, 4, 5)

The expression fn:avg($seq3) returns 4.0. (The result is of type xs:decimal.)

The expression fn:avg(($d1, $d2)) returns xs:yearMonthDuration("P10Y5M").

The expression fn:avg(($d1, $seq3)) raises a type error [err:FORG0006].

The expression fn:avg(()) returns ().

The expression fn:avg((xs:float('INF'), xs:float('-INF'))) returns xs:float('NaN').

The expression fn:avg(($seq3, xs:float('NaN'))) returns xs:float('NaN').

14.4.3 fn:max

Summary

Returns a value that is equal to the highest value appearing in the input sequence.

Signatures
fn:max(
$values as xs:anyAtomicType*
) as xs:anyAtomicType?
fn:max(
$values as xs:anyAtomicType*,
$collation as xs:string
) as xs:anyAtomicType?
Properties

The one-argument form of this function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on collations, and implicit timezone.

The two-argument form of this function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on collations, and static base URI, and implicit timezone.

Rules

The following conversions are applied to the input sequence $values, in order:

  1. Values of type xs:untypedAtomic in $values are cast to xs:double.

  2. If the resulting sequence contains values that are instances of more than one primitive type (meaning the 19 primitive types defined in [Schema 1.1 Part 2]), then:

    1. If each value is an instance of one of the types xs:string or xs:anyURI, then all the values are cast to type xs:string.

    2. If each value is an instance of one of the types xs:decimal or xs:float, then all the values are cast to type xs:float.

    3. If each value is an instance of one of the types xs:decimal, xs:float, or xs:double, then all the values are cast to type xs:double.

    4. Otherwise, a type error is raised [err:FORG0006].

    Note:

    The primitive type of an xs:integer value for this purpose is xs:decimal.
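The conversion steps can be modeled in Python as follows. This is a sketch under stated assumptions. Each value is encoded as a hypothetical (type name, value) pair, and only the types mentioned in the rules are covered. Python float stands in for both xs:float and xs:double, so single-precision rounding is not reproduced. The exception class FORG0006 is a hypothetical stand-in for the error code.

```python
from typing import Any, List, Tuple

Typed = Tuple[str, Any]  # hypothetical encoding: (type name, Python value)

# Only the mapping the rules need: xs:integer counts as primitive xs:decimal.
PRIMITIVE = {"integer": "decimal"}


class FORG0006(TypeError):
    """Stands in for the type error err:FORG0006."""


def primitive(type_name: str) -> str:
    return PRIMITIVE.get(type_name, type_name)


def convert(values: List[Typed]) -> List[Typed]:
    # Step 1: xs:untypedAtomic values are cast to xs:double.
    step1 = [("double", float(v)) if t == "untypedAtomic" else (t, v)
             for t, v in values]
    kinds = {primitive(t) for t, _ in step1}
    if len(kinds) <= 1:
        return step1  # a single primitive type: items keep their own types
    # Step 2: more than one primitive type is present.
    if kinds <= {"string", "anyURI"}:
        return [("string", str(v)) for _, v in step1]
    if kinds <= {"decimal", "float"}:
        # Python float is double precision; xs:float rounding is not modelled.
        return [("float", float(v)) for _, v in step1]
    if kinds <= {"decimal", "float", "double"}:
        return [("double", float(v)) for _, v in step1]
    raise FORG0006(f"incompatible types: {sorted(kinds)}")
```

A sequence whose values share one primitive type, such as xs:integer together with xs:decimal, comes back unchanged, so each item keeps its original type.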

The items in the resulting sequence may be reordered in an arbitrary order. The resulting sequence is referred to below as the converted sequence. The function returns an item from the converted sequence rather than the input sequence.

If the converted sequence is empty, the function returns the empty sequence.

All items in the converted sequence must be derived from a single base type for which the le operator is defined. In addition, the values in the sequence must have a total order. If date/time values do not have a timezone, they are considered to have the implicit timezone provided by the dynamic context for the purpose of comparison. Duration values must either all be xs:yearMonthDuration values or must all be xs:dayTimeDuration values.

If the converted sequence contains the value NaN, the value NaN is returned (as an xs:float or xs:double as appropriate).

If the items in the converted sequence are of type xs:string or types derived by restriction from xs:string, then the determination of the item with the largest value is made according to the collation that is used. If the type of the items in the converted sequence is not xs:string and $collation is specified, the collation is ignored.

The collation used by this function is determined according to the rules in 5.3.5 Choosing a collation.

The function returns the result of the expression:

   if (every $v in $c satisfies $c[1] ge $v)
   then $c[1] 
   else fn:max(fn:tail($c))

evaluated with $collation as the default collation if specified, and with $c as the converted sequence.
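A direct Python transcription of the defining expression is sketched below. The converted sequence is a Python list, None models the empty sequence, and the optional key function is a hypothetical stand-in for a collation that maps each item to the value actually compared. The recursion is quadratic; it mirrors the specification text rather than how a processor would implement the function.

```python
import math
from typing import Any, Callable, Sequence


def max_recursive(c: Sequence[Any], key: Callable[[Any], Any] = lambda x: x) -> Any:
    """Literal reading of:
        if (every $v in $c satisfies $c[1] ge $v) then $c[1]
        else fn:max(fn:tail($c))
    Returns None to model the empty sequence.
    """
    if not c:
        return None
    # NaN rule: if NaN is present it is returned. Checked first, because
    # every comparison with NaN is false, so the recursion alone would
    # never select it.
    for v in c:
        if isinstance(v, float) and math.isnan(v):
            return v
    if all(key(c[0]) >= key(v) for v in c):  # every $v in $c satisfies $c[1] ge $v
        return c[0]
    return max_recursive(c[1:], key)         # fn:max(fn:tail($c))
```

With key=str.lower (a rough stand-in for a case-insensitive collation), "ABC" and "abc" compare equal, so the result depends only on which of them comes first. This is the situation the Notes for this function describe as ·implementation-dependent·.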

Error Conditions

A type error is raised [err:FORG0006] if the input sequence contains items of incompatible types, as described above.

Notes

Because the rules allow the sequence to be reordered, if there are two or more items that are "equal highest", the specific item whose value is returned is ·implementation-dependent·. This can arise for example if two different strings compare equal under the selected collation, or if two different xs:dateTime values compare equal despite being in different timezones.

If the converted sequence contains exactly one value then that value is returned.

The default type when the fn:max function is applied to xs:untypedAtomic values is xs:double. This differs from the default type for operators such as gt, and for sorting in XQuery and XSLT, which is xs:string.

The rules for the dynamic type of the result are stricter in version 3.1 of the specification than in earlier versions. For example, if all the values in the input sequence belong to types derived from xs:integer, version 3.0 required only that the result be an instance of the least common supertype of the types present in the input sequence; version 3.1 requires that the returned value retain its original type. This does not apply, however, where type promotion is needed to convert all the values to a common primitive type.

Examples

The expression fn:max((3,4,5)) returns 5.

The expression fn:max([3,4,5]) returns 5. (Arrays are atomized.)

The expression fn:max((xs:integer(5), xs:float(5.0), xs:double(0))) returns xs:double(5.0e0).

fn:max((3,4,"Zero")) raises a type error [err:FORG0006].

The expression fn:max((fn:current-date(), xs:date("2100-01-01"))) returns xs:date("2100-01-01"). (Assuming that the current date is during the 21st century.)

The expression fn:max(("a", "b", "c")) returns "c". (Assuming a typical default collation.)

14.4.4 fn:min

Summary

Returns a value that is equal to the lowest value appearing in the input sequence.

Signatures
fn:min(
$values as xs:anyAtomicType*
) as xs:anyAtomicType?
fn:min(
$values as xs:anyAtomicType*,
$collation as xs:string
) as xs:anyAtomicType?
Properties

The one-argument form of this function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on collations and implicit timezone.

The two-argument form of this function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on collations, static base URI, and implicit timezone.

Rules

The following rules are applied to the input sequence:

  • Values of type xs:untypedAtomic in $values are cast to xs:double.

  • If the resulting sequence contains values that are instances of more than one primitive type (meaning the 19 primitive types defined in [Schema 1.1 Part 2]), then:

    1. If each value is an instance of one of the types xs:string or xs:anyURI, then all the values are cast to type xs:string.

    2. If each value is an instance of one of the types xs:decimal or xs:float, then all the values are cast to type xs:float.

    3. If each value is an instance of one of the types xs:decimal, xs:float, or xs:double, then all the values are cast to type xs:double.

    4. Otherwise, a type error is raised [err:FORG0006].

    Note:

    The primitive type of an xs:integer value for this purpose is xs:decimal.

The items in the resulting sequence may be reordered in an arbitrary order. The resulting sequence is referred to below as the converted sequence. The function returns an item from the converted sequence rather than the input sequence.

If the converted sequence is empty, the empty sequence is returned.

All items in the converted sequence must be derived from a single base type for which the le operator is defined. In addition, the values in the sequence must have a total order. If date/time values do not have a timezone, they are considered to have the implicit timezone provided by the dynamic context for the purpose of comparison. Duration values must either all be xs:yearMonthDuration values or must all be xs:dayTimeDuration values.

If the converted sequence contains the value NaN, the value NaN is returned (as an xs:float or xs:double as appropriate).

If the items in the converted sequence are of type xs:string or types derived by restriction from xs:string, then the determination of the item with the smallest value is made according to the collation that is used. If the type of the items in the converted sequence is not xs:string and $collation is specified, the collation is ignored.

The collation used by this function is determined according to the rules in 5.3.5 Choosing a collation.

The function returns the result of the expression:

   if (every $v in $c satisfies $c[1] le $v) 
   then $c[1] 
   else fn:min(fn:tail($c))

evaluated with $collation as the default collation if specified, and with $c as the converted sequence.
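The recursion returns the first item that is less than or equal to every item from that position onward, which (given a total order) is the first occurrence of the lowest value. A single pass that replaces its candidate only on a strict lt therefore gives the same answer in linear time. The Python sketch below checks that equivalence. It assumes the converted sequence is a plain list with None for the empty sequence, and the function names are hypothetical. It also shows that the choice between 0.0 and -0.0 depends purely on order.

```python
import math
from typing import Any, Callable, Optional, Sequence


def _first_nan(c: Sequence[Any]) -> Optional[float]:
    """Return the first NaN in c, if any (NaN is returned when present)."""
    for v in c:
        if isinstance(v, float) and math.isnan(v):
            return v
    return None


def min_recursive(c: Sequence[Any], key: Callable[[Any], Any] = lambda x: x) -> Any:
    """Literal reading of: if (every $v in $c satisfies $c[1] le $v)
    then $c[1] else fn:min(fn:tail($c)). None models the empty sequence."""
    if not c:
        return None
    nan = _first_nan(c)
    if nan is not None:
        return nan
    if all(key(c[0]) <= key(v) for v in c):
        return c[0]
    return min_recursive(c[1:], key)


def min_linear(c: Sequence[Any], key: Callable[[Any], Any] = lambda x: x) -> Any:
    """Single pass with the same result: replacing only on a strict '<'
    keeps the first of several equal-lowest items."""
    if not c:
        return None
    nan = _first_nan(c)
    if nan is not None:
        return nan
    best = c[0]
    for v in c[1:]:
        if key(v) < key(best):
            best = v
    return best
```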

Error Conditions

A type error is raised [err:FORG0006] if the input sequence contains items of incompatible types, as described above.

Notes

Because the rules allow the sequence to be reordered, if there are two or more items that are "equal lowest", the specific item whose value is returned is ·implementation-dependent·. This can arise, for example, if two different strings compare equal under the selected collation, or if two different xs:dateTime values compare equal despite being in different timezones.

If the converted sequence contains exactly one value then that value is returned.

The default type when the fn:min function is applied to xs:untypedAtomic values is xs:double. This differs from the default type for operators such as lt, and for sorting in XQuery and XSLT, which is xs:string.

The rules for the dynamic type of the result are stricter in version 3.1 of the specification than in earlier versions. For example, if all the values in the input sequence belong to types derived from xs:integer, version 3.0 required only that the result be an instance of the least common supertype of the types present in the input sequence; version 3.1 requires that the returned value retain its original type. This does not apply, however, where type promotion is needed to convert all the values to a common primitive type.

Examples

The expression fn:min((3,4,5)) returns 3.

The expression fn:min([3,4,5]) returns 3. (Arrays are atomized.)

The expression fn:min((xs:integer(5), xs:float(5), xs:double(10))) returns xs:double(5.0e0).

fn:min((3,4,"Zero")) raises a type error [err:FORG0006].

fn:min((xs:float(0.0E0), xs:float(-0.0E0))) can return either positive or negative zero. The two items are equal, so it is ·implementation-dependent· which is returned.

The expression fn:min((fn:current-date(), xs:date("1900-01-01"))) returns xs:date("1900-01-01"). (Assuming that the current date is set to a reasonable value.)

The expression fn:min(("a", "b", "c")) returns "a". (Assuming a typical default collation.)

14.4.5 fn:sum

Summary

Returns a value obtained by adding together the values in $values.

Signatures
fn:sum(
$values as xs:anyAtomicType*
) as xs:anyAtomicType
fn:sum(
$values as xs:anyAtomicType*,
$zero as xs:anyAtomicType?
) as xs:anyAtomicType?
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

Any values of type xs:untypedAtomic in $values are cast to xs:double. The items in the resulting sequence may be reordered in an arbitrary order. The resulting sequence is referred to below as the converted sequence.

If the converted sequence is empty, then the single-argument form of the function returns the xs:integer value 0; the two-argument form returns the value of the argument $zero.

If the converted sequence contains the value NaN, NaN is returned.

All items in $values must be numeric or derived from a single base type. In addition, the type must support addition. Duration values must either all be xs:yearMonthDuration values or must all be xs:dayTimeDuration values. For numeric values, the numeric promotion rules defined in 4.2 Arithmetic operators on numeric values are used to promote all values to a single common type. The sum of a sequence of integers will therefore be an integer, while the sum of a numeric sequence that includes at least one xs:double will be an xs:double.

The result of the function, using the second signature, is the result of the expression:

if (fn:count($c) eq 0) then
    $zero
else if (fn:count($c) eq 1) then
    $c[1]
else
    $c[1] + fn:sum(subsequence($c, 2))

where $c is the converted sequence.

The result of the function, using the first signature, is the result of the expression: fn:sum($values, 0).
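The recursive definition, including the role of $zero, can be sketched in Python. The sketch assumes that a Python list plays the converted sequence, that None models the empty sequence, and that durations are represented as plain month counts for illustration; xq_sum is a hypothetical name. As in the definition, $zero is consulted only when the sequence is empty, which is why it need not support addition.

```python
from typing import Any, Sequence


def xq_sum(c: Sequence[Any], zero: Any = 0) -> Any:
    """Transcription of the recursive definition of fn:sum.

    zero=0 models the single-argument form, fn:sum($values, 0);
    passing zero=None models fn:sum($values, ()).
    Python float addition already propagates NaN.
    """
    if len(c) == 0:
        return zero                      # $zero, returned as-is
    if len(c) == 1:
        return c[0]                      # $c[1]
    return c[0] + xq_sum(c[1:])          # $c[1] + fn:sum(subsequence($c, 2))


# Durations modelled as month counts: P20Y = 240 months, P10M = 10 months.
print(xq_sum([240, 10]))                 # 250, i.e. P20Y10M
print(xq_sum([], "ein Augenblick"))      # ein Augenblick
print(xq_sum([]))                        # 0
```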

Error Conditions

A type error is raised [err:FORG0006] if the input sequence contains items of incompatible types, as described above.

Notes

The second argument allows an appropriate value to be defined to represent the sum of an empty sequence. For example, when summing a sequence of durations it would be appropriate to return a zero-length duration of the appropriate type. This argument is necessary because a system that does dynamic typing cannot distinguish "an empty sequence of integers", for example, from "an empty sequence of durations".

If the converted sequence contains exactly one value then that value is returned.

Examples
let $d1 := xs:yearMonthDuration("P20Y")
let $d2 := xs:yearMonthDuration("P10M")
let $seq1 := ($d1, $d2)
let $seq3 := (3, 4, 5)

The expression fn:sum(($d1, $d2)) returns xs:yearMonthDuration("P20Y10M").

The expression fn:sum($seq1[. lt xs:yearMonthDuration('P3M')], xs:yearMonthDuration('P0M')) returns xs:yearMonthDuration("P0M").

The expression fn:sum($seq3) returns 12.

The expression fn:sum(()) returns 0.

The expression fn:sum((),()) returns ().

The expression fn:sum((1 to 100)[. lt 0], 0) returns 0.

fn:sum(($d1, 9E1)) raises a type error [err:FORG0006].

The expression fn:sum(($d1, $d2), "ein Augenblick") returns xs:yearMonthDuration("P20Y10M"). (There is no requirement that the $zero value should be the same type as the items in $values, or even that it should belong to a type that supports addition.)

The expression fn:sum([1, 2, 3]) returns 6. (Atomizing an array returns the sequence obtained by atomizing its members.)

The expression fn:sum([[1, 2], [3, 4]]) returns 10. (Atomizing an array returns the sequence obtained by atomizing its members.)

14.4.6 fn:all-equal

Summary

Returns true if all items in a supplied sequence (after atomization) are equal.

Signatures
fn:all-equal(
$values as xs:anyAtomicType*
) as xs:boolean
fn:all-equal(
$values as xs:anyAtomicType*,
$collation as xs:string
) as xs:boolean
Properties

This function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on collations.

Rules

Omitting the second argument, $collation, is equivalent to supplying fn:default-collation(). For more information on collations see 5.3.5 Choosing a collation.

The result of the function fn:all-equal($values, $collation) is true if and only if the result of fn:count(fn:distinct-values($values, $collation)) le 1 is true (that is, if the sequence is empty, or if all the items in the sequence are equal under the rules of the fn:distinct-values function).
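A compact Python model of this definition counts distinct keys. It is a sketch with two assumptions. First, Python's numeric equality and hashing (under which 1, Decimal("1.0") and 1.0 are equal) approximate the eq-based comparison used by fn:distinct-values. Second, the hypothetical helper ascii_case_insensitive approximates the HTML ASCII case-insensitive collation by folding only ASCII letters. NaN handling is not modeled.

```python
import string
from decimal import Decimal
from typing import Any, Callable, Hashable, Iterable

# Folds only ASCII letters, approximating the HTML ASCII case-insensitive collation.
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_case_insensitive(s: str) -> str:
    return s.translate(_ASCII_FOLD)


def all_equal(values: Iterable[Any],
              key: Callable[[Any], Hashable] = lambda v: v) -> bool:
    """fn:count(fn:distinct-values($values, $collation)) le 1."""
    return len({key(v) for v in values}) <= 1


print(all_equal([1, Decimal("1.0"), 1.0e0]))                  # True
print(all_equal(["ABC", "abc"], key=ascii_case_insensitive))  # True
```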

Examples

The expression fn:all-equal((1,2,3)) returns false().

The expression fn:all-equal((1, 1.0, 1.0e0)) returns true().

The expression fn:all-equal("one") returns true().

The expression fn:all-equal(()) returns true().

The expression fn:all-equal(("ABC", "abc"), "http://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive") returns true().

The expression fn:all-equal(//p/@class) returns true if all p elements have the same value for @class.

The expression fn:all-equal(*!fn:node-name()) returns true if all element children of the context node have the same name.

History

Originally proposed for 4.0 under the name fn:uniform. Accepted 2022-09-20 with a change of name.

14.4.7 fn:all-different

Summary

Returns true if no two items in a supplied sequence are equal.

Signatures
fn:all-different(
$values as xs:anyAtomicType*
) as xs:boolean
fn:all-different(
$values as xs:anyAtomicType*,
$collation as xs:string
) as xs:boolean
Properties

This function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on collations.

Rules

Omitting the second argument, $collation, is equivalent to supplying fn:default-collation(). For more information on collations see 5.3.5 Choosing a collation.

The result of the function fn:all-different($values, $collation) is true if and only if the result of fn:count(fn:distinct-values($values, $collation)) eq fn:count($values) is true (that is, if the sequence is empty, or if all the items in the sequence are distinct under the rules of the fn:distinct-values function).
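An equivalent check can stop at the first repeated key instead of building the full set of distinct values first. The Python sketch below assumes that Python equality and hashing stand in for eq, that the hypothetical ascii_case_insensitive helper approximates the HTML ASCII case-insensitive collation, and that NaN need not be modeled.

```python
import string
from decimal import Decimal
from typing import Any, Callable, Hashable, Iterable

_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_case_insensitive(s: str) -> str:
    return s.translate(_ASCII_FOLD)


def all_different(values: Iterable[Any],
                  key: Callable[[Any], Hashable] = lambda v: v) -> bool:
    """True iff count(distinct-values($values, $collation)) eq count($values)."""
    seen = set()
    for v in values:
        k = key(v)
        if k in seen:
            return False  # first duplicate found: no need to look further
        seen.add(k)
    return True


print(all_different([1, Decimal("1.0"), 1.0e0]))  # False
```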

Examples

The expression fn:all-different((1,2,3)) returns true().

The expression fn:all-different((1, 1.0, 1.0e0)) returns false().

The expression fn:all-different("one") returns true().

The expression fn:all-different(()) returns true().

The expression fn:all-different(("ABC", "abc"), "http://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive") returns false().

The expression fn:all-different(//employee/@ssn) returns true if no two employees have the same value for their @ssn attribute.

The expression fn:all-different(*!fn:node-name()) returns true if all element children of the context node have distinct names.

History

Originally proposed for 4.0 under the name fn:unique. Accepted 2022-09-20 with a change of name and with clarifications to the description.

14.5 Functions on node identifiers

This section defines a number of functions used to find elements by ID or IDREF value, or to generate IDs.

Function Meaning
fn:id Returns the sequence of element nodes that have an ID value matching the value of one or more of the IDREF values supplied in $values.
fn:element-with-id Returns the sequence of element nodes that have an ID value matching the value of one or more of the IDREF values supplied in $values.
fn:idref Returns the sequence of element or attribute nodes with an IDREF value matching the value of one or more of the ID values supplied in $values.
fn:generate-id This function returns a string that uniquely identifies a given node.

14.5.1 fn:id

Summary

Returns the sequence of element nodes that have an ID value matching the value of one or more of the IDREF values supplied in $values.

Signatures
fn:id(
$values as xs:string*
) as element()*
fn:id(
$values as xs:string*,
$node as node() := .
) as element()*
Properties

The one-argument form of this function is ·deterministic·, ·context-dependent·, and ·focus-dependent·.

The two-argument form of this function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

The function returns a sequence, in document order with duplicates eliminated, containing every element node E that satisfies all the following conditions:

  1. E is in the target document. The target document is the document containing $node, or the document containing the context item (.) if the second argument is omitted. The behavior of the function if $node is omitted is exactly the same as if the context item had been passed as $node.

  2. E has an ID value equal to one of the candidate IDREF values, where:

    • An element has an ID value equal to V if either or both of the following conditions are true:

      • The is-id property (See Section 5.5 is-id AccessorDM40.) of the element node is true, and the typed value of the element node is equal to V under the rules of the eq operator using the Unicode code point collation (http://www.w3.org/2005/xpath-functions/collation/codepoint).

      • The element has an attribute node whose is-id property (See Section 5.5 is-id AccessorDM40.) is true and whose typed value is equal to V under the rules of the eq operator using the Unicode code point collation (http://www.w3.org/2005/xpath-functions/collation/codepoint).

    • Each xs:string in $values is parsed as if it were of type IDREFS, that is, each xs:string in $values is treated as a whitespace-separated sequence of tokens, each acting as an IDREF. These tokens are then included in the list of candidate IDREFs. If any of the tokens is not a lexically valid IDREF (that is, if it is not lexically an xs:NCName), it is ignored. Formally, the candidate IDREF values are the strings in the sequence given by the expression:

      for $s in $values return 
          fn:tokenize(fn:normalize-space($s), ' ')[. castable as xs:IDREF]
  3. If several elements have the same ID value, then E is the one that is first in document order.
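The derivation of candidate IDREF values, and the rule that the first element in document order wins, can be sketched in Python. This is an illustration only. The NCName test is a simplified pattern (a letter or underscore followed by word characters, '.' or '-'), not the full XML production. The document is modeled as a hypothetical list of (element label, ID values) pairs in document order, and candidate_idrefs and select_by_id are hypothetical names.

```python
import re
from typing import Iterable, List, Tuple

# XML whitespace as used by fn:normalize-space: space, tab, CR, LF.
_XML_WS = re.compile(r"[ \t\r\n]+")
# Simplified NCName: a letter or '_' first, then word characters, '.' or '-'.
# An approximation only; the real NCName production differs in detail.
_NCNAME = re.compile(r"[^\W\d][\w.\-]*")


def normalize_space(s: str) -> str:
    return _XML_WS.sub(" ", s).strip(" ")


def candidate_idrefs(values: Iterable[str]) -> List[str]:
    """for $s in $values return
       fn:tokenize(fn:normalize-space($s), ' ')[. castable as xs:IDREF]"""
    result = []
    for s in values:
        for token in normalize_space(s).split(" "):
            if _NCNAME.fullmatch(token):
                result.append(token)
    return result


def select_by_id(doc: List[Tuple[str, List[str]]], values: Iterable[str]) -> List[str]:
    """doc: hypothetical (element label, ID values) pairs in document order.
    Returns labels in document order with duplicates eliminated; for each
    candidate, only the first element carrying that ID is chosen."""
    chosen = set()
    for cand in set(candidate_idrefs(values)):
        for pos, (_, ids) in enumerate(doc):
            if cand in ids:
                chosen.add(pos)
                break
    return [doc[pos][0] for pos in sorted(chosen)]
```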

Error Conditions

A dynamic error is raised [err:FODC0001] if $node, or the context item if the second argument is absent, is a node in a tree whose root is not a document node.

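The following errors may be raised when $node is omitted:

  • If the context item is absent, a dynamic error is raised [err:XPDY0002].

  • If the context item is not a node, a type error is raised [err:XPTY0004].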

Notes

The effect of this function is anomalous in respect of element nodes with the is-id property. For legacy reasons, this function returns the element that has the is-id property, whereas it would be more appropriate to return its parent, that being the element that is uniquely identified by the ID. A new function fn:element-with-id has been introduced with the desired behavior.

If the data model is constructed from an Infoset, an attribute will have the is-id property if the corresponding attribute in the Infoset had an attribute type of ID: typically this means the attribute was declared as an ID in a DTD.

If the data model is constructed from a PSVI, an element or attribute will have the is-id property if its typed value is a single atomic value of type xs:ID or a type derived by restriction from xs:ID.

No error is raised in respect of a candidate IDREF value that does not match the ID of any element in the document. If no candidate IDREF value matches the ID value of any element, the function returns the empty sequence.

It is not necessary that the supplied argument should have type xs:IDREF or xs:IDREFS, or that it should be derived from a node with the is-idrefs property.

An element may have more than one ID value. This can occur with synthetic data models or with data models constructed from a PSVI where the element and one of its attributes are both typed as xs:ID.

If the source document is well-formed but not valid, it is possible for two or more elements to have the same ID value. In this situation, the function will select the first such element.

It is also possible in a well-formed but invalid document to have an element or attribute that has the is-id property but whose value does not conform to the lexical rules for the xs:ID type. Such a node will never be selected by this function.

Examples
let $emp := 
        validate lax{    
          document{
            <employee xml:id="ID21256"
                      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  
                      xmlns:xs="http://www.w3.org/2001/XMLSchema">
               <empnr xsi:type="xs:ID">E21256</empnr>
               <first>John</first>
               <last>Brown</last>
            </employee>
          }
        }
         

The expression $emp/id('ID21256')/name() returns "employee". (The xml:id attribute has the is-id property, so the employee element is selected.)

The expression $emp/id('E21256')/name() returns "empnr". (Assuming the empnr element is given the type xs:ID as a result of schema validation, the element will have the is-id property and is therefore selected. Note the difference from the behavior of fn:element-with-id.)

14.5.2 fn:element-with-id

Summary

Returns the sequence of element nodes that have an ID value matching the value of one or more of the IDREF values supplied in $values.

Signatures
fn:element-with-id(
$values as xs:string*
) as element()*
fn:element-with-id(
$values as xs:string*,
$node as node() := .
) as element()*
Properties

The one-argument form of this function is ·deterministic·, ·context-dependent·, and ·focus-dependent·.

The two-argument form of this function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

Note:

The effect of this function is identical to fn:id in respect of elements that have an attribute with the is-id property. However, it behaves differently in respect of element nodes with the is-id property. Whereas the fn:id function, for legacy reasons, returns the element that has the is-id property, this function returns the element identified by the ID, which is the parent of the element having the is-id property.

The function returns a sequence, in document order with duplicates eliminated, containing every element node E that satisfies all the following conditions:

  1. E is in the target document. The target document is the document containing $node, or the document containing the context item (.) if the second argument is omitted. The behavior of the function if $node is omitted is exactly the same as if the context item had been passed as $node.

  2. E has an ID value equal to one of the candidate IDREF values, where:

    • An element has an ID value equal to V if either or both of the following conditions are true:

      • The element has a child element node whose is-id property (See Section 5.5 is-id AccessorDM40.) is true and whose typed value is equal to V under the rules of the eq operator using the Unicode code point collation (http://www.w3.org/2005/xpath-functions/collation/codepoint).

      • The element has an attribute node whose is-id property (See Section 5.5 is-id AccessorDM40.) is true and whose typed value is equal to V under the rules of the eq operator using the Unicode code point collation (http://www.w3.org/2005/xpath-functions/collation/codepoint).

    • Each xs:string in $values is parsed as if it were of type IDREFS, that is, each xs:string in $values is treated as a whitespace-separated sequence of tokens, each acting as an IDREF. These tokens are then included in the list of candidate IDREFs. If any of the tokens is not a lexically valid IDREF (that is, if it is not lexically an xs:NCName), it is ignored. Formally, the candidate IDREF values are the strings in the sequence given by the expression:

      for $s in $values return
         fn:tokenize(fn:normalize-space($s), ' ')[. castable as xs:IDREF]
  3. If several elements have the same ID value, then E is the one that is first in document order.
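The difference between fn:id and fn:element-with-id for ID-valued elements can be shown with a small Python model. It is a sketch under stated assumptions: Elem is a hypothetical class that carries only the ID information the rules need, the lookup takes a single, already tokenized IDREF, and only the first matching element in document order is reported.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional, Set


@dataclass
class Elem:
    """Hypothetical element model holding only what the lookups need."""
    name: str
    id_attr: Optional[str] = None  # value of an attribute with is-id, if any
    id_text: Optional[str] = None  # typed value, if this element itself has is-id
    children: List["Elem"] = field(default_factory=list)

    def add(self, child: "Elem") -> "Elem":
        self.children.append(child)
        return child


def doc_order(root: Elem) -> Iterator[Elem]:
    yield root
    for child in root.children:
        yield from doc_order(child)


def ids_fn_id(e: Elem) -> Set[str]:
    # fn:id: the element's own is-id attribute, or its own is-id value.
    return {v for v in (e.id_attr, e.id_text) if v is not None}


def ids_element_with_id(e: Elem) -> Set[str]:
    # fn:element-with-id: an is-id attribute, or an is-id *child* element.
    found = {e.id_attr} if e.id_attr is not None else set()
    found |= {c.id_text for c in e.children if c.id_text is not None}
    return found


def first_match(root: Elem, idref: str, ids: Callable[[Elem], Set[str]]) -> Optional[str]:
    # The first element in document order wins (rule 3).
    for e in doc_order(root):
        if idref in ids(e):
            return e.name
    return None


emp = Elem("employee", id_attr="ID21256")
emp.add(Elem("empnr", id_text="E21256"))
emp.add(Elem("first"))
print(first_match(emp, "E21256", ids_fn_id))            # empnr
print(first_match(emp, "E21256", ids_element_with_id))  # employee
```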

Error Conditions

A dynamic error is raised [err:FODC0001] if $node, or the context item if the second argument is omitted, is a node in a tree whose root is not a document node.

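The following errors may be raised when $node is omitted:

  • If the context item is absent, a dynamic error is raised [err:XPDY0002].

  • If the context item is not a node, a type error is raised [err:XPTY0004].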

Notes

This function is equivalent to the fn:id function except when dealing with ID-valued element nodes. Whereas the fn:id function selects the element containing the identifier, this function selects its parent.

If the data model is constructed from an Infoset, an attribute will have the is-id property if the corresponding attribute in the Infoset had an attribute type of ID: typically this means the attribute was declared as an ID in a DTD.

If the data model is constructed from a PSVI, an element or attribute will have the is-id property if its typed value is a single atomic value of type xs:ID or a type derived by restriction from xs:ID.

No error is raised in respect of a candidate IDREF value that does not match the ID of any element in the document. If no candidate IDREF value matches the ID value of any element, the function returns the empty sequence.

It is not necessary that the supplied argument should have type xs:IDREF or xs:IDREFS, or that it should be derived from a node with the is-idrefs property.

An element may have more than one ID value. This can occur with synthetic data models or with data models constructed from a PSVI where the element and one of its attributes are both typed as xs:ID.

If the source document is well-formed but not valid, it is possible for two or more elements to have the same ID value. In this situation, the function will select the first such element.

It is also possible in a well-formed but invalid document to have an element or attribute that has the is-id property but whose value does not conform to the lexical rules for the xs:ID type. Such a node will never be selected by this function.

Examples
let $emp := 
         validate lax{    
          document{
            <employee xml:id="ID21256"
                      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  
                      xmlns:xs="http://www.w3.org/2001/XMLSchema">
               <empnr xsi:type="xs:ID">E21256</empnr>
               <first>John</first>
               <last>Brown</last>
            </employee>
          }
        }
         

The expression $emp/fn:element-with-id('ID21256')/name() returns "employee". (The xml:id attribute has the is-id property, so the employee element is selected.)

The expression $emp/fn:element-with-id('E21256')/name() returns "employee". (Assuming the empnr element is given the type xs:ID as a result of schema validation, the element will have the is-id property, and therefore its parent is selected. Note the difference from the behavior of fn:id.)

14.5.3 fn:idref

Summary

Returns the sequence of element or attribute nodes with an IDREF value matching the value of one or more of the ID values supplied in $values.

Signatures
fn:idref(
$values as xs:string*
) as node()*
fn:idref(
$values as xs:string*,
$node as node() := .
) as node()*
Properties

The one-argument form of this function is ·deterministic·, ·context-dependent·, and ·focus-dependent·.

The two-argument form of this function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

The function returns a sequence, in document order with duplicates eliminated, containing every element or attribute node $N that satisfies all the following conditions:

  1. $N is in the target document. The target document is the document containing $node or the document containing the context item (.) if the second argument is omitted. The behavior of the function if $node is omitted is exactly the same as if the context item had been passed as $node.

  2. $N has an IDREF value equal to one of the candidate ID values, where:

    • A node $N has an IDREF value equal to V if both of the following conditions are true:

      • The is-idrefs property (see Section 5.6 is-idrefs AccessorDM40) of $N is true.

      • The sequence

        fn:tokenize(fn:normalize-space(fn:string($N)), ' ')

        contains a string that is equal to V under the rules of the eq operator using the Unicode code point collation (http://www.w3.org/2005/xpath-functions/collation/codepoint).

    • Each xs:string in $values is parsed as if it were lexically of type xs:ID. These xs:strings are then included in the list of candidate xs:IDs. If any of the strings in $values is not a lexically valid xs:ID (that is, if it is not lexically an xs:NCName), it is ignored. More formally, the candidate ID values are the strings in the sequence:

      $values[. castable as xs:NCName]
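A Python sketch of the matching rule follows. It assumes that nodes are given as hypothetical (label, is-idrefs flag, string value) triples in document order, and it uses a simplified NCName pattern as a stand-in for the real lexical check; the whitespace collapsing performed by castable as is not modeled. The function name idref is hypothetical.

```python
import re
from typing import List, Tuple

_XML_WS = re.compile(r"[ \t\r\n]+")
_NCNAME = re.compile(r"[^\W\d][\w.\-]*")  # simplified NCName (approximation)


def idref(nodes: List[Tuple[str, bool, str]], values: List[str]) -> List[str]:
    """nodes: hypothetical (label, is_idrefs, string value) triples in
    document order. Returns matching labels, each at most once."""
    # $values[. castable as xs:NCName]
    candidates = {v for v in values if _NCNAME.fullmatch(v)}
    result = []
    for label, is_idrefs, text in nodes:
        if not is_idrefs:
            continue  # only nodes with the is-idrefs property can match
        # fn:tokenize(fn:normalize-space(fn:string($N)), ' ')
        tokens = _XML_WS.sub(" ", text).strip(" ").split(" ")
        if candidates.intersection(tokens):
            result.append(label)  # appended once, however many candidates match
    return result
```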
Error Conditions

A dynamic error is raised [err:FODC0001] if $node, or the context item if the second argument is omitted, is a node in a tree whose root is not a document node.

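The following errors may be raised when $node is omitted:

  • If the context item is absent, a dynamic error is raised [err:XPDY0002].

  • If the context item is not a node, a type error is raised [err:XPTY0004].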

Notes

An element or attribute typically acquires the is-idrefs property by being validated against the schema type xs:IDREF or xs:IDREFS, or (for attributes only) by being described as of type IDREF or IDREFS in a DTD.

Because the function is sensitive to the way in which the data model is constructed, calls on this function are not always interoperable.

No error is raised in respect of a candidate ID value that does not match the IDREF value of any element or attribute in the document. If no candidate ID value matches the IDREF value of any element or attribute, the function returns the empty sequence.

It is possible for two or more nodes to have an IDREF value that matches a given candidate ID value. In this situation, the function will return all such nodes. However, each matching node will be returned at most once, regardless of how many candidate ID values it matches.

It is possible in a well-formed but invalid document to have a node whose is-idrefs property is true but that does not conform to the lexical rules for the xs:IDREF type. The effect of the above rules is that ill-formed candidate ID values and ill-formed IDREF values are ignored.

If the data model is constructed from a PSVI, the typed value of a node that has the is-idrefs property will contain at least one atomic value of type xs:IDREF (or a type derived by restriction from xs:IDREF). It may also contain atomic values of other types. These atomic values are treated as candidate ID values if two conditions are met: their lexical form must be valid as an xs:NCName, and there must be at least one instance of xs:IDREF in the typed value of the node. If these conditions are not satisfied, such values are ignored.

Examples
let $emp := 
      validate lax {  
        document {    
          <employees xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  
                     xmlns:xs="http://www.w3.org/2001/XMLSchema">  
            <employee xml:id="ID21256">
               <empnr xsi:type="xs:ID">E21256</empnr>
               <first>Anil</first>
               <last>Singh</last>
               <deputy xsi:type="xs:IDREF">E30561</deputy>
            </employee>
            <employee xml:id="ID30561">
               <empnr xsi:type="xs:ID">E30561</empnr>
               <first>John</first>
               <last>Brown</last>
               <manager xsi:type="xs:IDREF">ID21256</manager>
            </employee>
          </employees>
        }
      }
         

The expression $emp/(element-with-id('ID21256')/@xml:id => fn:idref())/ancestor::employee/last => string() returns "Brown". (Assuming that manager has the is-idrefs property, the call on fn:idref selects the manager element. If, instead, the manager had a ref attribute with the is-idrefs property, the call on fn:idref would select the attribute node.)

The expression $emp/(element-with-id('E30561')/empnr => fn:idref())/ancestor::employee/last => string() returns "Singh". (Assuming that employee/deputy has the is-idrefs property, the call on fn:idref selects the deputy element.)

14.5.4 fn:generate-id

Summary

This function returns a string that uniquely identifies a given node.

Signatures
fn:generate-id() as xs:string
fn:generate-id(
$node as node()? := .
) as xs:string
Properties

The zero-argument form of this function is ·deterministic·, ·context-dependent·, and ·focus-dependent·.

The one-argument form of this function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

If the argument is omitted, it defaults to the context item (.). The behavior of the function if the argument is omitted is exactly the same as if the context item had been passed as the argument.

If the argument is the empty sequence, the result is the zero-length string.

In other cases, the function returns a string that uniquely identifies a given node. More formally, it is guaranteed that within a single ·execution scope·, fn:codepoint-equal(fn:generate-id($N), fn:generate-id($M)) returns true if and only if ($M is $N) returns true.

The returned identifier must consist of ASCII alphanumeric characters and must start with an alphabetic character. Thus, the string is syntactically an XML name.
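The uniqueness guarantee can be observed directly on a small constructed tree. The following is a minimal sketch, assuming an XQuery 3.0 (or later) processor; the element names are illustrative only. Two distinct nodes with identical content still receive different identifiers, and the empty sequence yields the zero-length string.

```xquery
(: Illustrative tree: two <item> elements with identical content. :)
let $doc := document { <list><item>a</item><item>a</item></list> }
let $first := $doc//item[1]
let $second := $doc//item[2]
return (
  (: the same node always yields the same identifier :)
  codepoint-equal(generate-id($first), generate-id($first)),
  (: distinct nodes yield distinct identifiers, even with equal content :)
  not(codepoint-equal(generate-id($first), generate-id($second))),
  (: the identifier is ASCII alphanumeric and starts with a letter :)
  matches(generate-id($first), '^[A-Za-z][A-Za-z0-9]*$'),
  (: the empty sequence gives the zero-length string :)
  generate-id(()) eq ''
)
```

All four values in the result should be true.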

Error Conditions

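The following errors may be raised when $node is omitted:

  • If the context item is absent, dynamic error [err:XPDY0002].

  • If the context item is not a node, type error [err:XPTY0004].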

Notes

An implementation is free to generate an identifier in any convenient way provided that it always generates the same identifier for the same node and that different identifiers are always generated for different nodes. An implementation is under no obligation to generate the same identifiers each time a document is transformed or queried.

There is no guarantee that a generated unique identifier will be distinct from any unique IDs specified in the source document.

There is no inverse to this function; it is not directly possible to find the node with a given generated ID. Of course, it is possible to search a given sequence of nodes using an expression such as $nodes[generate-id()=$id].

It is advisable, but not required, for implementations to generate IDs that are distinct even when compared using a case-blind collation.

Examples

The primary use case for this function is to generate hyperlinks. For example, when generating HTML, an anchor for a given section $sect can be generated by writing (in either XSLT or XQuery):

<a name="{fn:generate-id($sect)}"/>

and a link to that section can then be produced with code such as:

see <a href="#{fn:generate-id($sect)}">here</a>

Note that anchors generated in this way will not necessarily be the same each time a document is republished.

Since the keys in a map must be atomic values, it is possible to use generated IDs as surrogates for nodes when constructing a map. For example, in some implementations, testing whether a node $N is a member of a large node-set $S using the expression fn:exists($N intersect $S) may be expensive; there may then be performance benefits in creating a map:

let $SMap := map:merge($S!map{fn:generate-id(.) : .})

and then testing for membership of the node-set using:

map:contains($SMap, fn:generate-id($N))
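Putting the two fragments together, a complete version of the map-based membership test might look as follows. This is a sketch: the small <order> tree stands in for the large node-set $S, and the element and attribute names are hypothetical. The map lookup and the intersect test should agree; because generated IDs are only stable within one execution scope, the map must not be persisted and reused across queries.

```xquery
(: Hypothetical document standing in for a large source tree. :)
let $doc := document {
  <order><line n="1"/><line n="2"/><line n="3"/></order>
}
(: The node-set $S: every line except the second. :)
let $S := $doc//line[@n != "2"]
(: Index $S by generated ID so that membership becomes a key lookup. :)
let $SMap := map:merge($S ! map { generate-id(.) : . })
let $in := $doc//line[@n = "3"]
let $out := $doc//line[@n = "2"]
return (
  map:contains($SMap, generate-id($in)),   (: true :)
  map:contains($SMap, generate-id($out)),  (: false :)
  exists($in intersect $S)                 (: true, same answer as the map :)
)
```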

14.6 Functions giving access to external information

The functions in this section provide access to resources (such as files) in the external environment.

Function Meaning
fn:doc Retrieves a document using a URI supplied as an xs:string, and returns the corresponding document node.
fn:doc-available The function returns true if and only if the function call fn:doc($href) would return a document node.
fn:collection Returns a sequence of items identified by a collection URI; or a default collection if no URI is supplied.
fn:uri-collection Returns a sequence of xs:anyURI values representing the URIs in a URI collection.
fn:unparsed-text The fn:unparsed-text function reads an external resource (for example, a file) and returns a string representation of the resource.
fn:unparsed-text-lines The fn:unparsed-text-lines function reads an external resource (for example, a file) and returns its contents as a sequence of strings, one for each line of text in the string representation of the resource.
fn:unparsed-text-available Because errors in evaluating the fn:unparsed-text function are non-recoverable, the two forms of this function are provided to allow an application to determine whether a call with particular arguments would succeed.
fn:environment-variable Returns the value of a system environment variable, if it exists.
fn:available-environment-variables Returns a list of environment variable names that are suitable for passing to fn:environment-variable, as a (possibly empty) sequence of strings.

14.6.1 fn:doc

Summary

Retrieves a document using a URI supplied as an xs:string, and returns the corresponding document node.

Signature
fn:doc(
$href as xs:string?
) as document-node()?
Properties

This function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on available documents, and static base URI.

Rules

If $href is the empty sequence, the result is an empty sequence.

If $href is a relative URI reference, it is resolved relative to the value of the static base URI property from the static context. The resulting absolute URI is promoted to an xs:string.

If the available documents described in Section 2.1.2 Dynamic Context XP31 provides a mapping from this string to a document node, the function returns that document node.

The URI may include a fragment identifier.

By default, this function is ·deterministic·. Two calls on this function return the same document node if the same URI Reference (after resolution to an absolute URI Reference) is supplied to both calls. Thus, the following expression (if it does not raise an error) will always be true:

doc("foo.xml") is doc("foo.xml")

However, for performance reasons, implementations may provide a user option to evaluate the function without a guarantee of determinism. The manner in which any such option is provided is implementation-defined. If the user has not selected such an option, a call of the function must either return a deterministic result or must raise a dynamic error [err:FODC0003].

Note:

If $href is read from a source document, it is generally appropriate to resolve it relative to the base URI property of the relevant node in the source document. This can be achieved by calling the fn:resolve-uri function, and passing the resulting absolute URI as an argument to the fn:doc function.
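A minimal sketch of this approach, assuming an XQuery 3.0 (or later) processor. The <include> element, its href value, and the example.com base URI are hypothetical; the xml:base attribute supplies the node's base URI, so the result does not depend on the static base URI.

```xquery
(: Hypothetical source element whose href is relative to its own base URI. :)
let $include := document {
  <include xml:base="http://example.com/docs/" href="chapter1.xml"/>
}/include
(: Resolve against the node's base URI, not the static base URI. :)
let $absolute := resolve-uri(string($include/@href), string(base-uri($include)))
(: $absolute would then be passed to fn:doc, for example doc($absolute). :)
return $absolute
```

Here the result should be the absolute URI http://example.com/docs/chapter1.xml.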

If two calls to this function supply different absolute URI References as arguments, the same document node may be returned if the implementation can determine that the two arguments refer to the same resource.

By defining the semantics of this function in terms of a string-to-document-node mapping in the dynamic context, the specification is acknowledging that the results of this function are outside the purview of the language specification itself, and depend entirely on the run-time environment in which the expression is evaluated. This run-time environment includes not only an unpredictable collection of resources ("the web"), but configurable machinery for locating resources and turning their contents into document nodes within the XPath data model. Both the set of resources that are reachable, and the mechanisms by which those resources are parsed and validated, are ·implementation-dependent·.

One possible processing model for this function is as follows. The resource identified by the URI Reference is retrieved. If the resource cannot be retrieved, a dynamic error is raised [err:FODC0002]. The data resulting from the retrieval action is then parsed as an XML document and a tree is constructed in accordance with the [XQuery and XPath Data Model (XDM) 3.0]. If the top-level media type is known and is "text", the content is parsed in the same way as if the media type were text/xml; otherwise, it is parsed in the same way as if the media type were application/xml. If the contents cannot be parsed successfully, a dynamic error is raised [err:FODC0002]. Otherwise, the result of the function is the document node at the root of the resulting tree. This tree is then optionally validated against a schema.

Various aspects of this processing are ·implementation-defined·. Implementations may provide external configuration options that allow any aspect of the processing to be controlled by the user. In particular:

  • The set of URI schemes that the implementation recognizes is implementation-defined. Implementations may allow the mapping of URIs to resources to be configured by the user, using mechanisms such as catalogs or user-written URI handlers.

  • The handling of non-XML media types is implementation-defined. Implementations may allow instances of the data model to be constructed from non-XML resources, under user control.

  • It is ·implementation-defined· whether DTD validation and/or schema validation is applied to the source document.

  • Implementations may provide user-defined error handling options that allow processing to continue following an error in retrieving a resource, or in parsing and validating its content. When errors have been handled in this way, the function may return either an empty sequence, or a fallback document provided by the error handler.

  • Implementations may provide user options that relax the requirement for the function to return deterministic results.

  • The effect of a fragment identifier in the supplied URI is ·implementation-defined·. One possible interpretation is to treat the fragment identifier as an ID attribute value, and to return a document node having the element with the selected ID value as its only child.

Error Conditions

A dynamic error may be raised [err:FODC0005] if $href is not a valid URI reference.

A dynamic error is raised [err:FODC0002] if a relative URI reference is supplied, and the base-URI property in the static context is absent.

A dynamic error is raised [err:FODC0002] if the available documents provides no mapping for the absolutized URI.

A dynamic error is raised [err:FODC0002] if the resource cannot be retrieved or cannot be parsed successfully as XML.

A dynamic error is raised [err:FODC0003] if the implementation is not able to guarantee that the result of the function will be deterministic, and the user has not indicated that an unstable result is acceptable.
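Where the implementation offers no error-handling option, these errors can still be intercepted with an XQuery 3.0 try/catch expression. The sketch below assumes a hypothetical relative URI data/orders.xml and returns a placeholder document instead of failing when the resource cannot be retrieved or parsed. Note that catching [err:FODC0002] alone does not tell the application whether the resource was missing or merely malformed.

```xquery
(: The URI is hypothetical. The err prefix is predeclared in XQuery 3.0+. :)
let $href := "data/orders.xml"
return
  try {
    doc($href)
  } catch err:FODC0002 | err:FODC0005 {
    (: retrieval, parsing, or URI syntax failed: substitute a placeholder :)
    document { <unavailable href="{ $href }" code="{ $err:code }"/> }
  }
```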

14.6.2 fn:doc-available

Summary

The function returns true if and only if the function call fn:doc($href) would return a document node.

Signature
fn:doc-available(
$href as xs:string?
) as xs:boolean
Properties

This function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on available documents, and static base URI.

Rules

If $href is an empty sequence, this function returns false.

If a call on fn:doc($href) would return a document node, this function returns true.

In all other cases this function returns false. This includes the case where an invalid URI is supplied, and also the case where a valid relative URI reference is supplied but cannot be resolved, for example because the static base URI is absent.

If this function returns true, then calling fn:doc($href) within the same ·execution scope· must return a document node. However, if nondeterministic processing has been selected for the fn:doc function, this guarantee is lost.
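A common pattern is to test availability before retrieving, so that a missing document leads to a default value rather than a dynamic error. A minimal sketch, in which the file name settings.xml and the <settings/> fallback element are hypothetical:

```xquery
(: Hypothetical relative URI, resolved against the static base URI. :)
let $href := "settings.xml"
return
  if (doc-available($href))
  then doc($href)/*    (: succeeds within this execution scope, unless nondeterminism was selected :)
  else <settings/>     (: fallback when doc($href) would fail :)
```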

14.6.3 fn:collection

Summary

Returns a sequence of items identified by a collection URI; or a default collection if no URI is supplied.

Signatures
fn:collection() as item()*
fn:collection(
$uri as xs:string?
) as item()*
Properties

This function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on available collections, and static base URI.

Rules

This function takes an xs:string as argument and returns a sequence of items obtained by interpreting $uri as an xs:anyURI and resolving it according to the mapping specified in available collections described in Section C.2 Dynamic Context Components XP31.

If available collections provides a mapping from this string to a sequence of items, the function returns that sequence. If available collections maps the string to an empty sequence, then the function returns an empty sequence.

If $uri is not specified, the function returns the sequence of items in the default collection in the dynamic context. See Section C.2 Dynamic Context Components XP31.

If $uri is a relative xs:anyURI, it is resolved against the value of the base-URI property from the static context.

If $uri is the empty sequence, the function behaves as if it had been called without an argument. See above.

By default, this function is ·deterministic·. This means that repeated calls on the function with the same argument will return the same result. However, for performance reasons, implementations may provide a user option to evaluate the function without a guarantee of determinism. The manner in which any such option is provided is ·implementation-defined·. If the user has not selected such an option, a call to this function must either return a deterministic result or must raise a dynamic error [err:FODC0003].

There is no requirement that any nodes in the result should be in document order, nor is there a requirement that the result should contain no duplicates.
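When an application does need document order without duplicates, the node items of the result can be normalized with a path step. The sketch below, assuming an XQuery 3.0 (or later) processor, relies on the rule that E1/E2 returns nodes in document order with duplicates removed; the function name local:distinct-nodes-in-order is hypothetical, and a constructed sequence stands in for the result of collection($uri).

```xquery
(: Puts the node items of a sequence into document order and drops duplicates;
   non-node items (for example strings or maps from a generalized collection)
   are discarded. :)
declare function local:distinct-nodes-in-order($items as item()*) as node()* {
  $items[. instance of node()]/.
};

(: In practice $items would be collection($uri); a constructed sequence
   with an out-of-order duplicate and a non-node item stands in here. :)
let $doc := document { <r><a/><b/></r> }
let $items := ($doc//b, "not a node", $doc//a, $doc//b)
return local:distinct-nodes-in-order($items) ! local-name()
```

The result should be ("a", "b"). Across different documents, the relative order is implementation-dependent but stable.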

Error Conditions

A dynamic error is raised [err:FODC0002] if no URI is supplied and the value of the default collection is absent DM40.

A dynamic error is raised [err:FODC0002] if a relative URI reference is supplied, and the base-URI property in the static context is absent.

A dynamic error is raised [err:FODC0002] if available collections provides no mapping for the absolutized URI.

A dynamic error may be raised [err:FODC0004] if $uri is not a valid xs:anyURI.

Notes

In earlier versions of this specification, the primary use for the fn:collection function was to retrieve a collection of XML documents, perhaps held as lexical XML in operating system filestore, or perhaps held in an XML database. In this release the concept has been generalised to allow other resources to be retrieved: for example JSON documents might be returned as arrays or maps, non-XML text files might be returned as strings, and binary files might be returned as instances of xs:base64Binary.

The abstract concept of a collection might be realized in different ways by different implementations, and the ways in which URIs map to collections can be equally variable. Specifying resources using URIs is useful because URIs are dynamic, can be parameterized, and do not rely on an external environment.

14.6.4 fn:uri-collection

Summary

Returns a sequence of xs:anyURI values representing the URIs in a URI collection.

Signatures
fn:uri-collection() as xs:anyURI*
fn:uri-collection(
$uri as xs:string?
) as xs:anyURI*
Properties

This function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on available URI collections, and static base URI.

Rules

The zero-argument form of the function returns the URIs in the default URI collection described in Section C.2 Dynamic Context Components XP31.

If $uri is a relative xs:anyURI, it is resolved against the value of the base-URI property from the static context.

If $uri is the empty sequence, the function behaves as if it had been called without an argument. See above.

The single-argument form of the function returns the sequence of URIs corresponding to the supplied URI in the available URI collections described in Section C.2 Dynamic Context Components XP31.

By default, this function is ·deterministic·. This means that repeated calls on the function with the same argument will return the same result. However, for performance reasons, implementations may provide a user option to evaluate the function without a guarantee of determinism. The manner in which any such option is provided is ·implementation-defined·. If the user has not selected such an option, a call to this function must either return a deterministic result or must raise a dynamic error [err:FODC0003].

There is no requirement that the URIs returned by this function should all be distinct, and no assumptions can be made about the order of URIs in the sequence, unless the implementation defines otherwise.

Error Conditions

A dynamic error is raised [err:FODC0002] if no URI is supplied (that is, if the function is called with no arguments, or with a single argument that evaluates to an empty sequence), and the value of the default URI collection is absent DM40.

A dynamic error is raised [err:FODC0002] if a relative URI reference is supplied, and the base-URI property in the static context is absent.

A dynamic error is raised [err:FODC0002] if available URI collections provides no mapping for the absolutized URI.

A dynamic error may be raised [err:FODC0004] if $uri is not a valid xs:anyURI.

Notes

In some implementations, there might be a close relationship between collections (as retrieved by the fn:collection function), and URI collections (as retrieved by this function). For example, a collection might return XML documents, and the corresponding URI collection might return the URIs of those documents. However, this specification does not impose such a close relationship. For example, there may be collection URIs accepted by one of the two functions and not by the other; a collection might contain items that do not have any URI; or a URI collection might contain URIs that cannot be dereferenced to return any resource.

Thus, some implementations might ensure that calling fn:uri-collection and then applying fn:doc to each of the returned URIs delivers the same result as calling fn:collection with the same argument; however, this is not guaranteed.

In the case where fn:uri-collection returns the URIs of resources that could also be retrieved directly using fn:collection, there are several reasons why it might be appropriate to use this function in preference to the fn:collection function. For example:

  • It allows different URIs for different kinds of resource to be dereferenced in different ways: for example, the returned URIs might be referenced using the fn:unparsed-text function rather than the fn:doc function.

  • In XSLT 3.0 it allows the documents in a collection to be processed in streaming mode using the xsl:source-document instruction.

  • It allows recovery from failures to read, parse, or validate individual documents, by calling the fn:doc (or other dereferencing) function within the scope of try/catch.

  • It allows selection of which documents to read based on their URI, for example they can be filtered to select those whose URIs end in .xml, or those that use the https scheme.

  • An application might choose to limit the number of URIs processed in a single run, for example it might process only the first 50 URIs in the collection; or it might present the URIs to the user and allow the user to select which of them need to be further processed.

  • It allows the URIs to be modified before they are dereferenced, for example by adding or removing query parameters, or by redirecting the request to a local cache or to a mirror site.
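Several of these use cases can be combined in a few lines. The following is a sketch only: the collection URI is hypothetical, the limit of 50 mirrors the example above, and errors on individual documents are simply skipped. Duplicates are removed with fn:distinct-values because the function does not guarantee distinct URIs.

```xquery
(: The collection URI is hypothetical. :)
let $uris := distinct-values(
  uri-collection("http://example.com/collections/reports")[ends-with(., ".xml")]
)
for $uri in $uris[position() le 50]   (: process at most 50 documents :)
return
  try {
    doc($uri)
  } catch * {
    (: skip any document that cannot be read, parsed, or validated :)
    ()
  }
```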

For some of these use cases, this assumes that the cost of calling fn:collection might be significant (for example, it might involve retrieving all the documents in the collection over the network and parsing them). This will not necessarily be true of all implementations.

14.6.5 fn:unparsed-text

Summary

The fn:unparsed-text function reads an external resource (for example, a file) and returns a string representation of the resource.

Signatures
fn:unparsed-text(
$href as xs:string?
) as xs:string?
fn:unparsed-text(
$href as xs:string?,
$encoding as xs:string
) as xs:string?
Properties

This function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on static base URI.

Rules

The $href argument must be a string in the form of a URI reference, which must contain no fragment identifier, and must identify a resource for which a string representation is available. If the URI is a relative URI reference, then it is resolved relative to the static base URI property from the static context.

The mapping of URIs to the string representation of a resource is the mapping defined in the available text resources XP31 component of the dynamic context.

If the value of the $href argument is an empty sequence, the function returns an empty sequence.

The $encoding argument, if present, is the name of an encoding. The values for this argument follow the same rules as for the encoding attribute in an XML declaration. The only values that every implementation is required to recognize are utf-8 and utf-16.

The encoding of the external resource is determined as follows:

  1. external encoding information is used if available, otherwise

  2. if the media type of the resource is text/xml or application/xml (see [RFC 2376]), or if it matches the conventions text/*+xml or application/*+xml (see [RFC 7303] and/or its successors), then the encoding is recognized as specified in [Extensible Markup Language (XML) 1.0 (Fifth Edition)], otherwise

  3. the value of the $encoding argument is used if present, otherwise

  4. the processor may use ·implementation-defined· heuristics to determine the likely encoding, otherwise

  5. UTF-8 is assumed.

The result of the function is a string containing the string representation of the resource retrieved using the URI.

Error Conditions

A dynamic error is raised [err:FOUT1170] if $href contains a fragment identifier, or if it cannot be resolved to an absolute URI (for example, because the base-URI property in the static context is absent), or if it cannot be used to retrieve the string representation of a resource.

A dynamic error is raised [err:FOUT1190] if the value of the $encoding argument is not a valid encoding name, if the processor does not support the specified encoding, if the string representation of the retrieved resource contains octets that cannot be decoded into Unicode ·characters· using the specified encoding, or if the resulting characters are not permitted XML characters.

A dynamic error is raised [err:FOUT1200] if $encoding is absent and the processor cannot infer the encoding using external information and the encoding is not UTF-8.

Notes

If it is appropriate to use a base URI other than the static base URI (for example, when resolving a relative URI reference read from a source document), then it is advisable to resolve the relative URI reference using the fn:resolve-uri function before passing it to the fn:unparsed-text function.

There is no essential relationship between the sets of URIs accepted by the two functions fn:unparsed-text and fn:doc (a URI accepted by one may or may not be accepted by the other), and if a URI is accepted by both there is no essential relationship between the results (different resource representations are permitted by the architecture of the web).

There are no constraints on the MIME type of the resource.

The fact that the resolution of URIs is defined by a mapping in the dynamic context means that in effect, various aspects of the behavior of this function are ·implementation-defined·. Implementations may provide external configuration options that allow any aspect of the processing to be controlled by the user. In particular:

  • The set of URI schemes that the implementation recognizes is implementation-defined. Implementations may allow the mapping of URIs to resources to be configured by the user, using mechanisms such as catalogs or user-written URI handlers.

  • The handling of media types is implementation-defined.

  • Implementations may provide user-defined error handling options that allow processing to continue following an error in retrieving a resource, or in reading its content. When errors have been handled in this way, the function may return a fallback document provided by the error handler.

  • Implementations may provide user options that relax the requirement for the function to return deterministic results.

The rules for determining the encoding are chosen for consistency with [XML Inclusions (XInclude) Version 1.0 (Second Edition)]. Files with an XML media type are treated specially because there are use cases for this function where the retrieved text is to be included as unparsed XML within a CDATA section of a containing document, and because processors are likely to be able to reuse the code that performs encoding detection for XML external entities.

If the text file contains characters such as < and &, these will typically be output as &lt; and &amp; if the string is serialized as XML or HTML. If these characters actually represent markup (for example, if the text file contains HTML), then an XSLT stylesheet can attempt to write them as markup to the output file using the disable-output-escaping attribute of the xsl:value-of instruction. Note, however, that XSLT implementations are not required to support this feature.

Examples

This XSLT example attempts to read a file containing 'boilerplate' HTML and copy it directly to the serialized output file:

<xsl:output method="html"/>

<xsl:template match="/">
  <xsl:value-of select="unparsed-text('header.html', 'iso-8859-1')"
                disable-output-escaping="yes"/>
  <xsl:apply-templates/>
  <xsl:value-of select="unparsed-text('footer.html', 'iso-8859-1')"
                disable-output-escaping="yes"/>
</xsl:template>

14.6.6 fn:unparsed-text-lines

Summary

The fn:unparsed-text-lines function reads an external resource (for example, a file) and returns its contents as a sequence of strings, one for each line of text in the string representation of the resource.

Signatures
fn:unparsed-text-lines(
$href as xs:string?
) as xs:string*
fn:unparsed-text-lines(
$href as xs:string?,
$encoding as xs:string
) as xs:string*
Properties

This function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on static base URI.

Rules

The unparsed-text-lines function reads an external resource (for example, a file) and returns its string representation as a sequence of strings, separated at newline boundaries.

The result of the single-argument function is the same as the result of the expression fn:tokenize(fn:unparsed-text($href), '\r\n|\r|\n')[not(position()=last() and .='')]. The result of the two-argument function is the same as the result of the expression fn:tokenize(fn:unparsed-text($href, $encoding), '\r\n|\r|\n')[not(position()=last() and .='')].

The result is thus a sequence of strings containing the text of the resource retrieved using the URI, each string representing one line of text. Lines are separated by one of the sequences x0A, x0D, or x0Dx0A. The characters representing the newline are not included in the returned strings. If there are two adjacent newline sequences, a zero-length string will be returned to represent the empty line; but if the external resource ends with the sequence x0A, x0D, or x0Dx0A, the result will be as if this final line ending were not present.
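The splitting rule can be tried on an in-memory string, without any external resource. A minimal sketch in XQuery (character references in string literals are an XQuery feature, not available in plain XPath); the function name local:split-lines is hypothetical and simply applies the tokenize expression given above.

```xquery
(: Applies the line-splitting rule of fn:unparsed-text-lines to a string. :)
declare function local:split-lines($text as xs:string) as xs:string* {
  (: '\r\n' is listed first so that CR LF counts as a single separator :)
  tokenize($text, '\r\n|\r|\n')[not(position() = last() and . = '')]
};

(: CR LF after "alpha", an empty line after "beta", and a final LF :)
let $text := "alpha&#xD;&#xA;beta&#xA;&#xA;gamma&#xA;"
return local:split-lines($text)
```

The result should be ("alpha", "beta", "", "gamma"): the blank line survives as a zero-length string, while the trailing line ending does not produce an extra item.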

Error Conditions

Error conditions are the same as for the fn:unparsed-text function.

Notes

See the notes for fn:unparsed-text.

14.6.7 fn:unparsed-text-available

Summary

Because errors in evaluating the fn:unparsed-text function are non-recoverable, the two forms of this function are provided to allow an application to determine whether a call with particular arguments would succeed.

Signatures
fn:unparsed-text-available(
$href as xs:string?
) as xs:boolean
fn:unparsed-text-available(
$href as xs:string?,
$encoding as xs:string
) as xs:boolean
Properties

This function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on static base URI.

Rules

The fn:unparsed-text-available function determines whether a call on the fn:unparsed-text function with identical arguments would return a string.

If the first argument is an empty sequence, the function returns false.

In other cases, the function returns true if a call on fn:unparsed-text with the same arguments would succeed, and false if a call on fn:unparsed-text with the same arguments would fail with a non-recoverable dynamic error.

The functions fn:unparsed-text and fn:unparsed-text-available have the same requirement for ·determinism· as the functions fn:doc and fn:doc-available. This means that unless the user has explicitly stated a requirement for a reduced level of determinism, either of these functions, if called twice with the same arguments within the same ·execution scope·, must return the same result each time; moreover, the result of a call on fn:unparsed-text-available must be consistent with the result of a subsequent call on fn:unparsed-text with the same arguments.

Notes

This requires that the fn:unparsed-text-available function should actually attempt to read the resource identified by the URI, and check that it is correctly encoded and contains no characters that are invalid in XML. Implementations may avoid the cost of repeating these checks, for example by caching the validated contents of the resource, to anticipate a subsequent call on the fn:unparsed-text or fn:unparsed-text-lines function. Alternatively, implementations may be able to rewrite an expression such as if (unparsed-text-available(A)) then unparsed-text(A) else ... to generate a single call internally.

Since the function fn:unparsed-text-lines succeeds or fails under exactly the same circumstances as fn:unparsed-text, the fn:unparsed-text-available function may equally be used to test whether a call on fn:unparsed-text-lines would succeed.
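A minimal sketch of this guard, assuming a hypothetical relative URI notes.txt and an explicit UTF-8 encoding; the same arguments are passed to both calls so that the availability test applies to the retrieval that follows.

```xquery
(: Hypothetical relative URI and explicit encoding. :)
let $href := "notes.txt"
return
  if (unparsed-text-available($href, "utf-8"))
  then unparsed-text-lines($href, "utf-8")   (: one string per line :)
  else ()                                     (: missing, or not decodable as UTF-8 :)
```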

14.6.8 fn:environment-variable

Summary

Returns the value of a system environment variable, if it exists.

Signature
fn:environment-variable(
$name as xs:string
) as xs:string?
Properties

This function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on environment variables.

Rules

The set of available environment variables XP31 is a set of (name, value) pairs forming part of the dynamic context, in which the name is unique within the set of pairs. The name and value are arbitrary strings.

If the $name argument matches the name of one of these pairs, the function returns the corresponding value.

If there is no environment variable with a matching name, the function returns the empty sequence.

The collation used for matching names is ·implementation-defined·, but must be the same as the collation used to ensure that the names of all environment variables are unique.

The function is ·deterministic·, which means that if it is called several times within the same ·execution scope·, with the same arguments, it must return the same result.
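Because a missing variable yields the empty sequence rather than an error, a default can be supplied with a simple sequence-and-first-item idiom. In the sketch below, the variable name APP_MODE and the default "production" are hypothetical; the same fallback also applies when an implementation has disabled access to the environment.

```xquery
(: APP_MODE is a hypothetical variable name; "production" is an assumed default. :)
let $mode := (environment-variable("APP_MODE"), "production")[1]
return $mode
```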

Notes

On many platforms, the term "environment variable" has a natural meaning in terms of facilities provided by the operating system. This interpretation of the concept does not exclude other interpretations, such as a mapping to a set of configuration parameters in a database system.

Environment variable names are usually case sensitive. Names are usually of the form (letter|_) (letter|_|digit)*, but this varies by platform.

On some platforms, there may sometimes be multiple environment variables with the same name. In this case, it is implementation-dependent which of them is returned (see for example [POSIX.1-2008], Chapter 8, Environment Variables). Implementations may use prefixes or other naming conventions to disambiguate the names.

The requirement to ensure that the function is deterministic means in practice that the implementation must make a snapshot of the environment variables at some time during execution, and return values obtained from this snapshot, rather than using live values that are subject to change at any time.

Operating system environment variables may be associated with a particular process, while queries and stylesheets may execute across multiple processes (or multiple machines). In such circumstances implementations may choose to provide access to the environment variables associated with the process in which the query or stylesheet processing was initiated.

Security advice: Queries from untrusted sources should not be permitted unrestricted access to environment variables. For example, the name of the account under which the query is running may be useful information to a would-be intruder. An implementation may therefore choose to restrict access to the environment, or may provide a facility to make fn:environment-variable always return the empty sequence.

14.6.9 fn:available-environment-variables

Summary

Returns a list of environment variable names that are suitable for passing to fn:environment-variable, as a (possibly empty) sequence of strings.

Signature
fn:available-environment-variables() as xs:string*
Properties

This function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on environment variables.

Rules

The function returns a sequence of strings, being the names of the environment variables in the dynamic context in some ·implementation-dependent· order.

The function is ·deterministic·: that is, the set of available environment variables does not vary during evaluation.

Notes

The function returns a list of strings, containing no duplicates.

It is intended that the strings in this list should be suitable for passing to fn:environment-variable.

See also the note on security under the definition of the fn:environment-variable function. If access to environment variables has been disabled, fn:available-environment-variables always returns the empty sequence.
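As a rough Python sketch (all names hypothetical), the two functions can share a single snapshot. Sharing it keeps the list of names consistent with the values returned by the lookup function. An empty snapshot models the case where access to the environment has been disabled.

```python
import os
from typing import Callable, List, Mapping, Optional, Tuple


def make_env_functions(
    enabled: bool = True, source: Optional[Mapping[str, str]] = None
) -> Tuple[Callable[[str], Optional[str]], Callable[[], List[str]]]:
    """Return hypothetical (environment_variable, available_environment_variables) callables."""
    # One snapshot for both functions; disabling access yields an empty snapshot.
    snapshot = dict(os.environ if source is None else source) if enabled else {}

    def environment_variable(name: str) -> Optional[str]:
        return snapshot.get(name)

    def available_environment_variables() -> List[str]:
        # Dictionary keys are unique, so the list has no duplicates; its order
        # (insertion order here) carries no meaning.
        return list(snapshot)

    return environment_variable, available_environment_variables
```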

14.7 Parsing and serializing

These functions convert between the lexical representation of XML and the tree representation.

Function Meaning
fn:parse-xml This function takes as input an XML document represented as a string, and returns the document node at the root of an XDM tree representing the parsed document.
fn:parse-xml-fragment This function takes as input an XML external entity represented as a string, and returns the document node at the root of an XDM tree representing the parsed document fragment.
fn:serialize This function serializes the supplied input sequence $input as described in [XSLT and XQuery Serialization 3.1], returning the serialized representation of the sequence as a string.

14.7.1 fn:parse-xml

Summary

This function takes as input an XML document represented as a string, and returns the document node at the root of an XDM tree representing the parsed document.

Signature
fn:parse-xml(
$value as xs:string?
) as document-node(element(*))?
Properties

This function is ·nondeterministic·, ·context-dependent·, and ·focus-independent·. It depends on static base URI.

Rules

If $value is the empty sequence, the function returns the empty sequence.

The precise process used to construct the XDM instance is ·implementation-defined·. In particular, it is implementation-defined whether DTD and/or schema validation is invoked, and it is implementation-defined whether an XML 1.0 or XML 1.1 parser is used.

The static base URI property from the static context of the fn:parse-xml function call is used both as the base URI used by the XML parser to resolve relative entity references within the document, and as the base URI of the document node that is returned.

The document URI of the returned node is absent (DM40).

The function is not ·deterministic·: that is, if the function is called twice with the same arguments, it is ·implementation-dependent· whether the same node is returned on both occasions.

Error Conditions

A dynamic error is raised [err:FODC0006] if the content of $value is not a well-formed and namespace-well-formed XML document.

A dynamic error is raised [err:FODC0006] if DTD-based validation is carried out and the content of $value is not valid against its DTD.

Notes

Since the XML document is presented to the parser as a string, rather than as a sequence of octets, the encoding specified within the XML declaration has no meaning. If the XML parser accepts input only in the form of a sequence of octets, then the processor must ensure that the string is encoded as octets in a way that is consistent with the rules the XML parser uses to detect the encoding.
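Python's `xml.etree.ElementTree` serves here purely as a stand-in for an octet-based XML parser, and the helper `parse_xml_string` is hypothetical. The sketch shows one way to satisfy the requirement: encode the string as UTF-8 and rewrite any declared encoding to match, so that the parser's encoding detection agrees with the octets it actually receives.

```python
import re
import xml.etree.ElementTree as ET

# Matches the encoding pseudo-attribute inside a leading XML declaration.
_ENCODING_DECL = re.compile(r"^(<\?xml\b[^>]*?\bencoding\s*=\s*)([\"'])[^\"']*\2")


def parse_xml_string(value: str) -> ET.Element:
    """Parse an XML document held as a str by handing UTF-8 octets to the parser."""
    # Make the declared encoding agree with the octets we are about to supply.
    consistent = _ENCODING_DECL.sub(r"\g<1>\g<2>UTF-8\g<2>", value, count=1)
    return ET.fromstring(consistent.encode("utf-8"))
```

Without the rewrite, a string declaring `encoding="ISO-8859-1"` but passed as UTF-8 octets would have its non-ASCII characters misread.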

The primary use case for this function is to handle input documents that contain nested XML documents embedded within CDATA sections. Since the content of the CDATA section is exposed as text, the receiving query or stylesheet may pass this text to the fn:parse-xml function to create a tree representation of the nested document.

Similarly, nested XML within comments is sometimes encountered, and lexical XML is sometimes returned by extension functions, for example, functions that access web services or read from databases.

A use case arises in XSLT where there is a need to preprocess an input document before parsing. For example, an application might wish to edit the document to remove its DOCTYPE declaration. This can be done by reading the raw text using the fn:unparsed-text function, editing the resulting string, and then passing it to the fn:parse-xml function.

Examples

The expression fn:parse-xml("<alpha>abcd</alpha>") returns a newly created document node, having an alpha element as its only child; the alpha element in turn is the parent of a text node whose string value is "abcd".

14.7.2 fn:parse-xml-fragment

Summary

This function takes as input an XML external entity represented as a string, and returns the document node at the root of an XDM tree representing the parsed document fragment.

Signature
fn:parse-xml-fragment(
$value as xs:string?
) as document-node()?
Properties

This function is ·nondeterministic·, ·context-dependent·, and ·focus-independent·. It depends on static base URI.

Rules

If $value is the empty sequence, the function returns the empty sequence.

The input must be a namespace-well-formed external general parsed entity. More specifically, it must be a string conforming to the production rule extParsedEnt in [Extensible Markup Language (XML) 1.0 (Fifth Edition)], it must contain no entity references other than references to predefined entities, and it must satisfy all the rules of [Namespaces in XML] for namespace-well-formed documents with the exception that the rule requiring it to be a well-formed document is replaced by the rule requiring it to be a well-formed external general parsed entity.

The string is parsed to form a sequence of nodes which become children of the new document node, in the same way as the content of any element is converted into a sequence of children for the resulting element node.
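One common way to apply this rule is to wrap the entity text in a throwaway element and use that element's content as the children. The sketch below does this in Python, with `xml.etree.ElementTree` standing in for the XML parser. `parse_fragment` and the wrapper name `_wrapper` are hypothetical. The sketch returns plain strings for text nodes and assumes the input has no leading text declaration, which is not permitted inside element content. When there is no DTD, the parser recognizes only the predefined entities, so a reference such as `&nbsp;` raises `ParseError`.

```python
import xml.etree.ElementTree as ET
from typing import List, Union

Node = Union[str, ET.Element]  # a str stands in for a text node


def parse_fragment(value: str) -> List[Node]:
    """Parse a fragment by treating it as the content of a throwaway element."""
    wrapper = ET.fromstring("<_wrapper>" + value + "</_wrapper>")
    children: List[Node] = []
    if wrapper.text:  # text before the first child element
        children.append(wrapper.text)
    for element in wrapper:
        children.append(element)
        if element.tail:  # text following this element (ElementTree keeps it on .tail)
            children.append(element.tail)
    return children
```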

Schema validation is not invoked, which means that the nodes in the returned document will all be untyped.

The precise process used to construct the XDM instance is ·implementation-defined·. In particular, it is implementation-defined whether an XML 1.0 or XML 1.1 parser is used.

The static base URI from the static context of the fn:parse-xml-fragment function call is used as the base URI of the document node that is returned.

The document URI of the returned node is absent (DM40).

The function is not ·deterministic·: that is, if the function is called twice with the same arguments, it is ·implementation-dependent· whether the same node is returned on both occasions.

Error Conditions

A dynamic error is raised [err:FODC0006] if the content of $value is not a well-formed external general parsed entity, if it contains entity references other than references to predefined entities, or if a document that incorporates this well-formed parsed entity would not be namespace-well-formed.

Notes

See also the notes for the fn:parse-xml function.

The main differences between fn:parse-xml and fn:parse-xml-fragment are that for fn:parse-xml, the children of the resulting document node must contain exactly one element node and no text nodes, whereas for fn:parse-xml-fragment, the resulting document node can have any number (including zero) of element and text nodes among its children. An additional difference is that the text declaration at the start of an external entity has slightly different syntax from the XML declaration at the start of a well-formed document.

Note that all whitespace outside the text declaration is significant, including whitespace that precedes the first element node.

One use case for this function is to handle XML fragments stored in databases, which frequently allow zero-or-more top level element nodes. Another use case is to parse the contents of a CDATA section embedded within another XML document.

Examples

The expression fn:parse-xml-fragment("<alpha>abcd</alpha><beta>abcd</beta>") returns a newly created document node, having two elements named alpha and beta as its children; each of these elements in turn is the parent of a text node.

The expression fn:parse-xml-fragment("He was <i>so</i> kind") returns a newly created document node having three children: a text node whose string value is "He was ", an element node named i having a child text node with string value "so", and a text node whose string value is " kind".

The expression fn:parse-xml-fragment("") returns a document node having no children.

The expression fn:parse-xml-fragment(" ") returns a document node whose children comprise a single text node whose string value is a single space.

The expression fn:parse-xml-fragment('<?xml version="1.0" encoding="utf8" standalone="yes"?><a/>') results in a dynamic error [err:FODC0006] because the "standalone" keyword is not permitted in the text declaration that appears at the start of an external general parsed entity. (Thus, it is not the case that any input accepted by the fn:parse-xml function will also be accepted by fn:parse-xml-fragment.)
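The syntactic difference can be made concrete with a small Python check of the XML 1.0 grammar for a text declaration, TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'. In a text declaration the version is optional, the encoding is required, and standalone is not allowed. `is_text_declaration` is a hypothetical helper, and the pattern covers only the XML 1.0 version numbers (1.x).

```python
import re

# TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'
_TEXT_DECL = re.compile(
    r"<\?xml"
    r"(\s+version\s*=\s*([\"'])1\.[0-9]+\2)?"  # optional VersionInfo
    r"\s+encoding\s*=\s*([\"'])[A-Za-z][A-Za-z0-9._-]*\3"  # required EncodingDecl
    r"\s*\?>"  # no standalone declaration may appear
)


def is_text_declaration(decl: str) -> bool:
    """True if decl is acceptable as the text declaration of an external parsed entity."""
    return _TEXT_DECL.fullmatch(decl) is not None
```

For example, `<?xml version="1.0"?>` is a valid XML declaration, but it is not a valid text declaration because it omits the encoding.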

14.7.3 fn:serialize

Summary

This function serializes the supplied input sequence $input as described in [XSLT and XQuery Serialization 3.1], returning the serialized representation of the sequence as a string.

Signatures
fn:serialize(
$input as item()*
) as xs:string
fn:serialize(
$input as item()*,
$options as item()?
) as xs:string
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

The value of the first argument $input acts as the input sequence to the serialization process, which starts with sequence normalization.

The second argument $options, if present, provides serialization parameters. These may be supplied in either of two forms:

  1. As an output:serialization-parameters element, having the format described in Section 3.1 Setting Serialization Parameters by Means of a Data Model Instance (SER31). In this case the type of the supplied argument must match the required type element(output:serialization-parameters).

  2. As a map. In this case the type of the supplied argument must match the required type map(*).

The single-argument version of this function has the same effect as the two-argument version called with $options set to an empty sequence. This in turn is the same as the effect of passing an output:serialization-parameters element with no child elements.

The final stage of serialization, that is, encoding, is skipped. If the serializer does not allow this phase to be skipped, then the sequence of octets returned by the serializer is decoded into a string by reversing the character encoding performed in the final stage.
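Python's ElementTree offers a loose analogy. `ET.tostring(..., encoding="unicode")` returns a string directly, which corresponds to skipping the encoding stage. The hypothetical helper `serialize_via_octets` shows the fallback: take the octets that an encoding-only serializer produces and decode them with the same encoding.

```python
import xml.etree.ElementTree as ET


def serialize_via_octets(element: ET.Element, encoding: str = "utf-8") -> str:
    """Serialize to octets, then reverse the final encoding stage to get a str."""
    # With UTF-8, ElementTree writes no XML declaration by default.
    octets = ET.tostring(element, encoding=encoding)
    return octets.decode(encoding)
```

Both routes yield the same characters. Decoding is lossless here because the same encoding is used in both directions.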

If the second argument is omitted, or is supplied in the form of an output:serialization-parameters element, then the values of any serialization parameters that are not explicitly specified are ·implementation-defined·, and may depend on the context.

If the second argument is supplied as a map, then the ·option parameter conventions· apply. In this case:

  1. Each entry in the map defines one serialization parameter.

  2. The key of the entry is an xs:string value in the case of parameter names defined in these specifications, or an xs:QName (with a non-absent namespace) in the case of implementation-defined serialization parameters.

  3. The required type of each parameter, and its default value, are defined by the following table. The default value is used when the map contains no entry for the parameter in question, and also when an entry is present, with the empty sequence as its value. The table also indicates how the value of the map entry is to be interpreted in cases where further explanation is needed.

Parameter | Required type | Interpretation | Default value
allow-duplicate-names | xs:boolean? | true() means "yes", false() means "no" | no
byte-order-mark | xs:boolean? | true() means "yes", false() means "no" | no
cdata-section-elements | xs:QName* | | ()
doctype-public | xs:string? | Zero-length string and () both represent "absent" | absent
doctype-system | xs:string? | Zero-length string and () both represent "absent" | absent
encoding | xs:string? | | utf-8
escape-uri-attributes | xs:boolean? | true() means "yes", false() means "no" | yes
html-version | xs:decimal? | | 5
include-content-type | xs:boolean? | true() means "yes", false() means "no" | yes
indent | xs:boolean? | true() means "yes", false() means "no" | no
item-separator | xs:string? | | absent
json-node-output-method | union(xs:string, xs:QName)? | See Notes 1, 2 | xml
media-type | xs:string? | | (a media type suitable for the chosen method)
method | union(xs:string, xs:QName)? | See Notes 1, 2 | xml
normalization-form | xs:string? | | none
omit-xml-declaration | xs:boolean? | true() means "yes", false() means "no" | yes
standalone | xs:boolean? | true() means "yes", false() means "no", () means "omit" | omit
suppress-indentation | xs:QName* | | ()
undeclare-prefixes | xs:boolean? | true() means "yes", false() means "no" | no
use-character-maps | map(xs:string, xs:string)? | See Note 3 | map{}
version | xs:string? | | 1.0

Notes to the table:

  1. The notation union(A, B) is used to represent a union type whose member types are A and B.

  2. If an xs:QName is supplied for the method or json-node-output-method options, then it must have a non-absent namespace URI. This means that system-defined serialization methods such as xml and json are defined as strings, not as xs:QName values.

  3. For the use-character-maps option, the value is a map, whose keys are the characters to be mapped (as xs:string instances), and whose corresponding values are the strings to be substituted for these characters.
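The defaulting and type-checking rules for the map form can be sketched in Python. The table below the code is a hand-picked subset of the parameters above, `resolve_serialization_options` is a hypothetical helper, Python's `None` stands for both a missing entry and the empty sequence, and `TypeError` plays the role of [err:XPTY0004]. For simplicity the sketch accepts only strings for `method`, although a QName is also allowed, and it ignores keys it does not know.

```python
from typing import Any, Dict, Tuple

# Subset of the parameter table: name -> (required Python type, default).
# None represents "absent".
_OPTION_TABLE: Dict[str, Tuple[type, Any]] = {
    "encoding": (str, "utf-8"),
    "indent": (bool, False),
    "item-separator": (str, None),
    "method": (str, "xml"),
    "omit-xml-declaration": (bool, True),
}


def resolve_serialization_options(supplied: Dict[str, Any]) -> Dict[str, Any]:
    """Apply defaults and type checks to a map of serialization options."""
    resolved: Dict[str, Any] = {}
    for name, (required, default) in _OPTION_TABLE.items():
        value = supplied.get(name)
        if value is None:
            # No entry, or an entry whose value is the empty sequence: use the default.
            resolved[name] = default
        elif not isinstance(value, required):
            # Analogue of the type error XPTY0004 for a wrongly typed value.
            raise TypeError(f"{name}: expected {required.__name__}, got {type(value).__name__}")
        else:
            resolved[name] = value
    return resolved
```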

Error Conditions

A type error [err:XPTY0004] (XP) occurs if the $options argument is present and does not match either of the types element(output:serialization-parameters)? or map(*).

Note:

This is defined as a type error so that it can be enforced via the function signature by implementations that generalize the type system in a suitable way.

If the host language makes serialization an optional feature and the implementation does not support serialization, then a dynamic error [err:FODC0010] is raised.

The serialization process will raise an error if $input is an attribute or namespace node.

When the second argument is supplied as a map, and the supplied value is of the wrong type for the particular parameter (for example, if the value of indent is a string rather than a boolean), then, as defined by the ·option parameter conventions·, a type error [err:XPTY0004] (XP) is raised. If the value is of the correct type, but does not satisfy the rules for that parameter defined in [XSLT and XQuery Serialization 3.1], then a dynamic error [err:SEPM0016] (SER31) is raised. (For example, this occurs if the map supplied to use-character-maps includes a key that is a string whose length is not one.)
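The use-character-maps case is an example of a value that has the correct type but breaks the parameter's rules. The following Python sketch illustrates it. `apply_character_map` is hypothetical, `ValueError` stands in for [err:SEPM0016], and the map is applied to a plain string rather than to the text and attribute content of a serialized tree. Python's `len` counts code points, which matches how string length is measured in XPath.

```python
from typing import Dict


def apply_character_map(text: str, char_map: Dict[str, str]) -> str:
    """Check that every key is a single character, then substitute mapped characters."""
    for key in char_map:
        if len(key) != 1:
            # Right type (a string), wrong value: analogue of SEPM0016.
            raise ValueError(f"character-map key {key!r} must be exactly one character")
    return "".join(char_map.get(ch, ch) for ch in text)
```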

If any serialization error occurs, including the detection of an invalid value for a serialization parameter as described above, this results in the fn:serialize call failing with a dynamic error.

Notes

One use case for this function arises when there is a need to construct an XML document containing nested XML documents within a CDATA section (or on occasions within a comment). See fn:parse-xml for further details.

Another use case arises when there is a need to call an extension function that expects a lexical XML document as input.

There are also use cases where the application wants to post-process the output of a query or transformation, for example by adding an internal DTD subset, or by inserting proprietary markup delimiters such as the <% ... %> used by some templating languages.

The ability to specify the serialization parameters in an output:serialization-parameters element provides backwards compatibility with the 3.0 version of this specification; the ability to use a map takes advantage of new features in the 3.1 version. The default parameter values are implementation-defined when an output:serialization-parameters element is used (or when the argument is omitted), but are fixed by this specification in the case where a map (including an empty map) is supplied for the argument.

Examples

Given the variables:

let $params :=
  <output:serialization-parameters
      xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization">
    <output:omit-xml-declaration value="yes"/>
  </output:serialization-parameters>

let $data := <a b="3"/>

The following call might produce the output shown:

The expression fn:serialize($data, $params) returns '<a b="3"/>'.

The following call would also produce the output shown (though the second argument could equally well be supplied as an empty map (map{}), since both parameters are given their default values):

The expression fn:serialize($data, map{"method":"xml", "omit-xml-declaration":true()}) returns '<a b="3"/>'.