1 Introduction

The purpose of this document is to propose functions and operators to be included in XPath 4.0, XQuery 4.0, and XSLT 4.0. Note that this proposal has no official standing at the time of publication. The exact syntax used to call these functions and operators is specified in [XML Path Language (XPath) 4.0], [XQuery 4.1: An XML Query Language] and [XSL Transformations (XSLT) Version 4.0].

This document defines three classes of functions:

[XML Schema Part 2: Datatypes Second Edition] defines a number of primitive and derived datatypes, collectively known as built-in datatypes. This document defines functions and operations on these datatypes as well as the other types (for example, nodes and sequences of nodes) defined in Section 2.7 Schema Information DM31 of the [XQuery and XPath Data Model (XDM) 3.1]. These functions and operations are available for use in [XML Path Language (XPath) 4.0], [XQuery 4.1: An XML Query Language] and any other host language that chooses to reference them. In particular, they may be referenced in future versions of XSLT and related XML standards.

[Schema 1.1 Part 2] adds to the datatypes defined in [XML Schema Part 2: Datatypes Second Edition]. It introduces a new derived type xs:dateTimeStamp, and it incorporates as built-in types the two types xs:yearMonthDuration and xs:dayTimeDuration which were previously XDM additions to the type system. In addition, XSD 1.1 clarifies and updates many aspects of the definitions of the existing datatypes: for example, it extends the value space of xs:double to allow both positive and negative zero, and extends the lexical space to allow +INF; it modifies the value space of xs:Name to permit additional Unicode characters; it allows year zero and disallows leap seconds in xs:dateTime values; and it allows any character string to appear as the value of an xs:anyURI item. Implementations of this specification may support either XSD 1.0 or XSD 1.1 or both.

References to specific sections of some of the above documents are indicated by cross-document links in this document. Each such link consists of a pointer to a specific section followed by a superscript specifying the linked document. The superscripts have the following meanings: 'XQ' [XQuery 4.1: An XML Query Language], 'XT' [XSL Transformations (XSLT) Version 4.0], 'XP' [XML Path Language (XPath) 4.0], and 'DM' [XQuery and XPath Data Model (XDM) 3.1].

1.1 Conformance

This recommendation contains a set of function specifications. It defines conformance at the level of individual functions. An implementation of a function conforms to a function specification in this recommendation if all the following conditions are satisfied:

Other recommendations ("host languages") that reference this document may dictate:

Any behavior that is discretionary (implementation-defined or implementation-dependent) in this specification may be constrained by a host language.

Note:

Adding such constraints in a host language, however, is discouraged because it makes it difficult to re-use implementations of the function library across host languages.

This specification allows flexibility in the choice of versions of specifications on which it depends:

Note:

The XML Schema 1.1 recommendation introduces one new concrete datatype: xs:dateTimeStamp; it also incorporates the types xs:dayTimeDuration, xs:yearMonthDuration, and xs:anyAtomicType which were previously defined in earlier versions of [XQuery and XPath Data Model (XDM) 3.1]. Furthermore, XSD 1.1 includes the option of supporting revised definitions of types such as xs:NCName based on the rules in XML 1.1 rather than 1.0.

In this document, text labeled as an example or as a Note is provided for explanatory purposes and is not normative.

1.2 Namespaces and prefixes

The functions and operators defined in this document are contained in one of several namespaces (see [Namespaces in XML]) and referenced using an xs:QName.

This document uses conventional prefixes to refer to these namespaces. User-written applications can choose a different prefix to refer to the namespace, so long as it is bound to the correct URI. The host language may also define a default namespace for function calls, in which case function names in that namespace need not be prefixed at all. In many cases the default namespace will be http://www.w3.org/2005/xpath-functions, allowing a call on the fn:name function (for example) to be written as name() rather than fn:name(); in this document, however, all example function calls are explicitly prefixed.
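The relationship between prefixes, namespace URIs, and the default function namespace can be modelled outside XPath. The following Python sketch is illustrative only: the names PREFIXES, DEFAULT_FUNCTION_NAMESPACE and expand_function_name are hypothetical, and the prefix f is an arbitrary user-chosen binding invented for the example.

```python
from typing import Dict, Tuple

# Illustrative model only; not part of any XPath API.
FN_URI = "http://www.w3.org/2005/xpath-functions"

# Prefix bindings chosen by the application (hypothetical). Any prefix
# works as long as it is bound to the correct namespace URI.
PREFIXES: Dict[str, str] = {"fn": FN_URI, "f": FN_URI}

# Default namespace for unprefixed function names (assumed for this sketch).
DEFAULT_FUNCTION_NAMESPACE = FN_URI


def expand_function_name(lexical_name: str) -> Tuple[str, str]:
    """Return (namespace URI, local name) for a lexical function name."""
    if ":" in lexical_name:
        prefix, local = lexical_name.split(":", 1)
        return (PREFIXES[prefix], local)
    return (DEFAULT_FUNCTION_NAMESPACE, lexical_name)
```

Under these assumptions, name, fn:name and f:name all expand to the same pair of namespace URI and local name, which is why the choice of prefix does not matter.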

The URIs of the namespaces and the conventional prefixes associated with them are:

Note:

The above namespace URIs are not expected to change from one version of this document to another. The contents of these namespaces may be extended to allow additional functions (and errors, and serialization parameters) to be defined.

1.3 Function overloading

A function is uniquely defined by its name and arity (number of arguments); it is therefore not possible to have two different functions that have the same name and arity, but different types in their signature. That is, function overloading in this sense of the term is not permitted. Consequently, functions such as fn:string which accept arguments of many different types have a signature that defines a very general argument type, in this case item()? which accepts any single item or the empty sequence; supplying an inappropriate item (such as a function item) causes a dynamic error.

Some functions on numeric types include the type xs:numeric in their signature as an argument or result type. In this version of the specification, xs:numeric has been redefined as a built-in union type representing the union of xs:decimal, xs:float, and xs:double (and thus automatically accepting types derived from these, including xs:integer).

Operators such as "+" may be overloaded: they map to different underlying functions depending on the dynamic types of the supplied operands.

It is possible for two functions to have the same name provided they have different arity (number of arguments). For the functions defined in this specification, where two functions have the same name and different arity, they also have closely related behavior, so they are defined in the same section of this document.
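As a rough illustration of identifying functions by name and arity, the following Python sketch keeps a registry keyed on that pair. The registry, the register and lookup helpers, and the stand-in implementations are all hypothetical; they are not how any particular processor is implemented.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry keyed by (name, arity); illustrative only.
_registry: Dict[Tuple[str, int], Callable] = {}


def register(name: str, arity: int, implementation: Callable) -> None:
    """Register a function; a second entry with the same name and arity is rejected."""
    key = (name, arity)
    if key in _registry:
        raise ValueError(f"{name}#{arity} is already defined")
    _registry[key] = implementation


def lookup(name: str, arity: int) -> Callable:
    """Find the function identified by its name and arity."""
    return _registry[(name, arity)]


# Same name, different arity: two distinct functions, both permitted.
# The bodies are simplified stand-ins, with None modelling the empty sequence.
register("fn:string", 0, lambda: "string value of the context item")
register("fn:string", 1, lambda item: "" if item is None else str(item))
```

Attempting to register a second fn:string with arity 1 raises an error in this model, whatever the argument types of the new implementation might be.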

1.4 Function signatures and descriptions

Each function (or group of functions having the same name) is defined in this specification using a standard proforma.

The function name is a QName as defined in [XML Schema Part 2: Datatypes Second Edition] and must adhere to its syntactic conventions. Following the precedent set by [XML Path Language (XPath) Version 1.0], function names are generally composed of English words separated by hyphens ("-"). Abbreviations are used only where there is a strong precedent in other programming languages (as with math:sin and math:cos for sine and cosine). If a function name contains a [XML Schema Part 2: Datatypes Second Edition] datatype name, it may have intercapitalized spelling and is used in the function name as such. An example is fn:timezone-from-dateTime.

The first section in the proforma is a short summary of what the function does. This is intended to be informative rather than normative.

Each function is then defined by specifying its signature, which defines the types of the parameters and of the result value.

Each function's signature is presented in a form like this:

fn:function-name(
$parameter-name as parameter-type,
$... as 
) as return-type

In this notation, function-name, in bold-face, is the name of the function whose signature is being specified. If the function takes no parameters, then the name is followed by an empty parameter list: "()"; otherwise, the name is followed by a parenthesized list of parameter declarations, in which each declaration specifies the static type of the parameter, in italics, and a descriptive, but non-normative, name. If there are two or more parameter declarations, they are separated by a comma. The return-type, also in italics, specifies the static type of the value returned by the function. The dynamic type of the value returned by the function is the same as its static type or derived from the static type. All parameter types and return types are specified using the SequenceType notation defined in Section 2.5.4 SequenceType Syntax XP31.

One function, fn:concat, has a variable number of arguments (two or more). More strictly, there is an infinite set of functions having the name fn:concat, with arity ranging from 2 to infinity. For this special case, a single function signature is given, with an ellipsis indicating an indefinite number of arguments.

The next section in the proforma defines the semantics of the function as a set of rules. The order in which the rules appear is significant; they are to be applied in the order in which they are written. Error conditions, however, are generally listed in a separate section that follows the main rules, and take precedence over non-error rules except where otherwise stated. The principles outlined in Section 2.3.4 Errors and Optimization XP31 apply by default: to paraphrase, if the result of the function can be determined without evaluating all its arguments, then it is not necessary to evaluate the remaining arguments merely in order to determine whether any error conditions apply.

Where the proforma includes sections headed Notes or Examples, these are non-normative.

Rules for passing parameters to operators are described in the relevant sections of [XQuery 4.1: An XML Query Language] and [XML Path Language (XPath) 4.0]. For example, the rules for passing parameters to arithmetic operators are described in Section 3.5 Arithmetic Expressions XP31. Specifically, rules for parameters of type xs:untypedAtomic and the empty sequence are specified in this section.

As is customary, the parameter type name indicates that the function or operator accepts arguments of that type, or types derived from it, in that position. This is called subtype substitution (See Section 2.5.5 SequenceType Matching XP31). In addition, numeric type instances and instances of type xs:anyURI can be promoted to produce an argument of the required type. (See Section B.1 Type Promotion XP31).

  1. Subtype Substitution: A derived type may substitute for its base type. In particular, xs:integer may be used where xs:decimal is expected.

  2. Numeric Type Promotion: xs:decimal may be promoted to xs:float or xs:double. Promotion to xs:double should be done directly, not via xs:float, to avoid loss of precision.

  3. anyURI Type Promotion: A value of type xs:anyURI can be promoted to the type xs:string.
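The advice to promote xs:decimal to xs:double directly can be illustrated numerically. The sketch below is a Python approximation, not XPath: it assumes that Decimal stands in for xs:decimal, that a Python float stands in for xs:double, and that a round trip through the struct format "f" stands in for xs:float. The function names are hypothetical.

```python
import struct
from decimal import Decimal


def promote_to_double(value: Decimal) -> float:
    """Promote directly: a Python float is an IEEE 754 double."""
    return float(value)


def promote_via_float(value: Decimal) -> float:
    """Round to single precision first, then widen the result to double."""
    single = struct.unpack("<f", struct.pack("<f", float(value)))[0]
    return single


value = Decimal("0.1")
direct = promote_to_double(value)
indirect = promote_via_float(value)

# The detour through single precision discards bits that the double
# could have represented, so the two results differ.
precision_lost = direct != indirect
```

In this model the direct result is the double closest to 0.1, while the indirect result carries the rounding error of the single-precision step.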

Some functions accept a single value or the empty sequence as an argument and some may return a single value or the empty sequence. This is indicated in the function signature by following the parameter or return type name with a question mark: "?", indicating that either a single value or the empty sequence must appear. See below.

fn:function-name(
$parameter-name as parameter-type?
) as return-type?

Note that this function signature is different from a signature in which the parameter is omitted. See, for example, the two signatures for fn:string. In the first signature, the parameter is omitted and the argument defaults to the context item, written as ".". In the second signature, the argument must be present but may be the empty sequence, written as "()".

Some functions accept a sequence of zero or more values as an argument. This is indicated by following the name of the type of the items in the sequence with *. The sequence may contain zero or more items of the named type. For example, the function below accepts a sequence of xs:double values and returns an xs:double or the empty sequence.

fn:median(
$arg as xs:double*
) as xs:double?
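For readers more familiar with general-purpose languages, the occurrence indicators map loosely onto familiar types. In the Python sketch below, a Sequence models xs:double* and an Optional result models xs:double?, with None standing in for the empty sequence. The behaviour shown (returning the statistical median) is an assumption made for illustration; this section gives only the signature.

```python
from statistics import median as _median
from typing import Optional, Sequence


def median(arg: Sequence[float]) -> Optional[float]:
    """Model of a signature ($arg as xs:double*) as xs:double?."""
    if len(arg) == 0:
        return None  # None stands in for the empty sequence
    return float(_median(arg))
```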

In XPath 4.0, the arguments in a function call can be supplied by keyword as an alternative to supplying them positionally. For example the call resolve-uri(@href, static-base-uri()) can now be written resolve-uri(base: static-base-uri(), relative: @href). The order in which arguments are supplied can therefore differ from the order in which they are declared. The specification, however, continues to use phrases such as "the second argument" as a convenient shorthand for "the value of the argument that is bound to the second parameter declaration".
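Keyword arguments in Python behave analogously, which makes the point easy to try out. The sketch below is a stand-in rather than an implementation of resolve-uri: it assumes the parameter names relative and base used in the example above, and delegates to urllib.parse.urljoin, whose resolution rules are not guaranteed to match those of the XPath function in every case.

```python
from urllib.parse import urljoin


def resolve_uri(relative: str, base: str) -> str:
    """Rough stand-in for resolve-uri; parameters are declared as relative, then base."""
    return urljoin(base, relative)


# Positional call: arguments are bound in declaration order.
positional = resolve_uri("chapter1.xml", "http://example.com/book/")

# Keyword call: the order of supply differs from the order of declaration,
# but "the second argument" still means the value bound to base.
by_keyword = resolve_uri(base="http://example.com/book/", relative="chapter1.xml")
```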

1.5 Options

As a matter of convention, a number of functions defined in this document take a parameter whose value is a map, defining options controlling the detail of how the function is evaluated. Maps are a new datatype introduced in XPath 3.1.

For example, the function fn:xml-to-json has an options parameter allowing specification of whether the output is to be indented. A call might be written:

fn:xml-to-json($input, map{'indent':true()})

[Definition] Functions that take an options parameter adopt common conventions on how the options are used. These are referred to as the option parameter conventions. These rules apply only to functions that explicitly refer to them.

Where a function adopts the ·option parameter conventions·, the following rules apply:

  1. The value of the relevant argument must be a map. The entries in the map are referred to as options: the key of the entry is called the option name, and the associated value is the option value. Option names defined in this specification are always strings (single xs:string values). Option values may be of any type.

  2. The type of the options parameter in the function signature is always given as map(*).

  3. Although option names are described above as strings, the actual key may be any value that compares equal to the required string (using the eq operator with Unicode codepoint collation; or equivalently, the op:same-key relation). For example, instances of xs:untypedAtomic or xs:anyURI are equally acceptable.

    Note:

    This means that the implementation of the function can check for the presence and value of particular options using the functions map:contains and/or map:get.

  4. It is not an error if the options map contains options with names other than those described in this specification. Implementations may attach an ·implementation-defined· meaning to such entries, and may define errors that arise if such entries are present with invalid values. Implementations must ignore such entries unless they have a specific ·implementation-defined· meaning. Implementations that define additional options in this way should use values of type xs:QName as the option names, using an appropriate namespace.

  5. All entries in the options map are optional, and supplying an empty map has the same effect as omitting the relevant argument in the function call, assuming this is permitted.

  6. For each named option, the function specification defines a required type for the option value. The value that is actually supplied in the map is converted to this required type using the function conversion rules XP31. This will result in an error (typically [err:XPTY0004] XP or [err:FORG0001] FO) if conversion of the supplied value to the required type is not possible. A type error also occurs if this conversion delivers a coerced function whose invocation fails with a type error. A dynamic error occurs if the supplied value after conversion is not one of the permitted values for the option in question: the error codes for this error are defined in the specification of each function.

    Note:

    It is the responsibility of each function implementation to invoke this conversion; it does not happen automatically as a consequence of the function calling rules.

  7. In cases where an option is list-valued, by convention the value may be supplied either as a sequence or as an array. Accepting a sequence is convenient if the value is generated programmatically using an XPath expression; while accepting an array allows the options to be held in an external file in JSON format, to be read using a call on the fn:json-doc function.

  8. In cases where the value of an option is itself a map, the specification of the particular function must indicate whether or not these rules apply recursively to the contents of that map.
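The conventions above can be sketched as a small option-resolving routine. The Python below is a simplified model under stated assumptions: a dict stands in for the options map, the converters are crude substitutes for the function conversion rules, and OptionError, SPEC, resolve_options and the line-ending option are hypothetical names invented for the example.

```python
from typing import Any, Callable, Dict, Mapping, Optional, Set, Tuple


class OptionError(ValueError):
    """Stands in for the type or dynamic error raised for a bad option value."""


def to_boolean(value: Any) -> bool:
    # Simplified stand-in for converting a value to the required type.
    if isinstance(value, bool):
        return value
    raise OptionError(f"cannot convert {value!r} to a boolean")


def to_string(value: Any) -> str:
    if isinstance(value, str):
        return value
    raise OptionError(f"cannot convert {value!r} to a string")


# Hypothetical declarations: name -> (converter, default, permitted values).
OptionSpec = Dict[str, Tuple[Callable[[Any], Any], Any, Optional[Set[Any]]]]
SPEC: OptionSpec = {
    "indent": (to_boolean, False, None),
    "line-ending": (to_string, "lf", {"lf", "crlf"}),
}


def resolve_options(supplied: Optional[Mapping[Any, Any]] = None,
                    spec: OptionSpec = SPEC) -> Dict[str, Any]:
    """Apply defaults, convert supplied values, and ignore unrecognized entries."""
    supplied = {} if supplied is None else supplied
    resolved: Dict[str, Any] = {}
    for name, (convert, default, permitted) in spec.items():
        if name not in supplied:
            resolved[name] = default  # every option is optional
            continue
        value = convert(supplied[name])  # conversion is the function's own job
        if permitted is not None and value not in permitted:
            raise OptionError(f"{value!r} is not permitted for option {name!r}")
        resolved[name] = value
    return resolved
```

Because only declared names are ever looked up, entries with other names are ignored, and an empty map yields the same result as omitting the argument.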

1.6 Type System

The diagrams in this section show how nodes, functions, primitive simple types, and user defined types fit together into a type system. This type system comprises two distinct subsystems that both include the primitive atomic types. In the diagrams, connecting lines represent relationships between derived types and the types from which they are derived; the arrowheads point toward the type from which they are derived. The dashed line represents relationships not present in this diagram, but that appear in one of the other diagrams. Dotted lines represent additional relationships that follow an evident pattern. The information that appears in each diagram is recapitulated in tabular form.

The types xs:IDREFS, xs:NMTOKENS, xs:ENTITIES, and xs:numeric, together with user-defined list types and user-defined union types, are special in that they are lists or unions rather than types derived by extension or restriction.

1.6.1 Item Types

The first diagram and its corresponding table illustrate the relationship of various item types.

Item types are used to characterize the various types of item that can appear in a sequence (nodes, atomic values, and functions), and they are therefore used in declaring the types of variables or the argument types and result types of functions.

Item types in the data model form a directed graph, rather than a hierarchy or lattice: in the relationship defined by the derived-from(A, B) function, some types are derived from more than one other type. Examples include functions (function(xs:string) as xs:int is substitutable for function(xs:NCName) as xs:int and also for function(xs:string) as xs:decimal), and union types (A is substitutable for union(A, B) and also for union(A, C)). In XDM, item types include node types, function types, and built-in atomic types. The diagram, which shows only hierarchic relationships, is therefore a simplification of the full model.

[Diagram: item type hierarchy. Legend: built-in atomic types, node types, function item types, user-defined types, abstract types.]

The image shows a portion of the type hierarchy, rooted at the abstract type item. The type xs:anyAtomicType and the abstract types for nodes and functions are derived from item. Arrays and maps are further derived from functions. Attribute, document, element, text, comment, processing-instruction, and namespace nodes are derived from node. User-defined attribute, document, and element types are also derived from attribute, document, and element, respectively.

1.6.2 Schema Type Hierarchy

The next diagram and table illustrate the schema type subsystem, in which all types are derived from the distinguished type xs:anyType.

Schema types include built-in types defined in the XML Schema specification, and user-defined types defined using mechanisms described in the XML Schema specification. Schema types define the permitted contents of nodes. The main categories are complex types, which define the permitted content of elements, and simple types, which can be used to constrain the values of both elements and attributes.

[Diagram: schema type hierarchy. Legend: built-in atomic types, built-in complex types, built-in list types, user-defined types, conceptual types.]

The image shows a portion of the type hierarchy, rooted at xs:anyType which represents, conceptually, all of the XML Schema types. The xs:anySimpleType, representing conceptually all of the simple types, and all of the conceptual complex types, are derived from xs:anyType. The xs:anyAtomicType representing conceptually all of the atomic types, and all of the conceptual list and union types are derived from xs:anySimpleType. The types xs:IDREFS, xs:NMTOKENS, xs:ENTITIES, and user-defined list types are derived from list types. The types xs:numeric and user-defined union types are derived from the union types. The types xs:untyped and user-defined complex types are derived from complex types.

1.6.3 Atomic Type Hierarchy

The final diagram and table show all of the atomic types, including the primitive simple types and the built-in types derived from the primitive simple types. This includes all the built-in datatypes defined in [XML Schema Part 2: Datatypes Second Edition].

Atomic types are both item types and schema types, so the root type xs:anyAtomicType may be found in both the previous diagrams.

[Diagram: atomic type hierarchy. Legend: built-in atomic types.]

The image shows a portion of the type hierarchy, rooted at xs:anyAtomicType. The types xs:untypedAtomic, xs:string, xs:duration, xs:date, xs:time, xs:dateTime, xs:double, xs:float, xs:decimal, xs:base64Binary, xs:hexBinary, xs:boolean, xs:anyURI, xs:QName, xs:NOTATION, xs:gYear, xs:gMonth, xs:gDay, xs:gYearMonth, and xs:gMonthDay are derived from xs:anyAtomicType.

Starting at xs:string, xs:normalizedString is derived and xs:token is derived from that. The types xs:language, xs:NMTOKEN, and xs:Name are derived from xs:token. The type xs:NCName is further derived from xs:Name, and the types xs:ID, xs:IDREF, and xs:ENTITY are derived from xs:NCName.

The types xs:yearMonthDuration and xs:dayTimeDuration are derived from xs:duration, and the type xs:dateTimeStamp is derived from xs:dateTime.

The type xs:integer is derived from xs:decimal. The types xs:nonPositiveInteger (and from that xs:negativeInteger), xs:long, and xs:nonNegativeInteger are derived from xs:integer. The type xs:long is the head of a chain of derivations: xs:int is derived from xs:long, xs:short from xs:int, and xs:byte from xs:short. Finally, xs:positiveInteger and the chain of derivations from xs:unsignedLong to xs:unsignedInt to xs:unsignedShort to xs:unsignedByte are derived from xs:nonNegativeInteger.
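The derivation relationships described in this section lend themselves to a simple lookup table. The Python sketch below transcribes part of the hierarchy into a dictionary and walks it to decide whether one atomic type is derived from another, which is the test underlying subtype substitution. The names BASE_TYPE and is_derived_from are hypothetical, and the table is deliberately incomplete.

```python
from typing import Dict, Optional

# Base type of each atomic type, transcribed from the description above
# (only part of the hierarchy is listed).
BASE_TYPE: Dict[str, str] = {
    "xs:string": "xs:anyAtomicType",
    "xs:normalizedString": "xs:string",
    "xs:token": "xs:normalizedString",
    "xs:Name": "xs:token",
    "xs:NCName": "xs:Name",
    "xs:ID": "xs:NCName",
    "xs:decimal": "xs:anyAtomicType",
    "xs:integer": "xs:decimal",
    "xs:long": "xs:integer",
    "xs:int": "xs:long",
    "xs:short": "xs:int",
    "xs:byte": "xs:short",
    "xs:nonNegativeInteger": "xs:integer",
    "xs:positiveInteger": "xs:nonNegativeInteger",
    "xs:duration": "xs:anyAtomicType",
    "xs:dayTimeDuration": "xs:duration",
}


def is_derived_from(actual: str, expected: str) -> bool:
    """True if `actual` is `expected` or is derived from it, directly or indirectly."""
    current: Optional[str] = actual
    while current is not None:
        if current == expected:
            return True
        current = BASE_TYPE.get(current)  # None once the root is passed
    return False
```

For instance, in this model xs:byte is accepted where xs:decimal is expected, but xs:decimal is not accepted where xs:integer is expected.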

1.7 Terminology

The terminology used to describe the functions and operators on types defined in [XML Schema Part 2: Datatypes Second Edition] is defined in the body of this specification. The terms defined in this section are used in building those definitions.

Note:

Following in the tradition of [XML Schema Part 2: Datatypes Second Edition], the terms type and datatype are used interchangeably.

1.7.1 Strings, characters, and codepoints

This document uses the terms string, character, and codepoint with meanings that are normatively defined in [XQuery and XPath Data Model (XDM) 3.1], and which are paraphrased here for ease of reference:

[Definition] A character is an instance of the CharXML production of [Extensible Markup Language (XML) 1.0 (Fifth Edition)].

Note:

This definition excludes Unicode characters in the surrogate blocks as well as xFFFE and xFFFF, while including characters with codepoints greater than xFFFF which some programming languages treat as two characters. The valid characters are defined by their codepoints, and include some whose codepoints have not been assigned by the Unicode consortium to any character.

[Definition] A string is a sequence of zero or more ·characters·, or equivalently, a value in the value space of the xs:string datatype.

[Definition] A codepoint is an integer assigned to a ·character· by the Unicode consortium, or reserved for future assignment to a character.

Note:

The set of codepoints is thus wider than the set of characters.

This specification spells "codepoint" as one word; the Unicode specification spells it as "code point". Equivalent terms found in other specifications are "character number" or "code position". See [Character Model for the World Wide Web 1.0: Fundamentals].

Because these terms appear so frequently, they are hyperlinked to the definition only when there is a particular desire to draw the reader's attention to the definition; the absence of a hyperlink does not mean that the term is being used in some other sense.

It is ·implementation-defined· which version of [The Unicode Standard] is supported, but it is recommended that the most recent version of Unicode be used.

Unless explicitly stated, the xs:string values returned by the functions in this document are not normalized in the sense of [Character Model for the World Wide Web 1.0: Fundamentals].

Notes:

In functions that involve character counting such as fn:substring, fn:string-length and fn:translate, what is counted is the number of XML ·characters· in the string (or equivalently, the number of Unicode codepoints). Some implementations may represent a codepoint above xFFFF using two 16-bit values known as a surrogate pair. A surrogate pair counts as one character, not two.
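The difference between counting characters and counting 16-bit units is easy to demonstrate. In the Python sketch below, which is illustrative only, the length of a Python string is used as the codepoint count and an explicit UTF-16 encoding is used to count 16-bit units; the helper names are hypothetical.

```python
def character_count(text: str) -> int:
    """Number of characters (codepoints), the quantity counted by fn:string-length."""
    return len(text)  # Python strings are sequences of codepoints


def utf16_code_unit_count(text: str) -> int:
    """Number of 16-bit units a UTF-16 based implementation would store."""
    return len(text.encode("utf-16-le")) // 2


clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, a codepoint above xFFFF

# One character for the purposes of this specification,
# even though UTF-16 needs a surrogate pair to represent it.
chars = character_count(clef)
units = utf16_code_unit_count(clef)
```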

1.7.2 Namespaces and URIs

This document uses the phrase "namespace URI" to identify the concept identified in [Namespaces in XML] as "namespace name", and the phrase "local name" to identify the concept identified in [Namespaces in XML] as "local part".

It also uses the term "expanded-QName" defined below.

[Definition] An expanded-QName is a value in the value space of the xs:QName datatype as defined in the XDM data model (see [XQuery and XPath Data Model (XDM) 3.1]): that is, a triple containing namespace prefix (optional), namespace URI (optional), and local name. Two expanded QNames are equal if the namespace URIs are the same (or both absent) and the local names are the same. The prefix plays no part in the comparison, but is used only if the expanded QName needs to be converted back to a string.
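The equality rule for expanded QNames can be modelled with a small value class. The Python sketch below is an illustration, not a normative representation: ExpandedQName and its field names are hypothetical, and None stands in for an absent prefix or namespace URI.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ExpandedQName:
    local_name: str
    namespace_uri: Optional[str] = None
    # The prefix is retained only for conversion back to a string;
    # compare=False excludes it from equality and hashing.
    prefix: Optional[str] = field(default=None, compare=False)

    def __str__(self) -> str:
        if self.prefix:
            return f"{self.prefix}:{self.local_name}"
        return self.local_name
```

Two instances that differ only in prefix compare equal in this model, while instances that differ in namespace URI or local name do not.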

The term URI is used as follows:

[Definition] Within this specification, the term URI refers to Uniform Resource Identifiers as defined in [RFC 3986] and extended in [RFC 3987] with a new name IRI. The term URI Reference, unless otherwise stated, refers to a string in the lexical space of the xs:anyURI datatype as defined in [XML Schema Part 2: Datatypes Second Edition].

Note:

In practice, this means that where this specification requires a "URI Reference", an IRI as defined in [RFC 3987] will be accepted, provided that other relevant specifications also permit an IRI. The term URI has been retained in preference to IRI to avoid introducing new names for concepts such as "Base URI" that are defined or referenced across the whole family of XML specifications. The definition of xs:anyURI is also wider than the definition in [RFC 3987]; for example, it does not require non-ASCII characters to be escaped.

1.7.3 Conformance terminology

In this specification:

  • The auxiliary verb must, when rendered in small capitals, indicates a precondition for conformance.

    • When the sentence relates to an implementation of a function (for example "All implementations must recognize URIs of the form ...") then an implementation is not conformant unless it behaves as stated.

    • When the sentence relates to the result of a function (for example "The result must have the same type as $arg") then the implementation is not conformant unless it delivers a result as stated.

    • When the sentence relates to the arguments to a function (for example "The value of $arg must be a valid regular expression") then the implementation is not conformant unless it enforces the condition by raising a dynamic error whenever the condition is not satisfied.

  • The auxiliary verb may, when rendered in small capitals, indicates optional or discretionary behavior. The statement "An implementation may do X" implies that it is implementation-dependent whether or not it does X.

  • The auxiliary verb should, when rendered in small capitals, indicates desirable or recommended behavior. The statement "An implementation should do X" implies that it is desirable to do X, but implementations may choose to do otherwise if this is judged appropriate.

[Definition] Where behavior is described as implementation-defined, variations between processors are permitted, but a conformant implementation must document the choices it has made.

[Definition] Where behavior is described as implementation-dependent, variations between processors are permitted, and conformant implementations are not required to document the choices they have made.

Note:

Where this specification states that something is implementation-defined or implementation-dependent, it is open to host languages to place further constraints on the behavior.

1.7.4 Properties of functions

This section is concerned with the question of whether two calls on a function, with the same arguments, may produce different results.

[Definition] An execution scope is a sequence of calls to the function library during which certain aspects of the state are required to remain invariant. For example, two calls to fn:current-dateTime within the same execution scope will return the same result. The execution scope is defined by the host language that invokes the function library. In XSLT, for example, any two function calls executed during the same transformation are in the same execution scope (except that static expressions, such as those used in use-when attributes, are in a separate execution scope).
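One way to picture an execution scope is as an object that captures volatile state once and then returns the captured value for the rest of its lifetime. The Python sketch below is a hypothetical model of that idea: the ExecutionScope class and its current_datetime method are invented for illustration and do not correspond to any API defined by a host language.

```python
from datetime import datetime, timezone
from typing import Optional


class ExecutionScope:
    """Hypothetical holder for state that must not change within one scope."""

    def __init__(self) -> None:
        self._current_datetime: Optional[datetime] = None

    def current_datetime(self) -> datetime:
        # Read the clock once; every later call in this scope sees the same value.
        if self._current_datetime is None:
            self._current_datetime = datetime.now(timezone.utc)
        return self._current_datetime
```

A host language would create one such object per transformation or query, so that two calls made within the same scope return the same result however much time passes between them.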

The following definition explains more precisely what it means for two function calls to return the same result:

[Definition] Two values are defined to be identical if they contain the same number of items and the items are pairwise identical. Two items are identical if and only if one of the following conditions applies:

  1. Both items are atomic values, of precisely the same type, and the values are equal as defined using the eq operator, using the Unicode codepoint collation when comparing strings.

  2. Both items are nodes, and represent the same node.

  3. Both items are maps, both maps have the same number of entries, and for every entry E1 in the first map there is an entry E2 in the second map such that the keys of E1 and E2 are ·the same key·, and the corresponding values V1 and V2 are ·identical·.

  4. Both items are arrays, both arrays have the same number of members, and the members are pairwise ·identical·.

  5. Both items are function items, neither item is a map or array, and all the following conditions apply:

    1. Either both functions have the same name, or both names are absent DM31.

    2. Both functions have the same arity.

    3. Both functions have the same function signature. Two function signatures are defined to be the same if the declared result types are identical and the declared argument types are pairwise identical. Two types S and T are defined to be identical if and only if subtype(S, T) and subtype(T, S) both hold, where the subtype relation is defined in Section 2.5.6.1 The judgement subtype(A, B) XP31.

      Note:

      Under this definition, a union type with memberTypes="xs:double xs:decimal" is identical to a union type with memberTypes="xs:decimal xs:double". However, two functions whose signatures differ in this way will probably be deemed non-identical under rule 5 below, because they are likely to have different effect when invoked with an argument of type xs:untypedAtomic.

    4. Both functions have the same nonlocal variable bindings (sometimes called the function's closure).

    5. The processor is able to determine that the implementations of the two functions are equivalent, in the sense that for all possible combinations of arguments, the two functions have the same effect.

    Note:

    There is no function or operator defined in the specification that tests whether two function items are identical. Where the specification requires two function items to be identical, for example in the results of repeated calls of a function whose result is a function, then the processor must ensure that it returns functions that are indistinguishable in their observable effect. Where the specification defines behavior conditional on two function items being identical, the determination of identity is to some degree implementation-dependent. There are cases where function items are definitely not identical (for example if they have different names or arities), but positive determination of identity is possible only using implementation-dependent techniques, for example when both items contain references to the same piece of code representing the function's implementation.
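The rules for identity can be sketched as a recursive comparison. The following is a simplified model, not a definition: the classes `Atomic`, `Node`, `XMap`, and `XArray` are hypothetical stand-ins for XDM items, sequences are modeled as Python lists, Python `==` stands in for the eq operator, and dictionary key equality is assumed to approximate "the same key". Function items are handled conservatively by treating only the very same object as identical, which is one possible implementation-dependent technique.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atomic:
    """Atomic value: a type annotation plus a Python value (hypothetical model)."""
    type_name: str
    value: object


class Node:
    """Stand-in for a node; Python object identity models node identity."""


class XMap:
    def __init__(self, entries):
        # Keys are Atomic items; each entry value is a sequence (list of items).
        self.entries = dict(entries)


class XArray:
    def __init__(self, members):
        # Each member is a sequence (list of items).
        self.members = list(members)


def identical_values(v1, v2):
    """Same number of items, and the items are pairwise identical."""
    return len(v1) == len(v2) and all(
        identical_items(a, b) for a, b in zip(v1, v2))


def identical_items(a, b):
    if isinstance(a, Atomic) and isinstance(b, Atomic):
        # Rule 1: precisely the same type, and equal values.
        return a.type_name == b.type_name and a.value == b.value
    if isinstance(a, Node) and isinstance(b, Node):
        # Rule 2: the same node.
        return a is b
    if isinstance(a, XMap) and isinstance(b, XMap):
        # Rule 3: same size, matching keys, identical associated values.
        # Assumption: dict key equality approximates "the same key".
        return len(a.entries) == len(b.entries) and all(
            k in b.entries and identical_values(v, b.entries[k])
            for k, v in a.entries.items())
    if isinstance(a, XArray) and isinstance(b, XArray):
        # Rule 4: same size, members pairwise identical.
        return len(a.members) == len(b.members) and all(
            identical_values(x, y) for x, y in zip(a.members, b.members))
    if callable(a) and callable(b):
        # Rule 5 (conservative assumption): only the very same object counts.
        return a is b
    return False
```

In this model an xs:integer 1 and an xs:decimal 1 are not identical even though they compare equal, because rule 1 requires precisely the same type.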

Some functions produce results that depend not only on their explicit arguments, but also on the static and dynamic context.

[Definition] A function may have the property of being context-dependent: the result of such a function depends on the values of properties in the static and dynamic evaluation context as well as on the actual supplied arguments (if any).

[Definition] A function that is not ·context-dependent· is called context-independent.

A function that is context-dependent can be used as a named function reference, can be partially applied, and can be found using fn:function-lookup. The principle in such cases is that the static context used for the function evaluation is taken from the static context of the named function reference, partial function application, or the call to fn:function-lookup; and the dynamic context for the function evaluation is taken from the dynamic context of the evaluation of the named function reference, partial function application, or the call to fn:function-lookup. In effect, the static and dynamic parts of the context thus act as part of the closure of the function item.
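The way the context becomes part of the closure can be sketched as follows. This is an illustrative model only: `Context`, `string_fn`, and `named_function_reference` are hypothetical names, and `string_fn` merely stands in for a focus-dependent function such as fn:string#0.

```python
from dataclasses import dataclass
from functools import partial


@dataclass(frozen=True)
class Context:
    """Hypothetical slice of the static and dynamic context."""
    default_collation: str
    context_item: object


def string_fn(ctx):
    # Stand-in for a focus-dependent function such as fn:string#0:
    # the result depends on the context item, not on explicit arguments.
    return str(ctx.context_item)


def named_function_reference(fn, ctx):
    # The context in force where the reference is evaluated is bound into
    # the resulting function item, so it acts as part of the closure.
    return partial(fn, ctx)
```

A function item created with `named_function_reference(string_fn, Context("codepoint", 42))` returns "42" whenever it is called, regardless of what the focus is at the point of the call.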

Context-dependent functions fall into a number of categories:

  1. The functions fn:current-date, fn:current-dateTime, fn:current-time, fn:default-language, fn:implicit-timezone, fn:adjust-date-to-timezone, fn:adjust-dateTime-to-timezone, and fn:adjust-time-to-timezone depend on properties of the dynamic context that are fixed within the ·execution scope·. The same applies to a number of functions in the op: namespace that manipulate dates and times and that make use of the implicit timezone. These functions will return the same result if called repeatedly during a single ·execution scope·.

  2. A number of functions including fn:base-uri#0, fn:data#0, fn:document-uri#0, fn:element-with-id#1, fn:id#1, fn:idref#1, fn:lang#1, fn:last#0, fn:local-name#0, fn:name#0, fn:namespace-uri#0, fn:normalize-space#0, fn:number#0, fn:path#0, fn:position#0, fn:root#0, fn:string#0, and fn:string-length#0 depend on the focus XP31. These functions will in general return different results on different calls if the focus is different.

    [Definition] A function is focus-dependent if its result depends on the focus XP31 (that is, the context item, position, or size).

    [Definition] A function that is not ·focus-dependent· is called focus-independent.

  3. The function fn:default-collation and many string-handling operators and functions depend on the default collation and the in-scope collations, which are both properties of the static context. If a particular call of one of these functions is evaluated twice with the same arguments then it will return the same result each time (because the static context, by definition, does not change at run time). However, two distinct calls (that is, two calls on the function appearing in different places in the source code) may produce different results even if the explicit arguments are the same.

  4. Functions such as fn:static-base-uri, fn:doc, and fn:collection depend on other aspects of the static context. As with functions that depend on collations, a single call will produce the same result each time it is evaluated with the same explicit arguments, but two calls appearing in different places in the source code may produce different results.

The fn:function-lookup function is a special case because it is potentially dependent on everything in the static and dynamic context. This is because the static and dynamic context of the call to fn:function-lookup are used as the static and dynamic context of the function that fn:function-lookup returns.
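The dependence on the static context described in categories 3 and 4 can be sketched by fixing the context once per call site. The names `COLLATIONS` and `make_call_site`, and the two collation keys, are assumptions made for illustration; real collations are identified by URIs and are considerably richer than a normalization function.

```python
# Hypothetical collations, each modeled as a normalization function.
COLLATIONS = {
    "codepoint": lambda s: s,
    "case-insensitive": str.casefold,
}


def make_call_site(default_collation):
    # The static context is fixed when the call site is compiled;
    # it cannot change between evaluations of that call site.
    normalize = COLLATIONS[default_collation]

    def call(a, b):
        return normalize(a) == normalize(b)

    return call
```

Evaluating one call site repeatedly with the same arguments always gives the same answer, while `make_call_site("codepoint")` and `make_call_site("case-insensitive")` may disagree about whether "Hello" and "hello" are equal.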

[Definition] For a ·context-dependent· function, the parts of the context on which it depends are referred to as implicit arguments.

[Definition] A function that is guaranteed to produce ·identical· results from repeated calls within a single ·execution scope· if the explicit and implicit arguments are identical is referred to as deterministic.

[Definition] A function that is not ·deterministic· is referred to as nondeterministic.
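One way a processor might guarantee determinism for a function whose underlying resource can change is to remember results for the duration of an execution scope. The sketch below is an assumption about implementation strategy, not a requirement of this specification; `ScopeCache` and its `call` method are hypothetical names, and implicit arguments are modeled as a tuple of name and value pairs.

```python
class ScopeCache:
    """Hypothetical per-execution-scope cache of function results."""

    def __init__(self):
        self._results = {}

    def call(self, fn, explicit, implicit):
        # The key combines the function with its explicit arguments (a tuple)
        # and its implicit arguments (a tuple of (name, value) pairs).
        key = (fn, explicit, implicit)
        if key not in self._results:
            self._results[key] = fn(*explicit, **dict(implicit))
        # Repeated calls with the same key return the remembered result.
        return self._results[key]
```

With this design, a resource that changes between two calls within one scope does not affect the second result; a new scope starts with an empty cache and may observe the change.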

All functions defined in this specification are ·deterministic· unless otherwise stated. Exceptions include the following:

  • [Definition] Some functions (such as fn:distinct-values, fn:unordered, map:keys, and map:for-each) produce results in an ·implementation-defined· or ·implementation-dependent· order. In such cases two calls with the same arguments are not guaranteed to produce the results in the same order. These functions are said to be nondeterministic with respect to ordering.

  • Some functions (such as fn:analyze-string, fn:parse-xml, fn:parse-xml-fragment, and fn:json-to-xml) construct a tree of nodes to represent their results. There is no guarantee that repeated calls with the same arguments will return the identical node (in the sense of the is operator). However, if non-identical nodes are returned, their content will be the same in the sense of the fn:deep-equal function. Such a function is said to be nondeterministic with respect to node identity.

  • Some functions (such as fn:doc and fn:collection) create new nodes by reading external documents. Such functions are guaranteed to be ·deterministic· with the exception that an implementation is allowed to make them nondeterministic as a user option.
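The distinction between node identity and deep equality in the second case above can be sketched as follows. The `Element` class and `parse_fragment` function are hypothetical stand-ins for a node-constructing function; Python's `is` plays the role of the is operator and `==` on the dataclass plays the role of fn:deep-equal.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical element node with a name and child elements."""
    name: str
    children: list = field(default_factory=list)


def parse_fragment(names):
    # Stand-in for a node-constructing function such as fn:parse-xml:
    # every call builds a fresh tree of nodes.
    return Element("root", [Element(n) for n in names])


a = parse_fragment(["x", "y"])
b = parse_fragment(["x", "y"])
same_content = (a == b)    # structural comparison, like fn:deep-equal
same_node = (a is b)       # identity comparison, like the is operator
```

Here `same_content` is true and `same_node` is false: the two calls return distinct nodes whose content is the same.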

Where the results of a function are described as being (to a greater or lesser extent) ·implementation-defined· or ·implementation-dependent·, this does not by itself remove the requirement that the results should be deterministic: that is, that repeated calls with the same explicit and implicit arguments must return identical results.