2 Sequence Normalization

An instance of the data model that is input to the serialization process is a sequence. Prior to serializing a sequence using any of the output methods whose behavior is specified by this document (3 Serialization Parameters), with the exception of the JSON and Adaptive output methods, the serializer MUST first compute a normalized sequence for serialization; it is the normalized sequence that is actually serialized. [Definition: The purpose of sequence normalization is to create a sequence that can be serialized as a well-formed XML document or external general parsed entity, that also reflects the content of the input sequence to the extent possible.] [Definition: The result of the sequence normalization process is a result tree.]

The normalized sequence for serialization is constructed by applying all of the following rules in order, with the initial sequence being input to the first step, and the sequence that results from any step being used as input to the subsequent step. For any implementation-defined output method, it is implementation-defined whether this sequence normalization process takes place. For the JSON and Adaptive output methods, sequence normalization MUST NOT take place.
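The applicability rule above can be summarized as a small lookup. The following is a minimal sketch, not part of the specification. It assumes that output methods are identified by the lowercase names `xml`, `html`, `xhtml`, `text`, `json` and `adaptive`. The function name `normalization_required` is hypothetical, and `None` is used here to mean "implementation-defined".

```python
from typing import Optional

# Output methods specified by this document that require sequence normalization.
NORMALIZING_METHODS = {"xml", "html", "xhtml", "text"}
# Output methods for which sequence normalization MUST NOT take place.
NON_NORMALIZING_METHODS = {"json", "adaptive"}


def normalization_required(method: str) -> Optional[bool]:
    """Return True or False for specified methods, None if implementation-defined."""
    if method in NORMALIZING_METHODS:
        return True
    if method in NON_NORMALIZING_METHODS:
        return False
    # Any other (implementation-defined) output method: the implementation decides.
    return None
```

A serializer built along these lines would consult such a check before running the normalization steps, and would document its own choice for every method that yields `None`.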

Where the process of converting the input sequence to a normalized sequence indicates that a value MUST be cast to xs:string, that operation is defined in Section 19.1.1 Casting to xs:string and xs:untypedAtomic FO31 of [XQuery and XPath Functions and Operators 3.1]. Where a step in the sequence normalization process indicates that a node should be copied, the copy is performed in the same way as an XSLT xsl:copy-of instruction that has a validation attribute whose value is preserve and has a select attribute whose effective value is the node, as described in Section 11.9.2 Deep Copy XT30 of [XSL Transformations (XSLT) Version 3.0], or equivalently in the same way as an XQuery content expression as described in Step 1e of Section 3.9.1.3 Content XQ31 of [XQuery 3.1: An XML Query Language], where the construction mode is preserve.

The steps in computing the normalized sequence are:

  1. If the sequence that is input to serialization is empty, create a sequence S1 that consists of a zero-length string. Otherwise, copy each item in the sequence that is input to serialization to create the new sequence S1. Each item in the sequence that is an array is flattened by calling the function array:flatten() before being copied.

  2. For each item in S1, if the item is atomic, obtain the lexical representation of the item by casting it to an xs:string and copy the string representation to the new sequence; otherwise, copy the item to the new sequence. The new sequence is S2.

  3. If the item-separator serialization parameter is absent, then for each subsequence of adjacent strings in S2, copy a single string to the new sequence equal to the values of the strings in the subsequence concatenated in order, each separated by a single space. Copy all other items to the new sequence. Otherwise, copy each item in S2 to the new sequence, inserting between each pair of items a string whose value is equal to the value of the item-separator parameter. The new sequence is S3.

  4. For each item in S3, if the item is a string, create a text node in the new sequence whose string value is equal to the string; otherwise, copy the item to the new sequence. The new sequence is S4.

  5. For each item in S4, if the item is a document node, copy its children to the new sequence; otherwise, copy the item to the new sequence. The new sequence is S5.

  6. For each subsequence of adjacent text nodes in S5, copy a single text node to the new sequence equal to the values of the text nodes in the subsequence concatenated in order. Any text nodes with values of zero length are dropped. Copy all other items to the new sequence. The new sequence is S6.

  7. It is a serialization error [err:SENR0001] if an item in S6 is an attribute node, a namespace node or a function. Otherwise, construct a new sequence, S7, that consists of a single document node and copy all the items in the sequence, which are all nodes, as children of that document node.

S7 is the normalized sequence.
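The seven steps above can be traced with a small model. The following Python sketch is illustrative only and is not part of the specification. It rests on several assumptions: `Node`, `SerializationError`, `flatten`, `cast_to_string` and `normalize` are hypothetical names; Python `str`, `int` and `bool` values stand in for atomic values; a Python list stands in for an array; any other non-node value (for example a Python function) stands in for a function item. Node copying, node identity and type annotations are not modeled.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for an XDM node."""
    kind: str  # "document", "element", "text", "attribute", "namespace", ...
    value: str = ""  # string value of a text node, or a name for other kinds
    children: List["Node"] = field(default_factory=list)


class SerializationError(Exception):
    """Raised with the error code as its message, e.g. err:SENR0001."""


def flatten(item: Any) -> List[Any]:
    # Stand-in for array:flatten(): a Python list models an array, at any depth.
    if isinstance(item, list):
        return [x for member in item for x in flatten(member)]
    return [item]


def cast_to_string(value: Any) -> str:
    # Simplified stand-in for casting to xs:string (str, int and bool only).
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def normalize(seq: List[Any], item_separator: Optional[str] = None) -> Node:
    # Step 1: empty input becomes a zero-length string; arrays are flattened.
    s1 = [x for item in seq for x in flatten(item)] if seq else [""]

    # Step 2: atomic values become their string representation.
    s2 = [cast_to_string(x) if isinstance(x, (str, int, bool)) else x for x in s1]

    # Step 3: join adjacent strings with a space, or insert the separator
    # between every pair of items when item-separator is present.
    s3: List[Any] = []
    for i, item in enumerate(s2):
        if item_separator is None:
            if isinstance(item, str) and s3 and isinstance(s3[-1], str):
                s3[-1] = s3[-1] + " " + item
                continue
        elif i > 0:
            s3.append(item_separator)
        s3.append(item)

    # Step 4: strings become text nodes.
    s4 = [Node("text", x) if isinstance(x, str) else x for x in s3]

    # Step 5: document nodes are replaced by their children.
    s5: List[Any] = []
    for item in s4:
        if isinstance(item, Node) and item.kind == "document":
            s5.extend(item.children)
        else:
            s5.append(item)

    # Step 6: merge adjacent text nodes and drop zero-length ones.
    s6: List[Any] = []
    for item in s5:
        if isinstance(item, Node) and item.kind == "text":
            if item.value == "":
                continue
            if s6 and isinstance(s6[-1], Node) and s6[-1].kind == "text":
                s6[-1] = Node("text", s6[-1].value + item.value)
                continue
        s6.append(item)

    # Step 7: attribute nodes, namespace nodes and non-node items (functions
    # in this model) are an error; otherwise wrap everything in a document node.
    for item in s6:
        if not isinstance(item, Node) or item.kind in ("attribute", "namespace"):
            raise SerializationError("err:SENR0001")
    return Node("document", children=s6)
```

In this model, `normalize([1, "a", True])` produces a document node with the single text child `1 a true`, while `normalize([1, Node("element", "e"), 2], item_separator="|")` produces the text node `1|`, the element, and the text node `|2`. An empty input produces a document node with no children, because the zero-length text node created from the empty string is dropped in step 6.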

The result tree rooted at the document node that is created by the final step of this sequence normalization process is the instance of the data model to which the rules of the appropriate output method are applied. If the sequence normalization process results in a serialization error, the serializer MUST signal the error.

Note:

If the item-separator serialization parameter is absent, the sequence normalization process for a sequence $seq is equivalent to constructing a document node using the XSLT instruction:

<xsl:document>
  <xsl:copy-of select="$seq" validation="preserve"/>
</xsl:document>

or the XQuery expression:

declare construction preserve;

document { $seq }

If the item-separator serialization parameter is present, the sequence normalization process for a sequence $seq is equivalent to constructing a document node using the XSLT instruction:

<xsl:document>
  <xsl:for-each select="$seq">
    <xsl:sequence select="if (position() gt 1) 
                          then $sep 
                          else ()"/>

    <xsl:choose>
      <xsl:when test=". instance of node()">
        <xsl:sequence select="."/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:for-each>
</xsl:document>

or the XQuery expression:

declare construction preserve;

document {
  for $item at $pos in $seq
  let $node := 
    if ($item instance of node()) then 
      $item 
    else 
      text { $item }
  return
    if ($pos eq 1) then
      $node
    else
      ($sep, $node)
}

where the value of the variable $sep is a string equal to the value of the item-separator serialization parameter.

This process results in a serialization error [err:SENR0001] if $seq contains functions, attribute nodes or namespace nodes.
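The error condition can be illustrated with a short check over the kinds of item that reach the final step. This is a sketch under stated assumptions, not a definitive implementation: items are represented only by a kind string, the names `FORBIDDEN_KINDS` and `senr0001_offenders` are hypothetical, and `map` is included on the assumption that maps are function items in the 3.1 data model. Arrays do not appear because they are flattened in the first step.

```python
# Kinds of item that cannot appear in the sequence without raising err:SENR0001.
# "map" is listed on the assumption that maps are function items.
FORBIDDEN_KINDS = {"attribute", "namespace", "function", "map"}


def senr0001_offenders(kinds):
    """Return the 1-based positions of items whose kind triggers err:SENR0001."""
    return [pos for pos, kind in enumerate(kinds, start=1) if kind in FORBIDDEN_KINDS]
```

A non-empty result means that normalization of the corresponding sequence fails, and the serializer must signal the error rather than produce output.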