5 XML Output Method

The XML output method serializes the normalized sequence as an XML entity that MUST satisfy the rules for either a well-formed XML document entity, a well-formed XML external general parsed entity, or both. A serialization error [err:SERE0003] results if the serializer is unable to satisfy those rules, except for content modified by the character expansion phase of serialization, as described in 4 Phases of Serialization. The effects of the character expansion phase could result in the serialized output being not well-formed, but will not result in a serialization error. If a serialization error results, the serializer MUST signal the error.

If the document node of the normalized sequence has a single element node child and no text node children, then the serialized output is a well-formed XML document entity, and the serialized output MUST conform to the appropriate version of the XML Namespaces Recommendation [XML Names] or [XML Names 1.1]. If the normalized sequence does not take this form, then the serialized output is a well-formed XML external general parsed entity, which, when referenced within a trivial XML document wrapper like this:

<?xml version="version"?>
<!DOCTYPE doc [
<!ENTITY e SYSTEM "entity-URI">
]>
<doc>&e;</doc>

where entity-URI is a URI for the entity, and the value of the version pseudo-attribute is the value of the version parameter, produces a document which MUST itself be a well-formed XML document conforming to the corresponding version of the XML Namespaces Recommendation [XML Names] or [XML Names 1.1].
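The wrapper test can be approximated in a few lines. The Python sketch below is illustrative, not a conforming check. The helper name `wraps_as_well_formed` is hypothetical. Instead of declaring and referencing an external entity, it splices the candidate text directly into `<doc>`, so it assumes the output carries no text declaration. Python's `xml.etree.ElementTree` parser is namespace-aware, so an unbound prefix is also reported as an error.

```python
import xml.etree.ElementTree as ET


def wraps_as_well_formed(entity_text: str) -> bool:
    """Approximate the document-wrapper test for an external parsed entity.

    Hypothetical helper: instead of <!ENTITY e SYSTEM ...> plus &e;, the
    entity text is inlined into <doc>...</doc>. That is only equivalent
    when the entity has no text declaration.
    """
    wrapper = "<doc>" + entity_text + "</doc>"
    try:
        ET.fromstring(wrapper)  # namespace-aware: unbound prefixes fail too
    except ET.ParseError:
        return False
    return True


# Several top-level items are fine in an external parsed entity...
print(wraps_as_well_formed("text <a/> more <b>x</b>"))  # True
# ...but unclosed tags and undeclared prefixes are not.
print(wraps_as_well_formed("<a>"))     # False
print(wraps_as_well_formed("<x:a/>"))  # False
```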

[Definition: A reconstructed tree may be constructed by parsing the XML document and converting it into an instance of the data model as specified in [XQuery and XPath Data Model (XDM) 3.1].] The result of serialization MUST be such that the reconstructed tree is the same as the result tree except for the following permitted differences:

A consequence of this rule is that certain characters MUST be output as character references, to ensure that they survive the round trip through serialization and parsing. Specifically, CR, NEL and LINE SEPARATOR characters in text nodes MUST be output respectively as "&#xD;", "&#x85;", and "&#x2028;", or their equivalents; while CR, NL, TAB, NEL and LINE SEPARATOR characters in attribute nodes MUST be output respectively as "&#xD;", "&#xA;", "&#x9;", "&#x85;", and "&#x2028;", or their equivalents. In addition, the non-whitespace control characters #x1 through #x1F and #x7F through #x9F in text nodes and attribute nodes MUST be output as character references.

For example, an attribute whose value consists of "x", a newline, and "y" is output as "x&#xA;y" (or with any equivalent character reference). The output cannot be "x" followed by a literal newline followed by "y", because after parsing, the XML attribute-value normalization rules would replace the newline with a space, giving the value "x y".
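A minimal Python sketch of the escaping step follows. The function names are hypothetical. The sketch assumes that attribute values are written inside double quotes, and it ignores character maps and encoding limits. It emits references for #x1 through #x1F unconditionally, which yields well-formed output only when the version parameter is 1.1 (see 5.1.1 XML Output Method: the version Parameter).

```python
def _is_control(cp: int) -> bool:
    """Non-whitespace control characters #x1-#x1F and #x7F-#x9F."""
    return (0x1 <= cp <= 0x1F and cp not in (0x9, 0xA, 0xD)) or 0x7F <= cp <= 0x9F


def _char_ref(cp: int) -> str:
    return f"&#x{cp:X};"


def escape_text(value: str) -> str:
    """Escape the string value of a text node."""
    out = []
    for ch in value:
        cp = ord(ch)
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")  # strictly needed only after "]]", always safe
        elif cp in (0xD, 0x85, 0x2028) or _is_control(cp):
            out.append(_char_ref(cp))  # CR, NEL, LS and control characters
        else:
            out.append(ch)
    return "".join(out)


def escape_attribute(value: str) -> str:
    """Escape an attribute value that will be written inside double quotes."""
    out = []
    for ch in value:
        cp = ord(ch)
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == '"':
            out.append("&quot;")
        elif cp in (0x9, 0xA, 0xD, 0x85, 0x2028) or _is_control(cp):
            out.append(_char_ref(cp))  # TAB and NL are escaped here as well
        else:
            out.append(ch)
    return "".join(out)


print(escape_attribute("x\ny"))  # x&#xA;y
print(escape_text("a\rb"))       # a&#xD;b
```

A newline in a text node is left as-is by `escape_text`, because parsing normalizes it back to a newline; only the attribute case needs the reference.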

Note:

XML 1.0 did not permit an XML processor to normalize NEL or LINE SEPARATOR characters to a LINE FEED character. However, if a document entity that specifies version 1.1 invokes an external general parsed entity with no text declaration or a text declaration that specifies version 1.0, the external parsed entity is processed according to the rules of XML 1.1. For this reason, NEL and LINE SEPARATOR characters in text and attribute nodes MUST always be escaped using character references, regardless of the value of the version parameter.

XML 1.0 permitted control characters in the range #x7F through #x9F to appear as literal characters in an XML document, but XML 1.1 requires such characters, other than NEL, to be escaped as character references. An external general parsed entity with no text declaration or a text declaration that specifies a version pseudo-attribute with value 1.0 that is invoked by an XML 1.1 document entity MUST follow the rules of XML 1.1. Therefore, the non-whitespace control characters in the ranges #x1 through #x1F and #x7F through #x9F MUST always be escaped, regardless of the value of the version parameter.

It is a serialization error [err:SEPM0004] to specify the doctype-system parameter, or to specify the standalone parameter with a value other than omit, if the instance of the data model contains text nodes or multiple element nodes as children of the root node. The serializer MUST either signal the error, or recover by ignoring the request to output a document type declaration or a standalone document declaration.
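A minimal Python sketch of this check follows. It assumes that the children of the document node are summarized as a list of node-kind strings, which is a hypothetical representation. `may_emit_doctype` and `SerializationError` are illustrative names, not part of any library. The `recover` flag selects between the two permitted behaviors.

```python
from typing import List, Optional


class SerializationError(Exception):
    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"[err:{code}] {message}")
        self.code = code


def may_emit_doctype(child_kinds: List[str], doctype_system: Optional[str],
                     standalone: str = "omit", recover: bool = False) -> bool:
    """Decide whether a document type / standalone declaration can be output.

    child_kinds lists the node kinds of the document node's children,
    e.g. ["comment", "element"] (hypothetical representation).
    """
    requested = doctype_system is not None or standalone != "omit"
    if not requested:
        return False
    if "text" in child_kinds or child_kinds.count("element") > 1:
        if recover:
            return False  # recovery: silently drop the request
        raise SerializationError("SEPM0004", "not a single-element document")
    return True


print(may_emit_doctype(["comment", "element"], "doc.dtd"))  # True
```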

5.1 The Influence of Serialization Parameters upon the XML Output Method

5.1.1 XML Output Method: the version Parameter

The version parameter specifies the version of XML and the version of Namespaces in XML to be used for outputting the instance of the data model. The version output in the XML declaration (if an XML declaration is not omitted) MUST correspond to the version of XML that the serializer used for outputting the instance of the data model. The value of the version parameter MUST match the VersionNum production of the XML Recommendation [XML10] or [XML11]. A serialization error [err:SESU0013] results if the value of the version parameter specifies a version of XML that is not supported by the serializer; the serializer MUST signal the error.

This document provides the normative definition of serialization for the XML output method if the version parameter has either the value 1.0 or 1.1. For any other value of the version parameter, the behavior is implementation-defined. In that case, the implementation-defined behavior MAY supersede all other requirements of this recommendation.

If the serialized result would contain an NCName that contains a character that is not permitted by the version of Namespaces in XML specified by the version parameter, a serialization error [err:SERE0005] results. The serializer MUST signal the error.

If the serialized result would contain a character that is not permitted by the version of XML specified by the version parameter, a serialization error [err:SERE0006] results. The serializer MUST signal the error.

For example, if the version parameter has the value 1.0, and the instance of the data model contains a non-whitespace control character in the range #x1 to #x1F, a serialization error [err:SERE0006] results. If the version parameter has the value 1.1 and a comment node in the instance of the data model contains a non-whitespace control character in the range #x1 to #x1F or a control character other than NEL in the range #x7F to #x9F, a serialization error [err:SERE0006] results.
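The two examples above follow from the Char production of [XML10] and the Char and RestrictedChar productions of [XML11]. The Python sketch below encodes those productions. `check_char` and its `refs_allowed` flag are hypothetical. Pass `refs_allowed=True` for text and attribute content, where a character reference can be used. Pass `False` for comments, processing instructions and names. The sketch only decides whether a character can appear at all. The escaping rules in 5 XML Output Method still decide whether it is written literally or as a reference.

```python
class SerializationError(Exception):
    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"[err:{code}] {message}")
        self.code = code


def _is_xml10_char(cp: int) -> bool:
    # XML 1.0 Char production
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)


def _is_xml11_char(cp: int) -> bool:
    # XML 1.1 Char production
    return 0x1 <= cp <= 0xD7FF or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF


def _is_xml11_restricted(cp: int) -> bool:
    # XML 1.1 RestrictedChar production: usable only via character references
    return (0x1 <= cp <= 0x8 or 0xB <= cp <= 0xC or 0xE <= cp <= 0x1F
            or 0x7F <= cp <= 0x84 or 0x86 <= cp <= 0x9F)


def check_char(ch: str, version: str, refs_allowed: bool) -> None:
    """Raise SERE0006 if ch cannot appear in the output for this XML version."""
    cp = ord(ch)
    if version == "1.0":
        # A character reference does not help: it must also match Char.
        ok = _is_xml10_char(cp)
    elif version == "1.1":
        ok = _is_xml11_char(cp) and (refs_allowed or not _is_xml11_restricted(cp))
    else:
        raise SerializationError("SESU0013", f"XML version {version} not supported here")
    if not ok:
        raise SerializationError("SERE0006", f"U+{cp:04X} not permitted in XML {version}")


check_char("\x01", "1.1", refs_allowed=True)   # fine: written as &#x1;
check_char("\x85", "1.1", refs_allowed=False)  # fine: NEL is not restricted
```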

5.1.2 XML Output Method: the html-version Parameter

The html-version parameter is not applicable to the XML output method. It is the responsibility of the host language to specify whether an error occurs if this parameter is specified in combination with the XML output method, or if the parameter is simply dropped.

5.1.3 XML Output Method: the encoding Parameter

The encoding parameter specifies the encoding to be used for outputting the instance of the data model. Serializers are REQUIRED to support values of UTF-8 and UTF-16. A serialization error [err:SESU0007] occurs if an output encoding other than UTF-8 or UTF-16 is requested and the serializer does not support that encoding. The serializer MUST signal the error, or recover by using UTF-8 or UTF-16 instead. The serializer MUST NOT use an encoding whose name does not match the EncName production of the XML Recommendation [XML10].

When outputting a newline character in the instance of the data model, the serializer is free to represent it using any character sequence that will be normalized to a newline character by an XML parser, unless a specific mapping for the newline character is provided in a character map (see 11 Character Maps).

When outputting any other character that is defined in the selected encoding, the character MUST be output using the correct representation of that character in the selected encoding.

It is possible that the instance of the data model will contain a character that cannot be represented in the encoding that the serializer is using for output. In this case, if the character occurs in a context where XML recognizes character references (that is, in the value of an attribute node or text node), then the character MUST be output as a character reference. A serialization error [err:SERE0008] occurs if such a character appears in a context where character references are not allowed (for example, if the character occurs in the name of an element). The serializer MUST signal the error.

For example, if a text node contains the character LATIN SMALL LETTER E WITH ACUTE (#xE9), and the value of the encoding parameter is US-ASCII, the character MUST be serialized as a character reference. If a comment node contains the same character, a serialization error [err:SERE0008] results.
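In Python, the text-node half of this rule maps onto the standard `xmlcharrefreplace` error handler. The comment half is a strict encode that turns failure into SERE0008. The sketch below is minimal, and the function names are hypothetical. It assumes markup characters such as `&` and `<` have already been escaped. Python writes decimal references such as `&#233;`, which are equivalent to `&#xE9;`.

```python
class SerializationError(Exception):
    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"[err:{code}] {message}")
        self.code = code


def encode_text_content(escaped: str, encoding: str) -> bytes:
    """Encode already-escaped text or attribute content.

    Unrepresentable characters become numeric character references.
    """
    return escaped.encode(encoding, errors="xmlcharrefreplace")


def encode_comment(value: str, encoding: str) -> bytes:
    """Comments cannot contain character references, so failure is an error."""
    try:
        return ("<!--" + value + "-->").encode(encoding)
    except UnicodeEncodeError as exc:
        raise SerializationError("SERE0008", f"comment not representable in {encoding}") from exc


print(encode_text_content("caf\u00e9", "US-ASCII"))  # b'caf&#233;'
```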

5.1.4 XML Output Method: the indent and suppress-indentation Parameters

The indent and suppress-indentation parameters control whether the serializer MAY adjust the whitespace in the serialized result so that a person will find it easier to read. If the indent parameter has one of the values yes, true or 1, the serializer MAY output whitespace characters in addition to the whitespace characters in the instance of the data model. It MAY also elide from the output whitespace characters that occurred in the instance of the data model or replace such whitespace characters with other whitespace characters.

[Definition: The term content has the same meaning as the term content defined in Section 3.1 Start-Tags, End-Tags, and Empty-Element Tags of [XML10].] [Definition: The immediate content of an element is the part of the content of the element that is not also in the content of a child element of that element.]

If the indent parameter has one of the values no, false or 0, the serializer MUST NOT add, elide or replace any whitespace characters. If the indent parameter has one of the values yes, true or 1, the serializer MUST use an algorithm for dealing with whitespace characters that satisfies all of the following constraints. If more than one constraint applies, the serializer MUST apply the most restrictive constraint. That is, if any applicable constraint indicates that whitespace MUST NOT be added, elided or replaced, that constraint prevails; if an applicable constraint indicates that whitespace SHOULD NOT be added, elided or replaced, while all other applicable constraints indicate that whitespace MAY be added, elided or replaced, whitespace SHOULD NOT be added, elided or replaced.

  • Whitespace characters MAY be added adjacent to a text node only if the text node contains only whitespace characters. Whitespace characters in such a text node MAY also be elided or replaced. For example, a tab MAY be inserted as a replacement for existing spaces.

  • Whitespace characters MAY be added, elided or replaced in the immediate content of an element whose type annotation is xs:untyped or xs:anyType and that has element node children, in the immediate content of an element whose content model is element only, or outside the content of any element.

  • Whitespace characters MUST NOT be added, elided or replaced in the immediate content of an element whose content model is known to be simple or empty.

  • Whitespace characters SHOULD NOT be added, elided or replaced in places where the characters would constitute significant whitespace, for example, in the immediate content of an element that is annotated with a type other than xs:untyped or xs:anyType, and whose content model is known to be mixed.

  • Whitespace characters MUST NOT be added, elided or replaced in the content of an element whose expanded QName is a member of the list of expanded QNames in the value of the suppress-indentation parameter.

  • Whitespace characters MUST NOT be added, elided or replaced in a part of the result document that is controlled by an xml:space attribute with value preserve (See [XML10] for more information about the xml:space attribute).
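The "most restrictive constraint prevails" rule can be modeled as taking a minimum over ordered permission levels. The Python sketch below is a simplification under stated assumptions. `Ws`, `whitespace_permission` and the `content_model` labels are hypothetical. The caller is assumed to propagate suppress-indentation and xml:space="preserve" to descendants. The first constraint, about whitespace-only text nodes, is not modeled. Where no constraint grants permission, the sketch conservatively leaves whitespace untouched.

```python
from enum import IntEnum


class Ws(IntEnum):
    """Permission to add, elide or replace whitespace; lower is stricter."""
    MUST_NOT = 0
    SHOULD_NOT = 1
    MAY = 2


def whitespace_permission(indent: bool, content_model: str,
                          suppressed: bool, space_preserve: bool) -> Ws:
    """content_model is a hypothetical label for the element's immediate content:
    "untyped-with-elements", "element-only", "simple", "empty",
    "typed-mixed" or "other"."""
    if not indent:
        return Ws.MUST_NOT
    applicable = []
    if content_model in ("untyped-with-elements", "element-only"):
        applicable.append(Ws.MAY)
    if content_model in ("simple", "empty"):
        applicable.append(Ws.MUST_NOT)
    if content_model == "typed-mixed":
        applicable.append(Ws.SHOULD_NOT)  # significant whitespace
    if suppressed:       # name listed in suppress-indentation
        applicable.append(Ws.MUST_NOT)
    if space_preserve:   # governed by xml:space="preserve"
        applicable.append(Ws.MUST_NOT)
    # Most restrictive applicable constraint wins.
    return min(applicable, default=Ws.MUST_NOT)


print(whitespace_permission(True, "element-only", suppressed=True,
                            space_preserve=False).name)  # MUST_NOT
```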

Note:

The effect of these rules is to ensure that whitespace is only added in places where (a) XSLT's <xsl:strip-space> declaration could cause it to be removed, and (b) it does not affect the string value of any element node with simple content. It is usually not safe to indent document types that include elements with mixed content.

Note:

The whitespace added may be based on whitespace stripped from either the source document or the stylesheet (in the case of XSLT), or, for an instance of the data model created by some other process, guided by other means that might depend on the host language.

5.1.5 XML Output Method: the cdata-section-elements Parameter

The cdata-section-elements parameter contains a list of expanded QNames. If the expanded QName of the parent of a text node is a member of the list, then the text node MUST be output as a CDATA section, except in those circumstances described below.

If the text node contains the sequence of characters ]]>, then the currently open CDATA section MUST be closed following the ]] and a new CDATA section opened before the >.

If the text node contains characters that are not representable in the character encoding being used to output the instance of the data model, then the currently open CDATA section MUST be closed before such characters, the characters MUST be output using character references or entity references, and a new CDATA section MUST be opened for any further characters in the text node.
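The two splitting rules can be combined in a single pass over the text node. The Python sketch below is minimal. The name `cdata_sections` is hypothetical. It uses hexadecimal character references for unencodable characters. It tests encodability one character at a time with `str.encode`.

```python
def cdata_sections(text: str, encoding: str) -> str:
    """Serialize a text node as one or more CDATA sections."""
    out = []
    buf = []  # characters of the currently open CDATA section

    def flush() -> None:
        if buf:
            out.append("<![CDATA[" + "".join(buf) + "]]>")
            buf.clear()

    i = 0
    while i < len(text):
        if text.startswith("]]>", i):
            buf.append("]]")   # close after "]]" ...
            flush()
            buf.append(">")    # ... and reopen before ">"
            i += 3
            continue
        ch = text[i]
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            flush()            # close, then escape outside the section
            out.append(f"&#x{ord(ch):X};")
        else:
            buf.append(ch)
        i += 1
    flush()
    return "".join(out)


print(cdata_sections("a]]>b", "utf-8"))  # <![CDATA[a]]]]><![CDATA[>b]]>
```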

CDATA sections MUST NOT be used except where they have been explicitly requested by the user, either by using the cdata-section-elements parameter, or by using some other implementation-defined mechanism.

Note:

This is phrased to permit an implementor to provide an option that attempts to preserve CDATA sections present in the source document.

5.1.6 XML Output Method: the omit-xml-declaration and standalone Parameters

The XML output method MUST output an XML declaration if the omit-xml-declaration parameter has the value no, false or 0. The XML declaration MUST include both version information and an encoding declaration. If the standalone parameter has one of the values yes, true, 1, no, false or 0, the XML declaration MUST include a standalone document declaration with the same value as the value of the standalone parameter. If the standalone parameter has the value omit, the XML declaration MUST NOT include a standalone document declaration; this ensures that it is both an XML declaration (allowed at the beginning of a document entity) and a text declaration (allowed at the beginning of an external general parsed entity).

A serialization error [err:SEPM0009] results if the omit-xml-declaration parameter has one of the values yes, true or 1, and

  • the standalone parameter has a value other than omit; or

  • the version parameter has a value other than 1.0 and the doctype-system parameter is specified.

The serializer MUST signal the error.

Otherwise, if the omit-xml-declaration parameter has one of the values yes, true or 1, the XML output method MUST NOT output an XML declaration.
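A minimal Python sketch of the declaration logic follows. `xml_declaration` and `SerializationError` are hypothetical names. The sketch assumes one interpretation: "the same value" means the same boolean value. So true and 1 are written as yes, and false and 0 are written as no, because the XML standalone document declaration accepts only yes or no.

```python
from typing import Optional


class SerializationError(Exception):
    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"[err:{code}] {message}")
        self.code = code


YES = {"yes", "true", "1"}
NO = {"no", "false", "0"}


def xml_declaration(omit_xml_declaration: str, version: str = "1.0",
                    encoding: str = "UTF-8", standalone: str = "omit",
                    doctype_system: Optional[str] = None) -> str:
    if omit_xml_declaration in YES:
        if standalone != "omit":
            raise SerializationError("SEPM0009", "standalone needs an XML declaration")
        if version != "1.0" and doctype_system is not None:
            raise SerializationError("SEPM0009", f"XML {version} with doctype-system needs a declaration")
        return ""
    decl = f'<?xml version="{version}" encoding="{encoding}"'
    if standalone in YES:
        decl += ' standalone="yes"'
    elif standalone in NO:
        decl += ' standalone="no"'
    return decl + "?>"  # standalone=omit: also usable as a text declaration


print(xml_declaration("no", standalone="true"))
# <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
```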

5.1.7 XML Output Method: the doctype-system and doctype-public Parameters

If the doctype-system parameter is specified, the XML output method MUST output a document type declaration immediately before the first element. The name following <!DOCTYPE MUST be the name of the first element, if any. If the doctype-public parameter is also specified, then the XML output method MUST output PUBLIC followed by the public identifier and then the system identifier; otherwise, it MUST output SYSTEM followed by the system identifier. The internal subset MUST be empty. The doctype-public parameter MUST be ignored unless the doctype-system parameter is specified.
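The rule above yields one of two forms of document type declaration, always with an empty internal subset. The Python sketch below is minimal. `doctype_declaration` is a hypothetical name, and the public identifier in the test is an invented example. The sketch switches to single quotes when a literal contains a double quote, as the XML grammar allows for system literals.

```python
from typing import Optional


def _quote(literal: str) -> str:
    # A SystemLiteral may use either quote character, but cannot contain both.
    return f"'{literal}'" if '"' in literal else f'"{literal}"'


def doctype_declaration(first_element: str, doctype_system: Optional[str] = None,
                        doctype_public: Optional[str] = None) -> str:
    if doctype_system is None:
        return ""  # doctype-public on its own is ignored
    if doctype_public is not None:
        external_id = f"PUBLIC {_quote(doctype_public)} {_quote(doctype_system)}"
    else:
        external_id = f"SYSTEM {_quote(doctype_system)}"
    return f"<!DOCTYPE {first_element} {external_id}>"  # no internal subset


print(doctype_declaration("doc", "doc.dtd"))  # <!DOCTYPE doc SYSTEM "doc.dtd">
```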

5.1.8 XML Output Method: the undeclare-prefixes Parameter

The Data Model allows an element node that binds a non-empty prefix to have a child element node that does not bind that same prefix. In Namespaces in XML 1.1 ([XML Names 1.1]), this can be represented accurately by undeclaring prefixes. For each such prefix that the child element node does not bind, if the undeclare-prefixes parameter has one of the values yes, true or 1, the output method is XML or XHTML, and the value of the version parameter is greater than 1.0, the serializer MUST undeclare that prefix's namespace on the child element. If the undeclare-prefixes parameter has one of the values no, false or 0 and the output method is XML or XHTML, then the undeclaration of prefixes MUST NOT occur.

Consider an element x:foo with four in-scope namespaces that associate prefixes with URIs as follows:

  • x is associated with http://example.org/x

  • y is associated with http://example.org/y

  • z is associated with http://example.org/z

  • xml is associated with http://www.w3.org/XML/1998/namespace

Suppose that it has a child element x:bar with three in-scope namespaces:

  • x is associated with http://example.org/x

  • y is associated with http://example.org/y

  • xml is associated with http://www.w3.org/XML/1998/namespace

If namespace undeclaration is in effect, it will be serialized this way:

<x:foo xmlns:x="http://example.org/x"
       xmlns:y="http://example.org/y"
       xmlns:z="http://example.org/z">
       
       <x:bar xmlns:z="">...</x:bar>
       
</x:foo>

In Namespaces in XML 1.0 ([XML Names]), prefix undeclaration is not possible. If the output method is XML or XHTML, the value of the undeclare-prefixes parameter is one of yes, true or 1, and the value of the version parameter is 1.0, a serialization error [err:SEPM0010] results; the serializer MUST signal the error.
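The x:foo/x:bar example can be reproduced by comparing the in-scope namespaces of parent and child. The Python sketch below is minimal and makes several assumptions. `child_namespace_decls` is a hypothetical name. In-scope namespaces are modeled as dictionaries from prefix ("" for the default namespace) to URI. Namespace fixup for element and attribute names is ignored. In the result, an empty-string value stands for an undeclaration such as `xmlns:z=""`.

```python
from typing import Dict


class SerializationError(Exception):
    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"[err:{code}] {message}")
        self.code = code


def child_namespace_decls(parent: Dict[str, str], child: Dict[str, str],
                          undeclare_prefixes: bool, version: str) -> Dict[str, str]:
    """Namespace declarations to write on a child element's start tag."""
    if undeclare_prefixes and version == "1.0":
        raise SerializationError("SEPM0010", "prefix undeclaration needs Namespaces 1.1")
    decls = {}
    for prefix, uri in child.items():
        if prefix != "xml" and parent.get(prefix) != uri:
            decls[prefix] = uri          # new or changed binding
    for prefix in parent:
        if prefix == "xml" or prefix in child:
            continue
        if prefix == "":
            decls[""] = ""               # xmlns="" is legal in both versions
        elif undeclare_prefixes:
            decls[prefix] = ""           # xmlns:z="" needs Namespaces 1.1
        # otherwise the binding is simply inherited by the child
    return decls


XML_NS = "http://www.w3.org/XML/1998/namespace"
foo = {"x": "http://example.org/x", "y": "http://example.org/y",
       "z": "http://example.org/z", "xml": XML_NS}
bar = {"x": "http://example.org/x", "y": "http://example.org/y", "xml": XML_NS}
print(child_namespace_decls(foo, bar, True, "1.1"))  # {'z': ''}
```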

5.1.9 XML Output Method: the normalization-form Parameter

The normalization-form parameter is applicable to the XML output method. The values NFC and none MUST be supported by the serializer. A serialization error [err:SESU0011] results if the value of the normalization-form parameter specifies a normalization form that is not supported by the serializer; the serializer MUST signal the error.

The meanings associated with the possible values of the normalization-form parameter are as follows:

If the value of the parameter is fully-normalized, then no relevant construct of the parsed entity created by the serializer may start with a composing character. The term relevant construct has the meaning defined in section 2.13 of [XML11]. If this condition is not satisfied, a serialization error [err:SERE0012] MUST be signaled.
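Python's standard `unicodedata` module is enough for a rough sketch of this parameter. The function names below are hypothetical. For fully-normalized, the sketch assumes the serializer applies NFC first and then checks the start of each relevant construct. The test for a composing character is an approximation: it treats any character with a non-zero canonical combining class as composing, while the definition that [XML11] relies on may be broader.

```python
import unicodedata


class SerializationError(Exception):
    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"[err:{code}] {message}")
        self.code = code


def normalize_output(text: str, form: str) -> str:
    if form == "none":
        return text
    if form in ("NFC", "NFD", "NFKC", "NFKD"):
        return unicodedata.normalize(form, text)
    if form == "fully-normalized":
        # Assumption: NFC here, plus check_relevant_construct on each construct.
        return unicodedata.normalize("NFC", text)
    raise SerializationError("SESU0011", f"normalization form {form!r} not supported")


def check_relevant_construct(construct: str) -> None:
    """Approximate test: a non-zero combining class counts as composing."""
    if construct and unicodedata.combining(construct[0]) != 0:
        raise SerializationError("SERE0012", f"construct starts with U+{ord(construct[0]):04X}")


print(normalize_output("e\u0301", "NFC") == "\u00e9")  # True
```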

Note:

Specifying fully-normalized as the value of this parameter does not guarantee that the XML document output by the serializer will in fact be fully normalized as defined in [XML11]. This is because the serializer does not check that the text is include-normalized, which would involve checking all external entities that it refers to (such as an external DTD). Furthermore, the serializer does not check whether any character escape generated using character maps represents a composing character.

5.1.10 XML Output Method: the media-type Parameter

The media-type parameter is applicable to the XML output method. See 3 Serialization Parameters for more information.

5.1.11 XML Output Method: the use-character-maps Parameter

The use-character-maps parameter is applicable to the XML output method. The result of serialization using the XML output method is not guaranteed to be well-formed XML if character maps have been specified. See 11 Character Maps for more information.

5.1.12 XML Output Method: the byte-order-mark Parameter

The byte-order-mark parameter is applicable to the XML output method. See 3 Serialization Parameters for more information.

Note:

The byte order mark may be undesirable under certain circumstances, for example when XML fragments produced by the serializer are to be concatenated without additional processing to remove the byte order mark. Therefore, this specification does not require the byte-order-mark parameter to have one of the values yes, true or 1 when the encoding is UTF-16, even though the XML 1.0 and XML 1.1 specifications state that entities encoded in UTF-16 MUST begin with a byte order mark. Consequently, this specification does not guarantee that a resulting XML fragment without a byte order mark will be processed without error by a conforming XML processor.
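The trade-off can be seen at the byte level. The Python sketch below uses the standard `codecs.BOM_UTF16_BE` and `codecs.BOM_UTF8` constants. `encode_entity` is a hypothetical name. For UTF-16, the sketch assumes big-endian output. For encodings other than UTF-8 and UTF-16, it never writes a byte order mark.

```python
import codecs


def encode_entity(text: str, encoding: str, byte_order_mark: bool) -> bytes:
    enc = encoding.upper()
    if enc == "UTF-16":
        # Assumption: big-endian output; the BOM, if written, records that order.
        bom = codecs.BOM_UTF16_BE if byte_order_mark else b""
        return bom + text.encode("utf-16-be")
    if enc == "UTF-8":
        bom = codecs.BOM_UTF8 if byte_order_mark else b""
        return bom + text.encode("utf-8")
    return text.encode(encoding)  # this sketch writes no BOM for other encodings


print(encode_entity("<a/>", "UTF-16", byte_order_mark=False)[:2])  # b'\x00<'
```

Concatenating two outputs produced with `byte_order_mark=False` gives one BOM-free byte stream, which is the use case described above. Whether a given XML processor accepts such a stream is exactly the caveat stated in the note.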

5.1.13 XML Output Method: the escape-uri-attributes Parameter

The escape-uri-attributes parameter is not applicable to the XML output method. It is the responsibility of the host language to specify whether an error occurs if this parameter is specified in combination with the XML output method, or if the parameter is simply dropped.

5.1.14 XML Output Method: the include-content-type Parameter

The include-content-type parameter is not applicable to the XML output method. It is the responsibility of the host language to specify whether an error occurs if this parameter is specified in combination with the XML output method, or if the parameter is simply dropped.

5.1.15 XML Output Method: the item-separator Parameter

The effect of the item-separator serialization parameter is described in 2 Sequence Normalization.

5.1.16 XML Output Method: the allow-duplicate-names Parameter

The allow-duplicate-names serialization parameter is not applicable to the XML output method.

5.1.17 XML Output Method: the json-node-output-method Parameter

The json-node-output-method serialization parameter is not applicable to the XML output method.