6 XHTML Output Method

The XHTML output method serializes the instance of the data model as XML, using the HTML compatibility guidelines defined in the XHTML specification ([XHTML 1.0]) or the XHTML syntax of HTML5 (see [HTML5]).

[Definition: An element node is recognized as an HTML element by the XHTML output method if

]

It is entirely the responsibility of the person or process that creates the instance of the data model to ensure that it conforms to the appropriate specification: [XHTML 1.0] or [XHTML 1.1] if the html-version serialization parameter is absent or has a value less than 5.0, or the XHTML syntax of HTML5 if the value of the html-version serialization parameter is 5.0. It is not an error if the instance of the data model is invalid XHTML. Equally, it is entirely under the control of that person or process whether the output conforms to XHTML 1.0 Strict, XHTML 1.0 Transitional, the XHTML syntax of HTML5 (see [HTML5]), [POLYGLOT] or any other specific definition of XHTML.

The serialization of the instance of the data model follows the same rules as for the XML output method, with the general exceptions noted below and parameter-specific exceptions in 6.1 The Influence of Serialization Parameters upon the XHTML Output Method. These differences are based on the HTML compatibility guidelines published in Appendix C of [XHTML 1.0] and on [POLYGLOT], both of which are designed to ensure that as far as possible, XHTML is rendered correctly on user agents designed originally to handle HTML.

If the value of the html-version serialization parameter is 5.0, the instance of the data model that is to be serialized is first subjected to prefix normalization.

[Definition: During prefix normalization, any element node in the instance of the data model that is to be serialized that is in one of the XHTML namespace, the SVG namespace or the MathML namespace has its name replaced by the local part of its name. Such an element node is given a default namespace node whose value is the element's namespace URI. Any namespace node for any of those three namespaces that was previously present on any element node in the instance of the data model is also removed, unless the prefix that that namespace node declared is used as the prefix on the name of an attribute on that element or an ancestor of that element.]

The process of prefix normalization is equivalent to replacing the instance of the data model that is to be serialized with the result of the transformation described by this XSLT stylesheet, with the instance of the data model as the initial context item.

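<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:svg="http://www.w3.org/2000/svg"
    xmlns:mathml="http://www.w3.org/1998/Math/MathML">
  <!-- XHTML, SVG and MathML elements: rebuild the element under its
       local name, keeping its namespace URI, so that the namespace
       becomes the default namespace of the new element. -->
  <xsl:template match="xhtml:*|svg:*|mathml:*">
    <xsl:element name="{local-name()}"
                 namespace="{namespace-uri()}">
      <xsl:apply-templates select="@*|namespace::*|node()"/>
    </xsl:element>
  </xsl:template>

  <!-- Every other node is copied. Namespace nodes of a copied element
       are not copied automatically (copy-namespaces="no"); they are
       processed one by one by the templates in this stylesheet. -->
  <xsl:template match="node()|@*|namespace::*">
    <xsl:copy copy-namespaces="no">
      <xsl:apply-templates select="@*|namespace::*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Namespace nodes for the XHTML, SVG and MathML namespaces are
       dropped: the empty template produces no output for them. -->
  <xsl:template
    match="namespace::*[. eq 'http://www.w3.org/1999/xhtml']|
           namespace::*[. eq 'http://www.w3.org/2000/svg']|
           namespace::*[. eq 'http://www.w3.org/1998/Math/MathML']"/>
</xsl:stylesheet>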

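As an illustration of the definition of prefix normalization, consider a hypothetical input document (not taken from this specification) in which XHTML and MathML elements are written with prefixes. The prefixes h and m, and the document content, are assumptions made purely for the example.

```xml
<h:html xmlns:h="http://www.w3.org/1999/xhtml"
        xmlns:m="http://www.w3.org/1998/Math/MathML">
  <h:body>
    <m:math><m:mi>x</m:mi></m:math>
  </h:body>
</h:html>
```

After prefix normalization, each of these elements would be expected to carry its local name only, with its namespace declared as the default namespace and the h and m namespace nodes removed. Ignoring any XML declaration, document type declaration or whitespace changes, the same tree would then correspond to markup along these lines:

```xml
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>
  </body>
</html>
```
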
Note:

[POLYGLOT] and Appendix C of [XHTML 1.0] describe a number of compatibility guidelines for users of XHTML who wish to render their XHTML documents with HTML user agents. In some cases, such as the guideline on the form empty elements take, only the serialization process itself has the ability to follow the guideline. In such cases, those guidelines are reflected in the requirements on the serializer described above.

In all other cases, the guidelines can be adhered to by the instance of the data model that is input to the serialization process. The guideline on the use of whitespace characters in attribute values is one such example. Another example is that xml:lang="..." does not serialize to both xml:lang="..." and lang="..." as required by some legacy user agents. It is the responsibility of the person or process that creates the instance of the data model that is input to the serialization process to ensure it is created in a way that is consistent with the guidelines. No serialization error results if the input instance of the data model does not adhere to the guidelines.

6.1 The Influence of Serialization Parameters upon the XHTML Output Method

6.1.1 XHTML Output Method: the version Parameter

The behavior for the version parameter for the XHTML output method is described in 5.1.1 XML Output Method: the version Parameter.

6.1.2 XHTML Output Method: the html-version Parameter

The html-version parameter specifies whether the XHTML output method will produce a serialized document following rules that are tailored to the requirements of the XHTML syntax of [HTML5] or the requirements of [XHTML 1.0] and [XHTML 1.1].

The differences are described in detail throughout 6 XHTML Output Method.
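
A minimal sketch of how a host language might select the HTML5-oriented rules, assuming XSLT 3.0 as the host language, where the html-version attribute of xsl:output sets the html-version serialization parameter. The page content and the use of the xsl:initial-template entry point are illustrative choices, not requirements of this section.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns="http://www.w3.org/1999/xhtml"
                version="3.0">

  <!-- Request the XHTML output method with the rules tailored to the
       XHTML syntax of HTML5. -->
  <xsl:output method="xhtml" html-version="5.0"/>

  <!-- Illustrative content only: a small page in the XHTML namespace. -->
  <xsl:template name="xsl:initial-template">
    <html>
      <head><title>Example</title></head>
      <body><p>Hello</p></body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```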

6.1.3 XHTML Output Method: the encoding Parameter

The behavior for the encoding parameter for the XHTML output method is described in 5.1.3 XML Output Method: the encoding Parameter.

6.1.4 XHTML Output Method: the indent and suppress-indentation Parameters

If the indent parameter has one of the values yes, true or 1, the serializer MAY add or remove whitespace as it serializes the result tree, if it observes the following constraints.

  • Whitespace MUST NOT be added other than before or after an element, or adjacent to an existing whitespace character.

  • Whitespace MUST NOT be added or removed adjacent to an inline element. The inline elements are those elements recognized as HTML elements that are in the %inline category of any of the XHTML 1.0 DTDs, in the %inline.class category of the XHTML 1.1 DTD, those elements defined to be phrasing elements in HTML5 and elements recognized as HTML elements with local names ins and del if they are used as inline elements (i.e., if they do not contain element children).

  • Whitespace MUST NOT be added or removed inside a formatted element, the formatted elements being those recognized as HTML elements with local names pre, script, style, title, and textarea.

  • Whitespace characters MUST NOT be added in the content of an element whose expanded QName matches a member of the list of expanded QNames in the value of the suppress-indentation parameter. The expanded QName of an element node is considered to match a member of the list of expanded QNames if:

Note:

The effect of the above constraints is to ensure any insertion or deletion of whitespace would not affect how an HTML user agent that conforms to the specified version of HTML would render the output, assuming the serialized document does not refer to any HTML style sheets.

The HTML definition of whitespace is different from the XML definition: see section 9.1 of the [HTML] 4.01 specification.
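
The following sketch shows one way these two parameters might be set together, assuming XSLT 3.0 as the host language. The choice of td as the element whose content is protected from added whitespace, and the table content itself, are hypothetical.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xhtml="http://www.w3.org/1999/xhtml"
                xmlns="http://www.w3.org/1999/xhtml"
                exclude-result-prefixes="xhtml"
                version="3.0">

  <!-- Allow the serializer to indent, but ask it not to add whitespace
       in the content of XHTML td elements. -->
  <xsl:output method="xhtml" indent="yes"
              suppress-indentation="xhtml:td"/>

  <xsl:template name="xsl:initial-template">
    <html>
      <head><title>Indentation</title></head>
      <body>
        <table>
          <tr><td><span>a</span><span>b</span></td></tr>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Because span is an inline element, the constraints above would in any case prevent whitespace from being added between the two span elements; the suppress-indentation setting extends that protection to everything inside td.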

6.1.5 XHTML Output Method: the cdata-section-elements Parameter

The behavior for the cdata-section-elements parameter for the XHTML output method is described in 5.1.5 XML Output Method: the cdata-section-elements Parameter.

6.1.6 XHTML Output Method: the omit-xml-declaration and standalone Parameters

The behavior for the omit-xml-declaration and standalone parameters for the XHTML output method is described in 5.1.6 XML Output Method: the omit-xml-declaration and standalone Parameters.

Note:

As with the XML output method, the XHTML output method specifies that an XML declaration will be output unless it is suppressed using the omit-xml-declaration parameter. Appendix C.1 of [XHTML 1.0] provides advice on the consequences of including, or omitting, the XML declaration.
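
A minimal sketch of suppressing the XML declaration, assuming XSLT 3.0 as the host language. Whether suppression is desirable depends on the user agents targeted, as discussed in Appendix C.1 of [XHTML 1.0]; the page content is illustrative only.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns="http://www.w3.org/1999/xhtml"
                version="3.0">

  <!-- Suppress the XML declaration that would otherwise be output. -->
  <xsl:output method="xhtml" omit-xml-declaration="yes"/>

  <xsl:template name="xsl:initial-template">
    <html>
      <head><title>No XML declaration</title></head>
      <body><p>Content</p></body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```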

6.1.7 XHTML Output Method: the doctype-system and doctype-public Parameters

If the value of the html-version serialization parameter is 5.0, the doctype-system serialization parameter is absent, the first element node child of the document node that is to be serialized is recognized as an HTML element, the local part of the QName of which is equal to the string HTML, without regard to case, and any text node preceding that element in document order contains only whitespace characters, then the XHTML output method MUST output a document type declaration immediately before the first element, with no public or system identifier. The name following <!DOCTYPE MUST be the same as the local part of the name of the element.

Otherwise, the behavior for the doctype-system and doctype-public parameters for the XHTML output method is described in 5.1.7 XML Output Method: the doctype-system and doctype-public Parameters.
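
The two cases can be contrasted in a sketch that assumes XSLT 3.0 as the host language. The output definition names html5 and strict, and the result file names, are hypothetical.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns="http://www.w3.org/1999/xhtml"
                version="3.0">

  <!-- Case 1: html-version is 5.0 and doctype-system is absent. -->
  <xsl:output name="html5" method="xhtml" html-version="5.0"/>

  <!-- Case 2: doctype-system is present, so the behavior of the
       XML output method applies. -->
  <xsl:output name="strict" method="xhtml"
      doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>

  <xsl:template name="xsl:initial-template">
    <xsl:result-document href="page-html5.xhtml" format="html5">
      <xsl:call-template name="page"/>
    </xsl:result-document>
    <xsl:result-document href="page-strict.xhtml" format="strict">
      <xsl:call-template name="page"/>
    </xsl:result-document>
  </xsl:template>

  <!-- Illustrative content shared by both result documents. -->
  <xsl:template name="page">
    <html>
      <head><title>Doctype</title></head>
      <body><p>Content</p></body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

In the first case the first element is an html element recognized as an HTML element, so the document type declaration would be expected to read `<!DOCTYPE html>`, with no public or system identifier. In the second case the declaration is produced as for the XML output method, using the public and system identifiers supplied.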

6.1.8 XHTML Output Method: the undeclare-prefixes Parameter

The behavior for the undeclare-prefixes parameter for the XHTML output method is described in 5.1.8 XML Output Method: the undeclare-prefixes Parameter.

6.1.9 XHTML Output Method: the normalization-form Parameter

The behavior for the normalization-form parameter for the XHTML output method is described in 5.1.9 XML Output Method: the normalization-form Parameter.

6.1.10 XHTML Output Method: the media-type Parameter

The behavior for the media-type parameter for the XHTML output method is described in 5.1.10 XML Output Method: the media-type Parameter.

6.1.11 XHTML Output Method: the use-character-maps Parameter

The behavior for the use-character-maps parameter for the XHTML output method is described in 5.1.11 XML Output Method: the use-character-maps Parameter.

6.1.12 XHTML Output Method: the byte-order-mark Parameter

The behavior for the byte-order-mark parameter for the XHTML output method is described in 5.1.12 XML Output Method: the byte-order-mark Parameter.

6.1.13 XHTML Output Method: the escape-uri-attributes Parameter

If the escape-uri-attributes parameter has one of the values yes, true or 1, the XHTML output method MUST apply URI escaping to URI attribute values, except that relative URIs MUST NOT be absolutized.

Note:

This escaping is deliberately confined to non-ASCII characters, because escaping of ASCII characters is not always appropriate, for example when URIs or URI fragments are interpreted locally by the HTML user agent. Even in the case of non-ASCII characters, escaping can sometimes cause problems. More precise control of URI escaping is therefore available by setting escape-uri-attributes to no, and controlling the escaping of URIs by using the functions defined in Section 6.2 fn:encode-for-uri and Section 6.3 fn:iri-to-uri of FO31.
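
A minimal sketch of taking control of the escaping, assuming XSLT 3.0 as the host language. The variable file-name, its value and the docs/ path are hypothetical. Automatic escaping is switched off, and fn:encode-for-uri is applied only to the path segment that needs it.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns="http://www.w3.org/1999/xhtml"
                version="3.0">

  <!-- Turn off the serializer's own escaping of URI attributes. -->
  <xsl:output method="xhtml" escape-uri-attributes="no"/>

  <!-- Hypothetical file name containing a non-ASCII character (e with
       an acute accent, written as a character reference) and a space. -->
  <xsl:variable name="file-name" select="'caf&#xE9; menu.html'"/>

  <xsl:template name="xsl:initial-template">
    <html>
      <head><title>Links</title></head>
      <body>
        <!-- Escape only the file name; the "docs/" prefix and its
             slash are left untouched. -->
        <p><a href="docs/{encode-for-uri($file-name)}">Menu</a></p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Here fn:encode-for-uri yields caf%C3%A9%20menu.html, so the href attribute value becomes docs/caf%C3%A9%20menu.html. Note that the space is escaped as well, which the serializer's own escaping, being confined to non-ASCII characters, would not do.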

6.1.14 XHTML Output Method: the include-content-type Parameter

If the instance of the data model includes a head element recognized as an HTML element, and the include-content-type parameter has one of the values yes, true or 1, the XHTML output method MUST add a meta element as the first child element of the head element, specifying the character encoding actually used. The meta element SHOULD be in no namespace if the head element is in no namespace, and in the XHTML namespace if the head element is in the XHTML namespace.

For example,

<head>
<meta http-equiv="Content-Type" 
      content="text/html; charset=EUC-JP" />
...

The content type SHOULD be set to the value given for the media-type parameter.

Note:

It is recommended that the host language use as the default value for the media-type parameter one of the MIME types ([RFC2046]) registered for XHTML. Currently, these are text/html (registered by [RFC2854]) and application/xhtml+xml (registered by [RFC3236]). Note that some user agents fail to recognize the charset parameter if the content type is not text/html.

If a meta element has been added to the head element as described above, then any existing meta element child of the head element having an http-equiv attribute with the value "Content-Type", making the comparison without regard to case after first stripping leading and trailing spaces from the value of the attribute solely for the purposes of comparison, MUST be discarded.

Note:

This process removes possible parameters in the attribute value. For example,

<meta http-equiv="Content-Type" 
      content="text/html;version='3.0'" />

in the data model instance would be replaced by,

<meta http-equiv="Content-Type" 
      content="text/html;charset=utf-8" />
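
The rules in this section can be exercised with a sketch such as the following, which assumes XSLT 3.0 as the host language. The page content, including the pre-existing meta element written with a lower-case http-equiv value, is hypothetical.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns="http://www.w3.org/1999/xhtml"
                version="3.0">

  <!-- Ask the serializer to add a meta element naming the encoding
       actually used, with the content type taken from media-type. -->
  <xsl:output method="xhtml" encoding="UTF-8" media-type="text/html"
              include-content-type="yes"/>

  <xsl:template name="xsl:initial-template">
    <html>
      <head>
        <title>Content type</title>
        <!-- Existing meta element: its http-equiv value matches
             "Content-Type" when compared without regard to case. -->
        <meta http-equiv="content-type"
              content="text/html;version='3.0'"/>
      </head>
      <body><p>Content</p></body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Under the rules above, the serializer would be expected to add a new meta element as the first child element of head, before title, giving the content type text/html and the charset UTF-8, and to discard the meta element supplied in the stylesheet.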

6.1.15 XHTML Output Method: the item-separator Parameter

The effect of the item-separator serialization parameter is described in 2 Sequence Normalization.

6.1.16 XHTML Output Method: the allow-duplicate-names Parameter

The allow-duplicate-names serialization parameter is not applicable to the XHTML output method.

6.1.17 XHTML Output Method: the json-node-output-method Parameter

The json-node-output-method serialization parameter is not applicable to the XHTML output method.