7 HTML Output Method

The HTML output method serializes the instance of the data model as HTML.

For example, the following XSLT stylesheet generates HTML output:

<xsl:stylesheet version="2.0" 
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" version="4.0"/>
<xsl:template match="/">
  <html>
    <xsl:apply-templates/>
  </html>
</xsl:template>
...
</xsl:stylesheet>

In the example, the version attribute of the xsl:output element indicates the version of the HTML Recommendation [HTML] to which the serialized result is to conform.

It is entirely the responsibility of the person or process that creates the instance of the data model to ensure that the instance of the data model conforms to the HTML Recommendation [HTML]. It is not an error if the instance of the data model is invalid HTML. Equally, it is entirely under the control of the person or process that creates the instance of the data model whether the output conforms to HTML. If the result tree is valid HTML, the serializer MUST serialize the result in a way that conforms with the version of HTML specified by the requested HTML version.

7.1 Markup for Elements

As is described in detail below, the HTML output method will not output an element differently from the XML output method unless the element is to be serialized as an HTML element. [Definition: The portion of the serialized document representing the result of serializing an element that is not to be serialized as an HTML element is known as an XML Island.] [Definition: An element node is serialized as an HTML element if

  • the expanded QName of the element has a null namespace URI, or

  • the requested HTML version has the value 5.0 and the expanded QName of the element is in the XHTML namespace.

]

If the element is to be serialized as an HTML element, but the local part of the expanded QName is not recognized as the name of an HTML element, the element MUST be output in the same way as a non-empty, inline element such as span. In particular:

  1. Any namespace node in the result tree for the XML namespace is ignored by the HTML output method. In addition, if the requested HTML version is 5.0, any element node that has a prefix and is in the XHTML namespace, MathML namespace, or SVG namespace MUST be serialized with an unprefixed element name. The serializer MUST serialize an attribute with the name xmlns whose value is equal to the namespace URI of the element node, unless an ancestor element in the serialized result already has an attribute named xmlns with the same value, and no intervening element has an attribute named xmlns with a different value. If the element node has a namespace node for the default namespace whose value is not equal to the namespace URI of the element node, the namespace node is ignored. The serializer MUST NOT serialize a namespace declaration for the namespace node declaring the element node's prefix, unless an attribute of the element node has the same prefix. For namespace nodes in the result tree that are not ignored, the HTML output method MUST represent these namespaces using attributes named xmlns or xmlns:prefix in the same way as the XML output method would represent them when the version parameter is set to 1.0.

  2. If the result tree contains elements or attributes whose names have a non-null namespace URI, the HTML output method MUST generate namespace-prefixed QNames for these nodes in the same way as the XML output method would do when the version parameter is set to 1.0.

  3. Where special rules are defined later in this section for serializing specific HTML elements and attributes, these rules MUST NOT be applied to an element that is not to be serialized as an HTML element or an attribute whose name has a non-null namespace URI. However, the generic rules for the HTML output method that apply to all elements and attributes, for example the rules for escaping special characters in the text and the rules for indentation, MUST be used also for namespaced elements and attributes.

  4. When serializing an element whose name is not defined in the HTML specification, but that is to be serialized as an HTML element, the HTML output method MUST apply the same rules (for example, indentation rules) as when serializing a span element. The descendants of such an element MUST be serialized as if they were descendants of a span element.

  5. When serializing an element whose name is in a non-null namespace, the HTML output method MUST apply the same rules (for example, indentation rules) as when serializing a div element. The descendants of such an element MUST be serialized as if they were descendants of a div element, except for the influence of the cdata-section-elements serialization parameter on any text node children of the element.

The HTML output method MUST NOT output an end-tag for an empty element if either the element type has an empty content model and the value of the requested HTML version is less than 5.0, or the element is a void element and the value of the requested HTML version is 5.0.

For HTML 4.0, the element types that have an empty content model are area, base, basefont, br, col, embed, frame, hr, img, input, isindex, link, meta and param. For HTML5, the void elements are area, base, br, col, embed, hr, img, input, keygen, link, meta, param, source, track and wbr. It is implementation-defined whether the basefont, frame and isindex elements, which are not part of HTML5, are considered to be void elements when the requested HTML version has the value 5.0.

For example, an element written as <br/> or <br></br> in an XSLT stylesheet MUST be output as <br>.

Note:

The markup generation step of the phases of serialization only creates start tags and end tags for the HTML output method, never XML-style empty element tags. As such, a serializer MUST serialize an HTML element that has no children, but whose content model is not empty, using a pair of adjacent start and end element tags, or as a solitary start tag if permitted by the context.

For any element node that is to be serialized as an HTML element, the HTML output method MUST compare the local part of the name of the element node with the names of HTML elements making the comparison without regard to case. If the local part of the name of the element node compares equal to that of any HTML element, the element node MUST be recognized as being that kind of HTML element. For example, elements named br, BR or Br MUST all be recognized as the HTML br element and output without an end-tag.
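The end-tag and name-recognition rules above can be illustrated with a short, non-normative Python sketch. The function names and the legacy_void_in_html5 switch are hypothetical, introduced only for illustration; the switch stands in for the implementation-defined choice about basefont, frame and isindex. The sketch assumes a childless element in no namespace with no attributes.

```python
# Element names taken from the lists given in the text above.
HTML4_EMPTY = frozenset({
    "area", "base", "basefont", "br", "col", "embed", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
})
HTML5_VOID = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "keygen", "link", "meta", "param", "source", "track", "wbr",
})
# Not part of HTML5; treating these as void under 5.0 is implementation-defined.
LEGACY_EMPTY = frozenset({"basefont", "frame", "isindex"})


def omits_end_tag(local_name, html_version=4.0, legacy_void_in_html5=False):
    """Return True if an empty element with this name gets no end-tag."""
    name = local_name.lower()  # names are compared without regard to case
    if html_version < 5.0:
        return name in HTML4_EMPTY
    if name in HTML5_VOID:
        return True
    return legacy_void_in_html5 and name in LEGACY_EMPTY


def serialize_empty_element(local_name, html_version=4.0):
    """Serialize a childless, attribute-less element (assumed: no namespace)."""
    if omits_end_tag(local_name, html_version):
        return f"<{local_name}>"
    # Never an XML-style empty-element tag such as <p/>.
    return f"<{local_name}></{local_name}>"
```

Under these assumptions, serialize_empty_element("BR") yields a solitary start tag, while serialize_empty_element("p") yields an adjacent start and end tag pair.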

The HTML output method MUST NOT perform escaping for any text node descendant, nor for any attribute of an element node descendant, of a script or style element.

For example, a script element created by an XQuery direct element constructor or an XSLT literal result element, such as:

<script>if (a &lt; b) foo()</script>

or

<script><![CDATA[if (a < b) foo()]]></script>

MUST be output as

<script>if (a < b) foo()</script>
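As a non-normative illustration of the rule above, the following Python sketch escapes text everywhere except beneath a script or style element. The function name and the representation of ancestors as a list of element names are assumptions made for the example, as is the particular set of characters escaped elsewhere; all elements are assumed to be serialized as HTML elements.

```python
RAW_TEXT_ELEMENTS = frozenset({"script", "style"})


def serialize_text(text, ancestor_names):
    """Serialize a text node, given the names of its ancestor elements."""
    if any(name.lower() in RAW_TEXT_ELEMENTS for name in ancestor_names):
        return text  # no escaping beneath script or style
    # Escape "&" first so that the entities added next are not re-escaped.
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;"))
```

With this sketch, the same text node is written verbatim when its ancestors include script, and with &lt; when it appears inside an ordinary element such as p.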

A common requirement is to output a script element as shown in the example below:

<script type="application/ecmascript">
      document.write ("<em>This won't work</em>")
</script>

This is invalid HTML, for the reasons explained in section B.3.2 of the [HTML] 4.01 specification. Nevertheless, it is possible to output this fragment, using either of the following constructs:

Firstly, by use of a script element created by an XQuery direct element constructor or an XSLT literal result element:

<script type="application/ecmascript">
      document.write ("<em>This won't work</em>")
</script>

Secondly, by constructing the markup from ordinary text characters:

<script type="application/ecmascript">
      document.write ("&lt;em&gt;This won't work&lt;/em&gt;")
</script>

As the [HTML] specification points out, the correct way to write this is to use the escape conventions for the specific scripting language. For JavaScript, it can be written as:

<script type="application/ecmascript">
      document.write ("&lt;em&gt;This will work&lt;\/em&gt;")
</script>

The [HTML] 4.01 specification also shows examples of how to write this in various other scripting languages. The escaping MUST be done manually; it will not be done by the serializer.

7.2 Writing Attributes

The HTML output method MUST NOT escape "<" characters occurring in attribute values.

A boolean attribute is an attribute with only a single allowed value in any of the HTML DTDs or that is specified to be a boolean attribute by HTML5 (see [HTML5]), where the allowed value is equal without regard to case to the name of the attribute. The HTML output method MUST output any boolean attribute in minimized form if and only if the value of the attribute node actually is equal to the name of the attribute making the comparison without regard to case.

For example, a start-tag created using the following XQuery direct element constructor or XSLT literal result element

<OPTION selected="selected">

MUST be output as

<OPTION selected>

The HTML output method MUST NOT escape a & character occurring in an attribute value immediately followed by a { character (see Section B.7.1 of the HTML Recommendation [HTML]).

For example, a start-tag created using the following XQuery direct element constructor or XSLT literal result element

<BODY bgcolor='&amp;{{randomrbg}};'>

MUST be output as

<BODY bgcolor='&{randomrbg};'>
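The attribute rules of this section can be sketched in Python as follows. This is a non-normative illustration: the function name is hypothetical, BOOLEAN_ATTRIBUTES is a small illustrative subset rather than the full list derived from the HTML DTDs or HTML5, and the choice to always delimit values with double quotes is an assumption.

```python
import re

# Illustrative subset only; a real serializer would derive the full list
# from the HTML DTDs or from HTML5.
BOOLEAN_ATTRIBUTES = frozenset(
    {"checked", "disabled", "multiple", "readonly", "selected"}
)


def serialize_attribute(name, value):
    """Serialize one attribute of an element serialized as an HTML element."""
    # Minimize only if the value equals the attribute name, ignoring case.
    if name.lower() in BOOLEAN_ATTRIBUTES and value.lower() == name.lower():
        return name
    # Escape "&" unless it is immediately followed by "{"; leave "<" alone.
    escaped = re.sub(r"&(?!\{)", "&amp;", value)
    escaped = escaped.replace('"', "&quot;")
    return f'{name}="{escaped}"'
```

For instance, serialize_attribute("selected", "selected") returns the minimized form, whereas a value such as "yes" is written out in full.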

See 7.4 The Influence of Serialization Parameters upon the HTML Output Method for additional directives on how attributes MAY be written.

7.3 Writing Character Data

The HTML output method MAY output a character using a character entity reference in preference to using a numeric character reference, if an entity is defined for the character in the version of HTML that the output method is using. Entity references and character references SHOULD be used only where the character is not present in the selected encoding, or where the visual representation of the character is unclear (as with &nbsp;, for example).

When outputting a sequence of whitespace characters in the instance of the data model, within an element where whitespace characters are treated normally (but not in elements such as pre and textarea), the HTML output method MAY represent it using any sequence of whitespace characters that will be treated in the same way by an HTML user agent. See section 3.5 of [XHTML Modularization] for some additional information on handling of whitespace by an HTML user agent for versions of HTML prior to HTML5, and see [HTML5] for information on the handling of whitespace characters by an HTML5 user agent.

Note:

The terms space character and white_space character defined in HTML5 do not match the definition of whitespace character in this specification.

Certain characters are permitted in XML but not in HTML prior to HTML5. For example, the control characters #x7F-#x9F are permitted in both XML 1.0 and XML 1.1, and the control characters #x1-#x8, #xB, #xC and #xE-#x1F are permitted in XML 1.1, but none of these is permitted in HTML prior to HTML5. It is a serialization error [err:SERE0014] to use the HTML output method if such characters appear in the instance of the data model and the value of the requested HTML version is less than 5.0. The serializer MUST signal the error.

The HTML output method MUST terminate processing instructions with > rather than ?>. It is a serialization error [err:SERE0015] to use the HTML output method when > appears within a processing instruction in the data model instance being serialized.
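The two error conditions above can be illustrated with a non-normative Python sketch. The SerializationError class and both function names are hypothetical; only the error codes and the character ranges come from the text.

```python
import re

# Control characters listed above as not permitted in HTML prior to HTML5.
_PRE_HTML5_FORBIDDEN = re.compile(r"[\x01-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f]")


class SerializationError(Exception):
    """Hypothetical exception carrying a serialization error code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def check_characters(text, html_version):
    """Signal SERE0014 for forbidden characters when the version is below 5.0."""
    if html_version < 5.0 and _PRE_HTML5_FORBIDDEN.search(text):
        raise SerializationError("SERE0014", "character not permitted in HTML")
    return text


def serialize_processing_instruction(target, data):
    """Terminate the processing instruction with '>' rather than '?>'."""
    if ">" in data:
        raise SerializationError("SERE0015", "'>' in processing instruction")
    return f"<?{target} {data}>" if data else f"<?{target}>"
```

The tab character (#x9) is not in the forbidden ranges, so it passes the check for any version.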

7.4 The Influence of Serialization Parameters upon the HTML Output Method

7.4.1 HTML Output Method: the version and html-version Parameters

The html-version or the version serialization parameter indicates the version of the HTML Recommendation [HTML] or [HTML5] to which the serialized result is to conform. [Definition: If the html-version serialization parameter is not absent, the requested HTML version is the value of the html-version serialization parameter; otherwise, it is the value of the version serialization parameter.] If the serializer does not support the version of HTML specified by the requested HTML version, it MUST signal a serialization error [err:SESU0013].

This document provides the normative definition of serialization for the HTML output method if the requested HTML version has the lexical form of a value of type decimal whose value is 1.0 or greater, but no greater than 5.0. For any other value of the version parameter, the behavior is implementation-defined. In that case the implementation-defined behavior MAY supersede all other requirements of this recommendation.
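The definition of the requested HTML version, and the range for which this document is normative, might be sketched as follows. This is a non-normative Python illustration; the function names, the use of a dictionary for serialization parameters, and the modelling of an absent parameter as a missing key or None are all assumptions.

```python
import re
from decimal import Decimal

# Lexical form of a decimal: optional sign, digits, optional fraction;
# no exponent is allowed.
_DECIMAL_LEXICAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def requested_html_version(params):
    """html-version takes precedence; otherwise fall back to version."""
    html_version = params.get("html-version")
    return html_version if html_version is not None else params.get("version")


def is_normatively_defined(version):
    """True if the value is a decimal no less than 1.0 and no greater than 5.0."""
    if version is None or not _DECIMAL_LEXICAL.fullmatch(version):
        return False
    return Decimal("1.0") <= Decimal(version) <= Decimal("5.0")
```

A serializer built along these lines would still need to signal [err:SESU0013] for any requested HTML version it does not support, which the sketch does not attempt.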

7.4.2 HTML Output Method: the encoding Parameter

The encoding parameter specifies the encoding to be used. Serializers are REQUIRED to support values of UTF-8 and UTF-16. A serialization error [err:SESU0007] occurs if an output encoding other than UTF-8 or UTF-16 is requested and the serializer does not support that encoding. The serializer MUST signal the error.

It is possible that the instance of the data model will contain a character that cannot be represented in the encoding that the serializer is using for output. In this case, if the character occurs in a context where HTML recognizes character references, then the character MUST be output as a character entity reference or decimal numeric character reference; otherwise (for example, in a script or style element or in a comment), the serializer MUST signal a serialization error [err:SERE0008].
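A non-normative Python sketch of the encoding rules follows. It relies on the standard library's xmlcharrefreplace error handler, which substitutes decimal numeric character references for unencodable characters. The SerializationError class, the function name and the references_recognized flag are hypothetical, and the sketch deliberately ignores the escaping of markup characters.

```python
import codecs


class SerializationError(Exception):
    """Hypothetical exception carrying a serialization error code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def encode_text(text, encoding, references_recognized=True):
    """Encode text, falling back to numeric character references if allowed."""
    try:
        codecs.lookup(encoding)
    except LookupError:
        raise SerializationError("SESU0007", f"unsupported encoding {encoding}") from None
    if references_recognized:
        # Unencodable characters become decimal references such as &#233;.
        return text.encode(encoding, errors="xmlcharrefreplace")
    try:
        # For example, inside script, style or a comment.
        return text.encode(encoding)
    except UnicodeEncodeError:
        raise SerializationError("SERE0008", "character not representable") from None
```

Passing references_recognized=False models a context such as a script element, where a character reference would not be recognized and the error is signalled instead.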

See 7.4.13 HTML Output Method: the include-content-type Parameter regarding how this parameter is used with the include-content-type parameter.

7.4.3 HTML Output Method: the indent and suppress-indentation Parameters

If the indent parameter has one of the values yes, true or 1, then the HTML output method MAY add or remove whitespace as it serializes the result tree, if it observes the following constraints.

  • Whitespace MUST NOT be added other than before or after an element, or adjacent to an existing whitespace character.

  • Whitespace MUST NOT be added or removed adjacent to an inline element. The inline elements are those included in the %inline category of any of the HTML 4.01 DTDs or those elements defined to be phrasing elements in HTML5, as well as the ins and del elements if they are used as inline elements (i.e., if they do not contain element children).

  • Whitespace MUST NOT be added or removed inside a formatted element, the formatted elements being pre, script, style, title, and textarea.

  • Whitespace characters MUST NOT be added in the content of an element whose expanded QName matches a member of the list of expanded QNames in the value of the suppress-indentation parameter. The expanded QName of an element node is considered to match a member of the list of expanded QNames if:

Note:

The effect of the above constraints is to ensure any insertion or deletion of whitespace would not affect how a conforming HTML user agent would render the output, assuming the serialized document does not refer to any HTML style sheets.

Note that the HTML definition of whitespace is different from the XML definition (see section 9.1 of the [HTML] specification).

7.4.4 HTML Output Method: the cdata-section-elements Parameter

The cdata-section-elements parameter is not applicable to the HTML output method, except in the case of XML Islands.

7.4.5 HTML Output Method: the omit-xml-declaration and standalone Parameters

The omit-xml-declaration and standalone parameters are not applicable to the HTML output method.

7.4.6 HTML Output Method: the doctype-system and doctype-public Parameters

If the doctype-public or doctype-system parameters are specified, then the HTML output method MUST output a document type declaration. If the doctype-public parameter is specified, then the output method MUST output PUBLIC followed by the specified public identifier; if the doctype-system parameter is also specified, it MUST also output the specified system identifier following the public identifier. If the doctype-system parameter is specified but the doctype-public parameter is not specified, then the output method MUST output SYSTEM followed by the specified system identifier.

If the value of the requested HTML version is 5.0, the doctype-public and doctype-system serialization parameters are both absent, the first element node child of the document node that is to be serialized is to be serialized as an HTML element, the local part of the QName of which is equal to the string HTML, without regard to case, and any text node that precedes that element node in document order contains only whitespace characters, then the HTML output method MUST output a document type declaration, with no public or system identifier.

If the HTML output method MUST output a document type declaration, it MUST be serialized immediately before the first element, if any, and the name following <!DOCTYPE MUST be HTML or html.
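The rules for the document type declaration can be summarized in a non-normative Python sketch. The function name and its parameters are hypothetical. The sketch assumes that the first element is serialized as an HTML element and that only whitespace text precedes it, and it assumes the identifiers contain no double quotation marks.

```python
def doctype_declaration(doctype_public=None, doctype_system=None,
                        html_version=4.0, root_local_name=None):
    """Return the document type declaration to emit, or None if there is none."""
    if doctype_public is not None:
        decl = f'<!DOCTYPE html PUBLIC "{doctype_public}"'
        if doctype_system is not None:
            decl += f' "{doctype_system}"'  # system identifier follows the public one
        return decl + ">"
    if doctype_system is not None:
        return f'<!DOCTYPE html SYSTEM "{doctype_system}">'
    # HTML5 case: both parameters absent and the first element is named html.
    if (html_version == 5.0 and root_local_name is not None
            and root_local_name.lower() == "html"):
        return "<!DOCTYPE html>"
    return None
```

The name following <!DOCTYPE is written here as html; HTML would be equally acceptable under the rule above.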

7.4.7 HTML Output Method: the undeclare-prefixes Parameter

The undeclare-prefixes parameter is not applicable to the HTML output method.

7.4.8 HTML Output Method: the normalization-form Parameter

The normalization-form parameter is applicable to the HTML output method. The values NFC and none MUST be supported by the serializer. A serialization error [err:SESU0011] results if the value of the normalization-form parameter specifies a normalization form that is not supported by the serializer; the serializer MUST signal the error.
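As a non-normative illustration, a serializer written in Python could map the parameter onto the standard unicodedata module as sketched below. The SerializationError class and the function name are hypothetical, and the set of forms supported beyond NFC and none is simply what the Python standard library happens to offer.

```python
import unicodedata


class SerializationError(Exception):
    """Hypothetical exception carrying a serialization error code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


# Forms that unicodedata.normalize implements.
SUPPORTED_FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def apply_normalization_form(text, form):
    """Normalize text according to the normalization-form parameter."""
    if form == "none":
        return text
    if form in SUPPORTED_FORMS:
        return unicodedata.normalize(form, text)
    raise SerializationError("SESU0011", f"unsupported normalization form {form}")
```

In this sketch a request for a form that the library does not implement, such as fully-normalized, results in [err:SESU0011].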

7.4.9 HTML Output Method: the media-type Parameter

The media-type parameter is applicable to the HTML output method. See 3 Serialization Parameters for more information. See 7.4.13 HTML Output Method: the include-content-type Parameter regarding how this parameter is used with the include-content-type parameter.

7.4.10 HTML Output Method: the use-character-maps Parameter

The use-character-maps parameter is applicable to the HTML output method. See 11 Character Maps for more information.

7.4.11 HTML Output Method: the byte-order-mark Parameter

The byte-order-mark parameter is applicable to the HTML output method. See 3 Serialization Parameters for more information.

7.4.12 HTML Output Method: the escape-uri-attributes Parameter

If the escape-uri-attributes parameter has one of the values yes, true or 1, the HTML output method MUST apply URI escaping to URI attribute values, except that relative URIs MUST NOT be absolutized.

Note:

This escaping is deliberately confined to non-ASCII characters, because escaping of ASCII characters is not always appropriate, for example when URIs or URI fragments are interpreted locally by the HTML user agent. Even in the case of non-ASCII characters, escaping can sometimes cause problems. More precise control of URI escaping is therefore available by setting escape-uri-attributes to no, and controlling the escaping of URIs by using methods defined in Section 6.2 fn:encode-for-uri FO31 and Section 6.3 fn:iri-to-uri FO31.
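The effect described in the note can be illustrated with a non-normative Python sketch. It assumes, beyond what this section states, that URI escaping first normalizes the value to NFC and then percent-encodes each character outside the printable ASCII range as its UTF-8 octets, in the manner of fn:escape-html-uri. The function name is hypothetical, and deciding which attributes are URI attributes is not shown.

```python
import unicodedata


def escape_uri_attribute(value):
    """Percent-encode non-ASCII characters; leave printable ASCII untouched."""
    value = unicodedata.normalize("NFC", value)  # assumed first step
    out = []
    for ch in value:
        if 32 <= ord(ch) <= 126:
            out.append(ch)  # printable ASCII, including space, is kept as-is
        else:
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)
```

Relative URIs pass through unchanged apart from this escaping; the sketch never resolves them against a base URI.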

7.4.13 HTML Output Method: the include-content-type Parameter

If there is a head element, and the include-content-type parameter has one of the values yes, true or 1, the HTML output method MUST add a meta element as the first child element of the head element specifying the character encoding actually used.

For example,

<HEAD>
<META http-equiv="Content-Type" content="text/html; charset=EUC-JP">
...

The content type MUST be set to the value given for the media-type parameter.

If a meta element has been added to the head element as described above, then any existing meta element child of the head element having an http-equiv attribute with the value "Content-Type", making the comparison without regard to case after first stripping leading and trailing spaces from the value of the attribute solely for the purposes of comparison, MUST be discarded.

Note:

This process removes possible parameters in the attribute value. For example,

<meta http-equiv="Content-Type" 
      content="text/html;version='3.0'"/>

in the data model instance would be replaced by,

<meta http-equiv="Content-Type" 
      content="text/html;charset=utf-8"/>
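The insertion and replacement of the meta element can be sketched in Python as follows. This is a non-normative illustration: both function names are hypothetical, and existing meta elements are represented, purely for the example, as dictionaries of attribute names to values.

```python
def content_type_meta(media_type, encoding):
    """Build the meta element added as the first child of head."""
    return (f'<meta http-equiv="Content-Type" '
            f'content="{media_type}; charset={encoding}">')


def drop_existing_content_type(meta_elements):
    """Discard meta elements whose http-equiv is Content-Type.

    The comparison ignores case and leading and trailing spaces.
    """
    return [
        attrs for attrs in meta_elements
        if attrs.get("http-equiv", "").strip(" ").lower() != "content-type"
    ]
```

A meta element without an http-equiv attribute, such as one naming the author, is left in place by this sketch.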

7.4.14 HTML Output Method: the item-separator Parameter

The effect of the item-separator serialization parameter is described in 2 Sequence Normalization.

7.4.15 HTML Output Method: the allow-duplicate-names Parameter

The allow-duplicate-names serialization parameter is not applicable to the HTML output method.

7.4.16 HTML Output Method: the json-node-output-method Parameter

The json-node-output-method serialization parameter is not applicable to the HTML output method.