The HTML output method serializes the instance of the data model as HTML.
For example, the following XSL stylesheet generates HTML output:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" version="4.0"/>
  <xsl:template match="/">
    <html>
      <xsl:apply-templates/>
    </html>
  </xsl:template>
  ...
</xsl:stylesheet>
In the example, the version attribute of the xsl:output element indicates the version of the HTML Recommendation [HTML] to which the serialized result is to conform.
It is entirely the responsibility of the person or process that creates the instance of the data model to ensure that the instance of the data model conforms to the HTML Recommendation [HTML]. It is not an error if the instance of the data model is invalid HTML. Equally, it is entirely under the control of the person or process that creates the instance of the data model whether the output conforms to HTML. If the result tree is valid HTML, the serializer MUST serialize the result in a way that conforms with the version of HTML specified by the requested HTML version.
As is described in detail below, the HTML output method will not output an element differently from the XML output method unless the element is to be serialized as an HTML element. [Definition: The portion of the serialized document representing the result of serializing an element, that is not to be serialized as an HTML element, is known as an XML Island.] [Definition: An element node is serialized as an HTML element if
the expanded QName of the element has a null namespace URI, regardless of the value of the requested HTML version, or
the value of the requested HTML version is 5.0 or greater, and the element node is in the XHTML namespace.]
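The definition above can be read as a two-branch test. The following is a minimal sketch, not a normative implementation: the function name is hypothetical, a null namespace URI is modelled as None or an empty string, and the requested HTML version is assumed to be passed as a decimal string.

```python
from decimal import Decimal

XHTML_NS = "http://www.w3.org/1999/xhtml"


def is_serialized_as_html_element(namespace_uri, requested_html_version):
    """Return True if an element node would be serialized as an HTML element.

    Hypothetical helper: namespace_uri is None or "" for a null namespace.
    """
    # Branch 1: a null namespace URI qualifies regardless of the version.
    if not namespace_uri:
        return True
    # Branch 2: the element is in the XHTML namespace and the requested
    # HTML version is 5.0 or greater.
    return (namespace_uri == XHTML_NS
            and Decimal(requested_html_version) >= Decimal("5.0"))
```

Under this sketch, an element in any other namespace (for example SVG) is not serialized as an HTML element, so its serialized form would be an XML Island.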
If the element is to be serialized as an HTML element, but the local part of the expanded QName is not recognized as the name of an HTML element, the element MUST be output in the same way as a non-empty, inline element such as span. In particular:
Any namespace node in the result tree for the XML namespace is ignored by the HTML output method.
In addition, if the requested HTML version is 5.0, any element node that has a prefix and is in the XHTML namespace, MathML namespace, or SVG namespace MUST be serialized with an unprefixed element name. The serializer MUST serialize an attribute with the name xmlns whose value is equal to the namespace URI of the element node, unless an ancestor element in the serialized result already has an attribute named xmlns with the same value, and no intervening element has an attribute named xmlns with a different value.
If the element node has a namespace node for the default namespace whose value is not equal to the namespace URI of the element node, the namespace node is ignored.
The serializer MUST NOT serialize a namespace declaration for the namespace node declaring the element node's prefix, unless an attribute of the element node has the same prefix.
For namespace nodes in the result tree that are not ignored, the HTML output method MUST represent these namespaces using attributes named xmlns or xmlns:prefix in the same way as the XML output method would represent them when the version parameter is set to 1.0.
If the result tree contains elements or attributes whose names have a non-null namespace URI, the HTML output method MUST generate namespace-prefixed QNames for these nodes in the same way as the XML output method would do when the version parameter is set to 1.0.
Where special rules are defined later in this section for serializing specific HTML elements and attributes, these rules MUST NOT be applied to an element that is not to be serialized as an HTML element or an attribute whose name has a non-null namespace URI. However, the generic rules for the HTML output method that apply to all elements and attributes, for example the rules for escaping special characters in the text and the rules for indentation, MUST be used also for namespaced elements and attributes.
When serializing an element whose name is not defined in the HTML specification, but that is to be serialized as an HTML element, the HTML output method MUST apply the same rules (for example, indentation rules) as when serializing a span element. The descendants of such an element MUST be serialized as if they were descendants of a span element.
When serializing an element whose name is in a non-null namespace, the HTML output method MUST apply the same rules (for example, indentation rules) as when serializing a div element. The descendants of such an element MUST be serialized as if they were descendants of a div element, except for the influence of the cdata-section-elements serialization parameter on any text node children of the element.
The HTML output method MUST NOT output an end-tag for an empty element if the element type has an empty content model, and the value of the requested HTML version is less than 5.0, or the element is a void element and the value of the requested HTML version is 5.0.
For HTML 4.0, the element types that have an empty content model are area, base, basefont, br, col, embed, frame, hr, img, input, isindex, link, meta and param.
For HTML5, the void elements are area, base, br, col, embed, hr, img, input, keygen, link, meta, param, source, track and wbr. It is implementation-defined whether the basefont, frame and isindex elements, which are not part of HTML5, are considered to be void elements when the requested HTML version has the value 5.0.
For example, an element written as <br/> or <br></br> in an XSLT stylesheet MUST be output as <br>.
Note:
The markup generation step of the phases of serialization only creates start tags and end tags for the HTML output method, never XML-style empty element tags. As such, a serializer MUST serialize an HTML element that has no children, but whose content model is not empty, using a pair of adjacent start and end element tags, or as a solitary start tag if permitted by the context.
For any element node that is to be serialized as an HTML element, the HTML output method MUST compare the local part of the name of the element node with the names of HTML elements, making the comparison without regard to case. If the local part of the name of the element node compares equal to that of any HTML element, the element node MUST be recognized as being that kind of HTML element.
For example, elements named br, BR or Br MUST all be recognized as the HTML br element and output without an end-tag.
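The end-tag rule and the case-insensitive name comparison can be combined in a short sketch. The function names are hypothetical, the element sets are copied from the lists above, and the implementation-defined elements (basefont, frame, isindex) are assumed not to be void when the requested HTML version is 5.0. The sketch only handles elements that have no children.

```python
from decimal import Decimal

# Element names taken from the lists given in the text above.
HTML4_EMPTY = frozenset({
    "area", "base", "basefont", "br", "col", "embed", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
})
HTML5_VOID = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input", "keygen",
    "link", "meta", "param", "source", "track", "wbr",
})


def omits_end_tag(local_name, requested_html_version):
    """Decide whether a childless HTML element is written without an end-tag."""
    name = local_name.lower()  # names are compared without regard to case
    if Decimal(requested_html_version) < Decimal("5.0"):
        return name in HTML4_EMPTY
    # Assumption: basefont, frame and isindex are treated as non-void here.
    return name in HTML5_VOID


def serialize_childless_element(local_name, requested_html_version):
    """Serialize an element with no attributes and no children."""
    if omits_end_tag(local_name, requested_html_version):
        return f"<{local_name}>"
    # Never an XML-style empty element tag: use adjacent start and end tags.
    return f"<{local_name}></{local_name}>"
```

For instance, `serialize_childless_element("BR", "4.0")` yields `<BR>`, while a childless `p` is written as `<p></p>`.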
The HTML output method MUST NOT perform escaping for any text node descendant, nor for any attribute of an element node descendant, of a script or style element.
For example, a script element created by an XQuery direct element constructor or an XSLT literal result element, such as:
<script>if (a &lt; b) foo()</script>
or
<script><![CDATA[if (a < b) foo()]]></script>
MUST be output as
<script>if (a < b) foo()</script>
A common requirement is to output a script element as shown in the example below:
<script type="application/ecmascript"> document.write ("<em>This won't work</em>") </script>
This is invalid HTML, for the reasons explained in section B.3.2 of the [HTML] 4.01 specification. Nevertheless, it is possible to output this fragment, using either of the following constructs:
Firstly, by use of a script element created by an XQuery direct element constructor or an XSLT literal result element:
<script type="application/ecmascript"> document.write ("&lt;em&gt;This won't work&lt;/em&gt;") </script>
Secondly, by constructing the markup from ordinary text characters:
&lt;script type="application/ecmascript"&gt; document.write ("&lt;em&gt;This won't work&lt;/em&gt;") &lt;/script&gt;
As the [HTML] specification points out, the correct way to write this is to use the escape conventions for the specific scripting language. For JavaScript, it can be written as:
<script type="application/ecmascript"> document.write ("<em>This will work<\/em>") </script>
The [HTML] 4.01 specification also shows examples of how to write this in various other scripting languages. The escaping MUST be done manually; it will not be done by the serializer.
The HTML output method MUST NOT escape "<" characters occurring in attribute values.
A boolean attribute is an attribute with only a single allowed value in any of the HTML DTDs or that is specified to be a boolean attribute by HTML5 (see [HTML5]), where the allowed value is equal without regard to case to the name of the attribute. The HTML output method MUST output any boolean attribute in minimized form if and only if the value of the attribute node actually is equal to the name of the attribute making the comparison without regard to case.
For example, a start-tag created using the following XQuery direct element constructor or XSLT literal result element
<OPTION selected="selected">
MUST be output as
<OPTION selected>
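The minimization rule for boolean attributes can be sketched as follows. This is an illustration only: the function name is hypothetical, the attribute set is a small subset of the boolean attributes defined by the HTML DTDs and HTML5, and escaping of the attribute value is left out.

```python
# Illustrative subset only; a real serializer would use the full set of
# boolean attributes from the HTML DTDs and HTML5.
BOOLEAN_ATTRIBUTES = frozenset(
    {"checked", "disabled", "multiple", "readonly", "selected"}
)


def serialize_attribute(name, value):
    """Write an attribute, minimizing it only when the rule allows."""
    is_boolean = name.lower() in BOOLEAN_ATTRIBUTES
    # Minimize if and only if the value equals the name, ignoring case.
    if is_boolean and value.lower() == name.lower():
        return name
    return f'{name}="{value}"'
```

With this sketch, `selected="selected"` and `selected="SELECTED"` are both minimized to `selected`, whereas `selected="yes"` is written in full.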
The HTML output method MUST NOT escape a & character occurring in an attribute value immediately followed by a { character (see Section B.7.1 of the HTML Recommendation [HTML]).
For example, a start-tag created using the following XQuery direct element constructor or XSLT literal result element
<BODY bgcolor='&amp;{{randomrbg}};'>
MUST be output as
<BODY bgcolor='&{randomrbg};'>
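The two attribute-value rules above (no escaping of "<", and no escaping of "&" when it is immediately followed by "{") can be illustrated with a small sketch. The function name is hypothetical, and the sketch assumes attribute values are delimited by double quotes, so that the double quote character is written as &quot;.

```python
import re

# Matches an ampersand that is NOT immediately followed by "{".
_AMP_NOT_BEFORE_BRACE = re.compile(r"&(?!\{)")


def escape_html_attribute_value(value):
    """Escape an attribute value for the HTML output method (sketch)."""
    # "&{" is left alone; every other "&" becomes "&amp;".
    escaped = _AMP_NOT_BEFORE_BRACE.sub("&amp;", value)
    # "<" is deliberately not escaped. Assumption: values are written
    # inside double quotes, so only '"' needs to be replaced here.
    return escaped.replace('"', "&quot;")
```

Applied to the value `&{randomrbg};`, the function returns the string unchanged, which matches the BODY example above.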
See 7.4 The Influence of Serialization Parameters upon the HTML Output Method for additional directives on how attributes MAY be written.
The HTML output method MAY output a character using a character entity reference in preference to using a numeric character reference, if an entity is defined for the character in the version of HTML that the output method is using. Entity references and character references SHOULD be used only where the character is not present in the selected encoding, or where the visual representation of the character is unclear (as with &nbsp;, for example).
When outputting a sequence of whitespace characters in the instance of the data model, within an element where whitespace characters are treated normally (but not in elements such as pre and textarea), the HTML output method MAY represent it using any sequence of whitespace characters that will be treated in the same way by an HTML user agent. See section 3.5 of [XHTML Modularization] for some additional information on handling of whitespace by an HTML user agent for versions of HTML prior to HTML5, and see [HTML5] for information on the handling of whitespace characters by an HTML5 user agent.
Note:
The terms space character and white_space character defined in HTML5 do not match the definition of whitespace character in this specification.
Certain characters are permitted in XML, but not in HTML prior to HTML5. For example, the control characters #x7F-#x9F are permitted in both XML 1.0 and XML 1.1, and the control characters #x1-#x8, #xB, #xC and #xE-#x1F are permitted in XML 1.1, but none of these is permitted in HTML prior to HTML5.
It is a serialization error [err:SERE0014] to use the HTML output method if such characters appear in the instance of the data model and the value of the requested HTML version is less than 5.0. The serializer MUST signal the error.
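A check for these characters might look like the sketch below. It covers only the control-character ranges named as examples in the preceding paragraph, so it should not be read as a complete list of characters disallowed in HTML. The SerializationError class and the function names are hypothetical.

```python
from decimal import Decimal


class SerializationError(Exception):
    """Hypothetical exception carrying a serialization error code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def _is_named_control_character(ch):
    # Only the ranges cited in the text: #x7F-#x9F, #x1-#x8, #xB, #xC,
    # and #xE-#x1F.
    cp = ord(ch)
    return (0x7F <= cp <= 0x9F
            or 0x01 <= cp <= 0x08
            or cp in (0x0B, 0x0C)
            or 0x0E <= cp <= 0x1F)


def check_characters(text, requested_html_version):
    """Raise SERE0014 if text has a named control character before HTML5."""
    if Decimal(requested_html_version) >= Decimal("5.0"):
        return
    for ch in text:
        if _is_named_control_character(ch):
            raise SerializationError(
                "SERE0014",
                f"character U+{ord(ch):04X} is not permitted in HTML "
                f"{requested_html_version}",
            )
```

Tab, line feed and carriage return fall outside the listed ranges and pass the check.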
The HTML output method MUST terminate processing instructions with > rather than ?>. It is a serialization error [err:SERE0015] to use the HTML output method when > appears within a processing instruction in the data model instance being serialized.
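As a minimal sketch of the processing-instruction rule, assuming a hypothetical SerializationError class and a serializer that receives the target and the content as separate strings:

```python
class SerializationError(Exception):
    """Hypothetical exception carrying a serialization error code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def serialize_processing_instruction(target, data):
    """Write a processing instruction the way the HTML output method does."""
    # A ">" in the content would end the instruction early in HTML.
    if ">" in data:
        raise SerializationError(
            "SERE0015", "'>' appears within a processing instruction"
        )
    # Terminated with ">" rather than "?>".
    return f"<?{target} {data}>" if data else f"<?{target}>"
```

For example, a processing instruction with target `php` and content `echo 1` is written as `<?php echo 1>`.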
version and html-version Parameters
The html-version or the version serialization parameter indicates the version of the HTML Recommendation [HTML] or [HTML5] to which the serialized result is to conform.
[Definition: If the html-version serialization parameter is not absent, the requested HTML version is the value of the html-version serialization parameter; otherwise, it is the value of the version serialization parameter.]
If the serializer does not support the version of HTML specified by the requested HTML version, it MUST signal a serialization error [err:SESU0013].
This document provides the normative definition of serialization for the HTML output method if the requested HTML version has the lexical form of a value of type decimal whose value is 1.0 or greater, but no greater than 5.0. For any other value of version parameter, the behavior is implementation-defined. In that case the implementation-defined behavior MAY supersede all other requirements of this recommendation.
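The selection of the requested HTML version, and the range for which this document is normative, can be sketched as below. The function names are hypothetical, the serialization parameters are assumed to be held in a plain dictionary, and an absent parameter is modelled as a missing key. Leading and trailing whitespace in the value is not handled.

```python
import re
from decimal import Decimal

# Lexical form of a decimal value: optional sign, digits, optional fraction.
_DECIMAL_LEXICAL = re.compile(r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)")


def requested_html_version(params):
    """html-version wins when present; otherwise fall back to version."""
    html_version = params.get("html-version")
    if html_version is not None:
        return html_version
    return params.get("version")


def is_normatively_defined(version):
    """True if version is a decimal in the range 1.0 to 5.0 inclusive."""
    if version is None or not _DECIMAL_LEXICAL.fullmatch(version):
        return False
    return Decimal("1.0") <= Decimal(version) <= Decimal("5.0")
```

A value such as `5.1`, or one that is not a decimal at all, falls outside the range, and the behavior is then implementation-defined.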
encoding Parameter
The encoding parameter specifies the encoding to be used. Serializers are REQUIRED to support values of UTF-8 and UTF-16. A serialization error [err:SESU0007] occurs if an output encoding other than UTF-8 or UTF-16 is requested and the serializer does not support that encoding. The serializer MUST signal the error.
It is possible that the instance of the data model will contain a character that cannot be represented in the encoding that the serializer is using for output. In this case, if the character occurs in a context where HTML recognizes character references, then the character MUST be output as a character entity reference or decimal numeric character reference; otherwise (for example, in a script or style element or in a comment), the serializer MUST signal a serialization error [err:SERE0008].
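The encoding fallback described above maps naturally onto Python's codec error handlers. The following sketch is only an illustration: the SerializationError class, the function name and the `raw_context` flag (standing for script, style and comment content) are assumptions, and markup escaping is outside its scope.

```python
import codecs


class SerializationError(Exception):
    """Hypothetical exception carrying a serialization error code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def encode_text(text, encoding, raw_context=False):
    """Encode already-escaped text, falling back to character references."""
    try:
        codecs.lookup(encoding)
    except LookupError as exc:
        raise SerializationError(
            "SESU0007", f"unsupported encoding {encoding!r}"
        ) from exc
    if raw_context:
        # No character references are recognized here, so an
        # unrepresentable character is an error.
        try:
            return text.encode(encoding)
        except UnicodeEncodeError as exc:
            raise SerializationError(
                "SERE0008", "character cannot be represented"
            ) from exc
    # Elsewhere, unrepresentable characters become decimal numeric
    # character references such as &#233;.
    return text.encode(encoding, errors="xmlcharrefreplace")
```

This sketch always uses decimal numeric character references; a serializer is also free to use a character entity reference where one is defined.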
See 7.4.13 HTML Output Method: the include-content-type Parameter regarding how this parameter is used with the include-content-type parameter.
indent and suppress-indentation Parameters
If the indent parameter has one of the values yes, true or 1, then the HTML output method MAY add or remove whitespace as it serializes the result tree, if it observes the following constraints.
Whitespace MUST NOT be added other than before or after an element, or adjacent to an existing whitespace character.
Whitespace MUST NOT be added or removed adjacent to an inline element. The inline elements are those included in the %inline category of any of the HTML 4.01 DTDs or those elements defined to be phrasing elements in HTML5, as well as the ins and del elements if they are used as inline elements (i.e., if they do not contain element children).
Whitespace MUST NOT be added or removed inside a formatted element, the formatted elements being pre, script, style, title, and textarea.
Whitespace characters MUST NOT be added in the content of an element whose expanded QName matches a member of the list of expanded QNames in the value of the suppress-indentation parameter. The expanded QName of an element node is considered to match a member of the list of expanded QNames if:
the two expanded QNames are equal;
the expanded QNames both have null namespace URIs, and the local parts of the two QNames are equal without regard to case; or
the value of the requested HTML version is 5.0, the local parts of the two QNames are equal without regard to case and one QName has a null namespace URI and the namespace URI of the other is equal to the XHTML namespace URI.
Note:
The effect of the above constraints is to ensure any insertion or deletion of whitespace would not affect how a conforming HTML user agent would render the output, assuming the serialized document does not refer to any HTML style sheets.
Note that the HTML definition of whitespace is different from the XML definition (see section 9.1 of the [HTML] specification).
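The three matching conditions for the suppress-indentation parameter can be written as a single predicate. This is a sketch under stated assumptions: the function name is hypothetical, an expanded QName is modelled as a `(namespace_uri, local_name)` tuple, and a null namespace URI is modelled as None.

```python
from decimal import Decimal

XHTML_NS = "http://www.w3.org/1999/xhtml"


def qnames_match(element_qname, listed_qname, requested_html_version):
    """Apply the three suppress-indentation matching conditions."""
    ns_a, local_a = element_qname
    ns_b, local_b = listed_qname
    # Condition 1: the two expanded QNames are equal.
    if element_qname == listed_qname:
        return True
    same_local = local_a.lower() == local_b.lower()
    # Condition 2: both namespaces are null, local parts equal ignoring case.
    if ns_a is None and ns_b is None:
        return same_local
    # Condition 3: version 5.0, one namespace null and the other XHTML.
    if Decimal(requested_html_version) == Decimal("5.0"):
        return same_local and {ns_a, ns_b} == {None, XHTML_NS}
    return False
```

Note that two names that are both in the XHTML namespace and differ only in case do not match under any of the three conditions.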
cdata-section-elements Parameter
The cdata-section-elements parameter is not applicable to the HTML output method, except in the case of XML Islands.
omit-xml-declaration and standalone Parameters
The omit-xml-declaration and standalone parameters are not applicable to the HTML output method.
doctype-system and doctype-public Parameters
If the doctype-public or doctype-system parameters are specified, then the HTML output method MUST output a document type declaration.
If the doctype-public parameter is specified, then the output method MUST output PUBLIC followed by the specified public identifier; if the doctype-system parameter is also specified, it MUST also output the specified system identifier following the public identifier. If the doctype-system parameter is specified but the doctype-public parameter is not specified, then the output method MUST output SYSTEM followed by the specified system identifier.
If the value of the requested HTML version is 5.0, the doctype-public and doctype-system serialization parameters are both absent, the first element node child of the document node that is to be serialized is to be serialized as an HTML element, the local part of the QName of which is equal to the string HTML, without regard to case, and any text node that precedes that element node in document order contains only whitespace characters, then the HTML output method MUST output a document type declaration, with no public or system identifier.
If the HTML output method MUST output a document type declaration, it MUST be serialized immediately before the first element, if any, and the name following <!DOCTYPE MUST be HTML or html.
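The doctype rules can be summarized in a small decision function. This is a sketch rather than a definitive implementation: the function name is hypothetical, the identifiers are assumed to be written in double quotes, the name after `<!DOCTYPE` is fixed as `html`, and the flag `html5_root_is_html` stands for the whole set of HTML5 conditions in the text (requested version 5.0, an html first element serialized as an HTML element, and only whitespace text before it).

```python
def doctype_declaration(doctype_public=None, doctype_system=None,
                        html5_root_is_html=False):
    """Return the document type declaration to emit, or None for no output."""
    if doctype_public is not None:
        declaration = f'<!DOCTYPE html PUBLIC "{doctype_public}"'
        # The system identifier, if any, follows the public identifier.
        if doctype_system is not None:
            declaration += f' "{doctype_system}"'
        return declaration + ">"
    if doctype_system is not None:
        return f'<!DOCTYPE html SYSTEM "{doctype_system}">'
    # Both parameters absent: only the HTML5 case produces a declaration.
    if html5_root_is_html:
        return "<!DOCTYPE html>"
    return None
```

The returned string would be written immediately before the first element of the serialized document.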
undeclare-prefixes Parameter
The undeclare-prefixes parameter is not applicable to the HTML output method.
normalization-form Parameter
The normalization-form parameter is applicable to the HTML output method. The values NFC and none MUST be supported by the serializer. A serialization error [err:SESU0011] results if the value of the normalization-form parameter specifies a normalization form that is not supported by the serializer; the serializer MUST signal the error.
media-type Parameter
The media-type parameter is applicable to the HTML output method. See 3 Serialization Parameters for more information. See 7.4.13 HTML Output Method: the include-content-type Parameter regarding how this parameter is used with the include-content-type parameter.
use-character-maps Parameter
The use-character-maps parameter is applicable to the HTML output method. See 11 Character Maps for more information.
byte-order-mark Parameter
The byte-order-mark parameter is applicable to the HTML output method. See 3 Serialization Parameters for more information.
escape-uri-attributes Parameter
If the escape-uri-attributes parameter has one of the values yes, true or 1, the HTML output method MUST apply URI escaping to URI attribute values, except that relative URIs MUST NOT be absolutized.
Note:
This escaping is deliberately confined to non-ASCII characters, because escaping of ASCII characters is not always appropriate, for example when URIs or URI fragments are interpreted locally by the HTML user agent. Even in the case of non-ASCII characters, escaping can sometimes cause problems. More precise control of URI escaping is therefore available by setting escape-uri-attributes to no, and controlling the escaping of URIs by using methods defined in Section 6.2 fn:encode-for-uri [FO31] and Section 6.3 fn:iri-to-uri [FO31].
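To illustrate the note, the sketch below percent-encodes only non-ASCII characters and leaves every ASCII character untouched, including characters such as `#`, `?` and space. It is a simplification, not the normative URI escaping procedure: the function name is hypothetical, the encoding of each escaped character as UTF-8 octets is an assumption of the sketch, and any Unicode normalization step is omitted.

```python
def escape_uri_attribute(value):
    """Percent-encode non-ASCII characters in a URI attribute value."""
    pieces = []
    for ch in value:
        if ord(ch) > 0x7F:
            # Assumption: each non-ASCII character is escaped as the
            # percent-encoded octets of its UTF-8 representation.
            pieces.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
        else:
            pieces.append(ch)
    # The value is never resolved against a base URI, so a relative
    # URI stays relative.
    return "".join(pieces)
```

For example, the relative reference `caf\u00e9.html#top` becomes `caf%C3%A9.html#top`, with the fragment separator preserved.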
include-content-type Parameter
If there is a head element, and the include-content-type parameter has one of the values yes, true or 1, the HTML output method MUST add a meta element as the first child element of the head element specifying the character encoding actually used.
For example,
<HEAD> <META http-equiv="Content-Type" content="text/html; charset=EUC-JP"> ...
The content type MUST be set to the value given for the media-type parameter.
If a meta element has been added to the head element as described above, then any existing meta element child of the head element having an http-equiv attribute with the value "Content-Type", making the comparison without regard to case after first stripping leading and trailing spaces from the value of the attribute solely for the purposes of comparison, MUST be discarded.
Note:
This process removes possible parameters in the attribute value. For example,
<meta http-equiv="Content-Type" content="text/html;version='3.0'"/>
in the data model instance would be replaced by,
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
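The two halves of this rule (building the new meta element and recognizing an existing one to discard) can be sketched as follows. Both function names are hypothetical, and the exact layout of the content attribute, with a semicolon and a space before charset, simply follows the EUC-JP example above.

```python
def content_type_meta(media_type, encoding):
    """Build the meta element inserted as the first child of head."""
    return (f'<META http-equiv="Content-Type" '
            f'content="{media_type}; charset={encoding}">')


def is_content_type_meta(http_equiv):
    """Decide whether an existing meta element must be discarded."""
    # Leading and trailing spaces are stripped for the comparison only,
    # and the comparison ignores case.
    return http_equiv.strip(" ").lower() == "content-type"
```

An existing element whose http-equiv value is ` content-TYPE ` would therefore be discarded, while one with `refresh` is kept.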
item-separator Parameter
The effect of the item-separator serialization parameter is described in 2 Sequence Normalization.