This section summarizes data model construction from an Infoset for each kind of information item. General notes occur elsewhere.
The document information item is required. A Document Node is constructed for each document information item.
The following infoset properties are required: [children] and [base URI].
The following infoset properties are optional: [unparsed entities].
Document Node properties are derived from the infoset as follows:
The value of the [base URI] property, if available. Note that the base URI property, if available, is always an absolute URI (if an absolute URI can be computed) though it may contain Unicode characters that are not allowed in URIs. These characters, if they occur, are present in the base-uri property and will have to be encoded and escaped by the application to obtain a URI suitable for retrieval, if retrieval is required.
In practice a [base URI] is not always known. In this case the value of the base-uri property of the document node will be the empty sequence. This is not intrinsically an error, though it may cause some operations that depend on the base URI to fail.
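The escaping step that the application has to perform before retrieval can be sketched as follows. This is a minimal illustration, not a complete IRI-to-URI conversion: the function name `escape_base_uri` and the choice of safe characters are assumptions made for this example, and `None` stands in for the empty sequence.

```python
from urllib.parse import quote

# Characters left untouched: the RFC 3986 reserved set plus "%" so that
# escapes already present in the base URI are not escaped a second time.
_URI_SAFE = ":/?#[]@!$&'()*+,;=%"


def escape_base_uri(base_uri):
    """Return a retrieval-ready URI, or None when the base URI is unknown."""
    if base_uri is None:  # models the empty sequence
        return None
    # quote() UTF-8 encodes and percent-escapes every character that is
    # neither unreserved nor listed in the safe set.
    return quote(base_uri, safe=_URI_SAFE)
```

A caller would apply this only at the point of retrieval; the base-uri property itself keeps the unescaped characters.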
The sequence of nodes constructed from the information items found in the [children] property.
For each element, processing instruction, and comment found in the [children] property, a corresponding Element, Processing Instruction, or Comment Node is constructed and that sequence of nodes is used as the value of the children property.
If present among the [children], the document type declaration information item is ignored.
If the [unparsed entities] property is present and is not the empty set, the values of the unparsed entity information items must be used to support the dm:unparsed-entity-system-id and dm:unparsed-entity-public-id accessors.
The internal structure of the values of the unparsed-entities property is implementation defined.
The concatenation of the string-values of all its Text Node descendants in document order. If the document has no such descendants, the zero-length string.
The dm:string-value of the node as an xs:untypedAtomic value.
The document-uri property holds the absolute URI for the resource from which the document node was constructed, if one is available and can be made absolute. For example, if a collection of documents is returned by the fn:collection function, the document-uri property may serve to distinguish between them even though each has the same base-uri property.
If the document-uri is not the empty sequence, then the following constraint must hold: evaluating fn:doc() with the document-uri as its argument must return the document node that provided the value of the document-uri property.
In other words, for any Document Node $arg, either fn:document-uri($arg) must return the empty sequence or fn:doc(fn:document-uri($arg)) must return $arg.
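The constraint can be checked mechanically. The sketch below assumes a hypothetical in-memory `DocumentStore` whose `doc` method stands in for fn:doc(); neither class is part of the data model, and `None` again models the empty sequence.

```python
class DocumentNode:
    def __init__(self, document_uri=None):
        # None models the empty sequence
        self.document_uri = document_uri


class DocumentStore:
    """Hypothetical registry standing in for fn:doc()."""

    def __init__(self):
        self._by_uri = {}

    def add(self, node):
        # Only documents with a known document-uri are retrievable by URI.
        if node.document_uri is not None:
            self._by_uri[node.document_uri] = node
        return node

    def doc(self, uri):
        return self._by_uri[uri]


def satisfies_document_uri_constraint(store, node):
    """True if document-uri is empty or doc(document-uri) is the same node."""
    if node.document_uri is None:
        return True
    try:
        return store.doc(node.document_uri) is node
    except KeyError:
        return False
```

The check uses identity (`is`) rather than equality because the constraint requires the same node to be returned, not an equivalent copy.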
The element information items are required. An Element Node is constructed for each element information item.
The following infoset properties are required: [namespace name], [local name], [children], [attributes], [in-scope namespaces], [base URI], and [parent].
Element Node properties are derived from the infoset as follows:
The value of the [base URI] property, if available. Note that the base URI property, if available, is always an absolute URI (if an absolute URI can be computed) though it may contain Unicode characters that are not allowed in URIs. These characters, if they occur, are present in the base-uri property and will have to be encoded and escaped by the application to obtain a URI suitable for retrieval, if retrieval is required.
In practice a [base URI] is not always known. In this case the value of the base-uri property of the Element Node will be the empty sequence. This is not intrinsically an error, though it may cause some operations that depend on the base URI to fail.
An xs:QName constructed from the [prefix], [local name], and [namespace name] properties.
The node that corresponds to the value of the [parent] property or the empty sequence if there is no parent.
All Element Nodes constructed from an infoset have the type xs:untyped.
The sequence of nodes constructed from the information items found in the [children] property.
For each element, processing instruction, comment, and maximal sequence of adjacent character information items found in the [children] property, a corresponding Element, Processing Instruction, Comment, or Text Node is constructed and that sequence of nodes is used as the value of the children property.
Because the data model requires that all general entities be expanded, there will never be unexpanded entity reference information item children.
A set of Attribute Nodes constructed from the attribute information items appearing in the [attributes] property. This includes all of the "special" attributes (xml:lang, xml:space, xsi:type, etc.) but does not include namespace declarations (because they are not attributes).
Default and fixed attributes provided by the DTD are added to the [attributes] and are therefore included in the data model attributes of an element.
A set of Namespace Nodes constructed from the namespace information items appearing in the [in-scope namespaces] property. Implementations that do not support Namespace Nodes may simply preserve the relevant bindings in this property.
Implementations may ignore namespace information items for namespaces which are not known to be used. A namespace is known to be used if:
It appears in the expanded QName of the node-name of the element.
It appears in the expanded QName of the node-name of any of the element's attributes.
Note: applications may rely on namespaces that are not known to be used, for example when QNames are used in content and that content does not have a type of xs:QName. Such applications may have difficulty processing data models where some namespaces have been ignored.
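The "known to be used" test can be sketched as a filter over the in-scope bindings. The function names and the dictionary representation (prefix mapped to namespace name) are assumptions made for this illustration; an implementation is free to keep every binding instead.

```python
def known_used_namespaces(element_namespace, attribute_namespaces):
    """Namespace names appearing in the element's or its attributes' names."""
    used = {element_namespace, *attribute_namespaces}
    used.discard(None)  # names in no namespace contribute nothing
    return used


def prune_in_scope(in_scope, element_namespace, attribute_namespaces):
    """Keep only the bindings whose namespace is known to be used.

    in_scope maps prefix -> namespace name for one element.
    """
    used = known_used_namespaces(element_namespace, attribute_namespaces)
    return {prefix: uri for prefix, uri in in_scope.items() if uri in used}
```

A binding that is referenced only by a QName in content is dropped by this filter, which is exactly the situation the note warns about.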
All Element Nodes constructed from an infoset have a nilled property of "false".
The string-value is constructed from the character information item [children] of the element and all its descendants. The precise rules for selecting significant character information items and constructing characters from them are described in 6.7.3 Construction from an Infoset of 6.7 Text Nodes.
This process is equivalent to concatenating the dm:string-values of all of the Text Node descendants of the resulting Element Node.
If the element has no such descendants, the string-value is the empty string.
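The equivalence with concatenating Text Node descendants can be illustrated with Python's standard `xml.etree.ElementTree`, used here only as a convenient stand-in for a data model instance. The helper name `string_value` is an assumption for this sketch; by default this parser discards comments and processing instructions, so only character data contributes.

```python
import xml.etree.ElementTree as ET


def string_value(element):
    """Concatenate all descendant text in document order."""
    # itertext() yields the text of the element and of every descendant,
    # interleaved in document order; an element with no text yields nothing.
    return "".join(element.itertext())
```

For example, `string_value(ET.fromstring("<p>one<b>two</b>three</p>"))` gives `"onetwothree"`, and an empty element gives the empty string.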
The string-value as an xs:untypedAtomic.
All Element Nodes constructed from an infoset have an is-id property of "false".
All Element Nodes constructed from an infoset have an is-idrefs property of "false".
The attribute information items are required. An Attribute Node is constructed for each attribute information item.
The following infoset properties are required: [namespace name], [local name], [normalized value], [attribute type], and [owner element].
Attribute Node properties are derived from the infoset as follows:
An xs:QName constructed from the [prefix], [local name], and [namespace name] properties.
The Element Node that corresponds to the value of the [owner element] property or the empty sequence if there is no owner.
The value xs:untypedAtomic.
The [normalized value] of the attribute.
The attribute’s typed-value is its dm:string-value as an xs:untypedAtomic.
If the attribute is named xml:id and its [attribute type] property does not have the value ID, then [xml:id] processing is performed. This will ensure that the value does have the type ID and that it is properly normalized. If an error is encountered during xml:id processing, an implementation may raise a dynamic error. The is-id property is always true for attributes named xml:id.
If the [attribute type] property has the value ID, true, otherwise false.
If the [attribute type] property has the value IDREF or IDREFS, true, otherwise false.
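The derivation of the is-id and is-idrefs properties can be sketched as two small predicates. The function names and the representation of an attribute as a namespace name, a local name, and an [attribute type] string are assumptions made for this illustration.

```python
XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace"


def is_id(namespace_name, local_name, attribute_type):
    """is-id: always true for xml:id, otherwise true only for type ID."""
    if namespace_name == XML_NAMESPACE and local_name == "id":
        return True
    return attribute_type == "ID"


def is_idrefs(attribute_type):
    """is-idrefs: true when the [attribute type] is IDREF or IDREFS."""
    return attribute_type in ("IDREF", "IDREFS")
```

The xml:id case is tested on the expanded name rather than on the prefix, since the prefix `xml` is bound to a fixed namespace.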
The namespace information items are required.
The following infoset properties are required: [prefix], [namespace name].
Namespace Node properties are derived from the infoset as follows:
The [prefix] property.
The [namespace name] property.
The element in whose [in-scope namespaces] property the namespace information item appears, if the implementation exposes any mechanism for accessing the dm:parent accessor of Namespace Nodes.
A Processing Instruction Node is constructed for each processing instruction information item that is not ignored.
The following infoset properties are required: [target], [content], [base URI], and [parent].
Processing Instruction Node properties are derived from the infoset as follows:
The value of the [target] property.
The value of the [content] property.
The value of the [base URI] property, if available. Note that the base URI property, if available, is always an absolute URI (if an absolute URI can be computed) though it may contain Unicode characters that are not allowed in URIs. These characters, if they occur, are present in the base-uri property and will have to be encoded and escaped by the application to obtain a URI suitable for retrieval, if retrieval is required.
In practice a [base URI] is not always known. In this case the value of the base-uri property of the Processing Instruction Node will be the empty sequence. This is not intrinsically an error, though it may cause some operations that depend on the base URI to fail.
The node corresponding to the value of the [parent] property.
There are no Processing Instruction Nodes for processing instructions that are children of a document type declaration information item.
The comment information items are optional.
A Comment Node is constructed for each comment information item.
The following infoset properties are required: [content] and [parent].
Comment Node properties are derived from the infoset as follows:
The value of the [content] property.
The node corresponding to the value of the [parent] property.
There are no Comment Nodes for comments that are children of a document type declaration information item.
The character information items are required. A Text Node is constructed for each maximal sequence of character information items in document order.
The following infoset properties are required: [character code] and [parent].
The following infoset properties are optional: [element content whitespace].
A sequence of character information items is maximal if it satisfies the following constraints:
All of the information items in the sequence have the same parent.
The sequence consists of adjacent character information items uninterrupted by other types of information item.
No other such sequence exists that contains any of the same character information items and is longer.
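The three constraints amount to grouping consecutive character information items under one parent. A minimal sketch, assuming the [children] of a single parent are given as `(kind, payload)` tuples in document order with the hypothetical kind `"char"` marking character information items:

```python
from itertools import groupby


def build_children(items):
    """Collapse each maximal run of character items into one text entry."""
    nodes = []
    # groupby splits the list into consecutive runs that share the same key,
    # so each "char" run is maximal: it cannot be extended in either direction.
    for is_char, run in groupby(items, key=lambda item: item[0] == "char"):
        if is_char:
            nodes.append(("text", "".join(ch for _, ch in run)))
        else:
            nodes.extend(run)
    return nodes
```

Because the input holds the children of one parent only, the same-parent constraint is satisfied by construction.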
Text Node properties are derived from the infoset as follows:
A string comprised of characters that correspond to the [character code] properties of each of the character information items.
If the resulting Text Node consists entirely of whitespace and the [element content whitespace] property of the character information items used to construct this node is true, the content of the Text Node is the zero-length string. Text Nodes are only allowed to be empty if they have no parents; an empty Text Node will be discarded when its parent is constructed, if it has a parent.
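The whitespace rule can be sketched as follows. The function name and the representation of each character information item as a `(character, element_content_whitespace)` pair are assumptions for this example; a flag of `None` models a case where the optional property is not available.

```python
# The four characters that XML treats as whitespace.
XML_WHITESPACE = frozenset(" \t\r\n")


def text_node_content(char_items):
    """Return the Text Node content for one maximal run of character items."""
    content = "".join(ch for ch, _ in char_items)
    all_whitespace = all(ch in XML_WHITESPACE for ch, _ in char_items)
    all_flagged = all(flag is True for _, flag in char_items)
    if char_items and all_whitespace and all_flagged:
        # Element content whitespace yields the zero-length string, so the
        # node is discarded when its parent is constructed.
        return ""
    return content
```

Whitespace whose flag is false or unknown is kept, so whitespace in mixed content is preserved.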
The content of the Text Node is not necessarily normalized as described in the [Character Model]. It is the responsibility of data producers to provide normalized text, and the responsibility of applications to make sure that operations do not de-normalize text.
The node corresponding to the value of the [parent] property.