4 Processing Model

Inclusion as defined in this document is a specific type of [XML Information Set] transformation.

[Definition: The input for the inclusion transformation consists of a source infoset.] [Definition: The output, called the result infoset, is a new infoset which merges the source infoset with the infosets of resources identified by URI references or IRI references appearing in xi:include elements.] Thus a mechanism to resolve URIs or IRIs and return the identified resources as infosets is assumed. Well-formed XML entities that do not have defined infosets (e.g. an external entity with multiple top-level elements) are outside the scope of this specification, either for use as a source infoset or the result infoset.

xi:include elements in the source infoset serve as inclusion transformation instructions. [Definition: The information items located by the xi:include element are called the top-level included items.] [Definition: The top-level included items together with their attributes, namespaces, and descendants, are called the included items.] The result infoset is essentially a copy of the source infoset, with each xi:include element and its descendants replaced by its corresponding included items.

4.1 The Include Location

The value of the href attribute, after escaping according to 4.1.1 Escaping of href attribute values, is interpreted as either a URI reference or an IRI reference. The base URI for relative URIs or IRIs is the base URI of the xi:include element as specified in [XML Base]. [Definition: The URI or IRI resulting from resolution of the normalized value of the href attribute (or the empty string if no attribute appears) to absolute URI or IRI form is called the include location.]
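The resolution step can be illustrated with a minimal sketch. It assumes the href value has already been escaped as described in 4.1.1 and that the base URI of the xi:include element is known; the function name include_location and the example URIs are hypothetical.

```python
from urllib.parse import urljoin

def include_location(href, base_uri):
    """Resolve an href value (None if the attribute is absent) to absolute form."""
    # An absent attribute is treated as the empty string, as in the definition above.
    return urljoin(base_uri, href or "")

base = "http://example.org/docs/main.xml"
print(include_location("chapters/ch1.xml", base))  # http://example.org/docs/chapters/ch1.xml
print(include_location(None, base))                # http://example.org/docs/main.xml
```

With no href value, the include location is the base URI itself, which is the case the following paragraph discusses.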

The absence of a value for the href attribute, either by the appearance of href="" or by the absence of the href attribute, represents a case which may be incompatible with certain implementation strategies. For instance, an XInclude processor might not have a textual representation of the source infoset to include as parse="text", or it may be unable to access another part of the document using parse="xml" and an xpointer because of streamability concerns. An implementation may choose to treat any or all absences of a value for the href attribute as resource errors. Implementations should document the conditions under which such resource errors occur.

4.1.1 Escaping of href attribute values

The value of this attribute is an XML resource identifier as defined in [XML 1.1] section 4.2.2 "External Entities", which is interpreted as [Definition: an IRI Reference as defined in RFC 3987 [IETF RFC 3987]], after the escaping procedure described in [XML 1.1] section 4.2.2 is applied. If necessary for the implementation, the value may be further converted to a URI reference as described in [XML 1.1].

4.1.2 Using XInclude with Content Negotiation

The use of a mechanism like HTTP [IETF RFC 2616] content negotiation introduces an additional level of potential complexity into the use of XInclude. Developers who use XInclude in situations where content negotiation is likely or possible should be aware of the possibility that they will be including content that may differ structurally from the content they expected, even if that content is XML. For example, a single URI or IRI may variously return a raw XML representation of the resource, an XSL-FO [XSL-FO] representation, or an XHTML [XHTML] representation, as well as versions in different character encodings or languages.

Authors whose XInclude processing depends on the receipt of a particular vocabulary of XML should use the accept and accept-language attributes to increase the probability that the resource is provided in the expected format.
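As a rough sketch, a processor fetching over HTTP might pass the two attribute values through as request headers. The helper name negotiation_headers is hypothetical, and the sketch assumes the attribute values map directly onto the Accept and Accept-Language headers.

```python
def negotiation_headers(accept=None, accept_language=None):
    """Build HTTP request headers from the accept and accept-language attribute values."""
    headers = {}
    if accept is not None:
        headers["Accept"] = accept
    if accept_language is not None:
        headers["Accept-Language"] = accept_language
    return headers

# Ask for raw XML in French rather than, say, an XHTML rendering.
print(negotiation_headers(accept="application/xml", accept_language="fr"))
```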

4.2 Included Items when parse="xml"

When parse="xml", the include location is dereferenced, the resource is fetched, and an infoset is created by parsing the resource as if the media type were application/xml (including character encoding determination).

Note:

The specifics of how an infoset is created are intentionally unspecified, to allow for flexibility by implementations and to avoid defining a particular processing model for components of the XML architecture. Particulars of whether DTD or XML schema validation are performed, for example, are not constrained by this specification.

Note:

The character encodings of the including and included resources can be different. This does not affect the resulting infoset, but might need to be taken into account during any subsequent serialization.

Resources that are unavailable for any reason (for example the resource doesn't exist, connection difficulties or security restrictions prevent it from being fetched, the URI scheme isn't a fetchable one, the resource is in an unsupported encoding, or the resource is determined through implementation-specific mechanisms not to be XML) result in a resource error. Resources that contain non-well-formed XML result in a fatal error.

Note:

The distinction between a resource error and a fatal error is somewhat implementation-dependent. Consider an include location returning an HTML document, perhaps as an error page. One processor might determine that no infoset can be created from the resource (by examining the media type, for example) and raise a resource error, enabling fallback behavior. Another processor with no such heuristics might attempt to parse the non-XML resource as XML and encounter a well-formedness (fatal) error.
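The two error classes might be separated as in the following sketch. The exception names and the fetch callable are assumptions made for illustration; a processor with media type heuristics would raise the resource error earlier, before any parsing is attempted.

```python
import xml.etree.ElementTree as ET

class ResourceError(Exception):
    """Recoverable: fallback behavior may apply."""

class FatalError(Exception):
    """Not recoverable: processing stops."""

def acquire_xml(fetch, location):
    """Fetch and parse a resource; `fetch` is a hypothetical callable returning bytes."""
    try:
        data = fetch(location)
    except OSError as exc:
        # The resource is unavailable (missing, unreachable, forbidden, ...).
        raise ResourceError(f"cannot fetch {location}") from exc
    try:
        return ET.fromstring(data)
    except ET.ParseError as exc:
        # The resource was fetched but is not well-formed XML.
        raise FatalError(f"{location} is not well-formed XML") from exc
```

A processor without heuristics, like this one, reports an HTML error page as a fatal error because it only discovers the problem while parsing.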

[Definition: xi:include elements in this infoset are recursively processed to create the acquired infoset. For an intra-document reference (via xpointer attribute) the source infoset is used as the acquired infoset.]

[Definition: The portion of the acquired infoset to be included is called the inclusion target.] The document information item of the acquired infoset serves as the inclusion target unless the xpointer attribute is present and identifies a subresource. XPointers of the forms described in [XPointer Framework] and [XPointer element() scheme] must be supported. XInclude processors may optionally support other forms of XPointer such as that described in [XPointer xpointer() Scheme]. An error in the XPointer is a resource error.
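To make the notion of an inclusion target concrete, here is a minimal sketch that resolves only the child-sequence form of the element() scheme, such as element(/1/2). It is not a complete XPointer implementation: the function name resolve_child_sequence is hypothetical, shorthand pointers and ID-based forms are not handled, and any failure is reported as a resource error as stated above.

```python
import xml.etree.ElementTree as ET

class ResourceError(Exception):
    """An error in the XPointer is a resource error."""

def resolve_child_sequence(root, sequence):
    """Resolve a child sequence such as '/1/2' starting at the document element."""
    try:
        steps = [int(step) for step in sequence.strip("/").split("/")]
    except ValueError as exc:
        raise ResourceError(f"malformed child sequence {sequence!r}") from exc
    if steps[0] != 1:
        # A document has exactly one document element, so the first step must be 1.
        raise ResourceError(f"no such element: {sequence!r}")
    node = root
    for step in steps[1:]:
        children = list(node)  # child elements only, in document order
        if not 1 <= step <= len(children):
            raise ResourceError(f"no such element: {sequence!r}")
        node = children[step - 1]
    return node

root = ET.fromstring("<a><b/><c><d/></c></a>")
print(resolve_child_sequence(root, "/1/2/1").tag)  # d
```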

The [XPointer xpointer() Scheme] is not specified in terms of the [XML Information Set], but instead is based on the [XPath 1.0] Data Model, because the XML Information Set had not yet been developed. The mapping between XPath node locations and information items is straightforward. However, xpointer() assumes that all entities have been expanded. Thus it is a fatal error to attempt to resolve an xpointer() scheme on a document that contains unexpanded entity reference information items.

The set of top-level included items is derived from the acquired infoset as follows.

4.2.1 Document Information Items

The inclusion target might be a document information item (for instance, when no xpointer attribute is specified, or when an XPointer specifically locates the document root). In this case, the set of top-level included items is the children of the acquired infoset's document information item, except for the document type declaration information item child, if one exists.
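Using a DOM as a stand-in for the infoset, this selection might look like the sketch below. The function name top_level_items is hypothetical, and DOM nodes only approximate information items.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

def top_level_items(document):
    """Children of the document node, without the document type declaration."""
    return [child for child in document.childNodes
            if child.nodeType != Node.DOCUMENT_TYPE_NODE]

doc = parseString("<!DOCTYPE r><!--before--><r/><?after x?>")
# The comment, the document element, and the processing instruction remain.
print([node.nodeName for node in top_level_items(doc)])
```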

Note:

The XML Information Set specification does not provide for preservation of white space outside the document element. XInclude makes no further provision to preserve this white space.

4.2.2 Multiple Nodes

The inclusion target might consist of more than a single node. In this case the set of top-level included items is the set of information items from the acquired infoset corresponding to the nodes referred to by the XPointer, in the order in which they appear in the acquired infoset.

4.2.3 Range Locations

The inclusion target might be a location set that represents a range or a set of ranges.

Each range corresponds to a set of information items in the acquired infoset. [Definition: An information item is said to be selected by a range if it occurs after (in document order) the starting point of the range and before the ending point of the range.] [Definition: An information item is said to be partially selected by a range if it contains only the starting point of the range, or only the ending point of the range.] By definition, a character information item cannot be partially selected.

The set of top-level included items is the union, in document order with duplicates removed, of the information items either selected or partially selected by the range. The children property of selected information items is not modified. The children property of partially selected information items is the set of information items that are in turn either selected or partially selected, and so on.

4.2.4 Point Locations

The inclusion target might be a location set that represents a point. In this case the set of included items is empty.

4.2.5 Element, Comment, and Processing Instruction Information Items

The inclusion target might be an element node, a comment node, or a processing instruction node, respectively representing an element information item, a comment information item, or a processing instruction information item. In this case the set of top-level included items consists of the information item corresponding to the element, comment, or processing instruction node in the acquired infoset.

4.2.6 Attribute and Namespace Declaration Information Items

It is a fatal error for the inclusion target to be an attribute node or a namespace node.

4.2.7 Inclusion Loops

When recursively processing an xi:include element, it is a fatal error to process another xi:include element with an include location and xpointer attribute value that have already been processed in the inclusion chain.

In other words, the following are all legal:

  • An xi:include element may reference the document containing the include element, when parse="text".

  • An xi:include element may identify a different part of the same local resource (same href, different xpointer).

  • Two non-nested xi:include elements may identify a resource which itself contains an xi:include element.

The following are illegal:

  • An xi:include element pointing to itself or any ancestor thereof, when parse="xml".

  • An xi:include element pointing to any include element or ancestor thereof which has already been processed at a higher level.
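One possible way to detect such loops is to carry the chain of (include location, xpointer) pairs down the recursion, as in this sketch. The names are hypothetical, and string equality of the pairs is an assumption; a real processor may need a more careful comparison of locations.

```python
class InclusionLoopError(Exception):
    """Fatal error: an include repeats a (location, xpointer) pair in its chain."""

def enter_include(chain, location, xpointer=None):
    """Return the chain extended by this include, or raise if it closes a loop."""
    key = (location, xpointer)
    if key in chain:
        raise InclusionLoopError(f"inclusion loop at {key!r}")
    # A new tuple is returned, so sibling includes never see each other's entries.
    return chain + (key,)

# Seed the chain with the source document itself.
chain = enter_include((), "http://example.org/main.xml")
chain = enter_include(chain, "http://example.org/part.xml")
# Same resource, different xpointer: allowed.
enter_include(chain, "http://example.org/main.xml", "element(/1/2)")
```

Because the chain is immutable and passed by value, two non-nested xi:include elements can reference the same resource without triggering the error.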

4.3 Included Items when parse="text"

When parse="text", the include location is dereferenced and the resource is fetched and transformed to a set of character information items. This feature facilitates the inclusion of working XML examples, as well as other text-based formats.

Resources that are unavailable for any reason (for example the resource doesn't exist, connection difficulties or security restrictions prevent it from being fetched, the URI scheme isn't a fetchable one, or the resource is in an unsupported encoding) result in a resource error.

The encoding of such a resource is determined by:

Byte sequences outside the range allowed by the encoding are a fatal error. Characters that are not permitted in XML documents also are a fatal error.

Each character obtained from the transformation of the resource is represented in the top-level included items as a character information item with the character code set to the character code in ISO 10646 encoding, and the element content whitespace set to false.

When the first character is U+FEFF and is interpreted as a Byte-Order Mark, it should be discarded. It is interpreted as a BOM in UTF-8, UTF-16, and UTF-32 encodings; it is not interpreted as a BOM in the UTF-16LE, UTF-16BE, UTF-32LE, and UTF-32BE encodings.
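The byte-order mark rule can be sketched as follows. The sketch assumes the resource has already been decoded to a string without the decoder removing a leading U+FEFF, and that the encoding label has been determined beforehand; the names are hypothetical.

```python
BOM = "\ufeff"
# Encodings in which a leading U+FEFF is interpreted as a byte-order mark.
BOM_AWARE = {"utf-8", "utf-16", "utf-32"}

def discard_bom(text, encoding):
    """Drop a leading U+FEFF only where it is interpreted as a BOM."""
    if text.startswith(BOM) and encoding.lower() in BOM_AWARE:
        return text[1:]
    return text

print(discard_bom("\ufeffabc", "UTF-8") == "abc")            # True
print(discard_bom("\ufeffabc", "UTF-16LE") == "\ufeffabc")   # True
```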

The [Character Model] discusses normalization of included text.

4.4 Fallback Behavior

XInclude processors must perform fallback behavior in the event of a resource error, as follows:

If the children of the xi:include element information item in the source infoset contain exactly one xi:fallback element, the top-level included items consist of the information items corresponding to the result of performing XInclude processing on the children of the xi:fallback element. It is a fatal error if there is zero or more than one xi:fallback element.

Note:

Fallback content is not dependent on the value of the parse attribute. The xi:fallback element can contain markup even when parse="text". Likewise, it can contain a simple string when parse="xml".
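The fallback rule might be sketched like this, using ElementTree elements in place of information items. The function and exception names are hypothetical; the returned xi:fallback element's content would still need XInclude processing before it is used.

```python
import xml.etree.ElementTree as ET

XI = "{http://www.w3.org/2001/XInclude}"

class FatalError(Exception):
    """Not recoverable: processing stops."""

def fallback_element(include):
    """After a resource error, return the single xi:fallback child of xi:include."""
    fallbacks = include.findall(XI + "fallback")
    if len(fallbacks) != 1:
        raise FatalError(f"expected exactly one xi:fallback, found {len(fallbacks)}")
    return fallbacks[0]

include = ET.fromstring(
    '<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="missing.xml">'
    "<xi:fallback><p>Not available.</p></xi:fallback>"
    "</xi:include>"
)
print([child.tag for child in fallback_element(include)])  # ['p']
```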

4.5 Creating the Result Infoset

The result infoset is a copy of the source infoset, with each xi:include element processed as follows:

The information item for the xi:include element is found. [Definition: The parent property of this item refers to an information item called the include parent.] The children property of the include parent is modified by replacing the xi:include element information item with the top-level included items. The parent property of each included item is set to the include parent.

It is a fatal error to attempt to replace an xi:include element appearing as the document (top-level) element in the source infoset with something other than a list of zero or more comments, zero or more processing instructions, and one element.
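A check for this constraint could look like the following sketch, again with DOM nodes standing in for information items. The function and exception names are hypothetical.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

class FatalError(Exception):
    """Not recoverable: processing stops."""

ALLOWED = {Node.COMMENT_NODE, Node.PROCESSING_INSTRUCTION_NODE, Node.ELEMENT_NODE}

def check_document_element_replacement(items):
    """Validate the items that would replace a top-level xi:include element."""
    if any(item.nodeType not in ALLOWED for item in items):
        raise FatalError("only comments, processing instructions, and one element are allowed")
    elements = sum(1 for item in items if item.nodeType == Node.ELEMENT_NODE)
    if elements != 1:
        raise FatalError(f"expected exactly one element, found {elements}")

kids = parseString("<r><a/><!--c--><b/>text</r>").documentElement.childNodes
check_document_element_replacement([kids[0], kids[1]])  # one element, one comment: accepted
```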

Some processors may not be able to represent an element's in-scope namespaces property if it does not include bindings for all the prefixes bound in its parent's in-scope namespaces. Such processors may therefore include additional namespace bindings inherited from the include parent in the in-scope namespaces of the included items.

The inclusion history of each top-level included item is recorded in the extension property include history. The include history property is a list of element information items, representing the xi:include elements for recursive levels of inclusion. If an include history property already appears on a top-level included item, the xi:include element information item is prepended to the list. If no include history property exists, then this property is added with the single value of the xi:include element information item.
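The bookkeeping for this property can be sketched as below, with a plain dictionary standing in for the extension properties of an information item. The function name and the dictionary representation are assumptions.

```python
def record_include(item_properties, include_element):
    """Record one level of inclusion on a top-level included item."""
    history = item_properties.get("include history")
    if history is None:
        # First inclusion of this item: start the list.
        item_properties["include history"] = [include_element]
    else:
        # The item was already included at a deeper level: prepend.
        history.insert(0, include_element)

props = {}
record_include(props, "inner xi:include")  # processed first, during recursion
record_include(props, "outer xi:include")  # processed afterwards, one level up
print(props["include history"])  # ['outer xi:include', 'inner xi:include']
```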

The included items will all appear in the result infoset. This includes unexpanded entity reference information items if they are present.

Intra-document references within xi:include elements are resolved against the source infoset. The effect of this is that the order in which xi:include elements are processed does not affect the result.

In the following example, the second include always points to the first xi:include element and not to itself, regardless of the order in which the includes are processed. Thus the result of this inclusion is two copies of something.xml, and does not produce an inclusion loop error.

<x xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="something.xml"/>
  <xi:include xpointer="xmlns(xi=http://www.w3.org/2001/XInclude)xpointer(x/xi:include[1])"
              parse="xml"/>
</x>

An XInclude processor may, at user option, suppress xml:base and/or xml:lang fixup.

4.5.1 Unparsed Entities

Any unparsed entity information item appearing in the references property of an attribute on the included items or any descendant thereof is added to the unparsed entities property of the result infoset's document information item, if it is not a duplicate of an existing member. Duplicates do not appear in the result infoset.

Unparsed entity items with the same name, system identifier, public identifier, declaration base URI, notation name, and notation are considered to be duplicates. An application may also be able to detect that unparsed entities are duplicates through other means, for instance when the URI resulting from combining the system identifier and the declaration base URI is the same.

It is a fatal error to include unparsed entity items with the same name, but which are not determined to be duplicates.
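The merge rule for unparsed entities might be sketched as follows. The class and function names are hypothetical, and duplicates are detected only by comparing the listed properties, not by the optional URI-based means mentioned above.

```python
from dataclasses import dataclass
from typing import Optional

class FatalError(Exception):
    """Not recoverable: processing stops."""

@dataclass(frozen=True)
class UnparsedEntity:
    name: str
    system_identifier: str
    public_identifier: Optional[str]
    declaration_base_uri: str
    notation_name: str
    notation: object = None  # stand-in for the notation information item

def merge_unparsed_entity(existing, entity):
    """Add `entity` to the dict `existing` (keyed by name) unless it is a duplicate."""
    current = existing.get(entity.name)
    if current is None:
        existing[entity.name] = entity
    elif current != entity:
        # Same name but different properties: not a duplicate, so a fatal error.
        raise FatalError(f"conflicting unparsed entity {entity.name!r}")

entities = {}
logo = UnparsedEntity("logo", "logo.png", None, "http://example.org/a.xml", "png")
merge_unparsed_entity(entities, logo)
merge_unparsed_entity(entities, logo)  # duplicate: silently ignored
print(len(entities))  # 1
```

The same approach would apply to notation items in 4.5.2, using the shorter list of properties given there.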

4.5.2 Notations

Any notation information item appearing in the references property of an attribute in the included items or any descendant thereof is added to the notations property of the result infoset's document information item, if it is not a duplicate of an existing member. Likewise, any notation referenced by an unparsed entity added as described in 4.5.1 Unparsed Entities, is added unless it is a duplicate. Duplicates do not appear in the result infoset.

Notation items with the same name, system identifier, public identifier, and declaration base URI are considered to be duplicates. An application may also be able to detect that notations are duplicates through other means, for instance when the URI resulting from combining the system identifier and the declaration base URI is the same.

It is a fatal error to include notation items with the same name, but which are not determined to be duplicates.

4.5.3 references Property Fixup

During inclusion, an attribute information item whose attribute type property is IDREF or IDREFS has a references property with zero or more element values from the source or included infosets. These values must be adjusted to correspond to element values that occur in the result infoset. During this process, XInclude also corrects inconsistencies between the references property and the attribute type property, which might arise in the following circumstances:

  • A document fragment contains an IDREF pointing to an element in the included document but outside the part being included. In this case there is no element in the result infoset that corresponds to the element value in the original references property.

  • A document or document fragment is not self-contained. That is, it contains IDREFs which do not refer to an element within that document or document fragment, with the intention that these references will be realized after inclusion. In this case, the value of the references property is unknown or has no value.

  • The result infoset has ID clashes - that is, more than one attribute with attribute type ID with the same normalized value. In this case, attributes with attribute type IDREF or IDREFS with the same normalized value might have different values for their references properties.

In resolving these inconsistencies, XInclude takes the attribute type property as definitive. In the result infoset, the value of the references property of an attribute information item whose attribute type property is IDREF or IDREFS is adjusted as follows:

For each token in the normalized value property, the references property contains an element information item with the same properties as the element information item in the result infoset with an attribute with attribute type ID and normalized value equal to the token. The order of the elements in the references property is the same as the order of the tokens appearing in the normalized value. If for any of the token values, no element or more than one element is found, the references property has no value.
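The adjustment can be sketched as a lookup against the IDs present in the result infoset. The function name and the elements_by_id mapping (from each ID value to the list of elements carrying it) are assumptions made for illustration; None represents "no value".

```python
def fix_references(normalized_value, elements_by_id):
    """Recompute the references property of an IDREF or IDREFS attribute."""
    references = []
    for token in normalized_value.split():
        matches = elements_by_id.get(token, [])
        if len(matches) != 1:
            # Dangling reference or ID clash: the whole property has no value.
            return None
        references.append(matches[0])
    return references

ids = {"intro": ["<section intro>"], "fig1": ["<figure A>", "<figure B>"]}
print(fix_references("intro", ids))        # ['<section intro>']
print(fix_references("intro fig1", ids))   # None (ID clash on fig1)
print(fix_references("intro gone", ids))   # None (no element with ID gone)
```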

4.5.4 Namespace Fixup

The in-scope namespaces property ensures that namespace scope is preserved through inclusion. However, after inclusion, the namespace attributes property might not provide the full list of namespace declarations necessary to interpret qualified names in attribute or element content in the result. It is therefore not recommended that XInclude processors expose namespace attributes in the result. If this is unavoidable, the implementation may add attribute information items to the namespace attributes property in order to approximate the information conveyed by in-scope namespaces.

4.5.5 Base URI Fixup

The base URI property of the acquired infoset is not changed as a result of merging the infoset, and remains unchanged after merging. Thus relative URI references in the included infoset resolve to the same URI despite being included into a document with a potentially different base URI in effect. xml:base attributes are added to the result infoset to indicate this fact.

Each element information item in the top-level included items which has a different base URI than its include parent has an attribute information item added to its attributes property. This attribute has the following properties:

  1. A namespace name of http://www.w3.org/XML/1998/namespace.

  2. A local name of base.

  3. A prefix of xml.

  4. A normalized value equal to either the base URI of the element, or an equivalent URI reference relative to the base URI of the include parent. The circumstances in which a relative URI is desirable, and how to compute such a relative URI, are implementation-dependent.

  5. A specified flag indicating that this attribute was actually specified in the start-tag of its element.

  6. An attribute type of CDATA.

  7. A references property with no value.

  8. An owner element of the information item of the element.

If an xml:base attribute information item is already present, it is replaced by the new attribute.
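A minimal sketch of this fixup with ElementTree follows. ElementTree does not track base URIs, so the sketch assumes the caller supplies the base URI of the element and of the include parent. The function name is hypothetical, and the absolute form of the URI is always used rather than an equivalent relative reference.

```python
import xml.etree.ElementTree as ET

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

def fix_base_uri(element, element_base, parent_base):
    """Add xml:base to a top-level included element whose base URI differs."""
    if element_base != parent_base:
        # set() overwrites any xml:base attribute already present.
        element.set(XML_BASE, element_base)

chapter = ET.Element("chapter")
fix_base_uri(chapter, "http://example.org/inc/ch1.xml", "http://example.org/main.xml")
print(chapter.get(XML_BASE))  # http://example.org/inc/ch1.xml
```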

4.5.6 Language Fixup

While the xml:lang attribute is described as inherited by XML, the XML Information Set makes no provision for preserving the inheritance of this property through document composition such as XInclude provides. This section introduces a language property which records the scope of xml:lang information in order to preserve it during inclusion.

An XInclude processor should augment the source infoset and the acquired infoset by adding the language property to each element information item. The value of this property is the normalized value of the xml:lang attribute appearing on that element if one exists, with xml:lang="" resulting in no value, otherwise it is the value of the language property of the element's parent element if one exists, otherwise the property has no value.

Each element information item in the top-level included items which has a different value of language than its include parent (taking case-insensitivity into account per [IETF RFC 3066]), or that has a value if its include parent is a document information item, has an attribute information item added to its attributes property. This attribute has the following properties:

  1. A namespace name of http://www.w3.org/XML/1998/namespace.

  2. A local name of lang.

  3. A prefix of xml.

  4. A normalized value equal to the language property of the element. If the language property has no value, the normalized value is the empty string.

  5. A specified flag indicating that this attribute was actually specified in the start-tag of its element.

  6. An attribute type of CDATA.

  7. A references property with no value.

  8. An owner element of the information item of the element.

If an xml:lang attribute information item is already present, it is replaced by the new attribute.
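The two steps above, computing the language property and deciding whether an xml:lang attribute is needed, might be sketched as follows. The function names are hypothetical, None stands for "no value", and case-insensitive comparison is approximated with lower().

```python
def language_of(xml_lang, parent_language):
    """Language property of an element; xml_lang is None if the attribute is absent."""
    if xml_lang is not None:
        return xml_lang or None  # xml:lang="" results in no value
    return parent_language

def lang_fixup_value(language, parent_language, parent_is_document=False):
    """Return the xml:lang value to add, or None if no attribute is needed."""
    if parent_is_document:
        return language  # an attribute is added only if the element has a value
    if (language or "").lower() == (parent_language or "").lower():
        return None
    return language or ""  # no value is expressed as xml:lang=""

print(lang_fixup_value("fr", "en"))   # fr
print(lang_fixup_value("en", "EN"))   # None
print(repr(lang_fixup_value(None, "en")))  # ''
```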

Note:

The xml:space attribute is not treated specially by XInclude.

4.5.7 Properties Preserved by the Infoset

As an infoset transformation, XInclude operates on the logical structure of XML documents, not on their text serialization. All properties of an information item described in [XML Information Set] other than those specifically modified by this specification are preserved during inclusion. The include history and language properties introduced in this specification are also preserved. Extension properties such as [XML Schemas] Post Schema Validation Infoset (PSVI) properties are discarded by default. However, an XInclude processor may, at user option, preserve these properties in the resulting infoset if they are correct according to the specification describing the semantics of the extension properties.

For instance, the PSVI validity property describes the conditions of ancestors and descendants. Modification of ancestors and descendants during the XInclude process can render the value of this property inaccurate. By default, XInclude strips this property, but by user option the property could be recalculated to obtain a semantically accurate value. Precisely how this is accomplished is outside the scope of this specification.