3 Data Model Construction

This section describes the constraints on instances of the data model.

The data model supports well-formed XML documents conforming to [Namespaces in XML] or [Namespaces in XML 1.1]. Documents that are not well-formed are, by definition, not XML. XML documents that do not conform to [Namespaces in XML] or [Namespaces in XML 1.1] are not supported (nor are they supported by [Infoset]).

In other words, the data model supports the following classes of XML documents:

This document describes how to construct an instance of the data model from an infoset ([Infoset]) or a Post Schema Validation Infoset (PSVI), the augmented infoset produced by an XML Schema validation episode.

An instance of the data model can also be constructed directly through application APIs, or from non-XML sources such as relational tables in a database. Data model construction from sources other than an Infoset or PSVI is implementation-defined. Regardless of how an instance of the data model is constructed, every node and atomic value in the data model must have a typed-value that is consistent with its type.

The data model supports some kinds of values that are not supported by [Infoset]. Examples of these are document fragments and sequences of Document Nodes. The data model also supports values that are not nodes. Examples of these are sequences of atomic values, or sequences mixing nodes and atomic values. These are necessary to be able to represent the results of intermediate expressions in the data model during expression processing.

3.1 Direct Construction

Although this document describes construction of an instance of the data model in terms of infoset properties, an infoset is not a necessary precondition for building an instance of the data model.

There are no constraints on how an instance of the data model may be constructed directly, save that the resulting instance must satisfy all of the constraints described in this document.

3.2 Construction from an Infoset

An instance of the data model can be constructed from an infoset that satisfies the following general constraints:

An instance of the data model constructed from an information set must be consistent with the description provided for each node kind.

Furthermore, construction of an instance of the data model from an Infoset is only guaranteed to be well-defined for Infosets that could have been derived from a conforming XML document.

3.3 Construction from a PSVI

An instance of the data model can be constructed from a PSVI, whose element and attribute information items have been strictly assessed, laxly assessed, or have not been assessed. Constructing an instance of the data model from a PSVI must be consistent with the description provided in this section and with the description provided for each node kind.

Data model construction requires that the PSVI provide unique names for all anonymous schema types.

Note:

[Schema Part 1] does not require all schema processors to provide unique names for anonymous schema types. In order to build an instance of the data model from a PSVI produced by a processor that does not provide the names, some post-processing will be required in order to assure that they are all uniquely identified before construction begins.

[Definition: An incompletely validated document is an XML document that has a corresponding schema but whose schema-validity assessment has resulted in one or more element or attribute information items being assigned values other than 'valid' for the [validity] property in the PSVI.]

The data model supports incompletely validated documents. Elements and attributes that are not valid are treated as having unknown types.

The most significant difference between Infoset construction and PSVI construction occurs in the area of schema type assignment. Other differences can also arise from schema processing: default attribute and element values may be provided, white space normalization of element content may occur, and the user-supplied lexical form of elements and attributes with atomic schema types may be lost.

3.3.1 Mapping PSVI Additions to Node Properties

A PSVI element or attribute information item may have a [validity] property. The [validity] property may be "valid", "invalid", or "notKnown" and reflects the outcome of schema-validity assessment. In the data model, precise schema type information is exposed for Element and Attribute Nodes that are "valid". Nodes that are not "valid" are treated as if they were simply well-formed XML and only very general schema type information is associated with them.

3.3.1.1 Element and Attribute Node Types

The precise definition of the schema type of an element or attribute information item depends on the properties of the PSVI. In the PSVI, [Schema Part 1] defines a [type definition] property as well as the [type definition namespace], [type definition name] and [type definition anonymous] properties, which are effectively short-cut terms for properties of the type definition. Further, the [element declaration] and [attribute declaration] properties are defined for elements and attributes, respectively. These declarations in turn will identify the [type definition] declared for the element or attribute. To distinguish the [type definition] given in the PSVI for the element or attribute instance from the [type definition] associated with the declaration, the former is referred to below as the actual type and the latter as the declared type of the element or attribute instance in question.

The type depends on the declared type, the actual type, and the [validity] and [validation attempted] properties in the PSVI. If:

  • The [validity] and [validation attempted] properties exist and have the values "valid" and "full", respectively, the schema type of an element or attribute information item is represented by an expanded-QName whose namespace and local name correspond to the first applicable items in the following list:

    • If the declared type exists and is a union and the actual type is (not the same as the declared type, and not a type derived from the declared type, but) one of the member types of the union, or derived from one of its member types:

      • If the {name} property of the declared type is present: the {target namespace} and {name} properties of the declared type.

      • If the {name} property of the declared type is absent: the namespace and local name of the anonymous type name supplied for the declared type.

    • If there is no declared type, and the actual type is a union, then:

      • If the {name} property of the actual type is present: the {target namespace} and {name} properties of the actual type.

      • If the {name} property of the actual type is absent: the namespace and local name of the anonymous type name supplied for the actual type.

    • Otherwise:

      • If [type definition anonymous] is false: the {target namespace} and {name} properties of the actual type.

      • If [type definition anonymous] is true: the namespace and local name of the anonymous type name supplied for the actual type.

  • The [validity] property exists and is "invalid", or the [validation attempted] property exists and is "partial", the schema type of an element is xs:anyType and the type of an attribute is xs:anySimpleType.

  • The [validity] property exists and is "notKnown", the schema type of an element is xs:anyType and the type of an attribute is xs:anySimpleType.

  • The [validity] or [validation attempted] properties do not exist, the schema type of an element is xs:untyped and the type of an attribute is xs:untypedAtomic.

The prefix associated with the type names is implementation-dependent.
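The selection rules above can be sketched in Python. This is an illustrative sketch only, not part of the data model: the `TypeDef` class, its fields, and the `schema_type` function are hypothetical names. It also assumes that a missing [validity] or [validation attempted] property counts as "do not exist", and that any combination the list does not cover falls back to the general types.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

XS = "http://www.w3.org/2001/XMLSchema"


@dataclass
class TypeDef:
    """Hypothetical stand-in for a PSVI type definition."""
    namespace: Optional[str]
    name: Optional[str]                          # None models an absent {name}
    anon_name: Optional[Tuple[str, str]] = None  # unique name supplied for an anonymous type
    is_union: bool = False
    members: tuple = ()                          # member types, if this is a union
    base: Optional["TypeDef"] = None             # type this one is derived from


def derives_from(t, ancestor):
    """True if t is ancestor itself or is derived from it."""
    while t is not None:
        if t is ancestor:
            return True
        t = t.base
    return False


def type_name(t):
    """Expanded name of t: its own name, or the supplied anonymous type name."""
    return (t.namespace, t.name) if t.name is not None else t.anon_name


def schema_type(kind, validity, attempted, actual=None, declared=None):
    """Return (namespace, local name); kind is 'element' or 'attribute'."""
    if validity is None or attempted is None:
        # Assumption: either property missing is treated as "do not exist".
        return (XS, "untyped" if kind == "element" else "untypedAtomic")
    if validity == "valid" and attempted == "full":
        if (declared is not None and declared.is_union
                and not derives_from(actual, declared)
                and any(derives_from(actual, m) for m in declared.members)):
            return type_name(declared)
        # Remaining cases (undeclared union, or "otherwise") use the actual type.
        return type_name(actual)
    # "invalid", "partial", "notKnown"; other combinations by assumption.
    return (XS, "anyType" if kind == "element" else "anySimpleType")
```

For example, an element validated against the `xs:integer` member of a declared union `my:integer-or-string` is reported with the union's name rather than `xs:integer`.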

3.3.1.2 Typed Value Determination

This section describes how the typed value of an Element or Attribute Node is computed from an element or attribute PSVI information item, where the information item has either a simple type or a complex type with simple content. For other kinds of Element Nodes, see 6.2.4 Construction from a PSVI; for other kinds of Attribute Nodes, see 6.3.4 Construction from a PSVI.

The typed value of Attribute Nodes and some Element Nodes is a sequence of atomic values. The types of the items in the typed value of a node may differ from the type of the node itself. This section describes how the typed value of a node is derived from the properties of an information item in a PSVI.

The types of the items in the typed value of a node are determined as follows. The process begins with a type, T. If the schema type of the node itself, as represented in the PSVI, is a complex type with simple content, then T is the {content type} of the schema type of the node; otherwise, T is the schema type of the node itself. For each primitive or ordinary simple type T, the W3C XML Schema specification defines a function M mapping the lexical representation of a value onto the value itself.

Note:

For atomic and list types, the mapping is the “lexical mapping” defined for T in [Schema Part 2]; for union types, the mapping is the lexical mapping defined in [Schema Part 2] modified as appropriate by any applicable rules in [Schema Part 1]. The mapping, so modified, is a function (in the mathematical sense) which maps to a single value even in cases where the lexical mapping proper maps to multiple values.

The typed value is determined as follows:

  • If the nilled property of the node in question is true, then the typed value is the empty sequence.

  • If T is xs:anySimpleType or xs:anyAtomicType, the typed value is the [schema normalized value] as an instance of xs:untypedAtomic.

  • Otherwise, the typed value is the result of applying M to the string value as an instance of the appropriate value type, where the appropriate value type is the [member type definition] if T is a union type, otherwise it is simply T.

The typed value determination process is guaranteed to result in a sequence of atomic values, each having a well-defined atomic type. This sequence of atomic values, in turn, determines the typed-value property of the node in the data model.
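The three rules above can be illustrated with a minimal Python sketch. It is not a definitive implementation: the `typed_value` function, its parameters, and the `LEXICAL_MAPPINGS` table are hypothetical, the Python constructors only approximate the lexical mappings of [Schema Part 2], and list types are omitted for brevity.

```python
from decimal import Decimal

# Hypothetical stand-ins for the mapping M of a few simple types; a real
# implementation would obtain these from its schema processor.
LEXICAL_MAPPINGS = {
    "xs:integer": int,
    "xs:decimal": Decimal,
    "xs:string": str,
}


def typed_value(t, string_value, normalized_value, nilled=False,
                is_union=False, member_type=None):
    """Return the typed value as a list of (value, type name) pairs."""
    if nilled:
        return []  # the empty sequence
    if t in ("xs:anySimpleType", "xs:anyAtomicType"):
        # The [schema normalized value], typed as xs:untypedAtomic.
        return [(normalized_value, "xs:untypedAtomic")]
    # For a union, the value type is the [member type definition].
    value_type = member_type if is_union else t
    return [(LEXICAL_MAPPINGS[value_type](string_value), value_type)]
```

With these assumptions, `typed_value("xs:integer", "0030", "0030")` yields a single item, the integer 30 typed as `xs:integer`.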

3.3.1.3 Relationship Between Typed-Value and String-Value

Element and attribute nodes have both typed-value and string-value properties. However, implementations are allowed some flexibility in how these properties are stored. An implementation may choose to store the string-value only and derive the typed-value from it, or to store the typed-value only and derive the string-value from it, or to store both the string-value and the typed-value.

In order to permit these various implementation strategies, some variations in the string value of a node are defined as insignificant. Implementations that store only the typed value of a node are permitted to return a string value that is different from the original lexical form of the node content. For example, consider the following element:

<offset xsi:type="xs:integer">0030</offset>

Assuming that the node is valid, it has a typed value of 30 as an xs:integer. An implementation may return either "30" or "0030" as the string value of the node. Any string that is a valid lexical representation of the typed value is acceptable. In this specification, we express this rule by saying that the relationship between the string value of a node and its typed value must be "consistent with schema validation."
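As an informal illustration of this rule for xs:integer only, the following Python sketch tests whether a candidate string value is an acceptable lexical representation of a typed value. The function name and the regular expression are assumptions made for this example; the expression approximates the xs:integer lexical space after white space normalization.

```python
import re

# Approximation of the xs:integer lexical space: optional sign, then digits.
XS_INTEGER = re.compile(r"[+-]?[0-9]+")


def is_lexical_form_of(string_value, typed_value):
    """True if string_value is a valid xs:integer lexical form of typed_value."""
    if XS_INTEGER.fullmatch(string_value) is None:
        return False
    return int(string_value) == typed_value
```

Both `is_lexical_form_of("0030", 30)` and `is_lexical_form_of("30", 30)` hold, so either string is an acceptable string value for the element above.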

If an implementation stores only the string-value of a node, the following considerations apply:

  • Where union types occur, the implementation must be able to deliver the typed-value as an instance of the appropriate member type. For example, if the type of an element node is my:integer-or-string, which is defined as a union of xs:integer and xs:string, and the string-value of the node is "47", the implementation must be able to deliver the typed-value of the node as either the integer 47 or the string "47", depending on which member type validated the element.

  • Where types of xs:QName, xs:NOTATION, or types derived from one of these types occur, the implementation must be able to deliver the typed-value as a triple including a local name, a namespace prefix, and a namespace URI, even though the namespace URI is not part of the string-value (see 3.3.3 QNames and NOTATIONS).

  • Where an element with a complex type and element-only content occurs, it is an error to attempt to access the typed-value of the Element Node.

If an implementation stores only the typed-value of a node, it must be prepared to construct string values from not only the node, but in some cases also the descendants of that node. For example, an element with a complex type and element-only content has no typed-value but does have a string-value that is the concatenation of the string-values of all its Text Node descendants in document order.

A further caveat applies if an implementation stores the typed value of a node. If a new data model is constructed by copying portions of another data model, and the copy operation does not preserve inherited namespaces, and the type is a union type that is sensitive to the namespace context, then the typed value may be different from what would be obtained by revalidating the node within its new namespace context. Although this may stretch the semantics of “consistent with schema validation”, we accept this possibility; it is not an error.

3.3.1.4 Pattern Facets

Creating a subtype by restriction generally reduces the value space of the original schema type. For example, expressing a hat size as a restriction of decimal with a minimum value of 6.5 and maximum value of 8.0 creates a schema type whose valid values are only those in the range 6.5 to 8.0.

The pattern facet is different because it restricts the lexical space of the schema type, not its value space. Expressing a three-digit number as a restriction of integer with the pattern facet “[0-9]{3}” creates a schema type whose valid values are only those with a lexical form consisting of three digits.

The pattern facet is not reversible in practice. A given point in the value space might have several lexical representations. In general, there's no practical way to determine which, if any, of these representations satisfies the pattern facet of the type.

As a consequence, pattern facets are not respected when mapping to an Infoset or during serialization, and values in the data model that were originally valid with respect to a schema that contains pattern-based restrictions may be invalid after serialization.
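The following Python sketch illustrates the effect using the three-digit pattern from the example above. It assumes an implementation that stores only the typed value and serializes integers without leading zeros; the helper name `satisfies_pattern` is hypothetical.

```python
import re

# The pattern facet from the three-digit example.
THREE_DIGITS = re.compile(r"[0-9]{3}")


def satisfies_pattern(lexical):
    """True if the whole lexical form matches the pattern facet."""
    return THREE_DIGITS.fullmatch(lexical) is not None


original = "007"          # lexical form that was valid in the source document
value = int(original)     # only the typed value, 7, is stored
serialized = str(value)   # serialized without leading zeros
```

Here `original` satisfies the pattern but `serialized` does not, even though both denote the same point in the value space.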

3.3.2 Dates and Times

The date and time types require special attention. This section applies to implementations that store the typed value of xs:dateTime, xs:date, xs:time, xs:gYearMonth, xs:gYear, xs:gMonthDay, xs:gMonth, xs:gDay, and types that are derived from them. These are known collectively as the date/time types in this specification.

The values of the date/time types are represented in the data model using seven components:

year

An xs:integer.

month

An xs:integer between 1 and 12, inclusive.

day

An xs:integer between 1 and 31, inclusive, possibly restricted further depending on the values of month and year.

hour

An xs:integer between 0 and 23, inclusive.

minute

An xs:integer between 0 and 59, inclusive.

second

An xs:decimal greater than or equal to zero and less than 60. Leap seconds are not supported.

timezone

An xs:dayTimeDuration between -PT14H00M and PT14H00M, inclusive. All timezone values must be an integral number of minutes.

Components that are intrinsic to the datatype (for example, day, month, and year in an xs:date) are required; components that can never be part of a datatype (for example, years in an xs:time) must be missing. Missing components are represented by the empty sequence. When a component is present, it contains the “local value” that has not been normalized in any way. The timezone component is optional for all the date/time datatypes.

Thus, the lexical xs:dateTime representation “2003-01-02T11:30:00-05:00” is stored as “{2003, 1, 2, 11, 30, 0.0, -PT05H00M}”. The value of the lexical representation “2003-01-16T16:30:00” is stored as “{2003, 1, 16, 16, 30, 0.0, ()}” because it has no timezone. The value of the lexical xs:gDay representation “---30+10:30” is stored as “{(), (), 30, (), (), (), PT10H30M}”.

The lexical form “24:00:00” is normalized in the component model. As an xs:time, it is stored as “{(), (), (), 0, 0, 0.0, ()}” and the xs:dateTime representation “1999-12-31T24:00:00” is stored as “{2000, 1, 1, 0, 0, 0.0, ()}”.
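The component model for xs:dateTime can be sketched as follows. This Python example is a simplified illustration, not a conforming parser: the function names are hypothetical, `None` stands in for the empty sequence, the timezone is represented as a `timedelta`, component ranges are not fully validated, and the “24:00:00” normalization relies on Python's `date` type, which only supports years 1 to 9999.

```python
import re
from datetime import date, timedelta
from decimal import Decimal

DATETIME = re.compile(
    r"(-?[0-9]{4,})-([0-9]{2})-([0-9]{2})"
    r"T([0-9]{2}):([0-9]{2}):([0-9]{2}(?:\.[0-9]+)?)"
    r"(Z|[+-][0-9]{2}:[0-9]{2})?"
)


def parse_timezone(tz):
    """Map 'Z' or '+hh:mm' / '-hh:mm' to a timedelta; None if missing."""
    if tz is None:
        return None
    if tz == "Z":
        return timedelta(0)
    sign = -1 if tz[0] == "-" else 1
    return sign * timedelta(hours=int(tz[1:3]), minutes=int(tz[4:6]))


def datetime_components(lexical):
    """Return (year, month, day, hour, minute, second, timezone)."""
    m = DATETIME.fullmatch(lexical)
    if m is None:
        raise ValueError(f"not an xs:dateTime lexical form: {lexical!r}")
    year, month, day, hour, minute = (int(m.group(i)) for i in range(1, 6))
    second = Decimal(m.group(6))
    if hour == 24:
        if minute != 0 or second != 0:
            raise ValueError("hour 24 is only allowed as 24:00:00")
        # Normalize 24:00:00 to 00:00:00 on the following day.
        following = date(year, month, day) + timedelta(days=1)
        year, month, day, hour = following.year, following.month, following.day, 0
    # Apart from the 24:00:00 case, values are kept as local values.
    return (year, month, day, hour, minute, second, parse_timezone(m.group(7)))
```

Under these assumptions, `datetime_components("1999-12-31T24:00:00")` returns the components of midnight on 1 January 2000 with a missing timezone.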

Note:

Implementations are permitted to store date/time values in any representation that's convenient for them, provided that the individual properties can be accessed and modified.

3.3.3 QNames and NOTATIONS

The QName and NOTATION data types require special attention. The following sections apply to xs:QName, xs:NOTATION, and types derived from them. These types are referred to collectively as “qualified names”.

As defined in XML Schema, the lexical space for qualified names includes a local name and an optional namespace prefix. The value space for qualified names contains a local name and an optional namespace URI. Therefore, it is not possible to derive a lexical value from the typed value, or vice versa, without access to some context that defines the namespace bindings.

When qualified names exist as values of nodes in a well-formed document, it is always possible to determine such a namespace context. However, the data model also allows qualified names to exist as freestanding atomic values, or as the name or value of a parentless attribute node, and in these cases no namespace context is available.

In this Data Model, therefore, the value space for qualified names contains a local-name, an optional namespace URI, and an optional prefix. The prefix is used only when producing a lexical representation of the value, that is, when casting the value to a string. The prefix plays no part in other operations involving qualified names: in particular, two qualified names are equal if their local names and namespace URIs match, regardless of whether they have the same prefix.
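One way to picture this value space is the following Python sketch. The `QName` class and its field names are hypothetical; the sketch only illustrates that the prefix is retained for string conversion but excluded from equality.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class QName:
    local_name: str
    namespace_uri: Optional[str] = None
    # compare=False keeps the prefix out of equality and hashing.
    prefix: Optional[str] = field(default=None, compare=False)

    def __str__(self):
        # The prefix matters only when producing a lexical representation.
        if self.prefix is not None:
            return f"{self.prefix}:{self.local_name}"
        return self.local_name
```

Two values such as `QName("item", "urn:example", "a")` and `QName("item", "urn:example", "b")` compare equal, although casting them to strings gives different results.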

The following consistency constraints apply:

  • If the namespace URI of a qualified name is absent, then the prefix must also be absent.

  • For every element node whose name has a prefix, the prefix must be one that has a binding to the namespace URI of the element name in the namespaces property of the element.

  • For every element node whose name has no prefix, the element must have a binding for the empty prefix to the namespace URI of the element name, or must have no binding for the empty prefix in the case where the name of the element has no namespace URI.

  • For every attribute node whose name has a prefix, the attribute node must either be parentless, or the prefix must be one that has a binding to the namespace URI of the attribute name in the namespaces property of the parent element.

  • For every qualified name that contains a prefix and that is included in the typed value of an element node, or of an attribute node that has an element node as its parent, the prefix must be one that is bound to the namespace URI of the qualified name in the namespaces property of that element.

  • For every qualified name that contains a namespace URI and no prefix, and that is included in the typed value of an element node, or of an attribute node that has an element node as its parent, that element node must have a binding for the empty prefix to that namespace URI in its namespaces property.

  • For every qualified name that contains neither a namespace URI nor a prefix, and that is included in the typed value of an element node, or of an attribute node that has an element node as its parent, that node must not have a binding for the empty prefix.

  • No qualified name that contains a prefix may be included in the typed value of an attribute node that has no parent.
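The constraints that concern qualified names appearing in typed values (the first and the last four items above) can be checked with a short Python sketch. The function name and the representation of the namespaces property as a dictionary from prefix to namespace URI, with the empty string standing for the empty prefix, are assumptions made for this illustration. The constraints on element and attribute names are not covered.

```python
def qname_value_is_consistent(namespace_uri, prefix, namespaces):
    """Check a qualified name found in the typed value of a node.

    namespaces holds the bindings of the owning element (or of the parent
    element, for an attribute); pass None for a parentless attribute node.
    """
    if namespace_uri is None and prefix is not None:
        return False  # an absent namespace URI requires an absent prefix
    if namespaces is None:
        return prefix is None  # no prefixed names in a parentless attribute
    if prefix is not None:
        # The prefix must be bound to the namespace URI of the name.
        return namespaces.get(prefix) == namespace_uri
    if namespace_uri is not None:
        # No prefix: the empty prefix must be bound to the namespace URI.
        return namespaces.get("") == namespace_uri
    # Neither URI nor prefix: the empty prefix must be unbound.
    return "" not in namespaces
```

For instance, with `namespaces = {"p": "urn:x"}`, the name with URI `urn:x` and prefix `p` is consistent, while the same URI with no prefix is not, because the empty prefix is unbound.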