6 Functions that manipulate URIs

This section specifies functions that manipulate URI values, either as instances of xs:anyURI or as strings.

Function Meaning
fn:resolve-uri Resolves a relative IRI reference against an absolute IRI.
fn:encode-for-uri Encodes reserved characters in a string that is intended to be used in the path segment of a URI.
fn:iri-to-uri Converts a string containing an IRI into a URI according to the rules of [RFC 3987].
fn:escape-html-uri Escapes a URI in the same way that HTML user agents handle attribute values expected to contain URIs.

6.1 fn:resolve-uri

Summary

Resolves a relative IRI reference against an absolute IRI.

Signatures
fn:resolve-uri(
$relative as xs:string?
) as xs:anyURI?
fn:resolve-uri(
$relative as xs:string?,
$base as xs:string
) as xs:anyURI?
Properties

The one-argument form of this function is ·deterministic·, ·context-dependent·, and ·focus-independent·. It depends on static base URI.

The two-argument form of this function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

The function is defined to operate on IRI references as defined in [RFC 3987], and the implementation must permit all arguments that are valid according to that specification. In addition, the implementation may accept some or all strings that conform to the rules for (absolute or relative) Legacy Extended IRI references as defined in [Legacy extended IRIs for XML resource identification]. For the purposes of this section, the terms IRI and IRI reference include these extensions, insofar as the implementation chooses to support them.

The following rules apply in order:

  1. If $relative is the empty sequence, the function returns the empty sequence.

  2. If $relative is an absolute IRI (as defined above), then it is returned unchanged.

  3. If the $base argument is not supplied, then:

    1. If the static base URI in the static context is not absent, it is used as the effective value of $base.

    2. Otherwise, a dynamic error is raised: [err:FONS0005].

  4. The function resolves the relative IRI reference $relative against the base IRI $base using the algorithm defined in [RFC 3986], adapted by treating any ·character· that would not be valid in an RFC3986 URI or relative reference in the same way that RFC3986 treats unreserved characters. No percent-encoding takes place.
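
The ordering of these rules can be illustrated with a minimal Python sketch. It is not a conforming implementation: it assumes that `urllib.parse.urljoin` from the Python standard library is an acceptable stand-in for the RFC 3986 resolution algorithm, it performs none of the validity checks described under Error Conditions, and the names `resolve_uri` and `static_base` are hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def resolve_uri(relative, base=None, static_base=None):
    """Approximate the rule order of fn:resolve-uri (illustrative only)."""
    # Rule 1: the empty sequence (modelled as None) yields the empty sequence.
    if relative is None:
        return None
    # Rule 2: a reference that already has a scheme is treated as absolute
    # and returned unchanged.
    if urlsplit(relative).scheme:
        return relative
    # Rule 3: fall back to the static base URI, or fail if it is absent.
    if base is None:
        if static_base is None:
            raise ValueError("FONS0005: no base URI available")
        base = static_base
    # Rule 4: syntactic resolution; urljoin does not percent-encode.
    return urljoin(base, relative)


print(resolve_uri("../d", "http://example.com/a/b/c"))
print(resolve_uri("Dürst", "http://www.example.org/a/b"))
```

The second call shows that a non-ASCII character in the relative reference is carried through to the result without percent-encoding, in line with rule 4.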

Error Conditions

The first form of this function resolves $relative against the value of the base-uri property from the static context. A dynamic error is raised [err:FONS0005] if the base-uri property is not initialized in the static context.

A dynamic error is raised [err:FORG0002] if $relative is not a valid IRI according to the rules of RFC3987, extended with an implementation-defined subset of the extensions permitted in LEIRI, or if it is not a suitable relative reference to use as input to the RFC3986 resolution algorithm extended to handle additional unreserved characters.

A dynamic error is raised [err:FORG0002] if $base is not a valid IRI according to the rules of RFC3987, extended with an implementation-defined subset of the extensions permitted in LEIRI, or if it is not a suitable IRI to use as input to the chosen resolution algorithm (for example, if it is a relative IRI reference, if it is a non-hierarchic URI, or if it contains a fragment identifier).

A dynamic error is raised [err:FORG0009] if the chosen resolution algorithm fails for any other reason.

Notes

Resolving a URI does not dereference it. This is merely a syntactic operation on two ·strings·.

The algorithms in the cited RFCs include some variations that are optional or recommended rather than mandatory; they also describe some common practices that are not recommended, but which are permitted for backwards compatibility. Where the cited RFCs permit variations in behavior, so does this specification.

Throughout this family of specifications, the phrase "resolving a relative URI (or IRI) reference" should be understood as using the rules of this function, unless otherwise stated.

RFC3986 defines an algorithm for resolving relative references in the context of the URI syntax defined in that RFC. RFC3987 describes a modification to that algorithm to make it applicable to IRIs (specifically: additional characters permitted in an IRI are handled the same way that RFC3986 handles unreserved characters). The LEIRI specification does not explicitly define a resolution algorithm, but suggests that it should not be done by converting the LEIRI to a URI, and should not involve percent-encoding. This specification fills this gap by defining resolution for LEIRIs in the same way that RFC3987 defines resolution for IRIs, that is by specifying that additional characters are handled as unreserved characters.

6.2 fn:encode-for-uri

Summary

Encodes reserved characters in a string that is intended to be used in the path segment of a URI.

Signature
fn:encode-for-uri(
$value as xs:string?
) as xs:string
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

If $value is the empty sequence, the function returns the zero-length string.

This function applies the URI escaping rules defined in section 2 of [RFC 3986] to the xs:string supplied as $value. The effect of the function is to escape reserved characters. Each such character in the string is replaced with its percent-encoded form as described in [RFC 3986].

Since [RFC 3986] recommends that, for consistency, URI producers and normalizers should use uppercase hexadecimal digits for all percent-encodings, this function must always generate hexadecimal values using the upper-case letters A-F.

Notes

All characters are escaped except those identified as "unreserved" by [RFC 3986], that is the upper- and lower-case letters A-Z, the digits 0-9, HYPHEN-MINUS ("-"), LOW LINE ("_"), FULL STOP ".", and TILDE "~".

This function escapes URI delimiters and therefore cannot be used indiscriminately to encode "invalid" characters in a path segment.

This function is invertible but not idempotent. This is because a string containing a percent character will be modified by applying the function: for example 100% becomes 100%25, while 100%25 becomes 100%2525.
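
The escaping behavior and the lack of idempotence can be reproduced in Python. This is a sketch under the assumption that `urllib.parse.quote` with an empty `safe` set matches the rules above (in Python 3.7 and later it leaves exactly the RFC 3986 unreserved characters unescaped and emits upper-case hexadecimal digits); `encode_for_uri` is a hypothetical name.

```python
from urllib.parse import quote, unquote


def encode_for_uri(value):
    """Percent-encode everything except RFC 3986 unreserved characters."""
    if value is None:  # the empty sequence yields the zero-length string
        return ""
    return quote(value, safe="")


once = encode_for_uri("100%")
twice = encode_for_uri(once)
print(once)           # 100%25
print(twice)          # 100%2525
print(unquote(once))  # 100%
```

Applying the function twice changes the result again (not idempotent), while decoding the encoded form recovers the input (invertible).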

Examples

The expression fn:encode-for-uri("http://www.example.com/00/Weather/CA/Los%20Angeles#ocean") returns "http%3A%2F%2Fwww.example.com%2F00%2FWeather%2FCA%2FLos%2520Angeles%23ocean". (This is probably not what the user intended because all of the delimiters have been encoded.)

The expression concat("http://www.example.com/", encode-for-uri("~bébé")) returns "http://www.example.com/~b%C3%A9b%C3%A9".

The expression concat("http://www.example.com/", encode-for-uri("100% organic")) returns "http://www.example.com/100%25%20organic".

6.3 fn:iri-to-uri

Summary

Converts a string containing an IRI into a URI according to the rules of [RFC 3987].

Signature
fn:iri-to-uri(
$value as xs:string?
) as xs:string
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

If $value is the empty sequence, the function returns the zero-length string.

Otherwise, the function converts $value into a URI according to the rules given in Section 3.1 of [RFC 3987] by percent-encoding characters that are allowed in an IRI but not in a URI. If $value contains a character that is invalid in an IRI, such as the space character (see note below), the invalid character is replaced by its percent-encoded form as described in [RFC 3986] before the conversion is performed.

Since [RFC 3986] recommends that, for consistency, URI producers and normalizers should use uppercase hexadecimal digits for all percent-encodings, this function must always generate hexadecimal values using the upper-case letters A-F.

Notes

The function is idempotent but not invertible. Both the inputs My Documents and My%20Documents will be converted to the output My%20Documents.

This function does not check whether $value is a valid IRI. It treats it as a ·string· and operates on the ·characters· in the string.

The following printable ASCII characters are invalid in an IRI: "<", ">", """ (double quote), space, "{", "}", "|", "\", "^", and "`". Since these characters should not appear in an IRI, if they do appear in $value they will be percent-encoded. In addition, characters outside the range x20-x7E will be percent-encoded because they are invalid in a URI.

Since this function does not escape the PERCENT SIGN "%" and this character is not allowed in data within a URI, users wishing to convert character strings (such as file names) that include "%" to a URI should manually escape "%" by replacing it with "%25".
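
The character-level behavior described in these notes can be sketched in Python. The sketch assumes that encoding exactly the characters listed above (the invalid printable ASCII characters plus everything outside x20-x7E) is sufficient; the names `iri_to_uri` and `_INVALID_ASCII` are hypothetical.

```python
# Printable ASCII characters that the notes list as invalid in an IRI.
_INVALID_ASCII = set('<>"{}|\\^` ')


def iri_to_uri(value):
    """Percent-encode characters that may not appear in a URI."""
    if value is None:  # the empty sequence yields the zero-length string
        return ""
    out = []
    for ch in value:
        if ch in _INVALID_ASCII or not (0x20 <= ord(ch) <= 0x7E):
            # Encode the character as UTF-8 and write each octet as %HH,
            # using upper-case hexadecimal digits.
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)  # includes "%", which is left untouched
    return "".join(out)


print(iri_to_uri("My Documents"))    # My%20Documents
print(iri_to_uri("My%20Documents"))  # My%20Documents
```

Because "%" is never escaped, the two inputs above produce the same output, which is why the function is idempotent but not invertible.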

Examples

The expression fn:iri-to-uri ("http://www.example.com/00/Weather/CA/Los%20Angeles#ocean") returns "http://www.example.com/00/Weather/CA/Los%20Angeles#ocean".

The expression fn:iri-to-uri ("http://www.example.com/~bébé") returns "http://www.example.com/~b%C3%A9b%C3%A9".

6.4 fn:escape-html-uri

Summary

Escapes a URI in the same way that HTML user agents handle attribute values expected to contain URIs.

Signature
fn:escape-html-uri(
$value as xs:string?
) as xs:string
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

If $value is the empty sequence, the function returns the zero-length string.

Otherwise, the function escapes all ·characters· except printable characters of the US-ASCII coded character set, specifically the ·codepoints· between 32 and 126 (decimal) inclusive. Each character in $value to be escaped is replaced by an escape sequence, which is formed by encoding the character as a sequence of octets in UTF-8, and then representing each of these octets in the form %HH, where HH is the hexadecimal representation of the octet. This function must always generate hexadecimal values using the upper-case letters A-F.
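
As an illustration, the rule can be written as a short Python function. This is a sketch rather than a reference implementation, and `escape_html_uri` is a hypothetical name.

```python
def escape_html_uri(value):
    """Escape every character outside codepoints 32..126 as UTF-8 %HH octets."""
    if value is None:  # the empty sequence yields the zero-length string
        return ""
    return "".join(
        ch if 32 <= ord(ch) <= 126
        else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in value
    )


print(escape_html_uri("http://www.example.com/~bébé"))
# http://www.example.com/~b%C3%A9b%C3%A9
```

Note that the space character (codepoint 32) falls inside the retained range and is therefore left unescaped.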

Notes

The behavior of this function corresponds to the recommended handling of non-ASCII characters in URI attribute values as described in [HTML 4.0] Appendix B.2.1.

Examples

The expression fn:escape-html-uri("http://www.example.com/00/Weather/CA/Los Angeles#ocean") returns "http://www.example.com/00/Weather/CA/Los Angeles#ocean".

The expression fn:escape-html-uri("javascript:if (navigator.browserLanguage == 'fr') window.open('http://www.example.com/~bébé');") returns "javascript:if (navigator.browserLanguage == 'fr') window.open('http://www.example.com/~b%C3%A9b%C3%A9');".

6.5 Parsing and building URIs

This section specifies functions that parse strings as URIs, to identify their structure, and construct URI strings from their structured representation.

Function Meaning
fn:parse-uri Parses the URI provided and returns a map of its parts.
fn:build-uri Constructs a URI from the parts provided.

The structured representation of a URI is described by the uri-structure-record:

uri-structure-record:
record(
uri?  as xs:string,
scheme?  as xs:string,
authority?  as xs:string,
userinfo?  as xs:string,
host?  as xs:string,
port?  as xs:string,
path?  as xs:string,
query?  as xs:string,
fragment?  as xs:string,
path-segments?  as array(xs:string),
query-segments?  as array(record(key? as xs:string, value? as xs:string, *)),
*
)

The parts of this structure are:

The URI structure record
uri The original URI. This element is returned by fn:parse-uri, but ignored by fn:build-uri.
scheme The URI scheme (e.g., “https” or “file”).
authority The authority portion of the URI (e.g., “example.com:8080”).
userinfo Any userinfo that was passed as part of the authority.
host The host passed as part of the authority (e.g., “example.com”).
port The port passed as part of the authority (e.g., “8080”).
path The path portion of the URI.
query Any query string.
fragment Any fragment identifier.
path-segments Parsed and unescaped path segments.
query-segments Parsed and unescaped query terms.
* Additional, implementation-defined structures are allowed.

The segmented forms of the path and query parameters provide convenient access to commonly used information. They’re represented in the map as arrays, instead of sequences, just for the convenience of serializing the structure.

The path, if there is one, is tokenized on “/” characters and each segment is unescaped. Consider the URI http://example.com/path/to/a%2fb. The path portion has to be returned as /path/to/a%2fb because decoding the %2f would change the nature of the path. The unescaped form is easily accessible from the path-segments array:

[
  "",
  "path",
  "to",
  "a/b"
]

Note that the presence or absence of a leading slash on the path will affect whether or not the array begins with an empty string.
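
The effect of the leading slash is easy to see in a small Python sketch. It assumes that `urllib.parse.unquote` is a close enough approximation of the unescaping step for these inputs; `path_segments` is a hypothetical helper, not part of any API described here.

```python
from urllib.parse import unquote


def path_segments(path):
    """Tokenize a path on "/" first, then unescape each segment."""
    return [unquote(segment) for segment in path.split("/")]


print(path_segments("/path/to/a%2fb"))  # ['', 'path', 'to', 'a/b']
print(path_segments("path/to/file"))    # ['path', 'to', 'file']
```

Splitting before unescaping is what keeps the decoded “/” in the last segment from being mistaken for a separator.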

The query parameters are similarly decoded. Consider the URI: http://example.com/path?a=1&b=2%264&a=3. Here the decoded form in the query-segments gives quick access to the parameter values:

[
  { "key": "a",
    "value": "1" },
  { "key": "b",
    "value": "2&4" },
  { "key": "a",
    "value": "3" }
]

Note that both keys and values are unescaped, and that the result is an array of maps because keys can be repeated, as seen for the key a in this example.
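
The reason for keeping an ordered list of pairs rather than a single map can be shown with Python's standard `urllib.parse.parse_qsl`, used here only as an analogy for the structure, not as a definition of the decoding rules.

```python
from urllib.parse import parse_qsl

query = "a=1&b=2%264&a=3"

pairs = parse_qsl(query)
print(pairs)        # [('a', '1'), ('b', '2&4'), ('a', '3')]

# Collapsing the pairs into a single mapping silently loses the first "a".
print(dict(pairs))  # {'a': '3', 'b': '2&4'}
```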

6.5.1 fn:parse-uri

Summary

Parses the URI provided and returns a map of its parts.

Signature
fn:parse-uri(
$uri as xs:string,
$options as map(*) := map{}
) as uri-structure-record
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

The function parses the $uri provided, returning a map containing its constituent parts: scheme, authority components, path, etc. In addition to parsing URIs as defined by [RFC 3986] (and [RFC 3987]), this function also attempts to account for strings that are not valid URIs but that often appear in URI-adjacent spaces, such as file names.

This function is described as a series of transformations over the input string to identify the parts of a URI that are present. Some portions of the URI are identified by matching with a regular expression. This approach is designed to make the description clear and unambiguous; it is not implementation advice.

Begin with a string that is equal to the $uri. If the string contains any backslashes (“\”), replace them with forward slashes (“/”).

If the string matches ^(.*)#([^#]*)$, the string is the first match group and the fragment is the second match group. Otherwise, the string is unchanged and the fragment is the empty sequence.

If the string matches ^(.*)\?([^\?]*)$, the string is the first match group and the query is the second match group. Otherwise, the string is unchanged and the query is the empty sequence.

If the string matches ^[a-zA-Z]:, the scheme is file and the string is unchanged. Otherwise, if the string matches ^([a-zA-Z][A-Za-z0-9\+\-\.]*):(.*)$, the scheme is the first match group and the string is the second match group. If the string does not match either expression, the scheme is the empty sequence and the string is unchanged.

If the string matches ^//*([a-zA-Z]:.*)$, the authority is empty and the string is the first match group. Otherwise, if the string matches ^///*([^/]+)(/.*)?$ then the authority is the first match group and the string is the second match group. If the string does not match either regular expression, the authority is the empty sequence and the string is unchanged.
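
The steps up to this point can be traced with a Python sketch that applies the regular expressions exactly as written above, in the same order. It is illustrative only: the function name `split_uri` and the key `rest` (the string remaining after each step) are hypothetical, and an absent component is modelled as `None`.

```python
import re


def split_uri(uri):
    """Apply the fragment, query, scheme, and authority steps in order."""
    s = uri.replace("\\", "/")  # backslashes become forward slashes
    fragment = query = scheme = authority = None

    m = re.match(r"^(.*)#([^#]*)$", s)
    if m:
        s, fragment = m.group(1), m.group(2)

    m = re.match(r"^(.*)\?([^\?]*)$", s)
    if m:
        s, query = m.group(1), m.group(2)

    if re.match(r"^[a-zA-Z]:", s):
        scheme = "file"  # drive-letter form; the string is unchanged
    else:
        m = re.match(r"^([a-zA-Z][A-Za-z0-9\+\-\.]*):(.*)$", s)
        if m:
            scheme, s = m.group(1), m.group(2)

    m = re.match(r"^//*([a-zA-Z]:.*)$", s)
    if m:
        s = m.group(1)  # no authority; leading slashes are dropped
    else:
        m = re.match(r"^///*([^/]+)(/.*)?$", s)
        if m:
            authority, s = m.group(1), m.group(2) or ""

    return {"scheme": scheme, "authority": authority, "rest": s,
            "query": query, "fragment": fragment}


print(split_uri("file:///c:/path/to/file"))
print(split_uri("c:\\path\\to\\file"))
```

Both calls yield the scheme `file`, no authority, and the remaining string `c:/path/to/file`.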

If the authority matches (([^@]*)@)(.*)(:([^:]*))?$, then the userinfo is match group 2, otherwise userinfo is the empty sequence.

If the authority matches (([^@]*)@)?(.+)(:([^:]*))?$, then the host is match group 3, otherwise host is the empty sequence.

If the authority matches (([^@]*)@)?(.*)(:([^:]*))$, then the port is match group 5, otherwise port is the empty sequence.

If the string is the empty string, then path is the empty sequence, otherwise the path is the whole string.

If the $options map contains a key named “path-separator”, the value of that key is the path separator; otherwise the separator is a single slash (“/”). It is a dynamic error XXXX if the key is present and its value is not a string of length one.

A path-segments array is constructed as follows: tokenize the string on the path separator, apply uri decoding on each token, and convert the result to an array.

Applying uri decoding replaces all occurrences of plus (“+”) with spaces and all occurrences of %[a-fA-F0-9][a-fA-F0-9] with a single character with the codepoint represented by the two digit hexadecimal number that follows the “%”. In other words, “A%42C” becomes “ABC”. If there are any occurrences of % followed by up to two characters that are not hexadecimal digits, they are replaced by the single character with the codepoint 0xfffd. In other words “A%XYC%Z” becomes “A�C�”.
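
A minimal Python sketch of this decoding step, together with the path-segments construction that uses it, is given below. It rests on one interpretation that the text does not spell out: a “%” that is not followed by two hexadecimal digits consumes at most two following non-hexadecimal characters. The names `uri_decode` and `path_segments` are hypothetical.

```python
import re

# Either a well-formed %HH escape, or "%" followed by up to two
# characters that are not hexadecimal digits.
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})|%[^0-9A-Fa-f]{0,2}")


def uri_decode(token):
    """Decode "+" and %HH escapes as described in the text."""
    token = token.replace("+", " ")

    def replace(match):
        hex_digits = match.group(1)
        if hex_digits is not None:
            # The two digits name a codepoint directly.
            return chr(int(hex_digits, 16))
        return "\ufffd"  # malformed escape

    return _ESCAPE.sub(replace, token)


def path_segments(path, separator="/"):
    """Tokenize on the separator, then decode each token."""
    return [uri_decode(token) for token in path.split(separator)]


print(uri_decode("A%42C"))                          # ABC
print(path_segments("C:/Program%20Files/test.jar"))
```

Replacing “+” before handling the escapes means that an escaped plus sign (%2B) survives as a literal “+”.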

If the $options map contains a key named “query-separator”, the value of that key is the query separator; otherwise the separator is a single ampersand (“&”). It is a dynamic error XXXX if the key is present and its value is not a string of length one.

A query-segments array is constructed as follows: tokenize the query on the query separator. For each token, construct a map. If the token contains an equal sign (“=”), the map contains a key named key with a value equal to the string preceding the first equal sign and a key named value with a value equal to the string following the first equal sign. If the token does not contain an equal sign, the map contains a single key named value with a value equal to the token. In every case, uri decoding is applied to each value added to the map. The resulting sequence of maps is converted into an array.
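
The construction can be sketched in Python as follows. The block repeats a compact version of the decoding step so that it stands alone; `uri_decode` and `query_segments` are hypothetical names, maps are modelled as dictionaries, and the handling of malformed escapes is an assumption of the sketch.

```python
import re

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})|%[^0-9A-Fa-f]{0,2}")


def uri_decode(token):
    """Decode "+" and %HH escapes; malformed escapes become U+FFFD."""
    return _ESCAPE.sub(
        lambda m: chr(int(m.group(1), 16)) if m.group(1) is not None else "\ufffd",
        token.replace("+", " "),
    )


def query_segments(query, separator="&"):
    """Build one map per token, splitting on the first "=" only."""
    segments = []
    for token in query.split(separator):
        if "=" in token:
            key, _, value = token.partition("=")
            segments.append({"key": uri_decode(key), "value": uri_decode(value)})
        else:
            # No equal sign: the map has a single "value" entry.
            segments.append({"value": uri_decode(token)})
    return segments


print(query_segments("a=1&b=2%264&a=3"))
print(query_segments("s=%22hello world%22;sort=relevance", separator=";"))
```

A token without an equal sign, such as `objectClass?one`, produces a map that has a value but no key.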

The following map is returned:

{
  "uri": $uri,
  "scheme": scheme,
  "authority": authority,
  "userinfo": userinfo,
  "host": host,
  "port": port,
  "path": path,
  "query": query,
  "fragment": fragment,
  "path-segments": path-segments,
  "query-segments": query-segments
}

The map should only be populated with keys that have a non-empty value (keys whose value is the empty sequence or an empty array should be omitted).

Implementations may implement additional or different rules for URIs that have a scheme or pattern that they recognize. An implementation might choose to parse jar: URIs with special rules, for example, since they extend the syntax in ways not defined by [RFC 3986]. Implementations may add additional keys to the map. The meaning of those keys is implementation-defined.

TODO: In order to better support implementation extensibility, should the keys in the map be QNames with the requirement that implementation-defined keys be in a non-empty namespace?

Error Conditions

An error is raised XXXX if the supplied path separator is not a single character.

An error is raised XXXX if the supplied query separator is not a single character.

Notes

Like fn:resolve-uri, this function handles the additional characters allowed in [RFC 3987] IRIs in the same way that other unreserved characters are handled.

Unlike fn:resolve-uri, this function is not attempting to resolve one URI against another and consequently, the errors that can arise under those circumstances do not apply here. The fn:parse-uri function will accept strings that would raise errors if resolution were attempted; see fn:build-uri.

Examples

In the examples that follow, keys whose value is the empty sequence or an empty array are elided for editorial clarity.

The expression fn:parse-uri("http://qt4cg.org/specifications/xpath-functions-40/Overview.html#parse-uri") returns

map {
  "uri": "http://qt4cg.org/specifications/xpath-functions-40/Overview.html#parse-uri",
  "scheme": "http",
  "authority": "qt4cg.org",
  "host": "qt4cg.org",
  "path": "/specifications/xpath-functions-40/Overview.html",
  "fragment": "parse-uri",
  "path-segments": array { "", "specifications", "xpath-functions-40", "Overview.html" }
}
.

The expression fn:parse-uri("http://www.ietf.org/rfc/rfc2396.txt") returns

map {
  "uri": "http://www.ietf.org/rfc/rfc2396.txt",
  "scheme": "http",
  "authority": "www.ietf.org",
  "host": "www.ietf.org",
  "path": "/rfc/rfc2396.txt",
  "path-segments": array { "", "rfc", "rfc2396.txt" }
}
.

The expression fn:parse-uri("https://example.com/path/to/file") returns

map {
  "uri": "https://example.com/path/to/file",
  "scheme": "https",
  "authority": "example.com",
  "host": "example.com",
  "path": "/path/to/file",
  "path-segments": array { "", "path", "to", "file" }
}
.

The expression fn:parse-uri("https://example.com:8080/path?s=%22hello world%22&sort=relevance") returns

map {
  "uri": "https://example.com:8080/path?s=%22hello world%22&sort=relevance",
  "scheme": "https",
  "authority": "example.com:8080",
  "host": "example.com",
  "port": "8080",
  "path": "/path",
  "query": "s=%22hello world%22&sort=relevance",
  "query-segments": array {
    map { "key": "s", "value": """hello world""" },
    map { "key": "sort", "value": "relevance" }
  },
  "path-segments": array { "", "path" }
}
.

The expression fn:parse-uri("https://user@example.com/path/to/file") returns

map {
  "uri": "https://user@example.com/path/to/file",
  "scheme": "https",
  "authority": "user@example.com",
  "userinfo": "user",
  "host": "example.com",
  "path": "/path/to/file",
  "path-segments": array { "", "path", "to", "file" }
}
.

The expression fn:parse-uri("ftp://ftp.is.co.za/rfc/rfc1808.txt") returns

map {
  "uri": "ftp://ftp.is.co.za/rfc/rfc1808.txt",
  "scheme": "ftp",
  "authority": "ftp.is.co.za",
  "host": "ftp.is.co.za",
  "path": "/rfc/rfc1808.txt",
  "path-segments": array { "", "rfc", "rfc1808.txt" }
}
.

The expression fn:parse-uri("file:////uncname/path/to/file") returns

map {
  "uri": "file:////uncname/path/to/file",
  "scheme": "file",
  "authority": "uncname",
  "host": "uncname",
  "path": "/path/to/file",
  "path-segments": array { "", "path", "to", "file" }
}
.

The expression fn:parse-uri("file:///c:/path/to/file") returns

map {
  "uri": "file:///c:/path/to/file",
  "scheme": "file",
  "path": "c:/path/to/file",
  "path-segments": array { "c:", "path", "to", "file" }
}
.

The expression fn:parse-uri("file:/C:/Program%20Files/test.jar") returns

map {
  "uri": "file:/C:/Program%20Files/test.jar",
  "scheme": "file",
  "path": "C:/Program%20Files/test.jar",
  "path-segments": array { "C:", "Program Files", "test.jar" }
}
.

The expression fn:parse-uri("file:\\c:\path\to\file") returns

map {
  "uri": "file:\\c:\path\to\file",
  "scheme": "file",
  "path": "c:/path/to/file",
  "path-segments": array { "c:", "path", "to", "file" }
}
.

The expression fn:parse-uri("file:\c:\path\to\file") returns

map {
  "uri": "file:\c:\path\to\file",
  "scheme": "file",
  "path": "c:/path/to/file",
  "path-segments": array { "c:", "path", "to", "file" }
}
.

The expression fn:parse-uri("c:\path\to\file") returns

map {
  "uri": "c:\path\to\file",
  "scheme": "file",
  "path": "c:/path/to/file",
  "path-segments": array { "c:", "path", "to", "file" }
}
.

The expression fn:parse-uri("/path/to/file") returns

map {
  "uri": "/path/to/file",
  "path": "/path/to/file",
  "path-segments": array { "", "path", "to", "file" }
}
.

The expression fn:parse-uri("#testing") returns

map {
  "uri": "#testing",
  "path": "",
  "fragment": "testing"
}
.

The expression fn:parse-uri("?q=1") returns

map {
  "uri": "?q=1",
  "path": "",
  "query": "q=1",
  "query-segments": array {
    map { "key": "q", "value": "1" }
  }
}
.

The expression fn:parse-uri("ldap://[2001:db8::7]/c=GB?objectClass?one") returns

map {
  "uri": "ldap://[2001:db8::7]/c=GB?objectClass?one",
  "scheme": "ldap",
  "authority": "[2001:db8::7]",
  "host": "[2001:db8::7]",
  "path": "/c=GB",
  "query": "objectClass?one",
  "query-segments": array {
    map { "value": "objectClass?one" }
  },
  "path-segments": array { "", "c=GB" }
}
.

The expression fn:parse-uri("mailto:John.Doe@example.com") returns

map {
  "uri": "mailto:John.Doe@example.com",
  "scheme": "mailto",
  "path": "John.Doe@example.com",
  "path-segments": array { "John.Doe@example.com" }
}
.

The expression fn:parse-uri("news:comp.infosystems.www.servers.unix") returns

map {
  "uri": "news:comp.infosystems.www.servers.unix",
  "scheme": "news",
  "path": "comp.infosystems.www.servers.unix",
  "path-segments": array { "comp.infosystems.www.servers.unix" }
}
.

The expression fn:parse-uri("tel:+1-816-555-1212") returns

map {
  "uri": "tel:+1-816-555-1212",
  "scheme": "tel",
  "path": "+1-816-555-1212",
  "path-segments": array { " 1-816-555-1212" }
}
.

The expression fn:parse-uri("telnet://192.0.2.16:80/") returns

map {
  "uri": "telnet://192.0.2.16:80/",
  "scheme": "telnet",
  "authority": "192.0.2.16:80",
  "host": "192.0.2.16",
  "port": "80",
  "path": "/",
  "path-segments": array { "", "" }
}
.

The expression fn:parse-uri("urn:oasis:names:specification:docbook:dtd:xml:4.1.2") returns

map {
  "uri": "urn:oasis:names:specification:docbook:dtd:xml:4.1.2",
  "scheme": "urn",
  "path": "oasis:names:specification:docbook:dtd:xml:4.1.2",
  "path-segments": array { "oasis:names:specification:docbook:dtd:xml:4.1.2" }
}
.

The expression fn:parse-uri("tag:textalign.net,2015:ns") returns

map {
  "uri": "tag:textalign.net,2015:ns",
  "scheme": "tag",
  "path": "textalign.net,2015:ns",
  "path-segments": array { "textalign.net,2015:ns" }
}
.

The expression fn:parse-uri("tag:jan@example.com,1999-01-31:my-uri") returns

map {
  "uri": "tag:jan@example.com,1999-01-31:my-uri",
  "scheme": "tag",
  "path": "jan@example.com,1999-01-31:my-uri",
  "path-segments": array { "jan@example.com,1999-01-31:my-uri" }
}
.

This example uses the algorithm described above, not an algorithm that is specifically aware of the jar: scheme.

The expression fn:parse-uri("jar:file:/C:/Program%20Files/test.jar!/foo/bar") returns

map {
  "uri": "jar:file:/C:/Program%20Files/test.jar!/foo/bar",
  "scheme": "jar",
  "path": "file:/C:/Program%20Files/test.jar!/foo/bar",
  "path-segments": array { "file:", "C:", "Program Files", "test.jar!", "foo", "bar" }
}
.

This example demonstrates that parsing the URI treats non-URI characters in lexical IRIs as “unreserved characters”. The rationale for this is given in the description of fn:resolve-uri.

The expression fn:parse-uri("http://www.example.org/Dürst") returns

map {
  "uri": "http://www.example.org/Dürst",
  "scheme": "http",
  "authority": "www.example.org",
  "host": "www.example.org",
  "path": "/Dürst",
  "path-segments": array { "", "Dürst" }
}
.

This example demonstrates a non-standard query separator.

The expression fn:parse-uri("https://example.com:8080/path?s=%22hello world%22;sort=relevance", map { "query-separator": ";" }) returns

map {
  "uri": "https://example.com:8080/path?s=%22hello world%22;sort=relevance",
  "scheme": "https",
  "authority": "example.com:8080",
  "host": "example.com",
  "port": "8080",
  "path": "/path",
  "query": "s=%22hello world%22;sort=relevance",
  "query-segments": array {
    map { "key": "s", "value": """hello world""" },
    map { "key": "sort", "value": "relevance" }
  },
  "path-segments": array { "", "path" }
}
.

This example uses an invalid query separator so raises an error.

The expression fn:parse-uri("https://example.com:8080/path?s=%22hello world%22;;sort=relevance", map { "query-separator": ";;" }) raises error FOXX0000.

History

Proposed on 17 Oct 2022 to resolve issue #72. Accepted in principle on 15 Nov 2022, with some details still to be resolved.

6.5.2 fn:build-uri

Summary

Constructs a URI from the parts provided.

Signature
fn:build-uri(
$parts as uri-structure-record,
$options as map(*) := map{}
) as xs:string
Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

A URI is composed from a scheme, authority, path, query, and fragment. These components are derived from the contents of the $parts map in the following way:

If the scheme key is present in the map, the URI begins with the value of that key concatenated with ://; otherwise it begins with //.

If any of userinfo, host, or port are present in the map, the following authority is added to the URI under construction:

concat((if (exists($parts?userinfo)) then $parts?userinfo || "@" else ""),
       $parts?host,
       (if (exists($parts?port)) then ":" || $parts?port else ""))

If none of userinfo, host, or port is present, and authority is present, the value of the authority key is added to the URI.

If the path-segments key exists in the map, then the path is constructed with string-join($parts?path-segments ! encode-for-uri(.), "/"), otherwise the value of the path key is used. If the path value is the empty sequence, the empty string is used for the path. The path is added to the URI.

If the query-segments key exists in the map, then a sequence of strings is constructed from each segment in turn. If the segment contains both a key and a value, the string is the concatenation of the value of the key, an equal sign (“=”), and the value of the value. If it contains only one of those keys, then it is the value of that key. If it contains neither, it is ignored. The query is constructed by joining the resulting strings into a single string, separated by ampersands (“&”). If the query-segments key does not exist in the map, but the query key does, then the query is the value of the query key. If there’s a query, it is added to the URI with a preceding question mark (“?”).

If the fragment key exists in the map, then the value of that key is added to the URI with a preceding hash mark (“#”).

The resulting URI is returned.
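
The construction steps can be followed in a Python sketch. Several points are assumptions of the sketch rather than statements of the rules: the map is modelled as a dictionary in which a missing key or a value of `None` stands for the empty sequence, the scheme is followed by “://” (the form that yields the result shown in the example for this function), `urllib.parse.quote` with an empty `safe` set stands in for encode-for-uri, and an empty query is treated as no query. The name `build_uri` is hypothetical.

```python
from urllib.parse import quote


def build_uri(parts):
    """Assemble a URI string from a dictionary of parts (illustrative)."""
    def present(key):
        return parts.get(key) is not None

    uri = parts["scheme"] + "://" if present("scheme") else "//"

    # Authority: the individual components take precedence over "authority".
    if any(present(k) for k in ("userinfo", "host", "port")):
        if present("userinfo"):
            uri += parts["userinfo"] + "@"
        uri += parts.get("host") or ""
        if present("port"):
            uri += ":" + parts["port"]
    elif present("authority"):
        uri += parts["authority"]

    # Path: re-encode the segments if they are supplied.
    if present("path-segments"):
        uri += "/".join(quote(seg, safe="") for seg in parts["path-segments"])
    else:
        uri += parts.get("path") or ""

    # Query: segments take precedence over the raw query string.
    if present("query-segments"):
        terms = []
        for seg in parts["query-segments"]:
            key, value = seg.get("key"), seg.get("value")
            if key is not None and value is not None:
                terms.append(key + "=" + value)
            elif key is not None or value is not None:
                terms.append(key if key is not None else value)
        query = "&".join(terms)
    else:
        query = parts.get("query")
    if query:
        uri += "?" + query

    if present("fragment"):
        uri += "#" + parts["fragment"]
    return uri


print(build_uri({"scheme": "https", "host": "qt4cg.org", "port": None,
                 "path": "/specifications/index.html"}))
# https://qt4cg.org/specifications/index.html
```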

Examples

The expression fn:build-uri(map { "scheme": "https", "host": "qt4cg.org", "port": (), "path": "/specifications/index.html" }) returns https://qt4cg.org/specifications/index.html.

History

Proposed on 17 Oct 2022 to resolve issue #72. Accepted in principle on 15 Nov 2022, with some details still to be resolved.