2. p:directory-list

The p:directory-list step produces a list of the contents of a specified directory.

<p:declare-step type="p:directory-list">
     <p:output port="result" content-type="application/xml"/>
     <p:option name="path" required="true" as="xs:anyURI"/>        
     <p:option name="detailed" as="xs:boolean" select="false()"/>  
     <p:option name="max-depth" as="xs:string?" select="'1'"/>     
     <p:option name="include-filter" as="xs:string*"/>             
     <p:option name="exclude-filter" as="xs:string*"/>             
     <p:option name="override-content-types" as="array(array(xs:string))?"/>
</p:declare-step>

Conformant processors must support directory paths whose scheme is file. It is implementation-defined what other schemes are supported by p:directory-list, and what the interpretation of ‘directory’, ‘file’ and ‘contents’ is for those schemes. It is a dynamic error (err:XC0090) if an implementation does not support directory listing for a specified scheme.

If path is relative, it is made absolute against the base URI of the element on which it is specified (p:with-option or p:directory-list in the case of a syntactic shortcut value). It is a dynamic error (err:XD0064) if the base URI is not both absolute and valid according to [RFC 3986]. It is a dynamic error (err:XC0017) if the absolute path does not identify a directory. It is a dynamic error (err:XC0012) if the contents of the directory path are not available to the step due to access restrictions in the environment in which the pipeline is run.

If the detailed option is true, the pipeline author is requesting additional information about the matching entries; see Section 2.1, “Directory list details”.

The max-depth option may contain either the string “unbounded” or a string that may be cast to a non-negative integer. An integer value of 0 means that only information about the directory that is given in the path option is returned. A max-depth of 1, which is the default, means that information about the top-level directory’s immediate children is included as well. For larger values of max-depth, the content of directories is also considered recursively up to the maximum depth, and it is included as children of the corresponding c:directory elements.
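
As a minimal sketch of how max-depth might be used, the following pipeline lists a directory down to its grandchildren. The directory file:///home/jane/ is an assumption made for this illustration.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:output port="result"/>

  <!-- Hypothetical directory. max-depth="2" requests the directory itself,
       its immediate children, and the children of its subdirectories. -->
  <p:directory-list path="file:///home/jane/" max-depth="2"/>
</p:declare-step>
```

Setting max-depth="0" instead would return only the top-level c:directory element, and max-depth="unbounded" would descend through all levels.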

If present, the value of the include-filter or exclude-filter option must be a sequence of strings, each one representing a regular expression as specified in [XPath and XQuery Functions and Operators 3.1], section 5.6.1 “Regular Expression Syntax”. It is a dynamic error (err:XC0147) if a specified value is not a valid XPath regular expression.

The regular expressions will be matched against an item’s file system path relative to the top-level path that was given in the path option. If the item is a directory, a trailing slash will be appended. The matching is unanchored: it is a match if the regular expression matches any part of the item’s relative file system path. Informally, matching behaves like applying the XPath matches#2 function, as in matches($path, $regular-expression).

Examples: A file file.txt in the directory specified by path will remain file.txt, a relative path dir1/file.txt will remain dir1/file.txt, while a relative path dir1/dir2 will become dir1/dir2/ if dir2 is a directory.

Regular expressions that match a/a/b/file.txt are, for example, ^(\w+/){2,3}.+\.txt$, a/a/b/, or /file\.[^/]+$.

If any include-filter pattern matches the slash-augmented relative path, the entry is included in the output. If a directory’s path matches the inclusion regex, the directory’s content is not automatically included as well; the content entries need to match the regular expression themselves. So the filter regex ^dir/ will match the directory’s content, but ^dir/$ will match only the directory itself, and as a consequence the directory’s content will not be included in the result.
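
The difference between an anchored and an unanchored directory pattern can be illustrated with a minimal sketch. The directory file:///home/jane/, the subdirectory name dir, and the step and port names are assumptions made for this illustration.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:output port="with-content" primary="true" pipe="result@list-with-content"/>
  <p:output port="directory-only" pipe="result@list-directory-only"/>

  <!-- ^dir/ matches dir/ and every path below it, such as dir/file.txt -->
  <p:directory-list name="list-with-content"
                    path="file:///home/jane/"
                    max-depth="unbounded"
                    include-filter="^dir/"/>

  <!-- ^dir/$ matches only the slash-augmented path dir/ itself,
       so no entries below dir are included -->
  <p:directory-list name="list-directory-only"
                    path="file:///home/jane/"
                    max-depth="unbounded"
                    include-filter="^dir/$"/>
</p:declare-step>
```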

If a relative path is matched by an include filter, all its ancestor directories starting from the initial directory (but not their content if not included explicitly) will be included, too.

Example 1. Sample Directory List Output for a Single File

For a file a/a/b/file.txt below the initial directory /home/jane, this output will be produced, omitting content that might be present in the intermediate directories:

<c:directory xml:base="file:///home/jane/" name="jane">
  <c:directory xml:base="a/" name="a">
    <c:directory xml:base="a/" name="a">
      <c:directory xml:base="b/" name="b">
        <c:file xml:base="file.txt" name="file.txt"/>
      </c:directory>
    </c:directory>
  </c:directory>
</c:directory>

If any exclude-filter pattern matches the slash-augmented relative path, the entry (and all of its content in the case of a directory) is excluded from the output.

If both options are provided, the include filter is processed first, then the exclude filter. As a result, an item is included if it matches (at least) one of the include-filter values and none of the exclude-filter values.

If no include-filter is given, that is, if include-filter is an empty sequence, any item will be included in the result (unless it is excluded by exclude-filter).

Note

There is no way to specify a list of values using attribute value templates. If the option shortcut syntax is used to provide the include-filter or exclude-filter option, it will consist of a single regular expression. To specify a list of regular expressions, you must use the p:with-option syntax.
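
A minimal sketch of passing several regular expressions with p:with-option might look as follows. The directory path, the file extensions, and the drafts directory are assumptions made for this illustration.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:output port="result"/>

  <p:directory-list path="file:///home/jane/" max-depth="unbounded">
    <!-- A sequence of two regular expressions: paths ending in .xml or .xsl -->
    <p:with-option name="include-filter" select="('\.xml$', '\.xsl$')"/>
    <!-- Exclude the hypothetical drafts directory and everything below it -->
    <p:with-option name="exclude-filter" select="'^drafts/'"/>
  </p:directory-list>
</p:declare-step>
```

Because the shortcut form is an attribute value template, a regular expression that contains curly braces, such as a {2,3} quantifier, would need those braces doubled there. In a select expression, as shown here, the pattern is an ordinary XPath string literal.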

The override-content-types option can be used to partially override the content-type determination mechanism. This works just like the override-content-types option of p:archive-manifest and p:unarchive, except that the regular expressions are matched against the same paths that are used for matching the include-filter and exclude-filter options.
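
As a hedged sketch, and assuming that each inner array pairs a regular expression with a content type as it does for p:archive-manifest, an override might be written like this. The directory path and the .conf extension are assumptions made for this illustration.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:output port="result"/>

  <!-- detailed="true" so that content-type attributes are requested -->
  <p:directory-list path="file:///home/jane/" detailed="true">
    <!-- Assumed pair layout: [regular expression, content type].
         Relative paths ending in .conf are reported as text/plain. -->
    <p:with-option name="override-content-types"
                   select="[ ['\.conf$', 'text/plain'] ]"/>
  </p:directory-list>
</p:declare-step>
```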

The result document produced for the specified directory path has a c:directory document element whose base URI, attached as an xml:base attribute, is the absolute directory path (expressed as a URI that ends in a slash) and whose name attribute (without a trailing slash) is the last segment of the directory path. The same base URI is attached as the resulting document’s base-uri property and, accordingly, as its document node’s base URI.

<c:directory
  name = string>
    (
  size = integer
  readable = boolean
  writable = boolean
  last-modified = dateTime
  hidden = boolean)*,
    (c:file |
     c:directory |
     c:other)*
</c:directory>

Its contents are determined as follows, based on the entries in the directory identified by the directory path. For each entry in the directory and subject to the rules that are imposed by the max-depth, include-filter, and exclude-filter options, a c:file, a c:directory, or a c:other element is produced, as follows:

  • A c:directory is produced for each subdirectory not determined to be special. Depending on the values of the three options, it may contain child elements for the directory’s content.

  • A c:file is produced for each file not determined to be special.

    <c:file
      name = string
      content-type? = ContentType>
        (
      size = integer
      readable = boolean
      writable = boolean
      last-modified = dateTime
      hidden = boolean)*
    </c:file>

  • Any file or directory determined to be special by the p:directory-list step may be output using a c:other element but the criteria for marking a file as special are implementation-defined.

    <c:other
      name = string>
        (
      size = integer
      readable = boolean
      writable = boolean
      last-modified = dateTime
      hidden = boolean)*
    </c:other>

Each of the elements c:file, c:directory, and c:other has a name attribute, whose value is a relative IRI reference, giving the (local) file or directory name.

Each of these elements also contains the corresponding resource’s URI in an xml:base attribute, which may be a relative URI for any but the top-level c:directory element. In the case of c:directory, it must end in a trailing slash. This way, users will always be able to compute the absolute URI for any of these elements by applying fn:base-uri() to it.

2.1. Directory list details

If detailed is false, then only the name and xml:base attributes are expected on c:file, c:directory, or c:other elements.

If detailed is true, then the pipeline author is expecting additional details about each entry. The following attributes should be provided by the implementation:

content-type

The content-type attribute contains the content type of the respective file. The value “application/octet-stream” will be used if the processor is not able to identify another content type.

readable

“true” if the entry is readable.

writable

“true” if the entry is writable.

hidden

“true” if the entry is hidden.

last-modified

The last modification time of the entry, expressed as a lexical xs:dateTime in UTC.

size

The size of the entry in bytes.

The precise meaning of these properties is implementation-defined and may vary according to the URI scheme of the path. If the value of an attribute is “false” or if it has no meaningful value, the attribute may be omitted.

Any other attributes on c:file, c:directory, or c:other are implementation-defined.
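
For illustration only, a detailed listing of a directory containing a single file might resemble the following sketch. All attribute values are invented for this example, and the attributes actually reported depend on the implementation and the URI scheme.

```xml
<!-- Illustrative only: every attribute value below is invented. -->
<c:directory xmlns:c="http://www.w3.org/ns/xproc-step"
             xml:base="file:///home/jane/" name="jane"
             readable="true" writable="true"
             last-modified="2024-01-15T09:30:00Z">
  <!-- hidden is omitted here because its value would be "false" -->
  <c:file xml:base="file.txt" name="file.txt"
          content-type="text/plain" size="1024"
          readable="true" writable="true"
          last-modified="2024-01-15T09:30:00Z"/>
</c:directory>
```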

Document properties

Besides the content-type property, the resulting document has a base-uri property. Its value is identical to the top-level element’s xml:base attribute, that is, to the directory’s URI.