WASD Web Environment

2. Document Access and Specification

2.1 Document Content Type
2.2 Explicitly Specifying Content-Type
2.3 Document Specification
2.3.1 Absolute File Path
2.3.2 Partial (or Relative) File Path
2.4 Extended File Specifications (ODS-5)
2.4.1 Characters In Request Paths
2.4.2 Characters In Server-Generated Paths
2.4.3 Document Cache

Arbitrary documents may not be accessed.

The server can only access files whose path is permitted by the set of rules configured within the web environment.

Documents must be read-accessible.

The server can only access files that are world readable, or that have an ACL specifically controlling access for "HTTP$SERVER", the server account.

2.1 Document Content Type

Document (file) retrieval is initiated by providing the server with the file specification as a URL path. Server configuration determines the format in which the file is returned to the client. It may contain text or images immediately displayable by the browser, or a viewer external to the browser may be spawned. The server may automatically activate a script to provide a gateway to non-native information (see description of [AddType] configuration directive in the Technical Overview). The file type (extension) determines the content type with which the server returns (and/or interprets) the file.

The following table lists some of the current file types, as examples, and their associated MIME-style content types. HTML documents are presented laid out according to the full HTML capabilities of the browser. Plain-text documents are presented in a fixed-font format. Other types require an external viewer to be activated.

Type        Description                              Content-Type
.BKB        Bookreader document (BNU)                text/html, gateway script activated
.BKS        Bookreader shelf (BNU)                   text/html, gateway script activated
.C          C source                                 text/plain
.COM        DCL procedure                            text/plain
.CONF       configuration file                       text/plain
.CPP        C++ source                               text/plain
.DECW$BOOK  Bookreader document                      text/html, gateway script activated
.FOR        Fortran source                           text/plain
.GIF        GIF image                                image/gif
.H          C header                                 text/plain
.HLB        VMS Help library                         text/html, gateway script activated
.HTML       HyperText Markup Language                text/html
.HTM        HyperText Markup Language                text/html
.JPG        JPEG image                               image/jpeg
.LIS        Listing                                  text/plain
.MAR        Macro source                             text/plain
.PAS        Pascal source                            text/plain
.PRO        IDL source                               text/plain
.PS         PostScript                               application/PostScript
.TEXT       Text                                     text/plain
.TLB        VMS text library                         text/html, gateway script activated
.TXT        Text                                     text/plain
.SHTML      HyperText Markup Language pre-processed  text/html
.ZIP        zipped file                              application/binary

If other file types need to be defined, contact the Web administrator.
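
The lookup described in this section can be pictured as a simple mapping from file type to content type. The following Python fragment is only an illustrative sketch, not the server's implementation: the dictionary holds a handful of entries copied from the table, and the function name `content_type_for` is hypothetical.

```python
import os

# A few entries copied from the table above (illustrative subset only).
CONTENT_TYPES = {
    ".C": "text/plain",
    ".COM": "text/plain",
    ".GIF": "image/gif",
    ".HTML": "text/html",
    ".HTM": "text/html",
    ".JPG": "image/jpeg",
    ".PS": "application/PostScript",
    ".TXT": "text/plain",
    ".ZIP": "application/binary",
}


def content_type_for(path, default=None):
    """Return the content type for a URL path based on its file type.

    The comparison is case-insensitive; unknown types yield `default`.
    """
    _, extension = os.path.splitext(path)
    return CONTENT_TYPES.get(extension.upper(), default)
```

A path with an unconfigured file type, such as "file.unknown", returns the default, which is the situation the next section addresses.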

2.2 Explicitly Specifying Content-Type

When accessing files it is possible to explicitly specify the identifying content-type to be returned to the browser in the HTTP response header. Of course this does not change the actual content of the file, just the header content-type! This is primarily provided to allow access to plain-text documents that have obscure, non-"standard" or non-configured file extensions.

It could also be used for other purposes, "forcing" the browser to accept a particular file as a particular content-type. This can be useful if the extension is not configured (as mentioned above) or in the case where the file contains data of a known content-type but with an extension conflicting with an already configured extension specifying data of a different content-type.

Enter the file path into the browser's URL specification field ("Location:", "Address:"). Then, for plain-text, append the following query string:

?httpd=content&type=text/plain

For example, for the file

/wasd_root/wasdoc/env/file.unknown

the complete request path becomes

/wasd_root/wasdoc/env/file.unknown?httpd=content&type=text/plain

For another content-type substitute it appropriately. For example, to retrieve a text file in binary (why I can't imagine :^) use

?httpd=content&type=application/octet-stream

It is also possible to "force" the content-type for all files in a particular directory. See 3.3.14 Specifying Content-Type.
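
Where such URLs are generated by a program rather than typed by hand, the query string can be built mechanically. The following Python sketch assumes nothing beyond the query string format shown above; the helper name `with_content_type` is hypothetical.

```python
from urllib.parse import urlencode


def with_content_type(path, content_type):
    """Append the '?httpd=content&type=...' query string to a URL path."""
    # safe="/" keeps the slash in the content type readable (text/plain).
    query = urlencode({"httpd": "content", "type": content_type}, safe="/")
    return f"{path}?{query}"
```

For instance, calling `with_content_type("/wasd_root/wasdoc/env/file.unknown", "text/plain")` produces the plain-text request path used in the example above.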

Ignored Content-Type

Even then some browsers and/or some operating systems and/or some version combinations insist on ignoring the response header specified content-type and instead seem to second-guess (often incorrectly) based on the file name extension. A common example is the content of DCL procedures on Windows and up-until-fairly-recent versions of Internet Explorer.

Faux Extension

Notwithstanding, if a '$' and then a second extension is appended to the URI this is often sufficient to coerce the browser into displaying the content associated with the bogus extension. In the case of DCL procedure access on a Windows platform try using "$.txt", or for other purposes whatever extension fits the requirement, as in the following examples.

/wasd_root/src/build_all.com$.txt
/wasd_root/src/build_all.com$.anythingatall

WASD specially handles a URI in this format when the requested resource is not found by internally stripping the "$." extension and attempting to access the resultant file name again. This technique works for file based resources and not for scripts, etc.
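
The stripping step can be sketched as follows. This is a simplified Python illustration of the idea only, not the server's code: it shows just the removal of a trailing "$." extension, not the preceding "not found" check, and the function name `strip_faux_extension` is hypothetical.

```python
def strip_faux_extension(path):
    """Remove a trailing '$.ext' faux extension from a URL path, if present."""
    head, separator, tail = path.rpartition("$.")
    # Only strip when "$." was found, something precedes it, and it sits in
    # the final path element (no "/" after it).
    if separator and head and "/" not in tail:
        return head
    return path
```

A path without a faux extension is returned unchanged.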

2.3 Document Specification

For the "http:" protocol, file and directory locations are specified using URL path syntax where slash-separated ("/") elements delineate a hierarchy leading to a data item. Anyone familiar with the syntax of the Unix file system, or the MS-DOS file system (where back-slashes are hierarchy delimiters), will feel at home with URL syntax. Specifications under VMS are not case-sensitive.

A VMS directory specification

WEB:[TECHNICAL.HTML-PRIMER]
would be represented in URL syntax as
/web/technical/html-primer/
and a VMS file specification
WEB:[TECHNICAL.HTML-PRIMER]HTML-PRIMER.HTML
represented as
/web/technical/html-primer/html-primer.html

Master file directory [000000]

It is not required (although not forbidden) to supply a VMS master file directory component ("[000000]", "[000000.", etc.) in a URL specification. Hence the file specification
WEB:[000000]HOME.HTML
should be represented as
/web/home.html
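
The correspondence between the two syntaxes can be sketched in Python. This is a deliberately simplified illustration, assuming a specification of the form DEVICE:[DIR.SUBDIR]NAME.TYPE with no version number and no escaped characters; the function name `vms_to_url_path` is hypothetical, and the result is lower-cased purely for presentation since specifications are not case-sensitive.

```python
def vms_to_url_path(spec):
    """Convert a simple VMS file specification to URL path syntax."""
    device, _, rest = spec.partition(":")
    directory, _, filename = rest.partition("]")
    directory = directory.lstrip("[")
    # Drop empty elements and the master file directory component.
    elements = [d for d in directory.split(".") if d and d != "000000"]
    return "/" + "/".join([device] + elements).lower() + "/" + filename.lower()
```

Applied to the three specifications above it yields the same URL paths as shown in the text.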

2.3.1 Absolute File Path

A file may be specified using an absolute, or full path. This must specify the location of the file exactly. Absolute paths always begin with a forward-slash ("/"). For example:

/web/committee/minutes/1994/1994-09-27.txt
/web/committee/constitution.txt
/web/committee/membership/fred-bloggs.txt

2.3.2 Partial (or Relative) File Path

(Strictly speaking, it is a function of the client to construct a full URL from such a relative URL before sending the request to the server.)

A file may be specified relative to its current location. That is, a current document (or menu) may specify another document file relative to itself. This may be at the current level, a subdirectory, or in another part of the directory tree related to the current. Relative paths never begin with forward-slash ("/").

For example, documents at the same level as the current may be specified without any hierarchy being indicated:

1994-07-22.txt
1994-08-24.txt
1994-09-27.txt

Documents at an inferior point in the hierarchy may be specified as in the following example:

1993/1993-02-17.txt
1993/reports/membership.txt
other/etc.txt

Documents in a related part of the hierarchy may be referenced using the "../" construct. As with MS-DOS and Unix this syntax indicates the immediately superior directory.

../other_committee/1993/1993-02-17.txt
../other_committee/1993/reports/balance-sheet.txt
../../other_section/committee/constitution.txt
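
How a client turns such a relative path into a full URL can be illustrated with Python's standard `urljoin` function. In this sketch the host name and the choice of current document are assumptions made purely for the example.

```python
from urllib.parse import urljoin

# Hypothetical URL of the current document (the host name is an assumption).
CURRENT = "http://www.example.com/web/committee/minutes/1994/1994-09-27.txt"


def resolve(relative, current=CURRENT):
    """Resolve a relative path against the current document, as a client would."""
    return urljoin(current, relative)


print(resolve("1994-08-24.txt"))
print(resolve("1993/1993-02-17.txt"))
print(resolve("../other_committee/1993/1993-02-17.txt"))
```

Each "../" removes one directory level from the current document's location before the remainder of the path is appended.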

2.4 Extended File Specifications (ODS-5)

OpenVMS Alpha V7.2 introduced a new on-disk file system structure, ODS-5. This brings to VMS in general, and WASD and other Web servers in particular, a number of issues regarding the handling of characters previously not encountered during (ODS-2) file system activities.

2.4.1 Characters In Request Paths

There is a standard for characters used in HTTP request paths and query strings (URLs). This includes conventions for the handling of reserved characters, for example "?", "+", "&", "=" that have specific meanings in a request, characters that are completely forbidden, for example white-space and control characters (0x00 to 0x1f), and others that have usages by convention, for example the "~", commonly used to indicate a username mapping. The request can otherwise contain these characters provided they are URL-encoded (i.e. a percent symbol followed by two hexadecimal digits representing the hexadecimal-encoded character value).

There is also an RMS standard for handling characters in extended file specifications, some of which are forbidden in the ODS-2 file naming conventions, and others which have a reserved meaning to either the command-line interpreter (e.g. the space) or the file system structure (e.g. the ":", "[", "]" and "."). Generally the allowed but reserved characters can be used in ODS-5 file names if escaped using the "^" character. For example, the ODS-2 file name "THIS_AND_THAT.TXT" could be named "This^_^&^_That.txt" on an ODS-5 volume. More complex rules control the use of character combinations with significance to RMS, for instance multiple periods. The following file name is allowed on an ODS-5 volume, "A-GNU-zipped-TAR-archive^.tar.gz", where the non-significant period has been escaped making it acceptable to RMS.

The WASD server will accept request paths for file specifications in both formats, URL-encoded and RMS-escaped. Of course characters absolutely forbidden in request paths must still be URL-encoded, the most obvious example being the space. RMS will accept the file name "This^ and^ that.txt" (i.e. containing escaped spaces) but the request path would need to be specified as "This%20and%20that.txt", or possibly "This^%20and^%20that.txt" although the RMS escape character is basically redundant.
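
URL-encoding itself is easily demonstrated with Python's standard library. The following is a small sketch of the general encoding and decoding, not anything specific to WASD; the file names are taken from the examples in this section.

```python
from urllib.parse import quote, unquote

# Encode a file name containing spaces for use in a request path.
encoded = quote("This and that.txt")
print(encoded)

# Decode it again, as the receiving end would.
print(unquote(encoded))

# Reserved characters such as "&" are encoded as well; "/" is left alone.
print(quote("/web/This & That.txt"))
```

Each forbidden or reserved character is replaced by a percent symbol and its two-digit hexadecimal value, so the space becomes "%20" and the ampersand "%26".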

Unlike ODS-2 volumes, ODS-5 volumes do not have "invalid" characters, so the server performs no processing on request paths to ensure RMS compliance.

2.4.2 Characters In Server-Generated Paths

When the server generates a path to be returned to the browser, either in a viewable page such as a directory listing or error message, or as a part of the HTTP transaction such as a redirection, the path will contain the URL-encoded equivalent of the canonical form of an extended file specification escaped character. For example, the file name "This^_and^_that.txt" will be represented by "This%20and%20that.txt".
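
The conversion from an RMS-escaped file name to its URL-encoded equivalent can be sketched as below. This Python fragment is a simplified illustration under stated assumptions, not the server's algorithm: it treats "^_" as an escaped space and any other "^" followed by a character as that literal character, and does not handle other escape forms. The function names are hypothetical.

```python
from urllib.parse import quote


def rms_unescape(name):
    """Remove RMS '^' escapes from a file name (simplified subset)."""
    result = []
    i = 0
    while i < len(name):
        if name[i] == "^" and i + 1 < len(name):
            following = name[i + 1]
            # "^_" stands for a space; otherwise keep the escaped character.
            result.append(" " if following == "_" else following)
            i += 2
        else:
            result.append(name[i])
            i += 1
    return "".join(result)


def rms_to_url(name):
    """Return the URL-encoded equivalent of an RMS-escaped file name."""
    return quote(rms_unescape(name))
```

With these assumptions `rms_to_url("This^_and^_that.txt")` gives the same "This%20and%20that.txt" as in the example above.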

When presenting a file name in a viewable page the general rule is to also provide this URL-equivalent of the unescaped file name, with a small number of exceptions. The first is a directory listing where VMS format has been requested by including a version component in the request file specification. The second is in similar fashion, but with the tree facility, displaying a directory tree. The third is in the navigation page of the UPDate menu. In all of these instances the canonical form of the extended file specification is presented (although any actual reference to the file is URL-encoded as described above).

2.4.3 Document Cache

The Web server is most commonly set up to cache static documents (files). A cache is higher speed storage, in-memory, in the server itself. Cached documents are checked periodically for changes when being requested. Changes to a file are determined by comparing the modification date/time and file length. A common check period is one minute, though it can be set longer or even disabled. If a document has changed the old one is discarded from cache (called invalidation) and the new one loaded into cache while being transferred to the client.

After making changes to a document it is possible the server will continue to serve the old one for a short period. This can be overridden by using the browser's Reload facility. This directs the server to go and check the on-disk file regardless, invalidating it if necessary.
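
The behaviour described in this section can be modelled with a short Python sketch. It is a toy illustration only, not the server's cache: the class name `DocumentCache`, its parameters, and the `reload` flag standing in for a browser Reload request are all assumptions made for the example.

```python
import os
import time


class DocumentCache:
    """Toy in-memory file cache validated by modification time and length."""

    def __init__(self, check_period=60.0, clock=time.monotonic):
        self.check_period = check_period  # seconds between on-disk checks
        self.clock = clock
        self.entries = {}  # path -> (content, mtime_ns, size, last_checked)

    def _load(self, path):
        status = os.stat(path)
        with open(path, "rb") as handle:
            content = handle.read()
        self.entries[path] = (content, status.st_mtime_ns,
                              status.st_size, self.clock())
        return content

    def get(self, path, reload=False):
        entry = self.entries.get(path)
        if entry is None:
            return self._load(path)
        content, mtime_ns, size, last_checked = entry
        # Check the on-disk file if the period has expired or on reload.
        if reload or self.clock() - last_checked >= self.check_period:
            status = os.stat(path)
            if (status.st_mtime_ns, status.st_size) != (mtime_ns, size):
                return self._load(path)  # invalidate and load the new one
            self.entries[path] = (content, mtime_ns, size, self.clock())
        return content
```

Within the check period a changed file is still served from cache; passing `reload=True` forces the comparison and, if the file differs, replaces the cached copy.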