wickensonline.co.uk Retrochallenge 2009 Winter Warmup Entry Documentation
The HTML documentation for this Retrochallenge entry has all been created using
the VAXstation 4000/90 running OpenVMS. I had a clear idea early on that I
wanted to use the DIGITAL EQUIPMENT CORPORATION (DEC) applications software
package ALLIN1 to create the documentation. ALLIN1 is a terminal-based office
package, the word processing part of which is WPS-PLUS. ALLIN1 is described in
detail in another diary entry.
WPS-PLUS Editing Screen
The first step was to determine the output formats supported by WPS-PLUS.
There were three main options: PostScript, DEC/ANSI escaped text and plain text.
I wanted to include the bold and underlined text, so plain text was out. I looked
into the possibility of converting the PostScript into HTML. The first step was
to convert the PostScript to PDF. There is a utility called pstopdf that runs
on top of the Ghostscript engine and produces very good quality PDF from the
PostScript produced by ALLIN1. The next step, converting the PDF to HTML, can be
performed by various commercial tools on the internet, including an online
service provided by Adobe. I tried this with the PDF produced by pstopdf
(running on Linux), but the service would not recognize the file as a PDF.
The last alternative was to print to a DEC/ANSI escaped text file. This file has
embedded ANSI control sequences to provide indent and formatting instructions.
The typical escape sequences used within the file produced are:
ESC [1m Format text in bold
ESC [4m Format text in underline
ESC [m Turn off all formatting
ESC [37C Move the cursor right by the number of columns given between the [ and the C (here 37)
ESC (<# A British pound symbol
ESC (B Selects the character set to use
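As a rough illustration of the mapping involved, here is a minimal sketch in Python (not DECTPU, and not the actual ESCTOHTML code) that turns the bold, underline and reset sequences listed above into HTML tags. The function name sgr_to_html and the choice of <b> and <u> tags are assumptions made for this example. It also assumes the space shown after ESC in the list is only there for readability, so the file contains ESC immediately followed by the bracket.

```python
import re

# Matches ESC [ <digits> m, e.g. ESC[1m, ESC[4m or the bare reset ESC[m.
SGR = re.compile(r"\x1b\[(\d*)m")

# Hypothetical mapping chosen for this sketch: 1 = bold, 4 = underline.
TAGS = {"1": "b", "4": "u"}


def sgr_to_html(text: str) -> str:
    """Replace bold/underline/reset escape sequences with HTML tags."""
    out = []
    open_tags = []  # stack of tags opened so far, innermost last
    pos = 0
    for match in SGR.finditer(text):
        out.append(text[pos:match.start()])
        pos = match.end()
        code = match.group(1)
        if code in TAGS and TAGS[code] not in open_tags:
            open_tags.append(TAGS[code])
            out.append(f"<{TAGS[code]}>")
        elif code in ("", "0"):
            # Reset: close everything that is open, innermost first.
            while open_tags:
                out.append(f"</{open_tags.pop()}>")
        # Any other attribute code is silently dropped in this sketch.
    out.append(text[pos:])
    while open_tags:  # close tags left open at end of input
        out.append(f"</{open_tags.pop()}>")
    return "".join(out)
```

Keeping a stack of open tags means a single reset sequence closes both bold and underline in the right order, which keeps the generated HTML well nested.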
You can TYPE (OpenVMS) or cat (Linux) this file, in DEC/ANSI escaped format,
to your terminal and the terminal should render it with the margins and
bold/underlined text. UNIX man pages have a similar level of formatting.
Under OpenVMS, the obvious choice of language to use to write an application to
convert between the ANSI escape sequences and HTML constructs is DECTPU. The
manual describes it as follows:
DECTPU is a high-performance, programmable, text processing utility that
includes the following:
o A high-level procedural language
o A compiler
o An interpreter
o Text manipulation routines
o Integrated display managers for the character-cell terminal and
o The Extensible Versatile Editor (EVE) interface, which is written
DECTPU is a procedural programming language that enables text
processing tasks; it is not an application.
I modified an example from the excellent manual and after a couple of hours had
the basics of a working solution. The ESCTOHTML utility is called with the
following DCL procedure (DCL is the DIGITAL Command Language, the equivalent of the
shell environment under UNIX or CMD in Windows):
$ SET VERIFY
$! This command procedure invokes DECTPU without an editor.
$! The file ESCTOHTML.TPU contains the edits to be made.
$! Specify the file to which you want the edits made as p1.
$! Determine .HTML version of filename
$ file = f$parse("''p1'","","","NAME")
$ out_name = "''file'.html"
$ EDIT/TPU/NOSECTION/COMMAND=esctohtmldir:esctohtml.tpu -
This script uses the lexical function f$parse to get the basename of the
filename specified as the first argument, then appends .HTML to get the output
filename. The TPU program esctohtml.tpu is called in the last line.
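For readers more familiar with other environments, the filename step can be approximated as follows. This is a simplified Python sketch, not a reimplementation of f$parse: the real lexical function also translates logical names and applies defaults, and the helper names vms_name_field and html_output_name are invented for this example.

```python
import re


def vms_name_field(filespec: str) -> str:
    """Return only the name part of a VMS-style file specification.

    Simplified: strips everything up to the last ':' , ']' or '>'
    (node, device, directory), then the version and the file type.
    """
    rest = re.sub(r"^.*[:\]>]", "", filespec)
    rest = rest.split(";", 1)[0]   # drop a ";version" suffix
    return rest.split(".", 1)[0]   # drop the ".type" (and any ".version")


def html_output_name(filespec: str) -> str:
    """Build the output filename the same way the DCL procedure does."""
    return vms_name_field(filespec) + ".html"
```

With these assumptions, a specification such as DKA0:[DOCS]ENTRY.VT;3 yields ENTRY.html.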
The TPU program consists of a number of defined procedures and a default
section of commands that forms the main part of the program. The following code
is an excerpt from the replace_spaces() procedure that converts an ESC [37C
DEC/ANSI escape code into a series of HTML non-breaking spaces:
search_pattern := find_opening + UNANCHOR + find_closing;
! Search returns a range if found
src_range := SEARCH (search_pattern, FORWARD);
text_count := SUBSTR (src_range, LENGTH(find_opening)+1, LENGTH(src_range) -
number_of_spaces := INT(text_count);
ERASE (src_range); ! Remove first string
POSITION (END_OF (src_range)); ! Move to right place
space_index := 0;
EXITIF space_index = number_of_spaces;
space_index := space_index + 1;
replacement_count := replacement_count + 1;
This gives you a feel for the language. The loop uses the built-in procedure
SEARCH() to find a search pattern that is effectively a regular expression (the
UNANCHOR keyword is equivalent to a '.*' regular expression).
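The same idea can be sketched with a conventional regular expression. The Python below is an illustrative equivalent rather than a translation of the TPU procedure. It assumes the sequence appears in the file as ESC, an opening bracket, a decimal count and a capital C, and it returns how many sequences were replaced, which may differ from what the original replacement_count tracks.

```python
import re

# ESC [ <count> C : cursor forward by <count> columns.
CURSOR_FORWARD = re.compile(r"\x1b\[(\d+)C")


def replace_spaces(text: str):
    """Replace each cursor-forward sequence with that many &nbsp; entities.

    Returns the converted text and the number of sequences replaced.
    """
    sequences_replaced = 0

    def expand(match):
        nonlocal sequences_replaced
        sequences_replaced += 1
        return "&nbsp;" * int(match.group(1))

    return CURSOR_FORWARD.sub(expand, text), sequences_replaced
```

Matching digits only, rather than any run of characters as a '.*' style pattern would, avoids swallowing text between an unrelated opening bracket sequence and a later capital C.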
I also wanted to be able to process embedded HTML hyperlinks and image
definitions. Rather than define a separate mini-language to encode hyperlinks
(such as by using square brackets, e.g.: [ALLIN1,allin1.html]), I decided to use
a sequence of characters (a curly brace and a vertical bar) in place of the angle brackets
'<' and '>' normally used to enclose HTML tags. This gives me the
flexibility to use any HTML I require but also allows me to include code
snippets that contain angle brackets. The only problem of course is including
the characters in a description of the functionality.
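A substitution of this kind might look like the Python sketch below. The exact character sequences are not spelled out above, so the opening sequence (left curly brace then vertical bar) and closing sequence (vertical bar then right curly brace) used here are assumptions, as is the step that escapes literal angle brackets so code snippets display verbatim. The function name restore_angle_brackets is invented for this example.

```python
def restore_angle_brackets(text: str, open_seq: str = "{|", close_seq: str = "|}") -> str:
    """Turn placeholder sequences into real HTML tag delimiters.

    open_seq and close_seq are hypothetical defaults, not confirmed
    by the original ESCTOHTML program.
    """
    # Escape literal angle brackets first so they show up as text.
    text = text.replace("<", "&lt;").replace(">", "&gt;")
    # Then convert the placeholder sequences into real tag delimiters.
    return text.replace(open_seq, "<").replace(close_seq, ">")
```

Doing the escaping before the substitution matters: reversing the order would also escape the tags that were just created.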
The ESCTOHTML.COM DCL procedure is called from a BUILD.COM script that converts
the VT files produced by ALLIN1 into HTML files, then prepends an HTML header
file and appends an HTML footer file to form the pages as seen here. The final
step in processing is to replace a page title keyword defined within the header
with the name of the HTML file. This is used in the navigation bar at the top of
the page and as a page header and description.
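The assembly step amounts to simple string concatenation plus one substitution. The Python sketch below illustrates it under stated assumptions: the keyword @PAGETITLE@ and the function name build_page are hypothetical, since the actual keyword used in the header file is not given here, and the real work is done by BUILD.COM in DCL.

```python
def build_page(header: str, body: str, footer: str, page_name: str,
               title_keyword: str = "@PAGETITLE@") -> str:
    """Wrap a converted body in the header and footer.

    title_keyword is a made-up placeholder; every occurrence of it in
    the header is replaced with the name of the HTML file.
    """
    return header.replace(title_keyword, page_name) + body + footer


# Example usage with tiny stand-in header and footer fragments.
page = build_page(
    header="<html><head><title>@PAGETITLE@</title></head><body>\n",
    body="<p>Converted text</p>\n",
    footer="</body></html>\n",
    page_name="documentation.html",
)
```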
The only obvious limitation I've found is that, within the endnotes, if the URL
spans more than one line (including the description), the line-end
processing corrupts the URL. The solution is to use the TinyURL service and
ensure that the description is short enough to fit on a line.
1. A description of ALLIN1
2. Adobe Web Conversion Service
6. Tiny URL Service