Retrochallenge 2009
      Mark Wickens
      27-Dec-2008 08:28
                                        Documentation

      The HTML documentation for this Retrochallenge entry has all been created using
      the VAXstation 4000/90 running OpenVMS. I had a clear idea early on that I
      wanted to use the Digital Equipment Corporation (DEC) applications software
      package ALLIN1 to create the documentation. ALLIN1 is a terminal-based office
      package, the word processing part of which is WPS-PLUS. ALLIN1 is described in
      detail in another diary entry[1].

                                   WPS-PLUS Editing Screen

      The first step was to determine the output formats supported by WPS-PLUS.
      There were three main options: PostScript, DEC/ANSI escaped text and plain text.
      I wanted to keep the bold and underlined text, so plain text was out. I looked
      into the possibility of converting the PostScript into HTML, starting by
      converting the PostScript to PDF. There is a utility called pstopdf that runs
      on top of the Ghostscript engine and produces very good quality PDF from the
      PostScript produced by ALLIN1. The next step, converting the PDF to HTML, can be
      performed by various commercial tools on the internet, including an online
      service provided by Adobe[2]. I tried this with the PDF produced by pstopdf
      (running on Linux), but the service would not recognize the file as a PDF.

      The last alternative was to print to a DEC/ANSI escaped text file. This file has
      embedded ANSI control sequences to provide indent and formatting instructions.
      The typical escape sequences used within the file produced are:

      ESC [1m   Format text in bold
      ESC [4m   Format text in underline
      ESC [m    Turn off all formatting
      ESC [37C  Move the cursor right by the number of columns between the [ and C
      ESC (<#   A British pound symbol
      ESC (B    Selects the character set to use

      You can TYPE (OpenVMS) or cat (Linux) this file[3], in DEC/ANSI escaped format,
      to your terminal, and it should be rendered with the margins and
      bold/underlined text. UNIX man pages have a similar level of formatting.
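
      As an illustration only, the mapping from the bold, underline and reset
      sequences to HTML can be sketched in Python. This is not the DECTPU program
      described in this entry. The function name sgr_to_html and the choice of
      <b> and <u> tags are assumptions made for the example.

```python
import re

# Matches ESC [ <digits> m, the bold/underline/reset sequences listed above.
SGR = re.compile(r"\x1b\[(\d*)m")


def sgr_to_html(text: str) -> str:
    """Replace ESC[1m, ESC[4m and ESC[m with HTML tags (hypothetical helper)."""
    open_tags = []  # stack of HTML tags that are currently open

    def repl(match):
        code = match.group(1)
        if code == "1" and "b" not in open_tags:
            open_tags.append("b")
            return "<b>"
        if code == "4" and "u" not in open_tags:
            open_tags.append("u")
            return "<u>"
        if code in ("", "0"):
            # Reset: close every open tag, innermost first.
            closing = "".join(f"</{tag}>" for tag in reversed(open_tags))
            open_tags.clear()
            return closing
        return ""  # any other code is dropped in this sketch

    converted = SGR.sub(repl, text)
    # Close anything still open at the end of the text.
    return converted + "".join(f"</{tag}>" for tag in reversed(open_tags))
```

      Keeping a stack of open tags means a single ESC [m closes bold and underline
      in the right order when both are active.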

      Under OpenVMS, the obvious choice of language for writing an application to
      convert ANSI escape sequences into HTML constructs is DECTPU. The
      manual describes it as follows:

      DECTPU is a high-performance, programmable, text processing utility that
      includes the following:

         o  A high-level procedural language
         o  A compiler
         o  An interpreter
         o  Text manipulation routines
         o  Integrated display managers for the character-cell terminal and
            DECwindows environments
         o  The Extensible Versatile Editor (EVE) interface, which is written
            in DECTPU

      DECTPU is a procedural programming language that enables text
      processing tasks; it is not an application.

      I modified an example from the excellent manual and after a couple of hours had
      the basics of a working solution. The ESCTOHTML utility is called with the
      following DCL procedure[4] (DCL is the DIGITAL Command Language, the
      equivalent of the shell environment under UNIX or CMD in Windows):

      _______________________________________________________________________________

      $ SET VERIFY
      $! This command procedure invokes DECTPU without an editor.
      $! The file ESCTOHTML.TPU contains the edits to be made.
      $! Specify the file to which you want the edits made as p1.
      $!
      $! Determine .HTML version of filename
      $ file = f$parse("''p1'","","","NAME")
      $ out_name = "''file'.html"
      $ EDIT/TPU/NOSECTION/COMMAND=esctohtmldir:esctohtml.tpu -
           /OUTPUT='out_name'/NODISPLAY 'p1'
      _______________________________________________________________________________

      This script uses the lexical function f$parse to extract the name field from
      the file specification given as the first argument (that is, the file name
      without device, directory, type or version), then appends .html to get the
      output filename. The TPU program esctohtml.tpu is invoked by the final
      EDIT/TPU command.
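
      As an illustration only (in Python rather than DCL), the file name handling
      can be sketched as follows. html_name is a hypothetical helper, and the
      sample file specification in the usage note is made up.

```python
import re


def html_name(filespec: str) -> str:
    """Derive an output name the way the DCL procedure does (hypothetical helper).

    Keeps only the name field of an OpenVMS-style file specification and
    appends ".html".
    """
    # Drop node, device and directory: keep what follows the last ':', ']' or '>'.
    name_type_version = re.split(r"[:\]>]", filespec)[-1]
    # Drop the ";version" suffix, then the ".type" suffix.
    name = name_type_version.split(";")[0].split(".")[0]
    return name + ".html"
```

      For example, html_name("DKA0:[MARK]DOCUMENTATION.VT;3") gives
      "DOCUMENTATION.html".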

      The TPU program[5] consists of a number of defined procedures and a default
      section of commands that forms the main part of the program. The following code
      is an excerpt from the replace_spaces() procedure that converts an ESC[37C
      DEC/ANSI escape code into a series of HTML non-breaking spaces:

      _______________________________________________________________________________

      LOOP
         search_pattern := find_opening + UNANCHOR + find_closing;
         ! Search returns a range if found
         src_range := SEARCH (search_pattern, FORWARD);  

         text_count := SUBSTR (src_range, LENGTH(find_opening)+1, LENGTH(src_range) -
                       LENGTH(find_closing)-2);

         MESSAGE(text_count);
         number_of_spaces := INT(text_count);
         ERASE (src_range);                           ! Remove first string
         POSITION (END_OF (src_range));               ! Move to right place
         space_index := 0;

         LOOP
            EXITIF space_index = number_of_spaces;
            COPY_TEXT (replace_string);
            space_index := space_index + 1;
         ENDLOOP;

         replacement_count := replacement_count + 1;
      ENDLOOP;
      _______________________________________________________________________________

      This gives you a feel for the language. The loop uses the built-in procedure
      SEARCH() to find a pattern that works much like a regular expression (the
      UNANCHOR keyword is roughly equivalent to a '.*' regular expression).
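
      For comparison, the same replacement can be sketched with a regular
      expression in Python. This is an illustration, not a translation of the TPU
      procedure. The default "&nbsp;" for replace_string is an assumption based
      on the description above.

```python
import re

# ESC [ <count> C moves the cursor right by <count> columns.
CURSOR_FORWARD = re.compile(r"\x1b\[(\d+)C")


def replace_spaces(text: str, replace_string: str = "&nbsp;"):
    """Expand each ESC[nC into n copies of replace_string.

    Returns the converted text and the number of sequences replaced.
    """
    replacement_count = 0

    def repl(match):
        nonlocal replacement_count
        replacement_count += 1
        number_of_spaces = int(match.group(1))
        return replace_string * number_of_spaces

    converted = CURSOR_FORWARD.sub(repl, text)
    return converted, replacement_count
```

      The capture group plays the role of the SUBSTR call in the TPU excerpt: it
      picks out the digits between the opening and closing parts of the sequence.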

      I also wanted to be able to process embedded HTML hyperlinks and image
      definitions. Rather than define a separate mini-language to encode hyperlinks
      (such as by using square brackets, e.g.: [ALLIN1,allin1.html]), I decided to use
      a two-character sequence of curly brace and vertical bar in place of the angle
      brackets '<' and '>' normally used to enclose HTML tags. This gives me the
      flexibility to use any HTML I require but also allows me to include code
      snippets that contain angle brackets. The only problem, of course, is including
      those characters in a description of the functionality.
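
      A minimal Python sketch of this substitution follows. The exact marker
      sequences are not spelled out here, so the defaults "{|" and "|}" are
      assumptions. So is the step that escapes literal angle brackets first.
      restore_tags is a hypothetical name.

```python
import html


def restore_tags(text: str, open_marker: str = "{|", close_marker: str = "|}") -> str:
    """Turn marker pairs into HTML angle brackets (hypothetical helper).

    Literal '<', '>' and '&' in the text are escaped first, so code snippets
    survive, then the assumed markers are turned into real tag delimiters.
    """
    escaped = html.escape(text, quote=False)
    return escaped.replace(open_marker, "<").replace(close_marker, ">")
```

      With these assumed markers, '{|a href="allin1.html"|}ALLIN1{|/a|}' becomes a
      normal HTML anchor, while 'a < b' is emitted as 'a &lt; b'.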

      The ESCTOHTML.COM DCL procedure is called from a BUILD.COM script that converts
      the VT files produced by ALLIN1 into HTML files, then prepends an HTML header
      file and appends an HTML footer file to form the pages as seen here. The final
      step in processing is to replace a page title keyword defined within the header
      with the name of the HTML file. This is used in the navigation bar at the top of
      the page and as a page header and description.
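
      The assembly step can be sketched in Python as follows. This is an
      illustration only, not BUILD.COM itself. The keyword "%%TITLE%%" and the
      function name assemble_page are assumptions, since the real keyword is not
      given here.

```python
def assemble_page(header: str, body: str, footer: str, page_name: str,
                  title_keyword: str = "%%TITLE%%") -> str:
    """Wrap a converted body in the header and footer (hypothetical helper).

    Every occurrence of the assumed title keyword in the header is replaced
    with the page name before the three parts are joined.
    """
    return header.replace(title_keyword, page_name) + body + footer
```

      Replacing the keyword only in the header leaves any matching text in the
      body untouched.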
      
                                         Limitations

      The only obvious limitation I've found is that, within the endnotes, if the URL
      spans more than one line (including the description) then the line-end
      processing corrupts the URL. The workaround is to use the TinyURL[6] service and
      ensure that the description is short enough to fit on a line.

      
      ENDNOTES

      1. A description of ALLIN1

      2. Adobe Web Conversion Service

      3. documentation.vt

      4. esctohtml.com

      5. esctohtml.tpu

      6. Tiny URL Service