John Hurst: Document Technology Page

Document Technology Interests

XML

I started using XML in 1999, having been intrigued some time before by the possibilities of using it for Literate Programming, but became a little frustrated with the tools available. When a computer scientist is faced with this situation, what does he or she do? Write your own!

AXE -- Ajh's Xml Engine

I liked the possibilities of XML, but found that none of the tools I could lay my hands on did what I wanted them to do! So AXE is an attempt to provide a general-purpose XML-to-anything translation mechanism. Here's what it will do.

The following section is taken straight from the program's literate code.

User Manual for AXE

Running AXE

The AXE program converts XML source documents to TeX or HTML documents. To use the program, type

axe -t html file.xml 
which will convert source document file.xml to the target document file.html. Alternatively,
axe -t tex file.xml 
will convert file.xml to the target document file.tex, which may then be processed by a TeX processor.

The source document is required only to be well-formed in the XML sense. The translations are defined in an external file, called the translation file. The name of the translation file is formed by combining the document type (the name of the root element of the source XML document) with the target format (currently "tex" or "html"). This file is retrieved from a search path specified by the environment variable XMLLIB.

For example, if XMLLIB contains the string .:~/lib/xml, and axe is invoked with

axe -t html fred.xml

where fred.xml contains

<article>
  <section><title>Fred's Article</title>
    <p>Blah, blah, blah, ...</p>
  </section>
</article>

then the translation file is named article.html. If it exists in the current directory (the "." entry in XMLLIB), that file is read for the translation definitions; otherwise it is looked up in the directory ~/lib/xml. If it does not exist there either, it is an error.
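The lookup described above can be sketched in a few lines of Python. This is an illustration only, not AXE's own code (AXE is a Perl program): the function name find_translation_file is hypothetical, and the sketch assumes that XMLLIB entries are separated by colons and tried in order.

```python
import os
from typing import Optional


def find_translation_file(doctype: str, target: str, xmllib: str) -> Optional[str]:
    """Return the first '<doctype>.<target>' found on an XMLLIB-style path, or None."""
    filename = f"{doctype}.{target}"              # e.g. "article.html"
    for directory in xmllib.split(":"):           # assumed: colon-separated, tried in order
        candidate = os.path.join(os.path.expanduser(directory or "."), filename)
        if os.path.isfile(candidate):
            return candidate
    return None                                   # AXE treats this case as an error
```

With XMLLIB set to .:~/lib/xml, calling find_translation_file("article", "html", os.environ["XMLLIB"]) would try ./article.html first and then ~/lib/xml/article.html.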

Writing the Translation Files

Entries in the translation file fall into three main groups:

  1. Comments
  2. Include commands
  3. Translation commands

We consider each of these in the sections below. The model used for translation is that every element in the document is broken down into start, content, and end, each of which has an appropriate translation, called the prefix, inner, and postfix components respectively. The inner translation is defined by the nested content, and is recursively translated, but the prefix and postfix translations are specified directly.

For example, if the document to be translated consists of

<uri href="there">Go There</uri>

and the translation for the tag uri is defined as

<A HREF="@@<href>">.^^.</A>

then the translated value is

<A HREF="there">Go There</A>

If the text Go There instead contained other elements, these would be translated according to their own rules. This is indicated by the inner content flag ^^, which marks where the element content is translated and inserted.
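The prefix, inner, postfix model can be illustrated with a small recursive sketch in Python. It is not how AXE is implemented: the RULES table, the em rule, and the "{href}" placeholder (standing in for @@<href>) are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: tag -> (prefix, postfix).
# "{href}" stands in for the @@<href> attribute reference.
RULES = {
    "uri": ('<A HREF="{href}">', "</A>"),
    "em": ("<I>", "</I>"),
}


def translate(element: ET.Element) -> str:
    """Emit prefix + recursively translated content + postfix for one element."""
    prefix, postfix = RULES.get(element.tag, ("", ""))
    inner = element.text or ""
    for child in element:
        # Nested elements are translated by their own rules, then any
        # text following the child is appended unchanged.
        inner += translate(child) + (child.tail or "")
    return prefix.format(**element.attrib) + inner + postfix


doc = ET.fromstring('<uri href="there">Go <em>There</em></uri>')
print(translate(doc))   # prints <A HREF="there">Go <I>There</I></A>
```

The position of inner between the prefix and postfix plays the role of the ^^ flag: it is the point at which the translated content is inserted.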

Comments

Comments are indicated by a leading non-blank character of `#'. Subsequent text up to the end of line is ignored.

Include Commands

Include commands have the form:

include filename

where filename is a relative or absolute filename. If relative, it is searched for in the current directory, i.e., the directory in which axe was invoked, and not in any directory visited by other includes or in the directory containing the including file. (Note that it is intended to change this rule.)
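As a rough illustration of the comment and include rules, the following Python sketch classifies a single line of a translation file. The function name and the returned labels are hypothetical, and the sketch assumes an include command occupies a line of its own; everything else is simply labelled as translation text.

```python
import os


def classify_line(line: str, invocation_dir: str):
    """Classify one translation-file line as blank, comment, include, or translation text."""
    stripped = line.strip()
    if not stripped:
        return ("blank", None)
    if stripped.startswith("#"):
        return ("comment", None)              # ignored up to the end of the line
    words = stripped.split(None, 1)
    if words[0] == "include" and len(words) == 2:
        # Relative names resolve against the directory axe was invoked in;
        # os.path.join leaves an absolute filename unchanged.
        return ("include", os.path.join(invocation_dir, words[1]))
    return ("translation", stripped)
```

For example, classify_line("include common.html", "/work") resolves the name against /work, regardless of which file contains the include command.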

Translation Commands

These define the translations to be applied upon recognizing the various start and end tags for each element in the document. The translation is enclosed in matching start and end XML tags, and is free format in that blanks and new lines may be used to improve readability. The translation itself consists of various replacement fragments or texts, which can be string literals or code fragments. The latter are Perl code fragments executed in the AXE environment. Each replacement fragment is separated from adjacent ones by the Perl concatenation operator, a period (".").

The translation is usually in three parts, corresponding to the prefix translation, applied before the element content is translated, then the element content itself, then the postfix translation, applied after the element content is translated. The element content is indicated either by a bare variable name, where the variable name is the same as the element tag, or by the characters ^^. Either the prefix or postfix translations can be empty.

Where the element content indicator does not appear, the translation is treated as a prefix translation only. Note that this means that there is no translation of the element content, and if there is any, it will not appear in the translated document at this point. This is usually used in the case of empty elements, but it can also be used to store content for later use, as any content that does appear will be saved under the eponymous variable name (that is, the variable with the same name as the element tag).

String replacement texts may be singly or doubly quoted, as in Perl. Singly quoted strings are not evaluated, meaning that everything within them is treated literally. Doubly quoted strings are evaluated (in the AXE environment), which means that variables are replaced by their values, and escaped characters are replaced by their escaped values (\n becomes a new line, \t becomes a tab, and so on). A string may be unquoted, in which case it must not contain any full stops/periods, and it will not be evaluated for escaped characters or variables.
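The three quoting forms can be mimicked in Python as follows. This is a simplified sketch, not Perl's actual interpolation: the function name is hypothetical, only simple $name variables are recognised, and only a handful of backslash escapes are mapped.

```python
import re

# A small, assumed subset of the escapes recognised in double-quoted strings.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"', "$": "$"}


def evaluate_string(token: str, variables: dict) -> str:
    """Return the replacement text for a single, double, or unquoted string."""
    if len(token) >= 2 and token[0] == token[-1] == "'":
        return token[1:-1]                          # single quotes: taken literally
    if len(token) >= 2 and token[0] == token[-1] == '"':
        def replace(match):
            if match.group(1) is not None:          # a backslash escape such as \n
                return ESCAPES.get(match.group(1), match.group(1))
            return str(variables[match.group(2)])   # a $name variable
        return re.sub(r"\\(.)|\$(\w+)", replace, token[1:-1])
    if "." in token:
        raise ValueError("unquoted strings may not contain a period")
    return token                                    # unquoted: not evaluated
```

So the double-quoted form "<A>$uri</A>" has $uri replaced by its value, while the single-quoted form '<A>$uri</A>' is passed through unchanged.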

Code fragments are enclosed in outer matching curly braces {}, and are evaluated in the AXE environment. Calls to interface routines must therefore be prefaced with a main:: prefix. Note that the value of the last expression in the code fragment is the value of that fragment, and is appended to the replacement text being assembled. Note that if the code fragment consists of just a single variable reference, the enclosing braces may be omitted.

Within the replacement text (all forms), the sequence @@<attr> is replaced by the attribute value for attr. If this attribute is not present in the original XML tag, an error is flagged. The sequence @@?<attr> is replaced by an expression that evaluates to true if the attribute is present, false otherwise.
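A minimal Python sketch of this substitution is given below. The function name is hypothetical, and rendering the presence test as the strings "1" and "0" is an assumption; the text above says only that @@?<attr> becomes an expression that evaluates to true or false.

```python
import re


def expand_attributes(text: str, attributes: dict) -> str:
    """Expand @@?<attr> to a truth value and @@<attr> to the attribute's value."""
    def presence(match):
        # Assumed rendering: "1" for present, "0" for absent.
        return "1" if match.group(1) in attributes else "0"

    def value(match):
        name = match.group(1)
        if name not in attributes:
            raise KeyError(f"attribute '{name}' is not present on the element")
        return attributes[name]

    text = re.sub(r"@@\?<([^>]+)>", presence, text)   # handle the test form first
    return re.sub(r"@@<([^>]+)>", value, text)
```

For instance, expand_attributes('<A HREF="@@<href>">', {"href": "there"}) gives <A HREF="there">, and the same call with an empty attribute dictionary raises an error, mirroring the rule that a missing attribute is flagged.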

Examples of string translations:

<uri> '<A>' .$uri. "</A>" </uri>
Replace the XML uri element with the equivalent HTML element <A>inner text</A>.
<uri> <A HREF="@@<href>"> .^^. </A> </uri>
Replace the XML uri element <uri href="blah"> inner text </uri> with the equivalent HTML element <A HREF="blah"> inner text </A>.
<uri href> <A HREF="@@<href>"> .^^. </A> </uri>
Replace the XML uri element <uri href="blah"> inner text </uri> with the equivalent HTML element <A HREF="blah"> inner text </A>.

Note that in both of the last two cases an href attribute is required. It is an error if the attribute is not present, in the former case detected by the translator, in the latter by the parser.

Examples of code translations:

<today>$today</today>
Replace the XML today element (presumably empty: nothing is done with the content) with the value of the variable $today (which has presumably been set somewhere else).
<section>
{main::enter_section(0)}
.^^.
{main::exit_section(0)}
</section>
Call the enter_section routine before translating the inner content, and the exit_section after.

Rules for the Translation Commands

A translation command has the general form (in BNF)

ElementTranslation = start prefix ['.' inner ['.' postfix]] end |
      start inner ['.' postfix] end .
start = '<' elementname '>' .
prefix = translation ('.' translation)* .
inner = '^^' | '$' elementname .
postfix = translation ('.' translation)* .
translation = single | double | code | string | variable |
      conditional translation .
conditional = '?' '<' attributename '>' .
single = "'" non-single-quote-char* "'" .
double = '"' non-double-or-escaped-char* '"' .
code = '{' valid-Perl-code-fragment '}' .
string = non-period-char* .
variable = scalar-Perl-variable .
end = '</' elementname '>' .

where elementname is a valid XML element name, attributename is a valid XML attribute name, non-single-quote-char is any character but ', non-double-or-escaped-char is any character but ", or the pair of characters \ followed by any character, non-period-char is any character but ., and scalar-Perl-variable and valid-Perl-code-fragment are the appropriate Perl (syntactically correct) elements.
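To make the role of the '.' separator concrete, here is a hedged Python sketch that splits the body of a translation command (the text between the start and end tags) into its fragments. It is not AXE's parser: the function name is hypothetical, and the sketch assumes that a quote opens a quoted string only at the start of a fragment and that braces inside code fragments are balanced.

```python
def split_fragments(body: str) -> list:
    """Split a translation body on '.' separators outside quotes and braces."""
    fragments, current = [], []
    quote, depth, i = None, 0, 0
    while i < len(body):
        ch = body[i]
        if quote:
            current.append(ch)
            if ch == "\\" and quote == '"' and i + 1 < len(body):
                i += 1                      # keep an escaped character with its backslash
                current.append(body[i])
            elif ch == quote:
                quote = None                # closing quote
        elif ch in "'\"" and depth == 0 and not "".join(current).strip():
            quote = ch                      # opening quote at the start of a fragment
            current.append(ch)
        elif ch == "{":
            depth += 1
            current.append(ch)
        elif ch == "}" and depth > 0:
            depth -= 1
            current.append(ch)
        elif ch == "." and depth == 0:
            fragments.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    fragments.append("".join(current).strip())
    return fragments
```

Applied to the body of the section example, this yields the three fragments {main::enter_section(0)}, ^^, and {main::exit_section(0)}, corresponding to prefix, inner, and postfix.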

Interface to the Perl Program

output(string)
Append "string" to the output.
on_output()
Turn on output. Output is processed normally.
off_output()
Turn off output. No output is made to the target file or the saved stream.
enter_section(int)
Start numbering and title collecting for a new section. The top level section is 0, and it is numbered with a single digit. Subsequent section level nesting (values of 1 and up) is numbered with decimal points indicating the subsection number.
exit_section(int)
Exit the section started with enter_section(int). If the section nesting is not strictly followed, a warning message is generated.
current_context()
Return a string representing the current element tag.
sourcename()
Return a string representing the source file name.
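The numbering behaviour described for enter_section and exit_section can be modelled as below. This Python sketch covers only the numbering and the nesting check, not title collecting, and it assumes that enter_section returns the section number as a string; the class name and that return value are not taken from AXE.

```python
import warnings


class SectionNumbering:
    """Model of dotted section numbers: level 0 gives "1", level 1 gives "1.1", and so on."""

    def __init__(self):
        self.counters = []      # counters[level] = sections seen so far at that level
        self.open_levels = []   # stack of levels currently open

    def enter_section(self, level: int) -> str:
        del self.counters[level + 1:]           # deeper levels restart their numbering
        while len(self.counters) <= level:
            self.counters.append(0)
        self.counters[level] += 1
        self.open_levels.append(level)
        return ".".join(str(n) for n in self.counters[: level + 1])

    def exit_section(self, level: int) -> None:
        if not self.open_levels or self.open_levels[-1] != level:
            # Nesting was not strictly followed, so warn rather than fail.
            warnings.warn(f"section nesting violated at level {level}")
        else:
            self.open_levels.pop()
```

Entering level 0, then level 1 twice (with matching exits), then level 0 again produces the numbers 1, 1.1, 1.2, and 2.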

This page is copyright, and maintained by John Hurst.
Dynamically generated at 20240420:1440 from an XML file modified on 20180703:0430, by index.py version 1.6.5.

