This document describes files used for processing email.
1 | Abstract
2 | Maintenance History
3 | Introduction
4 | More Detailed Description
5 | Description of The Environment
6 | Program Requirements
6.1 | R1: uuencoded mail
6.2 | R2: mime encoded mail
7 | Design Document
8 | User Manual
9 | Literate Data
10 | Literate Definitions
11 | Literate Code
11.1 | .procmailrc
11.1.1 | Null Filters
11.1.2 | Forwarding Filters
11.2 | XML Perl filtering
11.3 | R1: uuencoded mail
11.3.1 | The uudecoder filter uudecoder
11.3.2 | Manning the Microsoft Ramparts
11.4 | R2: mimencoded mail
11.4.1 | The mime filter mimedecoder
11.5 | Obsolete Components
12 | Literate Build Scripts
13 | Literate Tests
14 | Bibliography
15 | Glossary
16 | Indices
16.1 | Files
16.2 | Chunk Names
16.3 | Identifiers
Date | Author | Version | Description |
09 Jul 1998 | John Hurst | 1.0 | initial version |
10 Jul 1998 | John Hurst | 1.1 | develop literate program: requirement R1 |
17 Sep 1998 | John Hurst | 1.2 | requirement R2; warning messages; revamped metamail processing |
18 Sep 1998 | John Hurst | 1.2.1 | fixed some bugs arising out of changes in version 1.2 |
21 Sep 1998 | John Hurst | 1.2.2 | mimedecoder now handles upper and lower case triggers, and text versions of docs sent to self now identify the source file. |
23 Sep 1998 | John Hurst | 1.2.3 | fixed metamail path |
25 Sep 1998 | John Hurst | 1.2.4 | some literate program revisions, and fixed bug with file name matching in mimedecoder. |
05 Oct 1998 | John Hurst | 1.2.5 | add tests for file existence in uudecoder. |
12 Oct 1998 | John Hurst | 1.2.6 | include .dot as a word document trigger. |
01 Dec 1998 | John Hurst | 1.2.7 | expand documentation on the Microsoft issue. |
29 Mar 1999 | John Hurst | 1.2.8 | add rtf files to the list of dammed, and revised warning message. |
27 Apr 1999 | John Hurst | 1.2.9 | turn auto-replies off at JohnR's request |
27 Apr 1999 | John Hurst | 1.2.10 | escape blanks within filenames |
28 Aug 1999 | John Hurst | 1.3.0 | revise for hawthorn |
15 Dec 2000 | John Hurst | 1.4.0 | revise for xlp; change warning |
20 Dec 2000 | John Hurst | 1.4.0 | revise copy circulation for ajh2, mh |
08 Jan 2001 | John Hurst | 1.4.1 | revise to reinclude attachment processing |
18 Jan 2001 | John Hurst | 1.4.2 | experimented with lock files on mime handling: discarded. |
22 Feb 2001 | John Hurst | 1.4.3 | add XML-Perl handling |
28 Feb 2001 | John Hurst | 1.4.4 | add mimedecode file name filtering |
07 Mar 2001 | John Hurst | 1.4.5 | add mimedecode file name filtering |
29 Mar 2001 | John Hurst | 1.4.6 | extract files to attachments directory |
05 Apr 2001 | John Hurst | 1.4.7 | forward .doc attachments to ajh2 |
19 Apr 2001 | John Hurst | 1.4.8 | reinstate forwarding of ALL mail to ajh2 for trial of mozilla |
This document defines scripts and code to interface to the procmail program, for the purpose of managing mail.
This document started from a realization that I needed to keep changing the files I used to process incoming email, such as .procmailrc and the like. I found that I was continually rethinking much of what had been done before, so a literate program to handle it all seemed like a good idea. This has proved true in practice.
In handling electronic mail, a great deal of stuff arrives that could have some pre-processing performed upon it. For example, mail might be presorted into folders, mail from some addresses or containing certain patterns in its contents (such as ``make money fast!'') might be discarded, while other items might undergo some form of processing before being presented to the user.
There is a standard Unix tool to perform this, known as procmail. Unfortunately, its syntax is baroque, and it is not easy for novice users to get it right. Worse, wrong procmail scripts may discard mail never to be retrieved. What we describe here is a suite of programs to tackle this task, documenting just what is happening so that the mail filtering is handled in as perspicuous a fashion as possible.
My work as ADT means that I receive a significant volume of email, which, although not large in absolute terms, does warrant some smart front-end tools to ease the task of reading and handling mail. The environment includes a university administration that is not computer literate, and consequently there is much inappropriate use of the technology. Keeping a lid on this is one of the aims of this suite of literate programs.
When an email arrives that contains a uuencoded document, the filter should perform the decode, and then forward mail stating the name of the file, and where the file has been placed.
It is also appropriate that if the document is a Word document, a polite message informing the sender of potential delays in accessing the document is returned.
The scenario is similar to that for uuencoded mail, described above. The only tool I can find to perform MIME decoding is metamail, which isn't very Unix-friendly.
CS: It should examine each aspect of the design in enough detail to convince the reader that you had reasons for your choices, and didn't do anything just because that was the first way that came into your head. I like to divide it into sections, with each section presenting one design point, discussing the pros and cons of different approaches to that point, and ending with the approach I decided to take, supported by explicit references to the requirements (e.g. "I chose this sort routine over that one because its performance scales better, and scalability to larger data sets is requirement 2.3.1"). If you find you have no justification for one or more points of the design, that should tell you something (something bad) about the program you've written. Pictures of your data structures and literature citations for any tricky algorithms can go in this section (with cross-references to the code that implements them). It's good if this section can stand somewhat on its own, so it can be used in design reviews, or distributed separately to people who need to understand the system but don't need to understand the code.
There is not much to say from a user perspective. The programs are designed to be as transparent as possible.
To install this suite of programs, you will need to revise various paths and the like. The following table lists those things that will need changing in this document. They are distributed throughout the document to keep related components together, but this list allows us to cross-reference them. Isn't literate programming wonderful?
Most of these come from the section Literate Definitions.
<perl location 1> | absolute pathname for the perl interpreter |
<logfile location 5> | absolute pathname for where logging messages are to be stored |
<warning 6> | absolute pathname for the warning message which is sent to senders of Word .doc files. You may also want to change the file named ``warning'' (see <warning 6>). |
<decoder binaries 14> | absolute pathname for procmail to find the binaries defined by this literate program suite. Usually your binaries directory, but it can be anywhere. |
CS: The nice thing about a language-independent tool like noweb is that you can document anything with it, not just code. I like to have a section or chapter showing actual samples of input and output that the program is supposed to read and write. This gives me a good place to talk about range limits on data (and how we handle out-of-range conditions). Having a concrete example of typical data in mind helps the reader understand the code that processes the data when s/he gets to it. You put this stuff in "code" chunks rather than in tables or figures so you can extract it into files and use it in your tests (see below). (You may want to have only short samples here and put full-blown test data in an appendix, so as not to halt the momentum of the presentation before the reader gets to the code.)
In this section we define basic parameterizations of things. Anything which might reasonably be changed should go in this section.
Several programs are written in Perl, so we define where to find the Perl interpreter:
<perl location 1>=
<perl location for indy03 2>=
<perl location for central 3>=
<perl location for hawthorn 4>=
We also record logging details on a logfile:
<logfile location 5>=
We also have a polite warning message for Word users:
<warning 6>=
This is really the heart of this literate program: developing the .procmailrc file that drives procmail.
".procmailrc" 7=
We start with some variable definitions.
".procmailrc" 8=
Discard all master-synch messages before proceeding.
".procmailrc" 9=
Process any mime attachments. The 1 means there is one condition in the <procmail mimencode content pattern 31>, the B means to match the condition (egrep) over the body, and the c means to continue processing even if this matches.
The second part is an appended action (A), and signifies that mail matching the <procmail mimencode content pattern 31> is also forwarded to ajh2. As of version 1.4.8, this reverts to forwarding ALL messages.
".procmailrc" 10=
Process xml-perl mailing list.
The following is temporarily macroed out.
<old.procmailrc 11>=
"old.procmailrc" 12=
".procmailrc" 13=
<decoder binaries 14>=
<procmail pilot-link content pattern 15>=
<procmail pilot-link handling 16>=
<procmail xml-perl content pattern 17>=
<procmail xml-perl handling 18>=
uuencoded mail is detected with a content pattern that recognizes lines of the form:
begin 777 filename.ext
where 777 is the octal permission mode.
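The header line can be recognised with a regular expression. The sketch below is in Python rather than the procmail and Perl patterns the real filters use; `parse_begin_line` is a hypothetical helper, and it assumes exactly three octal digits in the mode, as described here.

```python
import re

# "begin" in column 1, a three-digit octal mode, then the file name.
BEGIN_LINE = re.compile(r"^begin ([0-7]{3}) (.+)$")


def parse_begin_line(line):
    """Return (mode, filename) for a uuencode header line, or None."""
    match = BEGIN_LINE.match(line.rstrip("\r\n"))
    if match is None:
        return None
    # Convert the octal permission digits to an integer mode.
    return int(match.group(1), 8), match.group(2)


print(parse_begin_line("begin 644 report.doc\n"))  # prints (420, 'report.doc')
```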
Here's the pattern for Microsoft Word documents, the only file type recognized at the moment.
<procmail uuencode content pattern 19>=
<procmail uuencode handling 20>=
The filter for this task reads from standard input, and writes a file <uudecode mail file 22> containing a copy of the input. The encoded file name is extracted, and the mail file run through the decoder. The resultant decoded file is saved, and also passed through catdoc for remailing to me.
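A minimal sketch of the decode step, assuming the whole message is available as a list of text lines. It uses Python's `binascii.a2b_uu` in place of the external decoder the filter actually runs; `uudecode_lines` is a hypothetical name, and only the first encoded file in the message is decoded.

```python
import binascii
import re

BEGIN_LINE = re.compile(r"^begin ([0-7]{3}) (.+)$")


def uudecode_lines(lines):
    """Decode the first uuencoded file in an iterable of text lines.

    Returns (filename, data), or None if no "begin" line is found.
    Lines before "begin" (headers and prose) are skipped, as uudecode does.
    """
    name = None
    chunks = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if name is None:
            match = BEGIN_LINE.match(line)
            if match:
                name = match.group(2)
            continue
        if line == "end":
            break
        if line in ("", "`"):
            continue  # blank lines and the zero-length terminator carry no data
        chunks.append(binascii.a2b_uu(line))
    if name is None:
        return None
    return name, b"".join(chunks)
```

The caller would then write `data` to `name` in the working directory before handing it to catdoc.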
"uudecoder" 21=
<uudecode mail file 22>=
<uudecoder: read from standard input and store somewhere 23>=
Buried within the incoming mail is a line that triggered the execution of this script (see <procmail uuencode content pattern 19>), so we examine each line as it goes past to capture the file name details. The pattern is the word begin, starting in column 1, followed by a three-digit octal permission mode, then a file name.
A later version might use this recognition as a flag to turn on output to the OFILE, since lines to this point are ignored by uudecode, and might as well be discarded now.
We currently recognize only .doc files: maybe a later version will expand this.
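The two scanning steps can be sketched together: copy every line, note the first `From:` header, and remember the name from a `begin` line only when it ends in `.doc`. This is a Python illustration with hypothetical names and patterns; the actual chunks may differ, for example in how the From field is parsed.

```python
import re

FROM_LINE = re.compile(r"^From:\s*(.+)$")
# Only .doc files are recognised, matching the text above.
DOC_BEGIN = re.compile(r"^begin [0-7]{3} (.+\.doc)$")


def scan_mail(lines):
    """Copy lines while capturing the sender and the encoded .doc file name."""
    sender = None
    name = None
    copy = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        copy.append(raw)  # keep a full copy for the later decode pass
        if sender is None:
            m = FROM_LINE.match(line)
            if m:
                sender = m.group(1)
        if name is None:
            m = DOC_BEGIN.match(line)
            if m:
                name = m.group(1)
    return sender, name, copy
```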
<uudecoder: check and collect file name 24>=
<uudecoder: check and collect From field 25>=
When the mail file is decoded, the encoded file appears in the current directory with the file name $name. This value is extracted as we scan the standard input, in steps <uudecoder: read from standard input and store somewhere 23> and <uudecoder: check and collect file name 24>.
<uudecoder: pass constructed file through uudecoder 26>=
<uudecoder: report appropriate information and log details 27>=
I'm totally fed up with Microflabby software (or is it Microsoft flabware?). Annoy the user a la Lloyd.
<send sender a warning 28>=
There is a growing (perhaps not rapidly enough!) community that views the increasing dominance of Microsoft as a serious threat to the ``diversity of species'' in the computing world. I am all for competition and survival of the fittest, but there does need to be an adequate gene pool (to carry the analogy perhaps as far as one might) to ensure that robust, adaptable, and accurate software remains available within the computing community.
Accordingly, the following warning message attempts to give an alternative perspective on how to avoid propagating the Microsoft gene pool. See also my related documents on web pages.
As Jay Sekora (http://www.aq.org/~js/, js@aq.org) wrote on the pilot-unix mailing list on Tue, 01 Dec 1998 23:58:16:
"warning" 29=
For the Makefile, we need to change the permission bits on the constructed uudecoder file.
<uudecode installation 30>=
<procmail mimencode content pattern 31>=
<procmail mimencode handling 32>=
"mimedecoder" 33=
The mime filter is called by procmail when we recognize the pattern <procmail mimencode content pattern 31> in the incoming mail. It writes some logging messages, reads standard input (the incoming mail text), and saves that to an intermediate file. This is necessary, as we want to pass the mail through the metamail filter, which does most of the hard work for us. We then look at the output of metamail, and perform some processing based upon what we find there.
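For comparison, here is the save-then-inspect idea using Python's standard `email` package instead of metamail. This swaps in a different tool from the one the filter uses; `list_attachments` is a hypothetical helper returning each attachment's file name, MIME type and decoded bytes, roughly the information the filter extracts from metamail's output.

```python
import email
import email.policy


def list_attachments(raw_bytes):
    """Return (filename, content_type, payload) for each attachment.

    Uses the standard email package as a stand-in for metamail.
    """
    msg = email.message_from_bytes(raw_bytes, policy=email.policy.default)
    found = []
    # iter_attachments skips the main body part(s) of a multipart message.
    for part in msg.iter_attachments():
        found.append((part.get_filename(), part.get_content_type(),
                      part.get_payload(decode=True)))
    return found
```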
Because the script does not yet seem to work properly, and its failures look very asynchronous, I've added a lock around the whole thing (but note that it has a race condition). I'm not convinced it is at all useful.
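Should the lock be kept, the race can be closed by making the existence test and the creation a single atomic step. A minimal Python sketch, assuming a lock file on a local file system (O_EXCL may not be honoured on older NFS versions); `try_lock` and `unlock` are hypothetical. procmail's own locking (a trailing colon on a recipe's `:0` line, or the `lockfile` program distributed with procmail) would be the more conventional choice.

```python
import os


def try_lock(path):
    """Atomically create a lock file; return True if we now hold the lock.

    O_CREAT | O_EXCL makes "no lock exists" and "create the lock" one step,
    so two concurrent filters cannot both succeed.
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))  # record the owner for debugging
    return True


def unlock(path):
    """Release the lock by removing the file."""
    os.remove(path)
```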
<mimedecoder: get date and time 34>=
This stuff just computes the date and time for use in the log file.
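A sketch of such a timestamped log entry in Python. The format, which mirrors the dates in the maintenance history, and the `log_line` helper are assumptions, since the original chunk's output format is not shown here.

```python
from datetime import datetime


def log_line(program, message, when=None):
    """Format a log entry prefixed with the date and time."""
    if when is None:
        when = datetime.now()
    stamp = when.strftime("%d %b %Y %H:%M:%S")
    return f"{stamp} {program}: {message}"
```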
<mimedecode mail file 35>=
Define where the MIME document is saved.
<mimedecode list file 36>=
Define where the temporary file used in processing the MIME document is kept.
<mimedecoder: read from standard input and store somewhere 37>=
<mimedecoder: pass constructed file through metamail 38>=
Remove any previous file, then invoke metamail.
<mimedecoder: check and collect From field 39>=
<mimedecoder: report appropriate information and log details 40>=
Sometimes a filename comes through in the Content-Description field. Extract it if there is one. I don't believe this is necessary though, since metamail will pick up the real filename as necessary. I've left this here just in case: it won't do any harm, since we will see a filename under the <mimedecoder: handle wrotefile line from metamail 43>.
<mimedecoder: look at Content Description 41>=
<mimedecoder: recognize mime encoded documents 42>=
I had an && $foundfile appended to the following condition, but it was missing some relevant files, and I couldn't see its purpose, so I've taken it out. I'll probably have to put it back again later, hence this comment.
<mimedecoder: handle wrotefile line from metamail 43>=
<escape non-filename characters in <X#1> 44>=
Other systems use various characters in filenames that are not valid Unix filename characters, so we escape them as necessary.
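A minimal sketch of the escaping, assuming a conservative set of safe characters (letters, digits, `.`, `_`, `+`, `-`) and a backslash before everything else; the set and the `escape_filename` name are illustrative, not taken from the chunk.

```python
import re

# Anything outside this hypothetical safe set gets a backslash in front.
UNSAFE = re.compile(r"([^A-Za-z0-9._+-])")


def escape_filename(name):
    """Backslash-escape characters (such as blanks) that a shell would split on."""
    return UNSAFE.sub(r"\\\1", name)
```

Python's `shlex.quote` is a ready-made alternative that wraps the whole name in single quotes instead of escaping character by character.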
<check for bad filenames and filter 45>=
Two main things concerning filenames:
<discard bad filenames and skip to next 46>=
<munge very long file names 47>=
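The two checks might look like this in Python. The rejection rules and the 64-character limit are hypothetical, since the actual criteria live in the chunks named above and are not reproduced here.

```python
import os

MAX_NAME = 64  # hypothetical limit; the original threshold is not given


def is_bad_filename(name):
    """Hypothetical rules: empty, '.', '..', or embedded separators or NULs."""
    return name in ("", ".", "..") or "/" in name or "\\" in name or "\0" in name


def munge_long_name(name, limit=MAX_NAME):
    """Shorten the stem of an over-long name while keeping its extension.

    If the extension alone exceeds the limit, the result may still be longer.
    """
    if len(name) <= limit:
        return name
    stem, ext = os.path.splitext(name)
    return stem[: max(1, limit - len(ext))] + ext
```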
We have sussed out that this is a word document, given that it has been mime-encoded as an `application/msword' (strong evidence), or as an `application/octet-stream' (weaker evidence, but then, what would you expect from brain damaged software anyway?).
Extract a filename for the text version (.txt extension in place of .doc), feed it through catdoc and mail the output back to me, so that I get a readable version. Send a dialectic message back to the sender, as well.
Note that .rtf files are not treated as Word documents here; they receive the separate handling described next.
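The evidence rules above can be sketched as a small classifier. This is a hypothetical Python rendering: it accepts upper and lower case extensions and treats `.dot` as a Word trigger (following versions 1.2.2 and 1.2.6 in the maintenance history), and it checks for `.rtf` first, anticipating the RTF handling that follows.

```python
import re


def classify_attachment(content_type, filename):
    """Return "word", "rtf", or None for a MIME part (hypothetical rules)."""
    ctype = content_type.lower()
    lower = (filename or "").lower()
    if lower.endswith(".rtf"):
        return "rtf"  # checked first: RTF may arrive labelled as msword
    if ctype == "application/msword":
        return "word"  # strong evidence
    if ctype == "application/octet-stream" and lower.endswith((".doc", ".dot")):
        return "word"  # weaker evidence: rely on the extension
    return None


def text_version_name(filename):
    """Name for the catdoc output: .txt in place of .doc or .dot."""
    return re.sub(r"\.do[ct]$", ".txt", filename, flags=re.IGNORECASE)
```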
<mimedecoder: handle word document 48>=
RTF files are neither here nor there. I can't use catdoc to handle them, so I just send the dialectic message.
<mimedecoder: handle rtf document 49>=
For the Makefile, we need to change the permission bits on the constructed mimedecoder file.
<mimedecoder installation 50>=
"never-used" 51=
The makefile relies upon an additional file MakefileImplicit, which defines standard nutweb build operations. default defines the name of the literate program (here process), while flags defines any parameters to the nutweb tangle and weave operations (``-1 ${HOSTNAME}'' is assumed by MakefileImplicit).
"Makefile" 52=
CS: The bibliography should include references not only to books and journal articles (which I actually hardly ever need, unless I've implemented an especially tricky data structure or algorithm), but more importantly to internal memos, system design documents, software manuals, standards, file format specifications, and other documentation a programmer would find useful. This is not a place to exercise restraint -- anything that might be useful should be listed here, because it probably won't ever have been officially noted anywhere else. In many cases the only way your successor will even know that a certain helpful reference exists is if s/he sees it listed here.
CS: Whether I include one or both, or a combined list, or break out abbreviations separately, depends on how many entries of each type I need and who I expect will be reading the document.
CS: I prefer to have two: one for "code stuff", such as the names of variables, subroutines, data types, etc., that a programmer would want, and a separate one for the "text stuff", which might be read by a non-coder who is skimming the documentation.
Three sets of indices can be created automatically by nutweb: an index of file names, an index of macro names, and an index of user-specified identifiers. An index entry includes the name of the entry, where it was defined, and where it was referenced.
Knuth prints his index of identifiers in a two-column format. This requires modification of the TeX output routine, and significantly increases the size of the nutmacs.tex file. Therefore, it seems better to leave this up to the user.