View and Rank Trains

John Hurst

Version 0.1.0

1	Overview
	1.1	Define Global Constants
2	viewtrains.py Main Body
	2.1	Initialisation
		2.1.1	Imports
		2.1.2	Define Global Constants
		2.1.3	Define Regular Expression Patterns
		2.1.4	Define Subroutines
	2.2	Supporting Code
		2.2.1	collect cgi parameters
		2.2.2	Collect Previous Rankings
		2.2.3	print header of html page
		2.2.4	Print Trailer of HTML Page
	2.3	Define Subroutines
		2.3.1	define function convertIPtoHex
		2.3.2	define procedure log
		2.3.3	Determine rank of image
		2.3.4	Define the XML Dispatch Routines
		2.3.5	define procedure to display the image
		2.3.6	Define Procedure to Search Directories
3	ranktrains.py
	3.1	ranktrains: define constants
	3.2	ranktrains: collect cgi parameters
	3.3	ranktrains: collect previous ranking information
	3.4	ranktrains: Update Rankings with Latest Log Info
	3.5	rankings: Print Forward and Backward Buttons
	3.6	ranktrains: Generate Image Rankings
	3.7	ranktrains: print rankings table
	3.8	ranktrains: Print Ranking Analysis
4	tidytrains.py
5	ranking.py
6	rank.py
	6.1	rank: define the strtotime function
	6.2	rank: define rankdata procedure
	6.3	rank: define ranklog procedure
7	TO DO
8	Indices
	8.1	Identifier Index
	8.2	Chunk Index
	8.3	File Index

1. Overview

This document describes several Python CGI scripts for viewing the author's railway photograph collection. The collection is held in a database (a directory plus sub-directories) that can be accessed from a variety of web pages (see, for example, my school server page). There are two main scripts: one to view an image at full resolution, and one to view the popularity rankings of the images. The other scripts described here help to maintain the system.

The first script defined is <viewtrains.py 2.1>. This script takes a single parameter, which is the short name of an image, and searches the database for that image, and then renders it at full resolution, along with data about the image retrieved from the associated XML file.

The other main script is <ranktrains.py 3.1>, which delivers a page of thumbnail images in order of popularity. Because of the large number of images, each such page delivers only a few of the total number of images, but buttons are included to navigate around the complete rankings.

<tidytrains.py 4.1> reorganizes the ranking files. It reads two files to determine current rankings. The first file contains a single entry for each image, containing the current voting score, and the second file contains a list of votes since then. This ranking is then used to determine if any images should be removed from the system, depending upon whether or not their rank is less than some threshold.

<ranking.py 5.1,5.2> reads two files to determine current rankings. The first file contains a single entry for each image, containing the current voting score, and the second file contains a list of votes since then. A third file is then constructed, containing the updated rankings.

<rank.py 6.1> provides three procedures (strtotime, rankdata, ranklog) for other programs.

1.1 Define Global Constants

<define global constants 1.1> =

(year, month, day, hour, minute, second, weekday, yday, DST) = \
  time.localtime(time.time())

debug=1

<determine server environment 1.5>
EXTN=".xml"

Chunk referenced in 2.3 3.2 4.1 5.1

<globals for macosx 1.2> =

CGIBIN="http://%s/~ajh/cgi-bin" % (server)
HOMEPAGE="http://%s/~ajh" % (server)
BASEPAGE="/home/ajh/www"
LOGFILE="/home/ajh/local/%s/logs" % host

Chunk referenced in 1.5

<globals for solaris 1.3> =

CGIBIN="http://www.csse.monash.edu.au/~ajh/cgi-bin"
HOMEPAGE="http://www.csse.monash.edu.au/~ajh"
BASEPAGE="/u/web/homes/ajh"
LOGFILE=BASEPAGE+"/logs"

Chunk referenced in 1.5

<globals for linux 1.4> =

CGIBIN="http://%s/~ajh/cgi-bin" % (server)
HOMEPAGE="http://%s/~ajh" % (server)
BASEPAGE="/home/ajh/www"
LOGFILE="/home/ajh/local/%s/logs" % host

Chunk referenced in 1.5

<determine server environment 1.5> =

# determine which host/server environment
if os.environ.has_key("SERVER_NAME"):
  server=os.environ["SERVER_NAME"]
elif os.environ.has_key("HOST"):
  server=os.environ["HOST"]
else:
  p=Popen(['hostname'],stdout=PIPE)
  server=p.communicate()[0].strip()  # hostname output ends with a newline
MacOSX='MacOSX' ; Solaris='Solaris' ; Linux="Linux"
if server in ["www.csse.monash.edu.au","nexus"]:
  host='csse' ; ostype='Solaris' ; system=Solaris
elif server in ["njhurst.com","www.njhurst.com",\
                "ajhurst.org","www.ajhurst.org",\
                'chairsabs.org.au','www.chairsabs.org.au']:
  host='sequoia' ; ostype='Linux' ; system=Linux
elif server in ['bittern.local','bittern']:
  host='bittern' ; ostype='MacOSX' ; system=MacOSX
elif server in ['murtoa.local','murtoa']:
  host='murtoa' ; ostype='MacOSX' ; system=MacOSX
elif server=='dimboola.infotech.monash.edu.au':
  host='dimboola' ; ostype='MacOSX' ; system=MacOSX
elif server in ['ajh.id.au','www.ajh.id.au','10.0.0.105']:
  host='rainbow' ; ostype='MacOSX' ; system=MacOSX
elif server=='localhost':
  if os.environ.has_key('HOST'):
    host=os.environ['HOST']
  else:
    host='localhost'
  ostype='MacOSX' ; system=MacOSX
else:
  host=system=ostype='unknown'
  print "I don't know this host: %s<BR/>" % server

if system==MacOSX:
  <globals for macosx 1.2>
elif system==Solaris:
  <globals for solaris 1.3>
elif system==Linux:
  <globals for linux 1.4>
else:
  print "Unknown system %s<BR/>" % (system)
  sys.exit(1)

Chunk referenced in 1.1

2. viewtrains.py Main Body

"viewtrains.py" 2.1 =

#!/usr/bin/python
# DO NOT EDIT THIS FILE!  
# use ~/Computers/python/viewtrains/viewtrains.xlp instead

<imports 2.2>
<define constants for viewtrains 2.3,2.7>
<define regular expression patterns 2.4>
<define subroutines 2.5>

print "Content-Type: text/html\n\n";
#print "TEST MESSAGE!\n"

<collect cgi parameters 2.6>
<collect previous rankings 2.8>

htmltitle="Full size image of "+imageparm
<print header of html page 2.9,2.10,2.11,2.12,2.13>

if imageisrelative:
  display(imageparm)
else:
  res=visit(top,0,imageparm)

scriptnameparm="viewtrains.py?image=%s" % (escimageparm)
<print trailer of html page 2.14> 

The main work of this script is done in one of the two procedures display and visit, depending upon whether the user offered a relative image path or not.

Somewhat contradictory to normal usage, a relative path refers to the location of the image relative to the base trains directory, rather than the root directory, hence the terminology. A non-relative path simply gives the image name, and hence a full directory search must be carried out in order to locate the image.

2.1 Initialisation

2.1.1 Imports

<imports 2.2> =

import cgi
import os
import datetime
import math
import rank
from rank import DECAY
import re
import string
from subprocess import Popen,PIPE
import sys
import time
import urllib
from xml.dom.minidom import parse, parseString, Node

Chunk referenced in 2.1 4.1

2.1.2 Define Global Constants

<define constants for viewtrains 2.3> =

<define global constants 1.1>
tm = "%4d%02d%02d:%02d%02d" % (year, month, day, hour, minute)

SCRIPT=CGIBIN+"/viewtrains.py"
SCRIPTIMAGE=SCRIPT+"?image="
top=BASEPAGE+"/trains"
now=datetime.datetime.now()
startnow=datetime.datetime.now()
today=startnow.strftime("%Y%m%d")

Chunk referenced in 2.1
Chunk defined in 2.3,2.7

2.1.3 Define Regular Expression Patterns

<define regular expression patterns 2.4> =

ignoredirs = re.compile('(tmp)|(units)')

Chunk referenced in 2.1

2.1.4 Define Subroutines

<define subroutines 2.5> =

<define function convertIPtoHex 2.15>
<define procedure log 2.16>
<define the ranking procedure 2.17>
<define the XML dispatch routines 2.20>
<define procedure to display the image 2.21>
<define procedure to search directories 2.26>

Chunk referenced in 2.1

These procedures are of sufficient significance that they have been moved to a separate section.

2.2 Supporting Code

2.2.1 collect cgi parameters

<collect cgi parameters 2.6> =

form = cgi.FieldStorage()
#print form
#print cgi.print_environ()
ipadr=convertIPtoHex(os.getenv("REMOTE_ADDR"))

gotparms=0; dontlog=0

#print "QUERY_STRING=",os.getenv("QUERY_STRING"),"<BR/>"
#print "USER_AGENT=",os.getenv("USER_AGENT"),"<BR/>"

if form.has_key("image"):
  imageparm=form["image"].value
  gotparms=1
  res=re.match(r'([^.]+)\.jpg$',imageparm)  # escape the dot so only a literal ".jpg" matches
  if res:
    imageparm=res.group(1)
  res=re.match('^trains/',imageparm)
  if res:
    imageisrelative=1
  else:
    imageisrelative=0
if form.has_key("disablevote"):
  dontlog=1
if not gotparms:
  print "<H1>Error!</H1>"
  print "<P>You are using a browser which has not passed in the ",
  print "cgi parameters ",
  print "correctly.  Please use a different browser that does ",
  print "handle parameters properly. ",
  print "(Mozilla, Safari, Epiphany, Firefox are known to work).</P>"
  print "<P>Alternatively, type the name of an image into the following box and click submit/hit enter</P>"
  print "      <p></p>\n"
  print "      <form action=\"%s/viewtrains.py\" method=\"post\">\n" % (cgiserver)
  print "        <input type=\"submit\" value=\"submit\"/>"
  print "        <input type=\"text\" size=\"30\" name=\"image\" value=\"%s\"/>" % ("")
  print "      </form>"
  sys.exit(0)

escimageparm=urllib.quote(imageparm)

Chunk referenced in 2.1

Use the Python cgi library to retrieve the CGI parameters. The main one is image, which is the name of an image in the train library; an optional disablevote parameter suppresses logging of the viewing. Two alternatives are available for image:

The parameter starts with "trains/", in which case it is a relative pathname into the trains directory, and no searching is required; or
It does not, in which case the name must be searched against the image library to find the required image.

The choice between these two is flagged in the variable imageisrelative.

Discard any ".jpg" suffix.

Escape any suspect URL parameter characters.
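
The three steps above (classify the name, discard the suffix, escape the result) can be sketched as a single helper. This is an illustration only, not the chunk used by the script: the function name normalise_image_parm and the sample path are hypothetical, and it is written for Python 3 (where urllib.quote lives in urllib.parse), whereas the scripts themselves are Python 2.

```python
import re
from urllib.parse import quote  # Python 3 location of Python 2's urllib.quote


def normalise_image_parm(raw):
    """Return (name, is_relative, escaped) for a raw 'image' parameter.

    Hypothetical helper mirroring the steps described in the text.
    """
    # Discard a trailing ".jpg" suffix, if there is one.
    name = re.sub(r'\.jpg$', '', raw)
    # A name starting with "trains/" is a path relative to the base
    # directory, so no directory search is needed.
    is_relative = name.startswith('trains/')
    # Percent-encode characters that are unsafe inside a URL.
    escaped = quote(name)
    return name, is_relative, escaped


print(normalise_image_parm("trains/vic/R707-1.jpg"))
print(normalise_image_parm("4472=R761-11"))
```

Note that quote leaves "/" unescaped by default, so a relative path keeps its directory separators, while a character such as "=" is percent-encoded.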

<define constants for viewtrains 2.7> =

cgiserver="http://localhost/cgi-bin/ajh"

Chunk referenced in 2.1
Chunk defined in 2.3,2.7

2.2.2 Collect Previous Rankings

All previous rankings have been reduced to a single vote value for each image. These values are stored in a file RANKINGS, together with the date and time of the rankings. These votes are exponentially decayed, and used as the base values for any additional votes cast since that date.
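
As a rough illustration of the exponential decay, the weight of a vote can be computed as below. The formula matches the one used later when log entries are processed (exp(-days/DECAY)), but the value of DECAY shown here is an assumption made for the example; the real constant is imported from rank.py. The function names are hypothetical.

```python
import math

DECAY = 30.0  # ASSUMED time constant in days; the real value is defined in rank.py


def decayed_vote(days_since_vote, decay=DECAY):
    """Weight of one vote cast days_since_vote days ago."""
    return math.exp(-days_since_vote / decay)


def decay_score(score, days_elapsed, decay=DECAY):
    """Scale a stored score by the decay accumulated over days_elapsed days."""
    return score * decayed_vote(days_elapsed, decay)


print(decayed_vote(0))         # a fresh vote has full weight
print(decayed_vote(DECAY))     # one time constant later
print(decay_score(10.0, 7.0))  # a stored score of 10 after a week
```

Under this assumed constant, a vote cast 30 days ago counts for roughly 0.37 of a fresh vote.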

<collect previous rankings 2.8> =

RANKINGS=LOGFILE+"/trainrank"
VIEWINGS=LOGFILE+"/trainview"
totalimages, datatime, votefactor, table = rank.rankdata(RANKINGS)

Chunk referenced in 2.1

2.2.3 print header of html page

This code prints the header part of the html page.

<print header of html page 2.9> =

print """
    <html>
      <head>

Chunk referenced in 2.1 3.1
Chunk defined in 2.9,2.10,2.11,2.12,2.13

print the starting lines, then ...

<print header of html page 2.10> =

        <title>""",
print htmltitle,
print """</title>

Chunk referenced in 2.1 3.1
Chunk defined in 2.9,2.10,2.11,2.12,2.13

Print the page title, including the "MONASH UNIVERSITY", "INFORMATION TECHNOLOGY" and "Clayton School" parts.

<print header of html page 2.11> =

        <base href=\""""+HOMEPAGE+"""/"/>
        <link rel="stylesheet" HREF="styles/monash.css" type="text/css" />
      </head>
      <body>
        <div id="global-header">
        <div id="global-images">
        <table width="100%" bgcolor="white">
          <tr width="100%">
            <td align="left">
              <table>
                <tr>
                  <td align="left">
                    <a href="http://www.monash.edu.au">
                      <span style="font-family:sans-serif;font-size:+160%;font-weight:bold;
                          background-color:#ffffff;color:black">
                          MONASH UNIVERSITY
                      </span>
                    </a>
                  </td>
                </tr>
                <tr>
                  <td align="left">
                    <a href="http://www.infotech.monash.edu.au" COLOR="black">
                      <span style="font-family:sans-serif;font-size:+140%;font-weight:bold;
                        background-color:#ffffff;color:black">INFORMATION TECHNOLOGY</span>
                    </a>
                  </td>
                </tr>
                <tr>
                  <td align="left">
                    <a href="http://www.csse.monash.edu.au" COLOR="black">
                      <span style="font-family:sans-serif;font-size:+120%;
                        font-weight:bold;background-color:#ffffff;color:black">
                        Clayton School</span>
                    </a>
                  </td>
                </tr>
              </table>
            </td>
            <td align="right">
""",

Chunk referenced in 2.1 3.1
Chunk defined in 2.9,2.10,2.11,2.12,2.13

Generate the trains image on the trains page. We do this from the list of available images in web/images/banner (added manually to this list), by choosing one indexed by the low order bits of the current microsecond, that is, pseudo-randomly.

<print header of html page 2.12> =

rightnow=datetime.datetime.now()
locos=["R707-1.jpg",      "6029-32.jpg",     "4472=R761-11.jpg",\
       "621-16.jpg",      "F255-1.jpg",      "5910-4.jpg",\
       "3813-5.jpg",      "5112+5910-4a.jpg","W933-8.jpg",\
       "520-6.jpg",       "3203-3.jpg",      "3642-1.jpg",\
       "Rx207-15.jpg",    "38=D3+K=R-3.jpg", "D3-639+R707-1.png",\
       "J549-21.jpg",     "5367-2.jpg",      "W22-1.png",\
       "3813-5a.jpg",     "tgv-13.jpg",      "S300-1.jpg",\
       "4472=R761-11.jpg","6029-6.jpg",      "Callington-1.jpg",\
       "NYCHudson-1.jpg", "R711-3.jpg",      "MerddinEmrys-2.jpg",\
       "3801+3813+3820-2.jpg"]
loco=locos[rightnow.microsecond % len(locos)]
print "<img align=\"right\" SRC=\"images/banner/" + loco + \
      "\" height=\"79\" alt=\"steam loco " + loco + "\"/>",

Chunk referenced in 2.1 3.1
Chunk defined in 2.9,2.10,2.11,2.12,2.13
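
The banner selection in chunk 2.12 amounts to indexing the list by the current microsecond modulo the list length. A minimal sketch, with a hypothetical function name and an optional now argument added only so that the behaviour can be checked deterministically:

```python
import datetime


def pick_banner(locos, now=None):
    """Choose a banner image name pseudo-randomly from locos."""
    if now is None:
        now = datetime.datetime.now()
    # The microsecond field changes on every request, so the index is
    # effectively arbitrary without needing a random number generator.
    return locos[now.microsecond % len(locos)]


names = ["R707-1.jpg", "6029-32.jpg", "621-16.jpg"]
fixed = datetime.datetime(2020, 1, 1, 0, 0, 0, 7)  # microsecond = 7
print(pick_banner(names, fixed))  # 7 % 3 == 1
print(pick_banner(names))         # varies from call to call
```

An image that appears twice in the list (as "4472=R761-11.jpg" does in chunk 2.12) is simply twice as likely to be chosen.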

Now complete the final part of the header. A warning message about using Internet Explorer is also added, as that application is not W3C compliant.

<print header of html page 2.13> =

print """
            </td>
          </tr>
        </table>
      </div>
      <div class="spacer"></div>
      <table style="background-color:#339933;border-top:1px solid #000000" 
        width="100%" id="global-nav" summary="Layout for site-wide navigation">
        <tr>
          <td valign="center">
            <div style="font-size:+140%;margin-left:1em">
              <a HREF="index"""+EXTN+"""">JOHN HURST</a>
              Warning: This page works with any browser EXCEPT Internet Explorer!
              <xsl:copy-of select="$GlobalNavBar"/>
            </div>
          </td>
        </tr>
      </table>
      <!-- U T I L I T Y   N A V I G A T I O N  --> 
      <table style="background-color:#3c6;color:#fff;vertical-align:middle;
        text-align: right" width="100%" id="global-utils" 
                           summary="Layout for utility navigation">
        <tr>
          <td align="left">
            <a HREF="position/index"""+EXTN+"""">Position</a> | 
            <a HREF="research/index"""+EXTN+"""">Research</a> | 
            <a HREF="teaching/index"""+EXTN+"""">Teaching</a> | 
            <a HREF="admin/index"""+EXTN+"""">Administration</a> | 
            <a HREF="professional/index"""+EXTN+"""">Professional</a> | 
            <a HREF="personal/index"""+EXTN+"""">Personal</a> | 
            <a HREF="trains/index"""+EXTN+"""">Railways</a>
            | 
            <a HREF=\""""+CGIBIN+"""/train-map.py">Site map</a>
          </td>
        </tr>
      </table>
<TABLE WIDTH="100%" BGCOLOR="#fff" CELLSPACING="0" CELLPADDING="0">
<TR><TD COLOR="#ffffff" BGCOLOR="#33cc66" COLSPAN="40" ALIGN="center">
<B>Central Shunting Yard</B></TD></TR>

<TR>
<TD ALIGN="center" BGCOLOR="silver" COLSPAN="3">Main</TD>
<TD ALIGN="center" BGCOLOR="lightgreen" COLSPAN="7">Australia</TD>
<TD ALIGN="center" BGCOLOR="lightpink" COLSPAN="3">Miscellaneous</TD>
<TD ALIGN="center" BGCOLOR="lightblue" COLSPAN="5">Rest of World</TD>
</TR><TR>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/./index"""+EXTN+"""">
<IMG SRC="trains/./trains.gif" ALT="Main Railway Page" 
        HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/new/index"""+EXTN+"""">
<IMG SRC="trains/new/trains.gif" ALT="The Latest Additions"
        HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/pops/index"""+EXTN+"""">
<IMG SRC="trains/pops/thumb/ajh.gif" ALT="The Most Popular Images"
        HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white">
<A HREF="trains/anr/index"""+EXTN+"""">
<IMG SRC="trains/anr/trains.gif" ALT="Australian National Railways" 
        HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/nsw/index"""+EXTN+"""">
<IMG SRC="trains/nsw/trains.gif" ALT="New South Wales Railways" 
        HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/qld/index"""+EXTN+"""">
<IMG SRC="trains/qld/trains.gif" ALT="Queensland Railways" HEIGHT="30" 
        WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/sa/index"""+EXTN+"""">
<IMG SRC="trains/sa/trains.gif" ALT="South Australian Railways" 
        HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/tas/index"""+EXTN+"""">
<IMG SRC="trains/tas/trains.gif" ALT="Tasmanian Railways" HEIGHT="30" 
        WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/vic/index"""+EXTN+"""">
<IMG SRC="trains/vic/trains.gif" ALT="Victorian Railways" HEIGHT="30" 
        WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/wa/index"""+EXTN+"""">
<IMG SRC="trains/wa/trains.gif" ALT="West Australian Railways" 
        HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/misc/index"""+EXTN+"""">
<IMG SRC="trains/misc/trains.gif" ALT="Miscellaneous Railway Items" 
        HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A
        HREF="trains/private/index"""+EXTN+"""">
<IMG SRC="trains/private/trains.gif" ALT="Private Railways" 
        HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A
        HREF="trains/tourist/index"""+EXTN+"""">
<IMG SRC="trains/tourist/trains.gif" ALT="Tourist and Preservation" 
        HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/./rest"""+EXTN+"""">
<IMG SRC="trains/./thumb/rest.gif" ALT="African/Asian Railways" 
        HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/br/index"""+EXTN+"""">
<IMG SRC="trains/br/trains.gif" ALT="British Railways" HEIGHT="30" 
        WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/europe/index"""+EXTN+"""">
<IMG SRC="trains/europe/trains.gif" ALT="Continental European Railways"
        HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/nz/index"""+EXTN+"""">
<IMG SRC="trains/nz/trains.gif" ALT="New Zealand Railways" 
        HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/usa/index"""+EXTN+"""">
<IMG SRC="trains/usa/trains.gif" ALT="North American Railways" 
        HEIGHT="30" WIDTH="30"></A></TD>
</TR><TR>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/./index"""+EXTN+"""">Central</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/new/index"""+EXTN+"""">Latest</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/pops/index"""+EXTN+"""">VoxPop</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/anr/index"""+EXTN+"""">ANR</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/nsw/index"""+EXTN+"""">NSW</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/qld/index"""+EXTN+"""">QLD</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/sa/index"""+EXTN+"""">SA</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/tas/index"""+EXTN+"""">TAS</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/vic/index"""+EXTN+"""">VIC</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/wa/index"""+EXTN+"""">WA</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/misc/index"""+EXTN+"""">Misc</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/private/index"""+EXTN+"""">Private</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/tourist/index"""+EXTN+"""">Tourist</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/./rest"""+EXTN+"""">Rest</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/br/index"""+EXTN+"""">BR</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/europe/index"""+EXTN+"""">Europe</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/nz/index"""+EXTN+"""">NZ</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/usa/index"""+EXTN+"""">US&amp;</A></TD>
</TR>
</TABLE></div>"""

Chunk referenced in 2.1 3.1
Chunk defined in 2.9,2.10,2.11,2.12,2.13

2.2.4 Print Trailer of HTML Page

<print trailer of html page 2.14> =

print """
    <HR SIZE="4" NOSHADE="on" COLOR="#339"/>
    <TABLE width="100%" align="center" border="0" cellspacing="0" cellpadding="0">
      <TR><TD height="10"></TD></TR>
      <TR>
        <TD>This page maintained by John Hurst. <BR/>
          Copyright 
          <A HREF="http://www.adm.monash.edu.au/unisec/pol/itec12.html">
            Monash University Acceptable Use Policy
          </A>
        </TD>
        <TD><xsl:copy-of select="$GlobalCounter"/></TD>
        <TD ALIGN="right" ROWSPAN="2">
          <IMG VALIGN="bottom" SRC="images/MadeOnMac.gif"/>
          <A HREF="index"""+EXTN+"""">
            <IMG ALIGN="center" height="50" width="33" 
              SRC="family/john9808.gif"
              alt="My Photo"/></A>
          <A HREF="trains/index"""+EXTN+"""">
            <IMG ALIGN="center" height="50" width="33"
              SRC="images/train.gif"  
              alt="Train Photo"/></A>
        </TD>
      </TR>
      <TR>
        <TD ALIGN="left" valign="bottom" COLSPAN="3">
          <SPAN STYLE="font-size:80%">
            <P>
              Dynamically generated at """+\
              tm+"""\
              <BR/>
              Maintainer use only; not generally accessible: 
              <!-- **** NB! The "localhost" in the following MUST 
              be split to avoid being converted for 
              other server contexts -->"""
ind='              '
print ind+'<A href="http://local'+'host/~ajh/cgi-bin/'+scriptnameparm+'">Local Server</A>'
print ind+"<xsl:text>&#x0a;</xsl:text>"
print ind+'<A href="http://www.ajh.id.au/~ajh/cgi-bin/'+scriptnameparm+'">Home Server</A>'
print ind+"<xsl:text>&#x0a;</xsl:text>"
print ind+'<A href="http://www.ajhurst.org/~ajh/cgi-bin/'+scriptnameparm+'">Hurst Server</A>'
print ind+"<xsl:text>&#x0a;</xsl:text>"
print ind+'<A href="http://dimboola.infotech.monash.edu.au/~ajh/cgi-bin/'+scriptnameparm+'">'
print ind+'Work Server</A>'
print ind+"<xsl:text>&#x0a;</xsl:text>"
print ind+'<A href="http://www.csse.monash.edu.au/cgi-bin/cgiwrap/ajh/'+scriptnameparm+'">'
print ind+'CSSE Server</A>'
print """
              <xsl:text>&#x0a;</xsl:text>
            </P>
          </SPAN>
        </TD>
      </TR>
    </TABLE>
  </body>
</html>
"""

Chunk referenced in 2.1 3.1

2.3 Define Subroutines

2.3.1 define function convertIPtoHex

<define function convertIPtoHex 2.15> =

def convertIPtoHex(ipadrDec):
  ipadrHex=ipadrDec
  res=re.match(r'(\d+)\.(\d+)\.(\d+)\.(\d+)',ipadrDec)
  if res:
    d1=int(res.group(1))
    d2=int(res.group(2))
    d3=int(res.group(3))
    d4=int(res.group(4))
    ipadrHex = "%02x%02x%02x%02x" % (d1,d2,d3,d4)
  return ipadrHex

Chunk referenced in 2.5

The logic of this function is simple enough: extract the integer (decimal) values of each field in an IP address, and convert each to a two-digit hexadecimal value. Concatenate all these into a single hex string, which is returned.
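
The conversion can be exercised on its own. The sketch below restates the same logic in a compact form for Python 3; the name convert_ip_to_hex, the trailing "$" anchor and the handling of a missing address are additions made for this example and are not part of chunk 2.15.

```python
import re


def convert_ip_to_hex(ipadr_dec):
    """Return a dotted-quad IPv4 address as eight hexadecimal digits.

    Anything that is not a dotted quad is returned unchanged.
    """
    res = re.match(r'(\d+)\.(\d+)\.(\d+)\.(\d+)$', ipadr_dec or '')
    if not res:
        return ipadr_dec
    # Each decimal field becomes a zero-padded two-digit hex value.
    return "%02x%02x%02x%02x" % tuple(int(g) for g in res.groups())


print(convert_ip_to_hex("10.0.0.105"))   # 0a000069
print(convert_ip_to_hex("192.168.1.1"))  # c0a80101
print(convert_ip_to_hex("not-an-ip"))    # returned unchanged
```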

2.3.2 define procedure log

<define procedure log 2.16> =

def log(ipadr,image,acc,ok):
  global dontlog
  if dontlog:
    return
  access=""
  if not acc:
    refer = os.getenv("HTTP_REFERER")  # CGI uses the HTTP header's spelling, with a single R
    access=" *** not served *** (ref: %s)" % (refer)
  elif not ok:
    access=" *** already voted ***"
  try:
    f=open(VIEWINGS,'a')
  except IOError:
    print "Cannot open logfile %s" % (VIEWINGS)
    return  # no file handle to write to
  f.write("%s %s %s%s\n" % (tm,ipadr,image,access))
  f.close()
  #print "<P>Logged %s %s</P>" % (tm,image)

Chunk referenced in 2.5

Every image access is logged, for recording its popularity. The exceptions are where there is an explicit request not to log (dontlog is true), and where there is some problem in delivering the image.

2.3.3 Determine rank of image

<define the ranking procedure 2.17> =

logentrypat=\
    re.compile("(\d{4})(\d{2})(\d{2}):(\d{2})(\d{2}) "+\
               "([0-9a-f\.:]+ )?trains/(.*)$")
<define getrank routine 2.18>

Chunk referenced in 2.5

<define getrank routine 2.18> =

def getrank(path,imageparm):
  global ipadr
  res=re.match(r'trains/(.*)$',imageparm)
  if res:
    imageparm=res.group(1)
  ok=1
  notserved=re.compile(".*\*\*\* .* \*\*\*")
  res=re.match('.*trains/(.*)$',path)
  if res:
    path=res.group(1)
  try:
    data=open(VIEWINGS)
  except:
    print "Cannot open logfile %s" % (VIEWINGS)
    sys.exit(1)
  for l in data.readlines():
    res=notserved.match(l)
    if res:
      continue
    <getrank: extract data from a single logfile entry 2.19>
    pass
  if ok:
    if table.has_key(imageparm):
      if not dontlog:
        table[imageparm]+=1.0 # add one for this viewing!
    else:
      table[imageparm]=1.0 # add one for this viewing!
  list=[]
  for key in sorted(table.keys()):
    list.append((table[key],key))
    #print "%f %s" % (table[key],key)
  sortlist=sorted(list,reverse=True)
  totalimages=len(sortlist)
  
  i=1; last=0.0; rank=1; thisrank=0
  for (n,k) in sortlist:
    if last!=n:
      rank=i
    #print "%4d %2.6f %s<BR/>" % (rank,n,k)
    i+=1
    last=n
    if k==path:
      thisrank=rank
      break
  return (totalimages,thisrank,ok)

Chunk referenced in 2.17

Match the various fields in a logfile entry. These are the year, month, day, hour and minute of the entry (in the format YYYYMMDD:hhmm), followed by the IP address (now stored in hexadecimal, but originally in decimal, and before that, not at all), and then the image address, including the base directory trains/, which is stripped off. Note that any logging of whether the image actually was served, or had already been voted upon, has been discarded previously.

<getrank: extract data from a single logfile entry 2.19> =

res=logentrypat.match(l)
if res:
  logdate=l[0:8]
  year=int(res.group(1))
  month=int(res.group(2))
  day=int(res.group(3))
  hour=int(res.group(4))
  minute=int(res.group(5))
  accesstime = datetime.datetime(year,month,day,hour,minute)
  timesinceaccess=now-accesstime
  dayssinceaccess=timesinceaccess.days+timesinceaccess.seconds/86400.0
  expval=-dayssinceaccess/DECAY
  voteval=math.exp(expval)
  #print "%1.4f  %2.6f %2.5f" % (voteval,expval,dayssinceaccess)
  thisipadr=res.group(6)
  if thisipadr:
    thisipadr=thisipadr.strip()
  imagename=res.group(7)
  #print "%s %s %s : %s %s <%s> %s" % \
  #      (year,month,day,hour,minute,thisipadr,imagename)
  if table.has_key(imagename):
    table[imagename]+=voteval
  else:
    table[imagename]=voteval
  if thisipadr==ipadr and imagename==imageparm and logdate==today:
    ok = 0

Chunk referenced in 2.18
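
The log-entry format described above (YYYYMMDD:hhmm, an optional IP address, then the image path) can be tried out in isolation. The following is a sketch for Python 3 using the same pattern as logentrypat; the function name parse_log_entry and the sample log lines are invented for the example.

```python
import datetime
import re

# Same pattern as logentrypat, written with raw strings.
LOGENTRY = re.compile(r"(\d{4})(\d{2})(\d{2}):(\d{2})(\d{2}) "
                      r"([0-9a-f.:]+ )?trains/(.*)$")


def parse_log_entry(line):
    """Return (timestamp, ipadr or None, imagename), or None if no match."""
    res = LOGENTRY.match(line.rstrip("\n"))
    if not res:
        return None
    year, month, day, hour, minute = (int(g) for g in res.groups()[:5])
    when = datetime.datetime(year, month, day, hour, minute)
    # Older entries have no IP address, so group 6 may be absent.
    ipadr = res.group(6).strip() if res.group(6) else None
    # Group 7 is the image path with the leading "trains/" stripped.
    return when, ipadr, res.group(7)


print(parse_log_entry("20100315:0930 0a000069 trains/vic/R707-1"))
print(parse_log_entry("20050101:1200 trains/nsw/3801-2"))
```

Entries flagged with "*** ... ***" are filtered out by getrank before this pattern is applied, so the sketch does not treat them specially.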

2.3.4 Define the XML Dispatch Routines

<define the XML dispatch routines 2.20> =

def doname(elem,path):
  if elem.firstChild:
    text=elem.firstChild.nodeValue
    pathsplit=re.match(BASEPAGE+'/(.*)/([^/]*)$',path)
    if pathsplit:
      pathbase=pathsplit.group(1)
      pathfile=pathsplit.group(2)
      dirattr=elem.getAttribute('dir')
      dir=""
      if dirattr:
        dir=dirattr
      page=""
      pageattr=elem.getAttribute('page')
      if pageattr:
        url=pathbase+'/'+pageattr+EXTN+'#'+pathfile
        page=pageattr
      else:
        url=pathbase+'/index'+EXTN+'#'+pathfile
      print "<LI><I>name:</I> <A HREF=\"%s\">%s</A> dir=%s page=%s</LI>" % (url,text,dir,page)
    else:
      print "<LI><I>name:</I> %s</LI>" % (text)

def dothumb(elem,path):
  pass
  #text=elem.firstChild.nodeValue
  #print "<LI><I>thumb:</I> %s</LI>" % text

def dosize(elem,path):
  bytes=pixels=""
  #print "%s" % (elem)  # debugging output, disabled so it does not appear in the page
  attrs=elem.attributes
  for i in range(attrs.length):
    attr = attrs.item(i)
    if attr.name=='bytes':
       bytes=attr.value
    if attr.name=='pixels':
       pixels=attr.value
  print "<LI><I>size:</I> %s bytes, %s pixels</LI>" % (bytes,pixels)

def dodate(elem,path):
  taken=catalogued=""
  attrs=elem.attributes
  for i in range(attrs.length):
    attr = attrs.item(i)
    if attr.name=='taken':
       taken=attr.value
    if attr.name=='catalogued':
       catalogued=attr.value
  print "<LI><I>date:</I> taken: %s, catalogued %s</LI>" % (taken,catalogued)

def dophotographer(elem,path):
  if elem.firstChild:  # guard against an empty element, as in doname and doindex
    text=elem.firstChild.nodeValue
    print "<LI><I>photographer:</I> %s</LI>" % text

def doindex(elem,path):
  if elem.firstChild:
    text=elem.firstChild.nodeValue
    print "<LI><I>index terms:</I> %s</LI>" % text

def totext(node,path):
  if node.nodeType==node.TEXT_NODE:
    return node.nodeValue
  elif node.nodeType==node.ELEMENT_NODE:
    text=''
    for n in node.childNodes:
      text=text+totext(n,path)
    if node.tagName=='narrower':
      return "<DIV style=\"margin-left:20;font-style:italic\">%s</DIV>" % text
    elif node.tagName=='uri':
      href=""  # default in case the element carries no href attribute
      attributes=node.attributes
      for i in range(attributes.length):
        attr = attributes.item(i)
        if attr.name=='href':
          href=attr.value
      return "<A HREF=\"%s\">%s</A>" % (href,text)
    elif node.tagName=='p':
      return "<P>%s</P>" % (text)
    elif node.tagName=='b':
      return "<B>%s</B>" % (text)
    elif node.tagName=='i':
      return "<I>%s</I>" % (text)
    elif node.tagName=='em':
      return "<EM>%s</EM>" % (text)
    elif node.tagName=='dq':
      return "\"%s\"" % (text)
    elif node.tagName=='description':
      return text
    else:
      return "&lt;%s>%s&lt;/%s>" % (node.tagName,text,node.tagName)
  elif node.childNodes:
    text=''
    for n in node.childNodes:
      text=text+totext(n,path)
    return text
  else:
    return "**unknown node**"

def dodescription(elem,path):
  text=totext(elem,path)
  print "<LI><I>description:</I> %s</LI>" % text

dispatch={'name':doname,
          'thumb':dothumb,
          'size':dosize,
          'date':dodate,
          'photographer':dophotographer, 
          'index':doindex,
          'description':dodescription}

Chunk referenced in 2.5

2.3.5 define procedure to display the image

This is the procedure that does most of the real work in displaying the full image.

<define procedure to display the image 2.21> =

def display(image):
  global totalimages,imageparm
  path=top[0:len(top)-6]+image
  xmlfile=path+EXTN
  jpgfile=path+".jpg"
  acc=os.access(jpgfile,os.R_OK)
  pathsplit=re.match(BASEPAGE+'/(.*)/([^/]*)$',path)
  if pathsplit:
    pathbase=pathsplit.group(1)
    pathfile=pathsplit.group(2)
  if acc:
    <display can access file 2.22>
  else:
    <display cannot access file 2.23>
  #print '<P><A href="%s">Go to %s</A></P>' % (image,image)
  (totalimages,rank,ok)=getrank(path,imageparm)
  log(ipadr,image,acc,ok)
  startrank = 25 * (max(rank-1,0) // 25)  # clamp so an unranked image (rank 0) maps to the first page
  print <Print ranking information 2.25>
  if not ok:
    print '<P>You have already voted for this image today!</P>\n'
  return

Chunk referenced in 2.5

<display can access file 2.22> =

#print "<P>%s,%08x</P>" % (jpgfile,acc)
print "<IMG SRC=\"%s\"/>" % (image+".jpg")
print "<LI><I>file:</I> %s </LI>" % (xmlfile)
dom=parse(xmlfile)
elems=dom.getElementsByTagName('image').item(0).childNodes
for n in elems:
  if n.nodeType == Node.ELEMENT_NODE:
    #print n.tagName
    if dispatch.has_key(n.tagName):
      dispatch[n.tagName](n,path)

Chunk referenced in 2.21

The file to display is accessible, so generate the HTML reference to it, print the name of the XML file, then parse it in order to display the various attributes relating to the image (as defined in the XML file).
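
The dispatch step can be sketched in isolation as follows. This is a minimal illustration and not the author's code: the sample XML, the helper text_of and the handlers do_name and do_date are hypothetical, and the handlers collect their output in a list rather than printing it.

```python
from xml.dom.minidom import parseString, Node

# Hypothetical XML content; the real files may carry more elements.
SAMPLE = """<image>
  <name>r761</name>
  <date>2005-03-12</date>
  <unknown>ignored</unknown>
</image>"""

def text_of(elem):
    # Concatenate the text nodes directly under elem.
    return "".join(c.data for c in elem.childNodes
                   if c.nodeType == Node.TEXT_NODE)

def do_name(elem, out):
    out.append("<LI><I>name:</I> %s</LI>" % text_of(elem))

def do_date(elem, out):
    out.append("<LI><I>date:</I> %s</LI>" % text_of(elem))

# Map tag names to handlers, in the manner of the dispatch dictionary.
handlers = {"name": do_name, "date": do_date}

def render(xml_text):
    out = []
    dom = parseString(xml_text)
    for n in dom.getElementsByTagName("image").item(0).childNodes:
        # Skip whitespace text nodes and tags that have no handler.
        if n.nodeType == Node.ELEMENT_NODE and n.tagName in handlers:
            handlers[n.tagName](n, out)
    return out
```

Elements without an entry in the table (here, unknown) are silently skipped, which lets the XML files carry extra tags without breaking the display.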

<display cannot access file 2.23> =

if pathsplit:
  gifpath=pathbase+'/thumb/'+pathfile+'.gif'
  gifacc=os.access(BASEPAGE+'/'+gifpath,os.R_OK)
else:
  gifpath=''   # keep gifpath defined for <Explain missing images 2.24>
  gifacc=False
#print "<P>%s,%08x</P>" % (jpgfile,acc)
print "<P><B>The file %s is not available</B>" % (jpgfile)
if gifacc:
  print '<IMG SRC="'+gifpath+'"/></P>'
  print '<P>The image has been removed for space reasons.  '
  print 'It will be retrieved overnight.</P>'
else:
  print "</P>"
  res=re.match(".*/([^/]*)$",path)
  name="Sorry"
  if res:
    name=res.group(1)
  print <Explain missing images 2.24>

Chunk referenced in 2.21

<Explain missing images 2.24> =

"""
<P>This may be because the file has been relocated to a different location.
Try clicking this link to search the website:
<FORM action=\""""+SCRIPT+"""\" method=\"post\" name=\"image\">
<INPUT type=\"submit\" name=\"image\" value=\"%s\"><IMG SRC="%s"/></INPUT>
</FORM>
If that does not work, it may be that the image has been removed for space reasons. Sorry.
""" % (name,gifpath)

Chunk referenced in 2.23

<Print ranking information 2.25> =

"""
<FORM action=\""""+CGIBIN+"""/ranktrains.py" method="post">
<INPUT type="hidden" name="number" value="25"/>
<P>This image ranks %d out of %d
<BUTTON type="submit" name="startnum" value="%d">%04d-%04d</BUTTON>
</P>
</FORM>
""" % (rank,totalimages,startrank,startrank+1,startrank+25)

Chunk referenced in 2.21
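
The button label printed above depends on mapping a rank to the page of 25 that contains it. A small sketch of that arithmetic, assuming ranks start at 1 (the function name page_for_rank is hypothetical):

```python
def page_for_rank(rank, per_page=25):
    """Return (startnum, first, last) for the page that holds `rank`."""
    # Integer division rounds down to the start of the page.
    startrank = per_page * ((rank - 1) // per_page)
    return startrank, startrank + 1, startrank + per_page
```

Subtracting 1 before dividing keeps rank 25 on the first page (0001-0025) and moves rank 26 to the second (0026-0050).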

2.3.6 Define Procedure to Search Directories

The procedure visit is called when we have an image name, but no path to the image. The procedure recursively visits all directories reachable from the initial parameter dir, and if it finds the image, calls display to perform the actual display of the image. It then returns. Thumbnail directories are skipped. It is assumed that the initial dir parameter contains the substring trains/, indicating where the trains subdirectory begins.

<define procedure to search directories 2.26> =

def visit(dir,level,image):
  list = os.listdir(dir)
  for f in list:
    if f == 'thumb':
      continue
    path = dir + "/" + f
    if f == image+'.jpg':
      res=re.match('(.*)(trains/[^.]*)\.jpg',path)
      if res:
        display(res.group(2))
        return 1
    if os.path.isdir(path):
      res=visit(path,level+2,image)
      if res:
        return 1
  return 0

Chunk referenced in 2.5
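
For comparison, the same search can be sketched with os.walk from the standard library. This is an alternative technique and not the method used by visit: the function name find_image is hypothetical, and it returns the path it finds instead of calling display.

```python
import os

def find_image(top, image):
    """Return the path of image.jpg below top, or None if it is absent."""
    target = image + ".jpg"
    for dirpath, dirnames, filenames in os.walk(top):
        # Pruning dirnames in place stops os.walk descending into thumb/.
        dirnames[:] = [d for d in dirnames if d != "thumb"]
        if target in filenames:
            return os.path.join(dirpath, target)
    return None
```

As with visit, the search stops at the first match, so an image name that occurs in two directories resolves to whichever directory is reached first.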

3. `ranktrains.py`

ranktrains.py is a web script that delivers pages of rankings for the railway database. Each page is limited in size (currently 25 images), and there are navigation buttons to browse forwards and backwards through the rankings. Images are buttons that take the viewer to the full-size image (via viewtrains), while image titles are links that take the viewer to the home page of the image.

"ranktrains.py" 3.1 =

#!/usr/bin/python
# DO NOT EDIT THIS FILE!  
# use ~/Computers/python/viewtrains/viewtrains.xlp instead

import re,string,sys,datetime
import cgi,os
import time
import urllib
import math
import rank
from xml.dom.minidom import parse, parseString, Node

print "Content-Type: text/html\n\n";

<ranktrains: define constants 3.2>
print "<base href=\""+HOMEPAGE+"/\"><BR/>\n"

if debug:
  print "CGIBIN=%s<BR/>" % (CGIBIN)
  print "HOMEPAGE=%s<BR/>" % (HOMEPAGE)
  print "BASEPAGE=%s<BR/>" % (BASEPAGE)
  print "LOGFILE=%s<BR/>" % (LOGFILE)
  #print "=%s" % ()
  #print "=%s" % ()

<ranktrains: collect cgi parameters 3.3>
<ranktrains: collect previous ranking information 3.4>
<ranktrains: update rankings with latest log info 3.5>
htmltitle="Dynamic Rankings"
<print header of html page 2.9,2.10,2.11,2.12,2.13>

<rankings: print forward and backward buttons 3.6>
<ranktrains: generate image rankings 3.7>
<rankings: print forward and backward buttons 3.6>
<ranktrains: print rankings table 3.8>
<ranktrains: print ranking analysis 3.9>

scriptnameparm="ranktrains.py"
<print trailer of html page 2.14>

3.1 ranktrains: define constants

This macro is also defined in other sections.

<ranktrains: define constants 3.2> =

<define global constants 1.1>
tm = time.asctime(time.localtime(time.time())) + ["", "(Daylight savings)"][DST]
SCRIPT=CGIBIN+"/ranktrains.py"

Chunk referenced in 3.1

3.2 ranktrains: collect cgi parameters

<ranktrains: collect cgi parameters 3.3> =

form = cgi.FieldStorage()

gotparms=0
numtodisplay=25; startnum=0

if form.has_key("number"):
  numtodisplay=int(form["number"].value)
  gotparms=1
if form.has_key("startnum"):
  startnum=int(form["startnum"].value)
  gotparms=1
stopnum=startnum+numtodisplay

if debug:
  print "numtodisplay = %d, startnum = %d" % (numtodisplay,startnum)

Chunk referenced in 3.1

3.3 ranktrains: collect previous ranking information

<ranktrains: collect previous ranking information 3.4> =

RANKINGS=LOGFILE+"/trainrank"
VIEWINGS=LOGFILE+"/trainview"
totalimages, datatime, votefactor, table = rank.rankdata(RANKINGS)

if debug:
  print "totalimages=%s<br/>" % (totalimages)
  print "datatime=%s<br/>" % (datatime)
  print "votefactor=%s<br/>" % (votefactor)
  print "table=%s<br/>" % (table)

Chunk referenced in 3.1

3.4 ranktrains: Update Rankings with Latest Log Info

<ranktrains: update rankings with latest log info 3.5> =

logentrypat=\
    re.compile("(\d{4})(\d{2})(\d{2}):(\d{2})(\d{2}) "+\
               "([0-9a-f\.:]+ )?(trains/)?(.*)$")
notserved=re.compile(".*\*\*\* .* \*\*\*")

totalimages, logcount, ranktime, sorttime, sortlist = \
  rank.ranklog(VIEWINGS,table,notserved)

if debug:
  print "totalimages=%s<br/>" % (totalimages)
  print "logcount=%s<br/>" % (logcount)
  print "ranktime=%s<br/>" % (ranktime)
  print "sorttime=%s<br/>" % (sorttime)
  print "sortlist=%s<br/>" % (sortlist)

Chunk referenced in 3.1

3.5 rankings: Print Forward and Backward Buttons

<rankings: print forward and backward buttons 3.6> =

print "<FORM action=\""+CGIBIN+"/ranktrains.py\" method=\"post\">\n"
print "<table align=\"center\"><tr>\n"
print "<INPUT type=\"hidden\" name=\"number\" value=\"%d\"/>" % (numtodisplay)
if startnum-numtodisplay>=0:
  print "<td><BUTTON type=\"submit\" name=\"startnum\" "
  print "value=\"%d\">Prev (%d-%d)</BUTTON></td>" % \
        (startnum-numtodisplay,startnum-numtodisplay+1,startnum)
else:
  print "<td><BUTTON type=\"submit\">(Prev)</BUTTON></td>"
print "<td><BUTTON type=\"submit\" name=\"startnum\" "
print "value=\"%d\">Next (%d-%d)</BUTTON></td>" % \
      (startnum+numtodisplay,startnum+numtodisplay+1,startnum+2*numtodisplay)
print "</tr></table>\n"
print "</FORM>\n"

Chunk referenced in 3.1
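
The values and labels of the two buttons follow directly from startnum and numtodisplay. A sketch of that calculation, separated from the HTML (page_ranges is a hypothetical name, not part of the script):

```python
def page_ranges(startnum, numtodisplay):
    """Return (prev, next); each is (startnum value, first, last) or None."""
    prev = None
    if startnum - numtodisplay >= 0:
        p = startnum - numtodisplay
        prev = (p, p + 1, startnum)
    n = startnum + numtodisplay
    # As in the chunk above, the next range is not clipped to the
    # total number of images.
    nxt = (n, n + 1, startnum + 2 * numtodisplay)
    return prev, nxt
```

On the first page there is no previous range, which corresponds to the inactive (Prev) button.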

3.6 ranktrains: Generate Image Rankings

<ranktrains: generate image rankings 3.7> =

print "<H1>Image Rankings %d - %d</H1>\n" % (startnum+1,stopnum)

print "<FORM action=\""+CGIBIN+"/viewtrains.py\" "
print "method=\"post\" name=\"image\"><table align=\"center\">\n"
perline=5; posonline=0
i=1; last=0.0; rank=1
for (n,k) in sortlist:
  #print "%1.4f %s <BR/>" % (n,k)
  #(n,k) = sortlist[i]
  if last!=n:
    rank=i
  if i>startnum:
    if posonline==0:
      print "<tr>\n"
    res=re.match("(.*)/([^/]*)",k)
    path="" ; image=""
    if res:
      path=res.group(1)
      image=res.group(2)
    else:
      image=k
    caption=k
    if len(caption)>17:
      caption=path+"<BR/>"+image
    try:
      xmlfname=BASEPAGE+"/trains/"+path+"/"+image+".xml"
      xmlfile = open(xmlfname)
      dom = parse(xmlfile)
      nameelem=dom.getElementsByTagName('name').item(0)
      pageattr=nameelem.getAttributeNode('page')
      if pageattr:
        page=pageattr.nodeValue
      else:
        page='index'
      xmlfile.close()
      print "<td align=\"center\">"
      print "<table><tr><td>%4d</td>" % (rank)
      print "<td align=\"right\">%2.6f</td></tr>" % (n)
      print "<tr><td colspan=\"2\" align=\"center\">"
      #print '<INPUT type="hidden" name="disablevote" value="1"/>'
      print "<BUTTON type=\"submit\" name=\"image\" value=\"trains/%s\">" % (k)
      print "<IMG align=\"center\" ALT=\"click me for full image\" "
      print "SRC=\"trains/%s/thumb/%s.gif\">" % (path,image)
      print "</BUTTON></td></tr><tr><td colspan=\"2\" align=\"center\">"
      print "<A HREF=\"trains/%s/%s.xml#%s\">%s</A></td></tr></table></td>" % \
            (path,page,image,caption)
    except IOError:
      print "<td align=\"center\">"
      print "<table><tr><td>%4d</td>" % (rank)
      print "<td align=\"right\">%2.6f</td></tr>" % (n)
      print "<tr><td colspan=\"2\" align=\"center\">"
      print "Cannot access<BR/>"
      print "%s at %s</td></tr></table></td>" % (caption,xmlfname)
      pass
    posonline+=1
    if posonline==perline:
      print "</tr>\n"
      posonline=0
  i+=1
  if i>stopnum:
    break
  last=n
print "</table></FORM>\n"

Chunk referenced in 3.1
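
The loop above gives images with equal scores the same rank, and the next distinct score takes the rank of its position in the list. A sketch of that rule on its own, with hypothetical sample data in the usage note:

```python
def competition_ranks(sortlist):
    """Return (rank, score, key) triples for a list sorted by descending score."""
    ranked = []
    last = None
    rank = 1
    for i, (score, key) in enumerate(sortlist, 1):
        # A new score takes its position as rank; a tie keeps the old rank.
        if score != last:
            rank = i
        ranked.append((rank, score, key))
        last = score
    return ranked
```

For the list [(0.9,'a'), (0.5,'b'), (0.5,'c'), (0.1,'d')] the ranks are 1, 2, 2 and 4: the two tied images share rank 2 and rank 3 is not used.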

3.7 ranktrains: print rankings table

<ranktrains: print rankings table 3.8> =

print "<H1>Rankings Images Table</H1>\n"

print "<FORM action=\""+CGIBIN+"/ranktrains.py\" method=\"post\">\n"
print "<table align=\"center\"><tr>\n"
print "<INPUT type=\"hidden\" name=\"number\" value=\"%d\"/>" % (numtodisplay)
linecount=0
# compute score from first image
(score,key)=sortlist[0]
print "<td align=\"left\">%7.4f</td>" % (score)
nimages=len(sortlist)-1
for i in range(0,totalimages,numtodisplay):
    print "<td align=\"center\">"
    print "<BUTTON type=\"submit\" name=\"startnum\" "
    print "value=\"%d\">%04d-%04d</BUTTON></td>" % (i,i+1,i+numtodisplay)
    linecount+=1
    # compute score from image number i+numtodisplay
    j=i+numtodisplay-1
    if j > nimages: j = nimages
    (score,key)=sortlist[j]
    if linecount % 6 == 2:
      print "<td align=\"left\">%7.4f</td>" % (score)
    if linecount % 6 == 4:
      print "<td align=\"left\">%7.4f</td>" % (score)
    if linecount % 6 == 0:
      print "<td align=\"left\">%7.4f</td>" % (score)
      print "</tr><tr>\n"
      # compute score from image number i+numtodisplay+1
      j=i+numtodisplay
      if j > nimages: j = nimages
      (score,key)=sortlist[j]
      print "<td align=\"left\">%7.4f</td>" % (score)
# compute score from last image
(score,key)=sortlist[len(sortlist)-1]
print "<td align=\"left\">%7.4f</td>" % (score)
print "</tr></table>\n"
print "</FORM>\n"
print """
<P>
  A full explanation of how these rankings are computed can be found
  on the <A HREF=\""""+HOMEPAGE+"""/trains/pops/index"""+EXTN+"""">Vox Pops Page</A>
</P>
"""

Chunk referenced in 3.1

3.8 ranktrains: Print Ranking Analysis

<ranktrains: print ranking analysis 3.9> =

print "<H1>Ranking Analysis Data</H1>\n"

print '<P>Time analyses based on wallclock times</P>'

print 'votefactor=%f ' % (votefactor) + \
      '(this is the decay since last rankings were computed)<BR/>'
print "Ranking data input took %d.%06d seconds for %d images<BR/>" % \
      (datatime.seconds,datatime.microseconds,totalimages)
print "Logfile input took %d.%06d seconds for %d entries<BR/>" % \
      (ranktime.seconds,ranktime.microseconds,logcount)
print "Input analysis and sorting took %d.%06d seconds<BR/>" % \
      (sorttime.seconds,sorttime.microseconds)
print "Data ranking took %d.%06d seconds<BR/>" % \
      (ranktime.seconds,ranktime.microseconds)

Chunk referenced in 3.1

4. `tidytrains.py`

tidytrains.py reads two files to determine the current rankings. The first file contains a single entry for each image, giving its current voting score, and the second file contains a list of the votes cast since that score was computed. The resulting ranking is then used to decide whether any images should be removed from the system: an image is a candidate for removal when its score falls below a fixed threshold (THRESHOLD).
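
The selection step can be sketched as below, assuming a list of (score, key) pairs such as rank.ranklog returns and a list of the image keys still present on disk. The function name removal_candidates and the sample keys in the usage note are hypothetical.

```python
THRESHOLD = 0.000170  # the value used in tidytrains.py

def removal_candidates(sortlist, available, threshold=THRESHOLD):
    """Split low-scoring images into those still on disk and those already gone."""
    to_remove, already_gone = [], []
    for score, key in sortlist:
        if score < threshold:
            if key in available:
                to_remove.append(key)
            else:
                already_gone.append(key)
    return to_remove, already_gone
```

With scores [(0.5,'vic/a'), (0.0001,'vic/b'), (0.00005,'nsw/c')] and only 'vic/a' and 'vic/b' on disk, 'vic/b' is a removal candidate and 'nsw/c' is reported as already removed.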

"tidytrains.py" 4.1 =

#!/usr/bin/python

<imports 2.2>
import getopt
import shutil

<define global constants 1.1>

THRESHOLD=0.000170
MASTERDIR=BASEPAGE+'/trains'
WEBDIR=BASEPAGE
WEBLOG=LOGFILE
WEBPAGE=WEBDIR+'/trains/'
WEBRANK=WEBLOG+'/trainrank'
WEBVIEW=WEBLOG+'/trainview'

opts, args = getopt.getopt(sys.argv[1:],"s:f:")
#CURRENT=WEBRANK
#LOGFILE=WEBVIEW
CURRENT=WEBLOG+'/trainrank'
LOGFILE=WEBLOG+'/trainview'
LISTFILE=WEBLOG+'/trainlist'
command="find %s -name \*.jpg >%s" % (MASTERDIR,LISTFILE)
status=os.system(command)
if status:
  print "Urrk 2! %d" % (status)
  sys.exit(status)
available=[]
avfile=open(LISTFILE)
for l in avfile.readlines():
  l=l.strip()
  l=l[11:] # strip off www/trains/
  #print ">%s<" % l
  available.append(l)
avfile.close()
#os.remove(LISTFILE)

oneday=datetime.timedelta(days=1)
yesterday=datetime.datetime.now()-oneday
starttime=yesterday.strftime("%Y%m%d:000000")
finishtime="20201231:235959"
for opt,val in opts:
  print "%s  %s" % (opt,val)
  if opt=='-s':
    starttime=val
  elif opt=='-f':
    finishtime=val
  else:
    print "Unknown option %s" % (opt)

print "starttime = %s" % (starttime)
totalimages, datatime, votefactor, table = rank.rankdata(CURRENT)

ignorepat=re.compile(".*\*\*\* [^*]* \*\*\*")
totalimages,logcount, ranktime,sorttime,sortlist = \
  rank.ranklog(LOGFILE,table,ignorepat,starttime,finishtime)

for (val,key) in sortlist:
  if val<THRESHOLD:
    #path=WEBPAGE+key+'.jpg'
    path=SCSSEPAGE+key+'.jpg'
    if key in available:
      print 'removing %s (not really)' % (path)
      #command='ssh -1 nexus.csse.monash.edu.au "rm %s"' % (path)
      #status=os.system(command)
      #os.remove(path)
    else:
      print '%s already removed' % (path)
      pass
  
table = {}

ignorepat=re.compile(r'.*((-[0-9a-z]+)|(already voted \*\*\*))$')
totalimages,logcount, ranktime,sorttime,sortlist = \
  rank.ranklog(LOGFILE,table,ignorepat,starttime,finishtime)

pathpat=re.compile(r'([^ ]+) ')
for (val,key) in sortlist:
  #print "%s %7.6f" % (key,val)
  res=pathpat.match(key)
  if res:
    path=res.group(1)
    webpath  = MASTERDIR+path+'.jpg'
    #sitepath = WEBPAGE+path+'.jpg'
    sitepath = SCSSEPAGE+path+'.jpg'
    #webacc=os.access(webpath,os.F_OK)
    #siteacc=os.access(sitepath,os.F_OK)
    #if webacc:
      #if siteacc:
        #pass
      #else:
        #print 'cp -p %s %s' % (webpath,sitepath)
        #shutil.copy2(webpath,sitepath)
    #else:
      #if siteacc:
        #print 'Anomalous: %s exists but %s does not' % (sitepath,webpath)
      #else:
        #print 'Cannot recover %s as there is no master copy' % (sitepath)
    command='/usr/local/bin/rsync -auv %s nexus.csse.monash.edu.au:%s &>/dev/null' % \
      (webpath,sitepath)
    status=os.system(command)
    if not status:
      print 'recovered %s' % (sitepath)
    else:
      print 'Could NOT recover %s, status:%d' % (sitepath,status)

sys.exit(0)

5. `ranking.py`

"ranking.py" 5.1 =

#!/usr/bin/python

# read two files to determine current rankings. The first file
# contains a single entry for each image, containing the current
# voting score, and the second file contains a list of votes since
# then.  A third file is then constructed, containing the updated
# rankings.

# DO NOT EDIT THIS FILE!  
# use ~/Computers/python/viewtrains/viewtrains.xlp instead

import re,string,sys,datetime
import cgi,os
import getopt
import time
import urllib
import math

<define global constants 1.1>

tm = time.asctime(time.localtime(time.time())) + ["", "(Daylight savings)"][DST]

startnow=datetime.datetime.now()

ignoredirs = re.compile('(tmp)|(units)')

top=BASEPAGE+"/trains"
jpgpat=re.compile(r'(.*)\.jpg$')
xmlpat=re.compile(r'.*\.xml$')
datepat=re.compile(r'(\d{4})(\d{2})(\d{2}):(\d{2})(\d{2})(\d{2})')

def strtotime(str,default):
  res=datepat.match(str)
  if res:
    thisdatetime=datetime.datetime(int(res.group(1)), # year
                                   int(res.group(2)), # month
                                   int(res.group(3)), # day
                                   int(res.group(4)), # hour
                                   int(res.group(5)), # minute
                                   int(res.group(6))) # second
    return thisdatetime
  else:
    return default
    

opts, args = getopt.getopt(sys.argv[1:],"s:f:")
CURRENT=args[0]
LOGFILE=args[1]
NEW=args[2]

starttime="20050101:000000"
finishtime="20201231:235959"
for opt,val in opts:
  print "%s  %s" % (opt,val)
  if opt=='-s':
    starttime=val
  elif opt=='-f':
    finishtime=val
  else:
    print "Unknown option %s" % (opt)
print "%s %s" % (starttime,finishtime)
starttime=strtotime(starttime,None)
finishtime=strtotime(finishtime,None)
print "%s %s" % (starttime,finishtime)

currcount=0
currlist=file(CURRENT,"r")
table={}

DECAY=15.0

currdate=currlist.readline()
currdatetime=strtotime(currdate,startnow)

timesincelast=startnow-currdatetime
dayssincelast=timesincelast.days+timesincelast.seconds/86400.0
expval=-dayssincelast/DECAY
votefactor=math.exp(expval)

print 'votefactor=%f' % (votefactor)

totalimages=0
for l in currlist.readlines():
  res=re.match(r'([^ ]+) +([0-9.]+)$',l)
  if res:
    lastvote=float(res.group(2))
    nowvote=votefactor*lastvote
    table[res.group(1)]=nowvote
  else:
    print 'bad format in %s' % (l)
  totalimages+=1
currlist.close()
datanow=datetime.datetime.now()
datatime = datanow-startnow
print "Data input took %d.%06d seconds for %d images<BR/>" % \
    (datatime.seconds,datatime.microseconds,totalimages)


data=open(LOGFILE)
pat=re.compile("(\d{4})(\d{2})(\d{2}):(\d{2})(\d{2}) ([0-9a-f\.:]+ )?(trains)?(.*)$")

Chunk defined in 5.1,5.2

20100610:143701 Note that the : in the IP field (group 6) has been added to cope with IPv6 naming conventions.
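
The effect of the extra : can be checked with a short sketch. The pattern is the one defined above, written as a raw string; the log lines in the usage note are hypothetical examples of the assumed layout (timestamp, optional address, image path), and ip_and_image is not part of the script.

```python
import re

pat = re.compile(r"(\d{4})(\d{2})(\d{2}):(\d{2})(\d{2}) "
                 r"([0-9a-f\.:]+ )?(trains)?(.*)$")

def ip_and_image(line):
    """Return (address or None, image name) for a log line, or None."""
    res = pat.match(line)
    if not res:
        return None
    ip = res.group(6)
    # Group 6 carries a trailing space when it matches at all.
    return (ip.strip() if ip else None, res.group(8))
```

A line such as "20100610:1437 2001:db8::1 trains/vic/r761" yields the address 2001:db8::1 and the image name /vic/r761. The leading slash remains because this pattern matches trains without it, which is why the script later strips the first character of the name.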

"ranking.py" 5.2 =

notserved=re.compile(".*\*\*\* [^*]* \*\*\*")

logcount=0
for l in data.readlines():
  logcount+=1
  res=notserved.match(l)
  if res:
    continue
  res=pat.match(l)
  if res:
    <ranking: parse and process a ranking entry 5.3>
  pass
ranknow=datetime.datetime.now()
ranktime = ranknow-datanow
print "Input analysis took %d.%06d seconds for %d entries<BR/>" % \
  (ranktime.seconds,ranktime.microseconds,logcount)

list=[]

for key in sorted(table.keys()):
  list.append((table[key],key))
sortlist=sorted(list,reverse=True)

<ranking: build new ranking list 5.4>
<ranking: print closing summary 5.5>

sys.exit(0)

Chunk defined in 5.1,5.2

<ranking: parse and process a ranking entry 5.3> =

year=int(res.group(1))
month=int(res.group(2))
day=int(res.group(3))
hour=int(res.group(4))
minute=int(res.group(5))
accesstime = datetime.datetime(year,month,day,hour,minute)
#print "%s %s %s" % (accesstime,starttime,finishtime)
if accesstime < starttime or accesstime > finishtime:
  print "ignoring %s" % (l)
  continue
timesinceaccess=startnow-accesstime
dayssinceaccess=timesinceaccess.days+timesinceaccess.seconds/86400.0
expval=-dayssinceaccess/DECAY
voteval=math.exp(expval)
#print "%1.4f  %2.6f %2.5f" % (voteval,expval,dayssinceaccess)
ipadr=res.group(6)
if ipadr:
  ipadr=ipadr.strip()
imagename=res.group(8)
res=re.match('(.*)(/[^/.]+/\.\./)(.*)$',imagename)
while res:
  imagename=res.group(1)+'/'+res.group(3) 
  res=re.match('(.*)(/[^/.]+/\.\./)(.*)$',imagename)
  print imagename
imagename=imagename[1:]
if table.has_key(imagename):
  table[imagename]+=voteval
else:
  table[imagename]=voteval

Chunk referenced in 5.2

<ranking: build new ranking list 5.4> =

currlist=file(NEW,"w")
currlist.write('%04d%02d%02d:%02d%02d%02d\n' % (startnow.year,
                                              startnow.month,
                                              startnow.day,
                                              startnow.hour,
                                              startnow.minute,
                                              startnow.second))
for (val,key) in sortlist:
  currlist.write('%s %f\n' % (key,val))
currlist.close()

Chunk referenced in 5.2

<ranking: print closing summary 5.5> =

closenow=datetime.datetime.now()
sorttime = closenow-ranknow
print "Data sorting took %d.%06d seconds<BR/>" % (sorttime.seconds,sorttime.microseconds)

Chunk referenced in 5.2

6. `rank.py`

This module provides three procedures for analysing the rankings of John's railway photographs. Ranking data is stored in two files, generically known as the trainrank and trainview files. The first stores ranking data for a set of images, as computed at a specified date and time. The second stores access requests for images in the database, together with the time and IP address of the request. Note that there is one entry for each unique image in the first file, whereas there may be multiple entries in the second file for any given image.
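
As an illustration of the trainrank layout, the sketch below parses a file of that shape from a string: a timestamp line followed by one "path score" line per image, as written by ranking.py. The function name parse_trainrank and the image paths in the usage note are hypothetical.

```python
import datetime
import re

datepat = re.compile(r"(\d{4})(\d{2})(\d{2}):(\d{2})(\d{2})(\d{2})")
linepat = re.compile(r"([^ ]+) +([0-9.]+)$")

def parse_trainrank(text):
    """Return (creation time, {image path: score}) for trainrank content."""
    lines = text.splitlines()
    # The first line holds the creation time as YYYYMMDD:HHMMSS.
    fields = [int(g) for g in datepat.match(lines[0]).groups()]
    stamp = datetime.datetime(*fields)
    scores = {}
    for l in lines[1:]:
        res = linepat.match(l)
        if res:
            scores[res.group(1)] = float(res.group(2))
    return stamp, scores
```

Given "20100610:143701" followed by the lines "vic/steam/r761 0.523100" and "nsw/3801 0.000120", the result is the timestamp 2010-06-10 14:37:01 and a two-entry table of scores.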

"rank.py" 6.1 =

import re,string,sys,datetime
import cgi
import getopt
import math
import os
import time
import urllib

startnow=datetime.datetime.now()

datepat=re.compile(r'(\d{4})(\d{2})(\d{2}):(\d{2})(\d{2})(\d{2})')
pat=re.compile("(\d{4})(\d{2})(\d{2}):(\d{2})(\d{2}) ([0-9a-f\.:]+ )?(trains/)?(.*)$")
DECAY=15.0

<rank: define the strtotime function 6.2>    
<rank: define rankdata procedure 6.3>
<rank: define ranklog procedure 6.4>

6.1 rank: define the strtotime function

<rank: define the strtotime function 6.2> =

def strtotime(str,default):
  res=datepat.match(str)
  if res:
    thisdatetime=datetime.datetime(int(res.group(1)), # year
                                   int(res.group(2)), # month
                                   int(res.group(3)), # day
                                   int(res.group(4)), # hour
                                   int(res.group(5)), # minute
                                   int(res.group(6))) # second
    return thisdatetime
  else:
    return default

Chunk referenced in 6.1

6.2 rank: define rankdata procedure

The rankdata procedure reads the train rank file (the path to which is passed as the parameter), which contains the date and time when the file was created, together with a line for every image in the database. Each line contains the image path, starting at the trains subdirectory, together with the (decayed) vote value. Each vote is decayed by an exponential factor votefactor, where the exponent is proportional to the length of time since the creation date of the file. The updated vote value is entered into an associative table, which is returned along with votefactor and the housekeeping values totalimages (the total number of distinct images processed) and datatime (the elapsed wall time spent in processing this file).

<rank: define rankdata procedure 6.3> =

def rankdata(CURRENT):
  datastart=datetime.datetime.now()
  totalimages=0
  table={}
  currlist=file(CURRENT,"r")
  currdate=currlist.readline()
  currdatetime=strtotime(currdate,startnow)
  timesincelast=startnow-currdatetime
  dayssincelast=timesincelast.days+timesincelast.seconds/86400.0
  expval=-dayssincelast/DECAY
  votefactor=math.exp(expval)
  for l in currlist.readlines():
    res=re.match(r'([^ ]+) +([0-9.]+)$',l)
    if res:
      lastvote=float(res.group(2))
      nowvote=votefactor*lastvote
      table[res.group(1)]=nowvote
    else:
      print 'bad format in %s' % (l)
    totalimages+=1
  currlist.close()
  datanow=datetime.datetime.now()
  datatime = datanow-datastart
  return totalimages, datatime, votefactor, table

Chunk referenced in 6.1
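
The decay factor on its own can be sketched as follows, using the same DECAY constant and the same conversion of a timedelta to fractional days as rankdata. The function name vote_factor is hypothetical.

```python
import datetime
import math

DECAY = 15.0  # days, as in rank.py

def vote_factor(then, now, decay=DECAY):
    """Exponential decay factor for the time elapsed between two datetimes."""
    elapsed = now - then
    days = elapsed.days + elapsed.seconds / 86400.0
    return math.exp(-days / decay)
```

With DECAY set to 15 days, a score falls to 1/e (about 0.368) of its value after 15 days, and halves roughly every 10.4 days (15 × ln 2).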

6.3 rank: define ranklog procedure

<rank: define ranklog procedure 6.4> =

def ranklog(LOGFILE,table,ignorepat,
            starttime="20050101:000000",finishtime="20201231:235959"):
  logstart=datetime.datetime.now()
  data=open(LOGFILE)  
  logcount=0  
  starttime=strtotime(starttime,None)
  finishtime=strtotime(finishtime,None)
  for l in data.readlines():
    logcount+=1
    l=l.strip()
    res=ignorepat.match(l)
    if res:
      continue
    res=pat.match(l)
    if res:
      year=int(res.group(1))
      month=int(res.group(2))
      day=int(res.group(3))
      hour=int(res.group(4))
      minute=int(res.group(5))
      accesstime = datetime.datetime(year,month,day,hour,minute)
      #print "%s %s %s" % (accesstime,starttime,finishtime)
      if accesstime < starttime or accesstime > finishtime:
        #print "ignoring %s" % (l)
        continue
      timesinceaccess=startnow-accesstime
      dayssinceaccess=timesinceaccess.days+timesinceaccess.seconds/86400.0
      expval=-dayssinceaccess/DECAY
      voteval=math.exp(expval)
      #print "%1.4f  %2.6f %2.5f" % (voteval,expval,dayssinceaccess)
      ipadr=res.group(6)
      if ipadr:
        ipadr=ipadr.strip()
      imagename=res.group(8)
      if table.has_key(imagename):
        table[imagename]+=voteval
      else:
        table[imagename]=voteval
    pass
  ranknow=datetime.datetime.now()
  ranktime = ranknow-logstart
  list=[]  
  for key in sorted(table.keys()):
    list.append((table[key],key))
  sortlist=sorted(list,reverse=True)
  totalimages = len(sortlist)
  closenow=datetime.datetime.now()
  sorttime = closenow-ranknow
  return totalimages, logcount, ranktime, sorttime, sortlist

Chunk referenced in 6.1
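
The core of ranklog, stripped of file handling and timing, is an accumulation of decayed votes followed by a sort. A minimal sketch under the assumption that each access has already been reduced to an (image, age in days) pair; the function name accumulate is hypothetical.

```python
import math

DECAY = 15.0  # days, as in rank.py

def accumulate(table, accesses, decay=DECAY):
    """Add one decayed vote per access to table and return a sorted list."""
    for image, age in accesses:
        table[image] = table.get(image, 0.0) + math.exp(-age / decay)
    # (score, key) tuples sort by score first, then by key.
    return sorted(((v, k) for k, v in table.items()), reverse=True)
```

Because the table passed in already holds the decayed scores from rankdata, recent accesses are added on top of the older totals, and an access made today contributes a full vote of 1.0.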

7. TO DO

Make link of "rank n" to anchor point in ranktrains.py. First need to add anchors to ranktrains.py to make this work.

8. Indices

8.1 Identifier Index

Identifier	Defined in	Used in
BASEPAGE	1.2	1.3, 2.3, 2.20, 2.21, 2.23, 3.7
BASEPAGE	1.3	1.3, 2.3, 2.20, 2.21, 2.23, 3.7
BASEPAGE	1.4	1.3, 2.3, 2.20, 2.21, 2.23, 3.7
RANKINGS	2.8
VIEWINGS	2.8
datatime	6.3
dir	2.26	2.26, 2.26
display	2.21	2.1, 2.26
imageisrelative	2.6	2.1
logentrypat	2.17	2.19
logentrypat	3.5	2.19
rankdata	6.3
ranklog	6.4	3.5
strtotime	6.2	5.1
table	6.3
top	2.3	2.1, 2.21, 2.21
totalimages	6.3
visit	2.26	2.1, 2.26
votefactor	6.3

8.2 Chunk Index

Chunk Name	Defined in	Used in
Explain missing images	2.24	2.23
Print ranking information	2.25	2.21
collect cgi parameters	2.6	2.1
collect previous rankings	2.8	2.1
define constants for viewtrains	2.3, 2.7	2.1
define function convertIPtoHex	2.15	2.5
define getrank routine	2.18	2.17
define procedure log	2.16	2.5
define procedure to display the image	2.21	2.5
define procedure to search directories	2.26	2.5
define regular expression patterns	2.4	2.1
define subroutines	2.5	2.1
define the XML dispatch routines	2.20	2.5
define the ranking procedure	2.17	2.5
determine server environment	1.5	1.1
display can access file	2.22	2.21
display cannot access file	2.23	2.21
getrank: extract data from a single logfile entry	2.19	2.18
globals for linux	1.4	1.5
globals for macosx	1.2	1.5
globals for solaris	1.3	1.5
imports	2.2	2.1, 4.1
print header of html page	2.9, 2.10, 2.11, 2.12, 2.13	2.1, 3.1
print trailer of html page	2.14	2.1, 3.1
rank: define rankdata procedure	6.3	6.1
rank: define ranklog procedure	6.4	6.1
rank: define the strtotime function	6.2	6.1
ranking: build new ranking list	5.4	5.2
ranking: parse and process a ranking entry	5.3	5.2
ranking: print closing summary	5.5	5.2
rankings: print forward and backward buttons	3.6	3.1, 3.1
ranktrains: collect cgi parameters	3.3	3.1
ranktrains: collect previous ranking information	3.4	3.1
ranktrains: define constants	3.2	3.1
ranktrains: generate image rankings	3.7	3.1
ranktrains: print ranking analysis	3.9	3.1
ranktrains: print rankings table	3.8	3.1
ranktrains: update rankings with latest log info	3.5	3.1

8.3 File Index

File Name	Defined in
rank.py	6.1
ranking.py	5.1, 5.2
ranktrains.py	3.1
tidytrains.py	4.1
viewtrains.py	2.1
