Global Synchronization of Files

Version 2.1.4

John Hurst

Table of Contents

1 Abstract
2 Introduction
3 More Detailed Description
4 Description of The Environment
5 Program Specification
6 Design Document
7 User Manual
8 Literate Data
9 Literate Code
9.1 The blat module
9.2 blat: define the main routine
9.3 blat: main cli interpretation
9.4 blat: define the blat class
9.4.1 blat: class: define the commandMachine method
9.4.2 blat: class: define the loadState method
9.4.3 blat: class: define the saveState method
9.4.4 blat: class: define the checkMachines method
9.4.5 blat: class: define the blatFile method
9.4.6 blat: class: define the blatDirectory method
9.4.7 blat: class: define the updateToDos method
10 Makefile
11 Literate Tests
12 Bibliography
13 Glossary
14 Indices
14.1 Chunk Names
14.2 Identifiers
15 Maintenance History
15.1 Current status of this document:
15.2 Previous status of this document


1. Abstract

This document describes a Python suite to synchronize files across multiple machines, known by their domain names.

2. Introduction

The blat.py program is written as a module, containing a single class, blat. This class provides methods to perform all the necessary operations in copying files around the network (called "blatting" the file). When called as a standalone program, blat provides a simple interface for copying a list of files or directories.

3. More Detailed Description

Working with multiple machines raises the question of how one maintains consistency of working files across the multiple platforms. I have previously used various scripts with rsync as the key ingredient, but these have all been limited to just single peer-to-peer transactions. What I was looking for is a script that can operate on any machine in the network, and which will bring all copies anywhere on the network up-to-date.

There are some limitations to this. Obviously, the machines need to be available. If a machine is off-line, shut down, or otherwise unavailable, it cannot participate in the synchronize operation. So one of the first tasks of the script must be to ascertain what machines are available. My first pass at this required a separate program (online.py) to check all machines, so that this did not need to be done every time a file was synchronized across the network.

A second phase, synchronize.py, was used to actually perform the synchronization. An optional parameter to this program could be used to force a rescan of the network, but usually, once this had been done, subsequent invocations did not need it, as machines did not change their status that quickly.

But then I realized that it did not need to be so decoupled. As attempts were made to get file status, and to copy files, information on the current machine states could be updated. This would only need to be done as required, rather than attempting to update all machines at once. Accordingly, the two programs synchronize.py and online.py were merged into a single piece of code, blat.py, and the whole suite was rewritten in Python 3 (currently 3.3 compatible, although developed under 3.5).

4. Description of The Environment

The program is intended to run under Unix, in all its various flavours, including Darwin (the MacOSX version of Unix). (It has not been tested on all variants of Unix, however.)

5. Program Specification

Not yet written.

6. Design Document

Not yet written.

7. User Manual

The blat module can be imported, or used as a standalone program.

As a standalone program, it provides a simple interface to the facilities of the module. Here is the usage outline:

        blat [-V|--version] [-r|--reload] [-v|--verbose] {file/directory}*
      
-V|--version
This option prints the current version then exits.
-r|--reload
This option forces a rescan of the network, probing every machine in the state table, before anything else is done.
-v|--verbose
This option prints additional progress information, including a list of the files copied.
file|directory
This is a list of files and/or directories for copying across the network, where the current state of the network is as defined by the state table.

As an imported module, it provides a blat class. Instances of the blat class have the following methods:

probeMachine(m)
Check if machine m is alive or not. Return True or False as appropriate.
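commandMachine(m,c)
Run the command c (a list of tokens) on machine m, using ssh unless m is the local machine. Return the pair (return code, output), and update the state of m in the state table as a side effect.
loadState(filename='/home/ajh/etc/machine.txt')
Load the current state tables from a file.
saveState(filename='/home/ajh/etc/machine.txt')
Save the current state tables to a file.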
checkMachines()
Probe all known machines to confirm their current state.
blatFile(f)
Write the file f to all currently active machines.
blatDirectory(d,r)
Write the contents of directory d to all currently active machines. If r is True, perform this action recursively.
catchupBlatup(m)
Not yet implemented.

8. Literate Data

The file generated by the blat.py module contains lines of data indicating the machine, its IP address, the operating system used, and its current state, together with a possibly empty list of aliases used by the machine.

The possible states of a machine are

online
The machine is on-line and available.
outofuse
The machine may or may not be on-line, but we are not interested in its current state. This may happen if we know the machine is out of action, and there is no point in testing it, or where the particular updates being performed are not relevant to this particular machine.
offline
The machine is not responding to ssh requests.
unknown
The state of the machine is unknown. This is the default state for all machines until established otherwise.
Note that when the machine.txt file is first loaded, the machine state is taken as presumptive, meaning that if it is online, it is assumed to still be so, unless an attempt to talk to it establishes otherwise.

"machine.txt" 8.1 =
# machine.txt file format
#machine,IPaddress,OS,state,[(alias,aliasIP),...]
spencer,10.0.0.120,Linux,online,[(ajh.co,101.167.106.147),(ajh.id.au,101.167.106.147)]
dimboola,10.0.0.110,Darwin,offline
hamilton,10.0.0.118,Darwin,unknown
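As an illustration of the file format above, here is a minimal sketch of a parser for a single state line. It is not the code used by blat.py: the names ENTRY, ALIAS and parse_entry are hypothetical, and unlike the loadState method (which keeps the alias list as raw text) this sketch splits the aliases into (alias, aliasIP) pairs.

```python
import re

# Four comma-separated fields, followed by an optional bracketed alias list.
ENTRY = re.compile(r'([^,]*),([^,]*),([^,]*),([^,]*)(?:,\[([^]]*)\])?$')
# One (alias,aliasIP) pair inside the bracketed list.
ALIAS = re.compile(r'\(([^,()]*),([^,()]*)\)')

def parse_entry(line):
    """Return (machine, ipadr, system, state, aliases), or None for
    blank lines and comment lines."""
    line = line.strip()
    if not line or line.startswith('#'):
        return None
    res = ENTRY.match(line)
    if res is None:
        raise ValueError("cannot parse machine state entry {!r}".format(line))
    machine, ipadr, system, state, aliastext = res.groups()
    aliases = ALIAS.findall(aliastext) if aliastext else []
    return (machine, ipadr, system, state, aliases)

sample = ("spencer,10.0.0.120,Linux,online,"
          "[(ajh.co,101.167.106.147),(ajh.id.au,101.167.106.147)]")
entry = parse_entry(sample)
print(entry)
```

Splitting the aliases into pairs makes exact alias lookups possible, at the cost of needing to reassemble the bracketed text when the table is written back.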

9. Literate Code

9.1 The blat module

"blat.py" 9.1 =
#!/usr/bin/python3
#
# A python 3 program to blat copies of files and directories across the network
#
#module to provide blat facilities

import argparse
import datetime
import os,os.path
import re
import socket
import subprocess
import sys

thismachine=socket.gethostname()
if thismachine=='MU00087507X':
    thismachine='hamilton.local'

verbose=False
debug=False
version="<current version 15.1>"

MissedFilesLog='/home/ajh/logs/blat.log'

def toString(fObj):
    rtn=''
    for line in fObj:
        rtn+=line
    return rtn

<blat: define the blat class 9.4>

<blat: define the main routine 9.2>

<blat: main cli interpretation 9.3>

We need to check upon which machine this code is running. Hence the call to socket.gethostname. However, this call returns the wrong machine name for hamilton (because of Monash University policy), and hence the correction applied.

9.2 blat: define the main routine

<blat: define the main routine 9.2> =
def main(blatter,path):
    if verbose: print(path)
    for f in path:
        f=os.path.abspath(f)
        if os.path.isdir(f):
            blatter.blatDirectory(f,True)
        else:
            blatter.blatFile(f)
Chunk referenced in 9.1

9.3 blat: main cli interpretation

<blat: main cli interpretation 9.3> =
if __name__ == '__main__':
    skipMachines=[]
    parser=argparse.ArgumentParser(description='Blat files and directories around network')
    parser.add_argument('-V','--version',action='version',version=version)
    parser.add_argument('-r','--reload',action='store_true')
    parser.add_argument('-v','--verbose',action='store_true')
    parser.add_argument('path',nargs='*')
    args=parser.parse_args()
    verbose=args.verbose
    if verbose: print(args)

    blatter=blat()
    curState=blatter.loadState()
    if args.reload:
        machines=blatter.checkMachines()

    main(blatter,args.path)

    if verbose:
        print("\n\nList of files copied:")
        for (t,m,f) in blatter.copied:
            print(t,m,f)

    blatter.saveState()
Chunk referenced in 9.1

This is my first attempt at using argparse, and I have to say, it was not intuitively obvious how to use it. But I have sort-of cracked it. We first build a parser for the cli arguments, and describe it appropriately. This description is used in the help menu (blat -h) to remind the user what the program is for.

Then we add the various cli arguments that can be used:

-V --version
Print the current version number and exit.
-r --reload
Probe every machine defined in the state table to determine its current status, before performing any other action.
path[s]
Remaining parameters are assumed to be a list of files or directories for 'blatting' (copying) across the network, according to what machines are accessible.

Then we can parse the arguments/parameters, creating a namespace of parameter values inside the object args. If the -V argument is given, argparse prints the version number and exits. The -v (--verbose) flag, also defined in the code, turns on verbose output.
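The behaviour of the parser can be tried out in isolation by handing parse_args an explicit argument list rather than letting it read sys.argv. The sketch below repeats the argument definitions from the chunk above; the function name build_parser and the sample file names are only illustrative.

```python
import argparse

def build_parser(version="2.1.4"):
    """Build a parser with the same arguments as the blat command line."""
    parser = argparse.ArgumentParser(
        prog='blat',
        description='Blat files and directories around network')
    parser.add_argument('-V', '--version', action='version', version=version)
    parser.add_argument('-r', '--reload', action='store_true')
    parser.add_argument('-v', '--verbose', action='store_true')
    parser.add_argument('path', nargs='*')
    return parser

# Parse an explicit argument list (hypothetical file names).
args = build_parser().parse_args(['-r', 'notes.txt', 'src'])
print(args.reload, args.verbose, args.path)
```

Because the version action exits the program as soon as it is seen, any paths given alongside -V are never processed.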

Make a blatter object and load its state. If the reload flag is given, run through every machine in the state table and check its availability. Note that machines whose state is given as outofuse are ignored.

Then we are ready to run the main routine, and following the successful completion of this, we save the current state table, and exit the program.

9.4 blat: define the blat class

<blat: define the blat class 9.4> =
class blat():

    def __init__(self):
        self.machines={} # indexed by machine
        self.copied=[]   # list of tuples describing files copied

    <blat: class: define the commandMachine method 9.5>

    def probeMachine(self,m):
        '''
        Run a dummy ssh call to see if machine is alive
        '''
        (r,s)=self.commandMachine(m,['echo','"OK"'])
        return r==0

    def __str__(self):
        retstr=''
        for k in self.machines:
            (machine,ipadr,system,state,aliases)=self.machines[k]
            retstr+="machine {} is at {}\n".format(machine,ipadr)
            retstr+="   running {} and state {}\n".format(system,state)
            if aliases:
                retstr+="   aliases are {}\n".format(aliases)
        return retstr

    <blat: class: define the loadState method 9.7>
    <blat: class: define the saveState method 9.8>
    <blat: class: define the checkMachines method 9.9>
    <blat: class: define the blatFile method 9.10>
    <blat: class: define the blatDirectory method 9.16>
    <blat: class: define the updateToDos method 9.17>

    def catchupBlatup(self,m):
        '''
        m is a machine to check if now alive.  Update list of
        machines depending on outcome.  If alive, check log, and blat
        all files mentioned there.  Delete log entry if successful
        blatting.
        '''
        # not yet implemented (see the User Manual)
        raise NotImplementedError("catchupBlatup is not yet implemented")
Chunk referenced in 9.1

This class provides all the methods needed to blat files and directories across the network. A key data structure is the state table, blat.machines (the file format for which is defined in section Literate Data).

9.4.1 blat: class: define the commandMachine method

<blat: class: define the commandMachine method 9.5> =
def commandMachine(self,m,cmd):
    '''
    Run a real ssh call to perform something.
    Update machine state if different from current.
    cmd is a list of tokens to form command
    '''
    # don't need ssh if local machine
    if m==thismachine:
        perform=cmd
    else:
        perform=['ssh','-o','ConnectTimeOut=3',m]+cmd
    p=subprocess.Popen(perform,
                       universal_newlines=True,
                       stdout=subprocess.PIPE,
                       stderr=subprocess.PIPE)
    errors=toString(p.stderr)
    outstr=toString(p.stdout)
    returncode=p.wait()
    #print(returncode,outstr)
    if returncode==0:
        <blat: class: commandMachine: command performed OK 9.6>
    else:
        # returncode!=0, but why?
        # check no connection failures (ssh reports these on stderr)
        if "No route to host" in errors or 'timed out' in errors:
            # only update table if no connection
            if m in self.machines:
                (machine,ipadr,system,state,aliases)=self.machines[m]
                state='offline'
                self.machines[m]=(machine,ipadr,system,state,aliases)
        # otherwise just return the non-zero return code, with err msg
        return (returncode,errors)
    return (returncode,outstr)
Chunk referenced in 9.4

This method forms the heart of all intermachine communication. It provides a mechanism to run a command on a remote machine. Either this command succeeds (when we know that the machine in question is alive, and we can update its status), or it fails, when we should also check that the status reflects the reason why it failed.

The key component in this method is the call to the subprocess method Popen. I experimented with other methods in subprocess (run, 3.5; and call, 3.3), before settling upon Popen, as this is the most generic of all, and runs on earlier versions of Python 3. But it is also more clunky, and requires more work on the part of the programmer.

The arguments for Popen are assembled, including the remote ssh call if a remote machine is accessed, and the parameters to Popen set up. The universal_newlines flag ensures that the standard out/err channels are returned as strings. The toString call then unpacks these file objects into a single string.

We then have two outcomes, depending on the value of the returncode. If it is non-zero, return the return code and the errors generated. These may be because the machine is not accessible, in which case update the machine state table. If it is zero, then we check that it is a known machine and update the state table if necessary. If not known, then add this machine to the state table, along with dummy information as necessary.

Finally, return the output from the system call.
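The flow described above can be sketched as three small stand-alone functions. This is an illustration rather than the blat code itself: the function names are hypothetical, and it uses Popen.communicate() to collect both output channels in place of the toString helper, since communicate() reads the two pipes together and then waits for the process.

```python
import subprocess
import sys

def build_command(machine, cmd, thismachine):
    """Prefix cmd with an ssh call unless it targets the local machine."""
    if machine == thismachine:
        return list(cmd)
    return ['ssh', '-o', 'ConnectTimeout=3', machine] + list(cmd)

def is_connection_failure(errors):
    """Heuristic: does the error text suggest the machine is unreachable?"""
    return 'No route to host' in errors or 'timed out' in errors

def run_command(perform):
    """Run the token list; return (returncode, stdout, stderr) as strings."""
    p = subprocess.Popen(perform,
                         universal_newlines=True,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    outstr, errors = p.communicate()   # reads both pipes, then waits
    return (p.returncode, outstr, errors)

# Run a harmless local command; 'here' is a made-up machine name.
local = build_command('here', [sys.executable, '-c', 'print("OK")'], 'here')
returncode, outstr, errors = run_command(local)
print(returncode, outstr.strip())
```

Separating the construction of the command from its execution means the ssh prefixing can be checked without touching the network.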

<blat: class: commandMachine: command performed OK 9.6> =
# update machine state table if necessary
if m in self.machines:
    (machine,ipadr,system,state,aliases)=self.machines[m]
    if state!='online':
        state='online'
        self.machines[m]=(machine,ipadr,system,state,aliases)
else:
    # new machine, check if in aliases
    Found=False
    for k in self.machines:
        (machine,ipadr,system,state,aliases)=self.machines[k]
        if m in aliases:
            Found=True
            break
    if not Found:
        if verbose: print("Machine {} not in state table, adding it".format(m))
        machine=m
        ipadr=''
        system=''
        state='online'
        aliases=[]
        self.machines[m]=(machine,ipadr,system,state,aliases)
Chunk referenced in 9.5

9.4.2 blat: class: define the loadState method

<blat: class: define the loadState method 9.7> =
def loadState(self,filename='/home/ajh/etc/machine.txt'):
    self.machines={}
    try:
        stateFile=open(filename,'r')
    except IOError:
        print("cannot open state file {}".format(filename))
        sys.exit(1)
    for machineEntry in stateFile.readlines():
        machineEntry=machineEntry.strip()
        #print(machineEntry)
        # skip blank lines and comment lines
        if not machineEntry or machineEntry[0]=='#': continue
        res=re.match(r'([^,]*),([^,]*),([^,]*),([^,]*)(,\[[^]]*\])?$',machineEntry)
        if not res:
            print("Cannot parse machine state entry {}".format(machineEntry))
            sys.exit(2)
        machine=res.group(1)
        ipadr=res.group(2)
        system=res.group(3)
        state=res.group(4)
        aliases=res.group(5)
        if aliases:
            # drop the leading comma, keeping the bracketed text
            aliases=aliases[1:]
        else:
            aliases=[]
        self.machines[machine]=(machine,ipadr,system,state,aliases)
    stateFile.close()
    return self.machines
Chunk referenced in 9.4

9.4.3 blat: class: define the saveState method

<blat: class: define the saveState method 9.8> =
def saveState(self,filename='/home/ajh/etc/machine.txt'):
    try:
        stateFile=open(filename,'w')
    except IOError:
        print("cannot open state file {} for writing".format(filename))
        sys.exit(1)
    keys=list(self.machines.keys())
    keys.sort()
    for key in keys:
        (machine,ipadr,system,state,aliases)=self.machines[key]
        if not ipadr and state=='online':
            # don't know IP address and can look it up!
            (r,s)=self.commandMachine(machine,['/home/ajh/bin/eth0'])
            if r==0:
                ipadr=s.strip()
        print("{},{},{},{}".format(machine,ipadr,system,state),end='',file=stateFile)
        if aliases:
            print(",{}".format(aliases),file=stateFile)
        else:
            print("",file=stateFile)
    stateFile.close()
Chunk referenced in 9.4

9.4.4 blat: class: define the checkMachines method

<blat: class: define the checkMachines method 9.9> =
def checkMachines(self):
    '''
    Probe across network to see what machines are live.
    Build a file containing current accessibility data 'machine.txt'
    This file should also contain system specific info
    '''
    for k in self.machines:
        (m,i,o,s,a)=self.machines[k]
        # machines marked outofuse are ignored, as described in section 9.3
        if s=='outofuse': continue
        test=self.probeMachine(m)
        if test:
            s='online'
        else:
            s='offline'
        self.machines[k]=(m,i,o,s,a)
    return self.machines
Chunk referenced in 9.4

9.4.5 blat: class: define the blatFile method

<blat: class: define the blatFile method 9.10> =
def blatFile(self,f):
    '''
    write file f to all live machines.
    any machines not active, log the attempt for later
    '''
    print("\nBlatting file {}".format(f))
    abspath=os.path.join(os.getcwd(),f)
    <blat: blatFile identify latest copy 9.11>
    <blat: blatFile identify machines with latest copy 9.12>
    <blat: blatFile get latest copy if we don't have it 9.13>
    <blat: blatFile copy out latest copies 9.14>
    <blat: blatFile log unsuccessful attempt 9.15>
    if verbose: print("all done")
    return
Chunk referenced in 9.4

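This method is the raison d'etre of the whole sheboodle. The work done in copying (blatting) a file consists of five steps, one for each of the chunks that follow: identifying the latest copy of the file anywhere on the network; identifying the machines that hold that latest copy; fetching the latest copy if this machine does not already have it; copying the latest version out to those machines with out-of-date copies; and logging the machines that could not be updated.

A task (yet to be implemented) is a mechanism to run through (from time to time) this list of un-updated machines, and copy the file if the machine becomes available.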
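The decision at the centre of this method, namely which machines hold the latest copy and which need updating, depends only on the table of modification times. A minimal sketch of that decision follows; the function name plan_copies and the sample timestamps are hypothetical, and the sketch leaves out the check on machine state that the real code also performs.

```python
def plan_copies(times):
    """times maps machine name -> modification time in seconds
    (0 where the file is absent or the machine did not answer).

    Returns (latest, holders, stale): the newest timestamp, the machines
    that already hold it, and the machines that need a fresh copy.
    """
    latest = max(times.values()) if times else 0
    holders = sorted(m for m, t in times.items() if t == latest)
    stale = sorted(m for m, t in times.items() if t < latest)
    return latest, holders, stale

times = {'spencer': 1487130000, 'dimboola': 0, 'hamilton': 1487130000}
latest, holders, stale = plan_copies(times)
print(latest, holders, stale)
```

If the current machine appears in stale, the file must first be fetched from one of the holders before it can be copied out to the others.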

<blat: blatFile identify latest copy 9.11> =
# first find latest copy
latest=0
times={}
for k in self.machines:
    (m,i,o,s,a)=self.machines[k]
    if s=='outofuse': continue # don't even bother!
    print("{: <20}: ".format(m),end='')
    if o=='Linux':
        cmd=['/usr/bin/stat','-c','%Y',abspath]
    elif o=='Darwin':
        cmd=['/usr/bin/stat','-f','%m',abspath]
    else:
        print("Cannot identify machine {}'s system '{}'".format(m,o))
        continue # no stat command to run, so skip this machine
    (res,msg)=self.commandMachine(m,cmd)
    if res==0:
        mtime=int(msg)
        times[m]=mtime
        print("{}".format(datetime.datetime.fromtimestamp(mtime)))
    else:
        times[m]=0
        if 'No such file' in msg:
            print('does not exist')
            s='online' # since we got a response
            self.machines[k]=(m,i,o,s,a)
        else:
            if verbose:
                print("not available, msg={}".format(msg))
            else:
                print("not available")
            # log this message for later attention
            logf=open(MissedFilesLog,'a')
            print('blat to machine {} returned {}'.format(m,msg),file=logf)
            logf.close()
#print(times)
for k in times:
    t=times[k]
    if t>latest: latest=t
# 'latest' is timestamp of latest copy
if latest==0:
    print("Could not find a recent copy of {}. Are you sure it exists?".format(abspath))
Chunk referenced in 9.10
<blat: blatFile identify machines with latest copy 9.12> =
# now find machines with latest copy
machines=[]
for tk in times.keys():
    if times[tk]==latest:
        machines.append(tk)
# 'machines' is list of machines with latest copy
lateststr=datetime.datetime.fromtimestamp(latest)
if verbose:
    print("\nlatest copy is at {} on machine(s) {}".format(lateststr,machines))
    print("we are on %s" % (thismachine))
Chunk referenced in 9.10
<blat: blatFile get latest copy if we don't have it 9.13> =
# make sure we have latest copy
if thismachine not in times:
    print("This machine ({}) not in machine.txt".format(thismachine))
    sys.exit(1)
if thismachine not in machines and times[thismachine]<latest:
    print("getting latest copy from %s to here ..." % (machines[0]))
    # do xfer of f from machine to this machine
    cmd=['/usr/bin/rsync','-auv','%s:%s' % (machines[0],abspath), abspath]
    if verbose: print(' '.join(cmd),' =>', )
    stat=subprocess.check_output(cmd,universal_newlines=True,stderr=subprocess.PIPE)
    if verbose: print('OK')
    times[thismachine]=latest
Chunk referenced in 9.10
<blat: blatFile copy out latest copies 9.14> =
# now copy the latest version to those machines with out-of-date copies
print("")
for machine in times.keys():
    (m,i,o,s,a)=self.machines[machine]
    if s!='online':
        # must be up-to-date, since it has a times entry
        # but it is not online, so skip
        if verbose: print("Skipping machine {} from copy".format(m))
        continue
    t=times[machine]
    if t<latest:
        print("copying to machine {: <20} ...".format(machine),end='')
        # do xfer of f from this machine to machine
        # these xfers are dodgy, depending upon several things, so do try
        try:
            # first ensure that target directory exists
            d=os.path.dirname(abspath)
            cmd=['mkdir','-p',d]
            (res,msg)=self.commandMachine(m,cmd)
            if debug: print("mkdir returns:{},{}".format(res,msg))
            if res!=0:
                # log all errors for later analysis
                logf=open(MissedFilesLog,'a')
                print("machine {} mkdir returned msg {}".format(m,msg),file=logf)
                logf.close()
            # now do the real copy
            cmd=['/usr/bin/rsync','-auv',abspath,'%s:%s' % (machine,abspath)]
            #print(' '.join(cmd),' =>', )
            stat=subprocess.check_output(cmd,universal_newlines=True,stderr=subprocess.PIPE)
            #(res,msg)=self.commandMachine(m,cmd)
            #print("rsync returns:",stat)
            print('OK')
            copy=(lateststr,m,abspath)
            self.copied.append(copy)
        except subprocess.CalledProcessError as err:
            print("Oops!")
            print("Could not copy %s to machine %s, reason:" % (abspath,machine))
            print(err)
Chunk referenced in 9.10
<blat: blatFile log unsuccessful attempt 9.15> =
for k in self.machines:
    (m,i,o,s,a)=self.machines[k]
    if s!='online':
        # could not/did not copy file to this machine
        synchFails=open(MissedFilesLog,'a')
        synchFails.write('{0} {1} {2}\n'.format(lateststr,m,abspath))
        #print("writing {0} {1} {2} to log".format(lateststr,m,abspath))
        synchFails.close()
Chunk referenced in 9.10

9.4.6 blat: class: define the blatDirectory method

<blat: class: define the blatDirectory method 9.16> =
def blatDirectory(self,d,r):
    '''
    write all files in d to all live machines.
    If r True, recursively visit nested directories
    '''
    if verbose: print("\nBlatting directory {}".format(d))
    # note: os.scandir requires Python 3.5 or later
    dirlist=os.scandir(d)
    for direntry in dirlist:
        relpath=direntry.path
        if direntry.is_file():
            self.blatFile(relpath)
        elif r and direntry.is_dir():
            self.blatDirectory(relpath,r)
    return
Chunk referenced in 9.4
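Since os.scandir only became available in Python 3.5, a version of the directory traversal that also runs under Python 3.3 could be built on os.walk instead. The sketch below is an alternative technique and not the method used in blat.py; the function name files_under is hypothetical, and it merely collects the paths that would be handed to blatFile.

```python
import os
import tempfile

def files_under(d, recursive=True):
    """Return sorted paths of the non-directory entries in d
    (and in its subdirectories, if recursive is True)."""
    found = []
    for root, dirs, files in os.walk(d):
        found.extend(os.path.join(root, name) for name in files)
        if not recursive:
            break   # the first iteration covers only d itself
    return sorted(found)

# Try it on a throwaway directory tree.
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, 'sub'))
    for rel in ('a.txt', os.path.join('sub', 'b.txt')):
        with open(os.path.join(top, rel), 'w') as fh:
            fh.write('x')
    shallow = [os.path.relpath(p, top) for p in files_under(top, False)]
    deep = [os.path.relpath(p, top) for p in files_under(top, True)]
    print(shallow, deep)
```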

9.4.7 blat: class: define the updateToDos method

<blat: class: define the updateToDos method 9.17> =
def updateToDos(self):
    datepat=r'(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})'
    machinepat=r'([\w\.]+)'
    filenamepat=r'([\w/\.]+)'
    pat=re.compile(datepat+' +'+machinepat+' +'+filenamepat)
    updates={}
    f=open(MissedFilesLog,'r')
    for l in f.readlines():
        l=l.strip()
        #print(l)
        res=re.match(pat,l)
        if res:
            year=int(res.group(1))
            month=int(res.group(2))
            day=int(res.group(3))
            hour=int(res.group(4))
            minute=int(res.group(5))
            sec=int(res.group(6))
            thisdt=datetime.datetime(year,month,day,hour,minute,sec)
            machine=res.group(7)
            filename=res.group(8)
            #print(thisdt,machine,filename)
            if not machine in updates:
                updates[machine]=[]
            if (thisdt,filename) not in updates[machine]:
                updates[machine].append((thisdt,filename))
        else:
            print("could not match {}".format(l))
    f.close()
    # now sort lists for each machine
    for m in updates:
        l=updates[m]
        l.sort()
        updates[m]=l
    return updates

def updateFileRequests(self,updates):
    for mac in updates:
        if mac in self.machines:
            (m,i,o,s,a)=self.machines[mac]
            if s=='online':
                newlist=[]
                for (d,f) in updates[mac]:
                    print("copy file {} to machine {}".format(f,mac))
                updates[mac]=[]
            else:
                print("Machine {} still offline".format(mac))
    return updates
Chunk referenced in 9.4
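To make the log format concrete, here is a small sketch that applies the same pattern to a handful of sample lines, in the "date time machine filename" form written by the logging chunk 9.15. The function name pending_updates and the sample paths are hypothetical. Unlike updateToDos, it takes the lines as a parameter and silently skips those that do not match.

```python
import datetime
import re

# date, time, machine name and file name, separated by spaces
LOGLINE = re.compile(r'(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})'
                     r' +([\w.]+) +([\w/.]+)')

def pending_updates(lines):
    """Group (timestamp, filename) pairs by machine, without duplicates,
    each list sorted oldest first."""
    updates = {}
    for line in lines:
        res = LOGLINE.match(line.strip())
        if not res:
            continue   # not a missed-file entry
        stamp = datetime.datetime(*(int(g) for g in res.groups()[:6]))
        machine, filename = res.group(7), res.group(8)
        entry = (stamp, filename)
        if entry not in updates.setdefault(machine, []):
            updates[machine].append(entry)
    for machine in updates:
        updates[machine].sort()
    return updates

log = [
    "2017-02-15 14:59:50 dimboola /home/ajh/notes.txt",
    "2017-02-11 16:44:10 dimboola /home/ajh/todo.txt",
    "2017-02-15 14:59:50 dimboola /home/ajh/notes.txt",
    "blat to machine dimboola returned ssh: connect timed out",
]
updates = pending_updates(log)
for stamp, filename in updates['dimboola']:
    print(stamp, filename)
```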

10. Makefile

This uses my standard literate makefile structure, and relies heavily upon the generic package MakeXLP to build all the tangled and woven parts.

The default=blat.tangle defines the base (dummy) target file of the literate makes. Note that the Makefile (independently constructed) includes both this Makefile-blat, and the separately generated Makefile-backup (q.v.).

"Makefile-blat" 10.1 =
default=blat.tangle
include ${HOME}/etc/MakeXLP

blat.py: blat.tangle

blat: blat.py blat.tangle
	cp blat.py blat
	chmod 755 blat

install-blat: blat
	cp blat ${HOME}/bin/
	cp blat.py ${HOME}/lib/python/
	touch install-blat

11. Literate Tests

I used a very simple test script. Since this code is generated using a literate program, I used cycles of the following two commands to test:

        $ make blat
        $ blat blat.xlp
      
with appropriate checking of the state table after step two.

12. Bibliography

Nothing appropriate.

13. Glossary

Nothing appropriate.

14. Indices

14.1 Chunk Names

Chunk Name                                            Defined in  Used in
blat: blatFile copy out latest copies                 9.14        9.10
blat: blatFile get latest copy if we don't have it    9.13        9.10
blat: blatFile identify latest copy                   9.11        9.10
blat: blatFile identify machines with latest copy     9.12        9.10
blat: blatFile log unsuccessful attempt               9.15        9.10
blat: class: commandMachine: command performed OK     9.6         9.5
blat: class: define the blatDirectory method          9.16        9.4
blat: class: define the blatFile method               9.10        9.4
blat: class: define the checkMachines method          9.9         9.4
blat: class: define the commandMachine method         9.5         9.4
blat: class: define the loadState method              9.7         9.4
blat: class: define the saveState method              9.8         9.4
blat: class: define the updateToDos method            9.17        9.4
blat: define the blat class                           9.4         9.1
blat: define the main routine                         9.2         9.1
blat: main cli interpretation                         9.3         9.1
current comment                                       15.3
current date                                          15.2
current version                                       15.1        9.1

14.2 Identifiers

Identifier Defined in Used in

15. Maintenance History

15.1 Current status of this document:

<current version 15.1> = 2.1.4
Chunk referenced in 9.1
<current date 15.2> = 20170215:145950
<current comment 15.3> =
further revision of non-verbose printing

15.2 Previous status of this document

<current date 15.2> ajh <current version 15.1> <current comment 15.3>
20170211:164410 ajh 2.1.3 revised non-verbose printing
20170116:145456 ajh 2.1.2 add initial update mechanisms
20170109:164708 ajh 2.1.1 fix version option in accordance with argparse semantics
20170107:131318 ajh 2.1.0 converted subprocess.run to subprocess.call, to preserve backwards compatibility through earlier versions of Python 3
20170103:175225 ajh 2.0.0 start conversion to python 3, with new module 'blat'
20160915:111941 ajh 1.2 add actual synchronization calls
20160912 ajh 1.1 add outline of synchronization process
20160910 ajh 1.0 provide file synchronization primitives
