Automated Copying and Fixity Check
Testimonial: Eddy Colloton
Type: Software Workflow
Kieran O’Leary, Data and Digital Systems Manager at the Irish Film Institute, maintains an invaluable suite of tools written in Python in his IFIscripts GitHub repo. O’Leary’s work was recognized by the Digital Preservation Coalition in 2018. One great script from this suite is copyit.py. The script requires Python; for Mac users, I recommend installing Python 3 by following the Homebrew instructions here: https://www.saintlad.com/install-python-3-on-mac/
The script automates the copying of a directory and its contents, with fixity checks both before and after the copy. Running the script on a set of files produces md5 checksum manifests for the source and the destination. Developer Archivist Joanna White wrote up a great summary of this tool on her blog, which unpacks the tool’s functionality as well as how it fits into her workflow. To use copyit.py, follow the steps below:
- Download IFIscripts from O’Leary’s GitHub (about 2 MB).
- Navigate to the IFIscripts directory via the command line:
- cd [~/path/to/IFIscripts/]
- Run copyit.py using python, specifying the source of the files, and their destination:
- python3 copyit.py [path_to_files/] [path_to_destination/]
- If you receive the error message "ModuleNotFoundError: No module named 'lxml'", install the lxml module like so:
- sudo pip3 install lxml
- Once running, the script will first check for write access and, if it is available, delete any hidden files such as .DS_Store or Desktop.ini, then create a checksum manifest using Python’s hashlib module.
- Before copying the files, the script checks that adequate space is available on the destination volume.
- After the files are copied (using either robocopy on Windows, cp on Linux, or rsync on macOS) the script will create a second md5 checksum manifest, and verify that it matches the original manifest.
- The script will create two directories on the Desktop, “ifiscripts_logs” and “moveit_manifests.” The “ifiscripts_logs” directory will contain a text file with a .log extension, detailing the copyit.py process and noting any errors, while the “moveit_manifests” directory will store text files with a .md5 extension, containing the md5 checksum manifests.
- When writing to LTO, use the “-lto” flag, which makes the script copy with the gcp tool rather than rsync. White found gcp to be much faster when writing to LTO:
- python3 copyit.py -lto [path_to_files/] [path_to_LTO/]
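To make the sequence described in the steps above more concrete (check free space, build a source manifest, copy, build a destination manifest, compare), here is a minimal sketch in Python. It is not the code from copyit.py: the function names (md5_of_file, build_manifest, has_enough_space, copy_and_verify) are hypothetical, the manifest is kept in memory as a dictionary rather than written to a .md5 file, and shutil.copytree stands in for the rsync, robocopy, or cp call that the real script makes.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex md5 digest of one file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its md5 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): md5_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def has_enough_space(source, destination_volume):
    """Check that the destination volume can hold every file under source."""
    needed = sum(p.stat().st_size for p in Path(source).rglob("*") if p.is_file())
    return shutil.disk_usage(destination_volume).free >= needed


def copy_and_verify(source, destination):
    """Copy source to a new destination directory, then compare md5 manifests.

    Assumption: destination does not exist yet but its parent directory does.
    """
    source, destination = Path(source), Path(destination)
    if not has_enough_space(source, destination.parent):
        raise OSError("not enough free space on the destination volume")
    before = build_manifest(source)        # fixity check prior to the copy
    shutil.copytree(source, destination)   # stand-in for rsync/robocopy/cp
    after = build_manifest(destination)    # fixity check after the copy
    return before == after
```

Calling copy_and_verify("path_to_files", "path_to_destination/files") returns True only when every file in the source appears in the destination with an identical md5 checksum. Comparing whole manifests, rather than individual files, also catches files that went missing or were added during the copy.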