mpi-archiver

mpi-archiver provides a standard way to create an archive file from the files in a given directory and to upload it to an archival server.

MPI Archiver

What this program does

mpi-archiver provides a standard way to create an archive file from the files in a given directory. The focus is on absolute simplicity: all the tools used are standard, freely available and very likely to still be available in 10 years. The files are packed into a single archive in tar format, which is popular and available almost everywhere.

The files are first copied to a staging area. This ensures that the files cannot change during the following steps. The staged copy is then packed and uploaded to the target system.
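The stage, pack and upload sequence can be pictured with the minimal sketch below. The function name stage_pack_upload, its arguments and the use of cp and tar are assumptions for illustration, not the code in the mpi-archiver script. The final cp only stands in for the transfer to the target system, which in a real setup would go over ssh.

```shell
# Minimal sketch of the stage -> pack -> upload flow.
# ASSUMPTION: function name, arguments and tools are illustrative only;
# the last step copies locally instead of uploading to a remote host.
set -eu

stage_pack_upload() {
    local src=$1 staging=$2 archive=$3 target=$4 name=$5

    # 1. Copy into a private staging area, so the later steps work on a
    #    frozen snapshot instead of the live project directory.
    mkdir -p "$staging/$name"
    cp -R "$src"/. "$staging/$name"/

    # 2. Pack the staged copy (not the original) into a single tar file.
    tar -cf "$archive/$name.tar" -C "$staging" "$name"

    # 3. "Upload": a local copy stands in for the transfer to the target.
    cp "$archive/$name.tar" "$target/"
}
```

Packing from the staging copy rather than the source directory is what makes the snapshot guarantee hold: whatever happens to the original files after step 1 cannot leak into the tar file.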

Along with the archive file, two other files are stored: an INDEX and a README file. The INDEX lists every file with its size, last modification date, path and SHA-1 checksum, as generated by the sha1deep program. It can later be searched (e.g. with grep) to find files.
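To illustrate what such an index contains, the sketch below builds a similar listing. It is not what mpi-archiver runs: it assumes GNU coreutils stat and sha1sum in place of sha1deep, and the tab-separated column order (size, modification time, path, checksum) is an illustrative choice, not sha1deep's output format. The function name make_index is hypothetical.

```shell
# Sketch of an INDEX-like listing: one tab-separated line per file with
# size, last modification time, relative path and SHA-1 checksum.
# ASSUMPTION: GNU stat/sha1sum stand in for sha1deep; column order is
# illustrative only.
make_index() {
    local dir=$1
    (
        cd "$dir" || exit 1
        # Relative paths keep the index independent of the staging location.
        find . -type f | sort | while IFS= read -r f; do
            size=$(stat -c %s "$f")                 # size in bytes
            mtime=$(stat -c %y "$f")                # last modification time
            sum=$(sha1sum < "$f" | cut -d' ' -f1)   # SHA-1 hex digest
            printf '%s\t%s\t%s\t%s\n' "$size" "$mtime" "$f" "$sum"
        done
    )
}
```

Because every file is one plain text line, grep on a path fragment or a checksum is enough to locate a file later.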

The README is a free-form (non-standardised) text file containing all available meta data about the archive. This should include the project name, authors, BibTeX keys, DOI numbers, an abstract, or anything else needed to properly identify a project.

These three files are given a unique name that identifies the individual archive. See "Naming" for an explanation.

What it does not do

  • Define in any way how meta data should be formatted
  • Take care of redundancy or longevity of data (this should be done at the target storage layer)
  • Maintain a database for easy search and retrieval of data (although simply keeping a local cache of the INDEX and README files and searching those is probably good enough)
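Searching such a local cache needs nothing more than grep. The sketch below assumes, hypothetically, that all cached INDEX and README files are kept below one directory; the function name search_cache is also made up for this example.

```shell
# Sketch: search a local cache of INDEX and README files.
# ASSUMPTION: all cached INDEX/README files live below one directory ($1).
search_cache() {
    local cache=$1 pattern=$2
    # -r: recurse into the cache, -i: ignore case,
    # -H: prefix each hit with its file name, which tells you the archive.
    grep -r -i -H -- "$pattern" "$cache"
}
```

For example, `search_cache ~/mpia-cache run42` (with a hypothetical cache path) prints every INDEX or README line mentioning run42, prefixed with the file it came from.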

Installation

  • Set up a staging server. It can be any server with enough space to temporarily hold the archive data.
  • Log in to your staging server
  • Install mpi-archiver somewhere in your $PATH
  • Make sure sha1deep and zsh are installed (on Debian/Ubuntu: apt-get install md5deep zsh; sha1deep is part of the md5deep package)
  • Add a configuration file in your home directory as $HOME/.mpi-archiver.conf. This file is simply sourced by the main script, so you can override any variable. See the script header for a list of variables. Usually you will want to set at least the MPIA_DEPT variable to match your department.
  • Configure ssh to use the correct keys for the servers you will use: the RZG server and the server where the user data to be archived are located.
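Because the configuration file is simply sourced, any assignment in it replaces the value the script set before. The sketch below shows that "defaults first, then source the config" pattern. The function name load_config and the default values are placeholders for this example; the real variables and defaults are listed in the script header.

```shell
# Sketch of "set defaults, then source the user's config".
# ASSUMPTION: load_config and the default values are placeholders; the real
# variable list and defaults are in the mpi-archiver script header.
load_config() {
    local conf=${1:-$HOME/.mpi-archiver.conf}

    # Defaults first ...
    MPIA_DEPT=""
    MPIA_HOST=""
    MPIA_STAGING_PATH=/path/to/staging

    # ... then the config file: any assignment in it overrides a default.
    if [ -r "$conf" ]; then
        . "$conf"
    fi
}
```

A config file containing only `MPIA_DEPT=mydept` (a hypothetical department name) changes that one variable and leaves all other defaults in place.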

Run Tests

mpi-archiver comes with unit tests. To run the unit tests simply run the command "run-tests".

Get started

Here is an example config file for trying it out locally. Put this in $HOME/.mpi-archiver.conf:

MPIA_HOST=""
MPIA_STAGING_PATH=/tmp/staging
MPIA_ARCHIVE_PATH=/tmp/archive
MPIA_PATH=/tmp/target

Do not forget to run mkdir /tmp/{staging,archive,target}. After this you should be able to make a first archive. Create a README file somewhere with a short description of your project. Then run the following command:

mpi-archiver -r README -d /some/project/of/mine

After the command has finished, you should have three new files in /tmp/target: the README, an index and the data itself in a tar archive. The project data can also be copied from another host; just prefix the path in the -d parameter with a host name, like this: -d server:/some/project/of/mine. If you set the MPIA_HOST variable in the config file, your archive will be uploaded to that host. You can also include your user name, e.g.:

MPIA_HOST="sstark@archive.rzg.mpg.de:"
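The preparation steps of the local trial above can be collected in one small function. This is a sketch: the name prepare_local_trial, its optional base-directory and config-path arguments, and the README text are assumptions for illustration. An existing config file is left untouched.

```shell
# Sketch: prepare the local trial (directories, config, README).
# ASSUMPTIONS: function name, arguments and README text are illustrative;
# with no arguments it uses /tmp and $HOME/.mpi-archiver.conf as above.
prepare_local_trial() {
    local base=${1:-/tmp} conf=${2:-$HOME/.mpi-archiver.conf}
    mkdir -p "$base/staging" "$base/archive" "$base/target"

    # Write the example config only if none exists yet.
    if [ ! -e "$conf" ]; then
        cat > "$conf" <<EOF
MPIA_HOST=""
MPIA_STAGING_PATH=$base/staging
MPIA_ARCHIVE_PATH=$base/archive
MPIA_PATH=$base/target
EOF
    fi

    # A minimal README; replace with real project meta data.
    printf 'Project: my test project\n' > "$base/README"
}
```

After running it, the trial archive can be created with mpi-archiver -r /tmp/README -d /some/project/of/mine.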

Preflight

You want to make sure that the archived data do not contain "strange" file names or symbolic links pointing outside the scope of the archive. To check this, you can set the MPIA_PREFLIGHT variable to point to an executable (a script, a compiled program, whatever you prefer). This program is executed before staging and, if it returns a non-zero exit status, stops any further processing.
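A preflight program could look roughly like the sketch below. Several details are assumptions, not documented mpi-archiver behaviour: that the directory to check is passed as the first argument, what counts as a "strange" name (here anything other than letters, digits and . _ + -), and the use of GNU find and readlink -f. The function name preflight is hypothetical.

```shell
# Hypothetical preflight check. ASSUMPTIONS: the directory to check is
# passed as $1, and a "strange" name contains a character other than
# letters, digits, '.', '_', '+' or '-'.
# Returns 0 if the directory looks fine, 1 otherwise.
preflight() {
    local dir=$1 bad=0 root list link target
    root=$(cd "$dir" && pwd -P) || return 1

    # 1. Unusual file names (the top directory itself is not checked).
    if find "$dir" -mindepth 1 -name '*[!A-Za-z0-9._+-]*' | grep -q .; then
        echo "preflight: unusual file names found" >&2
        bad=1
    fi

    # 2. Symbolic links whose resolved target lies outside the archive root
    #    (dangling links that cannot be resolved are reported as well).
    list=$(mktemp) || return 1
    find "$dir" -type l > "$list"
    while IFS= read -r link; do
        target=$(readlink -f "$link") || target=""
        case $target in
            "$root" | "$root"/*) ;;  # resolves inside the archive: fine
            *) echo "preflight: link leaves archive: $link" >&2; bad=1 ;;
        esac
    done < "$list"
    rm -f "$list"

    return "$bad"
}
```

To use it as MPIA_PREFLIGHT, save it in an executable file that starts with #!/bin/bash and ends with the line preflight "$1"; the script's exit status is then the function's return value.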

RZG Account

Apply for an RZG account here: http://www.rzg.mpg.de/userspace/forms/onlineregistrationform

Ask for ssh access to archive.rzg.mpg.de:/p/MMGT

Internal Documentation

Internal to MPI IS: