In the previous post, I talked about using duplicity and S3 as a remote backup solution. In that post, my backup rules excluded certain huge dirs, like ~/Pictures. It's not that I didn't want to back these up, but given their size (50GB), the fact that they are pretty static, and that I'm unlikely to want to restore them very often, Amazon Glacier is a cheaper home for them than S3.

Firstly, I have a dir ~/StaticBackup/ which, along with the other big dirs (~/Pictures, ~/Documents, ...), is excluded from my primary backup script. I manually archive folders from ~/Pictures to ~/StaticBackup when I'm ready to have them backed up (you could just back them up directly, of course).

Now I create another bucket on S3 called backup-static and, under "Lifecycle", add a rule that moves all files prefixed with "duplicity" (this is how duplicity names all its files by default) to Glacier 0 days (i.e. immediately) after upload, and deletes the copy on S3 0 days (immediately) after the transfer.
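If you prefer the command line to the console, the same rule can be sketched as a lifecycle configuration document. This is only a sketch under assumptions: I set the rule up in the console, so the AWS CLI call is an alternative rather than what I actually ran, the rule ID and the function name are made up for illustration, and the move to Glacier is written as a single transition rather than a separate move and delete.

```shell
#!/bin/sh
# Sketch only: print a lifecycle rule that sends every object whose key
# starts with "duplicity" to Glacier 0 days after upload.
# The rule ID and the function name are hypothetical.
lifecycle_json() {
    cat <<'EOF'
{
  "Rules": [
    {
      "ID": "duplicity-archives-to-glacier",
      "Filter": { "Prefix": "duplicity" },
      "Status": "Enabled",
      "Transitions": [ { "Days": 0, "StorageClass": "GLACIER" } ]
    }
  ]
}
EOF
}

# Show the rule; nothing is sent to Amazon by this script.
lifecycle_json

# To apply it (assumes the AWS CLI is installed and configured):
#   lifecycle_json > lifecycle.json
#   aws s3api put-bucket-lifecycle-configuration \
#       --bucket backup-static \
#       --lifecycle-configuration file://lifecycle.json
```

Keeping the apply step as a comment means you can read the rule over before anything touches the bucket.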

The trade-off with Glacier is that the files are no longer immediately accessible. Duplicity needs to be able to access its manifest files or else it can't do an incremental backup. This is where four options in the latest releases of duplicity come in handy:

 --file-prefix, --file-prefix-manifest, --file-prefix-archive, --file-prefix-signature


These give the files different prefixes depending on what type they are. Previously you had to write a hack to do this renaming yourself. Judicious use of these prefixes allows the lifecycle rule we set to send only the archive files to Glacier, keeping the manifest and sig files immediately accessible on S3 (NB I had to add the duplicity-team PPA and update to the latest stable duplicity to get these flags). I now add the following:

--file-prefix-manifest=_ --file-prefix-signature=_


to the COMMON_OPTS variable I used in the script in my previous post. This tells duplicity to prefix all manifest and sig files with "_". Since their names now start with "_" rather than "duplicity", the lifecycle rule no longer matches them, which stops Amazon freezing those files in Glacier.
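To make the prefix matching concrete, here is a small sketch that mimics what the lifecycle rule does with each file name. The function name and the example file names are made up for illustration (they follow duplicity's usual naming pattern but are not copied from a real bucket), and it assumes the "_" ends up at the very start of the name.

```shell
#!/bin/sh
# Prefix the lifecycle rule matches on (chosen when the rule was created).
GLACIER_PREFIX="duplicity"

# Succeeds if a key starts with the rule's prefix, i.e. would be frozen.
goes_to_glacier() {
    case "$1" in
        "$GLACIER_PREFIX"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Illustrative names: one archive volume, one manifest, one signature file.
for key in \
    "duplicity-full.20140101T000000Z.vol1.difftar.gpg" \
    "_duplicity-full.20140101T000000Z.manifest.gpg" \
    "_duplicity-full-signatures.20140101T000000Z.sigtar.gpg"
do
    if goes_to_glacier "$key"; then
        echo "glacier: $key"
    else
        echo "s3:      $key"
    fi
done
```

Only the archive volume matches the rule, so the manifest and signature files stay on S3.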

Note that duplicity will not be able to restore your backup as long as it is stored on Glacier. You will need to restore it via the Amazon Management Console (it will take a few hours) before starting the restore.
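With many archive files, clicking through the console gets tedious. As an alternative, here is a rough sketch that prints one AWS CLI restore request per key so you can inspect them before running anything. It assumes the AWS CLI is installed and configured, the function name is made up, and the 7 days and the "Standard" retrieval tier are placeholder choices, not recommendations.

```shell
#!/bin/sh
# Sketch only: read object keys on stdin and print a restore request for each.
# Usage: restore_commands BUCKET DAYS < keys.txt
# Keys are assumed to contain no spaces or quotes (true of duplicity's names).
restore_commands() {
    bucket="$1"
    days="$2"
    while IFS= read -r key; do
        printf "aws s3api restore-object --bucket %s --key %s --restore-request '{\"Days\":%s,\"GlacierJobParameters\":{\"Tier\":\"Standard\"}}'\n" \
            "$bucket" "$key" "$days"
    done
}

# Example with one illustrative key; this prints the command, it does not run it.
printf '%s\n' "duplicity-full.20140101T000000Z.vol1.difftar.gpg" |
    restore_commands backup-static 7
```

The restored copies are temporary, so run the duplicity restore within the number of days you asked for.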

You can check from the S3 console whether the rule has worked: next to each archive file, under "Storage Class", it should say "Glacier" and not "Standard".
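The same check can be sketched on the command line. The filter below reads lines of "key, tab, storage class" and prints any archive file that has not reached Glacier yet. The function name is made up, and the AWS CLI listing shown in the comment is an assumption about how you would produce that input.

```shell
#!/bin/sh
# Sketch only: read "key<TAB>storage class" lines on stdin and print the
# archive keys (those starting with "duplicity") that are not in Glacier yet.
not_yet_in_glacier() {
    awk -F '\t' '$1 ~ /^duplicity/ && $2 != "GLACIER" { print $1 }'
}

# Illustrative input: one frozen archive, one still on S3, one manifest.
printf '%s\t%s\n' \
    "duplicity-full.20140101T000000Z.vol1.difftar.gpg" "GLACIER" \
    "duplicity-full.20140101T000000Z.vol2.difftar.gpg" "STANDARD" \
    "_duplicity-full.20140101T000000Z.manifest.gpg" "STANDARD" |
    not_yet_in_glacier

# With the AWS CLI (assumed installed), the listing could come from:
#   aws s3api list-objects-v2 --bucket backup-static \
#       --query 'Contents[].[Key,StorageClass]' --output text | not_yet_in_glacier
```

An empty result means every archive file has been moved; the manifest and sig files are ignored because they are meant to stay on S3.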

I remove the options for forcing full backups and deleting old sets for this script, just letting it chug away incrementally. 70GB of backup is just too big to periodically destroy and push up fully again. Perhaps I will revise that in the future and force a full backup only very rarely, say every year or two, keeping two backup sets at any given time, but for now I don't see a major disadvantage in relying solely on incremental backups in this case. Perhaps I will live to regret that?

I advise you to test all this first with a smallish directory. Try an initial backup. Wait for the files to move to glacier. Try adding some file to the local dir and then run the script again for an incremental backup. Finally try restoring the files from glacier (yes there are penalties but your files should be very small for testing), then restoring using duplicity.
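The test run above could be scripted roughly like this. It is a dry run by default and only prints the commands. The test directory, the test bucket URL and the file name are all hypothetical, and a real run would also need whatever credentials and encryption options your own backup script passes to duplicity.

```shell
#!/bin/sh
# Dry-run sketch of the test sequence. Set DRY_RUN=0 to really run the commands.
DRY_RUN="${DRY_RUN:-1}"
SRC="$HOME/StaticBackupTest"           # hypothetical small test directory
DEST="s3+http://backup-static-test"    # hypothetical test bucket
OPTS="--file-prefix-manifest=_ --file-prefix-signature=_"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

test_sequence() {
    # $OPTS is left unquoted on purpose so it splits into separate options.
    run duplicity $OPTS "$SRC" "$DEST"                    # 1. initial backup
    echo "# wait until the archive files show Storage Class Glacier"
    run touch "$SRC/new-file.txt"                         # 2. change something
    run duplicity $OPTS "$SRC" "$DEST"                    # 3. incremental backup
    echo "# restore the archive files from Glacier, then:"
    run duplicity restore $OPTS "$DEST" "$SRC-restored"   # 4. restore locally
}

test_sequence
```

Restoring into a separate directory lets you diff the result against the original before trusting the setup with the big dirs.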

I turn off forced full backups and the deletion of full backups older than N on this script. The size, ~100GB, is just too big to have a new full backup set pushed up to S3/Glacier periodically and the old one destroyed. I just let duplicity chug away incrementally with a single full backup set. Perhaps this is foolish and I should really just set the frequency to something very low, like a new full chain once a year while keeping two backup chains at any given time, but for now I will see how it goes...
