Ibid: Incremental backups to infinite disk

You back up the files on your computer regularly, right? You don’t? You’re just asking for trouble. But you already knew that, didn’t you? You’d do periodic backups if you could, but there just are no good backup tools, right? They’re too complicated to use? You don’t have enough CD-Rs to hold all your files? Or you already have too many and are drowning in the clutter?

Well, now you can save the lame excuses for why you don’t floss. I’ve written ibid, a simple tool for performing incremental backups on Linux and Unix-like systems (possibly also Mac OS X; I haven’t checked yet). And who knows, maybe it can even be made to run under Windows.

Incremental backups

An incremental backup is a backup of just those files that have changed since the last backup.

Many people use mirroring for backups, periodically duplicating essential files to some destination, such as a spare external hard drive, and replacing those copies with fresh ones on each backup. This approach is economical in terms of storage space, but it only allows you to restore the latest version of a given file. Suppose you have a file called my_novel.doc and you mirror it to a spare hard drive. If you then accidentally delete half of it and don’t notice until after the next time you mirror it, the pre-accident version of my_novel.doc is gone forever.

With a true backup system like ibid, the pre-accident backup copy of my_novel.doc still exists; the post-accident backup copy is separate. The main drawback to this safer approach, of course, is that the more backups you do, the more space you need.

For a long time I used a makeshift system that backed up files to blank CDs or DVDs. It was slow and difficult to use, it required my constant attendance (for swapping discs in and out), and very often the disc burner would crap out at the last minute, after half an hour of burning, and produce an unreadable disc. I didn’t trust the archives that I created with this system; it was annoying to have to render botched discs unreadable with scissors all the time (for privacy reasons); and the more backups I did, the more crowded with discs my safe-deposit box got.

Then I discovered Jungledisk, a commercial “infinite filesystem” service that’s easy to use and has good privacy and security features, not to mention reasonable fees. In conjunction with davfs it can be made to act just like a mounted filesystem.

Ibid doesn’t care if you’re using Jungledisk or some other destination service or media. It simply copies the files you ask for to the target directory tree. The backed-up copies are just plain files residing under their original names in a plain filesystem, so retrieving old data is simple.

Here are ibid’s other main features:

Written as a single Perl file, no complicated build/install process;
Limit sessions by size;
Exclude files and directories by pattern (using Perl “regular expressions”);
Files with multiple “hard links” stored only once — link names stored separately;
Files re-backed up when they change;
Files not re-backed up if they’re merely renamed (the new name is stored separately);
Can run unattended, and even at automated intervals (e.g. via cron).

You can download it here. At some point soon perhaps I will create a proper website for it, at which time I will update this blog post with the new information; meanwhile, this is the definitive home of ibid (and I will keep the copy at the above link up to date with bugfixes and other changes).

Here is the documentation for ibid.

NAME

ibid – incremental backups to infinite disk

SYNOPSIS

ibid [options] FILESET TARGET

ibid --dump FILESET [PATTERN ...]

DESCRIPTION

Ibid is a simple tool for performing incremental backups. An incremental backup is a backup of just those files that have changed since the last backup.

Ibid backs up files to any destination that can be mounted as a writable filesystem. It was developed for the author’s use with Jungledisk (http://www.jungledisk.com/, a commercial “infinite filesystem” service) in conjunction with davfs (http://dav.sourceforge.net/), but any similar service or mountable media should also work.

To back up files, you must first define a fileset, which is an ordinary text file (whose name is the name of the fileset) listing files and directories to include in the backup, one per line. This file is typically stored in the directory $HOME/.ibid. Example:

 /home
 /var
 /etc
 /usr/local

Each item in the fileset definition must be a full pathname. Directories are processed recursively but filesystem boundaries are never crossed.
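As an illustration only (this is not ibid’s own Perl code), here is a minimal Python sketch of reading a fileset and walking each item without crossing a filesystem boundary. The function names are hypothetical, and the sketch assumes that blank lines in the fileset are ignored and that a boundary is detected by comparing device numbers; the text above does not say how ibid does either.

```python
import os

def read_fileset(path):
    """Return the pathnames listed in a fileset file, one per line.

    Assumption: blank lines are skipped. Relative names are rejected,
    since each item must be a full pathname.
    """
    with open(path) as f:
        items = [line.strip() for line in f if line.strip()]
    for item in items:
        if not os.path.isabs(item):
            raise ValueError(f"not a full pathname: {item}")
    return items

def walk_one_filesystem(root):
    """Yield root and everything beneath it on the same filesystem."""
    root_dev = os.lstat(root).st_dev
    yield root
    if os.path.islink(root) or not os.path.isdir(root):
        return
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune subdirectories on another device (mount points), so
        # os.walk never descends into them.
        dirnames[:] = [d for d in dirnames
                       if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev]
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)
```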

You may also define a set of exclusions in $HOME/.ibid/FILESET.EXCLUDE. This is a list of Perl regular expressions, one per line. Each pattern is matched against the full pathname of every candidate file and directory; if any pattern matches, the file or directory is excluded from the backup. Patterns are not anchored implicitly: a pattern may match anywhere in the pathname unless you explicitly use ^ and $ to anchor it to the beginning or end. Example:

 ~$
 /#[^/]+#$
 /tmp/
 /core$
 ^/home/bobg/src/extern/

This excludes Emacs-style backup and checkpoint files; everything under any directory named tmp; any file named core; and the tree rooted at /home/bobg/src/extern/.
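To see how unanchored matching against full pathnames plays out, here is a small Python sketch that applies the example patterns to a few made-up pathnames. It is an illustration rather than ibid’s code: it uses Python’s re module, which agrees with Perl for these particular patterns but is not identical to Perl’s regular expressions in general. The helper names and the sample paths are hypothetical.

```python
import re

# The example patterns from above, one per line as in FILESET.EXCLUDE.
EXCLUDE_TEXT = r"""
~$
/#[^/]+#$
/tmp/
/core$
^/home/bobg/src/extern/
"""

def compile_patterns(text):
    """Compile one regular expression per non-blank line."""
    return [re.compile(line) for line in text.splitlines() if line.strip()]

def is_excluded(pathname, patterns):
    """True if any pattern matches anywhere in the full pathname.

    re.search is unanchored, so ^ and $ must appear in the pattern
    itself to pin a match to the start or end of the pathname.
    """
    return any(p.search(pathname) for p in patterns)

PATTERNS = compile_patterns(EXCLUDE_TEXT)

for sample in ["/home/bobg/notes.txt~",       # Emacs backup file
               "/home/bobg/#draft#",          # Emacs checkpoint file
               "/var/tmp/cache/x",            # under a directory named tmp
               "/home/bobg/score",            # not named core, so kept
               "/backup/home/bobg/src/extern/x"]:  # ^ anchor does not match
    print(sample, is_excluded(sample, PATTERNS))
```

The last two samples show why anchoring matters: /core$ requires a slash directly before the name, and the ^ in the final pattern keeps it from matching a copy of the same tree that sits under some other prefix.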

With the fileset and optional exclusions defined, start a backup with:

 ibid FILESET TARGET

where TARGET is a directory on the destination filesystem; e.g., /mnt/jungledisk. Ibid will copy eligible files from the fileset to a new tree rooted at TARGET/FILESET/1. Here, 1 denotes this is the first session for FILESET; each time ibid runs it creates a new session. A record of the backup is stored in $HOME/.ibid/FILESET.1 and in TARGET/FILESET/FILESET.1. If you later reinvoke

 ibid FILESET TARGET

then any files created or changed since session 1 will be copied to a tree rooted at TARGET/FILESET/2, and a cumulative record of all sessions will be written to FILESET.2.
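The session numbering can be pictured with a short Python sketch that looks for existing session records named FILESET.N and picks the next number. This is a guess at the bookkeeping, not ibid’s actual logic; the function names are hypothetical, and the optional .gz and .bz2 suffixes are an assumption about how compressed records are named.

```python
import os
import re

def latest_session(directory, fileset):
    """Return the highest N for which a record FILESET.N exists, or 0."""
    # Assumption: a compressed record keeps its name plus .gz or .bz2.
    pattern = re.compile(re.escape(fileset) + r"\.(\d+)(?:\.gz|\.bz2)?$")
    numbers = []
    for name in os.listdir(directory):
        m = pattern.match(name)
        if m:
            numbers.append(int(m.group(1)))
    return max(numbers, default=0)

def next_session_root(directory, fileset, target):
    """Return (N, TARGET/FILESET/N) for the session about to be created."""
    n = latest_session(directory, fileset) + 1
    return n, os.path.join(target, fileset, str(n))
```

Note that FILESET.EXCLUDE and the fileset definition itself are not mistaken for session records, because only an all-digit suffix is accepted.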

Symlinks are not copied to the target filesystem, but they are added to the session record. Hard links are detected and are also added to the session record only. In both cases the target of the link is recorded too, permitting a later “restore.”

If a file is renamed between sessions but remains on the same filesystem and is otherwise unchanged, the new name is treated as a hard link: the file’s contents are not recopied to the target filesystem, but the name is added to the session record.
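One plausible way to recognize a rename, sketched below in Python, is to remember each backed-up file’s device and inode numbers along with its last-modification time and size. This is an assumption made for illustration: the text above says only that the file must remain on the same filesystem and be otherwise unchanged, not how ibid checks it. FileInfo, info_for, and classify are hypothetical names.

```python
import os
from collections import namedtuple

FileInfo = namedtuple("FileInfo", "dev ino mtime size")

def info_for(path):
    """Collect the identifying details of a file without following symlinks."""
    st = os.lstat(path)
    return FileInfo(st.st_dev, st.st_ino, int(st.st_mtime), st.st_size)

def classify(path, info, previous):
    """Decide what to do with a candidate file.

    previous maps (dev, ino) to (path, mtime, size) from the last session.
    Returns "copy" for new or changed contents, "skip" for an unchanged
    file under its old name, and "link" for unchanged contents that now
    appear under a new name (a rename, or an additional hard link).
    """
    old = previous.get((info.dev, info.ino))
    if old is None:
        return "copy"
    old_path, old_mtime, old_size = old
    if (old_mtime, old_size) != (info.mtime, info.size):
        return "copy"
    return "skip" if old_path == path else "link"
```

Under this scheme a file moved to a different filesystem gets a new device and inode pair, so it is copied again, which is consistent with the same-filesystem condition above.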

You may compress old session records in $HOME/.ibid with gzip or bzip2. Ibid will still be able to read them.
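Reading a record that may or may not be compressed is straightforward. The Python sketch below picks a reader from the file’s suffix; it assumes the usual .gz and .bz2 names that gzip and bzip2 produce, and open_record is a hypothetical name rather than anything in ibid.

```python
import bz2
import gzip

def open_record(path):
    """Open a session record for reading as text, compressed or not."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")
```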

OPTIONS

--no (or -n)

Don’t make any changes: files are not copied and a new session record is not written. With -v, this is a good way to preview what a backup would do.

--verbose (or -v)

Increase verbosity. Each -v adds more.

--dir DIR (or -d DIR)

Look for filesets and sessions in DIR. Default is $HOME/.ibid.

--limit LIMIT (or -l LIMIT)

Limit this session to copying LIMIT bytes. Copying ends after the file that crosses the LIMIT threshold, so a session can overshoot LIMIT by almost the full size of that last file, which can be quite a lot if the file is large.

LIMIT may have the suffix k, m, or g to denote kilobytes, megabytes, or gigabytes.
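The following Python sketch shows the stopping rule in miniature. It is not ibid’s code: parse_limit and copy_until_limit are hypothetical names, and the sketch assumes that k, m, and g mean powers of 1024, which the text above does not specify.

```python
def parse_limit(text):
    """Convert a LIMIT string such as "500", "10k", "2m", or "1g" to bytes.

    Assumption: the suffixes are multiples of 1024, not 1000.
    """
    multipliers = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    text = text.strip().lower()
    if text and text[-1] in multipliers:
        return int(text[:-1]) * multipliers[text[-1]]
    return int(text)

def copy_until_limit(files, limit):
    """Pick files to copy, given (pathname, size) pairs in backup order.

    The limit is only examined before starting a file, so the file that
    crosses the threshold is still copied in full and copying stops
    afterward. Returns the chosen pathnames and the total bytes.
    """
    copied, total = [], 0
    for pathname, size in files:
        if total >= limit:
            break
        copied.append(pathname)
        total += size
    return copied, total
```

For example, with a limit of 1000 bytes and files of 400, 700, and 100 bytes, the first two are copied for a total of 1100 bytes and the third waits for a later session.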

--preserve-mode/--nopreserve-mode (or -m)

Copy the mode bits of backed-up files or don’t. Default is to copy them.

--preserve-owner/--nopreserve-owner (or -o)

Copy the uid and gid of backed-up files or don’t. Default is to copy them.

--preserve-time/--nopreserve-time (or -t)

Copy the last-access and last-modification times of backed-up files or don’t. Default is not to copy them (because Jungledisk+davfs, which the author used during development of ibid, doesn’t support that operation [yet?]).

--restore-atime/--norestore-atime (or -a)

After copying files to the target filesystem, restore the file’s original last-access time or don’t. Default is to restore.

--recheck-mtime/--norecheck-mtime

If --restore-atime is in effect, enabling this option causes ibid to check, after the last-access time is restored, that the original file’s last-modification time still has its original value. This detects a bug in some versions of the Linux kernel and/or the ext3 filesystem that caused utime(2) to store bogus values. Default is to perform the recheck and die if the bug is detected.
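The interplay of --restore-atime and --recheck-mtime can be sketched in Python as follows. This illustrates the idea only and is not ibid’s implementation: copy_then_restore_atime is a hypothetical name, and copy_fn stands in for whatever actually copies the file to the target.

```python
import os

def copy_then_restore_atime(src, copy_fn, recheck=True):
    """Copy src via copy_fn, then put back src's original access time.

    Reading a file may update its last-access time, so the times are
    saved first and rewritten afterward. With recheck enabled, the
    last-modification time is compared against the saved value and an
    error is raised if the restore left it altered.
    """
    before = os.stat(src)
    copy_fn(src)
    # Both times must be supplied; the mtime passed is the saved one.
    os.utime(src, ns=(before.st_atime_ns, before.st_mtime_ns))
    after = os.stat(src)
    if recheck and after.st_mtime_ns != before.st_mtime_ns:
        raise RuntimeError(f"mtime of {src} changed while restoring atime")
    return after
```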

--ensure/--noensure (or -e)

Before deciding to skip a file because it’s unchanged from its last backup, ibid will check the supposed backup copy to see if it’s present and has the proper size. If not, it’ll get backed up “again.” This option defends against the possibility that your “infinite disk” service dropped some data undetected, as can happen with Jungledisk (for instance) when using “background mode” and three consecutive retries all fail (as of version 1.13, see http://forum.jungledisk.com/viewtopic.php?t=421).

For purposes of this option, ibid assumes all prior sessions can be found under the same TARGET root as the current one.

Using this option is costly in time and network I/O, so the default is off.
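A check along these lines is sketched below in Python. It is a guess at the shape of the test, not ibid’s code: backup_copy_ok is a hypothetical name, and the sketch assumes that a file’s backup copy sits at its full original pathname beneath TARGET/FILESET/N.

```python
import os

def backup_copy_ok(target, fileset, session, pathname, expected_size):
    """True if the backup copy exists and has the recorded size.

    Assumption: the copy of /some/file from session N is stored at
    TARGET/FILESET/N/some/file.
    """
    copy = os.path.join(target, fileset, str(session), pathname.lstrip("/"))
    try:
        return os.path.getsize(copy) == expected_size
    except OSError:
        # A missing or unreadable copy counts as a failed check.
        return False
```

Each call costs a stat on the target filesystem, which for a network-backed mount means a round trip per file; that is where the time and network I/O go.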

--version (or -V)

Report the version of ibid and exit.

--dump (or -D)

Rather than perform a backup, ibid dumps the contents of a session record (the latest session for the given fileset) to standard output. The information includes the session number, the start and end times for that session, and the complete list of filenames, including symlinks and hard links. Beneath each filename is a list of the sessions in which that filename appears. Each entry in that list shows one of the following: (a) the last-modification time and size of the file when it was backed up in that session, (b) the symbol S-> (to denote a symlink) and the target of the link, or (c) the symbol H-> (to denote a hard link) and the pathname under which the contents were most recently copied to the archive.

After the fileset, you may specify on the command line any number of Perl regular expressions; only matching filenames will be included in the output.

FILES

$HOME/.ibid/FILESET
$HOME/.ibid/FILESET.EXCLUDE
$HOME/.ibid/FILESET.1, FILESET.2, …