Bup
This page contains information about bup, an awesome git-based backup utility. bup is short for "backup".
References
- Website
- https://bup.github.io/
- GitHub project
- https://github.com/bup/bup
- DESIGN document, a worthwhile and entertaining read that provides insight into the bup repository format and the deduplication algorithm
- https://raw.githubusercontent.com/bup/bup/master/DESIGN
- Article with useful HOWTO commands
- https://techarena51.com/index.php/using-git-backup-website-files-on-linux/
- Article mostly concerned with the KDE frontend kup
- http://www.linux-magazine.com/Issues/2015/178/Kup
- Background info on data deduplication
- https://en.wikipedia.org/wiki/Data_deduplication
Rationale
When bup makes a backup, it creates a snapshot of the files that are to be backed up. If another backup is made at a later time, another snapshot is created. The effect is a "time machine" that lets you go back and restore the backed-up folder to the state it was in when any given snapshot was made.
When bup is run, it runs on the machine where the backup repository is located. bup pulls files, possibly from a remote host, onto the backup repository machine.
bup works on top of git: The backup repository essentially is a git repository, although bup does a few special things to achieve its goals.
bup applies data deduplication so that the backup repository grows only when the backed-up files actually change their content. The deduplication algorithm is unbelievably clever: It is not limited to individual files (a limitation of backup tools that use hardlinks to minimize growth), but instead breaks up files into relatively small data chunks (about 8 KB on average, according to the DESIGN document) so that data deduplication works even if files have changed only partially.
An important consequence of this is that it is better to feed uncompressed files into bup, because only then is bup able to apply its deduplication algorithm to maximum effect. Not to worry: bup applies its own compression after deduplication has taken place, so that the bup repository is about the most efficient backup storage in the known universe :-)
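To get a feel for why chunk-level deduplication pays off, the following sketch counts how many 8 KB blocks two files have in common. It is only an illustration: it cuts files into fixed-size blocks with split and compares cksum values, whereas bup picks chunk boundaries with a rolling checksum (see the DESIGN document). The function name shared_blocks is made up for this example.

```shell
# Hypothetical helper, NOT bup's algorithm: print the number of distinct
# fixed-size 8 KB blocks that occur in both files.
# Usage: shared_blocks FILE1 FILE2
shared_blocks() {
    work=$(mktemp -d) || return 1
    mkdir "$work/a" "$work/b"
    # -a 4 allows enough suffixes for files with many blocks
    split -a 4 -b 8192 "$1" "$work/a/blk."
    split -a 4 -b 8192 "$2" "$work/b/blk."
    # One checksum line per block, duplicates within a file collapsed
    for f in "$work"/a/blk.*; do cksum < "$f"; done | sort -u > "$work/a.sums"
    for f in "$work"/b/blk.*; do cksum < "$f"; done | sort -u > "$work/b.sums"
    # Lines present in both lists are blocks that would be stored only once
    comm -12 "$work/a.sums" "$work/b.sums" | wc -l | tr -d ' '
    rm -rf "$work"
}
```

Running shared_blocks on two uncompressed tar archives of the same folder, and then on compressed versions of the same archives, should show the effect described above, although I have not measured this on real data.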
FAQ
- Does bup support file metadata?
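- Early versions did not retain file ownership and file permissions. Newer versions save and restore metadata; the man page for bup restore describes how file ownership is restored.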
- Does bup have encryption?
- No. There is an interesting thread where the original designer of bup is discussing the issue with someone who would like to have encryption in bup. My solution, if I wanted encryption, would be to store the bup repository on an encrypted disk.
The BUP_DIR environment variable
The BUP_DIR environment variable, if set, specifies the location of the bup repository. bup commands are commonly run like this:
BUP_DIR=/path/to/repo bup <subcommand>
If BUP_DIR is not set, bup assumes the default location ~/.bup.
To override this one can specify the repository with a command line option:
bup -d /path/to/repo <subcommand>
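The lookup order described above can be summarized in a small helper. This is a sketch for illustration, not part of bup: the function name effective_bup_dir is made up, and treating an empty BUP_DIR like an unset one is an assumption.

```shell
# Hypothetical helper: print the repository location bup would use.
# $1 is an optional path as it would be passed with "bup -d".
effective_bup_dir() {
    if [ -n "${1:-}" ]; then
        # The command line option wins over the environment
        printf '%s\n' "$1"
    elif [ -n "${BUP_DIR:-}" ]; then
        printf '%s\n' "$BUP_DIR"
    else
        # Default location when nothing else is specified
        printf '%s\n' "$HOME/.bup"
    fi
}
```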
Create repository
The following command creates a new folder and populates it with the stuff that makes it a bup repository. The result is a bare git repo.
BUP_DIR=foo.bup bup init
If the folder already exists, bup simply writes its stuff into the existing folder.
Create a snapshot
First the folder to create a snapshot from must be indexed: In this step bup takes stock of what is in the folder, and what has changed since the last snapshot was made. This is the command for indexing:
BUP_DIR=foo.bup bup index /path/to/folder
Note that the folder to index must be a local folder.
Once the indexing has completed, the snapshot can be created like this:
BUP_DIR=foo.bup bup save -n fancy-snapshot-label /path/to/folder
Notes:
- bup uses the snapshot label to create a git branch. This is neat, but it also limits the characters that can be used in the label. For instance, spaces cannot be used. If an illegal label is specified, bup will burp and create an incomplete backup.
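Since a bad label can leave an incomplete backup behind, it seems worth checking the label before indexing and saving. The following is a minimal sketch under assumptions: the function names is_safe_label and snapshot_folder are made up, and the check is deliberately stricter than git's real branch name rules (it only allows letters, digits, '.', '_' and '-').

```shell
# Hypothetical helper: accept only labels made of letters, digits, '.', '_'
# and '-'. This is stricter than what git actually allows for branch names.
is_safe_label() {
    case $1 in
        '' | -* | .* | *..* | *.lock | *[!A-Za-z0-9._-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Hypothetical helper: index a local folder, then save it under a label.
# Usage: snapshot_folder /path/to/repo fancy-snapshot-label /path/to/folder
snapshot_folder() {
    repo=$1 label=$2 folder=$3
    if ! is_safe_label "$label"; then
        echo "refusing label '$label'" >&2
        return 2
    fi
    # Only save if indexing succeeded
    bup -d "$repo" index "$folder" &&
        bup -d "$repo" save -n "$label" "$folder"
}
```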
Create redundancy
Because of bup's ultra-efficient storage format, a bup repository is extremely vulnerable to corruption. The deduplication feature of bup makes sure that any block of data is stored exactly once, so if any such block were to become damaged, all snapshots that include the block would be corrupted at the same time.
For this reason, bup can generate a bit of redundancy that allows the repository to survive a certain amount of corruption, at the price of a moderate size increase of the repository. This is the command that generates that redundancy:
BUP_DIR=foo.bup bup fsck -g
Notes:
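- It is currently not clear if this command must be run after every new snapshot, or if it is sufficient to run it once and subsequent snapshot operations honor the setting. The man page describes -g as generating recovery blocks for packs that do not have them yet, which suggests that it has to be run again after new snapshots have been added.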
- Running the command again on a repository 14 months old did not print any output to the console, so I would assume that no additional redundancy was generated because none was needed.
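According to the man page, bup fsck relies on the external par2 tool for its recovery blocks. A small wrapper can check for the tool before asking bup to generate redundancy. This is a sketch; the function name generate_redundancy is made up.

```shell
# Hypothetical helper: generate recovery blocks only if par2 is available.
# Usage: generate_redundancy /path/to/repo
generate_redundancy() {
    if ! command -v par2 >/dev/null 2>&1; then
        echo "par2 not found; cannot generate recovery blocks" >&2
        return 1
    fi
    bup -d "$1" fsck -g
}
```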
Integrity checking
An integrity check can be run with this command (can be very slow!):
BUP_DIR=foo.bup bup fsck
A faster check can be run like this. According to the man page this comes "with no obvious decrease in reliability. However, you may want to avoid this option if you're paranoid."
BUP_DIR=foo.bup bup fsck --quick
For comparison, I ran both checks on a 3.3 GB bup repository that contains 64 snapshots made over the last 14 months. The repo contains a lot of duplication:
- Quick fsck 1 = 121 seconds
- Normal fsck 1 = 117 seconds
- Quick fsck 2 = 113 seconds
- Normal fsck 2 = 112 seconds
- All checks took 84-85 seconds of user time
If corruption exists, the following command tries to recover. In order for this to work, redundancy must have been created with bup fsck -g before corruption occurred.
BUP_DIR=foo.bup bup fsck -r
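The check and the repair can be chained so that a repair is only attempted when the check reports a problem. This is a minimal sketch: the function name check_and_repair is made up, and it assumes that bup fsck signals problems through a non-zero exit status.

```shell
# Hypothetical helper: quick check first, repair only if the check fails.
# Assumption: "bup fsck" returns a non-zero exit status when it finds damage.
# Usage: check_and_repair /path/to/repo
check_and_repair() {
    if bup -d "$1" fsck --quick; then
        echo "repository OK"
        return 0
    fi
    echo "problems found, attempting repair" >&2
    # Needs recovery blocks created earlier with "bup fsck -g"
    bup -d "$1" fsck -r
}
```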
List snapshots
This command lists all snapshots that exist in a repository:
BUP_DIR=foo.bup bup ls
Restore files
Direct restore
Restoring a single file can be as simple as
BUP_DIR=foo.bup bup restore /snapshot-label/latest/path/to/file
Notes:
- The file will be restored into the current working directory
- A folder can be specified instead of a file, in which case the folder and its contents are restored into the current working directory
- If several snapshots were made with the same name, then you can replace "latest" with another revision to access an older snapshot
- The --outdir command line option can be used to specify a target path for the restore that is not the current working directory
- The --exclude-rx command line option can be used to exclude certain files and/or folders from the restore. See the man page for details.
- The man page also has details about how file ownership is restored, and how hard links are handled.
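The restore path is made up of the snapshot label, a revision and the path inside the backup. The following sketch puts these pieces together and restores into a directory of your choice. The function names snapshot_path and restore_into are made up. The revision is either "latest" or one of the names that bup ls /snapshot-label prints.

```shell
# Hypothetical helper: build the path that "bup restore" expects.
# Usage: snapshot_path LABEL REVISION /path/inside/backup
snapshot_path() {
    # Strip a leading slash from the inner path to avoid "//"
    printf '/%s/%s/%s\n' "$1" "$2" "${3#/}"
}

# Hypothetical helper: restore one path into a chosen directory.
# Usage: restore_into REPO OUTDIR LABEL REVISION /path/inside/backup
restore_into() {
    bup -d "$1" restore --outdir="$2" "$(snapshot_path "$3" "$4" "$5")"
}
```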
Restore from temporary filesystem
The following command uses FUSE to temporarily mount a bup repository as a userspace filesystem and to restore files by manually copying them from that filesystem:
BUP_DIR=foo.bup bup fuse /tmp/foo.bup
Discussion:
- The mount point (/tmp/foo.bup in the example) must exist
- According to the man page, because "bup fuse" is still experimental all files will be readable by all users
- The command line options -f and -d run the mount process in the foreground. -d in addition prints debug information when the filesystem contents are accessed.
To unmount the filesystem:
umount /tmp/foo.bup
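Mounting, copying and unmounting can be wrapped into one step. This is a sketch under assumptions: the function name copy_from_fuse is made up, the mounted tree is assumed to use the same label/revision layout as the paths given to bup restore, and on some systems a regular user may have to unmount with fusermount -u instead of umount.

```shell
# Hypothetical helper: copy one path out of a FUSE-mounted repository.
# Usage: copy_from_fuse REPO LABEL/latest/path/to/file DEST
copy_from_fuse() {
    mnt=$(mktemp -d) || return 1
    if ! bup -d "$1" fuse "$mnt"; then
        rmdir "$mnt"
        return 1
    fi
    # Remember whether the copy worked, but always unmount afterwards
    status=0
    cp -R "$mnt/$2" "$3" || status=$?
    umount "$mnt"
    rmdir "$mnt"
    return "$status"
}
```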
Manage backups / delete old backups
TODO
View old backups: bup ls ...
Delete old backups: Experimental support "bup rm" and "bup gc"
Get a backup from a remote host
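This is not directly possible with bup index and bup save, which work on local folders only.
The solution is to make an uncompressed tar or cpio archive and pull that from the remote host. Then feed the archive into bup and the deduplication process will work its magic to prevent backup bloat.
bup also has a bup on subcommand that runs index and save on a remote host; see its man page for details.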
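The archive does not have to be stored on disk first. The following sketch streams an uncompressed tar archive over ssh straight into bup split. The function name pull_remote_folder is made up, and the sketch assumes that the remote folder name contains no spaces or other characters that the remote shell would interpret.

```shell
# Hypothetical helper: stream a tar archive of a remote folder into bup.
# Note: the exit status is that of bup; a failing ssh may go unnoticed.
# Usage: pull_remote_folder REPO HOST /remote/folder LABEL
pull_remote_folder() {
    ssh "$2" tar -cf - "$3" | bup -d "$1" split -n "$4"
}
```

To get the files back, the archive can be read with bup join and unpacked, for example bup -d REPO join LABEL | tar -xf -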
Push a backup to a remote server
TODO
Possible, examples see GitHub README
split/join vs. index/save/restore
What's the difference between split/join and index/save/restore?
- "bup index" creates / updates an index for a given folder. The index is maintained in the bup repository for that folder. "bup save" checks all files in the index and runs "bup split" on each file. "bup restore" goes through the content of a backup and runs "bup join" on all of the files in the backup.
- "bup split" and "bup join" work on a single file
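For a single file, split and join can be used directly without an index. The following is a minimal sketch; the function names store_file and fetch_file are made up.

```shell
# Hypothetical helper: store one file under a label with "bup split".
# Usage: store_file REPO LABEL /path/to/file
store_file() {
    bup -d "$1" split -n "$2" < "$3"
}

# Hypothetical helper: write the latest content of a label to a file
# with "bup join".
# Usage: fetch_file REPO LABEL /path/to/copy
fetch_file() {
    bup -d "$1" join "$2" > "$3"
}
```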
TODO: Examples
bupper
Website = https://github.com/tobru/bupper
bupper seems to be a nice frontend to bup, making bup even easier to use. The main point of bupper is that you can define so-called profiles. A profile is a collection of settings that together define
- The location to back up, and
- The location of a bup repository that receives the backup
A very interesting feature of bupper is that you can also specify excludes, i.e. files that you don't want to back up. Typically you would exclude files like gshadow etc.