Bup

From HerzbubeWiki

This page contains information about bup, an awesome git-based backup utility. bup is short for "backup".


References

Website
https://bup.github.io/
GitHub project
https://github.com/bup/bup
DESIGN document, a worthwhile and entertaining read that provides insight into the bup repository format and the deduplication algorithm
https://raw.githubusercontent.com/bup/bup/master/DESIGN
Article with useful HOWTO commands
https://techarena51.com/index.php/using-git-backup-website-files-on-linux/
Article mostly concerned with the KDE frontend kup
http://www.linux-magazine.com/Issues/2015/178/Kup
Background info on data deduplication
https://en.wikipedia.org/wiki/Data_deduplication


Rationale

When bup makes a backup, it creates a snapshot of the files that are to be backed up. If another backup is made at a later time, another snapshot is created. The effect is a "time machine" that lets you go back and restore the backed up folder to any state as it appeared in the past at the time when the snapshot was made.

When bup is run, it runs on the machine where the backup repository is located. bup pulls files, possibly from a remote host, onto the backup repository machine.

bup works on top of git: The backup repository essentially is a git repository, although bup does a few special things to achieve its goals.

bup applies data deduplication so that the backup repository grows only when the backed-up files actually change their content. The deduplication algorithm is unbelievably clever: It is not limited to individual files (a limitation of backup tools that use hardlinks to minimize growth), but instead breaks up files into relatively small data chunks (about 8 KB on average, according to the DESIGN document) so that data deduplication works even if files have changed only partially.

An important consequence of this is that it is better to feed uncompressed files into bup, because only then is bup able to apply its deduplication algorithm to the maximum effect. Not to worry: bup applies its own compression after deduplication has taken place, so that the bup repository is about the most efficient backup storage in the known universe :-)
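
To get a feeling for why chunk-level deduplication pays off, here is a deliberately simplified sketch. It is not bup's algorithm: bup picks chunk boundaries with a rolling checksum, whereas this sketch simply cuts at fixed 8 KiB offsets, which only copes with changes that do not shift the rest of the file. The file names and sizes are made up for the illustration, and sha1sum from GNU coreutils is assumed to be available.

```shell
#!/bin/sh
# Simplified illustration of chunk-level deduplication (fixed-size chunks,
# NOT the rolling-checksum chunking that bup really uses).
work=$(mktemp -d)

# Version 1: 64 KiB of random data.
# Version 2: the same data with only the first 16 bytes overwritten.
head -c 65536 /dev/urandom > "$work/v1"
cp "$work/v1" "$work/v2"
printf 'XXXXXXXXXXXXXXXX' | dd of="$work/v2" bs=1 conv=notrunc 2>/dev/null

# Cut both versions into fixed 8 KiB chunks.
split -b 8192 "$work/v1" "$work/v1.chunk."
split -b 8192 "$work/v2" "$work/v2.chunk."

# Count all chunks, then count the distinct chunk hashes.
total=$(( $(ls "$work"/v?.chunk.* | wc -l) ))
unique=$(( $(sha1sum "$work"/v?.chunk.* | cut -d' ' -f1 | sort -u | wc -l) ))

echo "chunks in both versions:  $total"
echo "chunks that need storing: $unique"
rm -rf "$work"
```

Both versions together consist of 16 chunks, but only 9 of them are distinct: the 8 chunks of the first version plus the one chunk that changed. A tool that deduplicates whole files would have to store the second version in full.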


FAQ

Does bup support file metadata?
Yes, in current versions. AFAIK versions before 0.25 did not retain file ownership and file permissions; newer versions save this metadata and "bup restore" restores it (see the man page for details).
Does bup have encryption?
No. There is an interesting thread in which the original designer of bup discusses the issue with someone who would like to have encryption in bup. My solution, if I wanted encryption, would be to store the bup repository on an encrypted disk.


The BUP_DIR environment variable

The BUP_DIR environment variable, if set, specifies the location of the bup repository. bup commands are commonly run like this:

BUP_DIR=/path/to/repo bup <subcommand>

If BUP_DIR is not set, bup assumes the default location

~/.bup

To override this one can specify the repository with a command line option:

bup -d /path/to/repo <subcommand>
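
When several bup commands are run in a row, it can be more convenient to export the variable once instead of prefixing every command. The following is a small sketch of that idea; the helper function effective_bup_dir is hypothetical and only mirrors the fallback rule described above, it is not part of bup.

```shell
#!/bin/sh
# Print the repository that bup falls back to when no -d option is given:
# $BUP_DIR if it is set, otherwise ~/.bup.
# (Hypothetical helper, not part of bup.)
effective_bup_dir() {
    printf '%s\n' "${BUP_DIR:-$HOME/.bup}"
}

# Set the repository once for the rest of the shell session. Every later
# "bup <subcommand>" picks it up without a BUP_DIR=... prefix.
export BUP_DIR=/path/to/repo
effective_bup_dir
```

A single command can still be pointed at a different repository with the -d option.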


Create repository

The following command creates a new folder and populates it with the stuff that makes it a bup repository. The result is a bare git repo.

BUP_DIR=foo.bup bup init

If the folder already exists, bup simply writes its stuff into the existing folder.


Create a snapshot

First the folder to create a snapshot from must be indexed: In this step bup takes stock of what is in the folder, and what has changed since the last snapshot was made. This is the command for indexing:

BUP_DIR=foo.bup bup index /path/to/folder

Note that the folder to index must be a local folder.


Once the indexing has completed, the snapshot can be created like this:

BUP_DIR=foo.bup bup save -n fancy-snapshot-label /path/to/folder

Notes:

  • bup uses the snapshot label to create a git branch. This is neat, but also imposes limitations on the characters that can be used to compose the label. For instance, spaces cannot be used. If something illegal is specified, bup will burp and create an incomplete backup.
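
Since the label ends up as a git branch name, it can be checked with git before bup is run. The following sketch wraps the index and save steps in a function that refuses labels git would reject. The function names label_ok and snapshot are made up for this page, and the check assumes that bup accepts exactly what git accepts as a branch name, which I have not verified.

```shell
#!/bin/sh
# Return 0 if git accepts the label as a branch name.
label_ok() {
    git check-ref-format "refs/heads/$1"
}

# Index and save a folder, refusing labels that git would reject.
# Usage: snapshot <repo> <label> <folder>
snapshot() {
    repo=$1 label=$2 folder=$3
    if ! label_ok "$label"; then
        echo "invalid snapshot label: $label" >&2
        return 1
    fi
    BUP_DIR=$repo bup index "$folder" &&
    BUP_DIR=$repo bup save -n "$label" "$folder"
}
```

Usage would look like this: snapshot foo.bup fancy-snapshot-label /path/to/folder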


Create redundancy

Because of bup's ultra-efficient storage format, a bup repository is extremely vulnerable to corruption. The deduplication feature of bup makes sure that any block of data is stored exactly once, so if any such block were to become damaged, all snapshots that include the block would be corrupted at the same time.

For this reason, bup can generate a bit of redundancy that allows the repository to survive a certain amount of corruption, at the price of a moderate size increase of the repository. This is the command that generates that redundancy:

BUP_DIR=foo.bup bup fsck -g

Notes:

  • According to the man page, the -g option generates any missing recovery blocks. New snapshots add new pack files that are not yet covered, so the command apparently has to be run again after new snapshots have been made; I have not verified this myself.
  • Running the command again on a repository 14 months old did not print any output to the console, so I would assume that no additional redundancy was generated because none was needed.
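
The redundancy is generated with the help of the external par2 tool, so it makes sense to check for it first. The sketch below uses the --par2-ok option of "bup fsck", which according to the man page only reports whether par2 is installed and working. The function name add_redundancy is made up for this page.

```shell
#!/bin/sh
# Generate missing recovery blocks, but only if par2 is usable.
# Usage: add_redundancy <repo>
add_redundancy() {
    repo=$1
    if BUP_DIR=$repo bup fsck --par2-ok; then
        BUP_DIR=$repo bup fsck -g
    else
        echo "par2 not available, no recovery blocks generated" >&2
        return 1
    fi
}
```

Calling add_redundancy foo.bup after every "bup save" should be a safe habit, because blocks that already exist are not generated again.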


Integrity checking

An integrity check can be run with this command (can be very slow!):

BUP_DIR=foo.bup bup fsck

A faster check can be run like this. According to the man page this comes "with no obvious decrease in reliability. However, you may want to avoid this option if you're paranoid."

BUP_DIR=foo.bup bup fsck --quick

For comparison, I ran both checks on a bup repository with size 3.3 GB which contains 64 snapshots made over the last 14 months. The repo contains a lot of duplication:

  • Quick fsck 1 = 121 seconds
  • Normal fsck 1 = 117 seconds
  • Quick fsck 2 = 113 seconds
  • Normal fsck 2 = 112 seconds
  • All checks took 84-85 seconds of user time


If corruption exists, this command tries to recover. In order for this to work, redundancy must have been created with bup fsck -g before corruption occurred.

BUP_DIR=foo.bup bup fsck -r
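
Check and repair can be combined into one step, for instance for a cron job. This is only a sketch: the function name check_repo is made up, and it assumes that "bup fsck" exits with a non-zero status when a pack fails verification.

```shell
#!/bin/sh
# Run a quick integrity check and attempt a repair only if it fails.
# Assumption: "bup fsck" exits non-zero when it finds a problem.
# Usage: check_repo <repo>
check_repo() {
    repo=$1
    if BUP_DIR=$repo bup fsck --quick; then
        echo "repository OK"
    else
        echo "fsck reported problems, attempting repair" >&2
        BUP_DIR=$repo bup fsck -r
    fi
}
```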


List snapshots

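This command lists the snapshot labels (branches) that exist in a repository. To see the individual snapshots made under a label, pass the label as a path, as in bup ls /snapshot-label.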

BUP_DIR=foo.bup bup ls
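
A bup repository can be browsed like a directory tree: labels at the top, then the revisions of a label (plus "latest"), then the saved files under their original absolute paths. The following sketch walks down these three levels. The function name show_repo is made up for this page.

```shell
#!/bin/sh
# Walk down the first three levels of a bup repository.
# Usage: show_repo <repo> <label>
show_repo() {
    repo=$1 label=$2
    BUP_DIR=$repo bup ls /                  # all snapshot labels
    BUP_DIR=$repo bup ls "/$label"          # revisions of one label, plus "latest"
    BUP_DIR=$repo bup ls "/$label/latest"   # top of the most recent snapshot
}
```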


Restore files

Direct restore

Restoring a single file can be as simple as

BUP_DIR=foo.bup bup restore /snapshot-label/latest/path/to/file

Notes:

  • The file will be restored into the current working directory
  • A folder can be specified instead of a file, in which case the folder and its contents are restored into the current working directory
  • If several snapshots were made with the same name, then you can replace "latest" with another revision to access an older snapshot
  • The --outdir command line option can be used to specify a target path for the restore that is not the current working directory
  • The --exclude-rx command line option can be used to exclude certain files and/or folders from the restore. See the man page for details.
  • The man page also has details about how file ownership is restored, and how hard links are handled.
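
The notes above can be combined into a small helper that restores the latest version of a file or folder into a directory of your choice. The function name restore_latest and its argument order are made up for this page; the path must be the absolute path that the file had when it was saved.

```shell
#!/bin/sh
# Restore the latest version of a path into a target directory.
# Usage: restore_latest <repo> <label> </absolute/path> <target-dir>
restore_latest() {
    repo=$1 label=$2 path=$3 target=$4
    BUP_DIR=$repo bup restore --outdir "$target" "/$label/latest$path"
}
```

Usage would look like this: restore_latest foo.bup fancy-snapshot-label /path/to/file /tmp/restored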


Restore from temporary filesystem

The following command uses FUSE to temporarily mount a bup repository as a userspace filesystem and to restore files by manually copying them from that filesystem:

BUP_DIR=foo.bup bup fuse /tmp/foo.bup

Discussion:

  • The mount point (/tmp/foo.bup in the example) must exist
  • According to the man page, because "bup fuse" is still experimental all files will be readable by all users
  • The command line options -f and -d both run the mount process in the foreground. -d in addition prints debug information when the filesystem contents are accessed.

To unmount the filesystem:

umount /tmp/foo.bup
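
Mounting, copying and unmounting can be wrapped into one function so that the mount does not linger around. This is a sketch with a made-up function name (restore_via_fuse). It tries fusermount -u first, which is what the bup-fuse man page uses in its example and which also works for non-root users on Linux, and falls back to umount.

```shell
#!/bin/sh
# Mount the repository, copy one path out of it, then unmount again.
# <src> is relative to the mount point, e.g. snapshot-label/latest/path/to/file
# Usage: restore_via_fuse <repo> <src> <dest>
restore_via_fuse() {
    repo=$1 src=$2 dest=$3
    mnt=$(mktemp -d) || return 1
    BUP_DIR=$repo bup fuse "$mnt" || { rmdir "$mnt"; return 1; }
    cp -a "$mnt/$src" "$dest"
    status=$?
    # Unmount with fusermount if possible, otherwise fall back to umount.
    fusermount -u "$mnt" 2>/dev/null || umount "$mnt"
    rmdir "$mnt"
    return $status
}
```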


Manage backups / delete old backups

TODO

View old backups: bup ls ...

Delete old backups: Experimental support "bup rm" and "bup gc"


Get a backup from a remote host

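This is not possible with "bup index" and "bup save" alone, because the folder to index must be a local folder.

If bup is installed on the remote host, the "bup on" subcommand can run the index and save steps there over ssh and store the result in the local repository; see the bup-on man page.

Without bup on the remote host, the solution is to make an uncompressed tar or cpio archive and pull that from the remote host. Then feed the archive into bup and the de-duplication process will work its magic to prevent backup bloat.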
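
The archive does not even have to be stored as a file on the way: tar can write it to stdout on the remote host and "bup split" can read it from stdin on the local machine. The following sketch assumes ssh access to the remote host; the function name pull_remote and its arguments are made up for this page.

```shell
#!/bin/sh
# Stream an uncompressed tar archive from a remote host into a local
# bup repository.
# Usage: pull_remote <repo> <host> <remote-folder> <label>
pull_remote() {
    repo=$1 host=$2 folder=$3 label=$4
    ssh "$host" tar -cf - "$folder" | BUP_DIR=$repo bup split -n "$label"
}
```

The archive can later be retrieved with "bup join" and unpacked with tar.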


Push a backup to a remote server

TODO

This is possible; see the GitHub README for examples.
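
Until this section is written, here is a sketch that follows the example in the README: the -r option of "bup init" and "bup save" takes a remote repository in the form HOST:path. bup must be installed on the server and reachable via ssh. The function name push_backup is made up for this page, and I have not tested this myself.

```shell
#!/bin/sh
# Index a local folder and save it to a repository on a remote server.
# <remote> has the form HOST:path/to/remote-bup-dir
# Usage: push_backup <remote> <label> <folder>
push_backup() {
    remote=$1 label=$2 folder=$3
    bup init -r "$remote" &&
    bup index "$folder" &&
    bup save -r "$remote" -n "$label" "$folder"
}
```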


split/join vs. index/save/restore

What's the difference between split/join and index/save/restore?

  • "bup index" creates / updates an index for a given folder. The index is maintained in the bup repository for that folder. "bup save" checks all files in the index and runs "bup split" on each file. "bup restore" goes through the content of a backup and runs "bup join" on all of the files in the backup.
  • "bup split" and "bup join" work on a single file


TODO: Examples
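
As a first example, here is a sketch modelled on the README: a folder is packed into an uncompressed tar stream and stored with "bup split", then read back with "bup join". The function names split_folder and list_split are made up for this page.

```shell
#!/bin/sh
# Store a folder as one tar stream under the given label.
# Usage: split_folder <repo> <folder> <label>
split_folder() {
    tar -cf - "$2" | BUP_DIR=$1 bup split -n "$3"
}

# Read the stream back and list the files it contains.
# Usage: list_split <repo> <label>
list_split() {
    BUP_DIR=$1 bup join "$2" | tar -tf -
}
```

Replacing tar -tf with tar -xf in list_split would unpack the files instead of listing them.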


bupper

Website = https://github.com/tobru/bupper

bupper seems to be a nice frontend to bup, making bup even easier to use. The main point of bupper is that you can define so-called profiles. A profile is a collection of settings that together define

  1. The location to back up, and
  2. The location of a bup repository that receives the backup

A very interesting feature of bupper is that you can also specify excludes, i.e. files that you don't want to back up. Typically you would exclude files like gshadow etc.