BackupSolution

From HerzbubeWiki

This page describes how I back up my data.


Data sets

I am using several backup strategies for different sets of data. This section provides an overview by describing the different data sets.

Regular data 
This data set contains all personal and household documents (e.g. letters, spreadsheets, invoices, contracts, bank account data, personal notes, etc.). Some of the documents are personally created, but others were obtained from an external source and have become part of the data set in order to archive them. This data set is not too large (less than 10 GB at the time of writing) and has a relatively low growth rate. Files in the data set may change frequently.
Media files 
This data set contains music, images (except for the Photo Library), icons, travel souvenirs (sounds and small movies) and videos (except for the Movie Library). This data set is quite large (less than 200 GB at the time of writing) and has a relatively low growth rate (with spikes when new chunks are added). Files in the data set do not change a lot, the data set usually just grows over time. Metadata such as MP3 tags may be an exception.
Music Library 
This data set consists of the iTunes library data, excluding the actual music media files - those are part of the "Media files" data set. The Music Library therefore contains metadata that is also stored in the music media files (e.g. title, album, artist), personalized data (ratings, playlists), plus other data obtained from the Internet (e.g. album covers). The iTunes library data is treated separately from the music files in the "Media files" data set, and forms a data set in its own right, not because of the characteristics of the data, but because of the special nature of the process with which it must be backed up. iTunes does not allow its library data to be stored on a file server, so the data can only be added to the automated backup process by manual intervention. For details see the iTunes Library data section further down on this page. The Music Library is small (less than 1 GB at the time of writing) and has a very low growth rate (because it mainly consists of metadata). Some files in the data set change frequently (the metadata library), others remain quite static (album covers).
Photo Library 
This data set contains photos taken by one of the cameras in the household, usually while travelling. These photos are treated separately from the images in the "Media files" data set, and form a data set in their own right, because the Photo Library is stored separately from the images in the "Media files" data set, but also because the Photo Library data set has different characteristics than the "Media files" data set. The Photo Library is very large, almost as large again as the "Media files" data set. Also, the Photo Library is relatively fast growing (each trip adds a few more GB). Files in the data set do not change a lot, the data set usually just grows over time. Metadata such as Exif tags may be an exception (e.g. when I perform geo-tagging).
Movie Library 
This data set contains movies. These movies are treated separately from the videos in the "Media files" data set, and form a data set in their own right, simply because of the huge size of the data set (hundreds of GB). Files in the data set practically do not change, the data set usually just grows over time.
Software images 
This data set is an archive of disk images (e.g. Mac OS X install images, C64 floppy disk images), virtual environment images (e.g. VirtualBox machines, DOSBox game folders) and other chunks of data that can be seen as software images (e.g. old BIOS dumps). This data set is very large (more than 100 GB at the time of writing) and has a relatively low growth rate (with spikes when new images are added). Files in the data set practically never change, the data set usually just grows over time.
Software archive 
This data set is an archive of software packages. This data set is very large (more than 100 GB at the time of writing) and has a relatively low growth rate. Files in the data set do not change, the data set just grows over time.
Linux server data 
This data set contains various databases (e.g. MediaWiki, CardDAV contacts, CalDAV calendar data, LDAP data) in the form of nightly database dumps, Git repositories, and other data (e.g. users' home directories). This data set is not too large (less than 10 GB at the time of writing) and has a low growth rate. Files in the data set change frequently because the nightly database dumps usually differ from the previous dumps by at least a few bytes.
Linux server configuration 
This data set contains the configuration of the Linux server, which is mainly the /etc/ folder. This data set is small (less than 1 GB at the time of writing) and has an extremely low growth rate. Files in the data set do not change a lot.
Games data 
This data set contains images and videos originating from my playing computer games (e.g. screenshots and videos from Elite: Dangerous). This data set is very large (more than 100 GB at the time of writing) and has a relatively low growth rate (with spikes when I am on one of my periodic gaming sprees). Files in the data set do not change a lot because I don't do image/video editing.
ISFDB cover scans 
This data set contains scans of book covers I make while I work as editor for the ISFDB project. This data set is not too large (less than 20 GB at the time of writing) and has a relatively low growth rate. Files in the data set do not change, the data set just grows over time.
Data archive 
This data set is an archive of various chunks of data (old Windows data from the PC era of the 1990s, an archive of mailing lists from lists.herzbube.ch, etc.). This data set is not too large (less than 10 GB at the time of writing) and has a low growth rate. With the exception of a TrueCrypt encrypted container, files in the data set do not change, the data set just grows over time.


Backup strategies

Summary

Data set                   | Size           | On-site backup | Off-site copy | Snapshot | Manual copy | Encrypted backup | Remarks
Regular data               | < 10 GB        | n/a            | x             | x        | -           | x                |
Media files                | < 200 GB       | n/a            | x             | x        | -           | x                |
Music Library              | < 10 GB        | n/a            | (x)           | (x)      | x           | x                | The off-site copy and snapshot are made from an intermediate snapshot that needs to be manually updated.
Photo Library              | < 100 GB       | n/a            | (x)           | (x)      | x           | x                | The off-site copy and snapshot are made from an intermediate snapshot that needs to be manually updated.
Movie Library              | Hundreds of GB | n/a            | -             | -        | -           | -                | No backup. This data set is too large to back up.
Software images            | > 100 GB       | n/a            | -             | -        | -           | -                | No backup. This data set is too large to back up.
Software archive           | > 100 GB       | n/a            | -             | -        | -           | -                | No backup. This data set is too large to back up.
Linux server data          | < 10 GB        | x              | x             | x        | -           | x                |
Linux server configuration | < 1 GB         | x              | x             | x        | -           | x                |
Games data                 | > 100 GB       | n/a            | -             | -        | -           | -                | No backup. This data set is not important enough to be backed up (especially in view of its size).
ISFDB cover scans          | > 10 GB        | n/a            | -             | -        | -           | -                | No backup. This data set is not important enough to be backed up.
Data archive               | < 10 GB        | n/a            | x             | x        | -           | x                |


On-site backup

On pelargir an automated on-site backup job runs every night at 01:00. The details are on the wiki page Backupninja.

The resulting backup files are rotated in several stages: daily (7x), weekly (4x) and monthly (12x). Configuration details are on this wiki page.
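
The rotation counts above mean that at most 7 + 4 + 12 = 23 backup files are retained at any time. The following is a minimal sketch of how one stage of such a rotation could be pruned. It is not how Backupninja implements rotation: the function name prune_stage and the date-stamped file naming scheme (backup-YYYY-MM-DD.*) are assumptions made for illustration only.

```shell
#!/bin/sh
# Hypothetical helper, not part of Backupninja: keep the newest KEEP files
# in DIR whose names match backup-YYYY-MM-DD.*, and delete the older ones.
# Assumes date-stamped names, so that sorting by name also sorts by age.
prune_stage() (
    dir=$1
    keep=$2
    # List matching files newest first, skip the first KEEP, remove the rest.
    ls "$dir" | grep '^backup-[0-9-]*\.' | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r name; do
        rm -f -- "$dir/$name"
    done
)

# Retention counts taken from the text: 7 daily, 4 weekly, 12 monthly.
echo "max files retained: $((7 + 4 + 12))"
```

A daily stage would then be pruned with something like prune_stage /path/to/daily 7, where the path is a placeholder.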


Off-site copy

Every three days an automated job creates an off-site copy of the backup files created on-site on pelargir. The same job also creates a copy of some of the other data sets that are stored on the intranet file server.


The automated off-site copy job currently runs on a Raspberry Pi - see that wiki page for documentation of the cron job schedule. The scripts are documented on the BackupScripts wiki page. The copied data is placed on an external hard disk that is attached via USB to the Raspberry Pi. Important aspects of that storage medium:

  • This is not the same hard disk as the one from which the intranet file server serves data. Storing the data on two independent physical devices guards against physical hardware failure. Unfortunately, in case of file server data there is no geographical separation between the two storage devices, but that can't be helped for the moment.
  • The hard disk with the backup data is not accessible from intranet desktop machines via regular file server mechanisms (e.g. Samba). This makes sure that the backup data cannot be corrupted by malware, especially the recently emerging ransomware.
  • Because the hard disk is accessed only every three days, the mechanical and electrical parts of the device should not wear out for a long time, so premature disk failure is rather unlikely. If the disk spins for 3 hours on every backup (a conservative estimate) and 150 backups are made per year (this includes some unscheduled extra backups), the result is a mere 450 hours of usage per year.
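
The usage estimate in the last point can be checked with a few lines of shell arithmetic. The two input figures are taken from the text; the comparison against the number of hours in a year is an addition for context.

```shell
#!/bin/sh
# Figures from the text: 3 hours per backup run, 150 runs per year.
hours_per_backup=3
backups_per_year=150
hours_per_year=$((hours_per_backup * backups_per_year))

# A non-leap year has 365 * 24 = 8760 hours.
hours_in_year=$((365 * 24))
# Integer percentage of the year during which the disk is in use.
percent_in_use=$((hours_per_year * 100 / hours_in_year))

echo "disk usage: $hours_per_year hours per year (about $percent_in_use% of the year)"
```

In other words, the disk is idle for roughly 95% of the year.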


Snapshot

Every week an automated job creates a snapshot of the off-site copied data. This is achieved by feeding the data into a Time Machine-like tool. These weekly snapshots can be used to restore file versions and entire folder structures as they appeared in the past. Note that the tool must support data de-duplication to prevent excessive growth of the snapshot database.
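
To illustrate why de-duplication matters, here is a minimal sketch of one common technique: unchanged files are hard-linked to the previous snapshot instead of being copied again. This is not the tool used by the snapshot job; the function name make_snapshot and its arguments are hypothetical.

```shell
#!/bin/sh
# Sketch of hard-link de-duplication, NOT the tool used on this page.
# make_snapshot SRC PREV NEW: copy the tree SRC into NEW. Any file that is
# byte-identical to the file at the same path in the previous snapshot PREV
# is hard-linked instead of copied, so unchanged data is stored only once.
# PREV and NEW must be absolute paths on the same file system.
# Limitation: file names containing newlines are not handled.
make_snapshot() (
    src=$1
    prev=$2
    new=$3
    mkdir -p "$new" || exit 1
    cd "$src" || exit 1
    find . -type f | while IFS= read -r rel; do
        rel=${rel#./}
        mkdir -p "$new/$(dirname "$rel")"
        if [ -f "$prev/$rel" ] && cmp -s "$rel" "$prev/$rel"; then
            ln "$prev/$rel" "$new/$rel"      # unchanged: share the data
        else
            cp -p "$rel" "$new/$rel"         # new or modified: store a copy
        fi
    done
)
```

With this approach every snapshot looks like a complete folder tree, but only new or modified files consume additional disk space.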


The automated snapshot job currently runs on the same Raspberry Pi as the automated off-site copy job - see that wiki page for documentation of the cron job schedule. The scripts are documented on the BackupScripts wiki page. Snapshots are placed on yet another external hard disk, different from the one on which the off-site copy of the backup data lives, but also attached to the Raspberry Pi via USB. All considerations from the previous section regarding a physically separate storage medium also apply to this hard disk.


Encrypted backup

At least annually I create an encrypted backup of all critical data that I don't want to lose even if some disaster (fire, theft) strikes my home. Unfortunately I have to do this manually, so discipline is required.

The backup data is placed on an external hard disk that I then store somewhere away from home. The entire hard disk is encrypted so that I don't have to worry about theft or other unauthorized access.


Backup of desktop machines

Summary

Currently desktop machines (Mac, Windows) are not backed up automatically. Almost all of the important data is located on file servers, and programs can be re-installed. Some data, however, exists only locally on some of those desktop machines. Not all of that local data is worthy of backup, but here is a list of what I find important enough to warrant the effort of making a periodic manual backup:

  • Photo Library
  • iTunes Library data (the playlists and ratings, not the music files - these are already stored on the file server)


I manage the backup of this data via the following process:

  • I keep a snapshot of the data on the file server. This snapshot is picked up by the automatic backup processes described in the previous sections.
  • All I have to do is to update the snapshot on the file server from time to time by making a manual copy of the data. The next time one of the automated backup processes is running it will simply start to work with the updated snapshot.
  • Snapshots of Mac data are stored on the file server as sparse bundles which internally use the HFSX file system. This preserves any resource fork data and extended attributes.
  • Because I want the snapshot to change as little as possible I use rsync to update the snapshot. Unfortunately a few bands of the sparse bundle change every time that the disk image stored by the bundle is mounted, even if no data inside the disk image is actually modified. But that's a price I'm willing to pay for the ease of use afforded by sparse bundles.
  • The version of rsync that ships with Mac OS X (2.6.9) is ancient and doesn't properly handle resource forks and extended attributes. For this reason it is important that a modern version of rsync is installed on the system via a package manager such as Homebrew.
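
Whether a sufficiently recent rsync is first in the PATH can be checked before starting a manual copy. The following is a minimal sketch; the function names are hypothetical, and treating "major version 3 or higher" as modern is an assumption of this sketch rather than a documented requirement of the backup scripts.

```shell
#!/bin/sh
# parse_rsync_version: read the output of `rsync --version` on stdin and
# print the version number found on its first line, which looks like
# "rsync  version 2.6.9  protocol version 29" (some releases print "v3.x.y").
parse_rsync_version() {
    sed -n '1s/^rsync  *version v\{0,1\}\([0-9][0-9.]*\).*/\1/p'
}

# rsync_is_modern VERSION: succeed if the major version is 3 or higher.
# The threshold of 3 is an assumption of this sketch.
rsync_is_modern() (
    major=${1%%.*}
    [ "$major" -ge 3 ] 2>/dev/null
)

if command -v rsync >/dev/null 2>&1; then
    version=$(rsync --version 2>/dev/null | parse_rsync_version)
    if rsync_is_modern "$version"; then
        echo "rsync $version is recent enough"
    else
        echo "rsync '$version' is too old or unrecognized, install a newer one (e.g. via Homebrew)"
    fi
fi
```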


Photo Library

Why is the photo library not stored on the file server? Currently I manage the photo library with iPhoto. iPhoto allows the photo library to be stored on a disk image that is located on the file server, but then working with the photo library becomes unbearably slow. For this reason, the photo library must be stored locally on a desktop machine or on a fast external hard disk.


One of the following commands updates the snapshot on the file server:

# Use this command if the "alles-andere" share is not mounted on the machine.
# Important: DO NOT USE A TRAILING SLASH for the local folder to be snapshotted,
# otherwise rsync will place the CONTENT of the local folder into the root folder
# of the sparse bundle.
htb-mkbackupcopy.sh -s "//pi@raspberrypi1/alles-andere" "$HOME/Pictures/digikam" "Backup/Snapshots/Photo Library"

# Use this command if the "alles-andere" share is mounted on the machine for the current user.
# A numeric suffix possibly has to be added to the mount point (e.g. "/Volumes/alles-andere-1"),
# if some other user on the machine has mounted the share first.
htb-mkbackupcopy.sh "$HOME/Pictures/digikam" "/Volumes/alles-andere/Backup/Snapshots/Photo Library"
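
The trailing-slash pitfall mentioned in the comments above can be avoided by normalizing the source path before it is handed to the script. A minimal sketch, assuming a hypothetical helper named strip_trailing_slashes that is not part of htb-mkbackupcopy.sh:

```shell
#!/bin/sh
# strip_trailing_slashes PATH: print PATH without trailing "/" characters,
# so that rsync copies the folder itself rather than only its content.
# The root path "/" is returned unchanged.
strip_trailing_slashes() (
    path=$1
    while [ "$path" != "/" ] && [ "${path%/}" != "$path" ]; do
        path=${path%/}
    done
    printf '%s\n' "$path"
)

# rsync semantics being guarded against:
#   rsync -a src/  dest   copies the CONTENT of src into dest
#   rsync -a src   dest   creates dest/src
strip_trailing_slashes "$HOME/Pictures/digikam/"
```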


iTunes Library data

Why is the iTunes library not stored on the file server? Unfortunately iTunes does not allow the user to choose an arbitrary storage location for the iTunes library. It's possible to trick iTunes by creating a symbolic link that refers to a location on the file server, but this is a fragile solution that easily breaks for two reasons:

  • When multiple users mount the same network volume (e.g. a Samba share), Mac OS X unfortunately assigns mount points on a first-come, first-served basis. This means that mount points are not stable between reboots, and sometimes not even between login sessions without a reboot. This in turn means that the symbolic link to the iTunes library often points at the wrong location, requiring the symlink to be manually re-created every so often with an updated target location.
  • Sometimes the symbolic link gets deleted and needs to be re-created.

For these reasons I have decided to stop fighting iTunes and just accept that some data must be stored locally.
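
For reference, the manual re-creation of the symbolic link described above could be scripted roughly as follows. This is only a sketch of the abandoned workaround; the function name ensure_symlink is hypothetical, and the link and target paths would have to be supplied by the caller.

```shell
#!/bin/sh
# ensure_symlink LINK TARGET: make LINK a symbolic link to TARGET.
# Does nothing if LINK already points at TARGET. Refuses to touch LINK if
# it exists but is not a symbolic link (e.g. a real folder with data).
ensure_symlink() (
    link=$1
    target=$2
    if [ -L "$link" ]; then
        [ "$(readlink "$link")" = "$target" ] && exit 0
        rm -- "$link"                 # stale link: remove before re-creating
    elif [ -e "$link" ]; then
        echo "refusing to replace non-symlink: $link" >&2
        exit 1
    fi
    ln -s -- "$target" "$link"
)
```

Even with such a helper the target location still changes whenever the mount point changes, which is why the approach remains fragile.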


One of the following commands updates the snapshot on the file server:

# Use this command if the "alles-andere" share is not mounted on the machine
# Important: DO NOT USE A TRAILING SLASH for the local folder to be snapshotted,
# otherwise rsync will place the CONTENT of the local folder into the root folder
# of the sparse bundle.
htb-mkbackupcopy.sh -s "//pi@raspberrypi1/alles-andere" -d "Backup/Snapshots/iTunes Library.sparsebundle" "$HOME/Music/iTunes" "."

# Use this command if the "alles-andere" share is mounted on the machine for the current user.
# A numeric suffix possibly has to be added to the mount point (e.g. "/Volumes/alles-andere-1"),
# if some other user on the machine has mounted the share first.
htb-mkbackupcopy.sh -d "/Volumes/alles-andere/Backup/Snapshots/iTunes Library.sparsebundle" "$HOME/Music/iTunes" "."

# Use this command if the sparse bundle image is mounted on the machine for the current user.
htb-mkbackupcopy.sh "$HOME/Music/iTunes" "/Volumes/iTunes Library"
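
Because the mount point may carry a numeric suffix, it can be convenient to look it up instead of guessing. The following is a minimal sketch under stated assumptions: the function name find_share_mount is hypothetical, and probing for a known folder inside the share (here "Backup/Snapshots") is just one possible way to recognize the mount that belongs to the current user.

```shell
#!/bin/sh
# find_share_mount BASE SHARE MARKER: print the first directory among
# BASE/SHARE, BASE/SHARE-1, BASE/SHARE-2, ... that contains the folder
# MARKER and is writable by the current user. Fails if no candidate
# qualifies. Probing for a marker folder is an assumption of this sketch.
find_share_mount() (
    base=$1
    share=$2
    marker=$3
    for candidate in "$base/$share" "$base/$share"-[0-9]*; do
        if [ -d "$candidate/$marker" ] && [ -w "$candidate/$marker" ]; then
            printf '%s\n' "$candidate"
            exit 0
        fi
    done
    exit 1
)

# Example with the names used on this page.
if mount_point=$(find_share_mount /Volumes alles-andere Backup/Snapshots); then
    echo "share is mounted at: $mount_point"
else
    echo "share is not mounted for the current user"
fi
```

The resulting path can then be used as the prefix of the destination argument in the commands above.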


Backup history

My pre-2016 backup scheme required me to manually create backups and burn them to DVD. This section - although obsolete - shows how in the 13 years that I have been working on my self-hosting project I managed to create merely two of these manual backups. It is a monument both to my personal lack of discipline when it comes to making backups, and to the danger that is inherent in manual backups because they require such discipline. In 2016 a catastrophic hard disk failure on my MacMini occurred and - with the most recent backup seven (!) years out of date - made me realize how foolish I had been. I paid my dues to a data rescue service and then brought the era of manual backups in my household to an end.


Basic inventory:

osgiliath system backups:

  • 06.05.2007 (1 DVD)
  • 06.03.2009 (2 DVDs)

iPhoto library:

  • 05.03.2007 (1 DVD with albums 2001-2005, 1 DVD with albums 2006+2007)

Important data backup:

  • 08.05.2007 (2 DVDs)
  • 06.03.2009 (1 DVD)