demko.ca


How to Build a Backup System

Aleksander Demko, 2011

Backup is vital. Keeping your important data on finicky spinning disk platters with reading heads held precariously on a cushion of air or oil is a recipe for disaster. Your hard disks will die, crash, get stolen or just break. Data will be lost. If you don't have a backup system in place, you're just waiting for the inevitable.

For most people, I recommend an external USB (flash or disk) drive, depending on the amount of data in question. Online services might also be worth considering, if privacy and Internet bandwidth aren't issues. However, for geeks or those with lots of data, I recommend custom solutions. In this post I will discuss my solution, and some of the issues I faced.

My Setup

I run Linux, so my solutions use tools that come naturally to that OS, specifically rsync. For network backups I combine rsync with ssh (using keys). I prefer rsync to archiving solutions (like tar) because the backup is a plain directory tree of files, which makes file extraction and testing simple.

Stage 1

I have two stages to my backup strategy.

The first stage uses rsync to back up my /home directory (plus /root and /etc) daily, via cron, to a second drive in my workstation. This second drive (aka the internal backup drive) is dedicated to backup purposes. rsync is smart, of course, and will only copy files that have changed since the last backup.
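As a rough illustration of the single-snapshot version of this stage, here is a minimal Python sketch that assembles the rsync call. It is not my actual script: the destination path /mnt/backup/current and the function names are assumptions made up for the example, and the flags are just the common -a (archive) and --delete pair.

```python
import subprocess

# Hypothetical paths for illustration; adjust to your own layout.
SOURCES = ["/home", "/root", "/etc"]
BACKUP_DEST = "/mnt/backup/current"


def build_rsync_command(sources, dest):
    """Return the rsync argument list that mirrors `sources` into `dest`."""
    return [
        "rsync",
        "-a",        # archive mode: recurse, keep permissions, times, symlinks
        "--delete",  # drop files from the backup that are gone from the source
        *sources,    # no trailing slash, so /home lands in dest/home
        dest.rstrip("/") + "/",
    ]


def run_backup(sources=SOURCES, dest=BACKUP_DEST):
    """Run the backup; raises CalledProcessError if rsync exits non-zero."""
    subprocess.run(build_rsync_command(sources, dest), check=True)
```

Calling run_backup() from a small script that cron starts once a day (for example with a crontab entry such as "0 3 * * *" followed by the script path, both hypothetical) would give the daily schedule described above.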

However, I take rsync one step further and use it to keep 4 different backups (snapshots) a month. This way, if I accidentally deleted a file a few days ago, I can go back to the previous snapshot and retrieve it. Done naively, this would take 4 times as much disk space on my backup drive, which is a huge waste. To remove much of this duplication I use the --link-dest option of rsync, which reuses unchanged files from a previous snapshot (by hard-linking to them) when building a new one.

Now, you can do a simple one-snapshot rsync system for simplicity, but if you'd like to try this --link-dest stuff, you can have a look at my sched_rsync python script, which is part of my backup scripts collection (see below). It calculates the current snapshot directory (depending on the day of the week or month), and gives --link-dest the previous directory.
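To give a feel for the idea, here is a minimal sketch of such a calculation. It is not the sched_rsync script itself: the weekly slot rule, the snapshot0..snapshot3 directory names and the function names are all assumptions chosen for the example.

```python
import datetime
import os

SNAPSHOT_COUNT = 4  # snapshots kept per month, as described above


def snapshot_index(day, count=SNAPSHOT_COUNT):
    """Map a day of the month (1-31) to a slot 0..count-1, one slot per week.

    Days 29-31 are folded into the last slot.
    """
    return min((day - 1) // 7, count - 1)


def snapshot_dirs(backup_root, today):
    """Return (current, previous) snapshot directories for the date `today`."""
    current = snapshot_index(today.day)
    previous = (current - 1) % SNAPSHOT_COUNT  # slot 0 wraps around to slot 3
    return (
        os.path.join(backup_root, f"snapshot{current}"),
        os.path.join(backup_root, f"snapshot{previous}"),
    )


def build_snapshot_command(sources, backup_root, today):
    """Return an rsync argument list that hard-links unchanged files."""
    current, previous = snapshot_dirs(backup_root, today)
    return [
        "rsync",
        "-a",
        "--delete",
        f"--link-dest={previous}",  # unchanged files become hard links into here
        *sources,
        current + "/",
    ]
```

For example, build_snapshot_command(["/home"], "/mnt/backup", datetime.date.today()) yields a command that can be passed to subprocess.run. On the very first run the --link-dest directory will not exist yet; rsync then simply copies everything.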

Stage 2

The second stage is an off-site backup. Off-site backups add another layer of redundancy to the system. They're required in cases of fire, mass electrical problems, theft, etc.

For this purpose I keep 3 drives off-site. Every few months, I cycle one of the drives: I bring it home and rsync the contents of the internal backup drive to it.

For privacy and security, all drives (and especially the external ones) are full-disk encrypted. Under Linux, I simply use the built-in cryptsetup/LUKS system. The external drives don't have to be stored in a secure location, just a location you have occasional access to, such as a family member's house, a friend's place or even your day job's office. I use off-site disks as opposed to some kind of network backup service because this doesn't have a monthly fee, doesn't require large Internet pipes, provides cheap disk space and is more secure.

Backup Verification

I'm incredibly paranoid about data loss on the backup drives. In particular, the silent, hard to detect errors where a bit or two in some little sector in some large home movie decides to flip are particularly scary. These types of errors could go unnoticed for years until you actually try to retrieve the file, at which point it's too late. The solution? More scripts, software and testing policies.

When I bring a Stage 2 external backup drive in for updating, I perform the following steps:

This whole process can take a day, which may seem long. I consider it a small price to pay for assurance, and I also think it's good exercise for drives that otherwise live a dormant life.

Note that I also run some of the per-file checks from the last step, such as md5_check and test_attr, on my internal backup drive.
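The general idea behind a per-file checksum check can be sketched as follows. This is not the md5_check script and makes no claim about how it stores or compares checksums; the function names and the in-memory manifest are assumptions for illustration. The sketch records an MD5 for every file, then later reports any file that has gone missing or whose contents no longer match.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Return {relative_path: md5} for every regular file under `root`."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; only hash real file contents.
            if os.path.isfile(full) and not os.path.islink(full):
                manifest[os.path.relpath(full, root)] = file_md5(full)
    return manifest


def verify_manifest(root, manifest):
    """Return the sorted relative paths that are missing or no longer match."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_md5(full) != expected:
            bad.append(rel)
    return sorted(bad)
```

In practice the manifest would be saved to disk (as JSON, say) when the backup is written and checked again each time the drive comes home. A file that was changed on purpose will also show up as a mismatch, so a check like this fits best on data that is not expected to change between checks.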

Getting the Scripts

All the utilities mentioned here are available from GitHub. I'm providing the raw scripts for you to use as you wish, and I assume you know what you're doing. Since I'm not in the business of providing off-the-shelf backup software, I don't provide "support" or help for these simple scripts at all. Patches are, of course, welcome.