uwMike.com


Quick Server Backup

December 6th, 2005

Backup. Important topic. Booooooring topic. But I finally have a reliable mechanism in place: except for the logs, every file that’s here on the server is sitting safe on my hard drive, in a basement in Waterloo.

It’s a comfort.

I’ll demonstrate how I’ve done it, since I think it’s a pretty good procedure for keeping a regularly synchronized duplicate.

Assumption 1: You have a hosting account that you can shell into. This means a real command line, not just FTP.

Assumption 2: You have rsync on your local machine. I’ve provided instructions below for acquiring it on a Windows box. Any Linux distro will have it in its package repositories, and there are ways to run it on OS X.

Rsync

Rsync is a tool that lets you keep two locations synchronized. I use it to atomically launch changes to the Tron09 website. I can work, undisturbed, on a dev directory, and then once a feature is ready and tested, sync the two and push it “up to live.”

But rsync can synchronize across two separate computers as well. That’s what we’re going to do: synchronize a remote hosting account with a local folder. The first run will take a long time, but after that, it’ll only transfer what has changed since the last run.
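
For the local dev-to-live case, a minimal sketch might look like the following. The function name and the example paths are hypothetical, chosen only for illustration; the layout of the real site may well differ.

```shell
# Sketch: push a tested dev directory over the live one.
# push_live and both paths are hypothetical names for illustration.
push_live() {
    dev_dir=$1
    live_dir=$2
    # -r: recurse into subfolders; -v: list each file as it is copied.
    # The trailing slash on the source copies its contents, not the folder itself.
    rsync -rv "$dev_dir"/ "$live_dir"/
}

# Example: push_live /home/myusername/dev /home/myusername/live
```

Wrapping the command in a function keeps the two paths in one place, so the same call works for any pair of directories.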

Getting Rsync

On Windows, I used cwRsync. Once the installer was done, it had added a program group with a demo batch file containing a lot of REM lines (comments), and a few PATHs to help Windows out. Copy all of this out of there, and paste it into a new text file on your desktop, called backup.cmd.

At the bottom of this file, we’re going to put the actual rsync statement that will do the work.

Rsync Usage

Here’s the line from my file:

> rsync -rvu --exclude=logs myusername@foothill.dreamhost.com:/home/myusername/* /cygdrive/c/backup

The first directory given is the “from” path. It’s been specified as a network resource, and the syntax is pretty precise: it’s got to be “user at SSH server colon location.”

The second directory is where I want the files put locally. Again, it’s a strange path, but this is because of the way rsync (natively a Linux app) views the Windows filesystem. Linux has no concept of “drive letters”, and rsync already uses the colon to separate a host name from a path, so C:\backup would be read as a path on a server called C. The cygdrive approach avoids that: drive C becomes /cygdrive/c.
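
To make the mapping concrete, here is a small sketch of the conversion. The helper name to_cygdrive is hypothetical and it assumes a POSIX shell is available. Full Cygwin installs ship a cygpath tool for this job; the sketch does not assume cwRsync bundles it.

```shell
# Hypothetical helper: convert a Windows path such as C:\backup
# into the /cygdrive form used in the rsync command above.
to_cygdrive() {
    # $1 is a Windows path that starts with a drive letter, e.g. C:\backup
    drive=$(printf '%s' "$1" | cut -c1 | tr 'A-Z' 'a-z')   # "C" -> "c"
    rest=$(printf '%s' "$1" | cut -c3- | tr '\\' '/')      # "\backup" -> "/backup"
    printf '/cygdrive/%s%s\n' "$drive" "$rest"
}

to_cygdrive 'C:\backup'   # prints /cygdrive/c/backup
```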

Now, as for the -rvu bit, that’s actually three separate options: r for recursive; v for verbose; u for update. We want Rsync to copy all the subfolders rather than just the root, and given that the operation could take a while, it’s nice to have some verbose output to let us know what’s going on. The u tells rsync to skip any file whose local copy is newer than the one on the server.

And finally, --exclude=logs tells rsync to skip anything named logs, which in my case is the log directory. It’s pointless and time consuming for me to back this up. In a situation such as an e-commerce environment, it might be prudent to back logs up as well. It’s not necessary for a weblog, however.
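
The same options can be written in their long forms, which some people find easier to read back later. The sketch below also adds --dry-run, which is not part of the original line: it makes rsync list what it would transfer without copying anything, a cheap way to check the paths before the first long sync. The function name backup_preview is hypothetical.

```shell
# Sketch: preview a backup using the long option names.
# backup_preview is a hypothetical name; --dry-run is an addition for illustration.
backup_preview() {
    src=$1
    dest=$2
    # --recursive (-r): descend into subfolders
    # --verbose   (-v): print each file as it is handled
    # --update    (-u): skip files that are newer on the receiving side
    # --dry-run   (-n): report what would be transferred, copy nothing
    rsync --recursive --verbose --update --dry-run --exclude=logs "$src" "$dest"
}

# Example, using the placeholder account from the line above:
# backup_preview 'myusername@foothill.dreamhost.com:/home/myusername/*' /cygdrive/c/backup
```

Once the listing looks right, dropping --dry-run gives the real command.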

Keeping it Regular

Obviously, the synchronization is only going to happen when the batch job is run, right? For myself, I’ve put the backup in my Startup folder. If I know nothing has changed, I can simply dismiss it when it prompts for a password.

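If you’re feeling more ambitious, it would be totally possible to run this through the Windows Task Scheduler, or by some other devious means. An unattended run can’t answer the password prompt, though, so you’d need to set up SSH key authentication first.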

Gotchas

  • Linux filesystems support things called symlinks. If you use any of these in your sites, the command above won’t follow them or back them up: unless told otherwise, rsync skips symlinks. Adding the -L option (--copy-links) makes rsync copy the files the links point to, which avoids having to recreate the links themselves on Windows.

  • This is only a backup of the site files. For a database backup as well, you’ll need to run a cron job on the server that periodically dumps the DBs to files somewhere under your home directory. Then they’ll get scooped up along with everything else this procedure handles.

  • All Rsync can do is tell that a file has changed. It can’t dive into a 20 MB XML file and find the one line that you changed. (Subversion can do this, but it’s a much different beast than Rsync, since it also backs up every previous version of every file…)

Correction: Rsync does, in fact, do partial updates on changed files, thanks to the rsync algorithm: it compares the two copies block by block using checksums and sends only the blocks that differ.
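
As for the database dump mentioned in the second gotcha, a server-side script could be sketched roughly as follows. This assumes the databases are MySQL and that the mysqldump tool is available on the host, neither of which is stated above. The directory, function name, script path, and schedule are all hypothetical.

```shell
# Sketch of a server-side dump script, assuming MySQL and its mysqldump tool.
# DUMP_DIR and dump_databases are hypothetical names for illustration.
DUMP_DIR="$HOME/db_dumps"

dump_databases() {
    mkdir -p "$DUMP_DIR"
    # Assumes credentials live in ~/.my.cnf, so none appear on the command line.
    # Overwriting a single file each run keeps the directory from growing.
    mysqldump --all-databases > "$DUMP_DIR/all-databases.sql"
}

# Saved as a script that ends with a call to dump_databases, a crontab entry
# along these lines would run it nightly at 03:00 (path is hypothetical):
# 0 3 * * * /home/myusername/bin/dump_databases.sh
```

Because the dump lands under the home directory, the next rsync run picks it up like any other file.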

Summary

Backup is a pain to set up, but as I begin to host sites for people—even freebies for friends—it becomes more important to have fault recovery outside of what’s provided by a host.

And the cool thing about this technique is that it can sync the other way too. The next time I have an important desktop development project or writing assignment, I can sync it serverward for a free, co-located backup.

Mike


© 2004-2008, Mike Purvis, some rights reserved.