Quick Server Backup
[Back up](http://photomatt.net/2005/03/02/just-back-up/). Important topic. Booooooring topic. But I finally have a reliable mechanism in place: Except for the logs, every file that’s here on the server is sitting safe on my hard drive, in a basement in Waterloo.
It’s a comfort.
I’ll demonstrate how I’ve done it, since I think it’s a pretty good procedure for keeping a continuously synchronized duplicate.
**Assumption 1**: You have a hosting account that you can shell into. This means a real command line, not just FTP.
**Assumption 2**: You can get `rsync` running on your own machine. I’ve provided instructions for acquiring it on a Windows box; any Linux distro will have it in its package repositories, and there are [ways to run it on OSX](http://www.bombich.com/mactips/rsync.html).
### Rsync
`Rsync` is a tool that lets you keep two locations synchronized. I use it to push changes to the [Tron09][t09] website in a single step: I can work, undisturbed, on a dev directory, and then once a feature is ready and tested, sync the two and push it “up to live.”
But `rsync` can synchronize across two separate computers as well. That’s what we’re going to do: synchronize a remote hosting account with a local folder. The first run will take a *long* time, since every file has to come down; after that, it’ll only transfer what has changed.
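How does rsync know what has changed? By default it runs a “quick check”: a file whose size and modification time match on both sides is assumed unchanged and skipped. Here is a minimal Python sketch of that decision. `FileMeta` and `needs_transfer` are hypothetical names for illustration, not anything from rsync itself, and the real program layers options such as `-u` and checksum mode on top of this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical record of the two fields the quick check compares."""
    size: int   # bytes
    mtime: int  # modification time, whole seconds since the epoch


def needs_transfer(src: FileMeta, dst: Optional[FileMeta]) -> bool:
    """Return True if the file should be synced under a size + mtime quick check."""
    if dst is None:
        # Missing on the destination: copy it (this is why the first run is slow).
        return True
    # Matching size and timestamp: assume the file is unchanged and skip it.
    return src.size != dst.size or src.mtime != dst.mtime


if __name__ == "__main__":
    original = FileMeta(size=1024, mtime=1_100_000_000)
    edited = FileMeta(size=1030, mtime=1_100_000_500)
    print(needs_transfer(original, None))      # True: first run copies everything
    print(needs_transfer(original, original))  # False: unchanged file is skipped
    print(needs_transfer(edited, original))    # True: the edited file goes across
```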
### Getting Rsync
On Windows, I used [cwRsync](http://www.itefix.no/phpws/index.php?module=pagemaster&PAGE_user_op=view_page&PAGE_id=6&MMN_position=23:23). Once the installer was done, it had added a program group with a demo batch file containing a lot of REM lines (comments) and a few PATH settings to help Windows out. Copy all of it into a new text file on your desktop called `backup.cmd`.
At the bottom of this file, we’re going to put the actual `rsync` statement that will do the work.
### Rsync Usage
Here’s the line from my file:
> `rsync -rvu --exclude=logs myusername@foothill.dreamhost.com:/home/myusername/* /cygdrive/c/backup`
The first directory given is the “from” path. It’s been specified as a network resource, and the syntax is pretty precise: it’s got to be “user at SSH server colon location.”
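If you’d rather drive the backup from a script than retype the line, the command can be assembled as an argument list. This is a hypothetical Python helper (`build_pull_command` is my name, not part of rsync or cwRsync) that follows the same structure: options, then the “from” path in `user@host:path` form, then the local “to” path. One deliberate change: it uses a trailing slash (`/home/myusername/`) instead of `*`, which tells rsync to copy the directory’s contents. A shell `*` normally skips dot-files such as `.htaccess`; the trailing-slash form includes them.

```python
import shlex
from typing import List, Sequence


def build_pull_command(user: str, host: str, remote_dir: str, local_dir: str,
                       excludes: Sequence[str] = ()) -> List[str]:
    """Assemble rsync arguments that pull remote_dir from host into local_dir."""
    args = ["rsync", "-rvu"]  # recursive, verbose, skip files newer locally
    for pattern in excludes:
        args.append(f"--exclude={pattern}")
    # Source first ("from", in user@host:path form), destination second ("to").
    args.append(f"{user}@{host}:{remote_dir}")
    args.append(local_dir)
    return args


if __name__ == "__main__":
    cmd = build_pull_command("myusername", "foothill.dreamhost.com",
                             "/home/myusername/", "/cygdrive/c/backup",
                             excludes=["logs"])
    print(shlex.join(cmd))  # a copy-pasteable command line (Python 3.8+)
    # To run it for real: subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than one string means each path reaches rsync as a single argument, with no shell quoting to get wrong.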
The second directory is where I want the files put locally. Again, it’s a strange path, but that’s because cwRsync runs on Cygwin, a Unix compatibility layer for Windows, and sees the Windows filesystem the way a Unix program would. Unix has no concept of “drive letters,” and a colon would be hopelessly confusing (rsync uses it to mark a remote host), so Cygwin exposes each drive under `/cygdrive` instead: `/cygdrive/c/backup` is `C:\backup`.
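The drive-letter translation is mechanical, so it’s easy to sketch. Cygwin ships a real tool for this, `cygpath -u`; the Python function below is just a hypothetical stand-in showing the rule: drop the colon, lower-case the drive letter, flip backslashes to forward slashes, and prefix `/cygdrive/`. It assumes the default `/cygdrive` prefix, which Cygwin lets you change.

```python
import re


def to_cygdrive(windows_path: str) -> str:
    """Translate a drive-letter path such as C:\\backup into /cygdrive/c/backup."""
    match = re.match(r"^([A-Za-z]):[\\/]?(.*)$", windows_path)
    if match is None:
        raise ValueError(f"not a drive-letter path: {windows_path!r}")
    drive, rest = match.groups()
    # Forward slashes throughout, and no trailing slash on the result.
    rest = rest.replace("\\", "/").rstrip("/")
    return f"/cygdrive/{drive.lower()}" + (f"/{rest}" if rest else "")
```

For example, `to_cygdrive("D:\\Sites\\mike\\")` gives `/cygdrive/d/Sites/mike`.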
Now, as for the `-rvu` bit, that’s actually three separate options: r for recursive; v for verbose; u for update, which skips any file whose local copy is already newer than the server’s. We want rsync to copy all the subfolders rather than just the root, and given that the operation could take a while, it’s nice to have some verbose output to let us know what’s going on.
And finally, `--exclude=logs` tells `rsync` to skip the logs directory. Backing it up would be pointless and time-consuming for me. In an e-commerce environment, it might be prudent to back up logs as well; for a weblog, it’s not necessary.
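Exclude patterns follow a rule worth knowing: a pattern with no slash in it, like `logs`, is compared against each file or directory *name*, not the whole path, so it matches a `logs` folder at any depth, and rsync never descends into an excluded directory. Below is a rough Python approximation of that rule using shell-style wildcards via `fnmatch`. It is a simplification under stated assumptions: anchored patterns, `**`, trailing-slash (directory-only) patterns, and `--include` rules are all ignored, and `is_excluded` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Sequence


def is_excluded(rel_path: str, patterns: Sequence[str]) -> bool:
    """Approximate rsync's --exclude for slash-free patterns such as "logs" or "*.tmp".

    Each pattern is tested against every name in the path. Matching a directory
    name excludes everything beneath it, since rsync never descends into it.
    """
    names = PurePosixPath(rel_path).parts
    return any(fnmatchcase(name, pattern) for name in names for pattern in patterns)


if __name__ == "__main__":
    for path in ["logs/access.log", "example.com/logs/error.log",
                 "example.com/blog/index.php", "example.com/changelogs.txt"]:
        print(path, is_excluded(path, ["logs"]))
```

Note that `changelogs.txt` survives: the whole name has to match, not just part of it.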
### Keeping it Regular
Obviously, the synchronization is only going to happen when the batch job is run, right? For myself, I’ve put the backup in my Startup folder. If I know nothing has changed, I can simply dismiss it when it prompts for a password.
If you’re feeling more ambitious, it would be entirely possible to run this through the Windows Task Scheduler, or by some other devious means.
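For the Task Scheduler route, Windows includes a command-line front end, `schtasks`. Below is a hypothetical Python helper that builds a daily `/Create` call for `backup.cmd`; the task name, the path (kept free of spaces to avoid quoting trouble), and the 2 AM start time are placeholders. One catch: a scheduled run can’t answer the password prompt, so unattended backups would need SSH key authentication set up with your host first.

```python
from typing import List


def schtasks_daily(task_name: str, script_path: str, start_time: str = "02:00") -> List[str]:
    """Build a schtasks.exe call that runs script_path once a day.

    start_time uses the HH:mm form that current Windows versions accept; this is
    an assumption, since older releases may require HH:mm:ss.
    """
    return ["schtasks", "/Create", "/SC", "DAILY",
            "/TN", task_name, "/TR", script_path, "/ST", start_time]


if __name__ == "__main__":
    # Placeholder location for the backup.cmd built earlier.
    cmd = schtasks_daily("Server backup", r"C:\backup\backup.cmd")
    print(cmd)
    # On Windows, register it with: subprocess.run(cmd, check=True)
```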
### Gotchas
* Linux filesystems support things called symlinks. If you use any of these in your sites, they won’t make it into the backup: with only `-r`, rsync skips symlinks rather than copying them, and there’s no clean equivalent to recreate them with on the Windows side. (Adding `-L` tells rsync to copy the files they point to instead.)
* This is only a backup of the site files. For a database backup as well, you’ll need a cron job on the server that periodically dumps the databases to files. Then they’ll get scooped up along with everything else this procedure handles.
* All Rsync can do is tell that a file _has_ changed. It can’t dive into a 20 MB XML file and find the one line that you changed. ([Subversion](http://subversion.tigris.org/) can do this, but it’s a much different beast than Rsync, since it also *backs up* every previous version of every file…)
Correction: Rsync *does*, in fact, do partial updates on changed files, thanks to the [rsync algorithm](http://en.wikipedia.org/wiki/Rsync#Algorithm).
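The trick that makes partial updates cheap is a *rolling* checksum. The receiver splits its copy of a file into fixed-size blocks and sends a checksum for each; the sender then slides a window one byte at a time through its version, and because the weak checksum can be updated in constant time as one byte leaves and another enters, it can spot matching blocks at any offset. Only the unmatched bytes cross the wire. Below is a minimal Python sketch of that weak checksum as the rsync algorithm describes it; real rsync also confirms each candidate match with a stronger hash and uses a hash table for lookups, both omitted here.

```python
from typing import Iterator, Tuple

M = 1 << 16  # the two running sums are kept modulo 2**16


def weak_checksum(block: bytes) -> Tuple[int, int]:
    """Compute the (a, b) sums for one block from scratch."""
    n = len(block)
    a = sum(block) % M
    # Earlier bytes get larger weights: the first byte counts n times, the last once.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def rolling_checksums(data: bytes, n: int) -> Iterator[int]:
    """Yield the combined checksum a + 2**16 * b of every n-byte window in data.

    After the first window, each step is O(1): drop the outgoing byte, add the
    incoming one, and fix up b using the new a.
    """
    if n <= 0 or len(data) < n:
        return
    a, b = weak_checksum(data[:n])
    yield a + (b << 16)
    for k in range(len(data) - n):
        outgoing, incoming = data[k], data[k + n]
        a = (a - outgoing + incoming) % M
        b = (b - n * outgoing + a) % M
        yield a + (b << 16)


if __name__ == "__main__":
    text = b"The quick brown fox jumps over the lazy dog"
    window = 8
    for offset, rolled in enumerate(rolling_checksums(text, window)):
        a, b = weak_checksum(text[offset:offset + window])
        assert rolled == a + (b << 16)  # rolling matches a fresh computation
    print("all windows agree")
```

Each rolled value equals a fresh `weak_checksum` of the same window, which is what lets the sender scan a large file without recomputing every block from scratch.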
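To see which files the symlink gotcha would affect before trusting the backup, you could list the links on the server (`find ~ -type l` does the same job from the shell). Here is a small Python sketch, assuming Python is available on your host; `find_symlinks` is a hypothetical helper that just reports the entries a plain `-r` run would leave behind.

```python
import os
from typing import List


def find_symlinks(root: str) -> List[str]:
    """Return the symlinks under root as paths relative to root, sorted."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk does not descend into linked directories, but it still lists
        # them in dirnames, so both lists have to be checked.
        for name in dirnames + filenames:
            full_path = os.path.join(dirpath, name)
            if os.path.islink(full_path):
                found.append(os.path.relpath(full_path, root))
    return sorted(found)
```

Calling `find_symlinks(os.path.expanduser("~"))` on the server prints every link under your home directory.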
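For the database gotcha, the server-side piece might look like the following Python sketch, assuming a MySQL database and the standard `mysqldump` client. Host, user, database name, and dump directory are all hypothetical placeholders, and the password is expected in a `~/.my.cnf` option file rather than on the command line. Because the dumps land inside your home directory, the rsync pull picks them up with everything else.

```python
import datetime
import subprocess
from pathlib import Path
from typing import List

# Hypothetical placeholders: substitute your own host, user, and database.
# mysqldump reads the password from the [client] section of ~/.my.cnf.
DB_HOST = "mysql.example.com"
DB_USER = "myusername"
DB_NAME = "wordpress"
DUMP_DIR = Path.home() / "db-backups"


def dump_command(day: datetime.date) -> List[str]:
    """Build a mysqldump call that writes one dated .sql file, e.g. wordpress-2006-03-02.sql."""
    outfile = DUMP_DIR / f"{DB_NAME}-{day.isoformat()}.sql"
    return [
        "mysqldump",
        f"--host={DB_HOST}",
        f"--user={DB_USER}",
        f"--result-file={outfile}",
        DB_NAME,
    ]


def run_dump() -> None:
    """Create the dump directory if needed and write today's dump (meant for cron)."""
    DUMP_DIR.mkdir(parents=True, exist_ok=True)
    subprocess.run(dump_command(datetime.date.today()), check=True)
```

A crontab entry along the lines of `30 3 * * * cd $HOME && python3 -c 'import dump_db; dump_db.run_dump()'` would run it nightly, assuming the file is saved as `~/dump_db.py`. Dated filenames mean old dumps accumulate, so you’d eventually want to prune them.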
### Summary
Backup is a pain to set up, but as I begin to host sites for people—even freebies for friends—it becomes more important to have fault recovery outside of what’s [provided by a host](http://www.squarefree.com/2005/05/02/snapshots-on-dreamhost/).
And the cool thing about this technique is that it can sync the *other* way too. The next time I have an important desktop development project or writing assignment, I can sync it serverward for a free, co-located backup.
Mike
[t09]: http://tron09.com
