Tuesday, August 11, 2009

Complete site backups in under five minutes

I've changed how site backups for Rosetta Code work.

Previously, a site backup was a rather manual affair of mysqldump, tar and scp. I've got a fair number of large tarballs that contain only the contents of the httpd root and the database SQL dumps. Consumed space and bandwidth grow fairly quickly; this is part of why incremental backup strategies get devised.

Now, I have a set of nested makefiles on an offsite system with a few different targets. The root-level makefile has 'backup', 'recurse' and 'git' targets. The 'backup' target depends on the other two. The 'recurse' target drops into a few different subdirectories, one each for databases, webroot and logs. Each subdirectory has its own makefile with a 'backup' target.
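As a rough illustration of that layout, here is a small Python stand-in for the target structure. It is a sketch, not the actual makefiles: the subdirectory names and the `make -C <dir> backup` invocation are assumptions made for the example.

```python
"""Sketch of the root makefile's target layout (illustrative only)."""

# Assumed subdirectory names; each is expected to hold its own makefile
# with a 'backup' target.
SUBDIRS = ["databases", "webroot", "logs"]

# Target -> prerequisites, mirroring the root makefile described above.
TARGETS = {
    "backup": ["recurse", "git"],
    "recurse": [],
    "git": [],
}


def recurse_commands(subdirs=SUBDIRS):
    """Commands the 'recurse' target would run: one sub-make per subdirectory."""
    return [["make", "-C", d, "backup"] for d in subdirs]


def build_order(target, targets=TARGETS, seen=None):
    """Return targets in the order a serial make would build them:
    prerequisites left to right, then the target itself."""
    if seen is None:
        seen = []
    for dep in targets[target]:
        build_order(dep, targets, seen)
    if target not in seen:
        seen.append(target)
    return seen
```

Calling `build_order("backup")` lists 'recurse' before 'git', which matches the requirement that the data is copied before the repository is updated.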

The databases subdirectory connects to the server, has the server do a mysqldump of the databases to a server-local file, and then uses rsync to copy the SQL dump file to the local system.
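The two steps of the database backup might look something like the following. This is a minimal sketch that only builds the command lines; the host name, the paths, the `--all-databases` choice and the assumption that MySQL credentials are already configured on the server are all hypothetical.

```python
import shlex


def database_backup_commands(host, remote_dump, local_dir):
    """Build the two commands for a database backup:
    1. have the server dump its databases to a server-local file,
    2. pull that file to the local system with rsync.
    """
    # Runs on the server via ssh; the redirect is interpreted by the remote shell.
    # Assumes mysqldump can find its credentials on the server (hypothetical setup).
    dump = "mysqldump --all-databases > " + shlex.quote(remote_dump)
    return [
        ["ssh", host, dump],
        # -a preserves file attributes, -z compresses data in transit.
        ["rsync", "-az", f"{host}:{remote_dump}", local_dir],
    ]
```

Each returned list could be handed to `subprocess.run(cmd, check=True)` in order, so that a failed dump stops the run before rsync copies a stale file.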

The webroot subdirectory uses rsync to copy the webroot to the local system.

The logs subdirectory uses rsync to copy system log files to the local system.
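The webroot and logs backups both come down to a single rsync pull. A sketch of such a command builder follows, with the caveat that the flags and the example paths are assumptions rather than the exact ones used here.

```python
def rsync_pull(host, remote_path, local_dir, delete=False):
    """Build an rsync command that copies remote_path on host into local_dir.

    A trailing slash on remote_path copies the directory's contents rather
    than the directory itself.
    """
    cmd = ["rsync", "-az"]  # -a: archive mode, -z: compress in transit
    if delete:
        # Remove local files that no longer exist on the server, so the
        # local copy mirrors the remote tree.
        cmd.append("--delete")
    cmd += [f"{host}:{remote_path}", local_dir]
    return cmd
```

For example, `rsync_pull("backup@example.org", "/var/www/", "webroot/", delete=True)` would mirror a hypothetical webroot, while the logs pull could leave `delete` off so that rotated-away log files stay in the local copy.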

After running the 'recurse' target, the root makefile runs the 'git' target, which updates a local git repository with the modifications since the last backup. This is relatively cheap: since the data has already been copied to the local system, the server isn't loaded down with the subsequent processing.
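The git step can be pictured as staging everything and committing it. The sketch below builds those commands under the assumption that the backup directory is already a git repository; the commit message format and the use of `git -C` are illustrative choices, not necessarily what the 'git' target does.

```python
from datetime import datetime, timezone


def git_snapshot_commands(repo_dir, when=None):
    """Build the commands that record the current backup state in git."""
    when = when or datetime.now(timezone.utc)
    message = "Backup " + when.strftime("%Y-%m-%d %H:%M UTC")
    return [
        # Stage additions, modifications and deletions alike.
        ["git", "-C", repo_dir, "add", "-A"],
        # Note: git commit exits non-zero when nothing has changed, so a
        # caller may want to tolerate that case rather than treat it as an error.
        ["git", "-C", repo_dir, "commit", "-m", message],
    ]
```

Because git stores only what changed between commits, each run adds roughly the size of the new and modified files to the repository rather than a full copy of the site.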

Once the whole thing has been primed (the first backup takes quite a while, as all of the data has to be copied), a full backup run takes less than five minutes to save off the changes from an hour's worth of subsequent site traffic.

The biggest problem with the system is that server-side CPU usage is fairly heavy during the mysqldump and rsync work. Server-side work currently takes four of the five minutes, while the git processing takes the rest. Hopefully, I'll be able to offset this by moving some batch processing typically done on the server offsite, onto better, faster hardware.

In the near future, I plan to add chunks of /etc to the backup process.
