Nothing here is really specific to GitHub, but they are pretty full of awesomeness, so I figure everyone might as well use them for their git hosting needs.
I am a little paranoid about keeping extra copies of my work around, especially my private repositories, which are not naturally distributed amongst others via forks and whatnot. As such, I wanted an automated backup solution for my repos.
Below is a simple bash script that I installed in cron to run once every three hours for several of my repositories. It dumps a timestamped tarball of each repository into a location that is already getting backed up to an offsite storage solution. Running every three hours gives me eight snapshots per day, which should be plenty.
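#!/bin/bash
# Back up a GitHub repository as a timestamped tarball of a mirror clone.
set -e
usage() {
    echo >&2 "Usage: $0 <USER>/<REPO>"
    exit 1
}
test $# -eq 1 || usage
REPO="$1"
TSTAMP=$(date "+%Y%m%d-%H%M%S")
BACKUP_DIR="/backups/git/$REPO"
# Replace the slash in USER/REPO so the name works as a file name
BACKUP_BASE_NAME="$TSTAMP.${REPO/\//-}"
# Make the backup directory if it doesn't exist
mkdir -p "$BACKUP_DIR"
# Work in a scratch directory so nothing depends on cron's working directory
WORK_DIR=$(mktemp -d)
trap 'rm -rf "$WORK_DIR"' EXIT
cd "$WORK_DIR"
# Clone a mirror (all branches, tags, and other refs)
git clone --mirror "git@github.com:$REPO.git" "$BACKUP_BASE_NAME.git"
# Tarball it up
tar zcf "$BACKUP_BASE_NAME.tar.gz" "$BACKUP_BASE_NAME.git"
# Move the tarball into place; the scratch directory is cleaned up on exit
mv "$BACKUP_BASE_NAME.tar.gz" "$BACKUP_DIR"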
This is just making a local backup on whatever machine you schedule this to run on. It's still a good idea to have another job push everything off your server and onto an offsite storage solution (e.g. Amazon S3).
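For the scheduling side, here is a rough sketch of how the "once every three hours" cron entries could be generated for several repositories. The `cron_lines` helper, the script path, and the repository names are all made up for illustration; adjust them to wherever you saved the script.

```shell
# Sketch: print one crontab line per repository, each running every three hours.
# cron_lines, the script path, and the repo names are placeholders.
cron_lines() {
    local script="$1" repo
    shift
    for repo in "$@"; do
        # Minute 0 of hours 0, 3, 6, ..., 21 gives eight runs per day
        printf '0 */3 * * * %s %s\n' "$script" "$repo"
    done
}

# Example (review the output, then paste it in via `crontab -e`):
#   cron_lines /usr/local/bin/github-backup.sh alice/project alice/dotfiles
```

Printing the lines rather than installing them directly keeps the step reviewable, since piping straight into `crontab` would replace whatever entries are already there.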
There are also a number of improvements that could be made to this. For instance, instead of blindly creating a fresh clone every time, we could keep a reference clone around, compare its state before and after a git fetch, and skip making a tarball if nothing changed since the last run.
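The reference-clone idea could look something like the sketch below. It assumes a mirror that was created once with `git clone --mirror` and is kept on disk between runs; the function names `refs_state` and `mirror_changed` are hypothetical helpers, not part of the script above.

```shell
# Sketch: detect whether a fetch brought anything new into a persistent mirror.
# refs_state and mirror_changed are illustrative names.

# Print every ref in the mirror together with the object it points to.
refs_state() {
    git --git-dir="$1" for-each-ref --format='%(objectname) %(refname)'
}

# Fetch into the mirror and compare the refs before and after.
# Returns 0 if something changed, 1 if nothing changed, 2 if git failed.
mirror_changed() {
    local mirror="$1" before after
    before=$(refs_state "$mirror") || return 2
    git --git-dir="$mirror" fetch --prune --quiet origin || return 2
    after=$(refs_state "$mirror") || return 2
    [ "$before" != "$after" ]
}
```

The backup script would then only build a tarball when `mirror_changed` returns 0. Because a failed fetch returns 2 rather than 1, it is worth checking the exit status explicitly instead of treating every non-zero result as "nothing changed".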
How do you back up your git repositories?