Backup Script for GitHub

Really, there is nothing about this that is specific to GitHub but they are pretty full of awesomeness so I figure everyone might as well use them for their git hosting needs.

I am a little paranoid about keeping extra copies of my work around, especially my private repositories that are not naturally distributed amongst others via forks and whatnot. As such, I wanted an automated backup solution for my repositories.

Below is a simple bash script that I installed in cron to run once every three hours for several of my repositories. It dumps each backup into a location that is already being backed up to an offsite storage solution. This gives me eight snapshots per day, which should be plenty.

#!/bin/bash

usage() {
    echo >&2 "Usage: $0 <USER>/<REPO>"
    exit 1
}

set -e

test $# -eq 1 || usage
REPO="$1"
TSTAMP=$(date "+%Y%m%d-%H%M%S")
BACKUP_DIR="/backups/git/$REPO"
BACKUP_BASE_NAME="$TSTAMP.${REPO/\//-}"

# Make backup directory if it doesn't exist
mkdir -p "$BACKUP_DIR"

# Clone a mirror
git clone --mirror "git@github.com:$REPO.git" "$BACKUP_BASE_NAME.git"

# Tarball it up
tar zcf "$BACKUP_BASE_NAME.tar.gz" "$BACKUP_BASE_NAME.git"

# Clean up
rm -rf "$BACKUP_BASE_NAME.git"
mv "$BACKUP_BASE_NAME.tar.gz" "$BACKUP_DIR"
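For reference, the cron entry that runs this every three hours looks something like the following (the script path and repository name are placeholders for your own):

```
# m  h    dom mon dow  command
0    */3  *   *   *    /usr/local/bin/github-backup.sh myuser/myrepo
```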

This only makes a local backup on whatever machine you schedule it to run on. It's still a good idea to have another job push everything off that server and onto an offsite storage solution (e.g. Amazon S3).
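That job can be as simple as one more cron entry syncing the backup directory. This sketch assumes the AWS CLI is installed and that the bucket name (here, hypothetically, my-git-backups) is one you have already created:

```
# Push local backups offsite hourly (bucket name is a placeholder)
30  *  *  *  *  aws s3 sync /backups/git s3://my-git-backups/git --quiet
```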

There are also a number of improvements that could be made to this. For instance, instead of blindly creating a fresh clone every run, the script could maintain a persistent reference clone and check whether anything has changed since the last run by comparing the repository's ref state before and after a git fetch; if nothing is different, it could skip making a tarball.
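Here is a sketch of that idea, demonstrated against a throwaway local repository so it can run anywhere; in the real script the remote would be the git@github.com:USER/REPO.git URL and the mirror would live somewhere persistent like /backups/git/mirrors:

```shell
#!/bin/bash
# Change-detection sketch: keep a persistent mirror and hash its ref state
# before and after fetching; only make a tarball when the hashes differ.
# Uses a throwaway local "origin" repo so the demo runs offline.
set -e
WORK=$(mktemp -d)
cd "$WORK"

# Stand-in for the remote GitHub repository.
git init -q origin.git
git -C origin.git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"

# The persistent reference clone that survives between backup runs.
git clone -q --mirror origin.git mirror.git

# Hash the complete ref state of the mirror.
ref_state() { git -C mirror.git for-each-ref | sha1sum; }

BEFORE=$(ref_state)
git -C mirror.git remote update --prune >/dev/null 2>&1
AFTER=$(ref_state)
if [ "$BEFORE" = "$AFTER" ]; then
    echo "no changes; would skip tarball"
fi

# Simulate new upstream work, then fetch again.
git -C origin.git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "second"
BEFORE=$(ref_state)
git -C mirror.git remote update --prune >/dev/null 2>&1
AFTER=$(ref_state)
if [ "$BEFORE" != "$AFTER" ]; then
    echo "changes detected; would make tarball"
fi
```

The mirror doubles as the backup source, so the fetch that detects changes is also the fetch that brings them down.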

How do you back up your git repositories?