Optimizing Long-term Backups

CaptainCore is now connected to over 1700 WordPress sites. Not bad considering it’s not yet publicly available. Have I ever mentioned building products is hard? 🤣 Every WordPress site is backed up daily to a Restic backup repository stored in a B2 bucket, which is a very cost-effective way to store long-term backups. Most of these sites have backups going back to Oct 2020, when I first rolled out this new Restic-infused backup solution. That means some websites, like this one, have over 900 fully restorable backup snapshots.

When I say cost-effective, I mean B2 storage is very cheap. 1TB of B2 storage is roughly $5/month. Compare that to Google Cloud persistent disk storage, which is roughly $100/month for 1TB. Unfortunately, getting data into a Restic backup repository requires a local copy of the files, which means I still need to pay a premium for local storage.

Currently, data is incrementally copied from each WordPress site to the main CaptainCore server and then pushed incrementally to Restic repositories. Only changed data is transferred, so this all happens fairly quickly. Ideally, I’d like to remove the need for local storage. But how? Let’s explore some alternative ideas.
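For reference, the current flow looks roughly like this. It’s a simplified sketch: the remote name, bucket path, and key/excludes files mirror the commands later in this post, while the local directory is just a placeholder.

# Current method (sketch): pull changes down to local storage, then back up the local copy
# 1. Incremental copy from the WordPress site to the CaptainCore server
rclone sync production:$home_directory /local/backups/sitename/production/ --config=rclone.conf
# 2. Push the local copy into the long-term Restic repository on B2
restic backup /local/backups/sitename/production/ --repo rclone:B2:BucketName/sitename/production/restic-repo --password-file="restic.key" --exclude-file="restic-excludes"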

Running Restic directly from WordPress could work but isn’t worth the risk

One idea would be to run Restic directly from WordPress over SSH. While this might be technically possible with many web hosts, it also means the WordPress site would have credentials to talk to the B2 bucket. This same B2 bucket also contains other customers’ backups, so I can’t see any secure way to do that. Maybe there is a way to use a local copy of Restic and talk to just the file system over SSH? Some discussion on that here. I can’t risk my primary B2 key getting intercepted by customers, so I’m going to scrap the idea.
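To make the credentials problem concrete, here’s roughly what running the backup on the web host itself would involve. This is illustrative only: it uses Restic’s native B2 backend and an assumed ~/public_html web root, and the point is that the primary B2 key would have to be stored on the customer’s server.

# Illustrative only – these credentials would have to live on the customer's host
export B2_ACCOUNT_ID="..."        # primary B2 key ID
export B2_ACCOUNT_KEY="..."       # primary B2 application key
restic backup ~/public_html --repo b2:BucketName:sitename/production/restic-repo --password-file="restic.key"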

Experiments with Rclone mount and Restic backup

My next idea is to mount a remote SFTP connection to a local folder with rclone mount, then run restic backup on that mount. This will take a performance hit compared to running Restic against local files, but the potential cost savings are worth the effort to find out if it’s possible.

Experiment #1 – Backing up SFTP connection to B2 Restic repository.

# New empty directory for mount
mkdir backup
# Mount SFTP connection
rclone mount production: backup --config=rclone.conf --daemon --read-only
# Run Restic backup to B2 bucket
cd backup
restic backup . --repo rclone:B2:BucketName/sitename/production/restic-repo --password-file="restic.key" --exclude-file="restic-excludes" --no-scan --ignore-inode --read-concurrency=4
# Close mount
cd ..
umount backup/

This seems to work OK. It’s definitely slower, but it’s not unreasonable. The one thing I don’t like is the long connection time to the WordPress site. Typically, running an rclone sync command grabs any changes from the WordPress site in a few seconds or minutes. Here, the SFTP mount has to stay connected for the entire Restic backup, which I don’t like.

Experiment #2 – Rclone sync to B2 bucket, then push B2 bucket to B2 Restic repository.

My theory is SFTP over rclone mount is going to be slower than B2 over rclone mount. Even if the performance ends up being the same as the previous attempt, this approach offers two advantages:

  • I can organize the data how I want. That means I can put the database backup alongside the website files in the backup. That’s not possible when running Restic directly against the site.
  • SFTP connection time will be minimal. rclone sync can very quickly work out which files have changed, so the time spent talking to the SFTP connection stays short and the bulk of the backup work can be safely offloaded to the CaptainCore server.
# Incrementally back up the WordPress site and database to the B2 bucket
rclone sync production:$home_directory B2:BucketName/sitename/production/backup/ --exclude-from="excludes" --config=rclone.conf
rclone sync production:$database_backup B2:BucketName/sitename/production/backup/ --config=rclone.conf

# New empty directory for mount
mkdir backup
# Mount B2 bucket
rclone mount B2:BucketName/sitename/production/backup/ backup --config=rclone.conf --daemon --read-only
# Run Restic backup to B2 bucket
cd backup
restic backup . --repo rclone:B2:BucketName/sitename/production/restic-repo --password-file="restic.key" --exclude-file="restic-excludes" --no-scan --ignore-inode --read-concurrency=3
# Close mount
cd ..
umount backup/

Backup experiments compared

A customer of mine has a large WordPress site: 98.3 GB with 792,065 files. I picked this one for backup performance testing because it’s abnormally large. Here are three backup duration times using the current backup method (rclone sync to CaptainCore, then restic backup to B2).

  • March 26th 2023 4:02 pm – 4 minutes and 43 seconds
  • March 27th 2023 4:46 pm – 7 minutes and 17 seconds
  • March 28th 2023 5:53 pm – 5 minutes and 52 seconds

The initial rclone sync to the B2 bucket took a very long time to complete, over 24 hours. That’s because running a sync from a remote SFTP to B2 means each file had to be downloaded and then uploaded. After that initial sync was done, here are the duration times using the new backup method (rclone sync from SFTP to B2, then restic backup from the B2 files to the B2 Restic repo).

  • March 30th 2023 7:46 pm – 31 minutes and 59 seconds
  • March 31st 2023 7:23 pm – 21 minutes and 34 seconds
  • April 1st 2023 6:05 pm – 24 minutes and 41 seconds
  • April 2nd 2023 6:12 pm – 23 minutes and 49 seconds
  • April 3rd 2023 6:04 pm – 26 minutes and 40 seconds
  • April 4th 2023 5:03 pm – 23 minutes and 10 seconds

The new method is a huge cost saver because I won’t need local storage to run long-term backups. The performance hit, roughly four to five times slower on this site, does mean I’ll need to scale up the number of backups I’m running in parallel. Since I don’t need local storage, I’m exploring running these backups with Docker and then scaling up as many containers as I need to handle the incremental backups 🤔. A project for another day.
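If I do go the container route, a single run might look something like this rough sketch. The image name is hypothetical and would simply wrap the Experiment #2 steps (rclone sync, rclone mount, restic backup), and rclone mount inside a container needs FUSE access.

# Rough sketch only – captaincore-backup is a hypothetical image wrapping the Experiment #2 steps
docker run --rm --device /dev/fuse --cap-add SYS_ADMIN -v "$PWD/rclone.conf:/rclone.conf:ro" -v "$PWD/restic.key:/restic.key:ro" captaincore-backup sitename production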