How We Do It: Backups

We've talked about how you can produce backups on demand, both manually and through the API. But what happens behind the curtain at Compose to make that so easy, and how do we take a consistent backup without stopping your database?

For our MongoDB elastic deployments, the secret is hidden – a hidden replica set member. When we deploy MongoDB for you, it's set up as a replica set with two visible members, a primary and a secondary database instance. What you can't see is the third member, the Harpo to the primary Groucho and secondary Chico. Like Harpo, this hidden member is silent – you'll never access it and you won't see it report its presence. Like the other members of the replica set, it does have a complete up-to-date duplicate of the data in the database.
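To make the layout concrete, here is a minimal sketch of what a replica set configuration with one hidden member could look like, written as a plain Python dictionary. The replica set name and the hostnames are hypothetical placeholders, not our real configuration. The `hidden` and `priority` fields are standard MongoDB member settings, and a hidden member has to have a priority of 0 so it can never be elected primary.

```python
# Illustrative only: the set name and hostnames below are made up.
def build_replica_set_config(name, hosts, hidden_host):
    """Return a replica set config document with one extra hidden member."""
    members = [{"_id": i, "host": host} for i, host in enumerate(hosts)]
    members.append({
        "_id": len(hosts),
        "host": hidden_host,
        "priority": 0,   # a hidden member must never be eligible to become primary
        "hidden": True,  # keeps the member out of what clients are told about
    })
    return {"_id": name, "members": members}


def visible_hosts(config):
    """Hosts that are not marked hidden, in member order."""
    return [m["host"] for m in config["members"] if not m.get("hidden", False)]


config = build_replica_set_config(
    "rs0",
    ["groucho.example.com:27017", "chico.example.com:27017"],
    "harpo.example.com:27017",
)
print(visible_hosts(config))
```

A document shaped like this is the kind of thing you would hand to `rs.initiate()` or `rs.reconfig()` in the mongo shell; only Groucho and Chico come back from `visible_hosts`, while Harpo still holds a full copy of the data.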

When we need to take a backup, we ask Harpo to stop keeping up to date and take a break by sending it a db.fsyncLock(), which flushes all pending write operations to disk and locks the database against further writes. The lock applies only to the silent Harpo, so neither Groucho nor Chico notices that anything has happened, and both carry on handling your queries, updates and other changes. Once the fsyncLock has engaged, we slip in and take a copy of the data files, the ones actually stored on disk and not just a mongodump, then tar them up and send them to a private S3 bucket on Amazon.

When this copy is completed, we send a db.fsyncUnlock() and release the Harpo instance. It then catches up with the replication state of the other two members and rapidly returns the replica set to complete consistency. The backup, now safely stored, is ready for retrieval.
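The lock, copy, unlock sequence can be sketched in a few lines of Python. This is an illustration of the pattern under stated assumptions, not our production tooling: `snapshot_data_files` is a hypothetical helper, and the `lock` and `unlock` arguments are stand-ins for whatever issues the fsync lock and unlock against the hidden member. The demo uses a throwaway temporary directory in place of a real MongoDB data directory, and the upload to S3 is left out.

```python
import tarfile
import tempfile
from pathlib import Path


def snapshot_data_files(data_dir, archive_path, lock, unlock):
    """Lock, tar up the data directory, and always unlock afterwards."""
    lock()  # stand-in for db.fsyncLock() on the hidden member
    try:
        with tarfile.open(str(archive_path), "w:gz") as archive:
            # Store paths relative to the data directory's own name.
            archive.add(str(data_dir), arcname=Path(data_dir).name)
    finally:
        unlock()  # stand-in for db.fsyncUnlock(); runs even if the copy fails
    return archive_path


# Demo with fake lock/unlock callables and a temporary "data directory".
events = []
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()
    (data_dir / "collection.0").write_bytes(b"example bytes")
    archive_path = Path(tmp) / "backup.tar.gz"  # kept outside data_dir on purpose

    snapshot_data_files(
        data_dir,
        archive_path,
        lock=lambda: events.append("lock"),
        unlock=lambda: events.append("unlock"),
    )
    with tarfile.open(str(archive_path)) as archive:
        names = archive.getnames()

print(events)
print(names)
```

The `try`/`finally` is the important design choice here: if the copy fails halfway, the member still gets unlocked and can catch up with replication. With a driver such as pymongo, the two callables could plausibly wrap `client.admin.command("fsync", lock=True)` and `client.admin.command("fsyncUnlock")` on a direct connection to the hidden member, though that wiring is an assumption and not shown above.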

We keep track of the backups in an internal database, which holds the one on-demand backup and the daily, weekly and monthly backup sets. When you ask for a particular backup to download or restore, that database is where we look to locate the file in S3. We then place the file either in a specific secure location for download or into the data directory of a new deployment for restore.
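As a rough picture of what such a catalog does, here is a small in-memory sketch. The `BackupRecord` and `BackupCatalog` names, the field names and the S3 keys are all hypothetical, and the real tracking database is certainly richer than this. The sketch shows two behaviours described above: only one on-demand backup is held at a time, and a backup is located in S3 by looking up its record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupRecord:
    backup_id: str
    kind: str    # "on_demand", "daily", "weekly" or "monthly"
    s3_key: str  # where the tarball lives in the bucket (made-up keys below)


class BackupCatalog:
    """Toy stand-in for the internal backup-tracking database."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        if record.kind == "on_demand":
            # Only one on-demand backup is held, so forget the previous one.
            self._records = {
                backup_id: r
                for backup_id, r in self._records.items()
                if r.kind != "on_demand"
            }
        self._records[record.backup_id] = record

    def locate(self, backup_id):
        """Return the S3 key for a backup; raises KeyError if it is unknown."""
        return self._records[backup_id].s3_key

    def ids(self):
        return sorted(self._records)


catalog = BackupCatalog()
catalog.register(BackupRecord("d-001", "daily", "backups/d-001.tar.gz"))
catalog.register(BackupRecord("od-001", "on_demand", "backups/od-001.tar.gz"))
catalog.register(BackupRecord("od-002", "on_demand", "backups/od-002.tar.gz"))

print(catalog.ids())
print(catalog.locate("od-002"))
```

Once `locate` has produced the key, the download and restore paths differ only in where the file is placed afterwards.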

Because we store the data files rather than a dump, it's a lot easier to bring up an entire database and all its collections. We showed this in a previous article, where we used the backup files to quickly make a local copy of a production database, ideal for one-off analytics or for testing new ideas or queries.

We don't simply make backups. At Compose we give you the ability to easily put those backups to work when you need them. Because that's how we do it at Compose.