Compose backups - Going local

We do backups differently at Compose and we'd like to explain why. Our preferred approach is to take a rapid snapshot of a database's data on disk. With the range of databases we have, though, there are exceptions to that preference, and that means different ways to load a local database with that data.

Why would you want to load a local database with the contents of your production database? Well, you may want to test new code in an isolated environment, do some analysis without affecting your application's response times, or just explore your own data set for anomalies. For some databases, it's also the springboard for rapid export or import into other databases or applications. So we're going to look at all our current databases...

Snapshots and MongoDB

With MongoDB, we use snapshots to create our backups. We do that for a few reasons. Firstly, there's no downtime in taking a snapshot. The image is swiftly grabbed at a suitable moment and the production databases know nothing about what has happened; the actual backup creation takes place in the background. The last thing you want is for a backup to interrupt the very service it's trying to protect.

Secondly, these snapshots make for a complete image of your database, and that means we don't have dependencies on previous copies of the data or other silos of information. That makes for a more reliable and coherent backup.

Thirdly, because we have a complete image, it's quick to restore the whole database into a new database deployment. We don't even think about trying to reimport data into an existing deployment, as that could complicate things to the point of corrupting data. Instead, we create a clean new database deployment with the snapshot data in place and start it up. And you're back.

So that's why we take snapshots. But we also want to talk about how you can use those snapshots – and other backups – yourself, outside of Compose's restoration options, and for that you'll need a copy of the backup file.

Getting your backup

This process is the same for most of the databases you use on Compose. Head to the Compose Console and select Backups from the sidebar.

This takes us to the backups screen.

You can restore or download from a daily, weekly or monthly backup by selecting one of the rows in the Daily Backups, Weekly Backups or, you guessed it, Monthly Backups sections of the table. If you want an up-to-the-minute backup, you can click the Back up now button and it'll create an on-demand backup which will, when complete, be displayed at the top of the table.

On the right-hand side of the table, you'll find two icon buttons in each row. The arrow pointing into the box is the one we're interested in here, as that's the download button; the other button, the circular arrow, is the restore button, which we don't need for this process.

On everything but our original, older MongoDB, clicking the Download button will start the download immediately. The file being downloaded is a gzipped tar archive, though depending on how your browser interprets the download, its name may end in just .tar. On Unix-like systems, you can confirm the file type with the file command (file downloadedfile). The file will be named either with the timestamp at which it was created or as on_demand. To make life more readable from this point on, we're going to call our downloaded file backup.tar.gz. RethinkDB users won't need to do this extraction, but for the other databases we now extract the contents of the backup into its own directory. It's best to extract any backup into its own directory, as it may contain many, many files.

$ mkdir db
$ tar xvCzf db backup.tar.gz

The C in the tar command makes tar change into the specified directory before extracting, and the lowercase z tells it the archive is gzipped. Now that we've downloaded and extracted the backup into a directory called db, let's bring up a local instance of the database. That, of course, depends on which database the backup came from. We'll assume we've also downloaded and installed a version of the database that matches (or is later than) the version in use on Compose.
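Each of these servers will print its version when asked, so it's easy to confirm what's installed locally before going further:

$ mongod --version
$ postgres --version
$ rethinkdb --version
$ redis-server --version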

Starting MongoDB Locally

To bring up a local MongoDB database with our data, we just have to run mongod and set the dbpath option to point at our data directory:

$ mongod --dbpath ./db
...

That's it. The database should be running on localhost:27017 and we should be able to run mongo to get a shell session with it.
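For a quick sanity check, a short mongo shell session will show what's there; the compose database name below is just an example, so substitute one of the names that show dbs lists for you:

$ mongo
> show dbs
> use compose
> db.getCollectionNames()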

A local PostgreSQL

PostgreSQL backups are built to allow many backups to be restored into the same directory. If we extract a typical backup as above and look in the db directory, we have to go down a few levels to get to the data. In db we find a data directory:

[db] ls
data
[db] cd data

and inside there a backup directory:

[data] ls
backup
[data] cd backup

which leads to a timestamped directory for when the backup was taken:

[backup] ls
20160120121608  
[backup] cd 20160120121608

and within there a snapshot directory:

[20160120121608] ls
snapshot
[20160120121608] cd snapshot
[snapshot] ls
README  conf  data
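If you'd rather not cd down a level at a time, running find from the directory where you extracted the backup locates the snapshot directory in one step; the path below is the one from our walkthrough:

$ find db -type d -name snapshot
db/data/backup/20160120121608/snapshot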

The snapshot directory is our destination for database data, and that README file contains everything we need to unlock it. If we look in there, we find this:

[snapshot] more README
This snapshot is meant to be run with the same minor version  
of Postgresql, which is postgres (PostgreSQL) 9.4.5.

To startup a Postgresql environment with this snapshot, run:  
`postgres -D conf`

You can then connect to the db by running: `psql postgres -U focker`

To list all your databases run `\list` and to connect to  
another database, run `\connect compose`.

The configuration in the conf has very promiscuous settings,  
and will allow connections from any user.  You will want to  
modify the configuration files before allowing world access  
to the db.  

There's not a lot to add to that...

$ postgres -D conf
LOG:  database system was interrupted; last known up at 2016-01-20 12:16:08 GMT  
LOG:  redo starts at 0/7000028  
LOG:  consistent recovery state reached at 0/70000F0  
LOG:  redo done at 0/70000F0  
LOG:  MultiXact member wraparound protections are now enabled  
LOG:  database system is ready to accept connections  

And you are up and running.
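Following the README's advice, we can now connect and look around; the focker user and the compose database are the ones the README names, so substitute your own database names as needed:

$ psql postgres -U focker
postgres=# \list
postgres=# \connect compose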

A RethinkDB of your own

RethinkDB's backups are created with the rethinkdb dump command, and the RethinkDB tools know how to work with gzipped tar files, which is why you could skip straight to here; you don't need to unpack the backup. To restore one, you need a running RethinkDB instance, which is simple enough:

$ rethinkdb
Recursively removing directory /Users/dj/rethinkdb_data/tmp  
Initializing directory /Users/dj/rethinkdb_data  
Running rethinkdb 2.2.3-1 (CLANG 7.0.2 (clang-700.1.81))...  
Running on Darwin 15.3.0 x86_64  
Loading data from directory /Users/dj/rethinkdb_data  
Listening for intracluster connections on port 29015  
Listening for client driver connections on port 28015  
Listening for administrative HTTP connections on port 8080  
Listening on addresses: 127.0.0.1, ::1  

Now go to another terminal session and run rethinkdb restore backup.tar.gz:

$ rethinkdb restore backup.tar.gz
Unzipping archive file...  
  Done (0 seconds)
Importing from directory...  
[========================================] 100% 

If we bring up a browser and look at the RethinkDB console on localhost:8080, we should find all our data there.

Redis to go

Redis backups are a copy of the dump.rdb file created by the server. Bringing them to life locally is a little more complex, as redis-server is configured entirely from a .conf file. We'll give an example here of starting the server using a brew-installed Redis. We've got our data extracted into the db directory, so first we'll change into that directory with cd db. Then we want a simple configuration file. The easiest source for that is the one installed with our Redis, in /usr/local/etc/redis.conf, so we'll copy it to the db directory and edit it:

$ cp /usr/local/etc/redis.conf .
$ vi redis.conf

You now need to change the dir entry in the config file. It will typically point at a directory; in our case it reads dir /usr/local/var/db/redis/. We'll change that to dir . and save the file.
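If you'd rather not open an editor, a sed one-liner can make the same change; this assumes the dir line is present and uncommented, and it keeps a .bak copy of the original file:

$ sed -i.bak 's|^dir .*|dir .|' redis.conf

Either way, once the dir entry points at the current directory, we can run the server with our config file: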

$ redis-server redis.conf

To check it's running, you should be able to connect with just redis-cli in another window and enter scan 0 to get a quick view of some of the keys in the Redis database.
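That check looks something like this; scan 0 returns a cursor followed by a first batch of key names, which will of course depend on your data set:

$ redis-cli
127.0.0.1:6379> scan 0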

Elasticsearch

Currently, Elasticsearch is an exception to Compose's downloadable backups feature. Contact support@compose.io if you need assistance with Elasticsearch backups; they are still restorable within the Compose system.

Wrapping up

We've covered how to use Compose's downloadable backups to bring up a local database with the same data as your production database. You can use this for local testing, performance runs, or just as part of your restoration process.