Mongo Retooled - or "Whatever happened to --dbpath?"

With the arrival of MongoDB 3.0, one of the less discussed changes is the complete reworking of the MongoDB tools such as mongodump. The first thing anyone notices is that the --dbpath option has gone and, if you like to work with your MongoDB files on disk, it's not clear what to do next. For anyone working with MongoDB, this is a big change in how you handle backups, but the first question most people ask is "Why change?"

Evolving tools, evolving MongoDB

One of the big things about MongoDB 3.0 was the introduction of pluggable storage engines. This meant a completely different way of storing data could be plugged into MongoDB, on demand, and the database would start using it. No, this doesn't move your data between the storage engines. They can use whatever directory or format they like and, when they start up for the first time, they don't look for data in other formats and import it. They just run and start up empty. There's no native data compatibility between them.

Compare this with before MongoDB 3.0. There was only one storage format – sit down at the back, Tokutek, we know you made your own – just the one storage format, MMAPv1 as it's known today. The MongoDB tools needed to know only that one format, and they could even use code from inside the database itself to read chunks of that data off disk. If the disk format changed, the tools could be updated alongside the database. It was a simpler time.

Now, when pluggable storage engines arrived in 3.0, there was a new conundrum to solve. If the database can read and write data to disk in any one of a couple of formats, which ones do you build into the tools without making life a very complicated engineering task? You could consider making it so that plugin storage engines would work with both the database and the tools, but that's another complex engineering task.

So the MongoDB engineers went, "Hey, why don't we just talk to the database and let it sort it out?" Which database, you may wonder... in this case, it's the one you run yourself to read your data files. That database will have all the pluggable storage engines installed in it and will be the tried and trusted way of accessing that data. You can get a MongoDB database up and running with a simple command line and then get the tools to connect to it.

The old tool collection had to be retired in this scenario, and the MongoDB developers took the opportunity to completely rewrite the tools in Go using the mgo driver. The options for pointing them at a set of files are gone, and the tools now exclusively get their data from a running database.
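In practical terms, that means the tools now take connection options rather than a path to data files. As a rough sketch (the data path and database name here are just placeholders), the change in invocation looks like this:

# MongoDB 2.x: read the data files directly from disk (no longer possible)
$ mongodump --dbpath /data/db --db mydatabase

# MongoDB 3.0: connect to a running mongod and let it read the files
$ mongodump --host localhost --port 27017 --db mydatabase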

Practical changes

With the new tools in place, the workflow actually becomes rather simple. Let's start by running mongodump against a MongoDB backup downloaded from Compose.

First, you'll need to get MongoDB installed on your local hardware. Generally, you can download binary versions from mongodb.org, or, if you are on Mac OS X with Homebrew, you can brew install mongodb. On Linux, it's unlikely you'll have MongoDB 3.0 available in your distribution's package repository, so follow the MongoDB installation instructions for your distribution to get a solid installation.
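On Mac OS X with Homebrew, for example, the install and a quick version check would look something like this:

$ brew update
$ brew install mongodb
$ mongod --version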

Next, you'll need your downloaded backup. Go to your Compose Dashboard, select your MongoDB deployment, then select Backups. You can download any of the regularly taken backups or make an on-demand backup (which you can download when it's completed). Just click the box with the downward arrow next to the backup you want.

Once downloaded, you will want to make some space to work in.

$ mkdir restoreexample
$ cd restoreexample

Then unpack that backup into its own directory...

$ mkdir db
$ tar xvzfC pathandfilenameofbackupfile.tar.gz db

The C option tells tar to extract into the specified directory, while the z option handles the gzip compression. If your browser automatically decompresses downloaded files, lose the z and the .gz extension.
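In that case, the unpacking command would look something like this (substituting your own backup's filename, of course):

$ tar xvfC pathandfilenameofbackupfile.tar db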

With the data unpacked, we are now ready to start the database.

$ mongod --dbpath ./db

Actually, this command is just letting things default. To be precise, we should specify the storage engine; in this case it's mmapv1, the default storage engine in MongoDB 3.0. In the future it could be wiredTiger, but not today. So...

$ mongod --storageEngine mmapv1 --dbpath ./db
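If you want to confirm which engine the server actually loaded, one way is to ask it from another terminal with the mongo shell, assuming it's listening on the default port:

$ mongo --eval "printjson(db.serverStatus().storageEngine)"

The result should include the engine name, mmapv1 in this case.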

Whichever form of the mongod command you use, there'll be a lot of output on the screen. Start up another terminal session and cd into the restoreexample directory. We are now ready to run mongodump...

$ mongodump

And that's it. You'll now have a directory named dump with the dumped files in it. Did we miss something? No, it's just that the mongod server process starts up listening on localhost port 27017 and, by an unremarkable coincidence, the mongodump command defaults to connecting to localhost port 27017.
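If you'd rather be explicit, or your local mongod is listening somewhere other than the defaults, you can spell the connection out and narrow the dump down; the database name and output directory below are just examples:

$ mongodump --host localhost --port 27017 --db mydatabase --out ./dump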

mongodump is not the only command, of course, to lose the --dbpath option. mongorestore, mongoexport, mongoimport, mongofiles and mongooplog have all lost that option and a few other related options. If you want to work with your MongoDB data locally now, you have to bring up a server locally and let that do the data wrangling for you.
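The same applies in the other direction. To load that dump into another server, say a Compose deployment, you point mongorestore at the target rather than at files; the host, port and credentials below are placeholders for your own deployment's connection details:

$ mongorestore --host candidate.example.com --port 10000 -u myuser -p mypassword dump/

Depending on how authentication is set up, you may also need to pass --authenticationDatabase.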

The Upside

It is worth mentioning that the rewrite did add some new features to the tools. mongodump can now exclude collections from a dump, mongorestore can restore data in parallel and read BSON data from standard input, and tool integrators will find that mongostat and mongotop can now return JSON-formatted output. These are not huge changes, though, compared to the tools now requiring a running server.
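As a rough illustration of some of those additions (the database and collection names here are made up, and it's worth checking the exact flags against each tool's --help output):

# Dump a database, skipping one collection
$ mongodump --db mydatabase --excludeCollection logs

# Restore collections in parallel, four at a time
$ mongorestore --numParallelCollections 4 dump/

# JSON output from the monitoring tools
$ mongostat --json
$ mongotop --json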

Although this is an added step to the process, the switch to always using a mongod process to manipulate data has its upsides. The tools don't need to be changed for any new storage engines and can be treated as their own entity. In fact, that's what the MongoDB developers do; the tools have their own GitHub repository and any enhancements will be available to all the various plugin storage engines. It also means that you don't have to match your tools to your MongoDB version and storage engine, though you will have to keep tabs on which version and storage engine you are using so that you install the right one when you set up your local database.

The real payoff for this change will come when there are more storage engines available for MongoDB, probably when their pluggability becomes a bit more dynamic rather than hardwired. In that future, you'll still have one set of tools to use when you need them.