How I Stopped Worrying & Learned to Love the Mongo Shell


Although it is built around a JavaScript engine, many people approach the Mongo Shell as simply a way of entering queries, updates and administration commands. But that JavaScript engine opens up a world of possibilities for making a MongoDB user's life easier and more efficient when it comes to herding the data.

The critical part of the process is understanding that you are working with a JavaScript session within the shell. Each line you enter into the shell's REPL (Read-Evaluate-Print Loop) is evaluated as JavaScript, and any variables or functions you define persist for the rest of that shell session. Without realising that, you can end up typing and retyping large chunks of boilerplate code into the shell, which, even though it has simple inline editing and command completion, is still a chore.

Let's look at a practical example of handling a migration. Imagine we have a collection of machines called "webscale", with each machine's kind being either "production", "test" or "dev", running an app in either "javascript", "scala", "golang", "ruby" or "java", and with an active flag set to true or false.

Now, we should all be generally aware that you can do basic querying ...

db.webscale.find({kind: "production"})  

... and updates ...

db.webscale.update({kind: "production"}, {$set: {active: true}})  
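
Because the session keeps whatever you define, the two literals above only need to be typed once. A minimal sketch, in which the variable names productionQuery and activate are made up for this example:

```javascript
// Entered once at the prompt; both values stay defined for the rest of the session.
var productionQuery = { kind: "production" };
var activate = { $set: { active: true } };

// Later lines can reuse them instead of retyping the literals, for example:
//   db.webscale.find(productionQuery)
//   db.webscale.update(productionQuery, activate)
```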

Consider now if we wanted to take some subset of that data and copy it to another collection. The job at hand may be to copy the "test" machines into their own "test" collection. Using the built-in forEach function and JavaScript's anonymous functions gives us the power we need to do this.

db.webscale.find({kind:"test"}).forEach(function(doc) {db.test.insert(doc)})  

That's fine for a quick one-liner but as soon as you need to do some real work, or want to reuse them, anonymous functions become tough to read and difficult to debug.

For example, say the requirement just changed so we need to remove the test machines from the webscale collection. Here's where named functions are easier to wield. We can define named functions directly in the shell so that we can reuse them. Let's define one for the "moveit" task:

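function moveit(doc) {
  // copy the document into the test collection
  db.test.insert(doc);
  // then remove the original from webscale
  db.webscale.remove({_id: doc._id});
}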

Now, we can iterate over our collection and apply this function to each document:

db.webscale.find({kind: "test"}).forEach(moveit)  
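
If the source and target collections are likely to change, the same idea can be written once and parameterised. This is only a sketch: mover is a hypothetical helper, not something the shell provides, and it assumes each document carries the usual _id field.

```javascript
// Hypothetical helper: builds a function for cursor.forEach() that moves
// each document from one collection to another.
function mover(source, target) {
  return function (doc) {
    target.insert(doc);               // copy first...
    source.remove({ _id: doc._id });  // ...then delete the original by its _id
  };
}

// In the shell this might be used as:
//   db.webscale.find({ kind: "test" }).forEach(mover(db.webscale, db.test))
```

Inserting before removing means an interrupted run leaves a duplicate rather than a lost document.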

Fantastic! Now we're getting somewhere. But what happens when our migrations get more complicated or the requirements change again? We could cursor up and use the Mongo Shell's inline editor but normally we'd write a function like this in our favourite text editor.

Mongo has an 'edit' helper in the shell which can call up your preferred editor, so you can modify an already defined function by saying edit functionname. To work out which editor to use, the Mongo shell first tries the JavaScript variable EDITOR, which you can set within the shell like so:

EDITOR = "vim"

If EDITOR isn't set, the Mongo shell checks for the environment variable $EDITOR, which is often already set, but if not, can be set at the operating system's command line, or in your profile like this.

export EDITOR=/bin/vim # or emacs, no judgements here  

With EDITOR set, we can now do the following.

edit moveit  

Using your favourite editor within the shell makes it easier to write and debug more complicated functions.

Now, we don't have a collection to hand to safely run these experiments on – don't run this on a production server or collection with live data – but we can create one. When we are creating data, we usually need to select one of a list of possible values. Let's define a reusable function in the shell to do that:

function oneFrom(list) {
  return list[Math.floor(Math.random()*list.length)];
}

We can now say oneFrom(['a','b','c']) and get 'a', 'b' or 'c' back. Using this we can make a randomLanguage function:

function randomLanguage() {
  return oneFrom(['javascript', 'ruby', 'scala', 'golang', 'java']);
}

We can create a randomKind function too:

function randomKind() {
  return oneFrom(['production','dev','test']);
}
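
Before generating a large data set, it can be worth checking that the helper really does spread its picks across the list. The sketch below repeats oneFrom so it runs on its own; tally is a hypothetical helper added just for this check.

```javascript
// Same helper as above, repeated so the snippet runs on its own.
function oneFrom(list) {
  return list[Math.floor(Math.random() * list.length)];
}

// Hypothetical helper: call oneFrom(list) n times and count each result.
function tally(list, n) {
  var counts = {};
  for (var i = 0; i < n; i++) {
    var pick = oneFrom(list);
    counts[pick] = (counts[pick] || 0) + 1;
  }
  return counts;
}

// Counts vary from run to run, but should be roughly equal for each value.
var kindCounts = tally(['production', 'dev', 'test'], 3000);
```

Typing kindCounts at the prompt then prints the three totals.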

Now we could create a hundred thousand machines using JavaScript's for loop:

for (var i = 0; i < 1e5; i++) {
  db.webscale.insert({number:i, lang: randomLanguage(), kind:randomKind()})
}

But the insert statement is already getting a bit messy to work with in the shell and we haven't even set the active field yet. Another function to the rescue, this time to create a document for a machine given just a number:

function makeMachine(i) {
  var machine={};
  machine.number=i;
  machine.lang=randomLanguage();
  machine.kind=randomKind();
  machine.active=oneFrom([ true, false ]);
  return machine;
}

We can edit and reuse this function easily and we can clearly express how we create our test data collection:

for (var i = 0; i < 1e5; i++) {
  db.webscale.insert(makeMachine(i));
}
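
As an alternative to one insert per document, the documents can be built in memory first, since the shell's insert() also accepts an array. A sketch under that assumption: makeMachines is a hypothetical helper, and oneFrom and makeMachine are repeated in compact form so the snippet stands alone.

```javascript
function oneFrom(list) {
  return list[Math.floor(Math.random() * list.length)];
}

// Same fields as makeMachine in the article, written as an object literal.
function makeMachine(i) {
  return {
    number: i,
    lang: oneFrom(['javascript', 'ruby', 'scala', 'golang', 'java']),
    kind: oneFrom(['production', 'dev', 'test']),
    active: oneFrom([true, false])
  };
}

// Hypothetical helper: build n machine documents in memory.
function makeMachines(n) {
  var machines = [];
  for (var i = 0; i < n; i++) {
    machines.push(makeMachine(i));
  }
  return machines;
}

// In the shell, a batch could then be loaded with:
//   db.webscale.insert(makeMachines(1000))
```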

We can test our migration script skills a little more easily now! Let's split the original collection 'webscale' into separate collections based on programming language. As a shortcut, we'll use the fact that the db object can be indexed like an associative array to reference our language collections:

function movedoc(doc) {
  doc.moved_at = new Date()
  // notice that we can reference the collection as an array member
  db[doc.lang].insert(doc);
  db.webscale.remove({_id: doc._id});
}

Once we run db.webscale.find().forEach(movedoc), we have five different collections, each containing a subset of the original data, and an empty webscale collection. If we rebuilt the webscale collection, edited movedoc to use doc.kind instead of doc.lang and reran that command, the documents would be split into collections named after the various kinds of system.
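
Rather than editing movedoc each time the grouping field changes, the field name could be passed in. This is a sketch only: moveBy is a hypothetical helper, and its database parameter stands in for the shell's db object.

```javascript
// Hypothetical generalisation of movedoc: the field that names the target
// collection is an argument instead of being hard-coded.
function moveBy(database, field) {
  return function (doc) {
    doc.moved_at = new Date();
    database[doc[field]].insert(doc);            // e.g. db["ruby"] or db["test"]
    database.webscale.remove({ _id: doc._id });  // delete the original by its _id
  };
}

// In the shell this might be used as:
//   db.webscale.find().forEach(moveBy(db, "lang"))
//   db.webscale.find().forEach(moveBy(db, "kind"))
```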

Of course, you could make your migration script a function as well, so you don't have to retype it and can go back and edit it:

function migrationtest() {
  db.webscale.find().forEach(movedoc);
}

And you can easily use edit to modify it so that it clears out old collections, creates new test collections and even runs some checks for you at the end. There's an awful lot you can do in the shell when you embrace its JavaScript-ness. There is only one caveat: these functions exist only for the lifetime of the Mongo Shell session; exit it and they are gone. There are ways to persistently reuse your shell functions, though, and we'll look at those in the future.
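
One of the end-of-run checks mentioned above could be sketched as follows. checkSplit is a hypothetical helper; it assumes each collection offers the shell's count() method, and its database parameter stands in for db.

```javascript
// Hypothetical check: add up count() across the named collections and
// compare the sum with the number of documents that were migrated.
function checkSplit(database, names, expectedTotal) {
  var total = 0;
  names.forEach(function (name) {
    total += database[name].count();
  });
  return { total: total, expected: expectedTotal, ok: total === expectedTotal };
}

// In the shell this might be used as:
//   checkSplit(db, ['javascript', 'ruby', 'scala', 'golang', 'java'], 100000)
```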

But for now, what you need to remember is if you find yourself doing the same thing over and over again in the shell, or if you need to create and test a migration for your data, define some functions, use the power of JavaScript and get more done with less effort.
