Going Slow for Performance - MongoDB Shell Power

Just recently we were asked by a customer if we could help him avoid locking his MongoDB database. The problem seemed simple on the face of it: all he needed to do was remove one field that had been accidentally added to every document. Of course, things weren't quite that simple. With nearly 25 million records to update, the database would be locked for quite a while, and done as one operation there was a good chance that replication would lose track.

The Locking Problem

It's all to do with the database-level locks we talked about recently when we looked at what is coming in MongoDB 2.8. In MongoDB 2.6, a write operation locks the database while it is taking place. Although many concurrent reads can happen at the same time, all reads are locked out while any write is in progress, and only one write can hold the lock at a time. An update is a write operation, and if it has to go through all 25 million records then it is going to take some time. And while that's happening, no one can read the database.

There's worse in store though. Even though the command to update every document is compact, it unrolls in the oplog, the capped collection used for replication, so that there's a record for every single document updated. Because the oplog is capped, there's a finite limit on how many records can queue up before entries are lost to the replication process, at which point the secondaries have to be manually resynchronised, and that involves quite a lot of locking too.
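To make that concrete, this is the kind of one-shot command we're trying to avoid - a single multi-update that holds the write lock for the duration and generates an oplog entry for every document it touches (it uses the collection and field names we'll introduce shortly in this article):

  db.writenow.update( { unwantedfield: { $exists:true } }, { $unset: { unwantedfield:"" } }, { multi:true } );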

This all makes for a tricky problem, especially as it's a one-off change, so changing the architecture of the servers was not an option. The solution, in this case, was to take it slowly. The unwanted field wasn't causing any issues, so we could pace ourselves in removing it.

The Shell Solution


The solution to the problem can be found in the MongoDB shell. We've talked about the shell in the past (and covered shell data reorganisation, functions, customisation and other tips) and shown how you can use its JavaScript base to orchestrate some quite complex operations. Usually we try to achieve our results as quickly as possible, but in this case we want to take it slowly. The strategy we're going to adopt is to do a little work every five seconds - say, update a thousand records - and then go to sleep. While our script sleeps, replication can catch up and the database will be unlocked so work can continue as normal. That's the plan; here's the whole function that implements it, set up to remove unwantedfield from a collection called writenow:

function slowremove() {
  var run=true;

  while(run) {
    // Collect the _ids of up to 1000 documents that still have the unwanted field
    var arr=db.writenow.find({ unwantedfield: { $exists:true } }, { _id:1 } ).limit(1000).map( function(doc) { return doc._id; } );
    // Remove the field from just that batch of documents
    db.writenow.update( { _id: { $in:arr } }, { $unset: { unwantedfield:"" } }, { multi:true } );
    // Stop once a pass finds nothing left to process
    run=arr.length>0;
    // Pause for five seconds so other work and replication can catch up
    sleep(5000);
  }
}

Let's take this function apart - it's basically a loop which keeps going till it's out of data. Each pass through the loop first sets out to find the documents where the unwantedfield exists:

  db.writenow.find({ unwantedfield: { $exists:true } } )

But we only want the _id field, so we add in a projection to select just that field:

    db.writenow.find({ unwantedfield: { $exists:true } }, { _id:1 } )

And we don't want all the documents, just the first thousand that match, so we add a limit:

              .limit(1000)

We want these results as an array of ids so we can find them quickly using the $in operator. If we used the .toArray() function, though, it would give us an array of documents each containing an _id field, which isn't quite what we want. Instead we can use the .map() function – this passes every result of the find to a function specified as a parameter to .map(). What comes back from the function is added to an array, and it's that array that's finally returned. In our case, we just want the _id, so we give .map() a function that returns just that:

              .map( function(doc) { return doc._id; } )
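If it helps to see the difference, here's a sketch of the two shapes you'd get back from a small batch - the ObjectId values are purely illustrative:

  db.writenow.find({ unwantedfield: { $exists:true } }, { _id:1 } ).limit(3).toArray();
  // [ { "_id" : ObjectId("...") }, { "_id" : ObjectId("...") }, { "_id" : ObjectId("...") } ]

  db.writenow.find({ unwantedfield: { $exists:true } }, { _id:1 } ).limit(3).map( function(doc) { return doc._id; } );
  // [ ObjectId("..."), ObjectId("..."), ObjectId("...") ]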

We store the result of that in an array, novelly named arr. Now we can go tidy up just those records with an update. This is fairly conventional:

  db.writenow.update( { _id: { $in:arr } }, { $unset: { unwantedfield:"" } }, { multi:true } );

We just check whether the _id of each document is in the array of ids we just created and, where it is, we remove the unwantedfield. We make sure the multi option is true so that every matching document in the batch is updated, not just the first.
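As a reminder (this isn't part of the script), if multi were left out the shell's update would modify only the first matching document, leaving the rest of the batch untouched:

  // Without multi:true, only one document per pass would lose the field
  db.writenow.update( { _id: { $in:arr } }, { $unset: { unwantedfield:"" } } );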

That's our update, but before we loop back around to do it again, we check to see if there's more to process. Here we just see if there were any results in our last query; if there were, we'll keep processing. This does mean that there's one redundant update at the end of the run, but it matches and processes nothing, which is about as inexpensive as you can get. The last step in the loop is a sleep(5000), which makes the shell sleep for five seconds. Those five seconds let everyone else in to work, let replication happen and basically keep everything flowing nicely.

To use this function yourself, you'll need to take the code and save it in a file, say slowremove.js - edit it to handle your collection and field and perform whatever operation you actually want to do, then run the mongo shell and connect to your MongoDB. Load the script in the shell with load("slowremove.js") and start it off by invoking the function with slowremove()... and wait.
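Put together, a session might look something like this - the host, database and credentials are placeholders for your own deployment:

  $ mongo example.host.com:27017/mydatabase -u myuser -p
  > load("slowremove.js")
  true
  > slowremove()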

And wait. This isn't a quick operation. With 25 million records, at 1,000 processed every 5 seconds it'll take around one and a half days to complete. But at the end of it, unwantedfield will no longer be present in the collection and no users should have noticed the change being performed.
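If you want to sanity-check that estimate for your own numbers, the shell itself will do the arithmetic:

  > 25000000 / 1000 * 5 / 3600
  34.72222222222222

That's roughly 35 hours of elapsed time, or about a day and a half.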