Node.js & Mongo-Oplog: Elegant Oplog Consumption

In the first part of this series, we showed you what was in the oplog and some raw code that you could use to access it. That raw code was not the prettiest of things though, and starting in this post we'll look at the Mongo-Oplog library for Node.js and how you can put it to work for your application. We'll also look at the limitations of using the oplog.

A quick look on npmjs.org shows quite a few packages for reading the oplog, so we’ve selected some of the more useful ones. The first library we’ll look at is the elegant Mongo-Oplog for Node.js. The simplest oplog watcher using this library looks like this:

var MongoOplog = require('mongo-oplog');  
var oplog = MongoOplog(process.env.MONGOHQ_URL, 'wiktory.userinfo').tail();

oplog.on('op', function (data) {  
  console.log(data);
});

After requiring the library, the program creates a new Mongo-Oplog instance. Here it’s using the environment variable MONGOHQ_URL which, as we described in the first post in this short series, is composed of the Replica Set URI with an "authSource" set to the database. The next parameter is a filter which sets the namespace (database and collection) whose operations we are interested in. This parameter can also include wildcards if we are interested in events from a number of similarly named collections. That is followed by a call to tail() to start tailing the newly created oplog monitor. For clarity, that one line can also be expressed as:

var oplog = MongoOplog(process.env.MONGOHQ_URL);  
oplog.filter("wiktory.userinfo");  
oplog.tail();  
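
To give a feel for what a wildcard in the namespace filter means, here is a minimal sketch of pattern matching against "database.collection" names. The helpers namespacePatternToRegExp and matchesNamespace are hypothetical and written for illustration only; they are not Mongo-Oplog's own matching code, and the collection names other than wiktory.userinfo are made up.

```javascript
// Hypothetical helper, not part of Mongo-Oplog: turn a namespace pattern
// such as "wiktory.user*" into a regular expression.
function namespacePatternToRegExp(pattern) {
  // Escape regex metacharacters except "*", which is treated as a wildcard.
  var escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// Hypothetical helper: does the namespace match the pattern?
function matchesNamespace(pattern, ns) {
  return namespacePatternToRegExp(pattern).test(ns);
}

console.log(matchesNamespace('wiktory.userinfo', 'wiktory.userinfo'));
console.log(matchesNamespace('wiktory.user*', 'wiktory.usersettings'));
console.log(matchesNamespace('wiktory.user*', 'wiktory.orders'));
```

With a pattern like "wiktory.user*", entries from any collection in the wiktory database whose name starts with "user" would pass the filter, while other collections would be ignored.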

Now, the next part sets up a callback that is called whenever there’s an operation, "op", in the oplog stream. That callback is passed the document we covered in the previous part, complete with timestamp, unique id, operation type, and other details of the changes. It's similar to, but a lot shorter than, the example code in the previous article.

The plus side with Mongo-Oplog is that it actually generates a range of events for the oplog contents. Primarily, there are "insert", "update", and "delete" events, which fire when the op field is "i", "u" or "d". There are also "end" and "error" events for, respectively, when the cursor stream ends or there is an error.

Say we had a name and address collection and we want to trigger a process when customers from Alaska are added or a customer's records are updated to place them in Alaska. In a case like this, we could do something like:

oplog.on('insert', function (doc) {  
  if(doc.o.state=="Alaska") {
    console.log("New Alaska customer");
    console.log(doc);
  }
});

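oplog.on('update', function (doc) {  
  // Not every update carries a $set (for example, an update that only uses $unset),
  // so check that it exists before reading from it.
  if(doc.o.$set && doc.o.$set.state=="Alaska") {
    console.log("Customer moved to Alaska");
    console.log(doc);
  }
});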
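
The checks in those handlers can also be pulled out into small functions that are easy to test without a database. This is a sketch only: stateWrittenBy and isAlaskaWrite are hypothetical names, and the field paths assume the entry layout used in the examples here. The exact shape of update entries depends on the server version, so treat them as assumptions to verify against your own oplog.

```javascript
// Hypothetical helper: which "state" value, if any, did this entry write?
function stateWrittenBy(doc) {
  var o = doc.o || {};
  if (o.$set && o.$set.state !== undefined) {
    return o.$set.state;   // modifier-style update using $set
  }
  return o.state;          // insert, or an update that replaced the whole document
}

// Hypothetical helper: true when an insert or update wrote state "Alaska".
function isAlaskaWrite(doc) {
  return (doc.op === 'i' || doc.op === 'u') && stateWrittenBy(doc) === 'Alaska';
}

// Made-up sample entries, trimmed to the fields the helpers read.
console.log(isAlaskaWrite({ op: 'i', o: { _id: 1, state: 'Alaska' } }));
console.log(isAlaskaWrite({ op: 'u', o2: { _id: 1 }, o: { $set: { state: 'Alaska' } } }));
console.log(isAlaskaWrite({ op: 'u', o2: { _id: 1 }, o: { $unset: { state: '' } } }));
```

An update that only uses $unset, or sets some other field, simply returns false rather than throwing.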

This seems a good point to note the limitations of oplog tailing. For example, say the requirement was to trigger an event when someone's state was changed from "Alaska" to something else. Because you can only see what has been written in the oplog document, you can't see what the preceding state of the now-updated record was, unless the update operation also saved the previous address in the document. A rule of thumb with the oplog is: don't assume you'll have any copy of the previous state to work with when you are filtering through the oplog's entries. This applies doubly to deletions, where you only get the "_id" of the deleted document.
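
If you really do need the previous value, one possible workaround is for the consumer to remember what it has already seen. The sketch below is an assumption-laden illustration, not a feature of Mongo-Oplog: createStateTracker is a hypothetical helper, it only knows about documents that passed through it since tailing began, and its memory use grows with the number of documents tracked. It assumes inserts and deletes carry the "_id" in o and updates carry it in o2.

```javascript
// Hypothetical in-memory tracker of the last "state" seen per document _id.
function createStateTracker() {
  var lastState = new Map();

  return {
    // Returns { previous, current } for the document touched by this entry,
    // or null for entries that are not inserts, updates or deletes.
    record: function (doc) {
      var id, current;
      if (doc.op === 'i') {
        id = String(doc.o._id);
        current = doc.o.state;
      } else if (doc.op === 'u') {
        id = String(doc.o2._id);              // updates name their target in o2
        var set = doc.o.$set || {};
        current = ('state' in set) ? set.state : lastState.get(id);
      } else if (doc.op === 'd') {
        id = String(doc.o._id);               // deletions only carry the _id
        current = undefined;
      } else {
        return null;
      }
      var previous = lastState.get(id);
      if (doc.op === 'd') {
        lastState.delete(id);
      } else {
        lastState.set(id, current);
      }
      return { previous: previous, current: current };
    }
  };
}

// Made-up entries: a customer is added in Alaska, then moves away.
var tracker = createStateTracker();
tracker.record({ op: 'i', o: { _id: 1, state: 'Alaska' } });
var change = tracker.record({ op: 'u', o2: { _id: 1 }, o: { $set: { state: 'Texas' } } });
if (change.previous === 'Alaska' && change.current !== 'Alaska') {
  console.log('Customer left Alaska');
}
```

For a document the tracker has never seen, previous is simply undefined, which is exactly the gap the rule of thumb warns about.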

Another useful rule of thumb is to avoid going back to the server when analyzing whether you should trigger an event. A single query back to the server for every oplog operation means every update or insert could generate one or more queries because of the way MongoDB decomposes updates that modify multiple documents into multiple oplog operations. Bear that in mind when considering whether oplog tailing is appropriate for your task.

Back to Mongo-Oplog's stream events. All operations generate an "op" event. Make sure you aren't duplicating effort if you have both an "op" event handler and an "insert", "update" or "delete" handler, as any of these three operations will trigger both handlers.
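
To make that overlap concrete, here is a minimal sketch using Node's own EventEmitter in place of a live oplog connection. The dispatch function and the EVENT_FOR_OP table are hypothetical stand-ins for what the library does internally, and the order in which the two events are emitted is an assumption; the sample entries are made up.

```javascript
var EventEmitter = require('events').EventEmitter;

// Assumed mapping from the oplog "op" field to the specific event name.
var EVENT_FOR_OP = { i: 'insert', u: 'update', d: 'delete' };

// Hypothetical stand-in for the library's internals: emit "op" for every
// entry, then the more specific event when the op code is recognised.
function dispatch(emitter, doc) {
  emitter.emit('op', doc);
  var name = EVENT_FOR_OP[doc.op];
  if (name) {
    emitter.emit(name, doc);
  }
}

var fake = new EventEmitter();
var seen = [];
fake.on('op', function (doc) { seen.push('op:' + doc.op); });
fake.on('insert', function () { seen.push('insert'); });
fake.on('delete', function () { seen.push('delete'); });

dispatch(fake, { op: 'i', ns: 'wiktory.userinfo', o: { _id: 1, state: 'Alaska' } });
dispatch(fake, { op: 'd', ns: 'wiktory.userinfo', o: { _id: 1 } });
dispatch(fake, { op: 'n', ns: '', o: { msg: 'made-up no-op entry' } });

console.log(seen);
```

The insert and the delete each reach two handlers, so any work done in the "op" handler should not be repeated in the specific one.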

A good application should be able to handle the unexpected and for Mongo-Oplog, there are three unexpected events: "error" for errors coming from the underlying cursor or the database itself, "end" for the oplog stream coming to an end, and "stop" for the server itself stopping. The application can call the stop method of the oplog watcher to stop tailing and disconnect from the server gracefully while leaving the option open to re-establish the connection.
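
A minimal sketch of wiring up those handlers might look like the following. The watchLifecycle helper and its log parameter are hypothetical, and the stub object stands in for a real Mongo-Oplog instance so the example runs without a database. Whether stop takes a callback or returns a promise may depend on the version of the library you have installed, so treat the bare call here as an assumption.

```javascript
var EventEmitter = require('events').EventEmitter;

// Attach "error" and "end" handlers to anything that behaves like the oplog
// watcher, and return a function that stops tailing.
function watchLifecycle(oplog, log) {
  oplog.on('error', function (err) {
    log('oplog error: ' + err.message);
  });
  oplog.on('end', function () {
    log('oplog stream ended');
  });
  return function shutdown() {
    oplog.stop();
  };
}

// Stub standing in for a real watcher; only stop() is faked.
var stub = new EventEmitter();
stub.stopped = false;
stub.stop = function () { stub.stopped = true; };

var messages = [];
var shutdown = watchLifecycle(stub, function (m) { messages.push(m); });
stub.emit('error', new Error('cursor killed'));
stub.emit('end');
shutdown();

console.log(messages, stub.stopped);
```

In a real application the same function would be handed the object returned by MongoOplog(...), with log replaced by whatever logging the application already uses.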

And that's a tour of Mongo-Oplog - a simple, clean, JavaScript-based library for handling oplogs. In forthcoming posts in this series, we'll look at how the Meteor framework uses the oplog as part of its scaling proposition and follow that up with a review of other oplog libraries, for Node.js and other languages, as well as other oplog-consuming tools.