Rethinking Changes - How Two Databases Handle Change Notification

What makes RethinkDB different from, say, MongoDB? Since we added RethinkDB to the Compose database service, that's a question we've been asked. We could talk about the ground-up scaling that's engineered in, or the integrated web interface, but today I'd like to talk about how you can monitor changes in database tables, because it really highlights the difference in philosophy between the two databases.

As web applications have become more real-time, more collaborative and more interconnected, it's more important than ever to react appropriately to changes in data. With many clients or back-end servers writing to the database, you can't know in advance where those changes will come from. The traditional technique has been to poll the database with regularly executed queries, but this burns database processing time, and your view of the data is only as up to date as the last query.

The MongoDB Way


With MongoDB, there is an option, at least for back-end services, in the form of the oplog. When the MongoDB developers built their replication system, they did so by generating a stream of all the changes in the database: an operational log of activity. For a long time, the oplog was only used by other MongoDB servers in a replica set to keep up to date with the primary server. Eventually, other applications began tapping into the oplog as a source of changes, and frameworks like Meteor picked up on that and developed drivers that subscribe to the oplog's flow. That allows them to respond immediately to changes.
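To make the idea of replaying an operational log concrete, here is a minimal JavaScript sketch of how a follower could rebuild a collection by applying oplog-style entries in order. It assumes simplified, hypothetical entries: op is "i", "u" or "d", o holds the inserted document (or the _id of a deleted one), and o2 holds the _id of an updated document. Real oplog updates often carry update operators rather than whole documents, which this sketch deliberately does not handle.

```javascript
// Replay simplified, hypothetical oplog entries onto an in-memory Map
// keyed by _id. Real entries carry more fields (ts, ns and others);
// this sketch only reads op, o and o2.
function applyOplog(collection, entries) {
  for (const entry of entries) {
    switch (entry.op) {
      case "i": // insert: o holds the new document
        collection.set(entry.o._id, entry.o);
        break;
      case "u": // update: o2 identifies the document; o is assumed here
        // to be a full replacement document (no update operators)
        collection.set(entry.o2._id, entry.o);
        break;
      case "d": // delete: o holds the _id of the removed document
        collection.delete(entry.o._id);
        break;
      default: // commands ("c") and no-ops ("n") are ignored in this sketch
        break;
    }
  }
  return collection;
}

const replica = applyOplog(new Map(), [
  { op: "i", o: { _id: 1, customer: "Fred", total: 10 } },
  { op: "i", o: { _id: 2, customer: "Jane", total: 5 } },
  { op: "u", o2: { _id: 1 }, o: { _id: 1, customer: "Fred", total: 12 } },
  { op: "d", o: { _id: 2 } },
]);
console.log([...replica.values()]);
```

Applying the same entries in the same order always produces the same state, which is what lets a follower stay in step with the primary.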

There are pitfalls in the oplog approach, though. It needs extra configuration, and users must be granted permission to read the oplog, but more critically, you get all the changes. And we do mean all of them: even if you are only interested in one collection, you will get the changes for every collection, and you'll have to write your application server to drop the oplog activity you aren't interested in. Having oplog access is incredibly useful, though, so people work their way around the pitfalls. If you want to know more, take a look at our previous articles on how to use the oplog on Compose MongoDB deployments.
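The filtering burden looks something like this minimal JavaScript sketch, which works on a plain array of hypothetical oplog entries rather than a live tailable cursor. It relies on two fields that oplog entries carry: ns, the "database.collection" namespace, and op, the operation code ("i", "u" and "d" for inserts, updates and deletes, plus others such as "n" for no-ops). The database name shop, the collections and the function name entriesFor are made up for illustration.

```javascript
// Map the oplog operation codes we care about to readable names.
const OP_NAMES = { i: "insert", u: "update", d: "delete" };

// Keep only entries for one namespace, and only inserts, updates and
// deletes, dropping everything else the oplog hands us.
function entriesFor(namespace, entries) {
  return entries
    .filter(
      (entry) =>
        entry.ns === namespace &&
        Object.prototype.hasOwnProperty.call(OP_NAMES, entry.op)
    )
    .map((entry) => ({ operation: OP_NAMES[entry.op], doc: entry.o }));
}

// Hypothetical entries: two collections plus a no-op, all mixed together.
const oplog = [
  { op: "i", ns: "shop.orders", o: { _id: 1, customer: "Fred" } },
  { op: "i", ns: "shop.audit", o: { _id: 7, note: "login" } },
  { op: "d", ns: "shop.orders", o: { _id: 1 } },
  { op: "n", ns: "", o: { msg: "periodic noop" } },
];

const orderChanges = entriesFor("shop.orders", oplog);
console.log(orderChanges.length); // 2
```

Every entry still reaches the application before being discarded, which is the cost the paragraph above describes.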

The RethinkDB Way


With RethinkDB, there's been no borrowing of replication data. Following changes is built into the query API through a mechanism called change feeds. The feature was introduced in June 2014, after a long period of development in which the challenge of making change notification work across multiple servers was taken on and overcome. The result is that you can get a change feed from any table in your database simply by adding the changes method to a chain of commands:

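var r = require('rethinkdb');  // conn is an open connection from r.connect()
r.table("changables").changes().run(conn, function(err, cursor) {
    if (err) throw err;
    cursor.each(function(err, change) { if (err) throw err; console.log(change); });
});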

This is the JavaScript version: run() hands a cursor to the callback, and each change to the table is delivered through that cursor as it happens, ready for you to process as you wish. Here we simply print out each change. In Ruby this would read:

r.table("changables").changes().run(conn).each{ |change| p(change) }  

While in Python it would be:

feed = r.table("changables").changes().run(conn)
for change in feed:
    print(change)

The API is the same across all three languages, varying only in each language's idioms. We'll use JavaScript for the rest of this article, though.

With the changes method, each result comes as a pair of values, old_val and new_val. Both have values for updates and replaces; inserts have a null old_val, while deletes have a null new_val. With that information, it's simple to determine what operation took place. An added plus is that you can filter or map the results before they are returned to your application. For example, if you only wanted to process changes belonging to a particular customer, 'Fred', you could do:
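Working out the operation from a change takes only a couple of null checks. This is a minimal sketch in plain JavaScript over hand-written, hypothetical change objects; the function name operationOf is our own, not part of the RethinkDB driver.

```javascript
// Classify a change by which of old_val / new_val is present.
// Loose "!= null" treats both null and a missing field as absent.
function operationOf(change) {
  const hasOld = change.old_val != null;
  const hasNew = change.new_val != null;
  if (!hasOld && hasNew) return "insert";
  if (hasOld && !hasNew) return "delete";
  if (hasOld && hasNew) return "update"; // covers both updates and replaces
  return "unknown";
}

// Hypothetical changes shaped like the ones a change feed delivers.
const samples = [
  { old_val: null, new_val: { id: 1, customer: "Fred" } },
  { old_val: { id: 1, customer: "Fred" }, new_val: { id: 1, customer: "Fred", paid: true } },
  { old_val: { id: 1, customer: "Fred", paid: true }, new_val: null },
];

const ops = samples.map(operationOf);
console.log(ops.join(", ")); // insert, update, delete
```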

r.table("changables").changes()
    .filter(r.row('old_val')('customer').eq('Fred'))
    .run(conn, function(err, cursor) {
        if (err) throw err;
        cursor.each(function(err, change) { if (err) throw err; console.log(change); });
    });
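To see what that predicate selects, here is a plain JavaScript analogue run against hypothetical change objects rather than a live feed; belongsToFred and belongsToFredEitherSide are illustrative names, not driver functions. Because an insert has a null old_val, a test on old_val alone skips inserts in this sketch, so the second predicate checks both sides of each change.

```javascript
// Mirrors the idea of filtering on old_val('customer') only.
function belongsToFred(change) {
  // Inserts have a null old_val, so there is no old document to test.
  return change.old_val !== null && change.old_val.customer === "Fred";
}

// Matches if either the old or the new document belongs to Fred.
function belongsToFredEitherSide(change) {
  const matches = (doc) => doc !== null && doc.customer === "Fred";
  return matches(change.old_val) || matches(change.new_val);
}

// Hypothetical feed: an insert, an update and a delete.
const feed = [
  { old_val: null, new_val: { id: 1, customer: "Fred" } },
  { old_val: { id: 1, customer: "Fred" }, new_val: { id: 1, customer: "Fred", paid: true } },
  { old_val: { id: 2, customer: "Jane" }, new_val: null },
];

console.log(feed.filter(belongsToFred).length); // 1
console.log(feed.filter(belongsToFredEitherSide).length); // 2
```

Which side to test depends on whether you care about new documents arriving, old ones leaving, or both.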

There are limitations to this, though: you can't count or sort the change feed stream, because it's endless and counting and sorting rely on results being finite. Despite that, it's a powerful way to get your database changes pushed to you. One hint worth noting: run your change feeds on their own RethinkDB connection, so that your other connections remain consistent and responsive even when a lot of changes are happening.
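The reason counting and sorting don't apply can be illustrated without a database. This sketch uses a JavaScript generator as a stand-in for an endless feed (hypothetical data, not how RethinkDB delivers changes internally): a count or sort over the whole stream would never return, whereas a bounded batch taken from it can be sorted normally. The helper take is our own.

```javascript
// Stand-in for an endless change feed: this generator never finishes.
function* endlessFeed() {
  let id = 0;
  while (true) {
    id += 1;
    yield { old_val: null, new_val: { id, total: (id * 37) % 100 } };
  }
}

// Collect at most n items, then stop pulling from the iterable.
function take(iterable, n) {
  const batch = [];
  if (n <= 0) return batch;
  for (const item of iterable) {
    batch.push(item);
    if (batch.length >= n) break; // leaving the loop closes the generator
  }
  return batch;
}

// Sorting the whole feed would never finish; a bounded batch is finite.
const batch = take(endlessFeed(), 5);
const sortedTotals = batch.map((c) => c.new_val.total).sort((a, b) => a - b);
console.log(sortedTotals.join(", ")); // 11, 37, 48, 74, 85
```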

What is a complex bolt-on in MongoDB is a tightly integrated part of RethinkDB's query syntax. If you want to learn more about change feeds in practice, see RethinkDB's blog post "Catthink", which teams up the cats of Instagram with RethinkDB, and the introductory change feed documentation. Meanwhile, we'll be looking at more RethinkDB features in future articles.