Mongo on the RocksDB

When MongoDB 3.0 arrived, it brought with it the ability to plug alternative storage engines into MongoDB. We anticipated that last year when we wrote about The Coming of the MongoDB Storage Engines. At the time, it was expected that RocksDB would play a part in the MongoDB storage engine story, but that wasn't to be: MongoDB, Inc. acquired the well-respected WiredTiger storage engine, and the company behind it, and began work on integrating it.
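Selecting an engine in MongoDB 3.0 is done at `mongod` startup. A minimal sketch of the two ways to do it; note that the `rocksdb` engine name assumes a MongoRocks-enabled build, since stock MongoDB 3.0 ships only with `mmapv1` and `wiredTiger`:

```shell
# Start mongod with the WiredTiger engine (stock MongoDB 3.0+):
mongod --storageEngine wiredTiger --dbpath /data/db

# Or set it in the YAML config file instead:
#   storage:
#     engine: wiredTiger
#     dbPath: /data/db

# With a MongoRocks build, the engine name is "rocksdb" (assumption:
# the exact name depends on how the build registers the engine):
mongod --storageEngine rocksdb --dbpath /data/db
```

The engine choice applies to the whole `mongod` instance; switching engines means dumping and restoring the data, since the on-disk formats are incompatible.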

Some might have thought that the switch by MongoDB, Inc., even though it is yet to be fully completed and still has some issues which could cause data loss, would be the end of the story for RocksDB and MongoDB. That is why we were quite excited by the news that Parse, part of Facebook, has actually been working on optimizing RocksDB for use with MongoDB. More than that, though: Parse is already running MongoDB on RocksDB for some of its production workloads.

In the announcement, Parse points out that Facebook already runs RocksDB as a storage component in some of its services, which gave the team a lot of confidence in RocksDB's maturity. More importantly, they could work directly with the Facebook engineers who developed RocksDB.

It's not just a strategic and people decision that Parse has made, though. Built using code from LevelDB and ideas from Apache HBase, RocksDB is an LSM (Log-Structured Merge-tree) storage engine designed to consume as many of the IO operations as a RAM and Flash/SSD storage system can offer, with pluggable compression offering another opportunity to tune for great insert performance. WiredTiger is capable of LSM storage too, but the MongoDB implementation apparently limits it to working only with B-tree indexes, which reduces its usefulness, at least for Parse.
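The write path that makes LSM engines so insert-friendly can be sketched in a few lines. This is purely illustrative (real engines like RocksDB add write-ahead logs, bloom filters, and background compaction), but it shows the core idea: writes land in memory and are flushed as immutable sorted runs, so disk writes are always sequential:

```python
# Minimal sketch of the LSM (Log-Structured Merge) idea behind RocksDB.
# Writes go to an in-memory "memtable"; when it fills up it is flushed
# as an immutable sorted run (an "SSTable"). Reads check the memtable
# first, then the runs from newest to oldest.

class TinyLSM:
    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.memtable_limit = memtable_limit
        self.sstables = []  # newest first; each is a sorted list of (key, value)

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Flushing writes one sorted run sequentially -- the cheap,
        # append-friendly IO pattern LSM engines are built around.
        self.sstables.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.sstables:  # newest run wins
            for k, v in run:
                if k == key:
                    return v
        return None
```

Reads pay for this write speed by potentially checking several runs, which is why real LSM engines compact runs together in the background and use bloom filters to skip runs that can't contain a key.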

A more subtle technical issue is how many files the storage engine creates. WiredTiger creates a file for every collection and every index you create. For Parse, that doesn't work very well: they have "millions of collections and tens of millions of indexes", which means a lot of files, most likely enough to have an impact on performance at the file system level. RocksDB doesn't have that problem, says Parse, which is especially important given their use case of letting numerous users create numerous collections of data, each of which may (or may not) be subjected to a range of different queries. It's a rare, if not nearly unique, use case, so when Parse starts publishing test results, that has to be borne in mind.
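To see why the file count matters, a rough back-of-envelope using the lower bound of Parse's stated figures (the file-per-collection-and-index model is WiredTiger's; the round numbers here are illustrative):

```python
# Back-of-envelope file count under a WiredTiger-style layout, where
# each collection and each index lives in its own file. Figures are the
# lower bound of Parse's "millions of collections and tens of millions
# of indexes" and are illustrative, not Parse's exact numbers.
collections = 1_000_000
indexes = 10_000_000

# One file per collection plus one per index.
wiredtiger_files = collections + indexes
print(wiredtiger_files)  # 11,000,000 files for the filesystem to manage

# An LSM engine like RocksDB instead keeps a bounded set of SSTable
# files per database, independent of how many logical collections and
# indexes are stored inside them.
```

Eleven million open or openable files is the kind of load where directory lookups, file descriptor limits, and filesystem metadata caching start to dominate, which is the pressure Parse is describing.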

But that last issue encompasses an important point. Different storage engines store data in different ways, and for all the variations in performance between them, there's an equally wide range of ways they can stress the underlying OS, filesystem, SSDs, RAM or CPU. From a deployment point of view, you have to treat them like completely different databases. If you've tuned your OS and resources for database X and then switch to brand new database Y, odds are you aren't going to get the best performance, and because it's a new database, there are no best practices for database Y yet.

It's all about gathering data: from our own testing, from your testing, and in this particular case, from Parse testing RocksDB against WiredTiger. When you want perfectly fitting containers for your databases, the rule is: measure at least twice, deploy right first time.