Optimizing MongoDB Queries with Elasticsearch

There is a school of thought that says one database should be able to cover all your database needs. Tsk. When we hear the question "So, which should I choose for my application, MongoDB or Elasticsearch?" our default answer, for so many database questions and especially this one, is "depends". It depends on your application, its search needs, how you want it to manage the data and what you really want from your database.

MongoDB vs. Elasticsearch

Let's start with MongoDB. At its core its a fairly traditional document database. It stores JSON documents in collections, but not before quietly converting them into BSON, a binary version of JSON that is designed to make it easier to traverse arbitrary-sized fields and values in memory (and on disk). BSON adds more types to JSON's very simple model of numbers and strings to make things easier and more efficient to process. Its roots are very much, though, in the realm of traditional database even if it is storing documents.

With Elasticsearch, its roots show through too, though its heritage is that of the search engine and the Lucene text search library. It too stores JSON documents into what it calls indices. There's no conversion to a binary format, but the content of the document is analysed and indexing created based on that analysis. The analysis itself is driven by indicating the values of various fields have types and that type information is used to better understand and index those fields. That's the search engine heritage showing through making the stored data amenable to lots of different ad-hoc queries.

Pick one, any one…

So, if digging through your data in different ways is important then you probably want Elasticsearch in your solution. As a rule of thumb we use in house, if you need any more than five or so indexes on a MongoDB collection, then that collection is a prime candidate for putting into Elasticsearch.

You may wonder why the number of indexes is important. Every time a record is inserted or updated, each one of those indexes has to be updated. The bigger the indexes, the harder the task. The more indexes there are, the more times that hard task has to be carried out. This applies to most databases, but none more so than the current MongoDB.

Not just Elasticsearch

So, you may say, why not use Elasticsearch as your main database. With only a few indexes, MongoDB is as fast as most applications need and if you need performance then a MongoDB schema tuned for minimal indexes is ideal. It'll outperform Elasticsearch with queries on the similar indexing. But there's a more important reason not to use Elasticsearch - it can lose write operations. The problem is known and well explored. When Elasticsearch finds itself splitting and reforming the cluster for whatever reason, writes can get lost. The problem comes from the search engine roots of Elasticsearch and it means that Elasticsearch is not a good pick for your main database.

Elasticsearch is good as a search engine and one task it is especially popular for is digging through logs. Logs are vast amounts of semistructured data and they never get updated. It's like the ideal use case for Elasticsearch and its really good at giving the ad-hoc analysis and search results on that kind of data.

The power of two

Before you assume the question of which database to use is answered though, you'll notice that question hanging in the air about MongoDB's indexes and their effect on index performance. This is where Elasticsearch comes back into play. If you keep your MongoDB indexes minimal and simple, but then also feed all your records into Elasticsearch as a secondary database, what you get is a database tuned for efficient transactions and a secondary database tunable for speedy, complex and comprehensive searches.

So how do you build a multi-database solution like this? There's three routes. One is to use drivers for MongoDB and Elasticsearch and write your application to use both when writing and updating. This is entirely customisable, but you will probably end up abstracting your read and write operations out so no one has to see the complexity underneath.

Another option is to use a bi-lingual driver like Mogoostastic which plugs into the Node.js ORM Mongoose and lets you select fields which will be indexed in Elasticsearch as well as stored in MongoDB. A .search() method is then added to let you look up records using the Elasticsearch Query DSL but retrieving the MongoDB records. Of course, that does tie you to the ORM and that plugin, but its worth considering for new applications.

The third and final route to multiple databases is the Compose way. Run your application with MongoDB and let Compose's Transporter application keep a synchronised copy of that data in Elasticsearch. With that synchronised Elasticsearch, you are free to extend your application to query it using the Elasticsearch API and DSL while continuing working with MongoDB. That means you retain choice as to which languages, drivers and ORM your application uses and it's easier to backport this solution into a legacy codebase.

Two for power

No one database will fulfill all your storage and search needs. There will always be compromises made in general purpose databases which allow specialised databases to shine. MongoDB is very much a general purpose database and Elasticsearch specialises in search, so it isn't surprising to find that they complement each other.

Similar pairings could be made with PostgreSQL or RethinkDB (also available on the Compose platform) and Elasticsearch but the MongoDB/Elasticsearch combination is more common. That's in part because, before MongoDB 2.6. MongoDB's full text search wasn't good enough and Elasticsearch's text search was strong.

Now, MongoDB 2.6 has a "good enough" single field text index but for anything richer than that, your first stop is Elasticsearch. More importantly though, Elasticsearch is capable of performing complex searches and providing the data for features such as autocomplete in your applications. By using Elasticsearch for that type of work, you also move work away from your MongoDB servers and spread the load.

So, to sum up, choose MongoDB for general application use and add Elasticsearch for when you need rich full text searching. It's not an either/or question at Compose where we understand how the power of two (or three or more) database technologies can drive your application.