At the MongoDB 2.6 Index Intersection


Although MongoDB 2.6 is still some way off, some of the major changes which will make the next release are now getting to see the light of day. One important change, index intersection, has been discussed for some time as it addresses one of the oddities of MongoDB – it can currently only use one index at a time when handling queries. Now, MongoDB’s CTO has confirmed in a discussion that MongoDB 2.6 will by default, be able to use more than one index when planning a query. The next question would logically be “how many indexes” and the code that has landed in the development tree for MongoDB version 2.5.5 answers that question: 2 indexes, though with the rider that this limit many increase over time. To understand what this means in practice though, we need to look at MongoDB 2.4’s use of indexes.

Let’s start with an example of a simple document:

{ firstname: "Theresa", lastname: "Prohaska", city: "East Haylie" }

There are three fields - four if you include the “_id” field which will be generated when this document is saved - in our document. Now, without indexes, any query will need to scan every document stored. If we index the lastname field, then issue a query including lastname, its index will be used to thin down the field of documents to be scanned. If we index the firstname too, and issue a query including the lastname and the firstname, the query planner doesn’t use both indexes but instead tests to see which field’s index runs fastest the first time it sees the query and uses that from then on. It’s not optimal, especially as indexes cost memory.

The traditional MongoDB workaround for that problem is compound indexes. These are indexes made up of more than one field. You can think of them as a declaration that you intend to search on this particular combination enough that it's worth using the memory and resources to maintain this particular combination. And there’s more catches to compound indexes. For example, the order the fields are defined in the compound index determine how useful the index will be in supporting queries that use the fields, so if we compound indexed our example like so…

db.collection.ensureIndex( { firstname:1, lastname:1 } );  

The index would be useful for query including firstname and lastname or firstname only, but would be no help for lastname only; you’d need to index lastname separately too. This is a simple example and in practice you may have many combinations of fields you’d want faster queries for. You could create an index for each combination but each index would cost memory and write performance.

The catches don’t stop there; indexes help in sorting, but with compound indexes can only boost sorting if the sort order of the index is the same as, or exactly opposite to, the query. Our example compound index would help out with both fields sorted ascending:

db.collection.find().sort( { firstname:1, lastname:1 } )  

and both fields sorted descending:

db.collection.find().sort( { firstname:-1, lastname:-1 } )  

but be unable to offer any assistance for one field ascending and one descending like:

db.collection.find().sort( { firstname:1, lastname:-1 } )  

So a compound index can be very useful but it’s a very specific performance optimization with a cost. If only there was a more generic solution…

Enter Index Intersection

The index intersection technique has been around for many years in the world of traditional relational databases; by using independent indexes together it gives the query planner more flexibility to find an optimal way to answer queries and minimize scanning of the stored data. What MongoDB 2.6 will have is the ability to use two indexes when planning a query. If we assume our example documents now just has an index on firstname and another on lastname:

db.collection.ensureIndex( { firstname:1 } )  
db.collection.ensureIndex( { lastname:1  } )  

When a query is made the query planner can now use both the independent indexes in working out how to resolve the query by looking for where the indexes cross over, or intersect. This in turn cuts down further on how many documents have to be scanned to fulfill the rest of the query’s specification. For collections which have a compound index with two fields, as long as the fields involved have their own indexes, it should be possible to drop the compound index and let the query planner handle it. Less indexes mean less write contention too as there’s less to be updated. But, if the fields aren’t already indexed, then you’ll need to consider whether you want to create two indexes to replace that one index.

If you rely on the indexing for boosting sort performance, there's a catch there too: as of MongoDB 2.5.5, sorting isn't able to use indexes in the same way as the query planner does for queries. So, disposing of a compound index may well slow down sorting and you will also need to check where the compound index comes into play with sorting. Where there are many fields in a compound index, the planner changes will have a limited effect, unless the compound index fields and another indexed field are referenced in a query; a compound index appears to be treated by the planner as just another index.

That initial limit of 2 indexes is also going to make decisions about when or if you should change indexes a bit trickier. If more indexes were available to the query planner, then these decisions would be simpler, but MongoDB 2.6 is going to be limited to 2 indexes. With that in mind, MongoDB users are going to have an opportunity, before 2.6 is eventually released, to look at how they are indexing their collections, especially the ones with lots of indexes, and seeing whether they can win performance or memory by letting the new query planner optimize with index intersection. Until a later MongoDB gets the ability to intersect more than two indexes and expands support for intersection to sorting, it is going to be a delicate performance balancing act for users.

MongoHQ customers who may have any questions about how 2.6's index intersections could be harnessed should contact us.

Dj Walker-Morgan
Dj Walker-Morgan was Compose's resident Content Curator, and has been both a developer and writer since Apples came in II flavors and Commodores had Pets. Love this article? Head over to Dj Walker-Morgan’s author page to keep reading.

Conquer the Data Layer

Spend your time developing apps, not managing databases.