MongoDB & Indexing: Best Practices Redux

Back in May last year, we talked about a set of best practices for indexing and MongoDB, focusing on performance. Since then MongoDB 2.6 has arrived and it's a good time to update that list.

One index per query

MongoDB can only use one index in any one query operation. Although MongoDB 2.6 has introduced index intersection, which allows more than one index to be used (specifically two indexes currently), there are a number of caveats on their use and one index, whether singular or compound, is still easier to manage and optimise around.

One "multi-value" operator per query

There's a range of selectors we call "multi-value" operators. They can return records that match a variety of different values. They include $gt, $gte, $lt and $lte, $in and $nin, $ne, $not and $near. If you need to use one, use one and no more. This is because the "multi-value" operators can return an unpredictable number of values and are more likely to be scanned to be evaluated - the more you have to scan, the slower the query.

"Multi-value" fields last in an index

If you're indexing a couple of fields, and one of the fields you expect to be queried using one of those "multi-value" operators we've previously mentioned, then it is best to place them at the end of the index. Ideally, the index should be ordered so that the fields you query for exact values come first and the "multi-value" operators come last in the index. A wrinkle in that rule of thumb are fields that are sorted against – they should come between the exact and "multi-value" fields to minimise the amount of in-memory sorting needed.

Move logic into the schema

Some things work better with preparation. Consider a situation where you want to find out what time-stamped documents match a particular day. You have a time field for when the document was created and you reach for the find method and write...

db.coll.find(time: { $gte: ISODate('2014-06-23'), $lt: ISODate('2014-06-24') } )  

...a date greater or equal to midnight of the day we are interested in and less than midnight the day after. There's going to be a lot of scanning to get the results for that query and it's all avoidable.

We're interested in days, so we could create a day_bucket field in the schema and populate that with just the day...

db.coll.insert({ time: ISODate('2014-06-23T14:57:13.556Z'),day_bucket:ISODate("2014-06-23"});  

Any document for this date will have the same day_bucket value, so now we can query for a simple equivalence:

db.coll.find(time:ISODate('2014-06-23') )  

And a simple equivalence is a lot easier to scan for and index. By using techniques like this, you can optimise a range of database-stressing operations into one compact query.

Just enough indexes

The temptation to create many indexes for all possible queries can be high, but don't succumb to that temptation. Every index created has a cost at insert time of longer write locking or more work for background threads. And the chances are that far from all the indexes are actively used, especially when you recall what we said about one index per query. So if you have a collection inserting slowly or locking the database for too long, get a list of what indexes there are with db.coll.getIndexes() and make sure you need them. And if you do need them, make sure the queries that use them are making best use of the indexes.

Indexes in RAM

It may seem obvious, but for best performance of indexing, the indexes should be kept in RAM. It goes with the previous best practice: those unused indexes will all jostle for a seat in the RAM and marching them to and from disk is not going to be an efficient use of your databases time. Make sure your indexes fit in RAM and don't have to fight for their right to memory.