MongoDB Indexing Best Practices

Claiming “Best Practice” on any topic is an expansive statement, but we will give it a go with some high-level anti-patterns and best practices. Some of these are general statements about MongoDB performance.

Indexing Constraints

The following constraints are in effect as of the MongoDB 2.4.x series. There is talk of adding features in later versions that will remove some of these limitations. Even if they are removed, working within these constraints will continue to lead to better performance.

With MongoDB, a query:

- Can only use one index.
- Can only use one “multi-value” operator (e.g. $nin, $in, $nor, $gte, $gt, $lt, $lte, $near). Yes, that does include sorting. However, you can effectively use a range query and a sort on the same field.
- Must have the multi-value operator on the last field used in the index.
- Depends on indexes staying in memory. RAM is fast, disk is slow; you must keep indexes in RAM.

// A bad query: the range on `time` and the sort on `user_id` are two multi-value operations
db.coll.find({action: 'run-process', time: {$gte: ISODate('2013-05-06'), $lt: ISODate('2013-05-07')}}).sort({user_id: -1})

// A better query using date "bucketing"
db.coll.find({action: 'run-process', day_bucket: ISODate('2013-05-06')}).sort({user_id: -1})  
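To verify which index a query actually uses, check the plan with explain(); a quick sketch (in the 2.4-era shell, a "BtreeCursor" entry in the output names the index being used):

// Inspect the query plan; look for a BtreeCursor entry naming the index
db.coll.find({action: 'run-process', day_bucket: ISODate('2013-05-06')}).sort({user_id: -1}).explain()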

Commonly, when optimizing queries, you move logic from the query into the schema. Instead of bounding the date field in the query, we store a date bucket and let the schema carry that logic.
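As a sketch of what the bucketed schema and its supporting index might look like, assuming documents in `coll` shaped like the queries above (the `day_bucket` field and index definition here are illustrative, not from the original schema):

// At write time, store the timestamp truncated to midnight UTC alongside the precise time
db.coll.insert({
  action: 'run-process',
  user_id: 42,
  time: ISODate('2013-05-06T14:23:00Z'),
  day_bucket: ISODate('2013-05-06')
})

// A compound index that serves the bucketed query: equality fields first,
// the sort field last, per the constraints above
db.coll.ensureIndex({action: 1, day_bucket: 1, user_id: -1})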

These indexing constraints will set you free. Working within them forces you to model your data deliberately, and that discipline leads to a better distributed system.

Too Many Indexes

Everyone thinks about the indexes they need; not everyone thinks about the indexes they can remove.

Recently, I helped a customer optimize his database. Write lock on the database was running consistently at 95%, CPU was spiking, and the result was a poor experience. We looked at the indexes and determined the customer had too many (126 non-_id indexes), 24 of them on a single collection. For every insert, and every update that modified indexed keys, MongoDB had to write the document and then update up to 24 indexes. This was the cause of the excessive write lock.

The first thing we did was delete all of his indexes (do not try this on a database larger than 15 GB). Immediately after deleting the indexes, write lock went to 0%. The tradeoff for removing indexes was page faults on reads. Even so, reads were faster with no indexes than they had been while the database was struggling to update too many indexes.
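For reference, a minimal sketch of that reset from the mongo shell, assuming a collection named `events` (the name is hypothetical; dropIndexes() removes every index except the built-in _id index):

// List the collection's current indexes
db.events.getIndexes()

// Drop all indexes except the built-in _id index
db.events.dropIndexes()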

Once the indexes were removed, we turned on system profiling from the Database > Admin page. After letting the database run for a few hours, a short analysis of the system.profile collection told us which queries mattered, and we added back 6 essential indexes. Performance was drastically better.
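The Database > Admin page is part of our hosted UI; from the mongo shell, the equivalent would look something like this sketch (the 100 ms threshold is an assumed starting point, not a figure from the original analysis):

// Profile level 1 records only operations slower than the threshold (here, 100 ms)
db.setProfilingLevel(1, 100)

// ...run under real traffic for a few hours, then inspect the slowest operations
db.system.profile.find().sort({millis: -1}).limit(10)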

Be precise with your indexes for the sake of RAM and write lock.
