The Coming of the MongoDB Document Locks

We recently looked at the plans to include pluggable storage engines in MongoDB 2.8. That's not the only major feature planned for MongoDB 2.8 – another is improved concurrency through the introduction of document level locking. To understand what that means, it's worth looking at locking and MongoDB's history.

Contention for resources is the predominant limiter of performance for any database system. Whether it's CPU, network or disk, any operation that consumes a scarce resource has to work within that scarcity. One of the scarcest resources in MongoDB, as in other databases, is the right to read from and write to the database. That scarcity, though, is deliberate and artificial.

Concurrent access to the MongoDB database is managed by each operation getting hold of a lock. When an operation starts, it asks for the lock; if the lock isn't available, it waits and asks again later. Read operations can ask for the lock and share it with other read operations. But when a request to write to the database comes along, the write operation takes exclusive control of the lock: only it can read or write while it holds the lock.
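That shared-read, exclusive-write behaviour is a classic readers-writer lock. MongoDB's lock manager lives inside the server and is written in C++, but a minimal Python sketch of the idea looks like this (illustrative only, not MongoDB's actual code):

```python
import threading

class ReadersWriterLock:
    """Illustrative readers-writer lock: any number of concurrent
    readers, or exactly one exclusive writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # number of readers currently holding the lock
        self._writer = False   # True while a writer holds the lock

    def acquire_read(self):
        with self._cond:
            while self._writer:          # readers wait out any writer
                self._cond.wait()
            self._readers += 1           # readers share the lock

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:       # last reader out wakes waiters
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()        # writer needs the lock to itself
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

The key property is visible in `acquire_write`: a writer waits until there are no readers and no other writer, while `acquire_read` only waits out a writer, never other readers.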

The lock ensures that no read operation sees partially updated data, so things stay consistent. The presence of a lock is not where the performance problem lies, though. The performance problem comes from the scope of the lock: how much gets locked when the lock is engaged.

Before version 2.2 of MongoDB, the lock's scope was huge: the entire MongoDB process. Every database and every collection being handled by the process would be locked for a write to take place. This meant one database could be locked, despite no operations taking place on it, simply because another database was being written to.

Version 2.2 saw the introduction of database level locking, which specifically improved that situation. The scope of a lock was now restricted to just the database the read or write operation was working with (with the reasonable exception of operations affecting multiple databases, which still took a global lock). This improves things for a lot of use cases. Where a database is dominated by read operations, the issue isn't visible; where there are only occasional writes, the lock isn't an issue.
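Conceptually, database level locking means the server keeps one lock per database plus a global lock for cross-database work. A toy sketch of that registry (the class and method names here are hypothetical, for illustration only) might look like:

```python
import threading

class DatabaseLockManager:
    """Sketch of database level locking as in MongoDB 2.2-2.6:
    one lock per database, plus a global lock for operations
    that span multiple databases. Not MongoDB's real code."""

    def __init__(self):
        self._global = threading.RLock()
        self._db_locks = {}                 # database name -> lock
        self._registry_lock = threading.Lock()

    def lock_for(self, db_name=None):
        # Operations touching multiple databases take the global lock.
        if db_name is None:
            return self._global
        # Otherwise, each database gets its own lock, created on demand.
        with self._registry_lock:
            return self._db_locks.setdefault(db_name, threading.RLock())
```

The win over process-wide locking is that a write to one database only acquires that database's lock, leaving every other database free for concurrent reads and writes.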

But where there are a lot of simultaneous writes, or a large volume of writes, the lock issue comes back into play. Examples include time series data or logging from a number of different clients. The lock is, by design, greedy, so if a lot of writes are queued up, readers will be starved of attention. People have tried to work around this in various ways: sharding databases so the write load is spread across them, using one server as a write database and then replicating the updates out to other servers, or trading write commitment for a faster lock release. But the root cause of the problem is that the database is being locked.
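The reader-starvation effect can be shown with a toy model of a writer-greedy lock queue. This is a deliberate simplification for illustration, not MongoDB's actual scheduler: here, any queued write is always granted before any waiting read.

```python
from collections import deque

def schedule(ops):
    """Toy writer-greedy scheduler: operations whose names start
    with 'w' are writes and always jump ahead of waiting reads."""
    queue = deque(ops)
    order = []
    while queue:
        writes = [op for op in queue if op.startswith("w")]
        if writes:
            op = writes[0]       # a queued write always wins...
            queue.remove(op)
        else:
            op = queue.popleft() # ...reads only run once no writes wait
        order.append(op)
    return order

# A steady stream of writes pushes every read to the back:
# schedule(["r1", "w1", "r2", "w2"]) grants both writes first.
```

Under a sustained write load, this policy means reads wait indefinitely, which is exactly the starvation the workarounds above try to route around.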

That's where the promise of 2.8 comes in with document level locking. The scope of the lock is pulled in further than ever before: when a write occurs, only the documents involved in the write operation will be locked. There will, of course, be times when a write operation affects the whole collection, and there should be collection locking to handle that, and there will still be times when operations spanning multiple databases invoke a global lock.
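The scope an operation ends up taking, on that description, depends on what it touches. As a hedged sketch of the decision (the field names here are hypothetical, not MongoDB's API, and the real rules are certain to be more involved):

```python
def lock_scope(op):
    """Rule-of-thumb sketch of the lock scope an operation might take
    under document level locking. Hypothetical, not MongoDB's logic."""
    if op.get("multi_database"):      # e.g. operations spanning databases
        return "global"
    if op.get("whole_collection"):    # e.g. dropping or reindexing
        return "collection"
    return "document"                 # the common case: lock only the
                                      # documents being written
```

The point of the hierarchy is that the common case, an ordinary insert or update, falls all the way through to the narrowest scope.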

Document locking will improve performance under write loads: where you have a good mix of inserts, updates and deletes, there will be less contention. But the arrival of document locking may also reveal some design patterns which won't benefit from the improvements. For example, if you have a document holding details about a device plus an array used as a time series of samples from that device, every sample still updates that one document, so you'll still have contention on it.
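To make the contrast concrete, here are the two schema shapes as plain Python dicts: the embedded-array pattern, where every sample lands in one document, versus one document per sample, where concurrent inserts touch different documents. The function names are ours, for illustration:

```python
def embedded_series_update(device_doc, sample):
    """Embedded-array pattern: every sample appends to one array,
    so every insert contends on the same document's lock."""
    device_doc["samples"].append(sample)
    return device_doc

def per_sample_docs(device_id, samples):
    """One-document-per-sample pattern: concurrent inserts create
    separate documents, so their document locks never collide."""
    return [{"device_id": device_id, "ts": ts, "value": v}
            for ts, v in samples]
```

With the first shape, document level locking buys you nothing for that workload; with the second, simultaneous writers from many devices, or even many samples from one device, no longer queue behind a single lock.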

We can't, unfortunately, say much more about MongoDB's document level locking. With just over three months of the year left, the feature ticket still has more open sub-tasks than closed ones. That said, a prototype lock manager arrived in 2.7.3, and the work to pull the existing global and database locks into the lock manager has been completed. The current schedule sees many of the outstanding tasks due to be completed by 2.7.7, the next development release, which should arrive soon given that 2.7.8 is pencilled in for mid-October. Around then, we'll come back and see if we can benchmark the new document level locking.