MongoDB 3.2 – A First Forward Look

MongoDB 3.2 is expected by the end of 2015 and, after the presentations at MongoDB World, we thought it may be useful to take a first look at some of the features spotlighted. Most of these features are still in development and, despite being shown, there is always the possibility that they will change by the time 3.2 actually arrives.

Schemas?

There was a lot of talk about schemas at the conference. For a "schema-less" database, as MongoDB has been promoted, this may seem odd, but it does appear that MongoDB, Inc. has rediscovered that a regular structure to the documents stored in the database can be helpful for managing the evolution of a database.

What this actually comes down to is a new, paid-for MongoDB Enterprise tool, Scout, which scans collections and reverse engineers a schema from them. It also offers suggestions about using new features in 3.2 to make the collection more useful and regular, features like...

Validations

One of those new features, in the open source version of MongoDB, is the ability to add validation to fields in a document. The feature, SERVER-18227, lets a collection have a validator specified as part of the collection's metadata. The validator is a match expression which must be true for a document to be inserted or updated. If it doesn't pass, then the change will be rejected with error 121, a DocumentValidationFailure.

But there are limitations. First of all, the validator has to be pretty simple in terms of match expressions; greater than, less than or exists seems to be it. No geomatching with $near, no $text searching and no $where expressions.

To set the validator, you can either do it at collection creation time, as createCollection now takes a validator option, or you can do it through the collMod command, something like:

db.runCommand({"collMod": collName,  
               "validator" : {a: {$exists: true}}})

That example checks that a field "a" exists. If you are considering modifying the validator on the fly, be aware that there's no get function for metadata, so you'll need to get the collection stats, which should include the existing validator. Then you can modify it and set it again with collMod.
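
For the creation-time route, here's a minimal sketch, assuming a hypothetical orders collection whose documents should always carry a positive price field:

db.createCollection("orders",
        { validator: { price: { $exists: true, $gt: 0 } } })

// With that in place, this insert should be rejected with error 121,
// a DocumentValidationFailure, as it has no price field
db.orders.insert({ product: "widget" })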

There's more to bear in mind about validators though. First, they only apply to insert and update operations, which means existing data in the collection is not validated... right up until you update an existing document and then, unless the update leaves the document unchanged, the validator will be applied. So, if you want to turn on validation, you'll want to scan your existing collection first and make all documents conform, or add validation failure handling to all insert and update operations. Or you could give your users the bypassDocumentValidation permission and let them set the bypassDocumentValidation flag, but that would be defeating the purpose of validation. Those permissions and flags are, by the way, there for administration tasks like restoring a partially conforming collection.
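
As a sketch of that administrative escape hatch, the raw insert command takes the flag directly; this assumes the hypothetical orders collection from above:

db.runCommand({ insert: "orders",
        documents: [ { product: "legacy widget" } ],
        bypassDocumentValidation: true })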

Partial Indexes

The other server-side feature related to schemas is called "Partial Indexes", a feature which has been floating around MongoDB's JIRA since 2010. It's best explained by an example of where you'd use it. So, imagine you have a collection of all the customers you've ever had, active and inactive. For day-to-day use you want good query performance for the active users. One way to get this would be to have two collections, one of active users which is indexed and one of inactive users which isn't, but that means changing your app to ensure users are in the right collections. Or you could use partial indexes, which only index a document if it passes a filter expression. This might be done like this:

myusercoll.createIndex({ name: 1 },
        { partialFilterExpression: { status: { $eq: "active" } } })

Now, this could be a huge performance improvement with very large collections: documents that don't match the filter are not just skipped at query time, they never enter the index at insert or update time, so the index stays small. That will, though, entirely depend on the structure and density of the fields being used in the indexing process.
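
The flip side, presumably, is that a query can only use the partial index if it restricts itself to documents covered by the filter, so it would need to include the status condition too:

myusercoll.find({ name: "Fred Smith", status: "active" })  // can use the partial index
myusercoll.find({ name: "Fred Smith" })  // can't, as inactive users could match too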

Lookup!

One thing that MongoDB doesn't have is any kind of join between collections. For a lot of tasks, you don't need joins, but when you're pulling your data together to analyse it, then you probably want to use a join. The MongoDB Inc advice around this has, historically, been to denormalise your data a bit more, copying the relevant data from a different collection into the collection you plan on having at the core of your analysis query. Well, that works, as long as you ensure the collections stay in sync, at least for day-to-day application work, but when it comes to analytics you really can't go copying everything into everything else.

The core tool in MongoDB's analytics story is aggregation. It lets you create a pipeline through which your selected documents can be passed, performing various operations to burn down to the information you are after. You could, say, aggregate your orders collection and, in the pipeline, have an operator which matched only particular classes of product ordered, then another operator that grouped the total sales of each product class.
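
That two-stage pipeline might look something like this sketch, assuming hypothetical class and amount fields on each order document:

db.orders.aggregate([
        { $match: { class: { $in: [ "garden", "kitchen" ] } } },
        { $group: { _id: "$class", totalSales: { $sum: "$amount" } } }
])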

The pipeline has only ever worked with one collection's documents, so if, say, there was another collection with the product descriptions in it, you couldn't use that information in the aggregation. At least, that is, until MongoDB 3.2, because it is adding a $lookup operator to bring in data from another collection.

A $lookup has a from parameter, the collection from which you want to pull data. There's then an on parameter which says which field to use in that collection and which field in the pipeline's documents it should match. Finally, when a document is found, it gets inserted into the pipelined document, and an as parameter lets you specify the key name of the field under which it's inserted. If this sounds a bit lumpy, making the document bigger, don't worry, because the rest of the aggregation operators will be there to slice and dice the data back down to size.
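
Going only on the parameters as described at MongoDB World (the shipped syntax may well differ), a stage pulling product descriptions into a hypothetical orders pipeline might look like:

db.orders.aggregate([
        { $lookup: { from: "products",      // collection to pull data from
                     on: "productId",       // field to match, as previewed
                     as: "productInfo" } }  // key the matched data is inserted under
])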

$lookup has a lot of potential in the aggregation pipeline and may allow some users to un-denormalise some of their collections. We will have to wait for alpha/beta releases to get some idea of how efficient $lookup will be in practice though.

Concluding

This is the first look at the database level operations that we should expect in MongoDB 3.2. All three features here are addressing pain points in the MongoDB architecture from inside the server. When an alpha/beta release of MongoDB 3.2 arrives, we'll be able to see if the user-facing side of the server gets any more improvements. Most of the other MongoDB 3.2 changes are with storage engines, authentication, integration and replication, something we'll cover in the future.