Node.js, MongoDB & Pool Pollution Problems

With their shared affinities for JavaScript and JSON, Node.js and MongoDB are often referred to as perfect partners. Together, they make developing server-side applications fun and storing JSON data direct and easy. The one thing to watch out for is that the intricacies of Node.js make it easy to fall into bad patterns, such as creating too many connections to the database and overloading it. To avoid this bad pattern, we need to start with some Node.js basics.

The Problem

Node.js runs its JavaScript code sequentially on a single thread within one process, constantly passing through an event loop provided by libuv. IO operations happen asynchronously: code that starts an IO operation registers a callback function to be invoked when the IO is done. If we take the simple example of using the Node.js native driver for MongoDB…

var mongodb = require('mongodb')
  , MongoClient = mongodb.MongoClient

MongoClient.connect(process.env.MONGOHQ_URL, function(err, db) {
  if (err) throw err // could not connect
  db.collection('users').find({}).toArray(function(err, users) {
    if (err) throw err // query failed
    console.log(users)
  })
})

We put the processing within the callback function for the MongoClient.connect() call. Running this, the code prints the contents of the users collection and never exits, because the database connection is still open and keeps the event loop alive, waiting for further events.
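The callback flow described above can be seen without a database at all. The following is a minimal sketch in which fakeIO is a hypothetical stand-in for an asynchronous call such as a query; it uses a timer to imitate IO completing later. It shows that the code after the call runs first and the callback runs only when the event loop comes back around to it.

```javascript
// Records the order in which each step runs.
var order = [];

// Hypothetical stand-in for an asynchronous IO call such as a database query.
// A timer imitates the IO finishing some time later.
function fakeIO(callback) {
  setTimeout(function () {
    callback(null, 'result');
  }, 10);
}

order.push('before IO');

fakeIO(function (err, data) {
  // Runs later, from the event loop, once the "IO" is done.
  order.push('callback: ' + data);
  console.log(order.join(' -> '));
  // prints "before IO -> after IO -> callback: result"
});

// Runs immediately; the call above did not block.
order.push('after IO');
```

Unlike the database example, this program exits once the timer has fired, because nothing is left open for the event loop to wait on.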

Now, say we have a simple web server; here’s one using the express framework which responds to requests for “/users” with a simple message:

var express = require('express')  
  , app = express()

app.get('/users', function(req, res) {
  res.send("Would send users")
})

app.listen(1337)  

The “obvious” way of bringing the code together would be to replace the message with the database processing code like so…

var mongodb = require('mongodb')  
  , MongoClient = mongodb.MongoClient
  , express = require('express')
  , app = express()

app.get('/users', function(req, res) {
  MongoClient.connect(process.env.MONGOHQ_URL, function(err, db) {
    db.collection('users').find({}).toArray(function(err, users) {
      res.json(users)
    })
  })
})

app.listen(1337)  

And if you run this program, it will work in the worst possible way. If we keep reloading the page and then log into the MongoDB server with the shell and run db.serverStatus().connections, we will find that after relatively few reloads we’ve eaten up more than half of the server's connections:

> db.serverStatus().connections
{ "current" : 526, "available" : 293, "totalCreated" : NumberLong(657) }

The problem is not only that we are creating a new connection to service every request; we are in fact creating five. The MongoDB driver for Node.js is built to be shared and can handle multiple requests. To allow for this, each call to MongoClient.connect() opens five connections to the server and places them in a pool, ready to service database requests. By connecting inside the request handler, we’ve defeated that pooling: every request creates a fresh pool, and none of them is ever closed.
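To make the arithmetic concrete, here is a small simulation rather than real driver code. fakeConnect, serverConnections, connectPerRequest and connectOnce are all hypothetical names invented for this sketch, and the pool size of five is the figure quoted above. It only counts how many connections the server would see under each pattern.

```javascript
// Assumption taken from the text: each connect() call opens a pool of five.
var POOL_SIZE = 5;

// Hypothetical stand-in for the server's count of open connections.
var serverConnections = 0;

// Hypothetical stand-in for MongoClient.connect(): opens a whole pool.
function fakeConnect() {
  serverConnections += POOL_SIZE;
  return { query: function () { return []; } };
}

// Anti-pattern: connect inside every request handler.
function connectPerRequest(requests) {
  serverConnections = 0;
  for (var i = 0; i < requests; i++) {
    fakeConnect().query();
  }
  return serverConnections;
}

// Fix: connect once, then share the pool across all requests.
function connectOnce(requests) {
  serverConnections = 0;
  var db = fakeConnect();
  for (var i = 0; i < requests; i++) {
    db.query();
  }
  return serverConnections;
}

console.log(connectPerRequest(100)); // prints 500
console.log(connectOnce(100));       // prints 5
```

With the per-request pattern the count grows with traffic, while the shared pattern stays at one pool regardless of how many requests arrive.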

The Simple Fix

The easiest way to fix this problem is to open the database connection once and reuse it. That's as simple as swapping two lines of code so that

app.get('/users', function(req, res) {
  MongoClient.connect(process.env.MONGOHQ_URL, function(err, db) {
  ….

becomes

MongoClient.connect(process.env.MONGOHQ_URL, function(err, db) {
  app.get('/users', function(req, res) {
    db.collection('users').find({}).toArray(function(err, users) {
      res.json(users)
    })
  })
})

The drawback here, at least for the developer, is that all the request routing code now sits inside a function and that's not as maintainable as it could be.

An Alternative Fix

One thing about Node.js is that it maintains state while running its event loop. Set a variable at the top level of the program and it stays set from one request to the next. We can leverage that to create a db variable which we initialize at startup.

var mongodb = require('mongodb')  
  , MongoClient = mongodb.MongoClient
  , express = require('express')
  , app = express()

var db;

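MongoClient.connect(process.env.MONGOHQ_URL, function(err, database) {
  if (err) throw err // fail fast if the database is unreachable
  db = database
  app.listen(1337) // only start serving once db is set
})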

app.get('/users', function(req, res) {
  db.collection('users').find({}).toArray(function(err, users) {
    res.json(users)
  })
})

Taking it further, we could create a global variable for the collection, initialize it after connecting to the database, and use it in place of references to db.collection('users'), reducing the chance of errors creeping in where the collection name is mistyped.
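One way that idea might look is sketched below. The names users, init and listUsers are hypothetical, chosen for this illustration, and the wiring to MongoClient and express is shown only in comments so that the sketch stands alone. The collection name is spelled out exactly once, in init.

```javascript
// Handle to the users collection, set once after connecting.
var users;

// Hypothetical helper: call this once from the MongoClient.connect() callback.
function init(database) {
  users = database.collection('users'); // the only place the name is spelled
}

// Hypothetical route handler: uses the shared collection handle.
function listUsers(req, res) {
  users.find({}).toArray(function (err, docs) {
    if (err) {
      res.status(500).json({ error: 'query failed' });
      return;
    }
    res.json(docs);
  });
}

// Wiring, assuming the same app and MongoClient variables as in the listing above:
//
// MongoClient.connect(process.env.MONGOHQ_URL, function (err, database) {
//   if (err) throw err;
//   init(database);
//   app.get('/users', listUsers);
//   app.listen(1337);
// });
```

Keeping the handler as a named function also means the routing code no longer has to live inside the connect callback; only the registration does.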