Full Text Search with MongoDB & Node.js

Today we are going to look at full text search in MongoDB and how you can use it from Node.js. Before we go any further, you'll need a MongoDB 2.6 system. MongoHQ users will also need to be running on a paid account, either an Elastic Deployment or a dedicated account, because we don't support full text search on the free sandboxes.

Now, full text search in MongoDB is a language-sensitive text search engine which scores matches. There's one big gotcha you need to know up front: you can only have one full text index per collection. That index can cover more than one field, but every text search on the collection runs against that single index, so you cannot keep separate text indexes on different fields and search them independently. If your schema is that complex and needs full text searching, there are better solutions like Elasticsearch.

To explore full text search, we're going to create a small and very unpolished web application and walk through the various parts. We open, as always, by bringing in the libraries we need. We will be using the Express web framework and the MongoDB Node.js native driver, so...

var express = require('express');  
var bodyParser = require('body-parser');  
var mongodb = require('mongodb'),  
  MongoClient = mongodb.MongoClient;
var assert = require('assert');  
var util = require('util');

var app = express();  
app.use(bodyParser.json());  
app.use(bodyParser.urlencoded({ extended: true }));

var db;  

We also do a little setup here to use the Express middleware body-parser and create a global variable for db access - this is a technique we discussed in a previous blog posting. With all the setup out of the way, we are going to first connect to the database and open a collection called "textstore" (which, obviously, you should change if you already have a collection with that name in your database):

MongoClient.connect(process.env.MONGOHQ_URL, function(err, database) {  
  db = database;
  // strict mode makes the driver report an error if the collection is missing
  db.collection("textstore", { strict: true }, function(err, coll) {

We capture the error in the callback and check whether it is non-null. If the collection doesn't exist, we can create it:

    if (err != null) {
      db.createCollection("textstore", function(err, result) {
        assert.equal(null, err);
      });
    }

... which is a handy snippet to remember when you are making example applications or programs that you want to bootstrap themselves safely. The assert.equal(null, err); is a quick way to make the program halt if there is an error.

Now we can create our index for full text search. We're going to have a document field in our documents which will contain the text. We'll also have a created field for timestamps, but that's not important right now. Where a traditional index specifies the field to be indexed and a value representing the ascending/descending order of the index, the full text search requires that value to be "text". So let's create that index:

    db.ensureIndex("textstore", {
      document: "text"
    }, function(err, indexname) {
      assert.equal(null, err);
    });
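
As a side note, the single text index a collection is allowed does not have to stop at one field: the key specification can list several fields, and a weights option lets one field count for more than another when matches are scored. The sketch below only builds the arguments for such an index. The helper name textIndexArgs, the index name "fulltext" and the extra title field are all made up for illustration and are not part of the example application.

```javascript
// Hypothetical helper: builds the key specification and options for one
// text index covering several fields. Field names and weights are examples.
function textIndexArgs(weightsByField) {
  var keys = {};
  Object.keys(weightsByField).forEach(function (field) {
    keys[field] = 'text'; // every indexed field gets the special "text" value
  });
  return {
    keys: keys,
    // "fulltext" is an arbitrary index name chosen for this sketch
    options: { name: 'fulltext', weights: weightsByField }
  };
}

// A match in the (hypothetical) title field counts ten times as much
// as a match in the document field.
var args = textIndexArgs({ title: 10, document: 1 });

// Inside the connect callback this could then be passed on as:
//   db.ensureIndex("textstore", args.keys, args.options, callback);
```

Our example sticks with the single document field, so the plain index above is all it needs.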

With our collection and full text index in place, let's get the web server up and listening:

    app.listen(3000);
  });
});

And, yes, we need to start adding some routes to that server so let's start with one for the root and one for an add page...

app.get("/", function(req, res) {  
  res.sendfile("./views/index.html");
});

app.get("/add", function(req, res) {  
  res.sendfile('./views/add.html');
});

The index page is nothing more than links to the add page and a search page to come. The add.html file is simply a form which HTTP POSTs whatever is entered into its text area.

<html>  
<body>  
  <form method="post">
    <textarea class="text" cols="80" rows="20" name="newDocument"></textarea>
    <br/>
    <input type="submit" value="Add" class="submitButton">
  </form>
</body>  
</html>  

We'd better handle that POST next in the code.

app.post("/add", function(req, res) {  
  db.collection('textstore').insert({
    document: req.body.newDocument,
    created: new Date()
  }, function(err, result) {
    if (err == null) {
      res.sendfile("./views/add.html");
    } else {
      res.send("Error:" + err);
    }
  });
});

Back in the setup, we pulled in the body-parser middleware, which makes this quite simple. All we do here is insert a new document with a document field set to the value of the form's text area, that is, req.body.newDocument, and we also put a date on it. If all goes well, we send the user the form again; if it doesn't, we display an error. The takeaway from this is that there's no special handling needed when inserting text for full text search.
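
Full text search itself needs no special handling on insert, but the handler above will happily store an empty document if the form is submitted blank. If you want to guard against that, something like the following might do. It is only a sketch, and the helper name toTextDocument is our own invention rather than anything in the example repository.

```javascript
// Hypothetical helper: turns a parsed form body into a document to insert,
// or returns null when there is no usable text in it.
function toTextDocument(body) {
  var text = body && typeof body.newDocument === 'string'
    ? body.newDocument.trim()
    : '';
  if (text.length === 0) {
    return null; // nothing worth indexing
  }
  return { document: text, created: new Date() };
}

// In the POST handler this could be used along these lines:
//   var doc = toTextDocument(req.body);
//   if (doc === null) { return res.send("Error: nothing to add"); }
//   db.collection('textstore').insert(doc, callback);
```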

We can now get to some full text searching. First we'll create a search page so the user can enter some search terms. Again, this will be a page with a form, a single text field, and a submit button that HTTP POSTs the form:

app.get("/search", function(req, res) {  
  res.sendfile('./views/search.html');
});

And now we can start to write our search query:

app.post("/search", function(req, res) {  
  db.collection('textstore').find({
    "$text": {
      "$search": req.body.query
    }
  },

The first part of the find is the query, and here we use the new $text operator to say we want to use the full text search index on this collection. Remember we said there was only one full text search index per collection? This is part of the reason: there's no way to specify which text index to use. We'll come back to what the query can actually contain. The next parameter we give is for the projected fields. The first three are obvious, but the fourth involves some MongoDB magic:

  {
    document: 1,
    created: 1,
    _id: 1,
    textScore: {
      $meta: "textScore"
    }
  },

The $meta projection operator is new in MongoDB 2.6 and is currently only used to handle this particular situation - getting at the text score value in the results of a full text search. It allows a projected field to be created from some associated metadata, in this case the text score. Expect it to be used in the future for other situations where data is generated but difficult to make visible. Anyway, we now get the text score in our values, and this means we can also sort on it. The next parameter specifies sorting, and it again refers to the $meta version of textScore:

  {
    sort: {
      textScore: {
        $meta: "textScore"
      }
    }
  })

Our line isn't quite complete yet. We need to convert the results of the query to an array and send them to the user. We'll just add that in like so:

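  .toArray(function(err, items) {
    if (err != null) {
      // Report the failure instead of handing a missing result to pagelist
      return res.send("Error:" + err);
    }
    res.send(pagelist(items));
  });
});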
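
As for what the search string in that query can contain: MongoDB treats space-separated words as terms to be OR'ed together, text wrapped in double quotes as a phrase that has to match as a whole, and a word with a leading hyphen as a term to exclude. The example application simply passes along whatever the user typed, but if you wanted to assemble a search string from separate inputs, a sketch might look like this. The helper buildSearchString and its parameter names are hypothetical.

```javascript
// Hypothetical helper: assembles a $search string from separate parts.
function buildSearchString(parts) {
  var terms = parts.terms || [];
  var phrases = parts.phrases || [];
  var excluded = parts.excluded || [];

  var pieces = [];
  terms.forEach(function (term) {
    pieces.push(term); // plain terms are OR'ed together
  });
  phrases.forEach(function (phrase) {
    pieces.push('"' + phrase + '"'); // quoted text must match as a phrase
  });
  excluded.forEach(function (term) {
    pieces.push('-' + term); // a leading hyphen excludes a term
  });
  return pieces.join(' ');
}

var query = {
  $text: {
    $search: buildSearchString({
      terms: ['coffee', 'cake'],
      phrases: ['full text'],
      excluded: ['decaf']
    })
  }
};
// query.$text.$search is now: coffee cake "full text" -decaf
```

The resulting query object has the same shape as the first argument to find in the search route.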

Because this is a quick example, the pagelist function is somewhat rough and ready and only included here to save you from reading a single string representation of a JavaScript array. In any reasonable application, you'd be using a templating library to lay out your pages. Anyway, here's the pagelist function:

function pagelist(items) {  
  var result = "<html><body><ul>";
  items.forEach(function(item) {
    var itemstring = "<li>" + item._id + "<ul><li>" + item.textScore +
      "</li><li>" + item.created + "</li><li>" + item.document +
      "</li></ul></li>";
    result = result + itemstring;
  });
  result = result + "</ul></body></html>";
  return result;
}
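
One thing the rough-and-ready version does not do is escape the stored text, so any HTML a user typed into the add form is sent back to the browser as markup. A templating library would normally take care of that. If you are staying with string concatenation, a minimal sketch of an escaping variant could look like the following. The names escapeHtml and safePagelist are ours and are not in the example repository.

```javascript
// Hypothetical helper: replaces the characters that are special in HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Same layout as pagelist, but every value is escaped before it is output.
function safePagelist(items) {
  var result = '<html><body><ul>';
  items.forEach(function (item) {
    result += '<li>' + escapeHtml(item._id) +
      '<ul><li>' + escapeHtml(item.textScore) +
      '</li><li>' + escapeHtml(item.created) +
      '</li><li>' + escapeHtml(item.document) +
      '</li></ul></li>';
  });
  return result + '</ul></body></html>';
}
```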

The code for the example can be found in a GitHub repository. To run it, set the MONGOHQ_URL environment variable to point at your database (the URL can be found in the admin dashboard for your MongoHQ database) and then run node index.js. Navigate your browser to localhost:3000 and you'll be able to add or search from there. When searching, the better the match to the entire phrase entered, the higher the text score and the sooner it will be presented in the results. With some fragments of Wikipedia added and a simple search phrase entered, a typical result page may look like this:

[Image: Search Results Example]

This is a simple example to get people started and seeing what they can actually achieve with MongoDB 2.6's full text search. It could easily be enhanced to provide a richer web interface or, probably more usefully, converted into a REST service itself with add and search endpoints to provide other users with search-as-a-service.
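
To give a flavour of the REST direction, a search endpoint would return JSON rather than an HTML page. The sketch below shows one possible shape for that response. The function name toSearchResponse and the field names in the response are assumptions made for illustration, not an established format.

```javascript
// Hypothetical mapper: shapes the items from toArray for a JSON response.
function toSearchResponse(query, items) {
  return {
    query: query,
    count: items.length,
    results: items.map(function (item) {
      return {
        id: String(item._id),
        score: item.textScore,
        created: item.created,
        document: item.document
      };
    })
  };
}

// In the search route, the HTML response could then be swapped for:
//   res.json(toSearchResponse(req.body.query, items));
```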