Getting started with Elasticsearch and Node.js - Part 1

In this article we're going to look at using Node to connect to an Elasticsearch deployment, index some documents and perform a simple text search.

It's the first in a series of articles that will show you how to combine the powerful search and indexing capabilities of Elasticsearch with Node's efficiency and ease of deployment. Future articles will take you through building and deploying a web app while demonstrating some key Elasticsearch concepts along the way. All the code you'll need can be found in the article or in the petitioneering GitHub repository.

Creating an Elasticsearch deployment with Compose

If you don't already have a Compose account, you can quickly get started with Elasticsearch and Node by signing up for a free 30-day trial.

If you already have a Compose account you can create an Elasticsearch deployment from the Deployments tab in the Compose management console. To do this, log in to your Compose console, and click Create Deployment. Select Elasticsearch and create a new Compose Hosted deployment.

Choosing a dataset

There are tons of great datasets out there that you can use to explore Elasticsearch. The UK government's open data initiative, for example, has led to the release of tens of thousands of datasets. Or you could cast the net wider by choosing a set from Awesome Public Datasets, a GitHub repository that is exactly what its name suggests.

For this series of articles, we've chosen to work with data from the UK Government and Parliament petitions website. Set up to make the government more accountable to the people, it allows any British citizen or UK resident to start a petition. When a petition reaches 10,000 signatures it gets a response from the government, while petitions with more than 100,000 signatures could end up being debated in Parliament.

The petitions dataset includes information about which electoral district in the UK (known as a constituency) each signature comes from. Before we start working with the ten thousand or so petitions that have been created we'll index a dataset that contains a little information about each of those constituencies. This is a smaller dataset that will allow you to quickly get working with an index while you get a feel for the Elasticsearch structure.

Before all that, however, you'll need to configure your environment for using Node.

Setting up your Node environment

First, install Node and npm.

You'll need the following Node modules for this walkthrough: elasticsearch and get-json.

Install the modules using npm:

npm install elasticsearch get-json  

Referring to your deployment in Node

Using the elasticsearch module in Node we can easily connect to and interact with our Elasticsearch cluster. We'll put the connection code in one file, which we can then pull into all our subsequent code using Node's require method.

Copy this code and save it as connection.js. Replace the username, password, server and port with the values from your Elasticsearch deployment. You'll find these values in the connection strings in the Connection Info section on your Compose Deployment Overview page.

var elasticsearch=require('elasticsearch');

var client = new elasticsearch.Client( {  
  hosts: [
    'https://[username]:[password]@[server]:[port]/',
    'https://[username]:[password]@[server]:[port]/'
  ]
});

module.exports = client;  

We can now use this client whenever we want to perform an operation on our deployment by including the following line in any other file we create:

var client = require('./connection.js');  
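If you would rather not hardcode credentials in connection.js, one option is to assemble the host URLs from environment variables. The following is a minimal sketch and not part of the original project: the helper names buildHostUrl and buildHosts, and the environment variable names ES_USER, ES_PASSWORD and ES_HOSTS, are all assumptions made up for illustration.

```javascript
// Hypothetical helper: builds one host URL in the form used in connection.js.
// encodeURIComponent percent-encodes characters such as '@' in the credentials.
function buildHostUrl(username, password, server, port) {
  return 'https://' + encodeURIComponent(username) + ':' +
    encodeURIComponent(password) + '@' + server + ':' + port + '/';
}

// Hypothetical helper: turns "server1:port1,server2:port2" into a list of URLs.
function buildHosts(username, password, hostList) {
  return hostList.split(',').map(function (entry) {
    var parts = entry.trim().split(':');
    return buildHostUrl(username, password, parts[0], parts[1]);
  });
}

// Assumed variable names; the fallbacks are placeholders, not real servers.
var hosts = buildHosts(
  process.env.ES_USER || 'username',
  process.env.ES_PASSWORD || 'password',
  process.env.ES_HOSTS || 'server1.example.com:10000,server2.example.com:10001'
);
console.log(hosts);
```

The resulting array could then be passed as the hosts option when creating the client in connection.js.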

We'll start by creating a series of short, self-contained Node files that each perform a single function. Later on we'll start to combine them as we get into more in-depth examples. Let's test our connection with a simple deployment health check. Create a new file with this code, which will display the current status of your cluster.

var client = require('./connection.js');

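// Ask the cluster for its health summary and print it
client.cluster.health({},function(err,resp,status) {
  if(err) { return console.log("health check failed:",err.message); }
  console.log("-- Client Health --",resp);
});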

Save the file as info.js, then in the terminal, run the file with:

node info  

You should get a response like this:

-- Client Health -- { cluster_name: 'el-petitions',
  status: 'green',
  timed_out: false,
  number_of_nodes: 3,
  number_of_data_nodes: 3,
  active_primary_shards: 0,
  active_shards: 0,
  relocating_shards: 0,
  initializing_shards: 0,
  unassigned_shards: 0,
  delayed_unassigned_shards: 0,
  number_of_pending_tasks: 0,
  number_of_in_flight_fetch: 0 }

If not, go back and check the connection details in your Compose Deployment Overview. If your connection looks good it's time to move on to creating an index and adding some documents.
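The status field is the part of the health response you'll look at most often. As a rough sketch, a small helper can turn the response into a one-line summary; the function name describeHealth is made up for illustration, and the sample object reuses values from the response shown above.

```javascript
// Hypothetical helper: turns a cluster health response into a one-line summary.
function describeHealth(resp) {
  var meanings = {
    green: 'all primary and replica shards are allocated',
    yellow: 'all primary shards are allocated, but some replicas are not',
    red: 'at least one primary shard is not allocated'
  };
  var meaning = meanings[resp.status] || 'unrecognised status';
  return resp.cluster_name + ' is ' + resp.status + ' (' + meaning + '), ' +
    resp.number_of_nodes + ' node(s)';
}

// Values copied from the sample response in this article
var sample = { cluster_name: 'el-petitions', status: 'green', number_of_nodes: 3 };
console.log(describeHealth(sample));
```

In info.js you could call describeHealth(resp) inside the health callback instead of printing the whole response.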

Indexing

Indexing in Elasticsearch is not quite like indexing in other databases: the word 'index' itself has different meanings in different contexts in Elasticsearch, some of which might not be immediately intuitive.

In Elasticsearch, an index is a place to store related documents. We're going to create an index called 'gov', and we're going to use it to store two types of documents: 'constituencies' and 'petitions'. The act of storing those documents in an index is known as indexing. In many other database systems you need to explicitly specify and create indexes to improve the efficiency of some operations. In Elasticsearch the equivalent structures, usually known as 'inverted indexes', are created automatically: when you index a document, every field in that document is indexed by default.
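To get a feel for what an inverted index is, here is a toy version in plain JavaScript. It is only an illustration of the idea (a map from each word to the documents that contain it) and makes no claim about how Elasticsearch implements it internally. The function names and the three sample strings are assumptions chosen for the example, not data read from the dataset.

```javascript
// Toy inverted index: maps each lowercased word to the ids of documents containing it.
function buildInvertedIndex(docs) {
  var index = Object.create(null); // no inherited keys such as "constructor"
  Object.keys(docs).forEach(function (id) {
    var words = docs[id].toLowerCase().split(/\W+/).filter(Boolean);
    words.forEach(function (word) {
      if (!index[word]) { index[word] = []; }
      if (index[word].indexOf(id) === -1) { index[word].push(id); }
    });
  });
  return index;
}

// Looking up a word is a single property access rather than a scan of every document.
function lookup(index, word) {
  return index[word.toLowerCase()] || [];
}

// Illustrative documents keyed by id
var docs = {
  '1': 'Ipswich',
  '2': 'Central Suffolk and North Ipswich',
  '3': 'Harwich and North Essex'
};
var invertedIndex = buildInvertedIndex(docs);
console.log('north   ->', lookup(invertedIndex, 'north'));
console.log('ipswich ->', lookup(invertedIndex, 'Ipswich'));
```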

To create the 'gov' index, let's create a new file (with our require statement at the top so we can connect to our Elasticsearch deployment):

var client = require('./connection.js');

client.indices.create({  
  index: 'gov'
},function(err,resp,status) {
  if(err) {
    console.log(err);
  }
  else {
    console.log("create",resp);
  }
});

Save this file as create.js and run it.

You should see:

create { acknowledged: true }  

Deleting an index is as easy as creating one. Create delete.js with the following:

var client = require('./connection.js');

client.indices.delete({index: 'gov'},function(err,resp,status) {  
  console.log("delete",resp);
});

When you run this file you should see:

delete { acknowledged: true }  

Now we need some documents to go in our index. (If you ran delete.js, run create.js again first to recreate the 'gov' index.) We're going to be using two datasets: one contains information about parliamentary constituencies, the other contains the actual petitions data. We'll start with the constituencies data because it's smaller and less complex.

Before we add our dataset, though, let's look at just adding a single document. Create a new file, called document_add.js and add the following:

var client = require('./connection.js');

client.index({  
  index: 'gov',
  id: '1',
  type: 'constituencies',
  body: {
    "ConstituencyName": "Ipswich",
    "ConstituencyID": "E14000761",
    "ConstituencyType": "Borough",
    "Electorate": 74499,
    "ValidVotes": 48694,
  }
},function(err,resp,status) {
    console.log(resp);
});

This adds a document to our 'gov' index, with a document type of 'constituencies' and an id of 1. (If you don't specify an id, Elasticsearch automatically generates one for the document.) The document itself contains a few fields relating to the UK parliamentary constituency of Ipswich.

Run the code and you should see a response like this:

{ _index: 'gov',
  _type: 'constituencies',
  _id: '1',
  _version: 1,
  created: true }

Run it again and you should see:

  ...
  _version: 2,
  created: false
  ...

Because we already have a document with id 1 for this type in this index, Elasticsearch treats this as a new version of the document.

At some point we might want to check how many documents there are in our index. This is not going to produce a terribly exciting result at the moment, but let's do it anyway. Add the following to info.js.

client.count({index: 'gov',type: 'constituencies'},function(err,resp,status) {  
  console.log("constituencies",resp);
});

Run it and you should get a response that shows a count of one constituency.

Deleting a document is as easy as indexing it. Create a new file, document_del.js and add:

var client = require('./connection.js');

client.delete({  
  index: 'gov',
  id: '1',
  type: 'constituencies'
},function(err,resp,status) {
    console.log(resp);
});

Run this file and you should see something along the lines of:

{ found: true,
  _index: 'gov',
  _type: 'constituencies',
  _id: '1',
  _version: 3 }

Run info.js again and our document count should now be zero.

When we want to add a lot of documents at the same time it's often easier to use the bulk method in Elasticsearch. The format is similar to the index format, except that for each document we need to send Elasticsearch two objects: one to define the index, type and id of the document, and one for the body. For example:

var myBody = [
  { index: {_index: 'gov', _type: 'constituencies', _id: '1' } },
  {
    "ConstituencyName": "Ipswich",
    "ConstituencyID": "E14000761",
    "ConstituencyType": "Borough"
    // ...
  }
];

These pairs of objects, in order, form the body of the request that you send to Elasticsearch using the bulk call:

client.bulk({
  index: 'gov',
  type: 'constituencies',
  body: myBody
},function(err,resp,status) {
  console.log(resp);
});
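Writing the action and document pairs by hand quickly gets tedious, so in practice you would usually build the body in a loop. Here is a minimal sketch, assuming your records are already an array of plain objects. The function name makeBulkBody, the second sample record, and the choice of ConstituencyID as the document id are assumptions for illustration; this is not the code from the repository.

```javascript
// Hypothetical helper: builds a bulk body of alternating action and document objects.
function makeBulkBody(records, indexName, typeName, idField) {
  var body = [];
  records.forEach(function (record) {
    // First object: which index and type the document goes to, and its id
    body.push({ index: { _index: indexName, _type: typeName, _id: record[idField] } });
    // Second object: the document itself
    body.push(record);
  });
  return body;
}

var records = [
  { ConstituencyName: 'Ipswich', ConstituencyID: 'E14000761', ConstituencyType: 'Borough' },
  // Made-up record, included only so the example has more than one document
  { ConstituencyName: 'Example Town', ConstituencyID: 'X00000000', ConstituencyType: 'County' }
];

var bulkBody = makeBulkBody(records, 'gov', 'constituencies', 'ConstituencyID');
console.log(JSON.stringify(bulkBody, null, 2));
```

The returned array has two entries per record and can be passed as the body of a bulk call.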

Download the constituencies files, constituencies.json and constituencies.js, from the petitioneering GitHub repo. Take a look at both and make sure you understand what's going on. In a nutshell, constituencies.js reads the contents of the constituencies.json file and adds each entry to its bulk array. Using the bulk command, it sends all the constituency data to Elasticsearch, which indexes each constituency and then returns a response. You'll be able to see all or part of that response, depending on how far back your terminal window lets you scroll.
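A bulk response for hundreds of documents is long, so it can help to reduce it to a count plus any failures. The sketch below assumes the general shape of a bulk response: an items array in which each entry has a single key naming the action, with an error property on entries that failed. The function name summarizeBulk and the sample response are made up for illustration.

```javascript
// Hypothetical helper: reduces a bulk response to counts plus the failed items.
function summarizeBulk(resp) {
  var failures = [];
  resp.items.forEach(function (item) {
    // Each item has a single key naming the action, e.g. "index"
    var action = Object.keys(item)[0];
    var result = item[action];
    if (result.error) {
      failures.push({ id: result._id, status: result.status, error: result.error });
    }
  });
  return {
    total: resp.items.length,
    failed: failures.length,
    failures: failures
  };
}

// Made-up response, not real output from a deployment
var sampleBulkResponse = {
  took: 12,
  errors: true,
  items: [
    { index: { _index: 'gov', _type: 'constituencies', _id: '1', status: 201 } },
    { index: { _index: 'gov', _type: 'constituencies', _id: '2', status: 400, error: 'example failure' } }
  ]
};
console.log(summarizeBulk(sampleBulkResponse));
```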

Run constituencies.js to index the contents of constituencies.json.

If you check your client again by running info.js you should see an updated constituencies document count of 650.

Searching

Obviously, one of the things you're going to want to do with your Elasticsearch index is search it. Create a new file and add the following:

var client = require('./connection.js');

client.search({  
  index: 'gov',
  type: 'constituencies',
  body: {
    query: {
      match: { "constituencyname": "Harwich" }
    },
  }
},function (error, response,status) {
    if (error){
      console.log("search error: "+error)
    }
    else {
      console.log("--- Response ---");
      console.log(response);
      console.log("--- Hits ---");
      response.hits.hits.forEach(function(hit){
        console.log(hit);
      })
    }
});

Save the file as search.js and run it.

All being well, you'll get one hit. Change the query so it looks for constituency names matching "Ipswich" and you should get more hits. Let's change it again, this time using "North Ipswich" as the search term. We might expect just one result for this query (after all, how many North Ipswich constituencies can there be in the UK?), but in fact this query returns multiple hits. We'll get into why that is the case in the next article in the series. You can also learn more about searching in Elasticsearch by checking out our article on Query-time strategies and techniques.
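Each hit carries metadata (_index, _type, _id, _score) alongside the original document in _source, and hits come back ordered by score unless you ask for a different sort. As a rough sketch, a helper like the following trims each hit down to the parts you usually want to read. The function name summarizeHits is hypothetical, and the sample response, including its ids and scores, is made up for illustration.

```javascript
// Hypothetical helper: keeps only the id, score and one chosen field from each hit.
function summarizeHits(response, field) {
  return response.hits.hits.map(function (hit) {
    return { id: hit._id, score: hit._score, value: hit._source[field] };
  });
}

// Made-up response in the general shape a search returns; scores are illustrative
var sampleSearchResponse = {
  hits: {
    total: 2,
    max_score: 2.5,
    hits: [
      { _index: 'gov', _type: 'constituencies', _id: '10', _score: 2.5,
        _source: { constituencyname: 'Ipswich' } },
      { _index: 'gov', _type: 'constituencies', _id: '11', _score: 1.2,
        _source: { constituencyname: 'Central Suffolk and North Ipswich' } }
    ]
  }
};

summarizeHits(sampleSearchResponse, 'constituencyname').forEach(function (row) {
  console.log(row.score + '  ' + row.id + '  ' + row.value);
});
```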

Wildcards and regular expression searches

Finally, a simple change to the search query shows how you can use wildcards and regular expressions to expand your search.

For an example of a wildcard search, search for constituency names starting with any three characters followed by 'wich':

query: {  
  wildcard: { "constituencyname": "???wich" }
}

For an example of a regular expression search, search for constituency names made up of one or more characters followed by 'wich':

query: {  
  regexp: { "constituencyname": ".+wich" }
}
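To see how the two patterns differ, here is a small sketch that applies the same rules to a list of words in plain JavaScript. It only approximates the behaviour: it assumes that patterns must match a whole term and that terms have been lowercased, and Elasticsearch's regular expression syntax is not identical to JavaScript's. The function name wildcardToRegExp and the list of terms are made up for illustration.

```javascript
// Hypothetical helper: converts a wildcard pattern to an anchored JavaScript RegExp.
// '?' matches exactly one character and '*' matches zero or more.
function wildcardToRegExp(pattern) {
  var source = pattern.split('').map(function (ch) {
    if (ch === '?') { return '.'; }
    if (ch === '*') { return '.*'; }
    // Escape anything else that has a special meaning in a RegExp
    return ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  }).join('');
  return new RegExp('^' + source + '$');
}

// Illustrative lowercased terms
var terms = ['ipswich', 'harwich', 'norwich', 'sandwich', 'north'];

// Exactly three characters, then 'wich'
var wildcardMatches = terms.filter(function (term) {
  return wildcardToRegExp('???wich').test(term);
});

// One or more characters, then 'wich'
var regexpMatches = terms.filter(function (term) {
  return /^.+wich$/.test(term);
});

console.log('???wich ->', wildcardMatches);
console.log('.+wich  ->', regexpMatches);
```

In this sketch 'sandwich' matches only the regular expression, because four characters precede 'wich' rather than three.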

Next

In the next article in the series we'll introduce analyzed and non-analyzed fields as a way to control search results in Elasticsearch, and define mappings to tell Elasticsearch what sort of data your fields contain. Later we'll go on to explore how you can handle nested data structures, and finally how to turn our code snippets into a deployable web app.