Elasticsearch.pm - Part 1: Connecting and Cluster Monitoring

In this article we're going to look at how to use the Search::Elasticsearch Perl module to connect to our Compose Elasticsearch deployment and report on the status of our cluster.

If Perl is your thing and you haven't already started using the Search::Elasticsearch Perl module, you should definitely install it to get your Elasticsearch project going. Even if you're only a little familiar with Perl, the module makes it an approachable way to start an Elasticsearch project.

In a series of articles, we'll explore the power of this module for making the most of the functionality of Elasticsearch.

Installing the Module

The Search::Elasticsearch module supports all the Elasticsearch APIs and wraps them in an idiomatic Perl interface.

You can install the Search::Elasticsearch Perl module from CPAN or MetaCPAN. For our examples here, we installed version 2.00 on the Strawberry Perl 5.22.0.1 distribution on Windows. Elastic has written an overview for using the module with Elasticsearch that you can use as a quick reference.
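
However you manage your Perl modules, the install itself is a one-liner. With cpanminus, for example (the classic cpan shell works just as well):

cpanm Search::Elasticsearch  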

Once you have the module installed in your Perl library, you'll reference it at the beginning of your Perl script:

use Search::Elasticsearch;  

Connecting to Elasticsearch

Using the new() method, we can create a client object that connects to our cluster. We're storing ours in the object variable called $es. Since we have two HAProxy portals in our Compose cluster, we're going to list both of them using the nodes parameter so the client can switch our requests back and forth between them (the actual HTTP requests go through the HTTP::Tiny module by default).

You can get your nodes from the "Connection Info" section on your deployment Overview page. Replace the username and password with the ones for your deployment:

# Create client object to switch between two nodes and log/trace to a file:
my $es = Search::Elasticsearch->new(  
    trace_to => ['File','logfile.txt'],
    log_to => ['File','logfile.txt'],
    nodes => [
        'https://username:password@aws-us-east-1-portal10.dblayer.com:10019/',
        'https://username:password@aws-us-east-1-portal7.dblayer.com:10304/'
    ]
);

In our example we're not specifying a cxn_pool parameter because we're using the default, Static, which suits us since our app and our Elasticsearch instance are not on the same servers. We are, however, using the log_to and trace_to parameters (which use Log::Any under the hood) to log our server interactions and our HTTP requests and responses. This can be helpful for any debugging needs that might arise, and we're using it here to easily check connection success and server responses. If you log to a file, as we are in our example, the file is appended to each time the script runs.
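
If you'd rather watch the output scroll by while debugging, both parameters also accept other destinations. Here's a minimal variation of the client above, assuming you want terminal output, with the default connection pool named explicitly:

# Same client, but logging to the terminal and naming the default pool:
my $es = Search::Elasticsearch->new(  
    cxn_pool => 'Static',    # the default pool; listed here for clarity
    log_to   => 'Stderr',    # connection lifecycle messages to the terminal
    trace_to => 'Stderr',    # requests/responses as replayable curl commands
    nodes => [
        'https://username:password@aws-us-east-1-portal10.dblayer.com:10019/',
        'https://username:password@aws-us-east-1-portal7.dblayer.com:10304/'
    ]
);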

Now, if we look at our logfile after running this part of the script, we'll see the following:

[Tue Dec 15 15:33:24 2015] Current cxns: ["https://aws-us-east-1-portal10.dblayer.com:10019","https://aws-us-east-1-portal7.dblayer.com:10304"]
[Tue Dec 15 15:33:24 2015] Forcing ping before next use on all live cxns
[Tue Dec 15 15:33:24 2015] Ping [https://aws-us-east-1-portal10.dblayer.com:10019] before next request
[Tue Dec 15 15:33:24 2015] Ping [https://aws-us-east-1-portal7.dblayer.com:10304] before next request

We see our two connection nodes, along with notes that each will be pinged before the next request is made to ensure the connections are still live. In other words, we're connected. Below, we'll refer to our two nodes in shorthand as "portal10" and "portal7".
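
You don't have to wait for the logfile to confirm connectivity, either. The client's ping() method issues a lightweight request to the cluster and throws an exception if no nodes respond, so a quick sanity check might look like this:

# Quick connectivity check: ping() throws an exception if no nodes respond.
if ( eval { $es->ping } ) {
    print "Connected to the cluster\n";
}
else {
    die "No live nodes: $@";
}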

If you want to use an SSH tunnel for a more secure connection than plain HTTPS, check out our article about digging an SSH tunnel to your deployment. Even though that article uses RethinkDB as the example, the process is the same for Elasticsearch: you establish the tunnel first (with the ssh command or a module such as Net::SSH), then point the nodes parameter at the forwarded local port; the client still speaks HTTP through the tunnel. To learn more about the Compose security options for Elasticsearch, have a look at Keeping Elasticsearch Secure.

A Simple Index

Before we check our cluster, let's index a document on our cluster so that the cluster stats we get later are more interesting. We'll get more into the details of indexing documents in the next part of this series, but for now, here's the quick-and-easy version using the built-in Elasticsearch defaults:

# Index a blog post:
$es->index(
    index   => 'blog_index',
    type    => 'blog_post',
    id      => 1,
    body    => {
        title   => 'It\'s a Rocco Holiday season of Giving',
        summary => 'We\'re giving away shirts for signing up for Compose.',
        date    => '2015-12-02'
    }
);

We're using the index() method on our client object to name the index, specify the document type, set the document id, and then populate the document using "body". You can see that we're adding a few different fields in "body": a title, a summary, and a date. Because Elasticsearch uses JSON natively, our documents can be constructed flexibly. Since the strings in this example are single-quoted, we've also escaped the in-text apostrophes with a backslash.

And here's what the logfile recorded for running this part of our script:

[Tue Dec 15 15:33:24 2015] Pinging [https://aws-us-east-1-portal10.dblayer.com:10019]
[Tue Dec 15 15:33:27 2015] Marking [https://aws-us-east-1-portal10.dblayer.com:10019] as live
[Tue Dec 15 15:33:27 2015] # Request to: https://aws-us-east-1-portal10.dblayer.com:10019
curl -XPOST 'http://localhost:9200/blog_index/blog_post/1?pretty=1' -d '  
{
   "date" : "2015-12-02",
   "title" : "It\u0027s a Rocco Holiday season of Giving",
   "summary" : "We\u0027re giving away shirts for signing up for Compose."
}
'

[Tue Dec 15 15:33:27 2015] # Response: 201, Took: 629 ms
# {
#    "_index" : "blog_index",
#    "created" : true,
#    "_id" : "1",
#    "_type" : "blog_post",
#    "_version" : 1
# }

We see that our "portal10" node was used for this request: it got pinged, marked as "live", and then the index request was sent via POST. Note that the trace logger writes the curl command against localhost:9200 so that it can be replayed locally; the "Request to:" line above shows the node actually used. The URL contains the index name, the type, and the document id, while the request body carries the fields for document id 1.

In the next section, we see the response from the server. We get a 201 ("created") response and can see that it took 629 milliseconds to complete. The logger prefixes the response with hash symbols so we can easily spot it as the response. Our index did not previously exist, so it was created by this request. We can also see that the document with id 1 was indexed and that this is version 1 of the indexed document.
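
The logfile isn't the only way to see these fields: index() also returns the response as a Perl hash reference. Here's a small sketch along those lines (the second document and its text are made up for illustration), which also shows that double-quoted strings save you from escaping apostrophes:

# Capture the index() response; double quotes handle apostrophes for us.
my $result = $es->index(
    index => 'blog_index',
    type  => 'blog_post',
    id    => 2,                 # a second, hypothetical document
    body  => {
        title   => "It's a double-quoted title",
        summary => "Double quotes mean no backslash escapes are needed.",
        date    => '2015-12-03'
    }
);
print "Indexed doc $result->{_id}, version $result->{_version}, ",
      ( $result->{created} ? "created\n" : "updated\n" );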

Checking the Cluster

Now that we're connected and we've got a document indexed in our Elasticsearch cluster, let's see how our cluster is doing.

There are a couple of methods we'll look at here. The first returns information about the version of Elasticsearch running on the cluster and the other tells us about the overall health of the cluster. We're storing the response from each in an object variable named for its method to help us keep things straight: $info and $health. We could later parse these objects for specific data to use for alerts or some other purpose. For our example here, though, we're not going to do anything with the response objects; we'll just use the logfile to see what they contain.

# Elasticsearch info:
my $info = $es->info;

# Cluster requests:
my $health = $es->cluster->health;  

As we see the requests and responses below for each method, notice how the request switches back and forth between our "portal7" and our "portal10" nodes.

Using the info() method on our client object returns information about our version of Elasticsearch. Here's what the logfile shows:

[Tue Dec 15 15:33:27 2015] Pinging [https://aws-us-east-1-portal7.dblayer.com:10304]
[Tue Dec 15 15:33:28 2015] Marking [https://aws-us-east-1-portal7.dblayer.com:10304] as live
[Tue Dec 15 15:33:28 2015] # Request to: https://aws-us-east-1-portal7.dblayer.com:10304
curl -XGET 'http://localhost:9200/?pretty=1'

[Tue Dec 15 15:33:28 2015] # Response: 200, Took: 385 ms
# {
#    "tagline" : "You Know, for Search",
#    "cluster_name" : "daring-elasticsearch-54",
#    "version" : {
#       "number" : "1.7.3",
#       "lucene_version" : "4.10.4",
#       "build_timestamp" : "2015-10-15T09:14:17Z",
#       "build_hash" : "05d4530971ef0ea46d0f4fa6ee64dbc8df659682",
#       "build_snapshot" : false
#    },
#    "status" : 200,
#    "name" : "elastic_search62_aws_us_east_1_data_19_dblayer_com"
# }

We see our Compose cluster name, that we're running version 1.7.3 of Elasticsearch with version 4.10.4 of Lucene under the hood, and that we're getting a 200 ("ok") response status from the cluster.
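
Since info() hands back that same structure as a Perl hash reference, pulling out a field like the version number is a one-liner. For example:

# $info mirrors the JSON response above, so nested fields are easy to reach.
print "Cluster '$info->{cluster_name}' runs Elasticsearch ",
      "$info->{version}{number} (Lucene $info->{version}{lucene_version})\n";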

Running the cluster health() method gives us a brief synopsis of how the cluster is doing. A "green" status is what we like to see. Here's what our logfile shows us:

[Tue Dec 15 15:33:28 2015] # Request to: https://aws-us-east-1-portal10.dblayer.com:10019
curl -XGET 'http://localhost:9200/_cluster/health?pretty=1'

[Tue Dec 15 15:33:28 2015] # Response: 200, Took: 88 ms
# {
#    "active_shards" : 15,
#    "delayed_unassigned_shards" : 0,
#    "number_of_nodes" : 3,
#    "status" : "green",
#    "timed_out" : false,
#    "relocating_shards" : 0,
#    "number_of_pending_tasks" : 0,
#    "initializing_shards" : 0,
#    "number_of_in_flight_fetch" : 0,
#    "number_of_data_nodes" : 3,
#    "active_primary_shards" : 5,
#    "cluster_name" : "daring-elasticsearch-54",
#    "unassigned_shards" : 0
# }

In the response above, you can see that there are 5 primary shards. By default, each index on a three-node Compose cluster is allocated 5 primary shards and 2 replicas of each, which accounts for exactly the 15 active shards reported (5 primaries plus 10 replica shards).
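
Because $health is also just a hash reference, turning the fields shown above into a simple alert is straightforward. A minimal sketch, assuming the three-node deployment described here:

# Warn if the cluster drops out of "green" or loses a node (3 expected here).
if ( $health->{status} ne 'green' ) {
    warn "Cluster status is $health->{status} with "
       . "$health->{unassigned_shards} unassigned shards\n";
}
if ( $health->{number_of_nodes} < 3 ) {
    warn "Only $health->{number_of_nodes} of 3 expected nodes present\n";
}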

Now, if we want more information, we can use many other methods which return an extensive amount of detail about the cluster. CPAN has more about cluster requests if you want to delve into additional settings and methods that can be used.

Two that might be useful for your project include state() and stats(). You can add them to your script in the "Cluster requests" section like this:

my $state = $es->cluster->state;  
my $stats = $es->cluster->stats;  

state() returns information about the cluster state from the master node. The response tells us about the nodes (including which one is currently the master), about our index, how it's sharded, how many replicas are available, and how routing within the cluster works. We have the option to limit the request to specific metrics and even to specific indexes, but the default is to return everything, so we kept the output in the logfile rather than reproducing it all here; it's a lot of data.

stats() returns information about cluster performance and settings. It can be limited by specifying a particular node. Again, we didn't put the data here from the logfile since it's quite extensive.
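
As a sketch of those limiting options (the node name below is a placeholder; substitute one of your own):

# Limit state() to chosen metrics and one index:
my $slim_state = $es->cluster->state(
    metric => [ 'master_node', 'routing_table' ],
    index  => 'blog_index'
);

# Limit stats() to a single node:
my $node_stats = $es->cluster->stats( node_id => 'node-1' );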

Putting It All Together

Here's what our connecting and monitoring perl script looks like:

use Search::Elasticsearch;

# Create client object to switch between two nodes and log/trace to a file:
my $es = Search::Elasticsearch->new(  
    trace_to => ['File','logfile.txt'],
    log_to => ['File','logfile.txt'],
    nodes => [
        'https://username:password@aws-us-east-1-portal10.dblayer.com:10019/',
        'https://username:password@aws-us-east-1-portal7.dblayer.com:10304/'
    ]
);

# Index a blog post:
$es->index(
    index   => 'blog_index',
    type    => 'blog_post',
    id      => 1,
    body    => {
        title   => 'It\'s a Rocco Holiday season of Giving',
        summary => 'We\'re giving away shirts for signing up for Compose.',
        date    => '2015-12-02'
    }
);

# Elasticsearch info:
my $info = $es->info;

# Cluster requests:
my $health = $es->cluster->health;  
my $state = $es->cluster->state;  
my $stats = $es->cluster->stats;  

We've opted to put everything into a logfile for this example, but you may decide that it's better for your situation to parse responses from your Elasticsearch cluster instead, so they can be used in other ways.
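
For instance, if you'd rather explore a response interactively than grep a logfile, the core Data::Dumper module will print any of these hash references in full:

# Dump a response structure to the terminal instead of the logfile.
use Data::Dumper;
print Dumper($health);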

Next

Now that you know how to connect to your Elasticsearch cluster and monitor how it's doing using the Search::Elasticsearch Perl module, you're probably ready to take your Elasticsearch project further. In the next parts of this series, we'll get into the basics of indexing, then some more advanced indexing options. In the final part, we'll look at querying. Each part will walk you through how you can use Elasticsearch.pm to get your Elasticsearch project up and running.