Elasticsearch Tools & Compose

Outside of the core Elasticsearch toolset, there's a world of tools that make the search and analytics database even more useful and accessible. In this article we'll look at some of them and show what you need to do to get them working with Compose's Elasticsearch deployments. We'll start with a command-line tool, move on to a simple search tool and finish with an all-purpose client for searching and manipulating your Elasticsearch database.

Es2unix - Command line power

Let us start the tool tour with Es2unix, from the Elasticsearch developers. Es2unix is a version of the Elasticsearch API that you can use from the command line. It doesn't just make the API calls though; it also converts the returned results into a line-oriented, tabular format like the output of many other Unix tools. That makes it ideal for integrating Elasticsearch into shell scripts built around awk, grep and sort.

Es2unix needs Java installed (Java 7 at least) and the binary version can simply be downloaded with a curl command and made executable with chmod, as per the installation instructions:

curl -s download.elasticsearch.org/es2unix/es >~/bin/es  
chmod +x ~/bin/es  

Note that this assumes you have a bin directory in your $HOME and that it's on your $PATH.

Now, when you run es it'll assume that Elasticsearch is running locally. When you're using Compose Elasticsearch, that isn't the case. If you've got the HTTP/TCP access portal enabled, you'll have to give the es command a URL to locate your Elasticsearch deployment. You can get the URL from your Compose dashboard; remember to substitute the username and password of an Elasticsearch user (from the Users tab) into the URL. This URL is then passed using the -u option:

$ es -u https://user:pass@haproxy1.dblayer.com:10360/ version
es            20140723711d4f9  
elasticsearch 1.3.4  

The es command is followed by one of a selection of subcommands. Here we've used the version subcommand to get the version of the es command and the version of Elasticsearch it's talking to. The health of the cluster can be checked with the health subcommand:

$ es -u https://user:pass@haproxy1.dblayer.com:10360/ health -v
time     cluster    status nodes data pri shards relo init unassign  
11:14:39 EsExemplum green      3    3   3      6    0    0        0  

Drop the -v to get unlabelled results, ideal for passing into monitoring software; adding -v to many es subcommands signals that more extensive labelling of the returned data is desired.
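
For example, a shell script could watch the cluster status with something like this minimal sketch, which assumes the unlabelled output keeps the same column order as the labelled example above (status in the third column):

# Grab the status column from the unlabelled health output
status=$(es -u https://user:pass@haproxy1.dblayer.com:10360/ health | awk '{print $3}')
if [ "$status" != "green" ]; then
  # Hand off to whatever alerting you use
  echo "Cluster status is $status"
fi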

The es command can count all documents, or just the documents that match a simple query, and can search all indices and return matching ids:

$ es -u https://user:pass@haproxy1.dblayer.com:10360/ count "one species or variety"
11:44:02 16 "one species or variety"  

shows a count of documents matching the parts of that phrase to different extents. Using the search command we can dig deeper:

$ es -u https://user:pass@haproxy1.dblayer.com:10360/ -v search "one species or variety"
score   index         type    id  
0.16337 darwin-origin chapter II  
0.12559 darwin-origin chapter IX  
0.10360 darwin-origin chapter IV  
0.10141 darwin-origin chapter I  
0.09734 darwin-origin chapter XI  
0.09326 darwin-origin chapter V  
0.09226 darwin-origin chapter XV  
0.08744 darwin-origin chapter XIV  
0.08069 darwin-origin chapter VIII  
0.07525 darwin-origin chapter III  
 Total: 16

Now we can see the matching score along with the id, index and type of each document. Although 16 documents match here, Elasticsearch returns only the top ten results by default. If we wanted to be more precise, we could quote the string (remembering we're in the shell, so backslash escapes are needed) and select a field for matching:

$ es -u https://user:pass@haproxy1.dblayer.com:10360/ -v search "\"one species or variety\"" text
score   index         type    id text  
0.03073 darwin-origin chapter I  ["CHAPTER I. VARIATI  
0.03073 darwin-origin chapter IX ["CHAPTER IX. HYBRID  
  Total: 2

Other subcommands in es2unix include indices, for listing indexes, ids, for retrieving all ids from an index, and a variety of management reporting commands such as nodes, heap and shards.
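
Because all of this output is plain, whitespace-separated columns, it drops straight into shell pipelines. As a sketch, using the search output above where the id is the fourth column, this one-liner pulls out just the matching chapter ids and sorts them:

es -u https://user:pass@haproxy1.dblayer.com:10360/ search "one species or variety" | awk '{print $4}' | sort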

SSHortcuts

You'll probably have noticed that the es command is a little laborious when you have to specify the URL every time, and Es2unix doesn't have any shortcuts, such as environment variables, for passing that URL. There is another way to shorten things though, and that's by using an SSH access portal instead. If you configure an SSH access portal for your Elasticsearch deployment, the default command for creating your SSH tunnels makes a node of the cluster appear to be at localhost:9200, which is where the tools look by default. Once you have an SSH tunnel set up, you can drop the entire -u [URL] part and use the tools as if you had Elasticsearch locally configured.
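
The exact tunnel command comes from your Compose dashboard, but it follows the usual ssh port-forwarding pattern, something like this sketch (the user and host here are placeholders, not real portal details):

ssh -N -L 9200:localhost:9200 user@portal.dblayer.com

With that running in another terminal, the likes of es version, with no -u option, will talk to your deployment through the tunnel.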

Quick search

Sometimes you just want to set up a quick search for your Elasticsearch database with the minimum of effort. The Calaca project is very useful in that regard. It's an all-JavaScript search front end for Elasticsearch. To get up and running, download and unpack the zip file available from the Github page. Calaca's configuration can be found in the file js/config.js, which looks like this:

var indexName = "name"; //Ex: twitter  
var docType = "type"; //Ex: tweet  
var maxResultsSize = 10;  
var host = "localhost"; //Ex: http://ec2-123-aws.com  
var port = 9200;  

As you can see, it comes configured to use a database on localhost port 9200, so you could use the SSH shortcut above. But since we're here anyway, we'll change the host variable to "https://user:pass@haproxy1.dblayer.com" to match the URL given in the Compose dashboard - don't forget to copy in the username and password. The port number also needs to be copied from the dashboard URL into the port variable. The rest of the configuration selects what to search and what to show: set the indexName and docType variables to the index and document type you want to search. So, for our example here, we have a config.js that reads:

var indexName = "darwin-origin";  
var docType = "chapter";  
var maxResultsSize = 10;  
var host = "https://user:pass@haproxy1.dblayer.com";  
var port = 10361;  

Then it's a matter of editing the index.html file to set which results are shown. In the middle of the file is a section which reads:

<article class='result' ng-repeat='result in results track by $id(result)'>  
  <h2>{{result.name}}</h2>
  <p>{{result.description}}</p>
</article>  

Edit result.name and result.description to display whichever fields you want from your document:

<h2>Chapter {{result._id}}:{{result.title}}</h2>  
<p>{{result.text.substring(0,255)}}...</p>  

We have a particularly long block of text in our documents, which we truncate, and we use the id and title together to create a heading. Save that and open index.html in your browser - there's no need to deploy to a server - and you'll see Calaca's search field. Enter a term and the matching results appear below it.

It's a quick way to get a pretty search front end up locally without wrestling with forming curl/JSON requests or deploying a full-on server.
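
For a sense of what Calaca is saving you from, here's a sketch of roughly equivalent raw request, using Elasticsearch's URI search API over the SSH tunnel (the query term is just an example):

curl -s 'http://localhost:9200/darwin-origin/chapter/_search?q=text:variation&size=10'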

ESClient

Where Calaca's great for a super-simple search client, you might want something a little more potent for your searching. For that, try ESClient, which not only has an extensive search UI but adds the ability to display results in a table or as raw JSON, and then edit or delete selected documents. Like Calaca, ESClient needs no server; just download the zip or clone the Github repository. Configuring it means editing the config.js file and putting in the URL from the Compose dashboard:

var Config = {  
   'CLUSTER_URL':'https://user:pass@haproxy1.dblayer.com:10361',
   ...

Then open esQueryClient.html in your browser and, before you know it, there's the ESClient configuration screen. Click the Connect button and a connection to the Elasticsearch database will be made. You'll be moved to the Search tab, where you can select the index, type, fields and sort fields, specify a Lucene or DSL query, and click Search to see the results in a table below the query.

Double-clicking on a result will let you edit the documents that make up the result, or you can use the results as a guide for a delete operation. If you set the "Raw JSON" switch in the Configuration tab, you'll also be able to view the complete raw returned results in the JSON Results tab.

It's all rather usefully functional, and there's only one slight problem: if you look at the top of the ESClient page, you'll see it displays the username and password as part of the URL of the database you're connecting to. That's not really ideal, but the SSH access portal can help out there too. If you set up and activate the tunnel, you can return the CLUSTER_URL value in the config.js file to http://localhost:9200 and there'll be no username or password to display on screen.
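
Before reopening ESClient, it's worth a quick check that the tunnel is live; hitting the local end with curl should return the node's banner JSON:

curl http://localhost:9200/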

Wrapping up

We've touched on three tools in this article but, more importantly, we've shown the practical differences between using the HTTP/TCP and SSH access portals on Compose. With HTTP/TCP access, there will be usernames and passwords embedded in the URL you use, and this will leave any scripts or tools you configure susceptible to shoulder surfers and the like. That said, for occasionally launched tools it is quick and simple.

With the SSH access portal, the configuration and authentication are done when you set up the tunnel in a separate process, and the tunnel means you can use Elasticsearch as if a node was installed locally. The downside is that you do need to make sure the SSH tunnel is up before you run any command, so for one-off jobs it may be easier to go through the HTTP/TCP access portal. But that's why we give you both options at Compose, so you can choose what suits you and your applications best.