How to Script Painless-ly in Elasticsearch


With the release of Elasticsearch 5.x came Painless, Elasticsearch's answer to safe, secure, and performant scripting. We'll introduce you to Painless and show you what it can do.

With the introduction of Elasticsearch 5.x over a year ago, we got a new scripting language, Painless. Painless is a scripting language developed and maintained by Elastic and optimized for Elasticsearch. While it's still an experimental scripting language, at its core Painless is promoted as fast, safe, easy to use, and secure.

In this article, we'll give you a short introduction to Painless, and show you how to use the language when searching and updating your data.

On to Painless ...

A Painless introduction

The objective of Painless is to make writing scripts painless for the user, especially if you're coming from a Java or Groovy environment. Since you might not be familiar with scripting in Elasticsearch in general, let's start with the basics.

Variables and Data Types

Painless supports primitive, reference, String, void, array, and dynamic (def) types. Variables can be declared with any of them except void, which is only used as the return type of something that doesn't return a value. The primitive types are byte, short, char, int, long, float, double, and boolean. They're declared much as in Java, for example, int i = 0; double a; boolean g = true;.

Reference types in Painless are also similar to Java, except that they don't support access modifiers. They do support Java-like inheritance. A reference type can be allocated with the new keyword on initialization, as when declaring a as an ArrayList below. It can also simply be declared without a value, as with b, in which case it defaults to null:

ArrayList a = new ArrayList();  
Map b;  
Map g = [:];  
List q = [1, 2, 3];  

The list and map initializers used for q and g look a bit like array initializers and don't require the new keyword. However, they create reference types (an ArrayList for q and a HashMap for g), not arrays.

String variables can be initialized from a string literal, with or without allocating them with the new keyword. For example:

String a = "a";  
String foo = new String("bar");  

Array types in Painless support single and multidimensional arrays, and an array variable that hasn't been assigned defaults to null. Like reference types, arrays are allocated using the new keyword, followed by the element type and a set of brackets for each dimension. An array can be declared and initialized like the following:

int[] x = new int[2];  
x[0] = 3;  
x[1] = 4;  

The size of the array can be given explicitly, for example, int[] a = new int[2];, or you can create an array of size 5 holding the values 1 to 5 with an initializer:

int[] b = new int[] {1,2,3,4,5};  

Like arrays in Java and Groovy, a Painless array must declare the type of its elements on declaration and initialization. That type can be a primitive, a reference type such as String, or even the dynamic def type.

def is the only dynamic type supported by Painless, and it offers the most flexibility when declaring variables: it mimics the behavior of whatever type is assigned to it at runtime. So, when defining the following variables:

def a = 1;  
def b = "foo";  

In the above code, Painless treats a as a primitive int with a value of 1 and b as a String with the value "foo". Arrays can also be declared with def, for instance:

def[][] h = new def[2][2];  
def[] f = new def[] {4, "s", 5.7, 2.8F};  

With variables out of the way, let's take a look at conditionals and operators.

Operators and Conditionals

If you know Java, Groovy, or another modern programming language, then conditionals and operators in Painless will be familiar. The Painless documentation contains the full list of operators supported by the language, along with their precedence and associativity. Most of the operators on the list are compatible with Java and Groovy. As in most programming languages, operator precedence can be overridden with parentheses (e.g. int t = 5+(5*5)).

Working with conditionals in Painless is the same as in most programming languages. Painless supports if, else if, and else, but not switch. A conditional statement will look familiar to most programmers:

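// Doc values are read-only, so this example changes the field
// through ctx._source, as an update script would.
if (ctx._source.foo == 5) {  
    ctx._source.foo *= 10;
} 
else {  
    ctx._source.foo += 10;
}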

Painless also has the Elvis operator ?:, which behaves more like the Elvis operator in Kotlin than the one in Groovy. Basically, if we have the following:

x ?: y  

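the Elvis operator evaluates the left-hand expression x and returns its value if it isn't null. If x is null, the right-hand expression y is evaluated and its value is returned instead. Groovy's version, by contrast, falls back to y whenever x is "falsy" (for example 0 or an empty string), not only when it's null. Since a primitive can never be null, the Elvis operator doesn't work with primitives, so def is preferred when using it.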
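To make the Kotlin-versus-Groovy distinction concrete, here's a small Python sketch (not Painless) of the two behaviors. The function names are hypothetical, and Python's truthiness only roughly mirrors Groovy truth:

```python
def elvis_null_only(x, y):
    """Kotlin/Painless-style Elvis: fall back to y only when x is null (None)."""
    # Unlike the real operator, Python evaluates y before the call.
    return x if x is not None else y


def elvis_truthy(x, y):
    """Groovy-style Elvis: fall back to y whenever x is falsy (None, 0, "", ...)."""
    return x if x else y


# A null left-hand side: both styles return the right-hand side.
print(elvis_null_only(None, "default"))  # default
print(elvis_truthy(None, "default"))     # default

# A zero left-hand side: only the Groovy-style version falls back.
print(elvis_null_only(0, 42))  # 0
print(elvis_truthy(0, 42))     # 42
```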

Methods

While Painless gets most of its power from Java, not every class or method from the Java standard library (the Java Runtime Environment, JRE) is available. Elasticsearch maintains a whitelist reference of the classes and methods available to Painless. The list includes not only classes and methods from the JRE, but also Elasticsearch- and Painless-specific ones.

Painless Loops

Painless supports while, do...while, and for loops, along with control flow statements like break and continue, all of which are also available in Java. A for loop in Painless will look familiar from most modern programming languages. In the following example, we loop over the values of a multi-valued scores field in our document, doc['scores'], add each one to the variable total, and then return it:

def total = 0;  
for (def i = 0; i < doc['scores'].length; i++) {  
    total += doc['scores'][i];
}
return total;  

Modifying that loop to the following will also work:

def total = 0;  
for (def score : doc['scores']) {  
    total += score;
}
return total;  
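To see the two loop styles side by side outside Elasticsearch, here's the same summation as a small Python sketch (not Painless). A plain list of made-up values stands in for the multi-valued doc['scores'] field:

```python
# Stand-in for the multi-valued doc['scores'] field (hypothetical values).
scores = [85, 92, 78]

# Index-based loop, like: for (def i = 0; i < doc['scores'].length; i++)
total_indexed = 0
for i in range(len(scores)):
    total_indexed += scores[i]

# For-each loop, like: for (def score : doc['scores'])
total_each = 0
for score in scores:
    total_each += score

print(total_indexed, total_each)  # 255 255
```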

Now that we have an overview of some of the language fundamentals, let's start looking at some data and see how we can use Painless with Elasticsearch queries.

Loading the Data

Before loading data into Elasticsearch, make sure you have a fresh index set up. You can create the new index in the Compose console, from the terminal, or with the programming language of your choice. The index that we'll create is called "sat". Once you've set up the index, let's gather the data.

The data we're going to use is a list of average SAT scores by school for the 2015/16 school year, compiled by the California Department of Education, which publishes it as a Microsoft Excel file. We converted the data into JSON, which can be downloaded from the GitHub repository here.

After downloading the JSON file, we can use Elasticsearch's Bulk API to insert the data into the "sat" index we created.

curl -XPOST -u username:password -H 'Content-Type: application/x-ndjson' 'https://portal333-5.compose-elasticsearch.compose-44.composedb.com:44444/_bulk' --data-binary @sat_scores.json  

Remember to substitute the username, password, and deployment URL with your own and add _bulk to the end of the URL to start importing data.
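The Bulk API expects newline-delimited JSON. Each document needs an action line such as {"index": {...}}, then the document itself on the next line, and the whole body must end with a newline. The following is a hedged Python sketch of producing such a body from records that are already Python dicts. The index name sat and type scores come from this article, while records and to_bulk_ndjson are hypothetical:

```python
import json

# Hypothetical input: records shaped like the _source documents shown later
# in this article (only a few of the fields are included here).
records = [
    {"sname": "American High", "AvgScrRead": 576, "AvgScrMath": 610, "AvgScrWrit": 576},
]


def to_bulk_ndjson(docs, index="sat", doc_type="scores"):
    """Build a Bulk API body: an action line followed by a source line per doc."""
    lines = []
    for doc in docs:
        # The URL ends in /_bulk, so each action line names the index
        # (and, in Elasticsearch 5.x, the type).
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # The document itself must be serialized on a single line.
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = to_bulk_ndjson(records)
# Write `body` to sat_scores.json before running the curl command above.
print(body, end="")
```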

Searching Elasticsearch Using Painless

Now that we have the SAT scores loaded into the "sat" index, we can start using Painless in our SAT queries. In the following examples, we'll mostly use def for variables to demonstrate Painless's dynamic typing support.

A search request that uses a Painless script looks similar to the following:

GET sat/_search

{
  "script_fields": {
    "some_scores": {
      "script": {
        "lang": "painless",
        "inline": "def scores = 0; scores = doc['AvgScrRead'].value + doc['AvgScrWrit'].value; return scores;"
      }
    }
  }
}

Within a script, lang sets the scripting language, and Painless is the default. We can also choose where the script comes from. Here we're using an inline script, which is sent along with the query. Alternatively, stored scripts are saved in the cluster, and file scripts are kept in files in Elasticsearch's configuration directory and referenced from your requests.

Let's look at the above script in a little more detail.

In the above request, we're using the _search API with the script_fields parameter, which lets us return a new field holding the value computed by our script. Here, we've called it some_scores, just as an example. Inside this new script field, the script object sets lang to painless (optional, since Painless is already the default) and uses inline to hold our Painless script:

def scores = 0;  
scores = doc['AvgScrRead'].value + doc['AvgScrWrit'].value;  
return scores;  

You'll notice that, unlike the version above, the script in the request doesn't have any line breaks. That's because the script is a JSON string value, and JSON strings can't contain literal line breaks (they'd have to be escaped as \n). It's simplest to write the script on a single line. This simple calculation doesn't actually require Painless; it could also be done with Lucene Expressions, but it serves as an example.
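If you'd rather keep the script readable across several lines, one option is to let a JSON serializer escape the line breaks for you. The following is a minimal Python sketch that uses only the standard json module to build the request body above from a multi-line script. It only prepares the body; sending it to sat/_search is up to your client of choice:

```python
import json

# The Painless script from above, written with readable line breaks.
script = """def scores = 0;
scores = doc['AvgScrRead'].value + doc['AvgScrWrit'].value;
return scores;"""

body = {
    "script_fields": {
        "some_scores": {
            "script": {
                "lang": "painless",
                "inline": script,
            }
        }
    }
}

# json.dumps escapes each newline inside the script as \n, so the script
# becomes a valid single-line JSON string value.
payload = json.dumps(body, indent=2)
print(payload)
```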

Let's look at the results:

{
    "_index": "sat",
    "_type": "scores",
    "_id": "AV3CYR8JFgEfgdUCQSON",
    "_score": 1,
    "_source": {
        "cds": 1611760130062,
        "rtype": "S",
        "sname": "American High",
        "dname": "Fremont Unified",
        "cname": "Alameda",
        "enroll12": 444,
        "NumTstTakr": 298,
        "AvgScrRead": 576,
        "AvgScrMath": 610,
        "AvgScrWrit": 576,
        "NumGE1500": 229,
        "PctGE1500": 76.85,
        "year": 1516
    },
    "fields": {
        "some_scores": [
            1152
        ]
    }
}

The script is evaluated for each hit that the search returns. The above result shows that the script's value comes back in a separate fields object, next to _source, under the name we gave it in script_fields (some_scores), as an array.
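Here's a rough Python model of the shape of that response. It doesn't reflect how Elasticsearch runs scripts internally: it evaluates the script for the hit above and wraps the return value in a list under fields. The helper name some_scores_script is hypothetical, and _source values stand in for doc values, which hold the same numbers here:

```python
# Subset of the "American High" _source shown above.
source = {"sname": "American High", "AvgScrRead": 576, "AvgScrMath": 610, "AvgScrWrit": 576}


def some_scores_script(doc):
    """Python mirror of the Painless script: reading average + writing average."""
    return doc["AvgScrRead"] + doc["AvgScrWrit"]


# Script field values come back as an array, keyed by the script field's name.
hit_fields = {"some_scores": [some_scores_script(source)]}
print(hit_fields)  # {'some_scores': [1152]}
```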

Let's write another query that will search for schools that have a SAT reading score of less than 350 and a math score of more than 350. The script for that would look like:

doc['AvgScrRead'].value < 350 && doc['AvgScrMath'].value > 350  

And the query:

GET sat/_search

{
  "query": {
    "script": {
      "script": {
        "inline": "doc['AvgScrRead'].value < 350 && doc['AvgScrMath'].value > 350",
        "lang": "painless"
      }
    }
  }
}
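A script query works as a per-document filter: the script returns a boolean, and only documents for which it's true become hits. The following Python sketch illustrates that filtering step. The first two sets of scores appear in this article, and the third document is hypothetical, added for contrast:

```python
# Documents to filter (only the fields the script reads are included).
docs = [
    {"sname": "American High", "AvgScrRead": 576, "AvgScrMath": 610},
    {"sname": "school from the sample hit", "AvgScrRead": 326, "AvgScrMath": 368},
    {"sname": "Hypothetical High", "AvgScrRead": 340, "AvgScrMath": 345},
]


def matches(doc):
    """Mirror of: doc['AvgScrRead'].value < 350 && doc['AvgScrMath'].value > 350"""
    return doc["AvgScrRead"] < 350 and doc["AvgScrMath"] > 350


# Keep only the documents for which the script returns true.
hits = [d for d in docs if matches(d)]
print([d["sname"] for d in hits])  # ['school from the sample hit']
```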

This gives us four schools. For each of them, we can then use Painless to build a list of four values: the three SAT scores from our data plus their total:

def sat_scores = [];  
def score_names = ['AvgScrRead', 'AvgScrWrit', 'AvgScrMath'];  
for (int i = 0; i < score_names.length; i++) {  
    sat_scores.add(doc[score_names[i]].value);
}
def temp = 0;  
for (def score : sat_scores) {  
    temp += score;
}
sat_scores.add(temp);  
return sat_scores;  

We create a sat_scores list to hold the SAT scores (AvgScrRead, AvgScrWrit, and AvgScrMath) and the total score that we'll calculate. We create another list called score_names to hold the names of the document fields that contain SAT scores; if our field names change in the future, all we'd have to do is update the names in this list. Using a for loop, we iterate over score_names and add each corresponding field value to sat_scores. Next, we loop over sat_scores, add the three SAT scores together, and store the sum in a temporary variable, temp. Finally, we append temp to sat_scores, giving us the three individual SAT scores plus their total.
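Before squeezing a script like this into a one-line string, it can be worth checking the logic locally. Here's a Python rendering of the same steps (not Painless). It uses the scores from the sample hit shown further down: 326 reading, 311 writing, and 368 math:

```python
# Doc values for one school; the numbers match the sample hit in this article.
doc = {"AvgScrRead": 326, "AvgScrWrit": 311, "AvgScrMath": 368}

sat_scores = []
score_names = ["AvgScrRead", "AvgScrWrit", "AvgScrMath"]

# First loop: copy each score into sat_scores, in score_names order.
for name in score_names:
    sat_scores.append(doc[name])

# Second loop: total the three scores, then append the total.
temp = 0
for score in sat_scores:
    temp += score
sat_scores.append(temp)

print(sat_scores)  # [326, 311, 368, 1005]
```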

The entire query to get the four schools and the script looks like:

GET sat/_search

{
  "query": {
    "script": {
      "script": {
        "inline": "doc['AvgScrRead'].value < 350 && doc['AvgScrMath'].value > 350",
        "lang": "painless"
      }
    }
  }, 
  "script_fields": {
    "scores": {
      "script": {
        "inline": "def sat_scores = []; def scores = ['AvgScrRead', 'AvgScrWrit', 'AvgScrMath']; for (int i = 0; i < scores.length; i++) {sat_scores.add(doc[scores[i]].value)} def temp = 0; for (def score : sat_scores) {temp += score;} sat_scores.add(temp); return sat_scores;",
        "lang": "painless"
      }
    }
  }
}

Each document returned by the query will look similar to:

"hits": {
    "total": 4,
    "max_score": 1,
    "hits": [
      {
        "_index": "sat",
        "_type": "scores",
        "_id": "AV3CYR8PFgEfgdUCQSpM",
        "_score": 1,
        "fields": {
          "scores": [
            326,
            311,
            368,
            1005
          ]
        }
      }
 ...

One drawback of using the _search API is that the computed results aren't stored in the index. To store them, we'd have to use the _update API to update individual documents, or the _update_by_query API to update all the documents in the index. So, let's update our index with the query results we've just used.

Updating Elasticsearch Using Painless

Before we move further, let's create another field in our data that will hold an array of the SAT scores. To do that, we'll use Elasticsearch's _update_by_query API to add a new field called All_Scores which will initially start out as an empty array:

POST sat/_update_by_query

{
  "script": {
    "inline": "ctx._source.All_Scores = []",
    "lang": "painless"
  }
}

This adds the empty All_Scores field to every document in the index, so we can start adding our scores to it. To do that, we'll use the following script to update the All_Scores field:

def scores = ['AvgScrRead', 'AvgScrWrit', 'AvgScrMath'];  
for (int i = 0; i < scores.length; i++) {  
    ctx._source.All_Scores.add(ctx._source[scores[i]]);
} 
def temp = 0;  
for (def score : ctx._source.All_Scores) {  
    temp += score;
}
ctx._source.All_Scores.add(temp);  

When using the _update or _update_by_query API, we don't have access to doc values through doc. Instead, Elasticsearch exposes the ctx variable, whose _source map gives us access to each document's fields. From there, we can fill each document's All_Scores array with the school's three average SAT scores and their total.

The entire query looks like this:

POST sat/_update_by_query

{
  "script": {
    "inline": "def scores = ['AvgScrRead', 'AvgScrWrit', 'AvgScrMath']; for (int i = 0; i < scores.length; i++) { ctx._source.All_Scores.add(ctx._source[scores[i]])} def temp = 0; for (def score : ctx._source.All_Scores) {temp += score;}ctx._source.All_Scores.add(temp);",
    "lang": "painless"
  }
}
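One thing to watch with this script: it appends to All_Scores and then sums everything in the list. It therefore only produces the intended result when All_Scores starts out empty. The following Python sketch models the script acting on a single ctx._source (as a dict; add_all_scores is a hypothetical name) and shows what happens if the update is run twice:

```python
def add_all_scores(source):
    """Mirror of the update script acting on ctx._source (a dict here)."""
    scores = ["AvgScrRead", "AvgScrWrit", "AvgScrMath"]
    for name in scores:
        source["All_Scores"].append(source[name])
    temp = 0
    # Sums everything in All_Scores, not just the three values just appended.
    for score in source["All_Scores"]:
        temp += score
    source["All_Scores"].append(temp)


src = {"AvgScrRead": 326, "AvgScrWrit": 311, "AvgScrMath": 368, "All_Scores": []}
add_all_scores(src)
first_run = list(src["All_Scores"])
print(first_run)  # [326, 311, 368, 1005]

# A second run appends again, and its "total" includes the earlier entries.
add_all_scores(src)
print(src["All_Scores"])  # [326, 311, 368, 1005, 326, 311, 368, 3015]
```

If you may need to re-run the update, one option is to reset the field at the start of the script with ctx._source.All_Scores = [];, as in the first _update_by_query call.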

If we want to update only a single document, we can do that, too, using a similar script. All we'll need to indicate is the document's _id in the POST URL. In the following update, we're simply adding 10 points to the AvgScrMath score for the document with id "AV2mluV4aqbKx_m2Ul0m".

POST sat/scores/AV2mluV4aqbKx_m2Ul0m/_update

{
  "script": {
    "inline": "ctx._source.AvgScrMath += 10",
    "lang": "painless"
  }
}  
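If you send these requests from code instead of a console, the single-document update is a POST to index/type/id/_update with the script as its JSON body. The following sketch uses only Python's standard library. BASE_URL and the credentials are placeholders for your own deployment, and the request is built but not sent:

```python
import base64
import json
import urllib.request

# Placeholders: substitute your own deployment URL and credentials.
BASE_URL = "https://localhost:9200"
USERNAME, PASSWORD = "username", "password"

doc_id = "AV2mluV4aqbKx_m2Ul0m"
body = {"script": {"inline": "ctx._source.AvgScrMath += 10", "lang": "painless"}}

# HTTP basic auth header, equivalent to curl's -u username:password.
auth = base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()
req = urllib.request.Request(
    f"{BASE_URL}/sat/scores/{doc_id}/_update",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json", "Authorization": f"Basic {auth}"},
    method="POST",
)

# To actually send it: urllib.request.urlopen(req)
print(req.get_method(), req.full_url)
```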

Summing up

We've gone over the basics of Elasticsearch's Painless scripting language and given some examples of how it works. Using Painless features like lists, maps, and loops, we've given you a taste of what you can do with the language when updating your documents or computing new values before your search results come back. Nonetheless, this is just the tip of the iceberg of what's possible with Painless.


If you have any feedback about this or any other Compose article, drop the Compose Articles team a line at articles@compose.com. We're happy to hear from you.

attribution Leeroy Agency

Abdullah Alger
Abdullah Alger is a former University lecturer who likes to dig into code, show people how to use and abuse technology, talk about GIS, and fish when the conditions are right. Coffee is in his DNA.
