Using Query String Queries in Elasticsearch

In Elasticsearch, query string queries are their own breed of query - loads of functionality for full text search rolled into one sweet little package. In this article, we'll take a closer look at why query string queries are special and how you can make use of them.

Search Lite

Elasticsearch: The Definitive Guide explains that the query string query type uses what they call "Search Lite", where all the query parameters are passed in the query string. Because of this, query string queries use a different syntax than the standard request body we've covered in previous articles, such as Elasticsearch Query-Time Strategies and Techniques for Relevance: Part I and Part II.

Note that the request body format for querying is the recommended approach from Elasticsearch since it is considered robust and provides extensive functionality. However, if you just need to do a quick-and-dirty full text search that has some power behind it, then using the "q" parameter in search (the query string query shortcut) is the way to go. Generally, query string queries, and their cousins (simple query string queries), will be most effective when used in development or QA testing, or when made available to power users who know the syntax like the back of their hand.

Let's take a closer look at this query type to understand what it can do for us by searching against the IMDB Top 250 Films, which we have loaded into our Elasticsearch instance as an index called "top_films" with a document type named "film".

The query string "query"

If you look at the "Search" page of the Elasticsearch Search APIs documentation, you'll notice that all the examples there use the "q" parameter for search. This is a shortcut for running query string queries. Using the "q" parameter is equivalent to setting the "query" option in a JSON-formatted query string query, which we'll cover in more detail later in the article when we look at the settings.

We're going to start by exploring just the query and its syntax since that's the bare bones needed for query string queries, though you'll see that it's chock full of features just on its own.

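Let's run through some basic default settings so we know where we stand when we construct a query using the query string query type. When no field is specified, the _all field is searched. Multiple terms are "OR"d together. When multiple fields are specified in the query, a bool query is used to combine them.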

There are some other more advanced usage defaults, but now that we know the basic ones, we'll use the "q" parameter -- the query string queries shortcut -- for some search examples. To follow along, just copy the HTTP connection string for your Elasticsearch deployment from the "Overview" page in the Compose administrative web console, supplying the username and password appropriate for your instance. In this first example, we'll show the full URL path so you can see how it's formed, but for the rest of the examples, we won't show the whole connection string, just the part starting with our "top_films" index through the "q" parameter.

Getting started, then... the simplest query we can construct is a single term query without any additional specification, for example:

https://admin:[password]@aws-us-east-1-portal10.dblayer.com:10019/top_films/film/_search?q=godfather  

Here, we're just searching for "godfather". Since we didn't specify a particular field, the _all field will be used. As our result, we get 2 hits: The Godfather and The Godfather: Part II, since those are the only ones from that series which have made it into the top 250 films:

 {
    "hits" : {
       "total" : 2,
       "max_score" : 1.1862482,
       "hits" : [
          {
             "_index" : "top_films",
             "_score" : 1.1862482,
             "_source" : {
                "title" : "The Godfather",
                "year" : "1972"
             },
             "_type" : "film",
             "_id" : "2"
          },
          {
             "_source" : {
                "title" : "The Godfather: Part II",
                "year" : "1974"
             },
             "_score" : 1.1862482,
             "_type" : "film",
             "_id" : "3",
             "_index" : "top_films"
          }
       ]
    }
 }

If you'd like to know more about how the scores are arrived at for each of the hits, then check out our article How Scoring Works in Elasticsearch.
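The "q" parameter URL used above can also be assembled in code rather than typed by hand. The following is a minimal Python sketch using only the standard library; the search_url helper and the localhost address are assumptions made for illustration, so substitute the connection string for your own deployment.

```python
from urllib.parse import urlencode


def search_url(base_url, index, doc_type, query):
    """Build a "q" parameter search URL (hypothetical helper).

    urlencode takes care of escaping the query text.
    """
    return f"{base_url}/{index}/{doc_type}/_search?{urlencode({'q': query})}"


# Placeholder host, not a real deployment.
url = search_url("https://localhost:9200", "top_films", "film", "godfather")
print(url)
```

Keeping the escaping inside one helper means the later examples, which contain spaces, quotes and colons, do not need to be encoded by hand.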

Multiple terms

Now that we have our baseline for a "godfather" query, let's add to our original query a bit to create a multi-term query using the terms "godfather" and "part". We'll join the two terms with a + sign, which is the URL encoding for a space in the query string:

/top_films/film/_search?q=godfather+part

In this case, because multiple terms are "OR"d together by default, we get 3 hits - the 3 films that have either the term "godfather" or the term "part" in them: The Godfather and The Godfather: Part II which we saw before, plus Harry Potter and the Deathly Hallows: Part 2:

 {
    "hits" : {
       "total" : 3,
       "hits" : [
          {
             "_index" : "top_films",
             "_type" : "film",
             "_id" : "3",
             "_score" : 1.6776081,
             "_source" : {
                "title" : "The Godfather: Part II",
                "year" : "1974"
             }
          },
          {
             "_index" : "top_films",
             "_source" : {
                "year" : "1972",
                "title" : "The Godfather"
             },
             "_score" : 0.41940203,
             "_type" : "film",
             "_id" : "2"
          },
          {
             "_index" : "top_films",
             "_source" : {
                "title" : "Harry Potter and the Deathly Hallows: Part 2",
                "year" : "2011"
             },
             "_score" : 0.35948747,
             "_id" : "216",
             "_type" : "film"
          }
       ],
       "max_score" : 1.6776081
    }
 }
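To see why "OR" produces 3 hits here, it can help to model the matching with a toy example. The Python sketch below is only an illustration of the boolean logic, not how Elasticsearch analyzes or scores documents; the tokens and match helpers and the operator argument are hypothetical names, and the title list is a small sample taken from the results in this article.

```python
import re

TITLES = [
    "The Godfather",
    "The Godfather: Part II",
    "Harry Potter and the Deathly Hallows: Part 2",
    "Chinatown",
]


def tokens(text):
    """Lowercase the text and split it into simple word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match(titles, terms, operator="OR"):
    """Return titles containing any (OR) or every (AND) search term."""
    combine = any if operator == "OR" else all
    return [t for t in titles if combine(term in tokens(t) for term in terms)]


or_hits = match(TITLES, ["godfather", "part"])
and_hits = match(TITLES, ["godfather", "part"], operator="AND")
print(or_hits)
print(and_hits)
```

With "OR", a title qualifies as soon as one term is present, which is how the Harry Potter film gets in on the strength of "part" alone.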

Phrases

Let's try that same search again, but this time we'll use double quotes to indicate that the two words form a phrase (we're using URL encoding "%22" for double quotes and "%20" for a whitespace):

/top_films/film/_search?q=%22godfather%20part%22

Now that we've indicated to treat the two words as a phrase, we only get 1 hit back: The Godfather: Part II:

 {
    "hits" : {
       "total" : 1,
       "max_score" : 2.3724964,
       "hits" : [
          {
             "_score" : 2.3724964,
             "_source" : {
                "title" : "The Godfather: Part II",
                "year" : "1974"
             },
             "_id" : "3",
             "_index" : "top_films",
             "_type" : "film"
          }
       ]
    }
 }
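The percent-encoding for the phrase query does not have to be written by hand. As a small sketch, Python's standard urllib.parse.quote produces the same "%22" and "%20" sequences; the variable names are just for illustration.

```python
from urllib.parse import quote

# The double quotes are part of the query syntax, so they are encoded too.
phrase = '"godfather part"'
encoded = quote(phrase)
path = f"/top_films/film/_search?q={encoded}"
print(path)
```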

Fields

If we want to search in a specified field (or fields) for terms, we can indicate that with our query syntax. For example, we can search the title field for "godfather" and the year field for 1974 (the year that The Godfather: Part II was released). Notice the "%3A" URL encoding for a colon between the field name and the term:

/top_films/film/_search?q=title%3Agodfather+year%3A1974

In this case, we get 3 results again because the two parts of the query are "OR"d together. Our two Godfather films match the term "godfather" in the title (Part II also matches 1974 for the year), and Chinatown matches because it is the only other film in the top 250 that was released in 1974:

 {
    "hits" : {
       "total" : 3,
       "hits" : [
          {
             "_id" : "3",
             "_score" : 2.8478138,
             "_index" : "top_films",
             "_type" : "film",
             "_source" : {
                "year" : "1974",
                "title" : "The Godfather: Part II"
             }
          },
          {
             "_source" : {
                "year" : "1972",
                "title" : "The Godfather"
             },
             "_type" : "film",
             "_id" : "2",
             "_score" : 1.6665416,
             "_index" : "top_films"
          },
          {
             "_source" : {
                "year" : "1974",
                "title" : "Chinatown"
             },
             "_type" : "film",
             "_score" : 0.0906736600000001,
             "_id" : "122",
             "_index" : "top_films"
          }
       ],
       "max_score" : 2.8478138
    }
 }
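Field-qualified queries follow a simple field:term pattern, so they are easy to generate. Here is a minimal Python sketch, assuming a hypothetical field_query helper, that joins the clauses and then encodes them with the standard library's quote_plus, which yields the "%3A" and "+" seen in the URL above.

```python
from urllib.parse import quote_plus


def field_query(clauses):
    """Join (field, term) pairs into query string syntax (hypothetical helper)."""
    return " ".join(f"{field}:{term}" for field, term in clauses)


raw = field_query([("title", "godfather"), ("year", "1974")])
encoded = quote_plus(raw)
print(raw)
print(encoded)
```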

Wildcards

Let's turn part of our query into a wildcard search. In this case, we'll put the * special character in front of "father" so that it matches any characters preceding it:

/top_films/film/_search?q=title%3A*father+year%3A1974

Now we get back 4 hits: the 3 we saw above, plus In the Name of the Father because its title matches our wildcard query for "*father":

 {
    "hits" : {
       "max_score" : 1.4142135,
       "total" : 4,
       "hits" : [
          {
             "_index" : "top_films",
             "_id" : "3",
             "_score" : 1.4142135,
             "_type" : "film",
             "_source" : {
                "title" : "The Godfather: Part II",
                "year" : "1974"
             }
          },
          {
             "_id" : "2",
             "_index" : "top_films",
             "_score" : 0.35355338,
             "_type" : "film",
             "_source" : {
                "year" : "1972",
                "title" : "The Godfather"
             }
          },
          {
             "_score" : 0.35355338,
             "_index" : "top_films",
             "_id" : "122",
             "_source" : {
                "title" : "Chinatown",
                "year" : "1974"
             },
             "_type" : "film"
          },
          {
             "_index" : "top_films",
             "_id" : "185",
             "_score" : 0.35355338,
             "_type" : "film",
             "_source" : {
                "title" : "In the Name of the Father",
                "year" : "1993"
             }
          }
       ]
    }
 }
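The effect of the leading wildcard can be approximated with shell-style pattern matching. This Python sketch uses the standard fnmatch module on lowercased title words as a rough stand-in; it is an assumption-laden illustration of which titles satisfy "*father", not a description of how Lucene evaluates wildcards, and the wildcard_hits helper is hypothetical.

```python
import re
from fnmatch import fnmatchcase

TITLES = [
    "The Godfather",
    "The Godfather: Part II",
    "Chinatown",
    "In the Name of the Father",
]


def wildcard_hits(titles, pattern):
    """Return titles where at least one word matches the wildcard pattern."""
    hits = []
    for title in titles:
        terms = re.findall(r"[a-z0-9]+", title.lower())
        if any(fnmatchcase(term, pattern) for term in terms):
            hits.append(title)
    return hits


hits = wildcard_hits(TITLES, "*father")
print(hits)
```

Chinatown is absent from this list because it only enters the real result set through the separate year clause.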

We can just keep building more and more complex queries in this way. Some options will increase the number of hits and some will decrease them. We won't go through examples of all the options available in the query syntax here; the official Elasticsearch documentation covers the other special characters and syntax for more advanced functionality.

Other query options

As you can see, there's a lot of functionality available to you in just formulating query string queries and using the "q" parameter in search makes it pretty simple to do.

But what if you want to change some of those defaults we discussed? If, instead of using the "q" parameter in search as we've shown in all the examples above, you choose to construct a full-blown query string query, then you have many more options available to you. We'll look at these next.

Query string query settings

There are many different settings available in query string queries and you can set them as options during query construction. Let's have a closer look at some of them.

As we mentioned above, the default operator is "OR". You can write "AND" explicitly between terms in the query itself, but you can also change the default operator to "AND". This is one of the settings that can also be changed for the "q" parameter, so we'll show both methods.

Let's take the query we did above where we searched for the individual terms "godfather" and "part". If you remember, we got 3 hits: the 2 Godfather films and then also Harry Potter and the Deathly Hallows: Part 2 because it contains the term "part" and our query "OR"d the two terms. We can change that behavior by setting the default_operator setting as an additional parameter in the URL alongside our "q" parameter. It'd look like this:

/top_films/film/_search?default_operator=AND&q=godfather+part

Now, because we're "AND"ing together our two search terms by default, we'll only get The Godfather: Part II back since it's the only film that contains both terms.
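If you are generating these URLs from code, the extra parameter is just one more key in the query string. A short Python sketch with the standard library's urlencode, where the variable names are illustrative only:

```python
from urllib.parse import urlencode

# Parameters are emitted in insertion order; the space becomes "+".
params = urlencode({"default_operator": "AND", "q": "godfather part"})
path = f"/top_films/film/_search?{params}"
print(path)
```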

The default_operator setting is one of the few query string query settings that can be set as an additional parameter in the URL with the "q" parameter. All the other settings we'll cover below can only be used within the query string query construction, which is formatted as JSON. So, here's what our query would look like using that format:

{
  "query_string" : {
    "query" : "godfather part",
    "default_operator" : "AND"
  }
}
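When this is sent as a search request body, the query_string object sits under a top-level "query" key. The Python sketch below builds such a request with the standard library without actually sending it; the query_string_request helper and the localhost URL are assumptions for illustration, and you would pass the result to urllib.request.urlopen (with your own credentials) to run it.

```python
import json
from urllib.request import Request


def query_string_request(url, query, **settings):
    """Build a POST search request for a query string query (hypothetical helper)."""
    body = {"query": {"query_string": {"query": query, **settings}}}
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host, not a real deployment.
req = query_string_request(
    "https://localhost:9200/top_films/film/_search",
    "godfather part",
    default_operator="AND",
)
print(req.data.decode("utf-8"))
```

Passing the settings as keyword arguments keeps the helper usable for the other options discussed in this section.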

Now that you can see how the query string query construction is formatted, we won't go through all the settings here, but we'll look at a couple more of them below so you can get an idea of some different uses.

As we mentioned above, when multiple fields are specified in the query, then bool is used by default. We can instead set multiple field queries to use the disjunction maximum function. To do this, set use_dis_max to "true" as follows:

{
  "query_string" : {
    "query" : "godfather mafia",
    "fields" : ["title", "description"],
    "use_dis_max" : "true"
  }
}

In this case, we're looking for "godfather" or "mafia" in either the title or the description field and we've indicated that we want to use disjunction maximum. For a discussion on using boolean versus dismax, have a look at our article on query-time strategies and techniques.
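The practical difference between the two approaches is how per-field scores are combined. As a simplified Python sketch with made-up scores, and ignoring any normalization Elasticsearch applies, a bool-style combination adds the field scores together while disjunction maximum keeps the best one, optionally adding a fraction of the rest via a tie breaker. The function names and numbers here are assumptions for illustration.

```python
def bool_score(field_scores):
    """Bool-style combination: add up the scores of all matching fields."""
    return sum(field_scores.values())


def dis_max_score(field_scores, tie_breaker=0.0):
    """Dis-max-style combination: best field wins, others add tie_breaker share."""
    best = max(field_scores.values())
    others = sum(field_scores.values()) - best
    return best + tie_breaker * others


# Invented per-field scores for a single document.
scores = {"title": 1.5, "description": 0.5}
print(bool_score(scores))
print(dis_max_score(scores))
print(dis_max_score(scores, tie_breaker=0.5))
```

In this model a document that matches weakly in many fields cannot outrank one that matches strongly in a single field once disjunction maximum is used.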

Another one we'll have a quick look at here is fuzziness. Fuzziness has a few different settings that you can alter. The fuzziness setting itself lets you change the default from "AUTO" to a specific number of character edits. The fuzzy_max_expansions setting can be altered from the default of 50 expansions to another number that better suits your queries and document set. The fuzzy_prefix_length setting is the number of characters at the beginning of a term which should not be changed for fuzzy matches. For example, you may want to set the prefix length to 1 character so that the first character of a term will not be changed for fuzzy matching. The default for prefix length is 0, so all characters in a term are candidates for changing unless you set this option. Here's an example containing these settings:

{
  "query_string" : {
    "query" : "man~",
    "fuzziness" : 2,
    "fuzzy_max_expansions" : 10,
    "fuzzy_prefix_length" : 1
  }
}

In the above example, we're doing a fuzzy match on the term "man", indicated by the ~ special character at the end of the term. We've changed the default fuzziness setting from "AUTO" to 2, specifying that we want to allow up to 2 characters to change. We've also lowered the default expansions from 50 to 10 and we've specified that the first character of the term cannot change.
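To make the interplay of fuzziness and prefix length concrete, here is a toy Python sketch. It uses a plain Levenshtein edit distance, which is a simplification of what Elasticsearch does, and it does not model fuzzy_max_expansions at all; the function names and sample words are assumptions for illustration.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fuzzy_match(term, candidate, fuzziness=2, prefix_length=1):
    """True if candidate keeps the fixed prefix and is within the edit budget."""
    if term[:prefix_length] != candidate[:prefix_length]:
        return False
    return edit_distance(term, candidate) <= fuzziness


for word in ["men", "main", "can", "mist"]:
    print(word, fuzzy_match("man", word))
```

Here "can" is only one edit away from "man" but is rejected because the protected first character differs, while "mist" keeps the prefix but needs more than 2 edits.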

There are many other settings that are available, which you can read about in the official Elasticsearch documentation for query string queries. It's probably pretty clear now how powerful this query type is.

The primary drawback of query string queries, and why they're recommended only for development, QA testing, and knowledgeable power users, is that they can break easily with a simple typo. One slip of the syntax can yield zero results or send your Elasticsearch instance crunching on a heavy query that consumes all the available memory. Because mistakes are easily made, Lucene (which runs under the hood of Elasticsearch) developed the SimpleQueryParser, whose purpose is to parse a string of human-readable text, no matter how poorly formatted, to produce a result. The simple query string query uses this special parser so that it ignores parts of the query that aren't formatted correctly. It has much of the same functionality as the query string query type, but it's more like a laid-back cousin.

We've already covered a lot in this article so we'll have to save simple query string queries for another day.

Wrapping up

In this article we got deep into the syntax for using the "q" parameter in search, which is a shortcut for performing query string queries in Elasticsearch. We also looked at how to construct full-blown query string queries in JSON format and why and how you might want to change some of the default settings or use other setting options available to you. Query string queries can help you quickly test your Elasticsearch index and can be a boon for power users who want to have maximum functionality directly in the query syntax. Query well.