Elasticsearch Query-Time Strategies and Techniques for Relevance: Part I

In this 2-part series, we'll look at a couple different strategies for applying some of Elasticsearch's built-in tools at query time to tune search results for relevancy. The techniques we'll review can help determine which documents get retrieved and impact the relevance scores for the retrieved documents. If you need a primer for relevance and the mechanics of scoring used in Elasticsearch, check out our article on how scoring works.

Elasticsearch is a complex beast made even more powerful as a search engine by running Lucene under the hood. Because there are so many knobs and dials (and switches and buttons and faders) we won't be able to cover all the nitty-gritty in this series, but we hope to give you an idea of how the relevancy of your search results can be improved. You'll get an understanding of some key concepts that you can use to define a strategy and apply techniques that will work for your situation.

Get Personal

Have you heard your users sigh and complain that "search doesn't work"? While search may indeed be broken due to some network issue or a bug in the UI, usually that's not what your users mean. What they mean is that they didn't get the results they were expecting for the search they did. Maybe results they expected to see didn't appear, maybe the ordering didn't jibe with which documents they thought should rank more highly, or maybe there were so many results that they got overwhelmed. Users have been trained to expect Google-like magic from search engines. That can be a hard act to follow, but Elasticsearch has some of its own magic and a plethora of tools to help you do just that.

Yep - that's right... it's time to move away from the default settings and instead get to know search on a personal level. Your search results can be tuned for better relevancy based on what your users expect.

Precision vs Recall

If you're responsible for search, then you should have a basic understanding of precision and recall. These two measures will shape the strategies we'll focus on in this article.

Precision is the number of relevant documents retrieved divided by the total number of retrieved documents.

          Precision = relevant documents retrieved / retrieved documents

Recall is the number of relevant documents retrieved divided by the total of all relevant documents.

          Recall = relevant documents retrieved / relevant documents

Let's throw some numbers in there to help this make more sense. Let's say that our 1,000 document index contains 30 relevant documents for our given query. Performing the search gives us some of those relevant documents and also some non-relevant documents in the result set. If we get back 50 search results, but only 25 of them are part of the set deemed relevant, then our precision is 50% (25 relevant retrieved documents / 50 retrieved documents), but our recall is 83% (25 relevant retrieved documents / 30 relevant documents).

Optimum recall and precision are achieved when all and only the relevant documents are retrieved. We can discover how close we are to the optimum by calculating the F1 score (the harmonic mean of precision and recall), which basically tells us how well both measures are performing together. In practice, however, recall and precision tend to have an inverse relationship to each other. The higher the precision, generally the lower the recall, and vice versa. For that reason, you can typically tune your search more toward one or more toward the other. Which way you go depends on what your users deem relevant and how much sifting through search results they're willing to accept.
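
Using the numbers from the example above, here's a quick sketch of these calculations (plain Python, nothing Elasticsearch-specific):

```python
def precision(relevant_retrieved, retrieved):
    """Fraction of retrieved documents that are relevant."""
    return relevant_retrieved / retrieved

def recall(relevant_retrieved, relevant_total):
    """Fraction of all relevant documents that were retrieved."""
    return relevant_retrieved / relevant_total

def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# 50 documents retrieved, 25 of them relevant,
# out of 30 relevant documents in the whole index.
p = precision(25, 50)   # 0.5
r = recall(25, 30)      # ~0.833
score = f1(p, r)        # ~0.625
print(round(p, 3), round(r, 3), round(score, 3))
```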

Of course, achieving the results your users expect will probably mean you need to look at more than just the query-time tools - you'll probably need to make adjustments in indexing, analysis, and post-results settings - but dialing in how the queries function can help you discover where there are opportunities in those other areas and also let you fine-tune results to better satisfy your users.

Query DSL

The Query DSL (Domain Specific Language) in Elasticsearch is included in the Search APIs and provides a robust, flexible interface to query your indexes using JSON.

Be aware that the Query DSL for queries and filters has changed somewhat between the 1.x versions (you'll likely have 1.7.3 with a pre-existing Elasticsearch deployment on Compose) and the 2.x versions (you'll get 2.1.1 as of this publishing with a new Elasticsearch deployment). For that reason, please check the official documentation for the version of Elasticsearch you are running and construct your queries (and filters) accordingly. Our examples below are based on the 2.x structure.

Total Recall

Usually recall is not a problem. Most search algorithms err on the side of recall, but use relevance scores to give the impression of precision (we'll discuss that strategy in Part II). Since relevancy is in the eye of the beholder, it's easy to get high recall if we assume that just because a document matched a query, it's got to be relevant to someone. The strategy used in these cases is to retrieve as many documents as possible, knowing that will likely increase the number of relevant documents retrieved. This puts more burden on users to sift through the non-relevant documents that get retrieved, but the documents they expect to find are likely to be in the result list somewhere. If your users have more of a research bent, then a recall-oriented strategy is likely going to be the way to go since these users can't bear to have missed any possible reference they might need for their work.

Match All

We can achieve perfect recall by writing a query to retrieve all the documents. Elasticsearch even provides a way for us to do that - the match_all query type. Retrieving all the documents for every query doesn't make for a very good search experience, though (let alone good performance if you have a large document set... not something we recommend), unless we combine it with pre- or post-result filtering or with other query functions (some of which we'll discuss below and in Part II) that can help weed out and rank those documents based on an additional query or defined logic.
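
For reference, the match_all query body is as minimal as queries get:

```json
{
  "query": {
    "match_all": {}
  }
}
```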

KISS

Because we're trying to retrieve as many documents as might be relevant, usually the simplest queries and filters are the ones that will help us with recall. So, KISS (Keep it simple, stupid)!

Filter

It may seem counter-intuitive, but the most straightforward approach is to use a simple filter. A filter performs an exact match on the specified field. Because of that, filters are typically used on not-analyzed fields (straight string, numeric, or date fields). When only a filter is applied, the default query that happens behind the scenes is the match_all query. So, let's say we want to retrieve documents by a specified publication year. We can perform a filter search for "year" = 2006, for example, which conceptually returns all the documents in the index and then filters them down to the ones that match our filter. Our result set is all the documents with a publication year of 2006. There won't be any relevance ranking because all the documents are equal in terms of relevance - they all match the publication year 2006 exactly. In this scenario we get high recall for our query as well as high precision.

Here's an example where we want to retrieve all the potential candidates who have applied to our company that are located in the state of Colorado:

 {
   "query": {
      "bool" : {
          "filter" : {
              "term" : { "state" : "colorado" }
          }
      }
   }
 }

Match

The match query type is the most commonly used. You're probably already familiar with it since it's recommended by Elasticsearch as the default query type. match is used for full text searches in any field (use multi_match for multiple fields). If the field is an analyzed field, match will also apply the field analyzer(s) to the incoming query for best results. Because analyzers can be configured to include stemming or synonyms the possible matches will be automatically expanded to include variant forms of the terms, meaning more documents are likely to be found as matches than just the ones with those exact terms. Documents retrieved will be scored using the Practical Scoring Function, which we discussed in our previous article on scoring.

So, let's go back to our candidate records to identify candidates who have mentioned "social media" as part of their qualifications:

 {
   "query": {
      "match" : { "qualifications" : "social media" }
   }
 }
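
The multi_match variant mentioned above runs the same text query against several fields at once:

```json
{
  "query": {
    "multi_match": {
      "query": "social media",
      "fields": ["qualifications", "cover_letter"]
    }
  }
}
```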

Dis Max

Dis Max stands for Disjunction Maximum. It's the mechanism behind the multi_match query type mentioned above, but it can also be used directly. Basically, it queries multiple fields for the specified terms individually, making use of analyzers for analyzed fields. Any document that matches at least one of the subqueries is retrieved, and its score is taken from the best-matching subquery (the "maximum" in the name) rather than from the sum of all of them. Documents where multiple terms from the query are found in the same field (or even in multiple fields) will rank more highly than documents where only one term is matched (matching all the terms in the query matters more than the frequency of any one of them). This allows for recall of many potentially relevant documents because we're looking for any of the terms across multiple fields, while still weighting some of them more highly based on the term-field combinations.

Below we are using dis_max to search for candidates with qualifications mentioning "social media" and also those who may have mentioned it in their cover letter. By using dis_max we'll also return records that may match "social" in the cover letter and "media" in the qualifications, or vice versa, or some combination, thus expanding our retrieved document set.

 {
   "query": {
      "dis_max": {
          "queries": [
              { "match": { "qualifications": "social media" }},
              { "match": { "cover_letter":  "social media" }}
          ]
      }
   }
 }
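
By default only the best-matching clause contributes to a document's score. If you want matches in the other clauses to nudge the score upward too, dis_max accepts a tie_breaker parameter - a factor between 0.0 and 1.0 applied to the scores of the non-best clauses:

```json
{
  "query": {
    "dis_max": {
      "tie_breaker": 0.3,
      "queries": [
        { "match": { "qualifications": "social media" }},
        { "match": { "cover_letter": "social media" }}
      ]
    }
  }
}
```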

Fuzziness

With match queries, including dis_max, and with term queries, we can even take them further into recall territory by using fuzziness. Fuzziness allows us to indicate how many edits can be made to a term that we would still consider to be a match. For example, if we're using the "auto" setting for fuzziness, a term that contains 3-5 characters could have one character edited and we would still consider that a match. Let's look at the case of a query for "snake". If we consider one edit to still be a valid match, then documents with the word "shake" would also be returned (we've edited the "n" to replace it with an "h"), albeit with a lower relevancy score. Fuzziness is a technique to increase recall, but typically comes at a significant cost to precision.
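
The "edits" here are Levenshtein edit distance: single-character insertions, deletions, or substitutions (Elasticsearch can optionally count a transposition of two adjacent characters as one edit as well). Here's an illustrative sketch of the classic calculation - not how Lucene implements it internally, since it uses automata for efficiency:

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(edit_distance("snake", "shake"))        # 1 - one substitution
print(edit_distance("strategy", "startegy"))  # 2 - classic Levenshtein counts a transposition as two edits
print(edit_distance("strategy", "strategic")) # 2 - one substitution plus one insertion
```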

Let's look for potential candidates who have experience with strategy. Using fuzzy matching with a fuzziness of 2 edits, we would also find matches where qualifications contained "strategic" or perhaps the misspelling "startegy" instead.

 {
   "query": {
      "fuzzy" : {
        "qualifications" : {
            "value" : "strategy",
            "fuzziness" : 2
        }
      }
   }
 }
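
Since fuzziness also works inside a match query, the same search can be written so that the query text is analyzed first and each resulting term is matched fuzzily:

```json
{
  "query": {
    "match": {
      "qualifications": {
        "query": "strategy",
        "fuzziness": 2
      }
    }
  }
}
```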

Recall Wrap-Up

In considering a recall strategy, these are some of the query-time techniques we can apply: match_all (usually combined with filters or other queries), simple filters, match and multi_match queries, dis_max, and fuzziness.

It can be very tempting to assume a match equates with relevance (which is what we're doing with a recall strategy), but a match on "pumps" that returns a product spec for a water pump is not going to be relevant to someone who was looking for women's shoes. Now that we know how to expand our potentially-relevant result set and thus typically increase recall, let's move on to precision.

Precise is Nice

The precision strategy is to make sure the documents retrieved are as relevant as possible - no room for "potentially-relevant" here. Because of that, these next techniques will mean we retrieve fewer documents, but also that our results are more precise. By using them, we might miss some relevant documents, but that's the trade-off we'll make for higher precision. This kind of strategy works best for users who just want the best answer, not to know what all the possible answers are.

Play by the Rules

For a precision-oriented strategy, we're going to want to apply as many rules as we can to fine-tune our queries to retrieve only relevant results.

Bool

The bool query type takes us a step beyond match queries. While still performing full text matching using analyzers where they exist, it also allows us to define rules for matching and to combine queries (and filters) to better dial-in precision in the result set. For boolean queries, we can define "must", "must_not", and "should" clauses (plus, in 2.x, a "filter" clause). The number of matching conditionals impacts the relevancy score. Scoring can also be impacted by how the query is structured - the hierarchy applied to the conditionals. Nested conditionals will score lower than ones at higher levels. Boolean queries typically require some kind of "advanced" search UI for your users to construct the query rules.

Here's an example of a boolean query for a potential candidate for the marketing department where we've specified that the candidate cannot have less than 10 years of experience and should have qualifications for social media and public relations:

 {
   "query": {
     "bool": {
        "must" : {
            "term" : { "department" : "marketing" }
        },
        "must_not" : {
            "range" : {
                "experience" : { "lt" : 10 }
            }
        },
        "should" : [
            {
                "match" : { "qualifications" : "social media" }
            },
            {
                "match" : { "qualifications" : "public relations" }
            }
        ]
     }
   }
 }

Filters

As we mentioned when discussing recall above, filters can act on their own as a simple yes/no to get results from the index (similar to using term queries), but they can also be used to aid precision by being coupled with queries. A filter can fine-tune a query by including only relevant results or excluding irrelevant ones. For example, for the ambiguous term "pumps" that we brought up previously, we could do a match or term query for that term, but then also combine that using a bool query type with a filter on the field "product-category". We can filter that field for "shoes" or for "tools", depending on which kind of pump document we want to retrieve. Again, this technique will typically require that an "advanced" search UI is available to your users to be able to define the applicable filters.

Let's revisit our potential candidate search and add a filter for the state the candidate is located in:

 {
   "query": {
     "bool": {
        "must" : {
            "term" : { "department" : "marketing" }
        },
        "must_not" : {
            "range" : {
                "experience" : { "lt" : 10 }
            }
        },
        "filter" : {
            "term" : { "state" : "colorado" }
        },
        "should" : [
            {
                "match" : { "qualifications" : "social media" }
            },
            {
                "match" : { "qualifications" : "public relations" }
            }
        ]
     }
   }
 }

Be a Minimalist

For making sure our result sets are as precise and relevant as they can be, we can also apply some minimum thresholds which will exclude documents that don't meet the specified standard.

minimum_should_match

To add another level of precision on boolean queries, we can apply the minimum_should_match parameter. Using minimum_should_match sets a threshold (absolute number, percentage, or combination of these) for matching clauses in boolean queries. For example, we could set a minimum_should_match as 2, meaning that at least two of the conditional clauses in our boolean query have to match for us to consider the document a match for the query.

Below we've added a minimum_should_match parameter to our potential candidate search that indicates that at least one of the conditional "should" clauses has to match.

 {
   "query": {
     "bool": {
        "must" : {
            "term" : { "department" : "marketing" }
        },
        "must_not" : {
            "range" : {
                "experience" : { "lt" : 10 }
            }
        },
        "filter" : {
            "term" : { "state" : "colorado" }
        },
        "should" : [
            {
                "match" : { "qualifications" : "social media" }
            },
            {
                "match" : { "qualifications" : "public relations" }
            }
        ],
        "minimum_should_match" : 1
     }
   }
 }

min_score

Regardless of query type, we can also set a minimum score threshold with the min_score parameter, so that documents scoring below the stated minimum are excluded from the result set. This is like chopping your retrieved document set with a blunt axe, but if your users require super-high precision and are only interested in the most relevant results from what was retrieved, this is one way to achieve their standards.
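
min_score sits at the top level of the search body, alongside the query. Keep in mind that relevance scores aren't absolute or comparable across queries, so a useful threshold (the 0.5 below is purely illustrative) has to be found by experimenting with your own data:

```json
{
  "min_score": 0.5,
  "query": {
    "match": { "qualifications": "social media" }
  }
}
```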

Precision Wrap-Up

For a precision strategy, consider these query-time techniques: bool queries with must, must_not, and should clauses, filters combined with queries, minimum_should_match, and min_score.

Coming Next

In this article we looked at strategies and techniques around precision and recall, and walked through some examples of how to structure our queries to take advantage of some of the query types and settings in Elasticsearch to achieve our goals.

In our next article we'll look at how to directly impact the relevance scores for the retrieved documents using some built-in tools that can be applied at query time, including boosts and more advanced functions.