GeoFile: Elasticsearch Geo Queries


GeoFile is a series dedicated to looking at geographical data, its features, and uses. In this article, we'll be covering Elasticsearch and its Geo mapping datatypes, geo_point and geo_shape, and Geo querying capabilities. We'll show you how to construct your mappings and demonstrate how to query some data.

The use of GeoData in databases has risen in popularity. In other GeoFile articles, we've covered database extensions like PostGIS for PostgreSQL, as well as the Geo querying capabilities that MongoDB and Redis provide out of the box. In this article, we'll look at Elasticsearch's Geo queries, how you can set up mappings and indices, and provide you with some examples of how you can query your data.

While Elasticsearch may not be your first choice when searching through your GeoData, its developers have been improving on its capabilities since 1.x by adding and enhancing querying features. For this article, we'll be using version 2.4, but you should be able to use these queries from 2.x on.

Elasticsearch and GeoData

Elasticsearch allows you to represent GeoData in two ways: geo_shape and geo_point.

Geo Point allows you to store data as latitude and longitude coordinate pairs. Use this field type when you want to filter data for distances between points, search within bounding boxes, or when using aggregations. There are a lot of features and options that you can specify which are beyond the scope of this article. We'll cover a couple here, but you can view the options for Geo Bounding Box, Geo Distance, and Geo Aggregations in Elasticsearch's documentation.

Use Geo-Shape when you have GeoData that represents a shape, or when you want to query points within a shape. geo_shape data must be encoded in GeoJSON format which is converted into strings representing long/lat coordinate pairs on a grid of Geohash cells. Since Elasticsearch indexes shapes as terms, it's simple for it to determine the relationships between shapes, which can be queried using intersects, disjoint, contains, or within query spatial relation operators.

Unfortunately, geo-point and geo-shape cannot be queried together. For example, if you want to get all the cities within a specified polygon, you cannot use cities that are indexed with geo-point. They must be represented in GeoJSON with "type": "Point" and indexed as geo-shape. We'll see how this works later. Note, however, that you must decide how you'll query your data before indexing it in Elasticsearch; otherwise, you'll end up remapping and reindexing your data.

The Data and Conversion to Usable GeoJSON

The data that we'll be using for this walkthrough is taken from the Washington State Department of Transportation (WSDOT) GeoData Catalogue. Download the shapefiles for "City Points" and "WSDOT Regions 24k". City Points will give us the cities in Washington, while WSDOT Regions will provide us with regions designated by WSDOT. You can view the data before downloading by clicking View next to their download link.

We'll also be using data from Washington State's Office of Financial Management, which has census geographic files that provide us with geographic coordinates for county boundaries and cities. Download the file for "Counties".

The unzipped download contains all of the files that make up a shapefile. Since Elasticsearch does not use shapefiles, we'll have to convert them to GeoJSON. A simple way to convert shapefiles to GeoJSON is to use GDAL's ogr2ogr, a command line program that converts geographic data files from one format to another. We recommend installing GDAL using Homebrew with brew install gdal. Once GDAL has been installed, you can use ogr2ogr on the command line.

To convert the "City Points" shapefile to GeoJSON, enter the following command into a terminal:

ogr2ogr -f "GeoJSON" /destination/file.json -t_srs "EPSG:4326" /path/to/shapefile/shapefile.shp  

Here, we write ogr2ogr to start the program, use the -f switch to indicate the output file format "GeoJSON" followed by the new file destination, use the -t_srs switch to set the target spatial reference system of our GeoJSON file (EPSG:4326, which is WGS 84 longitude and latitude), and then provide the path of the shapefile. There are other options described in ogr2ogr's documentation, but the ones used above will suit our needs. Once you run the command, you'll find the GeoJSON file at the destination you defined.

The output of the GeoJSON file will look similar to the following:

{
"type": "FeatureCollection",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },

"features": [
{ "type": "Feature", "properties": { "OBJECTID": 1, "NAME": "Sumas", "CountySeat": null, "GNIS": 2412000, "LastUpdate": "2009\/08\/31", "MajorCity": null, "CountyFIPS": 73, "CityFIPS": "5368330WA" }, "geometry": { "type": "Point", "coordinates": [ -122.264923557847354, 49.00004692672551 ] } },
...
{ "type": "Feature", "properties": { "OBJECTID": 281, "NAME": "Tacoma", "CountySeat": "yes", "GNIS": 2412025, "LastUpdate": "2006\/08\/31", "MajorCity": "yes", "CountyFIPS": 53, "CityFIPS": "5370000WA" }, "geometry": { "type": "Point", "coordinates": [ -122.440097136359299, 47.253172271414293 ] } }
]
}

One of the first issues you may notice with GeoJSON is that the "coordinates" key is defined as lon/lat rather than lat/lon as we specified for geo_point above. The specifications for GeoJSON require coordinates to be in lon/lat, and since the coordinates are in an array, geo_point requires the same coordinate format lon/lat. Therefore, we won't have to reformat coordinates if we are indexing using geo_point or geo_shape.

On the other hand, the GeoJSON file will have to be modified a little to use in Elasticsearch. An easy way to do this is to delete the following, since we don't need it:

{
"type": "FeatureCollection",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },

"features": 

} // end of the file

What we'll be left with is an array of documents:

[
{ "type": "Feature", "properties": { "OBJECTID": 1, "NAME": "Sumas", "CountySeat": null, "GNIS": 2412000, "LastUpdate": "2009\/08\/31", "MajorCity": null, "CountyFIPS": 73, "CityFIPS": "5368330WA" }, "geometry": { "type": "Point", "coordinates": [ -122.264923557847354, 49.00004692672551 ] } },
...
{ "type": "Feature", "properties": { "OBJECTID": 281, "NAME": "Tacoma", "CountySeat": "yes", "GNIS": 2412025, "LastUpdate": "2006\/08\/31", "MajorCity": "yes", "CountyFIPS": 53, "CityFIPS": "5370000WA" }, "geometry": { "type": "Point", "coordinates": [ -122.440097136359299, 47.253172271414293 ] } }
]

You will want to format the file further depending on whether you want all of the GeoJSON contents within your Elasticsearch index. For this article, we will modify the file slightly by only using the "NAME", "OBJECTID", and "geometry" keys. A code sample can be found in the repository here.
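As a rough illustration of that trimming step, here is a minimal Node.js sketch. It is not the code from the repository: the helper names toCityDoc and toCityDocs and the output keys id and name are our own assumptions. It accepts either the full FeatureCollection or the bare array of features.

```javascript
// Sketch: keep only OBJECTID, NAME, and geometry from each GeoJSON feature.
// toCityDoc/toCityDocs and the output key names are illustrative assumptions.
function toCityDoc(feature) {
  return {
    id: String(feature.properties.OBJECTID),
    name: feature.properties.NAME,
    geometry: feature.geometry
  };
}

// Accepts the full FeatureCollection or the bare array of features.
function toCityDocs(geojson) {
  const features = Array.isArray(geojson) ? geojson : geojson.features;
  return features.map(toCityDoc);
}

// A one-feature sample based on the ogr2ogr output shown above.
const sample = {
  type: 'FeatureCollection',
  features: [
    {
      type: 'Feature',
      properties: { OBJECTID: 1, NAME: 'Sumas', CountySeat: null },
      geometry: { type: 'Point', coordinates: [-122.264923557847354, 49.00004692672551] }
    }
  ]
};

const cityDocs = toCityDocs(sample);
console.log(JSON.stringify(cityDocs, null, 2));
```

For a geo_point index you would likely store only geometry.coordinates, while a geo_shape index needs the whole geometry object.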

Mappings

Mapping both geo_point and geo_shape is fairly straightforward, but there are a couple of differences that you should be aware of. When defining a mapping in Elasticsearch using geo_point, you do not have to include the type of shape that your longitude and latitude coordinates represent. What is necessary is that your coordinates are in the correct order: lat/lon for the object and string formats, and lon/lat when the coordinates are in an array. If the order is wrong, Elasticsearch may reject the document or index the point in the wrong place.

The following formats are acceptable for the geo_point type:

{
    "kind": "Object",
    "location": {
        "lat": 48.3,
        "lon": -117.3
    }
}

{
    "kind": "String",
    "location": "48.3, -117.3"
}

{
    "kind": "Array",
    "location": [-117.3, 48.3]
}
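To make the ordering difference concrete, here is a small sketch that normalizes the three formats above into a single { lat, lon } object. The function name toLatLon is hypothetical, and the sketch assumes string values are "lat, lon" pairs.

```javascript
// Sketch: normalize the three geo_point formats shown above to { lat, lon }.
// toLatLon is a hypothetical helper, not part of any Elasticsearch client.
function toLatLon(location) {
  if (Array.isArray(location)) {
    // Arrays follow GeoJSON order: [lon, lat].
    const [lon, lat] = location;
    return { lat, lon };
  }
  if (typeof location === 'string') {
    // Strings are assumed to be "lat, lon".
    const [lat, lon] = location.split(',').map((part) => Number(part.trim()));
    return { lat, lon };
  }
  // Objects already name their keys, so order does not matter.
  return { lat: location.lat, lon: location.lon };
}

console.log(toLatLon({ lat: 48.3, lon: -117.3 }));
console.log(toLatLon('48.3, -117.3'));
console.log(toLatLon([-117.3, 48.3]));
```

All three calls describe the same point, even though the string and the array list the numbers in opposite order.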

The geo_shape mapping is different in that it only accepts GeoJSON formatted data and that you must include "type" and "coordinates" within the GeoJSON "geometry" object to tell Elasticsearch which type of shape it's indexing.

An example of this type of data is the following:

{
    "kind": "GeoJSON",
    "geometry": {
        "type": "point",
        "coordinates": [-117.3, 48.3]
    }
}

{
    "kind": "GeoJSON",
    "geometry": {
        "type": "polygon",
        "coordinates": [[-117.323, 48.312], ... [-117.315, 48.319]]
    }
}

You must define the mapping before indexing geo-points or geo-shapes, since their fields are not dynamically mapped. Mappings for both datatypes look like the following:

{
  "mappings": {
    "cities": {
      "properties": {
        "name": {
          "type": "string"
        },
        "geometry": {
          "type": "geo_point" // or "geo_shape"
        }
      }
    }
  }
}

By using geo_point or geo_shape, Elasticsearch will automatically find the coordinates, validate them according to the needed format, and index them.

We'll be using a similar mapping for the downloaded data. To create the mapping, we're using NodeJS and the elasticsearch client. If you're not familiar with NodeJS and Elasticsearch see a good primer on setting up, building and deploying an application in our five-part series "Getting started with Elasticsearch and Node.js".

First, you'll need to create an index called "wa_cities_points", which is the index name used in the code and queries that follow. Compose's Elasticsearch browser allows you to do this easily. From your deployment, click on the Browser button on the sidebar, which will take you to the browser page. Then click the Create Index button on the browser, insert the name of the index, and click Run.

After setting up the index, we can write the code for setting up the mapping and run it with node mapping.js from the terminal.

const elasticsearch = require('elasticsearch'),
    client = new elasticsearch.Client({
        hosts: [
            'https://[username]:[password]@[server]:[port]/',
            'https://[username]:[password]@[server]:[port]/'
        ]
    });

client.indices.putMapping({  
  index: 'wa_cities_points',
  type: 'cities',
  body: {
    properties: {
      "location": {
        "type": "geo_point",
      },
      "name": {
        "type": "string"
      }
    }
  }
}, (err, resp, status) => {
    if (err) throw err;
    console.log(resp);
});

Once the code has been run, you'll see {"acknowledged": true} returned indicating that the mapping was successfully created. You'll want to create two more mappings using {"type": "geo_shape"} for the "County" and again for the "City Point" data for when we look at Geo Queries.
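One way to keep the three mappings consistent is to generate the putMapping parameters from a single helper. The sketch below is only an illustration: buildPutMapping is a hypothetical function, while the index and type names (wa_cities_points, wa_cities_shapes, wa_counties, cities, county) are the ones used in the queries later in this article.

```javascript
// Sketch: build putMapping parameters for the three indices used in this article.
// buildPutMapping is a hypothetical helper; geoType is "geo_point" or "geo_shape".
function buildPutMapping(index, type, geoType) {
  return {
    index,
    type,
    body: {
      properties: {
        location: { type: geoType },
        name: { type: 'string' }
      }
    }
  };
}

const mappings = [
  buildPutMapping('wa_cities_points', 'cities', 'geo_point'),
  buildPutMapping('wa_cities_shapes', 'cities', 'geo_shape'),
  buildPutMapping('wa_counties', 'county', 'geo_shape')
];

// With a configured client, each entry could be passed to client.indices.putMapping().
mappings.forEach((m) => console.log(m.index, '->', m.body.properties.location.type));
```

Each index has to exist before its mapping is applied, just like wa_cities_points above.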

Once your mappings have been created, you can insert the data using the _bulk API. All of the code to create the mappings, modify the data, and insert it into an index has been provided in the example repository here. You can modify it accordingly.
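If you'd rather see the shape of a bulk request than read the repository code, here is a minimal sketch. It assumes documents that look like { id, name, location } (our own convention, not the repository's) and builds the alternating action and source entries that the _bulk API expects.

```javascript
// Sketch: build the body of a _bulk request as alternating action/source entries.
// buildBulkBody and the { id, name, location } document shape are assumptions.
function buildBulkBody(index, type, docs) {
  const body = [];
  docs.forEach((doc) => {
    // Action line: index this document under the given _id.
    body.push({ index: { _index: index, _type: type, _id: doc.id } });
    // Source line: the fields that match our mapping.
    body.push({ name: doc.name, location: doc.location });
  });
  return body;
}

const sampleDocs = [
  { id: '84', name: 'Bainbridge Island', location: [-122.52083338339754, 47.62471310583139] },
  { id: '108', name: 'Mercer Island', location: [-122.23504918395818, 47.58665906645245] }
];

const bulkBody = buildBulkBody('wa_cities_points', 'cities', sampleDocs);
console.log(JSON.stringify(bulkBody, null, 2));
```

With a configured client, the resulting array could be sent with client.bulk({ body: bulkBody }).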

Geo Queries

Elasticsearch uses the terms queries and filters. Querying relies on "scoring", or if and how well a document matches the query. Filtering, on the other hand, is "non-scoring" and determines if the document matches a query. According to Elasticsearch, as of 2.x querying and filtering have become synonymous in that you can have queries that are both scoring and non-scoring. There are various performance benefits and drawbacks to using scoring or non-scoring queries, but the rule-of-thumb is to use scoring queries when a relevance score is important, and non-scoring queries for everything else.

Since we have some data in our indices, it's time to start querying. We'll focus on some of the basic queries that you can use with geo_point and geo_shape, and we'll look at aggregations and Geohashes in an upcoming supplement.

The queries available for Elasticsearch 2.x are geo_shape, geo_bounding_box, geo_distance, geo_distance_range, geo_polygon, and geohash_cell. As of Elasticsearch 5.x, geohash_cell has been deprecated.

Distance with Geo Point

To find documents within a given distance of a point, our data must be stored using the geo_point type. The documentation provides various data formats as examples. But, since our data is stored as an array that conforms to GeoJSON, our query would look something like the following:

{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_distance" : {
                    "distance" : "10mi",
                    "location" : 
                    [-122.3375,47.6112] // Seattle
                }
            }
        }
    }
}

This query asks Elasticsearch to look for all matching points within a radius of 10 miles of the "location" you provide. Here, our location is Seattle, so we're searching for all cities within a 10-mile radius. If you want to use distance units other than miles, see the documentation for acceptable units of measurement.
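To get a feel for what the geo_distance filter is checking, the sketch below computes great-circle distances with the haversine formula. This is our own approximation, not Elasticsearch's implementation; Elasticsearch may use a different formula and Earth model, so results close to the 10-mile boundary can differ.

```javascript
// Sketch: great-circle distance in miles between two [lon, lat] points (haversine).
// This approximates, but is not, Elasticsearch's own distance calculation.
const EARTH_RADIUS_MILES = 3958.8; // mean Earth radius, an approximation

function toRadians(degrees) {
  return (degrees * Math.PI) / 180;
}

function haversineMiles([lon1, lat1], [lon2, lat2]) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a));
}

const seattle = [-122.3375, 47.6112];
const mercerIsland = [-122.23504918395818, 47.58665906645245];
const tacoma = [-122.440097136359299, 47.253172271414293];

console.log('Mercer Island within 10mi:', haversineMiles(seattle, mercerIsland) <= 10);
console.log('Tacoma within 10mi:', haversineMiles(seattle, tacoma) <= 10);
```

Mercer Island falls inside the 10-mile radius, while Tacoma is well outside it.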

The output of this query will look similar to the following, giving us twelve documents:

{
  "took" : 12,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 12,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "wa_cities_points",
      "_type" : "cities",
      "_id" : "84",
      "_score" : 1.0,
      "_source" : {
        "name" : "Bainbridge Island",
        "location" : [ -122.52083338339754, 47.62471310583139 ]
      }
    }, {
      "_index" : "wa_cities_points",
      "_type" : "cities",
      "_id" : "108",
      "_score" : 1.0,
      "_source" : {
        "name" : "Mercer Island",
        "location" : [ -122.23504918395818, 47.58665906645245 ]
      }
    },
...

Distance Range with Geo Point

Instead of locating cities within a radius from a point of origin, you can also set the start and end distances, forming a donut shape instead of a circle. The query is similar to geo_distance except that you define the distances with "from" and "to".

{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_distance_range" : {
                    "from" : "10mi",
                    "to": "12mi",
                    "location" : 
                    [-122.3375, 47.6112] // Seattle
                }
            }
        }
    }
}

With this query, we're asking Elasticsearch to start at 10 miles from "location" and include only the cities between 10 and 12 miles away.

{
  "took" : 38,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 8,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "wa_cities_points",
      "_type" : "cities",
      "_id" : "123",
      "_score" : 1.0,
      "_source" : {
        "name" : "Normandy Park",
        "location" : [ -122.33965429615549, 47.43995693292799 ]
      }
    }, {
      "_index" : "wa_cities_points",
      "_type" : "cities",
      "_id" : "70",
      "_score" : 1.0,
      "_source" : {
        "name" : "Bothell",
        "location" : [ -122.20559409874588, 47.76009486395827 ]
      }
...

Geo Polygon with Geo Point

Queries don't have to specify radii; you can also define polygons as search areas. Here's an example:

{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_polygon" : {
                    "location" : {
                        "points": [
                            [-122.35610961914062, 47.70514099299205],
                            [-122.48519897460936, 47.5626274374099],
                            [-122.28744506835938, 47.44852243794931],
                            [-122.15972900390624, 47.558920607496525],
                            [-122.2283935546875, 47.719001413201916],
                            [-122.35610961914062, 47.70514099299205]
                        ]
                    }
                }
            }
        }
    }
}

On a map the polygon looks like this:

The result of the query will give us eight locations within the polygon:

{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 8,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "wa_cities_points",
      "_type" : "cities",
      "_id" : "108",
      "_score" : 1.0,
      "_source" : {
        "name" : "Mercer Island",
        "location" : [ -122.23504918395818, 47.58665906645245 ]
      }
    }, {
      "_index" : "wa_cities_points",
      "_type" : "cities",
      "_id" : "110",
      "_score" : 1.0,
      "_source" : {
        "name" : "Beaux Arts",
        "location" : [ -122.19852431727838, 47.58511084550324 ]
      }
    }
...

Elasticsearch mentions that while this query is available, it's expensive, and geo-shapes should be used instead.
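If you want to sanity-check a result from the geo_polygon query above, a planar ray-casting test is a reasonable approximation for a polygon this small. The sketch below is our own illustration and says nothing about how Elasticsearch evaluates the filter internally; in particular, points lying exactly on an edge may be treated differently.

```javascript
// Sketch: planar ray-casting point-in-polygon test on [lon, lat] pairs.
// An illustration only; Elasticsearch's own evaluation may differ on edge cases.
function pointInPolygon([x, y], ring) {
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    // Does a ray cast toward increasing longitude cross this edge?
    const crosses =
      (yi > y) !== (yj > y) &&
      x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// The same polygon used in the geo_polygon query above.
const searchArea = [
  [-122.35610961914062, 47.70514099299205],
  [-122.48519897460936, 47.5626274374099],
  [-122.28744506835938, 47.44852243794931],
  [-122.15972900390624, 47.558920607496525],
  [-122.2283935546875, 47.719001413201916],
  [-122.35610961914062, 47.70514099299205]
];

console.log('Mercer Island:', pointInPolygon([-122.23504918395818, 47.58665906645245], searchArea));
console.log('Bainbridge Island:', pointInPolygon([-122.52083338339754, 47.62471310583139], searchArea));
```

Mercer Island, which appears in the results above, is inside the polygon, while Bainbridge Island lies to the west of it.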

Geo Bounding Box with Geo Point

In this query, you will define a box (top, bottom, left, right) and search for points within the box.

{
  "query": {
    "filtered": {
      "filter": {
        "geo_bounding_box": {
          "location": { 
            "top_left": {
              "lat":  47.7328,
              "lon": -122.448
            },
            "bottom_right": {
              "lat":  47.4680,
              "lon": -122.0924
            }
          }
        }
      }
    }
  }
}

The results of the query provide us with twelve results since it's a large box.

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 12,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "wa_cities_points",
      "_type" : "cities",
      "_id" : "108",
      "_score" : 1.0,
      "_source" : {
        "name" : "Mercer Island",
        "location" : [ -122.23504918395818, 47.58665906645245 ]
      }
    }, {
      "_index" : "wa_cities_points",
      "_type" : "cities",
      "_id" : "110",
      "_score" : 1.0,
      "_source" : {
        "name" : "Beaux Arts",
        "location" : [ -122.19852431727838, 47.58511084550324 ]
      }
    },
...
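The bounding box test itself is just a pair of range comparisons, which is part of why it is a cheap filter. Here is a minimal sketch of the idea; inBoundingBox is a hypothetical helper, and it assumes the box does not cross the antimeridian.

```javascript
// Sketch: check whether a [lon, lat] point lies inside a bounding box.
// inBoundingBox is a hypothetical helper; it assumes the box does not
// cross the antimeridian (180 degrees longitude).
function inBoundingBox([lon, lat], topLeft, bottomRight) {
  return (
    lat <= topLeft.lat &&
    lat >= bottomRight.lat &&
    lon >= topLeft.lon &&
    lon <= bottomRight.lon
  );
}

// The same corners used in the geo_bounding_box query above.
const topLeft = { lat: 47.7328, lon: -122.448 };
const bottomRight = { lat: 47.4680, lon: -122.0924 };

console.log('Mercer Island:', inBoundingBox([-122.23504918395818, 47.58665906645245], topLeft, bottomRight));
console.log('Bainbridge Island:', inBoundingBox([-122.52083338339754, 47.62471310583139], topLeft, bottomRight));
```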

Geo-Shape Queries

All geo-shape queries require your data to be mapped using the geo_shape mapping. This is why we asked you to create two indices for cities using geo_point and geo_shape. Using geo-shapes we can find documents that intersect with the query shape.

What's nice about geo-shape queries is that you do not have to define all of the coordinates of the shape. What does this mean? Elasticsearch will allow you to reference a pre-indexed shape in another index, or provide the entire coordinates of a shape within the query.

Querying using a user defined shape is similar to querying using geo_point. If we want to get all the points within a specified radius we can use the following query:

{
  "query": {
    "geo_shape": {
      "location": {
        "shape": {
          "type": "circle",
          "radius": "10mi", 
          "coordinates": [
           -122.33, 47.61
          ]
        }
      }
    }
  }
}

Here we only provide the starting point of the coordinates, which corresponds to Seattle, WA. Using the geo_shape query, we tell the query to look at the location field and provide the type of shape (circle), how wide the radius of the circle is (10 miles), and provide the point of origin (coordinates). This produces the same results as our geo_point query above.

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 12,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "wa_cities_shapes",
      "_type" : "cities",
      "_id" : "84",
      "_score" : 1.0,
      "_source" : {
        "name" : "Bainbridge Island",
        "location" : {
          "type" : "Point",
          "coordinates" : [ -122.52083338339754, 47.62471310583139 ]
        }
      }
    }
...

One of the useful features of geo_shape is being able to use pre-indexed shapes. When using pre-indexed shapes, we don't have to insert a shape's coordinates in the query, but we only have to refer to a shape's index, type, and id. The query looks like the following:

{
  "query": {
    "geo_shape": {
      "location": {
        "indexed_shape": {
          "index": "wa_counties",
          "type":  "county",
          "id":    "King",
          "path":  "location"
        }
      }
    }
  }
}

Here, we tell the query to look in the location field, but this time we use indexed_shape instead of shape so that Elasticsearch retrieves the shape from a specified index and id. In this example, we use wa_counties, the name of our index containing the County data; our mapping type is county; and we want all the points within "id": "King", which we specified when inserting the data into the index. We also give the path of the county document's coordinates, which is location, like our cities data.

Running this query gives us 38 cities within King County.

{
  "took" : 100,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 38,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "wa_cities_shapes",
      "_type" : "cities",
      "_id" : "98",
      "_score" : 1.0,
      "_source" : {
        "name" : "Sammamish",
        "location" : {
          "type" : "Point",
          "coordinates" : [ -122.03554939749195, 47.61655151942234 ]
        }
      }
    }
...

We can add more to this query by defining another field called relation, which allows us to add spatial relation operators: intersects, disjoint, within, or contains. A handy guide to these is located here. The default value is intersects, which in our case will give us all the cities within and on the border of our county. If we use a relation like disjoint, all the cities outside of King County will be returned.

{
  "query": {
    "geo_shape": {
      "location": { 
        "relation": "disjoint",
        "indexed_shape": {
          "index": "wa_counties",
          "type":  "county",
          "id":    "King",
          "path":  "location"
        }
      }
    }
  }
}

This returns 243 cities. In total, we indexed 281 cities, so 281 - 243 = 38, which matches the number of cities in King County.

{
  "took" : 56,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 243,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "wa_cities_shapes",
      "_type" : "cities",
      "_id" : "14",
      "_score" : 1.0,
      "_source" : {
        "name" : "Marcus",
        "location" : {
          "type" : "Point",
          "coordinates" : [ -118.06261699119396, 48.6635062250126 ]
        }
      }
    }
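The two pre-indexed shape queries above differ only in the optional relation field, so they are easy to generate from one function. The sketch below is an assumption about how you might structure this in Node.js: indexedShapeQuery is a hypothetical helper, and leaving out the relation falls back to Elasticsearch's default of intersects.

```javascript
// Sketch: build a geo_shape query that references a pre-indexed shape.
// indexedShapeQuery is a hypothetical helper; omit `relation` to use
// Elasticsearch's default ("intersects").
function indexedShapeQuery(field, shapeRef, relation) {
  const clause = { indexed_shape: shapeRef };
  if (relation) {
    clause.relation = relation;
  }
  return { query: { geo_shape: { [field]: clause } } };
}

// The King County reference used in the queries above.
const kingCounty = { index: 'wa_counties', type: 'county', id: 'King', path: 'location' };

const insideKing = indexedShapeQuery('location', kingCounty);
const outsideKing = indexedShapeQuery('location', kingCounty, 'disjoint');

console.log(JSON.stringify(outsideKing, null, 2));
```

With a configured client, either body could be sent with client.search({ index: 'wa_cities_shapes', body: outsideKing }).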

Summing up

So, we've given you an overview of the Geo querying capabilities of Elasticsearch and looked at the basics of how to set up your mappings and do some basic querying. In the next installment, we'll look at Elasticsearch aggregations and GeoData further, and show you how to build an application that will allow you to see your GeoData on a map.



Abdullah Alger
Abdullah Alger is a former University lecturer who likes to dig into code, show people how to use and abuse technology, talk about GIS, and fish when the conditions are right. Coffee is in his DNA.
