GeoFile: Using OpenStreetMap Data in Compose PostgreSQL - Part II

Published

GeoFile is a series dedicated to looking at geographical data, its features, and uses. In today's article, we're continuing our examination of OpenStreetMap data and walking through how to incorporate other data sources. We'll also look at using PostGIS to filter our data and to find places that are within or intersect a chosen polygon.

In the last GeoFile article, we looked at how to import OpenStreetMap (OSM) data into Compose PostgreSQL and ran some queries to get the most popular cuisines in Seattle. We found that coffee shops were the most popular places in the city, and we provided a top ten list of which coffee companies have the most branches in Seattle. In this article, we'll be using the same OSM data in conjunction with the Seattle Police Department's 911 call data. We'll show you how to create tables and store this data in PostgreSQL using Sequelize, a Node.js ORM for relational databases. Then we'll look at locations, areas, and reasons for 911 calls using PostGIS and then viewing them all using OpenJUMP, an open-source GIS tool.

Let's look at Sequelize and import some data into our PostgreSQL deployment ...

Sequelize Me

Sequelize is a Node.js ORM that works with a number of relational databases out of the box. For our use case, Sequelize makes it easy to perform CRUD operations and create models for our data. In particular, for the data model that we'll be creating, it comes with a geometry data type that works well with GeoJSON and PostgreSQL.

What we'll be doing with Sequelize is creating a table called emergency_calls and inserting GeoJSON documents from the SPD 911 call API. The information that we'll be gathering from the API is the incident id, longitude, latitude, event_clearance_group, and event_description. The event_clearance_group and event_description provide us with details about each 911 call incident.

In addition to Sequelize, we'll be using the request Node.js library. This library will allow us to gather the GeoJSON documents from the SPD 911 call API and will help us insert the documents into PostgreSQL one at a time.

To install Sequelize and request, we'll write the following in our terminal using NPM.

npm install sequelize request --save  

After installing the packages, let's create a file called 911data.js. Within the file, we'll first require both the request and sequelize libraries we installed with NPM.

const request = require('request');  
const Sequelize = require('sequelize');  

We then set up a variable url with the URL of the API and append to the URL $$app_token and include a custom token from data.seattle.gov. You'll have to apply for a token in order to not have download limits on your data. Next, we'll use Sequelize's $offset and $limit functions to limit the number of records we'll import to our database since there are more than 1.3 million records in the SPD 911 calls dataset. In order to get the latest 911 calls, we'll offset our data by 1.3 million rows and limit our data to only the last 100,000 rows.

const url = "https://data.seattle.gov/resource/pu5n-trf4.geojson?$$app_token=your_token&$offset=1300000&$limit=100000";  

After that, we'll initialize a database connection using our Compose PostgreSQL connection string located on the Overview page under Credentials. At the end of the connection string, we'll change the database name from compose to osm since we're inserting the 911 call records into a table located within the OSM database.

const sequelize = new Sequelize("postgres://admin:mypass@aws-us-west-4-portal.0.dblayer.com:25223/osm");  

Once that's done, we can set up a Sequelize model for our data. For this example, we'll keep it simple and only get the id and the event_group and event_description, which categorize and describe each 911 call. We'll also set up a column called geom that will automatically process our GeoJSON longitude and latitude coordinates into a PostGIS geometry object. Sequelize does this by using the PostGIS function ST_GeomFromGeoJSON behind the scenes. Since we're using GeoJSON data, the PostGIS geometry object coordinate reference system will be set to SRID 4236, but the geom column set up by Sequelize will have an SRID set to 0. We'll have to change this once we've inserted our data since SRID 4236 will not work with OSM data since it uses a different SRID - this is discussed further below.

To set up a model for our data, we'll first define the model using Sequelize's define method. The first argument that the method takes is the table name we want to create. For our use case, the table will be named emergency_calls. The second argument of the function is an object that contains the data type, field name, and other constraints we want to put on our columns such as defining primary keys and allowing null values.

const EmergencyCalls = sequelize.define('emergency_calls', {  
    id: {
        type: Sequelize.STRING,
        field: 'id',
        primaryKey: true
    },
    eventGroup: {
        type: Sequelize.TEXT,
        field: 'event_clearance_group'
    },
    eventDescription: {
        type: Sequelize.TEXT,
        field: 'description'
    },
    geom: {
        type: Sequelize.GEOMETRY('POINT'),
        field: 'geom',
        allowNull: false
    }
});

The field is the name we assign to the PostgreSQL table column. The type is the data type we assign to the column using any appropriate Sequelize's data type. For columns what will store data as a string, we'll use Sequelize's STRING data type. For the geom column, Sequelize has a GEOMETRY data type that allows us to assign the type of geometry to a column. In our use case, it's "POINT" since the GeoJSON geometry type is also "POINT". Sequelize also allows us to assign the GEOMETRY data type a second parameter that is a SRID number. However, since our GeoJSON data does not contain information regarding the SRID, if we define the SRID of the column and our data doesn't match when inserting a record, we'll receive an error. Therefore, to solve this problem we will not set the SRID of the column initially and we'll go back and manually change the column later using PostGIS.

Now that we have the model set up, we can initialize EmergencyCalls which will create the table. Here, we'll append the sync method to create the PostgreSQL table in the database and use the force option to drop the table if it exists.

EmergencyCalls  
    .sync({ force: true })
    .then(() => {})
    .catch(err => {
        console.log(err.message);
    });

Once we have the model set up and the table created, we can start importing the data into the table. We'll do this using the request library which takes a URL and a callback that contains the GeoJSON data in the body. We'll assign a variable called data that parses the GeoJSON data using the JSON.parse method. Then we'll take the data and iterate over the GeoJSON features array:

request(url, (err, res, body) => {  
    let data = JSON.parse(body);
    let jsonFeatures = data.features;
    for (let i = 0; i < jsonFeatures.length; i++) {
        ...
    }
});

Within the for-loop, we'll use the Sequelize create method to insert each 911 call record into our database. We'll only select the necessary information from the GeoJSON "Properties" and "Geometry" objects and put the results into keys we created from our Sequelize model.

EmergencyCalls.create({  
    id: jsonFeatures[i].properties.cad_cdw_id,
    eventGroup: jsonFeatures[i].properties.event_clearance_group,
    eventDescription: jsonFeatures[i].properties.event_clearance_description,
    geom: jsonFeatures[i].geometry
});

The full request looks like:

request(url, (err, res, body) => {  
    let data = JSON.parse(body);
    let jsonFeatures = data.features;
    for (let i = 0; i < jsonFeatures.length; i++) {
        EmergencyCalls.create({
            id: jsonFeatures[i].properties.cad_cdw_id,
            eventGroup: jsonFeatures[i].properties.event_clearance_group,
            eventDescription: jsonFeatures[i].properties.event_clearance_description,
            geom: jsonFeatures[i].geometry
        });
    }
});

Once we have the code set up, just run node 911data.js and we'll see the table set up and all of our data being logged in the terminal window. After the data has been inserted, let's see what it looks like in PostgreSQL by logging into our OSM database.

Our 911 Data and OSM with PostGIS

Once we've logged into our PostgreSQL deployment and connected to our OSM database, we can view what Sequelize has inserted into our emergency_calls table. Using a SELECT query our table will contain documents that look something like this:

SELECT * FROM emergency_calls LIMIT 1;  

Running the query gives us:

  id   |  event_clearance_group   |    description     |                    geom                    |         createdAt          |         updatedAt          
-------+--------------------------+--------------------+--------------------------------------------+----------------------------+----------------------------
 89778 | SUSPICIOUS CIRCUMSTANCES | SUSPICIOUS VEHICLE | 0101000000C58EC6A17E935EC040852348A5CC4740 | 2017-03-29 20:48:10.812+00 | 2017-03-29 20:48:10.812+00

Notice that the fields that we defined and the data have been created and inserted along with two other timestamp fields created by Sequelize. These timestamps show when a record has been inserted and updated in the table. If you don't want timestamps to be added, just add timestamps: false inside the Sequelize model.

To view the data type of each column run \d emergency_calls.

                Table "public.emergency_calls"
        Column         |           Type           | Modifiers 
-----------------------+--------------------------+-----------
 id                    | character varying(255)   | not null
 event_clearance_group | text                     | 
 description           | text                     | 
 geom                  | geometry(Point)          | not null
 createdAt             | timestamp with time zone | not null
 updatedAt             | timestamp with time zone | not null
Indexes:  
    "emergency_calls_pkey" PRIMARY KEY, btree (id)

Here we can see that the geom column has been assigned a geometry data type without an SRID even though there are geometry objects inserted in the column. Since our geom column contains GeoJSON data in the form of a geometry object, the coordinates of the data are automatically calculated using SRID 4326 even though the column doesn't have an SRID defined. If we decided to project the geom data onto our OSM map, however, the geom points would not align with the map because OSM uses SRID 3857. So how do we solve this issue?

To solve the issue we'll have to transform our geom column to SRID 3857. To do this, we'll update the column by first setting the SRID of each value to SRID 4326 by using PostGIS's ST_SetSRID function. We'll then transform each value to SRID 3857 using PostGIS's ST_Transform function. This can be done by writing the following SQL statement:

UPDATE emergency_calls SET geom = ST_Transform(ST_SetSRID(geom,4326),3857);  

After that's completed, we'll create an index on the geom column:

CREATE INDEX idx_emergency_calls ON emergency_calls USING GIST (geom);  

Using OpenJUMP, we now can view each of the points on the Seattle OSM map.

Now that we have some 911 call points, an OSM map, and other OSM data, let's do some querying ...

PostGIS Queries

In the last article, we looked at some restaurant data using OSM's hstore data column to find the top ten cuisines in Seattle. We then found out that coffee was the most popular "cuisine" and found the top ten coffee shops in the city. Let's take a closer look at the map this time by focusing on one particular area of Seattle called Capitol Hill.

We'll start out by selecting the area and getting its coordinates.

A useful tool to draw and get the coordinates of a polygon on a map is geojson.io. All we have to do is zoom on Seattle and draw a polygon around the area we want. It will automatically provide us with the coordinates of the polygon in GeoJSON in the right sidebar.

Once we have the coordinates, we can use PostGIS's function ST_GeomFromText to define the shape type, its coordinates, and the SRID. In this case, since the coordinates derive from a GeoJSON object, the default SRID is 4326. This is what the function with our coordinates will look like:

ST_GeomFromText('POLYGON((-122.32576847076416 47.61471249278791, -122.32044696807861 47.61471249278791, -122.32044696807861 47.6179525278143, -122.32576847076416 47.6179525278143, -122.32576847076416 47.61471249278791))', 4326)  

The value returned from ST_GeomFromText will have to be transformed so that it can be viewed correctly on the OSM map like the geom column coordinates in our emergency_calls table. To do that, we'll again use the ST_Transform function and assign it SRID 3857:

ST_Transform(ST_GeomFromText(..., 4326), 3857)  

Once we have this setup, we can use OpenJUMP and select Run Datastore Query from the File menu. Once we select Run Datastore Query, it will open a window to create a new map layer. We first select or create a new connection to our OSM database. Then we type in the name of our layer and then write the SQL query:

SELECT ST_Transform(ST_GeomFromText('POLYGON((-122.32576847076416 47.61471249278791, -122.32044696807861 47.61471249278791, -122.32044696807861 47.6179525278143, -122.32576847076416 47.6179525278143, -122.32576847076416 47.61471249278791))', 4326), 3857)  
FROM planet_osm_polygon;  

Running this query we will see the polygon appear on the map.

To get all the restaurants and their names that are only within the polygon, we'll use the PostGIS function ST_Contains, which selects only objects that are contained within a defined geometry. A simple way to understand how this function works is to view it as:

ST_Contains(shape_to_search_in, objects_to_find_in_the_shape)  

Therefore, when we write this query to find all the OSM points inside the polygon, we'd write:

SELECT amenity, name, way  
FROM planet_osm_point  
WHERE  
   ST_Contains(ST_Transform(ST_GeomFromText('POLYGON((-122.32576847076416 47.61471249278791, -122.32044696807861 47.61471249278791, -122.32044696807861 47.6179525278143, -122.32576847076416 47.6179525278143, -122.32576847076416 47.61471249278791))', 4326), 3857), 
way) AND amenity = 'restaurant'  
GROUP BY amenity, name, way;  

Within ST_Contains, the first geometry we add is the polygon that we want to search within. In this case, it's the same polygon that we created and transformed to SRID 3857. The second geometry way is the geometry column of planet_osm_point which are the OSM points we want to return if they are contained inside the polygon. Additionally, we select only the restaurant amenities so that we only get the restaurants and filter out the other points. Running this query provides us with five results:

  amenity   |         name          |                        way                         
------------+-----------------------+----------------------------------------------------
 restaurant | 611 Supreme           | 0101000020110F000014AE4719F1F869C185EB51B86C0D5741
 restaurant | Bill's Off Broadway   | 0101000020110F00008FC2F5E0DBF869C1C3F528EC6B0D5741
 restaurant | Fogan Cocina Mexicana | 0101000020110F00003333339BF6F869C114AE47D1760D5741
 restaurant | Raygun Lounge         | 0101000020110F000048E17A7402F969C11F85EB716B0D5741
 restaurant | Yo! Zushi             | 0101000020110F00006666660EC3F869C1295C8FD2870D5741

Using our 911 call data, we could create a more complex example using ST_Contains to show the number of 911 calls that took place near these restaurants.

To so that, what we'd want to show is the name of each restaurant, the reason for the 911 call, and the distance between the 911 call event and the restaurant. Another constraint that we might add is that the 911 call has to be within a radius of 30 meters from the restaurant. This will filter out calls that are further away. This query would look similar to the following:

SELECT p.name AS restaurant, e.event_clearance_group AS activity, ST_Distance(p.way, e.geom) AS distance  
FROM planet_osm_point AS p, emergency_calls AS e  
WHERE  
    p.amenity = 'restaurant' AND
    e.event_clearance_group > '' AND 
    ST_Contains(ST_Transform(ST_GeomFromText('POLYGON((-122.32576847076416 47.61471249278791, -122.32044696807861 47.61471249278791, -122.32044696807861 47.6179525278143, -122.32576847076416 47.6179525278143, -122.32576847076416 47.61471249278791))', 4326), 3857),p.way) AND 
    ST_Contains(ST_Transform(ST_GeomFromText('POLYGON((-122.32576847076416 47.61471249278791, -122.32044696807861 47.61471249278791, -122.32044696807861 47.6179525278143, -122.32576847076416 47.6179525278143, -122.32576847076416 47.61471249278791))', 4326), 3857), e.geom) AND
    ST_DWithin(p.way, e.geom,30) 
ORDER BY p.name;  

Running the query gives us 71 results like:

      restaurant       |         activity         |     distance     
-----------------------+--------------------------+------------------
 611 Supreme           | ASSAULTS                 | 21.6197191748894
 ...
 Fogan Cocina Mexicana | MENTAL HEALTH            | 29.6527188363329
 ...
 Raygun Lounge         | SUSPICIOUS CIRCUMSTANCES | 28.8634577346799
 ...
 Yo! Zushi             | FALSE ALARMS             | 27.4377722361284

The first thing to notice is that we have two ST_Contains functions being used. Each one indicates that the restaurant and the 911 call should be contained within the polygon. What's also noticeable is the other PostGIS queries that we added: ST_Distance and ST_DWithin. ST_Distance provides the distance between one geometry and the other. In this case, it shows us the distance between the 911 call and a restaurant. The function ST_DWithin returns true if two geometries are within a specified distance of each other. So, above, we are indicating in the WHERE clause that each restaurant and 911 call have to be within 30 meters of each other.

Another interesting function provided by PostGIS is ST_Instersects, which is useful for when you want to see what shapes intersect with another. This function is helpful especially when we want to find roads that intersect with our polygon. Like the function ST_Contains, this function first takes the geometry of the polygon that we are searching within, and then the geometry column of the table that contains the shapes we want if they intersect with our polygon. A query selecting all the roads that interest with our polygon would look like the following:

SELECT name, way  
FROM planet_osm_roads  
WHERE  
  ST_Intersects(ST_Transform(ST_GeomFromText('POLYGON((-122.32576847076416 47.61471249278791, -122.32044696807861 47.61471249278791, -122.32044696807861 47.6179525278143, -122.32576847076416 47.6179525278143, -122.32576847076416 47.61471249278791))', 4326), 3857), 
way)  
GROUP BY name, way;  

We first provide the coordinates of the polygon that we transformed. Then we provide the geometry column way from the planet_osm_roads table, which contains the geometries of all the roads on our map. When running the query, we'll find that there are five streets that intersect the polygon.

               name                
-----------------------------------
 Broadway
 University Link Northbound
 East Pine Street
 University Link Southbound
 Seattle Streetcar First Hill Line

So Much More ...

In this article, we looked at how to import and use another dataset with our OSM data. In addition, we looked at using PostGIS functions in order to modify our new dataset in order to work with OSM and to select only a portion of that data to query. While we haven't covered all of PostGIS's capabilities, this basic overview will help you start combining your own data with OSM and start using PostGIS and PostgreSQL for all your GIS needs.


If you have any feedback about this or any other Compose article, drop the Compose Articles team a line at articles@compose.com. We're happy to hear from you.

attributionStephen Monroe

Abdullah Alger
Abdullah Alger is a former University lecturer who likes to dig into code, show people how to use and abuse technology, talk about GIS, and fish when the conditions are right. Coffee is in his DNA. Love this article? Head over to Abdullah Alger’s author page and keep reading.