GeoFile: Using OpenStreetMap Data in Compose PostgreSQL - Part I

Published

GeoFile is a series dedicated to looking at geographical data, its features, and uses. In today's article, we're going to introduce you to OpenStreeMap data, import that data into a Compose PostgreSQL deployment, and make queries on some of the non-conventional data stored in hstore columns.

OpenStreetMap (OSM) is one of the most recognized and popular community-driven, open data sources for maps worldwide. If you're going to do anything that happens to use maps, more than likely you'll run into it and use it, or know someone who does. Most web and mobile applications use it because of its reliability since it's continuously updated by a vast number of contributors.

This article comprises part one of a two-part series on OSM data and Compose PostgreSQL. In this article, we'll be taking a look at how to import OSM data into a Compose PostgreSQL database. Also, we'll show you some of the OSM data features, and how to query city amenities like restaurants. As we query restaurants, we'll find the type of cuisine they serve, which is information that is only found in the hstore column.

Let's get some OSM ...

Importing OSM data

For our examples, we'll be using OSM data of Seattle, WA. On OSM's website, you can search for Seattle in the search bar at the top left of the screen. Once Seattle's on the screen, we'll export the map.

Since OSM provides us with a map of more than just the city of Seattle, we'll want to zoom in on Seattle. We'll zoom in because OSM will export what's shown on the screen, not just Seattle. Therefore, let's zoom in a little to get just the city in our browser window.

Once we're happy with what we see on the screen, we can export the map by pressing the Export button at the top of the window. It will give you several options to export data.

If we click on the blue Export button, we'll receive an error in our browser. This error happens when we are trying to download an area that contains too many nodes (> 50000). By running CURL in a terminal to reproduce the error, we'll see the following message appear:

curl -I "http://www.openstreetmap.org/api/0.6/map?bbox=-122.3819%2C47.5763%2C-122.2677%2C47.6267"  
...
Error: You requested too many nodes (limit is 50000). Either request a smaller area, or use planet.osm  
...

If we were to zoom in on a specific area of Seattle, we could successfully export a portion of the city. However, since we're interested in exporting a map of the entire city, go ahead and press the Overpass API link which gives us more nodes and will automatically start downloading a file called map that includes the entire city.

You will need to add .osm to the map file in order to import it to PostgreSQL. To import OSM data, we'll use the command line tool osm2pgsql.

osm2pgsql is a command line tool that transforms OSM data into a format that can be saved into PostgreSQL. If you're using MacOS, you can download it using Homebrew brew install osm2pgsql. For other operating systems, osm2pgsql provides installation instructions on their Github repository.

Once osm2pgsql has been installed, we need to create a database. You can create a database via the terminal after logging into your Compose PostgreSQL deployment, or from the Compose Browser. We'll use the terminal which requires only the command line string provided in your deployment's Connection info panel. Once we're logged in, create a database called osm by typing:

CREATE DATABASE osm;  

Ater that, connect to the database using \c osm;, and then install the PostGIS and HStore extensions using:

CREATE EXTENSION postgis;  
CREATE EXTENSION hstore;  

We'll need PostGIS installed to successfully import OSM data since it uses geometry data that PostGIS requires. If you don't install the PostGIS extension and try to import data, you'll receive an error. We'll also install the hstore extension which will be used to store and retrieve non-standardized data that doesn't fit into a column. This data is inserted as key value pairs in a column named tags.

Once the database and extensions have been set up, we can use osm2pgsql in our terminal like:

osm2pgsql -U admin -W -d osm -H aws-us-west-1-portal.3.dblayer.com -P 17124 --hstore --hstore-add-index map.osm  

Start out with the osm2pgsql command and add -U with your database username (usually admin) and -W indicating that you need a password prompt. Next, add -d and your database name (here osm). The -H option is the deployment's hostname and -P is the port number. We've added the --hstore to create tags columns for each table that contains the supplemental non-standardized data and the --hstore-add-index option sets up indexes on those columns. Finally, we add the map.osm file that we downloaded when we exported the Seattle map.

After running the command, enter the deployment password then we'll see our data processed in the terminal and imported to the osm PostgreSQL database creating tables and indexes for primary key and tags columns. After the data has been processed and imported, log into the PostgreSQL deployment and connect to the database.

Now, let's look at some of the tables that have been created. When listing the tables using the \d command, we should see a list of tables like:

              List of relations
 Schema |        Name        | Type  | Owner 
--------+--------------------+-------+-------
 public | geography_columns  | view  | admin
 public | geometry_columns   | view  | admin
 public | planet_osm_line    | table | admin
 public | planet_osm_point   | table | admin
 public | planet_osm_polygon | table | admin
 public | planet_osm_roads   | table | admin
 public | raster_columns     | view  | admin
 public | raster_overviews   | view  | admin
 public | spatial_ref_sys    | table | admin

The tables that we are mostly concerned about are those that begin with planet_osm_*. These are the tables that we'll use when setting up our queries. They contain the necessary geometry data that is used to view the map on a GIS client. Here's a breakdown of what each of these tables contains:

So now that our data has been imported and indexes have been created, let's see what it looks like and make some queries ...

Querying OSM data

Loading our data from PostgreSQL into a GIS mapping program will provide us with a visual representation of our OSM data, which will look similar to this:

As we can see, the map is made up of the geodata that was exported by OSM that includes streets, structures, and location points. The abrupt cut off from the sides of the map is due to us only exporting the portion of the OSM map that we needed which was visible in the browser window.

Now that we have a map, let's query some of the data to find all of the restaurants within our Seattle map. We'll find various types of facilities like restaurants, banks, and schools in the amenity column of our tables. For these queries, we'll be looking at the planet_osm_point table that will provide us with the exact location of restaurants and other facilities throughout the city. To get all the restaurants from the table, the SQL would look like the following:

SELECT name, count(name) FROM planet_osm_point WHERE amenity = 'restaurant' GROUP BY name ORDER BY count DESC;  

This will give us a list of about 644 restaurants and the number of branches they have in the city. If we refine this query a little more, we can select only those restaurants that have three or more branches.

SELECT name, count(name) as number FROM planet_osm_point WHERE amenity = 'restaurant' GROUP BY name HAVING count(name) >= 3 ORDER BY name ASC;  

This refines our data and gives us five restaurants that have three branches.

       name        | number 
-------------------+--------
 Blue Moon Burgers |      3
 Cactus            |      3
 MOD Pizza         |      3
 Pho Than Brothers |      3
 Via Tribunali     |      3

The names of restaurants, however, are not very helpful since we don't know what type of cuisine they serve. So how do we get this data? This is where our hstore data comes in handy. Non-standard information like "cuisine" is stored in the tags column we set up when importing our data to PostgreSQL. To get the type of cuisine for each of these restaurants, we could write the query like:

SELECT name, tags->'cuisine' as cuisine FROM planet_osm_point WHERE amenity = 'restaurant' AND name IN ('Blue Moon Burgers', 'Cactus', 'MOD Pizza', 'Pho Than Brothers', 'Via Tribunali') GROUP BY name, tags ORDER BY name ASC;  

This query will retrieve the "cuisine" key for each restaurant, which will give us:

       name        |  cuisine   
-------------------+------------
 Blue Moon Burgers | burger
 Blue Moon Burgers | burger
 Blue Moon Burgers | 
 Cactus            | mexican
 Cactus            | mexican
 Cactus            | mexican
 MOD Pizza         | pizza
 MOD Pizza         | pizza
 MOD Pizza         | pizza
 Pho Than Brothers | vietnamese
 Pho Than Brothers | vietnamese
 Pho Than Brothers | vietnamese
 Via Tribunali     | pizza
 Via Tribunali     | pizza
 Via Tribunali     | 

What's noticeable is that two restaurants don't have values in the cuisine column. This is perhaps one of the drawbacks of using OSM data since it can be inconsistent at times. However, community members can update the missing data and update their map as needed.

Since we're interested in looking at cuisines. Let's look at the top 10 most popular cuisines in Seattle. To do that we'd run the following SQL query, which will group the number of restaurants that serve the same type of cuisine.

SELECT DISTINCT tags->'cuisine' as cuisine, count(name) FROM planet_osm_point WHERE tags ? 'cuisine' GROUP BY cuisine ORDER BY count DESC LIMIT 10;  

Here we're selecting the restaurants that have a "cuisine" key and then grouping and counting all of those restaurants according to the cuisine they serve. This query produces the following results:

   cuisine   | count 
-------------+-------
 coffee_shop |   171
 sandwich    |    73
 mexican     |    65
 pizza       |    58
 american    |    42
 italian     |    37
 vietnamese  |    37
 thai        |    34
 chinese     |    33
 japanese    |    32

Overwhelmingly, Seattlies love their coffee, followed by sandwiches, mexican food and pizza. However, we can also see the strong Asian influence in Seattle with Vietnamese, Thai, Chinese, and Japanese food all very popular throughout the city. On the map, it shows us that the largest concentration of these cuisines is served downtown.

To make this data a little more interesting, we might want to see what coffee shops dominate Seattle. To view the top 10 coffee shops in the city, we'd run the following SQL query:

SELECT name, count(name)FROM planet_osm_point WHERE tags->'cuisine' IN ('coffee_shop') GROUP BY name ORDER BY count DESC LIMIT 10;  

This will group the names of all the coffee shops that have "coffee_shop" as their cuisine. Then it will give us the total number of branches that they have in the city, and give us the top 10 results. The result will look like:

            name            | count 
----------------------------+-------
 Starbucks                  |    47
 Tully's Coffee             |    11
 Cherry Street Coffee House |     7
 Uptown Espresso            |     5
 Caffe Ladro                |     5
 Storyville Coffee          |     3
 Voxx Coffee                |     2
 Stumptown Coffee           |     2
 Specialty's                |     2
 Caffe Vita                 |     2

So, Starbucks wins as expected. However, it's also surprising that there are at least 95 different coffee shops throughout the city.

Onward ...

OSM data is perhaps the most popular GIS data that's used by organizations all over the world. In this article, we showed you how to select data from OSM, export it, import it to your PostgreSQL deployment, query some of the data, and show what it looks like on a map. For the next installment of GeoFile, we'll be working more with our Seattle dataset and use PostGIS functions to narrow down some of our queries as well as add and transform external datasets on our OSM data.


If you have any feedback about this or any other Compose article, drop the Compose Articles team a line at articles@compose.com. We're happy to hear from you.

attributionStephen Monroe

Abdullah Alger
Abdullah Alger is a former University lecturer who likes to dig into code, show people how to use and abuse technology, talk about GIS, and fish when the conditions are right. Coffee is in his DNA. Love this article? Head over to Abdullah Alger’s author page to keep reading.

Conquer the Data Layer

Spend your time developing apps, not managing databases.