Geofile: Getting Started with pgRouting

Published

If you need to plan your UK castle tour but don’t know how to start, let us show you how PostgreSQL, PostGIS and pgRouting on Compose can get you started on this and any other geo-routing tasks you may need your database to perform.

When we brought PostgreSQL 9.6 onto Compose, we also took the opportunity to add the pgRouting extension to expand the geospatial capabilities of the database. In this article, we'll show you how to get started using pgRouting by getting the shortest paths between castles in the UK.

While PostGIS is an extension for PostgreSQL, pgRouting can be viewed as an extension for PostGIS. What pgRouting does is extend PostGIS's functionality by providing you with several functions based on popular network algorithms to solve problems like the shortest path and the traveling salesperson problem.

In order to do that, however, pgRouting must have a valid topology. Take for example a road network. Each road segment is joined together at intersections, which we'll call "nodes". In order for pgRouting to work, each node must be connected to a road segment so that it can figure out how to get from one node to the next. pgRouting helps us do that using a few functions create network topologies and inspect their validity.

In addition to topology functions, pgRouting gives us various routing functions that we can use to find the shortest path between nodes. In this article we'll only look at two of them pgr_dijkstra, or Dijkstra's algorithm and pgr_ksp, or Yen's algorithm.

Before using these pgRouting functions, we'll give you a short introduction to setting up a dataset so that it's capable of using pgRouting because having only a data set of geographic points is not enough. First, we'll create a network of nodes using the locations of castles in the UK and connect them together. These connections are called "edges". From there, we'll let pgRouting do all the heavy lifting by analyzing our routing topology and computing the shortest routes from one node to the next.

To show you how this is done, let's first work on a sample data set of castles in the UK where we'll create the nodes and edges manually. In the next article, we'll move on to using an ESRI shapefile that has nodes and edges already established and how to make sure that it has a valid topology.

Getting set up

You'll need a Compose PostgreSQL 9.6 database up and running first. Then, add the PostGIS and pgRouting extensions from the Compose console, the SQL Query editor, or in the PostgreSQL shell. To do that, we'll create the PostGIS and pgRouting extensions together using the CASCADE parameter for CREATE EXTENSION that comes with PostgreSQL 9.6. It will automatically download all the extensions the pgRouting depends on:

CREATE EXTENSION pgrouting CASCADE; -- adds pgRouting and PostGIS 2.2.5  
SELECT upgrade_postgis_23x(); -- upgrade PostGIS  

Now that we have the extensions set up, let's create a table to hold some data. For this exercise, our table will contain a list of the names and coordinates of twelve popular castles in the UK based on the The Telegraph’s twelve best castles in the UK. So, the table will contain three columns named id, name, and geom. geom is defined as a geometry data type containing a POINT object since each castle is a point on the map.

CREATE TABLE castles(  
    id SERIAL PRIMARY KEY,
    name VARCHAR(75),
    geom GEOMETRY(POINT)
);

Next, we'll insert the castles into the table. Notice that we use the PostGIS functions ST_SetSRID and ST_Point. We'll have to use ST_Point so that PostGIS can process the coordinates, then we'll set each coordinate to the Esri code WGS 84, or SRID 4326. For more information on Esri and SRIDs, see the article on spatial reference systems and databases.

insert into castles (name, geom)  
values  
('Edinburgh Castle', ST_SetSRID(ST_Point(-3.200833, 55.948611), 4326)),
('Dover Castle', ST_SetSRID(ST_Point(1.3214, 51.1297), 4326)),
('Tower of London', ST_SetSRID(ST_Point(-0.076111, 51.508056), 4326)),
('Warwick Castle', ST_SetSRID(ST_Point(-1.584722, 52.279167), 4326)),
('Caernarfon Castle', ST_SetSRID(ST_Point(-4.2769, 53.1393), 4326)),
('Stirling Castle', ST_SetSRID(ST_Point(-3.948889, 56.123889), 4326)),
('Carrickfergus Castle', ST_SetSRID(ST_Point(-5.806446, 54.713314), 4326)),
('Cardiff Castle', ST_SetSRID(St_Point(-3.1811, 51.4824), 4326)),
('York Castle', ST_SetSRID(ST_Point(-1.08, 53.9558), 4326)),
('Leeds Castle', ST_SetSRID(ST_Point(0.628889, 51.248056), 4326)),
('Lancaster Castle', ST_SetSRID(ST_Point(-2.80562, 54.04981), 4326)),
('Arundel Castle', ST_SetSRID(ST_Point(-0.553611, 50.856111), 4326));

The output of this table will look something like the following:

 id |         name         |                        geom                        
----+----------------------+----------------------------------------------------
  1 | Edinburgh Castle     | 0101000020E610000019A9F7544E9B09C01CD0D2156CF94B40
  2 | Dover Castle         | 0101000020E61000004DF38E537424F53F462575029A904940
  3 | Tower of London      | 0101000020E6100000BDA8DDAF027CB3BFAF44A0FA07C14940
  4 | Warwick Castle       | 0101000020E61000000307B474055BF9BFAC8F87BEBB234A40
  5 | Caernarfon Castle    | 0101000020E6100000B30C71AC8B1B11C0992A1895D4914A40
  6 | Stirling Castle      | 0101000020E6100000ED45B41D53970FC0C5AA4198DB0F4C40
  7 | Carrickfergus Castle | 0101000020E6100000C7F5EFFACC3917C0B4E386DF4D5B4B40
  8 | Cardiff Castle       | 0101000020E6100000DE718A8EE47209C092CB7F48BFBD4940
  9 | York Castle          | 0101000020E610000048E17A14AE47F1BF27C286A757FA4A40
 10 | Leeds Castle         | 0101000020E6100000780DFAD2DB1FE43FCDC98B4CC09F4940
 11 | Lancaster Castle     | 0101000020E6100000350708E6E87106C0C381902C60064B40
 12 | Arundel Castle       | 0101000020E61000005F96766A2EB7E1BF785F950B956D4940

Now that we have castles and their locations, we can set up a table that will connect them all together called castle_routes.

SELECT DISTINCT ON  
    (LEAST(X.name || Y.name, Y.name || X.name)) 
    dense_rank()
        OVER(ORDER BY LEAST(X.name || Y.name, Y.name || X.name))::INTEGER AS id,
    X.name AS name_1, 
    Y.name AS name_2,
    ST_MakeLine(X.geom, Y.geom)::GEOMETRY(LineString,4326) AS geom,
    ROUND(ST_Distance(X.geom::GEOGRAPHY, Y.geom::GEOGRAPHY)/1000) AS distance,
    X.id AS source, 
    Y.id AS target
INTO  
    castle_routes
FROM  
    castles X CROSS JOIN castles Y
WHERE  
    X.name <> Y.name;

What we're doing above is creating a new table called castle_routes that contains edges that connect each castle to one another, but not to themselves. Using the PostgreSQL window function dense_rank the id column of our edges table will create a sequence of edges. We define the origin and destination names of each edge with X.name and Y.name. The geom column contains the line string we created to connect each castle one another. The line strings are also set to SRID 4326, the same as our castle coordinates, using the geometry typemod construct. The distance column contains the length of the edge, which is the distance between one castle and another in kilometers. To get the distance in kilometers, we'll have to cast our geometry type data to geography. Why? Because the geography data type calculates distances in meters, unlike the geometry data type which uses degrees. Then we'll use the ids from the castles table to populate the source and target columns, which indicate the beginning and end nodes of each edge, respectively.

When this SQL query is executed, we'll get a table containing 66 edges like the following (we've kept the geom column out for brevity):

 id |      name_1       |        name_2        | distance | source | target 
----+-------------------+----------------------+----------+--------+--------
  1 | Caernarfon Castle | Arundel Castle       |      360 |      5 |     12
  2 | Arundel Castle    | Cardiff Castle       |      197 |     12 |      8
  3 | Arundel Castle    | Carrickfergus Castle |      556 |     12 |      7
  4 | Arundel Castle    | Dover Castle         |      135 |     12 |      2
  5 | Arundel Castle    | Edinburgh Castle     |      593 |     12 |      1
  6 | Lancaster Castle  | Arundel Castle       |      387 |     11 |     12
  7 | Leeds Castle      | Arundel Castle       |       94 |     10 |     12
  8 | Arundel Castle    | Stirling Castle      |      628 |     12 |      6
  9 | Tower of London   | Arundel Castle       |       80 |      3 |     12
 10 | Warwick Castle    | Arundel Castle       |      174 |      4 |     12
...

Since we're creating our own map that has routes connecting from one castle to the next, we'll have to reconstruct the table of nodes (our castles) based on the edges table we've created (castle_routes). To do that, use the pgRouting function pgr_createVerticesTable using the castle_routes as the table name, the geometry column geom, and the source and target column names.

select pgr_createVerticesTable('castle_routes','geom','source','target');  

The only name the function requires is the edges table name castle_routes. If you omit geom, the function will look for the_geom as the geometry column name; therefore, it must be defined. source and target also are default names that the function looks for, but for showcasing how the function works, they've been added.

The output of running this function will show the process of creating a vertices table with containing an id and the_geom column with the geometries of each castle location. The output will show that it has created twelve vertices from the 66 edges, which corresponds to the number of castles we have.

NOTICE:  PROCESSING:  
NOTICE:  pgr_createVerticesTable('castle_routes','geom','source','target','true')  
NOTICE:  Performing checks, please wait .....  
NOTICE:  Populating public.castle_routes_vertices_pgr, please wait...  
NOTICE:    ----->   VERTICES TABLE CREATED WITH  12 VERTICES  
NOTICE:                                         FOR   66  EDGES  
NOTICE:    Edges with NULL geometry,source or target: 0  
NOTICE:                              Edges processed: 66  
NOTICE:  Vertices table for table public.castle_routes is: public.castle_routes_vertices_pgr  
NOTICE:  ----------------------------------------------  
 pgr_createverticestable 
-------------------------
 OK

From this a new table called castle_routes_vertices_pgr is created that looks like:

 id | cnt | chk | ein | eout |                      the_geom                      
----+-----+-----+-----+------+----------------------------------------------------
  1 |     |     |     |      | 0101000020E610000019A9F7544E9B09C01CD0D2156CF94B40
  2 |     |     |     |      | 0101000020E61000004DF38E537424F53F462575029A904940
  3 |     |     |     |      | 0101000020E6100000BDA8DDAF027CB3BFAF44A0FA07C14940
  4 |     |     |     |      | 0101000020E61000000307B474055BF9BFAC8F87BEBB234A40
  5 |     |     |     |      | 0101000020E6100000B30C71AC8B1B11C0992A1895D4914A40
  6 |     |     |     |      | 0101000020E6100000ED45B41D53970FC0C5AA4198DB0F4C40
  7 |     |     |     |      | 0101000020E6100000C7F5EFFACC3917C0B4E386DF4D5B4B40
  8 |     |     |     |      | 0101000020E6100000DE718A8EE47209C092CB7F48BFBD4940
  9 |     |     |     |      | 0101000020E610000048E17A14AE47F1BF27C286A757FA4A40
 10 |     |     |     |      | 0101000020E6100000780DFAD2DB1FE43FCDC98B4CC09F4940
 11 |     |     |     |      | 0101000020E6100000350708E6E87106C0C381902C60064B40
 12 |     |     |     |      | 0101000020E61000005F96766A2EB7E1BF785F950B956D4940

That's all that's needed now for the general set up of the routes. Creating your own points and generating the edges is not really necessary if you're using a shapefile or OpenStreetMap data. But, it's always best to understand how routes are generated if you decide to generate your own routes from your maps.

Getting the Shortest Path

Now that we have the routes prepared, we can start finding out the optimal path to get from one castle to the next. To do that, pgRouting provides various functions that'll calculate the distance and cost of each route, but we'll focus on perhaps the most basic shortest path function pgr_dijkstra.

To use the function to calculate the distance from the Tower of London to Stirling Castle we'd write something like:

SELECT  
    p.seq, p.node, p.cost, r.geom as edge_geom, c.name
FROM  
    pgr_dijkstra(
        'SELECT id, source, target, distance AS cost FROM castle_routes',
        (SELECT id FROM castles WHERE name = 'Tower of London'), 
        (SELECT id FROM castles WHERE name = 'Stirling Castle'), 
        FALSE
    ) AS p
    LEFT JOIN castle_routes AS r ON p.edge = r.id
    LEFT JOIN castles AS c ON p.node = c.id
ORDER BY  
    p.seq;

The first argument of the function takes a string which needs the id, source, target and cost columns from the edge table. Since the castle_routes table doesn't include a cost column, we're assigning cost to the distance column. The second and third arguments are the source and target columns for the route. We can either hard code an id, or if we don't remember the id of the castle, we can select the id of the castle we want using an SQL query like above with a WHERE filter. The fourth argument we provided is set to FALSE since we're using an undirected graph. If you experiment with the function and set it to TRUE, the function will expect another column in castle_routes called reverse_cost that shows the distance in reverse, which we don't need here. For maps with road networks or OpenStreetMap data, you'll usually find the reverse cost added in the data they give you.

The output from this query will give us the node (castle ids), the cost (distance) of 573 kilometers, the castle names, and the edge geometry connecting both castles.

 seq | node | cost |                                         edge_geom                                          |      name       
-----+------+------+--------------------------------------------------------------------------------------------+-----------------
   1 |    3 |  573 | 0102000020E610000002000000ED45B41D53970FC0C5AA4198DB0F4C40BDA8DDAF027CB3BFAF44A0FA07C14940 | Tower of London
   2 |    6 |    0 |                                                                                            | Stirling Castle

It's well known that shortest path between two points is a straight line. Selecting the shortest path becomes more interesting, however, if we select other castles we'd like to visit in between the Tower of London and Stirling Castle. Let's say that we want to determine what the cost of going to Stirling Castle and visiting other castles is along the way. For that, we'd use the pgRouting function pgr_ksp. Unlike pgr_dijkstra, it will give you multiple paths to your destination that we can choose from.

Below is an example that will calculate five of the shortest paths between the Tower of London and Stirling Castle. You can substitute the number 5 in the SQL query with any number to get a different number of paths.

SELECT x.path_id, x.path_seq, COALESCE(s.name || ' - ' || t.name, 'Total Trip') as route,  
CASE  
    WHEN edge = -1 THEN agg_cost ELSE NULL END AS "total_cost(distance)"
FROM  
    pgr_ksp(
        'SELECT id, source, target, distance AS cost FROM castle_routes',
        (SELECT id FROM castles WHERE name = 'Tower of London'),
        (SELECT id FROM castles WHERE name = 'Stirling Castle'),
        5,
        directed := TRUE
    ) as x
    LEFT JOIN castle_routes AS r ON x.edge = r.id
    LEFT JOIN castles AS s ON r.source = s.id
    LEFT JOIN castles AS t ON r.target = t.id
ORDER BY  
    x.path_id, x.path_seq;

Using the pgr_ksp function we return a path_id that groups the rows containing the same route together. The path_seq column gives us the sequence of the route (i.e. 1, 2, ...). For the route column we'll show the names of the origin and destination of each route segment with "Total Trip" as the last row of each route. The last column is total_cost which gives us the total aggregated distance (agg_cost) of a route. We probably don't want the distance for each segment of a route, so those have been taken out and only the aggregated distance is shown next to the "Total Trip".

The function pgr_ksp takes as the first argument the same SQL query string to our castle_routes table. Also, we select the origin and destination of our castles in the next two arguments. The number of routes that you want is the next argument. We've chosen five, but you can experiment with as many as you'd like. Finally, the last argument we added is directed := TRUE, which means that the route goes from the Tower of London to Stirling Castle. If this is set to FALSE, then we'll still go to Stirling Castle, but Stirling Castle might not be the end of the route.

Executing this query, we'll get five of routes with the total distance of each.

path_id | path_seq |                 route                  | total_cost  
---------+----------+----------------------------------------+------------
       1 |        1 | Tower of London - Warwick Castle       |           
       1 |        2 | Warwick Castle - Stirling Castle       |           
       1 |        3 | Total Trip                             |        590
       2 |        1 | Tower of London - Carrickfergus Castle |           
       2 |        2 | Carrickfergus Castle - Stirling Castle |           
       2 |        3 | Total Trip                             |        720
       3 |        1 | Tower of London - Leeds Castle         |           
       3 |        2 | Leeds Castle - Stirling Castle         |           
       3 |        3 | Total Trip                             |        678
       4 |        1 | Tower of London - Lancaster Castle     |           
       4 |        2 | Lancaster Castle - Stirling Castle     |           
       4 |        3 | Total Trip                             |        579
       5 |        1 | Tower of London - Arundel Castle       |           
       5 |        2 | Arundel Castle - Stirling Castle       |           
       5 |        3 | Total Trip                             |        708

Summing up

We’ve introduced some of the basics of pgRouting in this article: nodes, edges, and creating paths between them with pgr_dijkstra and pgr_ksp. Next time we'll use ESRI shapefiles to see what else pgRouting can do.


If you have any feedback about this or any other Compose article, drop the Compose Articles team a line at articles@compose.com. We're happy to hear from you.

attribution lydia harper

Abdullah Alger
Abdullah Alger is a former University lecturer who likes to dig into code, show people how to use and abuse technology, talk about GIS, and fish when the conditions are right. Coffee is in his DNA. Love this article? Head over to Abdullah Alger’s author page to keep reading.

Conquer the Data Layer

Spend your time developing apps, not managing databases.