GeoFile: Converting and importing shapefiles for Compose PostgreSQL and MongoDB


GeoFile is a series dedicated to looking at geographical data, its features and its uses. In this article, we discuss how to convert shapefiles into SQL and GeoJSON and import them into Compose PostgreSQL and MongoDB.

When viewing geographical data on GIS software, you are likely viewing data produced by a shapefile. A shapefile is a vector data format used for storing data that references geographical objects. These files must be converted into a data format that your database can read before it is stored and queried.

Here, we will discuss several tools that will enable you to convert shapefiles to SQL and GeoJSON to work with your Compose PostgresSQL and MongoDB databases.

A shapefile is commonly downloaded as a single .zip file that, once unzipped, contains three mandatory files with the prefixes .shp, .dbf, and .shx. The .shp file contains the geography data, which includes points, lines, and polygons. The .dbf file (or dBase table) contains non-geographic features and attributes that describes the data. And the .shx file contains indices of the record sets in the .shp file for quicker lookups. Other files can be included within the shapefile, which can be found here.

Converting to SQL and importing into Compose PostgreSQL

Importing shapefiles into your Compose PostgreSQL deployment is made easier by the command-line tool shp2psql that comes with the PostGIS extension. To use the tool, you must add the PostGIS, fuzzystrmatch and postgistigergeocoder extensions to your Compose PostgreSQL database. This can be done in the Compose-UI by clicking on your database > browser > extensions. You can also connect to your Compose PostgreSQL deployment via the terminal and add the extensions via the SQL command:

CREATE EXTENSION extension_name;  

Now, we just need to download a shapefile that we want to use. Here, I am using the Washington State Counties TIGER shapefile from the Washington State Office of Financial Management (WSOFM). This data contains the various shapes that make up the different counties in Washington State.

Using the command line tool shp2psql, we will convert the shapefile to a .sql file so that we can import it into our database. shp2psql provides us with many options to convert our data into any required format, which can be viewed on this cheatsheet.

For this demo, we convert the shapefile using the following command:

shp2pgsql -s 4269 -g geom_4269 -I -W "latin1" "county10" county  

Let's break down what we did. We start with the shp2pgsql command then add the -s switch to indicate the spacial reference system of the source we are using. The US Census and WSOFM use NAD83, which corresponds to the North American geodetic network, and has the EPSG projection code and PostGIS SRID number 4269. We should include this number to tell the program that our shapefile uses this EPSG code. The -g switch sets the geometry or geography column name in the table created by shp2pgsql, which I've named geom_4269. The -I switch creates a spatial index on the geometry column, which can be included now, or you can index your table later. The -W switch sets the encoding to Latin1 (or ISO-8859-1) based on what the .dbf file is encoded in. The file name "county10" refers to the file we are importing, and county is the name we are assigning the table once it's created.

The output is dumped into a .sql file and can be immediately imported into any Compose PostgreSQL database. You can also save the file to a location on your computer and import it later. I have provided the commands to run both below.

Convert and saved on your computer
shp2pgsql -s 4269 -g geom_4269 -I -W "latin1" "county10" county > county.sql  
Convert and imported to Compose PostgreSQL
shp2pgsql -s 4269 -g geom_4269 -I -W "latin1" "county10" county | psql "sslmode= require port=9675309 dbname=shapefile_demo user=admin"  

Converting to GeoJSON and importing into Compose MongoDB

Converting shapefiles into GeoJSON is done using a command-line tool called ogr2ogr, from the GDAL library, which can be downloaded here. The tool allows you to convert shapefiles into many different formats. Here, we will convert our shapefile into GeoJSON so that MongoDB, or any database that can read GeoJSON, is able about to read it. However, if you wanted to convert your GeoJSON file into a shapefile, the tool can do that as well.

In order to convert our shapefile to GeoJSON, we run the following in our terminal:

ogr2ogr -f "GeoJSON" county.json "county10.shp" -t_srs EPSG:4326 -lco COORDINATE_PRECISION=7  

We will explain this command as well. We start with ogr2ogr and add the -f switch to indicate the file type that we are converting our data into, GeoJSON. The filename county.json is the name of our output file. You can name it whatever you like. The file county10.shp in quotations is the name of our source file, our shapefile. The -t_srs switch with EPSG:4326 is reprojecting the file from EPSG 4269 to EPSG 4326 because MongoDB only supports this geospatial reference currently. The -lco switch (layer creation option) allows us to control other features of the document we are creating. There are several features that are available, but we will use the COORDINATE_PRECISION option. This sets the number of decimal places after the decimal point on the longitude and latitude coordinates. In this example, we are setting the option to seven for a more accurate reading. If you do not set COORDINATE_PRECISION to a value, your GeoJSON will not come out as expected and it will set your longitude and latitude coordinates to numbers like 28377973, which will not be valid.

Once our data has been exported into a JSON file, we can clean the data up a little by removing the following lines at the top of the file:

{ "type": "FeatureCollection", "crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } }, "features": [...]} 

We will keep all of the other generated data within the array. This allows us to import separate documents into our collection versus storing all our data inside a single document. After that, we can import our data with the following command:

mongoimport -h -p mypassword -u someone -d shapefile_demo county.json --jsonArray  

We tell MongoDB to import from our host with our username and password credentials. We also provide the database name shapefile_demo with the JSON file we want to import. Then we tell MongoDB to save each item in the JSON array as a single document. This takes care of the issue of saving the entire JSON file as one document.

For more a more in-depth discussion about GDAL's ogr2ogr, read this.

All together now

So, we've taken shapefiles and changed them into formats that can be imported into your Compose Postgres and MongoDB deployments. All you need to do is use a GIS software package to view your data. You can use a number of popular GIS software packages like OpenJUMP or QGIS to connect to you Compose deployments and view the counties.

Next time we will look at what EPSG codes are and what they mean for your geographic data.

Abdullah Alger
Abdullah Alger is a former University lecturer who likes to dig into code, show people how to use and abuse technology, talk about GIS, and fish when the conditions are right. Coffee is in his DNA. Love this article? Head over to Abdullah Alger’s author page to keep reading.

Conquer the Data Layer

Spend your time developing apps, not managing databases.