GeoFile: Converting and importing shapefiles for Compose PostgreSQL and MongoDB
PublishedGeoFile is a series dedicated to looking at geographical data, its features, and its uses. In this article, we discuss how to convert shapefiles into SQL and GeoJSON and import them into Compose PostgreSQL and MongoDB.
When viewing geographical data on GIS software, you are likely viewing data produced by a shapefile. A shapefile is a vector data format used for storing data that references geographical objects. These files must be converted into a data format that your database can read before it is stored and queried.
Here, we will discuss several tools that will enable you to convert shapefiles to SQL and GeoJSON to work with your Compose PostgresSQL and MongoDB databases.
A shapefile is commonly downloaded as a single .zip
file that, once unzipped, contains three mandatory files with the prefixes .shp
, .dbf
, and .shx
. The .shp
file contains the geography data, which includes points, lines, and polygons. The .dbf
file (or dBase table) contains non-geographic features and attributes that describes the data. And the .shx
file contains indices of the record sets in the .shp
file for quicker lookups. Other files can be included within the shapefile, which can be found here.
Converting to SQL and importing into Compose PostgreSQL
Importing shapefiles into your Compose PostgreSQL deployment is made easier by the command-line tool shp2psql
that comes with the PostGIS extension. To use the tool, you must add the PostGIS, fuzzystrmatch and postgistigergeocoder extensions to your Compose PostgreSQL database. This can be done in the Compose-UI by clicking on your database > browser > extensions. You can also connect to your Compose PostgreSQL deployment via the terminal and add the extensions via the SQL command:
CREATE EXTENSION extension_name;
Now, we just need to download a shapefile that we want to use. Here, I am using the Washington State Counties TIGER shapefile from the Washington State Office of Financial Management (WSOFM). This data contains the various shapes that make up the different counties in Washington State.
Using the command line tool shp2psql
, we will convert the shapefile to a .sql
file so that we can import it into our database. shp2psql
provides us with many options to convert our data into any required format, which can be viewed on this cheatsheet.
For this demo, we convert the shapefile using the following command:
shp2pgsql -s 4269 -g geom_4269 -I -W "latin1" "county10" county
Let's break down what we did. We start with the shp2pgsql
command then add the -s
switch to indicate the spacial reference system of the source we are using. The US Census and WSOFM use NAD83, which corresponds to the North American geodetic network, and has the EPSG projection code and PostGIS SRID number 4269. We should include this number to tell the program that our shapefile uses this EPSG code. The -g
switch sets the geometry or geography column name in the table created by shp2pgsql
, which I've named geom_4269. The -I
switch creates a spatial index on the geometry column, which can be included now, or you can index your table later. The -W
switch sets the encoding to Latin1 (or ISO-8859-1) based on what the .dbf
file is encoded in. The file name "county10"
refers to the file we are importing, and county
is the name we are assigning the table once it's created.
The output is dumped into a .sql
file and can be immediately imported into any Compose PostgreSQL database. You can also save the file to a location on your computer and import it later. I have provided the commands to run both below.
Convert and saved on your computer
shp2pgsql -s 4269 -g geom_4269 -I -W "latin1" "county10" county > county.sql
Convert and imported to Compose PostgreSQL
shp2pgsql -s 4269 -g geom_4269 -I -W "latin1" "county10" county | psql "sslmode= require host=aws-us-east-1-portal.99.dbplayer.com port=9675309 dbname=shapefile_demo user=admin"
Converting to GeoJSON and importing into Compose MongoDB
Converting shapefiles into GeoJSON is done using a command-line tool called ogr2ogr, from the GDAL library, which can be downloaded here. The tool allows you to convert shapefiles into many different formats. Here, we will convert our shapefile into GeoJSON so that MongoDB, or any database that can read GeoJSON, is able about to read it. However, if you wanted to convert your GeoJSON file into a shapefile, the tool can do that as well.
In order to convert our shapefile to GeoJSON, we run the following in our terminal:
ogr2ogr -f "GeoJSON" county.json "county10.shp" -t_srs EPSG:4326 -lco COORDINATE_PRECISION=7
We will explain this command as well. We start with ogr2ogr
and add the -f
switch to indicate the file type that we are converting our data into, GeoJSON. The filename county.json
is the name of our output file. You can name it whatever you like. The file county10.shp
in quotations is the name of our source file, our shapefile. The -t_srs
switch with EPSG:4326
is reprojecting the file from EPSG 4269 to EPSG 4326 because MongoDB only supports this geospatial reference currently. The -lco
switch (layer creation option) allows us to control other features of the document we are creating. There are several features that are available, but we will use the COORDINATE_PRECISION
option. This sets the number of decimal places after the decimal point on the longitude and latitude coordinates. In this example, we are setting the option to seven for a more accurate reading. If you do not set COORDINATE_PRECISION
to a value, your GeoJSON will not come out as expected and it will set your longitude and latitude coordinates to numbers like 28377973
, which will not be valid.
Once our data has been exported into a JSON file, we can clean the data up a little by removing the following lines at the top of the file:
{ "type": "FeatureCollection", "crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } }, "features": [...]}
We will keep all of the other generated data within the array. This allows us to import separate documents into our collection versus storing all our data inside a single document. After that, we can import our data with the following command:
mongoimport -h aws-us-east-1-portal.99.dblayer.com:8675309 -p mypassword -u someone -d shapefile_demo county.json --jsonArray
We tell MongoDB to import from our host with our username and password credentials. We also provide the database name shapefile_demo
with the JSON file we want to import. Then we tell MongoDB to save each item in the JSON array as a single document. This takes care of the issue of saving the entire JSON file as one document.
For more a more in-depth discussion about GDAL's ogr2ogr, read this.
All together now
So, we've taken shapefiles and changed them into formats that can be imported into your Compose Postgres and MongoDB deployments. All you need to do is use a GIS software package to view your data. You can use a number of popular GIS software packages like OpenJUMP or QGIS to connect to you Compose deployments and view the counties.
Next time we will look at what EPSG codes are and what they mean for your geographic data.