GeoFile: Spatial Reference Systems and Databases

GeoFile is a series dedicated to looking at geographical data, its features and its uses. In this article, we discuss spatial reference systems, what EPSG codes are, and how databases handle and transform them.

Knowing the right spatial reference system for your GIS data can be the difference between a correct map and a wrong one when your data points are projected. You may believe that a map is a map, whether it's flat or a sphere, but spatial reference systems don't work that way. Understanding what spatial reference systems are, how they differ, and how to use data from different sources will help you make better decisions when pulling out data and storing it in your databases.

What are Spatial Reference Systems?

Spatial reference systems (SRS) consist of components that describe a series of geographic parameters, such as the orientation, latitude, longitude, and elevation in reference to geographic objects, which define coordinate systems and spatial properties on a map.

The underlying assumption of spatial reference systems is that the Earth is a geoid. Since it would be difficult to precisely calculate the Earth as a geoid, we use the next best shape, an ellipsoid (a flattened sphere, also known as a spheroid). Now we need to select a point on the ellipsoid, a spatial reference point or anchor point, to serve as a frame of reference. We call this point a geodetic datum.

http://www.icsm.gov.au/mapping/datums2.html

Image from ICSM

The point of origin on a map is an easy way to visualize a geodetic datum (Point P). The datum describes the position and orientation of the ellipsoid. The ellipsoid itself is defined by an equatorial radius (semi-major axis) and a polar radius (semi-minor axis) (the dotted lines running from N to the equator on the image above). From these two radii we calculate the flattening, which measures how compressed the polar axis is relative to the equatorial axis and so gives the ellipsoid its shape. The datum also has a prime meridian that is set to zero longitude (the solid line running from N to the equator). This is usually the Greenwich prime meridian; however, it might differ if you are using an older or localized datum for a particular area or region.
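To make the flattening calculation concrete, here is a minimal Python sketch. It assumes the standard definition f = (a - b) / a and uses the published WGS 84 semi-major axis and inverse flattening as inputs. The function names are illustrative and do not come from any GIS library.

```python
# WGS 84 defining parameters: semi-major axis in meters and inverse flattening.
WGS84_A = 6378137.0
WGS84_INV_F = 298.257223563

def semi_minor_axis(a, inv_f):
    """Polar radius b derived from the equatorial radius a and 1/f."""
    return a * (1.0 - 1.0 / inv_f)

def flattening(a, b):
    """Flattening f = (a - b) / a: how compressed the ellipsoid is at the poles."""
    return (a - b) / a

b = semi_minor_axis(WGS84_A, WGS84_INV_F)
f = flattening(WGS84_A, b)
print(f"polar radius: {b:.3f} m")
print(f"flattening:   1/{1.0 / f:.6f}")
```

The polar radius comes out roughly 21 km shorter than the equatorial radius, which is why a sphere is only a rough stand-in for the ellipsoid.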

Don't worry about calculating a geodetic datum unless you need to create your own. Most of the datums that you will use are predefined, so it isn't necessary to get out your calculator and compass.

Now that we have a datum, we can use a geographical coordinate reference system (or geodetic coordinate system) to provide longitude and latitude coordinates on the ellipsoid. It's important to note that longitude and latitude coordinates depend on the datum used, but their values are not unique to any particular datum: the same pair of numbers can point to different places on the ground under different datums. If you do not know the datum being used, your positions could be off by anywhere from 1 meter to several hundred meters, which can pose significant problems.

The last component of the SRS is a projection. Projection refers to taking the Earth as a 3D ellipsoid and squashing it onto a 2D flat surface. There are many types of projections, but they all fall on a Cartesian coordinate system and depend on the geographic coordinate reference system used. Choosing which projection to use depends on several factors such as measurement, shape, direction, and range, and each has its tradeoffs. The most common types of projections are conic, cylindrical, and azimuthal/planar. These are classified into different flavors such as Mercator, Lambert Azimuthal Equal Area, Lambert Conformal Conic, Universal Transverse Mercator (UTM), national grid systems, state plane, and geodetic. For a description of these, see here.
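As a rough illustration of what "squashing" onto a flat surface means, the sketch below implements the spherical form of the Mercator projection, a cylindrical projection. It is a simplification: it treats the Earth as a sphere whose radius is assumed to equal the WGS 84 semi-major axis rather than using the full ellipsoid, and the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # assumed sphere radius (the WGS 84 semi-major axis)

def spherical_mercator(lon_deg, lat_deg, radius=EARTH_RADIUS_M):
    """Project longitude/latitude in degrees to planar x/y in meters."""
    if not -90.0 < lat_deg < 90.0:
        # The poles map to infinity, so they cannot be projected.
        raise ValueError("latitude must be strictly between -90 and 90 degrees")
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    x = radius * lon
    y = radius * math.log(math.tan(math.pi / 4.0 + lat / 2.0))
    return x, y

x, y = spherical_mercator(-0.1276, 51.5072)  # roughly central London
print(f"x = {x:.1f} m, y = {y:.1f} m")
```

Note the tradeoff mentioned above: this projection preserves direction, but y grows without bound toward the poles, so areas at high latitudes are heavily exaggerated.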

To review what we've covered: a spatial reference system (SRS) is made up of an ellipsoid, a geodetic datum, and a geographic coordinate reference system with an associated projection. Often, when working with SRSs, you will find that they are referred to by a number following the acronym EPSG. These are predefined SRSs with unique IDs, which are recognized and used throughout the GIS industry. Let's get to know them better.

EPSG

When working with databases or GIS libraries, you will see the number 4326 referred to a lot. Its full name, EPSG 4326, is a unique SRS identification number developed by the European Petroleum Survey Group, or EPSG. You will also find EPSG 4326 referred to as WGS 84. WGS is the World Geodetic System, a standardized geodetic system developed in 1984. What makes EPSG 4326/WGS 84 well-known is that it is used by the US Department of Defense, NATO, and the Global Positioning System (GPS).

Identification numbers like 4326 refer to a standardized collection of SRSs and coordinate transformations. These numbers have been archived and can be viewed in the Geodetic Parameter Registry. Below is a snapshot of EPSG 4326 from the registry. When we look at 4326, we notice that it comprises two main features: geodetic datum and ellipsoidal coordinate system (or geodetic coordinate system).

View here for full record

As mentioned previously, geodetic coordinate systems are the latitude and longitude points derived from a geodetic datum. Geodetic datums refer to a set of points, or anchors, on which survey measurements are based. There are two types of datums: global and local. Some local datums can be more accurate than datums that cover larger areas, since they concentrate on one area. We will discuss two of them.

The most recognized datums are WGS 84 (covering the entire world) and NAD 83 (covering only North America). The anchor for WGS 84 is placed at the center of the Earth, while NAD 83 has its anchor placed at the center of North America, lying in the middle of northern Canada. From the registry we can see that although both datums use the same semi-major axis, or radius, they have slightly different flattening values. The differences stem from where each datum takes its measurements. NAD 83 uses the North American Plate as a reference, which can change by up to 2 cm per year. WGS 84 does not change, since it takes reference points from all over the Earth.
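To get a feel for how small the flattening difference is, the following sketch compares the polar radius implied by each ellipsoid. It assumes the published defining parameters for WGS 84 and for GRS 1980, the ellipsoid that NAD 83 is based on; the helper name is illustrative.

```python
# (semi-major axis in meters, inverse flattening) as published for each ellipsoid.
ELLIPSOIDS = {
    "WGS 84": (6378137.0, 298.257223563),
    "GRS 1980 (used by NAD 83)": (6378137.0, 298.257222101),
}

def polar_radius(a, inv_f):
    """Semi-minor axis b = a * (1 - f), with f given as its inverse."""
    return a * (1.0 - 1.0 / inv_f)

radii = {name: polar_radius(a, inv_f) for name, (a, inv_f) in ELLIPSOIDS.items()}
for name, b in radii.items():
    print(f"{name}: polar radius {b:.4f} m")

difference_mm = abs(radii["WGS 84"] - radii["GRS 1980 (used by NAD 83)"]) * 1000.0
print(f"difference: {difference_mm:.3f} mm")  # roughly a tenth of a millimeter
```

The ellipsoid shapes are therefore almost identical; the practical differences between the two datums come from how each one is tied to the Earth, not from the ellipsoid dimensions.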

One tool that will help you understand EPSG references is epsg.io. It is a visualization tool that gives you information about datums and coordinate systems and shows you the locations they cover on a map.

Databases and SRSs

When considering databases and geodata, you must understand how your database works with SRSs.

GeoJSON is used by MongoDB and Elasticsearch. By default, GeoJSON is set to WGS 84 (EPSG 4326). According to the IETF standard for GeoJSON (RFC 7946), WGS 84 (EPSG 4326) is the only supported datum for GeoJSON. MongoDB follows the GeoJSON standard, but is discussing the possibility of supporting other datums in the future.
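For reference, a GeoJSON point is a small JSON object whose coordinates are ordered longitude first, then latitude. The sketch below builds one in Python; the helper name and the sample coordinates are illustrative, and the range checks are an assumption about what is sensible for WGS 84 rather than something a database requires.

```python
import json

def geojson_point(lon, lat):
    """Build a GeoJSON Point; RFC 7946 orders coordinates as [longitude, latitude]."""
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude out of range for WGS 84")
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude out of range for WGS 84")
    return {"type": "Point", "coordinates": [lon, lat]}

print(json.dumps(geojson_point(-122.3321, 47.6062)))
# {"type": "Point", "coordinates": [-122.3321, 47.6062]}
```

Swapping the two values is a common mistake when importing data, and the range check only catches it when the latitude slot receives a value beyond 90 degrees.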

PostGIS, the PostgreSQL extension for GIS data, also uses EPSG 4326 as the default for its geography type. But if you have data that needs to be converted into a different SRS, PostGIS provides the ST_Transform function. This function enables you to take a point, an area, a line, or whatever can be expressed in longitude and latitude coordinates, and transform it to any SRS you require by providing a new EPSG number.

One important note is that PostGIS often refers to EPSG numbers as SRIDs (spatial reference identification numbers). You don't have to worry about the distinction too much because, for our purposes, they are synonymous.

To show you an example of how ST_Transform works, let's create a table called my_geometry and set the column geom with our data point to EPSG 4269 (NAD 83).

CREATE TABLE my_geometry(  
  gid serial primary key,
  geom geometry(POINT, 4269),
  name text not null
);

Then, we insert a data point using the ST_GeomFromText function. This function allows us to define a point, line, or polygon as a string, and set it to any SRID (spatial reference identification number, or EPSG number). In this case our SRID is EPSG 4269. If we don't pass an SRID, the insert fails with an SRID mismatch error, because the geometry returned by ST_GeomFromText has an unknown SRID (0 in current PostGIS versions) while our column requires EPSG 4269.

INSERT INTO my_geometry (geom, name)  
VALUES (  
  ST_GeomFromText('POINT(1 -1)', 4269),
  'My Point'
);

Selecting the row back from the table shows the POINT in PostGIS's hex-encoded Extended Well-Known Binary (EWKB) representation, which embeds the SRID alongside the coordinates.

gid |                        geom                        |   name  
-----+----------------------------------------------------+----------
   1 | 0101000020AD100000000000000000F03F000000000000F0BF | My Point

To transform the geom result to EPSG 4326, we'll use the ST_Transform function. It takes the geometry and the SRID that we want the geom to be transformed to.

SELECT  
  geom as original, 
  ST_Transform(geom, 4326) as new 
FROM my_geometry;

This produces the following table that provides us with both the original and the new geom results, which show the geom in EPSG 4269 and 4326, respectively.

                   original                      |                        new                         
----------------------------------------------------+----------------------------------------------------
 0101000020AD100000000000000000F03F000000000000F0BF | 0101000020E6100000000000000000F03F000000000000F0BF
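The two hex strings look nearly identical, so it can help to decode them. The sketch below unpacks a 2D point from PostGIS's hex EWKB output using only the Python standard library. It assumes the usual EWKB layout (a byte-order flag, a 32-bit type word whose 0x20000000 bit signals an embedded SRID, the SRID itself, then two 64-bit floats); the function name is hypothetical, and other geometry types are deliberately rejected.

```python
import struct

EWKB_SRID_FLAG = 0x20000000   # set in the type word when an SRID follows
EWKB_ZM_FLAGS = 0xC0000000    # Z and M dimension flags, not handled here
WKB_POINT = 1

def decode_ewkb_point(hex_string):
    """Return (srid, x, y) for a 2D point encoded as hex EWKB."""
    raw = bytes.fromhex(hex_string)
    byte_order = "<" if raw[0] == 1 else ">"   # 1 = little endian, 0 = big endian
    (type_word,) = struct.unpack_from(byte_order + "I", raw, 1)
    if type_word & EWKB_ZM_FLAGS or (type_word & 0xFF) != WKB_POINT:
        raise ValueError("this sketch only handles 2D points")
    offset = 5
    srid = None
    if type_word & EWKB_SRID_FLAG:
        (srid,) = struct.unpack_from(byte_order + "I", raw, offset)
        offset += 4
    x, y = struct.unpack_from(byte_order + "dd", raw, offset)
    return srid, x, y

original = "0101000020AD100000000000000000F03F000000000000F0BF"
new = "0101000020E6100000000000000000F03F000000000000F0BF"
print(decode_ewkb_point(original))  # (4269, 1.0, -1.0)
print(decode_ewkb_point(new))       # (4326, 1.0, -1.0)
```

Decoded this way, the only field that differs between the two values is the SRID; the coordinates in this particular output are the same. Inside PostGIS, ST_AsEWKT gives a similar human-readable view without leaving SQL.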

Transforming your data with ST_Transform, however, is not without a caveat. Each transformation can introduce small floating-point errors, and they can accumulate if you do a lot of data processing; therefore, it's recommended that you transform data once rather than converting it back and forth repeatedly.

So, what's the word?

In this article we covered what spatial reference systems are, what they are composed of, and how EPSG numbers provide a convenient way to refer to them. Additionally, we looked at how PostGIS allows you to transform your geographic data from one spatial reference system to another, and learned that it's best to transform your data as little as possible.

Next time, we will delve into some practical examples using PostGIS with many more code samples.