What PostgreSQL has over other open source SQL databases: Part II

PostgreSQL's tag line claims that it's "The world's most advanced open source database." In Part I of this series we looked at storing the data - the model, structures, types and size limits - to give you a few reasons why PostgreSQL makes this claim. In Part II here, we'll look at data manipulation and retrieval, including indexing, virtual table features and query capabilities. For this series, we looked at what makes PostgreSQL stand out from other open source SQL databases, namely MySQL, MariaDB and Firebird.

Indexing

PostgreSQL offers index capabilities that other open source databases don't have. PostgreSQL supports partial, expression, GiST and GIN indexes, among more standard ones. Let's look at some of these specialized indexes.

Partial Indexes

Partial indexes can be created when you only want to index a subset of a table, such as only rows where a column value meets a particular condition. This advantageous feature lets you keep your indexes reasonably sized and targeted to improve performance and reduce size on disk. A key aspect of partial indexes is that the column being indexed can be different than the ones providing the conditions. For example, you may want to index only accounts that are from paying customers, not those created for internal testing:

-- create index of only paying customers
CREATE INDEX paying_accounts_idx ON accounts (account_id)  
WHERE account_type <> 'test';  

It's important to make note that sometimes in MySQL the term "partial index" is used to refer to the truncation of the indexed values to a certain number of bytes, not the limiting of the number of indexed rows based on a condition. Partial indexes as we are describing here are not supported in MySQL.

Expression Indexes

Expression indexes can be created using any function to pre-compute a column for indexing. The new values are indexed and treated as constants for querying, as opposed to being computed each time a query is run. As an example, if you have a web hit log that collects URL hits in whatever format they are received, you may want to create an index of the URLs lower-cased to normalize the data (PostgreSQL is case sensitive — compose.io and Compose.io would be considered different results):

-- create index of lower-cased URLs
CREATE INDEX webhits_lower_urls_idx ON webhits (lower(url));  

GIST and GIN (and BRIN on the way!)

GiST (Generalized Search Tree) allows combining B-tree, R-tree, and user-defined index types to create a custom index with advanced query capability. GiST is used in PostGIS (which we've made standard with all PostgreSQL deployments since January) and OpenFTS (an open source full text search engine). PostgreSQL also supports SP-GiST which allows for the creation of partitioned search indexes for incredibly speedy retrieval.

GIN (Generalized Inverted Index) allows for indexing of composite data types. Composite data types let you combine other data types in various ways to create something entirely custom. See Part I of this series for an overview on composite types.

For creating GIST and GIN indexes, the syntax is CREATE INDEX... ON... USING GIST|GIN.... Simple.

In PostgreSQL 9.5 (now in beta! - see the "More on the way" section below), BRIN (Block Range Index) will be introduced. BRIN allows for breaking up large tables into ranges based on the column to be indexed. This means the query planner can scan for only the range that is indicated by the query. Also, by indexing ranges, the amount of disk size required for the index is much less than it would be with a standard B-Tree index.

In comparison

The other SQL databases we looked at are closing the gap when it comes to expression indexes. In MySQL 5.7.6, generated columns were introduced which can be used as an expression index. For MariaDB, virtual (aka "generated" or "computed") columns were introduced in version 5.2, but only support the use of built-in functions for creating the columns (no user-defined functions). In the 2.0 version of Firebird, expression indexing using computed columns was introduced. However, none of these databases support partial, GiST or GIN indexes. Also, as we mentioned in Part I, native JSON data types cannot be indexed in these databases.

And don't forget to read Simple Index Checking with PostgreSQL by Matt Barr of mySidewalk once you have your indexes set up and want to analyze their performance.

Virtual table features

Virtual tables are a necessity in many queries. All the SQL databases we compared offer some virtual table functionality. PostgreSQL gives you more.

CTEs and recursion

PostgreSQL supports Common Table Expressions (CTEs) using the WITH clause. We demonstrated this feature in PostgreSQL - Series, Random and With. CTEs let you create virtual tables inline in your query, logically expressing an order of operations which can be much easier to read and QA than the creation of virtual tables using subqueries elsewhere in the query. CTEs in PostgreSQL can also be used recursively. This handy capability lets you step through a hierarchy, with the query self-referring (calling itself) repeatedly until no more data levels can be returned. Here's an example of a recursive CTE that identifies the levels, topics and parent relationships in a topic taxonomy:

-- query with recursive
CTE WITH RECURSIVE topic_taxonomy_recursive  
(level, parent_topic_name, topic_name)
AS (  
     SELECT 1, tt.parent_topic_name, tt.topic_name
     FROM topic_taxonomy tt
     WHERE tt.parent_topic_name = 'All Topics'
     UNION ALL
     SELECT ttr.level + 1, tt.parent_topic_name, tt.topic_name
     FROM topic_taxonomy_recursive ttr, topic_taxonomy tt
     WHERE ttr.topic_name = tt.parent_topic_name
) SELECT level, parent_topic_name, topic_name
  FROM topic_taxonomy_recursive;

MySQL and MariaDB don't use the WITH clause and, therefore, do not formally support CTEs. While you can create derived tables using subqueries in these databases, they do not allow for recursion. Also, though the query optimizer in MySQL has been improved since the 5.6 release, nested subqueries are known to be troublesome and may significantly impact performance in that database. Firebird comes out ahead of MySQL and MariaDB here, matching PostgreSQL functionality by supporting CTEs using WITH and providing recursive capability as well.

Materialized views

Materialized views are another convenient virtual table feature supported by PostgreSQL. Materialized views are like regular views in that they represent the result set of a query which you want to use often, except that the result set is actually stored on disk like a regular table. Materialized views can be indexed and also, unlike regular views which are regenerated each time they are called, materialized views are snapshots in time. They are not refreshed except as specified. This can significantly increase the speed with which queries run that use materialized views. Instead of using regular views or having to do complex joins on tables or perform aggregation functions in a query, use a materialized view with the necessary data ready and waiting on disk for increased efficiency. When you're ready to have the data updated in a materialized view, it can be refreshed on demand using the REFRESH command. Here's an example of a materialized view which produces aggregate revenue data:

-- create an aggregated revenue result as a materialized view
CREATE MATERIALIZED VIEW aggregatedMonthlyRevenue  
(year, month, total_revenue)
AS (  
     SELECT date_part('year', date) AS year,
          date_part('month', date) AS month,
          SUM(revenue) AS total_revenue
     FROM revenue
     WHERE date >= '2014-01-01'
     GROUP BY date_part('year', date),
          date_part('month', date)
     ORDER BY date_part('year', date),
          date_part('month', date)
);

-- refresh the materialized view as needed
REFRESH MATERIALIZED VIEW aggregatedMonthlyRevenue;  

Firebird, MySQL and MariaDB do not support materialized views, though it is possible to create a kind of workaround in these databases by creating a regular table and using a stored procedure or trigger to update it as required.

Query capabilities

PostgreSQL's query capabilities are extensive.

We talked about WITH a bit in the previous section. Let's take a look at a couple other optional features that can be used in SELECT statements.

Combining queries

PostgreSQL provides UNION, INTERSECT and EXCEPT clauses for interactions between SELECT statements. UNION will append the results of the second SELECT statement to those of the first. INTERSECT returns only the rows which match from both SELECT statements. EXCEPT returns only the rows from the first SELECT statement that do not match rows from the second SELECT statement. Let's look at an example using EXCEPT where we want to return the customer contact information except in the case where the customer has received and responded to an email in the past week.

/*
query to get customer info  
where the customer has not been contacted  
and responded in the past week  
*/
SELECT c.lastName, c.firstName, c.email  
FROM customers c  
EXCEPT  
SELECT e.lastName, e.firstName, e.email  
FROM email_log e  
WHERE e.email_date > current_date - interval '7 days'  
     AND e.email_action_date > current_date - interval '7 days'
     AND email_action_type = 'response';

While MySQL, MariaDB, and Firebird all support UNION, none of them support INTERSECT or EXCEPT. However, through the use of joins in the query and the EXISTS condition, the same results can be acquired as from PostgreSQL. This requires the query to be more complicated, though.

Window functions

Window functions, which perform aggregate functions over some rows of the result set (providing a "window" into the subset), can be extremely useful. They essentially let you loop through rows in the partition that are related to the current row to perform the function. Common functions include ROW_NUMBER(), RANK(), DENSE_RANK() and PERCENT_RANK(). The OVER keyword, optionally used with PARTITION BY and ORDER BY, indicates that a window function is being used. As an example, in the "Functions and more" section below, we use a Window function with ROW_NUMBER() OVER... to identify the median in a series of numeric values. Note that the WINDOW clause is not required in queries with Window functions, but it lets you create and name windows to help keep things straight.

Firebird, MySQL and MariaDB do not currently support window functions, though window functions were announced several years ago in planning for Firebird 3.

Lateral subqueries

The LATERAL keyword can be applied to subqueries in the FROM clause to allow you to cross-reference between the subquery and other tables or virtual tables created preceding it. Queries can be written more simply this way. Also, the way it works is that each row is evaluated against the cross-referenced table, which can mean speed improvements during query processing. Here's an example where we want a list of our students and to know if they have been reading technology-oriented topics recently:

-- query using a lateral subquery
SELECT s.firstName, s.LastName, x.topic_name  
FROM students s  
JOIN content_log c ON c.student_id = s.id  
LEFT OUTER JOIN LATERAL (  
     SELECT t.topic_name
     FROM content_topics t
     WHERE t.parent_topic_name = 'Technology'
          AND t.id = c.topic_id
          AND c.date > current_date - interval '30 days'
) x ON true;

MySQL, Firebird and MariaDB do not currently support lateral subqueries. Again, workarounds can be performed, but queries get more complicated.

Another thing to note: MySQL and MariaDB do not support full outer joins, but a workaroud using UNION ALL can be used to combine all rows of the two tables.

Functions and more

PostgreSQL provides robust built-in operators and functions including those that support the specialized data types we reviewed in Part I of this series, but it also lets you create your own operators and functions (including aggregates) as well as custom stored procedures and triggers. We won't be able to go into detail on all of these here because there is a lot to cover (!), but let's look at a couple simple examples for functions.

PostgreSQL supports 4 kinds of user-defined functions: query language, procedural language, C-language and internal. Each kind can take and return both base and composite types. Note that in PostgreSQL the CREATE FUNCTION command is used for creating stored procedures as well as functions.

Let's look at an example of creating a function that returns a composite type:

-- create a new composite type called "datetext"
CREATE TYPE datetext AS (  
     date date,
     date_as_text text
);

/*
create a function that takes the date  
then returns the date and the datetext  
*/
CREATE FUNCTION show_date_as_text(date)  
RETURNS datetext -- this is our composite type  
AS  
$$ SELECT $1, to_char($1, 'FMMonth FMDD, YYYY') $$
LANGUAGE SQL;

-- query using the function
SELECT show_date_as_text('2015-01-01');

-- Returns: (2015-01-01,"January 1, 2015")

Here's a practical custom function for finding the median in a numeric data series:

-- create a function that finds the median in a numeric data series
CREATE FUNCTION median(numeric[])  
RETURNS numeric  
AS  
$$ SELECT AVG(x.result)
   FROM (
        SELECT result, 
             ROW_NUMBER() OVER (ORDER BY val) as ra,
             ROW_NUMBER() OVER (ORDER BY val DESC) as rd
        FROM unnest($1) result -- notice the use of array "unnest"
        ) AS x
   WHERE x.ra BETWEEN x.rd - 1 AND x.rd + 1;
$$
LANGUAGE SQL;

-- query using the function
SELECT median(ARRAY[1,2,3,4,5,6,7]);

-- Returns: 4

While the other open source SQL databases we're comparing also let you create your own functions, stored procedures and triggers, they do not have the richness of data types and customization options that PostgreSQL provides. Additionally, in PostgreSQL you can create your own operators. The other comparison databases do not support user-defined operators.

The customization capabilities of PostgreSQL are unparalleled when compared with MySQL, MariaDB and Firebird.

Language extensions

PostgreSQL comes with a variety of language extensions, some that are part of the distribution and many more available through 3rd parties.

At Compose, we support only trusted language extensions for PostgreSQL to ensure your deployment is secure. We added support for PL/Perl back in February and support for PL/v8, a javaScript-based procedural language, in August. These language extensions, which possess more built-in functions than the out-of-the-box SQL-based PL/pgSQL language (also available in Compose deployments), let you create sophisticated scripts to manipulate and process data on the server.

More on the way

PostgreSQL just announced the Beta 1 release of the 9.5 version late last week. We've already got our hands on it, learning about all the new features so we can make the 9.5 release available to you as soon as it's stable. Over the past few months we took a look at some of the features coming in 9.5, such as the BRIN indexes mentioned above. Check out PostgreSQL's Future Is Looking Up-sert and Beyond Upsert - Coming in PostgreSQL 9.5 for a preview of what's coming next.

Summing up

PostgreSQL is exceedingly rich, with copious built-in features and innumerable ways in which you can customize or extend it further to suit your needs. Add to that its acknowledged reliability and maturity and it's clear why this is a database solution worthy of any enterprise endeavor. Even so, it remains accessible and effective for development projects just getting underway too.

Though we only covered a handful of capabilities that make PostgreSQL distinct from other open source SQL solutions - there are many more (and more on the way in the 9.5 version!), we hope that this two-part series has provided a solid overview for why you might choose PostgreSQL.