Mastering PostgreSQL Tools: Filters and Foreign Data Wrappers

Published

Lucero Del Alba takes a look at how to use PostgreSQL's filter clause, and streamline database imports using PostgreSQL's foreign data wrapper in this Compose Write Stuff article.

The development of PostgreSQL happens in big steps, with plenty of features and improvements introduced in every major update. In this case, we'll explore two of them: simpler filtering in aggregate functions with the FILTER clause, and IMPORT FOREIGN SCHEMA from an external server. We'll review in a use case with examples.

AccountsRUs

To illustrate these features, let's consider an imaginary accountancy agency, AccountsRUs. You're working on the back-end side for the development team, and you need to be able to make all kinds of reports on the fly. The team has a leg up in that it uses PostgreSQL, but every tool they can lay their hands on will help.

Once reports and tables are created, you'll need to move the schemata onto the production servers. This used to be a time-consuming process: exporting them off the development server, editing them, and importing them on the production servers. But, we'll see later how things get simplified with the foreign-data wrapper.

Imagine AccountsRUs have some sales data on their development systems...

name                              | 2014       | 2015        | 2016  
----------------------------------|-----------:|-----------:|----------:
A Torta o Caca SRL                |    980,444 |  1,243,544 |  2,020,540  
Compu-Global-Hyper-Mega-Net       | 12,421,424 | 21,432,245 | 20,456,100  
Der Hammer AG                     | 11,425,024 | 10,452,525 |  1,524,659  
Minha Farinha SA                  |    420,304 |    545,242 |  1,004,234  
Philatropia, Inc.                 |    200,534 |     26,523 |    156,534  

Here's the table schema for such reports:

CREATE TABLE companies  
(
  name character varying(32) NOT NULL,
  year integer NOT NULL,
  sales numeric(10, 0) NOT NULL
)

And here's the SQL code for such entries, should you want to play with this set as you read the article:

INSERT INTO companies (name, year, sales)  
    VALUES ('A Torta o Caca SRL', 2014, 980444);
INSERT INTO companies (name, year, sales)  
    VALUES ('A Torta o Caca SRL', 2015, 1243544);
INSERT INTO companies (name, year, sales)  
    VALUES ('A Torta o Caca SRL', 2016, 2020540);

INSERT INTO companies (name, year, sales)  
    VALUES ('Compu-Global-Hyper-Mega-Net', 2014, 12421424);
INSERT INTO companies (name, year, sales)  
    VALUES ('Compu-Global-Hyper-Mega-Net', 2015, 21432245);
INSERT INTO companies (name, year, sales)  
    VALUES ('Compu-Global-Hyper-Mega-Net', 2016, 20456100);

INSERT INTO companies (name, year, sales)  
    VALUES ('Der Hammer AG', 2014, 11425024);
INSERT INTO companies (name, year, sales)  
    VALUES ('Der Hammer AG', 2015, 10452525);
INSERT INTO companies (name, year, sales)  
    VALUES ('Der Hammer AG', 2016,  1524659);

INSERT INTO companies (name, year, sales)  
    VALUES ('Minha Farinha SA', 2014,  420304);
INSERT INTO companies (name, year, sales)  
    VALUES ('Minha Farinha SA', 2015,  545242);
INSERT INTO companies (name, year, sales)  
    VALUES ('Minha Farinha SA', 2016, 1004234);

INSERT INTO companies (name, year, sales)  
    VALUES ('Philatropia, Inc.', 2014, 200534);
INSERT INTO companies (name, year, sales)  
    VALUES ('Philatropia, Inc.', 2015,  26523);
INSERT INTO companies (name, year, sales)  
    VALUES ('Philatropia, Inc.', 2016, 156534);

Ready? Let's examine these tools...

Filtering for Fun

You're probably used to summing, counting, averaging, and grouping data that meets certain criteria; but things get slightly more complicated when you want to perform these queries only on certain rows. PostgreSQL 9.4 introduced a simple, yet powerful way to deal with this: the FILTER clause. With it, aggregate functions can be filtered so that only the input rows where the filter clause evaluates to true are fed to such functions, while the others are discarded.

Maybe that was a little too abstract, so let's see an example.

Let's say management asks for a report with the number of companies grossing more than a million and less than 500K in sales.

Simple enough, you could trigger one query for high sales:

SELECT year, COUNT(*) high_sales  
FROM companies  
WHERE sales > 1000000  
GROUP BY year;  

... returning the amount of companies with high sales:

 year | high_sales
------+------------
 2014 |          2
 2015 |          3
 2016 |          4

And another one for low sales:

SELECT year, COUNT(*) low_sales  
FROM companies  
WHERE sales <  500000  
GROUP BY year;  

... returning the amount of companies with low sales:

 year | low_sales
------+-----------
 2014 |         2
 2015 |         1
 2016 |         1

With FILTER, however, you can easily combine the two in a single query:

SELECT year,  
    COUNT(*) FILTER (WHERE sales > 1000000) high_sales,
    COUNT(*) FILTER (WHERE sales <  500000) low_sales
FROM companies  
GROUP BY year;  

... and get both results in a single output:

year | high_sales | low_sales
-----+------------+-----------
2014 |          2 |         2
2015 |          3 |         1
2016 |          4 |         1

Your manager will also be happier since you've spent less time extracting the relevant information and presented it in a simpler manner.

More than Syntactic Sugar

Those with some extra SQL savvy may have noticed for this particular example, that FILTER works as syntactic sugar for CASE WHEN...:

SELECT year,  
  SUM(CASE WHEN sales > 1000000 THEN 1 ELSE 0 END) AS high_sales,
  SUM(CASE WHEN sales <  500000 THEN 1 ELSE 0 END) AS low_sales
FROM companies  
GROUP BY year;  

... as that would return the same results as in the previous query:

year | high_sales | low_sales
-----+------------+-----------
2014 |          2 |         2
2015 |          3 |         1
2016 |          4 |         1

However, with FILTER not only the code is cleaner and easier to follow, you can do more than counting. In fact, you can use any aggregate function, and even use them in window functions!

Use in Window Functions

Window functions are a topic in their own right; but in short, they allow you to perform calculations across a set of rows that are related to the current row. This is, in a way, similar to when you are paginating results with LIMIT from, to, but all of the result sets for the different pages are being processed in the current query.

Again, an example should make things clearer.

Using FILTER in a window function with OVER window_name would look like this:

SELECT year, name, sales,  
  COUNT(*) FILTER (WHERE sales > 100000)
      OVER (PARTITION BY year) high_sales
FROM companies;  

... which returns the details of the sales with the amount high sales companies for that period:

 year |            name             |  sales   | high_sales
------+-----------------------------+----------+------------
 2014 | Minha Farinha SA            |   420304 |          2
 2014 | Der Hammer AG               | 11425024 |          2
 2014 | A Torta o Caca SRL          |   980444 |          2
 2014 | Compu-Global-Hyper-Mega-Net | 12421424 |          2
 2014 | Philatropia, Inc.           |   200534 |          2
 2015 | Der Hammer AG               | 10452525 |          3
 2015 | A Torta o Caca SRL          |  1243544 |          3
 2015 | Compu-Global-Hyper-Mega-Net | 21432245 |          3
 2015 | Minha Farinha SA            |   545242 |          3
 2015 | Philatropia, Inc.           |    26523 |          3
 2016 | Philatropia, Inc.           |   156534 |          4
 2016 | Minha Farinha SA            |  1004234 |          4
 2016 | A Torta o Caca SRL          |  2020540 |          4
 2016 | Der Hammer AG               |  1524659 |          4
 2016 | Compu-Global-Hyper-Mega-Net | 20456100 |          4

This sort of composite output can prove really handy when building complex reports with a mix of aggregate and non-aggregate data, since you don't need to store certain periodic (window) values (as the number of high sales for a given year, in this case) in a variable, as it is all already available to you in the results set.

For more on these topics, you can see the syntax for Aggregate Expressions and read about Window Functions in the PostgreSQL documentation.

Importing for Fun

Setting up a new database environment is a seemingly simple task. However, more often than not, it ends up consuming more time than initially thought, and it's still prone to problems coming up later on. With the foreign-data wrapper (FDW) and IMPORT FOREIGN SCHEMA introduced in PostgreSQL 9.5, you can conveniently import table definitions from an external server, handling everything from the target's database prompt.

Going back to our use case. In AccountsRUs, you find yourself in the situation in which you regularly need to import the table schemata from the development server into the one running in production. But, now you'll use the foreign-data wrapper on the production server, which is provided by the postgres_fdw module, that will allow you to access data that's stored on the development server.

You should notice that unlike simply exporting/importing SQL files, the FDW first needs to be set-up. This is a fairly straightforward process, and, once it's done, it simplifies the importing process a lot, giving you some peace of mind since this is a task you'll be performing quite often.

FDW Set-Up: Prepare for Remote Access
1. Install the postgres_fdw Extension

You'll need to do this step on the server you intend to import to only once. Run the following command:

CREATE EXTENSION postgres_fdw;  

Compose Note: On Compose, you'll need to enable the extension from the Compose console for your PostgreSQL deployment. Select Browser, then the Compose database, then Extensions. Scroll down the list for postgres_fdw and then click Install on the right.

2. Create a Foreign Server Object

You need to create a representation of each remote database you want to access. In this case, it will be your development server. You can create as many remote databases as you'd like to connect to (development, backup, etc), you'll only need the appropriate credentials to connect to each of them, which we'll do in the next step.

This is the command to create the foreign server object:

CREATE SERVER development_server  
      FOREIGN DATA WRAPPER postgres_fdw 
      OPTIONS (host 'remote_host', dbname 'remote_db', port '5432');

development_server is the identifier you'll use to refer to the remote server. Remember, there can be as many as you need.

3. Create a User Mapping

The idea here is that you'll make logging into the remote servers a completely transparent process to the local users who will need to access them. For this, you'll have to map the local user(s) for each remote database user. Like this:

CREATE USER MAPPING  
        FOR local_user
        SERVER development_server
        OPTIONS (user 'remote_user', password 'remote_password');

This way, the local user in your production server doesn't need to know what the remote credentials are; as long he or she has been mapped to the remote servers, the access will be granted.

That's all for the setup! Now let's retrieve some remote data.

Importing the Foreign Schema or Set of Tables

You can import a whole PostgreSQL schema, a set of tables, or everything except certain tables.

This is how you import a schema:

IMPORT FOREIGN SCHEMA remote_schema  
    FROM SERVER development_server INTO local_schema;

That single command will take care of everything —
wrapping all of the remote schema from your development server, bringing it into the local (production) server, and importing it.

Often, though not always, the remote_schema and the local_schema will simply be public, as it is PostgreSQL's default, but you should double check this.

To import a set of tables, you'll use the LIMIT TO parameter:

IMPORT FOREIGN SCHEMA remote_schema  
    LIMIT TO (balances, companies)
    FROM SERVER development_server INTO local_schema;

And finally, to import everything except certain tables you'll pass the EXCEPT parameter:

IMPORT FOREIGN SCHEMA remote_schema  
    EXCEPT (logs)
    FROM SERVER development_server INTO local_schema;

Both, LIMIT TO and EXCEPT accept as many table names as needed. And this last option is very handy since often you don't want to import certain tables that are used for tracking or consolidating data, and that are normally useless out of the context in which they were created.

Additional Remarks

Keep in mind that, as the wording suggests, IMPORT FOREIGN SCHEMA does precisely that: it imports the database structure and the indicated table definitions without the data.

A caveat for this feature, if you may, is that the remote server (your development environment) should be accessible from the target server (the production environment, in our example). Occasionally this might not be the case, due to security constraints (e.g. firewall settings) or to some networking and routing issues (remote requests reaching the router but not being forwarded to the appropriate server). So, before you can use IMPORT FOREIGN SCHEMA, you should make sure that you have access to the remote server.

But as we have seen, FDW manages external data in a very transparent way, that is, without the need to handle logins, permission accesses, SQL files, etc. You only need to set the FWD with as many remote accesses as you need and you'll be all set.

Wrap-Up

One thing that sets PostgreSQL apart from other DBMS is the amount of enterprise-grade features it delivers. In this article, we only got to review two of them, the FILTER clause and IMPORT FOREIGN SCHEMA, but there are many more exciting features in the current versions of PostgreSQL, and yet more in the versions to come.

Fortunately, you can keep track of recent features and which version they were introduced on the PostgreSQL: Feature Matrix. There, you'll find a comprehensive list that you can easily filter by version number. This is important, as features aren't normally backward compatible, and using them in a legacy environment will introduce errors in your code.


Do you want to shed light on a favorite feature in your preferred database? Why not write about it for Write Stuff?

Lucero dances, [plays music](http://luzdealba.bandcamp.com/), writes about random topics, leads projects to varying and doubtful degrees of success, and keeps trying to be real and he keeps failing.

attribution Adam Sherez

This article is licensed with CC-BY-NC-SA 4.0 by Compose.