Lucero Del Alba takes a look at how to use PostgreSQL's filter clause, and streamline database imports using PostgreSQL's foreign data wrapper in this Compose Write Stuff article.
The development of PostgreSQL happens in big steps, with plenty of features and improvements introduced in every major update. In this case, we'll explore two of them: simpler filtering in aggregate functions with the
FILTER clause, and
IMPORT FOREIGN SCHEMA from an external server. We'll review in a use case with examples.
To illustrate these features, let's consider an imaginary accountancy agency, AccountsRUs. You're working on the back-end side for the development team, and you need to be able to make all kinds of reports on the fly. The team has a leg up in that it uses PostgreSQL, but every tool they can lay their hands on will help.
Once reports and tables are created, you'll need to move the schemata onto the production servers. This used to be a time-consuming process: exporting them off the development server, editing them, and importing them on the production servers. But, we'll see later how things get simplified with the foreign-data wrapper.
Imagine AccountsRUs have some sales data on their development systems...
name | 2014 | 2015 | 2016 ----------------------------------|-----------:|-----------:|----------: A Torta o Caca SRL | 980,444 | 1,243,544 | 2,020,540 Compu-Global-Hyper-Mega-Net | 12,421,424 | 21,432,245 | 20,456,100 Der Hammer AG | 11,425,024 | 10,452,525 | 1,524,659 Minha Farinha SA | 420,304 | 545,242 | 1,004,234 Philatropia, Inc. | 200,534 | 26,523 | 156,534
Here's the table schema for such reports:
CREATE TABLE companies ( name character varying(32) NOT NULL, year integer NOT NULL, sales numeric(10, 0) NOT NULL )
And here's the SQL code for such entries, should you want to play with this set as you read the article:
INSERT INTO companies (name, year, sales) VALUES ('A Torta o Caca SRL', 2014, 980444); INSERT INTO companies (name, year, sales) VALUES ('A Torta o Caca SRL', 2015, 1243544); INSERT INTO companies (name, year, sales) VALUES ('A Torta o Caca SRL', 2016, 2020540); INSERT INTO companies (name, year, sales) VALUES ('Compu-Global-Hyper-Mega-Net', 2014, 12421424); INSERT INTO companies (name, year, sales) VALUES ('Compu-Global-Hyper-Mega-Net', 2015, 21432245); INSERT INTO companies (name, year, sales) VALUES ('Compu-Global-Hyper-Mega-Net', 2016, 20456100); INSERT INTO companies (name, year, sales) VALUES ('Der Hammer AG', 2014, 11425024); INSERT INTO companies (name, year, sales) VALUES ('Der Hammer AG', 2015, 10452525); INSERT INTO companies (name, year, sales) VALUES ('Der Hammer AG', 2016, 1524659); INSERT INTO companies (name, year, sales) VALUES ('Minha Farinha SA', 2014, 420304); INSERT INTO companies (name, year, sales) VALUES ('Minha Farinha SA', 2015, 545242); INSERT INTO companies (name, year, sales) VALUES ('Minha Farinha SA', 2016, 1004234); INSERT INTO companies (name, year, sales) VALUES ('Philatropia, Inc.', 2014, 200534); INSERT INTO companies (name, year, sales) VALUES ('Philatropia, Inc.', 2015, 26523); INSERT INTO companies (name, year, sales) VALUES ('Philatropia, Inc.', 2016, 156534);
Ready? Let's examine these tools...
Filtering for Fun
You're probably used to summing, counting, averaging, and grouping data that meets certain criteria; but things get slightly more complicated when you want to perform these queries only on certain rows. PostgreSQL 9.4 introduced a simple, yet powerful way to deal with this: the
FILTER clause. With it, aggregate functions can be filtered so that only the input rows where the filter clause evaluates to true are fed to such functions, while the others are discarded.
Maybe that was a little too abstract, so let's see an example.
Let's say management asks for a report with the number of companies grossing more than a million and less than 500K in sales.
Simple enough, you could trigger one query for high sales:
SELECT year, COUNT(*) high_sales FROM companies WHERE sales > 1000000 GROUP BY year;
... returning the amount of companies with high sales:
year | high_sales ------+------------ 2014 | 2 2015 | 3 2016 | 4
And another one for low sales:
SELECT year, COUNT(*) low_sales FROM companies WHERE sales < 500000 GROUP BY year;
... returning the amount of companies with low sales:
year | low_sales ------+----------- 2014 | 2 2015 | 1 2016 | 1
FILTER, however, you can easily combine the two in a single query:
SELECT year, COUNT(*) FILTER (WHERE sales > 1000000) high_sales, COUNT(*) FILTER (WHERE sales < 500000) low_sales FROM companies GROUP BY year;
... and get both results in a single output:
year | high_sales | low_sales -----+------------+----------- 2014 | 2 | 2 2015 | 3 | 1 2016 | 4 | 1
Your manager will also be happier since you've spent less time extracting the relevant information and presented it in a simpler manner.
More than Syntactic Sugar
Those with some extra SQL savvy may have noticed for this particular example, that
FILTER works as syntactic sugar for
SELECT year, SUM(CASE WHEN sales > 1000000 THEN 1 ELSE 0 END) AS high_sales, SUM(CASE WHEN sales < 500000 THEN 1 ELSE 0 END) AS low_sales FROM companies GROUP BY year;
... as that would return the same results as in the previous query:
year | high_sales | low_sales -----+------------+----------- 2014 | 2 | 2 2015 | 3 | 1 2016 | 4 | 1
FILTER not only the code is cleaner and easier to follow, you can do more than counting. In fact, you can use any aggregate function, and even use them in window functions!
Use in Window Functions
Window functions are a topic in their own right; but in short, they allow you to perform calculations across a set of rows that are related to the current row. This is, in a way, similar to when you are paginating results with
LIMIT from, to, but all of the result sets for the different pages are being processed in the current query.
Again, an example should make things clearer.
FILTER in a window function with
OVER window_name would look like this:
SELECT year, name, sales, COUNT(*) FILTER (WHERE sales > 100000) OVER (PARTITION BY year) high_sales FROM companies;
... which returns the details of the sales with the amount high sales companies for that period:
year | name | sales | high_sales ------+-----------------------------+----------+------------ 2014 | Minha Farinha SA | 420304 | 2 2014 | Der Hammer AG | 11425024 | 2 2014 | A Torta o Caca SRL | 980444 | 2 2014 | Compu-Global-Hyper-Mega-Net | 12421424 | 2 2014 | Philatropia, Inc. | 200534 | 2 2015 | Der Hammer AG | 10452525 | 3 2015 | A Torta o Caca SRL | 1243544 | 3 2015 | Compu-Global-Hyper-Mega-Net | 21432245 | 3 2015 | Minha Farinha SA | 545242 | 3 2015 | Philatropia, Inc. | 26523 | 3 2016 | Philatropia, Inc. | 156534 | 4 2016 | Minha Farinha SA | 1004234 | 4 2016 | A Torta o Caca SRL | 2020540 | 4 2016 | Der Hammer AG | 1524659 | 4 2016 | Compu-Global-Hyper-Mega-Net | 20456100 | 4
This sort of composite output can prove really handy when building complex reports with a mix of aggregate and non-aggregate data, since you don't need to store certain periodic (window) values (as the number of high sales for a given year, in this case) in a variable, as it is all already available to you in the results set.
For more on these topics, you can see the syntax for Aggregate Expressions and read about Window Functions in the PostgreSQL documentation.
Importing for Fun
Setting up a new database environment is a seemingly simple task. However, more often than not, it ends up consuming more time than initially thought, and it's still prone to problems coming up later on. With the foreign-data wrapper (FDW) and
IMPORT FOREIGN SCHEMA introduced in PostgreSQL 9.5, you can conveniently import table definitions from an external server, handling everything from the target's database prompt.
Going back to our use case. In AccountsRUs, you find yourself in the situation in which you regularly need to import the table schemata from the development server into the one running in production. But, now you'll use the foreign-data wrapper on the production server, which is provided by the
postgres_fdw module, that will allow you to access data that's stored on the development server.
You should notice that unlike simply exporting/importing SQL files, the FDW first needs to be set-up. This is a fairly straightforward process, and, once it's done, it simplifies the importing process a lot, giving you some peace of mind since this is a task you'll be performing quite often.
FDW Set-Up: Prepare for Remote Access
1. Install the
You'll need to do this step on the server you intend to import to only once. Run the following command:
CREATE EXTENSION postgres_fdw;
Compose Note: On Compose, you'll need to enable the extension from the Compose console for your PostgreSQL deployment. Select Browser, then the Compose database, then Extensions. Scroll down the list for
postgres_fdw and then click Install on the right.
2. Create a Foreign Server Object
You need to create a representation of each remote database you want to access. In this case, it will be your development server. You can create as many remote databases as you'd like to connect to (development, backup, etc), you'll only need the appropriate credentials to connect to each of them, which we'll do in the next step.
This is the command to create the foreign server object:
CREATE SERVER development_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'remote_host', dbname 'remote_db', port '5432');
development_server is the identifier you'll use to refer to the remote server. Remember, there can be as many as you need.
3. Create a User Mapping
The idea here is that you'll make logging into the remote servers a completely transparent process to the local users who will need to access them. For this, you'll have to map the local user(s) for each remote database user. Like this:
CREATE USER MAPPING FOR local_user SERVER development_server OPTIONS (user 'remote_user', password 'remote_password');
This way, the local user in your production server doesn't need to know what the remote credentials are; as long he or she has been mapped to the remote servers, the access will be granted.
That's all for the setup! Now let's retrieve some remote data.
Importing the Foreign Schema or Set of Tables
You can import a whole PostgreSQL schema, a set of tables, or everything except certain tables.
This is how you import a schema:
IMPORT FOREIGN SCHEMA remote_schema FROM SERVER development_server INTO local_schema;
That single command will take care of everything —
wrapping all of the remote schema from your development server, bringing it into the local (production) server, and importing it.
Often, though not always, the
remote_schema and the
local_schema will simply be
public, as it is PostgreSQL's default, but you should double check this.
To import a set of tables, you'll use the
LIMIT TO parameter:
IMPORT FOREIGN SCHEMA remote_schema LIMIT TO (balances, companies) FROM SERVER development_server INTO local_schema;
And finally, to import everything except certain tables you'll pass the
IMPORT FOREIGN SCHEMA remote_schema EXCEPT (logs) FROM SERVER development_server INTO local_schema;
LIMIT TO and
EXCEPT accept as many table names as needed. And this last option is very handy since often you don't want to import certain tables that are used for tracking or consolidating data, and that are normally useless out of the context in which they were created.
Keep in mind that, as the wording suggests,
IMPORT FOREIGN SCHEMA does precisely that: it imports the database structure and the indicated table definitions without the data.
A caveat for this feature, if you may, is that the remote server (your development environment) should be accessible from the target server (the production environment, in our example). Occasionally this might not be the case, due to security constraints (e.g. firewall settings) or to some networking and routing issues (remote requests reaching the router but not being forwarded to the appropriate server). So, before you can use
IMPORT FOREIGN SCHEMA, you should make sure that you have access to the remote server.
But as we have seen, FDW manages external data in a very transparent way, that is, without the need to handle logins, permission accesses, SQL files, etc. You only need to set the FWD with as many remote accesses as you need and you'll be all set.
One thing that sets PostgreSQL apart from other DBMS is the amount of enterprise-grade features it delivers. In this article, we only got to review two of them, the
FILTER clause and
IMPORT FOREIGN SCHEMA, but there are many more exciting features in the current versions of PostgreSQL, and yet more in the versions to come.
Fortunately, you can keep track of recent features and which version they were introduced on the PostgreSQL: Feature Matrix. There, you'll find a comprehensive list that you can easily filter by version number. This is important, as features aren't normally backward compatible, and using them in a legacy environment will introduce errors in your code.
Do you want to shed light on a favorite feature in your preferred database? Why not write about it for Write Stuff?
attribution Adam Sherez