Analyzing Pet Name Trends with PostgreSQL's \crosstabview

Published

PostgreSQL 9.6 comes with a number of updates and new features to explore. One very useful addition is the \crosstabview command, which gives you the power to rearrange how your data is viewed without the difficulty of writing complex SQL queries.

Since the release of PostgreSQL 9.6.2 on Compose, we've been playing with some of the new additions to the database. One addition we found interesting and quite useful is the new psql meta-command \crosstabview, which was released with PostgreSQL 9.6. This command allows query results to be shown in a representation, similar to a spreadsheet pivot table, without needing to write complex SQL queries.

Here, we'll look at how it works and show you some of the use cases where it might be beneficial to use. The dataset we'll use is the Current Pet Licenses for the City of Tacoma and Fircrest for 2017, which is a CSV file that contains a list of 15,555 names of cats and dogs in the two cities. To follow along, download the dataset from the link and let's look at how the \crosstabview command works.

Importing the dataset and Querying Pet Names

After downloading the dataset, we created a database pets and a table names, then we imported the CSV data.

CREATE TABLE names (  
    name TEXT, 
    animaltype TEXT, 
    primarybreed TEXT, 
    tpdsector INT, 
    latlon TEXT, 
    animalcount INT
);

\COPY names (name, animaltype, primarybreed, tpdsector, latlon, animalcount) FROM '/Downloads/Current_Pet_License-City_of_Tacoma___Fircrest.csv' CSV HEADER;  

Now that the pet names have been inserted, let's look for the names that both cats and dogs share. A simple query using count and a GROUP BY clause will do the trick.

SELECT  
    name, 
    animaltype,
    count(name)
FROM  
    names 
GROUP BY  
    name, 
    animaltype
ORDER BY  
    1;

A sample of the results of that query is below. As you can see, some names are shared between cats and dogs (e.g. "ABBY"). However, since the names are divided between CAT and DOG, the names are grouped accordingly and we don't have one row dedicated to a single name.

         name         | animaltype | count 
----------------------+------------+-------
 2P2                  | DOG        |     1
 A BARKSDALE          | DOG        |     1
 A509966              | CAT        |     1
 AARON                | DOG        |     1
 AB                   | DOG        |     1
 AB "ABBY"            | DOG        |     1
 ABBBY                | CAT        |     1
 ABBEY                | DOG        |     5
 ABBI                 | DOG        |     2
 ABBIE                | DOG        |    10
 ABBIGAIL             | CAT        |     1
 ABBOTT               | CAT        |     1
 ABBY                 | DOG        |    40
 ABBY                 | CAT        |    12
 ...

Trying to look at every row to find each cat and dog with identical names will be tedious, especially if the dataset is much larger than this. One way that we might overcome the problem is to design a new query that would put the count of cats and dogs in their own column.

SELECT  
    name,                                                                            
    count(CASE WHEN animaltype='CAT' THEN 1 END) AS CAT,
    count(CASE WHEN animaltype='DOG' THEN 1 END) AS DOG
FROM  
    names
GROUP BY  
    name
ORDER BY  
    1;

which produces ...

         name         | cat | dog 
----------------------+-----+-----
 2P2                  |   0 |   1
 A BARKSDALE          |   0 |   1
 A509966              |   1 |   0
 AARON                |   0 |   1
 AB                   |   0 |   1
 AB "ABBY"            |   0 |   1
 ABBBY                |   1 |   0
 ABBEY                |   0 |   5
 ABBI                 |   0 |   2
 ABBIE                |   0 |  10
 ABBIGAIL             |   1 |   0
 ABBOTT               |   1 |   0
 ABBY                 |  12 |  40
 ...

But creating an entirely new query to reorganize our data might be overkill, especially if you only want to rearrange the columns. That's where PostgreSQL's \crosstabview will help.

Query with \crosstabview

The first query we ran grouped together and counted all the cats and dogs having names with identical spellings and placed them into separate rows. \crosstabview can transform the data automatically by placing CAT and DOG in separate, horizontal columns, merging together the pet names in the vertical column, and using the count values to fill in the grid where cells are shared between the horizontal and vertical headers. All that's required for \crosstabview to work is that you have at least three columns that it can select data from.

It does this by finding the distinct values within the query's results and uses them as horizontal and vertical headers. The data shared between the header values are then projected into the grid of cells.

To see it in action, all you have to do is run \crosstabview after your SQL query.

SELECT  
    name, 
    animaltype,
    count(name)
FROM  
    names 
GROUP BY  
    name, 
    animaltype
ORDER BY  
    1 
\crosstabview

Once the \crosstabview command is executed, it sends the query input buffer to the server then shows the results of that query in a crosstab grid. That means crosstabview will only use the last query executed to populate the crosstab grid.

If \crosstabview is appended to the SQL query like in the query above, don't use the semicolon after \crosstabview, otherwise, you'll get an error: Invalid command \crosstabview;. That's because \crosstabview works similarly to ; at the end of an SQL query. Alternatively, if you execute the query first with a semicolon ; then, afterward, execute \crosstabview, it will give you the same results because it uses the query buffer.

When \crosstabview is executed in the above query, we'll get the following table with individual columns for DOG and CAT, which are the distinct values taken from the animalType column. As you can see, each name is grouped together like we want and the count column values are then used to fill in the table grid. We got a similar result using the second query we wrote above but using \crosstabview allowed users to use the query buffer and saved us from building and executing a new query that produces a similar result.

         name         | DOG | CAT 
----------------------+-----+-----
 2P2                  |   1 |    
 A BARKSDALE          |   1 |    
 A509966              |     |   1
 AARON                |   1 |    
 AB                   |   1 |    
 AB "ABBY"            |   1 |    
 ABBBY                |     |   1
 ABBEY                |   5 |    
 ABBI                 |   2 |    
 ABBIE                |  10 |    
 ABBIGAIL             |     |   1
 ABBOTT               |     |   1
 ABBY                 |  40 |  12 
 ...

Rearranging Tables with \crosstabview

Behind the scenes, PostgreSQL's \crosstabview will determine how to set up your table. However, if you want to rearrange how your data is viewed, PostgreSQL gives you that option, too.

If you want to tell \crosstabview to rearrange the table, for example, you may want to flip horizontal and vertical headers by placing DOG and CAT vertically and name horizontally, you can do that by specifying the vertical and horizontal headers, respectively, like:

\crosstabview animaltype name

This tells \crosstabview to place type as the vertical header and name as the horizontal header. You need to put a space between the column names. For this example, the count column will automatically be used as the data that fills in the grid. If we wanted to specify the count column as the data that \crosstabview will use, we'd place count as the third argument.

\crosstabview animaltype name count

However, this is not really necessary here since PostgreSQL will automatically deduce that count is the data shared by the values in the vertical and horizontal headers.

There are some limitations if you decide to specify the order of the headers. For example, running the query above with the name column as the horizontal header will give you:

\crosstabview: maximum number of columns (1600) exceeded

This error occurs because we've put all our names in the horizontal header, which PostgreSQL has limited to 1600 columns. Therefore, we can run the query again with \crosstabview animaltype name, but limit the query to get the first ten results, which would return something like:

 animaltype | 2P2 | A BARKSDALE | A509966 | AARON | AB | AB "ABBY" | ABBBY | ABBEY | ABBI | ABBIE 
------------+-----+-------------+---------+-------+----+-----------+-------+-------+------+-------
 DOG        |   1 |           1 |         |     1 |  1 |         1 |       |     5 |    2 |    10
 CAT        |     |             |       1 |       |    |           |     1 |       |      |      

Trending Pet Names

Looking beyond getting pet names and animal types, we could use \crosstabview to find out what breed of dogs, for instance, tend to have certain names and whether there is a correlation between animal breeds and pet names that pet owners prefer. To do that, we could construct a query that analyzes the breeds of DOG and the names associated with them.

SELECT  
    primarybreed, 
    name, 
    count(name) 
FROM  
    names 
WHERE  
    animaltype = 'DOG' 
GROUP BY  
    primarybreed, 
    name 
ORDER BY  
    3 DESC;

This query will give us a list of dog breeds, the names of dogs associated with a breed, and the number of dogs that have a specific name that is a certain breed.

  primarybreed   |         name         | count 
-----------------+----------------------+-------
 LABRADOR RETR   | BELLA                |    23
 LABRADOR RETR   | MAX                  |    20
 LABRADOR RETR   | SADIE                |    15
 LABRADOR RETR   | CHARLIE              |    14
 LABRADOR RETR   | DAISY                |    14
 LABRADOR RETR   | MAGGIE               |    13
 LABRADOR RETR   | RILEY                |    13
 CHIHUAHUA SH    | BUDDY                |    12
 CHIHUAHUA SH    | CHICO                |    12
 LABRADOR RETR   | MOLLY                |    12
 CHIHUAHUA SH    | BELLA                |    12
 LABRADOR RETR   | BEAR                 |    10
 LABRADOR RETR   | LUCY                 |    10
 GOLDEN RETR     | CHARLIE              |     9
 LABRADOR RETR   | BAILEY               |     9
 LABRADOR RETR   | STELLA               |     9
 LABRADOR RETR   | COCO                 |     8
 GERM SHEPHERD   | MAGGIE               |     8
 LABRADOR RETR   | DUKE                 |     8
 LABRADOR RETR   | LUNA                 |     8
 GERM SHEPHERD   | MAX                  |     8
 ...

From the results, it seems that there are a lot of Labrador Retrievers named Bella, but we also have a high number of short hair Chihuahua's with the same name. Bella is not the only name that is shared between breeds, but looking at the entire list of all the occurrences of Bella, or any dog for that matter is not efficient.

In fact, it's the same problem that we ran into in the first query where we have a repetition of names on separate rows, but this time it's because the names are listed with different breeds. The problem with this query is that if we decided to run \crosstabview, we'd exceed the number of columns allowed since the name column would be placed in the horizontal header. We could try to go around this by specifying that we want name in the vertical column and primarybreed in the horizontal column like \crosstabview name primarybreed, but we'd get a table that is extremely difficult to read.

In order to overcome this, we might want to select the top 10 names of dogs and then use those names to see what breeds tend to have those names. To do that, we'll use the following query, which is a modified version of the first query we ran in the article that selects only the animaltype = 'DOG' and is ordered in descending order according to the animal name:

SELECT  
    name, 
    animaltype,
    count(name)
FROM  
    names 
WHERE  
    animaltype = 'DOG'
GROUP BY  
    name, 
    animaltype
ORDER BY  
    3 DESC
LIMIT  
    10
\crosstabview

This gives us the following table with the top ten dog names:

  name   | DOG 
---------+-----
 BELLA   | 117
 LUCY    | 103
 BUDDY   | 102
 MAX     |  92
 DAISY   |  87
 CHARLIE |  77
 MOLLY   |  77
 SADIE   |  64
 JACK    |  60
 MAGGIE  |  56

Now that we know the top ten dog names, we can create a second query that narrows down the search and selects the number of dogs with those top ten names and the breeds that they belong to.

SELECT  
    primarybreed, 
    name, 
    count(primarybreed) 
FROM  
    names 
WHERE  
    animaltype = 'DOG' AND
    name LIKE ANY('{BELLA,LUCY,BUDDY,MAX,DAISY,CHARLIE,MOLLY,SADIE,JACK,MAGGIE}')  
GROUP BY  
    primarybreed,
    name
ORDER BY  
    3 DESC;

This will return a table that looks something like this:

  primarybreed   |  name   | count 
-----------------+---------+-------
 LABRADOR RETR   | BELLA   |    23
 LABRADOR RETR   | MAX     |    20
 LABRADOR RETR   | SADIE   |    15
 LABRADOR RETR   | CHARLIE |    14
 LABRADOR RETR   | DAISY   |    14
 LABRADOR RETR   | MAGGIE  |    13
 CHIHUAHUA SH    | BELLA   |    12
 CHIHUAHUA SH    | BUDDY   |    12
 ...

Now, using \crosstabview the results will be arranged according to the name of the dogs in the horizontal column and the primarybreed in the vertical column like:

  primarybreed   | BELLA | MAX | SADIE | CHARLIE | DAISY | MAGGIE | BUDDY | MOLLY | LUCY | JACK 
-----------------+-------+-----+-------+---------+-------+--------+-------+-------+------+------
 LABRADOR RETR   |    23 |  20 |    15 |      14 |    14 |     13 |     8 |    12 |   10 |    7
 CHIHUAHUA SH    |    12 |   4 |     1 |       2 |     6 |      5 |    12 |     1 |    6 |    7
 GOLDEN RETR     |     4 |   4 |     5 |       9 |     6 |      4 |     5 |     6 |    2 |    3
 GERM SHEPHERD   |     4 |   8 |     4 |       4 |     3 |      8 |     1 |     3 |    3 |    2
 POMERANIAN      |     4 |     |     1 |         |       |        |     3 |       |    5 |    1
 SHIH TZU        |     5 |   4 |     2 |       3 |     3 |      1 |     5 |     4 |    2 |    2
 PIT BULL        |     4 |   5 |     4 |       2 |     5 |        |     4 |     3 |    4 |    1
 AUST SHEPHERD   |     1 |   4 |     2 |       3 |     1 |      2 |     5 |     1 |    1 |     
 DACHSHUND       |     2 |   2 |     1 |       3 |     2 |      1 |     5 |     4 |    5 |    2
 ...

Using the first table to get the top ten dog names, we can already assume the order of the most popular dogs. However, the other question that we wanted to answer is whether there are particular breeds of dogs that have these top ten names. Instead of creating another query for this, we simply used \crosstabview to organize the name of dogs and the breeds in horizontal and vertical headers. The count was then dispersed throughout the grid forming what we have above.

From the data that's presented, we can determine that not only is Bella the most popular name, but it's the most popular name for Labrador Retrievers. At the same time, it's a pretty popular name for Chihuahuas, too. The table also tells us the most popular breed of dog for among the top ten names are Labrador Retrievers overwhelmingly, which might conclude that the inhabitants of Fircrest and Tacoma like their so-called family dogs.

Other interesting questions that might be answered with further data is whether pet owners prefer female over male dogs, and what names and breeds are preferred for males and females. According to the limited data presented here, it appears that female dogs are preferred over males just by looking at the top ten names. However, to make that claim we'd have to categorize the gender of all the pets according to their name, which may be easy to do with Pippy Long Stockings, Clarice, and Han Solo, but a little more difficult with Fluffy, Snickerdoodle, and Boo Boo.

There is a lot more that we could conclude from these results, but \crosstabview has provided, nonetheless, a way to easily take rows with figures and get meaningful result that would otherwise appear jumbled across a number of rows that we'd have to sift through, or create more complex queries to get similar results.

Summing up

The \crosstabview command only works in the psql shell. It's not a command that you can use in your application; for that, you will have to write a query that will produce the table structure you need, or use the crosstab function, which is included in the tablefunc extension. This extension is easy to add in Compose PostgreSQL by selecting the extension from the Compose console. However, if you simply want another view of your data from within the psql shell, then \crosstabview is a fantastic alternative that will make your life easier when trying to disect complicated datasets and the best part is that it comes out of the box with PostgreSQL 9.6.


If you have any feedback about this or any other Compose article, drop the Compose Articles team a line at articles@compose.com. We're happy to hear from you.

attribution Ricardo Gomez Angel

Abdullah Alger
Abdullah Alger is a former University lecturer who likes to dig into code, show people how to use and abuse technology, talk about GIS, and fish when the conditions are right. Coffee is in his DNA. Love this article? Head over to Abdullah Alger’s author page to keep reading.

Conquer the Data Layer

Spend your time developing apps, not managing databases.