Looking inside Books with PostgreSQL's hstore

Published

One of the highlights of PostgreSQL is its versatility, especially when you need a flexible data model. Here, we'll explore hstore, PostgreSQL's simple key-value store, and take a look at how it works using a catalogue of books.

PostgreSQL comes with a lot of features that makes working with unstructured and semi-structured data easy. hstore is a PostgreSQL extension that implements the hstore data type. It's a key-value data type for PostgreSQL that's been around since before JSON and JSONB data types were added. You can use it anywhere you need to store simple key-value pairs, but don't want to add an extra column (and run a migration) for every new key added. Since it supports GIN and GiST indexing, lookup is fast and memory efficient.

hstore can be used with, rather than a replacement for, the JSON and JSONB data types. You might consider hstore if you have ancillary data that doesn’t fit into your relational columns, isn’t frequently looked up, or doesn’t have a nested structure. For example, if we were talking about books in a bookstore, you could store the information that pertains to all books (ISBN, title, etc) as table columns, while storing simpler and less-accessed data such as descriptions of pictures, themes, or whether the book is part of a larger series, in an hstore column.

In this example, we'll show you how to use hstore by creating an catalog of books. Using an hstore column of book attributes, we'll look at how to add, delete, and edit this data, and how to query for book attributes using its specialized operators and functions.

So, let's get started by setting up the catalog of books ...

The book catalog

The catalog that we'll create comprises books having a title, author/editor, year, publisher, and an attribute column. The attribute columns contain key-value pairs that include ancillary information about a book. The ancillary data about each book is not standardized across the catalog and includes information like the number of pages, subject type, and its physical contents.

In order to use hstore, we have to install the extension. The extension comes with Compose PostgreSQL, so to install it run the following from the psql shell:

CREATE EXTENSION hstore;  

Next, we'll need a table to store the catalog of books. For that, we'll set up a table like the following:

CREATE TABLE books (  
    id SERIAL PRIMARY KEY,                                                                                                                                                                                                                  title TEXT, 
    author_editor TEXT, 
    year INT, 
    publisher TEXT,                                                                                                                                                                                 attributes hstore
);

Notice that the attributes column uses hstore. Once you create the table, you can confirm that the column has indeed been created using the psql metacommand \d books where we'll have a table like:

                             Table "public.books"
    Column     |  Type   |                     Modifiers                      
---------------+---------+----------------------------------------------------
 id            | integer | not null default nextval('books_id_seq'::regclass)
 title         | text    | 
 author_editor | text    | 
 year          | integer | 
 publisher     | text    | 
 attributes    | hstore  | 
Indexes:  
    "books_pkey" PRIMARY KEY, btree (id)

With the table set up, we can add some books to the table:

INSERT INTO books (title, author_editor, year, publisher, attributes) VALUES (  
    'The Iliad','Homer',1998,'Oxford UP',
    'category => "mythology", pages => "446", translation => "Robert Fitzgerald", introduction => "G. S. Kirk"'
), (
    'How to Read Egyptian Hieroglyphs','Mark Collier',1998,'University of California Press',
    'category => "language", binding => "hardback", pages => "179"'
), (
    'Beginner''s Latin','G. D. A. Sharpley',2003,'McGraw-Hill',
    'category => "language learning", language => "Latin", pages => "232", includes => "CD"'
), (
    'Game of Thrones','George R. R. Martin',1996,'Bantam',
    'category => "fiction", pages => "694", binding => "hardback", includes => "maps"'
), (
    'Old and Middle English: An Anthology','Elaine Treharne',2000,'Blackwell Publishers',
    'category => "anthology", language => "old english", language_2 => "middle english", pages => "622", translation => "Elaine Treharne"'
), (
    'History of Ancient Egypt','Erik Hornung',1999,'Cornell UP',
    'category => "history", pages => "185", translation => "David Lorton", translation_language => "German", binding => "paper"'
), (
    'The Making of Textual Culture: Grammatica and Literary Theory 350 - 1100','Martin Irvine',1994,'Cambridge UP',
    'category => "history", category_2 => "literary theory", pages => "604", series => "Cambridge Studies in Medieval Literature", series_num => "19"'
), (
    'Beowulf: A Verse Translation','Daniel Donoghue',2002,'W. W. Norton and Co.',
    'category => "literature", pages => "256", series => "Norton Critical Edition", translation => "Seamus Heaney"'
), (
    'A History of the Vikings','Gwyn Jones',1987,'Oxford UP',
    'category => "history", pages => "504", includes => "maps and plates"'
), (
    'PostGIS in Action','Regina O. Obe and Leo S. Hsu',2015,'Manning',''
);

Now that you have ten books in the catalog before we start querying let's show you how to set up an index on the attributes column.

Creating an index

After adding the books to the catalog, we should set up an index on the hstore column to reduce the time it takes for PostgreSQL to look for particular keys and values. hstore supports GIN and GiST indexing with @>, ?, ?&, and ?| operators and hash or btree indexes with the = operator.

There isn't a special way to create an index on an hstore column - you just have to index the entire column, not individual keys or values. To create a GIN index on the attributes column, we'd run the following SQL command:

CREATE INDEX idx_book_attrs ON books USING GIN(attributes);  

And that's all to it. Having an index set up, let's look at how we can work with our hstore data.

Adding, deleting, and editing hstore data

The attributes column is where the hstore data lives. As you can see, the key-value pairs are stored in a string like:

'category => "history", pages => "504", includes => "maps and plates"'  

In order to query this data, hstore comes with a variety of operators and functions that are useful for adding, modifying and deleting individual keys and values that are stored in the string.

If we look at the last book in the catalog "PostGIS in Action", we'll notice that it doesn't have attributes set up - it's just an empty column. To add some attributes to the book, we use the hstore function hstore(...) to update the attributes column like:

UPDATE books SET attributes = hstore(ARRAY[['pages', '570'],['category','gis'],['includes','code samples'],['media', 'ebook']]) WHERE id = 10;  

This example takes a two-dimensional array that includes the keys and values we want to add to the book. The hstore function will then add those values to the column. Updating the column for "PostGIS in Action" then using that function will give us:

       title       |                                   attributes                                    
-------------------+---------------------------------------------------------------------------------
 PostGIS in Action | "media"=>"ebook", "pages"=>"570", "category"=>"gis", "includes"=>"code samples"

To add to or update the attributes of books that already have values in the hstore column, we use the same UPDATE SQL command. For example, if we wanted to add on a "binding" attribute to the book "A History of the Vikings", we could do that using the hstore concatenate operator || like:

UPDATE books SET attributes = attributes || 'binding=>"paper"' WHERE id = 9;  

Giving us:

                                        attributes                                        
------------------------------------------------------------------------------------------
 "pages"=>"504", "binding"=>"paper", "category"=>"history", "includes"=>"maps and plates"

Updating the "includes" attribute, we use the same SQL command like;

UPDATE books SET attributes = attributes || 'includes=>"maps, plates, and pictures"' WHERE id = 9;  

Now giving us:

                                             attributes                                              
-----------------------------------------------------------------------------------------------------
 "pages"=>"504", "binding"=>"paper", "category"=>"history", "includes"=>"maps, plates, and pictures"

To delete an attribute, that's easy, too. For example, let's delete "category_2" from "The Making of Textual Culture". In the SQL query, we only need to define the attributes column and key.

UPDATE books SET attributes = delete(attributes,'category_2') WHERE id = 7;  

And now the book's attributes are:

                                                   attributes                                                    
-----------------------------------------------------------------------------------------------------------------
 "pages"=>"604", "series"=>"Cambridge Studies in Medieval Literature", "category"=>"history", "series_num"=>"19"

Querying hstore

Now that adding, editing, and deleting attributes is taken care of, let's focus on querying data in the hstore column. As previously mentioned, hstore has a variety of operators we can use to get keys and values.

The most common operator for searching through hstore is ->, which we can use to lookup a specific key. For example, if we want to look for all the "category" keys with the value "history", we'd write something like:

SELECT title, author_editor, year FROM books WHERE attributes -> 'category' = 'history';  

This will give us:

                                  title                                   | author_editor | year 
--------------------------------------------------------------------------+---------------+------
 History of Ancient Egypt                                                 | Erik Hornung  | 1999
 The Making of Textual Culture: Grammatica and Literary Theory 350 - 1100 | Martin Irvine | 1994
 A History of the Vikings                                                 | Gwyn Jones    | 1987

Using other operators, we could also create a query that looks for a particular key in each row instead of values. For example, searching for books that contain the key "language" we'd use the ? operator that returns either true or false. Therefore, creating an SQL command for that would look like:

SELECT title, author_editor, attributes FROM books WHERE attributes ? 'language';  

This will give us two books:

                title                 |   author_editor   |                                                              attributes                                                              
--------------------------------------+-------------------+--------------------------------------------------------------------------------------------------------------------------------------
 Beginner's Latin                     | G. D. A. Sharpley | "pages"=>"232", "category"=>"language learning", "includes"=>"CD", "language"=>"Latin"
 Old and Middle English: An Anthology | Elaine Treharne   | "pages"=>"622", "category"=>"anthology", "language"=>"old english", "language_2"=>"middle english", "translation"=>"Elaine Treharne"

If we are ambitious and want to check for more than one attribute, we might change the previous query and add the hstore ?| operator. This operator checks whether any of the attributes occur in the books. So using the ARRAY data type, we'll include the keys we want to search for.

SELECT title, author_editor, attributes FROM books WHERE attributes ?| ARRAY['translation', 'binding'];  

This gives us the following books:

                title                 |    author_editor    |                                                              attributes                                                              
--------------------------------------+---------------------+--------------------------------------------------------------------------------------------------------------------------------------
 The Iliad                            | Homer               | "pages"=>"446", "category"=>"mythology", "translation"=>"Robert Fitzgerald", "introduction"=>"G. S. Kirk"
 How to Read Egyptian Hieroglyphs     | Mark Collier        | "pages"=>"179", "binding"=>"hardback", "category"=>"language"
 Game of Thrones                      | George R. R. Martin | "pages"=>"694", "binding"=>"hardback", "category"=>"fiction", "includes"=>"maps"
 Old and Middle English: An Anthology | Elaine Treharne     | "pages"=>"622", "category"=>"anthology", "language"=>"old english", "language_2"=>"middle english", "translation"=>"Elaine Treharne"
 Beowulf: A Verse Translation         | Daniel Donoghue     | "pages"=>"256", "series"=>"Norton Critical Edition", "category"=>"literature", "translation"=>"Seamus Heaney"
 History of Ancient Egypt             | Erik Hornung        | "binding"=>"paper"

We can also get a little creative with the queries and return all the books over 300 pages that have "category" as "history". Since the "pages" key is text, we'll have to cast it as an integer first.

SELECT title, year, attributes -> 'pages' AS pages  
FROM books  
    WHERE (attributes -> 'pages')::integer > 300 
        AND attributes -> 'category' = 'history';

This query will give us two books:

                                  title                                   | year | pages 
--------------------------------------------------------------------------+------+-------
 The Making of Textual Culture: Grammatica and Literary Theory 350 - 1100 | 1994 | 604
 A History of the Vikings                                                 | 1987 | 504

Casting the string value to an integer is not the optimal way you should approach this problem, so you might consider placing pages either in its own column, or just going forth and converting everything to JSONB.

Converting to JSONB

If you still would like to convert your data to JSON, hstore has functions to do that. The functions can convert the attributes into a matrix, array, JSON, or JSONB. For instance, if we want the output of the previous query as JSONB, we'd change it to:

SELECT title, author_editor, hstore_to_jsonb(attributes) FROM books WHERE attributes ?| ARRAY['translation', 'binding'];  

Note we use the hstore_to_json() function and place the attributes column inside. The results of the query look like:

                title                 |    author_editor    |                                                            hstore_to_jsonb                                                             
--------------------------------------+---------------------+----------------------------------------------------------------------------------------------------------------------------------------
 The Iliad                            | Homer               | {"pages": "446", "category": "mythology", "translation": "Robert Fitzgerald", "introduction": "G. S. Kirk"}
 How to Read Egyptian Hieroglyphs     | Mark Collier        | {"pages": "179", "binding": "hardback", "category": "language"}
 Game of Thrones                      | George R. R. Martin | {"pages": "694", "binding": "hardback", "category": "fiction", "includes": "maps"}
 Old and Middle English: An Anthology | Elaine Treharne     | {"pages": "622", "category": "anthology", "language": "old english", "language_2": "middle english", "translation": "Elaine Treharne"}
 Beowulf: A Verse Translation         | Daniel Donoghue     | {"pages": "256", "series": "Norton Critical Edition", "category": "literature", "translation": "Seamus Heaney"}
 History of Ancient Egypt             | Erik Hornung        | {"binding": "paper"}

As you can see, hstore successfully converted the results to JSONB, which you could put in another table to query using JSONB operators and functions. With hstore, flexibility is king.

Happy searching ...

This is a simple introduction to hstore in PostgreSQL. While it's not as versatile as JSON or JSONB, since they are fully-fledged document store types, it can be easily incorporated into new or old tables when you don't want to create an entirely new set of columns for data that is not looked up frequently. While we suggest using JSONB for its versatility and power, we also want share with you alternatives to explore, especially some of the lesser used features of PostgreSQL.


Read more articles about Compose databases - use our Curated Collections Guide for articles on each database type. If you have any feedback about this or any other Compose article, drop the Compose Articles team a line at articles@compose.com. We're happy to hear from you.

attribution Alex Block

Abdullah Alger
Abdullah Alger is a former University lecturer who likes to dig into code, show people how to use and abuse technology, talk about GIS, and fish when the conditions are right. Coffee is in his DNA. Love this article? Head over to Abdullah Alger’s author page to keep reading.

Conquer the Data Layer

Spend your time developing apps, not managing databases.