Faster Operations with the JSONB Data Type in PostgreSQL

Lucero Del Alba takes a look at how to get better performance out of the jsonb data type in PostgreSQL in this Write Stuff article for Compose.

Since version 9.4, PostgreSQL offers a significant speedup when using the binary representation of JSON data, jsonb, which can give you that extra edge you need to increase your performance.

What's jsonb

The data types json and jsonb, as defined by the PostgreSQL documentation, are almost identical; the key difference is that json data is stored as an exact copy of the JSON input text, whereas jsonb stores data in a decomposed binary form; that is, not as an ASCII/UTF-8 string, but as binary code.

And this has some immediate benefits:

- More efficiency: since the data is already parsed, jsonb is significantly faster to process, as no reparsing is needed on each access.
- Indexing support: as we'll see later, jsonb columns can be indexed.

And some drawbacks:

- Slightly slower input, due to the added conversion overhead.
- Certain queries, especially aggregate ones, may be slower due to the lack of column statistics.

The reason behind this last issue is that, for any given column, PostgreSQL saves descriptive statistics such as the number of distinct and most common values, the fraction of NULL entries and, for ordered types, a histogram of the data distribution. None of this is available when the information is stored as JSON fields, and you will suffer a heavy performance penalty, especially when aggregating data (COUNT, AVG, SUM, etc.) across many JSON fields.

To avoid this, consider storing the data that you may want to aggregate later in regular columns.
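As a minimal sketch of this hybrid approach (the table and column names here are our own, not from the article), a frequently aggregated flag can live in a regular column while the rest of the document stays in jsonb:

CREATE TABLE books_hybrid (  
  book_id serial NOT NULL,
  published boolean,  -- regular column: gets statistics, so aggregates stay fast
  data jsonb          -- everything else stays as JSON
);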

For further commentary about this issue, you can read Heap's blog post When To Avoid JSONB In A PostgreSQL Schema.

Use Case: Book Entries

Let's use a toy model with book entries to illustrate some basic operations when working with JSON data in PostgreSQL.

The operations in this section are essentially the same whether you use json or jsonb, but let's review them to refresh what we can do with JSON, and to set up our use case before we get to the jsonb goodies.

Define a Column in a Table

Simply enough, we specify the data column with the jsonb data type:

CREATE TABLE books (  
  book_id serial NOT NULL,
  data jsonb
);

Insert JSON Data

To insert data into the books table, we just pass the whole JSON string as a field value:

INSERT INTO books VALUES (1, '{"title": "Sleeping Beauties", "genres": ["Fiction", "Thriller", "Horror"], "published": false}');  
INSERT INTO books VALUES (2, '{"title": "Influence", "genres": ["Marketing & Sales", "Self-Help", "Psychology"], "published": true}');  
INSERT INTO books VALUES (3, '{"title": "The Dictator''s Handbook", "genres": ["Law", "Politics"], "authors": ["Bruce Bueno de Mesquita", "Alastair Smith"], "published": true}');  
INSERT INTO books VALUES (4, '{"title": "Deep Work", "genres": ["Productivity", "Reference"], "published": true}');  
INSERT INTO books VALUES (5, '{"title": "Siddhartha", "genres": ["Fiction", "Spirituality"], "published": true}');  
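Note that because jsonb parses its input on the way in, malformed JSON is rejected at insert time; a hypothetical bad row like the one below would fail with an invalid input syntax error rather than being stored:

INSERT INTO books VALUES (6, '{"title": broken}');  
-- ERROR: invalid input syntax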

Query Data

We can now query specific keys within the JSON data:

SELECT data->'title' AS title FROM books;  

This returns the title, extracted from the JSONB data, as a column:

           title
---------------------------
 "Sleeping Beauties"
 "Influence"
 "The Dictator's Handbook"
 "Deep Work"
 "Siddhartha"
(5 rows)
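A related operator, ->>, extracts the value as text rather than jsonb, which is handy for comparisons and for output without the surrounding quotes:

SELECT data->>'title' AS title FROM books;  

This returns the same titles as plain text values, e.g. Sleeping Beauties without the double quotes.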

Filter Results

You may also filter a result set just as you normally would with the WHERE clause, only through JSON keys:

SELECT * FROM books WHERE data->'published' = 'false';  

Which in this case returns the raw JSON data:

 book_id |                                              data
---------+-------------------------------------------------------------------------------------------------
       1 | {"title": "Sleeping Beauties", "genres": ["Fiction", "Thriller", "Horror"], "published": false}
(1 row)

Expand Data

This is an important one, as it will enable us to use the aggregate functions that we are familiar with from relational databases, but in the otherwise counter-intuitive environment of JSON data.

SELECT jsonb_array_elements_text(data->'genres') AS genre  
FROM books  
WHERE book_id = 1;  

That will expand the JSON array into a column:

  genre
----------
 Fiction
 Thriller
 Horror
(3 rows)
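Combined with GROUP BY, this expansion lets us aggregate across the JSON arrays; for example, counting how many books fall under each genre in our toy data:

SELECT genre, COUNT(*) AS total  
FROM (SELECT jsonb_array_elements_text(data->'genres') AS genre FROM books) g  
GROUP BY genre  
ORDER BY total DESC;  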

Special jsonb Features

Besides efficiency, there are extra ways in which you can benefit from storing JSON in binary form.

One such enhancement is GIN (Generalized Inverted Index) indexes, and the new set of operators that comes with them.

Checking Containment

Containment tests whether one document (an object or an array) is contained within another. This can be done on jsonb data using the @> operator.

For example, the array ["Fiction", "Horror"] is contained in the array ["Fiction", "Thriller", "Horror"] (where t stands for true):

SELECT '["Fiction", "Thriller", "Horror"]'::jsonb @> '["Fiction", "Horror"]'::jsonb;  
t

The opposite however, ["Fiction", "Thriller", "Horror"] being contained in ["Fiction", "Horror"], is false:

SELECT '["Fiction", "Horror"]'::jsonb @> '["Fiction", "Thriller", "Horror"]'::jsonb;  
f

Using this principle we can easily check for a single book genre:

SELECT data->'title' FROM books WHERE data->'genres' @> '["Fiction"]'::jsonb;  
"Sleeping Beauties"
"Siddhartha"

Or multiple genres at once, by passing an array (notice that the order of the elements doesn't matter at all):

SELECT data->'title' FROM books WHERE data->'genres' @> '["Fiction", "Horror"]'::jsonb;  
"Sleeping Beauties"

Also, since version 9.5, PostgreSQL has been able to check for top-level keys and containment of empty objects:

SELECT '{"book": {"title": "War and Peace"}}'::jsonb @> '{"book": {}}'::jsonb;  
t

Checking Existence

As a variation on containment, jsonb also has an existence operator (?) which can be used to find whether an object key or array element is present.

Here, let's count the books that have an authors field:

SELECT COUNT(*) FROM books WHERE data ? 'authors';  

Only one in this case ("The Dictator's Handbook"):

count
-------
    1
(1 row)
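Two related jsonb operators extend this to lists of keys: ?| matches if any of the given keys exists, and ?& matches only if all of them do (the translators key below is hypothetical, just to show a key our data doesn't have):

SELECT COUNT(*) FROM books WHERE data ?| array['authors', 'translators'];  
SELECT COUNT(*) FROM books WHERE data ?& array['title', 'genres'];  

The first still counts "The Dictator's Handbook" alone, while the second matches all five books, since every entry has both a title and genres.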

Creating Indices

Let's take a moment to remind ourselves that indexes are a key component of relational databases. Without them, whenever we needed to retrieve a piece of information, the database would have to scan the entire table, which is, of course, very inefficient.

A dramatic improvement of jsonb over the json data type is the ability to index JSON data.

Our toy example only has 5 entries, but if there were thousands (or millions) of entries, we could dramatically cut seek times by building indices.

We could, for example, index published books:

CREATE INDEX idx_published ON books ((data->'published'));  

This simple index will automatically speed up any query that filters on published books (WHERE data->'published' = 'true'), including the aggregate functions we run over them.

And in fact, we could (and, as the DB size increases, probably should) index anything that's likely to be used in a WHERE clause when filtering results.
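For the containment (@>) and existence (?) operators we saw earlier, the usual tool is a GIN index over the whole document; both forms below are standard PostgreSQL, with the jsonb_path_ops variant producing a smaller, faster index that supports only the @> operator:

CREATE INDEX idx_data ON books USING GIN (data);  
-- smaller variant, supports @> only:
CREATE INDEX idx_data_paths ON books USING GIN (data jsonb_path_ops);  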

Caveats

There are just a few technicalities you need to consider when switching over to the jsonb data type.

jsonb is stricter, and as such, it disallows Unicode escapes for non-ASCII characters (those above U+007F) unless the database encoding is UTF8. It also rejects the NULL character (\u0000), which cannot be represented in PostgreSQL's text type.

It does not preserve white space: it will strip your JSON strings of leading/trailing white space as well as white space within the JSON string, undoing any careful formatting (which might not be a bad thing for you after all).

It does not preserve the order of object keys, treating keys in pretty much the same way as they are treated in Python dictionaries: unsorted. You'll need to find a way around this if you rely on the order of your JSON keys.

Finally, jsonb does not keep duplicate object keys (which, again, might not be a bad thing, especially if you want to avoid ambiguity in your data), storing only the last entry.

Conclusions

The PostgreSQL documentation recommends that most applications should prefer to store JSON data as jsonb since, as we've seen, there are significant performance enhancements and only minor caveats.

The features jsonb brings are so powerful that you may very well handle relational data in pretty much the same manner as you would in a regular RDBMS, but all in JSON, and with a very significant gain in performance, combining the practicality of a NoSQL solution with the power features of an RDBMS.

The main drawback when switching to jsonb is legacy code that may, for example, rely on the ordering of object keys; such code will need to be updated to work as expected. And to state the obvious: as a feature introduced in version 9.4, jsonb isn't backward-compatible, and the very jsonb keyword you need to set up your JSON tables will break your SQL code on legacy platforms.

Finally, notice that I've only covered some typical uses of these indices and operators; for more details and examples, have a look at jsonb indexing and the JSON functions and operators in the official PostgreSQL documentation.


Do you want to shed light on a favorite feature in your preferred database? Why not write about it for Write Stuff?

Lucero dances, [plays music](http://luzdealba.bandcamp.com/), writes about random topics, leads projects to varying and doubtful degrees of success, and keeps trying to be real, and keeps failing.

Image attribution: Lionello DelPiccolo

This article is licensed with CC-BY-NC-SA 4.0 by Compose.
