Faster Operations with the JSONB Data Type in PostgreSQL
Lucero Del Alba takes a look at how to get better performance out of the jsonb data type in PostgreSQL in this Compose Write Stuff article.
Since version 9.4, PostgreSQL offers a significant speedup when using the binary representation of JSON data, jsonb, which can give you that extra edge you need to increase your performance.
What's jsonb
The data types json and jsonb, as defined by the PostgreSQL documentation, are almost identical; the key difference is that json data is stored as an exact copy of the JSON input text, whereas jsonb stores data in a decomposed binary form; that is, not as an ASCII/UTF-8 string, but as binary code.
And this has some immediate benefits:
- more efficiency,
- significantly faster to process,
- supports indexing (which can be a significant advantage, as we'll see later),
- simpler schema designs (replacing entity-attribute-value (EAV) tables with jsonb columns, which can be queried, indexed and joined, allowing for performance improvements of up to 1000X!)
And some drawbacks:
- slightly slower input (due to added conversion overhead),
- it may take more disk space than plain json due to a larger table footprint, though not always,
- certain queries (especially aggregate ones) may be slower due to the lack of statistics.
The reason behind this last issue is that, for any given column, PostgreSQL saves descriptive statistics such as the number of distinct and most common values, the fraction of NULL entries, and --for ordered types-- a histogram of the data distribution. All of this will be unavailable when the info is entered as JSON fields, and you will suffer a heavy performance penalty, especially when aggregating data (COUNT, AVG, SUM, etc.) across tons of JSON fields.
To avoid this, consider storing the data that you'll later aggregate in regular columns.
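As a sketch of that idea, suppose a table stores orders with a jsonb payload that includes a numeric price we often average (the table and column names here are hypothetical): promoting the value to a regular, typed column gives the planner statistics to work with.

```sql
-- Hypothetical table: the price initially lives inside the jsonb payload
CREATE TABLE orders (
    order_id serial NOT NULL,
    data jsonb
);

-- Promote the frequently aggregated value to a regular, typed column...
ALTER TABLE orders ADD COLUMN price numeric;
UPDATE orders SET price = (data->>'price')::numeric;

-- ...so that aggregates like this one can rely on column statistics:
SELECT AVG(price) FROM orders;
```

The same aggregate over data->>'price' would work, but without the statistics PostgreSQL keeps for regular columns.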
For further commentary about this issue, you can read Heap's blog post When To Avoid JSONB In A PostgreSQL Schema.
Use Case: Book Entries
Let's use a toy model with book entries to illustrate some basic operations when working with JSON data in PostgreSQL.
The operations in this section will be essentially the same whether you use json or jsonb, but let's review them to refresh what we can do with JSON and to set up our use case before we get to the jsonb goodies.
Define a Column in a Table
Simply enough, we specify the data column with the jsonb data type:
CREATE TABLE books (
book_id serial NOT NULL,
data jsonb
);
Insert JSON Data
To enter data into the books table, we just pass the whole JSON string as a field value:
INSERT INTO books VALUES (1, '{"title": "Sleeping Beauties", "genres": ["Fiction", "Thriller", "Horror"], "published": false}');
INSERT INTO books VALUES (2, '{"title": "Influence", "genres": ["Marketing & Sales", "Self-Help ", "Psychology"], "published": true}');
INSERT INTO books VALUES (3, '{"title": "The Dictator''s Handbook", "genres": ["Law", "Politics"], "authors": ["Bruce Bueno de Mesquita", "Alastair Smith"], "published": true}');
INSERT INTO books VALUES (4, '{"title": "Deep Work", "genres": ["Productivity", "Reference"], "published": true}');
INSERT INTO books VALUES (5, '{"title": "Siddhartha", "genres": ["Fiction", "Spirituality"], "published": true}');
Query Data
We can now query specific keys within the JSON data:
SELECT data->'title' AS title FROM books;
This returns the title, extracted from the JSONB data, as a column:
title
---------------------------
"Sleeping Beauties"
"Influence"
"The Dictator's Handbook"
"Deep Work"
"Siddhartha"
(5 rows)
Filter Results
You may also filter a result set just as you normally would with the WHERE clause, only through JSON keys:
SELECT * FROM books WHERE data->'published' = 'false';
Which in this case returns the raw JSON data:
book_id | data
---------+-------------------------------------------------------------------------------------------------
1 | {"title": "Sleeping Beauties", "genres": ["Fiction", "Thriller", "Horror"], "published": false}
(1 row)
Expand Data
This is an important one, as it enables us to use the aggregate functions that we are familiar with from relational databases, but in the otherwise counter-intuitive environment of JSON data.
SELECT jsonb_array_elements_text(data->'genres') AS genre
FROM books
WHERE book_id = 1;
That will expand the JSON array into a column:
genre
----------
Fiction
Thriller
Horror
(3 rows)
Special jsonb Features
Besides efficiency, there are extra ways in which you can benefit from storing JSON in binary form.
One such enhancement is GIN (Generalized Inverted Index) indexes and the new brand of operators that come with them.
Checking Containment
Containment tests whether one document (an object or an array) is contained within another. This can be done on jsonb data using the @> operator.
For example, the array ["Fiction", "Horror"] is contained in the array ["Fiction", "Thriller", "Horror"] (in the output, t stands for true):
SELECT '["Fiction", "Thriller", "Horror"]'::jsonb @> '["Fiction", "Horror"]'::jsonb;
t
The opposite, however --["Fiction", "Thriller", "Horror"] being contained in ["Fiction", "Horror"]-- is false:
SELECT '["Fiction", "Horror"]'::jsonb @> '["Fiction", "Thriller", "Horror"]'::jsonb;
f
Using this principle we can easily check for a single book genre:
SELECT data->'title' FROM books WHERE data->'genres' @> '["Fiction"]'::jsonb;
"Sleeping Beauties"
"Siddhartha"
Or multiple genres at once, by passing an array (notice that the order of the elements won't matter at all):
SELECT data->'title' FROM books WHERE data->'genres' @> '["Fiction", "Horror"]'::jsonb;
"Sleeping Beauties"
Also, since version 9.5 PostgreSQL can check for top-level keys and containment of empty objects:
SELECT '{"book": {"title": "War and Peace"}}'::jsonb @> '{"book": {}}'::jsonb;
t
Checking Existence
As a variation on containment, jsonb also has an existence operator (?), which can be used to find whether an object key or array element is present.
Here, let's count the books with the authors field entered:
SELECT COUNT(*) FROM books WHERE data ? 'authors';
Only one in this case ("The Dictator's Handbook"):
count
-------
1
(1 row)
Creating Indices
Let's take a moment to remind ourselves that indexes are a key component of relational databases. Without them, whenever we need to retrieve a piece of information, the database would scan the entire table, which is, of course, very inefficient.
A dramatic improvement of jsonb over the json data type is the ability to index JSON data.
Our toy example only has 5 entries, but if there were thousands --or millions-- of entries, we could dramatically cut seek times by building indices.
We could, for example, index published books:
CREATE INDEX idx_published ON books ((data->'published'));
This simple index will automatically speed up all the aggregate functions that we run on published books (WHERE data->'published' = 'true').
And in fact, as the DB size increases, we could --and probably should-- index anything that's likely to appear in a WHERE clause when filtering results.
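For the containment and existence queries shown earlier, a GIN index over the whole column is usually the better fit; as a sketch (the index names are arbitrary):

```sql
-- A GIN index covering the @>, ?, ?| and ?& operators on the whole document
CREATE INDEX idx_books_data ON books USING GIN (data);

-- Alternatively, the jsonb_path_ops operator class supports only @>
-- but produces a smaller, faster index for containment queries:
CREATE INDEX idx_books_data_path ON books USING GIN (data jsonb_path_ops);
```

With either index in place, a query such as WHERE data->'genres' @> '["Fiction"]' can use the index instead of scanning the table.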
Caveats
There are just a few technicalities you need to consider when switching to the jsonb data type.
jsonb is stricter, and as such, it disallows Unicode escapes for non-ASCII characters (those above U+007F) unless the database encoding is UTF8. It also rejects the NULL character (\u0000), which cannot be represented in PostgreSQL's text type.
It does not preserve white space: it will strip your JSON strings of leading/trailing white space as well as white space within the JSON string, reformatting your input (which might not be a bad thing for you, after all).
It does not preserve the order of object keys: jsonb stores keys in its own internal order (shorter keys first, then lexicographically), not the order in which you entered them. You'll need to find a way around this if you rely on the order of your JSON keys.
Finally, jsonb does not keep duplicate object keys (which, again, might not be a bad thing, especially if you want to avoid ambiguity in your data), storing only the last entry.
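You can see these last three caveats in action in a quick psql session; the comments show the normalized values jsonb produces:

```sql
-- White space is normalized away:
SELECT '  {"a":   1}  '::jsonb;      -- {"a": 1}

-- Object key order is not preserved (jsonb orders keys itself):
SELECT '{"bb": 1, "a": 2}'::jsonb;   -- {"a": 2, "bb": 1}

-- Duplicate keys are collapsed, keeping only the last entry:
SELECT '{"a": 1, "a": 2}'::jsonb;    -- {"a": 2}
```

Running the same casts with ::json instead would return each input string exactly as typed.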
Conclusions
The PostgreSQL documentation recommends that most applications should prefer to store JSON data as jsonb, since as we've seen there are significant performance enhancements and only minor caveats.
The features jsonb brings are so powerful that you may very well handle relational data in pretty much the same manner as you would in a regular RDBMS, but all in JSON and with a very significant gain in performance, combining the practicality of a NoSQL solution with the power features of an RDBMS.
The main drawback when switching to jsonb is legacy code that may, for example, rely on the ordering of object keys; that code will need to be updated to work as expected. And to state the obvious, as a feature introduced in version 9.4, jsonb isn't backward-compatible, and the very jsonb keyword you need to set up the JSON tables will break your SQL code on legacy platforms.
Finally, notice that I covered some typical uses of the indices and their operators; for more details and examples, have a look at jsonb indexing and the JSON functions and operators in the official PostgreSQL documentation.
Do you want to shed light on a favorite feature in your preferred database? Why not write about it for Write Stuff?
Image attribution: Lionello DelPiccolo