Using JSON Extensions in PostgreSQL from Python

In this Write Stuff article, Ryan Scott Brown takes a look at how you can work with PostgreSQL's JSON and JSONB support from the comfort of Python.

JSON is everywhere on the modern web, so it's no surprise that developers want to store data the way they use it. JSON is flexible, easy to access programmatically, and easy to use from any language. NoSQL databases have capitalized on this by offering plenty of JSON storage options.

While the NoSQL movement is most associated with JSON, mature databases like PostgreSQL can work with JSON too. PostgreSQL gives you the power to use JSON for flexibility, but as part of schemaful solutions.

The JSON (since 9.2) and JSONB (since 9.4) data types support indexing and advanced queries, and let you change what you store without changing the database schema.

Not "schema-less" just "less schema"

JSON and JSONB are available in PostgreSQL, but what distinguishes them from JSON-native stores? Relational database technology has been around the block a few times, and has picked up a few tricks. Specifying schemas up-front may be painful for teams that need to change quickly (read: lots of them), but features like triggers, stored procedures, PostGIS extensions, and more make it worth the effort. Beyond features, the ecosystem of battle-hardened tools built with and for PostgreSQL is compelling.

Flexible NoSQL vs. inflexible schemaful SQL isn't a binary choice. With PostgreSQL you can use schemas for the things you're certain about, and not specify a schema for data you're unsure of. Let's test this out by storing some user profiles with contact information.

In our example, we'll store names and emails in normal data types since we know they'll be needed, but use schemaless data types to store extra data.
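
To make that concrete, here's the shape of a hypothetical profile: the name and email map to typed columns, while everything else rides along in a JSON document (the field names here are made up for illustration).

profile = {
    "name": "Jane JSON",             # typed column
    "email": "jane@example.com",     # typed column
    "doc": {                         # schemaless JSON document
        "twitter": "@janejson",
        "address": {"street": "Main St.", "zip": 12345},
    },
}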

Connect to Compose

If you'd like to follow along, now is the time to log in to your Compose account and spin up a new PostgreSQL installation.

I'll wait. You'll only be a couple of minutes.

All the code examples are in this IPython notebook to save some copying if you'd like to run them. You will need psycopg2, SQLAlchemy, and Jupyter. Once you've installed the dependencies, paste your connection string (it starts with postgresql://) into the connection_string variable.

connection_string = 'postgresql://user:[redacted]@[compose db host]:10187/compose'

import json
import sqlalchemy

# create_engine() builds an Engine; connect() checks out a connection from it
db = sqlalchemy.create_engine(connection_string)
engine = db.connect()
# MetaData bound to the connection will collect our table definitions
meta = sqlalchemy.MetaData(engine)

Now check that the connection worked:

>>> result = engine.execute("SELECT 1")
>>> print(result.rowcount)
1  

Great, now let's define a new table.

JSON vs. JSONB

JSON and JSONB look very similar on the surface. Both allow nesting and all the JSON data types (numbers, strings, booleans, lists, and maps); the older HSTORE extension, by contrast, only supports string values and a single level of nesting. When using JSON and JSONB you may not notice much of a difference, since operators like ->> work on both.
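
A quick way to see that for yourself (a minimal sketch, reusing the engine connection from earlier and literal casts instead of a table):

result = engine.execute(
    """SELECT '{"dialect": "params"}'::json ->> 'dialect',
              '{"dialect": "params"}'::jsonb ->> 'dialect'""")
print(result.fetchone())  # ('params', 'params') -- same answer from both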

This is a great time to mention another article, "Is PostgreSQL Your Next JSON Database?", which has a deeper treatment of indexing and performance differences between JSON and JSONB. For this article, I'll stick with JSON even though all the examples would also work with JSONB.

Setting Up

First, we'll make an ER diagram to define the data model. ... Kidding, with JSON data types we can just get rolling with a small schema. SQLAlchemy lets us define the table we want as a Python object and have SQLAlchemy create it in PostgreSQL (or not, if it exists already).

from sqlalchemy import Column, Integer, Text
from sqlalchemy.dialects.postgresql import JSON, JSONB

sqlalchemy.Table("jsontable", meta,
                 Column('id', Integer),
                 Column('name', Text),
                 Column('email', Text),
                 Column('doc', JSON))
meta.create_all()  # creates the table, skipping it if it already exists

Now, let's add some fake data. If you want to follow along, get the IPython notebook I've put together with the examples for the rest of the article. The notebook showcases a couple of different ways to use Core to insert values, but it's not too important for understanding how JSON works in PostgreSQL.
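
As a hedged sketch of one such approach, Core will happily take a list of dictionaries (one per row) and run the insert for each; the people below are made-up sample data:

people = meta.tables['jsontable']  # the table object we defined above
engine.execute(
    people.insert(),
    [
        {"id": 1, "name": "Jane JSON", "email": "jane@example.com",
         "doc": {"dialect": "json", "address": {"zip": 11111}}},
        {"id": 2, "name": "Harold Hstore", "email": "harold@example.com",
         "doc": {"dialect": "hstore"}},
    ])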

SQLAlchemy Core

SQLAlchemy has two layers – the core and the ORM. The ORM maps properties of Python objects to SQL statements to save those properties in one or more tables. Core is built to provide easy access to SQL while protecting against accidental SQL injection and making hard or annoying tasks simpler. If you (or your team) are new to Python but intimately familiar with SQL, Core is a good pick.

Let's add a person. The table object will let us generate the insert statement automatically, and escape any data that could otherwise contain SQL escape characters (a security danger zone).

>>> j_table = sqlalchemy.table(
        "jsontable", Column('id', Integer),
        Column('name', Text),
        Column('email', Text),
        Column('doc', JSON))
>>> statement = j_table.insert().values(
        id=3,
        name="Mr. Params",
        email="use@params.com",
        doc={
            "dialect": "params",
            "address": {"street": "Main St.", "zip": 12345},
        },
    )
>>> engine.execute(statement)
>>> print(str(statement))
INSERT INTO jsontable (id, name, email, doc) VALUES (:id, :name, :email, :doc)  

As you can see, the statement doesn't show us the values we passed, but that's OK. Note that we can pass in the Python dict type and SQLAlchemy will handle converting it to JSON. It would do the same sort of cast if we passed a float to an Integer column.
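
If you do want to inspect the bound values, the compiled statement keeps them separate from the SQL text. Roughly (a small aside, not needed for normal use):

>>> print(statement.compile().params)
{'id': 3, 'name': 'Mr. Params', 'email': 'use@params.com', 'doc': {'dialect': 'params', 'address': {'street': 'Main St.', 'zip': 12345}}}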

To find the user again, we can use .select() and .where to generate a similar statement.

>>> find_user = j_table.select().where(j_table.c.name == "Mr. Params")
>>> engine.execute(find_user).fetchone()
(3, 'Mr. Params', 'use@params.com', {'dialect': 'params', 'address': {'street': 'Main St.', 'zip': 12345}})

There you go, and the dictionary is back out – note that we didn't need to deserialize the JSON; SQLAlchemy handled it automatically. Now let's query against the JSON data. Since querying inside a document doesn't use the plain = operator in PostgreSQL, we can't use the == operator on the column in the SQLAlchemy query.

In PostgreSQL we would use the ->> operator where doc->>'dialect' = 'params'. To express that in SQLAlchemy we must use j_table.c.doc['dialect'].astext == 'params' to get to a key in our document column.
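
Putting that together as a full query:

>>> find_dialect = j_table.select().where(
        j_table.c.doc['dialect'].astext == 'params')
>>> engine.execute(find_dialect).fetchone()
(3, 'Mr. Params', 'use@params.com', {'dialect': 'params', 'address': {'street': 'Main St.', 'zip': 12345}})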

That's still pretty neat. Now, what about nested JSON or data types other than text? SQLAlchemy can cast data for you, so you can query any of the supported JSON types using the SQLAlchemy types you're used to by replacing .astext with .cast(SomeType) in the query.

>>> find_zip = j_table.select().where(
        j_table.c.doc[
            ('address', 'zip') # specify the path as a tuple/list of keys
        ].cast(
            sqlalchemy.Integer # zip codes are stored as numbers, cast the query
        ) == 12345) # finally, the zip we're looking for
>>> engine.execute(find_zip).fetchone()
(3, 'Mr. Params', 'use@params.com', {'dialect': 'params', 'address': {'street': 'Main St.', 'zip': 12345}})

Take note that we selected our result based on nested JSON by specifying a tuple of keys, ('address', 'zip'). Then we needed to cast the contents of the zip key as an Integer to query against it properly.

SQLAlchemy ORM (with Declarative Base)

The biggest difference between the ORM and Core (covered in the last section) is that the ORM is built to let you design applications around business objects. Core is made for querying and schema definitions, and can be great for analytics, reporting, and other workloads where full business objects may not be very useful. If you're interested in Domain-Driven Design, you'll feel right at home with the ORM.

First, we'll build our business object (a user) with the same schema we had earlier. The columns are actually the same, we're just binding them to attributes on an object.

from sqlalchemy.ext.declarative import declarative_base  
from sqlalchemy.orm import sessionmaker

Base = declarative_base()  
class User(Base):  
    __tablename__ = 'jsontable'
    id = Column(Integer, primary_key=True)
    name = Column(Text)
    email = Column(Text)
    doc = Column(JSON)

The Base object collects information about all objects that inherit from it, and maps the columns they specify to table columns.

Base.metadata.create_all(engine)  
SessionFactory = sessionmaker(engine)  

A key difference between the objects you get from the ORM and the tuples you get from SQLAlchemy Core queries is that ORM objects are bound to the session they came from, and newly created objects must be added to a session before they can be saved. The SessionFactory creates sessions (well...yeah) that you can use in queries and to persist changes.

session = SessionFactory()  
u = User(  
    id=4,
    name="Oscar ORM",
    email="me@orms.com",
    doc={"address": {"zip": 5678, "street": "Cross St."}})
session.add(u)  
session.commit()  

It helps to think of an ORM session as a single transaction. While you can add as many objects as you want to a single session, limiting a transaction to a single logical set of actions is much better. This way, if a .commit fails, it only affects a single business transaction. It also makes it easier to enforce other logic at the database level, like uniqueness of usernames, emails, or other attributes.
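
As a minimal sketch of that idea (assuming, hypothetically, that emails carried a unique constraint), a failed commit can be rolled back without disturbing any other work:

from sqlalchemy import exc

session = SessionFactory()
session.add(User(id=5, name="Dupe", email="me@orms.com", doc={}))
try:
    session.commit()
except exc.SQLAlchemyError:
    session.rollback()  # only this one logical transaction is lost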

Querying looks pretty familiar with the ORM, though using the User object rather than the table schema.

>>> uu = session.query(User).filter(User.id == 3).one()
>>> print(uu.name)
Mr. Params
>>> print(uu.doc)
{'dialect': 'params', 'address': {'zip': 12345, 'street': 'Main St.'}}

As with Core, the JSON type is unpacked to a Python dictionary automatically. Querying works the same way as well.

>>> uu = session.query(User).filter(
    User.doc[
        ('address', 'zip')
    ].cast(sqlalchemy.Integer) == 5678
).one()
>>> print(uu.doc)
{'address': {'zip': 5678, 'street': 'Cross St.'}}

Wrapping Up

Choosing a database isn't a final decision - you can always rip it out and replace it - but choosing one that doesn't fit your needs is costly. PostgreSQL is a flexible choice because it lets you use documents where you want them, while still taking advantage of great performance and type-enforcing schemas.

In Python, SQLAlchemy lets you choose between using the database directly, or designing your application around business objects and letting the ORM persist them for you. In either case, you'll get access to all the JSON and JSONB features in PostgreSQL and be able to use them easily from Python.

Ryan Scott Brown is a software engineer at Red Hat, and spends most of his time working in Python on OpenStack and other Open Source projects. Find his projects on GitHub or read about them on his blog.

This article is licensed with CC-BY-NC-SA 4.0 by Compose.