PostgreSQL's Future Is Looking Up-sert

The next edition of PostgreSQL, version 9.5, is pencilled in to arrive in the third quarter of this year, but the big features are already landing and we're tracking them at Compose. Top of the list of most wanted features is "upsert", the ability to insert a record into a table, unless it's already there in which case you update the record. It sounds terribly simple, but it's a pit of problems if you don't have a dedicated database mechanism for doing it.

Consider how you could do an "upsert", or "merge" as it is also known, without an appropriate command. You could SELECT to see if the record exists – if it does UPDATE it, if it doesn't INSERT it. There's just one big problem with this. Between the SELECT and the UPDATE or INSERT, something else could have changed or deleted the record. It's a big race condition waiting to make records magically persist or oddly update under load. You could try doing an UPDATE first and if the update fails for the record then INSERT it. Same problem, there's a race condition because you are returning to your application code, making a decision and then returning to the database to act on it. You can, of course, lock the table while you are doing this. Now, instead of a race condition, you have a table contention problem all the time.

Well, that's not very good. There are various techniques that can be applied. The PostgreSQL documentation includes an example upsert which uses a plpgsql function to try and get around the problems but which in turn opens up the question of working out what caused an error when inserting - the insert or something triggered by the insert. Other ideas include using transactions and rollback to upsert. Others use row triggers and a set of functions to handle upserting. But that's just piling performance draining complexity into the database.

Which leads us to what is coming in 9.5 which isn't called "upsert" or "merge" in the SQL syntax. As is typical of the work of the PostgreSQL developers, they took a step back and asked "What's the problem we're trying to solve here" and came up with the answer "conflict resolution". When an INSERT command is being run, if the primary key, or some other key, conflicts with a constraint on one of the indexes, that command would normally fail.

So the new "upsert" syntax lets you catch those conflicts and take alternative action. This is defined with ON CONFLICT. If you have a table of superheroes names and you want do an upsert operation the command would look something like this:

INSERT INTO superheroes (sid, supername)  
VALUES ( 6, "Cheeseman" ), (7, "Specialismo")  
ON CONFLICT (sid) DO UPDATE SET supername = EXCLUDED.supername;  

The sid field is, of course, the basis for our primary index and everything is pretty normal SQL till we read the ON CONFLICT. Each insertion is handled separately by the one or more ON CONFLICT clause you can have. First, it needs to work out what the conflict is; if the conflict matches the argument after ON CONFLICT then the commands after the DO are executed. This argument is called the conflict_target while the commands following the DO are called the conflict_action The conflict_target can be one or more column names or can use ON CONSTRAINT and directly name the index which gives a lot of flexibility in how you identify the source of the conflict. The conflict_action, on the other hand, can either be UPDATE or NOTHING.

We've kept it simple with our example – if there's a conflict concerning indexes on the sid column, we do the conflict_action. Our example has us resolving the conflict by updating supername field of the row in conflict. The EXCLUDED. syntax lets us refer to the values that would have been inserted if there wasn't a conflict. But there's more. You can also add a WHERE clause to this ON CONFLICT DO UPDATE so that you can only apply the update when the WHERE condition is met. We could have said

... ON CONFLICT ON CONSTRAINT superheroes_pkey DO ...

to use just the primary key index for the table. Not that useful in this case but consider where you have a number of indexes on a field and you want to react differently to conflicts on each one of them. The opposite of doing something is DO NOTHING and it's with this that you can ignore conflicts and leave the row alone.

There's no race conditions to worry about and no parsing of error codes and exceptions to figure out what went wrong with the insert. It's a simple, coherent extension of SQL's INSERT which encapsulates most use cases. We'll be looking forward to the third quarter this year when this and, no doubt, many other enhancements arrive in PostgreSQL 9.5.