Common misconceptions about locking in PostgreSQL

Do you understand locking in PostgreSQL? Robert M. Wysocki has come across quite a few people who don't and in this Write Stuff article, he looks at some of the myths that exist around the subject.

At the core of any good database is a great locking mechanism to ensure that data is quickly, safely and consistently updated. In my work as a DBA though, I've come across many misconceptions about locking and even some senior developers I talked with were unclear as to the internal workings of PostgreSQL. Understanding some of the concepts should go a long way in helping to create faster and more responsive applications.

Locks are the key element of PostgreSQL's MVCC – Multi-Version Concurrency Control - mechanism, ensuring the data consistency when working with many concurrent read/write transactions. They are also absolutely necessary for the database because without them it would be impossible to tell, which row version is current and therefore correct and RDBMS wouldn't be able to perform operations in the proper order. Let's try to identify some of the common myths about them.

The more locks the worse the database health

I hear this quite often. A sysadmin or a developer takes a look at a "locks" munin graph, sees a spike and starts lamenting about how many locks there were/are, how bad that is and that we should do something about it... Well, it isn't like that.

The first thing to understand is that locks are there to actually guarantee your databases' health. It would be a much worse sign to see an unnaturally low number of locks than it would be to see it high. This number is a reflection of what's going on with your data; if it's too low or perhaps close to zero, it might in fact mean that there's nothing going on, and that's usually a bad thing. Ultimately you want your database to be active, if it isn't, then probably your application is down, your domain expired, your users went with your competition or some other disaster happened. On the other hand you should keep a close eye on high number of locks, but never assume that it's bad until you'll consult with your DBA.

Another thing to remember is that even a simple transaction will create a bunch of locks - see the following transcript:

demo=# create table dummy (abc int);
CREATE TABLE
demo=# insert into dummy select * from generate_series(1,100);
INSERT 0 100
demo=# begin;
BEGIN
demo=# update dummy set abc = 123;
UPDATE 100
demo=# select count(*) from pg_locks where virtualtransaction = (select virtualtransaction from pg_locks where mode='ExclusiveLock' and transactionid::text=txid_current()::text) and (relation = (select relid from pg_stat_user_tables where relname = 'dummy') or relation is null);
 count 
 -------
      3
 (1 row)

There's a good chance that all of these locks will show up on your graph, but that's perfectly normal and there's no point in worrying about it.

Let's take a closer look:

demo=# select locktype, relation, mode from pg_locks where virtualtransaction = (select virtualtransaction from pg_locks where mode='ExclusiveLock' and transactionid::text=txid_current()::text) and (relation = (select relid from pg_stat_user_tables where relname = 'dummy') or relation is null);
   locktype    | relation |       mode       
---------------+----------+------------------
 relation      |   178844 | RowExclusiveLock
 virtualxid    |          | ExclusiveLock
 transactionid |          | ExclusiveLock
(3 rows)

What you can see here are the 3 locks acquired by the UPDATE query we executed before. If you were expecting only one - well, you were wrong.

First we have RowExclusiveLock - this one tells us that there were some row-level write operations made - we did an UPDATE so no wonder we have a lock in this mode. The second and the third are actually related to each other - you see, when a transaction starts, it takes an ExclusiveLock on its identifier. This is necessary for other transactions to know that this particular one is still in progress. There's a slight difference when it comes to the lock type - until the transaction doesn't make a change in the data, it has a virtual identifier; only after a write operation is performed is the transaction assigned a permanent identifier. In our case the transaction first took a lock on it's virtual id and then, after issuing the UPDATE query it also took another one on its perm id.

All the locks acquired by a transaction are in effect till its end - be it a COMMIT or a ROLLBACK.

So, as you can see, there can be many locks even on a database with an average workload and it doesn't have to mean that something is wrong.

All locks are bad

This is actually related to the previous myth and it's especially visible in all sorts of graphing systems. There are two kinds of locks, but usually they're both presented on the same graph and this might be confusing even for experienced system administrators (though I've also seen it happen to some junior DBAs).

So what you really should know and understand is that we have granted locks and we have waiting locks. You shouldn't really worry about the number of granted ones unless you also have a lot of waiting ones. The names might be self-explanatory, but just to be clear let's describe both kinds.

When a transaction tries to acquire a lock, it might be granted or the transaction might have to wait for it. Let's look at two concurrent sessions:

demo=# \set PROMPT1 '%`date +%Y%m%d-%H:%M:%S` (SESSION 1) %# '
20150807-22:19:43 (SESSION 1) # begin;
BEGIN
20150807-22:19:44 (SESSION 1) # select * from dummy limit 0; abc 
-----
(0 rows)
20150807-22:19:48 (SESSION 1) # select pid, locktype, relation, mode, granted from pg_locks where virtualtransaction = (select virtualtransaction from pg_locks where mode='ExclusiveLock' and transactionid::text=txid_current()::text) and (relation = (select relid from pg_stat_user_tables where relname = 'dummy') or relation is null);
  pid  |   locktype    | relation |      mode       | granted 
-------+---------------+----------+-----------------+---------
 16815 | relation      |   178844 | AccessShareLock | t
 16815 | virtualxid    |          | ExclusiveLock   | t
 16815 | transactionid |          | ExclusiveLock   | t
(3 rows)

                                  demo=# \set PROMPT1 '%`date +%Y%m%d-%H:%M:%S` (SESSION 2) %# '
                                  20150807-22:19:56 (SESSION 2) # begin;
                                  BEGIN
                                  20150807-22:20:00 (SESSION 2) # drop table dummy;
                                  (...waiting...)

20150807-22:20:12 (SESSION 1) # select pid, locktype, relation, mode, granted from pg_locks where (relation = (select relid from pg_stat_user_tables where relname = 'dummy') or relation is null);  pid  |   locktype    | relation |        mode         | granted 
-------+---------------+----------+---------------------+---------
 16803 | virtualxid    |          | ExclusiveLock       | t
 16815 | virtualxid    |          | ExclusiveLock       | t
 16803 | relation      |   178844 | AccessExclusiveLock | f
 16815 | transactionid |          | ExclusiveLock       | t
 16815 | relation      |   178844 | AccessShareLock     | t
(5 rows)

So what you can see here are two sessions running at the same time; first one starts an explicit transaction and does a simple SELECT; second one starts another transaction a bit later and wants to DROP TABLE. As you can see, the first one holds three locks (as described above) - all are granted. The second one needs an AccessExclusiveLock in order to drop the table, but cannot acquire it immediately since there's another conflicting lock granted already (AccessShareLock) - so apart from few granted locks we have now also a waiting lock.

So granted locks can hurt you only if they're causing other lock types to queue up; if they're not doing that, you can have a lot of them and you'll still be fine.

As an illustration of the two above myths let's take a look following sample graph:

Locks

As you can see, there's a one significant spike in the number of AccessShare locks. This doesn't have to mean, that something is wrong, but depending on some other factors it can be an incentive to investigate a bit more.

Only exclusive locks are bad

Well, it really depends. As demonstrated above, in some cases even light locks can cause problems. It's really important to remember that especially if your application uses long transactions that span over several hours. I had to explain the importance of that to developers more times than I can count - you should close your transactions as soon as possible and check your code over and over again for places that might cause your transactions to "hang". Look again at the previous snippet of our two sessions running in parallel. Now, imagine that the first session is started by your app but something goes wrong and it stays open. You want to deploy a new app version with some migrations adding a new column to dummy table. Well, your migration will fail. And so will your deployment.

Lock queue? What queue?

Yes, that's actually pretty common. Developers tend to forget that there's a lock queue. Let me show you why you should always remember that.

Let's go back to our two concurrent sessions again. First one has an open transaction that holds an AccessShareLock on our table, second has a waiting AccessExclusiveLock in order to drop the table.
Let's fire up two more sessions. Observe:

demo=# \set PROMPT1 '%`date +%Y%m%d-%H:%M:%S` (SESSION 3) %# '
20150807-16:03:24 (SESSION 3) # select * from dummy;
(...waiting...)

demo=# \set PROMPT1 '%`date +%Y%m%d-%H:%M:%S` (SESSION 4) %# '
20150807-16:03:51 (SESSION 4) # select pid, locktype, relation, mode, granted from pg_locks where (relation = (select relid from pg_stat_user_tables where relname = 'dummy') or relation is null);
  pid  |   locktype    | relation |        mode         | granted 
-------+---------------+----------+---------------------+---------
 20352 | virtualxid    |          | ExclusiveLock       | t
 20329 | virtualxid    |          | ExclusiveLock       | t
 16803 | virtualxid    |          | ExclusiveLock       | t
 16815 | virtualxid    |          | ExclusiveLock       | t
 16803 | relation      |   178844 | AccessExclusiveLock | f
 16815 | transactionid |          | ExclusiveLock       | t
 20329 | relation      |   178844 | AccessShareLock     | f
 16815 | relation      |   178844 | AccessShareLock     | t
(8 rows)

So as you can see we have now two waiting locks - an AccessExclusiveLock that the DROP TABLE query is waiting to acquire and AccessShareLock that the new SESSION 3 needs in order to execute the SELECT query.

The lock queue exists for a very simple reason - if the DROP TABLE statement were to succeed, the SELECT query against the same table issued after the DROP TABLE one should fail. Without lock queue in place the SELECT would simply succeed even though the DROP TABLE issued before would still be waiting.

If we go a bit deeper into PostgreSQL's system catalogues and execute the awesome query from the PostgreSQL wiki, we can even see, what is blocking what:

20150807-16:10:24 (SESSION 4) # \e
 blocked_pid | blocking_pid 
-------------+--------------
       16803 |        20329
       16803 |        16815
       20329 |        16803
       20329 |        16815
(4 rows)

So here you can clearly see, that our first session (pid 16815, so SELECT) blocks second session (pid 16803 - yep, I started it earlier, than the first one) and this one in turn blocks the third (pid 20329). As for the two other rows - well, the query is really helpful, but not really super-exact.

These are the most common myths with regards to locking in PostgreSQL that I came across when working with developers and system administrators.

Robert M. Wysocki is a freelance PostgreSQL consultant, experienced GNU/Linux administrator and author of travellingsysop.net technical blog

This article is licensed with CC-BY-NC-SA 4.0 by Compose.