Using Bucardo 5.3 to Migrate a Live PostgreSQL Database

Recently, we did some exploration into PostgreSQL tools for data migration, and Bucardo was by far the most popular choice. It’s a tool that is long in the tooth, which tends to mean stable. However, when doing the research, we didn’t find a simple, up-to-date tutorial for Bucardo. That helps no one, so we've put together this tutorial to assist new, and old users of Bucardo to apply it to live migration.

The Components

The Source Database: This will be the PostgreSQL database which holds the data we want to replicate. You won't need superuser access to this database as long as you can run these two commands on it:

SET session_replication_role = `replica`;  
CREATE EXTENSION plperl;  

If you can run them without complaint, then you're good to go. If not, then check the Running without Superuser at the end of this article as it may give you a way to still run Bucardo.

The Destination Database: This will be the database that the Source Database will be copied into. Same terms apply to this database, you'll need to be able to run:

SET session_replication_role = `replica`;  
CREATE EXTENSION plperl;  

without complaint to continue. Or, again, you could refer to the Running without Superuser section.

Bucardo 5.3: You will, naturally, require the latest version of Bucardo which you can download from https://github.com/bucardo/bucardo/releases/tag/5.3.1. The code will need somewhere to run which is where the Bucardo PostgreSQL comes in...

Bucardo PostgreSQL: This is the PostgreSQL data store used by Burcardo. You'll need superuser access to this as it uses the untrusted version of PL/Perl plperlu. That also means it can't be hosted on most services (including Compose's PostgreSQL).

Setting up Bucardo

1. Get dependencies

We'll be creating the Bucardo PostgreSQL installation on a fresh Ubuntu 14.04 LTS system. We'll need to install a few things for that to happen. To install all the needed packages:

sudo apt-get install libdbi-perl libdbd-pg-perl libboolean-perl wget build-essential libreadline-dev libz-dev autoconf bison libtool libgeos-c1 libproj-dev libgdal-dev libxml2-dev libxml2-utils libjson0-dev xsltproc docbook-xsl docbook-mathml libossp-uuid-dev libperl-dev libdbix-safe-perl postgresql postgresql-plperl  

2. Get Bucardo

With all those dependencies in place, we can can now download the source of Bucardo. Let's download and unpack it into the well-known location /usr/local/src:

sudo chown $USER:$USER /usr/local/src  
cd /usr/local/src  
wget https://github.com/bucardo/bucardo/archive/5.3.1.tar.gz  
tar zxvf 5.3.1.tar.gz  
cd bucardo*  

3. Set up required directories

The Bucardo PostgreSQL will need to keep various files so we'll give it a directory in /var/log and /var/run to work with.

sudo mkdir -p /var/log/bucardo /var/run/bucardo  
sudo chown $USER:$USER /var/log/bucardo /var/run/bucardo  

4. Configuring Bucardo's PostgreSQL

When Bucardo starts up it'll need some settings. It gets these from the .bucardorc file in your home directory, so we need to create that file with the settings for the local PostgreSQL. Cut and paste the following to do that:

cat > $HOME/.bucardorc <<EOL  
dbhost=127.0.0.1  
dbname=bucardo  
dbport=5432  
dbuser=bucardo  
EOL  

5. Storing PostgreSQL connection information

We will need to connect to that local PostgreSQL too and the settings for that are kept in the .pgpass file. They include a password, so we will also need to secure that file once created:

echo "127.0.0.1:5432:bucardo:bucardo:bucardo-runner" > $HOME/.pgpass  
chmod 0600 $HOME/.pgpass  

6. Starting the Bucardo PostgreSQL

It's time to boot up the PostgreSQL server for the first time and create a database super-user that matches the login credentials we just saved in .pgpass. These commands run as the system's postgres user and use a combination of sudo and su to switch to that user.

sudo su - postgres -c "psql postgres"  
CREATE USER bucardo WITH LOGIN SUPERUSER ENCRYPTED PASSWORD 'bucardo-runner';  
CREATE DATABASE bucardo;  
<CTRL-D>  

7. Install Bucardo into PostgreSQL

This is where we get to turn what is a plain PostgreSQL into a Bucardo PostgreSQL. We're going to install the Bucardo code into the local PostgreSQL. Run this:

./bucardo install --quiet

And it'll come up with the option settings it plans to configure the database with:

1. Host:           127.0.0.1  
2. Port:           5432  
3. User:           bucardo  
4. Database:       bucardo  
5. PID directory:  /var/run/bucardo  

Make your list of options read like this. Most likely, we'll need to change option 3 to bucardo. Then we enter "P" to proceed and the database is ready to use with Bucardo.

Variable setup time, so we don't have to keep retyping them

First for the Source Database

These example variables are for references to the Source Database. Insert your own values as appropriate:

export SOURCE_HOST=bucardo-source.cndxiylzqybi.us-east-1.rds.amazonaws.com  
export SOURCE_PORT=5432  
export SOURCE_DATABASE=source_db  
export SOURCE_USERNAME=winslett  
export SOURCE_PASSWORD=***  

Second, the Destination Database

Again, these example variables are references to the Destination Database. Insert your own values:

export DEST_HOST=bucardo-destination.cndxiylzqybi.us-east-1.rds.amazonaws.com  
export DEST_PORT=5432  
export DEST_DATABASE=source_db  
export DEST_USERNAME=winslett  
export DEST_PASSWORD=***  

Finally, the tables to be migrated

In this example, we're going to be migrating two tables, public.companies and public.employees. These names are passed to commands in two ways. First as parameters to a -t option:

export TABLES="-t public.companies -t public.employees"  

And also as a space separated list:

export TABLES_WITH_SPACES="public.companies public.employees"  

We'll need both of those as we move on to...

Set up the Destination Database

We need to get the schema from the Source Database and create it in the Destination Database. There are a few commands to be run so let's save a little time and save the login credentials locally in the .pgpass file:

cat >> $HOME/.pgpass <<EOL  
$DEST_HOST:$DEST_PORT:$DEST_DATABASE:$DEST_USERNAME:$DEST_PASSWORD
$SOURCE_HOST:$SOURCE_PORT:$SOURCE_DATABASE:$SOURCE_USERNAME:$SOURCE_PASSWORD
EOL  

The first step then is to extract the schema from the Source Database using pg_dump, then we clean up the schema dump removing trigger definitions and similar unneeded items:

pg_dump "host=$SOURCE_HOST port=$SOURCE_PORT dbname=$SOURCE_DATABASE user=$SOURCE_USERNAME" $TABLES --schema-only | grep -v 'CREATE TRIGGER' | grep -v '^--' | grep -v '^$' | grep -v '^SET' | grep -v 'OWNER TO' > schema.sql  

The next step is to create our database in the Destination Database:

psql "host=$DEST_HOST port=$DEST_PORT dbname=postgres user=$DEST_USERNAME" -c "CREATE DATABASE $DEST_DATABASE;"  

And, with the database created, load the schema into it:

psql "host=$DEST_HOST port=$DEST_PORT dbname=$DEST_DATABASE user=$DEST_USERNAME" -f schema.sql  

We are now ready to head into the final furlong...

Setting up Bucardo Sync

With the databases ready to go, we can now configure Bucardo itself. We add the Source Database first:

./bucardo add db source_db dbhost=$SOURCE_HOST dbport=$SOURCE_PORT dbname=$SOURCE_DATABASE dbuser=$SOURCE_USERNAME dbpass=$SOURCE_PASSWORD

Then we add the Destination Database:

./bucardo add db dest_db dbhost=$DEST_HOST dbport=$DEST_PORT dbname=$DEST_DATABASE dbuser=$DEST_USERNAME dbpass=$DEST_PASSWORD

We add the tables we want to migrate and add them to a "copying herd":

./bucardo add tables $TABLES_WITH_SPACES db=source_db
./bucardo add herd copying_herd $TABLES_WITH_SPACES

And build a sync based on that "copying herd":

./bucardo add sync the_sync relgroup=copying_herd dbs=source_db:source,dest_db:target onetimecopy=2

Sync - Go!

With everything in place, we can now simply run:

./bucardo start

And the syncing will begin and keep going till we stop the process. Restarting the process will restart the sync. How can you tell what it's doing? You could log in, after pushing Bucardo into the background or using another session, to the Destination Database with:

psql "host=$DEST_HOST port=$DEST_PORT dbname=$DEST_DATABASE user=$DEST_USERNAME"  

And query the tables there to check the status of things. If it doesn't seem right, you'll want to do some diagnostics.

Debugging

Running without superuser on the Destination Database

When using a PostgreSQL database, administered by someone else or with a hosted or shared service, there's a good chance you won't have superuser access. This can be problematic as Bucardo would, by default, like superuser access to the destination database. But, there is a little hack you can do to get around this. The thing you need to be able to do is load an extension and change your session_replication_role, as we did at the start of this article.

Now, we can't just let you install any extension as a user, but if you can get the database administrator to use the pgextwlist extension, they can whitelist particular extensions for loading by non-superusers. In this case, they'd have to add it and add pgperl to the whitelist. We've already done that at Compose. Your milage may vary with other database hosting services.

As for changing session_replication_role, its possible to create a function to allow users to make that change. The following function was recommended in an article by C. Keane on the Amazon RDS blog:

CREATE OR REPLACE FUNCTION public.rds_session_replication_role(role text)  
 RETURNS text
 LANGUAGE plpgsql
 SECURITY DEFINER
AS $function$  
        DECLARE
                curr_val text := 'unset';
        BEGIN
                EXECUTE 'SET session_replication_role = ' || quote_literal(role);
                EXECUTE 'SHOW session_replication_role' INTO curr_val;
                RETURN curr_val;
        END
$function$;

Once you have created a function like that you'll need to change the Bucardo.pm file at line 3183 to read:

$dbh->do(q{DO LANGUAGE plpgsql $$DECLARE somevar VARCHAR; BEGIN SELECT p.proname INTO somevar FROM pg_catalog.pg_proc p WHERE p.proname = 'rds_session_replication_role' LIMIT 1; IF NOT FOUND THEN SET session_replication_role = 'replica'; ELSE PERFORM compose_session_replication_role('replica'); END IF; END$;});

At Compose, we've already created the replication function compose_session_replication_role and it's added to PostgreSQL by default to allow you to run Bucardo.

Conclusion

Configuring Bucardo is not the simplest process, but this article should get your databases synchronising for the first time. The techniques here can be applied to any setup, but at Compose, we're already ready to be your Bucardo Destination Database.