Using Bucardo 5.3 to Migrate a Live PostgreSQL Database
Recently, we did some exploration into PostgreSQL tools for data migration, and Bucardo was by far the most popular choice. It's a tool that is long in the tooth, which tends to mean stable. However, while doing the research, we didn't find a simple, up-to-date tutorial for Bucardo. That helps no one, so we've put together this tutorial to help new and old users of Bucardo alike apply it to live migration.
The Components
The Source Database: This will be the PostgreSQL database which holds the data we want to replicate. You won't need superuser access to this database as long as you can run these two commands on it:
SET session_replication_role = 'replica';
CREATE EXTENSION plperl;
If you can run them without complaint, then you're good to go. If not, then check the Running without Superuser section at the end of this article, as it may still give you a way to run Bucardo.
The Destination Database: This will be the database that the Source Database will be copied into. Same terms apply to this database, you'll need to be able to run:
SET session_replication_role = 'replica';
CREATE EXTENSION plperl;
without complaint to continue. Or, again, you could refer to the Running without Superuser section.
Bucardo 5.3: You will, naturally, require the latest version of Bucardo which you can download from https://github.com/bucardo/bucardo/releases/tag/5.3.1. The code will need somewhere to run which is where the Bucardo PostgreSQL comes in...
Bucardo PostgreSQL: This is the PostgreSQL data store used by Bucardo. You'll need superuser access to it because Bucardo uses the untrusted version of PL/Perl, plperlu. That also means it can't be hosted on most services (including Compose's PostgreSQL).
Setting up Bucardo
1. Get dependencies
We'll be creating the Bucardo PostgreSQL installation on a fresh Ubuntu 14.04 LTS system. We'll need to install a few things for that to happen. To install all the needed packages:
sudo apt-get install libdbi-perl libdbd-pg-perl libboolean-perl wget build-essential libreadline-dev libz-dev autoconf bison libtool libgeos-c1 libproj-dev libgdal-dev libxml2-dev libxml2-utils libjson0-dev xsltproc docbook-xsl docbook-mathml libossp-uuid-dev libperl-dev libdbix-safe-perl postgresql postgresql-plperl
2. Get Bucardo
With all those dependencies in place, we can now download the Bucardo source. Let's download and unpack it into the well-known location /usr/local/src:
sudo chown $USER:$USER /usr/local/src
cd /usr/local/src
wget https://github.com/bucardo/bucardo/archive/5.3.1.tar.gz
tar zxvf 5.3.1.tar.gz
cd bucardo*
3. Set up required directories
The Bucardo PostgreSQL will need to keep various files, so we'll give it directories in /var/log and /var/run to work with.
sudo mkdir -p /var/log/bucardo /var/run/bucardo
sudo chown $USER:$USER /var/log/bucardo /var/run/bucardo
4. Configuring Bucardo's PostgreSQL
When Bucardo starts up it'll need some settings. It gets these from the .bucardorc file in your home directory, so we need to create that file with the settings for the local PostgreSQL. Cut and paste the following to do that:
cat > $HOME/.bucardorc <<EOL
dbhost=127.0.0.1
dbname=bucardo
dbport=5432
dbuser=bucardo
EOL
5. Storing PostgreSQL connection information
We will need to connect to that local PostgreSQL too, and the settings for that are kept in the .pgpass file. They include a password, so we will also need to secure that file once created:
echo "127.0.0.1:5432:bucardo:bucardo:bucardo-runner" > $HOME/.pgpass
chmod 0600 $HOME/.pgpass
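The permissions step matters: libpq silently ignores a password file that is readable by group or other, so a forgotten chmod means psql will prompt for a password as if the file weren't there. Here's a minimal sketch of checking the mode, using a throwaway file (the path ./pgpass.test is our own, not a real libpq location):

```shell
# Sketch: create a pgpass-style file and lock it down, then verify the mode.
# ./pgpass.test is a hypothetical stand-in for $HOME/.pgpass.
PGPASSFILE=./pgpass.test
echo "127.0.0.1:5432:bucardo:bucardo:bucardo-runner" > "$PGPASSFILE"
chmod 0600 "$PGPASSFILE"
# libpq requires that the file is not accessible to group or other,
# so the mode must be 600 (or stricter)
stat -c '%a' "$PGPASSFILE"   # prints 600 on Linux
```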
6. Starting the Bucardo PostgreSQL
It's time to boot up the PostgreSQL server for the first time and create a database super-user that matches the login credentials we just saved in .pgpass. These commands run as the system's postgres user, using a combination of sudo and su to switch to that user.
sudo su - postgres -c "psql postgres"
CREATE USER bucardo WITH LOGIN SUPERUSER ENCRYPTED PASSWORD 'bucardo-runner';
CREATE DATABASE bucardo;
<CTRL-D>
7. Install Bucardo into PostgreSQL
This is where we get to turn what is a plain PostgreSQL into a Bucardo PostgreSQL. We're going to install the Bucardo code into the local PostgreSQL. Run this:
./bucardo install --quiet
And it'll come up with the option settings it plans to configure the database with:
1. Host: 127.0.0.1
2. Port: 5432
3. User: bucardo
4. Database: bucardo
5. PID directory: /var/run/bucardo
Make your list of options read like the one above; most likely, you'll only need to change option 3 (User) to bucardo. Then enter "P" to proceed, and the database will be ready to use with Bucardo.
Setting up variables, so we don't have to keep retyping them
First for the Source Database
These example variables are for references to the Source Database. Insert your own values as appropriate:
export SOURCE_HOST=bucardo-source.cndxiylzqybi.us-east-1.rds.amazonaws.com
export SOURCE_PORT=5432
export SOURCE_DATABASE=source_db
export SOURCE_USERNAME=winslett
export SOURCE_PASSWORD=***
Second, the Destination Database
Again, these example variables are references to the Destination Database. Insert your own values:
export DEST_HOST=bucardo-destination.cndxiylzqybi.us-east-1.rds.amazonaws.com
export DEST_PORT=5432
export DEST_DATABASE=destination_db
export DEST_USERNAME=winslett
export DEST_PASSWORD=***
Finally, the tables to be migrated
In this example, we're going to be migrating two tables, public.companies and public.employees. These names are passed to commands in two ways. First as parameters to a -t option:
export TABLES="-t public.companies -t public.employees"
And also as a space separated list:
export TABLES_WITH_SPACES="public.companies public.employees"
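Since both variables must list exactly the same tables, one way to keep them from drifting apart is to maintain only the space-separated list and derive the -t form from it. A small sketch in plain POSIX shell (the loop is our own convenience, not part of Bucardo):

```shell
# Sketch: derive the "-t" form from the space-separated list so the two
# variables always describe the same set of tables
TABLES_WITH_SPACES="public.companies public.employees"
TABLES=""
for t in $TABLES_WITH_SPACES; do
  TABLES="$TABLES -t $t"
done
TABLES="${TABLES# }"   # trim the leading space
echo "$TABLES"         # -t public.companies -t public.employees
```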
We'll need both of those as we move on to...
Set up the Destination Database
We need to get the schema from the Source Database and create it in the Destination Database. There are a few commands to be run, so let's save a little time and store the login credentials locally in the .pgpass file:
cat >> $HOME/.pgpass <<EOL
$DEST_HOST:$DEST_PORT:$DEST_DATABASE:$DEST_USERNAME:$DEST_PASSWORD
$SOURCE_HOST:$SOURCE_PORT:$SOURCE_DATABASE:$SOURCE_USERNAME:$SOURCE_PASSWORD
EOL
The first step is to extract the schema from the Source Database using pg_dump, then clean up the schema dump, removing trigger definitions and similar unneeded items:
pg_dump "host=$SOURCE_HOST port=$SOURCE_PORT dbname=$SOURCE_DATABASE user=$SOURCE_USERNAME" $TABLES --schema-only | grep -v 'CREATE TRIGGER' | grep -v '^--' | grep -v '^$' | grep -v '^SET' | grep -v 'OWNER TO' > schema.sql
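To make it clear what that chain of grep -v filters actually strips, here's the same filter applied to a small, hypothetical fragment of pg_dump output (the table and trigger names are invented for illustration):

```shell
# Sketch: apply the article's filter chain to a miniature schema dump
cat > sample_schema.sql <<'EOF'
--
-- Name: companies; Type: TABLE; Schema: public
--
SET statement_timeout = 0;

CREATE TABLE companies (
    id integer NOT NULL
);
ALTER TABLE companies OWNER TO winslett;
CREATE TRIGGER companies_audit AFTER INSERT ON companies FOR EACH ROW EXECUTE PROCEDURE audit();
EOF
# Drops triggers, SQL comments, blank lines, SET statements and ownership
grep -v 'CREATE TRIGGER' sample_schema.sql | grep -v '^--' | grep -v '^$' \
  | grep -v '^SET' | grep -v 'OWNER TO' > filtered.sql
cat filtered.sql   # only the CREATE TABLE statement survives
```

One caution: grep -v works line by line, so if pg_dump emits a trigger definition wrapped over several lines, only the line containing CREATE TRIGGER is removed; check the resulting schema.sql before loading it.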
The next step is to create the database itself on the Destination Database server:
psql "host=$DEST_HOST port=$DEST_PORT dbname=postgres user=$DEST_USERNAME" -c "CREATE DATABASE $DEST_DATABASE;"
And, with the database created, load the schema into it:
psql "host=$DEST_HOST port=$DEST_PORT dbname=$DEST_DATABASE user=$DEST_USERNAME" -f schema.sql
We are now ready to head into the final furlong...
Setting up Bucardo Sync
With the databases ready to go, we can now configure Bucardo itself. We add the Source Database first:
./bucardo add db source_db dbhost=$SOURCE_HOST dbport=$SOURCE_PORT dbname=$SOURCE_DATABASE dbuser=$SOURCE_USERNAME dbpass=$SOURCE_PASSWORD
Then we add the Destination Database:
./bucardo add db dest_db dbhost=$DEST_HOST dbport=$DEST_PORT dbname=$DEST_DATABASE dbuser=$DEST_USERNAME dbpass=$DEST_PASSWORD
We add the tables we want to migrate, then gather them into a "copying herd":
./bucardo add tables $TABLES_WITH_SPACES db=source_db
./bucardo add herd copying_herd $TABLES_WITH_SPACES
And build a sync based on that "copying herd". The onetimecopy=2 setting tells Bucardo to do an initial full copy of any table that is empty on the destination, then switch to ongoing replication of changes:
./bucardo add sync the_sync relgroup=copying_herd dbs=source_db:source,dest_db:target onetimecopy=2
Sync - Go!
With everything in place, we can now simply run:
./bucardo start
And the syncing will begin and keep going until we stop the process. Restarting the process will restart the sync. How can you tell what it's doing? After pushing Bucardo into the background, or from another session, you could log in to the Destination Database with:
psql "host=$DEST_HOST port=$DEST_PORT dbname=$DEST_DATABASE user=$DEST_USERNAME"
And query the tables there to check the status of things. If it doesn't seem right, you'll want to do some diagnostics.
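One quick sanity check is to compare row counts on both sides. Run a query like this (the table names are the two from our example herd) against the Source and then the Destination; once the initial copy has completed, the numbers should converge:

```sql
-- Row counts for the example tables; matching counts on both databases
-- suggest, though don't strictly prove, a complete copy
SELECT 'public.companies' AS tbl, count(*) FROM public.companies
UNION ALL
SELECT 'public.employees', count(*) FROM public.employees;
```

Bear in mind the destination may lag while the sync catches up, so a mismatch immediately after starting is not necessarily a fault.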
Debugging
- If you want to know the current status of the Bucardo sync, run ./bucardo status. This reports on all the synchronisations that the Bucardo instance is running. If you name a sync in the command, such as ./bucardo status the_sync, you'll get more detailed information.
- Tailing the logs can also reveal how well the syncs are running. Run tail -f /var/log/bucardo/log.bucardo to see Bucardo's log. It'll contain either errors or status messages. If you see errors, you have a problem, and the error messages should give you a handle on what's failing. If it's successful status messages, proceed with your day knowing you've configured Bucardo.
Running without superuser on the Destination Database
When using a PostgreSQL database administered by someone else, or a hosted or shared service, there's a good chance you won't have superuser access. This can be problematic, as Bucardo would, by default, like superuser access to the destination database. But there is a little hack you can do to get around this. The things you need to be able to do are load an extension and change your session_replication_role, as we did at the start of this article.
Now, a non-superuser can't just install any extension, but if you can get the database administrator to install the pgextwlist extension, they can whitelist particular extensions for loading by non-superusers. In this case, they'd need to add plperl to the whitelist. We've already done that at Compose; your mileage may vary with other database hosting services.
As for changing session_replication_role, it's possible to create a function that allows users to make that change. The following function was recommended in an article by C. Keane on the Amazon RDS blog:
CREATE OR REPLACE FUNCTION public.rds_session_replication_role(role text)
RETURNS text
LANGUAGE plpgsql
SECURITY DEFINER
AS $function$
DECLARE
curr_val text := 'unset';
BEGIN
EXECUTE 'SET session_replication_role = ' || quote_literal(role);
EXECUTE 'SHOW session_replication_role' INTO curr_val;
RETURN curr_val;
END
$function$;
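Because the function is SECURITY DEFINER, it runs with the privileges of the superuser who created it; all an ordinary user needs is EXECUTE permission. A short usage sketch, reusing the winslett example user from earlier (substitute your own user name):

```sql
-- Let the non-superuser migration user call the function,
-- then use it to switch roles without superuser rights
GRANT EXECUTE ON FUNCTION public.rds_session_replication_role(text) TO winslett;
SELECT public.rds_session_replication_role('replica');
```

The SELECT should return 'replica', confirming the role change took effect for the current session.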
Once you have created a function like that, you'll need to change the Bucardo.pm file at line 3183 to read:
$dbh->do(q{DO LANGUAGE plpgsql $$DECLARE somevar VARCHAR; BEGIN SELECT p.proname INTO somevar FROM pg_catalog.pg_proc p WHERE p.proname = 'rds_session_replication_role' LIMIT 1; IF NOT FOUND THEN SET session_replication_role = 'replica'; ELSE PERFORM rds_session_replication_role('replica'); END IF; END$$;});
At Compose, we've already created the replication function compose_session_replication_role, and it's added to PostgreSQL by default to allow you to run Bucardo.
Conclusion
Configuring Bucardo is not the simplest process, but this article should get your databases synchronising for the first time. The techniques here can be applied to any setup, but at Compose, we're already ready to be your Bucardo Destination Database.