Bucardo

Bucardo is an asynchonous master-master and master-slave replication system for Postgres. It uses triggers on individual tables. It supports conflict resolution and exception handling through the use of custom Perl subroutines.

This document covers Bucardo 3.0.9. The latest version of this document can always be found at http://bucardo.org/bucardo.html



Bucardo requirements

Bucardo requires no modification to your installation of Postgres, and runs as a Perl daemon which connects to the control database and all the databases to be replicated. To use Bucardo, you will need:

Postgres

Bucardo requires that all databases involved in the replication be running version 8.1 or greater, and that they have the Pl/Pgsql language installed. The database that Bucardo itself uses must have the Pl/Perlu language installed.

Perl

Bucardo runs as a series of Perl daemons, and requires version 5.8.3 of Perl or better.The following modules are also required to run Bucardo:

In order to run the test suite (highly recommended), the following modules are required:

* These modules come bundled with Bucardo.

Unix-like system

Currently, Bucardo has only been tested on Linux distributions. In theory, it should work fine on most other unix-like systems. It will not run on Windows without some minor modifications to code involving system calls.



Bucardo Features

Bucardo has among its features the folowing:



Bucardo Limitations

Bucardo, like all replication systems, has limitations, including:



Installing Bucardo

Installing Bucardo is a fairly straightforward process:

Make and install the Perl modules

Bucardo comes with three Perl modules. These are installed in the typical Perl fashion:

perl Makefile.PL make make install

The three modules are Bucardo, DBIx::Safe, and Test::Dynamic.

Bucardo

This is the main Bucardo module, which also contains the helper scripts. Most significantly, it contains bucardo_ctl and Bucardo.pm.

DBIx::Safe

This module is used to provide safe versions of the database handles to the conflict resolution and exception handling routines.

Test::Dynamic

This is used to count the number of tests in the test suite.

Create the database

Bucardo needs its own database to keep track of things. Create a superuser named Bucardo, and a database named Bucardo owned by that user. Make sure the required languages are installed.

CREATE USER bucardo SUPERUSER; CREATE DATABASE bucardo OWNER bucardo; CREATE LANGUAGE plpgsql; CREATE LANGUAGE plperlu;

It is recommended that you put the bucardo database on the same server as your busiest replicated database, to reduce some network traffic.

Import the schema

Import the schema into the newly created database:

psql -f bucardo.schema -U bucardo bucardo



Testing Bucardo

It is highly recommended that you run the test suite that comes with Bucardo. This can not only flush out normal bugs in Bucardo, but bugs that may be specific to your particular environment. To run the test, go to the Bucardo directory and (after issuing a 'perl Makefile.PL') run the tests with either "make test" or (better still), using the "tmtv" script, which is simply:

time make test TEST_VERBOSE=1

This will run the tests in verbose mode (which shows you the name and result of each test), as well as timing the whole process. Testing involves lots of copying data from one database to another, so the full test suite will take a number of minues to complete.

Because Bucardo uses forking, calling the main daemon directly from the testing suite does not work. Therefore, a helper program is kicked off before the Test:: modules are called. This program needs to know where to find a database to connect to in order to perform the testing. Editing the file "t/bucardo.test.data" is therefore necessary before running the test suite. The contents of the file are (hopefully) self-explanatory.

There are a few environment variables you can set to help you when debugging and running tests:

Bucardo testing environment variables
Name Default Description
BUCARDO_TEST_NUKE_OKAY 1 If set, will not prompt before droppping the TESTBCxxx databases.
BUCARDO_TESTBAIL 0 If set, testing will stop when the first error appears.
BUCARDO_KEEP_OLD_DEBUG 0 If set, the temporary log files will not be removed at the end of testing.

The tests are grouped into logical families, which can be toggled at the top of the main test script, "01bc.t". The group of tests labelled TEST_RANDOM_SWAP is off by default, as their randomnes sometimes causes deadlock and serialization errors. Nonetheless, it is recommended to turn this on and run it, and try and discover non-deadlock, non-serialization errors that may pop up.

Please report any bugs found in testing to the mailing list (preferred) or the author of this module.



Bucardo Concepts

There are some common terms used when talking about Bucardo:



Populating Bucardo

Once Bucardo has been installed, the next step is to populate its database with the specific information for your replicaton needs. Basically, this means that you need to add information to some of the tables within the bucardo schema. Information can be added using the standard Moose-like methods within Bucardo.pm, or simply by adding rows to the correct database tables within the bucardo database.

Adding databases

Each database involved in the replication must be added to the db table within the bucardo database.

Table "db"
Column Type Default Required? Description
name text None Yes A unique name for this database, often the same as the dbname or dbhost for dedicated boxes. Must be of pattern [A-Za-z]\w*
dbhost text Empty string No The hostname the database is on. If empty, connection is made locally via Unix sockets.
dbport text 5432 No The port number the database is listening on.
dbname text None Yes The name of the Postgres database
dbuser text None Yes The username to connect as.
dbpass text None No The password to connect as. If empty, $ENV{DBI_PASS} and the .pgpass file may be used.
pgpass text None No Full path to a .pgpass file
dbconn text Empty string No String to add to the end of the generated DSN.
status text 'active' No Can be 'active' or 'inactive'. If inactive, no replication to or from this database will occur.
sourcelimit smallint 0 No Maximum concurrent Bucardo read connections to this database
targetlimit smallint 0 No Maximum concurrent Bucardo write connections to this database

Example:

INSERT INTO db(name, dbname, dbhost, dbuser) VALUES ('slave1','sales','sales-1.example.com','postgres'), VALUES ('venus','product','venus','ro_user');

Table "dbgroup"
Column Type Default Required? Description
name Text None Yes Unique name for this database group. Must be of pattern [A-Za-z]\w*

Table "dbmap"
Column Type Default Required? Description
db Text None Yes Name of the database; foreign key to db.name
dbgroup Text None Yes Name of the database group; foreign key to dbgroup.name
priority Smallint 0 No For ordering within the group: higher numbers go first when syncing

Example:

INSERT INTO dbgroup(name) VALUES ('readonlys'); INSERT INTO dbmap(db,dbgroup) VALUES ('venus','readonlys'), ('mercury','readonlys');

Adding goats

Each table that needs to be replicated needs to be added to the goat table.

Table "goat"
Column Type Default Required? Description
db Text None Yes The name of the database this table is in; foreign key to db.name.
schemaname Text 'public' No The schema this table belongs to
tablename Text None Yes The name of the table
pkey Text None Yes The primary key column for this table
pkeytype Text Null Yes Type of primary key: must be one of: 'smallint','int','bigint','text','timestamp','date'
ping Boolean Null No Issue NOTIFY via a trigger when this table changes? Used to override a sync-level ping.
has_delta Boolean False No Whether or not this table has delta rows. Usually okay to leave as is.
ghost Boolean False No Table has triggers and rules dropped, but is not replicatedt
customselect Text Null No A SELECT statement to transform the data between master and slave
makedelta Text Null No Whether to create fake delta rows to enable multi-sync pushes
rebuild_index Text Null No Whether to turn off indexes and then rebuild
standard_conflict Text Null No The method to resolve collisions for this table, one of: 'source','target','skip','random','latest','abort'. Only needed for 'swap' syncs.
analyze_after_copy Boolean True No Whether to run an ANALYZE on the table after a fullcopy sync

Example:

INSERT INTO goat(db, tablename, pkey, pkeytype, standard_conflict) VALUES('venus', 'inventory', 'id', 'bigint', 'source');

Adding herds

Goats can be grouped together into herds. Goats can belong to one or more herdds via a many to many mapping using the herdmap table.

Table "herd"
Column Type Default Required? Description
name Text None Yes Unique name for this herd

Table "herdmap"
Column Type Default Required? Description
herd Text None Yes Name of a herd; foreign key to herd.name
goat Integer None Yes ID of a goat; foreign key to goat.id
priority Smallint 0 No For ordering within a herd: higher numbers go first when syncing

Example:

INSERT INTO herd(name) VALUES ('merch'); INSERT INTO herdmap(herd,goat) SELECT 'merch', id FROM goat WHERE db='venus' AND tablename IN ('inventory','stats','sales');

Adding syncs

The last step is to add syncs, which are individual replication events, added to the sync table:

Table "herdmap"
Column Type Default Required? Description
name Text None Yes A unique name for this sync. Must be of pattern [A-Za-z]\w*
source Text None Yes Name of the source herd; foreign key to herd.name
targetdb Text None No* The target database; foreign key to db.name. Cannot be NULL if targetgroup is NULL.
targetgroup Text None No* The target database group; foreign key to dbgroup.name. Cannot be NULL if targetdb is NULL.
synctype Text None Yes What type of sync this is, one of: 'pushdelta','fullcopy','swap'
stayalive Boolean True No Does the sync controller stay connected when finished?
kidsalive Boolean True No Do the children stay connected when finished?
copyextra Text Empty string No Extra test to put after COPY command such as "WITH OIDS"
deletemethod Text 'delete' No How to delete rows, one of 'delete' 'truncate'. Truncate is fast but does locking.
limitdbs Smallint 0 No How many databases can we sync to at once? 0 = all. If set to one, we only sync to one target at a time.
ping Boolean True No Are we issuing NOTIFY via triggers?
do_listen Boolean False No Allows direct NOTIFY kick calls even if ping is false
checktime Interval Null No How often to run the sync if no activity?
status Text 'active' No Currently, only 'active' and 'inactive'
makedelta Boolean false No Whether to create fake delta rows to enable multi-sync pushes
rebuild_index Boolean false No Whether to turn off indexes and then rebuild them
priority Smallint 0 No Higher numbered sync run first
disable_triggers Text 'pg_class' Yes How to disable triggers, one of: 'pg_class','SQL','replica'. Replica is highly recommended but will only work on Postgres version 8.3 or greater. SQL ("DISABLE TRIGGERS") will lock the tables, while pg_class will not, but will cause the very occasional OID error.
disable_rules Text 'pg_class' Yes How to disable rules, one of: 'none','pg_class','replica'
analyze_after_copy Boolean True No Whether to run an ANALYZE on the table after a fullcopy sync
overdue Interval '0 seconds' No How long until the sync is considered overdue, e.g. for nagios warnings. '0 seconds' = do not check.
expired Interval '0 seconds' No How long until the sync is considered expired, e.g. for nagios errors.

Example:

INSERT INTO sync(name,source,targetdb,synctype,checktime,disable_triggers,disable_rules) VALUES ('merch','prod1','prod2','swap','10 minutes','replica','replica');



Starting and Stopping Bucardo

Bucardo is started and stopped with the bucardo_ctl script, which should be in the same directory as Bucardo.pm for ease. Usage is to simply give the action and the reason for doing so. The reason is logged to a local file and also sent in the email notice sent then Bucardo is brought up or down. It's polite to sign your name to the reason as well.

./bucardo_ctl stop "Stopping to add a new slave database - Greg" ./bucardo_ctl start "Restarting after donut break - Greg"

The file that contains a log of the stop and start reasons is set at the top of the bucardo_ctl file: it defaults to "/home/bucardo/restart.reason.log". For other arguments, and reason the start and stop arguments are named they are, see the section on bucardo_ctl.

Starting Bucardo

When starting up Bucardo, it is best to watch the logs and make sure no errors occur. By default, the logs are sent to syslog and it is recommended that they be routed to their own file, e.g. "/var/log/bucardo". Upon startup, Bucardo will connect to all the databases used for all sync marked active, and verify that the tables being replicated have the same structure.

Stopping Bucardo

When stopping Bucardo, the MCP program will tell the controllers to stop what they are doing and exit. They in turn will send the same message to their kids. Note that any existing replication events will not be cancelled but will wait until finished, so Bucardo may not stop for some time. If you really need to stop all Bucardo activity right away, you can manually kill any Bucardo processes from the command line. Looking for these processes is also the best way to check if Bucardo has completely finished shutting down:

ps -Afwww | grep Bucardo



Gathering statistics and status information

Bucardo comes with its own cgi script which provides a simplified, HTML table-based view of the status of all the syncs for one or more Bucardo instances. This script also outputs information designed for easy Nagios parsing. There is also a script for grabbing Bucardo information and adding it into a round-robin database (rrd), for use with reports by cacti and other programs. All scripts are located in the Bucardo/scripts directory.



Troubleshooting

Bucardo does its best to handle any surprises that come up, but there are three common classes of errors that can appear:

Startup Errors

If Bucardo fails to start at all, check that you have all the required modules: issue a perl -c Bucardo.pm. Another common error is not being able to connect to a remote database, or having a table definition change. The Bucardo logs are a valuable source of information: grepping the logs for the string Warning will quickly show you the important items.

Connection Errors

If Bucardo detects that a remote host is no longer reachable, that a remote database has crashed, or any other such error, it will send out a warning email, sleep for a little bit, and then attempt to restart. This will continue until the problem fixes itself of Bucardo is manually stopped.

Replication Errors

When replicating, Bucardo may come upon a situation it does not know how to handle. A common example is a violation of an existing constraint when doing master to master replication. It is your responsibility to provide Bucardo with write custom exception handling methods for these cases. A good way to detect these is to setup and use the web-based stats page.



Bucardo Logging

Bucardo is very verbose by design in its logging. By default, all logging is done via syslog to LOG_LOCAL1 (this can be changed via the syslog_facility setting in bucardo_config). It is highly recommended that you route bucardo messages to their own file, such as /var/log/bucardo. To do so, add this line to your syslog.conf file:

# Route Bucardo messages local1.* -/var/log/bucardo.log

This assumes that local1 is not being used by anything else of course. All important warnings and errors in the logs will have the string "Warning" inside of them. Thus, an easy way to check all the warnings, from newest to oldest, is:

tac /var/log/bucardo | grep Warning | less



The bucardo_ctl script

The main way of controlling Bucardo is through bucardo_ctl, a small command-line Perl script that is used to start and stop Bucardo, as well as providing a conveinent way to kick off syncs externally. The general format is an action verb plus a list of direct objects. Tasks that can be performed by bucardo_ctl are:

See the latest bucardo_ctl documentation for more information.



The bucardo_config table

Inside the bucardo database, the bucardo_config table holds many settings used throughout Bucardo. Some changes will require Bucardo to be restarted to take effect, while others can take effect immediately by issuing a "NOTIFY bucardo_reload_config" or using "./bucardo_ctl reload_config".

Name Default Description
kick_sleep 0.2 How long do we sleep while waiting for a kick response?
mcp_loop_sleep 0.1 How long does the main MCP daemon sleep between loops?
mcp_dbproblem_sleep 15 How many seconds to sleep before trying to respawn
ctl_nothingfound_sleep 1.0 How long does the controller loop sleep if nothing is found?
kid_nothingfound_sleep 0.1 How long does a kid sleep if nothing is found?
kid_nodeltarows_sleep 0.8 How long do kids sleep if no delta rows are found?
kid_serial_sleep 10 How long to sleep in seconds if we hit a serialization error
endsync_sleep 1.0 How long do we sleep when custom code requests an endsync?
mcp_pingtime 60 How often do we ping check the MCP?
ctl_pingtime 600 How often do we ping check the CTL?
kid_pingtime 60 How often do we ping check the KID?
ctl_checkonkids_time 10 How often does the controller check on the kids health?
ctl_checkabortedkids_time 30 How often does the controller check the q table for aborted children?
ctl_createkid_time 0.5 How long do we sleep to allow kids-on-demand to get on their feet?
tcp_keepalives_idle 10 How long to wait between each keepalive probe.
tcp_keepalives_interval 5 How long to wait for a response to a keepalive probe.
tcp_keepalives_count 2 How many probes to send. 0 indicates sticking with system defaults.
piddir /var/run/bucardo Directory holding Bucardo PID files
pidfile bucardo.pid Name of the main Bucardo pid file
stopfile fullstopbucardo Name of the semaphore file used to stop Bucardo processes
log_showpid 0 Show PID in the log output?
log_showtime 1 Show timestamp in the log output?
log_showline 0 Show line number in the log output?
reason_file /home/bucardo/restart.reason File to hold reasons for stopping and starting
syslog_facility LOG_LOCAL1 Which syslog facility level to use
kid_abort_limit 3 How many times we will restore an aborted kid before giving up?
default_email_to nobody@example.com Who to send alert emails to
default_email_from nobody@example.com Who the alert emails are sent as
stats_script_url http://www.bucardo.org/ Location of the stats script
upsert_attempts 3 How many times do we try out the upsert loop?
max_select_clause 500 Maximum number of items to select inside of IN() clauses
max_delete_clause 200 Maximum number of items to delete inside of IN() clauses


The bucardo_delta and bucardo_track tables

Each database that needs to track row changes (has at least one table being used for a swap or a pushdelta sync)will have the following two tables created in the "bucardo" schema:

These tables (and the schema) itself are automatically created and modified as needed. Note that for safety reasons, removing a sync will not remove a table's triggers or remove entries from the bucardo_delta table. This must be done manually, for the risk of removing important data is to high to make it an automatic task.



Bucardo Routine Maintenance

Bucardo does require some routine maintenance to keep things going, especially if you are processing heavy workloads. Sepcifically, there are some tables that need frequent vacuuming, and some that need frequent cleaning. There are provided examples of cronjobs to perform both of these tasks, but you should adjust them to your own system. The cronjobs will call the following functions:



Custom code hooks

Bucardo has the ability to run custom code at certain point in the replication process. Code to be run is added to the customcode table, and is either run at the goat or sync level.

Table "customcode"
Column Type Default Required? Description
name Text None Yes A unique name for this code
about Text Null No A longer description of the code
whenrun Text None Yes When this code should be run: see below
src_code Text None Yes The literal source code: a Perl subroutine
getdbh Boolean True No Does this code require database handles?
getrows Boolean False No Does this code require row information?

The "whenrun" column must be one of the following:

Table customcode.whenrun values
Value Level Description
before_sync sync Runs before a sync is started, before the controller is created.
before_txn sync Runs before the replication transaction is started.
before_check_rows sync Runs after txn started, but before delta rows are checked.
before_trigger_drop sync Runs immediately before the triggers and rules are disabled.
after_trigger_drop sync Runs immediately before the triggers and rules are disabled.
after_table_sync sync Runs after the tables have been synced.
before_trigger_enable sync Runs immediately before the triggers and rules are enabled.
after_trigger_enable sync Runs immediately after the triggers and rules are enabled.
after_txn sync Runs after the transaction has committed.
after_sync sync Runs after the sync has finished.
conflict goat Runs when a conflict is detected for swap syncs.
exception goat Runs when an exception is raised when replicating rows.

Code should then be associated with a goat or a sync via the customcode_map table:

Table "customcode_map"
Column Type Default Required? Description
code Text None Yes Which code; foreign key to code.name
sync Text Null No* The sync to attach the code to; foreign key to sync.name. Cannot be NULL if goat is NULL.
goat Integer Null No* The goat to attach the code to; foreign key to goat.id. Cannot be NULL if sync is NULL.
active Boolean True No Whether to run the code
priority Smallint 0 No Higher numbered runs first

Each coderef (the src_code column) is tested to make sure it compiles as Bucardo is starting up. To prevent any strange effects from having your subroutine run before being provided any real data, a hashref is passed to the subroutine in this phase with a key named dummy. Subroutines should return immediately if this key exists.

Custom code input

All custom code is passed the following information via a hashref as the first arg:

If "get_rows" is true, then an additional key is returned:

Due to the way that the code is evaluated within Bucardo, the custom code should not read in the hashref with a "shift", but as follows:

my ($result) = @_;



Bucardo Conflict Handling

Master to master replication requires a way to handle conflict, in the cases where the same row is updated on both databases. Bucardo provides both standard and custom conflict resolution methods, which are always set at the goat (table) level. To set a standard conflict resolution method, simply set the goat.standard_conflict column for the table in question to one of:

Bucardo standard conflict methods
Method Description
source The source database always wins.
target The target database always wins.
random One of the two sides is chosen at random as the winner.
latest The side most recently changed wins.
abort The sync is aborted, and will not continue.
skip No action is taken. Not very useful by itself.

Use of any but "source" and "target" is not recommended for most circumstances.

In some cases, however, applying specific knowledge about your database and business rules is the only way to truly resolve a conflict. To do so, you add an entry into the customcode table, with a the whenrun column set to "conflict". Basically, this is a Perl subroutine that receives information about the conflicting rows, and returns a value that tells Bucardo what to do.

Conflict code input

In addition to all of the normal items passed in to custom codes, conflict codes receive an additional key in the hashref named 'rowinfo' which contains:

Conflict code output:

The hashref that is passed to the conflict subroutine can be modified to let Bucardo know how to handle this conflict. Specifically, the "action" key is a bitmap that can be used to state which side should "win" the conflict:

However, it is also possible to modify the sourcerow and targetrow hashes inside of your subroutine. This makes it possible to put the modified rows back in the same database it came from. Hence:

If nothing should be done about this row, leave it as the default value:



Bucardo Exception Handling

A sync may fail because of an exception thrown by the database. A common example is a unique constraint on a non primary key column. Each table can be assigned one or more exception handlers via the customcode table to try and handle this problems.

Exception code input

In addition to all of the normal items passed in to custom codes, exception codes receive an additional key in the hashref named 'rowinfo' which contains:

Exception code output

It is expected that the excpetion code will change the rows directly via the sourcerow and targetrow keys, or use the database handles to connect back to the databases and fix the condition that caused the exception. Once it has done so, it should set the value of "runagain" to true, and Bucardo will retry the transaction that caused the exception. This process will be repeated, but will eventually throw a fatal error, if the number of attempts is greater than the number of rows for this sync.



The Bucardo Freezer

Bucardo tracks many things about each sync: when it completed, how long it took, how many rows were processed, etc. This data is stored in the q table, but this table can grow very quickly. To prevent access to this table from getting too slow, older data is moved from the q table to a table called master_q inside of the freezer schema. The master_q table is partitioned by date, so that the stats page only has to select from a few child tables, depending on the date range that is set. The moving of the old rows from the q table to the master_q table is accomplished by the bucardo_purge_q_table() function, which is usually called regularly (e.g. every five minutes) by a cronjob.



Bucardo pinging

All the bucardo processes will respond to a specific NOTIFY message. This can be used to verify that the process is still alive and working normally. For the MCP, the name of the message is "bucardo_mcp_ping". It will issue a "bucardo_mcp_pong" NOTIFY when it receives the message. (This is what "./bucardo_ctl ping" does). The controller and children use their PID (Process Identification number) to construct the ping and pong names. For controllers, the name is "bucardo_ctl_#_ping", where '#' is the PID, and for kids, the name is "bucardo_kid_#_ping". Both will return a "_pong" NOTIFY when they receive the ping.



Bucardo Development

The latest development version of Bucardo can be checked out like so:

git clone http://bucardo.org/bucardo.git/



How Bucardo Works

Detailed information on how the whole thing works.



Acknowledgments

Special thanks to Jon Jensen at End Point, who developed the push-tables script (an early ancestor of Bucardo), and who provided much code review, guidance, and worthy advice. Thanks to Ethan Rowe, Jeff Boes, and other End Point colleagues for code review, testing, and a willingness to answer questions quickly and correctly. Mark Johnson was invaluable in finding new ways to break the customcode sections and helping translate theory into working code.

Bucardo was developed for Backcountry.com and much thanks is due to them for their early support, particularly Spencer Christensen and Dave Jenkins. Bucardo would not be as far along as it is if it did not grow up in the demanding, complex, extremely high volume database environment at Backcountry.com.



Bucardo TODO

Bucardo is always improving. Some of the things things on its todo list are:



Bucardo resources

The canonical place for Bucardo information, releases, bug announcements, etc. is the website http://bucardo.org. There are also some mailing lists available: