Bucardo is an asynchonous master-master and master-slave replication system for Postgres. It uses triggers on individual tables. It supports conflict resolution and exception handling through the use of custom Perl subroutines.
This document covers Bucardo 3.0.9. The latest version of this document can always be found at http://bucardo.org/bucardo.html
Bucardo requires no modification to your installation of Postgres, and runs as a Perl daemon which connects to the control database and all the databases to be replicated. To use Bucardo, you will need:
Bucardo requires that all databases involved in the replication be running version 8.1 or greater, and that they have the Pl/Pgsql language installed. The database that Bucardo itself uses must have the Pl/Perlu language installed.
Bucardo runs as a series of Perl daemons, and requires version 5.8.3 of Perl or better.The following modules are also required to run Bucardo:
In order to run the test suite (highly recommended), the following modules are required:
* These modules come bundled with Bucardo.
Currently, Bucardo has only been tested on Linux distributions. In theory, it should work fine on most other unix-like systems. It will not run on Windows without some minor modifications to code involving system calls.
Bucardo has among its features the folowing:
Bucardo, like all replication systems, has limitations, including:
Installing Bucardo is a fairly straightforward process:
Bucardo comes with three Perl modules. These are installed in the typical Perl fashion:
perl Makefile.PL make make install
The three modules are Bucardo, DBIx::Safe, and Test::Dynamic.
This is the main Bucardo module, which also contains the helper scripts. Most significantly, it contains bucardo_ctl and Bucardo.pm.
This module is used to provide safe versions of the database handles to the conflict resolution and exception handling routines.
This is used to count the number of tests in the test suite.
Bucardo needs its own database to keep track of things. Create a superuser named Bucardo, and a database named Bucardo owned by that user. Make sure the required languages are installed.
CREATE USER bucardo SUPERUSER; CREATE DATABASE bucardo OWNER bucardo; CREATE LANGUAGE plpgsql; CREATE LANGUAGE plperlu;
It is recommended that you put the bucardo database on the same server as your busiest replicated database, to reduce some network traffic.
Import the schema into the newly created database:
psql -f bucardo.schema -U bucardo bucardo
It is highly recommended that you run the test suite that comes with Bucardo. This can not only flush out normal bugs in Bucardo, but bugs that may be specific to your particular environment. To run the test, go to the Bucardo directory and (after issuing a 'perl Makefile.PL') run the tests with either "make test" or (better still), using the "tmtv" script, which is simply:
time make test TEST_VERBOSE=1
This will run the tests in verbose mode (which shows you the name and result of each test), as well as timing the whole process. Testing involves lots of copying data from one database to another, so the full test suite will take a number of minues to complete.
Because Bucardo uses forking, calling the main daemon directly from the testing suite does not work. Therefore, a helper program is kicked off before the Test:: modules are called. This program needs to know where to find a database to connect to in order to perform the testing. Editing the file "t/bucardo.test.data" is therefore necessary before running the test suite. The contents of the file are (hopefully) self-explanatory.
There are a few environment variables you can set to help you when debugging and running tests:
Name | Default | Description |
---|---|---|
BUCARDO_TEST_NUKE_OKAY | 1 | If set, will not prompt before droppping the TESTBCxxx databases. |
BUCARDO_TESTBAIL | 0 | If set, testing will stop when the first error appears. |
BUCARDO_KEEP_OLD_DEBUG | 0 | If set, the temporary log files will not be removed at the end of testing. |
The tests are grouped into logical families, which can be toggled at the top of the main test script, "01bc.t". The group of tests labelled TEST_RANDOM_SWAP is off by default, as their randomnes sometimes causes deadlock and serialization errors. Nonetheless, it is recommended to turn this on and run it, and try and discover non-deadlock, non-serialization errors that may pop up.
Please report any bugs found in testing to the mailing list (preferred) or the author of this module.
There are some common terms used when talking about Bucardo:
Once Bucardo has been installed, the next step is to populate its database with the specific information for your replicaton needs. Basically, this means that you need to add information to some of the tables within the bucardo schema. Information can be added using the standard Moose-like methods within Bucardo.pm, or simply by adding rows to the correct database tables within the bucardo database.
Each database involved in the replication must be added to the db table within the bucardo database.
Column | Type | Default | Required? | Description |
---|---|---|---|---|
name | text | None | Yes | A unique name for this database, often the same as the dbname or dbhost for dedicated boxes. Must be of pattern [A-Za-z]\w* |
dbhost | text | Empty string | No | The hostname the database is on. If empty, connection is made locally via Unix sockets. |
dbport | text | 5432 | No | The port number the database is listening on. |
dbname | text | None | Yes | The name of the Postgres database |
dbuser | text | None | Yes | The username to connect as. |
dbpass | text | None | No | The password to connect as. If empty, $ENV{DBI_PASS} and the .pgpass file may be used. |
pgpass | text | None | No | Full path to a .pgpass file |
dbconn | text | Empty string | No | String to add to the end of the generated DSN. |
status | text | 'active' | No | Can be 'active' or 'inactive'. If inactive, no replication to or from this database will occur. |
sourcelimit | smallint | 0 | No | Maximum concurrent Bucardo read connections to this database |
targetlimit | smallint | 0 | No | Maximum concurrent Bucardo write connections to this database |
Example:
INSERT INTO db(name, dbname, dbhost, dbuser) VALUES ('slave1','sales','sales-1.example.com','postgres'), VALUES ('venus','product','venus','ro_user');
Column | Type | Default | Required? | Description |
---|---|---|---|---|
name | Text | None | Yes | Unique name for this database group. Must be of pattern [A-Za-z]\w* |
Column | Type | Default | Required? | Description |
---|---|---|---|---|
db | Text | None | Yes | Name of the database; foreign key to db.name |
dbgroup | Text | None | Yes | Name of the database group; foreign key to dbgroup.name |
priority | Smallint | 0 | No | For ordering within the group: higher numbers go first when syncing |
Example:
INSERT INTO dbgroup(name) VALUES ('readonlys'); INSERT INTO dbmap(db,dbgroup) VALUES ('venus','readonlys'), ('mercury','readonlys');
Each table that needs to be replicated needs to be added to the goat table.
Column | Type | Default | Required? | Description |
---|---|---|---|---|
db | Text | None | Yes | The name of the database this table is in; foreign key to db.name. |
schemaname | Text | 'public' | No | The schema this table belongs to |
tablename | Text | None | Yes | The name of the table |
pkey | Text | None | Yes | The primary key column for this table |
pkeytype | Text | Null | Yes | Type of primary key: must be one of: 'smallint','int','bigint','text','timestamp','date' |
ping | Boolean | Null | No | Issue NOTIFY via a trigger when this table changes? Used to override a sync-level ping. |
has_delta | Boolean | False | No | Whether or not this table has delta rows. Usually okay to leave as is. |
ghost | Boolean | False | No | Table has triggers and rules dropped, but is not replicatedt |
customselect | Text | Null | No | A SELECT statement to transform the data between master and slave |
makedelta | Text | Null | No | Whether to create fake delta rows to enable multi-sync pushes |
rebuild_index | Text | Null | No | Whether to turn off indexes and then rebuild |
standard_conflict | Text | Null | No | The method to resolve collisions for this table, one of: 'source','target','skip','random','latest','abort'. Only needed for 'swap' syncs. |
analyze_after_copy | Boolean | True | No | Whether to run an ANALYZE on the table after a fullcopy sync |
Example:
INSERT INTO goat(db, tablename, pkey, pkeytype, standard_conflict) VALUES('venus', 'inventory', 'id', 'bigint', 'source');
Goats can be grouped together into herds. Goats can belong to one or more herdds via a many to many mapping using the herdmap table.
Column | Type | Default | Required? | Description |
---|---|---|---|---|
name | Text | None | Yes | Unique name for this herd |
Column | Type | Default | Required? | Description |
---|---|---|---|---|
herd | Text | None | Yes | Name of a herd; foreign key to herd.name |
goat | Integer | None | Yes | ID of a goat; foreign key to goat.id |
priority | Smallint | 0 | No | For ordering within a herd: higher numbers go first when syncing |
Example:
INSERT INTO herd(name) VALUES ('merch'); INSERT INTO herdmap(herd,goat) SELECT 'merch', id FROM goat WHERE db='venus' AND tablename IN ('inventory','stats','sales');
The last step is to add syncs, which are individual replication events, added to the sync table:
Column | Type | Default | Required? | Description |
---|---|---|---|---|
name | Text | None | Yes | A unique name for this sync. Must be of pattern [A-Za-z]\w* |
source | Text | None | Yes | Name of the source herd; foreign key to herd.name |
targetdb | Text | None | No* | The target database; foreign key to db.name. Cannot be NULL if targetgroup is NULL. |
targetgroup | Text | None | No* | The target database group; foreign key to dbgroup.name. Cannot be NULL if targetdb is NULL. |
synctype | Text | None | Yes | What type of sync this is, one of: 'pushdelta','fullcopy','swap' |
stayalive | Boolean | True | No | Does the sync controller stay connected when finished? |
kidsalive | Boolean | True | No | Do the children stay connected when finished? |
copyextra | Text | Empty string | No | Extra test to put after COPY command such as "WITH OIDS" |
deletemethod | Text | 'delete' | No | How to delete rows, one of 'delete' 'truncate'. Truncate is fast but does locking. |
limitdbs | Smallint | 0 | No | How many databases can we sync to at once? 0 = all. If set to one, we only sync to one target at a time. |
ping | Boolean | True | No | Are we issuing NOTIFY via triggers? |
do_listen | Boolean | False | No | Allows direct NOTIFY kick calls even if ping is false |
checktime | Interval | Null | No | How often to run the sync if no activity? |
status | Text | 'active' | No | Currently, only 'active' and 'inactive' |
makedelta | Boolean | false | No | Whether to create fake delta rows to enable multi-sync pushes |
rebuild_index | Boolean | false | No | Whether to turn off indexes and then rebuild them |
priority | Smallint | 0 | No | Higher numbered sync run first |
disable_triggers | Text | 'pg_class' | Yes | How to disable triggers, one of: 'pg_class','SQL','replica'. Replica is highly recommended but will only work on Postgres version 8.3 or greater. SQL ("DISABLE TRIGGERS") will lock the tables, while pg_class will not, but will cause the very occasional OID error. |
disable_rules | Text | 'pg_class' | Yes | How to disable rules, one of: 'none','pg_class','replica' |
analyze_after_copy | Boolean | True | No | Whether to run an ANALYZE on the table after a fullcopy sync |
overdue | Interval | '0 seconds' | No | How long until the sync is considered overdue, e.g. for nagios warnings. '0 seconds' = do not check. |
expired | Interval | '0 seconds' | No | How long until the sync is considered expired, e.g. for nagios errors. |
Example:
INSERT INTO sync(name,source,targetdb,synctype,checktime,disable_triggers,disable_rules) VALUES ('merch','prod1','prod2','swap','10 minutes','replica','replica');
Bucardo is started and stopped with the bucardo_ctl script, which should be in the same directory as Bucardo.pm for ease. Usage is to simply give the action and the reason for doing so. The reason is logged to a local file and also sent in the email notice sent then Bucardo is brought up or down. It's polite to sign your name to the reason as well.
./bucardo_ctl stop "Stopping to add a new slave database - Greg" ./bucardo_ctl start "Restarting after donut break - Greg"
The file that contains a log of the stop and start reasons is set at the top of the bucardo_ctl file: it defaults to "/home/bucardo/restart.reason.log". For other arguments, and reason the start and stop arguments are named they are, see the section on bucardo_ctl.
When starting up Bucardo, it is best to watch the logs and make sure no errors occur. By default, the logs are sent to syslog and it is recommended that they be routed to their own file, e.g. "/var/log/bucardo". Upon startup, Bucardo will connect to all the databases used for all sync marked active, and verify that the tables being replicated have the same structure.
When stopping Bucardo, the MCP program will tell the controllers to stop what they are doing and exit. They in turn will send the same message to their kids. Note that any existing replication events will not be cancelled but will wait until finished, so Bucardo may not stop for some time. If you really need to stop all Bucardo activity right away, you can manually kill any Bucardo processes from the command line. Looking for these processes is also the best way to check if Bucardo has completely finished shutting down:
ps -Afwww | grep Bucardo
Bucardo comes with its own cgi script which provides a simplified, HTML table-based view of the status of all the syncs for one or more Bucardo instances. This script also outputs information designed for easy Nagios parsing. There is also a script for grabbing Bucardo information and adding it into a round-robin database (rrd), for use with reports by cacti and other programs. All scripts are located in the Bucardo/scripts directory.
Bucardo does its best to handle any surprises that come up, but there are three common classes of errors that can appear:
If Bucardo fails to start at all, check that you have all the required modules: issue a perl -c Bucardo.pm
. Another common error is not being able to connect to a remote database, or having a table definition change. The Bucardo logs are a valuable source of information: grepping the logs for the string Warning will quickly show you the important items.
If Bucardo detects that a remote host is no longer reachable, that a remote database has crashed, or any other such error, it will send out a warning email, sleep for a little bit, and then attempt to restart. This will continue until the problem fixes itself of Bucardo is manually stopped.
When replicating, Bucardo may come upon a situation it does not know how to handle. A common example is a violation of an existing constraint when doing master to master replication. It is your responsibility to provide Bucardo with write custom exception handling methods for these cases. A good way to detect these is to setup and use the web-based stats page.
Bucardo is very verbose by design in its logging. By default, all logging is done via syslog to LOG_LOCAL1 (this can be changed via the syslog_facility setting in bucardo_config). It is highly recommended that you route bucardo messages to their own file, such as /var/log/bucardo. To do so, add this line to your syslog.conf file:
# Route Bucardo messages local1.* -/var/log/bucardo.log
This assumes that local1 is not being used by anything else of course. All important warnings and errors in the logs will have the string "Warning" inside of them. Thus, an easy way to check all the warnings, from newest to oldest, is:
tac /var/log/bucardo | grep Warning | less
The main way of controlling Bucardo is through bucardo_ctl, a small command-line Perl script that is used to start and stop Bucardo, as well as providing a conveinent way to kick off syncs externally. The general format is an action verb plus a list of direct objects. Tasks that can be performed by bucardo_ctl are:
See the latest bucardo_ctl documentation for more information.
Inside the bucardo database, the bucardo_config table holds many settings used throughout Bucardo. Some changes will require Bucardo to be restarted to take effect, while others can take effect immediately by issuing a "NOTIFY bucardo_reload_config" or using "./bucardo_ctl reload_config".
Name | Default | Description |
---|---|---|
kick_sleep | 0.2 | How long do we sleep while waiting for a kick response? |
mcp_loop_sleep | 0.1 | How long does the main MCP daemon sleep between loops? |
mcp_dbproblem_sleep | 15 | How many seconds to sleep before trying to respawn |
ctl_nothingfound_sleep | 1.0 | How long does the controller loop sleep if nothing is found? |
kid_nothingfound_sleep | 0.1 | How long does a kid sleep if nothing is found? |
kid_nodeltarows_sleep | 0.8 | How long do kids sleep if no delta rows are found? |
kid_serial_sleep | 10 | How long to sleep in seconds if we hit a serialization error |
endsync_sleep | 1.0 | How long do we sleep when custom code requests an endsync? |
mcp_pingtime | 60 | How often do we ping check the MCP? |
ctl_pingtime | 600 | How often do we ping check the CTL? |
kid_pingtime | 60 | How often do we ping check the KID? |
ctl_checkonkids_time | 10 | How often does the controller check on the kids health? |
ctl_checkabortedkids_time | 30 | How often does the controller check the q table for aborted children? |
ctl_createkid_time | 0.5 | How long do we sleep to allow kids-on-demand to get on their feet? |
tcp_keepalives_idle | 10 | How long to wait between each keepalive probe. |
tcp_keepalives_interval | 5 | How long to wait for a response to a keepalive probe. |
tcp_keepalives_count | 2 | How many probes to send. 0 indicates sticking with system defaults. |
piddir | /var/run/bucardo | Directory holding Bucardo PID files |
pidfile | bucardo.pid | Name of the main Bucardo pid file |
stopfile | fullstopbucardo | Name of the semaphore file used to stop Bucardo processes |
log_showpid | 0 | Show PID in the log output? |
log_showtime | 1 | Show timestamp in the log output? |
log_showline | 0 | Show line number in the log output? |
reason_file | /home/bucardo/restart.reason | File to hold reasons for stopping and starting |
syslog_facility | LOG_LOCAL1 | Which syslog facility level to use |
kid_abort_limit | 3 | How many times we will restore an aborted kid before giving up? |
default_email_to | nobody@example.com | Who to send alert emails to |
default_email_from | nobody@example.com | Who the alert emails are sent as |
stats_script_url | http://www.bucardo.org/ | Location of the stats script |
upsert_attempts | 3 | How many times do we try out the upsert loop? |
max_select_clause | 500 | Maximum number of items to select inside of IN() clauses |
max_delete_clause | 200 | Maximum number of items to delete inside of IN() clauses |
Each database that needs to track row changes (has at least one table being used for a swap or a pushdelta sync)will have the following two tables created in the "bucardo" schema:
These tables (and the schema) itself are automatically created and modified as needed. Note that for safety reasons, removing a sync will not remove a table's triggers or remove entries from the bucardo_delta table. This must be done manually, for the risk of removing important data is to high to make it an automatic task.
Bucardo does require some routine maintenance to keep things going, especially if you are processing heavy workloads. Sepcifically, there are some tables that need frequent vacuuming, and some that need frequent cleaning. There are provided examples of cronjobs to perform both of these tasks, but you should adjust them to your own system. The cronjobs will call the following functions:
Bucardo has the ability to run custom code at certain point in the replication process. Code to be run is added to the customcode table, and is either run at the goat or sync level.
Column | Type | Default | Required? | Description |
---|---|---|---|---|
name | Text | None | Yes | A unique name for this code |
about | Text | Null | No | A longer description of the code |
whenrun | Text | None | Yes | When this code should be run: see below |
src_code | Text | None | Yes | The literal source code: a Perl subroutine |
getdbh | Boolean | True | No | Does this code require database handles? |
getrows | Boolean | False | No | Does this code require row information? |
The "whenrun" column must be one of the following:
Value | Level | Description |
---|---|---|
before_sync | sync | Runs before a sync is started, before the controller is created. |
before_txn | sync | Runs before the replication transaction is started. |
before_check_rows | sync | Runs after txn started, but before delta rows are checked. |
before_trigger_drop | sync | Runs immediately before the triggers and rules are disabled. |
after_trigger_drop | sync | Runs immediately before the triggers and rules are disabled. |
after_table_sync | sync | Runs after the tables have been synced. |
before_trigger_enable | sync | Runs immediately before the triggers and rules are enabled. |
after_trigger_enable | sync | Runs immediately after the triggers and rules are enabled. |
after_txn | sync | Runs after the transaction has committed. |
after_sync | sync | Runs after the sync has finished. |
conflict | goat | Runs when a conflict is detected for swap syncs. |
exception | goat | Runs when an exception is raised when replicating rows. |
Code should then be associated with a goat or a sync via the customcode_map table:
Column | Type | Default | Required? | Description |
---|---|---|---|---|
code | Text | None | Yes | Which code; foreign key to code.name |
sync | Text | Null | No* | The sync to attach the code to; foreign key to sync.name. Cannot be NULL if goat is NULL. |
goat | Integer | Null | No* | The goat to attach the code to; foreign key to goat.id. Cannot be NULL if sync is NULL. |
active | Boolean | True | No | Whether to run the code |
priority | Smallint | 0 | No | Higher numbered runs first |
Each coderef (the src_code column) is tested to make sure it compiles as Bucardo is starting up. To prevent any strange effects from having your subroutine run before being provided any real data, a hashref is passed to the subroutine in this phase with a key named dummy. Subroutines should return immediately if this key exists.
All custom code is passed the following information via a hashref as the first arg:
If "get_rows" is true, then an additional key is returned:
Due to the way that the code is evaluated within Bucardo, the custom code should not read in the hashref with a "shift", but as follows:
my ($result) = @_;
Master to master replication requires a way to handle conflict, in the cases where the same row is updated on both databases. Bucardo provides both standard and custom conflict resolution methods, which are always set at the goat (table) level. To set a standard conflict resolution method, simply set the goat.standard_conflict column for the table in question to one of:
Method | Description |
---|---|
source | The source database always wins. |
target | The target database always wins. |
random | One of the two sides is chosen at random as the winner. |
latest | The side most recently changed wins. |
abort | The sync is aborted, and will not continue. |
skip | No action is taken. Not very useful by itself. |
Use of any but "source" and "target" is not recommended for most circumstances.
In some cases, however, applying specific knowledge about your database and business rules is the only way to truly resolve a conflict. To do so, you add an entry into the customcode table, with a the whenrun column set to "conflict". Basically, this is a Perl subroutine that receives information about the conflicting rows, and returns a value that tells Bucardo what to do.
In addition to all of the normal items passed in to custom codes, conflict codes receive an additional key in the hashref named 'rowinfo' which contains:
The hashref that is passed to the conflict subroutine can be modified to let Bucardo know how to handle this conflict. Specifically, the "action" key is a bitmap that can be used to state which side should "win" the conflict:
However, it is also possible to modify the sourcerow and targetrow hashes inside of your subroutine. This makes it possible to put the modified rows back in the same database it came from. Hence:
If nothing should be done about this row, leave it as the default value:
A sync may fail because of an exception thrown by the database. A common example is a unique constraint on a non primary key column. Each table can be assigned one or more exception handlers via the customcode table to try and handle this problems.
In addition to all of the normal items passed in to custom codes, exception codes receive an additional key in the hashref named 'rowinfo' which contains:
It is expected that the excpetion code will change the rows directly via the sourcerow and targetrow keys, or use the database handles to connect back to the databases and fix the condition that caused the exception. Once it has done so, it should set the value of "runagain" to true, and Bucardo will retry the transaction that caused the exception. This process will be repeated, but will eventually throw a fatal error, if the number of attempts is greater than the number of rows for this sync.
Bucardo tracks many things about each sync: when it completed, how long it took, how many rows were processed, etc. This data is stored in the q table, but this table can grow very quickly. To prevent access to this table from getting too slow, older data is moved from the q table to a table called master_q inside of the freezer schema. The master_q table is partitioned by date, so that the stats page only has to select from a few child tables, depending on the date range that is set. The moving of the old rows from the q table to the master_q table is accomplished by the bucardo_purge_q_table() function, which is usually called regularly (e.g. every five minutes) by a cronjob.
All the bucardo processes will respond to a specific NOTIFY message. This can be used to verify that the process is still alive and working normally. For the MCP, the name of the message is "bucardo_mcp_ping". It will issue a "bucardo_mcp_pong" NOTIFY when it receives the message. (This is what "./bucardo_ctl ping" does). The controller and children use their PID (Process Identification number) to construct the ping and pong names. For controllers, the name is "bucardo_ctl_#_ping", where '#' is the PID, and for kids, the name is "bucardo_kid_#_ping". Both will return a "_pong" NOTIFY when they receive the ping.
The latest development version of Bucardo can be checked out like so:
git clone http://bucardo.org/bucardo.git/
Detailed information on how the whole thing works.
Special thanks to Jon Jensen at End Point, who developed the push-tables script (an early ancestor of Bucardo), and who provided much code review, guidance, and worthy advice. Thanks to Ethan Rowe, Jeff Boes, and other End Point colleagues for code review, testing, and a willingness to answer questions quickly and correctly. Mark Johnson was invaluable in finding new ways to break the customcode sections and helping translate theory into working code.
Bucardo was developed for Backcountry.com and much thanks is due to them for their early support, particularly Spencer Christensen and Dave Jenkins. Bucardo would not be as far along as it is if it did not grow up in the demanding, complex, extremely high volume database environment at Backcountry.com.
Bucardo is always improving. Some of the things things on its todo list are:
The canonical place for Bucardo information, releases, bug announcements, etc. is the website http://bucardo.org. There are also some mailing lists available: