Testing streaming replication with clustertest

Posted: May 29, 2011 in postgresql

clustertest is the distributed testing framework that we built for testing Slony. While in Ottawa for PGCon, I modified it such that clustertest can be used to test the streaming replication features built into PostgreSQL 9.1.

A repository with the changes to clustertest, including a sample test script, is available on my GitHub fork.

The clustertest framework is a Java program that runs test scripts written in JavaScript using the Rhino engine.

Test Framework
The framework is built with ‘ant jar-coordinator’.
The framework consists of:

  • A test client engine (src/info/slony/clustertest/client) that can be deployed as a client worker on remote machines, where it checks in with the test coordinator for work. In the example below, the client engine runs within the same JVM as the test coordinator.
  • A test coordinator (src/info/slony/clustertest/coordinator) that provides classes for interfacing with PostgreSQL and Slony commands (createdb, initdb, psql, slonik, slon, etc.) and a framework for launching, running, and tracking the results of JavaScript tests.
  • Some utility classes common to both components.

The Example Test
The example test script src/info/slony/clustertest/example/sync_rep_test.js has a JavaScript class Example1 that extends the JavaScript class StreamingRepBase.

Some of the more interesting methods of StreamingRepBase are:

  • initCluster() creates a new database cluster by calling initdb, then enables streaming replication by modifying the postgresql.conf and pg_hba.conf files.
  • setupSlave() copies the data directory and creates a recovery.conf file. This method assumes that postgres is not running when it is called (it does not call pg_start_backup).
  • seedData() populates the disorder tables. It should be called after disorder-1.sql has been run but before disorder-2.sql.
  • generateLoad() creates a set of client workers that perform database transactions. These transactions run in the background until stop() is called on the ClientScript object.
  • sync() grabs the current WAL write_location from the master and then waits until the slave's replay_location has caught up to that point.
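
The catch-up check that sync() performs can be sketched in plain JavaScript. This is a minimal illustration, not the code in StreamingRepBase: parseWalLocation, compareWalLocations and waitForReplay are hypothetical names, and the readReplayLocation and sleep callbacks stand in for whatever the framework uses to query the slave and to pause. It assumes WAL locations are reported as two hexadecimal numbers separated by a slash (for example 0/3000158), which should be compared numerically rather than as strings.

```javascript
// Parse a WAL location such as "0/3000158" into two integers.
// Assumption: the location is two hexadecimal numbers separated by "/".
function parseWalLocation(text) {
  const match = /^([0-9A-Fa-f]+)\/([0-9A-Fa-f]+)$/.exec(String(text).trim());
  if (!match) {
    throw new Error("not a WAL location: " + text);
  }
  return { high: parseInt(match[1], 16), low: parseInt(match[2], 16) };
}

// Negative when a is before b, zero when equal, positive when a is after b.
function compareWalLocations(a, b) {
  const left = parseWalLocation(a);
  const right = parseWalLocation(b);
  if (left.high !== right.high) {
    return left.high - right.high;
  }
  return left.low - right.low;
}

// Poll until the slave has replayed up to the target location.
// readReplayLocation and sleep are hypothetical callbacks supplied by the caller.
// Returns the number of polls that were needed.
function waitForReplay(target, readReplayLocation, sleep, maxAttempts) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (compareWalLocations(readReplayLocation(), target) >= 0) {
      return attempt;
    }
    sleep(1000); // wait a second before asking again
  }
  throw new Error("slave did not reach " + target + " after " + maxAttempts + " attempts");
}
```

For example, compareWalLocations("0/10000000", "0/9000000") is positive, even though a plain string comparison would order the two locations the other way.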

Configuring the Example Test
The file src/info/slony/clustertest/example/sync_rep_example.properties contains the property settings used by the test, such as where to find the PostgreSQL binaries and which ports, users, and passwords to use.

    To run the sample test (after editing the properties file), run:
    java -jar clustertest-coordinator.jar src/info/slony/clustertest/examples/sync_rep_example.js src/info/slony/clustertest/examples/sync_rep_example.properties

    A sample test script
    The example test script will:

    1. Perform an initdb to create a new database cluster
    2. Modify the postgresql.conf to use the port specified in the properties file and configure synchronous streaming replication
    3. Copy the data directory so it can act as a base backup for a slave
    4. Perform the required config changes on the slave data directory
    5. Start up both instances of PostgreSQL
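
As a rough picture of the configuration changes in steps 2 to 4, the sketch below builds the kind of lines involved. It is an assumption about what such a setup might look like on PostgreSQL 9.1, not a copy of what StreamingRepBase writes: the function names, the standby name, the replication user, the 127.0.0.1 address, the trust authentication and the max_wal_senders value are all hypothetical choices made for illustration.

```javascript
// Lines appended to the master's postgresql.conf (values are illustrative).
function masterConfLines(port, standbyName) {
  return [
    "port = " + port,
    "wal_level = hot_standby",
    "max_wal_senders = 3",
    // Naming a standby here is what makes replication synchronous.
    "synchronous_standby_names = '" + standbyName + "'"
  ];
}

// Line appended to the master's pg_hba.conf so the slave may connect
// for replication (hypothetical user, local connections only).
function hbaReplicationLine(user) {
  return "host replication " + user + " 127.0.0.1/32 trust";
}

// Lines appended to postgresql.conf in the copied (slave) data directory.
// The slave needs its own port when both servers share a machine.
function slaveConfLines(port) {
  return [
    "port = " + port,
    "hot_standby = on"
  ];
}

// Contents of recovery.conf in the slave data directory.
// application_name is matched against synchronous_standby_names on the master.
function recoveryConfLines(masterPort, user, standbyName) {
  return [
    "standby_mode = 'on'",
    "primary_conninfo = 'host=127.0.0.1 port=" + masterPort +
      " user=" + user + " application_name=" + standbyName + "'"
  ];
}
```

A script could append masterConfLines(5432, "slave1") to the master's postgresql.conf and write recoveryConfLines(5432, "postgres", "slave1") to recovery.conf in the copied data directory. The port and names here are placeholders for values that would come from the properties file.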

    Once both instances of PostgreSQL are running, the test script will set up the disorder schema. disorder is a simulated retail workload that was written to test Slony. The schema consists of tables storing orders, items, inventory, and customers, along with stored procedures that populate the dataset.

    The sample script will then

    1. Generate some load by launching a set of concurrent clients that execute scripts (see disorder.js) that perform business transactions on the store (buy items, add inventory, etc.).
    2. The load will run for a minute then it is stopped. The test script will then wait until the slave database is caught up in applying the WAL for these transactions.
    3. The testing framework compares the contents of the two databases to make sure all data is visible on the slave.
    4. The testing framework then starts up a long-running transaction on the slave. This transaction will read all rows in do_inventory and then sit idle with the transaction open.
    5. While the transaction on the slave is open, the test script generates more load on the master and verifies that the order count is increasing. The test script also periodically vacuums the do_inventory table.
    6. The transaction load will run for 5 minutes. Then the test script will verify that the original slave transaction has in fact been aborted. We expect the vacuum to invoke the cancellation behaviour in PostgreSQL 9.1 for slaves. If this does not happen the test records a failure.
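
The comparison in step 3 can be thought of as checking that both servers return the same multiset of rows for each table. The sketch below shows one way to do that in plain JavaScript. It is an illustration only, not the comparison code that clustertest uses: diffRows is a hypothetical helper, and it assumes each row has already been fetched as an array of column values.

```javascript
// Compare two lists of rows without depending on row order.
// Each row is an array of column values; duplicates are counted.
function diffRows(masterRows, slaveRows) {
  // Count how many times each distinct row appears on the master.
  const counts = new Map();
  for (const row of masterRows) {
    const key = JSON.stringify(row);
    counts.set(key, (counts.get(key) || 0) + 1);
  }

  // Cross off each slave row; anything left over is unexpected.
  const extraOnSlave = [];
  for (const row of slaveRows) {
    const key = JSON.stringify(row);
    const remaining = counts.get(key) || 0;
    if (remaining > 0) {
      counts.set(key, remaining - 1);
    } else {
      extraOnSlave.push(row);
    }
  }

  // Rows still counted were never seen on the slave.
  const missingOnSlave = [];
  for (const [key, remaining] of counts) {
    for (let i = 0; i < remaining; i++) {
      missingOnSlave.push(JSON.parse(key));
    }
  }
  return { missingOnSlave: missingOnSlave, extraOnSlave: extraOnSlave };
}
```

A test would record a failure if either list in the result is non-empty after the slave has caught up.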

    When the test run finishes, the results can be viewed in the results/$testrun/testResult.test.txt file.

    This sample is intended to demonstrate how clustertest can be used to test PostgreSQL built-in replication. It is not a comprehensive regression test for streaming replication. The changes to the clustertest framework have not yet been merged into the main clustertest git repository but are available on my GitHub fork.

  1. Great tool! Will clustertest test also schema modifications?
