Archive for the ‘postgresql’ Category

The slides from my PostgreSQL replication talk at FOSSLC are available here.

The talk covers both Slony and streaming replication. The key points covered in the talk are:

  • Why use replication
  • Some common load balancing architectures
  • 6 Simple steps to setting up Slony
  • 5 Simple steps to setting up streaming replication

I will update this post to link to a video of the talk when FOSSLC makes it available.

Updated: A video of the presentation is available here.

New Features in Slony 2.1

Posted: July 21, 2011 in postgresql

Last week the Slony team released beta3 of Slony 2.1.0. I thought it would be a good idea to blog about some of the changes we have made in Slony 2.1. My personal theme for this release has been usability. I have overheard people complaining about the usability of Slony and hope that these changes go towards improving it.

I'm planning on attending two conferences this September in Denver. The first is the annual OpenStreetMap State Of The Map conference, September 9-11. This year will mark the first time since I've been involved with OpenStreetMap that the main State Of The Map conference has been held in North America. I am looking forward to putting faces to names and meeting lots of awesome mappers. I might be giving a talk on new features in PostgreSQL 9.1 at the conference, but they haven't yet accepted talks or announced the schedule.

Following State Of The Map I will be hanging around in Denver for FOSS4G 2011 (September 12-16). FOSS4G is the annual conference for open source geo-spatial software. I will be giving a talk on 'PostGIS replication' where I will give an overview of built-in replication and Slony. My blog post comparing Slony and 9.0 replication is by far the most popular post on this blog, and the talk will expand on that material.

They are expecting about 1000 people to attend FOSS4G this year. I am expecting there to be a lot of maps and talk about maps. In addition to my talk there are many other PostGIS/PostgreSQL talks on the schedule. If you're going to be attending a conference related to databases this September, what better place to be than Denver? The early registration discounts end on June 30th, so remember to register before the price goes up.

clustertest is the distributed testing framework that we built for testing Slony. While in Ottawa for PGCon, I modified it so that it can also be used to test the streaming replication features built into PostgreSQL 9.1.


I recently wanted to build some PostgreSQL C stored functions on a Win32 machine using the Microsoft Windows SDK. I wanted to build with the Microsoft compiler (Visual C) using nmake files, but without involving the Visual Studio IDE.
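I won't reproduce the whole nmake setup here, but to give a sense of the kind of code being built, here is a minimal PostgreSQL C function using the version-1 calling convention (add_one is purely an illustrative name, not something from my project):

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(add_one);

Datum
add_one(PG_FUNCTION_ARGS)
{
    int32 arg = PG_GETARG_INT32(0);    /* fetch the first argument as a 32-bit integer */

    PG_RETURN_INT32(arg + 1);          /* hand the incremented value back to the executor */
}

Once the DLL is built it gets hooked up on the SQL side with CREATE FUNCTION ... LANGUAGE C pointing at the compiled library.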

My Slony Internals talk at PgEast was well attended and it seemed like most of the room was able to follow along.

Here are the slides from the talk.

Getting ready for PgEast

Posted: March 17, 2011 in postgresql

Next week I'm giving a talk on Slony Internals at PGEast in NYC. I'll be covering the different components that make up Slony and explaining how data flows through a Slony cluster. Working on the slides, I'm reminded of how many moving parts Slony has.

In addition to my talk, Jim Mlodgenski will be giving a talk on multi-master Replication with Slony.

My favourite part of conferences such as PGEast is that I get to put faces to the names and email addresses that I've been communicating with. I know that at least two Slony developers, two of the Slony package maintainers, and a number of regulars from the #slony IRC channel are going to be at PGEast. I'm looking forward to meeting a lot more people in the PostgreSQL community and maybe even learning something about MongoDB.

Slony wishlist

Posted: November 26, 2010 in postgresql

There has been some recent discussion on the slony-general mailing list about what changes should be made in the next major version of Slony.

The Slony team would like to get a sense of which aspects of Slony are causing people issues and what features need to be added. The idea list is being tracked on a wiki page.

Slony 2.0.5 was just released. This version is the result of a lot of testing of the 2.0.x branch, and I feel that 2.0.5 is now more stable and a better choice than 1.2.x for most deployments.

PostgreSQL 9.0 (including streaming replication) is also now out.

Some people are asking whether Slony or streaming replication is the better choice for them. Here is my take on the issue.

Both streaming replication and Slony are asynchronous. If you need synchronous replication then you will have to wait at least until 9.1. However, if you're looking for asynchronous replication, ask yourself the following questions:

  1. Are my master and slave running the same version of PostgreSQL on the same platform?
  2. Does my slave only need to perform failover or read-only queries?
  3. Do I only need one slave? (Multiple slaves can consume the same WAL segments, but it is unclear to me how you would be able to keep the second slave after failing over.)
  4. Do I want my slave to be identical to the master (no extra tables, no extra databases, no extra indices)?

If the answer to all of the above questions is yes then streaming replication in 9.0 might be a good choice for you.

However, if the answer to any of the following questions is yes:

  1. My master and slave are on different hardware platforms?
  2. I want to add some tables for reporting on my slave?
  3. I have multiple databases on my master but only want to replicate some of them to the slave?
  4. For security reasons I want tables to have different permissions on my slave than on my master?
  5. I want to be able to take my master down for hardware maintenance but after I’m done I want to have the master take over from the slave without having to re-copy my entire database?
  6. I want to replicate from A==>B and then have B replicate to C and D?
  7. I can live without automatic DDL replication?

then Slony (or another trigger-based replication system) might be a better choice for you. It is unlikely that the WAL-based replication in PostgreSQL will ever be able to deal with a lot of these use cases. I see many situations where trigger-based replication is appropriate, and I don't see this changing with 9.0 or 9.1.

Slon memory usage

Posted: August 2, 2010 in postgresql

A few people (myself included) have recently commented on how slon on x64 Linux systems tends to use a lot more memory than one would expect. Slony clusters with 3-5 nodes seem to have slon processes consuming 100-180 megs of memory (VmData).

Slon is a multi-threaded process that has a set of threads managing the local database (cleanupThread, localListener, syncThread and the main thread) and a pair of threads (a remoteWorker and a remoteListener) for each remote database that slon has a path to. I should write another post explaining what each of these does.

Each thread created in a multi-threaded process has memory set aside for that thread's stack. A program's memory is divided into a stack (well, one stack per thread) and a heap. Variables declared at function scope go on the stack; when a function calls another function the stack is 'pushed down' so the new function can put its variables on the top of the stack. Dynamically allocated memory goes on the heap. Good programming practice says you should try to put only small things on the stack, such as pointers to things on the heap, single numeric values, or maybe a small string. Larger variables should go on the heap.
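To make the distinction concrete, here is a tiny C fragment (purely illustrative, not code from slon):

#include <stdlib.h>

static void example(void)
{
    int counter = 0;                    /* a small scalar: lives on this function's stack */
    char *buf = malloc(1024 * 1024);    /* the pointer is on the stack, the 1 MB buffer is on the heap */

    /* declaring "char big[1024 * 1024];" here would instead put the whole
       megabyte on the stack, which is exactly what you want to avoid in a
       multi-threaded program */

    free(buf);                          /* heap memory has to be released explicitly */
    (void) counter;
}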

In my early days of programming I remember writing programs on MS-DOS that would occasionally run out of stack space. The program would exit with an error like 'stack space exceeded' and you would need to figure out what was going on. Often these errors were caused by infinite recursion, where a function keeps calling itself (which in turn calls itself) over and over again. Sometimes it was because I tried to allocate lots of large arrays on the stack. You learned not to allocate your larger objects on the stack. I think the default stack size was in the range of 8k.

So how much stack memory is each of my slon threads actually consuming? pthreads allows you to adjust the stack size through its API, but the default amount on Linux is governed by the RLIMIT_STACK setting. So what is RLIMIT_STACK on my x64 Ubuntu laptop?


ulimit -s
8192 kb

8 megabytes!!!!

8 megs * (4 local threads + 2 * 4 remote threads) = 96 megs
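If you want to confirm those numbers from inside a program rather than trusting ulimit, something along these lines works on Linux (pthread_getattr_np is a GNU extension); this is only a sketch, not code from slon:

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <sys/resource.h>

static void *report(void *arg)
{
    pthread_attr_t attr;
    size_t stacksize;

    /* ask glibc how much stack it actually reserved for this thread */
    pthread_getattr_np(pthread_self(), &attr);
    pthread_attr_getstacksize(&attr, &stacksize);
    printf("thread stack size: %zu bytes\n", stacksize);
    pthread_attr_destroy(&attr);
    return NULL;
}

int main(void)
{
    struct rlimit rl;
    pthread_t tid;

    getrlimit(RLIMIT_STACK, &rl);   /* the soft limit that becomes the default thread stack size */
    printf("RLIMIT_STACK: %zu bytes\n", (size_t) rl.rlim_cur);

    pthread_create(&tid, NULL, report, NULL);
    pthread_join(tid, NULL);
    return 0;
}

Built with gcc -pthread, the two numbers printed should match on a stock Ubuntu or Debian system.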

That is just stack space, and it is mostly unused since most data in Slony is allocated on the heap. Why is my Ubuntu system giving me an 8 meg default per-thread stack? In a single-threaded process RLIMIT_STACK specifies the maximum size that the stack can grow to; programs that don't need stacks that large won't have stacks that large. The problem is that in a multi-threaded process pthreads needs to give each thread a contiguous stack when it creates the thread. The stack for each thread can't grow later because it might grow into the next one. Properly written multi-threaded programs tend not to use large stacks. Good C programmers know not to put large things on the stack in a multi-threaded program, and higher level languages such as Perl, Java, and Python go to great pains to implement memory management for heap data.

I question the rationale behind making the default RLIMIT_STACK setting equal to 8 megabytes; both my Ubuntu and Debian systems have this. If RLIMIT_STACK were unlimited then the Linux implementation of pthreads would give me 2 megabytes of stack per thread. I agree it makes sense to have a limited stack size by default (think of programs with infinite recursion bugs), but the challenge is figuring out what the stack size should be for a multi-threaded program. The problem is made worse because stack overflows in a multi-threaded program don't generate exceptions but instead silently write into some other part of memory. Setting the default stack size to a high value like 8 megabytes seems like a cop-out that encourages developers not to pay attention to their stack usage.

So if your slon processes are using more memory than you feel they should, check what your stack segment size is configured as. On Unix systems you can adjust this with the ulimit -s command. Keep in mind that stack pages that aren't actually being used by Slony will tend not to occupy space in the resident set (use up real memory). In a future version of Slony we should look at setting a per-thread stack size that is much lower and closer to what the threads actually need, but we first need to do some analysis to get a sense of how small a stack we can get away with.
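For what it is worth, capping the stack at thread-creation time is a small change per pthread_create call. The sketch below is not a slon patch, just the pthreads API involved; the 256 KB figure is a placeholder, not a number we have validated against slon's real stack usage:

#include <pthread.h>

/* hypothetical worker function standing in for a slon thread */
static void *remote_worker(void *arg)
{
    /* ... per-node replication work would go here ... */
    return NULL;
}

int start_worker(pthread_t *tid)
{
    pthread_attr_t attr;
    int rc;

    pthread_attr_init(&attr);
    /* request a 256 KB stack instead of the RLIMIT_STACK default;
       PTHREAD_STACK_MIN is the lower bound the platform allows */
    pthread_attr_setstacksize(&attr, 256 * 1024);

    rc = pthread_create(tid, &attr, remote_worker, NULL);
    pthread_attr_destroy(&attr);
    return rc;
}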