As a part of the project, I needed to make replication of postgresql database to be available on different server for analization use.
Main database is being filled constantly with data while the replicated database needs to be accessible by several services in read-only mode on different segment of the network. Because of that and need of local backup direct connection of those services to main database wasn’t an option. So, plan for postgresql replication was needed. Also, as these two servers aren’t connected by secure network or any kind of VPN somehow they needed to have communication channel between them for transfering of the replication data solved – in a secure and least obtrusive way possible. Data status on the slave side doesn’t affect main server and there is possiblity of temporary out-of-syncs (in case of some network failure) which is good enough for the requirements of the project so replication can be done asynchonously.
Unfortunately, postgresql (still) doesn’t have native replication mechanisms so some help from postgresql replication projects was needed. Although pgFoundry has quite a few replication projects listed only few were a real option (working with current versions of postgresql, no need to patch production databases, project in relatively stable state, simple master-slave asycnhronous replication). In the end SkyTools, an open sourced project by the skype developers (seems like they are using this also for replication of their postgresql cluster) was chosen for the job.
As for obtaining secure communication between the servers part – good ol’ ssh is used, one port needed, one-way connection (from slave replication server to master) to keep things secure as much as possible on the replicated server.
So, with tools chosen here is how it’s all put together, we’ll use some abbreviations to keep things simple:
- MasterIP – IP of the master postgresql server
- MPort – port of the postgresql server on master server
- MDbase – master database name
- SPort – port of the postgresql server on slave
- MSPort – local tunneled port for postgresql master server on the slave
- SDbase – slave database name
Connection part
For the use, simple tunneling of the master postgresql port to slave with ssh -L MSPort:127.0.0.1:MPort MasterIP fulfills our needs. Although, as things go, once connection is broken it’s bye-bye replication so we need to make sure this connection stays up and keeps the tunnel open for the data.
So, meet autossh, a ssh connection monitoring program. After install (either thru apt/yum or manual compile) you just need to start it, this thing doesn’t have much features but it does what it’s intended for remarkably well. It doesn’t just monitor the ssh process, but also checks for traffic on the connection and checks for the ssh connectivity (so it doesn’t try to repeatedly login to server in case of wrong credentials or network failure). Traffic checking on the connection is made with few additional ports, their range is set by the -M parameter.
So to start the tunnel and keep it open (use of ssh authentication keys is recommended here):
autossh -M 20000 -f ssh -L MSPort:127.0.0.1:MPort MasterIP
Preparation of the postgresql server(s)
As already stated, very few modifications need to be done on the database servers itself, mainly it’s adding a few python functions and tables needed for skytools…
Install the SkyTools from pgFoundry page (simple ./configure && make && make install usually does the trick, although some dependencies might be needed; like postgresql-devel and python-psycopg2 packages). Additional note for the RedHat users: skytools modules by default get installed in “/usr/local/lib/python..”, while the system expects them at “/usr/lib/python…” – symlink or addition to path will solve this without much problems.
Slave server should also contain the same schema for the database that needs to be replicated as it’s existent on master server.
SkyTools configuration
SkyTools need to be installed on both master and slave (theoreticaly it could work from slave only but it would probably be too slow for this type of connection between the servers), and config files shall be created by hand; to keep things neat and clean we make /etc/skytools directory on both servers and make config files:
On master server
/etc/skytools/ticker.ini
[pgqadm]
job_name = pgqadm_ticker
db = dbname=MDbase port=MPort host=127.0.0.1
# how often to run maintenance [seconds]
maint_delay = 600
# how often to check for activity [seconds]
loop_delay = 0.1
logfile = /tmp/%(job_name)s.log
pidfile = /tmp/%(job_name)s.pid
On slave server
/etc/skytools/conf.ini
[londiste]
job_name = SPDLABreplicate_dbase
provider_db = dbname=MDbase port=MSPort host=127.0.0.1
subscriber_db = dbname=SDbase port=SPort host=127.0.0.1
# it will be used as sql ident so no dots/spaces
pgq_queue_name = SPDLABreplicator
logfile = /tmp/%(job_name)s.log
pidfile = /tmp/%(job_name)s.pid
Database preparation and replication start
After this is done, we can start creating the neccessary modifications to the databases and start the replication (commands should be executed by the postgresql superuser, each command states at which server):
ON MASTER - pgqadm.py /etc/skytools/ticker.ini install (installing the ticker sql on master database)
ON SLAVE – londiste.py /etc/skytools/conf.ini provider install (installing the londiste provider sql)
ON SLAVE – londiste.py /etc/skytools/conf.ini subscriber install (installing the londiste subscriber sql)
ON MASTER – pgqadm.py /etc/skytools/ticker.ini register SPDLABreplicator SPDLABreplicate_dbase (registering the consumer for ticker)
ON MASTER – pgqadm.py -d /etc/skytools/ticker.ini ticker (starting the ticker daemon; can be stopped with -s switch)
ON SLAVE – londiste.py /etc/skytools/conf.ini provider add –all (adding all tables on provider for replication)
ON SLAVE – londiste.py /etc/skytools/conf.ini subscriber add –all (adding all tables on subscriber for replication)
ON SLAVE – londiste.py -d /etc/skytools/conf.ini replay (starting the londiste daemon, can be stopped with -s switch)
Well, after this replication should have started and slave would be in sync and kept in that state. In case that’s not the so, check the log files defined in config files and use these commands for debugging:
ON MASTER – pgqadm.py /etc/skytools/ticker.ini status (checking the status of the ticker and updates)
ON SLAVE – londiste.py /etc/skytools/conf.ini subscriber tables (checking the state of the tables)
Additional notes
SkyTools seems to be quite undocumented regarding these procedures of installing and maintaing sync for replication and steps explained above seem to be working fine for me, while you might get problems somewhere along the way; so it’s good to keep an eye on the original documentation when some additional help is needed…
Additional notes 2
Sometimes things can go wrong and replication stops – in order to cope with that try to:
Restart the service ON SLAVE – londiste.py /etc/skytools/conf.ini -r
Still no progress?
Try to make a full new copy:
ON SLAVE – londiste.py /etc/skytools/conf.ini subscriber remove –all
ON SLAVE – londiste.py /etc/skytools/conf.ini subscriber add –all
ON SLAVE – disable constraints, triggers and such which could slow down the database
ON SLAVE – londiste.py /etc/skytools/conf.ini copy (if not possible try with –reset option to start as completely new copy with truncate, replay option can also be used but gives less info on the process)
ON SLAVE - In the end re-add the constraints on tables
When it’s really stuck, you can even try with dropping the whole database on subscriber end and reinstall and re-add subscriber stuff, along with initial copy (in the original docs it suggests you make initial copy before adding the tables for subscriber).