In the last post we had a look at the installation and setup of pg_auto_failover. We currently have one primary cluster and two replicas synchronizing from this primary. But there is potentially an issue in this setup: the monitor runs beside the primary instance on the same node, and if that node goes down the monitor is gone as well. What happens in that case, and how can we avoid it? We also did not look at controlled switchovers, and this is definitely something you want to have in production: from time to time you'll need to do maintenance on one of the nodes, and switching the primary cluster to another node is very handy in such situations. Let's start with the simple case and have a look at switchovers first.
This is the current state of the setup:
postgres@pgaf1:~$ pg_autoctl show state --pgdata /u02/pgdata/13/monitor/
  Name |  Node |                      Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+--------------------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | pgaf1.it.dbi-services.com:5432 | 0/6002408 |       yes |             primary |             primary
node_2 |     2 | pgaf2.it.dbi-services.com:5432 | 0/6002408 |       yes |           secondary |           secondary
node_3 |     3 | pgaf3.it.dbi-services.com:5432 | 0/6002408 |       yes |           secondary |           secondary
Before we attempt a switchover, you should be aware of your replication settings:
postgres@pgaf1:~$ pg_autoctl get formation settings --pgdata /u02/pgdata/13/monitor/
  Context |    Name |                   Setting | Value
----------+---------+---------------------------+-------------------------------------------------------------
formation | default |      number_sync_standbys | 1
  primary |  node_1 | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)'
     node |  node_1 |        candidate priority | 50
     node |  node_2 |        candidate priority | 50
     node |  node_3 |        candidate priority | 50
     node |  node_1 |        replication quorum | true
     node |  node_2 |        replication quorum | true
     node |  node_3 |        replication quorum | true
What does this tell us:
- synchronous_standby_names: We're using synchronous replication and at least one of the two replicas needs to confirm a commit (this is a PostgreSQL setting)
- number_sync_standbys=1: At least one standby needs to confirm the commit (this is a pg_auto_failover setting)
- candidate priority=50: This specifies which replica gets promoted. With the default of 50, all replicas have the same chance of being selected for promotion, and the monitor picks the one with the most advanced LSN (this is a pg_auto_failover setting)
- replication quorum=true: true means synchronous replication, false means asynchronous replication (this is a pg_auto_failover setting). All of these pg_auto_failover settings can be changed at runtime, as sketched below.
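As a sketch of how to change these settings at runtime (the values below are illustrative; double-check the exact syntax with pg_autoctl set --help on your version):

# raise the promotion chance of the local node (run on that node)
pg_autoctl set node candidate-priority --pgdata /u02/pgdata/13/PG1/ 80
# let the local node replicate asynchronously
pg_autoctl set node replication-quorum --pgdata /u02/pgdata/13/PG1/ false
# require two standbys to confirm each commit (formation-wide, run against the monitor)
pg_autoctl set formation number-sync-standbys --pgdata /u02/pgdata/13/monitor/ 2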
You may have noticed the "formation" keyword above. A formation is a set of PostgreSQL clusters that are managed together, which means you can use the same monitor to manage multiple sets of PostgreSQL clusters. We are using the default formation in this example.
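A sketch of what working with an additional formation could look like (the formation name app2 is made up for illustration; verify the flags with pg_autoctl create formation --help):

# create a second, hypothetical formation named "app2" on the same monitor
pg_autoctl create formation --pgdata /u02/pgdata/13/monitor/ --formation app2 --kind pgsql --dbname app2
# the state can then be queried per formation
pg_autoctl show state --pgdata /u02/pgdata/13/monitor/ --formation app2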
Let's assume we need to do some maintenance on our primary node and therefore want to switch the primary instance over to another node. The command to do that is simple:
postgres@pgaf1:~$ pg_autoctl perform switchover --pgdata /u02/pgdata/13/PG1/
16:10:05 15960 INFO  Targetting group 0 in formation "default"
16:10:05 15960 INFO  Listening monitor notifications about state changes in formation "default" and group 0
16:10:05 15960 INFO  Following table displays times when notifications are received
    Time |   Name |  Node |                      Host:Port |       Current State |      Assigned State
---------+--------+-------+--------------------------------+---------------------+--------------------
16:10:05 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |             primary |            draining
16:10:05 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |            draining |            draining
16:10:05 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |           secondary |          report_lsn
16:10:05 | node_3 |     3 | pgaf3.it.dbi-services.com:5432 |           secondary |          report_lsn
16:10:06 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |          report_lsn |          report_lsn
16:10:06 | node_3 |     3 | pgaf3.it.dbi-services.com:5432 |          report_lsn |          report_lsn
16:10:06 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |          report_lsn |   prepare_promotion
16:10:06 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |   prepare_promotion |   prepare_promotion
16:10:06 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |   prepare_promotion |    stop_replication
16:10:06 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |            draining |      demote_timeout
16:10:06 | node_3 |     3 | pgaf3.it.dbi-services.com:5432 |          report_lsn |      join_secondary
16:10:06 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |      demote_timeout |      demote_timeout
16:10:06 | node_3 |     3 | pgaf3.it.dbi-services.com:5432 |      join_secondary |      join_secondary
16:10:07 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |    stop_replication |    stop_replication
16:10:07 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |    stop_replication |        wait_primary
16:10:07 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |      demote_timeout |             demoted
16:10:07 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |             demoted |             demoted
16:10:07 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |        wait_primary |        wait_primary
16:10:07 | node_3 |     3 | pgaf3.it.dbi-services.com:5432 |      join_secondary |           secondary
16:10:07 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |        wait_primary |             primary
16:10:07 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |             demoted |          catchingup
16:10:07 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |        wait_primary |        join_primary
16:10:07 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |        join_primary |        join_primary
16:10:08 | node_3 |     3 | pgaf3.it.dbi-services.com:5432 |           secondary |           secondary
16:10:08 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |          catchingup |          catchingup
16:10:08 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |          catchingup |           secondary
16:10:08 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |        join_primary |             primary
16:10:08 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |           secondary |           secondary
16:10:08 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |             primary |             primary
postgres@pgaf1:~$
The progress messages are written to the screen, so you can actually see what happens. As the services are started with systemd, you can also have a look at the journal:
-- Logs begin at Thu 2020-12-10 15:17:38 CET, end at Thu 2020-12-10 16:11:26 CET. --
Dec 10 16:10:08 pgaf1 pg_autoctl[327]: 16:10:08 399 INFO  FSM transition from "catchingup" to "secondary": Convinced the monitor that I'm
Dec 10 16:10:08 pgaf1 pg_autoctl[327]: 16:10:08 399 INFO  Transition complete: current state is now "secondary"
Dec 10 16:10:08 pgaf1 pg_autoctl[341]: 16:10:08 397 INFO  node 1 "node_1" (pgaf1.it.dbi-services.com:5432) reported new state "secondary"
Dec 10 16:10:08 pgaf1 pg_autoctl[341]: 16:10:08 397 INFO  New state for node 1 "node_1" (pgaf1.it.dbi-services.com:5432): secondary ➜ sec
Dec 10 16:10:08 pgaf1 pg_autoctl[327]: 16:10:08 399 INFO  New state for this node (node 1, "node_1") (pgaf1.it.dbi-services.com:5432): se
Dec 10 16:10:08 pgaf1 pg_autoctl[341]: 16:10:08 397 INFO  node 2 "node_2" (pgaf2.it.dbi-services.com:5432) reported new state "primary"
Dec 10 16:10:08 pgaf1 pg_autoctl[341]: 16:10:08 397 INFO  New state for node 2 "node_2" (pgaf2.it.dbi-services.com:5432): primary ➜ prima
Dec 10 16:10:08 pgaf1 pg_autoctl[327]: 16:10:08 399 INFO  New state for node 2 "node_2" (pgaf2.it.dbi-services.com:5432): primary ➜ prima
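To follow these transitions live, standard journalctl usage works; I am assuming here that the service unit is named pgautofailover, as in the unit file that pg_autoctl show systemd generates (adapt the name if yours differs):

# follow the pg_auto_failover messages as they arrive
sudo journalctl -u pgautofailover -f
# or only show what happened in the last ten minutes
sudo journalctl -u pgautofailover --since "10 minutes ago"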
The second node was selected as the new primary, and we can of course confirm that:
postgres@pgaf1:~$ pg_autoctl show state --pgdata /u02/pgdata/13/monitor/
  Name |  Node |                      Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+--------------------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | pgaf1.it.dbi-services.com:5432 | 0/60026F8 |       yes |           secondary |           secondary
node_2 |     2 | pgaf2.it.dbi-services.com:5432 | 0/60026F8 |       yes |             primary |             primary
node_3 |     3 | pgaf3.it.dbi-services.com:5432 | 0/60026F8 |       yes |           secondary |           secondary
postgres@pgaf1:~$
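Independently of the monitor, you can cross-check the roles directly in PostgreSQL: pg_is_in_recovery() returns true on a replica and false on a primary:

# should return "f" on the new primary (pgaf2) and "t" on pgaf1 and pgaf3
psql -c "select pg_is_in_recovery();" postgres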
Next test: what happens when we reboot a node that currently runs a replica? Let's reboot pgaf3, as it is currently a replica and does not run the monitor:
postgres@pgaf3:~$ sudo reboot
postgres@pgaf3:~$ Connection to 192.168.22.192 closed by remote host.
Connection to 192.168.22.192 closed.
Watching the state, the "Reachable" status changes to "no" for the third node and its LSN falls behind:
postgres@pgaf1:~$ pg_autoctl show state --pgdata /u02/pgdata/13/monitor/
  Name |  Node |                      Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+--------------------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | pgaf1.it.dbi-services.com:5432 | 0/60026F8 |       yes |           secondary |           secondary
node_2 |     2 | pgaf2.it.dbi-services.com:5432 | 0/60026F8 |       yes |             primary |             primary
node_3 |     3 | pgaf3.it.dbi-services.com:5432 | 0/6000000 |        no |           secondary |           secondary
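If you want to follow the recovery continuously instead of re-running the command, the standard watch utility is all you need:

# refresh the cluster state display every two seconds
watch -n 2 pg_autoctl show state --pgdata /u02/pgdata/13/monitor/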
Once the node is back, the replica rejoins the configuration and catches up again:
postgres@pgaf1:~$ pg_autoctl show state --pgdata /u02/pgdata/13/monitor/
  Name |  Node |                      Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+--------------------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | pgaf1.it.dbi-services.com:5432 | 0/60026F8 |       yes |           secondary |           secondary
node_2 |     2 | pgaf2.it.dbi-services.com:5432 | 0/60026F8 |       yes |             primary |             primary
node_3 |     3 | pgaf3.it.dbi-services.com:5432 | 0/6000000 |       yes |           secondary |           secondary
...
postgres@pgaf1:~$ pg_autoctl show state --pgdata /u02/pgdata/13/monitor/
  Name |  Node |                      Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+--------------------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | pgaf1.it.dbi-services.com:5432 | 0/6013120 |       yes |           secondary |           secondary
node_2 |     2 | pgaf2.it.dbi-services.com:5432 | 0/6013120 |       yes |             primary |             primary
node_3 |     3 | pgaf3.it.dbi-services.com:5432 | 0/6013120 |       yes |           secondary |           secondary
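On the primary you can additionally confirm that both standbys are streaming again; given the 'ANY 1 (...)' setting in synchronous_standby_names above, both should report a sync_state of "quorum":

# run on the current primary (pgaf2); expect both pgautofailover_standby_*
# entries with state "streaming"
psql -c "select application_name, state, sync_state from pg_stat_replication;" postgres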
But what happens if we shut down the monitor node?
postgres@pgaf1:~$ sudo systemctl poweroff
postgres@pgaf1:~$ Connection to 192.168.22.190 closed by remote host.
Connection to 192.168.22.190 closed.
Checking the status on the node which currently hosts the primary cluster:
postgres@pgaf2:~$ pg_autoctl show state --pgdata /u02/pgdata/13/PG1/
10:26:52 1293 WARN  Failed to connect to "postgres://autoctl_node@pgaf1.it.dbi-services.com:5433/pg_auto_failover?sslmode=require", retrying until the server is ready
10:26:52 1293 ERROR Connection to database failed: timeout expired
10:26:52 1293 ERROR Failed to connect to "postgres://autoctl_node@pgaf1.it.dbi-services.com:5433/pg_auto_failover?sslmode=require" after 1 attempts in 2 seconds, pg_autoctl stops retrying now
10:26:52 1293 ERROR Failed to retrieve current state from the monitor
As the monitor is down, we can no longer ask for the status. The primary and the remaining replica are still up and running, but we have lost the ability to interact with pg_auto_failover, and automated failovers cannot happen, since it is the monitor that orchestrates them. Booting the monitor node back up brings it back into the game:
postgres@pgaf2:~$ pg_autoctl show state --pgdata /u02/pgdata/13/PG1/
  Name |  Node |                      Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+--------------------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | pgaf1.it.dbi-services.com:5432 | 0/6000000 |       yes |           secondary |           secondary
node_2 |     2 | pgaf2.it.dbi-services.com:5432 | 0/6013240 |       yes |             primary |             primary
node_3 |     3 | pgaf3.it.dbi-services.com:5432 | 0/6013240 |       yes |           secondary |           secondary
This has a consequence: the monitor should not run on any of the PostgreSQL nodes but on a separate node dedicated to the monitor. As you can manage more than one HA setup with the same monitor, this should not be an issue, though. It also means that the monitor is a single point of failure for managing and automating failovers, so the health of the monitor is critical for pg_auto_failover.
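Note that client connections do not depend on the monitor, as long as applications use the multi-host connection string that pg_auto_failover provides; a sketch (the exact URI depends on your setup):

# print the connection strings for the monitor and the formation
pg_autoctl show uri --pgdata /u02/pgdata/13/PG1/
# the formation URI looks roughly like this; libpq then finds the primary on its own:
# postgres://pgaf1.it.dbi-services.com:5432,pgaf2.it.dbi-services.com:5432,pgaf3.it.dbi-services.com:5432/postgres?target_session_attrs=read-write

With such a connection string, a monitor outage degrades manageability and automated failover, but not client connectivity.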