MySQL高可用方案 MHA之四 keepalived 半同步复制

主从架构（开启5.7的增强半同步模式）
master:
10.150.20.90 　　 ed3jrdba90
slave:
10.150.20.97 　　 ed3jrdba97
10.150.20.132 　　 ed3jrdba132
manager:
10.150.20.95 　　 ed3jrdba95
vip:10.150.20.200

MySQL:5.7

os:CentOS 7.3
网卡名：ens3

这里，ed3jrdba90、ed3jrdba97部署keepalived,vip绑定在ed3jrdba90，并且ed3jrdba90节点为MHA master主库，ed3jrdba97为candidate_master。

1:配置文件candidate_masterkeepalived.conf如下：
# cat /etc/keepalived/keepalived.conf

! Configuration File for keepalivedglobal_defs {
notification_email {
eric@gmail.cn
}
notification_email_from eric@gmail.cn
smtp_server 127.0.0.1
smtp_connect_timeout 30
router_id mysql_mha
}
vrrp_instance VI_1 {
state master
interface ens3
nopreempt
virtual_router_id 51
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
10.150.20.200
}
}virtual_server 10.150.20.200 33061 {
delay_loop 6
lb_algo wrr
lb_kind DR
nat_mask 255.255.255.0
persistence_timeout 50
protocol TCP
real_server 10.150.20.90 33061 {
weight 1
notify_down /etc/keepalived/shutdown_keepalived.sh
TCP_CHECK {
connect_timeout 10
nb_get_retry 3
connect_port 33061
}
}
}

# cat /etc/keepalived/shutdown_keepalived.sh

systemctl stop keepalived
检查vip绑定在ed3jrdba90的ens3:1
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 54:52:00:01:61:2b brd ff:ff:ff:ff:ff:ff
inet 10.150.20.90/24 brd 10.150.20.255 scope global ens3
valid_lft forever preferred_lft forever
inet 10.150.20.200/32 scope global ens3
valid_lft forever preferred_lft forever

ed3jrdba95节点：
MHA manager配置文件
# vi /etc/mysql_mha/app1.cnf
[server default]
manager_workdir=/data/mysql_mha/app1
manager_log=/data/mysql_mha/app1-manager.log

master_binlog_dir=/data/mysql_33061/logs

user=mha_monitor
password=mha_monitor
ping_interval=5
remote_workdir=/data/mysql_mha/app1

master_ip_failover_script= /usr/local/bin/master_ip_failover
#master_ip_online_change_script= /usr/local/bin/master_ip_online_change

secondary_check_script=/usr/local/bin/masterha_secondary_check -s 10.150.20.97 -s 10.150.20.90

repl_user=replicator
repl_password=replicator

report_script=/usr/local/bin/send_report
shutdown_script=””
ssh_user=root

[server1]
hostname=10.150.20.90
port=33061

[server2]
hostname=10.150.20.97
port=33061
candidate_master=1
check_repl_delay=0

[server3]
hostname=10.150.20.132
port=33061

master_ip_failover脚本:

# cat /usr/local/bin/master_ip_failover

#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command,$ssh_user,$orig_master_host,$orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip,$new_master_port
);my $vip = '10.150.20.200';
my $ssh_start_vip = "systemctl start keepalived";
my $ssh_stop_vip = "systemctl stop keepalived";GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s'=>\$orig_master_ip,
'orig_master_port=i'=>\$orig_master_port,
'new_master_host=s'=>\$new_master_host,
'new_master_ip=s'=>\$new_master_ip,
'new_master_port=i'=> \$new_master_port,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
my $exit_code = ;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = ;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {my $exit_code = ;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = ;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
#`ssh $ssh_user\@cluster1 \" $ssh_start_vip \"`;
exit ;
}
else {
&usage();
exit ;
}
}# A simple system call that enable the VIP on the new master
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
return  unless ($ssh_user);
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

master_ip_failover

检测mha复制环境：

# masterha_check_repl –conf=/etc/mysql_mha/app1.cnf

Thu Dec  ::  - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Dec  ::  - [info] Reading application default configuration from /etc/mysql_mha/app1.cnf..
Thu Dec  ::  - [info] Reading server configuration from /etc/mysql_mha/app1.cnf..
Thu Dec  ::  - [info] MHA::MasterMonitor version 0.58.
Thu Dec  ::  - [info] GTID failover mode =
Thu Dec  ::  - [info] Dead Servers:
Thu Dec  ::  - [info] Alive Servers:
Thu Dec  ::  - [info] 10.150.20.90(10.150.20.90:)
Thu Dec  ::  - [info] 10.150.20.97(10.150.20.97:)
Thu Dec  ::  - [info] 10.150.20.132(10.150.20.132:)
Thu Dec  ::  - [info] Alive Slaves:
Thu Dec  ::  - [info] 10.150.20.97(10.150.20.97:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Thu Dec  ::  - [info] Replicating from 10.150.20.90(10.150.20.90:)
Thu Dec  ::  - [info] 10.150.20.132(10.150.20.132:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Thu Dec  ::  - [info] Replicating from 10.150.20.90(10.150.20.90:)
Thu Dec  ::  - [info] Current Alive Master: 10.150.20.90(10.150.20.90:)
Thu Dec  ::  - [info] Checking slave configurations..
Thu Dec  ::  - [info] read_only= is not set on slave 10.150.20.97(10.150.20.97:).
Thu Dec  ::  - [warning] relay_log_purge= is not set on slave 10.150.20.97(10.150.20.97:).
Thu Dec  ::  - [info] read_only= is not set on slave 10.150.20.132(10.150.20.132:).
Thu Dec  ::  - [info] Checking replication filtering settings..
Thu Dec  ::  - [info] binlog_do_db= , binlog_ignore_db=
Thu Dec  ::  - [info] Replication filtering check ok.
Thu Dec  ::  - [info] GTID (with auto-pos) is not supported
Thu Dec  ::  - [info] Starting SSH connection tests..
Thu Dec  ::  - [info] All SSH connection tests passed successfully.
Thu Dec  ::  - [info] Checking MHA Node version..
Thu Dec  ::  - [info] Version check ok.
Thu Dec  ::  - [info] Checking SSH publickey authentication settings on the current master..
Thu Dec  ::  - [info] HealthCheck: SSH to 10.150.20.90 is reachable.
Thu Dec  ::  - [info] Master MHA Node version is 0.58.
Thu Dec  ::  - [info] Checking recovery script configurations on 10.150.20.90(10.150.20.90:)..
Thu Dec  ::  - [info] Executing command: save_binary_logs --command=test --start_pos= --binlog_dir=/data/mysql_33061/logs --output_file=/data/mysql_mha/app1/save_binary_logs_test --manager_version=0.58 --start_file=mysql-bin.
Thu Dec  ::  - [info] Connecting to root@10.150.20.90(10.150.20.90:)..
Creating /data/mysql_mha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/mysql_33061/logs, up to mysql-bin.
Thu Dec  ::  - [info] Binlog setting check done.
Thu Dec  ::  - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Thu Dec  ::  - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha_monitor' --slave_host=10.150.20.97 --slave_ip=10.150.20.97 --slave_port= --workdir=/data/mysql_mha/app1 --target_version=5.7.-log --manager_version=0.58 --relay_log_info=/data/mysql_33061/logs/relay-log.info --relay_dir=/data/mysql_33061/data/ --slave_pass=xxx
Thu Dec  ::  - [info] Connecting to root@10.150.20.97(10.150.20.97:)..
Checking slave recovery environment settings..
Opening /data/mysql_33061/logs/relay-log.info ... ok.
Relay log found at /data/mysql_33061/logs, up to relaylog.
Temporary relay log file is /data/mysql_33061/logs/relaylog.
Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Thu Dec  ::  - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha_monitor' --slave_host=10.150.20.132 --slave_ip=10.150.20.132 --slave_port= --workdir=/data/mysql_mha/app1 --target_version=5.7.-log --manager_version=0.58 --relay_log_info=/data/mysql_33061/logs/relay-log.info --relay_dir=/data/mysql_33061/data/ --slave_pass=xxx
Thu Dec  ::  - [info] Connecting to root@10.150.20.132(10.150.20.132:)..
Checking slave recovery environment settings..
Opening /data/mysql_33061/logs/relay-log.info ... ok.
Relay log found at /data/mysql_33061/data, up to cgdb-relay-bin.
Temporary relay log file is /data/mysql_33061/data/cgdb-relay-bin.
Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Thu Dec  ::  - [info] Slaves settings check done.
Thu Dec  ::  - [info]
10.150.20.90(10.150.20.90:) (current master)
+--10.150.20.97(10.150.20.97:)
+--10.150.20.132(10.150.20.132:)Thu Dec  ::  - [info] Checking replication health on 10.150.20.97..
Thu Dec  ::  - [info] ok.
Thu Dec  ::  - [info] Checking replication health on 10.150.20.132..
Thu Dec  ::  - [info] ok.
Thu Dec  ::  - [info] Checking master_ip_failover_script status:
Thu Dec  ::  - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.150.20.90 --orig_master_ip=10.150.20.90 --orig_master_port=IN SCRIPT TEST====systemctl stop keepalived==systemctl start keepalived===Checking the Status of the script.. OK
Thu Dec  ::  - [info] OK.
Thu Dec  ::  - [warning] shutdown_script is not defined.
Thu Dec  ::  - [info] Got exit code  (Not master dead).MySQL Replication Health is OK.

输出

检测MHA manager是否开启：

# masterha_check_status –conf=/etc/mysql_mha/app1.cnf
app1 (pid:19724) is running(0:PING_OK), master:10.150.20.90

开启日志监控
# tail -f /data/mysql_mha/app1-manager.log

故障切换测试：

节点ed3jrdba90：关闭主库MySQL:
# ps -ef |grep mysql
root 18908 1 0 12月12 ? 00:00:00 /bin/sh /usr/local/mysql/bin/mysqld_safe –datadir=/data/mysql_33061/data –pid-file=/data/mysql_33061/run/mysql.pid
mysql 20401 18908 0 12月12 ? 00:03:02 /usr/local/mysql/bin/mysqld –basedir=/usr/local/mysql –datadir=/data/mysql_33061/data –plugin-dir=/usr/local/mysql/lib/plugin –user=mysql –log-error=/data/mysql_33061/logs/mysqlerror.log –open-files-limit=65535 –pid-file=/data/mysql_33061/run/mysql.pid –socket=/data/mysql_33061/run/mysql.sock –port=33061
root 31104 29976 0 15:57 pts/0 00:00:00 grep –color=auto mysql
# kill -9 20401 18908

节点ed3jrdba97：主库MySQL宕掉之后，vip飘到节点ed3jrdba97

# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 54:52:00:49:48:92 brd ff:ff:ff:ff:ff:ff
inet 10.150.20.97/24 brd 10.150.20.255 scope global ens3
valid_lft forever preferred_lft forever
inet 10.150.20.200/32 scope global ens3
valid_lft forever preferred_lft forever

节点ed3jrdba95：在MHA manager节点的日志显示了MHA切换过程

Thu Dec  ::  - [warning] Got error on MySQL select ping:  (MySQL server has gone away)
Thu Dec  ::  - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s 10.150.20.97 -s 10.150.20.90 --user=root --master_host=10.150.20.90 --master_ip=10.150.20.90 --master_port= --master_user=mha_monitor --master_password=mha_monitor --ping_type=SELECT
Thu Dec  ::  - [info] Executing SSH check script: save_binary_logs --command=test --start_pos= --binlog_dir=/data/mysql_33061/logs --output_file=/data/mysql_mha/app1/save_binary_logs_test --manager_version=0.58 --binlog_prefix=mysql-bin
Thu Dec  ::  - [info] HealthCheck: SSH to 10.150.20.90 is reachable.
Monitoring server 10.150.20.97 is reachable, Master is not reachable from 10.150.20.97. OK.
Monitoring server 10.150.20.90 is reachable, Master is not reachable from 10.150.20.90. OK.
Thu Dec  ::  - [info] Master is not reachable from all other monitoring servers. Failover should start.
Thu Dec  ::  - [warning] Got error on MySQL connect:  (Can't connect to MySQL server on '10.150.20.90' (111))
Thu Dec  ::  - [warning] Connection failed  time(s)..
Thu Dec  ::  - [warning] Got error on MySQL connect:  (Can't connect to MySQL server on '10.150.20.90' (111))
Thu Dec  ::  - [warning] Connection failed  time(s)..
Thu Dec  ::  - [warning] Got error on MySQL connect:  (Can't connect to MySQL server on '10.150.20.90' (111))
Thu Dec  ::  - [warning] Connection failed  time(s)..
Thu Dec  ::  - [warning] Master is not reachable from health checker!
Thu Dec  ::  - [warning] Master 10.150.20.90(10.150.20.90:) is not reachable!
Thu Dec  ::  - [warning] SSH is reachable.
Thu Dec  ::  - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mysql_mha/app1.cnf again, and trying to connect to all servers to check server status..
Thu Dec  ::  - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Dec  ::  - [info] Reading application default configuration from /etc/mysql_mha/app1.cnf..
Thu Dec  ::  - [info] Reading server configuration from /etc/mysql_mha/app1.cnf..
Thu Dec  ::  - [info] GTID failover mode =
Thu Dec  ::  - [info] Dead Servers:
Thu Dec  ::  - [info] 10.150.20.90(10.150.20.90:)
Thu Dec  ::  - [info] Alive Servers:
Thu Dec  ::  - [info] 10.150.20.97(10.150.20.97:)
Thu Dec  ::  - [info] 10.150.20.132(10.150.20.132:)
Thu Dec  ::  - [info] Alive Slaves:
Thu Dec  ::  - [info] 10.150.20.97(10.150.20.97:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Thu Dec  ::  - [info] Replicating from 10.150.20.90(10.150.20.90:)
Thu Dec  ::  - [info] Primary candidate for the new Master (candidate_master is set)
Thu Dec  ::  - [info] 10.150.20.132(10.150.20.132:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Thu Dec  ::  - [info] Replicating from 10.150.20.90(10.150.20.90:)
Thu Dec  ::  - [info] Checking slave configurations..
Thu Dec  ::  - [info] read_only= is not set on slave 10.150.20.97(10.150.20.97:).
Thu Dec  ::  - [warning] relay_log_purge= is not set on slave 10.150.20.97(10.150.20.97:).
Thu Dec  ::  - [info] read_only= is not set on slave 10.150.20.132(10.150.20.132:).
Thu Dec  ::  - [info] Checking replication filtering settings..
Thu Dec  ::  - [info] Replication filtering check ok.
Thu Dec  ::  - [info] Master is down!
Thu Dec  ::  - [info] Terminating monitoring script.
Thu Dec  ::  - [info] Got exit code  (Master dead).
Thu Dec  ::  - [info] MHA::MasterFailover version 0.58.
Thu Dec  ::  - [info] Starting master failover.
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] * Phase : Configuration Check Phase..
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] GTID failover mode =
Thu Dec  ::  - [info] Dead Servers:
Thu Dec  ::  - [info] 10.150.20.90(10.150.20.90:)
Thu Dec  ::  - [info] Checking master reachability via MySQL(double check)...
Thu Dec  ::  - [info] ok.
Thu Dec  ::  - [info] Alive Servers:
Thu Dec  ::  - [info] 10.150.20.97(10.150.20.97:)
Thu Dec  ::  - [info] 10.150.20.132(10.150.20.132:)
Thu Dec  ::  - [info] Alive Slaves:
Thu Dec  ::  - [info] 10.150.20.97(10.150.20.97:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Thu Dec  ::  - [info] Replicating from 10.150.20.90(10.150.20.90:)
Thu Dec  ::  - [info] Primary candidate for the new Master (candidate_master is set)
Thu Dec  ::  - [info] 10.150.20.132(10.150.20.132:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Thu Dec  ::  - [info] Replicating from 10.150.20.90(10.150.20.90:)
Thu Dec  ::  - [info] Starting Non-GTID based failover.
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] ** Phase : Configuration Check Phase completed.
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] * Phase : Dead Master Shutdown Phase..
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] Forcing shutdown so that applications never connect to the current master..
Thu Dec  ::  - [info] Executing master IP deactivation script:
Thu Dec  ::  - [info] /usr/local/bin/master_ip_failover --orig_master_host=10.150.20.90 --orig_master_ip=10.150.20.90 --orig_master_port= --command=stopssh --ssh_user=rootIN SCRIPT TEST====systemctl stop keepalived==systemctl start keepalived===Disabling the VIP on old master: 10.150.20.90
Thu Dec  ::  - [info] done.
Thu Dec  ::  - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Dec  ::  - [info] * Phase : Dead Master Shutdown Phase completed.
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] * Phase : Master Recovery Phase..
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] The latest binary log file/position on all slaves is mysql-bin.:
Thu Dec  ::  - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Dec  ::  - [info] 10.150.20.97(10.150.20.97:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Thu Dec  ::  - [info] Replicating from 10.150.20.90(10.150.20.90:)
Thu Dec  ::  - [info] Primary candidate for the new Master (candidate_master is set)
Thu Dec  ::  - [info] 10.150.20.132(10.150.20.132:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Thu Dec  ::  - [info] Replicating from 10.150.20.90(10.150.20.90:)
Thu Dec  ::  - [info] The oldest binary log file/position on all slaves is mysql-bin.:
Thu Dec  ::  - [info] Oldest slaves:
Thu Dec  ::  - [info] 10.150.20.97(10.150.20.97:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Thu Dec  ::  - [info] Replicating from 10.150.20.90(10.150.20.90:)
Thu Dec  ::  - [info] Primary candidate for the new Master (candidate_master is set)
Thu Dec  ::  - [info] 10.150.20.132(10.150.20.132:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Thu Dec  ::  - [info] Replicating from 10.150.20.90(10.150.20.90:)
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] Fetching dead master's binary logs..
Thu Dec  ::  - [info] Executing command on the dead master 10.150.20.90(10.150.20.90:): save_binary_logs --command=save --start_file=mysql-bin. --start_pos= --binlog_dir=/data/mysql_33061/logs --output_file=/data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181213163810.binlog --handle_raw_binlog= --disable_log_bin= --manager_version=0.58
Creating /data/mysql_mha/app1 if not exists.. ok.
Concat binary/relay logs from mysql-bin. pos  to mysql-bin. EOF into /data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181213163810.binlog ..
Binlog Checksum enabled
Dumping binlog format description event, from position  to .. ok.
Dumping effective binlog data from /data/mysql_33061/logs/mysql-bin. position  to tail().. ok.
Binlog Checksum enabled
Concat succeeded.
Thu Dec  ::  - [info] scp from root@10.150.20.90:/data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181213163810.binlog to local:/data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181213163810.binlog succeeded.
Thu Dec  ::  - [info] HealthCheck: SSH to 10.150.20.97 is reachable.
Thu Dec  ::  - [info] HealthCheck: SSH to 10.150.20.132 is reachable.
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] * Phase 3.3: Determining New Master Phase..
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Thu Dec  ::  - [info] All slaves received relay logs to the same position. No need to resync each other.
Thu Dec  ::  - [info] Searching new master from slaves..
Thu Dec  ::  - [info] Candidate masters from the configuration file:
Thu Dec  ::  - [info] 10.150.20.97(10.150.20.97:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Thu Dec  ::  - [info] Replicating from 10.150.20.90(10.150.20.90:)
Thu Dec  ::  - [info] Primary candidate for the new Master (candidate_master is set)
Thu Dec  ::  - [info] Non-candidate masters:
Thu Dec  ::  - [info] Searching from candidate_master slaves which have received the latest relay log events..
Thu Dec  ::  - [info] New master is 10.150.20.97(10.150.20.97:)
Thu Dec  ::  - [info] Starting master failover..
Thu Dec  ::  - [info]
From:
10.150.20.90(10.150.20.90:) (current master)
+--10.150.20.97(10.150.20.97:)
+--10.150.20.132(10.150.20.132:)To:
10.150.20.97(10.150.20.97:) (new master)
+--10.150.20.132(10.150.20.132:)
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] * Phase 3.4: New Master Diff Log Generation Phase..
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Thu Dec  ::  - [info] Sending binlog..
Thu Dec  ::  - [info] scp from local:/data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181213163810.binlog to root@10.150.20.97:/data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181213163810.binlog succeeded.
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] * Phase 3.5: Master Log Apply Phase..
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Thu Dec  ::  - [info] Starting recovery on 10.150.20.97(10.150.20.97:)..
Thu Dec  ::  - [info] Generating diffs succeeded.
Thu Dec  ::  - [info] Waiting until all relay logs are applied.
Thu Dec  ::  - [info] done.
Thu Dec  ::  - [info] Getting slave status..
Thu Dec  ::  - [info] This slave(10.150.20.97)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000011:154). No need to recover from Exec_Master_Log_Pos.
Thu Dec  ::  - [info] Connecting to the target slave host 10.150.20.97, running recover script..
Thu Dec  ::  - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mha_monitor' --slave_host=10.150.20.97 --slave_ip=10.150.20.97 --slave_port= --apply_files=/data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181213163810.binlog --workdir=/data/mysql_mha/app1 --target_version=5.7.-log --timestamp= --handle_raw_binlog= --disable_log_bin= --manager_version=0.58 --slave_pass=xxx
Thu Dec  ::  - [info]
MySQL client version is 5.7.. Using --binary-mode.
Applying differential binary/relay log files /data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181213163810.binlog on 10.150.20.97:. This may take long time...
Applying log files succeeded.
Thu Dec  ::  - [info] All relay logs were successfully applied.
Thu Dec  ::  - [info] Getting new master's binlog name and position..
Thu Dec  ::  - [info] mysql-bin.:
Thu Dec  ::  - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.150.20.97', MASTER_PORT=, MASTER_LOG_FILE='mysql-bin.000018', MASTER_LOG_POS=, MASTER_USER='replicator', MASTER_PASSWORD='xxx';
Thu Dec  ::  - [info] Executing master IP activate script:
Thu Dec  ::  - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=10.150.20.90 --orig_master_ip=10.150.20.90 --orig_master_port= --new_master_host=10.150.20.97 --new_master_ip=10.150.20.97 --new_master_port= --new_master_user='mha_monitor' --new_master_password=xxx
Unknown option: new_master_user
Unknown option: new_master_passwordIN SCRIPT TEST====systemctl stop keepalived==systemctl start keepalived===Enabling the VIP - 10.150.20.200 on the new master - 10.150.20.97
Thu Dec  ::  - [info] OK.
Thu Dec  ::  - [info] ** Finished master recovery successfully.
Thu Dec  ::  - [info] * Phase : Master Recovery Phase completed.
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] * Phase : Slaves Recovery Phase..
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] -- Slave diff file generation on host 10.150.20.132(10.150.20.132:) started, pid: . Check tmp log /data/mysql_mha/app1/10.150..132_33061_20181213163810.log if it takes time..
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] Log messages from 10.150.20.132 ...
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Thu Dec  ::  - [info] End of log messages from 10.150.20.132.
Thu Dec  ::  - [info] -- 10.150.20.132(10.150.20.132:) has the latest relay log events.
Thu Dec  ::  - [info] Generating relay diff files from the latest slave succeeded.
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] -- Slave recovery on host 10.150.20.132(10.150.20.132:) started, pid: . Check tmp log /data/mysql_mha/app1/10.150..132_33061_20181213163810.log if it takes time..
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] Log messages from 10.150.20.132 ...
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] Sending binlog..
Thu Dec  ::  - [info] scp from local:/data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181213163810.binlog to root@10.150.20.132:/data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181213163810.binlog succeeded.
Thu Dec  ::  - [info] Starting recovery on 10.150.20.132(10.150.20.132:)..
Thu Dec  ::  - [info] Generating diffs succeeded.
Thu Dec  ::  - [info] Waiting until all relay logs are applied.
Thu Dec  ::  - [info] done.
Thu Dec  ::  - [info] Getting slave status..
Thu Dec  ::  - [info] This slave(10.150.20.132)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000011:154). No need to recover from Exec_Master_Log_Pos.
Thu Dec  ::  - [info] Connecting to the target slave host 10.150.20.132, running recover script..
Thu Dec  ::  - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mha_monitor' --slave_host=10.150.20.132 --slave_ip=10.150.20.132 --slave_port= --apply_files=/data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181213163810.binlog --workdir=/data/mysql_mha/app1 --target_version=5.7.-log --timestamp= --handle_raw_binlog= --disable_log_bin= --manager_version=0.58 --slave_pass=xxx
Thu Dec  ::  - [info]
MySQL client version is 5.7.. Using --binary-mode.
Applying differential binary/relay log files /data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181213163810.binlog on 10.150.20.132:. This may take long time...
Applying log files succeeded.
Thu Dec  ::  - [info] All relay logs were successfully applied.
Thu Dec  ::  - [info] Resetting slave 10.150.20.132(10.150.20.132:) and starting replication from the new master 10.150.20.97(10.150.20.97:)..
Thu Dec  ::  - [info] Executed CHANGE MASTER.
Thu Dec  ::  - [info] Slave started.
Thu Dec  ::  - [info] End of log messages from 10.150.20.132.
Thu Dec  ::  - [info] -- Slave recovery on host 10.150.20.132(10.150.20.132:) succeeded.
Thu Dec  ::  - [info] All new slave servers recovered successfully.
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] * Phase : New master cleanup phase..
Thu Dec  ::  - [info]
Thu Dec  ::  - [info] Resetting slave info on the new master..
Thu Dec  ::  - [info] 10.150.20.97: Resetting slave info succeeded.
Thu Dec  ::  - [info] Master failover to 10.150.20.97(10.150.20.97:) completed successfully.
Thu Dec  ::  - [info] Deleted server1 entry from /etc/mysql_mha/app1.cnf .
Thu Dec  ::  - [info]----- Failover Report -----app1: MySQL Master failover 10.150.20.90(10.150.20.90:) to 10.150.20.97(10.150.20.97:) succeededMaster 10.150.20.90(10.150.20.90:) is down!Check MHA Manager logs at ed3jrdba95:/data/mysql_mha/app1-manager.log for details.Started automated(non-interactive) failover.
Invalidated master IP address on 10.150.20.90(10.150.20.90:)
The latest slave 10.150.20.97(10.150.20.97:) has all relay logs for recovery.
Selected 10.150.20.97(10.150.20.97:) as a new master.
10.150.20.97(10.150.20.97:): OK: Applying all logs succeeded.
10.150.20.97(10.150.20.97:): OK: Activated master IP address.
10.150.20.132(10.150.20.132:): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
10.150.20.132(10.150.20.132:): OK: Applying all logs succeeded. Slave started, replicating from 10.150.20.97(10.150.20.97:)
10.150.20.97(10.150.20.97:): Resetting slave info succeeded.
Master failover to 10.150.20.97(10.150.20.97:) completed successfully.

failover过程

MHA manager节点的masterha_manager已经停掉
# masterha_check_status –conf=/etc/mysql_mha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
app1.cnf文件被修改为：
# cat /etc/mysql_mha/app1.cnf
[server default]
manager_log=/data/mysql_mha/app1-manager.log
manager_workdir=/data/mysql_mha/app1
master_binlog_dir=/data/mysql_33061/logs
master_ip_online_change_script=/usr/local/bin/master_ip_online_change
password=mha_monitor
ping_interval=5
remote_workdir=/data/mysql_mha/app1
repl_password=replicator
repl_user=replicator
shutdown_script=””
ssh_user=root
user=mha_monitor

[server2]
hostname=10.150.20.97
port=33061

[server3]
hostname=10.150.20.132
port=33061

此时的复制关系为：
新主：10.150.20.97
从库：10.150.20.132

# masterha_check_repl –conf=/etc/mysql_mha/app1.cnf

Thu Dec  ::  - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Dec  ::  - [info] Reading application default configuration from /etc/mysql_mha/app1.cnf..
Thu Dec  ::  - [info] Reading server configuration from /etc/mysql_mha/app1.cnf..
Thu Dec  ::  - [info] MHA::MasterMonitor version 0.58.
Thu Dec  ::  - [info] GTID failover mode =
Thu Dec  ::  - [info] Dead Servers:
Thu Dec  ::  - [info] Alive Servers:
Thu Dec  ::  - [info] 10.150.20.97(10.150.20.97:)
Thu Dec  ::  - [info] 10.150.20.132(10.150.20.132:)
Thu Dec  ::  - [info] Alive Slaves:
Thu Dec  ::  - [info] 10.150.20.132(10.150.20.132:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Thu Dec  ::  - [info] Replicating from 10.150.20.97(10.150.20.97:)
Thu Dec  ::  - [info] Current Alive Master: 10.150.20.97(10.150.20.97:)
Thu Dec  ::  - [info] Checking slave configurations..
Thu Dec  ::  - [info] read_only= is not set on slave 10.150.20.132(10.150.20.132:).
Thu Dec  ::  - [info] Checking replication filtering settings..
Thu Dec  ::  - [info] binlog_do_db= , binlog_ignore_db=
Thu Dec  ::  - [info] Replication filtering check ok.
Thu Dec  ::  - [info] GTID (with auto-pos) is not supported
Thu Dec  ::  - [info] Starting SSH connection tests..
Thu Dec  ::  - [info] All SSH connection tests passed successfully.
Thu Dec  ::  - [info] Checking MHA Node version..
Thu Dec  ::  - [info] Version check ok.
Thu Dec  ::  - [info] Checking SSH publickey authentication settings on the current master..
Thu Dec  ::  - [info] HealthCheck: SSH to 10.150.20.97 is reachable.
Thu Dec  ::  - [info] Master MHA Node version is 0.58.
Thu Dec  ::  - [info] Checking recovery script configurations on 10.150.20.97(10.150.20.97:)..
Thu Dec  ::  - [info] Executing command: save_binary_logs --command=test --start_pos= --binlog_dir=/data/mysql_33061/logs --output_file=/data/mysql_mha/app1/save_binary_logs_test --manager_version=0.58 --start_file=mysql-bin.
Thu Dec  ::  - [info] Connecting to root@10.150.20.97(10.150.20.97:)..
Creating /data/mysql_mha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/mysql_33061/logs, up to mysql-bin.
Thu Dec  ::  - [info] Binlog setting check done.
Thu Dec  ::  - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Thu Dec  ::  - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha_monitor' --slave_host=10.150.20.132 --slave_ip=10.150.20.132 --slave_port= --workdir=/data/mysql_mha/app1 --target_version=5.7.-log --manager_version=0.58 --relay_log_info=/data/mysql_33061/logs/relay-log.info --relay_dir=/data/mysql_33061/data/ --slave_pass=xxx
Thu Dec  ::  - [info] Connecting to root@10.150.20.132(10.150.20.132:)..
Checking slave recovery environment settings..
Opening /data/mysql_33061/logs/relay-log.info ... ok.
Relay log found at /data/mysql_33061/data, up to cgdb-relay-bin.
Temporary relay log file is /data/mysql_33061/data/cgdb-relay-bin.
Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Thu Dec  ::  - [info] Slaves settings check done.
Thu Dec  ::  - [info]
10.150.20.97(10.150.20.97:) (current master)
+--10.150.20.132(10.150.20.132:)Thu Dec  ::  - [info] Checking replication health on 10.150.20.132..
Thu Dec  ::  - [info] ok.
Thu Dec  ::  - [info] Checking master_ip_failover_script status:
Thu Dec  ::  - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.150.20.97 --orig_master_ip=10.150.20.97 --orig_master_port=IN SCRIPT TEST====systemctl stop keepalived==systemctl start keepalived===Checking the Status of the script.. OK
Thu Dec  ::  - [info] OK.
Thu Dec  ::  - [warning] shutdown_script is not defined.
Thu Dec  ::  - [info] Got exit code  (Not master dead).MySQL Replication Health is OK.

主从发生变化

实战：https://yq.aliyun.com/articles/58920

个人收藏笔记记录

开通VIP