欢迎来到HugNew-拥抱变化,扫一扫右边二维码关注微信订阅号:Martin说 或 加QQ群:427697041互相交流,Stay hungry, Stay foolish.

MySQL HA方案之MySQL半复制+MHA+Keepalived+Atlas+LVS

集群架构 Martin 15409℃ 12评论

简介

目前Mysql高可用的方案有好多,比如MMM,heartbeat+drbd,Cluster等,还有percona的Galera Cluster等。这些高可用软件各有优劣。在进行高可用方案选择时,主要是看业务还有对数据一致性方面的要求。本文探讨的是MHA这种方案。
至于DAL层,也有很多方案,现在主流的一是在应用层写数据库路由,当然这个效率也是最快的,但是最大的缺陷就是运维难度大,技术难度也大,需要有强大的技术团队支持。第二个方案是proxy中间件,现在开源的中间件很多,比如TDDL、Cobar、Atlas、MyCat、Mysql Proxy、Oceanus等等,中间件的最大的好处是对数据层解耦,减轻了运维难度,当然在服务器层与数据库层加了一层proxy,使得效率明显不如直接访问数据库。按业务实际需求选择合适的方案,在本篇文章中,选的是360开源的DAL中间件Atlas,详细信息官方文档(https://github.com/Qihoo360/Atlas)以描述的很清楚,这里不做过多的描述。
现在继续讨论MHA,MHA(Master High Availability)目前在MySQL高可用方面是一个相对成熟的解决方案,它由日本DeNA公司youshimaton(现就职于Facebook公司)开发,是一套优秀的作为MySQL高可用性环境下故障切换和主从提升的高可用软件。在MySQL故障切换过程中,MHA能做到在0~30秒之内自动完成数据库的故障切换操作,并且在进行故障切换的过程中,MHA能在最大程度上保证数据的一致性,以达到真正意义上的高可用。
该软件由两部分组成:MHA Manager(管理节点)和MHA Node(数据节点)。MHA Manager可以单独部署在一台独立的机器上管理多个master-slave集群,也可以部署在一台slave节点上。MHA Node运行在每台MySQL服务器上,MHA Manager会定时探测集群中的master节点,当master出现故障时,它可以自动将最新数据的slave提升为新的master,然后将所有其他的slave重新指向新的master。整个故障转移过程对应用程序完全透明。
在MHA自动故障切换过程中,MHA试图从宕机的主服务器上保存二进制日志,最大程度的保证数据的不丢失,但这并不总是可行的。例如,如果主服务器硬件故障或无法通过ssh访问,MHA没法保存二进制日志,只进行故障转移而丢失了最新的数据。使用MySQL 5.5的半同步复制,可以大大降低数据丢失的风险。MHA可以与半同步复制结合起来。如果只有一个slave已经收到了最新的二进制日志,MHA可以将最新的二进制日志应用于其他所有的slave服务器上,因此可以保证所有节点的数据一致性。

我们自己使用其实也可以使用1主1从,但是master主机宕机后无法切换,以及无法补全binlog。master的mysqld进程crash后,还是可以切换成功,以及补全binlog的。

架构图

2015-06-18_172304
现在说下架构图的层级关系,最底层的是mysql主从,其中有从机作为备选的主机,备选主机与主机之间由keepalived维护着心跳。每个mysql server都是MHA node,由MHA Manager监控着Mysql的节点,当MHA Manager监控(keepalived的心跳检测)到主机出现问题,会切换到备选主机,并将备选主机提升为新主机,其他的从机成为新主机的从机,并且VIP也漂移到新主机,这样就实现了Mysql的HA。db上面的一层是Atlas Proxy,Atlas主要的功能是读写分离、从库的负载均衡、自动分表、自动摘除宕机的DB、DBA可平滑的上下线DB及IP过滤,其中读写分离及自动摘除宕机的DB是当初选择该中间件的最主要原因,自动摘除宕机的从机保障了整个架构系统持续性,而不影响业务的正常运行。后期根据数据量的增大可以切换到Mycat这个方案,支持分库分表、主从切换等等,并且社区活跃度也很高,感兴趣的可以查看下官方文档(https://github.com/MyCATApache)。回到正题,这里有两个Atlas做HA,上层的LVS做Atlas的负载均衡,两个LVS之间利用keepalived心跳实现热主备模式,防止出现单点故障。再上层的就是应用server,这里就不在讨论了。

角色

角色                    ip地址          主机名          
MHA Manger          192.168.3.123     B-dev23        
Mysql Master        192.168.3.110     B-dev10    
Mysql slave1        192.168.3.115     B-dev15
Mysql slave2        192.168.3.117     B-dev17
Candicate master    192.168.3.118     B-dev18          
Atlas1              192.168.3.124     B-dev24
Atlas2              192.168.3.125     B-dev25
LVS Master          192.168.3.119     B-dev19
LVS Backup          192.168.3.120     B-dev20

MHA

1.MHA的基本原理

(1)从宕机崩溃的master保存二进制日志事件(binlog events);

(2)识别含有最新更新的slave;

(3)应用差异的中继日志(relay log)到其他的slave;

(4)应用从master保存的二进制日志事件(binlog events);

(5)提升一个slave为新的master;

(6)使其他的slave连接新的master进行复制;

MHA软件由两部分组成,Manager工具包和Node工具包,具体的说明如下。

Manager工具包主要包括以下几个工具:

masterha_check_ssh              检查MHA的SSH配置状况
masterha_check_repl             检查MySQL复制状况
masterha_manger                 启动MHA
masterha_check_status           检测当前MHA运行状态
masterha_master_monitor         检测master是否宕机
masterha_master_switch          控制故障转移(自动或者手动)
masterha_conf_host              添加或删除配置的server信息

Node工具包(这些工具通常由MHA Manager的脚本触发,无需人为操作)主要包括以下几个工具:

save_binary_logs                保存和复制master的二进制日志
apply_diff_relay_logs           识别差异的中继日志事件并将其差异的事件应用于其他的slave
filter_mysqlbinlog              去除不必要的ROLLBACK事件(MHA已不再使用这个工具)
purge_relay_logs                清除中继日志(不会阻塞SQL线程)

注意:

为了尽可能的减少主库硬件损坏宕机造成的数据丢失,因此在配置MHA的同时建议配置成MySQL 5.5的半同步复制,关于半同步复制原理及安装流程下面会详细介绍。

2.部署MHA

接下来部署MHA,具体的搭建环境如下:
其中master对外提供写服务,slave提供相关的读服务,一旦master宕机,将会把备选master提升为新的master,slave指向新的master。
(1)在所有节点安装MHA node所需的perl模块(DBD:mysql),安装脚本如下:
[root@192.168.3.110 ~]# cat install.sh 
#!/bin/bash
wget http://xrl.us/cpanm --no-check-certificate
mv cpanm /usr/bin
chmod 755 /usr/bin/cpanm     
cat > /root/list << EOF
install DBD::mysql
EOF
for package in `cat /root/list`
do
    cpanm $package
done
[root@192.168.3.110 ~]# 

如果有安装epel源,也可以使用yum安装

yum install perl-DBD-MySQL -y

(2)在所有的节点安装mha node:

wget http://mysql-master-ha.googlecode.com/files/mha4mysql-node-0.53.tar.gz
tar xf mha4mysql-node-0.53.tar.gz
cd mha4mysql-node-0.53
perl Makefile.PL
make && make install

安装完成后会在/usr/local/bin目录下生成以下脚本文件:

[root@192.168.3.110 ~]# pwd
/usr/local/bin
[root@192.168.3.110 bin]# ll
total 40
-r-xr-xr-x 1 root root 15498 Jun 20 10:05 apply_diff_relay_logs
-r-xr-xr-x 1 root root  4807 Jun 20 10:05 filter_mysqlbinlog
-r-xr-xr-x 1 root root  7401 Jun 20 10:05 purge_relay_logs
-r-xr-xr-x 1 root root  7263 Jun 20 10:05 save_binary_logs
[root@192.168.3.110 bin]#

关于上面脚本的功能,上面已经介绍过了,这里不再重复了。

3.安装MHA Manager

MHA Manager中主要包括了几个管理员的命令行工具,例如master_manger,master_master_switch等。MHA Manger也依赖于perl模块,具体如下:

(1)安装MHA Node软件包之前需要安装依赖。我这里使用yum完成,没有epel源的可以使用上面提到的脚本(epel源安装也简单)。注意:在MHA Manager的主机也是需要安装MHA Node。

rpm -ivh http://dl.fedorJunoject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
yum install perl-DBD-MySQL -y

安装MHA Node软件包,和上面的方法一样,如下:

wget http://mysql-master-ha.googlecode.com/files/mha4mysql-node-0.53.tar.gz
tar xf mha4mysql-node-0.53.tar.gz
cd mha4mysql-node-0.53perl Makefile.PL
make && make install

(2)安装MHA Manager。首先安装MHA Manger依赖的perl模块(我这里使用yum安装):

yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y

安装MHA Manager软件包:

wget http://mysql-master-ha.googlecode.com/files/mha4mysql-manager-0.53.tar.gz
tar xf mha4mysql-manager-0.53.tar.gz 
cd mha4mysql-manager-0.53
perl Makefile.PL
make && make install

安装完成后会在/usr/local/bin目录下面生成以下脚本文件,前面已经说过这些脚本的作用,这里不再重复。

[root@192.168.3.123 bin]# pwd
/usr/local/bin
[root@192.168.3.123 bin]# ll
total 76
-r-xr-xr-x 1 root root 15498 Jun 20 10:58 apply_diff_relay_logs
-r-xr-xr-x 1 root root  4807 Jun 20 10:58 filter_mysqlbinlog
-r-xr-xr-x 1 root root  1995 Jun 20 11:33 masterha_check_repl
-r-xr-xr-x 1 root root  1779 Jun 20 11:33 masterha_check_ssh
-r-xr-xr-x 1 root root  1865 Jun 20 11:33 masterha_check_status
-r-xr-xr-x 1 root root  3201 Jun 20 11:33 masterha_conf_host
-r-xr-xr-x 1 root root  2517 Jun 20 11:33 masterha_manager
-r-xr-xr-x 1 root root  2165 Jun 20 11:33 masterha_master_monitor
-r-xr-xr-x 1 root root  2373 Jun 20 11:33 masterha_master_switch
-r-xr-xr-x 1 root root  3749 Jun 20 11:33 masterha_secondary_check
-r-xr-xr-x 1 root root  1739 Jun 20 11:33 masterha_stop
-r-xr-xr-x 1 root root  7401 Jun 20 10:58 purge_relay_logs
-r-xr-xr-x 1 root root  7263 Jun 20 10:58 save_binary_logs

复制相关脚本到/usr/local/bin目录

[root@192.168.3.123 scripts]# pwd
/root/mha4mysql-manager-0.53/samples/scripts
[root@192.168.3.123 scripts]# ll
total 32
-rwxr-xr-x 1 root root  3443 Jun  8  2015 master_ip_failover                #自动切换时vip管理的脚本,不是必须,如果我们使用keepalived的,我们可以自己编写脚本完成对vip的管理,比如监控mysql,如果mysql异常,我们停止keepalived就行,这样vip就会自动漂移
-rwxr-xr-x 1 root root  9186 Jun  8  2015 master_ip_online_change           #在线切换时vip的管理,不是必须,同样可以可以自行编写简单的shell完成
-rwxr-xr-x 1 root root 11867 Jun  8  2015 power_manager                     #故障发生后关闭主机的脚本,不是必须
-rwxr-xr-x 1 root root  1360 Jun  8  2015 send_report                       #因故障切换后发送报警的脚本,不是必须,可自行编写简单的shell完成。
[root@192.168.3.123 scripts]# cp * /usr/local/bin/

4.配置SSH免密码登录(key模式)

这里需要注意的是所有的mysql server节点两两之间必须配成SSH无密码免登录双向模式,否则后面会报错。SSH免密码登录配置的文档可以查看Linux(使用ssh-keygen)设置SSH免密码登录

5.搭建主从复制环境

搭建MySQL主从复制的文档可以查看linux(Ubuntu)下mysql安装及主从复制,MySQL半复制的文档可以查看Mysql半复制浅谈,这里不再赘述。

6.配置MHA

(1)创建MHA的工作目录,并且创建相关配置文件(在软件包解压后的目录里面有样例配置文件)。

[root@192.168.3.123 ~]# mkdir -p /etc/masterha
[root@192.168.3.123 ~]# cp mha4mysql-manager-0.53/samples/conf/app1.cnf /etc/masterha/

修改app1.cnf配置文件,修改后的文件内容如下(注意,配置文件中的注释需要去掉,我这里是为了解释清楚):

[root@192.168.3.123 ~]# cat /etc/masterha/app1.cnf 

[server default]
manager_workdir=/var/log/masterha/app1.log              //设置manager的工作目录
manager_log=/var/log/masterha/app1/manager.log          //设置manager的日志
master_binlog_dir=/data/mysql                         //设置master 保存binlog的位置,以便MHA可以找到master的日志,我这里的也就是mysql的数据目录
#master_ip_failover_script= /usr/local/bin/master_ip_failover    //设置自动failover时候的切换脚本,这里先注释掉后面集成keepalived后再开启
master_ip_online_change_script= /usr/local/bin/master_ip_online_change  //设置手动切换时候的切换脚本
password=123456         //设置mysql中root用户的密码,这个密码是前文中创建监控用户的那个密码
user=root               设置监控用户root
ping_interval=1         //设置监控主库,发送ping包的时间间隔,默认是3秒,尝试三次没有回应的时候自动进行railover
remote_workdir=/tmp     //设置远端mysql在发生切换时binlog的保存位置
repl_password=123456    //设置复制用户的密码
repl_user=repl          //设置复制环境中的复制用户名
report_script=/usr/local/send_report    //设置发生切换后发送的报警的脚本
secondary_check_script= /usr/local/bin/masterha_secondary_check -s B-dev18 -s B-dev10 --user=root --master_host=B-dev10 --master_ip=192.168.3.110 --master_port=3306   //一旦MHA到B-dev10的监控之间出现问题,MHA Manager将会尝试从B-dev18登录到B-dev10
shutdown_script=""      //设置故障发生后关闭故障主机脚本(该脚本的主要作用是关闭主机放在发生脑裂,这里没有使用)
ssh_user=root           //设置ssh的登录用户名

[server1]
hostname=192.168.3.110
port=3306

[server2]
hostname=192.168.3.118
port=3306
candidate_master=1   //设置为候选master,如果设置该参数以后,发生主从切换以后将会将此从库提升为主库,即使这个主库不是集群中事件最新的slave
check_repl_delay=0   //默认情况下如果一个slave落后master 100M的relay logs的话,MHA将不会选择该slave作为一个新的master,因为对于这个slave的恢复需要花费很长时间,通过设置check_repl_delay=0,MHA触发切换在选择一个新的master的时候将会忽略复制延时,这个参数对于设置了candidate_master=1的主机非常有用,因为这个候选主在切换的过程中一定是新的master

[server3]
hostname=192.168.3.115
port=3306

[server4]
hostname=192.168.3.117
port=3306

[root@192.168.3.123 ~]# 

(2)设置relay log的清除方式(在每个slave节点上):

[root@192.168.3.115 ~]# mysql -e 'set global relay_log_purge=0'
[root@192.168.3.117 ~]# mysql -e 'set global relay_log_purge=0'
[root@192.168.3.118 ~]# mysql -e 'set global relay_log_purge=0'

注意:

MHA在发生切换的过程中,从库的恢复过程中依赖于relay log的相关信息,所以这里要将relay log的自动清除设置为OFF,采用手动清除relay log的方式。在默认情况下,从服务器上的中继日志会在SQL线程执行完毕后被自动删除。但是在MHA环境中,这些中继日志在恢复其他从服务器时可能会被用到,因此需要禁用中继日志的自动删除功能。定期清除中继日志需要考虑到复制延时的问题。在ext3的文件系统下,删除大的文件需要一定的时间,会导致严重的复制延时。为了避免复制延时,需要暂时为中继日志创建硬链接,因为在linux系统中通过硬链接删除大文件速度会很快。(在mysql数据库中,删除大表时,通常也采用建立硬链接的方式)

MHA节点中包含了pure_relay_logs命令工具,它可以为中继日志创建硬链接,执行SET GLOBAL relay_log_purge=1,等待几秒钟以便SQL线程切换到新的中继日志,再执行SET GLOBAL relay_log_purge=0。

pure_relay_logs脚本参数如下所示:

--user mysql                      用户名
--password mysql                  密码
--port                            端口号
--workdir                         指定创建relay log的硬链接的位置,默认是/var/tmp,由于系统不同分区创建硬链接文件会失败,故需要执行硬链接具体位置,成功执行脚本后,硬链接的中继日志文件被删除
--disable_relay_log_purge         默认情况下,如果relay_log_purge=1,脚本会什么都不清理,自动退出,通过设定这个参数,当relay_log_purge=1的情况下会将relay_log_purge设置为0。清理relay log之后,最后将参数设置为OFF。

(3)设置定期清理relay脚本(三台slave服务器)

[root@192.168.3.115 ~]# cat purge_relay_log.sh 
#!/bin/bash
user=root
passwd=123456
port=3306
log_dir='/data/masterha/log'
work_dir='/data'
purge='/usr/local/bin/purge_relay_logs'

if [ ! -d $log_dir ]
then
   mkdir $log_dir -p
fi

$purge --user=$user --password=$passwd --disable_relay_log_purge --port=$port --workdir=$work_dir >> $log_dir/purge_relay_logs.log 2>&1
[root@192.168.3.115 ~]# 

添加到crontab定期执行

[root@192.168.3.115 ~]# crontab -l
0 4 * * * /bin/bash /root/purge_relay_log.sh
[root@192.168.3.115 ~]# 

purge_relay_logs脚本删除中继日志不会阻塞SQL线程。下面我们手动执行看看什么情况。

[root@192.168.3.115 ~]# purge_relay_logs --user=root --password=123456 --port=3306 -disable_relay_log_purge --workdir=/data/
2015-06-20 15:47:24: purge_relay_logs script started.
 Found relay_log.info: /data/mysql/relay-log.info
 Removing hard linked relay log files B-dev15-relay-bin* under /data/.. done.
 Current relay log file: /data/mysql/B-dev15-relay-bin.000002
 Archiving unused relay log files (up to /data/mysql/B-dev15-relay-bin.000001) ...
 Creating hard link for /data/mysql/B-dev15-relay-bin.000001 under /data//B-dev15-relay-bin.000001 .. ok.
 Creating hard links for unused relay log files completed.
 Executing SET GLOBAL relay_log_purge=1; FLUSH LOGS; sleeping a few seconds so that SQL thread can delete older  relay log files (if it keeps up); SET GLOBAL relay_log_purge=0; .. ok.
 Removing hard linked relay log files B-dev15-relay-bin* under /data/.. done.
2015-06-20 15:47:27: All relay log purging operations succeeded.
[root@192.168.3.115 ~]# 

7.检查SSH配置

检查MHA Manger到所有MHA Node的SSH连接状态:

[root@192.168.3.123 ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf 
Wed Jun 24 02:04:24 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jun 24 02:04:24 2015 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Wed Jun 24 02:04:24 2015 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Wed Jun 24 02:04:24 2015 - [info] Starting SSH connection tests..
Wed Jun 24 02:04:26 2015 - [debug] 
Wed Jun 24 02:04:24 2015 - [debug]  Connecting via SSH from root@192.168.3.110(192.168.3.110:22) to root@192.168.3.118(192.168.3.118:22)..
Wed Jun 24 02:04:24 2015 - [debug]   ok.
Wed Jun 24 02:04:24 2015 - [debug]  Connecting via SSH from root@192.168.3.110(192.168.3.110:22) to root@192.168.3.115(192.168.3.115:22)..
Wed Jun 24 02:04:25 2015 - [debug]   ok.
Wed Jun 24 02:04:25 2015 - [debug]  Connecting via SSH from root@192.168.3.110(192.168.3.110:22) to root@192.168.3.117(192.168.3.117:22)..
Wed Jun 24 02:04:26 2015 - [debug]   ok.
Wed Jun 24 02:04:27 2015 - [debug] 
Wed Jun 24 02:04:24 2015 - [debug]  Connecting via SSH from root@192.168.3.118(192.168.3.118:22) to root@192.168.3.110(192.168.3.110:22)..
Wed Jun 24 02:04:25 2015 - [debug]   ok.
Wed Jun 24 02:04:25 2015 - [debug]  Connecting via SSH from root@192.168.3.118(192.168.3.118:22) to root@192.168.3.115(192.168.3.115:22)..
Wed Jun 24 02:04:26 2015 - [debug]   ok.
Wed Jun 24 02:04:26 2015 - [debug]  Connecting via SSH from root@192.168.3.118(192.168.3.118:22) to root@192.168.3.117(192.168.3.117:22)..
Wed Jun 24 02:04:27 2015 - [debug]   ok.
Wed Jun 24 02:04:28 2015 - [debug] 
Wed Jun 24 02:04:25 2015 - [debug]  Connecting via SSH from root@192.168.3.115(192.168.3.115:22) to root@192.168.3.110(192.168.3.110:22)..
Wed Jun 24 02:04:25 2015 - [debug]   ok.
Wed Jun 24 02:04:25 2015 - [debug]  Connecting via SSH from root@192.168.3.115(192.168.3.115:22) to root@192.168.3.118(192.168.3.118:22)..
Wed Jun 24 02:04:27 2015 - [debug]   ok.
Wed Jun 24 02:04:27 2015 - [debug]  Connecting via SSH from root@192.168.3.115(192.168.3.115:22) to root@192.168.3.117(192.168.3.117:22)..
Wed Jun 24 02:04:28 2015 - [debug]   ok.
Wed Jun 24 02:04:28 2015 - [debug] 
Wed Jun 24 02:04:25 2015 - [debug]  Connecting via SSH from root@192.168.3.117(192.168.3.117:22) to root@192.168.3.110(192.168.3.110:22)..
Wed Jun 24 02:04:26 2015 - [debug]   ok.
Wed Jun 24 02:04:26 2015 - [debug]  Connecting via SSH from root@192.168.3.117(192.168.3.117:22) to root@192.168.3.118(192.168.3.118:22)..
Wed Jun 24 02:04:27 2015 - [debug]   ok.
Wed Jun 24 02:04:27 2015 - [debug]  Connecting via SSH from root@192.168.3.117(192.168.3.117:22) to root@192.168.3.115(192.168.3.115:22)..
Wed Jun 24 02:04:28 2015 - [debug]   ok.
Wed Jun 24 02:04:28 2015 - [info] All SSH connection tests passed successfully.

可以看见各个节点ssh验证都是ok的。

8.检查整个复制环境状况

通过masterha_check_repl脚本查看整个集群的状态

[root@192.168.123 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Wed Jun 24 01:51:17 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jun 24 01:51:17 2015 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Wed Jun 24 01:51:17 2015 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Wed Jun 24 01:51:17 2015 - [info] MHA::MasterMonitor version 0.53.
Wed Jun 24 01:51:18 2015 - [info] Dead Servers:
Wed Jun 24 01:51:18 2015 - [info] Alive Servers:
Wed Jun 24 01:51:18 2015 - [info]   192.168.3.110(192.168.3.110:3306)
Wed Jun 24 01:51:18 2015 - [info]   192.168.3.118(192.168.3.118:3306)
Wed Jun 24 01:51:18 2015 - [info]   192.168.3.115(192.168.3.115:3306)
Wed Jun 24 01:51:18 2015 - [info]   192.168.3.117(192.168.3.117:3306)
Wed Jun 24 01:51:18 2015 - [info] Alive Slaves:
Wed Jun 24 01:51:18 2015 - [info]   192.168.3.118(192.168.3.118:3306)  Version=5.6.25-log (oldest major version between slaves) log-bin:enabled
Wed Jun 24 01:51:18 2015 - [info]     Replicating from 192.168.3.110(192.168.3.110:3306)
Wed Jun 24 01:51:18 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Jun 24 01:51:18 2015 - [info]   192.168.3.115(192.168.3.115:3306)  Version=5.6.25-log (oldest major version between slaves) log-bin:enabled
Wed Jun 24 01:51:18 2015 - [info]     Replicating from 192.168.3.110(192.168.3.110:3306)
Wed Jun 24 01:51:18 2015 - [info]   192.168.3.117(192.168.3.117:3306)  Version=5.6.25-log (oldest major version between slaves) log-bin:enabled
Wed Jun 24 01:51:18 2015 - [info]     Replicating from 192.168.3.110(192.168.3.110:3306)
Wed Jun 24 01:51:18 2015 - [info] Current Alive Master: 192.168.3.110(192.168.3.110:3306)
Wed Jun 24 01:51:18 2015 - [info] Checking slave configurations..
Wed Jun 24 01:51:18 2015 - [info]  read_only=1 is not set on slave 192.168.3.118(192.168.3.118:3306).
Wed Jun 24 01:51:18 2015 - [warning]  relay_log_purge=0 is not set on slave 192.168.3.118(192.168.3.118:3306).
Wed Jun 24 01:51:18 2015 - [info]  read_only=1 is not set on slave 192.168.3.115(192.168.3.115:3306).
Wed Jun 24 01:51:18 2015 - [warning]  relay_log_purge=0 is not set on slave 192.168.3.115(192.168.3.115:3306).
Wed Jun 24 01:51:18 2015 - [info]  read_only=1 is not set on slave 192.168.3.117(192.168.3.117:3306).
Wed Jun 24 01:51:18 2015 - [warning]  relay_log_purge=0 is not set on slave 192.168.3.117(192.168.3.117:3306).
Wed Jun 24 01:51:18 2015 - [info] Checking replication filtering settings..
Wed Jun 24 01:51:18 2015 - [info]  binlog_do_db= , binlog_ignore_db= 
Wed Jun 24 01:51:18 2015 - [info]  Replication filtering check ok.
Wed Jun 24 01:51:18 2015 - [info] Starting SSH connection tests..
Wed Jun 24 01:51:23 2015 - [info] All SSH connection tests passed successfully.
Wed Jun 24 01:51:23 2015 - [info] Checking MHA Node version..
Wed Jun 24 01:51:24 2015 - [info]  Version check ok.
Wed Jun 24 01:51:24 2015 - [info] Checking SSH publickey authentication settings on the current master..
Wed Jun 24 01:51:25 2015 - [info] HealthCheck: SSH to 192.168.3.110 is reachable.
Wed Jun 24 01:51:26 2015 - [info] Master MHA Node version is 0.53.
Wed Jun 24 01:51:26 2015 - [info] Checking recovery script configurations on the current master..
Wed Jun 24 01:51:26 2015 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/tmp/save_binary_logs_test --manager_version=0.53 --start_file=mysql-bin.000002 
Wed Jun 24 01:51:26 2015 - [info]   Connecting to root@192.168.3.110(192.168.3.110).. 
  Creating /tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql, up to mysql-bin.000002
Wed Jun 24 01:51:26 2015 - [info] Master setting check done.
Wed Jun 24 01:51:26 2015 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Jun 24 01:51:26 2015 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=192.168.3.118 --slave_ip=192.168.3.118 --slave_port=3306 --workdir=/tmp --target_version=5.6.25-log --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Wed Jun 24 01:51:26 2015 - [info]   Connecting to root@192.168.3.118(192.168.3.118:22).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000004
    Temporary relay log file is /var/lib/mysql/mysqld-relay-bin.000004
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Jun 24 01:51:27 2015 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=192.168.3.115 --slave_ip=192.168.3.115 --slave_port=3306 --workdir=/tmp --target_version=5.6.25-log --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Wed Jun 24 01:51:27 2015 - [info]   Connecting to root@192.168.3.115(192.168.3.115:22).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000004
    Temporary relay log file is /var/lib/mysql/mysqld-relay-bin.000004
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Jun 24 01:51:29 2015 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=192.168.3.117 --slave_ip=192.168.3.117 --slave_port=3306 --workdir=/tmp --target_version=5.6.25-log --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Wed Jun 24 01:51:29 2015 - [info]   Connecting to root@192.168.3.117(192.168.3.117:22).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000003
    Temporary relay log file is /var/lib/mysql/mysqld-relay-bin.000003
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Jun 24 01:51:31 2015 - [info] Slaves settings check done.
Wed Jun 24 01:51:31 2015 - [info] 
192.168.3.110 (current master)
 +--192.168.3.118
 +--192.168.3.115
 +--192.168.3.117

Wed Jun 24 01:51:31 2015 - [info] Checking replication health on 192.168.3.118..
Wed Jun 24 01:51:31 2015 - [info]  ok.
Wed Jun 24 01:51:31 2015 - [info] Checking replication health on 192.168.3.115..
Wed Jun 24 01:51:31 2015 - [info]  ok.
Wed Jun 24 01:51:31 2015 - [info] Checking replication health on 192.168.3.117..
Wed Jun 24 01:51:31 2015 - [info]  ok.
Wed Jun 24 01:51:31 2015 - [info] Checking master_ip_failover_script status:
Wed Jun 24 01:51:31 2015 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.3.110 --orig_master_ip=192.168.3.110 --orig_master_port=3306 


IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===

Checking the Status of the script.. OK 
Wed Jun 24 01:51:31 2015 - [info]  OK.
Wed Jun 24 01:51:31 2015 - [warning] master_ip_failover is not defined.
Wed Jun 24 01:51:31 2015 - [warning] shutdown_script is not defined.
Wed Jun 24 01:51:31 2015 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

没有报错信息,只有两个警告而已,复制也显示正常了。

9.检查MHA Manager的状态

通过master_check_status脚本查看Manager的状态:

[root@192.168.3.123 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
[root@192.168.3.123 ~]#

注意:如果正常,会显示”PING_OK”,否则会显示”NOT_RUNNING”,这代表MHA监控没有开启。

10.开启MHA Manager监控

[root@192.168.3.123 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &  
[1] 30867
[root@192.168.3.123 ~]#

启动参数介绍:

  • –remove_dead_master_conf      该参数代表当发生主从切换后,老的主库的ip将会从配置文件中移除。
  • –manger_log                            日志存放位置
  • –ignore_last_failover                 在缺省情况下,如果MHA检测到连续发生宕机,且两次宕机间隔不足8小时的话,则不会进行Failover,之所以这样限制是为了避免ping-pong效应。该参数代表忽略上次MHA触发切换产生的文件,默认情况下,MHA发生切换后会在日志目录,也就是上面我设置的/data产生app1.failover.complete文件,下次再次切换的时候如果发现该目录下存在该文件将不允许触发切换,除非在第一次切换后收到删除该文件,为了方便,这里设置为–ignore_last_failover。

查看MHA Manager监控是否正常:

[root@192.168.3.123 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:20386) is running(0:PING_OK), master:192.168.3.110
[root@192.168.3.123 ~]#

可以看见已经在监控了,而且master的主机为192.168.3.110

11.查看启动日志

[root@192.168.3.123 ~]# tail -f /var/log/masterha/app1/manager.log
Sun Jun 20 19:12:01 2015 - [info]   Connecting to root@192.168.3.117(192.168.3.117:22).. 
  Checking slave recovery environment settings..
    Opening /data/mysql/relay-log.info ... ok.
    Relay log found at /data/mysql, up to B-dev17-relay-bin.000002
    Temporary relay log file is /data/mysql/B-dev17-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sun Jun 20 19:12:01 2015 - [info]   Connecting to root@192.168.3.115(192.168.3.117:22).. 
  Checking slave recovery environment settings..
    Opening /data/mysql/relay-log.info ... ok.
    Relay log found at /data/mysql, up to B-dev15-relay-bin.000002
    Temporary relay log file is /data/mysql/B-dev15-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sun Jun 20 19:12:01 2015 - [info] Slaves settings check done.
Sun Jun 20 19:12:01 2015 - [info] 
192.168.3.110 (current master)
 +--192.168.3.115
 +--192.168.3.117
 +--192.168.3.123
Sun Apr 20 19:12:01 2015 - [warning] master_ip_failover_script is not defined.
Sun Apr 20 19:12:01 2015 - [warning] shutdown_script is not defined. 
Sun Apr 20 19:12:01 2015 - [info] Set master ping interval 1 seconds. 
Sun Apr 20 19:12:01 2015 - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s B-dev18 -s B-dev10 --user=root --master_host=B-dev10 --master_ip=192.168.3.110 --master_port=3306 
Sun Apr 20 19:12:01 2015 - [info] Starting ping health check on 192.168.3.123(192.168.3.123:3306).. 
Sun Apr 20 19:12:01 2015 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

[root@192.168.3.123 ~]#

其中”Ping(SELECT) succeeded, waiting until MySQL doesn’t respond..”说明整个系统已经开始监控了。

12.关闭MHA Manage监控

关闭很简单,使用masterha_stop命令完成。

[root@192.168.3.123 ~]# masterha_stop --conf=/etc/masterha/app1.cnf
Stopped app1 successfully.
[1]+  Exit 1                  nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover --manager_log=/data/mamanager.log
[root@192.168.3.123 ~]#

13.配置VIP

vip配置可以采用两种方式,一种通过keepalived的方式管理虚拟ip的漂移;另外一种通过脚本方式启动虚拟ip的方式(即不需要keepalived或者heartbeat类似的软件)。这里采用的是keepalived的方式管理vip的漂移。

keepalived方式管理虚拟ip,keepalived配置方法如下:

(1)下载软件进行并进行安装(两台master,准确的说一台是master,另外一台是备选master,在没有切换以前是slave):

[root@192.168.3.110 ~]# wget http://www.keepalived.org/software/keepalived-1.2.17.tar.gz
tar xf keepalived-1.2.17.tar.gz           
cd keepalived-1.2.17
./configure --prefix=/usr/local/keepalived
make &&  make install
cp /usr/local/keepalived/etc/rc.d/init.d/keepalived /etc/init.d/
cp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/
mkdir /etc/keepalived
cp /usr/local/keepalived/etc/keepalived/keepalived.conf /etc/keepalived/
cp /usr/local/keepalived/sbin/keepalived /usr/sbin/

(2)配置keepalived的配置文件,在master上配置(192.168.3.110)

[root@192.168.3.110 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
   notification_email {
     acassen@firewall.loc
     failover@firewall.loc
     sysadmin@firewall.loc
   }
   notification_email_from Alexandre.Cassen@firewall.loc
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id LVS_DEVEL
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 80
    advert_int 1
    nopreempt
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.3.250
    }
}

[root@192.168.3.110 ~]# 

其中router_id MySQL HA表示设定keepalived组的名称,将192.168.3.250这个虚拟ip绑定到该主机的eth0网卡上,并且设置了状态为backup模式,将keepalived的模式设置为非抢占模式(nopreempt),priority 80表示设置的优先级为80。下面的配置略有不同,但是都是一个意思。

(3)在候选master上配置(192.168.3.118)

[root@192.168.3.118 ~]# cat /etc/keepalived/keepalived.conf 
! Configuration File for keepalived

global_defs {
   notification_email {
     acassen@firewall.loc
     failover@firewall.loc
     sysadmin@firewall.loc
   }
   notification_email_from Alexandre.Cassen@firewall.loc
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id LVS_DEVEL
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 60
    advert_int 1
    nopreempt
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.3.250
    }
}

[root@192.168.3.118 ~]# 

(4)启动keepalived服务,在master上启动并查看日志

[root@192.168.3.110 ~]# /etc/init.d/keepalived start
Starting keepalived:                                       [  OK  ]
[root@192.168.3.110 ~]# tail -f /var/log/messages
Jun 20 20:22:16 192 Keepalived_healthcheckers[15334]: Opening file '/etc/keepalived/keepalived.conf'.
Jun 20 20:22:16 192 Keepalived_healthcheckers[15334]: Configuration is using : 7231 Bytes
Jun 20 20:22:16 192 kernel: IPVS: Connection hash table configured (size=4096, memory=64Kbytes)
Jun 20 20:22:16 192 kernel: IPVS: ipvs loaded.
Jun 20 20:22:16 192 Keepalived_healthcheckers[15334]: Using LinkWatch kernel netlink reflector...
Jun 20 20:22:19 192 Keepalived_vrrp[15335]: VRRP_Instance(VI_1) Transition to MASTER STATE
Jun 20 20:22:20 192 Keepalived_vrrp[15335]: VRRP_Instance(VI_1) Entering MASTER STATE
Jun 20 20:22:20 192 Keepalived_vrrp[15335]: VRRP_Instance(VI_1) setting protocol VIPs.
Jun 20 20:22:20 192 Keepalived_vrrp[15335]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 192.168.3.250
Jun 20 20:22:20 192 Keepalived_healthcheckers[15334]: Netlink reflector reports IP 192.168.3.250 added
Jun 20 20:22:25 192 Keepalived_vrrp[15335]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 192.168.3.250

发现已经将虚拟ip 192.168.3.250绑定了网卡eth0上。

(5)查看绑定情况

[root@192.168.3.110 ~]# ip addr | grep eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 192.168.3.110/24 brd 192.168.3.255 scope global eth0
    inet 192.168.3.250/32 scope global eth0
[root@192.168.3.110 ~]# 

在另外一台服务器,候选master上启动keepalived服务,并观察

 [root@192.168.3.118 ~]# /etc/init.d/keepalived start
Starting keepalived:                                       [  OK  ]
[root@192.168.3.110 ~]# tail -f /var/log/messages
Jun 20 20:26:18 192 Keepalived_vrrp[9472]: Registering gratuitous ARP shared channel
Jun 20 20:26:18 192 Keepalived_vrrp[9472]: Opening file '/etc/keepalived/keepalived.conf'.
Jun 20 20:26:18 192 Keepalived_vrrp[9472]: Configuration is using : 62976 Bytes
Jun 20 20:26:18 192 Keepalived_vrrp[9472]: Using LinkWatch kernel netlink reflector...
Jun 20 20:26:18 192 Keepalived_vrrp[9472]: VRRP_Instance(VI_1) Entering BACKUP STATE
Jun 20 20:26:18 192 Keepalived_vrrp[9472]: VRRP sockpool: [ifindex(3), proto(112), unicast(0), fd(10,11)]
Jun 20 20:26:18 192 Keepalived_healthcheckers[9471]: Netlink reflector reports IP 192.168.3.250 added
Jun 20 20:26:18 192 Keepalived_healthcheckers[9471]: Netlink reflector reports IP 192.168.3.118 added
Jun 20 20:26:18 192 Keepalived_healthcheckers[9471]: Netlink reflector reports IP fe80::20c:29ff:fe9d:6a9e added
Jun 20 20:26:18 192 Keepalived_healthcheckers[9471]: Netlink reflector reports IP fe80::20c:29ff:fe9d:6aa8 added
Jun 20 20:26:18 192 Keepalived_healthcheckers[9471]: Registering Kernel netlink reflector
Jun 20 20:26:18 192 Keepalived_healthcheckers[9471]: Registering Kernel netlink command channel
Jun 20 20:26:18 192 Keepalived_healthcheckers[9471]: Opening file '/etc/keepalived/keepalived.conf'.
Jun 20 20:26:18 192 Keepalived_healthcheckers[9471]: Configuration is using : 7231 Bytes
Jun 20 20:26:18 192 kernel: IPVS: Registered protocols (TCP, UDP, AH, ESP)
Jun 20 20:26:18 192 kernel: IPVS: Connection hash table configured (size=4096, memory=64Kbytes)
Jun 20 20:26:18 192 kernel: IPVS: ipvs loaded.
Jun 20 20:26:18 192 Keepalived_healthcheckers[9471]: Using LinkWatch kernel netlink reflector...

从上面的信息可以看到keepalived已经配置成功。

注意:

上面两台服务器的keepalived都设置为了BACKUP模式,在keepalived中2种模式,分别是master->backup模式和backup->backup模式。这两种模式有很大区别。在master->backup模式下,一旦主库宕机,虚拟ip会自动漂移到从库,当主库修复后,keepalived启动后,还会把虚拟ip抢占过来,即使设置了非抢占模式(nopreempt)抢占ip的动作也会发生。在backup->backup模式下,当主库宕机后虚拟ip会自动漂移到从库上,当原主库恢复和keepalived服务启动后,并不会抢占新主的虚拟ip,即使是优先级高于从库的优先级别,也不会发生抢占。为了减少ip漂移次数,通常是把修复好的主库当做新的备库。

(6)MHA引入keepalived

要想把keepalived服务引入MHA,我们只需要修改切换是触发的脚本文件master_ip_failover即可,在该脚本中添加在master发生宕机时对keepalived的处理。

编辑脚本/usr/local/bin/master_ip_failover,修改后如下,这里完整贴出该脚本(192.168.3.123)。

在MHA Manager修改脚本修改后的内容如下:

#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);

my $vip = '192.168.3.250';
my $ssh_start_vip = "/etc/init.d/keepalived start";
my $ssh_stop_vip = "/etc/init.d/keepalived stop";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {

    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {

        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        #`ssh $ssh_user\@cluster1 \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# A simple system call that enable the VIP on the new mastersub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
     return 0  unless  ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

现在已经修改这个脚本了,我们现在打开在上面提到过的参数,再检查集群状态,看是否会报错。

[root@192.168.3.123 ~]# grep 'master_ip_failover_script' /etc/masterha/app1.cnf
master_ip_failover_script= /usr/local/bin/master_ip_failover
[root@192.168.3.123 ~]# 
[root@192.168.3.123 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf  
Wed Jun 24 02:02:20 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jun 24 02:02:20 2015 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Wed Jun 24 02:02:20 2015 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Wed Jun 24 02:02:20 2015 - [info] MHA::MasterMonitor version 0.53.
Wed Jun 24 02:02:20 2015 - [info] Dead Servers:
Wed Jun 24 02:02:20 2015 - [info] Alive Servers:
Wed Jun 24 02:02:20 2015 - [info]   192.168.3.110(192.168.3.110:3306)
Wed Jun 24 02:02:20 2015 - [info]   192.168.3.118(192.168.3.118:3306)
Wed Jun 24 02:02:20 2015 - [info]   192.168.3.115(192.168.3.115:3306)
Wed Jun 24 02:02:20 2015 - [info]   192.168.3.117(192.168.3.117:3306)
Wed Jun 24 02:02:20 2015 - [info] Alive Slaves:
Wed Jun 24 02:02:20 2015 - [info]   192.168.3.118(192.168.3.118:3306)  Version=5.6.25-log (oldest major version between slaves) log-bin:enabled
Wed Jun 24 02:02:20 2015 - [info]     Replicating from 192.168.3.110(192.168.3.110:3306)
Wed Jun 24 02:02:20 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Jun 24 02:02:20 2015 - [info]   192.168.3.115(192.168.3.115:3306)  Version=5.6.25-log (oldest major version between slaves) log-bin:enabled
Wed Jun 24 02:02:20 2015 - [info]     Replicating from 192.168.3.110(192.168.3.110:3306)
Wed Jun 24 02:02:20 2015 - [info]   192.168.3.117(192.168.3.117:3306)  Version=5.6.25-log (oldest major version between slaves) log-bin:enabled
Wed Jun 24 02:02:20 2015 - [info]     Replicating from 192.168.3.110(192.168.3.110:3306)
Wed Jun 24 02:02:20 2015 - [info] Current Alive Master: 192.168.3.110(192.168.3.110:3306)
Wed Jun 24 02:02:20 2015 - [info] Checking slave configurations..
Wed Jun 24 02:02:20 2015 - [info]  read_only=1 is not set on slave 192.168.3.118(192.168.3.118:3306).
Wed Jun 24 02:02:20 2015 - [warning]  relay_log_purge=0 is not set on slave 192.168.3.118(192.168.3.118:3306).
Wed Jun 24 02:02:20 2015 - [info]  read_only=1 is not set on slave 192.168.3.115(192.168.3.115:3306).
Wed Jun 24 02:02:20 2015 - [warning]  relay_log_purge=0 is not set on slave 192.168.3.115(192.168.3.115:3306).
Wed Jun 24 02:02:20 2015 - [info]  read_only=1 is not set on slave 192.168.3.117(192.168.3.117:3306).
Wed Jun 24 02:02:20 2015 - [warning]  relay_log_purge=0 is not set on slave 192.168.3.117(192.168.3.117:3306).
Wed Jun 24 02:02:20 2015 - [info] Checking replication filtering settings..
Wed Jun 24 02:02:20 2015 - [info]  binlog_do_db= , binlog_ignore_db= 
Wed Jun 24 02:02:20 2015 - [info]  Replication filtering check ok.
Wed Jun 24 02:02:20 2015 - [info] Starting SSH connection tests..
Wed Jun 24 02:02:25 2015 - [info] All SSH connection tests passed successfully.
Wed Jun 24 02:02:25 2015 - [info] Checking MHA Node version..
Wed Jun 24 02:02:26 2015 - [info]  Version check ok.
Wed Jun 24 02:02:26 2015 - [info] Checking SSH publickey authentication settings on the current master..
Wed Jun 24 02:02:27 2015 - [info] HealthCheck: SSH to 192.168.3.110 is reachable.
Wed Jun 24 02:02:27 2015 - [info] Master MHA Node version is 0.53.
Wed Jun 24 02:02:27 2015 - [info] Checking recovery script configurations on the current master..
Wed Jun 24 02:02:27 2015 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/tmp/save_binary_logs_test --manager_version=0.53 --start_file=mysql-bin.000002 
Wed Jun 24 02:02:27 2015 - [info]   Connecting to root@192.168.3.110(192.168.3.110).. 
  Creating /tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql, up to mysql-bin.000002
Wed Jun 24 02:02:28 2015 - [info] Master setting check done.
Wed Jun 24 02:02:28 2015 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Jun 24 02:02:28 2015 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=192.168.3.118 --slave_ip=192.168.3.118 --slave_port=3306 --workdir=/tmp --target_version=5.6.25-log --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Wed Jun 24 02:02:28 2015 - [info]   Connecting to root@192.168.3.118(192.168.3.118:22).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000004
    Temporary relay log file is /var/lib/mysql/mysqld-relay-bin.000004
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Jun 24 02:02:29 2015 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=192.168.3.115 --slave_ip=192.168.3.115 --slave_port=3306 --workdir=/tmp --target_version=5.6.25-log --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Wed Jun 24 02:02:29 2015 - [info]   Connecting to root@192.168.3.115(192.168.3.115:22).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000004
    Temporary relay log file is /var/lib/mysql/mysqld-relay-bin.000004
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Jun 24 02:02:30 2015 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=192.168.3.117 --slave_ip=192.168.3.117 --slave_port=3306 --workdir=/tmp --target_version=5.6.25-log --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Wed Jun 24 02:02:30 2015 - [info]   Connecting to root@192.168.3.117(192.168.3.117:22).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000003
    Temporary relay log file is /var/lib/mysql/mysqld-relay-bin.000003
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Jun 24 02:02:31 2015 - [info] Slaves settings check done.
Wed Jun 24 02:02:31 2015 - [info] 
192.168.3.110 (current master)
 +--192.168.3.118
 +--192.168.3.115
 +--192.168.3.117

Wed Jun 24 02:02:31 2015 - [info] Checking replication health on 192.168.3.118..
Wed Jun 24 02:02:31 2015 - [info]  ok.
Wed Jun 24 02:02:31 2015 - [info] Checking replication health on 192.168.3.115..
Wed Jun 24 02:02:31 2015 - [info]  ok.
Wed Jun 24 02:02:31 2015 - [info] Checking replication health on 192.168.3.117..
Wed Jun 24 02:02:31 2015 - [info]  ok.
Wed Jun 24 02:02:31 2015 - [info] Checking master_ip_failover_script status:
Wed Jun 24 02:02:31 2015 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.3.110 --orig_master_ip=192.168.3.110 --orig_master_port=3306 


IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===

Checking the Status of the script.. OK 
Wed Jun 24 02:02:31 2015 - [info]  OK.
Wed Jun 24 02:02:31 2015 - [warning] shutdown_script is not defined.
Wed Jun 24 02:02:31 2015 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
可以看见没有报错。 /usr/local/bin/master_ip_failover添加或者修改的内容意思是当主库数据库发生故障时,会触发MHA切换,MHA Manager会停掉主库上的keepalived服务,触发虚拟ip漂移到备选从库,从而完成切换。当然可以在keepalived里面引入脚本,这个脚本监控mysql是否正常运行,如果不正常,则调用该脚本杀掉keepalived进程。
keepalived的vip漂移原理及配置项说明可以查看文档小谈keepalived vip漂移原理与VRRP协议

14.MHA测试

到此为止,基本MHA集群已经配置完毕。接下来就是实际的测试环节了。通过一些测试来看一下MHA到底是如何进行工作的。下面将从MHA自动failover、手动failover两种方式来介绍MHA的工作情况。

14.1. 自动Failover(必须先启动MHA Manager,否则无法自动切换,当然手动切换不需要开启MHA Manager监控)

自动failover模拟测试的操作步骤如下:

(1)使用sysbench生成测试数据(使用yum快速安装),sysbench的使用说明可以查看文档sysbench使用详解

 yum install sysbench -y

在主库(192.168.3.110)上进行sysbench数据生成,在sbtest库下生成sbtest表,共100W记录。

[root@192.168.3.110 ~]# sysbench --test=oltp --oltp-table-size=1000000 --oltp-read-only=off --mysql-db=sps --oltp-table-name=sbtest1  --init-rng=on --num-threads=16 --max-requests=0 --oltp-dist-type=uniform --max-time=1800 --mysql-user=root --mysql-socket=/var/lib/mysql/mysql.sock --mysql-password=123456 --db-driver=mysql --mysql-table-engine=innodb --oltp-test-mode=complex perpare

(2)停掉slave sql线程,模拟主从延时(192.168.3.118)

mysql> stop slave io_thread;
Query OK, 0 rows affected (0.08 sec)

mysql> 

另外两台slave我们没有停止io线程,所以还在继续接收日志。

(3)模拟sysbench压力测试

在主库上(192.168.3.110)进行压力测试,持续时间为3分钟,产生大量的binlog。

[root@192.168.3.110 ~]# sysbench --test=oltp --oltp-table-size=1000000 --oltp-read-only=off --mysql-db=sps --oltp-table-name=sbtest1  --init-rng=on --num-threads=16 --max-requests=0 --oltp-dist-type=uniform --max-time=1800 --mysql-user=root --mysql-socket=/var/lib/mysql/mysql.sock --mysql-password=123456 --db-driver=mysql --mysql-table-engine=innodb --oltp-test-mode=complex run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 16
Initializing random number generator from timer.


Doing OLTP test.
Running mixed OLTP test
Using Uniform distribution
Using "BEGIN" for starting transactions
Using auto_inc on the id column
Threads started!
Time limit exceeded, exiting...
(last message repeated 15 times)
Done.

OLTP test statistics:
    queries performed:
        read:                            15092
        write:                           5390
        other:                           2156
        total:                           22638
    transactions:                        1078   (5.92 per sec.)
    deadlocks:                           0      (0.00 per sec.)
    read/write requests:                 20482  (112.56 per sec.)
    other operations:                    2156   (11.85 per sec.)

Test execution summary:
    total time:                          181.9728s
    total number of events:              1078
    total time taken by event execution: 2910.4518
    per-request statistics:
         min:                                934.29ms
         avg:                               2699.86ms
         max:                               7679.95ms
         approx.  95 percentile:            4441.47ms

Threads fairness:
    events (avg/stddev):           67.3750/1.49
    execution time (avg/stddev):   181.9032/0.11

(4)开启slave(192.168.3.118)上的IO线程,追赶落后于master的binlog。

mysql> start slave io_thread;     
Query OK, 0 rows affected (0.00 sec)
mysql> 

(5)杀掉主库mysql进程,模拟主库发生故障,进行自动failover操作。

[root@192.168.3.110 ~]# pkill -9 mysqld

(6)查看MHA切换日志,了解整个切换过程,在192.168.3.123上查看日志(注意:由于测试多次切换过程,下面摘录的是192.168.3.118为主机,192.168.3.110为候选主机的切换log):

[root@192.168.3.123 ~]# tail -f /var/log/masterha/app1/manager.log

IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===

Checking the Status of the script.. OK
Thu Jun 11 13:52:09 2015 - [info]  OK.
Thu Jun 11 13:52:09 2015 - [warning] shutdown_script is not defined.
Thu Jun 11 13:52:09 2015 - [info] Set master ping interval 1 seconds.
Thu Jun 11 13:52:09 2015 - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s B-dev10 -s B-dev18  --user=root --master_host=B-dev18 --master_ip=192.168.3.118 --master_port=3306
Thu Jun 11 13:52:09 2015 - [info] Starting ping health check on 192.168.3.118(192.168.3.118:3306)..
Thu Jun 11 13:52:09 2015 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Thu Jun 11 13:54:01 2015 - [warning] Got error on MySQL select ping: 2013 (Lost connection to MySQL server during query)
Thu Jun 11 13:54:02 2015 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/tmp/save_binary_logs_test --manager_version=0.53 --binlog_prefix=mysql-bin
Thu Jun 11 13:54:02 2015 - [info] Executing seconary network check script: /usr/local/bin/masterha_secondary_check -s B-dev10 -s B-dev18  --user=root --master_host=B-dev18 --master_ip=192.168.3.118 --master_port=3306  --user=root  --master_host=192.168.3.118  --master_ip=192.168.3.118  --master_port=3306
Thu Jun 11 13:54:02 2015 - [info] HealthCheck: SSH to 192.168.3.118 is reachable.
Thu Jun 11 13:54:02 2015 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Jun 11 13:54:02 2015 - [warning] Connection failed 1 time(s)..
Monitoring server B-dev10 is reachable, Master is not reachable from B-dev10. OK.
Thu Jun 11 13:54:03 2015 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Jun 11 13:54:03 2015 - [warning] Connection failed 2 time(s)..
Monitoring server B-dev18 is reachable, Master is not reachable from B-dev18. OK.
Thu Jun 11 13:54:04 2015 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Thu Jun 11 13:54:04 2015 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Jun 11 13:54:04 2015 - [warning] Connection failed 3 time(s)..
Thu Jun 11 13:54:04 2015 - [warning] Master is not reachable from health checker!
Thu Jun 11 13:54:04 2015 - [warning] Master 192.168.3.118(192.168.3.118:3306) is not reachable!
Thu Jun 11 13:54:04 2015 - [warning] SSH is reachable.
Thu Jun 11 13:54:04 2015 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..
Thu Jun 11 13:54:04 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Jun 11 13:54:04 2015 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Thu Jun 11 13:54:04 2015 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Thu Jun 11 13:54:13 2015 - [info] Dead Servers:
Thu Jun 11 13:54:13 2015 - [info]   192.168.3.118(192.168.3.118:3306)
Thu Jun 11 13:54:13 2015 - [info] Alive Servers:
Thu Jun 11 13:54:13 2015 - [info]   192.168.3.110(192.168.3.110:3306)
Thu Jun 11 13:54:13 2015 - [info]   192.168.3.115(192.168.3.115:3306)
Thu Jun 11 13:54:13 2015 - [info]   192.168.3.117(192.168.3.117:3306)
Thu Jun 11 13:54:13 2015 - [info] Alive Slaves:
Thu Jun 11 13:54:13 2015 - [info]   192.168.3.110(192.168.3.110:3306)  Version=5.6.25-log (oldest major version between slaves) log-bin:enabled
Thu Jun 11 13:54:13 2015 - [info]     Replicating from 192.168.3.118(192.168.3.118:3306)
Thu Jun 11 13:54:13 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jun 11 13:54:13 2015 - [info]   192.168.3.115(192.168.3.115:3306)  Version=5.6.25-log (oldest major version between slaves) log-bin:enabled
Thu Jun 11 13:54:13 2015 - [info]     Replicating from 192.168.3.118(192.168.3.118:3306)
Thu Jun 11 13:54:13 2015 - [info]   192.168.3.117(192.168.3.117:3306)  Version=5.6.25-log (oldest major version between slaves) log-bin:enabled
Thu Jun 11 13:54:13 2015 - [info]     Replicating from 192.168.3.118(192.168.3.118:3306)
Thu Jun 11 13:54:13 2015 - [info] Checking slave configurations..
Thu Jun 11 13:54:13 2015 - [info]  read_only=1 is not set on slave 192.168.3.110(192.168.3.110:3306).
Thu Jun 11 13:54:13 2015 - [warning]  relay_log_purge=0 is not set on slave 192.168.3.110(192.168.3.110:3306).
Thu Jun 11 13:54:13 2015 - [info]  read_only=1 is not set on slave 192.168.3.115(192.168.3.115:3306).
Thu Jun 11 13:54:13 2015 - [info]  read_only=1 is not set on slave 192.168.3.117(192.168.3.117:3306).
Thu Jun 11 13:54:13 2015 - [info] Checking replication filtering settings..
Thu Jun 11 13:54:13 2015 - [info]  Replication filtering check ok.
Thu Jun 11 13:54:13 2015 - [info] Master is down!
Thu Jun 11 13:54:13 2015 - [info] Terminating monitoring script.
Thu Jun 11 13:54:13 2015 - [info] Got exit code 20 (Master dead).
Thu Jun 11 13:54:13 2015 - [info] MHA::MasterFailover version 0.53.
Thu Jun 11 13:54:13 2015 - [info] Starting master failover.
Thu Jun 11 13:54:13 2015 - [info]
Thu Jun 11 13:54:13 2015 - [info] * Phase 1: Configuration Check Phase..
Thu Jun 11 13:54:13 2015 - [info]
Thu Jun 11 13:54:19 2015 - [info] Dead Servers:
Thu Jun 11 13:54:19 2015 - [info]   192.168.3.118(192.168.3.118:3306)
Thu Jun 11 13:54:19 2015 - [info] Checking master reachability via mysql(double check)..
Thu Jun 11 13:54:19 2015 - [info]  ok.
Thu Jun 11 13:54:19 2015 - [info] Alive Servers:
Thu Jun 11 13:54:19 2015 - [info]   192.168.3.110(192.168.3.110:3306)
Thu Jun 11 13:54:19 2015 - [info]   192.168.3.115(192.168.3.115:3306)
Thu Jun 11 13:54:19 2015 - [info]   192.168.3.117(192.168.3.117:3306)
Thu Jun 11 13:54:19 2015 - [info] Alive Slaves:
Thu Jun 11 13:54:19 2015 - [info]   192.168.3.110(192.168.3.110:3306)  Version=5.6.25-log (oldest major version between slaves) log-bin:enabled
Thu Jun 11 13:54:19 2015 - [info]     Replicating from 192.168.3.118(192.168.3.118:3306)
Thu Jun 11 13:54:19 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jun 11 13:54:19 2015 - [info]   192.168.3.115(192.168.3.115:3306)  Version=5.6.25-log (oldest major version between slaves) log-bin:enabled
Thu Jun 11 13:54:19 2015 - [info]     Replicating from 192.168.3.118(192.168.3.118:3306)
Thu Jun 11 13:54:19 2015 - [info]   192.168.3.117(192.168.3.117:3306)  Version=5.6.25-log (oldest major version between slaves) log-bin:enabled
Thu Jun 11 13:54:19 2015 - [info]     Replicating from 192.168.3.118(192.168.3.118:3306)
Thu Jun 11 13:54:22 2015 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Jun 11 13:54:22 2015 - [info]
Thu Jun 11 13:54:22 2015 - [info] * Phase 2: Dead Master Shutdown Phase..
Thu Jun 11 13:54:22 2015 - [info]
Thu Jun 11 13:54:22 2015 - [info] Forcing shutdown so that applications never connect to the current master..
Thu Jun 11 13:54:22 2015 - [info] Executing master IP deactivatation script:
Thu Jun 11 13:54:22 2015 - [info]   /usr/local/bin/master_ip_failover --orig_master_host=192.168.3.118 --orig_master_ip=192.168.3.118 --orig_master_port=3306 --command=stopssh --ssh_user=root 


IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===

Disabling the VIP on old master: 192.168.3.118
Thu Jun 11 13:54:23 2015 - [info]  done.
Thu Jun 11 13:54:23 2015 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Jun 11 13:54:23 2015 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu Jun 11 13:54:23 2015 - [info]
Thu Jun 11 13:54:23 2015 - [info] * Phase 3: Master Recovery Phase..
Thu Jun 11 13:54:23 2015 - [info]
Thu Jun 11 13:54:23 2015 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Jun 11 13:54:23 2015 - [info]
Thu Jun 11 13:54:23 2015 - [info] The latest binary log file/position on all slaves is mysql-bin.000001:145958297
Thu Jun 11 13:54:23 2015 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Jun 11 13:54:23 2015 - [info]   192.168.3.110(192.168.3.110:3306)  Version=5.6.25-log (oldest major version between slaves) log-bin:enabled
Thu Jun 11 13:54:23 2015 - [info]     Replicating from 192.168.3.118(192.168.3.118:3306)
Thu Jun 11 13:54:23 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jun 11 13:54:23 2015 - [info] The oldest binary log file/position on all slaves is mysql-bin.000001:142466719
Thu Jun 11 13:54:23 2015 - [info] Oldest slaves:
Thu Jun 11 13:54:23 2015 - [info]   192.168.3.115(192.168.3.115:3306)  Version=5.6.25-log (oldest major version between slaves) log-bin:enabled
Thu Jun 11 13:54:23 2015 - [info]     Replicating from 192.168.3.118(192.168.3.118:3306)
Thu Jun 11 13:54:23 2015 - [info]
Thu Jun 11 13:54:23 2015 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Thu Jun 11 13:54:23 2015 - [info]
Thu Jun 11 13:54:24 2015 - [info] Fetching dead master's binary logs..
Thu Jun 11 13:54:24 2015 - [info] Executing command on the dead master 192.168.3.118(192.168.3.118:3306): save_binary_logs --command=save --start_file=mysql-bin.000001  --start_pos=145958297 --binlog_dir=/var/lib/mysql --output_file=/tmp/saved_master_binlog_from_192.168.3.118_3306_20150611135413.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53
  Creating /tmp if not exists..    ok.
Concat binary/relay logs from mysql-bin.000001 pos 145958297 to mysql-bin.000001 EOF into /tmp/saved_master_binlog_from_192.168.3.118_3306_20150611135413.binlog ..
  Dumping binlog format description event, from position 0 to 120.. ok.
  Dumping effective binlog data from /var/lib/mysql/mysql-bin.000001 position 145958297 to tail(145969603).. ok.
Concat succeeded.
Thu Jun 11 13:54:26 2015 - [info] scp from root@192.168.3.118:/tmp/saved_master_binlog_from_192.168.3.118_3306_20150611135413.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_192.168.3.118_3306_20150611135413.binlog succeeded.
Thu Jun 11 13:54:26 2015 - [info] HealthCheck: SSH to 192.168.3.110 is reachable.
Thu Jun 11 13:54:27 2015 - [info] HealthCheck: SSH to 192.168.3.115 is reachable.
Thu Jun 11 13:54:28 2015 - [info] HealthCheck: SSH to 192.168.3.117 is reachable.
Thu Jun 11 13:54:29 2015 - [info]
Thu Jun 11 13:54:29 2015 - [info] * Phase 3.3: Determining New Master Phase..
Thu Jun 11 13:54:29 2015 - [info]
Thu Jun 11 13:54:29 2015 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Thu Jun 11 13:54:29 2015 - [info] Checking whether 192.168.3.110 has relay logs from the oldest position..
Thu Jun 11 13:54:29 2015 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=mysql-bin.000001 --latest_rmlp=145958297 --target_mlf=mysql-bin.000001 --target_rmlp=142466719 --server_id=110 --workdir=/tmp --timestamp=20150611135413 --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  :
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000003
Fast relay log position search succeeded.
Target relay log file/position found. start_file:mysqld-relay-bin.000003, start_pos:79548.
Target relay log FOUND!
Thu Jun 11 13:54:30 2015 - [info] OK. 192.168.3.110 has all relay logs.
Thu Jun 11 13:54:30 2015 - [info] Searching new master from slaves..
Thu Jun 11 13:54:30 2015 - [info]  Candidate masters from the configuration file:
Thu Jun 11 13:54:30 2015 - [info]   192.168.3.110(192.168.3.110:3306)  Version=5.6.25-log (oldest major version between slaves) log-bin:enabled
Thu Jun 11 13:54:30 2015 - [info]     Replicating from 192.168.3.118(192.168.3.118:3306)
Thu Jun 11 13:54:30 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jun 11 13:54:30 2015 - [info]  Non-candidate masters:
Thu Jun 11 13:54:30 2015 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Thu Jun 11 13:54:30 2015 - [info] New master is 192.168.3.110(192.168.3.110:3306)
Thu Jun 11 13:54:30 2015 - [info] Starting master failover..
Thu Jun 11 13:54:30 2015 - [info]
From:
192.168.3.118 (current master)
+--192.168.3.110
+--192.168.3.115
+--192.168.3.117

To:
192.168.3.110 (new master)
+--192.168.3.115
+--192.168.3.117
Thu Jun 11 13:54:30 2015 - [info]
Thu Jun 11 13:54:30 2015 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Thu Jun 11 13:54:30 2015 - [info]
Thu Jun 11 13:54:30 2015 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Thu Jun 11 13:54:30 2015 - [info] Sending binlog..
Thu Jun 11 13:54:31 2015 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_192.168.3.118_3306_20150611135413.binlog to root@192.168.3.110:/tmp/saved_master_binlog_from_192.168.3.118_3306_20150611135413.binlog succeeded.
Thu Jun 11 13:54:31 2015 - [info]
Thu Jun 11 13:54:31 2015 - [info] * Phase 3.4: Master Log Apply Phase..
Thu Jun 11 13:54:31 2015 - [info]
Thu Jun 11 13:54:31 2015 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Thu Jun 11 13:54:31 2015 - [info] Starting recovery on 192.168.3.110(192.168.3.110:3306)..
Thu Jun 11 13:54:31 2015 - [info]  Generating diffs succeeded.
Thu Jun 11 13:54:31 2015 - [info] Waiting until all relay logs are applied.
Thu Jun 11 13:59:46 2015 - [info]  done.
Thu Jun 11 13:59:46 2015 - [info] Getting slave status..
Thu Jun 11 13:59:46 2015 - [info] This slave(192.168.3.110)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000001:145958297). No need to recover from Exec_Master_Log_Pos.
Thu Jun 11 13:59:46 2015 - [info] Connecting to the target slave host 192.168.3.110, running recover script..
Thu Jun 11 13:59:46 2015 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=192.168.3.110 --slave_ip=192.168.3.110  --slave_port=3306 --apply_files=/tmp/saved_master_binlog_from_192.168.3.118_3306_20150611135413.binlog --workdir=/tmp --target_version=5.6.25-log --timestamp=20150611135413 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
Thu Jun 11 13:59:49 2015 - [info]
MySQL client version is 5.6.25. Using --binary-mode.
Applying differential binary/relay log files /tmp/saved_master_binlog_from_192.168.3.118_3306_20150611135413.binlog on 192.168.3.110:3306. This may take long time...
Applying log files succeeded.
Thu Jun 11 13:59:49 2015 - [info]  All relay logs were successfully applied.
Thu Jun 11 13:59:49 2015 - [info] Getting new master's binlog name and position..
Thu Jun 11 13:59:49 2015 - [info]  mysql-bin.000001:11760
Thu Jun 11 13:59:49 2015 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.3.110', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=11760, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Thu Jun 11 13:59:49 2015 - [info] Executing master IP activate script:
Thu Jun 11 13:59:49 2015 - [info]   /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.3.118 --orig_master_ip=192.168.3.118 --orig_master_port=3306 --new_master_host=192.168.3.110 --new_master_ip=192.168.3.110 --new_master_port=3306 


IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===

Enabling the VIP - 192.168.3.250 on the new master - 192.168.3.110
Thu Jun 11 13:59:49 2015 - [info]  OK.
Thu Jun 11 13:59:49 2015 - [info] ** Finished master recovery successfully.
Thu Jun 11 13:59:49 2015 - [info] * Phase 3: Master Recovery Phase completed.
Thu Jun 11 13:59:49 2015 - [info]
Thu Jun 11 13:59:49 2015 - [info] * Phase 4: Slaves Recovery Phase..
Thu Jun 11 13:59:49 2015 - [info]
Thu Jun 11 13:59:49 2015 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Thu Jun 11 13:59:49 2015 - [info]
Thu Jun 11 13:59:49 2015 - [info] -- Slave diff file generation on host 192.168.3.115(192.168.3.115:3306) started, pid: 17896. Check tmp log /var/log/masterha/app1/192.168.3.115_3306_20150611135413.log if it takes time..
Thu Jun 11 13:59:49 2015 - [info] -- Slave diff file generation on host 192.168.3.117(192.168.3.117:3306) started, pid: 17897. Check tmp log /var/log/masterha/app1/192.168.3.117_3306_20150611135413.log if it takes time..
Thu Jun 11 13:59:52 2015 - [info]
Thu Jun 11 13:59:52 2015 - [info] Log messages from 192.168.3.117 ...
Thu Jun 11 13:59:52 2015 - [info]
Thu Jun 11 13:59:49 2015 - [info] Server 192.168.3.117 received relay logs up to: mysql-bin.000001:142482387
Thu Jun 11 13:59:49 2015 - [info] Need to get diffs from the latest slave(192.168.3.110) up to: mysql-bin.000001:145958297 (using the latest slave's relay logs)
Thu Jun 11 13:59:50 2015 - [info] Connecting to the latest slave host 192.168.3.110, generating diff relay log files..
Thu Jun 11 13:59:50 2015 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=192.168.3.117 --latest_mlf=mysql-bin.000001 --latest_rmlp=145958297 --target_mlf=mysql-bin.000001 --target_rmlp=142482387 --server_id=110 --diff_file_readtolatest=/tmp/relay_from_read_to_latest_192.168.3.117_3306_20150611135413.binlog --workdir=/tmp --timestamp=20150611135413 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/
Thu Jun 11 13:59:52 2015 - [info]
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000003
Fast relay log position search succeeded.
Target relay log file/position found. start_file:mysqld-relay-bin.000003, start_pos:95216.
Concat binary/relay logs from mysqld-relay-bin.000003 pos 95216 to mysqld-relay-bin.000003 EOF into /tmp/relay_from_read_to_latest_192.168.3.117_3306_20150611135413.binlog ..
  Dumping binlog format description event, from position 0 to 283.. ok.
  Dumping effective binlog data from /var/lib/mysql/mysqld-relay-bin.000003 position 95216 to tail(3571126).. ok.
Concat succeeded.
Generating diff relay log succeeded. Saved at /tmp/relay_from_read_to_latest_192.168.3.117_3306_20150611135413.binlog .
scp B-dev10:/tmp/relay_from_read_to_latest_192.168.3.117_3306_20150611135413.binlog to root@192.168.3.117(22) succeeded.
Thu Jun 11 13:59:52 2015 - [info]  Generating diff files succeeded.
Thu Jun 11 13:59:52 2015 - [info] End of log messages from 192.168.3.117.
Thu Jun 11 13:59:52 2015 - [info] -- Slave diff log generation on host 192.168.3.117(192.168.3.117:3306) succeeded.
Thu Jun 11 13:59:53 2015 - [info]
Thu Jun 11 13:59:53 2015 - [info] Log messages from 192.168.3.115 ...
Thu Jun 11 13:59:53 2015 - [info]
Thu Jun 11 13:59:49 2015 - [info] Server 192.168.3.115 received relay logs up to: mysql-bin.000001:142466719
Thu Jun 11 13:59:49 2015 - [info] Need to get diffs from the latest slave(192.168.3.110) up to: mysql-bin.000001:145958297 (using the latest slave's relay logs)
Thu Jun 11 13:59:50 2015 - [info] Connecting to the latest slave host 192.168.3.110, generating diff relay log files..
Thu Jun 11 13:59:50 2015 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=192.168.3.115 --latest_mlf=mysql-bin.000001 --latest_rmlp=145958297 --target_mlf=mysql-bin.000001 --target_rmlp=142466719 --server_id=110 --diff_file_readtolatest=/tmp/relay_from_read_to_latest_192.168.3.115_3306_20150611135413.binlog --workdir=/tmp --timestamp=20150611135413 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/
Thu Jun 11 13:59:53 2015 - [info]
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000003
Fast relay log position search succeeded.
Target relay log file/position found. start_file:mysqld-relay-bin.000003, start_pos:79548.
Concat binary/relay logs from mysqld-relay-bin.000003 pos 79548 to mysqld-relay-bin.000003 EOF into /tmp/relay_from_read_to_latest_192.168.3.115_3306_20150611135413.binlog ..
  Dumping binlog format description event, from position 0 to 283.. ok.
  Dumping effective binlog data from /var/lib/mysql/mysqld-relay-bin.000003 position 79548 to tail(3571126).. ok.
Concat succeeded.
Generating diff relay log succeeded. Saved at /tmp/relay_from_read_to_latest_192.168.3.115_3306_20150611135413.binlog .
scp B-dev10:/tmp/relay_from_read_to_latest_192.168.3.115_3306_20150611135413.binlog to root@192.168.3.115(22) succeeded.
Thu Jun 11 13:59:53 2015 - [info]  Generating diff files succeeded.
Thu Jun 11 13:59:53 2015 - [info] End of log messages from 192.168.3.115.
Thu Jun 11 13:59:53 2015 - [info] -- Slave diff log generation on host 192.168.3.115(192.168.3.115:3306) succeeded.
Thu Jun 11 13:59:53 2015 - [info] Generating relay diff files from the latest slave succeeded.
Thu Jun 11 13:59:53 2015 - [info]
Thu Jun 11 13:59:53 2015 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Thu Jun 11 13:59:53 2015 - [info]
Thu Jun 11 13:59:53 2015 - [info] -- Slave recovery on host 192.168.3.115(192.168.3.115:3306) started, pid: 17910. Check tmp log /var/log/masterha/app1/192.168.3.115_3306_20150611135413.log if it takes time..
Thu Jun 11 13:59:53 2015 - [info] -- Slave recovery on host 192.168.3.117(192.168.3.117:3306) started, pid: 17911. Check tmp log /var/log/masterha/app1/192.168.3.117_3306_20150611135413.log if it takes time..
Thu Jun 11 14:14:14 2015 - [info]
Thu Jun 11 14:14:14 2015 - [info] Log messages from 192.168.3.117 ...
Thu Jun 11 14:14:14 2015 - [info]
Thu Jun 11 13:59:53 2015 - [info] Sending binlog..
Thu Jun 11 13:59:53 2015 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_192.168.3.118_3306_20150611135413.binlog to root@192.168.3.117:/tmp/saved_master_binlog_from_192.168.3.118_3306_20150611135413.binlog succeeded.
Thu Jun 11 13:59:53 2015 - [info] Starting recovery on 192.168.3.117(192.168.3.117:3306)..
Thu Jun 11 13:59:53 2015 - [info]  Generating diffs succeeded.
Thu Jun 11 13:59:53 2015 - [info] Waiting until all relay logs are applied.
Thu Jun 11 13:59:53 2015 - [info]  done.
Thu Jun 11 13:59:54 2015 - [info] Getting slave status..
Thu Jun 11 13:59:54 2015 - [info] This slave(192.168.3.117)'s Exec_Master_Log_Pos(mysql-bin.000001:142481989) does not equal to Read_Master_Log_Pos(mysql-bin.000001:142482387). It is likely that relay log was cut during transaction. Need to recover from Exec_Master_Log_Pos.
Thu Jun 11 13:59:54 2015 - [info] Saving local relay logs from exec pos to read pos on 192.168.3.117: from mysqld-relay-bin.000002:142482152 to the end of the relay log..
Thu Jun 11 13:59:54 2015 - [info] Executing command : save_binary_logs --command=save --start_file=mysqld-relay-bin.000002  --start_pos=142482152 --output_file=/tmp/relay_from_exec_to_read_192.168.3.117_3306_20150611135413.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info  --binlog_dir=/var/lib/mysql/
Thu Jun 11 13:59:54 2015 - [info]
  Creating /tmp if not exists..    ok.
Concat binary/relay logs from mysqld-relay-bin.000002 pos 142482152 to mysqld-relay-bin.000002 EOF into /tmp/relay_from_exec_to_read_192.168.3.117_3306_20150611135413.binlog ..
  Dumping binlog format description event, from position 0 to 283.. ok.
  Dumping effective binlog data from /var/lib/mysql/mysqld-relay-bin.000002 position 142482152 to tail(142482550).. ok.
Concat succeeded.
Thu Jun 11 13:59:54 2015 - [info] Connecting to the target slave host 192.168.3.117, running recover script..
Thu Jun 11 13:59:54 2015 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=192.168.3.117 --slave_ip=192.168.3.117  --slave_port=3306 --apply_files=/tmp/relay_from_exec_to_read_192.168.3.117_3306_20150611135413.binlog,/tmp/relay_from_read_to_latest_192.168.3.117_3306_20150611135413.binlog,/tmp/saved_master_binlog_from_192.168.3.118_3306_20150611135413.binlog --workdir=/tmp --target_version=5.6.25-log --timestamp=20150611135413 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
Thu Jun 11 14:14:12 2015 - [info]
Concat all apply files to /tmp/total_binlog_for_192.168.3.117_3306.20150611135413.binlog ..
Copying the first binlog file /tmp/relay_from_exec_to_read_192.168.3.117_3306_20150611135413.binlog to /tmp/total_binlog_for_192.168.3.117_3306.20150611135413.binlog.. ok.
  Dumping binlog head events (rotate events), skipping format description events from /tmp/relay_from_read_to_latest_192.168.3.117_3306_20150611135413.binlog.. dumped up to pos 283. ok.
/tmp/relay_from_read_to_latest_192.168.3.117_3306_20150611135413.binlog has effective binlog events from pos 283.
  Dumping effective binlog data from /tmp/relay_from_read_to_latest_192.168.3.117_3306_20150611135413.binlog position 283 to tail(3476193).. ok.
  Dumping binlog head events (rotate events), skipping format description events from /tmp/saved_master_binlog_from_192.168.3.118_3306_20150611135413.binlog.. dumped up to pos 120. ok.
/tmp/saved_master_binlog_from_192.168.3.118_3306_20150611135413.binlog has effective binlog events from pos 120.
  Dumping effective binlog data from /tmp/saved_master_binlog_from_192.168.3.118_3306_20150611135413.binlog position 120 to tail(11426).. ok.
Concat succeeded.
All apply target binary logs are concatinated at /tmp/total_binlog_for_192.168.3.117_3306.20150611135413.binlog .
MySQL client version is 5.6.25. Using --binary-mode.
Applying differential binary/relay log files /tmp/relay_from_exec_to_read_192.168.3.117_3306_20150611135413.binlog,/tmp/relay_from_read_to_latest_192.168.3.117_3306_20150611135413.binlog,/tmp/saved_master_binlog_from_192.168.3.118_3306_20150611135413.binlog on 192.168.3.117:3306. This may take long time...
Applying log files succeeded.
Thu Jun 11 14:14:12 2015 - [info]  All relay logs were successfully applied.
Thu Jun 11 14:14:12 2015 - [info]  Resetting slave 192.168.3.117(192.168.3.117:3306) and starting replication from the new master 192.168.3.110(192.168.3.110:3306)..
Thu Jun 11 14:14:14 2015 - [info]  Executed CHANGE MASTER.
Thu Jun 11 14:14:14 2015 - [info]  Slave started.
Thu Jun 11 14:14:14 2015 - [info] End of log messages from 192.168.3.117.
Thu Jun 11 14:14:14 2015 - [info] -- Slave recovery on host 192.168.3.117(192.168.3.117:3306) succeeded.
Thu Jun 11 14:14:37 2015 - [info]
Thu Jun 11 14:14:37 2015 - [info] Log messages from 192.168.3.115 ...
Thu Jun 11 14:14:37 2015 - [info]
Thu Jun 11 13:59:53 2015 - [info] Sending binlog..
Thu Jun 11 13:59:53 2015 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_192.168.3.118_3306_20150611135413.binlog to root@192.168.3.115:/tmp/saved_master_binlog_from_192.168.3.118_3306_20150611135413.binlog succeeded.
Thu Jun 11 13:59:53 2015 - [info] Starting recovery on 192.168.3.115(192.168.3.115:3306)..
Thu Jun 11 13:59:53 2015 - [info]  Generating diffs succeeded.
Thu Jun 11 13:59:53 2015 - [info] Waiting until all relay logs are applied.
Thu Jun 11 13:59:53 2015 - [info]  done.
Thu Jun 11 13:59:53 2015 - [info] Getting slave status..
Thu Jun 11 13:59:53 2015 - [info] This slave(192.168.3.115)'s Exec_Master_Log_Pos(mysql-bin.000001:142466642) does not equal to Read_Master_Log_Pos(mysql-bin.000001:142466719). It is likely that relay log was cut during transaction. Need to recover from Exec_Master_Log_Pos.
Thu Jun 11 13:59:53 2015 - [info] Saving local relay logs from exec pos to read pos on 192.168.3.115: from mysqld-relay-bin.000002:142466805 to the end of the relay log..
Thu Jun 11 13:59:53 2015 - [info] Executing command : save_binary_logs --command=save --start_file=mysqld-relay-bin.000002  --start_pos=142466805 --output_file=/tmp/relay_from_exec_to_read_192.168.3.115_3306_20150611135413.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info  --binlog_dir=/var/lib/mysql/
Thu Jun 11 13:59:54 2015 - [info]
  Creating /tmp if not exists..    ok.
Concat binary/relay logs from mysqld-relay-bin.000002 pos 142466805 to mysqld-relay-bin.000002 EOF into /tmp/relay_from_exec_to_read_192.168.3.115_3306_20150611135413.binlog ..
  Dumping binlog format description event, from position 0 to 283.. ok.
  Dumping effective binlog data from /var/lib/mysql/mysqld-relay-bin.000002 position 142466805 to tail(142466882).. ok.
Concat succeeded.
Thu Jun 11 13:59:54 2015 - [info] Connecting to the target slave host 192.168.3.115, running recover script..
Thu Jun 11 13:59:54 2015 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=192.168.3.115 --slave_ip=192.168.3.115  --slave_port=3306 --apply_files=/tmp/relay_from_exec_to_read_192.168.3.115_3306_20150611135413.binlog,/tmp/relay_from_read_to_latest_192.168.3.115_3306_20150611135413.binlog,/tmp/saved_master_binlog_from_192.168.3.118_3306_20150611135413.binlog --workdir=/tmp --target_version=5.6.25-log --timestamp=20150611135413 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
Thu Jun 11 14:14:35 2015 - [info]
Concat all apply files to /tmp/total_binlog_for_192.168.3.115_3306.20150611135413.binlog ..
Copying the first binlog file /tmp/relay_from_exec_to_read_192.168.3.115_3306_20150611135413.binlog to /tmp/total_binlog_for_192.168.3.115_3306.20150611135413.binlog.. ok.
  Dumping binlog head events (rotate events), skipping format description events from /tmp/relay_from_read_to_latest_192.168.3.115_3306_20150611135413.binlog.. dumped up to pos 283. ok.
/tmp/relay_from_read_to_latest_192.168.3.115_3306_20150611135413.binlog has effective binlog events from pos 283.
  Dumping effective binlog data from /tmp/relay_from_read_to_latest_192.168.3.115_3306_20150611135413.binlog position 283 to tail(3491861).. ok.
  Dumping binlog head events (rotate events), skipping format description events from /tmp/saved_master_binlog_from_192.168.3.118_3306_20150611135413.binlog.. dumped up to pos 120. ok.
/tmp/saved_master_binlog_from_192.168.3.118_3306_20150611135413.binlog has effective binlog events from pos 120.
  Dumping effective binlog data from /tmp/saved_master_binlog_from_192.168.3.118_3306_20150611135413.binlog position 120 to tail(11426).. ok.
Concat succeeded.
All apply target binary logs are concatinated at /tmp/total_binlog_for_192.168.3.115_3306.20150611135413.binlog .
MySQL client version is 5.6.25. Using --binary-mode.
Applying differential binary/relay log files /tmp/relay_from_exec_to_read_192.168.3.115_3306_20150611135413.binlog,/tmp/relay_from_read_to_latest_192.168.3.115_3306_20150611135413.binlog,/tmp/saved_master_binlog_from_192.168.3.118_3306_20150611135413.binlog on 192.168.3.115:3306. This may take long time...
Applying log files succeeded.
Thu Jun 11 14:14:35 2015 - [info]  All relay logs were successfully applied.
Thu Jun 11 14:14:35 2015 - [info]  Resetting slave 192.168.3.115(192.168.3.115:3306) and starting replication from the new master 192.168.3.110(192.168.3.110:3306)..
Thu Jun 11 14:14:37 2015 - [info]  Executed CHANGE MASTER.
Thu Jun 11 14:14:37 2015 - [info]  Slave started.
Thu Jun 11 14:14:37 2015 - [info] End of log messages from 192.168.3.115.
Thu Jun 11 14:14:37 2015 - [info] -- Slave recovery on host 192.168.3.115(192.168.3.115:3306) succeeded.
Thu Jun 11 14:14:37 2015 - [info] All new slave servers recovered successfully.
Thu Jun 11 14:14:37 2015 - [info]
Thu Jun 11 14:14:37 2015 - [info] * Phase 5: New master cleanup phease..
Thu Jun 11 14:14:37 2015 - [info]
Thu Jun 11 14:14:37 2015 - [info] Resetting slave info on the new master..
Thu Jun 11 14:14:37 2015 - [info]  192.168.3.110: Resetting slave info succeeded.
Thu Jun 11 14:14:37 2015 - [info] Master failover to 192.168.3.110(192.168.3.110:3306) completed successfully.
Thu Jun 11 14:14:37 2015 - [info] Deleted server1 entry from /etc/masterha/app1.cnf .
Thu Jun 11 14:14:37 2015 - [info]

----- Failover Report -----

app1: MySQL Master failover 192.168.3.118 to 192.168.3.110 succeeded

Master 192.168.3.118 is down!

Check MHA Manager logs at B-dev23:/var/log/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.3.118.
The latest slave 192.168.3.110(192.168.3.110:3306) has all relay logs for recovery.
Selected 192.168.3.110 as a new master.
192.168.3.110: OK: Applying all logs succeeded.
192.168.3.110: OK: Activated master IP address.
192.168.3.117: Generating differential relay logs up to 192.168.3.110 succeeded.
192.168.3.115: Generating differential relay logs up to 192.168.3.110 succeeded.
Generating relay diff files from the latest slave succeeded.
192.168.3.117: OK: Applying all logs succeeded. Slave started, replicating from 192.168.3.110.
192.168.3.115: OK: Applying all logs succeeded. Slave started, replicating from 192.168.3.110.
192.168.3.110: Resetting slave info succeeded.
Master failover to 192.168.3.110(192.168.3.110:3306) completed successfully.
看到最后的Master failover to 192.168.3.110(192.168.3.110:3306) completed successfully.说明备选master现在已经上位了。

从上面的输出可以看出整个MHA的切换过程,共包括以下的步骤:

  • 连接主机三次失败确认主机宕机,摘除宕机的vip
  • 检查从库中使用的最新binlog file/pos
  • 检查从库中使用的最老binlog file/pos
  • 获取宕掉主机的binlog:根据步骤1获取的binlog位置开始保存未同步到从机的binlog(导出binlog描述事件,导出影响的binlog数据);将导出的binlog发送到mha-manage服务器指定目录下
  • healthcheck:监控ssh到其他的机器是否正常
  • 检查最新binlog file/pos的从机是否有需要恢复其他从机所需的所有relay log
  • 如果其他从机跟最新从机pos不一致,需要最新从机的relay log进行一致性恢复
  • 检查是否设置候选主机
  • 开始候选主机切换:先检查候选主机是否有所有的relay log,没有首先从最新从机恢复,然后将mha收到宕机主机缺省的binlog发送给候选主机,识别差异的relay log并将差异的事件应用所有的relay log,执行vip漂移到候选主机,候选主机恢复结束
  • 其他从机relay log恢复:先检查从机是否有所有的relay log,没有首先从最新从机恢复,然后将mha收到宕机主机缺省的binlog发送给从机,识别差异的relay log并将差异的事件应用所有的relay log
  • 将所有从机指向为候选主机
启动MHA Manger监控,查看集群里面现在谁是master。
(7)检查切换后数据是主从数据是否一致
在主机和候选主机上分别安装percona-toolkit,安装流程及使用文档请查看percona-toolkit 简明教程
这里简单说下pt-table-checksum 的工作原理: pt-table-checksum 在主上执行检查语句在线检查mysql复制的一致性, 生成replace 语句,然后通过复制传递到从,再通过update更新master_src 的值。通过检测从上this_src 和master_src 的值从而判断复制是否一致。注意: 使用的时候选择业务地峰的时候运行,因为运行的时候会造成表的部分记录锁定。 使用–max-load 来指定最大的负载情况,如果达到那个负载这个暂停运行。 如果发现有不一致的数据,可以使用pt-table-sync 工具来修复。
在切换后的新主机上使用命令:
[root@192.168.3.110 ~]# pt-table-checksum --nocheck-replication-filters --chunk-size=1000000 --no-check-binlog-format --replicate=sps.checksums --databases=sps h=127.0.0.1,u=root,p=123456,P=3306  
          
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
06-23T18:06:38      0      0        0       1       0   3.885 sps.sps_account
06-23T18:06:42      0      0        3       1       0   3.118 sps.sps_activiti_goods
06-23T18:06:46      0      0        0       1       0   4.613 sps.sps_activiti_prop
06-23T18:06:49      0      0        2       1       0   3.013 sps.sps_activiti_re
06-23T18:06:52      0      0        3       1       0   2.858 sps.sps_activity
06-23T18:06:55      0      0        1       1       0   3.359 sps.sps_app_version
06-23T18:06:58      0      0        4       1       0   2.894 sps.sps_brand
06-23T18:07:02      0      0     3272       1       0   3.367 sps.sps_district
06-23T18:07:05      0      0       16       1       0   2.935 sps.sps_goods
06-23T18:07:08      0      0        6       1       0   3.761 sps.sps_goods_brand_cat
06-23T18:07:12      0      0      332       1       0   3.418 sps.sps_goods_category
06-23T18:07:16      0      0        0       1       0   4.175 sps.sps_goods_category_template
06-23T18:07:20      0      0        5       1       0   4.061 sps.sps_goods_comment
06-23T18:07:24      0      0        9       1       0   3.784 sps.sps_goods_desc
06-23T18:07:27      0      0        4       1       0   3.487 sps.sps_goods_favorite
06-23T18:07:33      0      0       14       1       0   5.847 sps.sps_goods_media
06-23T18:07:37      0      0        2       1       0   3.811 sps.sps_goods_recommend
06-23T18:07:41      0      0        2       1       0   3.623 sps.sps_goods_recommend_group
06-23T18:07:44      0      0        4       1       0   3.402 sps.sps_goods_recommend_group_cate
06-23T18:07:48      0      0       12       1       0   3.662 sps.sps_goods_sku_attribute_value
06-23T18:07:52      0      0       19       1       0   3.951 sps.sps_goods_stock
06-23T18:07:55      0      0        6       1       0   3.557 sps.sps_goods_template
06-23T18:07:58      0      0        2       1       0   3.226 sps.sps_goods_tmpl_sku_attribute_line
06-23T18:08:03      0      0        5       1       0   4.183 sps.sps_goods_tmpl_sku_attribute_line_value
06-23T18:08:08      0      0   208733       1       0   5.422 sps.sps_member
06-23T18:08:11      0      0        2       1       0   3.304 sps.sps_member_account_seq
06-23T18:08:15      0      0    27721       1       0   3.575 sps.sps_member_address
06-23T18:08:18      0      0        2       1       0   3.116 sps.sps_member_baby_birthday
06-23T18:08:21      0      0        1       1       0   2.876 sps.sps_member_grade_change_info
06-23T18:08:24      0      0        0       1       0   3.563 sps.sps_member_grade_dict
06-23T18:08:28      0      0       16       1       0   3.848 sps.sps_member_points_seq
06-23T18:08:31      0      0        0       1       0   2.958 sps.sps_menu
06-23T18:08:35      0      0        0       1       0   3.687 sps.sps_message
06-23T18:08:39      0      0        5       1       0   3.589 sps.sps_message_text
06-23T18:08:41      0      0      746       1       0   2.848 sps.sps_order
06-23T18:08:46      0      0       94       1       0   4.529 sps.sps_order_goods_detail
06-23T18:08:49      0      0       23       1       0   3.030 sps.sps_order_status
06-23T18:08:52      0      0        0       1       0   3.390 sps.sps_payment
06-23T18:08:57      0      0       13       1       0   4.559 sps.sps_promotion
06-23T18:09:00      0      0       15       1       0   2.986 sps.sps_promotion_goods
06-23T18:09:04      0      0        0       1       0   4.158 sps.sps_role
06-23T18:09:08      0      0        0       1       0   3.880 sps.sps_role_menu
06-23T18:09:11      0      0       16       1       0   2.997 sps.sps_seller
06-23T18:09:15      0      0        4       1       0   3.808 sps.sps_shop_trolley
06-23T18:09:18      0      0        6       1       0   3.409 sps.sps_sign_in
06-23T18:09:21      0      0        2       1       0   3.067 sps.sps_sku_attribute
06-23T18:09:25      0      0        5       1       0   3.871 sps.sps_sku_attribute_value
06-23T18:09:30      0      0        1       1       0   4.466 sps.sps_store
06-23T18:09:33      0      0        0       1       0   3.378 sps.sps_store_account
06-23T18:09:36      0      0        6       1       0   3.331 sps.sps_sys_config
06-23T18:09:40      0      0        8       1       0   3.391 sps.sps_sys_message
06-23T18:09:43      0      0       14       1       0   3.182 sps.sps_ticket
06-23T18:09:46      0      0       12       1       0   3.614 sps.sps_ticket_goods
06-23T18:09:50      0      0      140       1       0   3.419 sps.sps_ticket_his
06-23T18:09:53      0      0       12       1       0   3.041 sps.sps_ticket_prop
06-23T18:09:56      0      0        0       1       0   3.091 sps.sps_user_role
06-23T18:10:00      0      0        5       1       0   3.592 sps.sps_warehouse

从结果中,我们可以看到DIFFS0,说明主从数据一致,故障切换流程没有数据丢失。

14.2.手动Failover(MHA Manager必须没有运行)

手动failover,这种场景意味着在业务上没有启用MHA自动切换功能,当主服务器故障时,人工手动调用MHA来进行故障切换操作,具体命令如下:

注意:如果MHA Manager检测到没有dead的server,将报错,并结束failover:

Mon Jun 21 21:23:33 2015 - [info] Dead Servers:
Mon Jun 21 21:23:33 2015 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln181] None of server is dead. Stop failover.
Mon Jun 21 21:23:33 2015 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln178] Got ERROR:  at /usr/local/bin/masterha_master_switch line 53

进行手动切换命令如下:

[root@192.168.3.123 ~]# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=192.168.3.110 --dead_master_port=3306 --new_master_host=192.168.3.118 --new_master_port=3306 --ignore_last_failover

14.3.修复宕机的Master

通常情况下自动切换以后,原master可能已经废弃掉,待原master主机修复后,如果数据完整的情况下,可能想把原来master重新作为新主库的slave,这时我们可以借助当时自动切换时刻的MHA日志来完成对原master的修复。下面是提取相关日志的命令:

[root@192.168.3.123 app1]# grep -i "All other slaves should start" manager.log 
Mon Jun 21 22:28:33 2015 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.3.118', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000022', MASTER_LOG_POS=506716, MASTER_USER='repl', MASTER_PASSWORD='xxx';
[root@192.168.3.123 app1]# 

获取上述信息以后,就可以直接在修复后的master上执行change master to相关操作,重新作为从库了。

Atlas配置

1.简介

Atlas是由 Qihoo 360, Web平台部基础架构团队开发维护的一个基于MySQL协议的数据中间层项目。它是在mysql-proxy 0.8.2版本的基础上,对其进行了优化,增加了一些新的功能特性。360内部使用Atlas运行的mysql业务,每天承载的读写请求数达几十亿条。

主要功能:

  • 读写分离
  • 从库负载均衡
  • IP过滤
  • 自动分表
  • DBA可平滑上下线DB
  • 自动摘除宕机的DB

2.安装(192.168.3.124,192.168.3.125)

2.1 RPM/DEB包安装

https://github.com/Qihoo360/Atlas/releases 页面下载最新版RPM包,我这里下载的是Atlas-2.2.1.el6.x86_64.rpm,然后执行:sudo rpm –i Atlas-2.2.1.el6.x86_64.rpm安装。
如果是Debian/Ubuntu用户,下载DEB包后用dpkg -i安装即可。

注意事项:

(1).Atlas只能安装运行在64位的系统上。

(2).Centos 5.X安装 Atlas-XX.el5.x86_64.rpm,Centos 6.X安装Atlas-XX.el6.x86_64.rpm。

(3).如果执行sudo rpm –i Atlas-XX.el6.x86_64.rpm,提示类似:“file /usr/local/mysql-proxy/bin/encrypt from install of Atlas-2.0.1-1.x86_64 conflicts with file from package Atlas-1.0.3-1.x86_64”错误,则表示该系统之前已经安装过Atlas-1.0.3-1.x86_64,需要执行:sudo rpm –e Atlas-1.0.3-1.x86_64,将之前安装的Atlas删除掉,再执行sudo rpm –i Atlas-XX.el6.x86_64.rpm安装新版本的Atlas。

(4).后端mysql版本应大于5.1,建议使用Mysql 5.6

2.2 源码包安装

对Atlas系统不太熟悉的新用户,推荐使用rpm包的安装Atlas。对于想源码编译安装Atlas的用户,可参考http://blog.qixingzhong.com/2013/09/centos-install-atlas.html 提供的安装方式进行安装。注意:Atlas为了提升内存管理效率,使用了jemalloc库,源码编译的时候需要额外安装jemalloc库。

2.3 配置文件修改

Atlas运行需要依赖一个配置文件(test.cnf)。在运行Atlas之前,需要对该文件进行配置。Atlas的安装目录是/usr/local/mysql-proxy,进入安装目录下的conf目录,可以看到已经有一个名为test.cnf的默认配置文件,我们只需要修改里面的某些配置项,不需要从头写一个配置文件。
配置文件如下:
[mysql-proxy]

#带#号的为非必需的配置项目

#管理接口的用户名
admin-username = atlas

#管理接口的密码
admin-password = hugnew

#Atlas后端连接的MySQL主库的IP和端口,可设置多项,用逗号分隔(这里是Mysql主机提供的vip)
proxy-backend-addresses = 192.168.3.250:3306

#Atlas后端连接的MySQL从库的IP和端口,@后面的数字代表权重,用来作负载均衡,若省略则默认为1,可设置多项,用逗号分隔
proxy-read-only-backend-addresses = 192.168.3.115:3306,192.168.3.117:3306

#用户名与其对应的加密过的MySQL密码,密码使用PREFIX/bin目录下的加密程序encrypt加密,下行的user1和user2为示例,将其替换为你的MySQL的用户名和加密密码!
pwds = root:or9RzYJBtvga8s/oWZlMvQ==

#设置Atlas的运行方式,设为true时为守护进程方式,设为false时为前台方式,一般开发调试时设为false,线上运行时设为true,true后面不能有空格。
daemon = false

#设置Atlas的运行方式,设为true时Atlas会启动两个进程,一个为monitor,一个为worker,monitor在worker意外退出后会自动将其重启,设为false时只有worker,没有monitor,一般开发调试时设为false,线上运行时设为true,true后面不能有空格。
keepalive = false

#工作线程数,对Atlas的性能有很大影响,可根据情况适当设置
event-threads = 4

#日志级别,分为message、warning、critical、error、debug五个级别
log-level = message

#日志存放的路径
log-path = /usr/local/mysql-proxy/log

#SQL日志的开关,可设置为OFF、ON、REALTIME,OFF代表不记录SQL日志,ON代表记录SQL日志,REALTIME代表记录SQL日志且实时写入磁盘,默认为OFF
sql-log = ON

#慢日志输出设置。当设置了该参数时,则日志只输出执行时间超过sql-log-slow(单位:ms)的日志记录。不设置该参数则输出全部日志。
sql-log-slow = 10

#实例名称,用于同一台机器上多个Atlas实例间的区分(instance名与配置文件名test.cnf要一样)
instance = test

#Atlas监听的工作接口IP和端口
proxy-address = 0.0.0.0:1234

#Atlas监听的管理接口IP和端口
admin-address = 0.0.0.0:2345

#分表设置,此例中person为库名,mt为表名,id为分表字段,3为子表数量,可设置多项,以逗号分隔,若不分表则不需要设置该项
#tables = person.mt.id.3

#默认字符集,设置该项后客户端不再需要执行SET NAMES语句
charset = utf8

#允许连接Atlas的客户端的IP,可以是精确IP,也可以是IP段,以逗号分隔,若不设置该项则允许所有IP连接,否则只允许列表中的IP连接
#client-ips = 127.0.0.1, 192.168.1

#Atlas前面挂接的LVS的物理网卡的IP(注意不是虚IP),若有LVS且设置了client-ips则此项必须设置,否则可以不设置
#lvs-ips = 192.168.1.1

2.4 运行Atlas

进入/usr/local/mysql-proxy/bin目录,执行下面的命令启动、重启或停止Atlas。

(1). sudo ./mysql-proxyd test start,启动Atlas。

(2). sudo ./mysql-proxyd test restart,重启Atlas。

(3). sudo ./mysql-proxyd test stop,停止Atlas。

注意:

(1). 运行文件是:mysql-proxyd(不是mysql-proxy)。

(2). test是conf目录下配置文件的名字,也是配置文件里instance项的名字,三者需要统一。

(3). 可以使用ps -ef | grep mysql-proxy查看Atlas是否已经启动或停止。

3.Atlas测试

1234端口是Atlas客户端的接口,在Mysql主服务器上执行命令:mysql -h127.0.0.1 -P1234 -uroot -p123456,这里的用户名密码是Mysql访问的用户名及密码,测试连接Mysql正常,client端连接Atlas执行几条SQL语句正常说明说明Atlas正常。
2345端口是Atlas的管理接口,此接口提供了查看mysql状态,设置上线、下线后端mysql服务器。
访问方法使用mysql命令
mysql -h 127.0.0.1 -P2345 - uatlas -phugnew
这里的用户名密码是上面配置文件中的admin-username = admin、admin-password = password
下面是管理接口的使用帮助
SELECT * FROM help;  # 查看帮助信息
SELECT * FROM backends; # 查看后端主机状态
SET OFFLINE $backend_id  # 下线一台主机,$backend_id == SELECT * FROM backends查出来的ID
SET ONLINE $backend_id   # 上线一台主机
ADD MASTER $backend      # 添加一台master节点,如: ADD MASTER 10.57.1.131;
ADD SLAVE $backend       # 添加一台slave节点,如 : ADD SLAVE 10.57.1.132;
REMOVE BACKEND $backend_id # 删除一个节点
更多资料查看官方地址:https://github.com/Qihoo360/Atlas

LVS配置

1.简介

LVS集群采用IP负载均衡技术和基于内容请求分发技术。调度器具有很好的吞吐率,将请求均衡地转移到不同的服务器上执行,且调度器自动屏蔽掉服 务器的故障,从而将一组服务器构成一个高性能的、高可用的虚拟服务器。整个服务器集群的结构对客户是透明的,而且无需修改客户端和服务器端的程序。
LVS集群中实现的三种IP负载均衡技术是VS/NAT、VS/TUN和VS/DR技术。它们主要解决系统的可伸缩性和透明性问题,如何通过负载调度器将请求高 效地分发到不同的服务器执行,使得由多台独立计算机组成的集群系统成为一台虚拟服务器;客户端应用程序与集群系统交互时,就像与一台高性能的服务器交互一 样。
在内核中的连接调度算法上,IPVS已实现了以下八种调度算法:
  • 轮叫调度(Round-Robin Scheduling)
  • 加权轮叫调度(Weighted Round-Robin Scheduling)
  • 最小连接调度(Least-Connection Scheduling)
  • 加权最小连接调度(Weighted Least-Connection Scheduling)
  • 基于局部性的最少链接(Locality-Based Least Connections Scheduling)
  • 带复制的基于局部性最少链接(Locality-Based Least Connections with Replication Scheduling)
  • 目标地址散列调度(Destination Hashing Scheduling)
  • 源地址散列调度(Source Hashing Scheduling)
更多详细信息可以查看官方文档:http://www.linuxvirtualserver.org/zh/

2.准备工作(每台LVS机器)

首先禁掉防火墙和SeLinux,分为临时禁掉和永久禁掉,命令如下
####临时禁掉防火墙,重启服务器失效
service iptables stop 
####临时禁掉Selinux,重启服务器失效
setenforce 0 
####永久禁掉防火墙,需重启服务器
chkconfig iptables off
####永久禁掉Selinux,需重启服务器
修改/etc/selinux/config文件中设置SELINUX=disabled ,然后重启服务器。

3.安装LVS软件

LVS软件包括二部分:

  • IPVS模块,LVS已经是Linux标准内核的一部分,直接被编译在内核中!
  • IPVS管理工具IPVSadm ,如:IPVSadm-1.26

(1)检查是否安装了IPVS模块

LVS已经是Linux标准内核的一部分,直接被编译在内核中,可以直接使用modprobe命令即可查看。

(2)安装IPVS管理软件及依赖包
yum -y install ipvsadm
yum -y install kernel-devel
yum -y install libnl-devel
yum -y install popt-devel

4.安装keepalived

tar zxf keepalived-1.2.14.tar.gz
cd keepalived-1.2.2
./configure --prefix=/usr/local/keepalived --sysconf=/etc --with-kernel-dir=/usr/src/kernels/2.6.32-71.el6.x86_64/
直到出现四个:yes
Use IPVS Framework: Yes 
IPVS sync daemon support : Yes 
IPVS use libnl: Yes 
Use VRRP Framework : Yes

然后进行编译

make && make install
ln -s /usr/local/sbin/keepalived /sbin/ //做个软链接
执行命令keepalived –help命令检查安装keepalived是否成功。

5.编辑配置文件

5.1 在主机上配置keepalived(192.168.3.119)

keepalived的原理可以查看文档小谈keepalived vip漂移原理与VRRP协议
vi /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
   notification_email {
     acassen@firewall.loc   #设置报警邮件地址,可以设置多个,每行1个
     failover@firewall.loc  #需开启邮件报警及本机的Sendmail服务。
     sysadmin@firewall.loc
   }
   notification_email_from Alexandre.Cassen@firewall.loc
   smtp_server 127.0.0.1  #设置SMTP Server地址;
   smtp_connect_timeout 30
   router_id LVS_DEVEL
}
########VRRP Instance########
vrrp_instance VI_1 {
    state MASTER    #指定Keepalived的角色,MASTER为主机服务器,BACKUP为备用服务器
    interface eth0  #BACKUP为备用服务器
    virtual_router_id 51
    priority 100    #定义优先级,数字越大,优先级越高,主机器必须大于备机。
    advert_int 1
    authentication {
        auth_type PASS  #设置验证类型,主要有PASS和AH两种
        auth_pass 1111  #设置验证密码
    }
    virtual_ipaddress {
        192.168.3.251  #设置主DR的虚拟IP地址(virtual IP),可多设,但必须每行1个
    }
}
########Virtual Server########
virtual_server 192.168.3.251 1234 {  #注意IP地址与端口号之间用空格隔开
    delay_loop 6        #设置健康检查时间,单位是秒
    lb_algo lc          #设置负载调度算法,默认为rr,即轮询算法,最优秀是wlc算法,这里使用的是最小链接算法
    lb_kind DR          #设置LVS实现LB机制,有NAT、TUNN和DR三个模式可选
    #nat_mask 255.255.255.0
    #persistence_timeout 50  #会话保持时间,单位为秒
    protocol TCP        #指定转发协议类型,有TCP和UDP两种
    real_server 192.168.3.124 1234 {
        weight 1          #配置节点权值,数字越大权值越高
        TCP_CHECK {
            connect_timeout 3     #表示3秒无响应,则超时
            nb_get_retry 3        #表示重试次数
            delay_before_retry 3  #表示重试间隔
        }
    }
    real_server 192.168.3.125 1234 {  #配置服务器节点,即Real Server2的public IP
        weight 3            #配置节点权值,数字越大权值越高
        TCP_CHECK {
            connect_timeout 3       #表示3秒无响应,则超时
            nb_get_retry 3          #表示重试次数
            delay_before_retry 3    #表示重试间隔
        }
    }
}

5.2 在备机上配置keepalived(192.168.3.120)

备机配置文件与主机有两处不同,如下
把“state MASTER”修改为“state BACKUP” #即备用服务器;
把“priority 100”修改为“priority 90” #优先级为90。

6.配置real server节点(Atlas所有服务器)

(1)安装httpd
yum –y install httpd
(2)启动httpd服务
修改httpd.conf配置文件
vi /etc/httpd/conf/httpd.conf
#265 ServerName www.example.com:80 //去掉前面的#号
/etc/init.d/httpd restart
(3)编写ip绑定脚本realserver.sh
#add for chkconfig
#chkconfig: 2345 70 30  #234都是文本界面,5就是图形界面X,70启动顺序号,30系统关闭,脚本
#止顺序号
#description: RealServer's script  #关于脚本的简短描述
#processname: realserver.sh       #第一个进程名,后边设置自动时会用到
#!/bin/bash
VIP=192.168.3.251
source /etc/rc.d/init.d/functions
case "$1" in
start)
       ifconfig lo:0 $VIP netmask 255.255.255.255 broadcast $VIP
       ifconfig lo:0 $VIP netmask 255.255.255.255 broadcast $VIP
       /sbin/route add -host $VIP dev lo:0
       echo "1" >/proc/sys/net/ipv4/conf/lo/arp_ignore
       echo "2" >/proc/sys/net/ipv4/conf/lo/arp_announce
       echo "1" >/proc/sys/net/ipv4/conf/all/arp_ignore
       echo "2" >/proc/sys/net/ipv4/conf/all/arp_announce
       echo "RealServer Start OK"
       ;;
stop)
       ifconfig lo:0 down
       route del $VIP >/dev/null 2>&1
       echo "0" >/proc/sys/net/ipv4/conf/lo/arp_ignore
       echo "0" >/proc/sys/net/ipv4/conf/lo/arp_announce
       echo "0" >/proc/sys/net/ipv4/conf/all/arp_ignore
       echo "0" >/proc/sys/net/ipv4/conf/all/arp_announce
       echo "RealServer Stoped"
       ;;
       *)
       echo "Usage: $0 {start|stop}"
       exit 1
esac
exit 0
(4)为realserver.sh添加权限并执行
chmod 755 realserver.sh
./realserver.sh start
(5)检查ip绑定是否成功
执行ip addr命令
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet 192.168.3.251/32 brd 192.168.3.251 scope global lo:0
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:1a:4a:e0:4c:41 brd ff:ff:ff:ff:ff:ff
inet 192.168.3.124/24 brd 192.168.3.255 scope global eth0
inet6 fe80::21a:4aff:fee0:4c41/64 scope link
valid_lft forever preferred_lft forever
可以看到ip已经绑定成功

7. lvs启动keepalived并检查是否生效

(1) lvs主机启动keepalived并检查是否生效
启动keepalived
/etc/init.d/keepalived start
检查是否生效,执行ipvsadm -Ln命令
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.3.251:1234 lc
-> 192.168.3.124:1234           Route   5      0          0
-> 192.168.3.125:1234           Route   5      0          0
执行ip addr 命令
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:1a:4a:e0:4c:32 brd ff:ff:ff:ff:ff:ff
inet 192.168.3.119/23 brd 192.168.3.255 scope global eth0
inet6 fe80::21a:4aff:fee0:4c32/64 scope link
valid_lft forever preferred_lft forever
可以看到lvs主机配置已经生效。
(2) lvs备机启动keepalived并检查是否生效
启动keepalived
/etc/init.d/keepalived start
检查是否生效,执行ipvsadm -Ln命令
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.3.251:1234 lc
-> 192.168.3.124:1234           Route   5      0          0
-> 192.168.3.125:1234           Route   5      0          0
可以看到lvs备机配置也已经生效。

8.LVS测试

对lvs的vip地址192.168.3.251:1234发请求,使用Linux抓包工具tcpdump在atlas服务器上抓包,抓包命令为tcpdump -i eth0 -nn port 1234 and src host 192.168.3.251,测试lvs负载请求到atlas是否正常。

结语

ok,Mysql半复制+MHA+Keepalived+Atlas+LVS的集成就说到这里,这里面涉及到知识点可能比较多也比较散,如果有疑问的可以先查看官方文档,也可以找我交流哦。

参考网址

转载请注明:HugNew » MySQL HA方案之MySQL半复制+MHA+Keepalived+Atlas+LVS

喜欢 (13)or分享 (0)
发表我的评论
取消评论

表情
(12)个小伙伴在吐槽
  1. 大神 膜拜了.写的非常认真,谢谢分享, :mrgreen:
    匿名2015-12-01 19:01 回复
  2. 我部署了Keepalived+atlas,但是部署了网站上去之后,发现有部分页面显示403,偶尔刷新有可以了,请问有遇到过这种问题吗?
    匿名2015-12-07 10:50 回复
    • 这个没遇到过,有Atlas的相关问题你可以去Atals的issue(https://github.com/Qihoo360/Atlas/issues)去提问
      匿名2015-12-12 14:30 回复
  3. 请问直接用M-S和LVS不就可以实现读写分离么?这里的Atlas是什么作用
    匿名2015-12-15 09:52 回复
    • 相当于服务端侧做了读写分离,那对客户端侧是透明的
      匿名2017-02-05 16:40 回复
  4. 大神,膜拜呀,我一直参考你的文章进行测试
    匿名2016-02-26 18:32 回复
    • 谢谢,互相学习
      匿名2016-03-04 16:05 回复
  5. 请问master_ip_online_change的脚本应该怎么修改呢?
    匿名2016-05-30 15:16 回复
  6. 大神 膜拜 
    匿名2016-07-29 18:21 回复
  7. 大神 膜拜了
    匿名2016-11-14 20:16 回复
  8. 请问,realserver的lo绑定VIP有什么用,而且keepalived运行时,并没有发现eth0绑定VIP成功,难道是在realserver上的eth0上绑定的,在keepalived所在服务器上没有绑定?
    匿名2017-07-03 10:24 回复
  9. 拜谢大神
    匿名2018-05-25 18:04 回复