In-Depth Redis Series: Redis Sentinel Mode and High-Availability Clusters

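Foreword: In Redis master-slave replication, once the master node fails and can no longer serve requests, an operator has to promote a slave to master by hand and also tell every client to update the master address. For most production workloads this manual failure handling is unacceptable. Since version 2.8, Redis has provided the Redis Sentinel mechanism to solve this problem.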

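Main Text

1. Overview of Redis High Availability

For a web server, high availability refers to the time during which the server can be accessed normally, usually measured as the share of time in which it provides normal service (99.9%, 99.99%, 99.999%, and so on). For Redis, high availability has a broader meaning: besides keeping the service running normally (for example through master-slave separation and fast disaster recovery), it also covers data capacity scaling, data safety, and so on.

In Redis, the main techniques for achieving high availability are persistence, replication, Sentinel, and Cluster. The following briefly describes what each one does and which problems it solves:

Persistence: Persistence is the simplest high-availability method. Its main purpose is data backup, that is, storing data on disk so that it is not lost when the process exits.
Replication: Replication is the foundation of highly available Redis; both Sentinel and Cluster build on it. Replication provides multi-machine data backup, load balancing for read operations, and simple failure recovery. Its drawbacks are that failure recovery cannot be automated, write operations cannot be load balanced, and storage capacity is limited to a single machine.
Sentinel: Building on replication, Sentinel automates failure recovery. Its drawbacks are that write operations cannot be load balanced and storage capacity is limited to a single machine.
Cluster: With Cluster, Redis removes the limits on write load balancing and single-machine storage capacity, giving a fairly complete high-availability solution.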

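2. Basic Concepts of Redis Sentinel

Redis Sentinel is the high-availability solution for Redis. Sentinel is a tool for managing multiple Redis instances; it provides monitoring, notification, and automatic failover for Redis. The basic concepts of Redis Sentinel are introduced briefly below.

Basic terminology:

The figure shows the Redis master-slave replication mode alongside the Sentinel high-availability architecture: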

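3. Problems with Redis Master-Slave Replication

Redis master-slave replication synchronizes the master's data to its slaves. A slave then serves two purposes:

If the master goes down, the slave, as a backup of the master, can take over at any time.
It extends the master's read capacity and takes read pressure off the master.

Master-slave replication also has the following problems:

When the master goes down and a slave is promoted, the master address on the application side must be changed and every remaining slave must be told to replicate from the new master. The whole process requires manual intervention.
The master's write capacity is limited to a single machine.
The master's storage capacity is limited to a single machine.
The weaknesses of native replication were more pronounced in early versions. For example, after replication is interrupted the slave issues psync; if the partial resynchronization fails, a full resynchronization follows, and while the master performs the full backup it may stall for milliseconds or even seconds.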

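4. Redis Sentinel in Depth

4.1. Redis Sentinel Architecture

4.2. Main Functions of Redis Sentinel

Sentinel's main functions include master liveness detection, monitoring of master-slave health, automatic failover, and master-slave switching. The minimal Redis Sentinel setup is one master and one slave.

A Redis Sentinel system can manage multiple Redis servers and performs the following four tasks:

Monitoring

Sentinel continuously checks whether the master and slave servers are running normally.

Notification

When a monitored Redis server has a problem, Sentinel can notify an administrator or other applications through an API or a script.

Automatic failover

When the master is not working properly, Sentinel starts an automatic failover: it promotes one of the failed master's slaves to be the new master and points the other slaves at the new master.

Configuration provider

In Redis Sentinel mode, a client application connects to the set of Sentinel nodes at initialization and obtains the master's information from them.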

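4.3. Subjective Down and Objective Down

By default, every Sentinel node sends a PING command once per second to the Redis nodes and to the other Sentinel nodes, and uses the replies to decide whether each node is online.

Subjective down

Subjective down (SDOWN) applies to all master and slave nodes. If a Sentinel receives no valid reply from the target node within down-after-milliseconds milliseconds, it marks that node as subjectively down.

Objective down

Objective down (ODOWN) applies only to master nodes. When a Sentinel considers the master faulty, it uses the sentinel is-master-down-by-addr command to ask the other Sentinel nodes for their judgment of that master. If at least <quorum> Sentinel nodes judge the master unreachable, this Sentinel marks the master as objectively down.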
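The quorum rule described above can be sketched in a few lines of Python. This is only an illustration of the counting logic, not Sentinel's actual implementation; the SentinelView class and the is_objectively_down function are hypothetical names introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SentinelView:
    """One Sentinel's opinion about the master (hypothetical structure)."""
    name: str
    thinks_master_down: bool  # True if this Sentinel flagged the master SDOWN


def is_objectively_down(views: list[SentinelView], quorum: int) -> bool:
    """Return True when at least `quorum` Sentinels report the master as down."""
    down_votes = sum(1 for view in views if view.thinks_master_down)
    return down_votes >= quorum


# Three Sentinels, two of which cannot reach the master.
views = [
    SentinelView("sentinel-16380", True),
    SentinelView("sentinel-26380", True),
    SentinelView("sentinel-36380", False),
]
print(is_objectively_down(views, quorum=2))
```

With quorum set to 2, as in the configuration used later in this article, two agreeing Sentinels are enough to move the master from SDOWN to ODOWN.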

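4.4. Sentinel Communication Commands

When a Sentinel node connects to a Redis instance, it creates two connections: cmd and pub/sub. Sentinel sends commands to Redis over the cmd connection, and uses the pub/sub connection to reach the other Sentinel instances attached to the same Redis instance.

The commands Sentinel exchanges with Redis master and slave nodes mainly include:

The commands Sentinels exchange with each other mainly include:

4.5. How Redis Sentinel Works

Every Sentinel node periodically performs the following tasks:

Once per second, each Sentinel sends a PING command to every master, slave, and other Sentinel instance it knows about.

If the time since an instance last gave a valid reply to PING exceeds the value of down-after-milliseconds, the Sentinel marks that instance as subjectively down.

If a master is marked subjectively down, all Sentinel nodes monitoring that master confirm once per second whether it really has entered the subjectively down state.

If a master is marked subjectively down and a sufficient number of Sentinels (at least the number given in the configuration file) agree with this judgment within the specified time range, the master is marked objectively down.

Under normal conditions, each Sentinel sends an INFO command to every master and slave it knows about once every 10 seconds. When a master is marked objectively down, the Sentinel sends INFO to all slaves of that master once per second instead of once every 10 seconds.

Sentinel negotiates the master's state with the other Sentinels. If the master is in the ODOWN state, a vote automatically selects a new master, and the remaining slaves are pointed at the new master to replicate from it.

When too few Sentinels agree that the master is down, the master's objectively down state is removed. When the master again returns a valid reply to a Sentinel's PING, its subjectively down state is removed.

Note: a valid PING reply is one of +PONG, -LOADING, or -MASTERDOWN. If the server returns any other reply, or does not reply to PING within the specified time, Sentinel treats the reply as non-valid.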
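To make the reply rule and the down-after-milliseconds check concrete, here is a minimal Python sketch. It models only the two decisions described above; the function names and the millisecond timestamps are assumptions made for illustration and do not mirror Sentinel's source code.

```python
from typing import Optional

# The three replies the article lists as valid answers to PING.
VALID_PING_REPLIES = {"+PONG", "-LOADING", "-MASTERDOWN"}


def is_valid_ping_reply(reply: Optional[str]) -> bool:
    """Return True if the reply counts as a valid PING answer."""
    if reply is None:  # no reply arrived in time
        return False
    # Error replies carry extra text, so compare only the first token.
    first_token = reply.strip().split(" ", 1)[0]
    return first_token in VALID_PING_REPLIES


def is_subjectively_down(now_ms: int, last_valid_reply_ms: int,
                         down_after_ms: int) -> bool:
    """Return True once no valid reply was seen for longer than down_after_ms."""
    return now_ms - last_valid_reply_ms > down_after_ms


print(is_valid_ping_reply("+PONG"))
print(is_valid_ping_reply("-ERR unknown command"))
print(is_subjectively_down(now_ms=36000, last_valid_reply_ms=5000,
                           down_after_ms=30000))
```

A Sentinel would clear the subjectively down flag again as soon as a valid reply refreshes the last-valid-reply timestamp.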

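5. Setting Up Redis Sentinel

5.1. Redis Sentinel Deployment Notes

A robust Redis Sentinel deployment should use at least three Sentinel instances, placed on different machines and, ideally, in different physical locations.
Sentinel cannot guarantee strong consistency.
Common client libraries all support Sentinel.
Sentinel needs continuous testing and observation before it can be relied on for high availability.

5.2. Redis Sentinel Configuration File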

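# Port on which the Sentinel instance runs, default 26379

port 26379

# Working directory of the Sentinel

dir ./

# The Redis master node monitored by this Sentinel:

## ip: IP address of the master host

## redis-port: port of the Redis master

## master-name: a name you choose for the master (only letters A-z, digits 0-9, and the three characters ".-_" are allowed)

## quorum: when this many Sentinels consider the master unreachable, the master is considered objectively down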

# sentinel monitor <master-name> <ip> <redis-port> <quorum>

sentinel monitor mymaster 127.0.0.1 6379 2

# When requirepass <foobared> is enabled on a Redis instance, every client connecting to that instance must supply the password.

# sentinel auth-pass <master-name> <password>

sentinel auth-pass mymaster 123456

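# Maximum time the master may go without a valid reply to the Sentinel; beyond it the Sentinel subjectively considers the master down. Default: 30 seconds.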

# sentinel down-after-milliseconds <master-name> <milliseconds>

sentinel down-after-milliseconds mymaster 30000

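# Maximum number of slaves that may resynchronize with the new master at the same time during a failover. The smaller the number, the longer the failover takes to complete; the larger the number, the more slaves are unavailable at once because of replication. Setting it to 1 ensures that only one slave at a time is unable to serve command requests.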

# sentinel parallel-syncs <master-name> <numslaves>

sentinel parallel-syncs mymaster 1

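# Failover timeout (failover-timeout), default three minutes. It is used in the following ways:

## 1. The interval between two failovers of the same master by the same Sentinel.

## 2. The time measured from when a slave starts replicating from a wrong master until it is corrected to replicate from the right master.

## 3. The time needed to cancel a failover that is in progress.

## 4. The maximum time allowed, during a failover, for reconfiguring all slaves to point to the new master. Even after this timeout the slaves are still reconfigured to point to the master, but no longer following the parallel-syncs rule.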

# sentinel failover-timeout <master-name> <milliseconds>

sentinel failover-timeout mymaster 180000

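# When any warning-level event occurs in Sentinel (for example a Redis instance becoming subjectively or objectively down), this script is invoked. A script may run for at most 60 seconds; if it exceeds that, it is terminated with SIGKILL and executed again.

# The script's result is handled according to these rules:

## 1. If the script exits with 1, it is executed again later; the number of retries currently defaults to 10.

## 2. If the script exits with 2 or a higher value, it is not retried.

## 3. If the script is terminated by a signal during execution, the behavior is the same as for exit code 1.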

# sentinel notification-script <master-name> <script-path>

sentinel notification-script mymaster /var/redi

# This script should be generic and tolerate being invoked multiple times; it should not be tied to one specific case.

# sentinel client-reconfig-script <master-name> <script-path>

sentinel client-reconfig-script mymaster /var/redi
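As a quick way to sanity-check a configuration like the one above, the following Python sketch reads the per-master sentinel directives into a dictionary. The parse_sentinel_conf function is a hypothetical helper written for this article; it handles only the numeric options shown here and is not part of Redis.

```python
NUMERIC_OPTIONS = ("down-after-milliseconds", "failover-timeout", "parallel-syncs")


def parse_sentinel_conf(text):
    """Collect per-master options from sentinel.conf-style text."""
    masters = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if parts[0] != "sentinel" or len(parts) < 4:
            continue  # not a "sentinel <option> <master-name> <args...>" line
        option, name, args = parts[1], parts[2], parts[3:]
        entry = masters.setdefault(name, {})
        if option == "monitor" and len(args) == 3:
            entry["ip"] = args[0]
            entry["port"] = int(args[1])
            entry["quorum"] = int(args[2])
        elif option in NUMERIC_OPTIONS:
            entry[option] = int(args[0])
    return masters


sample = """
# sentinel monitor <master-name> <ip> <redis-port> <quorum>
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
"""
conf = parse_sentinel_conf(sample)
print(conf["mymaster"])
```

A check such as comparing the parsed quorum against the number of deployed Sentinels could be built on top of this dictionary.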

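5.3. Redis Sentinel Node Planning

5.4. Redis Sentinel Configuration and Setup

5.4.1. Redis-Server Configuration Management

Copy redis.conf three times into the /usr/local/redis-sentinel directory. The three configuration files hold the startup configuration for the three Redis nodes: master, slave1, and slave2.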

$ sudo cp /usr/local /usr/local/redis-sentinel

Edit the three configuration files as follows:

Master node: redi

daemonize yes

pidfile /var/run

logfile /var/log/redi

port 16379

bind 0.0.0.0

timeout 300

databases 16

dbfilename dum

dir ./redis-workdir

masterauth 123456

requirepass 123456

Slave node 1: redi

daemonize yes

pidfile /var/run

logfile /var/log/redi

port 26379

bind 0.0.0.0

timeout 300

databases 16

dbfilename dum

dir ./redis-workdir

masterauth 123456

requirepass 123456

slaveof 127.0.0.1 16379

Slave node 2: redi

daemonize yes

pidfile /var/run

logfile /var/log/redi

port 36379

bind 0.0.0.0

timeout 300

databases 16

dbfilename dum

dir ./redis-workdir

masterauth 123456

requirepass 123456

slaveof 127.0.0.1 16379

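If you want automatic failover, it is recommended to set masterauth in every redis.conf, because automatic failover only rewrites the master-slave relationship (the slaveof directive) and does not write masterauth automatically. If Redis has no password set, this can be ignored.

5.4.2. Redis-Server Startup Verification

Start the three Redis nodes 16379, 26379, and 36379 in that order. The startup command and startup logs are as follows:

Redis startup command:

$ sudo redis-server /usr/local/redis-sentinel

View the running Redis processes: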

$ ps -ef | grep redis-server

0 7127 1 0 2:16下午 ?? 0:01.84 redis-server 0.0.0.0:16379

0 7133 1 0 2:16下午 ?? 0:01.73 redis-server 0.0.0.0:26379

0 7137 1 0 2:16下午 ?? 0:01.70 redis-server 0.0.0.0:36379

View the Redis startup logs:

Node redis-16379

$ cat /var/log/redi

7126:C 22 Aug 14:16:38.907 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo

7126:C 22 Aug 14:16:38.908 # Redis version=4.0.11, bits=64, commit=00000000, modified=0, pid=7126, just started

7126:C 22 Aug 14:16:38.908 # Configuration loaded

7127:M 22 Aug 14:16:38.910 * Increased maximum number of open files to 10032 (it was originally set to 256).

7127:M 22 Aug 14:16:38.912 * Running mode=standalone, port=16379.

7127:M 22 Aug 14:16:38.913 # Server initialized

7127:M 22 Aug 14:16:38.913 * Ready to accept connections

7127:M 22 Aug 14:16:48.416 * Slave 127.0.0.1:26379 asks for synchronization

7127:M 22 Aug 14:16:48.416 * Full resync requested by slave 127.0.0.1:26379

7127:M 22 Aug 14:16:48.416 * Starting BGSAVE for SYNC with target: disk

7127:M 22 Aug 14:16:48.416 * Background saving started by pid 7134

7134:C 22 Aug 14:16:48.433 * DB saved on disk

7127:M 22 Aug 14:16:48.487 * Background saving terminated with success

7127:M 22 Aug 14:16:48.494 * Synchronization with slave 127.0.0.1:26379 succeeded

7127:M 22 Aug 14:16:51.848 * Slave 127.0.0.1:36379 asks for synchronization

7127:M 22 Aug 14:16:51.849 * Full resync requested by slave 127.0.0.1:36379

7127:M 22 Aug 14:16:51.849 * Starting BGSAVE for SYNC with target: disk

7127:M 22 Aug 14:16:51.850 * Background saving started by pid 7138

7138:C 22 Aug 14:16:51.862 * DB saved on disk

7127:M 22 Aug 14:16:51.919 * Background saving terminated with success

7127:M 22 Aug 14:16:51.923 * Synchronization with slave 127.0.0.1:36379 succeeded

The following two log lines show that redis-16379 acts as the Redis master, while redis-26379 and redis-36379 act as slaves and synchronize data from it.

7127:M 22 Aug 14:16:48.416 * Slave 127.0.0.1:26379 asks for synchronization

7127:M 22 Aug 14:16:51.848 * Slave 127.0.0.1:36379 asks for synchronization

Node redis-26379

$ cat /var/log/redi

7132:C 22 Aug 14:16:48.407 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo

7132:C 22 Aug 14:16:48.408 # Redis version=4.0.11, bits=64, commit=00000000, modified=0, pid=7132, just started

7132:C 22 Aug 14:16:48.408 # Configuration loaded

7133:S 22 Aug 14:16:48.410 * Increased maximum number of open files to 10032 (it was originally set to 256).

7133:S 22 Aug 14:16:48.412 * Running mode=standalone, port=26379.

7133:S 22 Aug 14:16:48.413 # Server initialized

7133:S 22 Aug 14:16:48.413 * Ready to accept connections

7133:S 22 Aug 14:16:48.413 * Connecting to MASTER 127.0.0.1:16379

7133:S 22 Aug 14:16:48.413 * MASTER <-> SLAVE sync started

7133:S 22 Aug 14:16:48.414 * Non blocking connect for SYNC fired the event.

7133:S 22 Aug 14:16:48.414 * Master replied to PING, replication can continue...

7133:S 22 Aug 14:16:48.415 * Partial resynchronization not possible (no cached master)

7133:S 22 Aug 14:16:48.417 * Full resync from master: 211d3b4eceaa3af4fe5c77d22adf06e1218e0e7b:0

7133:S 22 Aug 14:16:48.494 * MASTER <-> SLAVE sync: receiving 176 bytes from master

7133:S 22 Aug 14:16:48.495 * MASTER <-> SLAVE sync: Flushing old data

7133:S 22 Aug 14:16:48.496 * MASTER <-> SLAVE sync: Loading DB in memory

7133:S 22 Aug 14:16:48.498 * MASTER <-> SLAVE sync: Finished with success

Node redis-36379

$ cat /var/log/redi

7136:C 22 Aug 14:16:51.839 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo

7136:C 22 Aug 14:16:51.840 # Redis version=4.0.11, bits=64, commit=00000000, modified=0, pid=7136, just started

7136:C 22 Aug 14:16:51.841 # Configuration loaded

7137:S 22 Aug 14:16:51.843 * Increased maximum number of open files to 10032 (it was originally set to 256).

7137:S 22 Aug 14:16:51.845 * Running mode=standalone, port=36379.

7137:S 22 Aug 14:16:51.845 # Server initialized

7137:S 22 Aug 14:16:51.846 * Ready to accept connections

7137:S 22 Aug 14:16:51.846 * Connecting to MASTER 127.0.0.1:16379

7137:S 22 Aug 14:16:51.847 * MASTER <-> SLAVE sync started

7137:S 22 Aug 14:16:51.847 * Non blocking connect for SYNC fired the event.

7137:S 22 Aug 14:16:51.847 * Master replied to PING, replication can continue...

7137:S 22 Aug 14:16:51.848 * Partial resynchronization not possible (no cached master)

7137:S 22 Aug 14:16:51.850 * Full resync from master: 211d3b4eceaa3af4fe5c77d22adf06e1218e0e7b:14

7137:S 22 Aug 14:16:51.923 * MASTER <-> SLAVE sync: receiving 176 bytes from master

7137:S 22 Aug 14:16:51.923 * MASTER <-> SLAVE sync: Flushing old data

7137:S 22 Aug 14:16:51.924 * MASTER <-> SLAVE sync: Loading DB in memory

7137:S 22 Aug 14:16:51.927 * MASTER <-> SLAVE sync: Finished with success
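Reading such logs by eye works for three nodes, but the same check can be scripted. The Python sketch below splits a Redis 4.0 style log line into its fields and lists the slaves that asked the master for synchronization. The parse_log_line and slaves_requesting_sync helpers are hypothetical names, and the regular expression assumes the line layout shown in the logs above.

```python
import re

# Layout assumed from the logs above: "<pid>:<role> <day> <month> <time> <level> <message>"
LOG_LINE = re.compile(
    r"^(?P<pid>\d+):(?P<role>[XCSM]) "
    r"(?P<day>\d{1,2}) (?P<month>[A-Z][a-z]{2}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[.\-*#]) (?P<message>.*)$"
)
ROLES = {"X": "sentinel", "C": "child", "S": "slave", "M": "master"}


def parse_log_line(line):
    """Split one log line into a dict of fields, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["pid"] = int(record["pid"])
    record["role"] = ROLES[record["role"]]
    return record


def slaves_requesting_sync(lines):
    """Return the addresses of slaves that asked a master for synchronization."""
    found = []
    for line in lines:
        record = parse_log_line(line)
        if record is None or record["role"] != "master":
            continue
        match = re.match(r"Slave (\S+) asks for synchronization", record["message"])
        if match:
            found.append(match.group(1))
    return found


log = [
    "7127:M 22 Aug 14:16:48.416 * Slave 127.0.0.1:26379 asks for synchronization",
    "7127:M 22 Aug 14:16:51.848 * Slave 127.0.0.1:36379 asks for synchronization",
    "7133:S 22 Aug 14:16:48.413 * Connecting to MASTER 127.0.0.1:16379",
]
print(slaves_requesting_sync(log))
```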

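5.4.3. Sentinel Configuration Management

Copy the redi file three times into the /usr/local/redis-sentinel directory. The three configuration files hold the Sentinel configuration corresponding to the three Redis nodes master, slave1, and slave2.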

$ sudo cp /usr/local /usr/local/redis-sentinel

Node 1:

protected-mode no

bind 0.0.0.0

port 16380

daemonize yes

sentinel monitor master 127.0.0.1 16379 2

sentinel down-after-milliseconds master 5000

sentinel failover-timeout master 180000

sentinel parallel-syncs master 1

sentinel auth-pass master 123456

logfile /var/log/redi

Node 2:

protected-mode no

bind 0.0.0.0

port 26380

daemonize yes

sentinel monitor master 127.0.0.1 16379 2

sentinel down-after-milliseconds master 5000

sentinel failover-timeout master 180000

sentinel parallel-syncs master 1

sentinel auth-pass master 123456

logfile /var/log/redi

Node 3:

protected-mode no

bind 0.0.0.0

port 36380

daemonize yes

sentinel monitor master 127.0.0.1 16379 2

sentinel down-after-milliseconds master 5000

sentinel failover-timeout master 180000

sentinel parallel-syncs master 1

sentinel auth-pass master 123456

logfile /var/log/redi

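5.4.4. Sentinel Startup Verification

Start the three Sentinel nodes 16380, 26380, and 36380 in that order. The startup command and startup logs are as follows:

$ sudo redis-sentinel /usr/local/redis-sentinel

View the running Sentinel processes: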

$ ps -ef | grep redis-sentinel

0 7954 1 0 3:30下午 ?? 0:00.05 redis-sentinel 0.0.0.0:16380 [sentinel]

0 7957 1 0 3:30下午 ?? 0:00.05 redis-sentinel 0.0.0.0:26380 [sentinel]

0 7960 1 0 3:30下午 ?? 0:00.04 redis-sentinel 0.0.0.0:36380 [sentinel]

View the Sentinel startup logs:

Node sentinel-16380

$ cat /var/log/redi

7953:X 22 Aug 15:30:27.245 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo

7953:X 22 Aug 15:30:27.245 # Redis version=4.0.11, bits=64, commit=00000000, modified=0, pid=7953, just started

7953:X 22 Aug 15:30:27.245 # Configuration loaded

7954:X 22 Aug 15:30:27.247 * Increased maximum number of open files to 10032 (it was originally set to 256).

7954:X 22 Aug 15:30:27.249 * Running mode=sentinel, port=16380.

7954:X 22 Aug 15:30:27.250 # Sentinel ID is 69d05b86a82102a8919231fd3c2d1f21ce86e000

7954:X 22 Aug 15:30:27.250 # +monitor master master 127.0.0.1 16379 quorum 2

7954:X 22 Aug 15:30:32.286 # +sdown sentinel fd166dc66425dc1d9e2670e1f17cb94fe05f5fc7 127.0.0.1 36380 @ master 127.0.0.1 16379

7954:X 22 Aug 15:30:34.588 # -sdown sentinel fd166dc66425dc1d9e2670e1f17cb94fe05f5fc7 127.0.0.1 36380 @ master 127.0.0.1 16379

The Sentinel ID of node sentinel-16380 is 69d05b86a82102a8919231fd3c2d1f21ce86e000; the node joins the Sentinel cluster under this Sentinel ID.

Node sentinel-26380

$ cat /var/log/redi

7956:X 22 Aug 15:30:30.900 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo

7956:X 22 Aug 15:30:30.901 # Redis version=4.0.11, bits=64, commit=00000000, modified=0, pid=7956, just started

7956:X 22 Aug 15:30:30.901 # Configuration loaded

7957:X 22 Aug 15:30:30.904 * Increased maximum number of open files to 10032 (it was originally set to 256).

7957:X 22 Aug 15:30:30.905 * Running mode=sentinel, port=26380.

7957:X 22 Aug 15:30:30.906 # Sentinel ID is 21e30244cda6a3d3f55200bcd904d0877574e506

7957:X 22 Aug 15:30:30.906 # +monitor master master 127.0.0.1 16379 quorum 2

7957:X 22 Aug 15:30:30.907 * +slave slave 127.0.0.1:26379 127.0.0.1 26379 @ master 127.0.0.1 16379

7957:X 22 Aug 15:30:30.911 * +slave slave 127.0.0.1:36379 127.0.0.1 36379 @ master 127.0.0.1 16379

7957:X 22 Aug 15:30:36.311 * +sentinel sentinel fd166dc66425dc1d9e2670e1f17cb94fe05f5fc7 127.0.0.1 36380 @ master 127.0.0.1 16379

The Sentinel ID of node sentinel-26380 is 21e30244cda6a3d3f55200bcd904d0877574e506; the node joins the Sentinel cluster under this Sentinel ID. At this point the Sentinel cluster contains two nodes, sentinel-16380 and sentinel-26380.

Node sentinel-36380

$ cat /var/log/redi

7959:X 22 Aug 15:30:34.273 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo

7959:X 22 Aug 15:30:34.274 # Redis version=4.0.11, bits=64, commit=00000000, modified=0, pid=7959, just started

7959:X 22 Aug 15:30:34.274 # Configuration loaded

7960:X 22 Aug 15:30:34.276 * Increased maximum number of open files to 10032 (it was originally set to 256).

7960:X 22 Aug 15:30:34.277 * Running mode=sentinel, port=36380.

7960:X 22 Aug 15:30:34.278 # Sentinel ID is fd166dc66425dc1d9e2670e1f17cb94fe05f5fc7

7960:X 22 Aug 15:30:34.278 # +monitor master master 127.0.0.1 16379 quorum 2

7960:X 22 Aug 15:30:34.279 * +slave slave 127.0.0.1:26379 127.0.0.1 26379 @ master 127.0.0.1 16379

7960:X 22 Aug 15:30:34.283 * +slave slave 127.0.0.1:36379 127.0.0.1 36379 @ master 127.0.0.1 16379

7960:X 22 Aug 15:30:34.993 * +sentinel sentinel 21e30244cda6a3d3f55200bcd904d0877574e506 127.0.0.1 26380 @ master 127.0.0.1 16379

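The Sentinel ID of node sentinel-36380 is fd166dc66425dc1d9e2670e1f17cb94fe05f5fc7; the node joins the Sentinel cluster under this Sentinel ID. At this point the Sentinel cluster contains three nodes: sentinel-16380, sentinel-26380, and sentinel-36380.

5.4.5. Sentinel Configuration Refresh

Node 1:

The file gains the following newly generated configuration items: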

# Generated by CONFIG REWRITE

dir "/usr/local/redis-sentinel"

sentinel config-epoch master 0

sentinel leader-epoch master 0

sentinel known-slave master 127.0.0.1 36379

sentinel known-slave master 127.0.0.1 26379

sentinel known-sentinel master 127.0.0.1 26380 21e30244cda6a3d3f55200bcd904d0877574e506

sentinel known-sentinel master 127.0.0.1 36380 fd166dc66425dc1d9e2670e1f17cb94fe05f5fc7

sentinel current-epoch 0

Note that the refresh wrote all slaves associated with the Redis master, redis-26379 and redis-36379, as well as the IP address, port, and Sentinel ID of the other two Sentinel nodes, sentinel-26380 and sentinel-36380.

Node 2:

# Generated by CONFIG REWRITE

dir "/usr/local/redis-sentinel"

sentinel config-epoch master 0

sentinel leader-epoch master 0

sentinel known-slave master 127.0.0.1 26379

sentinel known-slave master 127.0.0.1 36379

sentinel known-sentinel master 127.0.0.1 36380 fd166dc66425dc1d9e2670e1f17cb94fe05f5fc7

sentinel known-sentinel master 127.0.0.1 16380 69d05b86a82102a8919231fd3c2d1f21ce86e000

sentinel current-epoch 0

Note that the refresh wrote all slaves associated with the Redis master, redis-26379 and redis-36379, as well as the IP address, port, and Sentinel ID of the other two Sentinel nodes, sentinel-36380 and sentinel-16380.

Node 3:

# Generated by CONFIG REWRITE

dir "/usr/local/redis-sentinel"

sentinel config-epoch master 0

sentinel leader-epoch master 0

sentinel known-slave master 127.0.0.1 36379

sentinel known-slave master 127.0.0.1 26379

sentinel known-sentinel master 127.0.0.1 16380 69d05b86a82102a8919231fd3c2d1f21ce86e000

sentinel known-sentinel master 127.0.0.1 26380 21e30244cda6a3d3f55200bcd904d0877574e506

sentinel current-epoch 0

Note that the refresh wrote all slaves associated with the Redis master, redis-26379 and redis-36379, as well as the IP address, port, and Sentinel ID of the other two Sentinel nodes, sentinel-16380 and sentinel-26380.

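5.5. Sentinel Client Commands

Check the status of other Sentinel nodes; a PONG reply means the node is healthy.

> PING sentinel

Show all monitored masters and their states.

> SENTINEL masters

Show the information and state of the specified master.

> SENTINEL master <master_name>

Show all slaves of the specified master and their states.

> SENTINEL slaves <master_name>

Return the IP address and port of the specified master. If a failover is in progress or has completed, the IP address and port of the slave promoted to master are shown.

> SENTINEL get-master-addr-by-name <master_name>

Reset the state of every master whose name matches the given glob-style pattern, clearing its previous state information and its slave information.

> SENTINEL reset <pattern>

Force the current Sentinel node to perform a failover without requiring agreement from the other Sentinel nodes. After the failover, the latest configuration is still sent to the other Sentinel nodes.

> SENTINEL failover <master_name>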
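To illustrate what a client does when it acts on the configuration-provider role, the sketch below builds the wire-format request for SENTINEL get-master-addr-by-name and decodes a two-element reply. It assumes the standard Redis serialization protocol (RESP) and a well-formed reply; encode_command and parse_master_addr are hypothetical helper names, and no network connection is opened here.

```python
def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    chunks = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        chunks.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(chunks)


def parse_master_addr(reply):
    """Decode a RESP array holding (ip, port); return None for a null array."""
    if reply.startswith(b"*-1"):
        return None  # assumed reply when the master name is unknown
    lines = reply.split(b"\r\n")
    if lines[0] != b"*2":
        raise ValueError("expected a two-element array reply")
    # Layout: *2, $<len>, <ip>, $<len>, <port>
    return lines[2].decode("utf-8"), int(lines[4])


request = encode_command("SENTINEL", "get-master-addr-by-name", "master")
print(request)

# A reply shaped like the one a Sentinel would send after the failover shown later.
sample_reply = b"*2\r\n$9\r\n127.0.0.1\r\n$5\r\n26379\r\n"
print(parse_master_addr(sample_reply))
```

In practice a Sentinel-aware client library performs this exchange internally; the request bytes would be written to a TCP connection to one of the Sentinel ports, such as 16380 in this setup.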

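6. Redis Sentinel Failover and Recovery

6.1. Tracking with the Redis CLI Client

The logs above show that redis-16379 is the master node and that its process ID is 7127. To simulate a master failure, kill this process forcibly.

$ kill -9 7127

Use the redis-cli client to connect to the sentinel-16380 node and check the state of the Redis nodes.

$ redis-cli -p 16380

View the master information of the Redis master-slave group. redis-26379 has been promoted to the new master.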

127.0.0.1:16380> SENTINEL master master

1) "name"

2) "master"

3) "ip"

4) "127.0.0.1"

5) "port"

6) "26379"

7) "runid"

8) "b8ca3b468a95d1be5efe1f50c50636cafe48c59f"

9) "flags"

10) "master"

11) "link-pending-commands"

12) "0"

13) "link-refcount"

14) "1"

15) "last-ping-sent"

16) "0"

17) "last-ok-ping-reply"

18) "588"

19) "last-ping-reply"

20) "588"

21) "down-after-milliseconds"

22) "5000"

23) "info-refresh"

24) "9913"

25) "role-reported"

26) "master"

27) "role-reported-time"

28) "663171"

29) "config-epoch"

30) "1"

31) "num-slaves"

32) "2"

33) "num-other-sentinels"

34) "2"

35) "quorum"

36) "2"

37) "failover-timeout"

38) "180000"

39) "parallel-syncs"

40) "1"
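The reply above is a flat list in which field names and values alternate. A small Python sketch, assuming the reply has already been read into a list of strings, turns it into a dictionary that is easier to query; pairs_to_dict is a hypothetical helper and the list below copies only a few of the fields shown.

```python
def pairs_to_dict(flat):
    """Convert [field1, value1, field2, value2, ...] into a dict."""
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of elements")
    return dict(zip(flat[0::2], flat[1::2]))


# A subset of the SENTINEL master reply shown above.
reply = [
    "name", "master",
    "ip", "127.0.0.1",
    "port", "26379",
    "flags", "master",
    "num-slaves", "2",
    "num-other-sentinels", "2",
    "quorum", "2",
]
info = pairs_to_dict(reply)
print(info["ip"], int(info["port"]), info["flags"])
```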

6.2. Tracking the Redis Sentinel Logs

The log of any Sentinel node looks like the following:

7954:X 22 Aug 18:40:22.504 # +tilt #tilt mode entered

7954:X 22 Aug 18:40:32.197 # +tilt #tilt mode entered

7954:X 22 Aug 18:41:02.241 # -tilt #tilt mode exited

7954:X 22 Aug 18:48:24.550 # +sdown master master 127.0.0.1 16379

7954:X 22 Aug 18:48:24.647 # +new-epoch 1

7954:X 22 Aug 18:48:24.651 # +vote-for-leader fd166dc66425dc1d9e2670e1f17cb94fe05f5fc7 1

7954:X 22 Aug 18:48:25.678 # +odown master master 127.0.0.1 16379 #quorum 3/2

7954:X 22 Aug 18:48:25.678 # Next failover delay: I will not start a failover before Wed Aug 22 18:54:24 2018

7954:X 22 Aug 18:48:25.709 # +config-update-from sentinel fd166dc66425dc1d9e2670e1f17cb94fe05f5fc7 127.0.0.1 36380 @ master 127.0.0.1 16379

7954:X 22 Aug 18:48:25.710 # +switch-master master 127.0.0.1 16379 127.0.0.1 26379

7954:X 22 Aug 18:48:25.710 * +slave slave 127.0.0.1:36379 127.0.0.1 36379 @ master 127.0.0.1 26379

7954:X 22 Aug 18:48:25.711 * +slave slave 127.0.0.1:16379 127.0.0.1 16379 @ master 127.0.0.1 26379

7954:X 22 Aug 18:48:30.738 # +sdown slave 127.0.0.1:16379 127.0.0.1 16379 @ master 127.0.0.1 26379

7954:X 22 Aug 19:38:23.479 # -sdown slave 127.0.0.1:16379 127.0.0.1 16379 @ master 127.0.0.1 26379

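Analyzing the log, redis-16379 first enters the sdown (subjectively down) state.

+sdown master master 127.0.0.1 16379

The Sentinel detects the failure of redis-16379 and enters a new epoch, moving from 0 to 1.

+new-epoch 1

The three Sentinel nodes start negotiating the master's state to decide whether it should be marked objectively down. The +vote-for-leader event records this Sentinel's vote, in epoch 1, for the Sentinel that will lead the failover.

+vote-for-leader fd166dc66425dc1d9e2670e1f17cb94fe05f5fc7 1

At least quorum Sentinel nodes consider the master faulty, so redis-16379 enters the objectively down state.

+odown master master 127.0.0.1 16379 #quorum 3/2

Sentinel performs the automatic failover and, by agreement, selects redis-26379 as the new master.

+switch-master master 127.0.0.1 16379 127.0.0.1 26379

redis-36379 and the objectively down redis-16379 become slaves of redis-26379.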

7954:X 22 Aug 18:48:25.710 * +slave slave 127.0.0.1:36379 127.0.0.1 36379 @ master 127.0.0.1 26379

7954:X 22 Aug 18:48:25.711 * +slave slave 127.0.0.1:16379 127.0.0.1 16379 @ master 127.0.0.1 26379
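The same analysis can be automated. The following Python sketch pulls the ordered event names out of Sentinel log lines and extracts the old and new master addresses from the +switch-master event. The event_sequence and find_switch functions are hypothetical helpers, and the patterns assume the log layout shown above.

```python
import re

# An event name follows the level marker: "# +sdown ...", "* +slave ...".
EVENT = re.compile(r"[#*] ([+-][a-z-]+)")
# +switch-master <name> <old-ip> <old-port> <new-ip> <new-port>
SWITCH = re.compile(r"\+switch-master (\S+) (\S+) (\d+) (\S+) (\d+)")


def event_sequence(lines):
    """Return the event names found in the log lines, in order."""
    events = []
    for line in lines:
        match = EVENT.search(line)
        if match:
            events.append(match.group(1))
    return events


def find_switch(lines):
    """Return the first master switch as a dict, or None if there is none."""
    for line in lines:
        match = SWITCH.search(line)
        if match:
            name, old_ip, old_port, new_ip, new_port = match.groups()
            return {
                "master": name,
                "old": (old_ip, int(old_port)),
                "new": (new_ip, int(new_port)),
            }
    return None


log = [
    "7954:X 22 Aug 18:48:24.550 # +sdown master master 127.0.0.1 16379",
    "7954:X 22 Aug 18:48:24.647 # +new-epoch 1",
    "7954:X 22 Aug 18:48:24.651 # +vote-for-leader fd166dc66425dc1d9e2670e1f17cb94fe05f5fc7 1",
    "7954:X 22 Aug 18:48:25.678 # +odown master master 127.0.0.1 16379 #quorum 3/2",
    "7954:X 22 Aug 18:48:25.678 # Next failover delay: I will not start a failover before Wed Aug 22 18:54:24 2018",
    "7954:X 22 Aug 18:48:25.710 # +switch-master master 127.0.0.1 16379 127.0.0.1 26379",
]
print(event_sequence(log))
print(find_switch(log))
```

A monitoring script could alert on +odown and update its record of the master address whenever +switch-master appears.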

6.3. Redis Configuration Files

Inspect the configuration files of the three Redis nodes. When a master-slave switch occurs, redis.conf is rewritten automatically.

Node redis-16379

daemonize yes

pidfile "/var/run"

logfile "/var/log/redi"

port 16379

bind 0.0.0.0

timeout 300

databases 16

dbfilename "dum"

dir "/usr/local/redis-sentinel/redis-workdir"

masterauth "123456"

requirepass "123456"

Node redis-26379

daemonize yes

pidfile "/var/run"

logfile "/var/log/redi"

port 26379

bind 0.0.0.0

timeout 300

databases 16

dbfilename "dum"

dir "/usr/local/redis-sentinel/redis-workdir"

masterauth "123456"

requirepass "123456"

Node redis-36379

daemonize yes

pidfile "/var/run"

logfile "/var/log/redi"

port 36379

bind 0.0.0.0

timeout 300

databases 16

dbfilename "dum"

dir "/usr/local/redis-sentinel/redis-workdir"

masterauth "123456"

requirepass "123456"

slaveof 127.0.0.1 26379

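Analysis: the slaveof directive has been removed from redis-26379, which has been promoted to master. redis-16379 is down. The slaveof directive of redis-36379 has been updated to 127.0.0.1 26379, making it a slave of redis-26379.

Restart node redis-16379. Once it has started normally, look at its redis.conf again; the configuration is now: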

daemonize yes

pidfile "/var/run"

logfile "/var/log/redi"

port 16379

bind 0.0.0.0

timeout 300

databases 16

dbfilename "dum"

dir "/usr/local/redis-sentinel/redis-workdir"

masterauth "123456"

requirepass "123456"

# Generated by CONFIG REWRITE

slaveof 127.0.0.1 26379

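The configuration file of node redis-16379 has gained a slaveof line pointing to redis-26379, meaning it has become a slave of the new master.

Summary

This article first described the several modes Redis offers for high availability and pointed out the shortcomings of Redis master-slave replication. It then introduced the concepts behind Redis Sentinel and covered Sentinel's functions, working principles, high-availability setup, and verification of automatic failover.

Redis Sentinel, however, solves only the availability problem. Single-master writes and the inability of a single node to scale still require the Redis Cluster mode.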

