Cause: the db directory was mounted on a separate disk while the dbsync directory sat on the local filesystem, which is equivalent to mounting the two directories on two different disks.
Log: the slave's PIKA.WARNING log below shows the rename failure.
Explanation: the full-sync data the slave receives from the master is hard-linked into the db directory, and hard links only work within a single filesystem, so the two directories must be on the same one.
Solution: mount both db and dbsync on the same disk; the rename error goes away. A pika.conf sketch follows the log below.
Log file created at: 2020/12/01 11:14:21
Running on machine: pika-test-20201128-001
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
W1201 11:14:21.244900 67744 pika_partition.cc:304] Partition: db0, Failed to rename new db path when change db, error: Invalid cross-device link
W1201 11:14:21.244971 67744 pika_partition.cc:255] Partition: db0, Failed to change db
W1201 11:15:41.613822 67637 pika_repl_client_thread.cc:49] Master conn timeout : pika1:11221 try reconnect
W1201 11:20:19.930094 67744 pika_partition.cc:298] Partition: db0, Failed to rename db path when change db, error: No such file or directory
W1201 11:20:19.930109 67744 pika_partition.cc:255] Partition: db0, Failed to change db
W1201 11:21:38.866683 67637 pika_repl_client_thread.cc:49] Master conn timeout : pika1:11221 try reconnect
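A minimal pika.conf sketch that keeps db, dbsync (and dump) under one mount point; /data1/pika is only an example path, substitute your actual disk mount:
# all three paths live on the same filesystem (/data1 here is an example mount)
db-path : /data1/pika/db/
db-sync-path : /data1/pika/dbsync/
dump-path : /data1/pika/dump/
You can verify that both directories really sit on the same filesystem with df /data1/pika/db /data1/pika/dbsync; both lines should show the same device, otherwise the hard link (and therefore the rename) will fail again with a cross-device error.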
Cause: after the master-slave relationship dropped, a full sync was started, but the master's dump + transfer time was so long that master and slave kept restarting full syncs.
Log: the master and slave logs below show the timeouts and the repeatedly restarted full syncs.
Explanation: the root cause is the master's dump + transfer taking too long.
Solution: slaveof pika1 9221 force (with force, the slave keeps waiting until the master finishes the dump and transfers it over, and only then establishes the master-slave relationship).
Note: if the master-slave relationship was set up with a hostname, e.g. slaveof pika1 9221, the log may also report the container id instead of the hostname (pika1).
Workaround: first establish the relationship using the container id:
slaveof containerid 9221 force
Once the master-slave relationship is established successfully, run:
slaveof no one
slaveof pika1 9221 force
config rewrite
We cannot explain why this works yet, but it has been verified in practice.
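While the forced full sync is running, you can watch its progress from the slave side. A minimal sketch, assuming the slave listens on the default client port 9221 and the master is pika1 (substitute the container id if needed):
redis-cli -p 9221 slaveof pika1 9221 force
redis-cli -p 9221 info replication
The info replication output reports the current role and whether the link to the master is up; the exact field names vary slightly across Pika versions, but polling it until the link stays up is an easy way to confirm the full sync finally completed instead of looping.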
Master log:
I1130 10:44:48.794337 29011 pika_partition.cc:379] db0 bgsave_info: path=/data1/pika/dump/20201130/db0, filenum=2562, offset=13365700
I1130 10:48:02.246551 29011 pika_partition.cc:385] db0 create new backup finished.
I1130 10:48:02.246703 29011 pika_server.cc:1085] Partition: db0 Start Send files in /data1/pika/dump/20201130/db0 to 127.0.0.1
I1130 10:58:55.963577 29011 pika_server.cc:1186] Partition: db0 RSync Send Files Success
I1130 11:00:15.398463 26013 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 201, ip_port: 127.0.0.1:37504
I1130 11:00:15.398608 26013 pika_server.cc:740] Delete Slave Success, ip_port: 127.0.0.1:9221
I1130 11:00:15.398638 26013 pika_rm.cc:90] Remove Slave Node, Partition: (db0:0), ip_port: 127.0.0.1:9221
I1130 11:00:25.094928 26016 pika_repl_server_conn.cc:42] Receive MetaSync, Slave ip: 127.0.0.1, Slave port:9221
I1130 11:00:25.095026 26016 pika_server.cc:843] Add New Slave, 127.0.0.1:9221
I1130 11:00:25.233932 26014 pika_repl_server_conn.cc:108] Receive Trysync, Slave ip: 10.20.134.1, Slave port:9221, Partition: db0, filenum: 0, pro_offset: 0
I1130 11:00:25.233992 26014 pika_repl_server_conn.cc:263] Partition: db0 binlog has been purged, may need full sync
I1130 11:00:40.320998 26015 pika_repl_server_conn.cc:324] Handle partition DBSync Request
I1130 11:00:40.321120 26015 pika_rm.cc:79] Add Slave Node, partition: (db0:0), ip_port: 127.0.0.1:9221
I1130 11:00:40.322064 26015 pika_repl_server_conn.cc:347] Partition: db0_0 Handle DBSync Request Success, Session: 183
I1130 11:00:52.044495 29011 pika_partition.cc:376] db0 after prepare bgsave
I1130 11:00:52.044572 29011 pika_partition.cc:379] db0 bgsave_info: path=/data1/pika/dump/20201130/db0, filenum=2562, offset=13365700
I1130 11:04:03.152256 29011 pika_partition.cc:385] db0 create new backup finished.
I1130 11:04:03.152402 29011 pika_server.cc:1085] Partition: db0 Start Send files in /data1/pika/dump/20201130/db0 to 127.0.0.1
Slave log:
I1130 10:44:35.124609 53402 pika_repl_client_conn.cc:182] Partition: db0 Need Wait To Sync
I1130 10:58:55.921267 53506 pika_partition.cc:236] Partition: db0 Information from dbsync info, master_ip: 127.0.0.1, master_port: 9221, filenum: 2562, offset: 13365700, term: 0, index: 0
I1130 10:58:55.921336 53506 pika_partition.cc:293] Partition: db0, Prepare change db from: /data2/pika/db/db0_bak
I1130 11:00:15.392289 53399 pika_repl_client_thread.cc:38] ReplClient Timeout conn, fd=95, ip_port=127.0.0.1:11221
I1130 11:00:25.087127 53506 pika_repl_client.cc:146] Try Send Meta Sync Request to Master (127.0.0.1:9221)
I1130 11:00:25.088173 53403 pika_server.cc:618] Mark try connect finish
I1130 11:00:25.088215 53403 pika_repl_client_conn.cc:146] Finish to handle meta sync response
I1130 11:00:25.226863 53404 pika_repl_client_conn.cc:261] Partition: db0 Need To Try DBSync
I1130 11:00:40.315070 53405 pika_repl_client_conn.cc:182] Partition: db0 Need Wait To Sync
I1130 11:15:53.390866 53506 pika_partition.cc:236] Partition: db0 Information from dbsync info, master_ip: 127.0.0.1, master_port: 9221, filenum: 2562, offset: 13365700, term: 0, index: 0
I1130 11:15:53.392174 53506 pika_partition.cc:293] Partition: db0, Prepare change db from: /data2/pika/db/db0_bak
I1130 11:17:12.993613 53399 pika_repl_client_thread.cc:38] ReplClient Timeout conn, fd=70, ip_port=127.0.0.1:11221
I1130 11:17:22.057538 53506 pika_repl_client.cc:146] Try Send Meta Sync Request to Master (127.0.0.1:9221)
Cause: the master-slave link dropped because of network or other issues, and when the connection was re-established the master reported that the slave relationship already exists.
Log: the slave log below shows Slave AlreadyExist.
Solution: upgrade Pika to 3.3.6, which fixes the Slave AlreadyExist issue; see the release notes on GitHub. A quick way to check the running version follows the log below.
pika_repl_client.cc:145] Try Send Meta Sync Request to Master (pika1:9221)
pika_repl_client_conn.cc:100] Meta Sync Failed: Slave AlreadyExist
Sync error, set repl_state to PIKA_REPL_ERROR
pika_repl_client_thread.cc:21] ReplClient Close conn, fd=364, ip_port=pika1:11221
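A minimal sketch for confirming the running version before and after the upgrade, assuming the instance listens on the default client port 9221 (the field name below is how recent Pika releases report it and may differ slightly in older versions):
redis-cli -p 9221 info server
Look for the pika_version line in the Server section of the output; once it reads 3.3.6 or later, the Slave AlreadyExist fix is in place.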
1. For the Slave AlreadyExist issue, upgrade to version 3.3.6.
2. For the rename issue, mount db and dbsync on the same filesystem.
3. For the ReplClient Timeout conn issue, use force when establishing the master-slave relationship.
4. It is recommended to mount dump, db and dbsync together on the same disk, which also shortens the dump time.