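MySQL 5.5 introduced semi-sync replication: after a transaction commits on the master, the dump thread sends the corresponding binlog to the slaves, and the master returns to the user thread only after receiving an ACK from at least one slave;
Caveats
1 A slave ACK only means the io_thread has written the events to the relay log; it does not mean the sql_thread has applied them;
2 The master sends the transaction to the slave only after committing it; if the master crashes at that point, master and slave data can diverge;
3 The dump thread is responsible both for sending binlog and for receiving slave ACKs, and the two cannot run in parallel, which is inefficient;
4 The dump thread acquires LOCK_log while reading the binlog; while it holds the mutex, no other thread can read or write the binlog;
Later versions addressed these issues step by step: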
1 after_sync
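5.7 introduced the rpl_semi_sync_master_wait_point parameter, which lets the DBA choose the stage at which the master waits for the slave ACK: either as before (AFTER_COMMIT), or after the master has flushed the transaction to the binlog but before it commits in the storage engine (AFTER_SYNC). The MySQL manual describes the two values as follows: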
AFTER_SYNC (the default): The master writes each transaction to its binary log and the slave, and syncs the binary log to disk. The master waits for slave acknowledgment of transaction receipt after the sync. Upon receiving acknowledgment, the master commits the transaction to the storage engine and returns a result to the client, which then can proceed.
AFTER_COMMIT: The master writes each transaction to its binary log and the slave, syncs the binary log, and commits the transaction to the storage engine. The master waits for slave acknowledgment of transaction receipt after the commit. Upon receiving acknowledgment, the master returns a result to the client, which then can proceed.
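Suppose the master has two client connections, clienta and clientb.
clienta commits a transaction. Before 5.7, MySQL writes it to redo (prepare), then binlog, then redo (commit), and only then performs the semi-sync wait; it returns to clienta after receiving the slave ACK;
clientb can therefore see clienta's data as soon as redo (commit) completes, one step before clienta itself gets its reply, so the two connections see inconsistent data;
after_sync avoids this: when clienta commits a transaction, MySQL writes it to redo (prepare) and binlog, then performs the semi-sync wait, runs redo (commit) only after receiving the slave ACK, and then returns to clienta;
Another problem with after_commit: if the master crashes between redo (commit) and the semi-sync wait, the master and slave data are inconsistent;
after_sync at least guarantees that every transaction whose redo (commit) succeeded has already been received by a slave, which is half a step better;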
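The ordering argument above can be checked with a small model. This is only a sketch: the step names (redo_prepare, engine_commit, and so on) are illustrative labels rather than MySQL identifiers, and crash recovery on restart is not modeled.

```python
# Simplified step order on the master for each wait point, following the text above.
# The step names are illustrative labels, not MySQL identifiers.
STEPS = {
    "AFTER_COMMIT": ["redo_prepare", "binlog_write", "engine_commit",
                     "wait_slave_ack", "reply_client"],
    "AFTER_SYNC": ["redo_prepare", "binlog_write", "wait_slave_ack",
                   "engine_commit", "reply_client"],
}


def visible_before_ack(wait_point):
    """True if other sessions can read the data before any slave ACK arrives."""
    steps = STEPS[wait_point]
    return steps.index("engine_commit") < steps.index("wait_slave_ack")


def state_at_crash(wait_point, crash_before):
    """Return (committed_in_engine, acked_by_slave) at the moment the master
    crashes just before the step named `crash_before`."""
    steps = STEPS[wait_point]
    done = steps[:steps.index(crash_before)]
    return ("engine_commit" in done, "wait_slave_ack" in done)


if __name__ == "__main__":
    for wp in STEPS:
        print(wp, "visible to other sessions before ACK:", visible_before_ack(wp))
```

With AFTER_COMMIT, a crash just before the ACK wait leaves the transaction committed on the master but unacknowledged. With AFTER_SYNC, the model never reaches a state where the engine commit is done but the ACK is missing.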
2 ack collector thread
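5.7 introduced this separate thread; the dump thread now only reads and sends binlog events, while receiving slave ACKs is handled by the ACK collector thread;
The dump thread can keep sending events without waiting for ACKs, similar to TCP's sliding window protocol;
The master maintains a list of semisync slaves; the list persists even if the ack thread goes down;
The dump thread registers the slave with the master when it calls transmit_start; if the slave supports semisync, it is added to the semisync slave list;
The ack thread uses select() to listen on the sockets of the slaves in the semisync slave list;
The Ack_receiver class manages the ACK thread
The thread has 3 states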
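The sliding-window analogy can be sketched as follows. The PipelinedSender class and its methods are hypothetical names for illustration. The sketch also assumes that an ACK for a given position covers all earlier positions, which the notes above do not state.

```python
from collections import deque


class PipelinedSender:
    """Hypothetical model: events are sent without blocking on ACKs, and
    ACKs are recorded separately (by the ACK collector thread in MySQL)."""

    def __init__(self):
        self.in_flight = deque()  # positions sent but not yet acknowledged
        self.acked_pos = 0        # highest position acknowledged so far

    def send(self, pos):
        # The dump thread side: send and move on, no wait for the ACK.
        self.in_flight.append(pos)

    def on_ack(self, pos):
        # The ACK side. Assumption: an ACK for `pos` also covers every
        # earlier position that is still in flight.
        while self.in_flight and self.in_flight[0] <= pos:
            self.in_flight.popleft()
        self.acked_pos = max(self.acked_pos, pos)


if __name__ == "__main__":
    sender = PipelinedSender()
    for pos in (100, 200, 300):
        sender.send(pos)      # three events go out back to back
    sender.on_ack(200)        # one ACK arrives later
    print(list(sender.in_flight), sender.acked_pos)
```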
enum status { ST_UP, ST_STOPPING, ST_DOWN };
ST_UP means ack receive thread is created and is working.
ST_DOWN means ack receive thread is destroyed.
ST_STOPPING means a user is disabling semisync master, and ack receive thread is being destroyed.
- m_slaves
A slave vector which includes slaves' useful information here.
DEFINITION:
Slave_vector m_slaves
- m_mutex
m_slaves and m_status are shared between user sessions(dump threads) and ack thread. So they should be protected by a mutex.
- add_slave()
Add a new semisync slave to slave list.
DEFINITION:
bool add_slave(THD *thd);
LOGIC:
initialize slave information.
acquire m_mutex
add the slave's information into m_slaves.
send a signal to ack receive thread. It may be waiting for a signal.
release m_mutex
- remove_slave()
remove a semisync slave from slave list.
DEFINITION:
void remove_slave(THD *thd)
LOGIC:
acquire m_mutex
remove thd of the slave from m_slaves.
release m_mutex
- run()
The handle function of receive thread.
DEFINITION:
void run();
LOGIC:
initialize pthread related things
while (1)
{
  acquire m_mutex
  if m_status is ST_STOPPING then release m_mutex and break the loop.
  wait for a semisync slave to be added if the slave list is empty.
  call select to listen on the slaves' sockets; the timeout is 1s.
  on error or timeout, release m_mutex and restart the loop.
  receive acks and report them to the semisync master.
  release m_mutex
}
de-initialize pthread related things
Note: Giving select a timeout lets other threads add/remove slaves
or stop the ack receive thread when there is no ack.
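The Ack_receiver logic above can be approximated in Python. This is a rough sketch, not MySQL's implementation. The class AckReceiver, its wait_for_acks helper, and the ACK payload format are hypothetical. A threading.Condition stands in for m_mutex plus its signal. Unlike the pseudocode, the sketch calls select() on a snapshot of the list, outside the lock.

```python
import select
import socket
import threading


class AckReceiver:
    ST_UP, ST_STOPPING, ST_DOWN = "ST_UP", "ST_STOPPING", "ST_DOWN"

    def __init__(self, timeout=1.0):
        self._cond = threading.Condition()  # stands in for m_mutex and its signal
        self._slaves = []                   # m_slaves: one socket per semisync slave
        self._status = self.ST_DOWN         # m_status
        self._timeout = timeout             # select timeout (1s in the pseudocode)
        self._thread = None
        self.acks = []                      # raw ACK payloads, kept for illustration

    @property
    def status(self):
        with self._cond:
            return self._status

    def start(self):
        with self._cond:
            self._status = self.ST_UP
        self._thread = threading.Thread(target=self.run, daemon=True)
        self._thread.start()

    def stop(self):
        with self._cond:
            self._status = self.ST_STOPPING
            self._cond.notify_all()
        self._thread.join()

    def add_slave(self, sock):
        with self._cond:
            self._slaves.append(sock)
            self._cond.notify_all()         # the receive thread may be waiting

    def remove_slave(self, sock):
        with self._cond:
            self._slaves.remove(sock)

    def wait_for_acks(self, count, timeout):
        """Hypothetical helper: block until `count` ACKs have been recorded."""
        with self._cond:
            return self._cond.wait_for(lambda: len(self.acks) >= count, timeout)

    def run(self):
        while True:
            with self._cond:
                if self._status == self.ST_STOPPING:
                    break
                if not self._slaves:
                    # Wait for a slave to be added, then re-check the status.
                    self._cond.wait(self._timeout)
                    continue
                socks = list(self._slaves)  # snapshot, so select runs unlocked
            try:
                readable, _, _ = select.select(socks, [], [], self._timeout)
            except (OSError, ValueError):
                continue                    # e.g. a socket was closed meanwhile
            for sock in readable:
                data = sock.recv(64)
                with self._cond:
                    if data:
                        self.acks.append(data)
                        self._cond.notify_all()
                    elif sock in self._slaves:
                        self._slaves.remove(sock)  # peer closed the connection
        with self._cond:
            self._status = self.ST_DOWN


if __name__ == "__main__":
    receiver = AckReceiver(timeout=0.1)
    receiver.start()
    master_side, slave_side = socket.socketpair()
    receiver.add_slave(master_side)
    slave_side.sendall(b"ack binlog.000001 154")  # made-up payload format
    receiver.wait_for_acks(1, timeout=2.0)
    print(receiver.acks)
    receiver.stop()
```

The timeout on select() plays the same role as in the note above: without it, a stop request would go unnoticed while no ACK arrives.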
3 Removing the LOCK_log mutex from the dump thread
Before this change, the dump thread works as follows:
Foreground thread writing the binlog
acquire LOCK_log
write log event to binlog
release LOCK_log
signal update
Dump thread
while client is not killed:
  acquire LOCK_log
  read event from binlog
  release LOCK_log
  if EOF was reached in the previous read:
    acquire LOCK_log
    wait for update signal
    read event from binlog
    release LOCK_log
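When a dump thread reads the binlog, it acquires the LOCK_log mutex, which blocks every other read or write request on that binlog;
Removing LOCK_log
Events are only appended to the tail of the current binlog, so reading events anywhere else requires no lock;
The only concern is that the dump thread might read an incomplete event while a foreground thread is writing to the binlog;
To prevent this, MYSQL_BIN_LOG introduces a variable, binlog_end_pos, which records where the last event in the current binlog ends; the dump thread only reads events before that position;
write thread: updates the variable after appending an event,
read thread: only reads events before binlog_end_pos,
The variable is protected by LOCK_binlog_end_pos, which must be held for both reads and writes;
The dump thread's logic then becomes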
dump thread design:
end_position = 0
while client is not killed:
  if current read position == end_position:
    acquire lock_binlog_end
    while end_position == binlog_end and client is not killed:
      wait for update signal
    end_position = binlog_end
    release lock_binlog_end
    if client is killed:
      break
  read event from binlog
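The binlog_end_pos design can be sketched in Python as below. This is an illustration under assumptions, not MySQL code. The Binlog class and dump_thread function are hypothetical, an in-memory list stands in for the binlog file, positions are event indexes rather than byte offsets, and a single writer is assumed.

```python
import threading


class Binlog:
    def __init__(self):
        self._events = []                       # append-only stand-in for the binlog file
        self._end_pos = 0                       # binlog_end_pos: count of complete events
        self._end_cond = threading.Condition()  # LOCK_binlog_end_pos plus update signal

    def append(self, event):
        # Writer: append the event first, then publish the new end position.
        self._events.append(event)
        with self._end_cond:
            self._end_pos = len(self._events)
            self._end_cond.notify_all()

    def wait_for_end_pos(self, known_end, timeout=None):
        # Reader: block until the end position moves past `known_end`,
        # or until the timeout expires. Returns the current end position.
        with self._end_cond:
            self._end_cond.wait_for(lambda: self._end_pos != known_end, timeout)
            return self._end_pos

    def read(self, pos):
        # No lock needed: callers only pass positions below a published end.
        return self._events[pos]


def dump_thread(binlog, out, stop):
    read_pos = 0
    end_position = 0
    while not stop.is_set():
        if read_pos == end_position:
            # Caught up: wait for the writer, waking up periodically so that
            # a stop request (the "client is killed" case) is noticed.
            end_position = binlog.wait_for_end_pos(end_position, timeout=0.1)
            continue
        out.append(binlog.read(read_pos))
        read_pos += 1


if __name__ == "__main__":
    log, sent, stop = Binlog(), [], threading.Event()
    reader = threading.Thread(target=dump_thread, args=(log, sent, stop))
    reader.start()
    for i in range(3):
        log.append("event-%d" % i)
    while len(sent) < 3:
        stop.wait(0.01)   # short sleep; the flag is not set yet
    stop.set()
    reader.join()
    print(sent)
```

The reader takes the lock only when it has caught up with the writer. Reads of already published events never contend with writes.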
http://dev.mysql.com/worklog/task/?id=5721#tabs-5721-5