MySQL binlog

Posted May 26, 202011 min read

** Please indicate the source https://segmentfault.com/a/1190000022744239 **

1. Introduction to binlog

binlog is a binary log maintained by the Mysql sever layer, which is completely different from the redo/undo log in the innodb engine; it is mainly used to record SQL statements that update or potentially update mysql data, and use Saved in disk. The binlog of MySQL is multi-file storage. To locate a LogEvent, you need to locate it by binlog filename + binlog position.

The main functions are:

  • Replication:MySQL Replication starts binlog on the master side, and the master passes its binary log to slaves and replays to achieve the purpose of master-slave data consistency
  • Data recovery:recover data through mysqlbinlog tool
  • Incremental backup

2. The data format of binlog

MySQL Replication replication can be based on a statement(Statement Level) or a record(Row Level), you can set this replication level in the MySQL configuration parameters, different replication level settings will affect the bin-log on the Master side Log format.

row-based

In line-based logging, the master writes events to a binary log file to show how the rows of a single table are affected. The log will record the modified form of each line of data, and then modify the same data on the slave side.

Advantages: In row mode, the bin-log does not need to record the context-related information of the executed SQL statement. It only needs to record which record has been modified and what it has changed. Therefore, the log content of row will record the details of each row of data modification very clearly, which is very easy to understand. And there will be no stored procedures or functions under certain circumstances, and the trigger call and trigger cannot be copied correctly.

Disadvantages: When all the executed statements are recorded in the log, they will be recorded with the modification of each line of record, which may generate a large amount of log content, such as an update statement, modify multiple records, then Each modification in binlog will have a record, which will cause a large amount of binlog log, especially when executing a statement such as alter table, because the table structure is modified, each record changes, then each record in the table will Log to the log.

statement-based

Every SQL that will modify the data will be recorded in the master's bin-log. When the slave is copying, the SQL process will be parsed into the same SQL as the original master and executed again.

Advantages: Does not record changes in each row of data, reducing the amount of bin-log logs, saving I/O and storage resources, and improving performance. Because only the details of the statement executed on the master and the context information when the statement is executed are recorded.

Disadvantages: Due to the recorded execution statements, in order for these statements to be executed correctly on the slave side, then he must also record some relevant information during the execution of each statement, that is, context information to ensure All statements can get the same result when executed on the slave side as when executed on the master side. In the statement, there have been found that there are many situations that will cause problems with MySQL replication, mainly when certain data or functions are used when modifying data, such as:sleep() function It cannot be copied correctly, and the last \ _insert \ _id() function is used in the stored procedure, which may cause the slave and master to get inconsistent ids and so on. Since row is based on the changes recorded by each row, there will be no similar problems.

mixed

Starting from version 5.1.8, MySQL provides a third replication mode in addition to statement and row:mixed, which is actually a combination of the first two modes. In mixed mode, MySQL will distinguish the log form to be recorded according to each specific SQL statement executed, that is, choose one between statement and row. The general statement modification uses the status format to save the binlog, such as the table structure change, but for the statement can not complete the master-slave replication operation, such as some functions, the row format is used to save the binlog.

If binlog adopts Mixed mode, then the binlog mode will be automatically changed from statement mode to row mode in the following situations:

  • When the DML statement updates an NDB(NDB Cluster) table;
  • When UUID() is included in the function;
  • When two or more tables containing AUTO \ _INCREMENT fields are updated;
  • When executing the INSERT DELAYED statement;
  • When using UDF(Userdefined function);
  • When the row must be used in the view, for example, the UUID() function is used when creating the view;

Summary of row-based and statement-based features

statement advantages:

  • Long history and mature technology;
  • The generated binlog file is small;
  • The binlog contains all database modification information, which can be used to audit the database security and other conditions;
  • binlog can be used for real-time restoration, not just for replication;
  • The master-slave version can be different, the slave server version can be higher than the master server version;

statement disadvantages:

  • Not all UPDATE statements can be copied, especially when it contains uncertain operations;
  • Copying may also cause problems when calling functions with uncertain factors;
  • Statements using the following functions cannot be copied:
    1. LOAD \ _FILE()
    2. UUID()
    3. USER()
    4. FOUND \ _ROWS()
    5. SYSDATE()(unless the --sysdate-is-now option is enabled at startup)
  • INSERT SELECT will produce more row-level locks than RBR(row-based replication);
  • When copying an UPDATE that requires a full table scan(the index is not used in the WHERE statement), it requires more row-level locks than row requests;
  • For InnoDB tables with AUTO \ _INCREMENT field, the INSERT statement will block other INSERT statements;
  • For some complex statements, the resource consumption on the slave server will be more serious, while in row mode, it will only affect the record that has changed;
  • The storage function(not the storage process) will also execute the NOW() function once when it is called, which can be said to be a bad thing or a good thing;
  • The determined UDF also needs to be executed on the slave server;
  • The data table must be almost consistent with the main server, otherwise it may cause replication errors;
  • Execution of complex statements will consume more resources if something goes wrong;

row advantages:

  • Any situation can be copied, which is the safest and most reliable for copying;
  • Same replication skills as most other database systems;
  • In most cases, if the table on the slave server has a primary key, replication will be much faster;
  • There are fewer row locks when copying the following statements:
    1. INSERT SELECT
    2. INSERT with AUTO \ _INCREMENT field
    3. UPDATE or DELETE statements with no strings attached or many records modified
  • Less locks when executing INSERT, UPDATE, DELETE statements;
  • It is possible to perform multi-thread replication from the server;

row disadvantages:

  • The generated binlog log is much larger;
  • The complex rollback will contain a lot of data in the binlog;
  • When the UPDATE statement is executed on the main server, all the changed records will be written to binlog, and the statement will only be written once, which will cause frequent concurrent write requests of binlog;

The large BLOB value generated by \ -UDF will cause slow copying;

  • Can't see from the binlog what statements are copied(encrypted);
  • When executing a pile of SQL statements on a non-transactional table, it is best to use the statement mode, otherwise it is easy to cause data inconsistencies between the master and slave servers;

In addition, the processing guidelines for the table changes in the system library MySQL are as follows:

    • If INSERT, UPDATE, DELETE is used to directly operate the table, the log format is recorded according to the setting of binlog \ _format;
      • If you use GRANT, REVOKE, SET PASSWORD and other management statements to do it, you must use the statement mode record anyway;
  • After using the statement mode, it can deal with many original primary key duplication problems;

How to choose binlog mode
  1. If MySQL has relatively few special functions(stored procedures, triggers, functions) in production. Select the default statement mode, Statement Level.
  2. If there are many special functions using MySQL in production, you can choose Mixed mode.
  3. If there are many special functions that use MySQL in production, and you want to maximize data consistency, it is best to use Row level mode;

3. The difference between binlog and redo/undo log

The two are completely different logs, mainly with the following differences:

  • Different levels. The redo/undo log is maintained by the innodb layer, and the binlog is maintained by the mysql server layer. It has nothing to do with which engine is used. It records the log records of all engine update operations.
  • The recorded content is different. The redo/undo log records the modification of each page, which is a combination of physical log + logical log(redo log physical to page, logical log in page, undo log uses logical log), the purpose is to ensure the data consistency. binlog records all transaction operations, and the format is binary.
  • The recording timing is different. The redo/undo log will be continuously written during the transaction execution, and the binlog is written before the final commit of the transaction. Of course, when binlog is refreshed to disk is related to the parameter sync_binlog.

Obviously, when we execute SELECT and other statements that do not involve data updates, binlog will not be recorded, while data updates will be recorded. It should be noted that for engines that support transactions, such as innodb, the transaction must be committed before the binlog is recorded.

The timing of binlog refreshing to disk is related to the sync_binlog parameter. If it is set to 0, it means that MySQL does not control the refresh of binlog. The file system controls the refresh of its cache. \ _binlog transactions, MySQL calls the file system refresh operation to refresh the binlog to disk. Set to 1 is the safest, at most one transaction update will be lost when the system fails, but it will affect the performance. In general, it will be set to 100 or 0, sacrificing certain consistency to obtain better performance.

4. Binlog configuration

Open binlog

Settings in my.cnf configuration:

[mysqld]
log-bin = mysql-bin #binlog file name prefix
binlog-format = ROW #ROW, Statement, MiXED three formats
expire_logs_days = 7 #binlog expired cleanup time
sync_binlog = 1 #Time to refresh to disk

Restart mysql

service mysqld start

Without restarting mysql and reloading the my.cnf file, modify the binlog type method:

#Only valid for the current session, mysql restart is invalid
mysql> SET SESSION binlog_format = 'ROW';
#New session is valid, mysql restart is invalid
mysql> SET GLOBAL binlog_format = 'ROW';

5. Analysis of binlog

The directory of the binlog file is in the location specified by datadir in the my.cnf configuration file, or you can view the location and related information through the sql statement

#View location
mysql> show variables like '%datadir%';
#View the open status and file name of binlog
mysql> show variables like '%log_bin%';
#View the current format of binlog
mysql> show variables like '%format%';
#View binlog file list
mysql> show binary logs;
#View the status of binlog
mysql> show master status;

By default, the binlog log is in binary format and cannot be viewed directly. There are two ways to view:

  • mysqlbinlog
    Enter datadir to view the binlog file

        $mysqlbinlog mysql-bin.000001
        ## Or, remotely read the binlog file
        $mysqlbinlog -R -hIP -uNAME -pPASSWORD mysql-bin.000001
  • Command line parsing

       SHOW BINLOG EVENTS
            [IN 'log_name']
            [FROM pos]
            [LIMIT [offset,]row_count]

    Such as:

        mysql> show binlog events in 'mysql-bin.000001' from 0 limit 2,1;
statement format log
# at 524
# 190406 23:36:43 server id 1 end_log_pos 589 CRC32 0x790d8d0f Anonymous_GTID last_committed = 1 sequence_number = 2 rbr_only = no
SET @@ SESSION.GTID_NEXT = 'ANONYMOUS'/*! * /;
# at 589
# 190406 23:36:43 server id 1 end_log_pos 676 CRC32 0xc0498a17 Query thread_id = 11 exec_time = 0 error_code = 0
SET TIMESTAMP = 1554565003/*! * /;
BEGIN

/! * /;
# at 676
# at 708
# 190406 23:36:43 server id 1 end_log_pos 708 CRC32 0xe7b12002 Intvar
SET INSERT_ID = 75/! * /;
# 190406 23:36:43 server id 1 end_log_pos 863 CRC32 0x902ebba7 Query thread_id = 11 exec_time = 0 error_code = 0
SET TIMESTAMP = 1554565003/*! * /;
insert into test.t( d, s) values (NOW(), '2019-03-15 09:53:47')
/
! * /;
# at 863
# 190406 23:36:43 server id 1 end_log_pos 894 CRC32 0x30487fcc Xid = 7
COMMIT/
! * /;

row format log
# at 428
# 190404 17:07:00 server id 1 end_log_pos 493 CRC32 0xcb947c46 Anonymous_GTID last_committed = 1 sequence_number = 2 rbr_only = yes

/! 50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED * //! * /;
SET @@ SESSION.GTID_NEXT = 'ANONYMOUS'/*! * /;
# at 493
# 190404 17:07:00 server id 1 end_log_pos 573 CRC32 0x5050376a Query thread_id = 33 exec_time = 0 error_code = 0
SET TIMESTAMP = 1554368820/
! * /;
BEGIN
/
! * /;
# at 573
# 190404 17:07:00 server id 1 end_log_pos 621 CRC32 0xe48f4fe2 Table_map:test.t mapped to number 108
# at 621
# 190404 17:07:00 server id 1 end_log_pos 671 CRC32 0x8af05d9b Write_rows:table id 108 flags:STMT_END_F

BINLOG '
NMmlXBMBAAAAMAAAAG0CAAAAAGwAAAAAAAEABHRlc3QAAXQAAwMSEgIAAATiT4/k
NMmlXB4BAAAAMgAAAJ8CAAAAAGwAAAAAAAEAAgAD //hJAAAAmaLJEcCZop6db5td8Io =
'/*! * /;
# at 671
# 190404 17:07:00 server id 1 end_log_pos 702 CRC32 0x82022821 Xid = 8
COMMIT/*! * /;

The logs in both formats include the following information:

  • position:the position in the file(# at 573), indicating from which byte of the file the event recording starts
  • timestamp:timestamp when the event occurred(# 190404 17:07:00)
  • exec \ _time:time spent on event execution
  • error \ _code:error code
  • server id:server ID
  • thread \ _id:agent thread id
  • type:Event type(Query)
  • SET TIMESTAMP = 1554368820/*! * /;:The time to start things
  • end \ _log \ _pos:the end of the event(end_log_pos 671)
  • Xid:The event indicates the submitted XA transaction

6. Summary of basic operation of binlog

View
#View location
mysql> show variables like '%datadir%';
#View the open status and file name of binlog
mysql> show variables like '%log_bin%';
#View the current format of binlog
mysql> show variables like '%format%';
#View binlog file list
mysql> show binary logs;
#View the status of binlog, that is, the number name of the last binlog log, and the end point of the last operation event pos
mysql> show master status;

Use the mysqlbinlog tool to view the binlog file

$mysqlbinlog mysql-bin.000001
## Or, remotely read the binlog file
$mysqlbinlog -R -hIP -uNAME -pPASSWORD mysql-bin.000001
Data recovery
#Restore based on time:
$mysqlbinlog --start-datetime = "2019-04-04 23:20:35" --stop-datetime = "2019-04-04 23:50:18" mysql-bin.000001 | mysql -uroot -p123456

# Recovery based on time:
$mysqlbinlog --start-position = 428 --stop-position = 671 mysql-bin.000001 | mysql -uroot -p123456

You can also use the tool binlog2sql
), Parse out the SQL you want from MySQL binlog. According to different options, you can get raw SQL, rollback SQL, INSERT SQL with primary key removed, etc.

Cleanup
#Refresh log log, from now on a new numbered binlog log file is generated
#Every time the mysqld service restarts, it will automatically execute this command to refresh the binlog log; adding -F option when mysqldump backs up data will also refresh the binlog log;
mysql> flush logs;
#Reset(clear) all binlog logs
mysql> reset master;
#Delete the binlog log file in the log index before the specified date
mysql> purge master logs before '2019-03-15 09:35:00';
#Delete the binlog log file in the log index of the specified log file
mysql> purge master logs to 'binlog.000001';

7. MySQL binlog-based replication process

Replication is one of the most important functions of mysql. The high availability, load balancing and read-write separation of mysql cluster are all based on replication; from 5.6, there are two ways to achieve replication, based on binlog and based on GTID), The basic process of a master-slave replication based on binlog is as follows:

  1. Master records the data changes in the binary log(binary log)
  2. The Slave IO process connects to the Master and requests the log content from the specified location of the specified log file(or from the first log)
  3. After the Master receives the request from the Slave IO process, the IO process responsible for copying will read the log information after the specified location of the log according to the request information and return it to the Slave IO process. In addition to the information contained in the log, the returned information also includes the name of the bin-log file and the location of the bin-log that the information returned this time has reached the Master side
  4. After the Slave IO process receives the information, it will sequentially add the received log content to the end of the relay-log file on the slave side, and record the file name and location of the bin-log on the master side to the master-info In the file, so that the next time you read it, you can clearly tell the Master where from a certain bin-log to start the log content.
  5. When Slave's Sql process detects that the relay-log has newly added content, it will immediately parse the relay-log content into those executable content that is actually executed on the Master side, and execute it on its own.

[Please indicate the source of the reprint]: https://segmentfault.com/a/1190000022744239