Explain command to analyze MySQL SQL execution
Posted Jun 15, 2020 • 7 min read
In the previous article "Analysis of MySQL Common Locking Scenarios" , we talked about row locks being added to the index, but they are complicated SQL often contains multiple conditions and involves multiple indexes. Finding out which indexes are used during SQL execution is critical to analyzing locking scenarios.
For example, the following SQL:
mysql> delete from t1 where id = 1 or val = 1
Among them, id and val are indexes, so which indexes are used during execution and which locks are added? For this, we need to use explain to get MySQL's execution plan for executing this SQL.
What is an execution plan? To put it simply, it is the performance of SQL when it is executed in the database. It is usually used in scenarios such as SQL performance analysis, optimization, and lock analysis. The execution process is jointly performed by the parser, preprocessor, and query optimizer during the MySQL query process. generate.
MySQL query process
If you can figure out how MySQL optimizes and executes the query, it will not only help to optimize the query, but also determine the final locking scenario by analyzing the index used.
The following figure is the process of MySQL executing a query. In fact, each step is more complicated than expected, especially the optimizer, which is more complicated and difficult to understand. This article only gives a brief introduction.
The MySQL query process is as follows:
- The client sends a query to the server.
- The server first checks the query cache, and if it hits the cache, it immediately returns the results stored in the cache. Otherwise, enter the next stage.
- SQL analysis and preprocessing are performed on the server side, and the corresponding execution plan is generated by the optimizer.
- MySQL executes the query according to the execution plan generated by the optimizer, and then calls the API of the storage engine.
- Return the result to the client.
MySQL will parse the query, and create an internal data structure(parse tree), and perform various optimizations on it, including rewriting the query, determining the reading order of the table, and selecting the appropriate index.
The user can hint to the optimizer to influence the optimizer's decision-making process. You can also use explain to understand how the database makes optimization decisions, and provide a reference benchmark for users to reconstruct the query and database table schema, modify the database configuration, etc., to make the query as efficient as possible.
Below, we introduce the relevant output parameters in explain in turn, and explain the meaning of these parameters with practical examples.
The types of operations for querying data are as follows
- simple simple query, does not contain sub-query or union, as shown in the following figure, is the simplest query statement.
- primary is a complex subquery in SQL, and the outermost query is marked as this value.
- derived is the subquery contained in the from clause in SQL is marked as this value, MySQL will recursively execute these subqueries and put the results in a temporary table. The following figure shows the above two types.
- subquery is the subquery that SQL contains in select or where, and is marked as this value.
- dependent subquery:The first select in the subquery depends on the outer query, which is generally the subquery in in.
- union is SQL's second select after the union keyword and is marked as this value; if union is included in the from subquery, the outer select is marked as derived.
- union result Select to get the result from the union table. The following figure shows the SQL case of union and union result.
- The dependent union is also the second or subsequent select statement after the union keyword, which is the same as dependent subquery and depends on the external query.
The connection type of the table, its performance is ranked from high to low as system, const, eq_ref, ref, range, index and all.
- system indicates that the table has only one row of records, which is equivalent to the system table. As shown in the following figure, because the table derived from the subquery of from has only one row of data, the table connection type of the primary is system.
- const is found by indexing only once, matching only one row of data, used for constant value comparison PRIMARY KEY or UNIQUE index.
eq_ref unique index scan. For each index key, only one record in the table matches it. It is often used for primary key or unique index scan. For each row combination from the preceding table, read one row from the table. It is the best connection type except const type.
As shown in the following figure, the type of the query on table t1 is ALL, which means a full table scan, and then each row of data in t1 is compared with the primary key index of t2.id, so the query on t2 table is eq_ref.
- The ref non-unique index scan returns all rows that match a single value. The difference from eq_ref is that the index is a non-unique index. Specific cases are shown below.
- range only checks the rows in a given range, use an index to select rows, when using operators such as =, between, >, <, and in, and use constants to compare key columns. As shown in the figure below, where id is a unique index and val is a non-unique index.
- index is similar to ALL type, the only difference is that it only traverses the index tree to read the index value, which is slightly faster than ALL to read all the data rows, because the index file is usually smaller than the data file. This involves MySQL index coverage
- ALL full table scan, under normal circumstances, the performance is very poor and should be avoided.
possible_keys, key and key_len
The possible_key column indicates which index MySQL may use to look up in this table. If the column is NULL, the relevant index is not used. Need to check where clause conditions to create a suitable index to improve query efficiency.
The key column shows the index that MySQL actually decided to use. If no index is selected, the value is NULL.
key_len shows the length that MySQL decides to use the index. If the key is NULL, this column is also NULL. The length of the index used should be as short as possible while ensuring accuracy. Because the shorter the index file, the smaller the number of I/Os required.
As can be seen from the above figure, for the select * from t2 where id = 1 or val = 1 statement, you can use the PRIMARY or idx_t2_val index. The idx_t2_val index is actually used, and the length of the index is 5 .
These are actually the fields we are most concerned about in the analysis of locking scenarios. Subsequent articles will specifically explain how to determine which locks are added to complex SQL based on these fields and other tools.
The ref column indicates which column or constant of the other table is used to select rows from the table. As shown in the following figure, when reading data from t2, it is necessary to judge t2.id = t1.id, so ref is mysql.t1.id
rows and filtered
The rows column shows the number of rows that MySQL thinks it must check when it executes the query.
The filtered list shows the percentage of rows returned by the SQL statement after the SQL statement is executed. The larger the value, the better. MySQL will use Table Filter to filter the row data read out. In theory, the efficiency is the highest when the row read out is equal to the number of rows returned. The more filtering ratio, the lower the efficiency.
As shown in the figure above, there are three pieces of data in the t1 table, and rows is 3, indicating that all rows must be read. According to the table filter val = 3, only one row of data is returned, so the filtered ratio is 33.33%,
Contains additional information that is not suitable for display in other columns but is very important. Common values are as follows
using index indicates that the select operation uses a covered index, which avoids access to the data rows of the table, and the efficiency is good.
Using where clause is used to limit which line. That is, after reading the data, the Table Filter is used for filtering.
As shown in the figure below, because both id and val are indexed, select * can also directly use the overlay index to read data, so there is using index in extra. And because only 3 rows of data were read using the val index, it was still filtered by where clause, filtered was 55%, so using where was used in extra.
- using filesort MySQL will use an external index to sort the data, instead of reading according to the index order in the table. If this value appears, the SQL statement should be optimized. As shown in the following figure, the val column has no index, so it cannot be sorted and read using the index order.
- using temporary Use temporary tables to save intermediate results. For example, MySQL uses temporary tables when sorting query results. It is often used for order by and group by. If this value appears, SQL should be optimized. In my experience, group by a non-indexed column, or ORDER BY or GROUP BY column is not the first table from the JOIN statement sequence, it will generate a temporary table.
- using join buffer. As shown in the following figure, the connection cache and temporary table are shown. About the content of the connection cache, you can check it yourself, and there will be time to write an article to explain later.
- distinct after finding the first match, stop searching for more rows for the current row combination
After explaining the SQL execution plan through explain, we can not only understand the index used in SQL execution, to determine the locking scenario, but also optimize and analyze SQL for other information, such as optimizing the type from index to ref.