SQLTimeoutException: ORA-01013: user requested cancel of current

Posted Jun 27, 20205 min read

problem

The specific scenario is that cat monitoring reports an error.

Error message

Cause:java.sql.SQLTimeoutException:ORA-01013:user requested cancel of current operation

It literally means that the client canceled the database connection and reported an error.

This error is reported in two parts:

  1. Database server
    ORA-01013:user requested cancel of current operation //This is an error returned by the database server

  2. Client
    Cause:java.sql.SQLTimeoutException //This is the exception call stack in java

the reason

Reason 1-Database lock

I searched online and said that it was the cause of the lock, so I initially thought it was the cause of the lock, but later analysis found that it was not the cause of the lock, because the query is an ordinary query, and the ordinary query does not have the problem of locks, that is, any lock, Does not affect reading. Therefore, the error should not be due to this reason.


Cause 2-The client cancels the connection

According to the literal information of the error message, it is because the client cancels the database connection. In fact, this should be the reason. But why did the client cancel the database connection? According to the first part of the error message, it is a timeout. But why did it time out? This error is intermittent, and the general situation is normal. It takes less than 1s to execute SQL in production, and some of the error SQL data is relatively small but will also report errors, so it should not be a problem of SQL itself.

So far, no essential reason has been found, just the literal reason. Therefore, there is no solution.


other information

  1. The database connection pool uses Ali's druid
    This time I updated the configuration of the database connection pool, because I have reported errors in recent days:connection reset, reset connection, recover connection, and other error information.
    The reason is because the connection is invalid. The solution is to enable the idle connection check configuration item testWhileIdle of the connection pool. Then, this time I went online and updated the configuration of the database connection pool. The connection reset error was gone, but after going online, I got an error and the client canceled the connection. Therefore, it may also be related to the updated database connection pool configuration. However, the configuration of other projects is the same, but no error is reported.

  2. Executed SQL, there is no problem, time-consuming normal
    Although it is not the reason for the lock, it is normal to look at the lock in dba, that is, it is a short-term lock(SQL that executes the query lock many times, the lock record has been changing), and there is no case of holding the lock for a long time and not releasing it.

3.cat monitors each error message and also has a corresponding slow sql(there is also an error message, that is, abnormal call stack)
This is also in line with the first part of the error message, that is, the time to execute sql has timed out. But I don't know why it timed out.

  1. Database lock
    Other departments have the same error, and it is the separation of read and write, so it should not be the cause of the database lock.

to sum up

  1. Not the same order error, other order numbers also have errors

  2. It should not be the reason for the lock
    1) There is no lock that has not been released for a long time. I confirmed with dba that the occupation of the lock is short-lived, so it should not be the cause of the lock
    2) The lock of the database will not affect the query. The current query is just an ordinary query, so it should not be the reason for the lock
    3) Asked other departments, they read and write separately, and have this problem when inquiring, so it should not be the reason for the lock

  3. Should be caused by occasionally querying the database for too long
    When abnormal, the error and slow sql in cat are abnormal, and they are the same order.

The specific analysis is that because the database query is too long, the application times out, disconnects the database connection, and then reports an error:ORA-01013:user requested cancel of current operation, that is, the client canceled the database connection.

Locally simulate the query for update of two clients, and the subsequent for update will always block-stop the subsequent query, and an error message will be reported:ORA-01013:user requested cancel of current operation. Therefore, when the client cancels the connection, an error message will be reported. The more likely reason in the application is that the application timed out because the query to the database took too long, and then the application canceled the connection, thereby reporting the error message.

Solution

After analysis, it is unlikely to be a slow SQL problem, so it is still a connection problem. The current connection failure detection is only performed when the number of connections is greater than the minimum number. Therefore, there is still a problem of connection failure. If the connection fails, the client and server do not know, and then the client times out and then cancels the database connection, which may cause this problem.

The connection may also be disconnected by the firewall, and the firewall will disconnect long connections once an hour.

Therefore, you can enable less than the minimum number of connection failure detection, specifically keepalive configuration keepalive.


version

The keep-alive function is added later, so you need to use a newer version, currently using 1.1.9. Although this version also supports, it is best to use the new version, because the new version fixes the keep-alive configuration bug.

It is recommended to use at least 1.1.16
1.1.21 //Recommended
Latest 1.1.22


reference

https://github.com/alibaba/dr...

to sum up

How to keep the database connection pool druid? Also open 1. testWhileIdle configuration 2. keepAlive configuration.

Other knowledge points

Database lock

Here only talk about the row lock of the database, generally two cases will be locked:

  1. When updating data, other transactions cannot update the data
  2. Query for update, other transactions cannot update data

In other words, the above two situations do not affect the read data, the read data is not locked, unless for update is added.

In fact, any lock does not affect the read data, at least does not affect the ordinary query, that is, the query does not add for update.


oracle-query lock record

In Oracle database operation, we sometimes use operations such as lock table query and unlock and kill process.
So how are these operations implemented? In this article, we mainly introduce this part.

(1) The code for querying the lock table has the following form:

select count(*) from v$locked_object;

select * from v$locked_object;

(2) Check which table is locked

select b.owner, b.object_name, a.session_id, a.locked_mode
from v$locked_object a,dba_objects b
where b.object_id = a.object_id;

(3) Check which session caused it

select a.OS_USER_NAME, c.owner, c.object_name, b.sid, b.serial#, logon_time
from v$locked_object a, v$session b, dba_objects c
where a.session_id = b.sid
 and a.object_id = c.object_id
 order by b.logon_time;

(4) Kill the corresponding process

Execute command:alter system kill session '1025,41'; requires user to have permission to operate
Among them, 1025 is sid and 41 is serial#.

https://blog.csdn.net/wangchu...


Do shared locks and exclusive locks affect reading?

The difference between shared locks and exclusive locks is not to affect reads. Both locks do not affect reads. In fact, any locks do not affect reads. But there are other details, the specific difference is not the point, not to talk about it.

Because during the test, whether it is a shared lock(query for update) or an exclusive lock(update data, that is, add, delete, and modify), one transaction is updated, and the other transaction can still read the data.


test

  1. Two for update, the later for update will block, because the lock is acquired but cannot be obtained
  2. A for uddate, an ordinary query, you can query normally
  3. One update, one query, you can query normally

The above is tested without submitting the first operation.