Thursday, September 29, 2011

MySQL idle connections still holding locks

We had an interesting problem today. We were seeing very slow single-row updates (>30s) on our (innodb) scheduled_jobs table, and a large number of the update queries were failing with a LOCK WAIT TIMEOUT. The updates were using the primary key, so they should be pretty fast - but they weren't.

So we fired up a console and ran SHOW INNODB STATUS, and saw lots of transactions with long-held locks -

---TRANSACTION 0 200086649, ACTIVE 3000 sec, process no 29791, OS thread id 140353331377936
12 lock struct(s), heap size 3024, 6 row lock(s), undo log entries 10
MySQL thread id 243, query id 314676 ip-10-94-245-79.ec2.internal 10.94.245.79 tn


- however, when we cross-referenced the MySQL thread IDs with a SHOW PROCESSLIST, we found that lots of the threads weren't actually doing anything:

+------+------------+-----------+-----------------+----------------+------+--------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+------+------------+-----------+-----------------+----------------+------+--------------+------------------+
| 243 | tn | (..snip..)| tn_production | Sleep | 3000 | | NULL



This was strange - but after a bit of a Googling and a few minutes thought, we realised the cause.

When one of our worker processes dies - e.g. either because monit has killed it for taking up too much resource for too long, or because the EC2 instance it's running on has become unresponsive to ssh and been automatically restarted by our detect-and-rescue script - it may be in the middle of a transaction at the point it's killed.

If the transaction was holding locks, and never got to the "COMMIT" stage, and the connection wasn't closed normally, then that connection will persist - and maintain its locks - until it gets timed-out by the MySQL server.... and the default wait_timeout variable is 28800s - 8 hrs!.

So, we killed the idle connections, and the update queries started going through much more quickly again. We've now brought our wait_timeout variable down to 10 mins, and we'll see how we get on.

5 comments:

Krasio said...

Hi,

How is this working?
Are you using Rails/mysql2?

Looks like mysql2 sets wait_timeout=2592000 for its connections - https://gist.github.com/a80f890858a7e1f459b5

Alistair Davidson said...

Hi Krasio

This was with the mysql gem and ActiveRecord on Rails 2.3.

It looks like bringing the wait_timeout down has made a big difference - we haven't seen the symptom since.

Al

Alistair Davidson said...

Hi Krasio

This was with the mysql gem and ActiveRecord on Rails 2.3.

It looks like bringing the wait_timeout down has made a big difference - we haven't seen the symptom since.

Al

Krasio said...

Thanks, I'll try it that too.

Anonymous said...

YOu set timeout time to 10 minutes, what if I set timeout to 30 seconds. I dont want to keep sleeping connections. Will it affect my application????