Friday, May 22, 2009

This should not be possible - transactional fail?

We're having a bizarre problem with MySQL 5.0.32 on Debian in a highly-concurrent environment, using the ActiveRecord JDBC adapter. We've also seen it on 5.0.51 on Ubuntu.

There's a particular sequence of 3 queries, all executed from within a ActiveRecord::Base.transaction { } block:


// Statement 1:
INSERT INTO (InnoDB table 1)
(type, ...loads of other fields...)
SELECT 'ProtoMessageItem', ...other fields...
FROM (MyISAM table 1) LEFT OUTER JOIN (InnoDB table 2)
WHERE (anti-duplication clause)
GROUP BY (synthetic identifier on MyISAM table 1, to be a unique key on InnoDB table 1)

// Statement 2:
INSERT INTO (InnoDB table 2)
SELECT (fields)
FROM (MyISAM table 1) LEFT OUTER JOIN (InnoDB table 3) INNER JOIN (InnoDB table 1)
WHERE (anti-duplication clause)

// Statement 3:
UPDATE (InnoDB table 1)
SET type = 'MessageItem'
WHERE type = 'ProtoMessageItem'


Now, all of this takes place within an explicit transaction, and works well - this approach provides huge performance gains over the more simple and obvious methods.

The problem is that just very very occasionally, it breaks. We don't know how or why, but once in a blue moon, statement 2 throws a DUPLICATE KEY exception, and the transaction DOESN'T get rolled back. Statement 1 still gets COMMITTED. We can see ProtoMessageItems in the DB, and this blocks up the pipeline and stops further MessageItems from being created.

ProtoMessageItems should NEVER be visible - they're only ever created and then updated to full MessageItems in the same 3-statement transaction. Seeing them from outside that transaction is like seeing a naked singularity - it just shouldn't be possible.

This is all kinds of wrong in a very wrong way, as it means one of the following - either:

  1. The DB is getting the correct BEGIN/COMMIT/ROLLBACK statements, but transactional atomicity is not holding for some reason

  2. The JDBC driver is not sending the correct ROLLBACK statements, but thinks it's in autocommit mode

  3. There's an obscure concurrency edge case somewhere with the ActiveRecord JDBC adapter, that means once in a while, for some reason, it starts that transaction in autocommit mode.


All of these possibilities are worrying. We've eliminated the possibility of it being some quirk of the particular data that causes this problem. We've turned on MySQL statement logging and watched it for hours, and as far as we can see, the correct ROLLBACK statements seem to be issued every time.

Has anyone encountered a similar problem? Any known issues with INSERT/SELECT on combinations of (transactional) InnoDB and (non-transactional) MyISAM tables?

2 comments:

Anonymous said...

MySQL doesn't recommend mixing transactional and non-transactional engines in a transaction but it doesn't say anything about inconsistent results as a result.

You can try and change the engine from MyISAM to innodb in a test environment and see if the problem remains.

Mack

Alistair Davidson said...

So, after about 2 months of continuous transaction logging, waiting for the problem to re-occur, it finally did.

And.... nothing. All the ridiculously verbose logging we put into the code indicates that all the correct BEGIN / COMMIT / ROLLBACK statements are being sent to the database.

DAMN!

Which only leaves the rather scary possibility that every so often, MySQL transactional consistency can fail. Somehow. For some unknown reason.

That makes me feel icky.