Wednesday, August 12, 2009

OutOfMemoryError in ActiveRecord-JDBC on INSERT SELECT

During some scale testing the other day, we came across this unusual / mildly amusing error in a database-bound command that just funnels INSERT SELECT statements through the ActiveRecord JDBC adapter:


java/util/Arrays.java:2734:in `copyOf': java.lang.OutOfMemoryError: Java heap space (NativeException)
from java/util/ArrayList.java:167:in `ensureCapacity'
from java/util/ArrayList.java:351:in `add'
from com/mysql/jdbc/StatementImpl.java:1863:in `getGeneratedKeysInternal'
from com/mysql/jdbc/StatementImpl.java:1818:in `getGeneratedKeys'
from org/apache/commons/dbcp/DelegatingStatement.java:318:in `getGeneratedKeys'
from jdbc_adapter/JdbcAdapterInternalService.java:668:in `call'
from jdbc_adapter/JdbcAdapterInternalService.java:241:in `withConnectionAndRetry'
from jdbc_adapter/JdbcAdapterInternalService.java:662:in `execute_insert'
... 25 levels...


Now, there are a couple of things here worth pointing out.

  1. I REALLY LOVE the fact that it blew heap space in a method called ensureCapacity. That makes me smile.
  2. Why is it calling getGeneratedKeys() for an INSERT SELECT?


The getGeneratedKeys() method retrieves all the primary keys generated when you execute an INSERT statement. Fair enough - BUT the issue here is that we'd specifically structured the process and the SQL statements involved to be done with INSERT SELECTs, precisely to avoid great chunks of data being transferred backwards and forwards between the app and the database.
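To make the failure mode concrete, here's a minimal sketch of the kind of statement involved (the table and column names are made up, not from the original system). Because the SELECT runs entirely server-side, the client has no idea in advance how many rows - and therefore how many generated keys - are coming back:

```ruby
# Hypothetical example: copy a large batch of rows entirely on the
# database server, so no row data crosses the wire...
sql = <<~SQL
  INSERT INTO archived_events (user_id, payload, created_at)
  SELECT user_id, payload, created_at
  FROM events
  WHERE created_at < '2009-08-01'
SQL

# ...unless the statement is executed with RETURN_GENERATED_KEYS, in
# which case the MySQL driver buffers one auto-generated id per inserted
# row in an in-memory list - millions of rows, millions of keys, boom.
```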

It turns out that the ActiveRecord JDBC adapter is doing this:
(lib/active_record/connection_adapters/JdbcAdapterInternalService.java)

@JRubyMethod(name = "execute_insert", required = 1)
public static IRubyObject execute_insert(final IRubyObject recv, final IRubyObject sql) throws SQLException {
    return withConnectionAndRetry(recv, new SQLBlock() {
        public IRubyObject call(Connection c) throws SQLException {
            Statement stmt = null;
            try {
                stmt = c.createStatement();
                stmt.executeUpdate(rubyApi.convertToRubyString(sql).getUnicodeValue(), Statement.RETURN_GENERATED_KEYS);
                return unmarshal_id_result(recv.getRuntime(), stmt.getGeneratedKeys());
            } finally {
                if (null != stmt) {
                    try {
                        stmt.close();
                    } catch (Exception e) {
                    }
                }
            }
        }
    });
}


...in other words, explicitly telling the driver to return all the generated keys.
Hmm, OK, can we get round this by NOT calling the execute_insert method, and instead calling a raw execute method that doesn't return all the keys?

Well, no, unfortunately, because it also turns out that the ruby code is doing this:
(activerecord-jdbc-adapter-0.9/lib/active_record/connection_adapters/jdbc_adapter.rb)

# we need to do it this way, to allow Rails stupid tests to always work
# even if we define a new execute method. Instead of mixing in a new
# execute, an _execute should be mixed in.
def _execute(sql, name = nil)
  if JdbcConnection::select?(sql)
    @connection.execute_query(sql)
  elsif JdbcConnection::insert?(sql)
    @connection.execute_insert(sql)
  else
    @connection.execute_update(sql)
  end
end
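The dispatch above can be reproduced in plain Ruby - a sketch only, since the real select?/insert? checks live in Java, but it shows why any statement beginning with INSERT (including an INSERT SELECT) is routed to execute_insert:

```ruby
# Plain-Ruby approximation of the adapter's statement dispatch.
def statement_kind(sql)
  case sql
  when /\A\s*select/i then :execute_query
  when /\A\s*insert/i then :execute_insert  # INSERT SELECT lands here too
  else                     :execute_update
  end
end

statement_kind("SELECT * FROM users")            # => :execute_query
statement_kind("INSERT INTO a SELECT * FROM b")  # => :execute_insert
statement_kind("UPDATE users SET name = 'x'")    # => :execute_update
```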


...and the JdbcConnection::insert? method is detecting if something's an insert by doing this:
(JdbcAdapterInternalService.java again)

@JRubyMethod(name = "insert?", required = 1, meta = true)
public static IRubyObject insert_p(IRubyObject recv, IRubyObject _sql) {
    ByteList bl = rubyApi.convertToRubyString(_sql).getByteList();

    int p = bl.begin;
    int pend = p + bl.realSize;

    p = whitespace(p, pend, bl);

    if (pend - p >= 6) {
        switch (bl.bytes[p++]) {
        case 'i':
        case 'I':
            switch (bl.bytes[p++]) {
            case 'n':
            case 'N':
                switch (bl.bytes[p++]) {
                case 's':
                case 'S':
                    switch (bl.bytes[p++]) {
                    case 'e':
                    case 'E':
                        switch (bl.bytes[p++]) {
                        case 'r':
                        case 'R':
                            switch (bl.bytes[p++]) {
                            case 't':
                            case 'T':
                                return recv.getRuntime().getTrue();
                            }
                        }
                    }
                }
            }
        }
    }
    return recv.getRuntime().getFalse();
}


...in other words, if the SQL statement starts with the word INSERT (after any leading whitespace), then it's treated as an INSERT, and executed with an execute_insert call.
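Those thirty-odd lines of nested switches boil down to a one-line regex. Here's an approximate plain-Ruby equivalent, for flavour:

```ruby
# Approximate plain-Ruby equivalent of the nested-switch matcher above:
# true iff the statement begins with the word INSERT, ignoring case and
# leading whitespace.
def insert?(sql)
  !!(sql =~ /\A\s*insert/i)
end

insert?("  insert into t select * from s")  # => true
insert?("UPDATE t SET x = 1")               # => false
```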

So, looks like we're a bit knacked here. There are two possible solutions:

  1. The proper solution - fix the AR JDBC adapter (and, arguably, the MySQL connector/J as well, to stop it blowing heap space), submit a patch, wait for it to be accepted and make it into the next release.

  2. Or, the pragmatic solution - rewrite the SQL-heavy command as a stored procedure and just call it with an EXECUTE and sod the DHH dogma.


We went with option 2 :)
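For the curious, the workaround looks roughly like this (the procedure name is made up - the real one isn't in the post). The key point is that a CALL statement sails straight past the insert? check, so the adapter takes the execute_update path and getGeneratedKeys() is never invoked:

```ruby
# Hypothetical sketch: the INSERT SELECT logic now lives in a MySQL
# stored procedure, invoked with a statement that doesn't start with
# the word INSERT.
sql = "CALL archive_old_events()"  # procedure name is hypothetical

# The adapter's insert? check (approximated here) returns false for it,
# so execution falls through to execute_update:
is_insert = !!(sql =~ /\A\s*insert/i)
# is_insert => false
```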

Big kudos to D. R. MacIver for tracking down the source of the fail in double-quick time.

3 comments:

David R. MacIver said...

It should be noted we abandoned option 2 for option 3, which looks like the following:

def ex(sql)
  logger.debug "Executing SQL:\n #{sql}"
  start = Time.now
  # ActiveRecord does stupid things which causes us to get an array
  # with all the generated keys. Therefore we bypass it and go straight
  # to JDBC.
  ActiveRecord::Base.
    connection.
    raw_connection.
    connection.
    prepare_statement(sql).
    execute_update
  logger.debug "Execution completed in #{Time.now - start}s"
end


Ah, connection.raw_connection.connection. Such beautiful API design.

Peter said...

the amount of fugly in method "insert_p" is staggering.

maria said...

thanks for all the information