During some scale testing the other day, we came across this unusual / mildly amusing error in a database-bound command that just funnels INSERT SELECT statements down the ActiveRecord JDBC adapter:
java/util/Arrays.java:2734:in `copyOf': java.lang.OutOfMemoryError: Java heap space (NativeException)
from java/util/ArrayList.java:167:in `ensureCapacity'
from java/util/ArrayList.java:351:in `add'
from com/mysql/jdbc/StatementImpl.java:1863:in `getGeneratedKeysInternal'
from com/mysql/jdbc/StatementImpl.java:1818:in `getGeneratedKeys'
from org/apache/commons/dbcp/DelegatingStatement.java:318:in `getGeneratedKeys'
from jdbc_adapter/JdbcAdapterInternalService.java:668:in `call'
from jdbc_adapter/JdbcAdapterInternalService.java:241:in `withConnectionAndRetry'
from jdbc_adapter/JdbcAdapterInternalService.java:662:in `execute_insert'
... 25 levels...
Now, there are a couple of things here worth pointing out.
- I REALLY LOVE the fact that it blew heap space in a method called ensureCapacity. That makes me smile.
- Why is it calling getGeneratedKeys() for an INSERT SELECT?
The getGeneratedKeys() method retrieves all the primary keys that are generated when you execute an INSERT statement. Fair enough - BUT the issue here is that we'd specifically structured the process and the SQL statements involved so that everything was done with INSERT SELECTs, precisely to avoid great chunks of data being transferred backwards and forwards between the app and the database.
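For reference, this is roughly how getGeneratedKeys() is normally used - a generic JDBC sketch with an invented table name, not our code. The driver hands back a ResultSet with one row per inserted row, and the stack trace above shows Connector/J accumulating that entire list in an ArrayList before returning it - fine for a one-row INSERT, fatal for an INSERT SELECT copying millions of rows:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class GeneratedKeysSketch {
    // Insert a single row and fetch its auto-generated primary key.
    static long insertAndGetKey(Connection c) throws SQLException {
        Statement stmt = c.createStatement();
        try {
            stmt.executeUpdate("INSERT INTO audit_log (message) VALUES ('hello')",
                               Statement.RETURN_GENERATED_KEYS);
            ResultSet keys = stmt.getGeneratedKeys(); // one row per inserted row
            if (!keys.next()) {
                throw new SQLException("no generated key returned");
            }
            return keys.getLong(1);
        } finally {
            stmt.close();
        }
    }
}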
It turns out that the ActiveRecord JDBC adapter is doing this:
(lib/active_record/connection_adapters/JdbcAdapterInternalService.java)
@JRubyMethod(name = "execute_insert", required = 1)
public static IRubyObject execute_insert(final IRubyObject recv, final IRubyObject sql) throws SQLException {
return withConnectionAndRetry(recv, new SQLBlock() {
public IRubyObject call(Connection c) throws SQLException {
Statement stmt = null;
try {
stmt = c.createStatement();
smt.executeUpdate(rubyApi.convertToRubyString(sql).getUnicodeValue(), Statement.RETURN_GENERATED_KEYS);
return unmarshal_id_result(recv.getRuntime(), stmt.getGeneratedKeys());
} finally {
if (null != stmt) {
try {
stmt.close();
} catch (Exception e) {
}
}
}
}
});
}
...in other words, explicitly telling the driver to return all the generated keys.
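For contrast, here's what we'd have wanted it to do - a minimal sketch, not the adapter's actual code, using the standard java.sql.Statement.NO_GENERATED_KEYS constant so the driver never buffers a single key:

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class QuietInsert {
    // Execute an INSERT without asking the driver to collect generated keys.
    static int executeInsertQuietly(Connection c, String sql) throws SQLException {
        Statement stmt = c.createStatement();
        try {
            return stmt.executeUpdate(sql, Statement.NO_GENERATED_KEYS);
        } finally {
            stmt.close();
        }
    }
}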
Hmm, OK, can we get round this by NOT calling the execute_insert method, and instead calling a raw execute method that doesn't return all the keys?
Well, no, unfortunately, because it also turns out that the ruby code is doing this:
(activerecord-jdbc-adapter-0.9/lib/active_record/connection_adapters/jdbc_adapter.rb)
# we need to do it this way, to allow Rails stupid tests to always work
# even if we define a new execute method. Instead of mixing in a new
# execute, an _execute should be mixed in.
def _execute(sql, name = nil)
  if JdbcConnection::select?(sql)
    @connection.execute_query(sql)
  elsif JdbcConnection::insert?(sql)
    @connection.execute_insert(sql)
  else
    @connection.execute_update(sql)
  end
end
...and the JdbcConnection::insert? method is detecting if something's an insert by doing this:
(JdbcAdapterInternalService.java again)
@JRubyMethod(name = "insert?", required = 1, meta = true)
public static IRubyObject insert_p(IRubyObject recv, IRubyObject _sql) {
ByteList bl = rubyApi.convertToRubyString(_sql).getByteList();
int p = bl.begin;
int pend = p + bl.realSize;
p = whitespace(p, pend, bl);
if(pend - p >= 6) {
switch(bl.bytes[p++]) {
case 'i':
case 'I':
switch(bl.bytes[p++]) {
case 'n':
case 'N':
switch(bl.bytes[p++]) {
case 's':
case 'S':
switch(bl.bytes[p++]) {
case 'e':
case 'E':
switch(bl.bytes[p++]) {
case 'r':
case 'R':
switch(bl.bytes[p++]) {
case 't':
case 'T':
return recv.getRuntime().getTrue();
}
}
}
}
}
}
}
return recv.getRuntime().getFalse();
}
...in other words, if the SQL statement starts with the word INSERT (after any leading whitespace), then it's an INSERT, and should be executed with an execute_insert call.
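Stripped of the hand-unrolled switch, the whole method boils down to a case-insensitive starts-with test - something like this sketch (not the adapter's code):

public class InsertDetectSketch {
    // Skip surrounding whitespace, then compare the first six characters
    // to "insert", ignoring case. regionMatches returns false if fewer
    // than six characters remain after trimming.
    static boolean isInsert(String sql) {
        return sql.trim().regionMatches(true, 0, "insert", 0, 6);
    }
}

So isInsert("  INSERT INTO foo SELECT * FROM bar") is true, and there is no way to route an INSERT anywhere other than execute_insert.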
So, it looks like we're a bit knackered here. There are two possible solutions:
- The proper solution - fix the AR JDBC adapter (and, arguably, the MySQL Connector/J driver as well, to stop it blowing heap space), submit a patch, wait for it to be accepted and make it into the next release.
- Or, the pragmatic solution - rewrite the SQL-heavy command as a stored procedure and just call it with an EXECUTE and sod the DHH dogma.
We went with option 2 :)
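For the record, expressed at the JDBC level, the stored-procedure call looks something like this - a minimal sketch with a hypothetical procedure name. The nice side effect is that a CALL statement sails straight past the insert? check above, so it lands in execute_update and no generated keys are ever requested:

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

public class StoredProcCall {
    // Run the SQL-heavy work entirely server-side; no rows and no
    // generated keys ever travel back to the app. The procedure name
    // bulk_copy_rows is invented for illustration.
    static void runBulkCopy(Connection c) throws SQLException {
        CallableStatement cs = c.prepareCall("{call bulk_copy_rows()}");
        try {
            cs.execute();
        } finally {
            cs.close();
        }
    }
}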
Big kudos to D. R. MacIver for tracking down the source of the fail in double-quick time.