Thursday, August 11, 2005

JRun Guard Page Exception ???

Now here's a strange one:

One of our servers (W2K3, CFMX 6.1 standard) keeps falling over in the middle of the night, when there's virtually nothing running.

The error message we get in Event Viewer starts with:
"Application popup: jrun.exe - Application Error : The exception Guard Page Exception..... blah blah blah"

The CF logs show nothing at around that time: no exceptions, nothing in the application log, nada.

We've taken to running CF from the command prompt so that we can see any error messages it produces, and last night it said this:

"An End of Array or Structure has been reached, The exception was throw at the location (a memory location number) "

....which Google appears NOT to have heard of. Hmmm......

I only found one other mention of the Guard Page Exception in relation to JRun, which doesn't look like it was ever resolved.

We've been through just about every single "tune your JVM settings" post out there, including Robi Sen's exhaustive tuning guide.

I figure that if we're having this error, surely we're not the first; I guess people are just keeping quiet about it, or something...

So, has anyone else out there ever had this error? Anyone managed to fix it?

15 comments:

Niklas Richardson said...

Have you monitored the JVM memory or system memory?

It might be worth outputting that incrementally to see whether you're hitting a ceiling with either the heap memory or the perm memory.
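A minimal sketch of what such a check could look like in Java, assuming Java 5 or later (the java.lang.management API does not exist on the 1.4.2 JVM this server runs). It walks the JVM's memory pools, heap and non-heap (the perm pool is usually named something like "PS Perm Gen" with the parallel collector, or "Metaspace" on Java 8 and later), and flags any pool at or above a threshold of its maximum. The class name and the 90% threshold are hypothetical, and it only reports on the JVM it runs in, not on a separate jrun.exe process.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

class MemoryCeilingCheck {
    // Fraction of the pool's maximum currently in use, or -1 when the pool has no defined maximum.
    static double usageFraction(long used, long max) {
        if (max <= 0) {
            return -1.0; // MemoryUsage.getMax() returns -1 when the maximum is undefined
        }
        return (double) used / max;
    }

    // True when usage has reached the given fraction of a defined maximum.
    static boolean isNearCeiling(long used, long max, double threshold) {
        double fraction = usageFraction(used, max);
        return fraction >= 0 && fraction >= threshold;
    }

    public static void main(String[] args) {
        double threshold = 0.90; // hypothetical alert level: 90% of the pool's maximum
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // getUsage() returns null once a pool is no longer valid
            }
            String flag = isNearCeiling(usage.getUsed(), usage.getMax(), threshold)
                    ? "  <-- near ceiling" : "";
            System.out.printf("%-25s %-16s used=%,d max=%,d%s%n",
                    pool.getName(), pool.getType(), usage.getUsed(), usage.getMax(), flag);
        }
    }
}
```

Running it periodically (or folding the loop into a scheduled job) gives the incremental picture; a pool that keeps creeping towards its maximum is the one to look at.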

Have you got all the updaters installed?

Alistair Davidson said...

We did have a lot of memory issues with this server, with JRun seemingly grabbing as much memory as it could and never releasing it, until the server fell over.

We tried all manner of JVM settings, mostly experimenting with -Xmx and -Xms, plus UseParallelGC etc. Although we never entirely got rid of the memory issues, we got them down to a more manageable level (i.e. crashing once or twice a week as opposed to several times a day), and it seemed to stay up OK at peak load times, so we stuck with that.

The memory issues have been going on since about April, and our sysadmin guy has been looking after it day to day. That is, until this morning, when the boss pointed at me and said, effectively, "you're the CF guru, you fix it!" :-)

So yes, as far as I know, we've got all the updaters.

Niklas Richardson said...

Hi Alistair,

As a way to see exactly what's happening to memory usage, I would suggest installing and running VisualGC while you have CFMX running from the command line. This will give you a really detailed analysis of what is happening inside the JVM. Check out this link for more information:

http://www.petefreitag.com/item/141.cfm

Things have changed a bit since Pete's posting (VisualGC now requires JRE 1.5), but it's fairly easy to get running once installed.

If you don't mind me asking, what spec of machine do you have? Do you have anything else running on the server alongside ColdFusion MX (e.g. databases) that could be stealing resources?

Alistair Davidson said...

Thanks for the suggestion, Niklas. Unfortunately the server only seems to fall over in the middle of the night, between about 2am and 5am. Short of sitting in the office all night waiting for it to fall over, I'm not sure how much use it would be.

I did try profiling the app on our dev system with VisualGC a couple of months ago, and I used that to try and tune the JVM settings on the live system.

If it requires JRE 1.5 then that's a problem: we tried 1.5 on that same server a couple of months ago to try and solve the memory issues, but it just made the problem catastrophically worse, so we had to revert to 1.4.2.

The machine is reasonably beefy - it's a dual Xeon with 2 gigs of RAM.
We have a separate database server which it connects to. The only other thing of note running on the same box is ASP, plus a couple of helper apps (LDaemon, MDaemon).

I've just checked it now, and the jrun.exe process is using 1% of the CPU, with mem usage at 497M and VM Size at 1,151M, with 74 threads. During the day, it usually stays up, but then falls over in the middle of the night.

Andy Allan said...

Any scheduled tasks at that time? Or maybe backups that are kicking in?

Alistair Davidson said...

The last scheduled task completes around 00:30, and the backups kick in at 1am, but they're finished by 2.

Niklas Richardson said...

Hi Alistair,

You can still use VisualGC with Java 1.5 by installing Java 1.5 separately from the version used by ColdFusion MX, so you don't affect the version of Java that CFMX is using.

I've used this to great effect on a number of performance tuning / JVM tuning engagements for existing clients.

It does seem strange that it will just crash in the early hours for no apparent reason.

To debug this more I would however want a bit more information about the JVM metrics. Have you set up the JVM metrics to log to a file so you can see what the memory is at before it crashes?
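Since the crashes happen between about 2am and 5am, one option is a logger that appends a heap snapshot to a file at a fixed interval, so the last lines written before a crash survive it. A minimal sketch, assuming Java 8 or later (lambdas, try-with-resources, java.time); the class name, the default file name jvm-memory.log and the one-minute interval are hypothetical. The Runtime calls it uses also exist on 1.4.2, but as written it reports on the JVM it runs in, not on the JRun process.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.time.LocalDateTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class OvernightMemoryLogger {
    static final long MB = 1024L * 1024L;

    // One CSV line: timestamp, used heap, committed heap, maximum heap (all in MB).
    static String formatLine(String timestamp, long usedBytes, long committedBytes, long maxBytes) {
        return timestamp + "," + (usedBytes / MB) + "," + (committedBytes / MB) + "," + (maxBytes / MB);
    }

    // Appends one snapshot; opening in append mode and closing each time means
    // every line is flushed to disk before the next interval.
    static void appendSnapshot(String path) throws IOException {
        Runtime rt = Runtime.getRuntime();
        long committed = rt.totalMemory();
        long used = committed - rt.freeMemory();
        try (PrintWriter out = new PrintWriter(new FileWriter(path, true))) {
            out.println(formatLine(LocalDateTime.now().toString(), used, committed, rt.maxMemory()));
        }
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : "jvm-memory.log"; // hypothetical default file
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Runs until the JVM is stopped, writing one line per minute.
        scheduler.scheduleAtFixedRate(() -> {
            try {
                appendSnapshot(path);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }, 0, 60, TimeUnit.SECONDS);
    }
}
```

The resulting file can be opened in a spreadsheet the next morning to see whether heap usage was climbing in the minutes before the crash.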

Also, another thought I just had: perhaps the perm space is running out of memory?

What are your current JVM settings?

Alistair Davidson said...

Thanks, I'll give that a try.

Our current JVM settings are (copied from runtime/bin/jvm.config and re-formatted a bit for readability):

# Arguments to VM
java.args=
-server
-DJINTEGRA_NATIVE_MODE
-DJINTEGRA_PREFETCH_ENUMS
-Xms1024m
-Xmx1024m
-Dsun.io.useCanonCaches=false
-Xbootclasspath/a:"{application.home}/../lib/webchartsJava2D.jar"
-XX:MaxPermSize=256m
-XX:+UseParallelGC
-XX:NewSize=64m
-XX:PermSize=64m
-Djavax.xml.parsers.SAXParserFactory=com.macromedia.crimson.jaxp.SAXParserFactoryImpl
-Djavax.xml.parsers.DocumentBuilderFactory=com.macromedia.crimson.jaxp.DocumentBuilderFactoryImpl

...but these have tended to change every couple of days as we try something else!

Niklas Richardson said...

Hi Alistair,

Okay.

Now, that heap size is large. Have you seen the memory usage in the JVM go up that high?

Check out the link below for some ColdFusion code to look at the memory usage of the JVM for your ColdFusion server:

http://www.petefreitag.com/item/115.cfm

or an updated version with stats over time:

http://www.prismix.com/blog/archives/2005/08/coldfusion_memo.cfm

Remember that you only want your JVM memory set around 256MB above the maximum amount of memory you have seen your JVM use when your application has been under load. If you assign TOO much memory, you'll lose performance (due to garbage collection).
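The rule of thumb above is simple arithmetic: take the peak heap usage observed under load and add roughly 256MB. A tiny Java sketch of it, where the class name and the 600MB example peak are hypothetical. The input should be heap usage measured inside the JVM, not the process size shown in Task Manager.

```java
class HeapSizeRuleOfThumb {
    static final int HEADROOM_MB = 256; // margin above the observed peak, per the rule of thumb

    // Suggested maximum heap: largest heap usage seen under load plus the headroom.
    static int suggestedMaxHeapMb(int peakObservedMb) {
        if (peakObservedMb < 0) {
            throw new IllegalArgumentException("peak must be non-negative");
        }
        return peakObservedMb + HEADROOM_MB;
    }

    // Formats the suggestion as a jvm.config argument.
    static String xmxArgument(int peakObservedMb) {
        return "-Xmx" + suggestedMaxHeapMb(peakObservedMb) + "m";
    }

    public static void main(String[] args) {
        int peak = args.length > 0 ? Integer.parseInt(args[0]) : 600; // hypothetical observed peak (MB)
        System.out.println(xmxArgument(peak));
    }
}
```

With the hypothetical 600MB peak it prints -Xmx856m; a peak of 768MB would justify the 1024m currently configured.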

You'll also want to monitor the perm size (best using visualgc or the JRun metrics) for this.

If you are using a lot of memory variables you will want to increase the perm size.

Unfortunately, this won't explain why you're having these problems. But it will certainly help you debug it a bit more and tune that memory! :)

Here is another helpful collection of tuning articles:

http://www.petefreitag.com/tag/jvm

Hope that helps.

Alistair Davidson said...

Yes, we have a large heap size - we had to set it that high to stop the server falling over with OutOfMemory exceptions. The JRun service would just keep grabbing more and more memory until the server died. I found this technote at MM which seemed to imply that this is actually not unexpected behaviour:

http://www.macromedia.com/cfusion/knowledgebase/index.cfm?id=tn_17517

" If your application needs a certain amount of memory at one point in time, it will never need less. This makes it appear that memory usage is always rising and giving the impression of a leak."

I did try interrogating the runtime service as in Pete's example code, but I found that the numbers it returned bore virtually no resemblance to the actual memory usage - maybe that's a Standard vs. Enterprise thing?
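One likely reason for the mismatch, independent of edition: Runtime.totalMemory() and freeMemory() describe only the Java heap, while Task Manager's figures cover the whole jrun.exe process, including the perm space, thread stacks and native code. With -Xms1024m the full initial heap is committed at start-up, so the process looks large however little of it the application is actually using. A sketch that prints the heap figures alongside the non-heap total, assuming Java 5 or later for MemoryMXBean; the class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

class HeapVsProcess {
    static final long MB = 1024L * 1024L;

    // Heap actually occupied by objects: committed heap minus its free part.
    static long usedHeap(long totalMemory, long freeMemory) {
        return totalMemory - freeMemory;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        long heapCommitted = rt.totalMemory();
        long nonHeapCommitted = mem.getNonHeapMemoryUsage().getCommitted();
        System.out.println("heap used (MB):          " + usedHeap(heapCommitted, rt.freeMemory()) / MB);
        System.out.println("heap committed (MB):     " + heapCommitted / MB);
        System.out.println("heap max, -Xmx (MB):     " + rt.maxMemory() / MB);
        System.out.println("non-heap committed (MB): " + nonHeapCommitted / MB);
        // Thread stacks and other native allocations are in neither figure,
        // so the OS-level process size is usually larger still.
        System.out.println("heap + non-heap (MB):    " + (heapCommitted + nonHeapCommitted) / MB);
    }
}
```

Comparing "heap used" with "heap committed" shows how much of the reserved heap is really in use, which is the number that matters for tuning -Xmx.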

I'll see what I can do in terms of logging metrics and so on.

robi said...

What logs have you checked?
Have you seen any JVM exceptions?

robi said...

BTW what version of the JVM are you running? Also can you isolate what code is running at the time you have the issue?

Alistair Davidson said...

Hi Robi

The JRun Guard Page Exception seems to have stopped happening over the last couple of months, although the memory issues have been continuing - but I think we made a major breakthrough in this issue just yesterday afternoon, which I'm about to create a new post about..... because it's something so smack-your-forehead simple that I can't believe we didn't think of it before.....

Anonymous said...

Hello Buddy,
I'm facing the same problem, but on a Linux server with IBM JDK 1.4.1, running Anthill. I've just read through your post but couldn't find the solution you arrived at.
You mentioned posting the solution in a separate post, but I couldn't find that one.
It would be great if you could let me and the others know about this holy grail.


Thanks a lot in advance.

Note: I'll keep checking your blog for a while to see whether you've seen this comment and can help us out.

Alistair Davidson said...

anonymous: I've had quite a few people ask me what the "silver bullet" was, and I'm afraid it's been a while since I wrote this post and my memory is a bit fuzzy, but I'm fairly sure it was connected to the spiralling memory issues that I resolved here: RSS Ate My Server

Do you use session variables?
Do you dynamically generate RSS?
If so, check out that post and see if it helps.