Friday, December 02, 2005
The problem, simply stated, was that the memory, CPU and threads in use by the JRun process were growing without plateauing, until the server locked up and had to be restarted. It usually locked up once JRun was using about 1.1Gig of RAM, which happened several times a day, and nearly always in the middle of the night.
I've posted about this a few times over the last 6 months or so, and we've tried just about everything we could think of and everything anyone had suggested to diagnose and fix it - we just couldn't keep the server up for more than a few hours.
But yesterday I think we made the crucial breakthrough. Said server has now been quite happy for almost 24h, and memory usage is fairly stable at a much more respectable 500M. And the thing is, now that I've found the problem, I can't believe it wasn't one of the first things we tried....
Two days ago I whipped up a very quick-n-dirty app based on the coldfusion.runtime.SessionTracker object (more detail over at RewindLife) to list session scopes per application. I was quite surprised to see well over 1000 sessions on just one of our sites when I was fairly certain that there were nowhere near that many users "active" at the time.
The app in question is a one-codebase-serving-many-sites kind of affair, with each site having its own application scope - I'm sure you know the kind of thing - and if there were over 1000 sessions on just one of those sites, multiply that by the number of sites, and you can end up with a pretty big number.
This led me to the following chain of reasoning:
Every time an HTTP request is made to a ColdFusion application with session management enabled, the CF server looks for the CFID and CFTOKEN cookies to determine which of its in-memory session scopes to assign to that request.
If there is no relevant cookie, then a new session scope is created.
Session scopes are also per-application - the same visitor will have a different session scope for each cf application they visit.
By itself, CF only stores a very small amount of data per session (just enough to identify the user).
However, on the system in question, we have to store a significant amount of data per user as everything needs to be permission-controlled - objects and queries are also cached in session scope per-user to speed up performance.
As we are also using CF to permission-control the generated RSS feeds - both for access to the feed itself (e.g. blog feeds from a private group blog) and for what items appear in the feed (e.g. search results for a term which appears in a private group's name will depend on whether the current visitor is a logged-in member of the private group) - every access to RSS feeds goes through this same mechanism.
Most RSS readers and server-to-server RSS requests will NOT be storing cookies and re-using them - so virtually every request for RSS will create a new session scope.
I've mentioned this before on the Headshift blog : RSS Will Eat Itself? and often joked in the office that the easiest way to create a singularity would be to subscribe two RSS aggregators to each other - but this is the first time I've actually seen it happen.
The session timeout on that app used to be an hour, but it was upped to a day after users were not happy about getting logged out so often. As a result, every time an RSS reader - or another site - requests RSS from the server, a new session is created, complete with cached app-specific objects and data, and it hangs around for a full day.
Hence a pattern of gradually increasing resource usage is not that surprising.
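A rough way to see why the timeout matters so much: if every cookieless request creates a session that lives for the full timeout, the number of live sessions levels off at roughly (requests per hour) x (timeout in hours). The Python sketch below is only a toy model of that arithmetic, not of how CF's session manager actually works, and the rate of 100 cookieless requests an hour is a made-up figure for illustration.

```python
from collections import deque


def live_sessions_over_time(requests_per_hour, timeout_hours, hours):
    """Return the number of live sessions at the end of each hour.

    Toy model: every request arrives without a cookie, so it creates
    a brand new session that lives for exactly timeout_hours.
    """
    batches = deque()  # (expiry hour, session count), oldest first
    live = 0
    counts = []
    for hour in range(1, hours + 1):
        # Sessions created during this hour expire timeout_hours later.
        batches.append((hour + timeout_hours, requests_per_hour))
        live += requests_per_hour
        # Throw away every batch whose timeout has now passed.
        while batches and batches[0][0] <= hour:
            live -= batches.popleft()[1]
        counts.append(live)
    return counts


# Hypothetical load of 100 cookieless requests an hour.
day_long = live_sessions_over_time(100, 24, 48)
two_hour = live_sessions_over_time(100, 2, 48)
```

In this model the day-long timeout plateaus at 2400 live sessions and a 2 hour timeout plateaus at 200, each one carrying its own cached objects and queries.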
So yesterday, after manually restarting CF in the middle of the afternoon for the umpteenth time, I took the opportunity to take the session timeout on that app down to 2hrs - this means people will still be logged in after taking lunch, which I suspect will probably be enough - and now the results are in:
After just short of 24hrs, the server has not needed to be bounced, the number of sessions on that app is showing at 360, the memory usage is pretty stable at around 500M, and the response time has stayed snappy throughout.
Can it really have been as simple as that? I'm just glad that we've got it under control, although I'm kicking myself that we didn't think of this sooner. But then, 99% of fixing any problem is working out what it is in the first place - my old school Physics teacher (Mr Watkinson, who managed to make A-Level Physics seem both fascinating and simple, and without whom I probably wouldn't have gone on to study Physics at university, and hence wouldn't be in my present career) used to say that a problem well-stated is a problem half-solved, and I've yet to see that statement disproved.
I'm not about to unfurl the flags and announce "Mission Accomplished", but I think we can declare major combat operations more-or-less finished :)
Thursday, December 01, 2005
Well, to the Knowledge Community which we developed for the National Institute for Mental Health in England anyway. It really was a complete surprise - we were up against a £5m Norwich Union Autonomy project, and the much-talked-about Public Sector Benchmarking Service, which has impressed a lot of influential people.
The ceremony itself was an interesting affair - the mystery celebrity host turned out to be none other than Sir Trevor MacDonald, and of the many awards handed out on the night, notable mentions should go to :
- The UK Freedom of Information Act blog, for Best Implementation of a Business Blog
- The Oxford Dictionary of National Biography online edition for beating such luminaries as Google Earth and Wikipedia to the Best User Experience gong.
- I won an international award for one of the websites I've been working on!
- Did you? Oh that's nice, well done son
- ...and I shook Trevor MacDonald's hand
- Really?!? WOW! You didn't tell me that...!
(shouts off the phone to Auntie Wendy)
Alistair's met Trevor MacDonald!
- Auntie Wendy:
- Did he? Wow! That's amazing!
(shouts off the phone to cousin Adam) Hey Adam! Alistair's made friends with Trevor MacDonald!
- Really? Wow! Hey Dad! Trevor MacDonald is Alistair's best mate!
...but it made us all feel nicely warm and tingly to have our efforts recognised, and a jolly good celebration was had by all:
Tuesday, November 29, 2005
I haven't been to an awards dinner since University! - and come to think of it, my tuxedo has been standing idle in the wardrobe since the final college ball... I really hope I remembered to clean it before stashing it away - 10 yr old beer stains are not something I want to see right now...ick...
I should mention that although I've been the technical lead on the project pretty much since I joined Headshift, it was already at version 0.9 or so when I picked it up - so kudos to Matt Perdeaux, now of Associative Trails, who kicked it all off in the first place and created such nifty features as the Vector Space Engine, and Andy Birchall, who's been doing the bulk of the recent work for the latest version (2.1.2) while I've been focussed on other things.
Tuesday, November 15, 2005
For those of us who remember the Browser Wars of the 90's, this excitement is tempered by a wry smile and the occasional twinge of apprehension - for instance, although IE7 is going to support tabbed browsing, it's still not going to support full CSS 2.0 - but how can anyone fail to appreciate the power, elegance, and sheer all-round damn-fine-ness of something like the IE Tab View plugin for Firefox ?
Definitions of A Brilliant Idea are many and varied, but my favourite is "something which doesn't exist but should" - and this definitely falls into that category.
All it does is use the IE plugin to render a Firefox tab. So when you right-click on a link, as well as options for "View In New Window" and "View in New Tab", you also get "View In IE Tab" - and it does just that. Plus you get a little icon on the status bar which lets you switch between the IE and Firefox rendering engines for the current page within your current tab.
Simple, effective, genius. Love it.
Friday, November 11, 2005
Well, it's heartening to know that The Register readership has cracked it - the results are finally in on their What Is Web 2.0? Poll. Feed to your marketing department one page at a time :)
Friday, October 28, 2005
Enhancing User Interfaces with AJAX and Coldfusion - powerpoint
AJAX search suggest example code
I have to say, Niklas' presentation on Flash Forms was a pleasant surprise - I'm one of those guys who looked at CFFORM in about version 4.5, didn't like what I saw, so never looked at them again. I hadn't realised that they'd come on so much - maybe I'll look into them myself at some point.
Niklas says that there will be free beer after the next CFUG meeting, so get yourself down there!
Thursday, October 27, 2005
I introduced SVN/Tortoise SVN to our team earlier this year, and I have to say it's been fantastic - saved our butts on numerous occasions, especially when we've three simultaneous branches of development on the same system, plus live bug-fixes.
The only issues I have with it are
1) merging branches can be very counter-intuitive - I've done it numerous times, and I *still* need the help docs open while I fill in the form
2) it's a bit too easy to get a working copy into a state where you can't commit, can't update, and can't clean up. The only thing you can do in that case is check out an entirely new copy, manually port across your revisions, and junk the old copy.
3) terminology - if the "Update" and "Commit" menu options were just a bit more expansive (e.g. something like "Update this copy from repository" and "Check changes back in") it would save a fair amount of confusion amongst the less-technical team members.
All in all though, it is a great piece of software - if you haven't tried version control, or if you've only used something like Visual Source Safe or WinCVS, I thoroughly recommend giving Subversion + TortoiseSVN a try.
Tuesday, October 25, 2005
The meeting is at:
(main campus entrance),
Doors open at 6pm, and the first talk starts at 6:30.
It's been a while since I presented at a CFUG, so I'm looking forward to it. I'll be running through some AJAX tricks, going through a simple Google Suggest kind of interface at a code level, and also giving a tour of how we've used AJAX on a couple of recent projects.
The other topic I'd like to cover (if time allows) is how a good CFC architecture on the back-end can help you achieve a more powerful front-end interface that emulates desktop functionality... but there's a lot of stuff to get through, so I may end up cutting this bit short - we'll see...
Hope to see you there.
Friday, October 14, 2005
Friday, September 23, 2005
"If your web site sells products that can be delivered digitally - information, music, software, or images - why not base your site where you won't have to pay taxes on the profits or sales? Under the new e-Business Act 2000, you can set up a Cybersuite and conduct business from Vanuatu without even setting up a company. "
OK, so this is hardly new news, as Kazaa and WinMX have already chosen to incorporate in Vanuatu, but it's the first it's been drawn to my attention - maybe I'll investigate...
Tuesday, August 16, 2005
Why would a previously solid, stable, Win2K3 box running CFMX6.1 standard suddenly start throwing a java.lang.NoSuchMethodError on any CFHTTP calls?
(note: this is NOT the box which falls over in the middle of the night with a JRun Guard Page Exception, this is another one!)
This line of code:
<cfhttp method="get" url="http://www.google.com" >
...in fact ANY cfhttp call, to any url, with any combination of parameters...
has just started throwing this error:
here's the stack trace:
at cftest2ecfm1025982501.runPage(«path removed»\test.cfm:11)
The NoSuchMethodError javadoc says the following:
"Thrown if an application tries to call a specified method of a class (either static or instance), and that class no longer has a definition of that method.
Normally, this error is caught by the compiler; this error can only occur at run time if the definition of a class has incompatibly changed."
So this kind of implies that either:
- One or more of the CFMX jar files has changed
- The Java Runtime Environment has changed
yet a quick search shows that neither is the case.
java -version from the command prompt is showing 1.4.2_08-b03
The CF Administrator System Information page shows :
Java VM Version 1.4.2-b28
and this is what I get if I cd into the cfusionmx/runtime/jre/bin directory and run java -version from there.
The only files under CFusionMX that have changed since 1st August are log files, verity collections or in wwwroot/WEB-INF/cfclasses/
My first thought was "has someone been fiddling about with the JVM?" but it appears not. No new code has been released on this box in weeks, nothing has changed in the CF Administrator, and no-one has applied any updates to the box for weeks.... yet it's just started throwing this error on all CFHTTP calls.
Maybe it's that marvellous Automatic Updates feature in W2K3?
Anyone else had this problem? Anyone JUST STARTED having this?
If so, maybe we can try to find common elements and track down the cause.
I think I've isolated the cause - we had a custom Java-based CFHTTP replacement on the same box which was built around the HTTPClient lib. In deploying this project, we added an HTTPClient.jar onto the classpath, and it looks like that was causing the issue - a conflict with the httpclient.jar (note case) in the CFusionMX/runtime/lib directory.
When I deleted that second HTTPClient.jar file and restarted the server, the CFHTTP calls started working again.
HOWEVER - the custom-made Java-based CFHTTP replacement calls don't work any more........... it's another one of those "works perfectly on dev, but won't work at all on live" issues that we all know and love so well.
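For anyone wanting to check their own box for the same kind of clash, here is a minimal Python sketch that lists jar files whose names differ only by case. It is just a diagnostic aid under my own assumptions: the function names are invented for this example, and a matching name only flags a candidate conflict, it doesn't prove the two jars contain incompatible classes.

```python
import os
from collections import defaultdict


def group_colliding_jars(paths):
    """Group jar paths whose file names are equal when case is ignored."""
    by_name = defaultdict(set)
    for path in paths:
        name = os.path.basename(path)
        if name.lower().endswith(".jar"):
            by_name[name.lower()].add(path)
    # Only names that were found in more than one place are of interest.
    return {name: sorted(found) for name, found in by_name.items() if len(found) > 1}


def files_under(directories):
    """Yield the path of every file below the given directories."""
    for directory in directories:
        for root, _dirs, files in os.walk(directory):
            for name in files:
                yield os.path.join(root, name)
```

You would call it as `group_colliding_jars(files_under([...]))`, passing in whichever directories are on your own classpath.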
Thursday, August 11, 2005
One of our servers (W2K3, CFMX 6.1 standard) keeps falling over in the middle of the night, when there's virtually nothing running.
The error message we get in Event Viewer starts with :
"Application popup: jrun.exe - Application Error : The exception Guard Page Exception..... blah blah blah"
The CF logs show nothing at around that time: no exceptions, nothing in the application log, nada.
We've taken to running CF from the command prompt so that we can see any error messages it produces, and last night it said this:
"An End of Array or Structure has been reached, The exception was throw at the location (a memory location number) "
....which Google appears NOT to have heard of. Hmmm......
I only found one other mention of the Guard Page Exception in relation to JRun, which doesn't look like it was ever resolved.
We've been through just about every single "tune your JVM settings" post out there, including RobiSen's exhaustive tuning guide
I figure that if we're having this error, surely we're not the first - I guess people are just keeping quiet about it, or something...
So, has anyone else out there ever had this error? Anyone managed to fix it?
Wednesday, August 03, 2005
The usual way that people approach code and commenting is to dive in, code away feverishly through several iterations until they've got something that seems to work, THEN go back through it and add comments - if they have the time...
One trick I picked up early on in my development career - way back when I was coding massive invoicing systems on WYSE32 green screen terminals in 8-bit COBOL - that radically changed the way I approach the coding process is this: Write your comments FIRST.
Yes - first. That's BEFORE you write any code. It's a very handy way of planning out your structure, algorithms, and data flow on complex chunks of code before anything is set in stone.
Otherwise, if you dive right in there and start coding away, you often realise halfway through that your first approach is not going to work, and you have to go back and change THIS structure, change THOSE parameters, add THIS call, take THIS chunk out into a separate function, etc etc...
I'm convinced that the vast majority of bugs in code are not due to lack of syntax knowledge, or even typos, but actually down to insufficient planning - not thinking the code through well enough before starting. Of course, this happens for many reasons - usually down to time constraints and/or scope creep - but the end result is the same.
Try writing your comments first - you should be able to walk through your process, follow the data, and catch most of the most common gotchas before you even write a single line of code. If you can't, then you haven't written your comments in enough detail.... Of course, you're always going to have to do the occasional rushed hack job, but I've found this approach tends to actually save me time in the long run.
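To make that concrete, here's a small made-up example in Python (the invoice function and its parameters are invented purely for illustration). The four numbered comments were the plan, written before any code existed; the lines under each one were only filled in afterwards.

```python
def invoice_total_pence(line_items, tax_percent):
    """Return the invoice total in pence, including tax.

    line_items is a list of (quantity, unit_price_pence) pairs.
    """
    # 1. Reject input we cannot bill: negative quantities, prices or tax.
    if tax_percent < 0:
        raise ValueError("tax_percent must not be negative")
    for quantity, unit_price_pence in line_items:
        if quantity < 0 or unit_price_pence < 0:
            raise ValueError("quantities and prices must not be negative")

    # 2. Add up quantity * unit price for every line to get the subtotal.
    subtotal = sum(quantity * price for quantity, price in line_items)

    # 3. Work out the tax on the subtotal, rounded to a whole penny.
    tax = round(subtotal * tax_percent / 100)

    # 4. Return the subtotal plus the tax.
    return subtotal + tax
```

Walking through the comments alone is enough to spot a question like "what happens with an empty invoice?" before a line of code has been typed.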
Give it a go - <mrs doyle>ah go on.... aaAAAhhhh go ON, go On, go on go on go on....!</mrs doyle>
Wednesday, July 27, 2005
Tuesday, July 19, 2005
Thursday, June 30, 2005
We explained that time spent here was saved many times over in the final build, and that when you have a really good FULL prototype, it leads you naturally to a complete architecture, and that doing it this way meant that the user experience was driving the back-end rather than the other way round, etc etc.... but he still wasn't convinced.
It turned out that although he was experienced in managing software projects, they were nearly all Windows desktop-based GUI apps, and that's how he was looking at this web system:
In the Windows GUI world, a "Prototype" is generally something that you whip up quickly in something like Visual Basic, but then completely throw away when it comes to the final build stage, which is typically done in C++. This was the root of his concern - he didn't want to spend the majority of his budget on something which he thought would be thrown away once complete.
So we explained that the display templates we develop during the Prototype stage can (with a little foresight and hopefully not too much tweaking) pretty much be dragged into the final app, and he brightened up considerably. Once we decided to rename the Prototype stage as "Front-End Development" he was sold.
I guess it goes to show the power of a name, and the danger associated with differing interpretations of what seems a fairly innocuous word. But then again, eliminating ambiguities from textual descriptions of abstract notions is part of the whole motivation for using a "Prototype" rather than a Requirements Document in the first place.... makes me wonder how many other times the words we use have counted against us, without us ever even knowing it.
Friday, May 20, 2005
"An internal error occurred while showing an internal error" ???
Aww, poor love....
Tuesday, May 10, 2005
(The colours and system names have been changed to protect the guilty...)
The figure below provides a high level site map for the «app name deleted» application:
There. Perfect. I've racked my brains trying to think of an appropriate comment to make, but.... I just can't add anything to it.
I have a system that needs the same code deployed across 1 Win2K3 box, and 1 Redhat box. Same code on each.
In this system, I use a custom 404 to allow the site owner to specify an arbitrary "short link" for a CMS page.
e.g. Owner of site X creates a page with the title "About Us", and gives it a short link of "about".
He/she can then use the url "http://(site domain)/about" to access that page.
Simple enough, shouldn't be anything complicated there, right? I just have a custom 404 that uses CGI variables to parse the requested URL and redirect to the appropriate page, right?
It works fine on the Windows box, but for some reason, Apache is not passing on the requested URL to the 404 page. I've dumped the CGI scope in the 404 page, and I get nothing mentioning the requested URL. Nada. Zip. Bugger all.
QUERY_STRING is empty, and although the Apache documentation says it will provide a whole load of REDIRECT_xxxx variables, I don't get any of them in CFMX.
I freely admit I'm not an Apache expert, so if anyone out there IS, and they know what I'm missing - can you help?
Red Hat Enterprise (9)
CFMX Standard 6.1, updater 3
UPDATE 8th August 2005: FIXED! I discovered that although a cfdump of the CGI scope doesn't show the existence of the REDIRECT_URL key, if you just test for existence with StructKeyExists( CGI, "REDIRECT_URL" ) it *is* actually there, and you can use it.
....which raises some curious questions about how cfdump is working under the hood....
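For what it's worth, the short-link lookup itself boils down to something like the Python sketch below. This is a language-neutral illustration rather than the actual CF code: the function names and the `short_links` mapping are invented for the example, and the "404;" prefixed QUERY_STRING format used as the fallback is my assumption about what the Windows box hands over, so check it against your own CGI dump.

```python
from urllib.parse import urlsplit


def requested_path(cgi):
    """Work out which path the visitor originally asked for."""
    # Apache: the original URL turns up in REDIRECT_URL.
    redirect_url = cgi.get("REDIRECT_URL", "")
    if redirect_url:
        return urlsplit(redirect_url).path
    # Assumed fallback: a query string of the form "404;http://host/about".
    query = cgi.get("QUERY_STRING", "")
    if query.startswith("404;"):
        return urlsplit(query[4:]).path
    return ""


def resolve_short_link(cgi, short_links):
    """Return the CMS page id for the requested short link, or None."""
    slug = requested_path(cgi).strip("/").lower()
    return short_links.get(slug)
```

So with a mapping of `{"about": 42}`, a request for /about resolves to page 42 and anything unrecognised comes back as None, at which point you show the real 404.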
Friday, April 08, 2005
Since we agreed a few months ago to give it a try, we've had a couple of attempts at using FLiP on projects that I haven't been involved in, but somehow it hasn't quite worked out as planned - and in both cases I'm convinced that the problem wasn't the process, it was that we hadn't applied the process thoroughly enough.
In the first case, it was a major upgrade to an existing application. This application was coded years ago, in CF5, in a page-based style - no Fusebox or any other methodology - and no-one could claim to have a full grasp of the existing functionality. It turned out that what had been signed off as a Prototype was only a small percentage of the final application - more like a design mock-up of about three or four pages. As a result, it went seriously over budget and a couple of months after go-live, we're still discovering features that were in the original version but not in the new release, and we're still fixing issues on a regular basis.
The second case was much better - a proof-of-concept (i.e. didn't have to be production quality) app that was a fresh build, but on a tight time scale. Again, I wasn't really involved in the development, just on a technical advisory level, but I tried to ensure that we followed a better process.
Anyway, on the day of the all-important demo to the client, I was feverishly coding away on another project when I heard the main consultant behind me saying to a designer:
"Can we change that font? Can we bring this out more and change that colour there...?"
When I interjected with:
"these are things that should have been sorted out in the prototype phase!"
the consultant replied
"Yeah, but the thing is, we couldn't really get a feel of the flow from the prototype, and it's been such a tight timescale...."
.....so the process was circumvented "just this once".....
The thing is, it's a fair point that he made, but "just this once" is a slippery slope. Before you know it, there'll be another "just this once", then another, and then it becomes the norm rather than the exception.
In the end, the client demo came and went very smoothly - the client was actually blown away by what we'd developed. He was expecting an extremely rough proof-of-concept, and we gave him a damn-near finished app.
There are lessons we can learn in relation to the consultant's issues for the next project:
The tightness of the deadline shouldn't really be an issue. I've successfully used a full FLiP wireframe/prototype/build process on a site with a timescale of ONE AFTERNOON before, and found it an invaluable tool for focussing the minds of developers and project managers/business owners/clients alike onto
- what are we building?
- (just as important) what AREN'T we building?
- how should it look?
- how should it behave?
Flow and Feel
If you couldn't really get a feel of the flow of the app from the prototype, then the prototype wasn't as complete as it could/should have been. If you need to highlight a newly-added element, for instance, then you should have a page on the prototype that shows this - either a separate page, or a simple parameter passed to the existing page.
Any time the finished app will need to present something differently to the user, this should be reflected in the prototype. This may be tedious, and may feel like nit-picking, but it's important - the process of implementing this in the prototype means that you've
- identified a feature that needs to be in the final app
- thought about - and got feedback on - how it's going to look and behave
- at least thought about how it's actually going to be implemented
If the reason for not doing it in the prototype was time constraints, then you may have a problem - if it's going to impact your deadline to just mock it up in HTML, then what does that tell you about the final dynamic version?
I've actually come round to thinking of prototypes not as mock-ups per se, but as front-end development. It's a bit of a subtle mental shift, but it helps me to justify the time they take to myself and others - and I've yet to come across a project where a decent prototype has been done, but I can't re-use the front-end code in the finished app. Sure, you'll probably need to change it slightly and add more CF, etc, but I've never felt like it was wasted time.