Wednesday, September 22, 2004

UDF for Converting QueryBean objects to CF Query objects

I've blogged before about the difficulties of returning complex data in CFMX webservices.
After many hours of banging my head on the desk, I've come up with a scheme that actually seems to work. I'll leave the details of it to another, more expansive post, because I've just written a function that I found extremely useful and I wanted to share it.

In my WS scheme, I'm making all my webservice methods pass back a generic object (wsgeneric.cfc, or a subclass of that) which has three properties:
  • a status code
  • an error description (if an error occurs)
  • the actual returned data - which may be anything
One thing I've found a little irritating is that when the actual data is a query, you don't get an actual query BACK. What you get is a Java QueryBean object, which doesn't have the nice convenient syntax for referring to its elements that a CF query has.

So, I whipped up a UDF for converting a QueryBean back to a CF query, and here it is:

<cffunction name="queryBeanToQuery" access="public" returntype="query" output="yes">
    <cfargument name="objQueryBean" type="any" required="true"/>

    <cfscript>
    // declare every local with var so nothing leaks out of the function
    var qry_return = "";
    var arrColumns = ArrayNew(1);
    var arrRows = ArrayNew(1);
    var thisRow = 0;
    var thisCol = 0;
    var numRows = 0;
    var numCols = 0;
    var thisVal = "";

    // compare the class NAME, rather than relying on the Class object being converted to a string
    if( objQueryBean.getClass().getName() EQ "coldfusion.xml.rpc.QueryBean" ){
        arrColumns = objQueryBean.getColumnList();
        numCols = arrayLen( arrColumns );
        arrRows = objQueryBean.getData();
        numRows = arrayLen( arrRows );
        // create the return query object
        qry_return = QueryNew( ArrayToList(arrColumns) );
        // loop round each row
        for( thisRow = 1; thisRow LTE numRows; thisRow = thisRow + 1 ){
            QueryAddRow( qry_return );
            // loop round each column
            for( thisCol = 1; thisCol LTE numCols; thisCol = thisCol + 1 ){
                // empty columns seem to give rise to undefined array elements!
                try{
                    thisVal = arrRows[thisRow][thisCol];
                }
                catch(Any e) {
                    thisVal = "";
                }
                QuerySetCell( qry_return, arrColumns[thisCol], thisVal );
            }
        }
        return qry_return;

    } else {
        // not a QueryBean: complain (hence output="yes") and hand back an empty query
        writeOutput( "THAT'S not a query bean, it's a #objQueryBean.getClass().getName()#!" );
        qry_return = QueryNew("");
        return qry_return;
    }
    </cfscript>
</cffunction>

It's not really production code, but it's saved me a lot of time, and hopefully someone out there will find it useful.
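
For what it's worth, calling it might look something like the sketch below. The variable names objBean and qryPeople are made up for illustration, and objBean is assumed to already hold the QueryBean that came back in the returned data.

```cfml
<!--- objBean is assumed to already hold the QueryBean pulled out of the web service result --->
<cfset qryPeople = queryBeanToQuery( objBean )>

<!--- the result has the usual query properties --->
<cfoutput>#qryPeople.recordCount# row(s), with columns #qryPeople.columnList#<br></cfoutput>

<!--- and it can be looped like any other CF query --->
<cfoutput query="qryPeople">
    Row #qryPeople.currentRow#<br>
</cfoutput>
```

Inside the query loop you can refer to columns in the normal qryPeople.columnName way, which is the whole point of converting it.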

Enjoy!

Wednesday, August 11, 2004

Next Steps for CFMX

Having been using CFCs for a while now, it's pretty clear that they're:
  • An excellent way to increase code reuse and maintainability
  • A good way to allow developers to code in a KIND-OF object orientated manner
  • A great leap forward for CF in general
  • A sign that CF is maturing into a "proper", "serious" programming language, rather than just a scripting language
BUT, as with any great leap forward, the CFC mechanism is also -
  • Immature, and full of niggly bugs!
From my own (highly-biased) point of view, there's a few improvements that could be made which could really raise the bar for CF, and boost it into the "grown-up" strata currently occupied by languages such as Java:

1) Proper variable scoping in CFCs
The current situation regarding CFC variable scopes has been the subject of much confusion and heated debate (For a good summary, see http://cfguru.daemon.com.au/archives/2002_08.html)

I believe that in order for CF to move forward, there should be a clear-up of the scopes, making it more explicit what is public and private. There is already a keyword ("var") to make variables explicitly local to a method, so how about tidying this up a bit?

Also related to this topic is the CFPROPERTY tag: this tag serves only to define meta-data for exposure through web-services. It seems to me that strengthening this tag would be an ideal way to tackle the scoping issue.

Suggested rules:

Any variable within a CFC that is not explicitly declared with the CFPROPERTY tag should be local by default.
This would mean that it goes out-of-scope and is destroyed at the end of the function within which it is created.

Add an "access" attribute to the CFPROPERTY tag that would function the same way as the "access" attribute in the CFFUNCTION tag
Allowed values would be public, private, and package.
Any instance properties of the CFC could then be explicitly declared as public/private/package, without all the confusing, fiddly variables. / this. palaver.

(note for Americans - "palaver" is a Yorkshire term, meaning an unnecessary fuss - a Yorkshire version of Much Ado About Nothing might be called A Right Palaver Over Nowt!)

The "this." scope should only refer to properties of the CFC
An upshot of this would be that any attempt to add a this. scoped variable which is not defined as a cfproperty would throw an error.


2) Proper Inheritance and Polymorphism
The inheritance mechanism in MX as it stands is, well.... kind-of OK, but not complete. It has a couple of oddities, most of which are related to the variable scoping issue above.

For instance, you can't call methods on a parent CFC from the child CFC with named arguments, or an argumentCollection. You must pass unnamed arguments in the order they are defined.
( see Martin Laine's post on Issues With Calling Super Method - many people have been through this ordeal! )


This makes what would have been a nice elegant parent/child "is a" relationship into an ugly kludge - for instance, in the init method, you can't pass on the arguments collection from the child to the parent.
You have to pass unnamed, ordered arguments instead. This means that the child class has to "know" about the composition of the parent class, which spoils maintainability, and just feels ... "icky". Every time you add a new property to the parent class, you have to explicitly add it into the child class as well. Ugh.
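
To make that concrete, here is a rough sketch of a child CFC's init method. The component names (parent, child) and the arguments (name, email) are hypothetical, and the parent's init is assumed to take the same two arguments in the same order.

```cfml
<!--- child.cfc (hypothetical), extending a hypothetical parent.cfc --->
<cfcomponent extends="parent">

    <cffunction name="init" access="public" returntype="any" output="no">
        <cfargument name="name" type="string" required="true"/>
        <cfargument name="email" type="string" required="true"/>

        <!--- what you'd LIKE to write, but which doesn't work as described above:
              super.init( argumentCollection = arguments ) --->

        <!--- what you end up writing: unnamed arguments, in the parent's order --->
        <cfset super.init( arguments.name, arguments.email )>

        <cfreturn this>
    </cffunction>

</cfcomponent>
```

Add a third argument to the parent's init and every child needs its own cfargument and its own super.init() call updating to match.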


Of course, the trouble with tightening up the variable scoping is that, unless it's thought out VERY carefully, it could break compatibility with existing code, which may well be why it hasn't been done yet. I'll leave that as a problem for those MM developers with access to the source code ;)











Wednesday, July 14, 2004

Arbitrary Complex Data in CFMX Web Services

I've been banging my head against a brick wall with this for a couple of days now, and it's finally dawned upon me that I'm trying to achieve something that the existing mechanism just isn't designed to handle:

I want to put a web-service interface layer on my CFCs.
I originally wanted to have my web-service interface CFCs in a sub-directory under the webroot, so they were publicly accessible but had their own Application.cfm; I also wanted to keep my back-end CFCs outside the webroot.
(maybe it's my background at an IT Security company, but the thought of putting my back-end business logic and data components in an environment where anyone can get at them just makes me feel queasy)
But, after spending an age battling with inter-component references and JRUN mappings, all to no avail, I eventually conceded defeat.
Looks like you HAVE to have your WS interface CFCs in the same directory. Mappings just get ignored.
Oh yeah, and don't use underscores in your CFC names. They won't work.
Oh, and don't use mixed-case either.

Nice one MM...

Anyway, I decided that I'd deal with security later (my old colleagues at the security company would be horrified, but sometimes you just have to suck it and see, y'know?)

Onto the main issue:

I wanted to create a CFC type to be returned from all web service methods, which would have these fields:

  • intReturnCode

    an integer constant, indicating whether the request succeeded, and if not, what went wrong.

  • vcErrorDescription

    a description of the error, if any (e.g. "Could not find the person you asked for" )

  • objReturnData

    This would hold the actual data to be returned, whatever that may be - a query, an object, an array....whatever.
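
As a bare-bones sketch, a component with those three fields could be declared along these lines. The property types and the default values are assumptions for illustration, not the actual code.

```cfml
<!--- wsgeneric.cfc: the types and defaults here are assumptions --->
<cfcomponent displayname="wsgeneric" hint="Generic return type for web service methods">

    <!--- CFPROPERTY only defines the meta-data that gets exposed through the web service --->
    <cfproperty name="intReturnCode" type="numeric">
    <cfproperty name="vcErrorDescription" type="string">
    <cfproperty name="objReturnData" type="any">

    <!--- the values themselves live in the this scope --->
    <cfset this.intReturnCode = 0>
    <cfset this.vcErrorDescription = "">
    <cfset this.objReturnData = "">

</cfcomponent>
```

Note the type="any" on objReturnData: that is the "whatever that may be" part of the design.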


All sounds fine and dandy, and I had a nice object model designed to achieve this - then I started trying to get it to work.

Over the last two days, I've seen more hair-pulling, teeth-clenching, blood-pressure-heightening, eye-popping, chair-kickingly frustrating error messages than I have done for years. There have been some great ones, from the old favourite

"Could not generate stub objects for web service invocation"

through

"Web service operation 'listPersonsByInitial' with parameters {VCEMAIL={[whatever]},VCINITIAL={[whatever]},VCPASSWORD={[whatever]},} could not be found."
...just after adding the web service 'listPersonsByInitial'

but this is my personal favourite:

"java.lang.IncompatibleClassChangeError : Dependent CFC type(s) have been modified. Please refresh your web service client."
When did I first get this error? When I was refreshing my web service client page, because I'd modified the dependent CFCs. GRRRRRRRRRRRRR! I seemed to get this every five minutes or so, even when I hadn't even changed the code!

Anyway, after swearing a lot, spending ages googling for other people with the same errors ( try googling for 'coldfusion "java.lang.IncompatibleClassChangeError"' ) and coming up with a few people with similar problems but NO applicable solutions, I posted some messages to the CF-Talk mailing list. After a few replies along the lines of "er....have you tried (one of the first things I tried) ?" I ended up saying:

"If anyone can come out with a 'I've done this and it worked first time, and I've never had any problems with it...' I'll be eternally grateful if they can walk me through it. Otherwise, I'm giving up - this is way too much of a headache."

That was yesterday. To date, no one has....

But it dawned on me just now that I'm actually trying to get the engine to do something which it fundamentally wasn't designed to do.

It seems like the WSDL which describes the web service gets cached on the client. If you want to change the web service, every client has to update their WSDL. Which is fine, I guess it makes sense that way, and I could live with that in development if it worked consistently.

The problem is that in order for a web service to be able to understand your data, the WSDL needs to describe any non-native type (pretty much anything more complicated than a string or basic number) right down to the last detail. What fields are in your complex data, what type they are... so if you're trying to return arbitrary complex data, you're not enforcing a rigid type in your data, you're leaving it to run-time. So the WSDL can't possibly describe your data ahead of time.

Bottom line: it's my own fault. I'm trying to push the SOAP/web services mechanism way outside of its scope. What I actually need is a whole host of facade classes, one for every bit of complex data, even if it's only slightly different.

Bugger.

Back to the drawing board then....

Tuesday, July 13, 2004

General ANT build and release scripts

While pottering about with SVN and ANT, I discovered - yes, you guessed it - SVNANT, a plug in custom task for ANT which provides integration with SVN repositories.

So, I had a bit of a play, and came up with two ANT scripts that I thought I'd share, to save other innocents the buttock-clenching frustration that ANT development on Windows can be.

The first script is to be run on your development machine, and it's called build.xml

This build script:

  • Connects to a given SVN repository

  • Exports all files in a given revision from the given folder to a temporary directory

  • Zips up all files within that temporary folder which have been modified since a given date

  • Clears up the temporary folder


…producing a timestamped zip file containing all files changed since a given date.

You can then upload the zip file to the live server and extract it over the live code using the second script, release.xml.

This second script:

  • Scans the build/ directory for build_*.zip files

  • Asks you to choose the build file for release

  • Backs up all files contained in the zip file to the build/ directory

  • Extracts the build_*.zip over the top of the existing files


giving you a backup file called BACKUP_(whatever your build file was called).zip, which you can easily re-apply if something goes wrong with your release.

So, to run the build.xml file on your dev machine, you will need :

You can download all the above in one package here : build_release_files.zip

I'd recommend that you put the three files in the root of your project - most projects I've worked on tend to have a structure along the lines of:

Project Root
- wwwroot
- components (or custom tags, or whatever)

so I've written the scripts on the basis of them being in the project root.

One thing to note : you'll need to set the repoURL value in build.properties to the default url of your repository in SVN, something like svn://yourserver/yourrepository/trunk

To run the script, just open up a command prompt, cd to the project root directory, type:
ant


and follow the on-screen prompts.

This will produce a build file in the Project Root/build/ subdirectory, which will be created if necessary.

On your live server, you'll need a working installation of ANT, plus this file : release.xml

Again, put this in your project root directory on the live server.

To run it, again open a command prompt and cd to the project root directory, but this time type:
ant -f release.xml


to make ant execute the file release.xml (by default it looks for build.xml)

It will look for zip files in the Project Root/build/ subdirectory, so make sure your build .zip file is in there. Again, it prompts you for everything it needs.

Standard legal disclaimer bumph: these files are provided as is, use them at your own risk, if it makes a mess of your server, it's entirely your own fault and I'm not accepting any responsibility whatsoever. Nope, none. Sorry...

Happy scripting!

Thursday, July 08, 2004

Version Control with Subversion and TortoiseSVN

Been looking at version control systems for use with our CF development. After a lot of investigation, I decided to go with Subversion (SVN) for the server, and TortoiseSVN for the client.

These two nifty little tools are free (as in beer) and open source, and they're self-hosting : i.e. the source code and version control for Subversion is done with... Subversion. SVN is intended to be "a new version control system that is very similar to CVS, but fixes many things that are broken", and I'd say it does that job very well.

I've also found the documentation to be pretty good, which is something you don't always get with open-source projects. The TortoiseSVN docs in particular are excellent - I actually found myself looking through the TortoiseSVN docs for info about Subversion rather than the Subversion docs themselves! It's probably the fact that it runs as a Windows Explorer plug-in that makes it feel very intuitive and easy to get to grips with.

Mind you, that's not to say either of them is perfect. There are a few niggles and things that you have to work around - for instance, SVN doesn't support a way to show/export ONLY the files that have changed since a particular revision or date, and setting up an initial repository can be a bit confusing until you've figured out what's actually going on - but they're a damn good place to start. There's also a Subversion API for integration with custom programs and developing addons. I found the svnant package, which gives you a custom ANT task for use in automatic build files, to be excellent.

At my last job we used StarTeam, which again took some getting used to, and was also good, but far from perfect - the lack of an explicit branching method was particularly frustrating. But StarTeam costs $699 for just the Standard version, and if you're looking for something free, open-source, widely used, well-documented and fairly easy to get to grips with, you'd have a hard time finding a better system than SVN.

I'll probably blog more details about my issues, solutions and workarounds here in future - stay tuned...

Wednesday, June 23, 2004

Collecting Your Garbage in CFMX

I came up with a handy little trick the other day, while working on a Dreaded Import Job. I wouldn't recommend relying on it, but it's a nifty last-ditch "kludge" if the pressure is on and the deadline is tight.

I was looping round a few hundred records from database X to import them into database Y. Each time round the loop, depending on the data, it was grabbing lots of related records from DB X, and performing all sorts of checks on existing records in DB Y via CFC calls.

The issue was that every time it ran, it would slow to a crawl after a couple of hundred records. A little investigation showed that it was hogging server RAM at a quite frightening rate - maybe two or three MB every second - and never releasing it until the job finished. As the available RAM decreased, the rate of processing got slower and slower, and a hasty back-of-an-envelope calculation showed that it was extremely unlikely to finish the job in finite time.

Hmmm....

A bit more digging uncovered that the problem was most probably due to some less-than-optimal memory usage inside the CFCs. For instance, the "view" method, to retrieve a record from the DB and return an instance of the CFC with the properties populated, was following a process something like this -

- Get the record from the DB
- Create a new instance of the CFC
- Populate the fields
- Return it.

Hands up who spots what's wrong with that? Anyone? What happens to the new instance of the CFC? When does it get destroyed? Ah-hah....!

--- ASIDE -----
Variable scoping, memory management and the lifetimes of objects are critical issues in a "proper" object-oriented language like Java or (particularly) C++. In fact, one of the main motivations behind the development of Java in the first place was to provide a way to free developers from the headaches of manual memory management in C++.

Having written a few fairly complex apps in both languages, I can testify that having to manually allocate AND free the exact number of bytes that you need, in C/C++ is both a curse AND a blessing - it's very easy to write code that will allocate memory and never release it, and often very difficult to track down the cause of the problem when it happens. The upside of it is that it forces you to be very aware of what the code you write is actually doing on a very low level. As a result, variable-scoping is very well defined and documented in C++ - it HAS to be.

Java, on the other hand, uses a different approach. Because a Java app runs in a virtual machine, it allows the use of a Garbage Collector. Developers can happily create new instances of objects with abandon, knowing that every so often the system's garbage collector will check for objects that have gone out of scope, or have no remaining valid references to them, and destroy those instances and free the memory they were occupying back to the heap. This gives developers a lot more freedom, at the cost of losing the hands-on control that you get with C/C++. And sometimes, when things aren't quite working as they should, you can really miss that low-level control.

As CFMX is now a 100% Java application, and CF code is actually compiled into Java classes for execution, this means that intensive, long-run-time code needs to rely on the garbage collector to free resources, but the lifetime of CFC instances and the point where they go out of scope is extremely difficult to pin down - it doesn't seem to be clearly documented anywhere that I've found in a couple of hours of solid Googling.

--- /ASIDE ---


When the above method is called hundreds of times in a loop, it looks to be creating hundreds of instances of CFCs that never seem to die. As stated above, it's very very difficult to find any documentation on this, so all I have to go on is educated guesswork, and hunches.

My hunch is that CF doesn't explicitly run the garbage collector until the end of a request. It's a fairly reasonable way to have designed it, given that 99% of all CF requests are intended to display stuff to a browser within a short amount of time. It's not a language that's really designed for long back-end tasks.

What the view method above should really have been doing was more like -

- Get the record from the DB
- Populate the properties of THIS INSTANCE
- Return this
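
Inside the CFC, that revised process might look something like the sketch below. The person table, its columns, the personID argument and the "whatever" datasource are all made up for illustration.

```cfml
<cffunction name="view" access="public" returntype="any" output="no">
    <cfargument name="personID" type="numeric" required="true"/>

    <!--- keep the query local to this method --->
    <cfset var qryPerson = "">

    <!--- get the record from the DB (hypothetical table and columns) --->
    <cfquery name="qryPerson" datasource="whatever">
        SELECT name, email
        FROM person
        WHERE personID = <cfqueryparam value="#arguments.personID#" cfsqltype="cf_sql_integer">
    </cfquery>

    <!--- populate the properties of THIS instance, rather than creating a new one --->
    <cfset this.name = qryPerson.name>
    <cfset this.email = qryPerson.email>

    <cfreturn this>
</cffunction>
```

The calling code can then create one instance before the loop and call view() on it each time round, instead of getting a brand new instance back for every record.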

It's quite a subtle difference in approach - object-oriented versus procedural - and with CFCs still being a pretty new advance in ColdFusion, it's not something you'd be likely to spot without having developed object-oriented code in other languages, and come across similar issues.

When faced with a situation like this, the "correct" thing to do is to re-develop the CFCs to do it properly. However, that takes a lot of time for re-coding and retesting the whole application. That's the downside of centralised, modular code - when you change a bit of modularised code that gets called from all over the place, you have to re-test all over the place!
The deadline, unfortunately, was very tight, so a quick "kludge" had to be found. And here it is -

We can use the fact that CFMX is 100% Java to our advantage here, by FORCING the garbage collector to run. It's actually quite simple:


<cfscript>
// NOTE: I'm typing this code from memory, on a laptop with
// no access to the code, so apologies for any thing that
// may not be 100% correct. The aim here is to illustrate
// the principle, rather than give a copy-and-paste solution

// Get a reference to java.lang.System, so we can call its static gc() method
objSystem = CreateObject( "java", "java.lang.System" );

// How often do we want it to run?
if( NOT isDefined( "attributes.forceGCEveryNLoops" ) ){
    attributes.forceGCEveryNLoops = 25;
}
</cfscript>

<!--- get import data --->
<cfquery name="qryImportData" datasource="whatever">
    <!--- SQL to select the records to import goes here --->
</cfquery>

<cfset intLoopCount = 1>

<!--- start of big loop --->
<cfloop query="qryImportData">

    <!--- do lots of complicated stuff here --->

    <!--- do we need to run the GC? --->
    <cfscript>
    if( intLoopCount GTE attributes.forceGCEveryNLoops ){
        // ask the JVM to run the GC, via the object we created above
        objSystem.gc();
        intLoopCount = 1;
    } else {
        intLoopCount = intLoopCount + 1;
    }
    </cfscript>
</cfloop>



And that's it! Yes, it's a kludge, it's a cheap-and-nasty workaround to cover over fundamental flaws in the code. But on deadline day, it might just save your skin until you get time to re-do those pesky CFCs properly.
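
If you want some evidence that the forced collection is actually reclaiming anything, a rough check is to compare the JVM's free memory either side of the call. This is only a sketch: the variable names are made up, gc() is a request that the JVM is free to ignore, and the free memory figure can also move for other reasons (the heap growing, other requests running), so treat the number as a hint rather than a measurement.

```cfml
<cfscript>
// java.lang.Runtime gives access to the JVM's memory figures
objRuntimeClass = CreateObject( "java", "java.lang.Runtime" );
objRuntime = objRuntimeClass.getRuntime();
objSystem = CreateObject( "java", "java.lang.System" );

// free memory (in bytes) before and after asking for a collection
freeBefore = objRuntime.freeMemory();
objSystem.gc();
freeAfter = objRuntime.freeMemory();

writeOutput( "Free memory changed by roughly #round( (freeAfter - freeBefore) / 1024 )# KB<br>" );
</cfscript>
```

Dropping something like this in place of the bare gc() call for a test run should show quickly whether the kludge is earning its keep.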


Thursday, April 22, 2004

CODING METHODOLOGIES ARE A GOOD THING

As a developer with 8+ years' experience now, I've worked in several different environments with several different coding styles and methodologies - sometimes the actively chosen methodology is "None At All". I've also been asked to take over many different projects at many different stages of completion, with wildly varying results.

Having just changed jobs, I now find myself in that familiar situation again - we have a large project just nearing completion, the client has asked for numerous amendments which must be completed ASAP, and the existing techies don't have the time to sit down and go through the codebase in any great depth with me.

Which brings me to my point - Coding Methodologies Are A Good Thing.

I should probably elaborate a bit here - I'm a web developer, and of the various programming languages I can program in, my primary skill is ColdFusion (CF), and my preferred methodology for CF development is Fusebox. I'm not going to get involved in the whole "is CF a 'proper' programming language?" debate here, or even the "my favourite methodology is better than your favourite methodology" slanging matches which frequently clog up the CF community forums. My point here is a general one - using ANY recognised coding methodology, even if it's customised, is nearly always better than having none at all.

I've heard many arguments against using methodologies, including the following :


  1. "Why should we let someone else tell us how to do things?"

  2. "I don't want to go searching through lots of little files for the bit I need - I want all the code for one page in one file, so I know where it is"

  3. "We'd need training on it, and we can't spare the time or money for that"

  4. "Fusebox (or methodology X) has too much overhead - we want to be able to totally fine-tune every single page for performance"

But consider this - at my previous job, there was no methodology in place. When I joined, we'd just taken over a complex online community application (which I shall refer to as Project X, to protect the guilty ;-) ). I was there for two and a half years, working on it every single day, and when I left, there were still great swathes of the codebase which no-one clearly understood and had to be treated like a black box. Ostensibly simple amendments would take days, sometimes weeks, and it was virtually impossible to have confidence in the code. Releases became a matter of "cross your fingers and pray that it works".

At my new job, I have just been handed the codebase for another complex online community application, similar to Project X but with even more complex functionality. However, THIS code was written to a methodology - a customised methodology, but still a methodology. One sentence was all that was needed to introduce me to the code - "it's Fusebox 3, but with CFC calls instead of act_ and qry_ fuses". This may sound like gobbledygook to the uninitiated, but with that one sentence, I instantly knew -


  • How the directory structure of the code would break down

  • Where to look for global-scope variables and module-specific settings

  • How the framework of the application was put together

  • How application flow would be handled

  • That interface code would be almost completely separated from back-end code, greatly simplifying the task of amending either

  • How the documentation would be structured

  • Where I would need to look to find out any more information

Of course, as an experienced Fusebox developer, I had a distinct advantage over a non-Fusebox developer. But the point is that if it had been "CFObjects with X instead of Y" or "Struts with X instead of Y" or indeed any recognised methodology with some customisations, I could have spent a few hours getting up to speed on how that methodology works, and been able to make a start.

I produced my first constructive code within a few minutes of getting my hands on the existing codebase, and my first amendments went live within a couple of hours.

Now to address the arguments against methodologies listed above:

"Why should we let someone else tell us how to do things?"

Why not take advantage of other people's existing work in figuring out the framework, and let your developers focus on what the application needs to achieve, rather than how it should be put together?

"I don't want to go searching through lots of little files for the bit I need - I want all the code for one page in one file, so I know where it is"

Fair enough, but that one file may well end up several thousand lines long, performing many different tasks. Personally, when I need to amend some functionality, I would much rather be able to zero in on one small, distinct file that performs the specific task I am interested in. Maintenance and debugging become much more simple when you KNOW that you have, for instance, changed the interface code which displays the results of an information update, but you have not touched the code which performs the update itself, or the code which reads back the altered data. Especially in larger organisations, when many different developers need to work on the same areas of the application - often simultaneously - enforcing a physical separation of interface code from functional code, via a design pattern like Model-View-Controller or n-Tier prevents change management from spiralling out-of-control very quickly. It also facilitates developer specialisation - for instance, a developer who is strong on interface design but not so good at SQL can focus on display files without being distracted by all the back-end code, and vice-versa - in other words, you can focus on the task in hand.

"We'd need training on it, and we can't spare the time or money for that"

Many studies have shown that the biggest component of software costs is ongoing maintenance. A little investment now, in terms of training on a methodology, can pay HUGE dividends in terms of savings on maintenance. Also consider the time lost in training new developers on existing code - every hour spent reading documents or trawling through spaghetti code is an hour that could be spent coding those new amendments for the client.

"Fusebox (or methodology X) has too much overhead - we want to be able to totally fine-tune every single page for performance"

Yes, Fusebox, and probably every other common methodology out there, introduces a small overhead in terms of framework code, when compared like-for-like with a single file that performs the same task. But this overhead is minimal, of the order of a couple of milliseconds, especially with ColdFusion MX compiling code into Java and caching the compiled classes in server RAM. Besides, if your site has so much load that an extra millisecond or two on a page hit really is a big problem, then servers can always be upgraded - a thousand pounds spent upgrading the CPU and memory of a server is nothing compared to the savings made on developer maintenance time through using a known, flexible, extensible methodology.

It is well known that a common killer of software projects is so-called "Feature Creep", whereby more and more features are tacked on to a system which ends up being almost unrecognisable from the original specification, and often becomes a nightmare to maintain. As the complexity increases, so does the time required to add another feature, find and fix bugs, and train a new developer, often to the point of the system eventually being scrapped and replaced from scratch. Hal Helms, one of the original designers of Fusebox, has argued that Feature Creep is a perfectly natural phenomenon which can be accounted for early on in a project through the extensive prototype phase of the FLiP methodology (Fusebox Lifecycle Process). Having used this methodology myself, I can vouch for its effectiveness - the one project for which I was able to use FLiP in its entirety was delivered by a team of four developers, two weeks ahead of schedule, to complete client satisfaction, and to date has proved virtually bug-free. Using a standard methodology like Fusebox also means someone else has already done the dirty work of figuring out how to design for extensibility in the code, making future additions and alterations far less difficult.

Of course, I would never claim that any methodology is appropriate for every project - methodologies change, adapt, and evolve, as the limits of each are found, and new solutions developed. The latest version of Fusebox, which was intended to take advantage of the Object-Oriented (OO) approach inherent in ColdFusion MX Components (CFCs), evolved into something rather different - Mach II, an OO methodology utilising Implicit Invocation and an event-listener based architecture. The design of software methodologies, like the design of software itself, has been and always SHOULD be an evolutionary process. The choice of methodology for a particular project will always come down to a balance of existing skillsets, personal experience and individual preference - and there may well be a project out there which logically needs a completely individual methodology of its own, for which no existing architecture or design pattern will do, or can be adapted..... however, in eight years of designing and developing enterprise-class software projects, I haven't found it yet.