Wednesday, December 20, 2006

(NSFW) Taking "User Generated Content" a bit too far

The Register is reporting on a heroic altruist's solution to a sticky situation regarding copyright-infringing contributions to Wikipedia :

Wikipedia semen shortage filled by User Generated Content

....well, it's certainly very public-spirited of the honourable member.

Wednesday, December 13, 2006

Going loco with Mo'MoSoSo

Nice to see that m'former'colleague Rik Abel is valiantly continuing his one-man crusade to get the gloriously-silly acronym MoSoSo more widely adopted, whilst simultaneously pointing out its majestic silliness in an ironic, post-modernist, self-referential way. Or something....


Don't get me wrong, I think Mobile Social Software could well be the killer app of Web 2-point-whatever-we're-up-to-now, but seriously folks - MoSoSo? That's Well Jackson!*

Make sure you don't target your MoSoSo applications at the Small office / Home Office market, 'cause that would be SoHoMoSoSo.

And if your users wore black eyeshadow and jaggedy-cut hairstyles, would that make it SoHoEmoMoSoSo ?

And if it was in a Bohemian New Romantic style...? (BohoRomoSoHoEmoMoSoSo)

Yikes, this one could run...


* - look here if you don't get the reference

Tuesday, December 12, 2006

Trusting The Magic Pixies - Hibernate HQL

<rant>
I'm having trouble trusting the Magic Pixies. I admit, it's a common theme of mine, and maybe it's just NIH syndrome-by-proxy, but Magic Pixies - able and willing though they are - can only do your job for you if you ask them in just the right way. And sometimes they seem like they're being deliberately dumb and obstinate little buggers that just won't do as they're told, dammit!

(ahem) Perhaps I should elaborate here...

The Magic Pixies in question here are the ones that do the "hard" work for you in Hibernate, the near-as-dammit de facto standard ORM system for Java. Or Persistence Management. Or whatever term you prefer. Pete Bell has been blogging in great detail about the process of writing his own cf-based ORM framework, so if you're not familiar with ORM systems then you're probably best going to check out his blog, because this is a pure and simple straightforward bit of spleen venting.

The basic idea of Hibernate is that you can write your object model just as you choose, and then map your objects to persistent entities using (surprise surprise) an XML config file - or, if you're using Java 5 and EJB 3 and Hibernate 3, then you can dispense with the XML config file (yay!) and use annotations instead (double yay!). So long as you construct your mappings correctly, then you "don't need to worry about" the SQL - the Magic Pixies of Hibernate will auto-generate your db schema and SQL queries, and magically do your CRUD for you with a sprinkling of their Magic Pixie Dust™.

"But what if I want to do something a bit more complex than basic CRUD?" I cried.

"Like what?" said the Magic Pixies

"Like a left outer join?" I replied

"Oh, you don't need to worry about all that nasty SQL" said the Magic Pixies, " because that would tie your code to your database system, and that's a BAD THING! BAD Al! BAAAAAD!"

"Gosh, sorry Magic Pixies," I said, rather sheepishly, "I promise to use that nice abstracted Criteria API that you so generously provided in future"

"And so you should!" said the Magic Pixies, "Remember, you do the code, we do the data, otherwise we'll have harsh words with our union rep, OK?"

"Ok! Ok!" I said. "Now could you stop hitting me with that rolled-up newspaper please?"

"So long as you promise to be good"

"I do! I do!"

"Ok then"

"But what if I want to do something more complex than that?" I asked.

The Magic Pixies looked a bit puzzled.

"What on Earth could you want to do that's more complex than that?" they replied.

"Well, what if I wanted to find a set of objects of type X that didn't have any corresponding objects of type Y that match certain criteria?"

"Pfffft!" said the Magic Pixies, " that's easy, you just create Criteria along the association paths!"

"Er, huh?" I said.

"You just create a Criteria object for class X, and then create another Criteria object using that Criteria object by passing the name of the property of class X which refers to the encapsulated class Y contained within class X!"

"Huh?" I said.

"Or if you really want to, you can use HQL"

"HQL?"

"Yes, Hibernate Query Language. It's almost like SQL, but not quite. Because we wouldn't want you using SQL - SQL's tied to database platforms, and that's BAAAAAD"

"So how do I do it in HQL?"

"You create a query that queries along the association paths and properties of the objects"

"Oh, right, OK" I said. "But what if class X doesn't have class Y as a property?"

"Er..... huh?" said the Magic Pixies.

"Class X has no property that refers to class Y"

"Well then, you won't need to query for it, will you?" said the Magic Pixies, a touch too smugly for my liking.

"But I do!" I insisted.

"Er.... huh?" said the Magic Pixies.

"Well, say if I had a table of ItemLinks...." I began.

"A WHAT of ItemLinks????"

"Sorrysorrysorry! I mean an ItemLink object..."

"That's better!"

"...that represented a weighted link between two Items, such as might be calculated by some very complicated fuzzy logic and Natural Language Processing"

"OK"

"...and a separate LinkPreference object that represented a preference expressed by a Person as to whether their ItemLink to a particular object would be public or not"

"Erm... can you give us an example?"

"Sure - this clever NLP stuff might detect that Bob from SysAdmin has been talking a lot about clustering database servers, and he might want to share that link so that he is known as an expert in that field."

"OK, with you so far..."

"But it might detect that the boss has been talking with his secretary about a dirty weekend in Brighton, and they really wouldn't want that shared at all, would they?"

"Erm, isn't that an outmoded stereotype that just reinforces age-old gender-typecast notions of sycophantic star-crossed secretaries as prey for the equally-stereotypical notion of amoral boss-as-alpha-male-predator?"

"Alright, alright, but you get the idea!" (that Magic Pixie was really starting to get on my titty ends)

"Yes, I follow you"

"So what if I want to query for all ItemLinks that have been created in the last, say, two weeks, and that don't have a corresponding LinkPreference?"

"Well, you could query for ItemLinks that have LinkPreference set to null"

At this point I was starting to snort quite heavily.

"But I told you, ItemLink doesn't have a property that refers to LinkPreference! The two are completely independent!"

"Well then you shouldn't want to query for them" said the Magic Pixies

"But I DO!"

"Well, can't you follow the association path up from ItemLink to Item and then down to LinkPreference?"

"Well, yes, I could, but wouldn't that result in the Items table being read in the query when there's absolutely no need for it?"

The Magic Pixies looked down at their feet

"...might do..."

"And isn't that horribly inefficient?"

They started fiddling with their shorts

"...might be..."

"And it's not that simple anyway, because it's a compound join on TWO properties!"

"...yes..."

"WHAT was that?"

"yes!" said the Magic Pixies, with bottom lip sticking out.

"So can you perform this raw SQL query in your own way?"

SELECT item_links.*
FROM item_links
LEFT OUTER JOIN link_prefs
ON item_links.item_id = link_prefs.owner_item_id
AND item_links.other_item_id = link_prefs.linked_item_id
WHERE
item_links.created_at > ?
AND link_prefs.shared IS NULL



"...might do, if you ask us nicely..."

(sigh) "OK, can you pleeeeeeease do it?"

They conferred for a moment in hushed whispers, and then turned back with a very smug-looking smile, and said

"Yes, we can - but we're not going to"

"WHAT?"

"You have to ask us in the right way"

Steam was starting to emerge from my ears

"And what IS the right way to ask you?"

They grinned even wider

"We're not going to tell you!"

And I stormed out of the room.

You see, the trouble I have with ORM systems is that they're all well and good as far as they go, and yes, they can save large amounts of "donkey work". But sooner or later you nearly always come up against something that would be almost trivially easy to do with raw SQL, but that the nice insulated ORM abstraction just can't deal with. I know that I'm probably looking at this from the "wrong" direction - thinking about the data rather than the objects - but until the Magic Pixies start to play a bit more nicely, I'm always going to be a bit suspicious of them.
</rant>

(deep breaths..... calm.... happy thoughts..... nearly Christmas....)
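For anyone who wants to pin down exactly what that query is supposed to return, here is a rough sketch of the same logic in plain Java, with no Hibernate involved. The class and field names are made up to mirror the SQL columns, and the sketch assumes that "no corresponding LinkPreference" means "no LinkPreference row at all for that pair of ids" (the SQL's IS NULL test would also match a row whose shared column happened to be null).

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for the mapped classes; field names mirror the SQL columns.
final class AntiJoinSketch {

    static final class ItemLink {
        final long itemId;
        final long otherItemId;
        final Instant createdAt;

        ItemLink(long itemId, long otherItemId, Instant createdAt) {
            this.itemId = itemId;
            this.otherItemId = otherItemId;
            this.createdAt = createdAt;
        }
    }

    static final class LinkPreference {
        final long ownerItemId;
        final long linkedItemId;

        LinkPreference(long ownerItemId, long linkedItemId) {
            this.ownerItemId = ownerItemId;
            this.linkedItemId = linkedItemId;
        }
    }

    // The join is on TWO properties, so the lookup key has to carry both.
    private static String compoundKey(long first, long second) {
        return first + ":" + second;
    }

    // ItemLinks created after the cutoff that have no LinkPreference on the same pair of ids.
    static List<ItemLink> recentLinksWithoutPreference(
            List<ItemLink> links, List<LinkPreference> prefs, Instant cutoff) {
        Set<String> pairsWithPreference = new HashSet<>();
        for (LinkPreference pref : prefs) {
            pairsWithPreference.add(compoundKey(pref.ownerItemId, pref.linkedItemId));
        }
        List<ItemLink> result = new ArrayList<>();
        for (ItemLink link : links) {
            boolean recent = link.createdAt.isAfter(cutoff);
            boolean hasPreference =
                    pairsWithPreference.contains(compoundKey(link.itemId, link.otherItemId));
            if (recent && !hasPreference) {
                result.add(link);
            }
        }
        return result;
    }
}
```

Note that neither class holds a reference to the other, and the Items themselves are never touched: the match is made purely on the two id values, which is the whole point of the complaint.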

Friday, December 08, 2006

Friday Brainf**k : How Unique Is A Phrase?

Eep - posts have been a bit thin on the ground recently, due to big crunch time on SONAR, but here's an interesting question that's just cropped up, and I'm not sure of the answer to, and blogging the question might just help me get my own thoughts on it straight....

First, some context. (Yeah, yeah, skip the context and go to the question) I'm writing the DAOs for the system as a layer of abstract interfaces, with a default implementation based on Hibernate, and using the everything-is-an-item pattern that I've blogged about before.

Aside: This pattern, in conjunction with Java Generics - a Java 5 mechanism that's kind of like C++'s templating system, but without the horrors that infest the STL (Standard Template Library) - has led to a really nice way of getting lots of basic operations (e.g. CRUD) "for free", which deserves a post of its own and I'll blog about it next time I get chance.


In this design, for every type of item, there's a corresponding DAO. In the DAO, there's a save() method which either adds the passed object to the DB, or updates it if it already exists.

This save() method is also responsible for throwing an exception if the given object can't be saved, as it would break the application's business rules for uniqueness. So each implementation of the save() method calls an isDuplicate() method, which is defined by default on the abstract ItemDAO, and can be overridden as appropriate on the subclass DAOs. For instance, it's not acceptable to have two Email records with the same messageID, but it's perfectly fine to have two Person records called John Smith - and this is where the interesting question arises...
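A minimal sketch of that save() / isDuplicate() arrangement might look something like the following. Everything here is an assumption made for illustration: the class names, the unchecked exception, the "nothing is a duplicate" default, and above all the in-memory list standing in for Hibernate and the database.

```java
import java.util.ArrayList;
import java.util.List;

// All names here are hypothetical; an in-memory list stands in for Hibernate and the DB.
final class DaoSketch {

    interface Item { }

    static final class Email implements Item {
        final String messageId;
        Email(String messageId) { this.messageId = messageId; }
    }

    static final class Person implements Item {
        final String name;
        Person(String name) { this.name = name; }
    }

    // Assumed to be unchecked; the post does not say what kind of exception is thrown.
    static final class DuplicateItemException extends RuntimeException {
        DuplicateItemException(String message) { super(message); }
    }

    static abstract class ItemDAO<T extends Item> {
        protected final List<T> saved = new ArrayList<>();

        // Default rule (an assumption): nothing counts as a duplicate.
        protected boolean isDuplicate(T candidate) {
            return false;
        }

        // Adds the item, or treats the call as an update if this exact object was saved before.
        public void save(T item) {
            for (T existing : saved) {
                if (existing == item) {
                    return; // update: nothing more to do for an in-memory list
                }
            }
            if (isDuplicate(item)) {
                throw new DuplicateItemException("Item breaks the uniqueness rules");
            }
            saved.add(item);
        }

        public int count() {
            return saved.size();
        }
    }

    // Two people called John Smith are fine, so the default rule is kept.
    static final class PersonDAO extends ItemDAO<Person> { }

    // Two emails with the same messageID are not fine, so the rule is overridden.
    static final class EmailDAO extends ItemDAO<Email> {
        @Override
        protected boolean isDuplicate(Email candidate) {
            for (Email existing : saved) {
                if (existing.messageId.equals(candidate.messageId)) {
                    return true;
                }
            }
            return false;
        }
    }
}
```

The generic parameter is what lets each subclass DAO override isDuplicate() with its own item type, without any casting.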

In our model, a Theme is also an Item. A Theme, in this case, being a word or phrase that has been extracted from the content of an Item due to it being "potentially interesting". There's a whole load of extreme Lisp cleverness being worked on by M'Colleague Craig McMillan, he of the prodigious beard and piratical proclivities, regarding how you determine a word or group of words is potentially interesting, but in the words of Frank Drebin, that's not important right now.

The question is, what makes a Theme unique? On a superficial level, you can say that a Theme is a group of characters that make up words, so that group of characters must be unique. In other words, no two Theme records must exist with the same String in the "Title" field. But if you think about it a bit more, that might not actually be the case.

For instance, homonyms (words that have the same pronunciation and spelling as another word, but a different meaning - e.g. "bat" the animal and "bat" in cricket) and polysemy (the capacity of a word or phrase to have multiple, related meanings that derive from the same etymology - e.g. "bank on it" with bank meaning to rely upon something, which derives from the reputation of bank-the-financial-institution for reliability) are perennial problems for Natural Language Processing. Should we take that into account in the model? Can we?

In the above cases, I can probably take the reasonable position that because the words are spelled the same, I can assume that they are actually related in this sense, and if two emails refer frequently to "bush", they should be pointing to the same DB record for the "bush" theme, regardless of whether they were talking about a US President, a shrub, or the Australian outback - and homonymy and polysemy can just be swept under the carpet. (Hmm, starting to get a lot of bulges under this carpet in the office here...where's my hammer?)

However, when you start to think about the possibility for the system to be dealing with multiple locales and even multiple languages, which may well be the case in large multi-national corporates, another, related, linguistic term starts to rear its ugly head - the Heterologue.

A Heterologue is a word that occurs in multiple languages, possibly with completely different meanings. For instance, the syllable "bat" in Cantonese means the number eight (at least, when pronounced with a high inflection), and having seen the way the lovely Lisa flits between English and Cantonese sometimes several times a sentence, even in emails where she types out the Cantonese word phonetically with English letters, this may well occur.

It's not just languages that show heterologues either - the same word or phrase in the same language in a different locale can have completely different meanings - the web is strewn with plenty of examples of British / American English ambiguities, the classic one being at Rocom when my newly-immigrated American colleague Stuart asked what Martin and I were doing for lunch, and I replied that we were driving into town so that Martin could pick up some fags, and did he want to come too?

So it boils down to this - should I include the locale of a Theme in the check for uniqueness along with the title (i.e. the actual string) or not? Is a given string of characters unique just within a particular locale, or globally? I'm tempted to say globally, but I have a nagging feeling that at some point in the not too distant future, that choice may turn round and bite me.
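To make the two options concrete, here is a small sketch of a uniqueness key that can be built either way. ThemeKey and its factory methods are hypothetical names invented for this example, and the comparison is a plain exact-string match with no normalisation of case or accents.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Objects;

// Hypothetical sketch: the same title can be unique globally or unique per locale.
final class ThemeKeySketch {

    static final class ThemeKey {
        private final String title;
        private final Locale locale; // null means "unique globally"

        private ThemeKey(String title, Locale locale) {
            this.title = Objects.requireNonNull(title);
            this.locale = locale;
        }

        // Option 1: the string alone decides uniqueness.
        static ThemeKey global(String title) {
            return new ThemeKey(title, null);
        }

        // Option 2: the string plus its locale decides uniqueness.
        static ThemeKey perLocale(String title, Locale locale) {
            return new ThemeKey(title, Objects.requireNonNull(locale));
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof ThemeKey)) {
                return false;
            }
            ThemeKey that = (ThemeKey) other;
            return title.equals(that.title) && Objects.equals(locale, that.locale);
        }

        @Override
        public int hashCode() {
            return Objects.hash(title, locale);
        }
    }

    // How many distinct Theme records would these keys produce?
    static int countDistinct(ThemeKey... keys) {
        return new HashSet<>(Arrays.asList(keys)).size();
    }
}
```

With global keys, an English "bat" and a Cantonese "bat" collapse into one Theme; with per-locale keys (say Locale.UK and Locale.forLanguageTag("zh-HK")) they stay as two. Going from per-locale to global later is a merge, whereas going the other way means working out after the fact which locale each existing Theme belonged to, which is probably the bite being worried about.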

Monday, November 20, 2006

Integrating Applets With AJAX

For our new SONAR product, as demo'ed in the Enron Explorer, we needed a way of visualising the social network of up to 80,000 people. Our first thought was of course, Flash, but we found that it just wasn't up to the job* of visualising large numbers of nodes. If only there was a way of combining the scalability of a Java applet with a slick, whizz-bang AJAX interface... well, as luck would have it, there is!

There are certain problems you have to get round, particularly related to the issue of the applet showing through anything that you place over it, and the browser reloading the VM if you hide the applet in any way, but these are fairly straightforward to get round once you've figured it out (hint: try moving it off screen!) - m'colleague Jan Berkel blogs about the technique in more detail here.

It would be overstepping the mark to suggest that we invented the technique as speculative articles have been written on this subject before, but the feedback we've been getting from the Enron Explorer shows that a lot of people are taken most of all by the interface, and as far as we know we're the first to use it in a production application.

All we need now for the meme to take hold, of course, is a snappy acronym....
- APAX?
- APJAX?
- JAPAX?

Hmmm..... there HAS to be an amusing acronym we can tease out of this - all suggestions gratefully received!

* OK, I'm fairly sure that, given enough time, we could probably have found a way of getting round the scalability issues associated with doing it in Flash, but as Jan mentions in the article, we had a large amount of pre-written Java code that it would have been a shame to waste, and what you also get with Java is a vast library of free code and APIs out there, plus complete flexibility over how you use them.

Monday, November 13, 2006

10 Things to Check for Supporting International Characters In Your Web App

These days, more and more of our web apps have to be ready and able to support international characters. It's a non-trivial problem, and one that causes many furrowed brows, because usually by the time you notice it, you've already screwed up some data. I've dealt with it many times, and found that it's generally much easier to prepare for the problem before it occurs, rather than hack a solution together after the fact. This is not a post about i18n-ing your display templates, that's a whole topic in itself, even though some standard mechanisms are pretty well-defined by now. It's about issues involved in storing and displaying content with non-English characters.

A full discussion of character encodings, and the headaches thereof, is WAAAY beyond the scope of this post. It could fill a pretty weighty multi-volume book all by itself. So I'll just refer you to Joel Spolsky's article on the topic - The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) - and say that the character set that's most commonly used for international characters is UTF-8.

So here's a quickie list of things to check for and bear in mind:

String Processing Issues


  1. Use Unicode strings
    Internally, CFMX is entirely based on Java, and so it "should" use Unicode strings by default. The main things that you need to worry about are when data goes in - into the database - and when it comes out - gets presented to the user. Of these two, the most important is the first - so long as data is being stored correctly, you'll always be able to get it out again. If it's being stored incorrectly, you're a bit screwed :)

  2. If you're using any Regular Expressions for processing or validating strings, BE CAREFUL!
    It's very common to use expressions such as [a-zA-Z] to check for letters, or [a-zA-Z0-9] to check for alphanumerics. What do you think is going to happen if you pass an accented character such as à or é through this reg ex? Yup - é is NOT between a and z, so it will not match. How best to handle this is up to you - the POSIX regex elements such as [[:alpha:]] are good enough for some situations, but not others. For instance, POSIX does not allow more than 20 characters to be categorized as digits, whereas there are many more than 20 digit characters in Unicode. There's more detail on unicode.org
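Since CFMX sits on top of Java, here is a quick sketch of the difference using java.util.regex, where \p{L} matches any character in the Unicode "letter" category. The class and method names are invented for the example, and whether your own regex engine supports \p{L} is something to check rather than assume.

```java
import java.util.regex.Pattern;

// Sketch: ASCII-only letter check versus a Unicode-aware one.
final class LetterCheckSketch {

    private static final Pattern ASCII_LETTERS = Pattern.compile("[a-zA-Z]+");
    private static final Pattern UNICODE_LETTERS = Pattern.compile("\\p{L}+");

    static boolean isAsciiLetters(String s) {
        return ASCII_LETTERS.matcher(s).matches();
    }

    static boolean isUnicodeLetters(String s) {
        return UNICODE_LETTERS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9"; // "café" with a precomposed e-acute
        System.out.println(isAsciiLetters(cafe));   // false
        System.out.println(isUnicodeLetters(cafe)); // true
    }
}
```

One wrinkle to be aware of: if the é arrives as a plain "e" followed by a combining accent (U+0301), the second check fails too, because the combining mark is not in the letter category. Normalising input first, or allowing \p{M} as well, are the usual ways round that.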


Data Storage


  1. Store everything in UTF-8
    Even if you're designing an app that "won't ever need anything except English", a little thought and effort at the design stage will mean that next time you have to write a similar app that does need international characters, you can re-use the code. Besides, it's just good practice, and the extra storage overhead of storing up-to-4-bytes-per-character is relatively easy to deal with in these days of 750GB disks for £250.

  2. Check the collation on your databases, tables, and columns
    Collations can be set at all levels of specificity, from the server right down to individual columns. Make sure that they're ALL UTF-8. Easiest way to do this is to generate a CREATE... script for your database (SQL Server) or a mysqldump file (MySQL), open it up in a text editor and search for COLLATE

  3. In SQL Server, make sure that any character-based field that can be populated from user-entered data in any way, is specified as a Unicode field
    i.e. the type starts with an N. varchar fields should be nvarchar. text fields should be ntext.

  4. Plan ahead!
    Plan to cope with multiple character sets up front, and you know your database can handle just about anything you're likely to throw at it. Sweeping it under the carpet and saying "we'll worry about that when it happens" is likely to end up with you taking your app offline while you rebuild tables and text indices. This can easily take several hours, for anything above a couple of thousand rows. Long downtimes make unhappy users.
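To put some numbers on the "up-to-4-bytes-per-character" overhead, and to show what "stored incorrectly" tends to look like in practice, here is a small Java sketch. The class and method names are invented for the example; the second method simulates one common failure mode (UTF-8 bytes read back as ISO-8859-1), not any particular database's behaviour.

```java
import java.nio.charset.StandardCharsets;

// Sketch: how many bytes UTF-8 needs, and what a charset mismatch does to the data.
final class Utf8Sketch {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Encodes as UTF-8, then (wrongly) decodes the bytes as ISO-8859-1.
    static String misreadAsLatin1(String s) {
        byte[] utf8Bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("a"));            // 1
        System.out.println(utf8Length("\u00e9"));       // 2 (e-acute)
        System.out.println(utf8Length("\u20ac"));       // 3 (euro sign)
        System.out.println(utf8Length("\uD83D\uDE00")); // 4 (a character outside the BMP)
        // One character goes in, two come out: capital A-tilde plus a copyright sign.
        System.out.println(misreadAsLatin1("\u00e9").length()); // 2
    }
}
```

Plain English text costs nothing extra in UTF-8, which is a large part of why "store everything in UTF-8" is such a cheap default.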


Presentation


  1. Explicitly declare the UTF-8 character set on every page
    Make sure that every page has a meta tag like this:
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

    ...and also make sure that it's the FIRST tag in the Head of your doc, because as soon as a browser encounters a charset declaration, it starts re-parsing from the top.

  2. You "should" specify the primary language in the HTML tag
    In ordinary HTML, this is just :
    <html lang="en-GB">

    In XHTML 1.0, it's slightly different :
    <html lang="en-GB" xml:lang="en-GB" xmlns="http://www.w3.org/1999/xhtml">

    And in XHTML 1.1, you don't need the lang attribute:
    <html xml:lang="en-GB" xmlns="http://www.w3.org/1999/xhtml">


    This makes it much easier for text readers, translation software, and - crucially - search engines to recognise the language and take appropriate action. More details at W3.org

  3. For multi-language markup, you can provide a comma-separated list of languages in the Content-Language HTTP header or meta tag
    You can (and should) also specify the language on any element within the page that is in a different language to the primary language. If your document structure doesn't break down to a logical tag that encompasses the different language part, then use a span tag:
    <span xml:lang="fr-CA" >....</span>

  4. Language is a CSS pseudo-class
    This means that you can specify different styling for different languages, like so:
    /* smaller font for documents in German */
    HTML:lang(de){ font-size:90%; }
    /* italicise any bits of French in any document */
    :lang(fr){ font-style : italic; }
    /* change the quotation marks for any Q tag INSIDE a French element */
    :lang(fr) > Q { quotes: '« ' ' »' }

    More details at W3.org



There are many many more things to think about, and this list is by no means exhaustive, but it should be enough to give you a starting point. Lots of the issues that are thrown up by this tend to be general "take-a-step-back" kind of issues that make you question your workflows and assumptions, rather than your programming expertise - e.g. if we're letting people enter their own nickname, and then using that nickname as part of the url, then what's going to happen if someone enters a nickname entirely in Russian? How are we going to handle generated emails to them? What about if they're text-only emails? (Hint: Content-type: text/plain; charset=utf-8 !)
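As a taste of the nickname-in-the-URL problem, here is a sketch using java.net.URLEncoder. The base URL and the method name are made up, and URLEncoder really implements form encoding (it turns spaces into "+", for instance), so treat it as an approximation for path segments rather than the last word.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch: percent-encoding a non-ASCII nickname as UTF-8 before putting it in a URL.
final class NicknameUrlSketch {

    // The base URL is a placeholder invented for this example.
    static String profileUrl(String nickname) {
        try {
            return "http://example.com/users/" + URLEncoder.encode(nickname, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A four-letter nickname in Cyrillic ("Ivan")
        System.out.println(profileUrl("\u0418\u0432\u0430\u043d"));
        // http://example.com/users/%D0%98%D0%B2%D0%B0%D0%BD
    }
}
```

Four letters become twenty-four characters of URL, which is worth knowing before deciding on column lengths, or on whether nicknames belong in the URL at all.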

But there are also deeper issues involved - if we start accepting and labelling content in different languages, what facilities do we need to provide to our users in order to filter out - or focus exclusively on - particular languages and character sets? Do we need to create a separate site for each language? Or do we accommodate all the content in one site, with filters?

Answers to those kinds of questions, I leave up to you :)

Tuesday, November 07, 2006

Alas, poor Smartgroups, I knew it well....

It's not without a tinge of sadness and a wistful sigh that I noticed that, in possibly the least-surprising announcement I've seen for a long time, Orange is to finally pull the plug on Smartgroups.

Smartgroups was, in its day, the Daddy of first-generation community applications, and I worked on it for just over two years just after it got bought out by Freeserve, who became wholly-owned by Wanadoo, then France Telecom, then finally Orange UK.

(Remember those "Ten signs you are in a dotcom company" emails that went round about the turn of the millennium? One of them particularly stuck in my mind - "You've sat at the same desk for two years, and worked for four different companies")

I learnt a lot about managing large-scale systems from that job - SG handled more than 50 million emails per month - and a good percentage of the stories I'll come out with after several late-night-beers-with-other-techies hail from that time. I also learned a lot about human nature, and the myriad ways in which people will never fail to surprise you, even when you think you've seen it all. And I'm not just talking about the users there...

But ultimately, it was a failure to evolve and keep up with the competition (e.g. MySpace) that gradually put the nails in its coffin, and after about four years of knowing full well that its days were numbered, SG has finally been taken out the back and given a nice sunny wall to stand against, and a roll-up to smoke. And a blindfold.

Farewell Smutgropes - you will live on in the memories of all those who worked on you. (Despite some quite determined scrubbing with Mind Bleach in some cases) And if anyone manages to track down the new homes of the Stereo Stimming group, the Brent Spiner Data Lovers group, or my own personal favourite - Hairy Bearded Scotsmen In Kilts ("ONLY pictures of hairy bearded scotsmen in kilts will be accepted - any pictures of hairy bearded men in kilts who are NOT Scottish will be deleted...") then be sure to let me know. Such gems of the longest of long tails are surely too fine to be lost forever :)

Friday, October 27, 2006

Mutually Exclusive Interfaces

Jason Nussbaum has posed a nice juicy thought experiment - how do you handle mutually exclusive interfaces? I started typing a comment, but it got too long, so here's my take on it :

The question was:

Had a thought: how do people handle mutually exclusive interfaces?

I suppose there's no such thing as mutually exclusive interfaces from a code perspective, but from a logic perspective, there may be. For example, let's say you do one of two things based on whether a class implements one of two interfaces. What do you do if something implements both?


It's a conceptual problem - as Jason says, there's not really a technical way that interfaces could be mutually exclusive. You can think of it this way - if an object X implements interface Y, then what you're really saying is "X can act like a Y". So there's no reason why X can't also act like a Z, depending on the situation.

There are so many ways you could tackle this problem. You could introduce a super-interface:


public interface DeadOrAlive{ public boolean isAlive(); }

public interface Dead extends DeadOrAlive{
public void startToSmell();
}
public interface Alive extends DeadOrAlive{
public void breathe();
}


...and thus force implementers of Dead or Alive to provide a method which indicates if they are alive or not. This feels like a bit of a kludge, but to my mind, less so than without.

You could add Exceptions onto the signature methods of Dead and Alive, which might throw NotDeadException and NotAliveException respectively - but this actually feels more kludgey to me than the first case.
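For comparison, the exception-based version might look something like this sketch. The exception names come from the paragraph above; the Cat class, its alive flag, and the two helper methods are assumptions added to make the example self-contained.

```java
// Sketch of the exceptions approach: one object implements both interfaces,
// but each method refuses to run while the object is in the "wrong" state.
final class ExclusiveStateSketch {

    static class NotDeadException extends Exception { }
    static class NotAliveException extends Exception { }

    interface Dead {
        void startToSmell() throws NotDeadException;
    }

    interface Alive {
        void breathe() throws NotAliveException;
    }

    // Hypothetical implementer; the boolean flag decides which interface is "live".
    static final class Cat implements Dead, Alive {
        private boolean alive = true;

        void die() {
            alive = false;
        }

        @Override
        public void breathe() throws NotAliveException {
            if (!alive) {
                throw new NotAliveException();
            }
        }

        @Override
        public void startToSmell() throws NotDeadException {
            if (alive) {
                throw new NotDeadException();
            }
        }
    }

    // The caller has to try the method and catch, which is where the kludgey feeling comes from.
    static boolean canBreathe(Alive subject) {
        try {
            subject.breathe();
            return true;
        } catch (NotAliveException e) {
            return false;
        }
    }

    static boolean canStartToSmell(Dead subject) {
        try {
            subject.startToSmell();
            return true;
        } catch (NotDeadException e) {
            return false;
        }
    }
}
```

Compared with the isAlive() super-interface, the state is only discovered by attempting the operation and catching the fallout, so every call site ends up wrapped in a try block.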

I think the "proper" way to think of this is maybe the biggest kludge of all - as your object model is meant to reflect the real-world problem that you're modelling, if it produces this kind of conflict then maybe you need to re-think your model!

- unless, of course, your name happens to be Schrödinger...


public interface Alive{ public void breathe(); }
public interface Dead{ public void startToSmell(); }

public class Cat implements Dead, Alive
{
// both at once, at least until somebody looks
public void startToSmell(){}
public void breathe(){}
}

public class Box
{
private Cat cat;

public void putCatInBox(){ this.cat = new Cat(); }
}


...eep, lame quantum physics jokes on a Friday?! I think it's time for coffee...

Wednesday, October 25, 2006

Enron Explorer Going Viral

Since we got Boing Boing'ed yesterday, Google Analytics shows that we've had over 21,000 page views so far - and that's just to the front page. Unfortunately the GA javascript can't be included in the main app, as it's pretty much all AJAX requests, so we'll have to wait for analysis of the Apache logs for full statistics.

I also put the Enron Explorer on del.icio.us 4 days ago - it's now been echoed by 112 people, including one who speaks Russian, by the looks of it. Google Analytics confirms that we've had visitors from 71 different countries, interestingly enough including two from the Cayman Islands and one from an anonymising proxy - Kenny Boy, that's not you is it....?

Some of the comments are fantastic :

fpaulus
"Interesting (both technologically and economically) walk through the Enron e-mail archives"

markwithasee
"searchable database of all of enron's internal email from 99-02. WOW."

RStacy
"A look into what the future could look like in corporate transparency and analysis."

slightlyfleury
"Enron emails. Gawd, did I just come?"

Some of the tags that people have applied to it (under the posting history on the right) are also interesting - I always find it intriguing to see how other people see what we've done.

Tuesday, October 24, 2006

We got Boing Boing'ed!

Another way to hammer a webserver - the Enron Explorer got highlighted on Boing Boing, "The Most Popular Blog In The World" according to Technorati. And our traffic has gone through the roof, but the app is holding up thanks to judicious planning for heavy usage, and some hefty Squid proxying.

Remember kids, "Proper Planning Prevents Piss-Poor Performance"

....and Corny Cliches Create Copious Cringes...
and Aggravated Alliteration Attains Acute Annoyance.
etc.

Wednesday, October 18, 2006

The Enron Explorer!

Phew - at long last we've gone live, and I can finally blog about what we've been working on here!

( yeah, yeah, skip all that, what about Enron? )

Our next major product is aimed squarely at the enterprise market, and it's a tool to analyse and extract meaning from corporate data stores and email traffic. Based on this information extraction, we can then map social networks and analyse information flow throughout the organisation, and use this information to give a shot in the arm to communications effectiveness - forwarding emails to you that are especially relevant to you but that you might otherwise have missed, and letting you set "volume level" for each of your interests.

There's no end of applications for this kind of tech - everything from expertise analysis to Sarbanes-Oxley compliance. The hardest question of all, though, was what to call it. After a tortuous voting and lobbying process second only to the IOC in labyrinthine complexity, we finally settled on the name SONAR, as it kind of implies what the product does (scans things and identifies things that you need to know about) and it's also a merely-mildly-icky acronym - SOcial Networks And Relevance.

Anyway, while we were working on the early stages of SONAR, we needed a large set of plausible test data to test the NLP algorithms - sadly, my usual plethora of furry animals and pitiful puns just wasn't enough in this case - and Jan came up with a stroke of genius - the Enron email archive.

In Oct. 2003, the FERC released ~200,000 Enron emails into the public domain, as part of the inquiry into the Enron fraud. So we grabbed the database, imported it into a very early version of SONAR, and set it chugging away. The results were so absorbing that we can't stop fiddling with it! We decided it was a great way to demo our system, even though it's a very early version, so laydeeeez an' gennulmen, (drum roll) allow me to present....

THE ENRON EXPLORER!

Just click on a name or a theme to get in, and off you go. There's some real gems in the archive, ranging from the hilariously obscene :

Jeff Skilling to Andy Zipper:
Fuck you, you piece of shit. I can't wait to see you go down with the ship like all the other vermin. Smug, paranoid, unhappy mother fucker. Eat shit.
...to which I think Andy Zipper responded with admirable restraint!


- to the heartbreaking:

Sato (Enron Japan) to Andy Fastow:
Please don't fire Enron Japan staff, we do nothing wrong! Please!!! [....] We know how serious is the situation but please don't fire us now. Our families are waiting for a happy chrismas and new year!!


We can (and often do) lose ourselves in this for ages - have an explore and enjoy finding out what they were saying to each other as the house of cards crumbled down around them.

On a technical note, the interface is written in RubyOnRails, with a Java backend. It's AJAX'ed up to the eyeballs - hey, we even have rounded corners and gradient fills on the AJAX loading indicators! - and although there are still a couple of issues to sort out, we've worked hard on maintaining the usual expectations of browser behaviour.

The back button works (with a couple of minor niggles), and you can bookmark and email the url of your current view, and it should (nearly!) always bring you back to the same point. There's some smooth integration between the Java applet visualiser and the AJAX calls too, although again there's still a couple of niggles to deal with.

There's a whole load of other niftiness going on behind the scenes, but I'll blog about that later. In the meantime, have fun with it, and if you find anything particularly juicy, leave a comment (either here or on the app itself) and share it with the world!

Wednesday, October 11, 2006

Styled Checkboxes and Radio Buttons

I'm sure you've experienced the problem - most form elements can be styled pretty easily, but checkboxes and radio buttons? Forget it. If you're anything like me, you probably gave up by now and accepted that it's just one of those things you have to put up with. However, Philip Howard has released a nice CSS and JS solution that allows you to wrap a span of a certain class around the elements you want styled, and the magic pixies will do the rest.

I've seen this done before, but this solution is worth highlighting because :

  • It degrades gracefully back to the standard form elements if your browser does not support JS, CSS or images, and
  • He's put the extra bit of effort in to support the standard keyboard controls for form elements - space bar toggles status, left and right move the focus along a group of radio buttons, etc.


Nice one Philip.

Wednesday, October 04, 2006

Breaking Dependencies With Interfaces

I just picked up on Mike Dinowitz's post Interfaces - Why Bother? asking for benefits that interfaces offer. I started typing a comment, but realised that my two-penn'orth was way too sprawling for a comment and needed a whole post of its own. So here goes - I'm sure this won't be telling someone as experienced as Mike anything he doesn't know, but there's also been a call for more introductory-level blog posts in the last couple of days, so hopefully someone out there will find this useful as a concrete example of where I'd be lost without interfaces.

I'm currently working on a large J2EE platform with over 2000 classes. The only sensible way to manage such complexity is to split it into several distinct modules, each of which is conceptually self-contained and compiled and unit-tested separately, in a strict order. The ANT build script builds the core framework module first, then the email server, then the groups and user modules, etc etc, finally finishing up with the Tapestry-based web interface.

This all works great, except that you're still left with cross-module dependencies. The email server needs to know that group emails should be forwarded to group members, the groups module needs to know about its users, and so on.

The way these dependencies are resolved is through interfaces. In the email module, we create an interface that represents the bit of group- or member-related functionality that the email server needs to know about - in this case, the ability to accept and propagate an email - and make the group and member objects (which come after the email server in the build order) implement that interface.

So we make an interface called EmailPropagator with one method - propagate() - in the email server module, and make groups and members implement EmailPropagator.

This way, the email server doesn't need to know anything else about the "thing" it's sending email to - so long as it implements EmailPropagator, the email server can ask it to propagate email to whoever or wherever it feels like, and that's all that the email server needs to know about it in order to do its job.

We can make anything be accepted by the email server, so long as it implements that one method. It could be a group / mailing list, an individual person, a spam demon that forwards ten thousand copies to random addresses, a black hole that just swallows it up, a file store.... anything, so long as it implements that one method.
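To make that concrete, here's a minimal sketch of the arrangement. Only EmailPropagator and propagate() come from the real platform; the method signature, and the EmailServer, Member, Group and BlackHole classes, are simplified stand-ins I've made up for illustration, so treat it as a sketch under those assumptions rather than the actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Demo entry point, declared first so the file can be run directly.
class InterfaceDemo {
    public static void main(String[] args) {
        Member alice = new Member();
        Group climbers = new Group();
        climbers.add(alice);

        EmailServer server = new EmailServer();
        server.deliver("Meet at Tryfan on Saturday", climbers);
        server.deliver("This one vanishes", new BlackHole());

        System.out.println(alice.inbox);
    }
}

// Lives in the email module: the only thing the server knows about its targets.
// The String parameter is an assumption; the real method may take something richer.
interface EmailPropagator {
    void propagate(String email);
}

// Also in the email module. It depends on the interface and nothing else.
class EmailServer {
    void deliver(String email, EmailPropagator target) {
        target.propagate(email);
    }
}

// Hypothetical class from the later-built user module.
class Member implements EmailPropagator {
    final List<String> inbox = new ArrayList<>();

    @Override
    public void propagate(String email) {
        inbox.add(email);
    }
}

// Hypothetical class from the later-built groups module: forwards to every member.
class Group implements EmailPropagator {
    private final List<EmailPropagator> members = new ArrayList<>();

    void add(EmailPropagator member) {
        members.add(member);
    }

    @Override
    public void propagate(String email) {
        for (EmailPropagator member : members) {
            member.propagate(email);
        }
    }
}

// The black hole: accepts the email and does nothing with it.
class BlackHole implements EmailPropagator {
    @Override
    public void propagate(String email) {
        // deliberately empty
    }
}
```

Note that EmailServer compiles without Member, Group or BlackHole on the classpath, which is exactly what lets the email module be built before the modules that plug into it.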

This is part of the power of OO, and particularly the power of interfaces. It's also different from inheritance - in Java, and hence CF too, a class can only extend one base class. However, it can implement as many different interfaces as you like. You've heard of "If it looks like a duck and quacks like a duck, it must be a duck...." - well, this extends to "If it quacks like a duck, I don't care what it is, so long as I can ask it to quack".

It's one of the guiding mantras of OO that you should "design to interfaces, not implementations" and this is exactly why - we can make a million different things plug into our email server and do a million different actions with an email, without ever needing to change the email server code. That's power, and that's reusability. And that's why I love interfaces.

How Enterprise 2.0 Products Can Succeed

Much food for thought on Andrew McAfee's blog discussing John Gourville's concept of the "9x Email" problem which new technologies must overcome - "a mismatch of 9 to 1 between what innovators think consumers want and what consumers actually want."

People - real people, not tech people, who sometimes seem to be rabidly adopting the newest, most obscure technology for no other reason than to claim "geekier than thou" bragging rights amongst their peers - have an inbuilt tendency to stick with what they know. Gourville suggests that in order for new technology to go viral, it must offer a tenfold improvement over what's already out there. The problem is that in order to overcome the inertia associated with the status quo, an evolutionary, not revolutionary approach is called for:

Gourville's research suggests that the average person will underweight the prospective benefits of a replacement technology for it by about a factor of three, and overweight by the same factor everything they're being asked to give up

McAfee takes the example of email versus "groupware", and cites the intuitive nature of email interfaces as a point of comparison for Enterprise 2.0 apps. Just about everyone "gets" email as a concept, and the critical problem for Enterprise 2.0 technologies is one of interfaces. Rather than adding more and more bells and whistles to an interface, we should concentrate on making the interfaces clean, elegant, and instantly-comprehensible. This is a viewpoint I wholeheartedly share, but McAfee puts it very succinctly:

A great UI not only heightens the perceived benefits of a proposed collaboration technology, it also lowers the perceived costs.
...
The greatest challenge here, I think, doesn't have to do with making the browser sufficiently application-like ... It has to do with making technologists sufficiently user-like -- getting them to stop thinking in terms of bells and whistles and elaborate functionality, and to start thinking instead about busy users with short attention spans who need to get something done, and who can always reach for email


Insightful stuff.

Friday, September 29, 2006

How To Architect Your CSS

We all have our own favourite strategies for architecting our application code, but CSS is often one of the aspects of a site that gets completely overlooked. As CSS - along with its browser support - matures and grows more powerful, how we structure our CSS becomes more and more important for maintainability and expansibility1 of our applications.

I'm sure we've all experienced it - you start out with a prototype with a few simple CSS rules and the best of intentions, but as your application grows and changes, and more and more people have their input into the design process, the CSS grows and morphs and accumulates more and more quick fixes and cheesey hacks. Months later, you realise you've got to the point where something as simple as "can we make the comments link appear in red?" can take hours of navigating the jungle of interfering specificities and ids. Eventually you admit defeat, and put in a cheesey hack "just this once", and mentally promise to go back and fix it properly later - but somehow you never get round to it, and the process continues.

Digital Web has an interesting, if short, article on Architecting CSS, with some useful tips. If I had to pick two simple things that you should do RIGHT NOW, DAMMIT if you're not already, it would be :

  • Commenting your rules just as much as your code, and
  • Alphabetically sorting your attributes.


Oh, and avoiding the !important hack if at all possible.

Alright, three simple things....2 You get the idea :)



1 - apologies for grimace-inducing linguistic contortions. It's Friday, it's early, and I haven't had coffee yet. Bleh.
2 - No-one expects the Spanish blog post!

Thursday, September 21, 2006

Unfortunate Comedy Typos part 1

Ever have one of those days where no matter what you do, you just can't type properly? Today, I'm finding it impossible to type StringBuffer - it keeps coming out as StringBugger.

Calling Dr. Freud, calling Dr. Freud.....

A couple of other Freudian slaps (ahem) SLIPS I keep having, mean that I keep declaring functino's (maybe a new fundamental particle for the Standard Model?) and functoni's - and I don't know if it's just me, but Func Toni conjures up a disturbing image of Swiss Toni in a gold lamé shirt with eighteen-inch collars, star-shaped shades, and a big-ass medallion, strutting his funky stuff to Graham Central Station - like this:



Damn, that's a funky bassline!

(sigh)

Thank crunchie it's Friday...

Demos wins award

Congratulations to m'erstwhile colleagues at Headshift on scooping another award for the recently launched Demos site. I, along with the rest of the crew, put a huge amount of thought and care into that site, and it gives me a nice warm fuzzy feeling to know that it's not just the client who appreciates it. Special kudos to Neil Roberts for doing the always-tricky task of picking up the reins on it after I left and teasing it lovingly into deployment.

Wednesday, September 13, 2006

Svn over ssh (svn+ssh://) on Windows via cygwin and PuTTY

One of the more obvious things lacking from Windows is an ssh (secure shell) client, and this has caused me no end of grief trying to check out code from Subversion repositories on servers that require ssh access - such as our own.

I'd previously managed to get access by forcing my username in the URL of the repository along with a saved PuTTY session name:


svn+ssh://alistair@(saved PuTTY session name)/etc...


...where the saved PuTTY session used my imported private key from the live host.

(Note: you *must* use PuTTYGen to convert your OpenSSH keys to PuTTY-compatible keys)

This caused a certain amount of sneering from the Debian users in the office, who didn't have to jump through any of these hoops, but I could live with that...

The latest problem to have me gesticulating at the screen like a south-american-footballer-in-front-of-a-referee-who's-reaching-for-his-pocket was the use of externals.

An external in a Subversion repository is a URL reference to another resource - for instance, in a Java app that requires a certain library to build, you might add an external pointing to the download url of the particular jar file that you need.

All well and good, but you can also add externals pointing to a different location in the same repository, for instance, to share config files between back end and front end modules.

Now here was the problem - you can't have relative urls in externals, they have to be absolute. This particular external was pointing to a different location in the same repository in the standard URL way, which works fine on UNIX (Debian) clients because they have a native ssh client :


svn+ssh://(host)/(path)


- but there was no way this was going to work on Windows via PuTTY - the only way I'd got svn+ssh:// urls to work previously was via the username@savedsessionname method. Every time I tried to update the working copy, it would get as far as that external, then give me this error message:


svn: Connection closed unexpectedly


So how do you get round this? Enter the cavalry - Cygwin.

Cygwin is a command shell for Windows that gives you a Bash-style environment, including....(drum roll)..... an ssh client!

With this, you can configure SVN (and the rather wonderful TortoiseSVN) to use this ssh client and - crucially - pass it a command line parameter that tells it to use your private key file for authentication.

If you open your SVN config file in a text editor, (you'll find this in C:\Documents and Settings\(username)\Application Data\Subversion\config) you'll see instructions on how to configure ssh. You can pretty much ignore them :-) and just add the following line to the [tunnels] section:


ssh = "C:/(cygwin root path)/bin/ssh.exe" -i "C:/(path to your private key)"


...and that's it! It's one of those really simple solutions that took me a couple of minutes to implement, but hours of desk-headbanging frustration to find. You can also use the same command line in TortoiseSVN -> settings -> network -> ssh client.

One other note: make sure that you point ssh.exe at your openssh-compatible private key. If you've set up your key with PuTTYGen, you can load it in and export an OpenSSH-compatible key from there.

Thursday, September 07, 2006

Lessons Learned From Kiko

Richard White has posted an insightful write-up of his Actual Lessons From Kiko. Well worth a few minutes for anybody who is in the web app space.

Wednesday, August 30, 2006

RegEx to fully validate RFC822 email addresses

I love Perl. No, I really do - I can sit and look at the really good examples for hours, and still have no clue what they're doing. And here is a fantastic example....

Mail::RFC822::Address - a module that tells you whether the given string is a valid email address or not.

Most email validators only check the absolute basics - e.g. some characters followed by an at-sign, followed by some more characters and at least one dot. But have you ever actually read RFC822, the critical RFC that defines the standard of what's acceptable in an email and what isn't ? It's surprisingly loose on what's an acceptable address.

(go on, read chapter 6, and try to summarise it - you know you want to...!)
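To show just how far the "absolute basics" fall short, here's a quick sketch in Java. The NAIVE pattern is my own made-up example of a typical quick-and-dirty validator (it's not from any particular library), and the sample addresses are invented. RFC822 allows a quoted string as the local part and a bracketed domain literal as the domain, and the naive check throws both out.

```java
import java.util.regex.Pattern;

class NaiveEmailCheck {
    // A typical "basics only" check: some characters, an at-sign,
    // then more characters containing at least one dot.
    static final Pattern NAIVE =
        Pattern.compile("^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)+$");

    static boolean looksValid(String address) {
        return NAIVE.matcher(address).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "al@example.com",             // the everyday case
            "\"Al Davies\"@example.com",  // quoted-string local part, legal in RFC822
            "al@[10.0.0.1]"               // domain literal, also legal in RFC822
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + looksValid(sample));
        }
    }
}
```

The first address passes and the other two are rejected, even though all three fit the RFC822 grammar. Whether you actually want to accept the exotic ones is a separate question, of course.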

But no more, thanks to Paul Warren, who has solved all our email-address-validation woes with one almighty mother of a reg ex - and here it is. You'll like this. No, you will.... here we go:


(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[
\t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n) ?[
\t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\ r\n)?[
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
\t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n) ?[
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
)*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r \n)?[
\t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?: \r\n)?[
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)? [
\t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\" .\[\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\ [\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n) ?[
\t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n) ?[
\t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
\t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
"()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
?:\r\n)?[ \t])*))*)?;\s*)



After a while, all you see is ....blonde......brunette.....


PS - hope Paul doesn't mind me posting it here. It's truly a thing of beauty, but I'll take it down if requested.

URGENT html prototype designer/coder needed today and tomorrow!

Bit of a long shot this one, but Lise has been left high-and-dry by a contract developer taken ill. The job is mocking up flat HTML interfaces, and will probably require a bit of a late night tonight as it must be finished by close-of-play tomorrow for user testing on Friday. Ideally you'd be able to make it to Islington THIS AFTERNOON for a briefing and handover.

Very short notice, but they're a bit stuck - so if anyone finds themselves at a loose end for a couple of days, add a comment on this post and I'll pass your details on.

Tuesday, August 29, 2006

Flickr's Geotagging Let Down By Location Search

Thomas Hawk of Geo-tagging mashup Zooomr does a good job of keeping his objectivity in his review of Flickr's new Geotagging facility.

I just had a play with it myself, and I have to agree with Thomas - the nicest aspect is the way that it's integrated with the Flickr Organiser (I still keep spelling that Organizr...) and the by-now-expected Ajax drag-and-drop wizardry. But the most disappointing aspect of it all is the underlying Yahoo! maps data. It's fine if you want to geotag photos down to city block level in most major US cities, but stray off the beaten path to tag photos of mountaineering treks or rock climbing venues - even world-famous rock climbing venues - and the map detail just isn't there.

To be fair, this isn't just a limitation of Yahoo!'s map data, it's similarly limited in Google Maps as well - and I guess it's a reflection of the underlying business drivers behind the map data. Constructing an index of the whole globe is a massive undertaking, and it has to be funded somehow. The obvious channel is advertising, but only businesses are willing to pay for advertising, and most businesses tend to be centred around urban areas, therefore it's more important to the map provider that the advertisers are kept happy.

It just seems to take on an added dimension of disappointment when this limitation applies to photos. To be honest, one U.S. city block looks pretty much the same as any other U.S. city block, in the grand scheme of things, and part of the whole joy of Flickr comes from discovering fascinating, beautiful images that you may never otherwise have seen. Almost by its very nature, this is going to involve out-of-the-way places like the Sim Gang Glacier in Pakistan, La Dibona of Les Ecrins in the Alps, or K2, which requires 14 days of hard trekking from the nearest road before you even reach the mountain - often cited as the hardest trek in the world, but surely one of the most beautiful. Try to find any of these places in the Flickr map, and you'll have trouble. Even closer to home, in the rugged mountain landscape of North Wales around Tryfan, the map engine draws a blank.

To take the next step in geotagging, and in mapping as a whole, requires the next step to be taken in the search technology that takes a user-supplied string and works out what the hell you actually meant. This is a mammoth task in itself, considering how many different ways people can refer to a particular longitude and latitude, across languages and character sets, let alone local names vs. standard names. But whoever cracks that problem can look forward to a very bright future indeed, and that's just one reason why Natural Language Processing - clearing away the cobwebs of context and language and working with raw chunks of meaning - is becoming such a hot topic right now. Stay tuned...

Thursday, August 24, 2006

So many parties, so little time...

Well, I had the best intentions of making it to tonight's London CFUG, but once again my path is strewn with cowpats from the Devil's own Satanic herd** as it once again conflicts with something else that I just can't miss. This time, it's the Trampoline Systems Summer Bash, graced by the intriguing Czechoslovakian Alternative Folktronica of the most delectable Miss Eva Eden. Should be a larf - I've never had to mike-up a Bontempi organ before.... I'll whack the more-salacious photos onto Flickr tomorrow, plus any videos I happen to get of any embarrassingly drunk bigwigs saying spectacularly inappropriate things.


So have a pint and a whinge for me at the CFUG tonight, and hopefully I'll make it to the next one. Unless I'm halfway up some mountain somewhere or something.


**series and episode, anyone?

Thursday, August 17, 2006

Who Wants To Buy An Ajax Calendar App?

In a move that's not-at-all-designed-to-create-an-internet-buzz, AJAX calendar app Kiko is for sale on eBay.

Starting price US $49,999.99. No Bids yet.

In the interests of completeness, I should point out that they're not the first, by a long way - which market sector do you think got there first? Of course - porn!

The listing states

We are selling Kiko because we want to have time to work on other projects as a development team.

...and not because they are now in direct competition with Google, or anything...

best of luck Kiko guys

A Pox Upon AppleMail!

A lot of my time since joining Trampoline 6 weeks ago has been spent reacquainting myself with the black art of parsing and dismembering MIME emails with the JavaMail API. There's much I could say about the MIME format and particularly the JavaMail API, but those apoplectic rants deserve to be written up and nailed to church doors all of their own.

This post is about a problem I've been having with a mail generated by Apple Mail, that has been driving me nuts. It's not the first issue I've had with Apple Mail and its funky attachment formatting, and I'm sure it won't be the last. However, it's the most maddening to date! Here's the problem:

In a multipart email, you separate each part with a unique string that must not occur in any of the parts. This is generated by the email client, and declared in the Content-Type header. RFC 2045 states that the boundary declaration is required for any multipart subtype. The header should look like this:


Content-Type: multipart/(whatever); boundary="----=(unique string)"


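You will then get a set of Parts, each separated by an occurrence of the boundary string (strictly, each delimiter line is the boundary string prefixed with two hyphens, and the closing one has two more appended; the listings here show just the bare string), and each declaring what type of content it is by means of its own Content-Type header -


Content-Type: (type)/(subtype)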


An example might be:


Content-Type: multipart/mixed; boundary="----=ABCDEFGHIJKLMNOP"

----=ABCDEFGHIJKLMNOP
Content-Type: text/plain; charset=US-ASCII

Hi Al,

Here's the schematic for the secret base under the island volcano. Note the new layout of the shark pools, and the trapdoor is now triggered from the pressure pad under your desk as requested. Will give the engineers a kick about the frickin' lasers and see what's taking them so long.

Cheers,

Dave

----=ABCDEFGHIJKLMNOP
Content-Type: image/png; name="plans.png"
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename="plans.png"

(lots of data encoded into base 64 so that it can be transferred as text)
----=ABCDEFGHIJKLMNOP


All well and good so far.
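If you want to see the mechanics without dragging in JavaMail, here's a rough plain-Java sketch of the two steps: pull the boundary out of the Content-Type header, then chop the body up at each delimiter line. BoundarySplitter and its methods are names I've invented for illustration. It follows the MIME RFCs in treating the delimiter line as two hyphens plus the boundary string (with two more hyphens on the closing one), so the sample body carries those extra hyphens that the simplified listing above leaves out. It ignores nesting, header folding and trailing whitespace on delimiter lines, so it's a sketch under those assumptions and nothing like a full parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundarySplitter {
    // Matches boundary="..." or an unquoted boundary=... parameter.
    private static final Pattern BOUNDARY =
        Pattern.compile("boundary=\"?([^\";]+)\"?", Pattern.CASE_INSENSITIVE);

    // Extracts the boundary string from a multipart Content-Type header.
    static String boundaryOf(String contentTypeHeader) {
        Matcher m = BOUNDARY.matcher(contentTypeHeader);
        if (!m.find()) {
            throw new IllegalArgumentException("multipart type without a boundary");
        }
        return m.group(1).trim();
    }

    // Splits a multipart body into its raw parts (headers, blank line, content).
    // Each part keeps the line break that precedes the next delimiter.
    static List<String> split(String body, String boundary) {
        String delimiter = "--" + boundary;
        List<String> parts = new ArrayList<>();
        StringBuilder current = null; // stays null until the first delimiter: the preamble is dropped
        for (String line : body.split("\r?\n", -1)) {
            if (line.equals(delimiter + "--")) { // closing delimiter: stop here
                if (current != null) parts.add(current.toString());
                return parts;
            }
            if (line.equals(delimiter)) {        // start of the next part
                if (current != null) parts.add(current.toString());
                current = new StringBuilder();
            } else if (current != null) {
                current.append(line).append('\n');
            }
        }
        if (current != null) parts.add(current.toString()); // tolerate a missing closing delimiter
        return parts;
    }

    public static void main(String[] args) {
        String header = "Content-Type: multipart/mixed; boundary=\"----=ABCDEFGHIJKLMNOP\"";
        String boundary = boundaryOf(header);
        String body = "--" + boundary + "\n"
            + "Content-Type: text/plain; charset=US-ASCII\n\nHi Al,\n"
            + "--" + boundary + "\n"
            + "Content-Type: image/png; name=\"plans.png\"\n\n(base64 data)\n"
            + "--" + boundary + "--\n";
        System.out.println(split(body, boundary).size() + " parts found");
    }
}
```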

It gets a bit more complicated when you introduce the fact that any part of the mail body can also be a multipart type, which must declare its own boundary string, but still, it should be parseable into a coherent tree structure, right?

Well yes - as long as you play by the rules.

Apple Mac files consist of two forks:

1) an apple-specific part called the RESOURCE fork which contains arbitrary information such as icon bitmaps and file info parameters,
2) a DATA fork which contains the actual file data.

This translates logically into a MIME multipart format - multipart/appledouble - with one part for each fork.

So, if we were to send the example message above from a Mac using AppleMail, you would get something like this:


Content-Type: multipart/mixed; boundary="----=TOPLEVELBOUNDARY"

----=TOPLEVELBOUNDARY
Content-Type: text/plain; charset=US-ASCII

Hi Al,

Here's the schematic for the secret base under the island volcano. Note the new layout of the shark pools, and the trapdoor is now triggered from the pressure pad under your desk as requested. Will give the engineers a kick about the frickin' lasers and see what's taking them so long.

Cheers,

Dave

----=TOPLEVELBOUNDARY
Content-Type: multipart/appledouble; boundary="----=HEYIMTHEAPPLEDOUBLEBOUNDARY"

----=HEYIMTHEAPPLEDOUBLEBOUNDARY
Content-Type: application/applefile; name=plans.png
Content-Disposition: inline; filename="plans.png"

(apple-specific file information encoded into base 64)

----=HEYIMTHEAPPLEDOUBLEBOUNDARY
Content-Type: image/png; name=plans.png
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename="plans.png"

(actual file data encoded into base 64)
----=HEYIMTHEAPPLEDOUBLEBOUNDARY

----=TOPLEVELBOUNDARY


Again, this is all well and good so far - apart from one or two minor irritations like the lack of quotes around the name of the file in the Content-Type header - which can cause some grief if the filename has spaces in it.... but that can be got round without much trouble using a bit of regex in pre-processing.
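For the curious, the pre-processing can be as small as this. It's a sketch rather than the production code: NameParamQuoter and the exact pattern are made up for this post, and it assumes the name parameter sits on the same line as Content-Type (no folded headers). It wraps an unquoted name= value in quotes and leaves already-quoted values alone.

```java
import java.util.regex.Pattern;

class NameParamQuoter {
    // Group 1: everything on a Content-Type line up to and including "name=".
    // Group 2: an unquoted value, running to the next ';' or the end of the line,
    //          minus any trailing whitespace.
    // The \b stops "filename=" from matching, and (?!") skips quoted values.
    private static final Pattern UNQUOTED_NAME = Pattern.compile(
        "^(Content-Type:[^\\r\\n]*?\\bname=)(?!\")([^;\\r\\n]*[^;\\s])",
        Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    // Returns the raw message text with unquoted name= values wrapped in quotes.
    static String quoteNames(String rawMessage) {
        return UNQUOTED_NAME.matcher(rawMessage).replaceAll("$1\"$2\"");
    }

    public static void main(String[] args) {
        String raw = "Content-Type: image/png; name=secret plans.png\n"
            + "Content-Disposition: inline; filename=\"secret plans.png\"\n";
        System.out.print(quoteNames(raw));
    }
}
```

Run the raw message text through quoteNames() before handing it to the parser, and filenames with spaces should stop tripping it up.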

The problem comes when you have multiple appledouble-encoded attachments. What you would expect is something like this:


Content-Type: multipart/mixed; boundary="----=TOPLEVELBOUNDARY"

----=TOPLEVELBOUNDARY
Content-Type: text/plain; charset=US-ASCII

blah - message text

ATTACHMENT 1:

----=TOPLEVELBOUNDARY
Content-Type: multipart/appledouble; boundary="----=HEYIMTHEAPPLEDOUBLEBOUNDARY"

----=HEYIMTHEAPPLEDOUBLEBOUNDARY
Content-Type: application/applefile; name=plans.png
Content-Disposition: inline; filename="plans.png"

(apple-specific file information encoded into base 64)

----=HEYIMTHEAPPLEDOUBLEBOUNDARY
Content-Type: image/png; name=plans.png
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename="plans.png"

(actual file data encoded into base 64)
----=HEYIMTHEAPPLEDOUBLEBOUNDARY

ATTACHMENT 2:

----=TOPLEVELBOUNDARY
Content-Type: multipart/appledouble; boundary="----=DIFFERENTAPPLEDOUBLEBOUNDARY"

----=DIFFERENTAPPLEDOUBLEBOUNDARY
Content-Type: application/applefile; name=plans.png
Content-Disposition: inline; filename="plans.png"

(apple-specific file information encoded into base 64)

----=DIFFERENTAPPLEDOUBLEBOUNDARY
Content-Type: image/png; name=plans.png
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename="plans.png"

(actual file data encoded into base 64)
----=DIFFERENTAPPLEDOUBLEBOUNDARY

----=TOPLEVELBOUNDARY


But what's actually happening is that in the second attachment, the all-important Content-Type: declaration -

Content-Type: multipart/appledouble; boundary="----=DIFFERENTAPPLEDOUBLEBOUNDARY"

- is missing!

This line is absolutely vital, as it not only declares that this part is in appledouble format, but more fundamentally it declares that this part is itself a multipart and is split with THIS boundary marker rather than any other.

If this line is missing, then ONLY the FIRST attachment gets recognised. Any subsequent attachments which don't get the content-type header are then considered to be text/plain by default, so you get an email which has the first attachment properly parsed as an image, but everything after that appears inline as text. So anyone reading the email gets a big long string of base64 encoded image data. Not nice.
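The defaulting rule is what does the damage, so here's a tiny sketch of it. PartContentType is a name I've invented, and the code assumes a part's headers end at the first blank line and aren't folded across lines. A part with no Content-Type header comes back as text/plain, which is the default RFC 2045 specifies, and that is exactly what happens to the second attachment.

```java
import java.util.Locale;

class PartContentType {
    private static final String HEADER = "Content-Type:";

    // Returns the media type a raw part declares, or text/plain if it declares none.
    static String contentTypeOf(String rawPart) {
        for (String line : rawPart.split("\r?\n")) {
            if (line.isEmpty()) {
                break; // a blank line ends the header block
            }
            if (line.regionMatches(true, 0, HEADER, 0, HEADER.length())) {
                String value = line.substring(HEADER.length()).trim();
                int semi = value.indexOf(';');
                String type = semi < 0 ? value : value.substring(0, semi);
                return type.trim().toLowerCase(Locale.ROOT);
            }
        }
        return "text/plain"; // no header found: fall back to the default
    }

    public static void main(String[] args) {
        String healthy = "Content-Type: multipart/appledouble; boundary=\"----=X\"\n\n(nested parts)";
        String broken = "\n(nested parts, but no header to say so)";
        System.out.println(contentTypeOf(healthy));
        System.out.println(contentTypeOf(broken));
    }
}
```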

Wednesday, August 16, 2006

Slovenia - The Game!

At last, the Flash app you've all been waiting for - Slovenia, The Game! It's a very cute, virtual community based around various beauty spots in Slovenia. We went there last summer for a week, and completely fell in love with the place. It's just stupidly beautiful - everywhere you look, your jaw drops, and this quirky little game just brought it all back.... I'm sitting here with a silly grin on my face and a wistful look in my eye. Ah, memories...

by the way - you seem to get called "pedr" a lot - pedr loosely translates as "fag"

Monday, August 14, 2006

How to get a job in Silicon Valley

Good-old Guy Kawasaki has an endearingly cynical but still-so-true guide to How To Get Hired In Silicon Valley. Well worth a read, and just as applicable outside California. Not that I've ever applied for a job in Silicon Valley (yet), but it certainly brought a wry smile to my face.

I've read many CVs and conducted many interviews, and it's amazing just how easy some people make it for you to put their CV on the "no" pile. I know we're now living in the txt-spk generation, but seriously - if you can't spell or punctuate on your CV when you're promoting yourself, how well are you going to promote the company? Interviewers are busy people, and will be looking for any reason to say no - spell well, communicate well, keep it short, and if you get an interview, read and learn from Mr Kawasaki.

Friday, August 11, 2006

What do atoms look like?

....they look like THIS - Physics News Graphics have published a field-ion microscope image of the sharpest man-made object ever produced - a needle with a tip that's just a single tungsten atom. The thing I love about this image is the way some of the atoms look like blobs of mercury - because they moved during the 1-second-long imaging process.

Friday, August 04, 2006

Java Programmers are the Erotic Furries of Programming

Nice chart of smugness here on Some Guy Ranting

....but where would CF programmers sit on this chart?

Thursday, August 03, 2006

Demos is live

Nice to see that Headshift have got the new Demos site live. This was the last site I worked on at Headshift, and there are quite a few aspects to it of which I - and m'former colleagues Neil Roberts and Andy Birchall - feel quite proud.

It's architected with the "Everything-is-a-Item" approach that I learned from my Headshift predecessor, erstwhile colleague and all-round-good-egg Matt Perdeaux, which opens a lot of doors in terms of code reuse and making the most of your content.

Put simply, everything on the site - blog posts, comments, people, users, CMS pages - has the basic common data such as title, summary, and full description abstracted to an Items table, to which you can join as required in your SQL. All links between items and other entities (e.g. tags) are done at the Items level. This makes things like a global search across all content trivially easy (at least in theory - doesn't always work out that way of course... ) and means that once you've implemented tagging for, say, blog posts, you can have tagging on everything else as well with virtually zero extra effort - because everything is-a Item.
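To make that a bit more concrete, here's a rough sketch of the idea using Python and an in-memory SQLite database. To be clear, this is NOT the Demos schema - the table names, column names and sample rows are all invented for the illustration - but it shows the shape of the thing: the common fields live in one items table, the type-specific tables hang off it, and tags link to items rather than to any particular type.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny 'everything is-a Item' schema (all names hypothetical)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE items (id INTEGER PRIMARY KEY, item_type TEXT,
                            title TEXT, summary TEXT);
        -- type-specific tables only hold what is NOT common to all items
        CREATE TABLE blog_posts (item_id INTEGER REFERENCES items(id), published TEXT);
        CREATE TABLE people (item_id INTEGER REFERENCES items(id), email TEXT);
        -- tags link to items, so every item type gets tagging for free
        CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE item_tags (item_id INTEGER REFERENCES items(id),
                                tag_id INTEGER REFERENCES tags(id));
    """)
    db.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", [
        (1, "blog_post", "Artist and audience", "A post about participation"),
        (2, "person", "Jane Example", "Researcher on participation"),
    ])
    db.execute("INSERT INTO blog_posts VALUES (1, '2006-08-01')")
    db.execute("INSERT INTO people VALUES (2, 'jane@example.org')")
    db.execute("INSERT INTO tags VALUES (1, 'culture')")
    db.executemany("INSERT INTO item_tags VALUES (?, ?)", [(1, 1), (2, 1)])
    return db


def search(db: sqlite3.Connection, term: str) -> list:
    """Global search: one query over items covers every content type."""
    pattern = f"%{term}%"
    return db.execute(
        "SELECT item_type, title FROM items "
        "WHERE title LIKE ? OR summary LIKE ? ORDER BY id",
        (pattern, pattern),
    ).fetchall()


def items_tagged(db: sqlite3.Connection, tag: str) -> list:
    """Everything carrying a tag, whatever kind of item it is."""
    return db.execute(
        "SELECT i.item_type, i.title FROM items i "
        "JOIN item_tags it ON it.item_id = i.id "
        "JOIN tags t ON t.id = it.tag_id "
        "WHERE t.name = ? ORDER BY i.id",
        (tag,),
    ).fetchall()
```

Neither search() nor items_tagged() knows or cares what kinds of item exist, which is the whole point: add a new content type and both keep working without modification.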

As with a lot of systems I've written over the years, it gets a little frustrating that some of the nicest bits are hidden away from public view in the admin interfaces. But you'll just have to take my word for it that there are lots of self-contained AJAX-driven custom tags that do things like tag administration and item-to-item linkages, which only had to be written once, and could be brought into any form for any kind of item and would work straight away - because everything is-a Item.

Other aspects of the site which give me that warm fluffy feeling inside include the friendly url scheme. The site is written with a Fusebox front-end, but there's a very thin layer above that which translates human-friendly urls into fusebox-friendly urls. This is done with Apache mod_rewrite, translating a url like

http://www.demos.co.uk/projects/demoswebsite/blog/artistaudience

into :

http://www.demos.co.uk/projects.cfm?sParams=/demoswebsite/blog/artistaudience

projects.cfm then parses the sParams string into its expected parameters, retrieves any referenced items into request scope, checks permissions, works out the appropriate fuseaction, and passes on to the FB framework by including index.cfm.
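For the curious, the translation and the first bit of parsing look roughly like this when sketched in Python. The regular expression stands in for the mod_rewrite rule (I'm not reproducing the real rule here), and the segment names project / section / item are purely my guess for the purposes of illustration - the real projects.cfm has its own idea of what each parameter means.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Stand-in for the mod_rewrite rule: /projects/<anything> -> projects.cfm?sParams=/<anything>
REWRITE = re.compile(r"^/projects(/.*)$")

# Hypothetical meaning of each path segment, for illustration only
SEGMENT_NAMES = ["project", "section", "item"]


def rewrite(path: str) -> str:
    """Translate a human-friendly path into the fusebox-friendly form."""
    match = REWRITE.match(path)
    if match is None:
        return path  # not a projects url, leave it alone
    return f"/projects.cfm?sParams={match.group(1)}"


def parse_sparams(rewritten: str) -> dict:
    """Split the sParams string into named parameters."""
    query = parse_qs(urlsplit(rewritten).query)
    segments = [s for s in query["sParams"][0].split("/") if s]
    return dict(zip(SEGMENT_NAMES, segments))


if __name__ == "__main__":
    target = rewrite("/projects/demoswebsite/blog/artistaudience")
    print(target)
    print(parse_sparams(target))
```

The nice thing about keeping this layer so thin is that the Fusebox application underneath never needs to know the friendly urls exist.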

It would make a good CFUG case study, I reckon. (Hey Matt - fancy another vaudevillian double-act? Gaw blimey, roll out the barrel guvnor...) Mind you, I'm not sure if it's appropriate for me to reveal the technical innards of a system when I don't work for the company anymore... hmm... over to you then Neil ;)

Friday, July 28, 2006

The Long Tail Ain't So Long?

The "Long Tail" is an oft-quoted theory that underpins many a Web 2.0 business model - the idea that the low-demand niche products at the esoteric end of an inventory can be aggregated into a collection of sales that rival or exceed the more popular products. In other words,

"A very, very big number (the products in the Tail) multiplied by a relatively small number (the sales of each) is still equal to a very, very big number. And, again, that very, very big number is only getting bigger."


The idea rapidly took off, as David Hornik notes in Where's The Money In The Long Tail?

"Six months ago there was barely a pitch I heard that didn't include a slide entitled "Long Tail" or "The Long Tail of [fill in the vertical]," with the obligatory long tail curve. Impressively, it has taken less than a year of entrepreneurs explicitly referencing and explaining the Long Tail before it has become so well recognized and understood that it need only be implicated in passing without the same sort of fanfare as it used to receive"


Guy Kawasaki has also heard so many long-tail pitches that he's produced a cynic's checklist for Long Tail implementations.

However, the figures behind the idea may not quite add up after all - or at least, the figures have been exaggerated and have doubtless been subject to a kind of Chinese Whispers - after all, when everyone concerned has a vested interest in a meme taking hold, it's not difficult to see how "I Want To Believe" can turn into "I Believe".

More details from everyone's favourite IT red-top scandal sheet, The Register, at The Long Tail's maths begin to crumble :

The Long Tail, then, isn't snake oil. Rather its restorative properties have been exaggerated. Who wants to let facts get in the way of a buzzword bestseller good thesis?

Tuesday, July 18, 2006

The ultimate social software...?

...or maybe that should be hardware - Channel 4 to screen public jerk-a-thon

I'm going to leave that without any comment. I think the punnage is bad enough as it is.

(wince) I'll get me coat...

Interoperability and Monetising Web 2.0

Following on from my earlier post about automating the build process, I thought I'd add a bit more about interoperability.

It's this commitment to interoperability that I love about the OSS movement, more than any of the ideals and ideologies and motivations behind it. In general, it seems that commercial software tends to be designed to interoperate only with other tools from the same vendor (yes, Microsoft, I'm looking at you), whereas OSS tools tend to like to interoperate with just about anything they can. Of course, that's a sweeping generalisation, but I'm talking trends rather than specifics here. Java itself may not be fully Open source yet, but it's getting there.

Interoperability is also one of the big changes in Web 2.0, much more significant in my eyes than whizz-bang AJAX interfaces. The first dotcom boom was all about control and ownership of information. So many of our business conflicts and process difficulties at Rocom on the Data eXchange project (gotta love those great-big-feck-off X's in project names) boiled down, in the end, to a question of "who owns this information?" Web 1.0 business models were all about monetising content, getting people to pay for access to information, locking it down and charging for access. You want my product data? Sure, but it'll cost you X per Y products...

Web 2.0, on the other hand, is all about openness, of information and APIs. You want my product data? Sure! You want to help sell my products by showing them on your site, of course you can! RSS or Atom? Do you want a SOAP-enabled web service with that, so you can write a desktop client? No problem... API keys are over there, just by the chocolate sprinkles... However, providing syndication and API services can be costly, both in terms of bandwidth and server resources. So what's the incentive?

The question of how to effectively monetise a Web 2.0 business is yet to be fully answered, of course. The famous "Three A's" business model - AJAX, AdSense and Arrogance - is really just the same as the classic Web 1.0 business plans of "we'll get 10 million hits a month, and generate all our revenue from advertising", and it's not really any more viable now than it was then.

As ads become more-and-more intrusive (those pop-over Flash ads that appear right in front of the content you're reading just make me want to run through a shopping mall with an Uzi, I'm afraid - sorry to any Flash developers who make those for a living that might be reading) - ad-blocking tools become more and more sophisticated and widely-used, and the arms race continues, just like spam. The only media that can survive solely on ad revenue are the traditional print media, where blocking the ads is either not feasible, or not worth the effort. Even TV ad revenue is declining (third video down) in the face of TiVo and other similar products - advertisers are unwilling to pay for face-time with viewers who effectively aren't there anymore.

On the other hand, web-based mash-up services that depend entirely on the availability of third-party services like Flickr or Google Maps, or even del.icio.us, are also building their houses on sand. The systems analyst side of me says that if you build your business model to depend 100% on a third-party who provides their service for free, then you need to rethink your business model!

But it's here that I think the big service providers may be missing a trick. A basic Flickr account costs nothing, and gives you a taster of the service, enough to make you want to sign up for a low-cost Pro account. But how about providing another tier of service above that, for the mash-up businesses that depend on availability?

Why not provide a cordoned-off cluster of high-availability servers with a dedicated support team, available only to a dependent business that's willing to pay a premium because they depend on it?

I haven't done the sums myself, I admit, but it's another way of monetising your Web 2.0 services, and provides a revenue stream to get some payback for your openness.

Now That's What I Call A Compilation, Vol 1

One of the things I'm having to get used to all over again in my mutation from a CF guy to a Java guy, is having an intervening compilation stage in between making my changes and testing them. It's how I started out, of course, going back through C/C++ and right back to COBOL, but in the 6yrs I've been doing CF, I'd got kind of comfy with making a change, ALT+TABbing to the browser and hitting F5, and seeing the effects straight away. And if there was any issue, just sticking a <CFDUMP><CFABORT> in and hitting F5 again.

But on an enterprise Java system with around 2000 classes, the build time can easily be a couple of minutes. This can get quite irritating at times. All the time I'm waiting for the compiler, I can feel my mental logic tree (e.g. "if X happens, then my assumption about Y was correct which means that Z shouldn't happen, so if I see THAT then I need to think again about A, B and C....") fading out of focus and getting replaced with trivia.... or a mental picture of a nice big steaming hot cappuccino... or ham slapping, or whatever...

Luckily, there are compensations in the Java world which make up for the extra build step.

Having a fully-featured debugger helps a lot. We use IntelliJ IDEA as our IDE (Eclipse just gets too slow, unfortunately) and the debugger in that is excellent.

You can step forwards AND backwards through your code, evaluating expressions and variables and function calls at any point, getting a tree view of all in-scope variables' and properties' values, and you can change the value of any simple variable at any time.

This came in SO handy the other day when I was trying to parse a MIME multipart message generated by AppleMail, which is notoriously bad at handling attachments in a standards-compliant way. Narrowing it down to one particular header, stepping backwards and forwards over the exception-throwing line, changing one character at a time until I'd cracked it..... it almost made me forget the good-old <CFDUMP><CFABORT> combo - almost!

The other thing I'm appreciating more and more is that with a little effort, and a liberal sprinkling of creamy Open-Source goodness, this build stage can be automated to a remarkable degree.

We have an ANT script to perform the compilation, compile bindings, package classes into jars, put everything together in the right way for whichever app server you are deploying onto, etc etc. We use Subversion for version control and Trac for issue tracking, which work together fantastically well.

If you're cunning, you can also set things up so that every time a revision is committed to the SVN repository, a full build is performed on the repository server, and all the unit tests run in the background, with any errors getting automatically emailed to the development team - so you know if anything you've committed has broken something else. Maybe this could be achieved with CFCUnit, I don't know - anyone tried it?
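The commit-triggered build boils down to something like the sketch below - in Python rather than our actual setup, and with a fair few assumptions: the "ant test" target, the email addresses and the function name are all invented for the example, and the actual sending of the mail (smtplib or similar) is left out. The one thing that is standard is that a Subversion post-commit hook gets handed the repository path and the new revision number as its arguments.

```python
import subprocess
from email.message import EmailMessage


def build_and_report(revision: str, run=subprocess.run):
    """Run the build for a revision; return a failure email, or None if all is well.

    `run` is injectable so the function can be exercised without ant installed.
    """
    # "ant test" is a hypothetical target that compiles and runs the unit tests
    result = run(["ant", "test"], capture_output=True, text=True)
    if result.returncode == 0:
        return None  # build and tests passed, nothing to report

    msg = EmailMessage()
    msg["Subject"] = f"Build broken at r{revision}"
    msg["From"] = "build@example.org"      # placeholder address
    msg["To"] = "dev-team@example.org"     # placeholder address
    msg.set_content(result.stdout + result.stderr)
    return msg


if __name__ == "__main__":
    import sys

    # Subversion calls post-commit hooks as: post-commit REPOS-PATH REVISION
    report = build_and_report(sys.argv[2])
    if report is not None:
        print(report)  # a real hook would hand this to an SMTP server instead
```

Because the hook only runs after the commit has gone in, it can't stop a bad revision from landing - it just makes sure everyone hears about it within a couple of minutes.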

Thursday, July 13, 2006

Faking a Left Outer Join in Query-of-Queries

Just picked up on Rip's post about Query-of-Queries tricks and thought I'd share a trick I came up with some months ago.

One of the big things missing from Queries-of-Queries, IMHO, is support for Left Outer Joins. You can join two queries together using an "old-style" implied join like so:

<cfquery dbtype="query">
SELECT (fields)
FROM myQuery1, myQuery2
WHERE myQuery1.fooID = myQuery2.fooID
</cfquery>


...but this "implied join" syntax is only an INNER JOIN - i.e. it will only return rows where both queries have matching rows.

Often, you need the functionality of a LEFT OUTER JOIN. For the example above, a left outer join would return a row for each record in myQuery1, plus each matching record in myQuery2 if a matching row exists, or NULL in the myQuery2 fields if no matching row exists.

For instance, you might want to list all products, with the order numbers of any orders for each product, but if the product has no orders you would still want it to appear in the query. This is a perfect situation for a LEFT OUTER JOIN.

Unfortunately, query-of-queries only supports the implied join syntax, therefore it doesn't support left outer joins. But left outer joins are just so mind-buggeringly useful that I had to find a way to fake them.... hyukhyukhyuk...

Q-of-Q does support the UNION construct, for joining two result sets together. By default, a UNION filters out duplicate rows, unless you specify it as a UNION ALL - which returns all rows including duplicates.

So you do one q-of-q to get all the rows which are matched, and then another q-of-q to get all the rows in your "left" query that DON'T have a matching row in the "right" query, and you UNION ALL them together. There's a little bit of fun you have to go through in order to work out the columns in the "right" query that aren't in the "left" query, and fill them with "NULL" values - especially as you have to work around the fact that you can't really do NULLs....

...anyway, here we go. This code has been written for MX 6.1, and works pretty well on that, and on 7. I'm sure someone could make it more bulletproof if they really wanted to, and it would be fairly simple to introduce support for explicitly-named column types, but frankly, I couldn't be arsed :) Feel free to tinker with it at your leisure.




<!------------------------------------------------->
<!--- leftOuterJoin --->
<!--- Emulates a left outer join in QofQ's --->
<!--- @author : Al Davidson --->
<!------------------------------------------------->

<cffunction name="leftOuterJoin" access="public" returntype="query" output="yes">
    <cfargument name="qry_left" type="query" required="yes" />
    <cfargument name="qry_right" type="query" required="yes" />
    <cfargument name="sLeftJoinColumn" type="string" required="true" />
    <cfargument name="sRightJoinColumn" type="string" required="true" />
    <cfargument name="sOrderBy" type="string" required="false" default="" />

    <cfscript>
        // var'ed struct so that we don't have to var every local variable
        var stLocals = structNew();

        // check for an empty left query
        if( arguments.qry_left.recordcount EQ 0 ){
            return arguments.qry_left;
        }

        // get all the fields in qry_right that AREN'T in qry_left
        stLocals.lstRightColumns = getUnMatchedListElems( lstFilter=arguments.qry_right.columnlist, lstCompareTo=arguments.qry_left.columnlist );
    </cfscript>

    <!--- get the distinct join values present in qry_right --->
    <cfquery name="stLocals.qryDistinct" dbtype="query">
        SELECT DISTINCT (#arguments.sRightJoinColumn#) AS sValues
        FROM qry_right
    </cfquery>
    <cfset stLocals.lstRValues = valuelist( stLocals.qryDistinct.sValues ) />
    <cfset stLocals.sEmptyClause = "0" />

    <!--- numeric or string values? --->
    <cfif NOT isNumeric( stLocals.qryDistinct.sValues[1] )>
        <cfset stLocals.lstRValues = listQualify( stLocals.lstRValues, "'" ) />
        <cfset stLocals.sEmptyClause = "''" />
    </cfif>

    <cfif listLen( stLocals.lstRValues ) EQ 0 >
        <cfset stLocals.lstRValues = stLocals.sEmptyClause />
    </cfif>

    <!--- try and guess the right type for each column in qry_right --->
    <!--- by getting the first element in qry_right for each --->
    <cfset stLocals.lstNullClause = "" />

    <cfloop list="#stLocals.lstRightColumns#" index="stLocals.sCol">
        <!--- "project_code" is special-cased: it is always treated as a string, --->
        <!--- even when its first value happens to look numeric --->
        <cfif isNumeric(arguments.qry_right[stLocals.sCol][1])
                AND compareNoCase( stLocals.sCol, "project_code" )>
            <cfset stLocals.sElem = "0 AS #stLocals.sCol#" />
        <cfelse>
            <cfset stLocals.sElem = "'' AS #stLocals.sCol#" />
        </cfif>
        <cfset stLocals.lstNullClause = listAppend( stLocals.lstNullClause, stLocals.sElem ) />
    </cfloop>

    <!--- matched rows UNION ALL unmatched left rows padded with placeholder values --->
    <cfquery name="stLocals.qryUnion" dbtype="query">
        <cfif arguments.qry_right.recordcount GT 0>
            SELECT qry_left.*,
                #stLocals.lstRightColumns#
            FROM qry_left, qry_right
            WHERE qry_left.#arguments.sLeftJoinColumn# = qry_right.#arguments.sRightJoinColumn#

            UNION ALL
        </cfif>
        SELECT qry_left.*,
            #stLocals.lstNullClause#
        FROM qry_left
        <cfif arguments.qry_right.recordcount GT 0>
            WHERE qry_left.#arguments.sLeftJoinColumn# NOT IN( #stLocals.lstRValues# )
        </cfif>
        <cfif len(Trim(arguments.sOrderBy))>
            ORDER BY #arguments.sOrderBy#
        </cfif>
    </cfquery>

    <cfreturn stLocals.qryUnion />
</cffunction>


<!--------------------------------------------------------->
<!--- getUnMatchedListElems --->
<!--- Returns a list containing all elements of --->
<!--- lstFilter that are NOT in lstCompareTo --->
<!--- @author : Al Davidson --->
<!--------------------------------------------------------->

<cffunction name="getUnMatchedListElems" access="public" returntype="string" output="no">
    <cfargument name="lstFilter" type="string" required="true" />
    <cfargument name="lstCompareTo" type="string" required="true" />

    <cfset var stLocals = structNew() />
    <cfset stLocals.lstReturn = "" />
    <cfloop list="#arguments.lstFilter#" index="stLocals.sThisElem">
        <cfif listFindNoCase( arguments.lstCompareTo, stLocals.sThisElem ) EQ 0>
            <cfset stLocals.lstReturn = listAppend( stLocals.lstReturn, stLocals.sThisElem ) />
        </cfif>
    </cfloop>
    <cfreturn stLocals.lstReturn />
</cffunction>
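
If you'd like to convince yourself that the trick really does give the same answer as a proper left outer join, here's a small sketch of the same two-step idea in plain SQL, run through Python's sqlite3 module. This is not the CF function above, just the technique in miniature: the products / orders tables and their contents are invented (borrowing the example from earlier in the post), the placeholder for "no order" is 0 as in the CF version, and I've used a NOT IN subquery where the CF code builds a list of values.

```python
import sqlite3

# Step 1: implied (inner) join for the matched rows.
# Step 2: UNION ALL the left-hand rows with no match, padded with a placeholder.
FAKED_LEFT_JOIN = """
    SELECT p.id, p.name, o.order_no
    FROM products p, orders o
    WHERE p.id = o.product_id

    UNION ALL

    SELECT p.id, p.name, 0
    FROM products p
    WHERE p.id NOT IN (SELECT product_id FROM orders)

    ORDER BY 1, 3
"""

# The real thing, with NULLs mapped to the same 0 placeholder for comparison.
REAL_LEFT_JOIN = """
    SELECT p.id, p.name, COALESCE(o.order_no, 0)
    FROM products p
    LEFT OUTER JOIN orders o ON p.id = o.product_id
    ORDER BY 1, 3
"""


def demo_db() -> sqlite3.Connection:
    """Three products, one of which (Gadget) has no orders at all."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE products (id INTEGER, name TEXT);
        CREATE TABLE orders (order_no INTEGER, product_id INTEGER);
        INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget'), (3, 'Doohickey');
        INSERT INTO orders VALUES (100, 1), (101, 1), (102, 3);
    """)
    return db


if __name__ == "__main__":
    db = demo_db()
    for row in db.execute(FAKED_LEFT_JOIN):
        print(row)
```

The UNION ALL matters here: a plain UNION would silently collapse genuinely duplicated rows, which a real left outer join would keep.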

Monday, July 10, 2006

OT: That Was The Week That Was...

Phew! Quite a week, last week. Just to recap, I did my first week at Trampoline Systems as a Java guy (more to come on that later), and was ready to round things off nicely with a weekend of climbing in the Peak District. But as I got to Highbury & Islington station at 6pm on Friday, to meet Lisa and the rest of the guys, I was coming out of the WAGN (overground rail) and passing the Victoria Line platform on my right, about 5m away from me, when I saw something out of the corner of my eye, then heard a very loud bang followed instantly by loud screaming and people staggering out of the platform holding their heads and looking shocked.

Now I don't know about you, but if you're on the tube in peak rush hour, on the first anniversary of the July 7th attacks, and you hear a loud bang followed by screaming, you fear the worst. That was my first thought. It was followed very quickly by "but I'm fine, and that bang wasn't big enough to have been a big bomb - maybe a gunshot?"

Then the human sea spilling out of the platform started saying "he went under...he just went under the train..." and I knew what had happened. The bang had been him hitting the train. Someone said "he was pushed!"

Amid all the screams and sobs, the station staff and police - already on a high alert by virtue of the date - were already running down the escalators in droves, and I felt an overwhelming need for fresh air. As I ascended the escalators, I could see more and more police running in, people at the top who had heard the bang and the screams looking deathly pale, not knowing what was going on, one woman halfway down the "down" escalator collapsed in uncontrollable sobbing and trying to crawl back up, people craning their necks to see, others trying to hide their eyes. By the time I got to the top of the escalator I could hear the sirens of police cars and ambulances racing to the scene, and as I stepped out of the station blinking into the sunlight, the rapid putter-putter-putter of incoming helicopters overhead.

It was all a bit crazy. I was shaking and unsteady on my feet, despite nothing "bad" actually happening to me, just that initial fraction of a second as I heard the bang and the screams and my heart leapt into my mouth, and there were many more people worse affected than me. All through the weekend, in quiet moments, if I closed my eyes, I heard that bang followed by the screams, and I saw that distraught woman trying to crawl back up the escalator. Are we really so conditioned into a state of fear that the mere possibility that an attack has taken place can have such a profound effect on us? I thought we were over that by now.

It turns out that the poor guy WAS pushed in front of the train. A 20yr-old man was arrested over the weekend and will go on trial today.