Friday, July 28, 2006

The Long Tail Ain't So Long?

The "Long Tail" is an oft-quoted theory that underpins many a Web 2.0 business model - the idea that the low-demand niche products at the esoteric end of an inventory can be aggregated into a collection of sales that rival or exceed the more popular products. In other words,

"A very, very big number (the products in the Tail) multiplied by a relatively small number (the sales of each) is still equal to a very, very big number. And, again, that very, very big number is only getting bigger."


The idea rapidly took off, as David Hornik notes in Where's The Money In The Long Tail?

"Six months ago there was barely a pitch I heard that didn't include a slide entitled "Long Tail" or "The Long Tail of [fill in the vertical]," with the obligatory long tail curve. Impressively, it has taken less than a year of entrepreneurs explicitly referencing and explaining the Long Tail before it has become so well recognized and understood that it need only be implicated in passing without the same sort of fanfare as it used to receive"


Guy Kawasaki has also heard so many long-tail pitches that he's produced a cynic's checklist for Long Tail implementations.

However, the figures behind the idea may not quite add up after all - or at least, they have been exaggerated and have doubtless been subject to a kind of Chinese Whispers. After all, when everyone concerned has a vested interest in a meme taking hold, it's not difficult to see how "I Want To Believe" can turn into "I Believe".

More details from everyone's favourite IT red-top scandal sheet, The Register, at The Long Tail's maths begin to crumble:

The Long Tail, then, isn't snake oil. Rather its restorative properties have been exaggerated. Who wants to let facts get in the way of a buzzword bestseller good thesis?

Tuesday, July 18, 2006

The ultimate social software...?

...or maybe that should be hardware - Channel 4 to screen public jerk-a-thon

I'm going to leave that without any comment. I think the punnage is bad enough as it is.

(wince) I'll get me coat...

Interoperability and Monetising Web 2.0

Following on from my earlier post about automating the build process, I thought I'd add a bit more about interoperability.

It's this commitment to interoperability that I love about the OSS movement, more than any of the ideals and ideologies and motivations behind it. In general, it seems that commercial software tends to be designed to interoperate only with other tools from the same vendor (yes, Microsoft, I'm looking at you), whereas OSS tools tend to like to interoperate with just about anything they can. Of course, that's a sweeping generalisation, but I'm talking trends rather than specifics here. Java itself may not be fully open source yet, but it's getting there.

Interoperability is also one of the big changes in Web 2.0, much more significant in my eyes than whizz-bang AJAX interfaces. The first dotcom boom was all about control and ownership of information. So many of our business conflicts and process difficulties at Rocom on the Data eXchange project (gotta love those great-big-feck-off X's in project names) boiled down, in the end, to a question of "who owns this information?" Web 1.0 business models were all about monetising content, getting people to pay for access to information, locking it down and charging for access. You want my product data? Sure, but it'll cost you X per Y products...

Web 2.0, on the other hand, is all about openness, of information and APIs. You want my product data? Sure! You want to help sell my products by showing them on your site? Of course you can! RSS or Atom? Do you want a SOAP-enabled web service with that, so you can write a desktop client? No problem... API keys are over there, just by the chocolate sprinkles... However, providing syndication and API services can be costly, both in terms of bandwidth and server resources. So what's the incentive?

The question of how to effectively monetise a Web 2.0 business is yet to be fully answered, of course. The famous "Three A's" business model - AJAX, AdSense and Arrogance - is really just the same as the classic Web 1.0 business plans of "we'll get 10 million hits a month, and generate all our revenue from advertising", and it's not really any more viable now than it was then.

As ads become more and more intrusive (those pop-over Flash ads that appear right in front of the content you're reading just make me want to run through a shopping mall with an Uzi, I'm afraid - sorry to any Flash developers reading who make those for a living), ad-blocking tools become more and more sophisticated and widely used, and the arms race continues, just like spam. The only media that can survive solely on ad revenue are the traditional print media, where blocking the ads is either not feasible or not worth the effort. Even TV ad revenue is declining (third video down) in the face of TiVo and other similar products - advertisers are unwilling to pay for face-time with viewers who effectively aren't there anymore.

On the other hand, web-based mash-up services that depend entirely on the availability of third-party services like Flickr or Google Maps, or even del.icio.us, are also building their houses on sand. The systems analyst side of me says that if you build your business model to depend 100% on a third-party who provides their service for free, then you need to rethink your business model!

But it's here that I think the big service providers may be missing a trick. A basic Flickr account costs nothing, and gives you a taster of the service, enough to make you want to sign up for a low-cost Pro account. But how about providing another tier of service above that, for the mash-up businesses that depend on availability?

Why not provide a cordoned-off cluster of high-availability servers with a dedicated support team, available only to a dependent business that's willing to pay a premium because they depend on it?

I haven't done the sums myself, I admit, but it's another way of monetising your Web 2.0 services, and provides a revenue stream to get some payback for your openness.

Now That's What I Call A Compilation, Vol 1

One of the things I'm having to get used to all over again in my mutation from a CF guy to a Java guy is having an intervening compilation stage in between making my changes and testing them. It's how I started out, of course, going back through C/C++ and right back to COBOL, but in the six years I've been doing CF, I'd got kind of comfy with making a change, ALT+TABbing to the browser and hitting F5, and seeing the effects straight away. And if there was any issue, just sticking a <CFDUMP><CFABORT> in and hitting F5 again.

But on an enterprise Java system with around 2000 classes, the build time can easily be a couple of minutes. This can get quite irritating at times. All the time I'm waiting for the compiler, I can feel my mental logic tree (e.g. "if X happens, then my assumption about Y was correct which means that Z shouldn't happen, so if I see THAT then I need to think again about A, B and C....") fading out of focus and getting replaced with trivia.... or a mental picture of a nice big steaming hot cappuccino... or ham slapping, or whatever...

Luckily, there are compensations in the Java world which make up for the extra build step.

Having a fully-featured debugger helps a lot. We use IntelliJ IDEA as our IDE (Eclipse just gets too slow, unfortunately) and the debugger in that is excellent.

You can step forwards AND backwards through your code, evaluating expressions and variables and function calls at any point, getting a tree view of all in-scope variables' and properties' values, and you can change the value of any simple variable at any time.

This came in SO handy the other day when I was trying to parse a MIME multipart message generated by AppleMail, which is notoriously bad at handling attachments in a standards-compliant way. Narrowing it down to one particular header, stepping backwards and forwards over the exception-throwing line, changing one character at a time until I'd cracked it..... it almost made me forget the good-old <CFDUMP><CFABORT> combo - almost!

The other thing I'm appreciating more and more is that with a little effort, and a liberal sprinkling of creamy Open-Source goodness, this build stage can be automated to a remarkable degree.

We have an ANT script to perform the compilation, compile bindings, package classes into jars, put everything together in the right way for whichever app server you are deploying onto, etc etc. We use Subversion for version control and Trac for issue tracking, which work together fantastically well.

If you're cunning, you can also set things up so that every time a revision is committed to the SVN repository, a full build is performed on the repository server, and all the unit tests run in the background, with any errors getting automatically emailed to the development team - so you know if anything you've committed has broken something else. Maybe this could be achieved with CFCUnit, I don't know - anyone tried it?
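To make the commit-triggered build a bit more concrete, here is a rough Python sketch of what such a hook could look like. It is not our actual setup: the Ant target, the working-copy path, the email addresses and the local mail server are all assumptions made up for illustration. The one thing it relies on from Subversion itself is that a post-commit hook is invoked with the repository path and the new revision number as its arguments.

```python
import smtplib
import subprocess
from email.message import EmailMessage

# All of these values are hypothetical placeholders, not a real configuration.
WORKING_COPY = "/srv/build/checkout"
BUILD_COMMAND = ["ant", "-f", WORKING_COPY + "/build.xml", "test"]
SENDER = "build@example.com"
TEAM = ["dev-team@example.com"]


def run_build(command):
    """Run a build command and return (succeeded, combined output)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def failure_email(revision, log):
    """Build the message that tells the team which revision broke the build."""
    msg = EmailMessage()
    msg["Subject"] = f"Build broken by r{revision}"
    msg["From"] = SENDER
    msg["To"] = ", ".join(TEAM)
    msg.set_content(log)
    return msg


def post_commit(revision):
    """Update the working copy to the new revision, build it, email on failure."""
    subprocess.run(["svn", "update", "--revision", str(revision), WORKING_COPY])
    succeeded, log = run_build(BUILD_COMMAND)
    if not succeeded:
        # Assumes a mail server is listening on localhost.
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(failure_email(revision, log))
    return succeeded
```

A hook script in the repository's hooks directory would then call post_commit() with the revision number it receives as its second argument. Keeping run_build() and failure_email() separate means each piece can be tried out by hand before wiring it into the repository.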

Thursday, July 13, 2006

Faking a Left Outer Join in Query-of-Queries

Just picked up on Rip's post about Query-of-Queries tricks and thought I'd share a trick I came up with some months ago.

One of the big things missing from Queries-of-Queries, IMHO, is support for Left Outer Joins. You can join two queries together using an "old-style" implied join like so:

<cfquery dbtype="query">
SELECT (fields)
FROM myQuery1, myQuery2
WHERE myQuery1.fooID = myQuery2.fooID
</cfquery>


...but this "implied join" syntax is only an INNER JOIN - i.e. it will only return rows where both queries have matching rows.

Often, you need the functionality of a LEFT OUTER JOIN. For the example above, a left outer join would return a row for each record in myQuery1, plus each matching record in myQuery2 if a matching row exists, or NULL in the myQuery2 fields if no matching row exists.

For instance, you might want to list all products, with the order numbers of any orders for each product, but if the product has no orders you would still want it to appear in the query. This is a perfect situation for a LEFT OUTER JOIN.

Unfortunately, query-of-queries only supports the implied join syntax, therefore it doesn't support left outer joins. But left outer joins are just so mind-buggeringly useful that I had to find a way to fake them.... hyukhyukhyuk...

Q-of-Q does support the UNION construct, for joining two result sets together. By default, a UNION filters out duplicate rows, unless you specify it as a UNION ALL - which returns all rows including duplicates.

So you do one q-of-q to get all the rows which are matched, and then another q-of-q to get all the rows in your "left" query that DON'T have a matching row in the "right" query, and you UNION ALL them together. There's a little bit of fun you have to go through in order to work out the columns in the "right" query that aren't in the "left" query, and fill them with "NULL" values - especially as you have to work around the fact that you can't really do NULLs....

...anyway, here we go. This code has been written for MX 6.1, and works pretty well on that, and on 7. I'm sure someone could make it more bulletproof if they really wanted to, and it would be fairly simple to introduce support for explicitly-named column types, but frankly, I couldn't be arsed :) Feel free to tinker with it at your leisure.




<!------------------------------------------------->
<!--- leftOuterJoin --->
<!--- Emulates a left outer join in QofQ's --->
<!--- @author : Al Davidson --->
<!------------------------------------------------->

<cffunction name="leftOuterJoin" access="public" returntype="query" output="yes">
    <cfargument name="qry_left" type="query" required="yes" />
    <cfargument name="qry_right" type="query" required="yes" />
    <cfargument name="sLeftJoinColumn" type="string" required="true" />
    <cfargument name="sRightJoinColumn" type="string" required="true" />
    <cfargument name="sOrderBy" type="string" required="false" default="" />

    <cfscript>
        // var'ed struct so that we don't have to var every local variable
        var stLocals = structNew();

        // check for an empty left query
        if( arguments.qry_left.recordcount EQ 0 ){
            return arguments.qry_left;
        }

        // get all the fields in qry_right that AREN'T in qry_left
        stLocals.lstRightColumns = getUnMatchedListElems( lstFilter=arguments.qry_right.columnlist, lstCompareTo=arguments.qry_left.columnlist );

    </cfscript>

    <!--- get the distinct join values present in qry_right --->
    <cfquery name="stLocals.qryDistinct" dbtype="query">
        SELECT DISTINCT (#sRightJoinColumn#) AS sValues
        FROM qry_right
    </cfquery>
    <cfset stLocals.lstRValues = valuelist( stLocals.qryDistinct.sValues ) />
    <cfset stLocals.sEmptyClause = "0" />

    <!--- numeric or string values? --->
    <cfif NOT isNumeric( stLocals.qryDistinct.sValues[1] )>
        <cfset stLocals.lstRValues = listQualify( stLocals.lstRValues, "'" ) />
        <cfset stLocals.sEmptyClause = "''" />
    </cfif>

    <cfif listLen( stLocals.lstRValues ) EQ 0 >
        <cfset stLocals.lstRValues = stLocals.sEmptyClause />
    </cfif>

    <!--- try and guess the right type for each column in qry_right --->
    <!--- by getting the first element in qry_right for each --->
    <cfset stLocals.lstNullClause = "" />

    <cfloop list="#stLocals.lstRightColumns#" index="stLocals.sCol">
        <!--- compareNoCase() returns 0 on a match, so a column named --->
        <!--- project_code is always padded as a string, never as 0 --->
        <cfif isNumeric(arguments.qry_right[stLocals.sCol][1])
                AND compareNoCase( stLocals.sCol, "project_code" )>
            <cfset stLocals.sElem = "0 AS #stLocals.sCol#" />
        <cfelse>
            <cfset stLocals.sElem = "'' AS #stLocals.sCol#" />
        </cfif>
        <cfset stLocals.lstNullClause = listAppend( stLocals.lstNullClause, stLocals.sElem ) />
    </cfloop>

    <!--- matched rows UNION ALL unmatched left rows padded with placeholders --->
    <cfquery name="stLocals.qryUnion" dbtype="query">
        <cfif arguments.qry_right.recordcount GT 0>
            SELECT qry_left.*,
                #stLocals.lstRightColumns#
            FROM qry_left, qry_right
            WHERE qry_left.#sLeftJoinColumn# = qry_right.#sRightJoinColumn#

            UNION ALL
        </cfif>
        SELECT qry_left.*,
            #stLocals.lstNullClause#
        FROM qry_left
        <cfif arguments.qry_right.recordcount GT 0>
            WHERE qry_left.#sLeftJoinColumn# NOT IN( #stLocals.lstRValues# )
        </cfif>
        <cfif len(Trim(arguments.sOrderBy))>
            ORDER BY #arguments.sOrderBy#
        </cfif>
    </cfquery>

    <cfreturn stLocals.qryUnion />
</cffunction>


<!--------------------------------------------------------->
<!--- getUnMatchedListElems --->
<!--- Returns a list containing all elements of --->
<!--- lstFilter that are NOT in lstCompareTo --->
<!--- @author : Al Davidson --->
<!--------------------------------------------------------->

<cffunction name="getUnMatchedListElems" access="public" returntype="string" output="no">
    <cfargument name="lstFilter" type="string" required="true" />
    <cfargument name="lstCompareTo" type="string" required="true" />

    <cfset var stLocals = structNew() />
    <cfset stLocals.lstReturn = "" />
    <cfloop list="#arguments.lstFilter#" index="stLocals.sThisElem">
        <cfif listFindNoCase( arguments.lstCompareTo, stLocals.sThisElem ) EQ 0>
            <cfset stLocals.lstReturn = listAppend( stLocals.lstReturn, stLocals.sThisElem ) />
        </cfif>
    </cfloop>
    <cfreturn stLocals.lstReturn />
</cffunction>
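
If the Query-of-Queries workarounds obscure the underlying idea, the same technique can be sketched in a few lines of Python, with lists of dicts standing in for query objects. This is only an illustration of "matched rows UNION ALL unmatched left rows", not a port of the function above: the function name and the sample data are made up, and it pads with None, which Python has and q-of-q effectively doesn't.

```python
def left_outer_join(left, right, left_key, right_key):
    """Emulate a left outer join as (inner join) + (unmatched left rows)."""
    if not left:
        return []

    # Columns that only exist on the right-hand side. Unlike a CF query,
    # an empty list of dicts carries no column names, so an empty "right"
    # contributes no extra columns here.
    left_cols = set(left[0])
    right_only = [col for col in (right[0] if right else {}) if col not in left_cols]

    # Part 1: the implied (inner) join - one row per matching pair.
    matched = [
        {**l_row, **{col: r_row[col] for col in right_only}}
        for l_row in left
        for r_row in right
        if l_row[left_key] == r_row[right_key]
    ]

    # Part 2: left rows whose key never appears on the right,
    # padded with None in the right-only columns.
    right_values = {r_row[right_key] for r_row in right}
    unmatched = [
        {**l_row, **{col: None for col in right_only}}
        for l_row in left
        if l_row[left_key] not in right_values
    ]

    # "UNION ALL": simply concatenate, keeping any duplicates.
    return matched + unmatched


# Hypothetical sample data: every product, with order numbers where they exist.
products = [{"product_id": 1, "name": "Widget"}, {"product_id": 2, "name": "Gadget"}]
orders = [{"order_no": 100, "product_id": 1}, {"order_no": 101, "product_id": 1}]
for row in left_outer_join(products, orders, "product_id", "product_id"):
    print(row)
```

The product with no orders still appears once, with an empty order number, which is exactly the behaviour the implied join on its own cannot give you.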

Monday, July 10, 2006

OT: That Was The Week That Was...

Phew! Quite a week, last week. Just to recap, I did my first week at Trampoline Systems as a Java guy (more to come on that later), and was ready to round things off nicely with a weekend of climbing in the Peak District. But as I got to Highbury & Islington station at 6pm on Friday, to meet Lisa and the rest of the guys, I was coming out of the WAGN (overground rail) and passing the Victoria Line platform on my right, about 5m away from me, when I saw something out of the corner of my eye, then heard a very loud bang followed instantly by loud screaming and people staggering out of the platform holding their heads and looking shocked.

Now I don't know about you, but if you're on the tube in peak rush hour, on the first anniversary of the July 7th attacks, and you hear a loud bang followed by screaming, you fear the worst. That was my first thought. It was followed very quickly by "but I'm fine, and that bang wasn't big enough to have been a big bomb - maybe a gunshot?"

Then the human sea spilling out of the platform started saying "he went under...he just went under the train..." and I knew what had happened. The bang had been him hitting the train. Someone said "he was pushed!"

Amid all the screams and sobs, the station staff and police - already on a high alert by virtue of the date - were already running down the escalators in droves, and I felt an overwhelming need for fresh air. As I ascended the escalators, I could see more and more police running in, people at the top who had heard the bang and the screams looking deathly pale, not knowing what was going on, one woman halfway down the "down" escalator collapsed in uncontrollable sobbing and trying to crawl back up, people craning their necks to see, others trying to hide their eyes. By the time I got to the top of the escalator I could hear the sirens of police cars and ambulances racing to the scene, and as I stepped out of the station blinking into the sunlight, the rapid putter-putter-putter of incoming helicopters overhead.

It was all a bit crazy. I was shaking and unsteady on my feet, despite nothing "bad" actually happening to me, just that initial fraction of a second as I heard the bang and the screams and my heart leapt into my mouth, and there were many more people worse affected than me. All through the weekend, in quiet moments, if I closed my eyes, I heard that bang followed by the screams, and I saw that distraught woman trying to crawl back up the escalator. Are we really so conditioned into a state of fear that the mere possibility that an attack has taken place can have such a profound effect on us? I thought we were over that by now.

It turns out that the poor guy WAS pushed in front of the train. A 20-year-old man was arrested over the weekend and will go on trial today.

Wednesday, July 05, 2006

XP (Home) Client + Linux Domain Controller + Samba Share = a whole world of fun

Just spent two hours getting a Windows XP (Home) laptop to access a Samba share on a Linux server.... FECK ME, that was fun! There was nothing wrong with the share itself, because we could access it fine from a Mac using the same username and password, but could we get Windows XP to authenticate? Could we hell.....

Here's the setup - the Linux box was the domain controller, and it exposed a Workgroup to the Windows Network (under My Network Places -> Entire Network -> Microsoft Windows Network -> ) but the laptop just couldn't authenticate, no matter what we tried.

I'll cut a long, exasperating and expletive-ridden story short, missing out all the entertaining cursing of various rude bits of various animals to various circles of hell, and jump straight to the end. Turns out the problem was that WinXP Home couldn't resolve the domain properly without an LMHOSTS file.

Pretty much everyone who's worked with Windows and the web for any length of time knows about the HOSTS file. This lives in (Windows Root)/System32/drivers/etc and is an extensionless file, mapping host names (e.g. myserver.mydomain.com) to IP addresses. Handy for those occasions when you need multiple domain names mapping to your own PC, for instance when you have a system serving multiple sites from one codebase resolving via the hostname.

The LMHOSTS file is similar, and lives in the same place, but is not quite as straightforward. Its purpose is to tell Windows how to resolve NetBIOS names, and the format is... well... non-trivial. Full information is here : How to Write an LMHOSTS File for Domain Validation and Other Name Resolution Issues

The important entries you'll need are:

10.0.0.1 PDCNAME #PRE #DOM:DOMAIN_NAME
10.0.0.1 "DOMAIN_NAME    \0x1b" #PRE


Replace 10.0.0.1 with the IP address of the Linux domain controller
Replace PDCNAME with the hostname of the Linux domain controller
Replace DOMAIN_NAME with the name of the WORKGROUP

Now for the fiddly bit : on the second line, you need EXACTLY 20 characters between the quotes. Between your domain name and the \0x1b there must be spaces, exactly the right number of spaces, and nothing but spaces. In practice that means padding the name out to 15 characters with spaces, then adding the five characters \0x1b. The MS reference above gives more detail.
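
Counting spaces by hand is error-prone, so here is a small Python sketch that generates both entries with the padding worked out for you. The function name is made up for illustration, and it assumes the breakdown of 15 name characters plus the five literal characters \0x1b; check the result against the MS reference before trusting it.

```python
NETBIOS_NAME_LENGTH = 15          # the name is padded with spaces to this width
DOMAIN_SUFFIX = "\\0x1b"          # typed literally: backslash, 0, x, 1, b


def lmhosts_entries(ip, pdc_name, domain):
    """Return the two LMHOSTS lines for a domain controller, correctly padded."""
    if len(domain) > NETBIOS_NAME_LENGTH:
        raise ValueError("domain name must be 15 characters or fewer")

    # 15 characters of name-plus-spaces, then the 5-character suffix = 20.
    padded = domain.ljust(NETBIOS_NAME_LENGTH) + DOMAIN_SUFFIX
    return [
        f"{ip} {pdc_name} #PRE #DOM:{domain}",
        f'{ip} "{padded}" #PRE',
    ]


# Uses the same placeholder values as the entries above.
for line in lmhosts_entries("10.0.0.1", "PDCNAME", "DOMAIN_NAME"):
    print(line)
```

Paste the two printed lines into the lmhosts file, taking care that your editor doesn't collapse or trim the spaces inside the quotes.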

But once I'd created an extensionless file called lmhosts in the right place, with the right entries, and the right number of spaces, suddenly I could authenticate.

As for the more fundamental question of which person at Microsoft decided on that format of the lmhosts file, where he/she lives, and how to gain access to their bedroom undetected while armed with a large fish, I'll leave that as an exercise for the reader.