Thursday, February 16, 2006

"users" : Reserved Word in Apache mod_rewrite ?

Quick pop quiz : does anyone know if the word "users" is some kind of reserved word in Apache 2.0.55 mod_rewrite on Windows?

The reason I ask is that I have a rewrite rule like this:

RewriteRule ^users/(.+)$ /users.cfm?sParams=$1 [R,L]

- to allow me to rewrite a URL of the form (site)/users/(screenname) into the form (site)/users.cfm?sParams=(screenname)

The L means this is the LAST rule that should be processed, if it matches
The R stands for REDIRECT and means that the rewritten URL appears in the browser location bar - handy for debugging on dev.

The issue I'm having is that if I do this, it doesn't work - it gives me a 404 error.
However, if I change the URL stub to something else - anything else - e.g.

RewriteRule ^donkeys/(.+)$ /users.cfm?sParams=$1 [R,L]

then it works fine.
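
For reference, the regex half of these rules can be checked outside Apache. The following is only a minimal Python sketch, assuming the rule lives in a per-directory context (where the leading slash is stripped before matching) and that Python's re module treats this simple pattern the same way mod_rewrite does. The function name rewrite is made up for illustration and nothing here reproduces Apache's actual request handling.

```python
import re

def rewrite(path, stub):
    """Mimic: RewriteRule ^<stub>/(.+)$ /users.cfm?sParams=$1

    `path` is the request path with the leading slash already removed.
    Returns the rewritten URL, or None if the pattern does not match.
    """
    match = re.match(r"^" + re.escape(stub) + r"/(.+)$", path)
    if match is None:
        return None
    return "/users.cfm?sParams=" + match.group(1)

# Both stubs behave identically as far as the pattern is concerned.
print(rewrite("users/bob", "users"))      # /users.cfm?sParams=bob
print(rewrite("donkeys/bob", "donkeys"))  # /users.cfm?sParams=bob
```

As far as the pattern goes, "users" is no different from "donkeys", which suggests the 404 comes from some other part of the request processing rather than from the regex itself.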

I don't have any file or directory called users in that directory.
The word "users" doesn't appear in any other Apache directives.

In the face of an extremely tight dev budget, I've had to just admit defeat and use a URL stub of "members" instead, but I'd be interested to hear if anyone knows what might be going on here.

Tuesday, February 14, 2006

We Like Apache mod_rewrite

There is a curse.

They say: "may you live in interesting times"

We like Apache mod_rewrite.

It means I live in interesting times.

Wednesday, February 08, 2006

Notes from Josh Schachter's Del.icio.us Presentation

Josh Schachter of del.icio.us just finished giving his opening speech. Some nice insights in there that brought a few wry chuckles from the audience. Here are my notes:

BROWSERS

  • Browsers suck.
  • You will spend a vast amount of time trying to cater for every browser version under the sun.
SCALING
  • You will try to plan for scaling - don't bother! Whatever you predict ahead of time will turn out to be wrong.
  • handy tips about DB setup
  • try to partition your db to get maximum load-balancing across disks, machines, etc...
  • you can configure your DB table indexes to only index part of the table - e.g. if viewing items for a tag, no-one is going to go to page 150,160, etc...
  • all this scaling is necessary because people will come out with all kinds of weird stuff that you never would have thought of, that will break your app in all kinds of ways that you never thought possible.

Apache
  • tuning Apache can be just as important as tuning the DB.
  • put a proxy IN FRONT of Apache, to get e.g. images coming from one server, RSS coming from another, main app coming from another, etc
APIS
  • make your APIs AS EASY AS POSSIBLE to use
  • SOAP & XML-RPC are quite heavyweight - take time to set up
  • del.icio.us has an incredibly simple XML API - you can make a command-line script to do it => lots of people use it.
IDs
  • giving everything a unique id might not be best for scalability
  • don't expose internal ids to outside world, especially if sequential - because some idiot is going to try to script a GET for every single possible ID to try and scrape your db!
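
One way to act on the IDs point is to hand out a random public id alongside the internal sequential one. This is a minimal sketch of that idea, not something from the talk: the function name new_public_id and the in-memory set of existing ids are assumptions for illustration.

```python
import secrets

def new_public_id(existing_ids, nbytes=8):
    """Return a random URL-safe id that is not already in `existing_ids`.

    Random ids cannot be enumerated by counting upwards, unlike
    sequential database keys.
    """
    while True:
        candidate = secrets.token_urlsafe(nbytes)
        if candidate not in existing_ids:
            return candidate

# Hypothetical usage: map public ids to internal sequential ids.
public_to_internal = {}
for internal_id in (1, 2, 3):
    public_to_internal[new_public_id(public_to_internal)] = internal_id
```
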
Features
  • build features that people will actually USE - rather than what they ask for
  • always ask WHY they want a particular feature. That way you get at the core of the problem.
RSS
  • there are lots of really poorly-written RSS readers.
  • caching is vital
URLs
  • design your URLs according to the usage pattern of the site
  • keep your framework hidden
SURPRISES
  • people will surprise you with what they try to do
  • do you ignore it, block it, discourage it, or embrace it?
PASSION
  • build something to solve a problem that YOU YOURSELF have
  • otherwise someone else who does have that problem will be more passionate than you about it, so they will do a better job than you.
RELEASE
  • Every day you don't have something out there in the world, that's another day that you're not picking up users, your users are not getting other users on board, you're not learning anything.
  • => release early, release often
ATTENTION
  • As community grows, bias drifts - del.icio.us most popular is not as interesting as it used to be, due to averaging out
  • figure out a way to either keep things on topic, or fragment the attention
SPAM
  • people will always try to spam you, and use your service to spam others.
  • any "most popular" or "hot" list will attract spam
  • figure out ways to cut off the easy routes to that
  • DON'T give spammers an immediate error message.
  • if you cut people off, don't tell them - let them stay scratching their heads
  • if you tell them straight away, they'll know, and they'll try another route
TAGS
  • The value of tagging is in the attention - you saw this and you thought it was worth saving and this is what you thought it was about. THAT's what creates the social connection - you saw this and thought it was about X, someone else saw the same thing and thought it was about X (or Y)
  • Any automatic tag generator is kind of defeating the object.
  • Make people do the minimum amount of work - but always SOME work

MOTIVATION
  • Why are users there on your system? What do they get out of it?
  • They are selfish - they're in it for them
  • Ask what the app does for users? What's in it for them?
  • Make the userbase that you already have, want to bring others into the system.
EFFORT
  • If you spend a lot of time building a feature that nobody uses, that's wasted time
  • Be careful about where you expend your effort
MEASUREMENT
  • Monitor everything, put numbers on everything
  • If a user is using something in week 1, are they still using it in week 5?
  • You want to measure how people are reacting to features, implementation
  • delicious has no rating system - why would you bookmark something that's bad?
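
The "week 1 vs week 5" question can be turned into a number quite simply. A minimal sketch, assuming usage is already logged as a set of user names per week. The data structure and the function name retention are made up for illustration.

```python
def retention(usage_by_week, first_week, later_week):
    """Fraction of users active in `first_week` who are still active in `later_week`."""
    starters = usage_by_week.get(first_week, set())
    if not starters:
        return 0.0
    still_active = starters & usage_by_week.get(later_week, set())
    return len(still_active) / len(starters)

# Hypothetical log: who used a feature in which week.
usage = {
    1: {"ann", "bob", "cat", "dan"},
    5: {"ann", "eve"},
}
print(retention(usage, 1, 5))  # 0.25
```
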
TESTING
  • Make sure the system you're building - in terms of look, feel, behaviour - matches your users
  • delicious did two days of user testing with a one-way mirror - it threw up big surprises
  • don't give people a list of to-dos - they have vastly different behaviour when they do that - not like real-world usage at all
  • let people wander off on their own as they would do in real usage
LANGUAGE
  • speak the user's language
  • delicious is about "bookmarking" - but most people didn't know what a bookmark was. They called them "favourites" because that's what IE calls them
  • over half his users didn't understand it
REGISTRATION
  • don't make users register before they can see stuff
  • give them as much functionality as possible before they have to sign up
  • even let them use it as an anonymous user account (session cookie?)
  • be very careful about when and where and WHY you ask for registration
  • let them wander around and get a feel for the app
  • if you have to ask them to login/register, then make it as short and sweet and painless as possible, THEN TAKE THEM BACK TO EXACTLY WHERE THEY WERE
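
Taking people back to exactly where they were usually means carrying the current path through the login step. A minimal sketch of that, assuming a /login page that accepts a "next" query parameter. Both names, and the two helper functions, are hypothetical.

```python
from urllib.parse import urlencode, urlsplit

def login_url(current_path):
    """Build a login link that remembers where the user was."""
    return "/login?" + urlencode({"next": current_path})

def safe_next(next_value, default="/"):
    """Return `next_value` only if it is a plain path on this site.

    Anything with a scheme, a host, or a backslash is rejected, so the
    login page cannot be abused to bounce users to another site.
    """
    value = next_value or ""
    parts = urlsplit(value)
    if parts.scheme or parts.netloc or "\\" in value:
        return default
    if not parts.path.startswith("/"):
        return default
    return value

print(login_url("/tag/java?page=2"))  # /login?next=%2Ftag%2Fjava%3Fpage%3D2
```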

DESIGN GRAMMAR
  • use standard navigation elements / structure
  • don't try to break the mould - people expect things to work a certain way
  • if I put the title underlined in blue, then a summary, then metadata below, then I look like a search engine - with all the usage patterns that implies
MORALS
  • your data belongs to your users, NOT YOU
  • let your users "take their ball and go home" - remove themselves completely including all their data
  • e.g. if a user bookmarked something personal that they might not want others to know about, then they want to remove it completely

INFECTION
  • invade every communication stream you can find to promote your app
  • figure out how to turn every RSS reader into a client app - there's functionality in the RSS reader/feed itself
  • look for viral streams - email / rss
  • desktop apps consume vast amounts of info via http - work out how to use this to work for you
COMMUNITIES
  • how do communities use your system?
  • don't think of your site users as a community in itself - unless you're actually trying to build that
  • delicious lets communities use the system, but is not a community itself
  • there's a lot of community dynamics that Josh didn't want - flame wars, battles for alpha male status, etc

At the summit 1

9:50 - Arrived in the main hall and saw a sea of bloggers, heads down over their laptops.
It reminds me of that "So you think your office is bad?" picture of the infinite Korean cubicle farm. There's an Adobe freebies CD on the seat, with copies of Flex builder 2 beta 1, Flex Enterprise Services beta 1, Flash player 8.5 alpha, and some sample code from Adobe labs - I'll have to have a play if any of the speeches get a bit dull...

Tuesday, February 07, 2006

Web 2.0 / Future Of Web Apps summit tomorrow

I'm off to the "Future Of Web Applications" summit tomorrow (the conference formerly known as the Web 2.0 Summit). Should be an interesting day - I'm particularly looking forward to the talks from Josh Schachter of del.icio.us, and Cal Henderson of Flickr. Well, now you have to say "Cal Henderson of Flickr", but to me (and, I'm sure, to many others) I don't care how good Flickr gets, nothing can ever top creating B3TA.

Any other CF-ers going? Or am I to be laughed at and whispered about by groups of Ruby-on-Rails guys?

Friday, February 03, 2006

TQL: A Standard Syntax for Multi-Tag Queries

Introduction


Tagging is booming. Everywhere you look, it seems that these little free-text Post-Its have become the semantic glue of Web 2.0.

But therein lies the problem - the tech-heavy community of del.icio.us users tends to use the tag "java" to refer to the programming language Java, as is immediately apparent from the link above. The same tag on Flickr, however, shows lots of photos of the island of Java, and none relating to the programming language.

This is to some extent inescapable - a single tag is always context-sensitive, and taken out of that context, the inherent ambiguities of language become apparent.

These ambiguities can be resolved, or at least reduced, by using multiple tags to provide context. For instance, anything tagged with Java AND code is probably about the programming language, whereas something tagged with Java AND Indonesia probably refers to the place.

Or maybe not - according to Wikipedia, Java is also a type of coffee which originated on the island, and it's also a term for the Javanese language. The combination of java AND indonesia could plausibly be used for both of these subjects as well. java AND coffee should drill down to the coffee-related posts, but java AND language? Well, which language? The programming language or the spoken language?

So maybe we need a third tag? Something like java AND language AND javanese. But then we are relying on any item about the Javanese language being tagged with all three tags. This may not be true in many cases - if a post is tagged with just java AND javanese, our three-tag query above will not return it.

So how about java AND language OR javanese? But then we must be careful - does that mean that it must have java AND EITHER language-or-javanese, or does it mean EITHER java-AND-language OR javanese?
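
The two readings really do give different answers. A small sketch, using a made-up post tagged only with javanese and grammar:

```python
def reading_a(tags):
    """java AND (language OR javanese)"""
    return "java" in tags and ("language" in tags or "javanese" in tags)

def reading_b(tags):
    """(java AND language) OR javanese"""
    return ("java" in tags and "language" in tags) or "javanese" in tags

post = {"javanese", "grammar"}  # hypothetical post, not tagged "java"
print(reading_a(post))  # False
print(reading_b(post))  # True
```

The same query string therefore includes or excludes this post depending on which grouping the implementation happens to choose.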

There are some tools and APIs already starting to emerge which support combinations of tags - Del.icio.us supports tag "intersections" via the + operator. Ultimate Tag Warrior, the tag plugin for WordPress, supports tag "intersections" and "unions" via the + and | operators respectively. Technorati, on the other hand, uses a plain-text string of "tag1 OR tag2" in the URL, which gets URL-encoded to "tag1%20OR%20tag2".

So which is it to be?

A Proposed Solution

What we need, then, is a generalised boolean syntax for tag URLs. A Tag Query Language, if you like. It should probably have certain properties :

Properties

  1. It should be able to be used in a URL
    so no "/" characters, no "." characters, etc.
  2. It should be human-interpretable
    a person should be able to look at a TQL query string and work out what was intended without too much effort.

  3. It should be simple
    Web 2.0 is showing us that the simple solution is usually the best, and something which requires too much time to implement is just not going to be widely adopted.
  4. It should be easily translatable into standard SQL
  5. It must not expose the implementing website to potential SQL Injection attacks

So what form would such a syntax have? Maybe something like the following:

Operators

  • A boolean AND is indicated by the plus sign +
  • A boolean OR is indicated by the pipe symbol |
  • A boolean NOT is indicated by the exclamation mark !
  • Ambiguous logic (e.g. "x and y or z" ) can be resolved by grouping with parentheses ( )

Rules

  • To satisfy requirement 5, any character which is not an alphanumeric or one of the above operators should be stripped out.
  • To satisfy requirement 3 - the KISS principle -
    • parentheses must not be nested.
      That way lies a whole world of nastiness...
    • all operators have equal precedence after grouping with parentheses ().
      In the absence of parentheses, they should be evaluated in straightforward left-to-right order.
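
To check that these rules hang together, here is a minimal Python sketch of an evaluator that tests a TQL string against the set of tags on a single item. It is one possible reading of the rules, not a reference implementation. The names tokenize and matches are made up, tags are assumed to be stored in lower case, and "!" is assumed to negate the term or ( ) group that immediately follows it.

```python
import re

def tokenize(query):
    """Strip everything except alphanumerics and TQL operators, then split into tokens."""
    cleaned = re.sub(r"[^A-Za-z0-9+|!()]", "", query)
    return re.findall(r"[A-Za-z0-9]+|[+|!()]", cleaned)

def matches(query, tags):
    """Return True if the set `tags` satisfies the TQL `query`."""
    tokens = tokenize(query)
    pos = 0

    def operand(in_group):
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        tok = tokens[pos]
        pos += 1
        if tok == "!":
            return not operand(in_group)
        if tok == "(":
            if in_group:
                raise ValueError("nested groups are not allowed")
            value = sequence(True)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok in ("+", "|", ")"):
            raise ValueError("unexpected operator " + tok)
        return tok.lower() in tags

    def sequence(in_group):
        # Equal precedence: fold operands strictly left to right.
        nonlocal pos
        value = operand(in_group)
        while pos < len(tokens) and tokens[pos] in ("+", "|"):
            op = tokens[pos]
            pos += 1
            right = operand(in_group)
            value = (value and right) if op == "+" else (value or right)
        return value

    result = sequence(False)
    if pos != len(tokens):
        raise ValueError("unexpected token " + tokens[pos])
    return result

print(matches("java+language|javanese", {"javanese"}))  # True:  (java AND language) OR javanese
print(matches("javanese|java+language", {"javanese"}))  # False: (javanese OR java) AND language
```

The last two lines show the consequence of the equal-precedence rule: without a ( ) group, the order in which the terms are written changes the result.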

To use our example above, this would allow a query for items relating to java, the spoken language to be written as follows:

java+(language|javanese)

Example SQL

Assume that :

  1. tags are stored in a column tagcolumn in a table tagtable
  2. the items we want to return are stored in a table itemtable
  3. items are related to tags in a many-to-many : one item can have many tags, and a tag can apply to many items.
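  4. this link is accomplished by an intermediary linktable relating itemids to tagids, joined to tagtable on a condition represented by the placeholder linkcriteria
  5. we can represent any other selection criteria for an item by the placeholder myothercriteria (left out of the query below for brevity)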

Then our query above would translate to something like the following:

SELECT (columns)
FROM itemtable
WHERE
  EXISTS (
    SELECT tagcolumn FROM tagtable INNER JOIN linktable ON linkcriteria
    WHERE tagcolumn = 'java' AND linktable.itemid = itemtable.itemid
  )
  AND
  (
    EXISTS (
      SELECT tagcolumn FROM tagtable INNER JOIN linktable ON linkcriteria
      WHERE tagcolumn = 'language' AND linktable.itemid = itemtable.itemid
    )
    OR EXISTS (
      SELECT tagcolumn FROM tagtable INNER JOIN linktable ON linkcriteria
      WHERE tagcolumn = 'javanese' AND linktable.itemid = itemtable.itemid
    )
  )

Of course, this SQL is only an example, and is written for clarity rather than performance. There are many optimisations which could be done in this scenario, and many platform-specific options to be considered. For instance, SQL Server allows the indexing of views, which could reap large benefits in this application. ColdFusion allows the re-querying of in-memory resultsets using SQL syntax. However, these optimisations are implementation-specific, and as such are beyond the scope of this article.
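
As a rough illustration of requirements 4 and 5 together, the following Python sketch translates a TQL string into a WHERE clause built from EXISTS subqueries, with every tag passed as a bound parameter rather than pasted into the SQL. It is a sketch under stated assumptions, not a definitive implementation: SQLite stands in for the real database, the column names tagid and title are invented, linkcriteria is assumed to be linktable.tagid = tagtable.tagid, and the function names are hypothetical.

```python
import re
import sqlite3

# One EXISTS test per tag; the tag value itself is always a bound parameter.
EXISTS_CLAUSE = (
    "EXISTS (SELECT 1 FROM tagtable "
    "INNER JOIN linktable ON linktable.tagid = tagtable.tagid "
    "WHERE tagtable.tagcolumn = ? AND linktable.itemid = itemtable.itemid)"
)

def tql_to_where(query):
    """Translate a TQL string into (where_sql, params)."""
    cleaned = re.sub(r"[^A-Za-z0-9+|!()]", "", query)
    tokens = re.findall(r"[A-Za-z0-9]+|[+|!()]", cleaned)
    params = []
    pos = 0

    def operand(in_group):
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        tok = tokens[pos]
        pos += 1
        if tok == "!":
            return "NOT " + operand(in_group)
        if tok == "(":
            if in_group:
                raise ValueError("nested groups are not allowed")
            sql = sequence(True)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return sql
        if tok in ("+", "|", ")"):
            raise ValueError("unexpected operator " + tok)
        params.append(tok.lower())
        return EXISTS_CLAUSE

    def sequence(in_group):
        # Wrap each step in parentheses so SQL's own AND/OR precedence
        # cannot override TQL's left-to-right rule.
        nonlocal pos
        sql = operand(in_group)
        while pos < len(tokens) and tokens[pos] in ("+", "|"):
            keyword = "AND" if tokens[pos] == "+" else "OR"
            pos += 1
            sql = "(" + sql + " " + keyword + " " + operand(in_group) + ")"
        return sql

    where = sequence(False)
    if pos != len(tokens):
        raise ValueError("unexpected token " + tokens[pos])
    return where, params

def find_items(conn, query):
    """Return the titles of all items matching the TQL query, sorted by title."""
    where, params = tql_to_where(query)
    sql = "SELECT title FROM itemtable WHERE " + where + " ORDER BY title"
    return [row[0] for row in conn.execute(sql, params)]

def demo_connection():
    """Build a tiny in-memory database with made-up items and tags."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE itemtable (itemid INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE tagtable (tagid INTEGER PRIMARY KEY, tagcolumn TEXT);
        CREATE TABLE linktable (itemid INTEGER, tagid INTEGER);
    """)
    items = {
        "Javanese grammar primer": ["java", "javanese"],
        "Generics tutorial": ["java", "code", "language"],
        "Espresso guide": ["java", "coffee"],
    }
    tag_ids = {}
    for title, tags in items.items():
        item_id = conn.execute(
            "INSERT INTO itemtable (title) VALUES (?)", (title,)).lastrowid
        for tag in tags:
            if tag not in tag_ids:
                tag_ids[tag] = conn.execute(
                    "INSERT INTO tagtable (tagcolumn) VALUES (?)", (tag,)).lastrowid
            conn.execute("INSERT INTO linktable VALUES (?, ?)", (item_id, tag_ids[tag]))
    return conn

print(find_items(demo_connection(), "java+(language|javanese)"))
# ['Generics tutorial', 'Javanese grammar primer']
```

Note that the sample query still returns the programming tutorial, because that item happens to be tagged with both java and language. The syntax removes the ambiguity in the query, not the ambiguity in how people tag.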