Thursday, August 16, 2007

The Perennial RSS Authentication Dilemma

It's a common problem, one that has cropped up many times for me over the last few years. You build a secure system, locked up behind a login so that only authenticated users can access the tightly-controlled data, and everything's fine - and then you come to the RSS feeds.

Simply put, RSS feeds and the corresponding use-case of syndicating data out of the application into another application - be it a desktop RSS reader, a web-based aggregator, or even another context within the same system - is in direct contradiction to your security. You can't have an RSS reader log in to your app using the standard login form, and most readers certainly don't support cookies, so you have to provide a bypass.... but what mechanism?

I've tried numerous approaches - most often HTTP AUTH (which some readers support, but not many) or an encrypted url token.

HTTP AUTH is somewhat like an old faithful, it's been around forever, every web server and browser worth mentioning has supported it for years, and it's simple to implement. But it has the disadvantage that once authenticated, the only way to log out is to close your browser completely. Also, many RSS readers don't support it.

Encrypting the users security credentials into a token that you can pass on the URL, is guaranteed to work on anything that can pass on a url correctly, but it has the disadvantage that then anyone who gets access to that url, to all intents and purposes, is that user - so you still have to be careful what you expose in the feed itself.

The main thing, as ever, is to establish exactly what behaviour is "intended". If the brief is for the user to be able to copy-and-paste RSS urls into readers / emails / other sites, then make sure everyone is clear on the implications of that - you're essentially allowing people and / or applications to impersonate a user without going through the login process.


Requirements, of course, vary wildly from app to app, but the approach I've tended to settle on is a combination of multiple methods - if there is a cookie identifying the user, then use that to establish ID, else if you have HTTP AUTH credentials, use them, authenticating appropriately as required. But remember that if you are dealing with automatic requests from readers via HTTP AUTH or encrypted tokens, you should ALWAYS clear out any session variables at the end of the request, otherwise you can quickly end up with thousands of persistent sessions for no apparent reason

Wednesday, August 15, 2007

Astroturfing in action : social media manipulation exchange

"Astroturfing" is the art of faking "grassroots" - social media manipulation, in other words.

And here is where you can buy, sell, or exchange Diggs, Stumbles, Reddits, blog posts, reviews, you name it....

Monday, August 13, 2007

Apache Proxied Rewrite Character Set Gotcha

Character sets, character sets, I'm perenially beseiged by character sets.... I just got to the bottom of a particularly strange character set issue with one of our clients' website.

Their domain points to our apache web server, which rewrites any requests for collaboration engine urls (e.g. somedomain.org/groups/ or somedomain.org/people ) in order to send them to the collab platform, and rewrites any other urls to go directly to a third-party hosted CMS-driven website.

The problem they were having was that when viewing the live, proxied site, strange characters were appearing in the generated content - e.g. "?" and that funny square box that handily tells you "you got the character set wrong, fool" - but when they looked at the site directly on the third-party CMS, it was fine.

Using Firebug to inspect the headers showed that when viewing the third-party CMS, the Content-Type was:

Content-Type text/html

but when proxied via our Apache web server, it was:

Content-Type text/html; charset=UTF-8

The meta tag in the HTML itself was :

< META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" >

- and the ensuing character set mismatch was causing the dodgy character funkiness.

It turns out that buried deep within the bowels of our Apache config was this line:

#
# Specify a default charset for all content served; this enables
# interpretation of all content as UTF-8 by default. To use the
# default browser choice (ISO-8859-1), or to allow the META tags
# in HTML content to override this choice, comment out this
# directive:
#
AddDefaultCharset UTF-8


Which was doing exactly what it said on the tin - forcing UTF-8 interpretation whatever the META tag said. Rather than changing it in the central config, I overrode it in their specific config with this line :

AddDefaultCharset Off

and it seems to work fine now. So that's a big "yay" for Firebug, a dependent "yay" for Firefox's plugin/extension architecture that allows and encourages third parties to develop these funky plugins, and a big slap round the face with a large fish for anyone who's not using UTF-8 in this day and age....