Monday, August 13, 2007

Apache Proxied Rewrite Character Set Gotcha

Character sets, character sets, I'm perennially besieged by character sets.... I just got to the bottom of a particularly strange character set issue with one of our clients' websites.

Their domain points to our Apache web server, which rewrites any requests for collaboration engine URLs (e.g. or ) in order to send them to the collab platform, and rewrites any other URLs to go directly to a third-party hosted CMS-driven website.

The problem they were having was that when viewing the live, proxied site, strange characters were appearing in the generated content - e.g. "?" and that funny square box that handily tells you "you got the character set wrong, fool" - but when they looked at the site directly on the third-party CMS, it was fine.

Using Firebug to inspect the headers showed that when viewing the third-party CMS, the Content-Type was:

Content-Type: text/html

but when proxied via our Apache web server, it was:

Content-Type: text/html; charset=UTF-8
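The only difference between those two headers is the charset parameter. If you'd rather check that programmatically than eyeball it in Firebug, here is a minimal Python sketch using the standard library's email.message parser. The helper name charset_of is made up for illustration, and the header strings are simply the two values quoted above.

```python
from email.message import Message


def charset_of(content_type):
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns None when
    # the header carries no charset parameter.
    return msg.get_content_charset()


direct = charset_of("text/html")                   # header seen on the CMS directly
proxied = charset_of("text/html; charset=UTF-8")   # header seen via the proxy
print(direct, proxied)
```

A missing charset leaves the browser free to fall back on the META tag; an explicit one does not.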

The meta tag in the HTML itself was:

<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">

- and the ensuing character set mismatch was causing the dodgy character funkiness.
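To see why the mismatch produces junk, here is a rough Python sketch of what the browser ends up doing. It assumes the usual rule that a charset in the HTTP header wins over the META tag; the sample text and the function name effective_charset are invented for the example, not taken from the client's site.

```python
# Made-up sample content; any non-ASCII Latin-1 character shows the effect.
page_bytes = "café £5".encode("iso-8859-1")


def effective_charset(header_charset, meta_charset):
    """Pick the charset a browser would use: the HTTP header beats the META tag."""
    return header_charset or meta_charset


# Direct from the CMS: no charset in the header, so the META tag applies.
direct = page_bytes.decode(effective_charset(None, "iso-8859-1"))

# Via the proxy: the header says UTF-8, so the ISO-8859-1 bytes are misread.
# errors="replace" stands in for the browser substituting U+FFFD.
proxied = page_bytes.decode(effective_charset("utf-8", "iso-8859-1"), errors="replace")

print(ascii(direct))
print(ascii(proxied))
```

The accented and currency characters survive in the first case and come out as U+FFFD replacement characters in the second, which is the "?" and square box business described above.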

It turns out that buried deep within the bowels of our Apache config was this line:

# Specify a default charset for all content served; this enables
# interpretation of all content as UTF-8 by default. To use the
# default browser choice (ISO-8859-1), or to allow the META tags
# in HTML content to override this choice, comment out this
# directive:
AddDefaultCharset UTF-8

Which was doing exactly what it said on the tin - adding charset=UTF-8 to the Content-Type header, which the browser trusts over the META tag, and so forcing UTF-8 interpretation whatever the META tag said. Rather than changing it in the central config, I overrode it in their specific config with this line:

AddDefaultCharset Off

and it seems to work fine now. So that's a big "yay" for Firebug, a dependent "yay" for Firefox's plugin/extension architecture that allows and encourages third parties to develop these funky plugins, and a big slap round the face with a large fish for anyone who's not using UTF-8 in this day and age....
