Thursday, August 17, 2006

A Pox Upon AppleMail!

A lot of my time since joining Trampoline 6 weeks ago has been spent reacquainting myself with the black art of parsing and dismembering MIME emails with the JavaMail API. There's much I could say about the MIME format and particularly the JavaMail API, but those apoplectic rants deserve to be written up and nailed to church doors all of their own.

This post is about a problem I've been having with an email generated by Apple Mail that has been driving me nuts. It's not the first issue I've had with Apple Mail and its funky attachment formatting, and I'm sure it won't be the last. However, it's the most maddening to date! Here's the problem:

In a multipart email, you separate each part with a unique string that must not occur in any of the parts. This is generated by the email client, and declared in the Content-Type header. RFC 2045 states that the boundary declaration is required for any multipart subtype. The header should look like this:


Content-Type: multipart/(whatever); boundary="----=(unique string)"
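
To make that concrete, here's a rough sketch of pulling the boundary out of a Content-Type header value with nothing but a regex. This is plain Java rather than anything JavaMail does internally, and the class and method names (BoundaryParam, extract) are made up for illustration. It assumes the header has already been unfolded onto a single line.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of JavaMail.
final class BoundaryParam {
    // Matches boundary="quoted value" or boundary=token, case-insensitively.
    private static final Pattern BOUNDARY = Pattern.compile(
            ";\\s*boundary\\s*=\\s*(?:\"([^\"]*)\"|([^\\s;\"]+))",
            Pattern.CASE_INSENSITIVE);

    /** Returns the boundary parameter of a Content-Type header value, if present. */
    static Optional<String> extract(String contentType) {
        Matcher m = BOUNDARY.matcher(contentType);
        if (!m.find()) {
            return Optional.empty();
        }
        // Group 1 is the quoted form, group 2 the bare token form.
        return Optional.of(m.group(1) != null ? m.group(1) : m.group(2));
    }
}
```

Calling BoundaryParam.extract on the value of a multipart header gives back the unique string; for a header with no boundary parameter (a text/plain one, say) it gives back an empty Optional.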


You will then get a set of Parts, each separated by an occurrence of the boundary string, and each declaring what type of content it is by means of its own Content-Type header -


Content-Type: (type)/(subtype); (optional parameters)


An example might be:


Content-Type: multipart/mixed; boundary="----=ABCDEFGHIJKLMNOP"

----=ABCDEFGHIJKLMNOP
Content-Type: text/plain; charset=US-ASCII

Hi Al,

Here's the schematic for the secret base under the island volcano. Note the new layout of the shark pools, and the trapdoor is now triggered from the pressure pad under your desk as requested. Will give the engineers a kick about the frickin' lasers and see what's taking them so long.

Cheers,

Dave

----=ABCDEFGHIJKLMNOP
Content-Type: image/png; name="plans.png"
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename="plans.png"

(lots of data encoded into base 64 so that it can be transferred as text)
----=ABCDEFGHIJKLMNOP


All well and good so far.
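
For the curious, splitting a body like that into its parts can be sketched in a few lines of plain Java. One detail the examples in this post leave out for readability: on the wire (RFC 2046) each delimiter line is the boundary string prefixed with two hyphens, and the final one has two more hyphens on the end. The sketch below follows the wire format. MultipartSplitter is a hypothetical name, and this is not how JavaMail does it, just a minimal illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of JavaMail.
final class MultipartSplitter {
    /**
     * Splits a multipart body into its raw parts (header block plus content).
     * Text before the first delimiter (the preamble) and after the closing
     * delimiter (the epilogue) is ignored.
     */
    static List<String> split(String body, String boundary) {
        String delimiter = "--" + boundary;   // opens each part
        String closing = delimiter + "--";    // ends the whole multipart
        List<String> parts = new ArrayList<>();
        StringBuilder current = null;         // null until the first delimiter is seen

        for (String line : body.split("\r?\n", -1)) {
            // Delimiter lines may carry trailing whitespace.
            String trimmed = line.replaceAll("\\s+$", "");
            if (trimmed.equals(closing)) {
                if (current != null) {
                    parts.add(current.toString());
                }
                return parts;
            }
            if (trimmed.equals(delimiter)) {
                if (current != null) {
                    parts.add(current.toString());
                }
                current = new StringBuilder();
            } else if (current != null) {
                current.append(line).append('\n');
            }
        }
        // No closing delimiter seen: keep whatever was collected for the last part.
        if (current != null) {
            parts.add(current.toString());
        }
        return parts;
    }
}
```

Strictly speaking the line break just before a delimiter belongs to the delimiter and not to the part, so each part here carries one trailing newline too many. That's fine for a sketch, less fine for binary content.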

It gets a bit more complicated when you introduce the fact that any part of the mail body can also be a multipart type, which must declare its own boundary string, but still, it should be parseable into a coherent tree structure, right?

Well yes - as long as you play by the rules.
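
Here's roughly what "parseable into a coherent tree" means in code. This is a minimal sketch in plain Java, assuming well-formed input and wire-format delimiters ("--" plus the boundary string, per RFC 2046). MimeNode is a hypothetical class of my own, not a JavaMail type. Note that the recursion depends entirely on each multipart declaring its boundary in its Content-Type header.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tree node for illustration; not part of JavaMail.
final class MimeNode {
    private static final Pattern BOUNDARY = Pattern.compile(
            ";\\s*boundary\\s*=\\s*(?:\"([^\"]*)\"|([^\\s;\"]+))",
            Pattern.CASE_INSENSITIVE);

    final String contentType;   // header value, or the text/plain default
    final String body;          // raw body text of this entity
    final List<MimeNode> children = new ArrayList<>();

    private MimeNode(String contentType, String body) {
        this.contentType = contentType;
        this.body = body;
    }

    /** Parses one entity (header block, blank line, body), recursing into multiparts. */
    static MimeNode parse(String entity) {
        String text = entity.replace("\r\n", "\n");
        String headers;
        String body;
        if (text.startsWith("\n")) {            // no header block at all
            headers = "";
            body = text.substring(1);
        } else {
            int blank = text.indexOf("\n\n");
            headers = blank < 0 ? text : text.substring(0, blank);
            body = blank < 0 ? "" : text.substring(blank + 2);
        }

        String contentType = "text/plain";      // RFC 2045 default when the header is absent
        // Unfold continuation lines, then look for the Content-Type header.
        for (String line : headers.replaceAll("\n[ \t]+", " ").split("\n")) {
            if (line.regionMatches(true, 0, "Content-Type:", 0, 13)) {
                contentType = line.substring(13).trim();
            }
        }

        MimeNode node = new MimeNode(contentType, body);
        Matcher m = BOUNDARY.matcher(contentType);
        if (contentType.regionMatches(true, 0, "multipart/", 0, 10) && m.find()) {
            String boundary = m.group(1) != null ? m.group(1) : m.group(2);
            for (String part : split(body, boundary)) {
                node.children.add(parse(part));  // each part is itself an entity
            }
        }
        return node;
    }

    /** Splits a multipart body on "--boundary" lines, stopping at "--boundary--". */
    private static List<String> split(String body, String boundary) {
        String delimiter = "--" + boundary;
        List<String> parts = new ArrayList<>();
        StringBuilder current = null;
        for (String line : body.split("\n", -1)) {
            String trimmed = line.replaceAll("\\s+$", "");
            if (trimmed.equals(delimiter + "--")) {
                break;
            }
            if (trimmed.equals(delimiter)) {
                if (current != null) {
                    parts.add(current.toString());
                }
                current = new StringBuilder();
            } else if (current != null) {
                current.append(line).append('\n');
            }
        }
        if (current != null) {
            parts.add(current.toString());
        }
        return parts;
    }
}
```

Feed MimeNode.parse a whole message and you get a root node whose children are the top-level parts; any child that is itself a multipart gets children of its own. A part with no Content-Type header ends up as a text/plain leaf, however much structure is hiding in its body.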

Apple Mac files consist of two forks:

1) an Apple-specific part called the RESOURCE fork, which contains arbitrary information such as icon bitmaps and file info parameters,
2) a DATA fork which contains the actual file data.

This translates logically into a MIME multipart format - multipart/appledouble - with one part for each fork.

So, if we were to send the example message above from a Mac using AppleMail, we would get something like this:


Content-Type: multipart/mixed; boundary="----=TOPLEVELBOUNDARY"

----=TOPLEVELBOUNDARY
Content-Type: text/plain; charset=US-ASCII

Hi Al,

Here's the schematic for the secret base under the island volcano. Note the new layout of the shark pools, and the trapdoor is now triggered from the pressure pad under your desk as requested. Will give the engineers a kick about the frickin' lasers and see what's taking them so long.

Cheers,

Dave

----=TOPLEVELBOUNDARY
Content-Type: multipart/appledouble; boundary="----=HEYIMTHEAPPLEDOUBLEBOUNDARY"

----=HEYIMTHEAPPLEDOUBLEBOUNDARY
Content-Type: application/applefile; name=plans.png
Content-Disposition: inline; filename="plans.png"

(apple-specific file information encoded into base 64)

----=HEYIMTHEAPPLEDOUBLEBOUNDARY
Content-Type: image/png; name=plans.png
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename="plans.png"

(actual file data encoded into base 64)
----=HEYIMTHEAPPLEDOUBLEBOUNDARY

----=TOPLEVELBOUNDARY


Again, this is all well and good so far - apart from one or two minor irritations, like the lack of quotes around the name of the file in the Content-Type header, which can cause some grief if the filename has spaces in it. But that can be got round without much trouble using a bit of regex in pre-processing.
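
The kind of regex pre-processing I mean looks something like the sketch below. It's an illustration, not the exact code in use: NameParamQuoter is a hypothetical name, and it assumes the unquoted value runs up to the next semicolon or the end of the line, so a filename containing a semicolon or a double quote would still trip it up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processing helper for illustration; not part of JavaMail.
final class NameParamQuoter {
    // A name= or filename= parameter whose value does not start with a quote.
    // Group 1 is the parameter up to '=', group 2 the unquoted value.
    private static final Pattern UNQUOTED = Pattern.compile(
            "(;\\s*(?:file)?name\\s*=)(?!\\s*\")\\s*([^;\\r\\n]+)",
            Pattern.CASE_INSENSITIVE);

    /** Wraps unquoted name/filename values on a single header line in double quotes. */
    static String quoteNames(String headerLine) {
        Matcher m = UNQUOTED.matcher(headerLine);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = m.group(2).trim();
            // quoteReplacement stops '$' and '\' in the filename being read as regex syntax.
            m.appendReplacement(out,
                    Matcher.quoteReplacement(m.group(1) + '"' + value + '"'));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Run each Content-Type and Content-Disposition line through quoteNames before handing the message to the parser. Values that are already quoted are left alone.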

The problem comes when you have multiple appledouble-encoded attachments. What you would expect is something like this:


Content-Type: multipart/mixed; boundary="----=TOPLEVELBOUNDARY"

----=TOPLEVELBOUNDARY
Content-Type: text/plain; charset=US-ASCII

blah - message text

ATTACHMENT 1:

----=TOPLEVELBOUNDARY
Content-Type: multipart/appledouble; boundary="----=HEYIMTHEAPPLEDOUBLEBOUNDARY"

----=HEYIMTHEAPPLEDOUBLEBOUNDARY
Content-Type: application/applefile; name=plans.png
Content-Disposition: inline; filename="plans.png"

(apple-specific file information encoded into base 64)

----=HEYIMTHEAPPLEDOUBLEBOUNDARY
Content-Type: image/png; name=plans.png
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename="plans.png"

(actual file data encoded into base 64)
----=HEYIMTHEAPPLEDOUBLEBOUNDARY

ATTACHMENT 2:

----=TOPLEVELBOUNDARY
Content-Type: multipart/appledouble; boundary="----=DIFFERENTAPPLEDOUBLEBOUNDARY"

----=DIFFERENTAPPLEDOUBLEBOUNDARY
Content-Type: application/applefile; name=plans.png
Content-Disposition: inline; filename="plans.png"

(apple-specific file information encoded into base 64)

----=DIFFERENTAPPLEDOUBLEBOUNDARY
Content-Type: image/png; name=plans.png
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename="plans.png"

(actual file data encoded into base 64)
----=DIFFERENTAPPLEDOUBLEBOUNDARY

----=TOPLEVELBOUNDARY


But what's actually happening is that in the second attachment, the all-important Content-Type: declaration -

Content-Type: multipart/appledouble; boundary="----=DIFFERENTAPPLEDOUBLEBOUNDARY"

- is missing!

This line is absolutely vital, as it not only declares that this part is in appledouble format, but more fundamentally it declares that this part is itself a multipart and is split with THIS boundary marker rather than any other.

If this line is missing, then ONLY the FIRST attachment gets recognised. Any subsequent attachments which don't get the Content-Type header are then considered to be text/plain by default, so you get an email which has the first attachment properly parsed as an image, but everything after that appears inline as text. So anyone reading the email gets a big long string of base64-encoded image data. Not nice.
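
One way to at least spot the damage is to sniff each part that has no Content-Type header and see whether its body starts like a multipart anyway. The sketch below is a heuristic of my own devising for illustration, not an established fix and not anything JavaMail provides. It assumes the rest of the part is intact, that delimiter lines use the wire format ("--" plus the boundary string), and that the first nested part opens with a Content-something header. HeaderlessMultipartSniffer and guessBoundary are hypothetical names.

```java
import java.util.Optional;

// Hypothetical heuristic for illustration; not part of JavaMail.
final class HeaderlessMultipartSniffer {
    /**
     * Guesses the boundary of a part that has lost its multipart Content-Type header.
     * Returns empty if the part declares a Content-Type or does not look like a multipart.
     */
    static Optional<String> guessBoundary(String rawPart) {
        String[] lines = rawPart.replace("\r\n", "\n").split("\n", -1);
        int i = 0;
        // Header block: lines up to the first blank line. A line starting with "--"
        // is treated as body, in case the blank separator line is missing too.
        while (i < lines.length && !lines[i].isEmpty() && !lines[i].startsWith("--")) {
            if (lines[i].regionMatches(true, 0, "Content-Type:", 0, 13)) {
                return Optional.empty();   // type is declared, nothing to guess
            }
            i++;
        }
        // Skip blank lines before the first line of the body.
        while (i < lines.length && lines[i].trim().isEmpty()) {
            i++;
        }
        if (i + 1 >= lines.length) {
            return Optional.empty();
        }
        String first = lines[i].replaceAll("\\s+$", "");
        String next = lines[i + 1];
        boolean looksLikeDelimiter = first.startsWith("--") && first.length() > 2;
        boolean followedByHeader = next.regionMatches(true, 0, "Content-", 0, 8);
        return looksLikeDelimiter && followedByHeader
                ? Optional.of(first.substring(2))   // drop the leading "--"
                : Optional.empty();
    }
}
```

If guessBoundary comes back with a value, the part could be re-parsed as if it had declared multipart/appledouble with that boundary. Whether that's safe for every mail out there is another matter, so treat it as a starting point.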
