Monday, August 05, 2013

UNO IllegalArgument during import phase: Source file cannot be read. URL seems to be an unsupported one.

Creating Word documents in a fully Linux-based environment can be tricky. One common trick is to save HTML with a .doc extension, which Word will open, but it has drawbacks: for example, if you make changes and try to save the file again, you can't save it as a .doc.

So we have a multi-step chain of services set up on our Quill Platform to allow us to render our articles as genuine Word documents.

In the Rails app:

  1. User clicks 'download as word doc' on an article - this results in a request like this: GET /articles/1234.doc
  2. Check for an existing Word doc representing the correct version of the requested article. If it's there, serve it back as a document. If not:
    1. Render the article to an HTML string, and save a WordDocumentConversion object which encapsulates the resulting HTML
    2. Submit a job to Resque to perform the actual conversion.
    3. Redirect the user, and flash a message saying "That document might take a minute or two to generate - we'll email it to you when it's ready"
In the Resque job:
  1. Load the WordDocumentConversion
  2. POST the saved HTML as a file input to our DocumentConverter web service - a little Sinatra app which provides a RESTful endpoint around LibreOffice
  3. Save the response as a .doc file, in binary mode, and email it to the user.
In the DocumentConverter web service:
  1. accept the POST-ed HTML content
  2. invoke unoconv - a command-line Python script that wraps the LibreOffice / OpenOffice headless document conversion service.
  3. respond with the binary content of the returned Word doc.
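
The conversion step in the DocumentConverter service can be sketched roughly as follows. This is a minimal illustration in Python rather than the Sinatra service itself: the function names `build_unoconv_command` and `convert_html_to_doc` are hypothetical, and it assumes unoconv is on the PATH and is invoked with its `-f` (output format) and `--stdout` options.

```python
import os
import subprocess
import tempfile


def build_unoconv_command(source_path, output_format="doc"):
    """Return the argv list for a unoconv run that writes the result to stdout."""
    return ["unoconv", "-f", output_format, "--stdout", source_path]


def convert_html_to_doc(html, run=subprocess.run):
    """Convert an HTML string to Word document bytes by shelling out to unoconv.

    `run` is injectable (hypothetical design choice) so the function can be
    exercised on a machine without LibreOffice installed.
    """
    # unoconv operates on files, so write the HTML to a temporary file first.
    fd, path = tempfile.mkstemp(suffix=".html")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(html)
        # check=True raises CalledProcessError if unoconv exits non-zero.
        result = run(
            build_unoconv_command(path),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=True,
        )
        return result.stdout  # binary content of the converted document
    finally:
        os.remove(path)
```

The returned bytes are what the web service would send back as the response body, and what the Resque job would then write to disk in binary mode.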

It all works pretty well most of the time, and was a good exercise in building complexity by keeping each individual part very simple. However, we recently moved our live platform servers from the US-EAST EC2 region over to the EU-WEST region, and took the opportunity to rebuild them from scratch on an updated Ubuntu. While setting up our staging server we got the above error ('UNO IllegalArgument during import phase: Source file cannot be read. URL seems to be an unsupported one.'), which had us scratching our heads for most of Friday.

To cut a long story short, this is another instance of what we refer to as "Tao errors" - the error which can be seen is not the true error. When it says that the URL is unsupported, what it actually means is "I can't handle the file format you've requested" - usually because some LibreOffice / OpenOffice packages are missing.
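
One way to save the next person a Friday is to translate the misleading message into a hint wherever unoconv's error output is handled. The sketch below is only an illustration: `explain_unoconv_error` and `MISLEADING_MARKER` are hypothetical names, and the hint wording is an assumption based on the diagnosis above.

```python
# Fragment of the unoconv error text quoted in the title of this post.
MISLEADING_MARKER = "URL seems to be an unsupported one"


def explain_unoconv_error(stderr_text):
    """Return a friendlier hint for the misleading unoconv error, or None."""
    if MISLEADING_MARKER in stderr_text:
        return (
            "unoconv could not handle the requested file format; "
            "check that the LibreOffice / OpenOffice package providing the "
            "filter (for Word documents, the writer package) is installed."
        )
    # Any other error is left for the caller to report as-is.
    return None
```

A caller could log the hint next to the raw stderr text, so the original message is still available for anything the hint does not cover.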

If you've only installed the base & core packages, that's not enough - to be able to render Word documents, you need to actually install the "writer" package as well. A quick scan of the unoconv documentation does give you this little tidbit -

Various sub-packages are needed for specific import or export filters, e.g. XML-based filters require the xsltfilter subpackage, e.g. libobasis3.5-xsltfilter.
Important: Neglecting these requirements will cause unoconv to fail with unhelpful and confusing error messages.
- so I guess we were warned... but still, problem solved at last.
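
A server build script could check for the package up front rather than waiting for the confusing error. The sketch below assumes a Debian/Ubuntu host where `dpkg-query` is available and where the writer package is named `libreoffice-writer`; both are assumptions about the setup, and `is_package_installed` is a hypothetical helper.

```python
import subprocess


def is_package_installed(package, run=subprocess.run):
    """Return True if dpkg reports `package` as fully installed.

    `run` is injectable so the check can be tested without dpkg present.
    """
    result = run(
        ["dpkg-query", "-W", "-f=${Status}", package],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # dpkg-query exits non-zero for unknown packages; an installed package
    # reports the status string "install ok installed".
    status = result.stdout.decode("utf-8", errors="replace").strip()
    return result.returncode == 0 and status == "install ok installed"


if __name__ == "__main__":
    # Assumed package name for the LibreOffice writer component on Ubuntu.
    if not is_package_installed("libreoffice-writer"):
        print("libreoffice-writer is missing: Word conversion will fail")
```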