[_] Reading Word doc in PHP.

Ben Butterfield underscore at beebee-design.co.uk
Mon May 21 20:43:24 BST 2007

> I need to be able to read the contents of a Word document, either at upload
> time or from the filesystem, using PHP 4 or 5. Research has thrown up quite
> a few suggestions to use the COM object in PHP, but I am assuming that
> this is only possible if the server is running on Windows with Word
> installed, which mine isn't.
> 
> The technical environment is:
> 
> Debian Etch
> Apache 2
> PHP 5
> MySQL 5

OK, I got to the bottom of this one, and it turned out to be really 
simple. I can't believe there aren't more resources on this subject 
kicking around the net! So I thought I would post my solution, in the 
hope that it helps someone once the archives are back.

Still on the environment above, I went on to install GNOME, which 
comes with AbiWord bundled in. Then I simply used the following two 
lines to call AbiWord from within PHP:

exec('/usr/bin/abiword --to=txt document.doc');
exec('/usr/bin/abiword --to=html document.doc');

This takes document.doc and writes two converted copies of it: one in 
plain-text (TXT) format and one in HTML format.
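
For anyone wanting to read the converted text back into PHP, here is a 
minimal sketch of how the call could be wrapped. The function names 
buildAbiwordCommand() and convertWordDoc() are made up for illustration, 
and the code assumes that AbiWord writes its output next to the source 
file with the same base name and the new extension (document.doc giving 
document.txt), so check that on your own setup before relying on it.

```php
<?php
// Hypothetical helper: builds the AbiWord command line for one conversion.
function buildAbiwordCommand($docPath, $format)
{
    // Only the two formats used in this post are allowed here.
    if (!in_array($format, array('txt', 'html'), true)) {
        throw new InvalidArgumentException("Unsupported format: $format");
    }
    // escapeshellarg() stops spaces or quotes in an uploaded file name
    // from breaking (or hijacking) the shell command.
    return '/usr/bin/abiword --to=' . $format . ' ' . escapeshellarg($docPath);
}

// Hypothetical helper: runs the conversion and returns the converted contents.
function convertWordDoc($docPath, $format)
{
    exec(buildAbiwordCommand($docPath, $format), $output, $status);
    if ($status !== 0) {
        throw new RuntimeException("abiword exited with status $status");
    }
    // Assumption: the converted file lands beside the source file,
    // with the same base name and the target format as its extension.
    $info = pathinfo($docPath);
    $converted = $info['dirname'] . '/' . $info['filename'] . '.' . $format;
    if (!is_readable($converted)) {
        throw new RuntimeException("Expected output not found: $converted");
    }
    return file_get_contents($converted);
}
```

Usage would then be something like $text = convertWordDoc('/tmp/upload.doc', 'txt'); 
where the path is just an example. Checking the exit status matters because 
exec() on its own stays silent if AbiWord is missing or the conversion fails.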

I don't know what the performance hit of having GNOME installed on a 
server is like, but this setup seems to work quite nicely, and a desktop 
could be pretty useful for other things.