
[_] French Char to HTML entity lookup

Matt Hamilton matth at
Mon May 24 10:33:39 BST 2004

> FYI, I can't change the charset, as I'm displaying both French and Russian
> on the same page and there is more Russian than French, so we're encoding
> using UTF-8, but the French breaks. I also can't simply use something
> like htmlentities() (the PHP function), because for architectural reasons
> we're already doing an html_entity_decode(); I need to do this final
> reconversion, if you like, as a post-process.

That is really strange. UTF-8 should be able to handle both languages at
once. Are you *sure* you are displaying in UTF-8? Maybe somehow the
Russian is putting the browser into KOI8-R or one of the other Russian charsets.

Since many people speak French in Russia (due to Napoleon and his buddies
coming along to pilfer all their goodies), I would have thought that the
French characters would be represented in the Russian charsets.
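
Both guesses above can be checked quickly. Here is a minimal sketch in Python rather than PHP, purely for illustration (the sample strings and the `can_encode` helper are hypothetical). UTF-8 holds French and Cyrillic in one string. The two common single-byte Russian charsets, KOI8-R and Windows-1251 (`cp1251`), contain no accented Latin letters at all. So a browser that has fallen back to one of them would mangle exactly the French characters.

```python
# Sample text for illustration only.
french = "Noël à l'hôtel, garçon"
russian = "Привет"
mixed = french + " / " + russian


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for enc in ("utf-8", "koi8_r", "cp1251", "latin-1"):
    print(enc, can_encode(french, enc), can_encode(russian, enc), can_encode(mixed, enc))

# UTF-8 bytes for 'é' decoded as KOI8-R come out as Cyrillic/box-drawing junk.
print("é".encode("utf-8").decode("koi8_r"))
```

Only `utf-8` reports True for all three strings. `koi8_r` and `cp1251` fail on the French, and `latin-1` fails on the Russian.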

FWIW, I think the French chars are the ones with acute and grave accents
(e.g. diagonal lines above, slanting in either direction) on the vowels, the
c cedilla (the small 'c' with a squiggle under it)... errr... I think that's it.
Oh yeah, and the 'o' with the circumflex, the 'Chinese hat', on it :)

I suppose some clever so-and-so is going to post the actual characters
now, as they are in Latin-1 as well.
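
For the post-process asked about above, one option is to entity-encode only the French letters and leave everything else alone. That way the Cyrillic and any existing markup or entities pass through untouched. This is a minimal sketch in Python, assuming the standard `html.entities.codepoint2name` table (HTML 4 entity names). `FRENCH_CHARS`, `FRENCH_TO_ENTITY` and `encode_french` are hypothetical names. The character list covers the usual French accented letters and ligatures. Note that œ, Œ and Ÿ are not actually in Latin-1 (ISO-8859-1), although HTML 4 still defines entities for them.

```python
from html.entities import codepoint2name

# Accented letters and ligatures used in French, lower and upper case.
FRENCH_CHARS = "àâæçéèêëîïôœùûüÿÀÂÆÇÉÈÊËÎÏÔŒÙÛÜŸ"

# Lookup table built from the HTML 4 entity names, e.g. 'é' -> '&eacute;'.
FRENCH_TO_ENTITY = {ch: f"&{codepoint2name[ord(ch)]};" for ch in FRENCH_CHARS}

# str.translate() replaces each mapped character in a single pass.
_TABLE = str.maketrans(FRENCH_TO_ENTITY)


def encode_french(text: str) -> str:
    """Replace only the French letters with named HTML entities."""
    return text.translate(_TABLE)


print(encode_french("Crème brûlée, Привет"))
# → Cr&egrave;me br&ucirc;l&eacute;e, Привет
```

In PHP the same table could be passed as an array to `strtr()`, which likewise leaves unmapped characters alone.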


Matt Hamilton matth at
Netsight Internet Solutions, Ltd. Business Vision on the Internet +44 (0)117 9090901
Web Design | Zope/Plone Development and Consulting | Co-location | Hosting