
Unicode for CAcert

First we have to research:

The options for UTF-8 in all the standards we use (OpenPGP, X.509, PKIX, ...?)
Interoperability of UTF-8 with existing software (how much breaks when we deploy it?)
How do we have to configure OpenSSL for UTF-8?
OpenSSL reads user data from a file; if this file is UTF-8 encoded, it is supposed to work on Unix.
Then we have to examine how PHP and our email systems handle it properly (encoding Subject: and other headers with UTF-8 is quite fun)
Then we have to examine how MySQL handles it properly
UTF-8 support for our PDF generator
Then we have to work out how to migrate our existing MySQL database contents to UTF-8
Then we have to work on the security aspects of UTF-8, such as UTF-8 exploits (for example a stray \x00 byte inside a multi-byte UTF-8 character)
- Unicode Technical Report #36, Unicode Security Considerations: http://unicode.org/reports/tr36/
Then we have to work on the homograph security of UTF-8
We should implement a security mechanism similar to Konqueror's and print all non-ASCII characters in bold.
Then we have to examine the security aspects of Punycode
If all those things work out well, we can plan the migration
Then we can do the migration.
And then we can hope that it worked.
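For the Subject: header item in the list above, non-ASCII text has to be wrapped in RFC 2047 encoded-words (=?charset?B?...?= or =?charset?Q?...?=). A minimal sketch with Python's standard `email.header` module shows the round trip. Python is used here only for illustration; whatever PHP or mail code we run has to produce the same format, and `encode_subject`/`decode_subject` are hypothetical helper names.

```python
from email.header import Header, decode_header


def encode_subject(subject: str) -> str:
    """Return an ASCII-only Subject: value built from RFC 2047 encoded-words."""
    # For UTF-8, Header picks base64 ("b") or quoted-printable ("q"),
    # whichever is shorter, and folds long values into several encoded-words.
    return Header(subject, "utf-8").encode()


def decode_subject(raw: str) -> str:
    """Turn a (possibly encoded) Subject: value back into a str."""
    parts = []
    for chunk, charset in decode_header(raw):
        # decode_header yields bytes when encoded-words were present,
        # and the original str when there were none.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii")
        parts.append(chunk)
    return "".join(parts)


if __name__ == "__main__":
    encoded = encode_subject("Zertifikat für Müller")
    print(encoded)
    print(decode_subject(encoded))
```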
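The homograph and Punycode items can be illustrated together with a rough Python sketch of the Konqueror-style idea: decode an ASCII-compatible ("xn--") domain with the standard library's `idna` codec (which implements IDNA 2003, RFC 3490) and wrap every non-ASCII character in `<b>`. The function names are hypothetical, and bolding is only a visual aid: it marks non-ASCII characters but does not decide which of them are actually confusable with ASCII letters.

```python
import html


def highlight_non_ascii(label: str) -> str:
    """Return HTML in which every non-ASCII character is wrapped in <b>."""
    out = []
    for ch in label:
        escaped = html.escape(ch)
        # ASCII (U+0000..U+007F) is shown as-is; anything else is emphasised
        # so that look-alikes such as Cyrillic "\u0430" stand out.
        out.append(escaped if ord(ch) < 0x80 else f"<b>{escaped}</b>")
    return "".join(out)


def show_domain(ace_domain: str) -> str:
    """Decode an IDNA/Punycode domain and highlight its non-ASCII characters."""
    unicode_domain = ace_domain.encode("ascii").decode("idna")
    return highlight_non_ascii(unicode_domain)


if __name__ == "__main__":
    print(show_domain("xn--bcher-kva.de"))
```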

OpenPGP

OpenPGP is in rather good shape here, since the OpenPGP standard defines UTF-8 as the only possible encoding. (A few applications likely don't handle it properly yet, but at least the standard is clear.)

X.509

For X.509, there is a UTF8String string type that could be used, but I don't know much about its compatibility with applications. I have heard that a few standards demand string types other than UTF8String for specific fields, so those standards have to be examined.
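To make the choice of string type concrete, here is a minimal Python sketch of how a name attribute value could be DER-encoded either as a PrintableString (ASN.1 universal tag 19, 0x13) or as a UTF8String (tag 12, 0x0C). The helper names and the "prefer PrintableString whenever the value fits" policy are assumptions for illustration only; which type a given field must use is exactly the open question raised above.

```python
# Universal ASN.1 tags for two DirectoryString alternatives.
TAG_UTF8_STRING = 0x0C
TAG_PRINTABLE_STRING = 0x13

# PrintableString characters: letters, digits, space and ' ( ) + , - . / : = ?
PRINTABLE = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789 '()+,-./:=?"
)


def der_length(n: int) -> bytes:
    """Encode a DER length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_directory_string(value: str) -> bytes:
    """DER-encode a value, using PrintableString if possible (hypothetical policy)."""
    if value and all(ch in PRINTABLE for ch in value):
        tag, payload = TAG_PRINTABLE_STRING, value.encode("ascii")
    else:
        # Anything outside the PrintableString set (umlauts, "@", ...) needs UTF-8.
        tag, payload = TAG_UTF8_STRING, value.encode("utf-8")
    return bytes([tag]) + der_length(len(payload)) + payload
```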

PHP

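utf8_decode (converts UTF-8 to ISO-8859-1 and replaces characters it cannot represent with "?", so it silently loses data; deprecated as of PHP 8.2)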
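PHP's utf8_decode converts UTF-8 to ISO-8859-1 and substitutes "?" for anything it cannot represent, so names outside Western European scripts come out damaged. The Python sketch below only approximates that behaviour to show the effect; it is not PHP's exact algorithm, and `utf8_decode_like` is a hypothetical name.

```python
def utf8_decode_like(data: bytes) -> bytes:
    """Roughly mimic PHP's utf8_decode(): UTF-8 bytes in, ISO-8859-1 bytes out."""
    # Invalid UTF-8 sequences become U+FFFD first...
    text = data.decode("utf-8", errors="replace")
    # ...then every character outside ISO-8859-1 (including U+FFFD) becomes b"?".
    return text.encode("latin-1", errors="replace")


if __name__ == "__main__":
    for name in ["Grüße", "Čeština"]:
        print(name, utf8_decode_like(name.encode("utf-8")))
```

A Czech name such as "Čeština" loses two of its characters, which is why such conversions have to be removed before the database migration rather than after it.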

Unicode exploits

We have to search for Unicode exploits that have affected other software and verify the Unicode handling routines in the software we use, to see whether they can be exploited. One potential problem is a byte that starts a multi-byte UTF-8 character followed by a 0x00 byte. Another common problem is invalid UTF-8 bytes inside a string that is supposed to be UTF-8, which KDE, for example, tends to crash on.
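One way to guard against both problems is to validate every incoming byte string before it is stored or processed. The Python sketch below (function names hypothetical) checks input against the well-formed byte sequences listed in the Unicode Standard, chapter 3 (Table 3-7). This rejects truncated sequences such as a lead byte followed by 0x00, overlong encodings such as C0 80, encoded UTF-16 surrogates and values above U+10FFFF. In production a strict library decoder (for example Python's `bytes.decode("utf-8")`, which rejects the same inputs) is preferable to hand-written code; the explicit version only shows what is being checked.

```python
from typing import Optional, Tuple


def _sequence_rule(lead: int) -> Optional[Tuple[int, int, int]]:
    """Return (length, lo, hi) for the byte after `lead`, or None if invalid."""
    if lead <= 0x7F:
        return (1, 0, 0)          # ASCII: no continuation bytes, lo/hi unused
    if 0xC2 <= lead <= 0xDF:
        return (2, 0x80, 0xBF)
    if lead == 0xE0:
        return (3, 0xA0, 0xBF)    # excludes overlong 3-byte forms
    if 0xE1 <= lead <= 0xEC or 0xEE <= lead <= 0xEF:
        return (3, 0x80, 0xBF)
    if lead == 0xED:
        return (3, 0x80, 0x9F)    # excludes surrogates U+D800..U+DFFF
    if lead == 0xF0:
        return (4, 0x90, 0xBF)    # excludes overlong 4-byte forms
    if 0xF1 <= lead <= 0xF3:
        return (4, 0x80, 0xBF)
    if lead == 0xF4:
        return (4, 0x80, 0x8F)    # excludes code points above U+10FFFF
    return None                   # 0x80..0xC1 and 0xF5..0xFF never start a character


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset where the first ill-formed sequence starts, or None."""
    i = 0
    while i < len(data):
        rule = _sequence_rule(data[i])
        if rule is None:
            return i
        length, lo, hi = rule
        for k in range(1, length):
            if i + k >= len(data):
                return i          # sequence truncated at end of input
            # Only the second byte has a narrowed range; later ones are 80..BF.
            low, high = (lo, hi) if k == 1 else (0x80, 0xBF)
            if not low <= data[i + k] <= high:
                return i          # e.g. a lead byte followed by 0x00
        i += length
    return None
```

Note that a lone 0x00 byte is well-formed UTF-8 (U+0000); code that passes strings to C APIs may want to reject it separately.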

TODO:

perldoc perlunicode
perldoc charnames
perldoc utf8
HTML::Entities
Encode::Byte
URI::Escape

Help Needed

If you want to help us with the Unicode Taskforce, please contact us!