Portuguese dictionary for OpenMoko’s Illume keyboard


Hi! Taking advantage of a word list built from one million words of a Portuguese newspaper, I filtered, and filtered, and filtered out the garbage (and I strongly suspect a lot of garbage is still in there) to generate a Portuguese dictionary for OpenMoko’s Illume keyboard.
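The filtering passes themselves aren't described here, so the following is only a minimal sketch of what one such garbage filter might look like. It assumes the raw input is a plain sequence of tokens, and its rules (letters only, at least two characters, a hypothetical minimum count `min_count`) are illustrative choices, not the ones actually used for this dictionary.

```python
from collections import Counter

def filter_words(tokens, min_count=2):
    """Hypothetical garbage filter: the rules below are illustrative only."""
    # Normalize case and surrounding whitespace, then count occurrences.
    counts = Counter(t.strip().lower() for t in tokens)
    kept = []
    for word, n in counts.items():
        if len(word) < 2:       # drop single letters and empty strings
            continue
        if not word.isalpha():  # drop tokens with digits or punctuation
            continue
        if n < min_count:       # drop rare tokens, which are often typos
            continue
        kept.append(word)
    return sorted(kept)

sample = ["Casa", "casa", "a", "2008", "pão", "pão", "x-y", "rato"]
print(filter_words(sample))  # ['casa', 'pão']
```

Each extra pass would simply be another rule in the loop; a frequency threshold is a cheap way to shed typos from newspaper text, at the cost of also losing genuinely rare words.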

Since UTF-8 support is still borked, I replaced accented characters like é with a plain e. Yeah, it’s like using a US keyboard to write Portuguese, but you’ve gotta make the omelet with the eggs you have.
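A minimal sketch of that accent folding in Python, using only the standard `unicodedata` module. It is an illustration, not necessarily how this file was produced: Unicode NFKD normalization splits each accented letter into its base letter plus a combining mark, and dropping the marks leaves plain ASCII for every Portuguese diacritic (á, à, â, ã, é, ê, í, ó, ô, õ, ú, ü, ç). The helper names are hypothetical.

```python
import unicodedata

def strip_accents(word):
    """Fold accented letters to plain letters, e.g. 'é' -> 'e', 'ç' -> 'c'."""
    # NFKD splits 'é' into 'e' followed by a combining acute accent (U+0301).
    decomposed = unicodedata.normalize("NFKD", word)
    # Drop the combining marks, keeping only the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def to_ascii_dictionary(words):
    """Fold every word and drop the duplicates folding creates ('pôr'/'por')."""
    folded = {strip_accents(w) for w in words}
    # Anything that still isn't ASCII (no decomposition exists) is skipped.
    return sorted(w for w in folded if w.isascii())

print(strip_accents("coração"))                         # coracao
print(to_ascii_dictionary(["pôr", "por", "é", "avó"]))  # ['avo', 'e', 'por']
```

Folding merges words that differ only in their accents, which is why the sketch deduplicates after folding rather than before.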

Enjoy: Portuguese (ASCII).dic

Just do:

bunzip2 "Portuguese (ASCII)-0.1.0.dic.bz2"
scp "Portuguese (ASCII)-0.1.0.dic" root@192.168.0.202:Portuguese\ \(ASCII\).dic
ssh root@192.168.0.202
mv Portuguese\ \(ASCII\).dic \
   /usr/lib/enlightenment/modules/illume/dicts/Portuguese\ \(ASCII\).dic

Many thanks to Alberto Simões for pointing me to http://www.linguateca.pt/ACDC/ and to Rasterman for the hints about the (quite simple) file format.


  1. #1 by furester on 14 October 2008 - 10:24

    Hi, where did you find the name (Portuguese\ \(ASCII\).dic) to use for the dictionary?

  2. #2 by Rui Seabra on 14 October 2008 - 15:25

    That file name is just the name I gave to the word list linked at the beginning, after 5 steps of filtering…

  3. #3 by Pander on 14 November 2008 - 10:43

    You could also make a file where characters like ‘é’, ‘è’, etc. are simply converted to ‘e’ for use in SMS; this allows more characters per SMS.

  4. #4 by Pander on 14 November 2008 - 11:17

    Ha, ignore my previous post. I’m working on a similar list for Dutch and, based on http://en.wikipedia.org/wiki/Short_message_service, I’ve decided to generate three versions: 7-bit, 8-bit and 16-bit. I ran into the same problem and am also considering a fourth version for a US keyboard, like you did.

  5. #5 by Rui Seabra on 14 November 2008 - 12:03

    In the dictionary file, I replaced all accented characters with their unaccented equivalents (e.g., ‘é’ with ‘e’) because I read somewhere that these kinds of characters weren’t well supported yet.

    If I missed any, please let me know.

    I also made a new Default.kbd with a few more characters commonly used (at least by me) in SMS/texting.
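For reference, here is the arithmetic behind the 7-bit, 8-bit and 16-bit variants discussed in the comments. A single SMS carries a 140-octet (1120-bit) user-data payload, so the number of characters per message depends only on the bits used per character. The small Python sketch below just makes that division explicit; `chars_per_sms` is a hypothetical helper.

```python
PAYLOAD_BITS = 140 * 8  # one SMS carries 140 octets (1120 bits) of user data

def chars_per_sms(bits_per_char):
    """Characters that fit in one single (non-concatenated) SMS."""
    return PAYLOAD_BITS // bits_per_char

for name, bits in [("GSM 7-bit", 7), ("8-bit data", 8), ("UCS-2 16-bit", 16)]:
    print(f"{name}: {chars_per_sms(bits)} characters")
# GSM 7-bit: 160 characters
# 8-bit data: 140 characters
# UCS-2 16-bit: 70 characters
```

This is why a single character outside the GSM 7-bit alphabet, which forces the message into 16-bit UCS-2, can cut a message from 160 characters to 70.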