Hi. Taking advantage of a word list compiled from one million words of a Portuguese newspaper, I filtered, filtered and filtered out the garbage (and I strongly suspect a lot of it is still in there) to generate a Portuguese dictionary for OpenMoko’s Illume keyboard.
Since UTF-8 support is still borked, I have replaced special characters like é with a plain e. Yeah, it’s like using a US keyboard to write Portuguese, but one’s gotta work with the eggs one has in order to make an omelet.
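For the curious, here is a minimal sketch of that kind of accent replacement, assuming Python 3 and its standard unicodedata module. It is not the script actually used to build the dictionary, and strip_accents is just a hypothetical helper name.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Return the word with combining accent marks removed (e.g. 'é' -> 'e')."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    # Keep the base letters and drop the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for word in ["é", "coração", "avô", "às"]:
        print(word, "->", strip_accents(word))
```

Letters that do not decompose into a base letter plus a mark (ø or ß, for instance) pass through unchanged, so a real word list would still need a check for leftover non-ASCII characters.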
Enjoy: Portuguese (ASCII).dic
Just do:
bunzip2 "Portuguese (ASCII)-0.1.0.dic.bz2"
scp "Portuguese (ASCII)-0.1.0.dic" root@192.168.0.202:Portuguese\ \(ASCII\).dic
ssh root@192.168.0.202 mv Portuguese\ \(ASCII\).dic \
    /usr/lib/enlightenment/modules/illume/dicts/Portuguese\ \(ASCII\).dic
With a lot of thanks to Alberto Simões for pointing me to http://www.linguateca.pt/ACDC/ and Rasterman for the hints about the (quite simple) file format.
#1 by furester on 14 October 2008 - 10:24
Hi, where did you find the name (Portuguese\ \(ASCII\).dic) to use for the dict?
#2 by Rui Seabra on 14 October 2008 - 15:25
That file name is the name I gave to the word list I link to at the beginning, after 5 steps of filtering…
#3 by Pander on 14 November 2008 - 10:43
You could also make a file where all characters like ‘é’, ‘è’, etc. are simply converted to ‘e’ for use in SMS; this allows for more characters in an SMS.
#4 by Pander on 14 November 2008 - 11:17
Ha, ignore my previous post. I’m working on a similar list for Dutch and, based on http://en.wikipedia.org/wiki/Short_message_service, I’ve decided to generate three versions: 7-bit, 8-bit and 16-bit. I ran into the same problem and am also considering a 4th version like the one you did for a US keyboard.
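A rough sketch of how the 7-bit case could be checked, assuming "7-bit" here means the basic table of the GSM 03.38 default alphabet (the extension table with characters such as ‘€’ is left out). The names GSM7_BASIC and fits_gsm7 are made up for this example, and the 8-bit and 16-bit versions are not covered.

```python
# Characters of the GSM 03.38 default 7-bit alphabet, basic table only.
# The escape code (0x1B) and the extension table are deliberately left out.
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)


def fits_gsm7(word: str) -> bool:
    """Return True if every character of the word is in the basic 7-bit table."""
    return all(ch in GSM7_BASIC for ch in word)


if __name__ == "__main__":
    for word in ["café", "ação", "geëerd", "hallo"]:
        print(word, fits_gsm7(word))
```

Under this assumption ‘é’ and ‘è’ already fit in a 7-bit message, while characters such as ‘ã’, ‘ç’ or ‘ë’ do not, so only words containing the latter would need a replacement.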
#5 by Rui Seabra on 14 November 2008 - 12:03
In the dictionary file, I replaced all the accented characters with their unaccented equivalents (e.g., ‘é’ became ‘e’) because I read somewhere that these kinds of characters weren’t well supported yet.
If I missed any, please let me know.
I also made a new Default.kbd with a few more characters that are commonly used (at least by me) in SMS/texting.