Hi! Taking advantage of a word list compiled from one million words of a Portuguese newspaper, I filtered, and filtered, and filtered out the garbage (and I strongly suspect a lot of garbage is still left) to generate a dictionary for Enlightenment’s Illume keyboard.
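The post does not list the actual filtering steps. As a rough illustration only, a filter of this kind could look like the shell function below. It assumes the input has one word per line, and its rules are hypothetical, not the author's five steps: keep only all-lowercase words, drop one-letter entries, and remove duplicates. `[[:lower:]]` follows the current locale, so accented words survive only under a UTF-8 locale.

```shell
# Hypothetical sketch of a word-list filter (not the author's actual steps).
# Reads one word per line on stdin, writes the cleaned list on stdout.
filter_words() {
  # 1. keep lines made entirely of lowercase letters (locale-dependent class)
  # 2. drop entries shorter than two characters
  # 3. sort and remove duplicates
  grep -E '^[[:lower:]]+$' \
    | grep -E '^.{2,}$' \
    | sort -u
}

# Usage (hypothetical file names):
#   filter_words < raw_words.txt > words.txt
```

Requiring lowercase-only lines also discards capitalised proper nouns and sentence-initial words. Whether that is desirable for a keyboard dictionary is a judgment call.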

Since UTF-8 support is still borked, I have replaced special characters like é with a plain e. Yeah, it’s like using a US keyboard to write Portuguese, but one’s gotta work with the eggs one has in order to make an omelet.
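The post doesn't show how é was turned into e. One way to do that replacement is a `sed` script with one literal substitution per accented letter. This is a sketch that assumes UTF-8 input and that the letter list below covers the Portuguese diacritics you care about. Literal multi-byte patterns are used instead of bracket expressions such as `[áà]`, so the script also behaves correctly under `LC_ALL=C`.

```shell
# Sketch: fold Portuguese accented letters to plain ASCII ("ação" -> "acao").
# Assumes UTF-8 input; the letter list is an assumption, extend as needed.
to_ascii() {
  sed -e 's/á/a/g; s/à/a/g; s/â/a/g; s/ã/a/g' \
      -e 's/é/e/g; s/ê/e/g; s/í/i/g' \
      -e 's/ó/o/g; s/ô/o/g; s/õ/o/g; s/ú/u/g; s/ü/u/g; s/ç/c/g' \
      -e 's/Á/A/g; s/À/A/g; s/Â/A/g; s/Ã/A/g' \
      -e 's/É/E/g; s/Ê/E/g; s/Í/I/g' \
      -e 's/Ó/O/g; s/Ô/O/g; s/Õ/O/g; s/Ú/U/g; s/Ü/U/g; s/Ç/C/g'
}

# Usage (hypothetical input file name):
#   to_ascii < words.txt | sort -u > "Portuguese (ASCII).dic"
```

Folding makes some distinct words collide, for example "é" and "e". That is why the usage line pipes the result through `sort -u` again.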

Enjoy: Portuguese (ASCII).dic

Just do:

bunzip2 "Portuguese (ASCII)-0.1.0.dic.bz2"
scp "Portuguese (ASCII)-0.1.0.dic" root@192.168.0.202:
ssh root@192.168.0.202
mv Portuguese\ \(ASCII\)-0.1.0.dic \
   /usr/lib/enlightenment/modules/illume/dicts/Portuguese\ \(ASCII\).dic

Many thanks to Alberto Simões for pointing me to http://www.linguateca.pt/ACDC/, and to Rasterman for the hints about the (quite simple) file format.

This entry was posted on Monday, October 13th, 2008 at 20:12.

5 comments so far

 1 

Hi, where did you find the name (Portuguese\ \(ASCII\).dic) to use for the dict?

October 14th, 2008 at 10:24
 2 

That file name is just the name I gave to the word list I link to at the beginning, after five filtering steps…

October 14th, 2008 at 15:25
Pander
 3 

You could also make a file where all characters like ‘é’, ‘è’, etc. are simply converted to ‘e’ for use in SMS; this allows more characters per SMS.

November 14th, 2008 at 10:43
Pander
 4 

Ha, ignore my previous post. I’m working on a similar list for Dutch and, based on http://en.wikipedia.org/wiki/Short_message_service, I’ve decided to generate three versions: 7-bit, 8-bit and 16-bit. I ran into the same problem and am also considering a fourth version for a US keyboard, like you did.

November 14th, 2008 at 11:17
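The reason for separate 7-, 8- and 16-bit versions comes down to simple arithmetic. A single SMS carries a 140-octet (1120-bit) payload, so the number of characters that fit depends on how many bits each character takes. The function name below is purely illustrative.

```shell
# Sketch: characters per single SMS for a given bits-per-character encoding,
# based on the 140-octet (140 * 8 = 1120-bit) SMS payload.
sms_capacity() {
  bits_per_char=$1
  echo $(( 140 * 8 / bits_per_char ))
}

for bits in 7 8 16; do
  echo "$bits-bit: $(sms_capacity "$bits") characters"
done
# prints:
# 7-bit: 160 characters
# 8-bit: 140 characters
# 16-bit: 70 characters
```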
 5 

In the dictionary file, I replaced all the accented characters with unaccented ones (e.g., ‘é’ with ‘e’) because I read somewhere that these characters weren’t yet well supported.

If I missed any, please let me know.

I also made a new Default.kbd with a few more characters that are commonly used (at least by me) in SMS/texting.

November 14th, 2008 at 12:03
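To find any accented characters that slipped through the replacement, you can ask `grep` for lines containing bytes outside printable ASCII. This is a sketch; the function name is hypothetical. Running under `LC_ALL=C` makes `grep` compare raw bytes, so every byte of a UTF-8 multi-byte character falls outside the space-to-`~` range and the line is reported with its line number. Tabs would be reported too.

```shell
# Sketch: print (with line numbers) every line of the given file that still
# contains a byte outside printable ASCII (0x20 ' ' to 0x7E '~').
# Exits with status 1, like grep, when nothing is found.
find_non_ascii() {
  LC_ALL=C grep -n '[^ -~]' "$1"
}

# Usage:
#   find_non_ascii "Portuguese (ASCII).dic" || echo "no leftovers found"
```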