Monday, September 8, 2008
Free source code section
Carabao Language Kit 1.2.0.0 released
Version 1.2.0.0 is now available for download.
Fixed:
- Unknown patterns were translated as hypernyms
- Regression: certain category-based sequences were omitted on second execution because of a malfunctioning guess scan caching mechanism
- In analytical mode (Carabao DeepAnalyzer), there was a mismatch between the word index number and the idiom member index in sentences with attached tokens such as 'em, 'm
- When copying a token with one rule unit or fewer, the text was always reset to the original
Added:
- Capability to match numbers as patterns
- When a translation is not found, the engine tries to fall back to a matching hypernym instead
- New methods in Carabao DeepAnalyzer for accessing the members of detected idioms
- New methods in Carabao CDA for accessing the unknown heuristics table
- New sequences
- Russian morphological exceptions
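To illustrate the hypernym fallback listed above, here is a minimal sketch in Python. It is not Carabao's actual API or data format: the two dictionaries, the sample words, and the function name `translate_with_fallback` are all assumptions made up for the example.

```python
from typing import Dict, Optional

# Hypothetical toy data, not Carabao's lexicon format.
# Each word maps to its immediate hypernym (a more general term).
HYPERNYMS: Dict[str, str] = {"poodle": "dog", "dog": "animal"}

# Hypothetical English -> Spanish translation table with a gap for "poodle".
TRANSLATIONS: Dict[str, str] = {"dog": "perro", "animal": "animal"}


def translate_with_fallback(word: str) -> Optional[str]:
    """Return a translation for the word, or for its nearest translatable hypernym."""
    seen = set()  # guards against cycles in the hypernym table
    current: Optional[str] = word
    while current is not None and current not in seen:
        if current in TRANSLATIONS:
            return TRANSLATIONS[current]
        seen.add(current)
        # No translation found: climb one level up the hypernym chain.
        current = HYPERNYMS.get(current)
    return None  # neither the word nor any hypernym has a translation


print(translate_with_fallback("poodle"))
```

In this sketch "poodle" has no translation of its own, so the lookup climbs to "dog" and returns its translation; a word with no entry anywhere in the chain yields `None`.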
Improved:
- If an "unknown pattern" is forced to match a known word, it will not create a new guess if a guess with the same hypernym already exists. For example, if you force a check of whether a known word can be a city, no new record is created if there is already a guess with a known city
- Automatic input language switching in locator fields: locator fields are pre-filled with the list of all existing languages in the database, eliminating the need to jump to the next language
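The guess deduplication described under "Improved" can be sketched roughly as follows. This is one reading of the release note, not Carabao's implementation: the `GuessTable` class, its method names, and the assumption that guesses are stored per word are all hypothetical.

```python
from typing import Dict, Set


class GuessTable:
    """Hypothetical store of guesses: for each word, the hypernyms guessed so far."""

    def __init__(self) -> None:
        self._guesses: Dict[str, Set[str]] = {}

    def force_guess(self, word: str, hypernym: str) -> bool:
        """Record a guess unless the word already has one with the same hypernym.

        Returns True if a new record was created, False if it was a duplicate.
        """
        known = self._guesses.setdefault(word, set())
        if hypernym in known:
            return False  # same hypernym already guessed: do not create a record
        known.add(hypernym)
        return True

    def count(self, word: str) -> int:
        """Number of distinct guesses recorded for the word."""
        return len(self._guesses.get(word, set()))


table = GuessTable()
table.force_guess("Paris", "city")  # creates a record
table.force_guess("Paris", "city")  # duplicate, ignored
print(table.count("Paris"))
```

Forcing the same "city" check twice leaves a single record, while a guess with a different hypernym would still be added.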
Friday, September 5, 2008
Status update
Just to update what's going on:
Now we can reveal the project we are working on.
It's MiaMia, born of the vision of Jo Lernout, the first half of Lernout & Hauspie. The younger generation probably has no idea who they are, but to assess the influence they had on speech and language technology, simply trace where the linguistic components of nearly everything on the consumer market today came from. Look at this, for example:
It's rather unusual to see a piece of software that has been in use for 15 years. I suppose it means that no one has done better since.
MiaMia is a hybrid system, based on the principle of "reliable NLP": if the automatic NLP can't handle it, humans do.
We will be releasing a new version of Carabao over the next few days, but there is only a little bit of new functionality. While a lot of work has been done, it was mostly about polishing the existing mechanisms and exposing certain structures required for real-world applications such as MiaMia.
I will be blogging about MiaMia here.
Thursday, June 5, 2008
The surreal world of VoIP
But... is VoIP a strange industry or what? I don't know who runs all these companies, but:
- if you don't know what IVR, DID, PSTN, or all their other "secret handshakes" mean - they won't even talk to you. Forget about the forums, they are even less helpful than Usenet.
- the responses usually come after weeks, and they are of the type "I just transferred your inquiry to our sales representative". Obviously, unless you kick and scream, the sales rep won't get back to you
- you have to re-tell the story over and over again
Ah yes, but the internet is full of dotcomish optimism about mashups and other kewl stuff. Awesome, dude.
Of course, there are also people who do need paying customers, and those who are able to concentrate, and - surprise surprise - they were the ones who got the job eventually.
Thursday, May 1, 2008
Published in MultiLingual
This part is for subscribers only though.
Wednesday, April 30, 2008
Localization can be a matter of life and death
Wednesday, April 23, 2008
We are published at ELRA
It is important to mention that all this data can be created from (usually free) ASCII dictionaries on the net using Carabao Linguist Edition.
Clarification: OLIF is the Open Lexicon Interchange Format, backed by SAP and created specifically for NLP-oriented lexica. The official website is www.olif.net.

