Monday, September 8, 2008

Free source code section

We added a small source code section on our Download page, where we will post freebies for developers.

Carabao Language Kit 1.2.0.0 released

Version 1.2.0.0 is now available for download.

Fixed:

  • Unknown patterns were translated as hypernyms

  • Regression: certain category-based sequences were omitted on second execution because of a malfunctioning guess scan caching mechanism

  • In analytical mode (Carabao DeepAnalyzer), there was a mismatch between the word index number and the idiom member index in sentences with attached tokens such as 'em and 'm

  • When copying a token with one rule unit or fewer, the text was always reset to the original

Added:

  • Capability to match numbers as patterns

  • When a translation is not found, the engine tries to fall back to a matching hypernym instead

  • New methods to Carabao DeepAnalyzer that enable accessing the members of the detected idioms

  • New methods to Carabao CDA that enable accessing the unknown heuristics table

  • New sequences

  • Russian morphological exceptions
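
The hypernym fallback listed above can be pictured with a minimal sketch. This is not Carabao's code or API: the dictionaries, the function name and the `max_depth` limit are all assumptions made up for illustration. It only shows the general idea of climbing to a more general term when a direct translation is missing.

```python
from typing import Dict, Optional

# Hypothetical sample data; none of these names or entries come from Carabao.
TRANSLATIONS: Dict[str, str] = {"dog": "perro", "animal": "animal"}
HYPERNYMS: Dict[str, str] = {"poodle": "dog", "dog": "animal"}


def translate_with_fallback(
    word: str,
    translations: Dict[str, str],
    hypernyms: Dict[str, str],
    max_depth: int = 5,
) -> Optional[str]:
    """Return a translation for `word`, or for its nearest translatable hypernym."""
    current = word
    # The depth limit (an assumption) guards against cycles in the hypernym data.
    for _ in range(max_depth + 1):
        if current in translations:
            return translations[current]
        parent = hypernyms.get(current)
        if parent is None:
            return None  # no more general term to try
        current = parent
    return None


if __name__ == "__main__":
    print(translate_with_fallback("poodle", TRANSLATIONS, HYPERNYMS))
```

In this toy data "poodle" has no translation of its own, so the lookup moves up to "dog" and uses its translation instead.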

Improved:

  • If an "unknown pattern" is forced to match a known word, it will not create a new guess if a guess with the same hypernym already exists. For example, if you force a check of whether a known word can be a city, no new record is created if a guess with a known city already exists

  • Automatic input language switching in locator fields

  • Locator fields are pre-filled with the list of all existing languages in the database, eliminating the need to jump to the next language
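
The guess de-duplication described in the first item can be sketched roughly as follows. The `Guess` record, the `add_guess` function and the choice to key guesses on the word plus its hypernym are assumptions for illustration only; the actual guess table in Carabao may be organized differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Guess:
    """Hypothetical guess record: a word and the hypernym it is guessed to belong to."""
    word: str
    hypernym: str


def add_guess(guesses: List[Guess], word: str, hypernym: str) -> bool:
    """Append a guess unless an identical word/hypernym guess already exists.

    Returns True if a new record was created, False if it was skipped.
    """
    for existing in guesses:
        if existing.word == word and existing.hypernym == hypernym:
            return False  # same hypernym already guessed for this word
    guesses.append(Guess(word, hypernym))
    return True


if __name__ == "__main__":
    table: List[Guess] = []
    print(add_guess(table, "paris", "city"))
    print(add_guess(table, "paris", "city"))
    print(len(table))
```

Forcing the same "city" check twice leaves a single record in the table, which is the behaviour the release note describes.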

Friday, September 5, 2008

Status update

Just to update what's going on:

Now we can reveal the project we are working on.

It's MiaMia, born of the vision of Jo Lernout, the first half of Lernout & Hauspie. The younger generation probably has no idea who they are, but to assess the influence they had on speech and language technology, simply trace where the linguistic components of nearly everything on the consumer market today came from. Look at this, for example:


It's rather unusual to see a piece of software that has been in use for 15 years. I suppose it means that no one has done better since.

MiaMia is a hybrid system, based on the principle of "reliable NLP": if the automatic NLP can't handle it, humans do.

We will be releasing a new version of Carabao over the next few days, but there is only a little bit of new functionality. While a lot of work has been done, it was mostly about polishing the existing mechanisms and exposing certain structures required for real-world applications such as MiaMia.

I will be blogging about MiaMia here.

Thursday, June 5, 2008

The surreal world of VoIP

We're currently working on a massive project involving telephony and mobile technologies, and I had to look for VoIP vendors to cater for my client's relatively simple needs. I have to say that while the needs are simple, the traffic is extremely high, so in monetary terms it could be a nice deal for the VoIP vendors.

But... is VoIP a strange industry or what? I don't know who runs all these companies, but:
  • if you don't know what IVR, DID, PSTN, or all their other "secret handshakes" mean, they won't even talk to you. Forget about the forums; they are even less helpful than Usenet.
  • the responses usually come after weeks, and they are of the type "I just transferred your inquiry to our sales representative". Obviously, unless you kick and scream, the sales rep won't get back to you
  • you have to re-tell the story over and over again

Ah yes, but the internet is full of dotcomish optimism about mashups and other kewl stuff. Awesome, dude.

Of course, there are also people who do need paying customers, and those who are able to concentrate, and - surprise surprise - they were the ones who got the job eventually.

Thursday, May 1, 2008

Published in MultiLingual

My article about real-world applications of machine translation has been published in MultiLingual Computing, the leading industry magazine for globalization, international software development and language technology.

This part is for subscribers only though.

Wednesday, April 30, 2008

Localization can be a matter of life and death

This story (A Cellphone's Missing Dot Kills Two People, Puts Three More in Jail) reads like it was taken from a Tarantino or Coen brothers' movie.

Wednesday, April 23, 2008

We are published at ELRA

After a few months of evaluations, agreements, and inspections, our linguistic data is published on the European Language Resources Association's website. The Russian-English OLIF dictionary is sold at quite a price, while the Swahili, Czech and Cebuano dictionaries are distributed for free (although ELRA charges for postage and media).

It is important to mention that all this data can be created from (usually free) ASCII dictionaries on the net using Carabao Linguist Edition.

Clarification: OLIF is the Open Lexicon Interchange Format, backed by SAP and created especially for NLP-oriented lexica. The official website is www.olif.net.