Sunday, February 24, 2008

Why is MT so formal?

I came across an interesting discussion about machine translation on LinkedIn.

Among the obvious stuff (the obligatory "spirit is willing, flesh is weak" and "out of sight, out of mind" quotes, plus recommendations from professional translators to hire professional translators instead), there was one curious comment: that machine translation is unnecessarily formal.

Brushing aside the exaggerated expectations (you don't expect your computer to have a Jerry Seinfeld inside, do you?), I now recall that when I first encountered MT software myself (it was PARS, in the early 1990s), what struck me was the unnecessarily formal style of the output. (OK, nowadays there are also SMT systems, which produce a "porridge o' words" style.)

Really, why does it have to be so formal?

If you fly often, and usually not in business class, then you have probably developed a strong aversion to airplane food and to phrases like "sky chefs". While the food is usually well preserved and reasonably fresh, it rarely tastes like real food with real flavour. I love spicy food and I frequently fly Asian airlines, but I have never tasted real spice on board. Aside from the safety concerns (sick passengers don't exactly make the plane fly faster), I think the reason is that the airlines aim for a bland, politically correct average that everyone finds acceptable and good enough. No one gets offended. No one gets hurt.

It is the same with MT. The formality is not a product of technical limitations. It is possible to implement multiple styles even in older-generation systems, but it is much harder to maintain them. So, in practice, the developer has to pick one style. And if it has to be one style, the best bet is a bland, politically correct, formal register.

And just as airlines do not set out to provide a unique culinary experience, no MT system has ever promised to produce a literary masterpiece. Just as you pay an airline to get you from point A to point B, you use MT to get your "cargo" from a source language to a target language with as little damage to the wares as possible.
