Man, is this annoying or what. With claims like these, the MT industry is going to move from Web 2.0 to Dot Com Bubble Burst 2.0.
I do have great respect for Asia Online's team. Philipp Koehn co-authored some papers with Franz Och, the guy behind Language Weaver and Google Translate. Dion Wiggins has some super-duper credentials as an IT businessman, although he has no experience at all in MT or NLP (and this industry is very different).
From a technological point of view, Koehn did the right thing (IMHO) by going in the hybrid direction instead of searching for the philosopher's stone: a pure, ideal engine that builds itself and, through some kind of divination, deduces things that hand-crafted rules can't. It is also refreshing to see that they decided to use a high-quality corpus.
But - c'mon, people!
The article contains odd bits such as:
...adding a syntax component to a SMT system "seriously degrades throughput performance from 5,000 words a minute to only around 300" on a machine with 4 high speed CPUs
Did they mean their SMT system? Because how can you measure the speed of all SMT systems?
If this is their own SMT system (i.e., they have already built the core engine), it probably means they are going to use Moses, which raises more questions. Koehn created Pharaoh and Moses; both are open-source; both have been around for a while; yet I have never heard of a commercial or even semi-commercial application that uses them. And I know of at least one huge translation agency that launched a project to build its own MT engines on Moses.
There is also EuroMatrix, an all-you-can-spend research project in which Koehn also took part and which, just like all the other euro-science-charity projects, produced nothing (yeah, OK, there is a Czech-English lexicon; a huge deal, right?).
Asia Online's website looks nice: lots of Wikipedia articles, "SMT for dummies" complete with scientific-looking formulas, media buzz. Not a word about the actual engine, no screenshots, no demos.
One thing that particularly captured my attention was the Careers page. As of now, they are looking for:
- country office managers in every Asian country, including Thailand
- "content procurement" managers who read content and make sure it is well translated (good luck with that) before feeding it to the lexicon builders
- the coolest thing: programmers in Thailand, C++ (probably to fix bugs in Moses) and C# (to hack the front end). This last part is kind of OK, except that one of the requirements is "Have database skills in MS SQL Server, Oracle Database and etc". Data and stuff. In other words, at this stage they have not yet decided which backend to use, which means there are no design specs.
So they have no people responsible for the content. They have no design specs. They have no solid plan. They have no people who have ever built and shipped anything of this class.
They did not even research their markets properly. If they had so much as looked at the Wikipedia articles about the Philippines, or talked to one or two Filipinos, they would have learned that English, not Tagalog, is the lingua franca of the Philippines when it comes to written language. 90% of the major newspapers, all official correspondence, and all commercial documents in the Philippines are in English.
All they have is the prototype of a system that was never shown to work in a real-world environment, and a bunch of British guys who rented an office in "beautiful downtown Bangkok" and announced that they will conquer the world in a year.
And, of course, the usual reservations about statistical MT apply.