Turning iPhones into Babel fish

Google plans to develop true speech-to-speech translation apps by 2012.

In 1954, IBM demonstrated the first computerized language translation. The room-filling IBM 701, fed with punch cards and armed with a vocabulary of 250 words, translated Russian text to English in a few seconds. The Financial Times reported, “It is hoped that by 1958 the multilingual automatic translator for business men and scientists will really have arrived.”

Well, not quite. More than fifty years later, researchers are still searching for technology that can emulate the Babel fish of The Hitchhiker’s Guide to the Galaxy fame, which could be slipped into your ear and allow you to instantly understand any language. Progress has been slow, but recently some big advances have brought us a lot closer. Smartphones, thanks to their ubiquity and computing power, provide an ideal platform for distributing speech-to-speech software. And there have been some real breakthroughs in how computers translate. Accurate translation requires enormous amounts of data, and new techniques allow software to mine the collective wisdom of the Internet as a learning tool. In fact, Franz Och, Google’s head of machine translation, told the U.K. Times Online this month that the company expects a basic program for speech-to-speech translation for mobile devices to be ready within two years.

Google’s edge comes not just because it has already developed fairly sophisticated speech-recognition software, but because the search-engine giant has access to data on a scale few other companies can match. Computers typically learn to translate by identifying patterns between two languages. But first they need reams of source material in multiple languages to identify those patterns with any degree of accuracy. The Internet, of course, is the greatest source of such data available, and it has been a gold mine for researchers.
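The pattern-matching idea can be illustrated with a toy version of a classic statistical word-alignment technique (a simplified IBM Model 1, sketched here in Python on an invented four-sentence English–Spanish corpus; the data and method are illustrative, not a description of Google’s actual system):

```python
from collections import defaultdict

# Tiny invented parallel corpus (English, Spanish); real systems learn
# from millions of sentence pairs scraped from the web.
corpus = [
    ("the house", "la casa"),
    ("the dog", "el perro"),
    ("the green house", "la casa verde"),
    ("the green dog", "el perro verde"),
]
pairs = [(e.split(), s.split()) for e, s in corpus]
english_vocab = {w for e_sent, _ in pairs for w in e_sent}

# t[(e, s)] = estimated probability that Spanish word s translates to
# English word e, initialized uniformly over the English vocabulary.
t = defaultdict(lambda: 1.0 / len(english_vocab))

for _ in range(10):  # a few rounds of expectation-maximization
    count = defaultdict(float)
    total = defaultdict(float)
    for e_sent, s_sent in pairs:
        for e in e_sent:
            norm = sum(t[(e, s)] for s in s_sent)
            for s in s_sent:
                c = t[(e, s)] / norm  # fractional co-occurrence credit
                count[(e, s)] += c
                total[s] += c
    for (e, s), c in count.items():
        t[(e, s)] = c / total[s]

def best_translation(spanish_word):
    """Most probable English word for a Spanish word under the learned table."""
    return max((p, e) for (e, s), p in t.items() if s == spanish_word)[1]

print(best_translation("casa"), best_translation("perro"))  # house dog
```

Even on four sentences, the statistics sort out that “casa” pairs with “house” rather than the ever-present “the” — the same co-occurrence logic, scaled up to the web, is what drives modern statistical translation.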

Google is not alone in its pursuit. Other, smaller companies have already put smartphone translation apps on the market. Mobile Technologies, a startup in Pittsburgh, released an iPhone app called Jibbigo last fall that translates between English and Spanish, and boasts a vocabulary of 40,000 words. Say a sentence in English, and the software will convert it to Spanish and say it aloud within seconds. The company also has a Japanese version, and has plans for a dozen more languages. “People really like the fact that you can have an interpreter in your pocket,” says Alex Waibel, the company’s co-founder and a professor of computer science at Carnegie Mellon University.

Meanwhile, Sakhr Software USA, a Vienna, Va., firm, has just released an English-Arabic speech-to-speech translation app, an impressive feat, given that Arabic is one of the more difficult languages to work with. Because written Arabic usually omits short vowels, the software must first infer the meaning of a sentence before it can translate the individual words.
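A toy sketch shows why context has to come first (the dictionary and rule below are invented for illustration; the underlying facts — that the unvocalized skeleton “ktb” can be read as kataba, “he wrote,” or kutub, “books,” and that the article “al” precedes nouns — are standard Arabic):

```python
# Toy sketch of vowel-less ambiguity (illustrative data only): a written
# Arabic consonant skeleton supports several readings, told apart by context.
READINGS = {
    "ktb": [("kataba", "he wrote", "VERB"), ("kutub", "books", "NOUN")],
}

def disambiguate(prev_word, skeleton):
    """Crude context rule: the definite article 'al' signals a noun reading."""
    wanted = "NOUN" if prev_word == "al" else "VERB"
    for vocalized, gloss, pos in READINGS[skeleton]:
        if pos == wanted:
            return gloss
    return READINGS[skeleton][0][1]  # fall back to the first listed reading

print(disambiguate("al", "ktb"))    # books   ("al-kutub" = "the books")
print(disambiguate("huwa", "ktb"))  # he wrote ("huwa" = "he")
```

Real Arabic systems use full statistical models rather than a one-word rule, but the principle is the same: the sentence must be understood before any single word can be rendered.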

IBM, perhaps the world’s foremost innovator in computer translation, is closing in on true speech-to-speech translation as well. David Nahamoo, manager of the human-languages group, says the company is focusing on specific applications, rather than the general approach taken by Mobile Technologies and Sakhr. Zeroing in on certain scenarios makes the task simpler, as it limits the number of phrases the software needs to understand.

“The first set of applications that is going to be needed is in tourism,” Nahamoo says. IBM is perfecting translation for typical tourist scenarios, such as asking for directions or ordering food. IBM has already created a custom handheld speech-to-speech translator for the U.S. military, which could also be used by humanitarian and medical workers around the world. Nahamoo says it will be a few more years before these technologies move out of the lab, however. “These things will never be perfect,” he concedes. Dialects, accents and emotions are hard to translate, and because the process involves several steps (speech recognition, machine translation and speech synthesis), errors can snowball.
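Nahamoo’s point about snowballing errors comes down to simple arithmetic (the 90 percent figures below are invented for illustration): even if every stage of the pipeline were quite accurate on its own, the end-to-end accuracy is roughly the product of the stages.

```python
# Hypothetical per-stage accuracies for a cascaded speech-to-speech pipeline.
stages = {
    "speech recognition": 0.90,
    "machine translation": 0.90,
    "speech synthesis": 0.90,
}

end_to_end = 1.0
for stage, accuracy in stages.items():
    end_to_end *= accuracy  # an error at any stage corrupts the final output

print(f"end-to-end accuracy: {end_to_end:.3f}")  # 0.729
```

Three stages at 90 percent each leave barely 73 percent of utterances surviving the whole chain intact, which is why constraining the domain, as IBM does with tourist phrases, pays off so handsomely.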

For now, the Mobile Technologies and Sakhr apps are the closest we can get to bridging the communication gap. Both can handle only one sentence at a time, but Waibel is working to change that. He’s already demonstrated simultaneous translation in his lab, allowing academic lecturers to give presentations in one language to a room full of people who receive it in another via special speakers that direct sound. How long before simultaneous translation can be incorporated into a smartphone? Waibel, for one, claims it’s coming sooner than you think. “The processing power is not yet sufficient,” he says, “but give it a couple of years.”