This is sort of fascinating. Google has decided to withdraw its Translate API from public use (though the Google Translate site itself will stay around), and the reason, as with so many things internet-related, is that it’s a victim of its own success. Google’s translation engine improves over time by comparing side-by-side samples of translated text that get scooped up by its search robots, but this continuing improvement depends on the translations themselves being high quality. So what happens when spammers and link farmers flood the internet with text translated by Google’s own tools? Kirti Vashee of eMpTy Pages explains:
The higher the quality of input to this training process, the higher quality the resulting engine can translate. So the increasing amount of “polluted drinking water” is becoming more statistically relevant. Over time, instead of improving each time more machine learning data is added, the opposite can occur….This results in potentially lower quality translations over time, rather than improvements.
….What Google did not anticipate was [the] extent of abuse of the Google Translate API in a manner prohibited by its Terms of Use. This has resulted in such a significant mass of poorly translated content that the impact on Google’s core search business is notable and poses a significant threat to the quality of Google’s search results and the quality of its future translation initiatives. Given how important search and translation are to Google’s current and future business, this is most likely the “Substantial Economic Burden” and “abuse” that Google refers to in its shutdown announcement. With this realization, it makes sense that Google is taking action to rectify the problem.
This comes via James Fallows, who says, “This is the computer-world equivalent of sloppy overuse of antibiotics creating new strains of drug-resistant bacteria.” It just goes to show, once again, that there’s hardly anything that spammers and other internet leeches can’t ruin.
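For the curious, the feedback loop Vashee describes can be sketched as a toy simulation. This is only an illustration under made-up assumptions, not how Google's engine actually works: the function name `simulate`, the quality scores, the batch sizes, and the `mt_penalty` parameter are all hypothetical. The idea is simply that the engine's quality tracks the average quality of its training corpus, and that machine-translated text re-entering the corpus is a bit worse than the engine that produced it.

```python
def simulate(rounds, polluted_share, human_quality=0.9, mt_penalty=0.2):
    """Toy model: return the engine's quality score after each retraining round.

    All numbers are illustrative assumptions, not measurements.
    polluted_share: fraction of each new crawl that is machine-translated text.
    mt_penalty: how much worse machine output is than the engine that made it.
    """
    corpus_size = 1000.0                       # starting corpus, all human-translated
    quality_sum = corpus_size * human_quality  # running total of quality over the corpus
    engine_quality = quality_sum / corpus_size
    history = [engine_quality]
    new_docs = 500.0                           # documents crawled per round

    for _ in range(rounds):
        machine_docs = new_docs * polluted_share
        human_docs = new_docs - machine_docs
        # Machine output is somewhat worse than the engine that produced it.
        machine_quality = max(0.0, engine_quality - mt_penalty)
        quality_sum += human_docs * human_quality + machine_docs * machine_quality
        corpus_size += new_docs
        # Assumption: the retrained engine is exactly as good as its average input.
        engine_quality = quality_sum / corpus_size
        history.append(engine_quality)
    return history


clean = simulate(rounds=10, polluted_share=0.0)
polluted = simulate(rounds=10, polluted_share=0.8)
print("clean crawl:   ", [round(q, 3) for q in clean])
print("polluted crawl:", [round(q, 3) for q in polluted])
```

With a clean crawl the score holds steady. When most of each new crawl is the engine's own output, the score drops every round, which is the "polluted drinking water" problem in miniature.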