This is sort of fascinating. Google has decided to withdraw its language translation tools from public use (though the Google Translate site itself will stay around), and the reason, as with so many internet-related things, is that the service is a victim of its own success. Google’s translation engine improves over time by comparing side-by-side samples of translated text scooped up by its search robots, but that continuing improvement depends on the translations themselves being high quality. So what happens when spammers and link farmers flood the internet with text translated by Google’s own tools? Kirti Vashee of eMpTy Pages explains:
The higher the quality of input to this training process, the higher quality the resulting engine can translate. So the increasing amount of “polluted drinking water” is becoming more statistically relevant. Over time, instead of improving each time more machine learning data is added, the opposite can occur…. This results in potentially lower quality translations over time, rather than improvements.
This comes via James Fallows, who says, “This is the computer-world equivalent of sloppy overuse of antibiotics creating new strains of drug-resistant bacteria.” It just goes to show, once again, that there’s hardly anything that spammers and other internet leeches can’t ruin.