Spammers Ruin Yet Another Cool Thing

This is sort of fascinating. Google has decided to withdraw its Translate API from public use (though the Google Translate site itself will stay around), and the reason, as with so many things internet-related, is that the tool is a victim of its own success. Google’s translation engine improves over time by comparing side-by-side samples of translated text that get scooped up by its search robots, but this continuing improvement depends on the translations themselves being high quality. So what happens when spammers and link farmers flood the internet with text translated by Google’s own tools? Kirti Vashee of eMpTy Pages explains:

The higher the quality of input to this training process, the higher quality the resulting engine can translate. So the increasing amount of “polluted drinking water” is becoming more statistically relevant. Over time, instead of improving each time more machine learning data is added, the opposite can occur….This results in potentially lower quality translations over time, rather than improvements.

….What Google did not anticipate was extent of abuse of the Google Translate API in a manner prohibited by its Terms of Use. This has resulted in such a significant mass of poorly translated content that the impact on Google’s core search business is notable and poses a significant threat to the quality of Google’s search results and the quality of its future translation initiatives. Given how important search and translation are to Google’s current and future business, this is most likely the “Substantial Economic Burden” and “abuse” that Google refers to in its shutdown announcement. With this realization, it makes sense that Google is taking action to rectify the problem.

This comes via James Fallows, who says, “This is the computer-world equivalent of sloppy overuse of antibiotics creating new strains of drug-resistant bacteria.” It just goes to show, once again, that there’s hardly anything that spammers and other internet leeches can’t ruin.