Every evening last summer, after I’d shut down my work laptop, my 3-year-old daughter and I would approach our Google Home smart speaker and yell, “Hey Google, can you play ‘Aankh Marey’ from the movie Simmba?” We’d hold our breaths and wait for a response. The digital assistant would then repeat the name of the Bollywood song we’d requested in its default American accent.
We’d rejoice and dance when the assistant played the right number, which happened about half the time. My daughter was going to a Bollywood dance class and we’d finally found a use for the device that my husband had won at a tech conference.
Often, however, it would mishear our requests and play something else. My daughter and I would look at each other and chuckle, like the only people in the room who got a joke. We’d roll our eyes and bond over our assistant’s incompetence. These moments turned out to be funny and special, and secretly, I enjoyed the role reversal of having an assistant who sounded like a stereotypical American. When would that happen in real life?
Yet several days into our routine, I noticed something strange. My daughter and I were contorting our mouths to pronounce the names of Bollywood songs with an American accent. I don’t know if our exaggerated Midwestern accents improved Google Home’s hit rate or if we were doing it unconsciously so we felt like we were being understood. Either way, the gadget that had entered our house as a helper had turned into an intruder. Not just an intruder that could listen to our private conversations, but an intruder that was telling us how we should speak our own language in our own home. I’d been wrong about our reversed power dynamic.
My hunch was confirmed when I spoke with Halcyon Lawrence, an assistant professor of technical communication and information design at Towson University who studies user accessibility and design for voice recognition systems such as Google Assistant, Amazon’s Alexa, and Apple’s Siri. “Your daughter is being disciplined by Google Home. You are being disciplined,” she told me. This artificial intelligence–powered machine, she explained, either understands users’ accents based on its programming or it doesn’t. If it misunderstands something, it just assumes it knows what it’s hearing and powers through its mistake. The feedback loop runs only one way: humans must change their behavior to make the machine run more smoothly.
“If I am going to use this technology then I must assimilate. I must code-switch,” Lawrence said. “I find there is something inherently violent about that, because it is no different than the kind of language discipline that we faced when we were colonized.” Like most postcolonial English speakers, I float in an in-between land of languages. I speak four Indian languages and I speak English fluently. Yet my accent and dialect are seen not as marks of erudition or class like British accents, but as punchlines that reinforce stereotypes. (Think Apu from The Simpsons.)
Lawrence’s own experiences being misinterpreted because of her Trinidadian accent inspired her to study how voice recognition systems embed “accent bias.” Linguistic studies have found that nonnative English speakers and people who don’t have standard American accents, particularly immigrants from non-European countries, are penalized in the job and housing markets because they are perceived to be less intelligent and less competent. Vice President Kamala Harris has described how people would assume that her mom, who had a PhD in nutrition and endocrinology, was unintelligent because of her Indian accent.
Switching your personal digital assistant’s accent won’t affect its ability to understand yours. But it will tell you more about who it thinks it’s talking to. The most popular services—Google Assistant, Alexa, Siri, and Microsoft’s Cortana—can speak in a range of languages, dialects, and accents, with notable exceptions. Alexa has only one version of a standard American accent. Siri has American, British, Irish, Indian, Australian, and South African accents. Google Assistant speaks with one of eight color-coded American-accented presets as well as “British Racing Green” and “Sydney Harbour Blue.” None of the assistants, however, offers any regional or ethnic American accents or African American Vernacular English, a dialect with its own accent and unique grammatical features. Google Home does have a limited-time “cameo” voice appearance by comedian Issa Rae (who was preceded by John Legend). Alexa features Samuel L. Jackson.
But these celebrity voices aren’t really about functionality—Jackson can offer his opinion on snakes but isn’t programmed to help you with your shopping. Miriam Sweeney, an associate professor at the University of Alabama’s school of communication who studies voice assistants, told me that the virtual Jackson and Rae are further examples of technology companies using Black voices to entertain white consumers while ignoring Black consumers. A recent study by computer scientists and linguists at Stanford University found that all the major speech recognition systems routinely misunderstood users who speak African American Vernacular English at almost twice the rate of their white counterparts.
Google told me that fairness is one of its “core AI principles” and that the company seeks to make its digital assistants accessible to as many people as possible. “From day one, we’ve strived to build an inclusive product that can be helpful for all users and equally serve them,” Beth Tsai, the director of Google Assistant policy, told me. Much as it doesn’t specify the gender of its assistants’ voices, Google seeks to make its default American voice raceless. “Labeling a voice—even if it’s recorded by a Black actor—as a ‘Black voice’ defines what a Black voice sounds like…Similar to gender, voices for people of different races are really diverse. We’d be doing a disservice and be leaning into stereotypes if we applied those labels,” Tsai said.
Amazon told me something similar. “Alexa’s understanding of different languages, dialects, and accents is of the utmost importance to Amazon and our customers,” said a spokesperson for Amazon. “Alexa has been designed to work well for everyone, and our speech recognition models work with many different dialects and variations in speech. We continuously improve our models in order to accurately recognize variations in speech.”
Even as the tech companies boast that their voice recognition products are accessible to a wide range of users, they add that the technology is still difficult to develop, which partly explains why they haven’t introduced more accent options. Even as they say African Americans are an important consumer market, some in Silicon Valley argue that market dynamics dictate which accent and language options are available. These arguments aren’t entirely convincing. All four digital assistants offer Italian, which is spoken as a first language by 63 million people. (Siri even comes in Finnish, which has about 5 million native speakers.) Yet only a couple of assistants offer Swahili, Telugu, and Marathi—languages with nearly 100 million speakers each.
It’s nothing new for companies to exclude consumers of color, says Safiya Umoja Noble, an associate professor of information studies at UCLA and the author of Algorithms of Oppression, an investigation into how search engines reinforce racial biases. For years, consumers of color have heard that “they are not a market that matters the way high-end luxury, middle-class, and affluent consumers matter,” she says. “The whole history of advertising in the United States has been about prioritizing people who don’t have accents, as if there is some type of neutral space of language, which of course we know is absurd.”
Another likely explanation for the digital assistants’ limitations is that they reflect their creators’ blind spots. An analysis that I did for Reveal of 177 large US technology companies found that in 2016, 73 percent of their executives and senior managers were white, 21 percent were Asian (including South Asian), 3 percent were Latino, and 1.4 percent were Black. (In December, a prominent Black AI ethics researcher claimed Google fired her after she co-wrote a draft paper about the risks of language models reinforcing racial and gender biases.)
However, both Noble and Sweeney think it may be a good thing that voice recognition devices aren’t trained to recognize many accents of marginalized groups, effectively stymieing the devices’ primary function, which is collecting users’ personal data. Sweeney tells her students to throw away their smart speakers, and Noble refuses to buckle under pressure from her 9-year-old to buy one. When we use these technologies, she said, “we teach our kids that our voice isn’t the normative voice, that they have to be something else in order to engage, in order to participate, in order to find themselves.”
Lawrence notes that when voice recognition technology misinterprets people of color, it’s not simply inconvenient but sometimes harmful. A ProPublica investigation into “aggression detectors” that are being installed in schools found that they incorrectly identified kids’ voices as aggressive even when they were saying completely innocuous things. On the other hand, she noted, if Black people don’t use digital assistants, they will continue to be overlooked and misunderstood by tech companies, which will keep baking biases into their algorithms. “For how long will we be spared not being recognized? In my mind, it’s a no-win situation,” Lawrence said.
Within my home, I found a small, twisted way to win. I was excited to learn that Google Home has a setting called English (Indian), which speaks Indian-accented English. I also selected the Hindi language option, though I quickly became frustrated because it only understood a formal Hindi that is totally unlike the version we speak at home. For the first time in my life, I had to look up the Hindi word for “movie.” The English (Indian) assistant didn’t understand me any better than the default setting, but at least it didn’t make me feel like I had to imitate an American accent.
Talking to someone—or something—that sounded like us did improve our experience, even if it didn’t completely reflect our multilingual reality. I noticed that my daughter wasn’t code-switching her English anymore. Every so often, we’d google phrases so we could speak pure Hindi with our assistant, then look at each other and laugh at how stilted our nonconversational Hindi sounded.
It was clear that we didn’t have a perfect substitute for the polished product aimed at “American” speakers. When my daughter would ask the American-accented assistant to tell her a story, it would regale her with fairy tales like “Cinderella” or “Hansel and Gretel.” Yet when she asked the Hindi assistant for a story, it exposed a hole in its programming—not a serious functional issue, but a telling sign of the lack of imagination that had gone into it. It replied, “Once upon a time there was a king and once upon a time there was a queen. They both slept and that’s the end of the story.”