How Mature AI Tools Can Promote Diversity and Inclusion

By: Sarah Roberts

Many technologies designed to be inclusive fall short. People with non-anglicized names, or with accents that differ from what developers treat as “neutral,” often feel that technology excludes them.

Voice recognition programs like Siri and Alexa frequently misinterpret what speakers say. For many people, a device’s inability to transcribe their speech correctly is a frustrating result of discriminatory programming.

Programming voice recognition is a complicated process. It’s simply not feasible for developers to account for every possible accent, unique name or specialized vocabulary. Fortunately, mature technology can make up for these shortcomings. Artificial Intelligence can be trained to learn a speaker’s accent and commonly used vocabulary, allowing it to understand that person better and produce more accurate results over time.

Voice-Recognition Software & Discrimination

Research shows that voice recognition technology often fails to understand speakers from many backgrounds. In one study, researchers tested five automated transcription programs and found that they were about twice as likely to misinterpret words spoken by a Black person as words spoken by a white speaker.

When it comes to names, the issue becomes personal for many. Users may find themselves choosing between mispronouncing their own names and avoiding features or technologies they can’t rely on for accuracy. As a result, they miss out on technologies designed to save time and do not receive the same benefits as their peers. This disparity creates obvious problems: some technologies, albeit unintentionally, discriminate against anyone whose speech or name does not fit the programmers’ idea of “standard” English.

Of course, programming technology that can understand every possible accent, combination of accents, use of vernacular language and other variation might seem overwhelming or even impossible. However, many Artificial Intelligence providers have found ways to address this challenge.

Training AI to Learn & Adapt

AI systems can learn from their mistakes. Each time AI tools generate captions or a transcript, the AI can learn from corrections to its errors and improve its performance. For example, Verbit’s in-house AI can be trained with each use. Users can upload a glossary of terms, such as speaker names or complex words that are likely to come up, so the technology can recognize and transcribe them accurately. Additionally, when professional transcribers review the output and correct errors, the AI learns from those corrections as well. With more use and more corrections, the AI better understands speech patterns, accents and common vocabulary.
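To make the glossary idea more concrete, here is a minimal Python sketch of one simple way a user-supplied term list could be applied: after transcription, any word that closely resembles a glossary term is replaced with the user’s spelling. This is not Verbit’s method, which this article does not describe. The function name `apply_glossary` and the 0.8 similarity cutoff are hypothetical choices, and many speech recognition services instead apply vocabulary hints during recognition itself rather than patching the text afterward.

```python
import difflib
import re


def apply_glossary(transcript: str, glossary: list[str], cutoff: float = 0.8) -> str:
    """Replace words that closely resemble a glossary term with that term.

    Hypothetical post-processing step for illustration only; it is not
    how any particular vendor implements glossary support.
    """
    # Map lowercase forms to the canonical spelling supplied by the user.
    canonical = {term.lower(): term for term in glossary}

    def fix(match: re.Match) -> str:
        word = match.group(0)
        # Find the single closest glossary entry at or above the cutoff.
        hits = difflib.get_close_matches(word.lower(), list(canonical), n=1, cutoff=cutoff)
        return canonical[hits[0]] if hits else word

    # Only alphabetic tokens (with apostrophes) are candidates for correction.
    return re.sub(r"[A-Za-z']+", fix, transcript)


if __name__ == "__main__":
    print(apply_glossary("Professor Nguen discussed mitokondria today", ["Nguyen", "mitochondria"]))
    # → Professor Nguyen discussed mitochondria today
```

A fixed similarity cutoff is a blunt tool: set too low, it “corrects” ordinary words into glossary terms; set too high, it misses genuine misspellings. That trade-off is one reason systems that learn from many corrections over time tend to outperform a static list.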

AI can and should adapt to the user, not the other way around. Individuals with uncommon names should not have to purposely mispronounce them. Rather, developers should design technology that trains itself and improves based on preconfigured settings and uploaded materials.

Additionally, the more people use the technology, the better it performs for all of its users. Individuals and professionals should choose automatic speech recognition and AI tools that are constantly adapting rather than settling for mediocre results that do not provide equity.

Real AI Case Studies on Providing Equity

University of California at Santa Barbara

To comply with the Americans with Disabilities Act and foster a positive, inclusive learning environment, the University of California at Santa Barbara began using captions provided by Verbit for better accuracy and efficiency. According to officials at UCSB, Verbit demonstrated a high degree of accuracy even when the speaker had a heavy accent. The ability to interpret diverse accents and speech patterns is vital at a university that values diverse voices and serves international students and professors.

Udacity

Udacity offers hundreds of online courses covering highly technical subjects, and accurately captioning uncommon terms can prove challenging. Udacity turned to Verbit to ensure that its captions correctly capture names and technical terms, giving its learners access to accurate materials.

University of Glasgow

Many AI-based solutions fail to accurately interpret UK English because their developers focus on American English. The University of Glasgow uses Verbit to provide captioning services that meet the needs of UK English speakers. Verbit provides accurate transcriptions of courses that handle both professors’ diverse accents and complex medical terminology.

AI is Improving Speech to Text Features

While Siri and other free, built-in solutions may not transcribe diverse voices effectively, mature AI tools are helping businesses, schools and the legal system better accommodate people from a variety of backgrounds. Moreover, they’re helping to provide more equitable experiences for all. Effective AI technology helps include individuals who are deaf or hard of hearing, non-native English speakers and people with learning and attention differences, such as ADHD.

To be truly inclusive, universities, employers and broadcasters must ensure they’re using technology like Verbit’s, which can be trained to meet inclusivity needs and reduce bias. Technology that fails to perform for a large portion of the population simply cannot provide an inclusive environment or experience.