How Security and Accuracy Set Captioning Solutions Apart

By: Sarah Roberts

With a growing need to support students, employees, consumers and others who are Deaf or hard of hearing, more organizations are captioning their content. However, many automatic, artificial intelligence-based captioning tools miss the mark. Renowned institutions like Harvard and MIT found this out the hard way, but all school leaders can pivot to protect themselves and better serve their students.

School leaders can do their due diligence by understanding what to look for in captioning tools. They need to ensure that the tools they choose offer the security necessary to protect sensitive information and reach the level of accuracy needed to provide equity to individuals who rely on captions to participate.

A phone showing a padlock graphic.

ASR Is Not Enough: Concerns Over Captioning, Accuracy and Accessibility

Universities, businesses and others must meet high accuracy standards when it comes to their captioned content. The National Deaf Center (NDC) specifically points out that automatic speech recognition (ASR) is not (yet) accurate enough to provide adequate captions. The NDC states that offering flawed automated closed captioning fails to create an equitable experience for the Deaf and hard of hearing communities.

The UK government also requires a human editing layer for all of its online captioning. According to the UK government, the World Wide Web Consortium’s Web Content Accessibility Guidelines (WCAG) create an expectation of accuracy that exceeds ASR’s capabilities.

Imagine that ASR is trying to interpret the sentence “I designed that red dress” but misinterprets “designed” as “declined.” The sentence becomes, “I declined that red dress.” While the result is grammatically correct and intelligible, that single substitution error changes the meaning of the sentence. Substitutions are one of the three error types common to ASR, along with deletions and insertions.
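To make the three error types concrete, here is a minimal Python sketch that aligns a correct sentence with an ASR output and counts substitutions, deletions and insertions. It also reports the word error rate (WER), a commonly used metric that divides total errors by the number of words in the correct sentence. The function name `count_word_errors` is made up for this illustration, and the approach shown (a standard word-level edit distance) is an assumption about how such errors might be tallied, not a description of how any particular captioning product scores itself.

```python
def count_word_errors(reference: str, hypothesis: str) -> dict:
    """Count word-level substitutions, deletions and insertions.

    `reference` is the correct text; `hypothesis` is the ASR output.
    Illustrative helper only: it lowercases and splits on whitespace,
    so punctuation attached to a word is treated as part of that word.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    rows, cols = len(ref) + 1, len(hyp) + 1

    # dist[i][j] = fewest edits needed to turn ref[:i] into hyp[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # every reference word deleted
    for j in range(cols):
        dist[0][j] = j  # every hypothesis word inserted
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
            )

    # Walk back through the table to label each edit.
    counts = {"substitutions": 0, "deletions": 0, "insertions": 0}
    i, j = len(ref), len(hyp)
    while i > 0 or j > 0:
        cost = 1 if i > 0 and j > 0 and ref[i - 1] != hyp[j - 1] else 0
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + cost:
            counts["substitutions"] += cost
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            counts["deletions"] += 1
            i -= 1
        else:
            counts["insertions"] += 1
            j -= 1

    total = sum(counts.values())
    counts["wer"] = total / len(ref) if ref else 0.0
    return counts


result = count_word_errors("I designed that red dress",
                           "I declined that red dress")
print(result)
# {'substitutions': 1, 'deletions': 0, 'insertions': 0, 'wer': 0.2}
```

One wrong word out of five gives a word error rate of 20% here, yet the caption still reads as a perfectly normal sentence. A count like this measures how many words are wrong, not how badly the meaning has changed.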

Throughout a long video or lecture, ASR that generates many errors may create captions that are incredibly challenging to follow. Not only could this lead to direct misunderstandings of the material, but it also causes distractions and requires that readers perform an extra step in digesting the information.

The rate of errors increases with accented speech, poor recording quality and uncommon terminology. Studies also indicate that when a Deaf person speaks in classrooms, business meetings or other venues, the technology is less capable of generating accurate captions. It is, therefore, not difficult to see why organizations like the NDC claim that ASR alone cannot create an equitable experience for those who rely on captions.

Accuracy and Legal Consequences

Caption accuracy is vital not just for the user’s experience but also for protecting the organization from potential lawsuits. The National Association of the Deaf recently sued Harvard and MIT for discrimination based on their online content. The lawsuits pointed not only to a lack of captions but also to woefully inaccurate captions.

For instance, in one video, captions for the phrase “on our campus…” stated, “hot Campen good.” In another example, the captions for “these were either the government or partners of the government…” read “the square governmental Oct ago.” These errors illustrate how inaccurate captions can be useless to those who need them, while the lawsuits highlight the legal importance of offering accurate captions.

How Verbit Addresses Accuracy Concerns

Verbit’s captioning and transcription services tackle ASR’s accuracy challenges in two ways. First, Verbit uses mature AI that continues to learn and improve to accommodate heavy accents, new words and phrases. The user can also preprogram complex terms so that the captioning software is better at recognizing them. This feature makes Verbit incredibly helpful for professionals, academics and others who often use complex or industry-specific speech in lectures, meetings and events.

Second, even with the best closed captioning software, a human transcriptionist needs to correct ASR errors to ensure clean, readable transcripts or captions for the user. Verbit’s solution adds this human layer to the transcription and captioning process.

Verbit’s solution uses AI technology to support human transcriptionists while offering robust privacy standards and up to 99% targeted accuracy. For more information about Verbit’s captioning, transcription, audio description and subtitling services, contact us.