Why Verbatim Transcription is Required in the Legal Industry


When selecting a transcription solution, it’s critical to pick a service based on the type of transcription required. Every business or organization has different needs, and those needs dictate which kind of transcription will serve it best.

There are two main kinds of transcription services: verbatim and non-verbatim. Each serves a distinct purpose and is best suited to specific industries. The key difference is that non-verbatim aims to capture what is said, while verbatim focuses on capturing exactly how it is said.

What is Verbatim Transcription?

Verbatim transcription converts every element of an audio recording into text, to produce a completely faithful record of the audio file. This includes all spoken words as well as filler speech, such as “um” or “uh”. Fillers are typically spelled out in full, while ambient sounds, like doors opening and closing or background voices, are usually noted in parentheses.

In addition, verbatim transcription rules require including the following elements in the final product:

  • Stutters
  • Repeated words
  • Speaker characteristics such as the repetitive use of “like,” “actually,” “sort of,” “kind of”, etc.
  • Interjections made by an interviewer or other speakers, such as “yeah” and “mm-hmm”
  • Non-speech sounds, including coughing, throat clearing, laughter, etc.
  • False starts (i.e. a sentence the speaker begins but never finishes) and redirects
  • Run-on sentences
  • Pauses

A transcriptionist tasked with producing a verbatim transcript is required to note every detail of audio he or she hears in the recording, including tone or vocal inflection, which is typically conveyed using various punctuation elements. For example, ellipses may be used to indicate a pause or a false start. In addition, different speakers must be differentiated in verbatim transcription. In scenarios that involve more than one speaker, a verbatim transcript will also indicate segments where the voices overlap. Since verbatim transcription must remain absolutely faithful to the source audio file, any grammatical errors must also be accurately recorded without being corrected. Run-on sentences cannot be made more concise, and unfinished thoughts and sentences cannot be completed. This approach often gives the transcript a realistic feel, almost akin to a movie script.

This type of transcription is necessary when documenting an interview or testimony for legal purposes because the intent behind the spoken words is often implied through verbal elements, such as hesitations, awkward pauses or repeated words or phrases, as well as through non-verbal cues.

What is Non-Verbatim Transcription?

Sometimes referred to as clean read, non-verbatim transcription captures the fundamental meaning of the spoken words in an audio recording but does not record them exactly as they are spoken. Unlike in verbatim transcription, grammatical errors are corrected, and sounds or words that do not contribute to the main idea are removed. Typically, the transcriptionist will remove fillers and repetitions that occur naturally in speech. In certain cases, the transcriptionist will even paraphrase a statement so that it conveys the same idea in a clearer and more succinct way.
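To make the difference concrete, here is a minimal Python sketch of the kind of clean-up a clean read involves. It is only an illustration under stated assumptions: the `FILLERS` list and the `clean_read` function are hypothetical, and a human transcriptionist applies far more judgment than this (correcting grammar, paraphrasing, and so on).

```python
import re

# Hypothetical filler list; a real style guide would define its own.
FILLERS = {"um", "umm", "uh", "er", "mm-hmm"}
PUNCTUATION = ".,!?;:"


def clean_read(verbatim: str) -> str:
    """Rough non-verbatim pass: drop sound notes, fillers, and immediate repeats."""
    # Remove bracketed non-speech annotations such as [laughs].
    text = re.sub(r"\[[^\]]*\]", " ", verbatim)
    kept = []
    for token in text.split():
        bare = token.strip(PUNCTUATION).lower()
        if bare in FILLERS:
            continue  # filler speech is dropped in a clean read
        if kept and kept[-1].strip(PUNCTUATION).lower() == bare:
            continue  # skip a word that immediately repeats the previous one
        kept.append(token)
    return " ".join(kept)


print(clean_read("I um I was [laughs] just having a uh bad day."))
# prints: I was just having a bad day.
```

Everything this function throws away (the filler, the repeated word, the laugh) is exactly what a verbatim transcript is obliged to keep.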

Non-verbatim transcription is common in the business sector and in higher education, where a clean transcript free of unnecessary elements is clearer and easier for readers to digest.


Why is Verbatim Transcription Necessary in the Legal Context?

Many, if not all, legal proceedings are predicated on totally accurate documentation. As such, verbatim legal transcription is typically required for the following scenarios:

  • Depositions
  • Hearings
  • Legal briefs
  • Court proceedings
  • Witness interviews and statements
  • Police interrogations
  • Arbitrations
  • Wiretaps and phone calls

More and more police departments and state courts are mandating digital recording, leading to an increased need for verbatim transcription in the legal field. Indeed, a number of states, including Maine, Minnesota, Illinois and Alaska, as well as the District of Columbia, have statewide policies requiring electronic recording of custodial interrogations in various types of cases. In addition, over 450 police departments across the United States have independently adopted the policy – and for good reason. An electronic recording provides an objective and irrefutable record, representing tangible evidence to help convict the guilty and exonerate the innocent.

Let’s consider an example. A verbatim transcript reads: “Umm, yes, I think I did – I believe so – you will have to ask my boss.” The clean read of the same recording would appear as follows: “You will have to ask my boss.” The non-verbatim transcription fails to capture important speech details, as it omits words that could imply an admission of guilt or knowledge that the person has done something wrong. In this case, “I think I did” and “I believe so” are vital to preserving the true meaning of what was said and intended by the speaker.

A courtroom is a place where emotions tend to run high. Angry outbursts, laughter, pauses, and stumbles must all be recorded in order to truly capture the essence of the proceedings. It’s often said that the majority of communication is non-verbal. From a lawyer’s perspective, non-verbal cues can be just as essential as words when it comes to formulating a cogent argument. Therefore, verbatim transcription is a must.

Let’s consider another example: “Yeah, I mean, I know what you’re saying is like right, but [laughs] I was just having a really bad day and I umm didn’t realize that I was doing anything wrong.” A clean read would be transcribed as: “I was having a bad day and didn’t realize that I was doing anything wrong.” The non-verbatim version fails to capture that the person in question may not have been taking things seriously, or may have been anxious about the situation, hence the nervous laugh. Either way, verbatim transcription enables a deeper and more nuanced examination of the events, leading to more accurate conclusions.


Legal transcription is a crucial component of most legal processes and activities. Whether it is a negotiation, arbitration, interrogation, or court case, a clear record is necessary to provide effective legal counsel, or to achieve a just outcome. The ability to consult the exact words that were spoken can mean the difference between a favorable and an unfavorable verdict. Beyond the words themselves, transcriptions that have been cleaned up or are not fully verbatim can fundamentally alter the context of speech and pervert the course of justice. A typo in a legal transcript might be the differentiating factor that separates jail from freedom.

From Audrey to Siri: The Evolution of Speech Recognition Technology

A Brief Speech Recognition History  

From its earliest days in Bell Laboratories to the ubiquitous digital assistants of today, speech recognition has come a long way. Back in 1952, Bell Labs designed its “Audrey” system, which was capable of recognizing digits spoken by a single voice – a remarkable feat for the time. A decade later, IBM unveiled its “Shoebox” machine at the 1962 Seattle World’s Fair, capable of recognizing 16 spoken English words, including the digits 0 through 9.


Speech Recognition as a Part of Everyday Life

Fast forward a few decades and these early, somewhat primitive systems have given way to the highly advanced, AI-enabled technologies that are virtually everywhere today. Dial a customer service call center and the odds of immediately reaching a human representative are slim to none. The more likely scenario is that you will be asked a few automated questions prior to finally being sent to a representative. Speech recognition technology has also helped make driving safer for motorists and pedestrians alike by enabling hands-free dialing and voice-activated navigation systems.  

One of the biggest impacts of speech recognition is in making technology available and accessible to individuals with vision, mobility or other impairments. Voice-activated functionality is among the simplest and most intuitive ways to interact with technology, as it offers a familiar interface that doesn’t place undue demands on the user.

Apple’s Siri is largely responsible for bringing AI-driven voice recognition to the masses. The culmination of decades of research, the world’s most recognized digital assistant has injected some humanity and, often, a bit of wit and humor into the oft-dry world of speech recognition.


Expanding to New Sectors with AI

Although speech recognition may have found its footing and exploded in popularity as a result of personal uses, years of training and mountains of data have enabled the technology to reach a point where accuracy levels are high enough to be applied in an enterprise setting.   

Indeed, the market is experiencing a boom, propelled by the advancement of sophisticated AI technologies. The $55-billion speech recognition industry has been forecast to grow at 11% from 2016 to 2024, providing massive opportunities in a variety of industries for small startups and tech giants alike to grab market share.

Speech recognition technology works by combining two kinds of algorithmic models: acoustic and language. Acoustic modeling captures the relationship between audio signals and the linguistic units of speech, while language modeling matches those units to likely word sequences in order to differentiate between similar-sounding words.
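As a rough illustration of how the two models work together, the Python sketch below scores two similar-sounding candidate transcripts. The probabilities in `ACOUSTIC` and `LANGUAGE` are invented for the example, and `best_transcript` is a hypothetical helper, not part of any real speech recognition library. Production systems score vast numbers of candidates with learned models rather than two hand-written tables.

```python
import math

# Hypothetical acoustic scores: how well each candidate matches the audio.
# The values are invented for illustration.
ACOUSTIC = {
    "recognize speech": 0.40,
    "wreck a nice beach": 0.45,
}

# Hypothetical language model scores: how plausible each word sequence is.
LANGUAGE = {
    "recognize speech": 0.01,
    "wreck a nice beach": 0.0001,
}


def best_transcript(candidates):
    """Pick the candidate with the highest combined log score."""
    def score(candidate):
        # Adding log probabilities is equivalent to multiplying probabilities.
        return math.log(ACOUSTIC[candidate]) + math.log(LANGUAGE[candidate])
    return max(candidates, key=score)


print(best_transcript(["recognize speech", "wreck a nice beach"]))
# prints: recognize speech
```

In this toy setup the acoustic scores alone slightly favor “wreck a nice beach”; it is the language model that tips the decision toward the more plausible phrase.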


Here’s a brief overview of some of the intricate AI technologies that are involved:


Machine learning (ML): A subset of AI, ML refers to systems that learn from experience and get “smarter” over time, without human intervention. ML is a method of training an algorithm so that it can learn how to perform a specific task. Training involves feeding the algorithm large amounts of data and allowing it to adjust and improve.

Deep learning (DL): A subset of ML, DL also refers to systems that learn from experience. The difference is that DL relies on neural networks with many layers, which typically require much larger data sets to train.

Natural language processing: Part of AI that refers to systems that can process spoken or written language to understand meaning.

Neural networks: This is a technique or approach to ML that refers to biologically inspired networks of artificial neurons that mimic the structure and capabilities of the animal brain. It is a framework or model designed to help machines learn.


Context is Key

A key innovation that has spurred the evolution of speech recognition technology is the introduction of context-focused algorithms. It can often be hard to differentiate between two similar-sounding phrases without any background information. However, if the speech-to-text engine is fed with data about the subject matter, it can accurately convert the spoken word to text with minimal errors. After all, a conversation about “euthanasia” is likely to be very different from one about “youth in Asia”.  
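The idea can be sketched in a few lines of Python. This is a deliberately simplified illustration: the `TOPIC_TERMS` vocabularies and the `pick_with_context` function are hypothetical, and real engines rely on statistical language models adapted to the subject matter rather than simple word overlap.

```python
# Hypothetical topic vocabularies, invented for illustration.
TOPIC_TERMS = {
    "medical ethics": {"euthanasia", "patient", "consent"},
    "regional demographics": {"youth", "asia", "population"},
}


def pick_with_context(candidates, topic):
    """Choose the candidate that shares the most words with the topic vocabulary."""
    vocabulary = TOPIC_TERMS[topic]

    def overlap(phrase):
        return len(set(phrase.lower().split()) & vocabulary)

    return max(candidates, key=overlap)


candidates = ["euthanasia", "youth in Asia"]
print(pick_with_context(candidates, "medical ethics"))
# prints: euthanasia
print(pick_with_context(candidates, "regional demographics"))
# prints: youth in Asia
```

The same audio yields a different transcript depending on the subject matter supplied, which is the essence of context-focused recognition.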


Future Speech Recognition Improvement

Although speech recognition technology has come a long way since the days of Audrey and Shoebox, there is still a long way to go. One area that offers significant room for improvement is in accent recognition. A recent study commissioned by the Washington Post found that Google and Amazon’s smart speakers were 30% less likely to understand non-American accents than those of native-born speakers. The issue persisted among certain American accents as well, as more rural or southern accents were less likely to be understood than northern, western or midwestern accents.

Speech analytics is another field that holds a lot of promise for future advancement. Turning unstructured data such as audio and video files into structured, text-based information will enable greater pattern recognition and the extraction of practical insights. This capability has the potential to revolutionize call quality monitoring and sentiment analysis, and to improve the speed and accuracy of workflows in a variety of industries, including speech recognition software for the legal and academic sectors.


From AI-powered digital assistants to personalized product recommendations when shopping online, artificial intelligence is all around us, with no signs of slowing down. And while it’s clear that significant progress has been made, it’s also evident that we’ve only begun to scratch the surface of what AI can do when applied to speech recognition and speech to text technologies.   

