Product Explainer

Verbit’s data policy

Introduction

As a global leader in captioning and transcription solutions, Verbit’s mission is to make verbal and visual information accessible, insightful and useful, serving as a trusted partner to over 3,000 businesses and institutions.

Since its inception, Verbit has been a pioneer in combining the power of artificial intelligence (AI) with domain-specific human intelligence to provide highly accurate captions and transcripts. Verbit’s Captivate™ platform combines Verbit’s powerful proprietary automatic speech recognition (ASR) engine with a unique technology layer that enables dynamic adaptation to the terms and language specific to the customer’s domain or use case.

Apart from its proprietary ASR engine underlying the Captivate platform, Verbit also offers a generative AI (Gen AI) powered Gen.V™ solution suite and integrates Large Language Models (LLMs) to enhance our transcription and analysis capabilities. Gen.V builds on our strength in providing industry-leading accurate captions and transcripts, while also extracting and presenting customers with actionable insights based on analysis of their textual information.

Verbit’s team is keenly aware that the recent proliferation of Gen AI-based solutions raises questions and concerns, as is often the case with new technology. These include concerns around accuracy, leakage of intellectual property, confidentiality, privacy and “hallucinations.” While it is important for any provider offering Gen AI solutions to address these concerns, it is especially critical for providers like Verbit, whose offerings comprise significant generative and non-generative AI technology.

This document aims to provide clarity and transparency around our data practices and solutions and, where applicable, highlight differences between these two flavors of AI in the context of Verbit’s solutions.

What is Automatic Speech Recognition (ASR)?

Since its founding, Verbit has incorporated ASR technology into its core offerings, combining it with the power of human intelligence to provide captions and transcripts of the highest quality.

ASR technology is an example of non-generative AI.

More specifically – and unlike Gen AI-based technology, which we’ll explain further below – ASR engines output predictions based on statistical properties derived from the input training data.

Another example of non-generative AI is image categorization. An image categorization engine could, for example, be trained to distinguish between images of birds and airplanes. Such AI engines can never produce output containing data that is identifiable as the material used to train them. They merely learn to distinguish between the types of objects (birds and airplanes in this context) from many different examples of each. They do this by calculating and aggregating properties whose values differ between birds and airplanes. When presented with a new image, these engines provide a statistical prediction, categorizing the image as a bird or an airplane. However, such non-generative AI engines would be unable to generate an example of a bird or an airplane, and they certainly could not output an image from the training set.
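
As a rough illustration of this idea (a toy sketch, not Verbit’s implementation), the Python snippet below trains a nearest-centroid classifier. The two-number features, the function names and the sample values are all hypothetical. After training, the model holds only one averaged feature vector per category, and its only possible output is a category label.

```python
from statistics import fmean


def train_centroids(examples):
    """Aggregate labeled feature vectors into one mean vector per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    # Only the per-label averages are kept; the individual examples are not stored.
    return {
        label: [fmean(column) for column in zip(*vectors)]
        for label, vectors in by_label.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest to the new feature vector."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], features))


# Hypothetical features: (wingspan in metres, surface reflectivity).
training_set = [
    ((0.3, 0.1), "bird"),
    ((0.5, 0.2), "bird"),
    ((30.0, 0.8), "airplane"),
    ((40.0, 0.9), "airplane"),
]
model = train_centroids(training_set)
prediction = classify(model, (0.4, 0.15))  # a label, never an image
```

Real image classifiers learn far richer properties than two hand-picked numbers, but the shape of the output is the same: a prediction about the input, not a reproduction of training material.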

ASR technology is similar. The output of an ASR engine comprises the most likely sequence of words that corresponds to the input audio signal. This is a prediction based on statistical properties derived from huge amounts of input training data.
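
To make "the most likely sequence" concrete, here is a minimal sketch of greedy decoding over per-frame character probabilities, in the style used by CTC-based recognizers. This is an assumption for illustration only: the source does not state which model architecture or decoding method Verbit uses, and the alphabet, the blank symbol and the probabilities below are invented. Greedy decoding is also only a simple approximation of the most likely sequence.

```python
BLANK = "_"  # hypothetical "no character" symbol used by CTC-style models


def greedy_decode(frame_probs, alphabet):
    """Pick the most probable symbol per frame, merge repeats, drop blanks."""
    best_path = [
        alphabet[max(range(len(probs)), key=probs.__getitem__)]
        for probs in frame_probs
    ]
    decoded = []
    previous = None
    for symbol in best_path:
        # A symbol is emitted only when it differs from the previous frame
        # and is not the blank symbol.
        if symbol != previous and symbol != BLANK:
            decoded.append(symbol)
        previous = symbol
    return "".join(decoded)


alphabet = [BLANK, "h", "i"]
frame_probs = [
    [0.10, 0.80, 0.10],  # most likely "h"
    [0.20, 0.70, 0.10],  # "h" again: the same sound spans two frames
    [0.90, 0.05, 0.05],  # most likely blank
    [0.10, 0.10, 0.80],  # most likely "i"
]
text = greedy_decode(frame_probs, alphabet)
```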

In addition to ASR, Verbit also utilizes predictive AI models that enrich transcription output with auxiliary information. These models include speaker change detection, acoustic event identification (such as laughter, music or applause) and spoken language detection. Like ASR, these models are non-generative in nature. They operate by analyzing patterns and statistical features in the input audio, such as changes in pitch, energy or spectral composition, to make predictions about specific attributes of the audio. Importantly, these models do not reproduce or generate content based on training examples and, therefore, they do not present any risk of intellectual property leakage or the inadvertent exposure of sensitive customer information. Their sole purpose is to augment the accuracy and usability of ASR output, ensuring better structure and richer metadata without compromising privacy.

How Verbit trains its ASR engine

Verbit trains its proprietary ASR engine and ASR enrichment models with input consisting of short audio segments (each a few seconds in length) together with their corresponding transcriptions. These inputs are used to learn the statistical correlation between the acoustic speech signal and the most likely sequence of words or letters it represents.

Verbit’s training data for its ASR engine consists of both publicly available data sets and media submitted by customers. Training is done with many thousands of hours of audio. The data first undergoes statistical feature extraction, covering features such as acoustic and spectral properties of short segments of the speech signal, phonetic information used to distinguish between different sounds and the duration of pauses. These features are then aggregated so the model can learn the probabilities of the different letter or word outputs that correspond to the acoustic input.
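
The sketch below shows what statistical feature extraction over short segments can look like in its simplest form. It computes two classic per-frame features, average energy and zero-crossing rate, on a synthetic signal. The function name, frame size and signal are hypothetical, and production ASR systems typically use richer spectral features than these two.

```python
import math


def frame_features(samples, frame_size):
    """Split a signal into non-overlapping frames and compute two features per frame."""
    features = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        # Average energy: high for speech or tones, near zero for silence.
        energy = sum(s * s for s in frame) / frame_size
        # Zero-crossing rate: fraction of adjacent sample pairs that change sign.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        features.append({"energy": energy, "zcr": crossings / (frame_size - 1)})
    return features


# Synthetic signal: 100 samples of silence followed by 100 samples of a 440 Hz tone.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(100)]
signal = [0.0] * 100 + tone
feats = frame_features(signal, frame_size=100)
```

The silent frame yields zero energy and no zero crossings, while the tone frame yields clearly non-zero values for both. The model learns from numbers like these, summarized across many frames.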

Using customer data to train ASR models offers many advantages in enhancing system accuracy and ensuring the system is optimized for specific customer needs. The diverse and expansive dataset derived from real interactions allows the ASR model to better adapt to the nuances of natural speech, capturing various accents, acoustic conditions and contextual intricacies. This leads to improved accuracy in transcribing spoken language, making the system more reliable. Additionally, leveraging customer data enables the model to evolve dynamically, staying abreast of emerging linguistic trends and of vocabulary and terminology that are often unique.

Customers benefit from the strong customization and adaptation techniques Verbit employs, together with the fine segmentation Verbit performs over linguistic content and acoustics. Verbit invests significant effort in analyzing recognition accuracy over different customer verticals and tiers. This ensures that we use the model, vocabulary and configuration best suited for each customer and session. This can only be achieved by analyzing customers’ data, spotting possible gaps, and retraining and adapting the models to address each specific customer need. Such needs include specialized terminology, unique acoustic conditions, specific accents that are frequent in the customer’s data or detailed formatting requirements.

Using real customer data enables Verbit to provide a more seamless and tailored transcription service that best fits our customers’ needs and use cases. In essence, the utilization of customer data not only refines the technical aspects of ASR but also fosters a more reliable and customer-specific transcription output.

Verbit’s Gen.V solutions

When we refer to “generative AI,” we mean AI models that can output original, creative content (for example text, images, music or computer code) in response to input prompts.

We differentiate between two sources of data which Gen AI (including Verbit’s Gen.V solutions) works with:
(1) the data that the model was trained on (Training Data)
(2) the data that the model uses when asked to perform a specific task – for example the transcript of a lecture the model is asked to summarize (Input Data)

Gen AI output may reflect (but generally does not duplicate) some of the data it consumes from both sources.

For example, when prompting for a summary of a lecture, the Input Data is the full transcript of the lecture. The Gen AI model provides a short text with the main ideas expressed in the lecture transcript, using wording that appeared in the Input Data as well as rephrasing some of the content of the lecture.

The rephrasing may resemble summaries and texts that were part of the Training Data set used to train the model. This means that it is possible the Gen AI model may output excerpts of the Training Data, excerpts of the Input Data and/or (for the most part) original outputs not readily recognizable as either Training Data or Input Data. Because of the theoretical possibility that Training Data may later appear – potentially unpredictably – in outputs from the trained Gen AI model, there is a legitimate concern that intellectual property, confidential information or personal data found in the Training Data may be exposed later.

For this reason, Verbit’s strict policy is to not use customer data as Training Data for Gen AI models, except where the customer has specifically requested a custom fine-tuned model. Such custom fine-tuned models are made available only to that customer (whose data was used for the fine-tuning) and not to any other customers. Verbit further ensures that any third-party Gen AI services it uses comply with these restrictions.
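
The policy above can be expressed as two simple rules. The sketch below is a hypothetical illustration of those rules, not a description of Verbit’s systems: the class, function names, model names and customer identifiers are all invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GenAIModel:
    name: str
    # None means a general model; otherwise the one customer it was fine-tuned for.
    fine_tuned_for: Optional[str] = None


def may_train_on(model: GenAIModel, customer_id: str, requested_fine_tune: bool) -> bool:
    """Customer data may be Training Data only for that customer's own requested fine-tune."""
    return requested_fine_tune and model.fine_tuned_for == customer_id


def may_use(model: GenAIModel, customer_id: str) -> bool:
    """General models are open to every customer; fine-tuned models only to their owner."""
    return model.fine_tuned_for is None or model.fine_tuned_for == customer_id


general = GenAIModel("general-summarizer")
custom = GenAIModel("acme-summarizer", fine_tuned_for="acme")
```

Under these rules, a general model can never be trained on customer data, even when a customer has requested fine-tuning, because the request only ever applies to that customer’s own dedicated model.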

Privacy and security measures

For all AI solutions, Verbit implements the following privacy protections:

No cross-referencing: Models cannot access or learn from other customers’ data
Encryption: All data is encrypted both in transit and at rest
Access controls: Strict access controls ensure only authorized personnel can access customer data
Audit logging: All data access and processing is logged for security monitoring

Verbit protects our customers’ data

Whether we are providing non-generative AI, Gen AI or any other solutions to our customers, our priority is the security and protection of their data. An overview of our security program and certifications may be found on our Trust Page, while our privacy practices are described in our Privacy Policy.

Further, considering the fundamental differences between various AI technologies, we have taken a comprehensive approach to how we process customers’ data in the context of each solution type.

With respect to our use of customer data for training our ASR engine, as described above, any information that might identify an individual is stripped out of the trained ASR models. Because ASR technology is non-generative AI, the output of Verbit’s ASR models cannot contain any segments of the data that was used to train the models, so there is no risk that customer data, intellectual property or confidential information will be exposed in the output. We may also use LLMs to enhance the accuracy of ASR engine output. Consistent with Verbit’s policy toward Gen AI, we do not use customer data for training LLMs in this context either.

With respect to our Gen.V solution suite and LLM-based services, we maintain strict policies to ensure customer data privacy:

Customer data is never used as training data for general models
LLM infrastructure is designed with multiple layers of security and logical separation, so each tenant’s data and interactions are isolated from others
No cross-customer data sharing or learning occurs

These measures ensure that no customer’s data is exposed in output delivered to other customers or third parties, maintaining the highest standards of privacy and security across all our services.
