Verbit raises $31 million for human-in-the-loop AI transcription tech

The voice and speech recognition tech market is anticipated to be worth $31.82 billion by 2025, driven by new applications in the banking, health care, and automotive industries. And Tom Livne, who cofounded Verbit.ai with Eric Shellef and Kobi Ben Tzvi in 2017, asserts the Tel Aviv and New York-based startup will contribute substantially to the segment’s rise. Verbit’s adaptive speech recognition tech, which can generate detailed transcriptions with up to 99% targeted accuracy, recently attracted the attention of VCs that include Vertex Ventures and Oryzn Capital, both of which participated in the startup’s $23 million series A round. Now, roughly a year later, Verbit hopes to lay the groundwork for growth with an infusion of fresh capital.

Verbit today announced the close of a $31 million series B round at double its previous valuation. The round was led by New York-based growth equity firm Stripes, with participation from Viola Ventures, Vertex Ventures, HV Ventures, Oryzn Capital, and ClalTech. It brings the company’s total raised to $65 million, and Livne says it will enable Verbit to expand its products’ language support and further develop its speech recognition technologies as it triples the headcount in its New York office this year.

“When I established Verbit [three] years ago, I didn’t anticipate we would become one of the market-leading companies in our industry so quickly,” said Livne, who added that revenue has tripled in the past year. “This latest financing round is an important milestone in Verbit’s journey and strengthens the incredible momentum we had in 2019. The collaboration with Stripes is a great indicator of Verbit’s category-leading product and will allow us to continue innovating in the market.”

Voice transcription and captioning aren’t novel. They’re a decades-old business with well-established players, like Nuance and Google. Enterprise platforms such as Microsoft 365 offer AI-powered speech-to-text, as do Cisco and startups like Otter and Voicera.

But what sets Verbit apart is its reliance on “cutting-edge” advances in machine learning and natural language understanding, according to Livne.

Three models (acoustic, linguistic, and contextual) inform Verbit’s captioning. They filter out background noise and echoes and identify speakers regardless of accent, and they detect domain-specific terms while incorporating current events and updates. Clients first upload audio or video files to a cloud dashboard for processing. Then a team of over 15,000 human freelancers in over 20 countries edits and reviews the material, taking into account customer-supplied notes and guidelines. The finished transcriptions are available for export to services like Blackboard, Vimeo, YouTube, Canvas, and Brightcove.
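For readers who want a concrete picture of that workflow, here is a minimal Python sketch of a human-in-the-loop transcription job moving from upload to export. It is an illustration only, not Verbit's implementation: the `TranscriptionJob` class, the `Stage` names, and the stand-in recognizer, transcriber, and reviewer functions are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Hypothetical stage names mirroring the flow described in the article.
    UPLOADED = "uploaded"
    MACHINE_DRAFT = "machine_draft"
    HUMAN_EDITED = "human_edited"
    REVIEWED = "reviewed"
    EXPORTED = "exported"


@dataclass
class TranscriptionJob:
    media_name: str
    customer_notes: str = ""
    text: str = ""
    stage: Stage = Stage.UPLOADED
    history: list = field(default_factory=list)

    def advance(self, new_stage, text=None):
        # Record every transition so a dashboard could show job progress.
        if text is not None:
            self.text = text
        self.stage = new_stage
        self.history.append(new_stage)


def run_pipeline(job, recognizer, transcriber, reviewer):
    # 1. Automatic speech recognition produces a first draft.
    job.advance(Stage.MACHINE_DRAFT, recognizer(job.media_name))
    # 2. A human transcriber edits the draft, guided by customer notes.
    job.advance(Stage.HUMAN_EDITED, transcriber(job.text, job.customer_notes))
    # 3. A human reviewer proofreads the edited transcript.
    job.advance(Stage.REVIEWED, reviewer(job.text, job.customer_notes))
    # 4. The finished transcript is marked ready for export.
    job.advance(Stage.EXPORTED)
    return job


# Stand-ins for the machine and human steps (assumptions for the demo).
def fake_recognizer(media_name):
    return "helo world"


def fake_transcriber(text, notes):
    return text.replace("helo", "hello")


def fake_reviewer(text, notes):
    return text.capitalize()


job = run_pipeline(
    TranscriptionJob("lecture.mp4", customer_notes="Course intro"),
    fake_recognizer,
    fake_transcriber,
    fake_reviewer,
)
print(job.stage.value, "|", job.text)
```

Keeping the machine and human steps as interchangeable callables is a design choice for the sketch: it makes it easy to see where automated output ends and human correction begins.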

Verbit’s web frontend shows the progress of each job and lets users edit and share files, define access permissions for each, add inline comments, request reviews, and view usage reports. A forthcoming feature, Verbit Express, will allow them to drag files in need of transcription to a folder on a desktop PC, where they’ll be automatically uploaded and processed.

The transcriber side of the equation is a self-serve and on-demand affair. Verbit transcribers can choose the files they’d like to work on — the platform doesn’t assign them manually — and take advantage of built-in dictionary and research tools, keyboard shortcuts, speed control, a highlighter, and spell check. Those who consistently produce exceptional work and achieve high quality scores are offered the chance to become reviewers, responsible for proofreading — and editing, if necessary — transcribers’ work.
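The self-serve model and the reviewer track can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a description of Verbit's system: the `OpenJobPool` and `Transcriber` classes are hypothetical, and the promotion thresholds (`REVIEWER_MIN_JOBS`, `REVIEWER_MIN_SCORE`) are invented for the example, since the company has not published its criteria.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds; the real promotion criteria are not public.
REVIEWER_MIN_JOBS = 3
REVIEWER_MIN_SCORE = 0.95


@dataclass
class Transcriber:
    name: str
    scores: list = field(default_factory=list)

    def eligible_for_reviewer(self):
        # "Consistently" is modeled as: every one of the most recent
        # REVIEWER_MIN_JOBS quality scores meets the minimum score.
        recent = self.scores[-REVIEWER_MIN_JOBS:]
        return len(recent) == REVIEWER_MIN_JOBS and min(recent) >= REVIEWER_MIN_SCORE


class OpenJobPool:
    """Jobs are claimed by transcribers rather than assigned to them."""

    def __init__(self, job_ids):
        self._open = set(job_ids)
        self._claimed = {}

    def claim(self, job_id, transcriber):
        # A job can be claimed only once; later attempts are refused.
        if job_id not in self._open:
            return False
        self._open.remove(job_id)
        self._claimed[job_id] = transcriber.name
        return True

    def open_jobs(self):
        return sorted(self._open)


pool = OpenJobPool(["job-1", "job-2"])
alex = Transcriber("alex")
pool.claim("job-1", alex)
alex.scores.extend([0.96, 0.97, 0.99])
print(pool.open_jobs(), alex.eligible_for_reviewer())
```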

Livne claims the platform can reduce operating costs by up to 50% and deliver results 10 times faster than the competition. In any case, it was enough to woo a healthy client base of over 150 educational institutions and commercial customers (up from 70 as of January 2019), including the NCAA, London Business School, Fashion Institute of Technology, Utah State University, University of Utah, Southern Utah University, University of Vermont, Auburn University, Western Governors University, University of California at Santa Barbara, Oakland University, Stanford, Coursera, Udacity, Panopto, Kaltura, and close to 100 others (up from 50 in May 2018).

Customers have to make a minimum commitment of $10,000, a pricing structure that has apparently paid dividends. Verbit.ai isn’t disclosing exact revenue but says it’s in the “millions” and that the company is cash flow positive.

Verbit plans to explore verticals in the insurance and financial sectors, as well as media and medical use cases. To this end, it recently launched a human-in-the-loop transcription service for media firms with a delay of only a few seconds.

“We are thrilled to partner with Tom and the rest of the Verbit team on their mission to build the leading AI-powered transcription and captioning platform,” said Stripes partner Saagar Kulkarni, who intends to join Verbit’s board of directors. “We are big believers in the power of AI to fundamentally change business models and provide critical services better, faster, and more affordably.”