Hello, everyone.

Thank you so much for joining our first legal webinar,

Will AI Replace Stenographers?

I know many of you are probably wondering about that title or are surprised by it, but we wanted to make a statement and grab your attention.

So it worked, didn't it?

Today, Verbit's Chief Technology Officer, Eric Shellef,

will discuss the current state of AI technology and why

the hybrid model of incorporating it with

human expertise is the ideal approach.

Our goal is for you to learn

how court reporters and stenographers can use

AI to their advantage and spearhead

the usage of technology in the legal world.

For all of you who are wondering, or for those of you who have colleagues who might miss out, we will have a captioned recording of this webinar and will send it out to you, probably at the beginning of next week, so you can share it with whomever you like and keep it for your records as well.

Feel free to add your questions and comments in the questions panel below, and we'll be more than happy to answer them during the Q&A at the end of the presentation.

Thank you, everyone.

I will let Eric introduce himself and take the stage from here.

Thanks, Shir.

Hi, I'm Eric, the CTO and co-founder of Verbit

and responsible for speech technology,

research and data science.

In today's webinar, I'll try to explain a little bit about AI in general, and specifically in our domain of legal court reporting.

So a little bit about myself.

Maybe some of you saw me at the recent AAERT week, where I gave a somewhat longer lecture about AI in legal tech.

Yeah, I think we'll begin.

So what is artificial intelligence?

What is AI?

Basically, it's the broad discipline of creating smart machines: machines that can perform tasks reminiscent of human capabilities, such as recognizing objects, text, sounds, and speech, or solving different types of problems.

Another more technical term for AI is machine learning.

Before we dive in,

let's do a reality check on what AI really can and can't do.

Let's start with some nice stories.

Two decades ago, IBM scientists designed Deep Blue, software that defeated world champion Garry Kasparov in a sequence of chess matches, and it really shattered the way people thought about machine capabilities.

Last year, there was another huge success in the domain, when AlphaGo, designed by the DeepMind group at Google, defeated the world champion.

Go is considered a much more open game than chess, and it was predicted that it would take many more years until a machine beat a human, but it happened.

These kinds of achievements make us think that AI is really superhuman and can perform any task much better than people can.

Now, let's look at the flip side of this coin.

Here, we're looking at computer vision and image recognition tasks.

On the left, we have three objects that, when presented in their usual orientation, the computer has no problem detecting as a school bus, a motor scooter, and a fire truck, all with high confidence.

However, when we take these objects and start twisting them, rotating them in ways the computer isn't accustomed to, it suddenly completely loses the understanding it had: a school bus becomes a punching bag, a motor scooter becomes a parachute, and things just break.

This is all with high confidence.

On the right, we can see an even stranger phenomenon, where something that looks like noise, like static, is again identified with high probability as a gorilla, a jackfruit, and so forth.

Let's look at another example, this one coming from the speech and text understanding world.

This is something from seven years ago.

I'll give you a few seconds to read it.

What we have now is partly for entertainment value, and they've probably fixed it by today, so I wouldn't recommend trying this now; you'll probably have Siri actually call 911.

But a few years ago,

this task, which is simple for a human, was still impossible for a machine to do correctly.

So what do these tasks where AI underperforms have in common?

What separates them from the tasks where it excels?

So let's get to the next slide.

The question is whether these examples are part of the real world, requiring an understanding of how things work out there: how objects look when they're rotated in three dimensions, or in what situations one would want to call an ambulance; or whether they're completely self-contained logical problems like chess or Go.

So this is the difference.

Where we have problems that involve real-world knowledge, humans still have a huge advantage over machines, and even in the best labs in industry and academia, machines are still far behind human capabilities.

What is our solution to that issue?

How can we still leverage machine capability

in a domain which is obviously

dependent on knowledge of the real world?

We leverage what we call the hybrid model, using both machines and people, each in their best role, to carry out these tasks in the most efficient way.

What we attempt to assign to the AI, the computer, is the easier task, which it can perform well, very quickly, and very cheaply, and we leave the more difficult, more creative, and perhaps more ambiguous tasks to people.
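As a minimal sketch of that division of labor, not Verbit's actual code, imagine the speech engine attaching a confidence score to each transcript segment: high-confidence segments pass through automatically, while low-confidence ones are queued for the human layer. The Segment structure and the 0.90 threshold here are illustrative assumptions.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.90  # illustrative; a real system would tune this

@dataclass
class Segment:
    text: str          # machine transcription of one audio segment
    confidence: float  # engine's confidence score, 0.0 to 1.0

def route_segments(segments):
    """Split ASR output into auto-accepted text and human-review work."""
    auto_accepted, needs_review = [], []
    for seg in segments:
        if seg.confidence >= CONFIDENCE_THRESHOLD:
            auto_accepted.append(seg)   # machine handles the easy part
        else:
            needs_review.append(seg)    # sent to the human layer
    return auto_accepted, needs_review
```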

Let's look at an example in the legal world.

We'll start with something that isn't exactly court reporting.

Let's think about document and contract review.

Here, to find a document similar to the one we might be looking at, we'd probably have to parse thousands or hundreds of thousands of documents to find similar candidates. This is obviously not feasible for a human, much too laborious, but a machine can do it very quickly and surface a couple of candidates.

But then we leave it to the humans to decide which one is really similar and which one just looks similar.
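For a sense of what the machine's side of that task can look like, here is a minimal sketch, assuming plain-text documents and using TF-IDF vectors with cosine similarity via scikit-learn; it's one common candidate-retrieval technique, not necessarily what any particular product uses.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def top_candidates(query_doc, corpus, k=3):
    """Return the k documents in corpus most similar to query_doc."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(corpus + [query_doc])
    query_vec = matrix[-1]  # last row is the query document itself
    scores = cosine_similarity(query_vec, matrix[:-1]).ravel()
    best = scores.argsort()[::-1][:k]  # highest-scoring candidates first
    return [(corpus[i], float(scores[i])) for i in best]
```

A human reviewer would then look only at those few candidates, which is exactly the division of labor described above.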

Let's move on to our domain of legal transcription and court reporting.

So let's start with transcription. I actually want to show an example from another vertical that Verbit is actively engaged in: online education.

So here we also perform transcription.

Let's look at an example of an online course and

how it's done by both human and machine.

So what you see here is an online lecture.

I'll start playing it. It's about Dante, I think.

Welcome back to our lectures on Dante's Purgatorio.

Now that you've geared up for the journey,

let's discuss how we proceed in the rest of our lectures.

So what we see in black or gray is the computer's guess, the computer's automatic transcription, and what we see in red are the mistakes that were corrected by humans.

So this is a perfect example of how the computer does most of the work, the easier work. Then we have the human layers, which handle the tougher terminology and context-related corrections. So let's go back.
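As an aside, that gray-versus-red display boils down to a word-level diff between the machine draft and the human-corrected text. Here's a tiny sketch using Python's standard difflib; the example sentences are invented, not the actual lecture data.

```python
import difflib

def mark_corrections(machine_words, corrected_words):
    """Label each final word as machine output kept or a human fix."""
    matcher = difflib.SequenceMatcher(None, machine_words, corrected_words)
    tagged = []
    for op, _i1, _i2, j1, j2 in matcher.get_opcodes():
        label = "gray" if op == "equal" else "red"  # red = human-corrected
        tagged.extend((word, label) for word in corrected_words[j1:j2])
    return tagged

machine = "welcome back to our lectures on dontes purgatory".split()
human = "welcome back to our lectures on Dante's Purgatorio".split()
print(mark_corrections(machine, human))
```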

I did want to dedicate one slide to speech recognition, which is probably the major AI element in the way we handle legal transcripts and court reporting.

So despite all I said about weakness in real-world problems, there have still been tremendous advances in speech recognition in the past 10 years, due to huge amounts of data becoming available and advances in machine learning, deep learning, and neural nets, and we're getting better and better.

Still, there's a gap: a gap of understanding, a gap that depends on the audio. If you have harder audio, it's tougher to get the right transcription; if you have accents the machine wasn't introduced to, it will perform worse.

So at Verbit, we take pride in owning our own ASR stack; ASR is automatic speech recognition. Why do we dedicate the resources to building our own ASR technology when we could just use Google's or some third party's?

This lets us, for one, focus on specific domains and get much higher accuracy there, and make our machine smarter in that domain, because we're telling it, "Look, you have to figure out something that is in education, in finance, in legal," and that helps it become more accurate.

Another point is the learning cycle.

If the machine doesn't know some accent or some term,

once the human layer corrects it,

then it will learn and get it right the next time.

This is something you can do

when you have your own technology.
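Here's a hedged sketch of one piece of such a learning cycle, under the assumption that corrections are collected as (machine_word, corrected_word) pairs from reviewed transcripts; the function name and threshold are hypothetical, not Verbit's internals.

```python
from collections import Counter

def harvest_corrections(correction_pairs, min_count=3):
    """Collect terms the human layer repeatedly had to fix.

    correction_pairs: iterable of (machine_word, corrected_word) tuples
    gathered from reviewed transcripts. Terms fixed at least min_count
    times get promoted into the engine's custom vocabulary, so the next
    retraining round can get them right automatically.
    """
    counts = Counter(fixed for _wrong, fixed in correction_pairs)
    return {term for term, n in counts.items() if n >= min_count}

# Hypothetical example: a name keeps getting corrected, so it is added
# to the vocabulary used when the model is next retrained.
pairs = [("reom", "Rheaume"), ("reom", "Rheaume"), ("reom", "Rheaume")]
print(harvest_corrections(pairs))  # {'Rheaume'}
```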

So I'll go a bit deeper into an example. Let's look at taxonomy: different terminology that can be specific to some document or some lecture, or, in the example I want to show you, an earnings call, a finance call.

So let's compare three speech engines.

So here we have Verbit, Google and Watson,

all transcribing automatically the same call.

Let's start.

Stand by, we're about to begin.

Good day and welcome to the Ooma Fourth Quarter

Fiscal 2017 earnings conference call.

Today's call is being recorded and at this time I'd

like to turn the conference over

to Erin Rheaume, please go ahead.

Okay, let's give it a bit more.

Thank you, this is Erin Rheaume,

Ooma investor relations and I'm pleased to

welcome you to Ooma's conference call process

Fourth Quarter Fiscal 2017

earning result with me on the call [inaudible]

So if we look at the names of the people, the companies,

then we see that Google and IBM's Watson

really have no idea.

It's not that they're not good engines; they have great speech scientists. But they don't have our knowledge advantage: that we're currently transcribing a finance call of the Ooma company.

So this shows how we leverage

this knowledge to improve accuracy.
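To give a flavor of that knowledge advantage, here is an illustrative sketch of vocabulary biasing: before transcribing the call, the engine is handed a list of expected names and terms, and uncertain hypotheses are snapped to close matches from that list. Real ASR engines do this inside the decoder; the fuzzy string match below is just a stand-in, and the vocabulary and cutoff are assumptions.

```python
import difflib

# Hypothetical vocabulary prepared before transcribing this earnings call.
DOMAIN_TERMS = ["Ooma", "Erin Rheaume", "fiscal 2017", "earnings call"]

def bias_to_domain(hypothesis, cutoff=0.5):
    """Snap an uncertain ASR hypothesis to the closest domain term."""
    lowered = {term.lower(): term for term in DOMAIN_TERMS}
    match = difflib.get_close_matches(hypothesis.lower(), list(lowered),
                                      n=1, cutoff=cutoff)
    return lowered[match[0]] if match else hypothesis

print(bias_to_domain("uma"))         # -> 'Ooma'
print(bias_to_domain("aaron rome"))  # -> 'Erin Rheaume'
```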

Go back.

So, just as a quick scan of the Verbit solution: someone who wants a transcription of a legal document would simply upload an audio or video file.

An automatic draft would be generated by

our speech engine and then it would

be passed through two human layers

that perform the corrections and

bring it to the required accuracy.

Then these corrections would be fed back to the engine

so it could be more accurate for the next transcription.
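In pseudocode form, the whole flow just described might be sketched like this; every object and method name is hypothetical rather than an actual Verbit API.

```python
def transcribe_legal_job(audio_file, engine, human_layers):
    """Sketch of the described flow: draft, review passes, feedback."""
    draft = engine.transcribe(audio_file)   # automatic first draft
    for reviewer in human_layers:           # typically two human layers
        draft = reviewer.correct(draft)     # bring to required accuracy
    engine.learn_from(draft)                # corrections feed the engine
    return draft
```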

What value does this give for court reporting firms

that send us audio to transcribe?

Employing AI efficiently lets us do our job much faster,

so we have fast turnaround time.

We have high capacity because

much of the labor is done by machine.

Cost efficiency, for the same reason.

Let's look for a second at the legal part of transcription. We know that when we produce a legal document, we have legal annotations, and we have different formats depending on jurisdictions, states, and court reporting firms.

So how do we use the computer there?

So here we see the same

transcription interface I showed you before.

Now, in the speaker box, we have the annotation we're used to: "answer, question, answer."

So this is actually generated by the machine

by looking at the sentences

and figuring out what the right annotation is.

Is it an attorney's question? Maybe it's colloquy. Then we have a human corrector going over it and possibly fixing any issues.
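As a toy illustration of machine-generated annotation, here's a sketch with a few hand-written rules; a production system would use a trained classifier over far richer features, and these rules and labels are purely illustrative.

```python
import re

def annotate_sentence(text, prev_label=None):
    """Guess a legal annotation label for one transcript sentence."""
    if re.search(r"\?\s*$", text):
        return "Q"                      # ends with a question mark
    if prev_label == "Q":
        return "A"                      # reply right after a question
    if text.upper().startswith(("THE COURT", "MR.", "MS.")):
        return "COLLOQUY"               # exchange between counsel/court
    return "A"

labels = []
for line in ["Where were you on May 4th?", "At home.",
             "THE COURT: Sustained."]:
    labels.append(annotate_sentence(line, labels[-1] if labels else None))
print(labels)  # ['Q', 'A', 'COLLOQUY']
```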

Lastly, I'd like to think about how AI could be used effectively in court reporting itself, not just in transcription.

So we have a live court recording session,

and this could also be

a stenography session that's using a recording,

and we want to go back

to what someone said about stamping.

A stenographer would have to search their notes for this statement; a digital reporter would have to quickly scan their long notes, hoping they wrote something relevant; AI could simply search whatever phrase is entered and find the place in the audio record.

Even if the original transcription was wrong, this technology lets you search the audio, and the relevant part can then be played back.
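Here's a minimal sketch of how that search can work, assuming the ASR output is time-aligned, that is, each recognized word carries start and end timestamps; the data shapes and timings are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str     # recognized word
    start: float  # seconds into the recording
    end: float

def find_phrase(words, phrase):
    """Locate a phrase in a time-aligned transcript; returns (start, end)."""
    target = phrase.lower().split()
    texts = [w.text.lower() for w in words]
    for i in range(len(texts) - len(target) + 1):
        if texts[i:i + len(target)] == target:
            return words[i].start, words[i + len(target) - 1].end
    return None

# The returned timestamps let us jump straight to audio playback.
words = [Word("off", 62.1, 62.3), Word("the", 62.3, 62.4),
         Word("record", 62.4, 62.9)]
print(find_phrase(words, "off the record"))  # (62.1, 62.9)
```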

We could automatically annotate the real-time draft: just as we've shown you on offline legal transcripts, we could do the same in real time.

It's the digital reporter's duty to make an accurate record.

So if someone speaks unclearly, they have to mention it. This is something that can sometimes go unnoticed, but for a machine it's easy to pick up and alert the reporter.

Other examples: are we off the record? Is there some static noise source coming in? Are people talking over each other? All of these can be detected by the machine, and the reporter would be alerted.
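A toy sketch of such monitoring, assuming a diarization step has already produced per-speaker segments and an audio meter reports a noise level; the thresholds and data shapes are assumptions, not a real product interface.

```python
def detect_events(segments, noise_level):
    """Flag situations a reporter should be alerted to."""
    alerts = []
    if noise_level > 0.3:                       # persistent static or hum
        alerts.append("static noise detected")
    # Two different speakers whose segments overlap in time -> crosstalk.
    for a in segments:
        for b in segments:
            if (a["speaker"] != b["speaker"]
                    and a["start"] < b["end"] and b["start"] < a["end"]):
                alerts.append("speakers talking over each other")
                return alerts
    return alerts

segments = [{"speaker": "counsel", "start": 10.0, "end": 14.2},
            {"speaker": "witness", "start": 13.8, "end": 15.0}]
print(detect_events(segments, noise_level=0.1))
```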

So I gave you some examples of how we could potentially use AI in live court recording situations.

Now, this is a quick slide about Verbit in numbers,

and that's the webinar.

I hope you guys all enjoyed this webinar.

We still haven't received questions from you guys,

but we have some prepared frequently asked questions

that we get from a lot of people

that we meet at conferences.

So one of them is,

"You mentioned that AI can learn,

so what is the framework of

those learning cycles and what is the human influence?"

Okay. So I'll give an example about accents. AI needs data to be able to correctly handle similar data in the future.

So if the speech algorithm was trained on data that was missing a particular accent, or, say, even the speech of children, it could have a harder time coping with such speech when given it to transcribe.

But once we have such speech and its transcription corrected by the human layers, we take that and feed it back; that data is aggregated and the machine is retrained on the fly, and then it can handle that accent.

It's similar to a person who is not familiar with some accent: once they hear it a couple of times, they understand what the speaker means, and then they can understand it better in the future.

Okay. It looks like an attendee is raising his hand. I see [inaudible].

Can you just type your question in

the Q&A portion and then I'll read it out loud?

I have one question from Irene Nakamura: "What is this fast turnaround? How many hours, how many days? Can you go more into depth on your turnaround?"

So if we're talking about legal transcription,

theoretically, we could do it very quickly,

less than a day.

This really depends on multiple factors. Usually, we service the five-business-day turnaround required by our customers.

The limiting factor, of course, is our human layers. We have a certain capacity, so if we have a huge volume and a very short turnaround, we'd have to cope with that in some way, but theoretically, we can do it in a matter of hours.

Okay. We got another question.

"Can it transcribe children?"

I think so.

Yeah. We actually do have data which allows it to transcribe children, but any special type of speech that is rare would be more difficult for the machine to transcribe accurately.

That's why we have our human layers to teach it.

Okay. We've got another one.

"How about the technical material being spoken?

What is the accuracy?"

Okay. Good question.

This can actually be an example where AI really shines.

If we know in advance what the deposition or court hearing is about, say, some medical issue or some insurance claim, then we can tell that to the AI, and it will bring in the necessary terminology, have a specific model for the case, and actually achieve much higher accuracy on these terms than a human who isn't particularly familiar with them would have.

Sarah asked, "Can you do real-time?" I think we covered that, but I'm not sure. So the answer is yes: the speech recognition can output in real time, and we're working on workloads where we have human-enhanced real-time speech recognition.

"Do you have a court recording program similar to

what it is now that incorporates the AI

and helps the court reporter write

more accurately?" From Howard.

This is an interesting use case.

We'd have to explore this idea.

Currently, what we have is an offline transcription service, but in brainstorming how AI can help a court reporter, this is an interesting direction to think about.

"How fast can a transcriptionist do a two-hour audio?

One hour equals four hours in

transcribing right now, then you have to proof."

I think that when the audio is pretty good, that's about the right metric: one hour of audio turns into four hours of work.

If you split that up to multiple people,

you'd have it go faster.

Of course you'd have to have someone

make sure that there's consistency in terms.

So it depends on how much you parallelize it.
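As a rough worked example of that arithmetic: the 4:1 work-to-audio ratio is the figure from the question, and the extra consistency pass is an assumed overhead, not a quoted number.

```python
def turnaround_hours(audio_hours, transcribers, work_ratio=4.0,
                     consistency_pass=0.5):
    """Rough wall-clock estimate for a parallelized transcription job.

    work_ratio: hours of work per hour of audio (4:1 per the question).
    consistency_pass: assumed extra hours for one person to unify
    terminology across the split.
    """
    return (audio_hours * work_ratio) / transcribers + consistency_pass

# Two-hour audio: 8.5 hours solo, or ~2.5 hours split across four people.
print(turnaround_hours(2, 1))  # 8.5
print(turnaround_hours(2, 4))  # 2.5
```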

"Do you have setups for wireless microphones?".

So this is something in the future.

We're currently concerned with transcribing audio

that were provided by court recording programs.

We're trying to think about and work together with our partners there to see how we can help with the deposition or the court hearing event itself.

Okay. "Do we buy the program,

or do we use Verbit,

or is there a choice?"

We're getting lots of questions. Sorry, I'm just moving through them fast.

So our customers at this point in legal are court reporting firms and court reporters that submit audio for legal transcription, so it's just a matter of leaving your details on Verbit's site.

I'm so sorry guys,

we're out of time.

But we received lots of questions from you all. You all have my email address; I sent an email to everyone who registered.

If you have any questions that were not answered,

feel free to email me.

We can even set up a call and we

can answer any of the questions you may have.

Again, we hope you now understand how you can use AI to your advantage amid the ever-growing presence of artificial intelligence in today's world.

So tune in next month for the next legal webinar.

The captioned recording of this webinar will be available to you; I will send it out to everyone by email.

So thank you so much for attending.

Thank you, Eric, for presenting.

Thank you.

We'll see you guys next time. Bye.