Thank you so much for joining our first legal webinar,
Will AI Replace Stenographers?
I know many of you are probably
wondering about that title or are surprised by it,
but we wanted to make a statement
and just grab your attention.
So it worked, didn't it?
Today, Verbit's Chief Technology Officer, Eric Shellef,
will discuss the current state of AI technology and why
the hybrid model of incorporating it with
human expertise is the ideal approach.
Our goal is for you to learn
how court reporters and stenographers can use
AI to their advantage and spearhead
the usage of technology in the legal world.
For all of you who are wondering, or for those of you
who have colleagues who may have missed out,
we will have a captioned recording of
this webinar and send it out to you guys as well,
probably at the beginning of next week,
so you can share that with
whomever and have it for your records as well.
Feel free to add your questions and comments in
the questions panel below and we'll
be more than happy to answer during
the Q&A at the end of the presentation.
Thank you, everyone.
I will let Eric introduce
himself and take the stage from here.
Hi, I'm Eric, the CTO and co-founder of Verbit
and responsible for speech technology,
research and data science.
In today's webinar, I'll try to explain
a little bit about AI in general and specifically
in our domain of legal court reporting.
So a little bit about myself.
Maybe some of you saw me
at the recent AAERT event, where I did
a bit of a longer lecture about AI in legal tech.
Yeah, I think we'll begin.
So what is artificial intelligence?
What is AI?
Basically, it's the broad discipline
of creating smart machines,
machines that can perform tasks that are reminiscent of
human capabilities such as recognizing objects,
texts, sounds, speech,
solving different types of problems.
Another more technical term for AI is machine learning.
Before we dive in,
let's do a reality check
on what AI really can and can't do.
Let's start with some nice stories.
Two decades ago, IBM scientists designed Deep Blue,
software that was able to defeat the world champion,
Garry Kasparov, in a sequence of
chess matches, and it really shattered
the way people thought about machine capabilities.
Last year, there was another huge success
in the domain, where AlphaGo, designed
by the DeepMind group at Google,
was able to defeat the world champion.
Go is considered a much more open game than chess,
and it was predicted that it would take many more years
until a machine beat a human but it happened.
These kinds of achievements make us think
that AI is really superhuman
and can perform any task much better than people can.
Now, let's look at the flip side of this coin.
Here, we're looking at computer vision
and image recognition tasks.
On the left, we have
three objects that when
presented in their usual orientation,
we see the computer has
no problem detecting that it's a school bus,
a motor scooter, a fire truck, with high confidence.
However, when we take
these objects and start twisting them,
rotating in ways the computer isn't accustomed to,
it suddenly completely loses the understanding it had
and a school bus then becomes a punching bag,
a motor scooter becomes a parachute,
and things just break.
This is all with high confidence.
On the right, we can see
an even stranger phenomenon, where
something that looks like noise,
like static, is again identified
with high probability as a gorilla,
a jackfruit, and so forth.
Let's look at another example coming
from the world of speech understanding,
of text understanding.
This is something from seven years ago.
I'll give you a few seconds to read it.
What we have now,
this is a bit for entertainment value,
and they've probably fixed it by today,
so I wouldn't recommend trying this today.
You'll probably have Siri call 911.
But a few years ago,
this task, which is simple for a human,
was still impossible for a machine to do correctly.
So what do these tasks
where AI underperforms have in common?
What separates them from the task where it excels?
So let's get to the next slide.
The question is whether these examples are part of
the real world, in that they require
understanding how things work out there,
how objects look when
they're rotated in three dimensions,
in what situations one would want to call an ambulance,
or whether they are completely self-contained
logical problems like chess or Go.
So this is the difference.
Where we have problems that involve real-world knowledge,
this is where humans still have
a huge advantage over machines,
and even in the top labs in academia,
machines are still far behind human capabilities.
What is our solution to that issue?
How can we still leverage machine capability
in a domain which is obviously
dependent on knowledge of the real world?
We leverage what we call the hybrid model
using both machine and people,
each in their own best role
to carry out these tasks in the most efficient way.
What we want is to assign the AI,
the computer, the easier tasks,
which it can perform well, very quickly,
very cheaply, and leave the more difficult,
more creative tasks, with maybe more ambiguity,
to people to work on.
Let's look at an example in the legal world.
We'll start with something that's not exactly court reporting.
Let's think about document and contract review.
Here, to find the document
which is similar to the one that we might be looking at,
we'll probably have to parse
thousands or hundreds of thousands of
documents and try to find similar candidates.
This is obviously not feasible for a human,
much too laborious, but a machine can do it very
quickly and bring back a couple of candidates.
Then we leave it to
the humans to decide which one is
really similar and which one just looks similar.
Let's move on to
our domain of legal transcript and court reporting.
So let's start with the transcription.
So I actually want to show an example
from another vertical that Verbit is actively engaged in,
where we also perform transcription.
Let's look at an example of an online course and
how it's done by both human and machine.
So what you see here is an online lecture.
I'll start playing it. It's about Dante, I think.
Welcome back to our lectures on Dante's Purgatorio.
Now that you've geared up for the journey,
let's discuss how we proceed in the rest of our lectures.
So what we see in black or gray is the computer's guess,
the computer's automatic transcription,
and what we see in red is
mistakes that were corrected by humans.
So this is a perfect example of
how the computer does most of the work, the easier work.
Then we have human layers
which do the tougher terminology and
context-related corrections. So let's go back.
I did want to dedicate one slide to speech recognition,
which is probably the major AI element in
the way we handle legal transcripts and court reporting.
So despite all I said about
weakness in real-world problems,
there has still been tremendous advances
in speech recognition in the past 10 years
due to huge amounts of data becoming available,
advances in machine learning, deep learning, neural nets,
and we're getting better and better.
Still, there's a gap,
a gap of understanding,
a gap that is dependent on audio.
If the audio is much harder,
it's tougher to get the right transcription;
if you have accents that
the machine wasn't introduced to,
it will perform worse.
So at Verbit, we take pride in owning our own ASR stack.
ASR is automatic speech recognition,
and why do we dedicate the resources
to building our own ASR technology
when we could just use Google's or some third party's?
This lets us, one, focus on specific domains
and get much higher accuracy there
and make our machines smarter in that domain
because we're telling it,
"Look, you have to figure out
something that is in education, in finance, in legal."
That helps it become more accurate.
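To make the idea of domain knowledge concrete, here is a toy sketch of one common biasing approach: rescoring competing ASR hypotheses so that in-domain terms get a boost. The function names, scores, and term list are all illustrative assumptions, not Verbit's actual system.

```python
# Toy domain biasing: rescore ASR hypotheses by boosting in-domain terms.

def rescore(hypotheses, domain_terms, boost=0.5):
    """Add a bonus to each hypothesis score for every domain term it contains,
    then return the hypothesis with the highest combined score."""
    rescored = []
    for text, acoustic_score in hypotheses:
        bonus = sum(boost for term in domain_terms if term in text.lower())
        rescored.append((text, acoustic_score + bonus))
    return max(rescored, key=lambda pair: pair[1])

hypotheses = [
    ("welcome to the uma earnings call", 1.0),   # generic engine's best guess
    ("welcome to the Ooma earnings call", 0.8),  # acoustically less likely
]
finance_terms = {"ooma", "earnings", "fiscal"}
best, _ = rescore(hypotheses, finance_terms)
```

Here, knowing we are transcribing an Ooma finance call lets the in-domain spelling win even though the generic engine scored it lower acoustically.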
Another point is the learning cycle.
If the machine doesn't know some accent or some term,
once the human layer corrects it,
then it will learn and get this right the next time.
This is something you can do
when you have your own technology.
So I'll go a bit deeper into an example.
Let's look at taxonomy.
Different terminology that can be specific to
some document or some lecture,
or in this case in the example I want to
show you an earning call, a finance call.
So let's compare three speech engines.
So here we have Verbit, Google and Watson,
all transcribing automatically the same call.
Stand by, we're about to begin.
Good day and welcome to the Ooma Fourth Quarter
Fiscal 2017 earnings conference call.
Today's call is being recorded and at this time I'd
like to turn the conference over
to Erin Rheaume, please go ahead.
Okay let's give it a bit more.
Thank you, this is Erin Rheaume,
Ooma investor relations and I'm pleased to
welcome you to Ooma's conference call to discuss
fourth quarter fiscal 2017
earnings results. With me on the call [inaudible]
So if we look at the names of the people, the companies,
then we see that Google and IBM's Watson
really have no idea.
It's not that they're not good engines
and they have great speech scientists,
but they don't have our knowledge advantage that we're
currently transcribing a finance call of the Ooma company.
So this shows how we leverage
this knowledge to improve accuracy.
So just in a quick scan, the Verbit solution.
Someone who wants transcription of
a legal document would
simply upload an audio or video file.
An automatic draft would be generated by
our speech engine and then it would
be passed through two human layers
that perform the correction and
bring it to the required accuracy.
Then these corrections would be fed back to the engine
so it could be more accurate for the next transcription.
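The pipeline just described can be sketched in a few lines: an automatic draft, two human correction layers, and the corrections logged so the engine can learn. Every function and data shape here is a hypothetical stand-in, not Verbit's API.

```python
# Minimal sketch of the hybrid pipeline: ASR draft -> two human
# correction layers -> corrections fed back for future retraining.

def asr_draft(audio):
    # Stand-in for the speech engine; returns an automatic transcript.
    return audio["auto_transcript"]

def human_pass(transcript, corrections):
    # Apply one human layer's corrections (wrong -> right).
    for wrong, right in corrections.items():
        transcript = transcript.replace(wrong, right)
    return transcript

def transcribe(audio, layer1, layer2, feedback_log):
    draft = asr_draft(audio)
    corrected = human_pass(human_pass(draft, layer1), layer2)
    # Log the corrections so the engine improves next time.
    feedback_log.append({**layer1, **layer2})
    return corrected

feedback = []
audio = {"auto_transcript": "welcome to the uma earnings call"}
final = transcribe(audio, {"uma": "Ooma"}, {}, feedback)
```

The key design point is the last step: the human layers are not just fixing one document, their corrections become training signal for the next one.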
What value does this give for court reporting firms
that send us audio to transcribe?
Employing AI efficiently lets us do our job much faster,
so we have fast turnaround time.
We have high capacity because
much of the labor is done by machine.
Cost efficiency, for the same reason.
Let's look for a second at
the legal part of transcription.
So we know that when we produce a legal document,
we have legal annotation and we have
different formats depending on jurisdictions,
states, and court reporting firms.
So how do we use the computer there?
So here we see the same
transcription interface I showed you before.
Now, in the speaker box,
we have the annotation we're used to:
question, answer, question, answer.
So this is actually generated by the machine
by looking at the sentences
and figuring out what the right annotation is.
Is it an attorney's question?
Maybe it's colloquy and then we have
a human corrector going
over it and possibly fixing any issue.
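As a rough illustration of the annotation step described above, here is a toy rule-based classifier that guesses whether a line is a question, an answer, or colloquy. A production system would use a trained model on much richer features; these rules are only illustrative assumptions.

```python
# Toy rule-based annotation: label a transcript line as an attorney's
# question (Q), a witness's answer (A), or colloquy.

def annotate(line):
    text = line.strip()
    if text.endswith("?") or text.lower().startswith(
        ("did ", "were ", "what ", "where ")
    ):
        return "Q"
    if text.lower().startswith(("objection", "your honor")):
        return "COLLOQUY"
    return "A"

lines = [
    "Where were you on the night of June 4th?",
    "I was at home.",
    "Objection, leading.",
]
labels = [annotate(l) for l in lines]
```

The human corrector then only has to fix the cases where rules like these guess wrong, rather than annotating every line from scratch.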
Lastly, I'd like to think about
how AI could be used effectively in court reporting,
not just in transcription.
So we have a live court recording session,
and this could also be
a stenography session that's using a recording,
and we want to go back
to what someone said about stamping.
A stenographer would have to search
their notes for this statement,
a digital reporter would have to quickly scan
their long notes hoping they wrote something relevant,
but AI could simply search for whatever phrase
is entered and find the place in the audio record.
Even if the original transcription was wrong,
this technology lets you search
the audio and then that can be played back.
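One way such a search can work is over a time-aligned transcript, where each word carries its start time, so a matched phrase immediately gives a playback offset. The data layout below is an illustrative assumption, not a real court recording format.

```python
# Phrase search over a time-aligned transcript: find where a phrase
# was spoken and return the audio offset to seek to.

def find_phrase(words, phrase):
    """Return the start time (seconds) of the first match, or None."""
    tokens = phrase.lower().split()
    texts = [w["text"].lower() for w in words]
    for i in range(len(texts) - len(tokens) + 1):
        if texts[i:i + len(tokens)] == tokens:
            return words[i]["start"]
    return None

transcript = [
    {"text": "we", "start": 12.0},
    {"text": "are", "start": 12.3},
    {"text": "off", "start": 12.5},
    {"text": "the", "start": 12.7},
    {"text": "record", "start": 12.9},
]
offset = find_phrase(transcript, "off the record")  # seek audio here
```

Even when the automatic transcript has errors, an approximate match against it is enough to land the playback near the right moment in the audio.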
We could automatically annotate the real-time draft;
just as we've shown you on offline legal transcripts,
we could do the same in real time.
It's the digital reporter's duty
to make an accurate record.
So if someone speaks unclearly, they have to mention it.
So this is something that can sometimes go unnoticed,
but for a machine it's easy to pick up
and alert the reporter.
Are we off the record?
Is there some static noise source that comes in?
Are people talking over each other?
All these can be detected by machine
and the reporter would be alerted.
So I gave you some examples of how we could use
AI potentially in a live court recording situation.
Now, this is a quick slide about Verbit in numbers,
and that's the webinar.
I hope you guys all enjoyed this webinar.
We still haven't received questions from you guys,
but we have some prepared frequently asked questions
that we get from a lot of people
that we meet at conferences.
So one of them is,
"You mentioned that AI can learn,
so what is the framework of
those learning cycles and what is the human influence?"
Okay. So I'll give an example with accents.
AI needs data to be able to
correctly handle such future data.
So if the speech algorithm was trained on data
that was missing a particular accent or,
say, even speech of children,
it could have a harder time coping with
such speech when given it to transcribe.
But once we have such speech and its transcription
corrected by the human layers,
we take that and feed it back,
and then that data is aggregated
and the machine is retrained on the fly,
and then it can handle that accent.
It's similar to a person who
is not familiar with some accent:
they hear it a couple of times,
they understand what the person means,
and then they can understand it better in the future.
Okay. It looks like an attendee is raising their hand.
I see [inaudible] .
Can you just type your question in
the Q&A portion and then I'll read it out loud?
I have one question from Irene Nakamura,
"What is this fast turnaround?
How many hours, how many days?
Can you go more into depth on your turnaround?"
So if we're talking about legal transcription,
theoretically, we could do it very quickly,
less than a day.
This is really dependent on multiple factors.
Usually, we serve the
five-business-day turnaround
required by our customers.
The limiting factor, of course,
is our human layers.
They have a certain capacity,
so if we have a huge volume and a very short turnaround,
we'd have to cope with that in some way,
but theoretically, we can do it in a matter of hours.
Okay. We got another question.
"Can it transcribe children?"
I think so.
Yeah. We actually do have data
which allows it to transcribe children,
but any special type of speech that is rare,
would be more difficult for the machine
to transcribe accurately.
That's why we have our human layers to teach it.
Okay. We've got another one.
"How about the technical material being spoken?
What is the accuracy?"
Okay. Good question.
This can actually be an example where AI really shines.
If we know what the deposition
or court hearing is about in advance,
say it's about some medical issue
or some insurance claim,
then we can tell that to the AI and it will
bring in the necessary terminology,
have a specific model for the case,
and have actually much higher accuracy for these terms
than a human that's not particularly familiar
with these terms would have.
Sarah asked, "Can you do real-time?"
I think we said that but I'm not sure.
So the answer is yes.
The speech recognition can output in real time,
and we're working on workflows where we have
human-enhanced real-time speech recognition.
"Do you have a court recording program similar to
what it is now that incorporates the AI
and helps the court reporter write
more accurately?" From Howard.
This is an interesting use case.
We'd have to explore this idea.
Currently, what we have
is an offline transcription service,
but in the brainstorm of how AI can help a court reporter,
then this is an interesting direction to think about.
"How fast can a transcriptionist do a two-hour audio?
One hour equals four hours in
transcribing right now, then you have to proof."
I think that when the audio is pretty good,
that's about the right metric:
one hour of audio would turn into four hours of work.
If you split that up to multiple people,
you'd have it go faster.
Of course you'd have to have someone
make sure that there's consistency in terms.
So depending on how much you parallelize it.
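The back-of-the-envelope math above (one hour of audio turning into roughly four hours of work, split among parallel transcribers, plus a consistency pass) can be written out explicitly. The 1:4 ratio comes from the answer; the single-proofer assumption and its 1:1 ratio are my own illustrative guesses.

```python
# Rough turnaround estimate: transcription effort is work_ratio hours
# per audio hour, divided among parallel workers, plus one serial
# proofing pass for terminology consistency.

def turnaround_hours(audio_hours, workers, work_ratio=4, proof_ratio=1):
    transcription = (audio_hours * work_ratio) / workers
    proofing = audio_hours * proof_ratio  # single proofer, assumed 1:1
    return transcription + proofing

# A two-hour audio split among four transcribers:
two_hour_job = turnaround_hours(audio_hours=2, workers=4)
```

This also shows the limit of parallelization: past a point, the serial proofing pass dominates the total turnaround.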
"Do you have setups for wireless microphones?"
So this is something for the future.
We're currently concerned with transcribing audio
that was provided by court recording programs.
We're trying to think about and work
together with our partners there to see
how we help the deposition
or the court hearing event itself.
Okay. "Do we buy the program,
or do we use Verbit,
or is there a choice?"
We're getting lots of questions.
Sorry, just moving up fast.
So our customers at this point
in legal are court reporting firms and
court reporters that submit
audio for legal transcription,
so it's just a matter of leaving your details on Verbit's site.
I'm so sorry guys,
we're out of time.
But we receive lots of questions from you all.
You all have my email address.
I emailed everyone who registered.
If you have any questions that were not answered,
feel free to email me.
We can even set up a call and we
can answer any of the questions you may have.
Again, we hope you understand how you can use AI to
your advantage in the ever-growing presence
of artificial intelligence in today's world.
So tune in next month for the next legal webinar.
The captioned recording of this webinar
will be available to you.
I will send it all out to you guys on email.
So thank you so much for attending.
Thank you Eric for presenting.
We'll see you guys next time. Bye.