In this episode, Frank and Andy explore voice assistants and the behind-the-scenes technology that makes them tick.
00:00:02 British Voiceover AI Lady
Hello and welcome to Data Driven, the podcast where we explore the emerging fields of data science, machine learning and artificial intelligence. I will not be the only AI-generated voice today, as Frank and Andy interview my cousins Alexa, Cortana, Siri and the Google Assistant.
00:00:18 British Voiceover AI Lady
Now that I think of it, the Google Assistant needs a proper name.
00:00:22 British Voiceover AI Lady
Doesn't it?
00:00:23 British Voiceover AI Lady
Without further ado, here are your hosts, Frank La Vigne and Andy Leonard.
00:00:29 Frank
So we're both together and we're going to be talking about voice assistants and kind of how they work and.
00:00:38 Frank
Uh, we have some special guests with us today.
00:00:42 Frank
Welcome once again. If you're just joining us live, it's Andy Leonard and I. We are here and we are live streaming Data Driven, the podcast where we explore the emerging fields of data science, machine learning and artificial intelligence.
00:00:55 Frank
How are you doing, Andy?
00:00:56 Andy
I'm doing pretty good, Frank. How are you?
00:00:59 Frank
I'm doing well. I know you have a hard stop, so I won't yammer too long. We have
00:01:04 Frank
three special guests with us today.
00:01:06 Frank
Andy, three. Three.
00:01:09 Frank
That's a record. It is a record.
00:01:14 Frank
These guests are.
00:01:19 Andy
Alexa. Hello, Alexa.
00:01:22 Frank
She's going to say hello back, I'm sure.
00:01:26 Andy
Yeah.
00:01:28 Frank
Cortana.
00:01:30 Andy
Hello, Cortana.
00:01:33 Frank
And.
00:01:36 Frank
On my phone, I have Google Assistant.
00:01:38 Andy
Hello, Google Assistant. Hey Google.
00:01:41 Frank
That didn't work. It now correctly phones on. Let me tell you, whenever there's a training video or, like, a keynote where they talk about the integration between them, it's pandemonium in my home office, because I usually have all three, and it's just utter pandemonium.
00:01:59 Frank
So I want to switch to... so we're recording this live. If you're watching live, thank you. If you're watching later, thank you. We always try to respond to the comments. I think we're pretty good about that. And if you're listening to this on the podcast, I will try to describe everything I'm showing. So let me switch
00:02:18 Frank
here.
00:02:19 Frank
And I'll see if I can put us in the little bottom here. How do
00:02:22 Frank
I do that.
00:02:24 Frank
There we go.
00:02:27 Frank
Oh well anyway.
00:02:30 Andy
So there we.
00:02:31 Andy
are. We're there. This is close.
00:02:34 Frank
Almost there. This is a quote
00:02:38 Frank
from Charles V, who, if you're not up on your history, was kind of a big deal. I think he was a Habsburg. I don't remember, shame on me, but he has this quote where he says, "I speak Spanish to God, Italian to women, French to men, and German to my horse."
00:02:57 Frank
Now you're probably wondering, what the heck does this have to do with anything? Well, here's what it has to do with.
00:03:03 Frank
Oh no, PowerPoint is going to crash.
00:03:07 Frank
No. You can tell
00:03:08 Frank
we're live. Here we go. This is what I want to say.
00:03:11 Frank
This is my modern take on this: I speak to Alexa when I'm home, to Cortana when I'm at work, and to Google Assistant when I'm in my car or have my phone with me.
00:03:24 Frank
You can also replace that with Siri. I do have an iPad, but it's not with me.
00:03:28 Andy
That's Frank the First, right?
00:03:30 Frank
Frank the First.
00:03:34 Frank
I rather like that. I rather like that.
00:03:38 Frank
So the idea here is, you know, how do these things work?
00:03:41 Frank
You know what?
00:03:44 Frank
And, you know, in terms of the guests and kind of things, I know you have a time constraint, so I just want to kind of demonstrate something. For a work engagement, I had a chance to study up on all three, because it was a competitive situation between LUIS,
00:04:03 Frank
which is ultimately what powers Cortana, kind of behind the scenes, and Lex, which is the Alexa version for processing text, and Dialogflow, which is the Google version. So it's a lot of mouthfuls. Let's see if we can get our guests to introduce themselves. Alexa.
00:04:24 Frank
Hi, how are you?
00:04:27 Speaker 3
I'm feeling like a home run.
00:04:29 Speaker 3
This weekend I'll be watching a lot of my favorite sports.
00:04:35 Frank
OK.
00:04:38 Frank
I'm getting a warning sign on my stream stability here, so I don't know what's going on.
00:04:42 Andy
What's up with that?
00:04:43 Frank
I don't know. Well, you can still hear me, so that's a good sign.
00:04:46 Andy
Yeah, you're good with me, and I'm out here in the boondocks with 25 minutes bro.
00:04:50 Frank
There you go.
00:04:52 Frank
Hey Cortana,
00:04:54 Frank
how are you?
00:04:57 Speaker 4
Great, thanks.
00:04:59 Frank
There you go.
00:05:00 Frank
And let's see what our friend Google Assistant
00:05:03 Frank
has to say.
00:05:10 Frank
Hey Cort... I'm sorry. OK Google, how are you?
00:05:18 Frank
Oops, it's on my Bluetooth, that's why. OK.
00:05:22 Frank
You can tell we're live, folks, 'cause it's just all bloopers.
00:05:27 Frank
How are you?
00:05:32 Frank
So it returned a bunch of search results. OK.
00:05:39 Frank
What's interesting about these three is that they're all trying to solve essentially the same problem, right? They are trying to solve
00:05:46 Frank
the ability to take human language
00:05:49 Frank
and take it in and convert it to... let me see if I can get this screen back up.
00:05:55 Frank
And I will maximize that. There we go. See my fancy setup I do? It's cool, isn't it? Yes.
00:06:04 Frank
Alright, so ultimately they're all trying to solve the same problem. Hey, we have a comment. Wise guy. Yes, I am miserable. OK, alright, so here's the problem that all these devices want to solve, right? This is a human. This is some speaker device thingy.
00:06:21 Frank
Right?
00:06:23 Frank
And.
00:06:25 Frank
You have the cloud.
00:06:27 Frank
Which I think is really what makes this
00:06:29 Frank
possible in a lot of ways, or not just possible but practical. Yeah, yeah.
00:06:34 Frank
I say.
00:06:36 Frank
You know, turn.
00:06:39 Frank
I have to be careful, 'cause I actually do have the lights in my home office
00:06:43
set up to do this.
00:06:46 Frank
Right, right? So this gets digitized into audio,
00:06:51 Frank
right?
00:06:52 Frank
Here, right? I'll draw that with squiggly lines.
00:06:55 Andy
Right, I like the squiggly lines.
00:06:57 Frank
See, I'm talented, I'm very.
00:06:59 Andy
Hard you are. You're an artist.
00:07:01 Frank
Then a cloud service, right? Whether that's LUIS,
00:07:07 Frank
Dialogflow,
00:07:09 Frank
or Lex,
00:07:12 Frank
converts that
00:07:15 Frank
back into text, or into text, right? Right: "turn the
00:07:20 Frank
lights on."
00:07:27 Frank
Then what happens is you have to figure out what that means. What's the context here, right? What's the intent? That's the official word.
00:07:34 Frank
So that's "turn," "lights,"
00:07:38 Frank
and then "on." Now, most people will argue with me that technically this is the intent,
00:07:43 Frank
and this is the destination, or slot.
00:07:48 Frank
Lex calls this a slot, and this is the state that you want, right? So ultimately there's a hundred different ways I can say that, and this is what makes this really kind of an NLP problem, right? "Please turn the lights on," or "Would you kindly turn the lights on," right? BioShock.
00:08:02 Frank
Right there for you.
00:08:05 Frank
Um?
00:08:06 Frank
That sort of thing. And then what happens is that this will then parse that into an action, right?
00:08:12 Frank
Which, if you have smart plugs, will then send a message back through the magic of the Internet and then turn the actual...
00:08:20 Frank
Oh, I like how that's doing that. Turn the actual light on.
00:08:27 Frank
Right, so that's basically solving the same problem.
00:08:30 Frank
Right?
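The intent, slot, and state split described above can be illustrated with a toy parser. This is only a sketch under stated assumptions: real services such as LUIS, Lex, and Dialogflow use trained language models rather than a regular expression, and the names `ParsedCommand`, `parse`, and `apply` are hypothetical, invented here for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ParsedCommand:
    intent: str  # what the user wants done, e.g. "turn"
    slot: str    # the thing to act on, e.g. "lights"
    state: str   # the desired state, e.g. "on"


# Hypothetical pattern: "turn", then the device, with on/off either before
# or after it ("turn on the lights" or "turn the lights on"). Anything said
# before "turn" (such as "please" or "would you kindly") is ignored.
_PATTERN = re.compile(
    r"\bturn\s+(?:(?P<state1>on|off)\s+)?(?:the\s+)?(?P<slot>\w+)"
    r"(?:\s+(?P<state2>on|off))?\b",
    re.IGNORECASE,
)


def parse(utterance: str) -> Optional[ParsedCommand]:
    """Return the parsed command, or None if the phrase is not understood."""
    match = _PATTERN.search(utterance)
    if match is None:
        return None
    state = match.group("state1") or match.group("state2")
    if state is None:
        return None
    return ParsedCommand("turn", match.group("slot").lower(), state.lower())


def apply(command: ParsedCommand, devices: Dict[str, str]) -> None:
    """Stand-in for the action step: record the new state of the device."""
    devices[command.slot] = command.state


devices = {"lights": "off"}
for phrase in ("Please turn the lights on", "Would you kindly turn the lights on"):
    command = parse(phrase)
    if command is not None:
        apply(command, devices)
print(devices)
```

Both phrasings reduce to the same intent, slot, and state, which is the point of the parsing step: many different sentences, one action.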
00:08:32 Frank
And what's interesting about this... I just realized I didn't say it out loud for folks listening on the podcast, but ultimately what happens is my words get translated into an electronic signal, right? A sine wave of sorts.
00:08:45 Frank
And then that is then,
00:08:47 Frank
on the other side, sent from the speaker to the cloud, where it will turn that sound wave back into text, right? Or words. And then it'll go through and it'll parse out
00:09:02 Frank
what I'm saying and try to get an intent from it, or an action to it, and then based on that, some other program that also lives in the cloud,
00:09:11 Frank
mostly, will then take an action based on that. Does that make sense? Did I explain that clearly?
00:09:17 Andy
I think so, yeah. Yeah, I like it. I like the flow.
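The first step in that flow, turning sound into a digital signal, can be sketched as sampling and quantizing a wave. This is a minimal illustration, not how any particular device does it: a pure sine tone stands in for a voice, and the 16 kHz sample rate and 16-bit sample size are assumptions chosen because they are common for speech audio. The function names are hypothetical.

```python
import math
import struct
from typing import List


def sample_sine(freq_hz: float, duration_s: float, sample_rate: int = 16000) -> List[int]:
    """Sample a sine tone and quantize each sample to a 16-bit signed integer."""
    count = round(duration_s * sample_rate)
    amplitude = 32767  # largest positive value a signed 16-bit sample can hold
    return [
        round(amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(count)
    ]


def to_pcm_bytes(samples: List[int]) -> bytes:
    """Pack samples as little-endian 16-bit PCM, the raw form sent over the network."""
    return struct.pack(f"<{len(samples)}h", *samples)


samples = sample_sine(freq_hz=440.0, duration_s=0.01)
payload = to_pcm_bytes(samples)
print(len(samples), "samples,", len(payload), "bytes")
```

Each sample is just a number, so a spoken command becomes a list of integers that a cloud speech service can receive and turn into text.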
00:09:21 Frank
Yeah, and it's
00:09:22 Frank
amazing how simple this is, right? This is not rocket science inside your average, you know, Echo device. Well, this one is the fancy one with the screen, but, you know, the typical kind of Dot or whatever is a microphone and speaker and a Wi-Fi connection. That's essentially all it is, right?
00:09:42 Frank
So ultimately the goal then is that... let me see if I can de-minimize this. So the goal is that I have an example of that, and this is essentially a build-your-own voice kit that I saw at Micro Center for
00:10:01 Frank
like $5 or something like that.
00:10:03 Frank
And inside is a speaker, a button, and a cardboard box.
00:10:09 Frank
And if you attach your Raspberry Pi to this,
00:10:12 Frank
you basically have a Google Home assistant.
00:10:17 Andy
That's nice, yeah.
00:10:19 Frank
Shame on me, because I bought this longer ago than I care to admit and I haven't built it yet.
00:10:26 Frank
But that's just to demonstrate the point, which is that these actual devices are rather simple in terms of, you know, just them being their own thing, right? So what's interesting about this, and this is where the cutting edge comes in, is when I talk.
00:10:41 Frank
We have our human brains or whole.
00:10:44 Frank
Some will debate about.
00:10:45 Frank
Whether or not I have a human brain, but let's let's go with you.
00:10:49 Frank
So.
00:10:51 Frank
The short of it is, is that.
00:10:54 Frank
I have the ability to understand context, right, from my previous statement.
00:10:58 Frank
So I'm going to mute some of these devices, because if they start to hear their name, they'll start going wild. What's interesting is how good Cortana is at this, how good the Google Assistant is at this, and how
00:11:14 Frank
Alexa has some room for improvement, right? So for instance, if you haven't caught on, the shirt I'm wearing says C.R.E.A.M.: Cash Rules Everything Around Me. That's from a Wu-Tang Clan song. So I will ask this simple question of Alexa. Alexa, who is the Wu-Tang Clan?
00:11:36 Speaker 3
According to Wikipedia, Wu-Tang Clan is an American hip hop group formed in Staten Island, New York City, in 1992. Original...
00:11:45 British Voiceover AI Lady
Oops.
00:11:46 Frank
Alexa,
00:11:48 Frank
what was their first album?
00:11:51 Speaker 3
According to Wikipedia, The 1st Album is the debut studio album by German duo Modern Talking. It was released on April 1st, 19...
00:11:59 Frank
80... So you get the idea. You and I know, like, if you asked me who the Wu-Tang Clan were, and then what was their first album, I would tell you, right?
00:12:09 Frank
It does not have the notion of context. This turns out to be a very difficult problem for computers to solve, OK? Because
00:12:19 Frank
there's a lot going on, right? So if I start talking to you, it's like, "Andy, I was at this great restaurant last night," and then we switch to another
00:12:27 Frank
topic.
00:12:28 Frank
Then
00:12:29 Frank
you would then say, "Hey, where was that place?" And if you said "place," I would know what you were talking about, right? Even humans have trouble with this, right? 'Cause I have many conversations with my wife that kind of go in different directions, 'cause I have no idea what she's talking about.
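Carrying context between turns, as in the "their first album" follow-up, can be sketched as remembering the last entity mentioned and substituting it when a later question uses only a pronoun. This is a toy illustration under stated assumptions, not how Alexa, Cortana, or Google Assistant work internally: the `FACTS` table, the `Conversation` class, and the pronoun list are all hypothetical, and the album title is filled in from general knowledge rather than from the conversation.

```python
from typing import Dict, Optional

# Hypothetical mini knowledge base; a real assistant would query a search index.
FACTS: Dict[str, Dict[str, str]] = {
    "wu-tang clan": {
        "who": "An American hip hop group formed in Staten Island in 1992.",
        "first album": "Enter the Wu-Tang (36 Chambers)",
    },
}

PRONOUNS = ("their", "they", "them", "it", "its")


class Conversation:
    """Remembers the most recent entity so follow-up questions can refer to it."""

    def __init__(self) -> None:
        self.last_entity: Optional[str] = None

    def ask(self, question: str) -> str:
        text = question.lower().rstrip("?").strip()
        # First look for an entity named outright in the question.
        entity = next((name for name in FACTS if name in text), None)
        # Otherwise, a pronoun falls back to whatever was discussed last.
        if entity is None and any(word in text.split() for word in PRONOUNS):
            entity = self.last_entity
        if entity is None:
            return "Sorry, I don't know who you mean."
        self.last_entity = entity
        attribute = "first album" if "first album" in text else "who"
        return FACTS[entity][attribute]


chat = Conversation()
print(chat.ask("Who is the Wu-Tang Clan?"))
print(chat.ask("What was their first album?"))
```

Without the remembered entity, the second question has nothing to attach "their" to, which is roughly the failure seen in the Alexa demo above.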
00:12:47 Frank
But I mean, it's hard for humans. It's really hard for machines. So let's try and see if Cortana does this any better. Hey Cortana,
00:12:54 Frank
who is the Wu-Tang Clan?
00:12:58 Speaker 4
According to wikipedia.org, Wu-Tang Clan is an American hip hop group formed in Staten Island, New York City, in 1992. Originally composed of RZA...
00:13:10 Frank
Hey Cortana, what was their first album?
00:13:15 Speaker 4
Should I read a snippet from Wikipedia
00:13:17 Speaker 4
that might be related?
00:13:19 Frank
Yeah, I'm afraid.
00:13:23 Speaker 4
The Wu-Tang Clan is a NYC-based hip hop musical group consisting of 10 American rappers: RZA, GZA, Method Man, Raekwon, Ghostface Killah...
00:13:34 Frank
There's a lot of members of the Wu-Tang Clan, in case you didn't know.
00:13:37 Speaker 4
Cappadonna, and the late Ol' Dirty *******
00:13:42 Frank
Hey Cortana, what was their first album?
00:13:49 Speaker 4
There might be something on Wikipedia.
00:13:51 Speaker 4
Should I read it?
00:13:53 Frank
Yeah.
00:13:56 Speaker 4
The Wu-Tang Clan is a NYC-based...
00:13:58 Frank
All right.
00:13:59 Frank
Well, in the past she did get that right.
00:14:03 Andy
Well, she wasn't completely off base, wasn't completely off. Now, she kept... it seems like some kind of workflow thing put her into
00:14:11 Andy
it. But, well, it did at least identify the context back to your previous question.
00:14:16 Frank
It did, and that's a new... that's a new behavior. I swear, I used to do this demo all the time, and depending on the audience it would be Wu-Tang Clan or, you know, Aerosmith. So let's see what Google has to say. OK Google,
00:14:32 Frank
who is the Wu-Tang Clan?
00:14:39 Frank
Alright, you're not very talkative today.
00:14:45 Frank
What was their first album?
00:14:50 Frank
OK, the demo gods are not kind to me today.
00:14:54 Frank
But in the past this has worked
00:14:56 Frank
on Home assistant and Cortana.
00:15:00 Frank
OK, so.
00:15:05 Frank
So the reason
00:15:05 Frank
why we're doing this today, and I know Andy has a hard stop in a couple of minutes, is because we are hoping to get Data Driven as a flash briefing on Alexa.
00:15:15 Frank
And.
00:15:17 Frank
Alexa.
00:15:20 Frank
So I was trying to do this whole surprise thing, but since the demo failed, I figured I'd break into that.
00:15:27 Frank
Into that, but that's ultimately the goal. But I also think this is an interesting topic, because for a lot of folks this is just this magical black box that is listening, right? And, you know, it's not magical, and it all comes down to math and science, right? And the key is to understand kind of how it's built. And once you understand how it's built, you can build your own systems, and it's actually not that hard.
00:15:47 Frank
There are more moving parts than you would think, but ultimately it just comes down to,
00:15:53 Frank
you know, you're taking that speech, that sound data, converting it into text, then taking that text and converting that into some kind of intent and action, right?
00:16:04 Frank
Yeah, and then on the other side, I'm sorry, go ahead.
00:16:07 Andy
No, Mark Taylor just said it's a do loop, and he's right, he's...
00:16:10 Frank
A do loop. We have Mark joining us again. Thanks for watching, Mark.
00:16:14 Frank
I really should ask this.
00:16:17 Frank
But unfortunately to be is a bit of a bit of A.
00:16:21 Frank
Not a nice word or not a professional word for LinkedIn.
00:16:25
Yeah.
00:16:28 Frank
But they don't have enough memory. I think not at all.
00:16:33
No they don't.
00:16:34 Frank
But it's an interesting thing where, you know, 'cause I'm a nerd, I happen to have all three different types. But, you know, actually four if you count Siri. Let's see if Siri will do any better on the album question.
00:16:49 Frank
So Andy, we actually have four special guests.
00:16:52 Andy
Wow, thank you.
00:16:54 Andy
That's crazy. That's a new record.
00:16:56 Frank
Hey Siri, who is the Wu-Tang Clan?
00:17:03 Speaker 4
Here's some information.
00:17:05 Frank
Alright, so she basically
00:17:07 Frank
pointed to Wikipedia.
00:17:13 Frank
What was their first album?
00:17:25 Frank
So it did the transcription.
00:17:27 Frank
What I said is good.
00:17:30 Speaker 3
I don't recognize this song.
00:17:32 Andy
OK. Oh, OK.
00:17:34 Frank
So I swear this did work before, but I mean, ultimately it's a very hard problem. In fact, one of the things that they showed a couple of years ago at Ignite or Build,
00:17:44 Frank
they showed this concept video of this lady talking to Cortana, and it was on her phone.
00:17:51 Frank
More on that in a minute. It was on her phone, and as she was driving into work, she'd be like, "Oh, remind me to tell this person,
00:18:00 Frank
you know, have a meeting with them."
00:18:03 Frank
And then the logic would then go and schedule the meeting through the Outlook calendar and then tell her, you know, "So-and-so rejected the request, but they are able to meet 30 minutes later. Is that OK?" "Yes. Oh, and invite so-and-so to this meeting as well."
00:18:20 Frank
Right,...