In this episode, Frank and Andy explore voice assistants and the behind-the-scenes technology that makes them tick.
00:00:02 British Voiceover AI Lady
Hello and welcome to Data Driven, the podcast where we explore the emerging fields of data science, machine learning, and artificial intelligence. I will not be the only AI-generated voice today, as Frank and Andy interview my cousins Alexa, Cortana, Siri, and the Google Assistant.
00:00:18 British Voiceover AI Lady
Now that I think of it, the Google assistant needs a proper name.
00:00:23 British Voiceover AI Lady
Without further ado, here are your hosts, Frank La Vigne and Andy Leonard.
So we're both together, and we're going to be talking about voice assistants and kind of how they work, and, uh, we have some special guests with us today.
Welcome once again, if you're just joining us live. It's Andy Leonard and I. We are here and we are live streaming Data Driven, the podcast where we explore the emerging fields of data science, machine learning, and artificial intelligence.
How are you doing Andy?
I'm doing pretty good Frank. How are you?
I'm doing well. I know you have a hard stop, so I won't yammer too long. We have three special guests with us today.
Three? Three.
That's a record. It is a record.
These guests are.
Alexa. Hello, Alexa.
She's going to say hello back, I'm sure.
On my phone, I have Google Assistant.
Hello Google Assistant Hey Google.
That didn't work. It now correctly phones on. Let me tell you, whenever there's a training video or a keynote where they talk about the integration between them, it's pandemonium in my home office, because I usually have all three, and it's just utter pandemonium.
So I want to switch to... so, we're recording this live. If you're watching live, thank you. If you're watching later, thank you. We always try to respond to the comments; I think we're pretty good about that. And if you're listening to this on the podcast, I will try to transcribe everything I'm saying. So let me switch.
And I'll see if I can put us in the little bottom here. How do I do that?
There we go.
Oh well anyway.
So there we are. Almost there. This is a quote from Charles V, who, if you're not up on your history, was kind of a big deal. I think he was a Habsburg. I don't remember, shame on me. But he has this quote where he says, "I speak Spanish to God, Italian to women, French to men, and German to my horse."
Now you're probably wondering what the heck does this have to do with anything? Well, here's what it has to do with.
Oh no, PowerPoint is going to crash.
No, you can tell.
We're live here we go. This is what I want to say.
This is my modern take on this: I speak to Alexa when I'm home, to Cortana when I'm at work, and to Google Assistant when I'm in my car or have my phone with me.
You can also replace that with Siri. I don't. I do have an iPad, but it's not with me.
That's Frank the First, right?
Frank the First.
I rather like that. I rather like that.
So the idea here is, you know, how do these things work?
You know what?
And, you know, in terms of the guests and kind of things, and I know you have a time constraint, I just want to demonstrate something. For a work engagement I had a chance to study up on all three, because it was a competitive situation between LUIS, which is ultimately what powers Cortana behind the scenes; Lex, which is the Alexa version for processing text; and Dialogflow, which is the Google version. So it's a lot of mouthfuls. Let's see if we can get our guests to introduce themselves. Alexa.
Hi how are you?
00:04:27 Speaker 3
I'm feeling like a home run.
00:04:29 Speaker 3
This weekend I'll be watching a lot of my favorite sports.
I'm getting a warning sign on my stream ability here, so I don't know what's going on.
What's up with that?
I don't know. Well, you can still hear me, so that's a good sign.
Yeah, you're good with me, and I'm out here in the boondocks with 25 minutes bro.
There you go.
How are you?
00:04:57 Speaker 4
There you go.
And let's see what our friend Google Assistant.
Has to say.
Hey Cort... I'm sorry. OK Google, how are you?
Oops, it's on my Bluetooth, that's why. OK.
You can tell we're live, folks, 'cause it's just all bloopers.
How are you?
So we've returned a bunch of short search results, OK?
What's interesting about these three is that they're all trying to solve essentially the same problem, right? They are trying to solve the ability to take human language and type it in and convert it to... let me see if I can get this screen back up.
And I will maximize that. There we go. See my fancy setup? It's cool, isn't it? Yes.
Alright, so ultimately they're all trying to solve the same problem. Hey, we have a comment. Wise guy. Yes, I am miserable. OK, alright, so here's the problem that all these devices want to solve, right? This is a human. This is some speaker device thingy.
You have the cloud, which I think really makes this possible in a lot of ways. Or not just possible, but practical. Yeah, yeah.
You know, "turn..." I have to be careful, 'cause I actually do have the lights in my home office set up to this.
Right, right. So this gets digitized into audio here, right? I'll draw that with squiggly lines.
Right, I like the squiggly lines.
See, I'm talented, I'm very...
You are. You're an artist.
Then a cloud service, right, whether that's LUIS... converts that back into text, or into text, right? Right. "Turn the..."
Then what happens is then you have to figure out what does that mean. What's the context here, right? What's the intent? That's the official word.
So that's turn lights.
And then "on." Now, most people will argue with me that technically this is the intent, and this is the destination, or slot. Lex calls this a slot. And this is the state that you want, right? So ultimately there are a hundred different ways I can say that, and this is what makes this really kind of an NLP problem, right? "Please turn the lights on," or "Would you kindly turn the lights on?" Right, BioShock, right there for you.
That sort of thing. And then what happens is that this will then parse that into an action, right?
Which, if you have smart plugs, it will then send a message back through the magic of the Internet and then turn the actual.
Oh, I like how that's doing that. Turn the actual light on.
Right, so that's that's basically solving the same problem.
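The intent, slot, and state breakdown described above can be sketched in a few lines of Python. This is only an illustration, not how Lex, LUIS, or Dialogflow work internally: those services use trained language models, while this sketch swaps in two hand-written regular expressions. The names `ParsedCommand` and `parse` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedCommand:
    intent: str  # what the speaker wants done, e.g. "turn"
    device: str  # the slot: the thing to act on, e.g. "lights"
    state: str   # the desired state, e.g. "on"


# Hypothetical patterns: one for "turn the lights on", one for "turn on the lights".
_PATTERNS = [
    re.compile(r"\bturn (?:the )?(?P<device>\w+) (?P<state>on|off)\b"),
    re.compile(r"\bturn (?P<state>on|off) (?:the )?(?P<device>\w+)\b"),
]


def parse(utterance: str) -> Optional[ParsedCommand]:
    """Map many phrasings of the same request onto one structured command."""
    text = utterance.lower().strip(" ?.!")
    for pattern in _PATTERNS:
        match = pattern.search(text)
        if match:
            return ParsedCommand("turn", match.group("device"), match.group("state"))
    return None  # no known intent found


print(parse("Please turn the lights on"))
print(parse("Would you kindly turn the lights on?"))
```

Both phrasings produce the same structured command, which is the point of the "hundred different ways to say it" remark: the downstream action only ever sees the intent, the slot, and the state.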
And what's interesting about this, and I just realized I didn't say it out loud for folks listening on the podcast, is that ultimately my words get translated into an electronic signal, right? A sine wave of sorts. And then that is sent from the speaker to the cloud, where it will turn that sound wave back into text, or words. Then it'll go through and parse out what I'm saying and try to get an intent or an action from it, and then, based on that, some other program that also lives in the cloud, mostly, will then take an action. Does that make sense? Did I explain that clearly?
I think so yeah, yeah I like it. I like the flow.
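The flow from sound to text to intent to action can be laid out as a chain of small functions. This is a rough sketch under loud assumptions: `speech_to_text` is a stand-in that treats the "audio" as UTF-8 text rather than calling a real cloud speech service, the keyword matching is a placeholder for a real language-understanding service, and `FakeSmartPlug`, `KNOWN_DEVICES`, and `handle_utterance` are hypothetical names.

```python
import re
from typing import Dict, Optional, Tuple

KNOWN_DEVICES = ("lights", "backlight")  # hypothetical device names


def speech_to_text(audio: bytes) -> str:
    # Stand-in for the cloud speech service: the "audio" here is just UTF-8 text.
    return audio.decode("utf-8")


def text_to_intent(text: str) -> Optional[Tuple[str, str]]:
    # Crude keyword spotting; returns (device, state) or None if nothing matches.
    words = re.findall(r"[a-z]+", text.lower())
    if "turn" not in words:
        return None
    device = next((w for w in words if w in KNOWN_DEVICES), None)
    state = next((w for w in words if w in ("on", "off")), None)
    if device is None or state is None:
        return None
    return device, state


class FakeSmartPlug:
    """Pretend smart plug; a real one would be reached over the network."""

    def __init__(self) -> None:
        self.is_on = False

    def set_state(self, state: str) -> None:
        self.is_on = state == "on"


def handle_utterance(audio: bytes, plugs: Dict[str, FakeSmartPlug]) -> str:
    text = speech_to_text(audio)      # step 1: sound -> text
    parsed = text_to_intent(text)     # step 2: text -> intent
    if parsed is None:
        return "Sorry, I can't help with that."
    device, state = parsed
    plug = plugs.get(device)
    if plug is None:
        return f"I can't find a device called {device}."
    plug.set_state(state)             # step 3: intent -> action
    return f"OK, turning the {device} {state}."


plugs = {"lights": FakeSmartPlug()}
print(handle_utterance(b"Alexa, please turn the lights on", plugs))
```

Each stage only depends on the output of the one before it, which is why the speaker device itself can stay simple: in this picture it only has to capture audio and play back the reply.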
Yeah, and it's amazing how simple this is, right? This is not rocket science. Inside your average Echo device, you know, it's not rocket science. Well, this one is the fancy one with the screen, but the typical kind of Dot or whatever is a microphone and a speaker and a Wi-Fi connection. That's essentially all it is, right?
So ultimately, let me see if I can minimize this. So I have an example of that, and this is essentially a build-your-own voice kit that I saw at Micro Center for, like, $5 or something like that. And inside is a speaker, a button, and a cardboard box. And if you attach your Raspberry Pi to this, you basically have a Google Home assistant.
That's nice, yeah.
Shame on me because I bought this longer ago than I care to admit and I haven't built it yet.
But that's just to demonstrate the point that these actual devices are rather simple in terms of, you know, just them being their own thing, right? So what's interesting about this, and this is where the cutting edge comes in, is when I talk.
We have our human brains. Some will debate about whether or not I have a human brain, but let's go with it. The short of it is that I have the ability to understand context from my previous statement.
So I'm going to mute some of these devices, because if they start to hear their name, they'll start going wild. What's interesting is how good Cortana is at this, how good the Google Assistant is at this, and how Alexa has some room for improvement, right? Right. So for instance, if you haven't caught on, the shirt I'm wearing says C.R.E.A.M., "Cash Rules Everything Around Me." That's from a Wu-Tang Clan song. So I will ask this simple question of Alexa: Alexa, who is the Wu-Tang Clan?
00:11:36 Speaker 3
According to Wikipedia, Wu-Tang Clan is an American hip hop group formed in Staten Island, New York City, in 1992. Original...
00:11:45 British Voiceover AI Lady
What was their first album?
00:11:51 Speaker 3
According to Wikipedia, The 1st Album is the debut studio album by German duo Modern Talking. It was released on April 1st, 19
80... So you get the idea. You and I know, like, if you asked me who the Wu-Tang Clan were, and then what their first album was, I would tell you, right? Right.
It does not have the notion of context. This turns out to be a very difficult problem for computers to solve, because there's a lot going on, right? So if I start talking to you, like, "Andy, I was at this great restaurant last night," and then we switch to another topic, you would then say, "Hey, where was that place?" And if you said "place," I would know what you were talking about, right? Even humans have trouble with this, right? 'Cause I have many conversations with my wife that kind of go in different directions, 'cause I have no idea what she's talking about.
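Carrying context from one question to the next can be illustrated with a tiny conversation object that remembers the last thing asked about and substitutes it for a possessive pronoun in the follow-up. This is a deliberately naive sketch, not how any of the assistants do it: real systems use far richer dialogue state, and the `Conversation` class and its regular expressions are hypothetical.

```python
import re
from typing import Optional


class Conversation:
    """Remembers the last entity asked about and resolves pronouns against it."""

    _WHO_QUESTION = re.compile(
        r"who (?:is|are|was|were) (?:the )?(?P<entity>.+?)\??$", re.IGNORECASE
    )
    _POSSESSIVES = re.compile(r"\b(their|its|his|her)\b", re.IGNORECASE)

    def __init__(self) -> None:
        self.last_entity: Optional[str] = None

    def resolve(self, utterance: str) -> str:
        utterance = utterance.strip()
        match = self._WHO_QUESTION.match(utterance)
        if match:
            # A "who is X" question sets the context for later turns.
            self.last_entity = match.group("entity")
            return utterance
        if self.last_entity is not None:
            # Rewrite "their first album" as "<entity>'s first album".
            return self._POSSESSIVES.sub(
                lambda _m: f"{self.last_entity}'s", utterance
            )
        return utterance  # no context yet, so nothing to resolve


chat = Conversation()
chat.resolve("Who is the Wu-Tang Clan?")
print(chat.resolve("What was their first album?"))
```

Without the stored entity, the follow-up goes out as a literal search for "their first album", which is one way an assistant could end up answering with an album that happens to have a similar title.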
But I mean, it's hard for humans. It's really hard for machines. So let's try and see if Cortana does this any better. Hey Cortana,
who is the Wu-Tang Clan?
00:12:58 Speaker 4
According to wikipedia.org, Wu-Tang Clan is an American hip hop group formed in Staten Island, New York City, in 1992. Originally composed of RZA...
Hey Cortana, what was their first album?
00:13:15 Speaker 4
Should I read a snippet from Wikipedia?
00:13:17 Speaker 4
That might be related.
Yeah, I'm afraid.
00:13:23 Speaker 4
The Wu-Tang Clan is an NYC-based hip hop musical group consisting of 10 American rappers: RZA, GZA, Method Man, Raekwon, Ghostface Killah...
There are a lot of members of the Wu-Tang Clan, in case you didn't know.
00:13:37 Speaker 4
Cappadonna, and the late Ol' Dirty *******
Hey Cortana, what was their first album?
00:13:49 Speaker 4
There might be something on Wikipedia.
00:13:51 Speaker 4
Should I read it?
00:13:56 Speaker 4
The Wu-Tang Clan is an NYC-based...
All right.
Well, in the past she did get that right.
Well, she wasn't completely off base. Wasn't completely off. Now, she kept it... seems like some kind of workflow thing she got put into.
It did at least identify the context back to your previous question.
It did, and that's a new behavior. I swear I used to do this demo all the time, and depending on the audience it would be Wu-Tang Clan or, you know, Aerosmith. So let's see what Google has to say. OK, Google,
who is the Wu-Tang Clan?
Alright, you're not very talkative today.
What was their first album?
OK, the demo gods are not kind to me today.
But in the past this has worked on the home assistant and Cortana.
So the reason why we're doing this today, and I know Andy has a hard stop in a couple of minutes, is because we are hoping to get Data Driven as a flash briefing on Alexa.
So I was trying to do this whole surprise thing, but since the demo failed, I figured I'd break into that. That's ultimately the goal. But I also think this is an interesting topic, because for a lot of folks this is just this magical black box that is listening, right? And, you know, it's not magical. It all comes down to math and science, right? And the key is to understand kind of how it's built. And once you understand how it's built, you can build your own systems, and it's actually not that hard.
There are more moving parts than you would think, but ultimately it just comes down to this: you're taking that speech, that sound data, converting it into text, then taking that text and converting that into some kind of intent and action, right?
Yeah, and then on the other side, I'm sorry, go ahead.
No, Mark Taylor just said it's a do loop and he's right, he's.
A do loop. We have Mark joining us again. Thanks for watching mark.
I really should ask this.
But unfortunately to be is a bit of a bit of A.
Not a nice word or not a professional word for LinkedIn.
But they don't have enough memory. I think not at all.
No they don't.
But it's an interesting thing where, you know, 'cause I'm a nerd, I happen to have all three different types. But, you know, actually four if you count Siri. Let's see if Siri will do any better on the album question.
So Andy, we actually have 4 special guests.
Wow, thank you.
Not crazy, that's that's a new record.
Hey Siri, who is the Wu-Tang Clan?
00:17:03 Speaker 4
Here's some information.
Alright, so she basically.
Pointed to Wikipedia.
What was their first album?
So it did the transcription.
What I said is good.
00:17:30 Speaker 3
I don't recognize this song.
OK oh OK.
So I swear it did work before, but I mean, ultimately it's a very hard problem. In fact, one of the things that they showed a couple of years ago at Ignite or Build:
they showed this concept video of this lady talking to Cortana, and it was on her phone.
More on that in a minute. It was on her phone, and as she was driving into work, she'd be like, "Oh, remind me to tell this person..." you know, "have a meeting with them."
And then the logic would go and schedule the meeting through the Outlook calendar and then tell her, you know, "So-and-so rejected the request, but they are able to meet 30 minutes later. Is that OK?" "Yes. Oh, and invite so-and-so to this meeting as well."
Right, each one of those steps, and they call them turns (I don't like the idea of "turn," but that's what the industry calls it), the AI was able to follow. Now, they were very careful to say this is a, you know, near-future scenario, but that turned out to be a very hard thing to solve. Yeah, yeah, carrying that context from turn to turn, whereas with humans it's much easier for us to operate that way. So I think if you're talking about the real cutting edge of what this research is, it's going to be about that. You know, I will give Alexa props on this: if you play a song, and, you know, pick any song you like, you can try this at home, when the song ends, wait a minute or two and then say, "Hey, can you play that song again?"
Now, there is a timeout period. I'm not sure exactly what it is on Alexa, but I tried it 24 hours later and she had no idea what I was talking about. Right, right. Now, I'm not sure how configurable that timeout period would be, but I do know for LUIS and for Dialogflow I think it's about 5 to 20 minutes.
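A context timeout like the one described can be modeled as a remembered value with an expiry. This is a minimal sketch, not the mechanism any of these services document: the 20-minute window is simply the upper figure quoted above, and `ExpiringContext` and its injectable `clock` parameter are hypothetical names chosen so the behavior can be tried without waiting.

```python
import time
from typing import Callable, Optional


class ExpiringContext:
    """Holds one remembered value and forgets it after ttl_seconds."""

    def __init__(
        self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so examples can fake the passage of time
        self._value: Optional[str] = None
        self._stored_at: Optional[float] = None

    def remember(self, value: str) -> None:
        self._value = value
        self._stored_at = self._clock()

    def recall(self) -> Optional[str]:
        if self._stored_at is None:
            return None
        if self._clock() - self._stored_at > self._ttl:
            # Too much time has passed: drop the context entirely.
            self._value = None
            self._stored_at = None
            return None
        return self._value


now = [0.0]  # fake clock, in seconds
context = ExpiringContext(ttl_seconds=20 * 60, clock=lambda: now[0])
context.remember("the last song played")

now[0] = 2 * 60          # two minutes later
print(context.recall())  # still remembered

now[0] = 24 * 60 * 60    # a day later
print(context.recall())  # forgotten
```

With a window like this, "play that song again" works a couple of minutes after the song ends but not the next day, which matches the behavior described for the 24-hour retry.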
Wow, yeah, that's about that's reasonable. I would think. I've also asked Alexa to play more like that song.
Yes, and she does get it, and yes, she'll go to a station or something that's similar music, similar genre.
I haven't played a song on her since at least this morning, so I'll say Alexa.
Play more songs like that.
00:19:54 Speaker 3
This is weekend plan from Saints or.
Look at that, but no way.
Alright, I better pause that before bad words. Now, that's interesting. I did not expect that. Interesting that it went back that far. It kept that context. Yeah, that's actually an improvement. And for me,
turning lights on and off and stuff like that, I have that set up here. You know, Alexa, turn off the backlight.
Oh, she's muted, so it's not working.
Alexa, turn off the backlight.
And if you're watching on the camera, you'll see that backlight went off, and then
see if I could turn it back on.
So, I mean, you get the idea. Now I look like a ghost 'cause the lighting is not... Alexa, turn the backlight back on.
OK, that's awesome.
But that's kind of it, and I think, you know, let us know in the comments if you want to see more of these kind of practicals. Maybe we'll do a live coding session. I am hoping to get the flash briefing to get Data Driven on your Alexa device, so you can just add it to your flash briefings. I'm hoping to get that done very soon. It's definitely something Andy and I had talked about, long overdue, but, you know, I want to get that done. But in order to do that, I want to kind of explain what exactly is going on,
you know, with this. So it's definitely... I'm sorry, go ahead.
No, just saying sure, agreeing with you.
And I do love my Cortana speaker. It's a Harman Kardon speaker. The sound on this is awesome. I can pipe Spotify to it, I can get my calendar information through Cortana. However, the other day... see, she was activated. So however, the other day
I was moving something and I accidentally elbowed the top button, 'cause the top button here is kind of like a secret button. Oh, and it'll tell you, like, an announcement or random fact of the day. But unfortunately, when I did accidentally hit it the other day...
00:21:52 Speaker 4
You should be aware of: in January 2021, the Cortana service will be removed from your Invoke speaker.
00:22:00 Speaker 4
Please go to aka.ms/invokesupport.
I love this thing. I mean, I'm going to get my enjoyment's worth in the last three months I have of that speaker. I actually enjoy it. I get a lot of value out of it, so I was really disappointed to hear that. The upside is that if you have one of these under active use, when you go to the link that's in there, you get a $50 credit to the Microsoft Store.
So I bought a video game with it.
But, I mean, I got my value out of it. I'm sad to see it go, but I can always Bluetooth it to this, you know.
Gotcha. Frank, I gotta roll. This was awesome.
Roll. It was awesome. Thanks for joining, and I'll do the close out in a little bit.
Do the close out now and answer any questions you have.
Alright, that's it for Data Driven. Thanks for listening to the show. Be sure to check us out on Spotify, LinkedIn Live... oh no, we don't. This is for the podcast audience, so see, there are actually lots of bloopers here.
Get Log Analytics and ASC for your home. Now, ASC...
Oh, Azure Security Center. That's a good idea, actually.
Alright, let me finish this closeout.
Alright, thanks for joining us here on this episode of Data Driven. Be sure to check out our website, datadriven.tv, and hopefully very soon we will be on an Alexa flash briefing. Stay tuned. In the meantime, you...
And have a great day.
Alright, that's it.
Yeah, I'd love to get Azure Security Center on. What I really want to get is, they make this thing, it's called
the Packet Squirrel, where
you can put it on a network and it will sniff all the packets. And I would do that just to say, you know, hey, is that smart TV really phoning home the way people say it is? That sort of thing.
A guy I know has a really killer network setup. I think I mentioned it
in an episode of Impact Quantum.
Oh yeah, I probably should re record the whole thing.
Thanks for watching.
Alright, let me re record that outro.
Oh, thanks for listening for Data...
Thanks for listening for...
Thanks for listening to Data Driven, the pod...
Now wait a minute.
Thanks for listening for Data Driven.
Thanks for listening, Data Driven.
Alright, thanks for listening to this episode of Data Driven, recorded live on LinkedIn, Twitch, YouTube, and Periscope. If you like what we're doing, definitely check us out on those platforms: like, subscribe, share, tell your friends about us. And as a reward, if you go to thedatadrivenbook.com,
we partnered up with Audible, so you can actually get yourself a free audiobook. If you listen to the show, you know Andy and I are huge fans of audiobooks, and we've definitely been getting our money's worth out of that subscription.
On another note, if you have not already, check out our other podcast, Impact Quantum, where we talk about how quantum computing and innovations in that space are about to completely upend the world, solve the world's biggest problems, as well as truly revolutionize deep learning and artificial intelligence. So with that, I bid you a great day and a good afternoon, and I'll let the automated voice assistant that we have set up do the outro.
00:26:14 British Voiceover AI Lady
Thanks for listening to Data Driven.
00:26:17 British Voiceover AI Lady
We know you're busy, and we appreciate you listening to our podcast.
00:26:20 British Voiceover AI Lady
But we have a favor to ask. Please rate and review our podcast on iTunes, Stitcher or wherever you subscribe to us.
00:26:27 British Voiceover AI Lady
You have subscribed to us, haven't you?
00:26:30 British Voiceover AI Lady
Having high ratings and reviews helps us improve the quality of our show and rank us more favorably with the search algorithms.
00:26:37 British Voiceover AI Lady
That means more people listen to us, spreading the joy. And can't the world use a little more joy these days?
00:26:43 British Voiceover AI Lady
So go do your part to make the world just a little better and be sure to rate and review the show.