In this Deep Dive, Frank and Andy delve into the world of data warehousing. What is it, and do they know things? Let's find out!
Frank also shares that he has a new role at Microsoft.
Hello and welcome to Data Driven, the podcast where we explore the emerging field of data science. We bring you the best minds in data, software engineering, machine learning, and artificial intelligence. Now hear your hosts, Frank La Vigne and Andy Leonard.
Hello and welcome back to Data Driven, the podcast where we explore the emerging fields of data science, machine learning, and artificial intelligence. If you like to think of data as the new oil, then you could consider us like Car Talk. However, we can't go on a road trip because of the coronavirus lockdown, so it's just Andy and me, kind of stuck at home, and thanks to the magic of technology we can be on the show at the same time. So how's it going, Andy?
It's going well, Frank. How are you doing?
Good, good. You'll probably hear my kids in the background.
We will, and you know what, Frank? I think it's fine. I understand why you said the word "stuck." You and I work remotely an awful lot, and we usually record like this. There's less going on in the background at your place most of the time, but you have a couple of young boys there, and you need to be in the room with them when their mom, who's also working from home, is doing some of her work. So kudos to both of you for finding a way to manage this. Everybody's going through these sorts of things, and I'm sure that none of our listeners will mind hearing your sons play in the background. Hopefully they won't start fighting.
Well, if they do, I think a lot of folks can relate, though.
Yeah, oh, absolutely.
So we're recording this on April 16th. Speaking of kids, we had your son on, and if the order of recording goes the way I planned it in my head, that episode would have been released last week. I thought it was a pretty good discussion of how STEM is taught, how STEM is perceived by, quote-unquote, policymakers, and what the actuality of it is, plus some of the interesting stuff your son is doing with Raspberry Pi and things like that.
I was very proud of the work that he's doing. He's had his hands in machine learning for a good two or three years now. I want to say he was 14. I came into his room, you know, just checking on something, and I saw Mario Brothers playing in the background, and I'm thinking, what's going on? But he had done his schoolwork. He was homeschooled at the time, and he had done his work. Later, talking to him about it, he said he'd actually come and gotten me because, OK, Dad, with, I think it was, like six neural nodes, Mario was able to figure this out in something like four hours. Then he said, I wonder what it would be if I added a node. I wonder what that would do to it. And I'm kind of sitting there with my mouth hanging open, going, show Dad more about that. But he's been doing it for a while. I know your kids are interested in the same thing. They're younger; Stevie's 17 now, and I know that your sons are coming up in this age as well. As Mark Tapatio mentioned in that show, they are what he referred to as digital natives, and yeah, that comes with some pretty interesting stuff. So I'm just glad we were able to record that show as he gets ready for his first SQL Saturday presentation on that topic. And that's all assuming that we were able to overcome the technical glitch. We learned something, Frank. I learned something.
Yeah, it's not a glitch if you learn something.
So if for some reason the you-know-what hit the fan, then that episode will be recorded at a future date, so we'll see.
It will. But we've got a great topic today. You and I have been batting this around for a while. I know it's been several weeks; it may have been a couple of months that we've been talking about doing this.
Right, absolutely. And part of what motivates this, and based on the release schedule I anticipate this will have already happened, is that I'm changing jobs at Microsoft.
Woo! A new job.
I will be the data and AI technology architect at the Reston MTC, or Microsoft Technology Center.
So congratulations!
Thank you very much. It's an honor to join such a prestigious team. If you're not familiar with the MTC, MTC stands for Microsoft Technology Center. There are about 80 of them around the world, and they're basically meant to provide specific experiences as well as architecture and design guidance for customers around the world. It's an honor to be part of that. Very rarely does an opening happen in an MTC, so when one opened up in my neck of the woods, it was like, I have to take it. I have to at least try.
Right, right.
So, fortunately, I got it, and I am super excited, 'cause that's what we say at Microsoft: we're super excited. It's a great team, and they do great stuff. They do a lot of work with the community, and they do a lot of work with customers. It's just an awesome gig. I'm really looking forward to it. Yeah, I'm really excited about it.
Congratulations, brother. That's a great thing, and I think you're perfect for that job. I know someone else in that job at an MTC in the Northeast, and it's kind of a rare breed of person that has to walk into that role, because optimally you have a smattering of exposure to a whole slew of enterprise architecture. Both you and this other individual I know fit that mold. You've got programming experience, software development experience, and you also have data experience, and it's just rare to be good at both of those things. But I know you're good at it, and I know my other friend is good at it, so I just think it's going to be a great fit for you, Frank. I'm excited you got it.
Thank you. Thank you very much.
So with that, one of the things I've been ramping up on, in anticipation of this job or whatever opportunity I was going to go to next, is the, quote-unquote, traditional side of the data world. Let me kind of explain my little worldview, which is twisted, but as weird as it may be, it might actually be right. I see this a lot in my current, or old, role (current as of April 16th): we have Data and AI cloud solution architects, but there's a very clear line of demarcation between the AI part of the Data and AI CSAs and the SQL veterans' side of things. I actually had a call this morning where it was very, very much laid bare, 'cause we were talking about exactly that. There are essentially two types of data and AI folks, at Microsoft for sure and probably everywhere else too. You have the RDBMS folks. These folks have been doing SQL Server since it was a Sybase joint venture, right? That's their world. And you have the big data, open source tooling world: the folks that are more comfortable in Spark or Hadoop, or with the crazy statistics and math around machine learning and AI. You have those two, and rarely do you have a person who's comfortable and happy in both. I am aiming to be happy and comfortable in both. Obviously, I'm more in the data science kind of world, and part of what I see as the opportunity in this new role is to grow into the SQL, RDBMS, traditional database world. That makes sense?
Oh, no, it makes perfect sense.
And, I mean, coming at this from our history: as we've shared in each show we've recorded over the past few days, we've known each other for something like 15 years, and for most of that time you were a professional software developer. You were a Microsoft MVP in, I forget which discipline it was, Frank. I know it was software development related.
Tablet PC. The world has forgotten that this discipline ever existed.
Tablet PC, right. OK. And you did an awful lot in there, and I know there are a lot of people out there working in what that evolved into, mobile, who benefited from the blog posts and solutions you shared and all of that. But yeah, that whole mobile thing turned out to be, you know, a trend, and it evolved into what it is now. Having that experience, I think you're going to find that it plays well into backfilling, or filling, this other bucket you want to go into, which is traditional T-SQL. And I know from experience, dabbling in machine learning and AI, that I'm on the opposite side of the fence. Although I'm not really that good at, let's say, DBA-level T-SQL, I can hold my own, but if we're selling performance tuning to a client, I may be involved in the project, but rarely am I the person actually performing the tuning. There are lots of people out there that we subcontract. As a consulting firm, Enterprise Data & Analytics, we bring in others who are much better at that than I am, and we have people on the team who are much better than I am as well. But I think your experience has set you up really well to make this transition. And like everything else we talked about in the other shows, it takes time, and it's frustrating, but I think you're well positioned to pick up this skill as fast as or faster than almost anyone else I know.
Well, thank you. Now, you know, part of it is that I'm not completely naive to the ways of this. I took SQL in college, database design in college, and my professor worked with Codd. So, you know, I'm only two degrees of Kevin Bacon away from the founder of the theory, so I've got that going for me. But I never really got into the nuts and bolts of it, and I'm not concerned about that. I'm actually fascinated by it, because it's just another way to solve the same problem.
Absolutely.
Ultimately, at the end of the day, you're moving bits around, and the question is, what's your philosophy? Obviously, RDBMS has a philosophy, and I mean, it has worked well for 50 or 60 years. But now we live in a world where there's a lot more unstructured data. How do you deal with that? And how do you deal with it now that you're not making assumptions about spinning disks, right?
Right, there's a whole... Kevin Hazzard talked about that on our show.
Yeah. It's 2020, and I would say most of our code is still designed for the age of disk heads picking up, seeking a sector, reading data, and then picking up again. So there's a whole new opportunity, where relational databases are obviously still going to matter, but they're just one of many tool sets.
In fact, one of the things I learned when I was doing startup evangelism for Microsoft came from debates with startup founders who, I will say, I'd put in a hipster category, right? When you work with startups, it runs the gamut from, this person is going to be the next Steve Jobs, to, I think this person is kind of living in their parents' basement, but rather than say they're unemployed, they have a startup. Somewhere in the middle, you have the hipster ones, who learned to code because of make... Now, there's nothing wrong with that, but do you think you're an expert in all things technology because you learned to code? And then you go to a person who's supposed to help you take your stuff to the next level and kind of talk down to them. So, for context on this conversation, they were basically lamenting that they wanted the reliability of an RDBMS, but they wanted it in a NoSQL type of environment. And I was like, that's a fair thing to want.
I'm just putting all the cards on the table: approaching that architecturally, that's not an unreasonable request. It's when you get into the engineering part that you start to see you just can't have everything you want. There's no single do-it-all type of application. For every software application ever, and I'm going to maintain this probably forever, there's going to be some spot that I define as a corner: something that the application or server or what have you doesn't do well. And what you'll often find is that there's some other application or platform out there that's available, and it will do that part well. But again, that also has its corners, and so what you're trading is pain, put in the nicest way possible. You're picking your poison, picking your pain. What pain do you want to live with? And it depends. Relational databases have their pain points. NoSQL, as a lot of companies have learned over the past few years, also has its pain points. You can't always get what you want, but if you try sometimes, you might get what you need.
Awesome. So part of it is that, whether it's technology or anything else, you have these dueling philosophies, and there's a point where they just won't meet. They're philosophically opposed, and you're right, you have to pick which one you want over the other, and there's cause and effect to that. So, with that kind of deep philosophical opening, we want to do a deep dive. (It's not officially a deep dive until I have fun with my soundboard there.) A deep dive into data warehousing: what is data warehousing, and where did it start? I'll channel a little bit of... What is data warehousing? What do they know? Do they know things? Let's find out.
Well, yeah. Data warehousing, in my opinion and in my experience, is really this idea of collecting data from all over, from different places, and placing it into a centralized location. Now, there are some distinctions, and there are other, more scientific answers to this, and you can actually build something that today is not technically considered a data warehouse. You can gather all of the information that's spread across the enterprise in different places into what's now called an operational data store, and it's not totally unlike a data warehouse. In fact, I think the Euler diagrams for the two have quite a bit of overlap, at least if we qualify the term data warehouse as relational data warehousing: there's a lot of overlap between relational data warehousing and an operational data store. I don't want to confuse that too much for our listeners, but I just want to make you aware: if you hear ODS or DW or EDW, they could be talking about largely the same thing.
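The idea described above, gathering data from scattered source systems into one central location and then querying it as a whole, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular warehouse product works: the two source datasets, the `fact_sales` table, and the `extract_transform` and `load` helpers are hypothetical, and Python's built-in `sqlite3` module stands in for a real warehouse platform.

```python
import sqlite3

# Hypothetical source systems: a parent company and a company it acquired.
# They record the same kind of fact (sales) with different shapes and units.
parent_sales = [("2020-04-01", "East", 1200.00), ("2020-04-02", "West", 800.00)]
acquired_sales = [{"sold_on": "2020-04-01", "territory": "EAST", "amount_cents": 50000}]


def extract_transform():
    """Map each source's rows onto one shared shape: (date, region, amount_usd, source)."""
    rows = [(d, region.title(), amt, "parent") for d, region, amt in parent_sales]
    rows += [
        # Conform the acquired company's data: normalize region spelling, cents -> dollars.
        (r["sold_on"], r["territory"].title(), r["amount_cents"] / 100, "acquired")
        for r in acquired_sales
    ]
    return rows


def load(conn, rows):
    """Load the conformed rows into a single central table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(sale_date TEXT, region TEXT, amount_usd REAL, source_system TEXT)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


# An in-memory SQLite database plays the role of the central warehouse.
warehouse = sqlite3.connect(":memory:")
load(warehouse, extract_transform())

# One query now answers an enterprise-wide question across both companies.
totals = warehouse.execute(
    "SELECT region, SUM(amount_usd) FROM fact_sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

Running it prints `[('East', 1700.0), ('West', 800.0)]`. The design point is that each source keeps its own shape; the transform step conforms them before the load, which is what lets a single `GROUP BY` answer a question about the whole enterprise.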
And think about supply chains, which are a topic on everyone's mind these days as we talk about the economic impact of the pandemic. Supply chains are way more important than we realize. They're kind of like oxygen: you don't recognize how important they are until you don't have them. You can think of a data warehouse in supply chain terms, and the analogy holds for quite a bit. I'm just going to use Walmart and Amazon as examples. They both have these distribution centers, networks set up all over the United States and probably all over the world. These are places where goods come from the source and are trucked in. They may be collected at other points along the way, but they're trucked into these large, physically large, warehouses and then stocked. From there, they're shipped out. In the case of Amazon, they're usually handed off to some delivery service. In the case of Walmart, they're placed on other Walmart trucks that are shipped to the actual brick-and-mortar stores. That warehouse, that distribution center, is what I think of when I think of data warehouses.
I think of the electronic equivalent of that. You'll see this especially in what I consider an EDW, an enterprise data warehouse. You've got a collection of companies that have been acquired in mergers and acquisitions, and they're looking at them and saying, I want to get all of their data into this one place, and I want it there for a number of reasons. One of the big reasons is so I can query that data and learn how my entire enterprise is doing. How's it working? And now, if I apply that Walmart and Amazon analogy to the data there,