Data Warehousing Deep Dive
Episode 8 • 29th April 2020 • Data Driven


In this Deep Dive, Frank and Andy delve into the world of Data Warehousing, what is it and do they know things? Let's find out!

Frank also shares that he has a new role at Microsoft.

AI Generated Transcription

Hello and welcome to Data Driven,

the podcast where we explore the emerging field of data science.


We bring the best minds in data,

software engineering, machine learning, and artificial intelligence.

Now hear your hosts, Frank La Vigne and Andy Leonard.

Hello and welcome back to Data Driven,

the podcast where we explore the emerging fields of data

science, machine learning, and artificial intelligence.

If you like to think of data as the new

oil then you could consider us like Car Talk.

However, we can't go on a road trip because of

the coronavirus lockdown.

So it's just Andy and I kind of stuck at

home respectively.

And thanks to the magic of technology we can be

on the show at the same time.

And, uh, how's it going,

Andy? It's going

well, Frank. How are you doing?

Good, good. Uh, you'll

probably hear my kids in the background.


We will, and you know what, Frank,

I think it's fine. You know, I'm going to...

I understand why you said the word stuck. You

and I work remotely an awful lot.

We usually record like this.

There's less in the background

at your place most of the time,

but you have a couple of young boys there, and you

need to be in the room with them when Mom,

who's also working from home, is, you know, doing

some of her work. So kudos to you, to both

of you, for finding a way to manage this.

Everybody's going through these sorts of things, and I'm sure

that none of our listeners will mind hearing your

sons play in

the background. Or hopefully they won't start fighting, so that's... Well,

I asked, I asked. If they do, I think a

lot of folks

can relate, though. Yeah, oh, absolutely.


So, we're recording this on April 16th.

Speaking of kids, we had your son on, which,

if the order of recording goes the way I planned it

in my head,

would have been released last week.

And, uh, which I thought was a pretty good,

uh, discussion on how STEM is taught,

how STEM is perceived by, quote unquote, policymakers,

and how the actuality of it is.

And some of the interesting stuff your son is doing

with Raspberry Pi and stuff like that.


I was, first, I was very proud

of him.

You know, the work that he's doing, and he's

had his hands in machine learning for really a

couple, three years

now. I want to say he was 14, and I

came into his room,

you know, just checking in, to say something or something, I


A Mario Brothers playing in the background.

Like, what do you think, you know, he was...

He had done his schoolwork.

He was homeschooled at the time. He'd done his

schoolwork,

so, you know, do what he wants.

But, um, later, talking to him about it,

he said, he actually came and got me, and he


OK, Dad, it took, you know, with, I think it

was like 6,

you know, neural nodes here, he was able to,

Mario was able to figure this out in something like

4 hours or something. You know, later he said, I

wonder what it would be if I added a node.

I wonder what that would do to it. And I'm

kind of sitting there with my mouth hanging open,

going, show Dad more about that. Nice.

But he's been doing it for a while.

I know your kids are interested in the same thing.

They're younger; Stevie's 17 now, you know,

and I know that your sons are coming up in


In this age as well,

They are. I mentioned Mark Tabladillo in that show, as he

referred to digital natives.

They are digital natives and yeah,

that comes with some pretty interesting stuff.

So I'm just glad we were able to record that

show as he gets ready for his first SQL Saturday

presentation here on that topic.

So and that's all assuming that we were able to

overcome the technical glitch.

We learned something, Frank.

I learned something. Yeah,

it's not a glitch if you learn something.

So if, for some reason,

the you-know-what hit the fan, then that episode

will be recorded at a future date,

so we'll see. It will,

but we've got, we've got

a great topic today. You and I've been batting this

around, I want...

I know it's been several weeks.

It may have been a couple of months.

We've been talking about doing this.

Right, absolutely. And part of what motivates this,

and based on the release schedule that I anticipate, this

will have already happened,

is I'm changing jobs at Microsoft. Woo!

At your new job. I will be the data and

AI technology architect at the Reston MTC, or Microsoft

Technology Center.

So congratulations. Thank you very much.

It's an honor to join such a prestigious team.

If you're not familiar with what the MTC is,

MTC is a Microsoft Technology Center.

There are about 80 of them around the world,

and they basically are meant to provide specific experiences

as well as architecture design guidance for customers around the

world, and it's an honor to be kind of on

that team.

Very rarely does an opening happen in an MTC,

so when one opened up in my neck of the

woods, I was like, I have to take it.

I have to at least try.

Right, right. So fortunately, I am super excited.

And, um, 'cause that's what we say at Microsoft, we're

super excited. And it's a great team.

Great stuff that they do.

They do a lot of work with the community.

They do a lot of work with customers.

It's just an awesome gig.

I'm really looking forward to it, and,

yeah, I'm really excited about

it. Congratulations, brother. That's a great thing, and I think

you're perfect for that job.

I know, I know someone else in that job at

an MTC in the northeast,

and it's kind of a rare breed of person

that has to walk into that role, because

optimally you have a smattering of exposure to a

whole slew of enterprise architecture,

and both you and this other individual that I

know fit that mold.

You've got programming experience, software development experience,

and you also have data experience,

and it's just rare to be good at both of

those things, I know,

but I know you're good at it,

and I know my other friend is good at this

as well,

so I just think it's going to be

a great fit for you,

Frank. I'm excited you got

it. Thank you. Thank you very much.

So with that, one of the things that I've been

ramping up on in anticipation for this job, or whatever

opportunity I was going to go to next,

was learning more about the, quote unquote, traditional side

of the data world,

which, let me kind of explain my little worldview,

which is twisted, and as weird as it may be,

it might actually be right.

I see this a lot in my current, or old,

role, current as of April 16th:

we have data and AI cloud

solution architects,

but there's a very clear line of demarcation between the

data scientist

part of the data and AI CSAs and the

SQL veterans side of things.

So I actually had a call this morning where it


It was very, very much laid bare, 'cause we were

talking about that, and that there's essentially kind of two

types of data and AI folks, at Microsoft for sure,

probably everywhere else too. You have the RDBMS folks.

These folks have been doing SQL since it was a Sybase

joint venture,

right, right? That's their world.

And you have kind of the big data, open source

kind of tooling world,

right? The folks that are more comfortable in Spark or

Hadoop or with the crazy statistics and math around machine

learning and AI,

right? You kind of have those two.

Rarely do

the two... Rarely do you

have a person who's comfortable

and happy in both.

I am aiming to be happy and comfortable in both.

Obviously I'm more in the data science kind of world.

And part of what I see as

the opportunity in this new role is to grow into

kind of the SQL,

RDBMS, traditional database world. If that makes

sense. Yeah, no, it makes perfect sense.

And, I mean, coming at this from,

you know, as we shared in each show the

past few days that we've recorded,

we've known each other for like 15 years.

And most of that time you were a professional software

developer.

You were a Microsoft MVP in...

I forget which discipline it was.

Frank, I know it was software development related.



the world has forgotten that this discipline ever existed.


PC. Tablet PC, right? OK, and you did an awful

lot in there and I know there's a lot of

people out there working in what that evolved into, mobile,

that benefited from the blog posts you shared,

solutions you shared, and all of that.

But yeah, that whole mobile thing turned out not to

be such a,

you know, it was a trend, and it evolved

to what it is now,

and having that experience, I think you're going to find

that that plays well into kind of backfilling like you


or filling this other bucket that you want to go


which is traditional T-SQL. And

I know, I know from experience, and dabbling in machine

learning and AI,

I'm on the opposite side of the fence,

although I'm not really that good at,

you know, let's say, like, DBA-level T-SQL,

but I you know I can hold my own in


but if we are, if we're selling tuning, performance tuning,

to a client,

I may be involved in the project,

but rarely am I the person actually performing the tuning.

There are lots of people out there that we subcontract

as a consulting firm,

Enterprise Data & Analytics. We bring others in who are

better at that, much better at that, than I am,

and we have people on the team who are much

better than I am as well,

but I think your experience has set you

up really well

to make this transition, and it will, like everything else...


We talked about this in the other shows.

It takes time, and it's frustrating,

but I think you're well positioned to pick up this

skill as fast or faster than almost anyone else I

know. Well,

thank you. Now, you know, part of it,

I'm not completely, like, naive to the ways


I took SQL in college,

database design in college, and my professor worked with Codd

and Date.

So, you know, like, you know,

I'm only two degrees of Kevin Bacon away

from the founders of the theory,

so, you know, I've got that going for me,

but I never really got into just kind of the

nuts and bolts of it,

and I'm not. I'm not concerned about that.

I'm actually fascinated about it,

because it's just another way to solve the same problem.

Absolutely. Ultimately, at the end of the day,

you're moving bits around, and it's a question of

what's your philosophy. Obviously,

RDBMS has a philosophy, and, you know, I'm not...


I mean, it worked well for 50, 60 years.

But now we live in a world where there's a

lot more unstructured data.

And how do you deal with that?

And how do you deal with it now that you're

not making assumptions about spinning

disks, right? Right, there's a whole...

Kevin Hazzard.

'Cause we had Kevin Hazzard,

yeah, who talked about that on our show, that, yeah,

there's still... It's 2020, and I would say still

most of our code is designed for that age of

the heads picking up, seeking a sector, and reading data,

and then picking up again.

So there's a whole new opportunity where obviously relational

databases are going to still matter,

but it's just one of many tool sets.

In fact, one of the things that I learned when

I was doing startup evangelism for Microsoft was,

you know, having debates with startup founders who are,

I will say, I put them in a hipster category,

right? When you work with startups, it runs

the gamut between, really, like, I mean, like, this

person is going to be the next Steve Jobs, to

this person is kind of, like, I think they're living

in their parents' basement,

but rather than saying unemployed, they have a... So somewhere in

the middle you kind of have

the hipster ones, where they learned to code because of, to make

the startup.

Now, there's nothing wrong with that,

but do you think that you're an expert in all

things technology because you learned to code?

Right, you know, and then you go to a person

that is supposed to help you take your stuff to

the next level and kind of talk down to them.

So, right, context for this conversation.

So they were basically lamenting the fact that they wanted


They wanted to have the reliability of,

uh, an RDBMS, but they wanted to do it in

a NoSQL type of environment.

And I was like, that's

a fair, you know, that's a fair thing to want.

I'm just, all cards on the table,

approaching that architecturally, that's not an unreasonable request.

But unless and until you get into the engineering part

of it.

And that's where you start to see that you just

can't have everything that you want.

I mean, there's no single do-it-all type application.

Every software application ever,

and I'm going to maintain,

probably forever, there are going to be applications,

there's going to be some spot that I define as

a corner.

It's something that the application or server or what have

you doesn't do well.

And what you'll often find is there's some other application

out there that's available,

or some other platform, and it will do that part


But again, that also has its corners,

and so what you're trading is pain,

in the nicest way possible. You're picking your poison,

picking your pain. What is it that you want to


And it depends on. You know.

Relational databases have their pain points.

NoSQL, it turns out, and a lot of companies have

learned this over the past few years,

also has its pain points as well. So,

you can't always get what you want,

but if you try sometimes, you might get what

you need.

Awesome. So, so, I

mean, part of it is,

you know, sometimes, whether it's technology or

anything else, you have kind of these dueling philosophies, and

there is a point where they just won't meet, just

because

they're kind of philosophically opposed. And you're right,

you have to kind of pick which one you want

to have over the other.

And there's cause and effect to that.

So with that kind of deep philosophical you were data


so that's good. So I wanted to talk to you


We want to do a deep dive.

It's not officially a deep dive until I have fun

with my soundboard there.

into data warehousing. What is data warehousing?

Where did it start? I'll channel a little bit of

BoJack Horseman.

What is data warehousing? What do they know?

Do they know things? Let's find out.

Well, yeah, data

warehousing, in my opinion, in my experience, is really this

idea of collecting data from all over different places

and placing it into a centralized location.

Now, there are some distinctions, and there are other scientific answers to

that question,

and you can actually build something that today is not

considered technically a data warehouse.

You can gather all of the information that's spread across

the enterprise in different places

into what's now called an operational data store.

And it's not totally unlike a data warehouse.

In fact, I think the Euler diagrams have quite a

bit of overlap for that,

at least if we kind of improve or

add to the word data warehouse, or the term data

warehouse, with relational data warehousing.

There's a lot of overlap between relational data warehousing and

an operational data store.

I don't want to confuse that, really, for our listeners,

but I just want to make you aware, if you

hear ODS or DW or EDW,

it could be that they're talking about largely the same


And when you think about like you think about supply

chain management,

which is a topic on everyone's mind these days as

we're talking about the economic impact of the pandemic.

Supply chains are really, really way more important

than we realize, and it's kind of like oxygen or...


You don't recognize how important it is until you don't

have enough.

And supply chains are like this, and you could think

of a data warehouse

in that terminology. The analogy holds for quite a bit,

and I'm just going to use Walmart and

Amazon as, you know,

kind of examples of this.

They both have these distribution centers and they have these

networks set up all over the United States,

probably all over the world, and it's places where the

goods come from the source and they're trucked into,

you know, they may be collected at other points along

the way,

but they're trucked into these large,

physically large warehouses and then stocked.

And then from there they're actually shipped out. In

the case of Amazon,

usually they're handed off to some delivery service.

In the case of Walmart,

they're placed on other Walmart trucks that are shipped to

the stores,

the actual brick-and-mortar stores. And that warehouse in

the middle,

that distribution center, that's what I think of when I

think of data warehouses.

I think of the electronic equivalent of that, because,

you know,

there's all of these... You'll see, especially at what I

consider an EDW, an enterprise data warehouse,

you've got a collection of companies that have been acquired

in mergers and acquisitions,

and they're looking at, I want to get all of

their data

that they have and want to bring that into this

one location,

and I want it there for a number of


But one of the big reasons is so I can

query that data and I can learn how my entire

enterprise is performing.

How's it working? And in that,

and now if I apply that Walmart, Amazon analogy to

that, to the data there,