What is deduplication and how does it work? (Backup to Basics series)
20th March 2023 • The Backup Wrap-Up • W. Curtis Preston (Mr. Backup)

Transcripts

Speaker:

On this episode of Restore It All, we talk about what I think is the

Speaker:

biggest advancement in backup and recovery technology during my career.

Speaker:

And that's deduplication.

Speaker:

I hope you enjoy the episode.

W. Curtis Preston:

Hi, and welcome to Backup Central's Restore It All podcast.

W. Curtis Preston:

I'm your host, W. Curtis Preston, aka Mr. Backup.

W. Curtis Preston:

And I have with me my network re-architect engineer.

Prasanna Malaiyandi:

Hey, Curtis, whatever I could do to keep you safe, you know?

W. Curtis Preston:

You know what's really funny is like I, I consider myself a

W. Curtis Preston:

pretty tech-savvy guy, and when we were talking today about what I'm, you know,

W. Curtis Preston:

how I've, I've replaced a bunch of gear and I'm swapping out some stuff and

W. Curtis Preston:

moving some cables around, and then you were like, you were yelling at me.

W. Curtis Preston:

You were like, you can't do that.

W. Curtis Preston:

You can't put the switch on the thing.

W. Curtis Preston:

And I was like, yeah, I can, like, what are you talking about?

W. Curtis Preston:

And it, and it took me like a couple of seconds and I was like, oh, wait.

W. Curtis Preston:

You're right.

W. Curtis Preston:

I can't, that's not, I can't do that.

W. Curtis Preston:

I can't put the switch... I can't put the router that's gonna be my firewall on the same switch as my home LAN.

Prasanna Malaiyandi:

Yeah,

W. Curtis Preston:

I dunno what I was thinking.

W. Curtis Preston:

Yeah.

Prasanna Malaiyandi:

Just another topic that I know just a little bit about.

W. Curtis Preston:

I'm a little, I feel a little ashamed that that was.

W. Curtis Preston:

But I'm glad I talked to you about my, you know, as, as is

W. Curtis Preston:

the case with many subjects.

W. Curtis Preston:

I'm glad I talked to you about, you know, what I'm up to.

W. Curtis Preston:

Um,

Prasanna Malaiyandi:

Glad I could help.

W. Curtis Preston:

I have successfully purchased and configured, for the video watchers,

W. Curtis Preston:

Let's see if it makes it into the camera before the cable runs.

W. Curtis Preston:

There it is, the ASUS AX6600, which is a mesh router.

W. Curtis Preston:

And I gotta say it's much more better than what I had before,

W. Curtis Preston:

and it's able, I've got two.

W. Curtis Preston:

It's supposed to provide 5,500 square feet, but of course that's, that doesn't

W. Curtis Preston:

include drywall and two by fours, right?

Prasanna Malaiyandi:

It's crazy how much signal degrades going through drywall.

Prasanna Malaiyandi:

And the other thing people don't realize is five gigahertz,

Prasanna Malaiyandi:

like, degrades like there's no tomorrow.

W. Curtis Preston:

Right.

W. Curtis Preston:

Remind me, remind me why five gigahertz is better again.

Prasanna Malaiyandi:

It's faster because it can handle more bandwidth, and also

Prasanna Malaiyandi:

the channel is wider, so you can have more things talking at the same time.

Prasanna Malaiyandi:

It's just as your frequency goes up, the distance goes

Prasanna Malaiyandi:

down for the same power levels,

W. Curtis Preston:

So is this like DC versus AC?

Prasanna Malaiyandi:

Not quite DC versus AC.

Prasanna Malaiyandi:

It's more about

Prasanna Malaiyandi:

you need to pump as many things as possible in, because with high frequency,

Prasanna Malaiyandi:

right, it's more per cycle, right, than 2.4, which is less airtime, if you will.

W. Curtis Preston:

Right.

Prasanna Malaiyandi:

And so every sort of peak, you can send

Prasanna Malaiyandi:

more out with the five gigahertz because you're doing it more often.

W. Curtis Preston:

Right.

Prasanna Malaiyandi:

And so it works a lot better.

Prasanna Malaiyandi:

It's just the distance isn't as great.

Prasanna Malaiyandi:

Now, I will tell people, so this is one of my, I'm gonna

Prasanna Malaiyandi:

get up on my soapbox now, right?

Prasanna Malaiyandi:

One of my rare soapbox events and tell people, a lot of times people

Prasanna Malaiyandi:

think they need more wifi access points in their house to get coverage.

Prasanna Malaiyandi:

And to those people, I will say, plan out your network carefully.

Prasanna Malaiyandi:

Put your devices where they matter.

Prasanna Malaiyandi:

And also don't put too many devices and don't crank up the power all the way

Prasanna Malaiyandi:

to high, because I know Curtis, you and I were talking about this when you're

Prasanna Malaiyandi:

looking at mesh, and it was like, imagine that your router can overpower your

Prasanna Malaiyandi:

phone, your laptop, your iPad, so it's screaming at the top of its lungs and your

Prasanna Malaiyandi:

phone can barely even scream back at it.

Prasanna Malaiyandi:

And so that's actually worse for your network and for airtime than

Prasanna Malaiyandi:

actually sort of balancing out power.

W. Curtis Preston:

I just don't know if, like, the stuff

W. Curtis Preston:

you're talking about, like is.

W. Curtis Preston:

is that even, is that configuration option even on consumer class routers?

Prasanna Malaiyandi:

you'll have sort of the low, medium, high power

Prasanna Malaiyandi:

levels, uh, but it takes time to fine-tune and tweak these, right?

Prasanna Malaiyandi:

You have to walk around with a wifi analyzer on your phone, right?

Prasanna Malaiyandi:

So Apple with their, uh, iPhones, right?

Prasanna Malaiyandi:

They ship, what is it?

Prasanna Malaiyandi:

AirPort Utility, which has a wifi scan

Prasanna Malaiyandi:

option, which will show you all the wifi networks and sort of the signal

Prasanna Malaiyandi:

strength, and you basically have to walk around your house with that and

Prasanna Malaiyandi:

be like, okay, where is it strong?

Prasanna Malaiyandi:

Where is it weak?

Prasanna Malaiyandi:

Right, to figure out the placement.

Prasanna Malaiyandi:

That's the ideal way, because what you want is you want coverage in the right

Prasanna Malaiyandi:

places, because what you see is in a lot of high density housing areas, or

Prasanna Malaiyandi:

even homes next to each other is most people end up with crummy wifi because

Prasanna Malaiyandi:

their power is turned up so high, it bleeds into everyone else's area

Prasanna Malaiyandi:

such that everyone has a crappy time.

Prasanna Malaiyandi:

because then you get interference and then everyone sort of slows down and then it

W. Curtis Preston:

Right.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

I got a lot of wifi.

W. Curtis Preston:

I got a lot of networks.

W. Curtis Preston:

Um, you know, um, yeah,

Prasanna Malaiyandi:

And for, and for the last bit, last bit of my

Prasanna Malaiyandi:

soapbox is please, please, please do not use 40 megahertz channel widths

Prasanna Malaiyandi:

on your 2.4 gigahertz channels.

Prasanna Malaiyandi:

You do not need to use 40 megahertz and ruin everyone else's connectivity.

Prasanna Malaiyandi:

Please only use 20 megahertz bands for 2.4 gigahertz.

W. Curtis Preston:

Uh, I'll see what I can do but I, but I have this

W. Curtis Preston:

new, you know, and again, I am not a wireless, I feel like a wireless newbie,

W. Curtis Preston:

but I have this new fancy right where it automatically selects the right.

W. Curtis Preston:

Um, that's

W. Curtis Preston:

pretty cool.

Prasanna Malaiyandi:

Point to go.

Prasanna Malaiyandi:

Yeah.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

Well, not just that, but also

W. Curtis Preston:

2.4 versus five.

W. Curtis Preston:

Yeah.

Prasanna Malaiyandi:

So actually all of this is part of the wifi standard,

Prasanna Malaiyandi:

so the figuring out which access point, that's part of the 802.11r standard.

Prasanna Malaiyandi:

And I think that the band steering is also part of the standard as well.

W. Curtis Preston:

Yeah.

Prasanna Malaiyandi:

a lot of folks are implementing now.

Prasanna Malaiyandi:

Some devices don't do well with band steering.

Prasanna Malaiyandi:

It basically looks at sort of the difference between the five gigahertz

Prasanna Malaiyandi:

and the 2.4 gigahertz and says, okay, which one should I pick?

Prasanna Malaiyandi:

And most devices, if it's seven decibels difference or more, then it'll pick,

Prasanna Malaiyandi:

uh, the higher the faster speed.

Prasanna Malaiyandi:

And so that's kind of how it tricks your devices into picking the right band.

W. Curtis Preston:

Interesting.

W. Curtis Preston:

Yeah, it's kind of cool.

W. Curtis Preston:

Um, all I know is that I finally have a mesh that covers the two.

W. Curtis Preston:

Cuz my problem is that I have things in the garage, things embedded inside

W. Curtis Preston:

walls in the garage that need wifi, not just, not just inside walls.

W. Curtis Preston:

I have a device that's inside a wall, inside an electrical

W. Curtis Preston:

cabinet, inside a wall.

W. Curtis Preston:

Right?

W. Curtis Preston:

I have a Sense, uh, app, or a device, and that's deep inside my

W. Curtis Preston:

electrical, my circuit breaker box.

W. Curtis Preston:

Um, and this reached to it.

W. Curtis Preston:

No problem.

W. Curtis Preston:

It didn't, it it had, it had like two bars.

W. Curtis Preston:

Right.

W. Curtis Preston:

So clearly, and, and the thing is, it's only, it's like 20 feet from.

Prasanna Malaiyandi:

yep.

W. Curtis Preston:

Right.

W. Curtis Preston:

But it's, you know, a couple of drywall walls and some

W. Curtis Preston:

two by fours and some metal.

W. Curtis Preston:

Uh, but it worked.

W. Curtis Preston:

That's the important part is that it worked.

W. Curtis Preston:

Um, yeah, so I th I think I might be in, I think I might

W. Curtis Preston:

be in wifi heaven for a while.

W. Curtis Preston:

Um, and you too can be there for the low, low price of $350. That's

W. Curtis Preston:

a two, that's a two node system.

W. Curtis Preston:

Um, And it's supposed like, yeah, but I'm pretty happy.

W. Curtis Preston:

But, uh, that's not what we're talking about today.

Prasanna Malaiyandi:

Really?

Prasanna Malaiyandi:

We can talk about wifi all day if you want.

W. Curtis Preston:

yeah.

W. Curtis Preston:

Well, you could talk about wifi all day.

W. Curtis Preston:

I feel really stupid when you're talking about wifi, because I'm

W. Curtis Preston:

like, this is not my bailiwick.

W. Curtis Preston:

That's a cool word, by the way, bailiwick.

W. Curtis Preston:

So I thought we'd talk about backups instead because that's, that's my world.

W. Curtis Preston:

And I feel comfortable knowing them.

W. Curtis Preston:

Most people don't know crap about this space, uh, because they, they, you know,

W. Curtis Preston:

they get the job as a junior person and then next thing you know, they become a,

W. Curtis Preston:

a real sysadmin or a network admin or a, you know, or a security admin or a DBA.

Prasanna Malaiyandi:

Yeah, well, except our listeners who are all

Prasanna Malaiyandi:

awesome and probably experts in the backup field and know all about this.

W. Curtis Preston:

Well, certainly Daniel.

Prasanna Malaiyandi:

Hi Daniel.

W. Curtis Preston:

Hi Daniel.

W. Curtis Preston:

The backup anorak.

W. Curtis Preston:

Um, I wonder, you know, he's never, he's never, he better still be

W. Curtis Preston:

listening to the show since we call out to him every once in a while.

W. Curtis Preston:

Him and Stuart, although Stuart's retired.

W. Curtis Preston:

I don't think Stuart's listening to our show.

W. Curtis Preston:

I only tell 'em when we talk about 'em.

W. Curtis Preston:

But, um, so we're continuing in our Backup to Basics series.

W. Curtis Preston:

It's been a couple of weeks, uh, as the kids say it's been a minute,

W. Curtis Preston:

uh, since such a, I remember the first time I heard that thing, I was

W. Curtis Preston:

like, what are you talking a minute?

W. Curtis Preston:

Anyway. But yeah, it's been a minute since we've done an episode of our

W. Curtis Preston:

Backup to Basics series, but I am looking down at the book and of course,

W. Curtis Preston:

uh, for those of you that don't know, basically we're doing a podcast version

W. Curtis Preston:

of my book, Modern Data Protection.

W. Curtis Preston:

Make sure it gets in camera here from O'Reilly.

W. Curtis Preston:

Uh, you can purchase the, uh, the, the print version from,

W. Curtis Preston:

uh, your favorite book seller.

W. Curtis Preston:

Um, perhaps it's one based in the Amazon, perhaps not. Um, and,

W. Curtis Preston:

uh, but if you would like an ebook version of it, you can get your

W. Curtis Preston:

own by going to druva.com/ebook.

W. Curtis Preston:

That's d-r-u-v-a.com/ebook.

W. Curtis Preston:

They will, of course, ask for your contact information and then email the crap

W. Curtis Preston:

out of you until you tell 'em to stop.

W. Curtis Preston:

But that is, that is the price that you pay.

W. Curtis Preston:

Um, let's talk

W. Curtis Preston:

about, oh yeah.

W. Curtis Preston:

And while we're at it, uh, I'll throw out the disclaimer, uh, that

W. Curtis Preston:

this is an independent podcast and, um, uh, I work for Druva,

W. Curtis Preston:

Prasanna works for Zoom and, um,

W. Curtis Preston:

The, um, but the opinions that you hear are ours.

W. Curtis Preston:

Um, and.

W. Curtis Preston:

Et cetera.

W. Curtis Preston:

Please rate us, uh, by going to your, you know, most of you're on iTunes.

W. Curtis Preston:

Just scroll down to the bottom there, give us five or six stars and a comment.

W. Curtis Preston:

We love comments.

W. Curtis Preston:

And, uh, if you'd like to join the conversation, just contact me, W. Curtis

W. Curtis Preston:

Preston at Gmail, or wcpreston on Twitter.

Prasanna Malaiyandi:

What about LinkedIn?

W. Curtis Preston:

Oh yeah, LinkedIn.

W. Curtis Preston:

Uh, it's linkedin.com, what is it?

W. Curtis Preston:

/in/mrbackup.

W. Curtis Preston:

Um, and by the way, my Twitter account already has multifactor

W. Curtis Preston:

authentication configured, not using SMS, as should you, especially

W. Curtis Preston:

now that they're disabling that. So weird, the way they did that.

W. Curtis Preston:

What's funny is I support the decision.

W. Curtis Preston:

That's just the way

W. Curtis Preston:

they

Prasanna Malaiyandi:

way it came out.

Prasanna Malaiyandi:

Yeah.

W. Curtis Preston:

Oh, Elon.

W. Curtis Preston:

Okay.

W. Curtis Preston:

So in our Backup to Basics series, we're continuing on, and today we are talking

W. Curtis Preston:

about using disk and deduplication.

W. Curtis Preston:

You know, I, I, um, couple weeks ago, I hit 30 years in the backup industry,

W. Curtis Preston:

and I got interviewed by Chris Mellor

Prasanna Malaiyandi:

The Register and Blocks and Files.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

It's for his Blocks and Files

W. Curtis Preston:

blog, and one of the questions was what I thought was the most,

W. Curtis Preston:

um, important development in the backup industry since I joined.

W. Curtis Preston:

And to me, hands down, it's not even, it's not, there's not even a close second, and

W. Curtis Preston:

that is the invention of deduplication

Prasanna Malaiyandi:

Yep.

W. Curtis Preston:

and because.

W. Curtis Preston:

I, I can't think of another technology in the backup space that has changed backup

W. Curtis Preston:

architecture more than deduplication, and I can think of many other things

W. Curtis Preston:

that we do that are only possible because deduplication is underneath them,

Prasanna Malaiyandi:

Oh yeah, definitely.

Prasanna Malaiyandi:

Yeah.

Prasanna Malaiyandi:

I don't think we would be able to get, especially with the data growth

Prasanna Malaiyandi:

and the size of these applications.

W. Curtis Preston:

Is data growing?

W. Curtis Preston:

Is

Prasanna Malaiyandi:

No, not at all.

Prasanna Malaiyandi:

Right.

Prasanna Malaiyandi:

I don't think it would be possible to do, like I know Curtis, you've talked

Prasanna Malaiyandi:

about previously, like in your early days, right, about trying to do a backup

Prasanna Malaiyandi:

and being like, oh my God, how am I gonna do this full backup in a weekend?

W. Curtis Preston:

Yeah.

Prasanna Malaiyandi:

And just with the fact, and I know we'll go and talk

Prasanna Malaiyandi:

about more about deduplication, but yeah, just being able to now do that

Prasanna Malaiyandi:

in a cost effective way, using new ways of actually doing the backups as well,

Prasanna Malaiyandi:

which is enabled with deduplication.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

So it, it's, it's like disk.

W. Curtis Preston:

You could argue that disk using disk and backups is the bigger, uh, advancement.

W. Curtis Preston:

But first off, not really an advancement.

W. Curtis Preston:

It's just instead of tape, we're gonna use disk,

W. Curtis Preston:

but

Prasanna Malaiyandi:

was there to start with anyway.

Prasanna Malaiyandi:

It was just sort of, the cost was so high, and especially given the type of workload

Prasanna Malaiyandi:

you see with deduplication where, or with backups where you're doing periodic

Prasanna Malaiyandi:

fulls or other things like that, and keeping them for long periods of time.

Prasanna Malaiyandi:

Are you going to spend what, 40x or 30x on storage for your backup

Prasanna Malaiyandi:

system versus your production?

Prasanna Malaiyandi:

Right.

Prasanna Malaiyandi:

That's a hard sell.

W. Curtis Preston:

just, yeah, cuz that's a problem.

W. Curtis Preston:

So one of the, one of the things, uh, that I remember from back in the

W. Curtis Preston:

day, like I, I don't remember really thinking about this lately, but back

W. Curtis Preston:

in the day, I would say that for every gigabyte of primary storage, you

W. Curtis Preston:

had 20 gigabytes of backup storage.

W. Curtis Preston:

And so if you're gonna do that with disk, even, you know, even once, many years ago.

W. Curtis Preston:

Wow.

W. Curtis Preston:

At this point, it's like 20 years ago. But even once they came out with

W. Curtis Preston:

this idea of, uh, SATA disk instead

W. Curtis Preston:

of

Prasanna Malaiyandi:

nearline

Prasanna Malaiyandi:

storage.

W. Curtis Preston:

Right.

W. Curtis Preston:

Um, that, that helped bring the cost down significantly.

W. Curtis Preston:

But not as much as deduplication.

Prasanna Malaiyandi:

Yeah.

Prasanna Malaiyandi:

Because even with those price differences, right?

Prasanna Malaiyandi:

Maybe it was half the price or a third of the price, but once you add

Prasanna Malaiyandi:

in that 20x that you talked about, right, Curtis, then that adds up.

Prasanna Malaiyandi:

And it's not only just the storage cost, it's also you have

Prasanna Malaiyandi:

to account for the power, the cooling, the floor space, right?

Prasanna Malaiyandi:

All the things that go into that system.
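
[Editor's sketch] The arithmetic behind this point can be checked in a few lines. This is only an illustration: the 20x multiplier is the rule of thumb quoted above, the half-price and third-price fractions are the ones floated in the conversation, and the function name is hypothetical. Power, cooling, and floor space are left out.

```python
def backup_cost_ratio(multiplier: float, price_fraction: float) -> float:
    """Backup disk spend expressed as a multiple of primary storage spend.

    multiplier:     GB of backup storage kept per GB of primary storage
    price_fraction: price of backup disk relative to primary disk (0.5 = half)
    """
    return multiplier * price_fraction


# Rule of thumb from the conversation: 20 GB of backup per GB of primary.
for fraction in (1 / 2, 1 / 3):
    ratio = backup_cost_ratio(20, fraction)
    print(f"backup disk at {fraction:.2f} of primary price: "
          f"{ratio:.1f}x the primary storage spend")
```

Even at a third of the price, the backup tier in this sketch costs roughly 6.7 times what the primary storage does, which is the "hard sell" being described.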

W. Curtis Preston:

Yeah.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

Um, it's funny, just sort of an afterthought on that

W. Curtis Preston:

post that Chris Mellor did about the 30 years.

W. Curtis Preston:

The one group that jumped on the article and just started retweeting all kinds of

W. Curtis Preston:

parts of, or pieces of, the article was the tape group, because I said,

W. Curtis Preston:

I said really good things about tape.

W. Curtis Preston:

And, and the thing is that, um, you know, I, I, you know, I, I

W. Curtis Preston:

believe in all of those things, but.

W. Curtis Preston:

You know, all of the advancements that I've seen in backup in the last 20-plus

W. Curtis Preston:

years have been in disk and deduplication.

W. Curtis Preston:

Right.

W. Curtis Preston:

Um, so let's talk about, so what, so not everybody really

W. Curtis Preston:

understands what deduplication is.

W. Curtis Preston:

Some people used to describe it like, well, it's like compression, uh, the way

W. Curtis Preston:

I remember it's like macro compression.

W. Curtis Preston:

Um, it's like compression over time.

W. Curtis Preston:

What do you think of that?

Prasanna Malaiyandi:

uh, I don't quite like that, so, so, right.

W. Curtis Preston:

There may be some old blog posts where I might have

W. Curtis Preston:

said that phrase, but go ahead.

Prasanna Malaiyandi:

so in my mind, right, deduplication is.

Prasanna Malaiyandi:

Finding two identical segments and tossing one away, keeping only one copy,

Prasanna Malaiyandi:

but still keeping a reference to that so you can, so you still know you have two

Prasanna Malaiyandi:

virtual copies, but one physical copy,

W. Curtis Preston:

Mm-hmm.

Prasanna Malaiyandi:

right?

Prasanna Malaiyandi:

At a high level, that's what I, and now

W. Curtis Preston:

you?

Prasanna Malaiyandi:

what compression is, is taking an object, a singular object,

Prasanna Malaiyandi:

and squeezing it into a smaller space.

W. Curtis Preston:

Right.

W. Curtis Preston:

But do you understand how compression works?

W. Curtis Preston:

'Cause I sure as hell don't.

Prasanna Malaiyandi:

yeah, so typically like you would run it

Prasanna Malaiyandi:

through different types of algorithms like LZ compression and all the rest

Prasanna Malaiyandi:

in order to look for patterns and throw away bits and compress it down.
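
[Editor's sketch] A minimal illustration of compressing a single object, using Python's standard zlib module (DEFLATE, which belongs to the LZ family mentioned above). The sample data is made up. With a lossless algorithm like this one, the redundancy is removed but every original byte can be recovered.

```python
import zlib

# Hypothetical sample: a short phrase repeated many times, so the
# compressor has obvious patterns to find.
original = b"backup and recovery " * 200

compressed = zlib.compress(original)    # squeeze the one object down
restored = zlib.decompress(compressed)  # expand it back again

print(len(original), "bytes before,", len(compressed), "bytes after")
print("round trip intact:", restored == original)
```

Note that this works on one object in isolation. It knows nothing about other objects or earlier backups, which is where deduplication differs.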

Prasanna Malaiyandi:

Now, the difference I would say between dedupe and compression,

Prasanna Malaiyandi:

because they do sound the same,

W. Curtis Preston:

Yeah.

Prasanna Malaiyandi:

right?

Prasanna Malaiyandi:

I would say one of the differences is, with deduplication,

Prasanna Malaiyandi:

it's more like a file-system-level compression, if you want to

Prasanna Malaiyandi:

think of it that way, because it's not just I'm taking this block.

Prasanna Malaiyandi:

Yeah.

Prasanna Malaiyandi:

It's not just I'm taking this and I'm squeezing it down such

Prasanna Malaiyandi:

that it could be, I just need to look at this and figure it out.

Prasanna Malaiyandi:

Right.

Prasanna Malaiyandi:

It's a lot more complex than that.

W. Curtis Preston:

It is definitely a lot more complex than compression.

W. Curtis Preston:

Right.

W. Curtis Preston:

Um, I, I, I've just, I've, I've just honestly never really dug into the code

W. Curtis Preston:

of how traditional compression works.

W. Curtis Preston:

So the idea is that I'm looking for duplicate segments of data across many

W. Curtis Preston:

places, both from different sources as well as different time periods, right?

W. Curtis Preston:

I'm, I'm comparing the, this chunk of data that's coming in right

W. Curtis Preston:

now and, and tonight's backup.

W. Curtis Preston:

I'm comparing it literally with every chunk of data that I've ever received

W. Curtis Preston:

from anywhere else.

Prasanna Malaiyandi:

I would say that's, not everyone

Prasanna Malaiyandi:

builds their deduplication that way.

W. Curtis Preston:

So,

Prasanna Malaiyandi:

where

Prasanna Malaiyandi:

it's

W. Curtis Preston:

there are, yeah, go ahead.

Prasanna Malaiyandi:

Yeah.

Prasanna Malaiyandi:

So it all goes down to sort of what is your deduplication domain is another

Prasanna Malaiyandi:

term that some people talk about, right?

Prasanna Malaiyandi:

Which is, is it limited to a system?

Prasanna Malaiyandi:

Is it limited to a cluster, which might be formed from multiple systems,

Prasanna Malaiyandi:

or is it limited to sort of a single backup stream coming in?

Prasanna Malaiyandi:

So

Prasanna Malaiyandi:

there.

W. Curtis Preston:

So the question is, what is your Data Domain?

W. Curtis Preston:

Uh,

Prasanna Malaiyandi:

Yeah.

Prasanna Malaiyandi:

D Domain.

W. Curtis Preston:

So let's back up.

W. Curtis Preston:

So a, as I understand it, right, so basically we're taking the data

W. Curtis Preston:

that's, that's coming in or that's going to come in, we're slicing

W. Curtis Preston:

it up into, I like the term "chunk," right?

W. Curtis Preston:

We run those chunks through a cryptographic hashing algorithm.

W. Curtis Preston:

SHA-1, SHA-256, whatever you're using.

W. Curtis Preston:

On the other side of that, we get an alphanumeric value. In the case of SHA-1,

W. Curtis Preston:

it's a 160-bit alphanumeric value.

W. Curtis Preston:

So basically, depending on the algorithm you use, you get

W. Curtis Preston:

an alphanumeric value at the end, and the size of that value is going

W. Curtis Preston:

to be based on which algorithm you use.

W. Curtis Preston:

In the case of SHA-1, it's 160 bits, right?

W. Curtis Preston:

And.

W. Curtis Preston:

You can then take the 160 bits.

W. Curtis Preston:

You can't reverse engineer it.

W. Curtis Preston:

You can't take the 160 bits and turn it into the chunk, but you can use that, that

W. Curtis Preston:

value to uniquely identify that chunk.

W. Curtis Preston:

And so if you have another chunk of data, regardless of where it came from,

W. Curtis Preston:

if its 160-bit value, again, that's SHA-1, and other algorithms' values are different sizes,

W. Curtis Preston:

if its fingerprint is the same, you can say that this chunk is identical

W. Curtis Preston:

to that other chunk that had the same fingerprint, and you can then

W. Curtis Preston:

discard the other chunk, right?

W. Curtis Preston:

the,

W. Curtis Preston:

the

Prasanna Malaiyandi:

Yeah, you can, you can discard the actual data,

Prasanna Malaiyandi:

but you should still keep track of it somewhere in a file system,

Prasanna Malaiyandi:

just because you need, still need

W. Curtis Preston:

Yeah.

W. Curtis Preston:

You're gonna keep track.

W. Curtis Preston:

Oh, we found another one of these,

W. Curtis Preston:

right?

Prasanna Malaiyandi:

And so usually that lookup is in a deduplication

Prasanna Malaiyandi:

index is what they called them.

Prasanna Malaiyandi:

Usually a dedupe index, which keeps a list of, Hey, here are

Prasanna Malaiyandi:

all the fingerprints that I have.
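
[Editor's sketch] The chunk, fingerprint, and index steps described above can be sketched roughly as follows. Everything here is an assumption for illustration: the 4 KiB fixed chunk size, the `DedupeStore` class, and the in-memory dictionaries are hypothetical and do not represent any particular product. SHA-1 is used only because it is the example in the conversation.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen only for illustration


def fingerprint(chunk: bytes) -> str:
    """SHA-1 digest of a chunk: 160 bits, shown as 40 hex characters."""
    return hashlib.sha1(chunk).hexdigest()


class DedupeStore:
    """Toy dedupe index: one physical copy per fingerprint, many references."""

    def __init__(self) -> None:
        self.index = {}     # fingerprint -> the single stored copy of the chunk
        self.refcount = {}  # fingerprint -> how many times it has been seen

    def write(self, data: bytes) -> list:
        """Store data; return its recipe (ordered list of fingerprints)."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = fingerprint(chunk)
            if fp not in self.index:
                self.index[fp] = chunk  # new chunk: keep the data
            # Known or new, record the reference so it can be restored later.
            self.refcount[fp] = self.refcount.get(fp, 0) + 1
            recipe.append(fp)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Rebuild the original data from its recipe."""
        return b"".join(self.index[fp] for fp in recipe)


store = DedupeStore()
monday = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE
tuesday = b"A" * CHUNK_SIZE + b"C" * CHUNK_SIZE  # first chunk unchanged
monday_recipe = store.write(monday)
tuesday_recipe = store.write(tuesday)
print("logical chunks:", len(monday_recipe) + len(tuesday_recipe))
print("physical chunks:", len(store.index))
```

The recipe is the "keep track of it" part: the duplicate chunk's data is discarded, but the reference to it is what makes a restore possible. A real system would also persist the index and would have to decide how to treat hash collisions, neither of which this sketch attempts.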

W. Curtis Preston:

Right.

W. Curtis Preston:

As we, we were alluding to before, one of the things that determines

W. Curtis Preston:

sort of your effectiveness of, of dedupe is the dedupe domain, right?

W. Curtis Preston:

So I've seen it file system level, meaning it only looks for

W. Curtis Preston:

duplicate data within each volume.

W. Curtis Preston:

I've seen it host level, I've seen it backup level, meaning

W. Curtis Preston:

literally backup configuration wise.

W. Curtis Preston:

right?

W. Curtis Preston:

So if I, if I have a Windows server and I'm backing up the host and I'm

W. Curtis Preston:

backing up SQL Server, I only look for duplicates within SQL Server

W. Curtis Preston:

backups right against each other.

W. Curtis Preston:

Uh, then we have, um, if we're backing up several systems to a box, right?

W. Curtis Preston:

Maybe that the dedupe domain is only within that box.

W. Curtis Preston:

It's only looking for.

W. Curtis Preston:

Duplicates between all of that.

W. Curtis Preston:

And then there's what I would call truly global dedupe, which is, we're looking

W. Curtis Preston:

for duplicates from everything that's coming in, uh, from multiple sources.

W. Curtis Preston:

Right?
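
[Editor's sketch] One way to picture the dedupe domain is as the scope of the fingerprint index: chunks are only compared against others in the same index. The hosts, job names, and chunk contents below are invented for illustration, and `unique_chunks` is a hypothetical helper, not any vendor's API.

```python
import hashlib
from collections import defaultdict


def unique_chunks(backups, domain_of) -> int:
    """Count chunks physically stored when each domain keeps its own index."""
    indexes = defaultdict(set)  # dedupe domain -> set of fingerprints seen
    for backup in backups:
        for chunk in backup["chunks"]:
            indexes[domain_of(backup)].add(hashlib.sha256(chunk).hexdigest())
    return sum(len(index) for index in indexes.values())


# Hypothetical backups: two hosts share OS chunks, and one also has a database.
backups = [
    {"host": "web1", "job": "filesystem", "chunks": [b"os-1", b"os-2", b"web-data"]},
    {"host": "db1", "job": "filesystem", "chunks": [b"os-1", b"os-2"]},
    {"host": "db1", "job": "sql", "chunks": [b"sql-data"]},
]

# Domain = one backup configuration on one host.
per_job = unique_chunks(backups, lambda b: (b["host"], b["job"]))
# Domain = everything that comes in, from every source.
global_scope = unique_chunks(backups, lambda b: "global")

print("stored with per-job domains:", per_job)
print("stored with a global domain:", global_scope)
```

In this toy data the shared OS chunks are stored twice under per-job domains and once under a global domain. How much a wider domain helps in practice depends on how much the sources really have in common.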

Prasanna Malaiyandi:

Mm-hmm.

W. Curtis Preston:

There is a point of diminishing marginal returns, right?

W. Curtis Preston:

You can argue, and certainly if you're a company that only does dedupe within,

W. Curtis Preston:

like earlier I said, where we only look for dupes within SQL Server backups,

W. Curtis Preston:

You could make an argument that, well, there's not a lot of duplicate data

W. Curtis Preston:

between SQL Server and Windows, right?

W. Curtis Preston:

so even though we're not comparing the two, there's not, there's not gonna

W. Curtis Preston:

be a lot of duplicate data there, and there's not gonna be a lot of duplicate data

W. Curtis Preston:

between the SQL Server database on this host and the SQL

W. Curtis Preston:

Server database on that host.

W. Curtis Preston:

So that's another argument that some

Prasanna Malaiyandi:

but, but I think a lot of that was because of

Prasanna Malaiyandi:

architectural limitations of the products themselves rather than,

Prasanna Malaiyandi:

that is really what you wanted to do.

Prasanna Malaiyandi:

Right?

Prasanna Malaiyandi:

Because

Prasanna Malaiyandi:

that's more of a management issue.

W. Curtis Preston:

they didn't, It was like, it was like, well, if we're

W. Curtis Preston:

gonna do it, if we're gonna do it that way, it's gonna be much harder.

W. Curtis Preston:

to, to, to design a product to do it that way.

W. Curtis Preston:

And we don't think, we don't think that there's going to

W. Curtis Preston:

be that much more benefit, um,

Prasanna Malaiyandi:

But on the other hand, if you look

Prasanna Malaiyandi:

at things like VMware, right?

Prasanna Malaiyandi:

If I have a bunch of VMs, right, there's a good cha, and they all came

Prasanna Malaiyandi:

from a single golden image, right?

Prasanna Malaiyandi:

There's a good chance that as you're backing it up, 80, 90% of that stuff

Prasanna Malaiyandi:

is all gonna be deduplicated, right?

W. Curtis Preston:

Absolutely.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

There's also a lot of duplicate data even within like a large filer, right?

W. Curtis Preston:

There's gonna be lots of duplicate data there, right?

W. Curtis Preston:

So if you're only doing it volume to volume or backup configuration to

W. Curtis Preston:

backup configuration, you, there's a lot of duplicate data that I

W. Curtis Preston:

think you would, you would miss.

W. Curtis Preston:

Um,

Prasanna Malaiyandi:

I know you talked about the domains, but I think another

Prasanna Malaiyandi:

thing to also mention is, some products do different types of chunking, if you will.

Prasanna Malaiyandi:

Some do it at the file level, others do it at sort of a smaller level, right?

Prasanna Malaiyandi:

And some do sort of fixed segment where each one is sort of a fixed length.

Prasanna Malaiyandi:

Others do sort of variable segments where they try to figure out what is optimal,

Prasanna Malaiyandi:

because depending on how you're doing your fingerprinting, right, you want to

Prasanna Malaiyandi:

find the most number of matches, right?

Prasanna Malaiyandi:

So you can save on storage.
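
[Editor's sketch] The difference between fixed and variable segments shows up when data shifts. Below, one byte is inserted at the front of a buffer. Fixed-size chunks are cut by offset, so every chunk changes. The variable chunker cuts wherever the last few bytes match a pattern, so its boundaries move with the content. The window width, divisor, and byte-sum test are assumptions made to keep the example short. The conversation does not say which method any product uses, and real implementations typically use a proper rolling hash (Rabin fingerprints are one well-known choice) plus minimum and maximum chunk sizes.

```python
import random

WINDOW = 16   # assumed width of the trailing window, for illustration only
DIVISOR = 64  # assumed: about one boundary per 64 bytes on random data


def fixed_chunks(data: bytes, size: int = 64) -> list:
    """Cut every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data: bytes) -> list:
    """Cut after any position where the sum of the last WINDOW bytes
    is divisible by DIVISOR, so boundaries depend only on nearby content."""
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        if sum(data[end - WINDOW:end]) % DIVISOR == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # whatever is left after the last boundary
    return chunks


def shared(a: list, b: list) -> int:
    """Number of distinct chunks that appear in both lists."""
    return len(set(a) & set(b))


rng = random.Random(0)  # seeded so the run is repeatable
original = bytes(rng.randrange(256) for _ in range(4096))
shifted = b"X" + original  # one byte inserted at the front

print("fixed chunks in common:",
      shared(fixed_chunks(original), fixed_chunks(shifted)))
print("variable chunks in common:",
      shared(variable_chunks(original), variable_chunks(shifted)))
```

With this scheme, every variable chunk after the first one in the original also appears in the shifted copy, because the boundary test only looks at the bytes just before each cut point. That is the "find the most matches" goal in miniature.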

W. Curtis Preston:

right.

W. Curtis Preston:

I,

Prasanna Malaiyandi:

another thing that also comes up.

W. Curtis Preston:

I would argue that file level dedupe isn't really

W. Curtis Preston:

dedupe, it's more a single instance.

W. Curtis Preston:

Right.

W. Curtis Preston:

Um, that's like single instance storage of a file, you

W. Curtis Preston:

know?

W. Curtis Preston:

Okay.

W. Curtis Preston:

It, it's, yeah.

W. Curtis Preston:

But so I, I'm always thinking subfile, uh, when I think about what I think

W. Curtis Preston:

of actual dedupe. There is another very big way that

W. Curtis Preston:

we divide up the dedupe industry, and that is source versus target.

Prasanna Malaiyandi:

Yep.

W. Curtis Preston:

Um, the, um, the first dedupe product I ever saw,

W. Curtis Preston:

which was, uh, no, was not, that was not the first, no, the first one I saw

W. Curtis Preston:

the product at the time was called Undoo.

W. Curtis Preston:

Have we talked

W. Curtis Preston:

about this?

Prasanna Malaiyandi:

Mm.

W. Curtis Preston:

Undoo, with two Os.

W. Curtis Preston:

It was really funny that the name of a dedupe vendor.

W. Curtis Preston:

Had duplicate data in their company name.

W. Curtis Preston:

It was Undoo, with two Os.

W. Curtis Preston:

You know this product, you just don't know that that's what it used to be called.

Prasanna Malaiyandi:

What is it?

Prasanna Malaiyandi:

What

W. Curtis Preston:

give you a, I'll give, I'll give you a hint.

W. Curtis Preston:

It.

W. Curtis Preston:

The name comes from the fact that it would be a sea of availability.

W. Curtis Preston:

I'm gonna, I'm gonna put the, the Jeopardy theme in here.

Prasanna Malaiyandi:

What would it, sea of availability?

W. Curtis Preston:

That's what the name, that's where the name for the company

W. Curtis Preston:

comes from, or if I want to put it in the right order, an availability sea.

Prasanna Malaiyandi:

I don't know what this is.

W. Curtis Preston:

Avamar

Prasanna Malaiyandi:

Oh, oh, that makes sense.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

So that's, that's where the name Avamar came from.

W. Curtis Preston:

So the, the first

Prasanna Malaiyandi:

I should know that

W. Curtis Preston:

you shouldn't know

Prasanna Malaiyandi:

it having been, uh, part of my former employer.

Prasanna Malaiyandi:

Yes.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

Well, I mean, you know, I, I have a bit of an inside track because that they're, They

W. Curtis Preston:

were right up the road from me, right?

W. Curtis Preston:

They were up there.

W. Curtis Preston:

They were up in Irvine.

W. Curtis Preston:

Um, and that was, uh, the first dedupe product.

W. Curtis Preston:

They were a source dedupe. So what's the difference between source

W. Curtis Preston:

dedupe and target dedupe Prasanna?

Prasanna Malaiyandi:

So the biggest one is, so let's first

Prasanna Malaiyandi:

talk about target dedupe, right?

Prasanna Malaiyandi:

So target dedupe is: data comes into the system and then a deduplication

Prasanna Malaiyandi:

algorithm runs and tosses away the duplicate data.

Prasanna Malaiyandi:

It can support any type of client as long as it supports

Prasanna Malaiyandi:

whatever the protocol it has.

Prasanna Malaiyandi:

So it's NFS or SMB, right?

Prasanna Malaiyandi:

Whatever can write to it, the data gets deduped.

W. Curtis Preston:

Hang on, hang on.

W. Curtis Preston:

Before you go on to that.

W. Curtis Preston:

I don't disagree with what you said.

W. Curtis Preston:

I just, I think there could be a little bit more clarification.

W. Curtis Preston:

It's a box that I send whatever I want to.

Prasanna Malaiyandi:

Yep.

W. Curtis Preston:

Typically, the thing about target dedupe was that,

W. Curtis Preston:

um, that it was, you didn't have to do a lot of re-engineering of

W. Curtis Preston:

your backup system.

Prasanna Malaiyandi:

it's like a VTL system, right?

Prasanna Malaiyandi:

That came.

W. Curtis Preston:

plug in a box.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

And you would send you, and basically you stopped using tape and you

W. Curtis Preston:

sent your backups to this box.

W. Curtis Preston:

Maybe the box might even be pretending to be a tape library,

W. Curtis Preston:

the virtual tape library.

W. Curtis Preston:

Right.

W. Curtis Preston:

Um, and then it did all the dedupe magic over there.

W. Curtis Preston:

Um,

Prasanna Malaiyandi:

Which was great because you can

Prasanna Malaiyandi:

just plug in your box and go.

Prasanna Malaiyandi:

Now the other side is called source side dedupe. Instead of sending all the

Prasanna Malaiyandi:

data and tossing it away, why don't we do something smart and actually figure

Prasanna Malaiyandi:

out the duplicates on the client itself, on the source right, dedupe on the

Prasanna Malaiyandi:

source, and only send the unique data.

Prasanna Malaiyandi:

And this has the advantage of

Prasanna Malaiyandi:

actually not sending the data over the wire, which is actually

Prasanna Malaiyandi:

a huge benefit that people don't understand always, right?

Prasanna Malaiyandi:

Is not sending the data can actually make it a lot faster, even though

Prasanna Malaiyandi:

you think, oh, I'm now putting additional load on my server itself.

Prasanna Malaiyandi:

But it ends up being better than trying to send all the data and just tossing

Prasanna Malaiyandi:

it away like target-side dedupe does.
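
To illustrate the difference, here is a rough Python sketch of the two flows. It is a toy model, not how any named product works: the `BackupServer` class, the 4 KiB fixed chunk size, and the function names are all assumptions, and the small cost of sending fingerprints is ignored. It counts how many bytes of backup data cross the wire in each approach.

```python
import hashlib

CHUNK = 4096  # assumed fixed chunk size, for simplicity

def chunks_of(data):
    """Split data into fixed-size chunks."""
    return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]

def fp(chunk):
    """Fingerprint of one chunk."""
    return hashlib.sha256(chunk).hexdigest()

class BackupServer:
    """Toy dedupe store: fingerprint -> chunk."""
    def __init__(self):
        self.store = {}

    def missing(self, fingerprints):
        """Tell the client which fingerprints the server has not seen."""
        return {f for f in fingerprints if f not in self.store}

    def put(self, chunk):
        """Store a chunk unless an identical one is already present."""
        self.store.setdefault(fp(chunk), chunk)

def target_side_backup(server, data):
    """Client sends everything; the server discards duplicates on arrival."""
    sent = 0
    for c in chunks_of(data):
        sent += len(c)
        server.put(c)
    return sent

def source_side_backup(server, data):
    """Client fingerprints locally and ships only chunks the server lacks."""
    by_fp = {fp(c): c for c in chunks_of(data)}
    sent = 0
    for f in server.missing(by_fp.keys()):
        sent += len(by_fp[f])
        server.put(by_fp[f])
    return sent

monday = b"A" * CHUNK * 50 + b"B" * CHUNK * 50
tuesday = monday + b"C" * CHUNK  # one new chunk of data

target, source = BackupServer(), BackupServer()
for night in (monday, tuesday):
    print("target sent", target_side_backup(target, night),
          "| source sent", source_side_backup(source, night))
```

Both servers end up holding the same three unique chunks. The difference is only in what travelled: the target-side client ships every chunk each night, while the source-side client ships two chunks on the first night and one on the second, at the price of hashing on the client.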

W. Curtis Preston:

I would say it theoretically should be better

W. Curtis Preston:

right?

W. Curtis Preston:

Because you, I'm just saying I've seen some crappy source dedupe systems, right?

Prasanna Malaiyandi:

Okay.

Prasanna Malaiyandi:

Sorry.

Prasanna Malaiyandi:

I've seen some, I've seen some good ones, or the ones that I've

Prasanna Malaiyandi:

interacted with have been good.

Prasanna Malaiyandi:

And so I've seen the performance numbers around

W. Curtis Preston:

Yeah.

W. Curtis Preston:

I, I do think it, it makes more sense to me.

W. Curtis Preston:

It always made more sense to me.

W. Curtis Preston:

The only reason why we had target dedupe was because to do source dedupe, you

W. Curtis Preston:

have to redesign the backup product,

W. Curtis Preston:

right?

W. Curtis Preston:

It took a long time to get, to get, uh, basically you have to

W. Curtis Preston:

stop using NetBackup, NetWorker, or TSM, whatever it was back in the

W. Curtis Preston:

day, and you had to replace it.

W. Curtis Preston:

Like in this case with Avamar, Avamar was a source dedupe product.

W. Curtis Preston:

You had to do what we call a forklift upgrade.

W. Curtis Preston:

You had to throw out the baby with the bathwater, whatever phrase, whatever.

W. Curtis Preston:

You know,

W. Curtis Preston:

uh, analogy you want to use there.

W. Curtis Preston:

That was the main problem as I saw it with source dedup.

W. Curtis Preston:

Right.

W. Curtis Preston:

Is that, is that you, you had to change your backup product to get it,

Prasanna Malaiyandi:

and that was in the beginning, right?

Prasanna Malaiyandi:

At the very early

W. Curtis Preston:

Well, well, You.

W. Curtis Preston:

You, well, yeah.

W. Curtis Preston:

Now you just had to, had to upgrade your backup product, right?

W. Curtis Preston:

Because many modern backup technologies now support source dedupe, although

W. Curtis Preston:

even some newer backup technologies don't, I don't know if, I dunno if that

W. Curtis Preston:

came out in English, so some I, there was some double negatives in there.

W. Curtis Preston:

Some newer, very new backup technologies

W. Curtis Preston:

don't do source dedupe.

Prasanna Malaiyandi:

Which seems bonkers.

W. Curtis Preston:

Which does seem bonkers.

W. Curtis Preston:

Um, I, you know, and, um, I'm talking about the likes of

W. Curtis Preston:

Rubrik and Cohesity, right?

W. Curtis Preston:

These are new, these are, you know, next gen backup products that were designed

W. Curtis Preston:

in the last, less than the last 10 years.

W. Curtis Preston:

Right.

W. Curtis Preston:

And it's based on an appliance model,

W. Curtis Preston:

and they do all the dedupe inside that box, is my understanding, right?

Prasanna Malaiyandi:

And I just wanna challenge that, Curtis, because I thought

Prasanna Malaiyandi:

in some cases, they do do source side deduplication, but I think because they've

Prasanna Malaiyandi:

tried to be open and act as a target device, in those cases, you can't, like,

Prasanna Malaiyandi:

you don't really have another option.

W. Curtis Preston:

Yeah, I, I don't, well, again, I'm not,

Prasanna Malaiyandi:

I, but I don't know

W. Curtis Preston:

work at, I work at Druva, not at Rubrik, uh, or, or Cohesity.

W. Curtis Preston:

But it is my understanding that they do target side dedup, which is, and,

W. Curtis Preston:

and one of the challenges of target side dedup is you need an appliance.

W. Curtis Preston:

at each location.

W. Curtis Preston:

Now I know that they can do virtual appliances, right?

W. Curtis Preston:

So they have a, they have a VM level appliance.

W. Curtis Preston:

Uh, but you need a box or something pretending to be a box at each location,

W. Curtis Preston:

because if you're not eliminating the duplicates before you send it

W. Curtis Preston:

to the box, um, then you need, you need something that's on-prem, right?

Prasanna Malaiyandi:

Because you definitely don't wanna

Prasanna Malaiyandi:

send that all over the WAN.

W. Curtis Preston:

No, no, that's the, to me, that's the biggest advantage

W. Curtis Preston:

of a source dedupe system is that it's ultimately scalable, right?

W. Curtis Preston:

That you, that assuming, assuming it doesn't slow things down, assuming,

W. Curtis Preston:

assuming all these things, assuming that the product actually works, um, that

W. Curtis Preston:

you, um, you could back up a laptop.

W. Curtis Preston:

, right?

W. Curtis Preston:

You can back up a mobile phone and the, the duplicate data will be eliminated

W. Curtis Preston:

before it's sent over the WAN, which is what you need to do if you're

W. Curtis Preston:

backing up something over the internet.

Prasanna Malaiyandi:

Mm-hmm.

W. Curtis Preston:

Right.

W. Curtis Preston:

Um, and, um, so the, the downside that some, you know, again, you,

W. Curtis Preston:

you, you talked about it already, is that it does put additional

W. Curtis Preston:

compute requirement on the client.

W. Curtis Preston:

The argument is that it's offset by the,

W. Curtis Preston:

um, the savings of the network bandwidth.

W. Curtis Preston:

Right.

W. Curtis Preston:

Um,

Prasanna Malaiyandi:

There is also one more downside,

W. Curtis Preston:

okay.

Prasanna Malaiyandi:

which is that.

Prasanna Malaiyandi:

Not all applications can do source side deduplication.

Prasanna Malaiyandi:

So if you do have an application which only supports writing to like

Prasanna Malaiyandi:

an NFS Mount point or an SMB Mount point, or something that doesn't

Prasanna Malaiyandi:

allow the integration of this source side deduplication logic,

Prasanna Malaiyandi:

then you are going to need to be able to support target side dedupe.

W. Curtis Preston:

Yep.

W. Curtis Preston:

Uh, agreed.

W. Curtis Preston:

Um, and an example of that would be like, um, uh, Oracle, right?

Prasanna Malaiyandi:

Yep.

Prasanna Malaiyandi:

Incremental merge.

W. Curtis Preston:

yeah.

W. Curtis Preston:

Um, although I would think that you should be able, I don't know, we could, we

Prasanna Malaiyandi:

No, you can't.

Prasanna Malaiyandi:

You can't.

Prasanna Malaiyandi:

You can't.

W. Curtis Preston:

You can't take the Oracle stream and slice it and dice it.

W. Curtis Preston:

I don't know.

Prasanna Malaiyandi:

Did you what?

Prasanna Malaiyandi:

Sorry?

Prasanna Malaiyandi:

You could, um, there are companies out there which give, which provide

Prasanna Malaiyandi:

a virtual file system interface

Prasanna Malaiyandi:

that lives

W. Curtis Preston:

So you you fake it.

W. Curtis Preston:

You fake it out.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

Okay.

W. Curtis Preston:

All right.

W. Curtis Preston:

And then I've got something called hybrid dedupe and this, this was

W. Curtis Preston:

invented by your former employer.

Prasanna Malaiyandi:

I don't even know what a hybrid dedupe is.

W. Curtis Preston:

it's, it's, it's target dedupe pretending to be source dedupe.

W. Curtis Preston:

It's

Prasanna Malaiyandi:

D.

Prasanna Malaiyandi:

Oh, see, here's my, okay, so here's my problem is I think Boost

W. Curtis Preston:

Uhhuh.

Prasanna Malaiyandi:

is source-side

Prasanna Malaiyandi:

deduplication. I don't know if I would call it hybrid, because it is

Prasanna Malaiyandi:

very similar to what Avamar did,

Prasanna Malaiyandi:

right?

Prasanna Malaiyandi:

It's moving the deduplication logic to the client

Prasanna Malaiyandi:

such that you could do all of the computation.

Prasanna Malaiyandi:

The same thing that we have talked about with source-side deduplication.

W. Curtis Preston:

I, I'll tell you why I put it in a different category.

W. Curtis Preston:

To me, hybrid dedupe is redoing the backup software.

W. Curtis Preston:

I'm sorry, source dedupe, true source-side deduplication.

W. Curtis Preston:

It's done at the backup software level,

Prasanna Malaiyandi:

Okay, then.

Prasanna Malaiyandi:

I

W. Curtis Preston:

with, with, with hybrid dedupe, I'm still, um, sending

W. Curtis Preston:

everything to this source dedupe thing that's gonna dedupe it, right?

W. Curtis Preston:

Um, it doesn't matter in the end, you get, you get roughly the same benefits, right?

W. Curtis Preston:

Um, that's what, uh,

Prasanna Malaiyandi:

Okay.

Prasanna Malaiyandi:

So with hybrid, yeah.

Prasanna Malaiyandi:

You get the benefits of source without having to upgrade and, or sorry,

Prasanna Malaiyandi:

throw away your backup software.

W. Curtis Preston:

Right, right, right.

W. Curtis Preston:

Um, so I, I, um, we spent most of this time talking about dedupe. Um,

W. Curtis Preston:

there are a bunch of different ways to use disk in your backup system.

W. Curtis Preston:

Some of which don't really require dedup, right?

W. Curtis Preston:

We used to do what we call disk caching, where you just had enough

W. Curtis Preston:

disk for last night's backup.

W. Curtis Preston:

You would back up to disk and then you would copy that to tape, and then

W. Curtis Preston:

you would hand that to a man in a van.

W. Curtis Preston:

Uh, then we got a bunch of different things.

W. Curtis Preston:

I got D to D to T, D to D to D, D to C, and D to D to C.

W. Curtis Preston:

Did I do all that?

W. Curtis Preston:

So disk to disk to tape, disk to disk to disk, direct to cloud, and

W. Curtis Preston:

disk to disk to cloud, right?

W. Curtis Preston:

So these are all ways that people use disk in current backup systems.

W. Curtis Preston:

Um, to me, D to D to C, or disk to disk to cloud, is really disk to disk

W. Curtis Preston:

to disk. It's just the cloud is, or the

W. Curtis Preston:

disk is being run by the cloud, right?

W. Curtis Preston:

And I will say that dedupe, by the way, I will say that without dedupe,

W. Curtis Preston:

the whole thing of using the cloud, the way we use the cloud, just wouldn't work.

W. Curtis Preston:

I mean, you can't send full backups to the cloud.

W. Curtis Preston:

I mean, you could, with unlimited bandwidth.

Prasanna Malaiyandi:

well, and yeah, with unlimited bandwidth

Prasanna Malaiyandi:

it would just be expensive.

Prasanna Malaiyandi:

Right.

Prasanna Malaiyandi:

Just going back to the conversation we had earlier about the WAN, right?

Prasanna Malaiyandi:

You don't wanna send full copies out over the WAN.

W. Curtis Preston:

right.

Prasanna Malaiyandi:

Um, because that gets expensive and very slow.
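
As a back-of-the-envelope illustration of "very slow", the Python sketch below estimates transfer times. All three inputs are assumptions picked for the example (a 100 TB full backup, 2 TB of unique data per day after dedupe, and a 1 Gbit/s WAN link), and it ignores protocol overhead and contention, so real transfers would take longer.

```python
def transfer_hours(terabytes, gbit_per_s):
    """Hours to move `terabytes` (decimal TB) over a link of `gbit_per_s`,
    ignoring protocol overhead and contention."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (gbit_per_s * 1e9)
    return seconds / 3600

full_tb = 100    # assumed size of one full backup
unique_tb = 2    # assumed unique data per day after dedupe
link = 1         # assumed 1 Gbit/s WAN link

print(f"full backup: {transfer_hours(full_tb, link):.0f} hours")
print(f"unique data: {transfer_hours(unique_tb, link):.1f} hours")
```

Under these assumptions the full copy needs a bit over nine days of a saturated link, while the deduplicated daily data fits in well under five hours.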

Prasanna Malaiyandi:

Um, the other one I was going to comment on was, uh, oh, I know we've

Prasanna Malaiyandi:

been talking about disk, but I think it's also important to acknowledge

Prasanna Malaiyandi:

that now it's no longer spinning disk.

Prasanna Malaiyandi:

It could also be flash.

Prasanna Malaiyandi:

Right.

Prasanna Malaiyandi:

We've seen

W. Curtis Preston:

yeah,

W. Curtis Preston:

but that's a whole other thing

Prasanna Malaiyandi:

I I, I, know, but I'm just saying that when it

Prasanna Malaiyandi:

comes to deduplication and backup storage, or protection storage, right?

Prasanna Malaiyandi:

This, it could be flash, it could be disk, it could be object storage, right?

Prasanna Malaiyandi:

So I think it's important to differentiate that, like what we're

Prasanna Malaiyandi:

talking about with deduplication, when we mentioned disk, right?

Prasanna Malaiyandi:

The media layer itself.

Prasanna Malaiyandi:

Yeah, the media layer.

Prasanna Malaiyandi:

Yes.

Prasanna Malaiyandi:

The media layer is not tape.

W. Curtis Preston:

Right, right.

W. Curtis Preston:

Hang on one second.

W. Curtis Preston:

Um, I need to, didn't realize I had a, I had a, um, Meeting

Prasanna Malaiyandi:

Meaning a.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

Four.

W. Curtis Preston:

Well, four 15, which is an odd, um, all right.

W. Curtis Preston:

It's a, it's a pre-meeting with a podcast thing.

W. Curtis Preston:

It's, um, anyway, um, so, uh, yeah, so, okay, you know, I hate the idea of flash

Prasanna Malaiyandi:

know, I know, I know.

Prasanna Malaiyandi:

I'm, I, I'm just saying that people will bring it up.

Prasanna Malaiyandi:

So I just wanna clarify that when we talk about disk, we're

Prasanna Malaiyandi:

just talking about not tape.

W. Curtis Preston:

The only place.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

Correct.

W. Curtis Preston:

The only place where I think maybe Flash has a place in the backup

W. Curtis Preston:

system is, and you know, you know, the folks over at Pierre and

W. Curtis Preston:

Neil, they're all mad at me now.

W. Curtis Preston:

Right.

W. Curtis Preston:

But, uh, the only place that I, where I think Flash has a place in the backup

W. Curtis Preston:

system is with like live recovery.

W. Curtis Preston:

If you're gonna do, if you're gonna do instant recovery and you're actually gonna

W. Curtis Preston:

run VMs off of your backups, that better be some really nice performing disk.

W. Curtis Preston:

But the thing is, it doesn't need to be your whole system.

W. Curtis Preston:

It just needs to be like the most

Prasanna Malaiyandi:

A part, part of, and it needs to, you don't need

Prasanna Malaiyandi:

your entire system to be flash,

W. Curtis Preston:

Yeah.

Prasanna Malaiyandi:

You just

Prasanna Malaiyandi:

need enough to be able to support that use case.

W. Curtis Preston:

I, I just think that where Flash does really,

W. Curtis Preston:

really, Is in random access, right?

W. Curtis Preston:

Backup isn't a random access application.

W. Curtis Preston:

Backup is a streaming application.

W. Curtis Preston:

Even if what we're talking is large dedupe chunks.

W. Curtis Preston:

I don't

W. Curtis Preston:

know.

W. Curtis Preston:

I, I,

Prasanna Malaiyandi:

I,

Prasanna Malaiyandi:

I,

W. Curtis Preston:

say, let's just say the jury is out for me.

W. Curtis Preston:

I, I am in Missouri.

W. Curtis Preston:

Missouri.

W. Curtis Preston:

Is that, is that the show me state?

W. Curtis Preston:

That's the show me state.

W. Curtis Preston:

Right?

Prasanna Malaiyandi:

yeah.

W. Curtis Preston:

So I'll tell you what, I'll tell you what.

W. Curtis Preston:

If there's anybody that's listening to this that just got pissed off,

Prasanna Malaiyandi:

what's his

Prasanna Malaiyandi:

name?

Prasanna Malaiyandi:

I'll come back

W. Curtis Preston:

to, I welcome you to, come on and tell me why I'm wrong.

W. Curtis Preston:

I, I just,

Prasanna Malaiyandi:

I, I, I, I know who will come back on, you

W. Curtis Preston:

who, who,

W. Curtis Preston:

will come back on,

Prasanna Malaiyandi:

what's his name?

Prasanna Malaiyandi:

VAST Data guy.

W. Curtis Preston:

uh oh.

W. Curtis Preston:

Oh, are they flash

Prasanna Malaiyandi:

Yeah,

W. Curtis Preston:

mark?

W. Curtis Preston:

Um, No, sorry, Howard.

W. Curtis Preston:

Uh, Howard.

W. Curtis Preston:

Yeah.

Prasanna Malaiyandi:

Fastest.

Prasanna Malaiyandi:

Pure flash.

Prasanna Malaiyandi:

Yeah.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

Um, all right.

W. Curtis Preston:

All right.

W. Curtis Preston:

Well, yeah, Howard, uh, you wanna tell, you wanna tell me why I'm wrong?

W. Curtis Preston:

Um, I'm more than happy to have you back.

W. Curtis Preston:

We can duke it out.

W. Curtis Preston:

We can duke it.

W. Curtis Preston:

wouldn't be the first time.

W. Curtis Preston:

Howard and I have, have disagreed on something.

W. Curtis Preston:

I don't know.

W. Curtis Preston:

It's just, it's just there are so many area, there are so many

W. Curtis Preston:

other places where I would wanna spend money in the backup system.

Prasanna Malaiyandi:

Yep.

W. Curtis Preston:

Um, and, um,

Prasanna Malaiyandi:

comes down to what the cost is.

Prasanna Malaiyandi:

Right.

Prasanna Malaiyandi:

If you could get flash down to a low enough point,

W. Curtis Preston:

which is the point of VAST Data, right?

W. Curtis Preston:

Their architecture allows using flash in a, um, you know, a significant way,

Prasanna Malaiyandi:

That's, that's why I brought

W. Curtis Preston:

uh, close to cost.

W. Curtis Preston:

Okay.

W. Curtis Preston:

All right.

W. Curtis Preston:

Okay.

W. Curtis Preston:

All right.

W. Curtis Preston:

All right.

W. Curtis Preston:

All right.

W. Curtis Preston:

Um, and then I got this whole other thing.

W. Curtis Preston:

I'm not gonna go into that other thing.

W. Curtis Preston:

Um, but yeah, so dedupe makes disk and, and cloud-based products both physically

W. Curtis Preston:

feasible as well as economically feasible.

W. Curtis Preston:

Right.

W. Curtis Preston:

Um,

Prasanna Malaiyandi:

is.

W. Curtis Preston:

hmm.

Prasanna Malaiyandi:

Is there something that a person shopping for

Prasanna Malaiyandi:

a dedupe system should be asking?

Prasanna Malaiyandi:

Like what are the important things that they should be

Prasanna Malaiyandi:

asking in order to determine

W. Curtis Preston:

yeah, that's a, that's a great question.

W. Curtis Preston:

I think the, the question would be things about what's the restore performance?

W. Curtis Preston:

Because in the end, that's the only thing that matters.

W. Curtis Preston:

I remember.

W. Curtis Preston:

A product.

W. Curtis Preston:

Now, this product is still on the market, but I believe, I believe

W. Curtis Preston:

they have addressed this, this issue.

W. Curtis Preston:

I remember a dedupe product.

W. Curtis Preston:

It was a Target dedupe product that had, uh, I remember that had 400 megabytes

W. Curtis Preston:

a second throughput in to an appliance.

Prasanna Malaiyandi:

And like 10 megabits out

W. Curtis Preston:

It was 40, it was 40, it was 40, uh, megabytes out.

W. Curtis Preston:

It had a 90%, what we call dedupe tax.
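
The 90% figure follows directly from the two throughput numbers just quoted. The short Python sketch below spells out that arithmetic; the function names and the 10 TB restore size are assumptions added for illustration.

```python
def dedupe_tax(ingest_mb_s, restore_mb_s):
    """Share of ingest throughput that is lost on the way back out."""
    return 1 - restore_mb_s / ingest_mb_s

def restore_hours(terabytes, mb_per_s):
    """Hours to restore `terabytes` (decimal TB) at a sustained MB/s rate."""
    return terabytes * 1_000_000 / mb_per_s / 3600

print(f"dedupe tax: {dedupe_tax(400, 40):.0%}")
print(f"10 TB restore at 40 MB/s:  {restore_hours(10, 40):.1f} hours")
print(f"10 TB restore at 400 MB/s: {restore_hours(10, 400):.1f} hours")
```

At the restore rate quoted, a hypothetical 10 TB restore takes close to three days instead of about seven hours, which is why restore speed is the number to ask about.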

W. Curtis Preston:

Right.

W. Curtis Preston:

That the, because the problem with dedupe, depending on how

W. Curtis Preston:

you store it, is that you've got everything you need all over the

Prasanna Malaiyandi:

All over the place.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

And so this was just a really, really, really bad design.

W. Curtis Preston:

And um, uh, I believe that they addressed it and, um, because that

W. Curtis Preston:

product is still on the market today.

W. Curtis Preston:

But that version, one of that product was ble.

W. Curtis Preston:

Um, so yeah, it's about restore performance, right?

W. Curtis Preston:

So one thing, oh, I'm.

W. Curtis Preston:

Uh, dedupe ratio is crap.

W. Curtis Preston:

Don't look at dedupe ratio.

W. Curtis Preston:

Dedupe ratio is a made up number.

W. Curtis Preston:

Um, I will, um, I'll, I'll go back to, I'll pick on Avamar.

W. Curtis Preston:

Avamar.

W. Curtis Preston:

Back in the day, they used to say they had a 400 to one dedupe ratio.

W. Curtis Preston:

Do you remember this?

W. Curtis Preston:

Because

W. Curtis Preston:

they basically considered every backup as a full backup.

W. Curtis Preston:

They're like, the way we store backups, which is the same way Druva stores

W. Curtis Preston:

backups, the way we store backups.

W. Curtis Preston:

It's like, even though they're incremental, it's like they're a full.

W. Curtis Preston:

, right?

W. Curtis Preston:

Because they behave like a full during a restore.

W. Curtis Preston:

And so they considered every backup a full.

W. Curtis Preston:

And so they said, well then therefore the dedup ratio is 400 to one.

W. Curtis Preston:

Well, that was always complete nonsense.

W. Curtis Preston:

Um, the other would be, I remember, uh, again, I'm gonna pick on people equally.

W. Curtis Preston:

I remember sales reps of a certain large target

W. Curtis Preston:

dedupe company where you might've worked, where they would tell customers to go

W. Curtis Preston:

and do full backups more frequently because it made their dedupe ratio better,

W. Curtis Preston:

which is just, again, nonsense.

W. Curtis Preston:

What matters, in my opinion, what matters is how big is a full backup versus

W. Curtis Preston:

how big are all the backups, right?

W. Curtis Preston:

So if I have.

W. Curtis Preston:

If I, let me, let me explain what I'm saying.

W. Curtis Preston:

If I have a hundred terabytes, if, if one full backup of my environment is a hundred

W. Curtis Preston:

terabytes and then after three months how big is, or whatever number you want.

W. Curtis Preston:

Uh, but it's just three months seems like a, a nice, long, um, what do you call it?

W. Curtis Preston:

Uh, POC thing,

W. Curtis Preston:

right?

W. Curtis Preston:

Um, after a hundred, after, you know, three months, how.

W. Curtis Preston:

How much stuff is stored over there?

W. Curtis Preston:

That's what I'm saying.

W. Curtis Preston:

Don't, dedupe ratios is nonsense. That, that didn't come out in English.

W. Curtis Preston:

Dedupe ratios are nonsense, but if I can fit a hundred terabytes, right, if I have

W. Curtis Preston:

a hundred terabyte environment and then a series of incremental backups, and then

W. Curtis Preston:

over there, my question is how big is.

W. Curtis Preston:

How much data did I write to disk?

W. Curtis Preston:

And let's say it's, it's, it's 200 terabytes after 90 days.

W. Curtis Preston:

And then compare that with another product that writes a hundred terabytes.

W. Curtis Preston:

You backed up the same data, but you used half as much storage on the back end.
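
A small Python sketch of that comparison, using the figures from the example above (a 100 TB environment, 90 days, 200 TB versus 100 TB on disk). The product names, the function name, and the one-backup-per-day schedule are assumptions; the ratio shown is the "every backup counts as a full" style of accounting criticized earlier, included only to contrast it with the stored number.

```python
def ratio_if_every_backup_is_a_full(full_tb, backups, stored_tb):
    """Dedupe ratio when each backup is counted as a full-sized backup."""
    return (full_tb * backups) / stored_tb

full_tb, days = 100, 90  # one backup per day is an assumption
stored = {"product A": 200, "product B": 100}  # TB on disk after 90 days

for name, tb in stored.items():
    ratio = ratio_if_every_backup_is_a_full(full_tb, days, tb)
    print(f"{name}: {tb} TB stored, nominal ratio {ratio:.0f}:1")
```

Both nominal ratios look impressive, and changing the backup schedule would change them without changing what is on disk. The stored terabytes are the only figures that map to what you pay for.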

W. Curtis Preston:

That's what I'm saying.

W. Curtis Preston:

The the, the problem is, and the, the other reason, and again,

W. Curtis Preston:

I'm a little extra sensitive to this cuz I work for Druva.

W. Curtis Preston:

People ask us what's our, what's our dedupe ratio?

W. Curtis Preston:

We're like, well the thing is we're like the opposite of Avamar.

W. Curtis Preston:

Well, we're actually similar to Avamar in that we're source-side dedupe,

W. Curtis Preston:

but we don't use that funny math.

W. Curtis Preston:

So we could say 400 to one, but that's nonsense.

W. Curtis Preston:

So you know, we say, well, we.

W. Curtis Preston:

Because, because we also do incremental forever backups.

W. Curtis Preston:

That's, that's the problem.

W. Curtis Preston:

Right.

W. Curtis Preston:

So, um, but I know that on average, if we have a hundred terabyte

W. Curtis Preston:

customer, we store, you know, roughly a year's worth of backups in less

W. Curtis Preston:

than a hundred terabytes of disk.

Prasanna Malaiyandi:

Yeah.

Prasanna Malaiyandi:

And I think it's important there to also account for that increment, like

Prasanna Malaiyandi:

how I look at these like numbers.

Prasanna Malaiyandi:

I totally get what you said, Curtis, like you should just do an apples to apples.

Prasanna Malaiyandi:

But if you don't have that ability, you should also look to say, okay,

Prasanna Malaiyandi:

I have a hundred terabyte full.

Prasanna Malaiyandi:

And then say, my daily change rate is 2%.

Prasanna Malaiyandi:

right?

Prasanna Malaiyandi:

So if I do 2% for a month, right?

Prasanna Malaiyandi:

That's, what is that? 60?

Prasanna Malaiyandi:

60 more terabytes, right?

Prasanna Malaiyandi:

So it should be 160 terabytes worth of data that I sent over, right?

Prasanna Malaiyandi:

For 160 terabytes worth of data, how much should I actually store?
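
The estimate above can be written out as a short Python sketch. The function names are made up for illustration, and the 80 TB "stored" value is a hypothetical placeholder, not a figure from the episode; substitute whatever the product under evaluation actually reports.

```python
def logical_tb_sent(full_tb, daily_change_pct, days):
    """One full backup plus `days` incrementals at a fixed daily change rate."""
    return full_tb + full_tb * daily_change_pct / 100 * days

def reduction(sent_tb, stored_tb):
    """How many TB were sent for each TB actually kept on disk."""
    return sent_tb / stored_tb

sent = logical_tb_sent(full_tb=100, daily_change_pct=2, days=30)
stored = 80  # hypothetical backend usage, not a figure from the episode
print(f"sent {sent:.0f} TB, stored {stored} TB, "
      f"reduction {reduction(sent, stored):.1f}x")
```

This treats the change rate as constant and the changed data as entirely new each day, which is a simplification; it is a sanity check for a single product rather than a substitute for a side-by-side test.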

Prasanna Malaiyandi:

Right?

Prasanna Malaiyandi:

Which will give you similar things to what you're saying, right?

Prasanna Malaiyandi:

But Bec, because what you're saying is if you had the two products, then

Prasanna Malaiyandi:

you could do a direct comparison.

Prasanna Malaiyandi:

But I'm saying if you don't have the two products, then

Prasanna Malaiyandi:

here's another way you could

W. Curtis Preston:

Well, I, well, I would argue that there's no way

W. Curtis Preston:

to compare them if you don't have two pro, if you, if you're not, if

W. Curtis Preston:

you're not doing a true comparison.

W. Curtis Preston:

Right.

Prasanna Malaiyandi:

A

W. Curtis Preston:

it's just, it's just that dedupe math is funny, right?

W. Curtis Preston:

So different products charge differently, right?

W. Curtis Preston:

You look at, um, like when you look at Metallic, which competes

W. Curtis Preston:

with Druva, they have a frontend price and we have a backend price.

W. Curtis Preston:

They have, they actually have the front end price, and then you also

W. Curtis Preston:

need to pay for the backend storage.

W. Curtis Preston:

Right?

W. Curtis Preston:

So you're paying, so how do you, how do you compare that?

W. Curtis Preston:

Um, it's, it's just, it's difficult

Prasanna Malaiyandi:

hard.

Prasanna Malaiyandi:

Yeah.

W. Curtis Preston:

it's hard.

W. Curtis Preston:

Uh, but all I'm saying is dedupe ratio is crap and doesn't mean anything.

W. Curtis Preston:

Um, but what does matter is how much data are you storing on that

W. Curtis Preston:

backend because you will be paying for that one way or the other.

W. Curtis Preston:

All right.

W. Curtis Preston:

I don't know if we made this, if we, if this is clear as mud or what, but, uh, I

W. Curtis Preston:

hope that was helpful and, uh, maybe we, maybe we ticked off Howard and Howard's

W. Curtis Preston:

gonna come on next week's episode.

W. Curtis Preston:

I dunno.

Prasanna Malaiyandi:

Come join us,

Prasanna Malaiyandi:

Howard,

W. Curtis Preston:

Thanks for, thanks for, uh, thanks for helping

W. Curtis Preston:

me with my network as well, so,

Prasanna Malaiyandi:

anytime, Curtis.

Prasanna Malaiyandi:

Just remember I am not tech support.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

All right.

W. Curtis Preston:

Well, uh, and thanks to the listeners and remember to subscribe
