ModernOps with Beth Long: Transferring Operational Expertise to the Cloud
Episode 7 • 26th September 2022 • Modern Digital Business • Lee Atchison

Shownotes

Today on Modern Digital Business, we continue our highly successful series called ModernOps. ModernOps is a series of interviews co-hosted with a good friend of mine, Beth Long, who is the head of product at jeli.io, an incident analysis company.

This will be our second in a series of episodes. In the first episode, we talked about how the experience using the cloud varies from large companies to small companies. In this episode, we talk about transferring operational expertise from an on-premise data center to a cloud-centric infrastructure.

Today on Modern Digital Business.

About Lee

Lee Atchison is a software architect, author, public speaker, and recognized thought leader on cloud computing and application modernization. His most recent book, Architecting for Scale (O’Reilly Media), is an essential resource for technical teams looking to maintain high availability and manage risk in their cloud environments. Lee has been widely quoted in multiple technology publications, including InfoWorld, Diginomica, IT Brief, Programmable Web, CIO Review, and DZone, and has been a featured speaker at events across the globe.

Take a look at Lee's many books, courses, and articles by going to leeatchison.com.

Looking to modernize your application organization?

Check out Architecting for Scale. Currently in its second edition, this book, written by Lee Atchison and published by O'Reilly Media, will help you build high-scale, highly available web applications, or modernize your existing applications. Check it out! Available in paperback or on Kindle from Amazon.com or other retailers.


Don't Miss Out!

Subscribe here to catch each new episode as it becomes available.

Want more from Lee? Click here to sign up for our newsletter. You'll receive information about new episodes, new articles, new books and courses from Lee. Don't worry, we won't send you spam and you can unsubscribe at any time.

Mentioned in this episode:

LinkedIn Learning Courses

Are you looking to become an architect? Or perhaps you are looking to learn how to drive your organization towards better utilization of the cloud? Are you looking for ways to utilize a Cloud Center of Excellence in your organization? I have a whole series of cloud and architecture courses available on LinkedIn Learning. For more information, please go to leeatchison.com/courses or mdb.fm/courses.

O'Reilly Media - Building a Cloud Roadmap

Have you struggled with the cloud migration? Then you'll appreciate my live training course, Building a Cloud Roadmap, presented by O'Reilly Media. Live on October 5th at 9:00 AM PDT. For more information, go to mdb.fm/roadmap or leeatchison.com/roadmap. But hurry, seats are limited.

Transcripts

Lee:

Today on Modern Digital Business, we continue our highly successful series called ModernOps. ModernOps is a series of interviews co-hosted with a good friend of mine, Beth Long, who is the head of product at jeli.io, an incident analysis company. This will be our second in a series of episodes. In the first episode, we talked about how the experience using the cloud varies from large companies to small companies. In this episode, we talk about transferring operational expertise from an on-premise data center to a cloud-centric infrastructure. Are you ready? Let's go.

Lee:

My co-host for this series is Beth Long. Beth is head of product at jeli.io, an incident analysis and management platform that combines comprehensive data from multiple sources to help identify problems and proactive solutions. Beth and I worked together at New Relic, where Beth was heavily involved in the product operational management of a highly scaled and fast-growing application. Beth has a strong background in IT operations. Given my cloud and IT management expertise and her IT operations expertise, Beth and I joined together to create this series of episodes that we call ModernOps. We recorded this content back in the spring of 2021, right in the middle of the pandemic, but we never published it until now.

In this second episode, Beth and I are talking about operational expertise and how that changes when you move from an on-premise data center to a cloud-centric infrastructure. How do you manage changing expertise requirements across your organization as you make the transition to the cloud? I hope you enjoy.

Lee:

So a related topic here, I think, is the question of what AWS, or GCP or Azure or whoever, provides to you as a small company compared to a larger company like New Relic, or even larger companies that are cloud native. What type of support do they provide to you? And does that impact your ability to use the cloud or to leverage what AWS provides?

Beth:

Certainly there is a different stated offering in terms of level of support. As an engineer at a tiny startup, I probably can't get a rep on the phone if something goes wrong. Yeah, I'm gonna be looking on Stack Overflow. That's probably gonna be my primary support, as opposed to being able to actually talk to someone when there is an issue.

Lee:

So when you were at New Relic, you were able to talk to someone directly if you needed to?

Beth:

To an extent. There were people who could talk directly to AWS if there was a major problem, but I think...

Lee:

So were there cases where that helped?

Beth:

That's what I was gonna get to. I'm trying to think of specific examples and outcomes. My memory of some of the events where that happened was that it helped us to identify what we could do better on our end to mitigate the issue, but it typically didn't lead to a change in what was happening on the AWS side. We typically had to just wait out whatever the issue was. So we did get more visibility, but didn't necessarily get an accelerated resolution on the AWS side. And I'm thinking of big issues, like issues with network links and that sort of thing.

Lee:

So there is a perceived value, but not necessarily a practical value. Interesting question then. Since you are in a realm, like at Jeli, where you have to depend on Stack Overflow because you're not getting the support from AWS, are you getting better support from Stack Overflow than you got directly from AWS? Is this a blessing in disguise, I guess, is what I'm saying.

Beth:

Yeah. It certainly means that you're planning for reality a little bit more. There's not this idea that you can call someone and get help, so you know that you have to plan for that. It's hard to compare, because the scale of issue that we're dealing with is itself so different, right? The kinds of problems that we tend to have are the sorts of things where, if you're having an issue with your database instance, you could fail over to another one. And at New Relic scale, when you run into big issues, they tend to be so thorny that you don't have a lot of escape hatches.

Lee:

But that was as much because of the New Relic architecture and scale as it was because of AWS. So was it the architecture and scale, or was it the architecture? And this isn't a knock on architecture. It's about starting over again with a company like Jeli, which has a young, immature architecture, compared to an architecture like New Relic's, which has been established for many years. There are pros and cons to that, right? Obviously maturity has value, but maturity also has cruft, and you have to deal with all of those sorts of issues. How much of the complexity that was added at New Relic was because of AWS and the complexity of using the cloud, how much of it was the cruft of the architecture from its maturity, and how much of it was the scale involved?

Beth:

Certainly there's an element that was scale, but any architecture is gonna, by definition, make various trade-offs. And when your architecture was designed to optimize for the trade-offs of a different world, then as you try to bend that into the cloud, I think sometimes you run up against spots where you've optimized for a different landscape.

Lee:

So that actually brings us back to talking about the infrastructure you know versus the infrastructure you don't know. And that is, when you are building your infrastructure from scratch, like you are with Jeli, you build knowledge of the infrastructure, and that knowledge grows as your application grows and as the application scales. In theory, your knowledge grows at the same rate and at the same time. As you need more expertise, you have more expertise, and everything's good and wonderful. I make it sound a lot simpler than it really is, I get that, but I think there's a generalization that does apply.

Lee:

The same process occurred, by the way, with New Relic as they grew from a small company into a larger company with their on-prem data centers. The knowledge and expertise grew, and the maturity grew. They knew how it worked and how different things would respond. They knew this type of problem, and when this sort of thing happened, they knew how to respond to it. All that expertise grew, and it got to the point where they were a very large company, or relatively speaking a large company, and had that level of expertise.

Lee:

Now, take that and move to a completely different infrastructure. All that expertise goes out the window and you have to start over again, yet you still have the same scale and level of responsibility that you did previously. That makes migration hard. Obviously, when you were at New Relic, you ran into that, I'm sure. Talk about that a little bit.

Beth:

Yeah, that is a great point. And even at Jeli, even though we have access to all of the AWS managed infrastructure and services, we've still elected to use the things that we know how to use. We're big proponents of boring tech. So even within the set of options, we're not using things that folks haven't used at least some version of in production.

Beth:

As New Relic moved into the cloud, a number of things happened. One is, for example, we had a team of really strong network engineers that knew how to manage networks and had really deep expertise in that. Their expertise became redundant, and they all either changed roles or moved on. Where before we had people who could trace a packet all the way through the entire system, a lot of that expertise became redundant, and we shifted to teams that, say, had previously owned bare metal having to learn how to manage AWS infrastructure. A lot of the expertise in managing what was in the cloud kind of centered on the Container Fabric team, which owned our containerization platform. So they were juggling both owning the old stuff and keeping that running while also spinning up the new stuff and learning the new ecosystem. So there was the challenge not just of building up that new expertise, but also of pulling a critical team very thin during that whole phase.

Lee:

So when you think about layered expertise, right, you had network engineers who knew how packets worked, all the way up to application experts who knew how UIs worked. I'm just making this up, but there's a whole level of expertise in the middle there. And as you made this move, you were making several changes at the same time. One is you were containerizing and virtualizing where containers were stored, along with moving to the cloud. You can do those two things independently, but they were both going on at the same time.

Lee:

So the net result of all that was you were adding required layers of expertise in the stack, as well as completely removing specific layers, right? You didn't need network engineers anymore, or at least network engineers doing the same thing that they were before. They were dealing with a different level of issues: maybe more with security or routing issues than with, you know, just packet tracking and "is this cable broken" and those sorts of issues. So it's not saying that all network engineering went away. It absolutely didn't. But a certain class of network engineering went away and another class was heightened, and a different class of service fabric went away and another class of service fabric appeared and changed.

Overall, do you think the size and complexity of the layers decreased, stayed the same, or increased when you moved to the cloud? That's a great change, but did you really need less expertise and fewer people implementing that expertise once you moved to the cloud than you did before?

Beth:

We didn't need less expertise. I'd say if anything, we needed more. It just changed shape. We needed expertise in new areas, and we needed expertise that was more about navigating the AWS ecosystem and the things that were built on top of it. I'd say that we had more total layers, and what happened was that some of the layers at the very bottom we no longer had access to directly. So for the people that were dealing solely with those bottom layers, their expertise became redundant to the organization, because that was hidden beneath these other layers. But then we added this whole new set of layers at the top, or I shouldn't say at the top, but we added this whole new set of layers, and the complexity of interaction between those layers increased.

Lee:

Complexity of interaction between layers increases. So there's more need for fungible expertise.

Beth:

Exactly.

Lee:

So you actually needed more experts, but more fungible experts than you ever did before. That makes total sense. You needed someone who knew exactly the size and shape of Ethernet packets before. You don't need that at all now. No, that expertise is irrelevant now. But being able to go up and down the stack, dealing with container fabric up to security firewall configuration issues, that variability was critical. Right.

Beth:

And it's becoming less about knowing... Obviously it's important to know a lot of fundamental principles, but it's less about knowing those fundamental, sort of less changeable principles and more about maintaining a current vocabulary, because all of these systems that we're using in the cloud are changing and evolving so quickly. So knowing what are the most recent releases, what are the most recent features that have come out, how does that impact what... Understanding AWS itself as a changing ecosystem, where you can estimate the speed that you need to be moving at so that you'll land in the right place based on what's going to happen in six months, that all becomes much more important as you begin to navigate that world that is so constantly evolving.

Lee:

I hope you enjoyed ModernOps with my co-host Beth Long. ModernOps will be a regular series that will appear occasionally on Modern Digital Business. If you enjoyed this episode, let me know so we can make sure to include more conversations like this in future episodes.

Lee:

You can reach me via the links in the show notes, or sign up for more