217: Anthropic Just Dropped Their Internal Data Playbook (copy this)
Episode 217 • 30th June 2026 • Data Career Podcast: Helping You Land a Data Analyst Job FAST • Avery Smith - Data Career Coach

Shownotes

Help us become the #1 Data Podcast by leaving a rating & review! We are 67 reviews away!

Anthropic just dropped their entire internal data playbook. Here's what they're doing and how it affects your career.

💌 Join 30k+ aspiring data analysts & get my tips in your inbox weekly 👉 https://datacareerjumpstart.com/newsletter

🆘 Feeling stuck in your data journey? Come to my next free "How to Land Your First Data Job" training 👉 https://datacareerjumpstart.com/training

👩‍💻 Want to land a data job in less than 90 days? 👉 https://datacareerjumpstart.com/daa

👔 Ace The Interview with Confidence 👉 https://datacareerjumpstart.com/interviewsimulator

📄 Read Anthropic's full data playbook 👉 https://claude.com/blog/how-anthropic-enables-self-service-data-analytics-with-claude

⌚ TIMESTAMPS

00:00 – Anthropic dropped their data playbook

02:39 – Why AI analytics keeps failing

05:24 – How they hit 95% accuracy

09:24 – What a Claude skill is

14:39 – None of this is actually new

17:09 – Still hiring data people

🔗 CONNECT WITH AVERY

🎥 YouTube Channel

🤝 LinkedIn

📸 Instagram

🎵 TikTok

💻 Website

Mentioned in this episode:

July Cohort of DAA

Join the July Cohort of DAA and become an analyst! Be sure to check out our current deal to save BIG! See you in class!

https://datacareerjumpstart.com/daa

Transcripts

Avery Smith: So Anthropic, the makers of Claude, literally just dropped an absolute masterclass on how they analyze data internally, and they posted a blog post that is four thousand five hundred words, and there's a lot in there. So I summarized that entire blog post, and I will explain it to you like you're five years old in today's episode, and literally you can steal it and learn how to analyze data just like a Claude data analyst.

So this is what Claude is actually claiming. They're claiming that they now do self-serve analytics, which is kind of a funny phrase. Basically, it means allowing non-technical people, non-data analysts, to do data analytics in easy ways, and this has been a thing for the last decade or so. In fact, it's one of the main reasons why Tableau and Power BI became so important with dashboards: it allows business people, non-technical people, to actually kind of analyze their data in predefined ways. It's been really hard to do for the last ten, fifteen years.

Now, basically, Anthropic just tweeted that they are able to do ninety-five percent accuracy on all of their business analytics queries with Claude, which is crazy. That basically means that if anyone has some sort of an analytics question, they can answer it now with ninety-five percent accuracy using this internal playbook. So what are they actually doing, and how can you replicate it in your own organization, or how can you bring this to an interview to make you a more marketable aspiring data analyst?

So basically, like I said, self-serve analytics has always kind of sucked. It's when non-technical people are analyzing the data sets, and there's basically two different ways to do it. Option A is you open it up to everyone, which basically means you have non-data analyst people trying to analyze data, and a lot can go wrong. You can get really messy, different queries, maybe messy dashboards, conflicting definitions, those types of things. Or you lock it all down, which basically means that you create a bajillion different types of dashboards, but it never really answers anyone's question when they want it the way they want it. And that's been tricky in the past.

So now there's AI, and now you can give someone Claude or ChatGPT and point it to a database, and you can have them ask ChatGPT or Claude questions about the database. But there's a big issue. Number one, we all think the AI doesn't hallucinate, doesn't lie, doesn't make things up, and it does, and it can be wrong. And number two, it gives everyone like, "Oh, this is a hundred percent accuracy," but it's not, and that can cause a lot of issues. So, you know, AI is a great solution for self-serve analytics, but it causes a lot of problems as well.

So how did Anthropic actually solve it? Because what they're claiming is that ninety-five percent of their business analytics queries are now automatically solved by Claude, and they're ninety-five percent accurate, which is a big claim. Like that's basically like, "Hey, Claude is now our company data analyst, essentially."

Now, I will mention here that the data team can now work on bigger and better problems that are like less SQL monkey questions, right? So it's not like they're getting rid of their data analysts or their data scientists. It's just you don't have to do as many ad hoc reports, and you can just focus on more important things. And just managing this Claude infrastructure of creating this company-wide self-serve analytics platform is a beast, and we'll get to that here in a second.

Basically, in this article, their thesis is data is very different than software. If you've, you know, heard about Claude or Codex for programming and software engineering, it can do those things really, really well out of the box, because coding has lots of right answers. There's ways to test things. There's documentation that goes with code. And all that infrastructure can basically catch hallucinations. It's a more solved problem. Analytics, it's quite a bit different because there's only one right answer, and you don't really know what the right answer is. There's no way to actually test what the answer is, versus in programming, you're like, "Does this box open up if I click the button?" You can test that. There's no way to know, like if I ask Claude for the mean of our sales over the last month, you really have to go actually run the query yourself to make sure that Claude's not giving you a false answer.

So their argument is we're not having issues coming up with code generation. It's basically all of the context and verification that goes around solving a business analytics problem. And LLMs historically have been pretty bad at this for a multitude of reasons. One is that we give it unclear directions. I don't know about you guys, but if you're anything like me, you don't necessarily give Claude or ChatGPT the most specific instructions on planet Earth, and there's some ambiguity. And the problem with that is it can go into the database and it thinks it knows what you're talking about, but it finds a different column, or it's not using the same definition. You're basically not on the same page as ChatGPT unless you give really explicit directions.

Number two, there's data staleness, which basically means that your database is constantly changing over time. Definitions change, tables change, and these AI LLMs, they're not really good at keeping up with that. Like, they don't have the business context, the domain context that you may have as a human being on the other side of like, "This is why we made those changes," you know, "This is why it's better," so on and so forth. And then number three is it just doesn't know where to find the right thing. Like, it thinks the data's in there, it's looking, but it's not entirely sure.

Avery Smith: So here's what Anthropic did to try to solve this problem, and they're calling it Anthropic's Agent Analytics Stack. And there's basically four different stages right here, and each one is built to try to take one of those previous problems that we talked about and solve it.

So the first one is data foundations, and basically, it just means you have really solid data foundations. It means you're very clear on what a table is, what it actually has, what a row represents, what a column represents, and how often it's updated.

Number two is you only have one source of truth, and the idea is if you have a sales table in your database, you don't have, like, another sales table in your database. Like, there's only one sales table, and that is the sales table. There are no other sales tables. And some of you guys listening who might be more junior data analysts or aspiring data analysts might be thinking, "Well, that makes sense. Why would it ever be a different case?" And the issue is when you get to, like, large organizations, something like Anthropic or when I worked at ExxonMobil, you gotta think that there's literally seventy thousand plus employees, and all of them might need access to that table, and they might need it slightly different. So you might have someone that's like, "Oh, this is their sales table, but we only need the weekly averages," so they create, you know, the weekly average sales table. And then there's someone else who's like, "Oh, well, we actually only need the sales from Monday, Wednesday and Friday," and so they create this other table. And basically, you just get a bajillion versions of really the same table. So one source of truth, really important here.

Number three, they develop skills. These are like Claude skills for LLMs that specifically do a repeated task with specific instructions and maybe even some accompanying code to make it really repeatable. LLMs have inherent randomness built into them. They are non-deterministic, as in you don't get the same answer every time you ask the same question, and skills help make it more deterministic, so that there actually is a specific answer. This is exactly what you should be doing. So it's basically like instructions and almost code files to actually follow every single time this gets asked.

And the fourth one is validation, and that is making sure that the LLMs are actually doing what you think they are and validating their answers.

So let's dive in a little bit deeper.

So like I said, layers one and two, basically this is just having good data governance and good data foundations. One source of truth. They also make sure that they have little descriptions for each one of their different tables that describe what the table is and what it isn't. You know, LLMs are really good at reading text, so if you add a little bit of text with your tables that explains what's going on, the LLM understands the context a little bit better versus just looking at the rows and the columns and guessing. You can think of this as like a README file for your data. In code, in building software, in software engineering, in programming, we've always had README files. If you're unfamiliar, a README file, you can just think of it as like a summary of what's actually going on in your code base. Like all of these different folders, all these different files, all these different code scripts, what's going on. So it's just a human way to describe what's going on for your code or your different, you know, databases in this case.
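
The "README for your data" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TableDoc` class, its field names, and the example `sales` table are hypothetical and do not come from Anthropic's post. The point is that each table carries a short plain-text description of what it is, what it is not, what one row represents, and how often it is refreshed.

```python
from dataclasses import dataclass, field


@dataclass
class TableDoc:
    """Plain-text description of one table, written for humans and LLMs alike."""
    name: str
    what_it_is: str
    what_it_is_not: str
    row_represents: str
    refresh: str
    columns: dict[str, str] = field(default_factory=dict)

    def to_text(self) -> str:
        """Render the description as a short README-style block."""
        lines = [
            f"Table: {self.name}",
            f"What it is: {self.what_it_is}",
            f"What it is not: {self.what_it_is_not}",
            f"One row represents: {self.row_represents}",
            f"Refreshed: {self.refresh}",
            "Columns:",
        ]
        lines += [f"  - {col}: {meaning}" for col, meaning in self.columns.items()]
        return "\n".join(lines)


# Hypothetical example table; none of these names come from Anthropic's post.
sales_doc = TableDoc(
    name="sales",
    what_it_is="Every completed order; the single source of truth for revenue.",
    what_it_is_not="Not a weekly rollup, and it excludes refunded orders.",
    row_represents="one completed order",
    refresh="daily",
    columns={"order_id": "unique order key", "amount_usd": "order total in US dollars"},
)

print(sales_doc.to_text())
```

The rendered text could be stored next to the table or placed in the model's context, so the model reads the description instead of guessing from rows and columns.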

And they also feed it company knowledge maps. So for this system, they give it roadmaps, org charts, decisions, so like a bunch of business context that isn't data. It's not data related. It's all business and domain related, but that extra information helps the LLMs make smarter choices on how to analyze the data based off of what the context says.

So they actually tried an experiment here, which I thought was really interesting, where they basically took all the data analysts' and all the data scientists' old SQL files, and they said, "Here, Claude, you know, learn from these. These are all the things that our engineers and our analysts and our data scientists have done over time. Learn from it." And it actually didn't really help, which was really interesting. It didn't know what code to use when. And they found that there was a right answer eighty percent of the time, but Claude wasn't good at pulling that answer out.

And so what's actually been the biggest, I guess the biggest unlock is actually having skills. And that went from twenty-one percent accuracy in actually analyzing data to ninety-five percent accuracy in analyzing data.

And if you're unfamiliar with, like, what a Claude skill is, I think they have some equivalent in ChatGPT and OpenAI and Codex. But basically an LLM skill, an AI skill, is a reusable step-by-step pattern to follow. Think of it almost like a recipe for AI LLM models to actually follow. So like I said, the majority of the time they're written kind of like a human would write them, and it's just like, "Hey, AI, do exactly this. Step one, step two, step three. Look out for this. Be aware of this." And it might have some coding files specifically like, "This is what your code should look like if you generate code." It's essentially a senior analyst's thoughts written down on paper for a specific task. So you might have a skill on how to, you know, create a bar chart, or you might have a skill on how to do a hypothesis test or A/B testing or something like that. And it's basically like you have your team get together and write down exactly what the process is. It's like a standard operating procedure that you'd give to a junior analyst, "Hey, follow this," except now the junior data analyst is Claude or an AI.

One issue they saw was if you don't actually update these skills, like if you don't constantly add to them and improve them, the accuracy slides over time. They actually were at ninety-five percent accuracy, and then they dropped down to sixty-five percent accuracy in only a few weeks. So you need to make sure you're updating your skills.
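
To make the recipe idea concrete, here is a minimal Python sketch of a skill as a named, reviewable list of steps. It is not the file format Claude actually uses for skills; the `Skill` class, its fields, and the `monthly-revenue` example are assumptions made up for illustration. The `last_reviewed` field is there because the episode stresses that skills need regular updates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """A reusable, step-by-step recipe for one repeated analytics task."""
    name: str
    when_to_use: str
    steps: tuple[str, ...]
    watch_outs: tuple[str, ...]
    last_reviewed: str  # ISO date, so an out-of-date skill is easy to spot

    def to_instructions(self) -> str:
        """Render the skill as numbered instructions to place in a prompt."""
        lines = [f"Skill: {self.name}", f"Use when: {self.when_to_use}", "Steps:"]
        lines += [f"  {i}. {step}" for i, step in enumerate(self.steps, start=1)]
        lines.append("Watch out for:")
        lines += [f"  - {item}" for item in self.watch_outs]
        lines.append(f"Last reviewed: {self.last_reviewed}")
        return "\n".join(lines)


# Hypothetical skill; the steps are illustrative only.
monthly_revenue = Skill(
    name="monthly-revenue",
    when_to_use="Someone asks for revenue over a calendar month.",
    steps=(
        "Query the `sales` table only.",
        "Filter to the requested calendar month.",
        "Sum `amount_usd` and report the table and filter used.",
    ),
    watch_outs=("Do not use rollup tables.", "State the data's last refresh date."),
    last_reviewed="2026-06-30",
)

print(monthly_revenue.to_instructions())
```

Writing the steps down this way is the same move as a standard operating procedure: the model gets the same instructions every time the question comes up, rather than improvising.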

And the last thing is they wanted to make sure that their skills were everywhere. So analytics is really changing. You probably haven't seen this in big organizations now. It's just kind of rolling out to maybe, you know, these more frontier trillion-dollar companies, and maybe small solopreneurs like me. But the way that we do data analytics is changing. So obviously, in the past, you'd use Excel to do data analytics, and there's still literally billions and billions of Excel files that we will analyze in Excel. But gradually, you know, ten, fifteen years down the road, I'm not sure if that will be the case. We will probably be analyzing data in a different way than we are now. And before you get really scared and you're like, "Oh my gosh, this is awful, AI's coming for my job," well, just think about this. Basically, Power BI came out fifteen years ago. So fifteen years ago, there were basically no dashboards. Tableau was around, but not super popular at the time, yet it was about to be. About twenty eighteen it started to get really popular. So it's just like, yes, the way that we analyze data changes over a decade. That's the truth.

And just know that right now we are moving into, you know, analyzing our data with these chatbots, and those chatbots may be in multiple different places. So for example, at my company, I try to analyze data on, you know, my YouTube watches or my podcast listens, and I've been trying to automate that as much as I can or make it easier for me to follow all these analytics. And so we actually have a bot that will help me with these analytics where I can just ask it natural language questions like, "How many views did the last YouTube video get?" You know, "How many listens did this podcast episode get?" And we can actually do that on a website that I've built and also in our Slack. So they want to make sure that they have the truth and these skills available everywhere, whether you're coding, whether you're using a website or a dashboard, or whether you're in Slack. So those are the keys to having good skills in your organization.

And the last thing is, even if it has a good skill, how do you know that it's correct? And that's what we call verification. And so what Anthropic's doing, what Claude's doing, is for any analytics they do, they have the sources in the footer. Like this is where we got this information. This is how we calculated it. This is the table we used. So that way it's very clear that you could look at the table and be like, "Oh, that is the right table," or, "It's not even the right table." They also have a freshness and a version stamp on every data model and how old it is. So think about if your data changes over time. They're basically timestamping everything, so that way you know, okay, we can trace it back to this database on this day type of a thing.
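
The sources-in-the-footer and freshness-stamp ideas above can be sketched with Python's built-in `sqlite3` module. Everything specific here is an assumption for illustration: the `sales` table, the `loaded_on` column, and the `Answer` class are hypothetical, and the post's real implementation is not shown in the episode. The sketch just returns an answer together with the table, the query, and the date the data was last loaded.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Answer:
    """A numeric answer bundled with where it came from and how fresh it is."""
    value: float
    table: str
    query: str
    data_as_of: str

    def with_footer(self) -> str:
        """Render the value followed by a provenance footer."""
        return (
            f"{self.value:.2f}\n"
            f"---\n"
            f"Source table: {self.table}\n"
            f"Query: {self.query}\n"
            f"Data as of: {self.data_as_of}"
        )


def average_sale(conn: sqlite3.Connection) -> Answer:
    """Compute the average sale and record the query and latest load date."""
    query = "SELECT AVG(amount_usd), MAX(loaded_on) FROM sales"
    value, as_of = conn.execute(query).fetchone()
    return Answer(value=value, table="sales", query=query, data_as_of=as_of)


# Hypothetical in-memory data so the example runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount_usd REAL, loaded_on TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(100.0, "2026-06-28"), (200.0, "2026-06-29"), (300.0, "2026-06-30")],
)
answer = average_sale(conn)
print(answer.with_footer())
```

With the footer attached, a reader can check at a glance whether the right table was used and how old the data is, which is the check the episode describes.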

They're also doing correction harvesting, which is a really fancy way to say they're giving the AI feedback. So every time that this Claude data analyst gets something wrong, the humans are saying, "Hey, you actually did this wrong. You know, you're supposed to grab from database A, and you grabbed it from database B." Or maybe you, you know, you did your query wrong some way. And every time that feedback goes from the human to the agent, the agent actually updates itself, and it's like, "Oh, okay, I'm gonna mark that as something to try in the future."
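
One simple way to picture correction harvesting is a log of human corrections that gets replayed into later prompts. The sketch below is an assumption about how such a loop could look, not a description of Anthropic's system; the `CorrectionLog` class and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class CorrectionLog:
    """Collects human corrections so they can be fed back into future prompts."""
    notes: dict[str, list[str]] = field(default_factory=dict)

    def record(self, topic: str, correction: str) -> None:
        """Store a correction under a topic, skipping exact duplicates."""
        bucket = self.notes.setdefault(topic, [])
        if correction not in bucket:
            bucket.append(correction)

    def guidance_for(self, topic: str) -> str:
        """Return past corrections for a topic as text to prepend to a prompt."""
        bucket = self.notes.get(topic, [])
        if not bucket:
            return ""
        return "Past corrections:\n" + "\n".join(f"- {c}" for c in bucket)


log = CorrectionLog()
log.record("revenue", "Use database A, not database B.")
log.record("revenue", "Use database A, not database B.")  # duplicate, ignored
log.record("revenue", "Exclude refunded orders.")
print(log.guidance_for("revenue"))
```

In practice the harvested notes would more likely be folded back into the relevant skill or table description, so the fix is kept for good instead of being repeated by hand.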

And the last thing they add is basically before it gives any answer back to the human, they run a second agent against it, and that's called an adversarial review. And basically, if you are the AI data analyst and you come up with an answer and you're like, "The average revenue over the last month was thirty thousand dollars," this adversarial review comes in and says, "Is it though? Like, does that actually make sense? Like, it's been this for the last month and this for the last month. Are you a hundred percent sure?" It's basically trying to prove the first agent incorrect before actually giving the information to the human.
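
The adversarial review step can be pictured as a second pass that tries to find reasons to reject an answer. In the system described, that reviewer is another agent; the sketch below swaps in a plain, deterministic plausibility check as a stand-in, so the function name, the `tolerance` parameter, and the sample figures are all assumptions for illustration.

```python
from statistics import mean


def adversarial_review(
    claimed: float, prior_months: list[float], tolerance: float = 0.5
) -> list[str]:
    """Return objections to a claimed figure; an empty list means no objection.

    A real reviewer would be a second LLM agent. This stand-in only checks
    whether the claim is plausible against recent history.
    """
    objections = []
    if claimed < 0:
        objections.append("Revenue cannot be negative.")
    if prior_months:
        baseline = mean(prior_months)
        if baseline and abs(claimed - baseline) / abs(baseline) > tolerance:
            objections.append(
                f"Claim {claimed:,.0f} is more than {tolerance:.0%} away from "
                f"the recent average of {baseline:,.0f}."
            )
    return objections


# Hypothetical monthly revenue history.
history = [29_000.0, 31_000.0, 30_000.0]
print(adversarial_review(30_000.0, history))   # plausible claim
print(adversarial_review(300_000.0, history))  # suspicious claim
```

An answer would only be passed to the human when the list of objections comes back empty; otherwise the first agent is asked to recheck its work.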

So that way, it's almost like a peer review, a double check from an agent to actually make sure that the analytics is correct.

So this might be really interesting to some of you guys, and this might be really scary to some of you guys. You're like, "Oh my gosh, these AI agents are coming for my job." Well, the first thing I'll tell you is that none of this is actually new. It's just kind of packaged in a fancy prettified way. Like, if you literally take AI out of this, it's just pure data fundamentals, things that we've had for decades. We've talked about this for years. Like, yes, it's good to have good data quality. Yes, it's good to have good data governance, like to actually know what tables mean and what columns mean and what rows mean. Yes, we should repeat our analysis when we can. If we can analyze the data in a uniform way, we should do that. And yes, we should have verification. Like, if I do an analysis, someone else should check it to make sure it all looks good. This is not new. It's just AI-fied, essentially.

The next thing is this is actually a ton of work to do, and really I don't see, you know, a whole lot of companies being able to pull this off other than, like, Anthropic, for example, because Anthropic has literally trillions of dollars. You know, they're growing like crazy. They have tons of employees. But all that documentation, all that governance, all that quality, all that metric mapping and, you know, adding all the business information to Claude, it takes hundreds of hours. It takes so much time. That's before we even talk about maintenance, like we talked about how they slipped from ninety-five percent accuracy to sixty-five percent accuracy by not maintaining their skills. Like, there's so much upfront work and so much maintenance work on this that it's insane.

I'm not the only person who actually noticed this. Kristen Lum said, "This work takes hundreds and hundreds of upfront hours at any moderately sized organization, and that's not even counting maintenance."

So there is tons of work to be done even if this is working, even if this is set up, you know, at normal companies. I mean, I'm not ExxonMobil. I haven't been at ExxonMobil in five years. I have no clue where they're at. I have no insight. A lot of people that I knew there no longer work there. But just the security and privacy concerns that Exxon would have about all of this would take years to solve. Not even like implementing and setting it up. Maybe that's changed, I don't know. But my point is these large organizations, even ones with billions of dollars, this is gonna be difficult for them to pull off.

The crazy thing about all this is they literally just gave this out. It's like they literally give you a skill sheet, a skill file that you can literally just copy and use for your own personal analysis, or you can use it on your team and organization's analysis. I have a little part of it right here, or you can just go to the blog post and find the full file.

My point here, though, is with all these things that we have to be doing for AI to become a good data analyst, it's like Anthropic's not getting rid of the data analyst right now. They have four hundred roles open, and eight of them at least are in data. They have four thousand seven hundred and forty-two employees on LinkedIn, and one thousand four hundred and seventy-eight of them deal with data, and a hundred and ninety-six of them are data analysts. So if this company that has mastered ninety-five percent accuracy with the AI data analyst is still hiring data people, I think that AI jobs aren't going away. Like, this is the company that if they could get rid of humans, they would, right? If you've heard the CEO talk about it, he thinks it's happening, and you don't really see that in their hiring numbers yet.

My point of view is like this is literally going to free you up to do higher value work, including creating and maintaining systems like this. Like I said, you guys as data analysts are the people best suited for the AI period. Like, you guys know numbers, and if you can compare numbers with AI, you're going to be undefeated. You're gonna be employed for a really long time, and just the fact that you're listening to this right now tells me you're one of those people because you're interested in data, you're interested in AI, and if you can really carve a niche that's AI plus data, I think you're gonna land an awesome job. I think you're gonna get promoted to an awesome job. I think you're gonna make a lot of money in your career for a really long time.

So if you found this fascinating, my name's Avery Smith. Please hit subscribe because I really want to talk about how data and AI intertwine over the next six months, and I want you to be on this journey. I will see you in the next episode.
