#96 Pharma Models, Sports Analytics & Stan News, with Daniel Lee
Sports Analytics • Episode 96 • 28th November 2023 • Learning Bayesian Statistics • Alexandre Andorra
Duration: 00:55:51


Shownotes

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

Getting Daniel Lee on the show is a real treat — with 20 years of experience in numeric computation; 10 years creating and working with Stan; 5 years working on pharma-related models, you can ask him virtually anything. And that I did…

From joint models for estimating oncology treatment efficacy to PK/PD models; from data fusion for U.S. Navy applications to baseball and football analytics, as well as common misconceptions or challenges in the Bayesian world — our conversation spans a wide range of topics that I’m sure you’ll appreciate!

Daniel studied Mathematics at MIT and Statistics at Cambridge University, and, when he’s not in front of his computer, is a savvy basketball player and… a hip hop DJ — you actually have his SoundCloud profile in the show notes if you’re curious!

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas and Luke Gorrie.

Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag ;)

Links from the show:

Abstract

by Christoph Bamberg

Our guest this week, Daniel Lee, is a real Bayesian all-rounder and will give us new insights into a lot of Bayesian applications.

Daniel got introduced to Bayesian stats when trying to estimate the failure rate of satellite dishes in his first job after his undergraduate degree. He was lucky to be mentored by Bayesian greats like David Spiegelhalter, Andrew Gelman and Bob Carpenter. He also sat in on reading groups at universities, where he learned about cutting-edge developments, something he recommends to anyone who wants to really dive deep into the matter.

He used all this experience working on PK/PD (pharmacokinetic/pharmacodynamic) models. We talk about the challenges in understanding individual responses to drugs based on the speed with which they move through the body. Bayesian statistics allows for incorporating more complexity into those models for more accurate estimation.

Daniel also worked on decision-making and information-fusion problems for the military, such as identifying a plane as friend or foe through the radars of several ships.

And to add even more diversity to his repertoire, Daniel now also works in the world of sports analytics, another popular topic on our show. We talk about the state of this emerging field and its challenges.

Finally, we cover some Stan news, and discuss common problems and misconceptions around Bayesian statistics and how to resolve them.

Transcript

This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.


Let me show you how to be a good Bayesian, change your predictions...

Getting Daniel Lee on the show is a real treat. With 20 years of experience in numeric computation, 10 years creating and working with Stan, and 5 years working on pharma-related models, you can ask him virtually anything. And that I did, my friends. From joint models for estimating oncology treatment efficacy to PK/PD models, from data fusion for US Navy applications to baseball and football analytics, as well as common misconceptions or challenges in the Bayesian world, our conversation spans a wide range of topics that I am sure you will appreciate.

Daniel studied mathematics at MIT and statistics at Cambridge University, and when he's not in front of his computer, he's a savvy basketball player and a hip-hop DJ. You actually have his SoundCloud profile in the show notes if you're curious.

This is Learning Bayesian Statistics, episode 96, recorded October 12, 2023.

Hello, my dear Bayesians. Some of you may know that I teach workshops at PyMC Labs to help you jumpstart your Bayesian journey, but sometimes the fully live version isn't a fit for you. So we are launching what we call the Guided Learning Path. This is an extensive library of video courses handpicked from our live workshops that unlocks asynchronous learning for you. From A/B testing to Gaussian processes, from hierarchical models to causal inference, you can explore it all at your own pace, on your own schedule, with lifetime access. If that sounds like fun and you too want to become a Bayesian modeler, feel free to reach out at alex.andorra at pymc-labs.com.

And now, let's get nerdy with Daniel Lee.

Daniel Lee, welcome to Learning Bayesian Statistics.

Hello.

Yeah, thanks a lot for taking the time. I'm really happy to have you on the show, because I've followed your work for quite a long time now and I've always thought that it'd be fun to have you on the show. And today was the opportunity. So thank you so much for taking the time. And so, let's dive right in. What are you doing? How would you define the work you're doing nowadays? And what are the topics you are particularly interested in?

Yeah, so I just joined Zelus Analytics recently. They're a company that does sports analytics, mostly for professional teams, although they're expanding to amateur college teams as well. And what I get to do is look at data and try to project how well players are going to do in the future. That's the bulk of what I'm focused on right now.

That sounds like fun. Were you already a sports fan, or is it mainly that you're a good modeler and this was a fun opportunity that presented itself?

Yeah, I think both are true. I grew up playing a lot of basketball. I coached a little bit of basketball. So I feel like I know the subject matter of basketball pretty well. The other sports I know very little about, but, you know, combine that with being able to model data, and it's actually a really cool opportunity.

Yeah. And actually, how did you end up doing what you're doing today? Because I know you've had a very, very sinuous path. So I'm really interested also in your kind of origin story, because, well, that's an interesting one. So how did you end up doing what you're doing today?

Yeah. So sports ended up happening because, I don't know, it actually started through Stan. I didn't really have an idea that I'd be working in sports full-time professionally until this opportunity presented itself. What ended up happening was I met the founders of Zelus Analytics independently about a decade ago, and the company didn't start until later. So, you know, I met them. Luke was at Harvard, Dan was at NYU, and Doug at the time was going to the Dodgers. And I talked to them independently about different things, and, you know, fast forward about 10 years and I happened to be free. This opportunity came up. They're using Stan inside. They're using a bunch of other stuff too, but it was a good time.

And do you remember how you first got introduced to Bayesian methods, and also why they stuck with you?

Yeah. So there are actually two different times that I got introduced to Bayesian methods. The first was when I was working in San Diego. This is after my undergraduate degree. We were working on trying to estimate when hardware would fail, and we're talking about modems and things that go with satellite dishes. They happen to be spread across places that are hard to get to, and when one of those pieces goes down, it's actually very costly to repair, especially when you don't have a part available. So we started using graphical models, using something called Weka to build graphical models and do Bayesian computation. This was all done using graphical models and it was all discrete. That was the first time I got introduced to Bayesian statistics. It was very simple at the time.

What ended up happening after that was I went to grad school at Cambridge, did Part III Mathematics, and ended up taking all the stats courses. And that's where I really saw Bayesian statistics, learned MCMC, learned how BUGS was built using graphical models and conjugacy. Yeah, so that was the real introduction to Bayesian modeling.

Yeah. And actually I'm curious, because in any content where we talk about how you ended up doing what you're doing and things like that, there is kind of a hindsight bias: it looks obvious how you ended up doing what you're doing, and that almost seems easy. But, at least in my case, that wasn't true. You always have obstacles along the way and so on, which is not necessarily negative, right? We have a really good saying in French that says, basically, the obstacle in front of you makes you grow. It's a very hard thing to translate, but that's the substance of it. So yeah, I'm just curious about your own path. How sinuous was it to get to where you are right now?

I've always believed in learning from failures, or learning from experiences where you don't succeed. That's where you gain the most knowledge. That's where you get to learn where your boundary is. If you want to know about the path to how I became where I'm at now, let's see. I guess I could go all the way back to high school. I grew up just outside of Los Angeles. In high school, I had a wonderful advisor named Sanzha Kazadi. He was a PhD student at Caltech, and he ran a research program for high school kids to do basic research. So starting there, I learned to code and was working on the traveling salesman problem.

From there, I went to MIT. Talking about failures: I tried to be a physics major going in. I failed physics three times in the first year, so I couldn't. I ended up being a math major. And it was math with computer science, so it was really close to a theoretical computer science degree, doing some operational research as well. At the end of MIT, I wasn't doing so well in school. I was trying to apply to grad school, and that wasn't happening. I got a job in San Diego; an MIT alum hired me. That's where I started working for three and a half years in software and a little bit of computation. A lot of it was translating algorithms to production software, working on algorithms, and I went through a couple of companies with the same crew, but we just kind of bounced around a little bit.

At the end of that, I ended up going back to Cambridge for a one-year program called Part III Mathematics. It's also a master's degree. I got there not knowing anything about Cambridge; I didn't do enough research, obviously. For the American viewers, the system is completely different. There are no midterms, nothing. You have three terms, you take classes in each of them, and you take two weeks of exams at the end. And that determines your fate. And I got to Cambridge and I couldn't even understand anything in the syllabus other than the stuff in statistics. Mind you, I hadn't done an integral in three years, right? Integrals, derivatives. I didn't know what the normal distribution was. And I go to Cambridge, and those are the only things I can read. So I'm teaching myself measure theory while learning all these new things that I'd never seen, and I managed to squeak out passing. So happy.

At the end of that, I asked David Spiegelhalter, who happened to have just come back to Cambridge, that was his first year back in the stats department, who I should talk to. So when I say I learned BUGS, he had a course on applied Bayesian statistics, which was taught in WinBUGS. And he would literally show us which buttons to click, and in which order, for it not to crash. So that was fun. But he told me I should talk to Andrew Gelman. So I ended up talking to Andrew and working with Andrew until 2016, and that's how I really got into Bayesian stats.

After Cambridge, I knew theory; I hadn't seen any data. Working for Andrew, I saw a bunch of data and learned how to actually work with data. Since then I've run a startup. We tried to take Stan, so Stan's an open-source probabilistic programming language, and in 2017 a few of us thought there was a good opportunity for making a business around it, very much like PyMC Labs. And, you know, we tried to make a horizontal platform for it. At that time, there wasn't enough demand, so we pivoted and ended up writing very complicated models and estimating things for the pharma industry. Since then, I left the company, consulted for a bit, just random projects, and then ended up with Zelus. So that's how I got to today.

Yeah. Man. Yeah, thanks a lot for that exhaustive summary, I'd say, because that really shows how random paths usually are, right? And I find that really inspiring also for people who are a bit earlier in their career path, who could be looking at you as a role model and could be intimidated by thinking that you had everything figured out from when you were 18 years old, right? Just getting out of high school. Which was not the case, from what I understand. And that's really reassuring and inspiring, I think, for a lot of people.

Yeah, definitely not. I can tell you, going to career fairs at the end of my undergraduate degree, people would look at my math degree and not even really look at my resume. Because my GPA was low, my grades were bad as a student, and also, who needs a bad mathematician? That makes no sense anywhere. So that limited what I was doing, but in the end it all worked out.

Yeah, yeah, yeah. Our paths are similar in a way, except that for me it was a GPA in business school. So, business school and political science. Political science, I did have decent grades. Business school, it really depended on what the course was about. Because when I was not interested in the course, yeah, that showed. For sure, that showed in the GPA. But yeah, I find that also super interesting, because in your path there are also so many amazing people you've met along the way, and it seems like these people were also your mentors at some point. So yeah, do you want to talk a bit more about that?

Yeah, I've been really fortunate as I was going through. Now, I haven't had very many formal mentors that were great, and by that I mean advisors that were assigned to me through schools. They tended to see what I do and discount my abilities because of my inability to do really well at school. So that's what it is. But there were a bunch of people that really did sort of shape my career. Working for Andrew Gelman was great. He trusted me; for me, he trusted me with a lot. So he was able to just set me loose on a couple of problems to start, and he never micromanages, so he just let me go. For some people that's a really difficult place to be, without having guidance on a difficult problem, but for someone like me, that was absolutely fine and encouraging. And working with Andrew, I also worked really closely with Bob Carpenter for a long time, and that was really great, because he has such a depth of knowledge and also humility that, I don't know, it's fun working with Bob.

Some of the other times that I've really gotten to grow in my career were sitting in on some amazing reading groups. There are two that come to mind. At Columbia, Dave Blei runs a reading group for his group, and I got to sit in. Those are phenomenal, because they actually go deep into papers and really get at the content of the paper, what it's doing, what the research is, trying to infer what's going on and where the research is going next. That really helped expand my horizon for things that I wasn't seeing while working in Andrew's group; it was just much more machine learning oriented. And in a similar vein at Cambridge, I was able to sit in on Zoubin Ghahramani's group. Don't know why he let me, but he let me just sit in. It was his group's reading group, and he had a lot of good people there at the time. That was when Carl Rasmussen was there working on his book. David Knowles, I don't know who else, but just sitting there reading these papers, reading these techniques, people presenting their own work inside the reading group. Yeah, my encouragement would be: if you have a chance to go sit in on reading groups, go join them. It's actually a good way, especially if it's not in your area of focus, to learn and make connections to literature that would otherwise be very hard to read on your own.

Yeah, I mean, I completely agree with that. And yeah, it feels like a dream team of mentors you've had. I'm really jealous. Like David Spiegelhalter, Andrew Gelman, Bob Carpenter, all those people. It's absolutely amazing. And I've had the chance of interviewing them on the podcast, so I will definitely link to those episodes in the show notes. And yeah, completely agree. Today, I would definitely try and do it with Andrew, because I've talked with him quite a lot already. And yeah, it's really inspiring, and that's really awesome. And I completely agree that, in general, that's something that I'm trying to do, and that's also why I started the podcast in a way. Surrounding yourself with smarter people than you is usually a good way to go. And definitely, I've had the chance also to have some really amazing mentors along my way. People like Ravin Kumar, Thomas Wiecki, Osvaldo Martin, Colin Carroll, Austin Rochford. Well, Andrew Gelman also, with everything he's produced. And yeah, Adrian Seyboldt, also absolutely brilliant. Luciano Paz. All these people, basically, in the PyMC world who helped me when I was really starting and not even knowing about Git, taking a bit of their free time to review my PRs and help me along the way. That's just really incredible. So yeah, what I encourage people to do when they really start in that domain, much more than trying to find an internship that shines on a CV, is to really find a community where you'll be surrounded by smart and generous people. That's usually going to help you much more than a name on the CV.

Absolutely.

And so actually, I'd like to talk a bit about some of the pharma-related models you've worked on. You've worked on so many topics, it's really hard to interview you. But a kind of model I'm really curious about, also because we work on that at PyMC Labs from time to time, is pharma-related models. In particular, can you explain how Bayesian methods are used in estimating the efficacy of oncology treatments? And also, what are PK/PD models?

Yeah, let's start with PK/PD models. So PK/PD stands for pharmacokinetic/pharmacodynamic models. The pharmacokinetics describe, so, we take a drug and it goes into the body. You can model that: you know how much drug goes into the body, and then at some point it has to exit the body through absorption through something, right? Your liver can take it out, it'll go into your bloodstream, whatever. That's the kinetics part. You know that the drug went in and it comes out. So you can measure the blood at different times, you can measure different parts of the body to get an estimate of how much is left, and you can estimate how that works.

The pharmacodynamic part is the more difficult part. Each person responds differently to the drug, depending on what's inside the drug and how much concentration is in the body. You and I could take the same dose of ibuprofen, and if we ask each other how we feel, that number, I don't know, is it on a scale of 1 to 10? You might be saying a 3, I might be saying a 4, just based on what we feel. There are other measurements that sometimes you can take that are more directly tied to the mechanism, but most of the time it's a few hops away from the actual drug entering the bloodstream. So the whole point of pharmacokinetic/pharmacodynamic modeling is just measuring: drug goes in, drug goes out, what's the effect. That gets used in trials and in the design of how much dose to give people. If you give someone double the dosage, are they actually going to feel better? Is the level of drug going to be too high such that there are side effects, and so on and so forth.

The way Bayesian methods play out here, taking a really broad step back, is that the last generation of models assumed that everyone came from a population: you were trying to estimate a population mean for all these things. So you're trying to take individuals and individual responses and get the mean parameters of a, usually, parameterized model of how the kinetics work and then how the dynamics work. It'd be better if we had hierarchical models that assumed there was a mean, but also each person's individual parameters, and that could describe the dynamics for each person a little better than just plugging in the overall values. So to do that, you kind of ended up needing Bayesian models.
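To make that last point concrete, here is a minimal sketch, on simulated data, of the kind of hierarchy Daniel is describing: population-level PK parameters plus partially pooled per-subject parameters, written in PyMC. The one-compartment model, the variable names, and all the numbers are illustrative assumptions, not the models his startup actually built.

```python
# Hierarchical one-compartment PK sketch (illustrative only, simulated data).
import numpy as np
import pymc as pm

rng = np.random.default_rng(1)
n_subjects, dose = 8, 100.0
times = np.tile(np.array([0.5, 1, 2, 4, 8]), n_subjects)        # sampling times (h)
subj = np.repeat(np.arange(n_subjects), 5)                       # subject index per sample
true_ke = np.exp(rng.normal(np.log(0.3), 0.2, n_subjects))       # elimination rate (1/h)
true_v = np.exp(rng.normal(np.log(20.0), 0.1, n_subjects))       # volume of distribution (L)
conc_obs = dose / true_v[subj] * np.exp(-true_ke[subj] * times)
conc_obs = conc_obs * np.exp(rng.normal(0, 0.1, conc_obs.size))  # multiplicative noise

with pm.Model() as pk_model:
    # population-level ("typical subject") parameters
    log_ke_pop = pm.Normal("log_ke_pop", mu=np.log(0.3), sigma=0.5)
    log_v_pop = pm.Normal("log_v_pop", mu=np.log(20.0), sigma=0.5)
    # between-subject variability
    omega_ke = pm.HalfNormal("omega_ke", 0.3)
    omega_v = pm.HalfNormal("omega_v", 0.3)
    # per-subject parameters, partially pooled toward the population mean
    log_ke = pm.Normal("log_ke", mu=log_ke_pop, sigma=omega_ke, shape=n_subjects)
    log_v = pm.Normal("log_v", mu=log_v_pop, sigma=omega_v, shape=n_subjects)
    # one-compartment IV-bolus kinetics: C(t) = dose / V * exp(-ke * t)
    mu_conc = dose / pm.math.exp(log_v[subj]) * pm.math.exp(-pm.math.exp(log_ke[subj]) * times)
    sigma = pm.HalfNormal("sigma", 0.2)
    pm.LogNormal("conc", mu=pm.math.log(mu_conc), sigma=sigma, observed=conc_obs)
    idata = pm.sample()
```

The per-subject parameters are what let the model describe each individual's curve, while the population-level parameters are what you would report for a "typical" patient.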

But on top of that, the other reason why Bayesian models are really popular for this stuff right now is that the people who study these models have a lot of expertise in how the body works and how the drugs work. And so they've been wanting to incorporate more and more complexity into the models, which is very difficult to do inside the setting of certain packages that limit the flexibility. There's a lot of flexibility that you can put in, but there's always a limit to that flexibility. And that's where Stan and other tools like PyMC are coming into play now, not just for the Bayesian estimates, but really for the ability to create models that are more complex.

And that are generative in particular?

Yes, because for these types of studies people are trying to really understand what happens. Like, what's the best dosage to give people? Should it be scaled based on the size of the human? What happens? You know, it's a lot of "what happens". Can you characterize what's going to happen if you give it to a larger population? You've seen some variability inside the smaller trial. What happens next?

Yeah, fascinating. And so it seems to me that it's kind of a really great use case for Bayesian stats, right? Because, I mean, you really need a lot of domain knowledge here. You want that in the model. You probably also have good ideas for priors and so on. But I'm wondering, what are the main challenges when you work on that kind of model?

The main challenges, I think, some of the challenges have to do with, at least when I was working there... So mind you, I didn't work directly for a pharma company; we had a startup where we were building these models and selling to pharma. One of the issues is that there are a lot of historic, very good reasons for using older tools. They don't move as fast, right? You've got regulators, you've got people trying to be very careful and conservative. So trying out new methods on the same data, if it doesn't produce results that they're used to, is a little harder to do there than it is, let's say, in sports. In sports, no one's going to die if I predict something wrong next year. If you use a model that's completely incompatible with the data in pharma and it gives you bad results, bad things do happen sometimes. So anyway, things move a little slower. The other thing is that most people are not trained in understanding Bayesian stats yet. You know, I do think there's a difference between understanding Bayesian statistics from a theoretical, on-paper point of view, and actually being a pragmatic modeler of data. And right now I think there's a turning point, right? I think the world is catching up and the ability to model is spreading a lot wider. So anyway, I think part of that is happening in pharma as well.

Yeah, yeah, for sure.

Yeah, these kinds of models, I really find them fascinating, because they are both quite intricate and complicated from a statistical standpoint, so you really learn a lot when you work on them, and at the same time they are extremely useful and helpful. And usually they are about extremely fascinating projects that have a deep impact on people; basically, they're helping people directly, which I find absolutely fascinating.

I mean, I can tell you that, specifically, the place where I had difficulty working on PK/PD models was that I didn't understand the biology enough. So there are these terms, these constants, these rate constants that describe elimination of the drug through the liver. And because I don't know the biology well enough, I don't know what's a reasonable range. And, you know, people who study the biology know this off the top of their head, because they've studied the body, but most of them aren't able to work with a system like Stan well enough to write the model down. And it's that mismatch that makes it really tough. In some of the conversations we had in that world, it's: why aren't you using a Jeffreys prior? Why aren't you using a non-informative prior? But on the flip side, it's like, if that rate constant is 10 million, is that reasonable? No, it's not. It has to be between, like, zero and one. So for me, if we put priors there that limit that, it makes the modeling side of it a lot easier. But as someone who didn't understand the biology well enough to make those claims, it made the modeling much, much more difficult, and harder to explain as well.
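As a toy illustration of that point, and not one of the episode's actual projects, here is what encoding "this rate constant lives roughly between 0 and 1 per hour" looks like next to a flat prior, on simulated data; the distributions, names and numbers are assumptions made for the sketch.

```python
# Informative vs. flat prior on an elimination rate constant (illustrative only).
import numpy as np
import pymc as pm

rng = np.random.default_rng(2)
t = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
y = 5.0 * np.exp(-0.25 * t) * np.exp(rng.normal(0, 0.15, t.size))   # simulated decay curve

with pm.Model() as informative:
    c0 = pm.HalfNormal("c0", 10.0)
    # weakly informative: almost all mass below 1 per hour, ruling out absurd values
    ke = pm.Gamma("ke", alpha=2.0, beta=6.0)
    pm.LogNormal("y", mu=pm.math.log(c0 * pm.math.exp(-ke * t)), sigma=0.2, observed=y)
    idata = pm.sample()

with pm.Model() as flat:
    c0 = pm.HalfNormal("c0", 10.0)
    # "non-informative": any positive value is equally plausible a priori,
    # so with only a handful of points ke can wander into impossible territory
    ke = pm.HalfFlat("ke")
    pm.LogNormal("y", mu=pm.math.log(c0 * pm.math.exp(-ke * t)), sigma=0.2, observed=y)
```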

Yeah, yeah, yeah. Yeah, definitely. And the biology of those models is absolutely fascinating, really, really intriguing. And also, you've worked on something called data fusion for US Navy applications. That sounds very mysterious. How did Bayesian statistics contribute to those projects? And what were some of the challenges you faced?

Unfortunately, I didn't know Bayesian stats at the time. This was when I first started working. But, you know, data fusion actually... we should have used Bayesian stats. If I were working on the problem now, it should be done with Bayesian stats. Just the problem in a nutshell: imagine you have an aircraft carrier, it can't move very fast, and what it has is about a dozen ships around it. All of them have radars. All of them point at the same thing. You're sitting on the aircraft carrier trying to make decisions about what's coming at you, what to do next. If there's a single plane coming at you, that's one thing. If all 12 ships around you hit that same thing with their radar, and it says there are 12 things coming at you because the readings are slightly jittered, that's bad news, right? Especially if they're not identifying themselves. So the whole problem is: is there enough information there that you can accurately depict what's happening based on multiple pieces of data?

Hmm, okay. Yeah, that sounds pretty fun. And indeed, yeah, lots of uncertainty. And I'm guessing you don't have a lot of data. And also, it's the kind of experiment you cannot really redo over and over. So Bayesian stats would be helpful here, I'm guessing.

Yeah, it's always the edge cases that are tough, right? If the plane or the ship that's coming at you says who they are, identifies themselves, and follows normal protocol, it's an easy problem; you have the identifier. But it's when that stuff's latent, right? People hide it intentionally. Then you have to worry about what's going on. The really cool thing there was that a guy I worked for, Clay Stannick, had come up with a way to take each of the radar pictures and just stack them on top of each other. If you do that, if you see a high intensity, it means the pictures overlap. And if there's no high intensity, it means the pictures don't overlap. And the nice thing is that that's rotation invariant. So it really just helps with the alignment problem, because everyone's looking at the same picture from different angles.
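A cartoon of that stacking idea, assuming each ship's picture has already been put on a common grid: summing the intensity maps makes consistent contacts reinforce each other while jittered, inconsistent returns stay low. All names and numbers below are made up for illustration.

```python
# Toy "stack the radar pictures" sketch (illustrative only).
import numpy as np

rng = np.random.default_rng(3)
grid = np.zeros((12, 200, 200))                       # 12 ships, one intensity map each
true_xy = (120, 80)                                    # one real contact
for s in range(12):
    jitter = rng.integers(-2, 3, size=2)               # small per-ship measurement jitter
    x, y = true_xy[0] + jitter[0], true_xy[1] + jitter[1]
    grid[s, x - 1:x + 2, y - 1:y + 2] += 1.0           # blurred detection blob
    grid[s] += rng.random((200, 200)) * 0.2             # background clutter

stacked = grid.sum(axis=0)                             # stack all the pictures
peak = np.unravel_index(stacked.argmax(), stacked.shape)
print("consensus contact near", peak, "with intensity", stacked.max())
```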

Yeah, yeah, it's super interesting. I love that. And you haven't had the opportunity to work again on that kind of model, now that you're a Bayesian expert?

No.

Well, you've heard it, folks. If you have some models like that that you're entertaining, feel free to contact him, or me, and I will contact him for you if you want. So actually, I'm curious, in general, because you've worked with so many people and in so many different fields: I wonder if you picked up some common misconceptions or challenges that people face when they try to apply Bayesian stats to real-world problems, and how you think we can overcome them.

Yeah, that's an interesting question. I think the common error is that we don't build our models complex enough. They don't describe the phenomenon well enough to really explain the data. And I think that's the most common problem that we have. The thing that I get the most mileage out of is actually putting on either a measurement model, or just adding a little more complexity to the model, and it starts working way better. In pharmacometrics specifically, I remember we started asking: how do you collect the data? In what ways is the measurement wrong? And we just modeled that piece, put it into the same parametric forms of the model, and everything started fitting correctly. It's like, cool, I should do that more often. So yeah, if I were to think about it, that's sort of the thing.
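Here is a minimal sketch of what "adding a measurement model" can look like: instead of pretending the assay reports the true concentration, the observation gets its own error component on top of the process model. The data are simulated and the names are illustrative, not the pharmacometrics models discussed above.

```python
# Process model + explicit measurement (assay) error (illustrative only).
import numpy as np
import pymc as pm

rng = np.random.default_rng(4)
t = np.linspace(0.5, 10, 20)
true_curve = 8.0 * np.exp(-0.4 * t)
y = true_curve * np.exp(rng.normal(0, 0.1, t.size)) + rng.normal(0, 0.3, t.size)

with pm.Model() as with_measurement_model:
    c0 = pm.HalfNormal("c0", 10.0)
    ke = pm.Gamma("ke", 2.0, 4.0)
    sigma_proc = pm.HalfNormal("sigma_proc", 0.2)     # biological variability (multiplicative)
    sigma_assay = pm.HalfNormal("sigma_assay", 0.5)   # how the instrument is wrong (additive)
    latent = pm.LogNormal("latent",
                          mu=pm.math.log(c0 * pm.math.exp(-ke * t)),
                          sigma=sigma_proc, shape=t.size)
    pm.Normal("y", mu=latent, sigma=sigma_assay, observed=y)
    idata = pm.sample()
```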

The other thing is, I guess, people try to apply Bayesian stats, Bayesian models, to everything, and it's not always applicable. I don't know if you're actually going to be able to fit a true LLM using MCMC. I think that'd be very, very difficult. So it's okay to not be Bayesian for that stuff.

Yeah. So that's interesting. So nothing about priors, or about model fitting, or about the sampling time of the models?

No, I mean, they're all related, right? The worse the model fits... So when a model doesn't actually match the data, at least running in Stan, it tends to overinflate the amount of time it takes, and the diagnostics look bad. A lot of things get fixed once you start putting in the right level of complexity to match the data. But, you know, MCMC is definitely slower than running optimization, that's true.

Yeah. No, for sure. Yeah, I'm asking because, as I'm teaching a lot, these are recurring themes. It really depends where people are coming from, but you have recurring themes that can be a difficulty for people. Something I've seen that's pretty common is understanding the different types of distributions. So prior predictive samples and prior samples, how do they differ? Posterior samples, posterior predictive samples, what's the difference between all of that? That's definitely a topic of complexity that can trigger some difficulty for people. And I think that's quite normal. I remember personally it took me a few months to really understand that stuff when I started learning Bayesian stats. And now, with my educational content, I try to decrease that time for people, so that they maybe make the same mistakes as me, but they realize it faster than I did. That's kind of the objective.
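For readers who want the four kinds of draws mentioned here side by side, a small sketch with a toy normal-mean model in PyMC; the variable names are illustrative.

```python
# Prior, prior predictive, posterior, posterior predictive (illustrative only).
import numpy as np
import pymc as pm

y = np.array([4.1, 5.3, 4.8, 5.9, 5.1])

with pm.Model() as m:
    mu = pm.Normal("mu", 0.0, 10.0)
    sigma = pm.HalfNormal("sigma", 5.0)
    pm.Normal("y", mu=mu, sigma=sigma, observed=y)

    prior = pm.sample_prior_predictive(1000)            # draws before seeing the data
    idata = pm.sample()                                  # posterior draws of mu and sigma
    post_pred = pm.sample_posterior_predictive(idata)    # simulated y from the fitted model

# prior.prior["mu"]                      -> prior samples of a parameter
# prior.prior_predictive["y"]            -> fake data implied by the priors alone
# idata.posterior["mu"]                  -> posterior samples after seeing y
# post_pred.posterior_predictive["y"]    -> fake data implied by the posterior
```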

Yeah, that's really good. So what other things do you see people struggling with? Or, you know, what are some of the common themes right now?

I mean, priors, a lot. Priors are extremely common, especially if people come from the classic machine learning framework, where it's really hard for them to choose a prior. And actually, something I've noticed is two ways of thinking about them that allow them to be less anxious about choosing a prior. One is making them realize that having flat priors doesn't mean not having priors. The fact that they were using flat priors before, by default, in a classical regression, for instance, that's a prior. That's already an assumption. So why would you be less comfortable making another assumption, especially if it's more warranted in that case? So, basically, try to see this idea of priors along a slider, a gradient, where the extreme left would be completely flat priors, which lead to a completely overfit model that has a lot of variance in the predictions. And at the other end of the slider, the extreme right would be the completely biased model, where your priors would basically be either a point, or completely outside the realm of the data, and then you cannot update; that would be a completely underfit model. So in a way, the priors are there to allow you to navigate that slider. And why would you always want to be at the extreme left of the slider, right? Because in the end, you're already making a choice. So why not think a bit more exhaustively and clearly, explicitly, about the choices you're making? Yeah, that usually helps make them feel less guilty about choosing a prior.

So that's interesting. Yeah, absolutely. And to go on that point a little bit, that's what I'm trying to say with the complexity of the model. We just assume normal things a lot of the time, but sometimes things aren't normal; there's more variance than normal. So making something a t-distribution sometimes fixes it. Just understanding the prior predictive, the posterior, the posterior predictive draws, summarizing those, and looking at the data really helps. One thing to know, for anyone trying to do models in production, is that the programs you write, either in PyMC or Stan, the quality of the fit is not just the program itself, it's the program plus the data. If you swap out the data and it has different properties than the data you trained it on before, it might actually have worse properties, or better properties. And we can see this with things like non-centered parameterization and different variance components being estimated in weird ways, if you just blindly assume that you can take your model that fit on one dataset and blindly productionize it. It doesn't quite work that way yet, unfortunately.
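A sketch of the centered versus non-centered point raised here: the same hierarchical model written two ways. Which one samples well depends on the data, which is exactly why a model cannot be blindly reused on a new dataset. Simulated data, illustrative names.

```python
# Centered vs. non-centered parameterization of the same hierarchy (illustrative only).
import numpy as np
import pymc as pm

y = np.random.default_rng(7).normal(0.5, 1.0, size=(8, 5))   # 8 groups, 5 observations each
group = np.repeat(np.arange(8), 5)
y_flat = y.ravel()

with pm.Model() as centered:
    mu = pm.Normal("mu", 0, 1)
    tau = pm.HalfNormal("tau", 1)
    theta = pm.Normal("theta", mu=mu, sigma=tau, shape=8)        # centered parameterization
    pm.Normal("y", mu=theta[group], sigma=1.0, observed=y_flat)

with pm.Model() as non_centered:
    mu = pm.Normal("mu", 0, 1)
    tau = pm.HalfNormal("tau", 1)
    z = pm.Normal("z", 0, 1, shape=8)
    theta = pm.Deterministic("theta", mu + tau * z)              # non-centered parameterization
    pm.Normal("y", mu=theta[group], sigma=1.0, observed=y_flat)
    idata = pm.sample()   # compare diagnostics against sampling the centered version
```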

Yeah, yeah, yeah. For sure. And also, another prompt that I use to help them understand a bit more why we're using generative models, why that means making assumptions, and how to make them and be more comfortable making them, is: well, imagine that you had to bet on every decision your model is making. Wouldn't you want to use all the information you have at your fingertips? Especially with the internet now, it's not that hard to find some information about the parameters of any model you're working on and find at least a somewhat informed prior. Because, you know, there is no best prior, so you don't need the perfect prior; it's a prior, you have the data, so it's going to be updated anyway, and if you have a lot of data it's going to be washed out. But if you had to bet on any decision you're making, or that your model is making, wouldn't you want to use all the information you have available, instead of just throwing your hands in the air and saying, oh no, I don't know anything, so I'm going to use flat priors everywhere? You really don't know anything? Have you searched on Google? It's not that hard. So yeah, that usually also helps, when you frame it in the context of decision-making with an incentive, which here would be money. If you were betting your life on it, then, well, it would make sense, right, to use any bit of information that you can put your hands on. So why wouldn't you do it here?

Actually, I'm curious: with your extensive experience in the modeling world, do you have any advice you would give to someone looking to start a career in computational Bayesian stats, or data science in general?

Yeah, my advice would probably be to try to go deeper in one subject, or not one subject, go deeper in one dimension than you're comfortable going. If you want to get into actually building out tools, go deep: understand how PyMC works, understand how Stan works, try to actually submit pull requests and figure out how things are done. If you want to get into modeling, go start understanding what the data is. Go deep. Don't just stop at "I have data in a database." Go ask how it's collected. Figure out what the chain actually is that gets the data to where it is. Going deep in that way, I think, is going to get you pretty far. It'll give you a better understanding of how certain things are, and you never know when that knowledge actually comes into play and helps you. But yeah, that would be my advice. Just go deeper than maybe your peers do, or than people ask you to.

Yeah, that's a really good point. Yeah, I love it, and it's true that, thinking about the people around me, usually it's the kind of people who stick to it with that passion who end up in the place they wanted to be, because, well, they also have that passion to start with.

That's really important. I remember someone recently asked me whether they should focus on machine learning or Bayesian stats, whether Bayesian stats is going to go away, whether AI is taking over. And my answer to that was pretty much along the lines of: go and learn any of them really well. If you don't learn any of them really well, then you'll just be following different things, bouncing back and forth, and you'll miss everything. Bayesian stats has been around for a while, and I don't think it's going to go away. But if you bounce from Bayesian stats, try to go to ML, try to go to deep learning, without actually really investing enough time into any of those, when it comes down to having a career in this stuff, you're going to find yourself a little short of expertise to distinguish yourself from other people. So that's where this advice mentality is coming from. Especially just starting out, I mean, there are so many things to look at right now that it's hard to keep track of everything.

Yeah, no, for sure. That's definitely a good point, too. And actually, in your opinion, currently, what are the main sticking points in the Bayesian workflow that you think we can improve? All of us in the community of probabilistic programming languages, core developers, Stan, PyMC, and all those PPLs, what do you think are the sticking points that would benefit from some love from all of us?

Oh, that's a good question. You know, in terms of the workflow, I think just usability can get better. We can do a lot more there. With that said, it's hard. The tools that we're talking about are pretty niche, and so it's not like there are millions and millions of users of our techniques, so it's just hard to do that. But the thing that I run into a lot is transformations of problems, and I really wish that we end up with reparameterizations of problems done automatically, such that the problem fits well with the method that you choose. If we could do that, then life would be good, but I think that's a hard problem to tackle.

Yeah, I mean, for sure. And that's also something I've started to look into, and hopefully in the coming weeks I'll be able to look into it for PyMC. Precisely, I was talking about that with Ricardo Vieira, where we were thinking of having user wrapper classes on some distributions, you know, the normal, for instance, with the classic reparameterization, where instead of making the users reparameterize by hand themselves, you could just ask PyMC for a non-centered pm.Normal, for instance, and it would do that for you. That'd be really cool. Of course, these are always bigger PRs than you suspect when you start working on them, but that definitely would be a fun one. So that'd be a fun project I'd like to work on in the coming weeks. But we'll see how that goes with open source. That's always very dependent on how much work you have to do first to actually pay your rent, and then on how much time you can afford to dedicate to open source. But hopefully I'll be able to make that happen, and that'd definitely be super fun.

And actually, talking about future developments, I'm always curious about Stan. What do you folks have on your roadmap, especially some exciting developments that you've seen in the works for the future of Stan?

So I actually don't know what's coming up on the roadmap too much. Lately I've been focused on working on my new job, and so that's good. But a couple of the interesting things: Pathfinder just made it in. It's a new VI algorithm, which I believe addresses some of the difficulties with ADVI. So that should be interesting. And finally, tuples should land, if they haven't already landed, inside the Stan language. That means that you can return multiple values from a function, which should be better for efficiency and for writing things down in the language. Other than that, there's always activity around new functionality in Stan and making things faster. And the work on the interfaces, which makes it a lot easier to operate Stan, is always good. So there's CmdStanR and CmdStanPy, which really do a lot of the heavy lifting.

Yeah. Yeah, super fun. For sure, I didn't know Pathfinder was there, but definitely super cool. Have you used it yourself? And is there any kind of model you'd recommend using it on?

No, I haven't used it myself. But there is a model that I'm working on at Zelus that I do want to use it on. We're doing what we call component skill projection models. You have observations of how players are doing, for many measurements, and you have that over years. And you can imagine that there are things you don't observe about them; there's a function that you apply to the underlying latent skill that then produces the output. And over time you're trying to estimate what that does. And so for something like that, I think using an approximate solution would probably be really good.
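A minimal sketch of the "latent skill" idea Daniel describes: a skill that drifts over seasons as a random walk, observed only through a noisy yearly metric. The data and names are invented for illustration, not Zelus's actual model, and the fit here uses ordinary MCMC rather than Pathfinder.

```python
# Latent-skill random walk with noisy yearly observations (illustrative only).
import numpy as np
import pymc as pm
import pytensor.tensor as pt

rng = np.random.default_rng(8)
n_years = 6
true_skill = np.cumsum(rng.normal(0, 0.3, n_years)) + 1.0
obs = true_skill + rng.normal(0, 0.5, n_years)          # noisy per-season metric

with pm.Model() as skill_model:
    drift_sd = pm.HalfNormal("drift_sd", 0.5)            # how fast skill can change year to year
    obs_sd = pm.HalfNormal("obs_sd", 1.0)                 # measurement noise on the yearly metric
    steps = pm.Normal("steps", 0.0, drift_sd, shape=n_years)
    skill = pm.Deterministic("skill", pt.cumsum(steps))   # latent skill trajectory (random walk)
    pm.Normal("y", mu=skill, sigma=obs_sd, observed=obs)
    idata = pm.sample()
    latest_skill = idata.posterior["skill"][..., -1]      # starting point for projecting next season
```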

Yeah. Do you already have a tutorial page on the Stan website that we can refer people to for this episode's show notes?

I'm not sure. I could send it to you, though. I believe there's a Pathfinder paper out on arXiv. Bob Carpenter's on it.

OK, yeah, for sure. Yeah, add that to the show notes, and I'll make sure to put it on the website when your episode goes out, because I'm sure people are going to be curious about that. And more generally, are there any emerging trends or developments in Bayesian stats that you find particularly exciting or promising for future applications?

No, but I do feel like the adoption of Bayesian methods and modeling, there's still time for that to spread, especially in the world now where LLMs are the biggest rage and are being applied everywhere. I still think there's space for more places to use really smart, complex models with limited data. So with all these tools, I just think that more industries need to catch on and start using them.

Yeah, I see. Already, I'm pretty impressed by what you folks do at Zelus. That sounds really fun and interesting. And actually, one of the most recent episodes I did, episode 91 with Max Göbel, was about European football analytics. And I'm really surprised. So I don't know if you folks at Zelus already work on the European market, but I'm pretty impressed by how mature the US market is on that front of sports analytics, and, on the contrary, how far behind that curve continental Europe is, at least. I am both impressed and appalled. I'm curious what you know about that.

I don't think anyone's that far behind right now. So I know you had Jim Albert on the show too, and I heard both of those. The thing that I'm really excited about right now is making all the models more complex, right? I think we probably have some of the more advanced models, or at least up to industry standard in a lot of them, and more complex in others. I just got to the company, and when I look at it, I think there's another order of complexity that we can get to using the tools that already exist. And that's where I'm excited. The data is out there. It's been collected for, you know, five years, ten years. There's new tracking data happening, so there's more data coming out, more fidelity of data. But even using the data that we have, a lot of the models that people are fitting are at the level of summary statistics. And that's great and all; we're making really good things that people can use with that level of information. But we can be more granular about it, write more complex models, and have a better understanding of the phenomenon, like how these metrics are being generated. And I think that's, for me, what's exciting right now.

Yeah.

And that's what I've seen too, mainly in Europe, where now you have amazing tracking data. Really, really good. In football I don't know that much, because unfortunately I haven't had the inside peek that I've had for rugby. And I mean, that tracking data is absolutely fantastic. It's just that people don't build models on it; they just do descriptive statistics. Which is already good, but they could do so much more with it. For now, I haven't been successful in explaining to them what they would get with models. And something I'm guessing is that there is probably not enough competitive pressure on this kind of usage of data. Because, unless they are very special, a sports team is never going to come to you as a data scientist and tell you, hey, we need models, because they don't really know what the difference between a mean and a model actually is. So usually these kinds of data analytics are sold by companies here in Europe. And, well, from a company standpoint, they don't have a lot of competitive pressure. Why would you invest in writing models, which are hard to develop and take time and money, when you can just sell raw data that you then run descriptive stats on? That costs way less, and you're still ahead of the competition with it. Kind of makes sense. So yeah, I don't know, I'm curious what you've seen. And I think the competitive pressure is way higher in the US, which also explains why you are trying to squeeze even more information from your data with more complex models.

Yeah.

851

:

I think you've described sort of the path

of a lot of data analytics going into a

852

:

lot of industries, which is like, the

first thing that lands is there exists

853

:

data, let's go collect data.

854

:

Let's go summarize data, and then someone

will take that and sell it to the people

855

:

that collected the data.

856

:

And that's cool.

857

:

And I always think the next iteration of

that is taking that data and doing

858

:

something useful and deriving insight.

859

:

The thing that baseball has done really

well was linking, um, runs to outcomes

860

:

that they cared about winning games.

861

:

Right.

862

:

It's like you increase your runs, you win

games.

863

:

You decrease your runs, you lose games.

864

:

Right.

865

:

It's pretty simple.

866

:

Um, so this is where it's, you know, even

I'm having trouble right now too.

867

:

It's, it's, um,

868

:

for basketball, like you shoot slightly

higher percentage, you're gonna score a

869

:

little more, but does that actually

increase your wins?

870

:

Yeah.

871

:

And that's really tough to do in the

context of five on five.

872

:

If you're talking about rugby, you got, is

it nine on nine or is it 11?

It's 15.

15, right?

Classic European rugby is 15, yeah. Like the World Cup that's happening right now.

So if you've got 15 players, what's the impact of replacing one player? It starts getting a lot harder to measure. So even from where I'm sitting, it seems like there's a lot of hype around collecting data, visualizing data, and understanding what's there. And people hope that a cool result will come out just by looking at the data, which I do hope will happen. But as soon as the lowest-hanging fruit is picked, the next thing has to be models. And yeah.
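As a side note on the runs-to-wins link mentioned above: the classic baseball version of it is Bill James' Pythagorean expectation, which is not discussed explicitly in the episode, but it illustrates what "linking a metric to winning" looks like in its simplest form. A small sketch:

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Expected win fraction from runs scored and allowed (Bill James' Pythagorean expectation)."""
    rs, ra = runs_scored**exponent, runs_allowed**exponent
    return rs / (rs + ra)

# A team scoring 800 runs and allowing 700 over a season:
print(round(pythagorean_win_pct(800, 700) * 162))  # roughly 92 expected wins in a 162-game season
```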

Yeah, exactly. Completely agree with that. And I think, for now, it's still a bit too early for Europe. It's going to come, but we can already have really good success just doing descriptive stats, because a lot of people are just not doing it. And so recruiting and training are just based on gut instinct, which is not useless, but can definitely be improved.

You know, one of the other things about sport that's really difficult is that, when we talk about models, we assume everything is normally distributed. We assume the central limit theorem holds, or the law of large numbers, and that everything averages out. When you talk about the highest level of sport, you're talking about the tail end of the tail end of the tail end. And that is not normal, and I'm not seeing anybody model that. This is where, like I said, I'm really excited. It's not everywhere, but a lot of the time we do make normality assumptions, and I don't think these things are normal. And I think if we actually model that properly, we're going to see some better results. But it's early days for me. So.
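One minimal way to relax exactly that assumption, sketched here in PyMC with made-up numbers (the data, priors, and variable names are illustrative assumptions, not from the conversation), is to swap a Normal likelihood for a Student-t one and let the tail weight be learned from the data:

```python
import numpy as np
import pymc as pm

# Hypothetical per-game performance metric for elite players: heavy-tailed rather than Gaussian.
y = np.random.default_rng(1).standard_t(df=3, size=200) * 5 + 20

with pm.Model() as robust_model:
    mu = pm.Normal("mu", 20, 10)
    sigma = pm.HalfNormal("sigma", 10)
    nu = pm.Exponential("nu", 1 / 10)  # degrees of freedom: small values mean fat tails

    # Student-t likelihood instead of Normal: extreme performances are modeled, not averaged away.
    pm.StudentT("y", nu=nu, mu=mu, sigma=sigma, observed=y)

    idata = pm.sample(1000, tune=1000, chains=2, random_seed=2)
```

If the posterior for nu ends up large, the model collapses back toward normality, so the Gaussian case is still recoverable when the data supports it.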

Yeah, it's actually a good point. Yeah. I hadn't thought of that, but it definitely makes sense, because then you get to scenarios which are really the extreme by definition, because even the people you have in your sample are extremely talented people already. So you cannot model that team the same way as you would model the football team from around the corner.

Awesome, Daniel. Well, it's already been a long time, so I don't want to take too much of your time. But before asking you the last two questions, I'm wondering if you have a personal anecdote or example to share of a challenging problem you encountered in your research or teaching related to Bayesian stats, and how you were able to navigate through it.

Oh, um... in teaching. I don't know. That one's a tough one. It's, um... Yeah. I... It's a different one. Okay, here's one of the toughest ones: just kind of knowing when to give up. Going back to a workshop I taught maybe in... I remember someone had walked in with a laptop that was like a 20-pound laptop, that was like 10 years old at that point, and was, I think, running 32-bit Windows, and asking for help on how to run Stan on this thing. That was a time to just give up. Sometimes you just need better tools.

It's a good point. Yeah, for sure. That's very true. That's also something, actually, that I want to... a message that I want to give to all the people using PyMC. Please install PyMC with Mamba and not pip, because Mamba is doing things really well, especially with the compiler, the C compiler, and that will just make your life way easier. So I know we repeat that all the time. It's in the readme. It's in the readme of the workshops we teach at PyMC Labs, and yet people still install with pip. So if you really have to install with pip, then do it. Otherwise, just use Mambaforge. It's amazing. You're not going to have any problems, and it's going to make your life easier. There is a reason why all the PyMC core developers ask you that as a first question anytime you tell them, "So, I have a problem with my PyMC install": did you use Mamba? So yeah, that was just a public service announcement that you made me think of, Daniel, thanks a lot.

Okay, before letting you go, I'm gonna ask you the last two questions I ask every guest at the end of the show. First one: if you had unlimited time and resources, which problem would you try to solve?

I would try to solve the income disparity in the US and what that gets you. I'm thinking mostly health insurance. I think it's really bad here in the US. You just need resources to have health insurance, and it should be basic. It's a basic necessity. So working on some way to fix that would be awesome, with unlimited time and energy.

Yeah, I mean, definitely a great answer. Not the first time we get that one, but totally agree. Especially from a European perspective, it's always something that looks really weird when you're coming to the US. It's super complicated.

Also, yeah. One of the things about working in pharma was realizing that a lot of the R&D budget is coming from, you could call it, overpayment by the American system. And so if you still want new drugs that are better, it's got to come from somewhere, but I'm not sure where. It's a tough problem.

Yeah, yeah, yeah. I know, for sure. And second question: if you could have dinner with a great scientific mind, dead, alive, or fictional, who would it be?

That one, like, I thought about this for a while. And, you know, the normal cast of characters came up: Andrew Gelman, Bob Carpenter, Matt Hoffman. But the guy that I would actually sit down with is Shawn Frayne. You probably haven't heard of him. He's an American inventor. He has a company called Looking Glass Factory that does 3D holographic displays without the need for a headset. He happens to have been my college roommate and my big brother in my fraternity, Nu Delta, at MIT. And I haven't caught up with him in a long time. So that's the guy I would go sit down with.

That sounds like a very fun dinner. Well, thanks a lot, Daniel. This was really, really cool. I'm happy because I had so many questions for you on so many different topics, and we managed to get them all in. So yeah, thank you so much. As usual, I put resources and a link to your website in the show notes for those who want to dig deeper. Thanks again, Daniel, for taking the time to be on this show.
