#108 Modeling Sports & Extracting Player Values, with Paul Sabin
Episode 108 • 14th June 2024 • Learning Bayesian Statistics • Alexandre Andorra
Duration: 01:18:04


Shownotes

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!

Visit our Patreon page to unlock exclusive Bayesian swag ;)

Takeaways

  • Convincing non-stats stakeholders in sports analytics can be challenging, but building trust and confirming their prior beliefs can help in gaining acceptance.
  • Combining subjective beliefs with objective data in Bayesian analysis leads to more accurate forecasts.
  • The availability of massive data sets has revolutionized sports analytics, allowing for more complex and accurate models.
  • Sports analytics models should consider factors like rest, travel, and altitude to capture the full picture of team performance.
  • The impact of budget on team performance in American sports and the use of plus-minus models in basketball and American football are important considerations in sports analytics.
  • The future of sports analytics lies in making analysis more accessible and digestible for everyday fans.
  • There is a need for more focus on estimating distributions and variance around estimates in sports analytics.
  • AI tools can empower analysts to do their own analysis and make better decisions, but it's important to ensure they understand the assumptions and structure of the data.
  • Measuring the value of certain positions, such as midfielders in soccer, is a challenging problem in sports analytics.
  • Game theory plays a significant role in sports strategies, and optimal strategies can change over time as the game evolves.

Chapters

00:00 Introduction and Overview

09:27 The Power of Bayesian Analysis in Sports Modeling

16:28 The Revolution of Massive Data Sets in Sports Analytics

31:03 The Impact of Budget in Sports Analytics

39:35 Introduction to Sports Analytics

52:22 Plus-Minus Models in American Football

01:04:11 The Future of Sports Analytics

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor,, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan and Francesco Madrisotti.

Links from the show:

Transcript

This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.


Speaker:

Folks, you may know it by now, I am a huge

sports fan.

2

:

So needless to say that this episode was

like being in a candy store for me.

3

:

Well, more appropriately, in a chocolate

store.

4

:

Paul Sabin is so knowledgeable that this

conversation was an absolute blast for me.

5

:

In it, Paul discusses his experience with

non-stats stakeholders in sports

6

:

analytics and the challenges of convincing

them to adopt evidence-based decisions.

7

:

He also explains his soccer power ratings

and projections model, which uses a

8

:

Bayesian approach and expected goals, as

well as the importance of understanding

9

:

player value in difficult to measure

positions and the need for more accessible

10

:

and digestible sports analytics for fans.

11

:

We also touch on the impact of budget on

team performance in American sports and

12

:

the use of plus-minus models in

basketball and American football.

13

:

Paul is a senior fellow at the Wharton

Sports Analytics and Business Initiative

14

:

and a lecturer

15

:

in the Department of Statistics and Data

Science at the Wharton School of the

16

:

University of Pennsylvania.

17

:

He has spent his entire career as a sports

analytics professional, teaching and

18

:

leading sports analytics research

projects.

19

:

This is Learning Bayesian Statistics, episode 108.

20

:

Welcome to Learning Bayesian Statistics, a

podcast about Bayesian inference, the

21

:

methods, the projects, and the people who

make it possible.

22

:

I'm your host, Alex Andorra.

23

:

You can follow me on Twitter at Alex

underscore Andorra, like the country, for

24

:

any info about the show.

25

:

LearnBayesStats.com is the place to be.

26

:

Show notes.

27

:

becoming a corporate sponsor, unlocking

Bayesian merch, supporting the show on

28

:

Patreon, everything is in there.

29

:

That's LearnBayesStats.com.

30

:

If you're interested in one-on-one

mentorship, online courses, or statistical

31

:

consulting, feel free to reach out and

book a call at topmate.io slash alex

32

:

underscore andorra.

33

:

See you around, folks, and best Bayesian

wishes to you all.

34

:

Welcome to Learning Bayesian Statistics.

35

:

a full conversation in French as we just

had before recording.

36

:

Well done.

37

:

It used to be though.

38

:

Go back two to three hundred years.

39

:

Maybe you just don't go to Africa enough.

40

:

That's where French is spoken a lot now

too.

41

:

Exactly.

42

:

But other than that, you can see French

used to be a very international language

43

:

because in my travels, almost all the time

people tell me, yeah, I studied French in

44

:

high school.

45

:

And the only thing they can say is just a

few words.

46

:

Which is normal, like if you don't use it,

right?

47

:

But yeah, you can see that because French

is still, or was still taught in high

48

:

school and now less and less.

49

:

So yeah, so well done Paul for that.

50

:

I know, I don't think French is an easy

language to learn.

51

:

What has been your experience?

52

:

I'm actually very curious.

53

:

You know, it's hard to say, so this is a

statistics pod or data science podcast.

54

:

So I guess I can't really, I can't really

compare it to anything else.

55

:

That's the only other language I've

learned besides my native English.

56

:

So, you know, I guess, you know, one

sample size for me, I took it in high

57

:

school as well.

58

:

I hated it.

59

:

I had, so, you know, coming from America,

you know, so the reason I chose, you know,

60

:

seventh grade is when I had to choose

whether I was taking French or Spanish.

61

:

And I'm the youngest of four kids in my

family growing up.

62

:

And my older siblings told me that the

Spanish teacher was really mean.

63

:

And that's originally why I took took

French.

64

:

and then I took it for the required two to

three years.

65

:

And then I was done.

66

:

I had in high school, I had this teacher

from Belgium and I still remember her

67

:

name, Madame Vendon Plus, and I couldn't

stand her, but come, come to find out

68

:

looking back in life that she was actually

a really nice person.

69

:

She was just Belgian.

70

:

And the cultural, you know, like Americans

think they're the best and the French

71

:

in Europe also think

they're the best because they ruled the

72

:

world in the 1700s and 1800s, and America felt

like they've ruled the world for the last

73

:

100 years.

74

:

And so when you get into a room together

and you think both of your cultures are

75

:

superior, you know, that doesn't go well

together.

76

:

But actually, so after that, I didn't...

77

:

speak French at all.

78

:

And then I did church service for my

church for two years and I lived in

79

:

Montreal, I lived in Quebec, not actually

in the city, I lived in a lot of rural

80

:

small town.

81

:

And so I studied French really hard.

82

:

I had to learn the very strong Quebecois

accent.

83

:

And then when I went back to school, it's

when I like really honed in my French.

84

:

I was very conversational, could speak

very fluently in Quebec, but then, you

85

:

know, I had to learn the grammar a little

bit more.

86

:

in depth.

87

:

So then I studied French as well at

university as well.

88

:

So, you know, immersing yourself and the

actually like learning languages because

89

:

when I learned it in school, it

never made sense to me.

90

:

But when I studied it on my own and I

studied conjugation and all these things,

91

:

it became kind of like a math problem.

92

:

And so when I would speak a sentence in my

head, I'd always be like, I need a

93

:

subject.

94

:

I need to conjugate the verb.

95

:

And then I need to say like what I'm, you

know, just

96

:

do an adverb or an adjective after it.

97

:

And like it made sense in my head, but

that's not how I was taught in school.

98

:

I was taught, I had to memorize all these

words, like everything in the kitchen.

99

:

How do you say dishwasher?

100

:

How do you say refrigerator?

101

:

How do you say fork?

102

:

How do you say spoon?

103

:

I couldn't learn like that, but at like

living and like thinking about French as a

104

:

math equation, it made sense in my head

and I was able to pick it up.

105

:

You know, sure.

106

:

I made tons of mistakes and embarrassed

myself, but it wasn't too bad.

107

:

And that's how you learn.

108

:

Yeah.

109

:

So I'm guessing.

110

:

Like from that answer, I'm guessing people

already know why I invited you on the

111

:

podcast.

112

:

Very nerdy answer about languages,

that's perfect.

113

:

Thanks a lot.

114

:

And yeah, I completely relate actually.

115

:

I learned English and German in high

school and yeah, kind of the same.

116

:

I always hated formal language learning.

117

:

And like in the end I learned these

languages and Spanish that was the same

118

:

and Italian that was the same, just going

to the country basically.

119

:

And yeah, as you were saying, I think also

what it adds is you've got skin in the

120

:

game.

121

:

You're in the country, you're having a

conversation with someone.

122

:

If you're not able to talk, you look

extremely stupid.

123

:

So it's a very good incentive for the

brain to step up and learn.

124

:

And that's really awesome.

125

:

And then when you are in the situation

that you...

126

:

don't know what to say, you remember that.

127

:

And then when you learn, this is what I

should have said, it sticks with you

128

:

because it has an emotional attachment to

it.

129

:

Yeah.

130

:

Yeah.

131

:

No, exactly.

132

:

And I mean, and that's going to be a good

segue to my first question to you, but I

133

:

think it's also one of the situations in

life, where you can really, feel and see

134

:

your brain learning.

135

:

So that's why I also really love learning

new languages and going to countries to do

136

:

that because.

137

:

Like you arrive in the country, you don't

know how to say anything.

138

:

And in just a few weeks, your brain starts

picking up stuff and you can really,

139

:

really feel your brain doing its amazing

work that it's been like conditioned to do

140

:

from years of evolution.

141

:

And to me, that's just absolutely

incredible that the brain is able to do

142

:

that.

143

:

Even when you're like in your thirties and

beyond, you can do that.

144

:

And it's just, I found that absolutely

incredible.

145

:

And that's kind of like a Bayesian.

146

:

neural network, you know, so I mean, see

that segue, I should definitely have a

147

:

podcast.

148

:

So actually, talking about Bayes.

149

:

Yeah, I invited you on the podcast because

you do absolutely awesome work on sports

150

:

modeling.

151

:

And people know that I'm a big fan of a

lot of sports.

152

:

I love modeling sports and so on.

153

:

So I'm super happy to have you here.

154

:

And I have a list of questions that is

embarrassingly long.

155

:

But maybe can you tell us if you are

actually yourself using some Bayesian

156

:

methods, if you're familiar with those or

not?

157

:

And yeah, in general, what does that look

like in your work?

158

:

Yeah.

159

:

So yeah, I mean, just a quick background

about myself, right?

160

:

I've worked in sports, what we call sports

analytics for almost 10 years now.

161

:

Actually, I was getting my PhD

162

:

in statistics, and there was

this job opportunity at ESPN, you know,

163

:

which is a sports broadcasting television

channel in the U S and a few other

164

:

countries.

165

:

And, you know, I got the job offer to work

on their sports analytics team where

166

:

essentially what the team there does is

make forecasts so that, you know, they can

167

:

show on TV, you know, on the bottom line,

like who's expected to win, or they can,

168

:

we will run simulations on.

169

:

you know, who's likely to win the

championship, you know, all throughout the

170

:

season.

171

:

And so, you know, you can tell stories

with that saying, you know, the team was

172

:

just like the beginning of the season.

173

:

No one thought they were going to be any

good, but just look how it, you know, they

174

:

got better or the opposite.

175

:

Like they were supposed to be really good

and everything just went wrong.

176

:

And so in my field in sports modeling, I

would think actually you can't, you can't

177

:

do it without being Bayesian.

178

:

And so when I would interview people, I'd

always focus on, on those.

179

:

So as people coming out of school,

sometimes they don't always learn Bayesian

180

:

methods very well.

181

:

And the reason is in sports, sample sizes

are very small and you have to make

182

:

forecasts with very limited data.

183

:

And the great thing about Bayesian

statistics is that you actually have more

184

:

data.

185

:

You just haven't observed it.

186

:

You have expertise or you have opinions,

but those opinions actually matter.

187

:

And so maybe we'll get into this, but I'm

actually a very strong advocate because of

188

:

my field, of subjective Bayesian

analysis.

189

:

It's okay to insert some information into

your models and it usually makes them

190

:

better.

191

:

Yeah.

192

:

Well, awesome.

193

:

I couldn't have dreamt of a better answer. And,
full disclosure,

194

:

I didn't know Paul was going to answer

that because that's not really, I haven't

195

:

seen that in your, you know, on your

website or else,

196

:

So before, while preparing the episode, I

didn't know if you were already using

197

:

Bayesian methods or else.

198

:

But definitely, definitely happy to hear

that.

199

:

And so that people know that was not a

conspiracy.

200

:

I didn't know anything that Paul was going

to say.

201

:

OK, so that's awesome.

202

:

So I'm an open source developer, so I'm

always very curious about the stack you're

203

:

using.

204

:

What are you using actually when you're

doing Bayesian analysis of a sports model?

205

:

So in my career, I almost always use R and

Stan.

206

:

So if I'm doing Bayes analysis, I write a

lot of Stan code.

207

:

It's gotten easier with ChatGPT.

208

:

It doesn't do it all the way, right?

209

:

But if it's like, hey, I want to build

this kind of model, it'll at least give me

210

:

a good framework.

211

:

And then I can adjust it and edit it as I

want from there.

212

:

Yeah.

213

:

Yeah.

214

:

And I mean, for sure, you cannot go wrong

215

:

with R and Stan.

216

:

So yeah, definitely.

217

:

And we've had the, one of the creators of

Stan, Andrew Gelman, was back on the

218

:

podcast a few weeks ago.

219

:

It was not released yet, but through time

travel, it's gonna have been released when

220

:

your episode is out.

221

:

So folks, you can go back to - Right,

because I am definitely a lesser draw than

222

:

Andrew Gelman is, but that's great.

223

:

No, yeah, so if people are curious about

what Andrew has been up to lately, it's

224

:

the third time he's been on the show and

he just released a new book, Active

225

:

Statistics, that I definitely recommend.

226

:

It's really fun to read.

227

:

It's like, it's how to teach statistics

with stories, which actually relates to

228

:

something you just said, Paul, about the,

like, cool and fun way to relate

229

:

statistics to...

230

:

non -stats people was to be able to tell

stories about a team's probability of

231

:

winning or any forecast like that.

232

:

So that's definitely interesting to hear

you talk about that.

233

:

And actually I'm curious because I've been

following that field of sports analytics

234

:

for a few years and I've seen it

personally mature.

235

:

quite a lot and evolved quite a lot when

it comes to the technology and the data

236

:

availability.

237

:

So I'm curious what an expert like you

think about that evolution of technology

238

:

and data availability and how that changed

the landscape of sports analytics.

239

:

Yeah, I mean, it's exploded in the last 10

to 15 years.

240

:

So I mean, if people are familiar with the

book slash movie Moneyball, which is

241

:

about 20 years old now as a book.

242

:

The movie is about 12, 13 years old now.

243

:

you know, back then in baseball, baseball

was the sport that sort of took off in

244

:

sports analytics.

245

:

I mean, for a couple of reasons.

246

:

One, the game is very discrete.

247

:

So there are starting and stopping points.

248

:

So you can measure.

249

:

Right.

250

:

Discrete events very well in baseball, but

two, like they're the only sport that

251

:

actually had a really long running data

set.

252

:

And that went back and they've been

keeping statistics in baseball and you can

253

:

actually go back to the 1800s and find out

how people were playing baseball back then.

254

:

No other sport has that.

255

:

So that's, that's probably the reason why

baseball took off.

256

:

but since then, you know, every sport for

a while after that, every sport had what

257

:

we call play by play data, which is like,

this is what happens.

258

:

Soccer had a, a version that was called

event data.

259

:

So people would

260

:

watch a game and every time someone

touched the ball or made a pass, they

261

:

would mark, the ball was touched here on

the field and it was passed to there or

262

:

they dribbled from here to there.

263

:

So it was, they kind of were discretizing

soccer in a way to make it a similar

264

:

format.

265

:

But then about 10 years ago, we started

getting this player tracking data, which

266

:

is the location of everybody and the ball

or the puck on the field, you know,

267

:

depending on the sport, 10 to 25 times per

second.

268

:

And that's drastically changed.

269

:

the methodologies and things that are

used.

270

:

So, I mean, Bayesian analysis was great

for this play by play data or even, you

271

:

know, game by game data and measuring how,

how players or teams performed.

272

:

And then now we've started getting such

huge data sets that, you know, more of the

273

:

computer science world, neural networks,

things like that started becoming much

274

:

more prevalent in sports analysis just

because the data sets were so massive.

275

:

Not that statistics doesn't play a role.

276

:

It still does.

277

:

And I think.

278

:

People sometimes overly rely on these

black box methods.

279

:

They don't think about the implications or

the biases in the data, which are still

280

:

important.

281

:

But we have these huge amounts of data now

and it's just exploded to like, you know,

282

:

if you want all the data in a season in

the NFL, it's like over one terabyte of

283

:

locations of everybody on the field, every
play, 25 times a second.

284

:

It's just massive.

285

:

Right.

286

:

So it's, it's really changed the way

people have done things.

287

:

Right.

288

:

And we started going from really simple

questions to huge big questions.

289

:

And the funny thing is now, I actually

think with the data being so large, people

290

:

are now actually going back to answering

more simple questions.

291

:

Like we're not trying to measure

everything all at once.

292

:

Let's try to measure very specific things

that we weren't able to measure before.

293

:

Hmm.

294

:

Yeah, that is definitely interesting.

295

:

And so, first:

296

:

Is that availability of data, massive

availability of data, the case in all the

297

:

sports industry?

298

:

Or is it more, well, the most historical

ones, as you were saying, maybe more

299

:

baseball.

300

:

I know the data set are more massive there

and maybe other sports like soccer are

301

:

less prevalent, the data set are less

prevalent, less massive, or is that a

302

:

uniform trend?

303

:

First question.

304

:

And then second question is,

305

:

Where does that data live?

306

:

Is that mostly open source or is that

still quite closed-source data?

307

:

Yeah.

308

:

So I mean, baseball is usually like the

cutting edge of everything because they

309

:

had a head start.

310

:

And basketball and then like kind of

American football, international soccer

311

:

football and hockey kind of trail behind.

312

:

But the data sets now in all those sports

are very massive.

313

:

Hockey just got

314

:

The NHL just got their player puck

tracking data just a couple of years ago.

315

:

Now baseball and basketball have moved on

beyond just knowing where players are on

316

:

the field.

317

:

They actually have data of what's called

pose data.

318

:

So they know where different joints and

their arms and the legs are of every

319

:

player on the field or on the court.

320

:

So that data is massive.

321

:

It's massive everywhere.

322

:

There's companies that are trying to

collect new data based on

323

:

video, so they're using computer vision

algorithms to do that, but largely to

324

:

answer your second question.

325

:

This is not open source data.

326

:

So the old school data, the play by play

data is open source.

327

:

You can find that on every sport pretty

much via an open source mechanism now.

328

:

But this huge, these huge data sets of the

tracking of the players, you know, 10 to

329

:

25 times per second.

330

:

It's usually all closed source.

331

:

There are a few.

332

:

releases of that here and there, you know,

the NFL does a competition where they

333

:

release some of that data each year, like

a very small set.

334

:

and a few other leagues have done

something similar as well.

335

:

If they know that's, that's kind of gives

you a taste.

336

:

if you have money, there are companies

that try to create that data themselves

337

:

and they'll sell it to you.

338

:

But you know, that's usually pretty

expensive for an individual person to buy.

339

:

So again, just that.

340

:

I see.

341

:

Okay.

342

:

Yeah, interesting.

343

:

Definitely.

344

:

Because like data is kind of oil in our

industry, right?

345

:

So it's definitely interesting to know

what's the state of the supply of oil in a

346

:

way.

347

:

Maybe for people who are less versed in

sports modeling, can you give us an

348

:

example of how analytical insights have

349

:

directly influenced team strategy or

player selection in one of your consulting

350

:

roles.

351

:

Yeah.

352

:

So I mean, I'll just kind of talk broadly

at first.

353

:

I mean, so sometimes it's just the most

basic things, right?

354

:

So like in basketball, people shoot three

pointers more because all they did is

355

:

figured out the expected value was larger

for three point shot than it was for most

356

:

two point shots.

357

:

Not, not those layups and the dunks,

right?

358

:

Those are very high percentages.

359

:

So the expected value of a, of a high

percentage times two is, you know, is, is

360

:

pretty good.

361

:

But then even if.

362

:

The percentage drops off a lot when you

multiply it by three to get the expected

363

:

value of a three point shot.

364

:

You know, it's also pretty good.

365

:

So that means basketball has changed

drastically because of that.
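A quick back-of-the-envelope version of the expected-value argument Paul is making here, in R (the stack he mentions); the shooting percentages are made-up round numbers, not league data:

```r
# Toy expected-value comparison behind the three-point revolution.
# The shooting percentages are illustrative placeholders, not real league numbers.
p_layup    <- 0.62   # high-percentage two-point shot (layup or dunk)
p_midrange <- 0.40   # long two-point jumper
p_three    <- 0.36   # three-point attempt

ev <- c(layup = 2 * p_layup, midrange = 2 * p_midrange, three = 3 * p_three)
print(ev)   # points per attempt: layup 1.24, midrange 0.80, three 1.08
```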

366

:

and in my roles, I guess, you know, I

think in a lot of sports, there's just

367

:

been a lot of open questions.

368

:

People kind of move one way.

369

:

And then I think actually, I think the

sports analysis does really good job of

370

:

tackling very easy problems first.

371

:

But then I think there's actually a

tendency for the analysts themselves to be

372

:

overconfident in their analysis and

they're not factoring in all of the

373

:

sources of variation that might be there.

374

:

And something I'm also very curious about

it is what's your experience with non

375

:

-stats stakeholders?

376

:

So coaches, scouts, players, how do they

typically respond to the analytics and the

377

:

insights you provide and other...

378

:

differences in reception across sports,

maybe across roles.

379

:

Yeah.

380

:

So, I mean, it really does vary as in all

things, there's variance.

381

:

There are some typically younger, you

know, coaches or scouts that are a little

382

:

bit more receptive than people who have

been doing something for a long time.

383

:

And I think that's just human nature.

384

:

You're used to doing things a certain way.

385

:

You don't like.

386

:

You know, to stereotype, you don't like

some young person coming and telling you

387

:

how to do your job.

388

:

Right.

389

:

So you have to be really careful about

that.

390

:

and the, and the funny thing is, you know,

everything that I have learned or, you

391

:

know, I believe in, in terms of making

data driven decisions and don't

392

:

overestimate based on small sample sizes

goes out the window when I'm trying to

393

:

convince a stakeholder of something.

394

:

So for example,

395

:

If I have a model and I want them to use

it, and I think it's going to help them.

396

:

Of course, I've done the analysis to say,

you know, what over the long run, how it

397

:

would improve our efficiency, or if we

make a decision in this way, it'd be

398

:

better process, et cetera.

399

:

I've done that analysis and I've done it

over a larger sample size.

400

:

But when I, when I tell them what they

want to know is they want confirmation

401

:

bias, right?

402

:

They love confirming their beliefs.

403

:

So in order to get them to, agree with

what you're saying, it, this works so much

404

:

better than saying, you know, out of

the thousand players that I did this in,

405

:

you know, you only were correct 60 % of

the time, but my model would have been

406

:

correct 70%.

407

:

Like they don't want to hear that.

408

:

They essentially say, well, my model, you

know, you love this player.

409

:

So does my model.

410

:

I find the one guy, even if it's literally

only one person, they're like, yeah.

411

:

Like, if your model can.

412

:

If your model can see that, then it must

be doing something right.

413

:

And then it's like, then they start to

trust you a little bit.

414

:

And over time you give them little pieces,

little crumbs of a cookie that they can

415

:

help, you know, get confidence in.

416

:

And then, you know, then is when you share

with them, okay, well, but it's also

417

:

suggesting this, which is different than

what you've been doing in the past.

418

:

Right?

419

:

So you don't ever start with, you know,

trust me.

420

:

because you might be wrong, because you're

a human.

421

:

I mean, like, you know, humans always make

mistakes, but we usually don't think we

422

:

make as many mistakes as we do.

423

:

And so I found just over time is if you

get people to trust you by confirming

424

:

their prior held beliefs, right?

425

:

It's another Bayesian concepts.

426

:

If you can confirm their prior beliefs,

they're going to accept your future

427

:

recommendations or future things that the

model might suggest more than if you start

428

:

with.

429

:

the differences upfront.

430

:

And so that's like a little bit of human

bias, right?

431

:

That you have just learned over time.

432

:

And some things are just really hard for

people to accept, but over time, if you

433

:

get people to trust you and you build that

relationship, there's a lot of human

434

:

elements here and then they trust your

work by confirming their prior held

435

:

beliefs, then they'll trust you and open

up a little bit more to being a little bit

436

:

more open -minded about other things as

well.

437

:

Because then like, okay, well, I know

you're not an idiot.

438

:

Like you could speak my language some.

439

:

now I might be more open to learning a

little bit of your language.

440

:

And that's just sort of a human

relationship thing that you have to always

441

:

work on.

442

:

Yeah, that is very interesting.

443

:

And I'm very, yeah, I'm always very

interested to hear about that because I

444

:

also face clients daily and have to

explain models to them.

445

:

And so as you were saying, that definitely

varies a lot in interactions to the model.

446

:

But that piece of wisdom of maybe

indulging the...

447

:

the confirmation bias at the beginning and

then slowly go towards a bit more of

448

:

speaking the truth.

449

:

It's very interesting.

450

:

I had not thought of that, but that's

yeah, definitely I can see that being a

451

:

valid strategy when you also are in front

of someone who doesn't really understand

452

:

the value of the modeling, I would say.

453

:

Whereas when I

454

:

encounter clients who are already

convinced of what the models can do for

455

:

them.

456

:

They are usually looking for contradicting

what they already think.

457

:

And that's when they find the model

interesting.

458

:

So I find that really, really cool to see.

459

:

The contradictions are really where

there's value, right?

460

:

But there's no value in a model if no one

uses it, right?

461

:

Even if the model is really good, if no

one uses it, it has zero value.

462

:

If they use it, the contradictions are

valuable if they're right, correct?

463

:

So in soccer analysis, you know, I've

spent my career doing lots of different

464

:

sports, but there's this sort of, this

applies to every sport.

465

:

In basketball, we can call it the LeBron

test, and in soccer we'll call it the Messi

466

:

test, where it's essentially, if you build

a model and it's trying to evaluate

467

:

players and Messi is not like one of the

top players in your model, then.

468

:

You're not going to share it with anybody

because no one's going to believe you.

469

:

Right.

470

:

That's like the first thing everyone does

is like, okay, well, is Messi up top?

471

:

And if, like, Messi is near the top,

then like people, at least they'll listen

472

:

to you a little bit longer.

473

:

Right.

474

:

But they're not going to listen to you at

all.

475

:

If you're like, yeah, Messi is an okay

player.

476

:

Right.

477

:

Like I don't care what your model says.

478

:

Right.

479

:

That's wrong.

480

:

Right.

481

:

That's that, that's what people believe.

482

:

So it's like a little bit of like, I need

to feed you like, no, no, no.

483

:

Like I'm taking a different approach than

what you do, but you know, my approach

484

:

also thinks that Messi is the best.

485

:

Right.

486

:

And then I'm like, it's okay.

487

:

You know,

488

:

Okay, yeah, we agree.

489

:

He is really good.

490

:

Yeah, it's like a sniff test, right?

491

:

And it's like, in a way, it's like, well,

that's a strong prior.

492

:

And it's like, it's saying, well, I have a

very strong prior.

493

:

That Messi is really good.

494

:

To convince me, otherwise you're going to

need really, really good data.

495

:

It's like, well, the earth is very

probably somewhat round.

496

:

It's going to be very hard for you to...

497

:

move that prior from me and telling me

it's not, in a way.

498

:

Yeah.

499

:

And in sports, people have really strong

priors, right?

500

:

So, you know, those sniff tests do really

matter.

501

:

And as a modeler, even for myself, like,

I'm a human.

502

:

So like, I do the same thing.

503

:

If I'm building a model, I always want to

see the results.

504

:

And it's like, I don't look at the median,

like I do, but I don't look at who the

505

:

median result is in my model half the

time.

506

:

I usually look at the best and I look at

the worst.

507

:

And if I don't understand it, then I'm

like, maybe my model is doing something

508

:

wrong.

509

:

And I'm all like, gonna, I'm going to dive

in a little bit more.

510

:

If it like confirms my prior held beliefs,

I'm like, it's probably correct.

511

:

Right.

512

:

And even as a modeler, right, you have to

be careful of that.

513

:

But at the same time in sports, you know,

it's like I said, subjective analysis can

514

:

be helpful.

515

:

It's because people's subjective and I'm

like, there's wisdom.

516

:

People coaches have been playing a game

for.

517

:

20, 30, or coaching a game for 20 or 30

years to think that they don't have

518

:

something to offer a model is kind of

crazy in my opinion.

519

:

They might have biases and of course they

do, but their information that they can

520

:

provide is useful.

521

:

Yeah, definitely.

522

:

And that's where we go back to what we

were talking about at the beginning in the

523

:

value of Bayesian inference in that

context.

524

:

Because if you can leverage that deep and

hard-earned knowledge,

525

:

from the coaches, from the scouts, and add

that to your model, it's like getting the

526

:

best of both worlds.

527

:

And that can make your analysis extremely

powerful and useful, as you were saying.

528

:

Yeah.

529

:

And people have done studies like this,

I've done studies like this.

530

:

If you build a model just on the data and

ignore the human element, right?

531

:

Or if you build a model just on human and

scouting analysis and ignore the other

532

:

data.

533

:

Right.

534

:

Neither one of those is going to do as

well as when you combine both.

535

:

And that's really, that's what, you know,

that's Bayesian analysis is you're

536

:

combining subjective belief with objective

data and then making forecasts based on

537

:

them.

538

:

And we know that if you have priors that

are not really, really bad, a subjective

539

:

Bayesian forecast is going to have smaller

error than a data, you know, what we call

540

:

maximum likelihood forecast, right.

541

:

And stats terms, right.

542

:

Or.

543

:

You know, just the human one, just the no

data, but, you know, feelings forecast as

544

:

well, right?

545

:

So there's the combination of the two,

always does better.
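As a tiny illustration of that point about combining a subjective prior with limited data, here is a toy normal-normal example in R; all numbers are invented and it is deliberately simplistic, not anything from the episode:

```r
# Toy example: a scout's prior plus a small, noisy sample of games.
set.seed(42)
true_strength <- 0.30
prior_mean <- 0.20; prior_sd <- 0.20                  # subjective belief
games <- rnorm(5, mean = true_strength, sd = 0.50)    # five noisy observations

mle <- mean(games)                                    # data-only estimate
w   <- (1 / prior_sd^2) / (1 / prior_sd^2 + length(games) / 0.50^2)
bayes <- w * prior_mean + (1 - w) * mle               # precision-weighted combination

abs(c(prior = prior_mean, mle = mle, bayes = bayes) - true_strength)
# With this seed, the combined estimate lands closest to the true value.
```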

546

:

Yeah.

547

:

Yeah.

548

:

Yeah.

549

:

Preaching, preaching to the choir here for

sure.

550

:

And actually, I think that's a good time

now in the episode to get a bit more

551

:

nerdy, if we can, because I've seen you,

so you've obviously worked extensively

552

:

with.

553

:

soccer analytics and you have an

interesting soccer power ratings and

554

:

projections on your website that I'm gonna

link to in the show notes but can you tell

555

:

us about it and what makes these

projections unique in your perspective in

556

:

evaluating team and player performance and

don't be afraid to dig into the nerdy

557

:

details because...

558

:

My audience definitely liked that.

559

:

Yes.

560

:

Sure.

561

:

I'll dig in.

562

:

So what's on my website is...

563

:

Sorry if you can hear my dog there.

564

:

What's on my website is perhaps the most

simple power ratings forecast that I've

565

:

ever done.

566

:

So I say that, not that it's like stupid

or anything.

567

:

So when I was at ESPN, I build power

ratings in American football, both

568

:

professional and collegiate, and

basketball, professional and collegiate.

569

:

and hockey, I mean, like almost every

sport, right?

570

:

So what's on my website, I'll explain the

model very simply: it's a Bayesian model

571

:

where you have an effect for each team,

right?

572

:

And the response variable is the expected

goals for each team.

573

:

So usually when we do a power ratings and

we're trying to estimate for a team, you

574

:

know, there's two sort of.

575

:

things that we're trying to estimate their

offensive ability and their defensive

576

:

ability and then you assume essentially

that their overall team ability, you know,

577

:

if it's a linear model, right is the

combination of their offense and their

578

:

defensive abilities.

579

:

Okay, so you so essentially in each match,

right?

580

:

You have essentially two rows of data

where you have the expected goals for the

581

:

one team and then the expected goals for

the other and the reason we use expected

582

:

goals, although I actually have

583

:

lot of issues with the expected goals.

584

:

They are a better indicator of how, how

good the team performed on offense than

585

:

just the raw number of goals.

586

:

And right.

587

:

I don't need to go into details, right?

588

:

It's essentially a, it's an expected value

as opposed to an observation from a

589

:

Poisson distribution, which soccer scores

roughly, roughly reflect a Poisson or

590

:

pretty close to a Poisson distribution,

right?

591

:

The expected goals is that expectation.

592

:

And so essentially I have a hierarchical

Bayesian model where I actually.

593

:

I actually do a few things.

594

:

So I actually assume the expected goals is

the mean of a Poisson distribution.

595

:

The observed goals is the actual outcome

of the Poisson distribution.

596

:

And then I fit a linear model essentially

where I look, okay, I have team A was on

597

:

offense, team B was the opponent.

598

:

And this was team A's expected goals.

599

:

And I'm essentially fitting a regression

model, right?

600

:

A Bayesian regression model where I have

individual team effects.

601

:

I have a prior on each team.

602

:

each team's offense and each team's

defense.

603

:

And that prior, you know, rough, I don't

have to get too crazy.

604

:

You know, I just use a normal distribution

and, and, you know, sometimes I actually,

605

:

when I code in Stan, I actually like

using, distribution was a little, a little

606

:

bit thicker tails.

607

:

But I think for this model, I was just

trying to go simple, normal distribution

608

:

prior with a mean, you know, for my

expected, essentially each team's expected

609

:

goals per game, on offense versus.

610

:

defense, right. And the defensive value, I
usually do the subtraction.

611

:

So it's the offensive team minus the
defensive team, and that way the

612

:

defensive team's value is higher if
they're a good defense. So essentially, if

613

:

team A's, you know, expected goals in a
game against an average opponent

614

:

is like 1.5, and the defense's average
expected goals allowed in a game was, you know,

615

:

1.4,

616

:

then you would say, the difference is like

0.1, okay.

617

:

I also include effects for being at home

in this model.

618

:

I think, actually, I think that's all I

do.
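For listeners who want to see roughly what a model like this looks like in code, here is a minimal sketch in R with brms. It is one possible way to write it, not Paul's actual implementation: the data layout, variable names, fake data, and default priors are all assumptions, and he writes his models directly in Stan.

```r
library(brms)

# One row per team per match (so two rows per match), with assumed columns:
#   goals   - goals scored by `attack` in that match (xG could be swapped in)
#   attack  - attacking team
#   defense - opposing, defending team
#   home    - 1 if the attacking team was at home, 0 otherwise
set.seed(1)
teams    <- paste0("team_", 1:8)
fixtures <- expand.grid(attack = teams, defense = teams,
                        stringsAsFactors = FALSE)
fixtures <- subset(fixtures, attack != defense)
fixtures$home  <- rbinom(nrow(fixtures), 1, 0.5)
fixtures$goals <- rpois(nrow(fixtures), lambda = 1.4)   # fake scores for the demo

fit <- brm(
  goals ~ home + (1 | attack) + (1 | defense),   # team offense and defense effects
  data = fixtures, family = poisson(),           # observed goals as a Poisson outcome
  chains = 2, iter = 1000
)

ranef(fit)   # per-team offense and defense ratings; here a strong defense shows
             # up as a negative coefficient rather than via Paul's subtraction
```

A matchup forecast then comes from the linear predictor for each side of a fixture, and simulating from the two resulting Poisson means gives win, draw, and loss probabilities.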

619

:

But in other models I've done, you can

look at things such as how much rest

620

:

they've had since their last match.

621

:

You can look at the difference between

each team's rest.

622

:

And those are not linear effects, right?

623

:

You have to do some sort of nonlinear

effects for that, right?

624

:

Because like one day of rest is, two days

of rest is not,

625

:

Like the difference between two days of

rest and one day of rest is very different

626

:

than seven days versus eight days of rest,

right?

627

:

Seven and eight days of rest are pretty

much the same thing, but two and one is

628

:

very different, right?

629

:

Like much bigger effect for having two

days of rest than just one day of rest.
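One simple way to encode that kind of diminishing rest effect, as an assumption for illustration rather than Paul's formula, is a saturating transform (or a spline term such as s(rest_days) in mgcv or brms):

```r
# Saturating rest effect: the jump from 1 to 2 days of rest moves the
# covariate much more than the jump from 7 to 8 days.
rest_days   <- 1:10
rest_effect <- 1 - exp(-rest_days / 3)
round(data.frame(rest_days, rest_effect), 2)
```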

630

:

And so you can do things like that, or how

far away they had to travel, those sorts

631

:

of things.

632

:

Now in European soccer, that's not a huge

deal, because especially in the

633

:

competitions within each country, no team

is traveling that far.

634

:

But in American sports, it is a pretty big

deal.

635

:

Like, you know, you, you have to fly five,

six hours across the country on short

636

:

notice.

637

:

Like that can, that can really affect

performance.

638

:

and, and other things, like I said, I

don't have this in the soccer model, but

639

:

I, if anyone's interested in modeling

sports outcomes, that people typically

640

:

tend to overlook is the, I liked always a

big proponent of elevation, meaning that

641

:

if there are certain sports where there

are certain teams that play at higher

642

:

altitudes,

643

:

And if you're not used to playing at

higher altitudes, it's actually a very

644

:

noticeable effect in a model that you're

going to have a lower offensive output and

645

:

you'll actually allow more points on the

other end due to fatigue.

646

:

And so the United States, it's the teams

that are playing in Colorado and in Utah.

647

:

But in Europe, it could be the teams that

have to go to Switzerland or the teams

648

:

that have to go to some of these alpine

regions that are higher up in altitude.

649

:

In Mexico, if you have to go to Mexico

City, it's extremely high.

650

:

Or Colombia, right?

651

:

I mean, depending on what you're doing,

these are very high altitude places that

652

:

have shown to have a measurable impact on

an opponent's performance.

653

:

Yeah, that's very fun.

654

:

My God, I love those kind of models.

655

:

That's so much fun.

656

:

And I would also guess that, I mean, at

least my prior would be that there is a

657

:

reverse mechanism also for teams who are

used to playing at altitude.

658

:

Do they get a boost of performance when

they play closer to sea level?

659

:

Because they could have had adaptations

that make them better when they go to the

660

:

sea level.

661

:

Yeah.

662

:

I mean, I think there's certainly science

behind that.

663

:

I found that is a lot harder to show in a

model than the reverse.

664

:

Not that it might not be there, but I

think the effect size, if it is there, is

665

:

definitely smaller than the reverse.

666

:

Yeah.

667

:

That's what...

668

:

That's what I would expect to like.

669

:

I think the effect is here mainly because,

well, I've seen it.

670

:

Like it seems to be pretty well seated in

the science literature, but that doesn't

671

:

mean the effect is big.

672

:

So yeah.

673

:

Yeah.

674

:

I mean, I'm a runner and I know that all

of the distance runners that are training

675

:

for marathons that are elites and

professionals, they all train at higher

676

:

altitudes, right?

677

:

For the...

678

:

six weeks leading up to a competition and

then they travel to the competition at a

679

:

lower altitude.

680

:

And, you know, they think they have an

oxygen performance boost due to that.

681

:

Yeah.

682

:

Yeah.

683

:

Kind of like legal oxygen doping, legal

blood doping.

684

:

Yeah.

685

:

Yeah, exactly.

686

:

Yeah.

687

:

Yeah.

688

:

I mean, I think it seems to be pretty much

proven.

689

:

I would say maybe it has more of an impact

on individual sports like marathon running

690

:

or else, because it's more like, you know,

it's just like,

691

:

Even if you're winning just a few tenths

of a second, well, it can help you have a

692

:

better time in the end because, well, at

this level, just having the smallest

693

:

increase in performance could be the

difference between first and second place.

694

:

But maybe that's harder to see such a

small effect on a collective spot, a

695

:

collective game because, well,

696

:

Maybe there are some...

697

:

Maybe it's just not an addition.

698

:

Maybe it's actually the effect cancel out.

699

:

So in the end, you don't really see a big

effect.

700

:

But that would be...

701

:

Yeah.

702

:

I'd love to do an experiment on that.

703

:

Like an RCT.

704

:

That would be so much fun.

705

:

Yeah.

706

:

Well, good luck trying to do experiments

in sports.

707

:

It's hard.

708

:

Yeah, I know.

709

:

I know.

710

:

But that...

711

:

I mean, if the multiverse exists...

712

:

Then there is a universe where we can do

that kind of experiments.

713

:

And my god, these scientists must have so

much fun.

714

:

And yeah, so thanks a lot, first, for

detailing the model that clearly and in so

715

:

much details.

716

:

That's super cool.

717

:

So the results of the model are in a cool

dashboard on your website.

718

:

Do you have the model and data available

freely, maybe on your GitHub, that we can

719

:

put in the show notes?

720

:

Yeah, I'm not sure.

721

:

I think... I don't know if the model is on
my public GitHub.

722

:

It's on GitHub.

723

:

I don't know if it's private or not, but I

can let you know.

724

:

You know, I use actually open source data

for that.

725

:

So I, I, let me double check.

726

:

I can actually double check and get back

to you after the show on if, yeah, if I

727

:

could have it in my public GitHub or not.

728

:

So, yeah.

729

:

Yeah.

730

:

But essentially it uses the, there's a

package called world football R and.

731

:

It uses data from there to build the

model.

732

:

So some of that data is just from, it's

scraped from like transfer market.

733

:

so I use, I use, I didn't really talk

about how I set prior means for each of

734

:

the teams, but very, a very simple, very

simple, hierarchical model is essentially

735

:

just to use the expenditures of the club

and use that as a prior mean for how good

736

:

the club will be going into the season.

737

:

And, and.

738

:

Unlike some other sports in soccer, world

football, how much a club spends is very

739

:

highly correlated with how successful they

are, which makes sense, but it's not true

740

:

necessarily in like baseball.
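A sketch of the spend-as-prior idea in R; the clubs, spend figures, scaling factor, and prior spread below are all invented for illustration, not Paul's actual values:

```r
# Map each club's wage bill into a prior mean for its strength parameter,
# so bigger spenders start the season with a higher expected rating.
club_spend <- c(club_a = 210, club_b = 190, club_c = 60, club_d = 35)  # fake, in millions

z          <- scale(log(club_spend))[, 1]   # standardized log-spend
prior_mean <- 0.15 * z                      # assumed scaling factor
prior_sd   <- 0.25                          # same pre-season uncertainty for every club

round(prior_mean, 2)
# These means would seed each team's offense and defense effects before any
# matches of the new season are observed.
```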

741

:

So, do you see these effects of budget?

742

:

So, yeah, first, before I go on a follow

up question, yeah, for sure.

743

:

Get back to me after the show.

744

:

And if that's possible, we'll put that in

the show notes because I'm sure.

745

:

A lot of listeners will be interested in

checking that out.

746

:

I personally will be very interested in

checking that out, definitely.

747

:

So that'd be awesome.

748

:

And second, that effect of budget that you

see on the performance of a team.

749

:

And so I guess in football performance

mean number of expect expectation of games

750

:

won.

751

:

Do you see that on Curse?

752

:

Do you see?

753

:

that much of an effect also in a closed

league system like the MLS?

754

:

Or is that so because my prior would be

the effect of budget would be even

755

:

stronger in open leagues like we have in

Europe because it's like there is no

756

:

compensation mechanism, right?

757

:

Clubs can go down and usually in Europe

the strongest clubs are the historical

758

:

clubs.

759

:

or the new clubs are just the ones that

were lucky to be bought by very, very

760

:

healthy shareholders.

761

:

And like, there is not a lot of switching

of the hierarchy and changing of the

762

:

hierarchy, mainly because of budget, as

you were saying.

763

:

But I would think that maybe the effect of

budget is less strong in a closed league

764

:

like the MLS.

765

:

Is that true?

766

:

Is that something you see or is it

something that's still in the air?

767

:

Yes.

768

:

So I haven't looked specifically at the

MLS, but in general in American sports,

769

:

which all have closed leagues, the budget,

well, for various reasons, the budget

770

:

effects are not super strong.

771

:

So, you know, in American baseball, there

is no spending limit.

772

:

So in some American sports, like the NFL

and football, like there's a salary cap,

773

:

meaning you can't spend more than a

certain amount.

774

:

So there is no relationship between

overall spending and winning because

775

:

everyone has to spend a minimum and

there's a maximum.

776

:

In baseball, there is no limit.

777

:

There's a tax.

778

:

If you spend too much money, they do tax

you.

779

:

But there's still not a huge correlation.

780

:

And then in MLS, like I said, I'm not

entirely sure.

781

:

Most of the clubs, they are constrained

about how much they can spend.

782

:

And so there isn't as much variance also

in spending.

783

:

So like, you know, Messi going to Inter

Miami, it wasn't that Inter Miami could

784

:

pay him a lot of money.

785

:

They actually, you know, there's a couple

of exemptions that an MLS club could use

786

:

to pay an international player.

787

:

They have, they're called, you know, a

couple of exemption players they have.

788

:

And that's originally started when David

Beckham went to Los Angeles and they kind

789

:

of made that rule essentially just so he

could, they could afford paying him what

790

:

he was used to or close to what he was

used to being paid in Europe.

791

:

and, and the MLS is still kind of the

case.

792

:

You have one or two players you're allowed

to have on these exemptions and.

793

:

The way Messi was able to make it work is

he's getting paid from Apple for his

794

:

Apple's broadcasting the MLS games.

795

:

So they're paying him essentially to play

in the MLS because they're hoping, more

796

:

people are going to watch our broadcasts

are going to pay us.

797

:

And so we're going to give you a

percentage of that.

798

:

And that's where actually a lot of his

salary or like his earnings are coming

799

:

from is from a, a deal with Apple versus

the actual MLS club in Miami, which can

800

:

only pay him so much.

801

:

So my guess is, my prior is, I haven't

looked specifically at the MLS with this,

802

:

but my prior is yes, that there isn't a

huge relationship in the MLS between

803

:

winning and spending just because there's

not much of a variance.

804

:

In order to see those correlations, you

have to have a large enough variance in

805

:

the spending to notice the relationship,

right?

806

:

So.

807

:

Yeah, definitely interesting.

808

:

I mean, I love also looking at these, you

know, the...

809

:

how the structure of a league impacts the

show and the wins is extremely

810

:

interesting.

811

:

That can seem very nerdy and I think

that's my political science training that

812

:

kicks back here, but really how you

structure the game also makes the game

813

:

what it is and the results and the show

you're going to get.

814

:

I find that extremely interesting to see

how the American games, the US games are

815

:

structured.

816

:

Because ironically, it's a system where

there is much more social transfers, if

817

:

you want, like we have in Europe for

social security and health and education.

818

:

American sports are socialist, and

European sports are capitalist.

819

:

But typically, we consider Americans to be

more capitalist and the Europeans to be

820

:

more socialist.

821

:

So it's an interesting inversion.

822

:

Yeah.

823

:

No, definitely.

824

:

And I mean, I think...

825

:

Honestly, that's going to be interesting

in the coming years to see what's

826

:

happening on the European side because

there are more and more debates about

827

:

whether we should have a closed European

wide league, which would basically be an

828

:

extension of the current Champions League.

829

:

And honestly, I think it's going to take

that road because more and more

830

:

championship, at least all the

championship, I would say, for the

831

:

exception of the Premier League.

832

:

get more and more concentrated on just a

few clubs.

833

:

And just from time to time, you have one

club that bumps onto the top, like

834

:

Leverkusen this year in Germany, Monaco in

France a few years ago, Montpellier.

835

:

But that's like really exceptions.

836

:

And in the end, you almost always get the

same clubs that win all the time.

837

:

And so the idea of open leagues is not

really true for the top of the leagues.

838

:

It's definitely true for the bottom, but

the big clubs never go down.

839

:

And...

840

:

And so I think at some point, this

illusion of the open leagues is going to

841

:

disappear and probably we'll get a

European wide championship where like

842

:

basically the leagues are going to get a

bit more even because I think it's better

843

:

for the show and that's going to make more

money.

844

:

And in the end, I think that's what the

question is also.

845

:

Yeah, you might be right, but I hope, I

hope not.

846

:

I really, as an American, always have

dreamed of Americans doing relegation and

847

:

promotion just because...

848

:

You know, in America, we have this problem

where we call it tanking, right?

849

:

Because we have the socialist draft system

where if the worst teams are incentivized

850

:

to lose because they know they're not

going to win.

851

:

So they want to get the best possible

players in the draft the next season.

852

:

And so they're incentivized, you know, to,

to lose a little bit more.

853

:

And so that really does kind of, you know,

the promotion relegation is nice because

854

:

it solves that, you know, if you keep

losing, you lose a lot of money because

855

:

you get sent down.

856

:

so everyone's motivated even at the bottom

of each league to keep winning games,

857

:

right?

858

:

As much as possible.

859

:

Otherwise they lose a lot of money.

860

:

And in American leagues with the closed

system, it's like, well, Hey, you know,

861

:

it's actually, we talk about it being cyclical.

862

:

And one thing that sports analytics
people have done is essentially say,

863

:

it's really hard to go from an American

sport being an average team to a really

864

:

good team.

865

:

And the reason is.

866

:

is the draft system.

867

:

So in the draft system, people are always

overconfident in how good the players are,

868

:

but there's really thick right tails of

how good a player can be.

869

:

So when you get a new player who's young

and you can draft them at the top of the

870

:

draft, they might not pan out, but they

also have a really thick right tail,

871

:

meaning that if they do pan out, you could

go from being one of the worst teams to

872

:

one of the best teams really quickly.

873

:

And so,

874

:

You know, it's this other analysis of

like, well, if you don't ever have an

875

:

option opportunity to draft someone in a

position where there's that right tail,

876

:

where, you know, once out of every five

years, you get a player who's transcends

877

:

everyone else that comes in, then you

can't move up from average to really good,

878

:

but you can go from being bad to really

good.

879

:

So often teams and the smarter teams, if

they're really good, they say really good.

880

:

But once they start noticing the players

are getting older, they just trade

881

:

everybody away.

882

:

They get rid of all their best players and

they just stink for a year or two and

883

:

hopefully they can get some good draft.

884

:

They get a lot of draft picks.

885

:

Essentially.

886

:

They try to trade their players away, get

more draft picks, and then it becomes a

887

:

sample size problem.

888

:

And it says, well, if we have more draft

picks, our probability of getting someone

889

:

on the right tail goes up.

890

:

And so that's all we're going to do is

we're just going to increase our odds of

891

:

getting that right tail player.

892

:

And if we get that player, then we'll be

good again.

893

:

Yeah.

894

:

Yeah.

895

:

It's like.

896

:

buying a lot of lottery tickets.

897

:

Yeah, that's what they're doing.

898

:

Yeah, now that's fascinating.

899

:

Yeah, I wasn't aware of these effects.

900

:

That's super interesting.

901

:

Because basically, what you're saying is

there is an incentive to be extreme,

902

:

basically.

903

:

Either you want to be among the top ones

or you want to be among the worst ones.

904

:

But being in the middle is the worst,

actually.

905

:

It is the worst.

906

:

Yeah.

907

:

Yeah.

908

:

That is extremely interesting.

909

:

And that's...

910

:

Yeah, I mean, I actually don't know which

system I prefer.

911

:

Honestly, I'm just saying I think Europe

is getting, is going there because we have

912

:

more and more basically concentration of

the wealth at the very top of the leagues

913

:

and that's going to make the national

leagues less and less interesting

914

:

basically.

915

:

But I don't know either if I prefer the

European wide championship.

916

:

Well, I think I would prefer European wide

championship.

917

:

for sure, but I think it would be great to

have it still open.

918

:

So where you could have, you know, like

basically countries would become regions

919

:

and then you get from like, if you, if

you're in the best in France, basically in

920

:

one year, then you get to the highest

level, which is the European one.

921

:

And then if you're among the worst, you

get down to your country the next year.

922

:

I think that would be very fun because

the, like, especially now that players can

923

:

be traded very easily between the, the...

924

:

continental Europe because it's basically

the same country legally.

925

:

That also makes sense that the teams, you

know, basically meeting PSG versus

926

:

Barcelona is much more tied than PSG

versus literally any team in France.

927

:

So yeah, that's going to be very

interesting.

928

:

But at the same time, I'm very, yeah, I

love hearing about the wrong incentives.

929

:

at the same time of the closed system.

930

:

So thanks a lot for that.

931

:

That's food for thought.

932

:

And that's again, like that's very close

to two elections, actually, like how you

933

:

count the votes impacts the winner.

934

:

And so here, like really in sports to how

you structure your game has an impact on

935

:

the winners.

936

:

And I think it's extremely important to

keep in mind because in the end, like how

937

:

the

938

:

the organization, so the MLS in the US or

the UEFA in Europe have actually huge

939

:

power over the game.

940

:

Well, thanks for that political science

parenthesis.

941

:

I wasn't expecting that, but that's

definitely super interesting.

942

:

To get back to the modeling because time

is running by and I definitely want to ask

943

:

you about the plus minus models because

you're using that also to...

944

:

estimate player value in American

football.

945

:

So I'm curious about that.

946

:

What is that kind of model?

947

:

Is that mainly for American football that

you're using that also for other sports?

948

:

Or if it's only for American football, why

is that particularly tailored to that

949

:

sport?

Yeah. So plus-minus models actually originated in basketball, and they work best in basketball. They're not perfect. The concept in basketball is that you have 10 players on the court at each moment, and they substitute in and out. But while those 10 players are on the court, you know how many points are scored for each team, right? So, five players on the offensive side and five players on the defensive side. It's essentially just a big linear model, and you want to adjust for how long they're on the court, or how many possessions they were on the court for. So you can say, okay, these 10 players are on the court for two and a half minutes, and in those two and a half minutes this team scored six points and the other team scored four points. What you're doing then is essentially a plus-minus model. Sometimes you might see, in a statistic after the game, the total difference in net points for the team when a player was on the court versus when they're not. Well, that's not too useful, because there are a lot of correlations, right? You're playing with someone else a lot. So what we call an adjusted plus-minus model is a linear model that then tries to fit those player effects: you get a one when you're on the court on offense and a negative one when you're on defense. And we look at your team's efficiency, right? Your points divided by some denominator, whether it's minutes or possessions. That's sort of the basketball thing. Over time, they realized, okay, there's so much correlation between who is playing together, we need to adjust for that. So they used ridge regression, and that would divvy up the credit a little bit better. Ridge regression is very good when there's a lot of multicollinearity, or correlation between two effects, right? And on a basketball team, or across all basketball players, you have teammates that play a lot together and don't play with other people a lot. But ridge regression has done a decently good job in basketball, over a big sample, of estimating how effective players are.
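To make that adjusted plus-minus setup concrete, here is a minimal sketch in Python, with entirely made-up data and names (this is an illustration of the idea described above, not Paul's code): each stint where the same ten players are on the court becomes one row, players get +1 on offense and -1 on defense, the response is the net points per possession, and ridge regression shrinks the highly correlated player coefficients.

```python
# Minimal adjusted plus-minus sketch (hypothetical data, not a real NBA dataset).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
n_players, n_stints = 300, 5000

# Design matrix: one row per stint, one column per player.
# +1 while on the court for the offense, -1 for the defense, 0 otherwise.
X = np.zeros((n_stints, n_players))
for i in range(n_stints):
    on_court = rng.choice(n_players, size=10, replace=False)
    X[i, on_court[:5]] = 1.0   # offensive unit
    X[i, on_court[5:]] = -1.0  # defensive unit

possessions = rng.integers(2, 30, size=n_stints)  # stint lengths, used as weights
y = rng.normal(0.0, 0.2, size=n_stints)           # placeholder: net points per possession

# Ridge = L2 shrinkage toward zero, i.e. a Normal(0, tau^2) prior on every player effect.
apm = Ridge(alpha=2000.0, fit_intercept=True)
apm.fit(X, y, sample_weight=possessions)
player_effects = apm.coef_  # larger = more net points per possession while on the court
```

Weighting by possessions plays the role of the "denominator" Paul mentions; the alpha value controls how hard low-minute players get pulled toward zero.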

And if you look at these things, you'll see, we talked about the sniff test: in 2012, LeBron is the number one player. And he's the number one player for a lot of those years, not so much anymore because he's older, et cetera. Those are the sniff tests we get. Well, some people in basketball, and I'm a proponent of this, since this is a Bayesian podcast, point out that ridge regression, for those unfamiliar, is a frequentist way to write a very specific Bayesian model, one where you have a normal prior on each player with a mean of zero. That's ridge regression. So we can think about adjusted plus-minus models from that perspective. What happens when you have a normal prior with mean zero is that for players that play less, we shrink more towards the prior mean, and it's only when we have more data for a player that we can deviate from that prior mean. Well, one thing we know about sports is that if you're not playing as much, that actually is pretty useful information. What does that tell us? You're not very good. Because if you're good, you're going to play more, and if you're bad, you play less. So other people have come around in the last 10, 15 years and said, okay, instead of a ridge regression model for basketball, we should do a Bayesian regression model, and instead of having a mean of zero for a player, we should have a mean of something else. There are a few different versions people have done. A very simple version is to say everybody has a prior mean of what we call a replacement player, someone that doesn't play very much. If you're really good and you play a lot, it doesn't matter too much what the prior mean is, because the data is going to overwhelm the prior. But if you don't play very much, we're going to stick with that sort of negative prior mean, because it means you're below average. And so that's one thing you can do.
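As a worked illustration of that equivalence (a sketch under stated assumptions, not Paul's model): with a Gaussian likelihood, a Normal(mu0, tau^2) prior on each player effect gives a closed-form posterior mean. Setting mu0 = 0 recovers ridge regression exactly, with penalty alpha = sigma^2 / tau^2; setting mu0 to a below-average "replacement level" means low-minute players get pulled toward that negative value instead of toward league average. The design matrix, response, and the -1.5 replacement level below are all invented for illustration.

```python
# Bayesian linear regression with a non-zero prior mean (illustrative sketch).
import numpy as np

def posterior_mean(X, y, mu0, tau2, sigma2):
    """Posterior mean of effects under beta ~ Normal(mu0, tau2 * I)
    and y | X, beta ~ Normal(X @ beta, sigma2 * I)."""
    p = X.shape[1]
    precision = X.T @ X / sigma2 + np.eye(p) / tau2
    rhs = X.T @ y / sigma2 + mu0 / tau2
    return np.linalg.solve(precision, rhs)

rng = np.random.default_rng(0)
p = 300
X = rng.normal(size=(1000, p))            # placeholder design matrix
y = rng.normal(size=1000)                 # placeholder response

# mu0 = 0 for every player       -> exactly ridge regression with alpha = sigma2 / tau2.
# mu0 = "replacement level" < 0  -> players with little data shrink toward below-average,
#                                   not toward league average.
replacement_level = np.full(p, -1.5)      # hypothetical units, e.g. points per 100 possessions
beta_hat = posterior_mean(X, y, mu0=replacement_level, tau2=4.0, sigma2=100.0)
```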

A more sophisticated thing people will sometimes do is build a hierarchical model, where you essentially have a prior mean that is based on other statistics that we observe, like how many points you score or how many assists you have. That's called a box score prior mean, or box score plus-minus. So that's the basketball side; that gives you the what of plus-minus models and the basketball approach. Now, basketball is really nice because you have lots of games in the NBA, you play every team at least twice, you substitute a lot, and there's lots of scoring. My work in American football tried to address a lot of these issues. In American football, you don't play every team, you don't substitute very much, and if you do play, you only play with certain people, like, all the time. And then there's not a lot of scoring compared to basketball. There's some scoring, but American football point scoring is unique, right? You get six or seven points for a touchdown, you get three points for a field goal, and then on rarer occasions you get these two-point safeties. So there are roughly maybe 10 scoring events in an American football game, versus basketball, where teams score a hundred points or more; at two to three points each, that's 80 to 120 scoring events in a basketball game. So these models work a lot better there. My work in American football has been about how we take the basketball model and make some modifications so we can do a football model. One of the things that is tricky in football is that certain positions never get substituted out. On offense, the quarterback plays every single play unless they're hurt or they stink, so they get benched. Well, the quarterback also always plays with the same offensive line, as long as they're healthy, and they don't get substituted out. So how does a model separate credit when the same players are on the field all the time? My work there was to use Bayesian statistics and take the Bayesian regression model where we had a prior mean (I used some information to inform the prior mean for each player), but I also did a unique thing with how I shrink. The prior variance is actually a function: there's one prior variance for all players, and then it's multiplied by another parameter, which is unique to the position that they play. And so quarterbacks have a different shrinkage parameter, essentially, or prior variance, than a different position.
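Here is a PyMC sketch of that kind of position-dependent shrinkage structure, one global prior scale multiplied by a per-position factor. Paul's actual model is written in Stan and is not reproduced here; every name, dimension, and value below is invented for illustration.

```python
# Position-dependent shrinkage on player effects (illustrative sketch, not Paul's Stan model).
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
n_plays, n_players, n_positions = 2000, 400, 8

# +1 if a player is on the field for the offense on that play, -1 for the defense, 0 otherwise.
X = rng.choice([-1.0, 0.0, 1.0], size=(n_plays, n_players), p=[0.03, 0.94, 0.03])
position_of = rng.integers(0, n_positions, size=n_players)  # which position each player plays
prior_mean = np.zeros(n_players)                            # could be informed by other stats
epa = rng.normal(0.0, 1.0, size=n_plays)                    # placeholder expected-points-added values

with pm.Model() as apm_football:
    sigma_global = pm.HalfNormal("sigma_global", 1.0)                 # shared prior scale
    pos_factor = pm.HalfNormal("pos_factor", 1.0, shape=n_positions)  # per-position multiplier
    # Player effects: shared scale times the factor for that player's position,
    # so positions like quarterback can be shrunk less than others.
    beta = pm.Normal("beta", mu=prior_mean,
                     sigma=sigma_global * pos_factor[position_of], shape=n_players)
    noise = pm.HalfNormal("noise", 2.0)
    pm.Normal("epa_obs", mu=pm.math.dot(X, beta), sigma=noise, observed=epa)
    idata = pm.sample(1000, tune=1000, chains=2)
```

The posterior on pos_factor is exactly the "how much should each position be shrunk" quantity discussed next: positions with a larger factor are allowed to deviate more from their prior mean.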

And then, instead of just looking at scoring plays in football, we have what we call expected points added. At each play, we look at, on average, how many points you are going to score if you have the ball in this position, and I look at the difference between two plays, right? That tells you essentially how much value you got from the result of the play. So instead of using every scoring play, I just use every single play in football.
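The expected points added (EPA) idea fits in a couple of lines: if EP(state) is the average number of points the offense will eventually score from a given game state (down, distance, field position, and so on), a play's value is the change in expected points it produced. The helper below is only a sketch; in practice the EP function is itself a model fit from historical play-by-play data, and the example numbers are hypothetical.

```python
# Expected Points Added for a single play (illustrative sketch).
from typing import Callable, Optional

def play_epa(ep: Callable[[dict], float], before: dict, after: dict,
             points_scored: Optional[float] = None) -> float:
    """EPA = expected points of the resulting state minus the starting state.
    If the play scored, the resulting value is simply the points actually scored."""
    ep_after = points_scored if points_scored is not None else ep(after)
    return ep_after - ep(before)

# Hypothetical usage: a completion moving the offense from a state worth about 2.1
# expected points to one worth about 4.0 would be credited roughly +1.9 EPA.
```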

And I do this unique shrinkage dependent on position. Doing that, it's a huge model. I did this in college football, which has way too many parameters, because there are like 16,000 kids. But even in the NFL I've done this, and you get interesting results. Sometimes they match up with what you think, sometimes they don't. But the interesting thing is that you can actually estimate how much you should shrink each position. So the model is nice because it essentially tells you how much of the variance in the outcome of the play depends on how good players are across different positions. In football, we all know that quarterbacks are the most impactful position in the game. I did give somewhat subjective priors, but I still left a lot of uncertainty around them, and the model could very well see and estimate that quarterbacks are in fact the most important position, because you shrink them the least; they have the largest variance. So you can look at that: if you look at the most impactful players in football, it should be a quarterback. But by the same measure, the worst players in football are also quarterbacks, because to really hurt your team a lot, you have to be a quarterback. I mean, at every position you can hurt your team, but no one can hurt a team as much as a bad quarterback hurts their team, just like a good quarterback can help their team more. So that's a kind of rough overview of my plus-minus modeling in football. When I wrote the paper, I had a version of that written in Stan. The data set itself was not public, but I did have a version of the Stan model written and uploaded on my GitHub that you can look at. It's pretty massive. In recent years I've tried to expand it and do a state-space-model type version, so I have effects for each player for each season over time.

Yeah, that was exactly what I meant.

Computationally, that gets a little bit trickier. And for my dataset, I was actually able to scrape some data for that, and then, actually, I can't anymore; the NFL just stopped releasing that. So that work is on hold for now. I probably need to find a graduate student who can help me finish it.

Yeah, definitely, we should put that in the show notes, that's super interesting: your paper and the link to the GitHub repo, for sure. And that makes me think of a recent episode I did, and also a recent interest of mine. I started contributing to that package called BayesFlow, and that's precisely something that could be useful in your case here, because your model structure doesn't change, if I understand correctly. Once you have the model structure, it's kind of like a physics model: it's not going to change when you have new data, but the data sets do change, so you have new data sets coming in. And so that's where using this kind of inference, called amortized Bayesian inference, could be extremely useful, because the computational bottleneck would only happen once. That would be when you train the deep neural network to learn the posterior structure and parameters. So instead of MCMC, you're using the deep neural network to learn the posterior. But then, once you have trained the deep neural network, doing posterior inference is trivial. And so for that kind of model, where you have a lot of data but the model is the same, that's a very good use case for amortized Bayesian inference.
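To illustrate the amortized idea only (this is a toy sketch, not the BayesFlow API): you simulate many (parameters, data) pairs from the model, pay the training cost once, and then inference on any new data set is a single forward pass. Real amortized-inference tools learn a full posterior, for example with conditional normalizing flows, rather than the point estimate shown here, and the simple Normal model and network sizes below are assumptions made for the example.

```python
# Toy amortized inference sketch: train once, then reuse on every new data set.
import torch
import torch.nn as nn

def simulate(n_sims: int, n_obs: int = 50):
    """Simulate (theta, summary) pairs from a simple Normal(theta, 1) model."""
    theta = torch.randn(n_sims, 1) * 2.0                       # prior draws
    data = theta + torch.randn(n_sims, n_obs)                  # simulated data sets
    summary = torch.stack([data.mean(1), data.std(1)], dim=1)  # hand-crafted summaries
    return theta, summary

theta_train, summary_train = simulate(20_000)
net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):             # the expensive step happens once, up front
    opt.zero_grad()
    loss = loss_fn(net(summary_train), theta_train)
    loss.backward()
    opt.step()

# Inference on a brand-new data set is now just a forward pass, with no MCMC re-run.
theta_new, summary_new = simulate(1)
estimate = net(summary_new)
```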

So that could be something very interesting here.

Yeah, yeah.

Happy to tell you more about that afterwards if you're interested. But yeah, I've started digging into that, and that's super fun for sure. So yeah, I think this is a cool use case.

Awesome.

Well, I still have a few questions, but we are getting short on time, so can I keep you a bit longer?

Yeah, just a few more minutes, sure.

Okay, awesome. So actually, I'd like to pick your brain now, talking a bit more about the future. I'm curious, so let me fuse two questions. First, I'm curious where you see the field of sports analytics heading in the next five to ten years. And as a sub-question: are there specific sports where you see significant potential for growth in analytics?

Yeah, those are good questions. I think they go kind of hand in hand. You know, it's hard; if I could predict the future, I would probably have a different job, I'd probably be retired. But I think a lot of the future is going to be catching up: sports like soccer, American football, and hockey are going to be catching up. And I think a lot of the growth is actually going to be making sports analytics more digestible for everyday people, so the fans, right? And that's happened over time. If you watched a broadcast of a soccer game 20 years ago, no one talked about expected goals. Now most broadcasts will show it. They might not always talk about it, but they'll show it. Like I said, expected goals is better than just showing the score, but there's a lot left to be done. To date, a lot of sports analytics has really been focused on expected values, and not enough has been focused on distributions and variance around estimates. So I think that's one place it's going to have to end up going. And part of the reason is, right, we talk about neural networks: neural networks are very good at expected values with really large data sets. Modeling variance is a lot harder, in anything, than modeling expectations. So I think catching up on some of those things. And I think also, like I said, taking a step back: there's been a lot of good work done, but I think we're going to find a few things where, hey, maybe we were a little bit overconfident, right? And with everything in sports, it's always about game theory. So even if something is optimal today, that strategy is not always going to be optimal in the future. In basketball, for a second, we talked about three-pointers. Of course three-pointers are really good right now, because they have higher expected value, but defensive players are learning to play against three-pointers better than they used to. Or in American football, the numbers have said you should pass the ball more. Well, now the defenses are learning how to defend it better, so running is going to be more important than it used to be. These things are always going to change. In five to ten years, I don't know exactly what it's going to be, but in some ways you might find an analytics person in ten years giving the exact opposite advice of what we're seeing now, just because the game has evolved, the game has changed, and so now you should do something else to get an edge. So I think the growth is twofold: always staying on the cutting edge of what's next, and sometimes that's going back to where you were; and, like I said, making the numbers more digestible for the everyday consumer. It's one thing for you and me, we can talk about models; I had to do this at ESPN all the time, and I can't talk about prior distributions on TV, right? So how do we explain these things? And I think what's really going to be key, and this has happened already, but it's going to keep on happening, is that the analysts themselves are going to be much more data literate than they have been in the past. Not just because they have more people working with them or because they're younger. The analysts of the future are also going to be able to use AI to do their own analysis. And that could be scary, because they might make some bad assumptions, but they're also going to be more data savvy, and they could load up a data set and use an AI tool, even if they can't code, to get insights that I used to have to write code to get, and now they can just do it themselves. So that's, I think, another place where teams and coaches are going to be able to do more analysis on their own. And it's not that the data people aren't needed; in fact, they're going to be needed even more, to make sure the coach isn't missing an assumption he needs to be thinking about in the structure of the data. Because he might just say: great, now I can run a regression, I don't even need to know how to code it. That's great, but are you thinking about this? So there's going to be a lot of education about using some of these tools better, but everyone's going to have access to them. It's going to be so much more accessible in the future than it has been in the past.

Yeah, for sure. I completely agree with that. And that's also something I'm very passionate about; that's also what this show is here for, right? It's to make the bridge between the modelers and the non-stats people easier, in a way. And that's something I really love doing in my job too, basically being that bridge between the really nitty-gritty details of the model and then, okay, now that we have the model, how do we explain to the people who are actually going to consume the model results what the model can do, what it cannot do, and how we can make decisions based on that, decisions that hopefully are going to be better than the ones we used to make. And also, how do we update our decisions? Because, well, the game changes, as you said so well. So yeah, for sure, all that stuff is absolutely crucial. I like using the metaphor of the engine and the car: building the model is the engine of the car. Surely you want the best engine possible, but you also need a very cool car, because otherwise nobody's going to want your engine. And so then building all the communication around the model, the visualizations, things like that, is extremely important, because in the end, as you were saying at the beginning of the show, if the model isn't used, well, that's not a very good investment.

Yeah.

So, I would literally have a lot more questions that are on my list, but we are going to call it a show, Paul, because I don't want to keep you for three hours; you've already been very generous with your time. You can come back to the show anytime if you want to, if you have a cool new project you want to talk about, for sure.

Yeah, maybe we can record the French version of the podcast sometime, you know.

Yeah, yeah, I'd definitely be down for that. You know, someone who will be very happy is my mother. She's always asking me, so when are you going to do the French version of your courses and your podcast? I'm like, that's not going to happen, mom.

Maybe that's what moms are for, though.

Exactly. Before letting you go, Paul, I'm going to ask you the last two questions I ask every guest at the end of the show. Because it's a Bayesian show, what counts is not the individual point estimate but the distribution of the responses. First question: if you had unlimited time and resources, which problem would you try to solve?

Good, that's a good question. You sent me this ahead of time, and I spent a couple of seconds and I was like, man, I don't know. It's tough; there are so many questions in sports.

Yeah, I know.

I mean, one of my passions is American football, and I just keep going back to it. So, I love American football and I love soccer, international football, right? And in both of those games, there are certain positions that are just really hard to understand in terms of how valuable they are. In soccer, it's the midfield: we know you need a good midfielder, but how do you measure that? That's a really hard problem. And in American football, there are a lot of positions like that as well. So I'd probably go somewhere along those lines: I want to discover and measure the value in these really hard-to-measure traits and positions in these two sports.

Yeah, I definitely understand. The battle for the middle is always extremely important in soccer. If you look at all the teams that win the Champions League, so the Holy Grail, like the Super Bowl of the soccer world, almost all of them have an amazing and impressive pair, or trio, of midfielders. That's like a sine qua non. But, as you were saying, it's extremely hard to come up with a metric that's not only going to explain why the midfielders are good, but also help you consistently choose midfielders that will increase your probability of winning the Champions League. And I'm saying that as a very frustrated Paris fan, because it's been years, since Thiago Motta basically retired, that we've been looking for a number six, so the midfielder who plays just in front of the defense, and we're still looking for him. So please, Paul, let me know when you're done with that.

Yeah. Well, unfortunately, there are several really good French midfielders; they just don't play for PSG.

I know, I know. Not a lot of French players stay in France. That's why I'm telling you, we need a European-wide league. Many more players would stay in France and play for PSG, I guess. And second question: if you could have dinner with any great scientific mind, dead, alive, or fictional, who would it be?

Fictional? I haven't really thought about fictional scientific minds. That is a good question. Geez. Man.

Well, I mean, I thought you were going to answer very fast. Actually, that one, I thought you were going to answer Bill James, like, super fast.

Bill James, yeah. Well, I've met Bill James. So I haven't had dinner with him, but I have met him. I'll go a little... how liberal are you with the words "scientific mind" here? So, scientific mind, I think Galileo, I think Newton, I think Einstein, right? Those are all up there. But from the sports world, there is a former football player that very few people have ever heard of, and his name is Virgil Carter. And the reason why I love him, he played in the seventies, is that he wrote a paper about expected points in football while he was playing in the NFL. It was sort of the first sports analytics ever done in American football, and he was a player in American football at the same time. So, not very well known, he's still alive, I don't know him at all, but he would be a really cool person. If I go with classical scientific minds, I would probably go with maybe Gauss: hey, this distribution that has your name is used everywhere and it's very useful. So I would probably stick with him. Normal distributions, Gaussian distributions, basically rule the world nowadays. So I'd probably stick with that if I were to go with a traditional scientific mind.

Yeah, good choices, good choices. I am amazed by that Virgil Carter story, that's so amazing. So if anybody knows Virgil Carter, please contact us and we'll try to get that dinner for Paul. If you do that, I'll definitely be there to grab the dinner and have a conversation with Virgil, because having someone like that on the show would be absolutely amazing. I love that story. It's like, you know, the myth of the philosopher king; well, here is the myth of the scientist player. I just love that. Yeah, that's fantastic. Damn. Thanks a lot, Paul. Let's call it a show.

Thanks for having me.

Yeah, that was amazing. As usual, we'll put resources and a link to your website in the show notes for those who want to dig deeper. Thanks again, Paul, for taking the time and being on this show.

Thanks once again, I really enjoyed it.

This has been another episode of Learning Bayesian Statistics. Be sure to rate, review, and follow the show on your favorite podcatcher, and visit learnbayesstats.com for more resources about today's topics, as well as access to more episodes to help you reach true Bayesian state of mind. That's learnbayesstats.com. Our theme music is Good Bayesian by Baba Brinkman, feat. MC Lars and Mega Ran. Check out his awesome work at bababrinkman.com. I'm your host, Alexandre Andorra. You can follow me on Twitter at alex_andorra, like the country. You can support the show and unlock exclusive benefits by visiting patreon.com/LearnBayesStats. Thank you so much for listening and for your support. You're truly a good Bayesian, change your predictions after taking information in, and if you're thinking I'll be less than amazing, let's adjust those expectations. Let me show you how to be a good Bayesian, change calculations after taking fresh data in, those predictions that your brain is making, let's get them on a solid foundation.
