#108 Modeling Sports & Extracting Player Values, with Paul Sabin

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!

Visit our Patreon page to unlock exclusive Bayesian swag ;)

Takeaways

Convincing non-stats stakeholders in sports analytics can be challenging, but building trust and confirming their prior beliefs can help in gaining acceptance.
Combining subjective beliefs with objective data in Bayesian analysis leads to more accurate forecasts.
The availability of massive data sets has revolutionized sports analytics, allowing for more complex and accurate models.
Sports analytics models should consider factors like rest, travel, and altitude to capture the full picture of team performance.
The impact of budget on team performance in American sports and the use of plus-minus models in basketball and American football are important considerations in sports analytics.
The future of sports analytics lies in making analysis more accessible and digestible for everyday fans.
There is a need for more focus on estimating distributions and variance around estimates in sports analytics.
AI tools can empower analysts to do their own analysis and make better decisions, but it's important to ensure they understand the assumptions and structure of the data.
Measuring the value of certain positions, such as midfielders in soccer, is a challenging problem in sports analytics.
Game theory plays a significant role in sports strategies, and optimal strategies can change over time as the game evolves.

Chapters

00:00 Introduction and Overview

09:27 The Power of Bayesian Analysis in Sports Modeling

16:28 The Revolution of Massive Data Sets in Sports Analytics

31:03 The Impact of Budget in Sports Analytics

39:35 Introduction to Sports Analytics

52:22 Plus-Minus Models in American Football

01:04:11 The Future of Sports Analytics

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan and Francesco Madrisotti.

Links from the show:

LBS Sports Analytics playlist: https://www.youtube.com/playlist?list=PL7RjIaSLWh5kDiPVMUSyhvFaXL3NoXOe4
Paul’s website: https://sabinanalytics.com/
Paul on GitHub: https://github.com/sabinanalytics
Paul on LinkedIn: https://www.linkedin.com/in/rpaulsabin/
Paul on Twitter: https://twitter.com/SabinAnalytics
Paul on Google Scholar: https://scholar.google.com/citations?user=wAezxZ4AAAAJ&hl=en
Soccer Power Ratings & Projections: https://sabinanalytics.com/ratings/soccer/
Estimating player value in American football using plus–minus models: https://www.degruyter.com/document/doi/10.1515/jqas-2020-0033/html
World Football R Package: https://github.com/JaseZiv/worldfootballR

Transcript

This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.

Speaker: 00:00:02

Folks, you may know it by now, I am a huge

sports fan.

2

: 00:00:09

So needless to say that this episode was

like being in a candy store for me.

3

: 00:00:12

Well, more appropriately, in a chocolate

store.

4

: 00:00:16

Paul Sabin is so knowledgeable that this

conversation was an absolute blast for me.

5

: 00:00:21

In it, Paul discusses his experience with

non-stats stakeholders in sports

6

: 00:00:26

analytics and the challenges of convincing

them to adopt evidence-based decisions.

7

: 00:00:31

He also explains his soccer power ratings

and projections model, which uses a

8

: 00:00:35

Bayesian approach and expected goals, as

well as the importance of understanding

9

: 00:00:39

player value in difficult to measure

positions and the need for more accessible

10

: 00:00:43

and digestible sports analytics for fans.

11

: 00:00:46

We also touch on the impact of budget on

team performance in American sports and

12

: 00:00:50

the use of plus-minus models in

basketball and American football.

13

: 00:00:54

Paul is a senior fellow at the Wharton

Sports Analytics and Business Initiative

14

: 00:00:59

and a lecturer

15

: 00:01:00

in the Department of Statistics and Data

Science at the Wharton School of the

16

: 00:01:04

University of Pennsylvania.

17

: 00:01:06

He has spent his entire career as a sports

analytics professional, teaching and

18

: 00:01:11

leading sports analytics research

projects.

19

: 00:01:13

This is Learning Bayesian Statistics,

,: 2024

20

: 00:01:28

Welcome to Learning Bayesian Statistics, a

podcast about Bayesian inference, the

21

: 00:01:42

methods, the projects, and the people who

make it possible.

22

: 00:01:45

I'm your host, Alex Andorra.

23

: 00:01:47

You can follow me on Twitter at Alex

underscore Andorra, like the country, for

24

: 00:01:52

any info about the show.

25

: 00:01:53

LearnBayesStats.com is Laplace to be.

26

: 00:01:56

Show notes,

27

: 00:01:57

becoming a corporate sponsor, unlocking

Bayesian merch, supporting the show on

28

: 00:02:01

Patreon, everything is in there.

29

: 00:02:03

That's LearnBayesStats.com.

30

: 00:02:05

If you're interested in one-on-one

mentorship, online courses, or statistical

31

: 00:02:10

consulting, feel free to reach out and

book a call at topmate.io slash alex

32

: 00:02:15

underscore andorra.

33

: 00:02:17

See you around, folks, and best Bayesian

wishes to you all.

34

: 00:02:24

Welcome to Learning Bayesian Statistics.

35

: 00:02:54

a full conversation in French as we just

had before recording.

36

: 00:02:58

Well done.

37

: 00:02:59

It used to be though.

38

: 00:03:00

Go back two to three hundred years.

39

: 00:03:02

Maybe you just don't go to Africa enough.

40

: 00:03:06

That's where French is spoken a lot now

too.

41

: 00:03:09

Exactly.

42

: 00:03:10

But other than that, you can see French

used to be a very international language

43

: 00:03:14

because in my travels, almost all the time

people tell me, yeah, I studied French in

44

: 00:03:20

high school.

45

: 00:03:21

And the only thing they can say is just a

few words.

46

: 00:03:24

Which is normal, like if you don't use it,

right?

47

: 00:03:27

But yeah, you can see that because French

is still, or was still taught in high

48

: 00:03:31

school and now less and less.

49

: 00:03:34

So yeah, so well done Paul for that.

50

: 00:03:37

I know, I don't think French is an easy

language to learn.

51

: 00:03:41

What has been your experience?

52

: 00:03:43

I'm actually very curious.

53

: 00:03:45

You know, it's hard to say, so this is a

statistics pod or data science podcast.

54

: 00:03:50

So I guess I can't really, I can't really

compare it to anything else.

55

: 00:03:53

That's the only other language I've

learned besides my native English.

56

: 00:03:57

So, you know, I guess, you know, one

sample size for me, I took it in high

57

: 00:04:03

school as well.

58

: 00:04:03

I hated it.

59

: 00:04:04

I had, so, you know, coming from America,

you know, so the reason I chose, you know,

60

: 00:04:11

seventh grade is when I had to choose

whether I was taking French or Spanish.

61

: 00:04:15

And I'm the youngest of four kids in my

family growing up.

62

: 00:04:19

And my older siblings told me that the

Spanish teacher was really mean.

63

: 00:04:23

And that's originally why I took

French.

64

: 00:04:26

and then I took it for the required two to

three years.

65

: 00:04:29

And then I was done.

66

: 00:04:30

I had in high school, I had this teacher

from Belgium and I still remember her

67

: 00:04:34

name, Madame Vendon Plus, and I couldn't

stand her, but come, come to find out

68

: 00:04:39

looking back in life that she was actually

a really nice person.

69

: 00:04:42

She was just Belgian.

70

: 00:04:45

And the cultural, you know, like Americans

think they're the best and the French

71

: 00:04:51

language in Europe people also think

they're the best because they ruled the

72

: 00:04:56

world in the 1700s and 1800s and America felt

like they've ruled the world for the last

73

: 00:05:01

100 years.

74

: 00:05:02

And so when you get into a room together

and you think both of your cultures are

75

: 00:05:06

superior, you know, that doesn't go well

together.

76

: 00:05:09

But actually, so after that, I didn't...

77

: 00:05:10

speak French at all.

78

: 00:05:11

And then I did church service for my

church for two years and I lived in

79

: 00:05:15

Montreal, I lived in Quebec, not actually

in the city, I lived in a lot of rural

80

: 00:05:19

small towns.

81

: 00:05:21

And so I studied French really hard.

82

: 00:05:23

I had to learn the very strong Quebecois

accent.

83

: 00:05:27

And then when I went back to school, it's

when I like really honed in my French.

84

: 00:05:32

I was very conversational, could speak

very fluently in Quebec, but then, you

85

: 00:05:37

know, I had to learn the grammar a little

bit more.

86

: 00:05:39

in depth.

87

: 00:05:39

So then I studied French as well at

university as well.

88

: 00:05:43

So, you know, immersing yourself and the

actually like learning languages because

89

: 00:05:47

when I learned it in school, it

never made sense to me.

90

: 00:05:50

But when I studied it on my own and I

studied conjugation and all these things,

91

: 00:05:55

it became kind of like a math problem.

92

: 00:05:56

And so when I would speak a sentence in my

head, I'd always be like, I need a

93

: 00:06:00

subject.

94

: 00:06:01

I need to conjugate the verb.

95

: 00:06:03

And then I need to say like what I'm, you

know, just

96

: 00:06:06

do an adverb or an adjective after it.

97

: 00:06:08

And like it made sense in my head, but

that's not how I was taught in school.

98

: 00:06:12

I was taught, I had to memorize all these

words, like everything in the kitchen.

99

: 00:06:15

How do you say dishwasher?

100

: 00:06:16

How do you say refrigerator?

101

: 00:06:18

How do you say fork?

102

: 00:06:19

How do you say spoon?

103

: 00:06:20

I couldn't learn like that, but at like

living and like thinking about French as a

104

: 00:06:25

math equation, it made sense in my head

and I was able to pick it up.

105

: 00:06:28

You know, sure.

106

: 00:06:29

I made tons of mistakes and embarrassed

myself, but it wasn't too bad.

107

: 00:06:33

And that's how you learn.

108

: 00:06:34

Yeah.

109

: 00:06:34

So I'm guessing.

110

: 00:06:36

Like from that answer, I'm guessing people

already know why I invited you on the

111

: 00:06:40

podcast.

112

: 00:06:41

Very nerdy answer about languages,

that's perfect.

113

: 00:06:44

Thanks a lot.

114

: 00:06:45

And yeah, I completely relate actually.

115

: 00:06:47

I learned English and German in high

school and yeah, kind of the same.

116

: 00:06:54

I always hated formal language learning.

117

: 00:06:58

And like in the end I learned these

languages and Spanish that was the same

118

: 00:07:02

and Italian that was the same, just going

to the country basically.

119

: 00:07:06

And yeah, as you were saying, I think also

what it adds is you've got skin in the

120

: 00:07:12

game.

121

: 00:07:13

You're in the country, you're having a

conversation with someone.

122

: 00:07:17

If you're not able to talk, you look

extremely stupid.

123

: 00:07:21

So it's a very good incentive for the

brain to step up and learn.

124

: 00:07:27

And that's really awesome.

125

: 00:07:28

And then when you are in the situation

that you...

126

: 00:07:31

don't know what to say, you remember that.

127

: 00:07:33

And then when you learn, this is what I

should have said, it sticks with you

128

: 00:07:36

because it has an emotional attachment to

it.

129

: 00:07:40

Yeah.

130

: 00:07:40

Yeah.

131

: 00:07:41

No, exactly.

132

: 00:07:42

And I mean, and that's going to be a good

segue to my first question to you, but I

133

: 00:07:47

think it's also one of the situations in

life where you can really feel and see

134

: 00:07:55

your brain learning.

135

: 00:07:56

So that's why I also really love learning

new languages and going to countries to do

136

: 00:07:59

that because.

137

: 00:08:01

Like you arrive in the country, you don't

know how to say anything.

138

: 00:08:03

And in just a few weeks, your brain starts

picking up stuff and you can really,

139

: 00:08:08

really feel your brain doing its amazing

work that it's been like conditioned to do

140

: 00:08:14

from years of evolution.

141

: 00:08:16

And to me, that's just absolutely

incredible that the brain is able to do

142

: 00:08:19

that.

143

: 00:08:19

Even when you're like in your thirties and

beyond, you can do that.

144

: 00:08:24

And it's just, I found that absolutely

incredible.

145

: 00:08:27

And that's kind of like a Bayesian.

146

: 00:08:30

neural network, you know, so I mean, see

that segue, I should definitely have a

147

: 00:08:35

podcast.

148

: 00:08:37

So actually, talking about Bayes.

149

: 00:08:41

Yeah, I invited you on the podcast because

you do absolutely awesome work on sports

150

: 00:08:45

modeling.

151

: 00:08:46

And people know that I'm a big fan of a

lot of sports.

152

: 00:08:52

I love modeling sports and so on.

153

: 00:08:54

So I'm super happy to have you here.

154

: 00:08:55

And I have a list of questions that is

embarrassingly long.

155

: 00:08:59

But maybe can you tell us if you are

actually yourself using some Bayesian

156

: 00:09:05

methods, if you're familiar with those or

not?

157

: 00:09:09

And yeah, in general, what does that look

like in your work?

158

: 00:09:13

Yeah.

159

: 00:09:14

So yeah, I mean, just a quick background

about myself, right?

160

: 00:09:16

I've worked in sports, what we call sports

analytics for almost 10 years now.

161

: 00:09:24

Actually, I was getting my PhD

162

: 00:09:27

in statistics, and there was

this job opportunity at ESPN, you know,

163

: 00:09:34

which is a sports broadcasting television

channel in the U.S. and a few other

164

: 00:09:38

countries.

165

: 00:09:38

And, you know, I got the job offer to work

on their sports analytics team where

166

: 00:09:45

essentially what the team there does is

make forecasts so that, you know, they can

167

: 00:09:50

show on TV, you know, on the bottom line,

like who's expected to win, or they can,

168

: 00:09:54

we will run simulations on.

169

: 00:09:56

you know, who's likely to win the

championship, you know, all throughout the

170

: 00:09:59

season.

171

: 00:09:59

And so, you know, you can tell stories

with that saying, you know, the team was

172

: 00:10:03

just like the beginning of the season.

173

: 00:10:04

No one thought they were going to be any

good, but just look how it, you know, they

174

: 00:10:08

got better or the opposite.

175

: 00:10:09

Like they were supposed to be really good

and everything just went wrong.
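The season-long simulations described here can be sketched as a small Monte Carlo loop. This is a minimal illustration, not ESPN's method: the team names, ratings, logistic win-probability formula, and tie-breaking rule are all assumptions made for the example, and draws and home advantage are ignored.

```python
import math
import random
from collections import Counter
from itertools import permutations


def win_probability(rating_a: float, rating_b: float) -> float:
    """Logistic (Bradley-Terry style) probability that team A beats team B."""
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b)))


def simulate_title_odds(ratings, current_wins, remaining_games, n_sims=10_000, seed=0):
    """Estimate each team's chance of finishing with the most wins."""
    rng = random.Random(seed)
    titles = Counter()
    for _ in range(n_sims):
        wins = dict(current_wins)  # start every simulated season from today's standings
        for team_a, team_b in remaining_games:
            if rng.random() < win_probability(ratings[team_a], ratings[team_b]):
                wins[team_a] += 1
            else:
                wins[team_b] += 1
        best = max(wins.values())
        leaders = [team for team, w in wins.items() if w == best]
        titles[rng.choice(leaders)] += 1  # break ties at random (an assumption)
    return {team: titles[team] / n_sims for team in ratings}


# Hypothetical three-team league; ratings and records are made up.
ratings = {"Team A": 0.8, "Team B": 0.0, "Team C": -0.8}
current_wins = {"Team A": 0, "Team B": 0, "Team C": 0}
remaining_games = list(permutations(ratings, 2))  # every pairing, played twice

odds = simulate_title_odds(ratings, current_wins, remaining_games)
for team, p in odds.items():
    print(f"{team}: {p:.1%}")
```

Re-running the same function after each week, with updated `current_wins` and a shorter `remaining_games` list, gives the kind of probability-over-time story mentioned above.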

176

: 00:10:13

And so in my field in sports modeling, I

would think actually you can't, you can't

177

: 00:10:19

do it without being Bayesian.

178

: 00:10:21

And so when I would interview people, I'd

always focus on, on those.

179

: 00:10:24

So as people coming out of school,

sometimes they don't always learn Bayesian

180

: 00:10:28

methods very well.

181

: 00:10:30

And the reason is in sports, sample sizes

are very small and you have to make

182

: 00:10:34

forecasts with very limited data.

183

: 00:10:38

And the great thing about Bayesian

statistics is that you actually have more

184

: 00:10:42

data.

185

: 00:10:43

You just haven't observed it.

186

: 00:10:44

You have expertise or you have opinions,

but those opinions actually matter.

187

: 00:10:48

And so maybe we'll get into this, but I'm

actually a very strong advocate because of

188

: 00:10:51

my field of subjective Bayesian

analysis.

189

: 00:10:55

It's okay to insert some information into

your models and it usually makes them

190

: 00:10:59

better.
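A small worked example may help show why prior information matters when samples are tiny. The sketch below uses a conjugate Beta-Binomial update; the prior (about 35%, weighted like 100 earlier attempts) and the observed 8-of-10 sample are hypothetical numbers chosen for illustration, not figures from the episode.

```python
from dataclasses import dataclass


@dataclass
class BetaPrior:
    """Beta(alpha, beta) belief about a success rate (e.g. a shooting percentage)."""
    alpha: float
    beta: float

    def update(self, successes: int, attempts: int) -> "BetaPrior":
        # Conjugate update: add observed successes and failures to the prior counts.
        return BetaPrior(self.alpha + successes, self.beta + (attempts - successes))

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


# Hypothetical prior: the analyst believes the rate is around 35%,
# with roughly the weight of 100 earlier attempts.
prior = BetaPrior(alpha=35.0, beta=65.0)

# Hypothetical small sample: 8 successes in the first 10 attempts of a season.
makes, attempts = 8, 10
posterior = prior.update(makes, attempts)

raw_rate = makes / attempts   # naive estimate from 10 attempts alone
shrunk_rate = posterior.mean  # pulled most of the way back toward the prior
print(f"raw: {raw_rate:.3f}, posterior mean: {shrunk_rate:.3f}")
```

The posterior mean sits between the prior belief and the raw rate, and moves toward the data as attempts accumulate, which is the behaviour described above for forecasts made on very limited data.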

191

: 00:11:01

Yeah.

192

: 00:11:02

Well, awesome.

193

: 00:11:03

couldn't have dreamt better and I have to

fully structure.

194

: 00:11:07

I didn't know Paul was going to answer

that because that's not really, I haven't

195

: 00:11:10

seen that in your, you know, on your

website or else,

196

: 00:11:15

So before, while preparing the episode, I

didn't know if you were already using

197

: 00:11:18

Bayesian methods or else.

198

: 00:11:20

But definitely, definitely happy to hear

that.

199

: 00:11:23

And so that people know that was not a

conspiracy.

200

: 00:11:26

I didn't know anything that Paul was going

to say.

201

: 00:11:30

OK, so that's awesome.

202

: 00:11:33

So I'm an open source developer, so I'm

always very curious about the stack you're

203

: 00:11:37

using.

204

: 00:11:39

What are you using actually when you're

doing Bayesian analysis of a sports model?

205

: 00:11:46

So in my career, I almost always use R and

Stan.

206

: 00:11:50

So if I'm doing Bayes analysis, I write a

lot of Stan code.

207

: 00:11:55

It's gotten easier with ChatGPT.

208

: 00:11:58

It doesn't do it all the way, right?

209

: 00:12:00

But if it's like, hey, I want to build

this kind of model, it'll at least give me

210

: 00:12:04

a good framework.

211

: 00:12:05

And then I can adjust it and edit it as I

want from there.

212

: 00:12:10

Yeah.

213

: 00:12:11

Yeah.

214

: 00:12:12

And I mean, for sure, you cannot go wrong

with the.

215

: 00:12:15

with R and Stan.

216

: 00:12:17

So yeah, definitely.

217

: 00:12:19

And we've had the, one of the creators of

Stan, Andrew Gelman, was back on the

218

: 00:12:26

podcast a few weeks ago.

219

: 00:12:29

It was not released yet, but through time

travel, it's gonna have been released when

220

: 00:12:35

your episode is out.

221

: 00:12:36

So folks, you can go back to - Right,

because I am definitely a lesser draw than

222

: 00:12:42

Andrew Gelman is, but that's great.

223

: 00:12:44

No, yeah, so if people are curious about

what Andrew has been up to lately, it's

224

: 00:12:52

the third time he's been on the show and

he just released a new book, Active

225

: 00:12:56

Statistics, that I definitely recommend.

226

: 00:12:58

It's really fun to read.

227

: 00:13:00

It's like, it's how to teach statistics

with stories, which actually relates to

228

: 00:13:05

something you just said, Paul, about the,

like, cool and fun way to relate

229

: 00:13:11

statistics to...

230

: 00:13:12

non-stats people was to be able to tell

stories about a team's probability of

231

: 00:13:20

winning or any forecast like that.

232

: 00:13:23

So that's definitely interesting to hear

you talk about that.

233

: 00:13:27

And actually I'm curious because I've been

following that field of sports analytics

234

: 00:13:35

for a few years and I've seen it

personally mature.

235

: 00:13:40

quite a lot and evolve quite a lot when

it comes to the technology and the data

236

: 00:13:44

availability.

237

: 00:13:46

So I'm curious what an expert like you

thinks about that evolution of technology

238

: 00:13:52

and data availability and how that changed

the landscape of sports analytics.

239

: 00:13:59

Yeah, I mean, it's exploded in the last 10

to 15 years.

240

: 00:14:02

So I mean, if people are familiar with the

book slash movie Moneyball, which is

241

: 00:14:07

20, about 20 years, the book is about 20

years old now.

242

: 00:14:10

The movie is about 12, 13 years old now.

243

: 00:14:13

you know, back then in baseball, baseball

was the sport that sort of took off in

244

: 00:14:19

sports analytics.

245

: 00:14:20

I mean, for a couple of reasons.

246

: 00:14:22

One, the game is very discrete.

247

: 00:14:25

So there are start and stopping points.

248

: 00:14:27

So you can measure.

249

: 00:14:28

Right.

250

: 00:14:29

Discrete events very well in baseball, but

two, like they're the only sport that

251

: 00:14:32

actually had a really long running data

set.

252

: 00:14:36

And that went back and they've been

keeping statistics in baseball and you can

253

: 00:14:39

actually go back to the 1800s and find out

how people were playing baseball in 1895.

254

: 00:14:46

No other sport has that.

255

: 00:14:47

So that's, that's probably the reason why

baseball took off.

256

: 00:14:50

but since then, you know, every sport for

a while after that, every sport had what

257

: 00:14:54

we call play by play data, which is like,

this is what happens.

258

: 00:14:57

Soccer had a, a version that was called

event data.

259

: 00:15:01

So people would

260

: 00:15:02

watch a game and every time someone

touched the ball or made a pass, they

261

: 00:15:06

would mark, the ball was touched here on

the field and it was passed to there or

262

: 00:15:10

they dribbled from here to there.

263

: 00:15:12

So it was, they kind of were discretizing

soccer in a way to make it a similar

264

: 00:15:18

format.
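To make the idea of event data concrete, here is a minimal sketch of how hand-logged touches could be stored and summarised. The field names, the 0 to 100 pitch coordinates, and the three sample events are hypothetical; real providers each use their own schema.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Event:
    """One hand-logged on-ball action (hypothetical schema)."""
    minute: float
    player: str
    action: str                 # e.g. "pass", "dribble", "shot"
    start: Tuple[float, float]  # (x, y) where the action began
    end: Tuple[float, float]    # (x, y) where it finished


# Made-up sequence: a pass, then a dribble, then a shot.
events = [
    Event(12.1, "Player A", "pass", (30.0, 40.0), (55.0, 35.0)),
    Event(12.2, "Player B", "dribble", (55.0, 35.0), (70.0, 30.0)),
    Event(12.3, "Player B", "shot", (70.0, 30.0), (100.0, 34.0)),
]

# Continuous play becomes countable rows, much like play-by-play data.
action_counts = Counter(e.action for e in events)

# Forward progress (change in x) credited to each player, shots excluded.
progress = defaultdict(float)
for e in events:
    if e.action != "shot":
        progress[e.player] += e.end[0] - e.start[0]

print(dict(action_counts))
print(dict(progress))
```

Once play is broken into rows like these, the same counting and regression tools used on play-by-play data in other sports can be applied to soccer.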

265

: 00:15:18

But then about 10 years ago, we started

getting this player tracking data, which

266

: 00:15:22

is the location of everybody and the ball

or the puck on the field, you know,

267

: 00:15:27

depending on the sport, 10 to 25 times per

second.

268

: 00:15:30

And that's drastically changed.

269

: 00:15:32

the methodologies and things that are

used.

270

: 00:15:34

So, I mean, Bayesian analysis was great

for this play by play data or even, you

271

: 00:15:39

know, game by game data and measuring how,

how players or teams performed.

272

: 00:15:44

And then now we've started getting such

huge data sets that, you know, more of the

273

: 00:15:48

computer science world, neural networks,

things like that started becoming much

274

: 00:15:51

more prevalent in sports analysis just

because the data sets were so massive.

275

: 00:15:56

Not that statistics doesn't play a role.

276

: 00:15:58

It still does.

277

: 00:15:58

And I think.

278

: 00:15:59

People sometimes overly rely on these

black box methods.

279

: 00:16:02

They don't think about the implications or

the biases in the data, which are still

280

: 00:16:06

important.

281

: 00:16:07

But we have these huge amounts of data now

and it's just exploded to like, you know,

282

: 00:16:11

if you want all the data in a season in

the NFL, it's like over one terabyte of

283

: 00:16:17

locations of everybody on the field, on

every play, 25 times a second.

284

: 00:16:22

It's just massive.
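A back-of-the-envelope count shows how quickly tracking data grows. Only the 25 samples per second comes from the conversation; the 22 players plus ball, the 6 seconds per play, the 150 plays per game, and the 272 games per season are assumptions for illustration, and the result counts rows, not bytes.

```python
def tracking_rows(objects_per_frame: int, frames_per_second: int,
                  seconds_per_play: int, plays_per_game: int, games: int) -> int:
    """Number of (object, timestamp) location rows in one season."""
    return (objects_per_frame * frames_per_second * seconds_per_play
            * plays_per_game * games)


rows = tracking_rows(
    objects_per_frame=23,   # assumed: 22 players plus the ball
    frames_per_second=25,   # sampling rate mentioned in the conversation
    seconds_per_play=6,     # assumed average play length
    plays_per_game=150,     # assumed plays per game
    games=272,              # assumed games in a season
)
print(f"{rows:,} location rows per season")  # 140,760,000 under these assumptions
```

Each row would carry at least coordinates, speed, and identifiers, so storage depends heavily on the file format; the point is only that the row count reaches the hundreds of millions before any modeling starts.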

285

: 00:16:22

Right.

286

: 00:16:23

So it's, it's really changed the way

people have done things.

287

: 00:16:27

Right.

288

: 00:16:28

And we started going from really simple

questions to huge big questions.

289

: 00:16:31

And the funny thing is now, I actually

think with the data being so large, people

290

: 00:16:36

are now actually going back to answering
