#131 Decision-Making Under High Uncertainty, with Luke Bornn
Sports Analytics • Episode 131 • 30th April 2025 • Learning Bayesian Statistics • Alexandre Andorra
Duration: 01:31:45


Shownotes

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!

Visit our Patreon page to unlock exclusive Bayesian swag ;)

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan, Francesco Madrisotti, Ivy Huang, Gary Clarke, Robert Flannery, Rasmus Hindström, Stefan, Corey Abshire, Mike Loncaric, David McCormick, Ronald Legere, Sergio Dolia, Michael Cao, Yiğit Aşık and Suyog Chandramouli.

Takeaways:

  • Player tracking data revolutionized sports analytics.
  • Decision-making in sports involves managing uncertainty and budget constraints.
  • Luke emphasizes the importance of portfolio optimization in team management.
  • Clubs with high budgets can afford inefficiencies in player acquisition.
  • Statistical methods provide a probabilistic approach to player value.
  • Removing human bias is crucial in sports decision-making.
  • Understanding player performance distributions aids in contract decisions.
  • The goal is to maximize performance value per dollar spent.
  • Model validation in sports requires focusing on edge cases.
  • Generative models help account for uncertainty in player performance.
  • Computational efficiency is key in handling large datasets.
  • A diverse skill set enhances problem-solving in sports analytics.
  • Broader knowledge in data science leads to innovative solutions. 
  • Integrating software engineering with statistics is crucial in sports analytics.
  • Model validation often requires more work than model fitting itself.
  • Understanding the context of data is essential for accurate predictions.
  • Continuous learning and adaptation are essential in analytics.

Chapters:

11:58 Transition from Academia to Sports Analytics

20:44 Evolution of Sports Analytics and Data Sources

23:53 Modeling Uncertainty in Decision Making

32:05 The Role of Statistical Models in Player Evaluation

39:20 Generative Models and Bayesian Framework in Sports

46:54 Hacking Bayesian Models for Better Performance

49:55 Understanding Computational Challenges in Bayesian Inference

52:44 Exploring Different Approaches to Model Fitting

56:30 Building a Comprehensive Statistical Toolbox

01:00:37 The Importance of Data Management in Modeling

01:03:21 Iterative Model Validation and Diagnostics

01:06:53 Uncovering Insights from Sports Data

01:16:47 Emerging Trends in Sports Analytics

01:21:30 Future Directions and Personal Aspirations

Links from the show:

Transcript

This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.

Alex: Today, I am excited to host Luke Bornn, a true pioneer in sports analytics and Bayesian statistics. Luke started his career as a statistics professor at Harvard and Simon Fraser before pivoting almost entirely to sports analytics over a decade ago. Luke has worked across multiple sports roles, including quantitative gambling, analytics leadership at AS Roma and the Sacramento Kings, and co-founding Zelus Analytics, which grew to a 75-plus-person company before being acquired by Teamworks. He was also part of the ownership group at Toulouse FC, where he's applied data-driven decision-making to build a competitive club on a budget, winning both Ligue 2 and the Coupe de France.

In this episode, Luke takes us through the evolution of player tracking data and the application of Bayesian methods in decision-making under uncertainty. We dive into portfolio optimization for player acquisition, the challenges of model validation, and the role of generative models in forecasting performance. Whether you're into sports analytics or just fascinated by decision-making in high-stakes environments, this episode is packed with insight from one of the best in the field. This is Learning Bayesian Statistics, episode 131, recorded December 10, 2024.

Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the projects, and the people who make it possible. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. For any info about the show, learnbayesstats.com is the place to be: show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on Patreon, everything is in there. That's learnbayesstats.com. If you're interested in one-on-one mentorship, online courses, or statistical consulting, feel free to reach out and book a call at topmate.io/alex_andorra. See you around, folks, and best Bayesian wishes to you all. And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can help bring them to life. Check us out at pymc-labs.com.

Alex: Luke Bornn, welcome to Learning Bayesian Statistics.

Luke: Yeah, thanks for having me on.

Alex: Thank you so much for taking the time. This is a blast to have you here. I never thought I would be able to interview you on the show, to be honest. You're kind of a superstar in my world, you know, so I'm very happy to have you here. And thank you so much to Patrick Ward for putting us in contact. Patrick, if you're listening, thank you so much; when you come to Miami, I definitely owe you a good dinner.

Luke: You know, anything Patrick asks, I do. As prep for this, I went back and listened to a bunch of the previous episodes, and it was really cool to see that your podcast in some sense resembles my career. In the recent episodes you have a lot of people that I call colleagues and friends now, Robbie and Patrick and Paul Sabin, and then I look farther back and it's, like, Julien Cornebise and I worked together in North Carolina, back when he was doing postdoctoral work and I was at UBC; we actually wrote a short paper together. Andy Gelman and I shared an office when I started at Harvard. Nicolas Chopin and I have had many dinners together over the years. Kevin Murphy I know well from my time at UBC. So it was really cool; it really blends the two phases of my career, first as an academic and now in sports.

Alex: Oh yeah, okay, that's good to know. Yeah, for sure. I didn't know you knew Nicolas and Julien so well. And I remember Andrew Gelman, which must have been really cool. Honestly, working in the same hallway as Andrew: each time I get to see him it's really cool, and it's just awesome because he always has so many stories and I can ask him any questions I have. So working with him must be incredible.

Luke: Yeah, and we shared an office my first semester at Harvard. I was actually on sabbatical, kind of a parental leave, because I had just had a kid, and I was going back and forth between Vancouver and Boston; we were in the process of moving. So yeah, I shared an office with him. It was a really cool experience.

Alex: Yeah, I mean, I guess. And actually, something Patrick told me while preparing for the episode. He said: you can ask Luke anything about sports, he knows a lot about that and about stats. But don't ask him anything about guitar playing, because I'm way better than him. Something like that. Do you know what he's referring to?

Luke: Yeah, so I used to play guitar quite a lot. For those watching this on video, you'll see the guitars in the background. But Patrick is actually a properly trained musician. Actually, I have an interesting story about Patrick and guitar. I was at the Sloan Sports Analytics Conference, this is probably five, six years ago, and we were upstairs at Trillium Brewing, which is a nice restaurant and brew pub near the conference. I was sitting at the corner of the bar, and on one side of me was Patrick, who works for the Seattle Seahawks (I think he was on your podcast a couple months ago or something), and Sunny Mehta was on the other side of me; he's an assistant general manager for the Florida Panthers, an incredibly interesting guy. Just two incredible guys on both sides of me. And at some point, late into the evening, it occurred to me that these guys had just met and had this incredible unknown connection, which is that Patrick actually studied jazz guitar at Berklee in Boston, so he formally studied jazz guitar, and Sunny was actually a professional jazz guitarist in New Orleans for years, in addition to being a professional poker player and all sorts of other cool stuff. And so I said to these guys: whoa, you guys are both leaders in sports analytics, but you have this really bizarre connection that neither of you realizes, and I do. Let's see how long it takes you to figure it out. So they started on, you know, academic stuff. It took them probably a good 20, 30 minutes before they realized, and then I couldn't keep them apart for the next four hours as they compared teachers and the musicians they played with and so on. There's a long history, of course, of math and music being tightly tied together, but that was a cool example of two of my favorite people, Sunny and Patrick, coming together over a musical interest.

Alex: Yeah, this is really cool. For those watching the video, it's the first time this has happened: I'm in a new flat now and I have a painting behind me. There is a face in that painting, and basically the iPhone is freaking out because it's seeing two faces, so it doesn't know if it should focus on me or on the painting.

Luke: It's literally going back and forth between the weird painting and you.

Alex: So I think I should take that personally, right? Because I look nothing like the painting, and the painting is pretty awful, so I'm taking it quite personally. I'm selling my iPhone after the episode. I'm going to get rid of that painting, actually, while you're talking; it's way too distracting for people. So anyways, yeah, great, great story. I love that. And as you were saying, of course, there's a deep history of music being actually very mathematical, and of mathematics also being very creative and artistic in a way. We've talked about that already on the show.

Luke: Yeah, at Zelus, which we'll talk about in a bit, there's a very active music culture, a lot of people who play instruments and are hooked on music. One of them is Daniel Lee, who you've also had on the podcast. Really cool guy, and really super passionate about music as well. I think it's a pretty common connection.

Alex: Yeah, yeah, yeah. So, as you saw, I could not get rid of the painting; it's really screwed into the wall. So we'll have to live with it. That's going to be a collector's episode; I encourage you to check out the YouTube video of this one. Yeah, I mean, Daniel is actually incredible. Each time I come to New York and get to meet him, he gives me tons of recommendations, you know, of stuff to do in New York, very artistic. He's also very well versed in architecture, which is something he and I have in common. He knows architecture really well, and that's really cool because he always gives me amazing things to go see in New York. So yeah, thank you so much, Daniel, for enlightening me all the time. Did you get a private concert when you were at the Sloan Conference, from Patrick and, I don't remember the other name?

Luke: From Sonny, no. Patrick, though, has in the past sent me videos of himself playing guitar, and he's a much better guitarist than I am, so I have never reciprocated.

Alex: What's a talent of yours, actually? You know, like, something... I mean, you're obviously talented in your job and everything related to it, but a non-professional talent, one you could send a video of to Patrick.

Luke: Oh man, that's a good question.

Alex: Yeah, and I know I'm putting you on the spot here, because it's the first time I've asked this question, and it was not in the questions I sent you. I'm trying to get at some obscure talent, you know.

Luke: These days my whole life is job and family. So I'm extremely good at losing to my kids at Mario Kart, and horrendously bad at putting together IKEA furniture. So maybe it's: good at being bad at child-related things. So, yeah.

Alex: I mean, I don't know if you're already at the point where you're not losing on purpose anymore at Mario Kart against your kids.

Luke: I am at that point; my kids are a little older now, they're eight, ten, and twelve. And certainly against the older two at, you know, Super Smash Bros., I have no chance. I just mash buttons and hope I get lucky. So yeah, I'm well past that point. They're definitely better gamers than me.

Alex: Yeah. I mean, they probably don't listen to the show. So if you tell them you lose on purpose, I think you can still get away with it for a few years.

Luke: "I lost that one on purpose."

Alex: Exactly. "I made you win."

Alex: Actually, yeah, so can you tell us what you're doing nowadays? Because you do so many things, very original ones. What do you tell people when they ask you what you're doing nowadays? And also, how did you end up doing what you do and working on this?

Luke: Yeah, it depends on how long of a conversation I want to have with someone. If I want to end it quickly, I say I'm a statistician, and it ends the conversation right there. For longer conversations, you know, I'm really fortunate that I have a one-word description of what I do, which is Moneyball. Certainly for people outside of the field, when they ask what I do: yeah, I'm the Moneyball guy, and that's a really easy way to describe it. But if I give a longer background: I did a PhD in stats and machine learning under Arnaud Doucet and Jim Zidek at UBC. Then I went on to spend some time on the tenure track at Harvard and then Simon Fraser. Through that time I made a transition into sports (we can talk about that transition later if you want), but I pivoted over to sports analytics, and since then I have worked for a bunch of NBA teams: spent a few years with the Sacramento Kings, some consulting gigs with others, spent a little over a year with AS Roma in Italy, a gig in quantitative gambling. And then, over the last four or five years, with some partners we banded together alongside some investors and bought Toulouse FC in the south of France, and another club more recently. Alongside that, I co-founded a company called Zelus Analytics, which we built up to about 75 employees, and we actually sold Zelus to Teamworks back in the summer, so that sort of just came full circle. So now I continue to work a little bit with Teamworks, with the old Zelus crew, and I continue to be part of the operations of Toulouse.

Alex: Hmm, yeah. Okay, so Toulouse and not Milan.

Luke: Not terribly involved in Milan at the moment, that's right.

Alex: Yeah. Because you still cannot be involved in both at the same time, right? Because of the...

Luke: Yeah, it's complicated. It's complicated with UEFA.

Alex: Yeah. Okay. That's just such a great path. And we'll definitely dive into these latest activities you were talking about, because I'm really interested in how the kind of models we talk about on this show all the time, and that I personally build in my everyday life, how they are used, how they are consumed by the people I make the models for, you know. And that's something we have...

Luke: I get asked to do interviews and podcasts quite regularly, but it's almost always purely about sports, and I turn them down. This was a chance to be nerdy again and talk about, you know, technical things that I'm super interested in. So, yeah, cool.

Alex: Well then, thank you. Thank you so much, that's an honor. And I think also, if people are interested in a bit more about your background and things like that, I would refer them to the Wharton Moneyball podcast, I think it is, where you were a few months ago. I listened to that one to prepare for the show, so I'm not going to ask you the same questions. That interview is really great, because you go into detail on how you ended up doing what you did, your time at Sacramento, your time at AS Roma. It's a great interview, and I'll put it in the show notes for people who want a bit more detail on your background. Today, let's talk about Bayesian stats a bit, you know: how were you introduced to this weird world of Bayes?

Luke: When I got married and my brothers gave a speech, part of the speech was... they said, he's a proud Bayesian. I just remember them pronouncing the word Bayesian really strangely. So it's part of my identity when it's part of a wedding speech. But you know, I did my PhD at UBC, and at the time I was there it was this incredible hotbed of people in this domain. I did my PhD with Arnaud Doucet, and at the time there was him; Jim Zidek, another co-supervisor of mine; Raphael Gottardo; Paul Gustafson; Nando de Freitas; Kevin Murphy; and many others. At the time, these people were kind of all I knew, so I didn't realize how special that situation was. But I got to spend essentially six years of my life surrounded by those people, and it deeply ingrained in me a Bayesian way of thinking about the world.

Alex: Oh yes, so that was quite fast, basically, and that's the way you learned statistics, if I understand correctly.

Luke: Yeah. I did an undergrad in mathematics, and then my master's and PhD were both in stats. My PhD thesis was divided between spatial stats and Monte Carlo methods. And if you put those two things together, Bayesian spatial statistics combined with Monte Carlo, it covered the whole gamut of Bayesian modeling techniques. And of course, with Kevin Murphy and Nando de Freitas there, there was also a lot of exposure to what Kevin might call probabilistic machine learning, or statistical machine learning.

Alex: Yeah, okay. That's funny, because it's a bit like me: I didn't have to unlearn all the stuff I had learned very confusingly in undergrad; I basically learned Bayesian stats from the start. That kind of makes it simpler on the teacher, I would say, as a teacher now. It's way easier to teach people like us than to teach people who learned a lot of frequentist stats and are now trying to switch, because naturally they always try to come back to something that's familiar, and sometimes it's very different. So you're like: okay, try to forget that paradigm. That's hard.

Luke: UBC was interesting because you had this Bayesian cohort, but then you also had people working on robust estimators, M-estimators and tau-estimators, all these very frequentist ideas. So you were exposed to both worlds. I think that was actually quite useful, because it helps you really understand the philosophical differences, as well as the pros and cons of different ways of thinking about things.

Alex: Yeah, yeah. I think that's definitely super interesting and super worth it. And, as always, I'd say for people interested in the epistemological side of things, the two best episodes to start with are episodes 50 and 51. Episode 50 is with David Spiegelhalter, the only sir we've had on the podcast so far (maybe we'll get Sir Alex Ferguson one day), and episode 51 is with Aubrey Clayton, the author of the book Bernoulli's Fallacy. If you want to start with some epistemological topics, I would definitely recommend those; just go on the website and look for them. But in my experience, I only give that material to students when they ask for it. The way it usually works is that people are mainly interested in the practical side of things, and unless they are very nerdy like me and want to know why it's actually interesting, and why it works better in these cases... if they're really interested in the why, then I'll give it to them very happily, because I love that. But in my experience, and especially with my bosses and others, I just show the model and why it's interesting and why it would help their decision-making.

And you personally, as we've seen, have been involved in sports analytics for years. So I'm curious about so many things. First, what's your vision of how the field has evolved, especially with the rise of different sources of data in different sports? And another question I have for you as a decision maker: how do you approach modeling uncertainty in your decision-making, whether that's with Toulouse or, well, Milan less so now, the different things and projects you're working on, and whether we're talking about evaluating players or planning strategies?

Luke: Yeah. So, the first thing is about sports analytics. When I was getting into sports analytics, it was the early days of player tracking data. Prior to that, the vast majority of the data had been event data: hand-tracked, maybe a couple hundred or a couple thousand events per game, and that was true across sports. Baseball may be the exception there, but for most sports it was very simple data, count data, that kind of thing. And around 2012 I was really fortunate: one of the first people I met when I started at Harvard was a guy named Kirk Goldsberry, and he had just been handed the NBA's player tracking data. This data captures, multiple times per second, the location on the court of every player and the ball, in three dimensions. And when I looked at this (I'm actually not that big of a sports fan), to me it was the richest space-time data I'd ever seen. And the amount of structure that exists in this data because of the sport itself, both the rules and the tactics and strategy... it was just the most fascinating and challenging problem I had ever come across. So my path into sports was not "I want to work in sports" or "sports is really cool"; it was: this is really interesting and hard and unsolved. Keep in mind, my PhD was essentially on modeling things in space-time, along with SMC methods and so on to handle these things computationally efficiently. So I had the right toolkit for this type of data. That's how I made the transition from academia to sports: fundamentally, new data sets that are really complex and spatially and temporally rich, having the right skill set, and realizing there are lots of interesting problems. After a while I was consulting in sports while continuing my academic gigs, and I just found that I enjoyed the sports stuff more. The problems were more interesting. I never enjoyed going to committee meetings and reviewing papers and revising papers; the peer review process in academia is fundamentally broken. So it was an easy decision when I decided to go full time into sports. So anyway, that's the sports piece, how I pivoted to sports.

The second part of your question is about handling uncertainty in decision-making. I really think about that as the end user. We want to try to construct a team that's going to outperform its budget. Let me use Toulouse as an example. Toulouse has a payroll of about, you know, 15, 18 million a year. That's euros. Our goal with that...

Alex: Which is, to give perspective to people who don't know sports, not a lot.

Luke: Right. If you compare... we are the lowest or second lowest in the league. And our goal is to have...

Alex: The big clubs have way bigger payrolls than you. You might know a big club in France that has an insanely inflated payroll. There's a few of them. It starts with a P and ends with a G.

Luke: And so, you know, just to make the math easy: we're spending 15 a year, and we want to perform like a mid-tier club, sort of an average club in the league, which is typically spending 40 or 50 million. So the way I think about it is: how do we spend 15 million a year, the lowest or second-lowest budget, and perform like the 10th or 11th best team, the ones spending 40, 50 million? To do that, we have to make distributional assumptions about the value that each individual player creates. And essentially it becomes a portfolio optimization problem, where you have a set of players which act like individual assets, in a way. And you're trying to find the combination of players such that you have some reasonable confidence that you're going to be able to perform at the level you need, certainly at the very least avoiding relegation. You're assembling these things in a way that tries to de-correlate these assets, how the players perform, and so on, such that you minimize your chance of relegation and maximize your chance of performing at the level you want to perform at.
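To make that concrete, here is a minimal sketch of the squad-as-portfolio idea in Python. Everything in it is invented for illustration (the player labels, costs, points scale, and correlation structure); a real version would plug in posterior draws from a performance model rather than a hand-written multivariate normal.

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(42)

    # Candidate players: expected points added, uncertainty, and cost (M EUR).
    players = ["A", "B", "C", "D", "E"]
    mu = np.array([8.0, 6.0, 5.0, 4.0, 3.0])
    sd = np.array([3.0, 1.5, 2.5, 1.0, 1.0])
    cost = np.array([6.0, 5.0, 3.0, 2.0, 1.0])
    corr = np.full((5, 5), 0.2) + 0.8 * np.eye(5)  # mildly correlated "assets"
    cov = corr * np.outer(sd, sd)

    budget, baseline, relegation_line = 9.0, 30.0, 38.0

    def evaluate(squad, n_sims=100_000):
        """Simulate season points for a squad; return mean and P(relegation)."""
        idx = list(squad)
        sims = rng.multivariate_normal(mu[idx], cov[np.ix_(idx, idx)], size=n_sims)
        total = baseline + sims.sum(axis=1)
        return total.mean(), (total < relegation_line).mean()

    # Enumerate affordable three-player squads and rank by relegation risk.
    feasible = [s for s in combinations(range(5), 3) if cost[list(s)].sum() <= budget]
    results = {s: evaluate(s) for s in feasible}
    for squad, (mean_pts, p_rel) in sorted(results.items(), key=lambda kv: kv[1][1]):
        names = ", ".join(players[i] for i in squad)
        print(f"{names}: E[points] = {mean_pts:.1f}, P(relegation) = {p_rel:.3f}")

The de-correlation point shows up directly in `corr`: raising the off-diagonal entries inflates the variance of the squad total, which raises relegation risk even when expected points stay the same.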

Alex: Yeah, yeah. And that's funny, because that's also a lot like how I personally try to explain what I do to people, and why it's actually interesting for clubs to build the kind of models we do. Because often people don't know at all what's going on in the front office, especially in Europe. In the US, the influence of Moneyball, as you were saying, not only the book but also the movie, is much more prevalent. But in Europe, especially soccer, especially continental Europe, it's like: really? People do that? So I try to explain it to them, as you were saying, as portfolio management. Or I say: remember when you were a student and you were broke, and you had to be very careful about any dime that left your account? That's basically why, for the clubs with much smaller budgets, this modeling is actually much more interesting: they have to be much more careful about what they spend. Whereas for PSG or Real Madrid it's not that big of a deal, at least at the beginning.

Luke: With such a high budget relative to the rest of the league, you can actually be fairly inefficient with it and still be quite successful, right? And I think they are; I think that's a good description of the club. And to boil it down to a really simple example, let's say you want to acquire a player; say you're going for a striker, and you have player A and player B. Say player A costs one million a year and player B costs two million a year. Traditionally the scouts would say: well, we like player B better, so let's get him. But hold on: one guy costs two million a year, the other costs one million, one of them's older, one's younger, all these different factors. How do we choose one? They end up comparing apples to oranges. They have these financial figures, and then they have the scout saying: I like this guy better, he's got a good left foot and good spatial awareness. And the question is: how do I turn that into dollars? How do I decide whether I'm going to spend two million or one million, on player B or player A? Whereas statistical methods come along and you can say, very precisely: hey, there's a 10% chance that this player performs above his contract, and a 30% chance that this other guy performs above his contract. So you can be much more explicit and probabilistic about the value a player brings and, most importantly, what they bring relative to their contract.
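As a toy illustration of that kind of statement, here is what "performs above his contract" looks like as a computation. The lognormal draws below are stand-ins; in a real pipeline they would be posterior samples of each player's value from a fitted model.

    import numpy as np

    rng = np.random.default_rng(0)
    # Fake posterior draws of value created, in EUR per year.
    value_a = rng.lognormal(mean=np.log(1.1e6), sigma=0.5, size=50_000)
    value_b = rng.lognormal(mean=np.log(1.9e6), sigma=0.8, size=50_000)

    contract_a, contract_b = 1e6, 2e6
    print("P(A performs above his contract):", (value_a > contract_a).mean())
    print("P(B performs above his contract):", (value_b > contract_b).mean())
    # Expected surplus per euro spent, the quantity a budget club maximizes:
    print("E[surplus A] per euro:", (value_a - contract_a).mean() / contract_a)
    print("E[surplus B] per euro:", (value_b - contract_b).mean() / contract_b)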

Alex: Yeah, yeah. And so, how does that work for you? I'm curious: as one of the decision makers, how do you actually use the models to make decisions? I think, for me, you would be one of the best bosses I could have, right? Because you come from the modeling side, which would make my work way easier, because I could talk to you in technical terms. But I'm very interested in how you personally consume the models. What is actually useful for you? Is it the posterior distributions? Are you more interested in the comparisons? In the tail probabilities? What do you actually care about when you make the decisions?

Luke: Yeah. I think the most useful thing for the end user is really a distribution on future performance value. What I mean by that is, you can imagine saying: we think next year this player's performance on our team, ideally in dollars, would be some distribution over the value they will add. If you look at basketball: maybe LeBron James is worth 40 million a year, but we might say, hey, we think there's a 20% chance he's worth 60 million or more, and a 10% chance, because of injury or whatnot, that he's actually worth 5 million or less. Understanding that distribution, and then seeing how it progresses through time, is the key information you need to decide whether to sell the player, or sign that player, or extend their contract, or renegotiate it. Those are the key bits of information, and ultimately we want to use that information as much as possible. A lot of what I do now is actually just removing humans from decision-making. There's so much work now, in behavioral economics and elsewhere, showing that humans are fundamentally quite biased in the way they go about decision-making. So a lot of what I do focuses on: how do we remove that bias from the decision-making process? How do we take the good bits of subjective information and remove those biases, in a way that ultimately allows you to create some arbitrage opportunity in player value and, ultimately, shareholder value?

Alex: Okay, I see. If I try to summarize: basically something like a WAR metric is how you summarize the contribution of the player, but that's still in terms of, you know, goals or wins or whatever; then you transform that contribution into dollars, and then you make the decision based on that. And probably a decision that's helped a lot by computer assistance, to try to limit the biases that are very well documented in how humans make decisions.

Luke: Exactly. One way to think about it: if every single player on your team is underpaid, you are going to have a very good team, right? Assuming you have a certain payroll. Conversely (it's easier to think about it the other way), if you have a low payroll and every single player on your team is overpaid, you're going to be very, very bad. So the goal is essentially to get as much performance value as you can for the dollars you spend, and history has shown that statistical models and probabilistic reasoning are a much better way to go about that process than the traditional gut-feel scouting approach.

Alex: Yeah, yeah. It's really fascinating to see that research now pan out in real clubs, like you're doing. That's why I'm really fascinated by that kind of work. And I think these are great organizations for modelers to work in, because that's also where your work has more impact, I would say.

Luke: Yeah, in sports you just get a tremendous amount of super interesting modeling problems. When you talk about how to use Bayesian methods and so on, fitting the model is not the interesting part, right? Even defining the model is not necessarily the interesting part. It's all the issues that come up around it that become super interesting. If you've defined your estimand slightly incorrectly, you can end up with wildly wrong conclusions, conclusions that can cost you millions of dollars. Or, and there are probably lots of examples we could get into, think about what you're doing in traditional validation of models. You say: okay, we fit this model, and it predicts well by some metric, RMSE or log loss or whatever it might be. We say: hey, this thing works well. But that's a summary across all of the data. Now think about what you're doing in practice when you're running a sports team. You're saying: I want the players who really add distinct surplus value to my team. Ultimately you're grabbing players from the edges of the prediction space. If you think about the joint space of performance and dollars, the players you want sit at that threshold where they're cheap and they're good. So you don't care about the bulk of predictions from the model; what you care about is the predictions in the region where you're actually acting, the space of players you can actually action on: players on your team, or players you're acquiring. And of course that extends to ideas of strategy and so on as well, unique strategies and such. So you care about pathological behavior, and you really care about edge cases. A lot of the time in academic settings it's: hey, if this model performs well, great. Here you're saying: I'm actually fine if the model in aggregate is slightly worse, as long as it performs better in the covariate space, or the prediction space, that we care about. So there are lots of little issues like that that come up in sports; they're super interesting and they make the modeling part of it much more interesting.
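A hedged sketch of that validation point, with fully simulated data: score the same model on all players, and then only in the "cheap and good" region where acquisitions actually happen. If the second number is much worse, the model fails exactly where it gets used, no matter how good the aggregate looks.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 5_000
    cost = rng.lognormal(np.log(2e6), 0.7, n)        # fake market prices
    true_value = cost * rng.lognormal(0.0, 0.4, n)   # fake true values
    pred = true_value * rng.lognormal(0.0, 0.25, n)  # fake model predictions

    err = np.log(pred) - np.log(true_value)
    rmse = lambda e: np.sqrt(np.mean(e ** 2))

    print("overall RMSE (log scale):", rmse(err))    # the usual aggregate summary

    # The action region: players whose predicted value far exceeds their price.
    action = pred > 1.5 * cost
    print("share of players in action region:", action.mean())
    print("RMSE in action region:", rmse(err[action]))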

Luke: I'll give you another example. This is, like, so Bayesian but un-Bayesian; see if we can wrap our heads around it. There's this idea in sports called the Messi test. You've probably heard this phrase; maybe it's the Trout test in baseball or something. The idea is that if you build a model for overall player skill (and this is from five years ago or so), and Messi's not at the top or close to the top, your model probably sucks. It's a very simple eye test, right? If you build a model for best player, you want Messi to be at the top. And you could imagine doing the same for individual skills. In basketball, if I built a model for the best three-point shooter and I didn't have Steph Curry right near the top, I should probably be concerned about my model. And that's actually really insightful information. But think about what that is. Typically, unless you have some latent space of player skills, what you're actually doing is having some model that weights different parameters and so on, and then you calculate it for each individual player to see where they land in the ranking. What you're actually doing is: you have prior information about which players are good. But when you're looking at these rankings, essentially that's the posterior predictive of the model, the predictions from the posterior. And you're saying: I have information about what that posterior predictive should look like. So you're putting a prior on the posterior predictive, which any traditional Bayesian would say is just heresy. But there's something there, and it's actually valuable. So there are all these super interesting things that come out in sports where you have to think really deeply about the problem you're trying to solve, how your model dovetails into that problem, and really work on those issues.
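In practice, that "prior on the posterior predictive" often gets used as a blunt sanity check on a fitted model's output. A minimal sketch, with arbitrary anchor names and an arbitrary top-k threshold:

    def messi_test(ranking, anchors=("L. Messi", "K. De Bruyne"), top_k=20):
        """Fail loudly if any known-elite anchor falls outside the top k."""
        top = list(ranking[:top_k])
        missing = [p for p in anchors if p not in top]
        if missing:
            raise AssertionError(f"not in top {top_k}: {missing}")
        print("Messi test passed.")

    # 'ranking' would come from sorting players by modeled skill, e.g.:
    # ranking = sorted(skill, key=skill.get, reverse=True)
    messi_test(["L. Messi", "K. De Bruyne", "E. Haaland"]
               + [f"player_{i}" for i in range(100)])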

Luke: I'll give you one more example here, and I could come up with more if we kept going, where the modeling just becomes super fascinating. One thing I've seen quite a bit: there are a lot of these models that capture overall player value, and they do it in the same units. In basketball, for example, it might be points per hundred possessions. So you could take a whole bunch of grad students and say: hey, you're all going to build a model for player performance in points per hundred possessions. Or you might have your whole analyst team go out and build these models. And then you say: okay, now we're going to weight these things to create an overall model, sort of an ensemble learning or model averaging idea, right? And you might say: because these things are all on the same scale, let's make sure the weights sum to one, say with a Dirichlet prior on the simplex. But that's actually not necessarily the right thing to do, because each of these models typically has some shrinkage built into it. And so it's very possible that, once averaged, there's actually more information in the aggregate model than in the individual ones, so there's no reason you should constrain the weights that way. In fact, in this case it's not even obvious to me that you should enforce positivity on the weights. So if you don't understand what's going into these models, why they're shrunken, why multiple models combined might not want to be shrunken, and you just throw the obvious solution at these problems, you oftentimes end up with the wrong solution. You need to deeply understand the models you're building, as well as the data you have, as well as the underlying sport. And that's where I think it becomes super, super interesting.
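A small simulation of that point. Each component model below is shrunk toward zero, so the least-squares stacking weights sum to well over one; forcing them onto the simplex would throw information away. All numbers are synthetic.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 2_000
    truth = rng.normal(0.0, 2.0, n)  # true points per 100 possessions

    # Three component models, each noisy AND shrunk toward the mean:
    shrink = [0.5, 0.6, 0.7]
    models = np.column_stack([s * truth + rng.normal(0, 1.0, n) for s in shrink])

    # Unconstrained least-squares stacking weights:
    w, *_ = np.linalg.lstsq(models, truth, rcond=None)
    print("stacking weights:", np.round(w, 2), "sum:", round(w.sum(), 2))

    mse = lambda x: np.mean((x - truth) ** 2)
    print("MSE, equal simplex weights:", round(mse(models.mean(axis=1)), 3))
    print("MSE, unconstrained weights:", round(mse(models @ w), 3))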

Alex: Hmm. Yeah, indeed, that's fascinating. And what's really interesting to me, and it hints a bit at how useful these models and the Bayesian framework are in your work, at Teamworks in particular: I'm curious if you can tell us a bit more about how you leverage these methods in your work at Teamworks or for Toulouse. And maybe, to come back to what you started doing in your career, how do they help account for uncertainty, thinking about the generative graph of the model, in particular with spatio-temporal data?

Luke: Yeah, I think the word generative that you used is really useful. That's kind of how I think about modeling problems. I very much think about: what is the generative model here? This may be a bit oversimplistic, but I think about the generative model as the way of specifying the model, and Bayes as the way of giving it the data, how you work backwards, right? The Bayesian way of thinking, where you're pooling information, maybe across players, or across matches, or across seasons, or across space. This is all incredibly important in sports, where you have what might look like a lot of data, but there's so much contextual variation, combined with a tremendous number of players. Our database in soccer is about 80,000 players. And if you think, oh, I want to fit even an adjusted plus-minus model, which is essentially a giant regression model, it's a really large-P, small-N type of problem. But of course there's a tremendous amount of structure there. You have a lot of insight based on the league they play in, the positions they play, how good the team is, as well as the individual event information that happens in the game. So you get all this universal information that comes from the structure of the game, and you want to use it to get better estimates. When you talk about Bayesian models, that's where they come into play.
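For readers who have not met adjusted plus-minus: it is essentially a giant penalized regression, and the ridge penalty is exactly a Gaussian prior shrinking every player toward league average. A toy sketch with simulated stints (sizes and scales invented):

    import numpy as np

    rng = np.random.default_rng(3)
    n_players, n_stints = 500, 2_000
    true_skill = rng.normal(0.0, 1.0, n_players)

    # Design matrix: +1 for players on the home side, -1 for the away side.
    X = np.zeros((n_stints, n_players))
    for row in X:
        on = rng.choice(n_players, 10, replace=False)
        row[on[:5]], row[on[5:]] = 1.0, -1.0
    y = X @ true_skill + rng.normal(0.0, 5.0, n_stints)  # noisy stint margins

    lam = 25.0  # noise variance / prior variance: the shrinkage knob
    beta = np.linalg.solve(X.T @ X + lam * np.eye(n_players), X.T @ y)
    print(f"corr(estimate, truth): {np.corrcoef(beta, true_skill)[0, 1]:.2f}")

The pooling Luke describes (league, position, team strength) would enter as structure on the prior, instead of the single scalar `lam` used here.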

Luke: Now, I should say that oftentimes what we end up doing is finding a balance. In my head, maybe I've specified the perfect model for this problem, but then it's too slow to fit because of the size of the data; tracking data especially can be absolutely massive. So it's: how do we get a poor man's Bayes? Oftentimes the question is: can we get 90% of the value of this model for 5% or 10% of the computational cost? A lot of those types of conversations go on. And the other thing that happens, again, is that you fit this model and then, back to the earlier conversation, you look at the edge cases, the perimeter of the covariate space, and there's some bizarre behavior; the model just doesn't look right, for various reasons, in areas we ultimately care about. So then you have to think: okay, how do we solve this? Do you upweight that area of the covariate space in the likelihood? Do you do ad hoc corrections? Do you redefine the model? There are all these interesting things that arise.

I'll give you a fairly recent example. Imagine you have some model that says: I'm going to build component models for different skills, and then I'm going to have those skills try to predict player performance or team performance or something. You can think of this as multiple likelihoods in the same model, right? There are some latent skills, think of them as speed and ball control and whatever, and then you might be observing actual speed data, and maybe team performance, and a bunch of other things. So there are all these different likelihoods stacked on top of those variables. And it turns out that because of this, whether it's model mis-specification or collinearity or something, you can end up in situations where the speed data in theory should define everything you need to know about speed, but because of the incompleteness of the data, or who knows what, the likelihood for team performance can end up saying: the speed data says this player is slow, but because he's so good, the latent variable for speed is going to think he's really fast. And so you end up saying: no, no, I don't want the player's performance or the team performance to flow into the speed latent variable, because, subjectively, I think all of that information should be captured by this other source. So you end up doing things which are kind of un-Bayesian, actually: cut models, where you're saying, even within, say, a Gibbs sampler, I'm not going to allow this information to affect this variable, or this parameter to flow into this other parameter. So, based on the underlying models, you can hack these models to do very, very interesting things. Actually, this is an old idea. If you remember, WinBUGS used to have this cut option, which was there largely for computational reasons, but allowed you, in some sense, to cut portions of the graphical model so it produces what you want. There are actually some really interesting papers, I'll see if I can find them later, by Pierre Jacob that look at these cut models. They're sort of un-Bayesian in a way, but the papers show how they behave and how they're super useful in certain cases.

Alex: Yeah, that is very interesting. Although I'm not sure it's un-Bayesian. To me, it's just specifying prior information and making sure it goes through the sampler. The issue you usually have, and that's actually an active area of research, is that if the model gets complex enough, it gets hard to set the priors on the things you really care about. Like setting the priors on the standard deviation terms of a multivariate normal: you have to do it, but it's sometimes not interpretable. What you do have an idea about is the standard deviation of the whole data, right? So for your whole generative graph, you know more or less the prior you want on that overall standard deviation, but you don't know how to specify it for each of the parameters. Ideally you would want to specify one big standard deviation and have it allocated upstream to the rest of the model.
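One way to write down what Alex is describing: put the prior on the total standard deviation, then let a simplex-valued allocation split that variance among the upstream components. A minimal generative sketch; the specific distributions are illustrative, and the same construction can sit inside a PyMC or Stan model.

    import numpy as np

    rng = np.random.default_rng(5)

    total_sd = abs(rng.normal(0.0, 1.0))          # prior on the whole-data scale
    props = rng.dirichlet(alpha=[1.0, 1.0, 1.0])  # variance shares, sum to 1
    sd_player, sd_team, sd_noise = np.sqrt(props) * total_sd

    # The component variances reassemble exactly into the interpretable total:
    recovered = np.sqrt(sd_player**2 + sd_team**2 + sd_noise**2)
    print(sd_player, sd_team, sd_noise, "-> total:", recovered)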

Luke: This is even more weird, though, because this is saying: we have some Gibbs sampler, maybe alternating through the parameters, maybe through these latent variables. You write down the full joint posterior and then break it into the conditionals so you can do Gibbs sampling. And all the conditionals we sample from however we can (maybe it's conjugate, or maybe you can do rejection sampling, it doesn't really matter), but this one conditional that we want to sample from, we're actually not going to sample from the true conditional as defined by the full posterior. We're going to sample from some version that looks like it, but with the dependency on this particular data source cut out. So you're artificially creating a posterior which is actually not the full posterior you originally defined; you're defining some new posterior by removing certain dependencies within those conditional distributions. Now, it's not necessarily true that you end up with a valid joint posterior, but you do end up with better performance. So it's just a super interesting problem for me: I know how things work on a broad scale for the full Bayesian model, but I also know I can solve problems by hacking these things, pinning certain variables here and tweaking things there, breaking things, but breaking them in a way which actually produces better results and, ultimately, better decisions.

Alex: Yeah, that's fascinating. I love that. So basically, if I understood correctly, when you say cutting the posterior, it's making sure we're only taking a subset of the full posterior that the model was actually defining?

Luke: Kind of, yeah, but only on one of the conditionals, only on one of the variables. Normally, if I'm updating, let's say, the latent variable for a player's decision-making, I have to condition on everything: on the data and, in a Gibbs sampler, on the previous draws of all the other variables. But here you're saying: when I update the speed latent variable, I only want to condition on the other variables and the speed data; I don't want to condition on these other data sources. So you're dropping the conditioning on those other data, removing them from that conditional sampler. It's a really interesting case where you take this thing that looks like a well-constructed posterior distribution and hack it in ways that are kind of unprincipled, but that make sense and lead to the performance you want.
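Here is a minimal sketch of that cut in a toy two-likelihood model, with everything invented: a latent speed observed directly, plus a second data source (team results) whose link to speed is deliberately inconsistent with the measurements. The only difference between the two samplers is whether the speed update conditions on that second likelihood.

    import numpy as np

    rng = np.random.default_rng(4)
    y_speed = rng.normal(1.0, 0.5, size=20)   # direct measurements: speed near 1
    y_team = rng.normal(8.0, 1.0, size=100)   # team results implying a faster player

    s2_speed, s2_team, alpha = 0.25, 1.0, 3.0  # known variances and slope
    # Model: y_speed ~ N(speed, s2_speed); y_team ~ N(alpha*speed + bias, s2_team);
    # priors: speed ~ N(0, 1), bias ~ N(0, 1).

    def gibbs(n_iter=5_000, cut=True):
        speed, bias, draws = 0.0, 0.0, []
        for _ in range(n_iter):
            if cut:   # cut update: condition on the speed data only
                prec = 1.0 + len(y_speed) / s2_speed
                mean = (y_speed.sum() / s2_speed) / prec
            else:     # full conditional: team results also pull on speed
                prec = (1.0 + len(y_speed) / s2_speed
                        + alpha**2 * len(y_team) / s2_team)
                mean = (y_speed.sum() / s2_speed
                        + alpha * (y_team - bias).sum() / s2_team) / prec
            speed = rng.normal(mean, prec ** -0.5)
            # The bias update conditions on speed either way (conjugate normal):
            prec_b = 1.0 + len(y_team) / s2_team
            mean_b = ((y_team - alpha * speed).sum() / s2_team) / prec_b
            bias = rng.normal(mean_b, prec_b ** -0.5)
            draws.append(speed)
        return np.mean(draws[500:])

    print("cut estimate of speed: ", round(gibbs(cut=True), 2))   # stays near 1
    print("full estimate of speed:", round(gibbs(cut=False), 2))  # dragged upward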

566

:

Yeah, yeah.

567

:

Damn, that's very cool.

568

:

So I'm guessing that means a lot of custom code, whether that's in Stan or Pimcee or

Python or stuff like that, right?

569

:

Yeah, certainly, right?

570

:

A lot of these things aren't necessarily handled well with Stan, right?

571

:

Because Stan is great for, or any these small PMC, whatever, they're great for, hey, let's

build like a Bayesian model that does this, this, and this, but it's not great for when

572

:

you're like, hey, we need to fix this edge case, we need do this, and we need to of hack

these things.

573

:

So first up, to do those types of hacks, you have to like deeply understand the problem

and deeply understand the models, that's sort of step one.

574

:

But as you say, it oftentimes falls outside of these automatic inference engines.

575

:

So you oftentimes have to do things custom, right?

576

:

So, you know, it actually reminds me

577

:

a bit of, you know, I think a lot of times, because of Stan and PyMC and others, you

oftentimes don't need to understand what's going on under the hood with HMC or any of these

578

:

other algorithms, right?

579

:

But I always thought it was important for people to understand, like at the very least,

like what's going on when you're fitting sort of basic models.

580

:

Like I'll give you one example here, which is when I taught spatial stats, it was a grad

course.

581

:

I would give people like a very simple multivariate normal where they're trying to like

estimate the mean or the covariance or something, right?

582

:

It's like, okay, here's a whole bunch of data and you know, maybe P is 50 and N is a

thousand.

583

:

Hey, learn the mean and covariance.

584

:

And that's like, that's a trivial problem.

585

:

It's conjugate, blah, blah, blah, right?

586

:

And then I say, okay, now I'm going to give them another data set, which is kind of the

same thing, but P is now a thousand and N is maybe 10,000.

587

:

And all of a sudden it's easy to pull in the data.

588

:

The data is actually relatively small, but they go to fit the data and it just doesn't

work.

589

:

And they say like, what's going on here?

590

:

Why doesn't this work?

591

:

It's a conjugate problem.

592

:

Why doesn't it work?

593

:

And I say, well, you're trying to invert a thousand by thousand covariance matrix.

594

:

Maybe that's not a lot.

595

:

Maybe it breaks down at 5,000.

596

:

Whatever it is, it's a pretty relatively low number on sort of a typical laptop where

these things start to fail.

597

:

And then I say, okay, well, that doesn't work.

598

:

So go fit it with stochastic gradient descent.

599

:

And then they go do that and they say, actually that works way better for these large data

sample sizes.

600

:

And I said, but what about for the small data?

601

:

And they say, well, actually then just use the sort of the conjugate one, right?

602

:

So it's like realizing that, there's different ways to solve the same problem.

603

:

And you might use...

604

:

this sort of direct sort of analytical solution in some cases, and you might prefer to use

SGD in another.

605

:

And then I go back and say, actually, there's a piece of information I didn't actually

tell you.

606

:

there's actually a certain correlation structure in this data.

607

:

And it turns out to be just an AR(1); it's a time series.

608

:

Oh, now you can just fit this thing with, like, you know, Kalman-filter-esque,

message-passing kind of ideas, typically like a Kalman filter.

609

:

So you can do things in sort of linear time in P, because P is sort of T now, like time,

right?

610

:

And all of a sudden they go from an exact solution to something which

is still exact and very, very fast.

611

:

Solving the same problem fundamentally, but understanding that there's different ways

of actually doing the computation, right?
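
As a minimal sketch of two of the three routes in that classroom example, on a toy mean-estimation problem (dimensions, seeds, and variable names are assumptions for illustration, not the actual course materials):

```python
import numpy as np

rng = np.random.default_rng(1)
P, N = 1_000, 10_000
true_mu = rng.normal(size=P)
X = true_mu + rng.normal(size=(N, P))  # toy multivariate normal data

# Route 1: the "exact" conjugate-style route. Working with the full
# covariance means factorizing a P x P matrix: O(P^3) time and O(P^2)
# memory, which is what quietly starts to fail on a laptop as P grows.
S = np.cov(X, rowvar=False)
# np.linalg.cholesky(S + 1e-6 * np.eye(P))  # the step that blows up with P

# Route 2: an SGD-style streaming update for the mean. It never forms a
# P x P matrix; with a 1/i step size this is exactly the running mean.
mu_hat = np.zeros(P)
for i, x in enumerate(X, start=1):
    mu_hat += (x - mu_hat) / i

print(np.abs(mu_hat - true_mu).max())  # small for large N
```

The third route is the AR(1) twist: once the covariance has that time-series structure, a Kalman-filter pass gives the exact answer in time linear in P, with no large matrix ever formed.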

612

:

And I think having those skills and sort of thinking holistically, not just about the

modeling, but also about the fit, is really important.

613

:

Like in sports where we're dealing with space all the time, you're sort of like, you're

always having to discretize space in some way or another, right?

614

:

Whether you like it or not, whether you sort of think about it as like...

615

:

we are projecting onto some basis or something, ultimately you're discretizing.

616

:

you put it into the computer, it's discretizing it.

617

:

So people oftentimes will say, I really want to keep the continuous model, but like,

hold on, you're going to have to come up with some low-dimensional basis approximation to it at some

618

:

point.

619

:

So you might as well be transparent about it: you're either having a sort of

simple model that you can do really accurate, exact computation on, or a complicated

620

:

model that you've got to do approximate computation on, and understanding those

trade-offs,

621

:

I think, is critical for anyone who's working on this problem.

622

:

Yeah, no, for sure.

623

:

And that makes me think, you know, in your experience, what have been the different

ways to get these

624

:

90 % of the results with 5 % of the computational cost of the model.

625

:

What have you found particularly helpful?

626

:

And I'm guessing these are cases where you cannot run a classic NUTS sampler, for instance,

on a model.

627

:

Yeah, a lot of this looks like...

628

:

uh

629

:

It's like making sure you have a big enough toolbox so you can say, here's the perfect way

of doing this.

630

:

Maybe it's like some big model built in Stan.

631

:

ah But hey, that's not going to work as the data scales.

632

:

So can we fit this with like a penalized regression?

633

:

Or what happens if I sort of

634

:

In a lot of these samplers, where things start to get really slow is when you're

sampling hierarchical parameters as well. Like, in a lot of player models, if you

635

:

think about a situation where you have, like, one variable per player, right?

636

:

Those types of problems are oftentimes easy because you can marginalize them out.

637

:

But as soon as you start sampling like hierarchical parameters as well, where you sort of,

you want like some, you know, uh

638

:

some, like, group-level variances or something like that, then those conditional updates can

be super expensive to calculate, because then you're conditioning on the full data.

639

:

And so it's like, okay, what happens if...

640

:

What happens if we just like do some hack and just find an estimate for that hierarchical

parameter, just plug it in, right?

641

:

Do we lose a lot by just sort of specifying that directly, rather than sampling it and

having a posterior over it, right?

642

:

So a lot of the types of things we're thinking about are: where are the

computational bottlenecks, especially as the data scales, and how can we make it work?

643

:

Oftentimes that work is sort of in sequential updating, because essentially we want to update

these things

644

:

weekly as new matches come in.

645

:

How can we do this in a really efficient way without losing sort of prediction

fidelity, right?

646

:

And so, yeah, it looks like things like, hey, actually, if it's a big regression problem,

can we do sort of penalized regression?

647

:

And sure, we won't get a full characterization of uncertainty, but can we get a point

estimate in 1% of the time?

648

:

And then are there ways, can we bootstrap or something else to get some notion of

uncertainty around it?

649

:

Or, as I said earlier, can we pin some hierarchical parameter to speed up the sampling,

even though we know that

650

:

we're gonna end up with some bias as a result.
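
Here is a minimal sketch of that pinning trick on a toy one-effect-per-player model; the model, the method-of-moments estimate, and all numbers are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy hierarchy: one latent effect per player with a shared group-level
# sd tau, plus noisy per-player observations with known noise sd sigma.
n_players, n_obs, sigma, tau_true = 500, 20, 1.0, 0.8
theta = rng.normal(0.0, tau_true, size=n_players)
y = theta[:, None] + rng.normal(0.0, sigma, size=(n_players, n_obs))

# "Pin" the hierarchical parameter: a cheap method-of-moments estimate,
# using Var(y_bar) ~= tau^2 + sigma^2 / n_obs, instead of sampling tau.
y_bar = y.mean(axis=1)
tau_hat = np.sqrt(max(y_bar.var() - sigma**2 / n_obs, 1e-6))

# With tau fixed, every player's conditional is conjugate normal and the
# draws are embarrassingly parallel -- no expensive group-level update,
# at the cost of ignoring the uncertainty in tau (the bias mentioned above).
prec = 1.0 / tau_hat**2 + n_obs / sigma**2
post_mean = (n_obs * y_bar / sigma**2) / prec
draws = rng.normal(post_mean, 1.0 / np.sqrt(prec), size=(1000, n_players))

print(f"tau_hat = {tau_hat:.2f} vs true tau = {tau_true}")
```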

651

:

So those are the types of conversations we're having constantly.

652

:

Hmm.

653

:

Okay.

654

:

Yeah.

655

:

Yeah.

656

:

Yeah.

657

:

That tickles a lot of the conversations I'm having also.

658

:

In my work for sure.

659

:

And so I'm curious, you know, if you would be mentoring or teaching these students right

now, what would you

660

:

recommend they invest their learning time on, conditional on already knowing

Bayesian stats and being fluent in a probabilistic programming language, which is

661

:

the case of a lot of the listeners here.

662

:

But if you had to tell them something that can actually help them complete

their toolbox for these cases, for instance, what advice would you give them?

663

:

Yeah, one thing I've seen with like a lot of

664

:

of students that are coming out of master's programs or data science programs is that

they're sort of quite good at: give them a data set and, hey, either use this method or choose

665

:

a method that's going to give you a good prediction performance.

666

:

so people are good at maybe tuning neural networks or Gaussian processes or...

667

:

you know, penalized regression or what have you, they're sort of quite confident with

these, with sort of fitting methods and looking at predictions and saying, hey, this has a

668

:

better RMSE.

669

:

But where they can really struggle is that a lot of times in sports, actually,

there are certainly prediction cases, but oftentimes what we're doing is fundamentally

670

:

inference.

671

:

You're saying, hey, we observe some team-level performance and you want to infer what's

causing this.

672

:

So you have some latent parameters, which are like player skills or player performances,

and you're essentially trying to learn those.

673

:

It's, it's fundamentally an inference task.

674

:

And so, you know, thinking about more sort of statistical problems and building

that statistical toolbox... the first thing I would say is, if

675

:

the bulk of your time has been spent building prediction algorithms, it's to broaden

your toolbox beyond that, right?

676

:

And then learn as much as you can. You know, some of the best ideas I've ever

had in sports have come from, like, text modeling.

677

:

Right?

678

:

Using, like, Dirichlet processes to model plays in basketball was one of

the coolest projects, I think, that I've ever been involved in.

679

:

Again, that wasn't my idea.

680

:

It was a genius PhD student named Andy Miller.

681

:

there's like those types of things, right?

682

:

If you want to be able to sort of creatively solve these problems to find like, to find...

683

:

computationally efficient solutions, you have to have way more than just one tool in your

toolbox.

684

:

Read broadly, study broadly, even if you can't get into the weeds on everything.

685

:

Learn about language models, learn about text models, learn about image modeling, learn

about spatial statistics, learn about robust estimators, as we talked about earlier, learn

686

:

about all these different areas.

687

:

And sure, you won't necessarily be an expert in any of them, but if you sort of

understand, I know what these models do, and I sort of understand their pros and their

688

:

cons, and what they work on, what they don't work on.

689

:

then it becomes much easier to sort of solve these sort of unique problems that arise.

690

:

Yeah, yeah, yeah.

691

:

That's really, really interesting to hear that.

692

:

And how do you...

693

:

You know, in your experience, what has been very interesting open source software that

maybe you've used outside of the Bayesian framework?

694

:

I'm curious.

695

:

Yeah, early days for me was not open source.

696

:

When I started my master's, it was, like, MATLAB programming, right?

697

:

That was, like, the tool du jour.

698

:

And since then I've sort of been...

699

:

a combination of R and Python has been the large bulk of my work.

700

:

But a lot of stuff is built on packages designed for those two languages as well as Stan

and others.

701

:

And these days, a lot of what I do is also sort of broader technology, which has just

revolutionized the way we do things, whether it's cloud computing or Docker.

702

:

All these sort of standard tools in broader tech, I think, if you want to

work in industry in particular, are incredibly valuable, incredibly valuable to learn,

703

:

right?

704

:

Like...

705

:

At Zelus, for example, now Teamworks, even our models themselves are, like,

Dockerized; they're containerized.

706

:

So you think about like, I'm building a Bayesian model.

707

:

Well, we think of it as, like, a container object that holds predict functions and test

functions and train functions, and allows you to very simply probe these models as

708

:

well as to sort of version-control them.
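
As a minimal sketch of that container idea, here is a hypothetical Python interface; the class, field names, and example values are all assumptions for illustration, not actual Teamworks code.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ModelContainer:
    """A fitted model travels with its own train / predict / test hooks
    plus a version tag, so it can be containerized and probed uniformly,
    whatever sampler or library lives inside."""
    name: str
    version: str
    train: Callable[[Any], Any]       # refit on new data
    predict: Callable[[Any], Any]     # posterior predictions
    test: Callable[[Any], dict]       # calibration / diagnostic checks

# Hypothetical usage: the serving layer only sees this surface, never
# whether Stan, PyMC, or a custom sampler is doing the work inside.
container = ModelContainer(
    name="player-value",
    version="2025.04.1",
    train=lambda data: None,
    predict=lambda data: [],
    test=lambda data: {"calibration_ok": True},
)
print(container.test(None))
```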

709

:

And so it's like blending ideas from statistics and machine learning with sort of

710

:

software engineering, ultimately, to solve a lot of interesting problems that come up in

productionized machine learning.

711

:

And so, to do this, it's a lot, right?

712

:

But it's a lot also outside of what you might think of as traditional stats and ML

toolbox, right?

713

:

But broader tech stuff.

714

:

Heck, this is not work-related, but I spent a weekend flashing and messing around with a

Raspberry Pi last weekend.

715

:

So I'm deep into this

716

:

stuff.

717

:

Yeah, I mean, that makes sense to me, in the sense that, you know, a substantial

proportion of the modeling work is actually not done on the model.

718

:

A lot of it comes before: how do I get the data, and in which format?

719

:

Which data do I want?

720

:

Even before that, what's the generative graph we're thinking about?

721

:

Which questions are we interested in answering?

722

:

And then once I have the data, how do I format it in the format that the model can

actually take it?

723

:

Doing all the EDA, also extremely important.

724

:

Parameter recovery, simulation-based calibration, all this stuff.

725

:

So this is something I hear a lot, right?

726

:

Which is like, data science in the real world is like a lot more data managing and like

all this sort of pre-model fitting stuff.

727

:

That's certainly all true.

728

:

I think the analysts that have worked with me in the past, where they're most surprised

when they work with me is how much work there is after that first model fit.

729

:

they do all this work getting everything in place and then, hey, I finally built the

model.

730

:

We're good to go.

731

:

And I said, well, hold on.

732

:

I want to see.

733

:

Calibration... I want to see, like, all these different things; I want to explore all the edge cases.

734

:

I want to explore, you know, what happens, how sensitive is it to different

assumptions?

735

:

You know, there's all this sort of poking and prodding of the model.

736

:

So you have, yeah, all this work that comes before the model, and then there's

fitting the model, which is kind of the quote-unquote fun part.

737

:

And then, before you productionalize this thing and release it to the world, you

have to probe it endlessly to make sure that it does the things you want it to

738

:

do, and that it behaves in sort of predictable and controllable

739

:

ways.

740

:

And that's sometimes more work than is actually needed before the model is fit.

741

:

yeah.

742

:

Yeah, I was gonna go there.

743

:

Definitely.

744

:

And yeah, I think that's almost always even more work.

745

:

Because, well, the model often doesn't work like you want it to, you know,

especially if it's complex enough.

746

:

So there's definitely some dimension

747

:

where it's not working as you would have expected.

748

:

And so that's interesting to know: okay, in which conditions does my model collapse?

749

:

Where does it not work?

750

:

And so that takes a lot of forms.

751

:

I think then, even after that, there is another part, which is visualization.

752

:

If you're doing that as the modeler, it's extremely important, because you want custom

visualizations depending on the people you're gonna talk to.

753

:

uh We'll get to that a bit later, but yeah, on that model validation part, I think it's an

extremely important part uh that you bring up.

754

:

that ah also...

755

:

goes back to something you were saying earlier, which is, actually, when we develop the

model, we're fine if it's not doing a very good job at the league level, for instance, but

756

:

we want it to do a good job for the kind of players we're interested in. So it's like,

maybe the model is not that good overall, or it's not that good for

757

:

older players, because they have lower athleticism, stuff like that, but we're actually

interested in young players, so bigger, better athleticism and so on, and in this

758

:

population the model is doing better, so it's actually fine for us. So that's a very

important part. And so, what's your workflow there, if you have one? You know, is there other

759

:

stuff that you

760

:

always check once someone comes with a model that's fitting well, and all the convergence

diagnostics are all good?

761

:

Now we're entering that part of model validation.

762

:

Yeah, we do a lot of stuff pre-modeling, project plan type stuff where there's documents

that we will use, the layout, the types of things we want to do beforehand.

763

:

Afterwards as well: calibrations, a lot of standard things that we'll do.

764

:

But a lot of it is just... it's sort of fairly iterative, right?

765

:

You do the initial sets of plots and then say, actually, we should look at this and this

and then something looks weird and you say, okay, that's weird.

766

:

Like how else can we slice that in a different way?

767

:

Can we see it?

768

:

uh

769

:

trying to expose what's actually going on under the hood.

770

:

Hey, can we see this set of parameters?

771

:

Is there some sort of, like, high-leverage data that's maybe driving

this weirdness?

772

:

Like let's explore all that type of stuff.

773

:

So it's a combination of sort of a standard checklist, if you will, as well as

really custom stuff that comes from just working with these models

774

:

for so long that you really spot little issues and you sort of figure out how

to pull the thread and get at what you want.

775

:

Hmm.

776

:

Yeah.

777

:

It's like, yeah, detective work.

778

:

um And what's like, is there a case, you know, that you remember that was particularly

hard, you know, where you were like banging your head against the wall before finally

779

:

understanding what was going on with the model?

780

:

Yeah, there's lots.

781

:

That happens, like, daily.

782

:

I'll give you one example.

783

:

It ended up being my favorite paper, I think, that I've ever written; it's with Patrick.

784

:

Very simple statistically.

785

:

There's this idea in sports science of acute:chronic ratios, and the...

786

:

It's a pretty simple notion of, how much load on your body has there been?

787

:

I'm going to simplify a bit, so bear with me, but how much load has your body had in the

last week versus how much has it had in the last month, roughly, or maybe month and a

788

:

half?

789

:

Just the idea being sort of like, Hey, are you, you getting, um,

790

:

is the load that you've experienced recently, is this sort of normal to what you normally

experience or is it high or is it low?

791

:

You can imagine if you're a runner and you normally run 10 miles a week and then

this last week you ran 50 miles; well, you'd have a super high acute:chronic ratio.

792

:

If you normally run 10 miles a week and this week you ran one mile, well, you'd have a

really low acute:chronic ratio.

793

:

So sort of like a short-term average divided by a long-term average, right?

794

:

So it's the ratio of the two.
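
In code, the quantity is as simple as it sounds. A minimal sketch, assuming the commonly used 7-day and 28-day windows (conventions vary, and real implementations often use exponentially weighted averages instead):

```python
import numpy as np

def acute_chronic_ratio(daily_load, acute_days=7, chronic_days=28):
    """Short-term average load divided by long-term average load."""
    daily_load = np.asarray(daily_load, dtype=float)
    return daily_load[-acute_days:].mean() / daily_load[-chronic_days:].mean()

# The runner example: 10 miles/week for three weeks, then 50 miles.
daily = np.repeat([10.0, 10.0, 10.0, 50.0], 7) / 7.0
print(acute_chronic_ratio(daily))  # 2.5 -- a big spike over baseline
```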

795

:

And there's a tremendous amount of papers out there that show that if that acute:chronic

ratio is outside of some range...

796

:

So here I'm talking about like load on the players, like amount of physical exertion.

797

:

It's probably easiest to think of it as like running, right?

798

:

It's like distance run, it's a little more complicated than that.

799

:

But they basically say, if we have these notions of player load and you fall way outside of

these bands, it's highly predictive of injury in the future.

800

:

And there's like some intuitive logic there, right?

801

:

Which is like, hey, if I'm a runner and I go from running 10 miles

a week to all of a sudden running 50 miles, I'm increasing my injury risk.

802

:

I'm more likely to get injured moving forward, right?

803

:

Or if I'm training on the soccer pitch for two hours a week and now if I'm going to 10

hours a week, uh

804

:

I'm going to have a higher chance of injury.

805

:

But, and to be clear, there's like literally hundreds of papers out there that show that

that is predictive of future injury.

806

:

Okay, great, okay, great.

807

:

So we have this acute:chronic ratio.

808

:

In fact, just as a little aside, the Apple Watch now, if you have the newest Apple Watch,

has this acute:chronic ratio built in.

809

:

It's like a plot that shows how are you relative to your normal.

810

:

It's the same idea, and it's using a lot of these same ideas.

811

:

But it's essentially like, again, the short-term average divided by long-term average.

812

:

But when we did this internally, at a couple of teams I've worked for, excuse me, we

found that it wasn't predictive, and I couldn't figure out why.

813

:

And so you look at all these papers, and basically what they're estimating is like a

zero-one.

814

:

Did the player get injured after some point in time or not?

815

:

So it's like, if...

816

:

If my acute chronic ratio is high at time t, do I get injured in time t plus one to time,

let's say, you know, the next month or the next week or whatever.

817

:

That's how all these papers look.

818

:

It's like, the acute:chronic ratio at a snapshot in time; it's sort of looking

backwards at the past.

819

:

Does it predict injury in the future?

820

:

And then I realized like, that's actually not what you care about because what these are

often saying is that, okay, if...

821

:

What we sort of uncovered is that the main reason this is predictive is because if your

load has increased in the last week...

822

:

It also means that your load will likely be high in the future.

823

:

So what is actually happening is that your chance of injury per minute of exposure, or per

mile run in the running example, oftentimes stays constant, but your

824

:

total underlying exposure has increased.

825

:

So these hundreds of papers that show this thing is super

predictive... it's actually predicting exposure more than it's predicting injury.

826

:

So you end up with this confounding; oftentimes it's, like, the

time of the season, right?

827

:

Like you come into training camp, your load spikes and therefore you have higher chance of

injury.

828

:

It's not that your chance of injury for every minute you're on the field is higher.

829

:

It's that you spend more minutes on the field after that.

830

:

Right?

831

:

so like, you know, I'll give you another example, right?

832

:

If you, there's like, if you looked at

833

:

uh So eating ice cream is super predictive of drowning deaths.

834

:

Like if you look at the data on this, you'd say like, okay, when people eat a lot of ice

cream, there's a lot of drownings as well.

835

:

Right?

836

:

And so it's like, what's the mechanism here?

837

:

Are people like eating too much?

838

:

People eat all this ice cream, they get bloated, they can't swim.

839

:

It's like what your grandmother used to tell you, don't swim after eating; you sort of,

like, make up stories to make sense of this.

840

:

And then you step back and, like, no, there's actually a confounding variable here, which is,

like, season.

841

:

It's the summer.

842

:

People eat more ice cream, they, you know, they swim more, right?

843

:

And so the ice cream is actually

predictive not of the drownings per se, but of the amount of water exposure.

844

:

And it's the same thing here that the,

845

:

the acute:chronic ratio is not necessarily predicting a higher risk; it's not like your chance of

drowning per minute of being in the pool is any different, it's that your minutes in the

846

:

pool go up.

847

:

And here it's saying the same thing.

848

:

So that's, like, a simple case where we did this big project, and it wasn't necessarily,

like, hey, we got this new and novel thing.

849

:

It was saying, hey, everyone's using this thing that turns out largely to be garbage.

850

:

And it's...

851

:

Or certainly, at the very least, they're putting way more confidence and trust in it than

they should, because fundamentally all these papers have misdefined the estimand.

852

:

They've defined it in a way which is actually not what you care about.

853

:

You don't care about whether the player gets injured in the next two weeks.

854

:

You care about their injury risk per minute of exposure or per minute of gameplay or

whatever.

855

:

And you need to essentially control for that.

856

:

And none of these papers have.
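
One way to phrase that fix in code is a Poisson regression with log-exposure as an offset, so the coefficient measures risk per minute rather than raw counts. This is a minimal sketch on simulated data, not the actual paper's analysis; the statsmodels usage is standard, but all names and numbers are made up.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2_000

# Fake player-weeks: an ACWR spike raises next week's EXPOSURE (minutes),
# but the per-minute injury hazard stays flat -- the confound above.
acwr = rng.lognormal(0.0, 0.3, size=n)
minutes = 90 * acwr * rng.uniform(0.8, 1.2, size=n)
injuries = rng.poisson(1e-3 * minutes)  # constant risk per minute

X = sm.add_constant(np.log(acwr))

# Naive analysis: injuries ~ ACWR. Looks "predictive" (coefficient near 1).
naive = sm.GLM(injuries, X, family=sm.families.Poisson()).fit()

# Exposure-adjusted: the same model with log(minutes) as an offset, so the
# coefficient now measures risk PER MINUTE -- and it collapses toward 0.
adjusted = sm.GLM(injuries, X, family=sm.families.Poisson(),
                  offset=np.log(minutes)).fit()

print(naive.params[1], adjusted.params[1])
```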

857

:

So that's a really simple example.

858

:

Again, not terribly Bayesian, but sort of when you

859

:

think really deeply about a problem, sometimes there's a really simple solution

right in front of you. In that case, you know, it's essentially telling people to

860

:

stop doing this thing or at least be much more thoughtful about how you're using this

data.

861

:

And so that's one example of like something that's still super prevalent in sports, but we

kind of uncovered this really interesting confounding effect.

862

:

Yeah, it's fascinating.

863

:

Yeah.

864

:

And I mean, I love it because, really, you couldn't find the solution in the data.

865

:

Right.

866

:

So I think in a way that's quite Bayesian, because you hit on something that's fundamental

in Bayes: solutions are not always in the data.

867

:

And that's why you need priors.

868

:

And that's why you need the structure of the model.

869

:

And so here it's, like, literally thinking much more about the generative process of the data

and basically realizing,

870

:

like in the ice cream and drowning example, where basically season, as Richard

McElreath would say, is a fork, right?

871

:

Because it causes both variables you're interested in, and unless you control or

condition on that variable, you're gonna have biased estimates.

872

:

And so here, thinking about that, you condition on the time played,

873

:

the time spent on the field and then you see that this predictive aspect basically

disappears because you've taken care of it.

874

:

In this paper we also showed that if you use some ideas from, back to the point about

having a broad toolbox, if you use some really basic tools from causal inference like

875

:

matching or propensity scores that you can actually solve this problem, right?

876

:

If these studies, instead of saying, let's predict whether a player gets injured or not,

if instead you had said, let's take two players at the same point of the season, one who

877

:

has a high acute:chronic ratio and one who doesn't, but control for all the other things, like

minutes played in the games and all this other stuff.

878

:

And if you did that, then you can actually control for this effect.
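
A minimal sketch of that matching idea, with simulated data and made-up covariates; the greedy one-to-one matching below is only one of many ways to do this, and nothing here reproduces the actual paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 1_000

# Fake covariates: week of season and recent minutes drive both the
# "treatment" (a high-ACWR spike) and downstream exposure.
week = rng.integers(1, 39, size=n).astype(float)
minutes = rng.normal(60 + 0.5 * week, 15)
spike = (rng.random(n) < 1 / (1 + np.exp(-(0.05 * week - 2)))).astype(int)

# Propensity score: P(spike | covariates).
Z = np.column_stack([week, minutes])
ps = LogisticRegression().fit(Z, spike).predict_proba(Z)[:, 1]

# Greedy 1:1 nearest-neighbor matching of each spiked player to the
# control player with the closest propensity -- comparing like with like.
treated = np.where(spike == 1)[0]
control = np.where(spike == 0)[0]
pairs = [(i, control[np.argmin(np.abs(ps[control] - ps[i]))]) for i in treated]
print(f"{len(pairs)} matched pairs; compare injury rates within the pairs.")
```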

879

:

so, yeah, it's like, I think it's...

880

:

just a really simple example of a case where, you know, a lot of that work is worth a lot of

money to teams, and sort of understanding these things well means, perhaps in this case, not

881

:

chasing down rabbit holes.

882

:

Yeah.

883

:

Yeah, I love that.

884

:

That's really fascinating.

885

:

And that's a solution that's like simple, but it can take so long to get there.

886

:

So thanks.

887

:

That's super interesting.

888

:

That's not a big model.

889

:

It's a good example because there's no real stats there.

890

:

It's just like a probabilistic.

891

:

I guess there is in a way, but it's super simple.

892

:

It's a very simple case of, really: are you defining the estimand, the thing you care

about, properly?

893

:

Yeah.

894

:

Or you end up with wildly wrong conclusions.

895

:

Yeah, for sure.

896

:

So I'm going to start winding us down.

897

:

Now it's like, I've already taken quite a lot of your time.

898

:

eh Maybe two other questions, if you can, before we go to the last two questions.

899

:

um

900

:

I'm curious, what do you see as the most exciting trends or advancement now in your field

and how you think they will impact sports analytics and decision making?

901

:

Yeah, I think there's probably a couple of things there.

902

:

The first is, as I mentioned earlier, just the growth of this really super interesting

data and all the statistical problems that come along with it.

903

:

That has sort of been a trend over the last five, ten years, but it certainly continues.

904

:

And it means that teams and sort of...

905

:

other businesses in the sports space are investing a lot, so you have a lot of teams growing

their analytics departments, and media companies and others spending in the space.

906

:

Another cool thing I've seen, which seems sort of like tangential but is actually really,

I think actually quite nice, is that...

907

:

When teams are building out their internal resources, every team typically has to do a lot

of the same things.

908

:

So they have to like set up a database, they have to build all the ETL to ingest the data,

they have to...

909

:

create like an IDM to sort of standardize across the different data sources.

910

:

They have to map players.

911

:

Anyone who's done like player mappings in soccer where you have tens of thousands of

players and you're trying to map, you know, uh data from one data provider to another for

912

:

some Brazilian player with like six names, like it's just impossible.

913

:

Right.

914

:

And so everyone's doing these things sort of on repeat before you can get to like the

interesting statistical problems.

915

:

Right.

916

:

And so one thing that I think has been great, a trend that has

just recently started, and actually Zelus has been a big part of that,

917

:

but there's others as well, is sort of realizing that's a problem and finding

centralized solutions for it.

918

:

So there's been cases, for example, of leagues taking some of that on and sort of saying,

all these teams don't need to each be spending hundreds of thousands of dollars a year,

919

:

like mapping players or doing this and this.

920

:

We'll create something.

921

:

Or Zelus has essentially said, hey, all these teams are doing a lot of the same things

when it comes to data engineering, and even sort of the early, simple stages of

922

:

modeling.

923

:

So let's sort of democratize that, and for a fraction of the cost teams can just

ingest all that.

924

:

And then, to use the baseball analogy...

925

:

Like they're not starting from nothing.

926

:

They're kind of starting on third base.

927

:

Right?

928

:

So that's sort of another thing that I think is huge progress in terms of creating

the space for sports analysts to actually work on, you know, challenging, interesting

929

:

problems.

930

:

No, for sure.

931

:

I mean, so first, now that I work in baseball, I can understand that metaphor.

932

:

So thank you.

933

:

And second, yeah, I mean,

934

:

I'm an open source developer so I'm not gonna tell you this is a wrong direction.

935

:

I think, like, you know, ideally if you'd asked me, I'd be like, eh, the leagues

should just take over all of that and put out all that tracking data and stuff like that.

936

:

Maybe not the latest frontier, because, well, for

937

:

the industry to keep progressing, you know, people still have to earn money, because

they were, like, the first ones to develop that kind of data.

938

:

You know, the older data and stuff like that, just put that on the league, and then the

league could just open source everything so that everyone

939

:

gets access to that.

940

:

Everyone has the same data.

941

:

And then it's just how you use the data, and the combination, not only, like, of

the models, but the combination of the people working on the data: the modelers,

942

:

the GMs, the coaches, like all these people together, the scientists, that's what really

makes the value out of the team much more than having that exact row in their database

943

:

that the other team doesn't have.

944

:

I think other industries also show that, you know.

945

:

Yeah, our edge at Toulouse did not come necessarily from having the best data.

946

:

Like I think we did actually have the best data and the best models, but it came from

execution.

947

:

There's no question about it.

948

:

Like, yeah, we probably did a little better than if we'd had simpler data, simpler

models.

949

:

Yeah, maybe we would have been a little less efficient, but uh there's a lot of clubs out

there, and I certainly won't say names here, but a lot of teams out there that have great

950

:

data and big analytics groups and so on.

951

:

And, you know, they treat them like houseplants.

952

:

They stick them in the corner and ignore them.

953

:

Right.

954

:

And you're not going to get any, you're not going to create competitive advantage doing

that.

955

:

No, no, no, for sure.

956

:

I mean, I see that a bit like, you know, high-quality cooking, right?

957

:

What makes a chef good is not only the ingredients; it's how they use the ingredients,

with whom they're working, to whom they're serving the plates, and

958

:

where they are feeding people.

959

:

You know, like, if you go to big restaurants, it's not only one table where they

just give you the food.

960

:

It's like a whole experience.

961

:

So.

962

:

It's, yeah, I think it's a great thing that we're moving in this

direction, and yeah, as you were saying, Zelus is doing a lot of that, and I think

963

:

that's really amazing.

964

:

Another question I have for you before the last two ones is, well, what's next for you?

You know, because you're a very curious person, and now you've sold Zelus, so I'm

965

:

curious are there

966

:

any upcoming projects you're particularly excited about for the months to come?

967

:

You know, I sort of spent the first 15 years of my career trying to be the best statistician

that I could.

968

:

I sort of had this personal mantra.

969

:

I was always personally offended if I didn't understand something or I didn't know a

method.

970

:

That's part of what drove me to learn so much.

971

:

It's just like...

972

:

feeling like I just wanted to sort of cover everything.

973

:

And so I spent so much of my career sort of being that, right?

974

:

And sort of growing as a statistician.

975

:

And mostly in academia, but of course with various consulting gigs and all sorts of industry

as well.

976

:

And then the last 15 years, with some overlap there, have been about sort of sports,

and about applying that expertise

977

:

really into sports, and how that leads to decision-making, negotiations,

finances, and the whole integration of all those things coming together

978

:

to ultimately go from, hey, how do we value players better?

979

:

Right through to like running a team and

980

:

and ultimately outperforming our payroll and all the way through to like creating equity

value for shareholders, right?

981

:

And I kind of feel like I'm at this point where I've put in a lot of time; you know, maybe

I've hit Malcolm Gladwell's 10,000 hours in both of these things.

982

:

And I feel like I'm, with Toulouse, we've sort of proven the thesis here.

983

:

So a lot of what's next for me right now is, I sort of want to keep leaning in on this.

984

:

I feel like I have these two

985

:

things sort of perfectly hybridized together, where the technical skills that I've built

over the years combine with the team management skills, and it's a super powerful

986

:

combination and a really valuable combination.

987

:

I like just love using those tools and these ideas to sort of play Revenge of the Nerds in

real life.

988

:

Like going out there and just dominating, using essentially, this is gonna sound very not

humble, but using our group's intellectual advantage to win.

989

:

And to me that's like super, super fun.

990

:

Yeah, I mean, I love that.

991

:

Definitely support it.

992

:

Like, Revenge of the Nerds, you had me there.

993

:

But yeah, I mean, of course, yeah.

994

:

And that's something I

995

:

also try to do with the podcast, like trying to...

996

:

percolate these ideas through more people.

997

:

um Because I think that's something that's needed a lot.

998

:

yeah.

999

:

Awesome.

:

01:24:11,406 --> 01:24:12,547

Well, Luke.

:

01:24:13,048 --> 01:24:14,678

I think we'll call it a show.

:

01:24:14,678 --> 01:24:20,620

I would still have so many questions; like, literally, I still have so many questions I

had for you today that I didn't ask you.

:

01:24:20,620 --> 01:24:22,511

Sorry that's my fault.

:

01:24:22,511 --> 01:24:24,831

I filibustered a couple of those questions there.

:

01:24:24,831 --> 01:24:27,152

No, no, that's, you know, that's me. I

:

01:24:27,152 --> 01:24:28,202

have... that's also my job. I

:

01:24:28,202 --> 01:24:32,493

have to, you know, adapt to the topics.

:

01:24:32,493 --> 01:24:33,724

But yeah, I mean that's cool.

:

01:24:33,724 --> 01:24:42,396

That means you can come back on the show next time you have a cool project to talk about,

and I'll get to ask you these questions.

:

01:24:42,736 --> 01:24:47,516

But first, before you leave, of course, I have to ask you the last two questions.

:

01:24:47,556 --> 01:24:49,996

The ones I ask every guest at the end of the show.

:

01:24:50,916 --> 01:24:53,356

So I'm going to change a bit the first one for you.

:

01:24:53,356 --> 01:24:54,556

First time I do that.

:

01:24:54,556 --> 01:25:00,196

But I think that you kind of answered the first one a bit already.

:

01:25:00,196 --> 01:25:10,596

And I'm curious because you have a particular background and origin story where if I

remember correctly, in this moneyball episode,

:

01:25:11,568 --> 01:25:25,528

you were saying that you actually started working on spatial temporal data with the

Sacramento Kings, because someone came into your office with like, some question, but the

:

01:25:25,528 --> 01:25:28,848

question was actually not for you, I think, if I remember correctly.

:

01:25:28,848 --> 01:25:39,464

And so that's a very, very random, you know, origin story, one that could be in

a movie, in a way, where it's like, the hero never wants to

:

01:25:39,630 --> 01:25:40,620

be a hero, right?

:

01:25:40,620 --> 01:25:43,543

It's like the situation imposes it onto him.

:

01:25:43,543 --> 01:25:47,369

So my question is a counterfactual for you.

:

01:25:47,369 --> 01:25:49,411

If that moment had not happened, right?

:

01:25:49,411 --> 01:26:01,683

If you had not been in your office at that time, at that place, and you hadn't met that

person and ended up working on on sports data, what do you think you would have done?

:

01:26:02,992 --> 01:26:10,472

Yes, I think everyone's lives are sort of built up with a lot of these just kind of random

events that accumulate into who you are.

:

01:26:10,632 --> 01:26:11,892

That was certainly one of them.

:

01:26:11,892 --> 01:26:22,972

I won't retell that story because, as you say, it's on the Wharton Moneyball podcast,

but I think I'd still be working in sort of very similar spaces, but possibly not sports,

:

01:26:22,972 --> 01:26:23,072

right?

:

01:26:23,072 --> 01:26:29,936

Sort of taking all these ideas, Bayes and so on, and...

:

01:26:29,936 --> 01:26:32,476

trying to apply it to some interesting problem.

:

01:26:32,476 --> 01:26:41,616

for some reason, I ended up working in a space which is working on a zero-sum game where

billionaire owners are trying to extract value from millionaire players and the

:

01:26:41,616 --> 01:26:43,996

millionaire players are trying to get dollars from the billionaires.

:

01:26:43,996 --> 01:26:54,656

So it's like a very strange space to work in, but I think I'd probably be in a very

similar spot but not in sports.

:

01:26:54,816 --> 01:26:59,816

I was on a path where I was doing a lot of stuff in climate; maybe it'd be that, maybe it'd

be...

:

01:26:59,864 --> 01:27:07,892

sort of some other domain, but I think it'd be sort of the same thing I'm doing now, but

just a different domain other than sports.

:

01:27:07,953 --> 01:27:09,034

Yeah, yeah.

:

01:27:09,034 --> 01:27:11,036

Yeah, I was thinking maybe agriculture.

:

01:27:11,036 --> 01:27:15,261

oh They do a lot of spatial temporal stuff over there.

:

01:27:15,261 --> 01:27:20,016

I actually published a paper once on crop yield predictions in the Canadian prairies.

:

01:27:20,016 --> 01:27:21,507

How exciting is that?

:

01:27:21,870 --> 01:27:23,842

Yeah, yeah, yeah, exactly.

:

01:27:23,842 --> 01:27:29,968

Actually, we worked on a similar project when I was at PyMC Labs.

:

01:27:29,968 --> 01:27:33,231

it was actually something like that.

:

01:27:33,452 --> 01:27:34,533

Gaussian processes.

:

01:27:34,533 --> 01:27:37,916

So that's a cool space because you can use a lot of Gaussian processes.

:

01:27:38,857 --> 01:27:42,460

That's also very challenging because Gaussian processes are hard to fit.

:

01:27:42,861 --> 01:27:44,743

But they are such cool beasts.

:

01:27:46,274 --> 01:27:54,009

And second question, if you could have dinner with any great scientific mind, dead, alive

or fictional, who would it be?

:

01:27:55,130 --> 01:28:00,533

Yeah, this is interesting because I know you've asked the same question to others, so I

sort of gave it some thought.

:

01:28:00,533 --> 01:28:08,258

You know, I've been really fortunate that over the last 20 years or so, I've had a lot of

tremendous dinners with people, right?

:

01:28:08,258 --> 01:28:10,680

Whether that's in academia, like...

:

01:28:10,882 --> 01:28:22,021

I just have great memories of dinners with people that you probably know,

Christian Robert and so on. And I had this amazing dinner in Bristol with Julian Besag in the

:

01:28:22,021 --> 01:28:22,732

last year of his life.

:

01:28:22,732 --> 01:28:24,103

I was a visitor there in Bristol.

:

01:28:24,103 --> 01:28:25,840

ah

:

01:28:25,840 --> 01:28:27,660

Peter Green at the same time.

:

01:28:27,660 --> 01:28:36,120

Just so many incredible dinners and sort of, and then of course now in the sports world

with coaches and owners and so on, I've just been really fortunate.

:

01:28:36,320 --> 01:28:43,560

So in sort of the explore-exploit sense, I think I have a pretty good idea of what the

distribution of good and bad dinners is.

:

01:28:43,560 --> 01:28:55,500

And so I think there's a really good chance that if I named someone I've never

met, it would be sort of below my expected return, if I were to

:

01:28:55,794 --> 01:28:57,354

explore rather than exploit.

:

01:28:57,534 --> 01:29:07,174

I think I would have to look at some of the people that I've had some of

the most interesting dinners with. And I thought about this a little bit, and I think the

:

01:29:07,174 --> 01:29:09,914

person I would name is Xiao-Li Meng.

:

01:29:10,210 --> 01:29:15,402

So Xiao-Li was the chair when I was hired at Harvard and then later became the Dean.

:

01:29:15,643 --> 01:29:19,184

And uh he is perhaps one of the most fascinating people I've ever met.

:

01:29:19,184 --> 01:29:24,620

He's just full of wit and humor and intelligence and just a kind human being.

:

01:29:24,620 --> 01:29:33,090

It doesn't hurt that he has a uh giant box of scotch uh sitting under his desk, which can

uh come in handy at times.

:

01:29:33,211 --> 01:29:35,432

And so, yeah, I think I would not explore.

:

01:29:35,432 --> 01:29:37,052

I think I would exploit.

:

01:29:37,413 --> 01:29:40,354

And by that, I mean, I haven't had dinner with Xiao-Li in probably

:

01:29:40,354 --> 01:29:44,416

a decade and I would love to sit down with him again.

:

01:29:44,957 --> 01:29:45,898

Nice, yeah.

:

01:29:45,898 --> 01:29:51,071

Well, I really like the structured answer.

:

01:29:51,071 --> 01:29:53,001

I don't think I've ever had that yet.

:

01:29:54,163 --> 01:29:55,823

So thank you so much.

:

01:29:56,604 --> 01:29:57,414

Yeah, awesome.

:

01:29:57,414 --> 01:29:59,286

Well, I think that's...

:

01:29:59,286 --> 01:30:01,807

Let's call it a show, Luke.

:

01:30:01,967 --> 01:30:03,218

That was really amazing.

:

01:30:03,218 --> 01:30:10,030

I'm really happy because we got to explore a lot of the questions I had for you from a...

:

01:30:10,030 --> 01:30:15,746

decision-making perspective but also got to be very nerdy so that's great.

:

01:30:16,347 --> 01:30:17,628

Thank you so much.

:

01:30:17,628 --> 01:30:22,884

As usual, we'll put links in the show notes for those who want to dig deeper.

:

01:30:22,884 --> 01:30:26,998

Thanks again, Luke, for taking the time and being on the show.

:

01:30:27,179 --> 01:30:27,979

Thank you, Alex.

:

01:30:27,979 --> 01:30:28,940

It was a blast.

:

01:30:33,796 --> 01:30:37,499

This has been another episode of Learning Bayesian Statistics.

:

01:30:37,499 --> 01:30:47,988

Be sure to rate, review, and follow the show on your favorite podcatcher, and visit

learnbayesstats.com for more resources about today's topics, as well as access to more

:

01:30:47,988 --> 01:30:52,071

episodes to help you reach a true Bayesian state of mind.

:

01:30:52,071 --> 01:30:54,033

That's learnbayesstats.com.

:

01:30:54,033 --> 01:30:58,877

Our theme music is Good Bayesian by Baba Brinkman, feat. MC Lars and Mega Ran.

:

01:30:58,877 --> 01:31:02,039

Check out his awesome work at bababrinkman.com.

:

01:31:02,039 --> 01:31:03,224

I'm your host,

:

01:31:03,224 --> 01:31:04,204

Alex Andorra.

:

01:31:04,204 --> 01:31:08,424

You can follow me on Twitter at alex underscore andorra, like the country.

:

01:31:08,424 --> 01:31:15,690

You can support the show and unlock exclusive benefits by visiting patreon.com slash

LearnBayesStats.

:

01:31:15,690 --> 01:31:18,071

Thank you so much for listening and for your support.

:

01:31:18,071 --> 01:31:20,382

You're truly a good Bayesian.

:

01:31:20,382 --> 01:31:23,503

Change your predictions after taking information in.

:

01:31:23,503 --> 01:31:30,530

And if you're thinking I'll be less than amazing, let's adjust those expectations.

:

01:31:30,530 --> 01:31:43,688

Let me show you how to be a good Bayesian Change calculations after taking fresh data in

Those predictions that your brain is making Let's get them on a solid foundation
