Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!
Visit our Patreon page to unlock exclusive Bayesian swag ;)
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan, Francesco Madrisotti, Ivy Huang, Gary Clarke, Robert Flannery, Rasmus Hindström, Stefan, Corey Abshire, Mike Loncaric, David McCormick, Ronald Legere, Sergio Dolia, Michael Cao, Yiğit Aşık and Suyog Chandramouli.
Takeaways:
Chapters:
11:58 Transition from Academia to Sports Analytics
20:44 Evolution of Sports Analytics and Data Sources
23:53 Modeling Uncertainty in Decision Making
32:05 The Role of Statistical Models in Player Evaluation
39:20 Generative Models and Bayesian Framework in Sports
46:54 Hacking Bayesian Models for Better Performance
49:55 Understanding Computational Challenges in Bayesian Inference
52:44 Exploring Different Approaches to Model Fitting
56:30 Building a Comprehensive Statistical Toolbox
01:00:37 The Importance of Data Management in Modeling
01:03:21 Iterative Model Validation and Diagnostics
01:06:53 Uncovering Insights from Sports Data
01:16:47 Emerging Trends in Sports Analytics
01:21:30 Future Directions and Personal Aspirations
Links from the show:
Transcript
This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.
uh Today, I am excited to host Luke Bornn, a true pioneer in sports analytics and Bayesian
statistics.
2
:Luke started his career as a statistics professor at Harvard and Simon Fraser before
pivoting almost entirely to sports analytics over a decade ago.
3
:Luke has worked across multiple sports roles including quantitative gambling, analytics
leadership at A.S.
4
:Roma and the Sacramento Kings and co-founding
5
:Zelus Analytics, which grew to a 75 plus person company before being acquired
by Teamworks.
6
:He was also part of the ownership group at Toulouse FC, where he's applied data-driven
decision-making to build a competitive club on a budget, winning both Ligue 2 and the
7
:Coupe de France.
8
:In this episode, Luke takes us through the evolution of player tracking data and the
application of Bayesian methods in decision-making under uncertainty.
9
:We dive into portfolio optimization for player acquisition, the challenges of model
validation, and the role of generative models in forecasting performance.
10
:Whether you're into sports analytics or just fascinated by decision-making in high-stakes
environments, this episode is packed with insight from one of the best in the field.
11
:This is Learning Bayesian Statistics, episode 131, recorded December 10, 2024.
12
:Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods,
the projects, and the people who make it possible.
13
:I'm your host, Alex Andorra.
14
:You can follow me on Twitter at alex_andorra,
15
:like the country.
16
:For any info about the show, learnbayesstats.com is la place to be.
17
:Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on
Patreon, everything is in there.
18
:That's learnbayesstats.com.
19
:If you're interested in one-on-one mentorship, online courses, or statistical consulting,
feel free to reach out and book a call at topmate.io/alex_andorra.
20
:See you around, folks.
21
:and best Bayesian wishes to you all.
22
:And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can
help bring them to life.
23
:Check us out at pymc-labs.com.
24
:Luke Bornn, welcome to Learning Bayesian Statistics.
25
:Yeah, thanks for having me on.
26
:Yeah, thank you so much for taking the time.
27
:This is a blast to have you here.
28
:uh I never thought I would be able to uh interview you on the show, to be honest.
29
:uh You're kind of a uh superstar to me in my world, you know, so I'm very happy to have
you here.
30
:And thank you so much to Patrick Ward for...
31
:putting us in contact.
32
:Patrick, if you're listening, thank you so much.
33
:When you come to Miami, I definitely owe you a good dinner.
34
:You know, anything Patrick asks, I do.
35
:So as prep for this, I went back and looked at all your previous episodes, listened to a
bunch of them, and it was
36
:really cool to me to see that your podcast in some sense resembles my career. Like, in the
recent episodes you have a lot of people that I call colleagues and friends now, Robbie and
37
:Patrick and ah Paul Sabin, and then I look farther back and it's like,
38
:Julien Cornebise and I worked together on Monte Carlo, when he was doing his
postdoc at UBC.
39
:We actually wrote a short paper together.
40
:Andrew Gelman and I shared an office when I started at Harvard.
41
:Nicolas Chopin and I have had many dinners together over the years.
42
:Kevin Murphy, I know well from my time at UBC.
43
:So it was really cool. It really blends the sort of two phases of my career, first as an
academic and now sort of in sports.
44
:Oh yeah, okay, that's good to know.
45
:Yeah, for sure.
46
:So I didn't know you knew um Nicolas and Julien so well.
47
:um I remember Andrew Gelman, um which has been really cool.
48
:honestly, working in the same hallway as Andrew because each time I get to see him, it's
really cool and I get a lot of, uh you know, it's just awesome because he always has so
49
:many stories and I can ask him any questions I have.
50
:So working with him.
51
:must be incredible.
52
:Yeah, and we shared an office my first semester at Harvard.
53
:I was actually on sabbatical kind of a parental leave because I just had a kid.
54
:And I was going back and forth between Vancouver and Boston, as we were in the process
of moving.
55
:And so yeah, I shared an office with him.
56
:It was a really cool experience.
57
:Yeah, I mean, I guess.
58
:And actually, something Patrick told me while preparing for the episode, he told me, you
can ask Luke anything about sports.
59
:He knows a lot about that, for instance.
60
:But don't ask him anything about guitar playing, because I'm way better than him,
something like that.
61
:Do you know what he's referring to?
62
:Yeah, so I used to play guitar quite a lot.
63
:For those watching this on video, we'll see the guitars in the background.
64
:um
65
:But Patrick is actually a properly trained musician.
66
:Actually, I have an interesting story about Patrick and guitar, which is, so I was at the
Sloan Sports Analytics Conference, this is probably five, six years ago, and we were upstairs
67
:at Trillium Brewing, which is, you know, a nice restaurant brew pub near the
conference, and I was sitting at the corner of the bar, and on one side of me was Patrick,
68
:who works for the Seattle Seahawks, I think he was on your podcast a couple months
ago or something, and Sunny Mehta was on the other side
69
:of me and he's an assistant general manager for the Florida Panthers, incredibly
interesting guy, just two incredible guys on both sides of me.
70
:And at some point, late into the evening, it occurred to me that these guys had just met
and they had this incredible unknown connection, which is Patrick actually studied jazz guitar
71
:at Berklee in Boston and so formally studied jazz guitar.
72
:Sunny Mehta was actually a professional jazz guitarist in New Orleans for years.
73
:In addition to being a
74
:professional poker player, all sorts of other cool stuff.
75
:ah And so I said to these guys, I said, whoa, like you guys are both like leaders in sports
analytics, but you have this really bizarre connection that neither of you realizes, and I
76
:do.
77
:And see how long it takes you to figure this out.
78
:So they started on like, you know, academic stuff.
79
:It took them probably a good 20, 30 minutes before they realized.
80
:then I couldn't keep them apart for the next like four hours as they just, you know,
compared teachers and the musicians
81
:they played with and so on.
82
:you know, there's a long history of course of math and music being tightly tied together, but
that was a cool example of two of my favorite people, Sunny and Patrick, sort of like coming
83
:together on a musical interest.
84
:Yeah this is really cool.
85
:For those watching the video it's the first time that happens.
86
:I'm in a new flat now and I have a picture behind me.
87
:uh
88
:Like there is a face on that picture, and basically the iPhone is freaking out because it's
seeing two faces, and so it doesn't know if it should focus on myself or the painting.
89
:It's literally going back and forth between the weird painting and you.
90
:So I think I should take that personally, right?
91
:Because I look nothing like the painting and the painting is pretty awful so it's like I'm
taking it quite personally.
92
:I'm selling my iPhone after the episode.
93
:I'm going to get rid of that painting actually while you're talking, which is way too
distracting for people.
94
:So anyways, yeah, great, great story.
95
:I love that.
96
:And yeah, as you were saying, yeah, of course, deep history of music being actually very
mathematical and mathematics also being very creative and artistic in a way.
97
:We've talked about that already actually on the show.
98
:Yeah, at Zelus, which we'll talk about in a bit,
99
:there's a very active music culture, a lot of people who play instruments and are hooked
on music, one of them being uh Daniel Lee, who you've also had on the podcast.
100
:Really cool guy, but just really super passionate about music as well, and I think it's
pretty common connection.
101
:Yeah, yeah, yeah.
102
:So as you saw, I could not get rid of the painting, it's like really stuck into the wall.
103
:So we'll have to deal with that.
104
:That's going to be a collector episode, I encourage you to check the YouTube video of that
one.
105
:uh Yeah, I mean, Daniel is actually incredible.
106
:Each time I come to New York and I get to meet him, he always gives me...
107
:tons of recommendations, you know, of stuff to do in New York, very artistic.
108
:He's also very well versed in architecture, which is something he and I have in
common.
109
:yeah, he's like, he knows, he knows architecture really well.
110
:And so that's really cool because he always gives me amazing things to go see in New York.
111
:So yeah, thank you.
112
:Thank you so much, Daniel, for enlightening me all the time.
113
:Did you get a private concert when you were at the Sloan Conference, from Patrick and that
other name I
114
:don't remember.
115
:From Sonny, no.
116
:Patrick, though, in the past has sent me videos of himself playing guitar, and he's a much
better guitarist than I am, so I have never reciprocated.
117
:What's a talent of yours, actually, you know, like, something... I mean, you're
obviously talented in your job and everything you do related, but like a non-professional
118
:talent that you could send a video of to Patrick?
119
:Man, it's a good question.
120
:Yeah, and I know I'm putting you on the spot here, because it's the first
time I ask these questions, and that was not in the questions I sent you.
121
:I'm trying to get a sort of obscure talent, you know...
122
:These days, my whole life is job and family.
123
:So I'm extremely good at losing to my kids at Mario Kart and horrendously bad at putting
together IKEA furniture.
124
:So maybe it's like good at being bad at child-related things.
125
:So yeah.
126
:I mean, I don't know if you're already at the part where you're losing on...
127
you're not losing anymore on purpose at Mario Kart against your kids.
128
:I am at that point, my kids are a little older now, they're eight, 10 and 12.
129
:And so certainly the older two are, you know, into Super Smash Bros.
130
:I have no chance.
131
:I just mash buttons and hope I get lucky.
132
:So yeah, I'm well past that point.
133
:They're definitely better gamers than me.
134
:Yeah.
135
:I mean, they probably don't listen to the show.
136
:So if you tell them you do it on purpose, I think you can still do it for a few years, you know.
137
:I lost on purpose, that one.
138
:Exactly.
139
:I made you win.
140
:uh Actually, yeah, so can you tell us what you're doing nowadays?
141
:Because you do so many things, uh very original.
142
:Yeah, what do you tell people when they're asking you what you're doing nowadays?
143
:And also how you ended up doing what you do and working on this?
144
:Yeah, I tend to give people, it depends on how long of a conversation I want to have with
someone.
145
:If I want to sort of end it quickly, that's what I do.
146
:I say I'm a statistician and it ends the conversation right there.
147
:For longer conversations, you know, I'm really fortunate that I have sort of a one word
description of what I do, which is Moneyball.
148
:So certainly people outside of the field ask, what do you do?
149
:Yeah, you know, I'm the moneyball guy and that's sort of a really easy way to sort of
describe what I do.
150
:But yeah, if I give sort of a longer background,
151
:I did a PhD in stats and machine learning under Arnaud Doucet and Jim Zidek at UBC.
152
:Then I went on to spend some time on the tenure track at Harvard and then Simon Fraser.
153
:through that time I sort of made a transition into sports, and we can talk about that
transition later if you want but sort of pivoted over to sports analytics and since then
154
:have worked for a bunch of NBA teams, spent a few years with the Sacramento Kings, some
consulting gigs with others, spent a little over a year with AS Roma in Italy, a gig in
155
:quantitative gambling.
156
:And then the last four or five years with some partners, we banded together alongside some
investors and we bought Toulouse FC in the South of France and another club more recently.
157
:And alongside that, I co-founded a company called Zelus Analytics, which uh we built up
to about 75
158
:employees, and we actually sold the, sorry, we sold Zelus to Teamworks back
in the summer.
159
:sort of just came full circle on that.
160
:So now my time is spent.
161
:I continue to work a little bit with Teamworks with the sort of old Zelus crew and
continuing to be part of the operations of Toulouse.
162
:Hmm, yeah.
163
:Okay, so Toulouse and not Milan.
164
:Not terribly involved in Milan at the moment, that's right.
165
:Yeah.
166
:Yeah, because you still cannot be involved in both at the same time, right?
167
:Because of the...
168
:Yeah, it's complicated.
169
:It's complicated with UEFA.
170
:Yeah.
171
:Yeah.
172
:Okay.
173
:Yeah, that's just a great...
174
:Just such a great path.
175
:yeah, we'll definitely dive into these latest activities that you were talking about,
because I'm really interested in how...
176
:are the kind of models we're going to talk about, and that we talk about on this show all the
time, and that I do personally in my everyday life, uh how they're used, how they're consumed by
177
:um the people I make the models for, you know, and that's something we have...
178
:I get asked to do interviews and podcasts quite regularly, but it's almost always in
sports and I turn it down.
179
:This was like, I get a chance to sort of like be nerdy again and talk about things that,
you know, like technical things that I'm super interested in.
180
:So, yeah, cool.
181
:So then thank you.
182
:Thank you so much.
183
:That's an honor.
184
:Yeah.
185
:And I think also if people are interested in a bit more about your background and stuff
like that, I would refer to the
186
:I think it's the Wharton Moneyball podcast, where you were a few months ago. I listened to
that one to prepare for the show, so I'm not gonna ask you the same questions. uh This
187
:interview is really great because you go into detail on how you ended up doing what you did,
your time at Sacramento, your time at AS Roma, and it's a great interview, so
188
:Yeah, I put that in the show notes for people who want to get a bit more
detail on your background.
189
:Today, uh let's talk about Bayesian stats a bit, you know, like how were you introduced to
this weird world of Bayes?
190
:When I got married and my brothers gave a speech, part of the speech was...
191
:they said he's a proud Bayesian and I can't remember, I just remember them like
pronouncing the word Bayesian like really strangely.
192
:So it's part of my identity when it's part of a wedding speech.
193
:But you know, I did my PhD at UBC and at the time I was there,
194
:It was like this incredible hotbed of people in this domain.
195
:So, you know, I did my PhD with Arnaud Doucet.
196
:And actually at the time there was like this really, it was him, Jim Zidek, another
co-supervisor of mine, Raphael Gottardo, Paul Gustafson, Nando de Freitas, Kevin
197
:Murphy, you know, and many others.
198
:So these people at the time, it's kind of all I knew.
199
:So I didn't kind of realize how special that situation was, but it was sort of,
200
:you know, got to spend essentially six years of my life surrounded by those people and
kind of deeply ingrained in me a sort of Bayesian way about thinking about the world.
201
:oh yes, so that was, yeah, that was quite vast basically, and that's kind of the way you
learned statistics, if I understand correctly.
202
:Yeah, you know, I did an undergrad in mathematics and then my master's and PhD were both
in stats and
203
:I sort of, my PhD thesis was sort of divided between sort of spatial stats and Monte Carlo
methods.
204
:So you sort of put those two things together, when you think about sort of Bayesian spatial
statistics combined with Monte Carlo, it sort of covered the whole gamut of sort of
205
:Bayesian modeling techniques.
206
:And of course, with Kevin Murphy there and Nando de Freitas, just like also a lot of exposure
to, you know, what Kevin might call like probabilistic machine learning or statistical
207
:machine learning kind of thing.
208
:yeah yeah, okay, yeah, so that's funny because it's a bit like myself also, where I didn't
have to unlearn, you know, all the stuff I had learned very confusingly in undergrad, like I
209
:basically learned Bayesian stats from the start, um
210
:which kind of makes it simpler on the teacher, I would say. As a teacher now, like it's way easier
to teach people like us than to teach people who learned a lot of frequentist stats and
211
:now are trying to switch because I mean, naturally they always try to come back to
something that's familiar and sometimes it's very different.
212
:So you're like, okay, try to, you know, try to forget that, that paradigm.
213
:That's hard.
214
:UBC was interesting because you had sort of this Bayesian cohort, but then you also had
people working on like robust estimators, like M-estimators and tau-estimators, all
215
:these kind of very frequentist ideas.
216
:So it was kind of being exposed to both worlds.
217
:I think it was actually quite useful because it helps you really understand the sort of
philosophical differences as well as the, you know, the pros and cons of different ways of
218
:thinking about things.
219
:Yeah.
220
:yeah.
221
:I think that's definitely super interesting and super worth it.
222
:And, as I always
223
:say for people interested in kind of the epistemological kind of side of things.
224
:The two best episodes to start with that that we have are episode 50 and 51.
225
:50 is with David Spiegelhalter, the only Sir we've had on the podcast for now.
226
:Maybe we'll get Alex Ferguson, you know, one day.
227
:And 51 with Aubrey Clayton, the author of
228
:the book Bernoulli's Fallacy.
229
:If you want to start with some epistemological topics, I would definitely recommend these
ones. So yeah, just go on the website and look for those.
230
:But yeah, I think, in my experience, I only give that to the students when they ask for
it.
231
:Usually the way that works for people is they are mainly interested in, you
know, in the practical side of things.
232
:Um, and unless they are very nerdy like me, they're like, okay, but why is that actually
interesting?
233
:And why is that working better in these cases?
234
:You know, and interested in really the why, then I'll give that to them very happily,
because I love that.
235
:But, uh, in my experience, and especially also for my bosses or clients, you know, I'm like,
just...
236
:I just show the model and show why that's interesting and why that would help their
decision making.
237
:uh And actually you personally, as we've seen, you've been involved in sports analytics
for years.
238
:So I'm curious about so many things.
239
:First, what's your vision on how the field has evolved, ah especially with the rise of
different sources of data?
240
:in different sports.
241
:And another question I have for you as a decision maker is how do you approach modeling
uncertainty in your decision making, uh whether that's with Toulouse or, uh well, Milan
242
:less now, the different things and projects you're working on, and whether we're talking
about evaluating players or planning strategies.
243
:Yeah, so the first thing is about sports analytics.
244
:So when I was sort of getting into sports analytics, at the time it was the early days of
player tracking data.
245
:So prior to that, the vast majority of the data up to that point had been event data, hand
tracked, maybe a couple hundred or a couple thousand events per game sort of true across
246
:sports.
247
:Now baseball may be the exception there, but for most sports it's sort of very simple
data, count data, that kind of thing.
248
:And about 2012, I was really fortunate.
249
:One of the first people I met when I started at Harvard was...
250
:a guy named Kirk Goldsberry, and he had just been handed the NBA's player tracking data.
251
:And this data captures, multiple times per second, the location on the court of every player
and the ball in three dimensions.
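To make that data concrete, here is a minimal sketch of what one frame of such tracking data might look like. The field names and the 25 Hz frame rate are illustrative assumptions, not the actual NBA feed's schema:

```python
from dataclasses import dataclass

# Hypothetical schema for one frame of player tracking data; the real
# feed differs in naming and detail, this only illustrates the structure.
@dataclass
class Frame:
    game_clock: float                         # seconds remaining in period
    ball_xyz: tuple[float, float, float]      # ball position, incl. height
    players: dict[str, tuple[float, float]]   # player id -> (x, y) on court

# At roughly 25 frames per second, a 48-minute game yields on the order of
# 25 * 48 * 60 = 72,000 frames, each with 10 player positions and the ball.
frames_per_game = 25 * 48 * 60
print(frames_per_game)  # 72000
```

The sheer volume and the spatial-temporal structure of frames like this is what separates it from the hand-tracked event data Luke mentions above.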
252
:And so to me, when I looked at this, I was like, this is, like, I'm actually not that big of a
sports fan, but to me it was the richest space-time data I'd ever seen.
253
:And the amount of structure that exists in this data because of the sport itself, both the
rules as well as the sort of tactics and strategy.
254
:It was just like the most fascinating and challenging problem I think I had
ever come across.
255
:So for me, sports was not, uh my path into sports was not, I want to work in sports or
sports is really cool.
256
:It's like this is really interesting and hard and unsolved.
257
:Keep in mind my PhD was essentially on modeling things in space-time along with SMC
methods and so on to handle these things computationally efficiently.
258
:And so I had the right toolkit to handle this type of data.
259
:So yeah, that's kind of how I made the transition from academia to sports is fundamentally
with these new data sets that are really complex and spatially and temporally rich, uh
260
:having the right skill set and just realizing actually there's lots of interesting
problems.
261
:After a while I actually consulted in sports while continuing my academic gigs, and just found
that I enjoyed the sports stuff more.
262
:The problems were more interesting.
263
:I never enjoyed going to committee meetings and reviewing papers and revising papers.
264
:oh The peer review process in academia is fundamentally broken.
265
:uh
266
:So it was an easy decision when I decided I'm gonna go full time into sports.
267
:So anyway, that's the sort of sports piece, how I sort of pivoted to sports.
268
:The second part of the question, which is like handling uncertainty in decision making.
269
:I really think about that as the end user.
270
:We wanna try and construct a team
271
:that's gonna outperform its budget.
272
:So let me use Toulouse as an example.
273
:So Toulouse has a payroll of about, you know, 15, 18 million a year.
274
:That's euros.
275
:our goal with that- Which is, so to give perspective to people who don't know, you know,
sports.
276
:That's not a lot.
277
:Right.
278
:If you compare to the... We are the lowest or second lowest in the league.
279
:And our goal is to have...
280
:they are like way bigger.
281
:Like they have way bigger payrolls than you do, the big clubs.
282
:You might know a big club in France that has an insanely inflated payroll.
283
:There's a few of them.
284
:It starts with a P and ends with a G.
285
:And so, ah you know, just to make the math easy, we're spending 15 a year.
286
:We want to perform like a mid-tier club, like a sort of an average club in the league,
which is typically spending 40 or 50 million.
287
:So the way I think about us is like, how do we spend 15 million a year?
288
:So that's like the lowest or second lowest budget, and perform like the 10th or 11th best
team, which is sort of 40, 50 million.
289
:And to do that, we sort of have to have distributional sort of
290
:assumptions around the value that each individual player creates.
291
:And essentially it becomes like a portfolio optimization problem where you have a set of
players which sort of act like individual uh assets in a way.
292
:And you're trying to sort of find the combination of players that you can have some with
some reasonable confidence that you're going to be able to perform to the level that you
293
:need.
294
:Certainly at the very least avoiding relegation, sort of assembling these things in a way
where, you know, you're trying to sort of de-correlate these assets, the way these players
295
:perform.
296
:and so on, sort of such that you minimize your chance of relegation and maximize your
chance of performing at the level you want to perform at.
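The squad-as-portfolio idea Luke describes can be sketched numerically. This is a toy Monte Carlo illustration in which every number (player means, uncertainties, correlations, baseline, and relegation threshold) is invented for the example, not anything from Toulouse's actual models:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical squad of five key players: expected points added over a
# season, with per-player uncertainty. All numbers are made up.
mu = np.array([8.0, 6.5, 5.0, 4.0, 3.5])   # expected points each adds
sd = np.array([3.0, 2.5, 2.0, 2.0, 1.5])   # uncertainty per player

# Correlation between player outcomes; de-correlating the "assets"
# means pushing these off-diagonal entries toward zero.
corr = np.full((5, 5), 0.3)
np.fill_diagonal(corr, 1.0)
cov = np.outer(sd, sd) * corr

baseline = 25.0          # points a replacement-level squad would earn
relegation_line = 34.0   # points needed to stay up (illustrative)

# Simulate many seasons and read off the tail risk.
sims = rng.multivariate_normal(mu, cov, size=100_000).sum(axis=1) + baseline
p_relegated = (sims < relegation_line).mean()
print(f"P(relegation) ~ {p_relegated:.3f}")
```

Lowering the off-diagonal correlations shrinks the variance of the team total, which is exactly the de-correlation benefit described above: the same expected points with a thinner relegation tail.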
297
:Yeah, yeah, and that's funny because that's also a lot like how I personally, you know, try and
explain the issue to, I mean...
298
:what I do to people, and why actually it's interesting for clubs to model with, you
know, the kind of models we do, um because often people don't know at all what's going on,
299
:you know, in the front office. It's like, especially in Europe... In the US, I think, you know,
the influence of Moneyball, as you were saying, um not only the book but also the movie, is
300
:much more prevalent, but in Europe,
301
:especially soccer, especially continental Europe, it's like, really?
302
:People do that.
303
:And so, yeah, like trying to explain it to them in a way, as you were saying, as portfolio
management. Or I'm saying, well, you know, when you have less, remember when you
304
:were a student and you were broke, you had to be very careful about any dime that was
leaving your account.
305
:So that's basically why also for the clubs which have way less budget, it's actually much
more interesting to
306
:do this kind of modeling, because they have to be much more careful about what they spend,
whereas for PSG or Real Madrid, it's not that big of a deal, at least at the beginning.
307
:uh
308
:With such a high budget relative to the rest of the league, you can actually be fairly
inefficient with it and still be quite successful, right?
309
:um I think they are.
310
I think that's a good description of the club.
311
:Yeah, and to boil it down to a really simple example, let's say you want to acquire a
player, you want to fill, say you're going for a striker, you have player A and player B.
312
:The traditional approach would say, okay, let's say player A costs one million a year,
player B costs two million dollars a year.
313
:And traditionally the scouts would
314
:Well, we like player B better, so let's get him. But they say, hold on, one guy costs two
million a year, that one costs one million a year, one of them's older, one's younger, all these
315
:different factors, how do we choose one? And they have to sort of either kind of compare
apples to oranges.
316
:They have these sort of financial traits, and then they have the scout saying, I like this
guy better, he's got a good left foot and he has good spatial awareness. And
317
:he says, well, how do I turn that into dollars?
318
:Like how do I decide whether I'm gonna spend two million or one million on player B or
player A?
319
:Whereas statistical methods come along and you can sort of very precisely say, hey,
320
:there's a 10% chance that this player performs above his contract, there's a 30% chance
that this other guy performs above his contract.
321
:So you can actually be much more explicit and probabilistic about the value a player
brings, and most importantly, what they bring relative to their contract.
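As a toy illustration of that kind of probabilistic statement, the sketch below uses invented samples standing in for a fitted model's posterior over each player's yearly value; the salaries mirror the 1M vs. 2M example above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented posterior samples of each player's yearly value, in millions;
# in practice these would come from a fitted model, not a normal guess.
value_a = rng.normal(loc=0.8, scale=0.5, size=50_000)  # the 1M/year player
value_b = rng.normal(loc=1.7, scale=0.9, size=50_000)  # the 2M/year player

cost_a, cost_b = 1.0, 2.0  # yearly salaries, in millions

# Probability each player performs above his contract.
p_a = (value_a > cost_a).mean()
p_b = (value_b > cost_b).mean()

# Expected surplus (value minus cost) as another decision input.
surplus_a = (value_a - cost_a).mean()
surplus_b = (value_b - cost_b).mean()
print(f"P(A outperforms contract) ~ {p_a:.2f}, expected surplus {surplus_a:+.2f}M")
print(f"P(B outperforms contract) ~ {p_b:.2f}, expected surplus {surplus_b:+.2f}M")
```

The point is that "10% chance vs. 30% chance of outperforming his contract" is a single line of arithmetic once you have posterior samples, which is exactly what makes the comparison explicit rather than gut-feel.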
322
:Yeah, yeah.
323
:And so I'm curious how...
324
:how does that work for you?
325
:I'm curious as the decision maker, as one of the decision makers, how do you actually use
the models to make decisions?
326
:And I think for me, you would be one of the best bosses I could have, right?
327
:Because you come from the modeling side.
328
:So that would make my work, of course, way easier because I could talk to you in technical
terms, of course.
329
:But I'm very interested in how you personally consume the models.
330
:What is actually useful
331
:For you, you know, is it the posterior distributions?
332
:Or is it that you are more interested in the comparisons?
333
:You're more interested in the tail probabilities?
334
:What's actually something you care about when you make the decisions?
335
:Yeah, I think the most useful way to think about
336
:it, like for the end user, is really sort of a distribution on future performance value.
337
:And what I mean by that is you can imagine saying, we think next year this player's
performance on our team, ideally in dollars, would be, say, you know, some
338
:distribution over the value they will add in dollars.
339
:So if you look at basketball, it's like, you know, maybe, maybe LeBron James is worth 40
million a year. And we might say, hey, we think that there's a 20 % chance that
340
:he's, you know, he's thinking some distribution.
341
:There's a 20 % chance that
342
:he's worth 60 million or more.
343
:And there's a 10 % chance, because of injury or whatnot, that he's actually worth 5
million or less.
344
:So understanding that distribution and then seeing how it progresses through time.
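A sketch of what such a value distribution might look like, using a simple two-component injury mixture; every number here is invented purely to reproduce statements of the LeBron-style form above:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Invented model of a star player's yearly value, in millions: with some
# probability a serious injury wipes out most of the value; otherwise the
# value is spread around a healthy-season mean.
p_injury = 0.12
injured = rng.random(n) < p_injury
healthy_value = rng.normal(loc=45.0, scale=12.0, size=n)
injured_value = rng.normal(loc=4.0, scale=3.0, size=n)
value = np.where(injured, injured_value, healthy_value)

# The kinds of statements a decision maker actually consumes:
print(f"P(worth 60M or more) ~ {(value >= 60).mean():.2f}")
print(f"P(worth 5M or less)  ~ {(value <= 5).mean():.2f}")
print(f"median value ~ {np.median(value):.1f}M")
```

Tracking how a distribution like this shifts season over season (mean drifting down, injury weight drifting up) is what feeds the sell, sign, or extend decisions Luke describes next.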
345
:That's the key information you need to make a decision about whether to, you know,
sell the player or sign that player or to extend their contract or
346
:renegotiate the contract.
347
:So that's sort of the key bits of information.
348
:And ultimately we want to use that information as much as possible.
349
I sort of think about... A lot of what I do now is actually just removing humans from decision making. There's so much work now in behavioral economics and elsewhere showing that humans are fundamentally quite biased in the way they go about decision making. So a lot of what I do is focusing on: how do we remove that bias from the decision-making process? How do we take the good bits of subjective information, and strip out the biases, in a way that ultimately lets you create some arbitrage opportunity in player value, and ultimately create shareholder value?
Okay, I see. If I try to summarize: something like a WAR metric is how you summarize the contribution of the player, but that's still in terms of goals or wins or whatever. Then you transform that contribution into dollars, and you make the decision based on that, probably a decision that's helped a lot by computer assistance, to try and limit the biases that are very well documented in how humans make decisions.
Exactly. One way to think about it: if every single player on your team is underpaid, you are going to have a very good team, right? Assuming you have a certain payroll. Conversely, it's easier to think about it the other way: if you have a fixed, low payroll and every single player on your team is overpaid, you're going to be very, very bad. So the goal is essentially getting as much performance value as you can for the dollars you spend. And history has shown that statistical models and probabilistic reasoning are a much better way to go about that process than the traditional gut-feel scouting approach.
Yeah, that's really fascinating, to see that research pan out in real clubs like you're doing. That's why I'm really fascinated by that kind of work. And I think these are great organizations to work in, at least as modellers, because that's also where your work has more impact, I would say.
Yeah, in sports you just get a tremendous amount of super interesting modeling problems. You talk about how to use Bayesian methods and so on, but fitting the model is not the interesting part. Even defining the model is not necessarily the interesting part. It's all the issues that come up around it that become super interesting. If you've defined your estimand slightly incorrectly, you can end up with wildly wrong conclusions, conclusions that can cost you millions of dollars.

There are probably lots of examples we could get into, but think about what you're doing in traditional validation of models. You say, okay, we fit this model, and then we say, hey, this thing predicts well by some metric, RMSE or log loss or whatever it might be. We say, hey, this thing works well. But that's a summary across all of the data. Now think about what you're doing in practice when you're running a sports team. You're saying: I want the players who have distinctly high surplus value to add to my team. Ultimately, you're grabbing players from the edges of the prediction space. If you think about the joint space of performance and dollars, the players you want are at that threshold where they're cheap and they're good. So you don't care about the bulk of the predictions from the model; what you care about is the predictions in the space where you're actually actioning: the players on your team, or players you're acquiring. And of course that extends to ideas of strategy as well, unique strategies and so on. So you care about pathological behavior, and you really care about edge cases. A lot of the time in academic settings it's, hey, if this model performs well, great. Here you're saying: I'm actually fine if the model is slightly worse in aggregate, as long as it performs better in the part of the covariate space, or the prediction space, that we care about. So there are lots of little issues like that which come up in sports, and they make the modeling part of it much more interesting.
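The point about evaluating in the actionable region rather than in aggregate can be illustrated with synthetic data. Everything below is invented for illustration: model A wins on aggregate RMSE, while model B wins in the "cheap players" region (here, x < 2) where decisions actually get made:

```python
import math
import random

random.seed(1)

def rmse(pairs):
    return math.sqrt(sum((y - p) ** 2 for y, p in pairs) / len(pairs))

# Synthetic data: x is a covariate (say, cost) and y is true player value.
xs = [random.uniform(0, 10) for _ in range(5_000)]
ys = [2.0 * x + random.gauss(0, 1) for x in xs]

# Two hypothetical models: A has a small bias everywhere, B is unbiased in the
# low-cost region we actually shop in, at the price of a larger bias elsewhere.
pred_a = [2.0 * x + 0.4 for x in xs]
pred_b = [2.0 * x + (0.0 if x < 2 else 0.7) for x in xs]

all_a = rmse(list(zip(ys, pred_a)))
all_b = rmse(list(zip(ys, pred_b)))
region = [i for i, x in enumerate(xs) if x < 2]
reg_a = rmse([(ys[i], pred_a[i]) for i in region])
reg_b = rmse([(ys[i], pred_b[i]) for i in region])

print(f"aggregate RMSE   A={all_a:.3f}  B={all_b:.3f}")
print(f"actionable RMSE  A={reg_a:.3f}  B={reg_b:.3f}")
```

Picking A because its aggregate RMSE is lower is exactly the academic-style validation he warns about; for this decision problem, B is the better model.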
I'll give you another example. This is so Bayesian, and yet un-Bayesian; see if we can wrap our heads around this. There's this idea in sports called the Messi test. You've probably heard this phrase. Maybe it's the Trout test in baseball, or something. The idea is that if you build a model for overall player skill, and this is from five years ago or so, if Messi is not at the top, or close to the top, your model probably sucks. It's the eye test: if you build a model for best player, you want Messi at the top. And you could imagine doing the same for individual skills. In basketball, if I built a model for the best three-point shooter and I didn't have Steph Curry right near the top, I should probably be concerned about my model. And that's actually really insightful information.

But think about what that is. Unless you have some latent space of player skills, typically what you're doing is having some model that weights different parameters, and then you calculate it for each individual player to see where they land in the ranking. What you're actually doing is bringing in prior information about which players are good. But when you show those rankings, that's essentially the posterior predictive of the model: the predictions from the posterior. And you're saying: I have information about what that posterior predictive should look like. So you're putting a prior on the posterior predictive, which any traditional Bayesian would call heresy. But there's something there, and it's actually valuable.
So there are all these super interesting things that come up in sports, where you have to think really deeply about the problem you're trying to solve, how your model dovetails into that problem, and really work on those issues.
I'll give you one more example, and I could come up with more if we kept going, where the modeling just becomes super fascinating. One thing I've seen quite a bit is that there are a lot of models that capture overall player value, and they do it in the same units. In basketball, for example, it might be points per hundred possessions. So you could take a whole bunch of grad students, or have your whole analyst team go out, and say: you're all going to build a model for player performance in points per hundred possessions. Then you say, okay, now we're going to weight these models to create an overall model, sort of an ensembling or model-averaging idea. And you might say: because these things are all on the same scale, let's make sure the weights sum to one, say a Dirichlet prior on the simplex. But that's actually not necessarily the right thing to do, because each of these models in and of itself typically has some shrinkage built into it. So it's very possible that, once averaged, there's actually more information in the aggregate model than in the individual models, and there's no reason you should impose that constraint. In fact, in this case it's not even obvious to me that you should enforce positivity on the weights. So if you just throw the obvious solution at some of these problems, you often end up with the wrong solution. You need to deeply understand the models you're building, as well as the data you have, as well as the underlying sport. And that's where I think it becomes super, super interesting.
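The shrinkage argument against sum-to-one averaging can be demonstrated with a toy example (all numbers invented): two component models, each shrunk toward zero the way regularized models typically are, combine best with weights that sum to well over one, which a simplex constraint would forbid:

```python
import random

random.seed(2)

# True player signal and two hypothetical component models, each shrunk
# toward zero plus independent noise.
n = 20_000
truth = [random.gauss(0, 1) for _ in range(n)]
m1 = [0.5 * t + random.gauss(0, 0.3) for t in truth]  # heavily shrunk
m2 = [0.6 * t + random.gauss(0, 0.3) for t in truth]  # heavily shrunk

# Unconstrained least-squares weights via the 2x2 normal equations.
s11 = sum(a * a for a in m1)
s12 = sum(a * b for a, b in zip(m1, m2))
s22 = sum(b * b for b in m2)
t1 = sum(a * y for a, y in zip(m1, truth))
t2 = sum(b * y for b, y in zip(m2, truth))
det = s11 * s22 - s12 * s12
w1 = (s22 * t1 - s12 * t2) / det
w2 = (s11 * t2 - s12 * t1) / det

print(f"optimal weights: w1={w1:.2f}, w2={w2:.2f}, sum={w1 + w2:.2f}")
```

The optimal combination partially undoes the components' shrinkage, which is exactly the behavior a sum-to-one (or positivity) constraint would rule out.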
Hmm, yeah, indeed, that's fascinating. And what's really interesting to me is that this hints a bit at how useful these models and the Bayesian framework are in your work, at Teamworks in particular. So I'm curious if you can tell us a bit more about how you leverage these methods in your work at Teamworks, or for Toulouse, and, maybe to come back to what you started doing in your career, how they help account for uncertainty and for thinking about the generative graph of the model, in particular with spatio-temporal data.
Yeah, I think the word "generative" that you used is really useful. That's how I think about modeling problems. I very much think: what is the generative model here? This may be a bit oversimplistic, but I think of the generative model as the way of specifying the model, and Bayes as the way of working backwards once you're given the data. That Bayesian way of thinking, where you're pooling information, maybe across players, or across matches, or across seasons, or across space, is all incredibly important in sports, where you have what might look like a lot of data, but there's so much contextual variation, combined with a tremendous number of players. Our database in soccer is about 80,000 players. If you think about fitting even an adjusted plus-minus model, which is essentially a giant regression model, it's really a large-P, small-N type of problem. But of course there's a tremendous amount of structure there. You have a lot of insight based on the league they play in, the positions they play, how good the team is, as well as the individual event information from the games. You get all this information that comes from the structure of the game, and you want to use it to get better estimates. So when you talk about Bayesian models, that's where they come into play.

Now, I should say that oftentimes what we end up doing is finding a balance. In my head, maybe I've specified the perfect model for this problem, but then it's too slow to fit because of the size of the data; tracking data especially can be absolutely massive. So it's: how do we get a poor man's Bayes? Can we get 90% of the value of this model for 5% or 10% of the computational cost? A lot of those types of conversations go on.
And then the other thing that happens is, again, you fit the model and, back to the earlier conversation, you look at those edge cases, the perimeter of the covariate space, and there's some bizarre behavior: the model just doesn't look right, for various reasons, in areas that we ultimately care about. So then you have to think: okay, how do we solve this? Do you up-weight that area of covariate space in the likelihood? Do you do ad hoc corrections? Do you redefine the model? There are all these interesting things that arise.
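One of the fixes he lists, up-weighting a region of covariate space in the likelihood, can be sketched with weighted least squares (for a Gaussian likelihood, weighting observations is exactly re-weighting likelihood terms; the curvature and the x < 2 threshold below are invented):

```python
import random

random.seed(3)

# Toy mis-specified setting: the truth is mildly curved, but we fit a line.
xs = [random.uniform(0, 10) for _ in range(4_000)]
ys = [x + 0.05 * x * x + random.gauss(0, 0.5) for x in xs]

def fit_line(xs, ys, ws):
    """Weighted least squares for y ~ a + b*x (closed form)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    b = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys)) / \
        sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    return my - b * mx, b

def region_sse(a, b):
    # Squared error in the region we action on: x < 2.
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys) if x < 2)

flat = fit_line(xs, ys, [1.0] * len(xs))
# Up-weight the actionable region x < 2 tenfold in the likelihood.
upwt = fit_line(xs, ys, [10.0 if x < 2 else 1.0 for x in xs])

print("region SSE, flat weights :", round(region_sse(*flat), 1))
print("region SSE, up-weighted  :", round(region_sse(*upwt), 1))
```

The up-weighted fit sacrifices a little aggregate accuracy to behave better where the model is actually used, which is the trade-off described above.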
I'll give you a fairly recent example. Imagine you have some model where you say: okay, I'm going to build component models for different skills, and then have those skills predict player performance, or team performance, or something. You can think of this as multiple likelihoods in the same model. There are some latent skills, think of them as speed and ball control and so on, and then you observe actual speed data, and maybe team performance, and so on. So there are all these different likelihoods stacked on top of those latent variables. And it turns out that, whether it's model misspecification or collinearity or something, you can end up in situations where the speed data should, in theory, tell you everything you need to know about speed, but because of the incompleteness of the data, or who knows what, the likelihood for team performance takes over. The speed data says this player is slow, but because he's so good, the latent variable for speed ends up thinking he's really fast. And you say: no, no, I don't want player performance or team performance to flow into the speed variable, because subjectively I think all of that information should be captured by the speed data. So you end up doing things which are actually kind of un-Bayesian, cut models, where, even within say a Gibbs sampler, you say: I'm not going to allow this information to affect this variable, or this parameter to flow into this other parameter. So based on the underlying models, you can hack these models to do very, very interesting things.

Actually, this is an old idea. If you remember, WinBUGS used to have this cut function, which was there largely for computational reasons, but allowed you to cut portions of the graphical model so that it produces what you want. And there are some really interesting papers, I'll see if I can find them later, by Pierre Jacob, that look at these cut models. They're un-Bayesian in a way, but they show how these models behave and how they're super useful in certain cases.
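A minimal sketch of the cut idea in a Gibbs sampler, with invented toy numbers: a latent speed variable is updated against the speed data only, so the team-performance likelihood never feeds back into it, while the rest of the sampler proceeds as usual:

```python
import random

random.seed(4)

# Toy two-likelihood model: latent speed theta, plus a team effect beta.
#   speed data: y_s ~ Normal(theta, 1)
#   team data:  y_t ~ Normal(theta + beta, 1)
# Priors: theta ~ Normal(0, 10^2), beta ~ Normal(0, 10^2).
true_theta, true_beta = -1.0, 3.0
y_s = [random.gauss(true_theta, 1) for _ in range(50)]
y_t = [random.gauss(true_theta + true_beta, 1) for _ in range(50)]

def normal_post(prior_var, data, offset=0.0):
    """Conjugate posterior (mean, var) for a Normal mean, unit obs noise."""
    var = 1.0 / (1.0 / prior_var + len(data))
    mean = var * sum(d - offset for d in data)
    return mean, var

theta, beta = 0.0, 0.0
theta_draws = []
for it in range(2_000):
    # CUT update: theta conditions on the speed data only, so information
    # from the team-performance likelihood cannot flow back into it.
    m, v = normal_post(100.0, y_s)
    theta = random.gauss(m, v ** 0.5)
    # beta still conditions on theta and the team data, as usual.
    m, v = normal_post(100.0, y_t, offset=theta)
    beta = random.gauss(m, v ** 0.5)
    if it >= 500:
        theta_draws.append(theta)

post_mean = sum(theta_draws) / len(theta_draws)
print(f"cut posterior mean for theta: {post_mean:.2f}")
```

In the full (uncut) sampler, theta's conditional would also include the y_t likelihood terms, and a strong team signal could drag the speed estimate away from what the speed data says; the cut removes exactly that path.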
Yeah, that is very interesting. Although I'm not sure it's un-Bayesian; to me, it's just specifying prior information and making sure it goes through the sampler. The issue you usually have, and this is actually an active area of research, is that if the model gets complex enough, it gets hard to set the priors on the things you really care about. Setting priors on the standard deviation terms of a multivariate normal, say: you have to do it, but it's sometimes not interpretable. What you have an idea about is the standard deviation of the whole data. For your whole generative graph, you know more or less the prior you want on the overall standard deviation, but you don't know how to specify it for each of the parameters. So ideally you would want to specify one big standard deviation, and have it allocated upstream to the rest of the model.
This is even weirder, though, because it's saying: we have some Gibbs sampler, maybe alternating through parameters, maybe through these latent variables. We write the full joint posterior, then break it down into conditionals so we can do Gibbs sampling. All of those conditionals we're going to sample from as usual, whether it's with conjugate updates or rejection sampling, it doesn't really matter. But for this one conditional, we're actually not going to sample from the true conditional as defined by the full posterior. We're going to sample from some version that looks like it, but with the dependency on this particular data source cut out. So you're artificially creating a posterior which is not the full posterior you originally defined; you're defining some new posterior by removing certain dependencies within the conditional statements. Now, it's not necessarily true that you end up with a valid joint posterior, but you do end up with better performance. So it's a super interesting problem for me: I know how things work on a broad scale for the full Bayesian model, but I also know I can solve problems by hacking these things, pinning certain variables here, tweaking things there, breaking things, but breaking them in a way that actually produces better results, and ultimately better decisions.
Yeah, that's fascinating. I love that. So basically, if I understood correctly, when you say cutting the posterior, it's making sure we're only taking a subset of the full posterior that the model actually defines?

Kind of, yeah, but only on one of the conditionals, only on one of the variables. Normally, if I'm updating, say, the variable for a player's decision-making, I have to condition on everything: the data and, in a Gibbs sampler, the previous draws of all the other variables. But here you're saying: when I update the speed latent variable, I only want to condition on the other variables and the speed data; I don't want to condition on these other data sources. So you're dropping them from that conditional sampler. It's a really interesting case where you take something that looks like a well-constructed posterior distribution and hack it in ways which are kind of unprincipled, but which make sense and lead to the performance that you want.
Yeah, yeah. Damn, that's very cool. So I'm guessing that means a lot of custom code, whether that's in Stan or PyMC or plain Python, right?

Yeah, certainly. A lot of these things aren't necessarily handled well by Stan, or PyMC, or any of these tools; they're great for "hey, let's build a Bayesian model that does this, this, and this", but not so great when you say "hey, we need to fix this edge case, we need to hack these things". So first of all, to do those types of hacks you have to deeply understand the problem and deeply understand the models; that's step one. But as you say, it oftentimes falls outside of these automatic inference engines, so you oftentimes have to do things custom.
It actually reminds me a bit of this: because of Stan and PyMC and others, you often don't need to understand what's going on under the hood with HMC or any of these other algorithms. But I always thought it was important for people to understand, at the very least, what's going on when you're fitting basic models. I'll give you one example. When I taught spatial stats, a grad course, I would give people a very simple multivariate normal problem where they're trying to estimate the mean and the covariance. Here's a whole bunch of data, maybe P is 50 and N is a thousand: learn the mean and covariance. That's a trivial problem, it's conjugate, and so on. Then I give them another data set, the same kind of thing, but P is now a thousand and N is maybe 10,000. The data is still relatively small and easy to load, but they go to fit it and it just doesn't work. And they ask: what's going on here? Why doesn't this work? It's a conjugate problem, why doesn't it work? Well, you're trying to invert a thousand-by-thousand covariance matrix. Maybe for you it breaks down at 5,000 instead; whatever it is, it's a relatively low number on a typical laptop where these things start to fail.
So then I say: okay, that doesn't work, go fit it with stochastic gradient descent. They do, and they find it actually works way better for these large data sizes. And I ask: what about for the small data? Well, then you just use the conjugate solution. So they realize there are different ways to solve the same problem: you might use the direct analytical solution in some cases, and prefer SGD in others. Then I go back and say: actually, there's a piece of information I didn't tell you. There's a certain correlation structure in this data, and it turns out to be just an AR(1); it's a time series. Oh! Now you can fit this thing with Kalman-filter-style message passing, so you can do things in linear time in P, because P is essentially T now, time. And all of a sudden they go from an approximate solution to something which is exact and very, very fast. They're solving the same problem fundamentally, but understanding that there are different ways of actually doing the computation.

I think having those skills, thinking holistically not just about the modeling but also about the fitting, is really important. In sports, where we're dealing with space all the time, you're always having to discretize space in some way or another, whether you like it or not. Whether you think of it as projecting onto some basis or something, ultimately you're discretizing; once you put it into the computer, it's discretized. So people will often say, I really don't like giving up the continuous model, but hold on: you're going to have to come up with some low-dimensional approximation to it at some point. You might as well be transparent that you're either using a simple model that you can do really accurate, exact computation on, or a complicated model that you have to do approximate computation on. Understanding those trade-offs is critical for anyone working on these problems.
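The AR(1) teaching example can be sketched as a scalar Kalman filter: one O(T) forward pass instead of inverting a T-by-T covariance matrix (the parameter values below are arbitrary):

```python
import math
import random

random.seed(5)

# Simulate an AR(1) latent state observed with noise:
#   x_t = phi * x_{t-1} + w_t,  w_t ~ N(0, q)
#   y_t = x_t + v_t,            v_t ~ N(0, r)
phi, q, r, T = 0.95, 0.1, 1.0, 2_000
x, xs, ys = 0.0, [], []
for _ in range(T):
    x = phi * x + random.gauss(0, math.sqrt(q))
    xs.append(x)
    ys.append(x + random.gauss(0, math.sqrt(r)))

# Kalman filter: a single O(T) pass over the observations.
m, p = 0.0, 1.0  # current filtered mean and variance
filtered = []
for y in ys:
    m_pred, p_pred = phi * m, phi * phi * p + q  # predict step
    k = p_pred / (p_pred + r)                    # Kalman gain
    m = m_pred + k * (y - m_pred)                # update step
    p = (1 - k) * p_pred
    filtered.append(m)

def rmse(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))

print(f"RMSE raw observations: {rmse(ys, xs):.3f}")
print(f"RMSE filtered states : {rmse(filtered, xs):.3f}")
```

The filter exploits the AR(1) structure exactly, so the recovery is both exact (given the model) and linear in T, which is the point of the classroom exercise.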
Yeah, for sure. And that makes me think: in your experience, what have been the different ways to get those 90% of the results with 5% of the computational cost of the model? What have you found particularly helpful? I'm guessing these are cases where you cannot run a classic NUTS sampler on the model.
Yeah. A lot of it looks like making sure you have a big enough toolbox, so you can say: here's the perfect way of doing this, maybe some big model built in Stan, but that's not going to work as the data scales, so can we fit it with, say, a penalized regression instead? In a lot of these samplers, things start to get really slow when you're sampling hierarchical parameters. If you think about a situation where you have one variable per player, those types of problems are oftentimes easy, because you can marginalize those variables out. But as soon as you start sampling hierarchical parameters as well, say some group-level variances, those conditional updates can be super expensive to calculate, because then you're conditioning on the full data. So it's: okay, what happens if we just do some hack, find an estimate for that hierarchical parameter, and plug it in? Do we lose a lot by specifying it directly rather than sampling it and having a posterior over it? That's a lot of what we think about: where are the computational bottlenecks, especially as the data scales, and how can we make it work? Oftentimes that means working in a sequential-updating way, because we essentially want to update these things weekly as new matches come in: how can we do that really efficiently without losing prediction fidelity? So it looks like things like: hey, if it's a big regression problem, can we do a penalized regression? Sure, we won't get a full characterization of uncertainty, but can we get a point estimate in 1% of the time, and then bootstrap, or something else, to get some notion of uncertainty around it? Or, as I said earlier, can we pin some hierarchical parameter to speed up the sampling, even though we know we'll end up with some bias as a result? Those are the types of conversations we're having constantly.
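Pinning a hierarchical parameter instead of sampling it can be sketched as an empirical-Bayes plug-in: estimate the group-level variance once (here by method of moments), then do cheap conjugate per-group shrinkage with it held fixed (all numbers below are invented):

```python
import random
import statistics

random.seed(6)

# Hypothetical per-player effects in a two-level model:
#   theta_g ~ Normal(mu, tau^2),  y_gi ~ Normal(theta_g, sigma^2)
G, n, sigma, tau, mu = 200, 5, 2.0, 1.0, 0.0
thetas = [random.gauss(mu, tau) for _ in range(G)]
data = [[random.gauss(t, sigma) for _ in range(n)] for t in thetas]

# Instead of sampling tau (the expensive hierarchical update), plug in a
# method-of-moments estimate, then do cheap conjugate per-group updates.
means = [statistics.fmean(g) for g in data]
grand = statistics.fmean(means)
var_means = statistics.variance(means)             # estimates tau^2 + sigma^2/n
tau2_hat = max(var_means - sigma ** 2 / n, 1e-6)   # plug-in estimate of tau^2

shrink = tau2_hat / (tau2_hat + sigma ** 2 / n)    # shrinkage factor
post_means = [grand + shrink * (m - grand) for m in means]

err_raw = statistics.fmean((m - t) ** 2 for m, t in zip(means, thetas))
err_eb = statistics.fmean((p - t) ** 2 for p, t in zip(post_means, thetas))
print(f"tau^2 estimate: {tau2_hat:.2f} (true {tau ** 2:.2f})")
print(f"MSE raw group means: {err_raw:.2f}, plug-in shrinkage: {err_eb:.2f}")
```

This is the bias trade-off mentioned above: the uncertainty in tau is ignored, but the per-group updates become trivial and can be rerun weekly as new matches arrive.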
Hmm, okay, yeah. That touches on a lot of the conversations I'm also having in my work, for sure. So I'm curious: if you were mentoring or teaching students right now, what would you recommend they invest their learning time in, conditional on already knowing Bayesian stats and being fluent in a probabilistic programming language, which is the case for a lot of the listeners here? If you had to point them to something that would help complete their toolbox for these cases, what advice would you give them?
Yeah. One thing I've seen with a lot of students coming out of master's programs or data science programs is that they're quite good at: here's a data set, either use this method, or choose a method that's going to give you good prediction performance. People are good at tuning neural networks, or Gaussian processes, or penalized regression, or what have you. They're quite confident fitting methods, looking at predictions, and saying, hey, this has a better RMSE. But where they can really struggle is that in sports, while there are certainly prediction cases, oftentimes what we're doing is fundamentally inference. We observe some team-level performance and we want to infer what's causing it. You have latent parameters, player skills or player performances, and you're essentially trying to learn those. It's fundamentally an inference task. So the first thing I would say, if the bulk of your time has been spent building prediction algorithms, is to broaden your statistical toolbox beyond that.

And then learn as much as you can, broadly. Some of the best ideas I've ever had in sports have come from text modeling. Using Dirichlet processes to model plays in basketball was one of the coolest projects I've ever been involved in. And again, that wasn't my idea; it was a genius PhD student named Andy Miller. If you want to creatively solve these problems, to find computationally efficient solutions, you have to have way more than one tool in your toolbox. Read broadly, study broadly, and then you can get into the weeds on things. Learn about language models, learn about text models, learn about image modeling, learn about spatial statistics, learn about robust estimators, as we talked about earlier; learn about all these different areas. Sure, you won't necessarily be an expert in any of them, but if you understand what these models do, their pros and their cons, what they work on and what they don't, then it becomes much easier to solve the sort of unique problems that arise.
Yeah, yeah, that's really interesting to hear. And in your experience, what has been very interesting open-source software that you've used, maybe outside of the Bayesian framework? I'm curious.
:Yeah, early days for me was not open source.
696
:I started when I started my master's, it was like MATLAB programming, right?
697
:That's what I was like the tool de jour.
698
:And since then I've sort of been...
699
:combination of R and Python has been the large bulk of my work.
700
:But a lot of stuff is built on packages designed for those two languages as well as Stan
and others.
701
:And these days, a lot of what I do is also sort of broader technology, which is just
revolutionized the way we do things, whether it's cloud computing or Docker.
702
:Like all these sort of standard tools in broader tech, think, also, you if you want to
work in industry in particular, are incredibly valuable, incredibly valuable to learn,
703
:right?
704
:Like...
705
:At Zelus, for example, now Teamworks, even our models themselves are, like,
Dockerized, they're containerized.
706
:So you think about like, I'm building a Bayesian model.
707
:Well, we think of it as like a container object that holds predict functions and test
functions and train functions and allows you to like very simply uh probe these models as
708
:well as to sort of version and control them.
709
:And so it's like blending ideas from statistics and machine learning with sort of
710
:software engineering ultimately to solve a lot of interesting problems that come up in
sort of productionalized machine learning.
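The "container object" idea described here can be sketched in a few lines. This is purely illustrative, not the actual Zelus/Teamworks code: a model is bundled with its train, predict, and test functions behind one versioned interface, so it can be probed and shipped (for example, inside a Docker image) the same way every time.

```python
# Illustrative sketch of a "model container": one versioned object bundling
# train / predict / diagnostic-test functions behind a uniform interface.
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class ModelContainer:
    version: str
    train: Callable[[Any], Any]          # fits on data, returns model state
    predict: Callable[[Any, Any], Any]   # (state, new_data) -> predictions
    tests: list = field(default_factory=list)  # (name, check) diagnostic pairs

    def run_tests(self, state, data) -> dict:
        # Each check returns True/False; a CI job can gate deployment on this.
        return {name: check(state, data) for name, check in self.tests}
```

A deployment pipeline can then treat every model identically: call `train`, run `run_tests`, and only promote the container if all checks pass.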
711
:And so, to do this, it's a lot, right?
712
:But it's a lot also outside of what you might think of as traditional stats and ML
toolbox, right?
713
:But broader tech stuff.
714
:Heck, I spent a weekend, this is not work-related, but I spent a weekend flashing a
Raspberry Pi and messing around with a Raspberry Pi last weekend.
715
:So I'm deep into this.
716
:stuff.
717
:Yeah, I mean, that makes sense to me in the sense that, you know, actually a
substantial proportion of the modeling work is not done on the model.
718
:It's a lot before: how do I get the data, in which format?
719
:Which data do I want?
720
:Even before that, what's the generative graph we're thinking about?
721
:Which questions are we interested in answering?
722
:And then once I have the data, how do I format it in the format that the model can
actually take it?
723
:Doing all the EDA, also extremely important.
724
:Parameter recovery, simulation-based calibration, all this stuff.
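Parameter recovery, mentioned here, is easy to sketch: simulate data from known parameters, fit the model, and check the estimates land near the truth. This toy version uses a plain normal model with a moment-based fit; in practice the "fit" step would be the full Bayesian model, and the comparison would use the posterior.

```python
# Minimal parameter-recovery check: simulate from known parameters,
# fit, and verify the estimates recover the simulating values.
import numpy as np

rng = np.random.default_rng(42)
true_mu, true_sigma = 1.5, 0.8

# Simulate data from the assumed generative process
y = rng.normal(true_mu, true_sigma, size=5_000)

# "Fit" -- here just moment estimates for a normal; in a real workflow,
# this would be your full Bayesian model and its posterior summaries
mu_hat, sigma_hat = y.mean(), y.std()

# Recovery check: estimates should sit close to the truth
assert abs(mu_hat - true_mu) < 0.1
assert abs(sigma_hat - true_sigma) < 0.1
```

If this check fails even on data the model itself generated, the model cannot be trusted on real data, which is the whole point of running it before anything else.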
725
:So this is something I hear a lot, right?
726
:Which is like, data science in the real world is like a lot more data managing and like
all this sort of pre-model fitting stuff.
727
:That's certainly all true.
728
:I think where the analysts that have worked with me in the past are most surprised
is how much work there is after that first model fit.
729
:they do all this work getting everything in place and then, hey, I finally built the
model.
730
:We're good to go.
731
:And I said, well, hold on.
732
:I want to see.
733
:Calibration, I want to see all these different things, I want to explore all the edge cases.
734
:I want to explore, you know, what, what happened, how sensitive is it to different
assumptions?
735
:You know, there's all this sort of poking and prodding the model.
736
:So yeah, there's all this work that comes before the model, and then there's sort of
fitting the model, which is kind of the quote-unquote fun part.
737
:And then before you sort of productionalize this thing and release it to the world, like
you have to probe it endlessly to make sure that, that it does the things you want it to
738
:do, and that it does so in sort of predictable and controllable
739
:ways.
740
:And that's sometimes more work than is actually needed before the models fit.
741
:yeah.
742
:Yeah, I was gonna go there.
743
:Definitely.
744
:And yeah, I think that's almost always even more work.
745
:Because, well, the model often doesn't work like you want it to, you know, especially
if it's complex enough.
746
:So there's definitely some dimension
747
:where it's not working as you would have expected.
748
:And so that's interesting to know: okay, in which conditions does my model collapse?
749
:Where does it not work?
750
:And so that takes a lot of form.
751
:I think then, even after that, there is another part, which is visualization.
752
:If you're doing that as the modeler, it's extremely important, because you want custom
visualizations depending on the people you're gonna talk to.
753
:uh We'll get to that a bit later, but yeah, on that model validation part, I think it's an
extremely important part uh that you bring up.
754
:that ah also...
755
:goes back to something you were saying earlier, which is, actually, when we develop the
model, we're fine if it's not doing a very good job at the league level, for instance, but
756
:where it does a good job for the kind of players we're interested in. So it's like, maybe
the model is not that good overall, or it's not that good for
757
:older players, because they have lower athleticism, stuff like that, but we're actually
interested in young players, so bigger, better athleticism and so on. In this
758
:population, the model is doing better, so it's actually fine for us. So that's a very
important part. I'm curious, what's your workflow there, if you have one, you know, of the other
759
:stuff that you
760
:always check once someone comes with a model that's fitting well, all the convergence
diagnostics are all good.
761
:Now we're entering that part of model validation.
762
:Yeah, we do a lot of stuff pre-modeling, project plan type stuff where there's documents
that we will use, the layout, the types of things we want to do beforehand.
763
:Afterwards as well, like calibrations, lot of standard things that we'll do.
764
:But lot of it just comes, it's sort of fairly iterative, right?
765
:You do the initial sets of plots and then say, actually, we should look at this and this
and then something looks weird and you say, okay, that's weird.
766
:Like how else can we slice that in a different way?
767
:Can we see it?
768
:uh
769
:trying to expose what's actually going on under the hood.
770
:Hey, can we see this, set of parameters?
771
:Is there some sort of high-leverage data that's maybe driving this weirdness?
772
:Like let's explore all that type of stuff.
773
:So it's a combination of sort of a standard checklist, if you will, as well as really
custom stuff that comes from just working with these models and these data
774
:for so long that you sort of really spot little issues and you figure out how to pull
the thread and get at what you want.
775
:Hmm.
776
:Yeah.
777
:It's like, yeah, like a detective work.
778
:And is there a case, you know, that you remember that was particularly hard, where you
were banging your head against the wall before finally
779
:understanding what was going on with the model?
780
:Yeah, there's lots.
781
:That happens like that's daily.
782
:I'll give you one example.
783
:It ended up being my favorite paper, I think, that I've ever written, with Patrick.
784
:Very simple statistically.
785
:There's this idea in sports science of acute chronic ratios and the...
786
:It's a pretty simple notion of, how much load on your body has there been?
787
:I'm going to simplify a bit, so bear with me, but how much load has your body had in the
last week versus how much has it had in the last month, roughly, or maybe month and a
788
:half?
789
:Just the idea being sort of like, hey,
790
:is the load that you've experienced recently, is this sort of normal to what you normally
experience or is it high or is it low?
791
:You can imagine if you're a runner and you sort of normally run 10 miles a week and then
this last week you ran 50 miles, well, you'd have a super high acute:chronic ratio.
792
:If you normally run 10 miles a week and this week you ran one mile, well you'd have a
really low acute:chronic ratio.
793
:So sort of like a short-term average divided by a long-term average, right?
794
:So it's the ratio of the two.
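The ratio described here can be sketched in a few lines. The 7-day acute and 28-day chronic windows are just common defaults from the literature, not the specific choices from the paper:

```python
# Sketch of the acute:chronic workload ratio: short-term average load
# divided by long-term average load (window lengths vary by study).
def acute_chronic_ratio(daily_load, acute_days=7, chronic_days=28):
    acute = sum(daily_load[-acute_days:]) / acute_days
    chronic = sum(daily_load[-chronic_days:]) / chronic_days
    return acute / chronic

# A runner averaging ~10 miles/week who suddenly runs 50 in the last week
steady = [10 / 7] * 21   # three normal weeks of daily mileage
spike = [50 / 7] * 7     # one big week
print(acute_chronic_ratio(steady + spike))  # well above 1 -> flagged as a spike
```

For the steady runner the ratio sits near 1; the 50-mile week pushes it to about 2.5, the kind of value the injury literature flags as "outside the band."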
795
:And there's a tremendous number of papers out there that show that if that acute:chronic
ratio is outside of some range...
796
:So here I'm talking about like load on the players, like amount of physical exertion.
797
:It's probably easiest to think of it as like running, right?
798
:It's like distance run, it's a little more complicated than that.
799
:But basically say, if we have these notions of player load and if you fall way outside of
these bands, that it's highly predictive of injury in the future.
800
:And there's like some intuitive logic there, right?
801
:Which says that, uh which is like, hey, um if I'm a runner and I go from running 10 miles
a week to all of a sudden running 50 miles, I'm increasing my injury risk.
802
:I'm more likely to get injured moving forward, right?
803
:Or if I'm training on the soccer pitch for two hours a week and now I'm going to 10
hours a week, I'm going
804
:to have a higher chance of injury.
805
:But, and to be clear, there's like literally hundreds of papers out there that show that
that is predictive of future injury.
806
:Okay, great, okay, great.
807
:So we have this acute:chronic ratio.
808
:In fact, just as a little aside, the Apple Watch now, if you have the newest Apple Watch,
has this acute:chronic ratio built in.
809
:It's like a plot that shows how are you relative to your normal.
810
:It's the same idea, and it's using a lot of these same ideas.
811
:But it's essentially like, again, the short-term average divided by long-term average.
812
:But when we did this internally, at a couple of teams I've worked for, we found that it
wasn't predictive, and I couldn't figure out why.
813
:And so you look at all these papers, and basically what they're estimating is like a
zero-one.
814
:Did the player get injured after some point in time or not?
815
:So it's like, if...
816
:If my acute chronic ratio is high at time t, do I get injured in time t plus one to time,
let's say, you know, the next month or the next week or whatever.
817
:That's how all these papers look.
818
:It's like the acute chronic ratio at a snapshot in time, it's past sort of looking
backwards.
819
:Does it predict injury in future?
820
:And then I realized like, that's actually not what you care about because what these are
often saying is that, okay, if...
821
:What we sort of uncovered is that the main reason this is predictive is because if your
load has increased in the last week...
822
:It also means that your load will likely be high in the future.
823
:so what is actually happening is that your chance of injury per minute of exposure or per
mile run in the running example, oftentimes stays constant, but your
824
:total underlying exposure has increased.
825
:So these hundreds of papers that show this thing is super predictive, it's actually
predicting exposure more than it's predicting injury.
826
:So you end up with this confounding, and oftentimes it's like the time of the season,
right?
827
:Like you come into training camp, your load spikes and therefore you have higher chance of
injury.
828
:It's not that your chance of injury for every minute you're on the field is higher.
829
:It's that you spend more minutes on the field after that.
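This confounding story is easy to demonstrate with a toy simulation. The numbers below (per-minute risk, minutes of exposure) are invented for illustration; the point is that per-minute injury risk is held constant by construction, yet the "load spike" group still gets injured more, purely because a spike also means more future minutes on the field:

```python
# Toy demo of confounding by exposure: per-minute injury risk is IDENTICAL
# for both groups, but players whose load spiked also play more future
# minutes -- so the spike "predicts" injury via exposure alone.
import random

random.seed(0)
RISK_PER_MINUTE = 0.0005  # same for everyone, by construction

def simulate_player(spiked: bool) -> bool:
    future_minutes = 900 if spiked else 300  # spike -> more future exposure
    return any(random.random() < RISK_PER_MINUTE for _ in range(future_minutes))

rate_spiked = sum(simulate_player(True) for _ in range(2000)) / 2000
rate_normal = sum(simulate_player(False) for _ in range(2000)) / 2000
print(rate_spiked, rate_normal)  # spiked group injured far more often
```

Analytically the gap is large: roughly 1 - (1 - 0.0005)^900 ≈ 36% versus 1 - (1 - 0.0005)^300 ≈ 14%, with zero difference in per-minute risk.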
830
:Right?
831
:so like, you know, I'll give you another example, right?
832
:If you, there's like, if you looked at
833
:uh So eating ice cream is super predictive of drowning deaths.
834
:Like if you look at the data on this, you'd say like, okay, when people eat a lot of ice
cream, there's a lot of drownings as well.
835
:Right?
836
:And so it's like, what's the mechanism here?
837
:Are people like eating too much?
838
:People eat all this ice cream, they get bloated, they can't swim.
839
:It's like what your grandmother used to tell you, don't swim after eating; you sort of
make up stories to make sense of this.
840
:And then you step back and realize, no, there's actually a confounding variable here,
which is season.
841
:It's the summer.
842
:People eat more ice cream, they, you know, they swim more, right?
843
:And so the ice cream is actually predictive not of the drownings, but of the amount of
water exposure.
844
:And it's the same thing here that the,
845
:the acute:chronic ratio is not necessarily predicting a higher risk; it's not like your
chance of drowning per minute of being in the pool is any different, it's that your minutes in the
846
:pool goes up.
847
:And here it's saying the same thing.
848
:So that's like a simple case of like, uh we did this big project and it wasn't necessarily
this new and novel, like, we got this new thing.
849
:It was saying, hey, everyone's using this thing that turns out largely to be garbage.
850
:And it's, it's.
851
:Or certainly, at the very least, they're putting way more confidence and trust in it than
they should, because fundamentally all these papers have misdefined the estimand.
852
:They've defined it in a way which is actually not what you care about.
853
:You don't care about whether the player gets injured in the next two weeks.
854
:You care about their injury risk per minute of exposure or per minute of gameplay or
whatever.
855
:And you need to essentially control for that.
856
:And none of these papers have.
857
:So that's a really simple example.
858
:Again, not terribly Bayesian, but sort of when you
859
:really think deeply about a problem, sometimes there's a really simple solution right in
front of you that, in that case, you know, is essentially telling people to
860
:stop doing this thing or at least be much more thoughtful about how you're using this
data.
861
:And so that's one example of like something that's still super prevalent in sports, but we
kind of uncovered this really interesting confounding effect.
862
:Yeah, it's fascinating.
863
:Yeah.
864
:And I mean, I love it because it's really, you couldn't find the solution in the data.
865
:Right.
866
:So I think in a way that's quite Bayesian, because that's something that's fundamental
in Bayes: solutions are not always in the data.
867
:And that's why you need priors.
868
:And that's why you need the structure of the model.
869
:And so here is like literally thinking much more about the generative process of the data
and basically realizing,
870
:like in the ice cream and drowning example, where season is, as Richard McElreath says,
a fork, right?
871
:Because it causes both variables you're interested in, and unless you control or
condition on that variable, you're gonna have biased estimates.
872
:And so here, thinking about that, then you condition on the time played,
873
:the time spent on the field and then you see that this predictive aspect basically
disappears because you've taken care of it.
874
:In this paper we also showed that if you use some ideas from, back to the point about
having a broad toolbox, if you use some really basic tools from causal inference like
875
:matching or propensity scores that you can actually solve this problem, right?
876
:If these studies, instead of saying, let's predict whether a player gets injured or not,
if instead you had said, let's take two players at the same point of the season, one who
877
has a high acute:chronic and one who doesn't, but control for all the other things like
minutes played in the games and all this other stuff.
878
:And if you did that, then you can actually control for this effect.
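The matching idea described here can be sketched with a tiny stratified comparison. This is an illustrative toy, not the method from the paper: compare high- versus normal-ACWR players only within the same stratum of the confounder (for example, time of season), instead of pooling everyone.

```python
# Toy stratified comparison: difference in injury rates between high- and
# normal-ACWR players, computed WITHIN each confounder stratum and averaged.
from collections import defaultdict

def stratified_risk_diff(records):
    """records: iterable of (stratum, high_acwr: bool, injured: bool)."""
    strata = defaultdict(lambda: {"high": [0, 0], "normal": [0, 0]})
    for stratum, high, injured in records:
        group = strata[stratum]["high" if high else "normal"]
        group[0] += injured   # injury count (bool adds as 0/1)
        group[1] += 1         # player count
    diffs = []
    for g in strata.values():
        (hi_inj, hi_n), (no_inj, no_n) = g["high"], g["normal"]
        if hi_n and no_n:  # only strata with both groups are comparable
            diffs.append(hi_inj / hi_n - no_inj / no_n)
    return sum(diffs) / len(diffs)
```

With data where most high-ACWR observations come from training camp (when everyone's injury rate is elevated), the naive pooled comparison shows a large "effect" while the within-stratum differences are zero, which is exactly the confounding pattern the episode describes.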
879
:so, yeah, it's like, I think it's...
880
:just a really simple example of a case that's, you know, worth a lot of money to teams,
sort of understanding it well and perhaps, in this case, not
881
:chasing down rabbit holes.
882
:Yeah.
883
:Yeah, I love that.
884
:That's really fascinating.
885
:And that's a solution that's like simple, but it can take so long to get there.
886
:So thanks.
887
:That's super interesting.
888
:That's not a big model.
889
:It's a good example because there's no real stats there.
890
:It's just like probabilistic thinking.
891
:I guess there is in a way, but it's super simple.
892
:It's a very simple case of, really, are you defining the estimand, the thing you care
about, properly?
893
:Yeah.
894
:Or you end up with wildly wrong conclusions.
895
:Yeah, for sure.
896
:um So ah I'm going to start winding us down.
897
:I know, it's like, I've already taken quite a lot of your time.
898
:eh Maybe two other questions, if you can, before we go to the last two questions.
899
:um
900
:I'm curious, what do you see as the most exciting trends or advancement now in your field
and how you think they will impact sports analytics and decision making?
901
:Yeah, think there's probably a couple of things there.
902
:The first is, as I mentioned earlier, just the growth of this really super interesting
data and all the statistical problems that come along with it.
903
:That has sort of been a trend over the last five, 10 years, but it certainly continues.
904
:And it means that teams are and sort of...
905
:Other businesses in the sports space are investing a lot, so you have a lot of teams
growing their analytics departments, and media companies and others spending in the space.
906
:Another cool thing I've seen, which seems sort of like tangential but is actually really,
I think actually quite nice, is that...
907
:When teams are building out their internal resources, every team typically has to do a lot
of the same things.
908
:So they have to like set up a database, they have to build all the ETL to ingest the data,
they have to...
909
:create like an IDM to sort of standardize across the different data sources.
910
:They have to map players.
911
:Anyone who's done like player mappings in soccer where you have tens of thousands of
players and you're trying to map, you know, uh data from one data provider to another for
912
:some Brazilian player with like six names, like it's just impossible.
913
:Right.
914
:And so everyone's doing these things sort of on repeat before you can get to like the
interesting statistical problems.
915
:Right.
916
:And so one thing that I think has been great, a trend that has just recently started,
and actually Zelus has been a big part of that, but
917
:but there's others as well, uh sort of realizing that's a problem and sort of like finding
centralized solutions for that.
918
:So there's been cases, for example, of leagues taking some of that on and sort of saying,
all these teams don't need to each be spending hundreds of thousands of dollars a year,
919
:like mapping players are doing this and this.
920
:We'll create something.
921
:Or Zelus has essentially said, hey, all these teams are doing a lot of the same things
when it comes to data engineering and even sort of early, simple stages of
922
:modeling.
923
:let's sort of democratize that and for a fraction of the cost teams can just sort of
ingest all that.
924
:And then they sort of to use the baseball analogy.
925
:Like they're not starting from nothing.
926
:They're kind of starting on third base.
927
:Right?
928
:So that's sort of another thing that I think is like a huge progress in terms of creating
the space for sports analysts to actually work on, you know, challenging interesting
929
:problems.
930
:No, for sure.
931
:I mean, so first, now that I work in baseball, I can understand that metaphor.
932
:So thank you.
933
:And second, yeah, I mean,
934
:I'm an open source developer so I'm not gonna tell you this is a wrong direction.
935
:I think, like, you know, ideally if you'd asked me, I'd be like, eh, I think the leagues
should just take over all of that, and put all that tracking data and stuff like that.
936
:Not the latest frontier because, well,
937
:for the industry to keep progressing, you know, people still have to earn money, because
they were, like, the first ones to develop that kind of data.
938
:But the older data and stuff like that, you know, just put that on the league, and then
the league could just open source everything so that everyone
939
:gets access to that.
940
:Everyone has the same data.
941
:And then it's just how you use the data, and the combination, not only of, like, the
models, but the combination of the people working on the data, the modelers,
942
:the GMs, the coaches, like all these people together, the scientists, that's what really
makes the value out of the team much more than having that exact row in their database
943
:that the other team doesn't have.
944
:I think the other industries also show that, you know.
945
:Yeah, our edge at Toulouse did not come necessarily from having the best data.
946
:Like I think we did actually have the best data and the best models, but it came from
execution.
947
:There's no question about it.
948
:Like, yeah, we probably did a little better than if we'd had simpler data, some of the
models.
949
:Yeah, maybe we would have been a little less efficient, but uh there's a lot of clubs out
there, and I certainly won't say names here, but a lot of teams out there that have great
950
:data and big analytics groups and so on.
951
:And, you know, they treat them like houseplants.
952
:They stick them in the corner and ignore them.
953
:Right.
954
:And you're not going to get any, you're not going to create competitive advantage doing
that.
955
:No, no, no, for sure.
956
:mean, I see that a bit like, you know, high quality cooking, right?
957
:What makes a chef good is not only the ingredients, it's how they use the ingredients,
and with whom, and to whom they're serving the plates, and
958
:where they are feeding people.
959
:You know, like, if you go to big restaurants, it's not only one table where they just
give you the food.
960
:It's like a whole experience.
961
:So.
962
:Yeah, I think it's a great thing that we're moving in this direction, and yeah, as you
were saying, Zelus is doing a lot of that, and I think
963
:that's really amazing.
964
:Another question I have for you before the last two ones is, well, what's next for you?
Because you're a very curious person, and now you've sold Zelus, so I'm
965
:curious are there
966
:any upcoming projects you're particularly excited about for the months to come?
967
:You know, I sort of spent the first 15 years of my career trying to be the best
statistician that I could.
968
:I sort of had this personal mantra.
969
:I was always personally offended if I didn't understand something or I didn't know a
method.
970
:That's part of what drove me to learn so much.
971
:It's just like...
972
:feeling like I just wanted to sort of cover everything.
973
:And so I spent so much of my career doing sort of being that, right?
974
:And sort of growing as a statistician.
975
:And mostly in academia, but of course with various consulting gigs and all sorts of industry
as well.
976
:And then the last 15 years have really been about, and some overlap there, but the last 15
years have been about sort of sports and learning about sort of applying that domain
977
:really into sports and sort of how that leads to sort of decision-making, negotiations,
sort of finances and the whole, like the integration of all those things coming together
978
:to ultimately go from, hey, how do we value players better?
979
:Right through to like running a team and
980
:and ultimately outperforming our payroll and all the way through to like creating equity
value for shareholders, right?
981
:And I kind of feel like I'm at this point where I've put in a lot of time, you know,
maybe it's like hitting Malcolm Gladwell's 10,000 hours in both of these things.
982
:And I feel like I'm, with Toulouse, we've sort of proven the thesis here.
983
:so a lot of right now what's next to me, like I sort of want to keep leaning in on this.
984
:I feel like I have these two
985
:things sort of perfectly hybridized together where these technical skills that I've built
over the years combine with the team management skills and it's a super powerful
986
:combination and a really valuable combination.
987
:I like just love using those tools and these ideas to sort of play Revenge of the Nerds in
real life.
988
:Like going out there and just dominating, using essentially, this is gonna sound very not
humble, but using our group's intellectual advantage to win.
989
:And to me that's like super, super fun.
990
:Yeah, mean, love that.
991
:Definitely support it.
992
:Like, Revenge of the Nerds, you had me there.
993
:But yeah, I mean, of course, yeah.
994
:And that's something.
995
:That's also what I try to do with the podcast, like trying to...
996
:percolate these ideas through more people.
997
:um Because I think that's something that's needed a lot.
998
:yeah.
999
:Awesome.
:
01:24:11,406 --> 01:24:12,547
Well, Luke.
:
01:24:13,048 --> 01:24:14,678
I think I think we'll call it a show.
:
01:24:14,678 --> 01:24:20,620
I would still have so many questions. Like, literally, I still have so many questions I
had for you today that I didn't ask you.
:
01:24:20,620 --> 01:24:22,511
Sorry that's my fault.
:
01:24:22,511 --> 01:24:24,831
I filibustered a couple of those questions there.
:
01:24:24,831 --> 01:24:27,152
No, no, That's you know, that's me.
:
01:24:27,152 --> 01:24:28,202
I have, that's also my job.
:
01:24:28,202 --> 01:24:32,493
I have to, you know, adapt to the topics.
:
01:24:32,493 --> 01:24:33,724
But yeah, I mean that's cool.
:
01:24:33,724 --> 01:24:42,396
That means you can come back on the show next time you have a cool project to talk about,
and I'll get to ask you these questions.
:
01:24:42,736 --> 01:24:47,516
But first, before you leave, of course, I have to ask you the last two questions.
:
01:24:47,556 --> 01:24:49,996
The ones I ask every guest at the end of the show.
:
01:24:50,916 --> 01:24:53,356
So I'm going to change a bit the first one for you.
:
01:24:53,356 --> 01:24:54,556
First time I do that.
:
01:24:54,556 --> 01:25:00,196
But I think I think that you kind of answered a bit the first one already.
:
01:25:00,196 --> 01:25:10,596
And I'm curious because you have a particular background and origin story where if I
remember correctly, in this moneyball episode,
:
01:25:11,568 --> 01:25:25,528
you were saying that you actually started working on spatial temporal data with the
Sacramento Kings, because someone came into your office with like, some question, but the
:
01:25:25,528 --> 01:25:28,848
question was actually not for you, I think, if I remember correctly.
:
01:25:28,848 --> 01:25:39,464
And so that's a very, very random, you know, origin story that could be in a movie, in a
way, where it's like, the hero never wants to
:
01:25:39,630 --> 01:25:40,620
be a hero, right?
:
01:25:40,620 --> 01:25:43,543
It's like the situation imposes onto him.
:
01:25:43,543 --> 01:25:47,369
So my question is counterfactual for you.
:
01:25:47,369 --> 01:25:49,411
If that moment had not happened, right?
:
01:25:49,411 --> 01:26:01,683
If you had not been in your office at that time, at that place, and you hadn't met that
person and ended up working on on sports data, what do you think you would have done?
:
01:26:02,992 --> 01:26:10,472
Yes, I think everyone's lives are sort of built up with a lot of these just kind of random
events that accumulate into who you are.
:
01:26:10,632 --> 01:26:11,892
That was certainly one of them.
:
01:26:11,892 --> 01:26:22,972
I won't retell that story because, as you say, it's on the Wharton Moneyball podcast,
but I think I'd still be working in sort of very similar spaces, but possibly not sports,
:
01:26:22,972 --> 01:26:23,072
right?
:
01:26:23,072 --> 01:26:29,936
Sort of taking all these ideas, Bayes and so on, and...
:
01:26:29,936 --> 01:26:32,476
trying to apply it to some interesting problem.
:
01:26:32,476 --> 01:26:41,616
for some reason, I ended up working in a space which is working on a zero-sum game where
billionaire owners are trying to extract value from millionaire players and the
:
01:26:41,616 --> 01:26:43,996
millionaire players are trying to get dollars from the billionaires.
:
01:26:43,996 --> 01:26:54,656
So it's like a very strange space to work in, but I think I'd probably be in a very
similar spot but not in sports.
:
01:26:54,816 --> 01:26:59,816
I was on a path where I was doing a lot of stuff in climate, maybe it'd be that, maybe
it'd be...
:
01:26:59,864 --> 01:27:07,892
sort of some other domain, but I think it'd be sort of the same thing I'm doing now, but
just a different domain other than sports.
:
01:27:07,953 --> 01:27:09,034
Yeah, yeah.
:
01:27:09,034 --> 01:27:11,036
Yeah, I was thinking maybe agriculture.
:
01:27:11,036 --> 01:27:15,261
oh They do a lot of spatial temporal stuff over there.
:
01:27:15,261 --> 01:27:20,016
I actually published a paper once on crop yield predictions in the Canadian prairies.
:
01:27:20,016 --> 01:27:21,507
How exciting is that?
:
01:27:21,870 --> 01:27:23,842
Yeah, yeah, yeah, exactly.
:
01:27:23,842 --> 01:27:29,968
That's actually, we worked on a similar project when I was at PyMC Labs.
:
01:27:29,968 --> 01:27:33,231
it was actually something like that.
:
01:27:33,452 --> 01:27:34,533
Gaussian processes.
:
01:27:34,533 --> 01:27:37,916
So that's a cool space because you can use a lot of Gaussian processes.
:
01:27:38,857 --> 01:27:42,460
That's also very challenging because Gaussian processes are hard to fit.
:
01:27:42,861 --> 01:27:44,743
But they are such cool beasts.
:
01:27:46,274 --> 01:27:54,009
And second question, if you could have dinner with any great scientific mind, dead, alive
or fictional, who would it be?
:
01:27:55,130 --> 01:28:00,533
Yeah, this is interesting because I know you've asked the same question to others, so I
sort of gave it some thought.
:
01:28:00,533 --> 01:28:08,258
You know, I've been really fortunate that over the last 20 years or so, I've had a lot of
tremendous dinners with people, right?
:
01:28:08,258 --> 01:28:10,680
Whether that's in academia, like...
:
01:28:10,882 --> 01:28:22,021
Some of the... I just have great memories of dinners with people that you probably know,
Christian Robert and so on. I had this amazing dinner in Bristol with Julian Besag in the
:
01:28:22,021 --> 01:28:22,732
last year of his life.
:
01:28:22,732 --> 01:28:24,103
was a visitor there in Bristol.
:
01:28:24,103 --> 01:28:25,840
ah
:
01:28:25,840 --> 01:28:27,660
Peter Green at the same time.
:
01:28:27,660 --> 01:28:36,120
Just so many incredible dinners and sort of, and then of course now in the sports world
with coaches and owners and so on, I've just been really fortunate.
:
01:28:36,320 --> 01:28:43,560
So in sort of like the explore, exploit, I think I have a pretty good idea of what the
distribution of good and bad dinners are.
:
01:28:43,560 --> 01:28:55,500
And so I think there's a really good chance that if I named a name of someone I've never
met, that it would be sort of below my expected return if I were to sort of exploit the,
:
01:28:55,794 --> 01:28:57,354
explore rather than exploit.
:
01:28:57,534 --> 01:29:07,174
I think I would have to say to sort of look at some of the people that I've had some of
the most interesting dinners with, and I thought about this a little bit, and think the
:
01:29:07,174 --> 01:29:09,914
person I would name is Xiao-Li Meng.
:
01:29:10,210 --> 01:29:15,402
So Xiao-Li was the chair when I was hired at Harvard and then later became the Dean.
:
01:29:15,643 --> 01:29:19,184
And uh he is perhaps one of the most fascinating people I've ever met.
:
01:29:19,184 --> 01:29:24,620
He's just full of wit and humor and intelligence and just a kind human being.
:
01:29:24,620 --> 01:29:33,090
It doesn't hurt that he has a uh giant box of scotch uh sitting under his desk, which can
uh come in handy at times.
:
01:29:33,211 --> 01:29:35,432
And so, yeah, I think I would not explore.
:
01:29:35,432 --> 01:29:37,052
I think I would exploit.
:
01:29:37,413 --> 01:29:40,354
By that, I mean, I haven't had dinner with Xiao-Li in probably
:
01:29:40,354 --> 01:29:44,416
a decade and I would love to sit down with him again.
:
01:29:44,957 --> 01:29:45,898
Nice, yeah.
:
01:29:45,898 --> 01:29:51,071
Well, I really like the structured answer.
:
01:29:51,071 --> 01:29:53,001
I think I never had that yet.
:
01:29:54,163 --> 01:29:55,823
So thank you so much.
:
01:29:56,604 --> 01:29:57,414
Yeah, awesome.
:
01:29:57,414 --> 01:29:59,286
Well, I think that's...
:
01:29:59,286 --> 01:30:01,807
Let's call it a show, Luke.
:
01:30:01,967 --> 01:30:03,218
That was really amazing.
:
01:30:03,218 --> 01:30:10,030
I'm really happy because we got to explore a lot of the questions I had for you from a...
:
01:30:10,030 --> 01:30:15,746
decision-making perspective but also got to be very nerdy so that's great.
:
01:30:16,347 --> 01:30:17,628
Thank you so much.
:
01:30:17,628 --> 01:30:22,884
As usual, we'll put links in the show notes for those who want to dig deeper.
:
01:30:22,884 --> 01:30:26,998
Thanks again, Luke, for taking the time and being on the show.
:
01:30:27,179 --> 01:30:27,979
Thank you, Alex.
:
01:30:27,979 --> 01:30:28,940
It was a blast.
:
01:30:33,796 --> 01:30:37,499
This has been another episode of Learning Bayesian Statistics.
:
01:30:37,499 --> 01:30:47,988
Be sure to rate, review, and follow the show on your favorite podcatcher, and visit
learnbayesstats.com for more resources about today's topics, as well as access to more
:
01:30:47,988 --> 01:30:52,071
episodes to help you reach a true Bayesian state of mind.
:
01:30:52,071 --> 01:30:54,033
That's learnbayesstats.com.
:
01:30:54,033 --> 01:30:58,877
Our theme music is Good Bayesian by Baba Brinkman, feat. MC Lars and Mega Ran.
:
01:30:58,877 --> 01:31:02,039
Check out his awesome work at bababrinkman.com.
:
01:31:02,039 --> 01:31:03,224
I'm your host,
:
01:31:03,224 --> 01:31:04,204
Alex Andorra.
:
01:31:04,204 --> 01:31:08,424
You can follow me on Twitter at alex underscore andorra, like the country.
:
01:31:08,424 --> 01:31:15,690
You can support the show and unlock exclusive benefits by visiting patreon.com slash
LearnBayesStats.
:
01:31:15,690 --> 01:31:18,071
Thank you so much for listening and for your support.
:
01:31:18,071 --> 01:31:20,382
You're truly a good Bayesian.
:
01:31:20,382 --> 01:31:23,503
Change your predictions after taking information in.
:
01:31:23,503 --> 01:31:30,530
And if you're thinking I'll be less than amazing, let's adjust those expectations.
:
01:31:30,530 --> 01:31:43,688
Let me show you how to be a good Bayesian. Change calculations after taking fresh data in.
Those predictions that your brain is making? Let's get them on a solid foundation.