Get early access to Alex's next live-cohort courses!
Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!
Visit our Patreon page to unlock exclusive Bayesian swag ;)
Chapters:
03:51 The Journey into Sports Analytics
15:20 The Evolution of Bayesian Statistics in Sports
26:01 Innovations in NFL WAR Modeling
39:23 Causal Modeling in Sports Analytics
46:29 Defining Replacement Levels in Sports
48:26 The Going Deep Framework and Big Data in Football
52:47 Modeling Expectations in Football Data
55:40 Teaching Statistical Concepts in Sports Analytics
01:01:54 The Importance of Model Building in Education
01:04:46 Statistical Thinking in Sports Analytics
01:10:55 Innovative Research in Player Movement
01:15:47 Exploring Data Needs in American Football
01:18:43 Building a Sports Analytics Portfolio
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Joshua Meehl, Javier Sabio, Kristian Higgins, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan, Francesco Madrisotti, Ivy Huang, Gary Clarke, Robert Flannery, Rasmus Hindström, Stefan, Corey Abshire, Mike Loncaric, David McCormick, Ronald Legere, Sergio Dolia, Michael Cao, Yiğit Aşık, Suyog Chandramouli and Adam Tilmar Jakobsen.
Transcript
This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.
My guest today is the great Ron Yurko, assistant teaching professor and director of sports
analytics at Carnegie Mellon University.
2
:In our conversation, Ron shares how Bayesian statistics and especially hierarchical models
can unlock new ways of thinking about player performance.
3
:We talk about his work on the NFL WAR model, how tracking data and expected points models
are
4
:reshaping football analytics and why full uncertainty propagation is essential when
evaluating player contributions.
5
:Ron also emphasizes the importance of teaching students not just how to use models, but
how to write them, developing the statistical thinking needed to succeed in sports
6
:analytics.
7
:We also dive into the challenges of estimating player performance in team sports, where
collinearity and role complexity make modeling difficult.
8
:and the exciting frontier of using tracking data to analyze player movement.
9
:This is Learning Bayesian Statistics, episode 140, recorded July 22, 2025.
10
:Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the
projects, and the people who make it possible.
11
:I'm your host, Alex Andorra.
12
:You can follow me on Twitter at alex_andorra,
13
:like the country.
14
:For any info about the show, learnbayesstats.com is Laplace to be.
15
:Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on
Patreon, everything is in there.
16
:That's learnbayesstats.com.
17
:If you're interested in one-on-one mentorship, online courses, or statistical consulting,
feel free to reach out and book a call at topmate.io/alex_andorra.
18
:See you around, folks.
19
:And best Bayesian wishes to you all.
20
:And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can
help bring them to life.
21
:Check us out at pymc-labs.com.
22
:Ron Yurko, welcome to Learning Bayesian Statistics.
23
:Thanks for having me.
24
:I listen to this a lot, it's an honor to be on officially.
25
:Yeah, it's also an honor to have you on.
26
:I've been meaning to invite you for a few months now, and not only because you were kind
enough to help me with the green card, but just because you do so much fun stuff.
27
:I was like, OK,
28
:this guy needs to come on the show.
29
:And also then I listened to you on the Wharton Moneyball podcast, which I listen to every
week.
30
:And when I heard you were there, I was like, oh, cool.
31
:So that means he must have some new stuff to talk about.
32
:That's a good moment to... I have too many things going on.
33
:I'm one of those people where it's just like, it's just too much.
34
:Yeah, summertime is when I get to dabble in everything, too.
35
:Yeah, I saw that.
36
:I loved it.
37
:For me as a podcaster, it's great because then I have way too many questions for you.
38
:Let's actually get into your origin story, because today you do a lot of courses, you do a
lot of stats modeling, but you also do a lot of sports modeling in particular.
39
:And it looks like you love that.
40
:But I'm curious what first drew you to stats and how did that...
41
:interest evolve into marrying it to sports data rather than, say, I don't know, genetics
or finance or marketing.
42
:Yeah, well, the funny thing is, my interest in statistics started from sports.
43
:All right, I'm from the Pittsburgh, Western PA area.
44
:So I grew up a huge Steelers fan, Pittsburgh Steelers fan.
45
:My dad went to Penn State.
46
:I
47
:followed Penn State College football a lot.
48
:I played baseball my whole life.
49
:I was a sports fan.
50
:And honestly, even just entering high school, I didn't really know what I wanted to do
with my life.
51
:I thought I wanted to work in sports in some way.
52
:And I remember reading about becoming an orthopedic surgeon.
53
:And if you'd be an individual working on sports injuries, I thought that could be a really
exciting career.
54
:And then I read Moneyball.
55
:And then that completely changed my life trajectory, like many people in the 2000s and
whatnot, reading Moneyball.
56
:The movie came out then, I remember when I was a senior in high school.
57
:And I was already addicted at that point to sabermetrics, baseball stats.
58
:And that honestly led me to entering college.
59
:The funny thing was I actually heard about like the Pittsburgh Pirates.
60
:my favorite team growing up, you know, my local team, they had an information systems
department.
61
:So I applied to Carnegie Mellon University for their information systems major.
62
:Got in.
63
:All right, this is the right track.
64
:And then I realized information systems was effectively like becoming a data engineer,
database management stuff.
65
:And I didn't want to do that, but I had taken, you know, an intro statistics course.
66
:I had AP stats
67
:in high school and realized, wait a second, this is actually what I'm interested in
68
:learning.
69
:I'm interested in learning from data.
70
:What can we infer from data?
71
:How can I, you know, what can I measure to understand who are the best baseball players?
72
:How can I project baseball player performance?
73
:You know, that was my initial motivation and path into statistics, as well as things like
fantasy football.
74
:Sports led me to statistics, which I think is common for
75
:a number of people in my generation.
76
:I know some went the opposite way. Luke Bornn, for instance, was one where it was
science first, and then that led him into sports.
77
:I was very much sports first, and it led me down the path of science, eventually to when,
in my PhD, my dissertation was focused on problems in statistical genetics.
78
:I also do a lot of work now with text data, NLP problems, but sports was...
79
:always what I really had a passion for along the way.
80
:I did have the opportunity when I was an undergrad, I landed this internship with the
Pittsburgh Pirates as a data analytics intern.
81
:This was during the 2014 season.
82
:And for me, it was kind of like landing the dream job.
83
:I learned though, this was pre-Statcast era in baseball.
84
:So what I had to do was as an intern, I charted every game that took place.
85
:And along with another intern where we would have this overhead camera.
86
:And this is all public knowledge now about what the Pirates did back in the day.
87
:Because we'd have the overhead camera of the field.
88
:And then I'd have to mark: where's everybody out in the field when the ball is being released
from the pitcher's hand?
89
:Where's everybody out in the field when the ball is crossing home plate?
90
:Where's everybody on the field when the ball is hit into play?
91
:And I'd repeatedly do this every single pitch, every single moment.
92
:And it was to generate the data.
93
:that did not exist, that the Pirates ultimately would use to infer optimal defensive
positioning for their shifting at the time.
94
:And I would do things like send reports and whatnot.
95
:And I realized I didn't have the passion for like the day-to-day of working for a team,
but I had the opportunity to develop like a long-term project at the time with the
96
:Pirates.
97
:And that to me was the most interesting thing.
98
:And that was...
99
:That was really my first exposure to research.
100
:And that led me to the path of eventually wanting to go for a PhD in statistics.
101
:I briefly worked in finance before my PhD.
102
:But while I was doing that, I was really collaborating with Dr.
103
:Sam Ventura, who was a CMU stats PhD and then a professor.
104
:And he ended up consulting with the Pittsburgh Penguins for a couple of years before
105
:going there full time, and he is now VP of hockey strategy and research for the Buffalo Sabres.
106
:He was really a mentor of mine, as well as Andrew Thomas, who
used to be a visiting professor at CMU, but now he's bounced around different
107
:places in sports and has recently been running the analytics R&D team with the
Detroit Tigers this past year.
108
:So the two of them.
109
:really were sort of guiding me along my path as an undergrad and then post graduation.
110
:And then along with another student I graduated with, Max Horowitz, we worked on this
project called nflscrapR, where we were accessing NFL play-by-play data, building models
111
:from that.
112
:And that really segued into my PhD, because that was my first real paper, and I
started just continuing down this path of
113
:becoming a person in sports analytics, going to conferences.
114
:And fortunately, it was great timing with Professor Rebecca Nugent, who's now the
department head here, where we started doing things like launching a sports analytics
115
:conference that started my first year as a PhD student.
116
:I don't think I would recommend other PhD students to organize a conference when they
start.
117
:It's a lot of work.
118
:But once it develops over time and you become faculty doing it,
119
:it becomes a lot easier.
120
:And yeah, that's, that's really sort of my path to where I am now.
121
:I ended up finishing my PhD in stats, advised by Kathryn Roeder and Max G'Sell, and
stayed on then as teaching faculty. The opportunity came up with Carnegie Mellon,
122
:and it had this perfect blend of getting to work on interesting research problems that
I'm passionate about in sports and some other areas, as well as
123
:developing curriculum and leading what is now the Carnegie Mellon Sports Analytics Center
that officially launched this past year.
124
:So I've lucked out.
125
:It's been a pretty exciting time.
126
:And I still do some consulting.
127
:I did also previously work part-time, when I was finishing my PhD, with a company called
Zelus Analytics.
128
:That was co-founded by Luke Bornn, Dan Cervone and Doug Fearing.
129
:often much to their chagrin, they keep asking me, but I decided to go the path of
academia.
130
:And I've just been having a blast.
131
:It's been such a fun job.
132
:Yeah, it sounds like it.
133
:By preparing the episode, I can see you're having a lot of fun because I see you're doing
so many things.
134
:And that's really incredible and impressive, and congrats on that role that you've been able to
135
:craft for yourself from your early passion in sports.
136
:I have to say I'm impressed by the level of self-awareness you were already exhibiting in
high school because you didn't know what you wanted to do but you were already asking
137
:yourself that.
138
:That was not my case.
139
:I definitely have lucked out every step of the way.
140
:Like even my job is a very weird job.
141
:Like I don't think it's, I don't think this style is common across universities.
142
:Like even this teaching track role that we have, it's even unique within CMU and sort of
within our Dietrich College, in how we define teaching tracks.
143
:Uh, you know, so it's just a lot of luck along the way, I'd say more than.
144
:Yeah.
145
:I mean, for sure.
146
:Like, you always need randomness to go your way at some point, right?
147
:But.
148
:I'd say it's like that for pretty much anything.
149
:You know, even in, like, especially in sports, right?
150
:I mean, it depends on the game, but you know, I'm a big soccer fan, for instance.
151
:And soccer definitely has a lot of that. You know, often one big
cup is decided by just one game, and it's 90 minutes and can really go one way or the
152
:other.
153
:it's really funny when, you know, you're just like,
154
:randomness is not on your team's side these days or...
155
:Yeah, randomness is always there along the way. Well, that's why soccer is great, because
there's no playoffs.
156
:It's just you play this nice long season and then whoever's the winner at the end of the
full...
157
:like the regular season, the Premier League, that's it.
158
:You don't have to worry about the randomness of playoffs to decide the... Yeah, yeah.
159
:We have the cups, though.
160
:Like, the cups.
161
:So yeah, the Champions League is getting less like...
162
:is starting to look more like a championship.
163
:And I think that's like, you know, kind of their objective.
164
:But yeah, it used to be much more of a cup, and that was a bit... I
mean, as a statistician, I don't really like that because often, you know, it
165
:artificially injects a bit of randomness into the mix.
166
:So.
167
:because it's a gap.
168
:now it's like that.
169
:Personally, though, as a spectator, I also like for there to be randomness
because then, you know, it's also an entertainment.
170
:It's also a show.
171
:So, oh, yeah, I mean, I find it's hard.
172
:Like, it's hard to find the balance between a deterministic show and a random show.
173
:But you definitely want both in there, I'd say.
174
:Yeah, the balance of it, right?
175
:The NBA playoffs this year was one example, where the Oklahoma City Thunder were, like, by far
the best team.
176
:The NBA usually was that sport, and there's this great paper by Mike Lopez, Ben Baumer, and Greg
Matthews from a while ago, How Often Does the Best Team Win?,
177
:Where they were looking at that, like the randomness between different sports and
basketball was the least random.
178
:But
179
:Yeah, they went up against the Indiana Pacers who were an underdog story throughout the
postseason surprising people and whatnot.
180
:So yeah, it's always fun to have underdogs throughout.
181
:No, for sure.
182
:And so, okay, that's for sports in particular.
183
:What about Bayesian stats?
184
:Right?
185
:Because you said you listen to the show often.
186
:So first, thank you.
187
:And yeah, so I'm curious, you know,
188
:Do you remember when you were first introduced to Bayesian stats, and how often do you
get to use them in your modeling work and teaching?
189
:Yeah, this is a fun question because I was thinking about this from the viewpoint, like in
hindsight, when did I actually learn about Bayesian stats without knowing I was learning
190
:about Bayesian stats?
191
:I read Tom Tango's The Book
192
:on baseball sabermetrics and whatnot.
193
:And nothing's actually like formal statistics in there.
194
:You know, he's someone that basically self-derived a number of different quantities and, you know,
he had this projection system for trying to project future performance, which in
195
:hindsight, I realize now was my first exposure to something like empirical Bayes, taking
an observed estimate of a person and shrinking it towards league-average information.
196
:Right.
197
:I didn't know at the time that's what was going on.
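As an editorial aside, the shrinkage idea described here can be sketched in a few lines. This is a minimal illustration, not Tom Tango's actual projection system: the function name `shrink_rate`, the league rate of 0.300, and the `prior_strength` of 200 pseudo-attempts are all assumptions chosen for the example. A real empirical Bayes analysis would estimate the prior strength from the league-wide data rather than fix it by hand.

```python
def shrink_rate(successes, attempts, league_rate, prior_strength):
    """Blend an observed rate with the league average.

    prior_strength acts like a number of pseudo-attempts observed at
    league_rate (hypothetical value here, normally estimated from data).
    """
    return (successes + prior_strength * league_rate) / (attempts + prior_strength)


# Hypothetical hitters: (hits, at-bats)
players = {"small_sample": (9, 20), "large_sample": (105, 300)}
league_rate = 0.300      # assumed league-average rate
prior_strength = 200     # assumed pseudo-attempts

for name, (hits, at_bats) in players.items():
    raw = hits / at_bats
    shrunk = shrink_rate(hits, at_bats, league_rate, prior_strength)
    print(f"{name}: raw={raw:.3f}, shrunk={shrunk:.3f}")
```

The player with only 20 at-bats is pulled most of the way back to the league average, while the player with 300 at-bats keeps more of the observed rate.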
198
:And then more formally,
199
:At CMU as an undergrad, I was able to take like a special topics course on astrostats and
that astrostats class taught some basics of Bayesian methods.
200
:And I really didn't understand what I was doing as an undergrad then.
201
:But that was probably my first exposure beyond just, in a classroom,
202
:here's Bayes' theorem. That is, you learn Bayes' theorem.
203
:You don't actually learn about Bayesian inference really.
204
:All right.
205
:That's just learning Bayes' rule in the classic, you know,
206
:smoking-causes-cancer, or cancer-given-smoking, problem, right?
207
:Like, that's just the viewpoint of how many people are first exposed to
Bayes in the beginning.
208
:But then as I talked a bit earlier about like working on this project with NFL play by
play data and figuring out how to model that data.
209
:And prior to the start of my PhD, that's when I got hooked on reading
210
:Andrew Gelman's blog. I started reading that actively, and then I
211
:read through that classic book.
212
:Yeah, I have it in my office: Data Analysis Using Regression and Multilevel/Hierarchical
Models by Gelman and Hill.
213
:So I read that prior to starting my PhD, and that sort of completely changed a
lot of my mindset about modeling and the importance of, you know, pooling and
214
:then how they incorporate, you
215
:And even when I was first getting exposed to that, I wasn't even thinking about it in
a fully Bayesian sense.
216
:I mean, I was technically learning about multilevel models in the classic frequentist way
with the lmer variance estimation.
217
:It's only as I developed in my PhD that I started to really make that transition to, you
know, I can just do things fully Bayesian.
218
:I can account for things in a very rich way
219
:and just take advantage of having posterior samples from the get-go rather than relying on
the standard errors from Fisher's information, right?
220
:For just simple confidence intervals.
221
:It was sort of a path where I was... and I think others have said this, like, multilevel
models or mixed effects models, right?
222
:When people first learn about random effects, that is like the gateway drug into Bayesian
statistics for many folks.
223
:Because you just start to learn about pooling and this use of this hierarchical model and
the distributional assumptions, where you could say that, hey, if I'm modeling NFL data, I can
224
:have these QBs come from their own distribution.
225
:I'm going to pool QBs together, have this variance that's being estimated.
226
:And that's sharing of information across the quarterbacks, and it implies this shrinkage,
and you could build from that to
227
:account for more informative prior knowledge, right, to have better approaches for
estimating like QB performance in some way.
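To make the pooling story concrete, here is a rough sketch of partial pooling for quarterbacks. It is an empirical Bayes approximation of a normal-normal hierarchical model, not the full Bayesian model discussed in the conversation: the QB names, the per-play averages, and the assumed per-play variance of 1.5 are all invented for illustration, and the between-QB variance is estimated with a simple method-of-moments step.

```python
from statistics import mean, variance


def partial_pooling(qb_stats, play_variance):
    """Shrink each QB's observed average toward the group mean.

    qb_stats maps a name to (observed average per play, number of plays).
    play_variance is an assumed, known variance of a single play's outcome.
    """
    names = list(qb_stats)
    raw = [qb_stats[n][0] for n in names]
    sampling_var = [play_variance / qb_stats[n][1] for n in names]
    group_mean = mean(raw)
    # Method-of-moments estimate of the between-QB variance, floored at zero.
    between_var = max(0.0, variance(raw) - mean(sampling_var))
    results = {}
    for name, observed, noise in zip(names, raw, sampling_var):
        weight = between_var / (between_var + noise)  # 0 = full pooling, 1 = no pooling
        results[name] = (weight, group_mean + weight * (observed - group_mean))
    return results


# Hypothetical data: (average expected points added per play, plays)
qbs = {"QB1": (0.25, 500), "QB2": (-0.10, 450), "QB3": (0.40, 40), "QB4": (0.05, 300)}
pooled = partial_pooling(qbs, play_variance=1.5)
for name, (weight, estimate) in pooled.items():
    print(f"{name}: weight on own data={weight:.2f}, pooled estimate={estimate:.3f}")
```

The quarterback with only 40 plays gets the smallest weight on his own data, so his estimate moves furthest toward the group mean. A fully Bayesian version would put priors on the group mean and variances and return posterior samples instead of point estimates.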
228
:That led me to the path of more formal Bayesian modeling that I've actually tried to
really instill in the classroom more, because CMU, Carnegie Mellon, used to really be a
229
:Bayesian school and then it's kind of gone away from that.
230
:uh
231
:I think part of it is not necessarily leaning towards like one way versus another.
232
:I think there's just a lot of what people are doing and the research, the theoretical
research advances they're trying to do.
233
:And there's a big belief in trying to minimize assumptions in whatever way possible.
234
:And, you know, things like in the world of conformal and friends and, you know,
distribution free assumptions that they make.
235
:Well, that's nice and all.
236
:I think one of the lessons I stress with students too is, you know, it can be really
beneficial to be extremely explicit about every single assumption you're making.
237
:Right.
238
:And when you build a Bayesian model, that is inherently what you're doing.
239
:Right.
240
:We're explicitly laying out the model in every step, every level of the data that we're
dealing with.
241
:And then from that, you know, we can check that model.
242
:without even estimating quantities, without even getting a posterior, you
can do prior predictive checks to get a sense of whether this is something that's reasonable,
243
:Going through this process for them to take advantage of being very explicit at coming up
with a data generating process, the data generating story from the modeling framework.
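A prior predictive check of the kind mentioned here can be sketched without any data at all. The example below is an illustration under stated assumptions: a hypothetical model where a quarterback's completion probability has a normal prior on the logit scale, and each simulated game has 30 pass attempts. The function names and prior values are made up for the example; in practice one would typically use a probabilistic programming library rather than hand-rolled simulation.

```python
import math
import random


def prior_predictive_completion_rates(n_draws, n_attempts, prior_mean, prior_sd, seed=0):
    """Simulate completion rates implied by the prior alone (no observed data)."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_draws):
        logit_p = rng.gauss(prior_mean, prior_sd)      # draw a parameter from the prior
        p = 1.0 / (1.0 + math.exp(-logit_p))            # map to a probability
        completions = sum(rng.random() < p for _ in range(n_attempts))
        rates.append(completions / n_attempts)          # simulated outcome
    return rates


def share_extreme(rates, low=0.05, high=0.95):
    """Fraction of simulated games with an implausibly low or high completion rate."""
    return sum(r < low or r > high for r in rates) / len(rates)


wide = prior_predictive_completion_rates(2000, 30, prior_mean=0.5, prior_sd=5.0)
tight = prior_predictive_completion_rates(2000, 30, prior_mean=0.5, prior_sd=0.5)
print("share of extreme games, wide prior: ", share_extreme(wide))
print("share of extreme games, tight prior:", share_extreme(tight))
```

A seemingly vague prior with standard deviation 5 on the logit scale implies that a large share of games end with almost no or almost all passes completed, which is not a reasonable data generating story. The tighter prior produces simulated games that look far more like football.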
244
:And I think that aspect of Bayesian modeling is,
245
:probably the biggest advantage for me, right?
246
:And in thinking about it, it's almost like building Lego blocks together, in a way, of
like, how do I assemble these models in the way I want to do it?
247
:And when it comes together, it's beautiful, right?
248
:It's like a self-taught path in a way, which I don't think is unusual.
249
:I think a lot of people have to go that path of getting a lot of self-education on Bayesian
models.
250
:And one of the things I've been doing is trying to decrease it from just being totally
self-taught and getting it into my courses that I teach at CMU.
251
:Yeah, yeah, that makes sense.
252
:I mean, yeah, most of the people I talk to, and my case is also the same.
253
:It's just like you have to teach yourself Bayesian stats.
254
:I mean, teach yourself, not really, because, as you were saying, you rely on a
lot of...
255
:textbooks and courses, but it's more like you don't have a formal, you know, Bayesian
statistics degree because most of the time they don't exist.
256
:So it's mostly that.
257
:you have to go there yourself.
258
:it's work, it's introduced in the curriculum.
259
:I was just at the United States Conference on Teaching Statistics this past week.
260
:And one of the co-authors of Bayes Rules!, which is one of the best
261
:introductions for undergrads on applied Bayesian modeling.
262
:I'm going to butcher her name.
263
:Her first name is Mine.
264
:I'm not going to struggle with saying her last name.
265
:She was on the podcast.
266
:Yeah, yeah, yeah.
267
:She gave this great short talk at the beginning of the conference about how people
are horrified and come up to her like, "You teach Bayesian statistics in an intro class?" And her
268
:response is, "You teach sampling distributions in an intro class."
269
:Then there's the discussion of how no one understands what the central limit theorem is in these intro
statistics courses, right?
270
:But that's what we focus on because we're just focused on teaching the classic frequentist
statistics.
271
:But what I've found, and what she was talking about too, is students grasp Bayesian
statistics much more easily; it's easier for them to just understand and comprehend.
272
:You know, think of the difference between confidence intervals and credible
intervals, right?
273
:They always screw up what a
274
:confidence interval is.
275
:They understand what a credible interval is.
276
:They think a confidence interval is a credible interval, right?
277
:They don't understand what the frequentist view is really implying.
278
:They think it's implying what they get from Bayesian stats.
279
:it motivates just, well, you should be teaching this earlier.
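For readers who want to see what the intuitive reading looks like in practice, here is a small sketch of a credible interval. It assumes a hypothetical quarterback who completes 18 of 30 passes and a uniform Beta(1, 1) prior on the completion probability, so the posterior is Beta(19, 13). The function name and the numbers are illustrative only.

```python
import random


def beta_credible_interval(successes, failures, mass=0.95, n_draws=20000, seed=1):
    """Equal-tailed credible interval from posterior draws under a Beta(1, 1) prior."""
    rng = random.Random(seed)
    # With a uniform prior, the posterior is Beta(1 + successes, 1 + failures).
    draws = sorted(rng.betavariate(1 + successes, 1 + failures) for _ in range(n_draws))
    tail = (1.0 - mass) / 2.0
    lower = draws[int(tail * n_draws)]
    upper = draws[int((1.0 - tail) * n_draws) - 1]
    return lower, upper


low, high = beta_credible_interval(successes=18, failures=12)
print(f"95% credible interval for the completion probability: ({low:.2f}, {high:.2f})")
```

Given the model and the prior, the statement "there is a 95% probability that the completion probability lies in this interval" is exactly what the credible interval means. That is the interpretation students tend to give, incorrectly, to a frequentist confidence interval.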
280
:Couldn't say better.
281
:And yeah, so if people want more about that, Mine was on the show.
282
:Mine Dogucu, I think that's her name.
283
:Episode 42.
284
:And yes, so we talk about her book and her teaching philosophy.
285
:Very inspiring, especially if you're in a role where you have to teach a lot
286
:or mentor. Definitely give it a listen.
287
:But yeah, I mean, I resonate with, of course, everything you just said.
288
:And in my experience also, people come
289
:to Bayesian stats in more of a personal way, where a lot of the time it's self-chosen,
self-starter people, you know, just by sampling bias, because they want to solve a problem they
290
:have.
291
:And, well, they have to go grab the most appropriate tools that were not given to them.
292
:But by definition here, if you think causally, you are talking about people who
293
:have the grit and resources to do that.
294
:Most of the time these are the kind of people I end up talking to, and very, very
often the motivation is adding some kind of hierarchical effect somewhere in a model
295
:because it is just so much easier to do it in the Bayesian framework.
296
:And that's also so, so ubiquitous
297
:in any modeling you're doing.
298
:So very often that's how people end up, you know, going to the Bayesian
framework, a very practical path most of the time.
299
:And since you were talking about hierarchical models, and also just,
you know, iterative learning, and you're just a lifelong learner, I can tell, Ron,
300
:I'm curious: in 2019, you released a really cool paper
301
:called nflWAR that I read a lot, by the way, last year, because I wanted to re-understand
WAR.
302
:And that's when I started wanting to really do sports analytics, you know, as a job.
303
:And so while I was looking for a job, what I did was like, okay, I'm gonna try and
implement WAR, but for soccer, which of course, you know, you have to adapt a lot of
304
:things, but that's a project I did last year.
305
:That was super fun.
306
:I of course ended up using hierarchical GPs and fun stuff like that.
307
:And I put that in the show notes because in the end, with Max Göbel, who was also
on the show, we ended up putting all of that in a GitHub repo.
308
:So if you guys are interested, there is the whole model in there, a big PyMC model with
GPs and an
309
:ordered logit likelihood.
310
:That's on our reading list for this fall.
311
:I haven't read the paper yet because I did hear you talk about it before.
312
:Our research lab, we got it on the reading list for the fall.
313
:I might reach out to you later about it too.
314
:Cool.
315
:yeah.
316
:Whenever you want.
317
:That would be super fun for sure.
318
:Yeah, so I will put the link in the show notes
319
:to the GitHub repo, and then in the repo you have the code and you have the link to the
arXiv paper.
320
:We didn't bother trying to submit that somewhere because we're not academics.
321
:If someone wants to do that for us, for sure.
322
:But this is not a fun process, at least for me, so why would I do that?
323
:But it's great to hear that you want to look into that, Ron.
324
:I'm happy to help with anything on that.
325
:But so anyways, yes, I did that.
326
:It was mostly a self-teaching project to re-understand WAR because then I was going to work
in a baseball team, but didn't know a lot about baseball at the time.
327
:So I was like, let's try and translate WAR to soccer.
328
:And so I read your paper a lot and I really loved it.
329
:And well done also for the writing style.
330
:I liked it because it was much more engaging, you know, than the classic academic
paper.
331
:Most academic papers are just boring because they are not optimized for reading.
332
:This one was much, much more fun.
333
:yeah, so that paper is great and it's in the show notes, but it's from 2019 and I'm
guessing you learned a lot since then.
334
:It was really from 2017 actually.
335
:Wow, okay.
336
:And that's the academic writing process.
337
:Case in point.
338
:uh
339
:So in that paper you have a hierarchical model and you compute wins above replacement but
in American football.
340
:And so I'm curious: if you rewrote (that's very hard to say with a French accent)
that model today, with what you know now and if you had richer tracking data,
341
:What would you change first?
342
:Yeah, it's a great question.
343
:This was the project that I was mentioning earlier.
344
:This was my first real sports analytics research project.
345
:This was my exposure to building hierarchical models, actually implementing them.
346
:The first step that I always think about, and we put this at the end of the paper, and
others have gone on to do this.
347
:Paul Sabin, who I believe you had on the show before, wrote a version of this. The idea of
the first step is actually doing the all-22.
348
:So players on both sides of the field, the all-22, an adjusted plus-minus style approach
for modeling player-level effects at the play level.
349
:And when I say adjusted plus-minus, I really mean regularized adjusted plus-minus,
where
350
:you literally are fitting a regression model where you have indicator variables effectively
denoting who was on the field.
351
:Right.
352
:And you could have offense and defense variables. You know, football is great because they're only
playing one side over the course of the season.
353
:But then you could have different distributions for different positions.
354
:So
355
:linebackers get pooled together, cornerbacks get pooled together, QBs get pooled
together.
356
:Right.
357
:And you could have this nice structure in place.
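The regularized adjusted plus-minus setup described above can be sketched as a ridge regression on who-was-on-the-field indicators. This is a toy illustration, not the model from the nflWAR paper or from Paul Sabin's work: the four players, six plays, outcome values, and penalty values are invented, there is no intercept or defense, and a single shared penalty stands in for the position-specific distributions mentioned in the conversation. A ridge penalty corresponds to a zero-centered normal prior on each player coefficient, with the estimate being the posterior mode.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_fit(x, y, penalty):
    """Minimise squared error plus penalty * sum(beta ** 2)."""
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) + (penalty if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical players and plays: 1 means the player was on the field.
players = ["QB_A", "QB_B", "WR_A", "WR_B"]
on_field = [[1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0],
            [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
play_value = [0.8, 0.3, 0.1, -0.4, 0.6, -0.2]  # e.g. expected points added

weak = ridge_fit(on_field, play_value, penalty=0.1)
strong = ridge_fit(on_field, play_value, penalty=10.0)
for name, b_weak, b_strong in zip(players, weak, strong):
    print(f"{name}: penalty 0.1 -> {b_weak:+.3f}, penalty 10 -> {b_strong:+.3f}")
```

A larger penalty shrinks every player coefficient toward zero, which plays the role of "average player" here. Giving each position group its own penalty would be the ridge analogue of pooling linebackers, cornerbacks, and QBs within their own distributions.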
358
:And I think that one of the problems with this, though, of just dealing with the play-level
data, is this massive amount of collinearity that we have between players that are involved
359
:in football.
360
:Unless we observe injuries, you know, you're never going to tease apart the quarterback
from the center.
361
:The person snapping the football to the QB, the center, is always on the field.
362
:And so if we just do this naive approach, we're going to struggle with that.
363
:And that could motivate, well, okay, how do we handle something like some prior knowledge
about the role, the importance of the player, or just some prior beliefs you may have
364
:about how much the QB matters.
365
:And some folks have thought about like, you can use like salary data as an informative
prior to tease apart how much one contributes versus another at the play level when trying
366
:to estimate these coefficients for players.
367
:That's one way.
368
:I don't know if I think it's an optimal way.
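The quarterback-and-center problem can be illustrated with the smallest possible case: two players who appear together on every play. In this sketch, which is an assumption-laden toy rather than anyone's published method, each player gets his own ridge penalty (equivalently, his own zero-centered normal prior, where a smaller penalty means a wider prior). Because the two indicator columns are identical, the data only identify the sum of the two effects, and the split between them is decided entirely by the priors. The play values and penalties are invented.

```python
def fit_always_together(play_values, penalty_qb, penalty_center):
    """Generalised ridge fit for two players who are on the field for every play.

    Minimises sum((y - b_qb - b_center) ** 2)
              + penalty_qb * b_qb ** 2 + penalty_center * b_center ** 2.
    """
    n = len(play_values)
    total = sum(play_values)
    # Normal equations: (n + penalty_qb) * b_qb + n * b_center = total
    #                   n * b_qb + (n + penalty_center) * b_center = total
    det = (n + penalty_qb) * (n + penalty_center) - n * n
    beta_qb = total * penalty_center / det
    beta_center = total * penalty_qb / det
    return beta_qb, beta_center


play_values = [0.5, 0.2, -0.1, 0.4, 0.3, 0.1]  # hypothetical play outcomes

same_prior = fit_always_together(play_values, penalty_qb=3.0, penalty_center=3.0)
qb_matters_more = fit_always_together(play_values, penalty_qb=1.0, penalty_center=9.0)
print("equal priors:        QB=%.3f, center=%.3f" % same_prior)
print("wider prior on QB:   QB=%.3f, center=%.3f" % qb_matters_more)
```

With equal priors the credit is split evenly, which says nothing about football. With a wider prior on the quarterback, the quarterback receives nine times the center's share. This is why the choice of informative prior, whether from salary data or from something else, carries so much weight when the play-level data cannot separate the players.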
369
:The thing we've really focused on in these last so many years is thinking about, okay,
there's all this data that NFL teams now have access to.
370
:And my PhD student, Quang Nguyen, has a whole thesis on this; he just completed his
thesis proposal on statistical modeling problems for player movement in sports with tracking
371
:data.
372
:Uh, and so he's, he's really tackling this.
373
:from the player movement perspective.
374
:But one could think about how can we leverage what we know from tracking data to maybe
incorporate into this estimation of these player coefficients via how much they mattered
375
:in the play.
376
:We've been doing this work of thinking about what the hypothetical could look like for
a player.
377
:If we model
378
:hypothetical movement, or if we perturb the player in some way, what would that change?
379
:What would we expect to change on the outcome?
380
:And can we compute almost like the leverage or the amount of weight saying how much the
player mattered in a play?
381
:And then instead of just having zeros and ones, we have something that's between zero and
one.
382
:Like, if I, if the QB behaved very differently, the QB, you know,
383
:Well, QB is always the most important player.
384
:So I'll switch to say, if this wide receiver ran a slightly different route or did
something differently, you know, what effect could that have had on the other wide
385
:receiver?
386
:You know, how much influence did that wide receiver have?
387
:And that could change whether I just have them as zero or one. Or this person was
closer to one because they really were involved, or they exceed one because they
388
:had so much involvement, or this person,
389
:They were on the other side of the field, nowhere near the action.
390
:If I could move them all around, it wouldn't change the outcome in any way.
391
:They basically get a zero, right?
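A minimal sketch of the perturbation idea, under loud assumptions: `predicted_outcome` is a hypothetical stand-in for whatever outcome model is fitted on tracking data (here it simply decays with distance to the ball), and `involvement_weights` is an illustrative name, not a method from the work being discussed. The weight for each player is the average absolute change in the predicted outcome when only that player is randomly nudged.

```python
import numpy as np

def predicted_outcome(positions, ball):
    """Hypothetical outcome model: players near the ball matter, far ones barely do."""
    dist = np.linalg.norm(positions - ball, axis=1)
    return float(np.sum(np.exp(-dist / 3.0)))

def involvement_weights(positions, ball, n_draws=200, scale=1.0, seed=0):
    """Average absolute change in the prediction when one player is perturbed."""
    rng = np.random.default_rng(seed)
    base = predicted_outcome(positions, ball)
    weights = np.zeros(len(positions))
    for j in range(len(positions)):
        changes = []
        for _ in range(n_draws):
            moved = positions.copy()
            moved[j] += rng.normal(0.0, scale, size=2)  # nudge only player j
            changes.append(abs(predicted_outcome(moved, ball) - base))
        weights[j] = np.mean(changes)
    return weights

ball = np.array([50.0, 26.0])
positions = np.array([
    [50.5, 26.0],  # right next to the ball
    [55.0, 30.0],  # nearby
    [20.0, 5.0],   # far side of the field
])
w = involvement_weights(positions, ball)
print(np.round(w, 4))
```

These raw weights are unnormalized. How to scale them (and whether they may exceed one) is a modeling choice the sketch leaves open; they would replace the zero/one entries of the play-level design matrix.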
392
:That could tease apart.
393
:That's this idea that we've been really trying to think about and developing of how can we
marry these two concepts effectively of some classical hierarchical modeling of discrete
394
:level data, which has the goal of inference, ultimately.
395
:I'm trying to estimate what was the impact of a player.
396
:Right.
397
:And how can I use the tracking data to tease apart and break up this problem of
collinearity and just the fact that the play-level unit is too coarse?
398
:You know, can I use the tracking data to help me tease apart?
399
:What are the moments that maybe even matter and do the estimation at those moments even?
400
:Yeah.
401
:So it's.
402
:I've been thinking about how to do that paper better since I wrote that paper
in:
403
:Like it's just, there's so many different ideas.
404
:and it's been nice to see all these people follow up on it in a number of different ways.
405
:Like Paul Sabin wrote that paper with the ESPN data that he had, where he basically just
implemented what we talked about: I'm going to do RAPM for all 22, and
405
:I'll do it for NFL and college.
407
:He has a great discussion of like the problems that that comes with of trying to do that.
408
:Uh, but then, you know, another idea one could take advantage of is there's even just
charted information at the play level.
409
:And maybe one could go about that process of including it in the prior for the
player distributions. Like, if I have charted information, did they have a successful
410
:block or not?
411
:Or if they do some event, you know, I could condition on that
412
:in my prior for the player-level effects, like whatever matters, almost like box
score stats at the play level that could be observed.
413
:And that could tease apart my players in some way.
414
:Yeah, that's a long answer of the different types of things I've thought about and like
how to correct or improve upon what was really at the time too, I'd say it was limited to
415
:what we just had available.
416
:We didn't know.
417
:with that play-by-play data, we only knew who was directly involved in the play.
418
:We did not have access to all 22 players.
419
:And that's still even limited.
420
:There was a public batch of it getting released for a period of time.
421
:So from 2016 to about halfway through 2022, I believe, through nflfastR and the nflverse,
you can download all-22 participation data, but then they cut it off.
422
:And so now it's, so there's just like this nice little window of time that it's there.
423
:But then there's also the tracking data samples that come out from the NFL Big Data Bowl
each year.
424
:And that's what like my group has really been leveraging and working with for the past so
many years.
425
:And from there, you know who's on the field, but you know far more
than that.
426
:You know where everybody is at every 10th of a second.
427
:Right.
428
:And can you infer
429
:You know, the contributions at a much more granular level there.
430
:And it's, it's a challenging problem.
431
:It's, still open work of trying to address this.
432
:I don't know if we'll ever, I don't know when we'll really see the first good
version of WAR for the NFL.
433
:I know Pro Football Focus had a version, and I think it seemed pretty in line
with what we had in a way, probably
434
:biased a little too much towards the players that were directly involved.
435
:Um, but at some point we're going to get to this stage and it might be behind the scenes.
436
:There might be some teams or companies that think they've figured it out.
437
:Um, but that's, that's been, uh, you know, something I've thought about for three years
now.
438
:How can I make that better and better across all 22 players, not just the QB.
439
:Yeah.
440
:Yeah.
441
:Yeah.
442
:I mean, I can definitely resonate with that.
443
:It's the same for me.
444
:Like each time I deploy a model or, you know, I'm done with a paper or something like that,
445
:it's as if I'm underestimating it, in the sense that I already
know all these flaws, you know?
446
:Uh, and so, and so I put it out there and then, um, maybe people like it and they're like,
yeah, that's great.
447
:And then
448
:You know, I'm like, ah, yeah, but you know, I think it'd be better with these or with
that.
449
:The work is just like that as a modeler, where you also have to get good at,
you know,
450
:always seeing what's next, but also being able to celebrate the
momentary wins.
451
:Definitely, definitely a challenge.
452
:Uh, but yeah, I mean, I completely understand what you're saying where you were like,
okay.
453
:this is published now, how can we make it better?
454
:And that also resonates a lot with the experience I had trying to do that in
soccer.
455
:And that's also why I used your paper a lot to come up with ours, because
American football and soccer are much closer than baseball and soccer are: they are
456
:much more continuous and depend a lot more on the other players.
457
:And we ended up having to do the same thing, at least for the first version, you know,
where we were like, okay, let's just focus on one position. For us,
458
:it was the striker.
459
:And then you incorporate the rest of the positions, but if I had to do
it for a professional club, I would definitely need to come
460
:up with different WARs for each position,
461
:trying to find a way to combine all that for the whole team.
462
:But definitely you need a different way to estimate what a player brings to the team in
soccer and football because they are very specialized and continuous and dependent on the
463
:rest of the team.
464
:Yeah, and in soccer, one of the biggest challenges too is that the actual events you have are
far fewer
465
:in terms of what we think of as traditional events. You know, goals are
extremely rare.
466
:But then, you know, even shot attempts are not that many.
467
:At least in the NFL, every play has a start and an end point.
468
:And so we just accumulate a lot of events.
469
:The scoring is still relatively low, but there's still constant, you know, plays and
outcomes that are observed in different ways.
470
:It's the complexity, though, of how the players are interacting with each other
471
:that just makes it so much more difficult than things like baseball, right?
472
:Baseball has got this wonderful structure: there's a batter, there's a pitcher, then there's a ball
in play, and then the defense will do things.
473
:But yeah, there's base running, and that's complex, right?
474
:Handling base running is extremely difficult.
475
:There's no denying that, I think.
476
:But the basics of batter, pitcher matchup is so much simpler.
477
:than everything that can happen in a football play.
478
:And this conversation just reminded me, actually: when I was a PhD student, after I
got this paper out, funny enough, there was a Stan workshop led by Michael
479
:Betancourt that a number of us PhD students got to go to for free.
480
:And uh I had talked to him about this problem.
481
:And he said, you should just do it from the ground up from personnel packages.
482
:The thought has always stayed with me of like, you should model from the personnel
packages.
483
:Then, you know, if a player changes teams, you could generate: this is their effect,
their contribution, when they're in a personnel package
484
:with two tight ends versus one tight end, and each team has a different mixture of how
they use these different personnel packages.
485
:So for people that don't know what that means, that just means
486
:different counts of different types of positions on the field that they can have,
whether that's on offense or defense, and teams vary in their style.
487
:And so one could imagine that I have a wide receiver's effect for each of these
different potential personnel packages.
488
:If I see how that player goes from one team to the next, I can see hypothetically, what
would I estimate to be their contribution during the trade?
489
:Like, oh, well, if you're running this style of offense, their value would be
490
:estimated to be blank, but if you're running this style of offense, the value would
actually be lower or higher depending on that shift and that personnel package
491
:distribution.
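The arithmetic behind this idea can be sketched as a usage-weighted average. Everything numeric here is hypothetical: the package-specific effects, the two teams, and their snap shares are made up for illustration, and in practice the effects would come from a fitted model with uncertainty attached.

```python
import numpy as np

# Offensive personnel codes: first digit is running backs, second is tight ends.
packages = ["11", "12", "21"]

# Hypothetical package-specific effects for one wide receiver (per-play units).
receiver_effect = np.array([0.08, 0.03, 0.01])

# Hypothetical share of snaps each team runs from each package (each sums to 1).
team_usage = {
    "team_a": np.array([0.75, 0.15, 0.10]),
    "team_b": np.array([0.40, 0.40, 0.20]),
}

# Projected per-play value on each team: the usage-weighted average of the effects.
projected = {team: float(usage @ receiver_effect) for team, usage in team_usage.items()}
for team, value in projected.items():
    print(team, round(value, 4))
```

The same receiver projects differently on the two teams purely because of the shift in the personnel package distribution, which is the point being made about trades.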
492
:And that's another thing where it's, that's a layer of how the players can be arranged
that could matter in this estimation.
493
:I've never done it.
494
:That information is public though, in those samples I talked about, and it could be one
way to, to improve upon that problem.
496
:At some point, one of these days, I'm going to get back to it, or I'm going to make a student do it.
497
:Yeah.
498
:Yeah.
499
:But that makes sense.
500
:I mean, if you think from a very causal perspective, that's definitely also the way I
would do it in soccer, because you try to find the most elemental building blocks.
501
:Then once you have that, for sure, you can add the player to a different team and see how
502
:the characteristics of that team interact with the player's characteristics.
503
:But your idea is mostly to try and find the root nodes of the causal graph, and
then model the rest with that, and WAR is just an emergent phenomenon of the players'
504
:characteristics.
505
:But this is not
506
:the lever you're going to pull to actually make a causal experiment on the player's
abilities.
507
:It is determined by other characteristics that come before it in the
DAG.
508
:And if you can model that, that is much, much more powerful than just looking at WAR.
509
:Yeah.
510
:Yeah.
511
:And WAR itself just has this idea.
512
:There's this whole thing of wins above replacement.
513
:Replacement?
514
:I can't stand that word.
515
:I have struggled with what that's supposed to mean.
516
:And that's across all sports, you know, instead of us doing it relative to what we're
inferring to be average, which it's probably not even the average player.
517
:Because of the whole selection bias we observe in sports of who's getting in: if they're
actually getting to play, they're probably
518
:decent enough, right?
519
:Versus, you know, where do we actually observe the selection of this
distribution that's being observed, right?
520
:It's a, it's a, it's a hard problem.
521
:That varies by the sport too.
522
:uh And this replacement level idea.
523
:I remember in that paper, we did just some roster cutoff rule: each team has X number
of players, multiply by the number of teams.
524
:There's a backup and bam, there we go.
525
:That's our.
526
:If you're after this, you're a replacement.
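A roster-cutoff rule of this kind can be sketched in a few lines. The ranking variable (snap counts), the function name `label_replacement`, and the toy numbers are all assumptions for illustration; the conversation only says the cutoff is the number of roster slots per team multiplied by the number of teams.

```python
def label_replacement(snap_counts, n_teams, slots_per_team):
    """Label players ranked beyond n_teams * slots_per_team as replacement level.

    Ranking by snap counts is an assumption made for this sketch.
    """
    cutoff = n_teams * slots_per_team
    ranked = sorted(snap_counts, key=snap_counts.get, reverse=True)
    return {
        player: ("league" if i < cutoff else "replacement")
        for i, player in enumerate(ranked)
    }

# Hypothetical snap counts at one position in a two-team league with two slots each.
snaps = {"A": 900, "B": 850, "C": 400, "D": 120, "E": 30}
labels = label_replacement(snaps, n_teams=2, slots_per_team=2)
print(labels)
```

The sketch makes the weakness discussed here visible: the label depends entirely on an arbitrary cutoff, not on any notion of who would realistically be signed as a replacement.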
527
:That definition I've always viewed as like, I don't know.
528
:Football has a very different salary structure and contracts versus, like,
baseball, where baseball literally has, I would say, replacement players, I think, that are just
529
:signed and put on a team, called up from the minors.
530
:Football doesn't have that.
531
:Football, like there's, well, there's practice squad.
532
:Like players on the practice squad that aren't actually in the league, and they
could get elevated from the practice squad.
533
:And so maybe one can say they're replacement level in a sense, but it doesn't
happen that much.
534
:I, it's like that, that idea.
535
:I don't even know if it generalizes well across sports.
536
:Like every, we, we try it across sports, but I, that's something I've always struggled
with.
537
:It's a, it's a personal thing.
538
:Yeah.
539
:No, I mean, it really depends on the sport for sure.
540
:And on the teams, like even in soccer, a replacement level at PSG would not be the same as
a replacement level at the last team in the French league.
541
:Yeah, I find in soccer it's a bit easier to define than in football because
you usually have a starting 11.
542
:And so basically, if you're not regularly in the starting 11
543
:for the big games, you qualify as the replacement level for that team.
544
:So it's like your first choice off the bench, basically.
545
:And that's in American football.
546
:It's these constant different personnel packages that it's like, oh, you're one of the
four wide receivers that are used in the four wide receiver set.
547
:So it's not like you're not consistently starting.
548
:It's just, oh, you're a specialist for this particular role.
549
:That's why American football has got this whole different dynamic to it that
complicates it: with all the substitutions that take place at the level of the skill
550
:positions, it's hard to figure out what's the right way to define that.
551
:Yeah, for sure.
552
:I mean, you seem to really like football, because you work a lot on that.
553
:You actually have another paper of yours
554
:I want to
555
:touch on, which is your work on the Going Deep framework.
556
:So yeah, can you tell us about that?
557
:Yeah, so this paper was the result of when the Big Data Bowl that I mentioned was first
launched, in the first year, before it even partnered with Kaggle.
558
:The NFL just released this big data set for six weeks of either the 2017 or 2018 season on
GitHub.
559
:And it was full raw tracking data.
560
:So you get all this information, every player, every 10th of a second.
561
:And the NFL also had some event annotations, like when the snap was, when the pass was
thrown, or when the ball arrived.
562
:And so it was extremely rich.
563
:And what we decided at CMU, a group of us, myself and a number of other PhD students at the
time, along with Sam Ventura and a collaborator at Pitt, Kostas Pelechrinis, we were
564
:thinking of this idea of how can we extend what
565
:I had previously done in that paper with Sam and my colleague Max Horowitz on
modeling expected points at the play level.
566
:How can we take that idea and extend it within the play?
567
:Being inspired by previous work by Dan Cervone and Luke Bornn on this continuous-time EPV
that they did in basketball.
568
:How do we do that in football?
569
:And as part of this first Big Data Bowl,
570
:you could get into the competition, or there was also a special issue with a journal, the Journal of Quantitative
Analysis in Sports.
571
:And we decided to go for that path because we probably spent like three or four
months coming up with this flow diagram, this flowchart that's in the paper, of how do we
572
:model football?
573
:Right?
574
:It's like, a play could be a dropback where the QB keeps the ball.
575
:And if the QB keeps the ball, it will either be potentially a designed run
576
:or a scramble if, you know, he gets pressure on him, or eventually the QB gets sacked. Or if
the QB is going to throw the ball, who does the QB throw the ball to? Given that he throws the
577
:ball to somebody, you need to model, okay, the completion probability.
578
:It's all these different models that go together.
579
:And then eventually it leads to if someone is the ball carrier, where are they going to
end up on the field?
580
:Right.
581
:And one of the things though about this in this paper, we did this whole process and
582
:We checked, we just focused on actually building the model to estimate the ending yard
line.
583
:And I was really naive in how I did this originally, where we did this
very ML model selection problem where we had, like, an LSTM versus a lasso versus a this.
584
:And you can see, okay, which gives us better expectations of ending yard lines.
585
:And then I was stupid and took that ending yard line.
586
:and fed it through the models that I had for uh expected points and win probability at the
play level.
587
:And those models were very simple.
588
:It was: at the start of a play, we had this multinomial logistic
regression model that, given the initial play context, would estimate the probabilities of
589
:different scoring events.
590
:What would be the next scoring event?
591
:And then given those, we would just sum across those associated probabilities with the
point values of the different events.
592
:and then you would get your expected points at the start of the play.
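The calculation described above, summing event probabilities against point values, can be sketched as follows. The event list, the point values (in particular counting a touchdown as 7), and the probabilities are assumptions chosen for illustration; in the actual model the probabilities would come from the multinomial logistic regression given the play context.

```python
import numpy as np

# Possible next scoring events, from the perspective of the team with the ball.
events = [
    "touchdown", "field_goal", "safety", "no_score",
    "opp_safety", "opp_field_goal", "opp_touchdown",
]
# Assumed point values for each event (touchdown counted as 7 here).
points = np.array([7, 3, 2, 0, -2, -3, -7], dtype=float)

# Hypothetical predicted probabilities for one play context.
probs = np.array([0.30, 0.20, 0.01, 0.10, 0.01, 0.13, 0.25])
assert np.isclose(probs.sum(), 1.0)

# Expected points: probability-weighted sum of the point values.
expected_points = float(probs @ points)
print(round(expected_points, 2))
```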
593
:Now the way one should think about this is this is just some, this is just some utility
function.
594
:Ignore the fact that it's called expected points.
595
:It's just some insert utility function.
596
:And what I really cared about was the expected value of this utility function.
597
:Now the problem is I built a model that spit out a single point estimate and I then just
said, oh, I'll throw this point estimate into this utility function
598
:and say that's the expected value of the utility function.
599
:I'm dealing with a very complex utility function.
600
:It's asymmetric, it's broken in various ways.
601
:Throwing in this point estimate does not give me the expected value of the utility
function.
602
:And a reviewer that we had for the paper laid this all out for us in a simple dice
example.
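The reviewer's exact dice example is not given in the conversation, so the following is only one possible version of it, with a hypothetical nonlinear utility. It shows that plugging the mean of a distribution into a utility function is not the same as taking the expected value of the utility.

```python
import numpy as np

faces = np.arange(1, 7)        # outcomes of a fair six-sided die
probs = np.full(6, 1 / 6)

def utility(x):
    """Hypothetical nonlinear utility, chosen only for illustration."""
    return x ** 2

mean_roll = float(probs @ faces)                    # point estimate of the roll
plug_in = utility(mean_roll)                        # utility of the point estimate
expected_utility = float(probs @ utility(faces))    # average utility over the distribution

print(mean_roll, plug_in, round(expected_utility, 4))
```

The two numbers differ whenever the utility is nonlinear, which is why a single predicted ending yard line pushed through an expected points function does not return expected points.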
603
:And I never felt so stupid in my life, but then it led me to this path of, what the hell
am I doing?
604
:I should just be modeling the distribution.
605
:Why are we modeling just point estimates?
606
:And there's a number of ways you could do it.
607
:Like we could have just gone a Bayesian route, but one of our co-authors, Taylor
Pospisil, as part of his dissertation work, developed some high-dimensional
608
:conditional density estimation techniques.
609
:And that led us to using his random forests for conditional density estimation
approach, where now we had this
610
:conditional density estimate for the running back's ending yard line as a function of this
tracking data.
611
:And now with that full distribution estimate, we integrate over this utility function that
we had from this:
612
:And then we actually got this nice-looking continuous-time EPV curve.
613
:And it was a lesson for me too in the modeling of football data
614
:and thinking of how many yards a ball carrier is going to get.
615
:Modeling the expectation is a bad idea.
616
:Just thinking only about the expected value is a bad idea.
617
:Because if you think about a football field, it's a hundred yards and a ball carrier has
the ball.
618
:The distribution of what they could get, it's going to be really this mixture between do
they go all the way for a touchdown versus
619
:some bounded skewed distribution with this tail going towards the target end zone and a
big mass near where they're at probably with defenders around them.
620
:Right.
621
:And so when you model the expectation, you're going to get something lying in a low-density
region.
622
:You're actually going to get a value that's not likely to happen, but it's yeah, it's the
mean of this distribution.
623
:And I still see people
624
:consistently do this all the time. Like, NGS has a leaderboard of
expected rushing yards, and it's a flawed idea in this context.
625
:The distribution of what you could see for the yards that you gain.
626
:It's not normal, right?
627
:The mean is not a likely value.
628
:It's going to fall somewhere in a low density space.
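A small numerical sketch of this point, using a made-up distribution: a geometric-style decay for short gains plus a point mass for going all the way. The 80 yards to the goal line, the decay rate, and the 10% touchdown probability are hypothetical values chosen only to show the shape of the problem.

```python
import numpy as np

yards_to_goal = 80
yards = np.arange(0, yards_to_goal + 1)

# Hypothetical shape: most mass on short gains, decaying with distance.
short_gain = 0.75 ** yards.astype(float)
short_gain[-1] = 0.0                    # the last bin is reserved for the touchdown
short_gain /= short_gain.sum()

p_touchdown = 0.10                      # assumed chance of going all the way
pmf = (1 - p_touchdown) * short_gain
pmf[-1] += p_touchdown

mean_gain = float(pmf @ yards)
mode_gain = int(yards[np.argmax(pmf)])
# Probability of landing within half a yard of the mean.
prob_near_mean = float(pmf[np.abs(yards - mean_gain) <= 0.5].sum())

print(round(mean_gain, 2), mode_gain, round(prob_near_mean, 4), round(float(pmf[-1]), 2))
```

In this toy case the mean sits between the bulk of short gains and the touchdown spike, in a region that carries far less probability than either of them.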
629
:And that just reinforces even more the importance of constantly reminding, you
know,
630
:students and the people we're working with to think about full uncertainty propagation in what
they're trying to do.
631
:Think about what's the distribution of the outcome?
632
:What can that look like?
633
:And are you estimating the right thing?
634
:Yeah, yeah.
635
:Couldn't agree more.
636
:And actually, how do you do that?
637
:Since you're speaking of the classrooms, because you teach a lot.
638
:Also, you teach in undergraduate, you teach in master level, in...
639
:uh
640
:You teach sports analytics, NLP, Bayesian stats.
641
:So yeah, I'm curious, how do you, how do you communicate these concepts to the students?
642
:And also just curious if there is, uh like what's the hardest concept for students to
internalize?
643
:Yeah.
644
:So for context, my sports analytics class that I've now taught
645
:for the past two years, and that I'm going to be teaching again this spring,
646
:I'm turning into a book, which is exciting.
647
:The statistical methods and sports analytics to be delivered on April 1st, 2027, which I
find hilarious.
648
:That's April 1st, April fools.
649
:I wrote a textbook. But what that course really is
650
:is I have advanced undergrads, juniors and seniors at CMU, as well as master's-level students
who have taken
651
:our core regression course.
652
:Then my sports analytics class is really, I've had students tell me this, like,
you attract us with sports, but we really just learned applied Bayesian modeling from
653
:you.
654
:I didn't realize that, but this carries over to all these other different contexts.
655
:It's like, yeah, that's the point, right?
656
:Of motivating, of thinking about different problems in sports to lead them to, hey, you
know, how do we incorporate prior knowledge, and the benefits of pooling, and
657
:the advantages that we could get from full uncertainty propagation, and how
in sports we often want to simulate things.
658
:You have a posterior distribution.
659
:There you go.
660
:There's your simulations.
661
:Pump it through whatever you want.
662
:You already have the ability to simulate and see what can happen with full uncertainty
throughout it.
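As a rough sketch of "pump it through whatever you want": the arrays below are stand-ins for posterior draws of two team strengths (in practice they would come from a fitted model, not from `rng.normal`), and the logistic link between strength difference and win probability is an assumption made for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws = 4000

# Stand-ins for posterior draws of two team strength parameters.
team_a = rng.normal(0.40, 0.15, size=n_draws)
team_b = rng.normal(0.10, 0.20, size=n_draws)

# Push every draw through the quantity of interest (assumed logistic link).
win_prob_draws = 1.0 / (1.0 + np.exp(-(team_a - team_b)))

# One simulated game per draw carries parameter uncertainty into the outcome.
simulated_wins = rng.binomial(1, win_prob_draws)

summary = {
    "mean_win_prob": float(win_prob_draws.mean()),
    "interval_90": np.quantile(win_prob_draws, [0.05, 0.95]),
    "simulated_win_rate": float(simulated_wins.mean()),
}
print(summary)
```

Because every draw is transformed rather than a single point estimate, the interval on the win probability comes for free, and the same draws can be reused for any other derived quantity.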
663
:Right.
664
:It's a wonderful concept and, you know, illustrating that to them has
been really fun.
665
:The challenge, though, and this was something from a discussion with Sameer Deshpande, who's
666
:been on this show before: when I was developing this course, I had some
conversations with him about actually teaching Bayesian modeling and what
667
:advice he had.
668
:And his big point to me was getting students to understand how to write out a
model and just build models
669
:on their own.
670
:And what I've realized from doing this is students struggle to literally
671
:write down the model, right?
672
:They're so familiar with assignments and exams of, you're going to fit this model with this
assumption and this distribution and do these steps.
673
:Like we tell them what the model is.
674
:Right.
675
:And so what I decided to do as a homework assignment in the class, and it's by far the
lowest grade that they get
676
:in the class, this homework grade just absolutely murders them across the
board, is I give them context and a description of the data set,
677
:the context of the data.
678
:And then I give them problems that I'm interested in, like, okay, here's a tennis level
data set, and there's a description, a data dictionary, and you want to infer,
679
:you know,
680
:different server abilities that you have, different returner abilities that are there.
681
:And you think that, okay, there might be like an additive shift based on the server level
uh and you want to account for these different things, but also you have these server
682
:characteristics.
683
:Write out for me the model that you're going to use, and what are the assumptions?
684
:What are the different levels of the model?
685
:What's level one?
686
:What's the basic observational level unit?
687
:What's level two, the higher level unit at like the player server level.
688
:What are the characteristics that are there?
689
:Right.
690
:And, you know, making them actually write out the model and the data generating process
they struggle with.
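One way such a model could be written out, as a sketch only: the homework's actual data set and intended answer are not given here, so the Bernoulli response (whether the server wins the point), the logit link, the server covariates $\mathbf{x}_j$, and the specific priors are all assumptions.

```latex
% Level 1: point i, served by player s[i] and returned by player r[i]
y_i \mid p_i \sim \mathrm{Bernoulli}(p_i), \qquad
\operatorname{logit}(p_i) = \mu + \alpha_{s[i]} - \beta_{r[i]}

% Level 2: player-level effects; server abilities shift with server characteristics x_j
\alpha_j \sim \mathcal{N}(\mathbf{x}_j^{\top} \boldsymbol{\gamma},\, \sigma_\alpha^2), \qquad
\beta_k \sim \mathcal{N}(0,\, \sigma_\beta^2)

% Level 3: priors on the remaining parameters
\mu \sim \mathcal{N}(0, 1.5^2), \quad
\boldsymbol{\gamma} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}), \quad
\sigma_\alpha, \sigma_\beta \sim \mathrm{Half\text{-}Normal}(1)
```

Each line answers one of the questions posed in the assignment: what the observational unit is, what the higher-level unit is, and which characteristics enter at which level.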
691
:And it's so funny because it's, you know, I do it in the lectures and I remind them of
this, that as I write out these examples, all right, this is our response.
692
:We're assuming it follows this distribution now.
693
:these parameters that we have, we're going to model them this way.
694
:And these parameters, we're going to say, follow this prior distribution.
695
:Like, I write out level one, level two, level three of these models and tell them, you've got to
get familiar with this.
696
:And you'll see there are students, they get it and they ace this homework assignment.
697
:And then there are other students that are like, I never have to write out the model.
698
:This is just given to me.
699
:Like, you don't understand.
700
:After you leave school, you're going to get a data set.
701
:Right?
702
:You're going to get data.
703
:You're going to be working on problems with colleagues and stakeholders, and they're not
going to tell you how to write out the model.
704
:You know, you need to figure out: what is an appropriate model for me to write out,
for me to try out and assume and test and evaluate?
705
:Right?
706
:That step.
707
:That's, that's the hardest step for them.
708
:I honestly think that, because with teaching them how to code things now,
there are so many different resources.
709
:Then,
710
:you know, you can honestly use something like Gemini or Claude or ChatGPT to say,
yeah, here's my model.
711
:Give me the Stan code.
712
:And it can spit it out reasonably well now, to be frank with you, right?
713
:But you still need to know how to write the model, right?
714
:It's not going to give you the right code if you don't know what model you want to
construct.
715
:That's by far the hardest thing to get across to them.
716
:And it's the most important.
717
:Yeah, yeah, yeah, no, for sure.
718
:Yeah, it's super interesting.
719
:I didn't know
720
:that would be the case.
721
:But that makes sense for sure.
722
:Yeah, that's the hardest part.
723
:It's like you're here, you're in front of the computer, and now you have to
model, because you're actually one of the best modelers
724
:in your workplace.
725
:That's why, you know, usually you have to do the model, because you're the one that's
most
726
:qualified to do it.
727
:yeah, yeah, no, that's great.
728
:And that teaches them responsibility.
729
:So that reminds me of this French comedian. He has this
bit about GPS where he's complaining because he never gets what the
730
:GPS is telling him, you know, it's not detailed enough.
731
:So he's like, we should have GPS levels, you know, like I would be on the beginner's level,
732
:where the GPS would tell me, instead of "in 100 meters, turn left," it'd be like, prepare
yourself, you're gonna have to turn left.
733
:No, not really that one, just over here.
734
:No, just a bit more.
735
:Yeah, here, now, turn left.
736
:Yeah, good, very good.
737
:And then you go through the level, and then the last level, which is a bit more like what
you're doing with this assignment, would be just guess.
738
:Well, that's the thing, right?
739
:I tell them what we're interested in.
740
:What are we interested in actually estimating?
741
:What do we care about?
742
:They need to actually understand, okay, I know about different distributions, and these
are the different types of variables that I have.
743
:They need to synthesize that and put it together to write out the actual model.
744
:I still tell them though, like, this is the goal.
745
:I want to be able to...
746
:get estimates of these player abilities in these different ways, right?
747
:Or whatever the different examples are that I came up with, but it's just a matter of them being
able to synthesize all of the stuff we've discussed and recognize how it fits together.
748
:Because it's not just the, oh I applied the same thing to everything, which I think
they're just so familiar with in a way.
749
:Like, oh, this is just this single linear regression, so throw it all in there.
750
:Part of me thinks it's like the ML indoctrination of students where like, yeah, all right,
I'm gonna run Lasso, I'm gonna run Ridge, and I'll run Random Forest, and I'll run
751
:XGBoost.
752
:They do all of these different predictive techniques, and then they throw them into
whatever tuning grid, which is prone to multiple testing problems anyway, and they never
753
:think about that.
754
:There's a lack of statistical thinking that goes on
755
:through that type of exercise.
756
:And I want them to get the statistical thinking back in the model building that I think ML
kind of pushes them away from.
757
:Yeah.
758
:Yeah.
759
:man.
760
:That's amazing.
761
:I would have loved having you as a professor at university.
762
:That would have made my life so much easier and faster.
763
:So if any of your students are listening, uh
764
:Listen to Professor Brown, guys.
765
:He's gonna make you save a lot of time in your life.
766
:And I mean, yeah, I can see that it's really one of your goals, right?
767
:Because both your book and your Substack are called Statistical Thinking in Sports
Analytics.
768
:Yeah, it seems to be something you're really focused on right now.
769
:Do you wanna talk a bit about that, by the way?
770
:Like what are your objectives with the book, with the Substack,
771
:what role it plays in keeping your classroom content fresh, things like that?
772
:Yeah, yeah.
773
:So yeah, with the Substack, it's Statistical Thinking in Sports Analytics.
774
:And it was really my way of trying to do things in terms of sharing both just work we're
doing at CMU, what my students are doing, as well as I was trying to spend some time to
775
:write about work I see publicly.
776
:I've been so bad at managing that Substack. Now, though, with this book that I
signed a contract for, which is called statistical methods and sports analytics, I pitched
777
:it as statistical thinking and they said, just change it to methods.
778
:Like, darn. But the idea of that is, you know, when
I'm teaching these students, what are the big ideas and problems that they're working on in sports
779
:analytics, and what are the fundamental
780
:methods that they need to know in tackling these different types of problems?
781
:And at the end of the day, this book is, it's not going to go really into the player
tracking data space.
782
:I only touch on that at the end because in my class, uh, I only get at it at the end.
783
:What was so funny was, when I first scoped out this course, I was going to do this whole
first half of the semester where we talk about modeling things like expected
784
:goals, expected points, and then.
785
:Okay, how do we account for player level information in there?
786
:And then that leads us into multilevel modeling and this bridge of hierarchical models,
and going then into full Bayesian approaches and having player-level effects and these
787
:adjusted plus minus techniques and then going into state space models.
788
:And I thought I was going to do that entirely in the first half of the semester.
789
:And then the second half is going to be player tracking data.
790
:And then I taught the course and the first half became the entire semester.
791
:So the player tracking data, I realized, oh, I can't, I only get to it at the
very end as a tease.
792
:And so this book has this scope of going from, all right, let's model discrete game
states.
793
:And then the progression of how do we evaluate player team performance?
794
:How do we go into head-to-head performances of teams, you know, getting into
795
:Bradley-Terry models, and then going from the static into dynamic into state
space modeling.
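
A minimal sketch of the static-to-dynamic progression described here, not the course's actual material. It assumes a Bradley-Terry model on a log-ability scale and a Gaussian random walk between seasons; the function names, team labels, and the innovation standard deviation are all hypothetical.

```python
import math
import random


def win_probability(ability_i, ability_j):
    """Bradley-Terry: P(i beats j) is a logistic function of the ability gap."""
    return 1.0 / (1.0 + math.exp(-(ability_i - ability_j)))


def evolve_abilities(abilities, innovation_sd, rng):
    """One random-walk step: next season's ability is this season's plus Gaussian noise."""
    return {team: a + rng.gauss(0.0, innovation_sd) for team, a in abilities.items()}


# Hypothetical abilities on the log scale for two teams.
rng = random.Random(0)
season_1 = {"Team A": 0.4, "Team B": -0.1}
season_2 = evolve_abilities(season_1, innovation_sd=0.2, rng=rng)

p_static = win_probability(season_1["Team A"], season_1["Team B"])
p_next = win_probability(season_2["Team A"], season_2["Team B"])
```

The static model treats each ability as fixed; the state space version lets it drift, so the head-to-head probability changes from one season to the next.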
796
:The state space techniques are really the end point of the course, even
connecting all the ideas back through it, thinking of, you know, I'm estimating
797
:RAPM, this regularized adjusted plus-minus for basketball players.
798
:And well, you can do this in a state space approach too, right?
799
:You can have player abilities evolve
800
:from one year to the next in some way.
801
:And so it's a really fun class and in the development of it and teaching of it, it's led
me down the path of trying to get a sense of what are the important techniques that are
802
:relevant across these different sports and also exposure to just the variety of different
sports.
803
:So at CMU,
804
:American football and basketball are by far the two biggest sports.
805
:Like I do a little poll at the beginning to see what people are interested in.
806
:And that's dominated both years.
807
:Then we see soccer, baseball, tennis a little bit more, but then F1 is a big
one.
808
:So that's what I've been thinking about.
809
:All right.
810
:These students love F1, and the structure of modeling that, right?
811
:And thinking about Plackett-Luce type models for dealing with this whole ranking
system
812
:and preserving that dependency structure that's unique.
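
A minimal sketch of the Plackett-Luce idea mentioned here, assuming each competitor has a single log-strength parameter. The function name and driver labels are hypothetical and are not taken from the course or the book. The finishing order is built up one position at a time, and each pick is a softmax over the competitors not yet placed, which is how the model preserves the dependency within a ranking.

```python
import math


def plackett_luce_log_likelihood(ranking, strength):
    """Log-probability of an observed finishing order under Plackett-Luce.

    ranking: list of competitor ids, first place first.
    strength: dict mapping competitor id to a real-valued log-strength.
    """
    log_lik = 0.0
    remaining = list(ranking)
    # The last finisher is chosen with probability 1, so it adds nothing.
    for winner in ranking[:-1]:
        # Softmax over everyone who has not been placed yet.
        log_denominator = math.log(sum(math.exp(strength[c]) for c in remaining))
        log_lik += strength[winner] - log_denominator
        remaining.remove(winner)
    return log_lik


# Hypothetical log-strengths for three drivers.
strength = {"driver_a": 1.0, "driver_b": 0.2, "driver_c": -0.5}
order = ["driver_a", "driver_b", "driver_c"]
log_prob = plackett_luce_log_likelihood(order, strength)
```

With two competitors this reduces to a Bradley-Terry comparison, which is one way to connect the ranking model back to head-to-head models.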
813
:I've never done that before, but it also connects to thinking about modeling the draft
across different sports.
814
:So that's led me to think about how I develop content related to those
problems.
815
:Because, one, students are interested in it, and it's very relevant.
816
:It has its own unique structure.
817
:And so that's one of my goals even for this next year, because I didn't cover that.
818
:outside of what students did on specific projects.
819
:And like I only worked with them on the specific project.
820
:I realized this is such a cool concept and important.
821
:I got to get this to everybody beyond.
822
:So that's going to be incorporated in the class next year as well as into the book.
823
:And it's again getting at the idea of thinking about, you know, the data that you're dealing
with and, you know, what are the restrictions of that data that you're
824
:facing?
825
:How do you handle, in sports, the dependency structure that you have, right?
826
:The dependency structure motivates the usage of these hierarchical models.
827
:It's all repeated measurements of players, repeated measurements of teams observed across
some longitudinal period of time.
828
:Right.
829
:And it's, it's real data.
830
:And that lends itself so well to teaching these concepts, and I tell
students this carries across. It's not just in sports, it's medical data.
831
:It's every field.
832
:Right.
833
:And you know, the unfortunate thing is you really are never truly dealing with independent
data.
834
:In the real world.
835
:It's very rare.
836
:I don't know if I've ever had independent data, but everything they initially learn is
all about independent data.
837
:I could go on about this forever.
838
:No, that's great.
839
:I think this is super interesting for people to hear because you're at the...
840
:the intersection of a lot of different fields and teaching and practice also.
841
:So I think it's super interesting to have your perspective here.
842
:And actually, I also want to pick your brain a bit more, zooming out a bit
and taking a broader lens.
843
:But before that, I want you to talk a bit about the work of one of your students, Quang Nguyen,
844
:who's worked a lot for his thesis on modeling turn angle variability of NFL players.
845
:yeah, I know you really loved that work and it's super interesting.
846
:So what can you tell us about that?
847
:And of course, I will put in the show notes any link that is important, especially the
arXiv link.
848
:Yeah, yeah.
849
:So Quang just finished his
850
:third year as a stats PhD student here at CMU.
851
:And he's been actually teaching our summer undergrad research program and still managed to
pump out this manuscript.
852
:He is an absolute machine at what he does.
853
:And this is just a piece of what was his thesis proposal, where this
component, what he's been doing, is modeling player movement in tracking data.
854
:And he's been borrowing actually from uh ecological literature and modeling animal
movement.
855
:And so he was looking at the vectors of displacement
856
:of an athlete on the field and then looking at the angle between these two
displacement vectors. And in football, this is so interesting because players can
857
:vary greatly in their ability, in how they turn, right?
858
:Some players are more shifty than others.
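
A minimal sketch of the turn angle described here: the signed angle between two consecutive displacement vectors built from tracked positions. The function name and the sample coordinates are hypothetical; real tracking data would also need decisions about sampling rate and near-zero displacements that are not handled here.

```python
import math


def turn_angles(positions):
    """Signed turn angles in radians, in [-pi, pi], between consecutive displacements.

    positions: list of (x, y) locations ordered in time.
    Positive values are counterclockwise (left) turns, negative are clockwise.
    """
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(positions, positions[1:], positions[2:]):
        ux, uy = x1 - x0, y1 - y0  # first displacement vector
        vx, vy = x2 - x1, y2 - y1  # second displacement vector
        cross = ux * vy - uy * vx
        dot = ux * vx + uy * vy
        angles.append(math.atan2(cross, dot))
    return angles


# Hypothetical path: straight ahead, then a 90 degree left turn.
path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0)]
angles = turn_angles(path)
```

Because the result wraps around at plus or minus pi, these angles are circular data, which is why an ordinary Gaussian model is not a natural fit.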
859
:Right.
860
:And so Quang looked at modeling this for running plays as well as receivers after the
catch.
861
:And he did this in such a way that, all right, look, this is circular data.
862
:We're going to
863
:properly handle that distributional assumption.
864
:But what's really of interest for us, it's not that certain players are going to be biased
towards like turning to the left or the right.
865
:What's of interest is their variance, right?
866
:Like the variability in their turn angle.
867
:And so what we did was we had this hierarchical model with effectively these random effects
for the player on the concentration parameter of the von Mises distribution.
868
:Right.
869
:And so we could get at how different players vary in terms of their variability.
870
:Right.
871
:It's a, you know, another model with non-constant variance.
872
:That's all this is.
873
:It's just a non-constant variance being attributed at the player level, uh, and different
distributions for different positions.
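
A minimal sketch of the structure described here, not the paper's actual model. It assumes turn angles follow a von Mises distribution centered at zero and that each player's concentration is `exp(intercept + player_effect)`; the log link, the zero mean, the function names, and the toy data are all assumptions for illustration. A full hierarchical fit would also place a prior on the player effects, which is omitted.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x * x / 4.0) / ((k + 1) ** 2)
    return total


def von_mises_logpdf(theta, mu, kappa):
    """Log-density of a von Mises distribution with mean mu and concentration kappa."""
    return kappa * math.cos(theta - mu) - math.log(2.0 * math.pi * bessel_i0(kappa))


def player_kappa(intercept, player_effect):
    """Log link keeps the concentration positive; a larger effect means less variable turns."""
    return math.exp(intercept + player_effect)


def log_likelihood(angles_by_player, intercept, player_effects, mu=0.0):
    """Sum of von Mises log-densities with a player-specific concentration."""
    total = 0.0
    for player, angles in angles_by_player.items():
        kappa = player_kappa(intercept, player_effects[player])
        total += sum(von_mises_logpdf(a, mu, kappa) for a in angles)
    return total


# Hypothetical turn angles (radians) for a straight-line runner and a shifty one.
data = {"straight": [0.01, -0.02, 0.03], "shifty": [1.5, -2.0, 0.7]}
fit_a = log_likelihood(data, 0.0, {"straight": 2.0, "shifty": -2.0})
fit_b = log_likelihood(data, 0.0, {"straight": -2.0, "shifty": 2.0})
```

Here `fit_a` gives the straight-line runner the higher concentration and should score better than `fit_b`, which swaps the effects. That is the non-constant variance idea: the spread of the angles, not their mean, is what differs by player.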
874
:And the results are really fun.
875
:This is one of the reasons why he wanted to get this out just as an individual paper, because
American football fans and folks really, really love what we see in this.
876
:It's just for a particular window of time from the Big Data Bowl sample, but we saw some
things that were kind of cute, like one wide receiver, Metcalf, having the
877
:lowest variability.
878
:And he's someone who, after he gets the ball, just takes off like a Terminator,
running towards the end zone.
879
:And on the exact opposite side was a wide receiver, George Pickens, who
880
:displayed the most variability.
881
:And when he catches the ball, he just, he's like holding the ball with one hand and he's
just kind of spinning around.
882
:There's no, there's no, it's just chaos in how he's moving.
883
:Right.
884
:And Quang connected this then to things like, you know, the speed measurements
that we have, as well as what they did in combine drills.
885
:And this is just one piece though, of this bigger picture of how do we properly model
player movement at this scale?
886
:And from that, you know, the idea is we can infer the player-level attributions, but
what explains the variability in the angle and the direction of the angle as
887
:a function of just what's being observed with the tracking data.
888
:That's another piece of this.
889
:Uh, now we just, we just hit on really like the player level interpretation, uh, for it in
this demonstration of, you know, the variability that's due to the player, uh, in, in
890
:their turn angle.
891
:uh concentration.
892
:But there's got to be more to come from that work and other pieces.
893
:If anybody's very much interested in the player tracking data space, definitely look up
Quang Nguyen and all the stuff he's been working on.
894
:It's been pretty exciting.
895
:yeah, definitely.
896
:And again, the arXiv link will be in the show notes for people who are interested in
that.
897
:But yeah, it's super fun work.
898
:I love it.
899
:And how actionable it is is super,
900
:super interesting. I really love that. It's been a while now, so I need to play us
out because I don't want to take too much of your time. But before we do the last two
901
:questions, I'd like to, you know, take a broader view, and something I'm wondering is, if you
could, you know, embed with any pro team for one season, you know, be it NFL,
902
:MLB, NBA, whatever you want.
903
:What unanswered analytics question would you tackle first?
904
:And what data set would you want to collect?
905
:Yeah, this is gonna sound silly.
906
:And people know about that.
907
:People that know about my interest in football have heard me say this before.
908
:But I really want to see,
909
:In American football, visuals and data for snaps.
910
:And so when the center literally snaps the ball back to the quarterback, I want to see
almost like a strike zone that we have in baseball for the location of snaps.
911
:And we don't have this, right?
912
:This would require some spatial location.
913
:Where does the ball arrive at the quarterback?
914
:But there's variability there.
915
:And I want that data.
916
:And I want to see how much the variability in snap quality relates to the performance
on the play.
917
:Right.
918
:And I've told people about this.
919
:It's got to matter.
920
:That's my belief, and it's a silly problem, but I would love to be able to see and
get an answer to it.
921
:Because if anybody ever watches a football game with a quarterback standing in
shotgun, they will see the ball vary by placement.
922
:You'll see the best centers.
923
:It'll be more consistent.
924
:The QB will consistently get the ball where it's at.
925
:But if you get a bad snap, a high snap or a low snap, it's going to disrupt the play.
926
:It changes the timing, right?
927
:And some quarterbacks are probably better at handling bad snaps versus good snaps,
you know, versus other quarterbacks.
928
:But I just don't have the data.
929
:So I can't make any, you know, conclusions yet.
930
:And I would love to be able to see that at some point.
931
:And, you know, the technology is probably there to do it.
932
:Yeah.
933
:I mean, cool if we have some uh NFL folks listening here.
934
:Guys, you're gonna give Ron some of that data.
935
:Maybe anonymized, you know, but like, come on.
936
:Gonna make Ron happy.
937
:eh And also something I think is gonna be super interesting given your, you know, your
career and what you do is...
938
:Like, what advice do you usually give people who want to break into the sports
analytics industry, whether that's linked to academia, like you're doing and love doing, or
939
:if it's with a pro team?
940
:The biggest thing I tell students always is you need to develop a sports analytics
portfolio, right?
941
:The projects that you work on that you've done.
942
:And with that, you need to self-promote.
943
:All right.
944
:And that's, that's a really annoying part.
945
:It's really weird to self promote all the time, but it's so crucial in this industry of
getting your work out there.
946
:Right.
947
:And the challenge of developing a portfolio is often that students don't have
the time. You know, as full-time students, they don't necessarily have the time to set
948
:aside and go, okay, I'm going to not work on what I need to do for class and build a
949
:pitch model that I could show some MLB teams, right?
950
:They're already doing as much as they can and they might even have a part-time job to help
them pay for school and whatnot.
951
:So, you know, one of the things I try to do, at least here at CMU, and I encourage other
instructors out there as well, is to get it to be in the classroom, where students, while
952
:they're getting their course grades and you know, their course credits,
953
:they're also developing that portfolio that they can showcase.
954
:You know, it's not just about exams.
955
:It's about what projects can they build?
956
:And, you know, if you get it involved in the classroom, then, you know, it's a win-win.
957
:They're, they're adding to their portfolio, they're learning, and they don't have to
figure out how they scope out the time to get something for these teams.
958
:Cause that's what they're looking for.
959
:And, you know, it's very different in sports versus other industries.
960
:You know, I formerly worked on commercial loan risk.
961
:I didn't need to have a portfolio of commercial loan risk models before I got that job.
962
:I talked about what I did in baseball before I got that job.
963
:But if I want to get a job in baseball, it's like, what did you do before that shows us
the passion for this, right?
964
:And it's probably not fair.
965
:And that's why I think it's crucial for educators out there to create those opportunities
for students such that it can be something they can actually achieve because it's the most
966
:important thing.
967
:that portfolio and then going out and self promoting and networking are crucial for
entering the industry.
968
:Yeah.
969
:Yeah.
970
:Great advice.
971
:Definitely resonate with that.
972
:And yeah, I think that's a good point.
973
:I had never thought about that.
974
:But yeah, if teachers and instructors can bake that into their curriculum where, you know,
students have to develop something for their portfolio.
975
:Or I would say something that may even be better is contributing to open source
software.
976
:That is definitely something amazing because these are out there, people can see what
you're doing.
977
:Let's say you contribute to Stan or PyMC or something like that.
978
:These are big packages used everywhere in industry.
979
:So that's not only a signal that...
980
:You can do it.
981
:You have the grit and persistence to do it, but also your code is good enough because it's
included in a code base that, you know, thousands of companies and even more people are
982
:using throughout the world.
983
:So when we add something to PyMC, it's thoroughly tested, because if we break something,
then...
984
:we can always solve it, fix it, but that's always much more of a pain than making sure it's
actually working before you deploy.
985
:Of course, there's some stuff you don't catch, but yeah, it's a signal of a lot of
different things, as you were saying.
986
:Also, if you do it by definition, it's on your free time, so it's also demonstrating uh
your passion for it.
987
:Yeah, it demonstrates a lot of good qualities that usually
988
:recruiters are looking for.
989
:And yeah, I think that'd be great if that were more integrated into the different
curricula.
990
:And it's hard to do it.
991
:It's like, it's the scoping out of classes and opportunities that you have to figure it
out.
992
:But yeah, because you want to teach them so much.
993
:uh It's a great way for them to learn too, is to get these projects involved and this
active learning experience that we always have.
994
:Because I talk too much.
995
:That's awesome.
996
:I gotta tell you, you make my job much, much easier.
997
:um Any topics we didn't talk about that you wanted to cover today?
998
:No, I think we hit on a lot of it. I'd definitely encourage people to check out the work by PhD
student Quang Nguyen and see everything that he's been developing, and then just stay tuned
999
:for...
:
01:23:28,962 --> 01:23:34,044
developments on the textbook, Statistical Methods in Sports Analytics.
:
01:23:34,084 --> 01:23:43,238
And the last thing I'd say is, which I mentioned at the very beginning, we do have our
annual Carnegie Mellon Sports Analytics Conference that will be coming up this year on
:
01:23:43,238 --> 01:23:45,809
October 24th, 25th.
:
01:23:45,989 --> 01:23:51,611
The keynote address will be by Dean Oliver, sort of the godfather of basketball analytics.
:
01:23:51,611 --> 01:23:56,223
And then the closing keynote will be Alok Pattani, who's currently with Google.
:
01:23:56,223 --> 01:23:58,214
And he also
:
01:23:58,222 --> 01:24:00,302
worked with Dean at ESPN.
:
01:24:00,302 --> 01:24:01,962
So it's going to be a fun event.
:
01:24:01,962 --> 01:24:05,302
We're getting things organized and underway for that.
:
01:24:05,702 --> 01:24:10,622
The website will be up to date fairly soon.
:
01:24:10,622 --> 01:24:14,982
The next week or the week after, then people will be able to check out the information.
:
01:24:14,982 --> 01:24:15,982
It'll be fun.
:
01:24:16,262 --> 01:24:17,022
Yeah.
:
01:24:17,022 --> 01:24:18,382
Definitely let me know.
:
01:24:18,382 --> 01:24:26,422
So send me the link when it's online, because I'll add that to the show notes for people
to start enrolling.
:
01:24:26,422 --> 01:24:27,736
Are you doing a...
:
01:24:27,736 --> 01:24:30,478
call for proposals for talks, or where do you stand?
:
01:24:30,478 --> 01:24:37,053
Yeah, so we're gonna have, we run a reproducible research competition each year with the
conference.
:
01:24:37,053 --> 01:24:43,459
And so there will be an opportunity for that as well as an opportunity to submit poster
abstracts too.
:
01:24:43,459 --> 01:24:46,901
yeah, I'll definitely send you the link.
:
01:24:46,901 --> 01:24:56,729
ah The link currently takes you to last year's page, but it'll just redirect to this
year's when it's up and running.
:
01:24:57,676 --> 01:24:58,186
Yeah, yes.
:
01:24:58,186 --> 01:25:01,426
So that's just a sign that the website is nostalgic, right?
:
01:25:01,426 --> 01:25:04,508
That's like, my God, was so much better last year.
:
01:25:04,508 --> 01:25:05,348
Yeah.
:
01:25:06,048 --> 01:25:10,239
Gotta replace the software, you know, update the software with a more optimistic one.
:
01:25:10,239 --> 01:25:10,890
Yeah.
:
01:25:10,890 --> 01:25:12,370
No, man, it was great last year.
:
01:25:12,370 --> 01:25:13,899
It's going to be even better this year.
:
01:25:13,899 --> 01:25:14,591
Come on, dude.
:
01:25:14,591 --> 01:25:18,092
oh Yeah.
:
01:25:18,092 --> 01:25:19,262
So let's definitely do that.
:
01:25:19,262 --> 01:25:25,474
When that episode is going to air, um the website is going to be working.
:
01:25:25,474 --> 01:25:27,368
that's going to work out perfect.
:
01:25:27,368 --> 01:25:31,360
I mean your website for the conference is gonna be.
:
01:25:31,520 --> 01:25:33,341
Awesome, yeah, that's great.
:
01:25:34,262 --> 01:25:40,205
I'll see if I can come up to CMU. That'd be fun, to meet you all in person.
:
01:25:40,205 --> 01:25:40,555
Yeah.
:
01:25:40,555 --> 01:25:42,166
Yeah, yeah, absolutely.
:
01:25:42,166 --> 01:25:43,547
Yeah, for sure.
:
01:25:43,547 --> 01:25:49,921
Yeah, if you guys want to do a live show, if you have that on the slide, you want to do
that, I'm down to organize that with you.
:
01:25:49,921 --> 01:25:51,871
That's usually, I love doing that.
:
01:25:52,252 --> 01:25:53,393
Just do it one day.
:
01:25:53,393 --> 01:25:55,886
for sure.
:
01:25:55,886 --> 01:25:57,606
Yeah, I'll reach out to you about that later.
:
01:25:57,606 --> 01:25:58,468
Yeah, it's good idea.
:
01:25:58,468 --> 01:26:01,441
Yeah, it's always super fun.
:
01:26:01,441 --> 01:26:02,652
I always love doing that.
:
01:26:02,652 --> 01:26:06,975
yeah, and it's a good excuse for me to travel.
:
01:26:06,975 --> 01:26:10,898
Very happy to help with that if you guys want to do that.
:
01:26:10,898 --> 01:26:24,089
um Actually, the next episode, so not this week, but the following one will be the one I
did a few weeks ago at Imperial College London, talking about epidemiology, statistical
:
01:26:24,089 --> 01:26:25,358
epidemiology.
:
01:26:25,358 --> 01:26:26,973
That's going to be a fun one too.
:
01:26:28,652 --> 01:26:29,052
Awesome.
:
01:26:29,052 --> 01:26:36,064
Well, Ron, I think we can call it a show, but first, obviously, I'm going to ask you the
last questions.
:
01:26:36,064 --> 01:26:38,665
I ask everyone at the end of the show, right?
:
01:26:38,665 --> 01:26:46,147
So first one, if you had unlimited time and resources, which problem would you try to
solve?
:
01:26:46,367 --> 01:26:52,729
Apart from animations for NFL, as you suggested before, you cannot say that again.
:
01:26:52,729 --> 01:26:53,169
Yeah.
:
01:26:53,169 --> 01:26:55,589
I mean, people...
:
01:26:57,934 --> 01:27:03,214
People definitely give really good answers to this all the time and the different problems
that are out there.
:
01:27:03,214 --> 01:27:06,614
I guess for me, I'll say something.
:
01:27:07,994 --> 01:27:21,134
One of the reasons why I actually decided to do genetics work was in my mom's family,
there was this rare mutation that leads to a form of muscular dystrophy and these type of
:
01:27:21,134 --> 01:27:26,726
rare mutation diseases and figuring out cures for them.
:
01:27:27,820 --> 01:27:33,952
to me, if I had all the time in the world and it's something that, you know, we talk about
the role of probability in our lives.
:
01:27:34,093 --> 01:27:39,475
My grandmother had this and it was just a 50, 50 shot, whether my mom would get it and she
didn't get it.
:
01:27:39,475 --> 01:27:39,845
Right.
:
01:27:39,845 --> 01:27:41,796
And then I can't get it.
:
01:27:41,796 --> 01:27:48,259
Unfortunately, my uncle suffered from it in his life and my grandmother, she was one of 13
kids.
:
01:27:48,259 --> 01:27:49,980
About half of them had this.
:
01:27:49,980 --> 01:27:50,360
Right.
:
01:27:50,360 --> 01:27:54,242
And when I think about it, like, if I had all the time in the world to solve something,
:
01:27:54,242 --> 01:27:55,022
It would be.
:
01:27:55,022 --> 01:27:59,262
How do you cure these mutations that are out there in the genetic code?
:
01:27:59,342 --> 01:28:00,922
And maybe we'll get to that at some point.
:
01:28:00,922 --> 01:28:09,122
I mean, there's been so much great advancement in the technology for it, but
it's definitely something. It's just the role of chance in life.
:
01:28:09,202 --> 01:28:10,782
Well, yeah.
:
01:28:11,101 --> 01:28:12,342
Yeah, definitely.
:
01:28:12,462 --> 01:28:12,962
Damn.
:
01:28:12,962 --> 01:28:13,842
50 50 shot.
:
01:28:13,842 --> 01:28:18,182
It's just like, it's way, way too high for that kind of thing.
:
01:28:18,182 --> 01:28:19,022
Yeah.
:
01:28:19,102 --> 01:28:20,482
It's just like, yeah.
:
01:28:20,482 --> 01:28:21,122
Yeah.
:
01:28:21,122 --> 01:28:22,882
Fascinating research on that.
:
01:28:22,882 --> 01:28:24,674
I love to
:
01:28:24,674 --> 01:28:29,055
read always and I mean, not papers, hatred papers.
:
01:28:30,416 --> 01:28:39,038
Reading anything that's made for reading, any research on CRISPR-Cas9 gene editing, is
just absolutely fascinating.
:
01:28:39,038 --> 01:28:43,319
And actually, I'd love to have someone who does that on the show one day.
:
01:28:43,319 --> 01:28:54,158
So if anybody in the audience knows someone who does, you know, that kind of research
in gene editing using some Bayesian research, please contact me and
:
01:28:54,158 --> 01:29:09,122
I'll get that organized, because it's absolutely super fascinating research and so
impactful. So yeah, it sounds basically like, for this kind of disease,
:
01:29:09,482 --> 01:29:13,944
it looks like gene editing with CRISPR-Cas9 is our best shot right now.
:
01:29:13,944 --> 01:29:16,734
Yeah, it's one of many of those types, right?
:
01:29:16,734 --> 01:29:21,126
But probably down the road, it could be, it could be cured.
:
01:29:21,126 --> 01:29:21,706
Yeah.
:
01:29:21,706 --> 01:29:22,546
Yeah.
:
01:29:22,732 --> 01:29:31,949
Yeah, look at HIV right now and what having HIV a generation ago meant for your uh
lifespan.
:
01:29:31,949 --> 01:29:34,650
It's just very, very different.
:
01:29:34,650 --> 01:29:41,275
uh Great answer, Thanks.
:
01:29:41,275 --> 01:29:49,141
And uh second question, if you could have dinner with any great scientific mind, dead,
alive or fictional, who would it be?
:
01:29:49,141 --> 01:29:50,742
There's a lot of good answers to this.
:
01:29:50,742 --> 01:29:51,782
uh
:
01:29:51,854 --> 01:30:00,694
Someone I'm going to say that I just missed interacting with at CMU was Stephen Fienberg.
:
01:30:00,694 --> 01:30:02,954
So he passed away in [...].
:
01:30:02,954 --> 01:30:15,354
I started my PhD in [...]. He was supposed to be a fantastic person and really was so well respected amongst everybody
:
01:30:15,354 --> 01:30:16,374
in the department.
:
01:30:16,374 --> 01:30:21,514
You know, I wish I could get a conversation with him and pick his brain on statistics.
:
01:30:22,090 --> 01:30:31,415
I remember when I moved into my office, there were still remnants of Fienberg
textbooks and notes in different places, and just coming across them.
:
01:30:31,935 --> 01:30:38,218
yeah, he was somebody I would love to have a conversation with about statistics.
:
01:30:38,919 --> 01:30:41,200
Yeah, that sounds about right.
:
01:30:41,200 --> 01:30:45,302
Damn, great answers today, Ron.
:
01:30:45,302 --> 01:30:47,263
You're on your A game.
:
01:30:48,104 --> 01:30:50,870
Definitely starting quarterback today.
:
01:30:50,870 --> 01:30:56,252
Also, well, yeah, it was fantastic to have you on the show, Ron.
:
01:30:56,252 --> 01:30:57,432
You're welcome.
:
01:30:57,433 --> 01:30:59,053
Welcome back anytime.
:
01:30:59,053 --> 01:31:06,716
Good luck for the CMU conference because I know it takes time and work to organize a
conference.
:
01:31:07,057 --> 01:31:12,258
But as someone who enjoys conferences a lot,
:
01:31:12,419 --> 01:31:14,079
I thank you for it.
:
01:31:14,320 --> 01:31:18,291
And yeah, well, next time you have something to share, you'll let me know.
:
01:31:18,291 --> 01:31:20,526
You'll come back on the show. And
:
01:31:20,526 --> 01:31:33,346
in the show notes, folks, there are going to be a lot of links today for you. So for those who want to dig
deeper, you have all the links there. You also have all the links to Ron's socials, his link
:
01:31:33,346 --> 01:31:43,106
to his Substack, to his upcoming book, his website. And again, thanks a lot, Ron, for taking
the time and being on this show.
:
01:31:43,346 --> 01:31:45,286
Thanks for having me, it was a blast.
:
01:31:48,994 --> 01:31:52,696
This has been another episode of Learning Bayesian Statistics.
:
01:31:52,696 --> 01:32:03,202
Be sure to rate, review, and follow the show on your favorite podcatcher, and visit
learnbayesstats.com for more resources about today's topics, as well as access to more
:
01:32:03,202 --> 01:32:07,284
episodes to help you reach a true Bayesian state of mind.
:
01:32:07,284 --> 01:32:09,225
That's learnbayesstats.com.
:
01:32:09,225 --> 01:32:14,088
Our theme music is Good Bayesian by Baba Brinkman, feat. MC Lars and Mega Ran.
:
01:32:14,088 --> 01:32:17,230
Check out his awesome work at bababrinkman.com.
:
01:32:17,230 --> 01:32:18,432
I'm your host.
:
01:32:18,432 --> 01:32:19,493
Alex Andorra.
:
01:32:19,493 --> 01:32:23,642
You can follow me on Twitter at alex_andorra, like the country.
:
01:32:23,642 --> 01:32:30,913
You can support the show and unlock exclusive benefits by visiting
patreon.com/learnbayesstats.
:
01:32:30,913 --> 01:32:33,295
Thank you so much for listening and for your support.
:
01:32:33,295 --> 01:32:35,597
You're truly a good Bayesian.
:
01:32:35,597 --> 01:32:42,402
Change your predictions after taking information in, and if you're thinking I'll be less than
amazing,
:
01:32:42,402 --> 01:32:45,705
Let's adjust those expectations.
:
01:32:45,705 --> 01:32:47,206
Let me show you how.
:
01:32:55,133 --> 01:32:58,734
Let's get them on a solid foundation