Speaker:
00:00:05
Today I am thrilled, nay, I am honored to have Chris Fonnesbeck on the show, a trailblazer
in sports analytics and PyMC's BDFL.
2
:
00:00:16
Chris's journey has spanned marine biology, sports modeling, particularly in baseball, and
broader statistical consulting, making him a key figure in the intersection of sports,
3
:
00:00:27
data science, and decision making.
4
:
00:00:30
In this episode,
5
:
00:00:31
Chris reflects on the evolution of sports modeling from the early days of limited data to
the current era of high-frequency and hierarchical datasets.
6
:
00:00:39
He shares how Bayesian methods have been instrumental in navigating messy data and
building robust models.
7
:
00:00:47
We dive into themes like the importance of model transparency, iterative development, and
balancing simplicity with complexity.
8
:
00:00:55
Along the way, we discuss technical approaches, Gaussian processes, structural time
series,
9
:
00:01:00
and data simulations, exploring their practical applications in sports analytics, but also
beyond.
10
:
00:01:08
Whether you're a sports enthusiast, a data scientist, or someone curious about the growth
of Bayesian methods, this episode is packed with insights and lessons learned from years of
11
:
00:01:18
experience in the field.
12
:
00:01:20
This is Learning Bayesian Statistics, episode 125, recorded November 6, 2024.
13
:
00:01:32
Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the
projects, and the people who make it possible.
14
:
00:01:53
I'm your host, Alex Andorra.
15
:
00:01:55
You can follow me on Twitter at alex_andorra,
16
:
00:01:59
like the country.
17
:
00:02:00
For any info about the show, learnbaystats.com is Laplace to be.
18
:
00:02:04
Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on
Patreon, everything is in there.
19
:
00:02:11
That's learnbaystats.com.
20
:
00:02:13
If you're interested in one-on-one mentorship, online courses, or statistical consulting,
feel free to reach out and book a call at topmate.io/alex_andorra.
21
:
00:02:24
See you around, folks.
22
:
00:02:26
and best Bayesian wishes to you all.
23
:
00:02:27
And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can
help bring them to life.
24
:
00:02:35
Check us out at pymc-labs.com.
25
:
00:02:40
Chris Fonnesbeck, welcome back to Learning Bayesian Statistics.
26
:
00:02:44
Thanks for having me.
27
:
00:02:45
It's been a while.
28
:
00:02:45
Yeah, it's been five years exactly.
29
:
00:02:47
So I was the second guest ever?
30
:
00:02:50
You were the second guest.
31
:
00:02:52
First ever was Osvaldo Martin.
32
:
00:02:55
That's right.
33
:
00:02:56
And he was back on the show a few weeks ago, a few days ago.
34
:
00:03:01
And now it's you.
35
:
00:03:03
So I've gone full circle of all the Bayesian guests.
36
:
00:03:06
now I've...
37
:
00:03:07
You got to go back to the beginning.
38
:
00:03:08
There's no new ones.
39
:
00:03:09
Exactly.
40
:
00:03:10
I had to invite you back.
41
:
00:03:11
Like I had no choice.
42
:
00:03:13
I went through the population.
43
:
00:03:15
No, of course.
44
:
00:03:15
It's a pleasure to have you back here because you've done a lot of things in five years.
45
:
00:03:22
So it's awesome to have you here.
46
:
00:03:24
We're here in New York because we're going to teach our Gaussian processes tutorial in a
few hours.
47
:
00:03:31
And that's really a fun turn of events for me because...
48
:
00:03:36
Like when I started learning Bayesian statistics five years ago, one of the first
tutorials I watched was one at PyData New York about Bayesian inference with Gaussian
49
:
00:03:48
processes.
50
:
00:03:50
you know who was teaching it?
51
:
00:03:52
Me.
52
:
00:03:52
Yeah.
53
:
00:03:55
So that's really cool.
54
:
00:03:57
There we go.
55
:
00:03:57
Now five years later, we're here.
56
:
00:03:59
We're here together.
57
:
00:04:00
That is great.
58
:
00:04:01
And an honor to teach with you this afternoon.
59
:
00:04:03
Yeah.
60
:
00:04:03
It's going to be good.
61
:
00:04:04
Yeah.
62
:
00:04:04
I hope so.
63
:
00:04:07
We're taping this before, so that we don't have any feedback yet and nobody can say that it
was bad.
64
:
00:04:14
That's smart.
65
:
00:04:15
That's the French way to do it.
66
:
00:04:17
Good idea.
67
:
00:04:18
Editing.
68
:
00:04:20
Exactly.
69
:
00:04:21
So last time you were here, I think you were still working for the Phillies, right?
70
:
00:04:28
If it was five years ago, it would have been the Yankees.
71
:
00:04:30
Oh, so you were still at the Yankees.
72
:
00:04:32
Okay.
73
:
00:04:33
Lots of things.
74
:
00:04:34
Yeah.
75
:
00:04:35
Um, everybody knows you, you're PyMC's BDFL.
76
:
00:04:40
You've done a lot of, um, marine biology statistics.
77
:
00:04:43
You've done a lot of sports, uh, modeling.
78
:
00:04:48
What I'm curious about is how, like, how did you end up doing sports modeling?
79
:
00:04:54
Because it's a bit like me, right?
80
:
00:04:56
You didn't start doing sports modeling from the get-go and that was like your thing.
81
:
00:05:02
So how did that happen?
82
:
00:05:03
Yeah.
83
:
00:05:04
Yeah, I mean, part of it was timing related, right?
84
:
00:05:10
Because in the sort of late 2010s is really when these new sources of high-frequency
data started becoming available, at least in baseball.
85
:
00:05:26
So you had PITCHf/x and then TrackMan.
86
:
00:05:31
And now we have
87
:
00:05:33
Hawk-Eye, and, you know, this is providing baseball with streams of extremely high-resolution,
comprehensive sources of data on everything that moves on a baseball field.
88
:
00:05:47
So at that point, you know, that industry began looking for people who could deal with
data of that scale and use it in useful machine learning and
89
:
00:06:00
statistical models.
90
:
00:06:03
um,
91
:
00:06:03
So it was no longer the realm of a baseball person with a spreadsheet; it took more than
that.
92
:
00:06:12
so around that same time, I was at Vanderbilt in the biostatistics department, and all of
these jobs started appearing in various places.
93
:
00:06:26
Actually, before that, I started, for a short stint, with the Milwaukee Brewers, just to
kind of
94
:
00:06:33
see what the industry was like, as a consultant, where I didn't have to give up my
tenure-track academic job.
95
:
00:06:42
I never envisioned myself being an academic in the first place.
96
:
00:06:45
So it wasn't like it was a long-term goal of mine, it's just where I ended up.
97
:
00:06:51
So I knew I wouldn't spend my whole career in academia anyway.
98
:
00:06:57
So I think it was just the next chapter and it felt like a natural transition into
99
:
00:07:02
sports, which, I've always been interested in.
100
:
00:07:05
you know, I played sports when I was young, but not particularly well.
101
:
00:07:09
So, the, you know, the analytics side kind of always appealed to me.
102
:
00:07:14
you know, I liked data science.
103
:
00:07:17
and, you know, it was a nice kind of, intersection of my interests and skills.
104
:
00:07:24
so that's kind of how it went about.
105
:
00:07:27
And, you know, it continues. Baseball is still... there are still teams that don't
have
106
:
00:07:32
a lot of quantitative support, although it's more widespread than it once was.
107
:
00:07:37
And then, beyond that, since I've left baseball, you talk to people in other sports
and you see kind of how far behind some of them are relative to baseball.
108
:
00:07:49
So there's still a lot of opportunity out there.
109
:
00:07:52
Yeah, yeah, for sure.
110
:
00:07:53
And were you already a baseball fan or amateur
111
:
00:08:00
before you joined the Yankees?
112
:
00:08:02
Yeah, I followed baseball, you know, my whole life, and I played when I was in high
school.
113
:
00:08:07
So yeah, as an amateur observer, you know, never in any professional capacity.
114
:
00:08:15
You definitely, you know, get the imposter syndrome when you get in there; you go from
nothing to working for a major league baseball team, which doesn't seem right in some
115
:
00:08:25
respect.
116
:
00:08:26
But, you know, you've got the skills
117
:
00:08:29
that they need and so there you are.
118
:
00:08:31
So yeah, it was always great to be able to, you know, go to the ballpark and work
there and hopefully contribute towards the success of the team.
119
:
00:08:42
So, it was fun.
120
:
00:08:44
It was a fun time.
121
:
00:08:45
Yeah.
122
:
00:08:45
And did you find that the, like the Bayesian methods were really helpful there?
123
:
00:08:52
How was your experience with all the, you know, because
124
:
00:08:56
Like, working for a team, what I find also personally, now that I've been there for just a
few months, is that you have all these interactions between the data, the data
125
:
00:09:07
engineers, you, the modeler, and then how the models are used, how they are interpreted.
126
:
00:09:16
So what was your experience here?
127
:
00:09:18
What was maybe the hardest, you know, the biggest hurdle that you had in this workflow?
128
:
00:09:26
And how did you find that Bayesian stats were helpful, if they were at all?
129
:
00:09:33
Yeah, so two parts there.
130
:
00:09:35
First, the Bayesian stuff.
131
:
00:09:37
I think teams know the value of Bayesian methods there. Because while you have large
quantities of data, such that you would think machine learning would be kind of the first stop,
132
:
00:09:51
and certainly a lot of machine learning methods are used.
133
:
00:09:54
You know, those data are still messy. They are naturally hierarchical, clustered, with
covariates and sources of noise. So just throwing them into, you know, an XGBoost model or
134
:
00:10:08
something like that kind of naively isn't always the optimal way to move forward
and get the most out of that data. So Bayesian methods were really kind of a
135
:
00:10:18
match; as is often the case, they're kind of a natural choice.
136
:
00:10:23
With the usual trade-off that, you know, the computational side can be very challenging.
You know, if you're using every pitch thrown over a 10-year period, that's a
137
:
00:10:35
lot of rows, and so, you know, PyMC and Stan will start to struggle. So one of the
challenges was, yeah, getting everything to work with, I wouldn't call it
138
:
00:10:45
big data, but, you know, uncomfortably large data, which is always a challenge. And then,
you know, coming from
139
:
00:10:53
academia slash software, open source software development.
140
:
00:10:57
just, you know, the whole productionization of the model is a challenge, right?
141
:
00:11:01
So getting it up and running quickly, so that it can be used to make decisions, that's
always a big challenge. And, you know, you're never quite finished with
142
:
00:11:14
it.
143
:
00:11:14
It's kind of, you know, knowing when to stop and put something into production, you know,
in a way that's helpful again to the decision makers.
144
:
00:11:22
And that's always challenging.
145
:
00:11:23
And then you've got to work within the framework of the company, which is, you know:
how do they store their data?
146
:
00:11:31
How do they store their models?
147
:
00:11:32
How do they interface the outputs from those models to players, coaches,
and the front office?
148
:
00:11:41
And that's different from one team to the next.
149
:
00:11:44
Yeah.
150
:
00:11:45
And so, precisely, how did you...
151
:
00:11:49
Were you the one communicating these models?
152
:
00:11:54
Or were you just making the models and then other people were communicating them?
153
:
00:12:01
But if you were communicating them, what was your way of doing it so that the decision
makers really could use the model to its full potential?
154
:
00:12:13
At least in the case of the Phillies, it was very transparent.
155
:
00:12:20
It was a research and development team where you had access to the decision makers.
156
:
00:12:24
They were always close by, and it's a large group of analysts there that worked closely
together.
157
:
00:12:32
So everybody knew what everyone else was doing.
158
:
00:12:35
So, the usual way: you'd write reports, you'd productionize a model, and the
results of that would appear on a dashboard for people to access.
159
:
00:12:48
My experience was a fair level of transparency.
160
:
00:12:52
But I've heard stories and I know that in some instances that's not always the case that
your work can be a little bit siloed and isolated and decision makers aren't always using
161
:
00:13:04
that information optimally or at all in some cases.
162
:
00:13:09
So I was fortunate to avoid those sorts of challenges, which I would imagine
163
:
00:13:17
can be a little disheartening: doing, you know, lots of analytic work with interesting data
and then not having anybody put it to use.
164
:
00:13:24
So yeah, we were very fortunate. And, you know, we're seeing that the teams that do
those sorts of things are successful.
165
:
00:13:32
And those that don't, I think, tend to have a harder time.
166
:
00:13:36
They have to get lucky more.
167
:
00:13:38
Yeah.
168
:
00:13:39
And, like in your experience,
169
:
00:13:44
communicating the distributions, was that something useful or were you more into point
estimates, showing the tails of the distribution?
170
:
00:13:55
How did that work?
171
:
00:13:57
Yeah, I mean, at the end of the day, I think point estimates are still important, but it's
good to know how much uncertainty is associated with them.
172
:
00:14:06
that's still a challenge, I think.
173
:
00:14:09
At the end of the day, people want lists to make decisions with.
174
:
00:14:13
It's hard. You're always taught in doing Bayesian inference that, you know,
you carry that whole distribution with you and it's got all the information, but we're
175
:
00:14:23
often still using, you know, point estimates. But, you know, using communication tools,
you can convey some of that uncertainty in ways that
176
:
00:14:34
are interpretable and useful to people.
177
:
00:14:36
You know, you can say that for two players, while their mean is slightly
different,
178
:
00:14:42
there's
179
:
00:14:43
huge overlap in the uncertainty and they're essentially the same player.
180
:
00:14:47
And so you shouldn't sweat over choosing between one or the other and that sort of thing.
181
:
00:14:52
it's still important to have those avenues for communication and making sure that they're
aware of the limitations of the data.
182
:
00:15:01
These are outputs of models and models are wrong.
183
:
00:15:06
yeah, some people either overly trust or
184
:
00:15:12
undertrust a model, and overtrusting it can be just as bad as undertrusting it in
certain situations.
185
:
00:15:20
You know, it's just the output of a model.
186
:
00:15:22
So I think good decision makers are using it as one piece of the toolkit.
They
187
:
00:15:27
don't let the model make decisions for them.
188
:
00:15:30
They use it as a tool to help inform their decisions that involve more than just quantitative
baseball data.
189
:
00:15:38
Yeah.
190
:
00:15:38
Yeah.
191
:
00:15:39
That makes sense.
192
:
00:15:40
And the
193
:
00:15:41
So you were saying that, yeah, basically models are wrong.
194
:
00:15:44
How do you... like, is there a case, you know, in all your work with the teams, where
a model was really hard to develop for you and you really learned something
195
:
00:16:00
from that? Because, I don't know, you discovered a new method that was really useful in that
case and you didn't know about it, or you were taking...
196
:
00:16:10
you were looking at the problem from an angle that was not useful and then changing the
way you were thinking about the problem made the difference.
197
:
00:16:20
Basically, is there a big mistake you made at some point that really made you better as a
modeler?
198
:
00:16:28
I made a lot of mistakes.
199
:
00:16:30
Yeah, lots of mistakes.
200
:
00:16:32
Yeah, the mistakes were more common than the successes, I think.
201
:
00:16:36
And yeah, you learn.
202
:
00:16:37
I learned a ton the whole way through.
203
:
00:16:41
you know, you, you come into it with a whole bunch of biases.
204
:
00:16:46
you know, you want to do things and then sort of an ideal, a idealized Bayesian way and
using base best practices.
205
:
00:16:52
And, you know, you know, when he was Gaussian processes for everything, cause they're cool
and useful and flexible.
206
:
00:16:59
But, you know, sometimes you force it a little bit and you come out with
nonsense, and you've got to go back to using a spline, or, you know...
207
:
00:17:09
or a linear model, and that's fine, right?
208
:
00:17:12
There's no shame.
209
:
00:17:13
So some of the answers that ended up being more effective sometimes were kind of the
hackier approaches that were a bit of a compromise, but they either ran more efficiently
210
:
00:17:27
or they produced estimates that were, there was always a sniff test, right?
211
:
00:17:31
You'd present the results of a model and then,
212
:
00:17:37
the front office and the others that were making decisions would say, well, this doesn't make
any sense.
213
:
00:17:42
What's going on here?
214
:
00:17:44
And that's where you learn, right?
215
:
00:17:45
Where your model is wrong is where you learn, and you make it better.
216
:
00:17:49
So it was continual. You know, it's a really nice example of iteration and kind of a
continual model expansion, or sometimes model contraction, simpler
217
:
00:18:01
rather than more complicated.
218
:
00:18:03
But yeah, lots of valuable lessons.
219
:
00:18:06
Yeah, I can definitely resonate with everything you just said.
220
:
00:18:10
From my limited experience for now, I've seen both. Sometimes the
Bayesian workflow as you see it on the poster really works awesome,
221
:
00:18:23
and you can increase the complexity kind of linearly.
222
:
00:18:31
But then at some point, so model expansion, as you were saying, but then at some point you
hit a wall and you're like...
223
:
00:18:37
that doesn't work at all.
224
:
00:18:38
Like you don't know why.
225
:
00:18:39
And then you try to sample, and everything breaks, and so on.
226
:
00:18:43
And then you're like, okay, what's happening?
227
:
00:18:46
And then you have to basically do model contraction, as you were saying, where
you're like,
228
:
00:18:53
I need to remove parts and see what's breaking.
229
:
00:18:58
And also something I find can be costly, you know, uh, intellectually it's like, yeah, I
know I'm making the model.
230
:
00:19:06
less good if I remove that stuff.
231
:
00:19:09
And that can be weird, right?
232
:
00:19:11
You're like, you know, I don't want to remove that because I know mathematically it's way
better to have that in the model.
233
:
00:19:18
then, yeah, on paper, but then the model doesn't sample, dude.
234
:
00:19:22
So, you know, you have to remove something.
235
:
00:19:25
I find that sometimes, you know, like heart-crushing.
236
:
00:19:30
Yeah, mean, it's never a good idea to...
237
:
00:19:33
hold on too tightly to any particular model or to fall in love with a model.
238
:
00:19:40
And I grew up in my educational career, sort of embracing model selection and model
uncertainty and the value of having more than one model.
239
:
00:19:53
And I think that's very important because you sometimes will over invest in a model,
you'll hold on too tight to it and you should really
240
:
00:20:03
hold them very loosely and be prepared to start over.
241
:
00:20:06
Sometimes you just have to start over again.
242
:
00:20:08
It's like the Concorde fallacy, right?
243
:
00:20:10
Just because you've invested all the time doesn't mean you should invest more.
244
:
00:20:14
It might be time to trash it and start over again with something similar or just make it
simpler or strip out your favorite piece that makes it really cool.
245
:
00:20:25
It doesn't work as well.
246
:
00:20:26
Yeah, kill your darlings, right?
247
:
00:20:28
Kill your darlings.
248
:
00:20:30
Yeah, like writers talk a lot about that.
249
:
00:20:33
But yeah, I mean, yeah, the sunk cost fallacy, basically.
250
:
00:20:37
And I find something that's helpful for that workflow is Bambi.
251
:
00:20:42
If you work with PyMC. I'm using that more and more because it's just super easy
and fast to spin up a model.
252
:
00:20:51
Like, especially now that there are splines, there is HSGP.
253
:
00:20:56
I'm using more and more structural time series.
254
:
00:20:58
So I was talking to Tomás Capretto, and I was like,
255
:
00:21:02
I'm probably going to try and add some structural time-series stuff in there, because,
like, in PyMC you can do everything, but it's very custom.
256
:
00:21:11
And then the sunk cost fallacy can be higher, I find, at least for me, because it
takes longer to build the model.
257
:
00:21:18
So then you're like, but I don't want to throw all that structure away and then start
over.
258
:
00:21:23
Well, at some point you'll be able to ask Claude to do it and they'll just write you five
different models and just run them.
259
:
00:21:30
But we're not there yet.
260
:
00:21:31
No, if anybody's ever tried to build PyMC models with ChatGPT, that doesn't go so
well. Yeah, don't do that at home. But yeah, I wonder what that would look like in Bambi, the
261
:
00:21:42
structural time series model: at what point do you sort of outstrip Bambi's ability
to keep things simple?
262
:
00:21:51
Yeah, well, yeah, with powerful models.
263
:
00:21:53
Yeah
264
:
00:21:54
That's definitely, I'm always taking Bambi to the limit.
265
:
00:21:56
So I think Tommy hates me because I'm breaking Bambi all the time.
266
:
00:21:59
So he's told me that he hates you.
267
:
00:22:01
Yeah.
268
:
00:22:02
yeah.
269
:
00:22:02
Okay.
270
:
00:22:03
So that squares with his behavior with me.
271
:
00:22:07
But yeah, like, so I'm always messaging him like, hey, I think I found a bug.
272
:
00:22:11
Blah, blah.
273
:
00:22:11
I think yesterday I found a bug again.
274
:
00:22:14
And so I think one time I sent him the model I was trying to fit and he was like, you
know, when I'm developing this stuff for Bambi, I never think that
275
:
00:22:23
people are going to build such complicated models.
276
:
00:22:26
Like I'm doing that for like just a simple model.
277
:
00:22:30
And yeah, but that's super helpful because then you can like spin up a lot of models.
278
:
00:22:35
You can do model comparison.
279
:
00:22:37
Then once you have your kind of final model, you can spin up the PyMC model if
you want, and then really customize everything so that sampling is way smoother.
280
:
00:22:48
But for development, you don't need to have, you know, the perfect model.
281
:
00:22:53
with no divergences and a perfect effective sample size.
282
:
00:22:58
If you don't have 1,000 divergences, you know you're already going in a good direction.
283
:
00:23:03
Yeah, or you can start by not doing MCMC sampling from the get-go.
284
:
00:23:09
You can just use find_MAP or ADVI and get approximate answers in a shorter period of time.
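For reference, a minimal sketch of that shortcut, assuming PyMC 5; the model and data here are made up purely for illustration:

```python
import numpy as np
import pymc as pm

# Toy data, for illustration only
rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = 1.5 * x + rng.normal(scale=0.5, size=200)

with pm.Model() as model:
    beta = pm.Normal("beta", 0, 1)
    sigma = pm.HalfNormal("sigma", 1)
    pm.Normal("y", mu=beta * x, sigma=sigma, observed=y)

    # Cheap first passes while iterating on model structure:
    map_estimate = pm.find_MAP()              # point estimates in seconds
    approx = pm.fit(n=20_000, method="advi")  # mean-field variational fit
    idata_approx = approx.sample(1_000)       # approximate posterior draws

    # Only once the structure looks sane, pay for full MCMC:
    # idata = pm.sample()
```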
285
:
00:23:16
Because for me, building the models didn't take long.
286
:
00:23:20
Yeah.
287
:
00:23:20
Waiting for them to finish running.
288
:
00:23:22
Exactly.
289
:
00:23:22
Yeah.
290
:
00:23:22
Yeah.
291
:
00:23:23
Yeah.
292
:
00:23:23
Same for me.
293
:
00:23:24
Same for me.
294
:
00:23:25
And yeah, that's actually a good point that I personally don't use enough, but I should
and I will now that you mentioned it, but yeah.
295
:
00:23:31
So basically in the modeling process, in the development process, you would use a lot of
just find_MAP, or even ADVI because that gives you a full distribution.
296
:
00:23:40
And then you check if the parameters are in the right direction.
297
:
00:23:44
Yeah.
298
:
00:23:45
And then if that's validated, then okay.
299
:
00:23:47
Well, then let's build, you know, the full.
300
:
00:23:50
Yeah.
301
:
00:23:50
The full production run.
302
:
00:23:51
The full production stuff.
303
:
00:23:52
Yeah.
304
:
00:23:52
It's like building a crappy car and seeing how far it can go.
305
:
00:23:57
And then yeah, let's build the whole Aston Martin then.
306
:
00:24:01
Yeah, that's pretty good.
307
:
00:24:02
I like that.
308
:
00:24:05
Also something I found more and more useful personally, even before using the
real data, especially because, I mean, the data at the Phillies, I think, were also
309
:
00:24:19
the same: really huge.
310
:
00:24:22
so that's a lot of time, as you were saying, waiting for the model.
311
:
00:24:26
so even before that, to validate the structure of the model, what I do more and more is
simulate fake data.
312
:
00:24:33
That, Claude can do pretty well.
313
:
00:24:36
So, the recovery...
314
:
00:24:38
And then use the fake data to do the parameter recovery for the model, and simulation-based
calibration, all that stuff.
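A sketch of that fake-data workflow, with a deliberately simple, made-up data-generating process: simulate with known parameters, fit the model to the simulated data, and check that the posterior recovers the truth.

```python
import numpy as np
import pymc as pm
import arviz as az

# Known parameters we will try to recover
true_alpha, true_beta, true_sigma = 2.0, -0.7, 0.3

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=500)
y_fake = true_alpha + true_beta * x + rng.normal(0, true_sigma, size=500)

with pm.Model() as model:
    alpha = pm.Normal("alpha", 0, 2)
    beta = pm.Normal("beta", 0, 2)
    sigma = pm.HalfNormal("sigma", 1)
    pm.Normal("y", mu=alpha + beta * x, sigma=sigma, observed=y_fake)
    idata = pm.sample(random_seed=2)

# Do the posterior intervals cover the known true values?
print(az.summary(idata, var_names=["alpha", "beta", "sigma"]))
```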
315
:
00:24:47
Once you validate that model, well, at least you know that the structure is good.
316
:
00:24:52
If your data actually follow the structure, that's a big caveat.
317
:
00:24:56
That's a big assumption, but at least then I find that when I go to the real data and I
fit the model, when I get problems, I know that it's not really because the structure
318
:
00:25:10
of the model is wrong.
319
:
00:25:11
And that helps quite a lot because now it's
320
:
00:25:15
You can debug the model in a way that's like: so then probably it's because I need to
use the non-centered parameterization here instead of the centered one, or the priors
321
:
00:25:24
maybe are too tight or something like that.
322
:
00:25:28
But it's not like, maybe I'm completely wrong and I should use another structure.
323
:
00:25:34
And that kind of limits the questions you have.
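For reference, the reparameterization Alex mentions, as a minimal sketch with illustrative names; the two forms are mathematically equivalent, but the non-centered one usually samples much better when the group-level scale is weakly identified.

```python
import pymc as pm

# Centered parameterization: group effects drawn directly around mu.
# Prone to divergences (the "funnel") when sigma is poorly informed.
with pm.Model() as centered:
    mu = pm.Normal("mu", 0, 1)
    sigma = pm.HalfNormal("sigma", 1)
    theta = pm.Normal("theta", mu=mu, sigma=sigma, shape=10)

# Non-centered parameterization: sample standardized offsets, then
# shift and scale them, which flattens the funnel geometry.
with pm.Model() as noncentered:
    mu = pm.Normal("mu", 0, 1)
    sigma = pm.HalfNormal("sigma", 1)
    z = pm.Normal("z", 0, 1, shape=10)
    theta = pm.Deterministic("theta", mu + sigma * z)
```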
324
:
00:25:39
Can you get Claude to...
325
:
00:25:42
create data that looks like the data that you're actually using?
326
:
00:25:46
Or in my experience, when I do that, it gives me kind of lousy data.
327
:
00:25:51
I wonder if you can get it to look at a data set and say, give me a similar data set with known
parameters; if you've tried that or not.
328
:
00:26:00
Oh, no, I've never tried that, but I will.
329
:
00:26:02
That's very fun.
330
:
00:26:03
No, usually I describe the data generating process I want.
331
:
00:26:12
I modify the code, because there is always something that it didn't get, like it's not
doing exactly what I want, but at least I have the boilerplate and I can just fine
332
:
00:26:21
tune the parameters and so on.
333
:
00:26:24
But yeah, I'll try that next time.
334
:
00:26:26
That's definitely cool.
335
:
00:26:27
Yeah.
336
:
00:26:27
I have no idea if it works, but yeah, I've underutilized that so far, but I'm sure it's
coming.
337
:
00:26:35
Actually, Tomás Capretto has a very new blog post about that, how to do data simulation with
PyMC.
338
:
00:26:40
So I'll link to that in the show notes because that's very useful, especially for
simulation-based calibration, that's super helpful.
339
:
00:26:47
yeah, definitely we'll do that.
340
:
00:26:51
So yeah, actually, you already mentioned splines, Gaussian processes, stuff like that.
341
:
00:26:56
And I know when we do teachings and so on, and even myself sometimes I'm like,
342
:
00:27:04
I can get into kind of an analysis paralysis, you know, where, like, well, I don't know, that
kind of data could fit so many structures and models.
343
:
00:27:13
I don't even know where to start.
344
:
00:27:15
You know, so how do you handle that right now in your current job with PyMC
Labs, or even when you were working with the Phillies and Yankees? How do you go
345
:
00:27:26
about starting these cases?
346
:
00:27:29
There's two approaches. You can start
347
:
00:27:35
with a simple linear model and then see where things deviate from linear.
348
:
00:27:40
So start simple and work your way to more complex.
349
:
00:27:43
But GPs are always a good starting point because you're not constrained to any functional
form.
350
:
00:27:50
And you can go in the opposite direction, start with a very flexible GP prior.
351
:
00:27:56
And if things look simpler, you can remove the GP and put something in that's easier to
compute.
352
:
00:28:03
But GPs are hard to beat there because you essentially let the data decide.
353
:
00:28:15
There is some skill associated there, because setting priors with GPs can be a little
trickier than for a regression model.
354
:
00:28:23
so buyer beware.
355
:
00:28:27
And given how easy they are now to set up, and the fact that we have HSGPs
356
:
00:28:32
that work almost like linear models anyway.
357
:
00:28:35
I think those are pretty attractive as kind of a starting place.
358
:
00:28:40
Yeah.
359
:
00:28:41
I mean, HSGPs have really changed the game for me, at least for GPs.
360
:
00:28:48
That's also really what I like about GPs is that you have very flexible functional form.
361
:
00:28:55
And at the same time, you can put a lot of domain knowledge in the priors.
362
:
00:29:00
Especially because the length scale and the amplitude are interpretable most of the time
for your use case.
363
:
00:29:08
The tricky part is when you're using an inverse link function in the model; then you have
to do the math to convert the amplitude and length scale, but that's not too hard.
364
:
00:29:21
And so you can put domain knowledge here and then you let the data guide the functional
form, which is like, yeah.
365
:
00:29:29
really powerful and quite hard to beat.
366
:
00:29:34
And also with HSGP now, it's just way faster and easier to fit, like even on really big
data, which, I mean, I've done already.
367
:
00:29:44
Yeah.
368
:
00:29:44
The only thing it's missing is of course, you don't have all of the kernels available to
you and doing multiplicative and additive kernels is challenging.
369
:
00:29:53
So that will evolve.
370
:
00:29:55
Yeah.
371
:
00:29:57
I mean, even with PyMC you can already do the additive kernels.
372
:
00:30:00
So, like, we had a tutorial up on the PyMC website that I co-wrote with Bill Engels.
373
:
00:30:07
And so in the advanced use cases...
374
:
00:30:10
Additive kernel.
375
:
00:30:11
We show people how to do the additive kernel and also hierarchical HSGP.
376
:
00:30:15
So that's, that's pretty cool.
377
:
00:30:18
You have to be careful, because the covariances in PyMC are not vectorized for
378
:
00:30:26
the number of GPs you have.
379
:
00:30:29
So if you want one covariance per GP, you have to write the power spectral density by
hand, which is a bit challenging.
380
:
00:30:39
But something I have in mind is trying to do a PR on PyMC to basically vectorize
the covariance in that dimension, so that you can just have the covariance shaped and broadcast
381
:
00:30:51
automatically.
382
:
00:30:52
But the thing is you have to
383
:
00:30:54
account for the power spectral density, which changes for each kernel.
384
:
00:30:58
That's, I'm guessing, why Bill didn't do that out of the box, because it takes
time.
385
:
00:31:03
Could be.
386
:
00:31:04
Yeah.
387
:
00:31:06
And I mean, talking about GPs.
388
:
00:31:08
So, well, I'll try to release this episode once the PyData videos are out, so that
we can put our tutorial in the show notes, so that people who are really, like,
389
:
00:31:21
interested in GPs and HSGPs can check that out.
390
:
00:31:26
Something also I think is interesting to talk about is, since we're talking about time
series, structural time series, right?
391
:
00:31:35
Because I had Jesse Grabowski on the show the other day and he's really a structural time
series guy.
392
:
00:31:40
He's like absolutely amazing with that.
393
:
00:31:42
Really a big time series wizard.
394
:
00:31:46
He did, like, incredible work on the pymc-experimental side with the state space module.
395
:
00:31:54
How do you balance these two?
396
:
00:31:57
In my experience, you're getting the structural time series with GPs too.
397
:
00:32:02
So why would you even want or need a structural time series?
398
:
00:32:09
I mean, for me, structural time series is all about separating the concern of projecting
quantities of interest from the observation process.
399
:
00:32:20
So that you're not wasting time projecting noisy variables forward.
400
:
00:32:27
So you just have these two linked but independent processes, whereby, in baseball,
for example, you can have a model for whatever contaminates your observations.
401
:
00:32:41
It could be issues with a Hawk-Eye sensor, or any of the many ways that
402
:
00:32:49
baseball data can be contaminated by the observation process.
403
:
00:32:54
You can separate that from modeling the changes in the underlying variables that
you're interested in.
404
:
00:33:02
So as much as possible, you're projecting signal forward because projection is hard enough
as it is.
405
:
00:33:09
if you can remove as much of that noise as possible at every step.
406
:
00:33:16
And so yeah, as you say, you can use a GP for that.
407
:
00:33:19
uh, projection piece of it.
408
:
00:33:21
And it's very handy having, say, multiple length scales, so that you can
model how the process changes at potentially different timescales.
409
:
00:33:30
Yeah.
410
:
00:33:31
You know, like in baseball, you can have within-season variation; you know, as the season
progresses, you get changes in, you know, pitchers' velocity and so forth.
411
:
00:33:40
And then there's changes kind of from season to season and there's kind of, you know,
short to mid to long-term career variation and you can do all of that.
412
:
00:33:49
within the context of a GP.
413
:
00:33:51
And then it's nice to have these latent GPs, like the Hilbert space ones, that you
can put arbitrary likelihoods on and deal with skews and multimodal stuff.
414
:
00:34:08
Yeah, definitely.
415
:
00:34:09
I mean, and that's something we'll show this afternoon, basically, where we'll
fit a model with three GPs. We'll use soccer data because, well...
416
:
00:34:19
It's the best data. Because it's great, exactly.
417
:
00:34:22
but yeah, we'll have a short-term GP, medium-term and long-term where the long-term would
be like an aging curve.
418
:
00:34:30
It's actually interesting to see the GP pick up that parabola-ish shape of the aging curve
of the players.
419
:
00:34:37
You can also definitely see the survivor bias in this because like if I...
420
:
00:34:43
which we'll do this afternoon, we'll fit the GP on a subset of the players just for
efficiency.
421
:
00:34:49
And so you can see that if you do that on the whole dataset, the aging curve will look a
bit different.
422
:
00:34:56
If you do this on the subset of the players, then you'll see that the aging curve picks
up a bit at the right, which is wrong, right?
423
:
00:35:05
The player-who-is-still-playing-at-40 effect.
424
:
00:35:07
Exactly.
425
:
00:35:07
It's like the Messi and Ronaldo effect, basically.
426
:
00:35:11
Because I have them in the subset that we'll fit.
427
:
00:35:15
that's funny.
428
:
00:35:17
So basically you could do that.
429
:
00:35:18
That would be kind of the way to do a structural time series with GPs.
430
:
00:35:21
The aging curve would be the trend, I guess.
431
:
00:35:26
Then you have the medium-term GP, which is within-season effects.
432
:
00:35:32
And then you have your short term covariance kernel, which would basically just pick up
the noise.
433
:
00:35:41
which could be the equivalent of doing kind of a step linear trend in a parametric time
series model where you add some autoregressive component for the residual.
434
:
00:35:53
Yeah, and you also don't necessarily have to estimate the length scales.
435
:
00:35:57
It can be a design decision.
436
:
00:35:59
Where you say: I'm interested in... this part of the GP, or this GP, is specifically for
437
:
00:36:06
modeling within-season variation, for example, and you set a length scale that's appropriate
for that, and then you have another length scale for longer-term stuff. And because length
438
:
00:36:15
scales can be hard to estimate properly anyway without highly informative priors.
439
:
00:36:20
So yeah, so if you can skip that then all the better and and again, there's something you
can change It's it's sort of a lever that you have to pull.
440
:
00:36:29
Yeah need to yeah.
441
:
00:36:30
Yeah.
442
:
00:36:30
No, it's a very good point, because, yeah, in my experience, length scales, I think...
443
:
00:36:36
Like I must have done one GP model where the length scale was learned, but most of the
time it's basically the prior.
444
:
00:36:44
Yeah.
445
:
00:36:44
I think the posterior is very flat across a lot of the interesting length scale
values.
446
:
00:36:51
Yeah.
447
:
00:36:51
So, I think, I remember I was talking with Bill about that, and I think he told me that
it's also because there is a non-identifiability with the amplitude.
448
:
00:37:01
And so basically you can only learn one of the two.
449
:
00:37:06
And I don't remember why.
450
:
00:37:08
I think it's a multiplicative non-identifiability.
451
:
00:37:10
Maybe you remember better than me.
452
:
00:37:13
I don't, but that makes sense.
453
:
00:37:15
Yeah.
454
:
00:37:15
So basically very cool models.
455
:
00:37:17
And I definitely encourage people to check out the GP video.
456
:
00:37:22
Now, like, so we've talked about baseball and sports quite a bit.
:
00:37:25
So, like, well, let's transition out, because I don't want to, you know, have the whole episode
about that.
458
:
00:37:31
I would, but I'm guessing some listeners are like, okay,
459
:
00:37:34
enough with the sports stuff. So you're not working with the Phillies anymore, actually.
460
:
00:37:43
Now you're working full-time with PyMC Labs, which I take a bit personally, I have to say,
because you joined PyMC Labs right after I left to work with the Marlins.
461
:
00:37:56
It's not an accident.
462
:
00:37:57
So yeah, I'm guessing, you know, like probably Tommy Capretto told you to do that.
463
:
00:38:02
Quick, get out.
464
:
00:38:04
Get out, Alex is getting into baseball.
465
:
00:38:06
Chris, come here, come here.
466
:
00:38:09
So yeah, basically, what are you up to these days now, Chris?
467
:
00:38:13
Yeah, living the dream, right?
468
:
00:38:15
As Thomas would say, we always wanted to have PyMC as our full-time job.
469
:
00:38:23
I wanted to see what that was like.
470
:
00:38:27
It does give me more time to work on PyMC-related issues, obviously; the two are separate,
but
471
:
00:38:34
correlated.
472
:
00:38:36
So yeah, I do get a lot more time to spend on that sort of stuff, and just learning,
you know, a lot more of kind of the business of statistical consulting and
473
:
00:38:51
productionizing PyMC models, and just seeing it in different contexts.
474
:
00:38:56
I mean, I would like to, you know, part of it is
475
:
00:38:59
a desire to do some business development on the sports analytics side and looking at other
sports, maybe outside of baseball.
476
:
00:39:05
And we've talked to a few potential clients there.
477
:
00:39:09
And that's all new to me too, like sitting on sales calls and things like that.
478
:
00:39:14
I'm not sure it's something I'm particularly good at, but it's always good to learn.
479
:
00:39:17
Yeah, it's great to break out of the baseball bubble, at least for a little bit, and see what
the rest of the world is like.
480
:
00:39:26
so it's great.
481
:
00:39:27
And as you know, it's a great bunch of scientists at PyMC Labs, and everybody is way
smarter than me, and everybody's very nice and congenial.
482
:
00:39:37
So it's just a nice place to work.
483
:
00:39:41
So it's great, as far as I'm concerned.
484
:
00:39:47
So yeah, doing lots of interesting projects, things I never thought I would work
on.
485
:
00:39:54
So that's always the interesting part, having a wide variety of clients. And, you
know, there's a lot of opportunity; you have the
486
:
00:40:04
opportunity to learn, and you need to not be afraid of the imposter syndrome. And, you know,
I can work on modeling bonds or something like that.
487
:
00:40:14
So even though I'm not an expert on it, I can still contribute in a meaningful way.
488
:
00:40:19
So it's nice.
489
:
00:40:19
It's always good to learn.
490
:
00:40:21
Like I'm definitely a lifetime learner and I find I get bored if I'm not learning.
491
:
00:40:27
And it's hard to, I think it would be hard to get bored doing this.
492
:
00:40:32
no, definitely.
493
:
00:40:32
I mean, completely agree.
494
:
00:40:33
It's like, yeah, a great bunch of folks over there, and working with them for the last few
years has been absolutely amazing, and a great environment.
495
:
00:40:48
Only positive things to say about that.
496
:
00:40:50
Yeah.
497
:
00:40:50
And, you know, we deliver a lot of workshops and tutorials and I like doing those.
498
:
00:40:55
That's kind of the aspect of teaching that I enjoy from my time at Vanderbilt as a
professor.
499
:
00:41:02
You know, you're not grading papers, but you're, you know, you're engaging interesting
people who are keen to learn how to, you know, apply Bayesian methods better or whatever
500
:
00:41:12
the topic happens to be.
501
:
00:41:14
So doing, you know, doing a fair bit of
502
:
00:41:16
teaching as well, which is great.
503
:
00:41:19
And then having some time also, with my PyMC BDFL hat on, to try to apply for grants to
help us increase the rate of development and support and sustainability for the project in
504
:
00:41:37
general.
505
:
00:41:37
Yeah, PyMC Labs affords us the opportunity to do that.
506
:
00:41:44
Very synergistic.
507
:
00:41:46
activities.
508
:
00:41:49
And so you were talking about being a lifelong learner, which I guess like all the
listeners identify with.
509
:
00:41:57
What's something you're learning these days, you know, technically? Like, is there a method
you're really interested in, something you're really curious about, and you're like, yeah,
510
:
00:42:07
I've always been curious about that, but I don't know how that works.
511
:
00:42:10
Let me check that out.
512
:
00:42:12
Well, I'm trying to learn PyTensor a little bit better.
513
:
00:42:14
I always knew it kind of on a very superficial level.
514
:
00:42:20
It's nice to be... PyMC's back end?
515
:
00:42:22
PyMC's computational back end.
516
:
00:42:24
But more specifically on the methodological side, digging into newer variational inference
methods.
517
:
00:42:31
I think it's a hot topic of research; with sort of the performance constraints of MCMC,
518
:
00:42:40
variational inference becomes attractive, and we used it a lot in baseball. But with some of
the sort of default methods these days, there are compromises and trade-offs. And so,
519
:
00:42:54
looking into things like Pathfinder and normalizing flows and things like that, digging
into some of that.
520
:
00:43:01
It's been really interesting.
521
:
00:43:03
We've got a Google Summer of Code student, Michael, who's just finishing up
522
:
00:43:10
implementing Pathfinder for PyMC.
523
:
00:43:12
So we're working on getting that going.
524
:
00:43:15
So I'm looking forward to kind of improving that.
525
:
00:43:17
I think the VI side of PyMC has been neglected for a long time.
526
:
00:43:22
Max did a great job way back when, and there hasn't been as much activity on that side of
things.
527
:
00:43:29
I mean, I guess it's also because we have these new algorithms coming here, which are
really, really good.
528
:
00:43:37
And so that means...
529
:
00:43:40
We should probably be able to use Pathfinder directly in a PyMC model in a few months.
530
:
00:43:46
Who knows?
531
:
00:43:46
Yeah, it's early days still; it's sort of in full R&D mode right now.
532
:
00:43:57
So we're not quite sure where it is yet.
533
:
00:44:01
We've got some ongoing work
534
:
00:44:05
getting it implemented into PyMC, and not just relying on the BlackJAX one that we
currently had, which was kind of underutilized, and seeing how PyTensor might be able to
535
:
00:44:15
speed that up.
536
:
00:44:18
And then, in parallel with that, looking at normalizing flows. I think Adrian's been doing some
stuff on the nutpie side, making normalizing flows easier to do.
537
:
00:44:30
So hopefully all of this will fall into place kind of at the same time.
538
:
00:44:34
And you'll be able to use that to make VI better.
539
:
00:44:37
Yeah.
540
:
00:44:37
And the idea here would be, like always in PyMC, right?
541
:
00:44:40
That would work out of the box.
542
:
00:44:41
So instead of doing pm.sample, you do pm.sample with sampler equals Pathfinder, or...
543
:
00:44:47
pm.fit is what it would be.
544
:
00:44:49
yeah.
545
:
00:44:49
So pm.fit with Pathfinder or normalizing flows.
546
:
00:44:53
And then as usual, you get back your inference data object with the posterior
distributions.
547
:
00:44:58
Is that what...
548
:
00:44:59
that would look like? Yeah, I mean, it's a bit mysterious.
549
:
00:45:04
It was sort of working, not working, for unknown reasons.
550
:
00:45:07
Just the simple eight schools model was not running properly, but it would work on other
models.
551
:
00:45:12
so Michael's done a good job of kind of digging in to see where things might not be
working.
552
:
00:45:20
It currently runs a lot faster in Stan than PyMC.
553
:
00:45:24
And why is that the case?
554
:
00:45:25
So again, it's sort of in an R&D
555
:
00:45:29
place right now and hopefully in the coming months you'll see new functionality appear.
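For listeners who want to follow along, roughly what that interface looks like on the eight schools model discussed above. The Pathfinder entry point lives in pymc-extras (formerly pymc-experimental) and was still R&D at the time of recording, so treat the exact call as an assumption that may change between versions:

```python
import numpy as np
import pymc as pm
import pymc_extras as pmx  # formerly pymc_experimental

# The classic eight schools data
y = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])

with pm.Model() as model:
    mu = pm.Normal("mu", 0, 5)
    tau = pm.HalfNormal("tau", 5)
    theta = pm.Normal("theta", mu=mu, sigma=tau, shape=8)
    pm.Normal("obs", mu=theta, sigma=sigma, observed=y)

    # Full MCMC baseline:
    # idata = pm.sample()

    # Pathfinder variational inference (API may change; this mirrors
    # the pymc-extras entry point at the time of writing):
    idata_pf = pmx.fit(method="pathfinder")
```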
556
:
00:45:35
Yeah.
557
:
00:45:36
Super exciting.
558
:
00:45:37
Yeah.
559
:
00:45:37
I'll definitely use that because as you were saying, that's useful in baseball.
560
:
00:45:41
So I'll definitely do that.
561
:
00:45:43
Something in the meantime that interested listeners can already use: Bambi is plugged
into Bayeux, which is calling Colin Carroll's implementation of...
562
:
00:45:58
normalizing flows.
563
:
00:45:59
I don't remember which algorithm it is, but you have a bunch of algorithms available with
Bayeux.
564
:
00:46:05
I think normalizing flows, but not Pathfinder or the other way around.
565
:
00:46:09
Anyways, if you go to the Bambi website, I'll put that in the show notes.
566
:
00:46:12
There is a notebook demonstrating the alternative samplers, and you can already use
normalizing flows with a Bambi model through Bayeux.
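A sketch of that Bambi route; the listing helper and the method strings below are assumptions, so check Bambi's alternative-samplers notebook and what your installed versions actually expose:

```python
import bambi as bmb
import numpy as np
import pandas as pd

# Made-up data for illustration
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=200)})
df["y"] = 2 * df["x"] + rng.normal(size=200)

model = bmb.Model("y ~ x", df)

# List the inference methods Bambi exposes, including the bayeux-backed
# ones (assumed helper; see Bambi's alternative-samplers notebook):
print(bmb.inference_methods.names)

# Default PyMC NUTS:
idata = model.fit()

# A bayeux-backed alternative; the method name is an assumption:
idata_bx = model.fit(inference_method="blackjax_nuts")
```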
567
:
00:46:20
Very cool.
568
:
00:46:21
Very cool.
569
:
00:46:22
I need to dig into Bayeux.
570
:
00:46:24
Haven't looked at it in great detail.
571
:
00:46:26
We had normalizing flows in PyMC3 for a time, but they didn't perform very reliably.
572
:
00:46:32
We ended up getting rid of them.
573
:
00:46:34
So hopefully they will be more robust this time.
574
:
00:46:38
Yeah.
575
:
00:46:39
Damn.
576
:
00:46:39
Yeah.
577
:
00:46:40
It's so exciting to have all this different stuff, and also being able to plug into
other packages, using the best practices from somewhere, and you just plug it in.
578
:
00:46:50
That's incredible.
579
:
00:46:51
Yeah.
580
:
00:46:51
That's where we really want to...
581
:
00:46:55
get out there and get some grants, some R&D grants, to accelerate some of this stuff.
Because it's really helped in the past with Google Summer of Code students, having kind
582
:
00:47:07
of their time protected so that they can, because the core developers spend a lot of their
time just triaging bugs and making the current functionality work and work better.
583
:
00:47:20
It's good to have resources to help us do
584
:
00:47:25
innovation and new functionality.
585
:
00:47:28
so hopefully we can get some funding to help us do that.
586
:
00:47:32
Yeah.
587
:
00:47:33
Yeah.
588
:
00:47:33
So folks, you've heard it.
589
:
00:47:35
Like, if you have some free time to help us with that, you know, contact Chris or myself, or
if you know about a grant, or if you even have money and just want to give us money, you
590
:
00:47:45
know, well, it's just, we're here.
591
:
00:47:46
That's right.
592
:
00:47:47
Well, that's why we like doing PyData meetings and hackathons and code sprints, because it
brings new
593
:
00:47:54
talent into the fold. Some of the attendees don't really even know what Bayesian
methods are or what PyMC is, and they come and hack on the code for a little bit, and you
594
:
00:48:05
never know where the next Bill Engels or next Maxime will come from.
595
:
00:48:12
Yeah, definitely.
596
:
00:48:14
That's true.
597
:
00:48:16
And actually, what's your... so that's, like, the short-term vision for PyMC.
598
:
00:48:22
What's your medium-term vision for the package? Is there something in particular you'd
like to see in there, something you'd like to change?
599
:
00:48:31
Well, that's a good question, because one of the things I'm trying to do now is come up with a
draft roadmap. And my vision doesn't really matter.
600
:
00:48:38
Like I'm the BDFL, but this is very much a community driven project.
601
:
00:48:41
So I'm interested in what the core developers want to see and what the community, larger
community wants to see out of the package.
602
:
00:48:50
And I know a lot of the core development team barely has time to stop and think about
these sorts of things because of the immediate issues that are being addressed.
603
:
00:49:01
I think that's a question we'll have to answer.
604
:
00:49:03
What do we want to see PyMC do in the future, as a larger group?
605
:
00:49:09
So, a lot of the things that we've already talked about: improvements to variational
inference and GPUs,
606
:
00:49:18
making it easier to run stuff on GPUs, getting nutpie closer and closer to the rest of PyMC,
and making that easier for people to use.
607
:
00:49:28
We have a lot of issues with getting various compilers to work, which makes it hard to
install PyMC using pip, things like that.
608
:
00:49:36
So maybe getting rid of the C backend once and for all, and relying more on the
newer backends as kind of defaults.
609
:
00:49:44
Those are the things off the top of my head.
610
:
00:49:48
It's a long list, and it's a matter of prioritizing them and getting the resources and the
people available to work on them.
611
:
00:49:56
Yeah, as always.
612
:
00:49:59
So to close this out, then, we're going to have to leave to teach the tutorial in a few
minutes, but I'm curious, now that you've started to work with Labs and talking to
613
:
00:50:12
different clients in different industries.
614
:
00:50:16
What has been your biggest surprise?
615
:
00:50:20
You know something you weren't expecting to see or a use case?
616
:
00:50:25
Biggest surprise?
617
:
00:50:28
Yeah, I don't know.
618
:
00:50:30
My biggest surprise is kind of the number of people that know about Bayesian methods and
want to use Bayesian methods that don't
619
:
00:50:43
often have the resources to do them, which is why they come to PyMC Labs.
620
:
00:50:47
So we get so much of our business, not from us going out and knocking on doors, but people
coming to us.
621
:
00:50:54
so I've been pleasantly surprised at kind of how well things have gone and the fact that
we have no shortage of business.
622
:
00:51:02
And that reflects kind of a desire from the analytics community in a variety of different
623
:
00:51:13
settings to use these methods.
624
:
00:51:17
And I think it's kind of been helped along by things like PyMC-Marketing and CausalPy,
some of the specific packages that PyMC Labs has built to, maybe,
625
:
00:51:29
make it more obvious to potential users how valuable it is.
626
:
00:51:33
Yeah, maybe the biggest surprise for me is kind of how well known it's become. It's gone
from being a niche thing, you know, where you had trouble getting
627
:
00:51:43
papers published, academic papers published, if they were Bayesian, without p-values, to what
seems like the entirety of industry wanting to use these methods to help them sell
628
:
00:51:56
products or stocks or whatever it is.
629
:
00:52:01
So yeah, I've been pleasantly surprised at the scope and the breadth of the, I guess you'd
630
:
00:52:11
just call it the community and industry growing the way that it is.
631
:
00:52:15
Yeah.
632
:
00:52:16
It's a very good point.
633
:
00:52:17
Like, I remember back in 2017, when I started learning, that was really a niche thing, where
you had to justify why you were, you know, using Bayesian methods.
634
:
00:52:27
And now I feel like, you know, the fight is basically won, where it's like, you
don't really need to convince people to use those methods most of the time. Like, from time
635
:
00:52:38
to time, but it's, it's really, really less and less.
636
:
00:52:41
It's common to use these methods, which is amazing, of course.
637
:
00:52:49
Yeah, I think as you were saying, like at Labs, we've been blessed with, and grateful for,
how many clients are interested in... Yeah, and amazed that, you know,
638
:
00:53:03
it's just a group of talented people and I'm amazed, not even surprised really, because it
is, you know...
639
:
00:53:11
a talented group of developers and data scientists and statisticians at the range of
applications.
640
:
00:53:20
You know, a client could come along in a completely new industry, and we can help them, in
pretty short order, get up and running with a model.
641
:
00:53:31
That's pretty cool.
642
:
00:53:33
Now Fisher, sure.
643
:
00:53:35
Awesome.
644
:
00:53:35
Well, Chris.
645
:
00:53:37
I think we'll close out the show, of course, with the last two questions I ask
every guest.
646
:
00:53:42
And yeah, you've answered them five years ago, but maybe the answers have changed.
647
:
00:53:47
So first one, if you had unlimited time and resources, which problem would you try to
solve?
648
:
00:53:54
Unlimited time and resources, which problem would I try to solve?
649
:
00:54:00
Ooh.
650
:
00:54:02
And I can't even remember my answer either from last time.
651
:
00:54:08
So that's good.
652
:
00:54:08
That means we get another independent draw, right?
653
:
00:54:11
Exactly.
654
:
00:54:12
Another draw from my mind.
655
:
00:54:17
Gosh.
656
:
00:54:21
An unsolved problem.
657
:
00:54:24
I assume you want, like, a Bayesian problem.
658
:
00:54:26
No, no, that could be anything you want.
659
:
00:54:28
No, it could even be coming up, you know, with a better version of pizza, right, like
whatever you want.
660
:
00:54:36
Yeah, well, I guess with the election shortly in our rearview mirror, it would be, you know,
fixing the analysis of polling data.
661
:
00:54:45
Yeah, that may be impossible.
662
:
00:54:47
That may require unlimited resources, but
663
:
00:54:53
That still seems not to work very well.
664
:
00:54:56
I would say, yeah, coming up with better ways of extracting information from voters.
665
:
00:55:01
Well, happy to help on that.
666
:
00:55:04
If I can.
667
:
00:55:06
And a second question, if you could have dinner with any great scientific mind, dead,
alive or fictional, who would it be?
668
:
00:55:15
Dead, alive or fictional?
669
:
00:55:19
having dinner.
670
:
00:55:21
Hmm.
671
:
00:55:25
I would have to say, see, and again, I can't remember what I said last time.
672
:
00:55:30
Yeah, me neither.
673
:
00:55:33
okay.
674
:
00:55:35
I'm going to say Bill James, just staying on a sort of baseball trajectory: one of the
very early sabermetricians, applying quantitative methods to
675
:
00:55:50
baseball. And yeah, I've never met him before, and it would be interesting to pick his brain
on what he thinks of the current state of analytics in baseball.
676
:
00:56:00
Yeah, definitely.
677
:
00:56:01
Yeah.
678
:
00:56:02
So I would definitely like to join that dinner.
679
:
00:56:05
So please let me know.
680
:
00:56:06
Yeah.
681
:
00:56:06
Okay.
682
:
00:56:06
He's still over at the Boston Red Sox, right?
683
:
00:56:09
So, yeah.
684
:
00:56:10
Okay.
685
:
00:56:10
I think, I think Tyler, Tyler Burch, who's contributed a bunch to PyMC and Bambi.
686
:
00:56:16
think maybe.
687
:
00:56:17
Probably Tyler knows Bill.
688
:
00:56:19
It's like, Tyler, what are you doing?
689
:
00:56:21
You might be sharing an office right now.
690
:
00:56:23
Exactly.
691
:
00:56:24
Invite Chris, please.
692
:
00:56:27
Awesome.
693
:
00:56:28
Well, thank you so much, for taking the time again.
694
:
00:56:32
See you in five years.
695
:
00:56:33
Yeah.
696
:
00:56:33
See you back in five years.
697
:
00:56:34
You know, like French presidential elections, you have your five-year term.
698
:
00:56:38
And then you come back to tell us how amazing you've been.
699
:
00:56:43
Yeah.
700
:
00:56:44
I'll come up with better answers to these questions.
701
:
00:56:47
And I will try to give you the questions in advance this time.
702
:
00:56:50
Awesome.
703
:
00:56:52
Well, thanks, Chris.
704
:
00:56:53
And we'll see you whenever you want on the show.
705
:
00:56:57
Thanks, Alex.
706
:
00:57:02
This has been another episode of Learning Bayesian Statistics.
707
:
00:57:06
Be sure to rate, review and follow the show on your favorite podcatcher and visit
learnbaystats.com for more resources about today's topics as well as access to more
708
:
00:57:17
episodes to help you reach a true Bayesian state of mind.
709
:
00:57:21
That's learnbaystats.com.
710
:
00:57:23
Our theme music is Good Bayesian by Baba Brinkman.
711
:
00:57:26
Featuring MC Lars and Mega Ran.
712
:
00:57:27
Check out his awesome work at bababrinkman.com.
713
:
00:57:31
I'm your host.
714
:
00:57:32
Alex Andorra.
715
:
00:57:33
You can follow me on Twitter at alex_andorra, like the country.
716
:
00:57:37
You can support the show and unlock exclusive benefits by visiting
patreon.com/LearnBayesStats.
717
:
00:57:44
Thank you so much for listening and for your support.
718
:
00:57:47
You're truly a good Bayesian.
719
:
00:57:49
Change your predictions after taking information.
720
:
00:57:53
And if you're thinking I'll be less than amazing.
721
:
00:57:56
Let's adjust those expectations.
722
:
00:57:59
Let me show you how to be a good Bayesian. Change calculations after taking fresh data in.
Those predictions that your brain is making? Let's get them on a solid foundation.