#125 Bayesian Sports Analytics & The Future of PyMC, with Chris Fonnesbeck
Modeling Methods • Episode 125 • 5th February 2025 • Learning Bayesian Statistics • Alexandre Andorra


Shownotes

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!

Visit our Patreon page to unlock exclusive Bayesian swag ;)

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan, Francesco Madrisotti, Ivy Huang, Gary Clarke, Robert Flannery, Rasmus Hindström, Stefan, Corey Abshire and Mike Loncaric.

Takeaways:

  • The evolution of sports modeling is tied to the availability of high-frequency data.
  • Bayesian methods are valuable in handling messy, hierarchical data.
  • Communication between data scientists and decision-makers is crucial for effective model use.
  • Models are often wrong, and learning from mistakes is part of the process.
  • Simplicity in models can sometimes yield better results than complexity.
  • The integration of analytics in sports is still developing, with opportunities in various sports.
  • Transparency in research and development teams enhances decision-making.
  • Understanding uncertainty in models is essential for informed decisions.
  • The balance between point estimates and full distributions is a challenge.
  • Iterative model development is key to improving analytics in sports.
  • It's important to avoid falling in love with a single model.
  • Data simulation can validate model structures before real data is used.
  • Gaussian processes offer flexibility in modeling without strict functional forms.
  • Structural time series help separate projection from observation noise.
  • Transitioning from sports analytics to consulting opens new opportunities.
  • Continuous learning is essential in the field of statistics.
  • The demand for Bayesian methods is growing across various industries.
  • Community-driven projects can lead to innovative solutions.

Chapters:

03:07 The Evolution of Modeling in Sports Analytics

06:03 Transitioning from Academia to Sports Modeling

08:56 The Role of Bayesian Methods in Sports Analytics

11:49 Communicating Models and Insights to Decision Makers

15:12 Learning from Mistakes in Model Development

18:06 The Importance of Model Flexibility and Iteration

21:02 Utilizing Simulation for Model Validation

23:50 Choosing the Right Model Structure for Data

27:04 Starting with Simple Models and Building Complexity

29:29 Advancements in Gaussian Processes and PyMC

31:54 Exploring Structural Time Series and GPs

37:34 Transitioning to PyMC Labs and New Opportunities

42:40 Innovations in Variational Inference Methods

48:50 Future Vision for PyMC and Community Engagement

50:43 Surprises in Bayesian Methods Adoption

54:08 Reflections on Problem Solving and Influential Figures

Links from the show:

Transcript

This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.


Speaker:

Today I am thrilled, nay, I am honored to have Chris Fonnesbeck on the show, a trailblazer

in sports analytics and PyMC's BDFL.

2

:

Chris's journey has spanned marine biology, sports modeling, particularly in baseball, and

broader statistical consulting, making him a key figure in the intersection of sports,

3

:

data science, and decision making.

4

:

In this episode,

5

:

Chris reflects on the evolution of sports modeling from the early days of limited data to

the current era of high-frequency and hierarchical datasets.

6

:

He shares how Bayesian methods have been instrumental in navigating messy data and

building robust models.

7

:

We dive into themes like the importance of model transparency, iterative development, and

balancing simplicity with complexity.

8

:

Along the way, we discuss technical approaches like Gaussian processes, structural time

series,

9

:

and data simulations, exploring their practical applications in sports analytics, but also

beyond.

10

:

Whether you're a sports enthusiast, a data scientist, or someone curious about the growth

of Bayesian methods, this episode is packed with insights and lessons learned from years of

11

:

experience in the field.

12

:

This is Learning Bayesian Statistics, episode 125, recorded November 6, 2024.

13

:

Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the

projects, and the people who make it possible.

14

:

I'm your host, Alex Andorra.

15

:

You can follow me on Twitter at alex_andorra,

16

:

like the country.

17

:

For any info about the show, learnbayesstats.com is Laplace to be.

18

:

Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on

Patreon, everything is in there.

19

:

That's learnbayesstats.com.

20

:

If you're interested in one-on-one mentorship, online courses, or statistical consulting,

feel free to reach out and book a call at topmate.io/alex_andorra.

21

:

See you around, folks.

22

:

and best Bayesian wishes to you all.

23

:

And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can

help bring them to life.

24

:

Check us out at pymc-labs.com.

25

:

Chris Fonnesbeck, welcome back to Learning Bayesian Statistics.

26

:

Thanks for having me.

27

:

It's been a while.

28

:

Yeah, it's been five years exactly.

29

:

So I was the second guest ever?

30

:

You were the second guest.

31

:

First ever was Osvaldo Martin.

32

:

That's right.

33

:

And he was back on the show a few weeks ago, a few days ago.

34

:

And now it's you.

35

:

So I've gone a full circle of all the Bayesian guests.

36

:

now I've...

37

:

You got to go back to the beginning.

38

:

There's no new ones.

39

:

Exactly.

40

:

I had to invite you back.

41

:

Like I had no choice.

42

:

I went through the population.

43

:

No, of course.

44

:

It's a pleasure to have you back here because you've done a lot of things in five years.

45

:

So it's awesome to have you here.

46

:

We're here in New York because we're going to teach our Gaussian processes tutorial in a

few hours.

47

:

And that's really a fun turn of events for me because...

48

:

Like when I started learning Bayesian statistics five years ago, one of the first

tutorials I watched was one at PyData New York about Bayesian inference with Gaussian

49

:

processes.

50

:

you know who was teaching it?

51

:

Me.

52

:

Yeah.

53

:

So that's really cool.

54

:

There we go.

55

:

Now five years later, we're here.

56

:

We're here together.

57

:

That is great.

58

:

And an honor to teach with you this afternoon.

59

:

Yeah.

60

:

It's going to be good.

61

:

Yeah.

62

:

I hope so.

63

:

We're taping that before, so that we don't have any feedback yet and nobody can say that it

was bad.

64

:

That's smart.

65

:

That's the French way to do it.

66

:

Good idea.

67

:

Editing.

68

:

Exactly.

69

:

So last time you were here, I think you were still working for the Phillies, right?

70

:

If it was five years ago, it would have been the Yankees.

71

:

Oh, so you were still at the Yankees.

72

:

Okay.

73

:

Lots of things.

74

:

Yeah.

75

:

Um, everybody knows you, you're the PyMC BDFL.

76

:

You've done a lot of, um, marine biology statistics.

77

:

You've done a lot of sports, uh, modeling.

78

:

What I'm curious about is how, like, how did you end up doing sports modeling?

79

:

Because it's a bit like me, right?

80

:

You didn't start doing sports modeling from the get-go and that was like your thing.

81

:

So how did that happen?

82

:

Yeah.

83

:

Yeah, I mean, part of it was timing related, right?

84

:

Because in the sort of late 2010s is really when these new sources of high frequency

data started becoming available, at least in baseball.

85

:

So you had PITCHf/x and then TrackMan.

86

:

And now we have

87

:

Hawk-Eye, and you know, this is providing baseball with streams of extremely high

resolution, comprehensive sources of data on everything that moves on a baseball field.

88

:

so, um, at that point, uh, you know, that industry began, um, looking for people that

could deal with data of that scale and use them in useful machine learning and

89

:

statistical models.

90

:

um,

91

:

So it was no longer the realm of a baseball person with a spreadsheet; it took more than

that.

92

:

so around that same time, I was at Vanderbilt in the biostatistics department, and all of

these jobs started appearing in various places.

93

:

Actually, before that, I started, for a short stint, with the Milwaukee Brewers, just

to kind of work

94

:

as a consultant to see what the industry was like and I didn't have to give up my tenure

track academic job.

95

:

I never envisioned myself being an academic in the first place.

96

:

So it wasn't like it was a long-term goal of mine, it's just where I ended up.

97

:

So I knew I wouldn't spend my whole career in academia anyway.

98

:

So I think it was just the next chapter and it felt like a natural transition into

99

:

sports, which, I've always been interested in.

100

:

you know, I played sports when I was young, but not particularly well.

101

:

So, the, you know, the analytics side kind of always appealed to me.

102

:

you know, I liked data science.

103

:

and, you know, it was a nice kind of, intersection of my interests and skills.

104

:

so that's kind of how it went about.

105

:

And, you know, it continues, and in baseball, you know, there are still teams that don't

have

106

:

a lot of quantitative support, although it's more widespread than it once was.

107

:

then, you know, beyond that, since I've left baseball, you know, you've talked to people

in other sports and you see kind of how far behind some of them are relative to baseball.

108

:

So there's still a lot of opportunity out there.

109

:

Yeah, yeah, for sure.

110

:

And how was that, were you already a baseball fan or amateur

111

:

before you joined the Yankees?

112

:

Yeah, I followed baseball, you know, my whole life, and I played when I was in high

school.

113

:

so yeah, as you know, as an amateur observer, you know, never in any professional

capacity.

114

:

you definitely, you know, you get the imposter syndrome when you get in there, you go from

nothing to working for a major league baseball team, which doesn't seem right in some

115

:

respect.

116

:

But you know, you've got the skills that

117

:

that they need and so there you are.

118

:

so yeah, it was, it was always great to be able to, you know, go to the ballpark and work

there and hopefully contribute towards the success of the team.

119

:

So, so it's fine.

120

:

was a fun time.

121

:

Yeah.

122

:

And did you find that the, like the Bayesian methods were really helpful there?

123

:

How was your experience with all the, you know, because

124

:

Like working for a team, what I find also personally now that I've been there for just a

few months, but it's like you have all these interactions between the data, the data

125

:

engineers, you, the modeler, and then how the models are used, how they are interpreted.

126

:

So what was your experience here?

127

:

What was maybe the hardest, you know, the biggest hurdle that you had in this workflow?

128

:

And how did you find that Bayesian stats were helpful, if they were at all?

129

:

Yeah, so two parts there.

130

:

First, the Bayesian stuff.

131

:

I think the teams know the value of Bayesian methods. Because while you have large

quantities of data that you would think machine learning would be kind of the first stop,

132

:

and certainly a lot of machine learning methods are used.

133

:

You know, those data are still messy. They are naturally hierarchical, clustered, with

covariates and sources of noise. So just throwing them into, you know, an XGBoost model or

134

:

something like that, kind of naively, isn't always the optimal way to move forward

and get the most out of that data. So Bayesian methods are really kind of a

135

:

match, as it often is, kind of a natural choice.

136

:

With the usual trade-off that, you know, the computational side can be very challenging. You

know, if you're using every pitch thrown over a 10-year period, that's a

137

:

lot of rows, and so, you know, PyMC and Stan will start to struggle. So one of the

challenges was, you know, getting everything to work with, I wouldn't call it

138

:

big data, but you know uncomfortably large data is always a challenge and then you know

coming from

139

:

academia slash software, open source software development.

140

:

Just, you know, the whole productionization of the model is a challenge, right?

141

:

So getting it up and running quickly, so that it can be used to make decisions, is, you

know, always a big challenge, and, you know, you're never quite finished with

142

:

it.

143

:

It's kind of, you know, knowing when to stop and put something into production, you know,

in a way that's helpful again to the decision makers.

144

:

And that's always challenging.

145

:

then, and then you got to work within the framework of, you know, of, of the company,

which is, you know, how, how do they, how do they store their data?

146

:

How do they store their models?

147

:

How do they, how do they interface those outputs from those models to players, coaches,

and the front office?

148

:

And that's different from one team to the next.

149

:

Yeah.

150

:

And so, precisely, how did you...

151

:

Were you the one communicating these models?

152

:

Or were you just making the models and then other people were communicating them?

153

:

But if you were communicating them, what was your way of doing it so that the decision

makers really could use the model to its full potential?

154

:

At least in the case of the Phillies, it was a very transparent

155

:

research and development team where you had access to the decision makers.

156

:

They are always close by and it's a large group of analysts there that worked closely

together.

157

:

So everybody knew what everyone else was doing.

158

:

So the usual way you'd write reports, you'd write, you'd productionize a model and the

results of that would appear on a dashboard for people to access.

159

:

My experience was a fair level of transparency.

160

:

But I've heard stories and I know that in some instances that's not always the case that

your work can be a little bit siloed and isolated and decision makers aren't always using

161

:

that information optimally or at all in some cases.

162

:

so that I was fortunate to avoid those sorts of challenges, which I would imagine.

163

:

can be a little disheartening doing, you know, lots of analytic work with interesting data

and then not have anybody put it to you.

164

:

So, yeah, we were very fortunate, and, you know, we're seeing that the teams that do

those sorts of things are successful.

165

:

And those that don't, I think, tend to have a harder time.

166

:

They have to get lucky more.

167

:

Yeah.

168

:

And, like in your experience,

169

:

communicating the distributions, was that something useful or were you more into point

estimates, showing the tails of the distribution?

170

:

How did that work?

171

:

Yeah, I mean, at the end of the day, I think point estimates are still important, but it's

good to know how much uncertainty is associated with them.

172

:

that's still a challenge, I think.

173

:

At the end of the day, people want lists to make decisions with.

174

:

It's hard to, it's hard, you're always taught in doing Bayesian inference that, you know,

you carry that whole distribution with you and it's got all the information, but we're

175

:

often still using, you know, point estimates and, uh, but it does, you know, you can, um,

you know, using communication tools, you can convey some of that uncertainty in ways that

176

:

are interpretable and useful to people.

177

:

You know, you can say that, you know, two players, while their means are slightly

different.

178

:

There's.

179

:

huge overlap in the uncertainty and they're essentially the same player.

180

:

And so you shouldn't sweat over choosing between one or the other and that sort of thing.

181

:

it's still important to have those avenues for communication and making sure that they're

aware of the limitations of the data.

182

:

These are outputs of models and models are wrong.

183

:

yeah, some people either overly trust or

184

:

under trusts a model and over trusting it can be just as bad as under trusting it in

certain situations.

185

:

You know, it's just the output of a model.

186

:

So everything is, I think good decision makers are using it as one piece of the

toolkit.

187

:

They don't let the model make decisions for them.

188

:

They use it as a tool to help inform their decisions that involve more than just quantitative

baseball data.

189

:

Yeah.

190

:

Yeah.

191

:

That makes sense.

192

:

And the

193

:

So you were saying that, yeah, basically models are wrong.

194

:

How do you, like, is there a case, you know, all your work with the teams where you, like

a case where a model was really hard to develop for you and you really learned something

195

:

from that because I don't know, you discovered the new method was really useful in that

case and you didn't know about it or you were taking...

196

:

you were looking at the problem from an angle that was not useful and then changing the

way you were thinking about the problem made the difference.

197

:

Basically, is there a big mistake you made at some point that really made you better as a

modeler?

198

:

I made a lot of mistakes.

199

:

Yeah, lots of mistakes.

200

:

Yeah, the mistakes were more common than the successes, I think.

201

:

And yeah, you learn.

202

:

I learned a ton the whole way through.

203

:

you know, you, you come into it with a whole bunch of biases.

204

:

You know, you want to do things in sort of an idealized Bayesian way and

using Bayesian best practices.

205

:

And, you know, you want to use Gaussian processes for everything, 'cause they're cool

and useful and flexible.

206

:

But, you know, sometimes you force it a little bit and you, you know, come out with

nonsense and we got to go back to using a spline or, you know, or.

207

:

or a linear model, and that's fine, right?

208

:

There's no shame.

209

:

So some of the answers that ended up being more effective sometimes were kind of the

hackier approaches that were a bit of a compromise, but they either ran more efficiently

210

:

or they produced estimates that were, there was always a sniff test, right?

211

:

You'd present the results of a model and then,

212

:

the front office and the others that were making decisions would say, well, this doesn't make

any sense.

213

:

What's going on here?

214

:

And that's where you learn, right?

215

:

Where your model is wrong is where you learn, and you make it better.

216

:

So, it was a continual, you know, it was, it's a really nice example of, you know,

iteration and kind of a continual model expansion or sometimes model contraction, simpler

217

:

rather than more complicated.

218

:

But yeah, lots of valuable lessons.

219

:

Yeah, I can definitely resonate with everything you just said.

220

:

From my limited experience for now, I've seen both, I've been able to, sometimes the

Bayesian workflow as you see it on the poster really works awesome.

221

:

You can increase the complexity kind of linearly.

222

:

But then at some point, so model expansion, as you were saying, but then at some point you

hit a wall and you're like...

223

:

that doesn't work at all.

224

:

Like you don't know why.

225

:

And then you try to sample, and everything breaks and so on.

226

:

And then you're like, okay, what's happening?

227

:

And then you have to basically, um, do model contraction, as you were saying, where

you're like:

228

:

I need to remove parts and see what's breaking.

229

:

And also something I find can be costly, you know, uh, intellectually it's like, yeah, I

know I'm making the model.

230

:

less good if I remove that stuff.

231

:

And that can be weird, right?

232

:

You're like, you know, I don't want to remove that because I know mathematically it's way

better to have that in the model.

233

:

then, yeah, on paper, but then the model doesn't sample, dude.

234

:

So, you know, you have to remove something.

235

:

I find that sometimes, you know, like heart-crushing.

236

:

Yeah, mean, it's never a good idea to...

237

:

hold on too tightly to any particular model or to fall in love with a model.

238

:

And I grew up in my educational career, sort of embracing model selection and model

uncertainty and the value of having more than one model.

239

:

And I think that's very important because you sometimes will over invest in a model,

you'll hold on too tight to it and you should really

240

:

hold them very loosely and be prepared to scrap it.

241

:

Sometimes you just have to start over again.

242

:

It's like the Concorde fallacy, right?

243

:

Just because you've invested all the time doesn't mean you should invest more.

244

:

It might be time to trash it and start over again with something similar or just make it

simpler or strip out your favorite piece that makes it really cool.

245

:

if it doesn't work as well.

246

:

Yeah, kill your darlings, right?

247

:

Kill your darlings.

248

:

Yeah, like writers talk a lot about that.

249

:

But yeah, I mean that, yeah, the sunk cost fallacy basically.

250

:

And I find something that's helpful for that workflow is Bambi.

251

:

If you work with PyMC. Because I'm using that more and more, because it's just super easy

and fast to spin up a model.

252

:

Like, especially now that there is splines, there is HSGP.
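For readers who haven't tried it, here is a minimal sketch of what that quick spin-up looks like; the Bambi calls are real, but the data frame and column names are invented for illustration:

```python
import numpy as np
import pandas as pd
import bambi as bmb

# Hypothetical data: one outcome, one predictor, one grouping variable
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(size=120),
    "team": rng.choice(["A", "B", "C"], size=120),
})
df["y"] = 1.0 + 0.5 * df["x"] + rng.normal(scale=0.3, size=120)

# A hierarchical regression in one formula line:
# population-level slope plus team-level intercepts and slopes
model = bmb.Model("y ~ x + (x | team)", df)

# Bambi builds and samples the underlying PyMC model
idata = model.fit(draws=1000, tune=1000)
```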

253

:

I'm using more and more structural time series.

254

:

So I was talking to Tomi Capretto, and I was like,

255

:

I'm probably going to try and add some structural time-series stuff in there, because like

in PyMC, you can do everything in PyMC, but it's very custom.

256

:

It's like, then the sunk cost fallacy can be higher, I find, at least for me, because it

takes longer to build the model.

257

:

So then you're like, but I don't want to throw all that structure away and then start

over.

258

:

Well, at some point you'll be able to ask Claude to do it and it'll just write you five

different models and just run them.

259

:

But we're not there yet.

260

:

No, if anybody's ever tried to, yeah, build PyMC models with ChatGPT, that doesn't go so

well. Yeah, don't do that at home. But yeah, I wonder what that would look like in Bambi, the

261

:

structural time series model, at what point do you sort of outstrip Bambi's ability

to keep things simple.

262

:

Yeah, well, yeah, powerful models.

263

:

Yeah

264

:

That's definitely, I'm always taking Bambi to the limit.

265

:

So I think Tommy hates me because I'm breaking Bambi all the time.

266

:

So he's told me that he hates you.

267

:

Yeah.

268

:

yeah.

269

:

Okay.

270

:

So that's, that, that squares out with his behavior with me.

271

:

But yeah, like, so I'm always messaging him like, Hey, I think I found a bug.

272

:

Blah, blah.

273

:

I think yesterday I found a bug again.

274

:

And so I think one time I sent him the model I was trying to fit and he was like, you

know, when I'm developing this stuff for Bambi, I never think that.

275

:

people are going to build such complicated models.

276

:

Like I'm doing that for like just a simple model.

277

:

And yeah, but that's super helpful because then you can like spin up a lot of models.

278

:

You can do model comparison.

279

:

Then once you have your, like, kind of final model, then you can spin up the PyMC model if

you want, and then really customize everything so that sampling is way smoother.

280

:

But for development, you don't need to have, you know, the perfect model.

281

:

with no divergences and no effective sample size problems.

282

:

If you don't have 1000 divergences, you know you're already in a good direction.

283

:

Yeah, or you can start by not doing MCMC sampling from the get-go.

284

:

You can just use find_MAP or ADVI and get approximate answers in a shorter period of time.
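As a rough sketch of that iterate-fast workflow in PyMC (the toy model and data here are invented, not from the episode):

```python
import numpy as np
import pymc as pm

data = np.random.default_rng(1).normal(1.0, 2.0, size=500)  # placeholder data

with pm.Model() as model:
    mu = pm.Normal("mu", 0, 10)
    sigma = pm.HalfNormal("sigma", 5)
    pm.Normal("y", mu, sigma, observed=data)

    # Cheap point estimate to sanity-check that the model behaves
    map_estimate = pm.find_MAP()

    # Mean-field ADVI: fast approximate posterior for iterating on structure
    approx = pm.fit(n=20_000, method="advi")
    idata = approx.sample(1_000)  # draws from the approximation
```

Once the structure looks right under these cheap approximations, you switch to full MCMC for the production run.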

285

:

Because for me, building the models didn't take long.

286

:

Yeah.

287

:

Waiting for them to finish running.

288

:

Exactly.

289

:

Yeah.

290

:

Yeah.

291

:

Yeah.

292

:

Same for me.

293

:

Same for me.

294

:

And yeah, that's actually a good point that I personally don't use enough, but I should

and I will now that you mentioned it, but yeah.

295

:

So basically in the modeling process, in the development process, you would use a lot of

just find_MAP or even ADVI, because that gives you a full distribution.

296

:

And then check if the parameters are in the right direction.

297

:

Yeah.

298

:

And then if that's validated, then okay.

299

:

Well, then let's build, you know, the full.

300

:

Yeah.

301

:

The full production round.

302

:

The full production stuff.

303

:

Yeah.

304

:

It's like building a crappy car, but see how far it can go.

305

:

And then yeah, let's build the whole Aston Martin then.

306

:

Yeah, that's pretty good.

307

:

I like that.

308

:

Also something I found more and more useful personally is doing, even before using the

real data, especially because, I mean, the data at the Phillies, I think, were also

309

:

the same, really huge.

310

:

so that's a lot of time, as you were saying, waiting for the model.

311

:

so even before that, to validate the structure of the model, what I do more and more is

simulate fake data.

312

:

That Claude can do pretty good.

313

:

So the recovery.

314

:

And then use the fake data to do the parameter recovery for the model and simulation based

calibration, all that stuff.
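A minimal version of that simulate-then-recover loop, with made-up true values standing in for whatever your real data-generating process looks like:

```python
import numpy as np
import pymc as pm
import arviz as az

rng = np.random.default_rng(42)

# 1. Simulate fake data from parameters you choose yourself
true_mu, true_sigma = 1.5, 0.8
fake_y = rng.normal(true_mu, true_sigma, size=200)

# 2. Fit the model you plan to use on the real data
with pm.Model():
    mu = pm.Normal("mu", 0, 5)
    sigma = pm.HalfNormal("sigma", 2)
    pm.Normal("y", mu, sigma, observed=fake_y)
    idata = pm.sample(random_seed=42)

# 3. Parameter recovery: the known true values should sit comfortably
#    inside the posterior intervals
print(az.summary(idata, var_names=["mu", "sigma"]))
```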

315

:

Once you validate that model, well, at least you know that the structure is well, is good.

316

:

If your data actually follow the structure, that's a big caveat.

317

:

That's a big assumption, but at least then I find that when I go to the real data and I

feed the model, when I get the problems, I know that it's not really because the structure

318

:

of the model is wrong.

319

:

And that helps quite a lot because now it's

320

:

You can question the model in a way that's like, so then probably it's because I need to

use the non-centered parameterization here instead of the centered one, or the priors

321

:

maybe are too tight or something like that.

322

:

But it's not like, maybe I'm completely wrong and I should use another structure.

323

:

And that kind of limits the questions you have.
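For readers who haven't met that reparameterization trick: the two models below define the same hierarchical prior, but the non-centered version usually samples much better when the group standard deviation is small (a sketch with hypothetical names and sizes):

```python
import pymc as pm

# Centered: theta drawn directly from the group distribution;
# NUTS can struggle in the "funnel" when group_sd gets small
with pm.Model() as centered:
    group_sd = pm.HalfNormal("group_sd", 1.0)
    theta = pm.Normal("theta", mu=0.0, sigma=group_sd, shape=8)

# Non-centered: sample standardized offsets and rescale them;
# same prior, friendlier posterior geometry
with pm.Model() as non_centered:
    group_sd = pm.HalfNormal("group_sd", 1.0)
    z = pm.Normal("z", 0.0, 1.0, shape=8)
    theta = pm.Deterministic("theta", z * group_sd)
```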

324

:

Can you get Claude to...

325

:

create data that looks like the data that you're actually using?

326

:

Or in my experience, when I do that, it gives me kind of lousy data.

327

:

I wonder if you can get it to look at a data set and say, give me a similar data set with known

parameters, if you've tried that or not.

328

:

Oh, no, I've never tried that, but I will.

329

:

That's very fun.

330

:

No, usually I describe the data generating process I want.

331

:

I modify the code because there is always something that it didn't get, like it's not

doing exactly what I want, but at least I have the boilerplate and I can just fine-

332

:

tune the parameters and so on.

333

:

But yeah, I'll try that next time.

334

:

That's definitely cool.

335

:

Yeah.

336

:

I have no idea if it works, but yeah, I've underutilized that so far, but I'm sure it's

coming.

337

:

Actually, Tomi Capretto has a very new blog post about that, how to do data simulation with

PyMC.

338

:

So I'll link to that in the show notes because that's very useful, especially for

simulation-based calibration, that's super helpful.

339

:

yeah, definitely we'll do that.

340

:

So yeah, actually, you already mentioned splines, Gaussian processes, stuff like that.

341

:

And I know when we do teachings and so on, and even myself sometimes I'm like,

342

:

I can get into kind of an analysis paralysis, know, where like, well, I don't know, that

kind of data could fit so many structures and models.

343

:

I don't even know where to start.

344

:

You know, so how would you, like, how do you handle that right now in your current job with PyMC

Labs, or even when you were working with the Phillies and Yankees, how do you go

345

:

about starting these cases?

346

:

There's two approaches. You can start

347

:

with this simple linear model and then see where things deviate from linear.

348

:

So start simple and work your way to more complex.

349

:

But GPs are always a good starting point because you're not constrained to any functional

form.

350

:

And you can go in the opposite direction, start with a very flexible GP prior.

351

:

And if things look simpler, you can remove the GP and put something in that's easier to

compute.

352

:

But GPs are hard to beat there because you essentially let the data decide.

353

:

There is some skill associated there because setting priors with GPs can be a little

trickier than for a regression model.

354

:

so buyer beware.

355

:

And given how easy they are now to set up and the fact that we have HSGPs.

356

:

that work almost like linear models anyway.

357

:

I think those are pretty attractive as kind of a starting place.

358

:

Yeah.

359

:

I mean, HSGPs have really changed the game for me, at least for GPs.

360

:

That's also really what I like about GPs is that you have very flexible functional form.

361

:

And at the same time, you can put a lot of domain knowledge in the priors.

362

:

Especially because the length scale and the amplitude are interpretable most of the time

for your use case.

363

:

The tricky part is when you're using an inverse link function in the model, then you have

to do the math to convert the amplitude and length scale, but that's not too hard.

364

:

And so you can put domain knowledge here and then you let the data guide the functional

form, which is like, yeah.

365

:

really powerful and quite hard to beat.

366

:

And also with HSGP now, it's just way faster and easier to fit, like even on really big

data, which I mean, I've done that already.

367

:

Yeah.

368

:

The only thing it's missing is of course, you don't have all of the kernels available to

you and doing multiplicative and additive kernels is challenging.

369

:

So that will evolve.

370

:

Yeah.

371

:

I mean, even with PyMC you can already do the additive kernels.

372

:

So like we had a tutorial up on the PyMC website that I co-wrote with

Bill Engels.

373

:

And so in the advanced use cases, we show the

374

:

Additive kernel.

375

:

We show people how to do the additive kernel and also hierarchical HSGP.
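That tutorial is the reference here, but a toy sketch of the additive idea with HSGPs looks roughly like this; the inputs, basis sizes, and hyperparameters are invented for illustration:

```python
import numpy as np
import pymc as pm

X = np.linspace(0, 10, 200)[:, None]  # toy 1-D inputs

with pm.Model():
    # Two components with different fixed length scales...
    cov_slow = 1.0**2 * pm.gp.cov.Matern52(1, ls=5.0)
    cov_fast = 0.5**2 * pm.gp.cov.Matern52(1, ls=0.5)

    gp_slow = pm.gp.HSGP(m=[30], c=1.5, cov_func=cov_slow)
    gp_fast = pm.gp.HSGP(m=[60], c=1.5, cov_func=cov_fast)

    # ...summed into one latent function: the additive-kernel idea
    f = gp_slow.prior("f_slow", X=X) + gp_fast.prior("f_fast", X=X)
```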

376

:

So that's, that's pretty cool.

377

:

You have to be, yeah, you have, like, we'll see, like, because the covariances in PyMC are

not vectorized for,

378

:

the number of GPs you have.

379

:

So if you want one covariance per GP, you have to write the power spectral density by

hand, which is a bit challenging.

380

:

But something I have in mind is trying to do a PR on PyMC basically to vectorize

the covariance in that dimension so that you can just have the covariance shapes broadcast

381

:

automatically.

382

:

But the thing is,

383

:

the power spectral density changes, like, for each kernel.

384

:

That's, I'm guessing, why Bill didn't do that out of the box, because it takes

time.

385

:

Could be.

386

:

Yeah.

387

:

And I mean, talking about GPs.

388

:

So, well, I'll try to release this episode once the PyData videos are out so that

we can put our tutorial in the show notes so that people who are really like

389

:

interested in GPs and HSGP, can check that out.

390

:

Something also I think is interesting to talk about is, since we're talking about time

series, structural time series, right?

391

:

Because I had Jesse Grabowski on the show the other day and he's really a structural time

series guy.

392

:

He's like absolutely amazing with that.

393

:

Really a big time series wizard.

394

:

He did, like, incredible work on the pymc-experimental side with the state space module.

395

:

How do you balance these two?

396

:

In my experience, you're getting the structural time series with GPs too.

397

:

So why would you even want or need a structural time series?

398

:

mean, for me, structural time series is all about separating the concern of projecting

quantities of interest from the observation process.

399

:

So that you're not wasting time projecting noisy variables forward.

400

:

So you just have these two linked, but independent, processes whereby you can, in baseball,

for example, you can have a model for whatever contaminates your observations.

401

:

It could be issues with a Hawk-Eye sensor or any of the many ways that

402

:

baseball data can be contaminated by the observation process.

403

:

You can separate that from modeling the changes in the underlying variables that

you're interested in.

404

:

So as much as possible, you're projecting signal forward because projection is hard enough

as it is.

405

:

if you can remove as much of that noise as possible at every step.

406

:

And so yeah, as you say, you can use a GP for that.

407

:

uh, for the projection piece of it.

408

:

And it's very handy having these, you know, having say multiple length scales that you can

model how the process changes at potentially different timescales.

409

:

Yeah.

410

:

You know, like in baseball, you can have within season variation, you know, as the season

progresses, I get changes in, you know, pitchers velocity and so forth.

411

:

And then there's changes kind of from season to season and there's kind of, you know,

short to mid to long-term career variation and you can do all of that.

412

:

within the context of a GP.

413

:

Then, it's nice to have these latent GPs, like the Hilbert space ones, where you

can put arbitrary, you know, likelihoods on that and deal with skews and multimodal stuff.

414

:

Yeah, definitely.

415

:

mean, and that's something we'll show this afternoon, basically, where we'll have, we'll

fit a model with three GPs, one, we'll use soccer data because well,

416

:

It's the best data because it's great, exactly.

417

:

but yeah, we'll have a short-term GP, medium-term and long-term where the long-term would

be like an aging curve.

418

:

It's actually interesting to see the GP pick up that parabola-ish shape of the aging curve

of the players.

419

:

You can also definitely see the survivor bias in this because like if I...

420

:

which we'll do this afternoon, we'll fit the GP on a subset of the players just for

efficiency.

421

:

And so you can see that if you do that on the whole dataset, the aging curve will look a

bit different.

422

:

If you do this on the subset of the players, then you'll see that the aging curve picks

up a bit at the right, which is wrong, right?

423

:

The players-who-are-40 effect.

424

:

Exactly.

425

:

It's like Messi and Ronaldo effects basically.

426

:

Because I have them in the subset that we'll fit.

427

:

that's funny.

428

:

So basically you could do that.

429

:

That would be kind of the way to do a structural time series with GPs.

430

:

The aging curve would be the trend, I guess.

431

:

Then you have the medium term GP, which is within season effects.

432

:

And then you have your short term covariance kernel, which would basically just pick up

the noise.

433

:

which could be the equivalent to doing kind of a step linear trend in a parametric time

series model where you add some autoregressive component for the residual.

434

:

Yeah, and you also don't necessarily have to estimate the length scales.

435

:

It can be a design decision.

436

:

Where I'm interested, this part of the GP or this GP is specifically for...

437

:

Modeling within-season variation, for example, and you set a length scale that's appropriate

for that, and then you have another length scale for longer-term stuff. And because length

438

:

scales can be hard to estimate properly anyway without highly informative priors.

439

:

So yeah, if you can skip that, then all the better. And again, it's something you can

change. It's sort of a lever that you have to pull.
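Concretely, that design decision might look like the following sketch: three additive kernels whose length scales are fixed to the timescale each component is meant to capture. The numbers and time units are placeholders, not values from the episode:

```python
import numpy as np
import pymc as pm

t = np.linspace(0, 15, 300)[:, None]  # time measured in seasons, say

with pm.Model():
    # Career-scale trend (the aging curve): long, fixed length scale
    cov_career = 1.0**2 * pm.gp.cov.ExpQuad(1, ls=8.0)
    # Within-season variation: medium, fixed length scale
    cov_season = 0.5**2 * pm.gp.cov.ExpQuad(1, ls=0.5)
    # Short-term wiggle that mops up noise-like structure
    cov_short = 0.2**2 * pm.gp.cov.ExpQuad(1, ls=0.05)

    # The length scales are design choices here, not estimated parameters;
    # only the latent function itself is inferred
    gp = pm.gp.Latent(cov_func=cov_career + cov_season + cov_short)
    f = gp.prior("f", X=t)
```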

440

:

Yeah need to yeah.

441

:

Yeah.

442

:

No, it's a very good point because yeah in my experience length scales I think

443

:

Like I must have done one GP model where the length scale was learned, but most of the

time it's basically the prior.

444

:

Yeah.

445

:

I think the posterior is very flat across a lot of the interesting length scale

values.

446

:

Yeah.

447

:

So, I think, I remember I was talking with Bill about that, and I think he told me that

it's also because there is a non-identifiability with the amplitude.

448

:

And so basically you can only learn one of the two.

449

:

And I don't remember why.

450

:

I think it's multiplicative and non-identifiable.

451

:

Maybe you remember better than me.

452

:

I don't, but that makes sense.

453

:

Yeah.

454

:

So basically very cool models.

455

:

And I definitely encourage people to check out the GP video.

456

:

Now, like, so we've talked about baseball and sports quite a bit.

457

:

So, like, well, let's transition out, because I don't want to, you know, have the whole episode

about that.

458

:

I would, but I'm guessing some listeners are like, okay.

459

:

Enough with the sports stuff. So you're not working with the Phillies anymore, actually.

460

:

Now you're working full-time with PyMC Labs, which I take a bit personally, I have to say,

because you joined PyMC Labs right after I left to work with the Marlins.

461

:

It's not an accident.

462

:

So yeah, I'm guessing, you know, like, probably Tomi Capretto told you to do that.

463

:

Quick, get out.

464

:

Get out, Alex is getting into baseball.

465

:

Chris, come here, come here.

466

:

So yeah, basically what are you up to these days now, Chris?

467

:

Yeah, living the dream, right?

468

:

As Thomas would say, we always wanted to have PyMC as our full-time job.

469

:

I wanted to see what that was like.

470

:

It does give me more time to work on PyMC-related issues; obviously, the two are separate,

but.

471

:

correlated.

472

:

So yeah, I do get a lot more time to spend on that sort of stuff, and just learning,

you know, a lot more of kind of the business of statistical consulting and

473

:

productionizing PyMC models, and just seeing it in different contexts.

474

:

I mean, I would like to, you know, part of it is

475

:

a desire to do some business development on the sports analytics side and looking at other

sports, maybe outside of baseball.

476

:

And we've talked to a few potential clients there.

477

:

And that's all new to me too, like sitting on sales calls and things like that.

478

:

It's not sure it's something I'm particularly good at, but it's always good to learn.

479

:

yeah, it's great to break out of the baseball bubble, least for a little bit and see what

the rest of the world is like.

480

:

so it's great.

481

:

And as you know, it's a great bunch of scientists at PyMC Labs, and everybody is way

smarter than me, and everybody's very nice and congenial.

482

:

So it's just a nice place to work.

483

:

So it's great, as far as I'm concerned.

484

:

So yeah, doing lots of interesting projects, things I never thought I would work

on.

485

:

Uh, so that's always the interesting part, having a wide variety of clients, and, you

know, um, there's a lot of opportunity, you know, there's, uh, you have the

486

:

opportunity to learn, and you need to not be afraid of the imposter syndrome, and, you know, I can work on modeling bonds or something like that.

can, I can work on modeling bonds or something like that.

487

:

So even though I'm not an expert on it, I can still contribute in a meaningful way.

488

:

So it's nice.

489

:

It's always good to learn.

490

:

Like I'm definitely a lifetime learner and I find I get bored if I'm not learning.

491

:

And it's hard to, I think it would be hard to get bored doing this.

492

:

no, definitely.

493

:

mean, completely agree.

494

:

It's like, yeah, a great bunch of folks over there, and working with them for the last few

years has been absolutely amazing, and a great environment.

495

:

Only positive things to say about that.

496

:

Yeah.

497

:

And, you know, we deliver a lot of workshops and tutorials and I like doing those.

498

:

That's kind of the aspect of teaching that I enjoy from my time at Vanderbilt as a

professor.

499

:

You know, you're not grading papers, but you're, you know, you're engaging interesting

people who are keen to learn how to, you know, apply Bayesian methods better or whatever

500

:

the topic happens to be.

501

:

So doing, you know, doing a fair bit of

502

:

teaching as well, which is great.

503

:

And then having some time also to, with my PyMC BDFL hat on, trying to apply for grants to

help us increase the rate of development and support and sustainability for the project in

504

:

general.

505

:

Yeah, PyMC Labs affords us the opportunity to do that.

506

:

Very synergistic.

507

:

activities.

508

:

And so you were talking about being a lifelong learner, which I guess like all the

listeners identify with.

509

:

What's something you're learning these days, know, technically, like is there a method

you're really interested in, something you're really curious about and you're like, yeah,

510

:

I've always been curious about that, but I don't know how that works.

511

:

Let me check that out.

512

:

Well, I'm trying to learn PyTensor a little bit better.

513

:

I always knew it kind of on a very superficial level.

514

:

It's nice to be... PyMC's backend?

515

:

PyMC's computational backend.

516

:

But more specifically on the methodological side, digging into newer variational inference

methods.

517

:

I think it's a hot topic of research, with sort of the performance constraints of MCMC, and...

518

:

Variational inference becomes attractive, and we used it a lot in baseball, but with some of the

sort of default methods these days there are compromises and trade-offs. And so,

519

:

looking into things like Pathfinder and normalizing flows and things like that. Digging

into some of that.

520

:

It's been really interesting.

521

:

We've got a Google Summer of Code student, Michael, who's just finishing up

522

:

implementing Pathfinder for PyMC.

523

:

So we're working on getting that going.

524

:

So I'm looking forward to kind of improving that.

525

:

I think the VI side of PyMC has been neglected for a long time.

526

:

Max did a great job way back when, and there hasn't been as much activity on that side of

things.

527

:

I mean, I guess it's also because we have these new algorithms coming here, which are

really, really good.

528

:

And so that means...

529

:

We should probably be able to use Pathfinder directly in a PyMC model in a few months.

530

:

Who knows?

531

:

Yeah, it's early days still; it's still sort of in a full R&D mode right now.

532

:

So we're not quite sure where it is yet.

533

:

We've got some ongoing work,

534

:

getting them implemented into PyMC and not just relying on the BlackJAX one that we

currently had, that was kind of underutilized, and seeing how PyTensor might be able to

535

:

speed that up.

536

:

Then, in parallel with that, looking at normalizing flows. I think Adrian's been doing some

stuff on the nutpie side, making normalizing flows easier to do.

537

:

So hopefully all of this will fall into place kind of at the same time.

538

:

you'll be able to use that to make VI better.

539

:

Yeah.

540

:

And the idea here would be, like always in PyMC, right?

541

:

That would work out of the box.

542

:

So instead of doing pm.sample, you do pm.sample with sampler equals Pathfinder, or...

543

:

pm.fit is what it would be.

544

:

yeah.

545

:

So pm.fit with Pathfinder or normalizing flows.

546

:

And then as usual, you get back your inference data object with the posterior

distributions.

547

:

Is that what...

548

:

that would look like in PyMC? Yeah, I mean, it's a bit mysterious.

549

:

It was sort of working, then not working, for unknown reasons.

550

:

Just the simple eight schools model was not running properly, but it would work on other

models.

551

:

so Michael's done a good job of kind of digging in to see where things might not be

working.

552

:

It currently runs a lot faster in Stan than in PyMC.

553

:

And why is that the case?

554

:

So again, it's sort of in an R&D

555

:

place right now and hopefully in the coming months you'll see new functionality appear.
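For reference, at the time of recording the Pathfinder work lived in pymc-experimental rather than PyMC itself, so the interface looked roughly like the sketch below; since this is R&D, the entry point may well have moved by the time you read this. The toy model and data are invented:

```python
import numpy as np
import pymc as pm
import pymc_experimental as pmx

y_obs = np.random.default_rng(0).normal(size=100)  # placeholder data

with pm.Model() as model:
    mu = pm.Normal("mu", 0, 1)
    pm.Normal("y", mu, 1, observed=y_obs)

    # Pathfinder variational inference via pymc-experimental;
    # returns the usual InferenceData object
    idata = pmx.fit(method="pathfinder")
```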

556

:

Yeah.

557

:

Super exciting.

558

:

Yeah.

559

:

I'll definitely use that because as you were saying, that's useful in baseball.

560

:

So I'll definitely do that.

561

:

Something in the meantime that interested listeners can already use is that Bambi is plugged

into bayeux, which is Colin Carroll's implementation of...

562

:

normalizing flows.

563

:

I don't remember which algorithm it is, but you have a bunch of algorithms available with

bayeux.

564

:

I think normalizing flows, but not Pathfinder or the other way around.

565

:

Anyways, if you go to the Bambi website, I'll put that in the show notes.

566

:

There is a notebook demonstrating the alternative samplers, and you can already use

normalizing flows with a Bambi model through bayeux.
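In practice that means passing an alternative inference method to Bambi's fit; the sketch below uses invented data, and the exact method string follows Bambi's alternative-samplers notebook, so treat it as indicative rather than definitive:

```python
import numpy as np
import pandas as pd
import bambi as bmb

rng = np.random.default_rng(2)
df = pd.DataFrame({"x": rng.normal(size=100)})
df["y"] = 2.0 * df["x"] + rng.normal(size=100)

model = bmb.Model("y ~ x", df)

# Default: PyMC's own NUTS sampler
idata_default = model.fit()

# A JAX-based sampler exposed through bayeux; the method name may
# change between versions, so check the docs for the current list
idata_blackjax = model.fit(inference_method="blackjax_nuts")
```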

567

:

Very cool.

568

:

Very cool.

569

:

Need to dig into bayeux.

570

:

Haven't looked at it in great detail.

571

:

We had normalizing flows in PyMC3 for a time, but they didn't perform very reliably.

572

:

We ended up getting rid of them.

573

:

So hopefully they will be more robust this time.

574

:

Yeah.

575

:

Damn.

576

:

Yeah.

577

:

That's so exciting to have all this different stuff, and also being able to plug into

other packages, using the best practices from elsewhere, and you just plug it in.

578

:

That's incredible.

579

:

Yeah.

580

:

That's where we really want to...

581

:

Get out there and get some R&D grants to accelerate some of this stuff,

because it's really helped in the past with Google Summer of Code students, having kind

582

:

of their time protected so that they can, because the core developers spend a lot of their

time just triaging bugs and making the current functionality work and work better.

583

:

It's good to have resources to help us do

584

:

innovation and new functionality.

585

:

so hopefully we can get some funding to help us do that.

586

:

Yeah.

587

:

Yeah.

588

:

So folks, you've heard it.

589

:

Like, if you have some free time to help us with that, you know, contact Chris or myself, or

if you know about a grant, or if you even have money, you just want to give us money, you

590

:

know, well, it's just, we're here.

591

:

That's right.

592

:

Well, that's why we like doing PyData meetings and hackathons and code sprints, because it

brings new

593

:

talent into the fold. Some of the attendees don't really even know what Bayesian

methods are or what PyMC is, and they come and hack on the code for a little bit, and you

594

:

never know where the next Bill Engels or next Maxime will come from.

595

:

Yeah, definitely.

596

:

That's true.

597

:

And actually, what's your, so that's like the short-term vision for PyMC.

598

:

What's your medium term vision for the package, is there something in particular you'd

like to see in there, something you'd like to change?

599

:

Well, that's a good question, because one of the things I'm trying to do now is come up with a

draft roadmap, and my vision doesn't really matter.

600

:

Like I'm the BDFL, but this is very much a community driven project.

601

:

So I'm interested in what the core developers want to see and what the community, larger

community wants to see out of the package.

602

:

And I know a lot of the core development team barely has time to stop and think about

these sorts of things because of the immediate issues that are being addressed.

603

:

I think that's a question we'll have to answer.

604

:

What do we want to see PMC do in the future as a larger group?

605

:

So a lot of the things that we've already talked about: improvements to variational

inference and GPUs,

606

:

making it easier to run stuff on GPUs, and getting nutpie closer and closer to the rest of PyMC

and making that easier for people to use.

607

:

We have lot of issues with getting various compilers to work, which makes it hard to

install PyMC using PIP, things like that.

608

:

So maybe getting rid of the C backend once and for all and relying more on the

newer backends as kind of defaults.

609

:

Those are the things off the top of my head.

610

:

It's a long list, and it's a matter of prioritizing them and getting the resources and the people

available to work on them.

611

:

Yeah, as always.

612

:

So to close this out, then, we're going to have to leave to teach the tutorial in a few

minutes, but I'm curious, now that you've started to work with Labs and talking to

613

:

different clients in different industries.

614

:

What has been your biggest surprise?

615

:

You know something you weren't expecting to see or a use case?

616

:

Biggest surprise?

617

:

Yeah, I don't know.

618

:

My biggest surprise is kind of the number of people that know about Bayesian methods and

want to use Bayesian methods that don't

619

:

often have the resources to do them, which is why they come to PyMC Labs.

620

:

So we get so much of our business, not from us going out and knocking on doors, but people

coming to us.

621

:

so I've been pleasantly surprised at kind of how well things have gone and the fact that

we have no shortage of business.

622

:

And that reflects kind of a desire from the analytics community in a variety of different

623

:

settings to use these methods.

624

:

And I think it's kind of been helped along by things like PyMC-Marketing and CausalPy, some

of the specific packages that PyMC Labs has built to make them, maybe

625

:

make it more obvious to potential users how valuable it is.

626

:

Yeah, maybe the biggest surprise for me is kind of how well known it's become; it's gone from

being a niche thing, you know, where I had trouble getting

627

:

papers published, academic papers published, if they were Bayesian without p-values, to, it

seems like, the entirety of industry wanting to use these methods to help them sell

628

:

products or stocks or whatever it is.

629

:

So yeah, I've been pleasantly surprised at the scope and the breadth of the, I guess,

630

:

let's just call it the community and industry, growing the way that it is.

631

:

Yeah.

632

:

It's a very good point.

633

:

Like, I remember back in 2017, when I started learning, that was really a niche thing where

you had to justify why you were, you know, using Bayesian methods.

634

:

And now I feel like, you know, the fight is basically won, where it's like, you

don't really need to convince people to use those methods most of the time, like from time

635

:

to time, but it's, it's really, really less and less.

636

:

It's common to use these methods, which is amazing, of course.

637

:

Yeah, I think, as you were saying, like at Labs, we've been blessed and grateful with

how many clients are interested. Yeah, and amazed that, you know,

638

:

it's just a group of talented people and I'm amazed, not even surprised really, because it

is, you know...

639

:

a talented group of developers and data scientists and statisticians, and at the range of

applications.

640

:

You know, a client could come along in a completely new industry and we can help them, in

pretty short order, get up and running with a model.

641

:

That's pretty cool.

642

:

For sure.

643

:

Awesome.

644

:

Well, Chris.

645

:

I think, to close out the show, of course, I'll ask you the last two questions I ask

every guest.

646

:

And yeah, you've answered them five years ago, but maybe the answers have changed.

647

:

So first one, if you had unlimited time and resources, which problem would you try to

solve?

648

:

Unlimited time and resources, which problem would I try to solve?

649

:

Ooh.

650

:

And I can't even remember my answer either from last time.

651

:

So that's good.

652

:

That means we get another independent draw, right?

653

:

Exactly.

654

:

And the draw from my mind.

655

:

Gosh.

656

:

An unsolved problem.

657

:

I assume you want, like, a Bayesian problem.

658

:

No, no, that could be anything you want.

659

:

No, it could be even coming up, you know, with a better version of pizza, you know, right, like

whatever you want.

660

:

Yeah, well, I guess with the election shortly in our rearview mirror, it would be, you know,

fixing the analysis of polling data.

661

:

Yeah, that may be impossible.

662

:

That may require unlimited resources, but

663

:

That still seems not to work very well.

664

:

I would say, yeah, coming up with better ways of extracting information from voters.

665

:

Well, happy to help on that.

666

:

If I can.

667

:

And a second question, if you could have dinner with any great scientific mind, dead,

alive or fictional, who would it be?

668

:

Dead, alive or fictional?

669

:

having dinner.

670

:

Hmm.

671

:

I would have to say, see, and again, I can't remember what I said last time.

672

:

Yeah, me neither.

673

:

okay.

674

:

I'm going to say Bill James, just to stay on a sort of baseball trajectory; one of the

very early sabermetricians applying quantitative methods to

675

:

baseball and yeah, I've never met him before and it would be interesting to pick his brain

and what he thinks of the current state of analytics in baseball.

676

:

Yeah, definitely.

677

:

Yeah.

678

:

So I would definitely like to join that dinner.

679

:

So please let me know.

680

:

Yeah.

Okay, he's still over at the Boston Red Sox, right? So, yeah.

Okay. I think... I think Tyler, Tyler Burch, who's contributed a bunch to PyMC and Bambi... I think, maybe. Probably Tyler knows Bill.

It's like, Tyler, what are you doing? They could be sharing an office right now.

Exactly. Invite Chris, please.

Awesome. Well, thank you so much for taking the time again.

See you in five years.

Yeah, see you back in five years. You know, like French presidential elections, you have your five-year term. And then you come back to tell us how amazing you've been.

Yeah. I'll come up with better answers to these questions.

And I will try to give you the questions in advance this time.

Awesome.

Well, thanks, Chris. And we'll see you whenever you want on the show.

Thanks, Alex.

This has been another episode of Learning Bayesian Statistics. Be sure to rate, review and follow the show on your favorite podcatcher, and visit learnbayesstats.com for more resources about today's topics, as well as access to more episodes to help you reach a true Bayesian state of mind. That's learnbayesstats.com.

Our theme music is Good Bayesian by Baba Brinkman, featuring MC Lars and Mega Ran. Check out his awesome work at bababrinkman.com.

I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. You can support the show and unlock exclusive benefits by visiting patreon.com/LearnBayesStats.

Thank you so much for listening and for your support. You're truly a good Bayesian.
Change your predictions after taking information in.
And if you're thinking I'll be less than amazing,
Let's adjust those expectations.
Let me show you how to be a good Bayesian.
Change calculations after taking fresh data in.
Those predictions that your brain is making,
Let's get them on a solid foundation.
