#118 Exploring the Future of Stan, with Charles Margossian & Brian Ward
Business & Data Science • Episode 118 • 30th October 2024 • Learning Bayesian Statistics • Alexandre Andorra

Shownotes

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!

Visit our Patreon page to unlock exclusive Bayesian swag ;)

Takeaways:

  • User experience is crucial for the adoption of Stan.
  • Recent innovations include adding tuples to the Stan language, new features and improved error messages.
  • Tuples allow for more efficient data handling in Stan.
  • Beginners often struggle with the compiled nature of Stan.
  • Improving error messages is crucial for user experience.
  • BridgeStan allows for integration with other programming languages and makes it very easy for people to use Stan models.
  • Community engagement is vital for the development of Stan.
  • New samplers are being developed to enhance performance.
  • The future of Stan includes more user-friendly features.

Chapters:

00:00 Introduction to the Live Episode

02:55 Meet the Stan Core Developers

05:47 Brian Ward's Journey into Bayesian Statistics

09:10 Charles Margossian's Contributions to Stan

11:49 Recent Projects and Innovations in Stan

15:07 User-Friendly Features and Enhancements

18:11 Understanding Tuples and Their Importance

21:06 Challenges for Beginners in Stan

24:08 Pedagogical Approaches to Bayesian Statistics

30:54 Optimizing Monte Carlo Estimators

32:24 Reimagining Stan's Structure

34:21 The Promise of Automatic Reparameterization

35:49 Exploring BridgeStan

40:29 The Future of Samplers in Stan

43:45 Evaluating New Algorithms

47:01 Specific Algorithms for Unique Problems

50:00 Understanding Model Performance

54:21 The Impact of Stan on Bayesian Research

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor,, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan, Francesco Madrisotti, Ivy Huang, Gary Clarke and Robert Flannery.

Links from the show:

Transcript

This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.


This episode is the first of its kind.

2

:

Welcome to the very first live episode of the Learning Bayesian Statistics podcast, recorded

,:

3

:

Again, I want to thank the whole StanCon committee for their help, trust and support in

organizing this event.

4

:

I surely had a blast and I hope everybody did.

6

:

In this episode, you will hear from not one, but two Stan core developers, Charles

Margossian and Brian Ward.

7

:

They'll tell us all about Stan's future as well as give us some practical advice for

better statistical modeling.

8

:

And of course, there is a Q&A session with the audience at the end.

9

:

This is Learning Bayesian Statistics, episode 118.

10

:

Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods,

the projects, and the people who make it possible.

11

:

I'm your host, Alex Andorra.

12

:

You can follow me on Twitter at alex_andorra, like the country.

14

:

For any info about the show, learnbayesstats.com is Laplace to be.

15

:

Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on

Patreon, everything is in there.

16

:

That's learnbayesstats.com.

17

:

If you're interested in one-on-one mentorship, online courses, or statistical consulting,

feel free to reach out and book a call at topmate.io/alex_andorra.

18

:

See you around, folks, and best Bayesian wishes to you all.

20

:

And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can

help bring them to life.

21

:

Check us out at pymc-labs.com.

22

:

Hello my dear Bayesians, today I want to welcome a new patron in the Learn Bayes Stats

family.

23

:

Thank you so much, Rob Flannery, your support truly makes this show possible.

24

:

I can't wait to talk to you in the Slack channel and hope that you will enjoy the

exclusive merch coming your way very soon.

25

:

Before we start, I have great news for you.

26

:

If you like live shows, I'm going to have two new live shows of LBS coming up on

November 7 and November 8 at PyData New York.

27

:

So if you want to be part of the live experience, join the Q&As and connect with the

speakers and myself, and also get some pretty cool stickers, well...

28

:

You can get your ticket already at pydata.org/nyc2024.

29

:

I can't wait to see you there.

30

:

OK, on to the show now.

31

:

So, welcome.

32

:

Thank you so much for being here.

33

:

You are going to have the immense honor and privilege to be the first ever live audience of the

Learning Bayesian Statistics podcast.

34

:

Thank you.

35

:

Of course, as usual, a huge thank you to all the organizers of StanCon.

36

:

Charles, of course, thank you so much.

37

:

I know you worked a lot.

38

:

Michael also who organized all of that.

39

:

So I think you can give them a big round of applause.

40

:

Okay, so let's get started.

41

:

So for those of you who don't know me, I'm Alex Andorra.

42

:

I am an open source developer.

43

:

I am actually a PyMC core developer.

44

:

Am I allowed to say those words here?

45

:

That's fine.

46

:

Don't worry.

47

:

Yes, and I very recently started as senior applied scientist at the Miami Marlins.

48

:

So if you're ever in Miami, let me know.

49

:

And today we are gonna talk, and yeah, no, of course I am the host and creator of the

Learning Bayesian Statistics podcast, which is the best show about Bayesian stats.

50

:

I think we can say that confidently because it's the only one.

51

:

it's not that hard.

52

:

But today we have amazing guests with us.

53

:

We're gonna talk about everything Stan. Today's the nerd panel: anything you wanted to know about Stan, about samplers, about all the technical stuff behind Stan.

55

:

Why does it take so long to have INLA in there, for instance, you know, stuff like that.

56

:

You can ask that.

57

:

It's going to be like the last 10 minutes of the show, I think.

58

:

But before that, we're going to talk with Brian and Charles.

59

:

So I'm going to be without the mic that feeds the room for the rest of the show, so that

you mainly hear from the guys.

60

:

So let's start with Brian.

61

:

So Brian Ward, you are a Stan core developer, if I understood correctly.

62

:

Can you first give us a bit of background, the origin story of Brian?

63

:

How did you end up doing what you're doing?

64

:

Because it seems to me that you're doing a lot of software engineering, which is a priori quite far from the Bayesian world.

66

:

So how did you end up doing what you're doing today?

67

:

Yeah, so I majored in computer science and I sort of came into this from a very software

development angle.

68

:

So I sort of was always interested in how things work.

69

:

So I learned to program and then I was like, well, how do programming languages work?

70

:

So I learned about compilers and then I stopped before going any deeper because there are

dragons down there.

71

:

But as part of my studies, I started working on a project with a couple of my professors

that was about Stan.

72

:

And they were mostly interested in Stan because, in their words, it was the probabilistic programming language that had the most thorough formal documentation of the language and its semantics.

74

:

They really liked that they could form an abstract model of the Stan language.

75

:

And so that was my first time ever using a probabilistic programming language.

76

:

It was really coming in from that angle.

77

:

And then since 2021, I've been working a lot on the Stan compiler, but then also just on, like you said, general software engineering for the different Python libraries and trying to improve the installation process on systems like Windows and that sort of thing.

79

:

OK.

80

:

So we'll get back to that because I think there are a lot of interesting threads here.

81

:

But first, let's switch to Charles.

82

:

So maybe for the rest of the audience, Charles was already on the podcast; he's got a classic episode.

84

:

So if you're really interested in Charles' background, you can go and check out his

episode.

85

:

But maybe just for now, if you can quickly tell us who you are, how you ended up doing

that.

86

:

Yes, I should mention that I am an understudy.

87

:

There were actually two other Stan developers we were hoping to have on this panel, but because of circumstances, I ended up being here.

89

:

I'm in very good company and I have a lot of thoughts about the future of Stan, which is

the topic of this conversation.

90

:

But essentially, I've been a Stan developer for eight years now.

91

:

And I started when I was working in biotech in pharmacometrics where Stan was up and

coming, but it lacked certain features to be used in pharmacometrics modeling.

92

:

Notably, you know, support for ODE systems, features to model clinical trials.

93

:

So my first project for Stan was developing an extension of Stan called Torsten, but also

in the process developed some features that directly appeared in Stan.

94

:

For example, the matrix exponential, which is used to solve linear ODEs, the algebraic

solvers.

95

:

And then, I became a statistician, I pursued a PhD in statistics and I continued developing certain features for Stan, kind of in that theme of implicit functions.

97

:

And I think we'll talk a little bit about that.

98

:

Nowadays, what I am is a research fellow, which is a glorified postdoc at the Flatiron

Institute, where I'm actually a colleague with Brian.

99

:

And I mostly do research around Bayesian computation, so that includes Markov chain Monte Carlo, variational inference, and thinking about probabilistic programming languages today, tomorrow, but also maybe in five or 10 years, what these might look like.

102

:

Yeah, thanks, Charles.

103

:

Quick legal announcement that I forgot, of course.

104

:

For the questions, we're going to record your voice.

105

:

So if you ask a question, you're consenting to being recorded.

107

:

If you don't want your voice to be recorded, just come ask the question afterwards or find

a buddy who is willing to ask the question for you.

108

:

And that will be all fine.

109

:

So that's that.

110

:

Also, write down your questions because we're going to have the Q&A at the end of the

episode.

111

:

So let's continue.

112

:

Maybe with a question that's for both of you. I'm wondering, before we talk about the future: you guys work with Stan all the time, so you do a lot of things, but what has been your most exciting recent project involving Stan, of course?

115

:

I can go first.

116

:

So this is a bit further back, but one of the first really major wins for me was adding tuples to the language. It's a slightly more advanced type than had previously appeared in Stan.

118

:

It had a lot of implementation difficulty, but it was a really big change to the language

in the compiler that finally made it in.

119

:

But more recently, working directly on Stan, I've been trying to add features to make it easier to do some of the things that are built into Stan, especially related to the constraints and the transforms directly in Stan.

122

:

So trying to take some of the magic that's built in out and let you be able to do things

yourself that work much closer to that.

123

:

And that's been interesting to think about how to make Stan a language that is easier to

extend for newer people.

124

:

This next release will have a few functions that make it a little easier to write your own user-defined transforms that do the right thing during optimization, for example.

126

:

Hmm, okay.

127

:

That's cool.

128

:

Can you maybe give an example about such a function that people could use in a model?

129

:

Sure.

130

:

So one thing you might want is a simplex parameter, but, because you have some understanding of the posterior geometry, you want an alternative parameterization. You want to use softmax, or you want to use some other thing than what's built into Stan.

133

:

And you can do this right now and it will work almost the same in almost all of the cases.

134

:

Going forward, we're trying to make it work the same in all of the cases.

135

:

We're trying to sort of cover off those last things.

136

:

In particular, if you're finding a maximum likelihood estimate, that is done without the Jacobian adjustment for the change of variables. That holds for the built-in types in Stan, but right now there's no way to have that also happen for your custom transforms.

138

:

But there will be going forward.
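To make the Jacobian point concrete, here is a minimal Python sketch (not Stan code). It assumes a toy positive parameter sigma with an unnormalized Gamma(3, 1) density and the log transform u = log(sigma); the function names and the ternary-search optimizer are illustrative choices, not anything from Stan.

```python
import math

def log_density_sigma(sigma):
    """Unnormalized log density on the constrained scale (toy Gamma(3, 1))."""
    return 2.0 * math.log(sigma) - sigma

def log_density_u(u, jacobian):
    """Same density expressed on the unconstrained scale u = log(sigma)."""
    sigma = math.exp(u)
    lp = log_density_sigma(sigma)
    if jacobian:
        lp += u  # log |d sigma / d u| = log(exp(u)) = u
    return lp

def argmax(f, lo, hi, iters=200):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0

# Optimizing without the adjustment recovers the mode on the sigma scale (sigma = 2).
sigma_without = math.exp(argmax(lambda u: log_density_u(u, False), -5.0, 5.0))
# Keeping the adjustment gives the mode of the density of u instead (sigma = 3).
sigma_with = math.exp(argmax(lambda u: log_density_u(u, True), -5.0, 5.0))
print(sigma_without, sigma_with)
```

The two optimizers disagree, which is why an optimizer that wants the mode on the original scale has to be able to switch the adjustment off, including for transforms the user wrote by hand.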

139

:

Okay, that's really cool.

140

:

So I have to admit that a lot of my recent work has been more Stan-adjacent rather than

specific contributions to Stan.

141

:

And so I could talk about that, but maybe one of the features that we are hoping to release soon, and that I prototyped a few years ago: we wanted to build a nested Laplace approximation inside of Stan.

143

:

And actually, we developed one and we had a prototype in 2020.

144

:

So that already goes back and we published a paper about that.

145

:

And then another year or two later when I wrote my PhD thesis, I had a more thorough

prototype that we also released and then we kind of got stuck.

146

:

And I can talk a little bit about that, but essentially Steve Bronder, who was supposed to join us today but had something come up (hopefully he'll be here at StanCon in the next few days), has really been pushing the C++ code and the development, and we have this idea that maybe by the next Stan release we'll actually have that integrated Laplace approximation and we'll make it available to the users.

149

:

And of course there are a lot of interesting things and moving parts around these features, from a technical point of view.

151

:

So the automatic differentiation that we had to deploy is, I think, very interesting, very

challenging.

152

:

Also, the ways in which, what are the features that we put in our integrated Laplace?

153

:

So I don't think it's going to be as performant as the integrated Laplace approximation

that's implemented in INLA.

154

:

And I can discuss a little bit what are some of the features we lack, but we also focused on what are some unique things that having this integrated Laplace approximation in Stan can give to the users in terms of modeling capabilities.

156

:

And those are things I'm excited about.

157

:

And there are going to be a few challenges about using these approximate algorithms, just

as there are whenever you use an approximate algorithm.

158

:

And that's going to motivate, you know, new elements of a Bayesian workflow, new diagnostics, new checks that will have to be semi-automated, that will have to be very well documented, and that will also need to be demonstrated.

161

:

These are all the pieces you need for users to use an algorithm effectively.

162

:

And that's part of the journey between "we have a prototype, we can publish this in what's considered a top machine learning conference" (the paper appeared in NeurIPS) versus "I can almost say we have something that's Stan-worthy."

166

:

And the requirements are a little bit orthogonal.

167

:

So it's not like one is superior, but there's a lot of extra work that needs to happen.

168

:

And that will continue to happen.

169

:

Because one of the open questions, I think, is: when we make a new feature available, how much responsibility do we take and how much responsibility do we give to the users?

171

:

So maybe those are some of the topics that we can dive into.

172

:

But one thing that I'll say is the tuples that Brian mentioned, that was one of the key technical components that we needed to develop in order to have an interface that's user-friendly enough to use this integrated Laplace.

174

:

Yeah, I love that because, I don't know about you folks, but me, if I hear "yeah, we integrated tuples," I don't think it's that important.

176

:

But then when you talk to the guys who actually code the stuff and implement that, it's a

building block that then unlocks a ton of incredible features and new stuff for users.

177

:

Yeah, and we can make that very, very concrete.

178

:

Yeah, for sure.

179

:

Actually, to give an example.

180

:

Well, Brian, how would you define a tuple?

181

:

So in type, no, I'm joking.

182

:

So a tuple is essentially just a grouping of different types of things.

183

:

So the simplest one to think of is like a point in R2, like an xy coordinate.

184

:

It's just a tuple of a real number and another real number.

185

:

But the nice thing about tuples as compared to like an array is that those don't have to

be the same type.

186

:

So for example, in more recent versions of Stan, there is a function called eigendecompose which gives you a matrix of the eigenvectors and a vector of the eigenvalues, both back to you at the same time. And so this actually cuts the amount of computation that has to be done in half, because in previous versions you had to call the eigenvectors function and the eigenvalues function separately and they were repeating some work, and now it can just give you this object that has both at once.

190

:

And so that's one of the really useful things about tuples: it lets you have a principled way to talk about a combination of different types like that.
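As an illustration of returning two differently typed results at once, here is a small Python sketch. The helper `eigendecompose_sym_2x2` is hypothetical and only handles a symmetric 2x2 matrix in closed form; it is not Stan's implementation, just the same idea of one call yielding a tuple of eigenvectors and eigenvalues.

```python
import math
from typing import List, Tuple

Vector = Tuple[float, float]

def eigendecompose_sym_2x2(a: float, b: float, d: float) -> Tuple[List[Vector], List[float]]:
    """Eigenvectors and eigenvalues of [[a, b], [b, d]], computed together."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    values = [mean - radius, mean + radius]  # ascending order
    if b == 0.0:
        # Already diagonal: the coordinate axes are the eigenvectors.
        vectors = [(1.0, 0.0), (0.0, 1.0)] if a <= d else [(0.0, 1.0), (1.0, 0.0)]
    else:
        vectors = []
        for lam in values:
            x, y = lam - d, b  # satisfies A v = lam v by the characteristic equation
            norm = math.hypot(x, y)
            vectors.append((x / norm, y / norm))
    return vectors, values

# One call, one shared computation, two results of different types.
vectors, values = eigendecompose_sym_2x2(2.0, 1.0, 2.0)
print(values)   # eigenvalues of [[2, 1], [1, 2]]
print(vectors)  # matching unit eigenvectors
```

The shared quantities (`mean`, `radius`) are computed once, which is the saving described above when both results come back in a single tuple.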

192

:

Yeah, yeah.

193

:

And so one place where having this grouping of different types is very useful is in

functionals.

194

:

So what's an example of a functional?

195

:

The ODE solver in Stan, it's a functional.

196

:

One of its arguments is a function, so the function that defines the right-hand side of

your differential equation.

197

:

And then you need to pass arguments to that function.

199

:

And of course, the user is specifying the function, and so they're going to specify what

are the arguments that we pass to that function.

200

:

There was this time where this function needed to have a strict signature.

201

:

So we told the user, you're first going to pass the time, the state, then the parameters,

and then the real data and the integer data.

202

:

And you have the strict format. So basically, those are just a way of taking the arguments, packing them into a specific structure, and then inside the ODE, you unpack them.

204

:

And so not only was this tedious, it can lead you to make your code less efficient if

you're not being careful about distinguishing what's a parameter and what's a data point.

205

:

And one experience of that I had was collaborating with applied people, with epidemiologists, so with Julien Riou. This was during the pandemic, during COVID-19.

208

:

At some point, Julien reached out to the Stan development team and he said he's

developing this really cool model, but right now it takes two, three days to fit, right?

209

:

Something like that.

210

:

And we're not at the level of complexity that we want to be at.

212

:

And so I have to give really most of the credit to Ben Bales, who was also a Stan

developer at the time.

213

:

And we took a look at how the ODE was implemented and how it was coded up and how the

different types were being handled.

214

:

And we realized that way more of the arguments that were being passed were parameters than

was necessary.

215

:

And once you correct for that, the running time of the model went from two, three days to

two hours.

216

:

So not only is that much faster and that's good in terms of reproducibility, that also

means you can then keep developing the model and go to something more complicated.

217

:

So having this kind of tuples, well, really what it gave us was what's called

variadic arguments.

218

:

That was a big step actually, where now you don't have those strict signatures when you

pass the functionals.

219

:

People can really pass different things.
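The difference between a strict signature and variadic arguments can be sketched in Python. This is an illustration only: the fixed-step `rk4` integrator and the two toy right-hand sides are assumptions made for the example, not Stan's solvers or API.

```python
import math

def rk4(rhs, y0, t0, t1, n_steps, *args):
    """Fixed-step RK4; every extra argument is forwarded to rhs untouched."""
    h = (t1 - t0) / n_steps
    t, y = t0, list(y0)
    for _ in range(n_steps):
        k1 = rhs(t, y, *args)
        k2 = rhs(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)], *args)
        k3 = rhs(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)], *args)
        k4 = rhs(t + h, [yi + h * ki for yi, ki in zip(y, k3)], *args)
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y

def decay_strict(t, y, theta, x_r, x_i):
    """Strict style: arguments arrive packed in fixed containers and are unpacked here."""
    rate = theta[0]   # parameter
    scale = x_r[0]    # real-valued data
    return [-rate * scale * y[0]]

def decay_variadic(t, y, rate, scale):
    """Variadic style: each argument keeps its own name and type."""
    return [-rate * scale * y[0]]

y_strict = rk4(decay_strict, [1.0], 0.0, 2.0, 200, [0.5], [1.0], [])
y_variadic = rk4(decay_variadic, [1.0], 0.0, 2.0, 200, 0.5, 1.0)
print(y_strict[0], y_variadic[0], math.exp(-1.0))
```

Both calls solve the same equation, but in the variadic version nothing forces a data value into the parameter container, which is the kind of mistake that made the epidemiological model above slow.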

220

:

Now for the integrated Laplace, so I realize we haven't really defined what it is, but

basically what I'll say is that there are two functionals that you need to pass.

221

:

One is you're defining a likelihood function and the other one is you're defining a

covariance function.

222

:

And so we want the users to be able to use variadic arguments for both those functions

that they're defining.

223

:

So they're not constrained by types.

224

:

That way it's not tedious, it's not error prone, or it's not prone to inefficiencies.

225

:

And that's why we have those tuples: to make the code user-friendly, to probably decrease the

compute time that users will spend on this algorithm.

226

:

That's why that kind of stuff is important.

227

:

The power users, they don't need it.

228

:

They can handle the strict signatures.

229

:

I handle the strict signatures.

230

:

No problem.

231

:

But once you start using other probabilistic programming languages, you realize that one of the big strengths of Stan is the attention it gives to users, to the API, how mindful it is of the users.

233

:

Other languages, you can tell that it really feels like sometimes they're written for

software engineers.

234

:

And the software engineers are the ones who are going to be the best ones at using those

languages.

235

:

But I think that that's one of the strengths of Stan, and that some of the innovations are maybe gonna be less technical or algorithmic (although those exist, and maybe we'll have time to talk about them) and more about actually making this more user-friendly, less error-prone, less inefficiency-prone.

238

:

Yeah, and that definitely comes up, and I think it will come up whenever we're working on

new features for Stan.

239

:

There's always sort of two users we have in our head.

240

:

There's the user who is already at the limit of what Stan can do and wants to fit the next biggest model, and how can we help that user, but also the user who, you know, has a relatively small model that they just can't figure out right now, and can we make that user's life easier too?

242

:

Sometimes they're actually sort of fighting each other, but usually we can find features that

actually make both of their lives better, which is like the ideal circumstance.

243

:

But by the way, kind of in the spirit of that, apparently most of our Stan users are BRMS

users.

244

:

I think that's established, right?

245

:

BRMS really gives you this beautiful syntax that people can play with, that people can

reason with.

246

:

Personally, I like the Stan language.

247

:

That syntax is a bit more explicit.

248

:

But even that syntax in the Stan model is a simplification of what Stan is doing under the

hood.

249

:

I'll give you a simple example.

250

:

You know those tilde statements that you have in the model block, right?

251

:

That's because, you know, people like Andrew Gelman like reasoning about models in a data-generating fashion, right?

253

:

But really, you know, what's going on under the hood is we're incrementing a log

probability density, right?
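The correspondence between a tilde statement and incrementing a log density can be sketched in Python. The model here (a normal prior on mu, normal observations) and the function names are assumptions for illustration; note also that Stan's tilde statements drop additive constants, whereas this sketch keeps them.

```python
import math

def normal_lpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def log_density(mu, sigma, ys):
    """Each 'tilde statement' of the model becomes one increment of target."""
    target = 0.0
    target += normal_lpdf(mu, 0.0, 10.0)      # reads as: mu ~ normal(0, 10)
    for y in ys:
        target += normal_lpdf(y, mu, sigma)   # reads as: y ~ normal(mu, sigma)
    return target

value = log_density(0.0, 1.0, [0.0])
print(value)
```

Read top to bottom, the comments tell the data-generating story, while the code itself only ever accumulates a single number, the log density that the sampler works with.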

254

:

So different users function with different levels of abstraction, depending on whether they're statisticians or, you know, more software engineering, maybe ML-oriented people, or maybe scientists who primarily reason about covariates, right?

257

:

That's where I see one of the big roles that BRMS is playing.

258

:

And we need a way that's maintainable, that, you know, avoids compromises, to

kind of cater to these different users.

259

:

And in fact, we should talk about BridgeStan and a new community of users we're hoping to reach with Stan, maybe at some point.

261

:

Yeah, I'll add that to the notes.

262

:

Good, good.

263

:

Yeah, so many questions.

264

:

Thank you so much, guys.

265

:

I think, yeah, that's something I'd like to pick up.

266

:

We'll get back to INLA also at some point.

267

:

I think it's going to be like the, how do you say fil rouge in English?

268

:

The thread.

269

:

The thread, thank you.

270

:

The red thread, you can say that.

271

:

I don't know.

272

:

So it's going to be the thread.

273

:

Talking a bit more about the beginners you were talking about and the user who is trying

to get their model to work but cannot figure it out yet.

274

:

Do you see a common difficulty that these kinds of users are having lately, maybe in the

Stan forums, things like that?

275

:

And maybe you can tell them how to handle that right now, or maybe tell us what you guys are doing in the coming months to address those kinds of obstacles.

277

:

I think there are two, and they're sort of different.

278

:

So I think for a lot of users who are coming from more traditional languages like R or Python and are trying to write Stan themselves for the first time, the difficulty is just having a compiled language at all, both in terms of the extra installation steps, but then also dealing with static typing, if you're not used to thinking about variables in this way.

281

:

And so there are things we've talked about of trying to work on that, but a lot of what

I've invested in is just trying to improve the error messages the compiler gives you and

282

:

trying to have them less be like what a compiler engineer knows went wrong and make it

more like what you think went wrong.

283

:

But I think the second class that I see, and this is sort of going back to Charles's point, is I think we have a lot of users who will use a tool like BRMS or rstanarm, and it will get them as far as it gets them and then they want to go a bit further.

285

:

But I think the issue is if they've never written any Stan code at that point, they ask

BRMS, hey, can you give me your Stan code?

286

:

And they're given this model that would have taken them several months to write themselves

and now they have no hope.

287

:

They're starting off in the deep end already because they already have a very powerful

model that they just want to tune one bit further.

288

:

And that's a much harder thing, both in terms of software and also pedagogically. I don't know how to handle that.

290

:

I don't know if you have more.

291

:

I think a bit less about beginners.

292

:

No, no, okay, okay, so let me, let me nuance that a little bit.

293

:

So I teach workshops, I've had opportunities to teach.

294

:

And actually, I think about some fundamental questions that a beginner is likely to ask,

but for which we don't have great answers.

295

:

And I'll give you one example.

296

:

For how many iterations should we run Markov chain Monte Carlo?

297

:

Right?

298

:

That's an elementary question, and it's not an easy one to answer.

299

:

Especially if you start digging and thinking about what is the optimal length of a Markov

chain?

300

:

What is the optimal length of a warm-up phase, of a sampling phase?

301

:

What is the number of Markov chains that I should run given some compute that's available

to me?

302

:

And then you get into a more fundamental question, which is what is the precision that

people need from their Monte Carlo estimators?

303

:

So I asked an audience of scientists, well, what effective sample size do you need?

304

:

What summaries of the posterior distribution do you need?

305

:

Are you really interested in the expectation value, or do you need the variance, or maybe

you need these quantiles or these other quantiles?
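For the simplest case, a posterior mean, the link between precision and effective sample size can be written down directly: the Monte Carlo standard error is roughly the posterior standard deviation divided by the square root of the effective sample size. A minimal Python sketch, with hypothetical helper names and assuming this standard approximation applies:

```python
import math

def mcse_of_mean(posterior_sd, ess):
    """Approximate Monte Carlo standard error of a posterior mean estimate."""
    return posterior_sd / math.sqrt(ess)

def required_ess(posterior_sd, target_mcse):
    """Smallest effective sample size whose MCSE is at most target_mcse."""
    return math.ceil((posterior_sd / target_mcse) ** 2)

# With a posterior sd of 2, an MCSE of 0.25 needs an ESS of (2 / 0.25)^2 = 64.
print(required_ess(2.0, 0.25))
print(mcse_of_mean(2.0, 64))
```

Other summaries, such as variances or tail quantiles, have different error formulas and usually need more draws for the same precision, which is why the question of which summary is needed comes first.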

306

:

And we have some unfortunate terminology.

307

:

People say we're computing the posterior.

308

:

That doesn't mean much.

309

:

It conveys a good first order intuition, but not a good second order intuition.

310

:

I like to say we're probing the posterior.

311

:

And then we need to think about what are the properties of the posterior that we're

actually pursuing.

312

:

And so then we get into, people ask me, when should I use MCMC or variational inference?

313

:

So people criticize variational inference.

314

:

They say, well, even when you solve the... So what does VI do, maybe just as a summary?

316

:

You have a family of approximations, for example, Gaussians.

317

:

And then within that family of approximation, it tries to find the best approximation to

your posterior.

318

:

And people will dismiss it because they say, look, even if you solve the optimization

problem, at the end of the day, your posterior is not a Gaussian.

319

:

So your optimal solution is not good.

320

:

It has what's called, what people call an asymptotic bias.

321

:

Whereas with MCMC, you know that with enough compute power (and enough can be a lot, right?), eventually you will hit arbitrary precision, right?

324

:

But now if I think about "I'm trying to probe the posterior," well, maybe that Gaussian approximation does match the expectation value, does match the summary quantities that I'm interested in.

326

:

Maybe it captures the variance, or maybe it captures the entropy, right?

327

:

So maybe that is the pedagogical work that I'm trying to do for beginners, with the caveat that I don't have great answers to all those questions.

329

:

I think these are real research topics.

330

:

But if I think about one goal, for example, that I would like to achieve, I would like to,

I want it to be part of the workflow.

331

:

People are doing work on that.

332

:

Aki Vehtari is doing great work on that, to only name one person.

333

:

Once people figure out this is how precise my Monte Carlo estimators need to be, I want

that to be the input to Stan.

334

:

And then I want it to run the Markov chains for the right number of iterations in a way

that gives you that precision without wasting too much computational power.
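
A rough sketch of the arithmetic behind "precision as an input", assuming the usual central-limit approximation for a posterior mean, where the Monte Carlo standard error is the posterior standard deviation divided by the square root of the effective sample size. The function names are hypothetical and this is not an existing Stan feature. In practice both the standard deviation and the effective sample size are themselves estimated from the chains, so the stopping rule is noisier than this suggests.

```python
import math


def mcse_of_mean(posterior_sd, ess):
    """Monte Carlo standard error of a posterior mean: sd / sqrt(ESS)."""
    return posterior_sd / math.sqrt(ess)


def required_ess(posterior_sd, target_mcse):
    """Smallest effective sample size whose MCSE is at most the target."""
    return math.ceil((posterior_sd / target_mcse) ** 2)


# Example: a parameter with posterior sd 2.0, wanted to within an MCSE of 0.125.
needed = required_ess(posterior_sd=2.0, target_mcse=0.125)
print(needed, mcse_of_mean(2.0, needed))
```

Halving the target error quadruples the required effective sample size, which is why asking for more precision than the analysis needs is expensive.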

335

:

And we're not there yet.

336

:

We have promising directions to do that, which also come with their fair share of

challenges.

337

:

But yeah, that's the kind of thing I want to do for beginners and for intermediates and

for advanced and for myself.

338

:

But yeah, the beginners ask the right questions and the difficult questions.

339

:

Okay, thanks Charles.

340

:

Nice save.

341

:

No, so more seriously, yeah, Brian, I was wondering, like, so if you had, let's say Stan

Ulam,

342

:

He comes to you in a dream and he's like, okay, Brian, you've got one wish to make Stan

better for everybody, including the beginners, Charles.

343

:

So what would it be?

344

:

This is like a genie powerful wish.

345

:

I can rewrite the history of the...

346

:

Something that we've talked about again and again, but it would just be such a huge lift.

347

:

But if I'm allowed to go back to the start, I think that...

348

:

There's been a lot of talk about how the block structure of Stan gives a lot of power, but

it also makes a lot of things limiting.

349

:

Right now, if you want to do a prior predictive check, you oftentimes need a separate

model that looks a little different than the model you're actually writing.
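
For readers unfamiliar with the term, a prior predictive check simulates data from the priors alone, before seeing any observations. The sketch below does that by forward simulation for a simple linear regression. The priors and the predictor values are illustrative assumptions, not taken from the episode. In Stan the same simulation usually has to be written as its own program, or in generated quantities, alongside the model used for fitting, which is the duplication being described.

```python
import random


def prior_predictive_draw(x, rng):
    """Simulate one dataset y from the priors of a simple linear regression."""
    alpha = rng.gauss(0.0, 1.0)   # intercept ~ normal(0, 1)      (assumed prior)
    beta = rng.gauss(0.0, 1.0)    # slope ~ normal(0, 1)          (assumed prior)
    sigma = rng.expovariate(1.0)  # noise scale ~ exponential(1)  (assumed prior)
    return [rng.gauss(alpha + beta * xi, sigma) for xi in x]


rng = random.Random(1)
x = [0.0, 0.5, 1.0, 1.5, 2.0]
prior_draws = [prior_predictive_draw(x, rng) for _ in range(1000)]

# A crude check: what range of outcomes do the priors consider plausible?
all_values = [y for draw in prior_draws for y in draw]
print(min(all_values), max(all_values))
```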

350

:

And this is one of the things that's great about brms, right, is the single formula can be

turned into all these models at once.

351

:

But there has been previous research, so Maria Gorinova, Gorinova?

352

:

She did a master's thesis and a PhD thesis on a tool she called SlicStan, which was a

Stan with no blocks.

353

:

And so it sort of would automatically, you would write your Stan model as you do now, but

without saying what's data and what's parameters, and then you would just give it data,

354

:

and it would then figure out, okay, these are the data, these are the parameters, here are

things I can move to generated quantities, and it would sort of be a much more powerful

355

:

form of the compiler that would really capture a lot of these ideas, but it would also be

sort of a fundamentally different

356

:

thing than Stan.

357

:

If I could really do anything in the world, that would probably be it.

358

:

But I don't know if that will ever make it there.

359

:

There's a lot of existing stuff that we would have to give up, I think.

360

:

Yeah.

361

:

I understand.

362

:

If you're interested, Maria Gorinova was on the podcast.

363

:

You can go on the website, learnbayestats.com.

364

:

There is a small stuff on the right.

365

:

On the top, you can...

366

:

look for any guests.

367

:

So Maria Gorinova, that was a great episode because I think she's also working on

automatic reparameterization, if I remember correctly.

368

:

So if you ever had to reparameterize a model, that can be quite frustrating if you're a

beginner because you're like, but it's the same model.

369

:

I'm just doing that for the sampler.

370

:

And so one of the goals of that is just having the sampler figure that out by itself.
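
A minimal sketch of the most common manual reparameterization, the non-centered form of a hierarchical normal. The values of `mu` and `tau` are fixed here purely for illustration; in a real model they are parameters. Both versions define the same distribution for `theta`, which is why it feels like "the same model". The difference is that in the non-centered form the sampler works on `z`, whose scale does not depend on `tau`.

```python
import random
import statistics

rng = random.Random(0)
mu, tau = 1.5, 0.5  # illustrative values; parameters in a real hierarchical model
n = 20000

# Centered: draw theta directly from normal(mu, tau).
centered = [rng.gauss(mu, tau) for _ in range(n)]

# Non-centered: draw z from normal(0, 1), then shift and scale deterministically.
non_centered = [mu + tau * rng.gauss(0.0, 1.0) for _ in range(n)]

print(statistics.fmean(centered), statistics.stdev(centered))
print(statistics.fmean(non_centered), statistics.stdev(non_centered))
```

Automatic reparameterization, as discussed here, would choose between such forms without the user rewriting the model.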

371

:

Yeah, and then she also did some interesting work on automatic marginalization where it's

tractable, which was very cool, because that's another, I don't feel confident in my own

372

:

ability to marginalize a model off the top of my head, so it's like a, I know that's a

thing that new users hit a lot.
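
As a hedged illustration of what marginalizing means in the simplest case: in a two-component normal mixture, each observation has a discrete indicator saying which component it came from. Gradient-based samplers cannot sample that indicator directly, so it is summed out of the likelihood. The sketch below does the sum on the log scale with a log-sum-exp for numerical stability. The function names are illustrative, and the shared `sigma` is a simplifying assumption.

```python
import math


def normal_logpdf(y, mu, sigma):
    """Log density of normal(mu, sigma) at y."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2


def log_sum_exp(a, b):
    """Stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def mixture_loglik(y, w, mu1, mu2, sigma):
    """Log likelihood of y with the component indicator summed out (0 < w < 1).

    p(y) = w * normal(y | mu1, sigma) + (1 - w) * normal(y | mu2, sigma)
    """
    return log_sum_exp(
        math.log(w) + normal_logpdf(y, mu1, sigma),
        math.log(1.0 - w) + normal_logpdf(y, mu2, sigma),
    )


example = mixture_loglik(1.0, w=0.3, mu1=0.0, mu2=3.0, sigma=1.0)
print(example)
```

Real models need this sum for every discrete configuration, which is where doing it by hand gets hard and automation becomes attractive.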

373

:

Yeah, yeah, yeah, I mean, you hit that quite a lot, and yeah, if we could automate that at

some point, that'd be absolutely fantastic, yeah.

374

:

Charles, I think we've got nine minutes before the Q&As.

375

:

So I'm going to give you a choice.

376

:

No, so we could go back to talk about INLA a bit, because I realize we should have done

something at the beginning, which is defining INLA and telling people why that would be

377

:

useful and when.

378

:

We can also talk about BridgeStan, but I think, Brian, you can talk about BridgeStan

too.

379

:

So your call, Charles.

380

:

Let's talk about BridgeStan.

381

:

Or let's talk about BridgeStan.

382

:

Let's see how fast I can do it.

383

:

Maybe we can do both.

384

:

Yes and yes.

385

:

So Simon's talk earlier mentioned BridgeStan.

386

:

And if people aren't familiar, this was something that Edward Roualdes, who's a Stan

developer, started a few years ago when he was visiting us in New York.

387

:

It drives me crazy that I didn't think of this.

388

:

Edward deserves so much credit because it was sitting there all this time, but what it

essentially does is it, through a lot of technical mumbo jumbo that you should ask me

389

:

about later, it makes it very easy for people to use Stan models outside of Stan's C++

ecosystem.

390

:

And so if you have a model in Stan, but you want to use a...

391

:

like an algorithm that's only implemented in an R package or that you're developing

yourself, it really lets you get the log densities and the gradients with all of the speed

392

:

and quality of the Stan Math library, but you can use these Python libraries or these like

experimental things that you're working on.

393

:

And so we have a paper, and it has a few citations already of people who

have been using it to develop new algorithms and like I know a lot of work that Bob has

394

:

been doing recently has been using it and so like that's one way we're, especially

395

:

One of the things we're thinking of for those users who want to push the edge is new forms

of variational inference and new forms of HMC.

396

:

And it has already been a really huge boon for that research.

397

:

Yeah, yeah.

398

:

At the Flatiron Institute, we do a lot of algorithmic work on new samplers and new

variational inference.

399

:

And we now use BridgeStan all the time.

400

:

I'll give you two good reasons, and there are probably more, but one of them is that it gives

us access to Stan's automatic differentiation and if you look at a lot of papers that

401

:

evaluate the performance of algorithms they do it not against time but against number of

gradient evaluations because that tends to be the dominant operation computationally and

402

:

so now you write your sampler in Python or

403

:

maybe in R, or you write your VI in Python or in R, but you still get the high performance

from using Stan.
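
A sketch of the workflow described here: the sampler is plain Python and only needs a callable that returns a log density and its gradient. To keep the example runnable without a compiled model, it uses a hypothetical stand-in class for a standard normal. The stand-in mimics the shape of BridgeStan's `log_density_gradient` call, which returns the log density together with its gradient; treat the exact signature as an assumption and check the BridgeStan documentation before swapping in a real model. The counter shows why papers often report cost in gradient evaluations rather than wall-clock time.

```python
import math
import random


class CountingNormalModel:
    """Stand-in for a compiled model: a standard normal in `dim` dimensions.

    Hypothetical class for illustration. Its one method plays the role of a
    "log density and gradient" call and counts how often it is used.
    """

    def __init__(self, dim):
        self.dim = dim
        self.gradient_evals = 0

    def log_density_gradient(self, theta):
        self.gradient_evals += 1
        log_density = -0.5 * sum(t * t for t in theta)
        gradient = [-t for t in theta]
        return log_density, gradient


def hmc_step(model, theta, step_size, n_leapfrog, rng):
    """One HMC transition with a unit mass matrix and a fixed trajectory length."""
    log_density, gradient = model.log_density_gradient(theta)
    momentum = [rng.gauss(0.0, 1.0) for _ in theta]
    energy_start = -log_density + 0.5 * sum(p * p for p in momentum)

    position = list(theta)
    # Leapfrog: half momentum step, alternating full steps, final half step.
    momentum = [p + 0.5 * step_size * g for p, g in zip(momentum, gradient)]
    for i in range(n_leapfrog):
        position = [q + step_size * p for q, p in zip(position, momentum)]
        log_density, gradient = model.log_density_gradient(position)
        scale = 0.5 if i == n_leapfrog - 1 else 1.0
        momentum = [p + scale * step_size * g for p, g in zip(momentum, gradient)]
    energy_end = -log_density + 0.5 * sum(p * p for p in momentum)

    # Metropolis correction for the discretization error.
    accept_prob = math.exp(min(0.0, energy_start - energy_end))
    return position if rng.random() < accept_prob else list(theta)


rng = random.Random(42)
model = CountingNormalModel(dim=2)
theta = [0.0, 0.0]
draws = []
for _ in range(1000):
    theta = hmc_step(model, theta, step_size=0.3, n_leapfrog=10, rng=rng)
    draws.append(theta)

mean_first_coord = sum(d[0] for d in draws) / len(draws)
print(f"gradient evaluations: {model.gradient_evals}")
print(f"mean of first coordinate: {mean_first_coord:.3f}")
```

Each transition costs one gradient at the starting point plus one per leapfrog step, so 1000 transitions with 10 leapfrog steps cost 11000 gradient evaluations regardless of how fast each one is.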

404

:

So that's great.

405

:

And then the second thing is that means that you can now test those new algorithms that

you've developed in a pretty straightforward way on Stan models and the library of Stan

406

:

models, including posteriordb or maybe some other models that you've been using.

407

:

And those models are very readable.

408

:

It standardizes a little bit the testing framework.

409

:

So it has changed my thinking a little bit as someone who works a lot on the Stan

compiler, thinking of Stan not just as its own sort of ecosystem, but also as like a

410

:

language for communicating models.

411

:

I find it really helpful.

412

:

Someone can describe a model in LaTeX up on a slide, but as soon as they show me the Stan

code, I'm like, I get it.

413

:

And even if my job now was to go implement it in PyMC or something, I think it still

helps.

414

:

Having this language that is a little bit bigger than itself or a little bit bigger than

it used to be where now, I see Adrian here is in the audience and he has an implementation

415

:

of HMC in Rust.

416

:

But you can use Stan models with it because of BridgeStan.

417

:

It has opened up the, sorry, Adrian's in the back.

418

:

But it's opened up the world of things that Stan can be, which is one thing that I think

is very cool.

419

:

Yeah, and I think, so when I spoke about the new community of users that I think we're

going to reach is there are people who write their own samplers who have particularly

420

:

difficult problems.

421

:

And even today, we've had two examples, at least two examples of people who departed from

the traditional samplers that are implemented in Stan, either to implement tempering or to

422

:

implement massive parallelization.

423

:

And so, you know, I really think that, you know, there is a group of people who for their

problems, you know, like to develop and try out certain samplers.

424

:

And, you know, that's also going to drive research for what could be the next default

sampler or variational inference or approximation in Stan.

425

:

They are candidates for that.

426

:

Although it's true that the more we learn, the more we develop new samplers, the more we

realize how good NUTS is.

427

:

But things are going to change over the years.

428

:

OK, awesome.

429

:

Thanks a lot, guys.

430

:

So I still have a ton of questions.

431

:

But already, let's open it up to the audience.

432

:

Are there already any questions?

433

:

Or should I ask one?

434

:

OK, perfect.

435

:

So, mentioning the new samplers that you guys are developing at the Flatiron and also I

have a lot of guests who come on the show and talk about new samplers, normalizing flows

436

:

for instance, Marylou Gabrié was on the show, also Marvin Schmitt, Paul Bürkner is

here, he works a lot on BayesFlow with Marvin Schmitt.

437

:

They are doing amortized Bayesian inference.

438

:

So I'm really curious how you guys think about that and Stan, basically.

439

:

Because most of the time, it's also tied to increasing data sizes.

440

:

And so people are looking into new samplers which can adapt to their use case better.

441

:

So I'm curious how you guys think about that in the Stan team and what you're thinking of

developing in the coming months about that.

442

:

Yeah, I think one of the challenges that these approaches often, sort of one of the

motivating reasons for them is that you can get a wall clock time reduction by just

443

:

throwing a massive amount of compute at it with GPUs, which is one place where...

444

:

Stan's GPU support is still kind of piecemeal, like we're working on it, but it's sort of

like we can't compete with Google developing JAX, you know?

445

:

And so like, you know, Simon's presentation earlier showed that like on CPU, Stan actually

beats JAX or BridgeStan, you know, can be faster than JAX.

446

:

But on GPU, we have sort of no hope.

447

:

And I think that like, or at least at the moment, no hope.

448

:

But I think that's where these approaches become really challenging is like trying to

think of.

449

:

And I think it's sort of an almost existential question of like, is Stan just like the CPU

solution, right?

450

:

And is something else better?

451

:

Because there are things about Stan's like, sort of core design that don't like GPUs.

452

:

It's a very expressive language and GPUs really like less expressive languages that are

much easier to guess what you're gonna do next.

453

:

And so I think that is something that, you know,

454

:

I personally believe there will always be sort of a community of like, you know, researchers

working on their laptop or that sort of thing.

455

:

And so I think there will always be a place for these like CPU bound implementations.

456

:

But yeah, if you can predict that, you can probably make a lot of money.

457

:

Charles?

458

:

Yeah, I'm going to try and return to the original question, which is, you know,

459

:

So there are a lot of algorithms that are being developed and there are a lot of good ideas

that go into developing these algorithms and there are some good experiments and some good

460

:

empirical evidence that supports why you might want to use those algorithms.

461

:

Nonetheless, 80 to 90% of the time when I read a paper about a new algorithm, it doesn't

give me enough information as to whether

462

:

I should now start using this algorithm to solve my problem.

463

:

And there is a, so what does that mean?

464

:

That means that usually you need to somehow implement that algorithm and test it yourself

on your own problem, and that's fine, but I think that a lot of these algorithms out there

465

:

are not yet battle tested.

466

:

And we're kind of in a situation where, okay, we,

467

:

maybe we like the prototype and maybe it's promising, do we put in the developer time to

build this in Stan?

468

:

And it's a bit of a cycle because once it appears in Stan, then it really gets battle

tested.

469

:

And then we get feedback from the community and we can try to learn things about this

algorithm, we can try to improve it.

470

:

That's actually what happened to the No-U-Turn Sampler, which has evolved since its

original inception.

471

:

You know, I'm of the opinion that,

472

:

My bar for scientific papers is it presents a good idea and it's thought stimulating.

473

:

But I don't think it tells me this is the next thing we should build in Stan.

474

:

I think BridgeStan can alleviate some of that because it makes it easier for people to

build implementations that can then be tested in Stan and then we kind of get into battle

475

:

testing things.

476

:

Maybe someone builds a Python package

477

:

that is compatible with BridgeStan and maybe the process becomes instead of the Stan

developers, the Stan community, brutally evaluating an algorithm before deciding to put

478

:

in some amount of work, maybe first this package gets used and it's developed by algorithm

developers.

479

:

But this...

480

:

This is the broader question of how do algorithms get developed, implemented, and adopted?

481

:

And I'll tell you what, another big criterion here is the simplicity of the algorithm.

482

:

That plays a huge role in whether an algorithm is adopted by developers, by users, or

not.

483

:

So the answer is I don't know.

484

:

Yeah, that's always a fine answer.

485

:

Any questions?

486

:

I'm going to bring one up for my neighbor.

487

:

Wait. Perfect.

488

:

We needed the mic.

489

:

So what do we do about algorithms that are good for specific situations but not good for

other things?

490

:

Like so far we've only developed like black box algorithms that we kind of hope work

everywhere.

491

:

We don't have any kind of real specific algorithms for anything.

492

:

Is there any future for that?

493

:

I mean, this is...

494

:

I think this is one advantage, so I'm gonna quote the person who just asked the question,

but one thing Bob has said a lot is the reason we don't wanna just put 30 samplers into

495

:

Stan is then a lot of practitioners would try all 30 of them and then just report the,

there's an advantage to sort of being a great filter and being very conservative in what

496

:

is actually in Stan.

497

:

But I do think this is one advantage to making it easier to broaden the ecosystem where

now I think a future for that kind of

498

:

algorithm is in an R package or a Python package that can interface with, there are now

existing examples out there of an implementation of an algorithm that has support for Stan

499

:

models and PyMC models.

500

:

So it can kind of bridge gaps between communities, also sort of, if you have to install a

separate package, that makes it fairly clear that this is for a separate purpose.

501

:

And so I think that's what I would say the future is for those.

502

:

Yeah, I agree.

503

:

Do you have an intuition how easy it is for the Stan compiler to figure out whether a

model is generative and then to be able to sample from it?

504

:

I mean, of course we can do it in generated quantities, but it's always awkward to double

code our models.

505

:

This is a question that also sort of does expose a bit of my sort of not traditional

statistics background, is that I have never been presented with a definition of like,

506

:

generative or graphical model that is precise enough for me to actually answer this

question.

507

:

I think that there are definitely easy cases and hard cases.

508

:

I suspect that in general it would be impossible, but it's also, I think it's probably

likely that we could have a system where it tries really hard and then if it doesn't

509

:

succeed in a minute it gives up or something like that.

510

:

There are all these sorts of tricks in the compiler world, but I think that the...

511

:

This is another one of these things, kind of like GPU support, that because you can write

basically anything you want, you can also write sort of the worst possible case for this

512

:

kind of automated analysis.

513

:

An open question I've had for a long time is like, what percentage of Stan models in the

wild are generative or not?

514

:

If that number just naturally is 80, 90%, I think then this is like a very fruitful thing.

515

:

But if it's like 60, I don't know.

516

:

less, I'm not sure.

517

:

That's been what I've heard is that it is more like, it is fairly high, yeah, I think it

would be something that's worth looking into, but I would need some handholding on the

518

:

statistical modeling side of that, actually.

519

:

Sorry, I shouldn't call on people.

520

:

Hi, so I have a question about more on the people trying to implement models in Stan.

521

:

And say there's a model and it's just, you know, it's taking a very long time.

522

:

And people think, well, Stan, you know, they might have some complaints or say it's too

slow.

523

:

But what I found in practice also is that it's sometimes not clear to me what parts of my model are

causing the delay.

524

:

So what are the slow bits or?

525

:

It can either just be like mathematically this is just harder to estimate or there's some

shape of my posterior that's really harder to navigate.

526

:

But I don't really get that feedback unless I'm like fixing certain parameters, toying

with other things.

527

:

Is there any way to, you know, give that feedback of what's causing some issues?

528

:

Have you ever thought about modeling that?

529

:

Sorry.

530

:

So I remember maybe a year ago, I was actually, I met Andrew Gelman and Mitzi Morris in

Paris at a cafe.

531

:

We just all so happened to be in Paris.

532

:

And we started brainstorming.

533

:

We had an idea of a research project, which is how much can you learn about your model and

your sampler by running 20 iterations of HMC?

534

:

And the idea that, you know, fail fast, learn fast, that, you know, the early iterations

of a Bayesian workflow should be based on that.

535

:

And I think that a lot of the statistics literature and the more formal literature, you

know, kind of imagines that, you know, you've done a really good job fitting your model,

536

:

you've thrown a lot of computation, you've waited a long time.

537

:

And we want to figure out, you know, what are the lessons that you can learn quickly,

right?

538

:

So now,

539

:

I can talk a little bit from experience and I can give you that, but we kind of want to

make that also part of the workflow and your early iterations that we can learn with fast

540

:

approximation.

541

:

And then hopefully we'll have a good answer to your question.

542

:

There's also a tool for instrumentation.

543

:

Yeah, I was gonna say, in the immediate sense, there is the ability to profile Stan models.

544

:

You can write a block that starts with the word profile and then a name, and then you can

turn that on when you're running it, and it will give you a printout of like, the block

545

:

named X took this percentage of the time, the block named Y took that percentage, and it

can help you identify at least like, here's the bad line.
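
For reference, a sketch of what such profile blocks look like. The Stan program is a made-up linear regression, and the block names "priors" and "likelihood" are arbitrary labels. The surrounding Python only writes the program to a temporary file so it can be handed to whichever Stan interface is in use. How the timing report is retrieved depends on that interface, so that step is left out rather than guessed.

```python
import tempfile
from pathlib import Path

# Illustrative Stan program: each profile("name") { ... } block is timed separately.
STAN_PROGRAM = """
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  profile("priors") {
    alpha ~ normal(0, 1);
    beta ~ normal(0, 1);
    sigma ~ exponential(1);
  }
  profile("likelihood") {
    y ~ normal(alpha + beta * x, sigma);
  }
}
"""

stan_path = Path(tempfile.mkdtemp()) / "profiled_regression.stan"
stan_path.write_text(STAN_PROGRAM)
print(f"wrote {stan_path}")
```

Wrapping progressively smaller pieces of the model block in their own named profiles is one way to narrow down which statement dominates the run time.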

546

:

Now, it might not help you figure out what you need to do instead.

547

:

But that's where I found that there are some real wizards who live on the Stan Forums,

some of whom are in the room and some of whom are completely anonymous and we will never meet

548

:

them.

549

:

But they're super helpful.

550

:

If it's a model that you can share, that you can share a snippet of, there is a lot of

human capital.

551

:

Yeah, automating that and putting that into documentation is an ongoing thing.

552

:

Yeah, I mean, plus one to the human capital.

553

:

And the contributions of everyone here who comes to this conference, who teaches

tutorials, who demonstrates

554

:

their models, who shares the documentation, who makes their code open source.

555

:

I think that's also one of the things that makes a programming language work.

556

:

Time for one last question.

557

:

So I was thinking, if you go back some decades, 50, 60 years or 48, if you develop a

model, then you have to develop a way to sample from the posterior and stuff like that.

558

:

But maybe fast forward to today and maybe my advisor could be thinking, when I was a boy,

I had to write my own sampler.

559

:

Now you can have people that can be designing models or new ways to model, observe data,

but they maybe don't have to think too much about that computational side.

560

:

So what do you think about the effect of Stan and similar languages on opening up this

research in Bayesian modeling to people who maybe are not numerical analysts or stuff like

561

:

that.

562

:

I think you should bring your advisor to StanCon.

563

:

Yeah, so...

564

:

One way to think about this question is to think about how old Hamiltonian Monte Carlo is.

565

:

So the original paper is from 1987.

566

:

And yet it was largely unused by the broader scientific community until Stan came out.

567

:

And what were the technologies, technological developments that enabled Stan to make

Hamiltonian Monte Carlo

568

:

the workhorse of so many scientists.

569

:

I think that's something worth thinking about.

570

:

Though I should say the one exception, the one person who did use HMC through the 90s and

:

571

:

But otherwise, the tuning parameters, the control parameters, the requirement to calculate

gradients, that was an obstacle to many people.

572

:

And so instead of using HMC, they're using other samplers, which we know perform

573

:

between less well and dramatically less well in many cases.

574

:

So I think it's great that we have these black box methods.

575

:

But the one nuance that I will say is that the algorithm is not the only thing that's

black boxified in Stan.

576

:

The diagnostics, the warning messages, the generation of those things, the fact that these

things are generated automatically.

577

:

That's what makes a black box algorithm reliable.

578

:

It was the derivatives too.

579

:

There wasn't a good autodiff system when we built Stan.

580

:

I mentioned gradients, no?

581

:

I'll caveat this a bit with the previous question hints at the fact that these things are

never truly black box.

582

:

Because when you're facing performance difficulties, when you're at the edge, you do need

to have a fairly sophisticated understanding of what's happening.

583

:

If you ever have used the reduce_sum function in Stan, that is technically like an

implementation detail

584

:

that you are having to exploit to get the speed you need.

585

:

And so there's always a fuzzy boundary here, but I think that it does help lower the

barrier to entry, even if the hypothetical ceiling can stay as high as your imagination.

586

:

That's true.

587

:

We could be more black box.

588

:

That's seriously, huh?

589

:

I think that people do tweak and manipulate the methods a lot, and they need to understand

some fundamental concepts.

590

:

Awesome.

591

:

Well, I think we're good.

592

:

Thank you so much, folks, for being part of the first live show.

593

:

This has been another episode of Learning Bayesian Statistics.

594

:

Be sure to rate, review, and follow the show on your favorite podcatcher, and visit

learnbayestats.com for more resources about today's topics, as well as access to more

595

:

episodes to help you reach true Bayesian state of mind.

596

:

That's learnbayestats.com.

597

:

Our theme music is Good Bayesian by Baba Brinkman.

598

:

feat. MC Lars and Mega Ran.

599

:

Check out his awesome work at bababrinkman.com.

600

:

I'm your host.

601

:

Alex Andorra.

602

:

You can follow me on Twitter at Alex underscore Andorra like the country.

603

:

You can support the show and unlock exclusive benefits by visiting Patreon.com slash

LearnBayesStats.

604

:

Thank you so much for listening and for your support.

605

:

You're truly a good Bayesian.

606

:

Change your predictions after taking information in and if you're thinking I'll be less

than amazing.

607

:

Let's adjust those expectations.

608

:

Let me show you how to be a good Bayesian Change calculations after taking fresh data in

Those predictions that your brain is making Let's get them on a solid foundation
