Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!
Visit our Patreon page to unlock exclusive Bayesian swag ;)
Chapters:
00:00 Introduction to the Live Episode
02:55 Meet the Stan Core Developers
05:47 Brian Ward's Journey into Bayesian Statistics
09:10 Charles Margossian's Contributions to Stan
11:49 Recent Projects and Innovations in Stan
15:07 User-Friendly Features and Enhancements
18:11 Understanding Tuples and Their Importance
21:06 Challenges for Beginners in Stan
24:08 Pedagogical Approaches to Bayesian Statistics
30:54 Optimizing Monte Carlo Estimators
32:24 Reimagining Stan's Structure
34:21 The Promise of Automatic Reparameterization
35:49 Exploring BridgeStan
40:29 The Future of Samplers in Stan
43:45 Evaluating New Algorithms
47:01 Specific Algorithms for Unique Problems
50:00 Understanding Model Performance
54:21 The Impact of Stan on Bayesian Research
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan, Francesco Madrisotti, Ivy Huang, Gary Clarke and Robert Flannery.
Transcript
This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.
This episode is the first of its kind.
2
:Welcome to the very first live episode of the Learning Bayesian Statistics podcast recorded
,:
3
:Again, I want to thank the whole StanCon committee for their help, trust and support in
organizing this event.
4
:I surely had a blast and I hope
5
:Everybody did.
6
:In this episode, you will hear from not one, but two Stan core developers, Charles
Margossian and Brian Ward.
7
:They'll tell us all about Stan's future as well as give us some practical advice for
better statistical modeling.
8
:And of course, there is a Q&A session with the audience at the end.
9
:This is Learning Bayesian Statistics, episode 118.
10
:Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods,
the projects, and the people who make it possible.
11
:I'm your host, Alex Andorra.
12
:You can follow me on Twitter at alex_andorra,
13
:like the country.
14
:For any info about the show, learnbayesstats.com is Laplace to be.
15
:Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on
Patreon, everything is in there.
16
:That's learnbayesstats.com.
17
:If you're interested in one-on-one mentorship, online courses, or statistical consulting,
feel free to reach out and book a call at topmate.io/alex_andorra.
18
:See you around, folks.
19
:And best Bayesian wishes to you all.
20
:And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can
help bring them to life.
21
:Check us out at pymc-labs.com.
22
:Hello my dear Bayesians, today I want to welcome a new patron in the Learn Bayes Stats
family.
23
:Thank you so much, Rob Flannery, your support truly makes this show possible.
24
:I can't wait to talk to you in the Slack channel and hope that you will enjoy the
exclusive merch coming your way very soon.
25
:Before we start, I have great news for you.
26
:Because if you like live shows, I will have two new live shows of LBS coming up on
November 7 and November 8 at PyData New York.
27
:So if you want to be part of the live experience, join the Q&As and connect with the
speakers and myself, and also get some pretty cool stickers, well...
28
:You can get your ticket already at pydata.org/nyc2024.
29
:I can't wait to see you there.
30
:OK, on to the show now.
31
:So, welcome.
32
:Thank you so much for being here.
33
:You are going to have the immense honor and privilege to be the first ever live audience of the
Learning Bayesian Statistics podcast.
34
:Thank you.
35
:Of course, as usual, a huge thank you to all the organizers of StanCon.
36
:Charles, of course, thank you so much.
37
:I know you worked a lot.
38
:Michael also who organized all of that.
39
:So I think you can give them a big round of applause.
40
:Okay, so let's get started.
41
:So for those of you who don't know me, I'm Alex Andorra.
42
:I am an open source developer.
43
:I am actually a PyMC core developer.
44
:Am I allowed to say those words here?
45
:That's fine.
46
:Don't worry.
47
:Yes, and very recently started as the senior applied scientist at the Miami Marlins.
48
:So if you're ever in Miami, let me know.
49
:And today we are gonna talk, and yeah, no, of course I am the host and creator of the
Learning Bayesian Statistics podcast, which is the best show about Bayesian stats.
50
:I think we can say that confidently because it's the only one.
51
:So it's not that hard.
52
:But today we have amazing guests with us.
53
:We're gonna talk about everything Stan, today's the nerd panel.
54
:Anything you wanted to know about Stan, about samplers, about all the technical stuff
behind Stan.
55
:Why does it take so long to have INLA in there, for instance, you know, stuff like that.
56
:You can ask that.
57
:It's going to be like the last 10 minutes of the show, I think.
58
:But before that, we're going to talk with Brian and Charles.
59
:So I'm going to be without the mic that gives to the room for the rest of the show so that
you can hear from the guys mainly.
60
:So let's start with Brian.
61
:So Brian Ward, you are a Stan core developer, if I understood correctly.
62
:Can you first give us a bit of background, the origin story of Brian?
63
:How did you end up doing what you're doing?
64
:Because it seems to me that you're doing a lot of
65
:software engineering, which is a priori quite far from the Bayesian world.
66
:So how did you end up doing what you're doing today?
67
:Yeah, so I majored in computer science and I sort of came into this from a very software
development angle.
68
:So I sort of was always interested in how things work.
69
:So I learned to program and then I was like, well, how do programming languages work?
70
:So I learned about compilers and then I stopped before going any deeper because there are
dragons down there.
71
:But as part of my studies, I started working on a project with a couple of my professors
that was about Stan.
72
:And they were mostly interested in Stan because in their words, it was the probabilistic
programming language that had the most thorough formal documentation of the language and
73
:its semantics.
74
:They really liked that they could form an abstract model of the Stan language.
75
:And so that was my first time ever using a probabilistic programming language.
76
:It was really coming in from that angle.
77
:And then since 2021, I've been working a lot on the Stan compiler, but then also just on,
like you said, general software engineering for the different Python libraries and trying
78
:to improve the installation process on systems like Windows and that sort of thing.
79
:OK.
80
:So we'll get back to that because I think there are a lot of interesting threads here.
81
:But first, let's switch to Charles.
82
:So maybe for the rest of the audience, Charles was already
83
:on the podcast; he's got the classic episode.
84
:So if you're really interested in Charles' background, you can go and check out his
episode.
85
:But maybe just for now, if you can quickly tell us who you are, how you ended up doing
that.
86
:Yes, I should mention that I am an understudy.
87
:There were actually two other Stan developers we were hoping to have on this panel.
88
:Because of circumstances, I ended up being here.
89
:I'm in very good company and I have a lot of thoughts about the future of Stan, which is
the topic of this conversation.
90
:But essentially, I've been a Stan developer for eight years now.
91
:And I started when I was working in biotech in pharmacometrics where Stan was up and
coming, but it lacked certain features to be used in pharmacometrics modeling.
92
:Notably, you know, support for ODE systems, features to model clinical trials.
93
:So my first project for Stan was developing an extension of Stan called Torsten, but also
in the process developed some features that directly appeared in Stan.
94
:For example, the matrix exponential, which is used to solve linear ODEs, the algebraic
solvers.
95
:And then,
96
:I became a statistician, I pursued a PhD in statistics and I continued developing certain
features for Stan, kind of in that theme of implicit functions.
97
:And I think we'll talk a little bit about that.
98
:Nowadays, what I am is a research fellow, which is a glorified postdoc at the Flatiron
Institute, where I'm actually a colleague with Brian.
99
:And I mostly do research.
100
:around Bayesian computation, so that includes Markov chain Monte Carlo, variational
inference, and thinking about probabilistic programming languages today, tomorrow, but
101
:also maybe in five or 10 years, what these might look like.
102
:Yeah, thanks, Charles.
103
:Quick legal announcement that I forgot, of course.
104
:For the questions, we're going to record your voice.
105
:So if you ask a question, you're
106
:consenting to being recorded.
107
:If you don't want your voice to be recorded, just come ask the question afterwards or find
a buddy who is willing to ask the question for you.
108
:And that will be all fine.
109
:So that's that.
110
:Also, write down your questions because we're going to have the Q &A at the end of the
episode.
111
:So let's continue.
112
:Maybe with like that's for both of you.
113
:I'm wondering before we talk about the future,
114
:You guys work with Stan all the time, so you do a lot of things, but what has been your
most exciting recent project involving Stan, of course?
115
:I can go first.
116
:So this is a bit further ago, but one of the first real major, major wins for me was adding
tuples to the language.
117
:It's a slightly more advanced type than had previously appeared in Stan.
118
:It had a lot of implementation difficulty, but it was a really big change to the language
and the compiler that finally made it in.
119
:But more recently, working directly on Stan, I've
120
:been trying to add features to make it easier to do some of the things that are
built into Stan, especially related to the constraints and the transforms directly in
121
:Stan.
122
:So trying to take some of the magic that's built in out and let you be able to do things
yourself that work much closer to that.
123
:And that's been interesting to think about how to make Stan a language that is easier to
extend for newer people.
124
:This next release will have
125
:functions that make it a little easier to write your own user-defined transforms that do
the right thing during optimization, for example.
126
:Hmm, okay.
127
:That's cool.
128
:Can you maybe give an example about such a function that people could use in a model?
129
:Sure.
130
:So one thing you might want to do is you might want a simplex parameter, but you want,
because you have some understanding of the posterior geometry, you want an alternative
131
:parameterization.
132
:You want to use softmax or you want to use some other thing than what's built into Stan.
133
:And you can do this right now and it will work almost the same in almost all of the cases.
134
:Going forward, we're trying to make it work the same in all of the cases.
135
:We're trying to sort of cover off those last things.
136
:In particular, if you're finding a maximum likelihood estimate, that is done without the
Jacobian adjustment for the change of variables
137
:for the built-in types in Stan, but right now there's no way to have that also happen
for your custom transforms.
138
:But there will be going forward.
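To make the point about the Jacobian adjustment concrete, here is a minimal sketch in plain Python rather than Stan. The Gamma(3, 1) target, the log transform, and all function names are assumptions chosen for illustration; none of this is Stan's implementation. It shows that the maximum found on the unconstrained scale depends on whether the log-Jacobian term is included, which is why an optimizer and a sampler need to treat a custom transform differently.

```python
import math

def log_density_sigma(sigma: float) -> float:
    """Unnormalized log density of a Gamma(shape=3, rate=1) on sigma > 0."""
    return 2.0 * math.log(sigma) - sigma

def log_density_u(u: float, jacobian: bool) -> float:
    """The same density written on the unconstrained scale u = log(sigma)."""
    sigma = math.exp(u)
    lp = log_density_sigma(sigma)
    if jacobian:
        # Change of variables: log |d sigma / d u| = log(exp(u)) = u.
        lp += u
    return lp

def argmax_on_grid(f, lo: float, hi: float, n: int = 60001) -> float:
    """Crude grid search, sufficient for a one-dimensional illustration."""
    step = (hi - lo) / (n - 1)
    return max((lo + i * step for i in range(n)), key=f)

# Maximizer mapped back to the constrained scale, with and without the Jacobian.
sigma_with = math.exp(argmax_on_grid(lambda u: log_density_u(u, True), -3.0, 3.0))
sigma_without = math.exp(argmax_on_grid(lambda u: log_density_u(u, False), -3.0, 3.0))
```

Without the Jacobian term the optimizer recovers the mode of the density on the original scale (close to 2 here); with it, the maximizer moves (close to 3 here), which is the right target for sampling on the unconstrained scale but not for a maximum likelihood style estimate.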
139
:Okay, that's really cool.
140
:So I have to admit that a lot of my recent work has been more Stan-adjacent rather than
specific contributions to Stan.
141
:And so I could talk about that, but maybe one of the features that we are hoping to
release soon and that I developed a few years ago, I prototyped a few years ago, was we
142
:wanted to build a nested Laplace approximation inside of Stan.
143
:And actually, we developed one and we had a prototype in 2020.
144
:So that already goes back and we published a paper about that.
145
:And then another year or two later when I wrote my PhD thesis, I had a more thorough
prototype that was also released, and then we kind of got stuck.
146
:And I can talk a little bit about that, but essentially Steve Bronder, who was supposed to
join us today but had something come up (hopefully he'll be there in the next few days
147
:at StanCon), has really been pushing the C++ code and the development, and we have this idea
that maybe by the next Stan release we'll actually have that integrated Laplace
148
:approximation and we'll make it available to the users.
149
:And of course there are a lot of interesting things in moving parts that are happening
around these features both from a technical
150
:point of view.
151
:So the automatic differentiation that we had to deploy is, I think, very interesting, very
challenging.
152
:Also, the ways in which, what are the features that we put in our integrated Laplace?
153
:So I don't think it's going to be as performant as the integrated Laplace approximation
that's implemented in INLA.
154
:And I can discuss a little bit what are some of the features we lacked, but we also
focused on what are some unique things that having this integrated Laplace approximation
155
:in Stan can give to the users in terms of modeling capabilities.
156
:And those are things I'm excited about.
157
:And there are going to be a few challenges about using these approximate algorithms, just
as there are whenever you use an approximate algorithm.
158
:And that's going to motivate, you know,
159
:new elements of a Bayesian workflow, new diagnostics, new checks that will have to be
semi-automated, that will have to be very well documented, and that will also need to be
160
:demonstrated.
161
:These are all the pieces you need for users to use an algorithm effectively.
162
:And that's part of the journey between
163
:We have a prototype.
164
:We can publish this in what's considered a top machine learning conference (the paper
appeared in NeurIPS), versus
165
:I can almost say we have something that's Stan-worthy.
166
:And the requirements are a little bit orthogonal.
167
:So it's not like one is superior, but there's a lot of extra work that needs to happen.
168
:And that will continue to happen.
169
:Because one of the, I think, open questions is when we make a new feature available, how
much responsibility
170
:do we take and how much responsibility do we give to the users?
171
:So maybe those are some of the topics that we can dive into.
172
:But one thing that I'll say is the tuples that Brian mentioned, that was one of the key
technical components that we needed to develop in order to have an interface that's
173
:user-friendly enough to use this integrated Laplace.
174
:Yeah, I love that because
175
:I don't know for you folks, but me, if I hear, yeah, we integrated tuples, I don't
think it's that important.
176
:But then when you talk to the guys who actually code the stuff and implement that, it's a
building block that then unlocks a ton of incredible features and new stuff for users.
177
:Yeah, and we can make that very, very concrete.
178
:Yeah, for sure.
179
:Actually, to give an example.
180
:Well, Brian, how would you define a tuple?
181
:So in type, no, I'm joking.
182
:So a tuple is essentially just a grouping of different types of things.
183
:So the simplest one to think of is like a point in R2, like a xy coordinate.
184
:It's just a tuple of a real number and another real number.
185
:But the nice thing about tuples as compared to like an array is that those don't have to
be the same type.
186
:So for example, in more recent versions of Stan,
187
:there is a function called eigendecompose which gives you a matrix of the eigenvectors
and a vector of the eigenvalues both back to you at the same time.
188
:And so this actually cuts the amount of computation that has to be done in half because in
previous versions you had to call the eigenvectors function and the eigenvalues function
189
:separately and they were repeating some work and now it can just give you this object that
has both at once.
190
:And so that's like.
191
:One of the really useful things of tuples is it lets you have a principled way to talk
about a combination of different types like that.
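As a rough analogy in Python (not Stan), the sketch below returns a matrix of eigenvectors and a vector of eigenvalues together as one tuple, so the shared work is done once. The function name `eigendecompose_sym_2x2` and the restriction to symmetric 2x2 matrices are assumptions made to keep the example self-contained; it is not the routine Stan uses.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]

def eigendecompose_sym_2x2(a: float, b: float, d: float) -> Tuple[Matrix, Vector]:
    """Eigendecomposition of the symmetric matrix [[a, b], [b, d]].

    Returns (eigenvectors, eigenvalues): two different types in one tuple.
    Eigenvalues are in ascending order; eigenvectors are the matrix columns.
    """
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    values = [mean - radius, mean + radius]
    if b == 0.0:
        # Already diagonal: the eigenvectors are the coordinate axes.
        columns = [[1.0, 0.0], [0.0, 1.0]] if a <= d else [[0.0, 1.0], [1.0, 0.0]]
    else:
        columns = []
        for lam in values:
            x, y = b, lam - a  # solves (a - lam) * x + b * y = 0
            norm = math.hypot(x, y)
            columns.append([x / norm, y / norm])
    vectors = [[columns[0][0], columns[1][0]],
               [columns[0][1], columns[1][1]]]
    return vectors, values

# One call, both results, unpacked from the tuple.
vectors, values = eigendecompose_sym_2x2(2.0, 1.0, 2.0)
```

An array could not hold this result, because a matrix and a vector are different types; the tuple is what lets a single function hand both back.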
192
:Yeah, yeah.
193
:And so one place where having this grouping of different types is very useful is in
functionals.
194
:So what's an example of a functional?
195
:The ODE solver in Stan, it's a functional.
196
:One of its arguments is a function, so the function that defines the right-hand side of
your differential equation.
197
:And then you need to pass.
198
:arguments to that function.
199
:And of course, the user is specifying the function, and so they're going to specify what
are the arguments that we pass to that function.
200
:There was this time where this function needed to have a strict signature.
201
:So we told the user, you're first going to pass the time, the state, then the parameters,
then the real data and the integer data.
202
:And you have the strict format.
203
:So basically, those are just a way of taking the arguments, packing them into a specific
structure, and then inside the ODE, you unpack them.
204
:And so not only was this tedious, it can lead you to make your code less efficient if
you're not being careful about distinguishing what's a parameter and what's a data point.
205
:And one experience of that
206
:I had was collaborating with applied people, with epidemiologists, so with Julien Riou.
207
:This was during the pandemic, during COVID-19.
208
:At some point, Julien reached out to the Stan development team and he said he's
developing this really cool model, but right now it takes two, three days to fit, right?
209
:Something like that.
210
:And we're not at the...
211
:level of complexity that we want to be at.
212
:And so I have to give really most of the credit to Ben Bales, who was also a Stan
developer at the time.
213
:And we took a look at how the ODE was implemented and how it was coded up and how the
different types were being handled.
214
:And we realized that way more of the arguments that were being passed were parameters than
was necessary.
215
:And once we corrected for that, the running time of the model went from two, three days to
two hours.
216
:So not only is that much faster and that's good in terms of reproducibility, that also
means you can then keep developing the model and go to something more complicated.
217
:So having this kind of tuples, well really what it gave us was variational, what's
called variadic arguments, sorry.
218
:That was a big step actually, where now you don't have those strict signatures when you
pass the functionals.
219
:People can really pass different things.
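The idea of a functional with variadic arguments can be sketched in Python as follows. This is an analogy only: the fixed-step Euler integrator, the function names, and the two toy right-hand sides are assumptions for illustration and do not mirror Stan's ODE solvers. The solver forwards whatever extra arguments it receives to the user's function, so each right-hand side declares exactly the arguments it needs instead of unpacking them from a fixed structure.

```python
from typing import Callable, List, Sequence

def solve_ode_euler(rhs: Callable[..., List[float]], y0: Sequence[float],
                    t0: float, t1: float, n_steps: int, *args) -> List[float]:
    """Fixed-step Euler integrator; everything in *args is forwarded to rhs."""
    h = (t1 - t0) / n_steps
    t, y = t0, list(y0)
    for _ in range(n_steps):
        dydt = rhs(t, y, *args)
        y = [yi + h * di for yi, di in zip(y, dydt)]
        t += h
    return y

def decay(t: float, y: Sequence[float], rate: float) -> List[float]:
    """dy/dt = -rate * y, with a single extra argument."""
    return [-rate * y[0]]

def decay_with_infusion(t: float, y: Sequence[float], rate: float,
                        infusion: float, t_stop: float) -> List[float]:
    """dy/dt = infusion - rate * y while t < t_stop, with three extra arguments."""
    inflow = infusion if t < t_stop else 0.0
    return [inflow - rate * y[0]]

# The same solver accepts right-hand sides with different signatures.
y_decay = solve_ode_euler(decay, [1.0], 0.0, 1.0, 10_000, 0.5)
y_infused = solve_ode_euler(decay_with_infusion, [0.0], 0.0, 1.0, 10_000, 0.5, 2.0, 10.0)
```

With named, separately typed arguments it is also easier to see which inputs are fixed data and which are parameters, which is the distinction that mattered for run time in the story above.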
220
:Now for the integrated Laplace, so I realize we haven't really defined what it is, but
basically what I'll say is that there are two functionals that you need to pass.
221
:One is you're defining a likelihood function and the other one is you're defining a
covariance function.
222
:And so we want the users to be able to use variadic arguments for both those functions
that they're defining.
223
:So they're not constrained by types.
224
:That way it's not tedious, it's not error prone, or it's not prone to inefficiencies.
225
:And that's why those tuples, to make the code user friendly, to probably decrease the
compute time that users will spend on this algorithm.
226
:That's why that kind of stuff is important.
227
:The power users, they don't need it.
228
:They can handle the strict signatures.
229
:I handle the strict signatures.
230
:No problem.
231
:But once you start using other probabilistic programming languages,
232
:You realize that one of the big strengths of Stan is the attention it gives to users, to
API, how mindful it is of the users.
233
:Other languages, you can tell that it really feels like sometimes they're written for
software engineers.
234
:And the software engineers are the ones who are going to be the best ones at using those
languages.
235
:But I think that that's one of the strengths of Stan.
236
:and that some of the innovations are maybe gonna be less technical or algorithmic,
although those exist, and maybe we'll have time to talk about it, but actually making this
237
:more user-friendly, less error-prone, less inefficiency-prone.
238
:Yeah, and that definitely comes up, and I think it will come up whenever we're working on
new features for Stan.
239
:There's always sort of two users we have in our head.
240
:There's the user who is already at the limit of what Stan can do and wants to fit the next
biggest model, and how can we help that user, but also the user of like, you know,
241
:they have a relatively small model that they just can't figure out right now and can we
make that user's life easier too?
242
:Sometimes they're actually sort of fighting each other, but usually we can find features that
actually make both of their lives better, which is like the ideal circumstance.
243
:But by the way, kind of in the spirit of that, apparently most of our Stan users are BRMS
users.
244
:I think that's established, right?
245
:BRMS really gives you this beautiful syntax that people can play with, that people can
reason with.
246
:Personally, I like the Stan language.
247
:That syntax is a bit more explicit.
248
:But even that syntax in the Stan model is a simplification of what Stan is doing under the
hood.
249
:I'll give you a simple example.
250
:You know those tilde statements that you have in the model block, right?
251
:That's because
252
:You know, people like Andrew Gelman like reasoning about models in a data-generating
fashion, right?
253
:But really, you know, what's going on under the hood is we're incrementing a log
probability density, right?
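A minimal Python sketch of that idea: each tilde statement in a model block corresponds to adding a log density term to a running total, up to constants that do not depend on the parameters. The toy model (a normal prior on `mu` and a normal likelihood) and the function names are assumptions for illustration, not code generated by Stan.

```python
import math
from typing import Sequence

def normal_lpdf(y: float, mu: float, sigma: float) -> float:
    """Log density of a normal distribution evaluated at y."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
            - 0.5 * ((y - mu) / sigma) ** 2)

def log_density(mu: float, sigma: float, ys: Sequence[float]) -> float:
    """What the sampler evaluates: an accumulated log probability density."""
    target = 0.0
    target += normal_lpdf(mu, 0.0, 10.0)      # reads as: mu ~ normal(0, 10)
    for y in ys:
        target += normal_lpdf(y, mu, sigma)   # reads as: y ~ normal(mu, sigma)
    return target

data = [0.5, 1.5]
near = log_density(1.0, 1.0, data)   # mu close to the data
far = log_density(5.0, 1.0, data)    # mu far from the data
```

The data-generating reading and the accumulated-density reading describe the same function; they are simply two levels of abstraction for different users.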
254
:So different users function with different levels of abstraction, depending on whether
they're statisticians or, you know, more software engineering, maybe ML-oriented people,
255
:or maybe
256
:scientists who primarily reason about covariates, right?
257
:That's where I see one of the big roles that BRMS is playing.
258
:And we need a way that's maintainable, that, you know, avoids compromises, you know, to
kind of like cater to these different users.
259
:And in fact, we should talk about BridgeStan and a new community of users we're hoping to
reach
260
:with Stan, maybe at some point.
261
:Yeah, I'll add that to the notes.
262
:Good, good.
263
:Yeah, so many questions.
264
:Thank you so much, guys.
265
:I think, yeah, something I'd like to pick up.
266
:We'll get back to INLA also at some point.
267
:I think it's going to be like the, how do you say, fil rouge in English?
268
:The thread.
269
:The thread, thank you.
270
:The red thread, you can say that.
271
:I don't know.
272
:So it's going to be the thread.
273
:Talking a bit more about the beginners you were talking about and the user who is trying
to get his model to work but cannot figure it out yet.
274
:Do you see a common difficulty that this kind of user is having lately, maybe in the
Stan forums, things like that?
275
:And maybe you can tell them how to use that right now or maybe tell us what you guys are
doing
276
:in the coming months to address that kind of obstacle.
277
:I think there are two, and they're sort of different.
278
:So I think for a lot of users who are coming from more traditional languages like R or Python and are
trying to write Stan themselves for the first time, the difficulty is just having a
279
:compiled language at all, both in terms of the extra installation steps, but then also
like dealing with static typing,
280
:if you're not used to sort of thinking about variables in this way.
281
:And so there are things we've talked about of trying to work on that, but a lot of what
I've invested in is just trying to improve the error messages the compiler gives you and
282
:trying to have them less be like what a compiler engineer knows went wrong and make it
more like what you think went wrong.
283
:But I think the second class that I see, and this is sort of going back to Charles's
point, is I think we have a lot of users who will use a tool like BRMS or rstanarm,
284
:and it will get them as far as it gets them and then they want to go a bit further.
285
:But I think the issue is if they've never written any Stan code at that point, they ask
BRMS, hey, can you give me your Stan code?
286
:And they're given this model that would have taken them several months to write themselves
and now they have no hope.
287
:They're starting off in the deep end already because they already have a very powerful
model that they just want to tune one bit further.
288
:And that's a much harder thing, both in terms of
289
:software and also pedagogically. I don't know how to handle that.
290
:I don't know if you have more.
291
:I think a bit less about beginners.
292
:No, no, okay, okay, so let me, let me nuance that a little bit.
293
:So I teach workshops, I've had opportunities to teach.
294
:And actually, I think about some fundamental questions that a beginner is likely to ask,
but for which we don't have great answers.
295
:And I'll give you one example.
296
:For how many iterations should we run Markov chain Monte Carlo?
297
:Right?
298
:That's an elementary question, and it's not an easy one to answer.
299
:Especially if you start digging and thinking about what is the optimal length of a Markov
chain?
300
:What is the optimal length of a warm-up phase, of a sampling phase?
301
:What is the number of Markov chains that I should run given some compute that's available
to me?
302
:And then you get into a more fundamental question, which is what is the precision that
people need from their Monte Carlo estimators?
303
:So I asked an audience of scientists, well, what effective sample size do you need?
304
:What summaries of the posterior distribution do you need?
305
:Are you really interested in the expectation value, or do you need the variance, or maybe
you need these quantiles or these other quantiles?
306
:And we have some unfortunate terminology.
307
:People say we're computing the posterior.
308
:That doesn't mean much.
309
:It conveys a good first order intuition, but not a good second order intuition.
310
:I like to say we're probing the posterior.
311
:And then we need to think about what are the properties of the posterior that we're
actually pursuing.
312
:And so then we get into, people ask me, when should I use MCMC or variational inference?
313
:So people criticize variational inference.
314
:They say, well, even when you solve the... So what does VI do?
315
:Maybe just as a summary:
316
:You have a family of approximations, for example, Gaussians.
317
:And then within that family of approximations, it tries to find the best approximation to
your posterior.
318
:And people will dismiss it because they say, look, even if you solve the optimization
problem, at the end of the day, your posterior is not a Gaussian.
319
:So your optimal solution is not good.
320
:It has what's called, what people call an asymptotic bias.
321
:Whereas with MCMC, you know that if you have enough compute power,
322
:and enough can be a lot, right?
323
:eventually you will hit arbitrary precision, right?
324
:But now if I think about, I'm trying to probe the posterior, well maybe that Gaussian
approximation does match the expectation value, does match the summary quantities that I'm
325
:interested in.
326
:Maybe it captures the variance, or maybe it captures the entropy, right?
327
:So maybe that is the pedagogical work that
328
:I'm trying to do for beginners with the caveat that I don't have great answers to all
those questions.
329
:I think these are real research topics.
330
:But if I think about one goal, for example, that I would like to achieve, I would like to,
I want it to be part of the workflow.
331
:People are doing work on that.
332
:Aki Vehtari is doing great work on that, to only name one person.
333
:Once people figure out this is how precise my Monte Carlo estimators need to be, I want
that to be the input to Stan.
334
:And then I want it to run the Markov chains for the right number of iterations in a way
that gives you that precision without wasting too much computational power.
335
:And we're not there yet.
336
:We have promising directions to do that, which also come with their fair share of
challenges.
337
:But yeah, that's the kind of thing I want to do for beginners and for intermediates and
for advanced and for myself.
338
:But yeah, the beginners ask the right questions and the difficult questions.
339
:Okay, thanks Charles.
340
:Nice save.
341
:No, so more seriously, yeah, Brian, I was wondering like, so if you had, let's say Stan
Ulam,
342
:He comes to you in a dream and he's like, okay, Brian, you've got one wish to make Stan
better for everybody, including the beginners, Charles.
343
:So what would it be?
344
:This is like a genie powerful wish.
345
:I can rewrite the history of the...
346
:Something that we've talked about again and again, but it would just be such a huge lift.
347
:But if I'm allowed to go back to the start, I think that...
348
:There's been a lot of talk about how the block structure of Stan gives a lot of power, but
it also makes a lot of things limiting.
349
:Right now if you want to do a prior predictive check, you oftentimes need a separate
model that looks a little different than the model you're actually writing.
350
:And this is one of the things that's great about BRMS, right, is the single formula can be
turned into all these models at once.
351
:But there has been previous research, so Maria Gorinova.
352
:She did a master's thesis and a PhD thesis on a tool she called SlicStan, which was a
Stan with no blocks.
353
:And so it sort of would automatically, you would write your Stan model as you do now, but
without saying what's data and what's parameters, and then you would just give it data,
354
:and it would then figure out, okay, these are the data, these are the parameters, here are
things I can move to generated quantities, and it would sort of be a much more powerful
355
:form of the compiler that would really capture a lot of these ideas, but it would also be
sort of a fundamentally different
356
:thing than Stan.
357
:If I could really do anything in the world, that would probably be it.
358
:But I don't know if that will ever make it there.
359
:There's a lot of existing stuff that we would have to give up, I think.
360
:Yeah.
361
:I understand.
362
:If you're interested, Maria Gorinova was on the podcast.
363
:You can go on the website, learnbayestats.com.
364
:There is a small stuff on the right.
365
:On the top, you can...
366
:look for any guests.
367
:So Maria Gorinova, that was a great episode because I think she's also working on
automatic reparameterization, if I remember correctly.
368
:So if you ever had to reparameterize a model, that can be quite frustrating if you're a
beginner because you're like, but it's the same model.
369
:I'm just doing that for the sampler.
370
:And so one of the goals of that is just having the sampler figure that out by itself.
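A minimal Python sketch of what "the same model, just rewritten for the sampler" can mean, using the familiar centered versus non-centered form of a normal prior. The function names and the values of `mu` and `tau` are illustrative assumptions, not anything produced by SlicStan or Stan.

```python
import random

def draw_centered(mu, tau, rng):
    # Centered form: theta ~ Normal(mu, tau), drawn directly.
    return rng.gauss(mu, tau)

def draw_non_centered(mu, tau, rng):
    # Non-centered form: draw a standard normal, then shift and scale.
    # theta has the same distribution, but a sampler would explore
    # theta_raw, whose scale does not depend on tau.
    theta_raw = rng.gauss(0.0, 1.0)
    return mu + tau * theta_raw

def mean_and_sd(xs):
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, var ** 0.5

rng = random.Random(1)
mu, tau = 2.0, 0.5  # illustrative values only
centered = [draw_centered(mu, tau, rng) for _ in range(20000)]
non_centered = [draw_non_centered(mu, tau, rng) for _ in range(20000)]

# Both summaries should sit close to (mu, tau).
print(mean_and_sd(centered))
print(mean_and_sd(non_centered))
```

The two forms describe one distribution; only the geometry the sampler sees changes, which is why doing this by hand can feel redundant to a beginner.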
371
:Yeah, and then she also did some interesting work on automatic marginalization where it's
tractable, which was very cool, because that's another, I don't feel confident in my own
372
:ability to marginalize a model off the top of my head, so it's like a, I know that's a
thing that new users hit a lot.
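To make "marginalizing a model" concrete, here is a small Python sketch for the simplest tractable case: summing a discrete component indicator out of a two-component normal mixture with log-sum-exp. The mixture weights, means, and scales are made-up illustrative values, and this is a hand-written example rather than the automatic approach discussed in the episode.

```python
import math

def normal_logpdf(y, mu, sigma):
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def log_sum_exp(values):
    # Numerically stable log(sum(exp(v))) over a list of log terms.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def marginal_loglik(y, weights, mus, sigmas):
    # log p(y) = log sum_k weights[k] * Normal(y | mus[k], sigmas[k]).
    # The discrete indicator z never appears as a parameter.
    terms = [math.log(w) + normal_logpdf(y, m, s)
             for w, m, s in zip(weights, mus, sigmas)]
    return log_sum_exp(terms)

weights = [0.3, 0.7]   # illustrative mixture weights
mus = [-1.0, 2.0]      # illustrative component means
sigmas = [1.0, 0.5]    # illustrative component scales
y = 0.4
lp = marginal_loglik(y, weights, mus, sigmas)
print(lp)
```

Because the result depends only on continuous quantities, a gradient-based sampler can work with it directly.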
373
:Yeah, yeah, yeah, I mean, you hit that quite a lot, and yeah, if we could automate that at
some point, that'd be absolutely fantastic, yeah.
374
:Charles, I think we've got nine minutes before the Q&As.
375
:So I'm going to give you choice.
376
:No, so we could go back to talk about INLA a bit, because I realize we should have done
something at the beginning, which is defining INLA and telling people why that would be
377
:useful and when.
378
:We can also talk about BridgeStan, but I think, Brian, you can talk about BridgeStan
too.
379
:So your call, Charles.
380
:Let's talk about BridgeStan.
381
:Or let's talk about BridgeStan.
382
:Let's see how fast I can do it.
383
:Maybe we can do both.
384
:Yes and yes.
385
:So Simon's talk earlier mentioned BridgeStan.
386
:And if people aren't familiar, this was something that Edward Roualdes, who's a Stan
developer, started a few years ago when he was visiting us in New York.
387
:It drives me crazy that I didn't think of this.
388
:Edward deserves so much credit because it was sitting there all this time, but what it
essentially does is it, through a lot of technical mumbo jumbo that you should ask me
389
:about later, it makes it very easy for people to use Stan models outside of Stan's C++
ecosystem.
390
:And so if you have a model in Stan, but you want to use a...
391
:like an algorithm that's only implemented in an R package or that you're developing
yourself, it really lets you get the log densities and the gradients with all of the speed
392
:and quality of the Stan Math library, but you can use these Python libraries or these like
experimental things that you're working on.
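As a rough illustration of what "getting the log densities and the gradients" buys an algorithm developer, here is a minimal Python sketch of a leapfrog integrator written against a model object that exposes only a log density and its gradient. `StandInModel` is a hypothetical stand-in (a standard normal), not BridgeStan itself; its `log_density_gradient` method is assumed to mirror the shape of the corresponding BridgeStan call, so the exact signature should be checked against the BridgeStan documentation before swapping in a real model.

```python
class StandInModel:
    """Hypothetical stand-in for a compiled model: a standard normal."""

    def __init__(self, dim):
        self.dim = dim

    def log_density_gradient(self, theta):
        # Returns (log density, gradient), up to an additive constant.
        lp = -0.5 * sum(t * t for t in theta)
        grad = [-t for t in theta]
        return lp, grad

def leapfrog(model, theta, momentum, step_size):
    # One leapfrog step: half momentum update, full position update,
    # half momentum update. Only the gradient interface is needed.
    _, grad = model.log_density_gradient(theta)
    momentum = [p + 0.5 * step_size * g for p, g in zip(momentum, grad)]
    theta = [t + step_size * p for t, p in zip(theta, momentum)]
    _, grad = model.log_density_gradient(theta)
    momentum = [p + 0.5 * step_size * g for p, g in zip(momentum, grad)]
    return theta, momentum

def hamiltonian(model, theta, momentum):
    lp, _ = model.log_density_gradient(theta)
    return -lp + 0.5 * sum(p * p for p in momentum)

model = StandInModel(dim=2)
theta, momentum = [1.0, -0.5], [0.3, 0.8]
h0 = hamiltonian(model, theta, momentum)
for _ in range(10):
    theta, momentum = leapfrog(model, theta, momentum, step_size=0.1)
h1 = hamiltonian(model, theta, momentum)

# The energy error should be small for a well-chosen step size.
print(abs(h1 - h0))
```

The integrator never needs to know how the model was written, which is the separation the speakers describe: the model supplies densities and gradients, and the algorithm can live in any language.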
393
:And so, we have a paper, and it has a few citations already from people who
have been using it to develop new algorithms and like I know a lot of work that Bob has
394
:been doing recently has been using it and so like that's one way we're, especially
395
:One of the things we're thinking of for those users who want to push the edge is new forms
of variational inference and new forms of HMC.
396
:And it has already been a really huge boon for that research.
397
:Yeah, yeah.
398
:At the Flatiron Institute, we do a lot of algorithmic work on new samplers and new
variational inference.
399
:And we now use BridgeStan all the time.
400
:I'll give you two good reasons, and there are probably more, but one of them is that it gives
us access to Stan's automatic differentiation and if you look at a lot of papers that
401
:evaluate the performance of algorithms they do it not against time but against number of
gradient evaluations because that tends to be the dominant operation computationally and
402
:so now you write your sampler in Python or
403
:maybe in R, or you write your VI in Python or in R, but you still get the high performance
from using Stan.
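Counting gradient evaluations instead of wall-clock time can be sketched in a few lines of Python. Everything here is an illustrative assumption: `CountingModel` is a hypothetical wrapper, `StandardNormal` is a toy target, and plain gradient ascent stands in for a real sampler or VI routine purely to keep the example short.

```python
class StandardNormal:
    """Toy target exposing a log density and gradient."""

    def log_density_gradient(self, theta):
        return -0.5 * sum(t * t for t in theta), [-t for t in theta]

class CountingModel:
    """Wraps any object with log_density_gradient and counts the calls."""

    def __init__(self, model):
        self.model = model
        self.gradient_evaluations = 0

    def log_density_gradient(self, theta):
        self.gradient_evaluations += 1
        return self.model.log_density_gradient(theta)

def gradient_ascent(model, theta, step_size, tolerance, max_steps=10000):
    # Stand-in algorithm: climb the log density until the gradient is tiny.
    for _ in range(max_steps):
        _, grad = model.log_density_gradient(theta)
        if max(abs(g) for g in grad) < tolerance:
            break
        theta = [t + step_size * g for t, g in zip(theta, grad)]
    return theta

counts = {}
for step_size in (0.1, 0.5):
    counted = CountingModel(StandardNormal())
    gradient_ascent(counted, [3.0, -2.0], step_size, tolerance=1e-6)
    counts[step_size] = counted.gradient_evaluations

# Fewer gradient evaluations means cheaper, independent of the machine used.
print(counts)
```

Because the wrapper only relies on the method name, the same counter could be placed around any model object with that interface.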
404
:So that's great.
405
:And then the second thing is that means that you can now test those new algorithms that
you've developed in a pretty straightforward way on Stan models and the library of Stan
406
:models, including posteriordb or maybe some other models that you've been using.
407
:And those models are very readable.
408
:It standardizes a little bit the testing framework.
409
:So it has changed my thinking a little bit as someone who works a lot on the Stan
compiler, thinking of Stan not just as its own sort of ecosystem, but also as like a
410
:language for communicating models.
411
:I find it really helpful.
412
:Someone can describe a model in LaTeX up on a slide, but as soon as they show me the Stan
code, I'm like, I get it.
413
:And even if my job now was to go implement it in PyMC or something, I think it still
helps.
414
:Having this language that is a little bit bigger than itself or a little bit bigger than
it used to be where now, I see Adrian here is in the audience and he has an implementation
415
:of HMC in Rust.
416
:But you can use Stan models with it because of BridgeStan.
417
:It has opened up the, sorry, Adrian's in the back.
418
:But it's opened up the world of things that Stan can be, which is one thing that I think
is very cool.
419
:Yeah, and I think, so when I spoke about the new community of users that I think we're
going to reach is there are people who write their own samplers who have particularly
420
:difficult problems.
421
:And even today, we've had two examples, at least two examples of people who departed from
the traditional samplers that are implemented in Stan, either to implement tempering or to
422
:implement massive parallelization.
423
:And so, you know, I really think that, you know, there is a group of people who for their
problems, you know, like to develop and try out certain samplers.
424
:And, you know, that's also going to drive research for what could be the next default
sampler or variational inference or approximation in Stan.
425
:They are candidates for that.
426
:Although it's true that the more we learn, the more we develop new samplers, the more we
realize how good NUTS is.
427
:But things are going to change over the years.
428
:OK, awesome.
429
:Thanks a lot, guys.
430
:So I still have a ton of questions.
431
:But already, let's open it up to the audience.
432
:Are there already any questions?
433
:Or should I ask one?
434
:OK, perfect.
435
:So, mentioning the new samplers that you guys are developing at the Flatiron and also I
have a lot of guests who come on the show and talk about new samplers, normalizing flows
436
:for instance, Marylou Gabrié was on the show, also Marvin Schmitt, Paul Bürkner is
here, he works a lot on BayesFlow with Marvin Schmitt.
437
:They are doing amortized Bayesian inference.
438
:So I'm really curious how you guys think about that and Stan, basically.
439
:Because most of the time, it's also tied to increasing data sizes.
440
:And so people are looking into new samplers which can adapt to their use case better.
441
:So I'm curious how you guys think about that in the Stan team and what you're thinking of
developing in the coming months about that.
442
:Yeah, I think one of the challenges that these approaches often, sort of one of the
motivating reasons for them is that you can get a wall clock time reduction by just
443
:throwing a massive amount of compute at it with GPUs, which is one place where...
444
:Stan's GPU support is still kind of piecemeal, like we're working on it, but it's sort of
like we can't compete with Google developing JAX, you know?
445
:And so like, you know, Simon's presentation earlier showed that like on CPU, Stan actually
beats JAX, or BridgeStan, you know, can be faster than JAX.
446
:But on GPU, we have sort of no hope.
447
:And I think that like, or at least at the moment, no hope.
448
:But I think that's where these approaches become really challenging is like trying to
think of.
449
:And I think it's sort of an almost existential question of like, is Stan just like the CPU
solution, right?
450
:And is something else better?
451
:Because there are things about Stan's like, sort of core design that don't like GPUs.
452
:It's a very expressive language and GPUs really like less expressive languages that are
much easier to guess what you're gonna do next.
453
:And so I think that is something that, you know,
454
:I personally believe there will always be sort of a community of like, you know, researchers
working on their laptop or that sort of thing.
455
:And so I think there will always be a place for these like CPU bound implementations.
456
:But yeah, if you can predict that, you can probably make a lot of money.
457
:Charles?
458
:Yeah, I'm going to try and return to the original question, which is, you know,
459
:So there are a lot of algorithms that are being developed and there are a lot of good ideas
that go into developing these algorithms and there are some good experiments and some good
460
:empirical evidence that supports why you might want to use those algorithms.
461
:Nonetheless, 80 to 90 % of the time when I read a paper about a new algorithm, it doesn't
give me enough information as to whether
462
:I should now start using this algorithm to solve my problem.
463
:And there is a, so what does that mean?
464
:That means that usually you need to somehow implement that algorithm and test it yourself
on your own problem, and that's fine, but I think that a lot of these algorithms out there
465
:are not yet battle tested.
466
:And we're kind of in a situation where, okay, we,
467
:maybe we like the prototype and maybe it's promising, do we put in the developer time to
build this in Stan?
468
:And it's a bit of a cycle because once it appears in Stan, then it really gets battle
tested.
469
:And then we get feedback from the community and we can try to learn things about this
algorithm, we can try to improve it.
470
:That's actually what happened to the No-U-Turn Sampler, which has evolved since its
original inception.
471
:You know, I'm of the opinion that,
472
:My bar for scientific papers is it presents a good idea and it's thought stimulating.
473
:But I don't think it tells me this is the next thing we should build in Stan.
474
:I think BridgeStan can alleviate some of that because it makes it easier for people to
build implementations that can then be tested in Stan and then we kind of get into battle
475
:testing things.
476
:Maybe someone builds a Python package
477
:that is compatible with BridgeStan and maybe the process becomes instead of the Stan
developers, the Stan community, brutally evaluating an algorithm before deciding to put
478
:some amount of work, maybe first this package gets used and it's developed by the algorithm
developers.
479
:But this...
480
:This is the broader question of how do algorithms get developed, implemented, and adopted?
481
:And I'll tell you what, another big criterion here is the simplicity of the algorithm.
482
:That plays a huge role into whether an algorithm is adopted by developers, by users, or
not.
483
:So the answer is I don't know.
484
:Yeah, that's always a fine answer.
485
:Any questions?
486
:I'm going to bring one up for my neighbor.
487
:Wait. Perfect.
488
:We needed the mic.
489
:So what do we do about algorithms that are good for specific situations but not good for
other things?
490
:Like so far we've only developed like black box algorithms that we kind of hope work
everywhere.
491
:We don't have any kind of real specific algorithms for anything.
492
:Is there any future for that?
493
:I mean, this is...
494
:I think this is one advantage, so I'm gonna quote the person who just asked the question,
but one thing Bob has said a lot is the reason we don't wanna just put 30 samplers into
495
:Stan is then a lot of practitioners would try all 30 of them and then just report the,
there's an advantage to sort of being a great filter and being very conservative in what
496
:is actually in Stan.
497
:But I do think this is one advantage to making it easier to broaden the ecosystem where
now I think a future for that kind of
498
:algorithm is in an R package or a Python package that can interface with, there are now
existing examples out there of an implementation of an algorithm that has support for Stan
499
:models and PyMC models.
500
:So it can kind of bridge gaps between communities, also sort of, if you have to install a
separate package, that makes it fairly clear that this is for a separate purpose.
501
:And so I think that's what I would say the future is for those.
502
:Yeah, I agree.
503
:Do you have an intuition how easy it is for the Stan compiler to figure out whether a
model is generative and then to be able to sample from it?
504
:I mean, of course we can do it in generated quantities, but it's always awkward to double
code our models.
505
:This is a question that also sort of does expose a bit of my sort of not traditional
statistics background, is that I have never been presented with a definition of like,
506
:generative or graphical model that is precise enough for me to actually answer this
question.
507
:I think that there are definitely easy cases and hard cases.
508
:I suspect that in general it would be impossible, but it's also, I think it's probably
likely that we could have a system where it tries really hard and then if it doesn't
509
:succeed in a minute it gives up or something like that.
510
:There are all these sorts of tricks in the compiler world, but I think that the...
511
:This is another one of these things, kind of like GPU support, that because you can write
basically anything you want, you can also write sort of the worst possible case for this
512
:kind of automated analysis.
513
:An open question I've had for a long time is like, what percentage of Stan models in the
wild are generative or not?
514
:If that number just naturally is 80, 90%, I think then this is like a very fruitful thing.
515
:But if it's like 60, I don't know.
516
:less, I'm not sure.
517
:That's been what I've heard is that it is more like, it is fairly high, yeah, I think it
would be something that's worth looking into, but I would need some handholding on the
518
:statistical modeling side of that, actually.
519
:Sorry, I shouldn't call on people.
520
:Hi, so I have a question about more on the people trying to implement models in Stan.
521
:And say there's a model and it's just, you know, it's taking a very long time.
522
:And people think, well, Stan, you know, they might have some complaints or say it's too
slow.
523
:But what I found in practice also is I'm never clear sometimes what parts of my model are
causing the delay.
524
:So what are the slow bits or?
525
:It can either just be like mathematically this is just harder to estimate or there's some
shape of my posterior that's really harder to navigate.
526
:But I don't really get that feedback unless I'm like fixing certain parameters, toying
with other things.
527
:Is there any way to allow, you know, give that feedback of, what's causing some issues?
528
:Have you ever thought about modeling that?
529
:Sorry.
530
:So I remember maybe a year ago, I was actually, I met Andrew Gelman and Mitzi Morris in
Paris at a cafe.
531
:We just all so happened to be in Paris.
532
:And we started brainstorming.
533
:We had an idea of a research project, which is how much can you learn about your model and
your sampler by running 20 iterations of HMC?
534
:And the idea that, you know, fail fast, learn fast, that, you know, the early iterations
of a Bayesian workflow should be based on that.
535
:And I think that a lot of the statistics literature and the more formal literature, you
know, kind of imagines that, you know, you've done a really good job fitting your model,
536
:you've thrown a lot of computation, you've waited a long time.
537
:And we want to figure out, you know, what are the lessons that you can learn quickly,
right?
538
:So now,
539
:I can talk a little bit from experience and I can give you that, but we kind of want to
make that also part of the workflow and your early iterations that we can learn with fast
540
:approximation.
541
:And then hopefully we'll have a good answer to your question.
542
:There's also a tool for instrumentation.
543
:Yeah, I was gonna say, in the immediate sense, there is the ability to profile Stan models.
544
:You can write a block that starts with the word profile and then a name, and then you can
turn that on when you're running it, and it will give you a printout of like, the block
545
:named X took this percentage of the time, the block named Y took that percentage, and it
can help you identify at least like, here's the bad line.
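In Stan itself, the block described here is written as `profile("name") { ... }`. The Python sketch below only illustrates the last step, turning per-block timings into the "block X took this percentage" readout. The block names and timings are invented for illustration and are not real profiler output; `profile_shares` is a hypothetical helper.

```python
def profile_shares(total_times):
    """Return (name, percent of summed time) pairs, largest share first."""
    grand_total = sum(total_times.values())
    shares = {name: 100.0 * t / grand_total for name, t in total_times.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)

# Hypothetical per-block timings in seconds (not measured from a real model).
timings = {"priors": 0.4, "likelihood": 7.2, "transforms": 0.4}

for name, pct in profile_shares(timings):
    print(f"{name}: {pct:.1f}%")
```

A readout like this points at the expensive block, though, as noted in the discussion, it does not say what to write instead.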
546
:Now, it might not help you figure out what you need to do instead.
547
:But that's where I found that there are some real wizards who live on the Stan Forums,
some of whom are in the room and some of whom are completely anonymous and we'll never meet
548
:them.
549
:But they're super helpful.
550
:If it's a model that you can share, that you can share a snippet of, there is a lot of
human capital.
551
:Yeah, automating that and putting that into documentation is an ongoing thing.
552
:Yeah, I mean, plus one to the human capital.
553
:And the contributions of everyone here who comes to this conference, who teaches
tutorials, who demonstrates
554
:their models, who shares the documentation, who makes their code open source.
555
:I think that's also one of the things that makes a programming language work.
556
:Time for one last question.
557
:So I was thinking, if you go back some decades, 50, 60 years, or 40, if you develop a
model, then you have to develop a way to sample from the posterior and stuff like that.
558
:But maybe fast forward to today and maybe my advisor could be thinking, when I was a boy,
I had to write my own sampler.
559
:Now you can have people that can be designing models or new ways to model, observe data,
but they maybe don't have to think too much about that computational side.
560
:So what do you think about the effect of Stan and similar languages on opening up this
research in Bayesian modeling to people who maybe are not numerical analysts or stuff like
561
:that.
562
:I think you should bring your advisor to StanCon.
563
:Yeah, so...
564
:One way to think about this question is to think about how old Hamiltonian Monte Carlo is.
565
:So the original paper is from 1987.
566
:And yet it was largely unused by the broader scientific community until Stan came out.
567
:And what were the technologies, technological developments that enabled Stan to make
Hamiltonian Monte Carlo
568
:the workhorse of so many scientists.
569
:I think that's something worth thinking about.
570
:Though I should say the one exception, the one person who did use HMC through the 90s and
:
571
:But otherwise, the tuning parameters, the control parameters, the requirement to calculate
gradients, that was an obstacle to many people.
572
:And so instead of using HMC, they're using other samplers, which we know perform.
573
:between less well and dramatically less well in many cases.
574
:So I think it's great that we have these black box methods.
575
:But the one nuance that I will say is that the algorithm is not the only thing that's
black-boxified in Stan.
576
:The diagnostics, the warning messages, the generation of those things, the fact that these
things are generated automatically.
577
:That's what makes a black box algorithm reliable.
578
:It was the derivatives too.
579
:There wasn't a good autodiff system when we built Stan.
580
:I mentioned gradients, no?
581
:I'll caveat this a bit with the previous question hints at the fact that these things are
never truly black box.
582
:Because when you're facing performance difficulties, when you're at the edge, you do need
to have a fairly sophisticated understanding of what's happening.
583
:If you ever have used the reduce_sum function in Stan, that is technically like an
implementation detail.
584
:that you are having to exploit to get the speed you need.
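The idea behind a sliced, reduce-style likelihood can be sketched in Python: split the data into chunks, compute a partial log-likelihood per chunk, and add the partials up. The helper names and the `grainsize` value are illustrative assumptions that loosely echo Stan's `reduce_sum`; this is not how Stan implements it, and Python threads here show the structure rather than a real speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def normal_logpdf(y, mu, sigma):
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def partial_sum(y_slice, mu, sigma):
    # Log-likelihood contribution of one slice of the data.
    return sum(normal_logpdf(y, mu, sigma) for y in y_slice)

def sliced_loglik(ys, mu, sigma, grainsize):
    # Cut the data into slices of at most `grainsize` observations,
    # evaluate each slice independently, then reduce by summation.
    slices = [ys[i:i + grainsize] for i in range(0, len(ys), grainsize)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda s: partial_sum(s, mu, sigma), slices))
    return sum(partials)

ys = [0.1 * i for i in range(100)]  # illustrative data
full = partial_sum(ys, 1.0, 2.0)
sliced = sliced_loglik(ys, 1.0, 2.0, grainsize=25)

# The two totals agree up to floating-point rounding.
print(full, sliced)
```

The model is unchanged by slicing; the user restructures the likelihood only so the work can be divided, which is why it reads as an implementation detail leaking into the model code.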
585
:And so there's always a fuzzy boundary here, but I think that it does help lower the
barrier to entry, even if the hypothetical ceiling can stay as high as your imagination.
586
:That's true.
587
:We could be more black box.
588
:That's seriously, huh?
589
:I think that people do tweak and manipulate the methods a lot, and they need to understand
some fundamental concepts.
590
:Awesome.
591
:Well, I think we're good.
592
:Thank you so much, folks, for being part of the first live show.
593
:This has been another episode of Learning Bayesian Statistics.
594
:Be sure to rate, review, and follow the show on your favorite podcatcher, and visit
learnbayestats.com for more resources about today's topics, as well as access to more
595
:episodes to help you reach true Bayesian state of mind.
596
:That's learnbayestats.com.
597
:Our theme music is Good Bayesian by Baba Brinkman.
598
:Feat. MC Lars and Mega Ran.
599
:Check out his awesome work at bababrinkman.com.
600
:I'm your host.
601
:Alex Andorra.
602
:You can follow me on Twitter at Alex underscore Andorra like the country.
603
:You can support the show and unlock exclusive benefits by visiting Patreon.com slash
LearnBayesStats.
604
:Thank you so much for listening and for your support.
605
:You're truly a good Bayesian.
606
:Change your predictions after taking information in and if you're thinking I'll be less
than amazing.
607
:Let's adjust those expectations.
608
:Let me show you how to be a good Bayesian Change calculations after taking fresh data in
Those predictions that your brain is making Let's get them on a solid foundation