Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!
Visit our Patreon page to unlock exclusive Bayesian swag ;)
Chapters:
00:00 Introduction to Osvaldo Martin and Bayesian Statistics
08:12 Exploring Bayesian Additive Regression Trees (BART)
18:45 Prior Elicitation and the PreliZ Package
29:56 Teaching Bayesian Statistics and Future Directions
45:59 Exploring Prior Predictive Distributions
52:08 Interactive Modeling with PreliZ
54:06 The Evolution of ArviZ
01:01:23 Advancements in ArviZ 1.0
01:06:20 Educational Initiatives in Bayesian Statistics
01:12:33 The Future of Bayesian Methods
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan, Francesco Madrisotti, Ivy Huang, Gary Clarke, Robert Flannery, Rasmus Hindström, Stefan, Corey Abshire and Mike Loncaric.
Links from the show:
Transcript
This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.
Today, I'm excited to host again the great Osvaldo Martin.

Osvaldo collaborates with numerous open source projects, including ArviZ, PyMC, Bambi, PyMC-BART and PreliZ, helping shape the tools that make Bayesian methods accessible and powerful. He's also an educator, teaching Bayesian statistics at UNSAM in Argentina, and a problem solver for businesses.

In this episode, Osvaldo shares his insights on Bayesian Additive Regression Trees, or BARTs, explaining their versatility as non-parametric models that are quick to use and effective for variable importance analysis. Osvaldo also introduces PreliZ, a package aimed at improving prior elicitation in Bayesian modeling, and explains how it enhances the Bayesian workflow, particularly in education. From interactive learning tools to future developments in PyMC-BART and ArviZ, Osvaldo highlights the importance of making Bayesian methods more intuitive and accessible for both students and practitioners.

It is always a pleasure to have Osvaldo on the show, because he does so many things and he's a dear friend. You will also hear from a special surprise guest during the episode, but I am not gonna spoil the surprise.

This is Learning Bayesian Statistics, Episode 123, recorded October 17, 2024.
Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the projects, and the people who make it possible. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. For any info about the show, learnbayesstats.com is the place to be: show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on Patreon, everything is in there. That's learnbayesstats.com. If you're interested in one-on-one mentorship, online courses, or statistical consulting, feel free to reach out and book a call at topmate.io/alex_andorra. See you around, folks, and best Bayesian wishes to you all. And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can help bring them to life. Check us out at pymc-labs.com.
Hello my dear Bayesians! Today I wanna thank Mike Loncaric and Corey Abshire for joining our Patreon in the Full Posterior tier or higher. Mike and Corey, thank you so much for your support. This truly makes the show possible: you know, I pay for editing with your contributions, I make sure you like what you're hearing and seeing, and I can improve the show and keep investing in it and in its quality. So thank you so much, and I can't wait to talk all stuff Bayes in the Slack channel with you. See you there guys, I hope you will enjoy your merch, and now on to the show.
Osvaldo Martin, welcome back to Learning Bayesian Statistics.

Thank you very much. It's a pleasure to be here.

Last time you were on the show, well...

Do we get t-shirts or some gift?

Yes, you're gonna get a bottle of wine, but when it's your 10th appearance, so... sounds like I'm, you know, inventing the rules, but... no, I'm not, I'm not. And so yeah, last time you were here was for episode 58, March 21st, 2022, and you came to talk with Ravin Kumar and Junpeng Lao about your new book at the time, Bayesian Modeling and Computation in Python. I guess we'll talk a bit about that again today. We'll reference it at least, because that's a really good book and I think it's still quite useful. Of course, the very first time you were here was the very first Learn Bayes Stats episode, episode 1, when, well, we talked about lots of things and your background in open source, in Bayes, in bioinformatics. So yeah, lots of things happened in the meantime. So I thought it would be good to have you back on the show, because you always do so many things. So, for the listeners, maybe tell us what you're doing nowadays and what's new, basically, in your professional life since you were last on the show.
54
:okay, let's try because I'm in a transition time so it's not super easy to set what I'm
doing at the moment.
55
:Well, last time I was here, I worked for CONNESET, the National Scientific and Technical
Council.
56
:I'm not to work there anymore.
57
:So I resigned.
58
:But I'm still teaching at the university.
59
:It's Universidad Nacional de San Martin.
60
:It's a university in Buenos Aires.
61
:They have a data science program.
62
:So I'm teaching their vision statistics.
63
:And the good thing is not the first course in patient statistics, it's actually the second
course.
64
:That's super weird for me and probably for many people that you are the second person
talking about patient statistics to students.
65
:like weird, but I am.
66
:So it's very interesting that you mentioned something and say, yeah, we already saw that.
67
:That's a new experience to me.
68
:I have been also doing some consulting for Pianzi Labs, doing some educational material
for intuitive base.
69
:Probably we're going to talk about that.
70
:And of course I keep doing work for open source.
71
:That is actually my main motivation to wake up every day.
72
:At least work wise, my main motivation.
73
:Yeah, I was gonna say I'm sure Romina, Arille and Bruno are gonna be happy to hear that.
74
:Yeah, So yeah, basically that's what I'm doing now.
Yeah, so quite a lot of things, as listeners might have understood. Well, I'll try to touch on a lot of those topics. One that I think is going to be very interesting to start with is BART, Bayesian Additive Regression Trees, because that's one of the things you've developed quite a lot between the publishing of the book in 2022 and now: you've actually spearheaded a sub-package of PyMC called PyMC-BART, dedicated to Bayesian Additive Regression Trees with PyMC. Maybe, what do you think? Should we... yeah, let's define very quickly what BART models are, and maybe why you split PyMC-BART off from the main PyMC package.

Okay. Bayesian Additive Regression Trees, BART, is a non-parametric model, a Bayesian model. And the main idea is that you are trying to approximate functions by summing, by adding, trees. The number of trees is something that you have to decide; usually people use something like 50, 100, 200 trees. The important thing is that the sum of those many trees gives you a single point, a single function. And then, because you're Bayesian, you are going to have a distribution, a distribution of sums of trees. And that's what PyMC-BART is computing.

The main motivation for PyMC-BART was not only bringing BART models to PyMC. It was also trying to make them really work for probabilistic programming languages. If you check the literature about BART, you're going to see that most of the time people discuss BART as standalone models. So you have a package that can fit one or two variations of a BART model, and then there is a new paper describing a new variation, and sometimes there is a package for that, and so on and so on. So it's very much this thing where you have one model, and then an inference method tailored to that model. What I wanted to do with PyMC-BART was have something that is completely independent, or at least try to do that. So, essentially, you can mix BART models with whatever you want. The main thing is probably that you want to switch between likelihoods. So if you want, you can have a normal likelihood, that's the typical thing, but you can have a gamma likelihood, or a negative binomial, whatever you need, and you can mix BART models with linear models, or maybe Gaussian processes, or whatever you want, in a single model. So that's the main motivation to create PyMC-BART.
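[For readers who want to see this in code, here's a minimal sketch of the idea, assuming pymc-bart is installed; the data and variable names are illustrative, not from the episode.]

```python
import numpy as np
import pymc as pm
import pymc_bart as pmb

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))           # toy covariates
y = rng.poisson(lam=np.exp(X[:, 0]))    # toy counts

with pm.Model() as model:
    # The BART "distribution": a sum of m trees over the covariates
    mu = pmb.BART("mu", X, np.log(y + 1), m=50)
    alpha = pm.HalfNormal("alpha", 1.0)
    # Swap in whatever likelihood the data calls for, here a negative binomial
    pm.NegativeBinomial("y_obs", mu=pm.math.exp(mu), alpha=alpha, observed=y)
    idata = pm.sample()  # PyMC assigns the tree-specific sampler automatically
```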
And one of the reasons to split it off was probably just a matter of organization, in a sense, because... I mean, essentially it's super easy to add BART models to PyMC, because you already use a lot of the technology, a lot of the tools, that are already implemented in PyMC. The things that you need to add, at a technical level, are a distribution, or something that works like a distribution (there's a BART distribution that behaves similarly to a gamma, a normal, whatever), and then a sampler, some way to get samples from a BART model. And the reason you need a specific method to get samples is that trees are discrete, or a kind of weird discrete thing, so you cannot use Hamiltonian Monte Carlo or methods like that. You need a special method, and sampling from trees can be tricky. But essentially, once you have those two elements, you can add it to PyMC and everything should work.

But then you say, okay, it's not enough. Maybe you want some extra functionality, like plots for diagnostics and plots for variable importance. That's something that you can do with BART models: you can not only fit a model, obtain a posterior, and make predictions, you can also try to analyze which of the variables in your model are more important than the others, which ones are relevant, that kind of thing. So that requires extra functionality, and that kind of functionality is too specific to have in a package as general as PyMC. So it makes sense to split it off. And at that time... actually, BART was first inside PyMC; I think for the book it was inside PyMC. Then we moved it to PyMC Experimental, the package that has methods that are not necessarily experimental in the sense that they don't work, but experimental in the sense that they are extra functionality. Then we decided, okay, maybe we need a specific package. So that's the current status: we have a specific package for that.
:And so to make sure listeners understand well, when
137
:Would you recommend using a BART model?
138
:A BART model?
139
:I would recommend a BART model when you are super lazy.
140
:I think that's best way to use a BART model.
141
:And the reason is that there is plenty of literature showing that you get good results
with BART models without putting too much thought on what you are doing.
142
:So in some sense they are competitive.
143
:against things like Gaussian processes or splines.
144
:Usually for those you need to think better.
145
:For instance, for spline you need to think, where I'm going to put the nodes or things
like that.
146
:For Gaussian processes, there are a of things that you can actually tweak.
147
:For BART, it's like usually there is not too many options.
148
:So you just...
149
:Define the likelihood, the prior or as I say if you do have something else like a linear
model or something like that you put together and you get something that is usually super
150
:reasonable That doesn't mean that you cannot get better usually if you put a little bit of
more thought you can do that and for that case Gaussian processes are excellent because
151
:you can mix and match different kernels and that kind of thing and that gives you more
interpretability a more custom model
152
:But I think BART excels when you want something relatively quick.
153
:Or if you don't have lot of domain knowledge to do the more tailored model.
154
:So that's probably a very good case.
155
:And I think another one is when you want to get information about variables, important
variables.
156
:That's also something that there are quite a few examples in the literature.
157
:when people, the main variable of analysis is to understand which variables are more
important.
158
:So maybe it's not that you want predictions, you want to know, you understand which
variables are the most relevant for your problem.
159
:And for that case is very good.
And the thing is that we have a particular method in PyMC-BART that I have not seen elsewhere, which makes variable importance a little easier to interpret. Essentially, the usual method to assess variable importance is to count: you fit a lot of trees, and the trees are usually very shallow, so they usually incorporate one, two, three, probably no more than that, covariates. So essentially you count how many times your first covariate is used, how many times the second, the third, etc. And then you plot that information as relative frequencies, and that's the importance. But it's not super easy to understand how to interpret that, because you get relative frequencies, so you don't know where you have to cut and say: this is important, this is not important. And there are some heuristics out there that try to help you with that; it's not that people have not thought about it. But I think we have something that is much easier in PyMC-BART. Essentially, what we do is compare: we generate predictions from submodels. So we generate predictions for the whole model, and then we prune the trees to generate submodels. Let's say we have a model with 10 covariates. That's our reference model, so we generate predictions from that. Then we simulate that we only have nine covariates and generate predictions from that, and so forth, until we have only one covariate. And essentially we plot that information, so you get a curve that saturates at some point. And that's much easier to interpret, because it's easy to spot the smallest submodel that generates the closest predictions to the reference model. Actually, that's an idea that we took from other places, the same thing that, for instance, projpred in R does, or Kulprit in Python. We differ in the way that we generate the submodels and compute the predictions, but the idea of comparing to a reference model, that's the key point, I think. So you generate predictions from a reference model and then compare the predictions of your submodels to that reference. And that's super cheap to do, because you don't need to refit: you only fit the BART model once, and afterwards we just approximate the submodels. So it's super cheap to do once you fit the BART model.
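[A sketch of how that looks with the model above; the exact call has shifted across pymc-bart versions, so treat the signature as approximate.]

```python
import pymc_bart as pmb

# Shows how close each pruned submodel's predictions get to the full
# (reference) model's predictions, without refitting anything
pmb.plot_variable_importance(idata, mu, X)
```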
Nice. Yeah, thanks for that great summary. I definitely recommend people check out the chapter of your book about BART. I'll put the link in the show notes, because you guys and your editor have been kind enough to make the online version free on the website. I'll put a link to that, and of course, if you are interested in the whole book, which I definitely recommend, I recommend buying a hard copy, because I often find myself referencing it when I forget something about BART, or the time series chapter, which is also very interesting, I really love it. I always forget the nuances of time series models, because there are a lot of things to tweak and remember. And the BART chapter also is really, really well written and explained, and very clear. So I definitely recommend that.

Yeah. We now also have a paper on arXiv. It's kind of a mix: written in a style that is close to a typical scientific paper, but with some elements more tailored to practitioners. So we try to provide some recommendations on how to choose the number of trees, and we explain what I just explained about variable selection and how to do it, that kind of thing. So that's also something that I think is easy to read.

You're talking about the chapter of the book, or is that something else?

No, I mean our arXiv paper. It's a paper on arXiv.

Okay, I see. So we definitely need to link to that too. I wasn't aware you had that. Yeah, definitely, we want that in the show notes. Feel free to add that to the document I'll share with you.

Also, one of the reasons I mentioned this arXiv paper is that, as you said, when we published the book, I think we had one of the earliest versions of BART models in PyMC. So we have changed... I mean, the theory of course is the same, and the examples still work, and we have actually updated the examples in the book. So if you go to the repository, you're going to see the examples updated to one of the newest versions of PyMC. So it's still worth reading. And we introduced a few changes in the API of PyMC-BART; we try to make it a little easier to use, more flexible.
Actually, now there is Gabriel Stechschulte. He's a core developer of Bambi, and he's doing a superb job of speeding up PyMC-BART: he is rewriting PyMC-BART in Rust. We followed some of the advice and experience from nutpie, which you have already discussed on the show. And we have seen a speedup of something like 10 times, so we are super happy. I think that's going to be ready by the end of this year, probably. December, maybe.

Okay, so maybe that will be out by the time we publish this episode. Yeah, that'd be great. Should I link to just the GitHub repo of PyMC-BART?

For now it's living in a separate repo, but it's going to be the same thing. So users should not notice anything, except that the models are going to run much faster. It should be super easy for them to just run PyMC-BART.

Nice! Well, awesome, that is really cool. Maybe we should get Gabriel on the show to talk about that, and also some Bambi stuff, because I know he does a lot of things on the Bambi side.

He's doing tremendous work for Bambi.

I didn't know he was doing that for PyMC-BART, but I'm happy to hear that. He's also a patron of the show, thank you. Triple thank you, Gabriel.
Yeah, one of the things that we have in PyMC-BART is this plot... dependence plot... that's the other P, I was missing one P: yeah, partial dependence plots. That's a cool way to add a little bit of interpretability to models. And I mention this because Gabriel has worked on something similar in Bambi, specifically tailored to linear models, or the models that are available in Bambi: the interpret submodule, which computes things that are kind of similar. And the point is: how can I try to better understand my model? And the answer is: make predictions from the model and make plots. Usually that's easier than trying to understand the parameters, even in linear models, and of course for non-parametric models like BART the fitted parameters don't make much sense, because they are branches of trees, things like that.
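[For reference, the call looks roughly like this, reusing mu, X and y from the earlier sketch.]

```python
import pymc_bart as pmb

# One panel per covariate, showing its marginal effect on the response
pmb.plot_pdp(mu, X=X, Y=y)
```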
Yeah. Yeah, I mean, of course I agree with that. My experience, personally, also working on models, is, yeah, really first making in-sample predictions, making plots, and then making out-of-sample predictions and making plots of them. I spend most of my time writing code so that the plots look good and I can understand them, much more than on the actual model code. I don't know about you, but yeah, most of the work is actually handling the shape issues when you're plotting, and then handling the shape issues when you're doing out-of-sample predictions.

Yeah. And that's one of the things Bambi tries to simplify a lot, because its interpret module has some summaries, numerical summaries, and also plots. And usually the plots are super useful just out of the box.

Yeah. Yeah, I mean, I use Bambi all the time, especially when I'm developing a model. Then, when I know the final structure of the model, and when I add some more complicated structure to it, especially time series, like structural time series, for instance, or a Gaussian random walk, which don't exist in Bambi, at least yet, then you have to go back to PyMC and write the full model. But I find that for the iterative process of just, you know, trying a bunch of stuff, it's really awesome to have Bambi, because it's really easy to iterate on the covariates. And then you have those plots, as you were saying, which just work out of the box. And the best is really the out-of-sample predictions, where you don't have to handle any of the shape issues. And that's absolutely fantastic.
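[To make that concrete, a hypothetical mini-loop in Bambi; the data frame and formula are made up, and keyword names can vary a bit between Bambi versions.]

```python
import bambi as bmb
import pandas as pd

df = pd.DataFrame({"y": y, "x0": X[:, 0], "x1": X[:, 1]})  # toy data from above

# Iterating on covariates is a one-line formula change
model = bmb.Model("y ~ x0 + x1", df, family="negativebinomial")
idata = model.fit()

# Out-of-sample predictions: hand it a new data frame, no shape handling
new = pd.DataFrame({"x0": [0.0, 1.0], "x1": [0.5, -0.5]})
model.predict(idata, data=new, kind="response")
```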
I don't think you can do BART with Bambi yet.

No, but that's one idea. That's something that we would like to have, because I think it fits very well into the Bambi ethos. As I say, many times you want BART as something quick to explore a model, something like a baseline. Sometimes it's enough; I mean, people actually use BART as their final model, there are a lot of papers using just BART. But even if you don't end up entirely happy with the result, then you can iterate and create something more flexible and more tailored. And I think that fits quite well into Bambi.

Yeah, yeah, yeah. Yeah, definitely. I mean, Gabriel, you're the person indicated to do that: working on BART, working on Bambi. I'm almost disappointed you haven't done that yet, Gabriel. What are you doing? No, kidding... yeah, I completely agree, that'd be fantastic. Actually, I'd be super happy to give a hand on that, because I think that'd be super fun. Also adding time series and state space methods and so on, that would be fantastic. We're gonna have Jesse Grabowski on the show in a few weeks, and he's been spearheading a whole submodule in PyMC Experimental doing state space methods. And yeah, I think it'd be great if Bambi could plug into that state space submodule, and then you'd have state space methods in Bambi. That'd be awesome.
So, since you're talking about Bambi and BART, I thought, you know what, we should bring in someone who actually knows the nitty gritty of Bambi. What do you say?

Did you already have Tomás on?

Yeah, but I mean right now, during the show, during the conversation.

Yeah, yeah. Okay.

Like, yes, you know... well, you probably know that during the first episode you recorded with me, Tomás, Tommy Capretto (he was on the show himself a few episodes ago, I'll put that episode in the show notes)... yeah, for that first episode, Tommy was actually in the office, like, just...

Listening to you?

Yeah. And just eavesdropping. So, you know, like I... then I think we should just... just invite him. What do you say?

Well, look at that! He didn't even ask for permission! Why do you laugh? What's happening here? Hi Tommy!

Yeah, thanks for the... yeah, yeah, I'm not at home, I'm on my motorcycle.

Yeah, so Tommy, now... so you don't have to eavesdrop anymore? You can just join the episode without even asking. That's awesome. Kidding aside, thanks for joining, Tommy. That's great to have you both here. I think you had a few questions for Osvaldo.

Yeah. So, to be honest, I don't know what you've been talking about. I didn't have the chance to listen this time, but I can imagine some of the topics.

Yeah, we were talking about Bambi, and Osvaldo was saying it was not that good.

I was saying, yeah, it's getting better. Yeah. Now it's going to become usable. Yeah, yeah, yeah. Some people find it useful.
I don't know, Osvaldo, if you talked about the teaching activities that you have this semester. Have you talked about that?

No, I briefly mentioned that I'm teaching at UNSAM. And I briefly mentioned that it's actually the second course in Bayesian statistics, so I'm super surprised that I'm not the first person talking to the students about Bayesian stats, or Bayes factors, or, I don't know, sampling from a posterior, that kind of thing.

Okay. So, since you mentioned that this is the second course: do you already have in mind what the third course would be? Or is it so specific that it's more like pick your own adventure and go deep into that?

I don't know, I think there are many options. One course that could work as a third course is something more based on solving problems. That could be an option: like a workshop where you work on problems and discuss them, and students share their findings and improvements, that kind of thing, so that you can put into practice a lot of the ideas of the workflow, of diagnostics, of going back and forth, and put things in context. That was something that I tried to do in this second course, but it's usually kind of difficult when the examples are simple, I don't know. This is difficult because usually, when you teach, things tend to be very linear. Yeah. And when you work on a model, things tend to be not linear at all. So working on that difference is something we could try.
:Our dual option could be, but there are still plenty of
375
:topics to discuss like survival models or putting more emphasis on not just understanding
or...
376
:These certain courses, they don't really know about linear models.
377
:So we briefly talk about them and discuss a few more things about doing linear models
using patient statistics.
378
:But linear models is something like it's so, so right, so fast that you can just have a
course to say, okay, we are going to discuss linear models again, more in practice on all
379
:this and the things that we were discussing before, like the making predictions for linear
models, so you can understand linear models and that kind of thing.
380
:I don't know.
381
:So there are many things that we could do for a third kind of course.
382
:I really like the fact you mentioned that case study based approach, like, okay, in this
semester we're going to work on these four problems.
383
:I think that that approach would be like a very good opportunity to invest time in prior
elicitation.
384
:I don't know if you talked about prior elicitation before.
385
:I know you have been working a lot in Prellis, which is a great tool that's
386
:It's been around for some time and I see that you are constantly like adding things to it.
387
:And I think that we as a community, we are not using it enough.
388
:At least myself in my workflow, I end up reinventing many of the things that are already
implemented in Prellis.
389
:Yeah.
390
:What do you have in any analysis about the situation regarding tools for prior
elicitation?
391
:Because I think it's something it's mentioned a lot, but I know Prellies, but I don't know
many other tools or people advocating for tools for prior recitation.
392
:I think it's also, it's also worth it to introduce Prellies and talk about it as a lot of
sure about where it's at, where you'd like to take it and things like that.
393
:Yeah, sure.
394
:the first thing is that we have a paper with many other, I have a paper with, a co-author
of the paper with many other people from Alta University, Helsinki University and other
395
:universities that we discussed prior licitation.
396
:And this paper has this kind of thing that, okay, where are we with prior licitation?
397
:What we need to do?
398
:What are the approaches, tools out there?
399
:And one of the sections is discussing prior prioritization in the framework of the
Bayesian workflow.
400
:So when, you always want to do prioritization?
401
:It's something that you always want to do.
402
:It's something that you always want to do at the beginning, that kind of thing.
403
:And of course it tells us, no, not, sometimes you don't just do prioritization at the
beginning.
404
:Sometimes you do it after a while.
405
:And sometimes you want to spend a lot of time doing pre-elicitation.
406
:Sometimes you just want to just default priors like priors in Bambi or whatever.
407
:So there is a section discussing that.
408
:There is also a section about a prior elicitation software.
409
:That's something that is slacking and actually pre-elix is an answer to that.
410
:In a sense, it's an answer to that paper.
411
:If we say there, okay, there is a, we don't have enough tools.
412
:We don't have.
413
:The tools that we have are very sparse in the sense that they are not in the same place.
414
:So it's not easy to discover that you have different tools.
415
:And many of these tools are not integrated or not integrated well with probabilistic
programming languages.
416
:So maybe a tool is just, I don't know, a webpage where you can try to do some prior
recitation, but then you get
417
:something and you have to manually move numbers into PyMC or PyStand or whatever.
418
:So that's one of the goals of Prellis, trying to answer all these issues.
419
:So essentially I started working on Prellis when I was working at Aalto University and we
were writing the tape.
420
:So it has been a while.
And one reason it's taking a while is that I'm still not super happy with PreliZ. I can maybe mention what we have in PreliZ now and what we don't have. So essentially PreliZ is a package for prior elicitation, or for distribution elicitation if you want; sometimes you want to elicit likelihoods too, not only priors. It's a sister project of ArviZ. That's the reason it has a similar logo and a similar name: it actually lives inside the arviz-devs organization, the ArviZ organization on GitHub. And the focus, as I said, is distribution elicitation. So one of the things PreliZ provides is a collection of distributions, like the distributions that you find in PyMC, or maybe in SciPy. Compared to SciPy, one difference is that the parameterization is a little more statistically inclined, okay? Usually the names of the parameters are the same names that you are going to find in textbooks, that you're going to find in probabilistic programming languages. And unlike SciPy, we support alternative parameterizations. So for the gamma you have alpha and beta, but you also have mu and sigma. The same goes for the beta distribution, etc. And then, what we offer with the distributions is that it's easy to define a distribution and to plot it. So we have functions to plot the CDF, the PDF, and also interactive functions: you can just call the distribution and move sliders, and you see, if I increase this parameter, decrease this other parameter, how does the distribution change? So there are many functionalities at the level of the distributions.
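[A quick sketch of those distribution-level conveniences, assuming PreliZ is imported as pz; the interactive plot needs a notebook environment.]

```python
import preliz as pz

# Textbook-style parameterizations, including alternative ones
pz.Gamma(alpha=2, beta=1).plot_pdf()
pz.Gamma(mu=2, sigma=1).plot_pdf()    # same family, mu/sigma parameterization

# Sliders showing how each parameter reshapes the distribution
pz.Beta(alpha=2, beta=5).plot_interactive()
```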
And that, I think, is already something super useful, because, as we were saying, when you teach statistics, people are usually familiar with distributions like the normal, maybe the gamma or the binomial. And then you mention the beta distribution, or you mention something else, and they say, so this is the beta distribution? Let's play a little bit. And they get familiar with the distribution. So now we are also adding documentation specifically tailored to the distributions: a gallery of distributions, where you can go and see a short description of each distribution's properties, how the distribution is used, that kind of thing. So all these things are super simple in a sense, but I think they are already useful. And as you say, if you don't have these things, you have to invent them yourself; you have to write them yourself, because you need them in practice. So it's nice that someone has already done that for you.
And then, because we have the distributions, we try to work on top of them. So, for instance, we have methods that can modify a distribution. We have one method called maxent, for maximum entropy. Essentially, it's a function: you pass it a distribution, whatever distribution you want, and then you say, okay, I want this distribution, but I want the distribution such that 90%, or 80%, of the mass is between this value and this value. And the thing is that if you only ask for that, for many distributions you can in principle have infinitely many distributions as answers. So we add an extra restriction: we say, okay, I don't want just any distribution, I want the maximum entropy distribution, that is, the most spread-out distribution that satisfies these constraints. Computing that is, I think, a simple idea, an old idea, but it's something that is very useful in practice to have. Actually, after the distributions themselves, it's probably the function I use most from PreliZ, all the time. And then there is also a little more flexibility, because you can fix parameters if you want. Say you have a Student's t distribution and you say, okay, I want to fix nu at seven: you can fix nu at seven and then do the rest with maximum entropy. Or you can fix whatever parameter you want. I mean, for some parameters it makes sense, for others it doesn't make too much sense to fix them, but you can do it. Usually it's not going to complain; it's just going to give you the distribution that satisfies the restrictions.
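[In code, the two uses Osvaldo describes look something like this; the bounds and masses are illustrative.]

```python
import preliz as pz

# Maximum-entropy Gamma with 90% of its mass between 1 and 10
pz.maxent(pz.Gamma(), lower=1, upper=10, mass=0.9)

# Fix nu at 7 and let maxent solve for the remaining Student's t parameters
pz.maxent(pz.StudentT(nu=7), lower=-3, upper=3, mass=0.9)
```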
And then we also have some more functionality. One method that is commonly used in practice is called the roulette method, like roulette in a casino. And it's called roulette because, I think, the analogy is that you have a certain amount of chips that you want to bet, in a sense. So, where do you want to place the chips? And of course you want to place the chips where you think the distribution has more chance of being, something like that. So that's one way to see it, and that explains the name. But if you look at this functionality, you're going to see that what you are doing is essentially drawing a histogram. So essentially you have a grid, okay? And on this grid you can activate cells, and then you get something that looks like a histogram, and then you can pick from a pool of distributions and say, okay, if my distribution looks like this histogram, which distribution fits best here? And you can pass all the distributions in PreliZ, or you can select one or two distributions, or whatever; it's very flexible. And that's it: you get the distribution, and you can stop with that.
:And that's something that's very common in prior licitation literature.
493
:If you read literature, you're going to...
494
:That method appears many times.
495
:Mentioned.
496
:There are actual papers or tutorials with protocols about how to use that method if you
have like many experts.
497
:And that's something that you can do in Preli.
498
:So for instance, we...
499
:The three of us can go and use that elicitation method.
500
:We can then collect our three elicitation, elicited distributions and we can put it
together in a single distribution and we can add weights.
501
:we say, okay, Tomás probably is an expert in this field, so I want to give more weight.
502
:Alex, I don't really know anything, so we have little smaller weights, something like
that.
503
:And you can do that.
504
:So that's also something super useful.
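[PreliZ has functionality along these lines; rather than guess its exact API, here's a hand-rolled illustration of the weighted-pooling idea, with made-up experts and weights.]

```python
import numpy as np
import preliz as pz

experts = [pz.Normal(0, 1), pz.Normal(0.5, 2), pz.Gamma(mu=1, sigma=1)]
weights = [0.5, 0.3, 0.2]   # e.g. the field expert gets the most weight

# Sample each expert's elicited distribution in proportion to its weight
pooled = np.concatenate(
    [d.rvs(int(10_000 * w)) for d, w in zip(experts, weights)]
)

# Fit a single distribution to the pooled draws; pz.mle updates it in place
dist = pz.Normal()
pz.mle([dist], pooled)
dist.plot_pdf()
```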
There is some functionality, but the thing that is missing at this point is what is called predictive elicitation. All the things I have been talking about are elicitation at the parameter level: you know that you have something like a slope in your model, and it's usually one-dimensional. (There are versions of these methods for the multi-dimensional case, but anyway.) The thing is that you're working on one part of your model at a time. So if you have a lot of parameters in your model, you need to go one by one, and that's super annoying. Still, it's useful, because sometimes, as I say, you can start with default priors and then there are one or two priors that you want to pay attention to. For those cases, these tools are super useful. But sometimes you want to do something more automatic. So one idea is: why don't we make predictions from the prior predictive distribution? If you have predictions from the prior predictive distribution, then you can see what your model is trying to do. So that's super useful, right?

There's some functionality at this point in PreliZ to do that. For instance, you can pass a Bambi model, or a PyMC model, or a PreliZ model (because you can kind of define models using just PreliZ), and you can get samples from the prior predictive distribution, and you are going to get boxes or sliders where you can tweak the values and see the prior predictive distribution. So again, it's a simple idea that tries to make this iteration process easier: you have a model, you generate predictions, you plot, then you go back to the model, you make predictions, you do another plot. It's trying to speed up that process, to make it a little more interactive. So that's something that is already there. It should work for many models. I'm not sure it will work for anything you throw at it, but the idea is that it should work for anything. Probably it's not going to work for everything, because, you know, we have to test it better, but the functionality is there.
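[Here's roughly what that looks like with pz.predictive_explorer; the conventions for declaring which arguments become sliders may differ slightly by version.]

```python
import preliz as pz

def toy_model(mu_scale=1.0, sigma_scale=1.0):
    # A toy model written with PreliZ distributions
    mu = pz.Normal(0, mu_scale).rvs()
    sigma = pz.HalfNormal(sigma_scale).rvs()
    return pz.Normal(mu, sigma).rvs(100)

# Sliders for mu_scale and sigma_scale; the prior predictive redraws live
pz.predictive_explorer(toy_model)
```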
And then there are a couple of functions that try to be more automatic, in the sense that you provide the model, so you have a model written as usual, with some default priors, and then, in one of these functions, you also provide a target distribution. That target distribution is supposed to be what you think the prior predictive distribution should look like, and the function tries to make the two as close as possible. This is the most experimental method in PreliZ at the moment, because essentially that's kind of an inference problem, right? But, I mean, you are not trying to reinvent Bayesian statistics: you want to do it in a way that is super cheap and super fast, that kind of thing. So there's some functionality for that. It works for models that are super simple, and it provides something that makes sense. It's also kind of experimental in that, if you provide a model with, I don't know, a normal and a half-normal, it may return a normal and a gamma, because it decides it's better to have a gamma as a prior and not a half-normal, because your target is super shifted, something like that, so the half-normal is not a good fit. So that's something that is already there. Again, it's super experimental. I play with it from time to time; I'm not sure I would use it, and not sure it will work for a very complex model, but it works. At this point it has actually worked for some models that are not that simple, like hierarchical models, that kind of thing.
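[This is not PreliZ's implementation, just a toy of the underlying idea: tune prior hyperparameters so prior-predictive quantiles match a target, keeping everything cheap and sample-based.]

```python
import numpy as np
from scipy.optimize import minimize

target_q = np.array([-2.0, 0.0, 2.0])  # desired 10/50/90% prior-predictive quantiles

def pp_quantiles(params, n=5_000, seed=1):
    rng = np.random.default_rng(seed)   # fixed seed keeps the loss deterministic
    mu_sd, sigma_scale = np.abs(params)
    mu = rng.normal(0, mu_sd, n)
    sigma = np.abs(rng.normal(0, sigma_scale, n))   # half-normal sigmas
    return np.quantile(rng.normal(mu, sigma), [0.1, 0.5, 0.9])

loss = lambda p: np.sum((pp_quantiles(p) - target_q) ** 2)
print(minimize(loss, x0=[1.0, 1.0], method="Nelder-Mead").x)
```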
And then there's another method for prior elicitation that I think is not very good in practice, but could be good for teaching. I'm still not sure whether it's actually useful or not; that has yet to be decided. It's a method where you just provide a model; it generates samples from the prior predictive distribution, and you see some of those samples in a 3x3 grid you can click, and you say, okay, these samples look like what I think the predictive distribution should look like. When you select them, it pools those samples together, so in another plot below you can see what happens when you collect all those individual samples. And you keep clicking, and every time you click it's trying, in a sense, to learn. It's super silly inside: it's not like I'm using generative AI, no neural networks, just comparing distributions. But essentially the method learns that you like distributions with a certain shape, a certain mean, a certain variance, so it tries to offer you more samples like those. Internally it's also automatically adding samples that you didn't select manually but that are super close to the ones you did select, to enrich the population. Anyway, it's the same thing: we're trying to approximate what you have in your mind, as a kind of target distribution, through what you are seeing as the prior predictive distribution. The other method was kind of automatic, and this method uses your brain as the filter. I'm not sure it's super useful, but I think at least it's fun. It could be fun to show it to students so they can play and try to match distributions; at least for teaching it should be useful. And of course, because you're generating distributions, you get a sense of whether your samples are too far off from what you have in mind, or close, or too narrow, or too wide, that kind of thing. And you also get to interact and play a little bit with your distribution, and maybe waste a lot of time clicking with your mouse instead of doing actual work. But anyway. So, okay, that's PreliZ.
Yeah, damn, thanks a lot. That was quite a performance. Like a one-man show about PreliZ, ladies and gentlemen. You should drink some water. I'm surprised you don't have any mate with you. I can see that Tommy has, of course, his mate.

No, I usually only drink mate in the mornings. Or maybe if I have to drive...

For anybody who hasn't tried mate, I definitely encourage trying it. Of course, it's better to do it for the first time under supervision, so go to Argentina or Uruguay and then have a proper mate. Thanks a lot, Osvaldo. I think Tommy has another question for you, so I'll give him the floor.
:Yeah, just want to say that, of course, I put the link to the Prelease package in the show
notes and definitely again, encourage people to check it out because I use it all the time
604
:in my modeling workflow.
605
:What I really love is that it marries very well with PIMC also, so you can have the prior
elicitation and then just ask Prelease to output that as a PIMC distribution directly in
606
:your PIMC model.
607
:So that makes the workflow way more transparent also.
608
:super reproducible.
609
:Oh yeah.
610
:And, and as you heard Osvaldo say, are so many things you can check out here.
611
:So very broadly, there is something you need.
612
:Go check that out.
613
:And also, you know, ask Osvaldo if he needs any help on the repo.
614
:Yes, I help.
615
:If you want to provide help, I'm super happy.
616
:The answer is always yes.
I have a very short final question, which is something I'm curious about. It's a question I thought I knew the answer to, but I realized my answer was extremely silly. So, you said PreliZ is a sister project of ArviZ. And when you say PreliZ, to me the name makes total sense, because it's prior elicitation: you glue both words together and you get PreliZ. But what's the origin of ArviZ? What does it mean?

It's RV, like in random variates. You know, when you write random variates, you write RV, plus an S: RVs. And we said that out loud, and then we said, okay, we can just spell it with a Z at the end, because it sounds like visualization.

Okay, do you want to know what used to be my internal answer?

Of course.

So: this is a visualization library, and it was created by a guy in Argentina who's very patriotic. So, ArviZ, of course.

No, it didn't make sense.

A very, very Argentinian hypothesis from you.

Yeah, yeah, yeah. There was a time in Argentina when everything was named Ar-something, remember? Yeah, yeah, yeah. That's why I supposed that could be the reason. But yeah, I was wrong.

Like you were trying to promote it as an everything package or whatever: it had to be Ar-something. And I only realized after naming it that it could be read that way, because I think Colin made that assumption, I don't remember. I think it was Colin Carroll. Okay, because you are Argentinian or something. But no, no.

Good, thank you.
Yeah, Tommy, feel free... I don't know if you have other things, but feel free to stay here and ask other questions. Otherwise, if you need to go, of course you can, but thank you so much for dropping in and making that surprise appearance for Osvaldo. Actually, talking about RVs, I think you guys are working on ArviZ 1.0, Osvaldo, right?

Yes, that's right.

Is it something you can share with the world?

No, it's top secret. I mean... LBS is kind of like the CNN of the Bayesian world, you know, it's like... this is exclusive information.
:No, actually, yes, we are completely rewriting Aramis.
662
:Completely rewriting actually is the...
663
:One of the thing is that we are splitting in three sub-models.
664
:And again, probably when this gets released, users are not going to notice that unless
they want to install them by separate.
665
:you will be able to just call all the functionality from a single place.
666
:But from time to time, we get people that wants, for instance, to compute, I don't know,
our hats, but they don't want to do any plot.
667
:So we are splitting and...
668
:that for those people, there's going to be easier to install Arbis without needing to
install Plotting libraries.
669
:That makes a lot of sense when you're, for instance, working with clusters.
670
:Universities, typical scenarios that you're working in a cluster.
671
:The guy in charge of a cluster say, okay, you only have to install only the things that
you really, really need.
672
:Okay.
673
:Now we are going to split this functionality.
674
:Another thing we're working is that it's going to be much easier.
675
:We're still going to have like this batteries include plots that you just call something
like ppc, block ppc and you get something and you don't need to do anything else.
676
:But it's going to be much, much easier that once you call the plot to tweak it.
677
:So we have a thing that is
678
:has some reminiscence of grammar of graphics in a sense.
679
:It's not exactly that, but I think you, Thomas, with the first time you saw, because this
is something that Oriol proposed, I think the first time you, Thomas, saw it, you say,
680
:okay, this looks very close to grammar of graphics.
681
:So there's some inspiration there.
682
:That, yeah, so essentially you're going to be able to call a plot and then tweak it quite
a lot, even you are going to...
683
:I'll be allowed to do completely nonsensical things.
684
:Super easy.
685
:So probably unexpected things that are useful too.
686
:And for us, it's going to be much, much simpler to add new functionality.
687
:And actually this past week, I was working on some functionality that is not available in
current Arbis.
688
:This is for checking prior sensitivity likelihood.
689
:sensitivity checks.
690
:Essentially I add a few things to compute the statistics and whatever.
691
:When I was to create the plot, I started thinking, okay, I need to do all this work and
start working like super complex things.
692
:And then I realized, no, I just need to call a function that we already have and do some
manipulation of data a little bit and then call a function and voila, I have a plot that
693
:has a lot of functionality.
694
:and I didn't have to write anything.
695
:So even I was surprised, Oriol was surprised, Andy that's also working and there was
surprise.
696
:It was so easy to create something entirely new.
697
:Another thing that's probably going to be very useful for people is that if you want to do
something much fancier, we provide new objects, something that's called plot collection.
698
:that in a sense is like a, in this new everything is like an X-array data set or data tree
or that kind of structure.
699
:So it's super, it's built around taking advantage of properties of X-array data structure.
700
:So this prep collection is going to allow you to, if you want to build your own kind of
plots,
701
:So I hope that will be useful also for researchers or people that want to like push
boundaries, not reuse some plot, but create something new.
702
:I think that's going to be much easier to do it with this interface.
703
:So that's something that we are quite excited about that.
704
:And of course we are adding new methods that are not available.
705
:As I say, this prior
706
:sensitivity checks and don't know other methods that are not viable now in Harbis we are
starting to move or add those methods in Harbis 1.0 because we want to be ready as soon as
707
:possible actually it is already usable you can go and check probably we should add the
others of the repository
708
:Um, and you will see that currently you already have like, uh, plot forest.
709
:Actually, I love the plot forest in R &D New Arbus.
710
:It looks super amazing.
711
:It's super clean.
712
:I don't know.
713
:I just want to look, it's like you cannot have a drawing or a good picture.
714
:It's just look nice.
715
:Um, we have, uh, I don't know, we already have a lot of functionality.
716
:Everything is there.
717
:and some functionality is going to be in the new Arbus and not the old one but again, I
think in a couple of months once this episode gets published it's going to be much much
718
:more usable ah, one thing that is super nice is that internally we have a better way to
handle with different backends so in current Arbus we support matplib and bokeh
719
:and in Arbis 1.0 we also have Plotly, we have Matplotly, Bokeh and Plotly.
720
:Again, was a pleasant surprise when Oriol added Plotly because it was relatively easy work
and suddenly we have a completely new backend working almost perfectly out of the box.
721
:Still there are some things that we need to polish but it was so crazy and
722
:even for us that you can just add a couple of things and you get something that works.
723
:Actually at this point, probably works better than than Bokeh.
724
:It's super nice.
725
:yeah.
726
:Yeah.
727
:Damn, that's super exciting.
728
:For sure.
729
:Let's put that in the show notes.
730
:The link to Travis Word 1.0.
731
:That sounds amazing.
732
:Yeah, I mean, can't wait to see the forest plot now because I use forest plots all the
time with arties.
733
:It's just the best ones, especially when you have models with a lot of dimensions.
734
:It's one of the best ones because you get a lot of information in just one plot.
735
:Something I always have difficulty with is mainly diagnosing and then visualizing models
where you have
736
:one dimension or several which are huge, you know.
737
:So for instance...
738
:For my main job with the Marlins, instance, my models, I have a ton of players in them.
739
:It's like if you have a model with 10k players, it's a challenge to visualize a trace plot
or even a forest plot.
740
:yeah, this I'm still trying to find my go-tos.
741
:But plot forests...
742
:Definitely, definitely one of mine.
743
:Love it.
744
:Yeah, definitely.
745
:was...
746
:I'm still a fan of calculus because I like how they usually encode information, but I'm
becoming more more enthusiastic about just point intervals, in plot forest.
747
:Just a point on some interval and that's it.
748
:Because as you say, when you have a lot of things, it's much easier to compare.
749
:I you don't have to worry about things like...
750
:the bandwidth that you're using.
751
:I don't know, there are many things that are useful.
752
:course, each tool has their use, but I think plot forest and in general, the use of
half-point intervals is super useful.
753
:Yeah, yeah, definitely.
754
:And also, if at some point you find some good plots and illustrations for models with huge
dimensions, please let me know.
755
:Something I want to talk about with you too is intuitive base because you've done some
work over there mainly writing down some really good educational content.
756
:If you had briefly talk about that and also we should definitely put that in the shots.
757
:Yeah, I think the first...
758
:The first thing I did for intuitive bay was practical MCMC.
759
:So essentially it's a course for people that, I mean, you don't want to get an expert into
sampling methods.
760
:Actually the promise of probabilistic programming language is that you don't need to worry
about the sampling.
761
:You only need to worry about defining your models.
762
:But of course in practice, it's usually a good idea to at least have a
763
:conceptual understanding of what MCMC methods are doing because if you have this
conceptual understanding is easier to diagnose or and help know what things you can do to
764
:fix things when you have problems like a bad trace plot or divergences, that kind of thing. So essentially for Practical MCMC we do that. Sometimes it's very conceptual, and we try to
765
:use a lot of animations
766
:And that was one of the things that I was most excited about, because I'm used to teaching by writing books or tutorials or things that are static.
767
:And then when I teach live, I try to have something that looks closer to an animation, but
usually very simple ones, very rudimentary ones.
768
:So with Practical MCMC, we were able to do something like this.
769
:Those look a little bit more fancy, a little bit more entertaining.
770
:So we spent quite some time trying to do that and making those animations actually something useful, so students can learn the concepts in a faster and easier way.
771
:Yeah, essentially that is a course for people that want to do things in practice and want to get some understanding of MCMC methods and how to deal
772
:with them in practice.
773
:And then we wrote a workflow guide, which also was something interesting because, as I said earlier, talking about the workflow in a practical way is sometimes not that easy.
774
:It's kind of challenging because of the nonlinear nature of the workflow, all these branches and different things that you can do in actual work.
775
:And anyway, we tried to provide... it's a short guide.
776
:This is a PDF.
777
:The other one was videos.
778
:This one is a PDF.
779
:And it's a kind of short guide that provides the general ideas of the workflow.
780
:So at one point we have a kind of description of a linear workflow.
781
:We clarify, say, okay, usually things are not linear, but we're going to put one step below the other.
782
:And we try to provide some of the things that you usually want to pay attention to at each step.
783
:And then we provide an actual example where we go and iterate, and we play with the prior predictive distribution and do some prior elicitation and then check the model, and then
784
:decide if we want to go back or not and repeat something and that kind of thing.
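As a minimal sketch of that iterate step in PyMC (the model, priors, and data here are toy assumptions, not the guide's actual example):

```python
import numpy as np
import pymc as pm
import arviz as az

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2 * x + rng.normal(0, 0.3, size=50)

with pm.Model() as model:
    beta = pm.Normal("beta", 0, 1)
    sigma = pm.HalfNormal("sigma", 1)
    pm.Normal("y_obs", mu=beta * x, sigma=sigma, observed=y)
    # Draw from the priors before fitting; implausible draws are a
    # signal to go back and revise the priors.
    idata = pm.sample_prior_predictive()

az.plot_ppc(idata, group="prior")
```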
785
:Actually, we do find issues and we go back.
786
:And this is, as I say, a short guide, and it was kind of a proof of concept, because we wanted to see if we were able to write these relatively short booklets for
787
:people that already know what they don't know, or already know what they need to improve.
788
:It's short because we assume you already know what...
789
:the modeling is, you already know what samplers are, you already know your tools, but you need this extra step: how do I proceed in actual modeling?
790
:And we also have some work, not yet published, that we are working on, for prior elicitation.
791
:So we want to create also a booklet, or, we're still trying to decide, a video
792
:course with lessons for prior elicitation.
793
:And again, prior elicitation in the context of the Bayesian workflow and probabilistic programming.
794
:When do you want to do prior elicitation?
795
:How much time should you spend doing prior elicitation?
796
:What kind of priors can you think of?
797
:It's probably not a surprise to the audience.
798
:I think that we really like weakly informative priors.
799
:So we tried to center the discussion on: but what are weakly informative priors?
800
:We tried to provide a practical definition of that.
801
:You can try to aim at working with weakly informative priors.
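One practical route to such a prior with PreliZ, assuming for illustration a Normal family and a plausible range of (-3, 3) for the parameter:

```python
import preliz as pz

# Find the maximum-entropy Normal that puts 90% of its mass between
# -3 and 3; maxent updates the distribution's parameters in place.
dist = pz.Normal()
pz.maxent(dist, lower=-3, upper=3, mass=0.9)
print(dist.summary())
```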
802
:So yeah, I have been super happy working with Intuitive Bayes, as I say, because for all the projects we have been trying to do something that is
803
:new, in the sense that not only do I already have some experience teaching Bayesian statistics to people,
804
:but at each project we try to do something new in that sense, something new in how we are going to teach this concept that I have probably taught many times before.
805
:How to approach it in a different way, or how to learn it, how to
806
:try to collect all the things that you know you tried in the past that didn't work, or to do it differently.
807
:So it has been a super, super fun project.
808
:Yeah.
809
:And I definitely recommend listeners to check those out, all of those.
810
:So I already put your Practical MCMC course in the show notes.
811
:And please, you can add... well, at least the first booklet, I know it's out, so if you can, add the link to the show notes.
812
:The second, as you were saying, I think you're still working on that.
813
:But maybe by the time this episode is out, we'll add it, and I'll definitely recommend people check that out, because, like a lot of your work, I'm guessing it's going to be very to the
814
:point, with enough technical details, but not too much.
815
:So that's definitely the kind of thing I like in your work and in your writing in
particular.
816
:It's like you give the readers enough technical details to make them understand, but at
the same time, you're not just drowning them in a sea of technical details that are not
817
:necessary, at least to start with these kind of methods and actually use them in the wild.
818
:So well done on that.
819
:I love that.
820
:Thank you.
821
:And on that note, by the way, as always, there is a lot of humor in your writing, because you're quite a funny guy.
822
:I have to say that to the listeners, in case they didn't notice.
823
:And actually, if people want to read more of you, you have your book, right?
824
:That was the first book I read when I started learning English and statistics.
825
:Now, you have...
826
:your first book that's...
827
:The title is Introduction to Bayesian Statistics or something like that?
828
:No, Bayesian Analysis with Packt.
829
:No, Bayesian Analysis with Python.
830
:Yes, Bayesian Analysis with Python, with Packt.
831
:And that's your third edition, right?
832
:Yes, exactly.
833
:Damn, congratulations.
834
:I know how hard it is to write one book, so three...
835
:I don't know.
836
:And thanks to you, actually, you managed to have your publisher give some goodies to the listeners.
837
:So we're going to have a handful of ebooks to distribute to the patrons of the show.
838
:We're going to do a random draw in the Learn Bayes Stats Slack, and then the winners are going to hear, I'm guessing, from Packt.
839
:How many books are we able to give away this time?
840
:I'm not sure, really.
841
:Okay, so we'll get back to you on that. And for the rest of you, you have a discount code that we'll also put in the show notes when the episode comes out, so that you can buy
842
:Osvaldo's book at a discount.
843
:And again, we recommend...
844
:Anything Osvaldo writes because it's usually really well done.
845
:So thanks again, Osvaldo, for setting that up with Packt.
846
:Okay.
847
:Thank you.
848
:So we're already at one hour, more than one hour.
849
:So I don't want to take too much of your time.
850
:So I'll ask you again the last two questions I ask every guest at the end of the show, because, you know, that's cool.
851
:You get another
852
:stab at that. But a personal curiosity I have, I talked to you about that a bit, but I thought it would be interesting to talk about it on the show: BART for time series.
853
:Is that possible? How is that possible? Is it just something where, you know, you just add a BART prior to, I don't know, a model you're already
854
:doing, kind of a linear regression in which you have a temporal element, like a Gaussian random walk.
855
:Can you also add a BART to that?
856
:How does that work?
857
:Is that possible?
858
:Yeah, you can do that.
859
:There's probably one or two examples I've seen from Juan Orduz because he has worked a lot
with time series.
860
:I personally have not worked a lot with time series.
861
:So that's something that is usually out of my radar, but yeah, in principle you can do it.
it.
862
:As I say, probably one difference with Gaussian processes, when you work with Gaussian processes for time series or other methods for time series, is that you kind of try to
863
:construct your time series by adding
864
:different terms that encode different kinds of information at different levels.
865
:And in that sense, BART is a little bit more dumb.
866
:But it's something that I have been thinking about, about how to try to add a little bit
more structure to BART models.
867
:And there's actually some literature about that, for instance, telling your BART model
that one or more variables are
868
:always increasing or always decreasing or whatever.
869
:So I think there is something to work on in how to encode more prior information into BART, and also how to combine more BART
870
:models.
871
:Like maybe you can build different BART models together.
872
:and try to build a more complex function in that way.
873
:I think that's something that is missing from the literature.
874
:And I think that's partially because of this idea of having BART models as standalone models.
875
:The moment you move from there and you say, no, a BART model is not a model.
876
:It's a stochastic process or a distribution, if you want.
877
:Then we start thinking about
878
:Okay, maybe I can do much more general stuff and much more complex stuff.
879
:So, I don't know, I'm counting on people.
880
:Actually, my hope is that people using PyMC-BART try to do that for me.
881
:So that I don't have to do it; they'd be able to do it with PyMC-BART and they'd show me how to do it.
882
:And so I can learn from them.
883
:That's my ultimate hope with PyMC-BART.
884
:Yeah, I can guess that.
885
:Okay, so that's interesting.
886
:Basically, you're saying that, yeah, BART is more black-boxy, in a way.
887
:So it's kind of like you use a BART or you don't, but it's not as modular, in the sense that a component of your linear regression, for instance, could be a BART.
888
:And in addition to that, you have what you were saying for...
889
:Gaussian processes, for instance, where you can model a trend, you can model the
seasonality, you can model short-term variations and each of these three components, you
890
:could do it with a Gaussian process, or you can also have structural time series, as we'll talk about with Jesse
891
:Grabowski, where you model your trend, you model the seasonality, each with two different
892
:two different, well, not models, but let's say methods in the same model.
893
:And also you can use something like an ARIMA to then model the rest, the residuals.
894
:Once you've taken the trend and seasonality into account... What you're saying is that you cannot really do that with BART.
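A minimal sketch of that additive construction with PyMC's GP module (the lengthscales, period, and data layout are illustrative; in practice the hyperparameters would get priors):

```python
import numpy as np
import pymc as pm

X = np.arange(120.0)[:, None]  # e.g. 120 months of data

with pm.Model() as model:
    # Each covariance term encodes a different kind of prior structure.
    trend = pm.gp.cov.ExpQuad(1, ls=50.0)                   # slow trend
    seasonal = pm.gp.cov.Periodic(1, period=12.0, ls=1.0)   # yearly cycle
    short = pm.gp.cov.Matern32(1, ls=2.0)                   # short-term wiggles
    gp = pm.gp.Latent(cov_func=trend + seasonal + short)
    f = gp.prior("f", X=X)
```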
895
:What I'm saying is that it's more difficult to give priors to BART, in the sense that the priors are super general: you pick the number of trees, and then you say
896
:you have some probability of how deep the trees can be.
897
:But it's not that easy to encode a lot of information into the priors.
898
:Like you can do, for instance, with kernels in Gaussian processes.
899
:That's the thing I'm saying.
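To make that concrete, this is roughly what those priors look like in PyMC-BART; the m, alpha, and beta shown are the library defaults, and the data is synthetic:

```python
import numpy as np
import pymc as pm
import pymc_bart as pmb

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.normal(size=200)

with pm.Model() as model:
    # The BART "priors" are mostly structural: m trees, plus alpha and
    # beta controlling how likely trees are to keep growing deeper.
    mu = pmb.BART("mu", X=X, Y=y, m=50, alpha=0.95, beta=2.0)
    sigma = pm.HalfNormal("sigma", 1.0)
    pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)
    idata = pm.sample()
```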
900
:But I think it's not something that is intrinsic to BART, it's more about the methods that are available.
901
:And probably I'm saying this and, maybe, I hope, some listener will say, oh, but I know about the model where you have blah blah blah.
902
:Because there are a lot of things.
903
:For instance, there are BART models that, at each leaf node, return a Gaussian process.
904
:Okay.
905
:So the result of the model is a sum of Gaussian processes.
906
:That's super complex, and you can do a lot of things with that.
907
:What I'm saying is that at this point PyMC-BART is much more flexible than other methods, in the sense that you can use it as part of a model.
908
:For instance, I know Chris Fonnesbeck, from PyMC, has used models that include BART and GP components in a single thing.
909
:So you can do that.
910
:What I'm saying is that we need
911
:a little bit more work to be able to provide, or encode, more prior information directly into BART, so we can restrict it.
912
:The thing is that BART models are super flexible.
913
:When you have something that's super flexible, it's always a good idea to say, okay, I
want to restrict you to be more of this type or this type because that's a way to encode
914
:prior information.
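A minimal sketch of BART as one component among others in a single PyMC model; this is an assumption-laden toy, not the actual models mentioned above:

```python
import numpy as np
import pymc as pm
import pymc_bart as pmb

rng = np.random.default_rng(1)
t = np.linspace(0, 4, 150)
y = 0.5 * t + np.sin(3 * t) + 0.1 * rng.normal(size=150)

with pm.Model() as model:
    # A parametric trend with an informative prior restricts one part...
    slope = pm.Normal("slope", mu=0.5, sigma=0.2)
    # ...while a BART term flexibly soaks up the remaining structure.
    f = pmb.BART("f", X=t[:, None], Y=y, m=20)
    sigma = pm.HalfNormal("sigma", 1.0)
    pm.Normal("y_obs", mu=slope * t + f, sigma=sigma, observed=y)
```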
915
:Right.
916
:Okay.
917
:Yeah.
918
:I get that.
919
:Yeah.
920
:Super fun.
921
:I mean, I can't wait to,
922
:you know, play with all these different methods, which we just talked about throughout the episode.
923
:I'm already playing with all of them, but, you know, always learning.
924
:That's really cool.
925
:And, yeah.
926
:It's part of the job, right?
927
:Sometimes it's a bit like, oh my God, I don't know anything.
928
:And then most of the time it's like, oh, okay.
929
:Yeah.
930
:And now I understand
931
:that thing, but now I have to go learn the next step of that method, or learn that new method to actually combine it with another one I know about.
932
:Yeah, that's really the challenge, and also the joy, of that line of work, let's say.
933
:So I put the blog post from Juan Orduz that you talked about.
934
:I think it's the one looking at a cohort retention analysis with BART.
935
:Sounds like it's a time series with BART.
936
:I'll definitely give it a read because that sounds super interesting.
937
:And since you mentioned GPs, I also put in the show notes two new...
938
:well newish...
939
:they're a few weeks old now...
940
:tutorials that Bill Engels and myself wrote about HSGPs, so, Hilbert space approximations of Gaussian processes, which I definitely recommend to people interested in GPs.
941
:Take a look, because honestly, HSGPs, if you're doing one or two dimensions, sometimes three, but at least in one and two dimensions, are absolutely amazing.
942
:And that changes a lot of things, because it's way faster to compute, way more efficient.
943
:And so if you want to get started with HSGPs, I recommend these two tutorials.
944
:The first one guides you through the basics.
945
:And the second one demonstrates two more advanced use cases.
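For readers who want a starting point before the tutorials, a minimal HSGP in PyMC looks roughly like this (the hyperpriors and basis size are illustrative, not recommendations):

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(2)
X = np.linspace(0, 10, 100)[:, None]
y = np.sin(X[:, 0]) + 0.2 * rng.normal(size=100)

with pm.Model() as model:
    ell = pm.InverseGamma("ell", alpha=3, beta=2)
    eta = pm.HalfNormal("eta", 1.0)
    cov = eta**2 * pm.gp.cov.ExpQuad(1, ls=ell)
    # HSGP approximates the GP with m basis functions on a domain
    # expanded by the factor c, trading exactness for large speedups.
    gp = pm.gp.HSGP(m=[30], c=1.5, cov_func=cov)
    f = gp.prior("f", X=X)
    sigma = pm.HalfNormal("sigma", 1.0)
    pm.Normal("y_obs", mu=f, sigma=sigma, observed=y)
```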
946
:Yes, I have to teach GPs in a couple of weeks.
947
:I'm going to steal all your material for my class.
948
:Perfect.
949
:Awesome.
950
:Yeah, I can't wait.
951
:I love these methods.
952
:So I really love hearing that they get propagated even more in Argentina, which is dear to
my heart, as you know.
953
:So I think it's time to let you go as well.
954
:It's been a while.
955
:You need to get some alfajores in your blood.
956
:But before letting you go, I'm going to ask you again the last two questions I ask every guest at the end of the show.
957
:So first one, if you had unlimited time and resources, which problem would you try to
solve?
958
:Yeah, I think I'm going to say exactly the same.
959
:I'm still motivated
960
:to work on Bayesian methods.
961
:And as I said at the beginning, for me it's really something... I'm super happy when I get some code and you get something like a plot_forest or something like that, or the first time
962
:you see PyMC-BART running and it's actually fitting something that at least looks reasonable, or talking with other people like Gabriel, and he says, okay, I sped this up 10
963
:times.
964
:or whatever, or with Tomás about Bambi.
965
:So yeah, I think I'm still super interested in working on Bayesian methods and making Bayesian methods more useful to people.
966
:I think it's even more interesting to me to make methods so other people can do stuff, than doing the stuff myself.
967
:Like this kind of general...
968
:approach.
969
:So yeah, I will keep doing the same and if I have more resources I will have more time,
more people to help with this.
970
:That sounds good.
971
:I love it.
972
:And who would be your guest if you could have dinner with any great scientific mind, dead,
alive or fictional?
973
:I don't know, that's all...
974
:It would be easier if we, all participants of your podcast, had a single banquet with all the people, so we could switch tables and talk to many people.
975
:So we should try to organize something like that at some point.
976
:I don't know.
977
:It's super difficult to think about a single person.
978
:The last time, I think I mentioned Harold Scheraga,
979
:who I never had the opportunity to have lunch with.
980
:He was from my previous life as a bioinformatician, biophysicist or whatever.
981
:From statistics, I don't know.
982
:I don't know if I have some, like, statistical hero yet.
983
:I see, I see.
984
:I know some people that I already had the opportunity to have lunch with, and... I don't know.
985
:I think close to me, I don't want to sound like a...
986
:I think, you know, to get back to Argentina, I think Jorge Luis Borges would be an interesting person to have dinner with.
987
:I don't know, it's super intimidating.
988
:So not technically a scientist, but it could be interesting.
989
:But it would be super intimidating, you know?
990
:It's a guy that...
991
:It's a guy that is like,
992
:I don't know if listeners have read him, but even when you're reading his fiction, he always kind of transmits this idea that he's saying something that is super profound, true,
993
:something like that.
994
:I don't know, maybe it's just poetry, it is fiction, but he has a very interesting style of writing.
995
:Yeah, yeah, definitely.
996
:Yeah, I mean, The Garden of Forking Paths, things like that, I think it's definitely scientific-ish, let's say, you know, so it'd be interesting to think about how he came up with
997
:these ideas.
998
:No, he has many things that look scientific, in a sense.
999
:There's fiction that looks like that.
:Could be science fiction, essentially.
:Yeah.
:Yeah, definitely.
:Awesome!
:Osvaldo, that was a pleasure.
:Thank you again for taking the time.
:It's always a great pleasure to have you on the show.
:Of course, the show notes of this episode are gonna be huge, because you do so many things.
:But yeah, I've already added a lot of things.
:Feel free to add other links that we mentioned today.
:And of course, links where people can follow you.
:Yeah, and maybe before we close up the show, can you tell people where they can follow you, where they can support your work, something like that?
:Yeah, I am still on Twitter.
:I'm still calling it Twitter, but probably I'm going to migrate.
:I'm also on Mastodon.
:It's all the same.
:It's aloctavodia.
:It's the same handle for everything.
:I'm on Bluesky.
:Maybe we can also share my personal webpage,
:which is essentially a place where you can go and see what I'm doing at the moment.
:And now I'm on LinkedIn.
:I have to admit that I hate LinkedIn, but I'm there for some reason.
:I still don't understand what people are doing there, but I see a lot of people posting interesting things.
:So I started to follow those folks.
:Perfect.
:Well, I'm sure people will connect there.
:Thank you again, Osvaldo, for taking the time and being on this show for the third time.
:Thank you.
:I only need seven more.
:This has been another episode of Learning Bayesian Statistics.
:Be sure to rate, review and follow the show on your favorite podcatcher, and visit learnbayesstats.com for more resources about today's topics, as well as access to more episodes to help you reach a true Bayesian state of mind.
:That's learnbayesstats.com.
:Our theme music is Good Bayesian by Baba Brinkman, feat. MC Lars and Mega Ran.
:Check out his awesome work at bababrinkman.com.
:I'm your host, Alex Andorra.
:You can follow me on Twitter at alex_andorra, like the country.
:You can support the show and unlock exclusive benefits by visiting patreon.com/LearnBayesStats.
:Thank you so much for listening and for your support.
:You're truly a good Bayesian.
:Change your predictions after taking information in.
:And if you're thinking I'll be less than amazing, let's adjust those expectations.
:Let me show you how to be a good Bayesian. Change calculations after taking fresh data in. Those predictions that your brain is making? Let's get them on a solid foundation.