Speaker:
00:00:02
My guest today is Gabriel Stechschulte, a software engineer passionate about
probabilistic programming and optimization.
2
:
00:00:14
Gabriel recently re-implemented BART, Bayesian Additive Regression Trees, in Rust, making
the algorithms faster, more flexible, and more suitable for real-world applications.
3
:
00:00:27
So if you are a PyMC BART user, well, definitely
4
:
00:00:31
recommend checking out his implementation that is in the show notes.
5
:
00:00:35
In our conversation, we dive deep into what makes BART special, its ability to quantify
uncertainty, handle different likelihoods, and serve as a strong baseline in settings like
6
:
00:00:46
optimization and time series.
7
:
00:00:49
We also explain how BART compares with Gaussian processes and other tree-based methods,
and talk about practical challenges like handling missing data, integrating BART into
8
:
00:00:59
PyMC,
9
:
00:01:00
and embedding machine learning models into decision-making frameworks.
10
:
00:01:05
Beyond the code, Gabriel reflects on open-source collaboration, the importance of
community support, and where probabilistic programming is headed next.
11
:
00:01:15
This is Learning Bayesian Statistics, episode 142, recorded September 18, 2025.
12
:
00:01:29
Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods,
the projects, and the people who make it possible.
13
:
00:01:49
I'm your host, Alex Andorra.
14
:
00:01:51
You can follow me on Twitter at alex-underscore-andorra.
15
:
00:01:55
like the country.
16
:
00:01:56
For any info about the show, learnbayesstats.com is Laplace to be.
17
:
00:02:00
Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on
Patreon, everything is in there.
18
:
00:02:07
That's learnbayesstats.com.
19
:
00:02:09
If you're interested in one-on-one mentorship, online courses, or statistical consulting,
feel free to reach out and book a call at topmate.io/alex_andorra.
20
:
00:02:20
See you around, folks.
21
:
00:02:22
and best Bayesian wishes to you all.
22
:
00:02:24
And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can
help bring them to life.
23
:
00:02:31
Check us out at pymc-labs.com.
24
:
00:02:38
Gabriel Stechschulte,
25
:
00:02:40
Welcome to Learning Bayesian Statistics.
26
:
00:02:43
And I think I butchered your name, did I?
27
:
00:02:46
No, no, it was quite good.
28
:
00:02:49
Okay.
29
:
00:02:51
It was because I rehearsed it and that didn't work.
30
:
00:02:55
So, but yeah, thanks a lot for being on the show.
31
:
00:02:59
I've been meaning to have you here for a while because you do a lot of very interesting
things.
32
:
00:03:06
We know each other from the PyMC world, but that's the first time we actually meet, almost in person.
33
:
00:03:13
So that's great.
34
:
00:03:15
I'm very happy you're here.
35
:
00:03:17
Thanks a lot for taking the time.
36
:
00:03:19
As usual, let's start with your background and your origin story.
37
:
00:03:23
Can you tell us what you're doing nowadays and how did you end up doing that?
38
:
00:03:30
Yeah, for sure.
39
:
00:03:31
And thanks for having me on and maybe next time we can be in person.
40
:
00:03:35
Some hiking in the mountains.
41
:
00:03:37
So my background: currently I'm in an Internet of Things lab, an IoT lab, at the Lucerne University of Applied Sciences and Arts.
42
:
00:03:48
Within the lab, I'm doing various modeling of engineering processes, but I wasn't always
doing that.
43
:
00:03:54
So I originally studied economics back in the US and there we were primarily doing econometrics.
44
:
00:04:02
And so like frequentist based statistics.
45
:
00:04:06
then from there, I moved to Switzerland to do my masters and be with my girlfriend.
46
:
00:04:10
And then in my masters, continued in data science.
47
:
00:04:14
But that's really kind of when I started getting involved in probabilistic programming and
Bayesian statistics in particular.
48
:
00:04:21
And so after graduating, immediately started working in the lab at the university.
49
:
00:04:28
So.
50
:
00:04:30
So pretty, kind of a random, random road, right?
51
:
00:04:34
Like this is where that people end up in Switzerland, especially when coming from the US
or is my prior wrong?
52
:
00:04:43
No, no.
53
:
00:04:44
Yeah.
54
:
00:04:44
It's, it's a bit odd.
55
:
00:04:46
Yeah.
56
:
00:04:49
so what do you mean by an Internet of Things lab?
57
:
00:04:52
I absolutely don't know what that is.
58
:
00:04:55
Yeah.
59
:
00:04:55
So pretty much any, like a lot of things that do with you.
60
:
00:05:00
connectivity of hardware.
61
:
00:05:02
so when you're, when that hardware is connected to the internet to provide some sort of
connectivity, and then that's kind of what you can kind of think of internet of things.
62
:
00:05:12
And so all of your things labeled as smart devices fall kind of under that umbrella term, IoT.
63
:
00:05:20
And so now everything today is being IoT-ified.
64
:
00:05:25
And so you have
65
:
00:05:27
your dishwashers connected to the internet, coffee machines and so forth.
66
:
00:05:32
And so that's kind of really what IoT generally means.
67
:
00:05:37
Okay.
68
:
00:05:37
Okay.
69
:
00:05:38
So this is, that sounds very algorithmic and deep learning, does it?
70
:
00:05:45
So, I mean, it's pretty, so with our group, we have a very widespread of knowledge of
within our group, you have people that are really specialists in networking and hardware.
71
:
00:05:56
And then you have, and then like stored data storage and processing, and then maybe like
me more on the machine learning or data analysis side.
72
:
00:06:06
And so for me, it's how do you analyze the data coming from various sorts of machines,
whether that's manufacturing machines and so forth.
73
:
00:06:18
Okay.
74
:
00:06:18
Okay.
75
:
00:06:19
And how did you end up doing that?
76
:
00:06:21
Because that's not what you studied, right?
77
:
00:06:23
So how did that happen?
78
:
00:06:26
Yeah, I think it kind of came stem from, so back to my bachelor's in econometrics, doing a
lot of time series stuff.
79
:
00:06:35
And then you can kind of think in IoT, it's also a lot of time series stuff as well.
80
:
00:06:40
Because when you, when you have these sensors hooked up to the machines, you're also
logging time series.
81
:
00:06:47
Like every, depending on the frequency, like 10 Hertz, 50 Hertz, you're logging a
measurement every second or, or 60 measurements every second.
82
:
00:06:57
And so with that, you get a really nice stream of time series data.
83
:
00:07:02
And I don't know how I exactly got kind of like brought into the IOT field specifically,
but it was kind of like.
84
:
00:07:09
Stunned from, Hey, do you know various time series methods like.
85
:
00:07:14
Like, um, like the seasonal or moving average and state space models and so forth.
86
:
00:07:20
And it's like, okay, yeah, some of that can be translated from econometrics over into this
IOT realm.
87
:
00:07:26
And so was kind of a bit of that kind of gradual shift.
88
:
00:07:31
Okay, okay.
89
:
00:07:32
Yeah, that makes sense.
90
:
00:07:33
So a lot of time series, state-space models, Gaussian processes, I'm guessing, or at least I'm hoping.
91
:
00:07:40
Yeah.
92
:
00:07:43
Um, how did you end up working on Bayes stats in particular?
93
:
00:07:50
you, do you remember when you were first introduced to that and how, how often do you use
that in your, in your current work?
94
:
00:08:00
Yeah.
95
:
00:08:01
So I was first, and it goes back to my bachelor's when I was doing, um, the first couple
of statistics courses.
96
:
00:08:09
And I just remember like.
97
:
00:08:12
When doing some very basic regression models and then in that course, they're like, yeah,
you reject the null hypothesis because of the p-value.
98
:
00:08:25
And I'm just sitting there, I'm like, yeah, but who and why?
99
:
00:08:30
Why is it 0.05 and who came up with that kind of arbitrary metric?
100
:
00:08:35
And it just kind of like was always unsatisfying to me.
101
:
00:08:38
uh
102
:
00:08:40
This is kind of, you follow this kind of like strict rule and diagram and you kind of,
okay, yeah, you do this or you don't do that.
103
:
00:08:47
And that was always kind of unsettling to me.
104
:
00:08:49
And so from there, it was kind of more of a self-discovery, because they never taught Bayesian statistics in my undergrad.
105
:
00:08:58
And so was kind of like, okay, what else is out there?
106
:
00:09:01
And that's kind of, that's where I came across Richard McElreath's Statistical Rethinking and then Andrew Gelman's Bayesian Data Analysis.
107
:
00:09:10
And so that's kind of how I got more introduced into Bayesian statistics.
108
:
00:09:13
But then in regards to how, how often I use it, it's pretty much every day.
109
:
00:09:21
So almost every project that I've done in this lab has to do with probabilistic modeling
and some form or another.
110
:
00:09:31
And why is that?
111
:
00:09:33
How come Bayesian stats seem so interesting and important to your work, and what do they bring that you can't have with the frequentist framework?
112
:
00:09:46
Yeah.
113
:
00:09:46
So the big thing I see in like with sensors and the IoT in general is particularly the
problems that I'm solving is...
114
:
00:09:56
First off, you have a lot of sensor noise.
115
:
00:09:58
So these sensors and the processes that they're measuring aren't perfect.
116
:
00:10:03
And so, for example, like if you're, have a bunch of sensors on a manufacturing machine,
the speeds that these sensors are logging aren't going to be necessarily exact.
117
:
00:10:13
They could fluctuate a little bit.
118
:
00:10:15
And then not only that, the process that it's measuring isn't always perfect.
119
:
00:10:20
So if, and so I look at that, I'm like, okay, actually
120
:
00:10:25
probabilistic programming is a really good kind of fit here because we can start to begin
to model the uncertainty, some of the sensor noise in the process and then in the kind of
121
:
00:10:34
manufacturing process itself.
122
:
00:10:37
so being able to like quantify the uncertainty there is very powerful because it kind of
lets you account for that sense, for some of the noise in the process and in the
123
:
00:10:46
measurements.
124
:
00:10:48
But at the same time, it's really also difficult because you can imagine that
125
:
00:10:54
Some of these settings are logging a lot of data.
126
:
00:10:58
so traditionally, uh Bayesian stat computational methods aren't very good with big data.
127
:
00:11:04
And so it's kind of like, often see kind of in my day to day, like you kind of have this
friction between the big data and Bayes.
128
:
00:11:11
And so that's also kind of maybe we can talk about in a little bit, but you have that kind
of friction.
129
:
00:11:19
Yeah, yeah, exactly.
130
:
00:11:20
That's where I was going.
131
:
00:11:21
And that's where my astonishment
132
:
00:11:24
comes from.
133
:
00:11:24
um Yeah.
134
:
00:11:26
So actually, do you want to talk about that?
135
:
00:11:30
Yeah.
136
:
00:11:30
How do you manage to combine this need for uncertainty quantification and intuitive uncertainty interpretation, but at the same time the need to run the
137
:
00:11:48
models.
138
:
00:11:49
I don't know how frequent you guys need to run the inference, but
139
:
00:11:53
You have a lot of data.
140
:
00:11:54
Yeah, that can be a bottleneck.
141
:
00:11:56
How do you thread that needle?
142
:
00:12:03
So I'd say there's kind of three general approaches, not really approaches, but maybe
techniques that we do.
143
:
00:12:08
And so the first one is probably what everyone can think of is you kind of have your raw
data and then you perform just some aggregation on top, some sort of resampling to kind of
144
:
00:12:19
reduce the size of the data.
145
:
00:12:21
And then you just continue to apply your general, maybe MCMC on that.
146
:
00:12:26
But then the second one is other inference methods like variational inference.
147
:
00:12:31
We've seen
148
:
00:12:32
to be a very good fit because we have a lot of data, but then with variational inference
methods, we can apply that to that data because a lot of times you need like some sort of
149
:
00:12:45
approximating strategy.
150
:
00:12:46
And since we have a lot of data, we can come up with a nice sampling scheme to then use
within the variational inference method.
151
:
00:12:55
And then the last one is...
152
:
00:12:57
Yeah, luckily we do have some hardware at our lab that we can just throw GPUs at the
problem.
153
:
00:13:03
So we can use, a lot of times, JAX, so NumPyro, Pyro, and use these more traditional deep learning frameworks for GPU acceleration.
154
:
00:13:14
Yeah, yeah, I see.
155
:
00:13:16
Yeah.
156
:
00:13:16
So that last approach is quite nice also because you don't have to think too much, right?
157
:
00:13:23
Like a NumPyro.
158
:
00:13:25
Or a PyMC model can just run out of the box on JAX and you get GPU acceleration and you don't have to do anything else.
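For listeners who want to try this, here is a minimal sketch of that workflow, assuming a recent PyMC plus a JAX install that can see your GPU; the toy data is just a stand-in for a large sensor stream.

```python
import numpy as np
import pymc as pm

# Toy stand-in for a large sensor stream
rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 2.0 * x + rng.normal(scale=0.5, size=10_000)

with pm.Model():
    beta = pm.Normal("beta", 0, 1)
    sigma = pm.HalfNormal("sigma", 1)
    pm.Normal("obs", mu=beta * x, sigma=sigma, observed=y)

    # Same model, different backend: NUTS runs through NumPyro/JAX,
    # which uses a GPU automatically if JAX can see one.
    idata = pm.sample(nuts_sampler="numpyro")
```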
159
:
00:13:33
So I'd say if you have the compute available, I would do that, especially compared to having to come up with a customized inference scheme, which is
160
:
00:13:45
much more intricate.
161
:
00:13:48
And I'm curious what's your experience with the different VI algorithms.
162
:
00:13:55
Maybe if you can give a lay of the land to the listeners, where do you see these methods
and the different algorithms being useful or not and what your practical recommendations
163
:
00:14:08
are in a way.
164
:
00:14:10
Yeah.
165
:
00:14:11
So in regards to that, I just mainly use kind of these, the standard implementations that
NumPyro offers.
166
:
00:14:19
So like with their auto guides and their mean field.
167
:
00:14:23
oh
168
:
00:14:24
and so forth.
169
:
00:14:26
So when we were using those, primarily for hierarchical models, we found that, or I found that, out of the box they work quite well.
170
:
00:14:36
So I haven't had to go off into some other tangents to figure out which ones work or not.
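As a rough sketch of that out-of-the-box workflow, something like the following NumPyro snippet with an automatic mean-field guide; the model and data here are only placeholders.

```python
import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import SVI, Trace_ELBO
from numpyro.infer.autoguide import AutoNormal

def model(x, y=None):
    beta = numpyro.sample("beta", dist.Normal(0.0, 1.0))
    sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
    numpyro.sample("obs", dist.Normal(beta * x, sigma), obs=y)

x = jnp.linspace(-1.0, 1.0, 500)
y = 2.0 * x + 0.3 * random.normal(random.PRNGKey(0), (500,))

# Mean-field automatic guide, straight out of the box
guide = AutoNormal(model)
svi = SVI(model, guide, numpyro.optim.Adam(0.01), Trace_ELBO())
result = svi.run(random.PRNGKey(1), 2_000, x, y=y)

# Draw approximate posterior samples from the fitted guide
posterior = guide.sample_posterior(random.PRNGKey(2), result.params, sample_shape=(1_000,))
```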
171
:
00:14:44
Yeah, yeah, yeah, for sure.
172
:
00:14:46
I know we are making a lot of effort also in the PyMC ecosystem to have more VI, so...
173
:
00:14:54
There is a lot of effort made with the Google Summer of Code on improving out of the box VI, so ADVI.
174
:
00:15:05
Also, Jesse Grabowski did a lot of work on the Laplace approximation in PyMC Extras, which you can now use in conjunction with the MAP fit, where you can use the MAP estimate to initialize the
175
:
00:15:21
Laplace.
176
:
00:15:22
approximation. There is also the pathfinder algorithm that's already available in PyMC Extras.
177
:
00:15:31
We will have a dedicated episode about that with Michael Kao, who developed the pathfinder module in PyMC Extras.
178
:
00:15:43
So folks, stay tuned for this.
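If you want to experiment with those approximations, a rough sketch might look like the following; note that the entry points shown here (fit_laplace, fit_pathfinder) are assumptions about the PyMC Extras API, so check the pymc-extras documentation for the current names and signatures before relying on them.

```python
import pymc as pm
import pymc_extras as pmx  # PyMC Extras

with pm.Model() as model:
    mu = pm.Normal("mu", 0, 1)
    sigma = pm.HalfNormal("sigma", 1)
    pm.Normal("obs", mu=mu, sigma=sigma, observed=[0.1, -0.3, 0.5, 0.2])

    # Laplace approximation around the MAP estimate
    # (function name assumed; see the pymc-extras docs)
    idata_laplace = pmx.fit_laplace()

    # Pathfinder variational approximation
    # (function name assumed; see the pymc-extras docs)
    idata_pathfinder = pmx.fit_pathfinder()
```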
179
:
00:15:46
And I think I'm forgetting even one, one VI method that we are adding.
180
:
00:15:52
But maybe that will come back to me.
181
:
00:15:55
But there is active development on the PyMC side too.
182
:
00:15:58
And I feel this is really great because I think like we've neglected in the last few years
to improve that as a community or at least to make people more aware of these different
183
:
00:16:14
algorithms and they can come in very handy.
184
:
00:16:19
Yeah.
185
:
00:16:20
Which, who, who was it that kind of pioneered that?
186
:
00:16:24
Because, at least from an outsider's perspective, it kind of seems a bit like Pyro.
187
:
00:16:29
Did Stan also do some?
188
:
00:16:32
Yeah, yeah, yeah.
189
:
00:16:33
Stan has a lot of that.
190
:
00:16:34
Pathfinder was developed by Bob Carpenter and his team, actually.
191
:
00:16:39
At the beginning, it was developed as an initialization method for NUTS, but they realized that
192
:
00:16:49
Actually the results in themselves were really good.
193
:
00:16:55
So they just released it also as a separate algorithm.
194
:
00:16:58
But something you can definitely do also is initialize NUTS with the pathfinder results, which can be very useful.
195
:
00:17:07
like there's that.
196
:
00:17:08
So Stan has a lot, NumPyro has a lot.
197
:
00:17:12
We've had the ADVI module in PyMC for a long time now.
198
:
00:17:16
Now it's getting a bit more love and we have Laplace, we have Pathfinder as I just said, and I am sure I'm forgetting one method, but you know, that will come back to
199
:
00:17:27
me.
200
:
00:17:28
And actually if you want, if you folks want an introduction to these and the different methods, there is a talk I co-wrote with Chris Fonnesbeck and actually Michael Kao, who I
201
:
00:17:42
was talking about a few minutes ago.
202
:
00:17:46
for PyData Virginia.
203
:
00:17:49
And so I will put the YouTube video in the show notes and also the GitHub repo.
204
:
00:17:56
And Chris was the one, I was supposed also to fly to Virginia, but didn't find any
affordable flights.
205
:
00:18:03
So I had to ditch Chris and he was the sole presenter.
206
:
00:18:12
I think he doesn't hate me.
207
:
00:18:14
He loves presenting and he's so good at that.
208
:
00:18:19
So the video is awesome.
209
:
00:18:20
He obviously did a great job presenting the material.
210
:
00:18:24
So I'll put that in the show notes folks, because that's a very good lay of the land
basically of what you can do with VI and what the different algorithms are.
211
:
00:18:33
then keep an eye out.
212
:
00:18:35
I think Juan also recently gave a talk at PyData Berlin.
213
:
00:18:41
Yeah, exactly.
214
:
00:18:41
That's what I was going to say.
215
:
00:18:43
So it's not, the videos are not released yet as the time of recording and I don't think
they will anytime soon.
216
:
00:18:48
think they usually take about two months to release them.
217
:
00:18:52
So folks, keep an eye out on the PyData Berlin YouTube channel and then watch Juan Orduz's talk there, where he basically builds up on our presentation at
218
:
00:19:05
PyData Virginia, builds on top of that, and shows a practical implementation
of VI.
219
:
00:19:11
especially SVI with NumPyro with really good practical advice.
220
:
00:19:16
So yeah, really, really recommend that too.
221
:
00:19:18
This is a very good one.
222
:
00:19:23
And actually I'm curious if there is anything you do in particular Gabriel, when you, when
you use VI to try and make sure that the results you're getting back are reasonably close
223
:
00:19:36
to the posterior.
224
:
00:19:37
Because we have these guarantees with NUTS, with MCMC, but we don't with VI algorithms.
225
:
00:19:45
And so usually something I do is...
226
:
00:19:48
trying the model on fake data and making sure it can recover the parameters of interest in a reasonably close range, or for the parameters it can't, trying to see if there is a
227
:
00:20:00
pattern in the bias.
228
:
00:20:01
And at least we know there is a bias in the model and where, and that's already very
helpful because well, if you can at least get a model running with VI, even though you
229
:
00:20:11
know there is a small bias, I would argue this is already better than having no model at all because you only want to do MCMC.
230
:
00:20:18
Yeah, but so basically I think this is something useful, but I am sure you're doing much
better things than, than I am because you have more experience than me on that front.
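As a minimal sketch of that fake-data check, using PyMC's built-in ADVI; the simulated regression here is just an illustration of the recovery test described above.

```python
import numpy as np
import pymc as pm

# Simulate data with known parameters, fit with ADVI, and check recovery
rng = np.random.default_rng(42)
true_beta, true_sigma = 1.5, 0.7
x = rng.normal(size=2_000)
y = true_beta * x + rng.normal(scale=true_sigma, size=2_000)

with pm.Model():
    beta = pm.Normal("beta", 0, 5)
    sigma = pm.HalfNormal("sigma", 2)
    pm.Normal("obs", mu=beta * x, sigma=sigma, observed=y)

    approx = pm.fit(method="advi", n=20_000)  # mean-field ADVI
    idata = approx.sample(2_000)

# Compare posterior means against the simulated truth to spot systematic bias
print(idata.posterior["beta"].mean().item(), "vs true", true_beta)
print(idata.posterior["sigma"].mean().item(), "vs true", true_sigma)
```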
231
:
00:20:28
No, not on that front.
232
:
00:20:29
Not really.
233
:
00:20:30
Cause I also kind of take the same approach there, like before kind of scaling out to like
maybe a more complex model on our problem in industry.
234
:
00:20:39
Like I try to simulate what that
235
:
00:20:42
data or engineering process looks like on, yeah, on simulated data and then see just
pretty much exactly what you said, kind of like how well the algorithm is kind of
236
:
00:20:51
recovering the parameter or the posterior and is able to actually model the problem at
hand.
237
:
00:20:57
Yeah, it's funny.
238
:
00:20:58
Yeah, that's pretty much kind of what I do as well.
239
:
00:21:01
found that I do.
240
:
00:21:03
Okay, cool.
241
:
00:21:04
So I'm not doing something obviously stupid.
242
:
00:21:07
That's good.
243
:
00:21:08
That is rare.
244
:
00:21:10
No, no,
245
:
00:21:14
So anything you want to about, about VI, things you've noticed in the wild that work
particularly well or particularly bad before we move on to some other topic.
246
:
00:21:28
Okay.
247
:
00:21:30
So yeah, something I really want to talk to you about is that something you've worked on
for a long time.
248
:
00:21:37
Like, this was really a masterpiece.
249
:
00:21:42
So thank you first for doing that.
250
:
00:21:45
And this is your re-implementation of BART, Bayesian Additive Regression Trees, in Rust.
251
:
00:21:52
So people probably know you can do BART models with PyMC,
252
:
00:21:57
in a sub-package that's called pymc-bart.
253
:
00:22:00
And this is an awesome package.
254
:
00:22:01
I'm using it whenever I can, but it has the drawback of regression trees, which is, if I remember correctly, you have as many parameters as you have rows in your dataset.
255
:
00:22:15
So that means it grows pretty fast in the computational demands.
256
:
00:22:20
So...
257
:
00:22:21
Yeah, in my experience, when you start passing 200K observations, it starts to be, to be
really slow to infer.
258
:
00:22:29
So what you did is re-implement the sampling algorithms, which is metropolis Gibbs, I
think, or something like that.
259
:
00:22:38
What algorithm is that?
260
:
00:22:40
uh Like a particle Gibbs.
261
:
00:22:42
Yes, that's particle Gibbs.
262
:
00:22:44
Yeah.
263
:
00:22:44
So particle Gibbs, and you re-implemented that in Rust.
264
:
00:22:47
So yeah, can you, can you talk about that?
265
:
00:22:49
Like basically...
266
:
00:22:51
Why would you start doing that?
267
:
00:22:56
And basically give us the elevator pitch for the project before we dive a bit deeper.
268
:
00:23:02
Yeah.
269
:
00:23:02
So the PyMC-BART Rust project, I believe, really comes from Osvaldo. About a year ago he reached out and was like, hey, we need to make
270
:
00:23:13
this thing faster.
271
:
00:23:15
Are you interested?
272
:
00:23:16
I'm like, Hey, I'm all for it.
273
:
00:23:17
Let's do it.
274
:
00:23:20
Before that, I really hadn't used BART or wasn't too familiar with the method.
275
:
00:23:27
I mean, I was familiar with more like gradient boosting techniques, which is somewhat
similar, but I did have the experience with the Rust.
276
:
00:23:35
And so that was kind of a good fit, kind of complementing each other.
277
:
00:23:39
It's like, okay.
278
:
00:23:40
I see kind of what Adrian Seyboldt is doing with nutpie and Rust.
279
:
00:23:45
Maybe we can kind of share some of the code that he's been doing.
280
:
00:23:49
And then use that within PyMC-BART to help, at least with the log probability evaluations and so forth.
281
:
00:23:58
And so, yeah, this really stemmed from Osvaldo wanting to make it more performant.
282
:
00:24:05
And then me stepping on board and saying, Hey, okay, let's, yeah, let's implement this in
Rust and then share some of the code base from Pi.
283
:
00:24:14
Hmm.
284
:
00:24:15
Okay.
285
:
00:24:15
Okay.
286
:
00:24:16
And so how was the experience? Was Rust all new to you?
287
:
00:24:22
How do you even start on such a huge project?
288
:
00:24:26
Yeah.
289
:
00:24:26
So I do have prior experience with Rust within some data processing pipelines in the IOT
lab.
290
:
00:24:33
So the Rust part wasn't entirely new to me, but what was new was kind of interoperating with Python, having Python bindings.
291
:
00:24:42
So that way the Python user.
292
:
00:24:44
When they call the BART code, it then executes the Rust implementation.
293
:
00:24:51
But in regards to the implementation process, essentially what, what the approach I kind
of took was, let's implement this kind of essentially one for one from the Python
294
:
00:25:01
implementation into the Rust implementation.
295
:
00:25:04
And then from there, we can start to kind of optimize the different, whether the different
functions or methods.
296
:
00:25:10
And then that way we can get a more of a nice performance improvement instead of kind of
just immediately rewriting something.
297
:
00:25:17
Then we wouldn't know maybe like, okay, now this isn't kind of working right.
298
:
00:25:21
Where, where did that go wrong?
299
:
00:25:22
And so forth.
300
:
00:25:23
And yeah, I don't know if you want to talk about some of like the Rust specifics or the
algorithm specifics or.
301
:
00:25:29
Yeah, maybe.
302
:
00:25:30
So it's been a while since we talked about BART and regression trees on the show.
303
:
00:25:37
So maybe if you can introduce.
304
:
00:25:40
The methods, the tree methods in general, you mentioned gradient boosting, we obviously
mentioned BART.
305
:
00:25:46
So maybe give us just the elevator pitch for BART and tree methods in general.
306
:
00:25:51
And then I think it will be useful to dive into a bit more of the technical details of the
algorithm to understand really how the methods work and why people could be interested in
307
:
00:26:08
using BART.
308
:
00:26:09
And in which cases.
309
:
00:26:11
Yeah.
310
:
00:26:12
So yeah, at a high level, then like you have these tree based methods and I like the
simplest level you have your decision tree.
311
:
00:26:21
And so that's kind of just your, your logic.
312
:
00:26:23
Like if this variable is greater than some value, you kind of go down the tree and then
you finally get to a leaf node.
313
:
00:26:31
And that's kind of like your prediction for a target or your response variable.
314
:
00:26:35
So
315
:
00:26:36
building up off of the decision tree, you could have like a random forest, which combines a bunch of those decision trees together into a forest.
316
:
00:26:45
Um, but then kind of even stepping up on top of that, have gradient, like boosting
methods.
317
:
00:26:53
And so these methods are really where you use kind of like the random forest, but
318
:
00:27:05
you learn the residual, the difference between the trees' predictions and the target.
319
:
00:27:12
And so when you start to do that, you're kind of like a meta learner.
320
:
00:27:16
You're kind of learning where each tree is doing badly, to come up with a better predicting tree.
321
:
00:27:25
And so this is kind of really more where BART is aligned with, with the gradient boosted
methods rather than a random forest.
322
:
00:27:33
um
323
:
00:27:35
Because BART is kind of doing the same thing as boosting these gradient boosted methods.
324
:
00:27:40
The way it's kind of assembling these trees is different.
325
:
00:27:43
um And the way that the BART assembles these trees is by taking random perturbations and
then assessing the log likelihood of that tree.
326
:
00:27:58
Okay.
327
:
00:27:58
Yeah.
328
:
00:27:59
Yeah.
329
:
00:27:59
So that's closer to gradient boosting way of doing things.
330
:
00:28:03
Yeah.
331
:
00:28:06
And okay.
332
:
00:28:07
So that's the, that's the elevator pitch.
333
:
00:28:10
Now, when are these models particularly useful in your experience, and what are their strengths and drawbacks?
334
:
00:28:20
So
335
:
00:28:25
One of the strengths, I think, if you want to compare BART and a traditional XGBoost or LightGBM model, one of the big benefits of BART is
336
:
00:28:38
that you get the uncertainty quantification.
337
:
00:28:42
You have a posterior over decision trees.
338
:
00:28:46
And so then with that, you can actually kind of, you can actually stick that model.
339
:
00:28:52
into maybe other things that you want to use uncertainty for.
340
:
00:28:56
For example, like Bayesian optimization traditionally uses Gaussian processes, but you can
actually, you can actually stick this BART model into the Bayesian optimization routine as
341
:
00:29:07
well, because you also have the uncertainty there.
342
:
00:29:09
But one of the big drawbacks with BART is that it's famously slow compared to XGBoost or LightGBM.
343
:
00:29:18
And so that's kind of one of the big drawbacks I see there with that method.
344
:
00:29:22
But another nice thing about BART is that...
345
:
00:29:28
So with XGBoost and LightGBM, it's very easy to overfit on your data.
346
:
00:29:34
And so you need to look at a lot of loss curves and figure out, okay, hey, when do I stop training?
347
:
00:29:43
When, how much, how many trees do I use?
348
:
00:29:46
How many learners and so forth, to stop the training and stop the trees from overfitting.
349
:
00:29:54
But with BART, it's really nice because
350
:
00:29:57
we have regularizing techniques.
351
:
00:30:00
So that way we avoid overfitting kind of inherently within the method.
352
:
00:30:06
And so that's one really nice kind of pro I see with BART over the others.
353
:
00:30:10
But yeah, the big, I'd say con is that it's significantly slower than the other ones.
354
:
00:30:17
And that's just for multiple reasons.
355
:
00:30:20
So yeah, thanks a lot.
356
:
00:30:21
This is much clearer.
357
:
00:30:23
So, and I hope it is too for listeners.
358
:
00:30:26
So now I think it's a good time to dive into why that would be like where the bottlenecks
are and why, like what's the algorithm per se and how does it work basically under the
359
:
00:30:42
hood so that people really understand the models when they use that.
360
:
00:30:48
Yep.
361
:
00:30:49
So in regards to PyMC-BART, we implement, as I think we stated before, particle Gibbs, whereas other implementations might implement a Metropolis-Hastings approach.
362
:
00:31:02
And so with the particle Gibbs step, how the algorithm works is that, m so we generate a
set of trees.
363
:
00:31:13
maybe 50,
364
:
00:31:14
or, in PyMC-BART, you define the number of trees and the number of particles.
365
:
00:31:19
And so, for example, you might say, okay, we want 50 trees and then 10 particles.
366
:
00:31:25
And so now we're going to perform a series of particle Gibbs steps.
367
:
00:31:32
And so at the first step, we're going to loop through all 50 trees.
368
:
00:31:40
So for the first tree.
369
:
00:31:43
We're going to initialize then take maybe 10 particles, whatever you define.
370
:
00:31:47
Then for all those 10 particles, which are just decision trees,
371
:
00:31:53
We perturb each one.
372
:
00:31:56
Maybe we sample a variable and a certain split value, and then another one, another split value.
373
:
00:32:04
And then we assess the log likelihood.
374
:
00:32:06
And then at the end, we say, okay, hey, this particle.
375
:
00:32:11
maybe particle five out of the 10, that's going to replace the current tree, which is one.
376
:
00:32:17
And then we proceed to the next tree.
377
:
00:32:19
Tree number two, we go through that same process, initialize 10 particles, perturb each one, weight them according to the log likelihood, and then replace that tree.
378
:
00:32:29
And we continue until all 50 trees are essentially replaced.
379
:
00:32:34
And then, and so yeah, that's really kind of at a high level, the main.
380
:
00:32:38
the algorithmic steps, it's really quite a simple process, which is quite surprising if
you read kind of lot of these papers.
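To make those steps concrete, here is a heavily simplified, hypothetical Python sketch of the sweep described above, using one-split stumps instead of full trees and skipping details like the depth prior and the reference particle; it only illustrates the perturb, weight and resample loop, and is not the actual PyMC-BART code.

```python
import numpy as np

class Stump:
    """A one-split decision tree: the simplest possible 'particle'."""
    def __init__(self, feature=0, threshold=0.0, left=0.0, right=0.0):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right

    def predict(self, X):
        mask = X[:, self.feature] <= self.threshold
        return np.where(mask, self.left, self.right)

def grow_random(X, residual, rng):
    """Propose a stump by sampling a feature and a split value (the 'perturbation')."""
    feature = rng.integers(X.shape[1])
    threshold = rng.choice(X[:, feature])
    mask = X[:, feature] <= threshold
    left = residual[mask].mean() if mask.any() else 0.0
    right = residual[~mask].mean() if (~mask).any() else 0.0
    return Stump(feature, threshold, left, right)

def log_likelihood(stump, X, residual, sigma=1.0):
    err = residual - stump.predict(X)
    return -0.5 * np.sum((err / sigma) ** 2)

def particle_gibbs_sweep(trees, X, y, n_particles=10, rng=None):
    rng = rng or np.random.default_rng()
    for i in range(len(trees)):
        # Residual target for tree i: what the other trees don't yet explain
        partial = sum(t.predict(X) for j, t in enumerate(trees) if j != i)
        residual = y - partial
        # Perturb a set of candidate particles and weight them by log likelihood
        particles = [grow_random(X, residual, rng) for _ in range(n_particles)]
        log_w = np.array([log_likelihood(p, X, residual) for p in particles])
        w = np.exp(log_w - log_w.max()); w /= w.sum()
        # The selected particle replaces tree i, then we move on to the next tree
        trees[i] = particles[rng.choice(n_particles, p=w)]
    return trees

# Usage: 50 trees, 10 particles, a few sweeps over toy data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)); y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
trees = [Stump() for _ in range(50)]
for _ in range(20):
    trees = particle_gibbs_sweep(trees, X, y, rng=rng)
```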
381
:
00:32:48
Yeah.
382
:
00:32:49
And were you already that versed in BART and tree methods before working on that?
383
:
00:32:57
Or did you get that knowledge by working on this project?
384
:
00:33:01
Not really.
385
:
00:33:02
So it was mainly knowledge from working on the project and reading the code base that
Osvaldo and others did, which was quite readable, which is really nice, nice procedural
386
:
00:33:11
kind of line by line, oh, this is what it's doing.
387
:
00:33:15
And so that really kind of helped with the intuitive understanding of, hey, this is what the particle Gibbs is doing.
388
:
00:33:21
is kind of, Yeah.
389
:
00:33:25
And I second that, yeah, the code base is really...
390
:
00:33:30
Really well done and written and it's quite easy to start contributing to the package.
391
:
00:33:38
this is really awesome.
392
:
00:33:40
I've dabbled a bit with BART for work in baseball, and I haven't tried your Rust implementation yet, but that would be very useful in baseball because there are a lot of use cases
393
:
00:33:53
for methods like BART, but there is so much data that
394
:
00:34:00
you often need an acceleration somewhere.
395
:
00:34:01
Yeah, whether it's using classic PyMC-BART on a GPU or actually using your Rust implementation and adding a GPU on top of that, that should probably be a really good
396
:
00:34:16
boost to sampling speed.
397
:
00:34:19
Yeah.
398
:
00:34:20
Yeah.
399
:
00:34:20
And I must say one of the things that's really nice with PyMC-BART is that there are several
400
:
00:34:28
Really nice enhancements that we have.
401
:
00:34:30
so if you go look around online, a lot of the other packages are specifically for Gaussian
likelihoods.
402
:
00:34:38
So that's the first one.
403
:
00:34:39
So you can't really model like a Poisson process or any other.
404
:
00:34:43
uh The second one is that we also offer various split rules.
405
:
00:34:50
So if you have in your design matrix, numerical features and categorical features, you can
pass
406
:
00:34:57
split rules specific for that data type.
407
:
00:34:59
um And this is something that's common in other packages that it kind of just assumes
everything as a numerical value.
408
:
00:35:05
And so those are kind of the two really nice things I think differentiate our package.
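As a small illustration of both points, here is a sketch with a non-Gaussian likelihood and per-column split rules; the split-rule class names are my reading of the pymc-bart API, so double-check them against the PyMC-BART documentation.

```python
import numpy as np
import pymc as pm
import pymc_bart as pmb

# Toy design matrix: two continuous columns and one binary/categorical column
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=300), rng.normal(size=300), rng.integers(0, 2, size=300)])
y = rng.poisson(np.exp(0.5 * X[:, 0] + 0.8 * X[:, 2]))

with pm.Model():
    # One split rule per column (class names assumed; check the pymc-bart docs)
    mu = pmb.BART(
        "mu", X, np.log1p(y), m=50,
        split_rules=[
            pmb.ContinuousSplitRule(),
            pmb.ContinuousSplitRule(),
            pmb.OneHotSplitRule(),
        ],
    )
    # Non-Gaussian likelihood: Poisson with a log link
    pm.Poisson("y_obs", mu=pm.math.exp(mu), observed=y)
    idata = pm.sample()
```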
409
:
00:35:11
But then lastly, we have the BART random variable and how this is embedded in PyMC.
410
:
00:35:21
And so you can model, you can model um
411
:
00:35:24
linear predictor, you can model the sigma, that parameter.
412
:
00:35:28
And so that's really nice because you can build essentially arbitrary probabilistic
programs with BART.
413
:
00:35:34
Whereas other package, it's kind of, the, you use that method and that is the method that
you use.
414
:
00:35:41
Yeah, exactly.
415
:
00:35:42
This is actually a very good point.
416
:
00:35:44
Yeah.
417
:
00:35:44
That you can add that as a module in a PyMC model.
418
:
00:35:48
So you could model your linear predictor with a classic linear regression.
419
:
00:35:53
And then your sigma, your standard deviation, you could model that with a BART random variable.
420
:
00:35:58
So this is very useful.
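A minimal sketch of that pattern, a linear model for the mean plus a BART random variable for the standard deviation, assuming pymc-bart is installed; the way the BART variable's Y argument is seeded below is just one reasonable choice, since it mainly informs the prior over leaf values.

```python
import numpy as np
import pymc as pm
import pymc_bart as pmb

# Heteroskedastic toy data: the noise level depends on the features
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = X[:, 0] + rng.normal(scale=np.exp(0.5 * X[:, 1]), size=500)

with pm.Model():
    # Classic linear predictor for the mean ...
    beta = pm.Normal("beta", 0, 1, shape=4)
    mu = X @ beta

    # ... and a BART random variable for the log standard deviation
    # (the Y argument here only helps set the leaf-value prior)
    log_sigma = pmb.BART("log_sigma", X, np.log(np.abs(y) + 1e-3), m=20)

    pm.Normal("y_obs", mu=mu, sigma=pm.math.exp(log_sigma), observed=y)

    # PyMC assigns the particle Gibbs step to the BART variable and NUTS to beta
    idata = pm.sample()
```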
421
:
00:36:02
I must say that recently, in the current existing Python implementation, support has been added for more than one BART random variable, which is really great and has been something
422
:
00:36:14
that's been requested.
423
:
00:36:16
Yeah.
424
:
00:36:16
So you could do two different BARTs on two different parameters.
425
:
00:36:21
This is really awesome.
426
:
00:36:22
In a way that's starting to look a lot like the GP sub module.
427
:
00:36:26
GPs, you can add them to PyMC models really as you want.
428
:
00:36:31
And you can have different GPs for any number of parameters you want to your model.
429
:
00:36:36
And you really cannot do that with like GP packages, GP-focused packages.
430
:
00:36:42
Most GP-focused packages only let you use the GPs.
431
:
00:36:48
That's all.
432
:
00:36:49
You cannot do...
433
:
00:36:50
anything else, and often there are also likelihood limits.
434
:
00:36:54
In PyMC you can use a GP with any likelihood distribution.
435
:
00:36:58
In most of the packages it's often a normal likelihood, because that's often how it goes.
436
:
00:37:04
How is it?
437
:
00:37:04
So I know on the BART, pure Python BART, we can use any likelihood we want.
438
:
00:37:09
How is it on the Rust side now?
439
:
00:37:11
Because I remember at the very beginning you had not included yet categorical multinomial.
440
:
00:37:17
ability to use that kind of likelihood.
441
:
00:37:20
And of course, I always use that likelihood.
442
:
00:37:23
So was like, damn, cannot use that yet.
443
:
00:37:26
But yeah, how is it right now when it comes to the likelihoods, and especially the multidimensional ones, which I know are always much more of a pain to develop?
444
:
00:37:39
So in regards to the current state of the Rust implementation, there are still some things
that
445
:
00:37:47
are implemented one-to-one and I'm still working on that.
446
:
00:37:50
But in regards to the likelihood that that's been resolved.
447
:
00:37:54
So you can model multiple different likelihoods.
448
:
00:37:58
But I think the one you were specifically asking was in regards to the different split
rules, like the categorical and like a continuous split rules.
449
:
00:38:08
And that is also now implemented in the Rust implementation.
450
:
00:38:11
But the one thing that's not yet is the multiple BART random variables.
451
:
00:38:16
I still am working out some bugs there.
452
:
00:38:19
uh And so that's still something that's kind of being implemented on our end.
453
:
00:38:24
Okay.
454
:
00:38:24
Yeah.
455
:
00:38:25
Yeah.
456
:
00:38:25
So concretely we can do anything with the PyMC-BART Rust implementation that we can do with PyMC-BART, except for having more than one BART random variable in the model.
457
:
00:38:39
Otherwise everything is on par right now.
458
:
00:38:42
Yeah.
459
:
00:38:42
Amazing.
460
:
00:38:43
Yeah.
461
:
00:38:44
It is cool.
462
:
00:38:44
Yeah.
463
:
00:38:45
Yeah, thanks Gabriel.
464
:
00:38:47
That means now we'll be able to use that much, much more on baseball data.
465
:
00:38:53
This is going to be super fun.
466
:
00:38:56
And how do you squeeze that in actually?
467
:
00:38:58
Like is it part of your job or is it something you do on the side?
468
:
00:39:03
And like maybe you have some advice for people who want to start doing some open source
like we do and they want to have some practical advice of how to...
469
:
00:39:13
squeeze that into their work and their free time, because in the end, that is really what research is about, trying to push the envelope on frontier topics, which are not only
470
:
00:39:28
going to pay off for your project, but for your company as a whole and a lot of other people too.
471
:
00:39:36
Yeah.
472
:
00:39:37
And so, yeah, luckily with this kind of...
473
:
00:39:42
A lot of the stuff I do at work uses these tools.
474
:
00:39:45
And so if, if I, if like our team and me see kind of like, Hey, that would be really nice
if we could speed up BART because it would help our problem at work.
475
:
00:39:56
And then, then like doing that, doing the open source at work kind of aligns quite nicely.
476
:
00:40:01
But if the problems are kind of aren't really related, yeah, that's kind of in my own time
and so forth.
477
:
00:40:07
But in regards to contributing more generally, um,
478
:
00:40:12
Honestly, the PyMC and Bambi community is just, I think, one of the best in the scientific open source world.
479
:
00:40:22
Everyone is very inviting and willing to help.
480
:
00:40:24
But my advice kind of, I'd say to people starting out is don't bite off more than you can
chew.
481
:
00:40:30
Pick kind of maybe the low hanging fruit and then work your way up from there.
482
:
00:40:35
I've found that to be a fairly safer approach.
483
:
00:40:40
and kind of goes better with the maintainers that way.
484
:
00:40:43
Yeah, that definitely sounds right.
485
:
00:40:47
And that's what I recommend to also to people who reach out to me.
486
:
00:40:51
Maybe one last question on BART.
487
:
00:40:54
Since you use that a lot in your work, what's your experience on these models?
488
:
00:40:59
What do you find they are very useful for?
489
:
00:41:02
Where do you see their limitations to be?
490
:
00:41:07
Yeah.
491
:
00:41:07
So I've used them in two scenarios.
492
:
00:41:10
One of them is embedding BART in a Bayesian optimization routine, which you talked about with Max, but then the other one is specifically for a time series process that
493
:
00:41:26
I'm going to use my hands here, but it exhibits kind of a partitioned, kind of blocky structure.
494
:
00:41:34
That's not really good.
495
:
00:41:35
The time series isn't continuous, but it kind of has like this block structure.
496
:
00:41:40
Like from point A to point B it has a constant value.
497
:
00:41:45
And then in the next time interval, it shoots up to another value and then it's constant for a little bit.
498
:
00:41:52
And so this is kind of quite nice because these tree methods are essentially kind of like a piecewise constant function.
499
:
00:42:01
And so it's able to model that just kind of inherently quite nicely.
500
:
00:42:05
Yeah.
501
:
00:42:05
And so that's just kind of a...
502
:
00:42:08
Yeah, like a very raw, weird time series. I mean, no time series is really continuous, but it's like you don't even have enough points for it to look like it's continuous.
503
:
00:42:24
so, but the discreteness of the tree structure here is actually an asset.
504
:
00:42:31
Yeah, exactly.
505
:
00:42:32
Yeah.
506
:
00:42:34
kind of see this come up in sensor based time series, kind of quite often, I think.
507
:
00:42:39
If you kind of look at maybe like our profiles over time, you kind of see that kind of
like, block looking structure as well.
508
:
00:42:48
And then you can kind of be like, Ooh, maybe like these tree based methods might work
here.
509
:
00:42:53
Okay.
510
:
00:42:56
Okay.
511
:
00:42:56
This is very interesting.
512
:
00:42:57
I love it.
513
:
00:42:58
Thanks.
514
:
00:42:58
um
515
:
00:43:01
Actually, two other questions on that.
516
:
00:43:03
ah What about the time intervals?
517
:
00:43:08
Because a lot of the time having fixed time intervals is much easier to deal with.
518
:
00:43:14
What's your experience here with BART models?
519
:
00:43:18
Like, are they sensitive to the fact that sometimes the time intervals are not regular, which I guess might be the case with sensor data?
520
:
00:43:29
Related to that...
521
:
00:43:31
What about missing data and what about out of sample predictions?
522
:
00:43:34
I know it's a big, question, but it's, it's all related.
523
:
00:43:38
so the, in regards to the, the time interval.
524
:
00:43:41
you're, you're saying when the time intervals are unequal, the time between, okay.
525
:
00:43:46
Yeah.
526
:
00:43:47
So in regards to like the BART or just, just tree methods in general, I think are very,
they're very good for interpolating missing values because you can kind of impute that.
527
:
00:44:01
or to interpolate that inherently within the tree.
528
:
00:44:04
And so if you have like a sensor that's maybe didn't log a measurement over a certain time
period, but that all of a sudden kind of comes back online and then it continues logging
529
:
00:44:14
with the tree methods, you do kind of, you do get a nice interpolation there.
530
:
00:44:19
And so you don't really need to do any kind of feature, like feature processing
beforehand.
531
:
00:44:23
And so it's nice cause that's handled inherently within the model.
532
:
00:44:30
Okay.
533
:
00:44:30
Yeah.
534
:
00:44:30
Yeah.
535
:
00:44:31
So this is great.
536
:
00:44:31
That's what I thought.
537
:
00:44:33
But yeah, basically when there is no fixed time interval, it's like a missing data
problem.
538
:
00:44:39
So they are very good at internal interpolation.
539
:
00:44:47
How good are they at extrapolation?
540
:
00:44:50
So really doing out-of-sample predictions.
541
:
00:44:53
How does that work here?
542
:
00:44:55
Yeah.
543
:
00:44:56
So luckily I haven't really had to use it for out of sample predictions in my works.
544
:
00:45:02
Interesting.
545
:
00:45:03
Yeah.
546
:
00:45:03
Because I mean, obviously I'm asking you that because I know tree methods are not good at out-of-sample predictions.
547
:
00:45:09
So I'm glad I haven't had to use it.
548
:
00:45:13
So I think that's one of the reasons why I did choose to use that because, like, in
particular with some of the couple of the problems we were modeling, for example,
549
:
00:45:25
Like if you have like actuator limits of a, of a robot, that that's pretty clear upper and
lower bounds that you have from the engineering process.
550
:
00:45:37
And then, so, you know, you're not going to be extrapolating past that.
551
:
00:45:40
And so, you know, with BART, you then have a nice interpolation within these actuator limits.
552
:
00:45:48
Yeah, exactly.
553
:
00:45:50
So, and that's actually why I also haven't been able to use BART methods yet in production other than for exploring and teaching, because most of the time I work on actual out of
554
:
00:46:06
sample data.
555
:
00:46:07
So let's say if I work on players, the age of players is not really out of sample.
556
:
00:46:13
All players are human, so you'll never have a player with 120 years old.
557
:
00:46:19
But if you were looking at seasons, for instance, well, the years really are out of sample.
558
:
00:46:25
And so here it's a problem.
559
:
00:46:26
Or players themselves are out of sample.
560
:
00:46:29
What about the player you never saw in your training data set?
561
:
00:46:33
So that's often why I couldn't use tree methods or BART methods, because they don't extrapolate, in comparison to Gaussian processes, which are really good in general at
562
:
00:46:45
prediction and state space models.
563
:
00:46:50
Okay, awesome.
564
:
00:46:51
And so one last question on BART, I swear.
565
:
00:46:55
What do you mean by using them in optimization routines?
566
:
00:47:00
I find that super interesting.
567
:
00:47:02
Yep.
568
:
00:47:03
So in regards to the optimization routine, I'm specifically talking about Bayesian
optimization.
569
:
00:47:09
So essentially this Bayesian optimization is a sequential optimization process where you
typically have some sort of
570
:
00:47:17
surrogate model, and that can be typically it's a Gaussian process, but it can really be
any other method that provides a posterior.
571
:
00:47:27
And so I'm essentially kind of swapping out this GP and putting in the BART model there.
572
:
00:47:34
And so using that to optimize for some industrial process.
573
:
00:47:39
so with the iterative method, essentially what we're kind of doing is we're training the
model and the historical data, and then we're using
574
:
00:47:46
Um, you don't have to get to the details, but some sort of function or generator to
generate a new set of features, feature values or design points.
575
:
00:47:56
And then evaluating that back with BART and then running the loop again: retrain the model, generate some new values, evaluate with BART and so forth.
576
:
00:48:06
And so that's kind of what I mean generally with the Bayesian optimization.
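Here is a loose, hypothetical sketch of such a loop with BART swapped in as the surrogate; the objective function, the candidate generator and the lower-confidence-bound acquisition rule are illustrative choices, not necessarily what the setup described above uses.

```python
import numpy as np
import pymc as pm
import pymc_bart as pmb

def objective(x):
    """Stand-in for the real industrial process we want to minimize (e.g. scrap rate)."""
    return np.sin(3 * x[:, 0]) + 0.1 * np.random.normal(size=len(x))

rng = np.random.default_rng(0)
X_hist = rng.uniform(0, 2, size=(20, 1))   # historical design points
y_hist = objective(X_hist)                 # measured outcomes

for step in range(10):
    # 1. Train the BART surrogate on everything observed so far
    with pm.Model() as surrogate:
        X_data = pm.Data("X_data", X_hist)  # pm.MutableData on older PyMC versions
        mu = pmb.BART("mu", X_data, y_hist, m=20)
        sigma = pm.HalfNormal("sigma", 1)
        pm.Normal("obs", mu=mu, sigma=sigma, observed=y_hist, shape=mu.shape)
        idata = pm.sample(tune=300, draws=300, progressbar=False)

    # 2. Generate candidate design points and get posterior predictions there
    X_cand = rng.uniform(0, 2, size=(200, 1))
    with surrogate:
        pm.set_data({"X_data": X_cand})
        pp = pm.sample_posterior_predictive(idata, progressbar=False)
    draws = pp.posterior_predictive["obs"].stack(s=("chain", "draw")).values  # (200, n_draws)

    # 3. Acquisition: pick the candidate minimizing a lower confidence bound
    lcb = draws.mean(axis=1) - 2 * draws.std(axis=1)
    x_next = X_cand[np.argmin(lcb)][None, :]

    # 4. Evaluate the real process there and grow the training set
    X_hist = np.vstack([X_hist, x_next])
    y_hist = np.concatenate([y_hist, objective(x_next)])
```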
577
:
00:48:11
And it's just, it's just Bayesian because we're using probabilistic methods from what I
can tell.
578
:
00:48:17
Okay.
579
:
00:48:17
So BART is included in your loss function?
580
:
00:48:23
It's the, so it's the surrogate model.
581
:
00:48:28
So it's, so for example, in a lot of, and so if you think about, if you want to optimize
the machine for like the scrap rate, how much scrap an industrial machine is producing.
582
:
00:48:44
You probably don't know the physical equations that govern or produce the scrap.
583
:
00:48:52
And so what's the next best thing we can do?
584
:
00:48:54
We can turn to data-driven methods.
585
:
00:48:57
And so there we collect data about the process.
586
:
00:49:00
Maybe you have sensor measurements on how fast the robot arm is moving, how fast is
material being fed into the machine.
587
:
00:49:09
And then you also have measurements on, okay, this
588
:
00:49:14
Scrap was produced, no scrap was produced and so forth.
589
:
00:49:17
And so we use the BART or the GP to learn the association between the
590
:
00:49:23
parameters governing the process and whatever metric you're targeting.
591
:
00:49:29
And then, so now that you have that, that's kind of now your, your, your function, that's
your mapping from inputs to outputs.
592
:
00:49:37
And then, so with the Bayesian optimization framework or loop.
593
:
00:49:42
You're then kind of deciding, oh, hey, we, we want to optimize.
594
:
00:49:46
We want to produce as little scrap as possible.
595
:
00:49:49
So we're going to use this model that we just trained on to propose or to select the, the
values that produce the least amount of scrap.
596
:
00:50:03
If that makes sense.
597
:
00:50:05
Okay.
598
:
00:50:06
Okay.
599
:
00:50:06
Yeah.
600
:
00:50:06
Yeah, I think it does.
601
:
00:50:07
So, so this is not really that you are using
602
:
00:50:11
BART inside of a loss function when doing optimization.
603
:
00:50:16
This is more, this is still something different.
604
:
00:50:19
Not necessarily, no.
605
:
00:50:23
Do you have any public writing about that that people could look at if they are interested in these kinds of methods?
606
:
00:50:33
We are writing a paper, but it's not um published yet, so unfortunately, no.
607
:
00:50:39
yeah.
608
:
00:50:40
Well, let me know when it is, because then we'll publicize that in the LBS sphere, which as you know is extremely powerful in the world.
609
:
00:50:52
Sure to do that.
610
:
00:50:53
Great.
611
:
00:50:53
So I think it's a good summary of everything, Bart.
612
:
00:50:58
Do you have anything to add on that I forgot to ask you?
613
:
00:51:04
Or do you think we did a good job already to give people an idea of how they can use that?
614
:
00:51:13
No.
615
:
00:51:14
So, I mean, our goal is to essentially provide backwards compatibility with the Rust
implementation.
616
:
00:51:21
So it's just a drop in replacement.
617
:
00:51:22
But I think the things that we maybe didn't touch on too much, maybe for some of the Rust
people out there, maybe like what were some of the interesting Rust, like Rusty bits that
618
:
00:51:33
kind of resulted in some nice performance gains, I think could be kind of fun to talk
about.
619
:
00:51:42
Yeah.
620
:
00:51:45
one of, so a couple of the things, or especially one of the areas that was nice to
implement with Rust is, in the, the tree proposals.
621
:
00:51:57
And so what we do with PyMC-BART is we have a prior probability over the depth of the tree.
622
:
00:52:04
And so if you think of a binary tree as a, like, as you add nodes,
623
:
00:52:09
to it, it'll then the depth of the tree will increase.
624
:
00:52:12
And so we have a prior probability of how deep a tree can be.
625
:
00:52:16
And you can actually set this as a user using the two parameters, alpha and beta.
626
:
00:52:22
And so in the tree proposals, we propose a variable to split on and a value to split.
627
:
00:52:30
so based off of that and the prior probability of the depth of the tree, we can
essentially
628
:
00:52:38
We then can say how likely a tree is to be grown.
629
:
00:52:45
essentially, how likely is it for the depth to increase?
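In the standard BART parameterization (Chipman et al.), that depth prior is the probability that a node at depth d splits, alpha * (1 + d)^(-beta); a tiny sketch with the usual textbook defaults:

```python
def p_split(depth, alpha=0.95, beta=2.0):
    """Prior probability that a node at the given depth splits (grows)."""
    return alpha * (1.0 + depth) ** (-beta)

# Deeper nodes are increasingly unlikely to split, which is what regularizes tree size:
# depths 0..4 give roughly 0.95, 0.24, 0.11, 0.06, 0.04
for d in range(5):
    print(d, round(p_split(d), 2))
```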
630
:
00:52:49
And so traditionally, in the original Python implementation, we would always perform a tree proposal and a systematic resampling
631
:
00:53:06
to propose the particle to replace the tree.
632
:
00:53:09
But with the Rust implementation, we take a lazy approach.
633
:
00:53:14
And so we use a reference-counting smart pointer to essentially defer, to wait to materialize the growing of the tree until we know we will
634
:
00:53:31
accept that tree to grow.
635
:
00:53:34
And so.
636
:
00:53:35
So we kind of like beforehand, we'll calculate the proposal, we'll compute the proposal
and we'll say, Hey, this is what it would do if it were to be chosen or selected.
637
:
00:53:47
And then if it is selected, okay.
638
:
00:53:50
Materialize actually compute the results.
639
:
00:53:53
And so it's a bit of a lazy way of doing it.
640
:
00:53:58
And so there it's really nice because in the systematic resampling, we resample according
to the
641
:
00:54:05
the weights or the log likelihood of the trees.
642
:
00:54:08
And so if you have 20 particles, and then after systematic resampling, say we select 10, and 10 of those new particles all come from the same one because it had a high weight.
643
:
00:54:22
We essentially have to copy or perform a deep copy or a clone of that tree structure,
which can be very expensive.
644
:
00:54:30
But since we're now kind of using these smart pointers for
645
:
00:54:35
and only copy if we know we're going to accept the tree proposal, then we get a really
nice performance boost there.
646
:
00:54:43
And that's kind of just a lower level detail that's I think quite cool.
647
:
00:54:47
Yeah, yeah, yeah.
648
:
00:54:49
That is definitely super cool.
649
:
00:54:51
And so if people want to get started on PyMC-BART and especially the Rust implementation, what should they do?
650
:
00:54:58
What should they look at?
651
:
00:54:59
should they download?
652
:
00:55:03
Yeah.
653
:
00:55:04
And I think if people want to start helping, just read the code base.
654
:
00:55:08
Currently the code base is under my repository.
655
:
00:55:12
And I think we can link that in the show notes, and I have several issues there with things that need to be implemented or cleaned up.
656
:
00:55:21
But in yeah.
657
:
00:55:23
And so think that'd be a very good place to start is just going to the repo and looking at
some of the issues I have, I have them all tagged as good first issue and.
658
:
00:55:33
various other tags.
659
:
00:55:35
Yeah.
660
:
00:55:36
Yeah.
661
:
00:55:36
So I put that already in the show notes.
662
:
00:55:38
So feel free to look at that folks.
663
:
00:55:41
That's if you want to start getting involved.
664
:
00:55:43
I was asking you, what if you want to, what if people want to start using it?
665
:
00:55:47
What I would advise is, and that's going to be in the show notes too, go to the PyMC-BART website, look at the tutorial notebooks, and then just install bart-rs and just run
666
:
00:56:00
these notebooks, but using
667
:
00:56:02
the Rust implementation, and you'll see it's literally a drop-in replacement, except when you need to use two BART random variables as we were saying, but otherwise this is literally
668
:
00:56:13
the same.
669
:
00:56:14
And I think it's amazing this makes it so easy for people to start.
670
:
00:56:19
And BART models are really good because they are super flexible.
671
:
00:56:24
They are very easy to understand and they are usually a very good baseline if you are in a case where tree methods are a good fit.
672
:
00:56:33
So if you're in this case, practical advice would be to definitely try PyMC-BART, because the model is going to be super easy to write.
673
:
00:56:39
It's going to be just like this, Bart just figures out the functional form.
674
:
00:56:44
So it's just like one BART variable, you feed that into your likelihood and you're done.
675
:
00:56:49
And you see how it works.
676
:
00:56:51
If you're in the cases we say we talked about before where it doesn't work, well then
that's going to be for next time.
677
:
00:56:58
But if you are lucky enough to be in these kinds of cases, I think it's a very good shot to try.
678
:
00:57:05
Yep, absolutely.
679
:
00:57:07
And so to play us out Gabriel, I saw that you recently worked on some other optimization
problem, which is reproducing Uber's marketplace optimization.
680
:
00:57:18
And you have a really good blog post about that that I put in the show notes.
681
:
00:57:22
You put also the code into a GitHub repo that is also in the show notes, folks.
682
:
00:57:28
If you want to look at it, do you want to do a touch on that briefly, basically what it
is, what it does and how, why would people be interested in that?
683
:
00:57:38
Uber has a system in place that performs uh resource allocation.
684
:
00:57:44
so what their problem is, is they have a, what Uber can do is they're a ride hailing
service with a bunch of different programs like Uber Eats and your normal.
685
:
00:57:56
driving scenarios. But what Uber can do is actually influence the
marketplace by essentially allocating money to different programs to stimulate supply and
686
:
00:58:08
demand.
687
:
00:58:08
But this results in a business problem as in, hey, as a company, we have a finite amount
of money.
688
:
00:58:15
How much should we allocate to each city, and each program within the city, such that we can
maximize some sort of business metric?
689
:
00:58:25
Such as gross bookings, which then maybe influences the profit of the company.
690
:
00:58:30
And so I was interested in how this even works.
691
:
00:58:38
So how do you perform resource allocation with optimization methods?
692
:
00:58:42
But then what I found quite interesting was that they were embedding a neural
network into the optimization algorithm to model the forecasting problem.
693
:
00:58:54
And so you kind of have these two kind of interesting components.
694
:
00:58:58
You have the optimization algorithm, but then the fact that they're embedding a neural
network into the system to help learn the association between how much money they're
695
:
00:59:08
allocating and how much this influences a business outcome, such as gross bookings.
696
:
00:59:14
so.
697
:
00:59:15
Is that the same idea as what you talked about before, embedding a BART model into the
optimization?
698
:
00:59:21
Yeah, yeah.
699
:
00:59:22
So it's kind of the same.
700
:
00:59:23
Because I don't think a lot of people know this, I mean, maybe a lot of people do,
but you can really embed like any machine learning model into an
701
:
00:59:36
optimization program and then optimize over those features that you're using in the
model.
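A toy illustration of that idea, with made-up data and a hypothetical objective; any regressor with a predict method would do in place of the gradient-boosted trees here. You fit a model of the outcome as a function of a feature you control, then hand the fitted predictor to a simple optimization routine over that feature.

```python
# Toy sketch: embed a fitted ML model inside a (grid-search) optimization
# over the feature we control. Data and objective are invented for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
spend = rng.uniform(0, 100, size=(500, 1))                      # past spend (feature)
bookings = 20 * np.sqrt(spend[:, 0]) + rng.normal(0, 5, 500)    # observed outcome

model = GradientBoostingRegressor().fit(spend, bookings)

# "Optimization program": search over candidate spends and pick the one that
# maximizes predicted bookings minus a made-up cost of spending.
grid = np.linspace(0, 100, 1001).reshape(-1, 1)
objective = model.predict(grid) - 0.5 * grid[:, 0]
best_spend = grid[np.argmax(objective), 0]
print(f"spend suggested by the embedded model: {best_spend:.1f}")
```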
702
:
00:59:45
Um, and so essentially this is kind of what I wanted to do here:
703
:
00:59:50
OK, let's embed a machine learning model into an optimization algorithm to
produce an optimal allocation scenario.
704
:
00:59:59
um But what's really interesting with the optimization algorithm used here, the
alternating direction method of multipliers, um is that it's a distributed
705
:
01:00:12
optimization algorithm.
706
:
01:00:14
And so it happens in three steps.
707
:
01:00:19
And so in the first step, you use the neural network to predict essentially how much gross
bookings each city is going to have given a certain amount of allocation to this program.
708
:
01:00:36
And then you select the ones that optimize that objective.
709
:
01:00:40
And then in the next step, you perform a consensus step where you are trying to get
710
:
01:00:46
the cities to kind of agree with each other to satisfy the constraint.
711
:
01:00:53
Typically you have the constraint that maybe Uber can only allocate a million dollars, and we
need to divvy up that million dollars across the cities such that the sum does not exceed
712
:
01:01:06
a million.
713
:
01:01:06
And so you have that consensus step.
714
:
01:01:09
And then the last step is just kind of a dual update step.
715
:
01:01:12
And then you iterate over this.
716
:
01:01:15
And so.
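To make the three steps concrete, here is a toy consensus-ADMM sketch in the spirit of what is described above. It is not Uber's system: a hypothetical diminishing-returns curve stands in for the neural-network (or BART) forecaster, the budget units are arbitrary, and the penalty parameter and iteration count are just illustrative.

```python
# Toy consensus ADMM for splitting a budget across cities (illustrative only).
import numpy as np

BUDGET = 100.0                                 # total spend to divvy up
RHO = 0.1                                      # ADMM penalty parameter
scales = np.array([3.0, 2.0, 1.5, 1.0])        # hypothetical per-city reward scales
grid = np.linspace(0.0, BUDGET, 1001)          # candidate spends for one city

def predicted_bookings(spend, scale):
    # Stand-in forecaster: diminishing returns in spend.
    return scale * np.sqrt(np.maximum(spend, 0.0))

def local_step(scale, z_i, u_i):
    # Step 1: each city independently picks the spend maximizing its predicted
    # bookings minus the augmented-Lagrangian penalty.
    scores = predicted_bookings(grid, scale) - 0.5 * RHO * (grid - z_i + u_i) ** 2
    return grid[np.argmax(scores)]

def consensus_step(v):
    # Step 2: Euclidean projection of v onto {z >= 0, sum(z) = BUDGET},
    # i.e. force the cities to agree on a feasible joint allocation.
    s = np.sort(v)[::-1]
    css = np.cumsum(s) - BUDGET
    idx = np.arange(1, len(v) + 1)
    ok = s - css / idx > 0
    theta = css[ok][-1] / idx[ok][-1]
    return np.maximum(v - theta, 0.0)

x = np.full(len(scales), BUDGET / len(scales))   # local allocations
z = x.copy()                                     # consensus allocations
u = np.zeros_like(x)                             # scaled dual variables

for _ in range(200):
    x = np.array([local_step(s, zi, ui) for s, zi, ui in zip(scales, z, u)])
    z = consensus_step(x + u)
    u = u + x - z                                # Step 3: dual update

print(np.round(z, 1), "sums to", round(z.sum(), 1))
```

Swapping the stand-in forecaster for posterior-predictive draws from a probabilistic model is roughly what the extension discussed next would look like: the allocation per city then comes with uncertainty instead of a single number.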
717
:
01:01:16
It was really kind of a nice exploration, but I think what could be more interesting, and
something I talked about with Juan, was: what if we embedded more of a probabilistic model into
718
:
01:01:29
here?
719
:
01:01:30
And so then, instead, we can have the entire posterior over our decision space
and we can say, hey, you should allocate between
720
:
01:01:45
100,000 and 150,000 to city A and program B.
721
:
01:01:51
And so that's kind of where I see this going: replacing the neural network with more
of a probabilistic model to have uncertainty over
722
:
01:02:03
our decisions.
723
:
01:02:06
Really cool.
724
:
01:02:07
Yeah, this is really amazing.
725
:
01:02:09
I really love that.
726
:
01:02:10
Well, that's actually a public write-up
727
:
01:02:13
that we can refer people to if they are interested in this idea you were explaining before,
of embedding a BART model into an optimization algorithm.
728
:
01:02:23
I think this is very close.
729
:
01:02:26
This is using a neural network, but this is also very, very cool.
730
:
01:02:31
Yeah, and I definitely see some application of that in baseball, in baseball for sure.
731
:
01:02:39
I mean, the sports world.
732
:
01:02:42
in general.
733
:
01:02:42
yeah, this is, this is super cool.
734
:
01:02:45
Yeah.
735
:
01:02:45
Thanks.
736
:
01:02:46
Thanks Gabriel.
737
:
01:02:47
So yeah, all the links to that are in the show notes.
738
:
01:02:49
Any other current or upcoming projects you want to talk about before we close
up the show?
739
:
01:02:56
um Something you're excited about?
740
:
01:03:03
So, not really any current projects on the plate.
741
:
01:03:07
Maybe there are a couple previous projects where I see
742
:
01:03:11
more probabilistic programming could be in play, but yeah, nothing at the moment.
743
:
01:03:19
Okay.
744
:
01:03:20
And I'm curious also, what are you curious to see in the coming months then?
745
:
01:03:30
Is there something you would really like to see, in the Bayesian world maybe, or in the
data science world in general, that would have a huge impact
746
:
01:03:44
and potential on your...
747
:
01:03:46
on your work, you know, things you are able to do.
748
:
01:03:49
uh
749
:
01:03:52
I think, because a recurring theme of a lot of the problems I work on is
optimization.
750
:
01:04:00
um And I think better tooling around that.
751
:
01:04:08
And yeah, embedding or using machine learning models, probabilistic models within an
optimization framework, whether that's Bayesian optimization, whether that's in
752
:
01:04:18
traditional convex optimization or sequential decision-making.
753
:
01:04:22
I think because typically now, especially at work, I need to hand-roll all of that
together myself.
754
:
01:04:28
And I think it would be really nice to have a package or a framework that really helps
with that.
755
:
01:04:37
process.
756
:
01:04:40
Yeah.
757
:
01:04:41
Yeah.
758
:
01:04:41
I agree.
759
:
01:04:43
That would be something very interesting and very useful.
760
:
01:04:47
Amazing.
761
:
01:04:48
Damn.
762
:
01:04:48
Thanks a lot, Gabriel.
763
:
01:04:49
I'm very, very excited to try these new things like the BART Rust part and also the
optimization thing.
764
:
01:04:59
So folks, if you want to contribute to bart-rs, the links are in the show notes.
765
:
01:05:03
We're always looking for people who want to make this better for themselves and everybody at the
766
:
01:05:10
same time, and I'm sure that Gabriel will welcome any help on that.
767
:
01:05:18
Anything to add, Gabriel, before I ask you the last two questions?
768
:
01:05:26
No, no, nothing from my side.
769
:
01:05:28
good.
770
:
01:05:29
I'll take it as a sign I did a good job.
771
:
01:05:32
So I'm going to ask you the last two questions I ask every guest at the end of the show.
772
:
01:05:38
If you had unlimited time and resources, which problem would you try to solve?
773
:
01:05:48
So I think I'm going to defer back to what I was just saying in regards to the
tooling.
774
:
01:05:54
And in particular, the specific problem space is sequential decision-making.
775
:
01:06:00
And so kind of the big idea or the big pitch there is, um: what decision
should you take now such that your immediate reward is
776
:
01:06:16
maximized, but that also takes into account the future contribution, the expectation of the
future contribution.
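One common way to write that "immediate reward plus expected future contribution" trade-off is the standard Bellman optimality recursion; the notation below is mine, not something from the episode, with s the current state, a the decision taken now, r(s, a) the immediate reward, gamma a discount factor, and the expectation taken over the next state s'.

```latex
V(s) \;=\; \max_{a} \Big[\, r(s, a) \;+\; \gamma \, \mathbb{E}\big[\, V(s') \mid s, a \,\big] \Big]
```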
777
:
01:06:23
And this problem space of sequential decision-making, sequential
optimization, is really quite formalized in the control theory world, but in regards to
778
:
01:06:34
business applications, I think it's quite lacking, especially in the open
source world.
779
:
01:06:40
And so developing a library or framework for that.
780
:
01:06:45
I think would be a great step forward for modeling sequential decision problems.
781
:
01:06:52
That's something I would really like to work on.
782
:
01:06:55
Yeah, that definitely sounds like it would be very useful.
783
:
01:07:00
And second question, if you could have a dinner with any great scientific mind, dead,
alive or fictional, who would it be?
784
:
01:07:11
So I already had dinner with Tommy Capretto in Buenos Aires.
785
:
01:07:18
And I recommended the restaurant, so I feel like I was at that dinner too, you know.
786
:
01:07:23
Yeah.
787
:
01:07:24
No, but probably, I would say probably Richard Feynman.
788
:
01:07:30
um Because I've read some of the biographies of him, and I think it'd just be a fun
dinner, right?
789
:
01:07:40
Like a lot of technical people can be quite boring or socially awkward, but
790
:
01:07:46
Feynman being technical and fun, I think would be a very good dinner experience.
791
:
01:07:51
Yeah, yeah.
792
:
01:07:53
Definitely agree on all these accounts.
793
:
01:07:55
Feynman sounded very interesting and cool, and given what you said about how technical people
can be, this is a great choice.
794
:
01:08:06
This dinner is getting crowded, I can tell you that.
795
:
01:08:08
This is a popular choice.
796
:
01:08:10
So we're going to have to scooch at the dinner table, but you know.
797
:
01:08:16
We should go to Buenos Aires to that same restaurant.
798
:
01:08:18
I'm sure Feynman would have a lot of things to say about it.
799
:
01:08:21
I think so too.
800
:
01:08:23
I forget the name of it.
801
:
01:08:25
Otherwise I would recommend it right now to everybody.
802
:
01:08:27
me too actually.
803
:
01:08:28
it's like, I don't know, I'm blanking on the name.
804
:
01:08:31
Tommy come to our rescue.
805
:
01:08:33
Yeah.
806
:
01:08:34
Awesome.
807
:
01:08:35
Well, Gabriel, that was a great show.
808
:
01:08:38
Thank you so much for taking the time.
809
:
01:08:41
Show notes are going to be full for this one folks.
810
:
01:08:43
So make sure to take a look at them.
811
:
01:08:46
And well, Gabriel, next time you have a fun and useful project like that, you are welcome
anytime on the show.
812
:
01:08:56
Otherwise, really looking forward to meeting you in person in Switzerland.
813
:
01:08:59
At some point, I am definitely going to come and do some hiking over there, which my wife
and I love.
814
:
01:09:05
Gabriel, thank you again for taking the time and being on this show.
815
:
01:09:09
Yeah, thank you, it's been a lot of fun.
816
:
01:09:16
This has been another episode of Learning Bayesian Statistics.
817
:
01:09:19
Be sure to rate, review, and follow the show on your favorite podcatcher, and visit
learnbayesstats.com for more resources about today's topics, as well as access to more
818
:
01:09:30
episodes to help you reach a true Bayesian state of mind.
819
:
01:09:34
That's learnbayesstats.com.
820
:
01:09:36
Our theme music is "Good Bayesian" by Baba Brinkman, feat. MC Lars and Mega Ran.
821
:
01:09:41
Check out his awesome work at bababrinkman.com.
822
:
01:09:44
I'm your host.
823
:
01:09:45
Alex Andorra.
824
:
01:09:46
You can follow me on Twitter at alex underscore andorra, like the country.
825
:
01:09:50
You can support the show and unlock exclusive benefits by visiting patreon.com slash
LearnBayesStats.
826
:
01:09:58
Thank you so much for listening and for your support.
827
:
01:10:00
You're truly a good Bayesian.
828
:
01:10:02
Change your predictions after taking information in, and if you're thinking I'll be less than
amazing,
829
:
01:10:09
Let's adjust those expectations.
830
:
01:10:12
Let me show you how to
831
:
01:10:14
be a good Bayesian. Change calculations after taking fresh data in. Those predictions that your
brain is making, let's get them on a solid foundation.