Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
Changing perspective is often a great way to solve burning research problems. Riemannian spaces are such a perspective change, as Arto Klami, an Associate Professor of computer science at the University of Helsinki and member of the Finnish Center for Artificial Intelligence, will tell us in this episode.
He explains the concept of Riemannian spaces, their application in inference algorithms, how they can help sample Bayesian models, and their similarity with normalizing flows, which we discussed in episode 98.
Arto also introduces PreliZ, a tool for prior elicitation, and highlights its benefits in simplifying the process of setting priors, thus improving the accuracy of our models.
When Arto is not solving mathematical equations, you’ll find him cycling or around a good board game.
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser and Julio.
Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag ;)
Takeaways:
- Riemannian spaces offer a way to improve computational efficiency and accuracy in Bayesian inference by considering the curvature of the posterior distribution.
- Riemannian spaces can be used in Laplace approximation and Markov chain Monte Carlo algorithms to better model the posterior distribution and explore challenging areas of the parameter space.
- Normalizing flows are a complementary approach to Riemannian spaces, using non-linear transformations to warp the parameter space and improve sampling efficiency.
- Evaluating the performance of Bayesian inference algorithms in challenging cases is a current research challenge, and more work is needed to establish benchmarks and compare different methods.
- PreliZ is a package for prior elicitation in Bayesian modeling that facilitates communication with users through visualizations of predictive and parameter distributions.
- Careful prior specification is important, and tools like PreliZ make the process easier and more reproducible.
- Teaching Bayesian machine learning is challenging due to the combination of statistical and programming concepts, but it is possible to teach the basic reasoning behind Bayesian methods to a diverse group of students.
- The integration of Bayesian approaches in data science workflows is becoming more accepted, especially in industries that already use deep learning techniques.
- The future of Bayesian methods in AI research may involve the development of AI assistants for Bayesian modeling and probabilistic reasoning.
Chapters:
00:00 Introduction and Background
02:05 Arto's Work and Background
06:05 Introduction to Bayesian Inference
12:46 Riemannian Spaces in Bayesian Inference
27:24 Availability of Riemannian-based Algorithms
30:20 Practical Applications and Evaluation
37:33 Introduction to PreliZ
38:03 Prior Elicitation
39:01 Predictive Elicitation Techniques
39:30 PreliZ: Interface with Users
40:27 PreliZ: General Purpose Tool
41:55 Getting Started with PreliZ
42:45 Challenges of Setting Priors
45:10 Reproducibility and Transparency in Priors
46:07 Integration of Bayesian Approaches in Data Science Workflows
55:11 Teaching Bayesian Machine Learning
01:06:13 The Future of Bayesian Methods with AI Research
01:10:16 Solving the Prior Elicitation Problem
Links from the show:
Transcript
This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.
Let me show you how to be a good b...
2
:how they can help sampling Bayesian models
and their similarity with normalizing
3
:flows that we discussed in episode 98.
4
:Arto also introduces PreliZ, a tool for
prior elicitation, and highlights its
5
:benefits in simplifying the process of
setting priors, thus improving the
6
:accuracy of our models.
7
:When Arto is not solving mathematical
equations, you'll find him cycling or
8
:around a good board game.
9
:This is Learning Bayesian Statistics,
episode 103.
10
:recorded February 15, 2024.
11
:Welcome to Learning Bayesian Statistics, a
podcast about Bayesian inference, the
12
:methods, the projects, and the people who
make it possible.
13
:I'm your host.
14
:You can follow me on Twitter at alex_andorra,
like the country, for
15
:any info about the show.
16
:LearnBayesStats.com is Laplace to be.
17
:Show notes, becoming a corporate sponsor,
unlocking Bayesian Merch, supporting the
18
:show on Patreon.
19
:Everything is in there.
20
:That's LearnBayesStats.com.
21
:If you're interested in one -on -one
mentorship, online courses, or statistical
22
:consulting, feel free to reach out and
book a call at topmate.io/alex_andorra.
24
:See you around, folks, and best
wishes to you all.
25
:Arto Klami, welcome to Learning Bayesian
Statistics.
26
:Thank you.
27
:You're welcome.
28
:How was my Finnish pronunciation?
29
:Oh, I think that was excellent.
30
:For people who don't have the video, I
don't think that was true.
31
:So thanks a lot for taking the time,
Arto.
32
:I'm really happy to have you on the show.
33
:And I've had a lot of questions for you
for a long time, and the longer we
34
:postpone the episode, the more questions.
35
:So I'm gonna do my best to not take three
hours of your time.
36
:And let's start by...
37
:maybe defining the work you're doing
nowadays and well, how do you end up
38
:working on this?
39
:Yes, sure.
40
:So I personally identify as a machine
learning researcher.
41
:So I do machine learning research, but
very much from a Bayesian perspective.
42
:So my original background is in computer
science.
43
:I'm essentially a self-educated
statistician in the sense that I've never
44
:really
45
:kind of properly studied statistics,
well, except for a few courses here
46
:and there.
47
:But I've been building models, algorithms,
building on the Bayesian principles for
48
:addressing various kinds of machine
learning problems.
49
:So you're basically like a self -taught
statistician through learning, let's say.
50
:More or less, yes.
51
:I think the first things I started doing,
52
:with anything that had to do with Bayesian
statistics was pretty much already going
53
:to the deep end and trying to learn
posterior inference for fairly complicated
54
:models, even actually non -parametric
models in some ways.
55
:Yeah, we're going to dive a bit on that.
56
:Before that, can you tell us the topics
you are particularly focusing on through
57
:that?
58
:umbrella of topics you've named.
59
:Yes, absolutely.
60
:So I think I actually have a few somewhat
distinct areas of interest.
61
:So on one hand, I'm working really on the
kind of core inference problem.
62
:So how do we computationally efficiently,
accurately enough approximate the
63
:posterior distributions?
64
:Recently, we've been especially working on
inference algorithms that build on
65
:concepts from Riemannian geometry.
66
:So we're trying to really kind of account for
the actual manifold induced by this
67
:posterior distribution and try to somehow
utilize these concepts to kind of speed up
68
:inference.
69
:So that's kind of one very technical
aspect.
70
:Then there's the other main theme on the
kind of Bayesian side is on priors.
71
:So we'll be working on prior elicitation.
72
:So how do we actually go about specifying
the prior distributions?
73
:and ideally maybe not even specifying.
74
:So how would we extract that knowledge
from a domain expert who doesn't
75
:necessarily even have any sort of
statistical training?
76
:And how do we flexibly represent their
true beliefs and then encode them as part
77
:of a model?
78
:That's maybe the main kind of technical
aspects there.
79
:Yeah.
80
:Yeah.
81
:No, super fun.
82
:And we're definitely going to dive into
those two aspects a bit later in the show.
83
:I'm really interested in that.
84
:Before that, do you remember how you first
got introduced to Bayesian inference,
85
:actually, and also why it sticks with you?
86
:Yeah, like I said, I'm in some sense self
-trained.
87
:I mean, coming with the computer science
background, we just, more or less,
88
:sometime during my PhD,
89
:I was working in a research group that was
led by Samuel Kaski.
90
:When I joined the group, we were working
on neural networks of the kind that people
91
:were interested in.
92
:That was like 20 years ago.
93
:So we were working on things like self
-organizing maps and these kind of
94
:methods.
95
:And then we started working on
applications where we really bumped into
96
:the kind of small sample size problems.
97
:So looking at...
98
:DNA microarray data that was kind of tens
of thousands of dimensions and medical
99
:applications with 20 samples.
100
:So we essentially figured out that we're
gonna need to take the kind of uncertainty
101
:into account properly.
102
:Started working on the Bayesian modeling
side of these and one of the very first
103
:things I was doing is kind of trying to
create Bayesian versions of some of these
104
:classical analysis methods that were
105
:especially canonical correlation analysis.
106
:Its original derivation is like an
information-theoretic formulation.
107
:So I kind of dived directly into this:
let's do Bayesian versions of these models.
108
:But I actually do remember that around the
same time I also took a course, a course
109
:by Aki Vehtari.
110
:He's an author of this Gelman et al.
111
:book, one of the authors.
112
:I think the first version of the book had
been released.
113
:just before that.
114
:So Aki was giving a course where he was
teaching based on that book.
115
:And I think that's the kind of first real
official contact on trying to understand
116
:the actual details behind the principles.
117
:Yeah, and actually I'm pretty sure
listeners are familiar with Aki.
118
:He's been on the show already, so I'll
link to the episode, of course, where Aki
119
:was.
120
:And yeah, for sure.
121
:I also recommend going through these
episodes, show notes for people who are
122
:interested in, well, starting learning
about basic stuff and things like that.
123
:Something I'm wondering from what you just
explained is, so you define yourself as a
124
:machine learning researcher, right?
125
:And you work in artificial intelligence
too.
126
:But there is this interaction with the
Bayesian framework.
127
:How does that framework underpin your
research in statistical machine learning
128
:and artificial intelligence?
129
:How does that all combine?
130
:Yeah.
131
:Well, that's a broad topic.
132
:There's of course a lot in that
intersection.
133
:I personally do view all learning problems
in some sense from a Bayesian perspective.
134
:I mean, no matter what kind of a, whether
it's a very simple fitting a linear
135
:regression type of a problem or whether
it's figuring out the parameters of a
136
:neural network with 1 billion parameters,
it's ultimately still a statistical
137
:inference problem.
138
:I mean, most of the cases, I'm quite
confident that we can't figure out the
139
:parameters exactly.
140
:We need to somehow quantify for the
uncertainty.
141
:I'm not really aware of any other kind of
principled way of doing it.
142
:So I would just kind of think about it
that we're always doing Bayesian inference
143
:in some sense.
144
:But then there's the issue of how far can
we go in practice?
145
:So it's going to be approximate.
146
:It's possibly going to be very crude
approximations.
147
:But I would still view it through the lens
of Bayesian statistics in my own work.
148
:And that's what I do when I teach my
BSc students, for example.
149
:I mean, not all of them explicitly
formulate the learning algorithms from these
perspectives, but we are still talking about
what the relationship is, what we can assume
about the algorithms, what we can assume about
the result, and how it would relate to properly
estimating everything exactly the way it should be
done.
154
:Yeah, okay, that's an interesting
perspective. Yeah, so basically putting that
155
:in that framework.
156
:And that makes me think then: what do you
believe the impact of Bayesian machine learning is on
158
:the broader field of AI?
159
:What does that bring to that field?
160
:It's a, let's say it has a big effect.
161
:It has a very big impact in a sense that
pretty much most of the stuff that is
162
:happening on the machine learning front
and hence also on the kind of all learning
163
:based AI solutions.
164
:It is ultimately, I think a lot of people
are thinking about roughly in the same way
165
:as I am, that there is an underlying
learning problem that we would ideally
166
:want to solve more or less following
exactly the Bayesian principles.
167
:They don't necessarily talk about it from this
perspective.
168
:So you might be happy to write algorithms,
all the justification on the choices you
169
:make comes from somewhere else.
170
:But I think a lot of people are kind of
accepting that it's the kind of
171
:probabilistic basis of these.
172
:So for instance, I think if you think
about the objectives that people are
173
:optimizing in deep learning, they're all
essentially likelihoods of some
174
:assumed probabilistic model.
175
:Most of the regularizers they are
considering do have an interpretation of
176
:some kind of a prior distribution.
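For reference, the correspondence mentioned here is usually written as follows: minimizing a loss with an L2 penalty on the weights is the same as maximizing a log-posterior with a Gaussian prior on those weights,

$$
\hat{w} \;=\; \arg\min_w \Big[-\log p(\mathcal{D} \mid w) + \lambda \lVert w \rVert^2\Big]
\;=\; \arg\max_w \Big[\log p(\mathcal{D} \mid w) + \log \mathcal{N}\big(w \mid 0, \tfrac{1}{2\lambda} I\big)\Big],
$$

so weight decay is a Gaussian prior in disguise, and other regularizers correspond to other log-priors.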
177
:I think a lot of people are all the time
going deeper and deeper into actually
178
:explicitly thinking about it from these
perspectives.
179
:So we have a lot of these deep learning
type of approaches, various autoencoders,
180
:Bayesian neural networks, various kinds of
generative AI models that are
181
:actually even explicitly
formulated as probabilistic models and
182
:some sort of an approximate inference
scheme.
183
:So I think these things are
two sides of the same
184
:coin.
185
:People are kind of more and more thinking
about them from the same perspective.
186
:Okay, yeah, that's super interesting.
187
:Actually, let's start diving into these
topics from a more technical perspective.
188
:So you've mentioned the
189
:research and advances you are working on
regarding Riemannian spaces.
190
:So I think it'd be super fun to talk about
that because we've never really talked
191
:about it on the show.
192
:So maybe can you give listeners a primer
on what a Riemannian space is?
193
:Why would you even care about that?
194
:And what you are doing in this regard,
what your research is in this regard.
195
:Yes, let's try.
196
:I mean, this is a bit of a mathematical
concept to talk about.
197
:But I mean, ultimately, if you think about
most of the learning algorithms, so we are
198
:kind of thinking that there are some
parameters that live in some space.
199
:So essentially, without thinking about
it, we just assume that it's a
200
:Euclidean space in a sense that we can
measure distances between two parameters,
201
:that is, how similar they are.
202
:It doesn't matter which direction we go,
if the distance is the same, we think that
203
:they are kind of equally far away.
204
:So now a Riemannian geometry is one that
is kind of curved in some sense.
205
:So we may be stretching the space in
certain ways and we'll be doing this
206
:stretching locally.
207
:So what it actually means, for example, is
that the shortest path between two
208
:possible
209
:values, maybe for example two parameter
configurations, that if you start
210
:interpolating between two possible values
for a parameter, it's going to be a
211
:shortest path in this Riemannian geometry,
which is not necessarily a straight line
212
:in an underlying Euclidean space.
213
:So that's what the Riemannian geometry is
in general.
214
:So it's kind of the tools and machinery we
need to work with these kind of settings.
215
:And now then the relationship to
statistical inference comes from trying to
216
:define such a Riemannian space that it has
somehow nice characteristics.
217
:So maybe the concept that most of the
people actually might be aware of would be
218
:the Fisher information matrix that kind of
characterizes the kind of the curvature
219
:induced by a particular probabilistic
model.
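For readers who want the formula, the Fisher information metric being referred to is usually defined as

$$
G(\theta) \;=\; \mathbb{E}_{x \sim p(x \mid \theta)}\!\left[\nabla_\theta \log p(x \mid \theta)\,\nabla_\theta \log p(x \mid \theta)^{\top}\right],
$$

and it turns the parameter space into a Riemannian manifold where the squared length of a small step $d\theta$ is $d\theta^{\top} G(\theta)\, d\theta$ instead of the Euclidean $d\theta^{\top} d\theta$.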
220
:So these tools kind of then allow, for
example, a very recent thing that we did,
221
:it's going to come out later this spring
in AI stats, is an extension of the
222
:Laplace approximation in a Riemannian
geometry.
223
:So those of you who know what the Laplace
approximation is, it's essentially just
224
:fitting a normal distribution at the mode
of a distribution.
225
:But if we now fit the same normal
distribution in a suitably chosen
226
:Riemannian space,
227
:we can actually model also the kind of
curvature of the posterior mode and even
228
:kind of how it stretches.
229
:So we get a more flexible approximation.
230
:We are still fitting a normal
distribution.
231
:We're just doing it in a different space.
232
:Not sure how easy that was to follow, but
at least maybe it gives some sort of an
233
:idea.
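As a point of comparison, here is a minimal sketch of the standard, Euclidean Laplace approximation on a toy two-dimensional log-posterior (the toy target and the finite-difference Hessian are illustrative assumptions, not code from the paper); the Riemannian version described above fits the same normal in a suitably curved space instead of this flat one:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

# Toy, non-Gaussian 2D log-posterior (purely illustrative).
def log_post(theta):
    x, y = theta
    return -0.5 * (x**2 + np.exp(x) * y**2)

def neg_log_post(theta):
    return -log_post(theta)

# 1. Find the posterior mode.
mode = minimize(neg_log_post, x0=np.zeros(2)).x

# 2. Hessian of the negative log-posterior at the mode
#    (central finite differences; autodiff would be used in practice).
def hessian(f, x, eps=1e-4):
    d = len(x)
    steps = np.eye(d) * eps
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            H[i, j] = (f(x + steps[i] + steps[j]) - f(x + steps[i] - steps[j])
                       - f(x - steps[i] + steps[j]) + f(x - steps[i] - steps[j])) / (4 * eps**2)
    return H

cov = np.linalg.inv(hessian(neg_log_post, mode))

# 3. The Laplace approximation: a normal centred at the mode
#    with covariance equal to the inverse Hessian.
laplace_approx = multivariate_normal(mean=mode, cov=cov)
print(mode, cov)
```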
234
:Yeah, yeah, yeah.
235
:That was actually, I think, a pretty
approachable.
236
:introduction and so if I understood
correctly then you're gonna use these
237
:Riemannian approximations to come up with
better algorithms is that what you do and
238
:why you focus on Riemannian spaces and yeah
if you can introduce that and
239
:tell us basically why that is interesting
to then look
240
:at geometry from these different ways
instead of the classical Euclidean way of
241
thinking about geometry.
242
:Yeah, I think that's exactly what it is
about.
243
:So one other thing, maybe another
perspective of thinking about it is that
244
:we've also been doing Markov chain Monte
Carlo algorithms, so MCMC in these
245
:Riemannian spaces.
246
:And what we can achieve with those is that
if you have, let's say, a posterior
247
:distribution,
248
:that has some sort of a narrow funnel,
some very narrow area that extends far
249
:away in one corner of your parameter
space.
250
:It's actually very difficult to get there
with something like standard Hamiltonian
251
:Monte Carlo, but with the Riemannian
methods we can kind of make these narrow
252
:funnels equally easy compared to the
flatter areas.
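The narrow-funnel posterior being described is essentially Neal's funnel; here is a minimal PyMC sketch of the pathology itself (an illustration only, not the Riemannian samplers from Arto's group):

```python
import pymc as pm

# Neal's funnel: the scale of x shrinks exponentially as v decreases,
# creating the narrow region a Euclidean sampler struggles to enter.
with pm.Model() as funnel:
    v = pm.Normal("v", mu=0, sigma=3)
    x = pm.Normal("x", mu=0, sigma=pm.math.exp(v / 2), shape=9)
    idata = pm.sample(random_seed=0)

# Divergences flag the region the sampler failed to explore properly.
print(int(idata.sample_stats["diverging"].sum()))
```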
253
:Now of course this may sound like a magic
bullet that we should be doing all
254
:inference with these techniques.
255
:Of course it does come with
256
:certain computational challenges.
257
:So we do need to be, like I said, the
shortest paths are no longer straight
258
:lines.
259
:So we need numerical integration to follow
the geodesic paths in these metrics and so
260
:on.
261
:So it's a bit of a compromise, of course.
262
:So they have very nice theoretical
properties.
263
:We've been able to get them working also
in practice in many cases so that they are
264
:kind of comparable with the current state
of the art.
265
:But it's not always easy.
266
:Yeah, there is no free lunch.
267
:Yes.
268
:Yeah.
269
:Yeah.
270
:Do you have any resources about these?
271
:Well, first the concepts of Riemannian
spaces and then the algorithms that you
272
:folks derived in your group using these
Riemannian spaces for people who are
273
:interested?
274
:Yeah, I think I wouldn't know, let's say, any
very particular
275
:resources I would recommend on Riemannian
geometry.
276
:It is actually a rather, let's say,
mathematically involved topic.
277
:But regarding the specific methods, I
think they are...
278
:It's a couple of my recent papers, so we
have this Laplace approximation coming
279
:out in AISTATS this year.
280
:The MCMC sampler we had, I think, two
years ago in AISTATS, similarly, the
281
:first MCMC method building on these and
then...
282
:last year one paper in Transactions on
Machine Learning Research.
283
:I think they are more or less accessible.
284
:Let's definitely link to those papers if
you can in the show notes because I'm
285
:personally curious about it but also I
think listeners will be.
286
:It sounds from what you're saying that
this idea of doing algorithms in this
287
:Riemannian space is
288
:somewhat recent.
289
:Am I right?
290
:And why would it appear now?
291
:Why would it become interesting now?
292
:Well, it's not actually that recent.
293
:I think the basic principle goes back, I
don't know, maybe 20 years or so.
294
:I think the main reason why we've been
working on this right now is that
295
:We've been able to resolve some of the
computational challenges.
296
:So the fundamental problem with these
models is always this numeric integration
297
:of following the shortest paths depending
on an algorithm we needed for different
298
:reasons, but we always needed to do it,
which usually requires operations like
299
:inversion of a metric tensor, which has
the kind of a dimensionality of the
300
:parameter space.
301
:So we came up with a particular metric
302
:that happens to have a computationally
efficient inverse.
303
:So there's kind of this kind of concrete
algorithmic techniques that are kind of
304
:bringing the computational cost to the
level so that it's no longer notably more
305
:expensive than doing kind of standard
Euclidean methods.
306
:So we can, for example, scale them for
Bayesian neural networks.
307
:That's one of the application cases we are
looking at.
308
:We are really having very high
-dimensional problems but still able to do
309
:some of these Riemannian techniques or
approximations of them.
310
:That was going to be my next question.
311
:In which cases are these approximations
interesting?
312
:In which cases would you recommend
listeners to actually invest time to
313
:actually use these techniques because they
have a better chance of working than the
314
:classic Hamiltonian Monte Carlo samplers
that are the default in most probabilistic
315
:languages?
316
:Yeah, I think the easy answer is that when
the inference problem is hard.
317
:So essentially one very practical way
would be that if you realize that you
318
:can't really get a Hamiltonian Monte Carlo
to explore the space, the posterior
319
:properly, that it may be difficult to find
out that this is happening.
320
:Of course, if you're never visiting a
certain corner, you wouldn't actually
321
:know.
322
:But if you have some sort of a reason to
believe that you really are handling with
323
:such a complex posterior that I'm kind of
willing to spend a bit more extra
324
:computation to be careful so that I really
try to cover every corner there is.
325
:Another example is that we realized on the
scope of these Bayesian neural networks
326
:that there are certain kind of classical
327
:Well, certain kind of scenarios where we
can show that if you do inference with the
328
:too simple methods, so something in the
Euclidean metric with the standard
329
:Langevin dynamics type of a thing, what
we actually see is that if you switch to
330
:using better prior distributions in your
model, you don't actually see an advantage
331
:of those unless you at the same time
switch to using an inference algorithm
332
:that is kind of able to handle the extra
complexity.
333
:So if you have for example like
334
:heavy-tailed spike-and-slab type of priors
in the neural network.
335
:You just kind of fail to get any benefit
from these better priors if you don't pay
336
:a bit more attention into how you do the
inference.
337
:Okay, super interesting.
338
:And also, so that seems it's also quite
interesting to look at that when you have,
339
:well, or when you suspect that you have
multi -modal posteriors.
340
:Yes, well yeah, multimodal posteriors are
interesting.
341
:We haven't specifically studied
this particular question, but we
342
:have actually thought about some ideas of
creating metrics that would specifically
343
:encourage exploring the different modes
but we haven't done that concretely so we
344
:are now still focusing on these kind of narrow
thin areas of posteriors and how can you
345
:kind of reach those.
346
:Okay.
347
:And do you know of normalizing flows?
348
:Sure, yes.
349
:So yeah, we've had Marylou Gabrié on
the show recently.
350
:It was episode 98.
351
:And so she's working a lot on these
normalizing flows and the idea of
352
:assisting MCMC sampling with these machine
learning methods.
353
:And it's amazing.
354
:It can sound somewhat similar to what you do
in your group.
355
:And so for listeners, could you explain
the difference between the two ideas and
356
:maybe also the use cases that both apply
to it?
357
:Yeah, I think you're absolutely right.
358
:So they are very closely related.
359
:So there are, for example, the basic idea
of the neural transport that uses
360
:normalizing flows for
361
:essentially transforming the parameter
space in a suitable non -linear way and
362
:then running standard Euclidean
Hamiltonian Monte Carlo.
363
:It can actually be proven.
364
:I think it is in the original paper as
well that I mean it is actually
365
:mathematically equivalent to conducting
Riemannian inference in a suitable metric.
366
:So I would say that it's like a
complementary approach of solving exactly
367
:the same problem.
368
:So you have a way of somehow in a flexible
way warping your parameter space.
369
:You either do it through a metric or you
kind of do it as a pre-transformation.
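Roughly, the equivalence being referred to comes from the change of variables: if the flow maps $z \mapsto \theta = f(z)$, then running plain Euclidean HMC on the pulled-back density

$$
\tilde p(z) \;=\; p\big(f(z)\big)\,\bigl|\det J_f(z)\bigr|
$$

samples the original posterior, and it behaves like Riemannian HMC on $\theta$ with a metric built from the flow's Jacobian, roughly $G(\theta) = \big(J_f J_f^{\top}\big)^{-1}$ evaluated at $z = f^{-1}(\theta)$.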
370
:So there's a lot of similarities.
371
:It's also the computation in some sense
that if you think about mapping...
372
:a sample through a normalizing flow.
373
:It's actually very close to what we do
with the Riemannian Laplace approximation
374
:that you start kind of take a sample and
you start propagating it through some sort
375
:of a transformation.
376
:It's just whether it's defined through a
metric or as a flow.
377
:So yes, so they are kind of very close.
378
:So now the question is then that when
should I be using one of these?
379
:I'm afraid I don't really have an answer.
380
:in a sense that, I mean, there are
computational properties. Let's say, for
381
:example, if you work with flows, you do
need to pre-train them, so you do need to
382
:train some sort of a flow to be able to
use it in certain applications, so it comes
383
:with some pre-training cost.
384
:Quite likely during when you're actually
using it it's going to be faster than
385
:working in a Riemannian metric where you
need to invert some metric tensors and so
386
:on.
387
:So there's kind of like technical
differences.
388
:Then I think the bigger question is of
course that if we go to really challenging
389
:problems, for example, very high
dimensions, that which of these methods
390
:actually work well there.
391
:For that I don't quite have an answer now,
in the sense that I wouldn't dare to say,
392
:or even speculate, which of these works better;
I might miss some kind of obvious
393
:limitations of one of the approaches if
trying to extrapolate too far
394
:from what we've actually tried in
practice.
395
:Yeah, that's what I was going to say.
396
:It's also that these methods are really at
the frontier of the science.
397
:So I guess we're lacking, we're lacking
for now the practical cases, right?
398
:And probably in a few years we'll have
more ideas of these and when one is more
399
:appropriate than another.
400
:But for now, I guess we have to try.
401
:those algorithms and see what we get back.
402
:And so actually, what if people want to
try these Riemannian-based algorithms?
403
:Do you have already packages that we can
link to that people can try and plug their
404
:own model into?
405
:Yes and no.
406
:So we have released open source code with
each of the research papers.
407
:So there is a reference implementation
that
408
:can be used.
409
:We have internally been integrating these,
kind of working a bit towards integrating
410
:the kind of proper open ecosystems that
would allow, make like for example model
411
:specification easy.
412
:It's not quite there yet.
413
:So there's one particular challenge is
that many of the environments don't
414
:actually have all the support
functionality you need for the Riemannian
415
:methods.
416
:They're essentially simplifying some of
the things by directly encoding these
417
:assumptions that the shortest path is an
interpolation or it's a line.
418
:So you need a bit of an extra machinery
for the most established libraries.
419
:There are some libraries, I believe, that
are actually making it fairly easy to do
420
:kind of plug and play Riemannian metrics.
421
:I don't remember the names right now, but
that's where we've kind of been.
422
:planning on putting in the algorithms, but
they're not really there yet.
423
:Hmm, OK, I see.
424
:Yeah, definitely that would be, I guess,
super, super interesting.
425
:If by the time of release, you see
something that people could try,
426
:definitely we'll link to that, because I
think listeners will be curious.
427
:And I'm definitely super curious to try
that.
428
:Any new stuff like that, or you'd like to?
429
:try and see what you can do with it.
430
:It's always super interesting.
431
:And I've already seen some very
interesting experiments done with
432
:normalizing flows, especially bayeux by
Colin Carroll and other people.
433
:Colin Carroll is one of the PyMC
developers also.
434
:And yeah, now you can use bayeux to take
any
435
:JAX-compatible model and you plug that into
it and you can use the flowMC algorithm
436
:to sample your JAX-compatible PyMC model.
437
:So that's really super cool.
438
:And I'm really looking forward to more
experiments like that to see, well, okay,
439
:what can we do with those algorithms?
440
:Where can we push them to what extent, to
what degree, where do they fall down?
441
:That's really super interesting, at least
for me, because I'm not a mathematician.
442
:So when I see that, I find that super,
like, I love the idea of, basically the
443
:idea is somewhat simple.
444
:It's like, okay, we have that problem when
we think about geometry that way, because
445
:then the geometry becomes a funnel, for
instance, as you were saying.
446
:And then sampling at the bottom of the
funnel is just super hard in the way we do
447
:it right now, because just super small
distances.
448
:What if we change the definition of
distance?
449
:What if we change the definition of
geometry, basically, which is this idea
450
:of, OK, let's switch to a Riemannian space.
451
:And the way we do that, then, well, the
funnel disappears, and it just becomes
452
:something easier.
453
:It's just like going beyond the idea of
the centered versus non -centered
454
:parameterization, for instance, when you
do that in model, right?
455
:But it's going big with that because it's
more general.
456
:So I love that idea.
457
:I understand it, but I cannot really read
the math and be like, oh, OK, I see what
458
:that means.
459
:So I have to see the model and see what I
can do and where I can push it.
460
:And then I get a better understanding of
what that entails.
461
:Yeah, I think you gave a much better
summary of what it is doing than I did.
462
:So good for that.
463
:I mean, you are actually touching that, of
course.
464
:So the one point is making the
algorithms
465
:available so that everyone could try them
out.
466
:But then there's also the other aspect
that we need to worry about, which is the
467
:proper evaluation of what they're doing.
468
:I mean, of course, most of the papers when
you release a new algorithm, you need to
469
:emphasize things like, in our case,
computational efficiency.
470
:And you do demonstrate that it, maybe for
example, being quite explicitly showing
471
:that these very strong funnels, it does
work better with those.
472
:But now then the question is of course
that how reliable these things are if used
473
:in a black-box manner, so that someone
just runs them on their favorite model.
474
:And one of the challenges we realized is
that it's actually very hard to evaluate
475
:how well an algorithm is working in an
extremely difficult case.
476
:Because there is no baseline.
477
:I mean, in some of the cases we've been
comparing that let's try to do...
478
:standard Hamiltonian MCMC, NUTS, as
carefully as we can.
479
:And we kind of think that this is the
ground truth, this is the true posterior.
480
:But we don't really know whether that's
the case.
481
:So if it's hard enough case, our kind of
supposed ground truth is failing as well.
482
:And it's very hard to kind of then we
might be able to see that our solution
483
:differs from that.
484
:But then we would need to kind of
separately go and investigate that which
485
:one was wrong.
486
:And that is a practical challenge,
especially if you would like to have a
487
:broad set of models.
488
:And we would want to show somehow
transparently for the kind of end users
489
:that in these and these kind of problems,
this and that particular method, whether
490
:it's one of ours or something else, any
other new fancy one.
491
:When do they work when they don't?
492
:Without relying that we really have some
particular method that they already trust
493
:and we kind of, if it's just compared to
it, we can't kind of really convince
494
:others that is it correct when it is
differing from what we kind of used to
495
:rely on.
496
:Yeah, that's definitely a problem.
497
:That's also a question I asked Marylou
498
:when she was on the show and then that was
kind of the same answer if I remember
499
:correctly that for now it's kind of hard
to do benchmarks in a way, which is
500
:definitely an issue if you're trying to
work on that from a scientific perspective
501
:as well.
502
:If we were astrologists, that'd be great,
like then we'd be good.
503
:But if you're a scientist, then you want
to evaluate your methods and...
504
:And finding a method to evaluate the
method is almost as valuable as finding
505
:the method in the first place.
506
:And where do you think we are on that
regarding in your field?
507
:Is that an active branch of the research
to try and evaluate these algorithms?
508
:What would that even look like?
509
:Or are we still really, really at a very
early time for that work?
510
:That's a...
511
:Very good question.
512
:So I'm not aware of a lot of people that
would kind of specifically focus on
513
:evaluation.
514
:So for example, Aki has of course been
working a lot on that, trying to kind of
515
:create diagnostics and so on.
516
:But then if we think about more on the
flexible machine learning side, I think my
517
:hunch is that it's the individual research
groups are kind of all circling around the
518
:same problems that they are kind of trying
to figure out that, okay,
519
:Every now and then someone invents a fancy
way of evaluating something.
520
:It introduces a particular type of
synthetic scenario where I think that the
521
:most common one I see is that what people do
is that you create problems where you
522
:actually have an analytic posterior and
it's somehow like an artificial problem
523
:that you take a problem and you transform
it in a given way and then you pretend that
524
:you didn't have the analytic one.
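A toy version of that benchmarking idea, assuming a conjugate model so the exact posterior is available in closed form (an illustration, not one of the benchmarks used in the papers):

```python
import pymc as pm
from scipy import stats

# Beta-Binomial model: the posterior is Beta(1 + k, 1 + n - k) analytically,
# so a sampler's output can be checked against ground truth.
k, n = 27, 100
with pm.Model():
    p = pm.Beta("p", alpha=1, beta=1)
    pm.Binomial("obs", n=n, p=p, observed=k)
    idata = pm.sample(random_seed=0)

exact = stats.beta(1 + k, 1 + n - k)
print(float(idata.posterior["p"].mean()), exact.mean())
```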
525
:But they are all, I mean, they feel a bit
artificial.
526
:They feel a bit synthetic.
527
:So let's see.
528
:It would maybe be something that the
community should kind of be talking a bit
529
:more about on a workshop or something
that, OK, let's try to really think about
530
:how to verify the robustness or possibly
identify that these things are not really
531
:ready or reliable for practical use in
very serious applications yet.
532
:Yeah.
533
:I haven't been following very closely
what's happening, so I may be missing some
534
:important works that are already out
there.
535
:Okay, yeah.
536
:Well, Aki, if you're listening, send us a
message if we forgot something.
537
:And second, that sounds like there are
some interesting PhDs to do on the issue,
538
:if that's still a very new branch of the
research.
539
:So, people,
540
:if you're interested in that, maybe
contact Arto and we'll see.
541
:Maybe in a few months or years, you can
come here on the show and answer the
542
:question I just asked.
543
:Another aspect of your work I really want
to talk about also that I really love and
544
:now listeners can relax because that's
going to be, I think, less abstract and
545
:closer to their user experience.
546
:is about priors.
547
:You talked about it a bit at the
beginning, especially you are working and
548
:you worked a lot on a package called
PreliZ, which I really love.
549
:One of my friends and fellow PyMC
developers, Osvaldo Martin, is also
550
:collaborating on that.
551
:And you guys have done a tremendous job on
that.
552
:So yeah, can you give people a primer
about PreliZ?
553
:What is it?
554
:When could they use it and what's its
purpose in general?
555
:Maybe I need to start by saying that I
haven't worked a lot on PreliZ.
556
:Osvaldo has and a couple of others, so
I've been kind of just hovering around and
557
:giving a bit of feedback.
558
:But yeah, so I'll maybe start a bit
further away, so not directly from
559
:PreliZ, but the whole question of prior
elicitation.
560
:So I think the...
561
:Yeah.
562
:What we've been working with that is the
prior elicitation is simply an, I would
563
:frame it as that it's some sort of
unusually iterative approach of
564
:communicating with the domain expert where
the goal is to estimate what's their
565
:actual subjective prior knowledge is on
whatever parameters the model has and
566
:doing it so that it's like cognitively
easy for the expert.
567
:So many of the algorithms that we've been
working on this are based on this idea of
568
:predictive elicitation.
569
:So if you have a model where the
parameters don't actually have a very
570
:concrete, easily understandable meaning,
you can't actually start asking questions
571
:from the expert about the parameters.
572
:It would require them to understand fully
the model itself.
573
:The predictive elicitation techniques
instead
574
:communicate with the expert usually in the
space of the observable quantities.
575
:So they're trying to ask: is this
somehow a more likely realization than this
576
:other one.
577
:And now this is where PreliZ comes
into play.
578
:So when we are communicating with the
user, so most of the times the information
579
:we show for the user is some sort of
visualizations.
580
:of predictive distributions or possibly
also about the parameter distributions
581
:themselves.
582
:So we need an easy way of communicating
whether it's histograms of predicted
583
:values and whatnot.
584
:So how do we show those for a user in
scenarios where the model itself is some
585
:sort of a probabilistic program so we
can't kind of fixate to a given model
586
:family.
587
:That's actually the main role of
PreliZ: essentially making it easy to
588
:interface with the user.
589
:Of course, PreliZ also then includes
these algorithms themselves.
590
:So, algorithms for estimating the prior
and the kind of interface components for
591
:the expert to give information.
592
:So, make a selection, use a slider that I
would want my distribution to be a bit
593
:more skewed towards the right and so on.
594
:That's what we are aiming at.
595
:A general purpose tool that would be used,
it's essentially kind of a platform for
596
:developing and kind of bringing into use
all kinds of prior elicitation techniques.
597
:So it's not tied to any given algorithm or
anything but you just have the components
598
:and could then easily kind of commit,
let's say, a new type of prior elicitation
599
:algorithm into the library.
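For anyone who wants to see what that looks like in code, here is a minimal sketch using PreliZ (assuming a recent release; check the PreliZ docs for the exact signatures):

```python
import preliz as pz

# Find a Gamma prior that puts 90% of its mass between 2 and 6 while being
# maximally non-committal otherwise; maxent also plots the resulting prior.
pz.maxent(pz.Gamma(), lower=2, upper=6, mass=0.9)

# Distributions can also be specified and visualized directly.
pz.Normal(mu=0, sigma=1).plot_pdf()
```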
600
:Yeah, and I really encourage
601
:folks to go take a look at the PreliZ
package.
602
:I put the link in the show notes because,
yeah, as you were saying, that's a really
603
:easier way to specify your priors and also
elicit them if you need the intervention
604
:of non -statisticians in your model, which
you often do if the model is complex
605
:enough.
606
:So yeah, like...
607
:I'm using it myself quite a lot.
608
:So thanks a lot guys for this work.
609
:So Arto, as you were saying, Osvaldo
Martín is one of the main contributors,
610
:Oriol Abril-Pla also, and Alejandro
Icazatti, if I remember correctly.
611
:So at least these four people are the main
contributors.
612
:And yeah, so I definitely encourage people
to go there.
613
:What would you say, Arto, are the...
614
:like the Pareto effect, what would it be
if people want to get started with
615
:PreliZ?
616
:Like the 20% of uses that will give you
80% of the benefits of PreliZ for
617
:someone who doesn't know anything about it.
618
:That's a very good question.
619
:I think the most important thing actually
is to realize that we need to be careful
620
:when we set the priors.
621
:So simply being aware that you need a tool
for this.
622
:You need a tool that makes it easy to do
something like a prior predictive check.
623
:You need a tool that relieves you from
figuring out how do I inspect
624
:my priors or the effects they have on the
model.
625
:That's actually where the real benefit is.
626
:You get most of the...
627
:when you kind of try to bring it as part
of your Bayesian workflow in a kind of a
628
:concrete step that you identify that I
need to do this.
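The concrete workflow step being described is, at its simplest, a prior predictive check; here is a minimal PyMC sketch (the regression model and data are invented purely for illustration):

```python
import numpy as np
import pymc as pm

x = np.linspace(0, 10, 50)   # made-up predictor
y_obs = np.zeros(50)         # placeholder data, not used by the prior check

with pm.Model() as model:
    intercept = pm.Normal("intercept", 0, 1)
    slope = pm.Normal("slope", 0, 1)
    sigma = pm.HalfNormal("sigma", 1)
    pm.Normal("y", mu=intercept + slope * x, sigma=sigma, observed=y_obs)
    prior_pred = pm.sample_prior_predictive()

# Inspect what the priors imply about the data before fitting anything.
y_prior = prior_pred.prior_predictive["y"]
print(y_prior.min().item(), y_prior.max().item())
```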
629
:Then the kind of the remaining tail of
this thing is then of course that the...
630
:maybe in some cases you have such a
complicated model that you really need to
631
:deep dive and start...
632
:running algorithms that help you elicit
the priors.
633
:And I would actually even say that the
elicitation algorithms, I do perceive them
634
:useful even when the person is actually a
statistician.
635
:I mean, there's a lot of models that we
may think that we know how to set the
636
:priors.
637
:But what we are actually doing is
following some very vague ideas on what's
638
:the effect.
639
:And we may also make
640
:severe mistakes or spend a lot of time in
doing it.
641
:So to an extent these elicitation
interfaces, I believe that ultimately they
642
:will be helping even kind of hardcore
statisticians in just kind of doing it
643
:faster, doing it slightly better, doing it
perhaps in a more better documented
644
:manner.
645
:So you could for example kind of store all
the interaction the modeler had.
646
:with these things and kind of put that
aside that this is where we got the prior
647
:from instead of just trial and error and
then we just see at the end the result.
648
:So you could kind of revisit the choices
you made during an elicitation process
649
:that I discarded these predictive
distributions for some reason and then you
650
:can later kind of, okay I made a mistake
there maybe I go and change my answer in
651
:that part and then an algorithm provides
you an updated prior.
652
:without you needing to actually go through
the whole prior specification process
653
:again.
654
:Yeah.
655
:Yeah.
656
:Yeah, I really love that.
657
:And that makes the process of setting
priors more reproducible, more transparent
658
:in a way.
659
:That makes me think a bit of the
scikit-learn pipelines that you use to transform
660
:the data.
661
:For instance, you just set up the pipeline
and you say, I want to standardize my
662
:data, for instance.
663
:And then you have that pipeline ready.
664
:And when you do the out-of-sample
predictions, you can use the pipeline and
665
:say, okay, now like do that same
transformation on these new data so that
666
:we're sure that it's done the right way,
but it's still transparent and people know
667
:what's going on here.
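For readers who don't know scikit-learn, here is the kind of pipeline being described (toy data, purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)

# The standardization step is stored inside the fitted pipeline.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X[:150], y[:150])

# New data automatically goes through the exact same transformation.
print(pipe.predict(X[150:]))
```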
668
:It's a bit the same thing, but with the
priors.
669
:And I really love that because that makes
it also easier for people to think about
670
:the priors and to actually choose the
priors.
671
:Because.
672
:What I've seen in teaching is that
especially for beginners, even more when
673
:they come from the frequentist framework,
setting the priors can be just like
674
:paralyzing.
675
:It's like the paradox of choice.
676
:It's way too many, way too many choices.
677
:And then they end up not choosing anything
because they are too afraid to choose the
678
:wrong prior.
679
:Yes, I fully agree with that.
680
:I mean, there's a lot of very simple
models.
681
:that already start having six, seven,
eight different univariate priors there.
682
:And then I've been working with these
things for a long time and I still very
683
:easily make stupid mistakes that I'm
thinking that I increase the variance of
684
:this particular prior here, thinking that
what I'm achieving is, for example, higher
685
:predictive variance as well.
686
:And then I realized that, no, that's not
the case.
687
:It's actually...
688
:Later in the model, it plays some sort of
a role and it actually has the opposite
689
:effect.
690
:It's hard.
691
:Yeah.
692
:Yeah.
693
:That stuff is really hard and same here.
694
:When I discovered that, I'm extremely
frustrated because I'm like, I spent
695
:hours on these, whereas if I had a more
reproducible pipeline, that would just have
696
:been handled automatically for me.
697
:So...
698
:Yeah, for sure.
699
:We're not there yet in the workflow, but
that definitely makes it way easier.
700
:So yeah, I absolutely agree that we are
not there yet.
701
:I mean, PreliZ is a very
well-defined tool that allows us to start
702
:working on it.
703
:But I mean, then the actual concrete
algorithms that would make it easy to
704
:let's say for example, avoid these kind of
stupid mistakes and be able to kind of
705
:really reduce the effort.
706
:So if it now takes two weeks for a PhD
student trying to think about and fiddle
707
:with the prior, so can we get to one day?
708
:Can we get it to one hour?
709
:Can we get it to two minutes of a quick
interaction?
710
:And probably not two minutes, but if we
can get it to one hour and it...
711
:It will require lots of things.
712
:It will require even better of this kind
of tooling.
713
:So how do we visualize, how do we play
around with it?
714
:But I think it's going to require quite a
bit better algorithms on how do you, from
715
:kind of maximally limited interaction, how
do you estimate.
716
:what the prior is and how you design the
kind of optimal questions you should be
717
:asking from the expert.
718
:There's no point in kind of reiterating
the same things just to fine -tune a bit
719
:one of the variances of the priors if
there is a massive mistake still somewhere
720
:in the prior and a single question would
be able to rule out half of the possible
721
:scenarios.
722
:It's going to be an interesting...
723
:let's say, rising research direction, I
would say, for the next 5, 10 years.
724
:Yeah, for sure.
725
:And very valuable also because very
practical.
726
:So for sure, again, a great PhD
opportunity, folks.
727
:Yeah, yeah.
728
:Also, I mean, that may be hard to find
those algorithms that you were talking
729
:about because it is hard, right?
730
:I know I worked on the...
731
:find constraint prior function that we
have in PMC now.
732
:And it's just like, it seemed like a very
simple case.
733
:It's not even doing all the fancy stuff
that PreliZ is doing.
734
:It's mainly just optimizing a distribution
so that it fits the constraints that you
735
:are giving it.
736
:Like for instance, I want a gamma with 95
% of the mass between 2 and 6.
737
:Give me the...
738
:parameters that fit that constraint.
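That is what pm.find_constrained_prior does; here is a minimal sketch of the exact Gamma example just mentioned (the initial guess values are arbitrary):

```python
import pymc as pm

# Find Gamma parameters that put 95% of the prior mass between 2 and 6.
params = pm.find_constrained_prior(
    pm.Gamma,
    lower=2,
    upper=6,
    mass=0.95,
    init_guess={"alpha": 4, "beta": 1},
)
print(params)  # a dict of fitted parameters, e.g. {"alpha": ..., "beta": ...}
```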
739
:That's actually surprisingly hard
mathematically.
740
:You have a lot of choices to make, you
have a lot of things to really be careful
741
:about.
742
:And so I'm guessing that's also one of the
hurdles right now in that research.
743
:Yeah, it absolutely is.
744
:I mean, I would say at least I'm
approaching this
745
:more or less from an optimization
perspective then that I mean, yes, we are
746
:trying to find a prior that best satisfies
whatever constraints we have and trying to
747
:formulate an optimization problem of some
kind that gets us there.
748
:This is also where I think there's a lot
of room for the, let's say flexible
749
:machine learning tools type of things.
750
:So, I mean, if you think about the prior
that satisfies these constraints, we could
751
:be specifying it with some sort of a
flexible
752
:not a particular parametric prior but some
sort of a flexible representation and then
753
:just kind of optimizing for within a much
broader set of this.
754
:But then of course it requires completely
different kinds of tools than we are used
755
:to working on.
756
:It also requires people accepting that our
priors may take arbitrary shapes.
757
:They may be distributions that we could
have never specified directly.
758
:Maybe they're multimodal.
759
:priors that we kind of just infer, ones that
you couldn't really have specified yourself, and
760
:there's going to be also a lot of kind of
educational perspective on getting people
761
:to accept this.
762
:But even if I had to give you a perfect
algorithm that somehow cranks out a prior
763
:and then you look at the prior and you're
saying that I don't even know what
764
:distribution this is, I would have never
ever converged into this if I was manually
765
:doing this.
766
:So will you accept
767
:that that's your prior or will you insist
that your method is doing something
768
:stupid?
769
:I mean, I still want to use my Gaussian
prior here.
770
:Yeah, that's a good point.
771
:And in a way that's kind of related to a
classic problem that you have when you're
772
:trying to automate a process.
773
:I think there's the same issue with the
automated cars, like those self -driving
774
:cars, where people actually trust the cars
more if they think they have
775
:some control over it.
776
:I've seen interesting experiments where
they put a placebo button in the car that
777
:people could push on to override if they
wanted to, but the button wasn't doing
778
:anything.
779
:People are saying they were more
trusting of these cars than the
780
:completely self -driving cars.
781
:That's also definitely something to take
into account, but that's more related to
782
:the human psychology than to the
algorithms per se.
783
:related to human psychology but it's also
related to this evaluation perspective.
784
:I mean of course if we did have a very
robust evaluation pattern that somehow
785
:tells that once you start using these
techniques your final conclusions in some
786
:sense will be better and if we can make
that kind of a very convincing then it
787
:will be easier.
788
:I mean if you think about, I mean there's
a lot of people that would say that
789
:a very massive neural network with four
billion parameters.
790
:It would never ever be able to answer a
question given in a natural language.
791
:A lot of people were saying that five
years ago that this is a pipeline, it's
792
:never gonna happen.
793
:Now we do have it and now everyone is
ready to accept that yes, it can be done.
794
:And they are willing to actually trust
these ChatGPT type of models in a
795
:lot of things.
796
:And they are investing a lot of effort
into figuring out what to do with this.
797
:It just needs this kind of very concrete
demonstration that there is value and that
798
:it works well enough.
799
:It will still take time for people to
really accept it, but I mean, I think
800
:that's kind of the key ingredient.
801
:Yeah, yeah.
802
:I mean, it's also good in some way.
803
:Like that skepticism makes the tools
better.
804
:So that's good.
805
:I mean, so we could...
806
:Keep talking about PreliZ because I have
other technical questions about that.
807
:But actually, since you're like, that's a
perfect segue to a question I also had for
808
:you because you have a lot of experience
in that field.
809
:So how do you think industries can better
integrate Bayesian approaches into
810
:their data science workflows?
811
:Because that's basically what we ended up
talking about right now without me nudging
812
:you towards it.
813
:Yeah, I have actually indeed been thinking
about that quite a bit.
814
:So I do a lot of collaboration with
industrial partners in different domains.
815
:I think there's a couple of perspectives
to this.
816
:So one is that, I mean, people are
finally, I think they are starting to
817
:accept the fact that probabilistic
programming with kind of black box
818
:automated inference is the only sensible
way
819
:of doing statistical modeling.
820
:So looking back like 10-15 years ago,
you would still have a lot of people,
821
:maybe not in industry but in research in
different disciplines, in meteorology or
822
:physics or whatever.
823
:People would actually be writing
Metropolis-Hastings algorithms from
824
:scratch, which is simply not reliable in
any sense.
825
:I mean, it took time for them to accept
that yes, we can actually now do it with
826
:something like Stan.
827
:I think this is of course the way that to
an extent that there are problems that fit
828
:well with what something like Stan or
Priency offers.
829
:I think we've been educating long enough
master students who are kind of familiar
830
:with these concepts.
831
:Once they go to the industry they will use
them, they know roughly how to use them.
832
:So that's one side.
833
:But then the other thing is that I
think...
834
:Especially in many of these predictive
industries, so whether it's marketing or
835
:recommendation or sales or whatever,
people are anyway already doing a lot of
836
:deep learning types of models there.
837
:That's a routine tool in what they do.
838
:And now if we think about that, at least
in my opinion, that these fields are
839
:getting closer to each other.
840
:So we have more and more deep learning
techniques that are, like the variational
841
:autoencoder is a prime example, but it is
ultimately a Bayesian model in itself.
842
:This may actually be that they creep
through that all this Bayesian thinking
843
:and reasoning is actually getting into use
by the next generation of these deep
844
:learning techniques that they are doing.
845
:They've been building those models,
they've been figuring out that they cannot
846
:get reliable estimates of uncertainty,
they maybe tried some ensembles or
847
:whatnot.
848
:And they will be following.
849
:So once the tools are out there, there's
good enough tutorials on how to use those.
850
:So they might start using things like,
let's say, Bayesian neural networks or
851
:whatever the latest tool is at that point.
852
:And I think this may be the easiest way
for the industries to do so.
853
:They're not going to go switch back to
very simple classical linear models when
854
:they do their analysis.
855
:But they're going to make their deep
learning solutions Bayesian on some time
856
:scale.
857
:Maybe not tomorrow, but maybe in five
years.
Yeah, that's a very good point. I love that. And of course, I'm very happy about that, being one of the actors making the industry more Bayesian, so I have a vested interest in this. But I've also seen the same evolution you were talking about. Right now, it's not even really an issue of convincing people to use these kinds of tools. I mean, that still happens from time to time, but less and less. The question now is really more about making those tools more accessible, more versatile, easier to use, more reliable, easier to deploy in industry, things like that, which is a really good point to be at, for sure.
And to some extent, I think it's an interesting question also from the perspective of the tools. It may mean that we just end up doing a lot of Bayesian analysis on top of what we would now call deep learning frameworks, and of course there will be libraries built on top of those. Pyro, for example, is a library built on PyTorch, and NumPyro on JAX, but the syntax is intentionally similar to what people are used to from deep-learning-style modeling. And this is perfectly fine. We already use a lot of stochastic optimization routines in Bayesian inference and so on, so these frameworks are actually very good tools for building all kinds of Bayesian models. I think this may be the layer where the industry use happens; they need GPU-type scaling and everything anyway, so we should be happy to have our systems work on top of these libraries.
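To make that concrete, here is a minimal sketch, not something discussed in the episode, of what Bayesian modeling on top of a deep learning framework can look like: a small regression model in NumPyro, which runs on JAX and therefore inherits GPU scaling and the array-programming style deep learning practitioners already know. The model, data, and sampler settings are illustrative assumptions.

```python
# Minimal sketch (illustrative only): Bayesian linear regression in NumPyro.
# NumPyro runs on JAX, so the same code scales to GPU like deep learning code.
import jax.numpy as jnp
import jax.random as random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def model(x, y=None):
    # Priors over the regression parameters
    alpha = numpyro.sample("alpha", dist.Normal(0.0, 1.0))
    beta = numpyro.sample("beta", dist.Normal(0.0, 1.0))
    sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
    # Likelihood, vectorized over all observations
    numpyro.sample("y", dist.Normal(alpha + beta * x, sigma), obs=y)

# Simulated data, purely for illustration
x = jnp.linspace(-2.0, 2.0, 100)
y = 1.0 + 0.5 * x + 0.3 * random.normal(random.PRNGKey(0), (100,))

# NUTS sampling, run just like any other JAX computation (CPU or GPU)
mcmc = MCMC(NUTS(model), num_warmup=500, num_samples=1000)
mcmc.run(random.PRNGKey(1), x, y=y)
mcmc.print_summary()
```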
Yeah, very good point. And to come back to one of the points you made in passing, education is helping a lot with that. You have been educating the data scientists who now go into industry. And I know that in Finland (in France, where I'm originally from, not so much) there is this really great integration between the research side, the university, and industry. You can really see it in the PhD positions, in the professorship positions, and so on. I think that's really interesting, and it's part of why I wanted to talk to you. So to go back to the education part, what challenges and opportunities do you see in teaching Bayesian machine learning, as you do, at the university level?
Yeah, it's challenging, I must say, especially once we get to Bayesian machine learning proper. It is a combination of two topics that are each somewhat difficult in themselves. If we want to talk about normalizing flows and also about statistical properties of estimators or MCMC convergence, those require different kinds of mathematical tools, and they require a certain level of expertise on the software and programming side. What that means in practice is that if we look at the population of, let's say, data science students, there will always be a lot of people missing background on one of these sides. So it is a difficult topic to teach. If it were a small class, it would be fine, but it appears that at least our students are really excited about these things. I can launch a course explicitly titled Bayesian machine learning, which is an advanced-level machine learning course, and I will still get 60 to 100 students enrolling. That means that within that group there are going to be some CS students with almost no background in statistics, and some statisticians who certainly know how to program but are not really used to thinking about GPU acceleration of a very large model. But it's interesting, and it's not an impossible thing. I think it is a topic you can teach at a sufficient level for everyone, so that everyone is able to understand the basic reasoning behind why we are doing these things. Some of the students may struggle to figure out all the math behind it, but they might still be able to use these tools very nicely; they might be able to say that if I make this or that modification, I see that my estimates are better calibrated. And some others will then go deeper into figuring out why these things work. So it just needs a bit of creativity in how we do it and in what we expect from the students: what should they know once they've completed a course like this?
:Yeah, that makes sense.
942
:Do you have seen also an increase in the
number of students in the recent years?
943
:Well, we get as many students as we can
take.
944
:So I mean, it's actually been for quite a
while already that in our university, by
945
:far the most...
946
:popular master's programs and bachelor's
programs are essentially data science and
947
:computer science.
948
:So we can't take in everyone we would
want.
949
:So it actually looks to us that it's more
or less like a stable number of students,
950
:but it's always been a large number since
we launched, for example, the data science
951
:program.
952
:So it went up very fast.
953
:So there's definitely interest.
Yeah. That's fantastic. So, I've been taking a lot of your time, and we're going to start closing up the show, but there are at least two questions I want to get your insight on. The first one is: what do you think the biggest hurdle in the Bayesian workflow currently is? We've talked about that a bit already, but I'm curious to get your structured answer.
Well, I think the first thing is getting people to actually start using more or less systematic workflows. The idea is great, and we know more or less how we should be thinking about it, but it's a very complex object. We can tell experts, statisticians, that this is roughly how you should do it, and then we still have to convince them, almost force them, to stick to it. But especially if we think about newcomers, people who are just starting with these things, it's very complicated. If you need to read a 50-page or 100-page book about the Bayesian workflow to even know how to do it, that's a real barrier. So I think in the long term we are going to get tools for assisting it, really streamlining the process. I'm thinking of something like an AI assistant for a person building a model, one that pulls you aside and says: I see that you are trying to go there and do this, but you haven't done prior predictive checks; I already created some plots for you, please take a look and confirm whether this is what you were expecting. It's going to take a lot of effort to create those. It's something we've been trying to think about, how to do it, but it's still open. I think that's where the challenge is. We know most of the stuff within the workflow, roughly how it should be done, or at least we have good enough solutions. But really helping people to actually follow these principles, that's going to be hard.
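As a small illustration of the kind of step such an assistant could automate, here is a sketch of a prior predictive check done by hand with PyMC and ArviZ. The model and numbers are made up for the example; the point is only that the assistant would generate plots like this for you and ask whether they match your expectations.

```python
# Hypothetical example of a prior predictive check with PyMC and ArviZ.
import numpy as np
import pymc as pm
import arviz as az

x = np.linspace(0.0, 10.0, 50)  # made-up predictor values

with pm.Model():
    alpha = pm.Normal("alpha", 0.0, 1.0)
    beta = pm.Normal("beta", 0.0, 1.0)
    sigma = pm.HalfNormal("sigma", 1.0)
    # Placeholder observations: only used here to fix the shape of y
    pm.Normal("y", alpha + beta * x, sigma, observed=np.zeros_like(x))

    # Simulate datasets from the priors alone, before touching the real data
    prior_pred = pm.sample_prior_predictive(500)

# Visualize the simulated outcomes and check that they look plausible
az.plot_ppc(prior_pred, group="prior")
```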
Yeah, yeah, yeah. But damn, that would be super cool. Talking about something like a Jarvis, you know, the AI assistant, but for Bayesian models, how cool would that be? Love that. And looking forward, how do you see Bayesian methods evolving with artificial intelligence research?
Yeah, I think... For quite a while I was about to say that I've been building on this basic idea that deep learning models as such will become more and more Bayesian anyway, so that's kind of a given. But now, of course, the recent very large-scale AI models are getting so big that computational resources become a major hurdle; learning those models even in the crudest possible way is expensive. There are clearly needs for uncertainty quantification in the large language model kind of scope. They are really quite unreliable, and they're really poor at, for example, evaluating their own confidence. There have been examples where, if you ask how sure it is about a statement, it gives a similar number more or less irrespective of the statement: yeah, 50% sure, I don't know. So at least in the very short run, it's probably not going to be Bayesian techniques that solve all the uncertainty quantification in those types of models. In the long term, maybe it is. But it's going to be interesting. It looks to me a bit like a lot of the stuff built to address specific limitations of these large language models consists of separate components: some sort of external tool that reads in those inputs, or an external tool that the LLM can use. So maybe this is going to be a separate element that somehow integrates. An LLM could, of course, have an API interface where it can query, let's say, Stan to figure out an answer to the type of question that requires probabilistic reasoning. People have been plugging things in; there are famous public examples where the LLM can query mathematical reasoning engines and so on, so that if you ask a specific type of question, it goes outside its own realm and does something. It already kind of knows how to program, so maybe we just need to teach LLMs to do statistical inference by actually running an MCMC algorithm on a model that they specify together with the user. I don't know whether anyone is actually working on that; it's something that just came to my mind, so I haven't really thought about it too much.
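Purely as a thought experiment in code, and explicitly not something Arto or anyone is confirmed to be building, such a "statistical inference tool" for an LLM could look like a function the model is allowed to call, which fits a Stan model with MCMC and returns a posterior summary. The tool name, data, and model file below are invented for illustration; the CmdStanPy calls themselves are standard.

```python
# Hypothetical LLM tool: run MCMC on a user-specified Stan model via CmdStanPy
# and return a compact posterior summary the LLM can reason about.
from cmdstanpy import CmdStanModel

def run_mcmc_tool(stan_file: str, data: dict, query_var: str) -> dict:
    """Fit the Stan model to the data and summarize one posterior variable."""
    model = CmdStanModel(stan_file=stan_file)
    fit = model.sample(data=data, chains=4, iter_sampling=1000)
    draws = fit.stan_variable(query_var)
    return {
        "variable": query_var,
        "posterior_mean": float(draws.mean()),
        "posterior_sd": float(draws.std()),
    }

# The LLM would fill in the arguments from its conversation with the user, e.g.:
# result = run_mcmc_tool("conversion.stan", {"N": 120, "y": 17}, "theta")
```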
Yeah, but again, we're getting so many PhD ideas for people right now.

We are.

Yeah, I feel like we should be doing a best-of of all your awesome PhD ideas.
Awesome. Well, I still have so many questions for you, but I don't want to take too much of your time, and I know it's getting late in Finland. So let's close up the show with the last two questions I always ask at the end. First one: if you had unlimited time and resources, which problem would you try to solve?
Let's see. The lazy answer is that I'm already trying, well, not with unlimited resources, but I'm really trying to tackle this prior elicitation question. I think for most of the other parts of the Bayesian workflow we have reasonably good solutions, but this whole question of how to figure out complex multivariate priors over arbitrarily complex models is still open. That's a very practical thing that I am investing in. But if the resources really were infinite, then maybe I would continue with the quick idea we just talked about: really getting probabilistic reasoning into the core of these large-language-model-type AI applications, so that they would reliably give proper probabilistic judgments on the kind of decision-making and reasoning problems we ask of them. That would be interesting.
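For readers curious what tool-assisted prior elicitation looks like today, here is a tiny sketch using PreliZ's maximum-entropy helper; the choice of distribution, bounds, and probability mass are made-up illustration, not values from the episode.

```python
# Illustrative prior elicitation with PreliZ: instead of guessing parameters,
# state a plausible range and let the tool find a matching prior.
import preliz as pz

prior = pz.Gamma()
# Find the maximum-entropy Gamma with ~90% of its mass between 1 and 10
pz.maxent(prior, lower=1, upper=10, mass=0.9)
print(prior)  # inspect the elicited parameters
```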
Yeah. Yeah, for sure.
And second question: if you could have dinner with any great scientific mind, dead, alive, or fictional, who would it be?
Yes, this is something I actually thought about, because I figured you would be asking me as well. And I went with a fictional character, I like fictional characters: Daniel Waterhouse from Neal Stephenson's Baroque Cycle books. They are kind of semi-historical books about the era when Isaac Newton and others were living and establishing the Royal Society, with a lot of high-fantasy components involved. Daniel Waterhouse in those novels is the roommate of Isaac Newton and a friend of Gottfried Leibniz, so he knows both sides of this great debate about who invented calculus and who copied whom. If I had dinner with him, I would get to talk about these innovations, which I think are among the foundational ones, but I wouldn't actually need to get involved with either party. I wouldn't need to choose sides, whether it's Isaac or Gottfried I would be talking to.
Love it. Yeah, love that answer. Make sure to record that dinner and post it on YouTube; I'm pretty sure lots of people would be interested in it. Fantastic. Thanks a lot, Arto, that was a great discussion. I'm really happy we could go through, well, not the whole depth of what you do, because you do so many things, but a good chunk of it. As usual, I'll put resources and a link to your website in the show notes for those who want to dig deeper. Thank you again, Arto, for taking the time and being on this show.

Thank you very much. It was my pleasure. I really enjoyed the discussion.
This has been another episode of Learning Bayesian Statistics. Be sure to rate, review, and follow the show on your favorite podcatcher, and visit learnbayesstats.com for more resources about today's topics, as well as access to more episodes to help you reach a true Bayesian state of mind. That's learnbayesstats.com. Our theme music is Good Bayesian by Baba Brinkman, feat. MC Lars and Mega Ran. Check out his awesome work at bababrinkman.com. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. You can support the show and unlock exclusive benefits by visiting patreon.com/learnbayesstats. Thank you so much for listening and for your support. You're truly a good Bayesian, change your predictions after taking information in. And if you're thinking I'll be less than amazing, let me show you how to be a good Bayesian. Change calculations after taking fresh data in. Those predictions that your brain is making, let's get them on a solid foundation.