Alex:
00:00:05
Today I am excited to host Vincent Fortuin, a leading researcher in Bayesian deep learning and AI for science. Vincent is a tenure-track research group leader at Helmholtz AI in Munich, where he leads the Efficient Learning and Probabilistic Inference for Science group. In this episode, we explore why traditional deep learning often struggles in scientific applications, and how incorporating prior knowledge and uncertainty quantification can enhance model reliability. Vincent shares his insights on generative AI, meta-learning, and inference techniques like Laplace and subspace inference, explaining how they contribute to more efficient and robust AI models. We'll also discuss the current landscape of Bayesian deep learning libraries, the challenges of real-world applications, and the role of PAC-Bayesian theory in providing generalization bounds. Whether you're an AI researcher or someone interested in the intersection of deep learning and science, this episode is packed with insights into the future of reliable and data-efficient AI.

00:01:11
This is Learning Bayesian Statistics, episode 129, recorded November 22, 2024.
00:01:24
Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the projects, and the people who make it possible. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. For any info about the show, learnbayesstats.com is Laplace to be: show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on Patreon, everything is in there. That's learnbayesstats.com. If you're interested in one-on-one mentorship, online courses, or statistical consulting, feel free to reach out and book a call at topmate.io/alex_andorra. See you around, folks, and best Bayesian wishes to you all. And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can help bring them to life. Check us out at pymc-labs.com.
00:02:32
Hello my dear Bayesians! Well, I hope that you are doing well, and before we dive into this episode, I want to thank some new patrons, namely Ejit Asik and Suyog Chandramouli. I hope I am not butchering your names, guys, but thank you so much for supporting the show on Patreon. Well, not exactly Patreon, actually on YouTube. You guys are the first ones to support the show on YouTube, on the Good Bayesian tier and above. So well done, guys, and thank you so much. Thank you so much not only for being the first ones to support the show on YouTube, but to support the show, period. Really, as you know, this is exactly the support that makes the show possible: I pay for editing, hosting, recording, all that stuff that you don't see that goes into producing the episodes. Well, I pay for that thanks to your support. So thank you so much. Make sure to link your YouTube account to your Discord account, and that way you will be automatically added to the LBS Discord server, and well, I can't wait to see you in there.

00:03:54
If other people are interested in supporting the show on YouTube, you can just go to the YouTube channel, LBS, Learn Bayes Stats. You look that up on YouTube, and then you will see a membership tab, and you'll have all the info in there. That should be super easy to set up with your YouTube account. You just link your Discord account then, and you're all done. So on that note, thank you again, guys. I will see you in the Discord. And now, onto the show.
Alex:
00:04:24
Vincent Fortuin, welcome to Learning Bayesian Statistics.
Vincent:
00:04:30
Hi Alex, great to be here.
Alex:
00:04:32
Yeah, thank you for taking the time. How was my Dutch pronunciation?
Vincent:
00:04:36
Perfect, I mean, impeccable. Although, you know, I have to say there's this weird phoneme in Dutch that I personally can't actually pronounce that well either, because I grew up in Germany, but that's definitely as close as I would come. So well done.
Alex:
00:04:50
... wants to contribute more to BayesFlow, and I keep trying, but work keeps getting in the middle. My secret wish is that at one point I will find a way to use amortized Bayesian inference for the Marlins, and then I can actually contribute to BayesFlow for my job. So that's what I'm trying to do, but right now I have to focus on other priorities, still contributing to other open source packages. Sorry about that, Marvin. I'm trying, I'm trying.
Vincent:
00:05:27
That's perfectly fine. Yeah, sure. So I didn't have the most straightforward path. I started off studying biochemistry in my undergrad. And I don't know if you've ever been to a biochemistry lab. Essentially, it's this kind of place where you pipette little watery liquids into each other, and it takes you a week, and then you stick it into some machine that goes, merp, you did it wrong, start from scratch. I'm caricaturing, but I guess I was just a bit too clumsy for the experiments, so they never worked out, and I found that a bit frustrating. And I realized that I could handle computers much better than pipettes, so I moved on to bioinformatics.

00:06:04
And when I did my master's in bioinformatics, it was just about the time when you would see all these papers claiming that all these algorithms that bioinformaticians had worked on for decades were essentially being beaten by deep learning solutions, right? So I wanted to be on the right side of history and moved into deep learning for my PhD. But what I realized was that a lot of this hype in AI for science was not really delivering on the promises being made, because scientists really have a lot of prior knowledge about their field that they want to get into these models. And they also are really careful about the kind of predictions they make, right? They don't just want high accuracy, they want well-calibrated predictions that they can really generate insight out of. And normal deep learning didn't quite fit the bill, right? And so that's how I got into Bayesian deep learning, where the hope is of course to marry the expressive power of deep learning with all the promises that Bayesian statistics usually gives us, which is putting prior knowledge into the models, getting uncertainties, and just making optimal decisions in some sense.
Alex:
00:07:15
Okay, okay, I see. That's super cool. I really like the meandering path; it's, you know, an illustration of randomness, and a lot of good things actually come from randomness. So that's great. And so today, what are you focusing on? And, you know, what does it mean to be a researcher in Bayesian deep learning? Because to me, it sounds like the conjunction of three extremely highly rated SEO keywords.
Vincent:
00:07:48
Yeah, for sure. I mean, there's definitely a lot of the work that we're doing in our lab that's still on the method side, trying to be better at doing inference in these Bayesian models and trying to, you know, just make them more reliable and robust. But we also look a lot into application areas like AI for science. As I said, my background was in science originally, so I'm still trying to follow through on that a little bit at least. And a particularly interesting area that we're looking at right now is sequential learning, so in the context of Bayesian optimization or Bayesian experimental design. I know you had Desi on the podcast recently, so that kind of stuff.

00:08:32
And obviously, these days, if you do anything related to deep learning, you can't ignore that we have these big foundation models and LLMs and these kinds of things now. So some of the work we do is also trying to figure out how we can fit into that space and how we can make them more Bayesian in some way, which probably doesn't make much sense in the pre-training, because you have more or less infinite data anyway. But in the fine-tuning, we've done some work that is quite interesting: typically, if you have your big GPT or whatever LLM, and you want to fine-tune it on a tiny data set in your target domain, this is where you really start caring about the uncertainties. And then if you fine-tune it in a way that's inspired by Bayesian updating, that usually gives you much better calibration than if you do the standard fine-tuning.
Alex:
00:09:20
Okay, I see. That's pretty cool. I didn't know that. Let's dive a bit into that, because I want to ask you a bit more about Bayesian deep learning and so on, but I'm curious about that. Like, how would that work? What would be the workflow here where you would use Bayesian statistics, or I'm guessing more prior knowledge, and infuse that into the fine-tuning of the LLMs? Because, I mean, I'm not surprised by that: last time I checked and read more about these methods, fine-tuning was kind of a human-heavy element of the LLM workflow. So I'm really curious to hear about that.
Vincent:
00:10:04
Yeah, definitely. So I think philosophically, the way I think about normal fine-tuning is also from a somewhat Bayesian viewpoint, right? If you pre-train your LLM on the entire text on the internet, you can somehow view that as a way to encode all the prior knowledge that is on the internet into some condensed form, which now comes as a point estimate of the parameters of some big transformer model. And then what you typically do these days in fine-tuning is this idea of parameter-efficient fine-tuning, right? So you actually keep this big model fixed, the backbone, as people would say, and then you add these low-rank adapters like LoRA, or there are more modern versions called VeRA or whatever. So what you end up doing is you have this big model with billions of parameters, but then you have your small adapters with just a few million parameters that you fine-tune, and they actually guide the big model towards the task you care about.

00:11:04
And so what we did is essentially, within these small parameter-efficient adapters, to then treat them in a Bayesian way, right? So the big model is still just a fixed backbone of a point estimate, and that's essentially, in some way, our prior. But then we use these small adapter layers to do Bayesian inference. Because they're, you know, very small, as I said, you can actually do Bayesian inference quite efficiently, as opposed to the big neural network or the big transformer, where you couldn't do it.
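For listeners who want to make this concrete, here is a minimal sketch of the general idea in PyTorch. It is an illustration under my own assumptions, not Vincent's exact method: a frozen backbone layer plus a factorized Gaussian over a low-rank, LoRA-style update, sampled with the reparameterization trick. The class name and initialization values are hypothetical, and in training you would also add a KL term against the prior to the loss.

# Illustrative sketch: Bayesian low-rank adapter over a frozen backbone.
import torch
import torch.nn as nn

class BayesianLoRALinear(nn.Module):
    def __init__(self, backbone: nn.Linear, rank: int = 8):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad_(False)  # frozen point-estimate backbone, "the prior"
        d_out, d_in = backbone.weight.shape
        # Variational parameters of the Gaussian over the adapter matrices.
        self.A_mu = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.A_logstd = nn.Parameter(torch.full((rank, d_in), -5.0))
        self.B_mu = nn.Parameter(torch.zeros(d_out, rank))
        self.B_logstd = nn.Parameter(torch.full((d_out, rank), -5.0))

    def forward(self, x):
        # Sample the low-rank update (reparameterization trick).
        A = self.A_mu + self.A_logstd.exp() * torch.randn_like(self.A_mu)
        B = self.B_mu + self.B_logstd.exp() * torch.randn_like(self.B_mu)
        return self.backbone(x) + x @ A.t() @ B.t()

layer = BayesianLoRALinear(nn.Linear(512, 512))
x = torch.randn(4, 512)
samples = torch.stack([layer(x) for _ in range(10)])  # predictive samples
print(samples.std(dim=0).mean())  # uncertainty induced by the adapter alone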
Alex:
00:11:33
Okay, I see. This is awesome. I didn't know that was the case yet. So actually, could you now define, you know, Bayesian deep learning? Maybe what's deep learning in comparison to machine learning, for instance, and what makes it Bayesian?
Vincent:
00:11:52
Yeah, sure. Okay, I'm going to give you the more traditional view first, and then a second one that I prefer. So in the traditional sense, if you think about what deep learning is, it's essentially trying to learn functions that are parametrized by these artificial neural networks, right? It's essentially an architecture that has an input layer where you put your data, and then it propagates through several layers, so that's where the "deep" comes from, and at the end there's an output layer that gives you the predictions. So algebraically speaking, it's just a bunch of matrices, which are your weights, that you multiply with the input vector, and then you apply some nonlinearity, which is the activation function of the neural network. Now, this is a really powerful way of learning functions, because you can prove that if you make your network big enough, it can approximate any function; that's what's called the universal approximation theorem. And, you know, using techniques like backpropagation, we can actually do this quite efficiently on GPUs, for instance. So that's why this whole deep learning paradigm has become so popular: we just happen to have the hardware that can do these matrix-vector products quite efficiently.

00:13:12
And so now, once you have this neural network idea, this deep learning, the classic idea to make it Bayesian is to say: instead of just learning a single setting of the parameters, you learn a distribution over parameters, right? So you don't just have one setting for your weights, but you have, for instance, a Gaussian distribution over weights, which is then defined by a mean and a covariance matrix. And then you have to figure out how to get to that distribution, and one intuitive way is to do it via Bayesian inference: you write down some distribution that is your prior, then you observe your data, you use Bayesian inference to update, and then you get a posterior. And from that posterior, you can sample different parameters for the network, which will then give you different predictions. So each sample of the parameters gives you a different set of predictions, and then you can use that to quantify the uncertainty in your prediction space.
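In symbols, what Vincent describes here is the standard Bayesian posterior predictive, approximated by Monte Carlo over weight samples (textbook notation, added for reference):

p(y | x, D) = \int p(y | x, \theta) \, p(\theta | D) \, d\theta \approx \frac{1}{S} \sum_{s=1}^{S} p(y | x, \theta_s), \qquad \theta_s \sim p(\theta | D),

where p(\theta | D) \propto p(D | \theta) \, p(\theta) is the posterior over the network weights \theta.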
00:14:06
So this is kind of the classic textbook view of what Bayesian deep learning is, right? The way I personally like to think about it is slightly the other way around. And I think this is where the difference between statistics and machine learning comes in. In statistics, you really care about the parameters of the model, right? So if you build a model for how well different players end up performing, then these are actually interpretable parameters: you know which parameter is which player, and you infer a posterior, and that tells you something about the real world. In this neural network setting, of course, these parameters are just arbitrary. We just build this big model; we could have built it twice as big or half as big, and then these parameters would be different. But ultimately, what we care about is the function we're learning. We're trying to learn a function that maps from inputs to outputs, and we just use this neural network as a convenient shape for that function, because we know that it can approximate things well.

00:15:04
And so that's how I like to think about Bayesian deep learning: rather as Bayesian inference in function space. So very similar to how a Gaussian process would work, right? In a Gaussian process, you essentially also have a distribution over functions, but it's a very restricted one, because it's Gaussian. And in the Bayesian deep learning sense, you could say we have a very flexible distribution over functions, because we know that these neural networks can essentially fit any function we want. And then the main thing we have to care about is: how does our posterior in function space look? We don't really care about the parameters; that's just the means towards the end of getting a distribution over functions that fits our data well.
Alex:
00:15:47
Hey, very interesting, because I was going to ask you about that. We've talked already on the show several times about the fact that neural networks actually converge to Gaussian processes, and the way you describe it is extremely close to what Gaussian processes are doing. So I was going to ask what the difference is between, well, a deep neural network and a Gaussian process, since it seems like they are very close to each other. Anyways, that's super cool to hear.

00:16:29
And in practice, for someone who would like to try this out, which library would you recommend looking at?
Vincent:
00:16:33
Yeah, so that's a good question. It's a bit of a pain point, maybe. Right now, I guess we have this issue that there isn't one library that rules them all, right? And that's why you mentioned BayesFlow before; I think this is a great effort to try to put everyone on one ship. In Bayesian deep learning, we currently don't have that. We have different libraries for different types of inference. So there's a Laplace library in PyTorch that does Laplace inference quite well; that's something that I've worked with quite a lot. But there's also one that is called Tihi, which is doing MCMC inference, and there's one that's Bayesian-Torch, which does variational inference. And all these libraries are essentially maintained by different people and need slightly different ways of defining your model.

00:17:29
And the problem is really that, a priori, you don't actually know which is the right inference, right? So if you really want to do stuff, you probably have to install all three libraries and try all of them. And of course, in practice, most practitioners don't want to do this. So I think that's one of the main problems: we have a lot of papers where we show academically that it can really make a difference, but then in the real world, people just don't want to go through the hassle of having to figure out what's the right library and so on. And if they can just use normal deep learning in two lines of code, they don't want to spend more than that on Bayesian methods. So I think this is really something where the community still has to come together a bit more and build tools that maybe have a joint API that can then talk to all these other libraries.
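To give a flavor of the Laplace library Vincent mentions (the laplace-torch package), here is a rough post-hoc usage sketch. The argument names reflect my reading of the package's documentation at the time of writing, so treat them as assumptions and check the current API; the data and model are toys.

# Hedged sketch: post-hoc Laplace approximation with laplace-torch.
import torch
from torch.utils.data import DataLoader, TensorDataset
from laplace import Laplace  # pip install laplace-torch

# Toy data and model; in practice `model` would already be trained.
X, y = torch.randn(256, 20), torch.randint(0, 2, (256,))
train_loader = DataLoader(TensorDataset(X, y), batch_size=32)
model = torch.nn.Sequential(
    torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2)
)

la = Laplace(model, "classification",
             subset_of_weights="last_layer",  # be Bayesian about the last layer only
             hessian_structure="kron")        # Kronecker-factored Hessian approximation
la.fit(train_loader)                          # estimate the Hessian around the trained weights
la.optimize_prior_precision()                 # tune the prior via the marginal likelihood

probs = la(torch.randn(5, 20))                # posterior-predictive class probabilities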
Alex:
00:18:17
Yeah, and that makes sense, also because that's really at the frontier of research. So the time for the research to trickle down to tools you can actually use is quite normal, you know. That's what happens with PyMC, for instance, which I'm one of the core developers of. For instance, if you take HSGPs, so Hilbert space decompositions of GPs, there's the time it takes until we read the paper, understand it deeply, implement a first version, deploy it into PyMC, and make sure everything works and doesn't break anything. It takes time. It's already quite fast, but it always takes quite some time. But that doesn't mean the alternative would be faster, yeah.

00:19:12
But yeah, so I'll link to these three libraries that you've named in the show notes anyway, for people who are interested. For sure, having to understand which method to use beforehand, that's a pain point for practitioners, because I'm guessing most of them are not specialized. It's not so much that they don't want to; I'm guessing it's that they absolutely don't know how to do that. So that's for sure. And then the classic deep learning libraries, I guess these are PyTorch, TensorFlow, and I always forget the third one, you remember?
Vincent:
00:19:53
JAX, I guess, is one of the big ones.
Alex:
00:19:56
Ah yeah, of course, JAX. How can I forget JAX? I use it all the time with PyMC. Yeah, so yeah. Anyways, these are good references for people to try out if they want to. You don't have to do Bayesian stuff with that, but that's already something to get familiar with neural networks.
Vincent:
00:20:17
Yeah, I mean, definitely, if people from the more open-source software engineering community want to get involved, I think there are a lot of open problems we could need help with, because we're more like these academic types: we write our little GitHub repo and link it in our paper, but then, just as you said, putting it into production on the level of PyMC is a whole other problem, for which you also need qualified people who actually know how to do this well.
Alex:
00:20:46
Yeah, for sure. Good point. And actually, so I know you're more on the algorithm side, but I'm wondering if you have some real-world applications where Bayesian deep learning has significantly improved outcomes?
Vincent:
00:21:05
Yeah, that's a great question. So I think, for the reason that I mentioned before, that somehow we use it a lot in research but we haven't really made it easy enough for people to use, people still have this preconception that Bayesian deep learning just doesn't work, because they don't see it used very much. But if you actually look into the papers that people write, there are all kinds of applications, like healthcare, drug discovery, astrophysics, climate science, robotics, autonomous driving, and so on, where it actually can make a difference. You know, a lot of these are projects where you have some domain experts working with someone like me, right? Some researchers from Bayesian deep learning. And then we can show, on a project-by-project level, that it actually has a positive impact. But of course, yeah, for the wider impact, we would need to make it more usable for people. But I think it's really very promising to see that in all these different areas there have been attempts to use it, and that it actually has made a difference there.

00:22:11
And just for people who want to read a bit more about what the pros and cons are and how it's being used in all these fields, maybe I can shamelessly plug a little position paper that we wrote and published at ICML this year, which was co-written with a whole bunch of very, very good co-authors from different institutions. Essentially, there we try to make an argument for why, today, we still need to use Bayesian deep learning, probably more than ever, just because AI is so pervasive in the world, and to make it more reliable and trustworthy, that's one way of doing it.
Alex:
00:22:50
Yeah, I love that. Yeah, for sure. And I totally agree with your point that, yeah, seeing more real-world applications is definitely something that's going to inspire people to use these kinds of methods much more, in addition to the whole workflow convenience and package convenience that we just mentioned before. I'm actually curious: in which cases would you recommend people look at deep learning, or Bayesian deep learning in particular, and in which cases would you think, no, that won't be useful here, or that's overkill?
Vincent:
00:23:29
Yeah, yeah, that's a great question. So I think the main properties that make a problem interesting for Bayesian deep learning are, first, if you have some kind of prior knowledge, right? Typically, in a lot of sciences, that's the case. Then, secondly, if there are certain decisions that you want to make with the predictions that depend on your uncertainty. And I mean, medicine is a classic example, right? If you have a diagnostic machine learning system, you probably care about whether the system is 99.9% certain that the patient has a certain disease versus, like, 70%, because that might change whether you treat them immediately or run another test or something. And I guess the third thing is if your data are kind of expensive to generate. So, typically in the sciences again: I don't know, if you have a chemical experiment and it costs you a few thousand bucks to run each experiment, then you can't generate billions of data points; you're quite limited in how much data you can generate. And then being Bayesian really helps you get the most out of your data.

00:24:38
On the contra side, again, if you want to pre-train a language model on, like, 15 trillion tokens that you scraped from the internet, you probably don't need to be Bayesian, because you have a large data set, and you probably don't have any better prior knowledge than what's written on the internet anyway. So maybe there it's fine to just use normal deep learning.
Alex:
00:24:59
Okay, yeah. So basically, if you have prior knowledge, Bayesian deep learning; if you don't, and/or have a lot of data, then classic deep learning will be useful to you. And well, how much data is enough data for classic deep learning, would you say?
Vincent:
00:25:21
Yeah, that's really, I mean, a super hard question. I think it really depends on your problem, right? I mean, first, it depends on the dimensionality, obviously. If you have a one-dimensional problem, then 100 data points is probably already a lot in one dimension. But if you have a problem that's a million-dimensional, then you need, you know, more data points. And then it depends how complex your problem is: if it's just a binary classification, maybe you can make do with relatively few data points to fit some decision boundary, but if you want to do 1,000-class classification, like on ImageNet or something, then you might need more data points, right? So I think it's very hard to give any, you know, run-of-the-mill number. But yeah, I definitely think if you look at your data, and you randomly sample data points, and they start looking very similar, then you probably have a lot of data. Whereas if you sample data points and they all look very different, then you're probably in the low-data regime to some extent, and then maybe it helps you more to be Bayesian there.
Alex:
00:26:25
And also, from what I understood from talking with Marvin and the BayesFlow team, something that's very important for you to be able to apply at least amortized Bayesian inference, I don't know about the other methods, is the number of parameters, as you were saying, and dimensions in your model. For instance, most of my data is very hierarchical, and my models have tons of parameters and dimensions. And in these cases, it seems like it's not very useful to use amortized Bayesian inference, because then, if I understood correctly, the neural network will be very hard to learn. Whereas if you have, as you were saying, something that's lower-dimensional but with a lot of data, well, then that's a clearer use case. Am I summarizing that well?
Vincent:
00:27:19
Yeah, I mean, I think for normal deep learning, that's definitely true. I guess in Bayesian deep learning, it's always a question of how good your prior is, right? So, for the sake of argument, let's assume you actually already know what the solution is, and you can write this down as a prior. Then you don't need any training data, and you already have a good posterior, right? So it also always depends on this. If your problem is very hard and high-dimensional, as you say, but you actually already kind of know what the right solution probably is, and you just want to fine-tune it a little bit to fit the last wiggles, then you can do that quite well. But yeah, if you don't know much a priori, then of course, the more complex the problem is, the harder it will be to learn.
Alex:
00:28:04
Okay. Yeah. I see. That's interesting for me to really understand that. And something you work on quite a lot is also something that's called data-efficient AI. I'm wondering what that is, mainly, you know, and if you could discuss your work on it, and especially, if I understood correctly, there is a relationship between deep generative modeling and data-efficient AI.
Vincent:
00:28:35
Yeah, definitely. Yeah, I mean, the data efficiency is really what comes from this idea of having prior knowledge, right? As I just said, essentially, if you have a perfect prior, then you're maximally data-efficient, because you don't need any data and you already solve your problem. And this is really what, in many scientific applications, is quite useful. You know, as I said, in chemistry, it's very expensive to generate data, but on the other hand, you have a whole library full of chemistry books that tell you a lot about how that field works. So the hope would be that you can extract some of that prior knowledge, put it in your model, and then you don't need to see as many data points to make progress.

00:29:16
The connection to generative AI is also quite interesting, because, on some level, Bayesian deep learning and generative AI are quite related, in that they both model joint distributions. In the Bayesian deep learning case, you have a joint distribution between the parameters of the model and the predictions, right? While in generative AI, you typically model a joint distribution over the data itself, so between inputs and outputs. But, you know, because they both care about modeling joint distributions over different things, you can quite fruitfully exchange ideas between the two. Typically, one way would be to say we can actually use generative AI tools to do the Bayesian inference better, and that's maybe along the lines where people might say: these days, we have these powerful diffusion models for generative modeling, and we can use diffusion models now to model posteriors of Bayesian neural networks, for instance. And the other way is obviously the other way around, where you can say: let's take one of these big generative models and try to infuse it with some Bayesian prior knowledge to make it more data-efficient, right? And so this is a bit like saying: yeah, if we already know what kind of antibiotics look like, and I want to build a diffusion model that can produce new target molecules that look like antibiotics, maybe I can put some prior knowledge in there, so it doesn't have to see as many of them to learn how to model them.
Alex:
00:30:45
Yeah, it's fascinating. I really like that. And it's all intertwined in everything you're doing. That's so cool. And is that related to some...

Vincent:
00:30:52
Oh, I should maybe also mention that I did write another position paper on generative AI. So maybe we'll also put that in the show notes if people are interested.

Alex:
00:31:01
Yeah, yeah, for sure, for sure. That's going to be super interesting. Yeah, yeah. And how is that related to another interest of yours, meta-learning? How does that interact with your research, and what advancements have you observed in this area?
Vincent:
00:31:17
Yeah, that's another good question. So essentially, as I said, one of the main things in Bayesian deep learning is to have a good prior, right? The better your prior is, the better your whole model is going to be, and it's going to be more data-efficient and hopefully more calibrated. But writing down priors by hand can sometimes be challenging. Sometimes, if you go to a medical doctor and you tell them we're trying to build this model to predict some disease, and ask what their prior is, they might not be able to actually tell you that much that you can put in there. So one way to get these priors is to use meta-learning, which is essentially some way to look at other tasks that you've solved before that are similar to the problem you care about, and then you use the knowledge from those related tasks to make the performance on your target task better.

00:32:07
So you can actually view meta-learning as a hierarchical Bayesian model, where you essentially have a distribution over tasks at the top level, and then you have the different tasks as different Bayesian inference problems, but you use the previous task knowledge to inform the prior on every new task you see. And so this is how meta-learning can really help you get better priors. And then, of course, there's the other way around, where you can actually say we can also use meta-learning as a tool to learn how to do Bayesian inference. And that's, you know, the amortized kind of inference idea that you mentioned, that Marvin and others are working on, where you really meta-learn how to do the Bayesian inference, so you don't actually have to run the whole Bayesian inference routine, but you can use some meta-learned neural network or something to do that inference for you. And for instance, one class of models that do something like that are neural processes, which are kind of a way of framing, you know, Gaussian processes with neural networks, so that comes back to what we talked about earlier. And that's one of these meta-learning frameworks for Bayesian inference that we're also quite interested in, in my group.
Alex:
00:33:12
Okay. I see. So meta-learning would be learning about the models themselves. Am I understanding that right?
Vincent:
00:33:24
Yeah. So meta-learning is really like: you try to look at some previous task and then say, okay, now that I've solved this task, what can I do better next time? Right? So in the normal world, you might think about, you know, learning several languages. I know you speak different languages. I think every new language gets a little bit easier, because you have all these previous ones, and you can kind of reuse some of that knowledge to have a better prior of what the next one might be like, right?
Alex:
00:33:58
Okay. Okay. I see. Interesting. And so, is that related to PAC-Bayesian theory, which is another thing you're doing in your lab? Can you explain what that is, and, yeah, basically why that's useful, why it's relevant to your work?
Vincent:
00:34:16
Yeah. Yeah, yeah. So it's not just moving on from meta-learning: there is a relationship, and I actually have a paper on using PAC-Bayesian theory for meta-learning, but it's not necessarily directly related; it's just something that I happen to do. Maybe on a higher, more general level, I guess the idea of PAC-Bayes is that it's one of these things where people try to marry Bayesian and frequentist ideas, right? So in the past, in the early 20th century, there were these kinds of fights, essentially, between the frequentists and the Bayesians, and they were solidly on different sides of statistics and arguing about things. But these days, I think it's actually more ecumenical: people really try to use ideas from Bayesian statistics and frequentist statistics and make them work where they work, and use the others otherwise. And PAC-Bayes is a great example of combining these two.

00:35:12
So PAC-Bayes essentially is a way to give you generalization bounds, right? You essentially have a model that you trained, which could be a Bayesian model, or could actually be something else, and you try to ask the question: what's a bound on how bad the test error could be? So, how well will your model do on unseen data? And typically, the form that these PAC-Bayes bounds take is to say: with high probability over the data you might observe, the expected test error under your posterior will not be much larger than the expected train error under the posterior. So if you take your Bayes posterior and you evaluate it on your training set, you get some number; on MNIST maybe you get 1% or something. Then the PAC-Bayes bound might tell you: with, like, 95% probability, your test error on unseen data might not be worse than 1% plus x, so it might be 2% or 3% or something. And then how much slack there is, how much lies between the test error bound and your actual train error, depends on how many data points you've observed and how high you want the probability to be, right? So if you want it to be 99% instead of 95%, it will get a bit looser. And it also depends, usually, on the KL divergence between your prior and posterior in some way. So this is where this kind of PAC-Bayes idea comes in: you really have this prior, and you compute the KL divergence.
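For reference, a classic bound of this form (a McAllester/Maurer-style PAC-Bayes bound, added here as an illustration; it is not necessarily the exact variant Vincent has in mind) reads: with probability at least 1 - \delta over the draw of the n training points, for all posteriors \rho,

\mathbb{E}_{\theta \sim \rho}[L(\theta)] \;\le\; \mathbb{E}_{\theta \sim \rho}[\hat{L}(\theta)] + \sqrt{\frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln\left(2\sqrt{n}/\delta\right)}{2n}},

where \pi is the prior, \hat{L} the empirical (train) error, and L the true (test) error.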
00:36:43
And I guess some of your listeners might now think: okay, if I say train error plus something like a KL divergence, that sounds a bit like the ELBO, right? The evidence lower bound in variational inference. And indeed, you can see it in a very similar way. You can optimize a PAC-Bayes bound in the same way that you optimize the ELBO, and use it for model selection or to actually derive posterior measures. And whereas the ELBO, if you optimize it, will essentially give you the proper Bayes posterior, with these PAC-Bayes bounds you can get something like pseudo-posteriors, these Gibbs measures, right? So they have a very similar mathematical form to the Bayes posterior, but they might be more robust in certain ways, because they deviate slightly.
Alex:
00:37:26
We've talked a bit about the algorithms for these Bayesian deep learning models. Can you give us a rundown of these algorithms, and when they are useful, for which cases?
Vincent:
00:37:42
Yeah, definitely. So there's essentially a spectrum of trade-offs between how expensive the algorithm is and how good it is, right? Like, how well you fit the posterior. On one side, we have things like Laplace inference, which I've been working on quite a lot recently, which is very cheap. There, you essentially just train your model as you would normally: you optimize your log posterior, typically, so you get a MAP estimate, and you have a point estimate for your parameters; that's your neural network. And this is going to be your mean for the posterior. Then, to get some distribution around it, you essentially have to approximate your Hessian, so you have to compute a second-order derivative of the loss function, and that Hessian then gives you the covariance for your Gaussian approximation. So you just wrap a Gaussian around your optimized point estimate. And that sounds very crude, right? This clearly doesn't fit the entire posterior, but it turns out that it works quite okay; for how cheap it is, it's quite a decent approximation. And with modern linear algebra frameworks, you can do this Hessian approximation quite fast. So this is something that, these days, people have actually successfully done even on GPT-2 or something; you can really scale this up quite a lot.

00:39:06
Now, if you want a slightly better posterior, you can do things like variational inference. There, you can choose a bit more freely what your posterior shape might be. It doesn't have to be Gaussian; you can use some other distribution, and then just optimize the ELBO between your true posterior and your approximate one. And you can also do things like mixtures: if you believe that your posterior might be multimodal, you can actually have a mixture distribution that you optimize. And maybe a subset of that mixture variational inference is these kinds of particle-based approaches, where the mixture is actually a mixture of Dirac measures, so you actually just have some point masses that you move around, and you can show that you can move them around in a way that they cover the posterior quite nicely. This is actually an elegant way of saying: typically, we care about drawing samples anyway, right? So if, from my posterior, I would naturally draw 50 samples, then instead of first approximating the whole posterior and then drawing 50 samples, I can just start with 50 samples and move them around so they look like they were drawn from the posterior, and then I'm done.

00:40:17
And that's quite related to deep ensembles, which is a slightly non-Bayesian baseline that people often use in practice, because it's easy to implement and easy to use. And there are some kind of cool connections there: if you take a normal deep ensemble and you add a certain repulsive force between the ensemble members, then, if you do that in the right way, it essentially recovers the Bayes posterior.
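A plain deep ensemble itself fits in a few lines; here is a minimal illustrative sketch in PyTorch, on toy data, without the repulsion term Vincent mentions (which is the subtle part):

# Minimal deep ensemble: train several independently initialized copies
# of the same network and average their predictive distributions.
import torch
import torch.nn as nn

def make_net():
    return nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

def train(net, X, y, steps=200):
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(net(X), y)
        loss.backward()
        opt.step()
    return net

X, y = torch.randn(256, 10), torch.randint(0, 2, (256,))
ensemble = [train(make_net(), X, y) for _ in range(5)]  # 5 members, 5 random inits

with torch.no_grad():
    x_test = torch.randn(8, 10)
    # Averaging member predictive distributions is the ensemble's
    # stand-in for the Bayesian posterior predictive.
    probs = torch.stack([m(x_test).softmax(-1) for m in ensemble]).mean(0)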
00:40:42
And then, of course, the more expensive ones are all these MCMC approaches, so Markov chain Monte Carlo approaches like stochastic gradient Langevin dynamics and Hamiltonian Monte Carlo, where, you know, the longer you run the chain, at some point it mixes. And I mean, I guess you know all that from PyMC. So this is the best way to get actual samples that have some guarantees, but it's also very expensive, so you might have to pay a huge amount of extra compute.
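For intuition, the stochastic gradient Langevin dynamics update Vincent refers to is just a gradient step on the log posterior plus Gaussian noise scaled to the step size. A minimal sketch on a toy model (full-batch gradients for simplicity, where real SGLD uses minibatch estimates; the step size and burn-in are arbitrary illustrative choices):

# Sketch of (stochastic gradient) Langevin dynamics on a toy target.
# Update: theta <- theta + (eps/2) * grad log p(theta | data) + N(0, eps).
import torch

def log_posterior(theta, X, y):
    # Toy Bayesian linear regression: Gaussian likelihood + standard normal prior.
    pred = X @ theta
    log_lik = -0.5 * ((y - pred) ** 2).sum()
    log_prior = -0.5 * (theta ** 2).sum()
    return log_lik + log_prior

X, y = torch.randn(100, 3), torch.randn(100)
theta = torch.zeros(3, requires_grad=True)
eps, samples = 1e-3, []
for t in range(2000):
    logp = log_posterior(theta, X, y)
    (grad,) = torch.autograd.grad(logp, theta)
    with torch.no_grad():
        theta += 0.5 * eps * grad + eps**0.5 * torch.randn_like(theta)
    if t > 1000:  # discard burn-in, keep approximate posterior samples
        samples.append(theta.detach().clone())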
00:41:12
And it really depends on how much compute you're willing to spend on your particular problem, right? If you pre-train a language model, where already one training run costs you millions of dollars, you probably don't want to spend any extra compute on that. But if you're a medical researcher and you already spent half a year generating your dataset, then maybe you don't care whether your neural network training now takes one day or five days or something, right? That's not the bottleneck in your project. So it really depends on what the application is, and sometimes, if you can afford spending more compute, then of course you should run the better inference.
Alex:
00:41:49
Yeah, yeah, for sure. That makes sense. And what are the latest advancements when it comes to, well, more efficient inference techniques for Bayesian deep learning?
Vincent:
00:42:01
Yeah, yeah. I mean, a lot of them are still being developed. On the Laplace stuff I talked about, we've definitely had a lot of papers recently, also some that I was involved in, that have kind of pushed the scalability by using all kinds of clever tricks from matrix-free linear algebra and all these things that you can do these days in cool frameworks. I think a general idea that's quite interesting is the idea of subspace inference. That's essentially the insight that you don't have to treat every single parameter of your neural network probabilistically in order to get a good enough posterior over functions, right? So this kind of comes back to what I talked about earlier. If you think about Bayesian neural networks as just a neural network that's now Bayesian, and you make the parameters a distribution, then it sounds like you might need to do this for all parameters. But if you think about it just through the lens of saying we want to have a posterior over functions that makes sense, then it becomes obvious that, you know, among your millions of parameters, maybe there's a subset that is enough to be random, to actually introduce enough randomness in the functions that are being implemented.

00:43:11
And so there's a lot of work that, for instance, just takes the last layer of the neural network, or just the first layer, or some sub-network inside the bigger neural network. And as I told you before, in the case of language models, for instance, you can take the backbone and leave it fixed and frozen as a point estimate, and then just have these small parameter-efficient adapters that you treat in a Bayesian way. So I think this is really where you get a lot of efficiency gains: to figure out, out of your huge neural network, which subset of parameters you need to treat in a Bayesian way to then make the function-space posterior fit what your function should be. And then most of the other parameters you can just leave as point estimates.
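Concretely, this mirrors the adapter sketch from earlier in the episode: freeze most parameters, put a distribution on a small subset. A minimal last-layer version might look like this (an illustrative sketch under my own assumptions, not a specific paper's method):

# Last-layer subspace inference sketch: freeze the feature extractor,
# maintain a factorized Gaussian over the final linear layer only.
import torch
import torch.nn as nn

features = nn.Sequential(nn.Linear(20, 64), nn.ReLU())  # stands in for a trained backbone
for p in features.parameters():
    p.requires_grad_(False)                              # point estimate, frozen

# Variational parameters for the last layer (mean and log-std per weight).
W_mu = torch.zeros(2, 64, requires_grad=True)
W_logstd = torch.full((2, 64), -3.0, requires_grad=True)

def sample_logits(x):
    W = W_mu + W_logstd.exp() * torch.randn_like(W_mu)   # reparameterization trick
    return features(x) @ W.t()

# Predictive uncertainty coming from the last-layer posterior alone:
x = torch.randn(8, 20)
probs = torch.stack([sample_logits(x).softmax(-1) for _ in range(20)]).mean(0)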
Alex:
00:43:51
Okay. Okay. I see. Very interesting. So yeah, if you have any link to dive deeper into that for listeners, and for myself, honestly, yeah, leave that in the show notes, because I'm always curious to see these latest developments.
00:44:10
Something I'm very excited about is the intertwining, I don't know if that's the word in English, of the initialization of MCMC chains with neural networks, so normalizing flows, or the Pathfinder algorithm from Bob Carpenter, where you would basically use draws from Pathfinder or normalizing flows as the initialization of the MCMC chains. That doesn't necessarily make your sampling faster, because you still have to train a neural network, but sometimes, if you really have a lot of data and MCMC is really slow, then that's definitely useful, especially if you have access to a GPU.

00:45:03
And that also makes, especially the Pathfinder option, variational inference much more practical, because it usually gives you much better answers. And then there is the normalizing flow initialization option, where, if you have a GPU, that's definitely extremely helpful. And actually, there are ongoing efforts right now on the PyMC side to add Pathfinder VI to PyMC as an initialization option for MCMC, which would be super efficient, because that's also why Bob Carpenter created it for us: basically running Pathfinder, using some draws to initialize MCMC, and then you can just run MCMC for a few iterations on a number of chains. That's much faster convergence than pure MCMC, but it should also be much more reliable than the classic VI that we have right now in PyMC.

00:46:20
And then there is also an ongoing effort by Adrian Seyboldt, in particular on the nutpie side, where he just added the ability to use normalizing flows as initialization for the MCMC inference in nutpie. So you can already use that in your PyMC or Stan models to try that.
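If you want to try it, the rough shape, assuming the current nutpie API, is something like the following. The backend choices and the flag name (in particular transform_adapt) are assumptions based on the nutpie docs and Adrian's discourse post, so verify them before use:

# Hedged sketch: sampling a PyMC model with nutpie, with the
# normalizing-flow-based adaptation turned on.
import pymc as pm
import nutpie

with pm.Model() as model:
    mu = pm.Normal("mu", 0, 1)
    pm.Normal("obs", mu, 1, observed=[0.1, -0.3, 0.8])

# Compile with a JAX gradient backend (assumed requirement for flow adaptation).
compiled = nutpie.compile_pymc_model(model, backend="jax", gradient_backend="jax")
trace = nutpie.sample(compiled, transform_adapt=True)  # flow-based adaptation (assumed flag)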
00:46:44
So definitely, I'll put the links in the show notes for that option, because the Pathfinder option in PyMC is still under development, so it's still a pull request, so not very useful to link to. But the nutpie thing with normalizing flows, that's definitely implemented already, so I'll put a link in the show notes to a discourse post that Adrian posted recently to explain how you can use that. And I definitely encourage people to check that out, and also report to Adrian in any GitHub issue on nutpie if there are any problems, because that's really new. In this case, that's extremely useful for open source developers: hearing from early adopters to fine-tune the details. Keep in mind, and you'll see that in the discourse post, that a GPU will definitely help you a lot here, because you need to fit the neural network first. So yeah, don't expect any model to run in less than 10 minutes. But then, if your model is already bigger than that and takes much more time than that, then that could be a viable option.
Vincent:
00:47:57
Cool. That sounds very good.
Alex:
00:48:01
Yeah, that's really awesome. And I'll definitely, definitely link to that. So, another question I have for you, actually, is related to that: what advancements do you foresee, and maybe also wish for, in making these AI models more reliable and data-efficient?
Vincent:
00:48:25
Yeah, I mean, definitely, as you probably guessed, I'm personally hoping that Bayesian deep learning can play a role in that, right? And I guess in our position paper that I mentioned before, we really tried to make this argument quite strongly. I think an issue is that often we don't do a good job of communicating to people what they need these uncertainties for. I don't know if you see that in your consultancy and stuff, right? But I often talk to scientists, and they're like: we need this model to solve our task. And I'm like: okay, what if we make it Bayesian? I can also give you uncertainties. And then they're like: but what should I get uncertainties for? I don't care, I just want good performance. And I think it's a bit unfortunate that, in the communication, we don't make them understand well that it's really about downstream decisions, right? We don't care about uncertainties for their own sake; we care about making good decisions in the real world.

00:49:25
And as I said before, often you really need the uncertainty to make the decision, right? If you're a doctor and you have a patient, you need to understand whether my algorithm is giving you, like, a 99.9% accurate prediction, or whether it's just an 80-20 kind of guess, in which case, in the latter, you probably want to do another test or something. And, you know, the problem is that as long as we don't communicate this, people also won't have an intrinsic motivation to try our methods. So I think this is really something that, as a community, we could be a bit better at communicating: that we really don't care about uncertainties just because we want to have nice likelihood numbers in our tables; we care about making good decisions in the real world.

00:50:11
And as long as people care about this, I mean, I'm also happy if they find other ways to serve that purpose, right? Like I said before, frequentist methods can also work quite well, and if people want to use conformal prediction for some specific task where that works well, then I'm not going to force them to do a Bayesian thing. I think, as long as people actually start thinking more about why they need reliable predictions, what they use them for, how expensive their data is, and whether they can maybe get more out of this small data set that they paid a lot of money for, then, you know, I'm happy. I hope that most of them will find that Bayes is a good way of doing this, but some of them might find a different way, and that's also fine.
589
:
00:50:55
Yeah, in the end, whatever works, right?
590
:
00:50:58
That's important.
591
:
00:50:59
It's better to have a good enough model than no model at all.
592
:
00:51:03
Yeah, for sure.
593
:
00:51:06
So, on that note, I'm wondering if you have any advice to offer to
594
:
00:51:13
those looking to pursue a career in the topics you're working on?
595
:
00:51:18
Whether in deep learning or probabilistic inference?
596
:
00:51:23
Yeah, definitely.
597
:
00:51:24
I mean, I think, you know, one of the big pieces of advice that I always give my students
is to really focus on the foundations.
598
:
00:51:31
So I like to tell this anecdote: when I started my PhD, everyone was crazy about GANs.
599
:
00:51:37
I don't know if you remember that.
600
:
00:51:38
So like these generative adversarial networks.
601
:
00:51:41
It was the time when, you know, they were everywhere and there were hundreds of papers
about them.
602
:
00:51:46
And so when I started my PhD, some people were like, you should really get into GANs,
right?
603
:
00:51:50
You should like learn about that in detail and learn all the tricks to train them and
whatever, which I didn't do.
604
:
00:51:57
I'm guessing that if I had, none of this would be relevant anymore, right?
605
:
00:52:00
Because like these days people don't use GANs anymore.
606
:
00:52:03
People use diffusion models, and, you know, there's going to be the next next thing, right?
607
:
00:52:08
But I think in contrast to that, if you look at things like Bayesian inference, that's
been used for 200 years and people still use it and it's still an important thing to
608
:
00:52:17
understand.
609
:
00:52:18
So if people focus more on these big ideas, these foundational things, rather than
the latest trends and fads, I think that serves a much better purpose.
610
:
00:52:29
And I understand that these days a lot of people are quite excited about language models
and stuff, but who knows how long.
611
:
00:52:36
we'll still care about them in this particular way, right?
612
:
00:52:39
Like maybe next year someone comes along and develops some cool new architecture that's
very different from a transformer.
613
:
00:52:45
And suddenly, like everyone does language models differently, or people stop doing
autoregressive language modeling and use diffusion for language or whatever.
614
:
00:52:54
And then suddenly, like if you've only learned about this particular thing because it was
cool right now, then your knowledge will be obsolete, right?
615
:
00:53:02
So I think that's maybe
616
:
00:53:05
the main idea.
617
:
00:53:05
Then maybe another piece of advice that I always give people is to just talk to as many
people as possible, right?
618
:
00:53:10
And I think what's really nice in our community is that people are quite open.
619
:
00:53:14
Like if you go to any machine learning conference, you can just talk to anyone and people
are happy to have a chat.
620
:
00:53:19
Like nobody's going to turn you down.
621
:
00:53:21
And just to get an idea of the diversity of research that's being done, and to see that
not everyone just works on LLMs, but there's actually tons of interesting ideas
622
:
00:53:31
that people work on.
623
:
00:53:33
And yeah, it's quite an exciting time to be in that field really.
624
:
00:53:38
Yeah, yeah, I second everything you just said.
625
:
00:53:41
Extremely welcoming community.
626
:
00:53:43
Feel free to ask questions.
627
:
00:53:45
Always politely, of course, you know.
628
:
00:53:49
But yeah, like extremely welcoming community.
629
:
00:53:53
And if you're persistent and really want to help people out and be active, I don't think
you're going to have any problem getting a foot in the door, let's say.
630
:
00:54:08
Yeah, yeah, it's a good question.
631
:
00:54:09
I mean, since I started my own research group, I feel like it's been diffusing around the
edges a bit, because suddenly you have PhD students and they obviously have their own ideas,
632
:
00:54:19
as they definitely should, right?
633
:
00:54:23
So yeah, there's a lot of things we're looking into.
634
:
00:54:25
I mean, as I said before, I think AI for science applications are still something that I
find really exciting.
635
:
00:54:32
And I feel like, you know, now is the time to show the world that
636
:
00:54:37
the kind of algorithms we've developed over the last 10 years or whatever actually make a
difference.
637
:
00:54:43
And particularly, looking into how we can also benchmark them better, right?
638
:
00:54:47
So I think that's a bit of an issue right now that we end up using a lot of benchmarks
that other people have developed for their particular models to make them look good,
639
:
00:54:57
right?
640
:
00:54:58
So typically if you look at these deep learning benchmarks like MNIST and CIFAR and
whatever,
641
:
00:55:04
They're very highly curated data sets that are very clean.
642
:
00:55:08
Like there's essentially no uncertainty about these digits.
643
:
00:55:10
Right.
644
:
00:55:11
So if you train a normal neural network on MNIST, that works perfectly fine.
645
:
00:55:15
So arguably you don't need to make it Bayesian.
646
:
00:55:18
And then if we try to benchmark on this and you don't see much of a benefit, people will
say there's not much of a benefit to being Bayesian.
647
:
00:55:24
Like, why would you do this in the first place?
648
:
00:55:26
But it's because this data set is just not interesting for that purpose.
649
:
00:55:30
Right.
650
:
00:55:31
Similarly, on the Bayesian side, a lot of the data sets that people use are very small and
low-dimensional, because that's where traditional Bayesian methods work well.
651
:
00:55:39
Right?
652
:
00:55:39
Like if you just have a Gaussian process, then you want to run it on some little, you
know, five-dimensional regression thing.
653
:
00:55:46
So I think what we're lacking right now are benchmarks with realistic real-world data
that is high-dimensional enough
654
:
00:55:55
that deep learning is useful,
655
:
00:55:57
but also has all this complicated noise structure or some uncertainty.
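As a toy illustration of the kind of benchmark he is describing, here is a sketch of synthetic data that is high-dimensional and has input-dependent (heteroscedastic) noise; the numbers and names are invented for this example. On data like this, a metric that scores the whole predictive distribution, such as negative log-likelihood, separates uncertainty-aware models from point predictors in a way RMSE cannot.

```python
import numpy as np

rng = np.random.default_rng(42)

# High-dimensional inputs, so deep learning is actually useful...
n, d = 10_000, 256
X = rng.normal(size=(n, d))

# ...with an input-dependent noise level, so a model that only
# predicts the mean misses half the story.
w = rng.normal(size=d) / np.sqrt(d)
mean = X @ w
noise_scale = 0.1 + np.abs(X[:, 0])  # noise grows with the first feature
y = mean + rng.normal(size=n) * noise_scale

# Score the predictive *distribution*: Gaussian negative log-likelihood
# of the oracle model, as a reference point for any benchmarked model.
nll = 0.5 * np.mean(
    np.log(2 * np.pi * noise_scale**2) + (y - mean) ** 2 / noise_scale**2
)
print(f"oracle NLL: {nll:.3f}")
```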
656
:
00:56:02
And so I think this is something that we're also looking into a bit more now, to find
the right niche for our product.
657
:
00:56:10
It's a bit like, obviously, people might say, if you have a hammer, you try to
make everything look like a nail.
658
:
00:56:14
So I don't want to be that person, but I still believe that there are nails in the world
and I want to find them so I can use my hammer.
659
:
00:56:21
And I don't want to hammer on everything that's not a nail.
660
:
00:56:23
Like, I think that's something
661
:
00:56:27
that we're definitely looking into right now.
662
:
00:56:29
Cool, yeah, that's very exciting.
663
:
00:56:32
Yeah, love that.
664
:
00:56:33
Come back on the show as soon as you have something to tell us about that.
665
:
00:56:39
Yeah, definitely.
666
:
00:56:40
Yeah, that sounds fascinating.
667
:
00:56:42
Awesome. Well, Vincent, I know you have a lot to do and I have to let you go because you
have a hard stop; otherwise I would ask you a ton more questions. But let's call it a show.
668
:
00:56:53
I think we covered a lot of ground.
669
:
00:56:57
I learned a lot of things, and yeah, I really liked it, because lots of things are clearer
to me now that we have talked than they were before the show. That's awesome, and that's
also why I do the
670
:
00:57:10
show. But before letting you go, I'm gonna ask you the last two questions I ask every
guest at the end of the show. First one: if you had unlimited time and resources, which problem
671
:
00:57:23
would you try to solve?
672
:
00:57:26
Good question.
673
:
00:57:27
So I don't know if they use that slogan anymore but DeepMind used to have that slogan
where they said their goal was to solve intelligence and then use it to solve everything
674
:
00:57:37
else.
675
:
00:57:38
So I might adapt that, you know: solve Bayesian inference and then use it to solve
everything else.
676
:
00:57:44
But jokes aside, I think there are so many interesting problems in AI for science right
now, you know, from healthcare to materials science to climate.
677
:
00:57:54
And so I really hope that we can have some impact there by essentially building AI methods
that scientists can use, which are strongly founded on Bayesian principles and are
678
:
00:58:04
therefore more reliable, more robust and more trustworthy.
679
:
00:58:08
Yeah.
680
:
00:58:09
Yeah, I've been thinking about this.
681
:
00:58:11
I don't think I have any more creative answer than all your other guests before me.
682
:
00:58:15
I do think having dinner with Bayes or Laplace would be fun, right?
683
:
00:58:20
Although for the latter, I might have to brush up on my French a little bit.
684
:
00:58:24
Otherwise, one thing I'm quite sad about personally is that I never actually got to meet
David MacKay before he passed away.
685
:
00:58:31
So I think it might also be very nice to meet him for dinner one time.
686
:
00:58:36
Yeah, for sure.
687
:
00:58:39
Well, Vincent, a pleasure to meet you, a pleasure to have you on the show.
688
:
00:58:47
Come back any time.
689
:
00:58:48
And as usual, I'll put a link to your
690
:
00:58:52
website, your socials, and all the papers for this episode, and a lot of packages.
691
:
00:58:57
So that's great.
692
:
00:58:58
The show notes are already very big.
693
:
00:59:01
So feel free to add anything in the show notes for those who want to dig deeper.
694
:
00:59:05
Thank you again, Vincent, for taking the time, and see you maybe on the next show.
695
:
00:59:10
Thanks, Alex.
696
:
00:59:11
It was great fun.
697
:
00:59:16
This has been another episode of Learning Bayesian Statistics.
698
:
00:59:20
Be sure to rate, review, and follow the show on your favorite podcatcher, and visit
learnbayestats.com for more resources about today's topics, as well as access to more
699
:
00:59:30
episodes to help you reach a true Bayesian state of mind.
700
:
00:59:34
That's learnbayestats.com.
701
:
00:59:36
Our theme music is "Good Bayesian" by Baba Brinkman, feat. MC Lars and Mega Ran.
702
:
00:59:41
Check out his awesome work at bababrinkman.com.
703
:
00:59:44
I'm your host,
704
:
00:59:46
Alex Andorra.
705
:
00:59:47
You can follow me on Twitter at alex underscore andorra, like the country.
706
:
00:59:51
You can support the show and unlock exclusive benefits by visiting
patreon.com/LearnBayesStats.
707
:
00:59:58
Thank you so much for listening and for your support.
708
:
01:00:00
You're truly a good Bayesian.
709
:
01:00:03
Change your predictions after taking information in.
710
:
01:00:06
And if you're thinking I'll be less than amazing.
711
:
01:00:10
Let's adjust those expectations.
712
:
01:00:13
Let me show you how to be a good Bayesian
Change calculations after taking fresh data in
Those predictions that your brain is making
Let's get them on a solid foundation