Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
How does the world of statistical physics intertwine with machine learning, and what groundbreaking insights can this fusion bring to the field of artificial intelligence?
In this episode, we delve into these intriguing questions with Marylou Gabrié, an assistant professor at CMAP, École Polytechnique in Paris. Having completed her PhD in physics at École Normale Supérieure, Marylou ventured to New York City for a joint postdoctoral appointment at New York University’s Center for Data Science and the Flatiron Institute’s Center for Computational Mathematics.
As you’ll hear, her research is not just about theoretical exploration; it also extends to the practical adaptation of machine learning techniques in scientific contexts, particularly where data is scarce.
In this conversation, we’ll traverse the landscape of Marylou's research, discussing her recent publications, her innovative approaches to machine learning challenges, the latest MCMC advances, and ML-assisted scientific computing.
Beyond that, get ready to discover the person behind the science – her inspirations, aspirations, and maybe even what she does when not decoding the complexities of machine learning algorithms!
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie and Cory Kiser.
Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag ;)
Takeaways
Links from the show:
Transcript
This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.
How does the world of statistical physics
intertwine with machine learning, and what
2
:groundbreaking insights can this fusion
bring to the field of artificial
3
:intelligence?
4
:In this episode, we'll delve into these
intriguing questions with Marylou Gabrié.
5
:Having completed her doctorate in physics
at École Normale Supérieure, Marylou
6
:ventured to New York City for a joint
postdoctoral appointment at New York
7
:University's Center for Data Science
8
:and the Flatiron Institute's Center for Computational
Mathematics.
9
:As you'll hear, her research is not just
about theoretical exploration, it also
10
:extends to the practical adaptation of
machine learning techniques in scientific
11
:contexts, particularly where data are
scarce.
12
:In this conversation, we'll traverse the
landscape of Marylou's research,
13
:discussing her recent publications and her
innovative approaches to machine learning
14
:challenges.
15
:her inspirations, aspirations, and maybe
even what she does when she's not decoding
16
:the complexities of machine learning
algorithms.
17
:This is Learning Bayesian Statistics,
18
:Let me show you how to be a good Bayesian and
change your predictions.
19
:Marylou Gabrié, welcome to Learning
Bayesian Statistics.
20
:Thank you very much, Alex, for having me.
21
:Yes, thank you.
22
:And thank you to Virgile Andreani for
putting us in contact.
23
:This is a French connection network here.
24
:So thanks a lot, Virgil.
25
:Thanks a lot, Marylou, for taking the
time.
26
:I'm probably going to say Marylou
because it flows better in my English,
27
:because saying Marylou and then
continuing with English,
28
:I'm going to have the French accent, which
nobody wants to hear.
29
:So let's start.
30
:So I gave a bit of...
31
:of your background in the intro to this
episode, Marylou, but can you define the
32
:work that you're doing nowadays and the
topics that you are particularly
33
:interested in?
34
:I would define my work as being focused on
developing methods and more precisely
35
:developing methods that use and leverage
all the progress in machine learning for
36
:scientific computing.
37
:I have a special focus within this realm.
38
:which is to study high-dimensional
probabilistic models, because they really
39
:come up everywhere.
40
:And I think they give us a very particular
lens on our world.
41
:And so I would say I'm working broadly in
this direction.
42
:Well, that sounds like a lot of fun.
43
:So I understand why Virgil put me in
contact with you.
44
:And could you start by telling us about
your journey?
45
:actually into the field of statistical
physics and how it led you to merge these
46
:interests with machine learning and what
you're doing today.
47
:Absolutely.
48
:My background is actually in physics, so I
studied physics.
49
:Among the topics in physics, I quickly
became interested in statistical
50
:mechanics.
51
:I don't know if all listeners would be
familiar with statistical mechanics, but I
52
:would define it.
53
:broadly as the study of complex systems
with many interacting components.
54
:So it could be really anything.
55
:You could think of molecules, which are
networks of interacting agents that have
56
:non-trivial interactions and that have
non-trivial behaviors when put all
57
:together within one system.
58
:And I think it's really important, as I
was saying, viewpoint of the world today
59
:to look at those big macroscopic systems
that you can study probabilistically.
60
:And so I was quickly interested in this
field that is statistical mechanics.
61
:And at some point machine learning came
into the picture.
62
:And the way it did is that I was looking
for a PhD in:
63
:And I had some of my friends that were,
you know, students in computer science and
64
:kind of early comers to machine
learning.
65
:And so I started to know that it existed.
66
:I started to know that actually deep
neural networks were revolutionizing the
67
:fields, that you could expect a program
to, I don't know, give names to people in
68
:pictures.
69
:And I thought, well, if this is possible,
I really wanna know how it works.
70
:I really want to, for this technology, not
to sound like magic to me, and I want to
71
:know about it.
72
:And so this is how I started to become
interested and to...
73
:find out that people knew how to make it
work, but not how it worked, why it worked
74
:so well.
75
:And so this is how I, in the end, was put
into contact with Florent Krzakala, who was
76
:my PhD advisor.
77
:And I started to have this angle of trying
to use statistical mechanics framework to
78
:study deep neural networks that are
precisely those complex systems I was just
79
:mentioning, and that are so big that we
are having trouble making really sense of
80
:what they are doing.
81
:Yeah, I mean, that must be quite...
82
:Indeed, it must be quite challenging.
83
:We could already dive into that.
84
:That sounds like fun.
85
:Do you want to talk a bit more about that
project?
86
:Since then, I really shifted my angle.
87
:I studied in this direction for, say,
three, four years.
88
:Now, I'm actually going back to really the
applications to real-world systems, let's
89
:say.
90
:using all the potentialities of deep
learning.
91
:So it's like the same intersection, but
looking at it from the other side.
92
:Now really looking at application and
using machine learning as a tool, where I
93
:was looking at machine learning as my
study, my object of study, and using
94
:statistical mechanics before.
95
:So I'm keen on talking about what I'm
doing now.
96
:Yeah.
97
:So basically you...
98
:You changed, now you're doing the other
way around, right?
99
:You're studying statistical physics with
machine learning tools instead of doing
100
:the opposite.
101
:And so how does, yeah, what does that look
like?
102
:What does that mean concretely?
103
:Maybe can you talk about an example from
your own work so that listeners can get a
104
:better idea?
105
:Yeah, absolutely.
106
:So.
107
:As I was saying, statistical mechanics is
really about large systems that we study
108
:probabilistically.
109
:And here there's a tool, I mean, that
would be one of the, I would say, most
110
:active direction of research in machine
learning today, which are generative
111
:models.
112
:And they are very natural because there
are ways of making probabilistic models,
113
:but that you can control.
114
:You have control to
115
:produce samples from within one command,
where you are in need of very much more
116
:challenging algorithms if you want to do
it in a general physical system.
117
:So we have those machines that we can
leverage and that we can actually combine
118
:in our typical computation tools such as
Markov chain Monte Carlo algorithms, and
119
:that will allow us to speed up the
algorithms.
120
:Of course, it requires some adaptation
compared to what people usually do in
121
:machine learning and how those generative
models were developed, but it's possible
122
:and it's fascinating to try to make those
adaptations.
123
:Hmm.
124
:So, yeah, that's interesting because if I
understand correctly, you're saying that
125
:one of your...
126
:One of the aspects of your...
127
:job is to understand how to use MCMC
methods to speed up these models?
128
:Actually, it's the other way around, is
how to use those models to speed up MCMC
129
:methods.
130
:Okay.
131
:Can you talk about that?
132
:That sounds like fun.
133
:Yeah, of course.
134
:Say MCMC algorithms, so Markov Chain
Monte-Carlo's are really the go-to
135
:algorithm when you are faced with
probabilistic models that is describing
136
:whichever system you care about, say it
might be a molecule, and this molecule has
137
:a bunch of atoms, and so you know that you
can describe your system, I mean at least
138
:classically, at the level of giving the
Cartesian coordinates of all the atoms in
139
:your system.
140
:And then you can describe the equilibrium
properties of your system.
141
:by using the energy function of this
molecule.
142
:So if you believe that you have an energy
function for this molecule, then you
143
:believe that it's distributed as
exponential minus beta the energy.
144
:This is the Boltzmann distribution.
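(In symbols, with E(x) the energy of configuration x, β the inverse temperature, and Z the normalizing constant, the Boltzmann distribution is

$$p(x) = \frac{e^{-\beta E(x)}}{Z}.$$
)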
145
:And then, okay, you are left with your
probabilistic model.
146
:And if you want to approach it, a priori
you have no control onto what this energy
147
:function is imposing as constraints.
148
:It may be very, very complicated.
149
:Well, go-to algorithm is Markov chain
Monte Carlo.
150
:And it's a go-to algorithm that is always
going to work.
151
:And here I'm putting quotes around this
thing.
152
:Because it's going to be a greedy
algorithm that is going to be looking for
153
:plausible configurations next to other
plausible configurations.
154
:And locally, make a search on the
configuration space, try to visit it, and
155
:then what you have visited
156
:will be representative of the
thermodynamics.
157
:Of course, it's not that easy.
158
:And although you can make such a search locally,
sometimes it's really not enough to
159
:fully describe the probabilistic model, in
particular, how different regions of your
160
:configuration space are related to one
another.
161
:So if I come back to my molecule example,
it would be that I have two different,
162
:let's say, conformations of my molecule,
two main templates that my molecule is
163
:going to look like.
164
:And they may be divided by what we call an
energy barrier, or in the language of
165
:probabilities, it's just low probability
regions in between large probability
166
:regions.
167
:And in this case, local MCMCs are gonna
fail.
168
:And this is where we believe that
generative models could help us.
169
:And let's say fill this gap to answer some
very important questions.
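(For instance, a minimal one-dimensional picture of such a barrier, illustrative only and not from the episode, is the double-well energy $E(x) = (x^2 - 1)^2$, whose Boltzmann density

$$p(x) \propto e^{-\beta (x^2 - 1)^2}$$

has two modes near $x = \pm 1$ separated by a low-probability region around $x = 0$; at large β a local MCMC started in one well essentially never crosses to the other.)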
170
:And how would that work then?
171
:Like you would...
172
:Would you run a first model that would
help you infer that and then use that into
173
:the MCMC algorithm?
174
:Or like, yeah, what does that look like?
175
:I think your intuition is correct.
176
:So you cannot do it in one go.
177
:And what's, for example, the paper that I
published, I think it was last year in
178
:PNAS that is called Adaptive Monte Carlo
Augmented with Normalizing Flows is
179
:precisely implementing something where you
have feedback loops.
180
:So
181
:The idea is that the fact that you have
those local Monte-Carlo's that you can run
182
:within the different regions you have
identified as being interesting will help
183
:you to seed the training of a generative
model that is going to target generating
184
:configurations in those different regions
Once you have this generative model you
185
:can include it in your Markov chain
strategy. You can use it as a proposal
186
:mechanism
187
:to propose new locations for your MCMC to
jump.
188
:And so you're creating a Monte Carlo chain
that is going to slowly converge towards
189
:the target distribution you're really
after.
190
:And you're gonna do it by using the data
you're producing to train a generative
191
:model that will help you produce better
data as it's incorporated within the MCMC
192
:kernel you are actually jumping with.
193
:So you have this feedback mechanism that
makes that things can work.
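(A rough sketch of the feedback loop being described, with generic interfaces; this is not the exact implementation from the PNAS paper or from flowMC, and `flow.fit`, `flow.sample`, and `flow.log_prob` are hypothetical stand-ins for whatever normalizing-flow library is used:

```python
import numpy as np

def adaptive_flow_mcmc(log_target, local_step, flow, n_outer, n_local, chains):
    """Sketch of an adaptive MCMC augmented with a normalizing flow.

    log_target : batched log density of the target (e.g., -beta * E(x) up to a constant)
    local_step : one sweep of a local MCMC kernel (e.g., MALA/HMC) on each chain
    flow       : hypothetical object with .sample(n), .log_prob(x), .fit(data)
    chains     : np.ndarray of shape (n_chains, dim), one walker per region of interest
    """
    for _ in range(n_outer):
        # 1) Local exploration within the regions currently occupied.
        for _ in range(n_local):
            chains = local_step(chains)

        # 2) Use the states produced so far to (re)train the flow.
        flow.fit(chains)

        # 3) Flow-based independence proposals: jumps anywhere the flow has mass.
        proposals = flow.sample(len(chains))
        log_alpha = (log_target(proposals) + flow.log_prob(chains)
                     - log_target(chains) - flow.log_prob(proposals))
        accept = np.log(np.random.rand(len(chains))) < log_alpha
        chains[accept] = proposals[accept]
    return chains
```
)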
194
:And this idea of adaptivity really stems
from the fact that in scientific
195
:computing, we are going to do machine
learning with scarce data.
196
:We are not going to have all the data we
wish we had to start with, but we are
197
:going to have these type of methods where
we are doing things in what we call
198
:adaptively.
199
:So it's doing, recording information,
doing again.
200
:In a few words.
201
:Yeah.
202
:Yeah, yeah.
203
:Yeah.
204
:So I mean, if I understand correctly, it's
a way of going one step further than what
205
:HMC is already doing where we're looking
at the gradients and we're trying to adapt
206
:based on that.
207
:Now, basically, the idea is to find some
way of getting even more information as to
208
:where the next sample should come from.
209
:from the typical set and then being able
to navigate the typical set more
210
:efficiently?
211
:Yes.
212
:Yes, so let's say that it's an algorithm
that is more ambitious than HMC.
213
:Of course, there are caveats.
214
:But HMC is trying to follow a dynamic to
try to travel towards interesting regions.
215
:But it has to be tuned quite finely in
order to actually end up in the next
216
:interesting region.
217
:provided that it started from one.
218
:And so to cross those energy barriers,
here with machine learning, we would
219
:really be jumping over energy barriers.
220
:We would have models that pretty much only
target the interesting regions and just
221
:doesn't care about what's in between.
222
:And that really focuses the efforts where
you believe it matters.
223
:However, there are cases in which those
machine learning models will have trouble
224
:scaling where
225
:HMC would be more robust.
226
:So there is of course always a trade-off
on the algorithms that you are using, how
227
:efficient they can be per MCMC step and
how general you can accept them to be.
228
:Hmm.
229
:I see.
230
:Yeah.
231
:So, and actually, yeah, that would be one
of my questions would be, when do you
232
:think this kind of new algorithm would be?
233
:would be interesting to use instead of the
classic HMC?
234
:Like in which cases would you say people
should give that a try instead of using
235
:the classic, robust HMC method
we have right now?
236
:So that's an excellent question.
237
:I think right now, so on paper, the
algorithm we propose is really, really
238
:powerful because it will allow you to jump
throughout your space and so to...
239
:to decorrelate your MCMC configurations
extremely fast.
240
:However, for this to happen, you need that
the proposal that is made by your deep
241
:generative model as a new location, I
mean, a new configuration in your MCMC
242
:chain is accepted.
243
:So in the end, you don't have anymore the
fact that you are jumping locally and that
244
:your de-correlation comes from the fact
that you are going to make lots of local
245
:jumps.
246
:Here you could decorrelate in one step, but
you need to accept.
247
:So the acceptance will be really what you
need to care about in running the
248
:algorithm.
249
:And what is going to determine whether or
not your acceptance is high is actually
250
:the agreement between your deep generative
model and the target distribution you're
251
:after.
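(Concretely, for an independence proposal x′ drawn from a trained flow with density $\hat q$, the Metropolis–Hastings acceptance probability of a move $x \to x'$ is

$$\alpha(x \to x') = \min\left(1,\; \frac{\pi(x')\,\hat q(x)}{\pi(x)\,\hat q(x')}\right),$$

which stays close to 1 only where $\hat q$ agrees closely with the target $\pi$.)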
252
:And we have traditional, you know,
253
:challenges here in making the generative
model look like exactly the target we
254
:want.
255
:There are issues with scalability and
there are issues with, I would say,
256
:constraints.
257
:So you give me, let's say you're
interested in Bayesian inference, so
258
:another case where we can apply these kind
of algorithms, right?
259
:Because you have a posterior and you just
want to sample from your posterior to make
260
:sense
261
:10, 100.
262
:I tell you, I know how to train
normalizing flows, which are the specific
263
:type of generative models we are using
here, in 10 or 100 dimension.
264
:So if you believe that your posterior is
multimodal, that it will be hard for
265
:traditional algorithms to visit the entire
landscape and equilibrate because there
266
:are some low density regions in between
high density regions, go for it.
267
:If you...
268
:actually are an astronomer and you want to
marginalize over your initial conditions
269
:on a grid that represents the universe and
actually the posterior distribution you're
270
:interested in is on, you know, variables
that are in millions of dimension.
271
:I'm sorry.
272
:We're not going to do it with you and you
should actually use something that is more
273
:general, something that will use a local
search, but that is actually going to, you
274
:know, be
275
:Imperfect, right?
276
:Because it's going to be very, very hard
also for this algorithm to work.
277
:But the magic of the machine learning will
not scale yet to this type of dimensions.
278
:Yeah, I see.
279
:And is that an avenue you're actively
researching to basically how to scale
280
:these algorithms better, to bigger scales?
281
:Yeah, of course.
282
:Of course we can always try to do better.
283
:So, I mean, as far as I'm concerned, I'm
also very interested in sampling physical
284
:systems.
285
:And in physical systems, there are a lot
of, you know, prior information that you
286
:have on the system.
287
:You have symmetries, you have, I don't
know, yeah, physical rules that you know
288
:that the system has to fulfill.
289
:Or maybe some, I don't know, multi-scale
290
:property of the probability distribution,
you know that there are some
291
:self-similarities, you have
information you can try to exploit in two
292
:ways, either in the sampling part, so
you're having this coupled MCMC with the
293
:deep generative models, so either in the way
you make proposals you can try to
294
:symmetrize them, you can try to explore
the symmetry by any means.
295
:Oh, you can also directly put it in the
generative model.
296
:So those are things that really are
crucial.
297
:And we understand very well nowadays that
it's naive to think you will learn it all.
298
:You should really use as much information
on your system as you may, as you can.
299
:And after that, you can go one step
further with machine learning.
300
:But in non-trivial systems, it would be, I
mean, a bit
301
:deceiving to believe that you could just
learn things.
302
:Yeah.
303
:I mean, completely resonate with that.
304
:It's definitely something we will always
tell students or clients, like, don't
305
:just, you know, throw everything at the
model that you can and just try to pray
306
:that the model works like that.
307
:And, but actually you should probably use
a generative perspective to
308
:try and find out what the best way of
thinking about the problem is, what would
309
:be the good enough, simple enough model
that you can come up with and then try to
310
:run that.
311
:Yeah, so definitely I think that resonates
with a lot of the audience where think
312
:generatively.
313
:And from what I understand from what you
said is also trying to put as much
314
:knowledge and information as you have in
your generative model.
315
:the deep neural network is here, the
normalizing flow is here to help, but it's
316
:not going to be a magical solution to a
suboptimally specified model.
317
:Yes, yes.
318
:Of course, in all those problems, what's
hidden behind is the curse of
319
:dimensionality.
320
:If we are trying to learn something in
very high dimension and...
321
:It could be arbitrarily hard.
322
:It could be that you cannot learn
something in high dimension just because
323
:you would need to observe all the location
in this high dimension to get the
324
:information.
325
:So of course, this is in general not the
case, because what we are trying to learn
326
:has some structure, some underlying
structure that is actually described by
327
:fewer dimensions.
328
:And you actually need fewer observations
to actually learn it.
329
:But the question is, how do you find those
structures, and how do you put them in?
330
:Therefore, we need to take into account as
much as the knowledge we have on the
331
:system to make this learning as efficient
as possible.
332
:Yeah, yeah, yeah.
333
:Now, I mean, that's super interesting.
334
:And that's your paper, Adaptive Monte
Carlo Augmented with Normalizing Flows,
335
:right?
336
:So this is the paper where we did this
generally.
337
:And I don't have yet a paper out where we
are trying to really put the structure in
338
:the generative models.
339
:But that's the direction I'm actively working in.
340
:Okay, yeah.
341
:I mean, so for sure, we'll put that paper
I just cited in the show notes for people
342
:who want to dig deeper.
343
:And also, if by the time this episode is
out, you have the paper or a preprint,
344
:feel free to add that to the show notes or
just tell me and I'll add that to the show
345
:notes.
346
:That sounds really interesting for people
to read.
347
:And so I'm curious, like, you know, this
idea of normalizing flows
348
:deep neural network to help MCMC sample
faster, converge faster to the typical
349
:set.
350
:What was the main objective of doing that?
351
:I'm curious why did you even start
thinking and working on that?
352
:So yes, I think for me,
353
:The answer is really this question of
multimodality.
354
:So the fact that you may be interested in
a probability distribution for which it's very
355
:hard to connect the different interesting
regions.
356
:In statistical mechanics, it's something
that we called actually metastability.
357
:So I don't know if it's a word you've
already heard, but where some communities
358
:talk about multimodality, we talk about
metastability.
359
:And metastability are at the heart of many
interesting phenomena in physics.
360
:be it phase transitions.
361
:And therefore, it's something very
challenging in the computations, but in
362
:the same time, very crucial that we have
an understanding of.
363
:So for us, it felt like there was this big
opportunity with those probabilistic
364
:models that were so malleable, that were
so, I mean, of course, hard to train, but
365
:then they give you so much.
366
:They give you an exact...
367
:value for the density that they encode,
plus the possibility of sampling from them
368
:very easily, getting just a bunch of
IID samples just in one run through a
369
:neural network.
370
:So for us, there was really this
opportunity of studying multimodal
371
:distribution, in particular, metastable
systems from statistical mechanics with
372
:those tools.
373
:Yeah.
374
:Okay.
375
:So in theory,
376
:these normalizing flows are especially
helpful to handle multimodal posterior.
377
:I didn't get that at first, so that's
interesting.
378
:Yep.
379
:That's really what they're going to offer
you is the possibility to make large
380
:jumps, actually to make jumps within your
Markov chain that can go from one location
381
:of high density to another one.
382
:just in one step.
383
:So this is what you are really interested
in.
384
:Well, first of all, in one step, so you're
going far in one step.
385
:And second of all, regardless of how low
is the density between them, because if
386
:you were to run some other type of local
MCMC, you would, in a sense, need to find
387
:a path between the two modes in order to
visit both of them.
388
:In our case, it's not true.
389
:You're just completely jumping out of the
blue thanks to...
390
:your normalizing flows that is trying to
mimic your target distribution, and
391
:therefore that has developed mass
everywhere that you believe matters, and
392
:that from which you can produce an IID
sample anywhere on its support very easily.
393
:I see, yeah.
394
:And I'm guessing you did some benchmarks
for the paper?
395
:So I think that's actually a very
interesting question you're asking,
396
:because I feel benchmarks are extremely
difficult, both in MCMC...
397
:and in deep learning.
398
:So, I mean, you can make benchmarks say,
okay, I changed the architecture and I see
399
:that I'm getting something different.
400
:I can say, I mean, but otherwise, I think
it's one of the big challenges that we
401
:have today.
402
:So if I tell you, okay, with my algorithm,
I can write an MCMC that is going to mix
403
:between the different modes, between the
different metastable states.
404
:that's something that I don't know how to
do by any other means.
405
:So the benchmark is, actually you won.
406
:There is nothing to be compared with, so
that's fine.
407
:But if I need to compare on other cases
where actually I can find those algorithms
408
:that will work, but I know that they are
going to probably take more iterations,
409
:then I still need to factor in a lot of
things in my true
410
:honest benchmark.
411
:I need to factor in the fact that I run a
lot of experiments to choose the
412
:architecture of my normalizing flow.
413
:I run a lot of experiments to choose the
hyperparameters of my training and so on
414
:and so forth.
415
:And I don't see how we can make those
honest benchmarks nowadays.
416
:So I can make one, but I don't think I
will think very highly that it's, I mean,
417
:you know, really revealing some profound
truth about
418
:which solution is really working.
419
:The only way of making an honest
benchmark would be to take different
420
:teams, give them problems, and lock them
in a room and see who comes out first with
421
:the solution.
422
:But I mean, how can we do that?
423
:Well, we can call on listeners who are
interested to do the experiments to
424
:contact us.
425
:That would be the first thing.
426
:But yeah, that's actually a very good
point.
427
:And in a way, that's a bit frustrating,
right?
428
:Because then it means at least
experimentally, it's hard to differentiate
429
:between the efficiency of the different
algorithms.
430
:So I'm guessing the claims that you make
about this new algorithm being more
431
:efficient for multimodalities,
432
:come more from the theoretical underpinning of the algorithm?
433
:No, I mean, it's just based on the fact
that I don't know of any other algorithm,
434
:which under the same premises, which can
do that.
435
:So, I mean, it's an easy way out of making
any benchmark, but also a powerful one
436
:because I really don't know who to compare
to.
437
:But indeed, I think then it's...
438
:As far as I'm concerned, I'm mostly
interested in developing methodologies.
439
:I mean, that's just what I like to do.
440
:But of course, what's important is that
those methods are going to work and they
441
:are going to be useful to some communities
that really have research questions that
442
:they want to answer.
443
:I mean, research or not actually could be
engineering questions, decisions to be
444
:taken that require to do an MCMC.
445
:And I think the true tests of
446
:whether or not the algorithm is useful is
going to be this, the test of time.
447
:Are people adopting the algorithms?
448
:Are they seeing that this is really
something that they can use and that would
449
:make their inference work where they could
not find another method that was as
450
:efficient?
451
:And in this direction, there is a
close collaborator, Kaze Wong, who is
452
:working at the Flatiron Institute and with
whom we developed a package that is called
453
:FlowMC.
454
:that is written in Jax and that implements
these algorithms.
455
:And the idea was really to try to write a
package that was as user-friendly as
456
:possible.
457
:So of course we have the time we have to
take care of it and the experience we have
458
:as users, you know, of the available software
that we have, but we really try hard.
459
:And at least in this community of people
studying gravitational waves, it seems
460
:that people are really trying, starting to
use this in their research.
461
:And so I'm excited, and I think it is
useful.
462
:But it's not the proper benchmark you
would dream of.
463
:Yeah, you just stole one of my questions.
464
:Basically, I was exactly going to ask you,
but then how can people try these?
465
:Is there a package somewhere?
466
:So yeah, perfect.
467
:That's called FlowMC, you told me.
468
:Yes, it's called FlowMC.
469
:You can pip install FlowMC, and you will
have it.
470
:If you are allergic to Jax...
471
:Right, I have it here.
472
:Yeah, there is a read the docs.
473
:So I'll put that in the show notes for
sure.
474
:Yes, we have even documentation.
475
:That's how far you go when you are
committed to having something that is used
476
:and useful.
477
:So I mean, of course, we are also open to
both comments and contributions.
478
:So just write to us if you're interested.
479
:Yeah, for sure.
480
:Yeah, that folks, if you are interested in
contributing, if you see any bugs, make
481
:sure to open some issues on the GitHub
repo or even better, contribute pull
482
:requests.
483
:I'm sure Marylou and the co-authors
will be very happy about that.
484
:Yes, you know typos in the documentation,
all of this.
485
:Yeah, exactly.
486
:That's what I...
487
:I tell everyone also who wants to start
doing some open source package, start with
488
:the smallest PRs.
489
:You don't have to write a new algorithm,
like already fixing typos, making the
490
:documentation look better, and stuff like
that.
491
:That's extremely valuable, and that will
be appreciated.
492
:So for sure, do that, folks.
493
:Do not be shy with that kind of stuff.
494
:So yeah, I put already the paper, you have
out on arXiv, Adaptive Monte Carlo, and
495
:flowMC, I put that in the show notes.
496
:And yeah, to get back to what you were
saying, basically, I think as more of a
497
:practitioner than a person who developed
the algorithms, I would say the reasons I
498
:would...
499
:you know, adopt that kind of new
algorithms would be that, well, I know,
500
:okay, that algorithm is specialized,
especially for handling multimodal,
501
:multimodal posteriors.
502
:So then I'd be, if I have a problem like
that, I'll be like, oh, okay, yeah, I can
503
:use that.
504
:And then also ease of adoption.
505
:So is there an open source package in
which languages that can I just, you know,
506
:What kind of trade-off basically do I have
to make?
507
:Is that something that's easy to adopt?
508
:Is that something that's really a lot of
barriers to adoptions?
509
:But at the same time, it really seems to
be solving my problem.
510
:You know what I'm saying?
511
:It's like, indeed, it's not only the
technical and theoretical aspects of the
512
:method, but also how easy it is to...
513
:adopt in your existing workflows.
514
:Yes.
515
:And for this, I guess it's, I mean, the
feedback is extremely valuable because
516
:when you know the methods, you're really,
it's hard to exactly locate where people
517
:will not understand what you meant.
518
:And so I really welcomed.
519
:No, for sure.
520
:And already I find that absolutely
incredible that now
521
:Almost all new algorithms, at least that I
talk about on the podcast and that I see
522
:in the community, on the PMC community,
almost all of them now, when they come up
523
:with a paper, they come out with an open
source package that's usually installable
524
:in a Python, in the Python ecosystem.
525
:Which is really incredible.
526
:I remember that when I started on these a
few years ago, it was really not the norm
527
:and much more the exception and now almost
528
:The accompanying open source package is
almost part of the paper, which is really
529
:good because way more people are going to
use the package than read the paper.
530
:So, this is absolutely a fantastic
evolution.
531
:And thank you in the name of us all for
having taken the time to develop the
532
:package, clean up the code, put that on
PyPI and making the documentation because
533
:That's where the academic incentives are a
bit misaligned with what I think they
534
:should be.
535
:Because unfortunately, literally it takes
time for you to do that.
536
:And it's not very much appreciated by the
academic community, right?
537
:It's just like, you have to do it, but
they don't really care.
538
:We care as the practitioners, but the
academic world doesn't really.
539
:And what counts is the paper.
540
:So for now, unfortunately, it's really
just time that you take.
541
:out of your paper writing time.
542
:So I'm sure everybody appreciates it.
543
:Yes, but I don't know.
544
:I see true value to it.
545
:And I think, although it's maybe not as
rewarded as it should, I think many of us
546
:see value in doing it.
547
:So you're very welcome.
548
:Yeah, yeah.
549
:No, for sure.
550
:Lots of value in it.
551
:Just saying that value should be more
recognized.
552
:Just a random question, but something I'm
always curious about.
553
:I think I know the answer if I still want
to ask.
554
:Can you sample discrete parameters
with these algorithms?
555
:Because that's one of the grails of the
field right now.
556
:How do you sample discrete parameters?
557
:So, okay, the pack, so what I've
implemented, tested, is all on continuous
558
:space.
559
:But, but what I need for this algorithm to
work is a generative model of which I can
560
:sample from easily.
561
:IID, I mean, not that I have to run a Monte
Carlo to sample from my model, but that I can
562
:just, in one Python command or whichever
language you want, get an IID
563
:sample from it.
564
:and that I can write what is the
likelihood of this sample.
565
:Because a lot of generative models
actually don't have tractable likelihoods.
566
:So if you think, I don't know, of
generative adversarial networks or
567
:variational autoencoders for people who
might be familiar with those very, very
568
:common generative models, they don't have
this property.
569
:You can generate samples easily, but you
cannot write down with which density of
570
:probability you've generated this sample.
571
:This is really what we need in order to
use this generative model inside a Markov
572
:chain and inside an algorithm that we know
is going to converge towards the target
573
:distribution.
574
:So normalizing flows are playing this role
for us with continuous variables.
575
:They give us easy sampling and easy
evaluation of the likelihood.
576
:But you also have equivalence on discrete
distributions.
577
:And if you want...
578
:generative model that would have those two
properties on discrete distribution, you
579
:should turn yourself to autoregressive
models.
580
:So I don't know if you've learned about
them, but the idea is just that they use a
581
:factorization of probability distributions
that is just with conditional
582
:distributions.
583
:And that's something that is in theory has
full expressivity, that any distribution
584
:can be written as a factorized
distribution where you are progressively
585
:on the degrees of freedom that you have
already sampled.
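(The factorization in question is the chain rule of probability: for $x = (x_1, \dots, x_d)$,

$$p(x) = \prod_{i=1}^{d} p(x_i \mid x_1, \dots, x_{i-1}),$$

which gives both sequential sampling, one conditional at a time, and an exact, tractable likelihood: the two properties needed to plug the model into the Markov chain.)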
586
:And you can rewrite the algorithm,
training an autoregressive model in the
587
:place of a normalizing flow.
588
:So honest answer, I haven't tried, but it
can be done.
589
:Well, it can be done.
590
:And now that I'm thinking about it, people
have done it because in statistical
591
:mechanics, there are a lot of systems that
we like.
592
:a lot of our toy systems that are binary.
593
:So that's, for example, the Ising model,
which are a model of spins that are just
594
:binary variables.
595
:And I know of at least one paper where
they are doing something of this sort.
596
:So making jumps, they're actually not
trying to refresh full configurations, or
597
:they are doing both, refreshing full
configurations and partial configurations.
598
:And they are doing...
599
:something that, in essence, is exactly
this algorithm, but with discrete
600
:variables.
601
:So I'll happily add the reference to this
paper, which is, I think, it's by the
602
:group of Giuseppe Carleo from EPFL.
603
:And OK, I haven't, I don't think they
train exactly like, so it's not exactly
604
:the same algorithm, but things around this
have been tested.
605
:OK, well, it sounds like a.
606
:Sounds like fun, for sure.
607
:Definitely something I'm sure lots of
people would like to test.
608
:So folks, if you have some discrete
parameters somewhere in your models, maybe
609
:you'll be interested by normalizing flows.
610
:So the flowMC package is in the show
notes.
611
:Feel free to try it out.
612
:Another thing I'm curious about is how do
you train the neural network, actually?
613
:And how much of a bottleneck is it on the
sampling time, if any?
614
:Yes.
615
:So it will definitely depend on the space.
616
:No, let me rephrase.
617
:The thing is, whether or not it's going to
be worth it to train a neural network in
618
:order to help you sampling.
619
:depends on how difficult it is for you to
sample, I mean, with the more
620
:traditional MCMCs that you have on your
hand.
621
:So again, if you have a multimodal
distribution, it's very likely that your
622
:traditional MCMC algorithms are just not
going to cut it.
623
:And so then, I mean, if you really care
about sampling this posterior distribution
624
:or this distribution of configurations of
a physical system,
625
:then you will be willing to pay the price
on this sampling.
626
:So instead of, say, having to use a local
sampler that will take you billions of
627
:iterations in order to see transitions
between the modes, you can train a
628
:normalizing flow or the autoregressive
model if you're discrete, and then have
629
:those jumps happening every other time.
630
:Then it's more than clear that it's worth
doing it.
631
:OK, yeah, so the answer is it depends
quite a lot.
632
:Of course, of course.
633
:Yeah, yeah.
634
:And I guess, how does it scale with the
quantity of parameters and quantity of
635
:data?
636
:So quantity of parameters, it's really
this dimension I was already discussing a
637
:bit about and telling you that there is a
cap on what you can really expect these
638
:methods will work on.
639
:I would say that if the quantity of
parameters is something like tens or
640
:hundreds, then things are going to work
well, more or less out of the box.
641
:But if it's larger than this, you will
likely run into trouble.
642
:And then the number of data is actually
something I'm less familiar with because
643
:I'm less from the Bayesian communities
than the stat-mech community to start
644
:with.
645
:So my distribution doesn't have data
embedded in them, in a sense, most of the
646
:time.
647
:But for sure, what people argue, why it's
a really good idea to use generative
648
:models such as normalizing flows to sample
in the Bayesian context.
649
:is the fact that you have an amortization
going on.
650
:And what do I mean by that?
651
:I mean that you're learning a model.
652
:Once it's learned, it's going to be easy
to adjust it if things are changing a
653
:little.
654
:And with little adjustments, you're going
to be able to sample still a very
655
:complicated distribution.
656
:So say you have data that is arriving
online, and you keep on having new samples
657
:to be added to your posterior
distribution.
658
:then it's very easy to just adjust the
normalizing flow with a few training
659
:iterations to get back to the new
posterior you actually have now, given
660
:that you have this amount of data.
661
:So this is what some people call
amortization, the fact that you can really
662
:encapsulate in your model all the
knowledge you have so far, and then just
663
:adjust it a bit, and don't have to start
from scratch, as you would have to in
664
:other.
665
:Monte Carlo methods.
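(A minimal sketch of that warm-start idea, using a toy affine "flow" and a reverse-KL update; the real method would use an expressive normalizing flow, and this only illustrates reusing the previously trained parameters as the starting point for the new posterior:

```python
import jax
import jax.numpy as jnp

def sample_and_logq(params, key, n, dim):
    # Toy "flow": an affine map z -> mu + exp(log_sigma) * z, with exact log density.
    mu, log_sigma = params
    z = jax.random.normal(key, (n, dim))
    x = mu + jnp.exp(log_sigma) * z
    logq = jnp.sum(-0.5 * z**2 - 0.5 * jnp.log(2 * jnp.pi) - log_sigma, axis=1)
    return x, logq

def finetune(params, log_posterior_new, key, steps=200, n=256, dim=2, lr=1e-2):
    """Adapt an already-trained sampler to an updated posterior (e.g., after new
    data arrive) instead of retraining from scratch: pass in the old params."""
    def loss(p, k):
        x, logq = sample_and_logq(p, k, n, dim)
        # reverse-KL style objective: E_q[log q(x) - log p_new(x)]
        return jnp.mean(logq - jax.vmap(log_posterior_new)(x))
    grad = jax.grad(loss)
    for _ in range(steps):
        key, sub = jax.random.split(key)
        g = grad(params, sub)
        params = jax.tree_util.tree_map(lambda p, gi: p - lr * gi, params, g)
    return params
```
)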
666
:Yeah.
667
:Yeah, so what I'm guessing is that maybe
the tuning time is a bit longer than a
668
:classic HMC.
669
:But then once you're out of the tuning
phase, the sampling is going to be way
670
:faster.
671
:Yes, I think that's a correct way of
putting it.
672
:And otherwise, for the kind of the number
of, I mean, the dimensionality that the
673
:algorithm is comfortable with.
674
:In general, the running times of the
model, how have you noticed that being
675
:like, has that been close to when you use
a classic HMC or is it something you
676
:haven't done yet?
677
:I don't think I can honestly answer this
question.
678
:I think it will depend because it will
also depend how easily your HMC reaches
679
:all the
680
:regions you actually care about.
681
:So I mean, probably there are some
distributions that are very easy for HMC
682
:to cover and where it wouldn't be worth it
to train the model.
683
:But then plenty of cases where things are
the other way around.
684
:Yeah, yeah, yeah.
685
:Yeah, I can guess.
686
:That's always something that's really
fascinating in this algorithm world is how
687
:dependent everything is on the model.
688
:use case, really dependent on the model
and the data.
689
:So on this project, on this algorithm,
what are the next steps for you?
690
:What would you like to develop next on
this algorithm precisely?
691
:Yes, so as I was saying, one of my main
questions is how to scale this algorithm
692
:and
693
:We kind of wrote it in an all-purpose
fashion.
694
:And all-purpose is nice, but all-purpose
does not scale.
695
:So that's really what I'm focusing on,
trying to understand how we can learn
696
:structures we can know or we can learn
from the system, how to explore them and
697
:put them in, in order to be able to tackle
more and more complex systems with higher,
698
:I mean, more degrees of freedom.
699
:So more parameters than what we are
currently doing.
700
:So there's this.
701
:And of course, I'm also very interested in
having some collaborations with people
702
:that care about actual problem for which
this method is actually solving something
703
:for them.
704
:As it's really what gives you the idea of
what's next to be developed, what are the
705
:next methodologies that's
706
:will be useful to people?
707
:Can they already solve their problem?
708
:Do they need something more from you?
709
:And that's the two things I'm having a
look at.
710
:Yeah.
711
:Well, it definitely sounds like fun.
712
:And I hope you'll be able to work on that
and come up with some new, amazing,
713
:exciting papers on this.
714
:I'll be happy to look at that.
715
:And so that's it.
716
:It was a great deep dive on this project.
717
:And thank you for indulging my
questions, Marylou.
718
:Now, if we want to de-zoom a bit and talk
about other things you do, you also
719
:mentioned you're interested in the context
of scarce data.
720
:So I'm curious on what you're doing on
these, if you could elaborate a bit.
721
:Yes, so I guess what I mean by scarce data
is precisely that when we are using
722
:machine learning in scientific computing,
usually what we are doing is exploiting
723
:the great tool that are deep neural
networks to play the role of a surrogate
724
:model somewhere in our scientific
computation.
725
:But most of the time, this is without data
a priori.
726
:We know that there is a function we want
to approximate somewhere.
727
:But in order to have data, either we have
to pay the price of costly experiments,
728
:costly observations, or we have to pay the
price of costly numerics.
729
:So if you, I mean, a very famous example
of applications of machine learning
730
:through scientific computing is molecular
dynamics at quantum precision.
731
:So this is what people call density
functional theory.
732
:So if you want to.
733
:observe the dynamics of a molecule with
the accuracy of what's going on really at
734
:the level of quantum mechanics, then you
have to make very, very costly call to a
735
:function that predicts what's the energy
predicted by quantum mechanics and what
736
:are the forces predicted by quantum
mechanics.
737
:So people have seen here an opportunity to
use deep neural nets in order to just
738
:regress what's the value of this quantum
potential.
739
:at the different locations that you're
going to visit.
740
:And the idea is that you are creating your
own data.
741
:You are deciding when you are going to pay
the price of do the full numerical
742
:computation and then obtain a training
point of given Cartesian coordinates, what
743
:is the value of this energy here.
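(A bare-bones sketch of that data-creation loop; `expensive_energy`, `choose_next_configuration`, and the surrogate's `fit` are placeholders, not a specific package:

```python
def build_surrogate(expensive_energy, choose_next_configuration, surrogate,
                    n_points=100):
    """Actively create scarce training data for a cheap surrogate model."""
    X, y = [], []
    for _ in range(n_points):
        x = choose_next_configuration(surrogate, X)  # decide where data is most useful
        X.append(x)
        y.append(expensive_energy(x))                # pay the full computation once
        surrogate.fit(X, y)                          # refit the cheap energy model
    return surrogate
```
)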
744
:And then you have to, I mean, conversely
to what you're doing traditionally in
745
:machine learning, where you believe that
you have...
746
:huge data sets that are encapsulating a
rule, and you're going to try to exploit
747
:them at best.
748
:Here, you have the choice of where you
create your data.
749
:And so you, of course, have to be as smart
as possible in order to have to create as
750
:little as possible training points.
751
:And so this is this idea of working with
scarce data that has to be infused in the
752
:usage of machine learning in scientific
computing.
753
:My example of application is just what we
have discussed, where we want to learn a
754
:deep generative model, whereas when we
start, we just have our target
755
:distribution as an objective, but we don't
have any sample from it.
756
:That would be the traditional data that
people will be using in generative
757
:modeling to train a generative model.
758
:So if you want, we are playing this
adaptive game.
759
:that I was already hinting at a bit.
760
:where we are creating data that is not
exactly the data we want, but that we
761
:believe is informative of the data we want
to train the generative model that is in
762
:turn going to help us to converge the MCMC
and in the same time as you are training
763
:your model, generate the data you would
have needed to train your model.
764
:Yeah, that is really cool.
765
:And of course I asked about that because
scarce data is something that's extremely
766
:common in the Bayesian world.
767
:That's where usually Bayesian statistics
are, yeah, helpful and useful because
768
:when you don't have a lot of data, you
need more structure and more priors.
769
:So if you want to say anything about your
phenomenon of interest.
770
:So that's really cool that you're working
on that.
771
:I love that.
772
:And from also, you know, a bit broader
perspective, you know, MCMC really well.
773
:We work on it a lot.
774
:So I'm curious where you think MCMC is
heading in the next few years.
775
:And if you see its relevance waning in
some way.
776
:Well, I don't think MCMC can go out of
fashion in a sense because it's absolutely
777
:ubiquitous.
778
:So practical use cases are everywhere.
779
:If you have a large probabilistic model,
usually it's given to you by the nature of
780
:the problem you want to study.
781
:And if you cannot choose anything about
putting in the right properties, you're
782
:just going to be.
783
:you know, left with something that you
don't know how to approach except by MCMC.
784
:So it's absolutely ubiquitous as an
algorithm for probabilistic inference.
785
:And I would also say that one of the
things that are going to, you know, keep
786
:MCMC going for a long time is how much
it's a cherished object of study by
787
:actually researchers from different
communities, because I mean...
788
:You can see people really from statistics
that are kind of the prime researchers on,
789
:okay, how should you make a Monte Carlo
method that has the best convergence
790
:properties, the best speed of convergence,
and so on and so forth.
791
:But you can also see that the fields where
those algorithms are used a lot, be it
792
:statistical mechanics, be it Bayesian
inference, also have full communities that
793
:are working on developing MCMCs.
794
:And so I think it's really a matter that
they are an object of curiosity and
795
:intriguing to a lot of people.
796
:And therefore it's something that's for
now is still very relevant and really
797
:unsolved.
798
:I mean, something that I love about MCMC
is that when you look at it first, you
799
:say, yeah, that's simple, you know?
800
:Yeah.
801
:Yes, that's, but then you start thinking
about it.
802
:Then you...
803
:I mean, realize how subtle are all the
properties of those algorithms.
804
:And you're telling yourself, but I cannot
believe it's so hard to actually sample
805
:from distributions that are not that
complicated when you're a naive newcomer.
806
:And so, yeah, I mean, for now, I think
they are still here and in place.
807
:And if I could even comment a bit more
regarding exactly the context of my
808
:research, where
809
:it could seemingly be the case that I'm
trying to replace MCMC's with machine
810
:learning.
811
:I would warn the listeners that it's not
at all what we are concluding.
812
:I mean, that's not at all the direction we
are going to.
813
:It's really a case where we need both.
814
:That MCMC can benefit from learning, but
learning without MCMC is never going to
815
:give you something that you have enough
guarantees on, that something that you can
816
:really trust for sure.
817
:So I think here there is a really nice
combination of MCMC and learning and that
818
:they're just going to nurture each other
and not replace one another.
819
:Yeah, yeah, for sure.
820
:And I really love the, yeah, that these
projects of trying to make basically MCMC
821
:more informed instead of having first
random draws, you know, almost random
822
:draws with Metropolis in the end.
823
:making that more complicated, more
informed with the gradients, with HMC, and
824
:then normalizing flows, which try to
squeeze a bit more information out of the
825
:structure that you have to make the
sampling go faster.
826
:I found that one super useful.
827
:And also, yeah, that's also a very, very
fascinating part of the research.
828
:And this is part also of a lot of the
research
829
:a lot of initiatives that you have focused
on, right?
830
:Personally, basically what we could
describe as machine learning assisted
831
:scientific computing.
832
:You know, and do you have other examples
to share with us on how machine learning
833
:is helping traditional scientific
computing methods?
834
:Yes.
835
:So, for example, I was giving already the
example of
836
:of the learning of the regression of the
potentials of molecular force fields in
837
:people that are studying molecules.
838
:But we are seeing a lot of other things
going on.
839
:So there are people that are trying to
even use machine learning as a black box
840
:in order to, how should I say, to make
classifications between things they care
841
:about.
842
:So for example, you have samples that come
from a model.
843
:But you're not sure if they come from this
model or this other one.
844
:You're not sure if they are above a
critical temperature or below a critical
845
:temperature, if they belong to the same
phase.
846
:So you can really try to play this game of
creating an artificial data set where you
847
:know what is the answer, train a
classifier, and then use your black box to
848
:tell you when you see a new configuration
which type of configuration it is.
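(As a toy illustration of this classification game, assuming you have some `simulate` function for your own system; the names here are placeholders, not a real package:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def make_training_set(simulate, n_per_class=1000):
    # Create an artificial labeled dataset from two settings you control,
    # e.g., configurations simulated above and below a critical temperature.
    x_hot = simulate(temperature=3.0, n=n_per_class)   # label 1: high temperature
    x_cold = simulate(temperature=1.0, n=n_per_class)  # label 0: low temperature
    X = np.vstack([x_hot, x_cold])
    y = np.concatenate([np.ones(n_per_class), np.zeros(n_per_class)])
    return X, y

def fit_phase_classifier(simulate):
    X, y = make_training_set(simulate)
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return clf  # clf.predict(new_configs) then labels configurations of unknown phase
```
)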
849
:And it's really.
850
:given to you by deep learning because you
would have no idea why the neural net is
851
:deciding that it's actually from this or
from this.
852
:You don't have any other statistics that
you can gather and that will tell you
853
:what's the answer and this is why.
854
:But it's kind of like opening this new
conceptual door that sometimes there are
855
:things that are predictable.
856
:I mean, you can check that, okay, on the
data that you know the answer of the
857
:machine is extremely efficient.
858
:But then you don't know why things are
happening this way.
859
:I mean, there's this, but there are plenty
of other directions.
860
:So people that are, for example, using
neural networks to try to discover a
861
:model.
862
:And here, model would be actually what
people call partial differential
863
:equations, so PDEs.
864
:So I don't know if you've heard about
those physics-informed neural networks.
865
:But there are neural networks that people
are training, such that they are solution
866
:of a PDE.
867
:So instead of actually having training
data, what you do is that you use the
868
:properties of the deep neural nets, which
are that they are differentiable with
869
:respect to their parameters, but also with
respect to their inputs.
870
:And for example, you have a function f.
871
:And you know that the Laplacian of f is
supposed to be equal to
872
:the derivative in time of f, well, you can
write mean squared loss on the fact that
873
:the laplacian of your neural network has
to be close to its derivative in time.
874
:And then, given boundary conditions, so
maybe initial condition in time and
875
:boundary condition in space, you can ask a
neural net to predict the solution of the
876
:PDE.
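(A minimal sketch of that idea for the heat equation u_t = u_xx, using JAX autodiff for both derivatives; illustrative only, and a full physics-informed loss would also add the initial- and boundary-condition terms just mentioned:

```python
import jax
import jax.numpy as jnp

def init_params(key, sizes=(2, 32, 32, 1)):
    # Small MLP taking (x, t) and returning u(x, t).
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (m, n)) / jnp.sqrt(m), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

def u(params, x, t):
    h = jnp.array([x, t])
    for W, b in params[:-1]:
        h = jnp.tanh(h @ W + b)
    W, b = params[-1]
    return (h @ W + b)[0]

def residual(params, x, t):
    # Heat-equation residual u_t - u_xx, obtained by differentiating the network
    # with respect to its inputs; it should be close to zero for a solution.
    u_t = jax.grad(u, argnums=2)(params, x, t)
    u_xx = jax.grad(jax.grad(u, argnums=1), argnums=1)(params, x, t)
    return u_t - u_xx

def loss(params, xs, ts):
    # Mean-squared PDE residual at collocation points (xs, ts).
    r = jax.vmap(lambda x, t: residual(params, x, t))(xs, ts)
    return jnp.mean(r ** 2)
```
)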
877
:And even better, you can give to your
878
:learning mechanism a library of terms that
would be possible candidates for being
879
:part of the PDE.
880
:And you can let the network tell you which
terms of the PDE in the library are
881
:actually, seems to be actually in the data
you are observing.
882
:So, I mean, there are all kinds of
inventive way that researchers are now
883
:using the fact that deep neural nets are
differentiable.
884
:smooth, can generalize easily, and yes,
those universal approximators.
885
:I mean, seemingly you can use neural nets
to represent any kind of function and use
886
:that inside their computation problems to
try to, I don't know, answer all kinds of
887
:scientific questions.
888
:So it's, I believe, pretty exciting.
889
:Yeah, yeah, that is super fun.
890
:I love how
891
:You know, these comes together to help on
really hard sampling problems like
892
:sampling ODE's or PDE's, just extremely
hard.
893
:So yeah, using that.
894
:Maybe one day also we'll get something for
GPs.
895
:I know the Gaussian processes are a lot of
the effort is on decomposing them and
896
:finding some useful
897
:algebraic decompositions, so like the
Hilbert space Gaussian processes that Bill
898
:Engels especially has added to the PyMC
API, or eigenvalue decomposition, stuff
899
:like that.
900
:But I'd be curious to see if there are
also some initiatives on trying to help
901
:the convergence of Gaussian processes using
probably deep neural networks, because
902
:there is a mathematical connection between
neural networks and GPs.
903
:I mean, everything is a GP in the end, it
seems.
904
:So yeah, using a neural network to
facilitate the sampling of a Gaussian
905
:process would be super fun.
906
:So I have so many more questions.
907
:But I want to be mindful of your time, we've
already been recording for some time.
908
:So I try to make my thoughts more packed.
909
:But something I wanted to ask you
910
:You actually teach a course at
Polytechnique in France that's called
911
:Emerging Topics in Machine Learning.
912
:So I'm curious to hear you say what are
some of the emerging topics that excite
913
:you the most and how do you approach
teaching them?
914
:So this class is actually the nice
class where we have a wild card to just
915
:talk about whatever we want.
916
:So as far as I'm concerned, I'm really
teaching about the last point that we
917
:discussed, which is how can we hope to use
the technology of machine learning to
918
:assist scientific computing.
919
:And I have colleagues that are jointly
teaching this class with me that are, for
920
:example, teaching about optimal transport
or about private and federated learning.
921
:So it can be different topics.
922
:But we all have the same approach to it,
which is to introduce to the students the
923
:main ideas quite briefly and then to give
them the opportunity to learn, to read
924
:papers that we believe are important or at
least really illustrative of those ideas
925
:and the direction in which the research is
going and to read these papers, of course,
926
:critically.
927
:So the idea is that we want to make sure
that they are understood.
928
:We also want them to implement the
methods.
929
:And once you implement the methods, you
realize everything that is sometimes swept under
930
:the rug in the paper.
931
:So where is it really difficult?
932
:Where is the method really making a
difference?
933
:And so on and so forth.
934
:So that's our approach to it.
935
:Yeah, that must be a very fun course.
936
:At which level do you teach that?
937
:So our students are third year at Ecole
Polytechnique.
938
:So that would be equivalent to the first
year of graduate program.
939
:Yeah.
940
:And actually, looking forward, what do you
think are the most promising areas of
941
:research in what you do?
942
:So basically, interaction of machine
learning and statistical physics.
943
:Well, I think something that actually has
been and will continue being a very, very
944
:fruitful bridge between statistical
mechanics and machine learning is
945
:generative models.
946
:So you've probably heard of diffusion models;
they are a new kind of generative
947
:model that relies on learning how to
reverse a diffusion process, a diffusion
948
:process that is noising the data and which,
949
:once you've learned how to reverse it,
will allow you to transform noise into
950
:data.
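As a rough illustration of "learn to reverse a noising process", here is a bare-bones DDPM-style training step on toy 2D data. This is my sketch of the generic noise-prediction objective, not code from Marylou's work; the noise schedule, network, and dataset shape are arbitrary assumptions.

```python
# Bare-bones diffusion-model training step (illustrative, assuming PyTorch).
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

# Small network that predicts the noise added at step t, given x_t and t.
model = torch.nn.Sequential(
    torch.nn.Linear(2 + 1, 128), torch.nn.SiLU(),
    torch.nn.Linear(128, 128), torch.nn.SiLU(),
    torch.nn.Linear(128, 2),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def training_step(x0):
    # Forward (noising) process: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps.
    t = torch.randint(0, T, (x0.shape[0],))
    abar = alphas_bar[t].unsqueeze(1)
    eps = torch.randn_like(x0)
    xt = abar.sqrt() * x0 + (1 - abar).sqrt() * eps
    # The reverse process is learned by predicting eps with a simple MSE loss.
    eps_hat = model(torch.cat([xt, t.unsqueeze(1).float() / T], dim=1))
    loss = (eps_hat - eps).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Sampling then runs the learned reverse process step by step starting from pure noise, which is exactly the "transform noise into data" direction described above.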
951
:It's something that is really close to
statistical mechanics because the
952
:diffusion really comes from studying
Brownian particles that are all around
953
:us.
954
:And this is where this mathematics comes
from.
955
:And this is still an object of study in
the field of statistical mechanics.
956
:And it has inspired a lot of machine
learning models.
957
:I could also cite Boltzmann machines.
958
:I mean, they even bear the name of the
father of statistical mechanics,
959
:Boltzmann.
960
:And here again, I mean, it's really
inspiration from the
961
:models studied by physicists that gave the
first forms of models that were used by
962
:machine learners in order to do density
estimation.
963
:So there is really this cross-fertilization that
964
:has been here for, I guess, the last 50
years.
965
:The field of machine learning has really
emerged from these communities.
966
:And I'm hoping that my work and all the
groups that are working in this direction
967
:are also going to demonstrate the other
way around, that generative models can
968
:help also a lot in statistical mechanics.
969
:So that's definitely what I am looking
forward to.
970
:Yeah.
971
:Yeah, I love that and understand why
you're talking about that, especially now
972
:with the whole conversation we've had.
973
:That your answer is not surprising to me.
974
:Actually, something also that I mean, even
broader than that, I'm guessing you
975
:already care a lot about these questions
from what I get, but if you could choose
976
:the questions you'd like to see the answer
to before you die, what would they be?
977
:That's obviously a very vast question.
978
:If I stick to a bit really this...
979
:what we've discussed about the sampling
problems and where I think they are hard
980
:and why they are so intriguing.
981
:I think that something I'm very keen on
seeing some progress around is this
982
:question of sampling multimodal
distributions, but with
983
:guarantees.
984
:Here, really, in a sense, sampling
a multimodal distribution could just be
985
:judged
986
:undoable.
987
:I mean, there is some NP-hardness that is
hidden somewhere in this picture.
988
:So of course, it's not going to be
something general, but I'm really
989
:wondering, I mean, I'm really thinking
that there should be some assumption, some
990
:way of formalizing the problem under which
we could understand how to construct
991
:algorithms that will provably, you know,
succeed in making this happen.
992
:And so here, I don't know, it's a
theoretical question, but I'm
993
:very curious about what we will manage to
say in this direction.
994
:Yeah.
995
:And actually that sets us up, I think, for
the last two questions of the show.
996
:So, I mean, I have other questions, but
we've already been recording for a long
997
:time.
998
:So I need to let you go and have dinner.
999
:I know it's late for you.
:
01:07:27,196 --> 01:07:29,896
So let me ask you the last two questions.
:
01:07:29,896 --> 01:07:32,437
I ask every guest at the end of the show.
:
01:07:33,097 --> 01:07:33,874
First one.
:
01:07:33,874 --> 01:07:38,321
If you had unlimited time and resources,
which problem would you try to solve?
:
01:07:40,562 --> 01:07:45,244
I think it's an excellent question because
it's an excellent opportunity maybe to say
:
01:07:45,244 --> 01:07:49,485
that we don't have unlimited resources.
:
01:07:50,306 --> 01:07:57,149
I think it's probably the biggest
challenge we have right now to understand
:
01:07:57,149 --> 01:08:02,631
and to collectively understand because I
think now we individually understand that
:
01:08:02,631 --> 01:08:04,792
we don't have unlimited resources.
:
01:08:05,132 --> 01:08:07,913
And in a sense the...
:
01:08:08,834 --> 01:08:14,156
the biggest problem is how do we move this
complex system of human societies we have
:
01:08:14,196 --> 01:08:19,778
created in order to move in the
direction where we are using precisely
:
01:08:19,778 --> 01:08:21,058
fewer resources.
:
01:08:21,279 --> 01:08:25,881
And I mean, it has nothing to do with
anything that we have discussed before,
:
01:08:25,881 --> 01:08:32,083
but it feels to me that it's really where
the biggest question lies, the one that really
:
01:08:32,083 --> 01:08:33,304
matters today.
:
01:08:33,504 --> 01:08:35,965
And I have no clue how to approach it.
:
01:08:36,425 --> 01:08:37,105
But
:
01:08:38,046 --> 01:08:39,606
I think it's actually what matters.
:
01:08:39,606 --> 01:08:46,389
And if I had unlimited time and
resources, that's definitely what I would
:
01:08:46,529 --> 01:08:47,989
be researching towards.
:
01:08:49,270 --> 01:08:51,791
Yeah.
:
01:08:51,791 --> 01:08:52,431
Love that answer.
:
01:08:52,431 --> 01:08:54,752
And you're definitely in good company.
:
01:08:54,932 --> 01:08:59,494
Lots of people have talked about that for
this question, actually.
:
01:08:59,875 --> 01:09:04,076
And second question, if you could have
dinner with any great scientific mind,
:
01:09:04,136 --> 01:09:07,417
dead, alive, or fictional, who would it
be?
:
01:09:09,518 --> 01:09:14,201
So, I mean, a logical answer, given my last
response, is actually Grothendieck.
:
01:09:14,201 --> 01:09:20,584
So, I don't know, you probably know about
this mathematician who, I mean, was
:
01:09:20,744 --> 01:09:27,988
somebody worried about, you know, our
relationship to the world, let's say, as
:
01:09:27,988 --> 01:09:34,732
scientists very early on, and who had
concluded that to some extent we should
:
01:09:34,732 --> 01:09:36,213
not be doing research.
:
01:09:36,573 --> 01:09:37,253
So...
:
01:09:38,174 --> 01:09:44,835
I don't know that I agree, but I also
don't think it's obviously wrong.
:
01:09:44,835 --> 01:09:50,617
So I think it would really probably be one
of the most interesting discussions, added
:
01:09:50,617 --> 01:09:54,258
on top of the fact that he was a fantastic
speaker.
:
01:09:54,258 --> 01:09:58,579
And I do invite you to listen to his
lectures. And it would be really
:
01:09:58,579 --> 01:10:00,580
fascinating to have this conversation.
:
01:10:01,340 --> 01:10:02,120
Yeah.
:
01:10:02,180 --> 01:10:02,920
Great.
:
01:10:03,420 --> 01:10:04,061
Great answer.
:
01:10:04,061 --> 01:10:06,741
You know, you're definitely the first one to
answer Grothendieck.
:
01:10:07,922 --> 01:10:09,402
But that'd be cool.
:
01:10:09,402 --> 01:10:09,542
Yeah.
:
01:10:09,542 --> 01:10:14,624
If you have a favorite lecture of his,
feel free to put that in the show notes
:
01:10:14,624 --> 01:10:19,266
for listeners, I think it's going to be
really interesting and fun for people.
:
01:10:19,727 --> 01:10:21,247
Might be in French, but...
:
01:10:22,328 --> 01:10:26,369
I mean, there are a lot of subtitles now.
:
01:10:26,369 --> 01:10:31,932
If it's on YouTube, it does a pretty
good job at the automated transcription,
:
01:10:31,932 --> 01:10:32,772
especially in English.
:
01:10:32,772 --> 01:10:35,213
So I think it will be okay.
:
01:10:36,498 --> 01:10:40,059
And that will be good for people's French
lessons.
:
01:10:40,059 --> 01:10:43,961
So yeah, you know, two birds with one
stone.
:
01:10:44,241 --> 01:10:45,981
So definitely include that now.
:
01:10:47,642 --> 01:10:48,563
Awesome, Marylou.
:
01:10:48,563 --> 01:10:51,124
So that was really great.
:
01:10:51,124 --> 01:10:54,705
Thanks a lot for taking the time and being
so generous with your time.
:
01:10:55,546 --> 01:10:59,268
I'm happy because I had a lot of
questions, but I think we did a pretty
:
01:10:59,268 --> 01:11:02,789
good job at tackling most of them.
:
01:11:03,029 --> 01:11:03,969
As usual,
:
01:11:04,070 --> 01:11:08,375
I put resources and a link to your website
in the show notes for those who want to
:
01:11:08,375 --> 01:11:09,316
dig deeper.
:
01:11:09,316 --> 01:11:12,580
Thank you again, Marylou, for taking the
time and being on this show.
:
01:11:13,301 --> 01:11:14,883
Thank you so much for having me.