#98 Fusing Statistical Physics, Machine Learning & Adaptive MCMC, with Marylou Gabrié
Episode 98 • 24th January 2024 • Learning Bayesian Statistics • Alexandre Andorra
00:00:00 01:05:06

Shownotes

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

How does the world of statistical physics intertwine with machine learning, and what groundbreaking insights can this fusion bring to the field of artificial intelligence?

In this episode, we delve into these intriguing questions with Marylou Gabrié, an assistant professor at CMAP, École Polytechnique in Paris. Having completed her PhD in physics at École Normale Supérieure, Marylou ventured to New York City for a joint postdoctoral appointment at New York University’s Center for Data Science and the Flatiron Institute’s Center for Computational Mathematics.

As you’ll hear, her research is not just about theoretical exploration; it also extends to the practical adaptation of machine learning techniques in scientific contexts, particularly where data is scarce.

In this conversation, we’ll traverse the landscape of Marylou's research, discussing her recent publications and her innovative approaches to machine learning challenges, latest MCMC advances, and ML-assisted scientific computing.

Beyond that, get ready to discover the person behind the science – her inspirations, aspirations, and maybe even what she does when not decoding the complexities of machine learning algorithms!

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie and Cory Kiser.

Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag ;)

Takeaways

  • Developing methods that leverage machine learning for scientific computing can provide valuable insights into high-dimensional probabilistic models.
  • Generative models can be used to speed up Markov Chain Monte Carlo (MCMC) methods and improve the efficiency of sampling from complex distributions.
  • The Adaptive Monte Carlo algorithm augmented with normalizing flows offers a powerful approach for sampling from multimodal distributions.
  • Scaling the algorithm to higher dimensions and handling discrete parameters are ongoing challenges in the field.
  • Open-source packages, such as flowMC, provide valuable tools for researchers and practitioners to adopt and contribute to the development of new algorithms.
  • The scaling of algorithms depends on the quantity of parameters and data. While some methods work well with a few hundred parameters, larger quantities can lead to difficulties.
  • Generative models, such as normalizing flows, offer benefits in the Bayesian context, including amortization and the ability to adjust the model with new data.
  • Machine learning and MCMC are complementary and should be used together rather than replacing one another.
  • Machine learning can assist scientific computing in the context of scarce data, where expensive experiments or numerics are required.
  • The future of MCMC lies in the exploration of sampling multimodal distributions and understanding resource limitations in scientific research.

Links from the show:

Transcript

This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.

How does the world of statistical physics intertwine with machine learning, and what groundbreaking insights can this fusion bring to the field of artificial intelligence?

In this episode, we'll delve into these intriguing questions with Marylou Gabrié. Having completed her doctorate in physics at École Normale Supérieure, Marylou ventured to New York City for a joint postdoctoral appointment at New York University's Center for Data Science and the Flatiron Institute's Center for Computational Mathematics.

As you'll hear, her research is not just about theoretical exploration; it also extends to the practical adaptation of machine learning techniques in scientific contexts, particularly where data are scarce.

In this conversation, we'll traverse the landscape of Marylou's research, discussing her recent publications and her innovative approaches to machine learning challenges. Beyond that, get ready to discover the person behind the science: her inspirations, aspirations, and maybe even what she does when she's not decoding the complexities of machine learning algorithms.

This is Learning Bayesian Statistics...

Let me show you how to be a good Bayesian, change your predictions.

Marylou Gabrié, welcome to Learning Bayesian Statistics.

Thank you very much, Alex, for having me.

Yes, thank you. And thank you to Virgile Andreani for putting us in contact. This is a French connection network here. So thanks a lot, Virgile. Thanks a lot, Marylou, for taking the time. I'm probably going to say Marylou the English way, because it flows better in my English; if I say it the French way and then continue in English, I'm going to have the French accent, which nobody wants to hear.

So let's start. I gave a bit of your background in the intro to this episode, Marylou, but can you define the work that you're doing nowadays and the topics that you are particularly interested in?

I would define my work as being focused on developing methods, and more precisely on developing methods that use and leverage all the progress in machine learning for scientific computing. I have a special focus within this realm, which is to study high-dimensional probabilistic models, because they really come up everywhere, and I think they give us a very particular lens on our world. So I would say I'm working broadly in this direction.

Well, that sounds like a lot of fun, so I understand why Virgile put me in contact with you. Could you start by telling us about your journey into the field of statistical physics, and how it led you to merge these interests with machine learning and what you're doing today?

Absolutely. My background is actually in physics, so I studied physics. Among the topics in physics, I quickly became interested in statistical mechanics. I don't know if all listeners would be familiar with statistical mechanics, but I would define it broadly as the study of complex systems with many interacting components. So it could be really anything. You could think of molecules, which are networks of interacting agents that have non-trivial interactions and non-trivial behaviors when put all together within one system. And I think it's a really important viewpoint on the world today, as I was saying, to look at those big macroscopic systems that you can study probabilistically. So I was quickly interested in this field of statistical mechanics.

At some point, machine learning entered the picture. The way it did is that I was looking for a PhD in... and some of my friends were students in computer science and kind of early comers to machine learning. So I started to know that it existed. I started to know that deep neural networks were actually revolutionizing the field, that you could expect a program to, I don't know, give names to people in pictures. And I thought, well, if this is possible, I really want to know how it works. I really want this technology not to sound like magic to me, and I want to know about it. This is how I started to become interested, and to find out that people knew how to make it work, but not how it worked, or why it worked so well.

And this is how, in the end, I was put into contact with Florent Krzakala, who was my PhD advisor. I started to have this angle of trying to use the statistical mechanics framework to study deep neural networks, which are precisely those complex systems I was just mentioning, and which are so big that we have trouble really making sense of what they are doing.

Yeah, I mean, that must be quite... Indeed, it must be quite challenging. We could already dive into that. That sounds like fun. Do you want to talk a bit more about that project?

Since then, I have really shifted my angle. I studied in this direction for, say, three or four years. Now I'm actually going back to applications to real-world systems, let's say, using all the potentialities of deep learning. So it's the same intersection, but looking at it from the other side: now really looking at applications and using machine learning as a tool, where before I was looking at machine learning as my object of study and using statistical mechanics. So I'm keen on talking about what I'm doing now.

Yeah. So basically you changed; now you're doing it the other way around, right? You're studying statistical physics with machine learning tools instead of doing the opposite. So what does that look like? What does that mean concretely? Maybe can you talk about an example from your own work, so that listeners can get a better idea?

Yeah, absolutely. As I was saying, statistical mechanics is really about large systems that we study probabilistically. And here there's a tool, which I would say is one of the most active directions of research in machine learning today: generative models. They are very natural because they are ways of making probabilistic models, but ones that you can control, that you can produce samples from with one command, whereas you need much more challenging algorithms if you want to do it in a general physical system. So we have those machines that we can leverage and that we can actually combine with our typical computational tools, such as Markov chain Monte Carlo algorithms, and that will allow us to speed up the algorithms. Of course, it requires some adaptation compared to what people usually do in machine learning and how those generative models were developed, but it's possible, and it's fascinating to try to make those adaptations.

Hmm. So, yeah, that's interesting, because if I understand correctly, you're saying that one of the aspects of your job is to understand how to use MCMC methods to speed up these models?

Actually, it's the other way around: it's how to use those models to speed up MCMC methods.

Okay. Can you talk about that? That sounds like fun.

Yeah, of course. MCMC algorithms, so Markov chain Monte Carlo algorithms, are really the go-to algorithms when you are faced with a probabilistic model that describes whichever system you care about. Say it might be a molecule, and this molecule has a bunch of atoms, so you know that you can describe your system, at least classically, at the level of giving the Cartesian coordinates of all the atoms in your system. Then you can describe the equilibrium properties of your system by using the energy function of this molecule. If you believe that you have an energy function for this molecule, then you believe that it is distributed as the exponential of minus beta times the energy. This is the Boltzmann distribution.

And then, okay, you are left with your probabilistic model. If you want to approach it, a priori you have no control over what this energy function is imposing as constraints. It may be very, very complicated. Well, the go-to algorithm is Markov chain Monte Carlo. It's a go-to algorithm that is "always going to work", and here I'm putting quotes around this, because it's going to be a greedy algorithm that looks for plausible configurations next to other plausible configurations. It makes a local search on the configuration space, tries to visit it, and the configurations it visits will then be representative of the thermodynamics.

Of course, it's not that easy. Although you can make such a local search, sometimes it's really not enough to fully describe the probabilistic model, in particular how different regions of your configuration space are related to one another. If I come back to my molecule example, it would be that I have two different, let's say, conformations of my molecule, two main templates that my molecule is going to look like. They may be divided by what we call an energy barrier, or in the language of probabilities, just low-probability regions in between high-probability regions. In this case, local MCMCs are going to fail. And this is where we believe that generative models could help us and, let's say, fill this gap to answer some very important questions.
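
To make the energy-barrier picture concrete, here is a minimal sketch in plain Python. It is an illustration, not code from the episode: the double-well energy `(x**2 - 1)**2`, the inverse temperature `beta = 20`, and the step size are all assumptions chosen so that the barrier is hard to cross. A random-walk Metropolis chain started in the left well samples the Boltzmann density `exp(-beta * E(x))` locally, but is not expected to reach the right well.

```python
import math
import random


def energy(x):
    """Toy double-well energy (assumed for illustration): minima at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2


def log_boltzmann(x, beta):
    """Log of the unnormalized Boltzmann density exp(-beta * E(x))."""
    return -beta * energy(x)


def random_walk_metropolis(x0, beta, n_steps, step_size, rng):
    """Local MCMC: Gaussian random-walk proposals with the Metropolis accept/reject rule."""
    x = x0
    chain = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        log_ratio = log_boltzmann(proposal, beta) - log_boltzmann(x, beta)
        # Accept with probability min(1, exp(log_ratio)).
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = proposal
        chain.append(x)
    return chain


chain = random_walk_metropolis(x0=-1.0, beta=20.0, n_steps=5000,
                               step_size=0.1, rng=random.Random(0))
fraction_right = sum(x > 0.0 for x in chain) / len(chain)
print(f"fraction of samples in the right-hand well: {fraction_right:.3f}")
```

By symmetry, the exact fraction of probability mass in the right-hand well is 0.5. With these settings, reaching the top of the barrier costs a factor of about exp(-20), so the local chain should stay in the well where it started and badly misestimate that fraction.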

And how would that work then? Would you run a first model that would help you infer that, and then use that in the MCMC algorithm? What does that look like?

I think your intuition is correct: you cannot do it in one go. For example, the paper that I published, I think it was last year, in PNAS, called "Adaptive Monte Carlo augmented with normalizing flows", implements precisely something where you have feedback loops. The idea is that the local Monte Carlo chains that you can run within the different regions you have identified as being interesting will help you seed the training of a generative model that is going to target generating configurations in those different regions. Once you have this generative model, you can include it in your Markov chain strategy: you can use it as a proposal mechanism to propose new locations for your MCMC to jump to. So you're creating a Monte Carlo chain that is going to slowly converge towards the target distribution you're really after. And you're going to do it by using the data you're producing to train a generative model that will help you produce better data as it's incorporated within the MCMC kernel you are actually jumping with. So you have this feedback mechanism that makes things work.

This idea of adaptivity really stems from the fact that in scientific computing, we are going to do machine learning with scarce data. We are not going to have all the data we wish we had to start with, but we are going to have this type of method where we do things, as we say, adaptively. So it's doing, recording information, and doing again, in a few words.
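
The feedback loop described above can be sketched as follows. This is a toy under stated assumptions, not the algorithm from the paper or the flowMC implementation: the target is an assumed one-dimensional double well, and a two-component Gaussian mixture refitted by moment matching (the hypothetical `MixtureProposal` class) stands in for a normalizing flow trained by gradient descent. What it keeps from the description is the structure: walkers seeded in both regions, local steps, retraining on the samples produced, and non-local proposals from the model corrected by a Metropolis-Hastings test.

```python
import math
import random

BETA = 20.0  # assumed inverse temperature of the toy target


def log_target(x):
    """Unnormalized log-density of a toy bimodal target (modes at -1 and +1)."""
    return -BETA * (x * x - 1.0) ** 2


def metropolis_accept(log_ratio, rng):
    """Return True with probability min(1, exp(log_ratio))."""
    return log_ratio >= 0.0 or rng.random() < math.exp(log_ratio)


class MixtureProposal:
    """Two-Gaussian mixture standing in for a normalizing flow: it can be sampled
    and has an exact density, which is all the accept/reject step needs."""

    def __init__(self):
        self.params = [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)]  # (weight, mean, std)

    def sample(self, rng):
        first, second = self.params
        _, mean, std = first if rng.random() < first[0] else second
        return rng.gauss(mean, std)

    def log_prob(self, x):
        density = sum(w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
                      for w, m, s in self.params)
        return math.log(density + 1e-300)  # tiny floor guards against log(0)

    def fit(self, samples):
        """'Training': refit each component to the samples on its side of x = 0."""
        sides = ([x for x in samples if x < 0.0], [x for x in samples if x >= 0.0])
        new_params = []
        for k, pts in enumerate(sides):
            _, mean, std = self.params[k]
            if len(pts) >= 2:
                mean = sum(pts) / len(pts)
                std = max(math.sqrt(sum((x - mean) ** 2 for x in pts) / len(pts)), 0.05)
            new_params.append(((len(pts) + 1.0) / (len(samples) + 2.0), mean, std))
        self.params = new_params


def adaptive_sampler(n_walkers=10, n_rounds=200, n_local=10, step_size=0.1, rng=None):
    if rng is None:
        rng = random.Random(0)
    # Seed walkers in both regions already identified as interesting.
    walkers = [-1.0 if i % 2 == 0 else 1.0 for i in range(n_walkers)]
    model = MixtureProposal()
    history, first_walker = [], []
    accepted = 0
    for _ in range(n_rounds):
        # 1) Local exploration with random-walk Metropolis.
        for _ in range(n_local):
            for i, x in enumerate(walkers):
                y = x + rng.gauss(0.0, step_size)
                if metropolis_accept(log_target(y) - log_target(x), rng):
                    walkers[i] = y
            history.extend(walkers)
            first_walker.append(walkers[0])
        # 2) Retrain the generative model on recent samples from the chains.
        model.fit(history[-500:])
        # 3) Non-local moves: propose from the model, correct with Metropolis-Hastings.
        for i, x in enumerate(walkers):
            y = model.sample(rng)
            log_ratio = (log_target(y) - log_target(x)) + (model.log_prob(x) - model.log_prob(y))
            if metropolis_accept(log_ratio, rng):
                walkers[i] = y
                accepted += 1
        history.extend(walkers)
        first_walker.append(walkers[0])
    return first_walker, accepted / (n_rounds * n_walkers)


trajectory, acceptance = adaptive_sampler()
print(f"non-local acceptance rate: {acceptance:.2f}")
print(f"fraction of time walker 0 spends at x > 0: "
      f"{sum(x > 0.0 for x in trajectory) / len(trajectory):.2f}")
```

Because the proposal keeps changing with the chain's own history, a careful implementation has to think about the validity of the adaptation; the sketch ignores that point and only shows how the pieces fit together.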

Yeah. So, I mean, if I understand correctly, it's a way of going one step further than what HMC is already doing, where we're looking at the gradients and trying to adapt based on that. Now, basically, the idea is to find some way of getting even more information as to where the next sample from the typical set should come from, and then being able to navigate the typical set more efficiently?

Yes. Let's say that it's an algorithm that is more ambitious than HMC. Of course, there are caveats. HMC is trying to follow a dynamics to travel towards interesting regions, but it has to be tuned quite finely in order to actually end up in the next interesting region, provided that it started from one. So, to cross those energy barriers, here with machine learning we would really be jumping over them. We would have models that pretty much only target the interesting regions and just don't care about what's in between. That really focuses the effort where you believe it matters. However, there are cases in which those machine learning models will have trouble scaling, where HMC would be more robust. So there is of course always a trade-off in the algorithms that you are using: how efficient they can be per MCMC step, and how general you can expect them to be.

Hmm. I see. Yeah. And actually, one of my questions would be: when do you think this kind of new algorithm would be interesting to use instead of the classic HMC? In which cases would you say people should give it a try, instead of using the classic, robust HMC method we have right now?

So that's an excellent question. I think right now, on paper, the algorithm we propose is really, really powerful, because it will allow you to jump throughout your space and so to decorrelate your MCMC configurations extremely fast. However, for this to happen, you need the proposal that is made by your deep generative model as a new location, I mean a new configuration in your MCMC chain, to be accepted. So in the end, you no longer have the fact that you are jumping locally and that your decorrelation comes from making lots of local jumps. Here you could decorrelate in one step, but you need to accept. So the acceptance will really be what you need to care about in running the algorithm. And what is going to determine whether or not your acceptance is high is the agreement between your deep generative model and the target distribution you're after.
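
For reference, when a generative model is used as an independence proposal, the standard Metropolis-Hastings acceptance probability takes the form below. The symbols are notation introduced here, not taken from the episode: \pi is the (possibly unnormalized) target density and \rho is the density of the generative model.

```latex
\alpha(x \to x') \;=\; \min\!\left(1,\; \frac{\pi(x')\,\rho(x)}{\pi(x)\,\rho(x')}\right),
\qquad x' \sim \rho .
```

If \rho matched \pi exactly, the ratio would equal 1 and every proposal would be accepted; the worse the match, the more proposals are rejected.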

We have traditional challenges here in making the generative model look exactly like the target we want. There are issues with scalability, and there are issues with, I would say, constraints. Let's say you're interested in Bayesian inference, another case where we can apply this kind of algorithm, right? Because you have a posterior and you just want to sample from your posterior to make sense... 10, 100. I tell you, I know how to train normalizing flows, which are the specific type of generative model we are using here, in 10 or 100 dimensions. So if you believe that your posterior is multimodal, that it will be hard for traditional algorithms to visit the entire landscape and equilibrate because there are some low-density regions in between high-density regions, go for it.

If you are actually an astronomer, and you want to marginalize over your initial conditions on a grid that represents the universe, and the posterior distribution you're interested in is on variables that are in millions of dimensions, I'm sorry: we're not going to do it with you, and you should use something that is more general, something that will use a local search but that is actually going to be, you know, imperfect, right? Because it's going to be very, very hard for this algorithm to work too. But the magic of machine learning will not scale yet to this type of dimension.
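
A small experiment can illustrate why agreement between model and target gets harder to achieve as the dimension grows. The sketch below is an assumed toy setup, not from the episode: the target is a standard Gaussian in `dim` dimensions, and a Gaussian whose standard deviation is 20% too large (`scale = 1.2`) stands in for an imperfectly trained generative model used as an independence proposal. The per-coordinate mismatch is the same in every case, yet it costs more acceptance as `dim` increases.

```python
import math
import random


def log_target(x):
    """Unnormalized log-density of a standard Gaussian in len(x) dimensions."""
    return -0.5 * sum(v * v for v in x)


def log_proposal(x, scale):
    """Log-density (up to a shared constant) of an isotropic Gaussian with std `scale`."""
    return -0.5 * sum(v * v for v in x) / scale ** 2 - len(x) * math.log(scale)


def acceptance_rate(dim, scale, n_steps, rng):
    """Run an independence Metropolis-Hastings chain and return its acceptance rate."""
    x = [rng.gauss(0.0, scale) for _ in range(dim)]
    accepted = 0
    for _ in range(n_steps):
        y = [rng.gauss(0.0, scale) for _ in range(dim)]
        # Independence proposal: the ratio involves target and proposal at both points.
        log_ratio = (log_target(y) - log_target(x)) + (log_proposal(x, scale) - log_proposal(y, scale))
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = y
            accepted += 1
    return accepted / n_steps


for dim in (1, 10, 100):
    rate = acceptance_rate(dim, scale=1.2, n_steps=3000, rng=random.Random(0))
    print(f"dim = {dim:3d}  acceptance rate = {rate:.2f}")
```

The printed rates should fall as the dimension rises, which is one way to read the remark that these methods are comfortable in 10 or 100 dimensions but not in millions.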

Yeah, I see. And is that an avenue you're actively researching, basically how to scale these algorithms better to bigger scales?

Yeah, of course. Of course we can always try to do better. As far as I'm concerned, I'm also very interested in sampling physical systems. And in physical systems, there is a lot of prior information that you have on the system. You have symmetries, you have, I don't know, physical rules that you know the system has to fulfill. Or maybe some multi-scale property of the probability distribution: you know that there are some self-similarities. You have information that you can try to exploit in two ways. Either in the sampling part: since you have this MCMC coupled with the deep generative model, in the way you make proposals you can try to symmetrize them, you can try to exploit the symmetry by any means. Or you can also put it directly in the generative model. Those are things that are really crucial. And we understand very well nowadays that it's naive to think you will learn it all. You should really use as much information on your system as you can. After that, you can go one step further with machine learning. But in non-trivial systems, it would be, I mean, deceiving to believe that you could just learn things.
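
As one hedged illustration of exploiting a symmetry on the sampling side: if the energy is known to satisfy E(x) = E(-x), a proposal that mirrors the current state, x to -x, is its own inverse and is always accepted under the Metropolis rule. The toy below mixes such flips with local random-walk moves. The double-well target, the 10% flip probability, and the function name `sample_with_flips` are assumptions made for this sketch, not details from the episode.

```python
import math
import random


def log_target(x, beta=20.0):
    """Toy symmetric bimodal log-density: log_target(x) == log_target(-x)."""
    return -beta * (x * x - 1.0) ** 2


def sample_with_flips(n_steps, flip_prob, step_size, rng):
    """Metropolis chain mixing local random-walk moves with symmetry (mirror) moves."""
    x = -1.0
    chain = []
    for _ in range(n_steps):
        if rng.random() < flip_prob:
            proposal = -x  # symmetry move: deterministic and self-inverse
        else:
            proposal = x + rng.gauss(0.0, step_size)  # local random-walk move
        # Both move types are symmetric proposals, so the plain Metropolis ratio applies.
        log_ratio = log_target(proposal) - log_target(x)
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = proposal
        chain.append(x)
    return chain


chain = sample_with_flips(n_steps=5000, flip_prob=0.1, step_size=0.1, rng=random.Random(0))
print(f"fraction of samples at x > 0: {sum(x > 0.0 for x in chain) / len(chain):.2f}")
```

Here nothing has to be learned to connect the two wells: the known symmetry does the work, which is the kind of prior information the passage above recommends using before turning to a learned model.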

Yeah. I mean, I completely resonate with that. It's definitely something we always tell students or clients: don't just throw everything you can at the model and pray that the model works like that. Actually, you should probably use a generative perspective to try and find out what the best way of thinking about the problem is, what would be the good-enough, simple-enough model that you can come up with, and then try to run that. So I think that definitely resonates with a lot of the audience: think generatively. And what I understand from what you said is also to try to put as much knowledge and information as you have into your generative model. The deep neural network is here, the normalizing flow is here to help, but it's not going to be a magical solution to a suboptimally specified model.

Yes, yes. Of course, in all those problems, what's hidden behind is the curse of dimensionality. If we are trying to learn something in very high dimension, it could be arbitrarily hard. It could be that you cannot learn something in high dimension just because you would need to observe every location in this high-dimensional space to get the information. Of course, this is in general not the case, because what we are trying to learn has some underlying structure that is actually described by fewer dimensions, and you actually need fewer observations to learn it. But the question is: how do you find those structures, and how do you put them in? Therefore, we need to take into account as much of the knowledge we have on the system as possible to make this learning as efficient as possible.

Yeah, yeah, yeah. I mean, that's super interesting. And that's your paper, "Adaptive Monte Carlo augmented with normalizing flows", right?

So this is the paper where we did this generally. I don't yet have a paper out where we really try to put the structure in the generative models, but that's the direction I'm actively...

Okay, yeah. So for sure, we'll put the paper I just cited in the show notes for people who want to dig deeper. And if, by the time this episode is out, you have the paper or a preprint, feel free to add it to the show notes, or just tell me and I'll add it. That sounds really interesting for people to read. And so I'm curious about this idea of using normalizing flows, a deep neural network, to help MCMC sample faster and converge faster to the typical set. What was the main objective of doing that? I'm curious why you even started thinking about and working on that.

So yes, I think for me the answer is really this question of multimodality: the fact that you may be interested in a probability distribution for which it's very hard to connect the different interesting regions. In statistical mechanics, it's something that we actually call metastability. I don't know if it's a word you've already heard, but where some communities talk about multimodality, we talk about metastability. And metastability is at the heart of many interesting phenomena in physics, such as phase transitions. Therefore, it's something very challenging in computations, but at the same time something it is crucial to understand. So for us, it felt like there was this big opportunity with those probabilistic models that were so malleable. Of course they are hard to train, but then they give you so much. They give you an exact value for the density that they encode, plus the possibility of sampling from them very easily, getting a bunch of i.i.d. samples in just one run through a neural network. So for us, there was really this opportunity of studying multimodal distributions, in particular metastable systems from statistical mechanics, with those tools.
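
The two properties mentioned above, an exact density and cheap independent samples, follow from the change-of-variables construction behind normalizing flows. The sketch below shows it for the simplest possible case, a one-dimensional affine map of a standard Gaussian. It is an assumed minimal example: real flows stack many learned invertible layers, and the class name `AffineFlow` and its parameters are hypothetical, chosen for this illustration only.

```python
import math
import random


class AffineFlow:
    """One-dimensional affine flow x = shift + exp(log_scale) * z with z ~ N(0, 1)."""

    def __init__(self, shift, log_scale):
        self.shift = shift
        self.log_scale = log_scale

    def forward(self, z):
        """Map a base sample z to a model sample x."""
        return self.shift + math.exp(self.log_scale) * z

    def inverse(self, x):
        """Map x back to the base variable z."""
        return (x - self.shift) * math.exp(-self.log_scale)

    def sample(self, n, rng):
        """Draw n independent samples: one pass of base noise through the map."""
        return [self.forward(rng.gauss(0.0, 1.0)) for _ in range(n)]

    def log_prob(self, x):
        """Exact log-density via change of variables: log N(z; 0, 1) + log|dz/dx|."""
        z = self.inverse(x)
        log_base = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
        return log_base - self.log_scale  # log|dz/dx| = -log_scale


flow = AffineFlow(shift=2.0, log_scale=math.log(0.5))
samples = flow.sample(5, random.Random(0))
print([round(flow.log_prob(x), 3) for x in samples])
```

Having both `sample` and `log_prob` is what allows such a model to serve as an MCMC proposal: the sample provides the jump, and the exact density enters the accept/reject ratio.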

Yeah. Okay. So in theory, these normalizing flows are especially helpful for handling multimodal posteriors. I didn't get that at first, so that's interesting.

Yep. What they're really going to offer you is the possibility of making large jumps, of making jumps within your Markov chain that can go from one location of high density to another one in just one step. So this is what you are really interested in: first of all, in one step, so you're going far in one step; and second of all, regardless of how low the density is between them. If you were to run some other type of local MCMC, you would, in a sense, need to find a path between the two modes in order to visit both of them. In our case, that's not true. You're just jumping completely out of the blue, thanks to your normalizing flow, which is trying to mimic your target distribution, which has therefore developed mass everywhere you believe it matters, and from which you can very easily produce an i.i.d. sample anywhere on its support.

I see, yeah. And I'm guessing you did some benchmarks for the paper?

So I think that's actually a very interesting question you're asking, because I feel benchmarks are extremely difficult, both in MCMC and in deep learning. I mean, you can make benchmarks that say, okay, I changed the architecture and I see that I'm getting something different. But otherwise, I think it's one of the big challenges that we have today. If I tell you, okay, with my algorithm I can write an MCMC that is going to mix between the different modes, between the different metastable states, that's something that I don't know how to do by any other means. So the benchmark is: actually, you won. There is nothing to compare with, so that's fine.

But if I need to compare on other cases, where I can actually find other algorithms that will work, but where I know that they are probably going to take more iterations, then I still need to factor a lot of things into my true, honest benchmark. I need to factor in the fact that I ran a lot of experiments to choose the architecture of my normalizing flow. I ran a lot of experiments to choose the hyperparameters of my training, and so on and so forth. And I don't see how we can make those honest benchmarks nowadays. I can make one, but I don't think I would think very highly of it as really revealing some profound truth about which solution is really working. The only way of making an honest benchmark would be to take different teams, give them problems, lock them in a room, and see who comes out first with the solution. But I mean, how can we do that?

Well, we can call on listeners who are interested in doing the experiment to contact us. That would be the first thing. But yeah, that's actually a very good point. And in a way, that's a bit frustrating, right? Because it means that, at least experimentally, it's hard to differentiate between the efficiency of the different algorithms. So I'm guessing the claims that you make about this new algorithm being more efficient for multimodality are based on the theoretical underpinning of the algorithm?

433

:

No, I mean, it's just based on the fact

that I don't know of any other algorithm,

434

:

which under the same premises, which can

do that.

435

:

So, I mean, it's an easy way out of making

any benchmark, but also a powerful one

436

:

because I really don't know who to compare

to.

437

:

But indeed, I think then it's...

438

:

As far as I'm concerned, I'm mostly

interested in developing methodologies.

439

:

I mean, that's just what I like to do.

440

:

But of course, what's important is that

those methods are going to work and they

441

:

are going to be useful to some communities

that really have research questions that

442

:

they want to answer.

443

:

I mean, research or not actually could be

engineering questions, decisions to be

444

:

taken that require to do an MCMC.

445

:

And I think the true tests of

446

:

whether or not the algorithm is useful is

going to be this, the test of time.

447

:

Are people adopting the algorithms?

448

:

Are they seeing that this is really

something that they can use and that would

449

:

make their inference work where they could

not find another method that was as

450

:

efficient?

451

:

And in this direction, there is the

close collaborator, Kaze Wong, who is

452

:

working at the Flatiron Institute and with

whom we developed a package that is called

453

:

FlowMC.

454

:

that is written in JAX and that implements

these algorithms.

455

:

And the idea was really to try to write a

package that was as user-friendly as

456

:

possible.

457

:

So of course we have the time we have to

take care of it and the experience we have

458

:

as researchers, you know, in writing software

as we have, but we really tried hard.

459

:

And at least in this community of people

studying gravitational waves, it seems

460

:

that people are really trying, starting to

use this in their research.

461

:

And so I'm excited, and I think it is

useful.

462

:

But it's not the proper benchmark you

would dream of.

463

:

Yeah, you just stole one of my questions.

464

:

Basically, I was exactly going to ask you,

but then how can people try these?

465

:

Is there a package somewhere?

466

:

So yeah, perfect.

467

:

That's called FlowMC, you told me.

468

:

Yes, it's called FlowMC.

469

:

You can pip install FlowMC, and you will

have it.

470

:

If you are allergic to JAX...

471

:

Right, I have it here.

472

:

Yeah, there is a Read the Docs.

473

:

So I'll put that in the show notes for

sure.

474

:

Yes, we have even documentation.

475

:

That's how far you go when you are

committed to having something that is used

476

:

and useful.

477

:

So I mean, of course, we are also open to

both comments and contributions.

478

:

So just write to us if you're interested.

479

:

Yeah, for sure.

480

:

Yeah, that folks, if you are interested in

contributing, if you see any bugs, make

481

:

sure to open some issues on the GitHub

repo or even better, contribute pull

482

:

requests.

483

:

I'm sure Marylou and the co-authors

will be very happy about that.

484

:

Yes, you know typos in the documentation,

all of this.

485

:

Yeah, exactly.

486

:

That's what I...

487

:

I tell everyone also who wants to start

doing some open source package, start with

488

:

the smallest PRs.

489

:

You don't have to write a new algorithm,

like already fixing typos, making the

490

:

documentation look better, and stuff like

that.

491

:

That's extremely valuable, and that will

be appreciated.

492

:

So for sure, do that, folks.

493

:

Do not be shy with that kind of stuff.

494

:

So yeah, I put already the paper, you have

out on arXiv on adaptive Monte Carlo, and

495

:

FlowMC, I put that in the show notes.

496

:

And yeah, to get back to what you were

saying, basically, I think as more of a

497

:

practitioner than a person who developed

the algorithms, I would say the reasons I

498

:

would...

499

:

you know, adopt that kind of new

algorithms would be that, well, I know,

500

:

okay, that algorithm is specialized,

especially for handling multimodal,

501

:

multimodal posteriors.

502

:

So then I'd be, if I have a problem like

that, I'll be like, oh, okay, yeah, I can

503

:

use that.

504

:

And then also ease of adoption.

505

:

So is there an open source package in

which languages that can I just, you know,

506

:

What kind of trade-off basically do I have

to make?

507

:

Is that something that's easy to adopt?

508

:

Is that something that's really a lot of

barriers to adoptions?

509

:

But at the same time, it really seems to

be solving my problem.

510

:

You know what I'm saying?

511

:

It's like, indeed, it's not only the

technical and theoretical aspects of the

512

:

method, but also how easy it is to...

513

:

adopt in your existing workflows.

514

:

Yes.

515

:

And for this, I guess it's, I mean, the

feedback is extremely valuable because

516

:

when you know the methods, you're really,

it's hard to exactly locate where people

517

:

will not understand what you meant.

518

:

And so it's really welcome.

519

:

No, for sure.

520

:

And already I find that absolutely

incredible that now

521

:

Almost all new algorithms, at least that I

talk about on the podcast and that I see

522

:

in the community, in the PyMC community,

almost all of them now, when they come up

523

:

with a paper, they come out with an open

source package that's usually installable

524

:

in a Python, in the Python ecosystem.

525

:

Which is really incredible.

526

:

I remember that when I started on these a

few years ago, it was really not the norm

527

:

and much more the exception and now almost

528

:

the accompanying open source package is

almost part of the paper, which is really

529

:

good because way more people are going to

use the package than read the paper.

530

:

So, this is absolutely a fantastic

evolution.

531

:

And thank you in the name of us all to

have taken the time to develop the

532

:

package, clean up the code, put that on

PyPI and making the documentation because

533

:

That's where the academic incentives are a

bit misaligned with what I think they

534

:

should be.

535

:

Because unfortunately, literally it takes

time for you to do that.

536

:

And it's not very much appreciated by the

academic community, right?

537

:

It's just like, you have to do it, but

they don't really care.

538

:

We care as the practitioners, but the

academic world doesn't really.

539

:

And what counts is the paper.

540

:

So for now, unfortunately, it's really

just time that you take.

541

:

out of your paper writing time.

542

:

So I'm sure everybody appreciates it.

543

:

Yes, but I don't know.

544

:

I see true value to it.

545

:

And I think, although it's maybe not as

rewarded as it should, I think many of us

546

:

see value in doing it.

547

:

So you're very welcome.

548

:

Yeah, yeah.

549

:

No, for sure.

550

:

Lots of value in it.

551

:

Just saying that value should be more

recognized.

552

:

Just a random question, but something I'm

always curious about.

553

:

I think I know the answer, but I still want

to ask.

554

:

Can you sample discrete parameters

with these algorithms?

555

:

Because that's one of the grails of the

field right now.

556

:

How do you sample discrete parameters?

557

:

So, okay, the pack, so what I've

implemented, tested, is all on continuous

558

:

space.

559

:

But, but what I need for this algorithm to

work is a generative model of which I can

560

:

sample from easily.

561

:

IID, I mean, not that I have to run a Monte

Carlo to sample from it, but that I can

562

:

just, in one Python command, or a command in whichever

language you want, get an IID

563

:

sample from.

564

:

and that I can write what is the

likelihood of this sample.

565

:

Because a lot of generative models

actually don't have tractable likelihoods.

566

:

So if you think, I don't know, of

generative adversarial networks or

567

:

variational autoencoders for people who

might be familiar with those very, very

568

:

common generative models, they don't have

this property.

569

:

You can generate samples easily, but you

cannot write down with which density of

570

:

probability you've generated this sample.

571

:

This is really what we need in order to

use this generative model inside a Markov

572

:

chain and inside an algorithm that we know

is going to converge towards the target

573

:

distribution.

574

:

So normalizing flows are playing this role

for us with continuous variables.

575

:

They give us easy sampling and easy

evaluation of the likelihood.
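
A minimal sketch of why both properties matter, assuming a toy one-dimensional bimodal target and a wide Gaussian standing in for a trained normalizing flow. The names `GaussianProposal`, `log_target` and `independence_mh` are hypothetical and are not part of FlowMC. The Metropolis-Hastings ratio below needs one IID draw from the proposal and the exact log-density of both the current and the proposed point, which is the part that GANs and VAEs do not offer.

```python
import math
import random


def log_target(x):
    """Unnormalized log-density of a toy bimodal target: N(-3, 1) and N(3, 1) mixed equally."""
    a = -0.5 * (x + 3.0) ** 2
    b = -0.5 * (x - 3.0) ** 2
    top = max(a, b)
    return top + math.log(math.exp(a - top) + math.exp(b - top))


class GaussianProposal:
    """Hypothetical stand-in for a trained flow: it only needs sample() and log_prob()."""

    def __init__(self, mean=0.0, std=4.0):
        self.mean, self.std = mean, std

    def sample(self, rng):
        # One call gives one independent (IID) draw.
        return rng.gauss(self.mean, self.std)

    def log_prob(self, x):
        # Exact log-density of a point under the proposal.
        z = (x - self.mean) / self.std
        return -0.5 * z * z - math.log(self.std * math.sqrt(2.0 * math.pi))


def independence_mh(n_steps, proposal, seed=0):
    """Metropolis-Hastings where every proposal is a fresh draw from `proposal`."""
    rng = random.Random(seed)
    x = proposal.sample(rng)
    chain, accepted = [], 0
    for _ in range(n_steps):
        y = proposal.sample(rng)
        # Both densities enter the ratio: target(y) q(x) / (target(x) q(y)).
        log_alpha = (log_target(y) - log_target(x)) + (proposal.log_prob(x) - proposal.log_prob(y))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x, accepted = y, accepted + 1
        chain.append(x)
    return chain, accepted / n_steps


chain, acceptance_rate = independence_mh(5000, GaussianProposal())
fraction_right = sum(1 for s in chain if s > 0) / len(chain)
print(f"acceptance rate {acceptance_rate:.2f}, time in right-hand mode {fraction_right:.2f}")
```

Because the proposal density appears explicitly in the acceptance ratio, the chain still targets the right distribution even though the proposal is only a rough approximation of it.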

576

:

But you also have equivalents on discrete

distributions.

577

:

And if you want...

578

:

generative model that would have those two

properties on discrete distribution, you

579

:

should turn yourself to autoregressive

models.

580

:

So I don't know if you've learned about

them, but the idea is just that they use a

581

:

factorization of probability distributions

that is just with conditional

582

:

distributions.

583

:

And that's something that is in theory has

full expressivity, that any distribution

584

:

can be written as a factorized

distribution where you are progressively conditioning

585

:

on the degrees of freedom that you have

already sampled.

586

:

And you can rewrite the algorithm,

training an autoregressive model in the

587

:

place of a normalizing flow.
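
A small sketch of such an autoregressive model over binary spins, assuming a simple logistic form for each conditional. The class `AutoregressiveSpins` and its weights are hypothetical, chosen only for illustration. It shows the two properties discussed: one pass gives an IID sample, and the product of conditionals gives its exact probability.

```python
import itertools
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class AutoregressiveSpins:
    """p(s) = prod_i p(s_i | s_1..s_{i-1}) over spins s_i in {-1, +1}."""

    def __init__(self, weights, biases):
        self.weights = weights  # weights[i][j] is only used for j < i
        self.biases = biases

    def prob_up(self, i, previous):
        """Conditional probability that spin i is +1 given the spins before it."""
        z = self.biases[i] + sum(self.weights[i][j] * previous[j] for j in range(i))
        return sigmoid(z)

    def sample(self, rng):
        """Draw one IID configuration spin by spin, with its exact log-probability."""
        spins, log_p = [], 0.0
        for i in range(len(self.biases)):
            p = self.prob_up(i, spins)
            s = 1 if rng.random() < p else -1
            log_p += math.log(p if s == 1 else 1.0 - p)
            spins.append(s)
        return spins, log_p

    def log_prob(self, spins):
        """Exact log-probability of a given configuration."""
        total = 0.0
        for i, s in enumerate(spins):
            p = self.prob_up(i, spins)
            total += math.log(p if s == 1 else 1.0 - p)
        return total


# Illustrative parameters: each spin is coupled to the one just before it.
n = 4
weights = [[0.0] * n for _ in range(n)]
for i in range(1, n):
    weights[i][i - 1] = 0.8
model = AutoregressiveSpins(weights, biases=[0.0, 0.1, -0.2, 0.3])

spins, log_p = model.sample(random.Random(1))
total_probability = sum(
    math.exp(model.log_prob(list(config)))
    for config in itertools.product((-1, 1), repeat=n)
)
print(spins, log_p, total_probability)
```

Summing the probabilities of all 16 configurations is a quick check that the factorization defines a properly normalized distribution, which is what makes it usable as a proposal inside a Markov chain.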

588

:

So honest answer, I haven't tried, but it

can be done.

589

:

Well, it can be done.

590

:

And now that I'm thinking about it, people

have done it because in statistical

591

:

mechanics, there are a lot of systems that

we like.

592

:

a lot of our toy systems that are binary.

593

:

So that's, for example, the Ising model,

which is a model of spins that are just

594

:

binary variables.
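
For readers who have not met it, here is a minimal sketch of an Ising model, assuming the simplest case of a one-dimensional periodic chain with no external field, together with the local single-spin-flip Metropolis update that traditional samplers rely on. The function names are hypothetical.

```python
import math
import random


def ising_energy(spins, coupling=1.0):
    """Energy of a periodic 1D Ising chain: E = -J * sum_i s_i * s_{i+1}."""
    n = len(spins)
    return -coupling * sum(spins[i] * spins[(i + 1) % n] for i in range(n))


def flip_energy_change(spins, i, coupling=1.0):
    """Energy change caused by flipping spin i, computed from its two neighbours only."""
    n = len(spins)
    return 2.0 * coupling * spins[i] * (spins[i - 1] + spins[(i + 1) % n])


def metropolis_sweep(spins, temperature, rng, coupling=1.0):
    """One sweep of local single-spin-flip Metropolis updates, done in place."""
    for _ in range(len(spins)):
        i = rng.randrange(len(spins))
        delta = flip_energy_change(spins, i, coupling)
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            spins[i] = -spins[i]
    return spins


rng = random.Random(0)
spins = [rng.choice((-1, 1)) for _ in range(16)]
for _ in range(200):
    metropolis_sweep(spins, temperature=0.5, rng=rng)
print(spins, ising_energy(spins))
```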

595

:

And I know of at least one paper where

they are doing something of this sort.

596

:

So making jumps, they're actually not

trying to refresh full configurations, or

597

:

they are doing the two, both refreshing full

configurations and partial configurations.

598

:

And they are doing...

599

:

something that, in essence, is exactly

this algorithm, but with discrete

600

:

variables.

601

:

So I'll happily add the reference to this

paper, which is, I think, it's by the

602

:

group of Giuseppe Carleo from EPFL.

603

:

And OK, I haven't, I don't think they

train exactly like, so it's not exactly

604

:

the same algorithm, but things around this

have been tested.

605

:

OK, well, it sounds like a.

606

:

Sounds like fun, for sure.

607

:

Definitely something I'm sure lots of

people would like to test.

608

:

So folks, if you have some discrete

parameters somewhere in your models, maybe

609

:

you'll be interested by normalizing flows.

610

:

So the FlowMC package is in the show

notes.

611

:

Feel free to try it out.

612

:

Another thing I'm curious about is how do

you run the neural network, actually?

613

:

And how much of a bottleneck is it on the

sampling time, if any?

614

:

Yes.

615

:

So it will definitely depend on the space.

616

:

No, let me rephrase.

617

:

The thing is, whether or not it's going to

be worth it to train a neural network in

618

:

order to help you sampling.

619

:

depends on how difficult it is for you to

sample in, I mean, with the more

620

:

traditional MCMCs that you have on your

hand.

621

:

So again, if you have a multimodal

distribution, it's very likely that your

622

:

traditional MCMC algorithms are just not

going to cut it.

623

:

And so then, I mean, if you really care

about sampling this posterior distribution

624

:

or this distribution of configurations of

a physical system,

625

:

then you will be willing to pay the price

on this sampling.

626

:

So instead of, say, having to use a local

sampler that will take you billions of

627

:

iterations in order to see transitions

between the modes, you can train a

628

:

normalizing flow, or an autoregressive

model if you're discrete, and then have

629

:

those jumps happening every other time.

630

:

Then it's more than clear that it's worth

doing it.
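
The contrast can be sketched on a toy problem, assuming a one-dimensional target with two well-separated modes and a slightly too wide two-component Gaussian mixture standing in for the trained flow. All names here (`run_chain`, `flow_sample`, `flow_log_prob`) are hypothetical and this is not the FlowMC implementation. One chain uses only local random-walk moves, the other replaces every other move with a global jump proposed by the stand-in flow.

```python
import math
import random

MODES = (-6.0, 6.0)


def log_mixture(x, std):
    """Log-density of an equal mixture of Gaussians centred on MODES with the given std."""
    terms = [
        math.log(0.5) - 0.5 * ((x - m) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))
        for m in MODES
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def log_target(x):
    return log_mixture(x, 1.0)


# Hypothetical stand-in for a trained flow: right modes, slightly too wide.
FLOW_STD = 1.5


def flow_sample(rng):
    return rng.gauss(rng.choice(MODES), FLOW_STD)


def flow_log_prob(x):
    return log_mixture(x, FLOW_STD)


def accept(log_alpha, rng):
    return rng.random() < math.exp(min(0.0, log_alpha))


def run_chain(n_steps, use_global_jumps, seed=0):
    rng = random.Random(seed)
    x = MODES[0]  # start in the left-hand mode
    chain = []
    for step in range(n_steps):
        if use_global_jumps and step % 2 == 1:
            # Global move: fresh draw from the flow, corrected by its density.
            y = flow_sample(rng)
            log_alpha = log_target(y) - log_target(x) + flow_log_prob(x) - flow_log_prob(y)
        else:
            # Local move: symmetric random-walk step.
            y = x + rng.gauss(0.0, 0.5)
            log_alpha = log_target(y) - log_target(x)
        if accept(log_alpha, rng):
            x = y
        chain.append(x)
    return chain


local_only = run_chain(4000, use_global_jumps=False)
with_jumps = run_chain(4000, use_global_jumps=True)
print("local only, fraction of time in right mode:", sum(s > 0 for s in local_only) / 4000)
print("with jumps, fraction of time in right mode:", sum(s > 0 for s in with_jumps) / 4000)
```

In this toy setting the local chain stays in the mode where it started, while the chain with jumps splits its time between the two modes.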

631

:

OK, yeah, so the answer is it depends

quite a lot.

632

:

Of course, of course.

633

:

Yeah, yeah.

634

:

And I guess, how does it scale with the

quantity of parameters and quantity of

635

:

data?

636

:

So quantity of parameters, it's really

this dimension I was already discussing a

637

:

bit about and telling you that there is a

cap on what you can really expect these

638

:

methods will work on.

639

:

I would say that if the quantity of

parameters is something like tens or

640

:

hundreds, then things are going to work

well, more or less out of the box.

641

:

But if it's larger than this, you will

likely run into trouble.

642

:

And then the number of data is actually

something I'm less familiar with because

643

:

I'm less from the Bayesian communities

than the stat-mech community to start

644

:

with.

645

:

So my distributions don't have data

embedded in them, in a sense, most of the

646

:

time.

647

:

But for sure, what people argue, why it's

a really good idea to use generative

648

:

models such as normalizing flows to sample

in the Bayesian context.

649

:

is the fact that you have an amortization

going on.

650

:

And what do I mean by that?

651

:

I mean that you're learning a model.

652

:

Once it's learned, it's going to be easy

to adjust it if things are changing a

653

:

little.

654

:

And with little adjustments, you're going

to be able to sample still a very

655

:

complicated distribution.

656

:

So say you have data that is arriving

online, and you keep on having new samples

657

:

to be added to your posterior

distribution.

658

:

then it's very easy to just adjust the

normalizing flow with a few training

659

:

iterations to get back to the new

posterior you actually have now, given

660

:

that you have this amount of data.

661

:

So this is what some people call

amortization, the fact that you can really

662

:

encapsulate in your model all the

knowledge you have so far, and then just

663

:

adjust it a bit, and don't have to start

from scratch, as you would have to in

664

:

other.

665

:

Monte Carlo methods.
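
A toy illustration of that warm-start idea, under strong simplifying assumptions: the "model" is a single Gaussian rather than a normalizing flow, the posterior is the exact conjugate posterior for a Gaussian mean, and training is plain gradient descent on a closed-form KL divergence. The names `gaussian_posterior`, `kl_q_to_p` and `fit` are hypothetical, and the learning rate is tuned for this example only.

```python
import math
import random


def gaussian_posterior(data, noise_var=1.0, prior_var=100.0):
    """Exact posterior N(mean, var) for the mean of Gaussian data under a N(0, prior_var) prior."""
    precision = 1.0 / prior_var + len(data) / noise_var
    mean = (sum(data) / noise_var) / precision
    return mean, 1.0 / precision


def kl_q_to_p(m, r, p_mean, p_var):
    """KL(q || p) for q = N(m, exp(r)^2) and p = N(p_mean, p_var)."""
    q_var = math.exp(2.0 * r)
    return 0.5 * (math.log(p_var / q_var) + (q_var + (m - p_mean) ** 2) / p_var - 1.0)


def fit(m, r, p_mean, p_var, lr=0.01, tol=1e-3, max_steps=10_000):
    """Gradient descent on KL(q || p); returns the fitted (m, r) and the number of steps used."""
    steps = 0
    while kl_q_to_p(m, r, p_mean, p_var) > tol and steps < max_steps:
        grad_m = (m - p_mean) / p_var
        grad_r = math.exp(2.0 * r) / p_var - 1.0
        m -= lr * grad_m
        r -= lr * grad_r
        steps += 1
    return m, r, steps


rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(50)]

# Fit once from scratch on the first 50 observations.
m, r, first_steps = fit(0.0, 0.0, *gaussian_posterior(data))

# One more observation arrives; compare warm and cold starts on the new posterior.
data.append(rng.gauss(2.0, 1.0))
new_posterior = gaussian_posterior(data)
_, _, warm_steps = fit(m, r, *new_posterior)
_, _, cold_steps = fit(0.0, 0.0, *new_posterior)
print(f"warm start: {warm_steps} steps, cold start: {cold_steps} steps")
```

Starting from the previously fitted parameters, only a small correction is needed, whereas the cold start has to relearn both the location and the scale.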

666

:

Yeah.

667

:

Yeah, so what I'm guessing is that maybe

the tuning time is a bit longer than a

668

:

classic HMC.

669

:

But then once you're out of the tuning

phase, the sampling is going to be way

670

:

faster.

671

:

Yes, I think that's a correct way of

putting it.

672

:

And otherwise, for the kind of the number

of, I mean, the dimensionality that the

673

:

algorithm is comfortable with.

674

:

In general, the running times of the

model, how have you noticed that being

675

:

like, has that been close to when you use

a classic HMC or is it something you

676

:

haven't done yet?

677

:

I don't think I can honestly answer this

question.

678

:

I think it will depend because it will

also depend how easily your HMC reaches

679

:

all the

680

:

regions you actually care about.

681

:

So I mean, probably there are some

distributions that are very easy for HMC

682

:

to cover and where it wouldn't be worth it

to train the model.

683

:

But then plenty of cases where things are

the other way around.

684

:

Yeah, yeah, yeah.

685

:

Yeah, I can guess.

686

:

That's always something that's really

fascinating in this algorithm world is how

687

:

dependent everything is on the model.

688

:

use case, really dependent on the model

and the data.

689

:

So on this project, on this algorithm,

what are the next steps for you?

690

:

What would you like to develop next on

this algorithm precisely?

691

:

Yes, so as I was saying, one of my main

questions is how to scale this algorithm

692

:

and

693

:

We kind of wrote it in an all-purpose

fashion.

694

:

And all-purpose is nice, but all-purpose

does not scale.

695

:

So that's really what I'm focusing on,

trying to understand how we can learn

696

:

structures we can know or we can learn

from the system, how to exploit them and

697

:

put them in, in order to be able to tackle

more and more complex systems with higher,

698

:

I mean, more degrees of freedom.

699

:

So more parameters than what we are

currently doing.

700

:

So there's this.

701

:

And of course, I'm also very interested in

having some collaborations with people

702

:

that care about actual problems for which

this method is actually solving something

703

:

for them.

704

:

As it's really what gives you the idea of

what's next to be developed, what are the

705

:

next methodologies that

706

:

will be useful to people?

707

:

Can they already solve their problem?

708

:

Do they need something more from you?

709

:

And that's the two things I'm having a

look at.

710

:

Yeah.

711

:

Well, it definitely sounds like fun.

712

:

And I hope you'll be able to work on that

and come up with some new, amazing,

713

:

exciting papers on this.

714

:

I'll be happy to look at that.

715

:

And so that's it.

716

:

It was a great deep dive on this project.

717

:

And thank you for indulging on my

questions, Marylou.

718

:

Now, if we want to de-zoom a bit and talk

about other things you do, you're also

719

:

interested in machine learning in the context

of scarce data.

720

:

So I'm curious on what you're doing on

these, if you could elaborate a bit.

721

:

Yes, so I guess what I mean by scarce data

is precisely that when we are using

722

:

machine learning in scientific computing,

usually what we are doing is exploiting

723

:

the great tool that are deep neural

networks to play the role of a surrogate

724

:

model somewhere in our scientific

computation.

725

:

But most of the time, this is without data

a priori.

726

:

We know that there is a function we want

to approximate somewhere.

727

:

But in order to have data, either we have

to pay the price of costly experiments,

728

:

costly observations, or we have to pay the

price of costly numerics.

729

:

So if you, I mean, a very famous example

of applications of machine learning

730

:

to scientific computing is molecular

dynamics at quantum precision.

731

:

So this is what people call density

functional theory.

732

:

So if you want to.

733

:

observe the dynamics of a molecule with

the accuracy of what's going on really at

734

:

the level of quantum mechanics, then you

have to make very, very costly call to a

735

:

function that predicts what's the energy

predicted by quantum mechanics and what

736

:

are the forces predicted by quantum

mechanics.

737

:

So people have seen here an opportunity to

use deep neural nets in order to just

738

:

regress what's the value of this quantum

potential.

739

:

at the different locations that you're

going to visit.

740

:

And the idea is that you are creating your

own data.

741

:

You are deciding when you are going to pay

the price of doing the full numerical

742

:

computation and then obtain a training

point of given Cartesian coordinates, what

743

:

is the value of this energy here.

744

:

And then you have to, I mean, conversely

to what you're doing traditionally in

745

:

machine learning, where you believe that

you have...

746

:

huge data sets that are encapsulating a

rule, and you're going to try to exploit

747

:

them at best.

748

:

Here, you have the choice of where you

create your data.

749

:

And so you, of course, have to be as smart

as possible in order to have to create as

750

:

little as possible training points.
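
One way to picture this choice of where to create data is the sketch below. It assumes a cheap Morse-like curve as a stand-in for the expensive quantum calculation, a piecewise-linear surrogate instead of a neural network, and one simple greedy heuristic among many for picking the next point. All names are hypothetical.

```python
import math


class CountingPotential:
    """Stand-in for an expensive calculation: a cheap Morse-like curve that counts its calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, r):
        self.calls += 1
        return (1.0 - math.exp(-(r - 1.0))) ** 2


def surrogate(r, xs, ys):
    """Cheap surrogate: linear interpolation between the training points (xs sorted)."""
    for i in range(len(xs) - 1):
        if xs[i] <= r <= xs[i + 1]:
            slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
            return ys[i] + slope * (r - xs[i])
    raise ValueError("r is outside the sampled range")


def choose_next_point(xs, ys):
    """Greedy heuristic: split the interval where the energy changes the most across the gap."""
    scores = [abs(ys[i + 1] - ys[i]) * (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
    i = scores.index(max(scores))
    return 0.5 * (xs[i] + xs[i + 1])


def build_training_set(potential, lo, hi, budget):
    """Spend exactly `budget` expensive evaluations (budget >= 3), choosing each one adaptively."""
    xs = [lo, 0.5 * (lo + hi), hi]
    ys = [potential(x) for x in xs]
    while len(xs) < budget:
        x_new = choose_next_point(xs, ys)
        y_new = potential(x_new)
        pairs = sorted(zip(xs + [x_new], ys + [y_new]))
        xs, ys = [p[0] for p in pairs], [p[1] for p in pairs]
    return xs, ys


potential = CountingPotential()
xs, ys = build_training_set(potential, 0.5, 4.0, budget=12)
print(f"{potential.calls} expensive calls, surrogate at r=1.3: {surrogate(1.3, xs, ys):.4f}")
```

The point of the sketch is the bookkeeping: every training point costs one expensive call, so the selection rule decides how far a fixed budget goes.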

751

:

And so this is this idea of working with

scarce data that has to be infused in the

752

:

usage of machine learning in scientific

computing.

753

:

My example of application is just what we

have discussed, where we want to learn a

754

:

deep generative model, whereas when we

start, we just have our target

755

:

distribution as an objective, but we don't

have any sample from it.

756

:

That would be the traditional data that

people will be using in generative

757

:

modeling to train a generative model.

758

:

So if you want, we are playing this

adaptive game.

759

:

I was already hinting at a bit.

760

:

where we are creating data that is not

exactly the data we want, but that we

761

:

believe is informative of the data we want

to train the generative model that is in

762

:

turn going to help us to converge the MCMC

and in the same time as you are training

763

:

your model, generate the data you would

have needed to train your model.

764

:

Yeah, that is really cool.

765

:

And of course I asked about that because

scarce data is something that's extremely

766

:

common in the Bayesian world.

767

:

That's where usually Bayesian statistics

are the most, yeah, helpful and useful because

768

:

when you don't have a lot of data, you

need more structure and more priors.

769

:

So if you want to say anything about your

phenomenon of interest.

770

:

So that's really cool that you're working

on that.

771

:

I love that.

772

:

And from also, you know, a bit broader

perspective, you know, MCMC really well.

773

:

You work on it a lot.

774

:

So I'm curious where you think MCMC is

heading in the next few years.

775

:

And if you see its relevance waning in

some way.

776

:

Well, I don't think MCMC can go out of

fashion in a sense because it's absolutely

777

:

ubiquitous.

778

:

So practical use cases are everywhere.

779

:

If you have a large probabilistic model,

usually it's given to you by the nature of

780

:

the problem you want to study.

781

:

And if you cannot choose anything about

putting in the right properties, you're

782

:

just going to be.

783

:

you know, left with something that you

don't know how to approach except by MCMC.

784

:

So it's absolutely ubiquitous as an

algorithm for probabilistic inference.

785

:

And I would also say that one of the

things that are going to, you know, keep

786

:

MCMC going for a long time is how much

it's a cherished object of study by

787

:

actually researchers from different

communities, because I mean...

788

:

You can see people really from statistics

that are kind of the prime researchers on,

789

:

okay, how should you make a Monte Carlo

method that has the best convergence

790

:

properties, the best speed of convergence,

and so on and so forth.

791

:

But you can also see that the fields where

those algorithms are used a lot, be it

792

:

statistical mechanics, be it Bayesian

inference, also have full communities that

793

:

are working on developing MCMCs.

794

:

And so I think it's really a matter that

they are an object of curiosity and

795

:

intriguing to a lot of people.

796

:

And therefore it's something that's for

now is still very relevant and really

797

:

unsolved.

798

:

I mean, something that I love about MCMC

is that when you look at it first, you

799

:

say, yeah, that's simple, you know?

800

:

Yeah.

801

:

Yes, that's, but then you start thinking

about it.

802

:

Then you...

803

:

I mean, realize how subtle are all the

properties of those algorithms.

804

:

And you're telling yourself, but I cannot

believe it's so hard to actually sample

805

:

from distributions that are not that

complicated when you're a naive newcomer.

806

:

And so, yeah, I mean, for now, I think

they are still here and in place.

807

:

And if I could even comment a bit more

regarding exactly the context of my

808

:

research, where

809

:

it could seemingly be the case that I'm

trying to replace MCMCs with machine

810

:

learning.

811

:

I would warn the listeners that it's not

at all what we are concluding.

812

:

I mean, that's not at all the direction we

are going to.

813

:

It's really a case where we need both.

814

:

That MCMC can benefit from learning, but

learning without MCMC is never going to

815

:

give you something that you have enough

guarantees on, that something that you can

816

:

really trust for sure.

817

:

So I think here there is a really nice

combination of MCMC and learning and that

818

:

they're just going to nurture each other

and not replace one another.

819

:

Yeah, yeah, for sure.

820

:

And I really love the, yeah, that these

projects of trying to make basically MCMC

821

:

more informed instead of having first

random draws, you know, almost random

822

:

draws with Metropolis in the end.

823

:

making that more complicated, more

informed with the gradients, with HMC, and

824

:

then normalizing flows, which try to

squeeze a bit more information out of the

825

:

structure that you have to make the

sampling go faster.

826

:

I found that one super useful.

827

:

And also, yeah, that's also a very, very

fascinating part of the research.

828

:

And this is part also of a lot of the

research

829

:

a lot of initiatives that you have focused

on, right?

830

:

Personally, basically what we could

describe as machine-learning-assisted

831

:

scientific computing.

832

:

You know, and do you have other examples

to share with us on how machine learning

833

:

is helping traditional scientific

computing methods?

834

:

Yes.

835

:

So, for example, I was giving already the

example of

836

:

of the learning of the regression of the

potentials of molecular force fields in

837

:

people that are studying molecules.

838

:

But we are seeing a lot of other things

going on.

839

:

So there are people that are trying to

even use machine learning as a black box

840

:

in order to, how should I say, to make

classifications between things they care

841

:

about.

842

:

So for example, you have samples that come

from a model.

843

:

But you're not sure if they come from this

model or this other one.

844

:

You're not sure if they are above a

critical temperature or below a critical

845

:

temperature, if they belong to the same

phase.

846

:

So you can really try to play this game of

creating an artificial data set where you

847

:

know what is the answer, train a

classifier, and then use your black box to

848

:

tell you when you see a new configuration

which type of configuration it is.

849

:

And it's really.

850

:

given to you by deep learning because you

would have no idea why the neural net is

851

:

deciding that it's actually from this or

from this.

852

:

You don't have any other statistics that

you can gather and that will tell you

853

:

what's the answer and this is why.

854

:

But it's kind of like opening this new

conceptual door that sometimes there are

855

:

things that are predictable.

856

:

I mean, you can check that, okay, on the

data that you know the answer of the

857

:

machine is extremely efficient.

858

:

But then you don't know why things are

happening this way.

859

:

I mean, there's this, but there are plenty

of other directions.

860

:

So people that are, for example, using

neural networks to try to discover a

861

:

model.

862

:

And here, model would be actually what

people call partial differential

863

:

equations, so PDEs.

864

:

So I don't know if you've heard about

those physics-informed neural networks.

865

:

But there are neural networks that people

are training, such that they are solution

866

:

of a PDE.

867

:

So instead of actually having training

data, what you do is that you use the

868

:

properties of the deep neural nets, which

are that they are differentiable with

869

:

respect to their parameters, but also with

respect to their inputs.

870

:

And for example, you have a function f.

871

:

And you know that the Laplacian of f is

supposed to be equal to.

872

:

the derivative in time of f, well, you can

write mean squared loss on the fact that

873

:

the Laplacian of your neural network has

to be close to its derivative in time.

874

:

And then, given boundary conditions, so

maybe initial condition in time and

875

:

boundary condition in space, you can ask a

neural net to predict the solution of the

876

:

PDE.
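
The residual loss described here can be sketched without a neural network, assuming the one-dimensional heat equation, where the time derivative of f equals its second derivative in space. Central finite differences on closed-form candidate functions stand in for automatic differentiation of a network with respect to its inputs, and boundary and initial-condition terms are left out. The function names are hypothetical.

```python
import math


def exact_solution(x, t):
    """Known solution of the heat equation df/dt = d2f/dx2."""
    return math.exp(-t) * math.sin(x)


def wrong_candidate(x, t):
    """Same shape, wrong decay rate, so it violates the equation."""
    return math.exp(-2.0 * t) * math.sin(x)


def pde_residual(f, x, t, h=1e-3):
    """Residual df/dt - d2f/dx2 at one point, using central finite differences."""
    df_dt = (f(x, t + h) - f(x, t - h)) / (2.0 * h)
    d2f_dx2 = (f(x + h, t) - 2.0 * f(x, t) + f(x - h, t)) / (h * h)
    return df_dt - d2f_dx2


def residual_loss(f, n_x=20, n_t=10):
    """Mean squared residual over a grid of collocation points in (0, pi) x (0, 1)."""
    points = [
        (math.pi * (i + 0.5) / n_x, (j + 0.5) / n_t)
        for i in range(n_x)
        for j in range(n_t)
    ]
    return sum(pde_residual(f, x, t) ** 2 for x, t in points) / len(points)


loss_exact = residual_loss(exact_solution)
loss_wrong = residual_loss(wrong_candidate)
print(f"loss for the true solution: {loss_exact:.2e}, for the wrong candidate: {loss_wrong:.2e}")
```

A physics-informed network would minimize this kind of loss over its parameters, with extra terms enforcing the boundary and initial conditions.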

877

:

And even better, you can give to your

878

:

learning mechanism a library of terms that

would be possible candidates for being

879

:

part of the PDE.

880

:

And you can let the network tell you which

terms of the PDE in the library are

881

:

actually, or seem to be, in the data

you are observing.

882

:

So, I mean, there are all kinds of

inventive ways that researchers are now

883

:

using the fact that deep neural nets are

differentiable.

884

:

smooth, can generalize easily, and yes,

those universal approximators.

885

:

I mean, seemingly you can use neural nets

to represent any kind of function and use

886

:

that inside their computation problems to

try to, I don't know, answer all kinds of

887

:

scientific questions.

888

:

So it's, I believe, pretty exciting.

889

:

Yeah, yeah, that is super fun.

890

:

I love how

891

:

You know, these come together to help on

really hard sampling problems like

892

:

sampling ODEs or PDEs, just extremely

hard.

893

:

So yeah, using that.

894

:

Maybe one day also we'll get something for

GPs.

895

:

I know with Gaussian processes a lot of

the effort is on decomposing them and

896

:

finding some useful

897

:

algebraic decompositions, so like the

Hilbert space Gaussian processes that Bill

898

:

Engels especially has added to the PyMC

API, or eigenvalue decomposition, stuff

899

:

like that.

900

:

But I'd be curious to see if there are

also some initiatives on trying to help

901

:

the convergence of Gaussian processes using

probably deep neural networks, because

902

:

there is a mathematical connection between

neural networks and GPs.

903

:

I mean, everything is a GP in the end, it

seems.

904

:

So yeah, using a neural network to

facilitate the sampling of a Gaussian

905

:

process would be super fun.

906

:

So I have so many more questions.

907

:

But I want to be mindful of your time, we've

already been recording for some time.

908

:

So I try to make my thoughts more packed.

909

:

But something I wanted to ask you

910

:

You teach actually a course in

Polytechnique in France that's called

911

:

Emerging Topics in Machine Learning.

912

:

So I'm curious to hear you say what are

some of the emerging topics that excite

913

:

you the most and how do you approach

teaching them?

914

:

So in this class, it's actually the nice

class where we have a wild card to just

915

:

talk about whatever we want.

916

:

So as far as I'm concerned, I'm really

teaching about the last point that we

917

:

discussed, which is how can we hope to use

the technology of machine learning to

918

:

assist scientific computing.

919

:

And I have colleagues that are jointly

teaching this class with me that are, for

920

:

example, teaching about optimal transport

or about private and federated learning.

921

:

So it can be different topics.

922

:

But we all have the same approach to it,

which is to introduce to the students the

923

:

main ideas quite briefly and then to give

them the opportunity to learn, to read

924

:

papers that we believe are important or at

least really illustrative of those ideas

925

:

and the direction in which the research is

going and to read these papers, of course,

926

:

critically.

927

:

So the idea is that we want to make sure

that they are understood.

928

:

We also want them to implement the

methods.

929

:

And once you implement the methods, you

realize everything that is sometimes swept under

930

:

the rug in the paper.

931

:

So where is it really difficult?

932

:

Where is the method really making a

difference?

933

:

And so on and so forth.

934

:

So that's our approach to it.

935

:

Yeah, that must be a very fun course.

936

:

At which level do you teach that?

937

:

So our students are third year at Ecole

Polytechnique.

938

:

So that would be equivalent to the first

year of a graduate program.

939

:

Yeah.

940

:

And actually, looking forward, what do you

think are the most promising areas of

941

:

research in what you do?

942

:

So basically, interaction of machine

learning and statistical physics.

943

:

Well, I think something that actually has

been and will continue being a very, very

944

:

fruitful field between statistical

mechanics and machine learning are

945

:

generative models.

946

:

So you probably heard of diffusion models,

and they are a new kind of generative

947

:

models that are relying on learning how to

reverse a diffusion process, a diffusion

948

:

process that is noising the data.

949

:

Once you've learned how to reverse it, this

will allow you to transform noise into

950

:

data.
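
To make the idea of reversing a noising process concrete, here is a minimal sketch, not something shown in the episode. It assumes one-dimensional Gaussian data, an Ornstein-Uhlenbeck noising process, and a score known in closed form; in an actual diffusion model a neural network would have to learn that score. The function name and all parameter values are hypothetical.

```python
import numpy as np

def reverse_diffusion_samples(mu=3.0, sigma=0.5, T=5.0, n_steps=1000,
                              n_samples=20000, seed=0):
    """Turn N(0, 1) noise into approximate N(mu, sigma^2) samples.

    Assumed forward (noising) process: dx = -x dt + sqrt(2) dW, which
    drives any data distribution towards N(0, 1). For Gaussian data the
    marginal at time t is Gaussian, so its score is available exactly.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    y = rng.standard_normal(n_samples)  # start from pure noise
    for k in range(n_steps):
        t = T - k * dt  # forward-process time, running from T down to dt
        mean_t = mu * np.exp(-t)
        var_t = sigma**2 * np.exp(-2.0 * t) + 1.0 - np.exp(-2.0 * t)
        score = -(y - mean_t) / var_t  # gradient of the log marginal density
        drift = y + 2.0 * score        # reverse-time drift: -f + g^2 * score
        noise = rng.standard_normal(n_samples)
        y = y + drift * dt + np.sqrt(2.0 * dt) * noise  # Euler-Maruyama step
    return y

samples = reverse_diffusion_samples()
print(samples.mean(), samples.std())  # should land close to mu and sigma
```

The only ingredient that depends on the data is the score, which is why learning it is the whole training problem in this family of models.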

951

:

It's something that is really close to

statistical mechanics because the

952

:

diffusion really comes from studying

Brownian particles that are all around

953

:

us.

954

:

And this is where this mathematics comes

from.

955

:

And this is still an object of study in

the field of statistical mechanics.

956

:

And it has served a lot of machine

learning models.

957

:

I could also cite Boltzmann machines.

958

:

I mean, they even have the name of the

father of statistical mechanics,

959

:

Boltzmann.

960

:

And it's here again, I mean, something

where it's really inspiration from the

961

:

model studied by physicists that gave the

first forms of models that were used by

962

:

machine learners in order to do density

estimation.

963

:

So there is really this cross-fertilization that

964

:

has been here for, I guess, the last 50

years.

965

:

The field of machine learning has really

emerged in the communities.

966

:

And I'm hoping that my work and all the

groups that are working in this direction

967

:

are also going to demonstrate the other

way around, that generative models can

968

:

help also a lot in statistical mechanics.

969

:

So that's definitely what I am looking

forward to.

970

:

Yeah.

971

:

Yeah, I love that and understand why

you're talking about that, especially now

972

:

with the whole conversation we've had.

973

:

So your answer is not surprising to me.

974

:

Actually, something also that I mean, even

broader than that, I'm guessing you

975

:

already care a lot about these questions

from what I get, but if you could choose

976

:

the questions you'd like to see the answer

to before you die, what would they be?

977

:

That's obviously a very vast question.

978

:

If I stick to a bit really this...

979

:

what we've discussed about the sampling

problems and where I think they are hard

980

:

and why they are so intriguing.

981

:

I think that something I'm very keen on

seeing some progress around is this

982

:

question of sampling multimodal

distributions with methods that come with

983

:

guarantees.

984

:

Here, there's really, in a sense, sampling

a multimodal distribution could be just

985

:

judged

986

:

undoable.

987

:

I mean, there is some NP-hardness that is

hidden somewhere in this picture.
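
As a small illustration of why multimodality is hard for local samplers, here is a sketch that is not from the episode. It assumes a target that is an equal mixture of two unit-variance Gaussians centered at -10 and +10, and a plain random-walk Metropolis chain started in the left mode. The function names and settings are hypothetical.

```python
import numpy as np

def log_target(x, separation=10.0):
    """Unnormalized log density of an equal mixture of N(-sep, 1) and N(+sep, 1)."""
    return np.logaddexp(-0.5 * (x + separation) ** 2,
                        -0.5 * (x - separation) ** 2)

def random_walk_metropolis(n_steps=5000, step_size=1.0, x0=-10.0, seed=0):
    """Random-walk Metropolis chain with Gaussian proposals."""
    rng = np.random.default_rng(seed)
    x = x0
    chain = np.empty(n_steps)
    for i in range(n_steps):
        proposal = x + step_size * rng.standard_normal()
        # Accept with probability min(1, target(proposal) / target(x)).
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        chain[i] = x
    return chain

chain = random_walk_metropolis()
fraction_right = np.mean(chain > 0)  # the true value under the target is 0.5
print(fraction_right)
```

Under the target, half of the mass sits on each side of zero, but a chain making unit-sized local moves has essentially no chance of crossing the low-density region between the modes, so the estimate from a single chain can be badly wrong with no warning.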

988

:

So of course, it's not going to be

something general, but I'm really

989

:

wondering, I mean, I'm really thinking

that there should be some assumption, some

990

:

way of formalizing the problem under which

we could understand how to construct

991

:

algorithms that will provably, you know,

succeed in making this something happen.

992

:

And so here, I don't know, it's a

theoretical question, but I'm

993

:

very curious about what we will manage to

say in this direction.

994

:

Yeah.

995

:

And actually that sets us up, I think, for

the last two questions of the show.

996

:

So, I mean, I have other questions, but

we've already been recording for a long

997

:

time.

998

:

So I need to let you go and have dinner.

999

:

I know it's late for you.

:

01:07:27,196 --> 01:07:29,896

So let me ask you the last two questions

:

01:07:29,896 --> 01:07:32,437

I ask every guest at the end of the show.

:

01:07:33,097 --> 01:07:33,874

First one.

:

01:07:33,874 --> 01:07:38,321

If you had unlimited time and resources,

which problem would you try to solve?

:

01:07:40,562 --> 01:07:45,244

I think it's an excellent question because

it's an excellent opportunity maybe to say

:

01:07:45,244 --> 01:07:49,485

that we don't have unlimited resources.

:

01:07:50,306 --> 01:07:57,149

I think it's probably the biggest

challenge we have right now to understand

:

01:07:57,149 --> 01:08:02,631

and to collectively understand because I

think now we individually understand that

:

01:08:02,631 --> 01:08:04,792

we don't have unlimited resources.

:

01:08:05,132 --> 01:08:07,913

And in a sense the...

:

01:08:08,834 --> 01:08:14,156

the biggest problem is how do we move this

complex system of human societies we have

:

01:08:14,196 --> 01:08:19,778

created in order to move within the

direction where we are using precisely

:

01:08:19,778 --> 01:08:21,058

fewer resources.

:

01:08:21,279 --> 01:08:25,881

And I mean, it has nothing to do with

anything that we have discussed before,

:

01:08:25,881 --> 01:08:32,083

but it feels to me that it's really where

the biggest question is lying that really

:

01:08:32,083 --> 01:08:33,304

matters today.

:

01:08:33,504 --> 01:08:35,965

And I have no clue how to approach it.

:

01:08:36,425 --> 01:08:37,105

But

:

01:08:38,046 --> 01:08:39,606

I think it's actually what matters.

:

01:08:39,606 --> 01:08:46,389

And if I had unlimited time and

resources, that's definitely what I would

:

01:08:46,529 --> 01:08:47,989

be researching towards.

:

01:08:49,270 --> 01:08:51,791

Yeah.

:

01:08:51,791 --> 01:08:52,431

Love that answer.

:

01:08:52,431 --> 01:08:54,752

And you're definitely in good company.

:

01:08:54,932 --> 01:08:59,494

Lots of people have talked about that for

this question, actually.

:

01:08:59,875 --> 01:09:04,076

And second question, if you could have

dinner with any great scientific mind,

:

01:09:04,136 --> 01:09:07,417

dead, alive, or fictional, who would it

be?

:

01:09:09,518 --> 01:09:14,201

So, I mean, a logical answer given my last

response is actually Grothendieck.

:

01:09:14,201 --> 01:09:20,584

So, I don't know, you probably know about

this mathematician who, I mean, was

:

01:09:20,744 --> 01:09:27,988

somebody worried about, you know, our

relationship to the world, let's say, as

:

01:09:27,988 --> 01:09:34,732

scientists very early on, and who had

concluded that to some extent we should

:

01:09:34,732 --> 01:09:36,213

not be doing research.

:

01:09:36,573 --> 01:09:37,253

So...

:

01:09:38,174 --> 01:09:44,835

I don't know that I agree, but I also

don't think it's obviously wrong.

:

01:09:44,835 --> 01:09:50,617

So I think it would be really probably one

of the most interesting discussions to have.

:

01:09:50,617 --> 01:09:54,258

Added on top of that, he was a fantastic

speaker.

:

01:09:54,258 --> 01:09:58,579

And I do invite you to listen to his

lectures, and it would be really

:

01:09:58,579 --> 01:10:00,580

fascinating to have this conversation.

:

01:10:01,340 --> 01:10:02,120

Yeah.

:

01:10:02,180 --> 01:10:02,920

Great.

:

01:10:03,420 --> 01:10:04,061

Great answer.

:

01:10:04,061 --> 01:10:06,741

You know, definitely the first one to

answer Grothendieck.

:

01:10:07,922 --> 01:10:09,402

But that'd be cool.

:

01:10:09,402 --> 01:10:09,542

Yeah.

:

01:10:09,542 --> 01:10:14,624

If you have a favorite lecture of his,

feel free to put that in the show notes

:

01:10:14,624 --> 01:10:19,266

for listeners, I think it's going to be

really interesting and fun for people.

:

01:10:19,727 --> 01:10:21,247

Might be in French, but...

:

01:10:22,328 --> 01:10:26,369

I mean, there are a lot of subtitles now.

:

01:10:26,369 --> 01:10:31,932

If it's on YouTube, it's doing a pretty

good job at the automated transcription,

:

01:10:31,932 --> 01:10:32,772

especially in English.

:

01:10:32,772 --> 01:10:35,213

So I think it will be okay.

:

01:10:36,498 --> 01:10:40,059

And that will be good for people's French

lessons.

:

01:10:40,059 --> 01:10:43,961

So yeah, you know, two birds with one

stone.

:

01:10:44,241 --> 01:10:45,981

So definitely include that now.

:

01:10:47,642 --> 01:10:48,563

Awesome, Marylou.

:

01:10:48,563 --> 01:10:51,124

So that was really great.

:

01:10:51,124 --> 01:10:54,705

Thanks a lot for taking the time and being

so generous with your time.

:

01:10:55,546 --> 01:10:59,268

I'm happy because I had a lot of

questions, but I think we did a pretty

:

01:10:59,268 --> 01:11:02,789

good job at tackling most of them.

:

01:11:03,029 --> 01:11:03,969

As usual,

:

01:11:04,070 --> 01:11:08,375

I put resources and a link to your website

in the show notes for those who want to

:

01:11:08,375 --> 01:11:09,316

dig deeper.

:

01:11:09,316 --> 01:11:12,580

Thank you again, Marylou, for taking the

time and being on this show.

:

01:11:13,301 --> 01:11:14,883

Thank you so much for having me.