Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
Changing perspective is often a great way to solve burning research problems. Riemannian spaces are such a perspective change, as Arto Klami, an Associate Professor of computer science at the University of Helsinki and member of the Finnish Center for Artificial Intelligence, will tell us in this episode.
He explains the concept of Riemannian spaces, their application in inference algorithms, how they can help sample Bayesian models, and their similarity with normalizing flows, which we discussed in episode 98.
Arto also introduces PreliZ, a tool for prior elicitation, and highlights its benefits in simplifying the process of setting priors, thus improving the accuracy of our models.
When Arto is not solving mathematical equations, you’ll find him cycling or around a good board game.
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser and Julio.
Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag ;)
Takeaways:
- Riemannian spaces offer a way to improve computational efficiency and accuracy in Bayesian inference by considering the curvature of the posterior distribution.
- Riemannian spaces can be used in Laplace approximation and Markov chain Monte Carlo algorithms to better model the posterior distribution and explore challenging areas of the parameter space.
- Normalizing flows are a complementary approach to Riemannian spaces, using non-linear transformations to warp the parameter space and improve sampling efficiency.
- Evaluating the performance of Bayesian inference algorithms in challenging cases is a current research challenge, and more work is needed to establish benchmarks and compare different methods.
- PreliZ is a package for prior elicitation in Bayesian modeling that facilitates communication with users through visualizations of predictive and parameter distributions.
- Careful prior specification is important, and tools like PreliZ make the process easier and more reproducible.
- Teaching Bayesian machine learning is challenging due to the combination of statistical and programming concepts, but it is possible to teach the basic reasoning behind Bayesian methods to a diverse group of students.
- The integration of Bayesian approaches in data science workflows is becoming more accepted, especially in industries that already use deep learning techniques.
- The future of Bayesian methods in AI research may involve the development of AI assistants for Bayesian modeling and probabilistic reasoning.
Chapters:
00:00 Introduction and Background
02:05 Arto's Work and Background
06:05 Introduction to Bayesian Inference
12:46 Riemannian Spaces in Bayesian Inference
27:24 Availability of Riemannian-based Algorithms
30:20 Practical Applications and Evaluation
37:33 Introduction to PreliZ
38:03 Prior Elicitation
39:01 Predictive Elicitation Techniques
39:30 PreliZ: Interface with Users
40:27 PreliZ: General Purpose Tool
41:55 Getting Started with PreliZ
42:45 Challenges of Setting Priors
45:10 Reproducibility and Transparency in Priors
46:07 Integration of Bayesian Approaches in Data Science Workflows
55:11 Teaching Bayesian Machine Learning
01:06:13 The Future of Bayesian Methods with AI Research
01:10:16 Solving the Prior Elicitation Problem
Links from the show:
Transcript
This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.
Let me show you how to be a good b...
2
:how they can help sampling Bayesian models
and their similarity with normalizing
3
:flows that we discussed in episode 98.
4
:Arto also introduces PreliZ, a tool for
prior elicitation, and highlights its
5
:benefits in simplifying the process of
setting priors, thus improving the
6
:accuracy of our models.
7
:When Arto is not solving mathematical
equations, you'll find him cycling or
8
:around a good board game.
9
:This is Learning Bayesian Statistics,
episode 103.
10
:recorded February 15, 2024.
11
:Welcome to Learning Bayesian Statistics, a
podcast about Bayesian inference, the
12
:methods, the projects, and the people who
make it possible.
13
:I'm your host.
14
:You can follow me on Twitter at alex_andorra,
like the country, for
15
:any info about the show.
16
:LearnBayesStats.com is Laplace to be.
17
:Show notes, becoming a corporate sponsor,
unlocking Bayesian Merch, supporting the
18
:show on Patreon.
19
:Everything is in there.
20
:That's LearnBayesStats.com.
21
:If you're interested in one -on -one
mentorship, online courses, or statistical
22
:consulting, feel free to reach out and
book a call at topmate.io/alex_andorra.
24
:See you around, folks, and best
wishes to you all.
25
:Arto Klami, welcome to Learning Bayesian
Statistics.
26
:Thank you.
27
:You're welcome.
28
:How was my Finnish pronunciation?
29
:Oh, I think that was excellent.
30
:For people who don't have the video, I
don't think that was true.
31
:So thanks a lot for taking the time,
Arto.
32
:I'm really happy to have you on the show.
33
:And I've had a lot of questions for you
for a long time, and the longer we
34
:postpone the episode, the more questions.
35
:So I'm gonna do my best to not take three
hours of your time.
36
:And let's start by...
37
:maybe defining the work you're doing
nowadays and well, how do you end up
38
:working on this?
39
:Yes, sure.
40
:So I personally identify as a machine
learning researcher.
41
:So I do machine learning research, but
very much from a Bayesian perspective.
42
:So my original background is in computer
science.
43
:I'm essentially a self-educated
statistician in the sense that I've never
44
:really
45
:kind of properly studied statistics,
well, except for a few courses here
46
:and there.
47
:But I've been building models, algorithms,
building on the Bayesian principles for
48
:addressing various kinds of machine
learning problems.
49
:So you're basically like a self -taught
statistician through learning, let's say.
50
:More or less, yes.
51
:I think the first things I started doing,
52
:with anything that had to do with Bayesian
statistics was pretty much already going
53
:to the deep end and trying to learn
posterior inference for fairly complicated
54
:models, even actually non -parametric
models in some ways.
55
:Yeah, we're going to dive a bit on that.
56
:Before that, can you tell us the topics
you are particularly focusing on through
57
:that?
58
:umbrella of topics you've named.
59
:Yes, absolutely.
60
:So I think I actually have a few somewhat
distinct areas of interest.
61
:So on one hand, I'm working really on the
kind of core inference problem.
62
:So how do we computationally efficiently,
accurately enough approximate the
63
:posterior distributions?
64
:Recently, we've been especially working on
inference algorithms that build on
65
:concepts from Riemannian geometry.
66
:So we're trying to really kind of account for
the actual manifold induced by this
67
:posterior distribution and try to somehow
utilize these concepts to kind of speed up
68
:inference.
69
:So that's kind of one very technical
aspect.
70
:Then there's the other main theme on the
kind of Bayesian side is on priors.
71
:So we'll be working on prior elicitation.
72
:So how do we actually go about specifying
the prior distributions?
73
:and ideally maybe not even specifying.
74
:So how would we extract that knowledge
from a domain expert who doesn't
75
:necessarily even have any sort of
statistical training?
76
:And how do we flexibly represent their
true beliefs and then encode them as part
77
:of a model?
78
:That's maybe the main kind of technical
aspects there.
79
:Yeah.
80
:Yeah.
81
:No, super fun.
82
:And we're definitely going to dive into
those two aspects a bit later in the show.
83
:I'm really interested in that.
84
:Before that, do you remember how you first
got introduced to Bayesian inference,
85
:actually, and also why it sticks with you?
86
:Yeah, like I said, I'm in some sense self
-trained.
87
:I mean, coming with the computer science
background, we just, more or less,
88
:sometime during my PhD,
89
:I was working in a research group that was
led by Samuel Kaski.
90
:When I joined the group, we were working
on neural networks of the kind that people
91
:were interested in.
92
:That was like 20 years ago.
93
:So we were working on things like self
-organizing maps and these kind of
94
:methods.
95
:And then we started working on
applications where we really bumped into
96
:the kind of small sample size problems.
97
:So looking at...
98
:DNA microarray data that was kind of tens
of thousands of dimensions and medical
99
:applications with 20 samples.
100
:So we essentially figured out that we're
gonna need to take the kind of uncertainty
101
:into account properly.
102
:Started working on the Bayesian modeling
side of these and one of the very first
103
:things I was doing is kind of trying to
create Bayesian versions of some of these
104
:classical analysis methods that were
105
:especially canonical correlation analysis.
106
:Its original derivation is like an
information-theoretic formulation.
107
:So I kind of dived directly into this:
let's do Bayesian versions of these models.
108
:But I actually do remember that around the
same time I also took a course, a course
109
:by Aki Vehtari.
110
:He's an author of this Gelman et al.
111
:book, one of the authors.
112
:I think the first version of the book had
been released.
113
:just before that.
114
:So Aki was giving a course where he was
teaching based on that book.
115
:And I think that's the kind of first real
official contact on trying to understand
116
:the actual details behind the principles.
117
:Yeah, and actually I'm pretty sure
listeners are familiar with Aki.
118
:He's been on the show already, so I'll
link to the episode, of course, where Aki
119
:was.
120
:And yeah, for sure.
121
:I also recommend going through these
episodes, show notes for people who are
122
:interested in, well, starting learning
about basic stuff and things like that.
123
:Something I'm wondering from what you just
explained is, so you define yourself as a
124
:machine learning researcher, right?
125
:And you work in artificial intelligence
too.
126
:But there is this interaction with the
Bayesian framework.
127
:How does that framework underpin your
research in statistical machine learning
128
:and artificial intelligence?
129
:How does that all combine?
130
:Yeah.
131
:Well, that's a broad topic.
132
:There's of course a lot in that
intersection.
133
:I personally do view all learning problems
in some sense from a Bayesian perspective.
134
:I mean, no matter what kind of a, whether
it's a very simple fitting a linear
135
:regression type of a problem or whether
it's figuring out the parameters of a
136
:neural network with 1 billion parameters,
it's ultimately still a statistical
137
:inference problem.
138
:I mean, most of the cases, I'm quite
confident that we can't figure out the
139
:parameters exactly.
140
:We need to somehow quantify for the
uncertainty.
141
:I'm not really aware of any other kind of
principled way of doing it.
142
:So I would just kind of think about it
that we're always doing Bayesian inference
143
:in some sense.
144
:But then there's the issue of how far can
we go in practice?
145
:So it's going to be approximate.
146
:It's possibly going to be very crude
approximations.
147
:But I would still view it through the lens
of Bayesian statistics in my own work.
148
:And that's what I do when I teach my
BSc students, for example.
149
:I mean, not all of them explicitly
formulate the learning algorithms from these
perspectives, but we are still talking about
what the relationship is, what we can assume
about the algorithms, what we can assume about
the result, and how it would relate to properly
estimating everything exactly the way it should be
done.
154
:Yeah, okay, that's an interesting
perspective. Yeah, so basically putting that
155
:in that framework.
156
:And that makes me think then: what do you
believe the impact of Bayesian machine learning is on
158
:the broader field of AI?
159
:What does that bring to that field?
160
:It's a, let's say it has a big effect.
161
:It has a very big impact in a sense that
pretty much most of the stuff that is
162
:happening on the machine learning front
and hence also on the kind of all learning
163
:based AI solutions.
164
:It is ultimately, I think a lot of people
are thinking about roughly in the same way
165
:as I am, that there is an underlying
learning problem that we would ideally
166
:want to solve more or less following
exactly the Bayesian principles.
167
:They don't necessarily talk about it from this
perspective.
168
:So you might be happy to write algorithms,
all the justification on the choices you
169
:make comes from somewhere else.
170
:But I think a lot of people are kind of
accepting that it's the kind of
171
:probabilistic basis of these.
172
:So for instance, I think if you think
about the objectives that people are
173
:optimizing in deep learning, they're all
essentially likelihoods of some
174
:assumed probabilistic model.
175
:Most of the regularizers they are
considering do have an interpretation of
176
:some kind of a prior distribution.
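For reference, the correspondence mentioned here is usually written as follows: minimizing a loss with an L2 penalty on the weights is the same as maximizing a log-posterior with a Gaussian prior on those weights,

$$
\hat{w} \;=\; \arg\min_w \Big[-\log p(\mathcal{D} \mid w) + \lambda \lVert w \rVert^2\Big]
\;=\; \arg\max_w \Big[\log p(\mathcal{D} \mid w) + \log \mathcal{N}\big(w \mid 0, \tfrac{1}{2\lambda} I\big)\Big],
$$

so weight decay is a Gaussian prior in disguise, and other regularizers correspond to other log-priors.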
177
:I think a lot of people are all the time
going deeper and deeper into actually
178
:explicitly thinking about it from these
perspectives.
179
:So we have a lot of these deep learning
type of approaches, various autoencoders,
180
:Bayesian neural networks, various kinds of
generative AI models that are
181
:actually even explicitly
formulated as probabilistic models and
182
:some sort of an approximate inference
scheme.
183
:So I think these things are
two sides of the same
184
:coin.
185
:People are kind of more and more thinking
about them from the same perspective.
186
:Okay, yeah, that's super interesting.
187
:Actually, let's start diving into these
topics from a more technical perspective.
188
:So you've mentioned the
189
:research and advances you are working on
regarding Riemannian spaces.
190
:So I think it'd be super fun to talk about
that because we've never really talked
191
:about it on the show.
192
:So maybe can you give listeners a primer
on what a Riemannian space is?
193
:Why would you even care about that?
194
:And what you are doing in this regard,
what your research is in this regard.
195
:Yes, let's try.
196
:I mean, this is a bit of a mathematical
concept to talk about.
197
:But I mean, ultimately, if you think about
most of the learning algorithms, so we are
198
:kind of thinking that there are some
parameters that live in some space.
199
:So essentially, without thinking about
it, we just assume that it's a
200
:Euclidean space in a sense that we can
measure distances between two parameters,
201
:that is, how similar they are.
202
:It doesn't matter which direction we go,
if the distance is the same, we think that
203
:they are kind of equally far away.
204
:So now a Riemannian geometry is one that
is kind of curved in some sense.
205
:So we may be stretching the space in
certain ways and we'll be doing this
206
:stretching locally.
207
:So what it actually means, for example, is
that the shortest path between two
208
:possible
209
:values, maybe for example two parameter
configurations, that if you start
210
:interpolating between two possible values
for a parameter, it's going to be a
211
:shortest path in this Riemannian geometry,
which is not necessarily a straight line
212
:in an underlying Euclidean space.
213
:So that's what the Riemannian geometry is
in general.
214
:So it's kind of the tools and machinery we
need to work with these kind of settings.
215
:And now then the relationship to
statistical inference comes from trying to
216
:define such a Riemannian space that it has
somehow nice characteristics.
217
:So maybe the concept that most of the
people actually might be aware of would be
218
:the Fisher information matrix that kind of
characterizes the kind of the curvature
219
:induced by a particular probabilistic
model.
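For readers who want the formula, the Fisher information metric being referred to is usually defined as

$$
G(\theta) \;=\; \mathbb{E}_{x \sim p(x \mid \theta)}\!\left[\nabla_\theta \log p(x \mid \theta)\,\nabla_\theta \log p(x \mid \theta)^{\top}\right],
$$

and it turns the parameter space into a Riemannian manifold where the squared length of a small step $d\theta$ is $d\theta^{\top} G(\theta)\, d\theta$ instead of the Euclidean $d\theta^{\top} d\theta$.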
220
:So these tools kind of then allow, for
example, a very recent thing that we did,
221
:it's going to come out later this spring
in AI stats, is an extension of the
222
:Laplace approximation in a Riemannian
geometry.
223
:So those of you who know what the Laplace
approximation is, it's essentially just
224
:fitting a normal distribution at the mode
of a distribution.
225
:But if we now fit the same normal
distribution in a suitably chosen
226
:Riemannian space,
227
:we can actually model also the kind of
curvature of the posterior mode and even
228
:kind of how it stretches.
229
:So we get a more flexible approximation.
230
:We are still fitting a normal
distribution.
231
:We're just doing it in a different space.
232
:Not sure how easy that was to follow, but
at least maybe it gives some sort of an
233
:idea.
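As a point of comparison, here is a minimal sketch of the standard, Euclidean Laplace approximation on a toy two-dimensional log-posterior (the toy target and the finite-difference Hessian are illustrative assumptions, not code from the paper); the Riemannian version described above fits the same normal in a suitably curved space instead of this flat one:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

# Toy, non-Gaussian 2D log-posterior (purely illustrative).
def log_post(theta):
    x, y = theta
    return -0.5 * (x**2 + np.exp(x) * y**2)

def neg_log_post(theta):
    return -log_post(theta)

# 1. Find the posterior mode.
mode = minimize(neg_log_post, x0=np.zeros(2)).x

# 2. Hessian of the negative log-posterior at the mode
#    (central finite differences; autodiff would be used in practice).
def hessian(f, x, eps=1e-4):
    d = len(x)
    steps = np.eye(d) * eps
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            H[i, j] = (f(x + steps[i] + steps[j]) - f(x + steps[i] - steps[j])
                       - f(x - steps[i] + steps[j]) + f(x - steps[i] - steps[j])) / (4 * eps**2)
    return H

cov = np.linalg.inv(hessian(neg_log_post, mode))

# 3. The Laplace approximation: a normal centred at the mode
#    with covariance equal to the inverse Hessian.
laplace_approx = multivariate_normal(mean=mode, cov=cov)
print(mode, cov)
```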
234
:Yeah, yeah, yeah.
235
:That was actually, I think, a pretty
approachable.
236
:introduction and so if I understood
correctly then you're gonna use these
237
:Riemannian approximations to come up with
better algorithms is that what you do and
238
:why you focus on Riemannian spaces and yeah
if you can introduce that and
239
:tell us basically why that is interesting
to then look
240
:at geometry from these different ways
instead of the classical Euclidean way of
241
thinking about geometry.
242
:Yeah, I think that's exactly what it is
about.
243
:So one other thing, maybe another
perspective of thinking about it is that
244
:we've also been doing Markov chain Monte
Carlo algorithms, so MCMC in these
245
:Riemannian spaces.
246
:And what we can achieve with those is that
if you have, let's say, a posterior
247
:distribution,
248
:that has some sort of a narrow funnel,
some very narrow area that extends far
249
:away in one corner of your parameter
space.
250
:It's actually very difficult to get there
with something like standard Hamiltonian
251
:Monte Carlo, but with the Riemannian
methods we can kind of make these narrow
252
:funnels equally easy compared to the
flatter areas.
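The narrow-funnel posterior being described is essentially Neal's funnel; here is a minimal PyMC sketch of the pathology itself (an illustration only, not the Riemannian samplers from Arto's group):

```python
import pymc as pm

# Neal's funnel: the scale of x shrinks exponentially as v decreases,
# creating the narrow region a Euclidean sampler struggles to enter.
with pm.Model() as funnel:
    v = pm.Normal("v", mu=0, sigma=3)
    x = pm.Normal("x", mu=0, sigma=pm.math.exp(v / 2), shape=9)
    idata = pm.sample(random_seed=0)

# Divergences flag the region the sampler failed to explore properly.
print(int(idata.sample_stats["diverging"].sum()))
```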
253
:Now of course this may sound like a magic
bullet that we should be doing all
254
:inference with these techniques.
255
:Of course it does come with
256
:certain computational challenges.
257
:So we do need to be, like I said, the
shortest paths are no longer straight
258
:lines.
259
:So we need numerical integration to follow
the geodesic paths in these metrics and so
260
:on.
261
:So it's a bit of a compromise, of course.
262
:So they have very nice theoretical
properties.
263
:We've been able to get them working also
in practice in many cases so that they are
264
:kind of comparable with the current state
of the art.
265
:But it's not always easy.
266
:Yeah, there is no free lunch.
267
:Yes.
268
:Yeah.
269
:Yeah.
270
:Do you have any resources about these?
271
:Well, first the concepts of Riemannian
spaces and then the algorithms that you
272
:folks derived in your group using these
Riemannian spaces for people who are
273
:interested?
274
:Yeah, I think I wouldn't know, let's say, any
very particular
275
:resources I would recommend on Riemannian
geometry.
276
:It is actually a rather, let's say,
mathematically involved topic.
277
:But regarding the specific methods, I
think they are...
278
:It's a couple of my recent papers, so we
have this Laplace approximation coming
279
:out in AISTATS this year.
280
:The MCMC sampler we had, I think, two
years ago in AISTATS, similarly, the
281
:first MCMC method building on these and
then...
282
:last year one paper in Transactions on
Machine Learning Research.
283
:I think they are more or less accessible.
284
:Let's definitely link to those papers if
you can in the show notes because I'm
285
:personally curious about it but also I
think listeners will be.
286
:It sounds from what you're saying that
this idea of doing algorithms in this
287
:Riemannian space is
288
:somewhat recent.
289
:Am I right?
290
:And why would it appear now?
291
:Why would it become interesting now?
292
:Well, it's not actually that recent.
293
:I think the basic principle goes back, I
don't know, maybe 20 years or so.
294
:I think the main reason why we've been
working on this right now is that
295
:We've been able to resolve some of the
computational challenges.
296
:So the fundamental problem with these
models is always this numeric integration
297
:of following the shortest paths depending
on an algorithm we needed for different
298
:reasons, but we always needed to do it,
which usually requires operations like
299
:inversion of a metric tensor, which has
the kind of a dimensionality of the
300
:parameter space.
301
:So we came up with a particular metric
302
:that happens to have a computationally
efficient inverse.
303
:So there's kind of this kind of concrete
algorithmic techniques that are kind of
304
:bringing the computational cost to the
level so that it's no longer notably more
305
:expensive than doing kind of standard
Euclidean methods.
306
:So we can, for example, scale them for
Bayesian neural networks.
307
:That's one of the application cases we are
looking at.
308
:We are really having very high
-dimensional problems but still able to do
309
:some of these Riemannian techniques or
approximations of them.
310
:That was going to be my next question.
311
:In which cases are these approximations
interesting?
312
:In which cases would you recommend
listeners to actually invest time to
313
:actually use these techniques because they
have a better chance of working than the
314
:classic Hamiltonian Monte Carlo samplers
that are the default in most probabilistic
315
:languages?
316
:Yeah, I think the easy answer is that when
the inference problem is hard.
317
:So essentially one very practical way
would be that if you realize that you
318
:can't really get a Hamiltonian Monte Carlo
to explore the space, the posterior
319
:properly, that it may be difficult to find
out that this is happening.
320
:Of course, if you're never visiting a
certain corner, you wouldn't actually
321
:know.
322
:But if you have some sort of a reason to
believe that you really are handling with
323
:such a complex posterior that I'm kind of
willing to spend a bit more extra
324
:computation to be careful so that I really
try to cover every corner there is.
325
:Another example is that we realized on the
scope of these Bayesian neural networks
326
:that there are certain kind of classical
327
:Well, certain kind of scenarios where we
can show that if you do inference with the
328
:too simple methods, so something in the
Euclidean metric with the standard
329
:Langevin dynamics type of a thing, what
we actually see is that if you switch to
330
:using better prior distributions in your
model, you don't actually see an advantage
331
:of those unless you at the same time
switch to using an inference algorithm
332
:that is kind of able to handle the extra
complexity.
333
:So if you have for example like
334
:heavy-tailed spike-and-slab type of priors
in the neural network.
335
:You just kind of fail to get any benefit
from these better priors if you don't pay
336
:a bit more attention into how you do the
inference.
337
:Okay, super interesting.
338
:And also, so that seems it's also quite
interesting to look at that when you have,
339
:well, or when you suspect that you have
multi -modal posteriors.
340
:Yes, well yeah, multimodal posteriors are
interesting.
341
:We haven't specifically studied
this particular question, but we
342
:have actually thought about some ideas of
creating metrics that would specifically
343
:encourage exploring the different modes
but we haven't done that concretely so we
344
:are now still focusing on these kind of narrow
thin areas of posteriors and how can you
345
:kind of reach those.
346
:Okay.
347
:And do you know of normalizing flows?
348
:Sure, yes.
349
:So yeah, we've had Marylou Gabrié on
the show recently.
350
:It was episode 98.
351
:And so she's working a lot on these
normalizing flows and the idea of
352
:assisting MCMC sampling with these machine
learning methods.
353
:And it's amazing.
354
:It can sound somewhat similar to what you do
in your group.
355
:And so for listeners, could you explain
the difference between the two ideas and
356
:maybe also the use cases that both apply
to it?
357
:Yeah, I think you're absolutely right.
358
:So they are very closely related.
359
:So there are, for example, the basic idea
of the neural transport that uses
360
:normalizing flows for
361
:essentially transforming the parameter
space in a suitable non -linear way and
362
:then running standard Euclidean
Hamiltonian Monte Carlo.
363
:It can actually be proven.
364
:I think it is in the original paper as
well that I mean it is actually
365
:mathematically equivalent to conducting
Riemannian inference in a suitable metric.
366
:So I would say that it's like a
complementary approach of solving exactly
367
:the same problem.
368
:So you have a way of somehow in a flexible
way warping your parameter space.
369
:You either do it through a metric or you
kind of do it as a pre-transformation.
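Roughly, the equivalence being referred to comes from the change of variables: if the flow maps $z \mapsto \theta = f(z)$, then running plain Euclidean HMC on the pulled-back density

$$
\tilde p(z) \;=\; p\big(f(z)\big)\,\bigl|\det J_f(z)\bigr|
$$

samples the original posterior, and it behaves like Riemannian HMC on $\theta$ with a metric built from the flow's Jacobian, roughly $G(\theta) = \big(J_f J_f^{\top}\big)^{-1}$ evaluated at $z = f^{-1}(\theta)$.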
370
:So there's a lot of similarities.
371
:It's also the computation in some sense
that if you think about mapping...
372
:a sample through a normalizing flow.
373
:It's actually very close to what we do
with the Riemannian Laplace approximation
374
:that you start kind of take a sample and
you start propagating it through some sort
375
:of a transformation.
376
:It's just whether it's defined through a
metric or as a flow.
377
:So yes, so they are kind of very close.
378
:So now the question is then that when
should I be using one of these?
379
:I'm afraid I don't really have an answer.
380
:in a sense that, I mean, there are
computational properties. Let's say, for
381
:example, if you work with flows, you do
need to pre-train them, so you do need to
382
:train some sort of a flow to be able to
use it in certain applications, so it comes
383
:with some pre-training cost.
384
:Quite likely during when you're actually
using it it's going to be faster than
385
:working in a Riemannian metric where you
need to invert some metric tensors and so
386
:on.
387
:So there's kind of like technical
differences.
388
:Then I think the bigger question is of
course that if we go to really challenging
389
:problems, for example, very high
dimensions, that which of these methods
390
:actually work well there.
391
:For that I don't quite have an answer now,
in the sense that I wouldn't dare to say,
392
:or even speculate, which of these works better;
I might miss some kind of obvious
393
:limitations of one of the approaches if
trying to extrapolate too far
394
:from what we've actually tried in
practice.
395
:Yeah, that's what I was going to say.
396
:It's also that these methods are really at
the frontier of the science.
397
:So I guess we're lacking, we're lacking
for now the practical cases, right?
398
:And probably in a few years we'll have
more ideas of these and when one is more
399
:appropriate than another.
400
:But for now, I guess we have to try.
401
:those algorithms and see what we get back.
402
:And so actually, what if people want to
try these Riemannian-based algorithms?
403
:Do you have already packages that we can
link to that people can try and plug their
404
:own model into?
405
:Yes and no.
406
:So we have released open source code with
each of the research papers.
407
:So there is a reference implementation
that
408
:can be used.
409
:We have internally been integrating these,
kind of working a bit towards integrating
410
:the kind of proper open ecosystems that
would allow, make like for example model
411
:specification easy.
412
:It's not quite there yet.
413
:So there's one particular challenge is
that many of the environments don't
414
:actually have all the support
functionality you need for the Riemannian
415
:methods.
416
:They're essentially simplifying some of
the things by directly encoding these
417
:assumptions that the shortest path is an
interpolation or it's a line.
418
:So you need a bit of an extra machinery
for the most established libraries.
419
:There are some libraries, I believe, that
are actually making it fairly easy to do
420
:kind of plug and play Riemannian metrics.
421
:I don't remember the names right now, but
that's where we've kind of been.
422
:planning on putting in the algorithms, but
they're not really there yet.
423
:Hmm, OK, I see.
424
:Yeah, definitely that would be, I guess,
super, super interesting.
425
:If by the time of release, you see
something that people could try,
426
:definitely we'll link to that, because I
think listeners will be curious.
427
:And I'm definitely super curious to try
that.
428
:Any new stuff like that, or you'd like to?
429
:try and see what you can do with it.
430
:It's always super interesting.
431
:And I've already seen some very
interesting experiments done with
432
:normalizing flows, especially bayeux by
Colin Carroll and other people.
433
:Colin Carroll is one of the PyMC
developers also.
434
:And yeah, now you can use bayeux to take
any
435
:JAX-compatible model and you plug that into
it and you can use the flowMC algorithm
436
:to sample your JAX-compatible PyMC model.
437
:So that's really super cool.
438
:And I'm really looking forward to more
experiments like that to see, well, okay,
439
:what can we do with those algorithms?
440
:Where can we push them to what extent, to
what degree, where do they fall down?
441
:That's really super interesting, at least
for me, because I'm not a mathematician.
442
:So when I see that, I find that super,
like, I love the idea of, basically the
443
:idea is somewhat simple.
444
:It's like, okay, we have that problem when
we think about geometry that way, because
445
:then the geometry becomes a funnel, for
instance, as you were saying.
446
:And then sampling at the bottom of the
funnel is just super hard in the way we do
447
:it right now, because just super small
distances.
448
:What if we change the definition of
distance?
449
:What if we change the definition of
geometry, basically, which is this idea
450
:of, OK, let's switch to a Riemannian space.
451
:And the way we do that, then, well, the
funnel disappears, and it just becomes
452
:something easier.
453
:It's just like going beyond the idea of
the centered versus non -centered
454
:parameterization, for instance, when you
do that in model, right?
455
:But it's going big with that because it's
more general.
456
:So I love that idea.
457
:I understand it, but I cannot really read
the math and be like, oh, OK, I see what
458
:that means.
459
:So I have to see the model and see what I
can do and where I can push it.
460
:And then I get a better understanding of
what that entails.
461
:Yeah, I think you gave a much better
summary of what it is doing than I did.
462
:So good for that.
463
:I mean, you are actually touching that, of
course.
464
:So the one point is making the
algorithms
465
:available so that everyone could try them
out.
466
:But then there's also the other aspect
that we need to worry about, which is the
467
:proper evaluation of what they're doing.
468
:I mean, of course, most of the papers when
you release a new algorithm, you need to
469
:emphasize things like, in our case,
computational efficiency.
470
:And you do demonstrate that it, maybe for
example, being quite explicitly showing
471
:that these very strong funnels, it does
work better with those.
472
:But now then the question is of course
that how reliable these things are if used
473
:in a black-box manner, so that someone
just runs them on their favorite model.
474
:And one of the challenges we realized is
that it's actually very hard to evaluate
475
:how well an algorithm is working in an
extremely difficult case.
476
:Because there is no baseline.
477
:I mean, in some of the cases we've been
comparing that let's try to do...
478
:standard Hamiltonian MCMC, NUTS, as
carefully as we can.
479
:And we kind of think that this is the
ground truth, this is the true posterior.
480
:But we don't really know whether that's
the case.
481
:So if it's hard enough case, our kind of
supposed ground truth is failing as well.
482
:And it's very hard to kind of then we
might be able to see that our solution
483
:differs from that.
484
:But then we would need to kind of
separately go and investigate that which
485
:one was wrong.
486
:And that is a practical challenge,
especially if you would like to have a
487
:broad set of models.
488
:And we would want to show somehow
transparently for the kind of end users
489
:that in these and these kind of problems,
this and that particular method, whether
490
:it's one of ours or something else, any
other new fancy one.
491
:When do they work when they don't?
492
:Without relying that we really have some
particular method that they already trust
493
:and we kind of, if it's just compared to
it, we can't kind of really convince
494
:others that is it correct when it is
differing from what we kind of used to
495
:rely on.
496
:Yeah, that's definitely a problem.
497
:That's also a question I asked Marylou
498
:when she was on the show and then that was
kind of the same answer if I remember
499
:correctly that for now it's kind of hard
to do benchmarks in a way, which is
500
:definitely an issue if you're trying to
work on that from a scientific perspective
501
:as well.
502
:If we were astrologists, that'd be great,
like then we'd be good.
503
:But if you're a scientist, then you want
to evaluate your methods and...
504
:And finding a method to evaluate the
method is almost as valuable as finding
505
:the method in the first place.
506
:And where do you think we are on that
regarding in your field?
507
:Is that an active branch of the research
to try and evaluate these algorithms?
508
:What would that even look like?
509
:Or are we still really, really at a very
early time for that work?
510
:That's a...
511
:Very good question.
512
:So I'm not aware of a lot of people that
would kind of specifically focus on
513
:evaluation.
514
:So for example, Aki has of course been
working a lot on that, trying to kind of
515
:create diagnostics and so on.
516
:But then if we think about more on the
flexible machine learning side, I think my
517
:hunch is that it's the individual research
groups are kind of all circling around the
518
:same problems that they are kind of trying
to figure out that, okay,
519
:Every now and then someone invents a fancy
way of evaluating something.
520
:It introduces a particular type of
synthetic scenario where I think that the
521
:most common one I see is that what people do
is that you create problems where you
522
:actually have an analytic posterior and
it's somehow like an artificial problem
523
:that you take a problem and you transform
it in a given way and then you pretend that
524
:you didn't have the analytic one.
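A toy version of that benchmarking idea, assuming a conjugate model so the exact posterior is available in closed form (an illustration, not one of the benchmarks used in the papers):

```python
import pymc as pm
from scipy import stats

# Beta-Binomial model: the posterior is Beta(1 + k, 1 + n - k) analytically,
# so a sampler's output can be checked against ground truth.
k, n = 27, 100
with pm.Model():
    p = pm.Beta("p", alpha=1, beta=1)
    pm.Binomial("obs", n=n, p=p, observed=k)
    idata = pm.sample(random_seed=0)

exact = stats.beta(1 + k, 1 + n - k)
print(float(idata.posterior["p"].mean()), exact.mean())
```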
525
:But they are all, I mean, they feel a bit
artificial.
526
:They feel a bit synthetic.
527
:So let's see.
528
:It would maybe be something that the
community should kind of be talking a bit
529
:more about on a workshop or something
that, OK, let's try to really think about
530
:how to verify the robustness or possibly
identify that these things are not really
531
:ready or reliable for practical use in
very serious applications yet.
532
:Yeah.
533
:I haven't been following very closely
what's happening, so I may be missing some
534
:important works that are already out
there.
535
:Okay, yeah.
536
:Well, Aki, if you're listening, send us a
message if we forgot something.
537
:And second, that sounds like there are
some interesting PhDs to do on the issue,
538
:if that's still a very new branch of the
research.
539
:So, people,
540
:if you're interested in that, maybe
contact Arto and we'll see.
541
:Maybe in a few months or years, you can
come here on the show and answer the
542
:question I just asked.
543
:Another aspect of your work I really want
to talk about also that I really love and
544
:now listeners can relax because that's
going to be, I think, less abstract and
545
:closer to their user experience.
546
:is about priors.
547
:You talked about it a bit at the
beginning, especially you are working and
548
:you worked a lot on a package called
PreliZ, which I really love.
549
:One of my friends and fellow PyMC
developers, Osvaldo Martin, is also
550
:collaborating on that.
551
:And you guys have done a tremendous job on
that.
552
:So yeah, can you give people a primer
about PreliZ?
553
:What is it?
554
:When could they use it and what's its
purpose in general?
555
:Maybe I need to start by saying that I
haven't worked a lot on PreliZ.
556
:Osvaldo has and a couple of others, so
I've been kind of just hovering around and
557
:giving a bit of feedback.
558
:But yeah, so I'll maybe start a bit
further away, so not directly from
559
:PreliZ, but the whole question of prior
elicitation.
560
:So I think the...
561
:Yeah.
562
:What we've been working with that is the
prior elicitation is simply an, I would
563
:frame it as that it's some sort of
unusually iterative approach of
564
:communicating with the domain expert where
the goal is to estimate what's their
565
:actual subjective prior knowledge is on
whatever parameters the model has and
566
:doing it so that it's like cognitively
easy for the expert.
567
:So many of the algorithms that we've been
working on this are based on this idea of
568
:predictive elicitation.
569
:So if you have a model where the
parameters don't actually have a very
570
:concrete, easily understandable meaning,
you can't actually start asking questions
571
:from the expert about the parameters.
572
:It would require them to understand fully
the model itself.
573
:The predictive elicitation techniques
instead
574
:communicate with the expert usually in the
space of the observable quantities.
575
:So they're trying to ask: is this
somehow a more likely realization than this
576
:other one.
577
:And now this is where PreliZ comes
into play.
578
:So when we are communicating with the
user, so most of the times the information
579
:we show for the user is some sort of
visualizations.
580
:of predictive distributions or possibly
also about the parameter distributions
581
:themselves.
582
:So we need an easy way of communicating
whether it's histograms of predicted
583
:values and whatnot.
584
:So how do we show those for a user in
scenarios where the model itself is some
585
:sort of a probabilistic program so we
can't kind of fixate to a given model
586
:family.
587
:That's actually the main role of
PreliZ: essentially making it easy to
588
:interface with the user.
589
:Of course, PreliZ also then includes
these algorithms themselves.
590
:So, algorithms for estimating the prior
and the kind of interface components for
591
:the expert to give information.
592
:So, make a selection, use a slider that I
would want my distribution to be a bit
593
:more skewed towards the right and so on.
594
:That's what we are aiming at.
595
:A general purpose tool that would be used,
it's essentially kind of a platform for
596
:developing and kind of bringing into use
all kinds of prior elicitation techniques.
597
:So it's not tied to any given algorithm or
anything but you just have the components
598
:and could then easily kind of commit,
let's say, a new type of prior elicitation
599
:algorithm into the library.
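For anyone who wants to see what that looks like in code, here is a minimal sketch using PreliZ (assuming a recent release; check the PreliZ docs for the exact signatures):

```python
import preliz as pz

# Find a Gamma prior that puts 90% of its mass between 2 and 6 while being
# maximally non-committal otherwise; maxent also plots the resulting prior.
pz.maxent(pz.Gamma(), lower=2, upper=6, mass=0.9)

# Distributions can also be specified and visualized directly.
pz.Normal(mu=0, sigma=1).plot_pdf()
```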
600
:Yeah, and I really encourage
601
:folks to go take a look at the PreliZ
package.
602
:I put the link in the show notes because,
yeah, as you were saying, that's a really
603
:easier way to specify your priors and also
elicit them if you need the intervention
604
:of non -statisticians in your model, which
you often do if the model is complex
605
:enough.
606
:So yeah, like...
607
:I'm using it myself quite a lot.
608
:So thanks a lot guys for this work.
609
:So Arto, as you were saying, Osvaldo
Martín is one of the main contributors,
610
:Oriol Abril-Pla also, and Alejandro
Icazatti, if I remember correctly.
611
:So at least these four people are the main
contributors.
612
:And yeah, so I definitely encourage people
to go there.
613
:What would you say, Arto, are the...
614
:like the Pareto effect, what would it be
if people want to get started with
615
:PreliZ?
616
:Like the 20% of uses that will give you
80% of the benefits of PreliZ for
617
:someone who doesn't know anything about it.
618
:That's a very good question.
619
:I think the most important thing actually
is to realize that we need to be careful
620
:when we set the priors.
621
:So simply being aware that you need a tool
for this.
622
:You need a tool that makes it easy to do
something like a prior predictive check.
623
:You need a tool that relieves you from
figuring out how do I inspect
624
:my priors or the effects they have on the
model.
625
:That's actually where the real benefit is.
626
:You get most of the...
627
:when you kind of try to bring it as part
of your Bayesian workflow in a kind of a
628
:concrete step that you identify that I
need to do this.
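The concrete workflow step being described is, at its simplest, a prior predictive check; here is a minimal PyMC sketch (the regression model and data are invented purely for illustration):

```python
import numpy as np
import pymc as pm

x = np.linspace(0, 10, 50)   # made-up predictor
y_obs = np.zeros(50)         # placeholder data, not used by the prior check

with pm.Model() as model:
    intercept = pm.Normal("intercept", 0, 1)
    slope = pm.Normal("slope", 0, 1)
    sigma = pm.HalfNormal("sigma", 1)
    pm.Normal("y", mu=intercept + slope * x, sigma=sigma, observed=y_obs)
    prior_pred = pm.sample_prior_predictive()

# Inspect what the priors imply about the data before fitting anything.
y_prior = prior_pred.prior_predictive["y"]
print(y_prior.min().item(), y_prior.max().item())
```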
629
:Then the kind of the remaining tail of
this thing is then of course that the...
630
:maybe in some cases you have such a
complicated model that you really need to
631
:deep dive and start...
632
:running algorithms that help you elicit
the priors.
633
:And I would actually even say that the
elicitation algorithms, I do perceive them
634
:useful even when the person is actually a
statistician.
635
:I mean, there's a lot of models that we
may think that we know how to set the
636
:priors.
637
:But what we are actually doing is
following some very vague ideas on what's
638
:the effect.
639
:And we may also make
640
:severe mistakes or spend a lot of time in
doing it.
641
:So to an extent these elicitation
interfaces, I believe that ultimately they
642
:will be helping even kind of hardcore
statisticians in just kind of doing it
643
:faster, doing it slightly better, doing it
perhaps in a more better documented
644
:manner.
645
:So you could for example kind of store all
the interaction the modeler had.
646
:with these things and kind of put that
aside that this is where we got the prior
647
:from instead of just trial and error and
then we just see at the end the result.
648
:So you could kind of revisit the choices
you made during an elicitation process
649
:that I discarded these predictive
distributions for some reason and then you
650
:can later kind of, okay I made a mistake
there maybe I go and change my answer in
651
:that part and then an algorithm provides
you an updated prior.
652
:without you needing to actually go through
the whole prior specification process
653
:again.
654
:Yeah.
655
:Yeah.
656
:Yeah, I really love that.
657
:And that makes the process of setting
priors more reproducible, more transparent
658
:in a way.
659
:That makes me think a bit of the
scikit-learn pipelines that you use to transform
660
:the data.
661
:For instance, you just set up the pipeline
and you say, I want to standardize my
662
:data, for instance.
663
:And then you have that pipeline ready.
664
:And when you do the out-of-sample
predictions, you can use the pipeline and
665
:say, okay, now like do that same
transformation on these new data so that
666
:we're sure that it's done the right way,
but it's still transparent and people know
667
:what's going on here.
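For readers who don't know scikit-learn, here is the kind of pipeline being described (toy data, purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)

# The standardization step is stored inside the fitted pipeline.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X[:150], y[:150])

# New data automatically goes through the exact same transformation.
print(pipe.predict(X[150:]))
```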
668
:It's a bit the same thing, but with the
priors.
669
:And I really love that because that makes
it also easier for people to think about
670
:the priors and to actually choose the
priors.
671
:Because.
672
:What I've seen in teaching is that
especially for beginners, even more when
673
:they come from the frequentist framework,
setting the priors can be just like
674
:paralyzing.
675
:It's like the paradox of choice.
676
:It's way too many, way too many choices.
677
:And then they end up not choosing anything
because they are too afraid to choose the
678
:wrong prior.
679
:Yes, I fully agree with that.
680
:I mean, there's a lot of very simple
models.
681
:that already start having six, seven,
eight different univariate priors there.
682
:And then I've been working with these
things for a long time and I still very
683
:easily make stupid mistakes that I'm
thinking that I increase the variance of
684
:this particular prior here, thinking that
what I'm achieving is, for example, higher
685
:predictive variance as well.
686
:And then I realized that, no, that's not
the case.
687
:It's actually...
688
:Later in the model, it plays some sort of
a role and it actually has the opposite
689
:effect.
690
:It's hard.
691
:Yeah.
692
:Yeah.
693
:That stuff is really hard and same here.
694
:When I discovered that, I'm extremely
frustrated because I'm like, I spent
695
:hours on these, whereas if I had a more
reproducible pipeline, that would just have
696
:been handled automatically for me.
697
:So...
698
:Yeah, for sure.
699
:We're not there yet in the workflow, but
that definitely makes it way easier.
700
:So yeah, I absolutely agree that we are
not there yet.
701
:I mean, PreliZ is a very
well-defined tool that allows us to start
702
:working on it.
703
:But I mean, then the actual concrete
algorithms that would make it easy to
704
:let's say for example, avoid these kind of
stupid mistakes and be able to kind of
705
:really reduce the effort.
706
:So if it now takes two weeks for a PhD
student trying to think about and fiddle
707
:with the prior, so can we get to one day?
708
:Can we get it to one hour?
709
:Can we get it to two minutes of a quick
interaction?
710
:And probably not two minutes, but if we
can get it to one hour and it...
711
:It will require lots of things.
712
:It will require even better of this kind
of tooling.
713
:So how do we visualize, how do we play
around with it?
714
:But I think it's going to require quite a
bit better algorithms on how do you, from
715
:kind of maximally limited interaction, how
do you estimate.
716
:what the prior is and how you design the
kind of optimal questions you should be
717
:asking from the expert.
718
:There's no point in kind of reiterating
the same things just to fine -tune a bit
719
:one of the variances of the priors if
there is a massive mistake still somewhere
720
:in the prior and a single question would
be able to rule out half of the possible
721
:scenarios.
722
:It's going to be an interesting...
723
:let's say, rising research direction, I
would say, for the next 5, 10 years.
724
:Yeah, for sure.
725
:And very valuable also because very
practical.
726
:So for sure, again, a great PhD
opportunity, folks.
727
:Yeah, yeah.
728
:Also, I mean, that may be hard to find
those algorithms that you were talking
729
:about because it is hard, right?
730
:I know I worked on the...
731
:find constraint prior function that we
have in PMC now.
732
:And it's just like, it seemed like a very
simple case.
733
:It's not even doing all the fancy stuff
that PreliZ is doing.
734
:It's mainly just optimizing a distribution
so that it fits the constraints that you
735
:are giving it.
736
:Like for instance, I want a gamma with 95
% of the mass between 2 and 6.
737
:Give me the...
738
:parameters that fit that constraint.
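That is what pm.find_constrained_prior does; here is a minimal sketch of the exact Gamma example just mentioned (the initial guess values are arbitrary):

```python
import pymc as pm

# Find Gamma parameters that put 95% of the prior mass between 2 and 6.
params = pm.find_constrained_prior(
    pm.Gamma,
    lower=2,
    upper=6,
    mass=0.95,
    init_guess={"alpha": 4, "beta": 1},
)
print(params)  # a dict of fitted parameters, e.g. {"alpha": ..., "beta": ...}
```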
739
:That's actually surprisingly hard
mathematically.
740
:You have a lot of choices to make, you
have a lot of things to really be careful
741
:about.
742
:And so I'm guessing that's also one of the
hurdles right now in that research.
743
:Yeah, it absolutely is.
744
:I mean, I would say at least I'm
approaching this
745
:more or less from an optimization
perspective then that I mean, yes, we are
746
:trying to find a prior that best satisfies
whatever constraints we have and trying to
747
:formulate an optimization problem of some
kind that gets us there.
748
:This is also where I think there's a lot
of room for the, let's say flexible
749
:machine learning tools type of things.
750
:So, I mean, if you think about the prior
that satisfies these constraints, we could
751
:be specifying it with some sort of a
flexible
752
:not a particular parametric prior but some
sort of a flexible representation and then
753
:just kind of optimizing for within a much
broader set of this.
754
:But then of course it requires completely
different kinds of tools than we are used
755
:to working on.
756
:It also requires people accepting that our
priors may take arbitrary shapes.
757
:They may be distributions that we could
have never specified directly.
758
:Maybe they're multimodal.
759
:priors that we kind of just infer, ones that
you couldn't really have specified yourself, and
760
:there's going to be also a lot of kind of
educational perspective on getting people
761
:to accept this.
762
:But even if I had to give you a perfect
algorithm that somehow cranks out a prior
763
:and then you look at the prior and you're
saying that I don't even know what
764
:distribution this is, I would have never
ever converged into this if I was manually
765
:doing this.
766
:So will you accept
767
:that that's your prior or will you insist
that your method is doing something
768
:stupid?
769
:I mean, I still want to use my Gaussian
prior here.
770
:Yeah, that's a good point.
771
:And in a way that's kind of related to a
classic problem that you have when you're
772
:trying to automate a process.
773
:I think there's the same issue with the
automated cars, like those self -driving
774
:cars, where people actually trust the cars
more if they think they have
775
:some control over it.
776
:I've seen interesting experiments where
they put a placebo button in the car that
777
:people could push on to override if they
wanted to, but the button wasn't doing
778
:anything.
779
:People are saying they were more
trusting of these cars than the
780
:completely self -driving cars.
781
:That's also definitely something to take
into account, but that's more related to
782
:the human psychology than to the
algorithms per se.
783
:related to human psychology but it's also
related to this evaluation perspective.
784
:I mean of course if we did have a very
robust evaluation pattern that somehow
785
:tells that once you start using these
techniques your final conclusions in some
786
:sense will be better and if we can make
that kind of a very convincing then it
787
:will be easier.
788
:I mean if you think about, I mean there's
a lot of people that would say that
789
:a very massive neural network with four
billion parameters.
790
:It would never ever be able to answer a
question given in a natural language.
791
:A lot of people were saying that five
years ago that this is a pipeline, it's
792
:never gonna happen.
793
:Now we do have it and now everyone is
ready to accept that yes, it can be done.
794
:And they are willing to actually trust
these ChatGPT type of models in a
795
:lot of things.
796
:And they are investing a lot of effort
into figuring out what to do with this.
797
:It just needs this kind of very concrete
demonstration that there is value and that
798
:it works well enough.
799
:It will still take time for people to
really accept it, but I mean, I think
800
:that's kind of the key ingredient.
801
:Yeah, yeah.
802
:I mean, it's also good in some way.
803
:Like that skepticism makes the tools
better.
804
:So that's good.
805
:I mean, so we could...
806
:Keep talking about PreliZ because I have
other technical questions about that.
807
:But actually, since you're like, that's a
perfect segue to a question I also had for
808
:you because you have a lot of experience
in that field.
809
:So how do you think industries can better
integrate Bayesian approaches into
810
:their data science workflows?
811
:Because that's basically what we ended up
talking about right now without me nudging
812
:you towards it.
813
:Yeah, I have actually indeed been thinking
about that quite a bit.
814
:So I do a lot of collaboration with
industrial partners in different domains.
815
:I think there's a couple of perspectives
to this.
816
:So one is that, I mean, people are
finally, I think they are starting to
817
:accept the fact that probabilistic
programming with kind of black box
818
:automated inference is the only sensible
way
819
:of doing statistical modeling.
820
:So looking back like 10-15 years ago,
you would still have a lot of people,
821
:maybe not in industry but in research in
different disciplines, in meteorology or
822
:physics or whatever.
823
:People would actually be writing
Metropolis-Hastings algorithms from
824
:scratch, which is simply not reliable in
any sense.
825
:I mean, it took time for them to accept
that yes, we can actually now do it with
826
:something like Stan.
827
:I think this is of course the way that to
an extent that there are problems that fit
828
:well with what something like Stan or
Priency offers.
829
:I think we've been educating long enough
master students who are kind of familiar
830
:with these concepts.
831
:Once they go to the industry they will use
them, they know roughly how to use them.
832
:So that's one side.
833
:But then the other thing is that I
think...
834
:Especially in many of these predictive
industries, so whether it's marketing or
835
:recommendation or sales or whatever,
people are anyway already doing a lot of
836
:deep learning types of models there.
837
:That's a routine tool in what they do.
838
:And now if we think about that, at least
in my opinion, that these fields are
839
:getting closer to each other.
840
:So we have more and more deep learning
techniques that are, like the variational
841
:autoencoder is a prime example, but it is
ultimately a Bayesian model in itself.
842
:This may actually be that they creep
through that all this Bayesian thinking
843
:and reasoning is actually getting into use
by the next generation of these deep
844
:learning techniques that they are doing.
845
:They've been building those models,
they've been figuring out that they cannot
846
:get reliable estimates of uncertainty,
they maybe tried some ensembles or
847
:whatnot.
848
:And they will be following.
849
:So once the tools are out there, there's
good enough tutorials on how to use those.
850
:So they might start using things like,
let's say, Bayesian neural networks or
851
:whatever the latest tool is at that point.
852
:And I think this may be the easiest way
for the industries to do so.
853
:They're not going to go switch back to
very simple classical linear models when
854
:they do their analysis.
855
:But they're going to make their deep
learning solutions Bayesian on some time
856
:scale.
857
:Maybe not tomorrow, but maybe in five
years.
Yeah, that's a very good point. I love that. And of course, I'm very happy about that, being one of the actors making the industry more Bayesian, so I have a vested interest in this. But I've also seen the same evolution you were talking about. Right now, it's not even really an issue of convincing people to use these kinds of tools. I mean, that still happens from time to time, but less and less. The question now is really more about making those tools more accessible, more versatile, easier to use, more reliable, easier to deploy in industry, things like that, which is a really good point to be at, for sure.
And to some extent, I think it's an interesting question also from the perspective of the tools. It may mean that we just end up doing a lot of Bayesian analysis on top of what we would now call deep learning frameworks, and of course there will be libraries built on top of those. Pyro, for example, is a library built on PyTorch, and NumPyro on JAX, but the syntax is intentionally similar to what people are used to from deep-learning-style modeling. And this is perfectly fine. We already use a lot of stochastic optimization routines in Bayesian inference and so on, so these frameworks are actually very good tools for building all kinds of Bayesian models. I think this may be the layer where the industry use happens; they need GPU-type scaling and everything anyway, so we should be happy to have our systems work on top of these libraries.
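To make that concrete, here is a minimal sketch, not something discussed in the episode, of what Bayesian modeling on top of a deep learning framework can look like: a small regression model in NumPyro, which runs on JAX and therefore inherits GPU scaling and the array-programming style deep learning practitioners already know. The model, data, and sampler settings are illustrative assumptions.

```python
# Minimal sketch (illustrative only): Bayesian linear regression in NumPyro.
# NumPyro runs on JAX, so the same code scales to GPU like deep learning code.
import jax.numpy as jnp
import jax.random as random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def model(x, y=None):
    # Priors over the regression parameters
    alpha = numpyro.sample("alpha", dist.Normal(0.0, 1.0))
    beta = numpyro.sample("beta", dist.Normal(0.0, 1.0))
    sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
    # Likelihood, vectorized over all observations
    numpyro.sample("y", dist.Normal(alpha + beta * x, sigma), obs=y)

# Simulated data, purely for illustration
x = jnp.linspace(-2.0, 2.0, 100)
y = 1.0 + 0.5 * x + 0.3 * random.normal(random.PRNGKey(0), (100,))

# NUTS sampling, run just like any other JAX computation (CPU or GPU)
mcmc = MCMC(NUTS(model), num_warmup=500, num_samples=1000)
mcmc.run(random.PRNGKey(1), x, y=y)
mcmc.print_summary()
```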
Yeah, very good point. And to come back to one of the points you made in passing, education is helping a lot with that. You have been educating the data scientists who now go into industry. And I know that in Finland (in France, where I'm originally from, not so much) there is this really great integration between the research side, the university, and industry. You can really see it in the PhD positions, in the professorship positions, and so on. I think that's really interesting, and it's part of why I wanted to talk to you. So to go back to the education part, what challenges and opportunities do you see in teaching Bayesian machine learning, as you do, at the university level?
Yeah, it's challenging, I must say, especially once we get to Bayesian machine learning proper. It is a combination of two topics that are each somewhat difficult in themselves. If we want to talk about normalizing flows and also about statistical properties of estimators or MCMC convergence, those require different kinds of mathematical tools, and they require a certain level of expertise on the software and programming side. What that means in practice is that if we look at the population of, let's say, data science students, there will always be a lot of people missing background on one of these sides. So it is a difficult topic to teach. If it were a small class, it would be fine, but it appears that at least our students are really excited about these things. I can launch a course explicitly titled Bayesian machine learning, which is an advanced-level machine learning course, and I will still get 60 to 100 students enrolling. That means that within that group there are going to be some CS students with almost no background in statistics, and some statisticians who certainly know how to program but are not really used to thinking about GPU acceleration of a very large model. But it's interesting, and it's not an impossible thing. I think it is a topic you can teach at a sufficient level for everyone, so that everyone is able to understand the basic reasoning behind why we are doing these things. Some of the students may struggle to figure out all the math behind it, but they might still be able to use these tools very nicely; they might be able to say that if I make this or that modification, I see that my estimates are better calibrated. And some others will then go deeper into figuring out why these things work. So it just needs a bit of creativity in how we do it and in what we expect from the students: what should they know once they've completed a course like this?
:Yeah, that makes sense.
942
:Do you have seen also an increase in the
number of students in the recent years?
943
:Well, we get as many students as we can
take.
944
:So I mean, it's actually been for quite a
while already that in our university, by
945
:far the most...
946
:popular master's programs and bachelor's
programs are essentially data science and
947
:computer science.
948
:So we can't take in everyone we would
want.
949
:So it actually looks to us that it's more
or less like a stable number of students,
950
:but it's always been a large number since
we launched, for example, the data science
951
:program.
952
:So it went up very fast.
953
:So there's definitely interest.
Yeah. That's fantastic. So, I've been taking a lot of your time, and we're going to start closing up the show, but there are at least two questions I want to get your insight on. The first one is: what do you think the biggest hurdle in the Bayesian workflow currently is? We've talked about that a bit already, but I'm curious to get your structured answer.
Well, I think the first thing is getting people to actually start using more or less systematic workflows. The idea is great, and we know more or less how we should be thinking about it, but it's a very complex object. We can tell experts, statisticians, that this is roughly how you should do it, and then we still have to convince them, almost force them, to stick to it. But especially if we think about newcomers, people who are just starting with these things, it's very complicated. If you need to read a 50-page or 100-page book about the Bayesian workflow to even know how to do it, that's a real barrier. So I think in the long term we are going to get tools for assisting it, really streamlining the process. I'm thinking of something like an AI assistant for a person building a model, one that pulls you aside and says: I see that you are trying to go there and do this, but you haven't done prior predictive checks; I already created some plots for you, please take a look and confirm whether this is what you were expecting. It's going to take a lot of effort to create those. It's something we've been trying to think about, how to do it, but it's still open. I think that's where the challenge is. We know most of the stuff within the workflow, roughly how it should be done, or at least we have good enough solutions. But really helping people to actually follow these principles, that's going to be hard.
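As a small illustration of the kind of step such an assistant could automate, here is a sketch of a prior predictive check done by hand with PyMC and ArviZ. The model and numbers are made up for the example; the point is only that the assistant would generate plots like this for you and ask whether they match your expectations.

```python
# Hypothetical example of a prior predictive check with PyMC and ArviZ.
import numpy as np
import pymc as pm
import arviz as az

x = np.linspace(0.0, 10.0, 50)  # made-up predictor values

with pm.Model():
    alpha = pm.Normal("alpha", 0.0, 1.0)
    beta = pm.Normal("beta", 0.0, 1.0)
    sigma = pm.HalfNormal("sigma", 1.0)
    # Placeholder observations: only used here to fix the shape of y
    pm.Normal("y", alpha + beta * x, sigma, observed=np.zeros_like(x))

    # Simulate datasets from the priors alone, before touching the real data
    prior_pred = pm.sample_prior_predictive(500)

# Visualize the simulated outcomes and check that they look plausible
az.plot_ppc(prior_pred, group="prior")
```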
Yeah, yeah, yeah. But damn, that would be super cool. Talking about something like a Jarvis, you know, the AI assistant, but for Bayesian models, how cool would that be? Love that. And looking forward, how do you see Bayesian methods evolving with artificial intelligence research?
Yeah, I think... For quite a while I was about to say that I've been building on this basic idea that deep learning models as such will become more and more Bayesian anyway, so that's kind of a given. But now, of course, the recent very large-scale AI models are getting so big that computational resources become a major hurdle; learning those models even in the crudest possible way is expensive. There are clearly needs for uncertainty quantification in the large language model kind of scope. They are really quite unreliable, and they're really poor at, for example, evaluating their own confidence. There have been examples where, if you ask how sure it is about a statement, it gives a similar number more or less irrespective of the statement: yeah, 50% sure, I don't know. So at least in the very short run, it's probably not going to be Bayesian techniques that solve all the uncertainty quantification in those types of models. In the long term, maybe it is. But it's going to be interesting. It looks to me a bit like a lot of the stuff built to address specific limitations of these large language models consists of separate components: some sort of external tool that reads in those inputs, or an external tool that the LLM can use. So maybe this is going to be a separate element that somehow integrates. An LLM could, of course, have an API interface where it can query, let's say, Stan to figure out an answer to the type of question that requires probabilistic reasoning. People have been plugging things in; there are famous public examples where the LLM can query mathematical reasoning engines and so on, so that if you ask a specific type of question, it goes outside its own realm and does something. It already kind of knows how to program, so maybe we just need to teach LLMs to do statistical inference by actually running an MCMC algorithm on a model that they specify together with the user. I don't know whether anyone is actually working on that; it's something that just came to my mind, so I haven't really thought about it too much.
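Purely as a thought experiment in code, and explicitly not something Arto or anyone is confirmed to be building, such a "statistical inference tool" for an LLM could look like a function the model is allowed to call, which fits a Stan model with MCMC and returns a posterior summary. The tool name, data, and model file below are invented for illustration; the CmdStanPy calls themselves are standard.

```python
# Hypothetical LLM tool: run MCMC on a user-specified Stan model via CmdStanPy
# and return a compact posterior summary the LLM can reason about.
from cmdstanpy import CmdStanModel

def run_mcmc_tool(stan_file: str, data: dict, query_var: str) -> dict:
    """Fit the Stan model to the data and summarize one posterior variable."""
    model = CmdStanModel(stan_file=stan_file)
    fit = model.sample(data=data, chains=4, iter_sampling=1000)
    draws = fit.stan_variable(query_var)
    return {
        "variable": query_var,
        "posterior_mean": float(draws.mean()),
        "posterior_sd": float(draws.std()),
    }

# The LLM would fill in the arguments from its conversation with the user, e.g.:
# result = run_mcmc_tool("conversion.stan", {"N": 120, "y": 17}, "theta")
```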
Yeah, but again, we're getting so many PhD ideas for people right now.

We are.

Yeah, I feel like we should be doing a best-of of all your awesome PhD ideas.
Awesome. Well, I still have so many questions for you, but I don't want to take too much of your time, and I know it's getting late in Finland. So let's close up the show with the last two questions I always ask at the end. First one: if you had unlimited time and resources, which problem would you try to solve?
Let's see. The lazy answer is that I'm already trying, well, not with unlimited resources, but I'm really trying to tackle this prior elicitation question. I think for most of the other parts of the Bayesian workflow we have reasonably good solutions, but this whole question of how to figure out complex multivariate priors over arbitrarily complex models is still open. That's a very practical thing that I am investing in. But if the resources really were infinite, then maybe I would continue with the quick idea we just talked about: really getting probabilistic reasoning into the core of these large-language-model-type AI applications, so that they would reliably give proper probabilistic judgments on the kind of decision-making and reasoning problems we ask of them. That would be interesting.
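For readers curious what tool-assisted prior elicitation looks like today, here is a tiny sketch using PreliZ's maximum-entropy helper; the choice of distribution, bounds, and probability mass are made-up illustration, not values from the episode.

```python
# Illustrative prior elicitation with PreliZ: instead of guessing parameters,
# state a plausible range and let the tool find a matching prior.
import preliz as pz

prior = pz.Gamma()
# Find the maximum-entropy Gamma with ~90% of its mass between 1 and 10
pz.maxent(prior, lower=1, upper=10, mass=0.9)
print(prior)  # inspect the elicited parameters
```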
Yeah. Yeah, for sure.
And second question: if you could have dinner with any great scientific mind, dead, alive, or fictional, who would it be?
Yes, this is something I actually thought about, because I figured you would be asking me as well. And I went with a fictional character, I like fictional characters: Daniel Waterhouse from Neal Stephenson's Baroque Cycle books. They are kind of semi-historical books about the era when Isaac Newton and others were living and establishing the Royal Society, with a lot of high-fantasy components involved. Daniel Waterhouse in those novels is the roommate of Isaac Newton and a friend of Gottfried Leibniz, so he knows both sides of this great debate about who invented calculus and who copied whom. If I had dinner with him, I would get to talk about these innovations, which I think are among the foundational ones, but I wouldn't actually need to get involved with either party. I wouldn't need to choose sides, whether it's Isaac or Gottfried I would be talking to.
Love it. Yeah, love that answer. Make sure to record that dinner and post it on YouTube; I'm pretty sure lots of people would be interested in it. Fantastic. Thanks a lot, Arto, that was a great discussion. I'm really happy we could go through, well, not the whole depth of what you do, because you do so many things, but a good chunk of it. As usual, I'll put resources and a link to your website in the show notes for those who want to dig deeper. Thank you again, Arto, for taking the time and being on this show.

Thank you very much. It was my pleasure. I really enjoyed the discussion.
This has been another episode of Learning Bayesian Statistics. Be sure to rate, review, and follow the show on your favorite podcatcher, and visit learnbayesstats.com for more resources about today's topics, as well as access to more episodes to help you reach a true Bayesian state of mind. That's learnbayesstats.com. Our theme music is Good Bayesian by Baba Brinkman, feat. MC Lars and Mega Ran. Check out his awesome work at bababrinkman.com. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. You can support the show and unlock exclusive benefits by visiting patreon.com/learnbayesstats. Thank you so much for listening and for your support. You're truly a good Bayesian, change your predictions after taking information in. And if you're thinking I'll be less than amazing, let me show you how to be a good Bayesian. Change calculations after taking fresh data in. Those predictions that your brain is making, let's get them on a solid foundation.