Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
Structural Equation Modeling (SEM) is a key framework in causal inference. As I’m diving deeper and deeper into these topics to teach them and, well, finally understand them, I was delighted to host Ed Merkle on the show.
A professor of psychological sciences at the University of Missouri, Ed discusses his work on Bayesian applications to psychometric models and model estimation, particularly in the context of Bayesian SEM. He explains the importance of BSEM in psychometrics and the challenges encountered in its estimation.
Ed also introduces his blavaan package in R, which enhances researchers' capabilities in BSEM and has been instrumental in the dissemination of these methods. Additionally, he explores the role of Bayesian methods in forecasting and crowdsourcing wisdom.
When he’s not thinking about stats and psychology, Ed can be found running, playing the piano, or playing 8-bit video games.
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser and Julio.
Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag ;)
Takeaways:
- Bayesian SEM is a powerful framework in psychometrics that allows for the estimation of complex models involving multiple variables and causal relationships.
- Understanding the principles of Bayesian inference is crucial for effectively applying Bayesian SEM in psychological research.
- Informative priors play a key role in Bayesian modeling, providing valuable information and improving the accuracy of model estimates.
- Challenges in BSEM estimation include specifying appropriate prior distributions, dealing with unidentified parameters, and ensuring convergence of the model.
- Incorporating prior information is crucial in Bayesian modeling, especially when dealing with large models and imperfect data.
- The blavaan package enhances researchers' capabilities in Bayesian structural equation modeling, providing a user-friendly interface and compatibility with existing frequentist models.
- Bayesian methods offer advantages in forecasting and subjective probability by allowing for the characterization of uncertainty and providing a range of predictions.
- Interpreting Bayesian model results requires careful consideration of the entire posterior distribution, rather than focusing solely on point estimates.
- Latent variable models, also known as structural equation models, play a crucial role in psychometrics, allowing for the estimation of unobserved variables and their influence on observed variables.
- The speed of MCMC estimation and the need for a slower, more thoughtful workflow are common challenges in the Bayesian workflow.
- The future of Bayesian psychometrics may involve advancements in parallel computing and GPU-accelerated MCMC algorithms.
Chapters:
00:00 Introduction to the Conversation
02:17 Background and Work on Bayesian SEM
04:12 Topics of Focus: Structural Equation Models
05:16 Introduction to Bayesian Inference
09:30 Importance of Bayesian SEM in Psychometrics
10:28 Overview of Bayesian Structural Equation Modeling (BSEM)
12:22 Relationship between BSEM and Causal Inference
15:41 Advice for Learning BSEM
21:57 Challenges in BSEM Estimation
34:40 The Impact of Model Size and Data Quality
37:07 The Development of the Blavaan Package
42:16 Bayesian Methods in Forecasting and Subjective Probability
46:27 Interpreting Bayesian Model Results
51:13 Latent Variable Models in Psychometrics
56:23 Challenges in the Bayesian Workflow
01:01:13 The Future of Bayesian Psychometrics
Links from the show:
Transcript
This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.
Structural Equation Modeling, or SEM, is a key framework in causal inference. As I'm diving deeper and deeper into these topics to teach them and, well, finally understand them, I was delighted to host Ed Merkle on the show.

A professor of psychological sciences at the University of Missouri, Ed discusses his work on Bayesian applications to psychometric models and model estimation, particularly in the context of Bayesian SEM. He explains the importance of Bayesian SEM in psychometrics and the challenges encountered in its estimation.

Ed also introduces his blavaan package in R, which enhances researchers' capabilities in Bayesian SEM and has been instrumental in the dissemination of these methods. Additionally, he explores the role of Bayesian methods in forecasting and crowdsourcing wisdom, and when he's not thinking about stats and psychology, Ed can be found running, playing the piano, or playing 8-bit video games.

This is Learning Bayesian Statistics.

Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the projects, and the people who make it possible. I'm your host, Alex Andorra. You can follow me on Twitter at alex underscore andorra, like the country. For any info about the show, learnbayesstats.com is the place to be: show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on Patreon, everything is in there. That's learnbayesstats.com. If you're interested in one-on-one mentorship, online courses, or statistical consulting, feel free to reach out and book a call at topmate.io/alex_andorra. See you around, folks, and best Bayesian wishes to you all.
29
:Thank you for having me.
30
:Yeah, you bet.
31
:Thanks a lot for taking the time.
32
:I am really happy to have you on and I
have a lot of questions.
33
:So that is perfect.
34
:Before that, as usual, how would you
define the work you're doing nowadays and
35
:how did you end up working on this?
36
:Well, a lot of my work right now is with
37
:Bayesian applications to psychometric
models and model estimation.
38
:Over time, I've gotten more and more into
the model estimation and computation as
39
:opposed to applications.
40
:And it was a slow process to get here.
41
:I started doing some Bayesian modeling
when I was working on my PhD.
42
:I finished that in 2005 and...
43
:I felt a bit restricted by what I could do
with the tools I had at that time, but
44
:things have improved a lot since then.
45
:And also I've learned a lot since then.
46
:So I have over time left some things and
come back to them.
47
:And when I come back to them, I find
there's more progress that can be made.
48
:Yeah, that makes sense.
49
:And that's always super...
50
:interesting and inspiring to see such
diverse backgrounds on the show.
51
:I'm always happy to see that.
52
:And by the way, thanks a lot to Jorge
Sinval to do the introduction.
53
:Today is February 14th and he was our
matchmaker.
54
:So thanks a lot, Jorge.
55
:And yeah, like this promises to be a great
episode.
56
:So thanks a lot for the suggestion.
57
:And Ed, actually, could you tell us the
topics that you are particularly focusing
58
:on?
59
:Yeah, recently, so in psychology,
psychometrics, education, there's this
60
:class of models, structural equation
models.
61
:It's a pretty large class of models and I
think some special cases have been really
62
:useful.
63
:Others sometimes get a bad reputation
with, I think, certain groups of
64
:statistics people.
65
:But it's this big class and it has
interested me for a long time because so
66
:much can be done with this class of
models.
67
:So the Bayesian estimation part has
especially been interesting to me because
68
:it was relatively underexplored for a long
time.
69
:And there's some unique challenges there
that I have found and I've tried to make
70
:some progress on.
71
:Yeah.
72
:And we're going to dive into these topics
for sure in the coming minutes.
73
:But to still talk about your background,
do you remember how you first got
74
:introduced to Bayesian inference and also
why they sticked with you?
75
:Yes.
76
:I think part of how I got interested in
Bayesian inference,
77
:starts a lot earlier to when I was growing
up.
78
:I'm about the age where the first half of
my childhood, there were no computers.
79
:And the second half of growing up,
computers were in people's houses, the
80
:internet was coming around and so on.
81
:So I grew up with having a computer in my
house for the first time.
82
:And then...
83
:just messing around with it and learning
how to do things on it.
84
:So then later, a while later when I was
working on my PhD, I grew up with the
85
:computing topics and I enjoyed that.
86
:So I felt at the time with Bayesian
estimation, some of the interesting
87
:computing things were coming out around
the time I was working on my PhD.
88
So for example, WinBUGS was a big thing, say around then.
89
:That was when I was starting to work on my
PhD.
90
:And that seemed like a fun little program
where you could build these models and do
91
:some Bayesian estimation.
92
:At the time, I didn't always know exactly
what I was doing, but I still found it
93
:interesting and perhaps a bit more
intuitive than some of the other.
94
:methods that were out there at the time.
95
:Yeah.
96
And actually it seems like you've been part of that movement, which introduced Bayesian stats a lot in the psychological sciences. Can you elaborate on the role of the Bayesian framework in psychological research?
100
:Always a hard word to say when you have a
French accent.
101
:I understand.
102
:So yeah, when I was working on my PhD, I
think there was not a lot of psychology
103
:applications necessarily, or maybe it was
just in certain areas.
104
:So when I started on my PhD, I was doing
like some cognitive psychology modeling
105
:where you would bring.
106
:someone into a room for an experiment and
it could be about memory or something
107
:where you have them remember a list of
words and then you give them a new list of
108
:words and ask them which did you see
before and which are new and then you can
109
:model people's response times or accuracy.
110
:So there were some Bayesian applications
definitely related to like memory modeling
111
:at that time but more generally there were
less applications.
112
:I did my PhD on some Bayesian structural
equation modeling applications to missing
113
:data.
114
:At the time, I had a really hard time
publishing that work.
115
:I think it was partly because I just
wasn't that great at writing papers at the
116
:time, but also there weren't as many
Bayesian applications.
117
:So I think people were less interested.
118
:But over time that has changed, I think
with...
119
:with improved tools and more attention to
Bayesian modeling.
120
:You see it more and more in psychology.
121
:Sometimes it's just an alternative to
frequentness.
122
:Like if you're doing a regression or a
mixed model, Bayesian is just an
123
:alternative.
124
:Other times, like for the structural
equation models, there can be some
125
:advantages to the Bayesian approach,
especially related to characterizing
126
:uncertainty.
127
:And so I think there's more and more
attention in psychology and psychometrics
128
:to some of those issues.
129
:Yeah.
130
:And definitely interesting to see, to hear
that the publishing has, has gotten, has
131
:become easier, at least for you.
132
:And a method you're especially working on
and developing is Bayesian structural
133
:equation modeling or BSEM.
134
:So we've never covered that yet on the
show.
135
:So could you give our listeners a primer
on BSEM and its importance in
136
:psychometrics?
137
:Yes.
138
:So this Bayesian structural equation
modeling framework, or maybe I can start
139
:with just the structural equation modeling
part, that overlaps with lots of other
140
:modeling frameworks.
141
:So item response models and factor
analysis models, these are more on the
142
:measurement side, examining how say some
tests or scales help us to measure a
143
:person's aptitude.
144
:Those could all be viewed as special cases
of structural equation models, but the
145
:heart of structural equation models
involves,
146
:Like a series of regression models all in
in one big model.
147
So if you know, like, the directed acyclic graphs that come from causal
148
:research, especially Judea Pearl, you can
think of structural equation models as a
149
:way to estimate those types of models.
150
:Like these graphs will often have many
variables.
151
:and you have arrows between variables that
reflect some causal relationships.
152
:Well, now structural equation models are
throwing likelihoods on top of that,
153
:typically normal likelihoods.
154
:And that gives us a way to fit these sorts
of models to data.
155
:Whereas directed acyclic graph would
often, you look at that and that helps you
156
:to know what is estimable and what is not
estimable, say.
157
:that now the structural equation model is
a way to fit that sort of thing to data.
158
:But it also overlaps with mixed models.
159
:Like I said, the item response models,
there's some ideas related to principal
160
:components in there.
161
:It overlaps with a lot of things.
162
Yeah, that's really interesting to have that take on structural equation modeling and the relationship to causal inference, in a way. And so as you were saying, it also relates to Judea Pearl's do-calculus and things like that.
166
:So I definitely encourage the listener to
dive deeper on these literature that's
167
:absolutely fascinating.
168
:I really love that.
169
And that's also, from my own perspective, learning about those things recently, I found that it was way easier being already a Bayesian. If you already do Bayesian models from a generative modeling perspective, then intervening on the graph, like in do-calculus, doing an intervention is basically like doing the posterior predictive sampling you were already doing on your Bayesian model. But instead of having already conditioned on some data, you come up with the platonic idea of the data generative model that you have in mind. And then you intervene on the model by setting some values on some of the nodes and then seeing what that gives you, what that intervention gives you on the outcome. And I find that really, really natural to learn already from a Bayesian perspective. I don't know what your experience has been.
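To make that concrete, here is a minimal PyMC sketch, assuming a made-up two-node graph x → y (the variable names and the make_model helper are purely illustrative, not anything from the episode): an intervention amounts to clamping a node to a chosen value and forward-sampling the rest of the generative model.

```python
import pymc as pm

def make_model(x_value=None):
    """Made-up graph x -> y; pass x_value to emulate the intervention do(x = value)."""
    with pm.Model() as model:
        # When x_value is given, the node is clamped instead of being a free random variable
        x = pm.Normal("x", mu=0.0, sigma=1.0) if x_value is None else x_value
        beta = pm.Normal("beta", mu=0.5, sigma=0.5)
        pm.Normal("y", mu=beta * x, sigma=1.0)
    return model

# Forward-sample the intervened model: do(x = 2), then look at the implied outcomes
with make_model(x_value=2.0):
    intervened = pm.sample_prior_predictive(1000)
```

Conditioning the free-x version on data would give the ordinary posterior; the contrast between conditioning and intervening is exactly the distinction being drawn here.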
183
:Oh, yeah, I think the Bayesian perspective
really helps you keep these models at like
184
:the raw data level.
185
:So you're thinking about how do individual
variables cause other variables and what
186
:does that mean about data predictions?
187
If you look at how frequentists often present these models.
188
:We have something like random effects in
these models.
189
:And so from a frequentist perspective, you
wanna get rid of those random effects,
190
:marginalize them out of a model.
191
:And then for these models, we're left with
some structured covariance matrix.
192
:And often the frequentist will start with,
okay, you have an observed covariance
193
:matrix and then our model implies a
covariance matrix.
194
:But I find that so it's...
195
:it's unintuitive to think about compared
to raw data.
196
:You know, like I can see how the data from
one variable can influence another
197
:variable, but now to think about what does
that mean about the prediction for a
198
:covariance that I think makes it less
intuitive and that's really where some of
199
:the Bayesian models have an advantage.
200
:Yeah, yeah, definitely.
201
:And that's why my learning myself on
202
:on this front and also teaching about
these topics has been extremely helpful
203
:for myself because to teach it, you really
have to understand it really well.
204
So that was a great... Or, said differently, you don't understand it until you teach it.
206
:I've thought that I understood things
before, but then when I teach it, I
207
:realized, well, I didn't quite understand
everything.
208
:Yeah, for sure.
209
:Definitely.
210
:And what advice would you give to someone
who is already a Bayesian and want to
211
:learn about these structural equation
modeling, and to someone who is already
212
:doing psychometrics and would like to now
learn about these structural equation
213
:modeling?
214
:What advice would you give to help them
start on this path?
215
:Yeah, I think.
216
:For people who already know Bayesian
models.
217
:I think I would explain structural
equation models as like a combination of
218
:say principal components or factor
analysis and then regression.
219
:And I think you can, there's these
expressions for the structural equation
220
:modeling framework where you have these
big matrices and depending on what goes in
221
:the matrices, you get certain models.
222
:I would almost advise against starting
there because you can have this giant
223
:framework that's expressing matrices, but
it gets very confusing about what goes in
224
:what matrix or what does this mean from a
general perspective.
225
:I would almost advise starting smaller,
say with some factor analysis models, or
226
:you can have these models where there's
one unobserved variable regressed on
227
:another unobserved variable.
228
:I would say like starting with some of
those models and then working your way up.
229
:On the other hand, if someone already
knows the psychometric models and is
230
:moving to Bayesian modeling, I think the
challenge is to think of these models
231
:again as models of data, not as models of
a covariance matrix.
232
:I guess that's related to what we talked
about earlier.
233
:But if you know the frequentist models,
typically the
234
:just how they talk about these models
involves just a covariance matrix or
235
:tricks for marginalizing over the random
effects or the random parameters in the
236
:model.
237
:And I think taking a step back and looking
at what does the model say about the data
238
:before we try to get rid of these random
parameters, I think that is helpful for
239
:thinking through the Bayesian approach.
240
:Okay, yeah.
241
:Yeah, super interesting.
242
:in the then I would also want to ask you
once you once you've done that so once
243
:you're into BSEM why is that useful and
what is its importance in your field of
244
:psychometrics these days?
245
:Yeah, so the Bayesian part, I would say
one use is, I think it slows you down a
246
:bit.
247
:There are certain, say, specifying prior
distributions and really thinking through
248
:the prior distributions.
249
:This is something you don't encounter on
the frequentist side.
250
:It's going to slow you down, but I think
for these models, that ends up being
251
:useful because...
252
:You know, if you simulate data from priors
and really look at what are these priors
253
:saying about the sort of data I can
expect, I find that helps you understand
254
:these models in a way that you don't often
get from the frequentist side.
255
:And then I guess said differently, I think
over say the past 30, 40 years with these
256
:structural equation models, I think often
in the field we've come to expect that I
257
:can specify this giant model and hit a
button and run it.
258
:And then I get some results and report
just a few results from this big model.
259
:I think we've lost something with
understanding what.
260
:exactly as this model is saying about the
data.
261
:And that's a place where the Bayesian
versions of these models can be really
262
:helpful.
263
:I think there was a second part to your
question, but I forgot the second part.
264
Yeah, what is the importance of BSEM these
days in psychometrics?
265
:Yeah, yeah.
266
:I think there's a couple, I think key
advantages.
267
:One, again, we have random parameters that
are sort of like random effects if you
268
:know mixed models.
269
:And with MCMC, we can sample these
parameters and characterize their
270
:uncertainty or allow the uncertainty in
these random parameters to filter through
271
:to other model predictions.
272
:That's something that's very natural to do
from a Bayesian perspective.
273
:potentially not from other perspectives.
274
:So there's a random parameter piece.
275
:Another thing that people talk about a lot
is fitting these models to smaller sample
276
:sizes.
277
:So for some of these structural equation
models, there's a lot happening and you
278
:can get these failures to converge if
you're estimating frequentist versions of
279
:the model.
280
:Bayesian models,
281
:can still work there.
282
:I think you still have to be careful
because of course if you don't have much
283
:data, the priors are going to be more
influential and sensitivity analyses and
284
:things become very important.
285
:So I think it's not just a full solution
to if you don't have much data, but I
286
:think you can make some progress there
with Bayesian models that are maybe more
287
:difficult with frequentist models.
288
:Okay, I see.
289
:And on the other end, what are some of the
biggest challenges you've encountered in
290
:BSM estimation and how does your work
address them?
291
:I've found I encounter problems as I'm
working on my R package or just
292
:unestimating the models.
293
:There's a number of problems that aren't
completely evident when you start.
294
:And one I've worked on recently and I
continue to work on is specifying prior
295
:distributions for these models in a way
that you know exactly what the prior
296
:distributions are.
297
:in a non -software dependent way.
298
:So in some of these models, there's, say
there's a covariance matrix, a free
299
:parameter.
300
:So you're estimating a full covariance
matrix.
301
:Now, in certain cases of these models, I'm
going to fix some off diagonal elements of
302
:this covariance matrix to zero.
303
:but then I want to freely estimate the
rest of this covariance matrix.
304
:That becomes very difficult when you're
specifying prior distributions now because
305
:we have to keep this full covariance
matrix positive definite.
306
:And I have prior distributions for like an
unrestricted covariance matrix.
307
You could do a Wishart or an LKJ, say.
308
:But to have this covariance matrix where
some of the entries are, say, fixed to
309
:zero,
310
:but I still have to keep this full
covariance matrix positive definite.
311
:The prior distributions become very
challenging there.
312
:And there's some workarounds that are, I
would say, allow you to estimate the
313
:model, but make it difficult to describe
exactly what prior distribution did you
314
:use here.
315
:That's a piece that continues to challenge
me.
316
:Yeah, and so what are you?
317
:What I'm working on these days to try and
address that.
318
:Um
319
:I've been, I've looked at some ways to
decompose a covariance matrix.
320
So let's say the Cholesky factors or
things, and we have put prior
321
:distributions on some decomposition of
this covariance matrix so that it's easy
322
:to put, say, some normal priors on the
elements of the decomposition while
323
:maintaining this positive definite full
covariance matrix.
324
:And,
325
:I think I made some progress there, but
then you get into this situation where I
326
:want to put my prior distributions on
intuitive things.
327
If I get to like some Cholesky factor that
might have some intuitive interpretation,
328
:but sometimes maybe not.
329
:And you run into this problem then of,
okay, if I want to put a prior
330
:distribution on this.
331
:could I meaningfully do that or could a
user meaningfully do that versus they
332
:would just use some default because they
don't know what else they would put on
333
:that.
334
:That becomes a bit of a problem too.
335
:Yeah, yeah.
336
That's definitely also something I have to handle when I am teaching these kinds of decompositions.
338
:Like usually the way I...
339
:teach that is when you do that in a linear
regression, for instance, and you would
340
:try and infer not only the intercept and
the slope, but the correlation of
341
:intercept and slope.
342
:And so that way, if the intercept, like if
you have a negative covariance matrix, for
343
:instance, that's inferred between the
intercept and the slope.
344
:That means, well, if you observe a group
and if you do that in a hierarchical
345
:model, particularly, that's very useful.
346
:Because that means, well, if I'm in a
group of the hierarchical model where the
347
:intercepts are high, that probably means
that the slopes are low.
348
:So, because we have that negative
covariation.
349
:And that's interesting because that allows
the model to squeeze even more information
350
:from the data and so make even more
informed and accurate predictions.
351
:But of course, to do that, the challenge,
352
:is that you have to infer a covariance
matrix between the intercept and the
353
:slope.
354
:How do you infer that covariance matrix
that usually tends to be hard and
355
:computationally intensive?
356
:And so that's where the decomposition of
the covariance matrix enters the round.
357
So especially the Cholesky decomposition of the covariance matrix, that's what we usually recommend doing in PyMC. And we have that pm.LKJCholeskyCov distribution. And to parameterize that, you have to give a prior on the correlation matrix, which is a bit weird.
362
:But when you think about it, when people
think about it, it's like, wait, prior as
363
:a distribution, understand a prior as a
distribution on a correlation matrix is
364
:hard to understand.
365
:But actually, when you decompose, it's not
that hard.
366
:because it's mainly, well, what's the
parameter that's inside a correlation
367
:matrix?
368
:That's parameter that says there is a
correlation between A and B.
369
:And so what is your a priori belief of
that correlation between the intercept and
370
:the slope?
371
:And so usually you don't want the
completely flat prior, which states any
372
:correlation is possible with the same
degree of belief.
373
:So that means I really think that there is
as much possibility of that
374
:of slopes and intercept to be completely
positively correlated as they have a
375
:possibility to be not at all correlated.
376
:I'm not sure.
377
So if you think that, then you need to use regularizing, weakly informative priors, as you do for any other parameters.
379
:So you could think of coming up with a
prior that's a bit more bell -shaped prior
380
:in a way that gives more mass to the low.
381
:Yeah.
382
:to smaller correlations.
383
:And then that's how usually you would do
that in PMC.
384
:And that's what you're basically talking
about.
385
:Of course, that's more complicated and it
makes your model more complex.
386
:But once you have ran that model and have
that inference, that can be extremely
387
:useful and powerful for posterior
analysis.
388
So it's a trade-off.
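For readers who want to see that setup in code, here is a minimal PyMC sketch with made-up data and names (n_groups, group_idx, x, y are placeholders): a hierarchical regression where group intercepts and slopes get a joint prior through pm.LKJCholeskyCov, with an eta above 1 putting more prior mass on smaller correlations than a flat prior would.

```python
import numpy as np
import pymc as pm

# Placeholder data: which group each observation belongs to, a predictor, an outcome
n_groups = 8
group_idx = np.random.randint(0, n_groups, size=200)
x = np.random.normal(size=200)
y = np.random.normal(size=200)

with pm.Model() as varying_slopes:
    # Cholesky-factor prior on the 2x2 covariance of (intercept, slope);
    # eta > 1 puts more mass on small correlations than a flat LKJ(eta=1) prior.
    chol, corr, sds = pm.LKJCholeskyCov(
        "chol", n=2, eta=2.0, sd_dist=pm.Exponential.dist(1.0), compute_corr=True
    )
    z = pm.Normal("z", 0.0, 1.0, shape=(n_groups, 2))
    offsets = pm.Deterministic("offsets", z @ chol.T)   # correlated group-level offsets

    mu_a = pm.Normal("mu_a", 0.0, 1.0)                  # population-level intercept
    mu_b = pm.Normal("mu_b", 0.0, 1.0)                  # population-level slope

    a = mu_a + offsets[group_idx, 0]
    b = mu_b + offsets[group_idx, 1]
    sigma = pm.Exponential("sigma", 1.0)
    pm.Normal("obs", mu=a + b * x, sigma=sigma, observed=y)
```

The correlation recovered in corr is what lets, say, a high group intercept inform the model that the same group's slope is probably lower, which is exactly the extra information being squeezed out of the data here.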
389
:Yeah, yeah, definitely.
390
:But that reminds me of...
391
:I would say like in psychology, in
psychometrics, there's still a lot of
392
:hesitance to use informative priors.
393
:There's still the idea of I want to do
something objective.
394
:And so I want my priors to be all flat,
which especially like you say for a
395
:correlation or even for other parameters,
I'm against that.
396
:Now I would like to put some...
397
:information in my priors always, but that
is always a challenge because like for the
398
:models I work with, users are accustomed,
like I said, to specifying this big model
399
:and pressing a button and it runs and it
estimates.
400
:But now you do that in a Bayesian context
with these uninformative priors.
401
:Sometimes you just run into problems and
you have to think more about the priors
402
:and add some information.
403
:Yeah.
404
:Which is, if you ask me, a blessing in
disguise, right?
405
:Because just because a model seems to run
doesn't mean it is giving you sensible
406
:results and unbiased results.
407
:I actually love the fact that usually HMC
is really unforgiving of really bad
408
:priors.
409
:So of course, it's usually something we
tend to teach is, try to use priors that
410
:make sense, right?
411
:A priori.
412
:Most of the time you have more information
than you think.
413
:And if you're thinking from a betting
perspective, like let's say that any
414
:decision you make with your model is
actually something that's going to cost
415
:you money or give you money.
416
:If you were to bet on that prior, why
wouldn't you use any information that you
417
:have at your disposal?
418
:Why would you throw away information if
you knew that actually you had information
419
:that would help you make a more
informed...
420
:bet and so bet that gives you actually
more money instead of losing money.
421
:And so I find that this way of framing the
priors can actually like usually works on
422
:beginners because that helps them see the
like the idea.
423
:It's like the idea is not to fudge your
analysis, even though I can show you how
424
:to fudge your analysis, but in both ways.
425
:I can use priors which are going to bias
the model, but I can also use priors that
426
:are going to completely
427
:unbiased the model, but just make it so
variable that it's just going to answer
428
:very aggressively to any data point.
429
:And do you really want that?
430
:I'm not sure.
431
:Do you really want to make very hard
claims based on very small data?
432
:I'm not sure.
433
:So again, if you come back to this idea
of, imagine that you're betting.
434
:Wouldn't you use all the information you
have at your disposal?
435
:That's all.
436
:That's everything you're doing.
437
:That doesn't mean that information is
golden.
438
:That doesn't mean you have to be extremely
certain about the information you're
439
:putting in.
440
:That just means let's try to put some more
structure because that doesn't make any
441
:sense if you're modeling football players.
442
:That doesn't make any sense to allow them
to be able to score 20 goals in a game.
443
:It doesn't ever happen.
444
Why would you let the model allow for that possibility?
446
:You don't want that.
447
:It's going to make your model harder to
estimate, longer, it's going to take
448
:longer to estimate also.
449
:And so that's just less efficient.
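As a quick illustration of that point (the numbers and the Poisson setup are invented for this sketch, not from the episode), you can compare what a flat-ish prior and a weakly informative prior imply about goals per player per game before seeing any data:

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws = 10_000

# Goals per player per game modeled as Poisson(exp(intercept)); prior predictive only
flat_intercept = rng.normal(0.0, 5.0, size=n_draws)   # "let the data speak" flat-ish prior
weak_intercept = rng.normal(-1.0, 1.0, size=n_draws)  # weakly informative prior

flat_goals = rng.poisson(np.exp(flat_intercept))
weak_goals = rng.poisson(np.exp(weak_intercept))

print("P(a player scores 20+ goals in a game), flat prior:", (flat_goals >= 20).mean())
print("P(a player scores 20+ goals in a game), weak prior:", (weak_goals >= 20).mean())
```

With the wide prior, a non-trivial share of prior games end with a single player scoring 20-plus goals; the weakly informative prior keeps essentially all of that mass on realistic counts.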
450
:Yeah.
451
:You mentioned too of HMC being
unforgiving.
452
:And yeah, a lot of the software that I've
been working on, the model is run and
453
:stand.
454
:And from time to time, well, for some of
these structural equation models, there's
455
:some...
456
Like, weakly identified parameters, or
maybe even unidentified parameters, but I
457
:run into these situations where.
458
:Somebody runs a Gibbs sampler and they
say, look, it just worked and it converged
459
and now I move this model over to Stan and I'm getting these bimodal posteriors or such and such. It's sort of like a bit of an education of saying, well, the problem isn't Stan. The problem was the model all along, but the Gibbs sampler just didn't tell you that there was a problem.
464
:Yeah, exactly.
465
:Exactly.
466
:Yeah.
467
:Yeah.
468
:That's like, that's a joke.
469
:I have actually a sticker like that, which
is a, which is a meme of, you know, that
470
:meme of that, that, that guy from a, I
think it's from the notebook, right?
471
:Who, who is crying and yeah, basically the
sticker I have is when someone tells me
472
:that the model he has divergences in HMC.
473
:So they are switching to the Metropolis
sampler and.
474
:I just dance like, yeah, sure.
475
:You're not going to have divergences with
the metropolis sampler.
476
Doesn't mean the model is converging as
you want.
477
:And yeah, so that's really that thing
where, yeah, actually, you had problems
478
:with the model already.
479
:It's just that you were using a crude
instrument that wasn't able to give you
480
:these diagnostics.
481
:It's like doing an MRI with a stethoscope.
482
:Yeah.
483
:Yeah, that's not going to work.
484
:It's going to look like you don't have any
problems, but maybe you do.
485
:It's just like you're not using the right
tool.
486
:So yeah.
487
:And also this idea of, well, let's use
flat priors and just let the data speak.
488
:That can work from time to time.
489
:And that's definitely going to be the case
anyways, if you have a lot of data.
490
Even if you're using weakly regularizing
priors, that's exactly the goal.
491
:It's just to give you enough structure to
the model in case the data are not
492
:informative for some parameters.
493
:The bigger the model, the more parameters,
well, the less informed the parameters are
494
:going to be if your data stay what they
are, keep being what they are, right?
495
:If you don't have more.
496
:And also that assumes that the data are
perfect, that there's no bias, that the
497
:data are completely trustworthy.
498
:Do you actually believe that?
499
:If you don't, well, then...
500
:You already know something about your
data, right?
501
:That's your prior right here.
502
:If you think that there is sampling bias
and you kind of know why, well, that's a
503
:prior information.
504
:So why wouldn't you tell that in the
model?
505
:Again, from that betting perspective,
you're just making your model's life
506
:harder and your inference is potentially
wrong.
507
:I'm guessing that's not what you want as
the modeler.
508
:Yeah, you can trust the data blindly.
509
:Should you though?
510
:That's a question you have to answer each
time you're doing a model.
511
:Yep.
512
More often than not, you cannot.
513
:Yeah, yeah.
514
:Yeah, the HMC failing thing, I think
that's a place where you can really see
515
:the progress that's been made in Bayesian
estimation.
516
:Just like say in the 20 some years that
I've been doing it, I can think back to
517
:starting out with wind bugs.
518
:You're just happy to get the thing to run.
519
:and to give you some decent convergence
diagnostics.
520
:I think a lot of the things we did around
the start of WinBUGS, if you try to run
521
:them in Stan now, you find there were a
lot of problems that were just hidden or
522
:you're kind of overlooked.
523
:Yeah, yeah, yeah, for sure.
524
:And definitely that I think we've hammered
that point in the community quite a lot.
525
:in the last few years.
526
:And so definitely those points that I've
been making in the last few minutes are
527
:clearly starting to percolate.
528
:And I think the situation is way better
than it was a few years ago, just to be
529
:clear and not come across as complaining
statisticians.
530
:Because I'm already French.
531
So people already assume that I'm going to complain.
532
:So if on top of that, I complain about
stats, I'm done.
533
:People are not going to listen to the
podcast anymore.
534
:I think you'll be all right.
535
:So to continue, I'd like to talk about
your blavaan package and what inspired the
536
:development of this package and how does
it enhance the capabilities of researchers
537
:in doing BSEM?
538
:Yeah, I think I said earlier my...
539
:PhD was about some Bayesian factor
analysis models and looking at some
540
:missing data issues.
541
:I would say it wasn't the greatest PhD
thesis, but it was finished.
542
:And at the time, I thought it would be
nice to have some software that would give
543
:you some somewhat simple way to specify a
model.
544
:And then it could be translated to
545
:like at the time wind bugs so that you
could have some easier MCMC estimation.
546
But at that time, like, R wasn't quite as developed and my skills weren't quite there to be able to do that all on my own. So I left it for a few years, and then some R packages for frequentist structural equation models were becoming better developed and more supported.
551
:So a few years later, I met the developer
of the lavaan package, which does frequentist
552
:structural equation models and did some
work with him.
553
:And from there I thought, well,
554
:he's done some of the hard work already
just with model specification and setting
555
:up the model likelihood.
556
:So I built this package on top of what was
already there to do like the Bayesian
557
:version of that model estimation.
558
:And then it has just gone from there.
559
:I think I continue to learn more things
about these models or encounter tricky
560
:issues that I wasn't quite aware of when I
started.
561
:And I just have...
562
:continue it on.
563
:Yeah.
564
:Well, that sounds like a fun project for
sure.
565
:And how would people use it right now?
566
:When would you recommend using your
package for which type of problems?
567
:Well, the idea from the start was
always...
568
make the model specification and everything very similar to the lavaan package for frequentist models, because that package was already fairly popular among
570
:people that use these models.
571
:And the idea was, well, they could move to
doing a Bayesian version without having to
572
:learn a brand new model specification.
573
:They could already do something similar to
what they had been doing on the frequentist
574
:side.
575
:So that's like,
576
:from the start where we, the idea that we
had or what we wanted to do with a package
577
:and then who would use it?
578
:I think it could be for some of these
measurement problems, like I said, with
579
:item response modelers or things if they
wanted to do a Bayesian version of some of
580
these models that's currently possible in blavaan. And another place is
581
:With something kind of similar to the
DAGs, the directed acyclic graphs we talk
582
:about, especially in the social sciences,
people have these theories about they have
583
:a collection of variables and what
variables cause what other variables and
584
:they want to estimate some regression type
relationships between these things.
585
:You would see it often like an
observational data where you can't really
586
:do these.
587
:these manipulations the way you could in
an experiment.
588
:But the idea is that you could specify a
graph like that and use blavaan to try to
589
:estimate these regression -like
relationships that if the graph is
590
:correct, you might interpret it as causal
relationships.
591
:Yeah, fascinating, fascinating.
592
:I love that.
593
:And I'll put the package, of course, in
the show notes.
594
:And I encourage people to take a look at
the website.
595
:There are some tutorials and packages of
the, sorry, some tutorials on how to use
596
:the package on there.
597
:So yeah, definitely take a look at the
resources that are on the website.
598
:And of course, everything is on the show
notes.
599
:Another topic I thought was very
interesting from your background is that
600
:your research also touches on forecasting
and subjective probability.
601
:Can you discuss how Bayesian methods
improve these processes, particularly in
602
:crowdsourcing wisdom, which is something
you've worked on quite a lot?
603
:Yeah, I started working on that.
604
:It was probably 2009 or 2010.
605
:So at that time, I think...
606
:Tools like Mechanical Turk were becoming
more usable and so people were looking at
607
this wisdom of crowds, saying, can we
recruit a large group of people from the
608
:internet?
609
:And if we average their predictions, do
those make for good predictions?
610
:I got involved in some of that work,
especially through some forecasting
611
:tournaments that were being run by
612
:the US government or some branches of the
US government at the time.
613
:I think Bayesian tools there first made
some model estimations easier just the way
614
:they sometimes do in general.
615
:But also with forecasting, it's all about
uncertainty.
616
:You might say, here's what I think will
happen.
617
:But then you also want to have some
characterization of.
618
:your certainty or uncertainty that
something happens.
619
:I think that's where the Bayesian approach
was really helpful.
620
:Of course, you always have this trade -off
with you are giving a forecast often to
621
:like a decision maker or an executive or
someone that is a leader.
622
:Those people sometimes want the simplest
forecast possible and it's sometimes
623
:difficult to convince them that,
624
:Well, you also want to look at the
uncertainty around this forecast as
625
:opposed to just a point estimate.
626
:Yeah.
627
:But that's some of the ways we were using
Bayesian methods, at least to try to
628
:characterize uncertainty.
629
:Yeah.
630
:Yeah.
631
:I'm becoming more and more authoritative
on these fronts, you know, just not even
632
:giving the point estimates anymore and by
default giving a range for the
633
:predictions.
634
:and then people have to ask you for the
point estimates.
635
:Then I can make the point of, do you
really want that?
636
:Why do you want that one?
637
:And why do you want the mean more than the
tail?
638
:Maybe in your case, actually, the tail
scenarios are more interesting.
639
:So keep that in mind.
640
:So yeah, people have to opt in to get the
point estimates.
641
:And well, the human brain being what it
is, usually it's happy with the default.
642
:And so...
643
:Making the default better is something I'm
trying to actually actively do.
644
:That's a good point.
645
:So what for reporting modeling results,
you avoid posterior means.
646
:All you give them is like a posterior
interval or something.
647
:A range.
648
:Yeah.
649
:Yeah.
650
:Yeah, exactly.
651
:Not putting particular emphasis on the
mean.
652
:Because otherwise what's going to end up
happening, and that's extremely
653
:frustrating to me, is...
654
:I mentioned that you're comparing two
options.
655
:And so you have the posterior on option A,
the posterior on option B.
656
:You're looking at the first plot of A and
B.
657
:They seem to overlap.
658
:So then you compute the difference of the
posteriors.
659
:So B minus A.
660
:And you're seeing where it spans on the
real line.
661
:And if option A and B are close enough,
662
:the HDI, so the highest density interval,
is going to overlap with zero.
663
:And it seems like zero is a magic number
that makes the whole HDI collapse on one
664
:point.
665
:So basically, the zero is a black hole
which just sucks everything onto itself,
666
:and then the whole range is zero.
667
:And then people are just going to say, oh,
but that's weird because, no, I think
668
:there is some difference between A and B.
669
:And then you have to say, but that's not
what the model is saying.
670
:You're just looking at zero and you see
that the HDI overlaps zero at some point.
671
But actually the model is saying that, I don't know, there is an 86% chance that option B is actually better than option A. So, you know, there is a five in six chance, which is absolutely non-negligible, that B is indeed better than A, but we can't actually rule out the possibility that A is better than B. That's what the model is saying. It's not telling you that there is no difference. And it's not telling you that B is definitely better than A. And that is still a nut I'm trying to crack.
682
:But yeah, here you cannot make the zero
disappear, right?
683
:But the only thing you can do is make sure
that people don't interpret the zero as a
684
:black hole.
685
:That's the main thing.
686
:Yeah, yeah.
687
:Yeah, yeah, that's a good point.
688
:I can see that being challenging for
people that come from frequentist models
689
:because what they're accustomed to, the
maximum likelihood estimate.
690
:And it's all about those point estimates.
691
:But I like the idea of not even supplying
those point estimates.
692
:Yeah.
693
:Yeah, yeah.
694
:I mean, and that makes sense in the way
that's just a distraction.
695
:It doesn't mean anything in particular.
696
:That's mainly a distraction.
697
:What's more important here is the range.
698
:of the estimates.
699
:So, you know, like give the range and give
the point estimates if people ask for it.
700
:But otherwise, that's more distraction
than anything else.
701
:And I think I got that idea from listening
to a talk by Richard McElreath, who was
702
:talking about something he called table
two fallacy.
703
:Yeah, I know that.
704
Where usually they present the table of estimates in Table 2.
705
:And usually people tend to, his point with
that, people tend to interpret the
706
:coefficient on a linear regression, for
instance, as all of them as causal, but
707
:they are not.
708
:The only parameter that's really causally
interpretable is the one that relates the
709
:treatment to the outcome.
710
:The other one, for instance, from a
mediator to the outcome, or...
711
:the one from a confounder to the outcome,
you cannot interpret that parameter as
712
:causal.
713
:Or you have to do the causal graph
analysis and then see if the linear
714
:regression you ran actually corresponds to
the one you would have to run in this new
715
:causal DAG to identify or the direct or
the total causal effect of that new
716
:variable that you're taking as the
treatment.
717
:basically you're changing the treatment
here.
718
:So you have to change the model
potentially.
719
:And so you cannot interpret and should
absolutely not interpret the parameters
720
:that are not the one from the treatment to
the outcome as causally interpretable.
721
:And so to avoid that fallacy, he was
suggesting two options or you actually
722
:provide the interpretation of that
parameter in the current DAG that you
723
:have.
724
:And say, if it's not causally
interpretable in that case, which DAG you
725
would have, which regression, sorry, which model you would have to use, which is different from the one you actually have run, to actually be able to interpret that
727
:coefficient causally.
728
:Or you just don't report these parameters,
these coefficients, because they are not
729
:the point of the analysis.
730
:The point of the analysis is to relate the
treatment to the outcome and see what the
731
:effect of the treatment is on the outcome.
732
not what the effect of a confounder
on the outcome is.
733
:So why would you report that in the first
place?
734
:You can report it if people ask for it,
but you don't, you should not report it by
735
:default.
736
:Yeah, yeah.
737
:There's some good like tie -ins to
structural equation models there too,
738
:because I think like in some of those,
some of McElroy's examples, he dabbles a
739
:little bit in structural equation model
and to, it's kind of like a one possible
740
:solution here to,
741
:to really saying what could we interpret
causally or not in the presence of
742
:confounding variables or like there's the
colliders that also cause problems if you
743
:include them in a regression.
744
:Yeah, he does a little bit.
745
:I've seen some of his examples like what
structural equation model source of
746
:things.
747
:I think there's something interesting
there about informing what predictors
748
:should go in a regression or.
749
:what could we interpret causally out of a
particular model?
750
:Yeah, exactly.
751
:And I have actually linked to the table 2
fallacy thing I was talking about, his
752
:video of that.
753
:So this will be in the show notes for
people who want to dig deeper.
754
:Yes.
755
:And, yeah, so we're in this discussion.
756
:I really love to talk about these topics,
as you can see, and I've really deeply
757
:enjoyed diving deeper into them.
758
:And still, I'm diving deeper into these
topics this year.
759
:That's one of my objectives, so that's
really fun.
760
:Yeah.
761
Maybe let's talk about latent variable models, because you also work on that. And if I understood correctly, they are quite crucial in psychology. So how do you approach these models, especially in the context of Bayesian stats? And maybe explain, also give us a primer on what latent variable models are.
766
:Yeah, I would.
767
:So sometimes I almost use them as like
just another term for structural equation
768
:model.
769
:They're very related.
770
:I would say.
771
:I would say if I'm around psychology or
psychometrics people, I would use the term
772
:structural equation model.
773
:But if I'm around statistics people, I
might more often use the term latent
774
:variable model because I think that term
latent variable, or maybe sometimes people
775
:might say a hidden variable or something
that's unobserved.
776
:But it's like in...
777
:in structural equation modeling, that is
sort of just like a random effect or a
778
:random parameter that we assume has some
influence on other observed variables.
779
:And that you can never observe it.
780
:That's right.
781
:And so the traditional example is...
782
:maybe something related to intelligence or
say like a person's math aptitude,
783
:something you would use a standardized
test for.
784
:You can't directly observe it.
785
:You can ask many questions that get at a
person's math aptitude.
786
:And we could assume, yes, there's this
latent aptitude that each person has that
787
:we are trying to measure with all of our
questions on a standardized test.
788
:That sort of gets at the idea of latent
variable.
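A tiny PyMC sketch of that idea, with invented dimensions and names (n_students, n_items, responses are placeholders, not data from the episode): each student's unobserved aptitude is a latent variable that loads on every observed test item.

```python
import numpy as np
import pymc as pm

# Placeholder data: standardized scores of n_students on n_items test questions
n_students, n_items = 300, 10
responses = np.random.normal(size=(n_students, n_items))

with pm.Model() as one_factor_model:
    aptitude = pm.Normal("aptitude", 0.0, 1.0, shape=n_students)   # latent variable per student
    loadings = pm.LogNormal("loadings", 0.0, 0.5, shape=n_items)   # kept positive for identification
    intercepts = pm.Normal("intercepts", 0.0, 1.0, shape=n_items)
    noise = pm.Exponential("noise", 1.0, shape=n_items)

    mu = intercepts + aptitude[:, None] * loadings                  # (students x items) predictions
    pm.Normal("obs", mu=mu, sigma=noise, observed=responses)
```

Constraining the loadings' sign, or fixing one loading to 1, is a standard way to identify such a model; on the R side, packages like blavaan handle those conventions for you.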
789
:Yeah.
790
:Yeah.
791
:And like, or another example would be the
latent popularity of political parties.
792
:Like, you never really observed them.
793
:Actually, you just have an idea with
polls.
794
:You had a better idea with elections, but
even elections are not a perfect image of
795
:that because nobody, like, not everybody
goes and vote.
796
:So that's thank you again.
797
:actually never observe the actual
popularity of political parties in the
798
:total population because, well, even
elections don't make a perfect job of
799
:that.
800
:Yeah, yeah, yeah.
801
:Yeah, and then people will get into a lot
of deep philosophy conversations about
802
:does this latent variable even exist and
how could one characterize that?
803
:And
804
:Personally, I don't often get into those
deep philosophy conversations.
805
:I just more think of this as a model than
within this model.
806
:It could be a random parameter.
807
:And I guess maybe it's just my personal
bias.
808
:I don't think about it too abstractly.
809
:I just think about how does this latent
variable function in a model and how can I
810
:fit this model to data?
811
:Yeah, I see.
812
And so in these cases, how have you found that using a Bayesian framework has been
813
:helpful?
814
:Yeah, I think related to it, I was
discussing before about these latent
815
:variables are often like random effects.
816
:And so from a Bayesian point of view, you
can sample those parameters and look at
817
:how their uncertainty filters through to
other parts of your model.
818
:That's all.
819
:very straightforward from a Bayesian point
of view.
820
:I think those are some of the big
advantages.
821
:OK, I see.
822
:I see.
823
:Yeah.
824
:If we de -zoom a bit, I'm actually
curious, what would you say is the biggest
825
:hurdle in the Bayesian workflow currently?
826
:Um
827
:There's always challenges with how long
does it take MCMC to run, especially for
828
:people coming from frequentist models or
things where, for some frequentist models,
829
:especially with these structural equation
or latent variable models, you can get
830
:some maximum likelihood estimates in a
couple of seconds.
831
:And there's cases with MCMC, it might take
much longer depending on how the model was
832
:set up or how tailored.
833
:your estimation strategy is to a
particular model.
834
:So I think speed is always an issue.
835
:And that I think could maybe detract some
people from doing Bayesian modeling
836
:sometimes.
837
:I would say maybe the other barrier to the
workflow is just getting people to slow
838
:down and just be happy with slowing down
with working through their model.
839
:I think especially in the social sciences
where I work, people become too accustomed
840
:to specifying their model, pressing a
button, getting the results immediately
841
:and writing it and being done.
842
:And I think that's not how good Bayesian
modeling happens.
843
:Good Bayesian modeling, you sit back a
little bit and think through everything.
844
:And...
845
:I think is a challenge convincing people
sometimes to make that a habitual part of
846
:the workflow.
847
:Yeah.
848
:Bayesian models need love.
849
:You need to give it love for sure.
850
:I personally have been working lately on
an academic project like that where we're
851
:writing a paper on, basically it's a trade
paper on biology, marine biology trade.
852
:And the model is extremely complex.
853
:And that's why I'm on this project is to
work with the academics working on it who
854
:are extremely knowledgeable, of course,
but on their domain.
855
:And me, I don't understand anything about
the biology part, but I'm just here to try
856
:and make the model work.
857
And the model is tremendously complicated
because the phenomenon they are studying
858
:is extremely complex.
859
:So.
860
:Yeah, but like here, the amazing thing is
that the person leading the project, Aaron
861
MacNeil, has a huge appetite for that kind
of work, right?
862
:And really love doing the Bayesian model,
coding it, and then improving it together.
863
:But definitely that's a big endeavor,
takes a lot of time.
864
:But then the model is extremely powerful
afterwards and you can get a lot of
865
:inferences that you cannot have with a
classic trivial model.
866
:So, you know, there is no free lunch,
right?
867
:If your model is trivial, your inferences
probably will be, unless you're extremely
868
:lucky and you're just working on something
that nobody has worked on before.
869
:So then it's like, just a forest
completely new.
870
:But otherwise, if you want interesting
inferences, you have to have an
871
:interesting model.
872
:And that takes time, takes dedication, but
for sure it's extremely...
873
:interesting and then after once it gives
you a lot of power.
874
:So, you know, it's a bit of a...
875
:That's also a bit frustrating to me in the
sense that the model is actually not going
876
:to be really part of the paper, right?
877
:People just care about the results of the
model.
878
:But me, it's like, and I mean, it makes
sense, right?
879
:It's like when you buy a car, yeah, the
engine is important, but you care about
880
:the whole car, right?
881
:But I'm guessing that the person who built
the engine is like, yeah, but without the
882
:engine, it's not even a car.
883
:So why don't you give credit to the
engine?
884
:But that makes sense.
885
:But it was really fun for me to see
because for me, the model is really the
886
:thing.
887
:But it's actually almost not even going to
be a part of the paper.
888
:It's going to be an annex or something
like that.
889
:Yeah.
890
:That's really weird.
891
:Put it in the appendix.
892
:Yeah.
893
:Yeah.
894
:So I've already taken a lot of your time,
Ed.
895
:So let's head up for the last two
questions.
896
:Before that, though, I'm curious, looking
forward, what exciting developments do you
897
:foresee in patient psychometrics?
898
:Uh, the one that I see coming is related
to the speed issue again.
899
:So, um, I, what there's, there's more and
more MCMC stuff with GPUs.
900
And I was at a Stan meeting last year
where they're talking about, um, you know,
901
:imagine being able to run hundreds of
parallel chains that all like share a burn
902
:in so that, you know,
903
:one chain isn't going to go off and do
something really crazy.
904
:I think all of that is really interesting.
905
:And I think that could really improve some
of these bigger psychometric models that
906
:can take a while to run if we could do
lots of parallel chains and be pretty sure
907
:that they're gonna converge.
908
:I think is something coming that will be
very useful.
909
:Yeah, that definitely sounds like an
awesome project.
910
:So before letting you go, Ed, I'm going to
ask you the last two questions I ask every
911
:guest at the end of the show.
912
:First one, if you had unlimited time and
resources, which problem would you try to
913
:solve?
914
:Yes.
915
:So I guess people should say, you know,
world hunger or world peace or something,
916
:but I think I would probably go for
something that's closer to what I do.
917
:And one thing that comes to mind involves
maybe improving math education or making
918
:it more accessible to more people.
919
:I think at least in the US, like for
younger kids growing up with math, it
920
:feels a little bit like sports where if
you are fortunate to have gotten into it
921
:really early, then you like have this
advantage and you do well.
922
:But if you come into math late, say maybe
as a teenager, I think what happens
923
:sometimes is,
924
:You see other people that are way ahead of
you, like solving problems you have no
925
:idea how to do.
926
:And then you get maybe not so enthusiastic
and you just leave and do something else
927
:with your life.
928
:I think more could be done just to try to
get more interested people like staying in
929
:math related fields and doing more work
there.
930
:I think.
931
:with unlimited resources, that's the sort
of thing that I would try to do.
932
:Yeah, I love that.
933
:And definitely I can, yeah, I can
understand why you would say that.
934
:That's a very good point.
935
:As I was to say, I was late coming around
to math myself.
936
:I think I don't know what happens in every
country, but in the US, it feels like...
937
:You're just expected to think that math is
this tough thing that's not for you.
938
:And unless you have like influences in
your life that would convince you
939
:otherwise, I think a lot of kids just
don't even make an attempt to do something
940
:with math.
941
:Yeah, yeah, that's a good point.
942
:And second question, if you could have
dinner with any great scientific mind,
943
:dead, alive, or fictional, who would it
be?
944
:Yeah, this is one that is easy to
overthink or to really make a big thing
945
:about.
946
:But so here's one thing that I think
about.
947
:There's, I think it's called Stigler's law
about it's related to this idea that the
948
:person who is known for like a major
finding or scientific result often isn't
949
:the one that did the hard work.
950
:Maybe they were the ones that that were
like promoted themselves the most or or
951
:otherwise just got their name attached and
so If I'm having dinner, I want it to be
952
:more of a low -key dinner.
953
:So I don't necessarily want to go for the
most famous person that is the most known
954
:for something because I worry that they
would just like promote themselves the
955
:whole time or you would feel like you're
talking to a robot because they're
956
:They're like, they see themselves as kind
of above everyone.
957
:So with that in mind, and keeping it on
the Bayesian viewpoint, one person that
958
:comes to mind is Arianna Rosenbluth, who
was one of the, I think was the first to
959
:like program a Metropolis Hastings
algorithm and did it in the context of the
960
:Manhattan project during World War II.
961
:So I think she would be an interesting
person to have dinner with.
962
:She clearly did some important work.
963
:Didn't quite get the recognition that some
others did, but also I think she didn't
964
:have a traditional academic career.
965
:So that means that dinner, you know, you
could talk about some work things, but
966
:also I think she would be interesting to
talk to just, you know, just about other
967
:non -work things.
968
:That's the kind of dinner that I would
like to have.
969
:So that's my answer.
970
:Love it.
971
:Love it, Ed.
972
:Fantastic answer.
973
:And definitely invite me to that dinner.
974
:That would be fascinating.
975
:Fantastic.
976
:Thanks a lot, Ed.
977
:We can call it a show.
978
:That was great.
979
:I learned a lot.
980
:And as usual, I will put a link to your
website and your socials and tutorials.
981
:in the show notes for those who want to
dig deeper.
982
:Thank you again.
983
:All right.
984
:Thanks for taking the time and being on
the show.
985
:Thanks for having me.
986
:It was fun.
This has been another episode of Learning Bayesian Statistics. Be sure to rate, review, and follow the show on your favorite podcatcher, and visit learnbayesstats.com for more resources about today's topics, as well as access to more episodes to help you reach true Bayesian state of mind. That's learnbayesstats.com. Our theme music is Good Bayesian by Baba Brinkman, feat. MC Lars and Mega Ran. Check out his awesome work at bababrinkman.com. I'm your host, Alex Andorra. You can follow me on Twitter at alex underscore andorra, like the country. You can support the show and unlock exclusive benefits by visiting patreon.com/learnbayesstats. Thank you so much for listening and for your support.

You're truly a good Bayesian, change your predictions after taking information in. And if you're thinking I'll be less than amazing, let's adjust those expectations. Let me show you how to be a good Bayesian, change calculations after taking fresh data in. Those predictions that your brain is making, let's get them on a solid foundation.