#106 Active Statistics, Two Truths & a Lie, with Andrew Gelman
Business & Data Science • Episode 106 • 16th May 2024 • Learning Bayesian Statistics • Alexandre Andorra

Shownotes

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

If there is one guest I don’t need to introduce, it’s mister Andrew Gelman. So… I won’t! I will refer you back to his two previous appearances on the show though, because learning from Andrew is always a pleasure. So go ahead and listen to episodes 20 and 27.

In this episode, Andrew and I discuss his new book, Active Statistics, which focuses on teaching and learning statistics through active student participation. Like this episode, the book is divided into three parts: 1) The ideas of statistics, regression, and causal inference; 2) The value of storytelling to make statistical concepts more relatable and interesting; 3) The importance of teaching statistics in an active learning environment, where students are engaged in problem-solving and discussion.

And Andrew is so active and knowledgeable that we of course touched on a variety of other topics — but for that, you’ll have to listen ;)

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary and Blake Walters.

Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag ;)

Takeaways:

- Active learning is essential for teaching and learning statistics.

- Storytelling can make statistical concepts more relatable and interesting.

- Teaching statistics in an active learning environment engages students in problem-solving and discussion.

- The book Active Statistics includes 52 stories, class participation activities, computer demonstrations, and homework assignments to facilitate active learning.

- Active learning, where students actively engage with the material through activities and discussions, is an effective approach to teaching statistics.

- The flipped classroom model, where students read and prepare before class and engage in problem-solving activities during class, can enhance learning and understanding.

- Clear organization and fluency in teaching statistics are important for student comprehension and engagement.

- Visualization plays a crucial role in understanding statistical concepts and aids in comprehension.

- The future of statistical education may involve new approaches and technologies, but the challenge lies in finding effective ways to teach basic concepts and make them relevant to real-world problems.

Chapters:

00:00 Introduction and Background

08:09 The Importance of Stories in Statistics Education

30:28 Using 'Two Truths and a Lie' to Teach Logistic Regression

38:08 The Power of Storytelling in Teaching Statistics

57:26 The Importance of Visualization in Understanding Statistics

01:07:03 The Future of Statistical Education

Links from the show:

Transcript

This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.


If there is one guest I don't need to introduce, it is Mr. Andrew Gelman. So I won't. I will refer you back to his two previous appearances on the show, though, because learning from Andrew is always a pleasure. So go ahead and listen to episodes 20 and 27. The links are in the show notes.

In this episode, Andrew and I discuss his new book, Active Statistics, which focuses on teaching and learning statistics through active student participation. Like this episode, the book is divided into three parts. One, the ideas of statistics, regression, and causal inference. Two, the value of storytelling to make statistical concepts more relatable and interesting. And three, the importance of teaching statistics in an active learning environment where students are engaged in problem solving and discussion.

And well, Andrew is so active and knowledgeable that we of course touched on a variety of other topics, but for that, you'll have to listen. This is Learning Bayesian Statistics, episode 106.

Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the projects, and the people who make it possible. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. For any info about the show, learnbayesstats.com is Laplace to be. Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on Patreon, everything is in there. That's learnbayesstats.com. If you're interested in one-on-one mentorship, online courses, or statistical consulting, feel free to reach out and book a call at topmate.io/alex_andorra. See you around, folks, and best Bayesian wishes to you all.

on LBS now, so for curious listeners, I definitely recommend episode 20, which was the first one with Andrew Gelman. Yes, you were here. And with Aki Vehtari and Jennifer Hill, it was about your previous book, Regression and Other Stories. And then episode 27 with Merlin Heidemanns, where we talked about the 2020 US presidential elections. We talked about the model you folks did for The Economist. So I definitely recommend checking this one out, because I'm guessing this is going to be interesting also for this year's election.

Yeah, we're working with them for 2024 as well. So we're trying to improve the model.

Perfect. Yeah. So it seems like you're releasing a book every four years, just before the US election.

I hope it won't be four years before the next book comes out. We're trying to finish our Bayesian workflow book. So we're hoping that will be done by the end of the year.

Well, yeah, definitely curious to check this one out. I think I also saw that you're working on an MRP update book. Is that still the case?

Yeah, I think Yajuan and Lauren Kennedy and some other people are organizing this edited MRP book we're putting together.

Yeah. I will definitely check these out.

Well, writing books is a lot of fun because you can write whatever you want, because you're trying to communicate with the audience. When you write an article, you're trying to communicate with the reviewers, who aren't the readers. It's a very weird, indirect thing. I guess similarly, if you're trying to write a TV show, you have to convince the TV network to produce the show, but they're not the people who are watching it, and articles are like that too. But a book is so simple. You just write a book and you're just aiming to reach people. It's very pleasant. I recommend it.

Yeah. I can see that it's something you really enjoy, because you're such a prolific author.

Yeah. I am.

Personally, I use MRP quite a lot and often, so I'm definitely super curious to see what's going to be in this book. I'm sure I'm going to learn things personally, and that's also going to help me teach MRP, which I'm doing from time to time.

Thanks a lot. We have a research project I'm very excited about now, which is integrating survey weights into MRP. People do it now, though. They'll run a weighted regression, or they'll have the model in Stan and use a power likelihood, but it's not really quite right. So we have what I think is a better approach, but that's not what you have me here for today, right? Here I'm supposed to talk about our Active Statistics book, my new book with Aki.

Yeah, yeah. Yeah, exactly. We can talk about whatever you want, but yeah, the main focus is going to be your new book, Active Statistics, with Aki Vehtari. And yeah, so maybe can you give us an idea of the genesis of the book? And thanks for showing the book on the video, for those watching on YouTube.

So it's for people learning statistics or teaching statistics. The story is that everybody says you want to do active learning, so students should be working together. Class time should be an active time for students to be thinking about problems, discussing problems.

Okay, I teach a class based on Regression and Other Stories, and it's two semesters, and each semester is 13 weeks, and each week has two classes. So that's 52 classes. And we cover the book. Every class is an hour and a half long, or I guess, seventy-five minutes long, and for each class I have a story, a class participation activity, a computer demonstration, some quick drills for students to work on in class, and then a discussion problem for students to talk about and think more. I don't always have time in every class to do all of these, but sometimes I do, and I can always do most of them.

I found when I had been teaching statistics, I told stories a lot, but it's tricky to tell a story, partly because not every teacher has a lot of experience, so they don't always have a lot of good stories. So our book is 52 stories, 52 class participation activities, 52 computer demonstrations, et cetera, one for each class. First, these are 52 stories that are pretty good, that I've come up with or that Aki and I have encountered in our careers. So they are high-quality stories. But also, when I tell a story in class, sometimes it gets a little disorganized. So it worked well to write the stories down. And for each story, we very explicitly say how it connects to the week's topic, the week's reading, and also how it connects to the course as a whole. And I felt that had been missing before. It wasn't hard for me to tell an entertaining story with statistical content, but I wasn't always making that connection with what was happening in class.

So I feel that if you're a student and you want to learn statistics, you can read these stories. They are great little stories. There aren't a lot of sources for statistics stories out there. Textbooks tend to have boring examples. They want to set it up like, here's how to turn the crank. Sometimes textbooks tell stories, but they don't tell them well. And I'll give you an example of that in a moment. There isn't really anything like this. And so maybe we should have just had a little book. Our book is, how long is it? It's three and two fifty pages long. Maybe we should have had just a book that was like 50 or 100 pages long with just the stories, because that already is great. Maybe it should have been several pamphlets rather than one book.

Then we have class participation activities. These are things where the class gets involved. They're filling out survey forms, or they're doing an experiment on each other, or we do an experiment on them, or they're weighing bags of things and trying to get estimates, or they're flipping coins. I love these. Deb Nolan and I had a book a few years ago, Teaching Statistics: A Bag of Tricks, which had a few activities, but this is a million times better. First, we didn't have 52 activities, but also these are lined up with the course. They go in sequence. So they're not just fun things to do. They are things that line up with particular lessons. And I just love that people tell me, oh, I liked your book and I used one of your activities in one of my classes. And it makes you want to scream and, you know, throw something at the TV or punch the wall or whatever. I want you to do it in every class. Every class should have an activity, or at least most of the time. So that was a lot of effort, because we had a bunch, but a bunch of them we just created from scratch: we need an activity for this. And that's really great. So that could have been its own pamphlet, another 50 pages.

Then we have computer demonstrations. I find that live demos are great. But if you try to do it from scratch, you get tangled in the code. So it's good to have pre-written live demos. It's easy to say you should have a demo, and it's surprisingly hard. Even to create something simple, simulate fake data and run a regression, you have to have good values of the parameters or else you're not really demonstrating the point you want to make. If it has some curvature, how much to have? So we tested them out and did them in class. And that way when I teach, I can always have a live demo, which is everybody's favorite part of class, and so forth with the others. And then we have some homework assignments, and we have some chapters at the beginning where we talk about how to set up the class and how to learn better. It's not really just for teachers; as I said, it should be for students. So that's what's in it.
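[Editor's sketch, not from the book] The kind of demo described above, simulate fake data and then run a regression, might look roughly like the following. The parameter values (intercept 1.0, slope 0.5, noise 1.0, 200 points) and the function names `simulate_fake_data` and `fit_line` are illustrative assumptions, chosen only so that the fitted line visibly recovers the truth; they are not taken from Active Statistics.

```python
import random


def simulate_fake_data(n, a, b, sigma, seed=0):
    """Draw x uniformly on [0, 10] and set y = a + b*x + Normal(0, sigma) noise."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [a + b * xi + rng.gauss(0, sigma) for xi in x]
    return x, y


def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical "true" values for the demo; changing sigma or n changes
# how clearly the estimates line up with them.
x, y = simulate_fake_data(n=200, a=1.0, b=0.5, sigma=1.0)
a_hat, b_hat = fit_line(x, y)
print(f"true a=1.0, b=0.5; estimated a={a_hat:.2f}, b={b_hat:.2f}")
```

Rerunning with a much larger `sigma` or a much smaller `n` is one way to see the point about parameter values: the same code can produce a demo that illustrates nothing.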

Yeah, well, thanks a lot, Andrew. I already have a lot of follow-up questions for you. But something you've also told me in preparing the episode is that you have thought about the book in three distinct parts. The first one being the ideas of statistics, regression and causal inference. Then another pillar would be using stories to explain statistics. And the third pillar would be the method of teaching with active student participation. So why did you choose these three different pillars, and how do you think they are helping an active learning of statistics, which is one of the goals of your book?

So, teaching or learning is like a vector. It has a magnitude and a direction. The magnitude is how hard you work to figure stuff out. And the direction is what you're learning. So yeah, I think applied regression and causal inference is super important. The typical audience for this book would be students who took one statistics class. Maybe they already took statistics in high school or at university, took that one class where they learned about sampling and experimentation and estimation, intervals, the normal distribution, stuff like this. This is all about using it, about going beyond that. So, yeah, I think applied statistics is great. I want to teach regression so that the most important thing is understanding the model and being able to use it, not so much the mathematical theorems about least squares estimation. That's important too, but there are other places to learn that. So yeah, the direction is that it's applied statistics. I think the magnitude is about how to make that work, how to get people to learn.

And so most of the learning is not done in class, but at least if students are doing these activities in class, then in the hour and a half or the three hours a week they're spending in class, they are already heavily thinking about it. And, you know, it's kind of horrible for the students because you really make them work. It's like teaching a foreign language class, right? If you go and take a usual class in college, you sit in the back and you zone out and you're like, oh, this is pleasant. It's like watching a movie, maybe. But if you're in a foreign language class, you're working all the time, right? The teacher's always making you talk and listen. If you lose focus for a second, it's difficult. Statistics is a foreign language, and you can learn by speaking it and practicing it. So I think it's important in class to be able to do that, or if you're studying at home, to have these activities and stories.

And of course the computer. I'll say my computer code is pretty bad. So that's good, right? Because that's like student code. It's all crappy code. So it's realistic. I know it's not always the world's cleanest, I would say, but it runs. But maybe it doesn't all run either. It ran when I wrote it. When I do code demos in class, what I like to do is actually type in the code, not copy and paste it. So that's modeling how someone might do it. So we try to keep them short enough that you can do that.

Yeah, thanks a lot. I see what you're doing and I really appreciate it, because that's also helping me in my own teaching philosophy. I do have the same experience, where the students who end up learning the most are usually the most active ones. But then the main question is, okay, how do I make them all active? Or at least give them the opportunity to all be active. And that's really one of the things.

Yeah, when I teach, I make them talk. Even if it's a class with 50 or more students, I'll tell the story and then I'll pause and say, well, what do you think? Talk to your neighbor about this. And I look and I make sure they're talking. And if they're not talking, then I walk over and, you know, I go like this to them. And if their computer is out, I look, and if they're on their social media, I ask them to close their computer, and if their phone is out, I ask them to close their phone, and so forth. The funny thing is, as a teacher, that's hard. It's easier as a teacher to just talk and talk and talk and talk, like I'm talking now, I'm just talking. It's easy to talk and you have complete control over it. So that's why I really needed to structure this in this way.

My original motivation for all of this was that many years ago I was teaching a class and I couldn't make it, so I had my co-teacher, another faculty member in the department who was teaching the same level class, teach my class, and then I went and taught hers. And she said, oh, your students were just dead. And then I taught her class and they were so lively, and I realized not that she was lucky, but that they had been in that habit of participating in class. She's just a naturally great teacher. I'm naturally not a good teacher. And so I do this shtick in order to get them involved. And then I just wanted to do it well. I want to tell stories, but I want to be able to make the point, to help them learn it.

Yeah, that's interesting, because for me, when the teachers were doing that to me, it's because I was talking too much. That happened quite a lot. Maybe that's why I have a podcast now. Apart from these philosophical considerations, yeah, that's very interesting. I'm going to try that in my own classes. The thing is I personally teach a lot of online courses, and so I cannot wander around and see the screens. So that's pretty hard.

Yeah, it's tough. I remember when I was doing the class over Zoom, you could try to put them in little rooms so they work in pairs, but you can't see them doing it. I think there is some online conferencing software where you can actually see the pairs or the small groups, but I don't know the full story with that.

But I said I'd give you an example. One of the things that's difficult about the stories, and I don't know that there's any answer to this, is that if they're too simple, that's boring. But if they're too complicated, then, you know, that's not good either. I want to send the message that, how did I put it in the book? I had a slogan: statistics is hard. It should not feel tricky. I like statistics stories with a twist, but I don't like the kind of stories where the message is, this is just hard, like the Monty Hall problem. I hate that because it's just so confusing to people. What's the lesson that you're teaching? Right? This is really, really confusing. I don't want to teach that.

But here's an example. This is a very standard example used in United States statistics classes, where we put another twist on it based on the recent literature. This was a survey that was done in 1936 by a magazine called the Literary Digest, a very famous example in statistics books. They did a survey for the presidential election, and the presidential election was Franklin Roosevelt running for reelection against somebody who wasn't Franklin Roosevelt. So you kind of know who won that election. But in their poll, actually, Franklin Roosevelt was going to get destroyed. They surveyed 10 million people, and two and a half million of those responded. And out of that, it looked like Roosevelt was completely getting smoked.

Well, there were two things happening. One is that the two and a half million respondents were not a random sample of the 10 million people. Second, the 10 million people were themselves not representative of Americans, because they came from lists of people who owned cars and things like that, so richer people. So it wasn't a representative sample, and usually the story just stops there. But that's not a good place to stop, for a couple of reasons. One of which is, what lesson are you telling people? If you don't have a random sample, your survey is no good. Well, unfortunately, no surveys are random samples. I mean, no surveys of humans, no political polls are. So the message would be, oh, you can't ever trust any political poll. Well, that would be a mistake, because political polls, even when they're off, tend only to be off by a couple of percentage points.

So, OK, let's look at this survey. The first thing is that the same magazine had done this survey in previous elections and it had worked well. So they had some track record. It wasn't as dumb as it sounds.

Second thing, and this is something that two statisticians recently looked into, and I was able to take advantage of their work. Sharon Lohr and Michael Brick had written a paper on this Literary Digest survey, where they realized that the data from the survey are actually available. The survey asked people who they would vote for, but it also asked who they voted for in the previous election. So you can adjust for that, because you know the previous election outcome. Well, it's not perfect. Not everybody voted in the previous election. But it's pretty good. And when you do that adjustment, you find that Roosevelt was supposed to win. Well, it's not a perfect adjustment. It's still quite a bit off. Even after doing this adjustment, it's still not a representative sample. But now we've changed the lesson from, hey, it's not a random sample, you fool, blah, blah, blah, to, hey, this sample is not a representative sample, but statistics can be used to adjust it. Look at this. But the adjustment is imperfect. So it's a more subtle message.
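[Editor's sketch, with made-up numbers] The adjustment described above, reweighting respondents so that their reported previous vote matches the known previous election result, can be illustrated with a small poststratification calculation. Everything here is hypothetical: the candidate labels "A" and "B", the counts, the known past shares, and the function name `adjust_by_past_vote`. None of it is the Literary Digest data or the method of the paper mentioned in the episode, and the sketch ignores people who did not vote last time.

```python
def adjust_by_past_vote(sample_counts, known_past_shares):
    """Poststratify on past vote.

    sample_counts[past][current] is the number of respondents who voted
    `past` last time and intend to vote `current` this time.
    known_past_shares[past] is that group's share of the real electorate.
    Returns the adjusted share for each current choice.
    """
    estimate = {}
    for past, row in sample_counts.items():
        group_total = sum(row.values())
        for current, count in row.items():
            within_group_share = count / group_total
            weighted = known_past_shares[past] * within_group_share
            estimate[current] = estimate.get(current, 0.0) + weighted
    return estimate


# Made-up sample that over-represents past B voters (700 of 1000 respondents).
sample = {
    "past_A": {"A": 270, "B": 30},   # 90% of past A voters stay with A
    "past_B": {"A": 140, "B": 560},  # 20% of past B voters switch to A
}
total = sum(sum(row.values()) for row in sample.values())
raw_share_A = (sample["past_A"]["A"] + sample["past_B"]["A"]) / total

# Assumed true result of the previous election: A 60%, B 40%.
known = {"past_A": 0.6, "past_B": 0.4}
adjusted = adjust_by_past_vote(sample, known)

print(f"raw share for A:      {raw_share_A:.2f}")
print(f"adjusted share for A: {adjusted['A']:.2f}")
```

In this toy example the raw sample has A losing and the adjusted estimate has A winning, which mirrors the direction of the story. As the transcript stresses, such an adjustment is still imperfect when the sample is unrepresentative in ways that past vote does not capture.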

Well, it's trickier to teach. That's one reason why I like having the story written as a story very clearly in the book, because then the student or the teacher can read through the whole thing. If you're a student, you can read it through. And if you're a teacher, you can first read it before trying to teach it. And there it is. It's on pages 36 and 37 of our book. There's a copy of the survey form. The description literally takes up one page of the book. Almost all of it is a quote from Lohr and Brick, because they're the ones who did it, and then a little discussion of how it relates to the class. These stories are all like that. You have to balance it. And it's tricky. There almost should be another booklet of the really simple stories that we haven't been including because they're too boring for me, but maybe still interesting for the students. I don't know.

We went back and forth. It's structured from beginning to end of the course. There's a couple of introductory chapters, and then there's 13 sections for the first semester and then 13 sections for the second semester. So most of the book is 26 sections. And in each one we have a story and the participation activity. And we went back and forth about whether to do it that way or whether to put all the stories in one place and all the activities in one place. And I don't know. Now I'm thinking I wish we had done it that way. But Aki and I went around and around on this a million times. You don't need to hear about this. I wanted it to look right. The thing is, if you opened it up at random, you might get a page of homework assignments, and then it might look like a textbook. It all kind of looks the same. So maybe if we had done the different things separately, there'd be a whole section of stories. But when you're teaching, it's convenient that it's in order, because you just go to the week of your class and then you can see what to do that week. So that's how I used it to teach.

Yeah, and I mean, I really love also your focus on the stories, right? I see it's definitely a theme of your work recently, and I really love that, because I think it also puts an emphasis on the fact that statistics is not done in a vacuum, right? It's also done by humans, with their biases and also their motivations and so on. And I find that way more interesting, way more realistic. And that also captures the imagination of the students more than teaching them theorems and formulas, which is often quite intimidating to a lot of them.

So yeah, I have to admit the stories are all things that I can personally relate to. Either it's research I was involved in, or it's something close enough to what I do that I'm interested in the question being asked. And the same with the activities. The activities have a lot of simulated data. I'm a big fan of that.

Yeah, you are. In a lot of your books, you talk about that. Do you want to talk a bit more about that, or do you think we've already covered the idea of simulated data in the traditional data?

Well, I'll just say briefly that I think

we are, as statisticians or computer

446

:

scientists or whatever, we're used to the

idea of here is a data set, let's see what

447

:

we can learn.

448

:

But science, I mean, sometimes we proceed

that way in learning.

449

:

We want to understand the world, you're

curious about something, someone gets a

450

:

bunch of data from

451

:

Basketball or whatever, and then you play

around and see what you can get.

452

:

So that happens, but often things are more

directly motivated.

453

:

Like, yes, in a public opinion poll,

you're really starting with the question.

454

:

When demonstrating a method in coding

examples, it's super great to have

455

:

simulation.

456

:

partly because it's like it's the dual

problem, right?

457

:

If I can, I simulate the data, then I fit

the model.

458

:

I can check, I can see if the parameter

estimates are similar to the true value,

459

:

but also just the act of simulation is the

time reversal of the act of inference.

460

:

So it makes sense to show the forward

process too.
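A minimal sketch of the simulate-then-fit check described above. The linear model, the parameter values, and the function names are illustrative assumptions, not taken from the book: fake data are drawn from a known model (the forward process), then the model is fit (the inference step) to see whether the estimates land near the true values.

```python
import random

def simulate_linear(n, intercept, slope, sigma, rng):
    """Forward process: draw fake data from a known linear model."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0, sigma) for x in xs]
    return xs, ys

def fit_least_squares(xs, ys):
    """Inference: recover (intercept, slope) by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

rng = random.Random(1)
true_intercept, true_slope = 2.0, 0.5   # assumed "true" values for the demo
xs, ys = simulate_linear(1000, true_intercept, true_slope, 1.0, rng)
est_intercept, est_slope = fit_least_squares(xs, ys)

print(f"intercept: true {true_intercept}, estimated {est_intercept:.2f}")
print(f"slope:     true {true_slope}, estimated {est_slope:.2f}")
```

Because the true values are known by construction, any large gap between truth and estimate points to a problem in the fitting code rather than in the data.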

461

:

And I think it's kind of a bit of a power

thing.

462

:

As a student, it's like: I can

simulate data.

463

:

I can make fake data myself, right?

464

:

That's.

465

:

That's something that can be done.

466

:

Traditionally, we do simulation when we're

teaching probability, like you'll teach

467

:

the central limit theorem by simulating

draws.
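The central limit theorem demonstration mentioned here can be sketched in a few lines. This assumes draws from a Uniform(0, 1) distribution and a sample size of 30, both chosen only for illustration: means of repeated samples cluster around 0.5 with a spread close to the theoretical value, even though the underlying distribution is flat.

```python
import math
import random
import statistics

rng = random.Random(2)
draws_per_mean = 30   # size of each simulated sample (assumed)
n_means = 5000        # number of simulated sample means (assumed)

# Each entry is the mean of 30 draws from a flat Uniform(0, 1) distribution.
sample_means = [
    statistics.fmean(rng.random() for _ in range(draws_per_mean))
    for _ in range(n_means)
]

center = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)
# The sd of Uniform(0, 1) is sqrt(1/12), so the sd of a mean of n draws is sqrt(1/12/n).
theory_spread = math.sqrt(1 / 12 / draws_per_mean)

# Under a normal approximation, roughly 95% of means fall within 2 sd of the center.
within_2sd = sum(abs(m - center) < 2 * spread for m in sample_means) / n_means

print(f"mean of means: {center:.3f}")
print(f"sd of means: {spread:.4f} (theory: {theory_spread:.4f})")
print(f"share within 2 sd: {within_2sd:.3f}")
```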

468

:

But just a lot of examples come up.

469

:

Simulation is kind of, it's

like a universal solvent.

470

:

Like, for example, I think one of our

discussion problems in class is: I show

471

:

them data from some regression, which is

based on real data.

472

:

And I don't remember the example, but

something where there's some treatment

473

:

effect.

474

:

which you maybe expect is positive.

475

:

Maybe the estimate is, let's say the

estimate is 0.3 and the standard error is

476

:

0.2.

477

:

And so then I say, and it's based on 100

data points.

478

:

So the estimate is

0.3, the standard error is 0.2.

479

:

So I'd say how large a sample would you

need to get a result that's two standard

480

:

errors away from zero?

481

:

That's statistically significant, a term

that I don't like to use, but of course

482

:

they need to know how it gets used.

483

:

So you'd say, oh well, the standard error

is 0.2, but really the standard error would

484

:

have to be 0.15 for it to be 2

standard errors away from 0.

485

:

So the sample size would have to increase

by a factor of (2 divided by 1.5) squared.

486

:

So you take (2 over 1.5) squared, and

that's, you know, so you can do that, you

487

:

know, and you say here,

488

:

(2 over 1.5) squared times 100, and that's

177.
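The naive calculation being walked through can be written out as follows. It assumes the standard error scales as one over the square root of the sample size and that the point estimate stays fixed at 0.3, so the target standard error is 0.3 / 2 = 0.15. The symbols are introduced here for illustration only.

```latex
\[
\mathrm{se} \propto \frac{1}{\sqrt{n}}
\quad\Longrightarrow\quad
n_{\text{new}}
= n_{\text{old}} \left( \frac{\mathrm{se}_{\text{old}}}{\mathrm{se}_{\text{target}}} \right)^{2}
= 100 \times \left( \frac{0.2}{0.15} \right)^{2}
\approx 177.8
\]
```

The exact value is about 177.8, which is quoted in the conversation as 177.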

489

:

So you'd say, well, you need a sample size

of 177 to get your estimate to be true.

490

:

So work that out.

491

:

That's wrong.

492

:

That's not the correct answer.

493

:

Because if you redo a study with 177

people, there's no reason to think the

494

:

point estimate will be the same.

495

:

In fact,

496

:

Like the whole point of saying that the

estimate is less than two standard errors

497

:

away from zero and you don't know whether

to believe it, somehow the whole point

498

:

from a Bayesian point of view, the point

is that it's likely to be closer to zero.

499

:

From a classical point of view, the idea

is that you can't rule out zero as an

500

:

explanation and zero is like typically a

privileged value there.

501

:

So if you're replicating a study or even

doing it longer,

502

:

you would have to, the answer depends on

the true treatment effect, not on the

503

:

coefficient estimate.

504

:

And well, that's harder, right?

505

:

But the point is you can show that with a

simulation.

506

:

If it's based on real data, it's trickier

to show because what are you doing?

507

:

But if I then do a simulation and then I

say, well, look, let me try simulating

508

:

100.

509

:

with this true treatment effect and then I

see what I get.

510

:

I say, well, shoot, I didn't get a

treatment effect of 0.3.

511

:

I'm going to have to keep doing it.

512

:

And then you realize you're selecting just

some.

513

:

So to me, it brings it to life.

514

:

The applied point gets demonstrated in a

way that's harder to do with just one data

515

:

set.
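A rough sketch of the replication simulation described above, under stated assumptions: each replicated study's estimate is drawn from a normal distribution centered on an assumed true effect, with a standard error of 0.2 at n = 100 that shrinks like one over the square root of n. The function name and the candidate true effects are hypothetical choices for the demo.

```python
import random

def share_significant(true_effect, n, n_sims, rng, se_at_100=0.2):
    """Fraction of simulated studies whose estimate is more than 2 se from zero."""
    se = se_at_100 * (100 / n) ** 0.5   # standard error shrinks like 1/sqrt(n)
    hits = 0
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, se)   # one replicated study
        if abs(estimate) > 2 * se:
            hits += 1
    return hits / n_sims

rng = random.Random(3)
for true_effect in (0.0, 0.1, 0.3):
    share = share_significant(true_effect, n=178, n_sims=20000, rng=rng)
    print(f"true effect {true_effect}: {share:.2f} of studies exceed 2 se")
```

Even in the optimistic case where the true effect equals the original estimate of 0.3, only about half of the simulated studies at the "required" sample size clear the two-standard-error bar, and far fewer do if the true effect is closer to zero. That is the sense in which the answer depends on the true treatment effect and not on the coefficient estimate.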

516

:

Yeah.

517

:

Yeah, yeah, yeah.

518

:

I really love that.

519

:

I agree.

520

:

And that's also something I tend to use.

521

:

On a lot of questions people have on, you

know, A/B test settings, things like

522

:

that.

523

:

There's a lot of questions about these,

the sample size, the iteration, things

524

:

like that.

525

:

And I find personally, I have to do the

simulated data studies to answer these

526

:

kinds of questions.

527

:

Like I, I'm bad at like remembering, you

know, all those rules of thumb.

528

:

Like, let's do those kinds of studies

with simulated data and that gives me a

529

:

way better idea.

530

:

So in a completely unrelated topic, I can

tell you about our two truths and a lie

531

:

example.

532

:

That's a demonstration we do.

533

:

I'm mentioning that partly because writing

a book is like writing a hundred articles.

534

:

So at one point I thought, well, maybe I

should publish these as a hundred articles

535

:

because each story could be, well, that

just takes a lot of work and maybe more

536

:

people will read it in book form.

537

:

So I didn't do that, but.

538

:

I did one of them.

539

:

I did one or maybe I did one or two.

540

:

It takes a while to publish an article.

541

:

And for the bad reason that it's just

formatted in a different way, for the

542

:

moderately good reason that you need to

explain more if it's in an article rather

543

:

than a book because you need the context,

for the pretty good reason that you're

544

:

forced to that, that like you have an

opportunity to expand because you have

545

:

more space in the book.

546

:

I can't take up too much.

547

:

I can't have each thing take too long.

548

:

And for the probably the biggest thing is

you get useful reviewer comments and

549

:

people point out problems anyway.

550

:

So the one of the the activities I did

write up as an article was two truths and

551

:

a lie.

552

:

And I gave a link to the article version,

which is longer than what's in the book.

553

:

But I love the story.

554

:

OK, is the story how it came out is that

there's this game which did not exist when

555

:

I was a child.

556

:

But I don't know if they do it in Europe.

557

:

It's big, it's popular

558

:

in the U.S.

559

:

as the kids do it as an icebreaker in

class, you'll have a group of people and

560

:

one person is the storyteller and this

person tells three things about

561

:

themselves.

562

:

Two of them have to be true and one has to

be a lie and then the other people discuss

563

:

and try to figure out which is the truth

or which is the lie.

564

:

So it's such a fun activity.

565

:

I like to use it as an icebreaker in my

statistics class.

566

:

But it has no statistics content.

567

:

I mean, it is because there's uncertainty,

but what do you do with it?

568

:

So I thought about and thought about and

well, I decided to put it in the second

569

:

semester.

570

:

I was ready for a good icebreaker and the

second semester started with logistic

571

:

regression.

572

:

Okay, I can make it logistic regression

problem because you can say, what's the

573

:

probability you get it right?

574

:

What's the probability you guess correct?

575

:

But then you need some predictor.

576

:

So, oh, predictor.

577

:

Well, you can have when you guess, you

also have to give a certainty score, some

578

:

number between zero and 10 representing

how certain you are that you're correct.

579

:

Then it has to be done in groups.

580

:

So I figured it out.

581

:

Each, you divide the class into groups of

four.

582

:

Usually we do pairs, but this one, four.

583

:

Each group, you have one student is the

storyteller, tells the three statements.

584

:

The other three discuss together.

585

:

And then,

586

:

come up with a guess of which they think

is true, which of them that they think is

587

:

a lie, and a certainty score.

588

:

So write the certainty score down in a

sheet of paper, then find out whether your

589

:

guess was correct and write that down too.

590

:

So they find out.

591

:

Then there's four of you in the group, so

you rotate.

592

:

Then the next person does it.

593

:

So as a result, as a group, each group has

four certainty scores and four.

594

:

correct or incorrect answers.

595

:

So they have four numbers, they have eight

numbers, first four numbers between zero

596

:

and 10, and then the four numbers which

are zeros and ones.

597

:

And so, by the way, when you do this, I

have a slide prepared, or I write it on

598

:

the board, the exact instructions.

599

:

You need to give in, you can't just tell

it, people aren't paying attention for one

600

:

second.

601

:

I'm just doing this for you in that thing,

but actually we have the instructions

602

:

there.

603

:

Then there's this thing I discovered a couple

of years ago.

604

:

It's putting things on Google Forms.

605

:

So live in class, I create a Google Form,

I open Google, type it in right there.

606

:

So this is also it's a power thing for

them.

607

:

Look at this.

608

:

I didn't have to prepare this.

609

:

I type the Google Form, I put question

one, certainty score, make it a response

610

:

from zero to 10.

611

:

Question two, yes or no, did you get it?

612

:

Was your guess correct?

613

:

So with each group, I want you to go, oh,

and then we use tiny URL to get a URL.

614

:

And then for each group, I say, pull out

your phone or your computer, and one

615

:

person from the group, enter your four

data points.

616

:

So we set it up with four.

617

:

So there's actually eight responses, the

first one, the first one.

618

:

Then we get the data, it takes them a

minute to type it in.

619

:

Then I have it all prepared.

620

:

I've done it before, right?

621

:

So I have the code ready.

622

:

I.

623

:

So I go to the Google page, I download it,

I put it on the desktops.

624

:

It's not even my laptop, it's just a

computer that's in the classroom.

625

:

Then I go, I open R, I read it in, and I

have the code prepared so I can do it.

626

:

And then we can make graphs.

627

:

So we fit a logit. So, but then I did

something I always like to do.

628

:

I set it all up.

629

:

Okay, we have the data.

630

:

I type in the code for logistic

regression.

631

:

Again, I have a pause.

632

:

I say, well, write the code with your

neighbor what the logistic regression code

633

:

would look like.

634

:

So, yeah, and then I do it and then I type

it and I said, then I do display, you

635

:

know, of the fitted regression.

636

:

And before hitting carriage return, I

said, this is what it's going to look

637

:

like.

638

:

There's going to be coefficient estimate,

standard error.

639

:

What are they going to be?

640

:

You and your neighbor have to figure out,

try to guess what the estimate and the

641

:

standard error are gonna be.

642

:

Well, the standard error is tricky, like

that's hard.

643

:

So I said, just figure out, guess what the

estimate will be.

644

:

And so then I have them do it, I go around

the room, I make sure they're all drawing

645

:

the curve, and then I have someone go on

the board and draw what they had done.

646

:

And then I ask people, do you think this

is reasonable?

647

:

Do you think this slope is reasonable?

648

:

Now what do you think the standard error

will be?

649

:

Do you think the slope will be more than

two standard errors away from zero?

650

:

Then you fit it.

651

:

and you have the scatter plot and they can

see and they've thought about that

652

:

committed to it.

653

:

So that's logistic regression.
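The class demonstration uses R on the students' real responses, which are not reproduced here. As a stand-in, the following Python sketch fits the same kind of model to fake data: a logistic regression of "guess correct" (0 or 1) on the certainty score (0 to 10), fitted by Newton-Raphson. The true coefficients, the sample size, and all function names are assumptions made for illustration.

```python
import math
import random

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def score_and_information(a, b, xs, ys):
    """Gradient of the log-likelihood and entries of the 2x2 information matrix."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for x, y in zip(xs, ys):
        p = inv_logit(a + b * x)
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        i00 += w
        i01 += w * x
        i11 += w * x * x
    return g0, g1, i00, i01, i11

def fit_logistic(xs, ys, n_iter=25):
    """Fit Pr(y = 1) = inv_logit(a + b * x); return (a, b, se_a, se_b)."""
    a = b = 0.0
    for _ in range(n_iter):
        g0, g1, i00, i01, i11 = score_and_information(a, b, xs, ys)
        det = i00 * i11 - i01 * i01
        # Newton step: add (information matrix)^-1 times the gradient.
        a += (i11 * g0 - i01 * g1) / det
        b += (i00 * g1 - i01 * g0) / det
    _, _, i00, i01, i11 = score_and_information(a, b, xs, ys)
    det = i00 * i11 - i01 * i01
    # Standard errors from the inverse information at the fitted values.
    return a, b, math.sqrt(i11 / det), math.sqrt(i00 / det)

# Fake data: 400 guesses; certainty 0 gives roughly a 1-in-3 chance of being right.
rng = random.Random(4)
true_a, true_b = -0.7, 0.2   # assumed values, not estimates from any real class
certainty = [rng.randint(0, 10) for _ in range(400)]
correct = [1 if rng.random() < inv_logit(true_a + true_b * c) else 0 for c in certainty]

a_hat, b_hat, se_a, se_b = fit_logistic(certainty, correct)
print(f"intercept {a_hat:.2f} (se {se_a:.2f}), slope {b_hat:.2f} (se {se_b:.2f})")
```

As in the classroom version, it is worth guessing the slope and its standard error before running the fit and then comparing.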

654

:

But when I wrote up the article, the

people in the journal said, well, what

655

:

about other classes?

656

:

And then I realized you can use this to

teach measurement.

657

:

You can use it to teach experimentation,

like all sorts of things.

658

:

You could do a lot with that.

659

:

But I felt so satisfied because just I

felt like it was just created out of

660

:

nothing.

661

:

I wanted a two truths and a lie activity and now

there is one.

662

:

So that just felt so good

to have created.

663

:

Now I want everyone to do it because now

that I created this this beautiful thing

664

:

out of nothing, it did not exist.

665

:

Anyway, just I'm very happy about that.

666

:

Yeah, I love that.

667

:

I'll definitely try that in my own

668

:

classes. Seems like a good thing to

do on the first or second class, isn't it?

669

:

Right, exactly.

670

:

Now the point is that you're killing two

birds there.

671

:

Yeah, yeah.

672

:

No, that's super cool.

673

:

Definitely going to try that for sure.

674

:

So, and it's like, I have a commitment

device now.

675

:

I have officially publicly committed to do

that.

676

:

So I have to do it and then.

677

:

Come back to you, Andrew, to tell you how

it went.

678

:

The other thing you can do is there are

certain fun psychology experiments from

679

:

the literature that can be done in class,

because things that have very large

680

:

effects, like some of the classic Tversky,

Kahneman experiments of cognitive

681

:

illusions, we have one of those examples

too.

682

:

You can do it live in class.

683

:

Yeah, that sounds also super cool.

684

:

I also saw in preparing the episode that

you have a flipped classroom, like you

685

:

emphasize a flipped classroom environment.

686

:

I don't think I've ever heard you talk

about that.

687

:

Could you explain what this approach is

and how you think that enhances the

688

:

learning of applied regression and causal

inference?

689

:

I think to me the flipped classroom is

pretty much the same as traditional high

690

:

school classes, high school math class.

691

:

So if you take math in high school, you

have a book you're supposed to read and

692

:

there's homework assignments.

693

:

Usually you read just enough of the book

to allow you to do the homework

694

:

assignments.

695

:

Then in class, the teacher does a couple

things in the board and most of the time

696

:

in class you spend working on problems in

pairs or small groups and then people go

697

:

up to the board and share their answers.

698

:

That's kind of what I think should be.

699

:

So that's the model of so it's very

traditional.

700

:

The flipping part is, you know, I don't

have videos.

701

:

I guess I could, but I don't.

702

:

Aki has videos for his classes that I

have.

703

:

But the flip part is the reading.

704

:

Right.

705

:

So they I'm not lecturing because they're

supposed to have read the book.

706

:

Now, what happens, you know, it works only

if you have a book that you can can lean

707

:

on.

708

:

But I think that's very important.

709

:

This semester, I'm teaching in a

710

:

statistics class, teaching some multilevel

modeling and some other things.

711

:

My book with Aki and Jennifer on advanced

regression and multilevel modeling

712

:

doesn't exist yet.

713

:

It's supposed to be the updated version of

my book with Jennifer.

714

:

I couldn't quite bring myself to teach out

of my book with Jennifer just because the

715

:

code is old, but then I don't have a new

book.

716

:

And so as a result, the class I'm teaching

this semester,

717

:

It's fun.

718

:

I think the students are enjoying it, but

it's not going as perfectly as it

719

:

could because I can't really do the flip

thing because I keep I end up spending a

720

:

lot of time in class like my computer

demos typically end up being me doing the

721

:

homeworks, working out the homeworks

that were just due, which is fine, but it's

722

:

it's not they're a little bit more

elaborate than.

723

:

Ideally, I think computer demos would be

shorter.

724

:

They don't have enough to read before, so

I end up spending a lot of time lecturing.

725

:

I think I spend most of today's class just

talking.

726

:

I felt a little bad about that.

727

:

I don't know.

728

:

I think it's still fine.

729

:

It's still a breath of fresh air compared

to other classes they're taking.

730

:

I'm sure if all the classes were like

mine, then that would be horrible.

731

:

But an occasional class that's like mine

can be good.

732

:

I think in general, students like more

organization.

733

:

A book is better.

734

:

Even

735

:

when I teach out of Regression and Other

Stories, that's super organized, but it's

736

:

not always what students want because they

want a set of methods and formulas and

737

:

theorems and so forth.

738

:

So I'm not always giving people what they

want.

739

:

Anyway, I think that they again, I think

they're really looking for very clear.

740

:

I don't I have this thing, the goal is to

be fluent in the foreign language, but I

741

:

don't think people usually think of it

that way.

742

:

I think that they're looking for.

743

:

something different.

744

:

But what that means is that it puts a

special burden on me to be super organized

745

:

because if I'm not super organized, then I

think students will not see the point.

746

:

So my class this semester, it doesn't use

the book.

747

:

It's not as flipped as it could be.

748

:

I still have them talking with each other

in class, but not having the flipped

749

:

classroom makes it a little more of a

passive experience for them.

750

:

And then when I do have them talking,

they're often just talking to each other

751

:

saying, oh, I have no idea what's going on

here.

752

:

It's like, oh, good that I know that, I

guess.

753

:

That's true.

754

:

Yeah.

755

:

And I mean, I do relate to this idea of

the, you know, getting fluent in a foreign

756

:

language.

757

:

That's actually also a metaphor I use

quite a lot to people who are curious

758

:

about what the...

759

:

work of a statistical modeler is.

760

:

And that's funny because there's that

weird human brain bias of just thinking

761

:

that someone who is doing something that

looks hard to you, or they must have been

762

:

good at it since the beginning.

763

:

And at least for me, it couldn't be

further from the truth.

764

:

It comes from a lot.

765

:

As you were saying, I think you were

saying learning is a

766

:

Vector is magnitude and direction, right?

767

:

So definitely magnitude is very important

for me each time I learn something.

768

:

And often I'm saying, yeah, well, it looks

hard because you have to learn kind of two

769

:

languages, the language of stats and the

language, like the actual programming

770

:

language that you need to do the stats.

771

:

But it's just as any other language, you

need to...

772

:

talk to people in that language and with

time you'll see your brain just getting

773

:

there.

774

:

So it does go through to people, but at

the same time they need to see some

775

:

results along the way because otherwise

the motivation is gonna fall down.

776

:

So it's always that needle that's a bit

hard to thread in my experience.

777

:

Yeah, well, I like this book.

778

:

See, I seriously think this book is just

fun to read.

779

:

Although, as I said, I kind of I kind of

wish I had separated it out in a different

780

:

way because I do feel when people when you

open at random, you end up you might see

781

:

some code or you might see a homework

assignment or you might like it's not

782

:

always clear what like you're not

necessarily opening into a middle of a

783

:

story.

784

:

And so like homework assignments don't

look like fun and code doesn't look like

785

:

fun.

786

:

So I'm.

787

:

Don't think I realized you don't see the

book until it's a book before that's this

788

:

PDF on the screen and it has it has a

different experience that way and and

789

:

Aki's gonna kill me that I say this

because we went back and forth and and but

790

:

like now I think we really should have,

I really think we made a mistake by not

791

:

doing it the other way because I think it

would look a lot more fun that way if If

792

:

like all the stories were in one place and

all the activities were in another place

793

:

I'm really feeling bad about that.

794

:

I still love it.

795

:

It's just, we just have so many fun

things.

796

:

Oh, then we have, for the final exam, we

made, it's multiple choice.

797

:

So what I do is I have four or more

questions per chapter.

798

:

It's like, it's, it's,

799

:

The exam has so there's 12 chapters for

the fall and 12 for the spring.

800

:

So each chapter, I have four or more

questions.

801

:

What I do is I randomly sample one per

chapter and give that to the students as

802

:

their practice exam.

803

:

Then I randomly sample two per chapter and

give that and make that the final exam.

804

:

So therefore, by construction, the

practice exam is representative of the

805

:

final exam because they're two random

samples from the same population.
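The exam construction described above can be sketched as two independent random draws from a per-chapter question bank. The bank contents and the names below are hypothetical. The sketch also assumes the two draws are independent, so a practice question may reappear on the final; the conversation does not say whether overlap is excluded.

```python
import random

def draw_exams(question_bank, rng):
    """Draw 1 practice and 2 final-exam questions per chapter from the same bank."""
    practice, final = {}, {}
    for chapter, questions in question_bank.items():
        practice[chapter] = rng.sample(questions, 1)   # one per chapter
        final[chapter] = rng.sample(questions, 2)      # two distinct per chapter
    return practice, final

# Hypothetical bank: 12 chapters with 4 question labels each.
question_bank = {
    chapter: [f"ch{chapter}-q{q}" for q in range(1, 5)]
    for chapter in range(1, 13)
}

rng = random.Random(5)
practice_exam, final_exam = draw_exams(question_bank, rng)
print("practice, chapter 1:", practice_exam[1])
print("final, chapter 1:   ", final_exam[1])
```

Because both exams are random samples from the same population of questions, the practice exam is representative of the final by construction.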

806

:

So I think that's that that's great to be

able to do that now.

807

:

Of course, all the problems are now in the

book, although without the answers.

808

:

So you'd have to figure out which it is.

809

:

But in theory, someone could read through

all of those.

810

:

But of course, the usual story is if

someone really goes to the trouble of

811

:

reading through all of them and figuring

them all out, that's probably good anyway.

812

:

So I don't mind if they didn't do well on

the exam.

813

:

But it took a lot of effort to write.

814

:

These multiple choice questions are hard

to write, but I think they're easier to

815

:

grade.

816

:

And I think they're testing something

that's a bit more focused.

817

:

It's very easy to write open-ended

questions and not know what you're

818

:

testing.

819

:

True.

820

:

Yeah.

821

:

Yeah.

822

:

It's a bit more like astrology, where you

always find something you're satisfied

823

:

about.

824

:

Yeah, yeah, exactly.

825

:

And it also encourages a certain behavior

among students to just keep writing and

826

:

trying to like touch all the bases.

827

:

True.

828

:

Yeah, yeah.

829

:

As a pure product of the French

educational system, I can tell you open

830

:

ended questions are like my bread and

butter.

831

:

I've been trained at that a lot.

832

:

So if someone have to answer, like I have

a weird feeling of familiarity and that...

833

:

At the same time, I like it and I dread

it.

834

:

So that's what...

835

:

Many years ago, I taught a class in France

and the students are supposed to do

836

:

projects and it just happened.

837

:

Yeah, everybody's busy.

838

:

So one of the groups did, they did

nothing.

839

:

They turned something in, which was pretty

much they had just like, it wasn't

840

:

plagiarized, but they had just copied

stuff from the internet.

841

:

Like, you know, they just literally copied

some images and it was essentially

842

:

nothing.

843

:

So I talked to the...

844

:

The head instructor of the class, I said,

well, I want to give them a two out of 20

845

:

on this.

846

:

Like, I guess, you know, I, I, maybe I

don't give them zero because they wrote

847

:

a sentence or two, but like, can I, can

I give them a two out of 20?

848

:

He said, well, yeah, you're giving the

grade.

849

:

I said, in the U.S., if you want to give

someone a low grade, you have to ask for

850

:

permission because you're afraid they

might sue you or complain or something.

851

:

And, but he said, no, in France, you can

give people, you know, two out of 20.

852

:

They might even think it's a good grade.

853

:

So it is a different...

854

:

French system is a little more rough in

how the grading goes.

855

:

I don't remember that.

856

:

Yeah.

857

:

I mean, it depends.

858

:

I don't know at what level you're

teaching, but if you're teaching in the...

859

:

especially in the classes préparatoires, you

know, so that weird stuff we have in

860

:

between high school and universities.

861

:

These were graduate students.

862

:

Yeah.

863

:

So you can definitely do that.

864

:

I know I was like my first philosophy...

865

:

dissertations when I was in the class,

were absolutely a disaster.

866

:

Um, it was, that was, I think I got four

out of 20, something like that.

867

:

And that was not even the worst grades.

868

:

You know how like in gymnastics, like it's

like 9.8, 9.9, 9.93, like that, like

869

:

the grading system did that.

870

:

But statistics is, it's really hard.

871

:

Like I think real world problems, I

wouldn't give myself.

872

:

a 20 out of 20 in my analysis, because if

you're doing an experiment in political

873

:

science or psychology or economics or an

observational study, everybody knows about

874

:

identification being a difficulty, but

there's a lot of other difficulties.

875

:

So usually if you're doing a causal study,

you wanna have within-person comparisons,

876

:

or in political science or economics, it

would be called panel study.

877

:

You wanna have...

878

:

Ideally, you do the treatment and the

control on each person.

879

:

But if you can't do that, you want to make

comparisons.

880

:

That's super important, partly for

statistical efficiency and for balance.

881

:

And it's also kind of a measurement issue

because measurements can be biased and

882

:

biases can actually like the treatment

effect.

883

:

The treatment can affect the measurement

bias and you can even have treatments that

884

:

affect the measurement bias without

affecting the outcome.

885

:

Like, it's a naive view that if you just

886

:

give randomly assigned treatment and

control, then you have a kosher estimate of

887

:

the causal effect. That's not really right

in general, because that assumes that the

888

:

measurement bias doesn't vary with the

treatment, and that's often a mistake.
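A small simulation can make this concrete. In the sketch below, which uses entirely made-up numbers, the treatment has no effect on the real outcome but shifts how the outcome is measured (for example, treated people report higher values). Random assignment does not protect against this: the difference in measured means recovers the measurement shift, not the true effect.

```python
import random
import statistics

rng = random.Random(6)
n = 4000
true_effect = 0.0       # treatment does nothing to the real outcome (assumed)
reporting_shift = 0.5   # but it adds 0.5 to what gets measured (assumed)

treated_measured, control_measured = [], []
for _ in range(n):
    treated = rng.random() < 0.5                      # randomized assignment
    outcome = rng.gauss(0, 1) + (true_effect if treated else 0.0)
    measured = outcome + (reporting_shift if treated else 0.0)
    (treated_measured if treated else control_measured).append(measured)

estimate = statistics.fmean(treated_measured) - statistics.fmean(control_measured)
print(f"true effect: {true_effect}, estimated from measurements: {estimate:.2f}")
```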

889

:

So you really want to have panel structure

or repeated measurements, within-person

890

:

designs.

891

:

That means you want to start fitting

multilevel models.

892

:

So if you don't have a lot of observations

or a lot of groups, then your inferences

893

:

can depend on the prior, which it really

does.

894

:

You can't, you could act really tough and

say, oh, I'm really tough.

895

:

I'm not using a prior, but then it just

means your inference is really noisy.

896

:

And that's, that's not good either.

897

:

It means you can get bad things.

898

:

And then what predictors to include in

theory, everything should be interacted

899

:

with everything because otherwise that can

induce bias.

900

:

But in practice, if you do that, you have

a lot of the coefficients running around.

901

:

So even the simplest problems are like,

like there's no right way of doing it.

902

:

which gives me a lot of sympathy for

researchers.

903

:

And I know here we're not talking about

like the crisis in science, but I'll say

904

:

that like sometimes people will say that

you should pre-register your design and

905

:

analysis.

906

:

And I think that's great, but it's not

gonna solve a lot of problems because if I

907

:

don't know the right analysis to do, I

don't know what I'm supposed to be

908

:

pre-registering.

909

:

It's really difficult.

910

:

It's not, we can't just do better science

by just like.

911

:

Like there's this phrase, questionable

research practices.

912

:

Like it's not like you can just stop doing

questionable research practices and

913

:

everything will be okay.

914

:

It's not clear.

915

:

Doing it right is not just the absence of

making mistakes.

916

:

It's very difficult.

917

:

And so when we're teaching or when you're

learning, I'll say, cause I really would

918

:

like our book to be read by people who are

not necessarily teaching a class, but just

919

:

want to learn the stuff that when you're

learning, there is this.

920

:

weird thing where you have to learn the

skills and at the same time realize the

921

:

limitations.

922

:

And it is, it's hard to teach in that way.

923

:

It's not like, it's easier to teach

something like physics or chemistry where

924

:

you say, here's what we're doing.

925

:

And then later on, we're going to tell you

why these ideas aren't correct.

926

:

And we're going to do something more

elaborate in statistics.

927

:

It's hard to reach that like plateau where

you say, well, here's the basics, learn

928

:

the basics.

929

:

Once you're learning the basics, you keep

930

:

seeing all the problems at the same time.

931

:

So it makes it very fun to learn, but also

challenging.

932

:

Yeah, true.

933

:

Yeah.

934

:

And actually that makes me wonder, how do

you think, so for people who are going to

935

:

use your book for teaching, so

instructors, how can they adapt the

936

:

materials for different educational

settings like...

937

:

such as introductory course or more

advanced courses.

938

:

So it's set up for this class on applied

regression and causal inference.

939

:

So if you're teaching out of Regression

and Other Stories, it's very easy.

940

:

It just gives you a whole template for a

two semester class.

941

:

I've also taught a one semester version

where I just do one activity and each week

942

:

I have two of everything.

943

:

So instead I just pick one story, one

activity and so forth.

944

:

That's what actually I did.

945

:

Last semester, if it's a more advanced

class, and I would say, or or more basic,

946

:

if it's a more basic class, I think it's

still pretty much works.

947

:

You just have to simplify. The code

demonstrations are going to be way too

948

:

complicated for a more basic class.

949

:

But I think the stories work and the

activities work.

950

:

You just maybe have to change it a little.

951

:

So.

952

:

In two truths and a lie, you wouldn't do

logistic regression, but for example, you

953

:

could still make a scatter plot and you

could still compare the probability, the

954

:

proportion of correct guesses for people's

certainty scores higher than five or lower

955

:

than five.
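The simpler version for a more basic class can be sketched without any regression. The records below are made-up numbers for illustration, and grouping a score of exactly 5 with the lower group is an assumption, since the conversation only says "higher than five or lower than five".

```python
# (certainty score, guess correct?) pairs; made-up numbers for illustration only.
records = [(8, 1), (3, 0), (6, 1), (2, 1), (9, 1), (4, 0),
           (7, 0), (5, 0), (10, 1), (1, 0), (6, 1), (3, 1)]

def proportion_correct(rows):
    """Share of guesses in `rows` that were correct."""
    return sum(correct for _, correct in rows) / len(rows)

high = [r for r in records if r[0] > 5]    # certainty above 5
low = [r for r in records if r[0] <= 5]    # certainty 5 or below (assumed cutoff)

print(f"certainty > 5:  {proportion_correct(high):.2f} correct ({len(high)} guesses)")
print(f"certainty <= 5: {proportion_correct(low):.2f} correct ({len(low)} guesses)")
```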

956

:

You can adapt it.

957

:

I think a lot of the activities are like

that in the stories.

958

:

For more advanced class, I think again, it

works in the other direction that this can

959

:

be a starting point.

960

:

You give the story and...

961

:

And also people have their own stories.

962

:

So reading my story might help you as a

teacher, think of your own story and tell

963

:

it in the same way.

964

:

Yeah.

965

:

Okay.

966

:

Yeah, I see what you mean.

967

:

I'm thinking randomly.

968

:

It sounds like you would be interested,

Andrew, at some point, in writing some

969

:

fictional stats-based stories.

970

:

something like, I think Carl Sagan, right,

did write some science fiction.

971

:

Would you be like, do you see yourself

doing that at some point so that you are

972

:

forced to maybe not use any modeling or

things like that in the book and you have

973

:

to completely only tell stats through the

stories and all?

974

:

Well, well, fake data for sure.

975

:

I did have an idea.

976

:

I was thinking about having a book where

it's

977

:

all like it's learning statistics through

fake data simulation where everything is

978

:

just you just start with some very simple

things like everything that's like the

979

:

gimmick right the gimmick is here all the

principles of probability and statistics

980

:

and you're only you're not allowed to use

any real data you're only allowed to do

981

:

fake data simulation and you can cover a

lot like all sorts of things the the

982

:

attenuation of the coefficient, the

treatment effect when you have measurement

983

:

error in your predictor and

984

:

Like anyway, all sorts of things you might

want to cover.

985

:

You could do it that way.
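The attenuation example mentioned here can be tried as a small fake-data simulation. This is only a sketch under stated assumptions: the true slope of 2, the noise levels, and the helper names `ols_slope` and `simulate` are all made up for illustration.

```python
import random
import statistics

def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def simulate(n=20000, true_slope=2.0, noise_sd=1.0, meas_sd=1.0, seed=1):
    """Return the fitted slope using the true predictor and the noisy one."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0, 1) for _ in range(n)]
    y = [1.0 + true_slope * x + rng.gauss(0, noise_sd) for x in x_true]
    # The predictor we actually observe is measured with error.
    x_obs = [x + rng.gauss(0, meas_sd) for x in x_true]
    return ols_slope(x_true, y), ols_slope(x_obs, y)

slope_clean, slope_noisy = simulate()
print(f"slope using the true predictor:  {slope_clean:.2f}")
print(f"slope using the noisy predictor: {slope_noisy:.2f}")
```

With these assumed settings the predictor variance and the measurement-error variance are both 1, so theory says the slope on the noisy predictor should shrink to about half the true value.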

986

:

So I thought that would be fun.

987

:

Maybe a fun future book.

988

:

I mean, fiction, you know, Jessica and I

wrote a play, Jessica Hullman and I wrote a

989

:

play, Recursion, which is fiction.

990

:

It has a computer science theme.

991

:

It was performed at a computer science

conference recently.

992

:

So, so I guess, yeah, we have written

fiction.

993

:

It didn't really have, it had some

statistical principles in there.

994

:

There were, there were some, it had some.

995

:

Like we, yeah, I think we had some line

where one of the characters talked about

996

:

their code being beautiful, and then

somebody else said, code that runs is

997

:

beautiful.

998

:

And then somebody else says, code that

runs and you know it runs is beautiful.

999

:

So that's like some workflow principle.

:

00:56:19,447 --> 00:56:24,797

So we were able to put in some of our

thoughts about statistical workflow in

:

00:56:24,797 --> 00:56:25,777

fiction.

:

00:56:26,377 --> 00:56:27,997

So yeah, it's possible.

:

00:56:28,301 --> 00:56:30,001

I knew it.

:

00:56:30,001 --> 00:56:31,001

I knew it.

:

00:56:31,001 --> 00:56:31,721

Yeah.

:

00:56:31,801 --> 00:56:33,921

I love to hear that.

:

00:56:33,921 --> 00:56:35,401

I love to hear that.

:

00:56:35,401 --> 00:56:37,041

Read that book.

:

00:56:37,041 --> 00:56:43,301

And I was saying here, because I think,

and you could even record the audio

:

00:56:43,301 --> 00:56:44,381

version yourself.

:

00:56:44,381 --> 00:56:45,771

I think that'd be awesome.

:

00:56:45,771 --> 00:56:45,981

Yeah.

:

00:56:45,981 --> 00:56:49,221

Well, that performance apparently went

well, but they didn't video it.

:

00:56:49,221 --> 00:56:52,561

So we want to get it performed somewhere

else.

:

00:56:52,561 --> 00:56:53,741

Yeah.

:

00:56:54,101 --> 00:56:55,461

Well, let's try that.

:

00:56:55,461 --> 00:56:56,749

If there is...

:

00:56:56,749 --> 00:57:05,289

One day if I manage to do a live LBS

dinner, that should definitely be

:

00:57:05,289 --> 00:57:08,369

performed at that dinner.

:

00:57:08,449 --> 00:57:10,929

That's a must.

:

00:57:13,369 --> 00:57:21,769

Now I'd like to ask you something about, I

know a topic that's dear to your heart is

:

00:57:21,769 --> 00:57:25,609

visualization and its tie to

understanding.

:

00:57:25,889 --> 00:57:26,605

Because...

:

00:57:26,605 --> 00:57:32,465

the focus on visualization is a key aspect

of your book, Active Statistics.

:

00:57:32,465 --> 00:57:36,745

It's also a key aspect of almost all your

work.

:

00:57:36,885 --> 00:57:39,205

So I'd like to hear your thought about

that.

:

00:57:39,205 --> 00:57:46,185

How do you think visualization aids in the

comprehension of statistics and causal

:

00:57:46,185 --> 00:57:47,045

models?

:

00:57:47,365 --> 00:57:49,505

Well, so I'll talk about two things.

:

00:57:49,505 --> 00:57:54,381

First, visualization in teaching and

second, visualization in statistical.

:

00:57:54,381 --> 00:57:55,801

Like applied statistics.

:

00:57:55,801 --> 00:58:01,561

So with teaching, I think like I think the

deterministic part is usually the more

:

00:58:01,561 --> 00:58:02,521

important part of the model.

:

00:58:02,521 --> 00:58:05,661

So I want people to be able to visualize

what is the line?

:

00:58:05,661 --> 00:58:07,761

y equals a plus bx?

:

00:58:07,761 --> 00:58:10,241

What what does it look like if I have an

interaction?

:

00:58:10,241 --> 00:58:12,501

What would the two lines look like?

:

00:58:12,501 --> 00:58:15,681

What does a logistic curve look like?

:

00:58:16,201 --> 00:58:22,081

I I don't I think it's a mistake when

statistics books start with things like a

:

00:58:22,081 --> 00:58:23,021

histogram.

:

00:58:23,021 --> 00:58:25,761

Histogram is not fundamental.

:

00:58:25,761 --> 00:58:27,541

Actually, it's very confusing.

:

00:58:27,541 --> 00:58:35,301

I used to do this assignment where I would

say to students, gather between 30 and 50

:

00:58:35,301 --> 00:58:39,681

data points on anything and make a

histogram of it.

:

00:58:39,681 --> 00:58:42,271

And about half the students would do it.

:

00:58:42,271 --> 00:58:46,221

Like they might gather data on 30

countries or 50 states, or they might take

:

00:58:46,221 --> 00:58:49,541

30 observations of something and make a

histogram.

:

00:58:49,981 --> 00:58:52,127

The other half would.

:

00:58:52,141 --> 00:58:57,641

make a bar chart showing their 30

observations in time order.

:

00:58:57,641 --> 00:59:00,921

So it would be like, basically it was a

time series except it would just be

:

00:59:00,921 --> 00:59:03,241

displayed in bars because it was a

histogram.

:

00:59:03,241 --> 00:59:07,681

And so like you see the problem is that a

histogram is supposed to convey a

:

00:59:07,681 --> 00:59:11,041

distribution, but what people are getting

out of it is it looks like a bunch of bars

:

00:59:11,041 --> 00:59:13,341

and half the students didn't get the

point.

:

00:59:13,341 --> 00:59:16,781

The concept of a distribution is very

abstract because...

:

00:59:16,781 --> 00:59:21,621

The height of the bar represents the

number of cases or the proportion of

:

00:59:21,621 --> 00:59:22,781

cases.

:

00:59:22,781 --> 00:59:25,011

It's not like a scatter plot,

:

00:59:25,011 --> 00:59:27,041

which I think is actually more intuitive.

:

00:59:27,041 --> 00:59:31,781

But I noticed that statistics classes were

always focusing on that because, oh,

:

00:59:31,781 --> 00:59:33,251

histogram is one dimensional.

:

00:59:33,251 --> 00:59:35,011

What could be more simple than that?

:

00:59:35,011 --> 00:59:37,851

I think a time series is really much more

basic.

:

00:59:37,851 --> 00:59:42,941

So when it comes to plotting data, I think

we really have to get a little closer to

:

00:59:42,941 --> 00:59:44,525

what we care about.

:

00:59:44,525 --> 00:59:47,905

Um, a lot of just stupid stuff, like box

plots.

:

00:59:47,905 --> 00:59:48,565

I hate that.

:

00:59:48,565 --> 00:59:49,455

I hate that stuff.

:

00:59:49,455 --> 00:59:51,285

And it's like, I don't see it.

:

00:59:51,285 --> 00:59:55,725

It's just like, people just do things that

are conventional and I think are

:

00:59:55,725 --> 00:59:56,745

absolutely horrible.

:

00:59:56,745 --> 01:00:02,345

But anyway, all this focus on

distributions, I think the linear, the

:

01:00:02,345 --> 01:00:04,535

deterministic part of the model is more

important.

:

01:00:04,535 --> 01:00:07,505

And so that's what I try to convey.

:

01:00:07,565 --> 01:00:08,621

I do.

:

01:00:08,621 --> 01:00:12,841

One thing I noticed is that students will

learn stuff if it's on the homework and on

:

01:00:12,841 --> 01:00:13,411

the exam.

:

01:00:13,411 --> 01:00:17,681

They won't learn it just because it's on

the blackboard in class or in your slides.

:

01:00:17,681 --> 01:00:26,381

So I found that when I did my work, I

often make sketches of graphs.

:

01:00:26,641 --> 01:00:29,521

And so I require like I have homework

assignments where you have to make a

:

01:00:29,521 --> 01:00:32,761

sketch, sketch what you think it's going

to look like, then fit the model.

:

01:00:32,761 --> 01:00:35,521

Because if you don't ask people to do

that, they won't.

:

01:00:35,521 --> 01:00:37,965

So teaching has to be.

:

01:00:37,965 --> 01:00:42,505

Like you want people to actually practice

that kind of workflow.

:

01:00:42,805 --> 01:00:45,985

So that's then I had something else to

say, but I won't.

:

01:00:45,985 --> 01:00:49,765

We can say it another time about

statistical graphics.

:

01:00:49,885 --> 01:00:51,765

It's already kind of going on a little

bit.

:

01:00:51,765 --> 01:00:55,825

So if we ever talk about statistical

graphics again, just ask me to tell you

:

01:00:55,825 --> 01:00:59,925

what I think is this really super

important aspect of statistical graphics

:

01:00:59,925 --> 01:01:01,805

within statistical inference.

:

01:01:01,805 --> 01:01:03,705

And I'll tell you about that.

:

01:01:03,705 --> 01:01:04,725

Okay, perfect.

:

01:01:04,725 --> 01:01:06,669

Well, definitely.

:

01:01:06,669 --> 01:01:08,829

Definitely tell you.

:

01:01:08,829 --> 01:01:12,849

Do you still have time for one or two

questions or should we?

:

01:01:12,849 --> 01:01:13,429

Yeah, sure.

:

01:01:13,429 --> 01:01:15,969

I have time for one or two questions,

sure.

:

01:01:15,969 --> 01:01:16,789

Okay, awesome.

:

01:01:16,789 --> 01:01:18,849

Let's continue.

:

01:01:18,849 --> 01:01:22,809

I'm curious about that.

:

01:01:23,169 --> 01:01:30,749

How do you handle the distinction and or

the transition from regression analysis to

:

01:01:30,749 --> 01:01:31,709

causal inference?

:

01:01:31,709 --> 01:01:36,429

How do you navigate these two topics in

the classroom setting?

:

01:01:36,429 --> 01:01:41,809

to ensure that students grasp both

concepts effectively.

:

01:01:42,109 --> 01:01:43,389

So I overlap.

:

01:01:43,389 --> 01:01:48,469

So I start talking about causal inference

at the very beginning, partly because they

:

01:01:48,469 --> 01:01:49,689

can't avoid it.

:

01:01:49,689 --> 01:01:51,949

So we'll have a regression.

:

01:01:52,408 --> 01:01:57,489

Maybe you fit one of the examples we use

in Regression and Other Stories is predicting

:

01:01:57,489 --> 01:02:00,949

from some survey, predicting earnings from

height.

:

01:02:00,949 --> 01:02:04,709

Taller people make a little bit more money

than...

:

01:02:05,101 --> 01:02:11,461

shorter people and then you can also you

can throw sex into the model and men make

:

01:02:11,461 --> 01:02:13,061

more money than women taller men.

:

01:02:13,061 --> 01:02:15,621

So you can say how do you interpret the

coefficient of height?

:

01:02:15,621 --> 01:02:19,791

Well if you're one, you know for every

inch taller you make this much more money.

:

01:02:19,791 --> 01:02:20,921

So that's not right.

:

01:02:20,921 --> 01:02:26,701

You have to say comparing two people of

the same sex one of whom is one inch

:

01:02:26,701 --> 01:02:32,921

taller than the other under the model on

average the taller person will be making

:

01:02:32,921 --> 01:02:34,061

this much more money.

:

01:02:34,061 --> 01:02:35,941

So what are the things you need to say?

:

01:02:35,941 --> 01:02:38,801

You have to say comparing, because it's

all comparative.

:

01:02:38,801 --> 01:02:40,701

There's no causal language.

:

01:02:40,701 --> 01:02:46,661

You have to say, on average, you have to

say according to the model.

:

01:02:46,661 --> 01:02:52,981

And you have to say not controlling for

blah, blah, but comparing two people who

:

01:02:52,981 --> 01:02:54,721

are the same in these other predictors.

:

01:02:54,721 --> 01:02:56,621

You're not holding everything else

constant.

:

01:02:56,621 --> 01:02:58,157

You're doing this comparison.
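A fake-data sketch of this comparative reading of a coefficient. Every number below (the coefficients, the noise, the sample size) is invented for illustration, and `fit_ols` is a small helper written for this sketch rather than a library function.

```python
import random

def fit_ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

rng = random.Random(42)
X, X_naive, y = [], [], []
for _ in range(5000):
    male = 1 if rng.random() < 0.5 else 0
    height = 64 + 5.5 * male + rng.gauss(0, 3)          # inches, assumed
    earnings = 10000 + 500 * height + 8000 * male + rng.gauss(0, 10000)
    X.append([1.0, height, male])       # intercept, height, sex
    X_naive.append([1.0, height])       # intercept, height only
    y.append(earnings)

b0, b_height, b_male = fit_ols(X, y)
_, b_height_naive = fit_ols(X_naive, y)

print("Comparing two people of the same sex who differ by one inch in height,")
print(f"under the model the taller one earns on average about {b_height:.0f} more.")
print(f"Leaving sex out of the model, the height coefficient is {b_height_naive:.0f}.")
```

The two height coefficients answer different comparisons, which is why the wording "comparing two people of the same sex" matters.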

:

01:02:58,157 --> 01:03:01,777

So I do this, I have a drill in class

where they have to do it.

:

01:03:01,777 --> 01:03:02,937

And then they laugh.

:

01:03:02,937 --> 01:03:07,137

It's like a joke as I say, here's a

regression, explain each coefficient in

:

01:03:07,137 --> 01:03:07,637

words.

:

01:03:07,637 --> 01:03:11,117

And they say, like, what's the coefficient

of the intercept of this model?

:

01:03:11,117 --> 01:03:14,337

It's like something I'm predicting

something as a function of time.

:

01:03:14,337 --> 01:03:18,497

So this says in the year Jesus was born,

this is well, that's the intercept right

:

01:03:18,497 --> 01:03:19,977

at year zero.

:

01:03:19,981 --> 01:03:22,081

So is that interpretable?

:

01:03:22,081 --> 01:03:23,501

Well, maybe it's interpretable.

:

01:03:23,501 --> 01:03:27,741

have a time series going from:

to 2000, maybe we're not particularly

:

01:03:27,741 --> 01:03:30,661

interested in what happened when the year

Jesus was born.

:

01:03:30,661 --> 01:03:34,261

That's a bit of an extrapolation that

implies.

:

01:03:34,381 --> 01:03:35,981

So, but same with the coefficient.

:

01:03:35,981 --> 01:03:37,421

So it's like a joke in class.

:

01:03:37,421 --> 01:03:42,361

It's a fun inside joke we have in class

that I'll ask them to explain the

:

01:03:42,361 --> 01:03:47,381

regression coefficient and they have to

say it without using the wrong language.

:

01:03:47,381 --> 01:03:48,865

And it's like,

:

01:03:48,941 --> 01:03:53,161

It's like the game you play as a kid where

like you're not like you say like you're

:

01:03:53,161 --> 01:03:54,281

not allowed to say the word no.

:

01:03:54,281 --> 01:03:55,741

If you say the word no, you lose.

:

01:03:55,741 --> 01:03:58,061

You have to figure out a way to decline.

:

01:03:58,061 --> 01:04:00,561

Will you give me your cake?

:

01:04:00,561 --> 01:04:02,921

I choose not to give you your cake.

:

01:04:02,921 --> 01:04:05,661

You know, like I choose to do something

else or whatever.

:

01:04:05,661 --> 01:04:08,441

So similarly, you're not allowed to use

this word.

:

01:04:08,441 --> 01:04:13,901

And so right away, we're introducing the

idea that causation is important.

:

01:04:13,901 --> 01:04:15,021

And.

:

01:04:15,021 --> 01:04:18,961

Then when we get to causal inference,

well, we have regression already.

:

01:04:18,961 --> 01:04:23,541

So we use that not for controlling for

things, but for adjusting for things.

:

01:04:23,541 --> 01:04:27,521

So we've already done non-causal

examples, like the survey example, where

:

01:04:27,521 --> 01:04:31,441

we adjust for differences in order to

poststratify.

:

01:04:31,441 --> 01:04:34,111

So then it fits in.

:

01:04:34,111 --> 01:04:38,961

So there's a lot of specific things about

causal inference, but the first half is we

:

01:04:38,961 --> 01:04:40,161

don't cheat at the beginning.

:

01:04:40,161 --> 01:04:42,661

We don't pretend to be causal when we're

not.

:

01:04:42,661 --> 01:04:44,749

Then when we get to causal inference,

:

01:04:44,749 --> 01:04:49,009

We make use of what we've already done

rather than treating it as an entirely new

:

01:04:49,009 --> 01:04:49,969

topic.

:

01:04:49,969 --> 01:04:56,309

My little particular pet thing is that the

usual way causal inference is taught is

:

01:04:56,309 --> 01:04:58,469

there's an outcome and a treatment.

:

01:04:58,469 --> 01:05:00,868

And some people get the treatment, some

get the control.

:

01:05:00,868 --> 01:05:05,379

I say the basic is there's a pre-test

measurement, a treatment, and an outcome,

:

01:05:05,379 --> 01:05:06,729

and that's in time order.

:

01:05:06,729 --> 01:05:08,249

So it introduces time.

:

01:05:08,249 --> 01:05:11,169

You don't have to have a pre-test, but

you should.

:

01:05:11,169 --> 01:05:13,805

And so it's good practice, but also it...

:

01:05:13,805 --> 01:05:18,545

It puts you into the regression framework

already, which is helpful.
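The pre-test, treatment, outcome setup can be explored with fake data. The sketch below assumes a randomized treatment with a true effect of 5, a made-up pre-test scale, and made-up noise levels; `fit_ols` and `one_experiment` are helpers written only for this illustration. It repeats the experiment many times and compares a plain difference in means with a regression that adjusts for the pre-test.

```python
import random
import statistics

def fit_ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def one_experiment(rng, n=200, true_effect=5.0):
    """Simulate pre-test, random treatment, outcome; return two estimates."""
    pre, z, post = [], [], []
    for _ in range(n):
        p = rng.gauss(50, 10)                  # pre-test, measured first
        t = 1 if rng.random() < 0.5 else 0     # treatment, assigned at random
        pre.append(p)
        z.append(t)
        post.append(10 + 0.8 * p + true_effect * t + rng.gauss(0, 5))
    treated = [y for y, t in zip(post, z) if t == 1]
    control = [y for y, t in zip(post, z) if t == 0]
    diff_means = statistics.fmean(treated) - statistics.fmean(control)
    # Regression of outcome on treatment, adjusting for the pre-test.
    X = [[1.0, t, p] for t, p in zip(z, pre)]
    adjusted = fit_ols(X, post)[1]
    return diff_means, adjusted

rng = random.Random(7)
results = [one_experiment(rng) for _ in range(200)]
diff_estimates = [r[0] for r in results]
adj_estimates = [r[1] for r in results]
sd_diff = statistics.stdev(diff_estimates)
sd_adj = statistics.stdev(adj_estimates)

print(f"difference in means: mean {statistics.fmean(diff_estimates):.2f}, sd {sd_diff:.2f}")
print(f"adjusted for pre-test: mean {statistics.fmean(adj_estimates):.2f}, sd {sd_adj:.2f}")
```

Under these assumptions both estimates center on the true effect, and the one that adjusts for the pre-test varies less from experiment to experiment.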

:

01:05:19,285 --> 01:05:22,505

So sometimes things that are too simple

are harder to understand.

:

01:05:22,505 --> 01:05:25,005

A little context can help.

:

01:05:25,805 --> 01:05:31,265

Yeah, I found so the...

:

01:05:31,265 --> 01:05:40,205

The directed graphs do help quite a lot

in teaching the causal inference concepts,

:

01:05:40,225 --> 01:05:43,237

especially because you can then...

:

01:05:43,277 --> 01:05:47,237

marry that with the graphical

representation of the Bayesian model that

:

01:05:47,237 --> 01:05:48,557

you can come up with.

:

01:05:48,557 --> 01:05:50,717

And then you use simulated data.

:

01:05:50,717 --> 01:05:56,257

You can come up with the model, then write

the model, and then just simulate data and

:

01:05:56,257 --> 01:05:58,327

see what the model tells you.

:

01:05:58,327 --> 01:06:04,597

And if it's able to recover the true

parameters, I find these fit pretty well

:

01:06:04,597 --> 01:06:06,283

together in the workflow.

:

01:06:08,525 --> 01:06:09,505

Good.

:

01:06:09,505 --> 01:06:14,685

Yeah, I think there's a lot of different

ways of teaching these things and using

:

01:06:14,685 --> 01:06:15,525

these.

:

01:06:15,525 --> 01:06:19,465

There are different frameworks that can

work well.

:

01:06:19,465 --> 01:06:23,005

And I think that's good that that's the

case.

:

01:06:23,005 --> 01:06:28,045

There's more than one way of explaining

things and understanding things.

:

01:06:28,045 --> 01:06:30,025

Yeah, true.

:

01:06:30,345 --> 01:06:37,389

Actually, I'm curious, based on the

methodologies and...

:

01:06:37,389 --> 01:06:43,369

Also, the philosophies that you present in

Active Statistics, how do you see the

:

01:06:43,369 --> 01:06:49,289

future of statistical education evolving,

particularly with the advent of new

:

01:06:49,289 --> 01:06:50,609

technologies?

:

01:06:50,669 --> 01:06:54,429

And how do you see that play out in the

coming years?

:

01:06:54,449 --> 01:06:55,689

I don't know.

:

01:06:55,689 --> 01:07:00,529

I mean, I'm still unhappy with how

statistics is usually taught.

:

01:07:00,529 --> 01:07:03,853

So introductory statistics, it's really

been...

:

01:07:03,853 --> 01:07:08,653

Like the textbooks now are almost all

pretty much the same as the textbooks from

:

01:07:08,653 --> 01:07:09,813

40 years ago.

:

01:07:09,813 --> 01:07:17,113

I mean, they look different, but it's

based on this thing where they teach, like

:

01:07:17,113 --> 01:07:21,753

there is this, they teach these

distributions and it, so it starts by

:

01:07:21,753 --> 01:07:27,013

focusing on variation, which I think is

not even really quite right.

:

01:07:27,013 --> 01:07:30,653

And then, it's not really focusing on the

questions that are being asked, it's

:

01:07:30,653 --> 01:07:32,397

really focused on the error term.

:

01:07:32,397 --> 01:07:37,617

And then there's all this stuff about the

sampling distribution of the sample mean,

:

01:07:37,617 --> 01:07:38,957

which is just kind of weird.

:

01:07:38,957 --> 01:07:43,537

Nobody cares about the sample mean and or

rarely do.

:

01:07:43,537 --> 01:07:46,597

It becomes very abstract and hard to

follow.

:

01:07:46,597 --> 01:07:50,897

And then there are these like confidence

intervals, like a huge amount of work to

:

01:07:50,897 --> 01:07:55,177

create these little summaries that you

don't really want to be using along with a

:

01:07:55,177 --> 01:07:56,057

bunch of messages.

:

01:07:56,057 --> 01:07:58,297

If you don't have random assignment,

you're screwed.

:

01:07:58,297 --> 01:08:00,653

If you don't have random sampling, you're

screwed.

:

01:08:00,653 --> 01:08:04,413

Then at the end, there's some stuff like

regression and chi-squared tests and

:

01:08:04,413 --> 01:08:06,013

things that people do.

:

01:08:06,013 --> 01:08:08,672

And it's just kind of a disaster.

:

01:08:08,672 --> 01:08:09,853

I really, I really hate it.

:

01:08:09,853 --> 01:08:14,363

And I, I would like things to be much more

focused on the questions being asked.

:

01:08:14,363 --> 01:08:18,473

It's hard for me to think exactly how to

construct the introductory class to do

:

01:08:18,473 --> 01:08:19,133

this.

:

01:08:19,133 --> 01:08:22,853

But for the second class in statistics,

like the one that we teach on applied

:

01:08:22,853 --> 01:08:26,853

regression and causal inference, I do like

how we do it in Regression and Other

:

01:08:26,853 --> 01:08:27,273

Stories.

:

01:08:27,273 --> 01:08:30,221

I feel like we developed through the

models.

:

01:08:30,221 --> 01:08:32,901

in a way that makes sense.

:

01:08:33,061 --> 01:08:35,441

I try to do that in Active Statistics.

:

01:08:35,441 --> 01:08:40,881

But really, the most important part of

teaching are the most basic classes.

:

01:08:42,221 --> 01:08:47,060

And there, we're still working on how to

do that.

:

01:08:47,121 --> 01:08:51,201

So I don't really know what the future is.

:

01:08:51,201 --> 01:08:57,181

There's a lot of statistics and machine

learning methods out there, but a lot

:

01:08:57,181 --> 01:08:57,925

of...

:

01:08:58,189 --> 01:09:02,369

basic concepts, of course, are still

coming up no matter how you do it, like

:

01:09:02,369 --> 01:09:07,509

issues of adjustment and bias and

variation.

:

01:09:07,829 --> 01:09:12,069

So it's hard, it is hard to get it all

like feel like it's all in one place.

:

01:09:12,069 --> 01:09:12,889

It's frustrating.

:

01:09:12,889 --> 01:09:13,549

Yeah.

:

01:09:13,549 --> 01:09:14,229

Yeah.

:

01:09:14,229 --> 01:09:14,749

Yeah.

:

01:09:14,749 --> 01:09:15,869

Now I agree with that.

:

01:09:15,869 --> 01:09:20,569

I'm also asking the question because I'm

pretty curious about it because I'm also

:

01:09:20,569 --> 01:09:24,169

personally a bit lost when I start

thinking about these things.

:

01:09:24,169 --> 01:09:25,389

It's so cute.

:

01:09:25,389 --> 01:09:25,901

And, uh,

:

01:09:25,901 --> 01:09:31,921

Like for now, I don't have a clear

organization in my head, you know.

:

01:09:32,281 --> 01:09:38,121

Maybe one last question for you, Andrew,

before I let you go, because you've

:

01:09:38,121 --> 01:09:43,241

already been extremely generous with your

time and you know me, I could really

:

01:09:43,241 --> 01:09:44,941

interview you for like three hours, no

problem.

:

01:09:44,941 --> 01:09:46,781

I have so many questions.

:

01:09:46,901 --> 01:09:50,521

But maybe what's next for you?

:

01:09:50,521 --> 01:09:55,575

What are your coming projects in maybe in

the, in this coming year?

:

01:09:56,461 --> 01:09:58,361

Well, we're trying to finish.

:

01:09:58,361 --> 01:10:04,421

Well, Aki and I are trying to finish our

Bayesian workflow book, and we'd like to

:

01:10:04,421 --> 01:10:07,761

do our advanced regression and multilevel

models book.

:

01:10:07,761 --> 01:10:12,281

It would be fun to get Recursion performed

somewhere by some university theater group

:

01:10:12,281 --> 01:10:13,261

somewhere.

:

01:10:13,261 --> 01:10:22,481

Doing this research on combining, you

know, multilevel regression and

:

01:10:22,481 --> 01:10:24,229

poststratification

:

01:10:24,269 --> 01:10:27,529

with sampling weights, which I think is

really important.

:

01:10:27,529 --> 01:10:31,709

And I think also this could be useful for

causal inference too, because people use

:

01:10:31,709 --> 01:10:32,729

weighting there.

:

01:10:32,729 --> 01:10:39,489

So that's probably the one project I'm

most excited about from that direction.

:

01:10:39,889 --> 01:10:42,109

And then we're trying to write.

:

01:10:42,109 --> 01:10:45,149

I have a list.

:

01:10:45,149 --> 01:10:49,089

I have on my web page, I have a list of

published, unpublished, and unwritten

:

01:10:49,089 --> 01:10:50,609

research articles.

:

01:10:50,609 --> 01:10:53,229

So the unwritten is a list of like,

:

01:10:53,229 --> 01:10:56,349

things that I want to do or write up.

:

01:10:56,349 --> 01:10:57,809

So there's a long list of that.

:

01:10:57,809 --> 01:11:00,009

I'm collaborating with an economist.

:

01:11:00,029 --> 01:11:07,809

We're trying to create a unified framework

for causal inference for panel data, which

:

01:11:07,809 --> 01:11:12,749

really includes things like before-after

studies and regression discontinuities and

:

01:11:12,749 --> 01:11:19,789

difference-in-differences and just regular

regression, time series.

:

01:11:19,789 --> 01:11:21,293

I have a...

:

01:11:21,293 --> 01:11:24,973

Like just as a simple example, if you're

doing linear regression, like you have a

:

01:11:24,973 --> 01:11:29,233

pretest, you regress, you condition on the

pretest, you adjust for that, really.

:

01:11:29,233 --> 01:11:34,153

But if you have a, usually things in Econ,

like things are measured with error.

:

01:11:34,153 --> 01:11:37,343

And so you won't really want to regress on

the pretest.

:

01:11:37,343 --> 01:11:40,833

What you really want to do is regress on

the latent value that the pretest is a

:

01:11:40,833 --> 01:11:41,873

measurement of.

:

01:11:41,873 --> 01:11:43,633

Well, you can do that in Stan now.

:

01:11:43,633 --> 01:11:48,393

So now in Stan, you can write these models

and do Bayesian models with latent

:

01:11:48,393 --> 01:11:49,613

variables and.

:

01:11:49,613 --> 01:11:54,773

I think there's some theoretical results

to be done to show how or see how these

:

01:11:54,773 --> 01:11:57,653

things reduce to other things in special

cases.

:

01:11:57,653 --> 01:12:02,773

It's a little related to my chickens paper

that I did a couple of years ago, which I

:

01:12:02,773 --> 01:12:04,233

really enjoyed.

:

01:12:04,233 --> 01:12:06,693

That's another story.

:

01:12:06,853 --> 01:12:11,833

The chicken story is not in the Active

Statistics book.

:

01:12:11,933 --> 01:12:15,973

I don't think it's like there's more

stories.

:

01:12:15,973 --> 01:12:19,349

There's room for another 52 stories, I'm

sure.

:

01:12:19,725 --> 01:12:21,285

in the future.

:

01:12:22,125 --> 01:12:23,525

Yeah, for sure.

:

01:12:23,905 --> 01:12:29,825

And the, yeah, we should link to your

chicken paper, actually, in the show

:

01:12:29,825 --> 01:12:29,855

notes.

:

01:12:29,855 --> 01:12:30,945

I like the chicken paper.

:

01:12:30,945 --> 01:12:33,525

It's not the world's most readable.

:

01:12:33,525 --> 01:12:35,585

I mean, it's technical, but I like it.

:

01:12:35,585 --> 01:12:36,425

It's Bayesian.

:

01:12:36,425 --> 01:12:37,685

It's good.

:

01:12:37,945 --> 01:12:38,845

Yeah.

:

01:12:38,985 --> 01:12:43,665

Is it, are you referencing the one from

:

:

01:12:44,745 --> 01:12:45,125

Or is that...

:

01:12:45,125 --> 01:12:46,795

Yeah, yeah.

:

01:12:46,795 --> 01:12:48,205

Slamming the sham.

:

01:12:48,205 --> 01:12:52,164

A Bayesian model for adaptive adjustment

with noisy control data.

:

01:12:52,164 --> 01:12:56,045

Yeah, it's published in Statistics in

Medicine, which is, like, a journal nobody

:

01:12:56,045 --> 01:12:57,125

reads.

:

01:12:57,125 --> 01:12:58,315

But what can you do?

:

01:12:58,315 --> 01:13:00,365

I guess nobody reads any journal anymore.

:

01:13:00,365 --> 01:13:02,305

So that's fine, perhaps.

:

01:13:02,305 --> 01:13:04,665

Nobody reads anything.

:

01:13:05,265 --> 01:13:06,535

Nobody reads anything.

:

01:13:06,535 --> 01:13:08,605

They're too busy reading stuff.

:

01:13:09,605 --> 01:13:14,285

Yeah, I mean, definitely that's why it's

very good that you come on the show.

:

01:13:14,285 --> 01:13:16,845

And also that you write these books.

:

01:13:17,133 --> 01:13:21,192

I think it's extremely important because

definitely the general public doesn't read

:

01:13:21,192 --> 01:13:22,233

papers.

:

01:13:22,493 --> 01:13:27,533

I know I do read papers, but it's mainly

because I have to for my job.

:

01:13:27,533 --> 01:13:34,353

I almost never read a paper for pleasure

because it's just like, yeah, the way it's

:

01:13:34,353 --> 01:13:39,153

written is just like so dry, you know, and

I really love a story, as you were saying.

:

01:13:39,153 --> 01:13:42,973

That's also why I really love your

writings in your books, in your blog,

:

01:13:42,973 --> 01:13:46,605

because it's always wrapped.

:

01:13:46,605 --> 01:13:51,835

in a story and in a context and the papers

are mainly just, okay, this is the result,

:

01:13:51,835 --> 01:13:56,205

this is what we're doing, but it's just

too dry to me and so I'm not reading

:

01:13:56,205 --> 01:14:00,285

that when I'm trying to just read for fun,

you know.

:

01:14:00,445 --> 01:14:04,985

But yeah, awesome, well thanks a lot

Andrew.

:

01:14:04,985 --> 01:14:11,745

I will, that being said, I will link to

this chicken paper in the show notes for

:

01:14:11,745 --> 01:14:13,665

people who want to dig deeper.

:

01:14:14,065 --> 01:14:16,013

Thank you so much Andrew for...

:

01:14:16,013 --> 01:14:20,093

again, taking the time and being on this

show.

:

01:14:20,693 --> 01:14:28,053

Two patrons will have the chance of

receiving for free a hard copy of your

:

01:14:28,053 --> 01:14:29,613

book, thanks to your editor.

:

01:14:29,613 --> 01:14:33,213

So thank you so much, Cambridge University

Press.

:

01:14:33,693 --> 01:14:39,833

And in the show notes, you will have the

links also to buy the book on the

:

01:14:39,833 --> 01:14:42,633

Cambridge University Press website.

:

01:14:42,873 --> 01:14:43,085

So...

:

01:14:43,085 --> 01:14:44,625

Go ahead and do that.

:

01:14:44,625 --> 01:14:50,605

You have a 20% discount active until July

,:

:

01:14:51,385 --> 01:14:57,625

The code is in the show notes of this

episode, so definitely go there.

:

01:14:57,905 --> 01:14:59,145

And buy Andrew's book.

:

01:14:59,145 --> 01:15:03,725

This one is really fun and you can read it

on the beach this summer, you know, and

:

01:15:03,725 --> 01:15:09,025

then you'll have a lot of cool stories to

tell your children or at the bar at night,

:

01:15:09,025 --> 01:15:10,945

so definitely do that.

:

01:15:11,405 --> 01:15:16,245

Thanks again, Andrew, and of course,

welcome back on the show anytime you

:

01:15:16,245 --> 01:15:20,645

finish your 15 upcoming books.

:

01:15:21,565 --> 01:15:25,845

Merci encore pour l'opportunité de parler

avec toi.

:

01:15:25,945 --> 01:15:30,865

Perfect, as you can hear, Andrew speaks

very good French.

:

01:15:34,989 --> 01:15:38,729

This has been another episode of Learning

Bayesian Statistics.

:

01:15:38,729 --> 01:15:43,689

Be sure to rate, review, and follow the

show on your favorite podcatcher, and

:

01:15:43,689 --> 01:15:48,609

visit learnbayesstats.com for more

resources about today's topics, as well as

:

01:15:48,609 --> 01:15:53,349

access to more episodes to help you reach

true Bayesian state of mind.

:

01:15:53,349 --> 01:15:55,259

That's learnbayesstats.com.

:

01:15:55,259 --> 01:16:00,119

Our theme music is Good Bayesian by Baba

Brinkman, feat. MC Lars and Mega Ran.

:

01:16:00,119 --> 01:16:03,279

Check out his awesome work at

bababrinkman.com.

:

01:16:03,279 --> 01:16:04,429

I'm your host,

:

01:16:04,429 --> 01:16:05,429

Alex Andorra.

:

01:16:05,429 --> 01:16:09,669

You can follow me on Twitter at alex_andorra,

like the country.

:

01:16:09,669 --> 01:16:14,749

You can support the show and unlock

exclusive benefits by visiting

:

01:16:14,749 --> 01:16:16,929

patreon.com/learnbayesstats.

:

01:16:16,929 --> 01:16:19,389

Thank you so much for listening and for

your support.

:

01:16:19,389 --> 01:16:25,269

You're truly a good Bayesian change your

predictions after taking information and

:

01:16:25,269 --> 01:16:28,569

if you're thinking I'll be less than

amazing.

:

01:16:28,569 --> 01:16:31,725

Let's adjust those expectations.

:

01:16:31,725 --> 01:16:37,145

Let me show you how to be a good Bayesian

Change calculations after taking fresh

:

01:16:37,145 --> 01:16:43,185

data in Those predictions that your brain

is making Let's get them on a solid

:

01:16:43,185 --> 01:16:44,965

foundation
