Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
In this episode, I had the pleasure of speaking with Allen Downey, a professor emeritus at Olin College and a curriculum designer at Brilliant.org. Allen is a renowned author in the fields of programming and data science, with books such as "Think Python" and "Think Bayes" to his credit. He also authors the blog "Probably Overthinking It" and has a new book by the same name, which he just released in December 2023.
In this conversation, we tried to help you differentiate between right and wrong ways of looking at statistical data, discussed the Overton paradox and the role of Bayesian thinking in it, and detailed a mysterious Bayesian killer app!
But that’s not all: we even addressed the claim that Bayesian and frequentist methods often yield the same results — and why it’s a false claim. If that doesn’t get you to listen, I don’t know what will!
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie and Cory Kiser.
Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag ;)
Links from the show:
Abstract
We are happy to welcome Allen Downey back to our show, and he has great news for us: his new book “Probably Overthinking It” is available now.
You might know Allen from his blog by the same name or his previous work. Or maybe you watched some of his educational videos which he produces in his new position at brilliant.org.
We delve right into exciting topics like collider bias and how it can explain the “low birth weight paradox” and other situations that only seem paradoxical at first, until you apply causal thinking to them.
Another classic Allen can demystify for us is Simpson’s paradox. The problem is not the data, but your expectations of the data. We talk about some cases of Simpson’s paradox, for example from statistics on the Covid-19 pandemic, also featured in his book.
We also cover the “Overton paradox” - which Allen named himself - on how people report their ideologies as liberal or conservative over time.
Next to causal thinking and statistical paradoxes, we return to the common claim that frequentist statistics and Bayesian statistics often give the same results. Allen explains that they are fundamentally different and that Bayesians should not shy away from pointing that out and emphasising the strengths of their methods.
Transcript
This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.
In this episode, I had the pleasure of speaking with Allen Downey, a professor emeritus at Olin College and a curriculum designer at brilliant.org. Allen is a renowned author in the fields of programming and data science, with books such as Think Python and Think Bayes to his credit. He also authors the blog Probably Overthinking It, and has a new book by the same name, which he just released in December 2023.

In this conversation, we tried to help you differentiate between right and wrong ways of looking at statistical data, we discussed the Overton paradox and the role of Bayesian thinking in it, and we detailed a mysterious Bayesian killer app.

But that is not all. We even addressed the claim that Bayesian and frequentist methods often yield the same results, and why it is a false claim. If that doesn't get you to listen, I don't know what will.

This is Learning Bayesian Statistics.
Hello, my dear Bayesians! I have two announcements for you today. First, congratulations to the 10 patrons who won a digital copy of Allen's new book. The publisher will soon get in touch and send you the link to your free digital copy. If you didn't win, well, you still won, because you get a 30% discount if you order with the discount code UCPNew from the University of Chicago Press website. I put the link in the show notes, of course.

Second, a huge thank you to Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie and Cory Kiser for supporting the show on Patreon. I can assure you, this is the best way to start the year. Thank you so much for your support. It literally makes this show possible, and it made my day.

Now onto the show with Allen Downey.
Show you how to be a good Bayesian and change your predictions...

Allen Downey, welcome back to Learning Bayesian Statistics.
Thank you. It's great to be here.

Yeah, thanks again for taking the time. And so, for people who know you already, or who are getting to know you: Allen was already on Learning Bayesian Statistics in episode 41. So if you are interested in a bit more detail about his background, and also much more about his previous book, Think Bayes, I recommend listening back to episode 41, which will be in the show notes. Today we'll focus on other topics, especially your new book, Allen. I don't know how you do that. But well done, congratulations on, again, another great book that's getting out. But first, maybe a bit more generally, how do you define the work that you're doing nowadays and the topics that you're particularly interested in?
It's a little hard to describe now, because I was a professor for more than 20 years, and then I left higher ed about a year, a year and a half ago. So now, for my day job, I'm at brilliant.org, and I am writing online lessons for them in programming and data science, which is great. I'm enjoying that.

Yeah. Sounds like fun.

It is. And then also working on these books and blogging. I think of it now as almost being like a gentleman scientist or an independent scientist. I think that's my real aspiration. I want to be an 18th century gentleman scientist.

I love that. Yeah, that sounds like a good objective. Yeah, it definitely sounds like fun.
It also sounds a bit similar to what I'm doing on my end, with the podcast and also the online courses for Intuitive Bayes. And I also teach a lot of the workshops at PyMC Labs. So yeah, a lot of teaching and educational content on my end too, which I really love. That's also why I do it. And it's fun because, most of the time, you start teaching a topic and that's a very good incentive to learn it in lots of detail, right? So lately I've been diving way more into Gaussian processes again, because this is a very fascinating topic, but quite complex. And causal inference, too: I've been reading up again on that. So it's been quite fun. What has been on your mind recently?
Well, you mentioned causal inference, and that is certainly a hot topic. It's one where I always feel I'm a little bit behind. I've been reading about it and written about it a little bit, but I still have a lot to learn. So it's an interesting topic.

Yeah, yeah. And the cool thing is that, honestly, when you're coming from the Bayesian framework, to me that feels extremely natural. Some concepts are the same, but they're just named differently, so you just have to make the connection in your brain. And some of them are somewhat new. But if you've been doing generative modeling for a while, then just coming up with the directed acyclic graph for your model, updating it from a generative perspective and doing counterfactual analysis is really what you already do in the Bayesian workflow. So that really helps you. To me, you already have the foundations, and you just have to add a bit of a toolbox to it, you know: OK, what's regression discontinuity design? What's interrupted time series? What's difference in differences? These are kind of just techniques that you add on top of the foundations, but the concepts are pretty easy to pick up if you've been a Bayesian for a while. I guess that's really the good news for people who are looking into that: it's not completely different from what you've been doing.

No, I think that's right. And in fact, I have a recommendation for people if they're coming from Bayes and getting into causal inference. Judea Pearl's book, The Book of Why, follows exactly the progression that you just described, because he starts with Bayesian nets and then says, well, no, actually, that's not quite sufficient: for doing causal inference, we need the next steps. So that was his professional progression, and it makes, I think, a good logical progression for learning these topics.
Yeah, exactly. And, funny enough, I've started rereading The Book of Why recently. I had read it like two, three years ago, and I'm reading it again because surely there are a lot of things that I didn't pick up at the time, didn't understand. And there is some stuff that is going to resonate with me more now that I have a bit more background, let's say. Or some other people would say more wrinkles on my forehead, but I don't know why they would say that. So, Allen, we're already getting off topic, but yeah, I really love that. The causal inference stuff has been fun. I'm teaching that next Tuesday: the first time I'm going to teach three hours of causal inference. That's going to be very fun. I can't wait for it. You try to study the topic from all the angles to consider, and then a student will come up with a question and you're like, huh, I did not think about that, let me come back to you. That's really the fun stuff to me.

As you say, I think every teacher has that experience, that you really learn something when you teach it.

Oh yeah. Yeah, yeah. I mean, definitely. That's really one of the best ways for me to learn. It gives me a deadline: first, I have to teach that stuff. And then, having a way of talking about the topic, whether that's teaching or presenting, is really one of the most efficient ways of learning, at least to me. Because I don't have the personal discipline to just learn for the sake of learning. That doesn't really happen for me.

Now, we might not be as off topic as you think, because I do have a little bit of causal inference in the new book.

Oh, yeah?
I've got a section that is about collider bias. And this is an example where, if you go back and read the literature in epidemiology, there is so much confusion. The low birth weight paradox was one of the first examples, and then the obesity paradox and the twin paradox. And they're all baffling if you think of them in terms of regression or statistical association. And then, once you draw the causal diagram and figure out that you have selected a sample based on a collider, the light bulb goes on and it's, oh, of course, now I get it. This is not a paradox at all. This is just another form of sampling bias.

What's a collider, for the, I was going to say the students, for the listeners? And also, what does collider bias mean, and how do you get around it?
Yeah, this was really interesting for me to learn about as I was writing the book. The example that I started with is the low birth weight paradox, and this comes from the 1970s. It was a researcher in California who was studying low birth weight babies and the effect of maternal smoking. And he found that if the mother of a newborn baby smoked, the baby is more likely to be low birth weight. And low birth weight babies have health effects, including higher mortality. But what he found is that, if you zoom in and you just look at the low birth weight babies, you would find that the ones whose mother smoked had better health outcomes, including lower mortality.

And this was a time, this was in the 70s, when people knew that cigarette smoking was bad for you, and public health campaigns were encouraging people to stop smoking, especially mothers. And then this article came out that said that smoking appears to have some protective effect for low birth weight babies: that in the normal range of birth weight, it appears to be minimally harmful, and for low birth weight babies, it's good. And so he didn't quite recommend maternal smoking, but he almost did. And there was a lot of confusion. I think it wasn't until the 80s that somebody explained it in terms of causal inference, and then finally in the 90s someone was able to show, using data, that not only was this a mistake, but you could put the numbers on it and say, look, this is exactly what's going on. If you correct for the bias, you will find that, not surprisingly, smoking is bad across the board, even for low birth weight babies.

So the explanation is that there's a collider, and a collider in a causal graph means that there are two arrows coming into the same box, meaning two potential causes for the same thing. So in this case, it's low birth weight. And here's what I think is the simplest explanation of the low birth weight paradox: there are two things that will cause a baby to be low birth weight, either the mother smoked or there's something else going on, like a birth defect. Maternal smoking is relatively benign. It's not good for you, but it's not quite as bad as the other effects. So you could imagine being a doctor. You've been called in to treat a patient. The baby is born at a low birth weight, and now you're worried. You're saying to yourself, oh, this might be a birth defect. And then you find out that the mother smoked. You would be relieved, because that explains the low birth weight and it decreases the probability that there's something else worse going on. So that's the effect. And again, it's caused because, when they selected the sample, they selected low birth weight babies. So in that sense, they selected on a collider. And that's where everything goes wrong.
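To make the mechanism concrete, here is a minimal simulation of that story. The numbers are made up for illustration, not taken from the book or the original study; the point is only that selecting on the collider (low birth weight) reverses the apparent effect of smoking.

```python
# Hypothetical numbers, chosen only to illustrate selection on a collider.
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

smoker = rng.random(n) < 0.4                     # mother smokes
defect = rng.random(n) < 0.02                    # unrelated second cause
# Low birth weight (the collider): two arrows point into it.
lbw = rng.random(n) < 0.05 + 0.15 * smoker + 0.60 * defect
# Mortality: smoking is mildly harmful, a birth defect is much worse.
death = rng.random(n) < 0.01 + 0.01 * smoker + 0.20 * defect

rate = lambda mask: death[mask].mean()

# In the whole population, smoking looks harmful, as it is:
print(rate(smoker), ">", rate(~smoker))               # ~0.024 > ~0.014
# Restricted to low birth weight babies, the association reverses:
print(rate(lbw & smoker), "<", rate(lbw & ~smoker))   # ~0.035 < ~0.052
```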
Yeah. And I find that really interesting and fascinating, because it comes down to a bias in the sample here, and you don't really have any way of fixing the analysis without going back to the data collection step. But also, colliders are very tricky in the sense that you have that path, as you were saying. So the collider is a common effect of two causes, and the two causes can be completely unrelated. As is often said, if you control for the collider, then it's going to open the path, and it's going to allow information to flow from, let's say, X to Y, where C is the collider. X is not related to Y in the causal graph, but if you control for C, then X is going to become related to Y. That's really the tricky thing. That's why we're telling people: do not just throw predictors at random in your models when you're doing linear regression, for instance. Because if there is a collider in your graph, and very probably there is one at some point if it's a complicated enough situation, then you're going to have spurious statistical correlations which are not causal. But you've created that by basically opening the collider path. So the good news is that the path is closed naturally, if you like. So if you don't control for that, if you don't add that to your model, you're good. But if you start adding predictors all over the place, you're very probably going to create collider biases like that. That's why it's not as easy as when you have a confounder, which is kind of the opposite situation. So let's say now C is the common cause of X and Y. Well, then, if you have a confounder, you want to block the path that's going from X to Y through C, to see if there is a direct path from X to Y. Then you want to control for C. But if it's a collider, you don't. So that's why: don't control for everything. Don't put predictors all over the place, because that can be very tricky.
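The "opening the path" effect is easy to see in a regression. Below is a small sketch of my own (not from the episode): X and Y are generated independently, C is their common effect, and adding C as a predictor hands X a large spurious coefficient.

```python
# X and Y are independent; C is a collider. Adjusting for C creates a
# spurious association between X and Y.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = rng.normal(size=n)                        # independent of x by construction
c = x + y + rng.normal(scale=0.5, size=n)     # common effect of x and y

# OLS of y on x alone: coefficient near 0, as it should be.
X1 = np.column_stack([np.ones(n), x])
print(np.linalg.lstsq(X1, y, rcond=None)[0])  # [~0, ~0]

# OLS of y on x AND the collider c: x picks up a large spurious coefficient.
X2 = np.column_stack([np.ones(n), x, c])
print(np.linalg.lstsq(X2, y, rcond=None)[0])  # x coefficient around -0.8
```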
Yeah, and I think that's a really valuable insight, because when people start playing with regression, sure, you just add more to the model, more is better. And yes, once you think about colliders and mediators, and I think this vocabulary is super helpful for thinking about these problems, you know, understanding what should and shouldn't be in your model if what you're trying to do is causal.

Yeah. And that's also definitely something I can see a lot. It depends on where the students are coming from. But it's like they show me a regression with, I don't know, 10 predictors already, and the model doesn't really make sense. I'm like, wait, did you try with fewer predictors? Did you first do the model with just an intercept and then build up from that? And no, often it turns out it's the first version of the model, with 10 predictors. So you're like, oh, wait, look at that again from another perspective, from a more minimalist perspective. But that's awesome. I really love that you're talking about that in the book, and I recommend people look at it, because it's not only very interesting, it's also very important if you're looking into, well, are my models telling me something valuable? Are they helping me understand what's going on, or is it just something that helps me predict better? But other than that, I cannot say a lot, so definitely, listeners, refer to the book. And actually, the publisher was really kind to me and Allen, because, well, first, 10 of the patrons are going to get the book for free, at random. So thank you so much. And with the link that you have in the show notes, you can buy the book at a 30% discount. So even if you don't win, you will win. So definitely go there and buy the book, or, if you're a patron, enter the random draw, and we'll see what randomness has in stock for you.
And actually, we already started diving into one of your chapters, but maybe let's take a step back: can you provide an overview of your new book, which is called Probably Overthinking It, and what inspired you to write it?

Yeah, well, Probably Overthinking It is the name of my blog from more than 10 years ago. And so one of the things that got this project started was kind of a greatest hits from the blog. There were a number of articles that had either got a lot of attention, or where I thought there was something really important there that I wanted to collect and present a little bit more completely and more carefully in a book. So that's what started it. And it was partly like a collection of puzzles, a collection of paradoxes, the strange things that we see in data. Like collider bias, which is also known as Berkson's paradox. There's Simpson's paradox. There's one paradox after another. And when I started, I thought that was what the book was going to be about: here are all these interesting puzzles, let's think about them. But then, what I found in every chapter was that there was at least one example that bubbled up where these paradoxes were having real effects in the world. People were getting things genuinely wrong, and those errors had consequences for public health, for criminal justice, for all kinds of real things that affect real lives. And that's where the book kind of took a turn, toward not so much the paradox because it's fun to think about, although it is, but the places where we use data to make better decisions and get better outcomes. And then a little bit of the warnings about what can go wrong when we make some of these errors. And most of them boil down, when you think about it, to one form of sampling bias or another. The subtitle of this book should be "12 chapters of sampling bias".
Yeah, I mean, it's really interesting to see that a lot of problems come from sampling biases, which is almost disappointing, in the sense that it sounds really simple. But, as we can see in your book, it's maybe easy to understand the problem, but then solving it is not necessarily easy. So that's one thing. And then I'm wondering: how would you say Probably Overthinking It helps readers differentiate between the right and wrong ways of looking at statistical data?

Yeah, I think there are really two messages in this book. One of them is the optimistic view that we can use data to answer questions and settle debates and make better decisions, and we will be better off if we do. And most of the time, it's not super hard. If you can find or collect the right data, most of the time you don't need fancy statistics to answer the questions you care about. And usually, with a good data visualization, you can show what you want to show in a compelling way. So that's the good news. And then the bad news is these warnings. I think the key to these things is to think about them and to see a lot of examples. And I'll take Simpson's paradox as an example. If you take an intro stats class, you might see one or two examples, and I think you come away thinking that it's just weird: like, oh, those were really confusing and I'm not sure I really understand what's happening. Whereas at some point you start thinking about Simpson's paradox and you just realize that there's no paradox there. It's just a thing that can happen, because why not? If you have different groups and you plot a line that connects the two groups, that line might have one slope. And then, when you zoom in and look at one of those groups in isolation and plot a line through it, there's just no reason that second line, within the group, should have the same slope as the line that connects the different groups. And so I think that's an example where, when you see a lot of examples, it changes the way you think about the thing. Not from, oh, this is a weird, confusing thing, to, well, actually, it's not a thing at all. The only thing that was confusing is that my expectation was wrong.
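Here is a toy version of that picture in code, my own illustration rather than one from the book: within each group the slope is negative, but the line through the pooled data slopes upward, simply because the group means trend upward.

```python
# Simpson's paradox: within-group slopes need not match the pooled slope.
import numpy as np

rng = np.random.default_rng(1)
groups = []
for cx, cy in [(0, 0), (4, 2), (8, 4)]:           # group means trend upward
    x = cx + rng.normal(size=200)
    y = cy - 0.5 * (x - cx) + rng.normal(scale=0.3, size=200)
    groups.append((x, y))                          # within-group slope: -0.5

slope = lambda x, y: np.polyfit(x, y, 1)[0]

for x, y in groups:
    print("within-group slope:", slope(x, y))      # ~ -0.5 each

x_all = np.concatenate([x for x, _ in groups])
y_all = np.concatenate([y for _, y in groups])
print("pooled slope:", slope(x_all, y_all))        # positive, ~ +0.4
```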
Yeah, true. Yeah, I love that. I agree. I've always found it a bit weird to call all these phenomena paradoxes, in a way. Because, as you're saying, it's more the prior expectation that makes it a paradox. Whereas why should nature obey our simple minds and priors? There is nothing that says it should. And so, most of the time, it's just that, well, reality is not the way we thought it was. That's OK. And, in a way, thankfully; otherwise, it would be quite boring. It's a bit like when the data is very dispersed, when there is a lot of variability in the data, and then we tend to say the data is overdispersed, which I always find weird. It's like, well, it's not the data that's overdispersed, it's the model that's underdispersed. The data doesn't have to do anything. It's the model that has to adapt to the data. So just adapt the model. But yeah, it's a fun way of phrasing it, as if it's the data's fault. But no, not really. It's just a lot of variation.

And that made me think, actually, the Simpson's paradox also made me think about: did you see that recent paper, I mean from this year, so it's quite recent, by Andrew Gelman, Jessica Hullman, and Lauren Kennedy, about the causal quartets?

No, I missed it.

Awesome, well, I'll send it your way and I'll put it in the show notes.
But basically, the idea is taking Simpson's paradox, but instead of looking at it from a correlation perspective, looking at it from a causal perspective. And so it's basically the same kind of thing: different ways to get the same average treatment effect. So, you know, like Anscombe's quartet, where you have four different datasets and you get the same correlation between them, well, here you have four different causal structures that give you different data points, but if you just look at the average treatment effect, you will think that it's the same for all four, whereas it's not. So the point is also, well, that's why you should not only look at the average treatment effect, right? Look at the whole distribution of treatment effects. Because, if you just look at the average, you might be in a situation where the population is really not diverse, and then, yeah, the average treatment effect is something representative. But what if you're in a very dispersed population, where the treatment effects can be very negative or very positive? Then, if you look at the average, it looks like there is no average treatment effect. So you could conclude that there is no treatment effect, whereas there is actually a big treatment effect; it's just that, when you look at the average, it cancels out. So yeah, that's the main idea of the paper. And, I mean, I think this will be completely trivial to you, but I think it's a good way of teaching this: if you just look at the average, you can get bitten by that later on. Because, basically, if you average, you summarize. And if you summarize, you're losing some information somewhere. You have to cut some dimension of the information to average, naturally. So if you do that, it comes at a cost. And the paper does a good job of showing that.
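A quick sketch of why the average can mislead; the four distributions below are invented for the demo, but they all share the same average treatment effect:

```python
# Four hypothetical treatment-effect distributions with the same ATE of 0.1.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

effects = {
    "constant":   np.full(n, 0.1),                            # everyone gains 0.1
    "mild":       rng.normal(0.1, 0.05, n),                   # small variation
    "dispersed":  rng.normal(0.1, 1.0, n),                    # huge variation
    "rare large": np.where(rng.random(n) < 0.01, 10.0, 0.0),  # 1% gain 10, rest 0
}

for name, e in effects.items():
    print(f"{name:10s}  ATE={e.mean():+.3f}  "
          f"harmed={np.mean(e < 0):.0%}  p99={np.quantile(e, 0.99):.2f}")
```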
Yes, that's really interesting, because, maybe coincidentally, this is something that I was thinking about recently, looking at the evidence for pharmaceutical treatments for depression. There was a meta-analysis a few months ago that really showed quite modest treatment effects; the average is not great. And the conclusion that the paper drew was that the medications were effective for some people, and they said something like 15%, which is also not great, but effective for 15% and ineffective or minimally effective for others. And I was actually surprised by that result, because it was not clear to me how they were distinguishing between having a very modest effect for everybody, or a large effect for a minority that was averaged in with a zero effect for everybody else, or even the example that you mentioned, which is that you could have something that's highly effective for one group and detrimental for another group. And, exactly as you said, if you're only looking at the mean, you can't tell the difference. But what I don't know, and I still want to find out, is how, in this study, they drew the conclusion that they drew, which is that they specified that it's effective for 15% and not for others. So yeah, I'll definitely read that paper and see if I can connect it with the research I was looking at.
Yeah. Yeah, I'll send it to you, and I've already put it in the show notes for people who want to dig deeper. And, I mean, that's a very common pitfall, especially in the social sciences, where doing big experiments with lots of subjects is hard and very costly, and so often you're doing inference on very small groups. And then it's even more complicated to just look at the average treatment effect; it can be very problematic. And, interestingly, I talked about that, I mentioned that paper, first in episode 89, with Eric Trexler, who works on the science of nutrition and exercise, basically. In this field especially, it's very hard to have big samples when they do experiments, so most of the time they have 10, 20 people per group. And each time I read that literature, first, they don't use Bayesian stats a lot, and I'm like, with such low sample sizes, you should use them more: use brms, use Bambi if you don't really know how to do the models, but really, you should. And also, if you do that and then you also only look at the average treatment effects, I'm guessing you have big uncertainties on the conclusions you can draw. So yeah, I will put that episode in the show notes too, for people, when I referred to it. That was a very interesting episode, where we talked about exercise science, nutrition, how that relates to weight management, and, from an anthropological perspective, how the body reacts: it will mostly fight you when you're trying to lose a lot of weight, but doesn't really fight you when you gain a lot of weight. And that's also very interesting to know about, especially with the rampant amount of obesity in Western societies, where it's really concerning. And so this science helps us understand what's going on and how we can help people get onto trajectories that are better for their health, which is basically the main point of that research.
I'm also wondering, about your book, when you wrote it, and especially now that you've written it: what do you see as the key takeaways for readers, and especially for readers who may not have a strong background in statistics?

Part of it is, I hope that it's empowering, in the sense that people will feel like they can use data to answer questions. As I said before, it often doesn't require fancy statistics. So there are two parts of this, I think. And one part is, as a consumer of data, you don't have to be powerless. You can read data journalism and understand the analysis that they did, interpret the figures, and maintain an appropriate level of skepticism. In my classes, I sometimes talk about this as a skeptometer: if you believe everything that you read, that is clearly a problem. But at the other extreme, I often encounter students who have become so skeptical of everything that they read that they just won't accept an answer to a question, ever. Because there's always something wrong with a study. You can always look at a statistical argument and find a potential flaw. But that's not enough to just dismiss everything that you read. If you think you have found a potential flaw, there's still a lot of work to do to show that that flaw is actually big enough to affect the outcome substantially. So I think one of my hopes is that people will come away with a well-calibrated skeptometer, which is to look at things carefully and think about the kinds of errors that there can be, but also take the win. If we have the data and we come up with a satisfactory answer, you can accept that question as provisionally answered. Of course, it's always possible that something will come along later and show that we got it wrong, but, provisionally, we can use that answer to make good decisions. And by and large, we are better off. This is my argument for evidence and reason: by and large, if we make decisions that are based on evidence and reason, we are better off than if we don't.
Yeah, yeah. I mean, of course I agree with that. It's like preaching to the choir.

It shouldn't be controversial.

No, yeah, for sure. A difficulty I have, though, is how do you explain to people why they should care? You know? Why do you think we should care about making decisions based on data? Why is that even important? Because that's just more work. So why should people care?

Well, that's where, as I said, in every chapter something bubbled up where I was a little bit surprised and said, this thing that I thought was just kind of an academic puzzle actually matters. People are getting it wrong because of this. And there are examples in the book, several from public health, several from criminal justice, where we don't have a choice about making decisions. We're making decisions all the time. The only choice is whether they're informed or not. And one of the examples, actually, Simpson's paradox is a nice example. Let me see if I remember this. It came from a journalist, and I deliberately don't name him in the book because I just don't want to give him any publicity at all, but The Atlantic magazine named him the pandemic's wrongest man, because he made a career out of committing statistical errors and misleading people. And he actually features in two chapters, because he commits the base rate fallacy in one and then gets fooled by Simpson's paradox in another. And, if I remember right, in the Simpson's paradox example, he looked at people who were vaccinated, compared them to people who were not vaccinated, and found that, during a particular period of time in the UK, the death rate was higher for people who were vaccinated; the death rate was lower for people who had not been vaccinated. So, on the face of it, OK, well, that's surprising. OK, that's something we need to explain. It turns out to be an example of Simpson's paradox, which is: the group that he was looking at was a very wide age range, from, I think, 15 to 89 or something like that. And at that point in time during the pandemic, by and large, the older people had been vaccinated and the younger people had not, because that was the priority ordering when the vaccines came out. So, in the group that he compared, the ones who were vaccinated were substantially older than the ones who were unvaccinated. And the death rates, of course, were much higher in older age groups. So that explained it: if you lumped that whole range of ages together into one group, you saw one effect, and if you broke it up into small age ranges, that effect reversed itself. So it was a Simpson's paradox. If you appropriately break people up by age, you would find that in every single age group, death rates were lower among the vaccinated, just as you would expect if the vaccine was safe and effective.
And that's also where I feel like, if you start thinking about the causal graph, you know, and the causal structure, that would definitely help. Because it's not that hard, right? The idea here is not hard. It's not even hard mathematically. I think anybody can understand it, even if they don't have a mathematical background. So yeah, it's mainly that. And I think the most important point is that, yeah, it matters because it affects decisions in the real world. That thing has literally life and death consequences.
I'm glad you mentioned it, because you do discuss the base rate fallacy and its connection to Bayesian thinking in the book, right?

It starts with the example that everybody uses, which is interpreting the results of a medical test. Because that's a case that's surprising when you first hear about it, and where Bayesian thinking clarifies the picture completely. Once you get your head around it, it is like these other examples: not only does it get explained, it stops being surprising.
And I'll give the example. I'm sure this is familiar to a lot of your listeners, but if you take a medical test, let's take a COVID test as an example, and suppose that the test is 90% accurate, and let's suppose that means both specificity and sensitivity. So, if you have the condition, there's a 90% chance that you correctly get a positive test. If you don't have the condition, there's a 90% chance that you correctly get a negative test. And so now the question is: you take the test, it comes back positive, what's the probability that you have the condition? And that's where people kind of jump onto that accuracy statistic. And they think, well, the test is 90% accurate, so there's a 90% chance that I have, let's say, COVID in this example. And that can be totally wrong, depending on the base rate, or, in Bayesian terms, depending on the prior.

And here's where the Bayesian thinking comes out, which is that different people are going to have very different priors in this case. If you know that you were exposed to somebody with COVID, three days later you feel a scratchy throat, and the next day you wake up with flu symptoms, then, before you even take a test, I'm going to say there's at least a 50% chance that you have COVID, maybe higher. Could be a cold, so, you know, it's not 100%. So let's say it's 50-50. You take this COVID test. And let's say, again, 90% accuracy, which is lower than the home tests, so I'm being a little bit unfair here, but let's say 90%. Your prior was 50-50. The likelihood ratio is about 9 to 1, and so your posterior odds are about 9 to 1, which is roughly 90%. So, quite likely, that test is correct, and you, in this example, have COVID.

But the flip side is, let's say you're in New Zealand, which has a very low rate of COVID infection. You haven't been exposed. You've been working from home for a week, and you have no symptoms at all. You feel totally fine. What's your base rate there? What's the probability that you miraculously have COVID? One in 1,000 at most, probably lower. And so, if you took a test and it came back positive, it's still probably only about one in a hundred that you actually have COVID, and a 99% chance that that's a false positive. So that's, you know, as I said, the usual example. It's probably familiar, but it's a case where, if you neglect the prior, if you neglect the base rate, you can be not just a little bit wrong, but wrong by orders of magnitude.
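The arithmetic behind both scenarios fits in a few lines; this is just the odds form of Bayes's rule applied to the numbers from the example:

```python
# Posterior odds = prior odds x likelihood ratio.
def posterior_prob(prior, sensitivity=0.9, specificity=0.9):
    """P(condition | positive test) for a given prior P(condition)."""
    prior_odds = prior / (1 - prior)
    likelihood_ratio = sensitivity / (1 - specificity)   # 0.9 / 0.1 = 9
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

print(posterior_prob(0.5))     # exposed and symptomatic: 0.9
print(posterior_prob(0.001))   # no exposure, no symptoms: ~0.009,
                               # i.e. ~99% chance of a false positive
```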
Yeah, exactly. And it is a classical example for us in the stats world, but I think it's very effective for non-stats people, because it also talks to them. And the gut reaction to a positive test is so geared towards thinking you do have the disease that I think that's also why it's a good one. Another paradox you're talking about in the book is the Overton paradox. Could you share some insights into this one? I don't think I know that one. And how does Bayesian analysis play a role in understanding it, if any?
Sure. Well, you may not have heard of the Overton paradox, and that's because I made the name up. We'll see, I don't know if it will stick. One of the things I'm a little bit afraid of is that it's possible that this is something that has been studied and is well known, and I just haven't found it in the literature. I've done my best and I've asked a number of people, but I think it's a thing that has not been given a name. So maybe I've given it a name, but we'll find out. But that's not important. The important part is, I think it answers an interesting question. And this is it: if you compare older people and younger people in terms of their political beliefs, you will find, in general, that older people are more conservative. So, younger people more liberal, older people more conservative. And if you follow people over time and you ask them, are you liberal or conservative, it crosses over: when people are roughly 25 years old, they are more likely to say liberal; by the time they're 35 or 40, they are more likely to say conservative. So we have two patterns here: older people actually hold more conservative beliefs, and, as people get older, they are more likely to say that they are conservative. Nevertheless, if you follow people over time, their beliefs become more liberal. So that's the paradox. By and large, people don't change their beliefs a lot over the course of their lives. But when they do, they become a little bit more liberal. But, nevertheless, they are more likely to say that they are conservative. So that's the paradox. And let me put it to you: do you know why?

I've heard about the two in isolation, but I don't think I've heard them linked that way. And no, for now, I don't have an intuitive explanation for that. So I'm very curious.

So here's my theory, and it is partly that conservative and liberal are relative terms: I am to the right of where I perceive the center of mass to be. And the center of mass is moving over time. And that's the key, primarily because of generational replacement. So, as older people die and they are replaced by younger people, the mean shifts toward liberal pretty consistently over time. And it happens in all three groups: among people who identify themselves as conservative, liberal, or moderate, all three of those lines are moving almost in parallel toward more liberal beliefs. And what that means is, if you took a time machine to the 1970s, took the average liberal, put them in a time machine and brought them to the year 2000, they would be indistinguishable from a moderate in the year 2000. And if you brought them all the way to the present, they would be indistinguishable from a current conservative, which is a strange thing to realize. If you have this mental image of people in tie-dye with peace medallions from the seventies being transported into the present, they would be relatively conservative compared to current views. And that time traveler example is almost exactly what happens to people over the course of their lives: in their youth, they hold views that are left of center, and their views change slowly over time, but the center moves faster. And that's what I call chasing the Overton window. The Overton window, I should explain where that term comes from: in political science, it is the set of ideas that are politically acceptable at any point in time. And it shifts over time, so something that might have been radical in the 1970s might be mainstream now. And there are a number of views from the seventies that were pretty mainstream. Like a large fraction, I don't think it was a majority, but I forget the number, it might have been 30% of people in the 1970s, thought that interracial marriages should be illegal. Yeah. That wasn't the majority view, but it was mainstream. And now that's pretty out there: a pretty small minority still hold that view, and it's considered extreme.
Yeah, and it changed quite, quite fast.

Yes. Also, the acceptability of same-sex marriage really changed very fast, if you look at it from a time series perspective. That's also a very interesting thing, that these opinions can change very fast. So yeah, OK, I understand: it's kind of how you define liberal and conservative that, in a way, explains the paradox. Very interesting.

This is a little speculative, but that's something that might have accelerated since the 1990s. Many of the trends that I saw between the 1970s and the 1990s were relatively slow, and they were being driven by generational replacement. By and large, people were not changing their minds; it's just that people would die and be replaced. There's a line from the sciences that says that science progresses one funeral at a time. Just a little morbid. But that is, in some sense, the baseline rate of societal change, and it's relatively slow: it's about 1% a year. But starting in the 1990s, and particularly, you mentioned support for same-sex marriage, also just general acceptance of homosexuality, things changed radically. In 1990, about 75% of the US population would have said that homosexuality was wrong. That was one of the questions in the General Social Survey: do you think it's wrong? 75%. It's, I think, below 30 now. So between 1990 and now, roughly 30 years, it changed by about 40 percentage points. So that's about the speed of light in terms of societal change. And one of the things that I did in the book was to try to break that down into how much of that is generational replacement and how much of that is people actually changing their minds. And that was an example where I think 80% of the change was changed minds, not just one funeral at a time. So that's something that might be different now. And one obvious culprit is the internet. So we'll see.
Yeah. And another proof that the internet is neither good nor bad, right? It's just a tool, and it depends on what we're doing with it. The internet is helping us right now have this conversation, and has let me run this podcast for four years; otherwise, that would have been virtually impossible. So it really depends on what you're doing with it. And another topic, I mean, I don't remember it being in the book, but I think you mentioned it in one of your blog posts, is the idea of a Bayesian killer app. So I have to ask you about that. Why is it important in the context of decision making and statistics?
It's, I think, a perpetual question: if Bayesian methods are so great, why are they not taking off? Why isn't everybody using them? And I think one of the problems is that, when people do the comparison of Bayesianism and frequentism, and they trot out the usual debates, they often show an example where you do the frequentist analysis and you get a point estimate, and then you do the Bayesian analysis and you generate a point estimate. And sometimes it's the same, or roughly the same. And so people sort of shrug and say, well, you know, what's the big deal? The problem there is that, when you do the Bayesian analysis, the result is a posterior distribution that contains all of the information that you have about whatever it was that you were trying to estimate. And if you boil it down to a point estimate, you've discarded all the useful information. So, if all you do is compare point estimates, you're really missing the point.

And that's where I was thinking about what is the killer app that really shows the difference between Bayesian methods and the alternatives. And my favorite example is the Bayesian bandit strategy, or Thompson sampling, which is an application to anything that's like A/B testing, or running a medical trial where you're comparing two different treatments. You are always making a decision about which thing to try next, A or B, one treatment or the other, and then, when you see the result, you're updating your beliefs. So you're constantly collecting data and using that data to make decisions. And that's where I think the Bayesian methods show what they're really good for. Because, if you are making decisions, those decisions need the whole posterior distribution, because most of the time you're doing some kind of optimization: you are integrating over the posterior, or, in a discrete world, you're just looping over the posterior and, for every possible outcome, figuring out the cost or the benefit and weighting it by its posterior probability. That's where you get the real benefit. And so Thompson sampling is an end-to-end application where people understand the problem and where the solution is a remarkably elegant and simple one. And you can point to the outcome and say, this is an optimal balance of exploitation and exploration. You are always making the best decision based on the information that you have at that point in time.
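For readers who want to see the elegance concretely, here is a minimal Thompson sampling sketch; it is my own illustration of the strategy described above, with made-up conversion rates:

```python
# Thompson sampling for two variants with Beta posteriors.
import numpy as np

rng = np.random.default_rng(3)
true_rates = [0.04, 0.06]        # unknown in practice; made up for the demo
wins, losses = [0, 0], [0, 0]

for _ in range(10_000):
    # Draw one sample from each posterior, Beta(wins + 1, losses + 1)...
    samples = [rng.beta(w + 1, l + 1) for w, l in zip(wins, losses)]
    # ...and play the arm whose sample is highest: exploration and
    # exploitation fall out of the posterior uncertainty automatically.
    arm = int(np.argmax(samples))
    if rng.random() < true_rates[arm]:
        wins[arm] += 1           # the conjugate update is an increment
    else:
        losses[arm] += 1

print(wins, losses)              # most trials end up on the better arm
```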
Yeah. Yeah, I see what you're saying. And, in a way, it's a bit of a shame that it's the simplest application, because it's not that simple. But yeah, I agree with that example. And, for people, I put the blog post where you talk about that Bayesian killer app in the show notes, because, yeah, it's not super easy; I think it's way better in a written format, or at least a video. But yeah, it's definitely these kinds of situations, in a way, where you have lots of uncertainty and you really care about updating your beliefs as accurately as possible, which happens a lot. And in this case also, I think it's extremely valuable.

But I think it can be simple. Because, first of all, I think, if you do it using conjugate priors, then the update step is trivial: you're just updating beta distributions, and every time new data comes in, a new datum, you're just adding one to one of your parameters. So the computational work is the increment operator, which is not too bad. But I've also done a version of Thompson sampling as a dice game, and I want to take this opportunity to point people to it. I gave you the link, so I hope it'll be in the notes. The game is called The Shakes, and I've got it up on a GitHub repository. But you can do Thompson sampling just by rolling dice.

Yeah. So we'll definitely put that in the show notes.
And also, to come back to something you said just a bit earlier... For sure. Something that puzzles me is when people have a really good Bayesian model. It's awesome: it's a good representation of the underlying data generating process, it's complex enough but not too much, it samples well. And then they do decision making based on the mean of the posterior estimates. And I'm like, no, that's a shame. Why are you doing that? Pass the whole distribution to your optimizer, so that you can make decisions based on the full uncertainty of the model, and not just take the most probable outcome. Because, first, maybe that's not really what you care about. And also, by definition, it's going to bias your decision. So yeah, that always kind of breaks my heart. You've worked so well to get those posterior distributions, it's so hard to get them, and now you're just throwing everything away. That's a shame. Do Bayesian decision making, folks.

Yeah. You're losing all that information. And especially in any case where you've got very nonlinear costs, nonlinear in the size of the error, and especially if it's asymmetric. Think about almost anything that you build: you always have a trade-off between underbuilding and overbuilding. Overbuilding is bad because it's expensive, and underbuilding is bad because it will fail catastrophically. So that's a case where you have very nonlinear and very asymmetric costs. If you have the whole distribution, you can take into account the probability of extreme catastrophic effects, where the tail of that distribution is really important to potential outcomes.
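Here is a sketch of that idea, with an invented asymmetric cost function and a stand-in for posterior samples; the optimal decision lands well above the posterior mean, pulled there by the tail:

```python
# Decision making over the full posterior rather than its mean.
import numpy as np

rng = np.random.default_rng(4)
# Stand-in for posterior samples of, say, a peak load (e.g. MCMC draws).
posterior = rng.lognormal(mean=0.0, sigma=0.5, size=10_000)

def expected_cost(capacity, samples):
    over = np.maximum(capacity - samples, 0)    # unused capacity: mildly costly
    under = np.maximum(samples - capacity, 0)   # failure: far more costly
    return np.mean(1.0 * over + 50.0 * under)

capacities = np.linspace(0.5, 5.0, 200)
best = capacities[np.argmin([expected_cost(c, posterior) for c in capacities])]
print("posterior mean:", posterior.mean())      # ~1.1
print("optimal capacity:", best)                # ~2.8, driven by the tail
```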
Yeah, definitely. And, I mean, I could continue, but we're getting short on time and I still have a lot of things to ask you. So let's move on. And actually, I think you mentioned it a bit at the beginning of your answer to my last question, but, in another of your blog posts, you addressed the claim that Bayesian and frequentist methods often yield the same results. And so, I know you like to talk about that. So could you elaborate on this, and why you're saying it's a false claim?
Yeah, as I mentioned earlier, you know, frequentist methods produce a point estimate and a confidence interval, and Bayesian methods produce a posterior distribution. So they are different kinds of things. They cannot be the same. And I think Bayesians sometimes say this as a way of being conciliatory: you know, we're trying to all get along, and often frequentist and Bayesian methods are compatible, so that's good, the Bayesian methods aren't scary. I think strategically that might be a mistake, because you're conceding the thing that makes Bayesian methods better. It's the posterior distribution that is useful, for all the reasons that we just said. So it is never the same. It is sometimes the case that, if you take the posterior distribution and you summarize it with a point estimate or an interval, then yes, sometimes those are the same as the frequentist results. But the analogy that I use is: if you are comparing a car and an airplane, but the rule is that the airplane has to stay on the ground, then you would come away and you would think, wow, that airplane is a complicated, expensive, inefficient way to drive on the highway. And you're right: if you want to drive on the highway, an airplane is a terrible idea. The whole point of an airplane is that it flies. If you don't fly the plane, you are not getting the benefit of an airplane.

That is a good point.

And same: if you are not using the posterior distribution, you are not getting the benefit of doing Bayesian analysis.

Yeah. Yeah, exactly. Don't drive airplanes on the highway.
926
:Actually, a really good question is that
you can really see, and I think I do, and
927
:I'm probably sure you do in the work, you
do see many practitioners that might be
928
:hesitant to adopt patient methods due to
some perceived complexity most of the
929
:time.
930
:So I wonder in general, what resources or
strategies you recommend to those who want
931
:to learn and apply patient techniques in
their work.
Yeah, I think Bayesian methods get the reputation for complexity largely because of MCMC. If that's your first exposure, it's scary and complicated. Or if you do it mathematically and you start with big scary integrals, I think that also makes it seem more complex than it needs to be. I think there are a couple of alternatives. The one that I use in Think Bayes is: everything is discrete and everything is computational. So all of those integrals become for loops or just array operations. And I think that helps a lot. So that's using grid algorithms. I think grid algorithms can get you a really long way with very little tooling, basically arrays. You lay out a grid, you compute a prior, you compute a likelihood, you do a multiplication, which is usually just an array multiplication, and you normalize, divide through by the total. That's it. That's a Bayesian update.
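For readers who want to see that spelled out, here is a minimal sketch of a grid update for a proportion, assuming a uniform prior and made-up data (7 successes in 10 trials):

```python
import numpy as np

# Lay out a grid of possible values for the proportion.
grid = np.linspace(0, 1, 101)

# Compute a prior (uniform here) and normalize it.
prior = np.ones_like(grid)
prior /= prior.sum()

# Compute the likelihood of the data at each grid point.
k, n = 7, 10  # made-up data: 7 successes in 10 trials
likelihood = grid**k * (1 - grid) ** (n - k)

# Multiply, then divide through by the total: that's the whole update.
posterior = prior * likelihood
posterior /= posterior.sum()

print(f"posterior mean: {np.sum(grid * posterior):.3f}")
```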
So I think that's one approach. For the other one, I would consider an introductory stats class that does everything using Bayesian methods, using conjugate priors. And don't derive anything. Don't compute why the beta-binomial model works. Just take it as given that when you are estimating a proportion, you run a bunch of trials, and you'll have some number of successes and some number of failures. Let's call them A and B. You build a beta distribution that has the parameters A plus one and B plus one. That's it. That's your posterior. And now you can take that posterior beta distribution and answer all the questions. What's the mean? What's a confidence or credible interval? But more importantly, what are the tail probabilities? What's the probability that I could exceed some critical value? Or, again, loop over that posterior and answer interesting questions with it. You could do all of that on the first day of a statistics class. And use a computer, because we can compute: scipy.stats.beta will tell you everything you want to know about a beta distribution. That's day one of a stats class: estimating proportions. It's everything you need to do. And it handles all of the weird cases. Like if you want to estimate a very small probability, it's okay. You can still get a confidence interval. It's all perfectly well behaved. If you have an informative prior, sure, no problem. Just start with some pre-counts in your beta distribution.
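As a rough sketch of that day-one workflow, with hypothetical counts (7 successes, 3 failures) and a uniform prior:

```python
from scipy.stats import beta

a, b = 7, 3                      # hypothetical data: successes and failures
posterior = beta(a + 1, b + 1)   # the conjugate update, taken as given

print(posterior.mean())          # point estimate of the proportion
print(posterior.interval(0.94))  # a 94% credible interval
print(1 - posterior.cdf(0.9))    # tail probability: P(proportion > 0.9)
```

An informative prior is then just a matter of starting from larger pseudo-counts, for example beta(a + 5, b + 15) if your prior knowledge is worth roughly 5 successes and 15 failures.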
So day one, estimating proportions. Day two, estimating rates. You could do exactly the same thing with a Poisson-gamma model, and the update is just as trivial. And you could talk about Poisson distributions and exponential distributions and estimating rates. My favorite example is goal scoring: I always use either soccer, football, or hockey as my example of goal-scoring rates. And you can generate predictions. You can say, what are the likely outcomes of the next game? What's the chance that I'm going to win, let's say, a best-of-seven series? The update is computationally nothing.

Yeah.

And you can answer all the interesting questions about rates. So that's day two.
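And a matching day-two sketch, with a made-up prior and made-up scores, doing the Poisson-gamma update and then simulating the posterior predictive for the next game:

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(0)

# Made-up gamma prior for a goal-scoring rate: mean 4/2 = 2 goals per game.
alpha0, beta0 = 4, 2
goals, games = 9, 4  # hypothetical data: 9 goals scored over 4 games

# The conjugate update is just addition.
posterior = gamma(alpha0 + goals, scale=1 / (beta0 + games))

# Posterior predictive for the next game: draw a rate, then draw goals.
rates = posterior.rvs(size=10_000, random_state=rng)
next_game = rng.poisson(rates)
print("P(at least 3 goals next game):", np.mean(next_game >= 3))
```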
I don't know what to do with the rest of the semester, because we've just done 90% of an intro stats class.

Yes. Yeah, that sounds like something that would work, in the sense that at least that was my experience. Funny story: I used to not like stats, which is funny when you see what I'm doing today. But when I was in university, I did a lot of math. And the thing is, the stats we were doing was with pen and paper. So it was incredibly boring. It was always, you know, dice problems and very trivial stuff that you have to do that way because the human brain is not good at computing that kind of stuff, you know. That changed when I started having to use statistics to do electoral forecasting. I was like, but this is awesome. Like, I can just simulate the distributions. I can see them on the screen. I can really almost touch them. You know, and that was much more concrete, first, and also much more empowering, because I could work on topics that were not trivial stuff that I would only use for board games. You know? So I think it's a very powerful way of teaching, for sure.
So to play us out, I'd like to zoom out a bit and ask you what you hope readers will take away from Probably Overthinking It, and how the insights from your book can be applied to improve decision making in various fields.

Yeah. Well, I think I'll come back to where we started, which is: it is about using data to answer questions and make better decisions. And my thesis, again, is that we are better off when we use evidence and reason than when we don't. So I hope it's empowering. I hope people come away from it thinking that you don't need graduate degrees in statistics to work with data, to interpret the results that you're seeing in research papers, in newspapers, that it can be straightforward. And then occasionally there are some surprises that you need to know about.
Yeah. For sure. Personally, have you changed some of the ways you're making decisions based on your work for this book, Allen?

Maybe. I think a lot of the examples in the book come from me thinking about something in real life. There's one example where, when I was running a relay race, I noticed that everybody was either much slower than me or much faster than me. And it seemed like there was nobody else in the race who was running at my speed. And that's the kind of thing where, when you're running and you're oxygen-deprived, it seems really confusing. And then with a little bit of reflection, you realize, well, there's some statistical bias there, which is: if someone is running the same speed as me, I'm unlikely to see them.

Yeah.

But if they are much faster or much slower, then I'm going to overtake them or they're going to overtake me.
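That selection effect is easy to demonstrate in a quick simulation, assuming (purely for illustration, these are not the book's numbers) that the chance of encountering another runner is proportional to your relative speed:

```python
import numpy as np

rng = np.random.default_rng(1)

# A made-up field of runners with normally distributed speeds (mph).
speeds = rng.normal(7.0, 1.0, size=100_000)
my_speed = 7.0

# Weight each runner by relative speed: same-speed runners stay out of sight.
weights = np.abs(speeds - my_speed)
weights /= weights.sum()
observed = rng.choice(speeds, size=10_000, p=weights)

# Runners near my own speed all but vanish from the observed sample.
print("share near my speed, whole field:",
      np.mean(np.abs(speeds - my_speed) < 0.2))
print("share near my speed, runners I see:",
      np.mean(np.abs(observed - my_speed) < 0.2))
```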
Yeah, exactly. And that makes me think about an absolutely awesome joke from, of course, I don't remember the name of the comedian, but a very, very well-known US comedian that you may know. And the joke was: have you ever noticed that everybody that drives slower than you on the road is a jackass, and everybody that drives faster than you is a moron? It's really the same idea, right? It's like you have the right speed and you're doing the right thing, and everybody else is just either a moron or a jackass.

That's exactly right. I believe that is George Carlin.

It is exactly George Carlin, yeah, yeah. And, I mean, George Carlin is just absolutely incredible. But yeah, that joke is already a very keen observation of human nature too, I think.
It's also an interesting joke in the sense that it relates to, you know, concepts of how minds change and how people think about reality and so on. And I find it very interesting. So for people interested, I know we're short on time, so I'm just going to mention there is an awesome book called How Minds Change by David McRaney. I'll put that in the show notes. He talks about these kinds of topics, and that's especially interesting. And of course, Bayesian statistics are mentioned in the book, because if you're interested in optimal decision making, at some point you're going to talk about Bayesian stats. But he's a journalist. Like, he didn't know at all about Bayesian stats originally. And then at some point, it just appears.

I will check that out.

Yeah, I'll put that into the show notes.
So before asking you the last two questions, Allen, I'm curious about your predictions, because we're all scientists here, and we're interested in predictions. In the realm of statistics education, are there any innovative approaches or technologies that you believe have the potential to change, to transform, how people learn and apply statistical concepts?

Well, I think the things we've been talking about, computation, simulation, and Bayesian methods, have the best chance to really change statistics education. I'm not sure how it will happen. It doesn't look like statistics departments are changing enough or fast enough. I think what's going to happen is that data science departments are going to be created, and I think that's where the innovation will be. But I think the question is what that will mean. When you create a data science department, is it going to be all machine learning and algorithms, or statistical thinking and, basically, using data for decision making, as I'm advocating for? So obviously, I hope it's the latter. I hope data science becomes, in some sense, what statistics should have been, and starts doing a better job of using, as I said, computation, simulation, Bayesian thinking, and causal inference, which I think is probably the other big one.

Yeah. Yeah, exactly. And they really go hand in hand also, as we were seeing at the very beginning of the show. Of course, I do hope that that's going to be the case.
You've already been very generous with your time, so let's get to the last two questions I ask everyone at the end of the show. And you're in a very privileged position, because it's your second episode here. So you're in the position where you can answer something else than your previous answers, which is a privilege, because usually the difficulty of these questions is that you have to choose and you cannot answer all of it. You get to have a second round, Allen. So first, if you had unlimited time and resources, which problem would you try to solve?
I think the problem of the 21st century is: how do we get to a sustainable planet and a good quality of life for everybody on it? And I think there is a path that gets us there. It's a little hard to believe when you focus on the problems that we currently see. But I'm optimistic. I really do think we can solve climate change.
And I believe in the slow process of making things better. If you look at history on a long enough term, you will find that almost everything is getting better, in ways that are often invisible, because bad things happen quickly and visibly, and good things happen slowly and in the background. But my hope for the 21st century is that we will continue to make slow, gradual progress, and reach a good ending for everybody on the planet. So that's what I want to work on.

Yeah, I love the optimistic tone to close out the show.
And second question: if you could have dinner with any great scientific mind, dead, alive, or fictional, who would it be?

I think I'm going to argue with the question.
I think it's based on this idea of great scientific minds, which is a little bit related to the great person theory of history, which is that big changes come from unique, special individuals. I'm not sure I buy it. I think the thing about science that is exciting to me is that it is a social enterprise. It is intrinsically collaborative. It is cumulative. Making large contributions, I think, very often comes down to being the right person in the right place at the right time. And I think often they deserve that recognition. But even then, I'm going to say it's the system, the social enterprise of science, that makes progress. So that's it: I want to have dinner with the social enterprise of science.

Well, you call me if you know how to do that. But yeah, I mean,
joking aside, I completely agree with you, and I think it's also a very good reminder to say it right now, because we're recording very close to the time when Nobel prizes are awarded. And yeah, these prizes participate in the fame game, making science basically kind of like another movie industry, or any industry that runs on fame and all that comes with it. And yeah, I completely agree that this is especially a big problem in science, because scientists are often specialized in a very small part of their field. And that happened a lot during COVID, where some scientists started talking about epidemiology whereas it was not their specialty. To me, usually that's a red flag, but the problem is that if they are very well-known scientists who may end up having the Nobel Prize, well, then everybody listens to them, even though they probably shouldn't. When you rely too much on fame and popularity, that's a huge problem. Just trying to make heroes is a problem, too. It helps from a narrative perspective to make people interested in science, so that people start learning about them. But there is a limit where it also discourages people. Because, you know, if it's that hard, if you have to be that smart, if you have to be Einstein or Oppenheimer or Laplace or any of these big names, you know, then it's just like, you don't even want to start working on this.
And that's a big problem, because, as you're saying, scientific progress is small incremental steps done by a community that works together. There is competition, of course, but a community that really works together. And yeah, if you start implying that you have to be a once-in-a-century genius to do science, we're going to have problems, especially HR problems in the universities. So yeah, no, you don't need that. And also, you're right that if you look into the previous work, even for Einstein, the idea of relativity was already there at the time. If you look at some writings from Poincaré, one of the main French mathematicians of the 20th century: already Poincaré, just a few years before Einstein, is talking about this idea of relativity, and you can see the equations in one of his books previous to Einstein's publications. So often it's, as you were saying, an incredible person who is also there at the right time, at the right place, and who is steeped in the ideas of his time. So that's also very important to highlight.

I completely agree with that.
Yeah, in almost every case that you look at, if you ask the question, if this person had not done X, when would it have happened? Or who else might have done it? Almost every time, the ideas were there; they would have come together.

Yeah, maybe a bit later, or even maybe a bit earlier, we never know. But yeah, that's definitely the case. And I think the best proxy to the dinner we wanted to have is to have a dinner with the LBS community.
So we should organize that, you know, like an LBS dinner where everybody can join. That would actually be very fun. Maybe one day I'll get to do that. One of my wildest dreams is to organize a, you know, live episode somewhere, where people could come join the show live and we'd have a live audience and so on. We'll see if I can do that one day. If you have ideas or opportunities, feel free to let me know, and I'll think about it.
Awesome. Allen, let's call it a show. I could really record with you for, like, three hours. I literally still have a lot of questions on my cheat sheet, but let's call it a show and allow you to get back to your main activities for the day. So thank you a lot, Allen. As I was saying, I put a lot of resources and a link to your website in the show notes for those who want to dig deeper. Thanks again, Allen, for taking the time and being on this show.

Thank you. It's been really great. It's always a pleasure to talk with you.

Yeah. Feel free to come back to the show and answer the last two questions for a third time.