#119 Causal Inference, Fiction Writing and Career Changes, with Robert Kubinec
Causal Inference, AI & Machine Learning • Episode 119 • 13th November 2024 • Learning Bayesian Statistics • Alexandre Andorra


Shownotes

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!

Visit our Patreon page to unlock exclusive Bayesian swag ;)

Takeaways:

  • Bob's research focuses on corruption and political economy.
  • Measuring corruption is challenging due to the unobservable nature of the behavior.
  • The challenge of studying corruption lies in obtaining honest data.
  • Innovative survey techniques, like randomized response, can help gather sensitive data.
  • Non-traditional backgrounds can enhance statistical research perspectives.
  • Bayesian methods are particularly useful for estimating latent variables.
  • Bayesian methods shine in situations with prior information.
  • Expert surveys can help estimate uncertain outcomes effectively.
  • Bob's novel, 'The Bayesian Hitman,' explores academia through a fictional lens.
  • Writing fiction can enhance academic writing skills and creativity.
  • The importance of community in statistics is emphasized, especially in the Stan community.
  • Real-time online surveys could revolutionize data collection in social science.

Chapters:

00:00 Introduction to Bayesian Statistics and Bob Kubinec

06:01 Bob's Academic Journey and Research Focus

12:40 Measuring Corruption: Challenges and Methods

18:54 Transition from Government to Academia

26:41 The Influence of Non-Traditional Backgrounds in Statistics

34:51 Bayesian Methods in Political Science Research

42:08 Bayesian Methods in COVID Measurement

51:12 The Journey of Writing a Novel

01:00:24 The Intersection of Fiction and Academia

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan, Francesco Madrisotti, Ivy Huang, Gary Clarke, Robert Flannery, Rasmus Hindström and Stefan.

Links from the show:

Transcript

This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.

Speaker:

Did you know there is a novel out there about...

2

:

...Bayesian statistics?

3

:

It even has a great title, The Bayesian Hitman, and an even greater author, Robert Kubinec.

4

:

When I heard about that, I, of course, had to invite Bob on the show.

5

:

An assistant professor at the University of South Carolina, Bob's research focuses on

wealth creation and democratization,

6

:

causal inference, and Bayesian statistics.

7

:

In this episode, Bob takes us through his fascinating journey from working in government

to pursuing a career in academia, exploring his current work on measuring corruption and

8

:

how Bayesian methods help in estimating latent variables.

9

:

This is Learning Bayesian Statistics, episode 119, recorded October 8, 2024.

10

:

Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the

projects, and the people who make it possible.

11

:

I'm your host, Alex Andorra.

12

:

You can follow me on Twitter at alex_andorra,

13

:

like the country.

14

:

For any info about the show, learnbayesstats.com is Laplace to be.

15

:

Show notes, becoming a corporate sponsor, unlocking Bayesian Merch, supporting the show on

Patreon, everything is in there.

16

:

That's learnbayesstats.com.

17

:

If you're interested in one-on-one mentorship, online courses, or statistical consulting,

feel free to reach out and book a call at topmate.io/alex_andorra.

18

:

See you around, folks.

19

:

And best Bayesian wishes to you all.

20

:

And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can

help bring them to life.

21

:

Check us out at pymc-labs.com.

22

:

Hello, my dear Bayesians!

23

:

Today I want to welcome two new patrons in the Learn Bayes Stats family.

24

:

Thank you so much to Rasmus Hindström and to the mysterious Stefan.

25

:

Your support truly makes this show possible.

26

:

I can't wait to talk with you in the Slack channel and I hope you will enjoy being there

and talking about everything Bayes.

27

:

Okay, on to the show now.

28

:

Bob Kubinec, welcome to Learning Bayesian Statistics.

29

:

Thank you.

30

:

Alex, it's so great to be on here.

31

:

Thanks so much for inviting me.

32

:

Yeah, that's awesome.

33

:

And I also really love how that episode came around because I discovered your book, I

mean, your novel, which is your first novel.

34

:

We'll talk about that during the show, which is called The Bayesian Hitman.

35

:

Of course, everybody should go and check it out.

36

:

It will be in the show notes.

37

:

And so I discovered that book when I was at StanCon a few weeks ago.

38

:

And I recorded two live episodes.

39

:

And then of course, afterwards you go for a drink.

40

:

And I was having a drink with, I think it was Francis DiTraglia and Richard McElreath.

41

:

And we started.

42

:

Of course, we were talking about Bayes stuff, and I don't know why, but at some point

Francis mentioned your book.

43

:

It was like, Richard, I remember.

44

:

And I was like, wait, there is a novel that's based around Bayesian statistics?

45

:

I need to have that person on the show.

46

:

So that's how it came to my knowledge.

47

:

And then Francis, I think, tagged you on Twitter, and we talked a bit

48

:

on Twitter, and then here we are. So that's definitely one of the most random episodes that

I've ever done. Yeah, I mean, that's fantastic, and hopefully Richard McElreath does

49

:

read my novel, so I'd love to get his feedback on it. His textbook, like for many

people, was very influential for me in doing Bayesian statistics, and he's an excellent

50

:

writer. I don't know if he's written any fiction. The fun thing I will say about writing a

novel:

51

:

What's unusual about it isn't actually so much that I wrote a novel.

52

:

There actually are a fair number of novels written by academics.

53

:

It's more that I wrote it under my actual name.

54

:

There's some untold number of academics that are publishing under pen names.

55

:

Yeah, which is really fun.

56

:

So next time you see a book that's anonymous, it could be by an academic.

57

:

For me, I use my real name because the book actually is also about academia, even about

the work that I do.

58

:

And I wanted

59

:

I wanted people in the field to read it.

60

:

That was part of the fun of writing it.

61

:

That's why it's under my actual name, not some cool made up name.

62

:

Okay.

63

:

Well, that's already interesting.

64

:

I wasn't aware of that at all.

65

:

We'll definitely get back to that.

66

:

But that's my job here to do the teasing.

67

:

Stay tuned for more about why Bob wrote the book.

68

:

But first, let's talk about your

69

:

origin story and all that, because you have a very interesting one.

70

:

But first, as usual, can you tell us what you're doing nowadays?

71

:

Yeah.

72

:

So I'm an assistant professor of political science at the University of South Carolina.

73

:

I just started here.

74

:

Before this, I was at New York University Abu Dhabi in Abu Dhabi, United Arab Emirates,

for five years.

75

:

I did my PhD at the University of Virginia.

76

:

and I did a one-year postdoc at Princeton.

77

:

My work here, so in political science, I'm what they call a political economist.

78

:

I study a lot of, what I tell people is you take kind of the worst parts about political

science and economics, then you put them together, and you've got political economy.

79

:

So it's all like analyzing things like corruption, money in politics, business influence

over politics.

80

:

And maybe on the brighter side, I do some work on economic development.

81

:

I have a big project right now studying entrepreneurship in developing countries and how

young people can kind of get more involved.

82

:

But yeah, I do a lot of this sort of dark-money work.

83

:

The connection to Bayesian statistics, really, is that a lot of the stuff that I study is very,

very hard to measure.

84

:

And a lot of my work in Bayesian statistics deals with measurement and using

85

:

models, especially latent variable models, for very difficult measurement problems.

86

:

How do you find an estimate?

87

:

I was just writing a reviewer response today about this.

88

:

How do you find an estimate of something that's hard to study, that you can't directly

observe, that kind of best incorporates everything you know?

89

:

And Bayesian frameworks are really, I think, best at that.

90

:

You know, roughly half of my research is sort of, let's say, statistical in nature, making new

models, especially in measurement modeling.

91

:

And then the other half is sort of empirical, like going out into the real world and

trying to discover new things about political economy.

92

:

That is super fun.

93

:

So you mean that people don't self-declare that they are corrupt or corrupting someone?

94

:

Yeah, I it.

95

:

Yeah, yeah, it is really strange.

96

:

And corruption is this really fun thing to study, precisely because, you know, how do you

get people to admit to it, right?

97

:

And there's this whole, you know, field of what they call sensitive survey questions,

which is, you know, if someone has done something wrong or illegal, how can you get them

98

:

to admit that on a survey in a way that, you know, and in some sense, it's sort of, you

talk about things that are unobservable, you can't

99

:

it's very hard to observe someone doing something completely illegal or unethical because

sort of by definition if it's illegal they're not going to do it in front of other people

100

:

and they're not going to admit to it.

101

:

So you have this sort of, you know, big central problem of, you know, how do you determine or how

do you study social behavior that you can't observe, right?

102

:

And yeah it's a complex issue.

103

:

Yeah for sure but that sounds...

104

:

That sounds fascinating.

105

:

Maybe you can talk a bit about that, and then we'll go back to your origin story.

106

:

Since we're talking about that, how do you do that?

107

:

How do you make people admit that they did something wrong without having them admit it?

108

:

Do you use hypnosis?

109

:

What are you using, Bob?

110

:

There's two approaches.

111

:

One is to rely on...

112

:

sort of government data.

113

:

And this they do a lot in developed democracies.

114

:

So primarily the EU and the United States.

115

:

And that's because, you know, in these countries, they have enough regulatory capability

that they can force people to report.

116

:

So like in the US and the EU, there are like lobbying registries, right, where every time

a business deals with a politician, there's a record of it.

117

:

Also, even when there's not a record of things, there's so much data available.

118

:

For example, there's a lot of work on contracting.

119

:

People will get access to the records of all the government contracts that have been

issued in a certain country and then search through them and try to find companies that

120

:

have political connections to certain politicians.

121

:

And there, again, it's tricky because you can't necessarily prove that there was a corrupt

transaction or something like this.

122

:

But let's say if you see that companies that have many more

123

:

they have someone on the board who was a former member of parliament, if they get contracts

at a higher rate relative to other companies that don't, you know, that sort of

124

:

suggests things. That's kind of one way of doing it. And the other is through, you know,

trying, and this is more what I do, to directly collect information about

125

:

corruption, these sorts of issues. And that's a lot about trying to assure people of some

kind of anonymity and confidentiality.

126

:

And I do a lot of online survey research, which is better for confidentiality because you

can fill out something on your phone or on your computer by yourself. I've done a lot of

127

:

getting employees to talk about what their companies do.

128

:

So they don't necessarily have to report that they themselves did a corrupt transaction.

129

:

But do you know if your business is working with some political parties or has offered

130

:

Do you know if your boss or CEO seems to be involved in some kind of transactions with

political parties?

131

:

Which happens all the time in developing countries.

132

:

Anyway, yeah.

133

:

So that's how I've been really getting at it.

134

:

Lately, I've been experimenting.

135

:

I kind of mentioned sensitive survey questions, and these are really fun.

136

:

They're basically ways of trying to encrypt a survey.

137

:

So encryption uses keys, right?

138

:

And so there's a method called randomized response. I actually have a blog post about

this, and a recent study that I did in Tunisia using it, where essentially you use a

139

:

key. The key is the respondent's birthday. So you ask the respondent:

140

:

Were you born in these three months of the year?

141

:

And then you ask them a question that combines that answer, which is random, right,

142

:

with the actual question.

143

:

So you say, you know, did you, let's say, give a bribe or

something or do this sensitive thing?

144

:

One answer is, like, you did that or you were born in these three months of the year.

145

:

And the other answer is, you know, yes, I did it or something like that.

146

:

What you do is you combine this natural randomness from this question that's

irrelevant, but random with the actual thing you're trying to measure.

147

:

And just like

148

:

encryption that you use on a computer or whatever, it's actually the same process.

149

:

Because I don't know the respondent's birthday, I don't know their true answer.

150

:

But when I take all that data, because I know the proportion of people in the population

who are born in those three months of the year, which is roughly uniform, you can then

151

:

back out what the population estimate is, the latent estimate.
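
The arithmetic behind this can be sketched in a few lines of Python. This is a minimal illustration, not the design of the Tunisia study: it assumes a respondent answers "yes" when either the sensitive behavior applies or their birthday falls in the chosen three months, that birthdays are independent of the behavior, and that the chance of being born in the window is 0.25. The function names, the sample size, and the 20% prevalence are hypothetical. Under those assumptions P(yes) = pi + (1 - pi) * p, so pi = (P(yes) - p) / (1 - p).

```python
import random


def simulate_responses(n, true_prevalence, p_birth=0.25, seed=0):
    """Simulate masked survey answers (hypothetical design).

    A respondent says "yes" (True) if they did the sensitive thing OR
    were born in the chosen three-month window. The surveyor never
    learns which of the two reasons applied.
    """
    rng = random.Random(seed)
    responses = []
    for _ in range(n):
        did_it = rng.random() < true_prevalence
        born_in_window = rng.random() < p_birth
        responses.append(did_it or born_in_window)
    return responses


def estimate_prevalence(responses, p_birth=0.25):
    """Back out the population prevalence from the masked answers.

    Uses P(yes) = pi + (1 - pi) * p_birth, solved for pi, and clips the
    result to [0, 1] because sampling noise can push it outside.
    """
    share_yes = sum(responses) / len(responses)
    estimate = (share_yes - p_birth) / (1 - p_birth)
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    answers = simulate_responses(n=20000, true_prevalence=0.2, seed=1)
    print("estimated prevalence:", round(estimate_prevalence(answers), 3))
```

No individual "yes" reveals anything certain about that person, yet the aggregate share of "yes" answers still identifies the population rate. The price is extra variance: dividing by 1 - p inflates the standard error, so these designs need larger samples than a direct question would.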

152

:

So these are some really clever, very

153

:

counterintuitive methods.

154

:

And so I'm experimenting with some of these now and yeah, doing things like that.

155

:

But.

156

:

This is super fun.

157

:

Sounds a bit like also, you know, detective work.

158

:

Yeah.

159

:

That's really fun.

160

:

Yeah.

161

:

Anti-corruption research is fun.

162

:

It's extremely difficult, and it's difficult to get data.

163

:

And often you're kind of guessing with like...

164

:

what you can gather, but I do enjoy it a lot.

165

:

Yeah.

166

:

That makes me think a bit of a project I worked on with some researchers from Dalhousie

University.

167

:

That sounds very different, but it made me think of it because they were trying to infer

the trade of shark meat across, you know, between countries.

168

:

But they were trying to do it at the species level, and countries don't have to report the

species

169

:

they trade. They have to report the species they fish, but not the species they

trade.

170

:

And the thing is, there are some species they cannot trade.

171

:

And so of course, they can report the species that they trade, the

species

172

:

for trade, but they don't always do it.

173

:

So you're like, okay, if they don't do it, does that mean they are trading some species that

they are not supposed to?

174

:

So actually the whole work was to try and infer, from both the trade data and the landings

data, so the fishing data, which species are actually traded by which country to which

175

:

country.

176

:

And so that's a bit like this where you don't have.

177

:

The data is based on self-reports.

178

:

The reports are not very constraining, so you have to do all that detective work and

that's where all the Bayesian methods are very powerful.

179

:

Yeah, totally.

180

:

That's where I've gotten a lot of leverage from them in my own work.

181

:

I think too, because most Bayesian models have a frequentist analog.

182

:

For a lot of people, that's like, well, I don't see the difference.

183

:

When you're running a regression model, often there really isn't, right?

184

:

If it's, like, a maximum likelihood, simple linear regression model.

185

:

But where the Bayesian methods really shine is when you're studying some latent quantity

and especially when you have prior information about that.

186

:

Because when you have some, let's say, very subtle prior information, like let's say,

you know, experts think that the wildlife trade isn't higher than, like, this threshold.

187

:

But there's uncertainty.

188

:

You're not sure exactly what the threshold is.

189

:

If you do your work with Stan and everything, you can include that information basically

almost directly into the model in a way that will give you much better estimates than

190

:

starting from some population sampling perspective.

191

:

And that's, I'd say, my favorite projects, the ones where Bayesian approaches have been so

helpful are those where you have this subtle prior information.

192

:

and that's where Bayes can really shine.
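
As a rough illustration of that point, here is a small sketch in plain Python rather than Stan. It uses a grid approximation for a single proportion, and everything in it is an assumption made for the example: the "soft ceiling" prior (a logistic curve that drops off around an expert threshold of 0.3, with the softness standing in for uncertainty about where the threshold sits), the function names, and the toy data of 3 detections in 5 inspections.

```python
import math


def posterior_mean(k, n, prior, grid_size=1000):
    """Posterior mean of a proportion by grid approximation.

    k successes out of n trials give a binomial likelihood; `prior` is
    any non-negative function of theta on (0, 1). It need not be
    normalized because the weights are normalized below.
    """
    thetas = [(i + 0.5) / grid_size for i in range(grid_size)]
    weights = [prior(t) * t**k * (1 - t) ** (n - k) for t in thetas]
    total = sum(weights)
    return sum(t * w for t, w in zip(thetas, weights)) / total


def flat_prior(theta):
    """No prior information: every value of theta is equally plausible."""
    return 1.0


def soft_ceiling_prior(theta, ceiling=0.3, softness=0.05):
    """Hypothetical expert prior: theta is probably below `ceiling`.

    The logistic drop-off encodes that the experts are not sure exactly
    where the threshold is; a smaller `softness` means a sharper cutoff.
    """
    return 1.0 / (1.0 + math.exp((theta - ceiling) / softness))


if __name__ == "__main__":
    k, n = 3, 5  # toy data: a small, noisy sample
    print("flat prior:        ", round(posterior_mean(k, n, flat_prior), 3))
    print("soft-ceiling prior:", round(posterior_mean(k, n, soft_ceiling_prior), 3))
```

With so little data, the flat-prior estimate simply follows the noisy sample, while the soft-ceiling prior pulls it back toward what the experts consider plausible. In a real analysis the same idea would be written as a prior in a full probabilistic model, and the prior would matter less as more data arrive.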

193

:

The flip side, of course, is it's not easy.

194

:

I'm sure it's not easy to do that type of modeling, and especially when you start deriving

custom models, custom distributions, it's intense.

195

:

It can take a long time, but the answer can be just so much better than alternatives

because it's just so much more nuanced.

196

:

I could preach on this topic for a long time.

197

:

No, I mean, that's great.

198

:

Although you would be preaching to the choir here.

199

:

Yeah, which is great.

200

:

That's wonderful.

201

:

Yeah, for sure.

202

:

But yeah, that just sounds super fun.

203

:

And I'm happy that I was able also to bring up that project because, for that fish project,

that shark trade project, we'll have one of the main authors on the show very soon, in

204

:

November, Aaron MacNeil,

205

:

from Dalhousie University with whom I've worked on this project and the whole team.

206

:

So stay tuned, guys, for that episode.

207

:

That's going to be a very fun one.

208

:

Aaron is a very good communicator and also a very interesting person to talk to.

209

:

So it should be a very cool episode.

210

:

But let's get back to you, because I said that you had a...

211

:

a very interesting origin story, very original one.

212

:

And that's because when I, so to prepare the show, I of course stalk all my future guests,

right?

213

:

I hope you understand.

214

:

And while stalking you, I saw that you've transitioned from working at IBM and the US

Department of State to

215

:

Well, now academia, which is what you just said.

216

:

So I love that.

217

:

How did that happen?

218

:

Yeah, you know, life takes a lot of twists and turns.

219

:

And yeah, so I'd say essentially for the first part of my career, I very, very much wanted

to work in government, especially in foreign affairs.

220

:

But, you know, life changed.

221

:

Course changes are always a bit complicated.

222

:

I did one tour with the Department in Saudi Arabia.

223

:

And a lot of my research is in the Middle East and North Africa, so that hasn't really

changed a whole lot.

224

:

But there are things that I loved about the State Department, and some of my colleagues,

just really amazing people.

225

:

Personally, actually, when I was finishing my master's degree at George Washington, I was at

a policy school. It wasn't a program that was really emphasizing preparation for a PhD.

226

:

But I did some research there as part of my thesis sort of thing.

227

:

And I realized I had this sort of epiphany that I actually really liked doing research.

228

:

It was this eureka moment.

229

:

And what I also really came to value was independence.

230

:

And I think you can probably see some of that with my academic trajectory, that I like to

work on the things that I work on, and I like to kind of take different approaches.

231

:

And the State Department is a giant bureaucracy.

232

:

You know, and it kind of has to be: its job is to implement the legislation and all that

stuff.

233

:

And I just sort of personally realized that I would do better in a sort of less structured

environment, which, you know, there are still rules in universities and stuff, but

234

:

relatively speaking, it's much less structured.

235

:

And, you know, you work in smaller teams and your work is really more your own.

236

:

And so I really valued that.

237

:

So that was part of it.

238

:

And the other part was personal.

239

:

My fiancée at the time was in the US and she was studying in a graduate program.

240

:

In the State Department, you have to have worldwide availability.

241

:

You have to go wherever they tell you to go.

242

:

They were going to send me to Mexico next.

243

:

That was the closest that they could send me back to the States.

244

:

And I talked to them.

245

:

I was like, my fiancée is in Richmond, Virginia, in the US,

246

:

and they said, yeah, we're sending you to Mexico.

247

:

Because in the State Department world, that's close.

248

:

You're in the same hemisphere, but it's not that close.

249

:

And it was a combination of that and then getting accepted to the PhD program at

the University of Virginia.

250

:

So it was kind of all these things combined.

251

:

And honestly, there are definitely challenges to working in academia that I'm sure you're

familiar with, but I really

252

:

think I made the right choice, at least for me.

253

:

I really do enjoy the freedom and flexibility of academia.

254

:

That's probably why I've stuck with it.

255

:

One thing I really care a lot about is open science and transparency.

256

:

Unfortunately, those aren't always hallmarks of academia, but I do think that at least in

principle, we have the ability to be much more honest and transparent than people in

257

:

government.

258

:

And so that's one thing I've always really enjoyed about

259

:

academic research: the ability to just be upfront about what you think, release your

conclusions without a lot of political pressure to change them.

260

:

Yeah.

261

:

This is really interesting.

262

:

And I can definitely relate to that background, because it's also what happened to

me.

263

:

I was working at the French Central Bank, so it's not

264

:

the Department of State, but I actually worked a bit before the Central Bank for the

French Foreign Ministry.

265

:

Definitely had the same experience.

266

:

Interesting.

267

:

Okay.

268

:

Yeah.

269

:

Cool.

270

:

And then, yeah, for sure, getting much more autonomy and freedom in not only what I do,

but how I do it, is

271

:

definitely something I tremendously appreciate since then, and for sure I've never looked

back since that time.

272

:

But I'm curious though, did your background in the State Department influence your

research methodologies in political science?

273

:

How much of a back and forth

274

:

is there between the work you once did and the work you're doing now?

275

:

Yeah.

276

:

I think that where you start out has a lot of influence.

277

:

That's where you train.

278

:

Those are the questions you're exposed to.

279

:

And absolutely, working at the State Department influenced a lot of the stuff I worked on

later.

280

:

Part of it simply might be that I tend to work on sort of what they call policy-relevant

topics.

281

:

Most of my research is on very contemporary Middle East issues.

282

:

I have colleagues who do a lot of amazing historical research, which can be really fascinating

stuff, about the long-term influence of history on the contemporary world.

283

:

And I don't, I tend to focus more on what's kind of currently developing or happening in

different countries.

284

:

I certainly do spend time writing

285

:

for a policy audience. So I've written for The Washington Post, the Carnegie Endowment, and something in the

Brookings Institution. So, you know, I could certainly do more of that, but I do. And I think,

286

:

too, that probably influences my writing style, which tends to kind of focus on

simplifying things and making them clear. And that was certainly, you know, when I

287

:

was a diplomat, that was

288

:

very much stressed, because you're writing for a policymaker audience and you cannot

use lots of jargon. You cannot give them ideas they can't digest in 30 seconds, because

289

:

that's all the time they have. And so there's a real focus on being

short and to the point. I think I benefited from that. I think it also, yeah, definitely

290

:

influences the way that I do things. And I think, too, that

291

:

I came into my PhD program and definitely into sort of the statistical quantitative world

with a very nontraditional background.

292

:

So the State Department is like the least quantitative place in the world.

293

:

Like it's all diplomats who, you know, just sort of know a little bit about everything

and, you know, write, you know, pieces that just reflect their experience.

294

:

And I think honestly, my feeling is that coming into my program with that background was

actually super helpful.

295

:

because I wasn't sort of locked into existing paradigms.

296

:

So I really had studied some statistics prior to grad school, but not very much.

297

:

And so when I kind of came around to be introduced to Bayesian methods, I was just like,

this is great.

298

:

And I think too that it's led me to question things maybe in a way that I wouldn't have

otherwise.

299

:

And a lot of my papers and projects in the statistics world have often come out of me kind

of questioning things and

300

:

thinking that things are unclear and wanting to know why that's the case.

301

:

I think, too, you know, if there are people listening to this podcast, let's say, who are just

starting out in the world of statistics and don't have that background, you know, they didn't

302

:

grow up doing, let's say, the math Olympiad, playing chess all the time or whatever the

stereotype is, you know, that's really fine.

303

:

And there's a lot that you can contribute.

304

:

Like I really think almost anyone can do statistics or data science.

305

:

They're going to do it differently.

306

:

They're going to stress different things.

307

:

They're going to learn differently.

308

:

They're going to communicate differently.

309

:

But you can make a contribution.

310

:

And part of it is just that, as people, we think differently.

311

:

And if you think somewhat differently than the average statistician, that actually can be

a really good thing, especially when you're doing research.

312

:

Because research is all about finding solutions, right?

313

:

And if you want to find a solution, it has to be something that

314

:

no one else has thought of yet.

315

:

You know, I think that for me, it's actually been really fun being a statistician without

that background.

316

:

It's also been at times intimidating.

317

:

So you mentioned StanCon.

318

:

I went to the first StanCon in 2018, and that was also really the first time that I had

presented at, like, these political methodology conferences, where you have a lot of

319

:

quantitative social scientists, but

320

:

It was definitely my first presentation that had a math stats focus, and I was just

absolutely petrified that someone like Andy Gelman was going to ask me this horrifically

321

:

hard question about deriving the analytical posterior distribution or something, and I was

just going to kind of collapse on stage.

322

:

And that didn't happen, and I had a lot of fun.

323

:

And I even had time to talk to some of the Stan devs and ask them these deep questions

about Hamiltonian Monte Carlo.

324

:

And I didn't really understand the answers, but it was still fun.

325

:

Yeah, I think that that reputation of intimidation and stuff is not good for the field.

326

:

But when people get past that, it actually can be a lot of fun.

327

:

I definitely, yeah, I agree with...

328

:

everything you just said and definitely recommend people to check out events like

StanCon.

329

:

They are really absolutely fantastic.

330

:

As I said, I recorded two live episodes there.

331

:

They are not out yet at the time when your episode is going to be out.

332

:

They require a bit more editing, but they will drop in your feed, folks, in a few weeks.

333

:

Cool.

334

:

But yeah, that's just a fantastic experience because a lot of the Stan developers are

there and you can ask them all the questions that you want that you think are stupid, but

335

:

actually are very interesting.

336

:

And yeah, I get that it can be intimidating, but that's the great thing of that Bayesian

community.

337

:

From the beginning, I found that it's a very welcoming

338

:

community.

339

:

As you were saying, you started going into that world without a math degree, you know, or an

engineering degree.

340

:

It's the same for me.

341

:

I studied management and political science.

342

:

So for a long time, I was, you know, kind of fearing that part of my background, with

a lot of imposter syndrome.

343

:

as you're saying,

344

:

Actually, that can make you an interesting statistician because, precisely because you

haven't gone through the classic way to statistics.

345

:

So for sure, you're not going to weigh in on the math-computation matrix routines if you

don't want to, because that's clearly another wheelhouse.

346

:

But you'll have a lot to say on a lot of other topics, especially applied statistics, which in

the end is extremely important because all this software is here to be applied to use

347

:

cases.

348

:

Yeah.

349

:

And I think for listeners, and I'm sure you talk about this more, but if you are

interested in Bayesian statistics in general,

350

:

and getting into it.

351

:

The Stan community is the place to be.

352

:

There's a Discourse site where you can post questions about using Stan.

353

:

But if you just simply dig up a lot of the documentation that they've made, both for Stan,

but also they have these case studies online, they're excellent.

354

:

And a lot of this has to do really with Andy Gelman.

355

:

And Andy, just, you know, his own writing, if you read his articles, is very clear.

356

:

And that always set him apart in the statistics world for this love of clarity, this love

of simplicity over formal notation and dense, impenetrable text.

357

:

And that has really defined the Stan community as well.

358

:

I don't know if it's as much the case now, since Stan is now so big.

359

:

I kind of started out relatively early in it.

360

:

And part of it was, so John Kropko was sort of my stats advisor at UVA and he had

361

:

done a two-year postdoc with Andy Gelman's team as they were developing the first

edition of Stan.

362

:

So I was sort of exposed to it relatively early on and, you know, back then the

community was small enough that it was a Google group and, like, you know, people really

363

:

knew each other if they posted on there. And now it's a lot bigger, but I think that ethos

is still there, and I, you know, really encourage people, you know,

364

:

to post, to ask questions.

365

:

Obviously, people can always be rude to newcomers and things, but it really is a great, if

you're going to start somewhere, it's a great place to start because the ethos is, how can

366

:

we include more people?

367

:

I was just looking at StanCon, they were talking about how they, I think they have like,

368

:

They're talking about how they're measuring who goes to StanCon and who has sort of a

non-traditional background and stuff like this.

369

:

And that's because they care about that.

370

:

And a lot of people don't.

371

:

So that's a beautiful thing.

372

:

And I think Stan has been responsible really for raising the level of statistical literacy

really across the entire applied statistics community because

373

:

They're smart, they do amazing work, but they also explain things, right?

374

:

And there's certain models that I know how they work because of the Stan documentation.

375

:

Because it's just a lot clearer and to the point.

376

:

you know, that's...

377

:

And I think, I honestly think there's a lot of...

378

:

There's actually a fair amount of bad vibes against Stan in the Bayesian statistics world

among people who...

379

:

were kind of around before it and are attached to different, older-style methods.

380

:

But I think part of it too is they don't like the vibe of, you know, anyone can do

this.

381

:

We can help anyone understand. That's not popular in all circles.

382

:

I'll just say that.

383

:

Yeah.

384

:

Damn, yeah, that's the first time I hear that.

385

:

But that's good to know.

386

:

Yeah, and for sure, completely second everything you just said.

387

:

If you're coming from the Python world, I also definitely encourage you to look at the

PyMC community.

388

:

That's where I started.

389

:

So the PyMC Discourse is a great place to get your questions answered.

390

:

Also answer some questions yourself because that's really how you're going to learn.

391

:

So definitely make use of all that, of all that open source community, and make some PRs.

392

:

PRs are always welcome, I can tell you that.

393

:

And I'm actually curious also, before we switch to your novel, to talking about your

novel, how, how...

394

:

Like first, do you remember when you were first introduced to Bayesian stats?

395

:

And also if these methods have shaped your research approach in political science?

396

:

Yeah.

397

:

Yeah.

398

:

Good.

399

:

Yeah.

400

:

I should talk about like some of my actual work in this field as part of this podcast.

401

:

Yeah.

402

:

So it's really funny, but my first project, so I was working on this paper.

403

:

that I had to do in grad school that was like a capstone paper for one of my minors.

404

:

It was an applied statistics field concentration.

405

:

I had to write this paper and I was doing mixture modeling.

406

:

And I think the funny thing about this project is that it was in many ways like a failure.

407

:

Because if you've ever played around with mixtures of Gaussians, mixtures of other

distributions,

408

:

there are these horrific identification issues.

409

:

And I was trying to fit a model where I had to identify certain clusters in the data, or I

was trying to do this with mixture modeling.

410

:

And it just wasn't working.

411

:

So essentially, my advisor, I was using a frequentist package, an R package, called

FlexMix.

412

:

I remember this very distinctly.

413

:

And it was just...

414

:

Basically, every time I ran it, I got a different result.

415

:

And that was because of having a multimodal likelihood.

416

:

There wasn't a single solution, and so the algorithm would end up in a different place

each time.

417

:

And this was driving me nuts.

418

:

So then I switched to Stan, an early version of Stan at the time.

419

:

But there was sufficient documentation of mixture modeling.

420

:

And the funny thing was the model didn't actually work any better in Stan.

421

:

But...

422

:

it was by doing it in Stan that I actually understood the model.

423

:

Right?

424

:

And after beating my head against the wall for a few months, I was like, I understand.

425

:

Yes, of course, this model is not identified.

426

:

A mixture is, you know, without sufficient prior information about the location of the

mixtures, like you don't know where they are.
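
The identification problem described here can be illustrated with a minimal Python sketch. It is not the model from the capstone paper, which is not specified in the conversation. It assumes a two-component Gaussian mixture with a shared standard deviation, made-up data, and hypothetical helper names (`normal_pdf`, `mixture_log_likelihood`). Swapping the component labels leaves the likelihood unchanged, so the likelihood alone cannot say which component sits where.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a Normal(mu, sigma) distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_log_likelihood(data, weights, means, sigma=1.0):
    """Log-likelihood of a Gaussian mixture with a shared sigma (an assumption)."""
    total = 0.0
    for x in data:
        density = sum(w * normal_pdf(x, m, sigma) for w, m in zip(weights, means))
        total += math.log(density)
    return total


# Made-up data with two visible clusters, for illustration only.
data = [-2.1, -1.9, -2.4, 1.8, 2.2, 2.0, 2.5]

# The same mixture written with the component labels in two different orders.
ll_original = mixture_log_likelihood(data, weights=[0.4, 0.6], means=[-2.0, 2.0])
ll_swapped = mixture_log_likelihood(data, weights=[0.6, 0.4], means=[2.0, -2.0])

# Both labelings fit equally well: the likelihood has (at least) two modes.
print(ll_original, ll_swapped)
```

An optimizer started from different points can land in either mode, which matches the "different result every time" behavior described above. Prior information about where each component is located is one way to break that symmetry.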

427

:

And so that was sort of my gateway drug.

428

:

And then I was like, then I read McElreath's text, like when I was, because it had,

429

:

I was just lucky, you know, we all get lucky.

430

:

It came out just as I, his first edition came out just as I was starting out and I was

using Twitter, which, you know, now is let's say not as useful as it was, but the social

431

:

media things are useful.

432

:

It appeared on social media, people were like, wow, this is really different.

433

:

And I got it.

434

:

And it just blew me away.

435

:

I mean, just the clarity, the simplicity, how he connected a lot of things that I had

thought of before.

436

:

And, you know, I think

437

:

after reading it I went from being like, this is cool, to really being a little bit more

of a zealot.

438

:

You know, this is how we should do research, this can really make a big difference.

439

:

And then of course I also read, you know, like the canonical Bayesian Data Analysis.

440

:

The first time I went through it, it was a little intense.

441

:

McElreath's text is definitely more approachable.

442

:

But then a lot of my learning was, yeah, these informal case studies, blog posts, people

sharing online, people answered my questions on the forum.

443

:

That was all super, super helpful.

444

:

Ultimately, what I really got into...

445

:

So I have two R packages that are Bayesian modeling.

446

:

One is called ordbetareg, or ordered beta regression.

447

:

And this is a model that...

448

:

We don't have to go all into the weeds, but traditional beta regression is supposed to be

a model of proportions, but the beta distribution can't handle observations so-called at the

449

:

bounds, meaning if your proportion goes from 0 to 1, you can't include any observations that

have a 0 or a 1.

450

:

And ordered beta regression is basically a compound model where one part is a beta

distribution and the other part is actually a simple sort of ordered logit.

451

:

that allows the model to include those.

452

:

So there's two cut points in the model, like an ordered logistic regression.

453

:

And you have one linear model.

454

:

you can include, basically, the model has sort of three components, zero, anything between

zero and one, and then one.

455

:

And the payoff to the model, and it's getting a lot of exposure, is that

456

:

It's a pretty simple model.

457

:

not much more complicated than beta regression, but you can include an outcome that has

zeros and ones or zero and 100 or whatever your scale is.
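
The structure described above (one linear predictor, two cut points, and three components: exactly 0, strictly between 0 and 1, and exactly 1) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the package's code: the function names, the logistic form of the cut-point probabilities, and the mean/precision parameterization of the beta density are illustrative choices, and the exact parameterization used in the package is not given in the conversation.

```python
import math


def inv_logit(x):
    """Map a real number to the (0, 1) interval."""
    return 1.0 / (1.0 + math.exp(-x))


def beta_logpdf(y, mu, phi):
    """Log density of a beta distribution with mean mu and precision phi."""
    a = mu * phi
    b = (1.0 - mu) * phi
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def component_probs(eta, cut_low, cut_high):
    """Probabilities of y == 0, 0 < y < 1, and y == 1 (requires cut_low < cut_high)."""
    p_zero = 1.0 - inv_logit(eta - cut_low)
    p_one = inv_logit(eta - cut_high)
    p_middle = 1.0 - p_zero - p_one
    return p_zero, p_middle, p_one


def ordered_beta_loglik(y, eta, cut_low, cut_high, phi):
    """Log-likelihood of one observation y in [0, 1] given linear predictor eta."""
    p_zero, p_middle, p_one = component_probs(eta, cut_low, cut_high)
    if y == 0.0:
        return math.log(p_zero)
    if y == 1.0:
        return math.log(p_one)
    # Continuous part: the same eta also sets the mean of the beta density.
    return math.log(p_middle) + beta_logpdf(y, inv_logit(eta), phi)


# Hypothetical parameter values, chosen only to exercise the function.
probs = component_probs(eta=0.3, cut_low=-2.0, cut_high=2.0)
logliks = [ordered_beta_loglik(y, 0.3, -2.0, 2.0, phi=5.0) for y in (0.0, 0.4, 1.0)]
print(probs, logliks)
```

Because a single linear predictor drives all three components, a covariate that raises the expected proportion also raises the chance of observing exactly 1 and lowers the chance of observing exactly 0.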

458

:

And that's very useful for people, especially in the social sciences.

459

:

My other...

460

:

And that one's available.

461

:

It's really a wrapper around brms to allow people to fit these types of models.

462

:

And that's out on CRAN and everything.

463

:

The other one is much more ambitious and is actually just coming to fruition.

464

:

And it's called idealstan.

465

:

And ideal there comes from a social science model called the Ideal Point Model, which is

itself a variant of something called item response theory, which people have come across

466

:

before.

467

:

It's from psychometrics.

468

:

And what I've been doing is

469

:

sort of expanding and generalizing this model using Stan.

470

:

And what it allows people to do, and I've used it for multiple papers and publications

already, but I've never released the final version, is it allows you to apply what's

471

:

essentially a very general purpose measurement model to all kinds of distributions of data

that people couldn't before.

472

:

So one of the big innovations is non-ignorable missing data.

473

:

Like when you're fitting a latent variable model,

474

:

or any kind of measurement model, missing data is really, really tough because missing

data is essentially its own latent problem.

475

:

When you have missing data and a latent variable, what do you do?

476

:

So this package has a way of dealing with that in a way that avoids bias in your estimate

of the latent variable.

477

:

But it also has a bunch of other things.

478

:

So Stan is really good at time, so there's a lot of work I've done on time-varying latent

variables.

479

:

And that's particularly useful nowadays because as social scientists and data scientists,

we're getting so much more time-grain data, right?

480

:

From Twitter.

481

:

Well, I guess not anymore from Twitter.

482

:

But there's plenty of data sets that have and can get time stamped right down into

seconds.

483

:

And if you want to estimate some latent quality from that, like let's say you want to know

about corruption or polarization or...

484

:

some other quantity and it's noisy, how do you handle that variation over time, especially

when you have these sparse data sets, right?

485

:

You have only a few observations in a given time window, but your time series is really

long.

486

:

So that's the kind of stuff I've been working on, and I'm hoping to finally release a

final version of that by the end of this year.

487

:

Knock on wood.

488

:

And I've already used it for a range of things.

489

:

It's really useful for survey data sets when you have like missing data.

490

:

Like I used it to measure essentially people's wealth from a survey when there was a lot

of missing data in different variables in the survey.

491

:

I've used it to measure countries' policy responses to COVID-19 when there's a lot of

complexity in how they respond and which countries are the most prepared.

492

:

And yeah, so on and so forth.

493

:

So that package hopefully will be out soon and that uses Stan, that uses like raw Stan

code in an R framework.

494

:

I know you're, it sounds like you're a Python guy. In principle, it can be estimated in

Python as well, but that would have to be a future project.

495

:

But that's my, those are my big kind of Bayesian method stuff.

496

:

I have done a, I have a new paper out in the Journal of the Royal Statistical Society

497

:

that fits a big Bayesian measurement model in Stan to measure COVID infections in the US

in the early pandemic period.

498

:

So, and this is talking about, you know, prior information, how useful it is in the US

context.

499

:

Well, most countries, right, during the early part of COVID, like we didn't know, we

didn't even have biased data, right?

500

:

Like there were so few tests available and

501

:

I started working on this actually in sort of the early part of the pandemic and it just

was recently published, but I got really fascinated by this topic of like, well, how do

502

:

you measure COVID infections if you just don't have any data, right?

503

:

Like you can't test, you can't do anything.

504

:

And it's a really cool problem from a Bayesian point of view, because as a Bayesian, you

think, well, the best answer that you can give to a problem is to include all of your

505

:

prior information, right?

506

:

Beyond that, that's the best answer you can give.

507

:

And so I started to think about that from a very kind of purely, so I've told people that

this is my most Bayesian project ever because I just kind of was like sat down and I

508

:

worked with an epidemiologist guy, Luiz Carvalho, who's also a big Stan person.

509

:

He's on the Discourse site and we worked together on this and kind of came up with a new

approach and the approach really emphasizes using priors.

510

:

And so we show how you can get a decent estimate of COVID infections.

511

:

in this very early period like March of 2020 by using things like expert surveys, right?

512

:

Where you simply go and ask a bunch of experts, well, how many infections do you think

there are?

513

:

And then you have uncertainty in that estimate, right?

514

:

And then you allow that uncertainty to propagate into your model, into the final estimates

of COVID infections.
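
The idea of turning expert guesses into a prior and letting that uncertainty propagate can be sketched in Python. This is not the model from the paper, which is not described in detail in the conversation. It assumes made-up expert guesses of the infected share of a population, a beta prior matched to their mean and variance, and a hypothetical population size; all names and numbers are illustrative.

```python
import random
import statistics


def beta_from_expert_guesses(guesses):
    """Moment-match a Beta(a, b) prior to expert guesses of a proportion."""
    m = statistics.mean(guesses)
    v = statistics.variance(guesses)
    common = m * (1.0 - m) / v - 1.0  # must be positive for a valid beta prior
    return m * common, (1.0 - m) * common


# Made-up expert guesses of the infected share of the population.
expert_guesses = [0.005, 0.01, 0.02, 0.03, 0.015]
a, b = beta_from_expert_guesses(expert_guesses)

# Propagate the prior uncertainty into the quantity of interest: infection counts.
population = 1_000_000  # hypothetical population size
rng = random.Random(0)  # fixed seed so the sketch is reproducible
infection_draws = sorted(rng.betavariate(a, b) * population for _ in range(10_000))

# A 95% interval for the number of infections implied by the expert prior alone.
lower, upper = infection_draws[250], infection_draws[9749]
print(round(lower), round(upper))
```

The interval stays wide because the experts disagree, which is the point: the estimate is informative while still carrying its uncertainty forward instead of hiding it.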

515

:

And you essentially, can get a pretty good estimate that's still uncertain.

516

:

but actually incorporates this information, even when, you know, you don't

have tests or, you know, hospitalizations even. But, you know, if you basically

517

:

take advantage of the information that you have, you can give a much better answer than just

spitballing or, as many people did, assuming away uncertainty and pretending to be much more

518

:

certain than they were. Anyway, so that's, I think, some of my recent stuff. I'm always

tinkering, you know, I'm sure as you are, with different things, but...

519

:

Yeah, same.

520

:

There is always a part of me that is looking forward to the day where I will have

everything figured out and understood.

521

:

that day never comes.

522

:

It's like, I will finally understand Gaussian processes.

523

:

And then I think I understand.

524

:

And then a new use case comes and I'm like, wait, how do I do that?

525

:

Wait, why doesn't...

526

:

Why doesn't that work?

527

:

It's like, my God, I have to learn that new method.

528

:

But I guess that's part of the job description.

529

:

And that's actually the fun part, I would say.

530

:

It's like, you just have to reassure that other voice in your head.

531

:

like, that's fine.

532

:

That's normal.

533

:

That's part of the job.

534

:

And that's actually why it's fun and interesting.

535

:

But congrats on all those projects.

536

:

That sounds really cool and really fascinating.

537

:

And interesting that, like, I didn't know about the beta, the ordered beta regression, but

definitely makes sense.

538

:

I have some experience with the ordered logistic.

539

:

I've used that most recently on a football analytics paper slash project I'm working on.

540

:

But the ordered beta, I didn't know about that, but that sounds like fun.

541

:

And yeah, I should say it is getting used in industry.

542

:

I know there's a guy from Amazon who actually wanted me to make a code change so it could

be deployed somewhere in Amazon.

543

:

So I know it's out there doing something.

544

:

I don't know.

545

:

I guess if your Amazon order doesn't come through, then it's my package.

546

:

That was the problem.

547

:

yeah, so it's getting...

548

:

I'll be honest, I'm really, really happy with how the model is getting used in different

places.

549

:

And essentially it really matters for predictions.

550

:

Like if you fit your model and you want to predict, okay, given this many orders, what

proportion, or this many ads, what proportion will buy the product?

551

:

You want that prediction to respect the bounds.

552

:

You don't want, if you use OLS or something, then you could end up predicting that 115 %

of your customers will buy a product.

553

:

That doesn't make any sense.
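
The point about predictions respecting the bounds can be seen in a small Python sketch. It uses made-up data and hypothetical variable names, and the bounded alternative shown is a simple straight-line fit on the logit scale, used here only as a stand-in for a model with a bounded outcome. It is not ordered beta regression itself.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def inv_logit(z):
    """Map a real number back to the (0, 1) interval."""
    return 1.0 / (1.0 + math.exp(-z))


# Made-up data: share of customers who buy, by number of ads shown.
ad_exposures = [1, 2, 3, 4, 5]
purchase_share = [0.20, 0.40, 0.60, 0.80, 0.95]

# OLS on the raw proportion extrapolates straight through the upper bound.
intercept, slope = fit_line(ad_exposures, purchase_share)
ols_prediction = intercept + slope * 7  # roughly 1.35, i.e. about 135% of customers

# Fitting on the logit scale and transforming back keeps predictions in (0, 1).
logit_share = [math.log(p / (1.0 - p)) for p in purchase_share]
logit_intercept, logit_slope = fit_line(ad_exposures, logit_share)
bounded_prediction = inv_logit(logit_intercept + logit_slope * 7)

print(ols_prediction, bounded_prediction)
```

The logit transform used here breaks down when the data contain exact zeros or ones, which is the gap the cut points in ordered beta regression are meant to fill.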

554

:

You know, ordered beta regression allows you to take into account that non-linearity in

the outcome, that you have these bounds. And yeah, I will say, for ordered logit, again,

555

:

you're talking about the Stan documentation, but Michael Betancourt, who's, you know, kind of

a legend in the Stan community, he has an amazing case study on ordered logit. As his

556

:

stuff does, it does get somewhat technical, but it's really brilliant. I've never seen anyone

557

:

sort of go through the model the way he does, but if you can get through his case study,

you really understand ordered logit.

558

:

And as he does, in that case study, this is just a case study, he derives a novel

distribution for the priors in the ordered logit model, for the cut points.

559

:

He just does this part of the case study.

560

:

Like, yeah, here's this new statistical distribution no one's ever used before.

561

:

So yeah, ordered logit's really cool.

562

:

And again, ordered beta,

563

:

It's really about thinking out of the box because this was an area, as I mentioned, where

it's a somewhat well-trodden issue.

564

:

And that was something where it was sort of really combining two things that are very

different.

565

:

So ordered logit is a model for discrete data, beta is for continuous data, and it was

really combining them that made it work.

566

:

People think of statistics as a sort of dry rote, read formulas off of a page.

567

:

But in a lot of, I think, actual problems, a lot of it's really creative thinking.

568

:

How do you get stuck in a dead end?

569

:

How do you find the way out?

570

:

And sometimes that's taking a very different approach.

571

:

Yeah, definitely.

572

:

And so we should add these links to the show notes.

573

:

So if you have anything to share regarding your package,

574

:

ordered beta, please add that to the show notes for the people because I know a lot of them

are going to want to dig deeper.

575

:

And I'll also add the link to Michael's case study about ordered logistic for sure.

576

:

My football project doesn't have anything yet that's ready to be shared, but I will do

that as soon as possible.

577

:

For sure.

578

:

Maybe that's something we'll do.

579

:

We'll teach at PyData with Chris Fonnesbeck in PyData New York in November.

580

:

Maybe we'll do that.

581

:

We'll see how that works.

582

:

Maybe at that point, I'll be able to share that and add that to the show notes.

583

:

In the meantime, let's add your package or any paper or case study or things like that

that you've...

584

:

written or read and think is interesting or even videos and tutorials and I'll add

Michael's case study.

585

:

And so I know you had a hard stop in a few minutes and I definitely want to talk about

your novel.

586

:

I mean, I still have like tons of questions for you and the work you do and how you use

Bayes and so on because honestly, I love all the work you do and I...

587

:

We could do like a three hour episode very easily.

588

:

But I definitely want to talk about your novel.

589

:

So let's do that because you're the first novelist on the show.

590

:

So first question, you know, if I were talking to you in the street or in a bar, would be

like, why?

591

:

What inspired you to write The Bayesian Hitman?

592

:

Yeah.

593

:

Well, my interest in writing, and this is the thing that for me, like doing statistics

kind of came a little bit later in life, my interest in writing predated my doing

594

:

statistics by a number of years, actually.

595

:

And I was always sort of interested in writing.

596

:

I got into writing fiction, actually, when I was a diplomat in Saudi Arabia, where, not to

put too fine a point on it, there's not very much to do in Saudi Arabia.

597

:

And there's very few creative outlets.

598

:

And I used to love doing things in the States.

599

:

I used to actually be very much involved in improv theater at one time.

600

:

And there was very little of that.

601

:

But you can write a novel anywhere.

602

:

And so that's actually where I got into writing.

603

:

And all of that stuff that I wrote will be forever locked away in my computer and never

released.

604

:

But eventually I just kept writing from time to time, even in grad school, just a really

nice outlet.

605

:

It's a different way of using your brain.

606

:

After I get locked into, you know,

607

:

doing research papers and stuff and then fiction is just so different.

608

:

At least it should be.

609

:

I was posting on Twitter about this.

610

:

There's some academic studies that have turned out to be fiction and have had to be

retracted lately.

611

:

But in theory, as an academic, you're not doing fiction, you're doing real research.

612

:

And yeah, the genesis of this novel actually came out of me being on the academic job

market in the fall when I was a grad student, so my last year of grad school.

613

:

I kind of had this

614

:

I don't know, as you get these sort of visions, I had this idea of someone going to a

university that wasn't their top choice, that was in an area that they didn't like, but

615

:

then things being radically different than they expect in a good way.

616

:

And that was really where the idea of the novel came from because when I was on the job

market, the thing about the job market

617

:

and that I think appeals as a sort of human story is how our lives are just so disrupted

as academics.

618

:

You know, people move, you know, across the country, sometimes across the world.

619

:

They end up in places that they never thought they would live.

620

:

They're culturally very different.

621

:

And I thought that's a really great setting for a story.

622

:

And yeah, and you know, I think a lot of writing advice when you're starting out that

you'll get is to write what you know and...

623

:

And that was kind of what I knew.

624

:

That was the world I was living in.

625

:

One thing I want to clarify, it's not an autobiographical novel.

626

:

And I always worry about that a bit.

627

:

But the main character is not me.

628

:

And it's actually set in a fictional town with a fictional university.

629

:

I did that very intentionally because I didn't want to, like...

630

:

I'd say the main character has, let's say, very uncensored opinions about his institution.

631

:

And I didn't want to like...

632

:

you know, critique somewhat, you know, an actual university or college.

633

:

But the main character is very much informed by my experiences, but also by all my friends

and the things they went through and the places they traveled.

634

:

And I think ultimately, as I wrote in the acknowledgement of the book, I really, wanted it

to be a novel about academia that was much more realistic, that really got into the

635

:

problems people have, issues and the challenges they face, and to try to, you know,

636

:

Because I felt a lot of writing about academics, it's always these like mysterious

literature professors that hang around in like, you know, beautiful Ivy League, you know,

637

:

places and like solve crimes in their spare time or something.

638

:

And I wanted it to be a little more hard hitting and, and also really, you know, about,

you know, life as a research academic and what that's like and the pressures you face.

639

:

And so that was really the Genesis.

640

:

And then the main character, you know,

641

:

became about a Bayesian statistician, I don't know, I don't remember distinctly making

that choice, but the main character is a Bayesian statistician, and that became

642

:

really fun because I still think, I mean, it's impossible to verify this claim, but I'd

say there's a high posterior probability that it is the only novel that has a Bayesian

643

:

statistician as the protagonist.

644

:

And that was really fun, because it's fun getting to take an area that

645

:

really doesn't appear in fiction and like put it into a story and you know, see what

happens.

646

:

Yeah, I mean, that is really interesting to see how, like, you know, the way of thinking that

got you there.

647

:

okay, so from like, that was one of the scenarios I had in my head where it's like

actually something you wanted to do already.

648

:

So you write of course academic content.

649

:

How was the experience?

650

:

How different was the experience?

651

:

And did you enjoy it more to write the fiction?

652

:

Do you feel freer writing fictional work because you you don't have to check everything,

cite everything and things like that?

653

:

Or was the experience in the end quite similar?

654

:

That's a great question.

655

:

I think that, unfortunately, I think at the end, they tend to converge in some ways.

656

:

But I think that's partly the, I'd say the initial work is very, very different.

657

:

I mean, when you're writing fiction, you really are, you're really trying to feed your creative

instinct.

658

:

And hopefully when you're writing an academic paper, that's not the, hopefully you have like

data that, you know, constrains, you know, these sorts of things.

659

:

But when you're trying to get that story out for the first time, and yeah, that's a very

different, and I don't know how good I am at that, but that's really about just going

660

:

wherever the story leads and trying to figure out the next thing.

661

:

And it's difficult, but it's definitely a very different kind of skill set or experience

than writing an academic paper.

662

:

I'd say that they tend to converge at the end because

663

:

and this was my experience.

664

:

I noticed your words, "my first novel," and that made me very nervous that maybe there will

be another novel.

665

:

But yeah, it's not easy and it's not easy to finish a novel.

666

:

I think, I mean, it's hard to write one for sure, but it's that finishing that really,

that's where you have to come back and make it coherent and edit it.

667

:

And my book was in a kind of editing phase for

668

:

years probably of just going back and forth and adding things, taking away, trying to make

the story really move at the right pace.

669

:

You know, you don't want to have too much detail or too little, and things

that become even technical.

670

:

Like we have what are called plot holes where, you you said something happened in a

certain place, but that couldn't happen because the character was here.

671

:

So then, you know, at the end of the day, you know, when you're finishing a novel really

comes down to a lot of details and

672

:

cross-checking things and stuff that's not that different from finishing an academic

paper.

673

:

So, and that's not, you know, I mean, the editing phase is not usually people's favorite phase

unless they're kind of weird.

674

:

But yeah, so I would say ultimately they start to converge.

675

:

But of course, the writing style is very, very different.

676

:

I do think, you know, honestly, you know, my advice would be especially for academic

researchers, I really encourage you to write.

677

:

creatively.

678

:

Creative fiction, creative nonfiction, poetry.

679

:

No one has to see it.

680

:

No one has to know.

681

:

You can post it online under a pseudonym if you want, but it actually really does help you

write.

682

:

And I would say it has made my academic writing better.

683

:

I mean, academic writing at the end of the day, especially after it goes through the peer

review process, you end up responding to reviewers and anything beautiful you made gets

684

:

destroyed.

685

:

But also, I write a lot of blog posts.

686

:

These have almost, my blog posts have far greater reach than my academic writing, I'm sure

of that.

687

:

And those are definitely much more informed by my creative writing.

688

:

And the nice thing is they're not peer reviewed and I'm able to add tone, I'm able to add

fun things.

689

:

And that definitely is much more connected to my fiction writing and issues of clarity.

690

:

But I'd say that you really can become a better, it's sort of like cross training as an

athlete.

691

:

Trying out different forms is really helpful.

692

:

I'm sure that writing my novel, I didn't write it to get ahead, although I've joked with

people that part of my tenure case is writing a novel.

693

:

But I don't think it actually will help.

694

:

But I'd say, without maybe meaning to, it definitely has made me a better writer and

that's great for my career at the end of the day.

695

:

Yeah, I completely agree with that.

696

:

Personally, I hate reading academic papers.

697

:

I do it because I have to, but each time I have to do that, I'm like, my God, no.

698

:

Actually, something I would like to try is feed a paper to ChatGPT and ask it to

rewrite it from a very, more like a novel or a more exciting tone because honestly, the

699

:

writing is just terrible.

700

:

Yeah, tell me a story, you know, something like that.

701

:

But as you were saying, like, to your point, I really like Richard McElreath's style because,

I talked a bit about that with him at StanCon.

702

:

So I won't, you know, divulge all the details because I don't know if he wants to.

703

:

But long story short, it's like he also, you know, he, he definitely trained his writing

styles and he's aware of things he wants to.

704

:

to how he wants to do it.

705

:

And I think that's also a big part of why his book has been so successful.

706

:

It's the writing, it's so much more engaging.

707

:

And I've never understood that, honestly, from the academic world, where it's like, no, it

seems like to look serious, you have to be as boring as possible.

708

:

And that's just terrible.

709

:

That doesn't make people wanna read papers.

710

:

I'm not saying it should be completely entertaining, and I'm not saying we should be TikTok-ing

papers, on the contrary, but people like stories.

711

:

And if you can tell stories and at the same time, teach them something or show them a new

method, I think that's much better.

712

:

Everybody wins.

713

:

And I think it's also much more enjoyable for the writer.

714

:

So, you know, why not do a

715

:

little bit more of that.

716

:

that's why I think it's awesome that Richard writes that way, that you also like to write

that way and you're even writing novels.

717

:

I think that's awesome and that's probably going to change gradually.

718

:

I know other authors also doing that.

719

:

Osvaldo Martin, for instance, who's going to be back on the show next week.

720

:

Osvaldo was the first ever guest of Learning Bayesian Statistics.

721

:

Episode one was with him five years ago.

722

:

And he's been kind of a mentor to me.

723

:

And I really like his style of writing, for instance, also.

724

:

He's someone who has a lot of humor.

725

:

And, you know, you can see that in the writing.

726

:

Also, I like it because he doesn't, you know, drown

727

:

you with technical details from the get-go.

728

:

His writing is much more applied where it's like, okay, let me tell you about Bayesian

additive regression trees.

729

:

Here's the theory you need to know, but not too much.

730

:

And then, okay, here is how we do it.

731

:

Here are the limitations and so on.

732

:

And I think that's much more efficient and also much more engaging.

733

:

totally.

734

:

And I think blogs have changed statistics.

735

:

They maybe even helped create data science, because it's this form of publishing that allows us

to just skirt around the whole archaic academic system.

736

:

You know, especially for listeners who, let's say, aren't as plugged

into academia and haven't been through the paper publishing process.

737

:

I mean, reviewers really do, you know, even if you try to, I mean,

738

:

If you write a paper better, it'll be a better paper.

739

:

They really, I mean, I've had reviewers kill titles because they didn't think that

they were academic-y enough.

740

:

And, you know, the title I had to replace it with was definitely a worse title, right?

741

:

And I've had reviewer comments say, your writing style is too informal.

742

:

Like, nothing to do with the actual substance of the paper.

743

:

Just this doesn't sound, you know,

744

:

technical.

745

:

So when you're reading an academic paper and it's turgid, well, that's some of

the reason. And also, I mean, some people just struggle to write; for some people,

746

:

English is not a language they're very comfortable with.

747

:

And so, you know, not everyone's going to be Richard McElreath; that

guy has a gift.

748

:

Okay.

749

:

But I think, you know, blogs allow people to just completely circumvent, you know, do

an end run around that.

750

:

The other person I mentioned, who's a fantastic blogger on stats issues (I don't know if he

would explicitly say he's Bayesian, but a lot of his stuff is), is Andrew Heiss.

751

:

He's a fellow political scientist at Georgia State.

752

:

And we can put his link in the blog.

753

:

And he is, I think, really one of the best sort of stats bloggers out there because he has

a remarkable gift for visualization, but he's also very good at sort of the explanation

754

:

side.

755

:

And so if there's people listening who have not added him to their list, I think he's really

great, and very applied, you know, with the embedded code chunks in the blog and stuff.

756

:

So yeah, I mean, so I think this is all fantastic.

757

:

And, you know, the only challenge, and I know this as an academic who does some blogging, is, you

know, we're not paid to do it, and it is a public good.

758

:

And when I say we're not paid to do it, I mean, theoretically, yes, like,

759

:

anything you do as an academic you're quote unquote paid for, but it's not really part of

your evaluation.

760

:

Most people ignore it, right, for tenure and for these sorts of things.

761

:

And so I'd say it's not incentivized.

762

:

It's up to people to do it themselves.

763

:

I think my projects have been, I mean, I think some of my blog posts that relate to my

academic work, they help with, let's say, getting citations.

764

:

But at the end of the day, it's something that you kind of have to want to do.

765

:

But I would say personally, I found it just so rewarding to write that way and to see

stuff get out there in the world.

766

:

I'll just throw this in.

767

:

It's just a funny tidbit.

768

:

My most visited blog post for years now is actually a blog post I wrote about a pregnancy

test called a cell-free pregnancy screening for Down syndrome.

769

:

which, if you know anything about Bayesian statistics, right, testing is this core part,

you know, it's the example all the intro books use, you know, some kind of how many

770

:

vampires are in the population or whatever.

771

:

And so this blog post came out of actually a very personal story of my wife and I having a

child who was tested or recommended for testing.

772

:

And to make a long story short, I was really upset about how these tests were being

interpreted in a way that was like statistically invalid.

773

:

right?

774

:

Not being aware of prior distributions and how they affect the interpretation of the test.
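
[Editor's illustration] The point about priors and test interpretation can be sketched with Bayes' rule. This is a minimal sketch, not the calculation from Bob's blog post: the function name and the sensitivity, specificity, and prior values below are hypothetical, chosen only to show how strongly the base rate drives the meaning of a positive result.

```python
def positive_predictive_value(prior, sensitivity, specificity):
    """Return P(condition | positive test) using Bayes' rule.

    prior:       P(condition) before testing (the base rate)
    sensitivity: P(positive | condition)
    specificity: P(negative | no condition)
    """
    true_positive = sensitivity * prior
    false_positive = (1.0 - specificity) * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


# Hypothetical test characteristics, NOT taken from any real screening test.
SENSITIVITY = 0.99
SPECIFICITY = 0.999

# The same test read against two hypothetical base rates.
for prior in (1 / 1000, 1 / 100):
    ppv = positive_predictive_value(prior, SENSITIVITY, SPECIFICITY)
    print(f"prior = {prior:.2%} -> P(condition | positive) = {ppv:.1%}")
```

With these made-up numbers, a positive result means roughly a 50% chance of the condition when the base rate is 1 in 1,000, and roughly 91% when it is 1 in 100, even though the test itself is identical.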

775

:

So I wrote a blog post about this and I tried to make it very, very clear, even to people

with no stats background, right?

776

:

And I mean, it's just my personal blog.

777

:

I just posted it up there and somehow it's become one of the top Google search results for

this particular test.

778

:

It's called MaterniT21.

779

:

It's a top five.

780

:

So it gets like 100

781

:

views, you know, unique views a week, sometimes higher than that.

782

:

And I've gotten like emails from people all over the world.

783

:

And sometimes I have to say, I'm sorry, I'm not a medical doctor.

784

:

I'm just commenting on, you know, how you correctly interpret the statistics of a test.

785

:

And that's, as an academic, I guess, really cool, to have that kind of impact.

786

:

And I thought that was going to be a post I wrote that would be quickly forgotten, but

ended up.

787

:

And still, people read it all the time.

788

:

And hopefully,

789

:

make fewer statistical errors, right?

790

:

From using those tests.

791

:

anyway, yeah.

792

:

Yeah, yeah, for sure.

793

:

That's cool because that was actually going to be a question I had for you, you know, like the

way you saw your novel and your writing in general contribute to public understanding of

794

:

Bayesian stats and scientific thinking in general.

795

:

Yeah.

796

:

But since we're short on time, because I think you have to leave in like 14 minutes.

797

:

I'm going to ask you the last two questions that I ask every guest at the end of

the show.

798

:

But before that, one last question regarding your novel.

799

:

Did you get any feedback already from your academic peers and readers?

800

:

And what kind of feedback was it?

801

:

Not a ton yet.

802

:

I mean, you know, the novel's only been out for a month and we're all busy.

803

:

So the people who have read it really like it.

804

:

Of course, as a Bayesian statistician, I know that the reported feedback is

not always the same as the true feedback, right?

805

:

That's a latent quantity.

806

:

So that being said, the way I would interpret people's feedback so far is that, wow, this

novel is actually not that bad.

807

:

I mean, it has a sort of claim to fame as being one of the first novels with a Bayesian

statistician as a protagonist, but that doesn't mean that it's actually fun to read.

808

:

But the people who have gotten through it said, it's actually well-paced.

809

:

There's some kind of mystery thriller elements in it.

810

:

The plot moves along nicely.

811

:

They enjoyed it.

812

:

And that's honestly what I want people to take from it at the end of the day.

813

:

There are some things in the book that

814

:

if you know me, are probably not surprising.

815

:

Some things get into some questions about science, what is it about, philosophy of

science, things like this.

816

:

But it's not too heavy-handed and it's fun and it's a little bit escapist.

817

:

And that's what I wanted was for people who do research to have a fun book to read and

enjoy it.

818

:

It doesn't get super into the weeds on Bayesian stats.

819

:

There is some, and actually you mentioned Gaussian processes.

820

:

That's one of the few.

821

:

There is actually a time when the characters sort of make fun of Gaussian process

regression.

822

:

So I was very happy that that ended up in the novel.

823

:

And maybe I'll get angry emails from people who love Gaussian processes.

824

:

Yeah, from me.

825

:

Yeah, for sure.

826

:

Yeah.

827

:

I mean, yeah.

828

:

I mean, I'd say if you understand Bayesian statistics, you'll understand more of what the

main character is doing.

829

:

I don't write out models in the book, but you have a much better sense of the technical

side of what's happening.

830

:

in the novel, I tried to make it more fun for people, even people who don't have a

background at all in statistics and stuff like that.

831

:

But yeah.

832

:

So it's still early for...

833

:

There's a lot of people who have the book and are reading it.

834

:

Yeah.

835

:

If the pilot distribution of feedback is the same as the true distribution, so far people

enjoyed it.

836

:

Nice.

837

:

Yeah, that's awesome.

838

:

And well done again for taking the time to do that, because I know how long it takes to

write a book and how much dedication and sacrifice of free time it asks for.

839

:

So yeah, thanks a lot for doing that.

840

:

I think it's super...

841

:

important for science communication.

842

:

And I do think we should teach science from much more of a storytelling perspective

because science is done by people.

843

:

And this is not just a bunch of dry theorems and papers.

844

:

So I think your novel definitely contributes to that.

845

:

So thanks a lot. And now...

846

:

So I need to ask you the last two questions before you can get out to your next

engagement.

847

:

First one: if you had unlimited time and resources, which problem would you try to solve?

848

:

And the caveat is that you have to solve it with a Gaussian process.

849

:

No, of course not.

850

:

It's just, what problem if you had?

851

:

unlimited time and resources?

852

:

Yeah, unlimited time and resources.

853

:

Well, I would buy a football team.

854

:

Important for the world.

855

:

I have actually thought about this, but I do a lot of work with online surveys.

856

:

I do think that we haven't really fully exploited them.

857

:

People maybe shouldn't be on social media as much as they are, but they're on it all the time.

858

:

And there's incredible possibilities through that for data collection.

859

:

And one thing that I think would be really cool would be to do real-time online surveys

across the whole world that happen almost every day.

860

:

So this is, yeah, I guess, the social scientist's dream.

861

:

But for some of the stuff I study, like

862

:

corruption, like how people report issues of corruption or what's happening with their

business or things that look shady.

863

:

Having a survey like that that would run around the world all the time every day would be

pretty awesome.

864

:

Facebook did this during COVID.

865

:

They had a COVID poll, and it made me so jealous, because they could, because they're Facebook,

they could just have this thing appear on people's feeds.

866

:

It would just pop up and say, want to take a survey about COVID?

867

:

So there's this incredible daily data out there, sometimes it gets down to

the county or state level, of, you know, stuff like how much

868

:

contact do people have with other people?

869

:

And we have this information for, like, literally almost the entire world.

870

:

It's just stunning.

871

:

And so I would love to do, to do that kind of thing about, you know, topics I care about,

like corruption and just see, you know, because in general in academia, we do,

872

:

The most we get away with is like a sort of point in time survey.

873

:

Like here's what people thought about President Trump at this particular time.

874

:

But longitudinal data is so much more interesting.

875

:

And there's so many more important questions you can ask when you get into things like

when do people change their minds?

876

:

How do they change their minds?

877

:

Like even with the current election, which we haven't discussed yet.

878

:

So we have to discuss that, right?

879

:

But, you know, like there's all this, you know, conversation about, who supports, you

know, Kamala Harris, who supports Donald Trump.

880

:

And the thing is we don't really have longitudinal data.

881

:

So we really don't know who has changed their mind or not because we don't know.

882

:

People answer a survey and they say which one they like at the moment.

883

:

But does that mean they really changed their mind or just that's who they saw on TV last

night?

884

:

That's the most recent candidate they've heard of.

885

:

What we really want to know are people who used to support one candidate, now they support

another.

886

:

Those are the really interesting ones.
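
[Editor's illustration] The difference between point-in-time surveys and longitudinal data can be made concrete with a toy panel. This is a minimal sketch under stated assumptions: the respondent IDs, the candidate labels "A" and "B", and the helper names `shares` and `switchers` are all hypothetical, and the data is invented for illustration.

```python
from collections import Counter

# Hypothetical panel: the SAME five respondents interviewed in two waves.
wave1 = {"r1": "A", "r2": "A", "r3": "B", "r4": "B", "r5": "A"}
wave2 = {"r1": "B", "r2": "A", "r3": "A", "r4": "B", "r5": "A"}


def shares(wave):
    """Cross-sectional view: share of respondents backing each candidate."""
    counts = Counter(wave.values())
    total = len(wave)
    return {candidate: counts[candidate] / total for candidate in sorted(counts)}


def switchers(before, after):
    """Longitudinal view: respondents whose answer changed between waves."""
    return {
        rid: (before[rid], after[rid])
        for rid in before
        if rid in after and before[rid] != after[rid]
    }


print("wave 1 shares:", shares(wave1))
print("wave 2 shares:", shares(wave2))
print("switchers:", switchers(wave1, wave2))
```

In this toy data the two cross-sections show identical candidate shares, so two separate point-in-time polls would suggest nothing changed, while the panel view reveals that two of the five respondents switched in opposite directions.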

887

:

So that's sort of my dream ambition.

888

:

Maybe that's a very

889

:

I don't know, uninteresting dream ambition, but that would be one thing I would love to do.

890

:

And it really would require unlimited funding.

891

:

So if you know of a source of unlimited funding, please put me in touch.

892

:

Yeah.

893

:

I mean, I'd first use it myself, and then I'll tell you.

894

:

Great answer.

895

:

I'm not surprised.

896

:

You seem like you're really passionate about what you're doing.

897

:

I'm not surprised you came up with a very appropriate answer and of course, a very nerdy

answer.

898

:

I was really hoping for that.

899

:

That's a prerequisite to be on the show.

900

:

I know.

901

:

And so second question, if you could have dinner with any great scientific mind, dead,

alive or fictional, who would it be?

902

:

Yeah.

903

:

So this is a great question.

904

:

And one I've had to think about.

905

:

But it ended up being actually very clear for me, and that's Leonardo da Vinci.

906

:

And I've always heard about him.

907

:

I took a trip to Italy a few years ago.

908

:

And there's a museum of Leonardo da Vinci.

909

:

I believe it's in Rome, but don't quote me on that.

910

:

If I remember correctly, it is in Rome.

911

:

And it wasn't the biggest museum, but it was so fascinating.

912

:

And what I loved about this guy and this kind of relates to our conversation right about

913

:

statisticians writing novels.

914

:

This guy had no rules, right?

915

:

And he would, you know, he'd like wake up one day, he'd paint the Mona Lisa, he'd wake up

the next day, he'd like invent a new way to build a dam.

916

:

And no one was there to tell him like, hey, you should, you know, no, no, no, you're an

artist, like you should just stay painting all the time, or, my gosh, you're really good

917

:

at, you know, science, like you should, you know, just write scientific treatises.

918

:

Like he just decided he was going to do it all.

919

:

Now, obviously he was tremendously gifted and maybe there's not another person alive who

has ever been that multi-talented, but I just think that's so fascinating.

920

:

His scientific discoveries probably don't measure up to let's say Newton or Bacon or

Einstein, but I think as a person he's fascinating in the way that he's doing art.

921

:

He's doing science and the two can blend together.

922

:

And so I think for me, hands down, I'd just love to sit down and chat with him.

923

:

And where did you get your ideas from, right?

924

:

Where did they come from, this steady stream of insights? And you look him up, I mean, his

contributions across fields are staggering, right?

925

:

And, yeah, so he gets my vote.

926

:

If you can set that up too, that'd be great.

927

:

If you know how to bring dead people to life or whatever.

928

:

Well, I definitely will and I will join the dinner because honestly, yeah, I think it's

a great choice.

929

:

yeah, some other people also have made that choice.

930

:

So that will be a very interesting dinner.

931

:

I thought it was super original, but I guess not.

932

:

I mean, it is original.

933

:

It's not the bulk of the distribution, but you're not the first one.

934

:

Yeah, that's fine.

935

:

Yeah, it's a great one.

936

:

It would have been very original if you wanted to optimize that to say myself.

937

:

If you have the source of unlimited funding, I would definitely be saying that.

938

:

If you like Leonardo da Vinci stuff as I do, there is in my native region in France,

939

:

There is his last house, which was offered to him, gifted to him by the King of France,

Francis I.

940

:

And at the time, the King was in the small city of Amboise, which is in the Loire Valley.

941

:

And so if you go to, so I definitely recommend the region.

942

:

This is really.

943

:

It's really beautiful.

944

:

It's like, it's the same vibe as Tuscany since you know Italy, but without the mountains.

945

:

But it's great food, great wine, lots of history, lots of castles.

946

:

Leonardo da Vinci spent his last years in Amboise and his castle, which is called the Clos

Lucé, is actually a museum now that you can visit.

947

:

Lots of his inventions are there.

948

:

Even handwritten notes, always very original because he was writing with the left hand

from right to left.

949

:

So it's very hard to decipher actually.

950

:

There's an amazing park and there is also a bit of, there are some vines because he was

making some wine.

951

:

So yeah, definitely recommend that.

952

:

That's a really beautiful place.

953

:

That sounds fascinating.

954

:

And I didn't need another reason to go visit France, but you've given me another one, so I

will.

955

:

I'll have another excuse to visit for sure.

956

:

Yeah.

957

:

Thank you.

958

:

also it's not a very touristic one.

959

:

I mean, it is touristic, but it's mainly European tourism and some Americans.

960

:

I mean, now that I've talked on the podcast, of course it's going to become much more

touristic because I'm kind of an influencer, but it's just like...

961

:

Yeah.

962

:

All these really nerdy people will show up and...

963

:

Yeah, you know, trying to.

964

:

T-shirts and stuff.

965

:

Yeah.

966

:

Did you get a Stan T-shirt at the conference or?

967

:

No, I don't think there were T-shirts this year but I have some cool stickers.

968

:

I have some cool stickers.

969

:

Okay, okay.

970

:

Yeah, they have stickers.

971

:

Yeah.

972

:

I got a T-shirt back when. But I will say, the first Stan conference, I wouldn't say it was

the best, because I think probably the quality, you know, like there's so many more people

973

:

use Stan but it was in California on the coast.

974

:

Like, they had a resort that was on the Pacific.

975

:

It was pretty, it was very pretty.

976

:

yeah.

977

:

Yeah.

978

:

It was really fun.

979

:

But yeah.

980

:

No, this year it was at Oxford University.

981

:

So as expected, it was raining.

982

:

But the, but the university is pretty cool to look at.

983

:

Yeah.

984

:

Yeah.

985

:

No, that's, that's absolutely beautiful.

986

:

And, and, again, like, you know, you, you go to, to the UK, you expect it to be raining,

you know, so that's like going to France and not expecting some strikes.

987

:

It's like,

988

:

You're missing part of the experience, you know.

989

:

Awesome.

990

:

Well, Bob, I need to let you go.

991

:

I know you have another engagement, but thank you so much for taking the time.

992

:

That was absolutely great.

993

:

Awesome conversation.

994

:

I have still a gazillion questions for you, but let's do that when your next novel comes

around.

995

:

So you told me in about three months, right?

996

:

So that's awesome.

997

:

Oh my gosh.

998

:

And yeah, as usual, I put resources and a link to your website in the show notes for those

who want to dig deeper.

999

:

Thank you again, Bob, for taking the time and being on the show.

:

01:23:32,890 --> 01:23:33,520

Thank you, man.

:

01:23:33,520 --> 01:23:36,921

I had a great time and I'll definitely get everything else to you.

:

01:23:36,921 --> 01:23:37,629

But thanks so much.

:

01:23:37,629 --> 01:23:39,012

It was so fun to have this conversation.

:

01:23:39,012 --> 01:23:40,039

You asked great questions.

:

01:23:40,039 --> 01:23:41,533

It made me think a lot.

:

01:23:41,933 --> 01:23:43,174

So I really appreciate it.

:

01:23:43,174 --> 01:23:44,834

I hope you have a good week.

:

01:23:48,686 --> 01:23:52,369

This has been another episode of Learning Bayesian Statistics.

:

01:23:52,369 --> 01:24:02,868

Be sure to rate, review, and follow the show on your favorite podcatcher, and visit

learnbaystats.com for more resources about today's topics, as well as access to more

:

01:24:02,868 --> 01:24:06,961

episodes to help you reach true Bayesian state of mind.

:

01:24:06,961 --> 01:24:08,903

That's learnbaystats.com.

:

01:24:08,903 --> 01:24:13,767

Our theme music is Good Bayesian by Baba Brinkman, feat. MC Lars and Mega Ran.

:

01:24:13,767 --> 01:24:16,929

Check out his awesome work at bababrinkman.com.

:

01:24:16,929 --> 01:24:18,102

I'm your host.

:

01:24:18,102 --> 01:24:18,943

Alex Andorra.

:

01:24:18,943 --> 01:24:23,202

You can follow me on Twitter at Alex underscore Andorra like the country.

:

01:24:23,202 --> 01:24:30,573

You can support the show and unlock exclusive benefits by visiting Patreon.com slash

LearnBayesStats.

:

01:24:30,573 --> 01:24:32,955

Thank you so much for listening and for your support.

:

01:24:32,955 --> 01:24:35,267

You're truly a good Bayesian.

:

01:24:35,267 --> 01:24:42,083

Change your predictions after taking information in and if you're thinking I'll be less

than amazing.

:

01:24:42,083 --> 01:24:45,624

Let's adjust those expectations.

:

01:24:45,624 --> 01:24:58,593

Let me show you how to be a good Bayesian. Change calculations after taking fresh data in. Those

predictions that your brain is making, let's get them on a solid foundation.
