#141 AI Assisted Causal Inference, with Sam Witty
Episode 141 • 18th September 2025 • Learning Bayesian Statistics • Alexandre Andorra
Duration: 01:37:47


Shownotes

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!

Visit our Patreon page to unlock exclusive Bayesian swag ;)

Takeaways:

  • Causal inference is crucial for understanding the impact of interventions in various fields.
  • ChiRho is a causal probabilistic programming language that bridges mechanistic and data-driven models.
  • ChiRho allows for easy manipulation of causal models and counterfactual reasoning.
  • The design of ChiRho emphasizes modularity and extensibility for diverse applications.
  • Causal inference requires careful consideration of assumptions and model structures.
  • Real-world applications of causal inference can lead to significant insights in science and engineering.
  • Collaboration and communication are key in translating causal questions into actionable models.
  • The future of causal inference lies in integrating probabilistic programming with scientific discovery.

Chapters:

05:53 Bridging Mechanistic and Data-Driven Models

09:13 Understanding Causal Probabilistic Programming

12:10 ChiRho and Its Design Principles

15:03 ChiRho’s Functionality and Use Cases

17:55 Counterfactual Worlds and Mediation Analysis

20:47 Efficient Estimation in ChiRho

24:08 Future Directions for Causal AI

50:21 Understanding the Do-Operator in Causal Inference

56:45 ChiRho’s Role in Causal Inference and Bayesian Modeling

01:01:36 Roadmap and Future Developments for ChiRho

01:05:29 Real-World Applications of Causal Probabilistic Programming

01:10:51 Challenges in Causal Inference Adoption

01:11:50 The Importance of Causal Claims in Research

01:18:11 Bayesian Approaches to Causal Inference

01:22:08 Combining Gaussian Processes with Causal Inference

01:28:27 Future Directions in Probabilistic Programming and Causal Inference

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor,, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Joshua Meehl, Javier Sabio, Kristian Higgins, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan, Francesco Madrisotti, Ivy Huang, Gary Clarke, Robert Flannery, Rasmus Hindström, Stefan, Corey Abshire, Mike Loncaric, David McCormick, Ronald Legere, Sergio Dolia, Michael Cao, Yiğit Aşık, Suyog Chandramouli and Adam Tilmar Jakobsen.

Links from the show:

Transcript

This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.


Speaker:

In this episode, I am joined by Sam Witty, founder of Sorbus AI.

2

:

Sam is the driving force behind ChiRho, an open-source framework that extends Pyro with

causal capabilities, giving scientists and engineers a way to run counterfactuals, analyze

3

:

path-specific effects, and build hybrid models that combine machine learning with domain

knowledge.

4

:

We talk about the challenges of translating causal questions for clients, the importance

of Bayesian thinking for scrutinizing assumptions, and the real-world impact of causal

5

:

probabilistic programming, from sustainable energy to preemptive health.

6

:

This is Learning Bayesian Statistics, recorded July 31, 2025.

7

:

Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the

projects, and the people who make it possible.

8

:

I'm your host, Alex Andorra.

9

:

You can follow me on Twitter at alex-underscore-andorra.

10

:

like the country.

11

:

For any info about the show, learnbayesstats.com is Laplace to be.

12

:

Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on

Patreon, everything is in there.

13

:

That's learnbayesstats.com.

14

:

If you're interested in one-on-one mentorship, online courses, or statistical consulting,

feel free to reach out and book a call at topmate.io slash alex underscore andorra.

15

:

See you around, folks.

16

:

and best Bayesian wishes to you all.

17

:

And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can

help bring them to life.

18

:

Check us out at pymc-labs.com.

19

:

Hello my dear Bayesians!

20

:

Big news!

21

:

I am considering running a 6-week, more or less, live cohort of my Advanced Bayesian

Regression course from Intuitive Bayes that I wrote with the brilliant Ravin Kumar and

22

:

Tommy Capretto.

23

:

Think weekly live sessions, projects, office hours, community...

24

:

So if you are into practical Bayes and want to see how it's done in the wild, in practice

and

25

:

learn all that, well, if that sounds interesting to you, feel free to answer the short Google

form in the show notes to help me scope demand and logistics, and of course also to get

26

:

the early bird access and discount.

27

:

Thank you so much for your help and for being such a good Bayesian.

28

:

Sam Witty, welcome to Learning Bayesian Statistics.

29

:

Thank you.

30

:

Glad to be here.

31

:

Yeah, I am super excited to have you on the show because, well, you're doing things I'm

very interested in, especially uh these days where I'm ramping up my causal inference,

32

:

intersecting with uh generative AI modeling skills.

33

:

I discovered your work through

34

:

Robert Ness's episode with us and also his great book called Causal AI, where he uses

ChiRho, and you are of course one of the developers of this great package.

35

:

So we'll talk about all that today.

36

:

um And if you have some stuff by the way you want to share on your screen, you know, demo

stuff, live or else, let me know; you can share your screen and that will then be on the

37

:

YouTube channel for people who want to follow along.

38

:

But first, as usual, we're going to dive into your origin story. So what are you doing

nowadays and how did you end up working on that?

39

:

So right now I'm a founder and principal consultant at a company I founded called Sorbus

AI.

40

:

um We do consulting on probabilistic and causal AI, mostly for engineering and scientific

clients.

41

:

um I'll talk a little later about why I think that's a really sweet spot for this

technology right now.

42

:

But at a high level, it's a boutique consultancy where I sit very closely with clients and

help them um kind of bridge the gap between what I call uh mechanistic and kind of

43

:

data-driven modeling workflows.

44

:

um So the story that got me there, um in undergrad I was a mechanical engineer.

45

:

I actually don't have uh you know, undergrad

46

:

computer science or statistics background, and I came into this world sort of by accident.

47

:

um I was a working mechanical engineer.

48

:

I worked for a small consulting firm that was doing energy efficiency consulting.

49

:

um Actually, uh a fun aside, I was doing causal inference without knowing it was causal

inference.

50

:

We were evaluating uh public policy programs m that give money for energy efficiency

projects.

51

:

and they wanted to know what the causal impact of those programs were.

52

:

um So at that job, I spent a lot of time doing kind of engineering first um modeling,

building models of fluid dynamics or heat transfer, these sorts of things.

53

:

um But I felt like very quickly that wasn't really able to keep up with the data we would

gather in the real world.

54

:

um So that naturally brought me to statistics and machine learning.

55

:

um which at the same time felt like I had to throw away all of the mechanistic knowledge I

had learned in engineering school.

56

:

um So with that kind of tension in mind, I went to grad school, studied a bunch of things

we'll talk about today, and now this consulting work is kind of coming full circle.

57

:

I'm trying to bridge the world between engineering first, modeling where you think hard

about the physics, think hard about the mechanisms in the world.

58

:

and the kind of probabilistic and causal AI that's made a lot of technical progress in the

last several years.

59

:

Yeah, definitely.

60

:

And I really like how you're blending all these different experiences into what you're

doing today.

61

:

And I'm actually also wondering, you did your PhD at UMass Amherst.

62

:

You did a visiting at MIT.

63

:

And if I understood correctly, both of those

64

:

revolved around some kind of probabilistic programming.

65

:

So was there a particular moment that turned that academic work into a career focus?

66

:

And where you knew you wanted to focus on that for at least the coming years because you

thought it was really interesting.

67

:

Yeah, I've always been academically minded.

68

:

um

69

:

For any of your listeners who went to engineering school, uh you'll remember a common

phrase that professors make, is, uh I'm sorry for all the math, but we'll get to the cool

70

:

stuff soon.

71

:

And that was never me.

72

:

I liked the math.

73

:

um So, you know, doing a PhD was a pretty natural progression for me.

74

:

But at the same time, I think that the career focus has been kind of driven by my past

engineering experience.

75

:

And then I see the computer science

76

:

um academic background as developing the skills to kind of meet that end.

77

:

Whereas other people might have started with a set of tools and then later developed those

tools into application areas for their career.

78

:

I see.

79

:

That's interesting.

80

:

I didn't know the like the engineering professors were like, okay, let's, let's get the

math out of the way and then focus on the cool stuff.

81

:

Yeah, it's it's often a lot of car nerds.

82

:

they want to build engines or design circuits and some, not all, certainly not all

engineers see math as an impediment to doing the cool physical things they want to do.

83

:

um It's obviously, it's a broad sweeping statement.

84

:

Many engineers are as math-nerdy as I am.

85

:

Right, Now that reminds me of em the TV series Big Bang Theory where

86

:

of course Howard is the only member of the team who is an engineer and always gets

teased by Sheldon because, first, he doesn't have a PhD and second, uh he's an engineer.

87

:

Saying that made me think about that, ah, which is of course not true, uh. The

proof is that you're here, you are on this podcast, ah, which is very academic. Something

88

:

also I was I was um

89

:

interested in when preparing for your episode was that on your homepage, that's in the

show notes, by the way, folks, you say that you help teams combine mechanistic knowledge

90

:

with data driven machine learning.

91

:

From where you're coming from and the answers you just gave, I really understand now why

you're saying that on your website.

92

:

um But I'm also wondering

93

:

when you first realized that marriage was a missing ingredient for many hard

problems?

94

:

Yeah, yeah, it's a great question.

95

:

one of the core things about engineering models is that you can extrapolate with them,

right?

96

:

For a lot of models, they're derived from very simple concepts, right?

97

:

You might look at Newton's laws, for example.

98

:

And a lot of engineering modeling is about uh seeing the

99

:

natural consequences of those very simple laws.

100

:

so for example you can derive things about uh about objects colliding or rotation, uh

leverage, inertia, um conservation of energy, all from these kind of basic fundamental uh

101

:

laws of the world.

102

:

um and so for the most part those laws generalize, right?

103

:

if i

104

:

If I imagine a bridge that I've never built before uh using these kind of mechanistic

laws, I can be pretty confident that if I've been careful about the math, that the bridge

105

:

is going to work the way I would expect it to work.

106

:

Now there are some caveats.

107

:

Of course, some materials behave in somewhat unsuspecting ways, but with a large asterisk,

that is generally true.

108

:

We expect engineering kind of mechanistic models first to generalize to unforeseen worlds.

109

:

um

110

:

And that matters, right?

111

:

People want to build systems.

112

:

They want to um make predictions outside of the realm of the data that they've gathered.

113

:

They want to be able to explain what would have happened had something been different to

inform policy decisions.

114

:

So people care about causality, um but they also care about learning from data, right?

115

:

Our knowledge of the world is fundamentally incomplete.

116

:

um And if you

117

:

If all you ever did was sat down, wrote down a model by hand, and then probed it for its

logical implications, the model would likely not represent the world very well.

118

:

It's just too complicated to try to model everything by hand.

119

:

I think past waves of AI research have kind of taught us this.

120

:

um So the key point is that we should learn something from those mechanistic models of the

world, try to get some of that extrapolative capabilities, but not

121

:

lose all of kind of data-based machine learning approaches or statistical approaches that

have seen a lot of progress.

122

:

And I happen to think that probabilistic programming, in particular causal probabilistic

programming, is an extremely natural way to kind of bridge these two worlds.

123

:

Yeah, actually, that's a good segue into trying and defining all that.

124

:

Can you define those terms?

125

:

How do you differentiate them?

126

:

And why do you think?

127

:

that causal probabilistic programming has.

128

:

um Why do you have such hope for it to be able to bridge that gap?

129

:

Yeah, so when you say define those terms, you mean causal probabilistic programming?

130

:

Yes, exactly.

131

:

And the difference with the classic data-driven machine learning.

132

:

Sure.

133

:

So um it's worth zooming out a little bit.

134

:

um Maybe we could start by talking about

135

:

what is probabilistic programming before adding the causal part.

136

:

um So probabilistic programming ah in its simplest form is just programs with

randomness.

137

:

um You can think of it essentially as writing simulation models that uh have special

syntax for denoting random variables and distributions.

138

:

um Now that's kind of a simplified representation because it doesn't tell you what they

do.

139

:

right?

140

:

and what you do with probabilistic programs is to automate or semi-automate uh a process

known as inference, which is about manipulating uh distributions to answer questions.

141

:

so some manipulations look like marginalization, uh what would happen if i were to sort of

uh average over a particular variable in my model, or conditioning where you

142

:

hold some variable fixed at some data and then update your distribution over all of the

other random variables.

143

:

um And these reasoning operations, these core reasoning operations of marginalization and

conditioning can be very computationally challenging.

144

:

So probabilistic programming languages often come with a set of inference algorithms that

either exactly or approximately solve um those reasoning steps.

145

:

So that's purely probabilistic.

146

:

Imagine uh coin flips or dice rolls or things like that.
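For readers who want to see this in code, here is a minimal Pyro sketch of exactly that: a toy coin-flip model plus the conditioning operation Sam just described. The model and the numbers are invented for illustration; only the Pyro calls themselves (pyro.sample, pyro.condition, MCMC with NUTS) are standard API.

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import MCMC, NUTS

def coin_model():
    # Prior over the coin's unknown bias, then one flip generated from it.
    p = pyro.sample("p", dist.Beta(2.0, 2.0))
    flip = pyro.sample("flip", dist.Bernoulli(p))
    return flip

# Conditioning: hold the observed flip fixed at heads (1.0) and update the
# distribution over the other random variable, the bias "p".
conditioned_model = pyro.condition(coin_model, data={"flip": torch.tensor(1.0)})

# Approximate inference over the remaining randomness.
mcmc = MCMC(NUTS(conditioned_model), num_samples=500, warmup_steps=200)
mcmc.run()
print(mcmc.get_samples()["p"].mean())  # posterior mean of the bias after seeing one heads
```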

147

:

um At this point, no causality is kind of imbued in these systems.

148

:

You can ask them about what is likely to occur when other events happen.

149

:

um But causal probabilistic programs give this kind of idea of taking action or

intervention in your system.

150

:

And they let you answer questions like,

151

:

would my distributions change if I were to hit my system with a hammer in some way?

152

:

if I were to intervene?

153

:

um and ah how would those changing distributions ah propagate into the things I care

about?

154

:

so causal probabilistic programming systems are systems that, uh similar to how

probabilistic programming systems help to automate probabilistic reasoning, causal

155

:

probabilistic programming systems like Cairo help to automate causal reasoning.

156

:

asking these kind of what if questions um that, as I said, many scientists and engineers

and policymakers care a lot about.

157

:

Yeah, yeah, yeah.

158

:

And I'm actually super grateful that you developed that whole package because it's

definitely something I'm going to use a lot.

159

:

And mainly because it's built on top of a probabilistic programming package, which is amazing

because you get to keep all

160

:

goodies of Bayesian models basically where you have already the generative structure and

you can do forward and backward sampling but then you add the causal structure the ability

161

:

to intervene on your causal graph to do counterfactuals uh and stuff like that so I find

this is extremely helpful um and actually that's a good point now for us to turn

162

:

and talk about ChiRho in particular.

163

:

I'm curious first what design principles of ChiRho you are particularly happy with and also

why was Pyro the right backend in your mind to do that?

164

:

Yeah, so maybe I explained what causal probabilistic programming was in

165

:

general, but maybe I could just take a few minutes to explain what ChiRho is

166

:

specifically.

167

:

I forgot my own question.

168

:

Thanks for doing that, Sam.

169

:

That's okay.

170

:

You can do my job later too, if you want.

171

:

No, that's way harder.

172

:

I uh prefer it like that.

173

:

Well, we'll see.

174

:

So ChiRho is a causal extension to the Pyro probabilistic programming language, which is

built on top of PyTorch.

175

:

um So Pyro was one of the

176

:

earlier probabilistic programming languages to um really take deep learning and kind of

differential programming um as kind of a core building block.

177

:

um So Pyro is built on top of PyTorch.

178

:

um Pyro programs are essentially collections of PyTorch operations with special, as I

said, special syntax for defining random variables.

179

:

um So ChiRho um

180

:

in a certain sense, is just a wrapper around Pyro programs.

181

:

uh You write your probabilistic causal models as you would any other probabilistic

program.

182

:

It's just that when you're writing it, you have to kind of imbue it with a little bit more

meaning than you would an ordinary probabilistic program.

183

:

You're making a kind of implicit assertion that the order of variables that you write

represents like the causal ordering of things unfolding in the world.

184

:

Whereas

185

:

This is not something you need to do explicitly if all you care about is estimating um

marginal and conditional probabilities, as in Pyro.

186

:

um So with that kind of conceptual difference, um ChiRho provides two main uh pieces of

functionality above what Pyro does.

187

:

um One is, as we said, an intervention operator.

188

:

So this takes a probabilistic program

189

:

and transforms it where some intervention has been applied.

190

:

ah You can think of this intervention like ah forcing some variable in your system to take

a particular value or to be generated by some function that is not the normal mechanisms

191

:

in your probabilistic program.
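As a rough sketch of what that intervention operator looks like in use (the import path and the do handler follow ChiRho's documentation, but treat the exact API and its behavior outside a counterfactual handler as assumptions; the two-variable model is made up for illustration):

```python
import torch
import pyro
import pyro.distributions as dist
from chirho.interventional.handlers import do  # ChiRho's intervention handler

def model():
    x = pyro.sample("x", dist.Normal(0.0, 1.0))       # an upstream cause
    y = pyro.sample("y", dist.Normal(2.0 * x, 1.0))   # its downstream effect
    return y

# Transform the program so that "x" is forced to 1.0 instead of being generated
# by its normal mechanism, then run the intervened program.
with do(actions={"x": torch.tensor(1.0)}):
    y_under_intervention = model()
```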

192

:

uh And the other thing that ChiRho does, which as far as I can tell no other systems yet

support, is to

193

:

provide sort of bookkeeping for managing what we call counterfactual worlds.

194

:

um And this is important because lots of really rich causal questions involve um

combinations of outputs from different worlds with different interventions applied that

195

:

are interacting with each other.

196

:

And without this kind of support in ChiRho, that kind of reasoning is really, really

difficult.

197

:

um I can give an example of that.

198

:

um

199

:

If you'd like.

200

:

OK.

201

:

So um a common thing you might want to do with your model is understand what are called

path-specific effects.

202

:

um So for those familiar with the jargon, this is called mediation analysis.

203

:

And the idea is that if I have a large and complicated model, I might want to isolate the

effect of one variable on another variable only through a particular pathway.

204

:

um I don't want to know the total effect of treatment on outcome.

205

:

I only want to know the effects mediated through something I particularly care about.

206

:

Yeah.

207

:

Yeah.

208

:

So that'd be like, maybe to give an example to people, it's like, if you have a

confounder, so common cause of the treatment and the outcome, you obviously have two uh

209

:

causal paths here from the treatment to the outcome.

210

:

but you only had one direct path, which is the one.

211

:

So this is actually a subtlety.

212

:

Treatment to the outcome.

213

:

The confounding relationship.

214

:

uh for those who are unfamiliar, confounding, I never know how far to back up the jargon,

but I'll just go far up.

215

:

So confounding is when you have a variable that influences both the thing you're

interested in intervening on and the thing you're interested in measuring the outcome.

216

:

uh

217

:

So in this setting, actually confounding is not the issue.

218

:

The issue is that between my treatment and my outcome, there are many different paths, all

pointing the same direction.

219

:

OK, so that'd be more like if you had a direct path and then a path that's mediated?

220

:

Yeah, exactly.

221

:

OK.

222

:

Exactly.

223

:

And you want to be able to isolate how much of the total effect is going through each of

these different paths.

224

:

OK.

225

:

So one way to codify this.

226

:

is using what are called nested counterfactuals.

227

:

So you can reframe this question of ah what is the path-specific effect of treatment on

outcome as, let me make sure I get this right, ah what would the outcome be if I

228

:

intervened on treatment and intervened on the mediator to be what it would have been had I

not intervened on treatment?

229

:

Right.

230

:

So that long convoluted sentence involves

231

:

intervening in two different ways in two different places with values sampled from a

different counterfactual world and kind of interweaving with each other.

232

:

That sentence is very difficult to codify unless you have the counterfactual worlds that

ChiRho provides.

233

:

And with ChiRho it's four lines of code.

234

:

And we actually have some tutorials that show exactly how to do that.

235

:

If you go there, you'll see a mediation analysis tutorial on the docs page.
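For a feel of what those few lines look like before opening the tutorial, here is a rough sketch. The handler names (MultiWorldCounterfactual, do) and the tuple-of-values idiom for spawning parallel worlds follow ChiRho's documentation, but the toy model and the shortcuts are assumptions for illustration, not the tutorial's actual code.

```python
import torch
import pyro
import pyro.distributions as dist
from chirho.counterfactual.handlers import MultiWorldCounterfactual  # bookkeeping for parallel worlds
from chirho.interventional.handlers import do

def model():
    t = pyro.sample("t", dist.Bernoulli(0.5))             # treatment
    m = pyro.sample("m", dist.Normal(2.0 * t, 1.0))       # mediator on the indirect path
    y = pyro.sample("y", dist.Normal(t + 3.0 * m, 1.0))   # outcome: direct path from t plus the mediated path
    return y

# One pass builds a factual world and two intervened worlds (t forced to 0 and to 1).
# The nested query "set t to 1 but hold m at the value it would have taken under t = 0"
# then comes down to indexing into these worlds, which is what the mediation tutorial walks through.
with MultiWorldCounterfactual():
    with do(actions={"t": (torch.tensor(0.0), torch.tensor(1.0))}):
        y_worlds = model()
```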

236

:

Yeah, and the docs page is in the show notes.

237

:

I highly encourage people to check it out.

238

:

Yeah.

239

:

mediation analysis is one example.

240

:

But it turns out that this uh manipulating of counterfactual worlds is necessary for

answering other questions, like the ambiguous question, like, why did something happen?

241

:

It turns out to represent that, you need to do a search over possible counterfactual

worlds.

242

:

um

243

:

and optimize that search for outcomes where the thing you're trying to explain is likely

to happen and some other conditions are unlikely to happen.

244

:

um So doing that kind of search over counterfactual worlds is again not easy, that's kind

of stretching ChiRho as far as it can go, um but at least it can be mechanized in ChiRho um

245

:

and it's very difficult otherwise without these utilities.

246

:

Yeah, I can.

247

:

I can attest to that.

248

:

And that's why the work you're doing on ChiRho uh is very important to help practitioners

focus on what is actually the most interesting things to them and the most important and

249

:

the thing that is actually customized to their use case.

250

:

Whereas what ChiRho is doing is

251

:

trying to abstract away the math, which is basically the thing that's generalizable.

252

:

So like in all cases, you're gonna need uh to calculate, you're gonna need to analyze your

counterfactual graphs.

253

:

And so ChiRho is doing that for you.

254

:

then you can just focus on, what are the DAGs I'm focused on?

255

:

What are the counterfactual queries I'm actually after?

256

:

So.

257

:

Yeah, yeah, I think this is absolutely awesome for people

258

:

Hello my dear Bayesians, just a short break to let you know of a very exciting opportunity I

have for you.

259

:

Learning Bayesian Statistics friend of the show Robert Osazuwa Ness has just opened a new

cohort of his Causal AI online course and you know what?

260

:

I personally will be part of this cohort, so if you want to come learn with me, make sure

to enroll at the link in the show notes by Sunday September…

261

:

First, of course, you have a 50 % discount if you're a patron of the show.

262

:

I already shared the magic code with you on Discord and Patreon.

263

:

In this course, we're going to see how to merge the power of deep learning, causal

inference and probabilistic programming all in the same models.

264

:

And honestly, as a lifelong learner, I am beyond excited to add these causal arrows to my

statistical quiver.

265

:

So if you need more details, you can listen to Robert's latest appearance on the show, which

is episode 137.

266

:

You can check out the workshop's page in the show notes, or you can just launch a

discussion on Discord if you're a patron.

267

:

Hope to see you in the workshop, folks.

268

:

And well, I hope that I will learn with you and with Robert.

269

:

And now, let's get back to my interview with Sam Witty.

270

:

So one point of clarification there, because it's a little subtle, is that ChiRho doesn't

actually use the do-calculus.

271

:

um for...

272

:

I thought you guys were.

273

:

So yeah, how do you do that?

274

:

Yeah, I could explain.

275

:

um So the do calculus ah is a set of symbolic rules developed by Pearl and students

several years ago that translate causal questions into probabilistic questions.

276

:

for what are called nonparametric structural causal models.

277

:

So the idea is that if all I have is a graph, um which denotes the structure of causal

relationships between my variables, um then the Do Calculus gives me a sound and complete

278

:

algorithm for translating that into probabilistic expressions that I can estimate from

data.

279

:

um One of the limitations of the Do Calculus um

280

:

It's a great tool, and I think the symbolic approaches are awesome.

281

:

I see what we do in ChiRho as supplementary.

282

:

um One of the limitations of the do calculus is uh there's a key word I said in the

definition, is nonparametric.

283

:

And a lot of the times, the models we write about the world include parametric assumptions

that actually make it really much easier to do causal inference, generally.

284

:

um

285

:

So an example of this is, uh if you're familiar with regression discontinuity designs, uh

this is what's called the quasi-experimental design, where you have observational data,

286

:

where you have a continuous covariate.

287

:

So for example, um a classic example is the test score a student gets uh on an exam.

288

:

And you have a treatment, which is

289

:

in this example, whether they are admitted into the next grade or not, or whether they're

held back, and then an outcome which is uh their education outcomes in the long term.

290

:

So a regression discontinuity design ah takes advantage of the fact that we know that

there's a fixed threshold in this test score where if you're to the left of this

291

:

threshold, you get held back a grade, and if you are to the right of this threshold, you

advance to the next grade.

292

:

A key idea, a key conclusion of this is that you can look at individuals who are very, very

close to that threshold on the left and the right and kind of intuit that because they

293

:

have such similar values of test scores, which may confound the relationship between

whether they um get held back and how their education outcomes are.

294

:

um You can assume that they're basically the same, because they're infinitesimally close

to each other, uh but they have this kind of differentiated treatment.

295

:

So it turns out that if you're willing to make these parametric assumptions about uh

smoothness of the functions, or even just this discontinuity existing at all, you can

296

:

estimate what's called the conditional average treatment effect arbitrarily close to that

discontinuity.

297

:

That kind of assumption is really easy to write as a program.

298

:

um You can't, as far as I know, write it directly as a graphical model.

299

:

The structure of the variables influencing each other doesn't tell you enough to encode

that.
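To make "write the assumption as a program" concrete, here is one hedged sketch of a regression discontinuity model as a Pyro program. The threshold, priors, and functional form are all invented for the example; the point is only that the smooth baseline and the jump at the cutoff are stated explicitly as code.

```python
import torch
import pyro
import pyro.distributions as dist

THRESHOLD = 60.0  # assumed passing score: below it, a student is held back

def rdd_model(test_score: torch.Tensor):
    # A smooth baseline relationship between the running variable and the outcome...
    a = pyro.sample("a", dist.Normal(0.0, 1.0))
    b = pyro.sample("b", dist.Normal(0.0, 1.0))
    # ...plus a discontinuous jump at the threshold: the local treatment effect.
    tau = pyro.sample("tau", dist.Normal(0.0, 1.0))
    advanced = (test_score >= THRESHOLD).float()   # deterministic treatment assignment at the cutoff
    mean = a + b * test_score + tau * advanced
    return pyro.sample("outcome", dist.Normal(mean, 1.0))
```

Conditioning this program on observed (score, outcome) pairs and inspecting the posterior over tau gives the effect right at the discontinuity; the smoothness and the existence of the cutoff are exactly the parametric assumptions a graph alone cannot express.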

300

:

So we use a different way of thinking about causal inference, um which I call, in my

thesis, uh Bayesian structural causal inference.

301

:

So the idea, actually I

302

:

Is it okay if I share some figures?

303

:

Yeah, yeah, for sure.

304

:

That's gonna be great. Yeah, and actually, that's where, so people listening, you

might want to switch to the YouTube channel and you'll have the chapters in the

305

:

description so you can jump right into that moment in the episode.

306

:

Okay, so this is a uh kind of a cartoonish view of

307

:

what a structural causal model does in what we call the factual world.

308

:

So you can think of this space on the left as the sort of space of all the causal models

in the world that you're considering.

309

:

And the space on the right is um the data that you would observe.

310

:

So here you can see it's, again, kind of cartoonish, but you can imagine a single

structural causal model inducing a distribution over data.

311

:

So being a

312

:

Being good Bayesians, we can put probability distributions on these things.

313

:

And again, quite cartoonish.

314

:

But now, if you imagine a distribution over structural causal models, that induces a much

broader distribution over the data you would see.

315

:

Oh, yeah.

316

:

That makes sense.

317

:

Great.

318

:

So then, hopefully this is all very intuitive.

319

:

If you start with a distribution over structural causal models, and then you observe

factual data,

320

:

you can propagate, ah you can condition on that data and propagate that into posterior

distribution over these structural causal models.

321

:

And so far this is nothing but ordinary Bayesian inference.

322

:

There's really nothing new here.

323

:

um But interesting things happen when you start to add these intervention transformations.

324

:

So you can think of an intervention as a kind of function between models.

325

:

It takes a structural causal model,

326

:

and turn that into a different structural causal model, which, as you can see here, is

kind of at a different point in structural causal model space.

327

:

So the original structural causal model induces a distribution over data.

328

:

And the intervened structural causal model induces a different distribution over data.

329

:

And uh you can construct what we call a causal query, which is some function of your

distribution over data and counterfactual data.

330

:

So this might be like,

331

:

uh the average difference between outcomes under intervention and under your original

world.

332

:

So again, the story is similar.

333

:

If we induce a distribution over structural causal models, that propagates to a

distribution over intervened structural causal models, which then propagate to

334

:

distributions over factual data and counterfactual data, and then a distribution over the

causal query.

335

:

um

336

:

And it turns out that if you've uh set up these interventions in a certain way,

essentially you just need to preserve measurability, um then the whole thing can just be

337

:

treated as one big probabilistic inference problem.

338

:

So now rather than having just a joint distribution over models and data, you have a joint

distribution over models, data, intervened models, counterfactual data, and counterfactual

339

:

queries.

340

:

that's just a joint distribution.

341

:

and the machinery of probabilistic modeling and inference tells us how we can manipulate

these joint distributions and get posterior distributions over all the things we care

342

:

about.

343

:

so your factual data propagates to the posterior over structural causal models, which then

gives you a push forward distribution onto intervened structural causal models, which then

344

:

pushes forward into a distribution over counterfactual data.

345

:

And then you get a sort of posterior distribution over answers to your causal query.
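Compressed into generic notation (a summary of the picture just described, not formulas quoted from the thesis), with theta standing for a structural causal model, D for the factual data, do(x) for an intervention, and Q for a causal query comparing factual and counterfactual outcomes:

```latex
p(\theta \mid D) \;\propto\; p(D \mid \theta)\, p(\theta)
  \quad\text{(ordinary Bayesian conditioning on factual data)}

p\big(y^{\mathrm{cf}} \mid D\big) \;=\; \int p_{\mathrm{do}(x)}\big(y^{\mathrm{cf}} \mid \theta\big)\, p(\theta \mid D)\, d\theta
  \quad\text{(push the posterior through the intervened model)}

\mathbb{E}\big[Q \mid D\big] \;=\; \int q\big(y, y^{\mathrm{cf}}\big)\, p\big(y, y^{\mathrm{cf}} \mid D\big)\, dy\, dy^{\mathrm{cf}}
  \quad\text{(posterior over the causal query)}
```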

346

:

Yeah, this is great.

347

:

Yeah.

348

:

this is this idea of just like taking causal models and turning them into probabilistic

models um is not unique to um ChiRho or my thesis.

349

:

um But I think in ChiRho, we really codified this.

350

:

and made it very explicit.

351

:

um So one way of thinking about what ChiRho is doing is it's automating this edge here,

intervening from a structural causal model to a different structural causal model.

352

:

And it's automating, constructing all these different parallel worlds that construct this

big joint distribution over different worlds.

353

:

But then once you have that, like I said, it's just probability.

354

:

Once you get to this space,

355

:

And then you can use all of Pyro's existing inference tools to approximate inference

however you'd like.

356

:

Yeah, this is amazing.

357

:

Thanks a lot for sharing that.

358

:

Yeah, no problem.

359

:

Makes me want to try this right now.

360

:

I need a use case.

361

:

Spin up a record.

362

:

Let's go.

363

:

Yeah.

364

:

Man, yeah.

365

:

I'm going to use that very soon.

366

:

I have some.

367

:

some ideas of projects I can apply that to.

368

:

that sounds, yeah, I think it's super interesting.

369

:

And I do think, you know, it's filling the need we have right now with generative AI,

well, in making sure we can harness this power in a more structured way.

370

:

So it's not gonna be helpful for every problem, but I'm guessing there are a lot of...

371

:

problems for which it is going to be extremely helpful.

372

:

Yeah.

373

:

Yeah.

374

:

And that's great.

375

:

I realized I kind of skipped your actual question.

376

:

You asked me about ChiRho's design and I talked about what ChiRho is.

377

:

Yeah, that's fine.

378

:

I was gonna ask you that again.

379

:

Okay.

380

:

And then maybe we can walk through a do-operator, a

381

:

concrete example after that, but yeah, first, if you can come back to the, yeah, the

designing of ChiRho. Yeah, I think you answered the Pyro part, you know why you based it on

382

:

Pyro, but mainly the main philosophy, if you want the Zen of ChiRho, let's say. Yeah.

Yeah, so it's worth saying the caveat, which is that one reason we designed it off of Pyro

383

:

is because

384

:

Eli Bingham, uh who was another core developer of ChiRho, arguably the lead developer uh and

architect, was also a core developer of the Pyro probabilistic programming language.

385

:

So there's a certain practical element of just familiarity.

386

:

That helps.

387

:

It helps, uh At the same time, I think there are some interesting things about Pyro's

design that made it uh easy to build off of.

388

:

um

389

:

The main goal with ChiRho was to make the causal probabilistic programming language uh

modular and extensible.

390

:

So modular meaning that it could be decomposed into kind of atomic parts that can be

combined at will to solve all sorts of different problems.

391

:

So that means users can write models in a modular fashion and combine it with many

different questions.

392

:

They can take

393

:

many different questions and reduce them to probabilistic inference and then have modular

choices about what probabilistic inference algorithms they use.

394

:

um So that combination of freely being able to mix and match models, questions, and

algorithms um gives you this combinatorial explosion of choices as a modeler.

395

:

And they're choices you can enumerate um pretty quickly.

396

:

Right, moving from an average treatment effect to a conditional average treatment effect

is again, just a few lines of code.

397

:

Going from a conditional average treatment effect to mediation analysis is another few

lines of code. uh Modeling non-compliance in your treatment assignment is another single line of

398

:

code.

399

:

You just add a distribution to your intervened treatment assignment.

400

:

And because this is all just code, uh all the kind of composability you would normally get

from software follows in ChiRho programs as well.

401

:

You're not really...

402

:

uh

403

:

constrained to just thinking of causal inference questions ah as a kind of catalog of

different things to choose from.

404

:

It really is a programming experience, um which we were really intentional about.

405

:

uh The other thing we were trying to achieve, which I think we have, is to make ChiRho

extensible.

406

:

um So it was pretty clear when first building ChiRho that we weren't going to build in all

of the capabilities that we wanted from the get-go.

407

:

um

408

:

The literature is too large and complicated.

409

:

Research continues and progresses the field.

410

:

So we wanted to design it in a way that as new research was developed, it could be tightly

integrated into ChiRho without kind of uprooting the whole system.

411

:

um So there are two ways that we've seen that happen already, um which we could nerd out

about for a very long time, but maybe we can delay for later.

412

:

um

413

:

One is extending ChiRho's modeling language to support continuous time dynamical systems.

414

:

um And we have a paper that's currently in review and will be coming out soon kind of

describing all the math of what's going on there.

415

:

Yeah, what do you mean by that?

416

:

It's easy.

417

:

Yeah, what's the problem right now?

418

:

And what would that solve?

419

:

Yeah, so the problem is that um by default,

420

:

Intervention semantics for probabilistic programs don't tell you how to deal with

continuous time.

421

:

um And they don't tell you how to deal with models written as differential equations.

422

:

So that means, does that mean when you have, so for instance, time series, or is it also a

problem if you have continuous um variables?

423

:

do you mean that your outcome and treatment always have to be binary?

424

:

Or can they be continuous?

425

:

Yeah, variables can be continuous in ChiRho.

426

:

That's really no problem.

427

:

um Defining an intervention on a continuous variable is just as easy as defining an

intervention on a binary variable.

428

:

Estimating can be harder, but that's kind of a separate question.

429

:

What I mean is that uh if you have a time series model, like a differential equation, that

um

430

:

describes the behavior of the system at every point in continuous time, um a key challenge

is that when you're simulating the system, you don't actually visit every point in time.

431

:

You would approximate this continuous time system with a collection of finite time points.

432

:

And if you did the naive thing of just applying an intervention to any of those finite

time points, you might miss the time where you want to apply an intervention.

433

:

um

434

:

And sometimes you might want to have an intervention where you don't know in advance when

it's going to occur.

435

:

So for example, if you're modeling a control system that turns on an HVAC controller when

the temperature of some room gets above some threshold, you kind of need to wait until you

436

:

see what the temperature of the room is before you can decide when to intervene.

437

:

um And that kind of representation of an intervention just wasn't covered by the existing

438

:

literature.
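To see why the naive grid-based approach falls short, here is a toy illustration in plain Python (deliberately not ChiRho's dynamical-systems interface): the state-dependent intervention can only fire at pre-chosen grid points, so the true crossing time inside a step is missed unless the solver itself does event detection.

```python
import numpy as np

def simulate(t_end=10.0, dt=0.5, threshold=25.0):
    # Toy room model: temperature rises while heating, falls once the HVAC kicks in.
    temp, hvac_on = 20.0, False
    for _ in np.arange(0.0, t_end, dt):
        if not hvac_on and temp >= threshold:
            hvac_on = True       # the intervention, but only checked at grid points,
                                 # so the exact threshold-crossing time is never hit
        rate = -0.5 if hvac_on else 1.0
        temp += rate * dt        # fixed-step Euler update
    return temp

print(simulate(dt=0.5), simulate(dt=0.01))  # the answer depends on the grid, not just the model
```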

439

:

Okay, okay.

440

:

Yeah.

441

:

Yeah, that sounds hard.

442

:

That definitely sounds harder.

443

:

It's harder.

444

:

But at the same time, one of the really cool things about ChiRho is uh if you define what

it means to intervene in such a system, which is work, I mean, there's a fair amount of

445

:

math to make sure that that works out correctly.

446

:

But once you've done that work, you get all the other things ChiRho provides to you for

free.

447

:

So now all of a sudden, you can ask

448

:

mediation analysis questions of differential equations models.

449

:

You can ask attribution questions.

450

:

You can ask about probability of necessity or probability of sufficiency.

451

:

um And this works because, like I said, ChiRho is modular.

452

:

We have this abstract notion of an intervention.

453

:

We define what that means on any given modeling system.

454

:

And then we have queries that combine these abstract interventions um in a way that's...

455

:

So the query is model-agnostic, even though it's implemented concretely on any given model.

456

:

um So that's very exciting.

457

:

I happen to like those kind of models, but also I just think it's a really interesting

demonstration of how far we might be able to take causal reasoning into different modeling

458

:

families that people don't normally think of as causal.

459

:

sounds super exciting, honestly.

460

:

Can't wait to see that.

461

:

Yeah, I'll hold off on the other thing because I think

462

:

We're going to talk about it later.

463

:

we've also been able to extend ChiRho to support what's called efficient estimation.

464

:

um So now with more or less the click of a button, ChiRho provides efficient estimators for

more or less any probabilistic program and any query you write in ChiRho, which is very

465

:

exciting.

466

:

Yeah, I mean, we can talk about that now.

467

:

That sounds interesting.

468

:

So yeah, maybe define what you mean by.

469

:

efficient estimators, why that's useful and in which cases, like, why would people use

that concretely?

470

:

Yeah, okay.

471

:

So em when solving causal inference problems, you inevitably have to solve uh some

estimation task, right?

472

:

So if I'm interested in the effect of some treatment on some outcome, and there are

confounders, I need to adjust for all the potential confounders.

473

:

I'm observing.

474

:

uh One way of doing that is to build a regression model of the outcome given all the other

things.

475

:

So essentially, uh fit uh some distribution p of y given this huge vector of covariates

and one individual treatment.
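In symbols, that regression-adjustment strategy is the standard back-door formula, written generically here under the assumption that the observed covariates X block all back-door paths from treatment T to outcome Y:

```latex
\mathrm{ATE}
  \;=\; \mathbb{E}\big[Y \mid \mathrm{do}(T{=}1)\big] \;-\; \mathbb{E}\big[Y \mid \mathrm{do}(T{=}0)\big]
  \;=\; \mathbb{E}_{X}\Big[\, \mathbb{E}[Y \mid X,\, T{=}1] \;-\; \mathbb{E}[Y \mid X,\, T{=}0] \,\Big]
```

The inner regressions E[Y | X, T] are the high-dimensional nuisance objects discussed next: the contrast in T is all you care about, yet you have to fit the whole conditional to get it.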

476

:

um And without even going Bayesian, even just from a frequentist perspective, you can show

that

477

:

as the number of covariates gets very, very large, m even though you're only trying to estimate

a single treatment outcome relationship, um your error rate, like the ability to get

478

:

closer and closer to the truth, um doesn't decay that fast.

479

:

Essentially, you have what are called a lot of these nuisance parameters, a lot of

parameters that are necessary to fit the model that don't really matter for the question

480

:

you're trying to answer.

481

:

that you'd like to kind of ignore.

482

:

You can think of it as a kind of like statistical budgeting.

483

:

I've got a finite amount of data.

484

:

I want to allocate the statistical juice I get out of that data towards the questions I

care about and not all these other nuisances that don't really matter.

485

:

em So an efficient estimator is an estimator that behaves as if you were only estimating a

single variable in this regression.

486

:

single parameter.

487

:

Okay.

488

:

Okay.

489

:

So it gets uh it approaches the truth as fast as if it were just a one D regression

problem with some constants thrown in.

490

:

But that's the general idea.

491

:

Okay.

492

:

Okay.

493

:

Okay.

494

:

Yeah.

495

:

So it's gonna make like, ideally, this is something that users wouldn't even really need

to be conscious about.

496

:

right?

497

:

That's something that it would probably do under the hood for them.

498

:

Yeah, so it's much faster to have their analysis run.

499

:

But like, the normal user wouldn't even need to worry about that.

500

:

Yeah, that's right.

501

:

So right now, it's quite computationally intensive.

502

:

So we leave it as a separate module that you can either use or not use.

503

:

um But yeah, that's the general idea, that um systems like ChiRho should automate

the hard math of finding these efficient estimators.

504

:

Yeah.

505

:

So that you can focus on modeling and asking questions.

506

:

And you don't need to have a PhD and a lot of expertise in functional analysis, for

example.

507

:

Yeah.

508

:

Yeah, that makes sense.

509

:

That's also what we try to always optimize for in PyMC, for instance.

510

:

And so it makes sense that you guys are doing the same thing.

511

:

Yeah.

512

:

Do you have anything to add on that or should we try and go through a um concrete

do-operator example for listeners to make sure the idea clicks?

513

:

Maybe I'll add a little bit of detail for the really interested listeners.

514

:

The way that algorithm works is there's a paper led by Raj Agrawal.

515

:

It was a spotlight at NeurIPS last year.

516

:

um

517

:

where we show how to estimate what's called an efficient influence function.

518

:

um And maybe all I'll say is that ah there's been a lot of literature in um the

semi-parametric statistics literature that shows that if you were able to find an

519

:

efficient influence function, you can get these efficient estimators ah for free.

520

:

There are a number of general recipes for using this efficient influence function object

to build efficient estimators.
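For the curious, the best-known of those recipes is the one-step (debiased) correction, written generically below; this is the standard semiparametric form, not necessarily the exact estimator used in the paper:

```latex
\hat{\psi}_{\text{one-step}}
  \;=\; \psi\big(\hat{P}_n\big) \;+\; \frac{1}{n}\sum_{i=1}^{n} \hat{\varphi}\big(O_i;\, \hat{P}_n\big)
```

Here psi is the target functional (an average treatment effect, say), \hat{P}_n is an initial estimate of the distribution and its nuisance components, the O_i are the observations, and \hat{\varphi} is the estimated efficient influence function; the correction term removes the first-order bias of the naive plug-in estimate.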

521

:

um

522

:

The challenge before our paper was that finding an efficient influence function required a

lot of handcrafted math.

523

:

And what our paper shows is how to approximate the efficient influence function using only

quantities that an ordinary probabilistic programming system can give you.

524

:

You need uh derivatives of probability densities.

525

:

You need to approximate the inverse Fisher information matrix.

526

:

um But that's something you can do using an AD system.

527

:

and a probabilistic programming system.

528

:

um So we show in the paper that using this technique, even though it's an approximation,

you get arbitrarily close to um what the efficient estimators would be.

529

:

There's a number of experiments.

530

:

Check out the paper for the cool math.

531

:

um Yeah.

532

:

Yeah.

533

:

And make sure to add the paper in the show notes if that's not done already, because I'm

sure interested listeners will go and check that out.

534

:

um

535

:

Cool.

536

:

Do you want to go through a do-operator example, then?

537

:

Sure.

538

:

Yeah, sounds great.

539

:

Fantastic.

540

:

Do you want to share your screen again, or how do you want to do that? I don't have a visual

explanation.

541

:

I was hoping we could just talk about it.

542

:

Yeah, that works.

543

:

I was just curious how you wanted to do that.

544

:

Well, um Yeah.

545

:

OK, so uh maybe this isn't what you were looking for, but I think you can.

546

:

You can think of a do operator as modeling uh a kind of thought experiment where you force

something to be true or not true or a particular value.

547

:

So you could imagine a model relating um medication that patients take um and their health

outcomes and some variables describing their behavior.

548

:

uh

549

:

And in your original model, em all of these things are connected.

550

:

The way people behave in the world, their diet, their exercise, may causally influence the

medications they take em and may also causally influence their health outcomes.

551

:

This is what we would call a confounding variable, as we said before.

552

:

And if you want to ask the question, does this medication influence cardiovascular

disease, for example,

553

:

You can model that by uh instantiating a kind of thought experiment where you've forced

some people either to take the medication or not take the medication, regardless of what

554

:

their ordinary data generating mechanism would be.

555

:

And that act of forcing someone to do something is modeled in Cairo using this do

operator.

556

:

So the way you would do that mechanically is you write a probabilistic program describing

the way people,

557

:

the way you think people choose to uh take medication or not.

558

:

And again, this can be kind of a statistical model or machine learning model with many

free parameters you learn from data.

559

:

That's all fine.

560

:

And then in asking your question, what is the effect of medication on cardiovascular

disease, you would instantiate two counterfactual worlds and intervene to set one world

561

:

where the

562

:

Patients are forced not to take the medication.

563

:

One world where the patients are forced to take the medication and then compute their

difference in outcomes.

564

:

To see this concretely in code, maybe I'll share afterwards some specific pointers to ChiRho's

documentation where you can kind of walk through similar examples where the actual syntax

565

:

for the do-operator is being used.
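Until those pointers arrive, here is a minimal sketch of the thought experiment Sam just described, written with Pyro's built-in poutine.do handler for simplicity (ChiRho's own do handler plays the analogous role and adds the counterfactual-world bookkeeping); the model, variable names, and coefficients are invented for illustration.

```python
import torch
import pyro
import pyro.distributions as dist
from pyro import poutine

def model():
    # Behavior (diet, exercise, ...) influences both medication use and the outcome: a confounder.
    behavior = pyro.sample("behavior", dist.Normal(0.0, 1.0))
    medication = pyro.sample("medication", dist.Bernoulli(torch.sigmoid(behavior)))
    risk = 0.8 * behavior - 1.0 * medication
    return pyro.sample("disease", dist.Bernoulli(torch.sigmoid(risk)))

def average_outcome(forced_value, n=5000):
    # Force everyone's medication to forced_value, overriding its usual mechanism,
    # then average the simulated outcomes under that intervention.
    intervened = poutine.do(model, data={"medication": torch.tensor(forced_value)})
    return torch.stack([intervened() for _ in range(n)]).mean()

ate = average_outcome(1.0) - average_outcome(0.0)  # "forced to take it" minus "forced not to"
print(float(ate))
```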

566

:

Yeah, for sure.

567

:

I mean, I've looked into those before the show also for people who've listened to a

568

:

Robert Ness episode here.

569

:

Well, I recommend to go through it.

570

:

And if you have his book also, he goes into a lot of details about the do-operator.

571

:

Yes, it's good that you mentioned Robert.

572

:

With a lot of the ideas I'm talking about, I like to try to be careful with credit

assignment.

573

:

So I'm not claiming that I invented any of the kind of general ideas of sort of causal

probabilistic programming stuff, I think.

574

:

um Robert, me, uh Alex Lew, Eli Bingham, Zenna Tavares, and Jeremy Zucker um all wrote this

kind of opinion article maybe four or five years ago together, kind of outlining this

575

:

general kind of approach to thinking about causality and probabilistic programming.

576

:

um So just quickly wanted to give credit where credit is due, and I know Robert pretty

well, and we've chatted about a lot of these ideas for a long time.

577

:

Yeah, yeah, yeah.

578

:

And I've added so he was on episode 137 of the show and it's in the show notes already

folks.

579

:

So definitely recommend giving that a listen.

580

:

That was his second appearance actually on the show.

581

:

um Maybe a more general question for you about ChiRho, Sam, that I have is what would you say

are the best use cases for it and when would you recommend listeners to give it a go?

582

:

So I think it's

583

:

quite a general purpose tool.

584

:

um If you're um interested in purely statistical forward causal inference workflows, you

can uh represent a lot of common assumptions as probabilistic programs in ChiRho.

585

:

And we have a number of tutorials that show how to do that.

586

:

m For example, we have a synthetic difference in differences tutorial for uh causal

inference in panel data.

587

:

We've got uh some models about um proxy variables where you have a latent confounder and

some measured proxy of that confounder.

588

:

um So a lot of the kind of core kind of statistical econometric social science um models

and assumptions can be pretty easily represented in ChiRho.

589

:

um

590

:

What I'm most excited about are these kind of hybrid models where part of your model is

like physical equations, like I said before, like mechanistic models.

591

:

And part of the model is kind of filled in with machine learning where you may assume a

causal structure, but you don't know much about the particular functions relating

592

:

variables.

593

:

And you just kind of hot swap in a machine learning component in your probabilistic

program um to represent that.

594

:

kind of broad uncertainty.

595

:

So I think either of those two camps would get a lot of value out of ChiRho.

596

:

um It's also worth mentioning that ChiRho provides some interesting capabilities even if

you don't have data and you just want to probe your model.

597

:

um So asking these kinds of mediation analysis questions or attribution questions, even just

from a probabilistic program you write by hand, can be really insightful.

598

:

It can tell you a lot about what the implicit assumptions you're making about your model

are.

599

:

it's nice that it happens to work well with conditioning on data and then doing posterior

inference over parameters.

600

:

Yeah, yeah, exactly.

601

:

I mean, it's also something that you get from Bayesian modeling, right, where

you have to make explicit your priors and the structure of the model; then if you add the

602

:

causal structure on top of that, it also makes your real modeling assumptions even

clearer, and you get to know whether you can actually answer the

603

:

query you are actually curious about, or if you need uh to refine your graph, your causal

graph, or if you need to um refine your data, if you can.

604

:

Yeah, it's actually, it's a great point.

605

:

So I think another audience who might get a lot of value out of looking at ChiRho are data

scientists who are familiar with Bayesian modeling, but maybe less familiar with causal

606

:

inference.

607

:

Cause I happen to think that this view of causal inference as

608

:

writing probabilistic programs and then transforming them um is fairly intuitive for

people who are already used to writing probabilistic programs.

609

:

And to me, it makes the assumptions seem much more clear.

610

:

uh Often the causal inference literature can be very verbose and complicated.

611

:

And I've always found it much more clear to look at a program and to kind of try to tease

apart what assumptions are being made in that program.

612

:

Yeah, completely agree.

613

:

I'm the same.

614

:

I'm also a very...

615

:

practical and visual learner.

616

:

So looking at the code and the programs and playing with it is actually a much better way

for me to learn.

617

:

I also get lost in the terminology.

618

:

actually seeing what the different objects represent is much more efficient for me.

619

:

Yeah, it's also unfortunate that there's so much terminology because the ideas are so

universal.

620

:

these different camps um actually are quite intimately connected.

621

:

I mean, you can translate between them.

622

:

If you know the language, you can translate between them.

623

:

But from someone who's just getting into causal inference and looking at this kind of

literature, it's very confusing, unfortunately.

Yeah, for sure. That's also why I do this podcast: trying to help people get past the terminology and ask, what are we actually talking about here? And how can we actually use it on real models and use cases?

Yeah, I think it's great that you're talking to people and cutting through the jargon and the papers and the software. Like I said, I find talking, or just listening to people talk, very clarifying.

Yeah, exactly. That's also why I keep doing this show: personally, I learn a lot by talking to people like you. And it also makes the papers feel more alive, whereas a paper on its own can be very cold.

Yeah, it's sort of unfortunate that in an academic setting you write papers almost defensively, to prevent reviewers from penalizing you too hard. But that trend, I think, pushes papers to be less accessible for the people who would actually get value out of them. So now that I'm a few years out of my PhD, I'm sort of inclined to just write blog posts. That seems a bit nicer than still writing papers. But yeah, papers are not so obviously the best communication medium to me anymore.

Yeah, I completely agree. I always say that I read papers because I have to, not because I want to. Reading a paper is never fun for me, honestly. Reading blog posts, reading package documentation like you have for ChiRho, reading books, you know, like Robert's books, that is much, much more entertaining to me. Also because there, what you're trying to optimize is not whether you're going to pass the threshold for peer-reviewed publication; it's mainly, well, I want people to enjoy what they're going to read, remember it, and be interested in it. And that's a much better function to optimize than the first one.

Yeah. And ultimately, if I want these tools to be useful to people, you have to have people actually use them, and have them be a pleasure to use and understand, you know.

That's a good point. I will say, when I'm learning a new field or a new subfield, I actually like to read PhD dissertations. I find that when you write a thesis, you have to write all this background and scaffolding, if you do a good job, and that provides a lot of context which is hard to find otherwise.

Yeah, I see what you mean. I mean, a thesis is almost like a book, right? It could be a book most of the time. So yeah, that helps.

Well, actually, I'm curious: talking about ChiRho, what do you have in mind on the roadmap for the coming months? Is there anything you're especially looking forward to adding or making better? Do you need any help on that? The links are in the show notes, so people interested in making some PRs can look into that. Yeah, what's the state of ChiRho?

Yeah, so me personally, I'm doing a little bit less core development work now that I've picked up more of this consulting work. But there are some things I'm excited about. One is extending some of the dynamical systems work I talked about before. The landscape of dynamical systems models is very broad. So far we've got support for ordinary differential equations, but you might also want to model stochastic differential equations, differential algebraic equations, those sorts of things. And that's going to take a lot of development work.
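A ChiRho-independent sketch of the kind of dynamical-systems question being discussed: an ODE model (here SIR) where an intervention changes a rate parameter partway through the trajectory. The parameter values and intervention time are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

def sir(t, y, beta, gamma):
    s, i, r = y
    return [-beta * s * i, beta * s * i - gamma * i, gamma * i]

def simulate(beta, gamma=0.1, t_intervene=None, beta_after=None):
    """Integrate SIR; optionally switch beta at t_intervene (a crude do() on a rate)."""
    y0, t_end = [0.99, 0.01, 0.0], 160.0
    if t_intervene is None:
        sol = solve_ivp(sir, (0, t_end), y0, args=(beta, gamma), dense_output=True)
        return sol.sol(np.linspace(0, t_end, 200))
    first = solve_ivp(sir, (0, t_intervene), y0, args=(beta, gamma))
    second = solve_ivp(sir, (t_intervene, t_end), first.y[:, -1], args=(beta_after, gamma))
    return np.hstack([first.y, second.y])

natural = simulate(beta=0.3)
intervened = simulate(beta=0.3, t_intervene=30.0, beta_after=0.1)
print("peak infected, natural   :", natural[1].max())
print("peak infected, intervened:", intervened[1].max())
```

Extending this idea to stochastic or algebraic dynamics, with interventions and counterfactuals handled by the probabilistic program rather than by hand, is the development work referred to here.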

677

:

um As far as how people can get involved, know, Cairo, we try to maintain a um

678

:

the issues and pull requests as best you can.

679

:

And if you browse the issues and see any tagged help wanted, just jump right in or

discussion, anything like that.

680

:

em It's totally open source.

681

:

We welcome uh pull requests from the community.

682

:

em Just ask that you write tests and engage in conversation.

683

:

Yeah.

684

:

Well, that sounds great.

685

:

So folks, if you want to get in there, ah please do.

686

:

And I'm sure it will be received very well.

687

:

thing that's always useful for a system like Cairo is more examples.

688

:

So we have a number of examples and tutorials, but we're really only scratching the

surface of the kinds of things you can do with Cairo.

689

:

um So if you think of interesting causal machine learning methods or models you want to

implement and share with the world, that's a really good avenue to contribute to open

690

:

source software.

691

:

And I think it teaches a lot.

Yeah, that's also how I learn about a new method: diving into an example, trying to find my use case, and then just trying it and seeing what doesn't work, what I didn't understand yet. That really makes me understand it way better than just passively consuming content. On the roadmap, though, do you know if the other devs have something in particular they want to develop in the coming months, or is it pretty much open right now?

I think ChiRho is pretty open.

For some context, ChiRho is built by a number of researchers at a nonprofit research institute that I used to be a research scientist at, named Basis, which is constantly hiring great people doing very interesting work. ChiRho was developed as a part of Basis. Again, like I said, it's fully open source, and there are a number of other open source technologies that people at Basis are working on. I'd have to check in with some of the other ChiRho developers, but I think the plan right now is to rebuild the foundation of probabilistic programming systems from the ground up. So take this with a grain of salt for now, but my understanding is that ChiRho is fairly stable and that there are larger-scale visions in the research pipeline there.

Yeah, I see. So actually, let's talk a bit about your AI consulting company, because now that's your main gig, as you were saying.

That's right.

Yeah, okay. I'm curious. Of course, I'm very curious about it because this is what I used to do; I've been in the exact same place as you are right now. So, how is it going? What have been your biggest surprises? It's only been three months, you've told me, but I'm also wondering: what is the toughest real-world scientific question a client has thrown at you so far, and how did causal probabilistic programming help with it?

Yeah, yeah. So I can't say that much about what I'm working on, but at a high level, I have a few clients now who are in the science and engineering or health space. One is working on materials for sustainable energy products, and another is working on preemptive health. Both of these settings care about probability and causality and machine learning and learning from data. So I've been able to insert myself into their teams and support them in building models and understanding cause-and-effect relationships.

A lot of the work is translation. Because causal probabilistic programming, or causal inference, is still a fairly niche topic, I'd say the largest challenge for me is not jumping right out the door with a lot of jargon, but trying to meet people where they're at and doing this translation into causal and probabilistic assumptions in my own head. So it's fairly common that I'll sit down and meet with engineers and, in trying to probe causal questions for them, ask them in several different ways. Often the questions I thought they wanted to ask were not exactly the questions they were asking. And actually, systems like ChiRho make it really, really easy to iterate on those workflows in those settings. So I can sit down with an engineer and say, okay, you have this kind of vague causal question about this domain; here's one proposed way of codifying that concretely. We look at the implications together in the span of a few minutes, and then they say, well, no, that's not quite what I wanted, could we think about this other kind of causal question? And then, with just a few lines of code, changing the question and getting answers right there in a live environment, that's turned out to be really powerful as a modeling workflow.

I think the biggest challenge, and this is not specific to the clients I have now, but the biggest challenge I see in getting broader adoption out in the world, is that for some reason people seem to forget that when you have high-dimensional data, correlation still does not equal causation. I don't know why that is. I think there's just this really strong tendency to anthropomorphize machine learning models into doing what they're not doing, especially given all the hype around agentic systems, LLMs, reasoning models, these kinds of things. It's very tempting for everyone to say, well, let's just throw AI at it, without really knowing what that is or what the implications are. In trying to be rigorous and precise, you can find yourself fighting an uphill battle. That said, with my current clients I'm very lucky to have collaborators who are careful, thoughtful, and measured; they maybe don't have as much expertise in the specific things I'm offering, but in general they're critical enough of hype cycles that they're not buying into LLMs as a one-stop shop for all your AI needs.

Yeah, that makes a ton of sense. I definitely resonate with that. And of course, I have a delivery coming in, so I'll step away for five minutes or so and then we'll get back to that. Basically, I'll pick up on that and talk about what we were discussing before recording, about the adoption of causal inference and things like that.

Sounds great.

Yeah, awesome. I'll be back in just a few minutes. See you soon.

That's good. Yeah, actually, I have a similar experience, and it's something I was always very curious about: you often have to justify that you want to take the causal path. I was always surprised by that, because it feels like the burden of proof should be reversed. Especially if you want scientific answers, you want reproducible results. So I'm always thinking, well, why would you not want causal results? That should be the default. But somehow it's not, really. I have different speculations on why that is, but I'm curious what your experience and thoughts are on that.

Yeah, do you want my hot take, or do you want my measured, diplomatic response?

Both, I think both are going to be interesting.

I feel comfortable, so let's go with the hot take.

So I think one of the major reasons people are hesitant to go for causal inference is that there's an academic tradition of being almost curmudgeonly when people make causal claims. Because causal inference is hard, it's very easy to criticize: you need to make a number of assumptions, and the conclusions often depend strongly on the assumptions you make, which is good to know. I mean, it's worth being critical. But unfortunately, I think what this means is that people are often so afraid of being criticized for overstepping in their causal claims that they don't even make an attempt in the first place, and instead use hedging language with associational methods, never actually going out there and saying anything about causality, because they're kind of afraid.

So let me give you an example. Several years ago I worked on a problem called genome-wide association studies, which is about trying to understand the association between single nucleotide polymorphisms, positions in your gene sequence, in your DNA, and various health outcomes. So if you have this single nucleotide polymorphism, you're more likely to get, I don't know, cardiovascular disease before you're 60 years old, something like that. And if you look at a lot of the methods in genome-wide association studies, they look like causal inference methods. You adjust for potential latent confounders; you even do this latent PCA correction step that looks like some causal inference methods that have come out in the last several years on what's called multi-cause confounding. But the methods are called association studies.
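A minimal sketch of the standard adjustment alluded to here: regress the phenotype on each variant while controlling for the top principal components of the genotype matrix, which act as a proxy for latent confounding such as population structure. The simulation and numbers are purely illustrative, not from any study discussed in the episode.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 200
ancestry = rng.normal(size=n)                                   # latent confounder
genotypes = (rng.random((n, p)) < 0.3 + 0.1 * np.tanh(ancestry)[:, None]).astype(float)
phenotype = 0.5 * genotypes[:, 0] + 1.0 * ancestry + rng.normal(size=n)

# Top principal components of the (centered) genotype matrix.
g_centered = genotypes - genotypes.mean(axis=0)
_, _, vt = np.linalg.svd(g_centered, full_matrices=False)
pcs = g_centered @ vt[:5].T

def adjusted_effect(j):
    """OLS coefficient of variant j on the phenotype, adjusting for the PCs."""
    X = np.column_stack([np.ones(n), genotypes[:, j], pcs])
    beta, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
    return beta[1]

print("estimated effect, causal variant 0:", adjusted_effect(0))
print("estimated effect, null variant 1  :", adjusted_effect(1))
```

Structurally this is confounder adjustment, even though, as Sam notes, the field describes the output as an association rather than a causal effect.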

Again, I think that's because it's very risky to say that a particular single nucleotide polymorphism is a cause of a particular phenotype. And it's good to be hesitant, again. The problem with that kind of cautious worldview, I think, is that you have to make decisions, and if you have to make decisions one way or another, implicitly you need to reason about causality. So if you're choosing to bury your head in the sand and say, I don't have strong enough assumptions, I'm unwilling to make any sort of causal claim, what you're implicitly choosing is to just do causal modeling poorly. So I think it's better to be honest about what you're trying to achieve. If you really care about causality, try to model the problem as a causal problem and, to the best of your ability, represent your uncertainty and your assumptions explicitly, be cautious in how you interpret them, but still go forward somehow. I like to say, inaction is action. Time is passing by, right? If you have a choice of making a policy or not making a policy and you refuse to make causal claims, you're choosing not to make a policy.

Yeah, definitely. The fact of not doing anything is already doing something. You're choosing to stay with the current solution, actually.

Yeah, yeah.

And I think this is definitely one of the big reasons. I also see kind of the same phenomenon with Bayesian models generally, I would say, because they are much better at being poked at, right? You know when you have divergences, when the sampling's not going well; you can check the priors, you can check how the posterior is contracting relative to the prior, you can test against different test values, and so on. So it's much easier to poke at a Bayesian model than it is to poke at a classic frequentist model. And that means your assumptions are also much more out in the open, so you have many more things to criticize, much more transparently. So yeah, I can definitely see that being, weirdly, an impediment to putting better models in production: they are more open, but they're also easier to criticize, so it's easier to just delay them.

Yeah. I mean, there are some settings where you can get away with making fewer assumptions, and that's good. But yeah, as you're saying, sometimes you need to make them.

Yeah.

And I mean, when you do causal modeling, you need to, even from a mathematical perspective. The ladder of causal inference, you know, the three levels that Judea Pearl demonstrated, is not just a way of thinking about things and putting things in buckets; it's actually a mathematical derivation of what you can do with the data, model, and assumptions at hand. And if you're not willing to make level-three assumptions, counterfactual assumptions, then you will never be able to make counterfactual claims.

Yeah, that's right. It's not like we're just saying, that's a cool way to divide things into categories; it's actually a theory, you know.

Yeah. You have to make a choice.

So, if you don't mind me digressing a little bit, there is one caveat about the kind of Bayesian approach to causal inference that I'm proposing here that I want people to be aware of.

Yeah, go ahead. Nerdy digressions are basically the subtitle of this podcast.

Great, okay, good. I'm in comfortable territory then.

Zooming out a little bit, this Bayesian approach to causal inference and ChiRho's mathematical underpinnings don't escape this fundamental need for assumptions that you've described, right? There's no free lunch in ChiRho. It doesn't magically mean that you can derive causal conclusions from nothing. ChiRho really should be thought of as a way of writing down your assumptions and then seeing their conclusions.

So the way identifiability shows up in this Bayesian framing is in the posterior itself. The idea is that if a query is not identifiable from the data, then the posterior distribution should never collapse to a single point, or to the truth, regardless of how much data you get. That is, I think, a well-defined mathematical framing of what identifiability is in a Bayesian setting. The tricky thing is that if you want to know whether your query is identifiable or not, just running approximate inference turns out to be a pretty bad method. These non-identified posteriors can get arbitrarily thin; you can think of them as a thin manifold in a higher-dimensional space, and expecting any approximate inference algorithm to slowly traverse that manifold is asking too much. So what will often happen is that an approximate inference algorithm will find a mode of that infinitely thin posterior, get stuck there, and imply that you have a lot more certainty in your answer than you actually do. So it is still worth doing the explicit identifiability analysis beforehand, just to make sure those pathologies aren't misleading you in your causal conclusions.
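A minimal sketch of the point about non-identified queries, assuming Pyro and an illustrative linear-Gaussian model (not ChiRho code): with an unobserved confounder and no further assumptions, the posterior over the causal coefficient b does not concentrate on the truth however much observational data you collect.

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import MCMC, NUTS

def confounded(x_obs, y_obs):
    a = pyro.sample("a", dist.Normal(0.0, 1.0))     # confounder -> treatment
    b = pyro.sample("b", dist.Normal(0.0, 1.0))     # treatment  -> outcome (the causal query)
    c = pyro.sample("c", dist.Normal(0.0, 1.0))     # confounder -> outcome
    sx = pyro.sample("sx", dist.HalfNormal(1.0))
    sy = pyro.sample("sy", dist.HalfNormal(1.0))
    with pyro.plate("data", x_obs.shape[0]):
        u = pyro.sample("u", dist.Normal(0.0, 1.0))  # latent confounder
        pyro.sample("x", dist.Normal(a * u, sx), obs=x_obs)
        pyro.sample("y", dist.Normal(b * x_obs + c * u, sy), obs=y_obs)

# Simulate data whose true causal effect is b = 1.
torch.manual_seed(0)
n = 500
u = torch.randn(n)
x = u + torch.randn(n)
y = x + u + torch.randn(n)

mcmc = MCMC(NUTS(confounded), num_samples=400, warmup_steps=400)
mcmc.run(x, y)
b_post = mcmc.get_samples()["b"]
print("posterior on b:", b_post.mean().item(), "+/-", b_post.std().item())
# The posterior over b stays wide as n grows (b is not identified from (x, y)
# alone), and approximate inference can understate even that width by getting
# stuck in one part of the non-identified region, as described above.
```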

Now, in theory, if probabilistic inference methods continue to get better and are able to traverse these very challenging posterior landscapes, this will be a non-issue. But for now, it is an issue. I wrote some work in my PhD about how to address this. The paper I wrote is kind of incomplete, but people can read it if they're interested; I'll share it in the show notes as one way of thinking about identifiability, about how to find out whether causal questions are identifiable in this kind of Bayesian setting.

886

:

yeah, definitely.

887

:

Do share that.

888

:

It will be very interesting in that too.

889

:

I know the do-while library has a very good user interface to help people see if their

causal queries are identified.

890

:

But this is not the innovation framework.

891

:

So it's going to be, it's going to be very interesting to see.

So, I haven't looked at DoWhy in a while. I know a few of those guys quite well and think highly of their work. My memory from the last time I looked is that it's still very much in the nonparametric graphical models setting. So some of the assumptions that you can represent in a system like ChiRho, and that end up being very practical, are very difficult to reason about there; things like the regression discontinuity design I described earlier.

OK, I need to wrap this up here because I've already taken a lot of your time. So, of course, I had a question about the Gaussian process paper that you authored a few years ago. Maybe we'll go with that one, because I'm very curious. I love GPs, I use them all the time. So when I saw a paper about Gaussian processes and causal inference with structured latent confounders, I needed to ask you about that. Can you tell us what it's about, and how practical it is for people who want to use it in their own models?

Yeah, sure. That paper is several years old at this point. I think it was 2020 when I wrote it, I forget exactly.

Yeah, I think the release date is 2021, but knowing academia, you must have written it in 2020.

Yeah, which was, I guess, right when COVID was starting. It feels like an infinite amount of time ago now.

So that paper was about combining two different ideas. One is using Bayesian nonparametric models, Gaussian processes, to do causal inference; the other is encoding a particular kind of assumption that we call structured latent confounding in the paper. The idea behind structured latent confounding is that there are many settings where you have a kind of hierarchical structure between objects and units in your data. For example, if I have data for different schools and different students that each belong to a school, there's a many-to-one relationship between schools and students, and there may be many different schools for which I gather data on students. That's super useful in causal inference, because you can make assumptions of the form: any latent confounding variables are at the level of the school and not at the level of the student. For example, a school is in a particular geographic region, so all the students in that school are subject to the same influence of that geographic region. If you're willing to make that assumption, you can still estimate causal effects even though you have this latent confounder, because of the structure you've assumed.

One way of thinking about it is: if I were to bundle students belonging to the same school and look at the difference in their outcomes under different treatment conditions, because they're all subject to the same school, that wouldn't be biased. So then, if I want to aggregate a bunch of different schools together into a kind of holistic causal estimate, that intuition of being able to isolate the effect of the confounding variable at the school level still holds as you get to multiple schools.

So the paper is really somewhat mechanical; it's just showing how to combine these two ideas. You have a Gaussian process regression for the outcome model and a Gaussian process latent variable model for the latent confounder. And what we found empirically is that it seemed to do pretty well for effect estimation in these kinds of hierarchical settings.
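A simplified, parametric sketch of that structured-latent-confounding setup in Pyro. The paper itself uses a GP outcome model and a GP latent variable model; this linear version only illustrates the structure, and the group/unit names, priors, and effect names are illustrative assumptions.

```python
import torch
import pyro
import pyro.distributions as dist

def school_model(school_idx, treatment, outcome=None):
    n_schools = int(school_idx.max()) + 1
    tau = pyro.sample("tau", dist.Normal(0.0, 1.0))       # causal effect of interest
    alpha = pyro.sample("alpha", dist.Normal(0.0, 1.0))    # confounder -> treatment
    beta = pyro.sample("beta", dist.Normal(0.0, 1.0))      # confounder -> outcome
    sigma = pyro.sample("sigma", dist.HalfNormal(1.0))

    # The latent confounder lives at the school level only: every student in a
    # school shares the same u, which is what the structural assumption buys you.
    with pyro.plate("schools", n_schools):
        u = pyro.sample("u", dist.Normal(0.0, 1.0))

    with pyro.plate("students", school_idx.shape[0]):
        pyro.sample("t", dist.Normal(alpha * u[school_idx], 1.0), obs=treatment)
        mu = tau * treatment + beta * u[school_idx]
        return pyro.sample("y", dist.Normal(mu, sigma), obs=outcome)
```

Swapping the linear pieces for a GP regression and a GP latent variable model gives the shape of the model in the paper.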

Now, for that specific model, I think there are a lot of things that could be improved; it would take a long time to talk about all of them. But I think the core idea of taking Bayesian nonparametric models and combining them with these kinds of richly structured assumptions is a very powerful one. I've used that general idea in, for example, time series modeling or other kinds of hierarchical or relational models, maybe not specifically with Gaussian processes, maybe with some normalizing flow, a neural network, or something else, and it seems to work very well. So I think the essence of that paper is generally useful, even if the specific model is not as compelling.

There's also, if you really want to nerd out, some technical detail in the supplementary materials about how to reason about what are called individual treatment effects with Gaussian processes. The subtlety there is that when you're sampling from a GP, you don't have access to the structural function directly; you only have access to the marginal distribution of the data, integrating out the structural functions. So you can't do the default thing of passing in an argument and holding the noise fixed. There are some tricks in the paper for how to handle that.

959

:

Okay.

960

:

He's cool.

961

:

Yeah.

962

:

How in

963

:

How did you guys implement that at the time?

964

:

Were you using Pyro again?

965

:

No.

966

:

So um at the time I was a visiting student at um Fakash Mansinkas lab at MIT.

967

:

uh So uh Fakash's group um produced a probabilistic programming system called GEN, which

is written in Julia.

968

:

And that was led by Marco Cusumano-Tanner.

969

:

He's also a really sharp guy, great friend.

970

:

um

971

:

So at the time I was doing all my research work in that probabilistic programming

language.

972

:

has a lot of the same kind of ideas as Pyro.

973

:

mean, a lot of these probabilistic programming systems look very similar with um some

different emphasis on their designs.

974

:

um So there, uh it took me a long time to write the models because I didn't have a system

like Kyro at the time.

975

:

I was manually building out these counterfactual worlds.

976

:

in just a single program by hand.

977

:

Yeah, that's hard.

978

:

Which is hard.

979

:

Yeah.

980

:

Yeah.

981

:

Next time.

982

:

um OK.

983

:

Yeah, this is super cool.

984

:

as usual, the paper is in the show notes, folks, if you're curious.

985

:

um Sam, before asking you the last two questions, I'm also curious, in general, where do

you see the field you're in?

986

:

like everything we've talked about right now, basically intersection of probabilistic

programming and causal inference.

987

:

Where do you see that field going uh in the next few months?

988

:

And also if you have any advice for people interested in joining this field.

Yeah. I mean, it would be a little sad if I said the thing I'm most excited about is not the thing I'm going to be working on, so with that caveat in mind: I'm very excited about this kind of engineering- and science-focused probabilistic programming workflow. The models I'm really enthusiastic about are the ones that combine physical laws with statistical or machine learning forward models, and there are a number of different ways to combine these two modeling paradigms. I'm pretty optimistic that these probabilistic programming tools will lead to actual, real discoveries, will actually make science faster and more generalizable, and will make it easier for people to communicate with each other. I think we should have a bold vision of where we want to bring scientific computing.

And I see these tools as a really fruitful path toward a new era of scientific computing.

As far as how to get involved, my recommendation for anyone, at a high level, is to touch software: get into the implementation, submit pull requests, even just submit issues or write up examples. I think it's really, really instructive to write code, and it's also a great way to get a built-in community. People are much more likely to listen to you and engage with you if you have a conversation attached to a pull request that contributes to their software. From an academic perspective, I'm seeing more and more academic communities being open to these ideas, in science and engineering, causal inference, and machine learning. I think people are excited, so it's a good time to write papers about these ideas. In fact, one way of thinking about ChiRho is that it's kind of a paper factory: write a model in ChiRho, spend a month doing it, write a paper, add to your h-index if that's the game you're in. It's a little tongue in cheek, but I'm also sort of serious.

That makes sense, and I agree with that. That's also a way, if you're starting out, to get really good mentors who are going to have your best interests in mind, because you're also contributing to something they really care about. It's a win-win.

Yeah. And specifically, if people are interested in the work that I've done, feel free to email me. I like to make time to just talk to people about their research interests, even if I get nothing out of it. I might say no, but it doesn't hurt to shoot an email.

Careful though, because, you know, millions of listeners: I've heard former guests can't leave their house now because people are outside all the time asking for selfies and autographs.

I have a very comfortable couch, so that's okay.

Awesome, Sam. Well, that was really great. As you can see from the document I shared with you, I still had a lot of questions, but I need to be respectful of your time, so let's call it a show. And of course, let me ask you the last two questions I ask everybody at the end of the show. First one: if you had unlimited time and resources, which problem would you try to solve?

Yeah, I think that's fairly easy for me. I want to solve our environmental problems. I'm motivated by the technological and policy questions behind climate change, but also beyond climate change. I think things like general pollution, waste disposal, or energy scarcity are really important problems, and I think it's going to take technology and policy. I'm hopeful that computational tools like those I've worked on will help make that happen. In a certain sense, I see one of the core problems as getting people to collaborate. I think that's a really interesting computational problem, and I'm hopeful that probabilistic programming technology could help there. It's not exactly clear what that looks like, but if I were starting a PhD right now, that's what I would work on.

Yeah, love that.

And second question: if you could have dinner with any great scientific mind, dead, alive, or fictional, who would it be?

I'd have to say Isaac Newton, which may be kind of a stereotyped answer, but I think there's something about his discoveries that seemed very transformational, like a paradigm shift. And I'm very curious about what leads to non-incremental progress in scientific thinking. I think sometimes we need to take big steps, and he was one of the most obvious big-step scientific thinkers that I know of.

Yeah, to say the least, for sure. Although I hear he was a weird guy, so maybe he wouldn't be very pleasant to have dinner with.

Yeah, I don't know about that.

Well, I'll be curious when that happens. You let me know, please, I'll be happy to join the dinner.

Yeah, you're invited.

Awesome, Sam. Well, that was really fantastic to have you on the show. Please come back anytime you have something new in the world of probabilistic causal inference. The show notes are going to be full for this one again, folks, so do give them a look if you want to dig deeper. And Sam, thanks again for taking the time and being on the show.

Yeah, thank you very much for having me, and for doing what you do.

This has been another episode of Learning Bayesian Statistics. Be sure to rate, review, and follow the show on your favorite podcatcher, and visit learnbayesstats.com for more resources about today's topics, as well as access to more episodes to help you reach a true Bayesian state of mind. That's learnbayesstats.com. Our theme music is Good Bayesian by Baba Brinkman, featuring MC Lars and Mega Ran. Check out his awesome work at bababrinkman.com. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. You can support the show and unlock exclusive benefits by visiting patreon.com/learnbayesstats. Thank you so much for listening and for your support. You're truly a good Bayesian. Change your predictions after taking information, and if you're picking up less than the basic, let's adjust those expectations. Let me show you how to be a... Let's get them on a solid foundation.
