#134 Bayesian Econometrics, State Space Models & Dynamic Regression, with David Kohns
Causal Inference, AI & Machine Learning • Episode 134 • 10th June 2025 • Learning Bayesian Statistics • Alexandre Andorra
Duration: 01:40:55


Shownotes

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!

Visit our Patreon page to unlock exclusive Bayesian swag ;)

Takeaways:

  • Setting appropriate priors is crucial to avoid overfitting in models.
  • R-squared can be used effectively in Bayesian frameworks for model evaluation.
  • Dynamic regression can incorporate time-varying coefficients to capture changing relationships.
  • Predictively consistent priors enhance model interpretability and performance.
  • Identifiability is a challenge in time series models.
  • State space models provide more structure than Gaussian processes.
  • Priors influence the model's ability to explain variance.
  • Starting with simple models can reveal interesting dynamics.
  • Understanding the relationship between states and variance is key.
  • State-space models allow for dynamic analysis of time series data.
  • AI can enhance the process of prior elicitation in statistical models.

Chapters:

10:09 Understanding State Space Models

14:53 Predictively Consistent Priors

20:02 Dynamic Regression and AR Models

25:08 Inflation Forecasting

50:49 Understanding Time Series Data and Economic Analysis

57:04 Exploring Dynamic Regression Models

01:05:52 The Role of Priors

01:15:36 Future Trends in Probabilistic Programming

01:20:05 Innovations in Bayesian Model Selection

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor,, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan, Francesco Madrisotti, Ivy Huang, Gary Clarke, Robert Flannery, Rasmus Hindström, Stefan, Corey Abshire, Mike Loncaric, David McCormick, Ronald Legere, Sergio Dolia, Michael Cao, Yiğit Aşık and Suyog Chandramouli.

Links from the show:

Transcript

This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.


Speaker:

Today, I'm excited to be joined by David Kohns, a postdoctoral researcher in the Bayesian

workflow group under Professor Aki Vehtari at Aalto University.

2

:

With a background in econometrics and Bayesian time series modeling, David's work focuses

on using state-space models and principled prior elicitation to improve model reliability

3

:

and decision-making.

4

:

In this episode, David demos live how to use the AR-R-

5

:

squared (ARR2) prior, a flexible and predictive prior definition for Bayesian autoregressions.

6

:

We show how to use this prior to write your own Bayesian time series models: ARMA,

autoregressive distributed lag (ADL), and vector autoregressive (VAR) models.

7

:

David also talks about the different ways one can generate samples from the prior to mimic

the different expected time series behaviors and look into what the prior

8

:

implies on many other spaces than the natural parameter space of the AR coefficients.

9

:

So you will see this episode is packed with technical advice and recommendations and we

even live demo the code for you so you might wanna tune in on the YouTube channel for this

10

:

episode.

11

:

And if you like this new format, kind of a hybrid between a classic interview and a

modeling webinar, well, let me know.

12

:

and let me know which topics and guests you would like to have for this new format.

13

:

This is Learning Bayesian Statistics, episode 134, recorded April 24, 2025.

14

:

Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the

projects, and the people who make it possible.

15

:

I'm your host, Alex Andorra.

16

:

You can follow me on Twitter at alex-underscore-andorra.

17

:

like the country.

18

:

For any info about the show, learnbayesstats.com is Laplace to be.

19

:

Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on

Patreon, everything is in there.

20

:

That's learnbayesstats.com.

21

:

If you're interested in one-on-one mentorship, online courses, or statistical consulting,

feel free to reach out and book a call at topmate.io slash alex underscore andorra.

22

:

See you around, folks.

23

:

and best Bayesian wishes to you all.

24

:

And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can

help bring them to life.

25

:

Check us out at pymc-labs.com.

26

:

David Kohns, welcome to Learning Bayesian Statistics.

27

:

Thank you very much, pleasure to be here.

28

:

Yeah, that's great.

29

:

I'm delighted to have you on.

30

:

I feel I could do a live show at Aalto University and just interview everybody one after

the other and then have one year of content and then just go to an island and receive Mai

31

:

Tai and just earn a passive income thanks to you guys.

32

:

I'm sure we can organize that Yeah, full disclosure.

33

:

I would not be able to live off of what the podcast earns, that would not work at all. It's

not a good business model.

34

:

Don't do that, people. ah But if you can have fun, then yeah, do it. um Now, it's great to

have you on.

35

:

I'm gonna have Timo — I don't know if that's how you pronounce his name — on the show

in a few weeks also. um

36

:

Osvaldo has been here.

37

:

uh Aki, obviously, and I think Noah should come on the show one of these

days to talk about everything he's doing.

38

:

So now, I need to contact him.

39

:

So anyways, today's you, David, thanks.

40

:

Thank you so much for taking the time. I've been reading your

41

:

work for a few weeks now, because you're doing a lot of very interesting things

about autoregressive models, state-space models, how to choose priors, and so on.

42

:

So that's really cool.

43

:

We're going to talk about that in a few minutes, and you're going to do a few demos live.

44

:

um So if you happen to be in the chat because you're an LBS patron, um please don't be shy

and introduce yourself in the chat, and then you can ask questions to David.

45

:

But before that, David, as usual, let's start with your origin story.

46

:

Can you tell us what you're doing nowadays and how you ended up working on this?

47

:

Yeah, thanks.

48

:

So pretty much my whole educational background is in econ.

49

:

So I did my bachelor and master's and PhD later on also on econ, but um always with a

flavor of econometrics.

50

:

So I was early on interested already in my undergraduate studies about statistical

relationships.

51

:

uh Particularly back then I was more interested in things like the relationship between

debt relief allocation and then later on the country's development.

52

:

So that involved a lot of uh what we call, in econ, panel data methods, which are really uh

spatial type of models.

53

:

uh

54

:

Then during my graduate studies, I was then more interested in time series models.

55

:

I really just loved kind of the simplicity and the mathematics of working through some

discrete time series models.

56

:

And um that is, of course, widely applicable to many things in econ.

57

:

at some point, well, I got then really interested in uh thinking about uh how can you

apply higher dimensional

58

:

time series models to problems where you have maybe lots of data.

59

:

So especially like in finance and macroeconomics, you have a lot of situations where you

have very short time series, but potentially a lot of explanatory factors.

60

:

And so then classical methods tend to be fairly weak in terms of power, but also then in

terms of regularizing the variance sufficiently of the model to get useful predictions.

61

:

So then I really delved into Bayesian econometrics with, I suppose, my first mentor, if

you want to call him that, Gary Koop at Strathclyde University in Scotland.

62

:

So I did my graduate study in Edinburgh, and then he, Gary Koop, was at Strathclyde, and I uh

had the great honor of doing this Bayesian econometrics course with him.

63

:

And it was probably the best six weeks of my academic life at that point.

64

:

I just really loved his stuff.

65

:

uh

66

:

He has a great website, by the way, with a lot of resources if people are interested in Bayesian

time series econometrics, some panel data stuff as well, a lot of multivariate stuff in

67

:

fact, so a lot of like vector auto regressions, but maybe we can talk about that later

too.

68

:

yeah, basically starting with the backgrounds that Gary gave, I delved further and further

into Bayesian time series econometrics.

69

:

And that's where I'm pretty much

70

:

in that tradition still.

71

:

after that, did my PhD also in Scotland.

72

:

And there, my sole focus was then on Bayesian methods for time series modeling, and then

also some modeling in the direction of quantile regression as well.

73

:

Okay, interesting.

74

:

Yeah, I didn't know you were that econ heavy.

75

:

That's interesting.

76

:

That's a bit like, yeah, Jesse Grabowski has the same, a similar background.

77

:

So I want to refer people to episode 124 where Jesse talked about state space models.

78

:

All of that, it's one of his specialties.

79

:

So for background about that, we'll talk about that a bit today again, but for more

background information on that, listeners, you can refer to that as a prerequisite, let's

80

:

say, for this episode with David.

81

:

And yeah, definitely if you have a link to uh Gary Koop's material, feel free to add that

to the show notes, please, because I think it's going to be very uh interesting to people, at

82

:

least to me.

83

:

uh I love time series and vector autoregression stuff and so on.

84

:

And Jesse and I are working a lot to make the PyMC state space module

85

:

better and more useful to people.

86

:

yeah, if we can make all that easier to use, that's going to be super helpful.

87

:

yeah, awesome.

88

:

Feel free to add that to the show note.

89

:

And thanks for this very quick introduction.

90

:

That's perfect.

91

:

That's a great segue to just uh start and dive in, basically, because you have a um case

study for us.

92

:

today and you're going to share your screen.

93

:

uh Maybe we can start with a quick theory of state space models, mainly geared

towards what you're going to share with us today.

94

:

You can take it over, David; feel free to share your screen already or a bit later.

95

:

So perhaps before I go into the state space specifics, maybe I can first comment on like

maybe what we're still working on today.

96

:

And then I think that will

97

:

give you some background, at least, on why we're interested in still thinking about state

spaces.

98

:

because part of the reason why I entered the research realm around, well, not around but at Aalto,

was that uh Aki was working a lot on these kinds of Bayesian workflow problems.

99

:

So how to build models in various circumstances, how to robustly draw inference.

100

:

And one thing that was, I think, direly missing from also the research I was doing at the

beginning of my PhD was how to safely build out these time series models.

101

:

Like, uh how do you set priors on things that you can interpret?

102

:

then that allows you then oftentimes to add more complexity to the model without

sacrificing in predictions or at least statistics that involve predictions.

103

:

And so uh one thing that we're working on at the moment, in a very focused sense, is this

idea of predictively consistent priors, meaning that you start out with some notion of

104

:

understanding about a statistic on the predictive space that might be something like the R

squared statistic.

105

:

This measures the amount of variance fit.

106

:

So this is a statistic between zero and one.

107

:

often it's bounded in that space uh for many models at least, and it measures uh the

variance that the predictor term of your model is fitting.

108

:

So let's say the location component of a normal linear regression over the total variance

of the data.

109

:

So the higher the r2 is, the better.

110

:

So how much variance can I fit as a fraction between 0 and 1?

111

:

And uh that kind of

112

:

idea has been developed also in the Bayesian sense, where Aki and Andrew Gelman have

worked out the kind of methodology behind this Bayesian R squared, so Bayesian

113

:

interpretation, which really is just a posterior predictive of the predictor term or uh

the variance of your model over the uh entire predictive variance, including also the

114

:

error term and so on and so forth.
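(For readers who want the mechanics: here is a minimal NumPy sketch of that Bayesian R-squared idea from Gelman, Goodrich, Gabry & Vehtari (2019), computed once per posterior draw. The draws below are synthetic stand-ins, not the episode's model.)

```python
# Minimal sketch of the Bayesian R^2: variance of the predictor term over
# (variance of the predictor term + residual variance), one value per draw.
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior draws: S draws of the linear predictor mu (S x N)
# and of the residual standard deviation sigma (length S).
S, N = 1000, 200
mu_draws = rng.normal(size=(S, N))          # stand-in for X @ beta_s, per draw
sigma_draws = np.abs(rng.normal(1.0, 0.1, size=S))

var_fit = mu_draws.var(axis=1)              # variance explained by the predictor term
var_res = sigma_draws**2                    # residual variance
r2_draws = var_fit / (var_fit + var_res)    # distribution of Bayesian R^2

print(np.percentile(r2_draws, [5, 50, 95]))
```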

115

:

And what we recognize is that this statistic is well understood in many domains.

116

:

So in econ, in the biomedical sphere, in a lot of social sciences, people usually have a

model where they understand this notion of R2, this notion of variance fit.

117

:

So that goes even beyond just the kind of classical normal linear regression case, but

also for

118

:

general GLMs.

119

:

There are certain definitions of R-square that exist and people are able to interpret.

120

:

And what we are doing in our group at the moment a lot, at least I'm working on it a lot

and with Noah also, we're looking into how can you set a prior on the R-squared and from

121

:

that point of view, derive or define the priors of the rest of the model.

122

:

So you start from a notion of understanding of R squared and perhaps some prior about

this.

123

:

And given this, how can you find priors of all the other components in the model?

124

:

Yeah.

125

:

Yeah, yeah.

126

:

I really love that.

127

:

That's very interpretable.

128

:

And that's also really how you would define models most of the time you think about them.

129

:

Because anybody who's worked with a model with an AR component in there knows that, if you have done

130

:

prior predictive checks, these checks become crazy in magnitude if you have just an AR2.

131

:

AR1 is fine with somewhat normal priors, but then if you have an AR2 component — and I

encourage you, if you use Stan or PyMC, go to the Stan or PyMC website, just copy-paste the code

132

:

for an AR model and then just

133

:

uh sample prior predictive samples from there with an AR2, and you'll see that if you use a

Normal(0, 1) on the coefficients, the magnitude of the prior draws just becomes super huge with

134

:

the time steps. And that's the big problem, uh and one of the problems that you are trying

to address with the AR-R-squared prior. And I think that's, yeah, that —

135

:

The way you put it, I really love it, because it's also very interpretable and intuitive.
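(A minimal sketch of the blow-up Alex describes — purely illustrative, with made-up settings: simulate AR(2) series with coefficients drawn from Normal(0, 1) and look at how large the prior predictive trajectories get, since many draws land outside the stationarity region.)

```python
# Prior predictive simulation for an AR(2) with Normal(0, 1) coefficient priors.
import numpy as np

rng = np.random.default_rng(42)
T, n_draws = 200, 500
max_abs = []

for _ in range(n_draws):
    phi1, phi2 = rng.normal(0, 1, size=2)   # prior draw of the AR(2) coefficients
    y = np.zeros(T)
    for t in range(2, T):
        y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + rng.normal()
    max_abs.append(np.max(np.abs(y)))

# The upper quantiles are often astronomically large: the prior predictive explodes.
print(np.median(max_abs), np.percentile(max_abs, 95))
```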

136

:

Yeah.

137

:

And then again, it's predictively consistent.

138

:

you have a notion of R squared.

139

:

And if you generate from your model, so if you don't condition on your data, you just do this

push forward distribution where you sample from your prior, plug it into your model,

140

:

generate predictions, then the prior predictive of R squared will align with your prior

expectations.

141

:

your prior knowledge of uncertainty of r squared, say, the shape of the distribution.

142

:

And that's exactly what we're doing in this line of research for time series models, in

particular, stationary time series models.

143

:

Yeah.

144

:

Yeah, yeah, Yeah.

145

:

So thanks a lot for this background.

146

:

I think that's indeed very, important.

147

:

And so now, do you want to dive a bit more into the state space models in the case study

you have for us today?

148

:

Yeah.

149

:

Let's do it.

150

:

Awesome.

151

:

Let's go.

152

:

So for

153

:

the people watching live, and in the recording, you'll be able to see David's screen. And otherwise, if

you are watching this episode on YouTube, um, well, you'll also see that in the video.

154

:

You'll see David's screen live. Otherwise, if you are listening to the episode, well, for that part

of the episode,

155

:

I encourage you to go on YouTube and check that out as soon as you can, because that's

probably gonna be a bit easier to follow

156

:

Alright, I'll just share my entire screen, think that will be easiest.

157

:

Yes, we are on.

158

:

So this is the dynamic regression case study um that you see on the screen.

159

:

You have that listeners in the show notes of this episode.

160

:

So the link is in there.

161

:

It's on David's website.

162

:

And now David, you can take it away.

163

:

All right.

164

:

So.

165

:

Yeah, I think we covered some of the basics already with the R squared stuff.

166

:

You can define this for AR type regressions and MA and ARMA type models.

167

:

uh There are some special mathematical things you have to take into account for.

168

:

this time series structure implies a conditional variance, which you have to include in

your prior definition.

169

:

uh But here we're looking at something that's even one step further.

170

:

So we go from

171

:

model that has as the target yt, this is a scalar, uh we relate it to a set of covariates,

uh so those are the x's, they are here dimension k times one per time point t, and we have

172

:

this unknown regression vector beta t.

173

:

So so far so good, this is basically the same almost as just your normal linear regression

case, but indexed by time.

174

:

The special thing about this model is that uh the coefficients themselves, the betas, they

evolve according to a latent state process.

175

:

So this is the second uh row in equation one.

176

:

uh This says that the coefficients uh vary across time according to an AR1 process.

177

:

And this allows for the fact that the relationship

178

:

between your covariates and your targets may change over time.
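(Written out, the model being described — reconstructed here from the verbal description; see equation (1) of the case study for the authors' exact notation — is roughly:)

```latex
\begin{align*}
y_t &= x_t^{\top} \beta_t + \varepsilon_t, & \varepsilon_t &\sim \mathcal{N}(0, \sigma^2),\\
\beta_t &= \Phi\, \beta_{t-1} + \eta_t, & \eta_t &\sim \mathcal{N}(0, \Sigma_\beta),
\end{align*}
```

with $\Phi$ the $K \times K$ state transition matrix (assumed diagonal here) and $\Sigma_\beta$ the (also diagonal) state innovation covariance.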

179

:

So like a famous uh example in econ is that the response of uh interest

rates, which the uh central bank might set to do economic policy, might change in response to

180

:

inflation, which is one of the main drivers of policy, over time, because maybe the

targets of this relationship shift, um, or

181

:

there are some extra things happening like COVID, for example, which somehow uh distort

this relationship for a little time.

182

:

And uh this time-varying process of how these coefficients evolve is then regulated, if

you will, by an AR1 process.

183

:

And in fact, those people who know time series, they will notice that since this beta

vector is k times 1, you essentially have a vector autoregression here in the second

184

:

line.

185

:

But what we do in the paper for the simplicity of the math, and also what I do in this

case study, is that I assume that this uh coefficient matrix, which is called the uh state

186

:

transition matrix, phi, which is k by k, is diagonal.

187

:

So it's only non-zero across the diagonal component.

188

:

What this means is that each individual uh

189

:

coefficient is only related to its own past individual coefficient, not other coefficients

also.

190

:

Right, okay, okay, yeah.

191

:

And if it were not the case, would we be in the presence of a VAR model then?

192

:

Vector autoregressive?

193

:

So it's still a VAR model, it's just assuming that um all the other...

194

:

coefficients are unrelated except for maybe this error term here.

195

:

This kind of gives you some further way how to impose non-zero correlation between the

coefficients.

196

:

Right, yeah, yeah, yeah.

197

:

Okay.

198

:

So here, but here we assume that the different...

199

:

So we have k time series here that are modeled at the same time, right?

200

:

Right, so the target is still a scalar, but then the covariates are then k times 1, right?

201

:

And then the process for the covariate coefficients, that is then a vector autoregression.

202

:

Right, yeah.

203

:

So the beta t's are modeled with a vector autoregression, but here we impose the

correlation between the betas, the k betas,

204

:

to be zero when it's not on the diagonal.

205

:

Well, it's implied by the structure of the state transition matrix.

206

:

Yeah.

207

:

So that means the latent process is dependent only on the previous version

208

:

of the covariate, the previous value of the covariate.

209

:

Yeah, coefficients, exactly.

210

:

Yeah, and the k covariates don't interact, basically.

211

:

No, exactly.

212

:

This keeps the math nice and contained.

213

:

yeah.

214

:

And so that means um we have k latent states here, and we have k latent states because we

have

215

:

k covariates.

216

:

Yes.

217

:

Correct.

218

:

So you can expand this in several ways.

219

:

You can also — what we do in the paper as well, the ARR2 paper — is that we allow also for

time varying intercepts.

220

:

So you would have like another, let's say tau coefficient here, and then that could also

have its own AR process.

221

:

This would be then closer to this what you mentioned before, Alex, this structural time

series models, where you have uh multiple state processes modeled simultaneously.

222

:

Right, yeah.

223

:

So I think I mentioned that off the record.

224

:

I'm going to say it again on the record.

225

:

uh Yeah, basically, we're here.

226

:

The idea would be to have um like

227

:

each latent state, so each of the K latent state being modeled with not only an

autoregressive process as we have here, but maybe you have a local linear trend and then

228

:

you add to that an AR process to pick up the noise.

229

:

Because the issue of just having the AR process is that when you are interested in out of

sample predictions is that...

230

:

the out-of-sample predictions of the AR are usually not very interesting uh because they

pick up the noise.

231

:

And so that's not really what you're interested in when you do out-of-sample predictions.

232

:

So here, if you have a structural time-series decomposition, you could be able to

decompose basically the signal and the noise between these different processes.

233

:

And so here, yeah, each of your case states

234

:

would be modeled like that with one structural time series, but you would still have an

emission.

235

:

So like the YTs we see here, like the data usually in these literature are called

emissions.

236

:

um And so your emission would still be 1D, right?

237

:

It would still be a scalar emission.

238

:

That's correct.

239

:

Okay, cool.

240

:

You can extend that as well, so you can make y also multivariate.

241

:

That's a different beast, maybe we can talk about that later.

242

:

Yeah, yeah.

243

:

These beasts start to be very big models where you have covariation everywhere at the...

244

:

What is the...

245

:

I always forget the name of the second equation, so you have the latent state equations

and the emission equation.

246

:

Yeah, like the...

247

:

the process equation and the emission equation.

248

:

Is that the right term?

249

:

Well, every literature has their own definition.

250

:

I've heard that as well, emission.

251

:

I've never used it actually, be honest.

252

:

So in econ, we call the y equation this one, the observation equation, and we call this

the state equation for the betas.

253

:

Right, yeah.

254

:

Yeah, so I've seen emission and observation equation used interchangeably and then the

latent equation.

255

:

yeah, dude, so that people have the nomenclature right and clear.

256

:

Yeah.

257

:

OK, cool.

258

:

So that's all clear, hopefully.

259

:

So let's continue with the case study.

260

:

Right.

261

:

And so one of the big problems here is that this unknown in the state equation, so the

betas are explained by the past betas plus another error term.

262

:

That's k dimensional.

263

:

This has a covariance, which we call big sigma subscript, sorry, big sigma subscript beta.

264

:

And these, eh in this case, I'm also just using a diagonal structure, just

265

:

for simplicity of everything, they determine how wiggly the states are uh because they

inject noise into the state process and the larger these variance terms are, so those are

266

:

the diagonals across the error covariance term of the state, the larger these are, the

more variable the state process is.

267

:

uh

268

:

There's a huge literature on how to set priors for these, because if you let them be

fairly uh wide, then what you'll find is a horribly overfitting state space model,

269

:

because you're essentially fitting all the noise in your data by making the state process

as wiggly as possible.

270

:

Alex, I think you're on mute.

271

:

Right, sorry.

272

:

uh Yeah, yeah, that makes sense.

273

:

So basically, if you have too wide of a prior on the sigma from the latent state equation, um

and I think also I've seen in the literature this matrix, because it's often written also

274

:

in matrix form when you have...

275

:

I hate the names of these matrices because they don't mean anything.

276

:

I think it's like F and Q and H and R.

277

:

It's like, who invented these names?

278

:

It's terrible.

279

:

They tried to make it as inaccessible as possible for newcomers.

280

:

It's completely stupid.

281

:

Anyways, yeah, so you have, like, these matrices on the, on the

location of the normals.

282

:

So F and

283

:

H, usually in the literature they are also called uh the weights of the processes, and on the right,

so the noise — um, well, I think they are also called drifts — uh there are a lot of

284

:

different names for that, so that's why I'm getting that out of the way for people right now.

But basically, here we're talking about the noise of the latent state equation,

285

:

So this is the Sigma Beta in your case study.

286

:

people would probably see that also in the literature as the matrix Q.

287

:

And so what you're saying is that if the priors on this matrix are too big, then basically

your AR process will explain all the noise.

288

:

in your data and your observational noise, so the sigma on the emission equation, the

observation equation exactly, which is the sigma in your case study, which also people

289

:

will see as...

290

:

um

291

:

think R, the matrix R in the literature.

292

:

Everybody knows R stands for noise.
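(For readers who run into that alphabet soup elsewhere, the usual Kalman-filter convention — standard notation, not specific to this episode — is:)

```latex
\begin{align*}
\text{state:} \quad & x_t = F\,x_{t-1} + w_t, & w_t &\sim \mathcal{N}(0, Q),\\
\text{observation:} \quad & y_t = H\,x_t + v_t, & v_t &\sim \mathcal{N}(0, R),
\end{align*}
```

so in this episode's notation, $\Sigma_\beta$ plays the role of $Q$ and the observation noise variance $\sigma^2$ plays the role of $R$.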

293

:

so yeah, then that means this matrix will be really small.

294

:

And if you just take that for granted, you would uh just interpret that as, there is not a

lot of noise in my observational process.

295

:

Yeah, correct.

296

:

So this is the scalar, just to be clear.

297

:

um

298

:

Yeah, exactly.

299

:

you know, typically the, what the previous literature does is it says, let's put an

inverse gamma.

300

:

That's what I call IG here.

301

:

Inverse gamma prior on the state innovation variances.

302

:

the diagonal of the state variance covariance.

303

:

Let's put an inverse gamma on this and uh be fairly uninformative, or

304

:

you know, in quotation marks, something like an inverse gamma(0.1, 0.1).

305

:

And I think the listeners of your podcast will probably immediately know, oh, this is a

bad choice, because you have a very long tail along the positive reals.

306

:

And if your likelihood information is not very strong, that identifies the variation of

the states, then the prior will dominate and you'll end up with

307

:

a huge variance on your states and therefore overfitting.

308

:

And I can recommend this paper in particular.

309

:

I'm hovering over it uh in the case study.

310

:

It's by Sylvia Frühwirth-Schnatter, a great econometrician and statistician, and her colleague

Helga Wagner, um who rewrite the state space process into its non-centered form,

311

:

which allows you to put normal priors on the state standard deviation, which feature then

in the observation equation.

312

:

This might sound a bit esoteric without kind of seeing the math, but um they go into much

more detail as to why setting an inverse gamma prior on the state variances is a bad

313

:

idea.
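(A minimal PyMC sketch of the non-centered idea — my own illustrative local-level example with placeholder data, not Frühwirth-Schnatter and Wagner's code or the episode's Stan model: the state is built from standardized innovations, and the state standard deviation is moved into the observation equation, where it can be given a normal prior centred at zero instead of an inverse gamma on the variance.)

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt

y = np.random.default_rng(0).normal(size=100)  # placeholder data

with pm.Model() as local_level_noncentered:
    z = pm.Normal("z", 0.0, 1.0, shape=len(y))       # standardized state innovations
    state_std = pm.Normal("state_std", 0.0, 1.0)      # may be negative; |state_std| is the sd
    mu0 = pm.Normal("mu0", 0.0, 1.0)
    # Non-centered random-walk state: the sd multiplies the innovations directly,
    # so it enters the observation equation like a regression coefficient.
    state = pm.Deterministic("state", mu0 + state_std * pt.cumsum(z))
    sigma = pm.HalfNormal("sigma", 1.0)                # observation noise
    pm.Normal("y_obs", mu=state, sigma=sigma, observed=y)
    prior_idata = pm.sample_prior_predictive()
```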

314

:

And um we take kind of this idea one step further in that we say, okay, how about we

315

:

uh start, in fact, from an R-squared prior over the entire process.

316

:

So something that explains the variation of this guy.

317

:

So the, the state and covariate contribution over the variance of the entire data, because

this is something that we often can interpret, say like, we know our model explains, let's

318

:

60 % of the variation in our target.

319

:

And then, from a prior on this, what is the implied prior on

320

:

the state variances.

321

:

And uh just to be clear what wide priors will entail: it will entail that the variance of uh

this term, the predictor term in the observation equation, so x times beta, will dwarf the

322

:

variance of the observation noise, which is in this case because it's just a normal model,

uh this variance plus the variance of the observation model.

323

:

Yeah, yeah. So that relates to

324

:

what we just talked about, when that variance becomes too wide.

325

:

Exactly.

326

:

Oftentimes you'll find that those overfitting models will in fact result in a R squared

that is very close to one.

327

:

Basically saying that, you're able to explain all of the variation of data and this is

often highly unrealistic.

328

:

And particularly if you think about this with time series models where, and let's just

briefly go back to the AR.

329

:

so the simple autoregressive type model case, if you add more lags, so more information

about the past, you wouldn't think that you can better predict the future, right?

330

:

Oftentimes, only the first couple of lags or whatever the time series structure is, is

good for prediction.

331

:

And then if you increase the number of lags more and more, you wouldn't think that you're

going to explain more and more of the variance of future data, right?

332

:

So in that sense, uh setting a reasonable prior on the R squared is actually a good thing

also with time series.

333

:

And this is kind of preempting some maybe screams that the audience has, like particularly

those who are more trained in classical time series econometrics, they'll tell you, R

334

:

squared is not a good thing to look at for time series.

335

:

And I agree when the model and data are non-stationary.

336

:

because then the variance goes to infinity and this R-squared metric is not well defined.

337

:

But in the case where you have stationary time series, the variance will be strictly uh

below infinity and therefore this R-squared metric again makes sense to use.

338

:

Yeah.

339

:

Okay.

340

:

But...

341

:

Sounds a bit like, you yeah.

342

:

You could be R-squared hacking with that, basically.

343

:

Yeah, exactly.

344

:

I mean, that's what people are afraid of with this R-squared thing, right?

345

:

Because they understand, from their classical training, that if you just include more and more

covariates, then by definition, R-squared is monotonically increasing with the number of

346

:

covariates you include.

347

:

However, in the probabilistic sense, you also have a uh probability distribution,

posterior probability distribution over your R-squared.

348

:

And here, you can regularize with the prior

349

:

away from this tendency.

350

:

Yeah, that makes sense.

351

:

And um yeah, so if we think along the lines of what the R squared metric looks like, if we

go through the math that we present in the paper, then we get this ugly-looking fraction.

352

:

And this is basically telling you that the R squared is a function

353

:

of, let me zoom in a bit, as a function of the state variances, the state AR coefficients,

phi, and the observation noise.

354

:

And what we've done to arrive here is that we integrated out the data, so the x's and the

y's, but also the state

355

:

Realization itself.

356

:

So you'll recognize that the betas don't appear here, but only the variance of the betas.

357

:

Yeah.

358

:

And the nice thing about this expression really is it's pretty much the total variance of

your predictor term.

359

:

So that's Xt times beta t over the variance of your predictor term plus one.

360

:

And so if we wanted to set a certain prior, let's say a beta prior on this R squared

metric, then we can figure out by change of variables, what is the implied prior on the

361

:

state variances on this kind of total variance term here.

362

:

Yeah.
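(Roughly, the quantity being discussed — a hedged sketch assuming unit-variance covariates and stationary AR(1) states; the paper has the exact expression — looks like:)

```latex
R^2 \;=\; \frac{\tau^2}{\tau^2 + 1},
\qquad
\tau^2 \;=\; \frac{\operatorname{Var}\!\left(x_t^{\top}\beta_t\right)}{\sigma^2}
\;\approx\; \frac{1}{\sigma^2}\sum_{k=1}^{K}\frac{\sigma_{\beta,k}^2}{1-\phi_k^2},
```

so a prior on $R^2$ pins down the total state variance relative to the observation noise, through a change of variables.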

363

:

And that's very cool because then you can just basically define that prior on R squared.

364

:

in your model, right?

365

:

And then, I guess, just use that in the priors for the betas

in the model directly.

366

:

Correct.

367

:

Yeah.

368

:

And so then, I guess —

369

:

I think you give some recommendations in the paper, if I remember correctly: to set the prior

on R squared and then just basically do prior predictive checks to see that

370

:

it makes sense in your case uh and then go from there and then you can fit your model.

371

:

Yeah, exactly.

372

:

In the paper, we'd like to recommend this uh beta one third and three prior.

373

:

parameterized here in terms of the location and scale of the distribution.

374

:

I think in PyMC you also have that coded up, yeah.

375

:

The beta proportion.

376

:

Yeah, exactly.

377

:

So that would be familiar in that case uh because it has a lot of the mass towards uh an R

squared below 0.5.

378

:

It has a very gentle slope.

379

:

So if the uh likelihood is pulling you

380

:

in one direction, you're not going to overwhelm, most likely, the likelihood too much with

an aggressive slope on the R-square space.

381

:

You're uh weakly, let's say, regularizing toward ah lower R-squared values, and

therefore less likely to overfit.
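(A tiny NumPy sketch of that push-forward — my own stand-in, not the authors' code: a Beta with mean 1/3 and precision 3 is Beta(alpha = 1, beta = 2), and each R² draw maps to a total signal-to-noise variance tau² = R² / (1 − R²).)

```python
import numpy as np

rng = np.random.default_rng(0)
mu, nu = 1 / 3, 3.0                      # mean / precision parameterization
alpha, beta = mu * nu, (1 - mu) * nu     # equivalent to Beta(1, 2)

r2 = rng.beta(alpha, beta, size=10_000)  # prior draws of R^2
tau2 = r2 / (1 - r2)                     # implied predictor variance relative to sigma^2

print(np.percentile(r2, [5, 50, 95]))
print(np.percentile(tau2, [5, 50, 95]))
```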

382

:

Yeah, yeah, No, that's very cool.

383

:

And so you're going to show a bit now the implementation on different.

384

:

uh...

385

:

different data sets. But also, for people using PyMC, Austin Rochford did,

386

:

like, coded up basically that prior in PyMC uh on his blog. I linked to the blog post in the

show notes; uh that's a very, very good blog post, so I definitely

387

:

encourage you to check that out. uh His blog post, though, is uh limited to one part of your

paper; you do more than that in the paper, and you'll get to that

388

:

in a minute, David, but yeah, like that's a good introduction.

389

:

I think in his blog post, Austin mainly, so, he codes up the prior and then generates data

based on three different data generating processes, and then checks

390

:

that we can recover the parameters.

391

:

of the three different processes with the R-Square prior.

392

:

You do that, but you also do more than that.

393

:

that's what we're also going to talk about today.

394

:

And so just briefly walking through the machinery, then we are able to set the uh prior

process, how it would look like in the Stan program.

395

:

So we have then this variance term here in particular, which comes then from this whole

R-squared machinery.

396

:

And if you look very closely at these two equations, so one is this R-squared definition

and the other is the prior variance on the latent spaces, you'll see that here are two

397

:

factors, which if they are included, they allow you to get rid

398

:

of most of this very unwieldy looking stuff in the R-square definition and allows you then

to isolate uh only the variance terms.

399

:

uh But yeah, so there's more about this in the paper.

400

:

I would encourage those who want to dig more into this to have a look at that.

401

:

But the only other important thing I want to mention at this point is that you have

another part of this R-square prior that allows you to decompose uh

402

:

um the variance.

403

:

And this you can think of as determining the importance of the individual model components

and um what they do mechanistically.

404

:

states, right.

405

:

Exactly.

406

:

And what they do mechanistically is that they allocate the variance.

407

:

Right.

408

:

Yeah.

409

:

So basically which part, which states contributes more variance than another state.

410

:

Exactly.

411

:

Because basically, think ultimately you can't really determine the exact value of the

variance that's contributed by each state.

412

:

can't just...

413

:

In absolute, you can only do that in relative.

414

:

The proportion of the variance is coming from that state, but you cannot really say it

from an absolute perspective, I guess.

415

:

As in marginal?

416

:

Yeah, that would be hard.

417

:

You can make statements about the entire variance of all the

418

:

And you can make a statement about relative variance in a way.

419

:

that's what this decomposition lets you do.

420

:

Yeah, exactly.

421

:

Yeah, because I think if you want the absolute decomposition, that's just undetermined.

422

:

So because an infinity of different decomposition of the variance of the different states

will give you the same total variance.

423

:

So yeah, I think just the proportion is going to be identifiable, which is what this is

doing.

424

:

That's why you're putting a Dirichlet prior on this psi term.

425

:

Yeah, Dirichlet makes sense here because you're trying to find weights that allow you to

decompose this variance, and the Dirichlet is just a natural prior for a simplex, really.
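(Mechanically, the decomposition step can be sketched like this — assumed notation, with K and tau² as placeholders: a Dirichlet-distributed simplex splits the total variance across the K states.)

```python
import numpy as np

rng = np.random.default_rng(1)
K = 20
tau2 = 1.0                           # total predictor variance from the R^2 step (placeholder)
psi = rng.dirichlet(np.ones(K))      # variance shares, on the simplex (they sum to 1)
state_var = psi * tau2               # per-state (relative) variance contributions
print(state_var.round(3))
```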

426

:

But yeah, I mean, you mentioned something about identifiability.

427

:

We're not taking hard stances on this.

428

:

It can be a problem, I think, in state spaces more generally.

429

:

Like how can you identify where the variance comes from?

430

:

But in general, think putting a prior on the weights to decompose the variance makes

sense.

431

:

You want the data somehow to inform also that, I think.

432

:

Yes.

433

:

No, for sure, for sure.

434

:

Yeah.

435

:

mean, identifiability in general is hard for time series models.

436

:

It's hard also if you use GPs on time series models.

437

:

It's just that time series data is hard, and you don't have a lot of data in a way.

438

:

You need a lot of covariates.

439

:

If you can have external covariates, that helps a lot.

440

:

um But whether you're using state space models or GPs — I think GPs are even harder because

they are semi-parametric, whereas with state spaces, you have more structure by definition.

441

:

um But yeah, identifiability is always

442

:

a big issue here and the more informative data and priors you can have the better.

443

:

And I'll mention that Arno Solin at Aalto has investigated the link between GPs and state

spaces and there's a very close computational link.

444

:

You can find the posterior of a state space as if it were, sorry, the posterior of a GP as if it were

a state space.

445

:

Right, yeah.

446

:

And you can think of also the state space as being

447

:

somewhat of a discrete approximation to the continuous GP.

448

:

You can also — because you also have, like, let's say, a variance of the GP, in a way, of the

latent function, in this case the state itself — competing with the variance of

449

:

the observation model itself.

450

:

Yes.

451

:

Yeah.

452

:

Yeah.

453

:

And that definitely happens with GPs, right?

454

:

Yeah.

455

:

They can, they can pick up.

456

:

like they are so flexible that they can pick up the noise.

457

:

Also, so you have to be very careful on the priors.

458

:

And so, and then also, and if you add categorical predictors to that, it's very hard

because categorical predictors are not really predictors anyways, you know, it's like, I

459

:

find they don't really add a lot of information.

460

:

They just, you know, break down the model in different subsets.

461

:

ah But you also need, like, if you can have continuous predictors

462

:

informing the different subsets that definitely helps.

463

:

Because otherwise, yeah, the GP can fit anything, so it can definitely fit the noise in

your different subsets anyways.

464

:

But yeah, I'm not surprised that state spaces generalize to GPs.

465

:

It seems to be a law of the universe: everything is a GP in the end.

466

:

I'm pretty sure that's what Black Holes are.

467

:

They are just GPs inside.

468

:

So yeah, let's continue.

469

:

think you can now go to the application part of that, right?

470

:

And you have an inflation forecasting example for us.

471

:

Exactly.

472

:

And there are some other priors also in literature.

473

:

So those who are familiar with econ might recognize this Minnesota prior.

474

:

Those who generally follow also the uh shrinkage literature, they'll know the regularized

horseshoe prior.

475

:

So we're incorporating these here too as a part of comparison.

476

:

Mm hmm.

477

:

Yes.

478

:

And then inflation forecasting.

479

:

So here, some very crude code.

480

:

I'm not particularly um happy about it, but I think it does the job.

481

:

It loads data from the um, from one of the Feds in the US, with this fredr R package.

482

:

And I just have a lot of functions here.

483

:

So let me just skip that.

484

:

And where the data are directly loaded uh from

485

:

the St.

486

:

Louis Fed website.

487

:

I've changed it now locally to first download it and then load it, because I find that,

like, downloading from links is not the safest thing to do.

488

:

So here, in this particular instance where I'm showing this, I'm just first downloading the

data and then loading it.

489

:

And we have a set of 20 covariates.

490

:

So 20 covariates and then therefore also 20 states.

491

:

because those are the regression coefficients which we allow to vary over time.

492

:

OK, yeah.

493

:

Yeah, I think that's interesting.

494

:

Yeah, especially that plot here where you showed the data.

495

:

you have the outcome variable is inflation, right?

496

:

Correct.

497

:

And then you have 20 other time series.

498

:

And each of these time series are a covariate, right?

499

:

Yeah, each of these time series are, so that's just the covariate value of the X's in the

ah state space equation I'm showing above.

500

:

It's different data, it relates to financial market information, you have some sub parts

of inflation in here as well.

501

:

uh Industrial production for those who are in economy and macro, they'll know that

industrial production is super important for explaining ah macro data movements.

502

:

So that's included here.

503

:

So that's like, if we look at just one of them,

504

:

for instance, industrial production that would be like for each year, I think it's monthly

data, right?

505

:

For each month, what is the value of the industrial production, whatever the value, the

scale of it is.

506

:

And so if you're looking at the screen, have like each, these are lines, but basically

these are just points and then each point gives you that value.

507

:

And so for the corresponding value of the industrial production, you have a corresponding

value at that same time point of the inflation, which is the outcome, which is the y of

508

:

the equation, of the observation equation.

509

:

And then the XTs are all the other variables.

510

:

So 20 of them in total, which is k in the equations we saw before.

511

:

here, k equals 20.

512

:

so the matrix we talked about before.

513

:

the phi matrix — that is also called the F matrix in the Kalman-filter literature — uh which is the

transition matrix of the latent state process.

514

:

uh It could be a full matrix, a full-rank matrix, but here it's only a diagonal matrix.

515

:

And so the parameters that follow that state process are called the betas.

516

:

So, and betas are indexed by K and T.

517

:

Is that all correct?

518

:

Can we continue?

519

:

I think that was pretty much, yeah.

520

:

Okay, cool.

521

:

Yeah, so, you know, I think it's always a good point in the workflow to plot your data,

just to know like, oh, is there maybe an outlier somewhere that looks fishy?

522

:

You know, you see, for example, here all the time series usually have, like, this S shape

around COVID.

523

:

So you know that there's a lot of funky things expected around this time.

524

:

Yeah, and I'm guessing you're scaling all the variables, standardizing all the variables

before feeding that to the model so that it's all in the same scale?

525

:

So here I'm following the recommendations of the St.

526

:

Louis Fed.

527

:

They have a set of very good researchers who look at what is the best transformation for

the data, such that they are stationary.

528

:

Or, you know, weakly stationary at least.

529

:

And um generally if you do econ analysis for macro time series data stuff, I would

recommend just follow the recommendations of the statistical agencies.

530

:

In this case, the St. Louis Fed.

531

:

Interesting.

532

:

um They are kind of the data authority on much of the uh US stuff.

533

:

Interesting.

534

:

Yeah.

535

:

Nice.

536

:

So here, what do you do?

537

:

Do you need to, like — what is the recommendation?

538

:

Are you doing any pretreatment or do you just...

539

:

follow their...

540

:

So they have codes.

541

:

They have codes.

542

:

They mean different things.

543

:

Like, let's say one time difference, maybe growth rate calculation, maybe you leave it

entirely unprepared.

544

:

So it depends on the time series.

545

:

I...

546

:

Okay.

547

:

Okay.

548

:

Exactly.

549

:

Interesting.

550

:

Yeah.

551

:

Because these time series have very different scales.

552

:

So...

553

:

Exactly.

554

:

Yeah.

555

:

So yeah, that's definitely something I would be concerned with, especially when you give

that to HMC.

556

:

And so the R-square prior, in fact, can be made robust to the scale of the data by

including the variance of uh your covariate information in the prior itself.

557

:

So you can scale it properly.

558

:

Which is not done here, right?

559

:

I've not seen that in the equation.

560

:

Let me see.

561

:

I think for simplicity I assumed that the covariates all have variance 1, but in the model

it would be then another fraction here divided by the variance of x.

562

:

But more on that also on the paper.

563

:

So that's where we have that more.

564

:

Okay, so now we know what the data are looking like.

565

:

So just to motivate where the time variation in the coefficients comes from, what I've done here is,

month by month, starting from the beginning of the sample, run

566

:

a univariate regression.

567

:

So just our target inflation against, let's say, industrial production.

568

:

save the coefficient value and roll until the end of uh my data availability.

569

:

And what we would find if there's indeed variation in the coefficients is that we also

find variation here.

570

:

And so that means basically you run univariate regression for each time point

individually?

571

:

Exactly.

572

:

Like in a for loop?

573

:

Exactly.

574

:

That's exactly what this guy's doing.

575

:

So the model doesn't know anything about like...

576

:

time correlation.

577

:

Right, it's like you just run 1990 against 1990, then 91 against 91, and so on? Yeah. Okay. I mean,

it's not a statistical guarantee; however, if you would find that those lines are all just

578

:

like, straight, you know, then you're probably not going to find much variation even if you

do all the bells and whistles that we offer, kind of, you know. Yeah, yeah. No, I mean, and

579

:

that's okay, that's a good check, right? Because these models are not —

580

:

The models we're talking about here are not trivial.

581

:

Each time you need to do state spaces or Gaussian processes, it's not trivial.

582

:

So if you don't have time variation in your data, that's better.

583

:

Honestly, your life is going to be easier.

584

:

uh yeah, that's interesting.

585

:

I didn't know about that method.

586

:

like, it's a good heuristic.

587

:

It's just like, you can just take a subset of your data, maybe a random sample, or just

take every

588

:

I don't know, five or six months and then you run a regression in a for loop.

589

:

I mean, not even a for loop, you can just vectorize that with Stan or PyMC.

590

:

But it's just like, it's an independent univariate regression, just plain.

591

:

You could do that in brms or Bambi even.

592

:

Exactly.

593

:

If the regression coefficients come up very, very close to each other, then that probably

means you don't have that much time variation.

594

:

Here it's not the case.

595

:

Here we can see the lines are very, very wiggly.

596

:

Exactly.

597

:

It's also part of the workflow.

598

:

I think you should always start simple.

599

:

Always start with a simple model.

600

:

See if you can find interesting relationships.

601

:

You can even start just by, you know, you do a ggplot and then you just put an lm through the

data points in there and just see, are there any kind of interesting dynamics you could

602

:

pick up?

603

:

And this is doing this essentially 20 times.
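(A minimal sketch of that heuristic on synthetic data — the window length and the drifting slope are made up for illustration: refit a plain univariate OLS on successive windows and eyeball whether the slope moves over time.)

```python
import numpy as np

rng = np.random.default_rng(0)
T = 300
x = rng.normal(size=T)
slope_true = np.linspace(0.2, -0.4, T)            # a slowly drifting relationship
y = slope_true * x + rng.normal(scale=0.5, size=T)

window = 60
slopes = [np.polyfit(x[t:t + window], y[t:t + window], 1)[0]   # slope per window
          for t in range(T - window)]
# If these stay roughly constant, there is little evidence of time-varying coefficients.
print(np.round(slopes[::60], 2))
```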

604

:

Yeah, yeah, yeah.

605

:

For that plot, it would probably be useful if you uh shared the Y scale, you know,

the Y axis, between the plots, because, like, they have different scales, and so probably that

606

:

will also inform which covariates seem to be more variable in time than others.

607

:

Yeah, that's a good point.

608

:

um I think for some you can already see that there's some significant variation, like

the industrial production one.

609

:

I think if you look at the scale of the data before, it was between, I think, 0 and 15.

610

:

The coefficient goes between 0.2 and minus 0.4.

611

:

So there's some variation to be expected.

612

:

Nice.

613

:

now, Stan models.

614

:

Yeah.

615

:

So I've hidden them below.

616

:

So you can look at them, I think, when you have the time.

617

:

There also, we have a repo for the paper as well.

618

:

There, we have written a SnakeMake pipeline.

619

:

So it's maybe a little bit obscure if you haven't gone through SnakeMake pipelines before.

620

:

But the Stan code is also there.

621

:

And here, I reproduced it for the dynamic regression, so the state space model that we're

looking at here.

622

:

And that's at the end of this.

623

:

Yeah, and I've put that link in the show notes, of course.

624

:

All right.

625

:

And so here we're just setting up the models and sampling from them.

626

:

I've coded up one indicator there that indicates whether it's a prior predictive or full

posterior analysis.

627

:

So basically, including the likelihood contribution in the model

block or not.

628

:

And we were talking in the beginning about what happens if you have fairly unrestricted

priors on the variances of your coefficients in your model.

629

:

And this is exactly what's happened, what I'm showing here.

630

:

So the Minnesota and RHS, those are two popular shrinkage priors for time series.

631

:

If you sample from the priors, plug it into the observation equation, generate what would

be the prior predictive

632

:

y's, and then calculate the R-squared statistic, you'll find that these two models say, a

priori we expect to fit 100 % of the data variation.

633

:

Whereas if we uh look at our model, the ARR2, um here we have full control over how this

distribution looks.

634

:

And this is approximately this beta(1/3, 3) distribution.

635

:

There's also a nice way to check that your coding is correct.

636

:

Like if this had an entirely different form, like a tent-like structure around 0.5, um then you

would also know, okay, something's wrong with my code.

637

:

Yeah, Yeah, for sure.

638

:

And that's really why, that's what we were talking about before at the beginning.

639

:

That's what I really like about the way of setting prior that way.

640

:

Also, that's how you can see that setting your priors with other priors than the R-

squared is really weird.

641

:

It's like, before seeing any data, you're expecting, you're telling the model to expect to

be able to explain.

642

:

all the variance in the data with your latent state equation.

643

:

So it's like saying, oh yeah, there is no noise in the data at all.

644

:

It's possible, but I would bet it is very, very improbable.

645

:

Well, they can still model the noise in the data, but it is dwarfed by the...

646

:

variance of your...

647

:

the latent state process.

648

:

Yeah, it's gonna pick up everything, and I don't think it's a good modeling choice.

649

:

I think the R-squared prior here, as the prior distribution, makes much more sense.

650

:

Yeah, yeah.

651

:

And if you know inflation or if you know kind of your inflation data in the US, you also

know that it's a really hard time series to predict.

652

:

So there's a whole literature about how hard it is to predict — variance, sorry, inflation. And we

would expect, in fact,

653

:

a lower R squared, something between 0 and 0.5. Like, if you're a specialist in econ, you would

say, okay, one is not possible.

654

:

All right.

655

:

And so here, um I'm just generating still from the prior.

656

:

So, just to show what the variance of the coefficients looks like for each

of these time series, I just sample from the state process.

657

:

This is how it looks for the ARR2.

658

:

You know, it's not informed by the data, so they all look approximately, you know,

distributed around zero.

659

:

And this is how it looks like for

660

:

the Minnesota and the RHS.

661

:

And you're still plugging in the X information.

662

:

You're informing the prior with the likelihood at this point.

663

:

But you see that the variance of the data also heavily influences the prior predictives

here.

664

:

And so these are the betas, right?

665

:

Yes.

666

:

Yeah.

667

:

Yeah.

668

:

And you can see, definitely, if you see the screen here, people, or if you're following up

with the

669

:

blog post.

670

:

These are weird.

671

:

These are weird prior checks.

672

:

Yeah, you wouldn't expect your coefficient value for a certain time series to range

between minus 50 and 50 if your range was like between 0 and 100, let's say, for your

673

:

target.

674

:

Yeah, and also because that implies that then other time series have zero contribution and

you cannot really control

675

:

which ones have zeros, either.

676

:

So it's quite bad.

677

:

Basically, also the problem with that is that it puts a lot of onus on the data to be very

informative.

678

:

And that might not be the case, especially with scarce data, where all these models have a lot of parameters.

679

:

so that's already a big responsibility for the model.

680

:

Then if you

681

:

put even less prior information in there.

682

:

That means you need to squeeze even more information from the data, where the data already

in time series is not necessarily the most informative.

683

:

So it's like that's piling up on the complexities.

684

:

Yes, correct.

685

:

And good point, by the way.

686

:

I think also for listeners, you can have very fancy priors and everything.

687

:

But if your likelihood is very, very strong, really informative about the value of the parameters, oftentimes it

688

:

doesn't matter so much what you're doing with the prior.

689

:

So it can happen that the data information fully overwhelms the prior.

690

:

As you just said, you have k states, you have t time points.

691

:

That means you have k times t parameters you're estimating.

692

:

That's a lot.

693

:

At least.

694

:

And that's just the betas.

695

:

But then if you start sharing information and so on, you add parameters

696

:

to be able to do that partial pooling and so on.

697

:

And also, each time, that means you subset the posterior space, if you want, in a way.

698

:

And so that means that each part of this subspace is only informed by one layer of the data.

699

:

So it's not like you're taking the full time series and then you're just sharing

everything.

700

:

No, it's like, then you have that time point and that state, and it's just like,

701

:

you might end up just having one data point to inform that parameter in the end.

702

:

if at all.

703

:

Yeah, if at all.

704

:

So priors matter.

705

:

In this case, basically.

706

:

Especially for a time series model.

707

:

it's like, that's basically my point.

708

:

Because also I've discovered that with experience, right?

709

:

And that's why if you don't see any time variation, it's way better.

710

:

Because then, if you ignore time, you basically can pool all

711

:

of your data and aggregate it, and so that increases your sample size basically, and so

that increases the information that you have in the likelihood, and decreases the

712

:

importance of the prior.

713

:

Yeah, yeah, exactly.

714

:

And of course, you know, the nice thing about this R-squared stuff is that you're a priori

saying that those states, they have to fight each other for the same variance.

715

:

Like, we've upper bounded the variance, so they have to fight each other for explaining the data, loosely speaking.

716

:

So if one state is important, meaning significantly away from 0 in some sense, then another state has to give; it has to have less variation.

717

:

And that comes from the Dirichlet prior.

718

:

Yes, correct.
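
Schematically, the mechanism being described looks like the following; the notation is generic (an R2D2-style construction), and the exact parameterization in the paper may differ:

$$
R^2 \sim \operatorname{Beta}(a, b), \qquad
\tau^2 = \frac{R^2}{1 - R^2}\,\sigma^2, \qquad
(\psi_1,\dots,\psi_K) \sim \operatorname{Dirichlet}(\alpha), \qquad
\operatorname{Var}(\text{component } k) = \psi_k\,\tau^2 .
$$

The Beta prior on $R^2$ bounds the total explained variance $\tau^2$ relative to the noise variance $\sigma^2$, and the Dirichlet weights $\psi_k$ split that fixed budget across components, which is exactly why one state can only gain variance when another gives some up.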

719

:

Yeah, and this manifests.

720

:

Now we have the posterior distributions on the R-squared.

721

:

We can see that we have posterior shrinkage, so that's really good.

722

:

They all go in the same direction, but it's just that the Minnesota and horseshoe priors were so biased towards one (the prior probability mass generated was biased towards an R-

723

:

squared of one) that it is super hard for them to get away from it too much, whereas

724

:

the R-squared prior is much more aggressive in saying that the latent states are not picking up too much of the noise.

725

:

Yeah, correct.

726

:

maybe this...

727

:

I don't know if this is a good value of R squared.

728

:

I'm not making a statement about this, but there's a big difference and that's what's

important.

729

:

And we can verify whether this is good or not later with predictions.

730

:

Yes.

731

:

And so basically what the R-squared model here is saying is that the covariates here, the latent states, explain much less of the variation in the data than what you would

732

:

conclude if you were using the Minnesota or RHS priors.

733

:

Absolutely correct.

734

:

Yeah, nothing to add.

735

:

Very good.

736

:

Thanks.

737

:

Awesome.

738

:

Let's go on.

739

:

And you can also think, in time series, about some notion of R-squared over time.

740

:

And this literally takes just the contribution of the states and covariates in terms of the variability per time point and relates it to the total variability per time point.

741

:

And this is like how much of the variance of the data can you explain at each individual

time point?

742

:

And what's...

743

:

What those posterior series are saying here is that the Minnesota and RHS, which tended to have a larger marginal R-squared (total over all time points), also show much more

744

:

variability over time in R-squared.

745

:

Okay, yeah, that's interesting.

746

:

And you have, like, that formula, I guess you implemented it in...

747

:

in R in the package somewhere?

748

:

Yeah, well, I've just coded it myself here.

749

:

I made a function that's up below, and I just call it.

750

:

So this is the extract R2 function.

751

:

it's very easy.

752

:

You really just take a sample from your posterior, those betas.

753

:

you multiply it by the inner product of the...

754

:

Oh, there was a mistake with the transposes, by the way.

755

:

I'll fix that.

756

:

You multiply by the inner product of the covariate vector per time point and relate this to that quantity again plus the observation noise.

757

:

This is a way you can think about R-squared over time.
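
For those who want to reproduce this without the original code, here is one possible reading of that description as a small NumPy function. The name extract_r2 comes from the conversation, but the body is a hedged sketch and the exact definition in the case study may differ:

```python
import numpy as np

def extract_r2(beta_draws, X, sigma_draws):
    """Per-time-point R^2 from posterior draws (a sketch, not the original code).

    beta_draws  : (n_draws, T, K) draws of the time-varying coefficients
    X           : (T, K) covariate matrix
    sigma_draws : (n_draws,) draws of the observation noise standard deviation
    Returns     : (n_draws, T) draws of R^2 at each time point
    """
    # Contribution of states and covariates at time t: the linear predictor x_t' beta_t
    signal = np.einsum("dtk,tk->dt", beta_draws, X)
    signal_var = signal ** 2                       # squared signal per time point
    noise_var = sigma_draws[:, None] ** 2          # observation noise variance per draw
    return signal_var / (signal_var + noise_var)

# Toy usage with made-up draws: 1000 draws, 150 time points, 5 covariates.
rng = np.random.default_rng(0)
r2_t = extract_r2(rng.normal(size=(1000, 150, 5)),
                  rng.normal(size=(150, 5)),
                  np.abs(rng.normal(size=1000)))
print(r2_t.mean(axis=0)[:5])  # posterior-mean R^2 for the first five time points
```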

758

:

Yeah, it's definitely something that's like...

759

:

Yeah, it needs to be there if you're using a package to do that.

760

:

Like, let's say we have that in PyMC state space.

761

:

That's a function we'd like to have basically.

762

:

And, same story: more variation with Minnesota and RHS compared to the R2.

763

:

And then here are now the posteriors of the beta vector over time.

764

:

So we have drawn our MCMC samples, we take the average over the MCMC samples, and then

just look at the time series of the beta uh states.

765

:

And so we see some variation with the R2 that's being picked up.

766

:

There's a lot of variation for all the time series, in a way, on a very similar scale.

767

:

So nothing is fully dominating the variance.

768

:

So these are the betas.

769

:

These are the weights of the latent states.

770

:

Exactly.

771

:

And for those who are following along, TVP in those graphs refers to time varying

parameters.

772

:

In econ, we refer to these state space models, where you have a state for the coefficients of a linear regression.

773

:

We call those time varying parameters for whatever reason.

774

:

I understand the reason, but it's a little bit too general.

775

:

And this is what happens with the Minnesota and RHS, the same picture, basically.

776

:

A lot of the series are getting shrunk to zero, and then a couple of time series have a lot of variation.

777

:

Just to show you how this looks for our test, too.

778

:

Actually, a quick question that's a bit more theoretical, and I don't know if you'll be able to answer it, but what I'm wondering, maybe what I'm a bit confused by here, is: is

779

:

that a state-space model with a discrete or continuous latent space here?

780

:

Discrete.

781

:

Yeah, OK.

782

:

Yeah, discrete.

783

:

But they are not

784

:

mutually exclusive.

785

:

No, I mean, discrete is a subset of the continuous time series.

786

:

Right.

787

:

But it's not, so it's not an HMM.

788

:

It's not a hidden Markov model.

789

:

Is it?

790

:

Well, it depends how you define hidden Markov model in a way.

791

:

So if you say that the hidden or Markovian process here is this discrete state-space transition,

792

:

then it would be. But it's not in the sense of what you sometimes see, where you say, okay, we have five discrete states for the coefficients and we draw inference on the

793

:

location and magnitude of where the states are.

794

:

Right.

795

:

Yeah, for me, an HMM is more like that, where it's like we have discrete states, but

you're switching from one state to the other.

796

:

It's like...

797

:

Let's say you have five states. At some point in the time series, the regime you're at dictates your emissions, which depend on, well, an AR process, for instance, that belongs to

798

:

state one.

799

:

And then at some point, the regime switches to state two and then it switches back to one

or goes to three or five, et cetera.

800

:

That's more like that; that's why I was saying mutually exclusive.

801

:

Whereas here, the states are not mutually exclusive, literally in the sense that the parameters can all be active at the same time.

802

:

Like you can have beta one positive for um industrial production and also beta one

positive or negative for AAA FFME here, which I don't know what that means, right?

803

:

It's not like that: all the states can be active at the same time.

804

:

And then the combination of them gives you the emissions, which in my mind is not really a hidden Markov model, but more like kind of a discretized linear

805

:

Gaussian state space.

806

:

Yes.

807

:

And, well, okay.

808

:

I mean, the hidden Markov model can also be discrete, right?

809

:

But then what's not the case here is that uh let's say you have 10 time points and you

have 10

810

:

beta states, then it cannot be beta 1, beta 2, beta 3, beta 1.

811

:

You're not repeating the same state along the time series.

812

:

Every new time point implies a new state.

813

:

They can be related, but there's no transition matrix which says

814

:

what the probability of going back to beta 1 is after time point 1 has passed.

815

:

Yes.

816

:

Yeah, yeah.

817

:

Yeah.

818

:

Yeah.

819

:

So that's why it's really different in my mind.

820

:

Like, that looks much more like a linear Gaussian state space model to me, whereas the hidden Markov model is more something with categorical, not necessarily

821

:

emissions, but a categorical state,

822

:

at least.
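
To make the contrast explicit for readers, here it is written out schematically, with generic notation rather than the case study's exact model:

$$
\text{TVP / linear Gaussian state space:}\quad
y_t = x_t^\top \beta_t + \varepsilon_t,\quad
\beta_t = \beta_{t-1} + \eta_t,\quad
\varepsilon_t \sim \mathcal{N}(0,\sigma^2),\ \ \eta_t \sim \mathcal{N}(0,\Sigma_\beta),
$$
$$
\text{HMM:}\quad
s_t \in \{1,\dots,S\},\quad
\Pr(s_t = j \mid s_{t-1} = i) = P_{ij},\quad
y_t \sim p(y \mid \theta_{s_t}).
$$

In the first case, every time point carries its own continuous state $\beta_t$ and nothing forces the process to revisit an earlier value; in the second, a transition matrix $P$ governs switching among a small set of regimes that can recur, which is the mutually exclusive behavior discussed above. (The random-walk state equation here is just one common choice.)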

823

:

they're in some way related.

824

:

I think the Hamilton time series book has some nice description of the relationship between these models.

825

:

I read it during beginning of my PhD.

826

:

Don't quiz me on the details, but it's a cool read if you want to learn more about that

stuff too.

827

:

Yeah, I'm sure I'm confusing some people here, but it speaks...

828

:

I'm confused myself on that.

829

:

like, I'm still trying to understand really the difference.

830

:

I know it's a nuanced difference, and that maybe it doesn't really matter.

831

:

But yeah, it's just, like, for me to really understand what the actual difference is.

832

:

I mean, just to recap, we're not drawing inference on a transition matrix which tells you the probability of going between states.

833

:

It's just that you start at a state and you end at a state and what happens in between

834

:

can be fairly unrestricted.

835

:

Yeah.

836

:

It's more like, so here each state, each of the k states, is like one-dimensional.

837

:

So it's like tracking the position of a particle, for instance. That's what each state is doing: we're tracking the position of the inflation particle in the subspace

838

:

of industrial production, for instance.

839

:

Yeah, correct.

840

:

And um yeah, pretty much that's what's going on here.

841

:

Just to recap, a lot more variation in some states than in others compared to the R2, which has a more or less constant variance across all states.

842

:

you might ask, well, which one is better for prediction?

843

:

And it turns out that the R2 is then significantly better in terms of ELPD diff.
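
For listeners unfamiliar with that metric, here is how such a comparison is typically run with ArviZ; the example datasets below are just stand-ins so the snippet runs, whereas in the case study the entries would be the R2-, Minnesota-, and RHS-prior fits with pointwise log-likelihoods stored:

```python
import arviz as az

# Stand-in InferenceData objects (ArviZ example data) so the snippet is runnable.
idatas = {
    "model_a": az.load_arviz_data("centered_eight"),
    "model_b": az.load_arviz_data("non_centered_eight"),
}

# PSIS-LOO expected log predictive density (ELPD) per model; elpd_diff is the
# difference to the best model and dse is the standard error of that difference.
comparison = az.compare(idatas, ic="loo")
print(comparison[["elpd_loo", "elpd_diff", "dse"]])
```

A difference that is large relative to its standard error is what "significantly better in terms of ELPD diff" refers to here.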

844

:

Yeah, which is great.

845

:

I guess you were happy to see that.

846

:

Yeah, exactly.

847

:

Awesome.

848

:

Yeah.

849

:

Maybe one last question related to that.

850

:

So I linked to Austin's blog post.

851

:

Can you tell us basically

852

:

what's the difference between what Austin is implementing in the blog post and what you're doing in the paper? Because Austin is just doing one part.

853

:

That blog post is just implementing one part of what you're doing in the paper.

854

:

So can you make sure it is clear to people what the difference is?

855

:

Yeah, of course.

856

:

Thank you.

857

:

So the main difference is that Austin is looking only at a subset of the time series models that we define this R2 prior over.

858

:

So in the paper, we have AR models, MA models, ARMAs.

859

:

We have uh AR plus X, so independent covariates included with the AR regression.

860

:

And we have some simple state space models.

861

:

And what Austin did was he took a subset of only the AR simulations.

862

:

and looked at the recovery of the true parameter values that he set according to what we do in the paper, with the R2 prior set over the AR coefficients.

863

:

So there, there are no unknown states; it's all just y as the target, and then on the right-hand side of the equation you have lags of your target.

864

:

Yeah, yeah because...

865

:

oh

866

:

then yeah, it's just like the likelihood of Y is an AR.

867

:

that's all.

868

:

The model is an AR and then the likelihood is conditionally normal.
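
In other words, the setting in that blog post is, schematically, a plain autoregression (generic notation):

$$
y_t = \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t,
\qquad \varepsilon_t \sim \mathcal{N}(0, \sigma^2),
$$

with the R2-based prior placed jointly on $(\phi_1,\dots,\phi_p)$, so that what gets the interpretable prior is the implied share of variance explained by the lags rather than each coefficient in isolation.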

869

:

Whereas something that is more practical is what we're talking about at the beginning,

where you would have Y as a normal emission here as you have in the case study, but then

870

:

the states could

871

:

Well, not the state, but the observation equation could depend on each state being a

structurally decomposed time series with an AR process.

872

:

So local linear trend plus AR, and you would use the R-squared prior on the AR coefficient.

873

:

Yes.

874

:

Well, I mean, in the state space models, we don't, actually; the R2 prior is not set on the

875

:

state-space coefficients, but on the state variances.

876

:

Because that is the main determinant for the variability.

877

:

okay.

878

:

How did you call the covariance here in your case study?

879

:

uh The sigmas.

880

:

So in the literature I know about it's the R matrix.

881

:

um So the

882

:

the variance of the state equation.

883

:

And here you call that the sigma.

884

:

Capital sigma underscore beta.

885

:

Sigma betas.

886

:

Yes, exactly.

887

:

Cool.

888

:

Awesome.

889

:

Great.

890

:

So thank you so much, David, for that in-depth case study.

891

:

Damn, that was good.

892

:

And I think that was a first on the show.

893

:

So thank you so much for doing that.

894

:

You, listeners, let me know what you thought about that.

895

:

I really like that kind of hybrid uh format content.

896

:

I think it's really, it's more hands-on, and I think it's very practical.

897

:

That means you guys have to check out the YouTube channel maybe a bit more but oh

898

:

But I'm fine with that.

899

:

So yeah, that was at least super cool to do.

900

:

So thank you so much for that, David.

901

:

I think you can stop sharing your screen now.

902

:

And I've already taken a lot of your time; I still have a lot of questions for you, but I'm going to start wrapping this up, because I know it's getting late for you.

903

:

But maybe what I'm curious about is maybe for...

904

:

you know, your future work.

905

:

uh Like, what do you see as the most exciting trends or advancements in your field?

906

:

And also where, where do you see the future of probabilistic programming heading?

907

:

Of course, you, you're all in on Stan.

908

:

You also work on some Python now, thanks to Osvaldo being there with you, you know, spreading the dark energy of the Python world.

909

:

Thanks, Osvaldo.

910

:

Yeah, so basically, I'm curious to know where your head is at here, where your future

projects are.

911

:

Yeah, I think there's a lot that excites me about our research agenda at Aalto, but also

others.

912

:

What excites me in our group, and the people that we work with more generally, is that we're still very actively thinking about how we can set priors on things that we

913

:

have expert knowledge on.

914

:

Summary statistics, something about the predictive space, and what do these then imply, as a prior, for all of these coefficients that we have in the model, where we typically just

915

:

go ahead and set normal(0, 1) priors, you know.

916

:

uh Those, that is still under active development.

917

:

So we have like the, let's say the simple time series stuff covered to some degree, but

there's so much more to be done in time series, even with multivariate models.

918

:

So there are ways to define this R-squared stuff also for multivariate time series.

919

:

I think that's really cool and has a lot of policy applications as well.

920

:

Because, you know, central banks and so on who do the econ policy for a country, they

often know that, well, everything is related to each other.

921

:

If you're modeling inflation, you're also going to model GDP and so on and so forth.

922

:

And, you know, doing this jointly is really the way to go in the end.

923

:

And these priors, I think, can also be

924

:

very good for those kind of questions.

925

:

No, for sure.

926

:

In the end, everything is a vector autoregressive model.

927

:

I mean, you're preaching to the choir, but I would tend to agree, at least approximately.

928

:

Yeah, yeah.

929

:

mean, yeah.

930

:

Basically, often the limitation

931

:

is the computational bottleneck, right?

932

:

But honestly, almost all the time you would want uh vector autoregressive processes on the

observation equation and on the latent state equations.

933

:

Most of the time you have correlations everywhere and you want to estimate that.

934

:

The problem is that we often don't do that because it's just impossible to fit.

935

:

But ideally, we would be able to do that.

936

:

Yeah, exactly.

937

:

And, you know, there's a lot to be done there still.

938

:

And we're also still looking a lot into workflow, in terms of how, you know, the prior is one thing, but a whole other aspect is model selection.

939

:

So we're also very excited about a project where we're investigating the question of when

is selection necessary if you have different priors.

940

:

to fulfill your goal in terms of prediction, in the first case.

941

:

But even for causal analysis, this is an important question.

942

:

How do you set the priors, and do you need selection, to somehow produce reasonable predictions for the treated versus the non-treated?

943

:

When you have lots of covariates or other structure in your model.

944

:

uh So we're working also on that.

945

:

I think it's going to be, you know,

946

:

fun results are going to come out of that.

947

:

What do you mean by selection here?

948

:

Selection processes, selection bias, or is that different?

949

:

More like a variable selection or like component selection.

950

:

So there's some stuff like projection predictive inference, which does selection based on whether you can find a surrogate model which gets as close as possible to a full model, like a

951

:

Gaussian process that is hard to compute.

952

:

And, you know, statistical folklore tells you that, well, if things get too hard, as in

you have too many components, do selection.

953

:

Because then you implicitly decrease the variance of your predictions, because you're focusing only on a couple of things that you model. And, well, you know, what we're

954

:

kind of saying is, well, that's not necessarily true if you have good priors, and understanding when that statement in fact is true and when it is not so true

955

:

is an interesting

956

:

question. Because, like, let's say in those causal analyses where you have randomized controlled trials, let's say a drug is being administered to one population randomly or

957

:

not, then does it make sense to, let's say, use an R-squared prior, which implicitly will say the treatment effect is correlated with other parameters that you're estimating?

958

:

And is that a good choice?

959

:

you know.

960

:

What we're saying is, like, it depends.

961

:

And we kind of go into detail about when the R-squared priors, and priors like that, are good and when they're bad, and when selection is needed and when not.

962

:

Nice.

963

:

Yeah.

964

:

Yeah.

965

:

Super interesting.

966

:

Let me, let me know when, when you have something out on that.

967

:

I'll be very interested to read about that, and maybe talk to you again about it, because that sounds very important and interesting.

968

:

So, yeah.

969

:

Yeah.

970

:

I'll be very curious about that.

971

:

Maybe one thing about other people's work.

972

:

I was very selfish talking about our work, but I think there's some really cool stuff I'm excited about that comes out of groups around, like, Paul Bürkner and so on, which are

973

:

also picking up work on normalizing flows and amortized Bayesian inference.

974

:

I think that stuff is going to be really good going forward, because you can simplify

computations.

975

:

You can reuse models for huge estimation tasks.

976

:

I think this will make the kind of general

977

:

Bayesian computational workflow much easier in the future.

978

:

So I think using this, maybe integrating it with the knowledge that we're working on also

how to model and then how to do computation, those things, they are interdependent, I

979

:

think, for the future.

980

:

I'll be back to see what comes out of that.

981

:

Yeah, completely agree with that.

982

:

And I'll refer listeners to episode 107 with Marvin Schmitt about amortized Bayesian inference.

983

:

That was super interesting, and I haven't been able to use that in production yet, but I'm really looking forward to being able to do that, and, like, have an excuse and a use case for it,

984

:

because this looks really cool. And yeah, I completely agree with you that it has a lot of potential, for that and everything Marvin and the BayesFlow team and Paul

985

:

Paul Bürkner are doing on that front.

986

:

Anything Paul is doing is just always super brilliant and interesting.

987

:

And what I love is it's very practical.

988

:

It's not research that's like, okay, that's cool,

989

:

I can't even do that because the math is too complicated and it's not implemented

anywhere.

990

:

know, that's always...

991

:

His research, and you guys' research at Aalto, is what I really like.

992

:

It's often...

993

:

It's always geared towards practical application and not just, yeah, that's cool math,

but...

994

:

uh

995

:

Nobody knows how to implement that.

996

:

So that's really cool.

997

:

And well done on that.

998

:

I think it's amazing.

999

:

And talking about normalizing flows, I'll also add something to the show notes:

:

01:27:38,247 --> 01:27:54,686

nutpie, from Adrian Seyboldt. So he was also on the podcast; I will also link to his podcast episode with me, where he came and talked about ZeroSumNormal and nutpie, which is an

:

01:27:54,686 --> 01:28:08,342

implementation of HMC, but in Rust, so it's much faster. And now he did something very cool in nutpie, and you can use that with PyMC and Stan models, you know. But now

:

01:28:08,342 --> 01:28:14,696

you can use normalizing flows to adapt HMC in nutpie.

:

01:28:14,696 --> 01:28:22,161

So basically, what this will do is first run a normalizing flow and train a neural network

with that.

:

01:28:22,161 --> 01:28:32,297

And then once it learns the way to basically turn the posterior space into a standard normal, it will use that to

:

01:28:32,297 --> 01:28:35,709

initialize HMC and run HMC on your model.

:

01:28:35,709 --> 01:28:42,364

uh And so, of course, you don't want to do that on a simple linear regression, right?

:

01:28:42,364 --> 01:28:54,363

It's overkill, because it's going to take at least 10 minutes to fit, because you have to train a neural network first to learn the transformation of the posterior space that

:

01:28:54,363 --> 01:28:56,504

would make it standard normal.

:

01:28:56,504 --> 01:29:00,787

But if you have very complex models with

:

01:29:01,003 --> 01:29:16,424

very complex posterior spaces, things like Neal's funnels, banana shapes, and so on, where it's very hard to find a reparametrization that's efficient, then trying the

:

01:29:16,424 --> 01:29:21,638

normalizing flow adaptation in nutpie could be very interesting to you.

:

01:29:21,638 --> 01:29:29,023

uh And literally, if that works in your case, it can make your MCMC sampling

:

01:29:29,353 --> 01:29:31,445

much faster and also much more efficient.

:

01:29:31,445 --> 01:29:34,508

So that means much bigger effective sample size.
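
A rough sketch of that workflow from the PyMC side is below. The compile-and-sample calls are standard nutpie usage; the keyword that switches on the normalizing-flow adaptation is an assumption about the option name, so check the nutpie documentation for the exact spelling, version requirements, and extra dependencies:

```python
import pymc as pm
import nutpie

# A deliberately awkward geometry (a Neal's-funnel-like model) where plain HMC
# struggles and reparametrization is annoying, i.e. the kind of case described above.
with pm.Model() as model:
    log_scale = pm.Normal("log_scale", 0.0, 3.0)
    x = pm.Normal("x", 0.0, pm.math.exp(log_scale), shape=9)

compiled = nutpie.compile_pymc_model(model)

# Standard nutpie NUTS run (Rust implementation of HMC/NUTS).
trace = nutpie.sample(compiled, draws=1000, tune=1000, chains=4)

# Normalizing-flow adaptation: a flow is trained during warmup to map the posterior
# towards a standard normal, and NUTS then runs in that transformed space.
# NOTE: `transform_adapt=True` is an assumed keyword, not verified here; consult the
# nutpie docs for the actual flag and requirements (e.g. a JAX backend) before using it.
trace_nf = nutpie.sample(compiled, draws=1000, tune=1000, chains=4, transform_adapt=True)
```

As discussed above, this is overkill for simple models, but for funnels and banana-shaped posteriors it can pay for the warmup cost with a much larger effective sample size.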

:

01:29:35,029 --> 01:29:47,103

So I will definitely put that in the show notes, because I think it's something people need to know about. And, well, try it out, and that way Adrian can know this is

:

01:29:47,103 --> 01:29:48,364

working out there in the world.

:

01:29:48,364 --> 01:29:50,105

And I know he loves that.

:

01:29:52,783 --> 01:29:54,804

Awesome. Well, David...

:

01:29:54,804 --> 01:29:57,606

That's cool. Anything you want to add that maybe I didn't...

:

01:29:57,606 --> 01:30:10,431

I didn't ask you or mention before asking the last two questions? I don't know, I think we covered a lot of ground...

:

01:30:10,431 --> 01:30:18,575

I think there's a lot of cool stuff here; it's probably impossible to cover it all. I do want to make an honorable mention to all the work

:

01:30:18,731 --> 01:30:21,573

that goes into prior elicitation.

:

01:30:21,573 --> 01:30:35,963

I know that you're also interested in that, Alex, but there's also work that is coming out

of Helsinki and Aalto, which is looking into how can we go from knowledge about effects of

:

01:30:35,963 --> 01:30:37,824

covariates to priors.

:

01:30:37,905 --> 01:30:47,211

And um we have tools that can work for simple cases very well, but what if you have

correlated effects?

:

01:30:47,403 --> 01:30:56,226

like let's say, I don't know, age and um income predicting, I don't know, school outcomes

or whatever, right?

:

01:30:56,226 --> 01:31:06,528

Those things are often highly correlated and then going from like a conditional

expectation on predictions to um the prior.

:

01:31:06,528 --> 01:31:14,631

So let's say you have this age and this income; how does that relate to education outcomes?

:

01:31:14,631 --> 01:31:15,647

uh

:

01:31:15,647 --> 01:31:19,389

And specifying the prior in that way, I think is super interesting.

:

01:31:19,389 --> 01:31:36,286

And there's a lot of cool stuff also being developed that helps to specify these priors with artificial intelligence, AI trying to go from a very prose-like and conversational way

:

01:31:36,286 --> 01:31:43,059

of talking about what we want to put a prior on, to then actually implementing it in things like Stan and PyMC and so on.

:

01:31:43,059 --> 01:31:44,399

I think that's

:

01:31:44,479 --> 01:31:54,888

a lot of the future that's awaiting people who are maybe not so interested in learning

Stan and details, but still want to do cool Bayesian inference.

:

01:31:54,888 --> 01:32:01,353

And then these kinds of things, I think, will make it accessible to a much wider audience than it is right now.

:

01:32:01,734 --> 01:32:02,514

Yeah.

:

01:32:02,514 --> 01:32:02,995

Yeah.

:

01:32:02,995 --> 01:32:04,396

I mean, definitely.

:

01:32:04,396 --> 01:32:13,015

I mean, even for us, you know, who are like power users of the software, that would make my modeling workflow way faster,

:

01:32:13,015 --> 01:32:26,455

because most of the time that's a much, much more interpretable and intuitive way of defining the priors than trying to understand what the prior on the AR process of

:

01:32:26,455 --> 01:32:30,575

my structural time series model is going to mean.

:

01:32:30,575 --> 01:32:42,255

The only way I can understand what this means right now is just doing a cumbersome iterative process of changing one knob at a time and seeing how that impacts

:

01:32:42,443 --> 01:32:51,125

the prior predictive checks and maybe an interesting metric, like the prior R-squared or something like that.

:

01:32:52,326 --> 01:32:55,927

that's the only thing that's really reliable right now.

:

01:32:55,927 --> 01:32:59,623

And it feels like it can be automated for sure.

:

01:32:59,623 --> 01:33:11,411

Because it's like a lot of cumbersome back and forth, basically; probably something AI-assisted could make it faster.

:

01:33:12,223 --> 01:33:17,868

Yeah, but still it's kind of nice that you still have to get your hands dirty in a way.

:

01:33:17,868 --> 01:33:21,191

that not everything is too automated, because it does let you learn a lot.

:

01:33:21,191 --> 01:33:28,616

But the problem still remains that not everyone has the time, inclination or interest in

getting their hands that dirty.

:

01:33:29,077 --> 01:33:34,642

Yeah, yeah, No, and also like everything has a trade-off, right?

:

01:33:34,642 --> 01:33:41,487

So the time you spend on that is not time you're spending thinking about expanding your

model.

:

01:33:41,545 --> 01:33:43,776

Yeah, making it more expressive and so on.

:

01:33:43,776 --> 01:33:51,139

Yeah, if we can make that easier, that would definitely be amazing and high impact.

:

01:33:52,019 --> 01:33:52,420

Awesome.

:

01:33:52,420 --> 01:33:54,050

So I need to let you go, David.

:

01:33:54,050 --> 01:33:56,121

That's already like one hour and a half we're recording.

:

01:33:56,121 --> 01:33:57,651

So I don't want to take too much of your time.

:

01:33:57,651 --> 01:34:01,683

You'll come back on the show for the future work you have, for sure.

:

01:34:02,044 --> 01:34:07,676

But before you go, let me ask you the last two questions I ask every guest at the end of

the show.

:

01:34:07,676 --> 01:34:10,709

So if you had unlimited time and resources,

:

01:34:10,709 --> 01:34:12,879

Which problem would you try to solve?

:

01:34:14,819 --> 01:34:21,284

This is really a weighty question, and I feel like there have been such good answers in the past. So it's really hard,

:

01:34:21,284 --> 01:34:31,891

I find, to add to any of that. But, you know, let's say that with infinite resources and everything, I've done all the things that we should do for humanity.

:

01:34:31,891 --> 01:34:43,669

All right, so we've been the good guy already, I think. What I would do is I would go back to one of those core econ things that are important, namely:

:

01:34:43,669 --> 01:34:51,845

How do you set policy such that you maximize the utility of a nation or maybe all nations?

:

01:34:53,667 --> 01:35:02,674

You know, one particular question in econ is: how can you achieve the best amount of good, or the most amount of good, for all people?

:

01:35:02,755 --> 01:35:08,489

And this is a really difficult question, because there are always so many trade-offs in policymaking.

:

01:35:08,489 --> 01:35:11,051

You do one thing, you improve the life for one group.

:

01:35:11,051 --> 01:35:12,502

You decrease the,

:

01:35:13,169 --> 01:35:15,260

benefit for another group.

:

01:35:15,260 --> 01:35:32,034

And I think if I had infinite resources, I would try to find the optimal policy rule that

would satisfy the condition of uh best amount of welfare, whatever that definition is, by

:

01:35:32,034 --> 01:35:38,746

the way, I guess that needs to be conditioned on philosophy um across all time periods.

:

01:35:38,746 --> 01:35:42,007

And then basically have a fairly automated rule.

:

01:35:42,007 --> 01:35:53,982

that kind of is running, and whenever any economic actor takes any decision, what would happen is that you would basically have like a steady-state process for

:

01:35:53,982 --> 01:35:59,254

the entire nation's economy without any significant variation.

:

01:35:59,254 --> 01:36:10,439

like policymaking would always be such that we would all have kind of the best economic

life uh possible within the confines of the chosen philosophy and the constraints of

:

01:36:10,439 --> 01:36:11,479

resources.

:

01:36:12,919 --> 01:36:13,799

That's fine.

:

01:36:13,799 --> 01:36:15,919

Yeah, I love that.

:

01:36:16,579 --> 01:36:17,839

Very nerdy answer.

:

01:36:17,839 --> 01:36:19,759

And I really appreciate that.

:

01:36:19,759 --> 01:36:20,879

Thank you.

:

01:36:20,879 --> 01:36:22,019

I appreciate the effort.

:

01:36:22,019 --> 01:36:22,419

I love that.

:

01:36:22,419 --> 01:36:24,679

And I definitely resonate with that.

:

01:36:25,439 --> 01:36:27,739

Although I would argue we're very far from that.

:

01:36:27,739 --> 01:36:30,279

So you would need to do a lot of work.

:

01:36:30,279 --> 01:36:32,619

Good thing you have unlimited time.

:

01:36:34,759 --> 01:36:42,499

And second question: if you could have dinner with any great scientific mind, dead, alive, or fictional, who would it be?

:

01:36:43,647 --> 01:36:46,519

So, so again, that is like too much of a weighty question.

:

01:36:46,519 --> 01:36:47,969

So I'm just going to sidestep that.

:

01:36:47,969 --> 01:36:55,754

I think there are too many cool people I would like to talk to, but I think who is alive and who I would really like to have dinner with is Chris Sims.

:

01:36:55,754 --> 01:36:57,775

He's a Nobel laureate in econ.

:

01:36:57,775 --> 01:37:03,258

He in fact was one of the initial researchers on vector autoregressions, Alex.

:

01:37:03,258 --> 01:37:11,582

So if you're looking into vector autoregression stuff, then Chris Sims is like one of those OG researchers, in a way.

:

01:37:11,807 --> 01:37:12,947

And.

:

01:37:13,073 --> 01:37:21,665

He won the Nobel Prize for related work, more policy-related stuff, but he's done a lot of really interesting time series econometrics.

:

01:37:21,685 --> 01:37:33,379

And I would love to just have a conversation with him over dinner, where we talk about how we can integrate, let's say, the work on R-squared stuff and, you know, safe Bayesian

:

01:37:33,379 --> 01:37:35,639

model building with his time series knowledge.

:

01:37:35,639 --> 01:37:39,550

I think that would be such a cool, such a cool thing to do.

:

01:37:39,631 --> 01:37:41,691

And in fact, he

:

01:37:41,695 --> 01:37:53,498

I think he gave a lecture recently, in the past two or three years, where he was suggesting that people should look at econ problems through multiple lenses.

:

01:37:53,498 --> 01:38:02,781

This goes a little bit into this kind of multiverse idea of statistical modeling, and acknowledging that there's a workflow that you have to work through.

:

01:38:02,781 --> 01:38:08,623

There's not always one solution for every statistical problem in econ, which is kind of the dogma, you know?

:

01:38:08,623 --> 01:38:10,583

Um, I think.

:

01:38:10,687 --> 01:38:13,388

Working with him on that would be such a cool thing to do.

:

01:38:14,029 --> 01:38:15,490

Yeah, definitely.

:

01:38:15,490 --> 01:38:19,282

ah And I've never had a Nobel Prize laureate on the show.

:

01:38:19,282 --> 01:38:23,754

I've had a sir, but I've never had a Nobel Prize laureate.

:

01:38:23,754 --> 01:38:25,295

yeah, if anybody knows...

:

01:38:25,295 --> 01:38:30,037

Chris, right?

:

01:38:30,178 --> 01:38:32,129

Yes, I'm sure.

:

01:38:32,129 --> 01:38:34,299

Then let me know.

:

01:38:34,460 --> 01:38:35,560

Put me in contact.

:

01:38:35,560 --> 01:38:38,862

I'll definitely try and get him on the show for sure.

:

01:38:39,483 --> 01:38:40,003

Amazing.

:

01:38:40,003 --> 01:38:40,783

Well...

:

01:38:41,163 --> 01:38:42,523

David, thank you so much.

:

01:38:42,523 --> 01:38:44,423

um That was awesome.

:

01:38:44,423 --> 01:38:45,424

Really had a blast.

:

01:38:45,424 --> 01:38:51,076

Learned a lot, but I'm not surprised by that.

:

01:38:51,076 --> 01:38:53,837

I had a good prior on that.

:

01:38:53,837 --> 01:38:59,189

yeah, thank you so much for taking the time.

:

01:38:59,189 --> 01:39:06,675

Please let me know, listeners, how you find that new hybrid format.

:

01:39:06,675 --> 01:39:15,021

I really like it so far, so unless you tell me "I really hate it", and most of you tell me that, I think I'll keep going with it whenever I can.

:

01:39:15,742 --> 01:39:32,514

So as usual, I put a lot of things in the show notes for those who want to dig deeper, David: your socials, your work and so on, for people who want to dig deeper.

:

01:39:33,425 --> 01:39:37,006

Thanks again for taking the time and being on this show.

:

01:39:38,513 --> 01:39:39,394

thank you.

:

01:39:43,541 --> 01:39:47,252

This has been another episode of Learning Bayesian Statistics.

:

01:39:47,252 --> 01:39:57,735

Be sure to rate, review, and follow the show on your favorite podcatcher, and visit LearnBayesStats.com for more resources about today's topics, as well as access to more

:

01:39:57,735 --> 01:40:01,816

episodes to help you reach a true Bayesian state of mind.

:

01:40:01,816 --> 01:40:03,777

That's LearnBayesStats.com.

:

01:40:03,777 --> 01:40:08,618

Our theme music is Good Bayesian by Baba Brinkman, feat. MC Lars and Mega Ran.

:

01:40:08,618 --> 01:40:11,779

Check out his awesome work at BabaBrinkman.com.

:

01:40:11,779 --> 01:40:12,969

I'm your host,

:

01:40:12,969 --> 01:40:14,040

Alexandre Andorra.

:

01:40:14,040 --> 01:40:18,199

You can follow me on Twitter at alex_andorra, like the country.

:

01:40:18,199 --> 01:40:25,448

You can support the show and unlock exclusive benefits by visiting Patreon.com slash LearnBayesStats.

:

01:40:25,448 --> 01:40:27,830

Thank you so much for listening and for your support.

:

01:40:27,830 --> 01:40:30,111

You're truly a good Bayesian.

:

01:40:30,111 --> 01:40:40,535

oh

:

01:40:40,535 --> 01:40:53,434

Let me show you how to be a good Bayesian / Change calculations after taking fresh data in / Those predictions that your brain is making / Let's get them on a solid foundation
