Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!
Visit our Patreon page to unlock exclusive Bayesian swag ;)
Takeaways:
Chapters:
10:09 Understanding State Space Models
14:53 Predictively Consistent Priors
20:02 Dynamic Regression and AR Models
25:08 Inflation Forecasting
50:49 Understanding Time Series Data and Economic Analysis
57:04 Exploring Dynamic Regression Models
01:05:52 The Role of Priors
01:15:36 Future Trends in Probabilistic Programming
01:20:05 Innovations in Bayesian Model Selection
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor,, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan, Francesco Madrisotti, Ivy Huang, Gary Clarke, Robert Flannery, Rasmus Hindström, Stefan, Corey Abshire, Mike Loncaric, David McCormick, Ronald Legere, Sergio Dolia, Michael Cao, Yiğit Aşık and Suyog Chandramouli.
Links from the show:
Transcript
This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.
Today, I'm excited to be joined by David Kohns, a postdoctoral researcher in the Bayesian
workflow group under Professor Aki Vehtari at Aalto University.
2
:With a background in econometrics and Bayesian time series modeling, David's work focuses
on using state-space models and principled prior elicitation to improve model reliability
3
:and decision-making.
4
:In this episode, David demos live how to use the ARR2
5
:prior, a flexible and predictive prior definition for Bayesian autoregressions.
6
:We show how to use this prior to write your own Bayesian time series models: ARMA,
autoregressive distributed lag (ARDL), and vector autoregressive (VAR) models.
7
:David also talks about the different ways one can generate samples from the prior to mimic
the different expected time series behaviors and look into what the prior
8
:implies on many other spaces than the natural parameter space of the AR coefficients.
9
:So you will see this episode is packed with technical advice and recommendations and we
even live demo the code for you so you might wanna tune in on the YouTube channel for this
10
:episode.
11
:And if you like this new format, kind of a hybrid between a classic interview and a
modeling webinar, well, let me know.
12
:and let me know which topics and guests you would like to have for this new format.
13
:This is Learning Bayesian Statistics, episode 134, recorded April 24, 2025.
14
:Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the
projects, and the people who make it possible.
15
:I'm your host, Alex Andorra.
16
:You can follow me on Twitter at @alex_andorra,
17
:like the country.
18
:For any info about the show, learnbayesstats.com is Laplace to be.
19
:Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on
Patreon, everything is in there.
20
:That's learnbayesstats.com.
21
:If you're interested in one-on-one mentorship, online courses, or statistical consulting,
feel free to reach out and book a call at topmate.io/alex_andorra.
22
:See you around, folks.
23
:and best Bayesian wishes to you all.
24
:And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can
help bring them to life.
25
:Check us out at pymc-labs.com.
26
:David Kohns, welcome to Learning Bayesian Statistics.
27
:Thank you very much, pleasure to be here.
28
:Yeah, that's great.
29
:I'm delighted to have you on.
30
:I feel I could do a live show at Aalto University and just interview everybody one after
the other and then have one year of content and then just go to an island and receive Mai
31
:Tai and just earn a passive income thanks to you guys.
32
:I'm sure we can organize that. Yeah, full disclosure.
33
:I would not be able to live off the podcast alone, that would not work at all. It's not a good business model.
34
:Don't do that, people. ah But if you can have fun, then yeah, do it. um Now it's great to have you on. I'm also gonna have Teemu,
35
:I don't know if that's how you pronounce his name, but I'm gonna have Teemu on the show in a few weeks also um
36
:Osvaldo has been here.
37
:uh Aki, obviously, and I think Noa should come on the show one of these days to talk about everything he's doing.
38
:So yeah, I need to contact him.
39
:So anyways, today it's you, David, thanks.
40
:Thank you so much for taking the time. I've been reading your
41
:work for a few weeks now because you're doing a lot of very interesting things about autoregressive models, state-space models, how to choose priors, and so on.
42
:So that's really cool.
43
:We're going to talk about that in a few minutes, and you're going to do a few demos live.
44
:um So if you happen to be in the chat because you're an LBS patron, um please don't be shy
and introduce yourself in the chat, and then you can ask questions to David.
45
:But before that, David, as usual, let's start with your origin story.
46
:Can you tell us what you're doing nowadays and how you ended up working on this?
47
:Yeah, thanks.
48
:So pretty much my whole educational background is in econ.
49
:So I did my bachelor's and master's, and later on my PhD, also in econ, but um always with a flavor of econometrics.
50
:So I was early on interested already in my undergraduate studies about statistical
relationships.
51
:uh Particularly back then I was more interested in things like the relationship between
debt relief allocation and then later on the country's development.
52
:So that involved a lot of uh what we call, in econ, panel data methods, which are really uh spatial types of models.
53
:uh
54
:Then during my graduate studies, I was then more interested in time series models.
55
:I really just loved kind of the simplicity and the mathematics of working through some
discrete time series models.
56
:And um that is, of course, widely applicable to many things in econ.
57
:at some point, well, I got then really interested in uh thinking about uh how can you
apply higher dimensional
58
:time series models to problems where you have maybe lots of data.
59
:So especially like in finance and macroeconomics, you have a lot of situations where you
have very short time series, but potentially a lot of explanatory factors.
60
:And so then classical methods tend to be fairly weak in terms of power, but also then in
terms of regularizing the variance sufficiently of the model to get useful predictions.
61
:So then I really delved into Bayesian econometrics with, I suppose you could call him that, my first mentor, Gary Koop at Strathclyde University in Scotland.
62
:So I did my graduate study in Edinburgh, and then he, Gary Koop, was at Strathclyde, and uh I had the great honor of doing this Bayesian econometrics course with him.
63
:And it was probably the best six weeks of my academic life at that point.
64
:I just really loved his stuff.
65
:uh
66
:He has a great website by the way, with a lot of resources if people are interested in Bayesian time series econometrics, some panel data stuff as well, a lot of multivariate stuff in
67
:fact, so a lot of, like, vector autoregressions, but maybe we can talk about that later too.
68
:yeah, basically starting with the backgrounds that Gary gave, I delved further and further
into Bayesian time series econometrics.
69
:And that's pretty much the
70
:tradition I'm still in.
71
:And after that, I did my PhD also in Scotland.
72
:And there, my sole focus was then on Bayesian methods for time series modeling, and then
also some modeling in the direction of quantile regression as well.
73
:Okay, interesting.
74
:Yeah, I didn't know you were that econ-heavy.
75
:That's interesting.
76
:That's a bit like, yeah, Jesse Grabowski has the same, a similar background.
77
:So I want to refer people to episode 124 where Jesse talked about state space models.
78
:All of that, it's one of his specialties.
79
:So for background about that, we'll talk about that a bit today again, but for more background information on that, listeners, you can refer to that episode as a prerequisite, let's
80
:say, for this episode with David.
81
:And yeah, definitely, if you have a link to uh Gary Koop's material, feel free to add that to the show notes, please, because I think it's going to be very uh interesting to people, at
82
:least to me.
83
:uh I love time series and vector autoregression stuff and so on.
84
:And Jesse and I are working a lot to make the PyMC state space module
85
:better and more useful to people.
86
:yeah, if we can make all that easier to use, that's going to be super helpful.
87
:yeah, awesome.
88
:Feel free to add that to the show note.
89
:And thanks for this very quick introduction.
90
:That's perfect.
91
:That's a great segue to just uh start and dive in, basically, because you have a um case
study for us.
92
:today and you're going to share your screen.
93
:uh Maybe we can start with a quick theory of state space models, mainly geared towards what you're going to share with us today.
94
:You can take it over, David; feel free to share your screen already or a bit later.
95
:So perhaps before I go into the state space specifics, maybe I can first comment on like
maybe what we're still working on today.
96
:And then I think that will
97
:give you some background at least on why we're interested in still thinking about state spaces.
98
:Because part of the reason why I entered the research realm around, well not around, Aalto was that uh Aki was working a lot on these kinds of Bayesian workflow problems.
99
:So how to build models in various circumstances, how to robustly draw inference.
100
:And one thing that was, I think, direly missing from also the research I was doing at the
beginning of my PhD was how to safely build out these time series models.
101
:Like, uh how do you set priors on things that you can interpret?
102
:That then allows you, oftentimes, to add more complexity to the model without sacrificing predictions, or at least statistics that involve predictions.
103
:And so uh one thing that we're working on at the moment in a very focused sense is this idea of predictively consistent priors, meaning that you start out with some notion of
104
:understanding about a statistic on the predictive space that might be something like the R
squared statistic.
105
:This measures the amount of variance fit.
106
:So this is a statistic between zero and one.
107
:Often it's bounded in that space uh, for many models at least, and it measures uh the variance that the predictor term of your model is fitting.
108
:So let's say the location component of a normal linear regression over the total variance
of the data.
109
:So the higher the r2 is, the better.
110
:So how much variance can I fit as a fraction between 0 and 1?
111
:And uh that kind of
112
:idea has been developed also in the Bayesian sense, where Aki and Andrew Gelman have worked out the kind of methodology behind this Bayesian R squared, so a Bayesian
113
:interpretation, which really is just a posterior predictive of the predictor term or uh
the variance of your model over the uh entire predictive variance, including also the
114
:error term and so on so forth.
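To make that concrete for readers following along, here is a minimal sketch of that Bayesian R² computation, assuming you already have draws of the linear predictor and of the residual standard deviation; it is an illustration in the spirit of the Gelman, Goodrich, Gabry and Vehtari Bayesian R², not code from David's case study.

```python
import numpy as np

def bayesian_r2(mu_draws, sigma_draws):
    """Bayesian R^2, one value per posterior draw.

    mu_draws    : (n_draws, n_obs) draws of the linear predictor (the 'fit')
    sigma_draws : (n_draws,) draws of the residual standard deviation
    """
    var_fit = mu_draws.var(axis=1)        # variance of the predictor term, per draw
    var_res = sigma_draws ** 2            # residual variance, per draw
    return var_fit / (var_fit + var_res)  # a whole distribution over R^2, not a point estimate

# toy usage with fake draws
rng = np.random.default_rng(1)
mu = rng.normal(size=(1000, 200))
sigma = np.abs(rng.normal(1.0, 0.1, size=1000))
print(np.quantile(bayesian_r2(mu, sigma), [0.05, 0.5, 0.95]))
```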
115
:And what we recognize is that this statistic is well understood in many domains.
116
:So in econ, in the biomedical sphere, in a lot of social sciences, people usually have a model where they understand this notion of R2, this notion of variance fit.
117
:So that goes even beyond just the kind of classical normal linear regression case, but
also for
118
:general GLMs.
119
:There are certain definitions of R-square that exist and people are able to interpret.
120
:And what we are doing in our group at the moment a lot, at least I'm working on it a lot, and with Noa also, we're looking into how you can set a prior on the R-squared and, from
121
:that point of view, figure out or define the priors on the rest of the model.
122
:So you start from a notion of understanding of R squared and perhaps some prior about
this.
123
:And given this, how can you find priors of all the other components in the model?
124
:Yeah.
125
:Yeah, yeah.
126
:I really love that.
127
:That's very interpretable.
128
:And that's also really how you would define models most of the time you think about them.
129
:Because anybody who's worked with a model with an AR component in there, and has done
130
:prior predictive checks, knows that these checks become crazy in magnitude if you have just an AR(2).
131
:AR(1) is fine with somewhat normal priors, but then if you have an AR(2) component, and I encourage you, if you use Stan or PyMC, go to the Stan or PyMC website, just copy paste the code
132
:for an AR model and then just
133
:uh sample prior predictive samples from there with an AR(2), and you'll see that if you use a Normal(0, 1) on the coefficients, the magnitude of the prior draws just becomes super huge with
134
:the time steps. And that's the big problem, uh and one of the problems that you are trying to address with the ARR2 prior. And I think that's, yeah, that
135
:The way you do it, I really love it, because it's also very interpretable and intuitive.
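Here is a small simulation of the problem Alex is describing — my own sketch, not code from the Stan or PyMC docs: draw AR(2) coefficients from independent Normal(0, 1) priors and push them through the recursion, and a large share of prior draws land outside the stationarity region, so the simulated series blow up as the number of time steps grows.

```python
import numpy as np

rng = np.random.default_rng(42)
T, n_draws = 100, 500
max_abs = np.empty(n_draws)

for d in range(n_draws):
    phi1, phi2 = rng.normal(0.0, 1.0, size=2)   # Normal(0, 1) priors on both AR coefficients
    y = np.zeros(T)
    shocks = rng.normal(0.0, 1.0, size=T)
    for t in range(2, T):
        y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + shocks[t]
    max_abs[d] = np.abs(y).max()

# many prior draws are explosive, which is exactly what a prior predictive check reveals
print("share of prior draws exceeding |y| = 1e3:", (max_abs > 1e3).mean())
```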
136
:Yeah.
137
:And then again, it's predictively consistent.
138
:You have a notion of R squared.
139
:And if you generate from your model, so if you don't condition on your data, you just do this push-forward distribution where you sample from your prior, plug it into your model,
140
:generate predictions, then the prior predictive of R squared will align with your prior expectations:
141
:your prior knowledge of the uncertainty of R squared, say, the shape of the distribution.
142
:And that's exactly what we're doing in this line of research for time series models, in
particular, stationary time series models.
143
:Yeah.
144
:Yeah, yeah, Yeah.
145
:So thanks a lot for this background.
146
:I think that's indeed very, important.
147
:And so now, do you want to dive a bit more into the state space models in the case study
you have for us today?
148
:Yeah.
149
:Let's do it.
150
:Awesome.
151
:Let's go.
152
:So for the
153
:people live in the recording, you'll be able to see David's screen. And otherwise, if you are watching this episode on YouTube, um, well, you also see that in the video,
154
:you'll see David's screen live. Otherwise, if you are listening to the episode, well, for that part of the episode,
155
:I encourage you to go on YouTube and check that out as soon as you can, because that's probably gonna be a bit easier to follow.
156
:Alright, I'll just share my entire screen, think that will be easiest.
157
:Yes, we are on.
158
:So this is the dynamic regression case study um that you see on the screen.
159
:You have that, listeners, in the show notes of this episode.
160
:So the link is in there.
161
:It's on David's website.
162
:And now David, you can take it away.
163
:All right.
164
:So.
165
:Yeah, I think we covered some of the basics already with the R squared stuff.
166
:You can define this for AR type regressions and MA and ARMA type models.
167
:uh There are some special mathematical things you have to take into account,
168
:because this time series structure implies a conditional variance, which you have to include in your prior definition.
169
:uh But here we're looking at something that's even one step further.
170
:So we go from
171
:model that has as the target yt, this is a scalar, uh we relate it to a set of covariates,
uh so those are the x's, they are here dimension k times one per time point t, and we have
172
:this unknown regression vector beta t.
173
:So so far so good, this is basically the same almost as just your normal linear regression
case, but indexed by time.
174
:The special thing about this model is that uh the coefficients themselves, the betas, they
evolve according to a latent state process.
175
:So this is the second uh row in equation one.
176
:uh This says that the coefficients uh vary across time according to an AR1 process.
177
:And this allows for the fact that the relationship
178
:between your covariates and your targets may change over time.
179
:So like a famous uh example in econ, for example, is that the response of uh interest rates, which the uh central bank might set to do economic policy, to
180
:inflation, which is one of the main drivers of policy, changes over time, because maybe the targets of this relationship shift, um, or
181
:there are some extra things happening, like COVID, for example, which somehow uh distort this relationship for a little while.
182
:And uh this time-varying process of how these coefficients evolve is then regulated, if
you will, by an AR1 process.
183
:And in fact, those people who know time series will notice that, since this beta vector is k times 1, you essentially have a vector autoregression here in the second
184
:line.
185
:But what we do in the paper, for the simplicity of the math, and also what I do in this case study, is to assume that this uh coefficient matrix, which is called the uh state
186
:transition matrix, phi, which is k by k, is diagonal.
187
:So it's only non-zero across the diagonal component.
188
:What this means is that each individual uh
189
:coefficient is only related to its own past individual coefficient, not other coefficients
also.
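For readers following along without the screen, the model David is walking through (equation 1 of the case study) can be written as

$$
y_t = x_t^\top \beta_t + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, \sigma^2),
$$

$$
\beta_t = \Phi\, \beta_{t-1} + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, \Sigma_\beta),
$$

with $x_t, \beta_t \in \mathbb{R}^k$, and with $\Phi$ (k by k) and $\Sigma_\beta$ restricted to be diagonal in this case study, so each coefficient follows its own AR(1) process.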
190
:Right, okay, okay, yeah.
191
:And if it were not the case, would we be in the presence of a VAR model then?
192
:Vector autoregressive?
193
:So it's still a VAR model, it's just assuming that um all the other...
194
:coefficients are unrelated except for maybe this error term here.
195
:This kind of gives you some further way how to impose non-zero correlation between the
coefficients.
196
:Right, yeah, yeah, yeah.
197
:Okay.
198
:So here, but here we assume that the different...
199
:So we have k times series here that are modeled at the same time, right?
200
:Right, so the target is still a scalar, but then the covariates are then k times 1, right?
201
:And then the process for the covariate coefficients, that is then a vector autoregression.
202
:Right, yeah.
203
:So the beta t's are modeled with a vector autoregression, but here we impose the
correlation between the betas, the k betas,
204
:to be zero when it's not on the diagonal.
205
:Well, it's implied by the structure of the state transition matrix.
206
:Yeah.
207
:So that means the latent process is dependent only on the previous version
208
:of the covariate, the previous value of the covariate.
209
:Yeah, coefficients, exactly.
210
:Yeah, and the k covariates don't interact, basically.
211
:No, exactly.
212
:This keeps the math nice and contained.
213
:yeah.
214
:And so that means um we have k latent states here, and we have k latent states because we
have
215
:k covariates.
216
:Yes.
217
:Correct.
218
:So you can expand this in several ways.
219
:You can also, what we do in the paper as well, the ARR2 paper, is that we allow also for time-varying intercepts.
220
:So you would have like another, let's say tau coefficient here, and then that could also
have its own AR process.
221
:This would be then closer to what you mentioned before, Alex, these structural time series models, where you have uh multiple state processes modeled simultaneously.
222
:Right, yeah.
223
:So I think I mentioned that off the record.
224
:I'm going to say it again on the record.
225
:uh Yeah, basically, we're here.
226
:The idea would be to have um like
227
:each latent state, so each of the k latent states, being modeled with not only an autoregressive process as we have here, but maybe you have a local linear trend and then
228
:you add to that an AR process to pick up the noise.
229
:Because the issue of just having the AR process is that when you are interested in out of
sample predictions is that...
230
:the out-of-sample predictions of the AR are usually not very interesting uh because they
pick up the noise.
231
:And so that's not really what you're interested in when you do out-of-sample predictions.
232
:So here, if you have a structural time-series decomposition, you could be able to
decompose basically the signal and the noise between these different processes.
233
:And so here, yeah, each of your k states
234
:would be modeled like that with one structural time series, but you would still have an
emission.
235
:So like the y_t's we see here, like the data, usually in this literature, are called emissions.
236
:um And so your emission would still be 1D, right?
237
:It would still be a scalar emission.
238
:That's correct.
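To make that decomposition concrete, here is a tiny simulation sketch with made-up numbers (my own illustration, not part of the case study): a local linear trend carries the part you would want to extrapolate out of sample, a stationary AR(1) component soaks up the autocorrelated noise around it, and the observed scalar emission is their sum plus measurement noise.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 200

# local linear trend: a level driven by a slowly drifting slope
slope = np.cumsum(rng.normal(0.0, 0.01, T))          # random-walk slope
trend = np.cumsum(slope + rng.normal(0.0, 0.05, T))  # level accumulates the slope

# stationary AR(1) component that picks up short-run, autocorrelated noise
ar = np.zeros(T)
for t in range(1, T):
    ar[t] = 0.7 * ar[t - 1] + rng.normal(0.0, 0.3)

state = trend + ar                   # one latent state built from both pieces
y = state + rng.normal(0.0, 0.2, T)  # scalar emission / observation

# out of sample, the trend extrapolates while the AR part decays back toward zero
```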
239
:Okay, cool.
240
:You can extend that as well, so you can make y also multivariate.
241
:That's a different beast, maybe we can talk about that later.
242
:Yeah, yeah.
243
:These beasts start to be very big models where you have covariation everywhere at the...
244
:What is the...
245
:I always forget the name of the second equation, so you have the latent state equations
and the emission equation.
246
:Yeah, like the...
247
:the process equation and the emission equation.
248
:Is that the right term?
249
:Well, every literature has their own definition.
250
:I've heard that as well, emission.
251
:I've never used it actually, be honest.
252
:So in econ, we call the y equation this one, the observation equation, and we call this
the state equation for the betas.
253
:Right, yeah.
254
:Yeah, so I've seen emission and observation equation used interchangeably and then the
latent equation.
255
:yeah, dude, so that people have the nomenclature right and clear.
256
:Yeah.
257
:OK, cool.
258
:So that's all clear, hopefully.
259
:So let's continue with the case study.
260
:Right.
261
:And so one of the big problems here is that this unknown in the state equation, so the
betas are explained by the past betas plus another error term.
262
:That's k dimensional.
263
:This has a covariance, which we call big sigma subscript, sorry, big sigma subscript beta.
264
:And these, eh in this case, I'm also just making a diagonal structure just for the
265
:for simplicity of everything, they determine how wiggly the states are uh because they
inject noise into the state process and the larger these variance terms are, so those are
266
:the diagonals across the error covariance term of the state, the larger these are, the
more variable the state process is.
267
:uh
268
:There's a huge literature on how to set priors for these, oh, because if you let them be fairly uh wide, then what you'll find is a horribly overfitting state space model
269
:because you're essentially fitting all the noise in your data by making the state process
as wiggly as possible.
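A quick way to see that wiggliness point is to simulate the state equation for two values of the state innovation standard deviation — a sketch of my own with arbitrary numbers:

```python
import numpy as np

rng = np.random.default_rng(0)
T, phi = 200, 0.95                       # illustrative AR(1) persistence for the state

def simulate_state(innovation_sd):
    beta = np.zeros(T)
    for t in range(1, T):
        beta[t] = phi * beta[t - 1] + rng.normal(0.0, innovation_sd)
    return beta

smooth = simulate_state(0.05)   # small state innovation variance: slowly evolving coefficient
wiggly = simulate_state(1.00)   # large state innovation variance: coefficient chases the noise
print("std of smooth path:", smooth.std().round(2), "| std of wiggly path:", wiggly.std().round(2))
```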
270
:Alex, I think you're on mute.
271
:Right, sorry.
272
:uh Yeah, yeah, that makes sense.
273
:So basically, if you have too wide of a prior on the sigma from the latent state equation, um, and I think also, I've seen in the literature this matrix, because it's often written also
274
:in matrix form when you have...
275
:I hate the names of these matrices because they don't mean anything.
276
:I think it's like F and Q and H and R.
277
:It's like, who invented these names?
278
:It's terrible.
279
:They tried to make it as inaccessible as possible for newcomers.
280
:It's completely stupid.
281
:Anyways, yeah, so you have these matrices on the location of the normals.
282
:So F and
283
:H, usually in the literature, they are also called uh the weights of the processes. And on the right, so the noise, um, well, I think they are called drifts also, uh there are a lot of
284
:different names for that. So that's why I'm getting that out of the way for people right now. But basically, here we're talking about the noise of the latent state equation.
285
:So this is the Sigma Beta in your case study.
286
:people would probably see that also in the literature as the matrix Q.
287
:And so what you're saying is that if the priors on this matrix are too big, then basically your AR process will explain all the noise
288
:in your data. And your observational noise, so the sigma on the emission equation, the observation equation exactly, which is the sigma in your case study, and which people
289
:will see as...
290
:um
291
:I think, R, the matrix R in the literature.
292
:Everybody knows R stands for noise.
293
:so yeah, then that means this matrix will be really small.
294
:And if you just take that for granted, you would uh just interpret that as, there is not a
lot of noise in my observational process.
295
:Yeah, correct.
296
:So this is the scalar, just to be clear.
297
:um
298
:Yeah, exactly.
299
:you know, typically the, what the previous literature does is it says, let's put an
inverse gamma.
300
:That's what I call IG here.
301
:Inverse gamma prior on the state innovation variances.
302
:the diagonal of the state variance covariance.
303
:Let's put an inverse gamma on this and uh be fairly un-reformative or
304
:you know, in quotation marks, something like a inverse comma 0.1, 0.1.
305
:And I think the listeners of your podcast will probably immediately know, oh, this is a bad choice, because you have a very long tail along the positive reals.
306
:And if your likelihood information, the part that identifies the variation of the states, is not very strong, then the prior will dominate and you'll end up with hugely,
307
:a huge variance on your states, and therefore overfitting.
308
:And I can recommend this paper in particular.
309
:I'm hovering over it uh in the case study.
310
:It's by Sylvia Frühwirth-Schnatter, a great econometrician and statistician, and her colleague Helga Wagner, um who rewrite the state process into its non-centered form,
311
:which allows you to put normal priors on the state standard deviation, which then features in the observation equation.
312
:This might sound a bit esoteric without kind of seeing the math, but um they go into much more detail as to why setting an inverse gamma prior on the state variances is a bad
313
:idea.
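As a rough sketch of the non-centered idea — my own illustration of the general trick on a simple local-level model, not the exact construction in that paper — you write the state as its scale times a standardized random walk, so the state standard deviation enters the observation equation multiplicatively and can get, say, a half-normal prior instead of an inverse gamma on the variance:

```python
import numpy as np
import pymc as pm

T = 100
y_obs = np.random.default_rng(1).normal(size=T)   # placeholder data for the sketch

with pm.Model() as noncentered_local_level:
    tau = pm.HalfNormal("tau", sigma=1.0)                 # scale of the state innovations
    z = pm.Normal("z", 0.0, 1.0, shape=T)                 # standardized state innovations
    state = pm.Deterministic("state", tau * z.cumsum())   # non-centered random-walk state
    sigma = pm.HalfNormal("sigma", sigma=1.0)             # observation noise
    pm.Normal("y", mu=state, sigma=sigma, observed=y_obs)
```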
314
:And um we take kind of this idea one step further in that we say, okay, how about we
315
:uh start, in fact, from an R squared prior over the entire process.
316
:So something that explains the variation of this guy.
317
:So the state and covariate contribution over the variance of the entire data, because this is something that we often can interpret. Say, like, we know our model explains, let's say,
318
:60% of the variation in our target.
319
:And then, from a prior on this, what is the implied prior on
320
:the state variances.
321
:And uh just to be clear what wide priors will entail: it will entail that the variance of uh this term, the predictor term in the observation equation, so x times beta, will dwarf the
322
:variance of the observation noise; in this case, because it's just a normal model, the total variance is uh this variance plus the variance of the observation model.
323
:Yeah, yeah. So that's related to
324
:what we just talked about, when the variance becomes too wide.
325
:Exactly.
326
:Oftentimes you'll find that those overfitting models will in fact result in an R squared that is very close to one.
327
:Basically saying that you're able to explain all of the variation of the data, and this is often highly unrealistic.
328
:And particularly if you think about this with time series models, where, and let's just briefly go back to the AR,
329
:so the simple autoregressive type model case, if you add more lags, so more information
about the past, you wouldn't think that you can better predict the future, right?
330
:Oftentimes, only the first couple of lags or whatever the time series structure is, is
good for prediction.
331
:And then if you increase the number of lags more and more, you wouldn't think that you're going to explain more and more variance of future data, right?
332
:So in that sense, uh setting a reasonable prior on the R squared is actually a good thing
also with time series.
333
:And this is kind of preempting some maybe screams that the audience has, like particularly
those who are more trained in classical time series econometrics, they'll tell you, R
334
:squared is not a good thing to look at for time series.
335
:And I agree when the model and data are non-stationary.
336
:because then the variance goes to infinity and this R-squared metric is not well defined.
337
:But in the case where you have stationary time series, the variance will be strictly uh
below infinity and therefore this R-squared metric again makes sense to use.
338
:Yeah.
339
:Okay.
340
:But...
341
:Sounds a bit like, you yeah.
342
:You could be R-squared hacking with that, basically.
343
:Yeah, exactly.
344
:I mean, that's what people are afraid of with this R-squared thing, right?
345
:Because they understand from their classical training that, if you just include more and more covariates, then by definition, R-squared is monotonically increasing with the number of
346
:covariates you include.
347
:However, in the probabilistic sense, you also have a uh probability distribution,
posterior probability distribution over your R-squared.
348
:And here, you can regularize with the prior
349
:away from this tendency.
350
:Yeah, that makes sense.
351
:And um yeah, so if we think along the lines of what the R squared metric looks like, if we go through the math that we present in the paper, then we get this ugly-looking fraction.
352
:And this is basically telling you that the R squared is a function
353
:of, let me zoom in a bit, as a function of the state variances, the state AR coefficients,
phi, and the observation noise.
354
:And what we've done to arrive here is that we integrated out the data, so the x's and the
y's, but also the state
355
:Realization itself.
356
:So you'll recognize that the betas don't appear here, but only the variance of the betas.
357
:Yeah.
358
:And the nice thing about this expression really is it's pretty much the total variance of
your predictor term.
359
:So that's Xt times beta t over the variance of your predictor term plus one.
360
:And so if we wanted to set a certain prior, let's say a beta prior on this R squared
metric, then we can figure out by change of variables, what is the implied prior on the
361
:state variances on this kind of total variance term here.
362
:Yeah.
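Loosely, and leaving out the extra factors David mentions, the target of that change of variables has the familiar variance-ratio form

$$
R^2 \;=\; \frac{\operatorname{Var}\!\left(x_t^\top \beta_t\right)}{\operatorname{Var}\!\left(x_t^\top \beta_t\right) + \sigma^2},
$$

where the numerator is the marginal variance of the predictor term after integrating out the covariates and the state realizations (so it depends on the state innovation variances and the AR coefficients $\phi$, not on the $\beta_t$ draws themselves), and $\sigma^2$ is the observation noise; the exact expression with all its factors is in the ARR2 paper.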
363
:And that's very cool because then you can just basically define that prior on R squared.
364
:in your model, right?
365
:And then I guess just use that as the prior, like use that in the priors for the betas in the model directly.
366
:Correct.
367
:Yeah.
368
:And so then, I guess, and,
369
:I think you give some recommendations in the paper, if I remember correctly, to set the prior on R squared and then just basically do prior predictive checks to see that
370
:it makes sense in your case, uh and then go from there, and then you can fit your model.
371
:Yeah, exactly.
372
:In the paper, we like to recommend this uh Beta(1/3, 3) prior,
373
:parameterized here in terms of the location and scale of the distribution.
374
:I think in PyMC you also have that coded up, yeah.
375
:The beta proportion.
376
:Yeah, exactly.
377
:So that would be familiar in that case uh because it has a lot of the mass towards uh an R
squared below 0.5.
378
:It has a very gentle slope.
379
:So if the uh likelihood is pulling you
380
:in one direction, you're not going to overwhelm, most likely, the likelihood too much with
an aggressive slope on the R-square space.
381
:You're uh weakly, let's say, regularizing toward ah lower R-squared uh values, and therefore less likely to have overfitting.
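For reference, if those two numbers are read as a mean of 1/3 and a precision of 3 in the mean-precision parameterization of the Beta distribution (my reading of "one third and three"), the corresponding shape parameters work out to

$$
\alpha = \mu\,\nu = \tfrac{1}{3}\cdot 3 = 1, \qquad \beta = (1-\mu)\,\nu = \tfrac{2}{3}\cdot 3 = 2,
$$

i.e. a Beta(1, 2) density, $p(R^2) = 2\,(1 - R^2)$: most of the mass below 0.5, highest at zero, and declining with exactly the kind of gentle slope David describes.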
382
:Yeah, yeah, No, that's very cool.
383
:And so you're going to show a bit now the implementation on different,
384
:uh...
385
:different data sets. But also, for people using PyMC, Austin Rochford did,
386
:like, code up basically that prior in PyMC uh on his blog. I linked to the blog post in the show notes, uh that's a very, very good blog post, so I definitely
387
:encourage you to check that out. uh His blog post, though, is uh limited to one part of your paper; you do more than that in the paper, and you'll get to that
388
:in a minute, David. But yeah, like, that's a good introduction.
389
:I think in his blog post, Austin mainly, so, he codes up the prior and then generates data based on three different processes, different data-generating processes, and then checks
390
:that we can recover the parameters
391
:of the three different processes with the R-squared prior.
392
:You do that, but you also do more than that.
393
:that's what we're also going to talk about today.
394
:And so just briefly walking through the machinery, then we are able to set the uh prior
process, how it would look like in the Stan program.
395
:So we have then this variance term here in particular, which comes then from this whole
R-squared machinery.
396
:And if you look very closely at these two equations, so one is this R-squared definition
and the other is the prior variance on the latent spaces, you'll see that here are two
397
:factors, which if they are included, they allow you to get rid
398
:of most of this very unwieldy looking stuff in the R-square definition and allows you then
to isolate uh only the variance terms.
399
:uh But yeah, so there's more about this in the paper.
400
:I would encourage those of you who want to dig more into this to have a look at that.
401
:But the only other important thing I want to mention at this point is that you have
another part of this R-square prior that allows you to decompose uh
402
:um the variance.
403
:And this you can think of as determining the importance of the individual model components
and um what they do mechanistically.
404
:states, right.
405
:Exactly.
406
:And what they do mechanistically is that they allocate the variance.
407
:Right.
408
:Yeah.
409
:So basically which part, which states contributes more variance than another state.
410
:Exactly.
411
:Because basically, I think ultimately you can't really determine the exact value of the variance that's contributed by each state.
412
:You can't just...
413
:In absolute terms; you can only do that in relative terms.
414
:The proportion of the variance is coming from that state, but you cannot really say it
from an absolute perspective, I guess.
415
:As in marginal?
416
:Yeah, that would be hard.
417
:You can make statements about the entire variance of all the states.
418
:And you can make a statement about relative variance in a way.
419
:that's what this decomposition lets you do.
420
:Yeah, exactly.
421
:Yeah, because I think if you want the absolute decomposition, that's just undetermined.
422
:So because an infinity of different decompositions of the variance of the different states will give you the same total variance.
423
:So yeah, I think just the proportion is going to be identifiable, which is what this is
doing.
424
:That's why you're using a Dirichlet prior on this psi term.
425
:Yeah, Dirichlet makes sense here because you're trying to find weights that allow you to decompose this variance, and the Dirichlet is just a natural prior on a simplex, really.
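A rough sketch of how the two pieces fit together, written in PyMC as an illustration of the general R²-decomposition recipe rather than the paper's exact construction: the Beta prior pins down the total amount of explained variance, and the Dirichlet splits that total across the k components.

```python
import numpy as np
import pymc as pm

k = 20                                             # number of covariates / states
with pm.Model() as r2_decomposition_sketch:
    sigma = pm.HalfNormal("sigma", 1.0)            # observation noise sd
    r2 = pm.Beta("r2", alpha=1.0, beta=2.0)        # prior on total variance explained
    psi = pm.Dirichlet("psi", a=np.ones(k))        # simplex of variance-allocation weights
    # total predictor variance implied by R^2, then allocated across the components
    tau2 = pm.Deterministic("tau2", r2 / (1.0 - r2) * sigma**2)
    component_var = pm.Deterministic("component_var", tau2 * psi)
```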
426
:But yeah, I mean, you mentioned something about identifiability.
427
:We're not taking hard stances on this.
428
:It can be a problem, I think, in state spaces more generally.
429
:Like how can you identify where the variance comes from?
430
:But in general, I think putting a prior on the weights to decompose the variance makes sense.
431
:You want the data somehow to inform also that, I think.
432
:Yes.
433
:No, for sure, for sure.
434
:Yeah.
435
:I mean, identifiability in general is hard for time series models.
436
:It's hard also if you use GPs on time series models.
437
:It's just that time series data is hard, and you don't have a lot of data in a way.
438
:You need a lot of covariates.
439
:If you can have external covariates, that helps a lot.
440
:um But whether you're using state space models or GPs, I think GPs are even harder, because they are semi-parametric, whereas with state space you have more structure by definition.
441
:um But yeah, identifiability is always
442
:a big issue here, and the more informative data and priors you can have, the better.
443
:And I'll mention that Arno Solin at Aalto has investigated the link between GPs and state spaces, and there's a very close computational link.
444
:You can find the posterior of a GP as if it were a state space.
445
:Right, yeah.
446
:And you can think of also the state space as being
447
:somewhat of a discrete approximation to the continuous GP.
448
:You can also, because you also have, let's say, a variance of the GP in a way, the latent function, in this case the state itself, competing with the variance of
449
:the observation model itself.
450
:Yes.
451
:Yeah.
452
:Yeah.
453
:And that definitely happens with GPs, right?
454
:Yeah.
455
:They can, they can pick up.
456
:like they are so flexible that they can pick up the noise.
457
:Also, so you have to be very careful on the priors.
458
:And so, and then also, and if you add categorical predictors to that, it's very hard
because categorical predictors are not really predictors anyways, you know, it's like, I
459
:find they don't really add a lot of information.
460
:They just, you know, break down the model in different subsets.
461
:ah But you also need, like, if you can have continuous predictors
462
:informing the different subsets, that definitely helps.
463
:Because otherwise, yeah, the GP can fit anything, so it can definitely fit the noise in your different subsets anyways.
464
:But yeah, I'm not surprised that state spaces are generalized by GPs.
465
:It seems to be a law of the universe, everything is a GP in the end.
466
:I'm pretty sure that's what Black Holes are.
467
:They are just GPs inside.
468
:So yeah, let's continue.
469
:think you can now go to the application part of that, right?
470
:And you have an inflation forecasting example for us.
471
:Exactly.
472
:And there are some other priors also in literature.
473
:So those who are familiar with econ might recognize this Minnesota prior.
474
:Those who generally follow also the uh shrinkage literature, they'll know the regularized horseshoe prior.
475
:So we're incorporating these here too as a part of comparison.
476
:Mm hmm.
477
:Yes.
478
:And then inflation forecasting.
479
:So here, some very crude code.
480
:I'm not particularly um happy about it, but I think it does the job.
481
:It loads data from um one of the Feds in the US with this fredr R package.
482
:And I just have a lot of functions here.
483
:So let me just skip that.
484
:And where the data are directly loaded uh from
485
:the St.
486
:Louis Fed website.
487
:I've changed it now locally to first download it and then load it, because I find that, like, downloading from links on the fly is not the safest thing to do.
488
:So here, in this particular instance where I'm showing this, I'm just having first downloaded the data and then loaded it.
489
:And we have a set of 20 covariates.
490
:So 20 covariates and then therefore also 20 states.
491
:because those are the regression coefficients which we allow to vary over time.
492
:OK, yeah.
493
:Yeah, I think that's interesting.
494
:Yeah, especially that plot here where you showed the data.
495
:you have the outcome variable is inflation, right?
496
:Correct.
497
:And then you have 20 other time series.
498
:And each of these time series are a covariate, right?
499
:Yeah, each of these time series, so those are just the covariate values, the X's, in the ah state space equation I was showing above.
500
:It's different data, it relates to financial market information, you have some sub parts
of inflation in here as well.
501
:uh Industrial production, for those who are in econ and macro, they'll know that industrial production is super important for explaining ah macro data movements.
502
:So that's included here.
503
:So that's like, if we look at just one of them,
504
:for instance, industrial production that would be like for each year, I think it's monthly
data, right?
505
:For each month, what is the value of the industrial production, whatever the value, the
scale of it is.
506
:And so if you're looking at the screen, have like each, these are lines, but basically
these are just points and then each point gives you that value.
507
:And so for the corresponding value of the industrial production, you have a corresponding
value at that same time point of the inflation, which is the outcome, which is the y of
508
:the equation, of the reservation equations.
509
:And then the XTs are all the other variables.
510
:So 20 of them in total, which is k in the equations we saw before.
511
:here, k equals 20.
512
:so the matrix we talked about before.
513
:the phi matrix, that is also called the F matrix, sorry, in the literature, uh which is the transition matrix of the latent state process,
514
:uh It could be a full matrix, a full-rank matrix, but here it's only a diagonal matrix.
515
:And so the states governed by that matrix are called the betas.
516
:So, and the betas are indexed by k and t.
517
:Is that all correct?
518
:Can we continue?
519
:I think that was pretty much, yeah.
520
:Okay, cool.
521
:Yeah, so, you know, I think it's always a good point in the workflow to plot your data,
just to know like, oh, is there maybe an outlier somewhere that looks fishy?
522
:You know, you see, for example, here all the time series usually have like this S shape around COVID, this is 2020.
523
:So you know that there's a lot of funky things expected around this time.
524
:Yeah, and I'm guessing you're scaling all the variables, standardizing all the variables
before feeding that to the model so that it's all in the same scale?
525
:So here I'm following the recommendations of the St.
526
:Louis Fed.
527
:They have a set of very good researchers who look at what is the best transformation for
the data, such that they are stationary.
528
:Or, you know, weakly stationary at least.
529
:And um generally, if you do econ analysis for macro time series data stuff, I would recommend just following the recommendations of the statistical agencies.
530
:In this case, the St. Louis Fed.
531
:Interesting.
532
:um They are kind of the data authority on much of the uh US stuff.
533
:Interesting.
534
:Yeah.
535
:Nice.
536
:So here, what do you do?
537
:Like, what is the recommendation?
538
:Are you doing any pretreatment or do you just...
539
:follow their...
540
:So they have codes.
541
:They have codes.
542
:They mean different things.
543
:Like, let's say one time difference, maybe a growth rate calculation, maybe you leave it entirely untransformed.
544
:So it depends on the time series.
545
:I...
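For context, FRED-MD style datasets ship one transformation code per series; a minimal sketch of applying a few of them could look like this (the code-to-transformation mapping below is my recollection of the FRED-MD conventions, so check it against the St. Louis Fed documentation before relying on it):

```python
import numpy as np
import pandas as pd

def apply_tcode(x: pd.Series, tcode: int) -> pd.Series:
    """Apply a FRED-MD style transformation code to a single series."""
    if tcode == 1:
        return x                    # leave the level untransformed
    if tcode == 2:
        return x.diff()             # first difference
    if tcode == 4:
        return np.log(x)            # log level
    if tcode == 5:
        return np.log(x).diff()     # log first difference, roughly a growth rate
    raise ValueError(f"transformation code {tcode} not handled in this sketch")
```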
546
:Okay.
547
:Okay.
548
:Exactly.
549
:Interesting.
550
:Yeah.
551
:Because these time series have very different scales.
552
:So...
553
:Exactly.
554
:Yeah.
555
:So yeah, that's definitely something I would be concerned with, especially when you give
that to HMC.
556
:And so the R-square prior, in fact, can be made robust to the scale of the data by
including the variance of uh your covariate information in the prior itself.
557
:So you can scale it properly.
558
:Which is not done here, right?
559
:I've not seen that in the equation.
560
:Let me see.
561
:I think for simplicity I assumed that the covariates all have variance 1, but in the model
it would be then another fraction here divided by the variance of x.
562
:But more on that also in the paper.
563
:That's where we have more on that.
564
:Okay, so now we know what the data are looking like.
565
:So just to motivate where time variation in the coefficients comes from, what I've done here is, for each month, starting from the beginning of the sample, run
566
:a univariate regression.
567
:So just our target inflation against, let's say, industrial production.
568
:save the coefficient value and roll until the end of uh my data availability.
569
:And what we would find if there's indeed variation in the coefficients is that we also
find variation here.
570
:And so that means basically you run univariate regression for each time point
individually?
571
:Exactly.
572
:Like in a for loop?
573
:Exactly.
574
:That's exactly what this guy's doing.
575
:So the model doesn't know anything about like...
576
:time correlation.
577
:Right, it's like you just run 1990 against 1990, then 91 against 91, and so on. Yeah, okay. I mean, it's not a statistical guarantee. However, if you would find that those lines are all just
578
:straight, you know, then you probably are not going to find much variation, even if you do all the bells and whistles that we offer, kind of, you know. Yeah, yeah. No, I mean, and
579
:that's okay, that's a good check, right? Because these models are not
580
:The models we're talking about here are not trivial.
581
:Each time you need to do state spaces or Gaussian processes, it's not trivial.
582
:So if you don't have time variation in your data, that's better.
583
:Honestly, your life is going to be easier.
584
:uh yeah, that's interesting.
585
:I didn't know about that method.
586
:like, it's a good heuristic.
587
:It's just like, you can just take a subset of your data, maybe a random sample, or just take every
588
:I don't know, five or six months and then you run a regression in a for loop.
589
:I mean, not even a for loop, you can just vectorize that with Stan or PyMC.
590
:But it's just like, it's an independent univariate regression, just plain.
591
:You could do that in brms or Bambi even.
592
:Exactly.
593
:If the regression coefficients come up very, very close to each other, then that probably
means you don't have that much time variation.
594
:Here it's not the case.
595
:Here we can see the lines are very, very wiggly.
596
:Exactly.
597
:It's also part of the workflow.
598
:I think you should always start simple.
599
:Always start with a simple model.
600
:See if you can find interesting relationships.
601
:You can even start just by, you know, you do a ggplot and then you just put an lm through the data points in there and just see, is there any kind of interesting dynamics you could
602
:pick up?
603
:And this is doing this essentially 20 times.
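A sketch of that heuristic in code — my own, and it assumes a rolling window of observations rather than literally one regression per single data point:

```python
import numpy as np

def rolling_univariate_slopes(y, x, window=60):
    """OLS slope of y on x, re-estimated on a rolling window of `window` months."""
    slopes = []
    for end in range(window, len(y) + 1):
        ys, xs = y[end - window:end], x[end - window:end]
        xc = xs - xs.mean()
        slopes.append(float(xc @ (ys - ys.mean()) / (xc @ xc)))  # simple-regression slope
    return np.array(slopes)

# if this coefficient path is essentially flat for every covariate, a time-varying-parameter
# model is probably not worth the extra machinery
```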
604
:Yeah, yeah, yeah.
605
:I would, for that plot, it would probably be useful if you uh shared the y scale, you know, the y axis between the plots, because they have different scales, and so that
606
:will also inform you about which covariates seem to be more variable in time than others.
607
:Yeah, that's a good point.
608
:um I think for some you can already see that there's some significant variation, like that's industrial production.
609
:I think if you look at the scale of the data before, it was between, I think, 0 and 15.
610
:The coefficient goes between 0.2 and minus 0.4.
611
:So there's some variation to be expected.
612
:Nice.
613
:now, Stan models.
614
:Yeah.
615
:So I've hidden them below.
616
:So you can look at them, I think, when you have the time.
617
:There also, we have a repo for the paper as well.
618
:There, we have written a Snakemake pipeline.
619
:So it's maybe a little bit obscure if you haven't gone through Snakemake pipelines before.
620
:But the Stan code is also there.
621
:And here, I reproduced it for the dynamic regression, so the state space model that we're
looking at here.
622
:And that's at the end of this.
623
:Yeah, and I've put that link in the show notes, of course.
624
:All right.
625
:And so here we're just setting up the models and sampling from them.
626
:I've coded up one indicator there that indicates whether it's a prior predictive or full
posterior analysis.
627
:So basically, including the likelihood contribution in the model block or not.
628
:And we were talking in the beginning about what happens if you have fairly unrestricted
priors on the variances of your coefficients in your model.
629
:And this is exactly what's happened, what I'm showing here.
630
:So the Minnesota and RHS, those are two popular shrinkage priors for time series.
631
:If you sample from the priors, plug it into the observation equation, generate what would be the prior predictive
632
:y's, and then calculate the R-squared statistic, you'll find that these two models say a priori we expect to fit 100% of the data variation.
633
:Whereas if we uh look at our model, the ARR2, um here we have full control over what this distribution looks like.
634
:And this is approximately this Beta(1/3, 3) distribution.
635
:This is also a nice way to check that your coding is correct.
636
:Like, if this had an entirely different form, like a tent structure around 0.5, um then you would also know, okay, something's wrong with my code.
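Generically, that prior-predictive R² check can be sketched like this (my own illustration; it simulates the states from whatever prior you are comparing, pushes them through the observation equation, and records the implied R² per prior draw):

```python
import numpy as np

rng = np.random.default_rng(3)

def one_prior_draw_r2(X, phi, state_sd, sigma):
    """Simulate the states for one prior draw, push them through the observation
    equation, and return the implied R^2 for that draw."""
    T, k = X.shape
    beta = np.zeros((T, k))
    for t in range(1, T):
        beta[t] = phi * beta[t - 1] + rng.normal(0.0, state_sd, size=k)
    mu = (X * beta).sum(axis=1)                    # x_t' beta_t at each time point
    return mu.var() / (mu.var() + sigma**2)

# loop this over many draws of (phi, state_sd, sigma) from each prior and compare the
# histograms: overly wide state priors pile the prior-predictive R^2 up against one.
```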
637
:Yeah, Yeah, for sure.
638
:And that's really why, that's what we were talking about before at the beginning.
639
:That's what I really like about setting priors that way.
640
:Also, that's how you can see that setting your priors with other priors than the ARR2 is really weird.
641
:It's like, before seeing any data, you're expecting, you're telling the model to expect to be able to explain
642
:all the variance in the data with your latent state equation.
643
:So it's like saying, oh yeah, there is no noise in the data at all.
644
:It's possible, but I would bet it is very, very improbable.
645
:Well, there can still be noise in the data, but it is dwarfed by the...
646
:variance of your...
647
:the latent state process.
648
:Yeah, it's gonna pick up everything, and I don't think it's a good modeling choice.
649
:I think the R-squared process here, in the prior distribution, makes much more sense.
650
:Yeah, yeah.
651
:And if you know inflation or if you know kind of your inflation data in the US, you also
know that it's a really hard time series to predict.
652
:So there's a whole literature about how hard it is to predict. And we would expect, in fact,
653
:a lower R squared, something between 0 and 0.5. Like, if you're a specialist in econ, you would say, okay, one is not possible.
654
:All right.
655
:And so here, um I'm just generating still from the prior.
656
:So just to motivate what the variance of the coefficients for each of these time series looks like, I just sample from the state process.
657
:This is how it looks for the ARR2.
658
:You know, it's not informed by the data, so they all look approximately, you know, distributed around zero.
659
:And this is how it looks like for
660
:the Minnesota and the RHS.
661
:And you're still plugging in the X information.
662
:You're not informing the prior with the likelihood at this point.
663
:But you see that the variance of the data also heavily influences the prior predictives
here.
664
:And so these are the betas, right?
665
:Yes.
666
:Yeah.
667
:Yeah.
668
:And you can see, definitely, if you see the screen here, people, or if you're following along with the
669
:blog post.
670
:These are weird.
671
:These are weird prior checks.
672
:Yeah, you wouldn't expect your coefficient value for a certain time series to range
between minus 50 and 50 if your range was like between 0 and 100, let's say, for your
673
:target.
674
:Yeah, and also because that implies that then other time series have zero contribution, and you cannot really control
675
:which ones have zeros either.
676
:So it's quite bad.
677
:Basically, also the problem with that is that it puts a lot of onus on the data to be very
informative.
678
:And that might not be the case, especially with time series data, where all these models have a lot of parameters.
679
:So that's already a big responsibility for the model.
680
:Then if you, like,
681
:put even less prior information in there,
682
:that means you need to squeeze even more information from the data, where the data in time series is already not necessarily the most informative.
683
:So it's like, that's piling up the complexities.
684
:Yes, correct.
685
:And good point, by the way.
686
:I think also for listeners, you can have very fancy priors and everything.
687
:But if your likelihood is very, very strong, really informative about the value of parameters, eh oftentimes it uh
688
:doesn't matter so much what you're doing with the prior.
689
:So it can happen that the data information fully overwhelms the prior.
690
:ah As you just said, you have k states, you have t time points.
691
:That means you have k times t parameters you're estimating.
692
:That's a lot.
693
:At least.
694
:And that's just the betas.
695
:But then if you start sharing information and so on, you add parameters uh
696
:to be able to do that partial pooling and so on.
697
:So, and also, like, each time, that means also that you, you subset the posterior space, if you want, in a way.
698
:And so that means that each part of this subspace is only informed by one slice of the data.
699
:So it's not like you're taking the full time series and then you're just sharing everything.
700
:It's, no, like, then you have that time state and this state, and that's just like,
701
:you might end up just having one data point to inform that parameter in the end.
702
:if at all.
703
:Yeah, if at all.
704
:So priors matter.
705
:in these state spaces.
706
:Especially for a time series model,
707
:it's like, that's basically my point.
708
:Because also I've discovered that with experience, right?
709
:And that's why if you don't see any time variation, it's way better.
710
:Because then if you ignore time, you basically can pool
711
:your data and aggregate it, and so that increases your sample size basically, and so that increases the information that you have in the likelihood, and decreases the
712
:importance of the priors.
713
:Yeah, yeah, exactly.
714
:And of course, you know, the nice thing about this R-squared stuff is that you're a priori
saying that those states, they have to fight each other for the same variance.
715
:Like, we've upper-bounded the variance, so they have to fight each other for explaining the data, loosely speaking.
716
:So if one state is important, and that means away from 0 significantly in some sense, then
another state has to give, it has to then have less variation.
717
:And that comes from the Dirichlet prior.
718
:Yes, correct.
719
:Yeah, and this manifests.
720
:now we have the posterior distributions on the r squared.
721
:We can see that they get, that they can, like we have posterior shrinkage, so that's
really good.
722
:all go in the same direction but it's just that Minnesota and and Hoss for Pryor were so
biased towards the one the uh priority a probability mass gerrit and biased towards an R
723
:squad of one that like it is super hard for them to get away from it too much whereas the
724
:the R squared prior is much more aggressive on saying that the latent states are not that
are not picking up too much of the noise.
725
:Yeah, correct.
726
:maybe this...
727
:I don't know if this is a good value of R squared.
728
:I'm not making a statement about this, but there's a big difference and that's what's
important.
729
:And we can verify whether this is good or not later with predictions.
730
:Yes.
731
:And so basically what the ARR2 model here is saying is that the covariates here, the latent states, explain much less of the variation in the data than what you would
732
:conclude if you're using Minnesota or RHS priors.
733
:Absolutely correct.
734
:Yeah, nothing to add.
735
:Very good.
736
:Thanks.
737
:Awesome.
738
:Let's go on.
739
:And you can also think, in time series, about some notion of R-squared over time.
740
:And this literally takes just the contribution of the states and covariates in terms of the variability per time point and relates it to the total variability per time point.
741
:And this is like how much of the variance of the data can you explain at each individual
time point?
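In symbols, the per-time-point R-squared being described is roughly the following (my notation, not necessarily the paper's), where V_t is the variability contributed by the states and covariates at time t and sigma^2 is the observation noise variance:

```latex
R^2_t = \frac{V_t}{V_t + \sigma^2},
\qquad V_t = \text{variability of } x_t^{\top}\beta_t \text{ at time } t .
```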
742
:And what's...
743
:What those posterior series are saying here is that the Minnesota and RHS priors, which tended to have a larger marginal R-squared totaled over all time points, also show much more
744
:variability over time in R-squared.
745
:Okay, yeah, that's interesting.
746
:And you have, like, that formula, I guess you implemented it in...
747
:in R in the package somewhere?
748
:Yeah, well, I've just coded it myself here.
749
:I made a function that's further below, and I just call it.
750
:So this is the extract R2 function.
751
:it's very easy.
752
:You really just take a sample from your posterior, those betas.
753
:you multiply it by the inner product of the...
754
:Oh, there was a mistake with the transposes, by the way.
755
:I'll fix that.
756
:You take the inner product with the covariate vector per time point and relate this to that quantity again plus the observation noise.
757
:This is one way you can think about R-squared over time.
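A minimal sketch of that per-time-point computation from posterior draws; the function name and the use of the squared fitted signal as the per-time-point contribution are simplifying assumptions, not David's actual code:

```python
# Sketch of an R^2-over-time summary from posterior draws.
# The squared fitted signal (x_t' beta_t)^2 is used as a simple proxy for the
# per-time-point contribution of the states; David's own function may differ.
import numpy as np

def extract_r2_over_time(beta_draws, X, sigma_draws):
    """beta_draws: (S, T, K) draws of time-varying coefficients
    X:           (T, K) covariate / state values per time point
    sigma_draws: (S,)   draws of the observation noise standard deviation
    returns:     (S, T) per-draw, per-time-point R^2"""
    fit = np.einsum("stk,tk->st", beta_draws, X)   # x_t' beta_t for each draw and t
    signal = fit ** 2
    noise = (sigma_draws ** 2)[:, None]
    return signal / (signal + noise)

# Fake draws, just to show the shapes and the posterior-mean R^2 path:
rng = np.random.default_rng(0)
S, T, K = 200, 60, 4
r2_t = extract_r2_over_time(rng.normal(size=(S, T, K)),
                            rng.normal(size=(T, K)),
                            np.abs(rng.normal(size=S)) + 0.1)
print(r2_t.mean(axis=0))   # average over MCMC draws, one value per time point
```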
758
:Yeah, it's definitely something that's like...
759
:Yeah, it needs to be there if you're using a package to do that.
760
:Like, let's say we have that in the PyMC state space module.
761
:That's a function we'd like to have basically.
762
:And, same story, more variation with Minnesota and RHS compared to the AR2.
763
:And then here are now the posteriors of the beta vector over time.
764
:So we have drawn our MCMC samples, we take the average over the MCMC samples, and then just look at the time series of the beta states.
765
:And so we see some variation with the AR2 that's being picked up.
766
:There's a lot of variation for all the time series, in a way, on a very similar scale.
767
:So nothing is fully dominating the variance.
768
:So these are the betas.
769
:These are the weights of the latent states.
770
:Exactly.
771
:And for those who are following along, TVP in those graphs refers to time varying
parameters.
772
:In econ, we refer to these state space models, where you have a state for the coefficients of an inner regression.
773
:We call those time varying parameters for whatever reason.
774
:I understand the reason, but it's a little bit too general.
775
:And this is what happens with the Minnesota and RHS, the same picture, basically.
776
:A lot of the series are getting shrunk to zero and then a couple of time series have a lot of variation.
777
:Just to show you how this looks for those, too.
778
:Actually, a quick question that's a bit more theoretical, and I don't know if you'll be able to answer it, but what I'm wondering, maybe what I'm a bit confused by here is: is
779
:that a state-space model with discrete or continuous latent states here?
780
:Discrete.
781
:Yeah, OK.
782
:Yeah, discrete.
783
:But they are not
784
:mutually exclusive.
785
:No, I mean, a discrete one is a subset of the continuous time series.
786
:Right.
787
:But it's not, so it's not an HMM.
788
:It's not a hidden Markov model.
789
:Is it?
790
:Well, it depends how you define hidden Markov model in a way.
791
:So if you say that the hidden or Markovian process here is this discrete state space transition,
792
:then it would be, but it's not in the sense of what you sometimes see, where you say, okay, we have five discrete states for the coefficients and we draw inference on the
793
:location and magnitude of where the states are.
794
:Right.
795
:Yeah, for me, an HMM is more like that, where it's like we have discrete states, but
you're switching from one state to the other.
796
:It's like...
797
:Let's say you have five states. At some part in the time series, the regime you're in dictates your emissions, which depend on, well, an AR process, for instance, that belongs to
798
:state one.
799
:And then at some point, the regime switches to state two and then it switches back to one
or goes to three or five, et cetera.
800
:That's more like that; that's why I was saying mutually exclusive.
801
:Whereas here, the states are not mutually exclusive, literally in the sense that the parameters can all be active at the same time.
802
:Like, you can have beta one positive for industrial production and also beta one positive or negative for AAA FFME here, which I don't know what that means, right?
803
:It's like all the states can be active at the same time.
804
:And then the combination of them gives you the emissions, which in my mind is not really a hidden Markov model, but more like kind of a discretized linear
805
:Gaussian state space.
806
:Yes.
807
:And, well, okay.
808
:I mean, the hidden Markov model can also be discrete, right?
809
:But then what's not the case here is that, let's say, you have 10 time points and you have 10
810
:beta states, then it cannot be beta 1, beta 2, beta 3, beta 1.
811
:You're not repeating the same state along the time series.
812
:Every new time point implies a new state.
813
:They can be related, but there's no transition matrix which says
814
:what the probability is of going back to beta 1 after time point 1 has passed.
815
:Yes.
816
:Yeah, yeah.
817
:Yeah.
818
:Yeah.
819
:So that's why it's really different in my mind.
820
:Like, that looks much more like a linear Gaussian state space model to me, whereas the hidden Markov model is more something categorical, not necessarily categorical
821
:emissions, but a categorical state.
822
:at least.
823
:they're in some way related.
824
:I think the Hamilton time series book has some nice description of the relationship between these models.
825
:I read it at the beginning of my PhD.
826
:Don't quiz me on the details, but it's a cool read if you want to learn more about that
stuff too.
827
:Yeah, I'm sure I'm confusing some people here, but it speaks...
828
:I'm confused myself on that.
829
:Like, I'm still trying to really understand the difference.
830
:I know it's a nuanced difference and that maybe it doesn't really matter.
831
:But yeah, it's just for me to really understand what the actual differences are.
832
:I mean, just to recap, we're not drawing inference on a transition matrix, which tells you the probability of going between states.
833
:It's just that you start at a state and you end at a state and what happens in between
834
:can be fairly unrestricted.
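To pin down the distinction being drawn here, a rough side-by-side in equations (my notation): the hidden Markov model recycles a finite set of regimes through a transition matrix, while the time-varying-parameter state space model gives every time point its own state, linked, for example, by a random walk.

```latex
% Hidden Markov model: a discrete regime z_t is revisited via a transition matrix P
z_t \mid z_{t-1} \sim \mathrm{Categorical}\left(P_{z_{t-1},\cdot}\right), \qquad
y_t \mid z_t \sim p\left(y_t \mid \theta_{z_t}\right)

% Time-varying-parameter (linear Gaussian) state space model: a new state each period
\beta_t = \beta_{t-1} + \eta_t,\ \eta_t \sim \mathcal{N}(0, \Sigma_\beta), \qquad
y_t = x_t^{\top}\beta_t + \varepsilon_t,\ \varepsilon_t \sim \mathcal{N}(0, \sigma^2)
```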
835
:Yeah.
836
:It's more like, here each state, each of the k states, is one-dimensional.
:So it's like tracking the position of a particle, for instance. That's what each state is doing: we're tracking the position of the inflation particle in the subset
838
:of industrial production, for instance.
839
:Yeah, correct.
840
:And yeah, pretty much that's what's going on here.
841
:Just to recap, there's a lot more variation in some states than in others compared to the AR2, which has an almost constant variance across all states.
842
:you might ask, well, which one is better for prediction?
843
:And it turns out that the AR2 is then significantly better in terms of ELPD diff.
844
:Yeah, which is great.
845
:I guess you were happy to see that.
846
:Yeah, exactly.
847
:Awesome.
848
:Yeah.
849
:Maybe one last question related to that.
850
:So I linked to Austin's blog post.
851
:Can you tell us basically
852
:what's the difference between what Austin is implementing in the blog post and what you're doing in the paper? Because Austin is just doing one part.
853
:That blog post is just implementing one part of what you're doing in the paper.
854
:So can you make sure it's clear to people what the difference is?
855
:Yeah, of course.
856
:Thank you.
857
:So the main difference is that Austin is looking only at a subset of the time series models that we define this R2 prior over.
858
:So in the paper, we have AR models, MA models, ARMAs.
859
:We have AR plus X, so independent covariates included with the AR regression.
860
:And we have some simple state space models.
861
:And what Austin did was he took a subset of only the AR simulations.
862
:and looked at the recovery of the true parameter values that he sets according to what we do in the paper, with the R2 prior set over the AR coefficients.
863
:So there are no unknown states; it's all just y as the target, and then on the right-hand side of the equation you have lags of your target.
864
:Yeah, yeah because...
865
:oh
866
:Then yeah, it's just that the likelihood of y is an AR.
867
:that's all.
868
:The model is an AR and then the likelihood is conditionally normal.
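For concreteness, here is a minimal sketch of that kind of model: an AR(p) likelihood with an R2-style prior pushed onto the lag coefficients. It is written in PyMC with illustrative hyperparameters and a simplified variance split; it is not Austin's code and not the paper's exact prior (which also handles things like the predictive consistency of the implied R-squared).

```python
# Minimal AR(p) regression with an R2-style prior on the lag coefficients.
# Illustrative only: the Beta/Dirichlet hyperparameters and the variance split
# are simplifications of the prior discussed in the episode.
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
y = rng.normal(size=200)       # stand-in series; replace with real data
p = 3                          # number of lags
Y = y[p:]                      # target
X = np.column_stack([y[p - k:-k] for k in range(1, p + 1)])  # lagged design matrix

with pm.Model():
    sigma = pm.HalfNormal("sigma", 1.0)
    r2 = pm.Beta("r2", 2.0, 2.0)                               # prior on R^2
    tau2 = pm.Deterministic("tau2", sigma**2 * r2 / (1 - r2))  # total signal variance
    psi = pm.Dirichlet("psi", a=np.ones(p))                    # split across the lags
    phi = pm.Normal("phi", 0.0, pm.math.sqrt(tau2 * psi), shape=p)  # AR coefficients
    pm.Normal("y_obs", mu=pm.math.dot(X, phi), sigma=sigma, observed=Y)
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=1)
```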
869
:Whereas something that is more practical is what we were talking about at the beginning, where you would have y as a normal emission, as you have in the case study, but then
870
:the states could
871
:Well, not the state, but the observation equation could depend on each state being a
structurally decomposed time series with an AR process.
872
:So local linear trend plus AR, and you would use the R-squared prior on the AR coefficient.
873
:Yes.
874
:Well, I mean, in the state space models, actually, the R-squared prior is not set on the...
875
:state-space coefficients but on the state variances.
876
:Because that is the main determinant for the variability.
877
:okay.
878
:So, what did you call the covariance here in your case study?
879
:The sigmas.
880
:So in the literature I know, it's the R matrix.
881
:So the
882
:the variance of the state equation.
883
:And here you call that the sigma.
884
:Capital sigma underscore beta.
885
:Sigma betas.
886
:Yes, exactly.
887
:Cool.
888
:Awesome.
889
:Great.
890
:So thank you so much, David, for that in-depth case study.
891
:Damn, that was good.
892
:And I think that was a first on the show.
893
:So thank you so much for doing that.
894
:And you, listeners, let me know what you thought about that.
895
:I really like that kind of hybrid format content.
896
:I think it's more hands-on and I think it's very practical.
897
:That means you guys have to check out the YouTube channel maybe a bit more.
898
:But I'm fine with that.
899
:So yeah, that was at least super cool to do.
900
:So thank you so much for that, David.
901
:I think you can stop sharing your screen now.
902
:And I've already taken a lot of your time. I still have a lot of questions for you, but I'm going to start playing us out, because I know it's getting late for you.
903
:But maybe what I'm curious about is maybe for...
904
:you know, your future work.
905
:Like, what do you see as the most exciting trends or advancements in your field?
906
:And also where, where do you see the future of probabilistic programming heading?
907
:Of course, you code a lot in Stan.
908
:You also work on some Python now, thanks to Osvaldo being there with you, you know, spreading the dark energy of the Python world.
909
:Thanks, Osvaldo.
910
:Yeah, so basically, I'm curious to know where your head is at here, where your future
projects are.
911
:Yeah, I think there's a lot that excites me about our research agenda at Aalto, but also
others.
912
:What excites me in our group, and with the people that we work with more generally, is that we're still very actively thinking about how we can set priors on things that we have expert knowledge about.
:have expert knowledge on.
914
:Summary statistics, something about the predictive space, and what do these then imply as a prior for all of these coefficients that we have in the model, where we typically just
915
:go ahead and set normal zero one priors, you know.
916
:That is still under active development.
917
:So we have, let's say, the simple time series stuff covered to some degree, but there's so much more to be done in time series, even with multivariate models.
918
:So there are ways to define this R-squared stuff also for multivariate time series.
919
:I think that's really cool and has a lot of policy applications as well.
920
:Because, you know, central banks and so on who do the econ policy for a country, they
often know that, well, everything is related to each other.
921
:If you're modeling inflation, you're also going to model GDP and so on and so forth.
922
:And, you know, doing this jointly is really the way to go in the end.
923
:And these priors, I think, can also be
924
:very good for those kinds of questions.
925
:No, for sure.
926
:In the end, everything is a vector autoregressive model.
927
:I mean, you're preaching to the choir, but I would tend to agree, at least approximately.
928
:Yeah, yeah.
929
:I mean, yeah.
930
:Basically, often the limitation, the
931
:is the computational bottleneck, right?
932
:But honestly, almost all the time you would want vector autoregressive processes on the observation equation and on the latent state equations.
933
:Most of the time you have correlations everywhere and you want to estimate that.
934
:The problem is that we often don't do that because it's just impossible to fit.
935
:But ideally, we would be able to do that.
936
:Yeah, exactly.
937
:And, you know, there's a lot to be done there still.
938
:And we're also still looking a lot into workflow: the prior is one thing, but a whole other aspect is model selection.
939
:So we're also very excited about a project where we're investigating the question of when
is selection necessary if you have different priors.
940
:to fulfill your goal in terms of prediction, in the first scenario.
941
:But even for causal analysis, this is an important question.
942
:How do you set the priors, and do you need selection to somehow produce reasonable predictions for the treated versus the non-treated?
943
:When you have lots of covariates or other structure in your model.
944
:So we're also working on that.
945
:I think there are going to be, you know,
946
:fun results coming out of that.
947
:What do you mean by selection here?
948
:Selection processes, selection bias, or is that different?
949
:More like variable selection or component selection.
950
:So there's some stuff like projection predictive inference, which does selection based on: can you find a surrogate model which gets as close as possible to a full model, like a
951
:Gaussian process that is hard to compute.
952
:And, you know, statistical folklore tells you that, well, if things get too hard, as in
you have too many components, do selection.
953
:Because then you implicitly decrease the variance of the predictions, because you're focusing only on a couple of things in the model, and, well, you know, what we're
954
:kind of saying is, well, that's not necessarily true if you have good priors, and understanding when that statement in fact is true and when it is not so true
955
:is an interesting
956
:question. Because, let's say, in those causal analyses where you have randomized controlled trials, say a drug is being administered to one population randomly or
957
:not, then does it make sense to, let's say, use an R-squared prior, which will implicitly say the treatment effect is correlated with other parameters that you're estimating?
958
:And is that a good choice?
959
:you know.
960
:What we're saying is, it depends.
961
:And we kind of go into detail about when the R-squared priors and priors like that are good and when they're bad, and when selection is needed and when not.
962
:Nice.
963
:Yeah.
964
:Yeah.
965
:Super interesting.
966
:Let me know when you have something out on that.
967
:I'll be very interested to read about that and maybe talk to you again about it, because that sounds very important and interesting.
968
:So, yeah.
969
:Yeah.
970
:I'll be very curious about that.
971
:Maybe one thing about other people's work.
972
:I was very selfish talking about our work, but I think there's some really cool stuff I'm excited about that comes out of groups around, like, Paul Bürkner and so on, which are
973
:also picking up work on normalizing flows and amortized Bayesian inference.
974
:I think that stuff is going to be really good going forward because you can simplify computations.
975
:You can reuse models for huge estimation tasks.
976
:I think this will make the kind of general
977
:Bayesian computational workflow much easier in the future.
978
:So I think using this, maybe integrating it with the knowledge that we're working on about how to model and then how to do computation, those things are interdependent, I
979
:think, for the future.
980
:I'll be curious to see what comes out of that.
981
:Yeah, completely agree with that.
982
:And I'll refer listeners to episode 107 with Marvin Schmitt about amortized Bayesian inference.
983
:That was super interesting. I haven't been able to use that in production yet, but I'm really looking forward to being able to do that and, like, have an excuse and a use case for it,
984
:because this looks really cool. And yeah, I completely agree with you that it has a lot of potential for that, and everything Marvin and the BayesFlow team and Paul
985
:Bürkner are doing on that front.
986
:Anything Paul is doing is just always super brilliant and interesting.
987
:And what I love is that it's very practical.
988
:It's not research that's like, okay, that's cool,
989
:I can't even do that because the math is too complicated and it's not implemented
anywhere.
990
:You know, that's always...
991
:His research and you guys' research at Aalto is what I really like.
992
:It's often...
993
:It's always geared towards practical application and not just, yeah, that's cool math,
but...
994
:uh
995
:Nobody knows how to implement that.
996
:So that's really cool.
997
:And well done on that.
998
:I think it's amazing.
999
:And talking about normalizing flows, I'll also add to the show notes
:
01:27:38,247 --> 01:27:54,686
nutpie, from Adrian Seyboldt. So he was also on the podcast; I will also link to his podcast episode with me, where he came and talked about ZeroSumNormal and nutpie, which is an
:
01:27:54,686 --> 01:28:08,342
implementation of HMC, but in Rust, so it's much faster. You can use that with PyMC and Stan models, you know. But now he did something very cool in nutpie:
:
01:28:08,342 --> 01:28:14,696
you can use normalizing flows to adapt HMC in nutpie.
:
01:28:14,696 --> 01:28:22,161
So basically what this will do is first run a normalizing flow and train a neural network with that.
:
01:28:22,161 --> 01:28:32,297
And then once it learns the way to basically turn the posterior space into a standard normal, it will use that to
:
01:28:32,297 --> 01:28:35,709
initialize HMC and run HMC on your model.
:
01:28:35,709 --> 01:28:42,364
And so, of course, you don't want to do that on a simple linear regression, right?
:
01:28:42,364 --> 01:28:54,363
It's overkill, because it's going to take at least 10 minutes to fit, because you have to train a neural network first to learn the transformation of the posterior space that
:
01:28:54,363 --> 01:28:56,504
would make it standard normal.
:
01:28:56,504 --> 01:29:00,787
But if you have very complex models with
:
01:29:01,003 --> 01:29:16,424
very complex posterior spaces, things like Neal's funnels, banana shapes, and so on, where it's very hard to find a reparametrization that's efficient, then trying the
:
01:29:16,424 --> 01:29:21,638
normalizing flow adaptation of nutpie could be very interesting to you.
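A sketch of what that looks like in practice with a PyMC model. Note that the `transform_adapt` flag name and the JAX backend requirement are assumptions based on recent nutpie releases; check the nutpie documentation for the exact options in your version.

```python
# Sketch: nutpie's normalizing-flow adaptation on a funnel-shaped posterior.
# The `transform_adapt` flag and the jax backend requirement are assumptions
# about recent nutpie versions; consult the nutpie docs for your installation.
import pymc as pm
import nutpie

# Neal's funnel: a geometry where plain HMC usually needs careful reparametrization.
with pm.Model() as funnel:
    log_scale = pm.Normal("log_scale", 0.0, 3.0)
    pm.Normal("x", 0.0, pm.math.exp(log_scale / 2), shape=9)

compiled = nutpie.compile_pymc_model(funnel, backend="jax", gradient_backend="jax")

# Plain nutpie (Rust NUTS) run:
trace_plain = nutpie.sample(compiled, draws=1000, tune=1000, chains=4)

# With flow adaptation: a normalizing flow is trained during warmup to map the
# posterior towards a standard normal, and HMC then runs in that nicer space.
trace_flow = nutpie.sample(compiled, draws=1000, tune=1000, chains=4,
                           transform_adapt=True)
```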
:
01:29:21,638 --> 01:29:29,023
And literally, if that works in your case, it can make your MCMC sampling
:
01:29:29,353 --> 01:29:31,445
much faster and also much more efficient.
:
01:29:31,445 --> 01:29:34,508
So that means a much bigger effective sample size.
:
01:29:35,029 --> 01:29:47,103
So I will definitely add that to the show notes because I think it's something people need to know about. And, well, try it out, and that way Adrian can know this is
:
01:29:47,103 --> 01:29:48,364
working out there in the world.
:
01:29:48,364 --> 01:29:50,105
And I know he loves that.
:
01:29:52,783 --> 01:29:54,804
Awesome. Well, David...
:
01:29:54,804 --> 01:29:57,606
That's cool. Anything you want to add that maybe I didn't...
:
01:29:57,606 --> 01:30:10,431
I didn't ask you or mention before asking the last two questions? I don't know, I think we've covered a lot of ground...
:
01:30:10,431 --> 01:30:18,575
I think there's a lot of cool stuff here; it's probably impossible to cover it all. I do want to make an honorable mention to all the work
:
01:30:18,731 --> 01:30:21,573
that goes into prior elicitation.
:
01:30:21,573 --> 01:30:35,963
I know that you're also interested in that, Alex, but there's also work that is coming out of Helsinki and Aalto, which is looking into how we can go from knowledge about the effects of
:
01:30:35,963 --> 01:30:37,824
covariates to priors.
:
01:30:37,905 --> 01:30:47,211
And we have tools that can work very well for simple cases, but what if you have correlated effects?
:
01:30:47,403 --> 01:30:56,226
Like, let's say, I don't know, age and income predicting, I don't know, school outcomes or whatever, right?
:
01:30:56,226 --> 01:31:06,528
Those things are often highly correlated, and then you're going from, like, a conditional expectation on predictions to the prior.
:
01:31:06,528 --> 01:31:14,631
So let's say you have this age and this income; how does that relate to education outcomes?
:
01:31:14,631 --> 01:31:15,647
uh
:
01:31:15,647 --> 01:31:19,389
And specifying the prior in that way, I think is super interesting.
:
01:31:19,389 --> 01:31:36,286
And there's a lot of cool stuff also being developed that helps to specify these priors with artificial intelligence, AI trying to go from a very prose-like, conversational way
:
01:31:36,286 --> 01:31:43,059
of talking about what we want to put a prior on, to then actually implementing it in things like Stan and PyMC and so on.
:
01:31:43,059 --> 01:31:44,399
I think that's
:
01:31:44,479 --> 01:31:54,888
a lot of the future that's awaiting people who are maybe not so interested in learning Stan in detail, but still want to do cool Bayesian inference.
:
01:31:54,888 --> 01:32:01,353
And these kinds of things, I think, will make it accessible to a much wider audience than it is right now.
:
01:32:01,734 --> 01:32:02,514
Yeah.
:
01:32:02,514 --> 01:32:02,995
Yeah.
:
01:32:02,995 --> 01:32:04,396
I mean, definitely.
:
01:32:04,396 --> 01:32:13,015
I mean, even for us, you know, who are, like, power users of the software, that would make my modeling workflow way faster.
:
01:32:13,015 --> 01:32:26,455
Because most of the time that's a much, much more interpretable and intuitive way of defining the priors than trying to understand what the prior on the R-squared of
:
01:32:26,455 --> 01:32:30,575
my structural time series model is going to mean.
:
01:32:30,575 --> 01:32:42,255
The only way I can understand what this means right now is just doing a cumbersome, iterative process of changing one knob at a time and seeing how that impacts
:
01:32:42,443 --> 01:32:51,125
the prior predictive checks and maybe an interesting metric, like the prior R-squared or something like that.
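A minimal sketch of that knob-turning loop in PyMC; the toy model, the knob values, and the summary printed at the end are placeholders, not the case-study model.

```python
# Sketch of the manual prior-predictive loop described here: turn one prior
# "knob", resample from the prior, and inspect what it implies for the data space.
# The toy AR(1)-style model below is a placeholder, not the case-study model.
import numpy as np
import pymc as pm

y = np.random.default_rng(0).normal(size=100)   # stand-in series

for phi_scale in (0.2, 0.5, 1.0):               # the knob being turned
    with pm.Model():
        phi = pm.Normal("phi", 0.0, phi_scale)  # prior under inspection
        sigma = pm.HalfNormal("sigma", 1.0)
        pm.Normal("y_obs", mu=phi * np.r_[0.0, y[:-1]], sigma=sigma, observed=y)
        prior = pm.sample_prior_predictive(draws=500, random_seed=1)
    sim = prior.prior_predictive["y_obs"].values
    print(phi_scale, sim.std())                 # how wild are the implied series?
```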
:
01:32:52,326 --> 01:32:55,927
that's the only thing that's really reliable right now.
:
01:32:55,927 --> 01:32:59,623
And it feels like it can be automated for sure.
:
01:32:59,623 --> 01:33:11,411
Because it's a lot of cumbersome back and forth, basically; probably something AI-assisted would make faster.
:
01:33:12,223 --> 01:33:17,868
Yeah, but still it's kind of nice that you still have to get your hands dirty in a way.
:
01:33:17,868 --> 01:33:21,191
not everything is too automated because it does let you learn a lot.
:
01:33:21,191 --> 01:33:28,616
But the problem still remains that not everyone has the time, inclination or interest in
getting their hands that dirty.
:
01:33:29,077 --> 01:33:34,642
Yeah, yeah. No, and also, everything has a trade-off, right?
:
01:33:34,642 --> 01:33:41,487
So the time you spend on that is not time you're spending thinking about expanding your
model.
:
01:33:41,545 --> 01:33:43,776
Yeah, making it more expressive and so on.
:
01:33:43,776 --> 01:33:51,139
yeah, that if we can make that easier, that definitely be amazing and high impact.
:
01:33:52,019 --> 01:33:52,420
Awesome.
:
01:33:52,420 --> 01:33:54,050
So I need to let you go, David.
:
01:33:54,050 --> 01:33:56,121
We've already been recording for like an hour and a half.
:
01:33:56,121 --> 01:33:57,651
So I don't want to take too much of your time.
:
01:33:57,651 --> 01:34:01,683
You'll come back on the show for the future work you have, for sure.
:
01:34:02,044 --> 01:34:07,676
But before you go, let me ask you the last two questions I ask every guest at the end of
the show.
:
01:34:07,676 --> 01:34:10,709
So if you had unlimited time and resources,
:
01:34:10,709 --> 01:34:12,879
Which problem would you try to solve?
:
01:34:14,819 --> 01:34:21,284
This is really a weighty question, and I feel like there have been such good answers in the past. So it's really hard,
:
01:34:21,284 --> 01:34:31,891
I find, to add to any of that. But, you know, let's say that with infinite resources and everything, I've done all the things that we should do for humanity.
:
01:34:31,891 --> 01:34:43,669
All right, so we've been the good guy already, I think. What I would do is I would go back to one of those core econ things that are important, namely:
:
01:34:43,669 --> 01:34:51,845
how do you set policy such that you maximize the utility of a nation, or maybe all nations?
:
01:34:53,667 --> 01:35:02,674
You know, one particular question in econ is how can you achieve the best amount of good, or the most amount of good, for all people?
:
01:35:02,755 --> 01:35:08,489
And this is a really difficult question because there are always so many trade-offs in policymaking.
:
01:35:08,489 --> 01:35:11,051
You do one thing, you improve the life of one group.
:
01:35:11,051 --> 01:35:12,502
You decrease the,
:
01:35:13,169 --> 01:35:15,260
benefit for another group.
:
01:35:15,260 --> 01:35:32,034
And I think, if I had infinite resources, I would try to find the optimal policy rule that would satisfy the condition of the best amount of welfare, whatever that definition is, by
:
01:35:32,034 --> 01:35:38,746
the way, I guess that needs to be conditioned on a philosophy, across all time periods.
:
01:35:38,746 --> 01:35:42,007
And then basically have a fairly automated rule.
:
01:35:42,007 --> 01:35:53,982
that is kind of running whenever any economic actor takes any decision. And what would happen is that you would basically have, like, a steady state process for
:
01:35:53,982 --> 01:35:59,254
the entire nation's economy without any significant variation.
:
01:35:59,254 --> 01:36:10,439
Like, policymaking would always be such that we would all have kind of the best economic life possible, within the confines of the chosen philosophy and the constraints of
:
01:36:10,439 --> 01:36:11,479
resources.
:
01:36:12,919 --> 01:36:13,799
That's fine.
:
01:36:13,799 --> 01:36:15,919
Yeah, I love that.
:
01:36:16,579 --> 01:36:17,839
Very nerdy answer.
:
01:36:17,839 --> 01:36:19,759
And I really appreciate that.
:
01:36:19,759 --> 01:36:20,879
Thank you.
:
01:36:20,879 --> 01:36:22,019
I appreciate the effort.
:
01:36:22,019 --> 01:36:22,419
I love that.
:
01:36:22,419 --> 01:36:24,679
And I definitely resonate with that.
:
01:36:25,439 --> 01:36:27,739
Although I would argue we're very far from that.
:
01:36:27,739 --> 01:36:30,279
So you would need to do a lot of work.
:
01:36:30,279 --> 01:36:32,619
Good thing you have unlimited time.
:
01:36:34,759 --> 01:36:42,499
And second question: if you could have dinner with any great scientific mind, dead, alive, or fictional, who would you pick?
:
01:36:43,647 --> 01:36:46,519
So, again, that is like too much of a weighty question.
:
01:36:46,519 --> 01:36:47,969
So I'm just going to sidestep that.
:
01:36:47,969 --> 01:36:55,754
I think there are too many cool people I would like to talk to, but I think someone who is alive and whom I would really like to have dinner with is Chris Sims.
:
01:36:55,754 --> 01:36:57,775
He's a Nobel laureate in econ.
:
01:36:57,775 --> 01:37:03,258
He in fact was one of the initial researchers on vector autoregressions, Alex.
:
01:37:03,258 --> 01:37:11,582
So if you're looking into vector autoregression stuff, then Chris Sims is like one of those OG researchers, in a way.
:
01:37:11,807 --> 01:37:12,947
And.
:
01:37:13,073 --> 01:37:21,665
He won the Nobel Prize for work related to more policy-oriented stuff, but he's done a lot of really interesting time series econometrics.
:
01:37:21,685 --> 01:37:33,379
And I would love to just have a conversation with him over dinner where we talk about how we can integrate, let's say, the work on R-squared stuff and, you know, safe Bayesian
:
01:37:33,379 --> 01:37:35,639
model building with his time series knowledge.
:
01:37:35,639 --> 01:37:39,550
I think that would be such a cool thing to do.
:
01:37:39,631 --> 01:37:41,691
And in fact, he
:
01:37:41,695 --> 01:37:53,498
I think he gave a lecture recently, in the past two or three years, where he was suggesting that people should look at econ problems through multiple lenses.
:
01:37:53,498 --> 01:38:02,781
This goes a little bit into this kind of multiverse idea of statistical modeling, and acknowledging that there's a workflow that you have to work through.
:
01:38:02,781 --> 01:38:08,623
There's not always one solution for every statistical problem in econ, which is kind of the dogma, you know?
:
01:38:08,623 --> 01:38:10,583
I think...
:
01:38:10,687 --> 01:38:13,388
Working with him on that would be such a cool thing to do.
:
01:38:14,029 --> 01:38:15,490
Yeah, definitely.
:
01:38:15,490 --> 01:38:19,282
And I've never had a Nobel Prize laureate on the show.
:
01:38:19,282 --> 01:38:23,754
I've had a sir, but I've never had a Nobel Prize laureate.
:
01:38:23,754 --> 01:38:25,295
yeah, if anybody knows...
:
01:38:25,295 --> 01:38:30,037
Chris, right?
:
01:38:30,178 --> 01:38:32,129
Yes, I'm sure.
:
01:38:32,129 --> 01:38:34,299
Then let me know.
:
01:38:34,460 --> 01:38:35,560
Put me in contact.
:
01:38:35,560 --> 01:38:38,862
I'll definitely try and get him on the show for sure.
:
01:38:39,483 --> 01:38:40,003
Amazing.
:
01:38:40,003 --> 01:38:40,783
Well...
:
01:38:41,163 --> 01:38:42,523
David, thank you so much.
:
01:38:42,523 --> 01:38:44,423
That was awesome.
:
01:38:44,423 --> 01:38:45,424
Really had a blast.
:
01:38:45,424 --> 01:38:51,076
Learned a lot, but I'm not surprised by that.
:
01:38:51,076 --> 01:38:53,837
I had a good prior on that.
:
01:38:53,837 --> 01:38:59,189
yeah, thank you so much for taking the time.
:
01:38:59,189 --> 01:39:06,675
Please let me know, listeners, how you find that new hybrid format.
:
01:39:06,675 --> 01:39:15,021
I really like it so far, so unless you tell me "I really hate it", and most of you tell me that, I think I'll keep going with it whenever I can.
:
01:39:15,742 --> 01:39:32,514
So as usual, I put a lot of things in the show notes for those who want to dig deeper: David, your socials, your work and so on, for people who want to dig deeper.
:
01:39:33,425 --> 01:39:37,006
Thanks again for taking the time and being on this show.
:
01:39:38,513 --> 01:39:39,394
thank you.
:
01:39:43,541 --> 01:39:47,252
This has been another episode of Learning Bayesian Statistics.
:
01:39:47,252 --> 01:39:57,735
Be sure to rate, review, and follow the show on your favorite podcatcher, and visit
LearnBayesStats.com for more resources about today's topics, as well as access to more
:
01:39:57,735 --> 01:40:01,816
episodes to help you reach true Bayesian state of mind.
:
01:40:01,816 --> 01:40:03,777
That's LearnBayesStats.com.
:
01:40:03,777 --> 01:40:08,618
Our theme music is Good Bayesian by Baba Brinkman, feat. MC Lars and Mega Ran.
:
01:40:08,618 --> 01:40:11,779
Check out his awesome work at BabaBrinkman.com.
:
01:40:11,779 --> 01:40:12,969
I'm your host,
:
01:40:12,969 --> 01:40:14,040
Alex Andorra.
:
01:40:14,040 --> 01:40:18,199
You can follow me on Twitter at alex_andorra, like the country.
:
01:40:18,199 --> 01:40:25,448
You can support the show and unlock exclusive benefits by visiting Patreon.com slash LearnBayesStats.
:
01:40:25,448 --> 01:40:27,830
Thank you so much for listening and for your support.
:
01:40:27,830 --> 01:40:30,111
You're truly a good Bayesian.
:
01:40:30,111 --> 01:40:40,535
oh
:
01:40:40,535 --> 01:40:53,434
Be sure you have to be a good Bayesian Change calculations after taking fresh data Those predictions that your brain is making Let's get them on a solid foundation