Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!
Visit our Patreon page to unlock exclusive Bayesian swag ;)
-------------------------
Love the insights from this episode? Make sure you never miss a beat with Chatpods! Whether you're commuting, working out, or just on the go, Chatpods lets you capture and summarize key takeaways effortlessly.
Save time, stay organized, and keep your thoughts at your fingertips.
Download Chatpods directly from App Store or Google Play and use it to listen to this podcast today!
https://www.chatpods.com/?fr=LearningBayesianStatistics
-------------------------
Chapters:
00:00 Introduction to Bayesian Statistics and Epidemiology
03:35 Guest Backgrounds and Their Journey
10:04 Understanding Computational Biology vs. Epidemiology
16:11 The Role of Bayesian Statistics in Epidemiology
21:40 Recent Projects and Applications in Epidemiology
31:30 Sampling Challenges in Health Surveys
34:22 Model Development and Computational Challenges
36:43 Navigating Different Jargons in Survey Design
39:35 Post-COVID Trends in Epidemiology
42:49 Funding and Data Availability in Epidemiology
45:05 Collaboration Across Disciplines
48:21 Using Neural Networks in Bayesian Modeling
51:42 Model Diagnostics in Epidemiology
55:38 Parameter Estimation in Compartmental Models
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan, Francesco Madrisotti, Ivy Huang, Gary Clarke, Robert Flannery, Rasmus Hindström and Stefan.
Transcript
This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.
Welcome to the second-ever live LBS episode, recorded at StanCon on September 11, 2024.
2
:In this episode, Liza Semenova and Chris Wymant bring computational biology and
epidemiology to life, making, I have to say, science seriously cool.
3
:You'll learn how Bayesian statistics and causal inference help in advancing the frontier
4
:of our knowledge in these fields and enjoy, I hope, the live Q&A with the fantastic
StanCon audience who attended this episode.
5
:Again, a huge thank you to the organizing committee and to the audience, you folks were
absolutely wonderful.
6
:This is Learning Bayesian Statistics, episode 120.
7
:Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods,
the projects, and the people who make it possible.
8
:I'm your host, Alex Andorra.
9
:You can follow me on Twitter at alex_andorra,
10
:like the country.
11
:For any info about the show, learnbayesstats.com is Laplace to be.
12
:Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on
Patreon, everything is in there.
13
:That's learnbayesstats.com.
14
:If you're interested in one-on-one mentorship, online courses, or statistical consulting,
feel free to reach out and book a call at topmate.io/alex_andorra.
15
:See you around, folks.
16
:And best Bayesian wishes to you all.
17
:And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can
help bring them to life.
18
:Check us out at pymc-labs.com.
19
:Hey folks, before we start the show, I just wanted to share something.
20
:You can guess that I love podcasts.
21
:I have one.
22
:So I listen to a lot of podcasts, and something that happens a lot when I listen to podcasts
is I hear something on a show and I think, that's awesome.
23
:I gotta write this down.
24
:But then I miss it.
25
:I have to pause.
26
:I have to rewind the episode.
27
:That's a hassle.
28
:But...
29
:that's when I actually discovered Chatpods, and they agreed to sponsor the show.
30
:It's really cool because whenever I catch a quote then I just tap a button and the app
will instantly transcribe and save that moment for me to revisit later.
31
:Honestly, that's super practical when you listen to a technical show, because you can
just do that and then, boom, if you're working on a model, say a time series model,
32
:you already have that in Chatpods, with the moment you are
interested in.
33
:So really, now that's how I listen to my favorite shows and I'd love for you to try it
with me.
34
:So if you're interested, just check the podcast description or search for Chatpods in
your favorite app store.
35
:And as they say at Chatpods, capture podcast highlights anytime, anywhere.
36
:Now, onto the show.
37
:Well people, officially welcome to the second-ever live episode of the Learning Bayesian Statistics
podcast.
38
:Please welcome Liza Semenova and Chris Wymant.
39
:So let's start.
40
:What do you want to talk about?
41
:How to make your guests very, you know, uncomfortable.
42
:No, no, no, that was a joke.
43
:That's fine.
44
:No, so let's start with your backgrounds.
45
:So Liza, you were already on the podcast, which was, what, three years ago, something
like that.
46
:And you do so many things that I feel like we should update
47
:your background section, your origin story.
48
:So, yeah, can you start by telling us how you ended up doing what you're doing today?
49
:Like, how did you end up using Stan, using Bayesian stats, using PyMC, using a lot of
different stuff to do very interesting research?
50
:Okay, I don't want to be the buzzkill at StanCon, but right now, my favorite...
51
:probabilistic programming language is NumPyro.
52
:And that's not just a matter of taste, it's a matter of functionality,
because the work I do these days requires easy integration of neural network architectures
53
:into a PPL, which of course is possible in Stan as well, but there you need to write the
architecture down by hand.
54
:And it's easy for a trivial architecture, but once the architecture becomes a little bit
more complicated, you really don't want to be doing it manually.
55
:So this is the reason I'm not an active PyMC user right now, not a super active Stan user
right now, but rather a NumPyro user for the time being.
56
:And how did you end up doing Bayesian stats?
57
:Well...
58
:I did my PhD in epidemiology and as we have heard yesterday and today, Bayesian statistics
is very prevalent in epidemiology for a ton of reasons.
59
:The type of epidemiology I was and still am doing is in the space of spatial statistics
where the main tool, the hammer of spatial statistics is
60
:Gaussian processes.
61
:Yeah, I think that was my introduction.
62
:And I tried to get out of that world a couple of times by working in pharma and building
models based on Joe's work, for example, and trying to do other things.
63
:But somehow this dark hole keeps attracting me back again and again.
64
:OK, thanks.
65
:Chris, what about you?
66
:So my undergraduate, my PhD and my first postdoc were all in particle physics.
67
:I went on a bit of a journey through my PhD of kind of questioning what I was working on.
68
:When I started my PhD, it was in a big group of particle physicists and I asked to be put
with the most theoretical person there because I thought theory and maths is really fun.
69
:And then, as I was going through my papers, I thought, but the paper I'm
writing isn't really changing anything.
70
:I don't feel like it's that useful.
71
:And so I kind of moved, I just kind of crept towards the more useful, what I consider the
more useful end of the discipline, namely where experiments were being done to try and
72
:confront these, all these different theories of physics with the experimental data to see
which ones are ruled out.
73
:There wasn't really any signal at the Large Hadron Collider for new physics beyond the
Higgs boson, which everybody had been expecting.
74
:And so this kind of stepwise, I want to do something a bit more useful, a bit more useful,
eventually ended up with, well, if I broaden my horizons beyond particle physics, I can
75
:think of things that I
76
:consider to be a lot more useful.
77
:And somebody in my group worked with or was friends with somebody who did mathematical
modeling of HIV.
78
:And I had no idea that something that sounded so interesting and something so applied
existed.
79
:And so I got in contact with that guy and did a month long free internship just to test
the water.
80
:And then I started my postdoc.
81
:By that point, I thought, I want to get out of this field.
82
:And then I got in touch with my current boss.
83
:And I'm still with him 10, 11 years later,
84
:doing infectious disease epidemiology.
85
:So mostly HIV and some COVID stuff as well.
86
:Huh, okay.
87
:Yeah, that's fascinating.
88
:I didn't know you started with physics, so you can see the amount of background work
that's going into the episodes.
89
:No, but that's really fascinating.
90
:So you add to the cohort of ex-physicists who are doing amazing things in the Bayesian
world.
91
:Yeah, I sometimes feel in quantitative disciplines, physicists are like rats that you're
never more than two meters away from even if you didn't know about it.
92
:Yeah, and I mean, so I guess you understood the topic for today.
93
:That's like yesterday, that was the nerd panel.
94
:We talked about samplers and tuples and very technical things.
95
:Today, we're gonna...
96
:It's a dummy panel?
97
:And today, we're going to see basically how to apply that on real data and what are the
fascinating things that you guys are doing.
98
:And I think that's awesome because that's going to make science look really good, which is
also the goal of this podcast, make better educational scientific content.
99
:If you have any questions for Liza and Chris, write them down, and in the last 10 minutes of
the show you'll be able to ask them whatever you want.
100
:Again, the questions are recorded, the sound only, so your voice. You won't be filmed, but
you will be recorded, and if you ask a question, you will get to be in one of the episodes
101
:of Learning Bayesian Statistics, so, you know, that's something to brag about
102
:to somebody who knows what that is.
103
:So anyway, so write down your questions, blah, blah.
104
:So let's start diving a bit more.
105
:Actually, talking with you guys before the show, I realized there is something very
fundamental that I didn't know and understand.
106
:It's that there is a difference between computational biology and epidemiology.
107
:For me, it was kind of the same thing.
108
:So maybe can you explain what the difference is and what you guys do actually in these
realms?
109
:Yeah, surprise.
110
:So epidemiology generally is a science about health and health in the most generic sense
possible.
111
:It could be physical health, could be mental health, could concern infectious diseases.
112
:It could concern non-communicable diseases.
113
:It could concern health in a particular region.
114
:It could be epidemiology of a particular region or a particular disease.
115
:Or it could be global health, looking at the distribution of health on a very large scale.
116
:If you like, epidemiology is a macro science, while biology is looking rather into tiny,
tiny details.
117
:I think actually if you pay attention, most of the epidemiologists you would meet, they
wear glasses.
118
:Because we can't see things really well.
119
:We just more or less look at the globe like, okay, there's malaria in this part of the
world.
120
:There is dengue in this part of the world, more or less.
121
:Biologists are not like that.
122
:Biology, I feel, is much more of a precise science.
123
:They look into details of things.
124
:Mitochondria, and this cell and that cell, how does this cell become a brain cell out of a
stem cell, and so on.
125
:So they try to understand the world at a micro level.
126
:Yeah, I would agree with that, except I think there are sort of biological areas of study at
higher scales.
127
:I'm just less familiar with them.
128
:So biologists would study ecosystems and you know, could, yeah, just other things beyond
the microscopic level.
129
:But some of the things you're studying might not have any relevance to human
health.
130
:So you might be studying what's happening in a given animal, or even a plant, and just
understanding how that function works.
131
:Whereas epidemiology is almost always focused on humans, I mean you can do veterinary
science as well, but it's related to health and health at the individual level, the
132
:population level, but it's about understanding, there's very commonly an applied aim, we
want to understand what's good health, what's bad health, so that we can improve health.
133
:Whereas biology, I think, at a basic level is just about understanding these processes,
whatever they are.
134
:And computational biology is of course just the computational side of that. And in both
disciplines,
135
:biology and epidemiology, there's work which is not computational
and not even quantitative.
136
:So you can do qualitative work, which is very important, particularly with epidemiology,
you know, understanding people's attitudes to healthcare, people's attitudes to certain kinds
137
:of new treatment, getting those kinds of things in place and understanding cultural
differences with interventions coming in can be really important to make these things
138
:work.
139
:Absolutely.
140
:Yeah.
141
:So the field of epidemiology actually spans several sciences,
142
:with social sciences on one end, passing through economics and so on, and ending up
at statistics, machine learning and computational epidemiology.
143
:Okay, thanks.
144
:That's really useful and fascinating.
145
:I'm curious if your background in physics is actually helping you in this new field,
because it sounds to me like it would, but I'm curious if it does and
146
:If it does, how?
147
:I think the short answer is no.
148
:Helping compared to what? Compared to a counterfactual of
having had a PhD in epidemiology, you know, I don't think it is as useful.
149
:But I think that's sort of the reason you see physicists everywhere in this type of work
like rats is that an education in physics teaches you how to describe phenomena
150
:quantitatively, to make predictions, to understand the mechanisms.
151
:And then if you can take that kind of skill set to different phenomena,
152
:then that can still be useful.
153
:I think that's why it is useful, being able to describe things using maths and the
skill set associated with that, like coding and making plots and sort of understanding the
154
:relationships with things.
155
:So at that level, it was helpful.
156
:Okay.
157
:So, if someone from high school, you know, came to you and asked you if she
should pursue a degree in epidemiology or computational biology to do the
158
:kind of work you're doing, would you recommend doing that, or would you say, well, physics is
very useful because you're going to learn these building blocks basically of exactly what
159
:you describe and then you can apply that to any field.
160
:I think it would depend what area of epidemiology you wanted to go into.
161
:So if you wanted to go into this kind of qualitative work and understanding the cultural
differences, obviously a background in physics is not helping you at all there.
162
:If you wanted to end up in the area of
163
:epidemiology I work in, I think physics isn't bad, but a more useful route would be through
something like mathematical biology or biostatistics to kind of to learn the methods and
164
:some of the kind of the areas you're going to be applying them to at the same time.
165
:Anything to add, Liza?
166
:You don't have to, but it's just I'm checking before.
167
:My default answer: if you don't know what to study, study maths.
168
:Yeah.
169
:You can't go wrong with that.
170
:Yeah.
171
:Yeah.
172
:Especially algebra.
173
:Yeah, but something I'm wondering about then is, how do Bayesian statistics fit into
all that?
174
:You know, why are you folks even into Stan or NumPyro or PyMC?
175
:Why are you even interested in being at these kinds of conferences, like StanCon?
176
:And I think that's going to give us and the audience a more
177
:concrete idea of what you're doing every day.
178
:So, just to give a...
179
:So far I've been talking about epidemiology.
180
:I work in...
181
:Both of us work in infectious disease epidemiology.
182
:Me completely, I think you partly.
183
:Anyway, so I work completely in infectious disease epidemiology and just to clarify what
that is, so it's of course epidemiology of infectious diseases.
184
:So you have infectious diseases and non-communicable diseases, ones that don't spread from
one person to another via a pathogen.
185
:And so for infectious disease epidemiology...
186
:We're interested in the infection process at lots of different levels.
187
:So when pathogens get inside our cells and inside our organs and our bodies and our
households and our workplaces and our cities, countries and the whole world.
188
:So there's kind of lots of processes going on at lots of different levels.
189
:And we want to understand those a lot of the time using quantitative data.
190
:So, you know, the most familiar level would be the individual level.
191
:So when an individual gets an infection.
192
:what's the probability of certain outcomes happening?
193
:So getting this symptom or not getting it would just be kind of a single probability
parameter, but conditional on getting symptoms or a certain set of symptoms, when would
194
:you get them?
195
:So then you start to think about timing distributions, and there's lots of those in
infectious disease epidemiology.
196
:So how long after I get infected do things happen?
197
:Am I getting symptoms?
198
:Am I getting hospitalized?
199
:Am I dying?
200
:And am I transmitting to somebody else?
201
:So you have all these kind of timing distributions.
202
:And you have observations of those, sometimes censored, sometimes incomplete.
203
:And so getting estimates of what these distributions are is clearly a question for
statistics.
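As an illustration of the censoring point (not something shown in the episode), here is a minimal sketch, assuming a delay such as infection-to-symptom-onset follows an exponential distribution. Fully observed delays contribute the log density, while right-censored ones, where we only know the event had not happened by the last follow-up, contribute the log survival probability. The exponential choice, the data, and the function names are all illustrative assumptions.

```python
import math

def censored_exp_loglik(rate, observed, censored):
    """Log-likelihood of exponential delays with right-censoring.

    observed: delays where the event was seen (log density terms).
    censored: follow-up times where the event had not yet happened
              (log survival terms).
    """
    ll = sum(math.log(rate) - rate * t for t in observed)
    ll += sum(-rate * t for t in censored)
    return ll

def censored_exp_mle(observed, censored):
    """Closed-form maximum-likelihood rate: events / total time at risk."""
    return len(observed) / (sum(observed) + sum(censored))

# Hypothetical delays in days; three people were still event-free at day 6.
observed = [2.0, 3.5, 5.0, 1.5, 4.0]
censored = [6.0, 6.0, 6.0]

rate_hat = censored_exp_mle(observed, censored)
naive_rate = len(observed) / sum(observed)  # drops the censored people entirely
```

Dropping the censored people makes the delays look shorter than they are (a higher rate). In a Bayesian model the same log-likelihood would simply be combined with a prior on the rate.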
204
:And a lot of the time studying the dynamics as well.
205
:So one of the key differences between infectious disease epidemiology and the epidemiology
of non-communicable diseases is the dynamics, essentially, which is that when you have a
206
:process that spreads from person to person, that naturally gives rise to exponential
dynamics.
207
:until something interrupts it, like an intervention or population building up immunity,
whereas non-communicable diseases don't have those exponential dynamics.
208
:So the mathematical models for the dynamics of the system are very different between the two
things.
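To illustrate the exponential dynamics described here, a minimal sketch using the textbook SIR compartmental model, integrated with simple Euler steps. The parameter values (`beta`, `gamma`), the starting fractions, and the function name are assumptions chosen for illustration, not numbers from the episode.

```python
def simulate_sir(beta, gamma, s0, i0, rec0, days, dt=0.1):
    """Euler integration of a basic SIR model on population fractions."""
    s, i, r = s0, i0, rec0
    steps_per_day = int(round(1 / dt))
    daily_infected = [i]
    for _day in range(days):
        for _step in range(steps_per_day):
            new_inf = beta * s * i * dt   # S -> I
            new_rec = gamma * i * dt      # I -> R
            s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        daily_infected.append(i)
    return daily_infected

infected = simulate_sir(beta=0.5, gamma=0.2, s0=0.999, i0=0.001, rec0=0.0, days=120)

# Early on, the day-to-day ratio is roughly constant: exponential growth.
early_ratios = [infected[d + 1] / infected[d] for d in range(5)]
# Later, growth is interrupted as susceptibles run out (immunity builds up).
peak_day = max(range(len(infected)), key=infected.__getitem__)
```

With these assumed values the basic reproduction number is `beta / gamma = 2.5`, so each case initially infects more than one other person, which is what drives the early constant growth ratio.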
209
:But you still want to estimate those a lot of the time using statistical models.
210
:For example, we had this talk earlier, I think, from Judith about estimating the R number
over time, that is, the average number of people I pass the disease on to, who then pass the
211
:disease on as well.
212
:So what is that number?
213
:How does it change over time and in response to what?
214
:So a lot of these are statistical questions.
215
:So yeah, to complement this answer, long story short, Bayesian statistics is a great way,
A, to connect models to the data, B, to get uncertainty, C, to allow your models to be as
216
:complex, well, within limit, of course, a reasonable limit, as complex as you would like
them to be.
217
:And what is interesting is that, yes indeed, there is separation in epidemiology between
infectious disease and NCDs, non-communicable diseases, but in terms of modeling, there is
218
:also some overlap that exists.
219
:So how do we model infectious diseases, right?
220
:A, there are compartmental disease transmission models, which tell us
221
:how agents, individuals, move from one compartment to another.
222
:B, there are agent-based models, where we model every individual by themselves and then try
to compute summary statistics to also fit them to the data.
223
:There are semi-mechanistic models which sit somewhere in between.
224
:And there are spatial models which might or might not consider the temporal component.
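To make the agent-based idea from this list concrete, a minimal sketch, assuming a toy population in which every individual is tracked separately and summary statistics are computed at the end. All parameter values, the random-mixing contact rule, and the name `run_abm` are hypothetical choices for illustration.

```python
import random

def run_abm(n_agents=200, n_seed=3, contacts=4, p_transmit=0.1,
            p_recover=0.2, days=100, seed=1):
    """Tiny agent-based epidemic: each agent is 'S', 'I' or 'R'."""
    rng = random.Random(seed)
    state = ["I"] * n_seed + ["S"] * (n_agents - n_seed)
    prevalence = []
    for _day in range(days):
        newly_infected, newly_recovered = set(), set()
        for agent, st in enumerate(state):
            if st != "I":
                continue
            # Each infectious agent meets a few random others today.
            for _contact in range(contacts):
                other = rng.randrange(n_agents)
                if state[other] == "S" and rng.random() < p_transmit:
                    newly_infected.add(other)
            if rng.random() < p_recover:
                newly_recovered.add(agent)
        # Apply all changes at the end of the day.
        for a in newly_infected:
            state[a] = "I"
        for a in newly_recovered:
            state[a] = "R"
        prevalence.append(state.count("I"))
    return state, prevalence

final_state, prevalence = run_abm()
# Summary statistics of the kind one would compare against observed data.
summary = {
    "peak_prevalence": max(prevalence),
    "final_size": final_state.count("I") + final_state.count("R"),
}
```

Fitting such a model typically means repeating runs like this for different parameter values and comparing the summaries with data; that fitting step is not shown here.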
225
:So.
226
:What's happening in the NCD world, with non-communicable diseases?
227
:Okay.
228
:ABM is probably not very appropriate.
229
:Semi-mechanistic models not very appropriate.
230
:Spatial models, they don't care about the nature of your data.
231
:They are spatial, right?
232
:They don't know whether the spatial pattern is present due to how an infectious disease
was developing in a population or due to...
233
:some environmental exposures that caused the pattern of this type of cancer, for example,
in space.
234
:So spatial models is number one.
235
:That is one common denominator.
236
:And second, also the state space models or compartmental models.
237
:Turns out they're useful in the non-communicable space as well.
238
:Because rather than modeling
239
:rather than viewing each compartment as all the people together, which are in the same
state, we can view this as a state of one person during the course of their disease or
240
:condition.
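A minimal sketch of that reinterpretation, assuming a discrete-time Markov model in which the "compartments" are the possible states of one person over the course of a condition. The state names and the yearly transition probabilities are made up for illustration.

```python
def step(dist, P):
    """One time step: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Hypothetical yearly transition probabilities for one person.
STATES = ["healthy", "mild", "severe", "dead"]
P = [
    [0.90, 0.08, 0.01, 0.01],  # from healthy
    [0.10, 0.75, 0.12, 0.03],  # from mild
    [0.00, 0.10, 0.70, 0.20],  # from severe
    [0.00, 0.00, 0.00, 1.00],  # dead is absorbing
]

dist = [1.0, 0.0, 0.0, 0.0]  # the person starts healthy
trajectory = [dist]
for _year in range(20):
    dist = step(dist, P)
    trajectory.append(dist)
```

Each entry of `trajectory` is the probability of the person being in each state after that many years, which is the individual-level reading of a compartmental model.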
241
:Okay, Chris, you have...
242
:What's a recent project you've been working on that you're particularly excited about that
you can share with us so that...
243
:then that gives us an even more concrete idea of what your job involves every day.
244
:So for the benefit of the listeners of the podcast, and with apologies to the audience who
will hear this again tomorrow, the most interesting
245
:statistical project I've worked on is about understanding how having two HIV infections is
different from having one HIV infection.
246
:Because once you've got HIV, you have it for life, but you could get infected
again.
247
:Or sometimes when you're infected in a single event, you get two quite different viral
particles and both of them go on to establish a productive infection.
248
:So you can be in this state, which might persist or might disappear over time, of having two
viruses simultaneously.
249
:And there's been lots of work, lots of studies done, on trying to figure out: is that worse
for you?
250
:If you had to pick an answer, even if you didn't know much about it, you might think it's
probably not better for you than having HIV just once.
251
:And so people are trying to estimate:
252
:Is it worse for you, and if so, how much?
253
:And so we came to this question with a lot more data than has been used previously.
254
:The way we were able to have a lot more data was by using next generation sequencing
instead of more traditional older Sanger sequencing methods.
255
:So it's kind of high throughput sequencing methods, which means you can get lots of
samples through.
256
:You can have bigger sample sizes.
257
:But it means the data is a bit noisier.
258
:And so you have to think a bit more carefully about interpreting it.
259
:And so something that
260
:I thought was particularly interesting to use Bayesian stats for here is that we built a simple
causal model, linking together a number of covariates with causal effects.
261
:And so obviously the different variables are affecting each other.
262
:And so it's important to propagate uncertainty through this kind of model because the
uncertainty in one part is relevant for the uncertainty in another.
263
:And so getting an overall answer like to what extent does my immune system decline more
quickly if at all when I have two HIV infections.
264
:is a function of many different parameters in the model.
265
:And so it's nice to use a Bayesian model in this context to bring that uncertainty
together properly.
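A minimal sketch of what propagating uncertainty into a derived quantity can look like: the quantity is computed once per posterior draw, so uncertainty in every input carries through. The draws below are synthetic stand-ins generated with made-up means and spreads; they are not results from the study, and the variable names are hypothetical.

```python
import random
import statistics

rng = random.Random(0)
n_draws = 4000

# Stand-ins for joint posterior draws; in practice these come from the sampler.
baseline_slope = [rng.gauss(-50.0, 5.0) for _ in range(n_draws)]    # decline with one infection
extra_slope_dual = [rng.gauss(-15.0, 8.0) for _ in range(n_draws)]  # extra decline with two

# Derived quantity, evaluated draw by draw.
relative_speedup = [e / b for b, e in zip(baseline_slope, extra_slope_dual)]

def interval(draws, lo=0.025, hi=0.975):
    """Central interval from sorted draws."""
    s = sorted(draws)
    return s[int(lo * len(s))], s[int(hi * len(s)) - 1]

post_mean = statistics.mean(relative_speedup)
ci_low, ci_high = interval(relative_speedup)
prob_worse = sum(r > 0 for r in relative_speedup) / n_draws
```

Real posterior draws are usually correlated across parameters; computing the derived quantity per draw, as above, respects that automatically, whereas plugging in point estimates would not.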
266
:OK, yeah.
267
:Yeah, that's fascinating.
268
:Preparing for the episode, you were kind enough to share with me some of the material
you're going to show tomorrow.
269
:Definitely recommend coming and checking out Chris's talk, because that's going to be
fascinating.
270
:And we'll put it, of course, in the show notes of that episode for the podcast listeners.
271
:Before moving on to Liza and her project, I'm curious, what does a simple causal model
mean in your field?
272
:Well, I use the term simple in the context of this conference, in the sense that it has
maybe seven overall types of variables.
273
:I think it's 50 parameters estimated numerically in a model, but I have seven nodes in my
DAG, essentially.
274
:I have certain kinds of predictors like age and sex, which are often
275
:relevant in epidemiology, coming in together with the genetic data.
276
:So after we've sequenced the person's virus, we determine what kind of, you know, what
genetic sequences, and there's many of them in a given person.
277
:So how do we connect that together with clinical longitudinal data about how their
infection is progressing together with these predictors?
278
:So how do you kind of link these things up?
279
:So there are only about seven variables, seven classes of variable, overall.
280
:So the DAG only has seven things in it, so that's simple for me.
281
:Yeah, yeah, yeah, indeed, indeed.
282
:But that's
283
:That's pretty incredible.
284
:I love that that simplicity still gives you a model that can be extremely powerful and
then that can be interpreted causally.
285
:So, last question.
286
:After that, I give you the stage, Liza, but I'm curious about your workflow in these
cases.
287
:Like, are you, like, how do you work?
288
:Are you setting up the DAG yourself and then you go to other domain experts and you like
basically
289
:test your DAG on them?
290
:Or are several of you setting up the DAG?
291
:And when are you satisfied enough with the DAG to say, that's a good DAG, now we can go
build that in Stan?
292
:So in this case, we actually intuited our way towards the likelihood first, and
only afterwards did I ask, you know, what DAG does this correspond to?
293
:And so yes, those two things make sense together.
294
:Within our team, and particularly with the network of collaborators we have around us
who've contributed to the data, we are the domain experts, I guess.
295
:So it's not, as I'll clarify a bit more tomorrow, I don't consider myself a statistician,
I'm an infectious disease epidemiologist.
296
:So we're coming with the domain expertise and then sort of learning the stats we need to
then make these models work.
297
:Okay, okay.
298
:That's awesome.
299
:I'd really love to see one of these team meetings.
300
:That must be absolutely fascinating.
301
:Liza?
302
:What about you?
303
:What have you been up to recently?
304
:What are you especially excited about?
305
:And an example that you can share with us.
306
:There are three directions which are very close to my heart right now.
307
:And I'll start with the one that I've been working on for the last two or three years, I
think.
308
:And that is building emulators using deep generative models.
309
:So what I mentioned already today once, an example is trying to build an emulator for
quantities which are computationally very unpleasant.
310
:In epidemiology, what are examples of unpleasant quantities?
311
:They're Gaussian processes.
312
:They're systems of ordinary differential equations.
313
:We really don't like them.
314
:We don't like them within every MCMC step.
315
:So we would like to get rid of them or we would like to
316
:create a quantity, a surrogate that quacks like a duck but is not a duck.
317
:So something that behaves like that quantity of interest but does not have all the
computational burden.
318
:So that's one direction, building emulators.
319
:Second is again about Gaussian processes.
320
:I told you this is a black hole.
321
:That is building Gaussian processes on graphs.
322
:Because I think if we started talking about Gaussian processes now, the whole conversation
would be mostly around this kernel, that kernel, this representation, that representation,
323
:but all of that most likely would concern R^n, so building Gaussian processes in maybe
multidimensional but real valued space.
324
:However, in epidemiology, very often we encounter network data, graph data.
325
:So imagine the start of COVID, we observe cases in a couple of countries, and we would
like to predict what's going on in other countries.
326
:And it's not really appropriate.
327
:And some countries are separated by ocean.
328
:We can't just take geographical coordinates of those countries and say, this is the
distance between the United States and the United Kingdom.
329
:That's why we will now infer what's going on in the US based on what's happening in the
United Kingdom.
330
:Still, we would like to use some notion of similarity, maybe airline data, maybe, I don't
know, how many sharks, based on Vianey's talk, swim every day from New York to London.
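One generic way to turn a graph into a notion of similarity is a diffusion (heat) kernel built from the graph Laplacian, K = exp(-tL). The sketch below is a textbook construction offered only as an illustration; it is not claimed to be the method the speakers use. The four-node chain of "flight links", the diffusion time `t`, and the function names are assumptions.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def graph_laplacian(n, edges):
    """L = D - A for an undirected, unweighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L

def heat_kernel(L, t=1.0, terms=40):
    """K = exp(-t L) via a truncated Taylor series (fine for tiny graphs)."""
    n = len(L)
    K = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in K]                 # holds (-tL)^k / k!
    minus_tL = [[-t * x for x in row] for row in L]
    for k in range(1, terms):
        term = mat_mul(term, minus_tL)
        term = [[x / k for x in row] for row in term]
        for i in range(n):
            for j in range(n):
                K[i][j] += term[i][j]
    return K

# Hypothetical chain of flight links between four countries: 0 - 1 - 2 - 3.
countries = ["A", "B", "C", "D"]
edges = [(0, 1), (1, 2), (2, 3)]
K = heat_kernel(graph_laplacian(len(countries), edges), t=1.0)
```

Here `K[0][1] > K[0][3]`: countries that are close on the graph are more strongly correlated than distant ones, regardless of their geographical coordinates. A matrix like `K` could then serve as the covariance of a Gaussian process over the nodes.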
331
:And turns out some very clever people have worked out the maths not so long ago in 2020,
but similar story to HSGP paper, I guess.
332
:There was a paper laying out the theory, but then as a practitioner, you're like, but what
do I do with it?
333
:Staring at this math, how do I implement it?
334
:How do I choose the priors?
335
:yada, yada.
336
:So that is the project which I'm mostly excited about right now.
337
:There's two of us, with the brilliant collaborator Slava (Viacheslav) Borovitskiy, who is the first
author of the paper.
338
:Big shout-out to him.
339
:I don't have anything to show for it yet, but it's working.
340
:Once we are ready, we will share.
341
:How do you know it's working?
342
:MCMC is finally running and effective sample size is larger than one, which was not the
case all the way throughout.
343
:Just by running the model and comparing, say, raw data to our layers, we figured out there
was one country that was just sticking out.
344
:The data did not look right.
345
:And then it was France.
346
:Of course, probably a strike.
347
:Yeah, but then we went and checked the data and indeed either the data is corrupted,
there's something strange going on, but we understood it having run the model, not having
348
:investigated the data.
349
:What does it say about us as modelers?
350
:Well, you're the judge.
351
:But yeah, that's the second project.
352
:And the third, I can't say it's one project, it's an overall direction that is
353
:sequential data collection, and that's related to iterative methods that Anna and I will
cover in the workshop on Friday.
354
:Can you already tell a bit more?
355
:Because I won't be here on Friday.
356
:Sure.
357
:So the context of surveys is that we would like to make judgments about populations. Say we
would like to estimate one quantity about a population, say the average
358
:wealth of everyone in this room, but it's impossible to survey absolutely everyone.
359
:So we would like to collect a representative sample of everyone who is in this room.
360
:So we would like to reach out to poor people and rich people and sample a little bit from
each of the groups, and then compute, say, an average or an aggregate.
361
:And there are in real life certain issues associated with that.
362
:First of all, non-response.
363
:If we go out to very rich people who probably are not declaring their taxes properly,
they might not be willing to tell us how much they earn or what their total wealth is.
364
:Similarly with poor people: they might be coming from marginalized populations, they might
just not be cooperative, they do not like organized research.
365
:So our estimates, hence, will be biased.
366
:This phenomenon that I have now described qualitatively means, quantitatively, that there is a
correlation between the recording mechanism, basically who and where we sample, and the
367
:response.
368
:So whether you are rich is correlated with whether you give us an answer or not.
369
:That is a big problem.
370
:Surveys are often designed statically.
371
:Basically, we decide who and where to sample before we go to the field.
372
:And the idea is to use sequential methods to try and solve those problems as we go.
373
:It's a little bit like building a plane as you fly, but it does have benefits.
374
:So we can go and field a batch of samples, collect a little bit of data, come back home,
run our Bayesian model.
375
:See where we are certain, where we are uncertain, and then use the exploration-exploitation
trade-off, meaning: if we're already certain about a subpopulation, do we
376
:really need to go and ask them again, or do we rather go and spend our budget where we are
uncertain about a subpopulation
377
:and would like to learn more?
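The batch-by-batch idea described above can be sketched in a few lines. This is only an illustration under simple assumptions, not the speakers' method: each subpopulation has a yes/no outcome with a Beta(1, 1) prior, and the next batch is split in proportion to each subpopulation's posterior variance. The stratum names and counts are hypothetical.

```python
def beta_posterior_variance(successes, failures, a=1.0, b=1.0):
    """Variance of the Beta(a + successes, b + failures) posterior."""
    alpha = a + successes
    beta = b + failures
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1.0))


def allocate_batch(counts, batch_size):
    """Split the next batch across strata in proportion to posterior variance."""
    variances = {name: beta_posterior_variance(s, f) for name, (s, f) in counts.items()}
    total_var = sum(variances.values())
    raw = {name: batch_size * v / total_var for name, v in variances.items()}
    # Largest-remainder rounding so the allocation sums exactly to batch_size.
    alloc = {name: int(r) for name, r in raw.items()}
    leftover = batch_size - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda n: raw[n] - alloc[n], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


# Hypothetical strata: (yes answers, no answers) collected so far.
counts = {"well_surveyed": (180, 220), "rarely_reached": (3, 2)}
allocation = allocate_batch(counts, batch_size=50)
```

Here almost the whole batch goes to the stratum we know little about. A real design would also account for cost, non-response, and the final estimand.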
378
:Okay, yeah, I mean, that sounds fascinating, at least for the political scientist nerd
that I am.
379
:I started like that, so it's very, very close to basically all the issues that polls are
having.
380
:So I'll definitely study and watch this tutorial.
381
:Thanks, Liza.
382
:Actually, once it's available, we should put that in the show notes for that episode.
383
:And also, if you can send me the papers you mentioned, we should definitely put that in
the show notes.
384
:Thanks.
385
:Another question for the both of you, let's start with you, Chris.
386
:I'm curious, what are your most significant challenges when you're developing a model like
the kind of model you told us about a few minutes ago?
387
:So I think the main thing that has been a time sink so far, something I will explain in more
detail tomorrow, has been that in working with these...
388
:datasets with longitudinal data from many individuals, so 2,600 individuals in this
project, using that many numerical parameters for individual specific random effects, I
389
:think wouldn't be feasible in this case.
390
:And so I did analytical marginalization, so I'm calculating some of the integrals myself,
and then Stan doesn't have to know about the values of these parameters.
391
:And so it's the time involved just in doing the maths, and then if you want to know what
the posteriors for these things are,
392
:you have to kind of undo that maths afterwards.
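As a generic illustration of this kind of analytical marginalization (not the model discussed in the episode), assume a normal random intercept $b_i$ for individual $i$ with $n_i$ observations. All symbols here are assumptions introduced for the example.

```latex
% Model with an individual-specific random intercept b_i
y_{ij} = \mu + b_i + \varepsilon_{ij}, \qquad
b_i \sim \mathcal{N}(0, \tau^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)

% Integrating b_i out: the sampler only sees mu, tau, sigma
\mathbf{y}_i \mid \mu, \tau, \sigma \sim
\mathcal{N}\!\left(\mu \mathbf{1},\; \sigma^2 I + \tau^2 \mathbf{1}\mathbf{1}^\top\right)

% "Undoing the maths": the conditional posterior of b_i, given the other parameters
b_i \mid \mathbf{y}_i, \mu, \tau, \sigma \sim
\mathcal{N}\!\left(\frac{n_i \tau^2}{\sigma^2 + n_i \tau^2}\,(\bar{y}_i - \mu),\;
\frac{\sigma^2 \tau^2}{\sigma^2 + n_i \tau^2}\right)
```

With 2,600 individuals, the second line removes 2,600 parameters from what the sampler has to explore. The third line recovers them afterwards, one posterior draw of $(\mu, \tau, \sigma)$ at a time.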
393
:And so doing this and testing it on simulation to make sure you're getting it right and
getting some of the linear algebra with the matrices, you know, I spent some time chasing
394
:up a bug I'd made in the maths.
395
:So that had been a time sink.
396
:But otherwise, it's been relatively smooth sailing for what has been sort of one of my
first Bayesian projects and definitely the biggest.
397
:And so I guess I was just quite lucky that things tended to work pretty much as I went
along.
398
:But testing on simulation, as many people have said, has been critical to catching bugs at
each level of adding more complexity to the model.
399
:And it's often something I would never have caught otherwise, like putting the standard
deviation instead of the variance in the function call or something like that.
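A small sketch of how testing on simulated data can expose exactly this kind of mix-up. Everything here is hypothetical: data are simulated with a known standard deviation, then the scale is re-estimated with a correct log-density and with a deliberately buggy one that treats its argument as a variance.

```python
import math
import random


def normal_logpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma), where sigma is a standard deviation."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2


def buggy_logpdf(y, mu, sigma):
    """Deliberate bug: the argument is silently treated as a variance."""
    return normal_logpdf(y, mu, math.sqrt(sigma))


def fit_scale(data, logpdf, grid):
    """Grid-search maximum-likelihood estimate of the scale, with the mean fixed at 0."""
    return max(grid, key=lambda s: sum(logpdf(y, 0.0, s) for y in data))


rng = random.Random(42)
true_sigma = 2.0
data = [rng.gauss(0.0, true_sigma) for _ in range(1000)]
grid = [0.5 + 0.05 * i for i in range(131)]  # candidate values from 0.5 to 7.0

sigma_correct = fit_scale(data, normal_logpdf, grid)  # should land near 2
sigma_buggy = fit_scale(data, buggy_logpdf, grid)     # lands near 4, the variance
```

Because the true value is known, the buggy estimate landing near 4 rather than 2 points straight at the sd-versus-variance confusion.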
400
:OK, yeah.
401
:So basically, simulating data, running the model on that, and that helps you debug that
nasty bug that was hidden somewhere in the code.
402
:Yeah.
403
:Thanks, very practical.
404
:I like that kind of tip.
405
:Liza?
406
:I think coding bugs definitely, but also speaking very different languages with people
from the domains.
407
:So coming to survey design, there is literature on surveys which speaks a very
practical and statistical language.
408
:They write down...
409
:everything with a sum sign, and they have set up their own jargon, such as sampling frame and
many, many other words whose meaning I did not know.
410
:On the other hand, there is computer science literature and they talk all the time about
information gain and formulate everything in terms of information.
411
:And then you understand, well, the solution you're looking for is somewhere in that
literature,
412
:but because it's written using completely different language and jargon, it is absolutely
impossible to compare apples with pears.
413
:Yeah, so I mean, I noticed from your answers that it seems like using Stan or any PPL is
actually not really your bottleneck in these cases, which is amazing to hear, nonetheless.
414
:I'm going to ask you, if Stan or any PPL you're using could make your life easier, what
would that look like?
415
:So far I don't know and I think it's because I haven't used Stan enough yet to come up
against any limitations really.
416
:A key point of learning was something which Stan does handle already.
417
:What's normally said is you can't use discrete parameters in Stan or, I guess, any Hamiltonian
Monte Carlo, or...
418
:maybe within reason if other languages let you explore a few different parameter spaces
separately.
419
:But in Stan you can't, it doesn't natively support discrete parameters, but if your
discrete parameter corresponds to which different process or which different component of
420
:a finite mixture model is contributing, then it effectively does and it's just that you
need to do the maths to handle that.
421
:So that was sort of what I initially perceived to be a limitation of Stan, but then you
just need to know the right way of dealing with it and then it's fine.
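"Doing the maths" here means summing the discrete component indicator out of the likelihood. A minimal plain-Python sketch for a finite normal mixture follows; the function names are made up for the example, and in Stan the same sum is usually written with `log_sum_exp`.

```python
import math


def normal_logpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma), where sigma is a standard deviation."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def component_log_terms(yi, weights, mus, sigmas):
    """log(weight_k) + log p(yi | component k) for every component k."""
    return [math.log(w) + normal_logpdf(yi, m, s) for w, m, s in zip(weights, mus, sigmas)]


def mixture_loglik(y, weights, mus, sigmas):
    """Log-likelihood with the discrete component label marginalized out."""
    return sum(log_sum_exp(component_log_terms(yi, weights, mus, sigmas)) for yi in y)


def component_posterior(yi, weights, mus, sigmas):
    """Recover P(component k | yi, parameters) after the fact."""
    terms = component_log_terms(yi, weights, mus, sigmas)
    norm = log_sum_exp(terms)
    return [math.exp(t - norm) for t in terms]


weights, mus, sigmas = [0.7, 0.3], [0.0, 5.0], [1.0, 1.0]
loglik = mixture_loglik([0.2, 4.9, 5.3], weights, mus, sigmas)
probs = component_posterior(5.1, weights, mus, sigmas)
```

The sampler only ever sees continuous parameters. The discrete labels come back through `component_posterior` once the continuous parameters have been sampled.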
422
:No suggestion for me yet.
423
:For me, I think it comes back to the first answer, that's integration with neural network
libraries.
424
:As long as I'm able to combine easy ways to formulate an architecture that I would like to
use within a PPL, then bingo, I'm sold.
425
:OK.
426
:And what are the neural network libraries you go to?
427
:PyTorch and JAX. JAX in particular: because I now work with NumPyro, JAX is our best choice.
428
:Having said that, it's a little bit fiddly.
429
:So what people do, they try to build a neural network in PyTorch to figure out all the
details of the architecture.
430
:And then once they have the final answer, they move it to JAX.
431
:Okay.
432
:So of course you've
433
:like you're working in a field that we've heard a lot about, at least since COVID, right?
434
:So it doesn't happen that much that you have like such a big event that I'm guessing has a
lot of consequences in your fields.
435
:So I'm curious if you have noticed any, yeah, any new trends post-COVID in the work you're
doing.
436
:in the people you were able to talk to, were able to reach, things like that.
437
:I don't think it's changed much.
438
:It was through COVID that I got more into statistics.
439
:I'd done very little statistical work before that.
440
:So it's been a change for me in terms of what I've worked on.
441
:It meant a change in the level of news coverage our papers received in a way that was kind
of uncomfortable.
442
:I decided at the very start of the pandemic that I didn't want to have a sort of, I didn't want
to...
443
:engage with reporters, partly because I felt like there's so many people who know much
more about it than me.
444
:I don't want to contribute noise, partly because I thought another white male being the face
of something that could already do with more diversity wouldn't be helpful.
445
:And partly just feeling not comfortable in that scenario.
446
:So each paper coming out and getting a lot of news coverage was just very strange.
447
:That's definitely died down now.
448
:I see things are sort of going a bit more back to normal.
449
:In terms of changing the work, I would imagine
450
:quite a lot of people decided it was an interesting thing to go and work in or a useful
thing to go and work in, having lived through some of the negative experiences of not
451
:controlling infectious diseases.
452
:I haven't seen that sort of feed through very much yet in terms of like new people coming
into the group, things have been relatively stable for us.
453
:But I imagine that is a kind of a macro level trend.
454
:Yeah, yeah, I was going to ask you that because I was talking with Bob Carpenter yesterday
and he was telling me that in computer science, for instance,
455
:the number of students just boomed extremely in NLP and all these fields.
456
:So I was curious if you had also seen a boom like that in the number of students that
joined your fields.
457
:Liza, maybe on my first question, basically, did you notice any new trends?
458
:How do you feel about that?
459
:Or are there also things that you'd love to see but that you don't see yet?
460
:First, a very personal trend: when people ask me, what do you do, what's your profession?
461
:And I say, epidemiologist, they have stopped asking back, what is that?
462
:Second trend is data.
463
:Never ever in epidemiology have we had so much data which is so precise, global, of such high
quality, and also so much international cooperation.
464
:So databases were created where global data was collected, which of course enabled
research which was not possible up to that point at this scale and at this level.
465
:Okay.
466
:Yeah.
467
:And what about funding?
468
:Did that make it a bit easier to get funding?
469
:I've not been involved in any grant applications yet.
470
:I'm happy staying as a postdoc forever.
471
:I like spending all my time just doing the analysis and somebody else thinks about money.
472
:That's great for me.
473
:So I haven't had personal experience of it.
474
:I think I heard during the pandemic that something like a billion was promised in the US
for viral genomics.
475
:And the Biden administration created, I think, a new center for forecasting analytics for
infectious diseases.
476
:in DC and they've received a lot of money which they're distributing throughout the US to
do a lot more mathematical modeling of infectious diseases.
477
:So I think that a lot more money has become available.
478
:I haven't seen it personally, but I think it's around.
479
:Can I add to my previous answer with a specific example of data?
480
:So here in the UK an unprecedented survey has been conducted, and that's REACT.
481
:It's a national-level survey, which aims to be representative, and it took place in
several waves.
482
:And it has served now as a...
483
:So it laid the foundation for surveys which can continue.
484
:So that only happened due to COVID and during COVID.
485
:Yeah, okay.
486
:What was the last question?
487
:No, it's just, you know, curious if there was something you had helped...
488
:Yeah, yeah, funding also.
489
:Sorry, funding, yeah.
490
:I think I was the lucky one to ride this wave because I was holding a fellowship until I
started my last job here at Oxford, funded by Schmidt Sciences.
491
:And that was, I think, due to the timing.
492
:So they're very keen on applications of AI/ML and adjacent methods
493
:in real life.
494
:So sciences which have impact in real life.
495
:So I guess my pitch came at the right time of AI for epidemiology.
496
:I guess Gaussian processes qualify as AI, right?
497
:Yeah.
498
:Thankfully.
499
:Do we already have some questions?
500
:Thank you very much.
501
:My name is Mpatswa, I'm in the third year of my PhD.
502
:My question goes mostly to Chris, but I think Liza had some really good points that she
might also want to contribute on.
503
:So I'm currently at Imperial, your former institution, Chris, and I've loved both your
explanations and definitions for what epidemiology is, but I think you've downplayed how
504
:much physicists and people from other backgrounds bring to epidemiology.
505
:So I come from a medical background and then did a master's in epi, but then I've been blown
away by how much, to borrow a term that I've heard on someone's podcast, called
506
:free associations, how much people feel free to, you know, make causal inferences
from, you know, small surveys that they've done.
507
:So the clarification in the thinking when you bring in, you know, like your DAG thinking
and, you know, you force people almost to connect the mechanisms in a principled way
508
:and then make inferences.
509
:I found that really helpful and mind-blowing.
510
:So my question is, dealing with people from other domains, which Liza also commented on,
medical people and epidemiologists specifically, what advice do you have for someone like
511
:me to be more toned down and more humble about how we connect data and make conclusions
about things in the world?
512
:Thank you.
513
:I guess the first thing that leaps to mind is don't be more humble because your experience
is so valuable.
514
:But collaboration is key, right?
515
:So it's there in the background or the foreground with a lot of Bayesian analysis that
domain expertise is so relevant.
516
:Not just what functional form do I choose for my prior, where do I concentrate it, but
what kind of likelihood, what kind of data should I be looking at, what question should I
517
:be trying to answer.
518
:So domain expertise, so making sure you collaborate with people.
519
:You can decide for yourself what kind of question you want to work on.
520
:And then if you're not the expert in sort of how that kind of process works, how that kind
of data works, how it's generated, work with the people who are.
521
:If you're not the expert in the kind of methods you need to analyze that data and draw
conclusions, work with the people who are.
522
:So I guess, yeah, don't be humble, but value what you can bring, but value what everybody
else can bring as well.
523
:Liza?
524
:Anything to add?
525
:Okay, that was another question.
526
:Yeah, I think you were before.
527
:This is for Liza.
528
:Can you speak more about what your requirements are for a deep PPL, or how you use the deep
learning part of the framework?
529
:Yes.
530
:Okay, let's talk about surrogates for a little bit.
531
:Let's write an imaginary PPL program here.
532
:Define a model where I have some outcome data and the last line is the likelihood.
533
:So the last line is y is distributed as something something likelihood, but then in this
likelihood I have some difficult term, a Gaussian process.
534
:I don't like it, so let's say I want to model on a grid of a million by a million.
535
:If I actually run an MCMC on this program, then at every step of MCMC, it will have to
deal with the Gaussian process.
536
:In fact, the one line above the likelihood is the sampling statement where I sample from
the prior of the Gaussian process.
537
:So I say f is distributed as a multivariate normal with
538
:a covariance matrix K.
539
:So what I want to be able to do is to write the exact same program, just scratch out the line
where f is distributed as something, and write instead f hat is distributed as something
540
:else very simple, where F hat though looks very close to how F would look.
541
:So where do I sample this f hat from?
542
:The samples of f hat are given by a pre-trained deep generative model.
543
:And what is the structure of the deep generative model?
544
:And we need to go back to different ways to parameterize multivariate normals.
545
:So we want to use a non-centered parameterization where actually, to sample f,
546
:instead of f hat is the, sorry, f is distributed as multivariate normal, we'd rather write
f equals Lz, and we write one line above z is distributed as standard normal.
547
:Sampling from standard normal is not hard, right?
548
:They are all uncorrelated, z does not depend on any parameters, wonderful, so it doesn't
get better than that, z, standard normal.
549
:Next line, f equals Lz,
550
:L is really problematic.
551
:It's that Cholesky factor, cubic complexity, includes all the GP parameters.
552
:So this is the line we don't like, f equals Lz.
553
:So f hat then equals some function phi of z.
554
:z is again very nice, no problem sampling from z, but this phi now is a neural network.
555
:So basically we've pre-trained a neural network phi which learns
556
:how to pass a random but simple vector z through a deterministic function phi to create
prior draws of f hat that look very similar to f.
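The expensive step being replaced, f equals Lz, can be written out in plain Python on a tiny grid. This is a sketch under assumptions: a squared-exponential kernel with made-up hyperparameters, and a hand-rolled Cholesky so the block has no dependencies. The surrogate itself is not shown: a pre-trained network phi would take the place of `sample_gp_noncentered`, mapping the same z to f hat without any factorization.

```python
import math
import random


def rbf_kernel(xs, lengthscale, variance, jitter=1e-6):
    """Squared-exponential covariance matrix K on the points xs."""
    n = len(xs)
    return [
        [
            variance * math.exp(-0.5 * ((xs[i] - xs[j]) / lengthscale) ** 2)
            + (jitter if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


def cholesky(K):
    """Lower-triangular L with L L^T = K; this is the cubic-cost step."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def sample_gp_noncentered(L, rng):
    """Draw z ~ Normal(0, I), then return z and f = L z."""
    z = [rng.gauss(0.0, 1.0) for _ in L]
    f = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(L))]
    return z, f


xs = [i / 7.0 for i in range(8)]  # 8 grid points on [0, 1]
K = rbf_kernel(xs, lengthscale=0.2, variance=1.0)
L = cholesky(K)  # has to be redone whenever the GP hyperparameters change
z, f = sample_gp_noncentered(L, random.Random(0))
```

In MCMC the hyperparameters change at every step, so `cholesky` runs at every step. That repeated cost is what the surrogate avoids.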
557
:Does it make sense?
558
:Yeah, yeah, no, well done.
559
:I think that was really impressive to have that without the blackboard.
560
:So you're basically like
561
:approximating a Gaussian process with the neural network.
562
:Precisely.
563
:And now let's perform this mental exercise.
564
:You say, I don't care about spatial statistics.
565
:I don't need Gaussian processes.
566
:I don't care.
567
:I'm Julian Riu.
568
:I'm Judith.
569
:I'm Nicholas.
570
:I care about mechanistic disease transmission models.
571
:How can you help me?
572
:Well, and I can say, all right, let's write your PPL how you usually do it.
573
:What do you do?
574
:Okay, I always start writing a PPL program from the last line.
575
:Bad habit.
576
:That's how I do it.
577
:Okay, last line is always likelihood.
578
:Y is distributed as something.
579
:Let's model now virtually the number of daily counts of a particular disease.
580
:Y is distributed as Poisson with intensity lambda.
581
:Lambda is a function, lambda of t.
582
:It would have the, okay, so Poisson where the mean is I of t, where I is the function which I
got as a solution of the SIR model, right?
583
:So I write a very complicated ODE here, S prime equals, I prime equals, R prime equals, and
then within every MCMC step, I have to solve the system of differential equations
584
:with three compartments, best case, just to get this one tiny solution, one compartment
out, I of T, plug this into my likelihood.
585
:So why on earth was I solving that whole ODE, right?
586
:All I need is I of T.
587
:What do I do?
588
:I say, PPL, wait a second.
589
:I'll go pre-train a neural network.
590
:What do I show to the neural network?
591
:I show it many solutions.
592
:I give it one solution of the ODE after the other, after the other.
593
:and I don't need to give it all three compartments.
594
:I only need to show it the solution I of t.
595
:Again, I pre-train that model, call it phi of t.
596
:Coming back to my PPL, scratch out the ODE saying y of t, daily counts, is distributed as
Poisson distribution with a mean i hat of t, where i hat is the pre-trained neural
597
:network.
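For reference, the un-surrogated version of this program solves the full SIR system and then uses only I(t) as the Poisson mean. This plain-Python sketch uses a fixed-step RK4 integrator and made-up parameter values. A pre-trained emulator would replace `solve_sir` with a direct map from the parameters to the I(t) curve.

```python
import math


def sir_rhs(state, beta, gamma):
    """Right-hand side of the SIR equations: (S', I', R')."""
    s, i, r = state
    n = s + i + r
    new_infections = beta * s * i / n
    recoveries = gamma * i
    return (-new_infections, new_infections - recoveries, recoveries)


def rk4_step(state, dt, beta, gamma):
    """One classical Runge-Kutta step of size dt."""
    def shifted(base, slope, scale):
        return tuple(x + scale * k for x, k in zip(base, slope))

    k1 = sir_rhs(state, beta, gamma)
    k2 = sir_rhs(shifted(state, k1, dt / 2.0), beta, gamma)
    k3 = sir_rhs(shifted(state, k2, dt / 2.0), beta, gamma)
    k4 = sir_rhs(shifted(state, k3, dt), beta, gamma)
    return tuple(
        x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def solve_sir(beta, gamma, initial_state, n_days, steps_per_day=10):
    """Return the (S, I, R) state at the start of each day 0..n_days."""
    state = tuple(float(x) for x in initial_state)
    states = [state]
    for _ in range(n_days):
        for _ in range(steps_per_day):
            state = rk4_step(state, 1.0 / steps_per_day, beta, gamma)
        states.append(state)
    return states


def poisson_loglik(counts, means):
    """Sum of Poisson log-probabilities of counts given means."""
    return sum(y * math.log(m) - m - math.lgamma(y + 1) for y, m in zip(counts, means))


def loglik(beta, gamma, counts, initial_state):
    """All three compartments are solved, but only I(t) reaches the likelihood."""
    states = solve_sir(beta, gamma, initial_state, len(counts) - 1)
    return poisson_loglik(counts, [i for _, i, _ in states])


initial_state = (990.0, 10.0, 0.0)
true_states = solve_sir(0.3, 0.1, initial_state, 30)
counts = [round(i) for _, i, _ in true_states]  # noise-free toy "data"
ll_true = loglik(0.3, 0.1, counts, initial_state)
ll_wrong = loglik(0.5, 0.1, counts, initial_state)
```

Every call to `loglik` inside a sampler repeats the whole ODE solve. That repeated cost is what a surrogate amortizes.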
598
:Impressive.
599
:Liza Semenova, ladies and gentlemen.
600
:I think there were still like two questions.
601
:Do we have time?
602
:Yeah.
603
:Thank you.
604
:I have one question for you each.
605
:So for Liza, the top.
606
:So imagine you're the modeler who needs to handle all three models, like agent-based, and
semi-agent, and compartment.
607
:I think you already are.
608
:So what are some characteristics of good test diagnostics for you?
609
:The context of this question is that different data-generating processes require different
evaluation measures for the model.
610
:And even conditional on one model, say, a compartmental model such as SIR, diffusion of ideas
and diffusion of pathogens should differ in how we evaluate the fit of the model.
611
:So I'm just generally curious, since epidemiology seems to be where different
modeling philosophies come and convene, just because it's macro-filled, I want to ask about some
612
:philosophy behind, or what test diagnostic would be most comfortable for you?
613
:useful for you.
614
:And for Chris, sorry.
615
:What's your workflow on choosing which parameter to estimate as opposed to assumed?
616
:For instance, reproduction number is a ratio of two different parameters.
617
:And I always get confused which to set as assumed and which to set as estimated.
618
:One very easy way to frame this is: do you first start by estimating every
619
:parameter and then somehow lower the uncertainty of the model, or do you
start very small by setting everything except one as assumed and then start to increase
620
:the uncertainty of the model.
621
:Thank you.
622
:May I clarify the question, please?
623
:So you asked for the three types of models.
624
:What are the useful diagnostics?
625
:For instance, the test diagnostic, think it's pregnant by Ronald, it's compartmental
related because that's mostly representative as a function.
626
:But if you're doing an agent-based modeling, what are the test diagnostics and how does
that relate to other models?
627
:I'm not sure I understand test diagnostic in this context.
628
:fitting, model fitting, yeah, okay.
629
:We do need to link different parts of models to the data.
630
:In cases like compartmental models, we already get the daily mean.
631
:Then in terms of agent-based models, we do need to create summary statistics.
632
:So if we would like, it's a very popular topic right now where people create digital twins
of entire countries.
633
:So for example, here in the UK also, there are several versions of digital twins,
particularly trying to model spread of infectious disease and they try to be as realistic
634
:as possible and include existence of schools and shops and where else people would go and
meet each other and at what rate.
635
:Like, insanely detailed models.
636
:And of course, we don't have data at that level, but we still have data at the exact same
level as we have it for SIR, which is what is the number of infected cases per day that we
637
:record.
638
:So we do need in that case to...
639
:So the model itself can be as complex as we want it to be, but then in order to fit it to
the data, we do need to create summary statistics.
640
:So we run the system simulation forward and then we sum across all individuals and say,
okay, in this country on this day, as a result of this very complex process, what is the
641
:number of infected individuals?
642
:Yeah, and so just one more point to add following up on that.
643
:So some of my colleagues have written some of the best agent-based models for infectious
diseases.
644
:And as far as I know, they only ever use approximate Bayesian inference for model fitting.
645
:I mean, maybe it's possible, but I don't know how you would write an agent-based model in
Stan, for example.
646
:Maybe you can embed it in some of these other alternative methods.
647
:But as far as I know, ABC tends to be the go-to for agent-based models in infectious
diseases.
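A toy version of that workflow: simulate individuals forward, collapse them to the daily number infected, and keep the parameter draws whose summaries land closest to the data. The simulator, the uniform prior on beta, and the "keep the closest 10%" tolerance rule are all assumptions made for this sketch. This is ABC rejection in its simplest form, not the setup used by the groups mentioned.

```python
import math
import random


def simulate_daily_infected(beta, gamma, n_agents, n_days, n_seed, rng):
    """Toy individual-level epidemic; returns the number infected on each day."""
    # Status codes: 0 = susceptible, 1 = infected, 2 = recovered.
    status = [1] * n_seed + [0] * (n_agents - n_seed)
    daily = []
    for _ in range(n_days):
        n_infected = status.count(1)
        daily.append(n_infected)  # summary statistic: sum over individuals
        p_infect = 1.0 - math.exp(-beta * n_infected / n_agents)
        new_status = list(status)
        for k, s in enumerate(status):
            if s == 0 and rng.random() < p_infect:
                new_status[k] = 1
            elif s == 1 and rng.random() < gamma:
                new_status[k] = 2
        status = new_status
    return daily


def distance(a, b):
    """Euclidean distance between two daily-count series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def abc_rejection(observed, n_draws, keep_fraction, gamma, n_agents, n_seed, rng):
    """Draw beta from a Uniform(0, 1) prior and keep the closest simulations."""
    draws = []
    for _ in range(n_draws):
        beta = rng.uniform(0.0, 1.0)
        sim = simulate_daily_infected(beta, gamma, n_agents, len(observed), n_seed, rng)
        draws.append((distance(sim, observed), beta))
    draws.sort()
    n_keep = max(1, int(keep_fraction * n_draws))
    return [beta for _, beta in draws[:n_keep]]


rng = random.Random(1)
# Stand-in for real surveillance data, generated with beta = 0.5.
observed = simulate_daily_infected(0.5, 0.2, 100, 20, 3, rng)
accepted = abc_rejection(observed, 200, 0.1, 0.2, 100, 3, rng)
```

The accepted values of beta form an approximate posterior sample. The model inside the simulator can be as detailed as needed, because only the daily totals are compared with the data.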
648
:As for your question about which parameters to estimate and which ones to fix.
649
:I haven't done any, so one of the most common examples you see, we've already heard, is
sort of estimating dynamics using statistical models, so the R number over time, for
650
:example.
651
:I've not done that kind of work.
652
:I've been sort of estimating parameters of a static model, which I'll be talking more about
tomorrow.
653
:So like, what are the static parameters of this probability distribution?
654
:What's the static value of this effect size and so on.
655
:So just a bit of background for those who aren't familiar with the SIR model.
656
:So this is the kind of compartmental model where you are in S, I, or R and you kind of flow
through between them.
657
:And in those ODE equations, the R number, you can think about it both at a population
level and an individual level.
658
:And the R number corresponds to the instantaneous hazard for infecting somebody else if
you're infected divided by the instantaneous hazard.
659
:for recovering and no longer being infected.
660
:So if you think about the ODEs for a little while, you'll see why that makes sense.
661
:So it's this ratio of two parameters that determines the R number.
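Written out in the standard textbook notation, which is assumed here rather than taken from the episode ($\beta$ is the transmission hazard, $\gamma$ the recovery hazard, $N = S + I + R$), the ratio appears directly in the equation for $I$:

```latex
\frac{dS}{dt} = -\beta \frac{S I}{N}, \qquad
\frac{dI}{dt} = \beta \frac{S I}{N} - \gamma I, \qquad
\frac{dR}{dt} = \gamma I

% Factor gamma I out of the second equation:
\frac{dI}{dt} = \gamma I \left( R_0 \frac{S}{N} - 1 \right),
\qquad R_0 = \frac{\beta}{\gamma}
```

Early in an outbreak $S \approx N$, so infections grow exactly when $R_0 > 1$. At the individual level, an infected person transmits at rate $\beta$ for an average of $1/\gamma$ time units, which gives the same ratio.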
662
:And I mean, if you have no data to tell you anything more than just the number of people
infected over time, I think you should just come out and say, I can only estimate the
663
:ratio of these two parameters.
664
:But ideally, you would find something to try and separate those two.
665
:For example, there is very good data at this stage on how long does it take people to
recover from COVID.
666
:A limitation of these kinds of compartmental models is that because they're working with
kind of constant hazards for transition, the waiting time in any given compartment is
667
:necessarily exponential.
668
:Whereas as soon as I become infected with COVID, it's not that I spend an exponential
amount of time infected because my hazard is not going to be, I'm not going to recover
669
:instantly basically.
670
:So you might want to think about going to more realistic models in that case anyway, if
you wanted to estimate the rate at which people recover.
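A quick simulation of the point about waiting times. With a constant recovery hazard, the time spent infected is exponential, so a sizeable share of people "recover" almost immediately. The Erlang alternative below (passing through several sub-stages, sometimes called the linear chain trick) is one common fix, added here as an illustration rather than something proposed in the episode. The rate and the number of stages are made-up values.

```python
import random

rng = random.Random(0)
gamma = 0.2        # recovery hazard per day, so the mean infectious period is 5 days
n_stages = 4       # assumed number of sub-stages in the Erlang alternative
n_people = 20000

# Constant hazard: exponential waiting time in the infected compartment.
exponential_times = [rng.expovariate(gamma) for _ in range(n_people)]

# Erlang waiting time with the same mean: shape n_stages, scale 1 / (n_stages * gamma).
erlang_times = [
    rng.gammavariate(n_stages, 1.0 / (n_stages * gamma)) for _ in range(n_people)
]


def mean(values):
    return sum(values) / len(values)


def fraction_below(values, threshold):
    """Share of individuals who leave the compartment before the threshold."""
    return sum(1 for v in values if v < threshold) / len(values)


recovered_within_a_day_exp = fraction_below(exponential_times, 1.0)
recovered_within_a_day_erlang = fraction_below(erlang_times, 1.0)
```

Both distributions have the same mean. Under the exponential, close to a fifth of people recover within the first day, against roughly one percent under the four-stage Erlang.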
671
:Yeah, we're over time, so thank you very much.
672
:And please join me in giving a huge round of applause to Liza Semenova and Chris Wymant.
673
:This has been another episode of Learning Bayesian Statistics.
674
:Be sure to rate, review, and follow the show on your favorite podcatcher, and visit
learnbayesstats.com for more resources about today's topics, as well as access to more
675
:episodes to help you reach true Bayesian state of mind.
676
:That's learnbayesstats.com.
677
:Our theme music is Good Bayesian by Baba Brinkman, feat. MC Lars and Mega Ran.
678
:Check out his awesome work at bababrinkman.com.
679
:I'm your host.
680
:Alex Andorra.
681
:You can follow me on Twitter at Alex underscore Andorra, like the country.
682
:You can support the show and unlock exclusive benefits by visiting Patreon.com slash
LearnBayesStats.
683
:Thank you so much for listening and for your support.
684
:You're truly a good Bayesian.
685
:Change your predictions after taking information in.
686
:And if you're thinking I'll be less than amazing, let's adjust those expectations.
687
:Let me show you how to be a good Bayesian. Change calculations after taking fresh data in. Those
predictions that your brain is making, let's get them on a solid foundation.