#70 Teaching Bayes for Biology & Biological Engineering, with Justin Bois
Episode 70 • 22nd October 2022 • Learning Bayesian Statistics • Alexandre ANDORRA
Duration: 01:05:31


Shownotes

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

Back in 2016, when I started dedicating my evenings and weekends to learning how to code and do serious stats, I was a bit lost… Where do I start? Which language do I pick? Why are all those languages just named with one single letter??

Then I found some stats classes by Justin Bois — and it was a tremendous help to get out of that wood (and yes, this was a pun). I really loved Justin’s teaching because he was making the assumptions explicit, and also explained them — which was so much more satisfying to my nerdy brain, which always wonders why we’re making this assumption and not that one.

So of course, I’m thrilled to be hosting Justin on the show today! Justin is a Teaching Professor in the Division of Biology and Biological Engineering at Caltech, California, where he also did his PhD. Before that, he was a postdoc in biochemistry at UCLA, as well as the Max Planck Institute in Dresden, Germany.

Most importantly for the football fans, he’s a goalkeeper: actually, the day before recording, he saved two penalty kicks… and even scored a goal! A big fan of Los Angeles Football Club, Justin is also a magic enthusiast; he is indeed a member of the Magic Castle…

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, Adam Bartonicek, William Benton, Alan O'Donnell, Mark Ormsby, James Ahloy, Robin Taylor, Thomas Wiecki, Chad Scherrer, Nathaniel Neitzke, Zwelithini Tunyiswa, Elea McDonnell Feit, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Joshua Duncan, Ian Moran, Paul Oreto, Colin Caprani, George Ho, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Raul Maldonado, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Matthew McAnear, Michael Hankin, Cameron Smith, Luis Iberico, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Aaron Jones, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, David Haas, Robert Yolken and Or Duek.

Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag ;)

Links from the show:

Abstract

By Christoph Bamberg

Justin Bois did his Bachelor and PhD in Chemical Engineering before working as a Postdoctoral Researcher in Biological Physics, Chemistry and Biological Engineering. He now works as a Teaching Professor at the division of Biology and Biological Engineering at Caltech, USA.

Like many scientists in fields such as biology or psychology, he first got into Bayesian statistics by wanting to understand what the statistics he was using actually meant. His central question was “what is parameter estimation actually?”. After all, that’s a lot of what doing quantitative science is on a daily basis!

The Bayesian framework allowed him to find an answer and made him feel like a more complete scientist. As a teaching professor, he is now helping students of life sciences such as neuroscience or biological engineering to become true Bayesians.

His teaching covers what you need to become a proficient Bayesian analyst, from opening datasets to Bayesian inference. He emphasizes the importance of models implicit in quantitative research and shows that we do in most cases have a prior idea of an estimand’s magnitude.

Justin believes that we are naturally programmed to think in a Bayesian framework, but that we should still mess up sometimes to learn how fragile statistical techniques are. You can find some of his teaching on his website.

Transcript

This transcript was generated automatically. Some transcription errors may have remained. Feel free to reach out if you're willing to correct them.

[00:00:00] In 2016, when I started dedicating my evenings and weekends to learning how to code and do serious stats, I was a bit lost, to be honest. Where do I start? Which language do I speak? Why are all those languages just named with one single letter, like R or C? Then I found some stats classes by Justin Bois.

And it was a tremendous help to get out of that wood. And yes, this was a pun. I really enjoyed Justin's teaching because he was making the assumptions explicit, and he also explained them, which was so much more satisfying to my nerdy brain, which always wonders why we're making this assumption and not that one.

So of course, I'm thrilled to be hosting Justin on the show today. Justin is a teaching professor in the division of biology and biological engineering at Caltech, California, where he also did his PhD. Before that, he was a postdoc in biochemistry at UCLA as well as the Max Planck Institute in Dresden, Germany.

Most importantly, for the football fans, Justin is a goalkeeper. [00:01:00] Actually, the day before recording, he saved two penalty kicks and even scored a goal. He's a big fan of Los Angeles Football Club. Justin is also a magic enthusiast. He is indeed a member of the Magic Castle. This is Learning Bayesian Statistics, episode 70, recorded September 2nd, 2022.

Welcome to Learning Bayesian Statistics, a fortnightly podcast on Bayesian inference: the methods, the projects, and the people who make it possible. I'm your host, Alex Andorra. You can follow me on Twitter at @alex_andorra, like the country. For any info about the podcast, learnbayesstats.com is Laplace to be. Show notes, becoming a corporate sponsor, supporting LBS on Patreon, unlocking Bayesian merch: everything is in there. That's learnbayesstats.com.

If with all that info a model is still resisting you, or if you find my voice especially smooth [00:02:00] and want me to come and teach Bayesian stats in your company, then reach out at alex.andorra@pymc-labs.io or book a call with me at learnbayesstats.com.

Thanks a lot, folks, and best Bayesian wishes to you all. Let me show you how to be a good Bayesian and change your predictions after taking in information, and if you're thinking they'll be less than amazing, let's adjust those expectations. What's a Bayesian? Someone who cares about evidence and doesn't jump to assumptions based on intuitions and prejudice.

A Bayesian makes predictions on the best available info and adjusts the probability 'cause every belief is provisional. And when I kick a flow, mostly I'm watching eyes widen. Maybe 'cause my likeness lowers expectations of tight rhyming. How would I know unless I'm rhyming in front of a bunch of blind men, dropping placebo-controlled science like I'm Richard Feynman.

Justin Bois, welcome to Learning Bayesian Statistics. Thank you. Happy to be here. Yes. Very [00:03:00] happy to have you here because, well, you know that, but listeners do not: you are actually one of the first people who introduced me back to, uh, statistics and programming in 2017, when I started my career shift. So it's awesome to have you here today.

I'm glad my stuff helped you get going. That's the point. That's the goal. Yeah. Yeah, that's really cool. And also, I'm happy to have learned how you pronounce your last name because, you know, that's a French name. I dunno if you have some French origin, but in French it means... I know, I know it's a French name, but actually, as far as I understand, my family's from northern Germany, and there's a name there that's spelled B-E-U-S-S, and it's pronounced, like in Germany you say, Boce.

And then it got anglicized, I think when I moved to the US. But, uh, I was actually recently, just this past summer, in Lausanne, Switzerland, and there was a giant wood recycling bin with my name on it. It said B-O-I-S. So I got my picture taken next to that. So yeah. Yeah. Lausanne is in the French-speaking part of Switzerland. [00:04:00]

That's right. Cool. So we're starting already with the origin story, and I love that, 'cause it's actually always my first question. So how did you jump to the stats and biology worlds, and how sinuous of a path was it? Well, I think the path that I had toward really thinking carefully about statistical inference is a very common path among scientists, meaning scientists outside of data science, and maybe also outside of really data-rich branches of science such as astronomy.

So I studied chemical engineering as an undergraduate. It was a standard program. I didn't really do any undergrad research or anything, but I got into a little bit of statistics when I had a job at Kraft Foods after undergrad, where I worked with a statistician on some predictive modeling about, uh, some food safety issues.

And I thought it was interesting, but I sort of just, I was an engineer. I was making the product, I was implementing the stuff in the production facility, and the statistician kind of took care of [00:05:00] everything else. I thought he was one of the coolest people in the company. Um, but it didn't really hook me into really thinking about that.

But I went and did a PhD, and my PhD really didn't involve much experimentation at all. I was actually doing computational modeling of, like, how nucleic acids get their structure and shape and things. And it just didn't really involve analysis of much data.

Then in my postdoctoral work, I was working with some experimentalists who had some data sets, and they needed to do estimates of parameters based on some theoretical models that I had derived or worked on. And I had done some stuff in, you know, various lab classes, but it's your standard thing. It's like, ooh, I know how to do a curve fit. Meaning, I guess, in the Python way, I would do scipy.optimize.curve_fit.

Or, you know, in MATLAB I could do a least squares fit or something like that. And I knew this idea of minimizing the sum of the squares of the residuals, and that's gonna get you [00:06:00] a line that looks close to your data points. But the inference problems, the theoretical curves, were actually a little bit more involved for some of them.

There was no closed-form solution. They were actually solutions to differential equations. And so the actual theoretical treatment I had was a little bit more complicated, and I needed to start to think more carefully about exactly how we go about estimating the parameters thereof.
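For readers who have only seen the "curve fit" approach described here, the following is a minimal sketch of what it looks like when the theoretical curve comes from a differential equation rather than a formula. It assumes SciPy: solve_ivp integrates the ODE inside the model function, and curve_fit then minimizes the sum of squared residuals. The logistic-growth model, its parameters r and K, the initial value N0, and the synthetic data are hypothetical illustrations, not the models Justin worked on. Logistic growth does have a closed form; it is solved numerically here only to mimic the case where it does not.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import curve_fit

# Hypothetical model: logistic growth dN/dt = r * N * (1 - N / K), N(0) = N0.
# Solved numerically to mimic a theory with no closed-form solution.
N0 = 10.0


def model(t, r, K):
    """Return N(t) at the requested time points by integrating the ODE."""
    sol = solve_ivp(
        lambda _t, n: r * n * (1.0 - n / K),
        t_span=(0.0, float(np.max(t))),
        y0=[N0],
        t_eval=t,
        rtol=1e-10,
        atol=1e-10,
    )
    if not sol.success:
        raise RuntimeError(sol.message)
    return sol.y[0]


# Synthetic (hypothetical) measurements: true r = 0.5, K = 100, Gaussian noise.
rng = np.random.default_rng(seed=42)
t_data = np.linspace(0.0, 20.0, 25)
y_data = model(t_data, 0.5, 100.0) + rng.normal(0.0, 2.0, size=t_data.size)

# curve_fit minimizes the sum of squared residuals between model and data.
# epsfcn (forwarded to scipy.optimize.leastsq) enlarges the finite-difference
# step so that ODE solver error does not swamp the Jacobian estimate.
popt, pcov = curve_fit(model, t_data, y_data, p0=[0.3, 80.0], epsfcn=1e-10)
r_hat, K_hat = popt
print(f"best-fit r = {r_hat:.3f}, K = {K_hat:.1f}")
```

Note that this gives point estimates only; the question of what those "best-fit" numbers mean is exactly the one the conversation turns to next.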

Right? And so I kind of just started grabbing, uh, books, and I discovered quickly that I had no idea what I was doing, and actually neither did anybody around me. And I don't mean that pejoratively; it's a very common thing among a lot of people in the sciences who don't work as much with data.

And perhaps it's less common now, but it was definitely more common, you know, 10, 15 years ago. And so I just kind of started looking into how we should actually think about the estimates of [00:07:00] parameters given a data set. And really what happened was the problem became crystallized for me: the problem of parameter estimation.

And I had never actually heard that phrase, parameter estimation. To me, it was: find the best-fit parameter. If your curve goes through your data points, that means that the theory that you derived is probably pretty good. And of course, I didn't think about what the word “probably” meant there. I only knew it colloquially, right?

And so, 'cause I was focused on deriving what the theory is. And of course that's a hugely important part of the scientific enterprise. But once you get that theory derived, the problem of trying to estimate the parameters that are present in that theory from measurements just became clear to me.

Once I had a clear problem statement, I was able to start to think about how to solve it. And so the problem statement was: I have a theory that has a set of parameters. I want to try to figure out what the parameters are by taking [00:08:00] some measurements and checking: for one set of parameters, the measurements would be different.

How do I find what parameters there are to give me this type of data that I observe? I intentionally stated that awkwardly, because that awkwardness, it's funny, made it clear to me that the problem was unclear. And so that's what got me into a Bayesian mode of thinking, because it was hard for me to wrap my head around what it meant to do that thing that I'd been doing all this time.

This minimizing sums of squares of residuals and trying to find the best-fit parameter. And, you know, in retrospect, now that I've taught myself, 'cause I didn't really ever have a course in statistical inference or anything like that, I'd say, okay, I was essentially doing a maximum likelihood estimation, which is a frequentist way of doing parameter estimation.
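The retrospective point here, that minimizing the sum of squared residuals amounts to maximum likelihood estimation, holds under a specific assumption: independent Gaussian measurement errors with a common, known standard deviation σ. Under that assumption the log-likelihood is -(n/2) log(2πσ²) - Σᵢ (yᵢ - f(xᵢ; θ))² / (2σ²), and the first term does not depend on θ, so maximizing it over θ is the same as minimizing Σᵢ (yᵢ - f(xᵢ; θ))². The minimal sketch below checks this numerically with NumPy and SciPy for a hypothetical straight-line model on synthetic data.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical linear model y = a * x + b with i.i.d. Gaussian noise, sigma known.
rng = np.random.default_rng(seed=0)
x = np.linspace(0.0, 10.0, 30)
sigma = 1.5
y = 2.0 * x + 1.0 + rng.normal(0.0, sigma, size=x.size)


def neg_log_likelihood(params):
    """Negative Gaussian log-likelihood of the data given slope a and intercept b."""
    a, b = params
    resid = y - (a * x + b)
    n = y.size
    return 0.5 * n * np.log(2.0 * np.pi * sigma**2) + np.sum(resid**2) / (2.0 * sigma**2)


# Maximum likelihood estimate, found by numerical optimization.
mle = minimize(neg_log_likelihood, x0=[0.0, 0.0]).x

# Ordinary least squares estimate (minimizes the sum of squared residuals).
ols = np.polyfit(x, y, deg=1)  # returns [slope, intercept]

print("MLE:", mle)
print("OLS:", ols)
```

The two estimates agree to optimizer tolerance. If the errors were not Gaussian, or had different standard deviations, the equivalence would no longer hold in this simple form.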

And I hadn't actually thought about what that meant. I mean, I understand that now. We don't really need to talk [00:09:00] about that since we're talking about Bayesian stuff, but it was just harder for me to wrap my head around what that meant. And so I started reading about the Bayesian interpretation of probability, and it really just crystallized everything and made it clear, and then I could state the problem much more clearly.

The problem was, I was trying to find a posterior probability density function for these parameters given the data, and that was just so much more clearly stated in a Bayesian framework. And then that kinda lit me on fire, because I was like, holy cow, this thing that we do so often in the scientific enterprise, I can actually state the question, right?
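To make "a posterior probability density function for these parameters given the data" concrete, the following is a minimal sketch using Bayes' theorem, posterior ∝ likelihood × prior, evaluated on a grid for a single hypothetical parameter μ: the mean of Gaussian measurements with known σ, under an assumed Normal(0, 10) prior. The data, prior, and grid are illustrative assumptions. Grid evaluation is just one simple way to compute a posterior (for more parameters one typically turns to MCMC), and because this model is conjugate, the grid result can be checked against the closed-form Normal posterior.

```python
import numpy as np
from scipy import stats

# Hypothetical data: repeated measurements of a quantity mu, noise sigma known.
rng = np.random.default_rng(seed=1)
sigma = 2.0
y = rng.normal(loc=5.0, scale=sigma, size=20)

# Assumed prior on mu: Normal(0, 10), a weak statement about its magnitude.
prior_mean, prior_sd = 0.0, 10.0

# Unnormalized log posterior = log prior + log likelihood, on a grid of mu values.
mu_grid = np.linspace(-5.0, 15.0, 4001)
log_prior = stats.norm.logpdf(mu_grid, loc=prior_mean, scale=prior_sd)
log_like = stats.norm.logpdf(y[:, None], loc=mu_grid, scale=sigma).sum(axis=0)
log_post = log_prior + log_like

# Exponentiate stably and normalize so the density integrates to 1 on the grid.
dmu = mu_grid[1] - mu_grid[0]
post = np.exp(log_post - log_post.max())
post /= post.sum() * dmu

grid_mean = np.sum(mu_grid * post) * dmu
grid_sd = np.sqrt(np.sum((mu_grid - grid_mean) ** 2 * post) * dmu)

# Closed-form Normal-Normal (conjugate) posterior for comparison.
post_prec = 1.0 / prior_sd**2 + y.size / sigma**2
exact_mean = (prior_mean / prior_sd**2 + y.sum() / sigma**2) / post_prec
exact_sd = np.sqrt(1.0 / post_prec)

print(f"grid:  mean {grid_mean:.3f}, sd {grid_sd:.3f}")
print(f"exact: mean {exact_mean:.3f}, sd {exact_sd:.3f}")
```

The output is a whole density over μ rather than a single best-fit number, which is the reframing of the question described in this part of the conversation.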

And I just thought that was such a profound moment, and then I was kind of hooked from there on out, and I was constantly trying to improve how I thought about these things. And yeah, so I did a lot of reading. I realize I just talked a lot. You probably have [00:10:00] some questions about some of the stuff I just said, so please.

Oh yeah, well, wait. But, um, I mean, that's good to have an overview like that. And so I guess it sounds like you were introduced to Bayesian statistics at the same time as you were doing that deep dive into: wait, I'm not sure I understand what I'm using. Oh, actually I don't understand anything, and then I have to learn about that.

But it seems that you were also introduced to Bayesian stats at that same time, is that right? Yeah, I think so. And I think this is actually sort of a classic way in which scientists come up with what it is that they want to study. Because until you start poking around, you don't really know where the holes in your knowledge are.

And so what I saw was just a giant hole in my knowledge and my toolbox, and I saw the hole and I said, all right, let's fill it. And, um, so then I just started feeling around on how to do that. I see. And I am also curious [00:11:00] as to what motivated you to dive into the Bayesian way of doing things?

I really do think it was the clarity. I think that, okay, arguing about what interpretation of probability you wanna use is not the most fruitful way to spend one's time. For me, it was just so much more intuitive. I felt like this interpretation of probability, that it's a quantification of the plausibility of any logical conjecture, gave me sort of the flexibility where I could think about like a...