#80 Bayesian Additive Regression Trees (BARTs), with Sameer Deshpande

Episode 80 •
11th April 2023 • Learning Bayesian Statistics • Alexandre ANDORRA

*Proudly sponsored by **PyMC Labs**, the Bayesian Consultancy. **Book a call**, or **get in touch**!*

I’m sure you know at least one Bart. Maybe you’ve even used one — but you’re not proud of it, because you didn’t know what you were doing. Thankfully, in this episode, we’ll go to the roots of regression trees — oh yeah, that’s what BART stands for. What were you thinking about?

Our tree expert will be none other than Sameer Deshpande. Sameer is an assistant professor of Statistics at the University of Wisconsin-Madison. Prior to that, he completed a postdoc at MIT and earned his Ph.D. in Statistics from UPenn.

On the methodological front, he is interested in Bayesian hierarchical modeling, regression trees, model selection, and causal inference. Much of his applied work is motivated by an interest in understanding the long-term health consequences of playing American-style tackle football. He also enjoys modeling sports data and was a finalist in the 2019 NFL Big Data Bowl.

Outside of Statistics, he enjoys cooking, making cocktails, and photography — sometimes doing all of those at the same time…

*Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at **https://bababrinkman.com/** !*

**Thank you to my Patrons for making this episode possible!**

*Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Thomas Wiecki, Chad Scherrer, Nathaniel Neitzke, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Joshua Duncan, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Raul Maldonado, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, David Haas, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Trey Causey, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, and Arkady.*

Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag ;)

**Links from the show:**

- Sameer’s website: https://skdeshpande91.github.io/
- Sameer on GitHub: https://github.com/skdeshpande91
- Sameer on Twitter: https://twitter.com/skdeshpande91
- Sameer on Google Scholar: https://scholar.google.com/citations?user=coVrnWIAAAAJ&hl=en
- LBS #50 Ta(l)king Risks & Embracing Uncertainty, with David Spiegelhalter: https://learnbayesstats.com/episode/50-talking-risks-embracing-uncertainty-david-spiegelhalter/
- LBS #51 Bernoulli’s Fallacy & the Crisis of Modern Science, with Aubrey Clayton: https://learnbayesstats.com/episode/51-bernoullis-fallacy-crisis-modern-science-aubrey-clayton/
- LBS #58 Bayesian Modeling and Computation, with Osvaldo Martin, Ravin Kumar and Junpeng Lao: https://learnbayesstats.com/episode/58-bayesian-modeling-computation-osvaldo-martin-ravin-kumar-junpeng-lao/
- Book *Bayesian Modeling and Computation in Python*: https://bayesiancomputationbook.com/welcome.html
- LBS #39 Survival Models & Biostatistics for Cancer Research, with Jacki Buros: https://learnbayesstats.com/episode/39-survival-models-biostatistics-cancer-research-jacki-buros/
- Original BART paper (Chipman, George, and McCulloch 2010): https://doi.org/10.1214/09-AOAS285
- Hill (2011) on BART in causal inference: https://doi.org/10.1198/jcgs.2010.08162
- Hahn, Murray, and Carvalho on Bayesian causal forests: https://doi.org/10.1214/19-BA1195
- Main BART package in R: https://cran.r-project.org/web/packages/BART/index.html
- dbarts R package: https://cran.r-project.org/web/packages/dbarts/index.html
- Sameer’s own re-implementation of BART: https://github.com/skdeshpande91/flexBART

**Abstract**

In episode 80, Sameer Deshpande, assistant professor of Statistics at the University of Wisconsin-Madison, is our guest.

He had a passion for math from a young age, got into Bayesian statistics at university, and now teaches statistics himself. We talk about the intricacies of teaching Bayesian statistics, such as helping students accept that there are no objective answers.

Sameer’s current work focuses on Bayesian Additive Regression Trees (BARTs). He also works on prior specification and on numerous cool applied projects, for example on how playing American football as an adolescent affects health later in life.

We primarily talk about BARTs as a way of approximating complex functions by using a collection of step functions. They work pretty well off the shelf and can be embedded in various models, such as survival models, linear models, and smooth models. BARTs are somewhat analogous to splines and can capture trajectories over time well. However, they are also a bit of a black box, which makes them hard to interpret.

We further touch upon some of his work on practical problems, such as how cognitive processes change over time or models of baseball umpires’ decision making.

**Transcript**

*Please note that the following transcript was generated automatically and may therefore contain errors. Feel free to **reach out** if you're willing to correct them.*

Sameer Deshpande, welcome to Learning Bayesian Statistics. Great to be here. Yeah, thanks so much for taking the time. I think listeners will be happy, in particular my patrons, because they have been asking for you to come on the show for a while now. So finally we made it happen. I'm personally very happy because we're going to talk about regression trees and topics that we haven't covered a lot yet, but I'm also happy because it makes some patrons happy. Excellent. So before talking about BART, though, let's start with your origin story, as usual. How did you come to the stats and data world, Sameer, and how sinuous of a path was it?

Yeah, in some sense, it was actually a pretty straight path. I think, like a lot of folks, I got into statistics through experience in mathematics. I grew up in Texas, and at the time that I was growing up there, it was a really nice place to be as a student interested in math. We had math competitions, the Olympiads, and related things, and there was a really strong community of high school students interested in math contests in Texas at the time. One of the things that was a lot of fun when I was growing up was that there was a university down in south Texas, Texas State University, that would host a six-to-eight-week math camp every summer. So I used to go to that every summer, basically all through secondary education. As part of that, I remember once working on a research project there which involved some statistics, and it was the first time that I'd ever seen statistics. I had never taken a class on statistics, but it was some sort of data analysis, and I thought, you know, that was a lot of fun. I get to use a lot of mathematical thinking, a lot of problem solving, and answer a substantive question. So when I went on to university, I had kind of already decided, hey, I want to do statistics, I want to go to graduate school. And so I went on a straight-line journey through undergrad: what do I need to do to go to statistics graduate programs? I jumped to my graduate program straight from undergrad, and I'm still here, which is exciting.

Yeah, so you developed an interest in math and scientific thinking really, really fast.

Yeah, it all kind of worked out. I don't have a very circuitous path. I liked the problem-solving aspect that statistics somehow inherently has. You know, we are trying to think creatively about what is the data, what is the story in the data, without putting too fine a point on it. So that's something that appeals to the problem-solving, kind of lizard-brain side of me, and I still get a lot of joy out of it.

Yeah, yeah, I can see why the Bayesian framework would be interesting for you. We'll talk about that, but yeah, I can also understand that you get that kind of, you know, murder-mystery solving all the time with research, and with statistics in particular. Yeah, definitely.

I still have a lot of fun doing statistics. Yeah, I think that's a really important part of my career.

Yeah, for sure. And where did you study before your undergraduate, so until high school? Did you study in the US, or?

Yeah, I was in Texas. I went to school outside the Dallas area. And I went to a kind of specialized school for math and science in my last few years. So I was in and around universities, and in and around academia, basically from the time I was 15 or 16.

And I'm curious, actually, because in France, for instance, you would not learn about the scientific method of inquiry before you start doing a PhD, basically, or maybe a master's degree. And that surprises me now, because science is so closely tied to the method. It's not just theories that appeared from nowhere, like, oh, we know that gravity works that way because someone invented or discovered a theory. It's actually coming from the method, all that murder-mystery method we were talking about before, which is inherent in statistical work. And I find that a shame, because it would have been awesome to be introduced to that method earlier; maybe I would have gone into research, I don't know. But I'm guessing that some students don't go into research because they don't know about that, and they think, oh, I'm not good enough at math to invent theorems or algorithms. But actually

I want to push on that point just a little bit. Constitutionally, I believe that everybody can do statistics. I really reject the kind of "this is not for everybody" idea. I even see it in some of the students I'm teaching now; somebody will say, you know, I'm not really a statistics type of person. They don't see themselves in the field. This is something that I think a lot about: how do we fix this from a pedagogical perspective? And, you know, I can tell a story about how I think Bayes really taps into some creativity, and how we really need to get away from procedural thinking, but maybe that's for later in the podcast.

Yeah, that was a long preamble to my question, but it was basically: when did you get introduced to that kind of method? And also, personally, what do you do as a teacher to handle that pedagogically? Because I find that quite hard; it's such an ingrained idea that, oh yeah, you have to be a math person to do that kind of thing.

So my own journey into Bayes, I think, starts once I got to graduate school. My undergraduate training was at MIT, and it was at a time when there was really not a lot of statistical activity happening on campus. There would be a couple of classes in the math department, a few classes in the business school, a few scattered around the university, but this was a little bit before things got really organized around data, data science, and information. So I had a very narrow view of statistics; it was only what I saw through the courses I took. I came into graduate school really wanting to do asymptotic theory, two-point arguments, minimax rates; that's what I thought I really wanted to do. And as I started taking my core sequence in my first year, I took an applied Bayes class that was taught by Shane Jensen, so I really owe Shane, in some sense, my entire career. So thank you, Shane, if you're listening. I took his applied Bayesian modeling class and I was just totally hooked. Hierarchical models, sharing of information, how you can leverage information from one group structurally to inform estimation in another group: I found that totally fascinating, and I started working with Shane on an analysis of some basketball data where it actually was pretty useful to take a Bayesian perspective. So I was working along these two tracks early in graduate school. I was reading all of these theoretical papers, and then I was going back to my office and working on some really applied analyses. After a while, I realized I liked this Bayesian stuff a lot more than I liked trying to chase bounds, and that is a personal preference. And it's sort of grown from there. I've gotten much more interested in methods and in thinking about creative ways to solve interesting problems from the Bayesian perspective. So maybe the short answer is, I just started doing some applied data analysis, and Bayes was super useful. It tapped into things that I thought were very intuitive. I liked the flexibility and the creativity that it afforded: you, in some sense, get to write down whatever prior you want, and you can structure your model however you please. Whether you can compute with that is another issue, and I think there's a lot of fun in exploring that tension.

Yeah, for sure. And what are you teaching actually?

So currently, here at Wisconsin, I've been teaching our graduate Bayes class. I've taught that a couple of times, and I've taught a Bayesian class at the undergraduate level, and now I'm currently teaching our undergraduate regression course, so very non-Bayes. So I'm relearning what a confidence interval is, how to express it, and telling people it's not the probability that your parameter is in an interval, you know, the usual things. So as much as I enjoy the Bayesian paradigm, I do appreciate having to teach outside of it from time to time.

Yeah. And I'm curious, have you already taught some Bayesian classes?

I have. Yes.

Okay. And so, yeah, I'm curious: what were the main difficulties for your students when learning the Bayesian framework?

So our graduate Bayesian course is situated as something that our students take after they've seen a lot of mathematical statistics and inference. Most of the students come in having seen a bit of decision theory; they've seen unbiased estimation and minimum variance estimation. They've seen mixed models, like linear mixed models with lme4-style computation. And so one of the big challenges early on for me was, how do I convince these students that they need another form of statistical thinking? Why do we need another paradigm? What I really wanted to avoid was getting bogged down in any one topic. You could teach a semester-long graduate course, probably a year-long sequence or more, on just MCMC. But I think if you focus too much on computation, you lose model building and model criticism, what Andrew Gelman and colleagues would call the Bayesian workflow, which I find extremely compelling. So I tried to cover a bit of that. I tried to introduce my students to Stan. I tried not to make them hand-derive a whole bunch of samplers, because for many of them this would have been their only Bayesian course. I would say that my students probably found it to be a bit like drinking from a firehose, since I was throwing a lot at them, but I think some of them really did come around to being open to thinking in a Bayesian paradigm. And I think many of them realized that, in their own consulting work or their own applied work, they might actually use some of these ideas. So I consider that a success. But maybe if my students were hearing this, they'd have other opinions.

Yeah, it's quite hard. I find it even harder to introduce the Bayesian framework when people have been working for a long time in the frequentist framework and are not looking for something else. If they are looking for something else, they already know they need it, because they have a problem they cannot work on with their classic tools. Otherwise, I find it extremely hard to convince them. And it's pretty closely related to the idea of objectivity. Sometimes I feel like that guy in The Matrix, saying, yeah, but objectivity doesn't exist, actually. It's like: take the red pill, and I'll show you what the world looks like once you accept that objectivity doesn't exist, or just take the blue pill and continue doing what you're doing. It works for a lot of things. There are some cases where it will not work, but that's okay. Until you accept the fact that objectivity doesn't exist, I cannot really do anything for you except show you the other methods, and if you don't really understand that preamble, it's going to be very hard.

Maybe I take a somewhat less poetic approach to this. I'd say that, as a field, we probably should be better at acknowledging just how much of what we're doing is entirely made up. Statistics has, I think, much more vibes than anything else. You write down a model, and if it seems to work and it gives you a good answer, who am I to judge it? So I fully embrace subjectivity. And I try to get my students to think less procedurally, like: you see this type of data, you must use this link in a GLM; if you see that type of data, you must do this; and you must always use clustered standard errors or robust standard errors, sort of uncritically. So I'm with you that objectivity, to me, doesn't exist; everything is subjective. And it's whether I can convince you that my analysis, and the story that I'm telling based on my data, is compelling.

Yeah, and I'm thinking a lot about these things right now because I'm teaching a workshop with people coming from the frequentist framework, so it's really interesting for me from a pedagogical standpoint. But anyway, let's continue. I could make two episodes with that, but we already talked about it on the podcast. If folks are curious, I would recommend episode 50 with David Spiegelhalter, who goes as far as saying that probability doesn't even exist. That sounds interesting. Exactly. And then episode 51, with Aubrey Clayton, who wrote the book Bernoulli's Fallacy and the Crisis of Modern Science, a really interesting book. I'll put those two episodes in the show notes; they are whole episodes about these more epistemological topics that I love. So of course I sprinkled a bit of it in with you right now, but I also want to talk about some statistical methods. Sure. And actually, before digging into BART, can you just define for us the work you're doing nowadays and the topics that you are particularly focusing on and interested in?

Yeah, so I'm, in some sense, all over the place in terms of what I'm working on. I would say currently I spend maybe 40 to 60% of my time really thinking about tree models: developing BART, trying to make it more flexible, and deploying it in different places, slapping BART into different parts of various probabilistic models. I really just enjoy working in that space. About 10 to 20% of my time is spent on something slightly more classical: some work on sparsity and model selection with spike-and-slab priors. In my training, I encountered the spike-and-slab lasso and EM-like algorithms for getting MAP estimates, and with some students we're thinking about how to get good uncertainty quantification out of these posteriors, which are geometrically a nightmare, you know, very spiky and multimodal and hard to explore. So I like thinking about those. And then the rest of it is applied projects and various collaborations. I'd say a lot of my collaborations are centered around understanding how playing sports in adolescence might affect your health later in life. Here, you have to think about how an effect changes over time, how it varies across the population, and how to estimate these things in a super flexible way. That leads to a lot of what I want to do with BART; the direction of the methodological development is all roughly answering this question of how effects change over time and how I can be as flexible as possible in modeling this.

Okay, yeah, it's definitely super interesting. We can talk about those specific examples later on. Sure, absolutely. Because I'm curious, especially about the last one: how does sport impact your health outcomes later on in life? But first, as you said, you work on a lot of topics, and one of them in particular we haven't treated in depth on the podcast yet. That's BARTs, so Bayesian additive regression trees. That's what you meant by working on tree models, by the way, in case listeners got confused and thought you were working on forests, trees, and things like that. Maybe you are, but that's what you meant. So can you tell us what BARTs are, to begin with?

McCulloch and this wonderful: Yeah, okay. But is that different from random forests because of the way it's calculated, or because you would not use them on the same kind of models or problems?

That's a great question. I think they're trained in a fundamentally different way. In BART, you use all of the data to train all of the trees; all of the data informs all of the trees. Whereas in random forests, right, you subsample some of your data to train each tree, this sort of bootstrap aggregation idea, and you don't use all of the predictors. So it is a different algorithm. The trees are not done sequentially; they're kind of learned together.

Yeah, it's much more Bayesian, you know. Yeah.

But what I will say, and maybe this is worth pointing out, is that there's another perspective one can take on what BART is actually trying to do. It's helpful to think about piecewise constant step functions. A regression tree is, in some sense, a piecewise constant function. Take an arbitrary space; it doesn't have to be Euclidean space, it can be any sort of space. If you can recursively partition the space, that is, given a space, randomly break it into two pieces, then go into each of those pieces and randomly break them, and just continue this, you can arrange all of the pieces hierarchically into a tree: the first break, and then you subsequently break each piece. And then if you just put an output at each leaf at the bottom of the tree, that defines a piecewise constant function over your space. What BART is doing is saying, I want to approximate some complicated function using a large collection of relatively simple piecewise constant functions. At first, this might sound kind of bizarre. What if the function is smooth? What if it's really structured? Why would you use piecewise constant functions? But it's important to note that piecewise constant step functions are a universal function approximator; with enough of them you can approximate damn near anything. So what BART is trying to do, in some sense, is learn a collection of step functions that, when added together, give you a good approximation to the regression function you're after. And because there's uncertainty about which functions you should use, you can write down a prior over them. You can turn the Bayesian crank, you can get a posterior, and you'd hope that the posterior distribution places most of its probability on the individual step functions that, when added together, give you a good approximation. And so that's, I think, the way that I like to understand BART: the tree is just a computationally efficient way of representing a step function. Really, it's: let's approximate a function using a bunch of weak learners that happen to be well represented by trees.
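To make the "sum of step functions" idea concrete, here is a minimal sketch that approximates a smooth function with many one-split step functions (stumps). It is not BART itself: where BART puts a prior on the trees and samples them by MCMC, this sketch swaps in a greedy, boosting-style residual fit so that it stays short. The function names, the shrinkage factor of 0.1, and the sine target are assumptions made for illustration; the 200 stumps simply echo the ensemble size mentioned later in the conversation.

```python
import numpy as np

def fit_stump(x, r):
    """Least-squares one-split step function for residuals r (hypothetical helper)."""
    best = None
    for c in np.unique(x)[1:]:  # candidate cut points; both sides stay non-empty
        left, right = r[x < c], r[x >= c]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, c, left.mean(), right.mean())
    _, c, mu_left, mu_right = best
    return c, mu_left, mu_right

def predict_stump(stump, x):
    """Evaluate a stump: one constant left of the cut, another to the right."""
    c, mu_left, mu_right = stump
    return np.where(x < c, mu_left, mu_right)

def fit_sum_of_stumps(x, y, n_trees=200, shrink=0.1):
    """Greedy residual fitting (NOT BART's MCMC): each stump explains a bit more."""
    stumps, fit = [], np.zeros_like(y)
    for _ in range(n_trees):
        c, mu_left, mu_right = fit_stump(x, y - fit)
        stump = (c, shrink * mu_left, shrink * mu_right)  # keep each learner weak
        stumps.append(stump)
        fit = fit + predict_stump(stump, x)
    return stumps

x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x)  # a smooth target, approximated only by steps
stumps = fit_sum_of_stumps(x, y)
approx = sum(predict_stump(s, x) for s in stumps)
rmse = np.sqrt(np.mean((approx - y) ** 2))
print(f"RMSE of the sum of {len(stumps)} stumps: {rmse:.3f}")
```

Each stump on its own is a crude two-level function, but their sum tracks the sine curve closely. A Bayesian treatment would replace the greedy loop with a prior over cut points and leaf values and would return posterior draws rather than a single fit.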

Right, yeah. Thanks for that presentation. That sounds really clear to me. I like that you managed to basically break it down to the smallest building blocks of BART. And basically the idea of BART would be that, right: it's like finding the most fundamental particles into which you can disintegrate the model, and then trying to build a model by adding all of those particles together.

ly came up with a way in that: Yep. And you get all the bells and whistles: you get all the uncertainty quantification, and then you've got posterior samples. So you can ask any question of those posterior samples, including doing decision making, optimization, and stuff like that?

Exactly. Now, I will make one small caveat. With our traditional parametric models, the things we might fit really well with HMC, in Stan or in any number of languages now, we think a lot about convergence diagnostics: is my MCMC mixing? And here's the honest truth about BART: it doesn't mix. It has, in some sense, no hope of mixing over tree space. These trees are not identified. So you're working with this very complicated representation that isn't identified; sums of trees aren't identified. It's a really curious thing that it works as well as it does, despite us knowing that however long you've run your MCMC for, and however many chains you've run, it probably hasn't mixed in a meaningful sense. And yet, if we just close our eyes and compute posterior means, and close our eyes and compute intervals, it tends to work pretty well. And I think that's actually exciting: by rights, this thing shouldn't work as well as it does, and yet we have a decade of empirical evidence that you actually get some good answers out of it. There are some folks currently thinking hard about the theory of mixing in BART, and there is a lovely paper that I really, really like; I can send it to you and you can throw it in the notes. Yeah, for sure. It's really rich. With a Stan model, we would normally say, oh, it hasn't converged yet, let me just throw some more iterations at it, or redesign the model, or reparameterize to get it to work more efficiently. With BART, I think it's just a fascinating research question: why does it work as well as it does, even though all of our conventional diagnostics suggest that it shouldn't be doing good things? So I find that a very interesting space to live in.

also the original paper from: There are some really deep connections there, you know.

Okay. Yeah, maybe can you talk about that? Because it sounds to me like BARTs are kind of like discrete splines.

So yeah.

Can you talk a bit about these relationships?

So I would say that it's not a direct analog, by any stretch. You can view the trees, in some sense, as defining a basis, and I think this is really in the spirit of the more classical regression trees, or the MARS method. I think that was multivariate adaptive regression splines; somebody can look that up later. The idea is that the tree defines an adaptive basis, and at some level you're decomposing a function into a data-adaptive basis. Insofar as splines can do a similar thing, though they're somewhat less adaptive, you can draw a parallel there. But I think there is also a connection to the Haar wavelet basis, where you take an interval and you bipartition it, and then you further partition it into this very discrete thing. I think that's the Haar wavelet basis; let me just double check here. Yeah, it's where you take an interval, cut it in half, then take those intervals and cut them in half again. In some sense, BART is doing something like that, but it's somewhat less rigid than that basis. So I think one way to get some insight into what it's doing is to think of it as similar to a specific type of wavelet decomposition, except that we're not specifying it fully in advance. We're letting the data determine at what resolution we need to split our input space. So I think there are probably some really interesting ways to draw parallels here. I haven't spent a lot of time probing this, but I do think there's a there there.

Okay, yeah, definitely super interesting. And it's cool that the concepts are close to each other, because that makes it easier to understand, in a way: if you already know about Fourier transforms or splines, then it's easier to understand BART. And we talked about that already: you said that it's different from random forests. I'm guessing that folks are a bit more familiar with random forests. So you said that it's different because it's not doing the same thing under the hood; it's not the same algorithm. But I also asked you this, and then I think we switched to something else: is it also different because you would not use BART for the same kind of problems that you could use random forests for?

I actually think you can use a bar for a much richer class of problems. So Bart originally got its start for nonparametric regression, but you know, a Gaussian Gaussian errors fixed variants, you know, your, your run of the mill nonparametric regression problem. And insofar as the log likelihood there looks like a squared error. Yeah, I mean, that looks like you know, using MSE when when training random forests for regression, but for classification, you know, when you're getting the full probabilistic model, so to do part for classification, it's, it's easiest to do it as like a probit regression. It sort of makes the computation a little bit easier. But then once you do that, you can start to do things like well, what if you want heteroskedastic regression? where not only does the mean function change with the X but but your your sigma changes with x? Well, it's a very clever paper by basically the bark authors again. They said, Well, you can just use a sum of trees to write to decompose the log variance as a function of x. And now we're really cooking with gas because that's something that random forest is you have to really be clever about your, your your loss function, and a random forests to in order to get that but within kind of the probabilistic framework, it's not that hard. You write down a likelihood it depends on you know, an ensemble for your main function and ensemble for your variants function. Those hat are expressed as sums or products of trees. You put a prior on those trees, there's a way to do your MCMC you can start to embed bark within larger classes of models. So if you want to do say, a survival model, there's a way to do Bardon that some of my some folks over at the Medical Medical College of Wisconsin are have really been pushing on this in some really phenomenal work as, as are some folks down in Florida. They're there they're writing some excellent, you know, art with survival. 
You can start to do linear models, you can do smooth models, you can do some spatial modeling with BART. Essentially, if you can write down a probabilistic model where at some point you've got a regression function, a function that depends on some covariates, which you would otherwise express in a linear form or in some basis, you can probably just put BART in its place and, with a bit of work, work out the necessary computation. So you can do a lot with this really simple idea of: let's just approximate a function. I've sat in this literature now for a couple of years, thinking really only about BART most days. There was this wonderful new paper by some folks at Duke on density regression with BART, and I just thought that was great. I thought it was super cool. So you can start doing all sorts of fun stuff. To get random forests or neural networks to do these types of things, you've got to do a lot of work designing loss functions, and because you're no longer working with a full probabilistic model, it becomes harder to chase your uncertainties and see what's really going on. So, apologies for rambling, but I would consider this the full-throated defense of probabilistic models.
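
As a rough sketch of the heteroskedastic model described in this answer (the notation below is an assumption made for illustration, not taken from the paper being discussed), the mean gets one sum of trees and the log variance gets another, so the variance itself is a product of exponentiated tree outputs:

```latex
\begin{aligned}
y_i &= f(x_i) + \sigma(x_i)\,\varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, 1), \\
f(x) &= \sum_{m=1}^{M} g(x;\, T_m, \mu_m), \\
\log \sigma^2(x) &= \sum_{k=1}^{K} h(x;\, T'_k, \lambda_k)
\quad\Longleftrightarrow\quad
\sigma^2(x) = \prod_{k=1}^{K} \exp\{h(x;\, T'_k, \lambda_k)\}.
\end{aligned}
```

Here each g and h is a step function defined by a tree (T or T') and its leaf values (mu or lambda). Putting priors on the trees and leaf values completes the model, and the likelihood is the usual Gaussian one with mean f(x) and variance sigma^2(x).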

I think, yes. It kind of sounds like it's a nonparametric regression, basically.

Yeah, it definitely is, but we can do a lot more than just plain vanilla regression.

Yeah, for sure. That makes me think of Gaussian processes, of course. The kings of the nonparametric world: everything is a Gaussian process. I'm pretty sure that inside a black hole there is a Gaussian process.

Well, certainly there's a theorem, at some level. I certainly didn't prove this, and I don't know if it's been formally proven, but it kind of goes like this. In BART, what you're doing is writing down a prior on regression trees. That's a fairly complicated thing to try to do, but you can do it implicitly by describing how you generate them: you draw the tree structure, you draw the decision rules, and then you draw the values that the tree spits out as a function. Those values typically come from, say, a normal distribution. So with the 200 trees in my ensemble, when you take a single x and evaluate the function, each tree spits out, in the prior, just a normal random variable. So you're adding up a bunch of normals. And now you can start to think: I'm adding up a whole bunch of normals, so if I take a limit and scale things correctly, there's some Gaussian that pops up at the end. And what it turns out, I think (maybe this isn't formal, or it's sort of a folk theorem), is that in the infinite-tree limit BART does converge to a Gaussian process with a very specific kernel. That kernel is determined like this: you take two points, x and x prime, in your input space, and you ask how often a random regression tree would put x and x prime in the same partition cell. That's the notion of closeness. So there is a kernel sitting underneath BART, in the limit of a BART model with infinitely many trees. But my sense is that it's often much faster to just do the BART approximation rather than treat it formally as a GP. These methods are all doing something similar, though. BART is just trying to figure out which x's are most similar, so that it can leverage similarity in their y's.
And that's exactly what a kernel in your GP is doing.
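
The kernel described here, the probability that a random tree puts x and x prime in the same partition cell, can be estimated by simulation. The sketch below is only an illustration under stated assumptions: one-dimensional inputs on [0, 1], cut points drawn uniformly within the current cell, and a node at depth d splitting with probability alpha * (1 + d)^(-beta), which is the usual form of the BART tree prior. The function names are made up for this example.

```python
import random


def same_leaf(x1, x2, rng, lo=0.0, hi=1.0, depth=0, alpha=0.95, beta=2.0):
    """Grow one random tree along the path containing x1.

    Returns True if x2 ends up in the same leaf as x1. Only the branch
    holding x1 needs to be simulated, because the other branches do not
    affect whether the two points are separated.
    """
    # A node at this depth splits with probability alpha * (1 + depth)^(-beta).
    if rng.random() >= alpha * (1 + depth) ** (-beta):
        return True  # this node is a leaf and still contains both points
    cut = rng.uniform(lo, hi)  # assumed decision rule: uniform cut in the cell
    left1, left2 = x1 < cut, x2 < cut
    if left1 != left2:
        return False  # the cut separates the two points
    if left1:
        return same_leaf(x1, x2, rng, lo, cut, depth + 1, alpha, beta)
    return same_leaf(x1, x2, rng, cut, hi, depth + 1, alpha, beta)


def tree_kernel(x1, x2, n_trees=20000, seed=0):
    """Monte Carlo estimate of P(x1 and x2 fall in the same leaf)."""
    rng = random.Random(seed)
    hits = sum(same_leaf(x1, x2, rng) for _ in range(n_trees))
    return hits / n_trees


if __name__ == "__main__":
    for x2 in (0.3, 0.35, 0.6, 0.9):
        print(f"k(0.30, {x2:.2f}) ~ {tree_kernel(0.3, x2):.3f}")
```

With independent, mean-zero normal leaf values, the prior covariance of a sum of m trees at two points is m times the leaf variance times this co-occurrence probability, which is why the estimate behaves like a kernel: it equals 1 when the points coincide and decays as they move apart.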

You see, I told you, everything comes back to GPs. I'm telling you, folks, this is the Nobel Prize-winning discovery: black holes are just GPs for which we don't know how to compute the covariance kernels. Cool. So yeah, thanks a lot, that makes a lot of sense. So now I have an idea of why I would use BART and when I would use it. Before asking you the next question, I need to plug in my computer. It's gonna die. Oh, no. I just noticed that. Okay.

Okay.

Let's get back to it. So now, what I would like to ask you is: what are the most challenging parts when you're working with BART? And when would you not use BART?

At the risk of sounding, well, okay: I would use BART all the time, I think, and if it isn't easy to do, I view that as an opportunity. But to be not so flippant about it: BART is really great in applications where you really need to get a good prediction and you want uncertainties around those predictions. If you think about the old "two cultures" framing, where there's a black box that takes your x's to your y's, BART is not necessarily trying to make that interpretable. It's just a way to go from x to y hat, in some sense, or f hat, to be more precise. So if you really need interpretability, if you really need to identify the main effect or a main driver of some process, BART will get you great predictions and it will probably get you a great fit, but it's not going to be easy to figure out what the main driver is and what the interactions are. And I say it's not easy, but this is true of just about every tree model. For figuring out the most important driver there are many ways to do this: partial dependence plots, or importance scores, and there's a whole cottage industry that's really trying to interpret sums of regression trees. It gets complicated because with sums of regression trees you can represent the same function in many different ways. So from a likelihood standpoint, the trees are not identified through the likelihood. If we're deriving things based on the tree structure, that becomes slightly problematic, in my opinion. So if you really need interpretability, BART is maybe not the thing to go for first. You could potentially run BART and then post-process it. Spencer Woody, Carlos Carvalho, and some of their colleagues have really pulled on this, and Jared Murray has pulled on this a lot.
And I really like this idea that I guess some of the folks in this space would call "fitting the fit", where you run BART, you take the predictions from BART, and then you train, say, a classification and regression tree on top of those predictions, mapping x to y hat. That might be a way to get interpretability, but BART out of the box is not going to be interpretable. So if you're chasing interpretability, or something mechanistic, BART is maybe not the way to do it. In terms of challenges: it's very easy to say, here's some fancy probabilistic model, a very bespoke model, and I just want to replace a parametric form with this nonparametric sum of trees. That's very easy to say. It's very hard to do in practice because, with the software, it's not as easy as plug-and-playing BART into any old model. I know that there are packages like dbarts that try to do this, and it does so pretty well, and I enjoy using that package. But I would say that, for the most part, a lot of the advances in BART are people writing their own R packages or their own C++ code to implement a very specific BART model that does one or two things. Something that I think we can do a lot better as a community is write more extensible and better-documented source code. I do believe that if you want high-performance models that scale and are really efficient, you really do need to be writing in a lower-level language and taking advantage of a lot of the class structure. Something that I've been trying to do in my own work is write new extensions of BART in ways that are easy for other people to use. I'm hoping that's what my source code is, but the proof of the pudding will be in the eating, so we'll see about that. But I think the biggest barrier is implementation.
And I love the idea of having something very portable, where you write down a probabilistic model and here's a little BART module that you can slap in there. I worry a little bit about getting that to work because of how the trees are updated, just the actual MCMC behind it. I don't know off the top of my head how you would abstract it enough to make it super modular. But again, I think this is an opportunity; I don't view it as a limitation. Maybe if I speak loud enough, my graduate students will hear, "hey, this could be a super cool thesis idea." But in reality, this is something that we as a community are going to have to face if we want BART to be as easy to use, and I use that word advisedly, as something like Stan.
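
The "fitting the fit" idea mentioned in this answer can be sketched in a few lines: take predictions from a black-box model and train one small regression tree on them, mapping x to y hat. Everything below is an illustrative assumption rather than anyone's published method: `black_box_predict` is a hypothetical stand-in for posterior-mean predictions from a fitted BART model, and `fit_tree` is a bare-bones greedy least-squares tree written for this example.

```python
import random


def black_box_predict(x):
    """Hypothetical stand-in for posterior-mean predictions from a BART fit."""
    return (2.0 if x[0] < 0.5 else -1.0) + 0.1 * x[1]


def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(X, y, depth=0, max_depth=2, min_leaf=5):
    """Greedy least-squares regression tree, returned as nested dicts."""
    mean = sum(y) / len(y)
    if depth == max_depth or len(y) < 2 * min_leaf:
        return {"value": mean}
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate cuts are midpoints between consecutive observed values.
        for a, b in zip(values, values[1:]):
            cut = (a + b) / 2
            left = [i for i, row in enumerate(X) if row[j] < cut]
            right = [i for i, row in enumerate(X) if row[j] >= cut]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, j, cut, left, right)
    if best is None:
        return {"value": mean}
    _, j, cut, left, right = best
    return {
        "feature": j,
        "cut": cut,
        "left": fit_tree([X[i] for i in left], [y[i] for i in left],
                         depth + 1, max_depth, min_leaf),
        "right": fit_tree([X[i] for i in right], [y[i] for i in right],
                          depth + 1, max_depth, min_leaf),
    }


# Fit the surrogate tree to the black-box predictions, not to the raw outcomes.
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(200)]
y_hat = [black_box_predict(x) for x in X]
surrogate = fit_tree(X, y_hat)

if __name__ == "__main__":
    print("first split: feature", surrogate["feature"],
          "at", round(surrogate["cut"], 3))
```

The surrogate is a single shallow tree, so its first split can be read directly as the dominant driver of the predictions. It summarizes the fitted function only; it does not carry the posterior uncertainty of the original model.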

Yeah, for sure. That's why probabilistic programming languages like Stan, PyMC, and so on are so good: they allow people to focus on the model code instead of having to write down the algorithms and work out how to plug everything together, which is extremely complex. So to you, that would be the main limitation of BART for now: the most challenging part when working with BART would be the implementation.

Yeah, I think that's it. And there are some really nice user-facing packages. If you're willing to use R, Rob's BART package is really easy to use. I'm telling you: put your x's in a matrix, put your y's in a vector, call BART on x and y, and you're in business. It really is not hard to use. And I think that has really spilled over to all of the subsequent developments of BART: the folks who are writing new BART-based models and packaging them are making them easy to use. But are we keeping up with all of the fancy models that people can dream up? There, we're certainly behind.

Yeah, okay, I see. Can you also put the link to that package in the show notes for listeners? So that's in R, right? That was actually going to be one of my questions: which package do you use to run your BART models? It seems like you answered already. So yeah, let's put that in the show notes.

I tend to write my own. Usually it's stuff in C++ that interfaces with R through something like Rcpp, which is pretty well developed and mature. Every so often there'll be a request to make this run in Python, and I'm sympathetic to this. I just don't know the interfacing as much. And I think a native Python implementation could be a little bit slow. To represent a tree in a lightweight fashion, it does help to have some of the structure that something like C or C++ gives you. A tree is kind of like a linked list, and you don't want to have lots of sequential memory for that. Once you get down to that level, I think writing all of BART in a low-level language and then interfacing it at a higher level is probably going to be the most successful. If I can ever figure out how to do the interfacing with Python, I'll probably end up doing that. It's just not something I know how to do.
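
To make the "a tree is kind of like a linked list" remark concrete, here is a minimal sketch of a regression tree stored as linked nodes, with a sum-of-trees prediction on top. It is written in Python purely for readability; the class and function names are invented for this example, and the implementations discussed in this answer are in C++.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    """One node of a binary regression tree, linked to its children by reference."""
    feature: Optional[int] = None    # covariate index used by the decision rule
    cut: Optional[float] = None      # go left when x[feature] < cut
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: float = 0.0               # output returned when this node is a leaf

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def evaluate(tree: Node, x: Sequence[float]) -> float:
    """Follow the decision rules from the root down to a leaf."""
    node = tree
    while not node.is_leaf():
        node = node.left if x[node.feature] < node.cut else node.right
    return node.value


def sum_of_trees(trees: List[Node], x: Sequence[float]) -> float:
    """An ensemble prediction is the sum of the individual tree outputs."""
    return sum(evaluate(t, x) for t in trees)


# Two small hand-built trees, for illustration only.
tree_a = Node(feature=0, cut=0.5, left=Node(value=-1.0), right=Node(value=1.0))
tree_b = Node(
    feature=1, cut=0.2,
    left=Node(value=0.25),
    right=Node(feature=0, cut=0.8, left=Node(value=0.0), right=Node(value=0.5)),
)

if __name__ == "__main__":
    print(sum_of_trees([tree_a, tree_b], [0.9, 0.7]))
```

Growing or pruning a tree in this representation only means attaching or detaching child nodes, which is the kind of local change that tree-updating MCMC moves make repeatedly. That is one reason a pointer-based structure in a lower-level language is a natural fit.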

Yeah, that's okay. We know how to do that.

Oh, excellent. And I think, yeah, we should chat offline about that.

Yeah, for sure. If you want to get in touch with Osvaldo, I can definitely introduce you to him. I know he is way more knowledgeable than me about the implementation stuff. Cool. So I have so many more questions, because these models are so cool, and I'm really glad that we're finally digging deep into them. I really hope listeners will appreciate that too. But time is flying by and I want to ask you about some examples. You talked about applications of Bayes in sports and applications of Bayes in public health. So could you take an example from your work to help us understand how you're applying Bayesian stats in those settings? It can be using BART or not; it's okay. For instance, you mentioned at the beginning of the episode that you were working on a study of the impact of playing sports during adolescence on health later in life. It can be that, or it can be something else.

nitially back in, I would say:Night. Yeah. Do you have anything written about that that we can put into the show notes? Yeah, absolutely. I'll put links there. Awesome. Yeah, the show notes are going to be full; it's going to be fun. So I'm actually curious about that study you did on the amount of sports you do as an adolescent and the impact it has on your health later on. I'm curious because it sounds to me like you would need longitudinal data to do that, so it takes a lot of time, right? So did you do that, or did you manage to find a way to not use longitudinal data?

ootball today. I have to wait: ink they have started that in:Both you should always be careful.

So the caveat would be that you always have to be careful when you're doing causal inference, I mean, when you want to make a causal claim, right? If you're using linear regression, you have to be careful about what you're saying. Yeah. So you also have to be careful in the BART framework. So there's a

r by Jennifer Hill in I think:Yeah, I mean, thinking about your priors, and your Yeah,

ya know that there's always good utility there.

Yeah. That's also what I'm telling people when they're like, "Oh, yeah, but why are priors useful?" Well, priors are useful for that. Especially if you don't have any data and you don't have any prior knowledge, how do you want to learn anything? You're basically saying, "I don't know anything, and I don't have any more information." Then how do you want to make inferences based on that? If you want to make inferences and you don't have a lot of data, then you need to go back and see if you have some prior knowledge, or at least an idea of the story your model could tell.

For sure. I definitely agree with that sentiment.

Cool. Okay. Thank you. So that's a trove of information here. For listeners, that's really cool: the show notes are going to be full, so I'm happy, and that is nice. Listeners love them. Maybe before asking you the last two questions, let me ask you: what does the future of Bayesian stats look like to you? And more specifically, what would you like to see and not see?

Oh, boy. I think the future of Bayesian stats is just super exciting. I think the envelope is getting pushed in so many different ways. My personal thing is that I really want to get back to thinking about uncertainty quantification in these spike-and-slab models, where there's a very discrete space that we're interested in and we see it through a manifestation of a continuous space. I don't know how to do uncertainty quantification there very well. That's something that I would most like to learn and think hard about. I think there's a lot of work that looks at things like the Bayesian bootstrap or the weighted likelihood bootstrap. Certainly some of my colleagues here in Wisconsin are very into this, and that's something that I'm really interested in. I think this idea of, can we be clever about optimization to somehow approximate samples from the posterior, not an approximation of the posterior but the actual posterior we write down, would be just totally cool. But that's a very idiosyncratic answer to your question.

Yeah, no, I mean, that was a very broad question, so I like that. And I'm curious, since you do a lot of things: what's the next thing that you want to learn? It can be non-stats.

Oh, well. Let's keep it in stats; there's plenty in statistics that I want to learn. From talking to a lot of my colleagues here, I've gotten really interested in random graphs and networks. I kind of want to learn more about that. I've been learning a lot from them already, but I want to start thinking about where potentially some ideas from Bayesian nonparametrics could come in. Or could we use BART to estimate a graphon? I think that's something a colleague of mine asked me a few days ago, and I think the answer is probably yes. But maybe I need to think about this more.

I think, yeah. Perfect. Well, Sameer, is there one question that you wish I had asked you, but I didn't because I'm a very poor interviewer?

No, no, this was great. You're excellent. I mean, come on. I don't have any questions that I wish you had asked me or anything like that.

, hopefully, we'll get to the:Yeah, first question. If you had unlimited time and resources, which problem would you try to solve?

I mean, broadly, I really want to understand approximation error when we're thinking about our posterior. No matter what we do, we are just approximating it, and I think understanding that error in terms of how it manifests, like the approximation error for a posterior mean or a posterior variance, is important. And I will say there's a lot of very exciting work that's already being done on this. But if I had the time and resources, I would really spend a lot of time trying to think about that. You do some approximate inference procedure; gosh, it would be great if we could get a guarantee in finite samples, in a metric that metrizes things like means, variances, or quantiles.

And second question, if you could have dinner with any great scientific mind, dead, alive or fictional? Who would it be?

So I'll be cheating, and I'll give you four answers here. Unfortunately, they've all passed. I would absolutely love to have dinner with George Box. Here in Wisconsin, he was the founding member of our department, and the ethos that he set really does come through even to this day, so it would just be fascinating to learn from him. I think it'd be absolutely phenomenal to sit down with Dennis Lindley; some of his writing and some of Jimmie Savage's writing have been really influential in how I think about and how I teach Bayes. And the last one, which is maybe more personal: my advisor was Ed George, and one of his advisors was Charles Stein. Stein unfortunately passed when I was a graduate student, at a time when I was really interested in thinking about shrinkage estimation. To have met him or interacted with him is something that I wasn't lucky enough to be able to do, but I would have loved to sit down and have dinner with what would be my academic grandfather. So there's four for you.

Right, yeah, well, it definitely sounds like a very nice dinner. So let's have a fake goodbye here. Then we'll stop the recordings and I'll tell you what to do with the Audacity track.

Sounds good. Thanks for having me.

Yeah, well, perfect. Thanks a lot, Sameer, that was really great. I'm very happy that we could dive deep into BART, and I hope that we gave people some curiosity and that now everybody wants to try BART. Again, if you are curious about all these topics, folks, there are a ton of show notes for this episode. So go to the episode's page on the Learn Bayes Stats website and you will get all the show notes. As usual, thank you again, Sameer, for taking the time and being on the show.

Thanks for having me.

Perfect, so you can stop Zencastr. Good. So in Audacity, you go to File, Export, Export as WAV, and then you save it wherever you want on your computer. Make sure to save it as signed 24-bit PCM, because sometimes the format changes, and then you keep the default metadata; we don't care about that. And once it is saved, you can send it to me however you want. It's a big file, so that could be Dropbox, WeTransfer, Google Drive, whatever you want. Sounds good. And then I can start editing. Sounds good. Perfect. Yeah. So, as I told you, we're going to have 79 go out next week, so your episode should be out in about three weeks.

Oh, perfect. Pretty fast. Excellent. Thank you so much.

No, you bet. Thank you for taking so much time. That was super interesting. I hope that you're not too tired.

Oh, no, this is great. I do actually need to run, though: I'm sitting in on a lecture that starts in five or ten minutes. Oh, sure. So later this afternoon I will send you the audio file and everything, and I will edit those show notes, probably right after that.

Yeah, perfect. Okay, maybe we can do that. That's awesome. And then we'll keep in touch, of course, when the episode is available, and I mean, let's keep in touch in general.

So take care.

Thanks again, for your for your letter.

Let me get rid of the Zencastr here.

Transcribed by https://otter.ai