Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
This episode is unlike anything I’ve covered so far on the show. Let me ask you: Do you know what a research synchronous language is? What about hybrid systems? Last try: have you heard of Zelus, or ProbZelus?
If you answered “no” to one of the above, then you’re just like me! And that’s why I invited Guillaume Baudart for this episode — to teach us about all these fascinating topics!
A researcher in the PARKAS team of Inria, Guillaume's research focuses on probabilistic and reactive programming languages. In particular, he works on ProbZelus, a probabilistic extension to Zelus, itself a research synchronous language to implement hybrid systems.
To simplify, Zelus is a modeling framework to simulate the dynamics of systems with both continuous (smooth) and discrete dynamics — if you’ve ever worked with ODEs, you may be familiar with these terms.
If you’re not — great, Guillaume will explain everything in the episode! And I know it might sound niche, but this kind of approach actually has very important applications — such as proving that there are no bugs in a program.
Guillaume did his PhD at École Normale Supérieure, in Paris, working on reactive programming languages and quasi-periodic systems. He then worked in the AI programming team of IBM Research, before coming back to the École Normale Supérieure, working mostly on reactive and probabilistic programming.
In his free time, Guillaume loves spending time with his family, playing the violin with friends, and… cooking!
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Nathaniel Neitzke, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Raul Maldonado, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Trey Causey, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony and Joshua Meehl.
Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag ;)
Links from the show:
Abstract
Guillaume Baudart is a researcher at Inria in the PARKAS team at the Département d'Informatique (DI) of the École normale supérieure. He joins us for episode 86 to tell us about ProbZelus, a synchronous probabilistic programming language that he develops.
We have not covered synchronous languages yet, so, Guillaume gives us some context on this kind of programming approach and how ProbZelus adds probabilistic notions to it.
He explains the advantages of the probabilistic aspects of ProbZelus and how practitioners may profit from them.
For example, synchronous languages are used to program and test the autopilots of planes and to ensure that they do not have any bugs. ProbZelus may be useful here, as Guillaume argues.
Finally, we also touch upon his teaching work and what difficulties he encounters in teaching probabilistic programming.
Transcript
Please note that this is an automated transcript that may contain errors. Feel free to reach out if you're willing to correct them.
Alex:
Guillaume Baudart, welcome to Learning Bayesian Statistics. So I'm very proud, because I am increasing the quota of French people on the show. This is good. We had a really recent one with Nicolas Chopin, it was episode 82. So yeah, the French community is making its way slowly but surely on the podcast. I'm pretty sure some people are not happy about that, but that's okay. So Guillaume, yes, super happy to have you on the show. I want to thank Virgile Andreani for putting us in contact. So Virgile is a fellow PyMC developer. He's also working with us now at PyMC Labs. So it's really cool that he made that connection between us two, because I have so many questions for you today, and I think it's gonna be an original episode, because a lot of the things we're gonna talk about today, I think I've never really covered them on the podcast. So that is perfect. So thank you Virgile, and now, without further ado, let's dive in, and as usual, start with your origin story, Guillaume. How did you come to the world of statistics and probabilistic languages, and how sinuous of a path was it?
Guillaume Baudart:
Well, so thank you very much for having me. I really enjoy the show, so thank you very much. And to answer your question, it was definitely not a straight path. So as a kid, I just wanted to do music, and then for some reason I went to science and math, and I did a straight French curriculum in math and computer science.
Alex:
Mm-hmm.
Guillaume Baudart:
And then during my master, I tried to reconcile these two aspects. And I did a master in computer science, signal processing, acoustics, everything in science applied to music, basically. So that was at IRCAM, which is a small part of the Pompidou Center that is focusing on music and modern art.
Alex:
Mm-hmm.
Guillaume Baudart:
And so during this year, I played a little bit with machine learning, Gaussian mixtures and stuff like that. And I was working on this project called Antescofo, which is a music score follower, where basically you have live instruments that try to play with electronics.
Alex:
Uh huh.
Guillaume Baudart:
And this was the first time I saw some Bayesian methods, because the score follower, the thing that was trying to see where the instrumentalist was playing in the score,
Alex:
Mm-hmm.
Guillaume Baudart:
was implemented with some kind of particle filter or a Kalman filter. So I saw these vague notions, and I remember discussing that with a friend. And then I stopped, and I did a PhD on a completely different topic, which was on embedded systems, and at the end of my PhD I was hired by IBM Research.
Alex:
Mm-hmm.
Guillaume Baudart:
And after a few years at IBM Research, they asked us to look into probabilistic programming, because they said, hey, look, looks like it's a hot topic right now.
Alex:
Okay.
Guillaume Baudart:
It could be interesting for us. So can you, can you take a look?
Alex:
Mm-hmm.
Guillaume Baudart:
And this is where I started to actively learn about probabilistic programming, Bayesian methods and stuff like that. Starting from the very basics, like the survey papers. The first paper I read was the one by Andrew Gordon and co-authors that is called Probabilistic Programming. It
Alex:
Mm-hmm.
Guillaume Baudart:
was published in the Future of Software Engineering track at ICSE, or something like that.
Alex:
That's nice.
Guillaume Baudart:
And after a while, I was working mostly with Louis Mandel, and we had a similar background in embedded systems and reactive programming.
Alex:
Mm-hmm.
Guillaume Baudart:
And we had this idea that maybe we can combine probabilistic programming with reactive programming and see how far we can go. And this
Alex:
Mm-hmm.
Guillaume Baudart:
was the beginning of ProbZelus. And we're still actively working on that.
Alex:
Hmm, okay, nice. I didn't know about that background story. So, and what kind of music were you into? Like, what was your dream, ideal career path in music?
Guillaume Baudart:
Oh, for me,
Alex:
Yeah.
Guillaume Baudart:
right. So depending at what age, right,
Alex:
Yeah.
Guillaume Baudart:
when I was very little, I wanted to be a solo violinist. And then
Alex:
Okay.
Guillaume Baudart:
you quickly realize that you are not good enough.
Alex:
Mm-hmm.
Guillaume Baudart:
That's a good part with violin. And then later I really wanted to apply science around music. I built some stuff for musicians. something like that. And then I got more interested in the science part and programming languages and semantics and stuff like that.
Alex:
Hmm, yeah, this is so nice. Yeah, I mean, I understand, violinist, this is quite cool. Yeah, I'm always extremely impressed by this instrument and the ones who are able to play
Guillaume Baudart:
Yeah.
Alex:
it so masterfully
Guillaume Baudart:
I mean, you haven't heard me play it yet.
Alex:
Yeah, and also, I mean, the connection between math and music is actually quite close, right? Even though I would think that if you took some random person in the street, they would not necessarily make that connection very fast. But it's something that I often say, that music is actually very mathematical, and that kind of translates the beauty of math
Guillaume Baudart:
Yeah.
Alex:
in a way. So cool, thanks, I have a better idea now. And so, I think you already kind of answered that, but maybe let me ask you more formally: do you remember how you first got introduced to Bayesian methods?
Guillaume Baudart:
Yes, the first encounter was this score following system where you try to align a score with a live
Alex:
Mm-hmm.
Guillaume Baudart:
musician and you have some priors about... I mean the priors are basically the score, right,
Alex:
Uh-huh.
Guillaume Baudart:
where you expect the musician to be and you have an estimation of the tempo, the speed at which the musician is going and you want to, based on observations, the sound that you actually hear, you want to align the score with the...
Alex:
Ah, okay.
Guillaume Baudart:
the live performance.
Alex:
Nice,
Guillaume Baudart:
So that was
Alex:
okay, yeah.
Guillaume Baudart:
the first system.
Alex:
Yeah, very Bayesian indeed.
Guillaume Baudart:
Yeah
Alex:
Um, and so your work is interesting, but for me it's hard to define. So how would you define the work you're doing today? And also the topics that you are particularly interested in?
Guillaume Baudart:
Right, so right now, so I'm back in France after a few years in the US at IBM
Alex:
Mm-hmm.
Guillaume Baudart:
Research, and I'm in an Inria team that is working on reactive languages for embedded systems.
Alex:
Mm-hmm.
Guillaume Baudart:
So the idea is how you design your embedded systems and then build the entire compilation chain, so that you have strong guarantees on the code that executes on your embedded systems.
Alex:
Uh
Guillaume Baudart:
And
Alex:
huh.
Guillaume Baudart:
we are targeting critical embedded systems, like an airplane controller or something like that, where you really want to be sure of what you're doing. And on top of that, I want to add Bayesian inference. So I want embedded system designers to be able to write their own probabilistic models and
Alex:
Mm-hmm.
Guillaume Baudart:
use that as part of the design of their system.
Alex:
Okay.
Guillaume Baudart:
The interesting part is that you can use that. There are two ways to use a probabilistic model. You could
Alex:
Mm-hmm.
Guillaume Baudart:
use that to simulate your environment, because you have an open environment. So it's always useful to have something that gives some uncertainty. But you can also use that inside the system. And the classic example is that you want to implement a tracker for your position. You want to estimate your position from noisy observations.
Alex:
Mm-hmm.
Guillaume Baudart:
So the classic way to do that in an embedded system is to manually code some stochastic controller or some stochastic method, say a Kalman filter for instance.
Alex:
Mm-hmm.
Guillaume Baudart:
But what we want to do is to give the programmers the ability to write the underlying probabilistic models with all the uncertainty everywhere and
Alex:
Mm-hmm.
Guillaume Baudart:
let the compiler and the runtime do the hard job for you of computing the solution.
Alex:
Mmm.
Guillaume Baudart:
So this is why we are trying to develop ProbZelus. And of course, ProbZelus is just an academic prototype so far, but this is a long-term goal.
Alex:
Mm-hmm. OK. OK, yes. So that's a bit clearer. So actually, yeah, let's dive into ProbZelus. Can you give us an overview of ProbZelus and its significance in the field of programming languages?
Guillaume Baudart:
Yeah, so ProbZelus is a synchronous language that is extended with probabilistic constructs. And
Alex:
Mm-hmm.
Guillaume Baudart:
first I would like to mention that this is a joint project with a group at MIT, so the group of Mike
Alex:
Mm-hmm.
Guillaume Baudart:
Carbin at MIT, and most notably Eric Atkinson and Ben Sherman at the beginning, and then Charles Rahn and more recently Elitchan, who are working with us. And so... Maybe I should start by describing what a synchronous language is. So, yeah.
Alex:
Yes, for sure.
Guillaume Baudart:
So, the idea is that you want to program with streams, because you want to program systems that will never stop, right? So you can see your system as a stream processor. Everything is a stream, so instead of 42, you have 42, 42, 42 all the time.
Alex:
Mm-hmm.
Guillaume Baudart:
And everything is based on a discrete notion of time. So at each time step... you compute a new value for all your variables in your
Alex:
Mm-hmm.
Guillaume Baudart:
program. And so you compute streams of values that we sometimes call flows.
Alex:
Okay.
Guillaume Baudart:
And you build your program by assembling these stream processors. So for control engineers, there was this notation that was called block diagrams, where you could compose these blocks together, and on each arrow you have a stream, and each block is a stream processor.
Alex:
Mm-hmm.
Guillaume Baudart:
And so it comes from that. And synchronous languages are a very French notion: synchronous languages were introduced in the eighties by three teams in France concurrently. And the core idea was to restrict what you could express in the language, such that you have a very precise formal semantics for the language.
Alex:
Mm-hmm.
Guillaume Baudart:
And you can have a compilation chain that gives you very good guarantees on the code. And the main guarantee that you are looking for is execution with bounded memory. So
Alex:
Okay.
Guillaume Baudart:
you are guaranteed that whatever you do, you will never blow up the memory, which is useful because the system never terminates again. So it
Alex:
Hmm.
Guillaume Baudart:
should be able to keep running and running and running. Yeah, so this is synchronous languages, data-flow synchronous languages. And on top of that, we want to add the classic probabilistic constructs. So you want to be able to sample from a distribution, and you want some conditioning: so observe, observe a variable from a distribution.
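To make the stream-processor picture concrete, here is a minimal OCaml sketch (plain OCaml, not Zelus syntax) of a node as an initial state plus a step function that fires once per discrete time step with a fixed amount of internal state. The `running_sum` node and the three-step driver are purely illustrative.

```ocaml
(* a node: initial state + a step function executed once per time step *)
type ('state, 'input, 'output) node = {
  init : 'state;
  step : 'state -> 'input -> 'state * 'output;
}

(* example: a running sum, so for the constant stream 42, 42, 42, ...
   it produces 42, 84, 126, ... using only one integer of memory *)
let running_sum : (int, int, int) node = {
  init = 0;
  step = (fun acc x -> let acc' = acc + x in (acc', acc'));
}

(* drive the node for a few steps; a real reactive system would loop forever *)
let () =
  let _, outputs =
    List.fold_left
      (fun (st, acc) x ->
        let st', y = running_sum.step st x in
        (st', y :: acc))
      (running_sum.init, [])
      [42; 42; 42]
  in
  List.iter (Printf.printf "%d\n") (List.rev outputs)
```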
Alex:
Mm-hmm.
Guillaume Baudart:
And then you want to infer the posterior distribution from prior observation. And so if you think about the difference with general probabilistic languages,
Alex:
Mm-hmm.
Guillaume Baudart:
now what you want to do is to infer a stream of distributions. So at each time step, based on some observations, you want to compute the posterior at that given time step. And a key property of ProbZelus is that you want a reactive system, so you want to be able to interact with intermediate results of the inference. And this is different from most probabilistic languages, where you have a bunch of observations and you want to compute the posterior, and you can throw away intermediate results, like you don't really care. Here we want to observe them, and potentially we want to react to what we've seen,
Alex:
Mm-hmm.
Guillaume Baudart:
which might change future inference. So if you think about a tracker for instance, you are estimating your position and based on this estimation you can take one direction or another which would change the next observation. So we called that inference in the loop and it was something that was really at the heart of the design of the language.
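To illustrate the inference-in-the-loop idea described above, here is a schematic OCaml sketch where, at every time step, an estimate is updated from a noisy observation and immediately used to pick an action, and that action changes what will be observed next. The smoothing update, the controller and all the constants are placeholders for the sketch, not the ProbZelus runtime.

```ocaml
let rec run ~belief ~pos ~t =
  if t >= 100 then ()
  else begin
    (* noisy observation of the true position *)
    let obs = pos +. (Random.float 1.0 -. 0.5) in
    (* toy estimate update; a real system would maintain a full posterior *)
    let belief' = belief +. 0.5 *. (obs -. belief) in
    (* the controller reacts to the current estimate... *)
    let action = if belief' > 0.0 then (-0.1) else 0.1 in
    (* ...and that action changes what will be observed at the next step *)
    run ~belief:belief' ~pos:(pos +. action) ~t:(t + 1)
  end

let () = Random.self_init (); run ~belief:0.0 ~pos:5.0 ~t:0
```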
Alex:
Hmm.
Guillaume Baudart:
And then to talk a little bit about the inference. So we have this language that is a bit weird. And then we had to come up with an inference algorithm that would work with all these constraints. And we ended up choosing sequential Monte Carlo methods that
Alex:
Mm-hmm.
Guillaume Baudart:
you talked about a lot with Nicolas Chopin in a previous episode. And this was exactly because of what I just said: you get access to these intermediate results. And yeah, so this makes sense. And so we were able to build a first prototype with basic sequential Monte Carlo, like a basic bootstrap filter, relatively quickly. And
Alex:
Mm-hmm.
Guillaume Baudart:
we were really happy that it worked. And we focused on the semantics for a while. And this was a little bit tricky. But then Eric and Ben read this paper from Sweden, I think, which was about automatic Rao-Blackwellization of sequential Monte Carlo methods. So the idea is that you want to do symbolic computation as much as possible, because a basic particle filter doesn't scale really well and the performance is not that great, especially for big models like here.
Alex:
Mm-hmm.
Guillaume Baudart:
And... There was this paper that was called, I think, Automatic Rao-Blackwellization something something, by Lawrence
Alex:
Mm-hmm.
Guillaume Baudart:
Murray, Daniel Lundén and David Broman. And the idea is that you have this runtime that is trying to do symbolic computation as much as possible, so trying to apply probability theory as much as possible, mostly conjugate prior rules, and when that fails you fall back on a particle filter, so you sample some of the variables and you try to keep on working.
Alex:
Okay.
Guillaume Baudart:
And then there was the design of this algorithm for ProbZelus, because we wanted to adapt this algorithm such that it would run in bounded memory. So this made the first true prototype for ProbZelus.
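To give a flavour of this semi-symbolic approach, here is a hedged OCaml sketch: a random variable stays in closed form (a Gaussian mean and variance) as long as the conjugacy rules apply, and is only collapsed to a concrete sample when something non-conjugate forces it. This only conveys the spirit of delayed sampling; it is not the ProbZelus data structure.

```ocaml
type rv =
  | Gaussian of float * float   (* symbolic: mean, variance *)
  | Value of float              (* sampled: the fallback representation *)

(* conjugate Normal-Normal rule: observing y ~ N(x, obs_var) with a
   Gaussian prior on x gives a Gaussian posterior in closed form *)
let observe_gaussian ~obs_var ~y = function
  | Gaussian (mu, var) ->
      let k = var /. (var +. obs_var) in
      Gaussian (mu +. k *. (y -. mu), (1. -. k) *. var)
  | Value v -> Value v          (* already collapsed: nothing symbolic left *)

(* fallback: draw a concrete sample (crude Box-Muller, just for the sketch) *)
let force = function
  | Value v -> v
  | Gaussian (mu, var) ->
      let u1 = 1.0 -. Random.float 1.0 and u2 = Random.float 1.0 in
      mu +. sqrt var *. sqrt (-2. *. log u1) *. cos (2. *. Float.pi *. u2)
```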
Alex:
Hmm, nice. Okay. And so, to be clear, we'll put the GitHub repo, of course, in the show notes of ProbZelus. And for people who are interested in this, how can they run it?
Guillaume Baudart:
Oh, so everything is open source, so it's on GitHub.
Alex:
Mm-hmm.
Guillaume Baudart:
So you can find the prototypes. There is a bunch of examples and a bunch of benchmarks also, if you want to try that. You do have to get used to the syntax. So it's probably easier for people who are familiar with OCaml, because, yeah, everything is written in OCaml. But yeah, you can try it. There is no need to install anything apart from OCaml
Alex:
Hmm,
Guillaume Baudart:
and ProbZelus.
Alex:
okay. Yeah. Nice. And do you need, are you at the stage where you need some people to come in and do some pull requests or are you still at a very early stage of the development?
Guillaume Baudart:
I would say we are in the middle of that. So the repo is not really polished, in the sense that there is a lot of missing stuff in the doc, and it's mostly, as I said, an academic prototype, but it's always nice to have pull requests and contributors. So
Alex:
Mm-hmm.
Guillaume Baudart:
if people want to take a look, please do.
Alex:
Nice, yeah, so folks, we'll put that into the show notes, and feel free to check it out and send some pull requests to the ProbZelus team. And, oh yeah, actually, so you talked a bit about the different advantages that ProbZelus would offer, especially if you're working on that kind of topic. And before we dive a bit more into that, because this is a very technical use case, I would be interested if you could explain a bit more now. Something that I've read in preparing this episode is that there's this concept of synchronous languages that you talked about, and probabilistic synchronous languages, which is ProbZelus. And I saw that it relates to hybrid systems modeling. So can you talk a bit about that, so that listeners get a better idea of how and when these languages would be useful?
Guillaume Baudart:
Right, so as I said, synchronous languages, data-flow synchronous languages: the original language for that was called Lustre, or "Luster"
Alex:
Uh huh.
Guillaume Baudart:
for English speakers, in the 80s. And then more recently, there was a lot of work by Marc Pouzet and Timothy Bourke and co-authors about the programming language Zelus, which is also a synchronous language, but that was extended with dynamic... like continuous dynamics. The
Alex:
Mm-hmm.
Guillaume Baudart:
idea is that you want in the same language to write your discrete controller
Alex:
Mm-hmm.
Guillaume Baudart:
using your data flow stream processors. But you also want to simulate an environment. And the physical environment is typically described with ordinary differential equations.
Alex:
Mm-hmm.
Guillaume Baudart:
And so, Zelus was a huge effort to be able to do everything in the same language. And the compiler would then separate the part that talks about the environment from the part that is the discrete controller, link with ODE solvers, and run the entire simulation. So what you have in Zelus is that you really have a full synchronous language in the classic sense, that
Alex:
Mm-hmm.
Guillaume Baudart:
is discrete time and with a stream combinator and stuff like that, but you
Alex:
Mm-hmm.
Guillaume Baudart:
also have the ODEs. And ProbZelus is focusing only on the discrete part so far, because it was really hard enough to extend
Alex:
Heh.
Guillaume Baudart:
everything and to do the
Alex:
Yeah.
Guillaume Baudart:
language design. But this is also a long-term goal: how can we reconcile the two worlds, probabilistic programming on one hand and continuous dynamics on the other hand? So we played a little bit with that, and I think if you look at the repo there are some examples that are already there. But it's really toy examples, like you launch a bouncing ball and you are just listening to the tick that the ball makes when it rebounds, and you are trying to compute the gravity based on that.
Alex:
Oh, nice.
Guillaume Baudart:
So yeah, so this kind of toy.
Alex:
Ah, okay, nice. And so this kind of example would be what you would describe as a hybrid system.
Guillaume Baudart:
Yes, so basically you embed an ordinary differential equation inside the probabilistic model. And so, yeah, another way to see it is that you, like for probabilistic programming in general, what you are describing is a generative process. And
Alex:
Mm-hmm. Yeah.
Guillaume Baudart:
for this generative process, you need to solve an ordinary differential equation. And so, yeah, it's a really difficult problem.
Alex:
Which is usually very hard.
Guillaume Baudart:
Yeah. And so for the toy examples, we limit ourselves to very simple inference algorithms for now,
Alex:
Mm-hmm.
Guillaume Baudart:
and we also limit ourselves to a particular kind of differential equations, where the probabilities only occur on discrete events, so it's easier to explore this.
Alex:
I see, OK. Yeah, because ODEs are usually really nasty
Guillaume Baudart:
Yeah.
Alex:
to infer, right? At least in the Bayesian framework. I don't know in other frameworks, but at least in the Bayesian framework, it can get very nasty to estimate.
Guillaume Baudart:
Yeah, that's why I said it's a long-term goal. It's like we played around and there is everything that you need in the language, like for a programmer point of view, but there is everything to do for the runtime and the compilation.
Alex:
Yeah, yeah, I mean, I've talked a bit about that. We've had Dimitri on the show, I don't remember the number of the episode, but I will put that in the show notes. And also Adrian Seyboldt, episode 74, if I remember correctly. Adrian created a Python library called sunode that plays well with PyMC, of course, because Adrian is a PyMC dev. So I'll put that in the show notes also. I know Adrian does a lot of work on that, and he's usually very keen to work on very hard mathematical problems, so if he's working on that, you can be sure that it's because it's interesting mathematically.
Guillaume Baudart:
Yeah.
Alex:
Maybe can you just sum up for the listeners why ODEs are so hard to infer in the Bayesian inference framework?
Guillaume Baudart:
Right, so in general, yeah, from a mathematical point of view, I'm not sure I have anything interesting to say here, but for our prototypes, you can imagine that you have a particle filter. So the way you do inference is that you simulate a bunch of parallel executions of the same generative process, and you accumulate all these particles, and that gives you an approximation of the posterior distribution.
Alex:
Mm-hmm.
Guillaume Baudart:
And if you think about the ODE, you need to solve the ODE, so you need to embed one ODE solver per particle. So assuming
Alex:
Oh yeah.
Guillaume Baudart:
you need 10,000 particles to get reasonable results, you need
Alex:
Mm-hmm.
Guillaume Baudart:
10,000 instances of an ODE solver and it quickly explodes.
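A tiny OCaml sketch of the cost problem just described: every particle has to carry and advance its own solver state, so even a one-line Euler integrator (standing in for a real ODE solver) has to run once per particle per time step. The dynamics, particle count and constants are made up.

```ocaml
type particle = { weight : float; ode_state : float }

(* one explicit Euler step of dx/dt = f x, advanced independently per particle *)
let euler_step ~dt ~f p = { p with ode_state = p.ode_state +. dt *. f p.ode_state }

let step_all ~dt ~f particles = Array.map (euler_step ~dt ~f) particles

let () =
  Random.self_init ();
  (* 10,000 particles means 10,000 independent solver states to advance *)
  let particles =
    Array.init 10_000 (fun _ -> { weight = 1.0; ode_state = Random.float 1.0 })
  in
  let _advanced = step_all ~dt:0.01 ~f:(fun x -> -.x) particles in
  ()
```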
Alex:
Yeah. Heh. For sure.
Guillaume Baudart:
And the other thing is that for the semi-symbolic method that we are focusing on for ProbZelus, at this point we have no idea how to derive the maths. If you have simple conjugate relationships between pairs of random variables, or if you have a tree of things where you know that you will be able to compute a closed-form solution,
Alex:
Mm-hmm.
Guillaume Baudart:
then you can work with that. But if you need to solve an arbitrary... ordinary differential equation in the middle of that. That's an entirely open question.
Alex:
Hmm, yeah, yeah, thanks for that small summary, and totally on the go by the way, I didn't ask Guillaume to prepare for that at all, so well done. And so, now that we have the background a bit... this "hybrid" is extremely hard to say for a French person, so I'm just gonna say hybrid. So, hybrid systems. How does ProbZelus differ from other probabilistic programming languages in that respect?
Guillaume Baudart:
Right, so there are a couple of points here. So one thing is that, as I said, you have inference in the loop. That is the main difference with a general probabilistic language. If you think about Stan, for example, your entire program is a model,
Alex:
Mm-hmm.
Guillaume Baudart:
Here we want to interleave deterministic components with probabilistic components and there is a possible interaction between them. Think about the tracker model and a controller of a robot and you want
Alex:
Mm-hmm.
Guillaume Baudart:
the controller to go somewhere using the estimations. The other thing is that we have some constraints, as I said, like we want to execute in bounded memory, and we want to be able to access intermediate results, so that drastically limits the inference algorithms that we're able to use.
Alex:
Mm-hmm.
Guillaume Baudart:
That's why we ended up with SMC. And it's still an open question whether we can explore other things, like what could be... what would be a good other option for that. And the last thing is that we want to be able to check in the same line of thoughts as synchronous languages, we want to be able to check some constraints. We want to be able to check that the program will run in bounded memory.
Alex:
Mm-hmm.
Guillaume Baudart:
And if you remember, we have this weird runtime where we are trying to do symbolic computation.
Alex:
Mm-hmm.
Guillaume Baudart:
And this is using a huge dynamic structure that is basically trying to build a Bayesian network at runtime, and trying to simplify this Bayesian network as you go, and sample some variables if you fail.
Alex:
Mm-hmm.
Guillaume Baudart:
And checking that this executes in bounded memory is actually tricky. And what we did, and it was mostly Eric Atkinson ideas at the beginning, was to define static analysis that is able to look at your program and tell you before running it whether it will run in bounded memory or not using...
Alex:
Oh nice.
Guillaume Baudart:
our algorithm.
Alex:
Yeah.
Guillaume Baudart:
So, yeah, that was a very, very interesting work. And the main idea is that, as I said, you are building these Bayesian networks at runtime, and it's possible that you will have infinite paths, like as you go you keep adding some random variables, and you want to be sure that at some point you can collect some of the nodes, like you can remove some of the nodes from the graph safely, because you have no way to access them anymore. And yes, you need to define some conditions on your program such that this is always true.
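Here is a purely illustrative OCaml sketch of that bounded-memory concern: the runtime graph of random variables keeps growing, and nodes the program can no longer reach must eventually be collectable. The association-list graph and the `prune` function are our simplification, not the ProbZelus implementation.

```ocaml
module IntSet = Set.Make (Int)

(* the runtime graph: node id -> ids of its parent random variables *)
type graph = (int * int list) list

(* transitive closure of the parent relation, starting from the live roots *)
let rec reachable (g : graph) (roots : IntSet.t) : IntSet.t =
  let bigger =
    IntSet.fold
      (fun id acc ->
        match List.assoc_opt id g with
        | Some parents -> List.fold_left (fun s p -> IntSet.add p s) acc parents
        | None -> acc)
      roots roots
  in
  if IntSet.equal bigger roots then roots else reachable g bigger

(* keep only the nodes the program can still reach; the rest can be collected *)
let prune (g : graph) ~(live : int list) : graph =
  let keep = reachable g (IntSet.of_list live) in
  List.filter (fun (id, _) -> IntSet.mem id keep) g
```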
Alex:
Hmm. Okay.
Guillaume Baudart:
And of course it's a static analysis, so you have false positives. I mean,
Alex:
Yeah.
Guillaume Baudart:
you can guarantee that something will run in bounded memory, but for some programs, they actually run in bounded memory, but we are not going to run them.
Alex:
Hmm. Yeah, damn. Yeah, that's such a hard use case, because you not only have to worry about the sampling time and the algorithm and just the efficiency and accuracy of the algorithm, but also the memory load is really something extremely important in this case. Damn. Yeah. No. No.
Guillaume Baudart:
Yeah, there is also something that we didn't touch about that. It's
Alex:
Uh
Guillaume Baudart:
a
Alex:
huh.
Guillaume Baudart:
worst case execution time, which is something really important for embedded
Alex:
Mm-hmm.
Guillaume Baudart:
systems. And here we haven't started working on that. Like how costly are the inference algorithms? This is also
Alex:
Yeah,
Guillaume Baudart:
good.
Alex:
I see. Yeah, yeah. Super interesting. And, yeah, actually, let's try now to dive a bit more concretely, now that we have a better idea of the context and the implementation. Can you tell us some real-world applications of ProbZelus and its ability to simulate these systems that you were talking about? Oh, no, maybe just before that, actually, a distinction that I think we should make is basically the ability to simulate dynamics in systems with smooth dynamics and systems with discrete dynamics, because I saw it was something, also while reading up for the episode, that could be important. Is that something that you have? Yeah, can you give us a background about that maybe?
Guillaume Baudart:
Yeah, so... Yes, in ProbZelus we are still in a discrete world. Like we can... Actually, I shouldn't say that, because there are two notions of continuous here, right? So
Alex:
Haha. Heh.
Guillaume Baudart:
when I say we are in a discrete world, it means that we have a discrete notion of time and at each time step, like you compute streams of values or streams of distributions, but you have a notion of what is the value of each stream at each time step, right?
Alex:
Uh huh. Okay.
Guillaume Baudart:
Then... You can program your probabilistic models using continuous or discrete distributions. This is something you can do. But so far, if you want to do some simulation of physics or things like that, you need to discretize it in ProbZelus.
Alex:
Mm-hmm.
Guillaume Baudart:
That being said, since ProbZelus is built on top of the programming language Zelus,
Alex:
Mm-hmm.
Guillaume Baudart:
You can express some continuous component and use that on the side to generate some observations that you feed to your model, but the model itself needs to be discrete.
Alex:
Okay, I see.
Guillaume Baudart:
And so to answer your question about concrete applications, what we did to play and experiment with ProbZelus was starting with use cases from robotics. And so
Alex:
Mm-hmm.
Guillaume Baudart:
we had this use case where we wanted to implement a non-trivial robot controller. And so we implemented a linear quadratic regulator, which is known to be optimal for some metrics. And the idea is that we wanted to use a tracker model that is implemented as a Kalman filter, so estimating your position from noisy observations at each time
Alex:
Mm-hmm.
Guillaume Baudart:
step, and feed that to the LQR and see whether you can reach the objective or not. So this was one use case. And another application is that we built a toy version of the SLAM problem, the simultaneous localization and mapping problem. So very briefly, you have an agent that is trying to estimate its position and a map of its environment, right? And in the discrete version, the environment is just a grid that is painted black or white,
Alex:
Mm-hmm.
Guillaume Baudart:
and the agent can look at its feet and see white or black, and move to another cell. And the problem is that the sensor is faulty, so sometimes you see white and the square is actually black. And also you have slippery wheels. So sometimes you say, I want to go up, and you actually stay in place. Or go right, and stay in place.
Alex:
Hmph.
Guillaume Baudart:
And given that, you want to estimate both your current position and your view of the map. And the interesting part in ProbZelus is that we were also able to build a simple controller that was trying to explore the part of the map with the most uncertainty first, because you have this interaction between the result of the inference and the controller.
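To make the toy SLAM example concrete, here is a small OCaml sketch of the generative process only, on a 1-D strip instead of a grid: the wheels sometimes slip and the colour sensor sometimes flips its reading. The probabilities and the map size are invented for the illustration; inference and the exploring controller are not shown.

```ocaml
(* made-up constants: slip probability, sensor flip probability, map size *)
let slip_prob = 0.1
let flip_prob = 0.1
let size = 10

let clamp i = max 0 (min (size - 1) i)

(* one step of the generative process: try to move, maybe slip, then read
   the colour of the cell under the agent through a faulty sensor *)
let step ~(map : bool array) ~(pos : int) ~(move : int) =
  let pos' = if Random.float 1.0 < slip_prob then pos else clamp (pos + move) in
  let true_colour = map.(pos') in
  let observed = if Random.float 1.0 < flip_prob then not true_colour else true_colour in
  (pos', observed)

let () =
  Random.self_init ();
  let map = Array.init size (fun _ -> Random.bool ()) in
  let pos, obs = step ~map ~pos:0 ~move:1 in
  Printf.printf "pos=%d observed=%b\n" pos obs
```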
Alex:
Mmm, okay. Yeah, this is so nice. Is there any paper or video or tutorials that you can add to the show notes? So, demonstrating that.
Guillaume Baudart:
ProbZelus was published at PLDI.
Alex:
Mm-hmm.
Guillaume Baudart:
In this paper we described the robot example, I think,
Alex:
Ah, perfect.
Guillaume Baudart:
and the static analyses. So this one was at OOPSLA.
Alex:
Okay. Yeah, for sure, I'll add these two to the show notes. Pretty sure listeners will appreciate that. And another thing I was really curious about when preparing for your episode was that ProbZelus can be used to prove the absence of bugs in a complex system, if I understood correctly. So basically, like trying to prove that there are no bugs in the autopilot of a plane. So did I understand that right? And if yes, how is that possible? Because it seems really hard to me to be able to prove a negative.
Guillaume Baudart:
Yeah, so I don't think this is quite right, but this is an interesting question.
Alex:
Okay. Yeah, feel free to say I'm wrong. Like I'm wrong a lot of times. So it's no problem.
Guillaume Baudart:
No, it's probably a misadvertisement, I guess. But the motivation for this kind of language is putting aside the probabilistic part, is that what you write is really close to the mathematical semantics. So
Alex:
Mm-hmm.
Guillaume Baudart:
it's really close to the mathematical objects, right?
Alex:
Okay, yeah.
Guillaume Baudart:
And then... The idea behind the work on synchronous languages is that the compilation chain is correct by construction. So what you execute is exactly what you want. And for classic synchronous languages, there are some proof assistants or model checkers that are dedicated to these languages, or to programs written in these languages. Because, again, the expressivity is very restricted, the problem can be a little bit easier. So in some cases you are able to prove some properties. So not the absence of bugs, but you can prove some properties, like, say, that an alarm will always go on, or that you never have buffer overflows, or things like that. And
Alex:
Mmm.
Guillaume Baudart:
so in the same vein, we do have static analyses that guarantee a bunch of stuff, and the work on static analysis to prove bounded-memory execution is part of that. Like, we can prove that there is no bug with respect to execution in bounded memory. But that's...
Alex:
I think, okay.
Guillaume Baudart:
that's pretty much it.
Alex:
Okay, but so that would be something that could be tried, that you could try to use to try and see if there is no bug in an autopilot basically.
Guillaume Baudart:
Yeah, so there is some work. Like, there is a model checker that is called Kind 2, for Lustre-like languages, like synchronous
Alex:
Uh-huh.
Guillaume Baudart:
languages, and you can try to write your autopilot program then write the properties that you want to be true for this program and see if you can automatically prove them. And so check if there are mistakes or not. What we haven't tried is to use that on probabilistic models and what do we need? what do we need to do to extend these tools to be able to verify probabilistic models. And this is an entire new area of research,
Alex:
Hmm.
Guillaume Baudart:
like doing the kind of work that people do with proof assistants and model checkers, but manipulating probabilities. I think it's fascinating, and I haven't looked at that yet. I know that there is a very recent paper about the Stan compiler in Coq, by Jean-Baptiste Tristan and Joseph Tassarotti at Boston College. If people are interested, it's probably something to check.
Alex:
Yeah, for sure. And again, I'll add that to the show notes, because that's definitely something super interesting, and I'm pretty sure listeners will be interested. And if you are, folks, let me know, and I could invite some of the authors of these analyses to the show to do some follow-up episodes. Now we have an idea of what ProbZelus is about and what it allows you to do. Now I'm curious about the frontier, basically, of the package, and the challenges and limitations that practitioners will face, or are facing right now, when working with a language like ProbZelus.
Guillaume Baudart:
Yeah, so there are actually two questions here, right? So as a practitioner... As a practitioner, ProbZelus is a prototype, so you can run examples and try to simulate some examples on your own laptop, but it's
Alex:
Uh-huh.
Guillaume Baudart:
not at a state where you can embed the code. And actually, I come from the community of programming language design and semantics and stuff like that, and the low-level aspects of what you need to do to output low-level code is not something that we focused that much on. So right now the prototype is emitting OCaml code, so it's relatively high level, and there is still a lot of work to do to go from OCaml to the low-level C or whatever code that you want to embed.
Alex:
I see.
Guillaume Baudart:
So, yeah, what we wanted to do is to have something to be able to run some experiments and prove some ideas and see what we were able to express. So it's not something that you can already use to program your own embedded system. That being said, a lot of ideas that originated from our team, and mostly from Marc Pouzet when he was working on Lustre and Lucid Synchrone and then Zelus, were adopted by Ansys, and there is an industrial tool that is called SCADE that is used, for example, at Airbus to program airplane controllers, but not with the probabilistic part of course.
Alex:
Hmm, okay.
Guillaume Baudart:
And for researchers, I think there are a lot of interesting questions. So one thing that was really tricky for us was to define the semantics of the language. So what are the mathematical objects that you manipulate with the language? Because we have this notion of streams and you want to be able to combine streams inside the probabilistic world. So what does that even mean? And you want to add this inference in the loop. So what does that mean? To look at intermediate results.
Alex:
Mm-hmm.
Guillaume Baudart:
So defining the semantic was tricky, so that's one point. And then the restrictions that we have because of the settings are also imposing new constraints. So the main example is bounded resources.
Alex:
Hmm. Yeah, so like, definitely something that will be on the roadmap for a long time.
Guillaume Baudart:
And the last thing I wanted to say is for practitioners, one thing that is very tricky is to convince people that it's actually useful. Because
Alex:
Ah,
Guillaume Baudart:
like
Alex:
okay.
Guillaume Baudart:
embedded system designers, they are used to uncertainty, right? We do have airplanes everywhere that are flying around, and they do evolve in an uncertain world with noisy environments. They do encounter birds sometimes, and they have to react to that. So how do they do that? And the answer is that they are manually coding some stochastic controllers that are proved to be safe on pen and paper, but they are deriving the equations by hand and they are programming that. So if you talk to an embedded system designer, they are used to, say, Kalman filters, for example, they know how to implement that, and
Alex:
Yeah.
Guillaume Baudart:
they say: this is all I need to estimate a position, and I know how to derive the equations, so why should I write the probabilistic model? And the point we want to make is that by making the underlying probabilistic model completely explicit, you make explicit all your assumptions about the environment, and that's actually better and easier to modify, and it's more modular, and so on.
Alex:
Yeah, that's interesting, because it's also a general remark, I would say, about Bayesian statistics.
Guillaume Baudart:
Yeah.
Alex:
It is definitely something I have to say a lot also in a lot of my courses and teachings and consulting work, where it's definitely a lot of this. Why would... that kind of method be useful? Which is a good question. I mean, I like that question. And yeah, usually the explicit modeling and explicitation of your priors, which are just assumptions to me is something that makes that framework really powerful.
Guillaume Baudart:
Yeah, and the other thing is that sometimes it's difficult to... I mean, it's... Like, there is a step that people need to make when they are trying to use Bayesian methods, right?
Alex:
Mm-hmm.
Guillaume Baudart:
They need to think about the generative process and stuff like that, and it's not completely natural, at least for some people. And it's difficult to convince them that it's actually better than writing the solution. And... I mean, the Canon filter is a perfect example because you can always say, oh, but you are already doing that. Like it's just the same direction.
Alex:
ty years, so beginning of the...
Guillaume Baudart:
Yeah, so that's great questions. So yeah, so this started when I went back to France. So I was contacted by Christine Tasson,
Alex:
Mm-hmm.
Guillaume Baudart:
who was working on probabilistic programming, but from a really semantics point of view, like denotational semantics and category theory and stuff like that. And she wanted to build a proposal for a course at the MPRI. It's the Parisian master for research in computer science. Something like
Alex:
Mm-hmm.
Guillaume Baudart:
that. Maybe there is an official acronym in English, I should check. And yeah, so it's not only ENS students. There are also people from Paris 7 university. Oh, it's not Paris 7 anymore, it's Université Paris Cité. And Polytechnique and Paris-Saclay. People from all backgrounds. These are students with a very strong background in theoretical computer science and mathematics. And so the emphasis for the course was really to do something that was in the middle between a practical implementation and semantics: denotational semantics, typing, category theory and programming language theory. I think it got accepted, like the proposal got accepted, because probabilistic programming is getting more and more traction.
Alex:
Mm-hmm.
Guillaume Baudart:
And you can see like there is a boom in the number of published papers on the topic. So that must have been one reason.
Alex:
Mm-hmm.
Guillaume Baudart:
And the other thing is that there were other courses that were teaching a little bit of probabilistic programming, like for example there is probabilistic lambda calculus or like another course on denotational semantics that were presenting this probabilistic programming as a use case,
Alex:
Mm-hmm.
Guillaume Baudart:
not really a use case but part of the course. And what we wanted to do is to propose something that was really focused on Bayesian inference. How do you reason about that?
Alex:
Hmm.
Guillaume Baudart:
The idea was to divide the course between Christine and me. So Christine was doing the denotational semantics, like the really abstract, formal part of the course. And I really wanted to give the students a taste of how you can implement these things, and what we can use from programming language theory that can help us. And so, for example, this includes building a type system for a probabilistic language, and then how to use functional programming to implement your own runtime libraries with your own inference algorithms and
Alex:
Mm-hmm.
Guillaume Baudart:
things like that. And progressively adding more and more to the language design. And the last thing is that we wanted to give them a taste of a lot of different probabilistic programming languages. And so we tried to build some labs that are using a lot of different flavors of probabilistic languages. So like Stan, Pyro, not PyMC yet, but
Alex:
Not PyMC, what's up?
Guillaume Baudart:
maybe later. I did use an example from Bayesian Methods for Hackers, that was written in
Alex:
Oh
Guillaume Baudart:
PyMC.
Alex:
yeah. Yeah.
Guillaume Baudart:
But it was for the first course, so I wanted them to implement that in their own PPL.
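As a rough idea of the kind of lab exercise described here, this OCaml sketch is about the smallest PPL one can write: `sample` draws from the prior, `observe` accumulates a log-likelihood on a handle, and inference is plain importance sampling. The design and the names are ours, not the course material.

```ocaml
type handle = { mutable log_weight : float }

(* `sample` just draws from the prior; `observe` scores the current run *)
let sample _h draw = draw ()
let observe h ~log_pdf = h.log_weight <- h.log_weight +. log_pdf

(* importance sampling: run the model n times, keep (value, weight) pairs *)
let infer ~n model =
  List.init n (fun _ ->
      let h = { log_weight = 0.0 } in
      let v = model h in
      (v, exp h.log_weight))

(* example model: uniform prior on a coin's bias, one observed head *)
let coin h =
  let p = sample h (fun () -> Random.float 1.0) in
  observe h ~log_pdf:(log p);
  p

let () =
  Random.self_init ();
  let runs = infer ~n:10_000 coin in
  let num = List.fold_left (fun a (p, w) -> a +. p *. w) 0.0 runs in
  let den = List.fold_left (fun a (_, w) -> a +. w) 0.0 runs in
  (* the posterior mean of the bias should come out close to 2/3 *)
  Printf.printf "posterior mean: %f\n" (num /. den)
```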
Alex:
Yeah, I see. Nice, that's super cool. And of course, if you need any help with the PyMC examples, feel free to reach out, I will be happy to help. As long as the program is not written in French, because I cannot code in French. Ha ha ha.
Guillaume Baudart:
It's not in French.
Alex:
Ha ha ha. Yeah, no, that's cool. And so, who's the audience of that course? What level are they, and what are they studying as a major?
Guillaume Baudart:
Right, so it's a second year of master's and it's a master's where students can choose whatever they want in a relatively huge set of different courses. But
Alex:
Mm-hmm.
Guillaume Baudart:
most people that were attending the course were focusing on functional programming, static analysis, programming language semantics and things like that. And so there was a... A lot of effort at the beginning of the course was to explain what is Bayesian inference and why does it work backward compared to what we are used to. And this is why I said there is a gap. You need to cross that gap and it's not obvious for a lot of people. How does that work and why are we asking questions about outputs and who is giving us the... this output from a Quran.
Alex:
Mm-hmm. I see. And is that one of the main difficulties you saw that students were having?
Guillaume Baudart:
That's one difficulty, and
Alex:
Mm-hmm.
Guillaume Baudart:
so really understanding what is inference, like what is Bayesian inference. Another difficulty is that what is the difference between just a Bayesian model where you have your prior, and then
Alex:
Mm-hmm.
Guillaume Baudart:
you do some observations and you extract the posterior, and probabilistic programming, which is basically... like Bayesian modeling but on steroids, right? Because
Alex:
Mm-hmm.
Guillaume Baudart:
you can write a probabilistic program that has no inputs and that computes a really weird distribution using Bayesian inference. And it's not like the classic coin example, right? So there is no real notion of inputs or outputs, but you can still use these objects and define weird models. And the question then is, can we give a precise mathematical sense to this kind of object, what is it, and how can we manipulate them? So this is one difficulty. And the second difficulty is that the semantics that we present are based on measure theory. And it's not completely the case right now, but it's still not something that is heavily studied in France, and it's possible to go around it in your curriculum. When I was growing up, I think I had one course about measure theory in my undergrad and that was it. It was really a black hole.
Alex:
Yeah.
Guillaume Baudart:
No, it's not the case anymore, but for some of the students it's something that they have relatively little knowledge about. That
Alex:
Yeah,
Guillaume Baudart:
can also be true.
Alex:
for sure. I can understand that. And so I'm wondering if there is one main thing that you want to adapt for next semester or next year and the next time that you're going to teach that course, based on that feedback that you already got from the first iteration.
Guillaume Baudart:
Um... That's a very good question, I haven't thought about that at all.
Alex:
Haha.
Guillaume Baudart:
One thing that I want to adapt for sure is projects, so we typically give a project every year and I want the students to be confronted to this notion of inference and inference algorithm relatively early in the course and so I want them to try to program it as early as possible.
Alex:
Mm-hmm.
Guillaume Baudart:
And I would like to, yeah, I want them to have a connection between this kind of inference algorithm and what is really implemented in practice in actual programming languages like PyMC or Pyro or Edward or things like that. And so last year it was really like: implement whatever inference you want and try that on a simple model. And I want to do something more guided and maybe a little bit more advanced this year.
Alex:
Nice, okay, well, interesting. Really curious about how that will go.
Guillaume Baudart:
Yeah.
Alex:
So, now that we're almost at the end of the show, let's kind of de-zoom a bit. I'm curious, in your opinion, what are the most exciting developments that you see in the field of probabilistic programming languages?
Guillaume Baudart:
Yeah, that's a very good question because it's a really active field, like in many different directions. So I will probably focus on a few of them. One thing is that all these new advances on the inference front, and so there are these things, a lot of work about synthesizing variational guides from your program automatically. If you think about the Pyro language, at the beginning you needed to write by hand your inference guide, and it was looking really similar most of the time to the model, at least for basic usage.
Alex:
Mm-hmm.
Guillaume Baudart:
And they had this zoo of autoguides that they implemented relatively recently, like one or two years ago, where they started to try to automatically synthesize some guides, and there are some new works about that that are published, even this year, using programming language theory to improve the synthesis process. So this is really interesting, and this is to be put side by side with the new normalizing-flow variational inference, where you can have amazing results. So I think
Alex:
Hmm.
Guillaume Baudart:
this is really exciting. I really want
Alex:
Oh yeah,
Guillaume Baudart:
to know
Alex:
for sure.
Guillaume Baudart:
more about that. The other thing is that, for the practitioner, there is this entire line of work about the Bayesian workflow, where the question is: okay, you have your probabilistic models, but how do you come up with that? What do you need to be able to iterate over your models, to do better approximations and better inference and stuff like that? I think this raises a lot of very interesting questions for language design. What can you do to help programmers with this kind of workflow?
Alex:
Okay
Guillaume Baudart:
That for me as a programming language designer, this is really interesting.
Alex:
Yeah.
Guillaume Baudart:
And finally, for the semantics, like on the abstract level, there is all this work about the new quasi-Borel spaces and probabilistic computations, where people are trying to define measures and sigma-algebras on very weird sets, like higher-order functions and things like that. And this line of work is super interesting. Fascinating and nice. Yep.
Alex:
Yeah, yeah, agreed. Definitely some good stuff ahead. Yeah, and then actually, if we now go back to ProbZelus a bit, if you look ahead, I'm curious what future directions or research areas you see for ProbZelus and its integration into practical systems or industries.
Guillaume Baudart:
Yeah, so on a research point of view, we are still actively working on inference techniques and static analysis around
Alex:
Mm-hmm.
Guillaume Baudart:
ProbZelus. But there is one thing that I think is very promising, which is leveraging the compiler chain for probabilistic models. And what I mean by that is... Consider the Kalman filter again, because this is a paradigmatic example for us. You know that there is an exact solution, and you are able to compute it, right? So
Alex:
Mm-hmm.
Guillaume Baudart:
what if the compiler was able to compute the exact solution for you and generate the code? If you were able to do that, you don't need to embed the inference engine anymore. You can just run the deterministic code even though you are still working with probabilistic models. Another question is... Can we do that for other models? Like, how far can we go? And can we bound the number of resources that we are willing to embed and still get some good results, some good approximation results? And what kind of static analysis can we do on this kind of model?
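To illustrate what "compiling the model away" could look like, here is an OCaml sketch of the deterministic code one could emit for a one-dimensional Kalman filter: the posterior update is available in closed form, so no inference engine is needed at run time, just a few arithmetic operations per step. The noise constants and the observation stream are made up.

```ocaml
(* made-up model constants *)
let process_noise = 0.01
let obs_noise = 0.25

(* one time step of the closed-form filter: predict, then correct with y *)
let kalman_step (mean, var) y =
  let var_pred = var +. process_noise in
  let gain = var_pred /. (var_pred +. obs_noise) in
  let mean' = mean +. gain *. (y -. mean) in
  let var' = (1.0 -. gain) *. var_pred in
  (mean', var')

let () =
  let mean, var = List.fold_left kalman_step (0.0, 1.0) [0.9; 1.1; 1.0; 0.95] in
  Printf.printf "mean=%f var=%f\n" mean var
```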
Alex:
I see, yeah. Yeah, that'd be cool indeed. Okay, cool. Well, before we close up the show, I could still ask you a lot of questions, but I want to be mindful of your time. So, is there a topic I didn't ask you about that you'd like to mention?
Guillaume Baudart:
No, I think
Alex:
Great.
Guillaume Baudart:
we covered pretty much everything.
Alex:
Cool. So that was my challenge. So apparently I did it well. So awesome. Well, Guillaume, before stopping the recording, I have to ask you the last two questions I ask every guest at the end of the show. So first one, if you had unlimited time and resources, which problem would you try to solve?
Guillaume Baudart:
Yeah, so this is a really difficult question for me, because I'm not really good at selecting projects and driving people around that. And so I will give a personal answer, which is: I really like the French academic system, where it's only partially based on grants, right?
Alex:
Mm-hmm.
Guillaume Baudart:
So you have a lot of people that can work on whatever they want.
Alex:
Mm-hmm.
Guillaume Baudart:
And I really like this idea and I think there is a lot of huge progress that were made because people were working on very strange areas of research.
Alex:
Mm-hmm.
Guillaume Baudart:
So what I would do if I had infinite resources is that I would give like a fraction of these infinite resources to a lot of bright people and you can work on whatever you want. And hopefully this will help to solve some of the big problems in the future. The trick is that a fraction of infinity is still a lot, so I'm glad
Alex:
Yeah.
Guillaume Baudart:
that I can do that.
Alex:
Yeah, that makes sense. And second question, if you could have dinner with any great scientific mind, dead, alive or fictional, who would it be?
Guillaume Baudart:
Um, yeah. Um, so, uh, again, I'm not, I'm not great with social events, so dinner is a really stressful
Alex:
Heh.
Guillaume Baudart:
event for me. Like, uh, you just have one dinner, like you need to make a try. Um, but then again, I am fascinated by origin stories, and I would really like to combine people that can interact together and see how they can exchange their views on
Alex:
Hehe.
Guillaume Baudart:
different stuff. So I... I would really love to have a dinner with John von Neumann
Alex:
Mm-hmm.
Guillaume Baudart:
and Grace Hopper, so the first compiler designer with the first computer designer, and then add John Backus as the first high-level programming language designer, and see how they interact and what kind of ideas they can exchange, and if that would have changed something in their designs. So you start from a very high level. Like, if you think about curricula, right now in computer science you really start from very high-level programming languages and you typically go down through the chain. And history was the opposite, right? We started from really low-level stuff and built up from there. So what would have been different? I think that would make a great dinner.
Alex:
Yeah,
Guillaume Baudart:
That is for me.
Alex:
yeah. That does sound good. I agree. Awesome. Well, Guillaume, thanks a lot. I really learned a lot. That was really an original topic in an episode. So thanks a lot about all of that and diving into all the details. As
Guillaume Baudart:
Thank
Alex:
usual.
Guillaume Baudart:
you very much.
Alex:
We'll put resources and a link to your website in the show notes for those who want to dig deeper. Thank you again, Guillaume, for taking the time and being on this show.