Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
In this episode, Dmitry Bagaev discusses his work in Bayesian statistics and the development of RxInfer.jl, a reactive message passing toolbox for Bayesian inference.
Dmitry explains the concept of reactive message passing and its applications in real-time signal processing and autonomous systems. He discusses the challenges and benefits of using RxInfer.jl, including its scalability and efficiency in large probabilistic models.
Dmitry also shares insights into the trade-offs involved in Bayesian inference architecture and the role of variational inference in RxInfer.jl. Additionally, he discusses his startup Lazy Dynamics and its goal of commercializing research in Bayesian inference.
Finally, we also discuss the user-friendliness and trade-offs of different inference methods, the future developments of RxInfer, and the future of automated Bayesian inference.
Coming from a very small town in Russia called Nizhnekamsk, Dmitry currently lives in the Netherlands, where he did his PhD. Before that, he graduated from the Computational Science and Modeling department of Moscow State University.
Beyond that, Dmitry is also a drummer (you’ll see his cool drums if you’re watching on YouTube), and an adept of extreme sports, like skydiving, wakeboarding and skiing!
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser and Julio.
Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag ;)
Takeaways:
- Reactive message passing is a powerful approach to Bayesian inference that allows for real-time updates and adaptivity in probabilistic models.
- RxInfer.jl is a toolbox for reactive message passing in Bayesian inference, designed to be scalable, efficient, and adaptable.
- Julia is a preferred language for RxInfer.jl due to its speed, macros, and multiple dispatch, which enable efficient and flexible implementation.
- Variational inference plays a crucial role in RxInfer.jl, allowing for trade-offs between computational complexity and accuracy in Bayesian inference.
- Lazy Dynamics is a startup focused on commercializing research in Bayesian inference, with the goal of making RxInfer.jl accessible and robust for industry applications.
Links from the show:
Transcript
This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.
In this episode, Dmitry Bagaev discusses his work in Bayesian statistics and the development of RxInfer.jl, a reactive message passing toolbox for Bayesian inference.

Dmitry explains the concept of reactive message passing and its applications in real-time signal processing and autonomous systems. He discusses the challenges and benefits of using RxInfer.jl, including its scalability and efficiency in large probabilistic models.

Dmitry also shares insights into the trade-offs involved in Bayesian inference architecture and the role of variational inference in RxInfer.jl. Additionally, he discusses his startup Lazy Dynamics and its goal of commercializing research in Bayesian inference.

Finally, we also discuss the user-friendliness and trade-offs of different inference methods, the future developments of RxInfer, and the future of automated Bayesian inference.

Coming from a very small town in Russia called Nizhnekamsk, Dmitry currently lives in the Netherlands, where he did his PhD. Before that, he graduated from the Computational Science and Modeling department of Moscow State University.

Beyond that, Dmitry is also a drummer (you'll see his cool drums if you're watching on YouTube), and an adept of extreme sports like skydiving, wakeboarding, and skiing.

This is Learning Bayesian Statistics, episode 100.
Dmitry Bagaev, welcome to Learning Bayesian Statistics.

Thanks. Thanks for inviting me to your great podcast. Really, I feel very honored.

Yeah, thanks a lot. The honor is mine. It's really great to have you on the show. So many questions for you, and yeah, we're also gonna be able to talk again about Julia, so that's super cool. And I wanna thank, of course, Albert Podusenko for putting us in contact. Thanks a lot Albert, it was a great idea. I hope you will love the episode. Well, I'm sure you're gonna love Dmitry's part, and mine is always... more in the air, right? And well, Dmitry, thanks again, because I know you're a bit sick, so I appreciate it even more. And so let's start by basically defining what you're doing nowadays, and also how did you end up doing what you're doing, basically?
Yes. So I'm currently working at the Eindhoven University of Technology, in BIASlab. And I just recently finished my PhD in Bayesian statistics, essentially, so now I supervise students and do some of the projects there. BIASlab itself is a group in the university that primarily works on real-time Bayesian signal processing, and we do research in that field. The slogan of the lab, let's say, is "natural artificial intelligence", and it's phrased specifically like that because there cannot be natural artificial intelligence. So it's a play on words, let's say. And the lab is basically trying to develop automated control systems or novel signal processing applications, and it's basically inspired by neuroscience.

And we also opened a startup with my colleagues, which is called Lazy Dynamics. The idea is basically to commercialize the research in the lab, but also to find new funding for new PhD students for the university. But we're still quite young, less than one year old, and we are currently in search of clients and potential investors. But yeah, my main focus still remains being a postdoc at the university.

Yeah, fascinating. So many things already.
Um, maybe, what do you do in your postdoc?

So my main focus, primarily, is supporting the toolbox that we wrote in our lab and that I am the primary author of. We call this toolbox RxInfer, and it is an essential part of my PhD project. And basically, I love to code, so, more or less, my scientific career was always aligned with software development. And the RxInfer project is a really big project, and many other projects in BIASlab depend on it. It requires maintenance: bug fixing, adding new features, performance improvements. And we currently have several sub-projects that we develop alongside RxInfer. That's just the main focus for me. Besides that, I also supervise students for this project.

Yeah, yeah. Of course. That must also take quite some time, right?

Yes, exactly. Yeah.

Yeah, super cool.
So let me start basically by diving a bit more into the concepts you've just named, because you've already talked about a lot of the things you work on, which is... a lot, as I guess listeners can hear. So first, let's try and explain the concept of reactive message passing in the context of Bayesian inference for listeners who may not be familiar with it, because I believe it's the first time we really talk about that on the show. So yeah, talk to us about that. Also because, from what I understand, it's really the main focus of your work, be it through RxInfer.jl or Lazy Dynamics or BIASlab. So let's start by laying out the landscape here about reactive message passing.

Yes, good.
So yeah, RxInfer is what we call a reactive message passing-based Bayesian inference toolbox. And basically, in the context of Bayesian inference, we usually work with probabilistic models. A probabilistic model is usually a function of some variables, some of which are observed, and we want to infer some probability distribution over the unobserved variables. What is interesting about that is that if we have a probabilistic model, we can actually represent it as a graph: we can factorize our probabilistic model into a set of factors, such that each node is a factor and each edge is a variable of the model, like a hidden state, and some of them are observed or not. And message passing by itself is a very interesting idea for solving Bayes' rule for a probabilistic model defined in terms of a graph. It does this by sending messages between nodes in the graph, along edges. It's actually quite a big topic, but the essential thing to understand here is that we can do that, right? We can reframe Bayes' rule as something that has these messages in the background.

Reactive message passing is a particular implementation of this idea. In traditional message passing, we usually have to define an order of messages: in what order do we compute them? This may be very crucial, for example, if the graph structure has loops, so there are structural dependencies in the graph. And reactive message passing basically says: okay, no, we will not do that. We will not specify any order; instead, we will react on data. So the order of message computations becomes essentially data-driven, and we do not enforce any particular order of computation.
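As a rough illustration of the message passing idea described above, here is a generic sum-product sketch in plain Julia (not RxInfer's API; all names and numbers are invented for illustration), computing an exact posterior by passing messages along a tiny chain p(x1) p(x2 | x1) p(y | x2) with discrete states:

```julia
# Toy chain x1 -> x2 -> y with two states per variable.
prior = [0.6, 0.4]            # p(x1)
trans = [0.7 0.3; 0.2 0.8]    # p(x2 | x1), rows indexed by x1
lik   = [0.9, 0.1]            # p(y = observed value | x2)

m_x1 = prior                  # message from the prior factor along edge x1
m_x2 = trans' * m_x1          # message through the transition factor: sum over x1
post = m_x2 .* lik            # combine with the likelihood message...
post ./= sum(post)            # ...and normalize: posterior p(x2 | y) = [0.9, 0.1]
```

A traditional scheduler would fix the order in which `m_x1` and `m_x2` are computed; the reactive variant discussed next instead recomputes whichever messages have all their inputs available, as data arrives.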
OK, so if I try to summarize, that would be something like: usually, when you work on a Bayesian model, you have to specify the graph and the order of the graph, in which direction the nodes are going. In reactive message passing, it's more like a non-parametric version, in a way, where you just say there are these things, but you're not specifying the... the directions, and you're just trying to infer that through the data. How wrong is that characterization?

Not exactly like that. So indeed, the graphs that we work with don't have any direction in them, right? Because messages can flow in any direction. The main difference here is that reactive message passing reacts on changes in data and updates posteriors automatically, right? So there is no particular order in which we update the posteriors. For example, if we have some variables in our model, like A, B, C, we don't know which will be updated first and which will be last. It basically depends on our observations. But it works like this: as soon as we have a new observation, the graph reacts to this observation and updates the posteriors as soon as it can, without explicitly specifying this order.
And why would you do that? Why would that be useful?

That's a very good question. Because in BIASlab, we essentially try to work with autonomous systems, and autonomous systems have to work in the field, right? In a real-world environment, let's say. And a real-world environment is extremely unpredictable. To be more concrete, let's say we try to develop a drone which tries to navigate the environment. It has several sensors, and we want to build a probabilistic model of the environment, such that the drone can act in this environment, and its sensors have some noise in them. So, essentially, we cannot predict in what order the data will be arriving, right? Because you may have a video signal, you may have an audio signal, and these devices that record video, let's say, also have an unpredictable update rate. Usually it's maybe 60 frames per second, but it may change, right? So instead of fixing the algorithm and saying, okay, we wait for a new frame from the video, we wait for a new frame from the audio, then we update, then we wait again... instead of doing that, we just simply let the system react on new changes and update the posteriors as soon as possible. And then, based on the new posteriors, we act as soon as possible. This is kind of the main idea of reactive implementations.

And in traditional software for Bayesian inference, for example, we just have a model, and we have a data set, and we feed the data set to the model, and we get the posterior, and then we analyze the posterior. And that also works really great, right? But it doesn't really work in the field, where you don't have time to synchronize your data set and you need to react as soon as you can.

Okay, okay, I see. So that's where, basically, this kind of reactive message passing is extremely useful: when you receive data in real time that you don't really know the structure of.

Yes, we work primarily with real-time signals. Yes.

Okay, very interesting.
Actually, do you have any examples, any real-life examples that you've worked on, or... you know, where this is extremely useful to work on with RxInfer.jl, or just in general this kind of reactive message passing?

Yes. So I myself usually do not work on applications; my primary focus lies in the actual Bayesian inference engine. But in our lab, there are people who work, for example, on audio signals, right? So you may want, for example, to create a probabilistic model of the environment to be able to denoise speech, or it may be a position tracking system, or a planning system in real time. In our lab, we also very often refer to the term active inference, which basically defines a probabilistic model not only of your environment, but also of your actions, such that you can infer the most optimal course of actions. And this might be useful in control applications, also for the drone, right? So we want to infer not only the position of the drone based on the sensors that we have, but also how it should act to avoid an obstacle, for example.

I see. Yeah, OK, super interesting. So basically, any case where you have really high uncertainty, right? That kind of stuff. OK, yes, super interesting.
And so, what prompted you to create a tool for that? What inspired you to develop RxInfer.jl? And maybe also tell us how it differs from traditional Bayesian inference tools, be it in Python or in R, or even in Julia. If I'm a Julia user, used to probabilistic programming languages in Julia, then what's the difference with RxInfer?

This is a good question.
Well, there are two questions in one. About inspiration: so, I joined BIASlab in 2019, without really understanding what it was going to be about, and without really understanding how difficult it really is. And the inspiration for me came from the project that I started my PhD on. Basically, the main inspiration in our lab is the so-called free energy principle, which tries to explain how natural biotic systems behave, right? So they define the so-called Bayesian brain hypothesis and the free energy principle. They basically say that any biotic system defines a probabilistic model of its environment and tries to infer the most optimal course of action to survive, essentially. But all of this is based on Bayesian inference as well. It's a very good idea, but at the end, it all boils down to Bayesian inference.

And basically, if you look at how biotic systems work, we note that there are very specific properties of these biotic systems. They do not consume a lot of power, right? It has actually been estimated that our brain consumes about 20 watts of energy. It's an extremely efficient device, if we can say so, right? It does not even compare with supercomputers. It's also scalable, because we live in a very complex environment with many variables. We act in real time, right? And we are able to adapt to the environment. And we are also kind of robust to what is happening around us, right? So, if something new happens, we are able to adapt to it instead of just failing. And this is kind of the idea. So the inspiration for this Bayesian inference toolbox is that it needs to be scalable, real-time, adaptive, robust, super efficient, and also low-power. These are the main ideas behind the RxInfer project.
And here we come to the second part of the question: how does it differ? Because this is exactly where we differ, right? Other solutions in Python or in Julia are also very cool. There are actually a lot of cool libraries for Bayesian inference, but most of them have a different set of trade-offs or requirements. And maybe I should be super clear: we are not trying to be better. We are trying to satisfy a different set of requirements for the Bayesian inference system.

Yeah. Yeah, you're working on a different set of needs, in a way.

Yes, yes. And it's application-driven.

Yeah, you're trying to address another type of application.

Exactly. And if we directly compare to other solutions, they are mostly based on sampling, like HMC or NUTS, or maybe they are black-box methods like ADVI, automatic differentiation variational inference, or BBVI. And they are great methods, but they tend to consume a lot of computational power, or energy, right? They do a very expensive simulation. It may run for maybe hours, maybe even days in some situations. And they are great, but you cannot really apply them in these autonomous systems where you need to... Like, if we're again talking about audio, it's 44 kilohertz, so we need to perform Bayesian inference at an extremely fast rate. And these tools are not really applicable in that situation.

Yeah, fascinating.
And you were talking... well, we'll get back to the computation part a bit later. Maybe first I'd like to ask you: why did you do it with Julia? Why did you choose Julia for RxInfer, and what advantages does it offer for your applications of Bayesian inference?

The particular choice of Julia was actually driven by the needs of BIASlab at the university, because all the research which we do in our lab is done in Julia, and that decision was made by our professor many, many years ago. Interestingly enough, our professor doesn't really code. But Julia is a really great language, so if I had to choose myself, I would still choose Julia. It's a great language. It's fast, right? And our primary concern is efficiency. Python can also be fast if you know how to use it, if you use NumPy or some specialized libraries, but with Julia it's really easy. It is easier. In some situations, of course, you need to know a bit more. My background is in C and C++, and I understand how compilers work, for example, so maybe for me it's a bit easier to write performant Julia code. But in general, it's just a really nice, fast language. And it also develops fast, in the sense that new versions of Julia come out every several months, and it really gets better with each release.
Another thing which is actually very important for us as well is macros, macros in Julia. For people who are listening: macros basically allow us to apply arbitrary code transformations to existing code, and they also allow you to create a sublanguage within a language. Why this is particularly useful for us is that specifying probabilistic models in Bayesian inference is a bit hard, or tedious. We don't want to directly specify these huge graphs. Instead, what we did, and what Turing and many other libraries in Julia also did, is come up with a domain-specific language for specifying probabilistic programs. And it's extremely cool. It's much, much simpler to define a probabilistic program in Julia than in Python, in my opinion. And I really like this feature of Julia.
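To make the macro idea concrete, here is a deliberately tiny sketch (the macro name and its behavior are invented for illustration; this is not RxInfer's actual model DSL). A Julia macro receives the expression tree of `x ~ Normal(0, 1)` before anything is evaluated, so a probabilistic programming library can rewrite the `~` statement into graph-building code instead of running it:

```julia
# Hypothetical macro: inspect `lhs ~ rhs` syntax instead of evaluating it.
macro inspect_tilde(ex)
    # `x ~ Normal(0, 1)` parses as the call expression `~(x, Normal(0, 1))`
    if ex.head == :call && ex.args[1] == :~
        lhs, rhs = ex.args[2], ex.args[3]
        # A real PPL would emit graph-construction code here; we just
        # return a description of what was written.
        return :( (variable = $(QuoteNode(lhs)), dist = $(string(rhs))) )
    end
    return esc(ex)  # leave anything else untouched
end

spec = @inspect_tilde x ~ Normal(0, 1)
# spec == (variable = :x, dist = "Normal(0, 1)")
```

Note that `Normal` never has to exist here: the macro only sees syntax, which is exactly what makes a model-specification sublanguage possible.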
Yeah, it's basically this building-block aspect of the Julia language. Yeah, yeah, I've heard that.

There are other aspects of Julia I can mention. By the way, maybe I can also make an announcement regarding Julia: the next JuliaCon is happening, I'm told, in the city where I currently am. And it's going to be very cool. It's going to be in the PSV stadium, the football stadium, right? A technical conference about a programming language is going to be in a stadium.
So, another aspect of Julia is the notorious dynamic multiple dispatch. And it was extremely useful for us, in particular for the reactive message passing implementation. Because again, if we think about how this reactiveness works and how we compute these messages on the graph: in order to compute a message, we wait for inputs. And then, when all inputs have arrived, we have to decide how to compute the message. And computation of the message is essentially solving an integral. But if we know the types of the arguments, and if we know the type of the node, it might be that there is an analytical solution for the message, so it's not really necessary to solve a complex integral. And we do this with multiple dispatch in Julia. So multiple dispatch in Julia helps us pick the most efficient message update rule on the graph, and it's basically built into the language. It's also possible to emulate it in Python, but in Julia it's just fast and built-in, and it works super nicely.
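A minimal sketch of that dispatch idea (the `Gaussian` type and `combine` function are invented for illustration, not RxInfer's actual rule API): the most specific method is selected when the argument types admit a closed-form update, with a generic fallback otherwise.

```julia
struct Gaussian
    μ::Float64
    σ²::Float64
end

# Analytical rule: the normalized product of two Gaussian messages is again
# Gaussian, so no integral has to be solved numerically.
function combine(a::Gaussian, b::Gaussian)
    w = 1 / a.σ² + 1 / b.σ²                        # combined precision
    Gaussian((a.μ / a.σ² + b.μ / b.σ²) / w, 1 / w)
end

# Generic fallback for type combinations without a closed form; a real
# engine would switch to quadrature or sampling here.
combine(a, b) = error("no analytical rule; use a numerical approximation")

post = combine(Gaussian(0.0, 1.0), Gaussian(2.0, 1.0))
# post == Gaussian(1.0, 0.5)
```

Because dispatch happens on the runtime types of both arguments, adding support for a new distribution pair is just adding another `combine` method, without touching the inference engine.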
Yeah, super cool. Yeah, for sure. Super interesting points. And I'm very happy, because it's been a long time since we've had a show with some Julia practitioners, so it's always very interesting to hear what's going on in... in that field. And yeah, I would be convinced just by coming to the PSV Eindhoven stadium. You don't have to tell me more. I'll be there.

Let's do a live show in the stadium.

Yes, I will be there. Yeah. Yeah, that sounds like a lot of fun.
And actually, I'm myself an open-source developer, so I'm very biased to ask you that question: what were some of the biggest challenges you faced when you developed RxInfer, and how did you overcome them? I guess that's the main thing you do when you're an open-source developer: putting out fires.

This is an amazing question. I really like it. And I even have some of the answers in my PhD dissertation, so I'll probably just quote it, though I don't remember exactly how I framed it. I took it from a book called Software Engineering for Science. It basically says that people usually underestimate how difficult it is to create software in a scientific research area. And the main difficulty is that there are no clear guidelines to follow. It's not like designing a website, with clear framework rules, where you just need to split tasks between people and teams. No, new insights in science, in the area where we work, happen every day, right? And the requirements for the software may change every day. It's really hard to come up with a specific design before you start developing, because requirements change over time: you may create some software for research purposes, and then you find out something super cool which works better, or faster, or scales better, and then you realize that you actually have to start over, because it's just better; we just found out something cooler.
It also means that a developer must invest time into the research itself. So it's not only about coding: you should understand how it all works from the scientific point of view, from a mathematical point of view. And sometimes, if it's cutting-edge research, there are no books about how it works, right? So we must invest time in reading papers, while also being able to write good code which is fast and efficient. And all of these problems also occurred when we developed RxInfer. Even though I'm the main author, a lot of people have helped me, right? I'm very thankful for that. And for RxInfer in particular, I also needed to learn a very big part of statistics, because when I joined the lab, I actually didn't have a lot of experience with Bayesian inference, or with graphs and message passing. So I really needed to dive into this field, and many people helped me understand how it works. A lot of my colleagues have spent their time explaining.

And on top of this stack of difficulties, at the end, or maybe not at the end, we would like the software to be easy to use, user-friendly. So we already have these difficulties: we don't know how to design it, we have to invest time into reading papers. But then, at the end, we want to have functional software that is easy to use, addresses different needs, and allows you to find new insights. The software should be designed such that it does not impose a lot of constraints on what you can do with it, right? Because scientific software is about finding new insights, not about running some predefined set of algorithms. You want to find something new, essentially, and the software should help you with that.

Yeah, yeah, for sure. That's a good point.
What would you say are the key challenges in achieving scalability and efficiency in this endeavor, and how does RxInfer address them?

Basically, we are talking in the context of Bayesian inference, and the key challenge is that Bayes' rule doesn't scale, right? The formula looks very simple, but in practice, when we start working with large probabilistic models, blind application of Bayes' rule doesn't scale, because it has exponential complexity with respect to the number of variables. And RxInfer tries to tackle this by having essentially two main components in the recipe... maybe three, let's say three. First of all, we use factor graphs to specify the model, so we work with factorized models. Second, we work with message passing, and message passing essentially converts the exponential complexity of Bayes' rule to linear, but only for highly factorized models. "Highly factorized" is a really crucial qualifier here, but many models are indeed highly factorized. It means that variables do not directly depend on all the other variables; they directly depend on maybe a very small subset of the variables in the model. And the third component is variational inference, because it allows us to trade off computational complexity against accuracy. If the task is too difficult, or it doesn't scale, what variational inference gives you is the ability to impose a set of constraints on your problem, because it reframes the original problem as an optimization task, and we can optimize up to a certain constraint. For example, we may say that this variable is distributed as a Gaussian. It may not be true in reality, and we lose some accuracy, but in the end it allows us to solve some equations faster. And we can impose more and more constraints if we don't have enough computational power or if we have a large model, or we may relax constraints if we have enough computational power, and then we gain accuracy. So we have this sort of slider which allows us to scale better. But here's the thing, right? We can always come up with such a large model, with so many variables and such difficult relationships between them, that it still will not scale. And this is fine. But RxInfer tries to push this boundary, scaling Bayesian inference to large models.
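A minimal illustration of that constraint idea, in plain Julia with invented numbers (not how RxInfer imposes constraints internally): force a Beta(8, 4) posterior to be Gaussian by matching its first two moments. The Gaussian is cheap to propagate in closed form, at the cost of some accuracy, visible here in the shifted mode:

```julia
# Moment-matched Gaussian approximation q(x) = Normal(μ, σ²) to a Beta(α, β).
α, β = 8.0, 4.0
μ  = α / (α + β)                          # Beta mean = 2/3
σ² = α * β / ((α + β)^2 * (α + β + 1))    # Beta variance ≈ 0.0171

# Accuracy lost to the constraint: the Gaussian's mode equals its mean μ ≈ 0.667,
# while the true Beta mode is (α - 1) / (α + β - 2) = 0.7.
true_mode = (α - 1) / (α + β - 2)
```

Relaxing the Gaussian constraint (e.g. keeping the exact Beta) recovers the accuracy but gives up the closed-form downstream updates, which is exactly the slider being described.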
And actually, you're using variational inference quite a lot in this endeavor, right? So can you discuss the role of variational inference in RxInfer, and maybe any innovations that you've incorporated in this area?

So the role, as I touched upon a little bit, is that it acts like a slider controlling the complexity and the accuracy of your inference result. This is the main role. Of course, for some applications this might be undesirable; you may want to have a perfect posterior estimation. But for other applications, it's not a very big deal. Again, we are talking about different needs for different applications here. And the innovation that RxInfer brings: I think it's one of the few implementations of variational inference as message passing, because variational inference is usually implemented as a black-box method that takes a probabilistic model function and maybe does some automatic differentiation or some sampling under the hood. Message passing by itself has a very long history, but I think people mistakenly think that it's limited to, say, the sum-product algorithm. But actually, variational inference can also be implemented as message passing, and it's quite good. So it opens up the applicability of message passing algorithms.
508
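To give a feel for what a message passing computation looks like, here is a minimal sum-product example on a tiny factor graph, written as a plain Python sketch (illustrative numbers and variable names of my own choosing, not RxInfer's API): two binary variables connected by one factor, with the marginal of one variable obtained by summing out the other.

```python
# Minimal sum-product message passing on the factor graph
#   X -- f(x, y) -- Y,  with a prior factor on X.
# The marginal of Y is the normalised message from f to Y.

p_x = [0.6, 0.4]           # prior factor on binary X
f = [[0.9, 0.1],           # f[x][y]: "Y usually copies X"
     [0.2, 0.8]]

# Message from factor f to variable Y: sum out X, weighted by p(x).
msg_f_to_y = [sum(p_x[x] * f[x][y] for x in range(2)) for y in range(2)]

# Normalise the message to obtain the marginal p(Y).
z = sum(msg_f_to_y)
p_y = [m / z for m in msg_f_to_y]
print(p_y)
```

On larger graphs the same local sum-and-normalise step is repeated along every edge, which is what lets message passing exploit the model's structure instead of treating it as a black box.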
And also, as we already talked a little bit about the reactive nature of the inference procedure, it is also maybe even the first reactive variational inference engine, designed to work with infinite data streams. So it continuously updates the posterior, continuously does the minimization. It does not stop. As soon as new data arrive, we basically update our posteriors, and in between these data windows, we can spend more computational resources to find a better approximation for the variational inference. All other solutions that also do variational inference basically require you to wait for the data, then feed in the data, or wait for the entire data set, feed in the data set, and then you have the result, then you analyze the result, and then you repeat. So RxInfer works a bit differently in that regard.
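The streaming behaviour can be sketched with a simple conjugate model in plain Python (an illustration of the idea, not RxInfer's reactive engine; `update` is a hypothetical helper name): a Gaussian belief over an unknown mean is refined the moment each observation arrives, and after the stream the result matches what a batch fit over the same data would give.

```python
# Online Bayesian updating: a Gaussian prior over an unknown mean,
# with known observation noise. Each arriving data point updates the
# posterior immediately, so the stream never has to stop.

def update(mu, var, y, noise_var=1.0):
    """Conjugate Gaussian update for a single observation y."""
    precision = 1.0 / var + 1.0 / noise_var
    new_var = 1.0 / precision
    new_mu = new_var * (mu / var + y / noise_var)
    return new_mu, new_var

mu, var = 0.0, 10.0                 # broad prior belief
for y in [1.2, 0.8, 1.1]:           # data arriving one point at a time
    mu, var = update(mu, var, y)    # posterior is usable after every step
print(mu, var)
```

Because the posterior after each step is a full belief state, an intermediate result is available at any moment, which is the property a reactive engine exploits.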
Yeah. Fascinating. And I'm guessing you have some examples of that up on the RxInfer website? Maybe we can put a link to that in the show notes for people who are interested in seeing how you would apply that in practice?
So it does not really require reactivity, but because it's easy to use and fast, students can do some homework for signal processing applications. As I already mentioned, we work with audio signals and with control applications. I don't really have a particular example of RxInfer being used in the field or by industry. It's primarily our research tool currently, but we want to extend it. It's still a bit more difficult to use than, let's say, Turing, which is also written in Julia, because message passing is maybe a bit more difficult to use, and it is not as universal as HMC and NUTS; it still requires some approximation methods. So we still use it as a research tool currently, but we have some ideas in the lab on how to expand the set of probabilistic models we can run inference on. And yes, indeed, in our documentation we have quite a lot of examples, but these examples are, I would say, educational in most cases, at least in the documentation. So we are at the stage where we have a lot of ideas on how to improve the inference and make it faster, such that we can actually apply it to real tasks: real drones, real robots, real speech denoising, or something similar.
Yeah, definitely. That would be super interesting, I'm guessing, for people who are into these topics and also just want to check it out. I have been checking out your website recently to prepare for the episode. Actually, you've shared the overview of the theory, how that works, and what RxInfer does in that regard. Can you share what you folks are doing with Lazy Dynamics, and how that's related? How does it fit into this ecosystem?
So yeah, Lazy Dynamics: we created this company to commercialize the research that we do at our lab, to basically find funding to make RxInfer better and ready for industry. Because currently, let's say, RxInfer is a great research tool for our purposes, right? But industry needs some more properties in addition to the ones I have already mentioned, right? For example, the Bayesian inference engine must be extremely robust: it is not allowed to fail if we really work in the field. And this is not really a research question; it's more about the implementation side, right? It's about things like good code coverage and great documentation. And this is what we also want to do with Lazy Dynamics. We want to take this next step and create a great product for other companies, especially ones that can rely on RxInfer, maybe in their research or maybe even in the field, right?
And maybe we create some sort of a tool set around RxInfer that will allow you to debug the performance of your probabilistic model or your probabilistic inference, right? That's also not about research; it's about making it more accessible to other people, like finding bugs or mistakes in their model specification and making it easier to use. Or maybe, for example, we could come up with some sort of a library of models, right? Say you want to build some autonomous system, and it may require a model for audio recognition and a model for video recognition. This set of models can be predefined, very well tested, with great performance, super robust, and Lazy Dynamics may basically provide access to this kind of library, right? Because these are not research-related questions; this must be done in a company, with very good programmers and very good code coverage and documentation. But for research purposes, RxInfer is already a great toolbox, and many students in our lab already use it.
But, yeah, because we are all sitting in the same room, let's say on the same floor, we can kind of brainstorm, find bugs, fix them on the fly, and keep working. But if we want RxInfer to be used in industry, it really needs to be a professional toolbox with professional support.

Yeah, I understand, that makes sense. I don't know when you sleep, though, between the postdoc, the open source project, and the company.

Yeah, it's a great comment, but yeah, it's hard.

Yeah, hopefully we'll get you some sleep in the coming months.
To get back to your PhD project, because I found that very interesting (your dissertation will be in the show notes): something I was also curious about is that in this PhD project, you explore different trade-offs for Bayesian inference architecture. You've mentioned that a bit already, but I'm really curious about it. Could you elaborate on these trade-offs and why they are significant?
Yes, we already touched a little bit on that. The main trade-offs here are computational load, efficiency, adaptivity, and power consumption. And another aspect, which we didn't talk about yet, is structural model adaptation. These are the requirements that we favor in RxInfer, and the requirements that were central to my PhD project. And all of these properties are not just coming from a vacuum; they are coming from real-time signal processing applications on autonomous systems.
We don't have a lot of battery power, and we don't have very powerful CPUs on these autonomous devices, because essentially what we want is to be able to run very difficult, large probabilistic models on a Raspberry Pi. And a Raspberry Pi doesn't even have a GPU. We can buy some small sort of GPU and put it on the Raspberry Pi, but still, the computational capabilities are very, very limited on edge devices. One may say, let's just do everything in the cloud, which is actually a very valid argument. But in some situations the latencies are just too big, and in some areas we may not have access to the internet, yet we still want to create these adaptive Bayesian inference systems, like a drone that may explore some area, maybe in the mountains, where we don't really have internet, so we cannot process anything in the cloud. So it must work as efficiently as possible on a very, very small device that doesn't have a lot of power or battery, and it should still work in real time. Yeah, I think these are mostly the main trade-offs.
In terms of how we do it, we use variational inference and sacrifice accuracy for scalability, and reactive message passing allows us to scale to very large models because it works on factor graphs.
Yeah. And I think these are very important points to make, right? Because whenever you build an open source package, you have trade-offs to make. That means you have to choose whether you're going for a general package or a more specialized one, and that will, in a way, dictate your trade-offs. In RxInfer, it seems like you're quite specialized in message passing inference. The cool thing there is that it makes your choices easier, because you're like, no, our main use case is this, so we can use that. And the variational inference choice, for instance, is quite telling, because in your case it seems to be really working well, whereas we could not do that in PyMC, for instance. If we removed the ability to use HMC, we would have quite a drop in user numbers. So yeah, that's always something I try to make people aware of when they are using open source packages: you can't do everything.

Yeah, exactly. Exactly.
So actually, when I have a need, I really enjoy working with HMC- or NUTS-based methods, because they just work, like magic. But here's the trade-off, right? They work magically in many situations, but they're slow in some sense. Let's say they're not slow, but they're slower than message passing. So here is this trade-off. User-friendliness is a really, really important key in this equation.

Yeah, and what do you call user-friendliness in your case?
What I refer to as user-friendliness here is that a user can specify a model, press a button with HMC, and it just runs and the user gets a result. Yes, the user needs to wait a little bit longer, but the user experience is great: just specify a model, just run inference, just get your result. With RxInfer, it's a bit less easy, because in most cases message passing works by favoring analytical solutions on the graph, and if an analytical solution for a message is not available, the user must specify an approximation method. That can actually also be HMC, just in case. But RxInfer does not currently define a default approximation method, so if a user specifies a complex probabilistic model, it will probably throw an error saying, okay, I don't know how to solve this, please specify what I should do here and there. And for a new user, it might be a bit unintuitive how to do that, what to specify. For HMC, there's no need to do it; it just works. With RxInfer, it's not that easy yet. That's what I was referring to by user-friendliness.
Yeah, that makes sense. And again, the interesting thing here is that the definition of user-friendliness is going to depend on what you're trying to optimize, right? What kind of use case you're trying to optimize for.

Yes.

Actually, what's the future for RxInfer? What are the future developments or enhancements that you are planning?
So, we have already touched a little bit on the Lazy Dynamics side, which tries to make a real commercial product out of RxInfer, with great support. This is one side of the future, but we also have a research side of the project. And the research side includes structural model adaptation, which in my opinion is quite cool.
What it basically means, in a few words, is that in the future we may be able to change the structure of the model on the fly, without stopping the inference procedure. You may need that for several reasons. For example, the computational budget changes and we are no longer able to run inference on such a complex model, so we want to reduce the complexity of the model. We want to change the structure, maybe put in some less demanding factor nodes, and we want to do it on the fly, without actually stopping the inference. With sampling-based methods, if we change the model, we are basically forced to restart, because it's quite difficult to reuse the previous result if the structure of the model changes. With factor graphs, it's actually possible.
Another reason we would need this in the field: imagine we have different sensors, so we have different observations, and one sensor all of a sudden just burns out, or glitches, or something like that. Essentially, we no longer have that sort of observation, so we need to change the structure of our model to account for the glitch or breakage of the sensor. And this is also where reactive message passing helps us: because we do not enforce a particular order of updates, we simply stop reacting to that observation, since it is no longer available, and we also change the structure of the model to account for that.
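Why dropping a sensor is so natural in a message passing view can be sketched in plain Python (an illustration under my own naming, not how RxInfer implements it): each sensor contributes an independent Gaussian "message" about the state, and fusion is just a precision-weighted average, so a dead sensor simply stops contributing a message and nothing needs to restart.

```python
# Sketch of sensor fusion as message combination: each sensor sends a
# Gaussian message (mean, variance) about the same hidden state, and
# the fused belief is their precision-weighted average. A failed
# sensor just stops sending its message.

def fuse(messages):
    """Fuse Gaussian messages (mean, variance) by precision weighting."""
    precision = sum(1.0 / v for _, v in messages)
    mean = sum(m / v for m, v in messages) / precision
    return mean, 1.0 / precision

both = fuse([(1.0, 0.5), (2.0, 1.0)])   # two live sensors
one = fuse([(1.0, 0.5)])                # second sensor burned out
print(both)
print(one)
```

The fused variance grows when a sensor drops out, so the belief honestly reflects the lost information without any change to the update rule.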
Another thing for the future of RxInfer, in terms of research, is that we want to natively support different update rates for different variables. What I mean by that is this: imagine an audio recognition system, or an audio enhancement system, where you have modeled the environment of a person who is talking among several other people. Their speech signal arrives at a rate of 44 kilohertz, if we are talking about a typical microphone. But their environment, where they are currently sitting, doesn't really change that fast, because they may sit in a bar and it will still be a bar an hour later. So there's no need to infer that information as often as their speech; it changes very rarely. So we have different sets of variables that may change at different scales, and we want to support this natively in RxInfer. That also makes things easier for the inference engine, so it does not spend computational resources on variables which are not updating fast.
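The multi-rate idea above can be sketched in a few lines of plain Python (a toy of my own construction, not RxInfer functionality): the fast variable is refreshed on every tick, while the slow variable is only refreshed every few ticks, so most of the compute budget goes to the state that actually changes.

```python
# Toy multi-rate update schedule: a fast variable (e.g. a speech
# sample) is updated every tick, while a slow variable (e.g. the room
# or environment) is refreshed only every `slow_every` ticks.

def run(n_ticks, slow_every=4):
    fast_updates, slow_updates = 0, 0
    for t in range(n_ticks):
        fast_updates += 1              # per-sample inference step
        if t % slow_every == 0:
            slow_updates += 1          # occasional environment refresh
    return fast_updates, slow_updates

print(run(16))                         # (16, 4)
```

At a real 44 kHz sample rate the saving is dramatic: an environment variable refreshed once per second needs roughly 44,000 times fewer updates than the speech signal.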
We also want to be able to support non-parametric models in RxInfer, and this includes Gaussian processes. We currently have a PhD student in our lab who is working a lot on that, and he has made great progress. It's not available in the current version of RxInfer, but he has experiments and it all works nicely. At some point it will be integrated into the public version.
And, yeah, then there is just maintenance, fixing bugs, and improving the documentation. The documentation currently needs improvement, because we have quite a few features and additions that we have already integrated into the framework and happily use ourselves in our lab for our research, but they are maybe poorly documented, let's say. Other people can in theory use this functionality, but because they cannot come to my desk at Eindhoven University of Technology, they cannot ask how to use it properly. So we should just put it into the documentation so other people can use it as well.
Yeah, yeah. That makes sense. That's a nice roadmap for this year. And looking ahead, what's your vision, let's say, for the future of automated Bayesian inference, the way you do it, especially in complex models like yours? What would you like to see in the coming years? Also, what would you like to not see?
A good question. So in my opinion, the future of automated Bayesian inference is very bright. A lot of great people are working on this, and more people are coming, right? There are so many toolboxes in Python and Julia, like PyMC, Turing, and ours, and there are others in C++, like Stan. So many implementations, and it's only getting better every year, right?
But I think the future is that there will be several applications, like in our case these autonomous systems, or maybe something else, and these packages will basically not really compete. Or, in a sense, they will; but for different applications you will choose a different solution, because each of them will be great in its own application. I'm not sure there will be one super ultra cool method that solves all problems of all applications in Bayesian inference. Maybe there will be, who knows. But in my opinion, there will always be these trade-offs in different applications, and we'll just use different methodologies.
Yeah.

Yeah, that makes sense, in a way. I like your point here about all these different methods cooperating, in a way, because they are addressing different workflows or different use cases. So yeah, definitely, I think we'll have stuff to learn from one type of application to the other.

I like this analogy: we don't cut bread with a fork, but that doesn't make a fork a useless tool; we can use a fork for something else. And we don't eat soup with a knife, but that doesn't make a knife a useless tool. These are tools that are great for their own purposes. So RxInfer is a good tool for real-time signal processing applications, and Turing in Julia is a great tool for other applications. We'll just live together and learn from each other.
Yeah. Fascinating. I really love that. Well, before closing up the show, because I don't want to take too much of your time: a question I really like asking from time to time is whether you have any favorite type of model that you always like to use and want to share with listeners?
You mean a probabilistic model?

Sure, or it can be a different kind of model. But yeah, a probabilistic model.

Yeah, I mentioned a little bit that I do not really work from an application point of view; I really work on the compiler for Bayesian inference. So I don't really have a favorite model, let's say. It's hard to say.
Yeah, that's interesting, because that's always an interesting position to me: you work on basically making the modeling possible, but you're usually not one of the people using that modeling platform yourself.

Exactly. Yes.

Yeah. That's always something really fascinating to me, because I'm kind of on the bridge, but a bit more on the applied modeling side of things. So I'm really happy that there are people like you who make my life easier, and even possible. So thank you so much.

That's cool.
Awesome. Dmitry, that was super cool. Thanks a lot. Before letting you go, though, as usual, I'm going to ask you the last two questions I ask every guest at the end of the show. First one: if you had unlimited time and resources, which problem would you try to solve?
Yes, I thought about this question; it's kind of an interesting one. And I thought it would be cool, if we had an infinite amount of time, to try to solve some sort of unsolvable paradox, because normally we have only limited time. One of the areas which I never worked in, but am really fascinated by, is astronomy, and one of the paradoxes in astronomy that I find interesting (maybe it's not really a paradox) is the Fermi paradox, which, in a few words, tries to explain the discrepancy between the lack of evidence of other civilizations and the apparently high likelihood of their existence, right? So this is maybe a problem I would work on: if I had an infinite amount of resources, I could just fly into space and try to find them.
That sounds like a fun endeavor. Yeah, for sure. I'd love the answer to that paradox. And for people who are interested in the physics side of things, there is a whole bunch of physics-related episodes of this show, so for sure refer to those; I'll put my whole playlist of physics episodes in the show notes. And I know you're also a big fan of Aubrey Clayton's book, Bernoulli's Fallacy, so I also put the episode with Aubrey Clayton in the show notes for people who have missed it. If you have missed it, I really recommend it; that was a really good episode.

No, I know. I know. I know this episode.
:Yeah, awesome.
896
:Well, thanks for listening to the show,
Dimitri.
897
:Awesome.
898
:Well.
899
:Thanks a lot, Mitri.
900
:That was really a treat to have you on.
901
:I'm really happy because I had so many
questions, but you helped me navigate
902
:that.
903
:I learned a lot and I'm sure listeners did
too.
904
:As usual, I put resources in a link to
your website in the show notes for those
905
:who want to dig deeper.
906
:Thank you again, Mitri, for taking the
time and being on this show.
907
:Yeah, thanks for inviting me.
908
:It was a pleasure to talk to you.
909
:Really, super nice and super cool
questions.
910
:I like it.