#126 MMM, CLV & Bayesian Marketing Analytics, with Will Dean
Business & Data Science • Episode 126 • 19th February 2025 • Learning Bayesian Statistics • Alexandre Andorra
Duration: 00:54:46


Shownotes

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!

Visit our Patreon page to unlock exclusive Bayesian swag ;)

Takeaways:

  • Marketing analytics is crucial for understanding customer behavior.
  • PyMC Marketing offers tools for customer lifetime value analysis.
  • Media mix modeling helps allocate marketing spend effectively.
  • Customer Lifetime Value (CLV) models are essential for understanding long-term customer behavior.
  • Productionizing models is essential for real-world applications.
  • Productionizing models involves challenges like model artifact storage and version control.
  • MLflow integration enhances model tracking and management.
  • The open-source community fosters collaboration and innovation.
  • Understanding time series is vital in marketing analytics.
  • Continuous learning is key in the evolving field of data science.

Chapters:

00:00 Introduction to Will Dean and His Work

10:48 Diving into PyMC Marketing

17:10 Understanding Media Mix Modeling

25:54 Challenges in Productionizing Models

35:27 Exploring Customer Lifetime Value Models

44:10 Learning and Development in Data Science

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan, Francesco Madrisotti, Ivy Huang, Gary Clarke, Robert Flannery, Rasmus Hindström, Stefan, Corey Abshire, Mike Loncaric, David McCormick, Ronald Legere, Sergio Dolia and Michael Cao.

Links from the show:

Transcript

This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.


Speaker:

Welcome to another live episode of Learning Bayesian Statistics, recorded at PyData New York. This time, I talked to Will Dean, a statistician and data scientist with a passion for Bayesian methods, geospatial analytics, and data visualization.

In this episode, we explore key applications like media mix modeling, which helps optimize marketing spend, and customer lifetime value models, which are crucial for understanding long-term customer behavior. Will also shares his insights on productionizing these models to make them actionable in real-world business scenarios.

A huge thank you to the organizers of PyData New York for their trust, dedication, and hard work. It was an absolute blast. This is Learning Bayesian Statistics, episode 126, recorded live on November 7, 2024. Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the projects, and the people who make it possible.

I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. For any info about the show, learnbayesstats.com is Laplace to be. Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on Patreon: everything is in there. That's learnbayesstats.com. If you're interested in one-on-one mentorship, online courses, or statistical consulting, feel free to reach out and book a call at topmate.io/alex_andorra. See you around, folks,

and best Bayesian wishes to you all. And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can help bring them to life. Check us out at pymc-labs.com.

Hello my dear patrons! First, I want to thank the fantastic David McCormick, Ronald Legere, Sergio Dolia and Michael Cao, who started to support the show on Patreon. Thank you so much for your support, guys! I think you know it, but it really makes a difference: it really makes this show possible. It's with your contributions that I pay for everything going on here backstage, the editing, the communication, the guest scheduling, the preparation of the episodes, and of course, recording the episodes. So thank you so much to you all, and as usual, I will see you around in our brand new Discord server.

Second thing before we start the show: I wanted to let you know that I will be at a fun new conference on March 18, 2025, called Field of Play. It's going to be in Manchester, UK, and we're going to talk about sports analytics. So if you're into that, please join us. I am not going to give a talk this time, which is much better: our dear Chris Fonnesbeck, PyMC's BDFL, whom you heard very recently on the podcast. Well, Chris will be talking about baseball, of course, and there are speakers from top Premier League football clubs, from Olympic winning teams, from academia. It's going to be fun. So feel free to get your tickets at fieldofplay.co.uk. The link is in the show notes. Okay, on to the show.

Will Dean, welcome to Learning Bayesian Statistics.

Thank you for having me.

Yeah. Big fan, big fan.

Yeah, well...

Really? You did?

Of course.

So, full disclosure for the listener: I just asked the whole room who was listening, and you didn't raise your hand.

I did, but you were looking the other way.

Okay. Do you confirm, Sam? Okay, that's fine.

No, no, seriously, thanks a lot for being here. That's cool, because I got to know you... we worked at Labs together for just a few months, but I didn't really get to know you. So it's great that you're here. Today we'll talk about what you do best, so that's a lot of marketing analytics, customer lifetime value, that kind of thing, which you're way better at than me. So that's great. I'm going to learn a lot.

But before that, can you tell us your origin story? Like, maybe first what you're doing nowadays, and how you ended up working on that.

Most of my work now is consulting with PyMC Labs, and the main project I'm working on is PyMC Marketing. It's an open source package. A lot of the functionality there is stuff that we leverage for consulting, and I work on it a lot. There are a few different sides to it: there's MMM, Media Mix Modeling, the customer lifetime value stuff, as well as some new modules that are coming out with some customer choice stuff.

We're currently working on the PRs. One of the things that I've been excited about is trying to get these PyMC models into production. So we've had a recent integration with MLflow, and we're still working on stuff like that. My background is statistics, and PyMC has been on my radar for years now. I was doing stuff back in the PyMC3 days, when that was the term. I think it was a little bit of a hard adjustment to drop the 3, but it makes sense now.

And yeah, I always found it really fascinating. At the time, I was still picking up my coding skills, and I found the open source community and welcomeness of PyMC really great. I picked up some issues and was always at least reading through the docs and the examples that PyMC has; they're terrific. They really sparked my curiosity.

I really didn't have an introduction to Bayesian statistics until college, and it didn't really grab my attention too much at the time, but I did use hierarchical models a lot for some applications. And I think it was when I got into industry that the Bayesian stuff really caught my eye: the use of the posterior predictive, evaluating it through different functions in order to get back to the language of the business you're working with, was something that I used, and it helped me resonate with the problems I was solving. And I just found it really powerful.

And I think the interface that PyMC offers is really great, and I favor it over some of the other ones. But yeah, it's gotten me to where I am now, and now I'm a bit more active in the community. PyMC Marketing has been my project; that's something that I'm working on with a few of the other developers. And we always have new contributors, which is really great. I always try to promote that and try to give people shout-outs and help where I can, so that their work shines as well. But yeah, I'm having a lot of fun with that. So that's what keeps me here.

Yeah, that's cool. And I mean, thank you for all the work you're doing on the open source side, the PRs, the onboarding of people. That's absolutely amazing. Actually, did you learn Bayesian stats while you were ramping up on contributing to PyMC and PyMC Marketing, or were you already familiar with that kind of statistics before, and PyMC was just a way for you to apply that theory?

So I did some courses in college. I remember doing all the derivations of conjugate priors. That actually inspired one of my Python packages; it's called conjugate-models, and it implements many of those in an API that I found satisfying. It's been something that I worked towards over my career: finding an interface that I enjoy.

And yeah, I had that sense of what it was, but then, obviously for me, it was very magical that you can just define this random generative process and, magically, you can get parameter estimates. You get to put down your pen and paper; you don't have to do derivations. And I thought it was really cool. And the fact that you can drop some assumptions and really cater to what you're doing is really powerful.

Yeah, completely agree with that.

I had the same experience, where I also did pen-and-paper stats, which ironically I hated. I was much more into algebra at that time, because algebra makes much more sense with a pen and paper, you know? Whereas with probabilities on pen and paper, you can mainly do trivial problems with dice and coins, which I don't care about. But yeah, then when I saw that you could generate distributions and have beautiful plots on the computer, I was just like, damn, that's really cool.

Definitely, there is an echo there.

Yeah.

I mean, at one point, I was probably doing most of my work on pen and paper, just during college. But since then, the fact that you can have a feedback loop of visualizing the problem you're working on is something that I reach for now.

Yeah.

So let's dive a bit into PyMC Marketing, because that's one of your main lines of work these days. Can you introduce PyMC Marketing to the audience, give us an idea of what you can do with it, and explain why it would even be interesting to people?

Yeah, it falls into the general area of customer analytics. There are the customer lifetime value problems; that's a module in the package that handles questions like: how much is someone doing something? How much are they spending? How often are they doing it? A lot of these "how much, how often" questions and these higher-level questions, and that uses a lot of the state-of-the-art models. Colt Allen is really leading that and is trying to get basically any functionality that's out there into the module.
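To make the "how much, how often" framing concrete, here is a small sketch of the frequency/recency/T summary that BG/NBD-style CLV models typically take as input. This is a generic illustration, not PyMC Marketing's API; the function name and output layout are invented for the example.

```python
import numpy as np

def rfm_summary(customer_ids, purchase_times, observation_end):
    # Per-customer summary used by BG/NBD-style CLV models:
    # frequency = number of repeat purchases, recency = time from first
    # to last purchase, T = time from first purchase to end of observation.
    summary = {}
    for cid in np.unique(customer_ids):
        times = np.sort(purchase_times[customer_ids == cid])
        first = times[0]
        summary[cid] = {
            "frequency": int(len(times) - 1),
            "recency": float(times[-1] - first),
            "T": float(observation_end - first),
        }
    return summary

ids = np.array(["a", "a", "a", "b"])
times = np.array([0.0, 5.0, 9.0, 2.0])
summary = rfm_summary(ids, times, observation_end=10.0)
# customer "a": 2 repeat purchases over 9 time units, observed for 10
```

From a summary like this, the models answer "how often will they buy again" and "how much are they worth" per customer.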

I've been doing a lot of work on the marketing mix modeling, which is kind of a higher-level time series decomposition, seeing where your media spend, or some type of marketing spend, is best allocated. And then there are a lot of questions that happen on top of that. Once you have this decomposition, you can ask, you know, what should I do? What about some other scenarios? And because of the way the model is structured, you can get good insight into the internals, and one of the aspects that I've been trying to build out is that easy access to these internals: trying to see when your money stops giving you returns. The problem there is that many marketers have many different assumptions, so there are a lot of different ways to cater to them and make the problem flexible, but also useful for many people.
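That "when does your money stop giving returns" question is readable straight off a fitted saturation curve: its slope is the marginal return of one extra unit of spend. A minimal numerical sketch, using a Michaelis-Menten curve as a stand-in for whatever saturation function the model uses (the names here are illustrative, not PyMC Marketing's API):

```python
import numpy as np

def michaelis_menten(x, alpha, lam):
    # saturating response: approaches alpha as spend x grows
    return alpha * x / (lam + x)

def marginal_return(x, alpha, lam, eps=1e-6):
    # numerical slope: extra response per extra unit of spend
    return (michaelis_menten(x + eps, alpha, lam)
            - michaelis_menten(x, alpha, lam)) / eps

spend = np.array([0.0, 10.0, 50.0, 200.0])
mr = marginal_return(spend, alpha=100.0, lam=25.0)
# diminishing returns: the slope shrinks as spend grows
```

Budget reallocation then amounts to moving spend from channels where this slope is small to channels where it is still large.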

We're doing some customer choice models now; that's a new module that we're working towards. The main question there would be: if you're launching a new product, how would that affect the existing products, how would it change their sales, something like that? It's coming from the same type of pool, but how does it get broken down? And then...

So, more like counterfactual analysis and what-if scenarios, basically?

Correct. Yeah. Ben Vincent is doing a lot of work there, and I'm trying to help him get it across the finish line, because it seems like some useful stuff.

Yeah.

So then you do have that close tie with causal types of claims.

That's pretty cool. I mean, there's already a lot of stuff here.

So we have a diverse audience here, and the people who listen afterwards are even more diverse. So maybe can you motivate it? Can you tell them, if they don't care about or don't do customer analytics or marketing analytics, why it's still interesting to hear from you about these kinds of models? Like, how transferable are these skills to other domains?

I mean, I think you always have people using a product, so you always have these questions, which I lump into how often, how much, anything like that. So you'd think you'd have a sense of trying to understand your customer base; if you don't have that pulse, then I don't really understand that point of view. Yeah, I think everyone's working with stuff on that level. And these are statistical models, so they offer insights as white-box models, and they can give glimpses, through their tools, into understanding that customer base.

And then also, I think any type of marketing spend is something that people are trying to better allocate. So if you're spending money, hopefully you get some recognition of where exactly its impact is going.

Yeah. Great points.

And I think also, it's all time series predictions, right? And time series are everywhere, especially in sports analytics, as you know, but yeah, everywhere. And so I think it's also very interesting to be exposed to these models, because they teach you a bit how to think about time series with some counterfactuals.

The good thing about marketing data, from the work I've done at Labs with MMMs and things like that, is that you get to have more controlled data, and even some controlled experiments, which you don't have in a lot of fields. So that's pretty cool, because that allows you to do many more things, and very powerful methods. I think that's also a very good pedagogical use of these tools.

And so, talking about that, the MMMs are the main thing, the flagship, right? Of PyMC Marketing, and even of the models in this industry. So can you define what a media mix model is, when it's useful, and what the building blocks of such a model are?

Yeah, at bottom, it's a regression model. It's a time series. So you have this target variable, usually sales or installs, something along those lines. And then some of the covariates of that design matrix are going to be things like spend, but also any type of control you might have. And then, the way the models are built, you want to also try to account for the fact that money doesn't continuously give you infinite returns. So there are saturation non-linearities, as well as an adstock type of effect, trying to capture how long, or when, the impact from spend gets incorporated. And this is where it's a bit of a deviation from plain linear models. But it's just a regression problem; it can take many forms and become adaptive.

Yeah, it's like a regression, but with a custom inverse link function. I mean, a more special inverse link function, because of the adstock and the saturation effects that you talked about. You can visualize it a bit like a binomial regression or a Poisson regression, where you have an inverse link function somewhere after you have defined your linear regressor, before putting that into the likelihood. But in this case, the inverse link function is a bit more complex than just taking the exponential, for instance, or the inverse logit.
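What this "more special inverse link" looks like in code: spend is first smeared forward in time by an adstock transform, then squashed by a saturation curve, and only then enters the linear predictor. A simplified NumPy sketch of the idea, not PyMC Marketing's implementation:

```python
import numpy as np

def geometric_adstock(spend, alpha, l_max=8):
    # carryover: today's effective spend includes geometrically
    # decayed spend from up to l_max previous periods
    weights = alpha ** np.arange(l_max)
    out = np.zeros(len(spend))
    for t in range(len(spend)):
        window = spend[max(0, t - l_max + 1): t + 1][::-1]  # most recent first
        out[t] = (window * weights[: len(window)]).sum()
    return out

def logistic_saturation(x, lam):
    # diminishing returns: maps adstocked spend into [0, 1)
    return (1 - np.exp(-lam * x)) / (1 + np.exp(-lam * x))

spend = np.array([100.0, 0.0, 0.0, 0.0])
adstocked = geometric_adstock(spend, alpha=0.5)  # [100, 50, 25, 12.5]
channel_effect = 3.0 * logistic_saturation(adstocked, lam=0.05)
# channel_effect is what enters the regression, in place of raw spend
```

A single burst of spend thus keeps contributing, at a decaying rate, for several periods after the money went out.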

Correct.

And then you have the time series component also in that linear regressor. In your experience, what's a good way of handling these time series? And in which way... I don't know, do you recommend, or do you personally use, Gaussian processes, for instance, or structural time series, to take these time dependencies into account in an MMM?

Yeah. In my experience, it's very similar: we offer these building blocks, and you can build it up in various different ways. The way that we structure the project, there are these media transformations, and then we offer the GP. These are all aspects that can become useful depending on the problem. I know that some of the structural time series stuff is becoming more developed and easily accessible within PyMC, with Jesse and his work, and I would love to see that integration, but I haven't had the ability to incorporate that. I think that's something that...

And that would be a great example, because in this model you can add another term, add some more terms; it's kind of incremental, which is nice.

Yeah.

So right now, to be clear, PyMC Marketing uses Gaussian processes?

Correct. Yeah.

There's some seasonality that can be imported; that would come in kind of as an intercept. And there's also a way to understand media effects changing over time. So we have GPs incorporated in two different ways. And then, if you do use some of the building blocks, you can get as creative as possible.

Yeah. Yeah.

So, are you using the HSGP decomposition, or is that a vanilla GP?

HSGP is where it's been integrated. That seems like the easiest way to integrate, where...

The software being built out is really trying to help facilitate model deployment, so that's the training part as well as the productionization. And so there's a bit of a constraint of trying to make stuff serializable, so that we can save it off to blob storage and retrieve it again, and make sure that it's ready and available for those business questions that you're having. Because there tends to be a separation between the training and then the use, and you want to store these artifacts.

Yeah.

But HSGP has been our way to do it.

You can imagine that you could use any other flavor.

Yeah. I mean, it depends on the data size and so on. But yeah, in my experience, HSGPs would be the way to go here.
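For listeners new to the HSGP: the Hilbert-space approximation (after Solin and Särkkä) replaces the GP with a fixed sine basis on an interval [-L, L], weighted by the kernel's spectral density, so the fitted object reduces to a small coefficient vector, which is cheap to evaluate and easy to serialize. A bare-bones sketch of the basis, not PyMC's implementation:

```python
import numpy as np

def hsgp_basis(x, m, L):
    # m Laplacian eigenfunctions on [-L, L]: sqrt(1/L) * sin(sqrt_eig * (x + L))
    j = np.arange(1, m + 1)
    sqrt_eig = j * np.pi / (2 * L)
    phi = np.sqrt(1 / L) * np.sin(sqrt_eig * (x[:, None] + L))
    return phi, sqrt_eig

def sqexp_spectral_density(w, ell):
    # spectral density of a unit-variance squared-exponential kernel (1-D)
    return np.sqrt(2 * np.pi) * ell * np.exp(-0.5 * (ell * w) ** 2)

x = np.linspace(-1, 1, 50)
phi, sqrt_eig = hsgp_basis(x, m=10, L=1.5)
rng = np.random.default_rng(0)
beta = rng.normal(size=10)  # in a fitted model these are learned coefficients
f = phi @ (np.sqrt(sqexp_spectral_density(sqrt_eig, ell=0.3)) * beta)
# f approximates a GP draw; only beta needs to be stored for deployment
```

The deployment win mentioned above comes from exactly this: the basis is fixed, so saving the model means saving a handful of coefficients rather than a kernel matrix.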

Are you using hierarchies on those? Because, imagine you want one Gaussian process per media channel, for instance, but maybe you want some hierarchical pooling across the channels. So how is that handled in PyMC Marketing?

So, recently, one of the biggest constraints with doing deployment is that you have to have a way to serialize your configuration, or have some configuration. And the way that we've been doing that does support user input, and that also has a way to define hierarchies. So it is up to the user. The way that we're shaping stuff is that we support hierarchies; you can, if you like, have hierarchical parameters, and that's really up to the user to define. So my goal with the project is to make it as flexible as it can be, and allow a user who is new to the marketing analytics and MMM space to be able to take advantage of it.

So we have a lot of examples on our documentation page.

Yeah.

And we're trying to build that out.

Yeah, we'll put that in the show notes, for sure. But that's really cool: you can have these kinds of hierarchical GPs already. That's already in the package, right?

Yeah, the GP is being modified a bit at the moment, but that's the long-term plan.

But all these other components are flexible, and it ends up being up to the user to define their priors, in a way, and those include hierarchical parameters. Really, the way I've been viewing it is as kind of an agreement on the shape of the data. So say you have five covariates: you can generate a five-covariate vector in many different ways. It can be hierarchically generated, it can be independently generated, and you can also swap out the distributions.

And how do you guys handle the seasonality that you were talking about earlier? Are you using Fourier decompositions, or another GP kernel?

We offer two implementations. There's the Fourier, which was probably the original implementation, and then there's also a GP seasonality as well. I would imagine, if you incorporate both, the GP might pick up a kind of overarching trend, a kind of increase or some type of seasonality, and then the Fourier will probably detect the more yearly patterns.
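The Fourier option is simple to sketch: seasonality enters the regression as a handful of sin/cos columns at harmonics of the yearly cycle, and the seasonal pattern is their fitted linear combination. A generic illustration, not PyMC Marketing's exact API:

```python
import numpy as np

def fourier_features(day_of_year, order=2, period=365.25):
    # 2 * order columns: sin/cos pairs at increasing frequencies
    t = 2 * np.pi * np.asarray(day_of_year) / period
    cols = []
    for k in range(1, order + 1):
        cols.append(np.sin(k * t))
        cols.append(np.cos(k * t))
    return np.column_stack(cols)

X_season = fourier_features(np.arange(365))
# the seasonal component of the model is X_season @ beta for fitted beta
```

Raising `order` lets the seasonal shape wiggle more; a low order keeps it smooth, which is usually what you want with noisy weekly or monthly marketing data.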

Okay. Yeah, that's really cool. There are already so many combinations of models that you can make.

Yeah. One of the...

My goal is really to make it... to allow the user to be empowered to work with their assumptions, and to be as flexible as possible. Out of the box, we're offering basically 50-plus combinations with just a toggle on and off, but then with the building blocks, you can get as creative as possible. And then, don't forget that priors can be defined in the way you want. So if you want to, you can really get creative.

And so, we'll get back a bit later to another kind of model you mentioned just a few minutes ago: the customer value one, I think it was, right? Something like that. But I'm also curious about the productionization of models, which is another focus of yours lately. So yeah, can you talk to us a bit more about that? What does it mean to productionize a model? What are the main challenges? And how are you trying to solve them?

Yeah, on the productionization side, well, we want to facilitate being able to store a model artifact. And one of the aspects of PyMC and the way it's built is that you always had to have this model graph. We offer a way to load it back in from a binary file, but then also to regenerate or recreate the model itself, so you can just pick up from where you left off.

You mean the sampling?

No, you'd reconstruct the model. So just based off of the inference data object, which is stored in blob storage, you not only load in the data, but you load the model as well.

That's pretty cool.

The tough part is that we're not pickling, and so we're not depending on the version of Python and hitting some of those rough edges. We're reconstructing the model, so that you can version up the package, change some of your code, add methods and other development aspects, but then also get back to where you were.

So that's really awesome.

So that means, when you load, for instance, when you do the load from NetCDF, you would not only get back from that function the inference data object, but also the model object that created the inference data.

Correct.

That's really cool. Yeah.
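The key move described here is that everything needed to rebuild the model travels with the saved artifact. A toy version of the round-trip idea, using plain JSON for the configuration (PyMC Marketing's actual mechanism stores its configuration alongside the InferenceData; the structure below is invented for illustration):

```python
import json
import os
import tempfile

def save_config(model_config, path):
    # persist everything needed to rebuild the model graph later
    with open(path, "w") as f:
        json.dump(model_config, f)

def load_config(path):
    with open(path) as f:
        return json.load(f)

config = {
    "adstock": {"type": "geometric", "l_max": 8},
    "saturation": {"type": "logistic"},
    "priors": {"intercept": {"dist": "Normal", "mu": 0, "sigma": 2}},
}
path = os.path.join(tempfile.gettempdir(), "mmm_config.json")
save_config(config, path)
rebuilt = load_config(path)  # identical config means an identical model graph
```

Because the saved object is a plain, versionable description rather than a pickle, upgrading Python or the package does not invalidate old artifacts, which is exactly the rough edge mentioned above.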

And it's definitely a tough problem, because not everything that PyMC can do is a regression problem. For instance, some of the CLV stuff is shaped to perform a kind of decomposition, or just parameter estimation. Similar with these product models that we're working on; it's not necessarily that you're trying to do predictions in the future.

So some of the models have different requirements, which makes it a little bit tough, as well as also supporting people's flexibility and customization, you know, making sure that their priors are exactly what they were before.

Yeah.

So it becomes a tough problem, but we want to make it so that at least the models that we offer can be recovered, and you can really just jump back in and do some of those cool things that you want to do.

Long-term, I would love to see all PyMC models be able to do that. It's definitely tough, because you can get really creative. I think one of the aspects I like about PyMC is that it drops some of those assumptions that you had. For instance, in the MMM space, it is a regression model, but we also have inputs from other sources that come in. So it's not just a model with one likelihood, but a model that can have any number of likelihoods, which act as a kind of regularization on this general, global model graph. And to cater to that fully is tough.

Yeah, that's a tough one, for sure. But that would be amazing. And I know Luciano Paz is also working on trying to pick up the sampling process from where you left off. That comes from being forced to work a lot on virtual machines: sometimes your connection gets lost, and then you have to sample the model all over again. And often, if you're working on a VM, it's because you have a model that's challenging to sample from, so it takes a lot of time. And so it's so much time lost.

So yeah, what Luciano is working on is trying to find a way to stop the sampling process automatically, and then, when you come back, the model will start sampling again from the same spot it was at the last time you were connected to the VM, for instance. That would be super powerful, and coupled with the kind of development you're working on, I think it would make a big difference, not only for productionization of models, but also for development in general.

The general recommendation, my understanding is, when you get a new data set, is just to sample it again entirely.

Yeah.

Which can be time intensive. We do some big models, and they're taking three hours plus on a GPU, even when you're trying to go as fast as possible. So I definitely understand that pain.

One of the aspects that I have been thinking is awesome with the MLflow integration is that you can kick off something and see its configuration. We do a lot of logging: what was your sampler, what were the configurations, what was the actual model graph. And we're working on this way to track experiments, so you can come back and see the context that you started with.

Okay.

So what's MLflow here? I don't know that one. Like, yeah, what is it, and why is it useful in this case?

MLflow is a machine learning deployment and experiment tracking framework. It has a kind of plug-in system for many machine learning packages, to log these artifacts and various aspects of your configuration. And it's completely tweakable. For instance, the things that you might be interested in logging when you're doing some experimentation are your sampler, how long it is taking, some type of metric that you are curious about, maybe some of the plots, so you can do some of that interrogation.

And this is just a way to store all of that off. It's stored in some database, and usually it's hosted; it'll be served, and then you can always come back to it. I've been describing it as a way to do some of that bookkeeping, and to point to all the files that you had of interest, as well as anything that might be relevant.
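Stripped of MLflow itself, that bookkeeping amounts to appending one record per run: parameters, metrics, and pointers to artifact files. A deliberately minimal stand-in for what MLflow automates (the class and method names are invented for the example):

```python
import time

class RunLog:
    # minimal experiment tracker: one dict per run, queryable later
    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics, artifacts=()):
        self.runs.append({
            "timestamp": time.time(),
            "params": params,              # e.g. sampler settings
            "metrics": metrics,            # e.g. minutes taken, divergences
            "artifacts": list(artifacts),  # e.g. paths to traces and plots
        })

    def best(self, metric):
        # run with the lowest value of the given metric
        return min(self.runs, key=lambda run: run["metrics"][metric])

log = RunLog()
log.log_run({"sampler": "NUTS", "draws": 1000},
            {"minutes": 12.5, "divergences": 3})
log.log_run({"sampler": "NUTS", "draws": 2000},
            {"minutes": 25.0, "divergences": 0})
```

MLflow's value over a sketch like this is the hosted UI, the database backend, and the auto-logging hooks into libraries, but the underlying record shape is about this simple.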

375

:

I think it's a good tool, if you'd use it correctly.

376

:

But yeah, you can definitely, in my mind, leverage it for much bookkeeping and so you can

come back to stuff and get it, get back to the context that you.

Yeah, that sounds super helpful.

Yeah, for sure.

Damn.

Yeah, I don't have the best memory, but after three hours of the sampler, you're trying to do something else, you trigger a few models. You want a way to get back into that workflow. It's a tool that's trying to make people's lives a little bit easier, and it integrates with a lot of machine learning tooling.

And is that a feature that's already available in PyMC Marketing, or are you still working on developing it?

MLflow has autologging, and we have autologging support for all PyMC models, which is cool. It tracks the model graph as well as the number of parameters and the number of deterministics, and it does it for free, which is great.

And then for the MMM, we have another set of things that we track, like the adstock function and the saturation functions.
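
For context, adstock and saturation are the two standard MMM transformations of media spend. Here is a minimal pure-Python sketch of common parameterizations; PyMC Marketing's actual transformations are configurable and vectorized, so treat this as an illustration of the idea, not its implementation.

```python
import math

def geometric_adstock(spend, alpha):
    """Carry a decayed fraction alpha of past effect into each period."""
    out, carry = [], 0.0
    for x in spend:
        carry = x + alpha * carry
        out.append(carry)
    return out

def logistic_saturation(x, lam):
    """Diminishing returns: maps spend >= 0 into [0, 1)."""
    return (1 - math.exp(-lam * x)) / (1 + math.exp(-lam * x))

spend = [100.0, 0.0, 0.0, 0.0]  # a one-period burst of spend
adstocked = geometric_adstock(spend, alpha=0.5)
print(adstocked)  # [100.0, 50.0, 25.0, 12.5]

# Saturation is then applied to the (scaled) adstocked series.
reach = [logistic_saturation(x / 100, lam=2.0) for x in adstocked]
```

Adstock captures the carryover of advertising into later periods; saturation captures the fact that doubling spend does not double the effect.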

And then Louie McGowan, one of the contributors, has been working on logging some of the metrics that are crucial to MMM. So it really just means you kick off the model the way you would, and for free you get a lot of great stuff.

Under the hood we're building up this repertoire, but it's basically for free: you already have dozens and dozens of things that might be of interest. Parameter estimates, for example.

But if something's not of interest, you can change it to your liking. We're still collecting feedback at that level, but there's a lot that's already implemented.

Damn. Yeah, that's pretty cool. Well done on that work. That must have been quite challenging.

I'd like to go back to a kind of model you talked about earlier, which was something about customers. Can you remind me what the name was, and also define these kinds of models for the audience? Is it the customer lifetime value?

That's possible.

Okay.

So yeah, can you define these models: when they're useful, what their strengths are, their weaknesses?

I wouldn't say I'm the expert, but I'd lump it into that. These models try to understand the customer profile, and depending on the model architecture you can derive various different insights, many of which have a global, population-level component. So by looking at some dataset, you're not only learning about individual customers, you're learning about the population in general.

I think there are various different artifacts depending on which one, but it goes into different paradigms of how products are set up. For instance, subscriptions: anytime there's a subscription, there's a subset of research there. Contractual settings, there are various different ones.

Yeah, that sounds a bit like a survival model, in a way, where you're trying to infer the probability of a customer churning out, basically.

Correct.

Yeah.

One of the big use cases, my understanding is, is churn prediction; it has that as part of the assumptions. So depending on the assumptions, you get different insights from them. And I know some of them have churn capabilities.

You might have an assumption that a customer never comes back. Yeah, I think that's why they get termed "buy till you die": that's the dying process, and then if you don't die, you have a time until the next purchase or next transaction, and that just gets derived.
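
As a concrete illustration of that "buy till you die" logic, here is a toy sketch. This is not one of the actual CLV models (those, like BG/NBD or Pareto/NBD, place priors over these rates and learn them across the population); it just shows the mechanics: each period a customer "dies" (churns for good) with probability p, and while alive makes a purchase with probability q.

```python
def survival(p, t):
    """Probability a customer is still 'alive' after t periods."""
    return (1 - p) ** t

def expected_purchases(p, q, horizon):
    """Expected purchases over `horizon` periods, summed while alive."""
    return sum(q * survival(p, t) for t in range(horizon))

# Hypothetical per-period churn and purchase rates.
p, q = 0.2, 0.5
print(survival(p, 3))               # ~ 0.512
print(expected_purchases(p, q, 4))  # ~ 1.476
```

The Bayesian versions invert this: from each customer's observed purchase history, they infer the latent p and q, and hence churn risk and remaining lifetime value.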

Yeah, I'm not an expert in that field. Colt Allen has been developing a lot of that work.

Mm-hmm. So you should have him on.

Yeah.

I mean, and also I'm guessing there are some tutorials on the PyMC Marketing website that maybe we can link to in the show notes of the episode.

Correct. Yeah.

There's a lot of documentation there. And I think it's under the CLV models; that's where we've been lumping it. There's a notebook for every type of model. There are many, many models, all with different assumptions, and the documentation has the context, some of the setup, when you'd use each one, and what kinds of insights you can grab from it.

Yeah, okay. Let's definitely link to that in the show notes.

I'm also curious. Something I always like to learn when I talk to people like you, whether on the show or in the hallways, is: is there a mistake you made at some point in your development, whether of models or of a feature, that was really enlightening to you and made you learn something very important?

In terms of development?

Yeah, model development or...

I mean, constantly. It's a living piece of software and we're learning all the time. I think we come out with some stuff that we think is structured in a way that's usable, adjustable, and flexible. But we're constantly learning, and I always appreciate the feedback from the larger community; it's great to have opportunities to hear from people. I try to give praise to as many people as I can who make contributions, and to encourage it, because that's totally where the project's goals come from.

Then what's the latest surprise you got, or the most surprising learning experience you got from that?

I think one that comes to mind is the deployment-ready production code. It's a tough problem, because you have to account for the flexibility that PyMC models can provide. Structuring that problem so that you can support many different types of models, but then also get back to your workflow and reconstruct your model, is very tough.

The code base got a lot of love. We're constantly working on it, trying to hit all the use cases and be constructive about where it's going. We just want to be as flexible as possible, so you don't have to be as invasive in the code base in order to support someone's use case.

Yeah, I see. I see what you're doing.

Yeah, definitely.

And that's the same thing on the PyMC side.

Yeah.

Yeah, there's definitely a little bit of a narrower context, which I really enjoy. You can always add a new distribution to PyMC; it's these building blocks. And now there's a little bit of a constraint on top, where they're all kind of the same. So it's been a fun thing to work on.

Something I really like also is that in the PyMC Labs Discord, you're always one of the people testing a lot of new technologies. I love that, because you often share them in the Discord. Something I saw you guys were testing recently was some new notebook thing. What is it called?

Marimo, or something like that?

Marimo?

Yeah, something like that.

Yeah, can you tell us what this is about? I'm curious. Is it something that replaces Jupyter notebooks?

I don't know exactly the full scope of it yet, but I'm playing around with it. It's been fun. It's very interactive.

I've always been a fan of the Altair project, which does visualizations, and you can really get that interactivity into your visualizations. This has a similar sense, but in a notebook context, where some of those sharp edges, version control as well as cell dependency, I think are being solved here. It's been getting adopted by a handful of people, and I'm keeping my eye out. I'd love to see that interactivity, or some of this capability, intersect with PyMC stuff and see how far it can go.

Yeah, for sure. So it builds on top of Jupyter notebooks to make them easier to version control?

I think it's a separate thing.

Okay.

Yeah, that's my understanding of it.

Okay. That's interesting.

So we should put the link in the show notes also. I'm definitely going to try it, because Jupyter notebooks are hard for version control, cell dependency, et cetera. That sounds like something I'd at least like to try, if it really solves these kinds of issues.

Yeah. I was playing around with it yesterday. You can have some of the configuration at the top, and then you change it and it automatically derives and ships off and executes the dependent cells. I don't know if that's the exact terminology, there seems to be some variation there, but you get this sense of interactivity and you don't have to deal with all of the execution stuff.

Yeah. It seems exciting. It seems like a cool project, and it looks like it's getting a lot of love.

Yeah, for sure. I'll give it a try, and we'll put that in the show notes for sure.

So to close out, we'll open it up to the Q&A in a minute, but first, a question I like to ask guests from time to time: what are you learning these days? Because I know you're always learning new stuff, as you were saying. So I'm always curious, when I meet people like you, to know what you're working on these days, what you're learning, and maybe also the challenging thing you're working on right now where you're banging your head against the wall, you know.

I've been toying around a lot with GitHub Actions. I've been having a lot of fun with those.

For me, I want to make the developer experience as easy as possible. I'm a developer; I know other people are developers. I joke with one of the other developers, Juan, about the coffee break: I'm trying to get rid of that coffee break and speed everything up.

So I've been having a lot of fun working with those and learning them, but I'm still developing a workflow that I enjoy for that, and trying to make the capabilities pretty good.

Yeah. Okay. That's pretty cool. So that automates a lot of the process on GitHub, GitHub Actions?

Yeah. So with PyMC Marketing, we have that for automated testing. We have documentation checks. We have issue labeling and issue management, and release consequences like publishing the package and so forth.

So it handles a lot of things. And I've just been toying around with it as a medium to make some of the problems I deal with easier. I'm always on GitHub, trying to respond to stuff and make sure I get to everyone's questions and help out where I can, or point them in the right direction.

So it's like automating a lot of work.

Correct. Automation.

Yeah, that's great. Awesome.

Well, I have to ask you the last two questions I ask every guest at the end of the show, but before that, do we have any questions from the audience, or are you guys good?

Yeah. So I'll repeat the question for the people who are not in the room. What's your name?

Jeff.

So Jeff was asking; he's more of a DevOps person, operations research. Jeff's question is basically: what you're doing involves a lot of time series and causal inference, so how generally can this be applied to the kind of work that you're doing, Jeff?

I guess that's a little bit tough. I mean, I would say that, in terms of time series and Bayesian statistics, you can get a handle on outliers and events. Maybe that would be something that's catered to. I'd have to learn a little bit more.

Okay. So the question was: is setting a prior in the context of a causal inference model different from setting a prior in a model where you're less interested in causal effects?

Going back, one of the things I'm learning is causal inference; I'm far from the expert there. I'm not exactly sure, but I can imagine that you're still trying to cater to your assumptions in general and all the information that you have, and leverage prior predictive checks and posterior predictive checking. I like to interrogate the internals of models, even artifacts and derivations that don't have anything to do with the question. I would imagine it's similar to that Bayesian workflow.

Yeah. My two cents also would be that I don't think you would necessarily do that differently. Not because deriving the priors for causal inference isn't special, but because of the way we usually do it in the Bayesian workflow: you really do that from a generative perspective most of the time. So you're already thinking in a causal inference framework most of the time.

So yeah, I'd say I don't think that's much different. Maybe we'll get some angry emails after the episode, so I'll make sure to forward them to you. Anything to add, Will?

No.

Any other question?

Yeah.

So you said... choice modeling?

Choice modeling.

So I'm not familiar with that kind of model.

So that's actually a great question. Thank you. What's your name?

Sunmai.

So the question was about choice modeling. Is that possible to do in PyMC Marketing, if I understood correctly? Did you guys already think about that?

So the customer choice module that we're working on is also high-level time series. It's time series, but it'd be for some type of product.

And the customer choice label that we're using at the moment comes from the model internals trying to delineate, when there's an introduction of a new product, where does it come from? Does it come from the existing current products? So in a way, it's this higher-level choice rather than a customer-level one.

Are these models like multinomial choice models?

We offer the flexibility to have different likelihoods for the time series, because the likelihood comes from that time series side of things. But then, in terms of the breakdown of where the contributions come from, where these metrics go for this new product, that has a Dirichlet type of breakdown.
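
To illustrate the kind of question a Dirichlet-type breakdown answers, here is a toy accounting sketch of where a new product's volume comes from. The actual model infers these shares probabilistically from the time series, rather than by this naive subtraction; the function and data here are purely illustrative.

```python
def decompose_new_product(before, after, new_product):
    """Split a new product's volume into shares cannibalized from
    each existing product vs. genuinely incremental demand."""
    lost = {k: max(before[k] - after.get(k, 0.0), 0.0) for k in before}
    cannibalized = sum(lost.values())
    new_volume = after[new_product]
    incremental = max(new_volume - cannibalized, 0.0)
    shares = {k: v / new_volume for k, v in lost.items()}
    shares["incremental"] = incremental / new_volume
    return shares  # shares sum to 1, like a Dirichlet draw

# Hypothetical category sales before and after product C launches.
before = {"A": 100.0, "B": 50.0}
after = {"A": 80.0, "B": 45.0, "C": 40.0}
print(decompose_new_product(before, after, "C"))
# {'A': 0.5, 'B': 0.125, 'incremental': 0.375}
```

The point of the probabilistic version is that these shares come with uncertainty, instead of being a single point estimate like this.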

Okay. Yeah, that makes sense to me.

So we try to offer that flexibility.

So if it's count-like data, it can be used there. If it's strictly positive, you can change your likelihood, et cetera.

Yeah. Okay.

Satisfied? Great.

Any other questions before we close up?

So we're all good? Awesome. Thank you for taking the time. That was great to have you here.

Of course, the last two questions I ask every guest at the end of the show. First one: if you had unlimited time and resources, which problem would you try to solve?

That is tough. One of the realizations from doing the model deployment stuff that's been on my mind is the benefit of having shape information or dimension information from the PyMC model, external to these model contexts.

We have a wrapper that we implement that allows that to be the case, but we're still using PyMC's functionality at the moment. I would love to see more general use of the abstract shapes and labeled coordinates that PyMC benefits from. A little bit lower down, PyTensor could benefit from that too. But I'm not too sure; there might be a lot of consequences. That would be resource intensive.

Yeah. So I would love to see where that can go in the future.

Yeah. Excited to see that. Sounds good.

And second question: if you could have dinner with any great scientific mind, dead, alive, or fictional, who would it be?

The last few years I've been watching some math history. I think that's an interesting side of mathematics, where you get a sense of how people thought throughout time. And one of the stories I know is of the great scientist who had the Eureka moment. I think that's one of the quick minds.

Yeah. I forget his name, but I think there's an interesting aspect to the story. Math, in my mind, is the balance between the invention side and the discovery side, and the way the idea came to him is interesting to me. I would love to see the workflow these historical figures had.

Like the creativity in the mathematical discovery process, basically. That's something that's very interesting to you.

Great. Awesome.

Well, thanks a lot, Will, for taking the time and being on this show, and thank you everybody for coming. It was great to have you here live, and see you in the next episode.

Thank you.

This has been another episode of Learning Bayesian Statistics. Be sure to rate, review, and follow the show on your favorite podcatcher, and visit learnbayesstats.com for more resources about today's topics, as well as access to more episodes to help you reach a true Bayesian state of mind. That's learnbayesstats.com.

Our theme music is Good Bayesian by Baba Brinkman, feat. MC Lars and Mega Ran. Check out his awesome work at bababrinkman.com.

I'm your host, Alexandre Andorra. You can follow me on Twitter at alex_andorra, like the country. You can support the show and unlock exclusive benefits by visiting patreon.com/LearnBayesStats.

Thank you so much for listening and for your support. You're truly a good Bayesian. Change your predictions after taking information in, and if you're thinking I'll be less than amazing, let's adjust those expectations. Let me show you how to be a good Bayesian. Change calculations after taking fresh data in. Those predictions that your brain is making, let's get them on a solid foundation.
