#133 Making Models More Efficient & Flexible, with Sean Pinkney & Adrian Seyboldt
Episode 133 • 28th May 2025 • Learning Bayesian Statistics • Alexandre Andorra
Duration: 01:12:12


Shownotes

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!

Visit our Patreon page to unlock exclusive Bayesian swag ;)

Takeaways:

  • Zero-sum constraints allow for better sampling and estimation in hierarchical models.
  • Understanding the difference between population and sample means is crucial.
  • A library for zero-sum normal effects would be beneficial.
  • Practical solutions can yield decent predictions even with limitations.
  • Cholesky parameterization can be adapted for positive correlation matrices.
  • Understanding the geometry of sampling spaces is crucial.
  • The relationship between eigenvalues and sampling is complex.
  • Collaboration and sharing knowledge enhance research outcomes.
  • Innovative approaches can simplify complex statistical problems.

Chapters:

03:35 Sean Pinkney's Journey to Bayesian Modeling

11:21 The Zero-Sum Normal Project Explained

18:52 Technical Insights on Zero-Sum Constraints

32:04 Handling New Elements in Bayesian Models

36:19 Understanding Population Parameters and Predictions

49:11 Exploring Flexible Cholesky Parameterization

01:07:23 Closing Thoughts and Future Directions

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor,, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan, Francesco Madrisotti, Ivy Huang, Gary Clarke, Robert Flannery, Rasmus Hindström, Stefan, Corey Abshire, Mike Loncaric, David McCormick, Ronald Legere, Sergio Dolia, Michael Cao, Yiğit Aşık and Suyog Chandramouli.

Links from the show:

Transcript

This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.


Speaker:

In today's episode, I am thrilled to welcome Sean Pinkney, a managing director at Omnicom

Media Group and a Stan core contributor.

2

:

With a background spanning mathematics, economics and statistics, Sean shares how he

transitioned from economics to data science and how his work evolved into advanced

3

:

Bayesian modeling.

4

:

We dive deep into his zero-sum normal project in Stan, exploring why he created it, how it

improves

5

:

the efficiency of hierarchical models and what its broader applications might be.

6

:

We're also joined by a special guest star, Adrian Seyboldt.

7

:

You may remember Adrian from episode 74 of the show.

8

:

If you don't, well, I recommend you to listen to his dedicated episode.

9

:

It is in the show notes.

10

:

And Adrian will provide additional insights into the technical, and even philosophical,

aspects of flexible parameterization.

11

:

and population effects.

12

:

So whether you're a fan of Bayesian stats, of Stan, or simply curious about innovation in

statistical modeling, this episode is packed with technical gems and big picture

13

:

reflections alike.

14

:

This is learning Bayesian statistics, episode 133, recorded February 11, 2025.

15

:

Show you how to be a good Bayesian and change your predictions after taking

16

:

Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods,

the projects, and the people who make it possible.

17

:

I'm your host, Alex Andorra.

18

:

You can follow me on Twitter at alex underscore andorra, like the country.

19

:

For any info about the show, learnbayesstats.com is the place to be.

20

:

Show notes.

21

:

becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on Patreon,

everything is in there.

22

:

That's learnbayesstats.com.

23

:

If you're interested in one-on-one mentorship, online courses, or statistical consulting,

feel free to reach out and book a call at topmate.io slash alex underscore andorra.

24

:

See you around, folks, and best Bayesian wishes to you all.

25

:

And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can

help bring them to life.

26

:

Check us out at pymc-labs.com.

27

:

Sean Pinkney, welcome to Learning Bayesian Statistics.

28

:

Hello, great to be here.

29

:

Yeah, that's awesome to have you here.

30

:

It's been a long time coming.

31

:

We met finally at StanCon 2024 in Oxford.

32

:

em And that's when I learned that you were one of the main persons working behind adding

the zero-sum constraint to Stan.

33

:

Of course, we're going to talk a lot about that today.

34

:

um We may have surprise guests at some point, we'll see.

35

:

To talk about that precisely, but first, can you tell us what you're doing nowadays, Sean,

and how you ended up working on this?

36

:

Yeah, yeah.

37

:

On the zero-sum thing, it will be a pretty, I think, hopefully funny story, interesting as

well, because basically, I wasn't focused on this at all.

38

:

So um I find it...

39

:

kind of great that you're interested in it and I got into it a bit accidentally, but we'll

get there.

40

:

Yeah, so just a bit about me.

41

:

uh So currently I am working at this big company called Omnicom Media Group.

42

:

So it's a big advertising agency, um which is kind of different than I think a lot of uh

the folks that come on here.

43

:

um I am a Stan developer.

44

:

and you can kind of get into how that even became because right now I'm not actually doing

so much like statistics and math, uh, I'm a managing director.

45

:

So I moved up, uh, in the company.

46

:

So I'm leading a lot of projects and it's a lot more, uh, business type stuff.

47

:

Now I still love, I love Stan and I love Bayesian statistics.

48

:

And so I keep up with it.

49

:

and Bob.

50

:

Carpenter who has been on here before, he's like, that's so cool that this is like your

hobby.

51

:

I just, now I start telling people that, okay, this is like my hobby, Bayesian statistics.

52

:

Yeah.

53

:

And when actually did you discover Bayesian stats, were you introduced to Bayesian stats

first?

54

:

uh Because yeah, I mean.

55

:

I'm sure you define that as a hobby, but you do a lot of stuff around Stan and often your

projects are not small ones.

56

:

So it takes time.

57

:

So yeah, I'm curious, you know, how did you get introduced to that and why that stuck

with you?

58

:

Yeah, so actually, let me just like go back a little bit.

59

:

So my background, so I majored in economics and math when I was an undergraduate at the

University of Colorado.

60

:

in Boulder.

61

:

So yeah, grew up in Colorado.

62

:

That's my kind of hometown.

63

:

Currently, I'm in the DC area.

64

:

um And I actually wanted to be an economist.

65

:

uh So I was one of these weird people who knew what they wanted to do, like right when

they went into undergraduate.

66

:

So right off the bat, I double majored in econ and math.

67

:

um And I kind of like basically planned out

68

:

the whole four years about what classes I needed to take in order for that to get all

done.

69

:

um And so I knew I was going to go to grad school um and the plan had been okay to get

like a PhD in economics.

70

:

I also was really interested in like cycling and triathlon.

71

:

And so I was on the triathlon team at Boulder and I've done a bunch of those.

72

:

And for a while there, ah it was like, okay, so I was balancing these things of, you know,

doing these academics, but then also training quite a lot.

73

:

At that time I was probably training like 20, 25 hours a week.

74

:

uh You know, it was like, I just can't believe how fast I was back in the day, now that I

feel like an old man.

75

:

ah But, but yeah, so then I was kind of like, you know, I.

76

:

wanted to try that stuff out too.

77

:

Unfortunately, I had like a really bad accident, senior year of college on the bike and I

needed to get a bunch of things fixed.

78

:

like knock some teeth out, my jaw.

79

:

And I took a year off between undergrad and grad school and I worked, actually worked for

a geologist up in North Dakota, weirdly, like 12 hour days.

80

:

I was working all night long um and we were up there and we were looking at the cuttings

that were coming up from drilling rigs like well so they would drill and these cuttings

81

:

would come up and we would log, we would log them and we would say okay you know we would

look at like the radiation, we would describe the appearance, the color, we'd test if

82

:

there was like any oil and stuff and it was kind of neat.

83

:

ah

84

:

And I made pretty decent money.

85

:

So I was able to basically save up because I was just living on the site there.

86

:

And then at that time, the place in North Dakota was really cheap.

87

:

It was like right at the beginning of like this boom.

88

:

And so I was able to like rent like an apartment for like something like $150 a month or

something ridiculous whenever I was up there.

89

:

And then

90

:

I got into grad school at University of British Columbia at that point, and this was for

economics.

91

:

I was like, hey, I think I don't know if I'm going to do a PhD.

92

:

I don't know if I'm going to go five full years.

93

:

Let me do this.

94

:

And it was great.

95

:

So I went up there.

96

:

It was a one-year program at a Canadian university, and I was able to pay for everything.

97

:

And I did apply to some PhD programs I did get in at the end.

98

:

And at the same time, met who my current, my wife, current, my only, uh, my wife.

99

:

Uh, but at the time, you know, I met her and, uh I was like, you know, do I want to go to

PhD program?

100

:

And I'm sure that this relationship will stop, um, because it was nowhere near where she

was, or I could move to where she was.

101

:

So I decided to move to where she was.

102

:

Um,

103

:

So that kind of ended my, a lot of like my, academic trajectory.

104

:

Now the financial crisis happened at that same time, so 2008, 2009 had a really tough time

getting any jobs.

105

:

And I kept hearing back though, funny enough that they wanted more statistics and they

were like, wow, you have like econ and this stuff, but I really had no statistics.

106

:

Like I had a probability course.

107

:

So in undergrad at that time, like back in

108

:

the early 2000s.

109

:

Data science wasn't really a thing yet.

110

:

A lot of these tools that we take almost for granted today just weren't developed at all.

111

:

And so, I had one probability class and then I had whatever the probability stuff in

econometrics.

112

:

And I was like, you know what?

113

:

I guess I'm just going to go back to school because there were no jobs.

114

:

ah I have family in Ohio, so I applied to this university called the University of Akron,

ah quite like a local college, university, ah but I have family there.

115

:

They gave me full funding for a master's degree in statistics.

116

:

And I was like, all right, you let me try this.

117

:

Let me take two years and do statistics.

118

:

Now it was all frequentist.

119

:

ah I remember one of the professors there even like disparagingly like was like Bayesian,

like what are they, what do those people know?

120

:

At that point, I didn't really know much about anything Bayesian.

121

:

um But that's, yeah, that was like my first like real like deep dive into statistics.

122

:

Then I started, I went into actuarial work and I did some actuarial stuff uh in Cleveland for about

six years.

123

:

And then I moved over to media and we're finally gonna get to where the Bayesian stuff

comes in.

124

:

um So I got this job.

125

:

uh

126

:

company called Comscore, ah there was someone who had written some Stan stuff and I was

like, wow, I started to look at it and I was like, this is super interesting.

127

:

And as I was starting to read more about it and what it can do, I realized that this is

like what I've been looking for, like my whole, I don't know, like as I was looking at

128

:

analytic stuff,

129

:

I always wanted to build the model that I wanted to build.

130

:

And I didn't want to have to rely upon some package or like, you know, I think for like a

number of years, it was always like, how can I like take these packages and this thing and

131

:

like fit it to like the model that I have and like all the different nuances uh that you

have with your data.

132

:

And now I felt, okay.

133

:

You know, I didn't have to worry about the sampler.

134

:

didn't have to worry about all these other things.

135

:

And I could just focus on building the model.

136

:

ah

137

:

And so that was about 2017.

138

:

um I think I even asked some questions on the forum.

139

:

I asked some questions to Bob.

140

:

ah He was grateful enough to give me a lot of pointers, like he still does with people on

the forums.

141

:

And then I just spent a couple of years just building a bunch of different Stan models um

and just getting very familiar with Bayesian modeling um and all like

142

:

the different gotchas and reading as much literature as I could.

143

:

And I started to then answer a bunch of questions on the forums.

144

:

And I started to really like uh everyone.

145

:

And I just felt very welcomed in the community.

146

:

And so I felt like at a certain point, I was like, you know what?

147

:

I know enough about this that I can, I think, give back to it.

148

:

So I started to add some small functions into Stan.

149

:

And that sort of got me in with more of the Stan developers and talking with them and just

like, what else could we do?

150

:

What issues do we have?

151

:

How can we make these things faster?

152

:

And that's really kind of how it spiraled out.

153

:

Now, in my professional life,

154

:

I went from like managing, so I went from like an individual contributor to managing like

a data science team while I was at CommScore and I was doing quite a lot of data science.

155

:

we did uh like vectorization of clickstream.

156

:

So a clickstream is all these panelists, so people who've agreed to have like their web

browsing, uh you know, tracked and you can see like, okay, they went to page one, two,

157

:

three or whatever.

158

:

And you can take that as a...

159

:

Corpus and you can like vectorize it.

160

:

now, you know, with LLMs, it's like almost commonplace.

161

:

But this was back in like 2017, 2018.

162

:

And like Word2Vec was just coming out.

163

:

It was like, wow.

164

:

And then, you know, we had like this whole new area that we're exploring with it.

165

:

So it was really cool.

166

:

So we were doing, you know, there was a lot of machine learning, a lot of bespoke

modeling.

167

:

And it was a...

168

:

really fun atmosphere to be in at the time.

169

:

uh Unfortunately, that company ah isn't doing that well, or it's kind of like been on life

support for a while.

170

:

And I didn't know, I didn't feel like I had like a great trajectory there um with how

things were going for the company as a whole, even though I loved it, I loved all the

171

:

people I worked with.

172

:

So I started looking elsewhere and I kind of stumbled upon this role at my current job.

173

:

And now I've been here about three years.

174

:

ah And I do some analytics, but yeah, it's mostly like project stuff.

175

:

And in that time, I've still wanted to contribute to Stan.

176

:

I feel like if anything, ah it's gotten much more researchy and technical and less, uh you

know, there's applied components to it, but I think more about the tooling.

177

:

uh...

178

:

anymore, and just making it better and better for people who use Stan, and

hopefully better for people who are using PyMC or using any probabilistic programming

179

:

language uh...

180

:

at the moment

181

:

Yeah, thanks for this extensive uh background.

182

:

I didn't know that's how you started with Stan.

183

:

That's really interesting.

184

:

That's really similar to how I started with PyMC, lurking in the Discourse, eh answering

questions, bothering people with questions, you know?

185

:

We all have to start somewhere, right?

186

:

Yeah, exactly.

187

:

And since we didn't start with, like...

188

:

math degree or stats degree, well we had to fight it the other way around.

189

:

Yeah and that's really interesting because then like all the stuff you did has always been

very very core to Stan so that's why I think it's super interesting to have you on the

190

:

show today and I'm also not surprised that Bob helped you a lot because yeah he's just

incredible uh like very very generous person.

191

:

uh

192

:

with his time and also very good pedagogically, I think he's a very good explainer of very

complex topics, um even though he's himself very technical.

193

:

yeah, it's something I really appreciate in Bob and in most of the Stan team, I have to

say.

194

:

Yeah, I think just an aside on that, the documentation, that was really...

195

:

Mostly Bob was very instrumental in having the Stan documentation so extensive.

196

:

It was a key reason why I just found it.

197

:

It was like getting another degree, reading through the Stan manual, basically.

198

:

And I think people still reference it as like, OK, go look at the Stan manual.

199

:

You can see this distribution of this transform or whatever.

200

:

So it is really that good.

201

:

for sure.

202

:

And so let's turn to, let's turn to one of your big projects actually, uh, and the

main one why I wanted to have you on the show today, which is, um, let's call that the

203

:

zero-sum normal project. So ZeroSumNormal is how we called it on the PyMC side, and

that project is dear to me because I worked a lot on it with uh Adrian Seyboldt

204

:

actually who

205

:

agreed to join us today to talk about that because I know like Adrian is really the father

of that stuff on the PyMC side, me I was just, you know, applying his vision under his

206

:

supervision.

207

:

uh So yeah, he agreed to come on the show today.

208

:

So um to talk with us.

209

:

So welcome, welcome back to the show.

210

:

um Thank you so much for for being with us today for

211

:

listeners on the show, long-term listeners, they will recognize you from episode 74.

212

:

This is in the show notes.

213

:

You were here to talk to us about nutpie, which is your implementation of HMC in Rust.

214

:

And also we talked about ZeroSumNormal, which is something we're going to talk about

right now with Sean.

215

:

So I know Sean and you, I think you know each other from StanCon, but if you don't, well.

216

:

Yeah, it's a...

217

:

Let's dive in.

218

:

So Sean, can you tell us basically what that zero-sum constraint that you implemented in

Stan is about?

219

:

yeah, like maybe first giving us an elevator pitch for that, why that's useful and when.

220

:

So, okay, I'm gonna like go back a little bit because it's really funny how it came to be

in Stan.

221

:

So...

222

:

Basically, he said, so he asked Brian Ward, who's one of the Stan developers, he's like,

OK, I really want the sum to zero vector in Stan.

223

:

And so there was a PR that was opened by Brian.

224

:

And it was that naive one where you just sum all the elements and take the subtraction of

it as the last element.

225

:

And I saw this PR and I was like, my God, no.

226

:

This is not good.

227

:

uh We don't want that way of doing it um in Stan.

228

:

There are some reasons for that.

229

:

I think Adrian, you and I, can probably talk about here in a second.

230

:

um But I had been working uh actually with Bob and Seth Axen uh and Meno.

231

:

and some other people on the simplex transform paper, which is like at this point, three

years in the making, I don't know, we gotta finish it.

232

:

So it's like basically done.

233

:

I think there's just like a few things, then we should just put it up on arXiv.

234

:

But one of the transformations that uh we're using in that paper actually was derived from

this like thing called the inverse ILR transformation, which is done

235

:

and compositional data analysis where they do a bunch of stuff with like simplexes and

everything is like modeled on this like compositional data which uh you know it has like

236

:

this denominator and so it's constrained uh within this area now the way that came about

was hey they wanted this thing called an isometry uh in the coordinate space of the

237

:

simplex to get that you basically need to

238

:

And I don't know the correct term, but like orthogonalize.

239

:

We need some orthogonal sort of transformation.

240

:

And when you look at it, it's the same thing as doing like a zero sum contrast

transformation, basically.

241

:

uh And so in the intermediate step of that, you actually get out a zero sum, which is

pretty neat.

242

:

um And it's actually a really nice parameterization of a simplex.

243

:

which I think we're gonna put into Stan ah sometime soon.

244

:

ah But then I was like, hey, I was like, Bob, Brian, like this thing is so much better.

245

:

ah Now the issue that we had was that the way we had coded it up was like very

inefficient.

246

:

So was like, you would create this like big matrix, ah like Helmert, you know, contrast

matrix, and then you would like multiply it, ah which when you're doing that thousands,

247

:

times in HMC is super inefficient.

248

:

ah And so we start looking at it.

249

:

think Seth, found, he's like, OK, I can actually take out all the matrix stuff and just

make it vectors.

250

:

uh And then Brian starts to look at this.

251

:

And he's like, actually, we can just do one loop.

252

:

We don't even have to create these vectors.

253

:

uh And so let's just do one loop.

254

:

And so that's how it came to be now in Stan.

255

:

So it's very efficient.

256

:

It's just one loop.

257

:

And you have this thing.
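To make the idea concrete, here is a minimal numpy sketch of a sum-to-zero map in the spirit of what Sean and Adrian are describing: n minus 1 unconstrained values go in, n values that sum to exactly zero come out, in a single pass and with no big contrast matrix. This is an illustrative reconstruction, not the exact Stan or PyMC source.

```python
import numpy as np

def zero_sum(raw):
    """Map n-1 unconstrained values to n values that sum to zero.

    The map is orthogonal (an isometry onto the sum-zero subspace), so
    independent normals on `raw` imply a zero-sum normal on the output.
    """
    n = raw.shape[0] + 1
    s = raw.sum()
    norm = s / (np.sqrt(n) + n)
    last = norm - s / np.sqrt(n)
    return np.concatenate([raw, [last]]) - norm

x = zero_sum(np.random.default_rng(0).normal(size=4))
print(x.sum())  # ~0 up to floating point
```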

258

:

In the same time period, right?

259

:

think, Adrian, maybe this was even at StanCon, but it was like, oh, know, PyMC has this

already.

260

:

So then after we developed this whole method, I'm like, OK, let's see what PyMC is doing.

261

:

And I'm like, oh, OK, so it's definitely different.

262

:

But at the same time, it ends up being the exact same thing.

263

:

And so, different distribution.

264

:

Any orthogonal one will work.

265

:

Um, but I think, uh, the one neat thing that you have is it's very easy to, uh, go into

like multi-dimensional tensors and to make each like in, PyMC, right?

266

:

Um, that was kind of, that just fell out of the implementation.

267

:

At the beginning, I didn't think about that.

268

:

I just implemented the one dimensional case and then I was thinking, wait, wait a second.

269

:

I could just do that in a loop.

270

:

Let's do that.

271

:

And I really, yeah, and so I started, so our implementation, it's not quite as easy to

generalize so that, but I did work out for the matrix case and it's literally, it's just

272

:

one loop or well, it's like, you know, one loop through the matrix, which is really nice.

273

:

So you don't have like a double or anything and it's, so it's fairly low complexity.

274

:

Actually, with the PyMC implementation, I think that might iterate twice.

275

:

through the array or something.

276

:

not actually sure.

277

:

I think it does because you in the PyMC one, it's like has to sum through all the elements

like in that call or whatever in that dimension.

278

:

And then it goes through again.

279

:

And it like, I don't know, there's a subtraction and everything.

280

:

However, you can do it not exactly the way that you have it, but the way that I've done it

and you can do it.

281

:

So I just go through it once.

282

:

That's cool.

283

:

Yeah.

284

:

And you can actually generalize it to multidimensional tensors.

285

:

It's just, it started to hurt my head to even look at the matrix version.

286

:

And I actually, used Sympy to look at what the matrix equation looks like.

287

:

And I was like, oh my god, and I was telling Bob how like, oh, this is so cool.

288

:

And then if you have another tensor, it'll just add these other things.

289

:

And he's like, oh, this is like a Ramanujan.

290

:

You know how long the equations were and everything ah But yeah, I think we'll just do the

matrix version next in Stan because we don't really have anything that's like more than a

291

:

matrix there's no um as like a constraint and so it would be like a bigger project to add

that Yeah, there's there's another generalization that kind of I think I coded some some

292

:

really horrible code for one project once that didn't really

293

:

end up being used, but that I would really like is kind of for ragged arrays.

294

:

If you kind of have an array of predictors and you want to sum it to zero, but the number

of elements is different each time.

295

:

Oh, yes.

296

:

So if you have that, maybe you could kind of get the tensor version based on that then.

297

:

But doing that efficiently was a lot harder because then...

298

:

uh

299

:

I think it wasn't in the end, it wasn't as efficient as the as the nice clean one, but

still not horrible.

300

:

So that's something that we might want to add on the PyMC side, that would be cool as well.

301

:

Yeah, I have a, actually, this is so funny.

302

:

I'm working on a case study with Mitzi Morris.

303

:

She actually did like 99 % of the work.

304

:

She was just like, that's like, I want to do this like sum to zero.

305

:

So like, I'll put your name on it, basically, but she's really done.

306

:

of it.

307

:

uh But in it, she's doing like an ICAR model.

308

:

in it, uh they have a different like a ragged array, basically.

309

:

But in Stan, they don't have ragged arrays.

310

:

So you just make like one big vector.

311

:

then, yeah, you slice it.

312

:

ah And then you go through and you use the transform to get the zero sum from it.

313

:

And that actually works pretty well.

314

:

Because then you're literally only going through because it's just a loop.

315

:

you only have to go through it once.
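A rough sketch of the ragged-group trick being described, with made-up group sizes: pack one flat unconstrained vector, slice it per group, and apply a sum-to-zero map (like the one sketched above) to each slice.

```python
import numpy as np

def zero_sum(raw):
    # Same idea as the transform sketched earlier: n-1 free values -> n values summing to zero.
    n = raw.shape[0] + 1
    s = raw.sum()
    norm = s / (np.sqrt(n) + n)
    return np.concatenate([raw, [norm - s / np.sqrt(n)]]) - norm

group_sizes = [3, 5, 2]                                       # hypothetical ragged group sizes
rng = np.random.default_rng(1)
flat = rng.normal(size=sum(group_sizes) - len(group_sizes))   # one big unconstrained vector

effects, start = [], 0
for k in group_sizes:
    effects.append(zero_sum(flat[start:start + k - 1]))       # slice, then transform
    start += k - 1

print([round(float(e.sum()), 12) for e in effects])           # each group sums to zero
```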

316

:

Yeah, I think there are actually quite a few different related things that might be nice

for different applications.

317

:

For instance, you could also have constraints where you say, I want A times my vector to

be zero.

318

:

So you just want the subspace that's defined by the kernel of something.

319

:

Or you want A times x greater than b, than, than.

320

:

see some other constant and have some convex shapes or something.

321

:

I guess there's a lot of space to experiment there and come up with cool things.

322

:

And then I guess the applications would come up at some later point, I think that would

just happen naturally.
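One way to picture Adrian's "A times x equals zero" generalization is to parameterize the kernel of A with an orthonormal basis, so unconstrained draws automatically satisfy the constraint. The sketch below is hypothetical, not existing Stan or PyMC functionality; with A equal to a row of ones it reduces to the ordinary sum-to-zero case.

```python
import numpy as np

# Hypothetical sketch: constrain x so that A @ x == 0 by mapping unconstrained
# values z through an orthonormal basis of the null space of A.
A = np.array([[1.0, 1.0, 1.0, 1.0]])          # a row of ones recovers sum-to-zero
rank = np.linalg.matrix_rank(A)
_, _, Vt = np.linalg.svd(A)
null_basis = Vt[rank:].T                       # columns span {x : A @ x = 0}

z = np.random.default_rng(0).normal(size=null_basis.shape[1])
x = null_basis @ z
print(A @ x)                                   # ~0 up to floating point
```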

323

:

Yeah, actually, the...

324

:

So in the simplex version, we're using that ILR transformation, which is basically the

same thing as our sum to zero.

325

:

um

326

:

And then you do like a softmax transform.

327

:

And because everything is like very nicely uniform from like the sum to zero, uh it just

like really samples nicely.

328

:

I'm doing like I have it's on this chalkboard I'm pointing over here, but I worked out uh

the doubly stochastic version of that like really nice transform ah using like the ILR and

329

:

like everything.

330

:

uh

331

:

And it's super cool.

332

:

And I'm like, I can't wait to put that in now doubly stochastic.

333

:

I don't know how many times uh people would even use this.

334

:

Like, I think I've used that like once, ah possibly ever.

335

:

ah But the thing that I'm super interested in is I think it can be generalized to a matrix

where ah you want the sum a rectangular matrix.

336

:

So a doubly stochastic matrix has to be square, but

337

:

If we want a rectangular matrix and we want the rows and the columns to have a certain

sum, and then we know what the total of the matrix is as well, I think it can generalize

338

:

to something like that, which in my line of work, there's some cool stuff with that where

you might know some stuff about the population and you might have some information about a

339

:

sample, uh but you actually don't know what the population like internal values of the

matrix are, but you know what the

340

:

totals of the columns and the rows are.

341

:

So can you make, I don't get it yet.

342

:

Yeah, yeah.

343

:

like, let's say, ah actually one of the things that I've seen like in the application is

like with voting.

344

:

And so you might have a bunch of voters, ah some state, let's say Florida, uh and you

might have a bunch of information about some of their demographics.

345

:

So let's just say you have race information and age information.

346

:

But you don't have the cross between race and age.

347

:

I think I get now what you're getting at.

348

:

so you know how many people of these races voted this way.

349

:

And you know how many people of certain ages voted this way.

350

:

But you don't know that table, the marginals of race by age.

351

:

But if we have this nice constraint and we know the totals, we could use Bayesian methods.

352

:

to then sample likely values.

353

:

We can have a stochastic value matrix here, and it's super neat, but you really need to

have a nice transform in place, and so you're not just doing Yeah, because otherwise the

354

:

sampler will probably die pretty fast.

355

:

Exactly.

356

:

Yeah, especially like that.

357

:

Yeah, it's really big.

358

:

So that's the thing I've been working through trying to get.

359

:

That sounds But yeah, it's totally related.

360

:

I think there's so many things with the sum to zero that we can do.

361

:

Alex, you had asked me at StanCon, you were like, what if we have a new group?

362

:

immediately, I go back to this forum post where Michael Betancourt's talking about, no,

you can't generalize.

363

:

You can't generalize with sum-to-zero.

364

:

OK, here's the thing.

365

:

All right, so I did some homework, Alex.

366

:

OK.

367

:

So this is my.

368

:

This was my attempt.

369

:

I did some homework.

370

:

I'm super interested to hear um your guys' thoughts on it.

371

:

So wait, let me set the context for listeners, because most of them were not at StandCon.

372

:

um So first, folks, for those who really don't know about the zero sum normal and the zero

sum constraint, really, recommend listening to Adrian's episode, because we go into

373

:

details.

374

:

about why and how you would use that and the strength and the weaknesses.

375

:

But the short version is you would use that mainly when you have, let's say, categorical

covariates in your regression.

376

:

So for instance, like in the radon example that everybody knows from Andrew's paper, you

have homes.

377

:

And so if you have a global intercept in that model, for instance, which would be the mean

of the homes, then

378

:

if you partially pool the coefficients on the homes, you are going to have to use a

zero-sum constraint on the parameters of the home, otherwise you have an

379

:

overparameterization, because you're estimating two intercepts in a way. So you would have this

parameter in there that's pm.ZeroSumNormal with dims equal to home, but then

380

:

my question to Sean, and that's because it's a question I often have when I work

on, whether for elections or other purposes, is like, what if you have, so homes here we often

381

:

call that groups, like if you use brms or Bambi, that's what they call that, their group, so

that's one group. My question was, what if you have a new element of that group? So the group

382

:

is known

383

:

and you've inferred the population parameters of that group, but you observe a new home.

384

:

How do you handle that?

385

:

Because then your zero-sum constraint is not observed in the test set because now

the mean changes because there is a new element.

386

:

So yeah, that's the context.
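For readers who want to see the kind of model being discussed, here is a minimal, made-up radon-style sketch in PyMC with a global intercept and a zero-sum home effect. The data, coordinate names, and priors are placeholders, not the model from the episode.

```python
import numpy as np
import pymc as pm

# Hypothetical radon-style setup: global intercept plus partially pooled,
# zero-sum-constrained home effects.
home_idx = np.array([0, 0, 1, 2, 2, 2])
y = np.array([1.1, 0.9, 1.4, 0.7, 0.8, 0.6])

with pm.Model(coords={"home": ["a", "b", "c"]}) as model:
    intercept = pm.Normal("intercept", 0.0, 1.0)
    sigma_home = pm.HalfNormal("sigma_home", 1.0)
    # The zero-sum constraint removes the non-identifiability between the
    # intercept and the mean of the home effects.
    home_effect = pm.ZeroSumNormal("home_effect", sigma=sigma_home, dims="home")
    mu = intercept + home_effect[home_idx]
    pm.Normal("obs", mu, 0.5, observed=y)
```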

387

:

so apparently Sean, you've done some homework about that.

388

:

So thank you.

389

:

I'm also very curious to hear about what Adrian's thinking is about that.

390

:

first, let's hear from you, Sean.

391

:

Yeah, okay.

392

:

the distribution, so okay, we take our zero sum and we say that if we assume normality, so

it only works with normal, I think.

393

:

Okay, okay.

394

:

At least I also know how.

395

:

or only know how to do it with a normal if it's not normal.

396

:

Yeah, there's some clever ways to do it with something that's normal-ish, but closed under

some of the normal properties of addition and all that.

397

:

But anyway, Often, when we do hierarchical models, we do have this normal property of

these random effects and fixed effects and everything.

398

:

So it's nice because of that.

399

:

um And when you look at the distribution of the zero sum and you estimate what that

covariance matrix is, um it's like it's got this diagonal property and then it has all the

400

:

off diagonals have like a negative covariance.
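For reference, the covariance structure Sean is describing is the usual one for an n-dimensional zero-sum normal with scale sigma: Cov(x) = sigma^2 (I_n - (1/n) 1 1^T), i.e. a variance of sigma^2 (1 - 1/n) on each diagonal entry and a covariance of -sigma^2 / n on every off-diagonal entry.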

401

:

um And what you can actually, what we can actually see from all of this is that using like

the, I'm like, I was just like, want to get into all the.

402

:

But I feel like if you're listening to this, it would be super annoying.

403

:

So I'm just going to try to say it in a general way, which is you can take the property

and what you've estimated, and you can actually back out what the population mean effect

404

:

is, or the variance of the population mean.

405

:

So the variance of your mu, your intercept, or your fixed effect uh in the zero-sum case

will be much smaller.

406

:

when you use a zero sum, the variance of that mu will be much smaller eh than if you just

had a regular hierarchical model.

407

:

Now, the really cool thing, though, is that with a zero sum thing, the estimates that you

get on the mean of mu are actually typically much better often.

408

:

And you get higher ESS and all these things.

409

:

The HMC is able to sample from the zero sum, often much easier than the non one.

410

:

ah And so you can actually go back and you can get the population standard deviation and

you can uh generalize out because we're going to know, we can back out like what the

411

:

covariance and take those effects into account.

412

:

And then we could say, if we had a new group, this is how all of the other effects would

change in the zero sum because we anyway, there's, there's a way that you can like look at

413

:

all the addition, like what you add to it.

414

:

And then

415

:

the new one, but it's just going to be based off the prior and what you've estimated from

your sample.

416

:

So it's not going to be like, maybe there's some clever way to include like some

information about how large that group is or some other standard deviation effects.

417

:

But right now I just like looked at it from, okay, if you use your posterior of what you

have in your prior and then this information about the multivariate normal that's implied

418

:

by the zero sum, then I think, yeah, you can do it.

419

:

which is very long winded.

420

:

But I don't know, Adrian, I want to hear maybe you have a cleaner way, because I like

literally just worked on this for, you know, over the last day or two.

421

:

so it's a bit new in my mind.

422

:

what I really like is kind of the just, oh, sorry.

423

:

Yeah, go ahead.

424

:

Yeah, no, just Adrian, if you like just to tell you guys, if you feel like you want to

share your screen, because it's going to be easier to explain, you can do that right now,

425

:

especially now.

426

:

on the YouTube episodes we have the video so people if they want they can refer to that.

427

:

So yeah, if that's helpful to your explanation at some point, sure, feel free to do that.

428

:

You should be able to share a screen.

429

:

And now Adrian, yeah, back to you.

430

:

So if I understood correctly, Sean, what you said, it's mainly, well, you cannot directly

of course have information about that new home for instance, but you learned about the

431

:

population parameters.

432

:

So you can use those in your predictions and

433

:

Well, if you had better priors about that new home in particular, then definitely use

those prior in conjunction with what you learned about the population and that you can

434

:

make a prediction.

435

:

Yeah.

436

:

And how like the zero sum constraint, like how that manifests into the model.

437

:

you're, it's almost like you're backing out that effect and then applying this new group.

438

:

And then you can like reapply it and then you can re-derive everything.

439

:

I think what cleared up my thinking around that a little bit was seeing what happens if

kind of really redistinguishing between effects relative to the population mean and

440

:

effects relative to the sample mean.

441

:

So if I have an infinite population of homes or population that I assume is infinite and I

have a sample of 50 homes from that population, then I can ask is the radon level or

442

:

whatever level

443

:

50 % higher than the average among the homes that I observed or is it 50 % higher than the

average of all the homes that there are?

444

:

Kind of one is relative to the population, the other is relative to the sample.

445

:

And if I don't use a zero sum normal in my model, I can always compute both, right?

446

:

I can just say, okay, I've got my posterior and I can take em the home specific value and

just subtract the mean of all the

447

:

the observed homes in that draw.

448

:

Then I get back kind of the zero sum effects kind of relative to em the population of

homes that are observed.

449

:

the relative to the sample of homes that are observed.

450

:

And sometimes kind of depending on the application, sometimes you might want to look at

one and sometimes you might want to look at the other, right?
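As a concrete version of "just subtract the mean of the observed homes in each draw", here is a tiny numpy sketch with made-up posterior draws:

```python
import numpy as np

# Hypothetical posterior draws of per-home effects from a plain hierarchical model,
# shape (n_draws, n_homes). Subtracting each draw's mean gives effects relative
# to the *sample* of observed homes; the original draws stay relative to the
# modeled population mean.
alpha = np.random.default_rng(2).normal(0.3, 1.0, size=(1000, 50))
alpha_vs_sample = alpha - alpha.mean(axis=1, keepdims=True)
print(alpha_vs_sample.mean(axis=1)[:3])   # ~0 in every draw by construction
```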

451

:

Maybe I'm interested in how much more

452

:

radiation is there than the average home or I might be interested in how much more

radiation is there compared to the homes that I observed.

453

:

Especially if the population I'm looking at actually isn't infinite, like, I don't know,

US states or something.

454

:

It doesn't really make much sense to ask is that value in that state higher than some

infinite population of US states?

455

:

mean, there are only that many.

456

:

uh Yeah, I mean, actually that's just like in my, so that's a great, I I love that, like

the difference between the sample and the population.

457

:

Yeah, if I think of alpha as like our random effect, and if you take the mean, and so that

would be the random effect by groups, so you might have K groups, whatever.

458

:

If you take the mean of it, in the zero sum case, that mean is going to be zero, right?

459

:

By construction.

460

:

But in the case of just like a traditional hierarchical model, it will be non-zero and

you'll have some non-zero variance.

461

:

And so I think that is kind of the key, right?

462

:

Like you have, this average effect and then, OK, we can use that to sort of back

everything out.

463

:

But it's actually really neat.

464

:

uh And Alex, I just want to thank you for bringing that up ah at StanCon because I think

it is a

465

:

It is quite neat.

466

:

I don't know, do you guys have anything written up on this?

467

:

You're asking the wrong person for a nice written up.

468

:

I know.

469

:

It should be something that we should write up.

470

:

To get back to how you can then get back the population effects, if you start with your

model that counts everything relative to the population, so the infinite population of

471

:

homes.

472

:

then it turns out you can just uh distinguish that in the model itself.

473

:

So you can take a normal distribution with mean mu and some fixed mean, and you can just

uh decompose that into a zero-sum normal and a normal distribution for the sample mean.

474

:

So you have a scalar distribution that's the sample mean of the population, of the sample,

not the population.

475

:

And you have a zero sum normal that tells you what's the distance to that sample mean.

476

:

um And then kind of if you write it down, then you have kind of a scalar value with a

normal distribution and you have plus the zero sum normal plus the normal distribution

477

:

from your intercept.

478

:

And then you can marginalize intercept plus that thing.

479

:

And then you can kind of turn a normal hierarchical model into just a

480

:

It's just strictly a reparameterization of a zero-sum normal.

481

:

And you can then take the zero-sum normal and go back the other way and try to kind of

undo the marginalization manually after sampling, which is really annoying to do.
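The decomposition Adrian is sketching can be written compactly. If alpha_1, ..., alpha_n are i.i.d. Normal(mu, sigma^2) and alpha_bar is their sample mean, then

alpha_i = mu + (alpha_bar - mu) + (alpha_i - alpha_bar),

where alpha_bar - mu ~ Normal(0, sigma^2 / n), the vector of deviations (alpha_i - alpha_bar) is a zero-sum normal with scale sigma, and the two pieces are independent. So an ordinary normal hierarchical model is a reparameterization of a scalar sample-mean term plus a zero-sum normal, and the scalar term can be folded into (marginalized with) the intercept.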

482

:

So I'm usually too bored.

483

:

I usually don't, to be honest, but eh it's definitely possible.

484

:

If you add more layers, it gets more involved.

485

:

So less fun.

486

:

Yeah, it would be nice to have like an automatic way to just do this.

487

:

I mean, the zero sum, does, at least in my test, it does sample often almost like 95 % of

the time better than just like the regular hierarchical model.

488

:

So you're going to get better estimates.

489

:

And then when you do all the backing out, of course, like it translates to all of that.

490

:

Yeah, I would definitely really like to have a library that you can just tell here my

effects.

491

:

code that as a zero sum normal, but give me back a trace object that includes not just the

zero sum effects, but all the population effects as well.

492

:

So you have access to both and you can look at whatever you're actually interested in in

your application, but the sampler sees whatever's better for the sampler.

493

:

So that would be really nice to have, but I don't think that exists at the moment.

494

:

A project for nutpie.

495

:

And we started doing a bit of that, Adrian and I, don't know, a year ago or something like

that, you know, where we have a proof of concept, but then...

496

:

we never cleaned up, No, we never cleaned that up.

497

:

And also it's because like it gets, like if we want to put that into PyMC, it has to deal

with all the different models people can have.

498

:

And then if you have a lot of zero-sum normals in there, that's like really hard to track

everything down.

499

:

I think the stuff we had was already quite complete in the sense that it was like, main

effect for a main effect for B and interaction of A and B.

500

:

And then how do you recover, uh, the population means because you have series of normals

everywhere in this model.

501

:

But like people can make arbitrarily complicated models in PyMC.

502

:

So, uh, then it's, it's very hard to, to generalize.

503

:

Uh, but yeah, that would definitely be a very.

504

:

very valuable contribution.

505

:

I know Luciano Paz did a lot of work on that with a client at PyMC Labs.

506

:

But I don't know how generalizable that stuff is.

507

:

yeah, like that's something we should probably ask Luciano what he thinks about that.

508

:

Maybe we need three or four non-generalizable implementations.

509

:

And then at some point we can extrapolate the actual implementation that works.

510

:

at least most cases.

511

:

yeah.

512

:

exactly.

513

:

But for sure, it's the good thing.

514

:

mean, the silver lining though, is that I think if you like you can still get a very like

a decent prediction, at least in my experience, if you take the population parameters that

515

:

you learned at inference time, and then have a more informed prior for

516

:

your new members of the existing groups so you can still get even though mathematically

it's not the best that you could have because the best would be to do what we just talked

517

:

about that's a practical solution that's not too hard and that at least is accessible to

most people right now so yeah don't despair folks like even if we can do that right now

518

:

like you still can have decent predictions for new members of existing groups uh

519

:

without being completely doomed.
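A minimal sketch of the practical recipe Alex is describing, with made-up posterior draws: for a brand-new member of an existing group, draw its effect from the learned population distribution and push it through the likelihood. This is an approximation, as discussed above, not the full marginalization.

```python
import numpy as np

# Stand-ins for posterior draws from a model with an intercept and a
# zero-sum group effect with scale sigma_group (all values made up).
rng = np.random.default_rng(3)
intercept = rng.normal(1.2, 0.05, size=4000)
sigma_group = np.abs(rng.normal(0.5, 0.05, size=4000))

# New group member: draw its effect from Normal(0, sigma_group), draw by draw,
# then push it through the observation model.
new_group_effect = rng.normal(0.0, sigma_group)
y_new = rng.normal(intercept + new_group_effect, 0.5)
print(np.percentile(y_new, [5, 50, 95]))
```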

520

:

Yeah, so actually on that point, the transform, because I went through, another way that

possibly I think that you could do that is if you unconstrained, like if you do the

521

:

unconstraining transform of your estimated zero sum vector.

522

:

And so now you'll have k minus 1 elements, right?

523

:

And then you add a new one.

524

:

Well, like you just randomly draw a normal zero one or whatever it is that you're kind of

prior on the unconstrained scale would be.

525

:

ah And then you just go back.

526

:

oh So in the case, think for us, it would be you just add that whatever you drew from the

normal zero one divided by the square root of like k plus

527

:

one times k plus two.

528

:

And then the last element would be the minus of that.

529

:

But I think that's still missing something because we still need to kind of then we are

adding that variance in a sense.

530

:

So kind of we were saying the variance from the sample mean we are adding it back there.

531

:

But that variance is kind of absorbed in the intercept term now.

532

:

So yeah, this would actually need to take out the variance out of the intercept.

533

:

So we can't just add the variance, we also need to remove it at the other end. Yeah, but

there might be a way where you can use uh that unconstrained transform and then it would

534

:

be much simpler in my mind because it's just like, I just have this function, it would

apply all the stuff and then it would come out.

535

:

um Anyway, I'd have to think about it more.

536

:

Yeah, story of my models.

537

:

So we only have 15 minutes left, I think, before Sean has to leave.

538

:

So we can switch gears now.

539

:

Adrian, you can leave if you want, but I think you're going to like the next topic.

540

:

So do whatever you want.

541

:

Again, thank you so much for popping in and talking about that stuff.

542

:

That's like, yeah, it's always very interesting to me,

543

:

get back to, okay, where are we right now?

544

:

And now we're thinking on that topic.

545

:

um But next, what I'd like to talk about with you, Sean, is the flexible Cholesky

parameterization you've recently written about in a preprint.

546

:

And I think that's also what your StanCon presentation was about.

547

:

um So if you can tell us a bit more about that, what are the practical takeaways?

548

:

What are the strengths and weaknesses?

549

:

uh

550

:

I'm guessing that if Adrian sticks around, since he heard about Cholesky, he'll have

questions.

551

:

That definitely sounds interesting, now it gets out that I didn't actually listen to your

talk.

552

:

I think I was preparing something for my talk.

553

:

There is a YouTube video.

554

:

I actually felt so bad because I ended up missing, I think, like a half day of talks uh

just because I ended up working on

555

:

my presentation and on some other things for, for StanCon.

556

:

ah But I did hit, I did get your talk, so you should feel bad.

557

:

ah I think it's fine.

558

:

So actually, yeah.

559

:

So this thing started probably like four years ago, ah thinking about Cholesky

parameterizations and doing some stuff.

560

:

um And there was, there was someone on the forum who wanted, they wanted just all positive

561

:

correlation matrix.

562

:

And I was like, OK, we'll just take instead of like a hyperbolic tangent where you

transform the raw parameters to negative 1 to 1, just have it be like exponential, just

563

:

have it 0 to 1 or whatever.

564

:

And that works.

565

:

You do get all positive.

566

:

But it doesn't give you uniformity um over the space of positive.

567

:

correlation matrices.

568

:

And that's when this other guy, Enzo, who was also at StanCon, messaged me because

he saw that post and then he noticed this issue with it.

569

:

And I was like, hmm, all right.

570

:

So then I said, OK, that's interesting.

571

:

Let me think about it.

572

:

So I go off and I think about it for a little bit.

573

:

uh And I read a bunch of different papers.

574

:

And I'm like, OK, I'm looking.

575

:

I start to see that there

576

:

There's some commonalities, and I start to piece this thing together.

577

:

And that's where this preprint came about.

578

:

And you can actually see how there these bounds ah that have to be applied.

579

:

And if you apply them such that you do the Jacobian adjustment for the bound, that's just

the Jacobian.

580

:

That's all you need.

581

:

just have to sample from that bound, and that will preserve the correlation matrix.

582

:

uh

583

:

And it works really well for positive correlation matrices.

584

:

And when we compared it to rejection sampling, you can see that it does uniformly uh

sample over the space of positive correlation matrices.

585

:

uh And what is very interesting about that is in the Cholesky formulation, uh if you see

it's a lower triangular matrix,

586

:

And these things like each row ends up being somewhat like of a dot product of like the

previous elements of the other row in order to get the full matrix.

587

:

And the reason why the original approach, when I did that, I just constrained it to be

positive on the raw scale, that doesn't work.

588

:

It's because if you multiply two negative numbers together, you can still get a positive.

589

:

So there's a space of these

590

:

vectors, like the dot products of those vectors, such that they can actually be negative.

591

:

Some of the elements can be negative, but they have to be negative in a very special way,

such that when they multiply together, it ends up being positive.

592

:

And that was basically the density that we were missing from that other piece.

593

:

And so this came about.
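A tiny made-up example of the geometry Sean is describing: rows of the Cholesky factor of a correlation matrix are unit vectors, correlations are dot products of rows, and an entry of the factor can be negative while every correlation stays positive, which is why "just keep the raw entries positive" misses part of the space.

```python
import numpy as np

# Rows of a correlation-matrix Cholesky factor are unit vectors; the
# correlation between variables i and j is the dot product of rows i and j.
L = np.array([
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.5, -0.1, np.sqrt(1 - 0.5**2 - (-0.1)**2)],
])
R = L @ L.T
print(np.round(R, 3))        # all off-diagonal correlations > 0 despite L[2, 1] < 0
print(np.diag(R))            # ones on the diagonal
```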

594

:

then recently, I'm actually working with a few other people.

595

:

um I show how you can actually put block structure into your matrix with this way.

596

:

So not only does it allow bounds, but it allows you to put in specific known values or

specific constraints such that, this correlation has to be equal to this one and stuff

597

:

like that.

598

:

So now instead of having k choose 2 or n choose 2 free parameters for correlation matrix,

you'll have

599

:

n choose 2 minus whatever those restrictions are, which in this guy's case, it's actually

really nice because it like halves the number of free parameters.

600

:

he gets like...

601

:

correlation matrices, they have so many parameters so quickly.

602

:

It's...

603

:

Yeah, it's kind of a nightmare.

604

:

And yeah, and so we're working through that.

605

:

And then recently there was some other people I've been talking to and we're going to look

at some other ways of making...

606

:

these parameterizations hopefully better.

607

:

uh But that will be out soon.

608

:

So the block structure, mean, you can, like that paper, allows it.

609

:

That paper was super short.

610

:

It's like three pages long on arXiv.

611

:

So I think most people didn't understand what the hell I was talking about or what it was

supposed to be doing.

612

:

They thought, it's just like bounds on this.

613

:

But those bounds are really key.

614

:

And they're key to actually putting in like all this structure.

615

:

uh

616

:

There is an issue uh that I do want to mention, and I do mention it in the paper.

617

:

There's a section about an issue with it, which is that ah you need to...

618

:

So the bounds can sometimes just not work.

619

:

They can be unenforceable.

620

:

So that means you would get a divergence.

621

:

um So how would that come about?

622

:

So let's say you put a certain restriction.

623

:

So in the case where there's no restriction, it will always work.

624

:

But in the case where there are restrictions, um you might have some restrictions that if

you didn't take into account that restriction, like way up ahead at the very top of the

625

:

matrix, so like if you have a restriction way down uh and you're just sampling, and it

doesn't know about it, and then all of sudden it gets to it and says, there's not enough

626

:

space left to make this restriction happen.

627

:

um So in the case of like all negative correlation matrices,

628

:

that space is actually smaller than all positive.

629

:

And it's much harder to sample from that.

630

:

And so you can see, like, if you constrain everything to be negative, you'll get tons of

divergences, especially if you put, an LKJ that's, really um small, like, one or two.

631

:

um And so if the elements, if they get big enough, if you draw something such that it's,

like, a negative 0.9, there's, like, almost nothing left.

632

:

And so it's like, no, in order to...

633

:

have any space left, this next parameter has to be drawn such that it's positive.

634

:

But then we're saying that the bounds must be negative.

635

:

And so it says, OK, well, that's incompatible, and you'll get a divergence.

636

:

That reminds me a little bit of trying to sample from convex spaces, like a polygon, for

instance, where you always also run into issues like that.

637

:

for instance, there definitely exists a bijection from

638

:

R squared to a square, but actually writing down one, even if it's, especially if it's

not really a square, but something else, is actually surprisingly hard.

639

:

And I never managed one that doesn't involve an ODE somehow.

640

:

And I mean, it basically is the same because you're looking at like these intersections of

like, of this like conic section with the unit sphere.

641

:

And you're trying to make sure like, okay, like

642

:

I want to make sure that I'm within that.

643

:

But yeah, you have to either do some crazy look ahead.

644

:

And right now, I've derived a one look ahead.

645

:

But really, you would need the n look ahead.

646

:

You would need to look ahead to every single element.

647

:

Make sure that you have all the correct bounds.

648

:

But then it would update.

649

:

So you'd always be looking ahead, right?

650

:

Because once you draw a parameter, that updates what all the bounds can be.

651

:

But now that you've had restrictions, I don't know, it gets like super complicated.

652

:

Is that actually a special case of a convex subset of...

653

:

So I guess we can just say we have the vector space R to the n and take, okay, some convex

subset of that.

654

:

So we have some matrix A, and I guess there's the...

655

:

So we want all x such that A times x is greater than b.

656

:

I think, I know that that doesn't cover covariance matrices yet, because I think that we

need something quadratic, right?

657

:

Because we need positive definiteness, but it's still a convex set, at least I think.
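
As a quick sanity check of that claim (just an illustrative sketch, nothing from the episode's code), affine constraints like A x > b carve out a convex set, and the positive definite matrices form a convex set too: any convex combination of two correlation matrices stays positive definite.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_corr(n):
    """A random correlation matrix built from a Gram matrix (illustration only)."""
    A = rng.normal(size=(n, n))
    S = A @ A.T + n * np.eye(n)      # positive definite covariance
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)        # rescale to unit diagonal

C1, C2 = random_corr(4), random_corr(4)
for t in np.linspace(0.0, 1.0, 6):
    C = t * C1 + (1 - t) * C2        # convex combination
    assert np.all(np.linalg.eigvalsh(C) > 0)   # still positive definite
print("every convex combination stayed positive definite")
```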

658

:

uh Yes, I think so.

659

:

This keeps coming up.

660

:

That's come up within the last two or three weeks, I don't know, several times, in

completely different contexts.

661

:

Yeah, there's, I don't know, this guy who's talking to me now, he's talking about

sampling eigenvalues such that you have a prior on your sample eigenvalues.

662

:

So you can take your sample, and you can get eigenvalues.

663

:

And then you would actually put a prior such that your correlation or covariance matrix

would have very similar eigenvalues.

664

:

um

665

:

And he's sort of worked it out, but then there's still a case where these two,

the largest and smallest eigenvalues, are way different from what they should be.

666

:

So like there's some fundamental issue there.

667

:

Especially eigenvectors, for instance, are also an issue, right?

668

:

Because it's the same kind of issue as happens when you want to sample an angle: you

want to sample on some manifold that's not diffeomorphic

669

:

to our usual space, to the Euclidean space, because the topology is different.

670

:

And that also happens with eigenvectors, for instance.

671

:

There's, I mean, yeah, I think if you were doing Riemannian Hamiltonian

Monte Carlo, I'm sure that there are ways to efficiently sample a lot of these.

672

:

But yeah, if we want to use it in our implementation of HMC, where we

673

:

Yeah, I need this diffeomorphism between Euclidean space and the space of positive

definite matrices with ones on the diagonal.

674

:

I don't know.

675

:

Maybe.

676

:

The problem is sometimes that this diffeomorphism just doesn't exist. It's just

fundamentally...

677

:

I mean for covariance matrices or correlation matrices it does exist.

678

:

But for eigenvectors it doesn't.

679

:

Yeah, so then you have to figure out certain charts, like subspaces of it, and then say,

OK, I'm going to just have it be this chart of that space.

680

:

So it won't be the full space.

681

:

Yeah, I think orthogonal matrices have the same thing, because there's an indeterminacy with

the sign.

682

:

And so you typically have to just set it to positive, right?

683

:

Um, nice.

684

:

Super, super fascinating.

685

:

Thanks guys.

686

:

I drank in your comments.

687

:

Um, I think I understand 50 % of them though.

688

:

Um, which is already good.

689

:

Yeah, if I understood correctly, we don't have that constraint already

ready to use in PyMC, Adrian.

690

:

Do you have LKJ Cholesky?

691

:

No?

692

:

We have it.

693

:

Yeah, we do have LKJ uh Cholesky, but I think the one you set up that's like only

positive, I don't think we have that out of the box.

694

:

But you have a Cholesky factor that you can sample uniformly.

695

:

okay.

696

:

Actually, we, I'm not sure.

697

:

So we definitely have a Cholesky covariance thing.

698

:

I'm not actually sure if we ever implemented the correlation one.

699

:

No.

700

:

Yeah, I mean, I think... so you mean LKJCorr?

701

:

Yeah, we have that, but I think it's wrong for some reason.

702

:

We only tell people about the bad one, that kind of does rejection sampling if you run into

the border or something, but I think so.

703

:

Yeah.

704

:

So the covariance one is properly defined.

705

:

that.

706

:

Yes.

707

:

Yeah.

708

:

So yeah, we always use the LKJCholeskyCov decomposition.

709

:

And then you get that correlation matrix also with that, for sure.

710

:

We don't use LKJCorr because there's an issue in there.

711

:

So we should change that.

712

:

There's a guy who was doing a bunch of stuff in Stan.

713

:

And then I think he did some stuff in TensorFlow probability.

714

:

His name's Adam Haber.

715

:

And he implemented in TensorFlow Probability the construction of the Cholesky factor of

correlation matrices, and it is quite clever.

716

:

And I went through it actually the other week uh to figure out why this works.

717

:

And I figured out that it's actually a stereographic transform uh from the positive pole

of the sphere.

718

:

OK, so what he did, or what is done there, and it's super efficient.

719

:

We should put this in Stan.

720

:

You should put this in PyMC.

721

:

So when you sample the unit vectors, so the Cholesky factor of correlation matrices,

each row is a unit vector.

722

:

um And on the diagonal, the diagonal must be positive.

723

:

um So that's where this stereographic thing comes from.

724

:

So if you do a stereographic projection,

725

:

and you have it as 1.

726

:

that would be the diagonal value of, so let's take one of the vectors.

727

:

It would be a 1.

728

:

And then you do a unit vector of normals.

729

:

So basically, you just take standard normals, and then you normalize that vector together with

the 1 here.

730

:

Well, that will get you exactly what you need for this.
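
Here's a rough Python sketch of that construction as I understood it from the conversation (the function name is mine, and this is not Adam Haber's actual TFP code): for each row of the Cholesky factor you take standard normals, append a 1 in the diagonal slot, and normalize the whole row, so the diagonal is automatically positive and L times L transpose has ones on the diagonal. The density and Jacobian bookkeeping you would need for an actual LKJ prior is left out.

```python
import numpy as np

def corr_cholesky_from_normals(z_rows):
    """Build a Cholesky factor of a correlation matrix from raw normals.

    z_rows[i] holds i standard normals for row i (row 0 needs none).
    Each row is (z_i, 1) normalized, so the diagonal entry stays positive
    and every row has unit norm, hence L @ L.T has ones on the diagonal.
    """
    n = len(z_rows)
    L = np.zeros((n, n))
    for i, z in enumerate(z_rows):
        row = np.append(z, 1.0)              # put the 1 in the diagonal slot
        L[i, : i + 1] = row / np.linalg.norm(row)
    return L

rng = np.random.default_rng(0)
n = 4
z_rows = [rng.normal(size=i) for i in range(n)]
L = corr_cholesky_from_normals(z_rows)
C = L @ L.T
print(np.diag(C))             # all ones: a valid correlation matrix
print(np.linalg.eigvalsh(C))  # all positive
```

Fixing that last element at 1 is the "setting the last element to one" trick discussed a bit later in the conversation.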

731

:

And it works really well.

732

:

It's super fast and super efficient.

733

:

um

734

:

And yeah, if you work it all out, it comes out to be the stereographic transform from the

positive pole of the sphere, and I was like, oh man, this is so cool.

735

:

And yeah, without even having the derivatives or anything, if I just code it as

a function in Stan, without the C++, it's oftentimes more efficient than

736

:

the built in one.

737

:

And it's way better numerically

738

:

because it doesn't have this issue with the stick breaking process, where you get to a

very small number, right?

739

:

And you get problems around zero.

740

:

And so it doesn't have that.

741

:

And so it samples way better from an LKJ of 1.

742

:

That's cool.

743

:

Is that just in TensorFlow probability itself?

744

:

Is that in the package, or is that somewhere else?

745

:

I don't know.

746

:

um Anyway, I can put some.

747

:

I'm not sure where it is, but Adam was the one who told me about it and was like, this is

how I implemented it.

748

:

And then I like, yeah, I just...

749

:

That sounds cool.

750

:

I couldn't follow completely, but it sounds cool.

751

:

I mean, I can give you, I don't know if you can follow the Stan code, but I can give you

just the function that I use for it, and you can kind of see how it all works.

752

:

I could probably even have ChatGPT show the...

753

:

how it goes from stereographic transform to this and you can see the connection there.

754

:

That took quite a bit to figure out exactly what's going on.

755

:

But I was like, how can this uh work if you're setting the last element to one?

756

:

That blew my mind.

757

:

um the reason why is because there is a pinch.

758

:

So we were talking about all this geometry of these spaces.

759

:

And there's just something fundamentally you can't

760

:

sample from at this value; same with a unit vector, you need to push everything away from

zero, right?

761

:

ah Because there's a singularity there.

762

:

But if you set that singularity, then everything samples great.

763

:

Nice.

764

:

Yeah.

765

:

I mean, for sure, so if that's in TFP, definitely we should link to that

in the show notes.

766

:

or if you have any public link to share with us, I'll put that in the show notes.

767

:

yeah, in any case, the function you were talking about, sure, we're curious about that.

768

:

But I was like, if we can add that to PyMC and that helps, for sure,

that would be great.

769

:

um Well, for sure.

770

:

Damn.

771

:

Nice.

772

:

Thanks, Sean.

773

:

Super cool.

774

:

So I had so many more questions for you, but I know you've got to drop.

775

:

So like,

776

:

Actually, let's call it a show.

777

:

Before I ask you the last two questions, is there any other topic you wanted to talk about

or mention that we didn't get to yet?

778

:

I mean, maybe you can just have me on again.

779

:

We'll do more topics that way.

780

:

Yeah.

781

:

That's great.

782

:

Awesome.

783

:

Great.

784

:

All right.

785

:

So let's close that up.

786

:

And before that, of course, I'll ask you the

787

:

last two questions I ask every guest at the end of the show.

788

:

So first one, if you had unlimited time and resources, which problem would you try to

solve?

789

:

This is like, this is such a hard question.

790

:

And like always, I feel like the pithy answer is, you know, why are we here?

791

:

What's the Douglas Adams, you know, like what's the answer to the universe?

792

:

Like the fundamental question.

793

:

Because I feel like everything comes from that question, like

794

:

Once you kind of have an idea of what the meaning of life is, you can really

focus on fixing all these other problems.

795

:

uh So climate change, wars, all that stuff.

796

:

But if we had a really, really good answer, ah I think that would be something.

797

:

If we could understand it, if it wasn't completely incomprehensible, that's what I would

spend all my time trying to figure out.

798

:

What do you do if there is no meaning?

799

:

I don't know, global warming.

800

:

Poverty, war, just make it so people are not, you know, hurting.

801

:

Second question, if you could have dinner with any great scientific mind, dead or alive or

fictional, who would it be?

802

:

I feel like, yeah, this was also a hard question.

803

:

I was like, man, who is...

804

:

someone that would be just fun to be around because there are people who are super smart,

but I wouldn't necessarily want to have like dinner or a beer with them, right?

805

:

Like they're really cool.

806

:

So I came up with da Vinci, just because he has done so many interesting things.

807

:

I have no idea how he, you know, figured all this stuff out, how he was so driven to

come up with all of that and to just study so many different areas.

808

:

I would love to talk with him,

809

:

have some wine and, you know.

810

:

how's your Italian?

811

:

Oh, Italian's terrible.

812

:

He's got to learn English.

813

:

He could probably do that in like a few days.

814

:

Yeah, I mean, that sounds like a nice dinner for sure.

815

:

I would definitely do the dinner in Italy, so you know, like, you might want to work on

your Italian.

816

:

Awesome.

817

:

Well, great, Sean, that was absolutely awesome to have you here. I learned

818

:

so much. I'm going to listen to that episode quite a lot when editing, because there is a lot

of stuff I've missed, but I will make sure to ask you questions.

819

:

Adrian, thank you so much for taking the time, that was awesome.

820

:

Thank you.

821

:

You're a great co-host, you know, so if you want to change careers, please let me know.

822

:

Yeah, that was great to have you both on the show, you're both welcome anytime, and yeah,

thanks again, Sean and Adrian, for taking the time and being on this show.

823

:

Thank you, Alex.

824

:

See you guys.

825

:

This has been another episode of Learning Bayesian Statistics.

826

:

Be sure to rate, review, and follow the show on your favorite podcatcher, and visit

learnbayesstats.com for more resources about today's topics, as well as access to more

827

:

episodes to help you reach a true Bayesian state of mind.

828

:

That's learnbayesstats.com.

829

:

Our theme music is Good Bayesian by Baba Brinkman, featuring MC Lars and Mega Ran.

830

:

Check out his awesome work at bababrinkman.com.

831

:

I'm your host,

832

:

Alexandre Andorra.

833

:

You can follow me on Twitter at alex_andorra, like the country.

834

:

You can support the show and unlock exclusive benefits by visiting patreon.com slash

LearnBayesStats.

835

:

Thank you so much for listening and for your support.

836

:

You're truly a good Bayesian.

837

:

Change your predictions after taking information in and if you're thinking I'll be less

than amazing.

838

:

Let's adjust those expectations.

839

:

Let me show you how to

840

:

be a good Bayesian. Change calculations after taking fresh data in. Those predictions that your

brain is making, let's get them on a solid foundation.
