#91, Exploring European Football Analytics, with Max Göbel
Episode 91 • 20th September 2023 • Learning Bayesian Statistics • Alexandre Andorra


Shownotes

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

As you may know, I’m kind of a nerd. And I also love football: I've been a PSG fan since I was 5 years old, so I’ve lived it all with this club. And yet, I’ve never done a European-centered football analytics episode because, well, the US is much more advanced when it comes to sports analytics.

But today, I’m happy to say this day has come: a sports analytics episode where we can actually talk about European football. And that is thanks to Maximilian Göbel.

Max is a post-doctoral researcher in Economics and Finance at Bocconi University in Milan. Before that, he did his PhD in Economics at the Lisbon School of Economics and Management. 

Max is a very passionate football fan and played himself for almost 25 years in his local football club. Unfortunately, he had to give it up when starting his PhD — don’t worry, he still goes to the gym, or goes running and sometimes cycling.

Max is also a great cook, inspired by all kinds of Italian food, and an avid podcast listener — from financial news, to health and fitness content, and even a mysterious and entertaining Bayesian podcast…

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Raul Maldonado, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Trey Causey, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau and Luis Fonseca.

Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag ;)

Links from the show:

Max's paper using Bayesian inference:

Forecasting Arctic Sea Ice:

Some of Max’s coauthors:

Abstract

by Christoph Bamberg

We already covered baseball analytics in the U.S.A. with Jim Albert in episode 85 and looked back at the decades-long history of sports analytics there. What does it look like in Europe?

To talk about this we got Max Göbel on the show. Max is a post-doctoral researcher in Economics and Finance at Bocconi University in Milan and holds a PhD in Economics from the Lisbon School of Economics and Management.

What qualifies him to talk about the sports-side of sports analytics is his passion for football and decades of playing experience. 

So, can sports analytics in Europe compete with analytics in the U.S.A.? Unfortunately, not yet. Many sports clubs do not use models in their hiring decisions, leading to suboptimal choices based on players’ reputation alone, as Max explains.

He designed a factor model for the performance of single players, borrowing from his econometrics expertise (check it out on his webpage, link in the show notes). 

We talk about how to grow this model from a simple and straightforward Bernoulli model for the rate of scored goals to a multilevel model, incorporating other players. And of course, we discuss the benefits of using Bayesian statistics for this modelling problem.
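
As a rough illustration of that starting point (not Max's actual model), a Bernoulli model for scoring can be paired with a Beta prior, which gives a closed-form posterior. The player names, shot counts, and prior values below are hypothetical. Giving every player the same prior is a crude stand-in for what a multilevel model does more carefully, namely learning that shared prior from all players at once.

```python
from dataclasses import dataclass


@dataclass
class BetaPosterior:
    """Beta(alpha, beta) belief about a player's per-shot scoring probability."""
    alpha: float
    beta: float

    def update(self, goals: int, shots: int) -> "BetaPosterior":
        # Conjugate update: each goal adds 1 to alpha, each miss adds 1 to beta.
        return BetaPosterior(self.alpha + goals, self.beta + (shots - goals))

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


# Hypothetical prior: roughly "1 goal in 10 shots", worth about 10 shots of evidence.
prior = BetaPosterior(alpha=1.0, beta=9.0)

# Hypothetical season records: (goals, shots) per player.
players = {"striker_a": (12, 60), "striker_b": (2, 5)}

posteriors = {name: prior.update(g, s) for name, (g, s) in players.items()}
for name, post in posteriors.items():
    goals, shots = players[name]
    print(f"{name}: raw rate {goals / shots:.3f}, posterior mean {post.mean:.3f}")
```

The player with only five shots is pulled strongly toward the prior, while the player with sixty shots barely moves. That shrinkage is the behaviour a multilevel model formalizes.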

We also cover sports analytics more generally and why it may not yet be so widely used in European football clubs.

Besides his interest in football analytics, Max has worked, and still works, on topics in econometrics such as recession forecasting in the U.S.A., asset pricing, and applying econometric methods to climate change issues like climate forecasting and sea ice disappearance.

Transcript

This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.

[Alex Andorra]: Maximilian Göbel, welcome to Learning Bayesian Statistics.

[Max]: Thanks, Alex.

[Alex Andorra]: Oh, yeah. Thank you for taking the time. I'm really excited about this episode. I'm really having a variety of podcast episodes these days. Episode 89 is going to get out in a few days, and you'll see it's about sports also, but it's about the science of exercise and nutrition. And so today we're going to talk a lot about sports also, but more about football, or soccer as it's known in the US. So that's going to be a fun one. And I'm really happy to have you on the show because you are German. So, if I remember correctly, Germany is in Europe. And so you would be the first Europe-centered soccer analytics episode, which is cool. Yeah, it's one of the things I'm saying we should do more here in Europe. But before that, as usual, we'll start with your origin story. Max, how did you come to the world of econometrics and machine learning? Because it's actually what you're doing most of the time, if I understood correctly.

[Max]: Yeah, yeah, you're right, Alex. Well, if I say it's been quite a journey, it sounds dramatic, and that's not the case. But it took me quite a while, let's say. Yeah, that's maybe the better framing.

[Alex Andorra]: Yeah.

[Max]: I started out in my PhD, and basically the first year is, you know, just some coursework. But I went into the PhD without really having something that I wanted to work on in particular. So I took the first year to see which courses I liked and which not. And at my university, there was not really a lot to choose from. I mean, we had macroeconomics, microeconomics, and econometrics, the usual stuff. But yeah, really nothing resonated with me so much, I have to say. And then I thought I would do some macroeconomics. I think many people, or most PhD students, really want to do something in that field. So it was also me. But yeah, I never really got familiar with that stuff so much. I never really liked it. But in the second year, there was a course on computational economics, and I liked that quite a lot. And it was also, let's say, a tough schedule. I had to prepare a proposal within a week and I didn't have any idea about computational economics. But that really got me into looking into that stuff very deeply, or deeper, let's say.

[Alex Andorra]: Yeah.

[Max]: And so, yeah, basically what I was working on there was some clustering, some unsupervised learning basically, but it wasn't really fancy machine learning back then.

[Alex Andorra]: Heh.

[Max]: So what I did, the project, was related to clustering, to community structure in the S&P 500, basically. And I really thought, oh, this network analysis, this community structure detection, that's really cool, I want to work on that. And so I thought this would basically be the outline for the rest of my PhD. And how did I get into econometrics and machine learning then? Because what I was doing back then was not really machine learning.

[Alex Andorra]: Yeah.

[Max]: So how did I get there then? It wasn't until the third year, basically, when I luckily got invited to the University of Pennsylvania as a visiting student. I got invited by Francis Diebold, and I'll be forever grateful to him for inviting me there. He had a research group on econometrics, and at that time the topic was climate. And again, I thought, well, I don't care about the topic, actually. I just want to learn whatever comes to me. And so I took that opportunity. He introduced me to his research group, and they were working on climate forecasting, climate econometrics. And that's how I basically really got introduced to econometrics. Because before I went to the University of Pennsylvania, I thought, yeah, I basically know what's going on, I have this and this project, and that's cool. But when I arrived there, I really got to know what a PhD in economics is really about. And yeah, that was pretty insightful, I would say.

[Alex Andorra]: Yeah.

[Max]: And that's how I got introduced, basically, through this research group, through the projects that we were working on. And then there was one guy, he was Frank's RA, and he was working on machine learning in particular.

[Alex Andorra]: Mm-hmm.

[Max]: And basically, a couple of weeks in, he came to me and asked me, "Well, Max, do you want to get me that and that data? Then we can work on a project." That started off, well, a couple of years now of co-authorship with him, Philippe Goulet Coulombe, who is now a professor at UQAM, the University of Quebec at Montreal. He's working a lot on machine learning, and he basically introduced me to that sphere. And so in the end, it was the third year of my PhD when I got introduced to econometrics and machine learning. Quite late, I would say. Better late than never, maybe.

[Alex Andorra]: Yeah.

[Alex Andorra]: I mean, better late than never, right? So it's cool. And you seem to enjoy that, so that's super fun. And so today, what are you doing? Basically, how would you define the work you're doing nowadays and the topics you are particularly interested in?

[Max]: Yeah, well, that's a good question. Because every time I got asked that question, I always had a difficult time actually saying, because I was doing something here, something there.

[Alex Andorra]: Hehehe. Yeah.

[Max]: So in between, I also thought I would like to get back to macroeconomics, actually, but after spending a couple of months on something there that didn't really work out, I completely ditched it, at least for the meantime. So what I'm working on now is basically machine learning and macroeconomic forecasting, let's say. I have a project on recession forecasting in the United States, which is probably a hot topic currently. Everyone is awaiting it, but it doesn't really seem to occur, so you have to wait a couple of months more. And then the other stuff is basically related to climate, a lot of climate forecasting, especially about Arctic sea ice: how Arctic sea ice is projected to evolve, not only in the near future, but also in the, let's say, longer run, so when Arctic sea ice might potentially disappear.

[Alex Andorra]: Mm-hmm.

[Max]: There are a couple of projects on that, still related to that climate econometrics group. And then the other stuff is basically, yeah, machine learning. I got really interested in finance, asset pricing, what you can do in predicting stock returns, using machine learning tools there. That's super fascinating. And I have to say that I'm not a specialist in machine learning or so. I'm just super interested in and fascinated by the tools and the problems that come with them. They are powerful, but applying them to finance and economics also comes with some drawbacks. So you have to work around that, and that makes it super interesting.

[Alex Andorra]: Yeah, yeah, yeah. Yeah, for sure.

[Max]: Okay.

[Alex Andorra]: And I mean, it's probably by being really interested in a topic that you end up being a specialist in it. You don't really start by being a specialist and then become interested in the subject. The causality goes the other way around. So that's good.

[Max]: Thank you.

[Alex Andorra]: Trying a lot of things is how you end up finding what you're really passionate about. Yeah, awesome. And I'm curious, actually: in the research realm of economics, which tools, which machine learning tools, do you use to work on these models? A lot of open source packages, I'm hoping. Because I remember I knew the econometrics and economics field in Europe a bit a few years ago, and they were using Stata all over the place. So I'm curious if that changed and how that changed.

[Max]: Oh yeah, that's a funny question. Because Stata, yeah, I mean, some people love Stata. I'm actually at the complete other end of the distribution.

[Alex Andorra]: Haha.

[Max]: So I always try to avoid it as much as I can. I don't know, I never really liked it. So what I'm using is basically R and Python.

[Alex Andorra]: Okay.

[Max]: I also worked a bit in MATLAB. I like MATLAB a lot, actually.

[Alex Andorra]: Mm-hmm.

[Max]: But yeah, now I'm mostly working in R and Python, and it really depends. Sometimes I prefer R, sometimes I prefer Python. For machine learning, I'm mostly using Python. Well, let's say for machine learning I'm actually using R when it comes to random forests or gradient boosted trees or something like that, or just plain lasso or ridge.

[Alex Andorra]: Mm-hmm.

[Max]: When it comes to deep learning, then I'm using Python. So TensorFlow, and now I'm trying to switch to PyTorch, actually.

[Alex Andorra]: Mm-hmm.

[Max]: And yeah, so that's basically what I'm using. Yeah.

[Alex Andorra]: Yeah, interesting. And how do you choose the particular tool you're using for a particular project?

[Max]: Yeah, that's a good question. I think that's mostly an art rather than a science, I would say, and it's up to your preference. But not all tools work in every context, right? In economics, that's really the problem, especially in, I would say, macroeconomic forecasting, where you have time series of, let's say, up to 700 observations on a monthly basis for the United States, maybe. And then you have a feature set of, let's say, 100 features when you include lags and all that. You can pump it up maybe to 1,000 or something. But for machine learning or for deep learning, this is still a rather small data set, I would say. So that's ridiculous, actually.

[Alex Andorra]: Mm-hmm.

[Max]: But still, that's the challenge then, right? To tune them, to train them so that they don't overfit. And that's really the interesting part for me, I think. In other contexts, other tools might work much more conveniently, let's say, or are much easier to apply. So lasso, say, when you have a lot of features and you just don't know which features are important: I like lasso in that regard because it basically selects the features for you. Or you might say, well, you're in a finance, asset pricing context, where you have returns, with a lot of noise in there and a very, very low signal-to-noise ratio. You really don't know which features are important. So ridge is maybe the better option, because lasso would basically set almost everything to zero. So it really depends. You really have to make it dependent on the context that you're working in.

[Alex Andorra]: Hehehe.

[Max]: And that's also interesting, to see which models work well on which data sets and in which contexts. I'm still learning in that regard, and that's super interesting.
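
The lasso-versus-ridge contrast Max describes can be sketched in a few lines under one simplifying assumption: the features are orthonormal, so each coefficient can be penalized on its own. With the lasso objective ½‖y − Xb‖² + λ‖b‖₁, each least-squares coefficient is soft-thresholded. With the ridge objective ½‖y − Xb‖² + (λ/2)‖b‖², each coefficient is divided by 1 + λ. The coefficient values and the penalty below are hypothetical.

```python
def soft_threshold(value: float, penalty: float) -> float:
    """Lasso solution for one coefficient when features are orthonormal."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def ridge_shrink(value: float, penalty: float) -> float:
    """Ridge solution for one coefficient when features are orthonormal."""
    return value / (1.0 + penalty)


# Hypothetical least-squares coefficients: one clear signal, several weak ones.
ols = [2.0, 0.3, -0.2, 0.1, -0.05]
penalty = 0.5

lasso = [soft_threshold(b, penalty) for b in ols]
ridge = [ridge_shrink(b, penalty) for b in ols]

print("lasso:", lasso)
print("ridge:", [round(b, 3) for b in ridge])
```

Lasso zeroes out every coefficient smaller in magnitude than the penalty, which is the "sets almost everything to zero" behaviour in a low signal-to-noise setting. Ridge shrinks all coefficients but keeps them non-zero.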

[Alex Andorra]: Yeah, yeah, yeah. No, for sure. And I find it super interesting also to see open source tools being adopted more and more in your research, which, of course, I'm extremely biased, but I welcome. Mainly because I do think that open data and open source are a natural consequence, but also a cause, I would say, of more open science, which I definitely welcome and which I think should be way more the case. More and more you see papers with accompanying GitHub repositories, and even accompanying open source packages in Python or in R, which is definitely something new. And it's super cool that the research realm is catching up on that. Because less and less you see papers like, I remember, a few years ago, the first open science or open data papers, which were like, "Oh yeah, the data is available, by the way," at the end of the paper. And then you had to basically beg the corresponding author about three times a week for four months to get some of the data, and that was not really open, basically. So yeah, that's a really cool development that I really love, I have to say.

[Max]: No, absolutely. I think that's a very good point. For example, my co-authors and I are really pushing for that, to make the code available on the website as well, so that people can cross-check. And that's very good. I like that myself, too. When I read papers and want to replicate something, and the authors make the code available, you can basically check whether your own code is correct. That's super helpful. You learn a lot from that. Especially when, for example, using boosted trees or so. I mean, there's XGBoost, and it's super convenient to use. For sure, there's some tuning that you have to do yourself. But still, the package is there, basically, and it's super convenient to use. You don't have to code the whole forest yourself, basically.

[Alex Andorra]: Yeah.

[Max]: So yeah, for sure. That's amazing.

[Alex Andorra]: Yeah, yeah, yeah. No, clearly. Yeah, that's super nice and well done, picking up all those different tools and different languages. That's super cool.

[Max]: Mm-hmm.

[Alex Andorra]: And I don't know how it changed, but I do remember that a few years ago, doing open source development wasn't really incentivized for doctoral candidates or post-doctoral researchers, so maybe that changed, and that's for the better. But if it didn't, the fact that you're doing it is even more commendable, I would say, because that's a bit adjacent to your project. So yeah, well done on doing that and taking the time to do it. That's super cool, for sure. So now I'd like to talk a bit about, yeah, so you said you're doing econometrics, but can you define econometrics for us and tell us what it brings to economics, basically?

[Max]: Yeah, sure. So a lot of weight on me now, giving the textbook definition of econometrics.

[Alex Andorra]: Haha. Yeah, exactly.

[Max]: No, I mean, I'm probably butchering the whole definition now, but it's basically applying statistical tools to an economic context, and trying to use statistical tools to verify some economic theory or to understand some relationships between economic variables.

[Alex Andorra]: Mm-hmm.

[Max]: So yeah, I think that's basically it. It's kind of a fancier term for what it actually is: applying statistical tools to understand economic relationships. I mean, it's essential for empirical work. For sure there are economists who only work on theory, but for policy analysis, you need to analyze the data in the end. And basically, that's what I'm doing. I don't really do theory stuff; for me, it's all empirical. So definitely, it's very useful in the end, especially for policymaking at central banks and everywhere, and also for industry, be it the banking industry or just the real economy, for analyzing demand and all that.

[Alex Andorra]: Mm-hmm. Yeah.

[Alex Andorra]: So I'm curious how you got introduced to Bayesian methods, actually, and why they stuck with you, because from what I remember of the world of econometrics, Bayes was not used a lot in this field. So I'm actually curious why you are using it.

[Max]: Yeah. Well, I have to admit, so I already said that it was in the third year that I got introduced to econometrics. And that was this project, when Philippe, Frank's RA, basically came to me and asked me to gather some data on climate variables, because we wanted to run a vector autoregression of the Arctic.

[Alex Andorra]: Mm-hmm.

[Max]: What we basically did is we gathered time series data on certain climate variables, which we thought would proxy for the Arctic ecosystem, basically.

[Alex Andorra]: Mm-hmm.

[Max]: And then we wanted to use a vector autoregression to analyze certain amplification mechanisms, if there is a shock to CO2, for example, and also to be able to produce long-run projections, so when Arctic sea ice might potentially disappear in the future.

[Alex Andorra]: Yeah.

[Max]: And so the data is highly non-stationary.

[Alex Andorra]: Uh huh. Uh huh.

[Max]: And when you work with VARs, most economists really work with Bayesian methods there. And as I said, the data was highly non-stationary, so Bayesian statistics, or the Bayesian framework, gives you some leeway there, grants you some freedom. That was why Philippe told me, "Okay, look at Bayesian VARs, look at the Bayesian way." And that's how I actually got introduced to that. At the time, I really didn't have any exposure. There was a package in MATLAB for doing Bayesian inference with VARs, basically, and that was super helpful. That helped me a lot. That was really a great source of education. And the more I learned about it, the more it resonated with me, this concept of quantifying uncertainty. I think, especially in economics, it is quintessential to really get an idea of what the uncertainty is.

[Alex Andorra]: Yeah.

[Max]: A point estimate is always nice, but you want to have the uncertainty around it. And that's also what Frank Diebold always told us: you want to have a measure of uncertainty. And definitely, that's true. You get it from the Bayesian framework, and it's just so intuitive to think about it. I like that a lot. Unfortunately, I haven't worked on so many projects with Bayesian methods lately, or not as many as I would like to. But it has resonated with me ever since. Still, I wanted to learn more, and that's basically how I got into looking at PyMC, because I wanted to learn Python, and I thought, well, maybe an application with Bayesian methods, the Bayesian framework, would be cool to learn. And that's how I got into PyMC3, or PyMC, basically.
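
To make the Bayesian VAR idea a little more concrete, here is a deliberately tiny, single-variable sketch. It is not the model from Max's project. It assumes an AR(1) process x_t = φ·x_{t−1} + noise with a known noise standard deviation and a normal prior on φ centered at 1, mimicking the random-walk prior often used for non-stationary series. The series and all parameter values are hypothetical.

```python
import math


def ar1_posterior(series, prior_mean=1.0, prior_sd=0.2, noise_sd=1.0):
    """Posterior mean and sd of phi in x_t = phi * x_{t-1} + noise, noise sd known."""
    lagged, current = series[:-1], series[1:]
    sxx = sum(x * x for x in lagged)
    sxy = sum(x * y for x, y in zip(lagged, current))
    prior_precision = 1.0 / prior_sd**2
    data_precision = sxx / noise_sd**2
    post_precision = prior_precision + data_precision
    # Precision-weighted average of the prior mean and the data evidence.
    post_mean = (prior_precision * prior_mean + sxy / noise_sd**2) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)


# Hypothetical short, trending series.
series = [10.0, 9.6, 9.5, 9.1, 8.8, 8.6, 8.1]
mean, sd = ar1_posterior(series)
print(f"posterior for phi: mean {mean:.3f}, sd {sd:.3f}")
```

The posterior standard deviation is the "measure of uncertainty" that comes along with the point estimate. A very wide prior recovers the least-squares estimate, while a very tight prior pins φ at the prior mean.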

[Alex Andorra]: Yeah, yeah, yeah. Nice. That's interesting. So basically, it's the uncertainty quantification that was really important to you.

[Max]: Exactly. That was really the key point there.

[Alex Andorra]: Yeah. I mean, that does make sense, right? Because that's really one of the areas where Bayes does shine a lot. And especially for the Arctic sea ice project that you are talking about: it's not like it's a reproducible experiment. It's really hard in these cases to think within a frequentist framework of repeatable experiments. You cannot have multiple Earths on which you can do RCTs where you melt the ice caps or not, and where they melt naturally or due to human intervention. It just doesn't work in that case. So yeah, I'm not surprised that it would be a project where Bayes fits way more naturally.

[Max]: Yeah, no, that's for sure. I mean, for example, these climate models from the climate institutions, these are huge models. To train or run these models takes a lot of time. And they are really, really sophisticated. But they are basically deterministic models, and they give you a point estimate in the end. Our interest was really to see, well, we get a point estimate, but we also want to see, especially when you project the path of Arctic sea ice, the uncertainty around it. How likely is it that we see Arctic sea ice disappearing not at our point estimate in the 2060s or 70s, but beforehand? How large is the uncertainty? Maybe our model is really not good, and the uncertainty is so much all over the place that it's more or less useless. And in that project it was actually interesting to see that the credible region was basically spanning about 20 to 25 years. So that was very interesting, and it gave us a quantification of the uncertainty. Yeah, that was really, really interesting.
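
The step from a point estimate to a credible region for the disappearance date can be sketched with a toy simulation. Everything here is made up for illustration: the starting level, the trend, the noise, and the threshold are hypothetical numbers, and the draw for the yearly trend only stands in for a real posterior. Each simulated path gives one "first year below the threshold", and the spread of those years is the uncertainty around the date.

```python
import random


def first_ice_free_year(rng, start_year=2023, start_level=4.5, threshold=1.0, horizon=150):
    """Simulate one hypothetical path; return the first year below threshold, or None."""
    drift = rng.gauss(-0.08, 0.02)  # stand-in for a posterior draw of the yearly trend
    level = start_level
    for step in range(1, horizon + 1):
        level += drift + rng.gauss(0.0, 0.4)  # yearly shock
        if level < threshold:
            return start_year + step
    return None


def quantile(sorted_values, q):
    """Nearest-rank style quantile on a pre-sorted, non-empty list."""
    index = min(len(sorted_values) - 1, max(0, round(q * (len(sorted_values) - 1))))
    return sorted_values[index]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
years = [first_ice_free_year(rng) for _ in range(5000)]
crossed = sorted(y for y in years if y is not None)

low, mid, high = (quantile(crossed, q) for q in (0.05, 0.5, 0.95))
print(f"median {mid}, 90% interval [{low}, {high}]")
```

The interval width is what tells you whether the central date is informative or, as Max puts it, whether the uncertainty is so large that the projection is more or less useless.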

[Alex Andorra]: Yeah, yeah, yeah. Nice. I love that. And it's really interesting for me to talk with someone who recently got into the Bayesian framework, and to understand how you got into it and why. So I would have a lot of other questions on that, but I want to talk about football, or soccer, so let's switch to that, and then if we have time at the end of the episode, I'll come back with my nerdy educational questions. So basically, you have an area, or a hobby of yours, where you do apply, and actually need, Bayesian stats, and that's soccer analytics. First, I read your website a bit and I saw you have been passionate about football since you were a child, and you mentioned a bunch of European championships. Not the French one, though. I was absolutely outraged. What happened? Don't you get the French games in Germany?

Speaker:

[Max]: Oh yeah, well that's another issue. So when I was younger really, I mean it was only

Speaker:

[Max]: the Bundesliga and sometimes when you were lucky, sometimes you got the highlights

Speaker:

[Max]: of the French Premier League and the Serie A, but yeah you had to be really lucky,

Speaker:

[Max]: it was not always available and I wasn't that...

Speaker:

[Max]: Yeah, I didn't know the websites where you could watch it basically. So

Speaker:

[Alex Andorra]: Hehehehe

Speaker:

[Max]: that was another issue. But yeah, the French, well, the French league, I was never

Speaker:

[Max]: really a fan of. I'm sorry, Alex. But yeah, that's just even though one of my favorite

Speaker:

[Max]: players was Yoann Gourcuff. So Olympique

Speaker:

[Alex Andorra]: Oh,

Speaker:

[Max]: Lyon.

Speaker:

[Alex Andorra]: really?

Speaker:

[Max]: Yeah,

Speaker:

[Alex Andorra]: Oh,

Speaker:

[Max]: yeah, yeah. So

Speaker:

[Alex Andorra]: he went

Speaker:

[Max]: yeah.

Speaker:

[Alex Andorra]: to Milan. Yeah. Um,

Speaker:

[Alex Andorra]: yeah, no offense taken. I think the French league is pretty boring. Um, and,

Speaker:

[Alex Andorra]: uh, yeah, as long

Speaker:

[Max]: as

Speaker:

[Alex Andorra]: as,

Speaker:

[Max]: the Bundesliga.

Speaker:

[Alex Andorra]: I mean, yeah, um, as long as PSG is dominating like that, uh, I mean, that's

Speaker:

[Alex Andorra]: good for me because, um, I've been a PSG fan since I was like five years old, uh,

Speaker:

[Alex Andorra]: but yeah, like, uh, it's not a very interesting league. And the level is

Speaker:

[Alex Andorra]: kind of going down by the years. So hopefully we'll get some investors in other

Speaker:

[Alex Andorra]: clubs, which make for a good competition for Paris, but until now it's really

Speaker:

[Alex Andorra]: bad. And it's actually bad for Paris because the competition inside the country

Speaker:

[Alex Andorra]: is really bad. So then when they get on the European stage, they are not

Speaker:

[Alex Andorra]: really used to the intensity and having so much adversity in a way. So,

Speaker:

[Alex Andorra]: yeah, it's too easy for them, let's say. So basically, but I didn't get you

Speaker:

[Alex Andorra]: on the show to trash the French league. I want to talk about the soccer factor

Speaker:

[Alex Andorra]: model that you recently worked on. And I found it super interesting because

Speaker:

[Alex Andorra]: that's mainly, yeah, the main question I always have in soccer analytics.

Speaker:

[Alex Andorra]: The nerd in me is always very careful about the hot takes that you see the

Speaker:

[Alex Andorra]: commentators have about players where it's like, yeah, but what's the, how

Speaker:

[Alex Andorra]: do you separate a player's skills and ability from his

Speaker:

[Alex Andorra]: team's strength? And that to me is extremely important because mostly

Speaker:

[Alex Andorra]: in Europe, right now, most of the clubs mainly invest in players on gut

Speaker:

[Alex Andorra]: feeling, basically. And the thing is when you do that and you're not able

Speaker:

[Alex Andorra]: to separate inherent player abilities from team strength, then you get

Speaker:

[Alex Andorra]: kind of an aura effect from the beginning of your career that can follow

Speaker:

[Alex Andorra]: you, even though you're not that good of a player, but basically, like

Speaker:

[Alex Andorra]: this aura can follow you even though you are not making that much of a difference.

Speaker:

[Alex Andorra]: But it's just like, it's hard to contradict it because you don't really have

Speaker:

[Alex Andorra]: the method, the scientific way, of disproving basically what's going on.

Speaker:

[Alex Andorra]: That actually, well, it's not really your inherent abilities but mainly the

Speaker:

[Alex Andorra]: people you're surrounded with. And I think it's like absolutely important

Speaker:

[Alex Andorra]: to do that, and it should lead to a really revolutionized way of transferring

Speaker:

[Alex Andorra]: players and signing them and so on. So, that was basically the background

Speaker:

[Alex Andorra]: for people who are not interested in football. Even though, even if the field

Speaker:

[Alex Andorra]: doesn't interest you, I think the method and the goal of the model is actually

Speaker:

[Alex Andorra]: extremely important because you can also think about that in finance, for

Speaker:

[Alex Andorra]: instance, like I know a lot more work has been done in finance for that

Speaker:

[Alex Andorra]: because, I mean, the return or, basically, the incentives of the money are

Speaker:

[Alex Andorra]: much more important because you know if you make money or not. But I know

Speaker:

[Alex Andorra]: there is a lot of literature right on basically passive investment versus

Speaker:

[Alex Andorra]: active investment. And how do you actually prove that an active investment

Speaker:

[Alex Andorra]: is better than a passive one and that it's actually due to the skills of

Speaker:

[Alex Andorra]: the person who invested on the market instead of just random market fluctuation?

Speaker:

[Alex Andorra]: So you can see that in a lot of contexts where you can see that. Basically,

Speaker:

[Alex Andorra]: information is sparse, is hard to decipher, and so you need a model to make

Speaker:

[Alex Andorra]: sense of it. So you can see that, I would say, in football, in a lot of

Speaker:

[Alex Andorra]: sports, in finance, in medicine also, right, where it's like you can have a

Speaker:

[Alex Andorra]: lot of this celebrity effect basically. I think in a lot of contexts where

Speaker:

[Alex Andorra]: celebrity effect is important, it can be broken down by that scientific way

Speaker:

[Alex Andorra]: of estimating it. So also politics, of course, movies. I think it's basically

Speaker:

[Alex Andorra]: a theme that's running in a lot of fields where the celebrity effect is

Speaker:

[Alex Andorra]: extremely big. So yeah, that was a very long introduction.

Speaker:

[Max]: Yeah.

Speaker:

[Alex Andorra]: But to say that, I think it's very useful. So you can react to what I said

Speaker:

[Alex Andorra]: and also afterwards, if you can tell us what a factor model is. Because

Speaker:

[Alex Andorra]: your model is, you called it the soccer factor model, but then can

Speaker:

[Alex Andorra]: you tell us before that what a factor model is?

Speaker:

[Max]: Yeah. No, Alex, I mean, you laid it out perfectly. I couldn't have said it any more

Speaker:

[Max]: accurately, I would say, really on point, as far as I see it. So a factor model,

Speaker:

[Max]: what it actually is: a factor is basically, I would define it as

Speaker:

[Max]: some proxy for exposure to a certain, in finance, to a certain risk, basically.

Speaker:

[Alex Andorra]: Mm-hmm.

Speaker:

[Max]: Also, for example, when you look at economics or macroeconomics, it's

Speaker:

[Max]: often related to the context where you have a huge set of features and you reduce it to

Speaker:

[Max]: a couple of underlying factors or a single factor only. It's kind of a feature reduction,

Speaker:

[Max]: like a dimensionality reduction technique like PCA,

Speaker:

[Alex Andorra]: Mm-hmm.

Speaker:

[Max]: principal component analysis or the like. But in finance, it's really like a proxy for

Speaker:

[Max]: a certain risk exposure that basically the cross-section of stock returns or all stock

Speaker:

[Max]: returns are exposed to a certain systematic risk exposure.

Speaker:

[Alex Andorra]: Mm-hmm.

Speaker:

[Max]: All stock returns are basically exposed to it. This is basically a factor. And

Speaker:

[Alex Andorra]: Yep.

Speaker:

[Max]: in the literature, asset pricing has identified several of these, yeah, common

Speaker:

[Max]: risk exposures basically across the whole universe of stocks basically. But as you already

Speaker:

[Max]: said, you can use it also as quantifying the ability, for example, of a portfolio manager.

Speaker:

[Max]: So if he has some skill in the game, basically if he has really superior selection

Speaker:

[Max]: potential, rather than just following along these common risk exposures, basically.

Speaker:

[Alex Andorra]: Mm-hmm.

Speaker:

[Max]: And that's also what this Soccer Factor Model basically is inspired by, to identify

Speaker:

[Max]: certain features that all players are exposed to because of the differences in the

Speaker:

[Max]: teams. And then when you account for that, then you can basically extract the skill

Speaker:

[Max]: and the inherent ability of each player after you account for these systematic differences

Speaker:

[Max]: across teams, basically, that influence

Speaker:

[Alex Andorra]: Hmph.

Speaker:

[Max]: the ability or the observed performance of a player.
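
To make the finance analogy concrete, here is a minimal sketch of how a manager's skill (alpha) can be separated from exposure to a common risk factor. It is an illustration under assumptions, not the model from the episode: the single factor, the simulated returns, and the names TRUE_ALPHA, TRUE_BETA and fit_one_factor are all hypothetical.

```python
import random

random.seed(0)

TRUE_ALPHA = 0.002   # hypothetical skill: extra return per period
TRUE_BETA = 1.2      # hypothetical loading on the common factor

# Simulate one common risk factor and a manager's returns that load on it.
factor = [random.gauss(0.0, 0.04) for _ in range(5000)]
returns = [TRUE_ALPHA + TRUE_BETA * f + random.gauss(0.0, 0.01) for f in factor]


def fit_one_factor(x, y):
    """Ordinary least squares for y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    beta = cov_xy / var_x
    alpha = mean_y - beta * mean_x
    return alpha, beta


alpha_hat, beta_hat = fit_one_factor(factor, returns)
print(f"alpha = {alpha_hat:.4f}, beta = {beta_hat:.3f}")
```

The intercept is what remains of the average return once the factor exposure is removed, which is the same role the player's alpha plays once team differences are accounted for.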

Speaker:

[Alex Andorra]: Yeah, yeah, for sure. Yeah, for sure. Because like in the example of football,

Speaker:

[Alex Andorra]: like you'd say it's easier to be the number nine. So the, how do you say

Speaker:

[Alex Andorra]: in English that position, like the front, playing. Number nine is like the

Speaker:

[Alex Andorra]: guy who's supposed to score the goals. Like the English natives can then

Speaker:

[Alex Andorra]: tell me what the name is. In French that would be attaquant. It's easier

Speaker:

[Alex Andorra]: to be the number nine of PSG than the number nine of a very small team in

Speaker:

[Alex Andorra]: France, because the whole, the rest of the team is stronger. The manager is

Speaker:

[Alex Andorra]: supposed to be stronger and so on. So, yeah, you're like, yeah, but maybe

Speaker:

[Alex Andorra]: if you took the number nine of the small team and you put it in Paris,

Speaker:

[Alex Andorra]: maybe he would perform as well as the current number nine does. So how do

Speaker:

[Alex Andorra]: you make the difference? So that's what we're going to talk about. Before

Speaker:

[Alex Andorra]: that, I'm curious, from a structural standpoint, these kinds of factor models, how

Speaker:

[Alex Andorra]: do they work? How much time do you need to really start to decipher the

Speaker:

[Alex Andorra]: difference between inherent skills and exogenous, basically team, strength?

Speaker:

[Alex Andorra]: And that question is basically, how much data do you need from the past years

Speaker:

[Alex Andorra]: to start having an idea? Like, how data-hungry are those models?

Speaker:

[Max]: Yeah, so that's definitely a good question, a good point. So you have to create these,

Speaker:

[Max]: yeah, so in the model that I'm proposing, basically, I need

Speaker:

[Max]: a lead time into the season to really account for certain differences. So I need

Speaker:

[Max]: a couple of games already that

Speaker:

[Alex Andorra]: Mm-hmm.

Speaker:

[Max]: would need to be played to really account for differences in teams. Because before the

Speaker:

[Max]: first game, basically, everything, or based on the data that I had, everyone

Speaker:

[Alex Andorra]: Mm-hmm.

Speaker:

[Max]: would have been the same.

Speaker:

[Alex Andorra]: Mm-hmm.

Speaker:

[Max]: But it depends really on the data. If you have data that allows you to account for

Speaker:

[Max]: differences across teams, budget,

Speaker:

[Alex Andorra]: Mm-hmm.

Speaker:

[Max]: let's say, then you can just start right

Speaker:

[Alex Andorra]: Yeah.

Speaker:

[Max]: away. And for overall data, I would say like more data is always better. If you have

Speaker:

[Max]: only a few observations, I think the Bayesian framework is then tailor made for

Speaker:

[Max]: that as well. Like it's yeah, it grants you some leeway there. But I would say really,

Speaker:

[Max]: it's the more data you have, the better. But yeah.

Speaker:

[Alex Andorra]: But you could already, OK, so you could already start having that idea with

Speaker:

[Alex Andorra]: just a few games. Then you get the idea of the strength of the team. And then

Speaker:

[Alex Andorra]: you can start deciphering the strengths of the player. OK.

Speaker:

[Max]: Exactly, exactly.

Speaker:

[Alex Andorra]: Yeah.

Speaker:

[Max]: But so far I have always used a certain number of, let's say, burn-in games

Speaker:

[Alex Andorra]: Yeah,

Speaker:

[Max]: to

Speaker:

[Alex Andorra]: yeah,

Speaker:

[Max]: really account

Speaker:

[Alex Andorra]: yeah.

Speaker:

[Max]: for that.
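
The burn-in idea can be sketched as a small feature-engineering step. The points-per-game gap used below is a hypothetical stand-in for a team-difference factor (the episode does not spell out which factors the model uses), and BURN_IN and strength_gap_factor are made-up names for illustration.

```python
from collections import defaultdict

BURN_IN = 3  # hypothetical number of games needed before a factor is trusted


def strength_gap_factor(matches, burn_in=BURN_IN):
    """For each match, return home-minus-away points per game, using only
    games played before that match. None while either team is in burn-in."""
    points = defaultdict(int)
    played = defaultdict(int)
    factors = []
    for home, away, home_goals, away_goals in matches:
        if played[home] >= burn_in and played[away] >= burn_in:
            gap = points[home] / played[home] - points[away] / played[away]
            factors.append(gap)
        else:
            factors.append(None)
        # Update the running totals only after the factor is computed,
        # so the factor never sees the result of the match it describes.
        if home_goals > away_goals:
            points[home] += 3
        elif home_goals < away_goals:
            points[away] += 3
        else:
            points[home] += 1
            points[away] += 1
        played[home] += 1
        played[away] += 1
    return factors


# Made-up results: (home team, away team, home goals, away goals).
matches = [
    ("A", "B", 2, 0),
    ("B", "A", 1, 1),
    ("A", "B", 3, 1),
    ("B", "A", 0, 0),
]
gaps = strength_gap_factor(matches, burn_in=2)
print(gaps)
```

Before the first game every team looks identical on such a feature, which is why a few games have to be played before the factor carries any information.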

Speaker:

[Alex Andorra]: Yeah. And I mean, it's not that superficial, right? Because you can think like

Speaker:

[Alex Andorra]: right now it's August, it's the beginning of the leagues for the European

Speaker:

[Alex Andorra]: teams. August is a weird moment where the teams are still warming up basically.

Speaker:

[Alex Andorra]: Um, and they are not really, they are clearly not at peak performance. Usually

Speaker:

[Alex Andorra]: they try to peak around spring for the Northern hemisphere. So around March,

Speaker:

[Alex Andorra]: from February to May, basically, they are trying to get their peak. So they

Speaker:

[Alex Andorra]: are still warming up. They can still trade players until the end of August.

Speaker:

[Alex Andorra]: So you could really say that the games they are doing in August, even though

Speaker:

[Alex Andorra]: they are official games, they are still warming up games and don't really

Speaker:

[Alex Andorra]: mean a lot for a long-term performance perspective. So that's an interesting moment

Speaker:

[Alex Andorra]: to start warming up the model, I'd say. And so, but something I mean, and

Speaker:

[Alex Andorra]: maybe you have that for future iterations of the model where you could put

Speaker:

[Alex Andorra]: in the priors. Um, we're going to talk about the structure of the model, uh,

Speaker:

[Alex Andorra]: right away, right after that, but, uh, something I'm thinking about is that

Speaker:

[Alex Andorra]: you could put in the prior, the information that you have about the strengths

Speaker:

[Alex Andorra]: of the team in, in the way that, yeah, you have the budget, which is a good

Speaker:

[Alex Andorra]: proxy for potential future performance. But also, like, just past performance. If you

Speaker:

[Alex Andorra]: know that Paris has been the champion for nine years out of 10, well, you

Speaker:

[Alex Andorra]: have a really good prior about the strengths of the team. So you can

Speaker:

[Max]: Okay.

Speaker:

[Alex Andorra]: probably also add that into the model and in that way reduce the warming

Speaker:

[Alex Andorra]: up period of the model.
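
A toy illustration of that point, using a conjugate Beta prior on a team's win probability rather than the soccer factor model itself. All numbers are made up: the Beta(27, 11) prior stands in for a strong track record from past seasons, and beta_posterior_mean is a hypothetical helper.

```python
def beta_posterior_mean(prior_a, prior_b, wins, games):
    """Posterior mean of a win probability under a Beta(prior_a, prior_b)
    prior and a binomial likelihood (standard conjugate update)."""
    return (prior_a + wins) / (prior_a + prior_b + games)


# Three early-season games, two wins (made-up numbers).
wins, games = 2, 3

flat = beta_posterior_mean(1, 1, wins, games)        # uninformative prior
informed = beta_posterior_mean(27, 11, wins, games)  # hypothetical prior from past seasons

print(f"flat prior:     {flat:.3f}")
print(f"informed prior: {informed:.3f}")
```

With the flat prior, three games dominate the estimate; with the informed prior, the estimate starts close to the historical level and moves slowly, so fewer warm-up games are needed before it is usable.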

Speaker:

[Max]: Yeah, no, absolutely. Or how Paris against Lyon, let's say, has performed in

Speaker:

[Alex Andorra]: Yep.

Speaker:

[Max]: the past. So the direct comparison between those teams, basically, when they faced

Speaker:

[Max]: each other for past years. That would also feed in there. Yeah, so absolutely. There's

Speaker:

[Max]: a lot of potential. And my model is,

Speaker:

[Alex Andorra]: Mm-hmm. Yeah.

Speaker:

[Max]: when you're basically suggesting this stuff, my model just appears very rudimentary.

Speaker:

[Max]: But it could definitely be extended in that regard.

Speaker:

[Alex Andorra]: Yeah, I mean, that's the fun thing of modeling, right? It's like you have

Speaker:

[Alex Andorra]: to start somewhere that's good enough, and then you have a lot of ideas to

Speaker:

[Alex Andorra]: extend it. And it's a never-ending endeavor. Like, each model, if you want to

Speaker:

[Alex Andorra]: work on it your whole life, if you're interested enough, you

Speaker:

[Alex Andorra]: definitely can do that. I know my models that I often revisit are the ones

Speaker:

[Alex Andorra]: for predicting French presidential elections. When I compare the one I started with in 2017

Speaker:

[Alex Andorra]: to the one I had for 2022, it's just embarrassing.

Speaker:

[Alex Andorra]: But in a way, it's good that the work you're doing right now is the best

Speaker:

[Alex Andorra]: one you've ever done. And in a few years, when you look at the work you're

Speaker:

[Alex Andorra]: doing right now, it should be the worst you've ever done because that means

Speaker:

[Alex Andorra]: you've progressed a lot in the meantime. So I think it's a good mindset.

Speaker:

[Alex Andorra]: So how did you adapt that factor model for soccer? Like how, what does the model

Speaker:

[Alex Andorra]: structure look like basically for listeners to have an idea? And for those

Speaker:

[Alex Andorra]: watching on YouTube, you can share your screen actually. So if you want

Speaker:

[Alex Andorra]: to share anything at some point, feel free to do it. Otherwise, the audio format

Speaker:

[Alex Andorra]: is here for you because it's a podcast. So it's an audio first content.

Speaker:

[Max]: Perfect. Yeah. So yeah, maybe if I get it on the screen, I'll do that. But for now,

Speaker:

[Max]: maybe the structure, I think, is pretty simple. And as you laid it out already very,

Speaker:

[Max]: very accurately, it's basically trying to come up with some features, do some feature

Speaker:

[Max]: engineering that basically accounts for differences across teams. And well, when you

Speaker:

[Max]: look at, let's say, a certain player, let's say Cristiano Ronaldo, you

Speaker:

[Max]: really want to account for the difference between

Speaker:

[Max]: his team and the team that he's facing at that exact instant. And you want to create

Speaker:

[Max]: some features that can proxy for these differences across teams. And that's basically

Speaker:

[Max]: the heart of the model. And this is basically inspired by these asset pricing factors that

Speaker:

[Max]: try to account for differences across assets, across stocks, across firms, basically.

Speaker:

[Max]: And the modeling part itself is really nothing sophisticated. You can include kind

Speaker:

[Max]: of a hierarchical structure where you don't need to, but it can help, definitely.

Speaker:

[Max]: But it's really the feature engineering that is at the heart of it. And then PyMC comes

Speaker:

[Max]: in very conveniently and basically does the dirty work for you.

Speaker:

[Alex Andorra]: Mm-hmm. And so what's the, so then that's cool. If it's a simple structure,

Speaker:

[Alex Andorra]: yeah, can you talk about what was your likelihood

Speaker:

[Alex Andorra]: and then

Speaker:

[Alex Andorra]: what kind of distribution you put on the parameters and things like that?

Speaker:

[Alex Andorra]: I think it would be a fun thing to talk about for the listeners.

Speaker:

[Max]: Sure, sure. Then maybe I just get the workbook loaded.

Speaker:

[Alex Andorra]: Oh yeah.

Speaker:

[Max]: So maybe I can share my screen and couple

Speaker:

[Alex Andorra]: Yes,

Speaker:

[Max]: of...

Speaker:

[Alex Andorra]: you should be able to.

Speaker:

[Max]: Let me see.

Speaker:

[Max]: So in terms of a likelihood, basically, or what the model structure is, so I have to

Speaker:

[Max]: proxy, I need some observed measurement of a player's performance. Not

Speaker:

[Alex Andorra]: Yes.

Speaker:

[Max]: a skill, I mean, that is something that is underlying, that is latent, that we want

Speaker:

[Max]: to identify.

Speaker:

[Alex Andorra]: Mm-hmm.

Speaker:

[Max]: But we need some observed measure of player performance. What I used is scoring

Speaker:

[Max]: goals. Did players score a goal in a certain game or not? So basically, 0, 1, basically

Speaker:

[Max]: binomial distributed, and basically, the logistic regression it is. You want

Speaker:

[Alex Andorra]: Yeah.

Speaker:

[Max]: to identify the probability of a player scoring. And so now I have it. I guess I have

Speaker:

[Max]: it here.

Speaker:

[Alex Andorra]: you may have

Speaker:

[Max]: Um,

Speaker:

[Alex Andorra]: to authorize Google Chrome to share.

Speaker:

[Max]: exactly,

Speaker:

[Alex Andorra]: Oh

Speaker:

[Max]: exactly. That

Speaker:

[Alex Andorra]: yeah.

Speaker:

[Max]: unfortunately takes a bit of time. Um, Sorry, I guess I'll be

Speaker:

[Alex Andorra]: Yeah,

Speaker:

[Max]: here in a second.

Speaker:

[Alex Andorra]: it's all good. Yep. It's all good. You can do that and

Speaker:

[Max]: Okay.

Speaker:

[Alex Andorra]: come back. I don't know what's going to happen for the recording, but I already

Speaker:

[Alex Andorra]: did that. After all, it's no problem.

Speaker:

[Max]: Sorry, I didn't.

Speaker:

[Alex Andorra]: I mean, it's the first time I do it. So I didn't know it either.

Speaker:

[Max]: Ah, okay, here it is. Wait.

Speaker:

[Max]: Is it Joe? Ah.

Speaker:

[Alex Andorra]: So

Speaker:

[Max]: No.

Speaker:

[Alex Andorra]: I think you need to give permission.

Speaker:

[Max]: Yeah, exactly. That's

Speaker:

[Alex Andorra]: And open

Speaker:

[Max]: one.

Speaker:

[Alex Andorra]: your computer system settings and click privacy and security.

Speaker:

[Max]: Well, maybe.

Speaker:

[Alex Andorra]: Apparently, if you open your system settings, and then you go

Speaker:

[Max]: Yeah

Speaker:

[Alex Andorra]: to privacy and security, and you click screen recording, and allow your

Speaker:

[Alex Andorra]: browser to share your screen. I think you need to allow Google Chrome to

Speaker:

[Alex Andorra]: share your screen.

Speaker:

[Max]: mm-hmm yeah I was there but ah yeah okay now

Speaker:

[Alex Andorra]: I mean

Speaker:

[Max]: maybe

Speaker:

[Alex Andorra]: otherwise it's no chip.

Speaker:

[Max]: Okay.

Speaker:

[Max]: Okay. Sorry for that.

Speaker:

[Alex Andorra]: So let's see.

Speaker:

[Max]: No? That's what I wanna

Speaker:

[Alex Andorra]: Yeah, it

Speaker:

[Max]: do with

Speaker:

[Alex Andorra]: may

Speaker:

[Max]: that guess.

Speaker:

[Alex Andorra]: be

Speaker:

[Max]: Sorry.

Speaker:

[Alex Andorra]: because you have to get out to quit Google Chrome and then come back. Are

Speaker:

[Alex Andorra]: you on Mac?

Speaker:

[Max]: Yeah, yeah, exactly.

Speaker:

[Alex Andorra]: Yeah, so you probably need to close Google Chrome and then come back. But

Speaker:

[Alex Andorra]: you can do that. And then you come back to the same link I sent you.

Speaker:

[Max]: Okay.

Speaker:

[Alex Andorra]: And then it should work. Maybe I'll have to

Speaker:

[Max]: OK.

Speaker:

[Alex Andorra]: do another recording, but that's OK. I can edit that afterwards. It's easy.

Speaker:

[Max]: Okay, okay.

Speaker:

[Alex Andorra]: So I'll wait for you here. Yeah.

Speaker:

[Max]: Okay. I'm back Alex. Sorry.

Speaker:

[Max]: Sorry Alex, I cannot hear you currently.

Speaker:

[Alex Andorra]: Yes, that's normal. I was muted. So cool. I didn't even have to start a new

Speaker:

[Alex Andorra]: recording. You can just join the room again. Cool. First time it happened,

Speaker:

[Alex Andorra]: so I didn't know what would happen. So cool. Perfect. So does it work now?

Speaker:

[Max]: Let's

Speaker:

[Alex Andorra]: Let's

Speaker:

[Max]: see.

Speaker:

[Alex Andorra]: try.

Speaker:

[Alex Andorra]: No,

Speaker:

[Max]: No,

Speaker:

[Alex Andorra]: still not.

Speaker:

[Max]: no,

Speaker:

[Alex Andorra]: That's weird.

Speaker:

[Max]: no. I'll give it a last try and otherwise I just.

Speaker:

[Alex Andorra]: Yeah, otherwise it's okay, but...

Speaker:

[Max]: Yeah, Google

Speaker:

[Alex Andorra]: It's

Speaker:

[Max]: Chrome,

Speaker:

[Alex Andorra]: weird.

Speaker:

[Max]: it's there.

Speaker:

[Alex Andorra]: It should work.

Speaker:

[Max]: I allowed it. So I don't know, Google Chrome, it's fine. It can access.

Speaker:

[Max]: but

Speaker:

[Alex Andorra]: I'm checking that it could be on my end maybe, so...

Speaker:

[Max]: screen

Speaker:

[Alex Andorra]: Yeah, no,

Speaker:

[Max]: the

Speaker:

[Alex Andorra]: on

Speaker:

[Max]: window

Speaker:

[Alex Andorra]: my end also it's all good, so...

Speaker:

[Max]: And I'm sorry, no, unfortunately it doesn't

Speaker:

[Alex Andorra]: No, weird.

Speaker:

[Max]: work.

Speaker:

[Alex Andorra]: Anyways, that's OK. So well, then let's continue without the

Speaker:

[Alex Andorra]: screen sharing. You can just talk through it. It's no problem.

Speaker:

[Max]: Okay.

Speaker:

[Alex Andorra]: I've

Speaker:

[Max]: Yeah.

Speaker:

[Alex Andorra]: done it. We've done it for a lot of podcast episodes.

Speaker:

[Max]: OK. Yeah, so the structure basically is relatively simple. You need some idea of

Speaker:

[Max]: what the performance of the player is. And you have to have a proxy for that.

Speaker:

[Alex Andorra]: Mm-hmm.

Speaker:

[Max]: And well, you need this performance to be observed, obviously. And the proxy that

Speaker:

[Max]: I chose for a player's performance is whether he scores a goal or not, so 0 or 1

Speaker:

[Max]: in a certain game. Bernoulli distributed is our y, our target. And it's basically a logistic

Speaker:

[Max]: regression that we are running. Because what we want to identify is really the skill

Speaker:

[Max]: and the ability, the latent variable hidden in our observed performance measure, basically.

Speaker:

[Max]: And so the model is pretty simple. You need the prior. You have basically a bunch

Speaker:

[Max]: of coefficients. That is, you have the alpha, the skill, the ability that you're interested

Speaker:

[Max]: in. And then you have the loadings, the coefficients on all the factors that are in

Speaker:

[Max]: your model. So you basically have to impose priors for all the coefficients.

Speaker:

[Alex Andorra]: Mm-hmm.

Speaker:

[Max]: And then you have to define the likelihood, Bernoulli distributed. And yeah, that's basically

Speaker:

[Max]: the model. It's on the workbook. And people can go through it. There's also a redacted

Speaker:

[Max]: version, basically, where people, if they fancy, can try to work with their

Speaker:

[Max]: own priors and all that and try to do it themselves first and then check the unredacted

Speaker:

[Max]: version.
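
Written out, the structure described here can be summarized as below. The indices (player i, game g, factor k) and the Normal priors are assumptions made for illustration; the episode only states that priors are placed on the alpha and on the factor loadings, and that the likelihood is Bernoulli with a logistic link.

```latex
\begin{aligned}
y_{i,g} &\sim \mathrm{Bernoulli}(p_{i,g}), \\
\operatorname{logit}(p_{i,g}) &= \alpha_i + \sum_{k=1}^{K} \beta_k \, x_{k,i,g}, \\
\alpha_i &\sim \mathcal{N}(0, \sigma_\alpha^2), \qquad \beta_k \sim \mathcal{N}(0, \sigma_\beta^2).
\end{aligned}
```

Here y is the 0/1 goal indicator, the x are the team-difference factors, the beta are the loadings, and alpha is the latent skill that is left once the factors are accounted for.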

Speaker:

[Alex Andorra]: Oh, that's cool.

Speaker:

[Max]: So if they want to play with that a bit.

Speaker:

[Alex Andorra]: Nice.

Speaker:

[Max]: Yeah, that's basically it. So it's nothing really crazy. It's the four lines of code,

Speaker:

[Max]: the basic model, basically. And yeah, when you look at multiple players, so you can

Speaker:

[Max]: do that for a single player only, but you can also do that for sure for multiple

Speaker:

[Max]: players. The key thing is that, basically, each player

Speaker:

[Max]: should be exposed to these factors with the same loading, basically. So you can

Speaker:

[Max]: impose a hierarchical structure on the ability and skill of each player. You should

Speaker:

[Max]: definitely do that, but you can impose the hierarchical structure by player or also

Speaker:

[Max]: by season. So the ability of the player may evolve over seasons or across seasons basically.

Speaker:

[Alex Andorra]: Mm-hmm, mm-hmm, yeah.

Speaker:

[Max]: That's, I think, something worth looking into or worthwhile doing. And then basically

Speaker:

[Max]: you have the loadings on the factors and they should account for the team effort

Speaker:

[Max]: basically. You want to account for that and you want to get that out of the way so that

Speaker:

[Max]: you're basically in the end left with this latent factor, the alpha, the inherent

Speaker:

[Max]: skill and ability of the player.
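
A minimal generative sketch of that multi-player setup: each player's alpha is drawn from a common population distribution (the hierarchical part), while the factor loadings are shared by everyone. All numbers and names (MU_ALPHA, SIGMA_ALPHA, BETA) are hypothetical, and the factors are simulated rather than built from real match data.

```python
import math
import random

random.seed(1)


def inv_logit(z):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


N_PLAYERS, N_GAMES = 5, 200
MU_ALPHA, SIGMA_ALPHA = -1.5, 0.5   # hypothetical population of skills
BETA = [0.8, -0.4]                  # hypothetical loadings shared by all players

# Each player's skill is a draw from a common population distribution.
alphas = [random.gauss(MU_ALPHA, SIGMA_ALPHA) for _ in range(N_PLAYERS)]

records = []
for player, alpha in enumerate(alphas):
    for _ in range(N_GAMES):
        x = [random.gauss(0.0, 1.0) for _ in BETA]   # simulated team-difference factors
        p = inv_logit(alpha + sum(b * v for b, v in zip(BETA, x)))
        scored = 1 if random.random() < p else 0
        records.append((player, x, scored))

for player, alpha in enumerate(alphas):
    rate = sum(s for pl, _, s in records if pl == player) / N_GAMES
    print(f"player {player}: alpha = {alpha:+.2f}, observed scoring rate = {rate:.2f}")
```

Fitting the model runs this process in reverse: given the goals and the factors, it infers each alpha while the shared loadings absorb the team effect.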

Speaker:

[Alex Andorra]: Yeah, yeah, yeah. OK. Yeah, that makes sense. And I mean, for sure, I will

Speaker:

[Alex Andorra]: put all of these in your episode's show notes. And actually, I think I can share

Speaker:

[Alex Andorra]: my screen. I don't know why I didn't think about that before. And here

Speaker:

[Alex Andorra]: is the notebook, right? Am I on the right notebook?

Speaker:

[Max]: Exactly.

Speaker:

[Alex Andorra]: Yeah, perfect.

Speaker:

[Max]: Yeah, yeah,

Speaker:

[Alex Andorra]: So.

Speaker:

[Max]: yeah. So there are a couple of notebooks there. So there's this in the PyMCon folder,

Speaker:

[Max]: that's the one where there's the redacted version and the unredacted version and the

Speaker:

[Max]: version that we're currently looking at. That's the initial part with all its typos

Speaker:

[Max]: in there.

Speaker:

[Alex Andorra]: Ah ok, so it's not the right one. Then, I should look at

Speaker:

[Max]: It's

Speaker:

[Alex Andorra]: another

Speaker:

[Max]: fine

Speaker:

[Alex Andorra]: one.

Speaker:

[Max]: one, so it's perfect. The other one is just a bit smaller and more concise, I would

Speaker:

[Max]: say.

Speaker:

[Alex Andorra]: Ah, here. Unredacted. Perfect. Yeah, I have it here. So yeah, like for those

Speaker:

[Alex Andorra]: of you watching on YouTube, I'm sharing it right now. And so basically,

Speaker:

[Alex Andorra]: this is the part of the model where you're talking about the likelihood,

Speaker:

[Alex Andorra]: where it's goal scored or not scored. And then you have here the probability,

Speaker:

[Alex Andorra]: which is basically here, this alpha that you talked about, right? That

Speaker:

[Max]: Exactly.

Speaker:

[Alex Andorra]: is the inherent skill of the player which enters the probability. And you have

Speaker:

[Alex Andorra]: the Xs and the beta. So the Xs, are they the factors or the beta are the

Speaker:

[Alex Andorra]: factors?

Speaker:

[Max]: So the Xs are the factors. These are the differences across the teams or between

Speaker:

[Max]: the teams. And this is what you want to basically account for and to clean the observed

Speaker:

[Max]: performance measure from. Yeah.

Speaker:

[Alex Andorra]: Yeah, yeah. Oh, yeah, OK. Yeah, for sure. And then the beta is the slope, basically,

Speaker:

[Alex Andorra]: on the factors. Yeah, yeah,

Speaker:

[Max]: Exactly.

Speaker:

[Alex Andorra]: yeah. Yeah, yeah, it's a fun model. So of course, it's hard to make it just

Speaker:

[Alex Andorra]: this on the podcast. But I encourage you to go and watch that part on YouTube. I'm

Speaker:

[Alex Andorra]: sharing it right now. And also, you can just take a look at the notebook from

Speaker:

[Alex Andorra]: Max, which I put in the show notes, where you have all the details. So it's

Speaker:

[Alex Andorra]: pretty fun to look at. And also, as you were saying, the model is pretty small.

Speaker:

[Alex Andorra]: So that's the amazing thing that I find is that basically, and now if we

Speaker:

[Alex Andorra]: go look at the PyMC implementation, so a bit later

Speaker:

[Max]: Oh.

Speaker:

[Alex Andorra]: down in the model, the really cool thing is that basically the model is quite

Speaker:

[Alex Andorra]: easy to code, right? And in a way, that's just a few lines of code, so

Speaker:

[Alex Andorra]: basically four lines of code, as you were saying, and you're done. So that's

Speaker:

[Alex Andorra]: the beauty of the probabilistic programming framework, right? It's a really

Speaker:

[Alex Andorra]: useful model. But if you want to get to a first good enough version that

Speaker:

[Alex Andorra]: already gives you interesting insights, you don't have to reinvent everything.

Speaker:

[Alex Andorra]: And you don't have to go with the hardest version from the start,

Speaker:

[Alex Andorra]: where you have a hierarchical time series model where everything is varying

Speaker:

[Alex Andorra]: and pooling information. Sure, that's cool. But don't start with that. It's

Speaker:

[Alex Andorra]: like if you're starting to train, don't start with 100 push-ups. Start by like

Speaker:

[Alex Andorra]: try five first, and then do a few series of them, and build your way up to

Speaker:

[Alex Andorra]: 100. So that's the cool thing I find here with the Bayesian framework

Speaker:

[Alex Andorra]: coupled with the power of probabilistic programming languages, which is you can get

Speaker:

[Alex Andorra]: down to a first good enough version and then in a few lines of codes having

Speaker:

[Alex Andorra]: your version and then sampling from it. Because here you have it on the screen.

Speaker:

[Alex Andorra]: The likelihood that you have a line for deterministic, which is the. logistic

Speaker:

[Alex Andorra]: regression line, and then you have your intercept and your coefficient on

Speaker:

[Alex Andorra]: the factors. And basically that's it. That's really amazing.
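
For readers following along without the video, the four pieces named above (intercept, coefficients on the factors, the deterministic logistic regression line, and the likelihood) can be sketched as a log posterior in plain Python. This is only an illustration under assumptions, not the code from Max's notebook: the Normal(0, 1) priors, the function names, and the toy data are all hypothetical, and the sketch deliberately avoids PyMC so that it runs with the standard library alone.

```python
import math


def log_normal_pdf(x, mu=0.0, sigma=1.0):
    """Log density of a Normal(mu, sigma) distribution at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def log_sigmoid(x):
    """Numerically stable log of the logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def log_posterior(intercept, coefs, factors, outcomes):
    """Unnormalized log posterior of a Bayesian logistic regression.

    factors:  one row of factor values per match (hypothetical data)
    outcomes: 0/1 results, e.g. 1 if the team won (hypothetical data)
    """
    # 1) Prior on the intercept (assumed Normal(0, 1)).
    logp = log_normal_pdf(intercept)
    # 2) Priors on the coefficients of the factors (assumed Normal(0, 1)).
    logp += sum(log_normal_pdf(b) for b in coefs)
    for row, y in zip(factors, outcomes):
        # 3) Deterministic: the logistic regression line, p = sigmoid(eta).
        eta = intercept + sum(b * x for b, x in zip(coefs, row))
        # 4) Likelihood: Bernoulli(p), written on the log scale via eta.
        logp += log_sigmoid(eta) if y == 1 else log_sigmoid(-eta)
    return logp


# Toy data: two factors per match, four matches (made-up numbers).
factors = [[0.5, -1.0], [1.2, 0.3], [-0.7, 0.8], [0.1, 0.1]]
outcomes = [1, 1, 0, 0]
print(log_posterior(0.0, [1.0, 0.0], factors, outcomes))
```

In a probabilistic programming language such as PyMC, each numbered comment would typically correspond to one declarative line (for instance a `pm.Normal`, a `pm.Deterministic`, and a `pm.Bernoulli`), and the library takes care of the sampling.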

[Max]: Absolutely. I think that's the beauty of PyMC: it allows you to describe or build your model in a pretty intuitive way, and you can even have it printed out to see whether everything is as you would have expected. Then PyMC does the dirty work, the sampling and all that, for you. It already gives you an intuitive idea of how the modeling works. That's absolutely super cool.

[Alex Andorra]: Yeah, it's really fun. Well done on that. So I'm curious: do you want to keep working on this model? Do you have any ideas on where to take it from what it is right now?

[Max]: That's a good question, actually. The model can definitely be improved, and it all depends on the features and the data that you have. I think the clubs have so much more interesting data than I have, and they could build many more interesting factors accounting for differences across teams. I really don't know, because I tried to reach out to a couple of clubs, but nothing really came back. Perhaps they're not interested, or maybe they already have their own models. I'd be excited to work on that, but as you said, it's a side project that I did once upon a time, and it's not really related to economics or finance, which is why I'm currently working on other stuff. I would love to work on it, but it seems not many teams are picking up on that, at least among those I reached out to. And that seems to be specific to European clubs, because in some of your last episodes I heard people say that in the United States it's pretty different. Apparently a lot of clubs there are already trying to implement this to really understand the inherent latent skill of players, not necessarily in soccer, but in baseball or in other disciplines.

[Alex Andorra]: Yeah, especially baseball. So this is sad, but I'm kind of reassured to hear you say that, because I do think there is a huge area of improvement in Europe, and clubs just don't seem to be very interested. What I know is that a few English clubs are using data pretty heavily, like Liverpool, Manchester City, clubs like that, but that's still kind of the exception. I know Toulouse does now in France, which is a small club, and that makes sense: if you're a small club, you have less money, so you have much more competitive pressure to find good players that you are not overpaying for, which is basically where science can help you. You don't want to pay for just a name. You want to pay for someone who has a name because he's got talent, not just because he's got a name.

[Alex Andorra]: So to me, everybody should do that, and I just don't understand why they don't. That's also the beauty of sport, right? You don't care about the name; you care about what someone can do and whether they have talent or not. You should not care at all about the name, about the color of the skin, about anything else but what they can do on the field. If I had a club, that would be one of my first priorities: how do we make sure we optimize the way we sign players, because it costs a lot of money.

[Max]: I think one club that also does a lot of that data work is in Denmark: FC Midtjylland, I think, though I may have mangled the name. I heard once upon a time that they're investing a lot in data science and trying to sign players according to data, or at least to incorporate data a lot in their daily training exercises and all that. So they are maybe one of the cutting-edge clubs in Europe as well. It's a small club, but I think they won the Danish championship a couple of years ago.

[Alex Andorra]: Yeah, not surprised. Something I see a lot, at least in France, and I've seen it a lot in electoral forecasting too, is this idea that if you start doing that, you're becoming kind of inhuman and you're turning players into robots. That's really interesting to me, because one of the sports that uses data really heavily is cycling. A lot of the teams are using data now. Here again, thanks a lot to the British, who are often the first ones in Europe to take up the data wave. I know, for instance, Bradley Wiggins; I think he's won the Tour de France, I don't remember how many times, but a lot of times. The whole team was using data to optimize the team's performance. The British started being like, okay, we need to get back on our cycling game. They started using data extremely well, and they did get back. Thanks to that, a lot of the teams started to do the same, and the Tour de France is now extremely optimized on that.

[Alex Andorra]: But it's funny, because when you hear the media coverage of that, at least in France, it's a bad thing: players are becoming robots, they cannot eat what they want at the time they want, and it just takes the magic out of the Tour de France. I strongly disagree with that, because if the performances get better, in a clean way of course, then that's just better for everybody: the show is going to get better. Also, we're talking about the Tour de France and professional athletes. The goal is not to do that recreationally. They do that for a living, so it's important for their own income, but they also do it because they want to be the best. They are not doing it because they just want to cycle on the weekends, right?

[Alex Andorra]: So sure, if you're an amateur cyclist, you don't need the same structure as a professional cyclist. But even then, if you want to improve your performance as an amateur, you're going to need to optimize some things. If you really care about it, you're going to need to optimize your nutrition, for instance, and maybe when you take your meals. And if you're a professional, the slightest change can mean you perform one or two seconds better, which can make you win the Tour de France or not. So I don't understand this argument in contexts where you're trying to optimize performance. For me, it's not something that should count here. They are not doing that for pleasure only.

[Max]: I absolutely agree. It should be incorporated much more, especially by the clubs. In the end, I think it will pay off, as you laid it out. You don't want to pick a lemon.

[Alex Andorra]: Yeah. I have to say it's an interesting topic for me, because I'm trying to crack that nut and I cannot crack it for now: understanding why the clubs in Europe are not really interested in that. I don't really care about the journalists' side; once the clubs start picking that up, everybody will have to. What I'm trying to understand is why the clubs don't do it, because it's just leaving gains on the table. I'm super curious about why they would do that, from a sociological standpoint, honestly. I've seen a lot of clubs that have data science teams, but they use them for marketing. That's such a shame.

[Max]: I see, I see.

[Alex Andorra]: And I don't know why. So if anybody knows, please get in touch. If anybody is working in a club, please get in touch with Max or me, because I want to know about it. We don't even need to work together. I would be happy to help you out with a model, but for now I just want to know why, and what the internal factors are, because there is definitely something going on, but I don't know what it is, and I'm just curious about it.

[Alex Andorra]: So, to try and make it a bit more constructive: do you have any idea how we in the data world could change the status quo in that regard? And not only for sports; that's also true for a lot of domains where a more robust application of the scientific method would be useful but is hard to get done. Do you personally have any ideas on how that status quo could be changed?

[Max]: I think it's really hard to say. It depends on the willingness to adopt, to be open to these methods, I would say. The players play an important part, I think the crucial part, because if the players are not willing to adopt these additional insights, it's just not possible. But for sure, as you say, it's management, it's internal things that are going on there, politics potentially, but I really don't know. How can someone resolve that? I don't know. The way I always see it is that, for sure, you shouldn't base all your decisions on this model or on a single model, but it can help stimulate your decision process, and I think it's a useful addition. For sure there might be an upfront cost to get the data, to implement the model, to hire people to produce that, but in the end it may actually pay off economically, because it may save you from picking a lemon and overpaying massively. So I really see it as a worthwhile investment. I think US sports have demonstrated that.

[Alex Andorra]: Yeah, for sure. Just look at the US, and look at all the other fields, especially marketing, for instance, which has already started to adopt data analysis and modeling aggressively. We do that a lot at the Labs, basically helping them save a lot of money, and not only save money but make more money. So I don't think this is even a question.

[Alex Andorra]: Something you could do, if you're interested and have the time, and that could maybe work, is to make some predictions with your model. To get them per player, I would think you'd probably need some hierarchical structure to get better predictions. But once you get there, you could have something like a web page with the model's predictions per player, saying this player is overvalued and this player is undervalued, based on the results of the model, and then see what that gives you during the season. At the beginning of the season, you can say that this player is undervalued, he's going to perform better than what the market currently thinks. If people then see that it's true, that's a clear sign that these kinds of methods and models are working, and that could spark some interest.

[Alex Andorra]: Because it's definitely about demonstrating what a model is for. My hunch is that the decision makers in the clubs don't really know what data is about, and they don't even know what a model is and what it can give you. They don't care about the model, the priors, the parameters, stuff like that; they just care about the results of the model. So if you can demonstrate the results of the model and, even better, what the model can say about recruiting a given player or not, that would maybe have a better impact, or at least I would say it increases the probability that the impact these methods can have gets noticed.
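
The overvalued/undervalued page described above could be driven by something as small as the following sketch. It assumes, hypothetically, that a model has already produced posterior draws of some performance measure per player and that a market-implied expectation exists on the same scale; the player names, numbers, and the 0.8 threshold are invented for illustration and do not come from Max's model.

```python
def flag_players(posterior_draws, market_expectation, threshold=0.8):
    """Label players by how often the model's draws beat the market's expectation.

    posterior_draws:    dict of player -> list of posterior draws (hypothetical)
    market_expectation: dict of player -> market-implied value, same scale
    threshold:          share of draws needed to call a player undervalued
    """
    labels = {}
    for player, draws in posterior_draws.items():
        # Posterior probability that the player does better than the market expects.
        p_above = sum(d > market_expectation[player] for d in draws) / len(draws)
        if p_above >= threshold:
            label = "undervalued"
        elif p_above <= 1.0 - threshold:
            label = "overvalued"
        else:
            label = "fairly valued"
        labels[player] = (p_above, label)
    return labels


# Made-up draws for three made-up players.
draws = {
    "Player A": [0.60, 0.70, 0.65, 0.80, 0.75],
    "Player B": [0.30, 0.40, 0.35, 0.50, 0.45],
    "Player C": [0.40, 0.60, 0.50, 0.70, 0.30],
}
market = {"Player A": 0.5, "Player B": 0.6, "Player C": 0.5}

for player, (p_above, label) in flag_players(draws, market).items():
    print(f"{player}: P(better than market) = {p_above:.2f} -> {label}")
```

Publishing such labels at the start of a season and checking them at the end is one way to make the model's value concrete for decision makers, which is the point being made in the conversation.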

[Max]: Oh, absolutely. That's absolutely the case. For sure, it depends on getting the real-time data; that's an upfront cost that you would have to pay. But that's actually the intent, really: to run that model for multiple players, as is done in part of the workbook, for example, to lay it out and compare which players perform well or not. You see it, for example, with Cristiano Ronaldo when he won the Player of the Year award in 2008: he was basically in the middle of the pack in that season. There were other players actually outperforming him, for example Dimitar Berbatov in that very season. He was playing for Tottenham and, later that year, was signed by Manchester United. So you see that. And for sure, there's a lot of subjective judgment coming in when you observe the games and then see the model telling you something completely different. But this is stimulating, and it should potentially update your priors, your subjective priors.

[Alex Andorra]: Yeah. And it forces you to lay out your priors clearly, on paper, which is actually very important. So I would say definitely something like that. If you have the predictions for the largest possible number of players on a web page, and you're basically betting based on the model, saying this player is going to overperform or underperform relative to the market, that's an interesting thing.

[Alex Andorra]: And also, as you were saying, for the individual awards, where the name counts a lot. You can take someone like Messi, who is, sure, an incredible player. But given the number of times he's got the golden... how is it called in English? Ballon d'Or? Golden Ball, I don't know. You could argue that in some of the seasons where he did get the award, maybe there were other players who were actually outperforming him, but they don't have the name recognition, so they are not scrutinized as much. They don't have the confirmation bias going in their favor, where everybody is looking at Messi because they already know he's extremely good, so they just look to confirm the fact that he's incredible, which he is, but maybe not all the time, not so as to get so many awards. To me, that would be a really good way of demonstrating the utility of these methods: making it really concrete for the decision maker.

[Alex Andorra]: So before we close up the show, I'd like to get back a bit to your personal experience with Bayes. I'm curious: what was your main pain point on this project, the Soccer Factor Model, and just in general, when you're using the Bayesian workflow, what is your main pain point right now?

[Max]: In that project, I really have to admit that maybe I was lucky, but there wasn't really a huge pain point. I mean, it's not something publishable as a paper; it's basically sketching the idea behind the model and showing the outline of the model and what it can give you. It went pretty well, and I don't remember any really big problems. When I looked at the model evaluation, everything looked fine. For example, one way to evaluate how well the model works in this logistic regression is to look at the area under the curve, which is a popular metric, and it was in a reasonable ballpark. That was fine for me: it told me that the results are kind of reliable. So there was not much of a pain point. It was also nice for me to see that it's a simple model and it also works pretty simply. I was pleased to see that there were not many obstacles that I had to overcome.
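
As a reminder of what the area under the (ROC) curve measures, here is a minimal sketch that computes it directly from its pairwise definition: the probability that a randomly chosen positive case gets a higher predicted score than a randomly chosen negative one. The labels and scores below are made up, and this is not the evaluation code from the notebook.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up outcomes (1 = event happened) and predicted probabilities.
y_true = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.8]
print(roc_auc(y_true, scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so "a reasonable ballpark" means somewhere clearly above 0.5. The pairwise loop is quadratic in the number of cases, which is fine for illustration; library implementations use a sort-based approach.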

[Alex Andorra]: Nice. Yeah, that's good to hear. And in general, in the Bayesian workflow, can you identify something that is costly for you to learn right now, or that was costly to learn, and that you wish there had been an easier way to learn?

[Max]: I have to say that, for example, with all the different samplers that are out there, that's not my major field. I would like to learn much, much more about the inner workings of all these samplers. I coded one of the simpler ones myself, maybe once or so, but then I really resort to open-source packages for that. To really understand what's going on, looking deeper into that is definitely something I would like to do and would need to do. That's basically the math of it: I think the most fascinating part is how it really works and how it's then implemented in code. But the beauty of PyMC is that if you're really interested in the outcome and want a fast outcome, it's pretty intuitive.
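
To give a flavor of what "one of the simpler samplers" can look like, here is a minimal random-walk Metropolis sketch in plain Python. It is an assumption about the kind of sampler meant, not something named in the conversation, and it is far simpler than the gradient-based samplers that modern packages use by default; the target density, step size, and seed are arbitrary choices for illustration.

```python
import math
import random


def metropolis(log_density, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target.

    log_density: function returning the unnormalized log density at x
    step:        standard deviation of the Gaussian proposal
    """
    rng = random.Random(seed)
    x = start
    logp = log_density(x)
    samples = []
    for _ in range(n_samples):
        # Propose a move by adding Gaussian noise to the current position.
        proposal = x + rng.gauss(0.0, step)
        logp_new = log_density(proposal)
        # Accept with probability min(1, density ratio); otherwise stay put.
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
        samples.append(x)
    return samples


# Target: a standard normal, known only up to a constant.
chain = metropolis(lambda x: -0.5 * x * x, start=0.0, n_samples=20000, step=2.0, seed=1)
kept = chain[1000:]  # drop an initial warm-up stretch
mean = sum(kept) / len(kept)
var = sum((v - mean) ** 2 for v in kept) / len(kept)
print(f"mean ~ {mean:.2f}, variance ~ {var:.2f}")
```

The whole algorithm is the propose/accept loop; everything a package adds on top (tuning, gradients, diagnostics) is about making that loop efficient and trustworthy on harder models.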

[Alex Andorra]: Nice. OK, well, it's good to hear. I'm asking that from a developer perspective and also a teacher perspective; it's always interesting for me to get a peek into people's learning experience. Cool. So before we close up the show, is there a topic I didn't ask you about that you'd like to mention?

[Max]: Well, actually, my career hasn't progressed so much so far, so I think we covered everything there. It was pretty interesting, and you actually covered everything.

[Alex Andorra]: Awesome. Yeah, we did record for a long time, so that's no surprise. And I'm happy I got to ask you the main things I wanted to ask you, so that's super cool, and in a reasonable amount of time. I'm sure the listeners will appreciate it, because the last two episodes were the two longest of the whole podcast, so it's good to get back to a reasonable length for people, I guess. So before letting you go, I'm going to ask you the last two questions I ask every guest at the end of the show. Max, if you had unlimited time and resources, which problem would you try to solve?

[Max]: I think one of the most popular answers is climate change, and it's definitely probably the most present problem, especially here in Milan currently. You really feel it. But throughout the time I've been working on a bit of climate econometrics, let's say, forecasting Arctic sea ice, I saw what people are really doing in climate research, and there are fascinating, very intelligent people out there. So I think throwing money at me would be wasted in that regard. What I'd rather be interested in is bringing that into sports analytics: allowing teams to have access to data and creating that level playing field across players. It's an investment. People in investing, banking, and finance especially spend a lot of time crunching numbers, so why not do that in sports as well, if you have the data available? I'd be very interested in working on that, for sure.

[Alex Andorra]: Yeah, I love it. Me too, for sure. That's a good one. And if you could have dinner with any great scientific mind, dead, alive, or fictional, who would it be?

[Max]: That's a pretty tough question, I have to say. There are so many amazing people out there, and when you read papers, it's really incredible what people are doing, so there are many people I'd like to talk to. One for sure is Frank Diebold, the guy who invited me to the University of Pennsylvania, because that was a defining point in my PhD, absolutely. But if I could pick one, since you should expand your network, it would be Ben Bernanke. He was former president of the Federal Reserve, and he received the Nobel Prize in economics last year for his work on banks and financial crises. Well, people say there's no Nobel Prize in economics, so the Riksbank prize. He served his country, basically, and he was an assistant professor, so I'd like to hear how he managed all that. It would be super interesting to talk to him. He's a phenomenal scholar, and I like reading his papers. So I think that would be super cool.

[Alex Andorra]: Nice, yeah. Love it. Very nerdy answer. Awesome. Well, thanks a lot, Max. That was really interesting. You allowed me to rant about some of my pet peeves about data analytics and soccer. I hope people learned a bit more, and of course, if they are curious, as usual, I will put resources and a link to your website in the show notes for those who want to dig deeper. Thank you again, Max, for taking the time and being on this show.

[Max]: Thanks, Alex. It was a pleasure.
