Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
As you may know, I’m kind of a nerd. And I also love football: I've been a PSG fan since I was 5 years old, so I’ve lived it all with this club. And yet, I’ve never done a European-centered football analytics episode because, well, the US is much more advanced when it comes to sports analytics.
But today, I’m happy to say this day has come: a sports analytics episode where we can actually talk about European football. And that is thanks to Maximilian Göbel.
Max is a post-doctoral researcher in Economics and Finance at Bocconi University in Milan. Before that, he did his PhD in Economics at the Lisbon School of Economics and Management.
Max is a very passionate football fan and played himself for almost 25 years in his local football club. Unfortunately, he had to give it up when starting his PhD — don’t worry, he still goes to the gym, or goes running and sometimes cycling.
Max is also a great cook, inspired by all kinds of Italian food, and an avid podcast listener — from financial news, to health and fitness content, and even a mysterious and entertaining Bayesian podcast…
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Raul Maldonado, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Trey Causey, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau and Luis Fonseca.
Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag ;)
Links from the show:
Max's paper using Bayesian inference:
Forecasting Arctic Sea Ice:
Some of Max’s coauthors:
Abstract
We already covered baseball analytics in the U.S.A. with Jim Albert in episode 85 and looked back at the decade-long history of sports analytics there. What does it look like in Europe?
To talk about this we got Max Göbel on the show. Max is a post-doctoral researcher in Economics and Finance at Bocconi University in Milan and holds a PhD in Economics from the Lisbon School of Economics and Management.
What qualifies him to talk about the sports-side of sports analytics is his passion for football and decades of playing experience.
So, can sports analytics in Europe compete with analytics in the U.S.A.? Unfortunately, not yet. Many sports clubs do not use models in their hiring decisions, leading to suboptimal choices based on players’ reputation alone, as Max explains.
He designed a factor model for the performance of single players, borrowing from his econometrics expertise (check it out on his webpage, link in the show notes).
We talk about how to grow this model from a simple and straightforward Bernoulli model for the rate of scored goals to a multilevel model, incorporating other players. And of course, we discuss the benefits of using Bayesian statistics for this modelling problem.
We also cover sport analytics more generally and why it may not be so widely used in European football clubs yet.
Besides his interest in football analytics, Max has worked and still works on topics in econometrics such as recession forecasting in the U.S.A., asset pricing, and applying econometric methods to climate issues like climate forecasting and the disappearance of Arctic sea ice.
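As a rough illustration of the simple Bernoulli starting point mentioned in the abstract, here is a minimal sketch, not Max's actual model: each shot is treated as a Bernoulli trial, and a Beta prior on the scoring rate is updated in closed form. The player numbers, the flat Beta(1, 1) prior, and the function names are assumptions made up for this example.

```python
import random


def posterior_params(goals, shots, prior_a=1.0, prior_b=1.0):
    """Conjugate update: Beta(prior_a, prior_b) prior plus Bernoulli shot outcomes."""
    if not 0 <= goals <= shots:
        raise ValueError("goals must be between 0 and shots")
    return prior_a + goals, prior_b + (shots - goals)


def credible_interval(a, b, level=0.9, draws=20_000, seed=0):
    """Equal-tailed credible interval for a Beta(a, b) posterior via Monte Carlo."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lo_idx = int((1 - level) / 2 * draws)
    hi_idx = int((1 + level) / 2 * draws) - 1
    return samples[lo_idx], samples[hi_idx]


# Hypothetical player: 12 goals from 80 shots (invented numbers).
a, b = posterior_params(goals=12, shots=80)
mean = a / (a + b)
low, high = credible_interval(a, b)
print(f"posterior mean scoring rate: {mean:.3f}")
print(f"90% credible interval: ({low:.3f}, {high:.3f})")
```

A multilevel version would let each player's rate be drawn from a shared population distribution, which is where a probabilistic programming library becomes more convenient than closed-form updates.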
Transcript
This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.
[Alex Andorra]: Maximilian Göbel, welcome to Learning Bayesian Statistics.
Speaker:[Max]: Thanks Alex.
Speaker:[Alex Andorra]: Oh, yeah. Thank you for, for taking the time. I'm really excited about this
Speaker:[Alex Andorra]: episode. Um, I'm really having a variety of, uh, of, uh, podcast episodes
Speaker:[Alex Andorra]: these days. Um, going from, so episode 89 is going to get out in a
Speaker:[Alex Andorra]: few days. Uh, and, uh, you'll see it's about sports also, but it's about
Speaker:[Alex Andorra]: the science of, um, exercise and nutrition. And so
Speaker:[Alex Andorra]: today we're going to talk a lot about sports also, but more about football
Speaker:[Alex Andorra]: or soccer as it's known in the US. So that's going to be a fun one. And I'm
Speaker:[Alex Andorra]: really happy to have you on the show because you are German. So if I remember
Speaker:[Alex Andorra]: correctly, Germany is in Europe. And so you would be the first soccer analytics
Speaker:[Alex Andorra]: episode Europe centered, which is cool. Yeah, it's one of the things I'm saying
Speaker:[Alex Andorra]: we should do more here in Europe. But before that, as usual, we'll start with
Speaker:[Alex Andorra]: your origin story. Max, how did you come to the world of econometrics and
Speaker:[Alex Andorra]: machine learning? Because it's actually what you're doing most of the time,
Speaker:[Alex Andorra]: if I understood correctly.
Speaker:[Max]: Yeah, yeah, you're right, Alex. Well, actually, it's been well, if I say it's quite
Speaker:[Max]: a journey, it sounds dramatic. But that's, that's not the case. But it took me quite a
Speaker:[Max]: while, let's say. Yeah, that's maybe the better framing.
Speaker:[Alex Andorra]: Yeah.
Speaker:[Max]: I started out in my PhD, basically, the first year is, you know, there's just some
Speaker:[Max]: coursework. But I went into the PhD without really having something that I really wanted
Speaker:[Max]: to work on in particular. So I took the first year to see which courses I like, which
Speaker:[Max]: not. And at my university, there was not really a lot to choose from. I mean, we had
Speaker:[Max]: macroeconomics, microeconomics, and econometrics, the usual stuff. But yeah, really nothing resonated
Speaker:[Max]: with me so much, I have to say. And then I thought I would do some macro, macroeconomics.
Speaker:[Max]: I think many, many people, or most of the PhD students really want to do
Speaker:[Max]: something in that field. So it was also me. But yeah, I really never got familiar with
Speaker:[Max]: that stuff so much. I never really liked it. But in the second year, then there was
Speaker:[Max]: a course of computational economics. And I liked that quite a lot. And it was also,
Speaker:[Max]: let's say a tough schedule. I had to prepare a proposal within a week and I didn't
Speaker:[Max]: have any idea about computational economics. But that really got me into looking into that
Speaker:[Max]: stuff very deeply or deeper, let's say. And
Speaker:[Alex Andorra]: Yeah.
Speaker:[Max]: so, yeah, basically what I was working on there was some clustering, some unsupervised
Speaker:[Max]: learning basically, but it wasn't really a fancy machine learning back then. So what
Speaker:[Max]: I did
Speaker:[Alex Andorra]: Heh.
Speaker:[Max]: was like the project was related to clustering community structure in the S&P 500 basically,
Speaker:[Max]: that was the project. And... Yeah, but I really thought, oh, this network analysis,
Speaker:[Max]: this community structure detection, that's really cool. I want to work on that. And yeah,
Speaker:[Max]: so I thought this would be basically the outline for the rest of my PhD. And how
Speaker:[Max]: did I get into econometrics and machine learning then? Because it wasn't really related
Speaker:[Max]: to or not really machine learning, what I was doing back then. So
Speaker:[Alex Andorra]: Yeah.
Speaker:[Max]: how do I get there then? It wasn't until the third year, basically until I got luckily
Speaker:[Max]: invited to the University of Pennsylvania as a visiting student. And I got introduced,
Speaker:[Max]: I got invited by Francis Diebold and I'll be forever grateful to him for inviting
Speaker:[Max]: me there. And he had a research group on econometrics. And at that time, the topic
Speaker:[Max]: was about climate. And I, again, I thought, well, I'm, I don't care about the topic actually.
Speaker:[Max]: I just want to learn whatever, yeah, comes to me. And so, yeah, I took that opportunity.
Speaker:[Max]: He introduced me to his research group. And they were working on climate on climate
Speaker:[Max]: forecasting, climate econometrics. And that's how I got basically really introduced
Speaker:[Max]: into econometrics. Because before I went to the University of Pennsylvania, I thought
Speaker:[Max]: like, yeah, I basically know what's going on. And I have this and this project. And that's
Speaker:[Max]: cool. But when I really arrived there, I really got to know what PhD in economics
Speaker:[Max]: is really about. And yeah, that was pretty insightful, I would say.
Speaker:[Alex Andorra]: Yeah.
Speaker:[Max]: And that's how I got introduced, basically, through this research group, through projects
Speaker:[Max]: that we were working on. And then there was one guy, he was Frank's RA. And yeah, he
Speaker:[Max]: was working on machine learning, in particular. And
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: basically, a couple of weeks in, he came to me and asked me, well, Max, you want to
Speaker:[Max]: get me that and that data? And we can work on a project. That started off a long, well,
Speaker:[Max]: quite well, a couple of years now of co-authorship with him, with Philippe Goulet Coulombe,
Speaker:[Max]: who is now a professor at UQAM, the University of Quebec at Montreal. And he's
Speaker:[Max]: working a lot on machine learning. And he basically introduced me to that sphere.
Speaker:[Max]: And so in the end, it was the third year of my PhD that I got introduced into econometrics
Speaker:[Max]: and machine learning. And yeah, quite late, as I would say. Yeah. Better late than
Speaker:[Alex Andorra]: Yeah.
Speaker:[Max]: never maybe.
Speaker:[Alex Andorra]: I mean, better late than never. Right? So it's cool. And you seem to enjoy
Speaker:[Alex Andorra]: that. So that's super fun. And so today, what are we doing? Basically, how
Speaker:[Alex Andorra]: would you define the work you're doing nowadays and the topics you are particularly
Speaker:[Alex Andorra]: interested in?
Speaker:[Max]: Yeah, well, that's a good question. And because every time I got asked that question,
Speaker:[Max]: I always had a difficult time actually saying
Speaker:[Alex Andorra]: Hehehe
Speaker:[Max]: because I was doing something here, something there. So
Speaker:[Alex Andorra]: Yeah.
Speaker:[Max]: in between, I also thought I would like to get back to macroeconomics actually, but
Speaker:[Max]: after spending a couple of months on something there and it didn't really work out,
Speaker:[Max]: I completely ditched it at least for the meantime. So what I'm working on now is basically
Speaker:[Max]: machine learning and macroeconomic forecasting, let's say. I have a project on recession forecasting
Speaker:[Max]: in the United States, which is probably a hot topic currently. Everyone is awaiting
Speaker:[Max]: it, but it doesn't really seem to occur. So you have to wait a couple of months more.
Speaker:[Max]: And then the other stuff is basically related to climate, a lot of climate forecasting.
Speaker:[Max]: especially about Arctic sea ice, how Arctic sea ice is projected to evolve in the
Speaker:[Max]: future, not only in the near future, but also in the, let's say, longer run. So
Speaker:[Max]: when Arctic sea ice might potentially disappear, there are a couple of
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: projects on that are still related to that climate econometrics group. And then the
Speaker:[Max]: other stuff is basically, yeah, machine learning. And I got really interested in finance,
Speaker:[Max]: asset pricing, what you can do.
Speaker:[Max]: predicting stock returns, using machine learning tools there. That's super fascinating.
Speaker:[Max]: And yeah, just I mean, I have to say that I'm not a specialist in machine learning
Speaker:[Max]: or so. I'm just super interested and fascinated by the tools and the problems that
Speaker:[Max]: come with them. So yeah, there's a lot of, well, they are powerful, but. Applying
Speaker:[Max]: them to finance and economics also comes with some drawbacks. So yeah, you have to work
Speaker:[Max]: around that. And it makes it super interesting.
Speaker:[Alex Andorra]: Yeah, yeah, yeah. Yeah, for sure. And I mean, that's
Speaker:[Max]: Okay.
Speaker:[Alex Andorra]: probably by being really interested in a topic that you end up being a specialist
Speaker:[Alex Andorra]: of it. So it's like you don't really start being a specialist and then being
Speaker:[Alex Andorra]: interested in the subject. It's like the causality goes the other way around.
Speaker:[Alex Andorra]: So that's
Speaker:[Max]: Thank
Speaker:[Alex Andorra]: good.
Speaker:[Max]: you.
Speaker:[Alex Andorra]: Like trying a lot of things is how you end up finding. what you're really
Speaker:[Alex Andorra]: passionate about. Yeah, awesome. And I'm curious actually, in the research realm
Speaker:[Alex Andorra]: of economics, which tools do you use, machine learning tools, to work in
Speaker:[Alex Andorra]: these models? I'm guessing a lot of open source package, I'm hoping. Because
Speaker:[Alex Andorra]: I remember I was introduced a bit to, I mean, I knew a bit the econometrics
Speaker:[Alex Andorra]: economics field in Europe a few years ago and they were using Stata all
Speaker:[Alex Andorra]: over the place. So I'm curious if that changed and how that changed.
Speaker:[Max]: Oh yeah, that's a funny question. Because Stata, yeah, I mean some people love Stata.
Speaker:[Max]: I'm actually at the complete other end of the distribution. So
Speaker:[Alex Andorra]: haha
Speaker:[Max]: I always try to avoid it as much as I can. I don't know, I never really liked it.
Speaker:[Max]: So what I'm using is basically R and Python.
Speaker:[Alex Andorra]: Okay.
Speaker:[Max]: I also worked a bit on MATLAB. I like MATLAB actually a lot.
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: But yeah, now I'm mostly working in R and Python. And it really depends. Sometimes
Speaker:[Max]: I prefer R. Sometimes I prefer Python. For machine learning, I'm mostly using Python.
Speaker:[Max]: Well, let's say for machine learning, I'm actually using R, let's say, when it comes
Speaker:[Max]: to random forest or
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: gradient boosted trees or something like that or just plain Lasso or Ridge. When it comes
Speaker:[Max]: to deep learning, then I'm using Python. So TensorFlow, now I'm trying to switch to
Speaker:[Max]: PyTorch, actually.
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: And yeah, so those are basically the packages that I'm using. Yeah.
Speaker:[Alex Andorra]: Yeah, interesting. And how do you choose the tool, the particular tool you're
Speaker:[Alex Andorra]: using for a particular project?
Speaker:[Max]: Yeah, that's a good question. I think that's mostly an art rather than a science,
Speaker:[Max]: I would say. And it's up to your preference. But not all tools work in every context, right?
Speaker:[Max]: So in economics, it's really the problem, especially in, I would say, macroeconomic forecasting,
Speaker:[Max]: where you have time series of, let's say, you get up to 700 observations on a monthly
Speaker:[Max]: basis for the United States maybe. And then you have a feature set of, let's say,
Speaker:[Max]: 100 features when you include lags and all that. You can pump it up maybe to 1,000
Speaker:[Max]: or something. But for machine learning or for deep learning, this is still rather
Speaker:[Max]: a small data set, I would say. So that's ridiculous, actually.
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: But still, that's then the challenge, right? To tune them, to train them so that
Speaker:[Max]: they don't overfit. And that's really the interesting part for me, I think. And yeah.
Speaker:[Max]: In other contexts, other tools might work much more conveniently, let's say, or
Speaker:[Max]: are much easier to apply. So some lasso or so when you have a lot of features and you
Speaker:[Max]: just don't know which features are important, then you, yeah. I like lasso in that regard
Speaker:[Max]: because it selects basically the features for you. Or you might say, well, in a finance or
Speaker:[Max]: asset pricing context, we have returns, a lot of noise in there, signal-to-noise ratio
Speaker:[Max]: very, very low. You really don't know which features are important. So Ridge is maybe
Speaker:[Max]: the better option, because Lasso would basically set almost everything to zero. Yeah,
Speaker:[Max]: so it really depends. You really have to make it dependent on the context that you're
Speaker:[Max]: working in. And
Speaker:[Alex Andorra]: Hehehe
Speaker:[Max]: yeah, but that's also interesting to see which models prefer or work well on which
Speaker:[Max]: data sets and which contexts. And yeah, I'm still learning in that regard. And that's
Speaker:[Max]: super interesting.
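The Lasso versus Ridge point above can be made concrete with a small sketch. It assumes an orthonormal design (so each penalized coefficient has a closed form given its least-squares value) and the objectives written in the docstrings; the coefficient values and the penalty `lam` are invented for illustration and do not come from any of Max's projects.

```python
import math


def lasso_shrink(ols_coefs, lam):
    """Lasso under an orthonormal design, for 0.5*||y - Xb||^2 + lam*||b||_1.

    This is soft-thresholding: any coefficient with size below lam becomes exactly 0.
    """
    out = []
    for c in ols_coefs:
        size = max(abs(c) - lam, 0.0)
        out.append(0.0 if size == 0.0 else math.copysign(size, c))
    return out


def ridge_shrink(ols_coefs, lam):
    """Ridge under an orthonormal design, for 0.5*||y - Xb||^2 + 0.5*lam*||b||^2.

    Every coefficient is scaled towards 0 by the same factor, but none is dropped.
    """
    return [c / (1.0 + lam) for c in ols_coefs]


# Hypothetical low signal-to-noise setting: many small effects, none dominant.
ols = [0.08, -0.05, 0.03, 0.06, -0.04]
lam = 0.1

lasso = lasso_shrink(ols, lam)
ridge = ridge_shrink(ols, lam)
print("lasso:", lasso)
print("ridge:", [round(r, 4) for r in ridge])
```

With these toy numbers every effect is smaller than the penalty, so the Lasso removes all of them while Ridge keeps a shrunken version of each, which is the trade-off described in the conversation.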
Speaker:[Alex Andorra]: Yeah, yeah, yeah. No, for sure. And I find that super interesting also to see
Speaker:[Alex Andorra]: this ability of open source tools to basically be adopted more and more
Speaker:[Alex Andorra]: in your research, which of course, I'm extremely biased, but I welcome. But also
Speaker:[Alex Andorra]: mainly because I do think that open data and open source are natural consequence,
Speaker:[Alex Andorra]: but also cause, I would say, of... more open science, which I definitely
Speaker:[Alex Andorra]: welcome and I think should be way more of the case, you know, like more and
Speaker:[Alex Andorra]: more you see papers with accompanying GitHub repositories and accompanying GitHub
Speaker:[Alex Andorra]: open source packages even in Python or in R, which is definitely something
Speaker:[Alex Andorra]: new. And that's super cool that the research realm is catching up on that.
Speaker:[Alex Andorra]: Um, because less and less you see papers where I remember a few years ago,
Speaker:[Alex Andorra]: you know, like the first open say the open science and, or open data papers
Speaker:[Alex Andorra]: was like, Oh yeah, the data is available by the way. Um, at the end of
Speaker:[Alex Andorra]: the paper, you know, and then you had to basically beg the, the corresponding
Speaker:[Alex Andorra]: author about like three times a week for four months to get some of the data
Speaker:[Alex Andorra]: and that was not really open basically, um, so yeah, that, that's a really
Speaker:[Alex Andorra]: cool. development that I really love. I have to say.
Speaker:[Max]: No, absolutely. And this is also, I think that's a very good point. For example, me and
Speaker:[Max]: my co-authors, or my co-authors are pushing for that, really, to make the codes then also
Speaker:[Max]: available on the website, for example, so that people can cross-check. And that's
Speaker:[Max]: very good. And yeah, I like that also myself. When I read papers and I want to replicate
Speaker:[Max]: something and the authors are making the code available, basically, you can check
Speaker:[Max]: if your own code is correct. That's super helpful. You learn a lot by that. And yeah,
Speaker:[Max]: really, really. Especially when, for example, using boosted trees or so. I mean, it's
Speaker:[Max]: XGBoost, and it's super convenient to use. And for sure, there's some tuning that
Speaker:[Max]: you have to do yourself. But still, the package is there, basically. And it's super
Speaker:[Max]: convenient to use. You don't have to code the whole forest, basically, yourself.
Speaker:[Alex Andorra]: Yeah.
Speaker:[Max]: So yeah, for sure. That's
Speaker:[Alex Andorra]: Yeah, yeah,
Speaker:[Max]: amazing.
Speaker:[Alex Andorra]: yeah. No, clearly. Yeah, that's super nice and well done and like picking up
Speaker:[Alex Andorra]: all those different tools and different
Speaker:[Max]: Mm-hmm.
Speaker:[Alex Andorra]: languages. That's super cool. And I don't know how it changed, but I do remember
Speaker:[Alex Andorra]: that a few years ago, doing open source development wasn't really incentivized
Speaker:[Alex Andorra]: for doctoral candidates or post-doctoral candidates, so maybe that changed and that's
Speaker:[Alex Andorra]: for the better. But if that didn't, the fact that you're doing it is like
Speaker:[Alex Andorra]: even more commendable, I would say, because that's a bit adjacent to your
Speaker:[Alex Andorra]: project. So yeah, well done on doing that and taking the time to do it.
Speaker:[Alex Andorra]: That's very cool for sure. Um, so now I'd like to talk a bit about,
Speaker:[Alex Andorra]: yeah. So you said you're doing econometrics, but, um, can you define econometrics
Speaker:[Alex Andorra]: for us and, and tell us what it brings to economics basically.
Speaker:[Max]: Yeah, sure. So a lot of weight now for me on
Speaker:[Alex Andorra]: haha
Speaker:[Max]: giving the textbook definition of econometrics.
Speaker:[Alex Andorra]: Yeah, exactly.
Speaker:[Max]: No, I mean, it's basically, or now I'm butchering the whole definition probably. But
Speaker:[Max]: it's applying statistical tools to an economic context and trying
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: to use statistical tools to basically verify some economic theory or to understand
Speaker:[Max]: some relationships between economic variables. So I think it's a, yeah, I think that that's
Speaker:[Max]: basically it. It's kind of a fancier term for what it actually is, applying statistical
Speaker:[Max]: tools for understanding economic relationships. That's basically it. I mean, it's essential.
Speaker:[Max]: I mean, for empirical work, for sure there are economists who only work on theory,
Speaker:[Max]: but yeah, for policy analysis or for... you need to analyze the data in the end. And
Speaker:[Max]: basically, that's what I'm doing. I don't really do theory stuff, but for me, it's just
Speaker:[Max]: all empirical. And yeah, so definitely, it's very useful in the end, especially for
Speaker:[Max]: policymaking at central banks and everywhere, also for the industry, be it
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: banking industry or be it just normal in the real economy for
Speaker:[Alex Andorra]: Yeah.
Speaker:[Max]: analyzing demand and all that.
Speaker:[Alex Andorra]: And do you... So I'm curious how you got introduced to Bayesian methods,
Speaker:[Alex Andorra]: actually, and why they stuck with you, because from what I remember, from
Speaker:[Alex Andorra]: the world of econometrics, Bayes was not used a lot in this field. So I'm actually
Speaker:[Alex Andorra]: curious why you are using it.
Speaker:[Max]: Yeah. Well, I have to admit, like, so I already said that it was like third year
Speaker:[Max]: that I got introduced to econometrics. And that was this project when
Speaker:[Max]: Philippe, Frank's RA basically
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: came to me and asked me to gather some data on climate variables because we want to
Speaker:[Max]: run a vector autoregression of the Arctic. Basically, you basically get some, what we
Speaker:[Max]: basically did is we gathered data. and which time series on certain climate variables,
Speaker:[Max]: which we
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: thought would proxy for the Arctic ecosystem basically. And then we wanted to use a vector
Speaker:[Max]: autoregression to analyze certain amplification mechanisms, if there is a shock to CO2, for
Speaker:[Max]: example, and also to be able to produce long-run forecasting projections. So when Arctic
Speaker:[Max]: sea ice might potentially
Speaker:[Max]: disappear in the future.
Speaker:[Alex Andorra]: Yeah.
Speaker:[Max]: And so the data is highly non-stationary. And
Speaker:[Alex Andorra]: Uh huh. Uh huh.
Speaker:[Max]: in VARs, when you work with VARs, most economists really work with Bayesian methods
Speaker:[Max]: there. And as I said, data was highly non-stationary. So Bayesian statistics or the Bayesian
Speaker:[Max]: framework gives you some leeway there, grants you some freedom there. So that was big.
Speaker:[Max]: Yeah, that was why Philippe told me, okay, look at Bayesian VARs, look at the Bayesian
Speaker:[Max]: way. And that's how I actually got introduced to that. And there was at the time, I really
Speaker:[Max]: didn't have any exposure. So there was a package in MATLAB for doing Bayesian inference,
Speaker:[Max]: basically, with VARs. And that was super helpful. That helped me a lot. That was super,
Speaker:[Max]: or a great education, a source of education, really, that was great. And The more I learned
Speaker:[Max]: about it, the more it resonated with me, this concept of quantifying uncertainty.
Speaker:[Max]: I think this is because especially in economics, this is quintessential to really
Speaker:[Max]: get an idea of
Speaker:[Alex Andorra]: Yeah.
Speaker:[Max]: what the uncertainty is. Point estimate is always nice, but you want to have the uncertainty
Speaker:[Max]: around it. And that's also what Frank Diebold always told us. Yeah, you want to have
Speaker:[Max]: a measure of uncertainty. And definitely, that's true. Yeah, you get it from the in the
Speaker:[Max]: Bayesian framework. It's just so intuitive to think about it. And yeah, I like that a
Speaker:[Max]: lot. And unfortunately, I don't really work so much or haven't worked in so many projects
Speaker:[Max]: with Bayesian methods lately, or not as much as I would like to. But yeah, it's
Speaker:[Max]: ever since resonated with me. And yeah. I. Still, I wanted to learn more, and that's
Speaker:[Max]: how basically I got into looking at PyMC, because I wanted to learn with Python, and
Speaker:[Max]: thought, well, maybe an application with Bayesian methods, the Bayesian framework would
Speaker:[Max]: be cool to learn, and that's how I got into PyMC 3, or PyMC basically, or looked at
Speaker:[Max]: it and looked at it. So, yeah.
Speaker:[Alex Andorra]: Yeah, yeah, yeah. Nice. That's interesting. So yeah, basically, it's like
Speaker:[Alex Andorra]: the uncertainty quantifying that was really important to you.
Speaker:[Max]: Exactly. So that was really the key, the key point there.
Speaker:[Alex Andorra]: Yeah. I mean, that does make sense, right? Because, yeah, that's really
Speaker:[Alex Andorra]: one of the parts where Bayes does shine a lot. And also, especially for
Speaker:[Alex Andorra]: the Arctic sea ice project that you are talking about. It's not like it's a
Speaker:[Alex Andorra]: reproducible experiment. It's really hard in these cases to think from a
Speaker:[Alex Andorra]: frequentist framework of repeatable experiments. You cannot have multiple earths
Speaker:[Alex Andorra]: on which you can do RCTs where you melt the ice caps or not, and you melt
Speaker:[Alex Andorra]: it naturally. like naturally or thanks to human intervention. It's just
Speaker:[Alex Andorra]: like, it doesn't work in that case. So yeah, Bayes, I'm not surprised that
Speaker:[Alex Andorra]: it would be a project where Bayes fits way more naturally.
Speaker:[Max]: Yeah, no, that's for sure. I mean, for example, these climate models from these climate
Speaker:[Max]: institutions, these are huge models. And big models, to train them or to run these
Speaker:[Max]: models, it takes a lot of time. And they are very sophisticated. So really, really sophisticated.
Speaker:[Max]: But they are basically deterministic models. And they give you a point estimate
Speaker:[Max]: in the end. But our... interest was basically really to see, well, we get a point estimate,
Speaker:[Max]: but we also want to see, especially when you project the path of Arctic sea ice, the
Speaker:[Max]: uncertainty around it. Well, how likely is it that maybe or that we see Arctic sea
Speaker:[Max]: ice disappearing, not at our point estimate in the 2060s or 70s, but beforehand? Like how
Speaker:[Max]: large is the uncertainty? Maybe our model is really not good and the uncertainty is so
Speaker:[Max]: much all over the place that it's more or less useless. But yeah, and that project
Speaker:[Max]: was actually interesting to see that the uncertainty or the credible region was
Speaker:[Max]: basically spanning like 20 years, 25 years around. So that was very interesting.
Speaker:[Max]: And it gave us a quick quantification of uncertainty to it. Yeah, that was really,
Speaker:[Max]: really interesting.
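To illustrate why a credible region around a "when does it hit zero" projection is more informative than a point estimate, here is a toy sketch. It is not the vector autoregression discussed above: it assumes a simple linear trend whose slope is uncertain, and the starting level, slope mean, and slope spread are all invented, unitless numbers.

```python
import random


def years_until_zero(level, slope):
    """Years until a linear trend starting at `level` with negative `slope` hits zero."""
    if slope >= 0:
        raise ValueError("slope must be negative for the trend to reach zero")
    return level / -slope


def crossing_summary(level, slope_mean, slope_sd, prob=0.9, draws=10_000, seed=1):
    """Median and equal-tailed interval of the crossing time over uncertain slopes."""
    rng = random.Random(seed)
    times = []
    while len(times) < draws:
        slope = rng.gauss(slope_mean, slope_sd)
        if slope < 0:  # keep only draws that actually decline
            times.append(years_until_zero(level, slope))
    times.sort()
    low = times[int((1 - prob) / 2 * draws)]
    median = times[draws // 2]
    high = times[int((1 + prob) / 2 * draws) - 1]
    return low, median, high


# Invented toy inputs: level 100 (arbitrary units), declining about 2 units per year.
low, median, high = crossing_summary(level=100.0, slope_mean=-2.0, slope_sd=0.3)
print(f"median years until zero: {median:.1f}")
print(f"90% interval: ({low:.1f}, {high:.1f})")
```

Even in this toy setting the interval is wide and lopsided: modest uncertainty about the rate of decline translates into many years of uncertainty about the crossing date, with a longer tail on the late side.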
Speaker:[Alex Andorra]: Yeah, yeah, yeah. Nice. Uh, they, I love that. Uh, and I mean, I would
Speaker:[Alex Andorra]: have, that's really interesting for me to, to talk with someone who recently
Speaker:[Alex Andorra]: got into the Bayesian framework and to understand how you get into it and why,
Speaker:[Alex Andorra]: and, and how, uh, so I would have a lot of other questions on that, but
Speaker:[Alex Andorra]: I want to talk about football or soccer, so let's, let's switch to that and
Speaker:[Alex Andorra]: then if we have time at the end of the episode, I'll come back with my,
Speaker:[Alex Andorra]: um, nerdy, uh. Educational questions. So yeah, basically you have an area or a hobby
Speaker:[Alex Andorra]: of yours where you do apply and need actually Bayesian stats and that's
Speaker:[Alex Andorra]: soccer analytics. First, I read a bit your website and I saw you were a passionate
Speaker:[Alex Andorra]: football fan since you were a child and you mentioned a bunch of European championships.
Speaker:[Alex Andorra]: Not the French one though. I was absolutely outraged. What happened? What
Speaker:[Alex Andorra]: happened? Like, don't you get the French games in Germany?
Speaker:[Max]: Oh yeah, well that's another issue. So when I was younger really, I mean it was only
Speaker:[Max]: the Bundesliga and sometimes when you were lucky, sometimes you got the highlights
Speaker:[Max]: of the French Premier League and the Serie A, but yeah you had to be really lucky,
Speaker:[Max]: it was not always available and I wasn't that...
Speaker:[Max]: Yeah, I didn't know the websites where you could watch it basically. So
Speaker:[Alex Andorra]: Hehehehe
Speaker:[Max]: that was another issue. But yeah, the French, well, the French league, I was never
Speaker:[Max]: really a fan of. I'm sorry, Alex. But yeah, that's just even though one of my favorite
Speaker:[Max]: players was Yoann Gourcuff. So Olympique
Speaker:[Alex Andorra]: Oh,
Speaker:[Max]: Lyon.
Speaker:[Alex Andorra]: really?
Speaker:[Max]: Yeah,
Speaker:[Alex Andorra]: Oh,
Speaker:[Max]: yeah, yeah. So
Speaker:[Alex Andorra]: he went
Speaker:[Max]: yeah.
Speaker:[Alex Andorra]: to Milan. Yeah. Um,
Speaker:[Alex Andorra]: yeah, no offense taken. I think the French league is pretty boring. Um, and,
Speaker:[Alex Andorra]: uh, yeah, as long
Speaker:[Max]: as
Speaker:[Alex Andorra]: as,
Speaker:[Max]: the Bundesliga.
Speaker:[Alex Andorra]: I mean, yeah, um, as long as PSG is dominating like that, uh, I mean, that's
Speaker:[Alex Andorra]: good for me because, um, I've been a PSG fan since I was like five years old, uh,
Speaker:[Alex Andorra]: but yeah, like, uh, it's not a very interesting league. And the level is
Speaker:[Alex Andorra]: kind of going down by the years. So hopefully we'll get some investors in other
Speaker:[Alex Andorra]: clubs, which make for a good competition for Paris, but until now it's really
Speaker:[Alex Andorra]: bad. And it's actually bad for Paris because the competition inside the country
Speaker:[Alex Andorra]: is really bad. So then when they get on the European stage, they are not
Speaker:[Alex Andorra]: really used to the intensity and having so much. adversity in a way. So,
Speaker:[Alex Andorra]: yeah, it's too easy for them, let's say. So basically, but I didn't get you
Speaker:[Alex Andorra]: on the show to trash the French league. I want to talk about soccer factor
Speaker:[Alex Andorra]: model that you recently worked on. And I found it super interesting because
Speaker:[Alex Andorra]: that's mainly, yeah, the main question I always have in soccer analytics.
Speaker:[Alex Andorra]: The nerd in me is always very careful about the hot takes that you see the
Speaker:[Alex Andorra]: commentators have about players, where it's like, yeah, but how
Speaker:[Alex Andorra]: do you separate a player's skills and ability from his
Speaker:[Alex Andorra]: team's strength? And that, to me, is extremely important because
Speaker:[Alex Andorra]: in Europe right now, most of the clubs mainly invest in players on gut
Speaker:[Alex Andorra]: feeling, basically. And the thing is, when you do that and you're not able
Speaker:[Alex Andorra]: to separate inherent player abilities from team strength, then you get
Speaker:[Alex Andorra]: kind of an aura effect from the beginning of your career that can follow
Speaker:[Alex Andorra]: you even though you're not that good of a player and
Speaker:[Alex Andorra]: you are not making that much of a difference.
Speaker:[Alex Andorra]: But it's hard to contradict, because you don't really have
Speaker:[Alex Andorra]: a scientific way of disproving it, of showing
Speaker:[Alex Andorra]: that actually it's not really your inherent abilities but mainly the
Speaker:[Alex Andorra]: people you're surrounded with. And I think it's absolutely important
Speaker:[Alex Andorra]: to do that, and it should lead to a really revolutionized way of transferring
Speaker:[Alex Andorra]: players and signing them and so on. So, that was basically the background
Speaker:[Alex Andorra]: for people who are not interested in football. Even if the field
Speaker:[Alex Andorra]: doesn't interest you, I think the method and the goal of the model are actually
Speaker:[Alex Andorra]: extremely important, because you can also think about that in finance, for
Speaker:[Alex Andorra]: instance. I know a lot more work has been done on that in finance,
Speaker:[Alex Andorra]: because the monetary incentives are
Speaker:[Alex Andorra]: much stronger: you know whether you make money or not. I know
Speaker:[Alex Andorra]: there is a lot of literature on passive investment versus
Speaker:[Alex Andorra]: active investment: how do you actually prove that an active investment
Speaker:[Alex Andorra]: is better than a passive one, and that it's actually due to the skills of
Speaker:[Alex Andorra]: the person who invested in the market instead of just random market fluctuation?
Speaker:[Alex Andorra]: So you can see that in a lot of contexts where
Speaker:[Alex Andorra]: information is sparse and hard to decipher, so you need a model to make
Speaker:[Alex Andorra]: sense of it. You can see that, I would say, in football, in a lot of
Speaker:[Alex Andorra]: sports, in finance, in medicine also, right, where you can have a
Speaker:[Alex Andorra]: lot of this celebrity effect. I think in a lot of contexts where the
Speaker:[Alex Andorra]: celebrity effect is important, it can be broken down by that scientific way
Speaker:[Alex Andorra]: of estimating it. So politics, of course, movies. I think it's basically
Speaker:[Alex Andorra]: a theme that's running through a lot of fields where the celebrity effect is
Speaker:[Alex Andorra]: extremely big. So yeah, that was a very long introduction.
Speaker:[Max]: Yeah.
Speaker:[Alex Andorra]: But all that to say, I think it's very useful. So you can react to what I said
Speaker:[Alex Andorra]: and also afterwards, if you can tell us what a factor model is. Because
Speaker:[Alex Andorra]: your model is called the soccer factor model, but can
Speaker:[Alex Andorra]: you tell us before that what a factor model is?
Speaker:[Max]: Yeah. No, Alex, I mean, you laid it out perfectly. I couldn't have said it any more
Speaker:[Max]: accurately, I would say, really on point as far as I see it. So a factor model,
Speaker:[Max]: what it actually is: a factor, I would define it as
Speaker:[Max]: some proxy for exposure to a certain risk, in finance at least.
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: It's also a reduction: for example, when you look at economics or macroeconomics, it's
Speaker:[Max]: often the context that you have a huge set of features and you reduce it to
Speaker:[Max]: a couple of underlying factors or a single factor only. It's a kind of feature reduction,
Speaker:[Max]: a dimensionality reduction technique like PCA,
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: principal component analysis, or the like. But in finance, it's really a proxy for
Speaker:[Max]: a certain systematic risk exposure that the whole cross-section of stock
Speaker:[Max]: returns is exposed to.
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: All stock returns are basically exposed to it. This is basically a factor. And
Speaker:[Alex Andorra]: Yep.
Speaker:[Max]: the asset pricing literature has identified several of these common
Speaker:[Max]: risk exposures across the whole universe of stocks. But as you already
Speaker:[Max]: said, you can also use it to quantify the ability, for example, of a portfolio manager:
Speaker:[Max]: whether he has some skill in the game, whether he really has superior selection
Speaker:[Max]: ability rather than just following along these common risk exposures.
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: And that's also what this Soccer Factor Model is inspired by: to identify
Speaker:[Max]: certain features that all players are exposed to because of the differences between
Speaker:[Max]: teams. And then when you account for that, you can basically extract the skill
Speaker:[Max]: and the inherent ability of each player, after you account for these systematic differences
Speaker:[Max]: across teams that influence
Speaker:[Alex Andorra]: Hmph.
Speaker:[Max]: the ability or the observed performance of a player.
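Editor's note: the analogy Max draws here can be written down roughly as follows. This is a sketch in notation assumed for illustration (the indices, the factor symbols f and x, and the number of factors K are not taken from his paper or notebook). The first line is a generic linear factor model for the return r of asset i in period t, where alpha is what remains after the common factor exposures are accounted for. The second line is the soccer counterpart: a goal indicator y for player p in game g, with the same kind of linear predictor passed through a logistic link.

```latex
\[
  r_{i,t} = \alpha_i + \sum_{k=1}^{K} \beta_{i,k}\, f_{k,t} + \varepsilon_{i,t}
\]
\[
  \Pr\left(y_{p,g} = 1\right)
    = \operatorname{logit}^{-1}\!\Big( \alpha_p + \sum_{k=1}^{K} \beta_k\, x_{k,p,g} \Big)
\]
```

In both lines the quantity of interest is the intercept: manager skill in the finance case, inherent player ability in the soccer case.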
Speaker:[Alex Andorra]: Yeah, yeah, for sure. Because in the example of football,
Speaker:[Alex Andorra]: you'd say it's easier to be the number nine. How do you say
Speaker:[Alex Andorra]: that position in English, the front player? Number nine is the
Speaker:[Alex Andorra]: guy who's supposed to score the goals. The native English speakers can
Speaker:[Alex Andorra]: tell me what the name is; in French that would be attaquant. It's easier
Speaker:[Alex Andorra]: to be the number nine of PSG than the number nine of a very small team in
Speaker:[Alex Andorra]: France, because the rest of the team is stronger, the manager is
Speaker:[Alex Andorra]: supposed to be stronger, and so on. So you're like, yeah, but maybe
Speaker:[Alex Andorra]: if you took the number nine of the small team and put him in Paris,
Speaker:[Alex Andorra]: he would perform as well as the current number nine does. So how do
Speaker:[Alex Andorra]: you tell the difference? That's what we're going to talk about. Before
Speaker:[Alex Andorra]: that, I'm curious, from a structural standpoint, how do these kinds of factor models
Speaker:[Alex Andorra]: work? How much time do you need to really start to decipher the
Speaker:[Alex Andorra]: difference between inherent skills and exogenous team strength?
Speaker:[Alex Andorra]: That question is basically: how much data do you need from past years
Speaker:[Alex Andorra]: to start having an idea? How data hungry are those models?
Speaker:[Max]: Yeah, so that's definitely a good question, a good point. So
Speaker:[Max]: in the model that I'm proposing, I need
Speaker:[Max]: a lead time into the season to really account for certain differences. So I need
Speaker:[Max]: a couple of games already that
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: would need to be played to really account for differences in teams. Because before the
Speaker:[Max]: first game, basically, everything, or based on the data that I had, everyone
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: would have been the same.
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: But it really depends on the data. If you have data that allows you to account for
Speaker:[Max]: differences across teams, budget,
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: let's say, then you can just start right
Speaker:[Alex Andorra]: Yeah.
Speaker:[Max]: away. And for overall data, I would say like more data is always better. If you have
Speaker:[Max]: only a few observations, I think the Bayesian framework is then tailor made for
Speaker:[Max]: that as well. Like it's yeah, it grants you some leeway there. But I would say really,
Speaker:[Max]: it's the more data you have, the better. But yeah.
Speaker:[Alex Andorra]: But you could already, OK, so you could already start having that idea with
Speaker:[Alex Andorra]: just a few games. Then you get the idea of the strength of the team. And then
Speaker:[Alex Andorra]: you can start deciphering the strengths of the player. OK.
Speaker:[Max]: Exactly, exactly.
Speaker:[Alex Andorra]: Yeah.
Speaker:[Max]: But so far I have always used a certain number of, let's say, burn-in games
Speaker:[Alex Andorra]: Yeah,
Speaker:[Max]: to
Speaker:[Alex Andorra]: yeah,
Speaker:[Max]: really account
Speaker:[Alex Andorra]: yeah.
Speaker:[Max]: for that.
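Editor's note: the burn-in idea can be sketched in a few lines of plain Python. This is a hypothetical illustration, not Max's feature set: the function name `team_gap_features`, the points-per-game gap used as the team-difference feature, and the `burn_in` parameter are all assumptions made for the example. The point it shows is that a feature describing the gap between two teams is only computed from games already played, and only once both teams have played enough games.

```python
from collections import defaultdict


def team_gap_features(matches, burn_in=3):
    """Build a points-per-game gap feature from chronologically ordered matches.

    matches: list of (home, away, home_goals, away_goals) tuples.
    A row is emitted only when both teams have played at least `burn_in`
    games, and it uses only results from before the current match.
    """
    points = defaultdict(int)   # league points collected so far
    played = defaultdict(int)   # games played so far
    rows = []
    for home, away, home_goals, away_goals in matches:
        if played[home] >= burn_in and played[away] >= burn_in:
            gap = points[home] / played[home] - points[away] / played[away]
            rows.append({"home": home, "away": away, "ppg_gap": gap})
        # Update the running tables only after the feature was computed,
        # so the current result never leaks into its own feature.
        if home_goals > away_goals:
            points[home] += 3
        elif home_goals < away_goals:
            points[away] += 3
        else:
            points[home] += 1
            points[away] += 1
        played[home] += 1
        played[away] += 1
    return rows


# Made-up results, only to show the call.
example = [("Paris", "Lyon", 2, 0), ("Paris", "Lyon", 1, 1), ("Lyon", "Paris", 0, 0)]
for row in team_gap_features(example, burn_in=1):
    print(row)
```

With data that exists before the first kickoff, such as budgets, the same kind of gap could be computed without any burn-in games.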
Speaker:[Alex Andorra]: Yeah. And I mean, it's not that superficial, right? Because you can think like
Speaker:[Alex Andorra]: right now it's August, it's the beginning of the leagues for the European
Speaker:[Alex Andorra]: teams. August is a weird moment where the teams are still warming up basically.
Speaker:[Alex Andorra]: And they are clearly not at peak performance. Usually
Speaker:[Alex Andorra]: they try to peak around spring for the Northern Hemisphere, so around March;
Speaker:[Alex Andorra]: from February to May, basically, they are trying to reach their peak. So they
Speaker:[Alex Andorra]: are still warming up. They can still trade players until the end of August.
Speaker:[Alex Andorra]: So you could really say that the games they are playing in August, even though
Speaker:[Alex Andorra]: they are official games, are still warm-up games and don't really
Speaker:[Alex Andorra]: mean a lot from a long-term performance perspective. So that's an interesting moment
Speaker:[Alex Andorra]: to start warming up the model, I'd say. And maybe
Speaker:[Alex Andorra]: you have this in mind for future iterations of the model, where you could put
Speaker:[Alex Andorra]: it in the priors. We're going to talk about the structure of the model
Speaker:[Alex Andorra]: right after that, but something I'm thinking about is that
Speaker:[Alex Andorra]: you could put in the prior the information that you have about the strength
Speaker:[Alex Andorra]: of the team, in the sense that, yeah, you have the budget, which is a good
Speaker:[Alex Andorra]: proxy for potential future performance, but also just past performance. If you
Speaker:[Alex Andorra]: know that Paris has been the champion for nine years out of 10, well, you
Speaker:[Alex Andorra]: have a really good prior about the strength of the team. So you can
Speaker:[Max]: Okay.
Speaker:[Alex Andorra]: probably also add that into the model and in that way reduce the warm-up
Speaker:[Alex Andorra]: period of the model.
Speaker:[Max]: Yeah, no, absolutely. Or how Paris has performed against Lyon, let's say, in
Speaker:[Alex Andorra]: Yep.
Speaker:[Max]: the past. So the direct comparison between those teams, basically, when they faced
Speaker:[Max]: each other in past years. That would also feed in there. Yeah, so absolutely. There's
Speaker:[Max]: a lot of potential. And my model,
Speaker:[Alex Andorra]: Mm-hmm. Yeah.
Speaker:[Max]: when you're suggesting this stuff, my model just appears very rudimentary.
Speaker:[Max]: But it could definitely be extended in that regard.
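Editor's note: why an informative prior shortens the warm-up period can be seen in a toy conjugate example. The sketch below is not part of the Soccer Factor Model; it assumes, purely for illustration, that a team's strength is a single number observed with known Gaussian noise in each game (for instance through goal difference), so that the textbook normal-normal update applies. The function name and all numbers are hypothetical.

```python
def posterior_normal(prior_mean, prior_sd, observations, obs_sd):
    """Normal-normal conjugate update for a mean with known observation noise.

    Returns (posterior_mean, posterior_sd). With no observations the
    posterior is simply the prior.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = len(observations) / obs_sd ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    sample_mean = sum(observations) / len(observations) if observations else 0.0
    post_mean = post_var * (prior_precision * prior_mean + data_precision * sample_mean)
    return post_mean, post_var ** 0.5


# Two games observed so far (made-up goal differences).
games = [1.2, 0.8]

# Vague prior: nothing known about the team before the season.
print(posterior_normal(0.0, 10.0, games, obs_sd=1.0))

# Informative prior, e.g. derived from budget or past titles (assumed values).
print(posterior_normal(1.0, 0.5, games, obs_sd=1.0))
```

After the same two games, the posterior under the informative prior is tighter than under the vague one, which is the sense in which budget or past-performance information could reduce the number of burn-in games needed.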
Speaker:[Alex Andorra]: Yeah, I mean, that's the fun thing about modeling, right? You have
Speaker:[Alex Andorra]: to start somewhere that's good enough, and then you have a lot of ideas to
Speaker:[Alex Andorra]: extend it. And it's a never-ending endeavor. Each model, if you want to
Speaker:[Alex Andorra]: keep working on it your whole life, if you're interested enough, you
Speaker:[Alex Andorra]: definitely can do that. I know the models that I often revisit are the ones
Speaker:[Alex Andorra]: for predicting French presidential elections. I started doing that in 2017,
Speaker:[Alex Andorra]: and compared to the one I had for 2022, it's just embarrassing.
Speaker:[Alex Andorra]: But in a way, it's good that the work you're doing right now is the best
Speaker:[Alex Andorra]: one you've ever done. And in a few years, when you look at the work you're
Speaker:[Alex Andorra]: doing right now, it should be the worst you've ever done because that means
Speaker:[Alex Andorra]: you've progressed a lot in the meantime. So I think it's a good mindset.
Speaker:[Alex Andorra]: So how did you adapt that factor model for soccer? What does the model
Speaker:[Alex Andorra]: structure look like, basically, for listeners to have an idea? And for those
Speaker:[Alex Andorra]: watching on YouTube, you can share your screen actually. So if you want
Speaker:[Alex Andorra]: to share anything at some point, feel free to do it. Otherwise, the audio format
Speaker:[Alex Andorra]: is here for you, because it's a podcast, so it's audio-first content.
Speaker:[Max]: Perfect. Yeah. So yeah, maybe if I get it on the screen, I'll do that. But for now,
Speaker:[Max]: the structure, I think, is pretty simple. And as you laid it out already very
Speaker:[Max]: accurately, it's basically trying to come up with some features, do some feature
Speaker:[Max]: engineering that accounts for differences across teams. When you
Speaker:[Max]: look at a certain player, let's say Cristiano Ronaldo, you
Speaker:[Max]: really want to account for the difference between
Speaker:[Max]: his team and the team that he's facing at that exact instance. And you want to create
Speaker:[Max]: some features that can proxy for these differences across teams. And that's basically
Speaker:[Max]: the heart of the model. And this is inspired by these asset pricing factors that
Speaker:[Max]: try to account for differences across assets, across stocks, across firms, basically.
Speaker:[Max]: And the modeling part itself is really nothing sophisticated. You can include a kind
Speaker:[Max]: of hierarchical structure; you don't need to, but it can definitely help.
Speaker:[Max]: But it's really the feature engineering that is at the heart of it. And then PyMC comes
Speaker:[Max]: in very conveniently and basically does the dirty work for you.
Speaker:[Alex Andorra]: Mm-hmm. So that's cool. If it's a simple structure,
Speaker:[Alex Andorra]: can you talk about what your likelihood was,
Speaker:[Alex Andorra]: and then what kind of distributions you put on the parameters and things like that?
Speaker:[Alex Andorra]: I think it would be a fun thing to talk about for the listeners.
Speaker:[Max]: Sure, sure. Then maybe I just get the workbook loaded.
Speaker:[Alex Andorra]: Oh yeah.
Speaker:[Max]: So maybe I can share my screen and couple
Speaker:[Alex Andorra]: Yes,
Speaker:[Max]: of...
Speaker:[Alex Andorra]: you should be able to.
Speaker:[Max]: Let me see.
Speaker:[Max]: So in terms of the likelihood, or what the model structure is: I
Speaker:[Max]: need some observed measurement of a player's performance. Not
Speaker:[Alex Andorra]: Yes.
Speaker:[Max]: a skill, I mean, that is something that is underlying, that is latent, that we want
Speaker:[Max]: to identify.
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: But we need some observed measure of player performance. What I used is scoring
Speaker:[Max]: goals. Did a player score a goal in a certain game or not? So basically 0 or 1,
Speaker:[Max]: binomially distributed, and so it's a logistic regression. You want
Speaker:[Alex Andorra]: Yeah.
Speaker:[Max]: to identify the probability of a player's scoring. And so now I have it. I guess I have
Speaker:[Max]: it here.
Speaker:[Alex Andorra]: you may have
Speaker:[Max]: Um,
Speaker:[Alex Andorra]: to authorize Google Chrome to share.
Speaker:[Max]: exactly,
Speaker:[Alex Andorra]: Oh
Speaker:[Max]: exactly. That
Speaker:[Alex Andorra]: yeah.
Speaker:[Max]: unfortunately takes a bit of time. Um, Sorry, I guess I'll be
Speaker:[Alex Andorra]: Yeah,
Speaker:[Max]: here in a second.
Speaker:[Alex Andorra]: it's all good. Yep. It's all good. You can do that and
Speaker:[Max]: Okay.
Speaker:[Alex Andorra]: come back. I don't know what's going to happen for the recording, but I already
Speaker:[Alex Andorra]: did that. After all, it's no problem.
Speaker:[Max]: Sorry, I didn't.
Speaker:[Alex Andorra]: I mean, it's the first time I do it. So I didn't know it either.
Speaker:[Max]: Ah, okay, here it is. Wait.
Speaker:[Max]: Is it Joe? Ah.
Speaker:[Alex Andorra]: So
Speaker:[Max]: No.
Speaker:[Alex Andorra]: I think you need to give permission.
Speaker:[Max]: Yeah, exactly. That's
Speaker:[Alex Andorra]: And open
Speaker:[Max]: one.
Speaker:[Alex Andorra]: your computer system settings and click privacy and security.
Speaker:[Max]: Well, maybe.
Speaker:[Alex Andorra]: Apparently, if you open your system settings, and then you go
Speaker:[Max]: Yeah
Speaker:[Alex Andorra]: to privacy and security, and you click screen recording, and allow your
Speaker:[Alex Andorra]: browser to share your screen. I think you need to allow Google Chrome to
Speaker:[Alex Andorra]: share your screen.
Speaker:[Max]: mm-hmm yeah I was there but ah yeah okay now
Speaker:[Alex Andorra]: I mean
Speaker:[Max]: maybe
Speaker:[Alex Andorra]: otherwise it's no chip.
Speaker:[Max]: Okay.
Speaker:[Max]: Okay. Sorry for that.
Speaker:[Alex Andorra]: So let's see.
Speaker:[Max]: No? That's what I wanna
Speaker:[Alex Andorra]: Yeah, it
Speaker:[Max]: do with
Speaker:[Alex Andorra]: may
Speaker:[Max]: that guess.
Speaker:[Alex Andorra]: be
Speaker:[Max]: Sorry.
Speaker:[Alex Andorra]: because you have to get out to quit Google Chrome and then come back. Are
Speaker:[Alex Andorra]: you on Mac?
Speaker:[Max]: Yeah, yeah, exactly.
Speaker:[Alex Andorra]: Yeah, so you probably need to close Google Chrome and then come back. But
Speaker:[Alex Andorra]: you can do that. And then you come back to the same link I sent you.
Speaker:[Max]: Okay.
Speaker:[Alex Andorra]: And then it should work. Maybe I'll have to
Speaker:[Max]: OK.
Speaker:[Alex Andorra]: do another recording, but that's OK. I can edit that afterwards. It's easy.
Speaker:[Max]: Okay, okay.
Speaker:[Alex Andorra]: So I'll wait for you here. Yeah.
Speaker:[Max]: Okay. I'm back Alex. Sorry.
Speaker:[Max]: Sorry Alex, I cannot hear you currently.
Speaker:[Alex Andorra]: Yes, that's normal. I was muted. So cool. I didn't even have to start a new
Speaker:[Alex Andorra]: recording. You can just join the room again. Cool. First time it happened,
Speaker:[Alex Andorra]: so I didn't know what would happen. So cool. Perfect. So does it work now?
Speaker:[Max]: Let's
Speaker:[Alex Andorra]: Let's
Speaker:[Max]: see.
Speaker:[Alex Andorra]: try.
Speaker:[Alex Andorra]: No,
Speaker:[Max]: No,
Speaker:[Alex Andorra]: still not.
Speaker:[Max]: no,
Speaker:[Alex Andorra]: That's weird.
Speaker:[Max]: no. I'll give it a last try and otherwise I just.
Speaker:[Alex Andorra]: Yeah, otherwise it's okay, but...
Speaker:[Max]: Yeah, Google
Speaker:[Alex Andorra]: It's
Speaker:[Max]: Chrome,
Speaker:[Alex Andorra]: weird.
Speaker:[Max]: it's there.
Speaker:[Alex Andorra]: It should work.
Speaker:[Max]: I allowed it. So I don't know, Google Chrome, it's fine. It can access.
Speaker:[Max]: but
Speaker:[Alex Andorra]: I'm checking that it could be on my end maybe, so...
Speaker:[Max]: screen
Speaker:[Alex Andorra]: Yeah, no,
Speaker:[Max]: the
Speaker:[Alex Andorra]: on
Speaker:[Max]: window
Speaker:[Alex Andorra]: my end also it's all good, so...
Speaker:[Max]: And I'm sorry, no, unfortunately it doesn't
Speaker:[Alex Andorra]: No, weird.
Speaker:[Max]: work.
Speaker:[Alex Andorra]: Anyways, that's OK. So well, then let's continue without the
Speaker:[Alex Andorra]: screen sharing. You can just talk through it. It's no problem.
Speaker:[Max]: Okay.
Speaker:[Alex Andorra]: I've
Speaker:[Max]: Yeah.
Speaker:[Alex Andorra]: done it. We've done it for a lot of podcast episodes.
Speaker:[Max]: OK. Yeah, so the structure basically is relatively simple. You need some idea of
Speaker:[Max]: what the performance of the player is. And you have to have a proxy for that.
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: And well, you need this performance to be observed, obviously. And the proxy that
Speaker:[Max]: I chose for a player's performance is whether he scores a goal or not, so 0 or 1
Speaker:[Max]: in a certain game. Our y, our target, is Bernoulli distributed. And it's basically a logistic
Speaker:[Max]: regression that we are running. Because what we want to identify is really the skill
Speaker:[Max]: and the ability, the latent variable hidden in our observed performance measure, basically.
Speaker:[Max]: And so the model is pretty simple. You need the priors. You have basically a bunch
Speaker:[Max]: of coefficients. That is, you have the alpha, the skill, the ability that you're interested
Speaker:[Max]: in. And then you have the loadings, the coefficients on all the factors that are in
Speaker:[Max]: your model. So you basically have to impose priors for all the coefficients.
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: And then you have to define the likelihood, Bernoulli distributed. And yeah, that's basically
Speaker:[Max]: the model. It's in the workbook, and people can go through it. There's also a redacted
Speaker:[Max]: version, where people, if they fancy it, can try to work with their
Speaker:[Max]: own priors and all that and try to do it themselves first, and then check the unredacted
Speaker:[Max]: version.
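Editor's note: the likelihood Max describes (a 0/1 goal indicator, a logistic link, an intercept alpha and loadings on the factors) can be spelled out without any library. The sketch below uses only the Python standard library; the function names and the example numbers are made up for illustration and are not taken from the notebook.

```python
import math


def scoring_probability(alpha, betas, factors):
    """Logistic link: probability that the player scores in one game.

    alpha   : latent skill of the player (intercept)
    betas   : loadings on the team-difference factors
    factors : factor values for this game, same length as betas
    """
    linear_predictor = alpha + sum(b * x for b, x in zip(betas, factors))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def log_likelihood(alpha, betas, games):
    """Bernoulli log-likelihood over games given as (factors, scored) pairs."""
    total = 0.0
    for factors, scored in games:
        p = scoring_probability(alpha, betas, factors)
        total += math.log(p) if scored else math.log(1.0 - p)
    return total


# Hypothetical data: one factor per game, and whether the player scored (1) or not (0).
games = [([0.5], 1), ([-1.0], 0), ([1.5], 1), ([0.0], 0)]
print(log_likelihood(alpha=-0.5, betas=[1.0], games=games))
```

A Bayesian fit then combines this likelihood with priors on alpha and the loadings, which is the part a probabilistic programming library automates.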
Speaker:[Alex Andorra]: Oh, that's cool.
Speaker:[Max]: So they want to play with that a bit.
Speaker:[Alex Andorra]: Nice.
Speaker:[Max]: Yeah, that's basically it. So it's nothing really crazy. It's four lines of code,
Speaker:[Max]: the basic model. And yeah, you can
Speaker:[Max]: do that for a single player only, but you can also do that for multiple
Speaker:[Max]: players. The key idea is that each player
Speaker:[Max]: should be exposed to these factors with the same loading, basically. So you can
Speaker:[Max]: impose a hierarchical structure on the ability and skill of each player. You should
Speaker:[Max]: definitely do that, and you can impose the hierarchical structure by player or also
Speaker:[Max]: by season. So the ability of the player may evolve across seasons, basically.
Speaker:[Alex Andorra]: Mm-hmm, mm-hmm, yeah.
Speaker:[Max]: That's, I think, something worth looking into. And then basically
Speaker:[Max]: you have the loadings on the factors, and they should account for the team effect,
Speaker:[Max]: basically. You want to account for that and get it out of the way so that
Speaker:[Max]: in the end you're left with this latent factor, the alpha, the inherent
Speaker:[Max]: skill and ability of the player.
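Editor's note: one way to write the multi-player version described here is the hierarchical specification below. It is a sketch under assumptions, not the specification from the notebook: the choice of Normal and HalfNormal priors and the scale symbols s are placeholders. What it keeps from the conversation is that every player p gets his own alpha, drawn from a shared population distribution, while the loadings beta on the team-difference factors are common to all players.

```latex
\begin{align*}
  \mu_\alpha &\sim \mathcal{N}(0,\, s_\mu), \qquad
  \sigma_\alpha \sim \operatorname{HalfNormal}(s_\sigma) \\
  \alpha_p &\sim \mathcal{N}(\mu_\alpha,\, \sigma_\alpha)
    && \text{one skill parameter per player } p \\
  \beta_k &\sim \mathcal{N}(0,\, s_\beta)
    && \text{loadings shared by all players} \\
  y_{p,g} &\sim \operatorname{Bernoulli}\!\Big(
      \operatorname{logit}^{-1}\big(\alpha_p + \textstyle\sum_{k} \beta_k\, x_{k,p,g}\big)\Big)
\end{align*}
```

Letting ability evolve across seasons, as mentioned above, would amount to indexing alpha by player and season and tying the season-level values to a player-level mean in the same way.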
Speaker:[Alex Andorra]: Yeah, yeah, yeah. OK. Yeah, that makes sense. And I mean, for sure, I will
Speaker:[Alex Andorra]: put all of these in your episode's show notes. And actually, I think I can share
Speaker:[Alex Andorra]: my screen. I don't know why I didn't think about that before. And here
Speaker:[Alex Andorra]: is the notebook, right? Am I on the right notebook?
Speaker:[Max]: Exactly.
Speaker:[Alex Andorra]: Yeah, perfect.
Speaker:[Max]: Yeah, yeah,
Speaker:[Alex Andorra]: So.
Speaker:[Max]: yeah. So there are a couple of notebooks there. There's the one in the PyMCon folder;
Speaker:[Max]: that's where there's the redacted version and the unredacted version. And the
Speaker:[Max]: version that we're currently looking at, that's the initial one with all its typos
Speaker:[Max]: in there.
Speaker:[Alex Andorra]: Ah ok, so it's not the right one. Then I should look at
Speaker:[Max]: It's
Speaker:[Alex Andorra]: another
Speaker:[Max]: fine
Speaker:[Alex Andorra]: one.
Speaker:[Max]: one, so it's perfect. The other one is just a bit smaller and more concise, I would
Speaker:[Max]: say.
Speaker:[Alex Andorra]: Ah, here. Unredacted. Perfect. Yeah, I have it here. So yeah, for those
Speaker:[Alex Andorra]: of you watching on YouTube, I'm sharing it right now. And so basically,
Speaker:[Alex Andorra]: this is the part of the model where you're talking about the likelihood,
Speaker:[Alex Andorra]: where it's goal scored or not scored. And then you have here the probability,
Speaker:[Alex Andorra]: which has, basically, here, this alpha that you talked about, right? That
Speaker:[Max]: Exactly.
Speaker:[Alex Andorra]: is the inherent skill of the player which enters probability. And you have
Speaker:[Alex Andorra]: the Xs and the beta. So the Xs, are they the factors or the beta are the
Speaker:[Alex Andorra]: factors?
Speaker:[Max]: So the Xs are the factors. These are the differences across the teams or between
Speaker:[Max]: the teams. And this is what you want to basically account for and to clean the observed
Speaker:[Max]: performance measure from. Yeah.
Speaker:[Alex Andorra]: Yeah, yeah. Oh, yeah, OK. Yeah, for sure. And then the beta is the slope, basically,
Speaker:[Alex Andorra]: on the factors. Yeah, yeah,
Speaker:[Max]: Exactly.
Speaker:[Alex Andorra]: yeah. Yeah, yeah, it's a fun model. So of course, it's hard to convey
Speaker:[Alex Andorra]: this on a podcast. But I encourage you to go and watch that part on YouTube. I'm
Speaker:[Alex Andorra]: sharing it right now. And also, you can just take a look at the notebook from
Speaker:[Alex Andorra]: Max, which I put in the show notes, where you have all the details. So it's
Speaker:[Alex Andorra]: pretty fun to look at. And also, as you were saying, the model is pretty small.
Speaker:[Alex Andorra]: So that's the amazing thing that I find, and now if we
Speaker:[Alex Andorra]: go look at the PyMC implementation, so a bit later
Speaker:[Max]: Oh.
Speaker:[Alex Andorra]: down in the notebook, the really cool thing is that the model is quite
Speaker:[Alex Andorra]: easy to code, right? It's just a few lines of code,
Speaker:[Alex Andorra]: basically four lines of code, as you were saying, and you're done. So that's
Speaker:[Alex Andorra]: the beauty of the probabilistic programming framework, right? It's a really
Speaker:[Alex Andorra]: useful model. But if you want to get to a first good-enough version that
Speaker:[Alex Andorra]: already gives you interesting insights, you don't have to reinvent everything.
Speaker:[Alex Andorra]: And you don't have to go with the hardest version from the start,
Speaker:[Alex Andorra]: where you have a hierarchical time series model where everything is varying
Speaker:[Alex Andorra]: and pooling information. Sure, that's cool. But don't start with that. It's
Speaker:[Alex Andorra]: like if you're starting to train, don't start with 100 push-ups. Start by
Speaker:[Alex Andorra]: trying five first, then do a few sets of them, and build your way up to
Speaker:[Alex Andorra]: 100. So that's the key thing I find about the Bayesian framework
Speaker:[Alex Andorra]: coupled with the power of probabilistic programming languages: you can get
Speaker:[Alex Andorra]: to a first good-enough version in a few lines of code,
Speaker:[Alex Andorra]: and then sample from it. Because here you have it on the screen:
Speaker:[Alex Andorra]: the likelihood, then a line for the deterministic, which is the logistic
Speaker:[Alex Andorra]: regression line, and then you have your intercept and your coefficients on
Speaker:[Alex Andorra]: the factors. And basically that's it. That's really amazing.
Speaker:[Max]: Absolutely. No, that's, I think, the beauty of PyMC, that it allows you to describe
Speaker:[Max]: or build your model in a pretty intuitive way. And you can even have it printed out
Speaker:[Max]: to see if everything is as you would have expected. And
Speaker:[Alex Andorra]: Yeah.
Speaker:[Max]: yeah, then PyMC does the dirty work, the sampling and all that for you. And yeah,
Speaker:[Max]: but it already gives you an intuitive idea of how the modeling works. And yeah, that's
Speaker:[Max]: absolutely
Speaker:[Alex Andorra]: Yeah, yeah, yeah.
Speaker:[Max]: super
Speaker:[Alex Andorra]: No,
Speaker:[Max]: cool.
Speaker:[Alex Andorra]: it's really fun. Well done on that. And so I'm curious, what are your, do
Speaker:[Alex Andorra]: you have any ideas? Do you want to keep working on this model? Do you have
Speaker:[Alex Andorra]: any ideas on where to take it from what it is right now? Um.
Speaker:[Max]: Yeah, that's a good question, actually. So definitely the model can be improved. And
Speaker:[Max]: definitely, it's all depending on the features that you have and the data that you
Speaker:[Max]: have. And I think the clubs, they have so much more
Speaker:[Alex Andorra]: Yeah.
Speaker:[Max]: interesting data than I have. And they could build many, many more interesting factors
Speaker:[Max]: accounting for differences across
Speaker:[Alex Andorra]: Oh yeah,
Speaker:[Max]: teams.
Speaker:[Alex Andorra]: for sure.
Speaker:[Max]: So yeah, I really don't know because I tried to reach out to a couple of clubs,
Speaker:[Max]: let's say. But I don't know. there was nothing really coming back. So yeah, apparently,
Speaker:[Max]: perhaps they're not interested in that or maybe they have their own models already
Speaker:[Max]: or something. So I really don't know. I'd be excited to work on that. But as you
Speaker:[Max]: said, it's rather a side project that I did once upon a time. And yeah, it's not
Speaker:[Max]: really related to economics or finance. That's why I'm currently working absolutely
Speaker:[Max]: on other stuff. But yeah, I would love to work on that in that regard. But yeah, it
Speaker:[Max]: seems not so many teams are picking up on that, at least among those that I reached
Speaker:[Max]: out to. And that seems to be the European clubs. Because in some of your last episodes,
Speaker:[Max]: I heard people talking about that in the United States, it's pretty different. And,
Speaker:[Max]: um, yeah, uh, there are a lot of, apparently a lot of clubs already trying to implement
Speaker:[Max]: that to really try to understand the inherent latent skill of, of players, not necessarily
Speaker:[Max]: in soccer, but in baseball or in
Speaker:[Alex Andorra]: Yeah,
Speaker:[Max]: other,
Speaker:[Alex Andorra]: oh, especially
Speaker:[Max]: um, in other disciplines.
Speaker:[Alex Andorra]: baseball. Yeah, yeah, yeah. So this is sad, but I'm kind of reassured to
Speaker:[Alex Andorra]: hear you say that because I do think it's a huge area of improvement that
Speaker:[Alex Andorra]: there is in Europe. And clubs just don't seem to be very interested. The
Speaker:[Alex Andorra]: thing I know is that a few English clubs are using data pretty heavily, like Liverpool,
Speaker:[Alex Andorra]: Manchester City, clubs like that, but that's still kind of the exception. I
Speaker:[Alex Andorra]: know Toulouse now in France, which is a small club, and that makes sense.
Speaker:[Alex Andorra]: If you're a small club, you have less money, so you have much more competitive
Speaker:[Alex Andorra]: pressure to find good players, which you are not overpaying, which is basically
Speaker:[Alex Andorra]: where science can help you. You don't want to pay for just a name. You
Speaker:[Alex Andorra]: want to pay for someone who has a name because... he's got talent, not
Speaker:[Alex Andorra]: just because he's got a name. So it's like, to me, everybody should do that.
Speaker:[Alex Andorra]: And I just don't understand why they don't. Because it's just like, that's
Speaker:[Alex Andorra]: also the beauty of sport, right, you don't care about the name, you care about
Speaker:[Alex Andorra]: what someone can do and if they have talent or not. Like, you should not care
Speaker:[Alex Andorra]: at all about the name, about the color of the skin, about nothing else,
Speaker:[Alex Andorra]: but what they can do on the field. And... Yeah, like to me that if I had
Speaker:[Alex Andorra]: a club, that would be one of my first priorities. How do we make sure we optimize
Speaker:[Alex Andorra]: the way we are signing the players because it costs a lot of money. So.
Speaker:[Max]: I think one club that also does a lot of that data work is in Denmark, FC Midtjylland
Speaker:[Max]: or something. I think
Speaker:[Alex Andorra]: Uh-huh.
Speaker:[Max]: I got the name completely wrong. But I heard once upon a time that they're really
Speaker:[Max]: investing a lot in data science and trying to sign players according to data or at least
Speaker:[Max]: incorporate data a lot in their daily training exercises and all that. So yeah, they
Speaker:[Max]: are one of the cutting edge maybe there in Europe as well. Small club,
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: but yeah. I think they won the Danish Championship a couple of years ago.
Speaker:[Alex Andorra]: Yeah, not surprised. I mean, something I see a lot, at least in France,
Speaker:[Alex Andorra]: and I've seen that a lot also on electoral forecasting, is basically this
Speaker:[Alex Andorra]: idea that if you start doing that, you're basically becoming kind of inhuman
Speaker:[Alex Andorra]: and you turn players into robots. Basically, that's really an interesting thing
Speaker:[Alex Andorra]: to me because one of the sports that really uses data heavily is cycling. A
Speaker:[Alex Andorra]: lot of the teams are now using data. Here, again, thanks a lot to the British,
Speaker:[Alex Andorra]: which often in Europe are the first ones to take up the data wave. And so
Speaker:[Alex Andorra]: I know, for instance, Bradley Wiggins, I think he's won the Tour de France.
Speaker:[Alex Andorra]: I don't remember how many times, but a lot of times. And basically, a lot. The
Speaker:[Alex Andorra]: whole team was using data to optimize the performances of the team. And
Speaker:[Alex Andorra]: that was one, like the British started being like, okay, we need to get back
Speaker:[Alex Andorra]: on our cycling game. They started using data extremely optimally and, well, they
Speaker:[Alex Andorra]: did. And thanks to that, basically a lot of the teams started to do that as well.
Speaker:[Alex Andorra]: And the Tour de France is extremely optimized on that. But it's funny because when
Speaker:[Alex Andorra]: you hear the media coverage of that, at least in France, it's a bad thing
Speaker:[Alex Andorra]: because it's like players are becoming robots, and they cannot eat what they
Speaker:[Alex Andorra]: want at the time they want. And, like, it just takes the magic out of
Speaker:[Alex Andorra]: the Tour de France. And I strongly disagree with that, of course, because if the
Speaker:[Alex Andorra]: performances get better, in a clean way, of course, well, then that's just
Speaker:[Alex Andorra]: better for everybody because the show is going to get better. And also, we're
Speaker:[Alex Andorra]: talking about the Tour de France or professional athletes. Like the goal is
Speaker:[Alex Andorra]: not to recreationally do that. They do that for a living. Um, so it's important
Speaker:[Alex Andorra]: for their own basically income. Uh, but also they do that because they want
Speaker:[Alex Andorra]: to be the best. I mean, they are not doing that because, well, they just
Speaker:[Alex Andorra]: want to cycle on the weekends, right? They cycle for a living. So yeah, sure.
Speaker:[Alex Andorra]: If you're an amateur cyclist, then okay. You don't need the same structure
Speaker:[Alex Andorra]: as a professional cyclist. But even then, if you want to improve your performances
Speaker:[Alex Andorra]: as an amateur cyclist, you're going to need to optimize some of the things.
Speaker:[Alex Andorra]: And if you really care about it, you're going to need to optimize your nutrition,
Speaker:[Alex Andorra]: for instance, and maybe when you take your meals or else. But if you're
Speaker:[Alex Andorra]: a professional, the slightest change can mean you're going to
Speaker:[Alex Andorra]: perform one second better or two seconds better, which
Speaker:[Alex Andorra]: can make you win the Tour de France or not. So I don't understand this argument
Speaker:[Alex Andorra]: in these contexts where you're trying to optimize performance. For me, it's
Speaker:[Alex Andorra]: like not something that should count here. They are not doing that for pleasure
Speaker:[Alex Andorra]: only.
Speaker:[Max]: I absolutely agree. Absolutely agree. It should be incorporated much more,
Speaker:[Max]: especially for the clubs. In the end, I think it will pay off as you lay it out.
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: You want to pick a lemon, and
Speaker:[Alex Andorra]: Yeah.
Speaker:[Max]: you just rather pick it. Yeah.
Speaker:[Alex Andorra]: Yeah, yeah. No, I mean, I have to say it's like, it's an interesting topic
Speaker:[Alex Andorra]: for me because I'm trying to crack that nut and I cannot crack it for now.
Speaker:[Alex Andorra]: Like, understand why basically the clubs in Europe are not really interested
Speaker:[Alex Andorra]: in that. Because I don't really care about the Chinese side or else. I'm like,
Speaker:[Alex Andorra]: once the club starts picking that up, then everybody will have to. But what
Speaker:[Alex Andorra]: I'm trying to understand is why the clubs don't do that. because it's just
Speaker:[Alex Andorra]: leaving gains on the table. And I'm just super curious about why they would
Speaker:[Alex Andorra]: do that from a sociological standpoint, honestly. Because I've seen a lot
Speaker:[Alex Andorra]: of clubs using, they have data science teams, but they use it for marketing.
Speaker:[Alex Andorra]: That's
Speaker:[Max]: I see,
Speaker:[Alex Andorra]: such a
Speaker:[Max]: I
Speaker:[Alex Andorra]: shame.
Speaker:[Max]: see.
Speaker:[Alex Andorra]: And I don't know why. So if anybody
Speaker:[Max]: there.
Speaker:[Alex Andorra]: knows, please get in touch. If anybody is working in a club, please get
Speaker:[Alex Andorra]: in touch with Max or me, because I want to know about it. We don't even need
Speaker:[Alex Andorra]: to work together. I would be happy to help you out with a model, but for
Speaker:[Alex Andorra]: now, I just want to know why and what are the internal factors, because
Speaker:[Alex Andorra]: definitely there is something going on, but I don't know what it is, and
Speaker:[Alex Andorra]: I'm just curious about it. So yeah, to try and make it a bit more constructive,
Speaker:[Alex Andorra]: do you have any idea on how we personally in the data world could change
Speaker:[Alex Andorra]: the status quo in that regard? And not only for sports, but that's also true
Speaker:[Alex Andorra]: for a lot of domains where more robust application of the scientific method
Speaker:[Alex Andorra]: would be useful. But it's hard to get it done. Do you have any ideas personally
Speaker:[Alex Andorra]: on how that status quo could be changed?
Speaker:[Max]: Yeah, I think it's really hard to say. It depends on the willingness to adopt these,
Speaker:[Max]: to be open to these methods, I would say. And the players play an important part,
Speaker:[Max]: or I think the crucial part, because if the players are not willing to adopt these
Speaker:[Max]: additional insights, I would say, it's just not possible.
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: But for sure, I mean, as you say, it's management, it's internal things that are
Speaker:[Max]: going on there, politics potentially, but I really don't know. How can someone resolve
Speaker:[Max]: that? I don't know. I regard it always as, for sure, you shouldn't base all your decisions
Speaker:[Max]: on this model or on a single model or so, but it can help
Speaker:[Alex Andorra]: No for sure.
Speaker:[Max]: stimulate your decision process, and I think it's a useful addition. And in the
Speaker:[Alex Andorra]: Yep.
Speaker:[Max]: end, for sure, there might be an upfront cost, basically, to implement, to get the data,
Speaker:[Max]: to implement the model, to hire people to produce that, but in the end, it actually
Speaker:[Max]: may pay off economically because it may save you from picking a lemon and overpaying massively.
Speaker:[Alex Andorra]: Oh yeah, for sure.
Speaker:[Max]: So
Speaker:[Alex Andorra]: Yeah, yeah.
Speaker:[Max]: yeah, I see it really as a worthwhile investment.
Speaker:[Alex Andorra]: No, for sure,
Speaker:[Max]: I think the US
Speaker:[Alex Andorra]: yeah.
Speaker:[Max]: sports has demonstrated that.
Speaker:[Alex Andorra]: Yeah, yeah. I mean, just look at the US, just look at all the other fields,
Speaker:[Alex Andorra]: especially marketing, for instance, which is starting and already started to adopt
Speaker:[Alex Andorra]: data analysis and modeling aggressively, and, like, we do that a lot at the Labs,
Speaker:[Alex Andorra]: basically making them save a lot of money and not only save money, but make
Speaker:[Alex Andorra]: more money. So like, it's just, yeah, like, I don't think this is a question,
Speaker:[Alex Andorra]: but yeah. I mean, something you can do. I would think if you're interested
Speaker:[Alex Andorra]: in it and have the time, something maybe that could work is if you could make
Speaker:[Alex Andorra]: some predictions with your model, basically. And I would think to get it per
Speaker:[Alex Andorra]: player, you would probably need some hierarchical structure in that to get
Speaker:[Alex Andorra]: some better predictions. But once you get there, you have something of a
Speaker:[Alex Andorra]: web page with basically the predictions of the model per player saying
Speaker:[Alex Andorra]: basically, this player is basically overvalued and this player is undervalued,
Speaker:[Alex Andorra]: basically based on the results of the model. And then basically see what that
Speaker:[Alex Andorra]: gives you during the season because at the beginning of the season, you
Speaker:[Alex Andorra]: can see that player is basically undervalued. He's gonna perform better than
Speaker:[Alex Andorra]: what the market currently thinks. And then, if people see that it's true, well, that's
Speaker:[Alex Andorra]: a clear sign that basically these kinds of methods and models are working,
Speaker:[Alex Andorra]: and so that could spark some interest. Because it's definitely about demonstrating
Speaker:[Alex Andorra]: what a model is for. Because my hinge... hunch, I think it's hunch.
Speaker:[Alex Andorra]: My hunch is that, basically, the decision makers in the clubs are not data people,
Speaker:[Alex Andorra]: they don't really know what data is about, and they don't even
Speaker:[Alex Andorra]: know what a model is and what it can give you. But if you are able to demonstrate
Speaker:[Alex Andorra]: what a model can give you, because they don't care about the model, the priors,
Speaker:[Alex Andorra]: the parameters, stuff like that, they just care about the results of the model.
Speaker:[Alex Andorra]: So if you can demonstrate the results of the model and even better what the
Speaker:[Alex Andorra]: model can say about recruiting that player or not recruiting that player,
Speaker:[Alex Andorra]: that would maybe have a better impact, or at least I would say it increases
Speaker:[Alex Andorra]: the probability that the impact these methods can have gets noticed.
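A minimal sketch of the overvalued/undervalued idea Alex describes, comparing each player's model-implied skill ranking with his market-value ranking. Everything here is hypothetical and for illustration only: the player names, the posterior draws, the market values, and the helper names `rank_desc` and `valuation_label` are assumptions, not part of Max's model. In practice the skill draws would come from the posterior of a fitted (ideally hierarchical) model.

```python
from statistics import mean

def rank_desc(values):
    """Map each key to its rank, 1 = highest value (ties broken by sort order)."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}

# Hypothetical posterior draws of latent skill per player (assumed data).
skill_draws = {
    "Player A": [0.8, 1.0, 0.9],
    "Player B": [0.2, 0.1, 0.3],
    "Player C": [0.5, 0.6, 0.4],
}
# Hypothetical market values in millions (assumed data).
market_value = {"Player A": 20.0, "Player B": 60.0, "Player C": 35.0}

skill_rank = rank_desc({p: mean(draws) for p, draws in skill_draws.items()})
value_rank = rank_desc(market_value)

def valuation_label(player):
    """Compare where the market ranks a player with where the model ranks him."""
    gap = value_rank[player] - skill_rank[player]
    if gap > 0:
        return "undervalued"   # model ranks him higher than the market does
    if gap < 0:
        return "overvalued"    # market ranks him higher than the model does
    return "fairly valued"

valuations = {p: valuation_label(p) for p in skill_draws}
for player, verdict in valuations.items():
    print(player, verdict)
```

Publishing such labels before a season and checking them afterwards is one way to make the model's value concrete for decision makers. A fuller version would use the whole posterior rather than only its mean.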
Speaker:[Max]: Oh, absolutely. That's absolutely the case. For sure, it depends on having the real-time
Speaker:[Max]: data, basically getting the real-time data.
Speaker:[Alex Andorra]: Exactly. Yeah.
Speaker:[Max]: That's an upfront cost that you would have to pay. No, but that's actually the intent,
Speaker:[Max]: really. This is the intent to run that model for multiple players as part of the workbook,
Speaker:[Max]: for example, to lay it out and to compare which players perform well or not. And you
Speaker:[Max]: see it, for example, Cristiano Ronaldo, when he won the player of the year award in
Speaker:[Max]: 2008. He was basically in the middle of the pack in
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: that season. So there were other players actually outperforming him, for example, Dimitar
Speaker:[Max]: Berbatov in that very season. He was playing for Tottenham
Speaker:[Alex Andorra]: Yeah.
Speaker:[Max]: later on in the year, thereafter signed by Manchester United. So you see that. And
Speaker:[Max]: for sure, there's a lot of subjective judgment coming in from when you observe it
Speaker:[Max]: and you see the model telling you something completely different. But this is stimulating
Speaker:[Max]: and it should potentially update your priors, so
Speaker:[Alex Andorra]: Yeah,
Speaker:[Max]: your
Speaker:[Alex Andorra]: exactly.
Speaker:[Max]: subjective priors.
Speaker:[Alex Andorra]: Yeah. And forces you to lay out your priors clearly
Speaker:[Max]: Thank
Speaker:[Alex Andorra]: and
Speaker:[Max]: you.
Speaker:[Alex Andorra]: on paper. So it's actually very important. Yeah. So I would say definitely
Speaker:[Alex Andorra]: something like that. And if you have the predictions for the biggest number
Speaker:[Alex Andorra]: of players on a webpage and basically betting based on the model, saying
Speaker:[Alex Andorra]: that this player is going to overperform with respect to the
Speaker:[Alex Andorra]: market or underperform with respect to the market, that's an interesting
Speaker:[Alex Andorra]: thing. And also, as you were saying, for the individual awards, where the
Speaker:[Alex Andorra]: name counts a lot, where you can see someone like Messi,
Speaker:[Alex Andorra]: who is, yeah, sure, an incredible player. But the number of times he's got the
Speaker:[Alex Andorra]: golden... How is it called in English? Ballon d'Or? Golden ball, I don't
Speaker:[Alex Andorra]: know. You could argue that in some of these seasons where he did get the award,
Speaker:[Alex Andorra]: maybe there were other players who were actually outperforming him, but they
Speaker:[Alex Andorra]: don't have the name recognition, so they are not scrutinized as much. They don't
Speaker:[Alex Andorra]: have the confirmation bias going in their favor, where it's like everybody's
Speaker:[Alex Andorra]: looking at Messi because they already know he's extremely good, so they just
Speaker:[Alex Andorra]: look at confirming the fact that he's... Incredible, which he is, but maybe
Speaker:[Alex Andorra]: not all the time, so as to get so many awards. So yeah, like that. To me,
Speaker:[Alex Andorra]: that would be a really good way of demonstrating the utility of these methods.
Speaker:[Alex Andorra]: Basically,
Speaker:[Max]: Thank
Speaker:[Alex Andorra]: making
Speaker:[Max]: you.
Speaker:[Alex Andorra]: it really concrete for the decision maker.
Speaker:[Max]: Thank you.
Speaker:[Alex Andorra]: So before we close up the show, I'd like to get back a bit to your personal
Speaker:[Alex Andorra]: experience with Bayes. And I'm curious, what was your main pain point on this
Speaker:[Alex Andorra]: project, the Soccer Factor Model, and just in general, when you're using the
Speaker:[Alex Andorra]: Bayesian workflow, what is your main pain point right now?
Speaker:[Max]: Yeah, so in that project, I really have to admit that maybe I was lucky. But
Speaker:[Alex Andorra]: Yeah.
Speaker:[Max]: there wasn't really a huge pain point. I mean, it's not
Speaker:[Alex Andorra]: Uh-huh.
Speaker:[Max]: something publishable for a paper or so. It's just basically sketching the idea
Speaker:[Max]: behind the model and basically showing the outline of the model, what it can give
Speaker:[Max]: you.
Speaker:[Max]: It worked pretty well. I don't remember any really big problems. So then when
Speaker:[Max]: I looked at the model evaluation, everything looked fine. I mean, for example, one way to evaluate
Speaker:[Max]: how well the model works, in this logistic regression, is to look at
Speaker:[Max]: the area under the curve, it's a popular metric. And it was in a reasonable
Speaker:[Max]: ballpark. And that was fine for me, in the sense that the results were really
Speaker:[Max]: what you would have expected, or that the results are kind of reliable. So that was not much
Speaker:[Max]: of a pain point. And that was also nice for me to see that, yeah, it's a simple model
Speaker:[Max]: and it works also pretty simply. And yeah, that was a project that I was pleased
Speaker:[Max]: to see that there were not many obstacles that I had to overcome.
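For readers unfamiliar with the area-under-the-curve metric Max mentions, here is a minimal pure-Python sketch. It computes the AUC as the probability that a randomly chosen positive example receives a higher predicted score than a randomly chosen negative one, with ties counting one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from Max's project.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive correctly ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Toy example (assumed data): 1 = event happened, scores = predicted probabilities.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.3, 0.6, 0.4, 0.5, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so "a reasonable ballpark" would mean a value comfortably above 0.5. For large datasets a library routine is preferable to this quadratic loop.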
Speaker:[Alex Andorra]: Nice. Yeah, that's good to hear. And so in general, in the Bayesian workflow,
Speaker:[Alex Andorra]: do you identify something in your own learning that is costing you to learn
Speaker:[Alex Andorra]: right now, that has cost you to learn, and you would like an easier way
Speaker:[Alex Andorra]: to have learned that?
Speaker:[Max]: I mean, I have to say that, for example, with all the different samplers that are out
Speaker:[Max]: there, that's not my major field. I would like to learn much, much more about the inner
Speaker:[Max]: workings of all these samplers. I mean, I coded maybe one of the simpler ones myself,
Speaker:[Max]: maybe once or so, but then I really resort to open source packages for that. But to really
Speaker:[Max]: understand what's going on, I think, yeah, looking deeper into that, that's definitely
Speaker:[Max]: something I would like to do and would need to do.
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: But yeah, I think that's basically the math of it. I think it's the most fascinating
Speaker:[Max]: stuff and how it really works and how it's then implemented in code. I think that's
Speaker:[Max]: the most fascinating stuff. But yeah, the beauty of PyMC then is if you really are
Speaker:[Max]: interested in the outcome and want a fast outcome, yeah, it's pretty intuitive.
Speaker:[Max]: Yeah.
Speaker:[Alex Andorra]: Nice. OK. Well, it's good to hear. Yeah, and I'm asking that from a developer
Speaker:[Alex Andorra]: perspective and also teacher perspective. That's always interesting for
Speaker:[Alex Andorra]: me to get a peek in the learning experience of the people. Cool. So before we
Speaker:[Alex Andorra]: close up the show, is there a topic I didn't ask you about and that you'd
Speaker:[Alex Andorra]: like to mention?
Speaker:[Max]: Well, actually, my career hasn't progressed so much so far. So I think we covered everything
Speaker:[Max]: there. So, oh yeah, that's pretty interesting. And yeah, you covered actually everything.
Speaker:[Alex Andorra]: Awesome. Yeah, we did record for a long time, so that's a surprise.
Speaker:[Max]: Thank you.
Speaker:[Alex Andorra]: Yeah, and I'm happy. I got to ask you the main thing I wanted to ask you,
Speaker:[Alex Andorra]: so that's super cool. In a reasonable amount of time, I'm sure the listeners will
Speaker:[Alex Andorra]: appreciate it, because the last two episodes were the two longest of the whole
Speaker:[Alex Andorra]: podcast. So it's good to get back to reasonable amounts of time for people,
Speaker:[Alex Andorra]: I guess. And yeah, so before letting you go, I'm gonna ask you the last
Speaker:[Alex Andorra]: two questions I ask every guest at the end of the show. So Max, if you had
Speaker:[Alex Andorra]: unlimited time and resources, which problem would you try to solve?
Speaker:[Max]: Yeah, so I think one of the most popular answers is climate change. And definitely,
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: it's, it's probably the most present problem, especially here in Milan currently.
Speaker:[Max]: You really feel it.
Speaker:[Alex Andorra]: Ha.
Speaker:[Max]: But throughout the time I've been working a bit on climate
Speaker:[Max]: econometrics, let's say, forecasting Arctic sea ice, I saw what people are really doing
Speaker:[Max]: in climate and, yeah, there are fascinating people out there, very, very intelligent people.
Speaker:[Max]: So I think throwing money at me would be wasted in that regard. I mean, what I'd
Speaker:[Max]: rather be interested in is, yeah, maybe implementing that in sports
Speaker:[Max]: analytics, right, to allow teams to have access to data, and
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: to kind of create that level playing field across players and then really, yeah,
Speaker:[Max]: it's an investment and people spend a lot of, especially in investing and in banking
Speaker:[Max]: and finance, spend a lot of time on crunching numbers and why not do that in sports as well
Speaker:[Max]: if you have the data available. So yeah, I'd be very, very interested in working on
Speaker:[Max]: that. That's for sure.
Speaker:[Alex Andorra]: Yeah, I love it. Me too, for sure. That's a good one. And if you could have
Speaker:[Alex Andorra]: dinner with any great scientific mind, dead, alive or fictional, who would it
Speaker:[Alex Andorra]: be?
Speaker:[Max]: Yeah, well, that's a pretty tough question, I have to say. So
Speaker:[Alex Andorra]: Yeah.
Speaker:[Max]: no, really, it's, yeah, there's so many amazing people out there. And when you read
Speaker:[Max]: papers, that's really incredible. What people are doing. And so yeah, there's so many
Speaker:[Max]: people I'd like to talk to. Well, one for sure, it's Frank Diebold, the guy
Speaker:[Max]: who basically invited me to the University of Pennsylvania, because that was a
Speaker:[Max]: defining point in my PhD, absolutely. But then if I could pick one as professors should expand
Speaker:[Max]: on your network, basically, it would be Ben Bernanke. He
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: was a former chair of the Federal Reserve. He received
Speaker:[Alex Andorra]: Mm-hmm.
Speaker:[Max]: the Nobel Prize in economics. Well, people say there's no Nobel Prize in economics, but
Speaker:[Max]: yeah, the Riksbank prize last year for his work on banks and financial crises.
Speaker:[Max]: Yeah, that would be super interesting to talk to him. He served his country basically.
Speaker:[Max]: Then he was assistant professor. So how he managed all that. And yeah, that would be
Speaker:[Max]: super interesting to talk to him. Phenomenal scholar. And I like reading his papers. So
Speaker:[Max]: yeah, I think that would be super cool.
Speaker:[Alex Andorra]: Nice, yeah. Love it. Very nerdy answer.
Speaker:[Max]: Okay.
Speaker:[Alex Andorra]: Awesome. Well, thanks a lot, Max. That
Speaker:[Max]: Thanks, Adam.
Speaker:[Alex Andorra]: was really interesting. You allowed me to rant about some of my pet peeves
Speaker:[Alex Andorra]: about
Speaker:[Max]: Thanks
Speaker:[Alex Andorra]: data
Speaker:[Max]: for watching!
Speaker:[Alex Andorra]: analytics and soccer. And I hope people learned a bit more. And of course,
Speaker:[Alex Andorra]: if they are curious, as usual, I will put resources and a link to
Speaker:[Alex Andorra]: your website in the show notes for those who want to dig deeper. Thank you
Speaker:[Alex Andorra]: again Max for taking the time and being on this show.
Speaker:[Max]: Thanks Alex. It was a pleasure.