Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!
Visit our Patreon page to unlock exclusive Bayesian swag ;)
Chapters:
00:00 Introduction to Football Analytics and Matt's Journey
04:54 The Role of Bayesian Methods in Football
10:20 Challenges in Communicating Data Insights
17:03 Building Relationships with Coaches
22:09 The Structure of the Data Team at Como
26:18 Focus on Player Recruitment and Transfer Strategies
28:48 January Transfer Window Insights
30:54 Biases in Football Data Analysis
34:11 Comparative Analysis of Men's and Women's Football
36:55 Statistical Techniques in Football Analysis
42:48 The Impact of Tracking Data on Football Analysis
45:49 The Future of Data-Driven Football Strategies
47:27 Advice for Aspiring Football Analysts
51:29 Future Projects in Football Analytics
Thank you to my Patrons for making this episode possible!
Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan, Francesco Madrisotti, Ivy Huang, Gary Clarke, Robert Flannery, Rasmus Hindström, Stefan, Corey Abshire, Mike Loncaric, David McCormick, Ronald Legere, Sergio Dolia, Michael Cao, Yiğit Aşık and Suyog Chandramouli.
Transcript
This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.
Today, I'm joined by Matt Penn, a football data scientist for the Italian football club
Como.
3
:With a background in mathematics from Oxford and a passion for football, Matt now applies
statistical modeling to help clubs make smarter, data-driven decisions.
4
:In this episode,
5
:Matt walks us through his journey into football analytics, one that started during the
lockdown and quickly evolved into a full-time career.
6
:We discuss the role of Bayesian methods in football, particularly when working with
limited data, and the challenges of communicating statistical insights to coaches and
7
:decision makers.
8
:Matt also shares his experience in building a data team within a football club,
highlighting the importance of bridging the gap between analytics and coaching.
9
:We dive deep into the world of player recruitment and transfer strategies, exploring how
biases in traditional football statistics can distort player evaluations and how tracking
10
:data is transforming our understanding of performance.
11
:From expected passes to the differences between men's and women's football, this
conversation is packed with insights into the evolving landscape of sports analytics.
12
:This is Learning Bayesian Statistics, episode
13
:128, recorded November 19, 2024.
14
:Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods,
the projects, and the people who make it possible.
15
:I'm your host, Alex Andorra.
16
:You can follow me on Twitter at alex_andorra,
17
:like the country.
18
:For any info about the show, learnbayesstats.com is the place to be.
19
:Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on
Patreon, everything is in there.
20
:That's learnbayesstats.com.
21
:If you're interested in one-on-one mentorship, online courses, or statistical consulting,
feel free to reach out and book a call at topmate.io/alex_andorra.
22
:See you around, folks.
23
:And best Bayesian wishes to you all.
24
:And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can
help bring them to life.
25
:Check us out at pymc-labs.com.
26
:Matt Penn, welcome to Learning Bayesian Statistics. Thank you for taking the time, and
a big thank you to Desi Ivanova, a common friend of ours. Desi
27
:recommended that I interview you on the show, because she knows I love stats in sports, so she
was like, you've got to meet Matt. So, you know.
28
:But yeah, for people who want to hear more about Desi, I just had her on the
show, episode 117.
29
:And we dove...
30
:I think you say "we dove" in English, right?
31
:So we dove into the fascinating work Desi is doing on Bayesian experimental design.
32
:So yeah, I definitely encourage you guys to check that one out.
33
:I'll put it in the show notes because
34
:That's definitely a great one.
35
:I had great feedback about that one, Desi, so thanks a lot. And lots of pressure for us
today. Let's try. That's an act to follow.
36
:I know, I know.
37
:But you know, let's try.
38
:So actually, can you share your journey with us?
39
:Because I like your path because you studied math at Oxford, but now you work in football
analytics or soccer analytics for our...
40
:American friends.
41
:So yeah, I'm curious, how did that happen?
42
:Yeah, so like you said, I did an undergrad and master's in maths at the University of
Oxford and then went on to do a PhD in statistics and really, certainly going into that,
43
:the plan was not to go into football.
44
:I thought I'd probably go become an academic or work in research in some other way.
45
:But it was during lockdown.
46
:It was kind of winter 2021.
47
:It was cold outside and we were stuck inside.
48
:And I was sat playing the video game Football Manager.
49
:I'm hoping at least some of the people listening in have played that
50
:too.
51
:If not, I would thoroughly recommend it, although it's a dangerous, dangerous one to spend too
much time on.
52
:And so doing that, and just kind of thinking, I actually probably have the skills to do
this for real now.
53
:Or at least to do some of it for real to kind of, you know, curate some of this data
myself to start building models and to start having an effect on how clubs sign players
54
:and how they line up.
55
:And so I just sent out
56
:a load of random emails to clubs in Oxfordshire, which is where I was living at the time,
just saying, would you like anyone to come do some free data analysis for you?
57
:And I kind of forgot about it.
58
:After that, like life went on.
59
:And then about a month later, someone from Oxford City Football Club, which is a
semi-professional football team in Oxfordshire, got back to me and they were like, Yeah,
60
:we'd love to have you on board.
61
:And so, yeah, it just kind of it started from there.
62
:Really, I was kind of useless for the first year, to be perfectly honest, just trying to
work out.
63
:I think I came in thinking, I'm gonna solve everything.
64
:I'm better than all of these other people.
65
:That turned out to be very much not the case.
66
:All of these people really knew what they were doing and I did not.
67
:But yeah, they were very patient with me.
68
:A guy called Wilkie from Oxford City in particular.
69
:Yeah, it just kind of...
70
:snowballed from there, really. And so I worked for them for a few years, and also did some work
for Solihull Moors and Oxford United while I was doing my PhD. And then when that
71
:finished, I moved to FC Como, who at the time were in the second division in Italy, now promoted
to the first division. And yeah, I've been working there since.
72
:This is really cool yeah yeah I love it.
73
:And were you already...
74
:like have you always been a football fan?
75
:Do you have a team you root for?
76
:Yeah, I very much grew up as kind of
77
:the classic, I don't know, I was probably the classic seven-year-old boy obsessed with
football.
78
:Played a lot of football when I was a child, although deeply unsuccessfully.
79
:I always shudder to think what would happen if my data ever appeared on any of our data
sources.
80
:Yeah, I grew up supporting Bolton Wanderers, who were a really trendy English football
team for like about a year and a half, and I made a terrible mistake to support them and
81
:it's been all downhill since then.
82
:And yeah, kind of, you know, always following football, always playing things like Football
Manager.
83
:So it's been a real joy to be able to do that in my kind of nine to five as well, working
for Como.
84
:Yeah, yeah, yeah, I guess.
85
:So yeah, we've been the same seven-year-old, you know, and I've made the same mistake.
So when I was five, it was...
86
:until now, European Cup, like the previous Champions League, I don't remember the name in
English.
87
:And so of course they were everywhere on TV in France.
88
:And my favorite color was blue, so it became my team.
89
:And that was a big mistake.
90
:Now I have to root for them, it's absolutely terrible.
91
:I know.
92
:I know.
93
:Yeah.
94
:The temptation is so there to change, but I just couldn't.
95
:I couldn't, couldn't Oswald Bolton.
96
:Oh yeah, I'm the same like.
97
:Oh yeah, believe me, I tried.
98
:I tried, you know, being like.
99
:It's not the same, is it?
100
:Yeah, it's.
101
:This is awesome.
102
:I have amazing, amazing data about professional athletes, and I get
to do models about their performance.
103
:Absolutely incredible.
104
:So, you know, um.
105
:really grateful for that.
106
:And actually I'm curious, because I of course use Bayesian stats a lot in my work and in
everything, even in my life.
107
:What about you?
108
:Are you using any Bayesian stats at work?
109
:And even more generally, do you remember when you were first introduced to Bayesian stats?
110
:Yeah, I guess maybe I'll start with when I was first introduced and then, spoiler alert, I
do use it quite a lot at work.
111
:I remember it was in our second-year undergrad stats course and it was just this little
section at the end of the course.
112
:It had all been very frequentist up to that point.
113
:Then we had like three lectures on Bayesian stats and I remember walking out of the first one
going like, mate, this is nonsense, like you don't know the prior, what's going on, I would
114
:never use this.
115
:And then slowly over time I realised that maybe my initial reaction as a kind of know it
all second year undergrad was not necessarily correct.
116
:And so yeah, I have really come to appreciate how useful it is.
117
:I think particularly in a football context, the places where certainly it's the most
powerful are the places where you don't have very much data.
118
:So when you're trying to model something, you come up with a parametric model and
you've not got much to fit it to.
119
:So for example, modeling international football results is a really good example of
that. You've got, over the course of four years, which is maybe as much relevant
120
:data as you can get, maybe 30 or 40 games tops for each team. And so if you're trying
to train a kind of traditional, maybe it's not traditional, but an XGBoost model
121
:or something like that, you just end up overfitting massively, whereas
122
:Bayesian stats really helps you kind of know what you don't know, I guess.
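Editor's sketch: the point about having only 30 to 40 games per international team can be illustrated with a conjugate Gamma-Poisson update. This is not the model discussed in the episode. It assumes goals per game follow a Poisson distribution with a Gamma prior on the scoring rate, and the prior parameters and match results below are invented for illustration.

```python
import math

def gamma_poisson_posterior(goals, prior_shape=2.0, prior_rate=1.5):
    """Posterior Gamma(shape, rate) for a Poisson scoring rate.

    Assumption: goals per game ~ Poisson(lam), lam ~ Gamma(prior_shape, prior_rate).
    The default prior values are hypothetical.
    """
    shape = prior_shape + sum(goals)   # prior shape plus total goals observed
    rate = prior_rate + len(goals)     # prior rate plus number of games
    return shape, rate

def gamma_mean_sd(shape, rate):
    """Mean and standard deviation of a Gamma(shape, rate) distribution."""
    return shape / rate, math.sqrt(shape) / rate

few_games = [2, 0, 1, 3, 1, 0, 2, 1]   # made-up results for 8 games
many_games = few_games * 5             # same average over 40 games

for label, goals in [("8 games", few_games), ("40 games", many_games)]:
    mean, sd = gamma_mean_sd(*gamma_poisson_posterior(goals))
    print(f"{label}: posterior mean {mean:.2f}, sd {sd:.2f}")
```

With few games the posterior stays wide and is pulled toward the prior, which is one concrete sense in which the model "knows what it doesn't know".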
123
:Yeah, yeah, couldn't put it better.
124
:I mean, even when you have a lot of data.
125
:So yeah, I know a bit about football data.
126
:And I know also about baseball data, because that's my job.
127
:And baseball, we have a ton of data.
128
:It's absolutely just absolutely incredible.
129
:But you have a lot of data in aggregate.
130
:Doesn't mean you have a lot of data about each and every player.
131
:Yeah.
132
:So I mean, even if you have millions of rows, it doesn't mean you have millions of rows about
each player. And actually, Bayesian stats are still super useful there because, well, there are a
133
:lot of players you don't know a lot about and you want to make inferences about what they
could do and are able to do.
134
:Well, here that's super helpful too.
135
:Yeah, definitely, definitely.
136
:And like in a football context, it's those kind of rare actions
137
:which kind of fall into that category.
138
:Something like looking at players passing, you've got lots of data on them, like a top
player will do 100 passes a game, that's kind of enough to do whatever you want.
139
:But when you start to look at something like shooting, or particularly maybe for a
goalkeeper, like saving shots, you're probably only facing like five shots a game that are
140
:on target.
141
:And so it's, yeah, in those scenarios where again, it becomes really important.
142
:Yeah, yeah, yeah.
143
:Yeah, great point.
144
:I've been working lately for an open source...
145
:project, you know, just for fun, with football data, and we're looking
at scoring rates. And yes, scoring rates are so low.
146
:It's just so hard to score a goal, you know. And it's so low that using a Poisson
likelihood, for instance, which you'd think would be the likelihood you want, right,
147
:because the Poisson is already for rare events,
148
:the Poisson is actually too heavy on the right tail. I mean, it's going to have too much
probability mass around two, three, four goals, which are actually
149
:very rare from an individual perspective, right? It's actually very rare that one given
player scores a goal, and then more than one goal, it's even more rare. And the Poisson
150
:just puts too much emphasis on the two, three-plus goals. And since you don't have
enough data to inform that these scenarios are actually very improbable, then if you don't
151
:either change the likelihood or inform the prior about that, the model will be happily
saying, yeah, but there is a lot of uncertainty.
152
:You know, that player maybe will end up scoring five goals in the next game.
153
:No, he won't, you know. Even if that's Messi, that's very rare.
154
:So yeah, that's here.
155
:That's really domain knowledge that you want to use, or even like real statistical
knowledge here.
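Editor's sketch: one way to see the effect of informing the prior on a player's scoring rate is to compare prior predictive tail probabilities. The example assumes goals per game are Poisson with a Gamma prior on the rate, so the predictive distribution is negative binomial. Both priors are hypothetical and are not taken from the project mentioned in the episode.

```python
import math

def prior_predictive_pmf(k, shape, rate):
    """P(goals = k) when goals ~ Poisson(lam) and lam ~ Gamma(shape, rate).

    Integrating out lam gives a negative binomial distribution.
    """
    log_p = (
        math.lgamma(k + shape) - math.lgamma(shape) - math.lgamma(k + 1)
        + shape * math.log(rate / (rate + 1.0))
        - k * math.log(rate + 1.0)
    )
    return math.exp(log_p)

def tail_probability(min_goals, shape, rate):
    """P(goals >= min_goals) under the prior predictive distribution."""
    below = sum(prior_predictive_pmf(k, shape, rate) for k in range(min_goals))
    return 1.0 - below

# Hypothetical priors: a vague one (mean 1 goal per game, very spread out)
# and an informed one (mean 0.2 goals per game, fairly concentrated).
vague = tail_probability(3, shape=1.0, rate=1.0)
informed = tail_probability(3, shape=4.0, rate=20.0)
print(f"P(3+ goals) with vague prior:    {vague:.4f}")
print(f"P(3+ goals) with informed prior: {informed:.4f}")
```

Under these made-up numbers the vague prior treats a hat-trick as a fairly ordinary outcome, while the informed prior makes it the rare event that domain knowledge says it is.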
156
:Yeah, definitely, definitely.
157
:And I'm wondering, for you, for instance, I mean for you and football in general, what
is your main obstacle or pain point when you're using a Bayesian model?
158
:Yeah, I guess a lot of it, to be fair, this is in some sense a more general obstacle.
159
:It's easy.
160
:It's very easy to come up with something you might want to model about a game.
161
:You might want to model like, I don't know, which of the opposition players is going to
make the most passes in your final third or something like close to your goal.
162
:The difficulty with all football models is how do you actually then apply that into the
game.
163
:Like it's all very well, if we're playing Man City, I can go to the coaches and say, that
Erling Haaland guy, he's quite good, but they can't actually use that. And so it's kind of,
164
:you need the...
165
:You're always having to balance kind of interesting data to model, but also using the
insight from the coaches and the people you're working with to try and go, how can we,
166
:like what possible things could we do in this game?
167
:Like what possible strategies could we employ?
168
:And then how can we kind of model whether or not they'll solve this problem?
169
:So yeah, kind of making sure that what you're doing is really, really applicable is a real
challenge.
170
:Yeah, so basically all of my work is in Python.
171
:So I guess that makes me not a real coder.
172
:But it does the job.
173
:I do everything in Python, and I'm an open source developer in Python.
174
:So you're welcome here.
175
:That's good, that's good, that's good.
176
:Because I was made to code some JavaScript this week for an app and that did not go well.
177
:I'm very happy sticking with Python.
178
:Yeah, I feel your pain.
179
:But yeah, and then within that, a lot of the stuff, I maybe have a bad tendency
to often try and code everything custom myself.
180
:So like if we're doing an MCMC loop, for example,
181
:Obviously there are lots of great packages out there to do that in Python, but I'd always
have the temptation to do it myself, and then you'll be able to fit in a few custom
182
:speed-ups and things like that in the process.
183
:But yeah, I think that maybe answers your question.
184
:I'm not 100% sure.
185
:Yeah, yeah, yeah.
186
:And also I'm curious, do you use any package to write the models?
187
:Like are you using PyMC, or are you using NumPyro?
188
:Yeah, so again, the models themselves are generally just kind of Python scripts, maybe
wrapped up in Numba or something like that to pre-compile and speed it up.
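Editor's sketch: a hand-written MCMC loop of the kind described here could look like the random-walk Metropolis sampler below. It is a generic illustration, not Matt's code. The model (Poisson goals with a Gamma prior on the rate), the data, and the tuning values are all assumptions. It uses only the standard library; in practice a loop like this is the sort of function one might compile with Numba's `njit` decorator, which is not shown.

```python
import math
import random

def log_posterior(theta, goals, prior_shape, prior_rate):
    """Unnormalised log posterior of theta = log(rate).

    Assumes goals ~ Poisson(rate) and rate ~ Gamma(prior_shape, prior_rate).
    The extra theta term is the Jacobian of the log transform.
    """
    return (prior_shape + sum(goals)) * theta - (prior_rate + len(goals)) * math.exp(theta)

def metropolis(goals, n_samples=20000, step=0.5,
               prior_shape=2.0, prior_rate=2.0, seed=0):
    """Random-walk Metropolis on log(rate); returns samples of the rate."""
    rng = random.Random(seed)
    theta = 0.0
    current = log_posterior(theta, goals, prior_shape, prior_rate)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, goals, prior_shape, prior_rate)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(math.exp(theta))
    return samples

goals = [0, 1, 0, 2, 1, 0, 0, 1, 3, 0]   # made-up goals per game
draws = metropolis(goals)[2000:]          # drop the first 2000 draws as burn-in
estimate = sum(draws) / len(draws)
exact = (2.0 + sum(goals)) / (2.0 + len(goals))  # conjugate posterior mean
print(f"MCMC mean {estimate:.3f} vs exact {exact:.3f}")
```

Because this toy model is conjugate, the sampler can be checked against the exact posterior mean, which is a useful sanity test before moving to models with no closed form.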
189
:OK.
190
:I see.
191
:Yeah, but fairly standard Python.
192
:OK, yeah, yeah. Then you might want to look at PyMC, because we have the Numba
backend.
193
:OK, nice.
194
:So you'll be able to run your models
195
:in Numba if you use nutpie, another sampler, which uses Numba. Or you can
even use JAX if you have a GPU or if you need one. So yeah, lots of cool stuff on
196
:that front. So yeah, if you need any tips on how to get started with that,
well, feel free to contact me in private and I'll be happy to hook you up. That's
197
:very kind, thank you.
198
:Yeah, okay.
199
:And something I'm also curious about is, you talked about, okay, the models need to be
actionable.
200
:And so whoever you're developing the model with, you want to understand how they are going
to consume the
201
:So yeah, I'm curious what your experience is right now at Como, but also your previous
clubs.
202
:What was your experience talking to the coaches, to the teams?
203
:Did you find...
204
:that some parts were actually easier than you expected, some parts harder.
205
:What was your experience of that, making that translation between the technical side and
the sports side?
206
:Yeah, I think that's definitely a skill I've had to develop over the last few years.
207
:I've been very lucky to work with clubs where the coaches are very much...
208
:They want to be very data driven, they're very much kind of evidence based in their
decisions and that's been a really good thing because that's not guaranteed in the
209
:football world from stories from other people I've worked with.
210
:But yeah, in terms of kind of communicating, I think really a lot of it is giving the
coaches the tools to kind of both ask the questions that they want to know the answer to
211
:and also...
212
:Yeah, just kind of be aware of what you could do.
213
:Because I think the first few times I had meetings with coaches, I'd be like, what do you want
me to do?
214
:You know, we've got lots of data, and that would be it.
215
:And then as a coach who's kind of, you know, has a lot of football knowledge in their
head, but has never really engaged, or not engaged, but has never really, hasn't got the
216
:same stats experience, you're asking them to do, you know, you're essentially asking them
to plan a stats project, which is very much not their, not their skill set.
217
:you know, not the best use of their time.
218
:So I think a lot of it is kind of, yeah, kind of being able to very quickly come up with
demos of what you might want to show, like even if it's kind of, you know, you've
219
:horrendously hard-coded something in a notebook, just to kind of...
220
:show them, it's kind of, yeah, being able to kind of sit in the meetings that they do
pre-match and post-match with the players, that gives you really good insight into kind of
221
:the sort of things that they think are important, the sort of things that are coming up in
these briefings.
222
:And then that gives you kind of something to bounce off and go, okay, I've seen that
you've done this.
223
:Maybe I could do something related to this work you've done on their high press or
something like that.
224
:And then you can start building that relationship where you're bouncing off each other.
225
:It's extremely valuable.
226
:Basically what I try to do is...
227
:Yeah, it would be so easy to just like sit in a room for a year and build really cool
stuff on football.
228
:And I'd have a great time, but no one would want to use it.
229
:And to kind of remember the people who are, like, your consumers, essentially.
230
:Yeah, yeah, exactly.
231
:And that's also, that also helps you a lot to prioritize, right?
232
:Because otherwise, you're gonna do a model about anything.
233
:It's like, just basically, you know, kind of what you were telling the coaches, like, I
definitely could do the same thing where like, I go see the coach or the GM and I'm like,
234
:okay.
235
:What are you curious about?
236
:What do you want to know?
237
:I can build a model about anything, so just ask.
238
:But then, like, that's precisely the problem because you have to prioritize what's the
most useful to them because otherwise you can end up having an awesome model but if they
239
:don't use it, that's not very good, you know.
240
:So, yeah.
241
:Yeah.
242
:Yeah.
243
:And I think with kind of, yeah, I think with sports analytics being so new as well, it's
not the case that you've got like a set 10 things you want before each game and you're
244
:just kind of iterating on them and making those slightly better at predicting or whatever.
245
:Like you're generally, you're coming up with that list
246
:of 10 things that you want.
247
:And so, yeah, I think that makes it a really interesting time to work in sports at the minute,
but also a challenge because you've got to work out what's a good thing to be researching,
248
:like you say.
249
:Yeah, yeah, yeah, definitely.
250
:And also...
251
:It depends on the size of your data team at the club.
252
:Because, if you're just a few, handful of people, then you have to be very good at
prioritizing.
253
:Okay, we really want that right now.
254
:That stuff would be awesome, but that's later down the road because we need more time to
develop that or more people and we don't have more people.
255
:You know, if you're a huge team, like the big MLB teams here, they have 20, 30,
256
:research analysts so they can do so many things in parallel.
257
:It's like having a GPU basically, it makes your model run faster, right?
258
:It doesn't mean your model is going to be better, but you can try so many more things than
if you're just two, three, four people.
259
:But in this case, you have to prioritize more.
260
:so yeah, this is extremely, extremely important.
261
:I'm curious actually, how does that work for you with the Como team?
262
:because you were telling me before the show that actually the team is based in London.
263
:The club obviously is Italian.
264
:Obviously people know about Lake Como.
265
:If you don't, try and go there at some point.
266
:Maybe not in the summer, it's just too many people.
267
:But that's an amazing day.
268
:If you're in Milan, already Milan is a great city to visit, to live in.
269
:That's even better.
270
:I really like...
271
:Milan, and then you can take, you know, a day, you take the train, you go to Como and you just
enjoy it because that's amazing.
272
:But the team, so the team is there, but the data team is actually in London, am I right?
273
:Yeah, I remember when I first got offered this job, I shouted up the stairs to my wife,
"we're moving to Como", in this hugely excited voice, and then a few hours later discovered
274
:that that was not indeed the case, which is a bit of a shame, but obviously it's pretty
too and I get to go out every now and again.
275
:But yeah, so
276
:It's a pretty new team, so I joined kind of at the start of this year when it was just me
and one other guy kind of on the data side.
277
:And then we've been hiring this year.
278
:You know, we're always open to interesting people if anyone's listening to this and they
think, come work for Como.
279
:But yeah, so I think we now have, it's difficult to remember, I think we now have maybe
about...
280
:nine people, something like that.
281
:And a mixture of kind of, we've got a few people on the kind of data engineering side, who
are massively helpful, and I'm very grateful to have them.
282
:And then some people on the kind of data science data research side as well.
283
:So yeah, we're, it's a it's a really good team to be working in and kind of a very well,
we have a very good kind of like, I don't know what the word for it is, like back end,
284
:like
285
:data warehouse type thing.
286
:So compared to clubs I've worked for in the past where it's just been me and everything's
just stored in horrible local files and I'm scrambling around trying to make things work.
287
:Yeah, but that makes your life way easier, 100%.
288
:100%.
289
:I'm not sure I could live without Google BigQuery now.
290
:That's my favorite thing.
291
:It's the foundation of what you can do basically.
292
:If you don't have that, then you can hire data scientists, but...
293
:they're not going to be as efficient as they could be, because basically it's going to
be very hard to get the raw material, you know. So it's
294
:gonna sound very pretentious, but I think it's a good analogy.
295
:It's like contracting Michelangelo to make a sculpture, and then having him come to
where he has to do the sculpture with all the marble and so on, you know. Like, no, you want the
296
:marble to
297
:be all ready for him, and like that he can just do the work and boom.
Because other people are much better than him at sourcing the marble or
298
:something like that, right? So that's basically what you want, because otherwise,
without the good marble, you're gonna lose time, and Michelangelo costs a lot, I
299
:guess, you know. Yeah, 100%. And like, it's definitely something I've learned with this
being my first proper role, how important that stuff is.
300
:Yeah, yeah, no, for sure.
301
:Actually, I'm curious now.
302
:So first, yeah, that's awesome.
303
:Like nine people doing data stuff.
304
:That's really cool.
305
:And I'm pretty sure that makes you guys in the top 1%, I would say, of teams using data in
Europe, and not only in football, I guess in most professional sports in Europe.
306
:Because I don't know a lot of teams who are that
307
:developed on the engineering side.
308
:So for that, that's awesome.
309
:And second, I'm wondering then what does a typical day look like for you at work?
310
:And what are some of the key responsibilities in your role?
311
:Yeah, so I guess for me in particular, quite a lot of my focus in the last few months has
shifted towards player recruitment, which I guess is the side of football data
312
:analysis we haven't quite touched on yet.
313
:But particularly at the minute, the January transfer window, where you get to buy and
sell players, is coming up increasingly fast.
314
:You know, we're kind of submitting all of our lists of recommendations and then they have
to go through all of the phases of being scouted and...
315
:discovering whether or not we can actually afford them, discovering whether or not they
actually want to come, and all of those things.
316
:So yeah, so a lot of it will be, I don't know, there's models I'm working on at the minute
that go towards that, and so it will be kind of...
317
:Maybe we're adding a new feature to it, taking in some new data and then kind of
retraining those, seeing what effect they have on the list of players that we have, see
318
:kind of, you know, we're at the stage now where we've kind of got a relatively short list
of targets for each position.
319
:And so we're able to kind of do a bit more in depth analysis of those guys and see, okay,
what do we actually, what do we forecast if they join us?
320
:And then at the same time, kind of responding to...
321
:you know, as a team, we absorb quite a lot of ad hoc requests from the coaching staff.
322
:And so, yeah, being able to kind of respond to those, whether it's kind of, I don't know,
we've got a particular opponent coming up and we want to look at a particular thing about
323
:their style, or we need a particular report on like, maybe someone's played international
matches, and we need to kind of look at how they did physically in those games or
324
:something.
325
:So yeah, it can be a whole range of things.
326
:It's not an industry where you can very easily plan ahead.
327
:Like two weeks on Wednesday, I will spend the whole day doing this and nothing else.
328
:That's not really possible, but that keeps it fun.
329
:Yeah, yeah, for sure.
330
:Actually, a question I have for you, something I often hear, you know, like kind of a...
331
:widespread belief in football, is that the winter transfer window is not very useful and
nothing happens.
332
:Is that something you've seen in the data?
333
:Yeah, I think the lower down the pyramid you go, and that's kind of been where a lot of
my experience has been up till this year,
334
:clubs very much do use the winter transfer window.
335
:I mean, so where I started at Oxford City and we were semi-pro, we would sign players
actually all the way through the season.
336
:There weren't specific transfer windows.
337
:But even somewhere like Como, you still want to be on the lookout for players.
338
:It's particularly important for players coming to the end of their contract.
339
:So if you've got a deal that expires in the summer, when generally most player contracts
do end...
340
:then the January before that is kind of a good opportunity to pick up those players at a
reasonably cheap price.
341
:And so that's an important thing to look at.
342
:You've also got players who maybe joined a club and then aren't playing, or got replaced
towards the end of the summer transfer window and aren't playing.
343
:And so there are opportunities to pick up good players.
344
:The very good players will come with a premium in January though,
345
:because clubs don't want to sell them mid-season.
346
:Yeah, it's kind of like, yeah, that was my prior.
347
:That basically it's more useful for lower ranking clubs than big clubs, Yeah, exactly.
348
:And I mean, you've always...
349
:Yeah, like it depends kind of what your...
350
:People might cost more, but you might need them more in January as well.
351
:And so you've just got to kind of weigh up, like you've got a better idea of how your
season's going.
352
:you can make a better assessment of, if we sign such and such a player, maybe we expect...
353
:If your chance of relegation changes from 70% to 10%, then that's really worth it.
354
:And that's something you can make a slightly more accurate assessment of in January.
355
:Although that would maybe be quite a player who decreased your relegation chances by that
much.
356
:Yeah, yeah, I mean, that's often like that in football, right?
357
:The richest clubs...
358
:often have their way anyways.
359
:Like when they really need someone. Exactly, exactly.
360
:But yes, I'd be shocked with no data behind it.
361
:I'd be shocked if they didn't sign a few.
362
:Yeah, yeah, yeah.
363
:So something I'm curious about is whether you know of any common biases in your football
data?
364
:And how do you adjust for them in your analysis?
365
:Yeah, so
366
:I think there's quite a lot of biases in terms of, and particularly on the recruitment
side, in terms of using the kind of traditional stats that you use to assess a football
367
:game.
368
:So possibly the canonical example of this is something like a pass completion rate for a
player.
369
:You know, that's kind of long been used to kind of, if you're trying to play a passing
game, you want to bring in players with high pass completion rates, and that's not untrue.
370
:But it's also so dependent on the kind of difficulty of the passes, which they're
attempting.
371
:You know, a player who kind of plays in defensive midfield, taps the ball about a little
bit, will have a high pass completion rate.
372
:A player looking to play really aggressive balls will have a much lower one.
373
:And so one of the places that kind of, yeah, statistical modelling can help you there is
you can come up with an expected passes score.
374
:You know, I guess probably a lot of people listening are maybe familiar with expected goals,
and expected passes is just the same as that.
375
:It's for each pass, what do we think the probability of this being completed is?
376
:And as we increasingly get more data in football, our models for expected passes can
become increasingly good.
377
:And so kind of, you know, five, six years ago before you had tracking data and you just
had very granular event data of there was this player here and he tried this pass to this
378
:other player on the other side of the pitch.
379
:You can make a kind of model like
380
:If it's a long pass, if it's a forward pass, those things make the pass more difficult.
381
:But now, we have data on where every player is across the whole pitch, and increasingly
even what direction they were facing, and maybe where their elbow was, if you want to try
382
:and use that.
383
:And so that kind of lets you make a really unbiased assessment of how good a player is at
passing, now taking into account the kinds of passes that they're trying.
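As a minimal sketch of the expected passes idea described above, assume a logistic model in which pass length and forward direction lower the completion probability. The coefficients, the `Pass` structure and the sample passes are all invented for illustration; this is not the model used at any club mentioned in the episode. The hypothetical player below completes every short pass and none of the long ones, so the raw completion rate looks respectable while the expected-passes comparison does not.

```python
import math
from dataclasses import dataclass

# Hypothetical coefficients, chosen by hand for illustration only.
INTERCEPT = 4.0
PER_METRE = -0.06        # longer passes are harder
FORWARD_PENALTY = -0.8   # forward passes are harder


@dataclass
class Pass:
    length_m: float
    forward: bool
    completed: bool


def expected_completion(p: Pass) -> float:
    """Completion probability of a pass under the toy logistic model."""
    score = INTERCEPT + PER_METRE * p.length_m
    if p.forward:
        score += FORWARD_PENALTY
    return 1.0 / (1.0 + math.exp(-score))


def completions_above_expected(passes) -> float:
    """Actual completions minus the sum of expected completion probabilities."""
    return sum(p.completed - expected_completion(p) for p in passes)


# 20 short sideways passes (all completed) and 5 long forward passes (none completed).
passes = [Pass(8.0, False, True)] * 20 + [Pass(50.0, True, False)] * 5
raw_rate = sum(p.completed for p in passes) / len(passes)
cae = completions_above_expected(passes)

print(f"raw completion rate: {raw_rate:.2f}")
print(f"completions above expected: {cae:.2f}")
```

A negative "completions above expected" value means the player completed fewer passes than the toy model would predict for the passes attempted. With tracking data, the same structure would simply take more inputs (for example defender positions), which is an extension this sketch does not attempt.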
384
:And this is something that really early on, I think, is still one of my favorite football
385
:analytics stories, is when I was at Oxford City. We had one of these expected passes
models, and we were playing against the team who at the time had won, I think, 11 out of 11 in
386
:the league, the mighty Ebbsfleet United, and
387
:All of their team have really high pass completion rates, but we noticed that their
goalkeeper actually was massively underperforming his expected passes, because he was just
388
:playing really simple passes to the centre backs.
389
:Whenever he went long, he gave the ball away.
390
:And so the strategy for that match was, great, we should try and press the goalkeeper,
which we did.
391
:Kind of took away his passing options, that worked really well, to the point where the
goalkeeper actually came out at half time
392
:and practised kicking long, because he was being forced to and that was so out of his
comfort zone.
393
:And so yeah, that's kind of a really good example of looking beyond just the basic
statistics and using modeling to get some useful conclusions out and a point that I hope
394
:to actually.
395
:Yeah, yeah. That's awesome.
396
:I really like that example for sure.
397
:And actually, do you have another
398
:example of a particularly insightful finding that helped you assess player strengths and
weaknesses using data, which is something I guess you do a lot, especially for transfer
399
:recommendations?
400
:Yeah, definitely.
401
:Obviously, there's a limit to exactly what I can give away on a podcast, but I think I have an
example.
402
:Yeah, I think one really interesting study which I did while I was still at uni, and this
is kind of not quite so much on a player level, but it was assessing, kind of comparing
403
:men's and women's football.
404
:So I did a paper just before the Women's World Cup in 2023.
405
:And again, it was kind of a similar passing-based study, where we were looking at what are
the differences here.
406
:And by actually developing an expected threat model, we were able to see a really clear
difference.
407
:So what you can do, or the way that this model worked, is you kind of break the pitch down
into lots of little grids.
408
:You essentially have a kind of Markov chain of transitions between those grids.
409
:You work out the probability of, if you've got the ball in a particular space, what's the
probability you score in the next possession or the next 20 seconds, or however many
410
:you...
411
:want to segment your data.
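A minimal sketch of the grid-based expected threat idea just described, assuming a toy pitch with only three zones. Every probability below is invented, and the recursion shown (shot value plus the value of wherever the ball moves next) is one common way to set up such a Markov chain, not necessarily the formulation used in the paper discussed here.

```python
# Toy expected threat (xT) on a three-zone pitch; all numbers are invented.
ZONES = ["defence", "midfield", "attack"]
SHOOT = [0.00, 0.02, 0.20]   # probability of shooting from each zone
GOAL = [0.00, 0.03, 0.12]    # probability that a shot from the zone scores
MOVE = [0.90, 0.85, 0.60]    # probability of successfully moving the ball on
# Whatever is left (1 - SHOOT - MOVE) is losing the ball, which is worth 0 here.

# TRANSITION[i][j]: given a successful move from zone i, probability it ends in zone j.
TRANSITION = [
    [0.60, 0.35, 0.05],
    [0.25, 0.50, 0.25],
    [0.05, 0.35, 0.60],
]


def expected_threat(iterations=200):
    """Iterate xT(i) = shoot * goal + move * sum_j T[i][j] * xT(j) to convergence."""
    n = len(ZONES)
    xt = [0.0] * n
    for _ in range(iterations):
        xt = [
            SHOOT[i] * GOAL[i]
            + MOVE[i] * sum(TRANSITION[i][j] * xt[j] for j in range(n))
            for i in range(n)
        ]
    return xt


def pass_threat(xt, start, end):
    """Threat added by a completed pass: value at the end zone minus the start zone."""
    return xt[ZONES.index(end)] - xt[ZONES.index(start)]


xt = expected_threat()
for zone, value in zip(ZONES, xt):
    print(f"{zone}: {value:.4f}")
print(f"defence -> attack pass adds {pass_threat(xt, 'defence', 'attack'):.4f}")
```

Averaging `pass_threat` over all passes in a dataset gives the kind of "average expected threat of a pass" figure that the comparison relies on. A real model would use a much finer grid and transition probabilities estimated from event data.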
412
:And then...
413
:Yeah, by kind of comparing the start and end locations of all of the passes that players
were attempting, it really stood out that in the women's game, at least in the data that
414
:we had, all of the well not all of the passes, but the average expected threat of a pass
was much higher than in the men's game.
415
:And so it was really kind of, there were a lot more passes in the men's game, but like the
total threat generated was the same.
416
:And so you got this kind of, you were seeing this much more kind of dynamic, much more
attacking intent behind the play.
417
:Yeah, okay.
418
:That's super cool.
419
:And does it help that you can do the same analysis on
different teams from the same club,
420
:like the women's team, the academy, the men's first team, et cetera?
421
:Yeah, it's a really...
422
:I think it just means you can do a lot more as a data team, right?
423
:I think it is important to be aware of the differences between the different kind of, you
know, the women's game, the men's game, and the youth game, I guess, would be the three
424
:categories.
425
:But a lot of the models, if you write a model for the men's team, say, and then apply it
to the youth team, it will certainly at least still have some validity, even if the
426
:parameters that you've trained or the exact form that you've used isn't completely
optimal.
427
:And so that means that kind of
428
:when we write our post-match reports at Como for the first team, we're also able to
produce the same report for the under-19s, for the under-17s, and so we're able to kind
429
:of, not only kind of give the coaches more information about how their team might be doing
and how we can improve, which is obviously a hugely valuable thing, but also get the
430
:players themselves really involved in the data analysis process.
431
:So they're used to playing a game and then looking at the feedback from it, and that's
something you can build up as a culture in the club as players are coming through,
432
:that can then kind of really just become a really natural thing as players progress into
the first team.
433
:Rather than being in the under-19s, suddenly moving to the first team and then, oh no,
someone's showing you a whole load of graphs about how bad you were.
434
:It's something that you can, yeah, you can be a bit more realistic.
435
:yeah.
436
:No, for sure, and that makes also the players, you know, like, participate in the analysis
and like be part of the models and so on.
437
:I think that that's much...
438
:easier then to get their acceptance of what the model is saying afterwards.
439
:Yeah, exactly.
440
:Okay.
441
:Are there any statistical techniques or models that you find most useful in your work?
442
:How do you see them helping in decision making?
443
:Yeah.
444
:I'm a big fan of kind of relatively heuristic based parametric models.
445
:So if I'm approached with a problem in football, particularly in a kind of prediction
sense, I think kind of exploring the data and exploring kind of
446
:what you might expect to see from it and then coming up with a model with relatively small
numbers of parameters is something I find really helpful.
447
:So for example, a lot of statistics in football, because they're count statistics over a
sufficiently long amount of time, you'd expect them to be approximately normally
448
:distributed if they came over a season.
449
:And so you can bake that into your model already, which I think is a real advantage to
kind of...
450
:I don't know, compared to just training a completely black box model where it's got to
work out its confidence intervals or prediction intervals or credible intervals or
451
:whatever it purports to do at the end of it.
452
:Because you know that there should already be quite a lot of structure to your data.
453
:So one thing, we were building a model the other day that kind of, we were looking at the total
number of actions that a player performed, like just as a simple test case for something.
454
:We allowed it to be normally distributed with any variance.
455
:And then, lo and behold, after training the model, the variance parameter had a very tiny
confidence interval around the Poisson mean that you'd expect to see.
456
:It was just that the variance was equal to the mean.
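The "variance equal to the mean" result can be reproduced with a small simulation. This is a sketch under stated assumptions: the action counts are simulated from a Poisson distribution with a made-up rate of 50 actions per match, and the "fit" is a plain moment estimate of a normal distribution rather than the Bayesian model described in the conversation.

```python
import math
import random
import statistics


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(0)   # fixed seed so the run is repeatable
lam = 50.0               # hypothetical average number of actions per match
counts = [sample_poisson(rng, lam) for _ in range(20000)]

# Fit a normal distribution with a free variance by matching moments.
mean = statistics.fmean(counts)
variance = statistics.variance(counts)

print(f"fitted mean:     {mean:.2f}")
print(f"fitted variance: {variance:.2f}")
print(f"variance / mean: {variance / mean:.3f}")
```

For count data generated this way the ratio comes out close to 1, which is why fixing the variance to the mean up front (or using a Poisson likelihood directly) would save the extra parameter.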
457
:And that did make me feel slightly stupid because I was like, come on now, we maybe could
have seen that coming, could have saved ourselves a bit of compute there.
458
:But yeah, really taking advantage of the structure in the data and the kind of...
459
:Yeah, particularly when you're looking at averages, the normal things that you'd expect to
see.
460
:That can be really helpful.
461
:Hmm.
462
:Hmm.
463
:Okay.
464
:Yeah, that's interesting.
465
:And yeah, basically, these kinds of heuristics help you get some...
466
:some already good first results to then have a back and forth with the domain experts.
467
:Exactly.
468
:Exactly.
469
:it gives you something, it gives you a really good starting point for models.
470
:So if you're doing something like predicting...
471
:results, like, as much as it's not a perfect model, starting with both teams scoring a Poisson
distributed number of goals gives you a really sensible starting point where you
472
:can add different things in. And so maybe you kind of use a Poisson,
but you have some adjustment parameter to it. People use things like the Weibull
473
:distribution or something like that. And maybe you somehow introduce some
kind of dependence in your model, so that once the first goal is scored the parameters change
474
:or something, so that one team becomes more defensive.
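The baseline mentioned here, both teams scoring an independent Poisson-distributed number of goals, can be sketched in a few lines. The scoring rates of 1.6 and 1.1 goals per match are invented, and the truncation at 10 goals per team is a convenience assumption; none of the adjustments discussed (Weibull-style corrections, state dependence after the first goal) are included.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when goals follow a Poisson(lam) distribution."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def match_outcome_probs(home_rate, away_rate, max_goals=10):
    """Home win, draw and away win probabilities under independent Poisson scoring."""
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


# Hypothetical scoring rates, for illustration only.
home_win, draw, away_win = match_outcome_probs(home_rate=1.6, away_rate=1.1)
print(f"home win {home_win:.3f}, draw {draw:.3f}, away win {away_win:.3f}")
```

Because each rate has a direct meaning (expected goals for that team), any later refinement can be compared against this baseline one step at a time.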
475
:And yeah, like, but having that kind of solid baseline not only gives you a good baseline
to compare your other models against, but also allows you to do this kind of slow
476
:iterative process that gets you towards a kind of, you know, your model at the end might
be quite complicated, but because it started from a sensible point, you kind of know what
477
:all the steps were.
478
:And it also makes things really explainable, like that's another real advantage of these
parametric models.
479
:If every one of your parameters has some meaning in your head, then you're going to be able to
really easily explain that to the coaches.
480
:Whereas I'm not sure they're going to love looking at a SHAP plot or something like that
from your set of:
481
:Yeah.
482
:Yeah, I know for sure.
483
:And that's why I think it ties back to what we're saying where knowing how they are going
to use the model is extremely important because then you can just...
484
:compute back. Instead of showing them the samples of your model, you just compute back
to the metric they are interested in.
485
:That gives you a lot of buy-in and also a lot of very useful back and forth because
usually they have an extremely deep domain knowledge that you don't have.
486
:And that's absolutely super valuable for you as the modeler.
487
:Exactly, and it gives them the opportunity to change some of the parameters as well.
488
:Like if all of your inputs are things that they understand, then yeah, maybe when they're
recruiting for a particular role, it's not just a winger, it's a like inverted winger that
489
:they're looking for, then they'll be able to know, this is actually more important, this
is a bit less important, I can tweak those on whatever app that you've coded for me, and
490
:then yeah, see how that affects the list of players that comes up.
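The "tweak the weights and see how the list changes" workflow can be sketched as a weighted score over interpretable metrics. The player names, metric values and weights below are all made up, and a real tool would sit behind an app rather than a script.

```python
# Hypothetical, already-normalised metrics (0 to 1) for two made-up players.
PLAYERS = {
    "Player A": {"dribbles": 0.9, "crosses": 0.3, "shots": 0.5},
    "Player B": {"dribbles": 0.4, "crosses": 0.9, "shots": 0.3},
}


def rank_players(players, weights):
    """Rank players by the weighted sum of their metrics, best first."""
    scores = {
        name: sum(weights[metric] * value for metric, value in metrics.items())
        for name, metrics in players.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# A coach looking for a traditional winger might weight crossing most heavily...
traditional = {"dribbles": 0.3, "crosses": 0.6, "shots": 0.1}
# ...while one looking for an inverted winger might favour shooting and dribbling.
inverted = {"dribbles": 0.4, "crosses": 0.1, "shots": 0.5}

traditional_ranking = rank_players(PLAYERS, traditional)
inverted_ranking = rank_players(PLAYERS, inverted)
print("traditional winger:", traditional_ranking)
print("inverted winger:   ", inverted_ranking)
```

Because every input is something a coach already understands, changing a weight has a visible and explainable effect on the shortlist.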
491
:Right, right.
492
:Yeah, very good point.
493
:Yeah, super cool.
494
:And something else I'm wondering is, you touched a bit on that already, but data is
becoming more and more available in football in particular, especially tracking data.
495
:How has that influenced your work?
496
:And what new insight were you already able to provide?
497
:Yeah, so
498
:Tracking data is a really exciting data source.
499
:The fact that it's so available now is just mental.
500
:It still blows my mind that every weekend we get like XY tracking data from hundreds of
matches across Europe and the world in fact.
501
:I think we get it from the States at least.
502
:Yeah, it seems like the possibilities from that are endless, right?
503
:Like, you've got enough data, modulo the noise that's there, and it is relatively
substantial, but you've got enough data to completely understand what's going on on the
504
:football pitch. And so you can start to do things like I was saying earlier about the
expected passes model. You can start to really accurately say, okay, in this
505
:scenario, you know, these were your options, and this guy you shouldn't have passed to, or,
you know, you could have looked for this other option.
506
:I think kind of, you know, ultimately, the aim of a lot of models, and they're kind of pieces
in this very long journey, is to essentially come up with a system that can play football,
507
:right, that understands the game and that is able to go, okay, from this position, if I
was all 11 players, I would move like this, and I would pass like this.
508
:And this would kind of, yeah, increase my chances of scoring a goal by 5%, or what have
you.
509
:So kind of almost building, you know, like a Stockfish, but for football, which...
510
:It's an incredibly complicated thing to do because your players can no longer just move
two steps forward or one step to the left.
511
:But yeah, like, I think it's the first stage.
512
:The data is there now to be able to build those models.
513
:And I do see that a lot of our projects that we're working
towards are just,
514
:ultimately, you could construe them as just small pieces on this journey. Yeah, I
think particularly the data that's coming out in football now as well, with not just player
515
:positions, but you get like 20 coordinates per player, you get kind of where their hands and
their arms and their knees are. Like, exactly how you use that data,
516
:I'm not entirely sure, but it's certainly an exciting thing to have. Yeah,
yeah, no, definitely. And how do you see the
517
:field and the role of analysis in football evolving in the next few years?
518
:Yeah, I think, I mean, the very boring answer is that I think it will just continue to
become increasingly important.
519
:I think we're starting to see that we've got a generation of coaches coming through who
have essentially had this data available to them throughout their entire coaching
520
:career.
521
:and are going to be much more willing to use it, are going to see it as a much higher
priority, invest more in their data teams, and kind of, yeah, begin to use it for a lot
522
:more of their decision making.
523
:And I think in terms of kind of actual things that might tangibly change, I don't think it
will be too long before we start seeing, and in some sense, you could argue that teams
524
:like Man City are already doing this, we start seeing strategies being employed that
have purely been developed from the data.
525
:And so rather than kind of like I was saying earlier, using the data to decide between
three possible lineups or something or three possible formations, you could play, like
526
:actually starting to build models to, you know, say, oh, well, maybe if we get our players
to kind of be in this formation, and then morph into this one, like when a certain thing
527
:happens, then that's going to be really effective.
528
:And so I think, you know, like,
529
:already now if you compare how football's played to 40 years ago, it looks very different.
530
:And I think the speed of that change is only going to be increased by data as teams kind
of come up with increasingly good ways to play against each other and you know, the
531
:optimal strategy shifts and shifts and shifts.
532
:And I think also, like, player recruitment from kind of, you know, countries that don't
have big leagues is also going to increase. You already look at
533
:Brighton and Brentford, the two canonical examples of that, and how well they're doing.
534
:It's something that every recruitment team around the world and every kind of consultant
is trying to do as well.
535
:But I think people are going to get increasingly good at that.
536
:And so I imagine you'll start seeing players move earlier.
537
:And because after 500 minutes with tracking data, you're able to assess that they're going
to be good enough to play in the Premier League.
538
:And also kind of move to a wider range of clubs.
539
:It won't just be Brighton who sign 20 youngsters from South America who are somehow
amazing.
540
:It'll be kind of 20 Premier League clubs each finding one youngster and then fighting over
them.
541
:Yeah, yeah. Yeah, great point.
542
:And so, like, with that in mind, what advice would you give to...
543
:aspiring football analysts or data scientists who want to break into the sports analytics
industry?
544
:Yeah, I guess the main thing I would say to them is to kind of be like take the time to be
curious.
545
:And there's quite a lot of good open source data out there.
546
:So StatsBomb have an open data repository.
547
:They're one of the big football data providers.
548
:If you just
549
:Google StatsBomb open data, I'm sure it will come up.
550
:And SkillCorner, who are one of the big tracking providers, have, I think, seven or eight
example games out there, which is enough to kind of get you started playing with that.
551
:And I think just kind of take the time to have fun with that data, look at it, try and
come up with some interesting models, maybe take half a season's worth of data and try and
552
:predict a certain thing in the second half.
553
:And because when clubs are recruiting analysts, like the top thing they want to see is
kind of that
554
:portfolio, that you can do things and kind of, yeah, not only have you got the interest,
but you've also got the skill to be able to produce really insightful things.
555
:And so yeah, that's definitely something that's worth doing kind of even if it's not
connected to a club at first, even if it's just kind of in your own time.
556
:Yeah, that is great for building skills.
557
:And then secondly, is to try and get involved in a club like I'm very, very grateful to
558
:Oxford City for kind of taking me on when I didn't really know anything and those first
couple of years of getting experience were really really important and they meant that
559
:when I went to slightly bigger clubs who, if I went and I was rubbish, would have just sacked me,
560
:When I went to those clubs I at least kind of roughly knew what I was talking about and
was able to do some vaguely useful things.
561
:So yeah kind of reaching out even if they're a club that doesn't have much data in some
sense that makes you be more creative like
562
:we spent a bit of time investigating can we build a recruitment model based on Twitter at
Oxford City, because in the seventh tier, that's basically all you've got.
563
:And it didn't really go anywhere in the end, sadly, but I'm sure someone could do a better
job than me.
564
:And so yeah, it gives you that chance to be creative and helps you also have a real impact
on the team.
565
:Yeah, yeah, yeah, love that.
566
:The obstacle is the way, right? Exactly.
567
:Yeah, I love that.
568
:That's great.
569
:Great advice.
570
:And yeah, I completely second everything you just said.
571
:And also starting being involved in some open source work is definitely a good way to get
there for sure.
572
:Like trying to identify things people are interested in and you are interested in and are
573
:able to do, because, you know, you have to do it even if nobody is paying you to do that.
You won't do that all the time, you know, all your life, but yeah, like, if you have some time,
574
:especially if you're in your 20s, that's when you want to experiment and do that kind of
stuff. So yeah, definitely try and do that. And as you were saying, trying to find some
575
:mentors is something extremely important, but there is a skill in doing that too, right? Like,
576
:Don't go and ask people, hey, can you be my mentor?
577
:It's usually better if you have something to offer in return.
578
:If they have a pain point and you can help them with that, then in exchange they'll be
your mentor.
579
:Or if you contribute to their open-source software, or something like that, they'll be much more
open to mentoring you.
580
:And personally, I'm curious what your future projects or research areas are for the coming
months that you are excited about.
581
:Yeah, so I think kind of one thing that I'm really excited about looking towards working
on is, you know, kind of again, it's a step in the journey towards this kind of chess
582
:computer dream for football.
583
:is coming up with a really good way of assessing kind of what's the probability of you
scoring a goal in the next 20 seconds from being able to use all of the contextual data
584
:you've got in tracking data.
585
:And so like I said, like I was talking about earlier, you can make a kind of simple
expected threat model and you can kind of do a few more things along that line.
586
:I've read papers where kind of they've built
587
:graph neural networks to predict the next event in a set of event data.
588
:And you can start doing that and you can kind of chain your predictions together.
589
:And do things with the discrete data, but really moving that on to the tracking
data, I think that's something that would unlock quite a lot more analysis. You
590
:know, one of the big issues with the event data is it's very much focused on the ball, and you
don't get to see the case where someone makes a great run that kind of opens up the chance
591
:for a goal. You just see the player who had the ball being able to dribble into this
miraculously free space and then take their shot. And so,
592
:Yeah, doing that to be able to really start to get some more understanding on how the
whole football team contributes to creating a chance, both offensively and defensively as
593
:well.
594
:And then I think also just, you know, just continuing kind of working with coaching staff,
just getting better at kind of, you know, the match analysis side of things, like learning
595
:how to...
596
:how to really characterize opposition teams in a way that's helpful.
597
:and just kind of, yeah, it's very much a learning experience.
598
:I still feel like I'm in the stage where, oh, certainly if you put me in charge of a
football team and asked me to actually design a set of tactics, I would be so useless.
599
:I'd be like, yeah, 4-4-2, let's just lump it up to the big guy.
600
:And so kind of, yeah, continuing on that journey of learning, properly learning football
is something I'm really excited to do.
601
:So I think we can call it a show. Like, I have so many more questions still, but
I've already taken quite a lot of time from you.
602
:Thank you so much, Matt.
603
:As usual though, I'll ask you the last two questions I ask every guest at the end of the
show.
604
:First one, if you had unlimited time and resources, which problem would you try to solve?
605
:Yeah, I guess this kind of goes back to the nerdy 15 year old guy reading about unsolved
math problems that I was.
606
:But I would love to really get my teeth into one of these essentially impossible problems.
607
:I think the Collatz conjecture has always really stood out to me as something that you just
feel like you should be able to solve, but at the same time it's just completely
608
:impossible and I have no idea where to start.
609
:And so I think in an alternate life I can see myself locking myself in a room and working
on it for a year and probably getting nowhere but having quite a lot of fun along the way.
610
:But yes, I feel I should maybe have said something more useful like that would have an
impact on the world, but...
611
:No, it's fine.
612
:Yeah, so I spent a long time thinking about this question.
613
:I think actually what I'm always amazed by is kind of inventions that...
614
:could have been invented quite a long time before.
615
:So something like the smartphone, right?
616
:Like, an ancient Egyptian is never going to be able to invent the smartphone because they need
so many different things and so many components to get there.
617
:So we got the hot air balloon, however.
618
:And so this is my answer.
619
:It's the Montgolfier brothers who invented the hot air balloon.
620
:All you need is a way of making air hot and a balloon to hold it in.
621
:And then you can fly, which is so cool.
622
:And something that like humans surely have always wanted to do.
623
:And so, yeah, like I've
624
:that's an idea that could have been invented thousands of years before and so being able
to sit down with those kind of guys and be like what's your second best idea and let me
625
:have it so that I can invent something that wouldn't have been invented for another 3,000
years still I think that
626
:to great to pick that for AIDS.
627
:I'm glad I got a unicorn.
628
:Excellent, we'll fly over in our hot air balloon and then we can meet you in the States.
629
:yeah, that's great.
630
:So, hot air balloons: if you ever go to my native region, which is the Loire Valley, so in
the center of France, which is the region where the castles
631
:from the Middle Ages and the Renaissance are,
632
:Definitely recommend doing a hot air balloon flight.
633
:They usually start very, very early in the morning, something like 4 or 5 a.m.
634
:But it's absolutely amazing because you get to see the sunrise and then you have like that
gorgeous view of the castles along the Loire, which is the river.
635
:Yeah, that's absolutely incredible.
636
:I haven't done it myself.
637
:I speak like I have.
638
:But I watched a lot of videos of my parents because that's a present.
639
:That's the gift I made to my father for his last birthday.
640
:And yeah, this looks absolutely great.
641
:Definitely recommend it.
642
:It's pricey, but it's really worth it.
643
:I hope you enjoyed it.
644
:I'll put a link to your website in the show notes for those who want to dig deeper.
645
:Thank you again, Matt, for taking the time and being on this show.
646
:This has been another episode of Learning Bayesian Statistics.
647
:Be sure to rate, review, and follow the show on your favorite podcatcher, and visit
learnbayestats.com for more resources about today's topics, as well as access to more
648
:episodes to help you reach a true Bayesian state of mind.
649
:That's learnbayestats.com.
650
:Our theme music is Good Bayesian by Baba Brinkman, feat. MC Lars and Mega Ran.
651
:Check out his awesome work at bababrinkman.com.
652
:I'm your host,
653
:Alex Andorra.
654
:You can follow me on Twitter at alex underscore andorra, like the country.
655
:You can support the show and unlock exclusive benefits by visiting Patreon.com slash
LearnBayesStats.
656
:Thank you so much for listening and for your support.
657
:You're truly a good Bayesian.
658
:Change your predictions after taking information in.
659
:And if you're thinking I'll be less than amazing.
660
:Let's adjust those expectations.
661
:Let me show you how to be a good Bayesian. Change calculations after taking fresh data in.
Those predictions that your brain is making, let's get them on a solid foundation.