#131 Decision-Making Under High Uncertainty, with Luke Bornn
Sports Analytics • Episode 131 • 30th April 2025 • Learning Bayesian Statistics • Alexandre Andorra
Duration: 01:31:45


Shownotes

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!

Visit our Patreon page to unlock exclusive Bayesian swag ;)

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary, Blake Walters, Jonathan Morgan, Francesco Madrisotti, Ivy Huang, Gary Clarke, Robert Flannery, Rasmus Hindström, Stefan, Corey Abshire, Mike Loncaric, David McCormick, Ronald Legere, Sergio Dolia, Michael Cao, Yiğit Aşık and Suyog Chandramouli.

Takeaways:

  • Player tracking data revolutionized sports analytics.
  • Decision-making in sports involves managing uncertainty and budget constraints.
  • Luke emphasizes the importance of portfolio optimization in team management.
  • Clubs with high budgets can afford inefficiencies in player acquisition.
  • Statistical methods provide a probabilistic approach to player value.
  • Removing human bias is crucial in sports decision-making.
  • Understanding player performance distributions aids in contract decisions.
  • The goal is to maximize performance value per dollar spent.
  • Model validation in sports requires focusing on edge cases.
  • Generative models help account for uncertainty in player performance.
  • Computational efficiency is key in handling large datasets.
  • A diverse skill set enhances problem-solving in sports analytics.
  • Broader knowledge in data science leads to innovative solutions. 
  • Integrating software engineering with statistics is crucial in sports analytics.
  • Model validation often requires more work than model fitting itself.
  • Understanding the context of data is essential for accurate predictions.
  • Continuous learning and adaptation are essential in analytics.

Chapters:

11:58 Transition from Academia to Sports Analytics

20:44 Evolution of Sports Analytics and Data Sources

23:53 Modeling Uncertainty in Decision Making

32:05 The Role of Statistical Models in Player Evaluation

39:20 Generative Models and Bayesian Framework in Sports

46:54 Hacking Bayesian Models for Better Performance

49:55 Understanding Computational Challenges in Bayesian Inference

52:44 Exploring Different Approaches to Model Fitting

56:30 Building a Comprehensive Statistical Toolbox

01:00:37 The Importance of Data Management in Modeling

01:03:21 Iterative Model Validation and Diagnostics

01:06:53 Uncovering Insights from Sports Data

01:16:47 Emerging Trends in Sports Analytics

01:21:30 Future Directions and Personal Aspirations

Links from the show:

Transcript

This is an automatic transcript and may therefore contain errors. Please get in touch if you're willing to correct them.

Alex: Today, I am excited to host Luke Bornn, a true pioneer in sports analytics and Bayesian statistics. Luke started his career as a statistics professor at Harvard and Simon Fraser before pivoting almost entirely to sports analytics over a decade ago. Luke has worked across multiple sports roles, including quantitative gambling, analytics leadership at AS Roma and the Sacramento Kings, and co-founding Zelus Analytics, which grew to a 75-plus-person company before being acquired by Teamworks. He was also part of the ownership group at Toulouse FC, where he's applied data-driven decision-making to build a competitive club on a budget, winning both Ligue 2 and the Coupe de France.

In this episode, Luke takes us through the evolution of player tracking data and the application of Bayesian methods in decision-making under uncertainty. We dive into portfolio optimization for player acquisition, the challenges of model validation, and the role of generative models in forecasting performance. Whether you're into sports analytics or just fascinated by decision-making in high-stakes environments, this episode is packed with insight from one of the best in the field. This is Learning Bayesian Statistics, episode 131, recorded December 10, 2024.

Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the projects, and the people who make it possible. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. For any info about the show, learnbayesstats.com is the place to be: show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on Patreon, everything is in there. That's learnbayesstats.com. If you're interested in one-on-one mentorship, online courses, or statistical consulting, feel free to reach out and book a call at topmate.io/alex_andorra. See you around, folks, and best Bayesian wishes to you all. And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can help bring them to life. Check us out at pymc-labs.com.

Alex: Luke Bornn, welcome to Learning Bayesian Statistics.

Luke: Yeah, thanks for having me on.

Alex: Thank you so much for taking the time. This is a blast to have you here. I never thought I would be able to interview you on the show, to be honest. You're kind of a superstar in my world, you know, so I'm very happy to have you here. And thank you so much to Patrick Ward for putting us in contact. Patrick, if you're listening, thank you so much; when you come to Miami, I definitely owe you a good dinner.

Luke: You know, anything Patrick asks, I do. As prep for this, I went back and listened to a bunch of the previous episodes, and it was really cool to see that your podcast in some sense resembles my career. In the recent episodes you have a lot of people that I call colleagues and friends now, Robbie and Patrick and Paul Sabin, and then I look farther back and it's, like, Julien Cornebise and I worked together in North Carolina, back when he was doing postdoctoral work and I was at UBC; we actually wrote a short paper together. Andy Gelman and I shared an office when I started at Harvard. Nicolas Chopin and I have had many dinners together over the years. Kevin Murphy I know well from my time at UBC. So it was really cool; it really blends the two phases of my career, first as an academic and now in sports.

Alex: Oh yeah, okay, that's good to know. Yeah, for sure. I didn't know you knew Nicolas and Julien so well. And I remember Andrew Gelman, which must have been really cool. Honestly, working in the same hallway as Andrew: each time I get to see him it's really cool, and it's just awesome because he always has so many stories and I can ask him any questions I have. So working with him must be incredible.

Luke: Yeah, and we shared an office my first semester at Harvard. I was actually on sabbatical, kind of a parental leave, because I had just had a kid, and I was going back and forth between Vancouver and Boston; we were in the process of moving. So yeah, I shared an office with him. It was a really cool experience.

Alex: Yeah, I mean, I guess. And actually, something Patrick told me while preparing for the episode. He said: you can ask Luke anything about sports, he knows a lot about that and about stats. But don't ask him anything about guitar playing, because I'm way better than him. Something like that. Do you know what he's referring to?

Luke: Yeah, so I used to play guitar quite a lot. For those watching this on video, you'll see the guitars in the background. But Patrick is actually a properly trained musician. Actually, I have an interesting story about Patrick and guitar. I was at the Sloan Sports Analytics Conference, this is probably five, six years ago, and we were upstairs at Trillium Brewing, which is a nice restaurant and brew pub near the conference. I was sitting at the corner of the bar, and on one side of me was Patrick, who works for the Seattle Seahawks (I think he was on your podcast a couple months ago or something), and Sunny Mehta was on the other side of me; he's an assistant general manager for the Florida Panthers, an incredibly interesting guy. Just two incredible guys on both sides of me. And at some point, late into the evening, it occurred to me that these guys had just met and had this incredible unknown connection, which is that Patrick actually studied jazz guitar at Berklee in Boston, so he formally studied jazz guitar, and Sunny was actually a professional jazz guitarist in New Orleans for years, in addition to being a professional poker player and all sorts of other cool stuff. And so I said to these guys: whoa, you guys are both leaders in sports analytics, but you have this really bizarre connection that neither of you realizes, and I do. Let's see how long it takes you to figure it out. So they started on, you know, academic stuff. It took them probably a good 20, 30 minutes before they realized, and then I couldn't keep them apart for the next four hours as they compared teachers and the musicians they played with and so on. There's a long history, of course, of math and music being tightly tied together, but that was a cool example of two of my favorite people, Sunny and Patrick, coming together over a musical interest.

Alex: Yeah, this is really cool. For those watching the video, it's the first time this has happened: I'm in a new flat now and I have a painting behind me. There is a face in that painting, and basically the iPhone is freaking out because it's seeing two faces, so it doesn't know if it should focus on me or on the painting.

Luke: It's literally going back and forth between the weird painting and you.

Alex: So I think I should take that personally, right? Because I look nothing like the painting, and the painting is pretty awful, so I'm taking it quite personally. I'm selling my iPhone after the episode. I'm going to get rid of that painting, actually, while you're talking; it's way too distracting for people. So anyways, yeah, great, great story. I love that. And as you were saying, of course, there's a deep history of music being actually very mathematical, and of mathematics also being very creative and artistic in a way. We've talked about that already on the show.

Luke: Yeah, at Zelus, which we'll talk about in a bit, there's a very active music culture, a lot of people who play instruments and are hooked on music. One of them is Daniel Lee, who you've also had on the podcast. Really cool guy, and really super passionate about music as well. I think it's a pretty common connection.

Alex: Yeah, yeah, yeah. So, as you saw, I could not get rid of the painting; it's really screwed into the wall. So we'll have to live with it. That's going to be a collector's episode; I encourage you to check out the YouTube video of this one. Yeah, I mean, Daniel is actually incredible. Each time I come to New York and get to meet him, he gives me tons of recommendations, you know, of stuff to do in New York, very artistic. He's also very well versed in architecture, which is something he and I have in common. He knows architecture really well, and that's really cool because he always gives me amazing things to go see in New York. So yeah, thank you so much, Daniel, for enlightening me all the time. Did you get a private concert when you were at the Sloan Conference, from Patrick and, I don't remember the other name?

Luke: From Sonny, no. Patrick, though, has in the past sent me videos of himself playing guitar, and he's a much better guitarist than I am, so I have never reciprocated.

Alex: What's a talent of yours, actually? You know, like, something... I mean, you're obviously talented in your job and everything related to it, but a non-professional talent, one you could send a video of to Patrick.

Luke: Oh man, that's a good question.

Alex: Yeah, and I know I'm putting you on the spot here, because it's the first time I've asked this question, and it was not in the questions I sent you. I'm trying to get at some obscure talent, you know.

Luke: These days my whole life is job and family. So I'm extremely good at losing to my kids at Mario Kart, and horrendously bad at putting together IKEA furniture. So maybe it's: good at being bad at child-related things. So, yeah.

Alex: I mean, I don't know if you're already at the point where you're not losing on purpose anymore at Mario Kart against your kids.

Luke: I am at that point; my kids are a little older now, they're eight, ten, and twelve. And certainly against the older two at, you know, Super Smash Bros., I have no chance. I just mash buttons and hope I get lucky. So yeah, I'm well past that point. They're definitely better gamers than me.

Alex: Yeah. I mean, they probably don't listen to the show. So if you tell them you lose on purpose, I think you can still get away with it for a few years.

Luke: "I lost that one on purpose."

Alex: Exactly. "I made you win."

Alex: Actually, yeah, so can you tell us what you're doing nowadays? Because you do so many things, very original ones. What do you tell people when they ask you what you're doing nowadays? And also, how did you end up doing what you do and working on this?

Luke: Yeah, it depends on how long of a conversation I want to have with someone. If I want to end it quickly, I say I'm a statistician, and it ends the conversation right there. For longer conversations, you know, I'm really fortunate that I have a one-word description of what I do, which is Moneyball. Certainly for people outside of the field, when they ask what I do: yeah, I'm the Moneyball guy, and that's a really easy way to describe it. But if I give a longer background: I did a PhD in stats and machine learning under Arnaud Doucet and Jim Zidek at UBC. Then I went on to spend some time on the tenure track at Harvard and then Simon Fraser. Through that time I made a transition into sports (we can talk about that transition later if you want), but I pivoted over to sports analytics, and since then I have worked for a bunch of NBA teams: spent a few years with the Sacramento Kings, some consulting gigs with others, spent a little over a year with AS Roma in Italy, a gig in quantitative gambling. And then, over the last four or five years, with some partners we banded together alongside some investors and bought Toulouse FC in the south of France, and another club more recently. Alongside that, I co-founded a company called Zelus Analytics, which we built up to about 75 employees, and we actually sold Zelus to Teamworks back in the summer, so that sort of just came full circle. So now I continue to work a little bit with Teamworks, with the old Zelus crew, and I continue to be part of the operations of Toulouse.

Alex: Hmm, yeah. Okay, so Toulouse and not Milan.

Luke: Not terribly involved in Milan at the moment, that's right.

Alex: Yeah. Because you still cannot be involved in both at the same time, right? Because of the...

Luke: Yeah, it's complicated. It's complicated with UEFA.

Alex: Yeah. Okay. That's just such a great path. And we'll definitely dive into these latest activities you were talking about, because I'm really interested in how the kind of models we talk about on this show all the time, and that I personally build in my everyday life, how they are used, how they are consumed by the people I make the models for, you know. And that's something we have...

Luke: I get asked to do interviews and podcasts quite regularly, but it's almost always purely about sports, and I turn them down. This was a chance to be nerdy again and talk about, you know, technical things that I'm super interested in. So, yeah, cool.

Alex: Well then, thank you. Thank you so much, that's an honor. And I think also, if people are interested in a bit more about your background and things like that, I would refer them to the Wharton Moneyball podcast, I think it is, where you were a few months ago. I listened to that one to prepare for the show, so I'm not going to ask you the same questions. That interview is really great, because you go into detail on how you ended up doing what you did, your time at Sacramento, your time at AS Roma. It's a great interview, and I'll put it in the show notes for people who want a bit more detail on your background. Today, let's talk about Bayesian stats a bit, you know: how were you introduced to this weird world of Bayes?

Luke: When I got married and my brothers gave a speech, part of the speech was... they said, he's a proud Bayesian. I just remember them pronouncing the word Bayesian really strangely. So it's part of my identity when it's part of a wedding speech. But you know, I did my PhD at UBC, and at the time I was there it was this incredible hotbed of people in this domain. I did my PhD with Arnaud Doucet, and at the time there was him; Jim Zidek, another co-supervisor of mine; Raphael Gottardo; Paul Gustafson; Nando de Freitas; Kevin Murphy; and many others. At the time, these people were kind of all I knew, so I didn't realize how special that situation was. But I got to spend essentially six years of my life surrounded by those people, and it deeply ingrained in me a Bayesian way of thinking about the world.

Alex: Oh yes, so that was quite fast, basically, and that's the way you learned statistics, if I understand correctly.

Luke: Yeah. I did an undergrad in mathematics, and then my master's and PhD were both in stats. My PhD thesis was divided between spatial stats and Monte Carlo methods. And if you put those two things together, Bayesian spatial statistics combined with Monte Carlo, it covered the whole gamut of Bayesian modeling techniques. And of course, with Kevin Murphy and Nando de Freitas there, there was also a lot of exposure to what Kevin might call probabilistic machine learning, or statistical machine learning.

Alex: Yeah, okay. That's funny, because it's a bit like me: I didn't have to unlearn all the stuff I had learned very confusingly in undergrad; I basically learned Bayesian stats from the start. That kind of makes it simpler on the teacher, I would say, as a teacher now. It's way easier to teach people like us than to teach people who learned a lot of frequentist stats and are now trying to switch, because naturally they always try to come back to something that's familiar, and sometimes it's very different. So you're like: okay, try to forget that paradigm. That's hard.

Luke: UBC was interesting because you had this Bayesian cohort, but then you also had people working on robust estimators, M-estimators and tau-estimators, all these very frequentist ideas. So you were exposed to both worlds. I think that was actually quite useful, because it helps you really understand the philosophical differences, as well as the pros and cons of different ways of thinking about things.

Alex: Yeah, yeah. I think that's definitely super interesting and super worth it. And, as always, I'd say for people interested in the epistemological side of things, the two best episodes to start with are episodes 50 and 51. Episode 50 is with David Spiegelhalter, the only sir we've had on the podcast so far (maybe we'll get Sir Alex Ferguson one day), and episode 51 is with Aubrey Clayton, the author of the book Bernoulli's Fallacy. If you want to start with some epistemological topics, I would definitely recommend those; just go on the website and look for them. But in my experience, I only give that material to students when they ask for it. The way it usually works is that people are mainly interested in the practical side of things, and unless they are very nerdy like me and want to know why it's actually interesting, and why it works better in these cases... if they're really interested in the why, then I'll give it to them very happily, because I love that. But in my experience, and especially with my bosses and others, I just show the model and why it's interesting and why it would help their decision-making.

And you personally, as we've seen, have been involved in sports analytics for years. So I'm curious about so many things. First, what's your vision of how the field has evolved, especially with the rise of different sources of data in different sports? And another question I have for you as a decision maker: how do you approach modeling uncertainty in your decision-making, whether that's with Toulouse or, well, Milan less so now, the different things and projects you're working on, and whether we're talking about evaluating players or planning strategies?

Luke: Yeah. So, the first thing is about sports analytics. When I was getting into sports analytics, it was the early days of player tracking data. Prior to that, the vast majority of the data had been event data: hand-tracked, maybe a couple hundred or a couple thousand events per game, and that was true across sports. Baseball may be the exception there, but for most sports it was very simple data, count data, that kind of thing. And around 2012 I was really fortunate: one of the first people I met when I started at Harvard was a guy named Kirk Goldsberry, and he had just been handed the NBA's player tracking data. This data captures, multiple times per second, the location on the court of every player and the ball, in three dimensions. And when I looked at this (I'm actually not that big of a sports fan), to me it was the richest space-time data I'd ever seen. And the amount of structure that exists in this data because of the sport itself, both the rules and the tactics and strategy... it was just the most fascinating and challenging problem I had ever come across. So my path into sports was not "I want to work in sports" or "sports is really cool"; it was: this is really interesting and hard and unsolved. Keep in mind, my PhD was essentially on modeling things in space-time, along with SMC methods and so on to handle these things computationally efficiently. So I had the right toolkit for this type of data. That's how I made the transition from academia to sports: fundamentally, new data sets that are really complex and spatially and temporally rich, having the right skill set, and realizing there are lots of interesting problems. After a while I was consulting in sports while continuing my academic gigs, and I just found that I enjoyed the sports stuff more. The problems were more interesting. I never enjoyed going to committee meetings and reviewing papers and revising papers; the peer review process in academia is fundamentally broken. So it was an easy decision when I decided to go full time into sports. So anyway, that's the sports piece, how I pivoted to sports.

The second part of your question is about handling uncertainty in decision-making. I really think about that as the end user. We want to try to construct a team that's going to outperform its budget. Let me use Toulouse as an example. Toulouse has a payroll of about, you know, 15, 18 million a year. That's euros. Our goal with that...

Alex: Which is, to give perspective to people who don't know sports, not a lot.

Luke: Right. If you compare... we are the lowest or second lowest in the league. And our goal is to have...

Alex: The big clubs have way bigger payrolls than you. You might know a big club in France that has an insanely inflated payroll. There's a few of them. It starts with a P and ends with a G.

Luke: And so, you know, just to make the math easy: we're spending 15 a year, and we want to perform like a mid-tier club, sort of an average club in the league, which is typically spending 40 or 50 million. So the way I think about it is: how do we spend 15 million a year, the lowest or second-lowest budget, and perform like the 10th or 11th best team, the ones spending 40, 50 million? To do that, we have to make distributional assumptions about the value that each individual player creates. And essentially it becomes a portfolio optimization problem, where you have a set of players which act like individual assets, in a way. And you're trying to find the combination of players such that you have some reasonable confidence that you're going to be able to perform at the level you need, certainly at the very least avoiding relegation. You're assembling these things in a way that tries to de-correlate these assets, how the players perform, and so on, such that you minimize your chance of relegation and maximize your chance of performing at the level you want to perform at.
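To make that concrete, here is a minimal sketch of the squad-as-portfolio idea in Python. Everything in it is invented for illustration (the player labels, costs, points scale, and correlation structure); a real version would plug in posterior draws from a performance model rather than a hand-written multivariate normal.

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(42)

    # Candidate players: expected points added, uncertainty, and cost (M EUR).
    players = ["A", "B", "C", "D", "E"]
    mu = np.array([8.0, 6.0, 5.0, 4.0, 3.0])
    sd = np.array([3.0, 1.5, 2.5, 1.0, 1.0])
    cost = np.array([6.0, 5.0, 3.0, 2.0, 1.0])
    corr = np.full((5, 5), 0.2) + 0.8 * np.eye(5)  # mildly correlated "assets"
    cov = corr * np.outer(sd, sd)

    budget, baseline, relegation_line = 9.0, 30.0, 38.0

    def evaluate(squad, n_sims=100_000):
        """Simulate season points for a squad; return mean and P(relegation)."""
        idx = list(squad)
        sims = rng.multivariate_normal(mu[idx], cov[np.ix_(idx, idx)], size=n_sims)
        total = baseline + sims.sum(axis=1)
        return total.mean(), (total < relegation_line).mean()

    # Enumerate affordable three-player squads and rank by relegation risk.
    feasible = [s for s in combinations(range(5), 3) if cost[list(s)].sum() <= budget]
    results = {s: evaluate(s) for s in feasible}
    for squad, (mean_pts, p_rel) in sorted(results.items(), key=lambda kv: kv[1][1]):
        names = ", ".join(players[i] for i in squad)
        print(f"{names}: E[points] = {mean_pts:.1f}, P(relegation) = {p_rel:.3f}")

The de-correlation point shows up directly in `corr`: raising the off-diagonal entries inflates the variance of the squad total, which raises relegation risk even when expected points stay the same.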

Alex: Yeah, yeah. And that's funny, because that's also a lot like how I personally try to explain what I do to people, and why it's actually interesting for clubs to build the kind of models we do. Because often people don't know at all what's going on in the front office, especially in Europe. In the US, the influence of Moneyball, as you were saying, not only the book but also the movie, is much more prevalent. But in Europe, especially soccer, especially continental Europe, it's like: really? People do that? So I try to explain it to them, as you were saying, as portfolio management. Or I say: remember when you were a student and you were broke, and you had to be very careful about any dime that left your account? That's basically why, for the clubs with much smaller budgets, this modeling is actually much more interesting: they have to be much more careful about what they spend. Whereas for PSG or Real Madrid it's not that big of a deal, at least at the beginning.

Luke: With such a high budget relative to the rest of the league, you can actually be fairly inefficient with it and still be quite successful, right? And I think they are; I think that's a good description of the club. And to boil it down to a really simple example, let's say you want to acquire a player; say you're going for a striker, and you have player A and player B. Say player A costs one million a year and player B costs two million a year. Traditionally the scouts would say: well, we like player B better, so let's get him. But hold on: one guy costs two million a year, the other costs one million, one of them's older, one's younger, all these different factors. How do we choose one? They end up comparing apples to oranges. They have these financial figures, and then they have the scout saying: I like this guy better, he's got a good left foot and good spatial awareness. And the question is: how do I turn that into dollars? How do I decide whether I'm going to spend two million or one million, on player B or player A? Whereas statistical methods come along and you can say, very precisely: hey, there's a 10% chance that this player performs above his contract, and a 30% chance that this other guy performs above his contract. So you can be much more explicit and probabilistic about the value a player brings and, most importantly, what they bring relative to their contract.
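As a toy illustration of that kind of statement, here is what "performs above his contract" looks like as a computation. The lognormal draws below are stand-ins; in a real pipeline they would be posterior samples of each player's value from a fitted model.

    import numpy as np

    rng = np.random.default_rng(0)
    # Fake posterior draws of value created, in EUR per year.
    value_a = rng.lognormal(mean=np.log(1.1e6), sigma=0.5, size=50_000)
    value_b = rng.lognormal(mean=np.log(1.9e6), sigma=0.8, size=50_000)

    contract_a, contract_b = 1e6, 2e6
    print("P(A performs above his contract):", (value_a > contract_a).mean())
    print("P(B performs above his contract):", (value_b > contract_b).mean())
    # Expected surplus per euro spent, the quantity a budget club maximizes:
    print("E[surplus A] per euro:", (value_a - contract_a).mean() / contract_a)
    print("E[surplus B] per euro:", (value_b - contract_b).mean() / contract_b)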

Alex: Yeah, yeah. And so, how does that work for you? I'm curious: as one of the decision makers, how do you actually use the models to make decisions? I think, for me, you would be one of the best bosses I could have, right? Because you come from the modeling side, which would make my work way easier, because I could talk to you in technical terms. But I'm very interested in how you personally consume the models. What is actually useful for you? Is it the posterior distributions? Are you more interested in the comparisons? In the tail probabilities? What do you actually care about when you make the decisions?

Luke: Yeah. I think the most useful thing for the end user is really a distribution on future performance value. What I mean by that is, you can imagine saying: we think next year this player's performance on our team, ideally in dollars, would be some distribution over the value they will add. If you look at basketball: maybe LeBron James is worth 40 million a year, but we might say, hey, we think there's a 20% chance he's worth 60 million or more, and a 10% chance, because of injury or whatnot, that he's actually worth 5 million or less. Understanding that distribution, and then seeing how it progresses through time, is the key information you need to decide whether to sell the player, or sign that player, or extend their contract, or renegotiate it. Those are the key bits of information, and ultimately we want to use that information as much as possible. A lot of what I do now is actually just removing humans from decision-making. There's so much work now, in behavioral economics and elsewhere, showing that humans are fundamentally quite biased in the way they go about decision-making. So a lot of what I do focuses on: how do we remove that bias from the decision-making process? How do we take the good bits of subjective information and remove those biases, in a way that ultimately allows you to create some arbitrage opportunity in player value and, ultimately, shareholder value?

Alex: Okay, I see. If I try to summarize: basically something like a WAR metric is how you summarize the contribution of the player, but that's still in terms of, you know, goals or wins or whatever; then you transform that contribution into dollars, and then you make the decision based on that. And probably a decision that's helped a lot by computer assistance, to try to limit the biases that are very well documented in how humans make decisions.

Luke: Exactly. One way to think about it: if every single player on your team is underpaid, you are going to have a very good team, right? Assuming you have a certain payroll. Conversely (it's easier to think about it the other way), if you have a low payroll and every single player on your team is overpaid, you're going to be very, very bad. So the goal is essentially to get as much performance value as you can for the dollars you spend, and history has shown that statistical models and probabilistic reasoning are a much better way to go about that process than the traditional gut-feel scouting approach.

Alex: Yeah, yeah. It's really fascinating to see that research now pan out in real clubs, like you're doing. That's why I'm really fascinated by that kind of work. And I think these are great organizations for modelers to work in, because that's also where your work has more impact, I would say.

Luke: Yeah, in sports you just get a tremendous amount of super interesting modeling problems. When you talk about how to use Bayesian methods and so on, fitting the model is not the interesting part, right? Even defining the model is not necessarily the interesting part. It's all the issues that come up around it that become super interesting. If you've defined your estimand slightly incorrectly, you can end up with wildly wrong conclusions, conclusions that can cost you millions of dollars. Or, and there are probably lots of examples we could get into, think about what you're doing in traditional validation of models. You say: okay, we fit this model, and it predicts well by some metric, RMSE or log loss or whatever it might be. We say: hey, this thing works well. But that's a summary across all of the data. Now think about what you're doing in practice when you're running a sports team. You're saying: I want the players who really add distinct surplus value to my team. Ultimately you're grabbing players from the edges of the prediction space. If you think about the joint space of performance and dollars, the players you want sit at that threshold where they're cheap and they're good. So you don't care about the bulk of predictions from the model; what you care about is the predictions in the region where you're actually acting, the space of players you can actually action on: players on your team, or players you're acquiring. And of course that extends to ideas of strategy and so on as well, unique strategies and such. So you care about pathological behavior, and you really care about edge cases. A lot of the time in academic settings it's: hey, if this model performs well, great. Here you're saying: I'm actually fine if the model in aggregate is slightly worse, as long as it performs better in the covariate space, or the prediction space, that we care about. So there are lots of little issues like that that come up in sports; they're super interesting and they make the modeling part of it much more interesting.
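A hedged sketch of that validation point, with fully simulated data: score the same model on all players, and then only in the "cheap and good" region where acquisitions actually happen. If the second number is much worse, the model fails exactly where it gets used, no matter how good the aggregate looks.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 5_000
    cost = rng.lognormal(np.log(2e6), 0.7, n)        # fake market prices
    true_value = cost * rng.lognormal(0.0, 0.4, n)   # fake true values
    pred = true_value * rng.lognormal(0.0, 0.25, n)  # fake model predictions

    err = np.log(pred) - np.log(true_value)
    rmse = lambda e: np.sqrt(np.mean(e ** 2))

    print("overall RMSE (log scale):", rmse(err))    # the usual aggregate summary

    # The action region: players whose predicted value far exceeds their price.
    action = pred > 1.5 * cost
    print("share of players in action region:", action.mean())
    print("RMSE in action region:", rmse(err[action]))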

Luke: I'll give you another example. This is, like, so Bayesian but un-Bayesian; see if we can wrap our heads around it. There's this idea in sports called the Messi test. You've probably heard this phrase; maybe it's the Trout test in baseball or something. The idea is that if you build a model for overall player skill (and this is from five years ago or so), and Messi's not at the top or close to the top, your model probably sucks. It's a very simple eye test, right? If you build a model for best player, you want Messi to be at the top. And you could imagine doing the same for individual skills. In basketball, if I built a model for the best three-point shooter and I didn't have Steph Curry right near the top, I should probably be concerned about my model. And that's actually really insightful information. But think about what that is. Typically, unless you have some latent space of player skills, what you're actually doing is having some model that weights different parameters and so on, and then you calculate it for each individual player to see where they land in the ranking. What you're actually doing is: you have prior information about which players are good. But when you're looking at these rankings, essentially that's the posterior predictive of the model, the predictions from the posterior. And you're saying: I have information about what that posterior predictive should look like. So you're putting a prior on the posterior predictive, which any traditional Bayesian would say is just heresy. But there's something there, and it's actually valuable. So there are all these super interesting things that come out in sports where you have to think really deeply about the problem you're trying to solve, how your model dovetails into that problem, and really work on those issues.
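In practice, that "prior on the posterior predictive" often gets used as a blunt sanity check on a fitted model's output. A minimal sketch, with arbitrary anchor names and an arbitrary top-k threshold:

    def messi_test(ranking, anchors=("L. Messi", "K. De Bruyne"), top_k=20):
        """Fail loudly if any known-elite anchor falls outside the top k."""
        top = list(ranking[:top_k])
        missing = [p for p in anchors if p not in top]
        if missing:
            raise AssertionError(f"not in top {top_k}: {missing}")
        print("Messi test passed.")

    # 'ranking' would come from sorting players by modeled skill, e.g.:
    # ranking = sorted(skill, key=skill.get, reverse=True)
    messi_test(["L. Messi", "K. De Bruyne", "E. Haaland"]
               + [f"player_{i}" for i in range(100)])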

Luke: I'll give you one more example here, and I could come up with more if we kept going, where the modeling just becomes super fascinating. One thing I've seen quite a bit: there are a lot of these models that capture overall player value, and they do it in the same units. In basketball, for example, it might be points per hundred possessions. So you could take a whole bunch of grad students and say: hey, you're all going to build a model for player performance in points per hundred possessions. Or you might have your whole analyst team go out and build these models. And then you say: okay, now we're going to weight these things to create an overall model, sort of an ensemble learning or model averaging idea, right? And you might say: because these things are all on the same scale, let's make sure the weights sum to one, say with a Dirichlet prior on the simplex. But that's actually not necessarily the right thing to do, because each of these models typically has some shrinkage built into it. And so it's very possible that, once averaged, there's actually more information in the aggregate model than in the individual ones, so there's no reason you should constrain the weights that way. In fact, in this case it's not even obvious to me that you should enforce positivity on the weights. So if you don't understand what's going into these models, why they're shrunken, why multiple models combined might not want to be shrunken, and you just throw the obvious solution at these problems, you oftentimes end up with the wrong solution. You need to deeply understand the models you're building, as well as the data you have, as well as the underlying sport. And that's where I think it becomes super, super interesting.
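A small simulation of that point. Each component model below is shrunk toward zero, so the least-squares stacking weights sum to well over one; forcing them onto the simplex would throw information away. All numbers are synthetic.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 2_000
    truth = rng.normal(0.0, 2.0, n)  # true points per 100 possessions

    # Three component models, each noisy AND shrunk toward the mean:
    shrink = [0.5, 0.6, 0.7]
    models = np.column_stack([s * truth + rng.normal(0, 1.0, n) for s in shrink])

    # Unconstrained least-squares stacking weights:
    w, *_ = np.linalg.lstsq(models, truth, rcond=None)
    print("stacking weights:", np.round(w, 2), "sum:", round(w.sum(), 2))

    mse = lambda x: np.mean((x - truth) ** 2)
    print("MSE, equal simplex weights:", round(mse(models.mean(axis=1)), 3))
    print("MSE, unconstrained weights:", round(mse(models @ w), 3))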

Alex: Hmm. Yeah, indeed, that's fascinating. And what's really interesting to me, and it hints a bit at how useful these models and the Bayesian framework are in your work, at Teamworks in particular: I'm curious if you can tell us a bit more about how you leverage these methods in your work at Teamworks or for Toulouse. And maybe, to come back to what you started doing in your career, how do they help account for uncertainty, thinking about the generative graph of the model, in particular with spatio-temporal data?

Luke: Yeah, I think the word generative that you used is really useful. That's kind of how I think about modeling problems. I very much think about: what is the generative model here? This may be a bit oversimplistic, but I think about the generative model as the way of specifying the model, and Bayes as the way of giving it the data, how you work backwards, right? The Bayesian way of thinking, where you're pooling information, maybe across players, or across matches, or across seasons, or across space. This is all incredibly important in sports, where you have what might look like a lot of data, but there's so much contextual variation, combined with a tremendous number of players. Our database in soccer is about 80,000 players. And if you think, oh, I want to fit even an adjusted plus-minus model, which is essentially a giant regression model, it's a really large-P, small-N type of problem. But of course there's a tremendous amount of structure there. You have a lot of insight based on the league they play in, the positions they play, how good the team is, as well as the individual event information that happens in the game. So you get all this universal information that comes from the structure of the game, and you want to use it to get better estimates. When you talk about Bayesian models, that's where they come into play.
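For readers who have not met adjusted plus-minus: it is essentially a giant penalized regression, and the ridge penalty is exactly a Gaussian prior shrinking every player toward league average. A toy sketch with simulated stints (sizes and scales invented):

    import numpy as np

    rng = np.random.default_rng(3)
    n_players, n_stints = 500, 2_000
    true_skill = rng.normal(0.0, 1.0, n_players)

    # Design matrix: +1 for players on the home side, -1 for the away side.
    X = np.zeros((n_stints, n_players))
    for row in X:
        on = rng.choice(n_players, 10, replace=False)
        row[on[:5]], row[on[5:]] = 1.0, -1.0
    y = X @ true_skill + rng.normal(0.0, 5.0, n_stints)  # noisy stint margins

    lam = 25.0  # noise variance / prior variance: the shrinkage knob
    beta = np.linalg.solve(X.T @ X + lam * np.eye(n_players), X.T @ y)
    print(f"corr(estimate, truth): {np.corrcoef(beta, true_skill)[0, 1]:.2f}")

The pooling Luke describes (league, position, team strength) would enter as structure on the prior, instead of the single scalar `lam` used here.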

Luke: Now, I should say that oftentimes what we end up doing is finding a balance. In my head, maybe I've specified the perfect model for this problem, but then it's too slow to fit because of the size of the data; tracking data especially can be absolutely massive. So it's: how do we get a poor man's Bayes? Oftentimes the question is: can we get 90% of the value of this model for 5% or 10% of the computational cost? A lot of those types of conversations go on. And the other thing that happens, again, is that you fit this model and then, back to the earlier conversation, you look at the edge cases, the perimeter of the covariate space, and there's some bizarre behavior; the model just doesn't look right, for various reasons, in areas we ultimately care about. So then you have to think: okay, how do we solve this? Do you upweight that area of the covariate space in the likelihood? Do you do ad hoc corrections? Do you redefine the model? There are all these interesting things that arise.

I'll give you a fairly recent example. Imagine you have some model that says: I'm going to build component models for different skills, and then I'm going to have those skills try to predict player performance or team performance or something. You can think of this as multiple likelihoods in the same model, right? There are some latent skills, think of them as speed and ball control and whatever, and then you might be observing actual speed data, and maybe team performance, and a bunch of other things. So there are all these different likelihoods stacked on top of those variables. And it turns out that because of this, whether it's model mis-specification or collinearity or something, you can end up in situations where the speed data in theory should define everything you need to know about speed, but because of the incompleteness of the data, or who knows what, the likelihood for team performance can end up saying: the speed data says this player is slow, but because he's so good, the latent variable for speed is going to think he's really fast. And so you end up saying: no, no, I don't want the player's performance or the team performance to flow into the speed latent variable, because, subjectively, I think all of that information should be captured by this other source. So you end up doing things which are kind of un-Bayesian, actually: cut models, where you're saying, even within, say, a Gibbs sampler, I'm not going to allow this information to affect this variable, or this parameter to flow into this other parameter. So, based on the underlying models, you can hack these models to do very, very interesting things. Actually, this is an old idea. If you remember, WinBUGS used to have this cut option, which was there largely for computational reasons, but allowed you, in some sense, to cut portions of the graphical model so it produces what you want. There are actually some really interesting papers, I'll see if I can find them later, by Pierre Jacob that look at these cut models. They're sort of un-Bayesian in a way, but the papers show how they behave and how they're super useful in certain cases.

Alex: Yeah, that is very interesting. Although I'm not sure it's un-Bayesian. To me, it's just specifying prior information and making sure it goes through the sampler. The issue you usually have, and that's actually an active area of research, is that if the model gets complex enough, it gets hard to set the priors on the things you really care about. Like setting the priors on the standard deviation terms of a multivariate normal: you have to do it, but it's sometimes not interpretable. What you do have an idea about is the standard deviation of the whole data, right? So for your whole generative graph, you know more or less the prior you want on that overall standard deviation, but you don't know how to specify it for each of the parameters. Ideally you would want to specify one big standard deviation and have it allocated upstream to the rest of the model.
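One way to write down what Alex is describing: put the prior on the total standard deviation, then let a simplex-valued allocation split that variance among the upstream components. A minimal generative sketch; the specific distributions are illustrative, and the same construction can sit inside a PyMC or Stan model.

    import numpy as np

    rng = np.random.default_rng(5)

    total_sd = abs(rng.normal(0.0, 1.0))          # prior on the whole-data scale
    props = rng.dirichlet(alpha=[1.0, 1.0, 1.0])  # variance shares, sum to 1
    sd_player, sd_team, sd_noise = np.sqrt(props) * total_sd

    # The component variances reassemble exactly into the interpretable total:
    recovered = np.sqrt(sd_player**2 + sd_team**2 + sd_noise**2)
    print(sd_player, sd_team, sd_noise, "-> total:", recovered)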

Luke: This is even more weird, though, because this is saying: we have some Gibbs sampler, maybe alternating through the parameters, maybe through these latent variables. You write down the full joint posterior and then break it into the conditionals so you can do Gibbs sampling. And all the conditionals we sample from however we can (maybe it's conjugate, or maybe you can do rejection sampling, it doesn't really matter), but this one conditional that we want to sample from, we're actually not going to sample from the true conditional as defined by the full posterior. We're going to sample from some version that looks like it, but with the dependency on this particular data source cut out. So you're artificially creating a posterior which is actually not the full posterior you originally defined; you're defining some new posterior by removing certain dependencies within those conditional distributions. Now, it's not necessarily true that you end up with a valid joint posterior, but you do end up with better performance. So it's just a super interesting problem for me: I know how things work on a broad scale for the full Bayesian model, but I also know I can solve problems by hacking these things, pinning certain variables here and tweaking things there, breaking things, but breaking them in a way which actually produces better results and, ultimately, better decisions.

Alex: Yeah, that's fascinating. I love that. So basically, if I understood correctly, when you say cutting the posterior, it's making sure we're only taking a subset of the full posterior that the model was actually defining?

Luke: Kind of, yeah, but only on one of the conditionals, only on one of the variables. Normally, if I'm updating, let's say, the latent variable for a player's decision-making, I have to condition on everything: on the data and, in a Gibbs sampler, on the previous draws of all the other variables. But here you're saying: when I update the speed latent variable, I only want to condition on the other variables and the speed data; I don't want to condition on these other data sources. So you're dropping the conditioning on those other data, removing them from that conditional sampler. It's a really interesting case where you take this thing that looks like a well-constructed posterior distribution and hack it in ways that are kind of unprincipled, but that make sense and lead to the performance you want.
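Here is a minimal sketch of that cut in a toy two-likelihood model, with everything invented: a latent speed observed directly, plus a second data source (team results) whose link to speed is deliberately inconsistent with the measurements. The only difference between the two samplers is whether the speed update conditions on that second likelihood.

    import numpy as np

    rng = np.random.default_rng(4)
    y_speed = rng.normal(1.0, 0.5, size=20)   # direct measurements: speed near 1
    y_team = rng.normal(8.0, 1.0, size=100)   # team results implying a faster player

    s2_speed, s2_team, alpha = 0.25, 1.0, 3.0  # known variances and slope
    # Model: y_speed ~ N(speed, s2_speed); y_team ~ N(alpha*speed + bias, s2_team);
    # priors: speed ~ N(0, 1), bias ~ N(0, 1).

    def gibbs(n_iter=5_000, cut=True):
        speed, bias, draws = 0.0, 0.0, []
        for _ in range(n_iter):
            if cut:   # cut update: condition on the speed data only
                prec = 1.0 + len(y_speed) / s2_speed
                mean = (y_speed.sum() / s2_speed) / prec
            else:     # full conditional: team results also pull on speed
                prec = (1.0 + len(y_speed) / s2_speed
                        + alpha**2 * len(y_team) / s2_team)
                mean = (y_speed.sum() / s2_speed
                        + alpha * (y_team - bias).sum() / s2_team) / prec
            speed = rng.normal(mean, prec ** -0.5)
            # The bias update conditions on speed either way (conjugate normal):
            prec_b = 1.0 + len(y_team) / s2_team
            mean_b = ((y_team - alpha * speed).sum() / s2_team) / prec_b
            bias = rng.normal(mean_b, prec_b ** -0.5)
            draws.append(speed)
        return np.mean(draws[500:])

    print("cut estimate of speed: ", round(gibbs(cut=True), 2))   # stays near 1
    print("full estimate of speed:", round(gibbs(cut=False), 2))  # dragged upward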

566

:

Yeah, yeah.

567

:

Damn, that's very cool.

568

:

So I'm guessing that means a lot of custom code, whether that's in Stan or Pimcee or

Python or stuff like that, right?

569

:

Yeah, certainly, right?

570

:

A lot of these things aren't necessarily handled well with Stan, right?

571

:

Because Stan is great for, or any these small PMC, whatever, they're great for, hey, let's

build like a Bayesian model that does this, this, and this, but it's not great for when

572

:

you're like, hey, we need to fix this edge case, we need do this, and we need to of hack

these things.

573

:

So first up, to do those types of hacks, you have to like deeply understand the problem

and deeply understand the models, that's sort of step one.

574

:

But as you say, it oftentimes falls outside of these automatic inference engines.

575

:

So you oftentimes have to do things custom, right?

576

:

So, you know, it actually reminds me

577

:

a bit of, you know, I think a lot of times, because of Stan and PyMC and others, you

oftentimes don't need to understand what's going on under the hood with HMC or any of these

578

:

other algorithms, right?

579

:

But I always thought it was important for people to understand, like at the very least,

like what's going on when you're fitting sort of basic models.

580

:

Like I'll give you one example here, which is when I taught spatial stats, it was a grad

course.

581

:

I would give people like a very simple multivariate normal where they're trying to like

estimate the mean or the covariance or something, right?

582

:

It's like, okay, here's a whole bunch of data and you know, maybe P is 50 and N is a

thousand.

583

:

Hey, learn the mean and covariance.

584

:

And that's like, that's a trivial problem.

585

:

It's conjugate, blah, blah, blah, right?

586

:

And then I say, okay, now I'm going to give them another data set, which is kind of the

same thing, but P is now a thousand and N is maybe 10,000.

587

:

And all of a sudden it's easy to pull in the data.

588

:

The data is actually relatively small, but they go to fit the data and it just doesn't

work.

589

:

And they say like, what's going on here?

590

:

Why doesn't this work?

591

:

It's a conjugate problem.

592

:

Why doesn't it work?

593

:

And I say, well, you're trying to invert a thousand by thousand covariance matrix.

594

:

Maybe that's not a lot.

595

:

Maybe it breaks down at 5,000.

596

:

Whatever it is, it's a pretty relatively low number on sort of a typical laptop where

these things start to fail.

597

:

And then I say, okay, well, that doesn't work.

598

:

So go fit it with stochastic gradient descent.

599

:

And then they go do that and they say, actually that works way better for these large data

sample sizes.

600

:

And I said, but what about for the small data?

601

:

And they say, well, actually then just use the sort of the conjugate one, right?

602

:

So it's like realizing that, there's different ways to solve the same problem.

603

:

And you might use...

604

:

this sort of direct sort of analytical solution in some cases, and you might prefer to use

SGD in another.

605

:

And then I go back and say, actually, there's a piece of information I didn't actually

tell you.

606

:

there's actually a certain correlation structure in this data.

607

:

And it turns out to be just an AR(1); it's a time series.

608

:

Oh, now you can just fit this thing with, like, you know, Kalman-filter-esque,

message-passing kind of ideas, typically like a Kalman filter.

609

:

So you can do things in sort of linear time in P, because P is sort of T now, like time,

right?

610

:

And all of a sudden they go from an exact solution to something which

is still exact and very, very fast.

611

:

Solving the same problem fundamentally, but understanding that there's different ways

of actually doing the computation, right?
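
As a minimal sketch of two of the three routes in that classroom example, on a toy mean-estimation problem (dimensions, seeds, and variable names are assumptions for illustration, not the actual course materials):

```python
import numpy as np

rng = np.random.default_rng(1)
P, N = 1_000, 10_000
true_mu = rng.normal(size=P)
X = true_mu + rng.normal(size=(N, P))  # toy multivariate normal data

# Route 1: the "exact" conjugate-style route. Working with the full
# covariance means factorizing a P x P matrix: O(P^3) time and O(P^2)
# memory, which is what quietly starts to fail on a laptop as P grows.
S = np.cov(X, rowvar=False)
# np.linalg.cholesky(S + 1e-6 * np.eye(P))  # the step that blows up with P

# Route 2: an SGD-style streaming update for the mean. It never forms a
# P x P matrix; with a 1/i step size this is exactly the running mean.
mu_hat = np.zeros(P)
for i, x in enumerate(X, start=1):
    mu_hat += (x - mu_hat) / i

print(np.abs(mu_hat - true_mu).max())  # small for large N
```

The third route is the AR(1) twist: once the covariance has that time-series structure, a Kalman-filter pass gives the exact answer in time linear in P, with no large matrix ever formed.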

612

:

And I think having those skills and sort of thinking holistically, not just about the

modeling, but also about the fit, is really important.

613

:

Like in sports where we're dealing with space all the time, you're sort of like, you're

always having to discretize space in some way or another, right?

614

:

Whether you like it or not, whether you sort of think about it as like...

615

:

we are projecting onto some basis or something, ultimately you're discretizing.

616

:

you put it into the computer, it's discretizing it.

617

:

So people oftentimes will say, I really want to keep the continuous model, but like,

hold on, you're going to have to come up with some low-dimensional basis approximation to it at some

618

:

point.

619

:

So you might as well be transparent about it: you're either having a sort of

simple model that you can do really accurate, exact computation on, or a complicated

620

:

model that you've got to do approximate computation on, and understanding those

trade-offs,

621

:

I think, is critical for anyone who's working on this problem.

622

:

Yeah, no, for sure.

623

:

And that makes me think, you know, in your experience, what have been the different

ways to get these

624

:

90 % of the results with 5 % of the computational cost of the model.

625

:

What have you found particularly helpful?

626

:

And I'm guessing these are cases where you cannot run a classic NUTS sampler, for instance,

on a model.

627

:

Yeah, a lot of this looks like...

628

:

uh

629

:

It's like making sure you have a big enough toolbox so you can say, here's the perfect way

of doing this.

630

:

Maybe it's like some big model built in Stan.

631

:

ah But hey, that's not going to work as the data scales.

632

:

So can we fit this with like a penalized regression?

633

:

Or what happens if I sort of

634

:

In a lot of these samplers, where things start to get really slow is when you're

sampling hierarchical parameters as well. Like, in a lot of player models, if you

635

:

think about a situation where you have, like, one variable per player, right?

636

:

Those types of problems are oftentimes easy because you can marginalize them out.

637

:

But as soon as you start sampling like hierarchical parameters as well, where you sort of,

you want like some, you know, uh

638

:

some, like, group-level variances or something like that, then those conditional updates can

be super expensive to calculate, because then you're conditioning on the full data.

639

:

And so it's like, okay, what happens if...

640

:

What happens if we just like do some hack and just find an estimate for that hierarchical

parameter, just plug it in, right?

641

:

Do we lose a lot by just sort of specifying that directly, rather than sampling it and

having a posterior over it, right?

642

:

So a lot of the types of things we're thinking about are: where are the

computational bottlenecks, especially as the data scales, and how can we make it work?

643

:

Oftentimes that work is sort of in sequential updating, because essentially we want to update

these things

644

:

weekly as new matches come in.

645

:

How can we do this in a really efficient way without losing sort of prediction

fidelity, right?

646

:

And so, yeah, it looks like things like, hey, actually, if it's a big regression problem,

can we do sort of penalized regression?

647

:

And sure, we won't get a full characterization of uncertainty, but can we get a point

estimate in 1% of the time?

648

:

And then are there ways, can we bootstrap or something else to get some notion of

uncertainty around it?

649

:

Or, as I said earlier, can we pin some hierarchical parameter to speed up the sampling,

even though we know that

650

:

we're gonna end up with some bias as a result.
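
Here is a minimal sketch of that pinning trick on a toy one-effect-per-player model; the model, the method-of-moments estimate, and all numbers are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy hierarchy: one latent effect per player with a shared group-level
# sd tau, plus noisy per-player observations with known noise sd sigma.
n_players, n_obs, sigma, tau_true = 500, 20, 1.0, 0.8
theta = rng.normal(0.0, tau_true, size=n_players)
y = theta[:, None] + rng.normal(0.0, sigma, size=(n_players, n_obs))

# "Pin" the hierarchical parameter: a cheap method-of-moments estimate,
# using Var(y_bar) ~= tau^2 + sigma^2 / n_obs, instead of sampling tau.
y_bar = y.mean(axis=1)
tau_hat = np.sqrt(max(y_bar.var() - sigma**2 / n_obs, 1e-6))

# With tau fixed, every player's conditional is conjugate normal and the
# draws are embarrassingly parallel -- no expensive group-level update,
# at the cost of ignoring the uncertainty in tau (the bias mentioned above).
prec = 1.0 / tau_hat**2 + n_obs / sigma**2
post_mean = (n_obs * y_bar / sigma**2) / prec
draws = rng.normal(post_mean, 1.0 / np.sqrt(prec), size=(1000, n_players))

print(f"tau_hat = {tau_hat:.2f} vs true tau = {tau_true}")
```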

651

:

So those are the types of conversations we're having constantly.

652

:

Hmm.

653

:

Okay.

654

:

Yeah.

655

:

Yeah.

656

:

Yeah.

657

:

That tickles a lot of the conversations I'm having also.

658

:

In my work for sure.

659

:

And so I'm curious, you know, if you would be mentoring or teaching these students right

now, what would you

660

:

recommend they invest their learning time on, conditional on already knowing

Bayesian stats and being fluent in a probabilistic programming language, which is

661

:

the case of a lot of the listeners here.

662

:

But if you had to tell them something that can actually help them complete

their toolbox for these cases, for instance, what advice would you give them?

663

:

Yeah, one thing I've seen with like a lot of

664

:

of students that are coming out of master's programs or data science programs is that

they're sort of quite good at: give them a data set and, hey, either use this method or choose

665

:

a method that's going to give you a good prediction performance.

666

:

so people are good at maybe tuning neural networks or Gaussian processes or...

667

:

you know, penalized regression or what have you, they're sort of quite confident with

these, with sort of fitting methods and looking at predictions and saying, hey, this has a

668

:

better RMSE.

669

:

But where they can really struggle is that a lot of times in sports, actually,

there are certainly prediction cases, but oftentimes what we're doing is fundamentally

670

:

inference.

671

:

You're saying, hey, we observe some team-level performance and you want to infer what's

causing this.

672

:

So you have some latent parameters, which are like player skills or player performances,

and you're essentially trying to learn those.

673

:

It's, it's fundamentally an inference task.

674

:

And so, you know, thinking about more sort of statistical problems and building

that statistical toolbox... the first thing I would say is, if

675

:

the bulk of your time has been spent building prediction algorithms, it's to broaden

your toolbox beyond that, right?

676

:

And then learn as much as you can. You know, some of the best ideas I've ever

had in sports have come from, like, text modeling.

677

:

Right?

678

:

Using, like, Dirichlet processes to model plays in basketball was one of

the coolest projects, I think, that I've ever been involved in.

679

:

Again, that wasn't my idea.

680

:

It was a genius PhD student named Andy Miller.

681

:

there's like those types of things, right?

682

:

If you want to be able to sort of creatively solve these problems to find like, to find...

683

:

computationally efficient solutions, you have to have way more than just one tool in your

toolbox.

684

:

Read broadly, study broadly, even if you can't get into the weeds on everything.

685

:

Learn about language models, learn about text models, learn about image modeling, learn

about spatial statistics, learn about robust estimators, as we talked about earlier, learn

686

:

about all these different areas.

687

:

And sure, you won't necessarily be an expert in any of them, but if you sort of

understand, I know what these models do, and I sort of understand their pros and their

688

:

cons, and what they work on, what they don't work on.

689

:

then it becomes much easier to sort of solve these sort of unique problems that arise.

690

:

Yeah, yeah, yeah.

691

:

That's really, really interesting to hear that.

692

:

And how do you...

693

:

You know, in your experience, what has been very interesting open source software that

maybe you've used outside of the Bayesian framework?

694

:

I'm curious.

695

:

Yeah, early days for me was not open source.

696

:

When I started my master's, it was, like, MATLAB programming, right?

697

:

That was, like, the tool du jour.

698

:

And since then I've sort of been...

699

:

a combination of R and Python has been the large bulk of my work.

700

:

But a lot of stuff is built on packages designed for those two languages as well as Stan

and others.

701

:

And these days, a lot of what I do is also sort of broader technology, which has just

revolutionized the way we do things, whether it's cloud computing or Docker.

702

:

All these sort of standard tools in broader tech, I think, if you want to

work in industry in particular, are incredibly valuable, incredibly valuable to learn,

703

:

right?

704

:

Like...

705

:

At Zelus, for example, now Teamworks, even our models themselves are, like,

Dockerized; they're containerized.

706

:

So you think about like, I'm building a Bayesian model.

707

:

Well, we think of it as, like, a container object that holds predict functions and test

functions and train functions, and allows you to very simply probe these models as

708

:

well as to sort of version-control them.
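
As a minimal sketch of that container idea, here is a hypothetical Python interface; the class, field names, and example values are all assumptions for illustration, not actual Teamworks code.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ModelContainer:
    """A fitted model travels with its own train / predict / test hooks
    plus a version tag, so it can be containerized and probed uniformly,
    whatever sampler or library lives inside."""
    name: str
    version: str
    train: Callable[[Any], Any]       # refit on new data
    predict: Callable[[Any], Any]     # posterior predictions
    test: Callable[[Any], dict]       # calibration / diagnostic checks

# Hypothetical usage: the serving layer only sees this surface, never
# whether Stan, PyMC, or a custom sampler is doing the work inside.
container = ModelContainer(
    name="player-value",
    version="2025.04.1",
    train=lambda data: None,
    predict=lambda data: [],
    test=lambda data: {"calibration_ok": True},
)
print(container.test(None))
```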

709

:

And so it's like blending ideas from statistics and machine learning with sort of

710

:

software engineering, ultimately, to solve a lot of interesting problems that come up in

productionized machine learning.

711

:

And so, to do this, it's a lot, right?

712

:

But it's a lot also outside of what you might think of as traditional stats and ML

toolbox, right?

713

:

But broader tech stuff.

714

:

Heck, this is not work-related, but I spent a weekend flashing and messing around with a

Raspberry Pi last weekend.

715

:

So I'm deep into this

716

:

stuff.

717

:

Yeah, I mean, that makes sense to me, in the sense that, you know, a substantial

proportion of the modeling work is actually not done on the model.

718

:

A lot of it comes before: how do I get the data, and in which format?

719

:

Which data do I want?

720

:

Even before that, what's the generative graph we're thinking about?

721

:

Which questions are we interested in answering?

722

:

And then once I have the data, how do I format it in the format that the model can

actually take it?

723

:

Doing all the EDA, also extremely important.

724

:

Parameter recovery, simulation-based calibration, all this stuff.

725

:

So this is something I hear a lot, right?

726

:

Which is like, data science in the real world is like a lot more data managing and like

all this sort of pre-model fitting stuff.

727

:

That's certainly all true.

728

:

I think the analysts that have worked with me in the past, where they're most surprised

when they work with me is how much work there is after that first model fit.

729

:

they do all this work getting everything in place and then, hey, I finally built the

model.

730

:

We're good to go.

731

:

And I said, well, hold on.

732

:

I want to see.

733

:

Calibration... I want to see, like, all these different things; I want to explore all the edge cases.

734

:

I want to explore, you know, what happens, how sensitive is it to different

assumptions?

735

:

You know, there's all this sort of poking and prodding of the model.

736

:

So you have, yeah, all this work that comes before the model, and then there's

fitting the model, which is kind of the quote-unquote fun part.

737

:

And then, before you productionalize this thing and release it to the world, you

have to probe it endlessly to make sure that it does the things you want it to

738

:

do, and that it behaves in sort of predictable and controllable

739

:

ways.

740

:

And that's sometimes more work than is actually needed before the model is fit.

741

:

yeah.

742

:

Yeah, I was gonna go there.

743

:

Definitely.

744

:

And yeah, I think that's almost always even more work.

745

:

Because, well, the model often doesn't work like you want it to, you know,

especially if it's complex enough.

746

:

So there's definitely some dimension

747

:

where it's not working as you would have expected.

748

:

And so that's interesting to know: okay, in which conditions does my model collapse?

749

:

Where does it not work?

750

:

And so that takes a lot of forms.

751

:

I think then, even after that, there is another part, which is visualization.

752

:

If you're doing that as the modeler, it's extremely important, because you want custom

visualizations depending on the people you're gonna talk to.

753

:

uh We'll get to that a bit later, but yeah, on that model validation part, I think it's an

extremely important part uh that you bring up.

754

:

that ah also...

755

:

goes back to something you were saying earlier, which is, actually, when we develop the

model, we're fine if it's not doing a very good job at the league level, for instance, but

756

:

we want it to do a good job for the kind of players we're interested in. So it's like,

maybe the model is not that good overall, or it's not that good for

757

:

older players, because they have lower athleticism, stuff like that, but we're actually

interested in young players, so bigger, better athleticism and so on, and in this

758

:

population the model is doing better, so it's actually fine for us. So that's a very

important part. And so, what's your workflow there, if you have one? You know, is there other

759

:

stuff that you

760

:

always check once someone comes with a model that's fitting well, and all the convergence

diagnostics are all good?

761

:

Now we're entering that part of model validation.

762

:

Yeah, we do a lot of stuff pre-modeling, project plan type stuff where there's documents

that we will use, the layout, the types of things we want to do beforehand.

763

:

Afterwards as well: calibrations, a lot of standard things that we'll do.

764

:

But a lot of it is just... it's sort of fairly iterative, right?

765

:

You do the initial sets of plots and then say, actually, we should look at this and this

and then something looks weird and you say, okay, that's weird.

766

:

Like how else can we slice that in a different way?

767

:

Can we see it?

768

:

uh

769

:

trying to expose what's actually going on under the hood.

770

:

Hey, can we see this set of parameters?

771

:

Is there some sort of, like, high-leverage data that's maybe driving

this weirdness?

772

:

Like let's explore all that type of stuff.

773

:

So it's a combination of sort of a standard checklist, if you will, as well as

really custom stuff that comes from just working with these models

774

:

for so long that you really spot little issues and you sort of figure out how

to pull the thread and get at what you want.

775

:

Hmm.

776

:

Yeah.

777

:

It's like, yeah, detective work.

778

:

um And what's like, is there a case, you know, that you remember that was particularly

hard, you know, where you were like banging your head against the wall before finally

779

:

understanding what was going on with the model?

780

:

Yeah, there's lots.

781

:

That happens, like, daily.

782

:

I'll give you one example.

783

:

It ended up being my favorite paper, I think, that I've ever written; it's with Patrick.

784

:

Very simple statistically.

785

:

There's this idea in sports science of acute:chronic ratios, and the...

786

:

It's a pretty simple notion of, how much load on your body has there been?

787

:

I'm going to simplify a bit, so bear with me, but how much load has your body had in the

last week versus how much has it had in the last month, roughly, or maybe month and a

788

:

half?

789

:

Just the idea being sort of like, Hey, are you, you getting, um,

790

:

is the load that you've experienced recently, is this sort of normal to what you normally

experience or is it high or is it low?

791

:

You can imagine if you're a runner and you normally run 10 miles a week and then

this last week you ran 50 miles; well, you'd have a super high acute:chronic ratio.

792

:

If you normally run 10 miles a week and this week you ran one mile, well, you'd have a

really low acute:chronic ratio.

793

:

So sort of like a short-term average divided by a long-term average, right?

794

:

So it's the ratio of the two.
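
In code, the quantity is as simple as it sounds. A minimal sketch, assuming the commonly used 7-day and 28-day windows (conventions vary, and real implementations often use exponentially weighted averages instead):

```python
import numpy as np

def acute_chronic_ratio(daily_load, acute_days=7, chronic_days=28):
    """Short-term average load divided by long-term average load."""
    daily_load = np.asarray(daily_load, dtype=float)
    return daily_load[-acute_days:].mean() / daily_load[-chronic_days:].mean()

# The runner example: 10 miles/week for three weeks, then 50 miles.
daily = np.repeat([10.0, 10.0, 10.0, 50.0], 7) / 7.0
print(acute_chronic_ratio(daily))  # 2.5 -- a big spike over baseline
```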

795

:

And there's a tremendous amount of papers out there that show that if that acute:chronic

ratio is outside of some range...

796

:

So here I'm talking about like load on the players, like amount of physical exertion.

797

:

It's probably easiest to think of it as like running, right?

798

:

It's like distance run, it's a little more complicated than that.

799

:

But they basically say, if we have these notions of player load and you fall way outside of

these bands, it's highly predictive of injury in the future.

800

:

And there's like some intuitive logic there, right?

801

:

Which is like, hey, if I'm a runner and I go from running 10 miles

a week to all of a sudden running 50 miles, I'm increasing my injury risk.

802

:

I'm more likely to get injured moving forward, right?

803

:

Or if I'm training on the soccer pitch for two hours a week and now if I'm going to 10

hours a week, uh

804

:

I'm going to have a higher chance of injury.

805

:

But, and to be clear, there's like literally hundreds of papers out there that show that

that is predictive of future injury.

806

:

Okay, great, okay, great.

807

:

So we have this acute:chronic ratio.

808

:

In fact, just as a little aside, the Apple Watch now, if you have the newest Apple Watch,

has this acute:chronic ratio built in.

809

:

It's like a plot that shows how are you relative to your normal.

810

:

It's the same idea, and it's using a lot of these same ideas.

811

:

But it's essentially like, again, the short-term average divided by long-term average.

812

:

But when we did this internally, at a couple of teams I've worked for, excuse me, we

found that it wasn't predictive, and I couldn't figure out why.

813

:

And so you look at all these papers, and basically what they're estimating is like a

zero-one.

814

:

Did the player get injured after some point in time or not?

815

:

So it's like, if...

816

:

If my acute chronic ratio is high at time t, do I get injured in time t plus one to time,

let's say, you know, the next month or the next week or whatever.

817

:

That's how all these papers look.

818

:

It's like, the acute:chronic ratio at a snapshot in time; it's sort of looking

backwards at the past.

819

:

Does it predict injury in the future?

820

:

And then I realized like, that's actually not what you care about because what these are

often saying is that, okay, if...

821

:

What we sort of uncovered is that the main reason this is predictive is because if your

load has increased in the last week...

822

:

It also means that your load will likely be high in the future.

823

:

So what is actually happening is that your chance of injury per minute of exposure, or per

mile run in the running example, oftentimes stays constant, but your

824

:

total underlying exposure has increased.

825

:

So these hundreds of papers that show this thing is super

predictive... it's actually predicting exposure more than it's predicting injury.

826

:

So you end up with this confounding; oftentimes it's, like, the

time of the season, right?

827

:

Like you come into training camp, your load spikes and therefore you have higher chance of

injury.

828

:

It's not that your chance of injury for every minute you're on the field is higher.

829

:

It's that you spend more minutes on the field after that.

830

:

Right?

831

:

so like, you know, I'll give you another example, right?

832

:

If you, there's like, if you looked at

833

:

uh So eating ice cream is super predictive of drowning deaths.

834

:

Like if you look at the data on this, you'd say like, okay, when people eat a lot of ice

cream, there's a lot of drownings as well.

835

:

Right?

836

:

And so it's like, what's the mechanism here?

837

:

Are people like eating too much?

838

:

People eat all this ice cream, they get bloated, they can't swim.

839

:

It's like what your grandmother used to tell you, don't swim after eating; you sort of,

like, make up stories to make sense of this.

840

:

And then you step back and, like, no, there's actually a confounding variable here, which is,

like, season.

841

:

It's the summer.

842

:

People eat more ice cream, they, you know, they swim more, right?

843

:

And so the ice cream is actually

predictive not of the drownings per se, but of the amount of water exposure.

844

:

And it's the same thing here that the,

845

:

the acute:chronic ratio is not necessarily predicting a higher risk; it's not like your chance of

drowning per minute of being in the pool is any different, it's that your minutes in the

846

:

pool go up.

847

:

And here it's saying the same thing.

848

:

So that's, like, a simple case where we did this big project, and it wasn't necessarily,

like, hey, we got this new and novel thing.

849

:

It was saying, hey, everyone's using this thing that turns out largely to be garbage.

850

:

And it's...

851

:

Or certainly, at the very least, they're putting way more confidence and trust in it than

they should, because fundamentally all these papers have misdefined the estimand.

852

:

They've defined it in a way which is actually not what you care about.

853

:

You don't care about whether the player gets injured in the next two weeks.

854

:

You care about their injury risk per minute of exposure or per minute of gameplay or

whatever.

855

:

And you need to essentially control for that.

856

:

And none of these papers have.
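
One way to phrase that fix in code is a Poisson regression with log-exposure as an offset, so the coefficient measures risk per minute rather than raw counts. This is a minimal sketch on simulated data, not the actual paper's analysis; the statsmodels usage is standard, but all names and numbers are made up.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2_000

# Fake player-weeks: an ACWR spike raises next week's EXPOSURE (minutes),
# but the per-minute injury hazard stays flat -- the confound above.
acwr = rng.lognormal(0.0, 0.3, size=n)
minutes = 90 * acwr * rng.uniform(0.8, 1.2, size=n)
injuries = rng.poisson(1e-3 * minutes)  # constant risk per minute

X = sm.add_constant(np.log(acwr))

# Naive analysis: injuries ~ ACWR. Looks "predictive" (coefficient near 1).
naive = sm.GLM(injuries, X, family=sm.families.Poisson()).fit()

# Exposure-adjusted: the same model with log(minutes) as an offset, so the
# coefficient now measures risk PER MINUTE -- and it collapses toward 0.
adjusted = sm.GLM(injuries, X, family=sm.families.Poisson(),
                  offset=np.log(minutes)).fit()

print(naive.params[1], adjusted.params[1])
```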

857

:

So that's a really simple example.

858

:

Again, not terribly Bayesian, but sort of when you

859

:

think really deeply about a problem, sometimes there's a really simple solution

right in front of you. In that case, you know, it's essentially telling people to

860

:

stop doing this thing or at least be much more thoughtful about how you're using this

data.

861

:

And so that's one example of like something that's still super prevalent in sports, but we

kind of uncovered this really interesting confounding effect.

862

:

Yeah, it's fascinating.

863

:

Yeah.

864

:

And I mean, I love it because, really, you couldn't find the solution in the data.

865

:

Right.

866

:

So I think in a way that's quite Bayesian, because you hit on something that's fundamental

in Bayes: solutions are not always in the data.

867

:

And that's why you need priors.

868

:

And that's why you need the structure of the model.

869

:

And so here it's, like, literally thinking much more about the generative process of the data

and basically realizing,

870

:

like in the ice cream and drowning example, where basically season, as Richard

McElreath would say, is a fork, right?

871

:

Because it causes both variables you're interested in, and unless you control or

condition on that variable, you're gonna have biased estimates.

872

:

And so here, thinking about that, you condition on the time played,

873

:

the time spent on the field and then you see that this predictive aspect basically

disappears because you've taken care of it.

874

:

In this paper we also showed that if you use some ideas from, back to the point about

having a broad toolbox, if you use some really basic tools from causal inference like

875

:

matching or propensity scores that you can actually solve this problem, right?

876

:

If these studies, instead of saying, let's predict whether a player gets injured or not,

if instead you had said, let's take two players at the same point of the season, one who

877

:

has a high acute:chronic ratio and one who doesn't, but control for all the other things, like

minutes played in the games and all this other stuff.

878

:

And if you did that, then you can actually control for this effect.
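
A minimal sketch of that matching idea, with simulated data and made-up covariates; the greedy one-to-one matching below is only one of many ways to do this, and nothing here reproduces the actual paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 1_000

# Fake covariates: week of season and recent minutes drive both the
# "treatment" (a high-ACWR spike) and downstream exposure.
week = rng.integers(1, 39, size=n).astype(float)
minutes = rng.normal(60 + 0.5 * week, 15)
spike = (rng.random(n) < 1 / (1 + np.exp(-(0.05 * week - 2)))).astype(int)

# Propensity score: P(spike | covariates).
Z = np.column_stack([week, minutes])
ps = LogisticRegression().fit(Z, spike).predict_proba(Z)[:, 1]

# Greedy 1:1 nearest-neighbor matching of each spiked player to the
# control player with the closest propensity -- comparing like with like.
treated = np.where(spike == 1)[0]
control = np.where(spike == 0)[0]
pairs = [(i, control[np.argmin(np.abs(ps[control] - ps[i]))]) for i in treated]
print(f"{len(pairs)} matched pairs; compare injury rates within the pairs.")
```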

879

:

so, yeah, it's like, I think it's...

880

:

just a really simple example of a case where, you know, a lot of that work is worth a lot of

money to teams, and sort of understanding these things well means, perhaps in this case, not

881

:

chasing down rabbit holes.

882

:

Yeah.

883

:

Yeah, I love that.

884

:

That's really fascinating.

885

:

And that's a solution that's like simple, but it can take so long to get there.

886

:

So thanks.

887

:

That's super interesting.

888

:

That's not a big model.

889

:

It's a good example because there's no real stats there.

890

:

It's just like a probabilistic.

891

:

I guess there is in a way, but it's super simple.

892

:

It's a very simple case of, really: are you defining the estimand, the thing you care

about, properly?

893

:

Yeah.

894

:

Or you end up with wildly wrong conclusions.

895

:

Yeah, for sure.

896

:

So I'm going to start winding us down.

897

:

Now it's like, I've already taken quite a lot of your time.

898

:

eh Maybe two other questions, if you can, before we go to the last two questions.

899

:

um

900

:

I'm curious, what do you see as the most exciting trends or advancement now in your field

and how you think they will impact sports analytics and decision making?

901

:

Yeah, I think there's probably a couple of things there.

902

:

The first is, as I mentioned earlier, just the growth of this really super interesting

data and all the statistical problems that come along with it.

903

:

That has sort of been a trend over the last five, ten years, but it certainly continues.

904

:

And it means that teams and sort of...

905

:

other businesses in the sports space are investing a lot, so you have a lot of teams growing

their analytics departments, and media companies and others spending in the space.

906

:

Another cool thing I've seen, which seems sort of like tangential but is actually really,

I think actually quite nice, is that...

907

:

When teams are building out their internal resources, every team typically has to do a lot

of the same things.

908

:

So they have to like set up a database, they have to build all the ETL to ingest the data,

they have to...

909

:

create like an IDM to sort of standardize across the different data sources.

910

:

They have to map players.

911

:

Anyone who's done like player mappings in soccer where you have tens of thousands of

players and you're trying to map, you know, uh data from one data provider to another for

912

:

some Brazilian player with like six names, like it's just impossible.

913

:

Right.

914

:

And so everyone's doing these things sort of on repeat before you can get to like the

interesting statistical problems.

915

:

Right.

916

:

And so one thing that I think has been great, a trend that has

just recently started, and actually Zelus has been a big part of that,

917

:

but there's others as well, is sort of realizing that's a problem and finding

centralized solutions for it.

918

:

So there's been cases, for example, of leagues taking some of that on and sort of saying,

all these teams don't need to each be spending hundreds of thousands of dollars a year,

919

:

like mapping players or doing this and this.

920

:

We'll create something.

921

:

Or Zelus has essentially said, hey, all these teams are doing a lot of the same things

when it comes to data engineering, and even sort of the early, simple stages of

922

:

modeling.

923

:

So let's sort of democratize that, and for a fraction of the cost teams can just

ingest all that.

924

:

And then, to use the baseball analogy...

925

:

Like they're not starting from nothing.

926

:

They're kind of starting on third base.

927

:

Right?

928

:

So that's sort of another thing that I think is huge progress in terms of creating

the space for sports analysts to actually work on, you know, challenging, interesting

929

:

problems.

930

:

No, for sure.

931

:

I mean, so first, now that I work in baseball, I can understand that metaphor.

932

:

So thank you.

933

:

And second, yeah, I mean,

934

:

I'm an open source developer so I'm not gonna tell you this is a wrong direction.

935

:

I think, like, you know, ideally if you'd asked me, I'd be like, eh, the leagues

should just take over all of that and put out all that tracking data and stuff like that.

936

:

Maybe not the latest frontier, because, well, for

937

:

the industry to keep progressing, you know, people still have to earn money, because

they were, like, the first ones to develop that kind of data.

938

:

You know, the older data and stuff like that, just put that on the league, and then the

league could just open source everything so that everyone

939

:

gets access to that.

940

:

Everyone has the same data.

941

:

And then it's just how you use the data, and the combination, not only, like, of

the models, but the combination of the people working on the data: the modelers,

942

:

the GMs, the coaches, like all these people together, the scientists, that's what really

makes the value out of the team much more than having that exact row in their database

943

:

that the other team doesn't have.

944

:

I think other industries also show that, you know.

945

:

Yeah, our edge at Toulouse did not come necessarily from having the best data.

946

:

Like I think we did actually have the best data and the best models, but it came from

execution.

947

:

There's no question about it.

948

:

Like, yeah, we probably did a little better than if we'd had simpler data, simpler

models.

949

:

Yeah, maybe we would have been a little less efficient, but uh there's a lot of clubs out

there, and I certainly won't say names here, but a lot of teams out there that have great

950

:

data and big analytics groups and so on.

951

:

And, you know, they treat them like houseplants.

952

:

They stick them in the corner and ignore them.

953

:

Right.

954

:

And you're not going to get any, you're not going to create competitive advantage doing

that.

955

:

No, no, no, for sure.

956

:

I mean, I see that a bit like, you know, high-quality cooking, right?

957

:

What makes a chef good is not only the ingredients; it's how they use the ingredients,

with whom they're working, to whom they're serving the plates, and

958

:

where they are feeding people.

959

:

You know, like, if you go to big restaurants, it's not only one table where they

just give you the food.

960

:

It's like a whole experience.

961

:

So.

962

:

It's, yeah, I think it's a great thing that we're moving in this

direction, and yeah, as you were saying, Zelus is doing a lot of that, and I think

963

:

that's really amazing.

964

:

Another question I have for you before the last two ones is, well, what's next for you?

You know, because you're a very curious person, and now you've sold Zelus, so I'm

965

:

curious are there

966

:

any upcoming projects you're particularly excited about for the months to come?

967

:

You know, I sort of spent the first 15 years of my career trying to be the best statistician

that I could.

968

:

I sort of had this personal mantra.

969

:

I was always personally offended if I didn't understand something or I didn't know a

method.

970

:

That's part of what drove me to learn so much.

971

:

It's just like...

972

:

feeling like I just wanted to sort of cover everything.

973

:

And so I spent so much of my career sort of being that, right?

974

:

And sort of growing as a statistician.

975

:

And mostly in academia, but of course with various consulting gigs and all sorts of industry

as well.

976

:

And then the last 15 years, with some overlap there, have been about sort of sports,

and about applying that expertise

977

:

really into sports, and how that leads to decision-making, negotiations,

finances, and the whole integration of all those things coming together

978

:

to ultimately go from, hey, how do we value players better?

979

:

Right through to like running a team and

980

:

and ultimately outperforming our payroll and all the way through to like creating equity

value for shareholders, right?

981

:

And I kind of feel like I'm at this point where I've put in a lot of time; you know, maybe

I've hit Malcolm Gladwell's 10,000 hours in both of these things.

982

:

And I feel like I'm, with Toulouse, we've sort of proven the thesis here.

983

:

So a lot of what's next for me right now is, I sort of want to keep leaning in on this.

984

:

I feel like I have these two

985

:

things sort of perfectly hybridized together, where the technical skills that I've built

over the years combine with the team management skills, and it's a super powerful

986

:

combination and a really valuable combination.

987

:

I like just love using those tools and these ideas to sort of play Revenge of the Nerds in

real life.

988

:

Like going out there and just dominating, using essentially, this is gonna sound very not

humble, but using our group's intellectual advantage to win.

989

:

And to me that's like super, super fun.

990

:

Yeah, I mean, I love that.

991

:

Definitely support it.

992

:

Like, Revenge of the Nerds, you had me there.

993

:

But yeah, I mean, of course, yeah.

994

:

And that's something I

995

:

also try to do with the podcast, like trying to...

996

:

percolate these ideas through more people.

997

:

um Because I think that's something that's needed a lot.

998

:

yeah.

999

:

Awesome.

:

01:24:11,406 --> 01:24:12,547

Well, Luke.

:

01:24:13,048 --> 01:24:14,678

I think we'll call it a show.

:

01:24:14,678 --> 01:24:20,620

I would still have so many questions; like, literally, I still have so many questions I

had for you today that I didn't ask you.

:

01:24:20,620 --> 01:24:22,511

Sorry that's my fault.

:

01:24:22,511 --> 01:24:24,831

I filibustered a couple of those questions there.

:

01:24:24,831 --> 01:24:27,152

No, no, that's, you know, that's me. I

:

01:24:27,152 --> 01:24:28,202

have... that's also my job. I

:

01:24:28,202 --> 01:24:32,493

have to, you know, adapt to the topics.

:

01:24:32,493 --> 01:24:33,724

But yeah, I mean that's cool.

:

01:24:33,724 --> 01:24:42,396

That means you can come back on the show next time you have a cool project to talk about,

and I'll get to ask you these questions.

:

01:24:42,736 --> 01:24:47,516

But first, before you leave, of course, I have to ask you the last two questions.

:

01:24:47,556 --> 01:24:49,996

The ones I ask every guest at the end of the show.

:

01:24:50,916 --> 01:24:53,356

So I'm going to change a bit the first one for you.

:

01:24:53,356 --> 01:24:54,556

First time I do that.

:

01:24:54,556 --> 01:25:00,196

But I think that you kind of answered the first one a bit already.

:

01:25:00,196 --> 01:25:10,596

And I'm curious because you have a particular background and origin story where if I

remember correctly, in this moneyball episode,

:

01:25:11,568 --> 01:25:25,528

you were saying that you actually started working on spatial temporal data with the

Sacramento Kings, because someone came into your office with like, some question, but the

:

01:25:25,528 --> 01:25:28,848

question was actually not for you, I think, if I remember correctly.

:

01:25:28,848 --> 01:25:39,464

And so that's a very, very random, you know, origin story, one that could be in

a movie, in a way, where it's like, the hero never wants to

:

01:25:39,630 --> 01:25:40,620

be a hero, right?

:

01:25:40,620 --> 01:25:43,543

It's like the situation imposes it onto him.

:

01:25:43,543 --> 01:25:47,369

So my question is a counterfactual for you.

:

01:25:47,369 --> 01:25:49,411

If that moment had not happened, right?

:

01:25:49,411 --> 01:26:01,683

If you had not been in your office at that time, at that place, and you hadn't met that

person and ended up working on on sports data, what do you think you would have done?

:

01:26:02,992 --> 01:26:10,472

Yes, I think everyone's lives are sort of built up with a lot of these just kind of random

events that accumulate into who you are.

:

01:26:10,632 --> 01:26:11,892

That was certainly one of them.

:

01:26:11,892 --> 01:26:22,972

I won't retell that story because, as you say, it's on the Wharton Moneyball podcast,

but I think I'd still be working in sort of very similar spaces, but possibly not sports,

:

01:26:22,972 --> 01:26:23,072

right?

:

01:26:23,072 --> 01:26:29,936

Sort of taking all these ideas, Bayes and so on, and...

:

01:26:29,936 --> 01:26:32,476

trying to apply it to some interesting problem.

:

01:26:32,476 --> 01:26:41,616

for some reason, I ended up working in a space which is working on a zero-sum game where

billionaire owners are trying to extract value from millionaire players and the

:

01:26:41,616 --> 01:26:43,996

millionaire players are trying to get dollars from the billionaires.

:

01:26:43,996 --> 01:26:54,656

So it's like a very strange space to work in, but I think I'd probably be in a very

similar spot but not in sports.

:

01:26:54,816 --> 01:26:59,816

I was on a path where I was doing a lot of stuff in climate; maybe it'd be that, maybe it'd

be...

:

01:26:59,864 --> 01:27:07,892

sort of some other domain, but I think it'd be sort of the same thing I'm doing now, but

just a different domain other than sports.

:

01:27:07,953 --> 01:27:09,034

Yeah, yeah.

:

01:27:09,034 --> 01:27:11,036

Yeah, I was thinking maybe agriculture.

:

01:27:11,036 --> 01:27:15,261

oh They do a lot of spatial temporal stuff over there.

:

01:27:15,261 --> 01:27:20,016

I actually published a paper once on crop yield predictions in the Canadian prairies.

:

01:27:20,016 --> 01:27:21,507

How exciting is that?

:

01:27:21,870 --> 01:27:23,842

Yeah, yeah, yeah, exactly.

:

01:27:23,842 --> 01:27:29,968

Actually, we worked on a similar project when I was at PyMC Labs.

:

01:27:29,968 --> 01:27:33,231

it was actually something like that.

:

01:27:33,452 --> 01:27:34,533

Gaussian processes.

:

01:27:34,533 --> 01:27:37,916

So that's a cool space because you can use a lot of Gaussian processes.

:

01:27:38,857 --> 01:27:42,460

That's also very challenging because Gaussian processes are hard to fit.

:

01:27:42,861 --> 01:27:44,743

But they are such cool beasts.

:

01:27:46,274 --> 01:27:54,009

And second question, if you could have dinner with any great scientific mind, dead, alive

or fictional, who would it be?

:

01:27:55,130 --> 01:28:00,533

Yeah, this is interesting because I know you've asked the same question to others, so I

sort of gave it some thought.

:

01:28:00,533 --> 01:28:08,258

You know, I've been really fortunate that over the last 20 years or so, I've had a lot of

tremendous dinners with people, right?

:

01:28:08,258 --> 01:28:10,680

Whether that's in academia, like...

:

01:28:10,882 --> 01:28:22,021

I just have great memories of dinners with people that you probably know,

Christian Robert and so on. And I had this amazing dinner in Bristol with Julian Besag in the

:

01:28:22,021 --> 01:28:22,732

last year of his life.

:

01:28:22,732 --> 01:28:24,103

I was a visitor there in Bristol.

:

01:28:24,103 --> 01:28:25,840

ah

:

01:28:25,840 --> 01:28:27,660

Peter Green at the same time.

:

01:28:27,660 --> 01:28:36,120

Just so many incredible dinners and sort of, and then of course now in the sports world

with coaches and owners and so on, I've just been really fortunate.

:

01:28:36,320 --> 01:28:43,560

So in sort of the explore-exploit sense, I think I have a pretty good idea of what the

distribution of good and bad dinners is.

:

01:28:43,560 --> 01:28:55,500

And so I think there's a really good chance that if I named someone I've never

met, it would be sort of below my expected return, if I were to

:

01:28:55,794 --> 01:28:57,354

explore rather than exploit.

:

01:28:57,534 --> 01:29:07,174

I think I would have to look at some of the people that I've had some of

the most interesting dinners with. And I thought about this a little bit, and I think the

:

01:29:07,174 --> 01:29:09,914

person I would name is Xiao-Li Meng.

:

01:29:10,210 --> 01:29:15,402

So Xiao-Li was the chair when I was hired at Harvard and then later became the Dean.

:

01:29:15,643 --> 01:29:19,184

And uh he is perhaps one of the most fascinating people I've ever met.

:

01:29:19,184 --> 01:29:24,620

He's just full of wit and humor and intelligence and just a kind human being.

:

01:29:24,620 --> 01:29:33,090

It doesn't hurt that he has a uh giant box of scotch uh sitting under his desk, which can

uh come in handy at times.

:

01:29:33,211 --> 01:29:35,432

And so, yeah, I think I would not explore.

:

01:29:35,432 --> 01:29:37,052

I think I would exploit.

:

01:29:37,413 --> 01:29:40,354

And by that, I mean, I haven't had dinner with Xiao-Li in probably

:

01:29:40,354 --> 01:29:44,416

a decade and I would love to sit down with him again.

:

01:29:44,957 --> 01:29:45,898

Nice, yeah.

:

01:29:45,898 --> 01:29:51,071

Well, I really like the structured answer.

:

01:29:51,071 --> 01:29:53,001

I don't think I've ever had that yet.

:

01:29:54,163 --> 01:29:55,823

So thank you so much.

:

01:29:56,604 --> 01:29:57,414

Yeah, awesome.

:

01:29:57,414 --> 01:29:59,286

Well, I think that's...

:

01:29:59,286 --> 01:30:01,807

Let's call it a show, Luke.

:

01:30:01,967 --> 01:30:03,218

That was really amazing.

:

01:30:03,218 --> 01:30:10,030

I'm really happy because we got to explore a lot of the questions I had for you from a...

:

01:30:10,030 --> 01:30:15,746

decision-making perspective but also got to be very nerdy so that's great.

:

01:30:16,347 --> 01:30:17,628

Thank you so much.

:

01:30:17,628 --> 01:30:22,884

As usual, we'll put links in the show notes for those who want to dig deeper.

:

01:30:22,884 --> 01:30:26,998

Thanks again, Luke, for taking the time and being on the show.

:

01:30:27,179 --> 01:30:27,979

Thank you, Alex.

:

01:30:27,979 --> 01:30:28,940

It was a blast.

:

01:30:33,796 --> 01:30:37,499

This has been another episode of Learning Bayesian Statistics.

:

01:30:37,499 --> 01:30:47,988

Be sure to rate, review, and follow the show on your favorite podcatcher, and visit

learnbayesstats.com for more resources about today's topics, as well as access to more

:

01:30:47,988 --> 01:30:52,071

episodes to help you reach a true Bayesian state of mind.

:

01:30:52,071 --> 01:30:54,033

That's learnbayesstats.com.

:

01:30:54,033 --> 01:30:58,877

Our theme music is Good Bayesian by Baba Brinkman, feat. MC Lars and Mega Ran.

:

01:30:58,877 --> 01:31:02,039

Check out his awesome work at bababrinkman.com.

:

01:31:02,039 --> 01:31:03,224

I'm your host,

:

01:31:03,224 --> 01:31:04,204

Alex Andorra.

:

01:31:04,204 --> 01:31:08,424

You can follow me on Twitter at alex underscore andorra, like the country.

:

01:31:08,424 --> 01:31:15,690

You can support the show and unlock exclusive benefits by visiting patreon.com slash

LearnBayesStats.

:

01:31:15,690 --> 01:31:18,071

Thank you so much for listening and for your support.

:

01:31:18,071 --> 01:31:20,382

You're truly a good Bayesian.

:

01:31:20,382 --> 01:31:23,503

Change your predictions after taking information in.

:

01:31:23,503 --> 01:31:30,530

And if you're thinking I'll be less than amazing, let's adjust those expectations.

:

01:31:30,530 --> 01:31:43,688

Let me show you how to be a good Bayesian Change calculations after taking fresh data in

Those predictions that your brain is making Let's get them on a solid foundation
