Football vs. Data Analysis


Wayne Rooney’s movement – England vs. Montenegro, second half (Wired)

It’s the day of the World Cup final, huzzah!

The tournament has produced a blizzard of social media-friendly infographics. Even the good old Beeb has got in on the act, and the Press Association has hired two talented footy data/visualisation chaps (cf Matchstory) to produce things like this:

Data ist sexy, ja?

I work with data every day (analysing games), and I’m a season ticket holder at Luton Town – winners of last year’s (Blue Square Bet Conference) Premier League – so I was curious about how games and football data analysis compare.


In games there is a ridiculous amount of data, millions and billions of data points – level success rates, clicks on a given button, use of boosters, and much much much more. Perhaps surprisingly, it’s the same in football. At the German nouveau-riche club TSG Hoffenheim, sensors are attached to players, cones and goalposts, and inserted into the footballs. With ten players training for just ten minutes with three balls, those sensors will track more than 7m data points.

Reading the Matrix

When you have lots of data, it needs analysis and interpretation. I’m lucky to work with a super-talented team who can identify and understand the patterns in our data. Top football clubs work the same way. Manchester City now have eleven people crunching the numbers, so they could field a team made up entirely of analysts.

Action Men

The aim of the analysis in both fields is to come up with ideas for action – now that we know x, we should do y. We do this in games the whole time, working with the production team to decide on new AB tests and feature priorities. Same in football: Manchester City hadn’t scored from a corner for 22 games, but by switching from out-swinging to in-swinging corners (as suggested by their data team), they scored nine goals from corners in 12 games, and won the title when Vincent Kompany headed in from a corner against Manchester United.
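To put rough numbers on that corner story, here is a minimal Python sketch. The figures (0 goals in 22 games, then 9 in 12) come from the paragraph above; the function names and the Poisson assumption are mine, purely for illustration, and say nothing about how City's analysts actually worked.

```python
import math


def goals_per_game(goals, games):
    """Average goals from corners per game."""
    return goals / games


def prob_no_goals(rate_per_game, games):
    """Chance of zero goals over `games`, ASSUMING goals arrive as a
    Poisson process at a constant rate (a simplification)."""
    return math.exp(-rate_per_game * games)


# Figures quoted in the text above.
before = goals_per_game(0, 22)   # out-swinging corners
after = goals_per_game(9, 12)    # in-swinging corners

# If the later rate had applied all along, how likely is a
# 22-game drought from corners?
drought_chance = prob_no_goals(after, 22)

print(f"before: {before:.2f} per game, after: {after:.2f} per game")
print(f"chance of a 22-game drought at the later rate: {drought_chance:.1e}")
```

The samples are small, so this is a sanity check rather than proof, but it suggests the change was unlikely to be luck alone.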

Art + Science

Data alone will not give you the answers. I wrote about this here: “What is B?” When you find a pattern or a problem in the data, and want to make a change, what do you change it to? Should we add a new booster? Should Liverpool switch to a different shape in central midfield? Data can help you identify a problem, but it doesn’t always provide the solution. So using data is an art as well as a science.

Too much love will kill you

There is an absolute avalanche of data available in games, as in sports, so it’s easy to have too much of it. Bolton’s Head of Analytic Development admitted that since their goalkeeper had started studying opposition penalty takers, he had actually saved fewer penalties – not the intended result. Sports players at their best operate in a state of flow, so over-thinking is a real risk to performance.

Ignore the ignorables

That sounds like an Ian Holloway quote, but I just made it up. In The Name Of The Rose, wise old William of Baskerville says that:

Learning does not consist only of knowing what we must or we can do, but also of knowing what we could do and perhaps should not do.

How true. With millions/billions of data points, you can’t look at everything – so you have to use instinct and experience to filter out the stuff that is of less importance or which will have less impact. Prioritisation, a constant battle! Simon Kuper, football data guy and writer, is convinced that football is in the very early stages of understanding how to use data to improve performance, because it’s pretty new and because knowing the right things to look at is not easy in a dynamic, unpredictable environment like a football match.

Measuring the wrong things

You have to look at the right things, because the wrong things can lead you astray. Alex Ferguson, grumpy erstwhile Manchester United manager, sold defender Jaap Stam in 2001 because Stam’s number of tackles was decreasing. Ferguson thought Stam was in decline – but Stam went on to play successfully at big clubs for several more years. It turns out that tackles are not a good yardstick for a defender’s value. Kuper points out that great defenders like Paolo Maldini actually don’t need to tackle that much, because their positional skills alone reduce opportunities for the opposing side.

The limits of data

Football is not like baseball (Moneyball) – it’s more dynamic, less structured, more anarchic; that makes it harder to apply analysis to improving performance. There’s a lot of stuff that cannot be measured or understood in a quantitative way. What does a player think or feel when they’re playing a mobile game? That’s tough to see in the data (though there are many qualitative ways to learn the answers). If you can only manage what you measure, then the limitations of your measurements are crucial.

The same applies to football: there are many limitations, even when it comes to one of the most structured elements of the game – the penalty kick. Again per Kuper, some players are predictable – e.g. at one stage, Diego Forlan alternated which side he would hit his penalties, left-right-left-right. So tracking his penalty taking would have been helpful. But other players are unpredictable: Franck Ribery mixes up his penalties seemingly at random. Unpredictability is rewarded, because it’s harder to combat – and unpredictability also reduces the usefulness of data analysis. You can’t see a pattern that’s not there.
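The penalty example can be sketched in a few lines of Python. This is only an illustration: `switch_rate`, `predict_next` and the 0.9 threshold are hypothetical names and values of my own, the alternating sequence mirrors the left-right-left-right pattern described above, and the mixed sequence is invented rather than any real player's record.

```python
def switch_rate(sides):
    """Fraction of consecutive penalties where the taker changed side."""
    if len(sides) < 2:
        raise ValueError("need at least two penalties")
    switches = sum(a != b for a, b in zip(sides, sides[1:]))
    return switches / (len(sides) - 1)


def predict_next(sides, threshold=0.9):
    """Guess the next side, or return None when there is no clear pattern.

    `threshold` is an arbitrary cut-off chosen for this sketch.
    """
    rate = switch_rate(sides)
    if rate >= threshold:            # almost always switches sides
        return "R" if sides[-1] == "L" else "L"
    if rate <= 1 - threshold:        # almost always repeats the same side
        return sides[-1]
    return None                      # no usable pattern


alternating = list("LRLRLR")   # the left-right-left-right habit
mixed = list("LLRLRRLR")       # invented, seemingly random sequence

print(switch_rate(alternating), predict_next(alternating))
print(round(switch_rate(mixed), 2), predict_next(mixed))
```

For the alternating taker the function commits to a side; for the mixed one it returns `None`, which is the honest answer when the pattern isn't there.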

So there you go. Several similarities – rather more in fact than I thought there would be when I started researching football data yesterday. Worthwhile further reading if you’re interested: New Statesman, The Guardian, BBC, and Wired.

What do you think?
