
Applying the Pythagorean Expectation to Football: Part One

Introduction

The Baseball Pythagorean Expectation is a formula originally derived by Bill James to estimate how many games a baseball team could be expected to win over a season based on the number of runs they score and concede (Figure 1). Teams winning fewer games than their Pythagorean prediction are considered to have been unlucky, while those outperforming it are thought to have had luck on their side.

$\text{win ratio} = \dfrac{\text{runs scored}^2}{\text{runs scored}^2 + \text{runs allowed}^2}$

Figure 1: The Baseball Pythagorean Expectation
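As a quick illustration, the formula can be written as a small Python function. This is a minimal sketch: the function name `pythagorean_win_ratio` and the 800/700 run totals are hypothetical, and the exponent is left as a parameter (defaulting to the 2 in Figure 1). The result is the fraction of games won, so multiplying it by a 162-game baseball schedule turns it into expected wins.

```python
def pythagorean_win_ratio(scored: float, allowed: float, exponent: float = 2.0) -> float:
    """Expected fraction of games won under the Pythagorean expectation."""
    scored_term = scored ** exponent
    allowed_term = allowed ** exponent
    return scored_term / (scored_term + allowed_term)


# Hypothetical season: 800 runs scored, 700 allowed, 162-game schedule.
ratio = pythagorean_win_ratio(800, 700)
expected_wins = ratio * 162
print(f"win ratio {ratio:.3f}, expected wins {expected_wins:.1f}")
# prints: win ratio 0.566, expected wins 91.8
```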

The formula works well for baseball, giving predictions generally within three games of what actually happens. The Pythagorean expectation has also been applied successfully to other sports, including American football and basketball. However, so far the equation has not worked particularly well for predicting football matches.

Table 1 shows goals scored and conceded in the English Premier League (EPL) during the 2011–2012 season, along with each team's actual points and Pythagorean predicted points. Comparing predicted and actual points, it is clear that the Pythagorean expectation over-predicts at the top of the table and under-predicts at the bottom.

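| Team | Goals Scored | Goals Conceded | Actual Pts | Pythagorean Pts |
|---|---|---|---|---|
| Manchester City | 93 | 29 | 89 | 104 |
| Manchester United | 89 | 33 | 89 | 100 |
| Arsenal | 74 | 49 | 70 | 79 |
| Tottenham Hotspur | 66 | 41 | 69 | 82 |
| Newcastle United | 56 | 51 | 65 | 62 |
| Chelsea | 65 | 46 | 64 | 76 |
| Everton | 50 | 40 | 56 | 70 |
| Liverpool | 47 | 40 | 52 | 66 |
| Fulham | 48 | 51 | 52 | 54 |
| West Bromwich Albion | 45 | 52 | 47 | 49 |
| Swansea City | 44 | 51 | 47 | 49 |
| Norwich City | 52 | 66 | 47 | 44 |
| Sunderland | 45 | 46 | 45 | 56 |
| Stoke City | 36 | 53 | 45 | 36 |
| Wigan Athletic | 42 | 62 | 43 | 36 |
| Aston Villa | 37 | 53 | 38 | 37 |
| Queens Park Rangers | 43 | 66 | 37 | 34 |
| Bolton Wanderers | 46 | 77 | 36 | 30 |
| Blackburn Rovers | 48 | 78 | 31 | 31 |
| Wolverhampton Wanderers | 40 | 82 | 25 | 22 |
| RMSE | | | | 8.4 |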

Table 1: Pythagorean Expectation for the EPL 2011–2012
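Table 1 does not spell out how a win ratio becomes a points total, but the Pythagorean points column is consistent with multiplying the ratio by 114, the maximum available from 38 matches at three points per win. The sketch below assumes that conversion (the helper `pythagorean_points` is hypothetical, not from the article) and reproduces three rows of the table.

```python
# Assumption: Pythagorean points = win ratio * 114 (38 matches x 3 points),
# inferred from the values in Table 1 rather than stated in the article.
MATCHES = 38
POINTS_PER_WIN = 3
MAX_POINTS = MATCHES * POINTS_PER_WIN  # 114


def pythagorean_points(goals_for: float, goals_against: float, exponent: float = 2.0) -> float:
    """Expected league points: the Pythagorean win ratio scaled to a full season."""
    scored = goals_for ** exponent
    conceded = goals_against ** exponent
    return MAX_POINTS * scored / (scored + conceded)


# (team, goals scored, goals conceded) for three rows of Table 1
sample = [
    ("Manchester City", 93, 29),
    ("Arsenal", 74, 49),
    ("Wolverhampton Wanderers", 40, 82),
]
results = {team: round(pythagorean_points(gf, ga)) for team, gf, ga in sample}
for team, pts in results.items():
    print(f"{team}: {pts}")  # 104, 79 and 22, as in the Pythagorean Pts column
```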

We can quantify this error by calculating the root-mean-square error (RMSE). This squares the difference between each predicted and actual points total, averages those squares, and then takes the square root of the average. It sounds complicated, but the squaring simply stops positive and negative errors from cancelling each other out (it also gives larger errors more weight). Imagine we predicted just two values and were -10 points out for the first and +10 points out for the second. If we simply averaged these two errors, the average error would be zero, making it look like our prediction was perfect when it obviously was not. If instead we square the errors, average them and then take the square root, we get the correct error of ten points. Doing this calculation for Table 1 gives an RMSE of 8.4 points, meaning that on average the Pythagorean expectation was roughly eight points out for the 2011–2012 season.
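The RMSE calculation is short enough to write out directly. This sketch (helper names are hypothetical) first reproduces the two-prediction example, where the plain average error is zero but the RMSE is ten, and then applies the same function to Table 1, assuming Pythagorean points are the win ratio multiplied by 114 (38 matches × 3 points), which matches the values in the table.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error: square each error, average, then square-root."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Toy example from the text: one prediction 10 points low, one 10 points high.
toy_actual = [50, 50]
toy_pred = [40, 60]
errors = [p - a for p, a in zip(toy_pred, toy_actual)]
print(sum(errors) / len(errors))   # 0.0: the plain average hides both misses
print(rmse(toy_pred, toy_actual))  # 10.0

# Table 1 in table order (Manchester City first, Wolves last).
GF = [93, 89, 74, 66, 56, 65, 50, 47, 48, 45, 44, 52, 45, 36, 42, 37, 43, 46, 48, 40]
GA = [29, 33, 49, 41, 51, 46, 40, 40, 51, 52, 51, 66, 46, 53, 62, 53, 66, 77, 78, 82]
PTS = [89, 89, 70, 69, 65, 64, 56, 52, 52, 47, 47, 47, 45, 45, 43, 38, 37, 36, 31, 25]


def pythagorean_points(gf, ga, exponent=2.0, max_points=114):
    """Assumed conversion: Pythagorean win ratio times 114 available points."""
    return max_points * gf ** exponent / (gf ** exponent + ga ** exponent)


predicted = [pythagorean_points(f, a) for f, a in zip(GF, GA)]
table_rmse = rmse(predicted, PTS)
print(f"RMSE: {table_rmse:.2f}")
```

On the rounded values printed in Table 1 this comes out a little above 8.5 rather than exactly 8.4, so treat it as an illustration of the method rather than an exact reproduction of the quoted figure.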

The more accurate the predictions, the lower the RMSE. One way to improve the prediction is to alter the exponent used in the equation; in other words, instead of raising goals scored and conceded to the power of two, we use a different value. Figure 2 shows what happens to the RMSE as the exponent is varied from 0.1 to 3. The RMSE is lowest with an exponent of 1.35, giving an average error of 5.75 points, nearly three points lower than before.
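A chart like Figure 2 can be approximated with a brute-force search: compute the RMSE for every exponent on a grid and keep the smallest. This is an assumption about how such a chart might be produced, not the author's code; it uses a step of 0.01 and only the 2011–2012 data from Table 1, so the best exponent it finds need not be exactly 1.35.

```python
import math

# Table 1 in table order (Manchester City first, Wolves last).
GF = [93, 89, 74, 66, 56, 65, 50, 47, 48, 45, 44, 52, 45, 36, 42, 37, 43, 46, 48, 40]
GA = [29, 33, 49, 41, 51, 46, 40, 40, 51, 52, 51, 66, 46, 53, 62, 53, 66, 77, 78, 82]
PTS = [89, 89, 70, 69, 65, 64, 56, 52, 52, 47, 47, 47, 45, 45, 43, 38, 37, 36, 31, 25]


def rmse_for_exponent(exponent, max_points=114):
    """RMSE between actual points and Pythagorean points for one exponent."""
    squared = []
    for gf, ga, pts in zip(GF, GA, PTS):
        pred = max_points * gf ** exponent / (gf ** exponent + ga ** exponent)
        squared.append((pred - pts) ** 2)
    return math.sqrt(sum(squared) / len(squared))


# Exponents 0.10, 0.11, ..., 3.00 (built from integers to avoid float drift).
exponents = [i / 100 for i in range(10, 301)]
scores = {e: rmse_for_exponent(e) for e in exponents}
best_exponent = min(scores, key=scores.get)

print(f"best exponent {best_exponent:.2f}, RMSE {scores[best_exponent]:.2f}")
print(f"exponent 2.00, RMSE {scores[2.0]:.2f}")
```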


Figure 2: Effect of Altering Exponent on RMSE

The next logical step is to try using a different exponent for each of the three terms in the equation. This makes the formula harder to optimize, but applying a technique called least squares gives optimal exponents of 1.39, 1.43 and 0.98. Unfortunately, this has little effect on the RMSE, reducing it by just 0.1 to 5.65 points.
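With three exponents a grid search gets expensive, which is where least squares helps. The sketch below uses SciPy's `scipy.optimize.least_squares` and assumes the model GF^a / (GF^b + GA^c), scaled by 114 points; this is one reading of "a different exponent for each term", and the text does not say which of 1.39, 1.43 and 0.98 belongs to which term. It starts from the original exponent of 2 everywhere and fits only the 2011–2012 rows of Table 1, so its estimates may differ from the figures quoted.

```python
import numpy as np
from scipy.optimize import least_squares

# Table 1 in table order (Manchester City first, Wolves last).
GF = np.array([93, 89, 74, 66, 56, 65, 50, 47, 48, 45, 44, 52, 45, 36, 42, 37, 43, 46, 48, 40], dtype=float)
GA = np.array([29, 33, 49, 41, 51, 46, 40, 40, 51, 52, 51, 66, 46, 53, 62, 53, 66, 77, 78, 82], dtype=float)
PTS = np.array([89, 89, 70, 69, 65, 64, 56, 52, 52, 47, 47, 47, 45, 45, 43, 38, 37, 36, 31, 25], dtype=float)
MAX_POINTS = 114.0  # assumed: 38 matches x 3 points


def residuals(params):
    """Predicted minus actual points for exponents (a, b, c) in GF**a / (GF**b + GA**c)."""
    a, b, c = params
    predicted = MAX_POINTS * GF ** a / (GF ** b + GA ** c)
    return predicted - PTS


def rmse(params):
    """Root-mean-square of the residuals for a given set of exponents."""
    r = residuals(params)
    return float(np.sqrt(np.mean(r ** 2)))


start = np.array([2.0, 2.0, 2.0])  # the original single-exponent formula
fit = least_squares(residuals, start)  # minimises the sum of squared residuals

print("exponents a, b, c:", np.round(fit.x, 2))
print(f"RMSE at start {rmse(start):.2f}, after fit {rmse(fit.x):.2f}")
```

Because the default trust-region solver only accepts steps that reduce the sum of squared residuals, the fitted RMSE will be no worse than the starting one; how much better depends on the data used.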

So far the predictions are still nearly six points out on average, but in part two of this article I will discuss why the error remains high and show how to reduce it further to increase the accuracy of the predictions.


About

Pena.lt/y is a site dedicated to football analytics. You'll find lots of research, tutorials and examples on the blog and on GitHub.
