
Predicting Football Matches Using Shot Data Part Two

Introduction

Having found that the correlation between goals scored and shots on target was the strongest of the various shooting variables I had available to me, I decided to see how well they could predict the outcome of a football match.

Creating The Model

The obvious approach would have been to run a linear regression of goals scored against shots on target and then predict the average number of goals each team would be expected to score. This doesn’t provide much insight though. The average scoreline might be of interest if the two teams were going to play each other 20 or 30 times a season, but for a single game it is pretty much irrelevant.

What is of more use is to predict the actual odds for each possible outcome between the teams. In other words, what is the probability of each team winning, drawing or losing?

To do this I looked at how many shots on target each team achieved and conceded per match compared with the league’s average, and used that to estimate how many they would be expected to have against each other. This estimate was then mapped to the distribution of their shots on target over the season so far, and their shot conversion rate was used to calculate the probability of each number of goals they could score. Each match was then played one million times as part of a Monte Carlo simulation to see what the likely outcomes were.
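As a rough illustration of the steps above, here is a minimal Python sketch of this kind of simulation. It is not the code behind the model, and several details are assumptions made for the example: expected shots on target are taken as the team's average multiplied by the opponent's average conceded and divided by the league average, shots on target are drawn from a Poisson distribution (the post only says they were mapped to the team's observed distribution), each shot on target is converted independently at the team's conversion rate, and the function names and input numbers are hypothetical. It also runs far fewer than one million simulations to keep a pure-Python run quick.

```python
import math
import random


def expected_sot(team_for, opp_against, league_avg):
    """Assumed scaling: team's shots on target, adjusted by how many the
    opponent concedes relative to the league average."""
    return team_for * opp_against / league_avg


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_goals(rng, mean_sot, conversion):
    """Sample shots on target, then convert each one independently."""
    shots = sample_poisson(rng, mean_sot)
    return sum(1 for _ in range(shots) if rng.random() < conversion)


def simulate_match(home_sot, away_sot, home_conv, away_conv,
                   n_sims=20000, seed=0):
    """Play the match n_sims times and return outcome proportions."""
    rng = random.Random(seed)
    counts = {"home": 0, "draw": 0, "away": 0}
    for _ in range(n_sims):
        home_goals = sample_goals(rng, home_sot, home_conv)
        away_goals = sample_goals(rng, away_sot, away_conv)
        if home_goals > away_goals:
            counts["home"] += 1
        elif home_goals == away_goals:
            counts["draw"] += 1
        else:
            counts["away"] += 1
    return {outcome: n / n_sims for outcome, n in counts.items()}


# Hypothetical inputs: per-match averages and conversion rates.
league_avg = 4.5
home_sot = expected_sot(team_for=6.0, opp_against=5.0, league_avg=league_avg)
away_sot = expected_sot(team_for=3.5, opp_against=3.0, league_avg=league_avg)
probs = simulate_match(home_sot, away_sot, home_conv=0.30, away_conv=0.30)
print(probs)
```

The returned dictionary holds the estimated probabilities of a home win, a draw and an away win. Swapping the Poisson draw for a sample from a team's actual shots-on-target history would bring the sketch closer to the approach described above.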

Are the Predictions Accurate?

One difficulty with a model like this is assessing its accuracy. With a traditional linear model you can just look at the $r^2$ value to see how well your predictions match the actual results. The higher the $r^2$ value, the better your model fits.
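For reference, the usual definition of $r^2$ for a linear model is shown below. The symbols are introduced here for illustration: $y_i$ are the observed values, $\hat{y}_i$ the values predicted by the model and $\bar{y}$ the mean of the observed values.

$$
r^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
$$

A value close to 1 means the predictions account for most of the variation in the observed results.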

But with a probability model this doesn’t work. For example, take the situation where the probability model predicts Team A have a 75% chance of beating Team B. Even if the model has calculated these odds perfectly, Team A will still fail to win 25% of the time, making it look like the prediction was incorrect.
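A quick simulation makes the point concrete. The sketch below assumes a hypothetical favourite whose true chance of winning really is 75% and counts how often it wins over many independent matches; the function name and parameters are made up for the example.

```python
import random


def favourite_win_rate(p_win=0.75, n_matches=10000, seed=42):
    """Share of matches won by a favourite whose true win chance is p_win."""
    rng = random.Random(seed)
    wins = sum(1 for _ in range(n_matches) if rng.random() < p_win)
    return wins / n_matches


rate = favourite_win_rate()
print(f"Favourite won {rate:.1%} of matches")
print(f"Prediction looked 'wrong' in {1 - rate:.1%} of matches")
```

Even with perfectly calculated odds, roughly a quarter of the individual predictions appear to fail, which is why a single match says very little about the quality of the model.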

One alternative is to identify the most probable outcome for each match (win, draw or loss) and compare that with what actually happened to see if they match. To do this I applied the model retrospectively to all the matches from the 2011–2012 English Premier League season. Overall, the proportions of outcomes predicted closely matched what actually happened (Figure 1).

Figure 1: Proportion of outcomes predicted compared with actual results for 2011–2012 English Premier League season

Another test we can do is to compare the Shot on Target model with other models to see how well it performs. Again I picked the most probable outcome from my odds and this time compared it with the most probable outcome from Bet365’s odds for the entire 2011–2012 English Premier League season. I also guessed the outcome of each match at random to see how the model compared with pure luck.
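The comparison itself is simple to express in code. The sketch below uses a handful of made-up probabilities and results purely for illustration (they are not data from the 2011–2012 season), picks the most probable outcome for each match, and measures how often that pick matched the result, alongside a random-guess baseline.

```python
import random

OUTCOMES = ("home", "draw", "away")


def most_probable(probs):
    """Return the outcome with the highest predicted probability."""
    return max(OUTCOMES, key=lambda outcome: probs[outcome])


def hit_rate(picks, results):
    """Proportion of matches where the pick equals the actual result."""
    hits = sum(1 for pick, result in zip(picks, results) if pick == result)
    return hits / len(results)


# Hypothetical model output and actual results for four matches.
model_probs = [
    {"home": 0.55, "draw": 0.25, "away": 0.20},
    {"home": 0.30, "draw": 0.30, "away": 0.40},
    {"home": 0.45, "draw": 0.30, "away": 0.25},
    {"home": 0.20, "draw": 0.25, "away": 0.55},
]
results = ["home", "draw", "home", "away"]

model_picks = [most_probable(p) for p in model_probs]

# Baseline: pick one of the three outcomes uniformly at random.
rng = random.Random(1)
random_picks = [rng.choice(OUTCOMES) for _ in results]

print("model hit rate:", hit_rate(model_picks, results))  # 0.75 on this toy data
print("random hit rate:", hit_rate(random_picks, results))
```

With three possible outcomes, uniform random guessing is expected to be right about a third of the time over a large number of matches, which gives a floor to compare any model against.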

Prediction Results

Overall, the Shot on Target model’s most probable outcome correctly matched what actually happened for 43% of the matches tested compared with 52% for Bet365 and 33% from randomly guessing.

Interestingly, even the bookies’ most probable outcome was only correct for around half the matches, so the Shot on Target model is doing pretty well at 43% and isn’t that far behind the professional odds compilers. This is also only the first stage of the model, and there are still plenty of ways it can be tweaked to try to improve its accuracy further.

Comments

Laurie - January 31, 2013

Reading your blog, it seems that shots on goal is a good thing to study to determine a win/draw/lose prediction of a game. I have an ELO system in Excel and also power ratings. However, I want to add shots : shots on target : goals to my Excel system, but am unsure how to go about doing this. I’m hoping you can help. Maybe send me an email with some advice. I’d be very grateful for your help.

Cheers

Laurie

BernieW - February 4, 2013

I did notice that Barca did not perform as well when their shots & shots on target were below their average PLUS the opposition had their shots & shots on target above the average. Do you use clear chances as a metric? I cannot find this statistic recorded by any site and I get the “feeling” that this would be more accurate a predictor. It would be great to have this for all the 5/6 major leagues for a few seasons.

Martin Eastwood - February 4, 2013

It is an interesting idea. I am looking at ways to improve the model by adding in extra metrics, so I’ll take a look at it if I can find any data available.

David - February 3, 2014

Great post. I have always wanted to use something other than linear regression. Could you make a short tutorial or send me an email on how you did the part where you create the model?

How do you estimate how many shots on target they are expected to have against each other, and how do you end up with the score probabilities?

I would love to be able to recreate this model you have made.

Looking forward to your reply and keep up the good work.

Nick - April 16, 2014

I came across a blog post (http://www.soccerstatistically.com/blog/2011/11/9/how-to-succeed-in-the-epl-chances-created-and-chance-convers.html) that analyzes chances created and goals scored. There is a correlation (adding in the chance conversion rate). The author uses data from Opta.

Martin Eastwood - April 16, 2014

Thanks for the link!

Nick - April 16, 2014

I have thought about the same: goal chances, although a little bit subjective and more difficult to define than shots on target (is a shot from 30 yards that blazed over the bar a goal chance?), should be a much better predictor than shots on target, because shots on target do not tell you anything about the quality of those shots and exclude great goal chances that resulted in shots off target or no shot at all.

Nick - June 20, 2014

This may be interesting for you: http://www.pinnaclesports.com/online-betting-articles/05-2014/world-cup-total-shots-ratio.aspx

Martin Eastwood - June 21, 2014

Thanks for the link Nick!

About

Pena.lt/y is a site dedicated to football analytics. You'll find lots of research, tutorials and examples on the blog and on GitHub.
