Football Prediction Models: Which Ones Work the Best?

Published on March 10, 2025 by Martin Eastwood

Introduction

I've recently released version 1.1.0 of my penaltyblog Python package, bringing significant improvements to the speed and predictive performance of football (soccer) goals models. With this update, I thought it would be a great opportunity to compare the different models available — such as Poisson, Dixon and Coles, and more — exploring how they work, how to optimize their parameters, and how they perform on real-world data.

Let's start off with a high-level look at the different models available: how they work, where they are strong and where they are weak.

What is the Poisson Goals Model?

One of the most widely used models for predicting football scores is the Poisson goals model. It assumes that the number of goals scored by each team follows a Poisson distribution — essentially meaning that goals occur randomly over time, but with a certain average rate (known as λ). This rate is determined by factors like a team’s attacking strength, their opponent’s defensive ability, and whether they are playing at home or away.
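
To make the idea concrete, here is a minimal sketch of how two Poisson rates turn into home win, draw and away win probabilities. It is plain Python rather than penaltyblog's implementation, and the rates 1.6 and 1.1 are made-up values for illustration, not fitted parameters.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals given an average rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def match_probabilities(lam_home, lam_away, max_goals=10):
    """Sum independent Poisson scoreline probabilities into home/draw/away."""
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return home, draw, away


# Hypothetical rates: the home side averages 1.6 goals, the away side 1.1
home, draw, away = match_probabilities(1.6, 1.1)
print(f"home={home:.3f} draw={draw:.3f} away={away:.3f}")
```

Scorelines above max_goals are ignored, so the three probabilities sum to very slightly less than one.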

Strengths of the Poisson Goals Model

  • Simple and efficient – It’s easy to understand and requires only a few parameters, making it quick to calculate.
  • Decent predictive power – Despite its simplicity, it often performs well in forecasting match outcomes.
  • Useful for betting and analytics – Many bookmakers and analysts use variations of the Poisson model as a baseline.

Weaknesses of the Poisson Goals Model

  • Overestimates extreme results – Because it assumes goal events are independent, it sometimes predicts too many high-scoring games.
  • Doesn’t handle low-scoring bias well – Football has more 0-0 and 1-1 draws than the model naturally expects.
  • Ignores game dynamics – It doesn’t account for strategic shifts (e.g., teams playing more defensively when leading).

To address some of these limitations, more sophisticated models have been developed.

What is the Dixon and Coles Goals Model?

The Dixon and Coles goals model is an improvement on the standard Poisson approach for predicting football scores. While the Poisson model assumes goals are scored independently, Dixon and Coles recognized that this assumption doesn’t always hold in real-world matches. In particular, the lowest-scoring results (0-0, 1-0, 0-1, and 1-1) occur at different rates than a basic Poisson model would predict, with draws such as 0-0 and 1-1 typically happening more often. To correct this, they introduced an additional adjustment factor that modifies the probabilities of these low-score outcomes, making the model more accurate for practical football forecasting.
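
As a rough sketch of that adjustment, the snippet below applies the low-score correction from the Dixon and Coles paper to independent Poisson probabilities. The function names and the values of the rates and rho are my own illustrative choices, not penaltyblog's internals or fitted estimates.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)


def dc_tau(h, a, lam_home, lam_away, rho):
    """Dixon and Coles correction factor; only the four lowest scores change."""
    if h == 0 and a == 0:
        return 1.0 - lam_home * lam_away * rho
    if h == 0 and a == 1:
        return 1.0 + lam_home * rho
    if h == 1 and a == 0:
        return 1.0 + lam_away * rho
    if h == 1 and a == 1:
        return 1.0 - rho
    return 1.0


def dc_score_prob(h, a, lam_home, lam_away, rho):
    """Adjusted probability of the scoreline h-a."""
    independent = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
    return dc_tau(h, a, lam_home, lam_away, rho) * independent


# Hypothetical values: a negative rho shifts probability towards 0-0 and 1-1
for score in [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]:
    print(score, round(dc_score_prob(*score, 1.4, 1.1, -0.1), 4))
```

The four corrections cancel each other out, so the adjusted scoreline probabilities still sum to one.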

Strengths of the Dixon and Coles Goals Model

  • More realistic score predictions – It better accounts for the observed tendency of football matches to produce more low-score draws.
  • Improves predictive accuracy – By adjusting goal probabilities, it refines match outcome predictions, especially for betting and analytics.
  • Still relatively simple – It builds on the Poisson model without adding excessive complexity, making it practical to implement.

Weaknesses of the Dixon and Coles Goals Model

  • Less effective for high-scoring matches – Since it primarily corrects low-score probabilities, it may not offer much improvement for teams or leagues where high-scoring games are common.
  • Parameter estimation can be trickier – The extra adjustment introduces an additional parameter that needs to be optimized, making implementation more complex.
  • Assumes the same adjustment applies to all matches – The correction factor is applied uniformly, meaning it doesn’t adapt dynamically to different match contexts or team styles.

Overall, the Dixon and Coles model offers a small but meaningful refinement over Poisson, making it a better choice when predicting match results, particularly in low-scoring leagues or competitions where draws are common.

What is the Bivariate Poisson Goals Model?

The Bivariate Poisson goals model is a more advanced approach to predicting football match scores that builds on the standard Poisson goals model by introducing correlation between the number of goals scored by each team.

Unlike the basic Poisson goals model, which assumes that each team’s goal count is independent of the other, the Bivariate Poisson goals model acknowledges that certain match factors, such as overall game tempo, attacking intent, or defensive frailties, can influence both teams simultaneously and therefore both teams' scores are dependent on each other.

To achieve this, the model introduces a shared dependency term, which captures the extent to which goal-scoring events are linked. For example, in high-tempo matches where both teams play aggressively, a high-scoring game (like 3-2) may be more likely than two independent Poisson processes would suggest. And, in defensive matchups, both teams might struggle to score, increasing the likelihood of a low-scoring draw.
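
One common way to write this down is to give each team its own Poisson component plus a shared component that is added to both scores. The sketch below uses that textbook construction with made-up rates; it is an illustration of the idea rather than penaltyblog's own code.

```python
import math


def bivariate_poisson_pmf(x, y, lam1, lam2, lam3):
    """P(home = x, away = y) where home = W1 + W3 and away = W2 + W3.

    W1, W2, W3 are independent Poisson variables with rates lam1, lam2, lam3,
    so lam3 is the shared term (and the covariance between the two scores).
    Assumes lam1 > 0 and lam2 > 0.
    """
    base = (
        math.exp(-(lam1 + lam2 + lam3))
        * lam1**x / math.factorial(x)
        * lam2**y / math.factorial(y)
    )
    shared = sum(
        math.comb(x, k) * math.comb(y, k) * math.factorial(k)
        * (lam3 / (lam1 * lam2)) ** k
        for k in range(min(x, y) + 1)
    )
    return base * shared


# Hypothetical rates: compare a 2-2 draw with and without the shared term,
# keeping each team's average goals the same (1.5 and 1.2)
print(bivariate_poisson_pmf(2, 2, 1.2, 0.9, 0.3))
print(bivariate_poisson_pmf(2, 2, 1.5, 1.2, 0.0))
```

Setting lam3 to zero recovers two independent Poisson distributions.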

Strengths of the Bivariate Poisson Goals Model

  • Accounts for correlation in goal-scoring – Unlike simpler models, it recognizes that teams don’t score in isolation; match dynamics often affect both sides.
  • Improves predictions across all scorelines – While Dixon and Coles only corrects for low-score biases, the Bivariate Poisson goals model affects high-score predictions as well.
  • More flexible than Poisson and Dixon and Coles – it extends the standard Poisson goals framework without imposing arbitrary score adjustments.

Weaknesses of the Bivariate Poisson Goals Model

  • More complex to estimate – the additional correlation parameter makes it harder to fit the model to data, requiring more advanced statistical techniques.
  • Less interpretable than simpler models – while Poisson-based models are easy to explain, the dependency structure in Bivariate Poisson makes it more abstract.
  • Can be unnecessary for low-scoring leagues – if most matches end with 0-0, 1-0, or 1-1 scorelines, the extra complexity might not give noticeable accuracy gains over simpler models.

Overall, the Bivariate Poisson goals model is a strong choice for capturing the interaction between teams, particularly in leagues or matchups where attacking styles, game tempo, or defensive weaknesses lead to both high and low-scoring outcomes. However, its added complexity means it may not always be worth the effort compared with a simpler model like Dixon and Coles.

What is the Zero-inflated Poisson Goals Model?

The Zero-Inflated Poisson (ZIP) goals model is an extension of the standard Poisson goals model that accounts for the fact that football matches often have more scorelines in which a team fails to score (0-0, 1-0, 0-1) than a standard Poisson process predicts. In simple terms, the ZIP model assumes that some matches have a high probability of producing zero goals due to defensive tactics, lack of attacking quality, or other match-specific factors that the standard Poisson goals model does not capture.

This is done by introducing an extra parameter, which represents the probability that a match belongs to a special zero-inflated category rather than following the usual Poisson goal distribution. If a match is in this category, its goal count is forced to be zero. Otherwise, the number of goals is drawn from a standard Poisson distribution. This allows the model to explicitly account for the excess number of goalless matches while still modeling other scorelines using a Poisson process.
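
The mixture described above can be sketched in a few lines. Here pi is the zero-inflation probability and lam the usual Poisson rate; both names and the example values are assumptions for illustration, and penaltyblog may parameterise the model differently.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: with probability pi the count is forced to zero,
    otherwise it is drawn from a Poisson distribution with rate lam."""
    p = (1.0 - pi) * poisson_pmf(k, lam)
    if k == 0:
        p += pi
    return p


# Hypothetical values: 8% zero inflation on top of a rate of 1.4 goals
for k in range(4):
    print(k, round(poisson_pmf(k, 1.4), 4), round(zip_pmf(k, 1.4, 0.08), 4))
```

Only the probability of zero goals goes up; every other count is scaled down by the same factor of 1 - pi.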

Strengths of the Zero-Inflated Poisson Goals Model

  • Better at modeling goalless matches – it accounts for the tendency of football matches to have more 0-0 results than a pure Poisson process would predict.
  • Improves accuracy for defensive teams – useful in competitions or teams where ultra-defensive tactics result in frequent low-scoring games.
  • Still relatively simple – it only adds one extra parameter to the Poisson model.

Weaknesses of the Zero-Inflated Poisson Goals Model

  • Only adjusts for excess zero-score games – the ZIP goals model specifically corrects for too many goalless matches but does not address other common Poisson goals model issues, such as overestimating extreme scorelines (e.g., 4-3 or 5-2 results).
  • Assumes zero-inflation is the same across all matches – the model applies a single probability for zero-goal inflation, meaning it doesn’t adapt dynamically to different teams, leagues, or match conditions (e.g., some teams naturally play more defensive football than others).
  • May not improve predictions in high-scoring leagues – in competitions where 0-0 results are not unusually frequent, the extra complexity of zero-inflation is unnecessary and may not lead to better forecasts.

Overall, the Zero-Inflated Poisson goals model is a useful model when working with leagues or teams that have a higher-than-expected number of goalless results; however, it doesn’t adjust for other scoreline distortions.

What is the Negative Binomial Goals Model?

The Negative Binomial goals model is an extension of the standard Poisson goals model that addresses the issue of overdispersion — where the variance in goal counts is greater than the mean, which the Poisson model cannot handle properly. In football data, overdispersion often occurs because real-world score lines include more variability than the simple Poisson process predicts.

The Negative Binomial goals model attempts to solve this by introducing an extra parameter, which allows the variance to be larger than the mean, making it more flexible in capturing a wider range of goal distributions. Instead of assuming a fixed rate of goal-scoring for each team, it accounts for additional variability in scoring ability between different matches. This makes it particularly useful for leagues or teams where results can fluctuate significantly from game to game.
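
As a small illustration of that extra parameter, the sketch below uses the mean and dispersion form of the Negative Binomial, where the variance is mu + mu**2 / r. The parameter names and values are my own assumptions for the example and may not match how penaltyblog parameterises the model.

```python
import math


def neg_binomial_pmf(k, mu, r):
    """Negative Binomial probability of k goals with mean mu and dispersion r.

    The variance is mu + mu**2 / r, so a smaller r means more overdispersion
    and a very large r approaches the Poisson distribution.
    """
    log_p = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


# Hypothetical values: mean of 1.5 goals with dispersion r = 5
mu, r = 1.5, 5.0
mean = sum(k * neg_binomial_pmf(k, mu, r) for k in range(200))
var = sum((k - mean) ** 2 * neg_binomial_pmf(k, mu, r) for k in range(200))
print(f"mean={mean:.3f} variance={var:.3f}")
```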

Strengths of the Negative Binomial Goals Model

  • Handles overdispersion effectively – it allows for greater variance in goal counts, making it more accurate for leagues or teams with unpredictable scorelines.
  • More realistic high-score predictions – unlike the standard Poisson model, it doesn’t underestimate the frequency of extreme results (e.g., 4-2, 5-3), making it more reliable for goal-heavy leagues.
  • Still interpretable and relatively simple – it’s a natural extension of Poisson, meaning it retains much of the intuitiveness and ease of implementation while improving accuracy.

Weaknesses of the Negative Binomial Goals Model

  • Doesn’t model goal correlation between teams – while it improves goal variance, it still treats each team’s goal count as independent, meaning it doesn’t account for game dynamics where both teams’ performances are linked (unlike the Bivariate Poisson goals model).
  • Can be unnecessary for low-scoring leagues – if a competition has mostly 0-0, 1-0, or 1-1 matches, the added flexibility of the Negative Binomial goals model may not provide a significant advantage over simpler models like Poisson or Dixon and Coles goal models.
  • Requires an extra parameter to estimate – while not overly complex, it adds another layer of statistical estimation, which can make optimization and model fitting slightly more challenging compared to a basic Poisson model.

The Negative Binomial goals model is particularly useful in high-scoring leagues or tournaments where teams display inconsistent scoring patterns. It’s a strong alternative when Poisson-based models struggle with overdispersion, but it won’t help much in low-scoring leagues where correcting draw probabilities (as Dixon-Coles does) is more important.

What is the Weibull Count + Copula Goals Model?

Instead of assuming that goals follow a Poisson or Negative Binomial distribution, this approach uses a Weibull count distribution, which allows for more flexible goal distributions. This can be useful since real-world goal distributions often don't follow a Poisson distribution exactly. The Weibull goals model may be able to better capture the empirical shape of goal distributions, accommodating overdispersion and other nuances in scoring patterns.

The model also incorporates a copula, which allows for goal-scoring correlation between teams. Unlike, say, the standard Bivariate Poisson model, which assumes a specific form of dependence, the copula framework is more flexible and can capture different types of relationships between teams' goal counts, such as how a team's attacking performance influences their opponent's defensive response.
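
To show how a copula joins two separate goal distributions into one joint distribution, here is a rough sketch. It rests on two loud assumptions: it uses a Frank copula, which is one common choice (this article does not say which copula the package uses), and it uses Poisson marginals as a simple stand-in for the Weibull count distribution. The rates and kappa are made-up values.

```python
import math


def poisson_cdf(k, lam):
    """P(goals <= k); stand-in marginal, NOT the Weibull count distribution."""
    if k < 0:
        return 0.0
    total = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))
    return min(1.0, total)


def frank_copula(u, v, kappa):
    """Frank copula C(u, v); kappa = 0 corresponds to independence."""
    if abs(kappa) < 1e-12:
        return u * v
    num = (math.exp(-kappa * u) - 1.0) * (math.exp(-kappa * v) - 1.0)
    return -math.log(1.0 + num / (math.exp(-kappa) - 1.0)) / kappa


def joint_pmf(x, y, lam_home, lam_away, kappa):
    """P(home = x, away = y) from differences of the copula at the CDFs."""
    fx, fx1 = poisson_cdf(x, lam_home), poisson_cdf(x - 1, lam_home)
    gy, gy1 = poisson_cdf(y, lam_away), poisson_cdf(y - 1, lam_away)
    return (
        frank_copula(fx, gy, kappa) - frank_copula(fx1, gy, kappa)
        - frank_copula(fx, gy1, kappa) + frank_copula(fx1, gy1, kappa)
    )


# Hypothetical values: probability of 1-1 with and without dependence
print(joint_pmf(1, 1, 1.5, 1.2, 2.0))
print(joint_pmf(1, 1, 1.5, 1.2, 0.0))
```

Each team's own goal distribution is unchanged by the copula; only how the two scores move together is affected.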

Strengths of the Weibull Count + Copula Goals Model

  • More flexible than Poisson-based models – the Weibull count model does not assume a fixed mean-variance relationship, making it better at handling overdispersion and improving accuracy for extreme scorelines.
  • Accounts for goal correlation in a more general way – unlike the Bivariate Poisson model, which assumes a specific type of dependency, the copula approach allows for a wider range of correlations between teams' goal counts.
  • Better predictive power in certain contexts – the Weibull Count + Copula approach can often outperform traditional Poisson-based models, particularly in leagues with more complex goal-scoring patterns.

Weaknesses of the Weibull Count + Copula Goals Model

  • Significantly more complex to implement – Unlike simpler Poisson-based models, this approach requires some heavy mathematical techniques to estimate the Weibull parameters and copula dependency structure, making it harder to apply in practice.
  • More computationally expensive – The added flexibility comes at a cost: fitting the model requires more intensive calculations, which can be impractical for large-scale applications compared to Poisson-based models.
  • Not always a clear improvement over simpler models – while it offers greater flexibility, in many cases simpler models like Dixon and Coles or Negative Binomial goals models still perform well enough without the added complexity, making this model potentially unnecessary for certain leagues or datasets.

This model is most useful in situations where traditional Poisson-based models struggle, for example, in leagues where goal distributions exhibit strong overdispersion or where team interactions significantly influence each other's scoring potential. However, it is significantly more time consuming to fit and may not always provide a clear improvement over simpler models.

The table below provides a quick reference to help you decide which model best suits your needs based on factors like scoring patterns, goal correlation, and computational complexity.

| Model | Strengths | Weaknesses | Best Used For |
| --- | --- | --- | --- |
| Poisson | Simple, efficient, widely used | Overpredicts high scores, doesn’t handle low-score bias well | General forecasting, fast model training |
| Dixon & Coles | Corrects low-score biases, better real-world accuracy | Assumes fixed adjustment across all matches, extra parameter tuning needed | Low-scoring leagues, competitions with many draws |
| Bivariate Poisson | Models goal correlation between teams, useful for high-scoring matches | Complex, harder to interpret, computationally expensive | High-scoring leagues, capturing match dynamics |
| Zero-Inflated Poisson | Better at modeling 0-0 draws, improves accuracy for defensive teams | Only addresses goalless matches, assumes fixed zero-inflation probability | Ultra-defensive teams, low-scoring competitions |
| Negative Binomial | Handles overdispersion, more realistic high-score predictions | Still assumes independent goal counts, may be unnecessary for low-scoring leagues | High-scoring leagues, competitions with unpredictable scorelines |
| Weibull Count + Copula | More flexible goal distribution, accounts for complex scoring relationships | Highly complex, computationally expensive, difficult to implement | Leagues with extreme goal-scoring patterns, advanced predictive modeling |

Table 1: Comparison of the different model types currently available in penaltyblog

Installing the penaltyblog Python Package

Now we've got the theory out of the way, let's look at how to use these models via the penaltyblog Python package. If you've not used it before, you can install it using pip.

pip install penaltyblog

Downloading Football Data the Easy Way

Next, we are going to need some data to fit the models to. We'll use penaltyblog's built-in functionality to download match data from football-data.co.uk.

We start off by downloading the data, then we index the date column to speed up filtering the dataframe later on. Finally, we create a new column called ftr_numeric which maps the ftr column to a numeric value, so we can use it in the models more easily.

We're going to use the Dutch Eredivisie for this example, but you can use any league you want by changing the league and season arguments in the code.

import numpy as np
import pandas as pd
import penaltyblog as pb
import matplotlib.pyplot as plt
from tqdm import tqdm

# Build the season labels "2015-2016" through "2024-2025"
seasons = [f"{year}-{year + 1}" for year in range(2015, 2025)]

# Download each season's fixtures and stack them into one dataframe
df = pd.concat([
    pb.scrapers.FootballData("NLD Eredivisie", season).get_fixtures()
    for season in seasons
])

df = df.sort_values('date').set_index('date', drop=False)
ftr_map = {"H": 0, "D": 1, "A": 2}
df['ftr_numeric'] = df['ftr'].map(ftr_map)

To evaluate and compare the different models in our package, we employ a rolling time-based validation approach. Instead of training each model on the entire dataset at once, we simulate how they would perform in a real-world betting or forecasting scenario.

The process involves iterating through the dataset one date at a time, fitting the model only to the data available up to that date, and then using it to predict the outcomes of fixtures on that specific day.

This method ensures that our predictions are made without knowledge of future results, mimicking the conditions under which these models would be used in practice. By avoiding data leakage and preserving the natural temporal structure of football matches, we obtain a fairer and more realistic assessment of each model’s predictive power.

Fitting the Goals Models

Let's start off by getting all the unique dates since the start of the 2023-2024 season to test the models on.

start_date = df.query("season == '2023-2024'")["date"].min()
run_dates = df["date"][df["date"] >= start_date].unique()
len(run_dates)

The next block of code implements our rolling time-based validation approach to evaluate a goals model (Dixon and Coles in this example) by predicting match outcomes over time. The process iterates through each date in run_dates, simulating a real-world forecasting scenario where only past data is available when making predictions.

For each date:

  • A training set is created, including matches from the previous three years up to (but not including) the current date.
  • A test set is created, containing matches that occur on the current date.
  • The model is initialized and trained using the training set.
  • If the model successfully fits, predictions are made for the test fixtures.
  • The predicted probabilities (home_draw_away) and actual outcomes are stored for evaluation.

To ensure robustness:

  • The code skips dates with no fixtures.
  • Errors during prediction are caught and ignored to prevent the loop from breaking.
  • The final Ranked Probability Score (RPS) is computed to measure model performance.

We can repeat this process for each of the model types we're interested in to get a sense of how well they perform on our dataset.

predictions = []
observed = []

for date in tqdm(run_dates, desc="Processing dates"):
    # Train on the three years before this date, test on this date's fixtures
    lookback = pd.Timestamp(date) - pd.DateOffset(years=3)
    train = df[(df["date"] < date) & (df["date"] >= lookback)]
    test = df[df["date"] == date]

    # Skip dates with no fixtures
    if len(test) == 0:
        continue

    clf = pb.models.DixonColesGoalModel(
        train["goals_home"],
        train["goals_away"],
        train["team_home"],
        train["team_away"],
    )
    try:
        clf.fit()
    except Exception:
        # Skip this date if the model fails to fit
        continue

    homes = test["team_home"].values
    aways = test["team_away"].values
    outcomes = test["ftr_numeric"].values

    for i in range(len(test)):
        try:
            prediction = clf.predict(homes[i], aways[i])
        except Exception:
            # Skip fixtures the model cannot predict
            continue
        # Store the home/draw/away probabilities alongside the actual result
        predictions.append(prediction.home_draw_away)
        observed.append(outcomes[i])

# Average Ranked Probability Score across all predicted fixtures
pb.metrics.rps_average(predictions, observed)

Measuring Performance of the Goals Models

We will use the Ranked Probability Score (RPS) to evaluate the accuracy of the models' forecasts, which is a metric that measures how well a model’s predicted probability distribution aligns with the actual outcome.

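RPS is calculated by summing the squared differences between the cumulative predicted probabilities and the cumulative observed outcome across the ordered outcomes (home win, draw, away win), then dividing by the number of outcomes minus one. It ranges from 0 to 1, where lower values indicate better predictive performance.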
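
For a single match, the calculation can be sketched as below. This is a plain-Python illustration of the standard definition, not the code behind pb.metrics.rps_average, and it assumes the same outcome coding used earlier (0 = home win, 1 = draw, 2 = away win). The probabilities are made up.

```python
def rps(probs, outcome):
    """Ranked Probability Score for one match.

    probs   -- predicted probabilities in order [home, draw, away]
    outcome -- index of the observed result (0, 1 or 2)
    """
    n = len(probs)
    cum_pred = 0.0
    cum_obs = 0.0
    total = 0.0
    # The final cumulative term is always 1 - 1 = 0, so it can be skipped
    for i in range(n - 1):
        cum_pred += probs[i]
        cum_obs += 1.0 if i == outcome else 0.0
        total += (cum_pred - cum_obs) ** 2
    return total / (n - 1)


forecast = [0.5, 0.3, 0.2]
print(rps(forecast, 0))  # home win: the favoured outcome, low score
print(rps(forecast, 2))  # away win: the least likely outcome, high score
```

Because the probabilities are accumulated in order, a forecast that favoured the home side is penalised less when the match ends in a draw than when the away side wins.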

In football modeling, a model with a lower RPS assigns higher probabilities to correct outcomes while distributing probabilities meaningfully across alternatives, making it more reliable for decision-making.

Table 2 below shows the results for all the different models against our Eredivisie dataset. We can see that:

  • Dixon and Coles performed the best with the lowest RPS.
  • Weibull Count followed closely behind.
  • Bivariate Poisson had the highest RPS, indicating the weakest performance on this set of data.

| Model | Ranked Probability Score (RPS) |
| --- | --- |
| Dixon and Coles | 0.19137780685608083 |
| Weibull Count | 0.19141358825225932 |
| Zero-inflated Poisson | 0.19154043298013113 |
| Poisson | 0.19154229559464445 |
| Negative Binomial | 0.19155750459845977 |
| Bivariate Poisson | 0.19161764011301444 |

Table 2: Ranked Probability Scores for the different model types

Optimising the Goal Model's Lookback Window

In the previous section, we used a fixed three-year lookback window to train our models before making predictions. However, the optimal amount of historical data to use is not always clear — too much data might include outdated information, while too little could lead to overfitting to recent trends.

Next, we'll optimise this lookback window by systematically varying the amount of past data used for training. By looping through different window sizes (e.g., 1 year, 2 years, 3 years, etc.), we can assess how the model’s predictive performance changes based on the amount of historical data it has access to and see if we can improve the RPS further.

For simplicity, we'll just optimise the Dixon and Coles model since it performed best, but ideally you'd repeat this process for all the models you're interested in to find the best performing model on your dataset.

We'll use the same rolling time-based validation approach as before, but this time we'll vary the lookback window by changing the number of years subtracted when setting the lookback variable. For example, the code below gives us a lookback window of two years.

lookback = pd.Timestamp(date) - pd.DateOffset(years=2)

Figure 1 below shows the results for different lookback windows. We can see that initially adding more data improves the model's performance, but after around four seasons the RPS starts to increase again and the model's performance starts to degrade.

Figure 1: Dixon and Coles RPS using different lookback windows

Incorporating Time-Weighted Data: Enhancing Predictions with the Dixon and Coles Approach

Next, we'll optimise the weighting applied to the data based on the methodology proposed by Dixon and Coles in their seminal paper. Their approach acknowledges that more recent matches carry greater predictive value than older ones when modeling football outcomes, so they should be given more weight when fitting the model.

This is done by introducing an exponential decay function so that older games contribute less to the model’s parameter estimates, allowing it to adapt more effectively to recent team performances. This weighting method helps balance the trade-off between using sufficient historical data and ensuring that the model remains responsive to current trends.

How the Dixon and Coles Weights are Calculated

The weighting function typically follows this form:

$$ w_t = e^{-\xi (T - t)} $$

Explanation of Variables

  • $w_t$: Weight assigned to a match played at time $t$
  • $T$: The current date
  • $t$: The date the match is played
  • $\xi$: The decay factor (higher values cause faster decay)

How it Works

  • If $\xi$ = 0, all matches are weighted equally.
  • If $\xi$ is small (e.g., 0.01), older matches retain some influence.
  • If $\xi$ is large (e.g., 0.03), older matches lose influence faster.
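
As a quick illustration of the formula, the sketch below computes the weight of a single match for a few values of $\xi$. It assumes time is measured in days, and decay_weight is a hypothetical helper written for this example rather than a penaltyblog function.

```python
import math
from datetime import date


def decay_weight(match_date, current_date, xi):
    """Exponential decay weight w_t = exp(-xi * (T - t)), with T - t in days."""
    days_ago = (current_date - match_date).days
    return math.exp(-xi * days_ago)


# Weight of a match played exactly one year (365 days) before the current date
current = date(2025, 3, 10)
one_year_ago = date(2024, 3, 10)
for xi in (0.0, 0.001, 0.003):
    w = decay_weight(one_year_ago, current, xi)
    print(f"xi={xi}: weight={w:.3f}")
```

With $\xi$ = 0.001, a match from a year ago keeps roughly 0.69 of the weight of a match played today.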

To help visualise the effect of the weights, I've plotted a range of $\xi$ values in Figure 2 below. With a $\xi$ of zero, no weighting is applied and all historical fixtures carry the same importance. As we increase $\xi$, older matches carry less and less weight in the model's calculations and the model becomes more responsive to recent team performance trends.

Figure 2: Example Dixon and Coles weights

Optimising the Value of $\xi$

All of the models in the penaltyblog package support applying weights to the data being trained on, so we can easily optimise the value of $\xi$ by looping through a range of values and selecting the one that gives the lowest RPS.

penaltyblog also provides a function to create the weights automatically based on the Dixon and Coles approach, as shown in the example below, but you can also supply your own weights if you prefer.

xi = 0.001
weights = pb.models.dixon_coles_weights(train["date"], xi)

clf = pb.models.DixonColesGoalModel(
    train["goals_home"],
    train["goals_away"],
    train["team_home"],
    train["team_away"],
    weights,
)

clf.fit()

Now we just need to update our rolling time-based validation from before to loop through a range of $\xi$ values and select the one that gives the lowest RPS.

results = []
for xi in np.arange(0.000, 0.005, 0.0005):
    predictions = []
    observed = []
    for date in tqdm(run_dates, desc="Processing dates"):
        lookback = pd.Timestamp(date) - pd.DateOffset(years=4)
        train = df[(df["date"] < date) & (df["date"] >= lookback)]
        test = df[df["date"] == date]

        weights = pb.models.dixon_coles_weights(train["date"], xi)
        clf = pb.models.DixonColesGoalModel(
            train["goals_home"],
            train["goals_away"],
            train["team_home"],
            train["team_away"],
            weights,
        )
        clf.fit()

        if len(test) > 0:
            homes = test["team_home"].values
            aways = test["team_away"].values
            outcomes = test["ftr_numeric"].values

            for i in range(len(test)):
                try:
                    prediction = clf.predict(homes[i], aways[i])
                    predictions.append(prediction.home_draw_away)
                    observed.append(outcomes[i])
                except Exception as e:
                    continue

    result = {
        "decay": xi,
        "rps": pb.metrics.rps_average(predictions, observed),
    }
    results.append(result)

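# Plot RPS against the decay factor to find the best value of xi
decays = [r["decay"] for r in results]
rps_values = [r["rps"] for r in results]

plt.plot(decays, rps_values)
plt.xlabel("Decay")
plt.ylabel("RPS")
plt.show()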

Figure 3 below shows the results for different values of $\xi$. We can see that the optimal value is around 0.001, reducing our RPS to 0.1890924613913105.

Figure 3: Dixon and Coles RPS using different values of $\xi$

Conclusions

In this article, we've explored penaltyblog's predictive models from the Poisson goal model to more advanced approaches like Dixon and Coles, Bivariate Poisson, and Weibull Count Copula goals models. By applying a rolling time-based validation approach, we evaluated their real-world predictive performance using the Ranked Probability Score (RPS).

We then optimized key parameters, including the lookback window, finding that using around four seasons of historical data strikes the best balance between capturing enough information and staying responsive to recent trends. Finally, we applied Dixon-Coles time weighting, tuning the decay factor to further improve prediction accuracy, reducing the RPS even more.

Key Takeaways

  • More advanced models often outperform the basic Poisson goals model (although not always), particularly in handling low-scoring biases and goal correlation.
  • Optimizing historical data usage prevents outdated matches from degrading predictions.
  • Applying time-weighting enhances accuracy, ensuring the model remains adaptable to recent team performances.

If you are interested in finding out what else the penaltyblog package can do, take a look at its documentation.

Thanks for reading!