Why I Wrote My Own MCMC Sampler for Penaltyblog

Introduction

A little while back, I made the difficult decision to remove the Bayesian goal models from penaltyblog. It wasn't a choice I took lightly, but many users were hitting a wall. The models relied on powerful libraries like Stan and PyMC, whose installation was proving overly difficult. What should have been a simple modelling task often turned into wrestling with complex dependencies.

That frustration is precisely why I'm so excited to announce the latest release of penaltyblog. It now includes its very own native, dependency-free Bayesian engine, built from the ground up to make my Bayesian goal models accessible to users, without the installation headache.

Moving Beyond Point Estimates

The existing models in penaltyblog (like Dixon-Coles and Poisson) rely on Maximum Likelihood Estimation (MLE). MLE is great: it finds the single best set of parameters (attack, defense, home advantage) that explains the observed history of goals.

But football is gloriously unpredictable. Is Liverpool's attack strength exactly 1.45? Or is it somewhere between 1.30 and 1.60 depending on who's injured, who's in form, and whether the team bus got stuck in traffic?

MLE gives you a specific number. Bayesian inference gives you a distribution of possibilities. By understanding the range of likely strengths, we can better quantify the risk in our odds and predictions. It's the difference between saying "Liverpool will win 2-0" and "Liverpool will probably win, but here's the full range of possible outcomes and how likely each one is."
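
To make that concrete, here's a toy sketch (with made-up numbers standing in for real posterior draws) of how a distribution of samples becomes a summary you can act on:

import numpy as np

# Stand-in for 2,000 posterior draws of an attack parameter (illustrative only)
samples = np.random.default_rng(42).normal(1.45, 0.08, 2000)

low, high = np.percentile(samples, [2.5, 97.5])
print(f"Attack strength: {samples.mean():.2f} (95% interval: {low:.2f} to {high:.2f})")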

The Solution: Cythonized MCMC

To bring Bayesian modelling to penaltyblog without the previous dependency hell, I decided to write a bespoke MCMC (Markov Chain Monte Carlo) sampler from scratch using Cython.

Under the hood, it uses an Affine Invariant Ensemble Sampler with Differential Evolution moves. If that sounds scary, here's what it means in plain English (with a small sketch of the core move after this list):

  1. It's lightning fast: The likelihood functions and the sampler are compiled to C. It leaves pure Python implementations in the dust.
  2. It's robust: Football data is messy. This specific type of sampler excels at navigating the correlated parameters often found in sports models (like the relationship between a team's attack and their opponent's defense).
  3. No bloat: It's built directly into the package and is designed for these specific use cases. pip install penaltyblog is all you need to get started.
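
For the curious, here's a minimal sketch of the core differential evolution move (an illustration of the idea, not penaltyblog's actual implementation). Each walker proposes a jump along the line between two other randomly chosen walkers, so the ensemble's own spread shapes the proposals:

import numpy as np

def de_move(walkers, i, rng):
    """Propose a new position for walker i from the difference of two others."""
    n_walkers, n_dim = walkers.shape
    gamma = 2.38 / np.sqrt(2 * n_dim)  # standard differential evolution scaling
    others = [w for w in range(n_walkers) if w != i]
    j, k = rng.choice(others, size=2, replace=False)
    # Jump along the vector between walkers j and k; because the direction
    # comes from the ensemble itself, correlated parameters are handled naturally
    return walkers[i] + gamma * (walkers[j] - walkers[k])

# Toy usage: 16 walkers exploring a 4-dimensional parameter space
rng = np.random.default_rng(0)
walkers = rng.normal(size=(16, 4))
proposal = de_move(walkers, 0, rng)  # then accepted or rejected via a Metropolis step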

How to use it

I've designed the API to mimic the existing MLE models, so you'll feel right at home. Here's how you fit a Bayesian Dixon-Coles model.

First, let's grab the data using the built-in scrapers:

import penaltyblog as pb

# Get Premier League data
df = pb.scrapers.FootballData("ENG Premier League", "2023-2024").get_fixtures()

# Calculate time-decay weights (optional, but recommended to weight recent games more heavily)
# xi controls the decay rate: larger values discount older matches faster
weights = pb.models.dixon_coles_weights(df["date"], xi=0.001)
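
For intuition, the Dixon-Coles scheme down-weights each match by roughly exp(-xi * t), where t is how long ago it was played (I'm paraphrasing here; check the docs for the exact formula penaltyblog uses):

import numpy as np

# Illustrative decay with xi = 0.001: the weight of a match t days old
days_ago = np.array([0, 30, 180, 365])
print(np.exp(-0.001 * days_ago))  # approx [1.00, 0.97, 0.84, 0.69]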

Next, we fit the model. Instead of DixonColesGoalModel, we use BayesianGoalModel:

from penaltyblog.models import BayesianGoalModel

# Initialize the model
model = BayesianGoalModel(
    df["fthg"],
    df["ftag"],
    df["team_home"],
    df["team_away"],
    weights=weights
)

# Fit using MCMC
# n_samples: How many steps to take
# burn: How many initial steps to throw away (warm-up)
model.fit(n_samples=2000, burn=1000)

A Quick Note on the Arguments

You might notice a few extra arguments in the .fit() function. Since MCMC works by "walking" around the probability landscape, we need to give it a little guidance:

  • burn: Think of this as the warm-up lap. The walkers start at random positions and need time to find the "sensible" parameter values. We discard these initial steps so they don't skew our results.
  • n_samples: This is how many steps we record for each walker after the warm-up. More samples mean a smoother, more accurate distribution, but the run takes longer.
  • n_chains: Why rely on just one explorer? This tells the model to send out multiple independent walkers starting from different random locations. If they all end up describing the same probability landscape, we can trust the results. If they disagree (get stuck in different places), it’s a warning sign that our model hasn't converged.
  • thin: (Optional) If the walkers are moving slowly, consecutive samples can look very similar. "Thinning" tells the model to keep only, say, every 5th or 10th sample, which reduces the autocorrelation between the draws you store. A worked example using all of these arguments follows this list.
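
Putting that together, a more explicit call might look like this (the values here are illustrative, not recommended defaults):

model.fit(
    n_samples=5000,  # steps recorded per walker after warm-up
    burn=2000,       # warm-up steps to discard
    n_chains=4,      # independent chains, so we can check they agree
    thin=5,          # keep every 5th sample to reduce autocorrelation
)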

Unlike the MLE models which finish instantly, this will take a moment as the "walkers" explore the parameter space. Grab a cup of tea, check the latest transfer rumors, and let the mathematics work its magic.

Diagnostics

In Bayesian analysis, we can't just trust the result blindly. We need to ensure our sampler converged properly. I've added helper functions to visualize the "trace" (the path the sampler took). Ideally, we want the traces to look like "fuzzy caterpillars," which indicates the model has explored the space well.

# Plot trace for specific parameters
model.plot_trace(params=["attack_Arsenal", "home_advantage"])

[Figure: trace plots for attack_Arsenal and home_advantage]

When analyzing these trace plots, we're looking for the classic 'fuzzy caterpillar' shape. A dense, rectangular block of noise is exactly what you want to see: it suggests the sampler is mixing well and exploring the plausible values for parameters like Arsenal's attack strength or the home advantage.

A trace that stays horizontal (no upward or downward drift), with the independent chains blended together, is strong evidence that the chains agree on the answer and the model has converged.
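
If you prefer a number to a picture, the Gelman-Rubin statistic (R-hat) formalises the "do the chains agree?" check: values close to 1.0 indicate convergence. Here's a minimal numpy sketch of the calculation (for illustration; this isn't a built-in penaltyblog function):

import numpy as np

def gelman_rubin(chains):
    """R-hat for one parameter; chains has shape (n_chains, n_samples)."""
    n = chains.shape[1]
    between = n * chains.mean(axis=1).var(ddof=1)  # variance between chain means
    within = chains.var(axis=1, ddof=1).mean()     # average within-chain variance
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return np.sqrt(pooled / within)                # ~1.0 means the chains agree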

Prediction

Once we're happy with the convergence, prediction works exactly the same way as before. The difference is that penaltyblog now averages the probabilities over thousands of posterior parameter combinations, giving you marginalized probabilities that account for parameter uncertainty.

probs = model.predict("Man City", "Arsenal")

print(f"Home Win: {probs.home_win}")
print(f"Draw: {probs.draw}")
print(f"Away Win: {probs.away_win}")

The Hierarchical Model

I've also added a HierarchicalBayesianGoalModel for those who want to take it a step further.

In the standard model, we usually clamp the team parameters so they average to zero. In a Hierarchical model, we treat the league's variance (how wide the gap is between the best and worst teams) as an unknown parameter to be learned.

Think of it this way: instead of just looking at how teams perform against each other, the model also learns about the "character" of the league itself. Is this a season with a dominant team running away with the title? Or is it a tightly contested battle where anyone can beat anyone?

The model learns two extra parameters: sigma_attack and sigma_defense.

  • High Sigma: The league is very unequal (think Man City vs. Sheffield Utd).
  • Low Sigma: The league is very competitive and tightly packed.

This approach allows the model to "shrink" estimates for teams with fewer games toward the league average, which is especially useful early in the season to prevent wild predictions based on tiny sample sizes.
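
In prior terms, the structure looks roughly like this (a schematic of the idea, not penaltyblog's exact priors):

# Each team's parameters are drawn from league-level distributions:
#   attack[i]  ~ Normal(0, sigma_attack)    for each team i
#   defense[i] ~ Normal(0, sigma_defense)
# sigma_attack and sigma_defense get priors of their own and are
# learned from the data alongside the team parameters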

from penaltyblog.models import HierarchicalBayesianGoalModel

h_model = HierarchicalBayesianGoalModel(
    df["fthg"], df["ftag"], df["team_home"], df["team_away"], weights=weights
)

h_model.fit()
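
Diagnostics and prediction then work exactly as before, since the hierarchical model shares the same API (the parameter names below follow the sigma_attack / sigma_defense naming above, so double-check them against your fitted model):

h_model.plot_trace(params=["sigma_attack", "sigma_defense"])
probs = h_model.predict("Man City", "Arsenal")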

Summary

This update was born out of my own frustration with setting up complex environments just to run simple sports models. Life's too short to spend it debugging C++ compilers when you could be analyzing football!

The new Bayesian capabilities allow you to:

  1. Quantify Uncertainty: See the full distribution of team strengths, not just point estimates.
  2. Avoid Installation Hell: No external dependencies required, just one simple pip install.
  3. Go Fast: Powered by a custom Cython MCMC sampler compiled to C.

Ready to dive into Bayesian football modelling without the headaches? The update is available now.

pip install penaltyblog --upgrade

Happy modelling!