<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>penaltyblog</title><link href="/" rel="alternate"></link><link href="feeds/all.atom.xml" rel="self"></link><id>/</id><updated>2026-01-08T10:42:00+00:00</updated><entry><title>Why I Wrote My Own MCMC Sampler for Penaltyblog</title><link href="2026/01/08/bayesian_goal_models/" rel="alternate"></link><published>2026-01-08T10:42:00+00:00</published><updated>2026-01-08T10:42:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2026-01-08:2026/01/08/bayesian_goal_models/</id><summary type="html">&lt;p&gt;I've added a native, dependency-free Bayesian engine to penaltyblog, powered by a custom Cython MCMC sampler to make quantifying uncertainty fast and frustration-free...&lt;/p&gt;</summary><content type="html">&lt;h1 id="introduction"&gt;Introduction&lt;/h1&gt;
&lt;p&gt;A little while back, I made the difficult decision to remove the Bayesian goal models from &lt;code&gt;penaltyblog&lt;/code&gt;. It wasn't a choice I took lightly, but many users were hitting a wall. The models relied on powerful libraries like &lt;code&gt;Stan&lt;/code&gt; and &lt;code&gt;PyMC&lt;/code&gt;, whose installation process was proving overly difficult. What should have been a simple modelling task often turned into wrestling with complex dependencies.&lt;/p&gt;
&lt;p&gt;That frustration is precisely why I'm so excited to announce the latest release of &lt;strong&gt;penaltyblog&lt;/strong&gt;. It now includes its very own native, dependency-free Bayesian engine, built from the ground up to make my Bayesian goal models accessible to users, without the installation headache.&lt;/p&gt;
&lt;h2 id="moving-beyond-point-estimates"&gt;Moving Beyond Point Estimates&lt;/h2&gt;
&lt;p&gt;The existing models in &lt;code&gt;penaltyblog&lt;/code&gt; (like Dixon-Coles and Poisson) rely on &lt;strong&gt;Maximum Likelihood Estimation (MLE)&lt;/strong&gt;. MLE is great as it finds the &lt;em&gt;single best&lt;/em&gt; set of parameters (Attack, Defense, Home Advantage) that explains the history of goals.&lt;/p&gt;
&lt;p&gt;But football is gloriously unpredictable. Is Liverpool's attack strength exactly &lt;code&gt;1.45&lt;/code&gt;? Or is it somewhere between &lt;code&gt;1.30&lt;/code&gt; and &lt;code&gt;1.60&lt;/code&gt; depending on who's injured, who's in form, and whether the team bus got stuck in traffic?&lt;/p&gt;
&lt;p&gt;MLE gives you a specific number. Bayesian inference gives you a &lt;strong&gt;distribution&lt;/strong&gt; of possibilities. By understanding the range of likely strengths, we can better quantify the risk in our odds and predictions. It's the difference between saying "Liverpool will win 2-0" and "Liverpool will probably win, but here's the full range of possible outcomes and how likely each one is."&lt;/p&gt;
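&lt;p&gt;As a rough illustration of the difference, the sketch below summarises a set of posterior draws with both a single number and a 95% credible interval. The draws are synthetic (generated with NumPy purely for this example, not fitted to any real matches), and the variable names are assumptions rather than anything exposed by &lt;code&gt;penaltyblog&lt;/code&gt;.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for posterior draws of one team's attack parameter.
# These numbers are made up for illustration, not fitted to real matches.
attack_draws = rng.normal(loc=1.45, scale=0.075, size=4000)

# A point estimate collapses the draws to one number...
point_estimate = attack_draws.mean()

# ...while a credible interval keeps the spread of plausible values.
lower, upper = np.percentile(attack_draws, [2.5, 97.5])

print(f"point estimate: {point_estimate:.2f}")
print(f"95% credible interval: {lower:.2f} to {upper:.2f}")
```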
&lt;h2 id="the-solution-cythonized-mcmc"&gt;The Solution: Cythonized MCMC&lt;/h2&gt;
&lt;p&gt;To bring Bayesian modelling to &lt;code&gt;penaltyblog&lt;/code&gt; without the previous dependency hell, I decided to write a bespoke &lt;strong&gt;MCMC (Markov Chain Monte Carlo)&lt;/strong&gt; sampler from scratch using &lt;strong&gt;Cython&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Under the hood, it uses an &lt;strong&gt;Affine Invariant Ensemble Sampler with Differential Evolution moves&lt;/strong&gt;. If that sounds scary, here's what it means in plain English:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;It's lightning fast:&lt;/strong&gt; The likelihood functions and the sampler are compiled to &lt;code&gt;C&lt;/code&gt;. It leaves pure Python implementations in the dust.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;It's robust:&lt;/strong&gt; Football data is messy. This specific type of sampler excels at navigating the correlated parameters often found in sports models (like the relationship between a team's attack and their opponent's defense).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No bloat:&lt;/strong&gt; It's built directly into the package and is designed for these specific use cases. &lt;code&gt;pip install penaltyblog&lt;/code&gt; is all you need to get started.&lt;/li&gt;
&lt;/ol&gt;
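&lt;p&gt;To make the idea of a differential evolution move more concrete, here is a minimal pure-Python sketch. It is not the Cython sampler that ships with &lt;code&gt;penaltyblog&lt;/code&gt;: the target (a correlated two-dimensional Gaussian), the function names &lt;code&gt;log_prob&lt;/code&gt; and &lt;code&gt;de_mcmc&lt;/code&gt;, and every setting are assumptions chosen for illustration. Each walker proposes a jump along the difference between two other walkers, so the ensemble adapts to the correlation between parameters.&lt;/p&gt;

```python
import numpy as np


def log_prob(x, cov_inv):
    """Log-density (up to a constant) of a zero-mean Gaussian."""
    return -0.5 * x @ cov_inv @ x


def de_mcmc(n_walkers=20, n_steps=2000, burn=500, seed=0):
    """Toy differential-evolution MCMC on a correlated 2D Gaussian."""
    rng = np.random.default_rng(seed)
    cov_inv = np.linalg.inv(np.array([[1.0, 0.9], [0.9, 1.0]]))
    n_dim = 2
    gamma = 2.38 / np.sqrt(2 * n_dim)  # commonly used DE step scale

    walkers = rng.normal(size=(n_walkers, n_dim))
    log_p = np.array([log_prob(w, cov_inv) for w in walkers])
    kept = []

    for step in range(n_steps):
        for i in range(n_walkers):
            # Pick two *other* walkers and jump along their difference.
            others = [j for j in range(n_walkers) if j != i]
            a, b = rng.choice(others, size=2, replace=False)
            jitter = rng.normal(scale=1e-4, size=n_dim)
            proposal = walkers[i] + gamma * (walkers[a] - walkers[b]) + jitter

            # Standard Metropolis accept/reject step.
            new_log_p = log_prob(proposal, cov_inv)
            if new_log_p - log_p[i] > np.log(rng.random()):
                walkers[i] = proposal
                log_p[i] = new_log_p
        if step >= burn:
            kept.append(walkers.copy())

    return np.concatenate(kept)


samples = de_mcmc()
print("mean:", samples.mean(axis=0))
print("correlation:", np.corrcoef(samples.T)[0, 1])
```

&lt;p&gt;Because the proposals are built from the walkers themselves, no step size has to be tuned per parameter, which is one reason this family of samplers copes well with correlated parameters.&lt;/p&gt;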
&lt;h2 id="how-to-use-it"&gt;How to use it&lt;/h2&gt;
&lt;p&gt;I've designed the API to mimic the existing MLE models, so you'll feel right at home. Here's how you fit a &lt;strong&gt;Bayesian Dixon-Coles&lt;/strong&gt; model.&lt;/p&gt;
&lt;p&gt;First, let's grab the data using the built-in scrapers:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pb&lt;/span&gt;

&lt;span class="c1"&gt;# Get Premier League data&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2023-2024&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Calculate weights (optional, but recommended to weight recent games more heavily)&lt;/span&gt;
&lt;span class="n"&gt;weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dixon_coles_weights&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;xi&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.001&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Next, we fit the model. Instead of &lt;code&gt;DixonColesGoalModel&lt;/code&gt;, we use &lt;code&gt;BayesianGoalModel&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BayesianGoalModel&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize the model&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;BayesianGoalModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;fthg&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ftag&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;weights&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Fit using MCMC&lt;/span&gt;
&lt;span class="c1"&gt;# n_samples: How many steps to take&lt;/span&gt;
&lt;span class="c1"&gt;# burn: How many initial steps to throw away (warm-up)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_samples&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;burn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id="a-quick-note-on-the-arguments"&gt;A Quick Note on the Arguments&lt;/h2&gt;
&lt;p&gt;You might notice a few extra arguments in the &lt;code&gt;.fit()&lt;/code&gt; function. Since MCMC works by "walking" around the probability landscape, we need to give it a little guidance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;burn&lt;/code&gt;: Think of this as the warm-up lap. The walkers start at random positions and need time to find the "sensible" parameter values. We discard these initial steps so they don't skew our results.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;n_samples&lt;/code&gt;: This is how long we record the walkers after the warm-up. More samples mean a smoother, more accurate distribution, but it will take longer to run.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;n_chains&lt;/code&gt;: Why rely on just one explorer? This tells the model to send out multiple independent walkers starting from different random locations. If they all end up describing the same probability landscape, we can trust the results. If they disagree (get stuck in different places), it’s a warning sign that our model hasn't converged.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;thin&lt;/code&gt;: (Optional) If the walkers are moving very slowly, consecutive samples can be highly correlated. "Thinning" tells the model to keep only every 5th or 10th sample (for example), which reduces that correlation.&lt;/li&gt;
&lt;/ul&gt;
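&lt;p&gt;The effect of &lt;code&gt;burn&lt;/code&gt; and &lt;code&gt;thin&lt;/code&gt; can be sketched with plain NumPy slicing. The chain below is a synthetic autocorrelated series that starts from a deliberately poor position; it is a stand-in for a sampler's output, not something produced by &lt;code&gt;penaltyblog&lt;/code&gt;, and the helper &lt;code&gt;lag1_autocorr&lt;/code&gt; is a hypothetical name used only here.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic, slowly moving chain (an AR(1) process) with a bad start.
n_steps = 3000
chain = np.empty(n_steps)
chain[0] = 5.0
for t in range(1, n_steps):
    chain[t] = 0.9 * chain[t - 1] + rng.normal(scale=0.1)


def lag1_autocorr(x):
    """Correlation between each sample and the one that follows it."""
    x = x - x.mean()
    return float(x[:-1] @ x[1:] / (x @ x))


burn, thin = 1000, 10
after_burn = chain[burn:]      # drop the warm-up steps
kept = chain[burn::thin]       # then keep every 10th sample

print("samples kept:", len(kept))
print("lag-1 autocorrelation before thinning:", lag1_autocorr(after_burn))
print("lag-1 autocorrelation after thinning:", lag1_autocorr(kept))
```

&lt;p&gt;Thinning trades sample count for lower correlation between neighbouring samples, so it is mostly useful when storing or post-processing every draw is the bottleneck.&lt;/p&gt;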
&lt;p&gt;Unlike the MLE models which finish instantly, this will take a moment as the "walkers" explore the parameter space. Grab a cup of tea, check the latest transfer rumors, and let the mathematics work its magic.&lt;/p&gt;
&lt;h2 id="diagnostics"&gt;Diagnostics&lt;/h2&gt;
&lt;p&gt;In Bayesian analysis, we can't just trust the result blindly. We need to ensure our sampler converged properly. I've added helper functions to visualize the "trace" (the path the sampler took). Ideally, we want the traces to look like "fuzzy caterpillars," which indicates the model has explored the space well.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# Plot trace for specific parameters&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot_trace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;attack_Arsenal&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;home_advantage&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;img alt="Trace plot" src="../../../../images/20260108_trace_plot.png"&gt;&lt;/p&gt;
&lt;p&gt;When analyzing these trace plots, we are looking for the classic '&lt;strong&gt;fuzzy caterpillar&lt;/strong&gt;' shape. The dense, rectangular block of noise is exactly what you want to see - it suggests that the sampler has converged and is thoroughly exploring the plausible values for parameters like &lt;em&gt;Arsenal's Attack Strength&lt;/em&gt; or &lt;em&gt;Home Advantage&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;The fact that the plot remains horizontal (not drifting up or down) and the lines from the different chains overlap closely is good evidence that our independent sampling chains agree on the answer. A trace plot can't strictly prove convergence, but this is what a healthy run looks like.&lt;/p&gt;
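&lt;p&gt;A numerical companion to eyeballing trace plots is the Gelman-Rubin statistic (R-hat), which compares the variance between chains with the variance within them. The sketch below is a generic NumPy version run on synthetic chains; the function name &lt;code&gt;gelman_rubin&lt;/code&gt; is an assumption for this example, and it is not a claim about which diagnostics &lt;code&gt;penaltyblog&lt;/code&gt; computes internally.&lt;/p&gt;

```python
import numpy as np


def gelman_rubin(chains):
    """Basic R-hat for an array of shape (n_chains, n_samples)."""
    chains = np.asarray(chains, dtype=float)
    n_samples = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean()               # W
    between = n_samples * chains.mean(axis=1).var(ddof=1)    # B
    pooled = (n_samples - 1) / n_samples * within + between / n_samples
    return float(np.sqrt(pooled / within))


rng = np.random.default_rng(7)

# Four synthetic chains that all describe the same distribution...
agreeing = rng.normal(0.0, 1.0, size=(4, 1000))
# ...and a copy where two chains are stuck somewhere else.
stuck = agreeing + np.array([[0.0], [0.0], [3.0], [3.0]])

rhat_good = gelman_rubin(agreeing)
rhat_bad = gelman_rubin(stuck)
print("R-hat, chains agree:", rhat_good)
print("R-hat, chains disagree:", rhat_bad)
```

&lt;p&gt;Values close to 1 indicate that the chains agree; values noticeably above 1 are the numerical version of traces that refuse to overlap.&lt;/p&gt;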
&lt;h2 id="prediction"&gt;Prediction&lt;/h2&gt;
&lt;p&gt;Once we're happy with the convergence, prediction works exactly the same way as before. The difference is that &lt;code&gt;penaltyblog&lt;/code&gt; is now calculating the probabilities across thousands of possible parameter combinations, giving you a marginalized probability that accounts for uncertainty.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;probs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Man City&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Arsenal&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Home Win: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;probs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;home_win&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Draw: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;probs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;draw&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Away Win: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;probs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;away_win&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
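&lt;p&gt;Conceptually, marginalizing means computing the match probabilities once per posterior draw and then averaging. The sketch below shows that idea with independent Poisson goal counts and synthetic draws of each side's expected goals. It is a simplification under stated assumptions: the draws are made up, there is no Dixon-Coles low-score adjustment, and none of the names come from &lt;code&gt;penaltyblog&lt;/code&gt;.&lt;/p&gt;

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(1)
n_draws = 2000

# Hypothetical posterior draws of expected goals for each side.
home_rate = np.exp(rng.normal(np.log(1.6), 0.10, n_draws))
away_rate = np.exp(rng.normal(np.log(1.1), 0.10, n_draws))

max_goals = 10
goals = np.arange(max_goals + 1)
log_fact = np.array([np.log(factorial(int(k))) for k in goals])


def poisson_pmf(rates):
    """Poisson probabilities; rows are draws, columns are goal counts."""
    log_rates = np.log(rates)[:, None]
    return np.exp(goals * log_rates - rates[:, None] - log_fact)


home_pmf = poisson_pmf(home_rate)
away_pmf = poisson_pmf(away_rate)

# Build the scoreline grid for every draw, then average over the draws.
# grid[i, j] is the probability of the home side scoring i and the away side j.
grid = np.einsum("di,dj->ij", home_pmf, away_pmf) / n_draws

home_win = np.tril(grid, -1).sum()   # home goals exceed away goals
draw = np.trace(grid)                # equal goals
away_win = np.triu(grid, 1).sum()    # away goals exceed home goals

print(f"Home Win: {home_win:.3f}")
print(f"Draw: {draw:.3f}")
print(f"Away Win: {away_win:.3f}")
```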

&lt;h2 id="the-hierarchical-model"&gt;The Hierarchical Model&lt;/h2&gt;
&lt;p&gt;I've also added a &lt;code&gt;HierarchicalBayesianGoalModel&lt;/code&gt; for those who want to take it a step further.&lt;/p&gt;
&lt;p&gt;In the standard model, we usually clamp the team parameters so they average to zero. In a &lt;strong&gt;Hierarchical&lt;/strong&gt; model, we treat the league's variance (how wide the gap is between the best and worst teams) as an unknown parameter to be learned.&lt;/p&gt;
&lt;p&gt;Think of it this way: instead of just looking at how teams perform against each other, the model also learns about the "character" of the league itself. Is this a season with a dominant team running away with the title? Or is it a tightly contested battle where anyone can beat anyone?&lt;/p&gt;
&lt;p&gt;The model learns two extra parameters: &lt;code&gt;sigma_attack&lt;/code&gt; and &lt;code&gt;sigma_defense&lt;/code&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;High Sigma:&lt;/strong&gt; The league is very unequal (think Man City vs. Sheffield Utd).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Low Sigma:&lt;/strong&gt; The league is very competitive and tightly packed.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This approach allows the model to "shrink" estimates for teams with fewer games toward the league average, which is especially useful early in the season to prevent wild predictions based on tiny sample sizes.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HierarchicalBayesianGoalModel&lt;/span&gt;

&lt;span class="n"&gt;h_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;HierarchicalBayesianGoalModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;fthg&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ftag&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;weights&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;h_model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
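&lt;p&gt;The shrinkage effect can be illustrated with the textbook normal-normal case, where the pooled estimate is a weighted average of a team's own record and the league mean. This is a simplified analogue, not the hierarchical goal model itself, and the function name and every number below are assumptions for illustration.&lt;/p&gt;

```python
def shrunk_estimate(team_mean, n_games, league_mean, sigma_league, sigma_game):
    """Posterior mean in a normal-normal model with known variances."""
    # The weight on the team's own data grows with the number of games
    # and with the spread of the league (a larger sigma_league).
    weight = n_games / (n_games + (sigma_game / sigma_league) ** 2)
    return weight * team_mean + (1 - weight) * league_mean


# Same raw scoring record, very different amounts of evidence.
early_season = shrunk_estimate(2.5, n_games=3, league_mean=1.4,
                               sigma_league=0.3, sigma_game=1.2)
late_season = shrunk_estimate(2.5, n_games=30, league_mean=1.4,
                              sigma_league=0.3, sigma_game=1.2)

print("after 3 games:", early_season)
print("after 30 games:", late_season)
```

&lt;p&gt;With only three games the estimate stays close to the league mean; after thirty games the team's own record dominates. A larger league sigma weakens the pull toward the average.&lt;/p&gt;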

&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;This update was born out of my own frustration with setting up complex environments just to run simple sports models. Life's too short to spend it debugging C++ compilers when you could be analyzing football!&lt;/p&gt;
&lt;p&gt;The new Bayesian capabilities allow you to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Quantify Uncertainty:&lt;/strong&gt; See the full distribution of team strengths, not just point estimates.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Avoid Installation Hell:&lt;/strong&gt; No external dependencies required - just one simple pip install.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Run Fast:&lt;/strong&gt; Everything is powered by a custom Cython MCMC sampler.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Ready to dive into Bayesian football modelling without the headaches? The update is available now.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;penaltyblog&lt;span class="w"&gt; &lt;/span&gt;--upgrade
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Happy modeling!&lt;/p&gt;</content><category term="Prediction"></category><category term="MCMC"></category><category term="Bayesian"></category><category term="Goal Models"></category></entry><entry><title>Introducing the Opta API Connector for matchflow</title><link href="2025/11/30/matchflow-opta-statsperform/" rel="alternate"></link><published>2025-11-30T19:30:00+00:00</published><updated>2025-11-30T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2025-11-30:2025/11/30/matchflow-opta-statsperform/</id><summary type="html">&lt;p&gt;Streamline your football data pipelines with direct, hassle-free Opta API integration for matchflow...&lt;/p&gt;</summary><content type="html">&lt;h1 id="introduction"&gt;Introduction&lt;/h1&gt;
&lt;p&gt;I'm excited to announce a new feature for &lt;code&gt;matchflow&lt;/code&gt;: direct integration with the Stats Perform (Opta) Soccer API!&lt;/p&gt;
&lt;p&gt;For a while now, &lt;code&gt;matchflow&lt;/code&gt; (part of my &lt;a href="https://github.com/martineastwood/penaltyblog/"&gt;penaltyblog&lt;/a&gt; package) has been a powerful tool for building data pipelines from local files. But what if you could start those pipelines directly from Stats Perform's Opta API?&lt;/p&gt;
&lt;p&gt;That's exactly what this new addition enables. If you have access to the Stats Perform Opta API, you can now build lazy, powerful &lt;code&gt;matchflow&lt;/code&gt; pipelines straight from their feeds. This update is all about letting you focus on your analysis, not on the data engineering "grunt work" that comes with using a complex API.&lt;/p&gt;
&lt;h1 id="the-why-handling-the-hard-parts-for-you"&gt;The "Why?": Handling the Hard Parts for You&lt;/h1&gt;
&lt;p&gt;Let's be honest: while the Opta API is an incredibly rich data source, it's not always the easiest to work with. You have to manage:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Authentication tokens&lt;/li&gt;
&lt;li&gt;Building the correct URL for each of the 20+ feeds&lt;/li&gt;
&lt;li&gt;Handling paginated endpoints (and knowing which ones are paginated)&lt;/li&gt;
&lt;li&gt;Parsing the deeply nested, complex JSON responses&lt;/li&gt;
&lt;li&gt;Flattening statistics and un-nesting event data&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The new Opta connector, available in &lt;code&gt;penaltyblog.matchflow&lt;/code&gt;, handles all of that for you.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Lazy Execution:&lt;/strong&gt; It builds a &lt;code&gt;matchflow&lt;/code&gt; plan. No API calls are made until you run &lt;code&gt;.collect()&lt;/code&gt;, &lt;code&gt;.to_pandas()&lt;/code&gt;, or write to a file.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automatic Pagination:&lt;/strong&gt; Feeds like matches or venues are automatically paginated. You get a single, clean stream of all records, whether it takes 1 request or 100.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Smart Parsing:&lt;/strong&gt; It automatically unnests and flattens data. For example, event data (MA3) is yielded one event at a time, and match stats (MA2) are yielded one player-stat or team-stat record at a time. It turns complex JSON into analysis-ready rows.&lt;/li&gt;
&lt;/ul&gt;
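&lt;p&gt;The idea behind lazy execution and automatic pagination can be sketched in a few lines of plain Python. The &lt;code&gt;LazyFlow&lt;/code&gt; class and &lt;code&gt;fake_paginated_feed&lt;/code&gt; function below are toy stand-ins invented for this example; they are not how &lt;code&gt;matchflow&lt;/code&gt; is implemented, and no real API is called.&lt;/p&gt;

```python
class LazyFlow:
    """Toy lazy pipeline; not matchflow's real Flow class."""

    def __init__(self, source, steps=()):
        self._source = source        # zero-argument callable returning records
        self._steps = tuple(steps)   # predicates recorded in the plan

    def filter(self, predicate):
        # Only extends the plan; nothing is fetched here.
        return LazyFlow(self._source, self._steps + (predicate,))

    def collect(self):
        # The only place where the source is actually read.
        records = self._source()
        return [r for r in records if all(p(r) for p in self._steps)]


calls = []


def fake_paginated_feed(page_size=2):
    """Pretend API: yields records page by page, logging each request."""
    data = [{"typeId": t} for t in (1, 13, 15, 16, 1)]
    for start in range(0, len(data), page_size):
        calls.append(start)          # one pretend request per page
        yield from data[start:start + page_size]


flow = LazyFlow(fake_paginated_feed).filter(
    lambda r: r["typeId"] in (13, 14, 15, 16)
)
requests_before = len(calls)   # still 0: building the plan fetched nothing
shots = flow.collect()         # now every page is requested and filtered

print("requests before collect:", requests_before)
print("requests after collect:", len(calls))
print("records returned:", shots)
```

&lt;p&gt;The caller sees a single stream of records and never has to know how many pages were needed to produce it.&lt;/p&gt;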
&lt;h1 id="getting-started-a-quick-example"&gt;Getting Started: A Quick Example&lt;/h1&gt;
&lt;p&gt;First, let's set our Stats Perform credentials as environment variables:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nb"&gt;export&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;OPTA_AUTH_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;your_auth_key_here&amp;quot;&lt;/span&gt;
&lt;span class="nb"&gt;export&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;OPTA_RT_MODE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;your_rt_mode_here&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;# (e.g., &amp;quot;b&amp;quot;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Now, we can import &lt;code&gt;Flow&lt;/code&gt; and use its &lt;code&gt;opta&lt;/code&gt; accessor to build a pipeline. Let's get a list of all active tournament calendars (Feed OT2).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog.matchflow&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flow&lt;/span&gt;

&lt;span class="n"&gt;flow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;opta&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tournament_calendars&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;active&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;id&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;competitionCode&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;competitionCode&lt;/th&gt;
&lt;th&gt;id&lt;/th&gt;
&lt;th&gt;name&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ACO&lt;/td&gt;
&lt;td&gt;ax1yf4nlzqpcji4j8epdgx3zl&lt;/td&gt;
&lt;td&gt;Africa Cup of Nations Qualification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;17Q&lt;/td&gt;
&lt;td&gt;64fygrchlfuz3q4lc7k2ffj84&lt;/td&gt;
&lt;td&gt;Africa U17 Cup of Nations Qualification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ACO&lt;/td&gt;
&lt;td&gt;7dauoeun2gnkofl7f4y510s4f&lt;/td&gt;
&lt;td&gt;Africa U20 Cup of Nations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20Q&lt;/td&gt;
&lt;td&gt;4fht4nyqpp5dzzv6ucm057dp0&lt;/td&gt;
&lt;td&gt;Africa U20 Cup of Nations Qualification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;23Q&lt;/td&gt;
&lt;td&gt;27zeqzs85uxv3eej1mhot6gic&lt;/td&gt;
&lt;td&gt;Africa U23 Cup of Nations Qualification&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h1 id="the-real-power-building-a-pipeline"&gt;The Real Power: Building a Pipeline&lt;/h1&gt;
&lt;p&gt;The true magic happens when you combine this source with &lt;code&gt;matchflow&lt;/code&gt;'s other methods. Let's build a common pipeline: &lt;strong&gt;getting all shots from a single match&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;We'll use Feed MA3 (Match Events) and filter it down.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog.matchflow&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;where_opta_event&lt;/span&gt;

&lt;span class="c1"&gt;# The match we want to analyze&lt;/span&gt;
&lt;span class="n"&gt;FIXTURE_UUID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;zhs8gg1hvcuqvhkk2itb54pg&amp;quot;&lt;/span&gt; 

&lt;span class="c1"&gt;# 1. Start the flow from the Opta MA3 (events) feed&lt;/span&gt;
&lt;span class="c1"&gt;# This is lazy! No data is downloaded yet.&lt;/span&gt;
&lt;span class="n"&gt;event_flow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;opta&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fixture_uuid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;FIXTURE_UUID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Chain matchflow methods to build our pipeline&lt;/span&gt;
&lt;span class="c1"&gt;# We&amp;#39;ll use the new &amp;#39;where_opta_event&amp;#39; helper&lt;/span&gt;
&lt;span class="n"&gt;shot_flow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event_flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;where_opta_event&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Goal&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Attempt Saved&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Miss&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Post&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Now, execute the full pipeline to &lt;/span&gt;
&lt;span class="c1"&gt;# download and filter the data&lt;/span&gt;
&lt;span class="n"&gt;shot_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;shot_flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_pandas&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# You now have a clean DataFrame of *only* the shot events&lt;/span&gt;
&lt;span class="n"&gt;shot_df&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;playerName&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;timeMin&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;timeSec&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;x&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;y&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;typeId&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style="text-align: right;"&gt;&lt;/th&gt;
&lt;th style="text-align: left;"&gt;playerName&lt;/th&gt;
&lt;th style="text-align: right;"&gt;timeMin&lt;/th&gt;
&lt;th style="text-align: right;"&gt;timeSec&lt;/th&gt;
&lt;th style="text-align: right;"&gt;x&lt;/th&gt;
&lt;th style="text-align: right;"&gt;y&lt;/th&gt;
&lt;th style="text-align: right;"&gt;typeId&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;0&lt;/td&gt;
&lt;td style="text-align: left;"&gt;H. Ekitiké&lt;/td&gt;
&lt;td style="text-align: right;"&gt;2&lt;/td&gt;
&lt;td style="text-align: right;"&gt;4&lt;/td&gt;
&lt;td style="text-align: right;"&gt;76.8&lt;/td&gt;
&lt;td style="text-align: right;"&gt;68.7&lt;/td&gt;
&lt;td style="text-align: right;"&gt;15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;1&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Mohamed Salah&lt;/td&gt;
&lt;td style="text-align: right;"&gt;3&lt;/td&gt;
&lt;td style="text-align: right;"&gt;39&lt;/td&gt;
&lt;td style="text-align: right;"&gt;85.6&lt;/td&gt;
&lt;td style="text-align: right;"&gt;31.3&lt;/td&gt;
&lt;td style="text-align: right;"&gt;15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;2&lt;/td&gt;
&lt;td style="text-align: left;"&gt;V. van Dijk&lt;/td&gt;
&lt;td style="text-align: right;"&gt;4&lt;/td&gt;
&lt;td style="text-align: right;"&gt;27&lt;/td&gt;
&lt;td style="text-align: right;"&gt;91.4&lt;/td&gt;
&lt;td style="text-align: right;"&gt;50.5&lt;/td&gt;
&lt;td style="text-align: right;"&gt;13&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;3&lt;/td&gt;
&lt;td style="text-align: left;"&gt;A. Semenyo&lt;/td&gt;
&lt;td style="text-align: right;"&gt;5&lt;/td&gt;
&lt;td style="text-align: right;"&gt;55&lt;/td&gt;
&lt;td style="text-align: right;"&gt;91.1&lt;/td&gt;
&lt;td style="text-align: right;"&gt;45&lt;/td&gt;
&lt;td style="text-align: right;"&gt;13&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;4&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Evanilson&lt;/td&gt;
&lt;td style="text-align: right;"&gt;9&lt;/td&gt;
&lt;td style="text-align: right;"&gt;44&lt;/td&gt;
&lt;td style="text-align: right;"&gt;94.9&lt;/td&gt;
&lt;td style="text-align: right;"&gt;41.1&lt;/td&gt;
&lt;td style="text-align: right;"&gt;13&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This is what it's all about. You defined your source (&lt;code&gt;opta.events&lt;/code&gt;) and your transformation (&lt;code&gt;.filter&lt;/code&gt;), and &lt;code&gt;matchflow&lt;/code&gt; handled the API call, JSON parsing, and filtering, delivering a clean DataFrame of shots.&lt;/p&gt;
&lt;h1 id="say-goodbye-to-magic-numbers"&gt;Say Goodbye to "Magic Numbers"&lt;/h1&gt;
&lt;p&gt;You might have noticed in the example above that I filtered for event types like "Goal" and "Miss" directly. If you’ve worked with Opta data before, you know that you usually have to memorize specific IDs to do this. You stare at your code wondering, "Is Event Type 15 a Goal or an Attempt Saved? Is Qualifier 1 a Long Ball or a Cross?"&lt;/p&gt;
&lt;p&gt;If you are not careful, you end up with code full of "magic numbers" that is hard to read and hard to debug:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# The Old Way: Hard to read, hard to debug&lt;/span&gt;
&lt;span class="n"&gt;shot_flow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;typeId&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Wait, which ID is which?&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;I wanted to fix this developer experience. That's why I’ve included a comprehensive mapping of standard Opta definitions directly into &lt;code&gt;matchflow&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In addition to &lt;code&gt;where_opta_event&lt;/code&gt;, I've also added &lt;code&gt;where_opta_qualifier&lt;/code&gt;. These helpers let you filter using human-readable names, handling the ID lookups behind the scenes.&lt;/p&gt;
&lt;p&gt;Here is how you might filter for specific pass types without needing to look up a documentation PDF:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog.matchflow&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;where_opta_event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;where_opta_qualifier&lt;/span&gt;

&lt;span class="c1"&gt;# The New Way: Readable and explicit&lt;/span&gt;
&lt;span class="c1"&gt;# Get all Passes that are specifically &amp;#39;Long balls&amp;#39;&lt;/span&gt;
&lt;span class="n"&gt;long_ball_flow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;where_opta_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Pass&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;where_opta_qualifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Long ball&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;These helpers support list inputs for multiple events and value checks for qualifiers, making your analysis code self-documenting and much easier to share with colleagues.&lt;/p&gt;
&lt;h1 id="why-matchflow-cloud-storage-more"&gt;Why matchflow? Cloud Storage &amp;amp; More&lt;/h1&gt;
&lt;p&gt;While the API integration is the headline feature of this update, it's worth remembering why &lt;code&gt;matchflow&lt;/code&gt; exists in the first place. It isn't just a script to download data; it's a complete processing engine for football data.&lt;/p&gt;
&lt;p&gt;Because &lt;code&gt;matchflow&lt;/code&gt; is built on top of the excellent &lt;code&gt;fsspec&lt;/code&gt; library, you aren't limited to saving your data locally. You can stream the API response directly into &lt;strong&gt;S3&lt;/strong&gt;, &lt;strong&gt;Google Cloud Storage&lt;/strong&gt;, or &lt;strong&gt;Azure Blob Storage&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This is huge for building data lakes or automated pipelines. You don't need to download the data to memory and then upload it yourself, because &lt;code&gt;matchflow&lt;/code&gt; handles the data streaming for you.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# Stream Opta events directly to an S3 bucket&lt;/span&gt;
&lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;opta&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fixture_uuid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;FIXTURE_UUID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; \
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_jsonl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;s3://my-football-data/raw/events.jsonl&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And because &lt;code&gt;matchflow&lt;/code&gt; is lazy, it optimizes your plan before execution. If you chain a &lt;code&gt;.limit(10)&lt;/code&gt; to an API call, &lt;code&gt;matchflow&lt;/code&gt; knows to stop processing as soon as it has 10 records, saving you time and compute resources.&lt;/p&gt;
&lt;h1 id="the-killer-feature-joining-streams"&gt;The "Killer Feature": Joining Streams&lt;/h1&gt;
&lt;p&gt;If you have worked with football data before, you know the data is often fragmented. You get match events in one feed, but the player details are often in a completely different feed.&lt;/p&gt;
&lt;p&gt;Traditionally, linking these together forces you to make a choice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The RAM Hog:&lt;/strong&gt; Load everything into Pandas DataFrames and merge them (risking memory crashes with large historical dumps).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The Hard Way:&lt;/strong&gt; Write complex Python loops to manually look up player IDs as you iterate through events.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;matchflow&lt;/code&gt; solves this with &lt;strong&gt;lazy joins&lt;/strong&gt;. You can join two data streams just like you would in SQL or Pandas, but it remains a lazy iterator.&lt;/p&gt;
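&lt;p&gt;To make the idea concrete, here is a minimal pure-Python sketch of a lazy join built from generators. It is not &lt;code&gt;matchflow&lt;/code&gt;'s implementation or API: the &lt;code&gt;lazy_left_join&lt;/code&gt; helper, the &lt;code&gt;playerId&lt;/code&gt; key, and the sample records are all made up for illustration. It only shows the general technique of holding the smaller stream in a lookup table while the larger one is streamed record by record.&lt;/p&gt;

```python
from typing import Dict, Iterable, Iterator


def lazy_left_join(events: Iterable[dict], players: Iterable[dict], on: str) -> Iterator[dict]:
    """Hypothetical helper: left-join two record streams without loading the left side."""
    # Only the (usually much smaller) right-hand stream is held in memory.
    lookup: Dict[object, dict] = {row[on]: row for row in players}
    # The left-hand stream is consumed one record at a time.
    for event in events:
        match = lookup.get(event.get(on), {})
        # Event fields win if both sides share a field name.
        yield {**match, **event}


# Made-up sample records, purely for illustration
events = iter([
    {"playerId": "p1", "type": "Goal"},
    {"playerId": "p2", "type": "Miss"},
    {"playerId": "p9", "type": "Pass"},
])
players = [
    {"playerId": "p1", "name": "Player One"},
    {"playerId": "p2", "name": "Player Two"},
]

joined = lazy_left_join(events, players, on="playerId")
first = next(joined)       # work only happens when a record is requested
remaining = list(joined)   # the unmatched event keeps its own fields only
```

&lt;p&gt;Nothing is computed until a record is requested, which is what lets a join sit in the middle of a longer chain of lazy steps.&lt;/p&gt;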
&lt;h3 id="pro-tip-working-with-ip-whitelists"&gt;Pro Tip: Working with IP Whitelists&lt;/h3&gt;
&lt;p&gt;Stats Perform feeds are often IP-restricted, which can be difficult if you are working from a laptop or a dynamic cloud environment.&lt;/p&gt;
&lt;p&gt;To solve this, &lt;code&gt;matchflow&lt;/code&gt; supports standard Python proxy dictionaries. If you have a static IP proxy set up to match your whitelist, you can pass it directly to any opta method:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# Define your secure proxy&lt;/span&gt;
&lt;span class="n"&gt;proxies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;http&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;http://user:pass@10.10.1.10:3128&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;https&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;http://user:pass@10.10.1.10:1080&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Pass it into your flow&lt;/span&gt;
&lt;span class="n"&gt;event_flow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;opta&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;fixture_uuid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;FIXTURE_UUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;proxies&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;proxies&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h1 id="what-feeds-are-supported"&gt;What Feeds Are Supported?&lt;/h1&gt;
&lt;p&gt;I've added support for a wide range of the most popular feeds, with more to come. You can now build flows from:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Tournament Feeds (OT2, MA0):&lt;/strong&gt; &lt;code&gt;tournament_calendars()&lt;/code&gt;, &lt;code&gt;tournament_schedule()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Venue &amp;amp; Area Feeds (OT3, OT4):&lt;/strong&gt; &lt;code&gt;venues()&lt;/code&gt;, &lt;code&gt;areas()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Match Feeds (MA1, MA2, MA3):&lt;/strong&gt; &lt;code&gt;matches()&lt;/code&gt;, &lt;code&gt;match()&lt;/code&gt;, &lt;code&gt;match_stats_player()&lt;/code&gt;, &lt;code&gt;match_stats_team()&lt;/code&gt;, &lt;code&gt;events()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Advanced Match Feeds (MA4, MA5):&lt;/strong&gt; &lt;code&gt;pass_matrix()&lt;/code&gt;, &lt;code&gt;possession()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Player/Team Feeds (PE2, TM1, TM2, TM3, TM4):&lt;/strong&gt; &lt;code&gt;player_career()&lt;/code&gt;, &lt;code&gt;teams()&lt;/code&gt;, &lt;code&gt;team_standings()&lt;/code&gt;, &lt;code&gt;squads()&lt;/code&gt;, &lt;code&gt;player_season_stats()&lt;/code&gt;, &lt;code&gt;team_season_stats()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;And more!&lt;/strong&gt; Including &lt;code&gt;referees()&lt;/code&gt;, &lt;code&gt;rankings()&lt;/code&gt;, &lt;code&gt;injuries()&lt;/code&gt;, and &lt;code&gt;transfers()&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id="conclusion"&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;This latest update transforms &lt;code&gt;matchflow&lt;/code&gt; from a file processing tool into an end-to-end solution for working with Opta feeds. By handling authentication, pagination, and complex JSON parsing for you, it removes the friction between you and the insights you're looking for.&lt;/p&gt;
&lt;p&gt;Whether you are building a simple shot map or a complex, cloud-based data lake, the new Opta connector is designed to make your life easier.&lt;/p&gt;
&lt;p&gt;This feature is available now in the latest version of &lt;code&gt;penaltyblog&lt;/code&gt;. I’m excited to see what you build with it - if you have any feedback or run into issues, please let me know!&lt;/p&gt;</content><category term="Player Analytics"></category><category term="Opta"></category><category term="matchflow"></category></entry><entry><title>Shrinkage, Uncertainty, and Son Heung-min: Using Bayesian Methods to Identify Finishing Ability</title><link href="2025/10/01/a-better-way-to-measure-finishing-skill/" rel="alternate"></link><published>2025-10-01T19:30:00+00:00</published><updated>2025-10-01T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2025-10-01:2025/10/01/a-better-way-to-measure-finishing-skill/</id><summary type="html">&lt;p&gt;Why most finishing metrics are flawed and how a Bayesian approach gives us a truer picture of a player's finishing ability...&lt;/p&gt;</summary><content type="html">&lt;h1 id="introduction"&gt;Introduction&lt;/h1&gt;
&lt;p&gt;In football analytics, we often turn to a simple metric: &lt;strong&gt;Goals minus Expected Goals (G-xG)&lt;/strong&gt; to assess a player's finishing ability. It’s a decent first look, often used to argue that a player is "running hot" or getting lucky. &lt;/p&gt;
&lt;p&gt;However, it has a fundamental flaw: it struggles to separate temporary overperformance from genuine, repeatable ability.&lt;/p&gt;
&lt;p&gt;Imagine two players:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Player A scores 4 goals from 2.0 xG. His G-xG is +2.0.&lt;/li&gt;
&lt;li&gt;Player B scores 20 goals from 18.0 xG. His G-xG is also +2.0.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Are they equally good finishers?&lt;/p&gt;
&lt;p&gt;Our intuition says no. Player A might just be on a lucky hot streak over a handful of shots, while Player B has a more proven track record. The core problem is that simple metrics, such as G-xG, can't distinguish between &lt;strong&gt;repeatable skill&lt;/strong&gt; and &lt;strong&gt;random luck&lt;/strong&gt;. To do that, we need a better approach.&lt;/p&gt;
&lt;h1 id="the-trouble-with-a-single-number"&gt;The Trouble with a Single Number&lt;/h1&gt;
&lt;p&gt;The issue with a simple metric like &lt;code&gt;Goals - xG&lt;/code&gt; is that it hides the two most important factors in any analysis: &lt;strong&gt;context and confidence&lt;/strong&gt;.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;It Ignores Sample Size:&lt;/strong&gt; A player who scores one goal from a 0.1 xG chance (+0.9 G-xG) looks like a world-class finisher. But with only one shot, we have almost no information. Simple metrics treat this the same as a player who consistently overperforms over hundreds of shots.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;It Gives a False Sense of Precision:&lt;/strong&gt; By providing a single number, &lt;code&gt;G-xG&lt;/code&gt; suggests a definitive value for a player's skill. But in reality, there's a range of plausible skill levels for every player. A striker's &lt;code&gt;+5.0 G-xG&lt;/code&gt; might be a true reflection of their talent, or it might be the high end of a lucky season where their "true" skill is closer to &lt;code&gt;+2.0&lt;/code&gt;. We have no way to know how certain we are.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;To truly measure finishing, we need a model that can think more like a human scout: it should be skeptical of small samples and express its confidence in its own conclusions.&lt;/p&gt;
&lt;h1 id="a-better-approach-thinking-in-probabilities"&gt;A Better Approach: Thinking in Probabilities&lt;/h1&gt;
&lt;p&gt;Instead of relying on simple subtraction, we can get a much truer picture of finishing skill using a &lt;strong&gt;Bayesian hierarchical model&lt;/strong&gt;. While that sounds complex, the idea behind it is intuitive.&lt;/p&gt;
&lt;p&gt;The model assumes that most players are average finishers and requires significant evidence to believe a player is truly exceptional. It achieves this in two ways:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;"Borrowing Strength" and Shrinkage:&lt;/strong&gt; The model analyzes all players simultaneously and learns the skill distribution of the entire league. For players with little data, their skill estimate is "shrunk" towards this league average. A player with 2 goals from 3 shots won't be crowned a superstar as the model remains skeptical and keeps their estimate stable. This prevents outliers from topping the leaderboards by luck alone.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Quantifying Uncertainty:&lt;/strong&gt; This is the biggest advantage. Instead of a single number, the model produces a full range of plausible skill levels for each player, known as a &lt;strong&gt;credible interval&lt;/strong&gt;. We can finally move from "Player X is a &lt;code&gt;+2.0&lt;/code&gt; finisher" to "We are 94% certain that Player X is between a &lt;code&gt;+1.5&lt;/code&gt; and &lt;code&gt;+2.5&lt;/code&gt; finisher." This allows us to see not just &lt;em&gt;what&lt;/em&gt; the model thinks a player's skill is, but &lt;em&gt;how confident&lt;/em&gt; it is in that assessment.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
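&lt;p&gt;Shrinkage is easier to see with numbers. The sketch below is a deliberately simplified stand-in, not the hierarchical logistic regression used for the rankings: it shrinks a raw conversion rate towards a league rate using a beta-binomial style posterior mean. The league rate of 10% and the prior strength of 50 "pseudo-shots" are assumptions chosen purely for illustration.&lt;/p&gt;

```python
def shrunk_conversion(goals: int, shots: int, league_rate: float, prior_shots: float) -> float:
    """Posterior-mean style estimate: blend a player's record with the league rate."""
    return (goals + prior_shots * league_rate) / (shots + prior_shots)


LEAGUE_RATE = 0.10   # assumed league-wide conversion rate (illustrative)
PRIOR_SHOTS = 50.0   # assumed prior strength in pseudo-shots (illustrative)

# 2 goals from 3 shots: raw rate 66.7%, but almost no evidence behind it
small_sample = shrunk_conversion(2, 3, LEAGUE_RATE, PRIOR_SHOTS)

# 60 goals from 400 shots: raw rate 15%, backed by a long track record
large_sample = shrunk_conversion(60, 400, LEAGUE_RATE, PRIOR_SHOTS)

print(f"small sample: {small_sample:.3f}, large sample: {large_sample:.3f}")
```

&lt;p&gt;The player with three shots is pulled almost all the way back to the league rate, while the player with 400 shots keeps most of their edge and ends up ranked higher despite the lower raw rate.&lt;/p&gt;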
&lt;p&gt;To achieve this, the model intelligently isolates a player's individual talent. For every shot, it accounts for three key factors: the &lt;strong&gt;quality of the chance&lt;/strong&gt; (the xG), the &lt;strong&gt;baseline difficulty of the competition&lt;/strong&gt; (it's harder to score in the Premier League than in some other leagues), and finally, the &lt;strong&gt;unique skill of the player&lt;/strong&gt; taking the shot. This final value - the player's individual contribution after all other factors are considered - is what we define as true finishing skill. We'll call this metric &lt;strong&gt;Finishing Skill Above Average (FSAA)&lt;/strong&gt;.&lt;/p&gt;
&lt;h1 id="the-results-ranking-europes-elite-finishers"&gt;The Results: Ranking Europe's Elite Finishers&lt;/h1&gt;
&lt;p&gt;After running the model on hundreds of thousands of non-penalty shots from Europe’s top leagues, we get a stable, uncertainty-aware ranking of finishing skill. &lt;/p&gt;
&lt;p&gt;The table below shows the top 10, a list dominated by players renowned for their clinical ability in front of goal. The players are ranked by the lower bound of their 94% credible interval (HDI 3%), which is a conservative estimate of their ability. This means we are ranking them by their estimated 'worst-case' skill, not just their average estimate.&lt;/p&gt;
&lt;table border="1" class="dataframe"&gt;
  &lt;thead&gt;
    &lt;tr style="text-align: right;"&gt;
      &lt;th&gt;Player&lt;/th&gt;
      &lt;th&gt;FSAA&lt;/th&gt;
      &lt;th&gt;HDI 3%&lt;/th&gt;
      &lt;th&gt;HDI 97%&lt;/th&gt;
      &lt;th&gt;Prob &gt; Average&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Son Heung-Min&lt;/td&gt;
      &lt;td&gt;0.354&lt;/td&gt;
      &lt;td&gt;0.192&lt;/td&gt;
      &lt;td&gt;0.543&lt;/td&gt;
      &lt;td&gt;1.00000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Lionel Messi&lt;/td&gt;
      &lt;td&gt;0.292&lt;/td&gt;
      &lt;td&gt;0.151&lt;/td&gt;
      &lt;td&gt;0.425&lt;/td&gt;
      &lt;td&gt;1.00000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Antoine Griezmann&lt;/td&gt;
      &lt;td&gt;0.305&lt;/td&gt;
      &lt;td&gt;0.138&lt;/td&gt;
      &lt;td&gt;0.471&lt;/td&gt;
      &lt;td&gt;0.99950&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;James Rodríguez&lt;/td&gt;
      &lt;td&gt;0.324&lt;/td&gt;
      &lt;td&gt;0.108&lt;/td&gt;
      &lt;td&gt;0.555&lt;/td&gt;
      &lt;td&gt;0.99600&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Harry Kane&lt;/td&gt;
      &lt;td&gt;0.247&lt;/td&gt;
      &lt;td&gt;0.101&lt;/td&gt;
      &lt;td&gt;0.381&lt;/td&gt;
      &lt;td&gt;0.99875&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Kylian Mbappe-Lottin&lt;/td&gt;
      &lt;td&gt;0.246&lt;/td&gt;
      &lt;td&gt;0.097&lt;/td&gt;
      &lt;td&gt;0.392&lt;/td&gt;
      &lt;td&gt;0.99925&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Kevin De Bruyne&lt;/td&gt;
      &lt;td&gt;0.256&lt;/td&gt;
      &lt;td&gt;0.076&lt;/td&gt;
      &lt;td&gt;0.456&lt;/td&gt;
      &lt;td&gt;0.99250&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Iago Aspas&lt;/td&gt;
      &lt;td&gt;0.245&lt;/td&gt;
      &lt;td&gt;0.069&lt;/td&gt;
      &lt;td&gt;0.413&lt;/td&gt;
      &lt;td&gt;0.99725&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Manolo Gabbiadini&lt;/td&gt;
      &lt;td&gt;0.266&lt;/td&gt;
      &lt;td&gt;0.061&lt;/td&gt;
      &lt;td&gt;0.476&lt;/td&gt;
      &lt;td&gt;0.98975&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Dries Mertens&lt;/td&gt;
      &lt;td&gt;0.232&lt;/td&gt;
      &lt;td&gt;0.051&lt;/td&gt;
      &lt;td&gt;0.424&lt;/td&gt;
      &lt;td&gt;0.98800&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h1 id="understanding-the-rankings-what-do-the-numbers-mean"&gt;Understanding the Rankings: What Do the Numbers Mean?&lt;/h1&gt;
&lt;p&gt;The table provides a rich picture of each player's finishing ability, far beyond a simple rank. Here’s a quick guide to interpreting the columns:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;FSAA&lt;/strong&gt;: This is the model's best estimate of a player's finishing skill. A positive number means they are an above-average finisher, while a negative number indicates a below-average one. The higher the value, the more goals a player adds compared to an average player taking the same shots.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;HDI 3% / HDI 97%&lt;/strong&gt;: This is our "confidence" range. We can be 94% certain that the player's true finishing skill lies between these two values. If the entire range is positive (above zero), as it is for all the players in the top 10, we can be highly confident that their performance is due to genuine skill, not just a lucky streak.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Prob &amp;gt; Average&lt;/strong&gt;: This is the most direct measure of confidence. It tells us the probability that a player is a better-than-average finisher. For the truly elite, this number is very close to 100%. For example, the model is 100% certain that both Son Heung-Min and Lionel Messi are above-average finishers.&lt;/p&gt;
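&lt;p&gt;Both of these columns are simple summaries of a player's posterior samples. As a rough sketch, assuming the samples are available as a plain list of numbers, they could be computed like this. The ten draws below are made up, and the two helper functions are hypothetical; a real analysis would use thousands of draws and most likely a dedicated library such as ArviZ.&lt;/p&gt;

```python
def prob_above_average(draws):
    """Share of posterior draws above zero, i.e. above the average finisher."""
    return sum(1 for d in draws if d > 0.0) / len(draws)


def hdi(draws, mass=0.94):
    """Narrowest interval containing roughly `mass` of the draws."""
    ordered = sorted(draws)
    n = len(ordered)
    window = max(1, round(mass * n))   # number of draws inside the interval
    starts = range(n - window + 1)
    # Slide the window across the sorted draws and keep the narrowest one.
    best = min(starts, key=lambda i: ordered[i + window - 1] - ordered[i])
    return ordered[best], ordered[best + window - 1]


# Made-up posterior draws for one player (illustrative only)
draws = [0.30, 0.25, 0.41, 0.18, 0.35, 0.29, 0.52, 0.22, 0.33, -0.02]

lower, upper = hdi(draws)
p_above = prob_above_average(draws)
print(f"HDI: [{lower}, {upper}], Prob above average: {p_above}")
```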
&lt;h1 id="sense-check-the-elite-lionel-messi"&gt;Sense Check: The Elite (Lionel Messi)&lt;/h1&gt;
&lt;p&gt;The model ranks Son Heung-Min and Lionel Messi as the top two finishers, a result that makes perfect sense. Both are famous for consistently outperforming their expected goals. For Messi, the model is 100% certain he is an above-average finisher, which I don't think many people would disagree with.&lt;/p&gt;
&lt;h1 id="sense-check-the-not-so-elite-jesus-navas"&gt;Sense Check: The Not-so Elite (Jesús Navas)&lt;/h1&gt;
&lt;p&gt;To show the model works at both ends of the spectrum, consider Jesús Navas. He is famous for his hard work rather than his finishing, and the model places him in the bottom 10 of the 7,500+ players analysed. It is 88% certain he is a below-average finisher, which matches his reputation. Here are the bottom 10 finishers in full.&lt;/p&gt;
&lt;table border="1" class="dataframe"&gt;
  &lt;thead&gt;
    &lt;tr style="text-align: right;"&gt;
      &lt;th&gt;Player&lt;/th&gt;
      &lt;th&gt;FSAA&lt;/th&gt;
      &lt;th&gt;HDI 3%&lt;/th&gt;
      &lt;th&gt;HDI 97%&lt;/th&gt;
      &lt;th&gt;Prob &gt; Average&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Rodrigo Palacio&lt;/td&gt;
      &lt;td&gt;-0.178&lt;/td&gt;
      &lt;td&gt;-0.405&lt;/td&gt;
      &lt;td&gt;0.045&lt;/td&gt;
      &lt;td&gt;0.06750&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Jesús Navas&lt;/td&gt;
      &lt;td&gt;-0.158&lt;/td&gt;
      &lt;td&gt;-0.406&lt;/td&gt;
      &lt;td&gt;0.105&lt;/td&gt;
      &lt;td&gt;0.12225&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Dominic Calvert-Lewin&lt;/td&gt;
      &lt;td&gt;-0.199&lt;/td&gt;
      &lt;td&gt;-0.409&lt;/td&gt;
      &lt;td&gt;0.006&lt;/td&gt;
      &lt;td&gt;0.03575&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Giampaolo Pazzini&lt;/td&gt;
      &lt;td&gt;-0.159&lt;/td&gt;
      &lt;td&gt;-0.412&lt;/td&gt;
      &lt;td&gt;0.101&lt;/td&gt;
      &lt;td&gt;0.12350&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Florian Sotoca&lt;/td&gt;
      &lt;td&gt;-0.189&lt;/td&gt;
      &lt;td&gt;-0.413&lt;/td&gt;
      &lt;td&gt;0.046&lt;/td&gt;
      &lt;td&gt;0.06050&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Keane Lewis-Potter&lt;/td&gt;
      &lt;td&gt;-0.138&lt;/td&gt;
      &lt;td&gt;-0.413&lt;/td&gt;
      &lt;td&gt;0.121&lt;/td&gt;
      &lt;td&gt;0.17050&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Emmanuel Rivière&lt;/td&gt;
      &lt;td&gt;-0.152&lt;/td&gt;
      &lt;td&gt;-0.414&lt;/td&gt;
      &lt;td&gt;0.093&lt;/td&gt;
      &lt;td&gt;0.13200&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Hans Hateboer&lt;/td&gt;
      &lt;td&gt;-0.160&lt;/td&gt;
      &lt;td&gt;-0.416&lt;/td&gt;
      &lt;td&gt;0.092&lt;/td&gt;
      &lt;td&gt;0.11475&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Alexander Djiku&lt;/td&gt;
      &lt;td&gt;-0.152&lt;/td&gt;
      &lt;td&gt;-0.419&lt;/td&gt;
      &lt;td&gt;0.100&lt;/td&gt;
      &lt;td&gt;0.13075&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Mijat Gacinovic&lt;/td&gt;
      &lt;td&gt;-0.167&lt;/td&gt;
      &lt;td&gt;-0.426&lt;/td&gt;
      &lt;td&gt;0.088&lt;/td&gt;
      &lt;td&gt;0.10725&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h1 id="putting-it-into-practice"&gt;Putting It Into Practice&lt;/h1&gt;
&lt;p&gt;So, what does a &lt;strong&gt;FSAA&lt;/strong&gt; of &lt;code&gt;+0.292&lt;/code&gt; for Lionel Messi actually mean?&lt;/p&gt;
&lt;p&gt;It's an estimate of Messi's finishing skill compared to an "average" finisher on a &lt;a href="https://en.wikipedia.org/wiki/Logit"&gt;log-odds&lt;/a&gt; scale, but we can translate it into real-world impact. For a standard shot with a 10% chance of being a goal (0.10 xG), Messi's skill increases that probability to nearly 13%.&lt;/p&gt;
&lt;p&gt;While an absolute increase of 3 percentage points may not sound like a lot, it's a nearly &lt;strong&gt;30% relative increase&lt;/strong&gt; in the likelihood of scoring from a given shot compared with an "average" player.&lt;/p&gt;
&lt;p&gt;Over the course of 100 of these shots, an average player would score 10 goals. Messi would be expected to score 13. His elite finishing ability creates 3 extra goals from the exact same set of chances.&lt;/p&gt;
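&lt;p&gt;The arithmetic behind those numbers is short enough to sketch. Assuming FSAA acts as a simple additive shift on the log-odds of the shot's xG, which is my reading of the description above rather than the full model with its competition effects, the conversion looks like this. The &lt;code&gt;apply_fsaa&lt;/code&gt; helper is a name invented for this example.&lt;/p&gt;

```python
import math


def apply_fsaa(xg: float, fsaa: float) -> float:
    """Shift a shot's xG by a finishing skill expressed on the log-odds scale."""
    log_odds = math.log(xg / (1.0 - xg))                 # probability -> log-odds
    return 1.0 / (1.0 + math.exp(-(log_odds + fsaa)))    # log-odds -> probability


messi = apply_fsaa(0.10, 0.292)
son = apply_fsaa(0.10, 0.354)

# Extra goals over 100 shots that are each worth 0.10 xG
messi_extra = 100 * (messi - 0.10)
son_extra = 100 * (son - 0.10)

print(f"Messi: {messi:.3f} ({messi_extra:.1f} extra goals per 100 shots)")
print(f"Son:   {son:.3f} ({son_extra:.1f} extra goals per 100 shots)")
```

&lt;p&gt;Because the shift happens on the log-odds scale, the same FSAA adds a different number of percentage points depending on the quality of the chance, so the "3 extra goals" figure applies specifically to shots of around 0.10 xG.&lt;/p&gt;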
&lt;h1 id="practical-applications-for-clubs"&gt;Practical Applications for Clubs&lt;/h1&gt;
&lt;p&gt;This finishing skill model offers several advantages for professional clubs over traditional metrics. &lt;/p&gt;
&lt;p&gt;In recruitment, scouts can identify undervalued strikers who consistently outperform their Expected Goals but haven't yet caught the market's attention, while avoiding overpaying for players on unsustainable hot streaks. &lt;/p&gt;
&lt;p&gt;The uncertainty intervals help clubs assess risk - a striker with a wide credible interval might represent a bigger gamble, while one with tighter intervals represents a safer investment. &lt;/p&gt;
&lt;p&gt;The model also supports contract negotiations by providing objective evidence of a player's true finishing ability, separate from temporary form or lucky seasons. &lt;/p&gt;
&lt;p&gt;Perhaps most valuably, the approach helps clubs avoid the costly mistake of building their attack around a player whose apparent finishing prowess is actually just statistical noise. Looking at you, Dominic Calvert-Lewin.&lt;/p&gt;
&lt;h1 id="methodology-details"&gt;Methodology Details&lt;/h1&gt;
&lt;p&gt;The data cover non-penalty shots from the Premier League, Ligue 1, Serie A, the Bundesliga, and La Liga for seasons 2014/2015 to 2025/2026.&lt;/p&gt;
&lt;p&gt;The rankings were produced by a multilevel Bayesian logistic regression. The model included crossed random effects for player and competition to isolate finishing skill from league-wide differences. Priors were chosen to be weakly informative, and inference was performed using a NUTS MCMC sampler with multiple chains. Convergence was confirmed by ensuring the &lt;code&gt;r_hat&lt;/code&gt; statistic for all parameters was below 1.01.&lt;/p&gt;
&lt;h1 id="conclusion"&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;For years, &lt;code&gt;Goals - xG&lt;/code&gt; has been the default metric for finishing skill, but it's a noisy measure that can be misleading. &lt;/p&gt;
&lt;p&gt;As any clichéd football pundit will tell you, "form is temporary, class is permanent" - but most finishing metrics struggle to separate the two. By using Bayesian approaches, we can build models that properly account for sample size and quantify uncertainty, giving us a much truer and more stable picture of player ability.&lt;/p&gt;
&lt;p&gt;This model provides more than just a ranking; it offers &lt;strong&gt;objective evidence&lt;/strong&gt; of a player's true finishing ability backed by statistical confidence. We can now quantify how much better one finisher is than another - over 100 shots worth 0.10 xG each, Messi's +0.292 skill creates 3 extra goals compared to an average player, while Son's +0.354 creates nearly 4. The credible intervals tell us how confident we should be in these assessments.&lt;/p&gt;
&lt;p&gt;The results speak for themselves, identifying a list of world-class players whose elite finishing talent is not just a lucky streak but a genuine, repeatable skill. Bayesian methods offer a robust way to evaluate one of football's most crucial talents.&lt;/p&gt;</content><category term="Player Analytics"></category><category term="Bayesian"></category><category term="Expected Goals"></category><category term="Player Analytics"></category></entry><entry><title>From Biased Odds to Fair Probabilities: Removing the Bookmaker's Overround</title><link href="2025/09/14/from-biased-odds-to-fair-probabilities/" rel="alternate"></link><published>2025-09-14T19:30:00+00:00</published><updated>2025-09-14T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2025-09-14:2025/09/14/from-biased-odds-to-fair-probabilities/</id><summary type="html">&lt;p&gt;A simple guide to stripping the overround and finding the real probabilities behind the bookmaker's odds...&lt;/p&gt;</summary><content type="html">&lt;h1 id="introduction"&gt;Introduction&lt;/h1&gt;
&lt;p&gt;If you've ever converted betting odds to probabilities, you've likely noticed they sum to a value greater than 100%. This surplus is the bookmaker's margin, often called the "overround" or "vig", and it's how they guarantee a profit. For any serious quantitative analysis, removing this margin to find the "true" underlying probabilities is a critical first step.&lt;/p&gt;
&lt;p&gt;The challenge, however, is that there is no single, universally agreed-upon way to remove this margin. The process depends on our assumptions about how the bookmaker has applied it. Is it distributed evenly across all outcomes, or is it weighted towards the longshots?&lt;/p&gt;
&lt;p&gt;To help analysts tackle this problem with greater flexibility, I've updated the implied odds functions in the &lt;code&gt;penaltyblog&lt;/code&gt; package. This article explores the new features and takes a deep dive into the different theoretical models you can now use.&lt;/p&gt;
&lt;h1 id="whats-new-in-penaltyblog"&gt;What's New in penaltyblog?&lt;/h1&gt;
&lt;p&gt;The core of this update is a move to a more modern and robust architecture. I've replaced standard dictionaries with type-safe dataclasses, namely &lt;code&gt;OddsInput&lt;/code&gt; and &lt;code&gt;ImpliedProbabilities&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;This change provides several key benefits for users:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Clarity and Safety:&lt;/strong&gt; The new structure makes the API self-documenting. You know exactly what data to provide and what the function will return, allowing tools like &lt;code&gt;mypy&lt;/code&gt; to catch potential errors before you run your code.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ease of Use:&lt;/strong&gt; With type-safe objects, IDEs can provide better auto-completion, making the library more intuitive to work with.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All functionality is now consolidated into a single, powerful function: &lt;code&gt;penaltyblog.implied.calculate_implied()&lt;/code&gt;. This serves as a unified entry point for all the supported methods.&lt;/p&gt;
&lt;h1 id="a-deep-dive-into-implied-probability-methods"&gt;A Deep Dive into Implied Probability Methods&lt;/h1&gt;
&lt;p&gt;The fundamental question when removing the overround is: how is the margin distributed? To illustrate the different approaches, we'll use a typical 1X2 football market with the following decimal odds: &lt;strong&gt;Home: 2.7, Draw: 2.3, Away: 4.4.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The unadjusted probabilities, calculated as $1/odds$, are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;$1/2.7=0.370$&lt;/li&gt;
&lt;li&gt;$1/2.3=0.435$&lt;/li&gt;
&lt;li&gt;$1/4.4=0.227$&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The sum of these is $1.032$, which gives us a bookmaker's margin of $3.2\%$. Each method below attempts to reduce this sum to exactly $1.0$ based on a different set of assumptions.&lt;/p&gt;
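&lt;p&gt;As a quick check, the raw probabilities and the margin can be reproduced in a few lines of plain Python. This does not use the &lt;code&gt;penaltyblog&lt;/code&gt; API; the variable names are just for this example.&lt;/p&gt;

```python
# Decimal odds for the example 1X2 market
odds = {"home": 2.7, "draw": 2.3, "away": 4.4}

# Raw (unadjusted) probabilities are simply the reciprocal of the odds
raw = {outcome: 1.0 / price for outcome, price in odds.items()}

booksum = sum(raw.values())   # total of the raw probabilities
overround = booksum - 1.0     # the bookmaker's margin

for outcome, p in raw.items():
    print(f"{outcome}: {p:.3f}")
print(f"sum: {booksum:.3f}, margin: {overround:.1%}")
```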
&lt;h2 id="the-basic-approaches"&gt;The Basic Approaches&lt;/h2&gt;
&lt;p&gt;These two methods are the most common and serve as excellent baselines.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Multiplicative (&lt;code&gt;"multiplicative"&lt;/code&gt;):&lt;/strong&gt; This is the simplest and most widely used method. It assumes the margin is distributed proportionally across all outcomes and normalises the raw probabilities by dividing each one by their sum.&lt;/p&gt;
&lt;p&gt;$$p_i = \frac{1/o_i}{\sum_{j=1}^{n} 1/o_j}$$&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Additive (&lt;code&gt;"additive"&lt;/code&gt;):&lt;/strong&gt; This method assumes the margin is an equal, fixed amount subtracted from each outcome's raw probability.&lt;/p&gt;
&lt;p&gt;$$p_i = \frac{1}{o_i} - \frac{M}{n}$$&lt;/p&gt;
&lt;p&gt;where $M$ is the total margin and $n$ is the number of outcomes.&lt;/p&gt;
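&lt;p&gt;Both formulas are simple enough to work through by hand. The sketch below applies them directly to the example odds in plain Python. It is an illustration of the two equations, not the library's internal code.&lt;/p&gt;

```python
odds = [2.7, 2.3, 4.4]
raw_probs = [1 / o for o in odds]
total = sum(raw_probs)
n = len(odds)

# Multiplicative: divide each raw probability by the sum of all raw probabilities
multiplicative = [p / total for p in raw_probs]

# Additive: subtract an equal share of the total margin M from each raw probability
M = total - 1.0
additive = [p - M / n for p in raw_probs]

print([round(p, 4) for p in multiplicative])  # [0.3587, 0.4211, 0.2201]
print([round(p, 4) for p in additive])        # [0.3596, 0.424, 0.2165]
```

&lt;p&gt;One thing to watch with the additive formula: when the margin is large and an outcome is a big longshot, $1/o_i$ can be smaller than $M/n$, which would give a negative probability.&lt;/p&gt;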
&lt;h2 id="iterative-approaches"&gt;Iterative Approaches&lt;/h2&gt;
&lt;p&gt;These methods are more complex and often require solving for a parameter that normalises the probability distribution.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Power (&lt;code&gt;"power"&lt;/code&gt;):&lt;/strong&gt; This method raises the raw probabilities to a power, &lt;code&gt;k&lt;/code&gt;, which is solved for iteratively so that the adjusted probabilities sum to 1.0. Because raising a small probability to a power above 1 shrinks it proportionally more than a large one, this removes relatively more margin from longshots than the multiplicative method does.&lt;/p&gt;
&lt;p&gt;$$p_i = \left(\frac{1}{o_i}\right)^k$$&lt;/p&gt;
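&lt;p&gt;To make the "solved for iteratively" part concrete, here is a minimal sketch that finds &lt;code&gt;k&lt;/code&gt; by bisection. The function name &lt;code&gt;power_method&lt;/code&gt; and the choice of bisection are my own for illustration, and the library may well use a different solver. It assumes every price is above 1.0 and that the market has a positive margin.&lt;/p&gt;

```python
def power_method(odds, tol=1e-12, max_iter=200):
    """Solve sum((1/o_i)**k) == 1 for k by bisection (illustrative helper)."""
    raw = [1 / o for o in odds]

    # k = 1 leaves the overround in place; widen the upper bound until the
    # powered probabilities sum to 1.0 or less, so the root is bracketed.
    lo, hi = 1.0, 2.0
    while sum(p ** hi for p in raw) > 1.0:
        hi *= 2.0

    k = lo
    for _ in range(max_iter):
        k = (lo + hi) / 2
        total = sum(p ** k for p in raw)
        if tol >= abs(total - 1.0):
            break
        if total > 1.0:
            lo = k  # still too much probability mass, so k must be larger
        else:
            hi = k  # overshot, so k must be smaller
    return k, [p ** k for p in raw]


k, probs = power_method([2.7, 2.3, 4.4])
print(round(k, 4), [round(p, 4) for p in probs])
```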
&lt;p&gt;&lt;strong&gt;Shin (&lt;code&gt;"shin"&lt;/code&gt;):&lt;/strong&gt; Developed by Hyun Song Shin, this method is derived from a model that assumes a mix of informed and uninformed bettors in the market. It solves for a parameter, &lt;code&gt;z&lt;/code&gt;, which can be interpreted as the proportion of informed (insider) money. It is a popular method in the academic literature for modelling the favourite-longshot bias.&lt;/p&gt;
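&lt;p&gt;For reference, the form of Shin's method usually quoted in the literature is $p_i = \frac{\sqrt{z^2 + 4(1-z)\,\pi_i^2 / \sum_j \pi_j} - z}{2(1-z)}$, where $\pi_i = 1/o_i$ and &lt;code&gt;z&lt;/code&gt; is chosen so the $p_i$ sum to 1.0. The sketch below solves for &lt;code&gt;z&lt;/code&gt; by bisection. The helper names are hypothetical, and I'm assuming this textbook form rather than showing the library's own implementation.&lt;/p&gt;

```python
from math import sqrt


def shin_probabilities(z, raw):
    """Shin-adjusted probabilities for a given z (textbook closed form)."""
    total = sum(raw)
    return [
        (sqrt(z ** 2 + 4 * (1 - z) * p ** 2 / total) - z) / (2 * (1 - z))
        for p in raw
    ]


def shin_method(odds, tol=1e-12, max_iter=200):
    """Find z in (0, 1) so that the Shin probabilities sum to 1.0."""
    raw = [1 / o for o in odds]
    lo, hi = 0.0, 1.0
    z = lo
    for _ in range(max_iter):
        z = (lo + hi) / 2
        total = sum(shin_probabilities(z, raw))
        if tol >= abs(total - 1.0):
            break
        if total > 1.0:
            lo = z  # probabilities still sum above 1, so z must be larger
        else:
            hi = z
    return z, shin_probabilities(z, raw)


z, probs = shin_method([2.7, 2.3, 4.4])
print(round(z, 4), [round(p, 4) for p in probs])
```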
&lt;p&gt;&lt;strong&gt;Odds Ratio (&lt;code&gt;"odds_ratio"&lt;/code&gt;):&lt;/strong&gt; Proposed by Keith Cheung, this method models the relationship between the bookmaker's probabilities and the true probabilities using an odds ratio formulation, solving for a constant &lt;code&gt;c&lt;/code&gt;.&lt;/p&gt;
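&lt;p&gt;The odds ratio formulation is commonly written as $p_i = \frac{\pi_i}{c + \pi_i - c\,\pi_i}$, with $\pi_i = 1/o_i$ and &lt;code&gt;c&lt;/code&gt; chosen so the probabilities sum to 1.0. Here is a rough sketch under that assumption; the function name and the bisection solver are illustrative choices, not the library's internals.&lt;/p&gt;

```python
def odds_ratio_method(odds, tol=1e-12, max_iter=200):
    """Solve for the odds-ratio constant c by bisection (illustrative helper)."""
    raw = [1 / o for o in odds]

    def adjust(c):
        # Each raw probability is deflated so that its odds shrink by a factor c
        return [p / (c + p - c * p) for p in raw]

    # c = 1 returns the raw probabilities; widen the bracket until the sum
    # drops to 1.0 or below.
    lo, hi = 1.0, 2.0
    while sum(adjust(hi)) > 1.0:
        hi *= 2.0

    c = lo
    for _ in range(max_iter):
        c = (lo + hi) / 2
        total = sum(adjust(c))
        if tol >= abs(total - 1.0):
            break
        if total > 1.0:
            lo = c
        else:
            hi = c
    return c, adjust(c)


c, probs = odds_ratio_method([2.7, 2.3, 4.4])
print(round(c, 4), [round(p, 4) for p in probs])
```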
&lt;p&gt;&lt;strong&gt;Differential Margin Weighting (&lt;code&gt;"differential_margin_weighting"&lt;/code&gt;):&lt;/strong&gt; This approach, popularised by Joseph Buchdahl, works on the assumption that the margin is applied proportionally to the odds themselves, which results in a greater proportion of the margin being applied to longshots.&lt;/p&gt;
&lt;p&gt;$$\text{fair_odds}_i = \frac{n \cdot o_i}{n - (M \cdot o_i)}$$&lt;/p&gt;
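&lt;p&gt;Unlike the other methods in this section, this formula is closed-form, so no solver is needed. Inverting it gives $p_i = \frac{n - M \cdot o_i}{n \cdot o_i} = \frac{1}{o_i} - \frac{M}{n}$, which is the same expression as the additive formula. A short sketch on the example odds confirms this numerically (variable names are illustrative):&lt;/p&gt;

```python
odds = [2.7, 2.3, 4.4]
n = len(odds)
M = sum(1 / o for o in odds) - 1.0  # total margin

# Fair odds from the margin-weights-proportional-to-odds formula
fair_odds = [n * o / (n - M * o) for o in odds]
dmw_probs = [1 / fo for fo in fair_odds]

# The additive formula, for comparison
additive_probs = [1 / o - M / n for o in odds]

print([round(fo, 4) for fo in fair_odds])
print([round(p, 4) for p in dmw_probs])
print([round(p, 4) for p in additive_probs])
```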
&lt;p&gt;&lt;strong&gt;Logarithmic (&lt;code&gt;"logarithmic"&lt;/code&gt;):&lt;/strong&gt; This method converts the bookmaker's probabilities into "log-odds" space and assumes the margin is applied as an equal subtraction from each outcome. It solves for a single constant, &lt;code&gt;c&lt;/code&gt;, that is removed from all log-odds values to ensure the final probabilities sum to 1.0.&lt;/p&gt;
&lt;h1 id="putting-it-all-together-a-practical-example"&gt;Putting It All Together: A Practical Example&lt;/h1&gt;
&lt;p&gt;The new &lt;code&gt;calculate_implied&lt;/code&gt; function makes it easy to apply and compare these methods.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pb&lt;/span&gt;

&lt;span class="c1"&gt;# Our example odds for a Home, Draw, Away market&lt;/span&gt;
&lt;span class="n"&gt;odds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;2.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;market_names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Draw&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Calculate using the default Multiplicative method&lt;/span&gt;
&lt;span class="n"&gt;result_mult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;implied&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;calculate_implied&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;odds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;market_names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;market_names&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Multiplicative: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result_mult&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;probabilities&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# &amp;gt;&amp;gt; Multiplicative: {&amp;#39;Home&amp;#39;: 0.3587, &amp;#39;Draw&amp;#39;: 0.4211, &amp;#39;Away&amp;#39;: 0.2201}&lt;/span&gt;

&lt;span class="c1"&gt;# Now try Shin&amp;#39;s method&lt;/span&gt;
&lt;span class="n"&gt;result_shin&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;implied&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;calculate_implied&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;odds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;shin&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;market_names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;market_names&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Shin: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result_shin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;probabilities&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# &amp;gt;&amp;gt; Shin: {&amp;#39;Home&amp;#39;: 0.3593, &amp;#39;Draw&amp;#39;: 0.4232, &amp;#39;Away&amp;#39;: 0.2174}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h1 id="an-empirical-test-which-method-is-most-accurate"&gt;An Empirical Test: Which Method is Most Accurate?&lt;/h1&gt;
&lt;p&gt;While the theoretical differences are interesting, the crucial question is whether one method produces more accurate probabilities than another. To test this, we can take a historical dataset of odds and outcomes and evaluate the accuracy of each method using a proper scoring rule.&lt;/p&gt;
&lt;p&gt;For this analysis, I used the closing odds for all 380 matches from the 2024/25 English Premier League season. For each match, I calculated the implied 1X2 probabilities for Bet365's odds using every method. I then scored these probabilities against the actual match outcome (Home Win, Draw, or Away Win) using the Ranked Probability Score (RPS).&lt;/p&gt;
&lt;p&gt;The RPS is a measure of the distance between a probabilistic forecast and the observed outcome, where a lower score indicates a more accurate forecast.&lt;/p&gt;
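&lt;p&gt;For a single match with $r$ ordered outcomes, the RPS is usually defined as $\frac{1}{r-1}\sum_{i=1}^{r-1}\left(\sum_{j=1}^{i}(p_j - e_j)\right)^2$, where $e_j$ is 1 for the observed outcome and 0 otherwise. The sketch below implements that definition in plain Python to show what is being measured. The function name &lt;code&gt;rps&lt;/code&gt; is my own, and the library's &lt;code&gt;rps_average&lt;/code&gt; may differ in its details.&lt;/p&gt;

```python
def rps(probs, outcome):
    """Ranked Probability Score for one forecast (illustrative helper).

    probs:   probabilities in outcome order, e.g. [home, draw, away]
    outcome: index of the observed outcome (0 = home, 1 = draw, 2 = away)
    """
    n = len(probs)
    cum_forecast = 0.0
    cum_observed = 0.0
    score = 0.0
    # Compare cumulative forecast and cumulative outcome at each of the
    # first n - 1 thresholds (the final cumulative difference is always zero).
    for i in range(n - 1):
        cum_forecast += probs[i]
        cum_observed += 1.0 if i == outcome else 0.0
        score += (cum_forecast - cum_observed) ** 2
    return score / (n - 1)


forecast = [0.3587, 0.4211, 0.2201]
print(round(rps(forecast, 0), 4))  # home win: 0.2299
print(round(rps(forecast, 2), 4))  # away win: 0.3684
```

&lt;p&gt;Because the score works on cumulative probabilities, the order of the outcomes matters: a forecast that leaned towards the draw is penalised less for a home win than one that leaned towards the away side.&lt;/p&gt;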
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pb&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pd&lt;/span&gt;

&lt;span class="c1"&gt;# Get the fixture data for the 2024-2025 Premier League season&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2024-2025&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Map the full time result to a numerical outcome (0 for Home, 1 for Draw, 2 for Away)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;outcome&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ftr&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;H&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;D&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;A&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# Define a function to normalise the odds using a specific method&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;normalise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
&lt;span class="sd"&gt;    Calculates implied probabilities from betting odds using a specified method.&lt;/span&gt;

&lt;span class="sd"&gt;    Args:&lt;/span&gt;
&lt;span class="sd"&gt;        x (pd.Series): A row of the DataFrame containing &amp;#39;b365_h&amp;#39;, &amp;#39;b365_d&amp;#39;, &amp;#39;b365_a&amp;#39; columns.&lt;/span&gt;
&lt;span class="sd"&gt;        method (str): The normalisation method to use.&lt;/span&gt;

&lt;span class="sd"&gt;    Returns:&lt;/span&gt;
&lt;span class="sd"&gt;        pd.Series: A pandas Series with the normalised probabilities for home, draw, and away outcomes.&lt;/span&gt;
&lt;span class="sd"&gt;    &amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
    &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;implied&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;calculate_implied&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;b365_h&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;b365_d&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;b365_a&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
        &lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;method&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Return the probabilities as a Series with named columns&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;probabilities&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;prob_h&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;prob_d&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;prob_a&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Define the list of normalisation methods to compare&lt;/span&gt;
&lt;span class="n"&gt;methods&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;additive&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;multiplicative&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;power&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;differential_margin_weighting&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;shin&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;odds_ratio&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;logarithmic&amp;quot;&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="c1"&gt;# Iterate through each normalisation method&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Apply the normalise function to each row of the DataFrame to get the probabilities&lt;/span&gt;
    &lt;span class="n"&gt;norm_probs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;normalise&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="p"&gt;,))&lt;/span&gt;

    &lt;span class="c1"&gt;# Calculate the average Ranked Probability Score (RPS) for the current method&lt;/span&gt;
    &lt;span class="n"&gt;rps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rps_average&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;norm_probs&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;prob_h&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;prob_d&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;prob_a&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;outcome&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Store the rounded RPS value in the results dictionary&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Print the results&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id="results"&gt;Results&lt;/h2&gt;
&lt;p&gt;The results show a very tight race. The simple &lt;strong&gt;multiplicative&lt;/strong&gt; method was technically the most accurate, with the lowest RPS of &lt;strong&gt;0.19724&lt;/strong&gt;, but the differences are negligible: all seven methods fall within 0.00015 of each other.&lt;/p&gt;
&lt;p&gt;The more theoretically grounded &lt;strong&gt;Odds Ratio&lt;/strong&gt; and &lt;strong&gt;Logarithmic&lt;/strong&gt; methods were virtually indistinguishable in performance, tying for a close second place. This suggests that for a highly efficient market like the English Premier League, several different models for how bookmakers apply their margin can produce similarly accurate probabilities.&lt;/p&gt;
&lt;table border="1" class="dataframe"&gt;
  &lt;thead&gt;
    &lt;tr style="text-align: right;"&gt;
      &lt;th&gt;Method&lt;/th&gt;
      &lt;th&gt;RPS&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;multiplicative&lt;/td&gt;
      &lt;td&gt;0.19724&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;logarithmic&lt;/td&gt;
      &lt;td&gt;0.19730&lt;/td&gt;
    &lt;/tr&gt;    
    &lt;tr&gt;
      &lt;td&gt;odds_ratio&lt;/td&gt;
      &lt;td&gt;0.19730&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;shin&lt;/td&gt;
      &lt;td&gt;0.19731&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;additive&lt;/td&gt;
      &lt;td&gt;0.19736&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;differential_margin_weighting&lt;/td&gt;
      &lt;td&gt;0.19736&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;power&lt;/td&gt;
      &lt;td&gt;0.19739&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h1 id="conclusion"&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;Removing the bookmaker's margin is a foundational step in quantitative sports analysis. With the latest update to &lt;code&gt;penaltyblog&lt;/code&gt;, analysts now have a comprehensive and easy-to-use toolkit to perform this task with greater precision. &lt;/p&gt;</content><category term="Betting"></category><category term="Betting"></category><category term="Odds"></category></entry><entry><title>Penaltyblog v1.5.0: Faster Models, Smarter Queries, and a Sharper Edge</title><link href="2025/08/14/penaltyblog-1.5.0/" rel="alternate"></link><published>2025-08-14T19:30:00+00:00</published><updated>2025-08-14T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2025-08-14:2025/08/14/penaltyblog-1.5.0/</id><summary type="html">&lt;p&gt;v1.5.0 delivers interactive charts, faster models, upgraded football probability grid, and a powerful Flow query language - all designed to make your analysis sharper and quicker...&lt;/p&gt;</summary><content type="html">&lt;h1 id="introduction"&gt;Introduction&lt;/h1&gt;
&lt;p&gt;The latest release of the &lt;code&gt;penaltyblog&lt;/code&gt; Python package brings a major upgrade to your football analytics toolkit.&lt;/p&gt;
&lt;p&gt;Highlights include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Interactive plotting&lt;/strong&gt; for rich, dynamic pitch visualisations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Faster goal-model fitting&lt;/strong&gt; with improved stability.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;More accurate goal-expectancy estimation&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Powerful Flow query tools&lt;/strong&gt; for streamlined data access.&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id="pitch-the-interactive-football-plotting-library"&gt;Pitch - The Interactive Football Plotting Library&lt;/h1&gt;
&lt;p&gt;The standout feature in this release is the &lt;strong&gt;brand-new&lt;/strong&gt; &lt;code&gt;Pitch&lt;/code&gt; &lt;strong&gt;plotting API&lt;/strong&gt; - a fully interactive, Plotly-powered framework for building rich football pitch visualisations in Python.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key capabilities&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Multiple built-in pitch dimensions and themes.&lt;/li&gt;
&lt;li&gt;Horizontal or vertical layouts.&lt;/li&gt;
&lt;li&gt;Flexible view modes for zooming into specific areas.&lt;/li&gt;
&lt;li&gt;Layering of scatter points, heatmaps, kernel-density surfaces, comets, arrows, and more.&lt;/li&gt;
&lt;li&gt;Custom hover tooltips, colour schemes, opacity, and orientation controls.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Display charts inline in Jupyter notebooks.&lt;/li&gt;
&lt;li&gt;Save as static images.&lt;/li&gt;
&lt;li&gt;Export as standalone HTML for embedding anywhere.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In short, it’s a &lt;strong&gt;single, versatile tool&lt;/strong&gt; for producing publication-ready, explorable football data visuals.&lt;/p&gt;
&lt;h2 id="example-shot-map"&gt;Example: Shot Map&lt;/h2&gt;
&lt;p&gt;The example below uses the new &lt;code&gt;Pitch&lt;/code&gt; API to visualise every shot taken by Liverpool in match ID 22912 from StatsBomb.&lt;/p&gt;
&lt;p&gt;Process:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data is pulled directly from StatsBomb via the &lt;code&gt;Flow&lt;/code&gt; API.&lt;/li&gt;
&lt;li&gt;We filter for Liverpool’s shots.&lt;/li&gt;
&lt;li&gt;We render them as an interactive Plotly chart.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each marker shows a shot’s location, with hover tooltips displaying the player’s name and exact coordinates.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog.viz&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Pitch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog.matchflow&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;where_equals&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;get_field&lt;/span&gt;


&lt;span class="c1"&gt;# 1) Pull and prep the data with Flow&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;  &lt;span class="c1"&gt;# stringifier (avoids None issues)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;flow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Flow&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;statsbomb&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;22912&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;optimize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;where_equals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;type.name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Shot&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;where_equals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team.name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Liverpool&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;hover_text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;get_field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;player.name&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: &amp;quot;&lt;/span&gt;
                             &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;get_field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;location.0&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;, &amp;quot;&lt;/span&gt;
                             &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;get_field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;location.1&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;)&amp;quot;&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2) Create the pitch&lt;/span&gt;
&lt;span class="n"&gt;pitch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Pitch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;statsbomb&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;orientation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;horizontal&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;view&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;full&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;theme&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;night&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;show_axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;show_legend&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Liverpool shots (StatsBomb 22912)&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;subtitle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Hover for player + shot location&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3) Plot the shot map&lt;/span&gt;
&lt;span class="n"&gt;pitch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot_scatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;location.0&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;location.1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hover&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;hover_text&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;pitch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div&gt;
        &lt;script type="text/javascript"&gt;window.PlotlyConfig = { MathJaxConfig: 'local' };&lt;/script&gt;
        &lt;script charset="utf-8" src="https://cdn.plot.ly/plotly-3.0.1.min.js"&gt;&lt;/script&gt;
        &lt;div id="aa19b721-31a4-482b-9c34-03fb4c718716" class="plotly-graph-div" style="height:68vh; width:100%;"&gt;&lt;/div&gt;
        &lt;script
                type="text/javascript"&gt;                window.PLOTLYENV = window.PLOTLYENV || {}; if (document.getElementById("aa19b721-31a4-482b-9c34-03fb4c718716")) { Plotly.newPlot("aa19b721-31a4-482b-9c34-03fb4c718716", [{ "hoverinfo": "skip", "line": { "color": "#90caf9" }, "mode": "lines", "showlegend": false, "x": { "dtype": "f8", "bdata": "AAAAAACAL0BorbUxn9gvQIjkujh0FzBAu8wqlWRBMEBuoBemF2owQE3GwqeEkTBANCulHKO3MEDVU0PPatwwQIfq8dPT\u002fzBAzWeKitYhMUCLdw+ga0IxQGPAQBCMYTFASLYdJzF\u002fMUDqJFeCVJsxQIYirxLwtTFAJB9HHf7OMUCAyNs8eeYxQHZ+7mJc\u002fDFA6hfc2KIQMkA1vOBASCMyQASZCJdINDJA8kANMqBDMkATgx\u002fES1EyQBKQnVtIXTJAnEW1Y5NnMkAdffKkKnAyQEA\u002fuUUMdzJAz8GryjZ8MkD8GfwWqX8yQIiRqWxigTJAiJGpbGKBMkD8GfwWqX8yQM\u002fBq8o2fDJAQD+5RQx3MkAdffKkKnAyQJxFtWOTZzJAEpCdW0hdMkATgx\u002fES1EyQPJADTKgQzJABJkIl0g0MkA1vOBASCMyQOoX3NiiEDJAdn7uYlz8MUCByNs8eeYxQCQfRx3+zjFAhiKvEvC1MUDqJFeCVJsxQEi2HScxfzFAY8BAEIxhMUCLdw+ga0IxQM1niorWITFAiOrx09P\u002fMEDVU0PPatwwQDQrpRyjtzBATcbCp4SRMEBuoBemF2owQLzMKpVkQTBAieS6OHQXMEBorbUxn9gvQAAAAAAAgC9A" }, "y": { "dtype": "f8", "bdata": "MvWTMpP0O0CiCQ3fMxw8QHJkvT8ZRTxAgMQQhjpvPECnpGufjpo8QBFoHzcMxzxAO8FsuKn0PEDd5JNQXSM9QO8W8vAcUz1ANx0sUd6DPUDyH2XxlrU9QIl+gRw86D1Aqxt16sIbPkCkoZxCIFA+QIU9Id5IhT5AVE1mSjG7PkBoe4DrzfE+QAq\u002ftf4SKT9AgbgFnfRgP0Dy3Lm9Zpk\u002fQMLk\u002fDhd0j9AynY85eUFQEDS3\u002f0J0yJAQGVSENDvP0BALp9\u002f8jVdQED+Ym8jn3pAQEGSdg0lmEBApKT8VMG1QEC8FZiZbdNAQDXvbXcj8UBAyxCSiNwOQUBD6mdmkixBQFxbA6s+SkFAv22J8tpnQUACnZDcYIVBQNJggA3KokFAm63vLxDAQUAuIAL2LN1BQDaJwxoa+kFAn42BY9EWQkCHESOhTDNCQMAjfbGFT0JAeyClgHZrQkBMwj8KGYdCQFbZzFpnokJAPmHvkFu9QkAur7He79dCQCpyxYoe8kJAu0C\u002f8eELQ0AHcE2HNCVDQGTxadcQPkNAiPSGh3FWQ0CRDbZXUW5DQGKfySOrhUNA+Etw5HmcQ0CsLUqwuLJDQMCd97xiyENAx00hYHPdQ0Ave3kQ5vFDQGcFtma2BURA" }, "type": "scatter" }, { "hoverinfo": "skip", "line": { "color": "#90caf9" }, "mode": "lines", "showlegend": false, "x": { "dtype": "f8", "bdata": 
"AAAAAABQVkBTSskZ7ERWQN5G0fEiOlZA0Uy12qYvVkDlF3oWeiVWQG1OD9aeG1ZAM7XWOBcSVkALKy9M5QhWQF6FAwsLAFZADWZdXYr3VUAdIvwXZe9VQOfP7\u002fuc51VAbpI4tjPgVUDGNmrfKtlVQF83VPuD0lVANziueEDMVUDgDcmwYcZVQGJgROfowFVABfrISde7VUDz0MfvLbdVQL\u002fZPdrtslVAxK988xevVUA7H\u002fgOratVQPubGOmtqFVAma4SJxumVUC5YMNW9aNVQDCwke48olVAjA9VTfKgVUCB+UC6FaBVQJ6b1WSnn1VAnpvVZKefVUCB+UC6FaBVQIwPVU3yoFVAMLCR7jyiVUC5YMNW9aNVQJmuEicbplVA+5sY6a2oVUA7H\u002fgOratVQMSvfPMXr1VAv9k92u2yVUDz0MfvLbdVQAX6yEnXu1VAYmBE5+jAVUDgDcmwYcZVQDc4rnhAzFVAXzdU+4PSVUDFNmrfKtlVQG6SOLYz4FVA58\u002fv+5znVUAdIvwXZe9VQA1mXV2K91VAXoUDCwsAVkALKy9M5QhWQDO11jgXElZAbU4P1p4bVkDkF3oWeiVWQNFMtdqmL1ZA3kbR8SI6VkBTSskZ7ERWQAAAAAAAUFZA" }, "y": { "dtype": "f8", "bdata": "ZwW2ZrYFREAve3kQ5vFDQMdNIWBz3UNAwJ33vGLIQ0CsLUqwuLJDQPdLcOR5nENAYp\u002fJI6uFQ0CSDbZXUW5DQIn0hodxVkNAZfFp1xA+Q0AIcE2HNCVDQLxAv\u002fHhC0NAK3LFih7yQkAur7He79dCQD5h75BbvUJAVtnMWmeiQkBMwj8KGYdCQHsgpYB2a0JAvyN9sYVPQkCHESOhTDNCQJ+NgWPRFkJANonDGhr6QUAuIAL2LN1BQJqt7y8QwEFA0mCADcqiQUACnZDcYIVBQMBtifLaZ0FAXFsDqz5KQUBE6mdmkixBQMsQkojcDkFANe9tdyPxQEC9FZiZbdNAQKSk\u002fFTBtUBAQJJ2DSWYQED+Ym8jn3pAQC6ff\u002fI1XUBAZlIQ0O8\u002fQEDS3\u002f0J0yJAQMp2POXlBUBAwuT8OF3SP0Dz3Lm9Zpk\u002fQIK4BZ30YD9AC7+1\u002fhIpP0Boe4DrzfE+QFRNZkoxuz5AhT0h3kiFPkCkoZxCIFA+QKsbderCGz5AiX6BHDzoPUDyH2XxlrU9QDcdLFHegz1A7hby8BxTPUDc5JNQXSM9QDrBbLip9DxAEWgfNwzHPECnpGufjpo8QIDEEIY6bzxAcmS9PxlFPECiCQ3fMxw8QDP1kzKT9DtA" }, "type": "scatter" }, { "hoverinfo": "skip", "marker": { "color": "#90caf9", "size": 8 }, "mode": "markers", "showlegend": false, "x": { "dtype": "f8", "bdata": "AAAAAAAAJUA=" }, "y": { "dtype": "f8", "bdata": "AAAAAAAAQUA=" }, "type": "scatter" }, { "hoverinfo": "skip", "marker": { "color": "#90caf9", "size": 8 }, "mode": "markers", "showlegend": false, "x": { "dtype": "f8", "bdata": "AAAAAACgV0A=" }, "y": { "dtype": "f8", "bdata": "AAAAAAAAQUA=" }, "type": "scatter" }, { "hoverinfo": "skip", "marker": { "color": "#90caf9", "size": 8 }, "mode": "markers", "showlegend": false, "x": { "dtype": "f8", 
"bdata": "AAAAAABASkA=" }, "y": { "dtype": "f8", "bdata": "AAAAAAAAQUA=" }, "type": "scatter" }, { "hoverinfo": "text", "hovertext": ["Mohamed Salah: (108.2, 40.1)", "Trent Alexander-Arnold: (90.2, 59.3)", "Mohamed Salah: (95.2, 47.2)", "Mohamed Salah: (113.0, 59.5)", "Andrew Robertson: (98.4, 20.4)", "Mohamed Salah: (97.6, 37.4)", "Jordan Brian Henderson: (89.0, 47.9)", "Mohamed Salah: (94.0, 31.3)", "Fábio Henrique Tavares: (88.6, 39.0)", "Mohamed Salah: (110.2, 52.7)", "James Philip Milner: (103.1, 45.9)", "Jordan Brian Henderson: (93.6, 53.4)", "Virgil van Dijk: (106.0, 38.9)", "Divock Okoth Origi: (106.1, 31.6)"], "marker": { "color": "#ffca28", "size": 10 }, "mode": "markers", "showlegend": false, "x": [94.675, 78.925, 83.3, 98.875, 86.10000000000001, 85.39999999999999, 77.875, 82.25, 77.52499999999999, 96.425, 90.21249999999999, 81.89999999999999, 92.75, 92.83749999999999], "y": [33.915, 17.595000000000002, 27.879999999999995, 17.425, 50.66, 36.21, 27.285, 41.395, 34.85, 23.205, 28.985, 22.61, 34.935, 41.14], "type": "scatter" }], { "template": { "data": { "histogram2dcontour": [{ "type": "histogram2dcontour", "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [[0.0, "#0d0887"], [0.1111111111111111, "#46039f"], [0.2222222222222222, "#7201a8"], [0.3333333333333333, "#9c179e"], [0.4444444444444444, "#bd3786"], [0.5555555555555556, "#d8576b"], [0.6666666666666666, "#ed7953"], [0.7777777777777778, "#fb9f3a"], [0.8888888888888888, "#fdca26"], [1.0, "#f0f921"]] }], "choropleth": [{ "type": "choropleth", "colorbar": { "outlinewidth": 0, "ticks": "" } }], "histogram2d": [{ "type": "histogram2d", "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [[0.0, "#0d0887"], [0.1111111111111111, "#46039f"], [0.2222222222222222, "#7201a8"], [0.3333333333333333, "#9c179e"], [0.4444444444444444, "#bd3786"], [0.5555555555555556, "#d8576b"], [0.6666666666666666, "#ed7953"], [0.7777777777777778, "#fb9f3a"], [0.8888888888888888, "#fdca26"], [1.0, "#f0f921"]] }], 
"heatmap": [{ "type": "heatmap", "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [[0.0, "#0d0887"], [0.1111111111111111, "#46039f"], [0.2222222222222222, "#7201a8"], [0.3333333333333333, "#9c179e"], [0.4444444444444444, "#bd3786"], [0.5555555555555556, "#d8576b"], [0.6666666666666666, "#ed7953"], [0.7777777777777778, "#fb9f3a"], [0.8888888888888888, "#fdca26"], [1.0, "#f0f921"]] }], "contourcarpet": [{ "type": "contourcarpet", "colorbar": { "outlinewidth": 0, "ticks": "" } }], "contour": [{ "type": "contour", "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [[0.0, "#0d0887"], [0.1111111111111111, "#46039f"], [0.2222222222222222, "#7201a8"], [0.3333333333333333, "#9c179e"], [0.4444444444444444, "#bd3786"], [0.5555555555555556, "#d8576b"], [0.6666666666666666, "#ed7953"], [0.7777777777777778, "#fb9f3a"], [0.8888888888888888, "#fdca26"], [1.0, "#f0f921"]] }], "surface": [{ "type": "surface", "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [[0.0, "#0d0887"], [0.1111111111111111, "#46039f"], [0.2222222222222222, "#7201a8"], [0.3333333333333333, "#9c179e"], [0.4444444444444444, "#bd3786"], [0.5555555555555556, "#d8576b"], [0.6666666666666666, "#ed7953"], [0.7777777777777778, "#fb9f3a"], [0.8888888888888888, "#fdca26"], [1.0, "#f0f921"]] }], "mesh3d": [{ "type": "mesh3d", "colorbar": { "outlinewidth": 0, "ticks": "" } }], "scatter": [{ "fillpattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 }, "type": "scatter" }], "parcoords": [{ "type": "parcoords", "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } } }], "scatterpolargl": [{ "type": "scatterpolargl", "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } } }], "bar": [{ "error_x": { "color": "#2a3f5f" }, "error_y": { "color": "#2a3f5f" }, "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "bar" }], "scattergeo": [{ "type": "scattergeo", "marker": { "colorbar": { 
"outlinewidth": 0, "ticks": "" } } }], "scatterpolar": [{ "type": "scatterpolar", "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } } }], "histogram": [{ "marker": { "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "histogram" }], "scattergl": [{ "type": "scattergl", "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } } }], "scatter3d": [{ "type": "scatter3d", "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } } }], "scattermap": [{ "type": "scattermap", "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } } }], "scattermapbox": [{ "type": "scattermapbox", "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } } }], "scatterternary": [{ "type": "scatterternary", "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } } }], "scattercarpet": [{ "type": "scattercarpet", "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } } }], "carpet": [{ "aaxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "baxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "type": "carpet" }], "table": [{ "cells": { "fill": { "color": "#EBF0F8" }, "line": { "color": "white" } }, "header": { "fill": { "color": "#C8D4E3" }, "line": { "color": "white" } }, "type": "table" }], "barpolar": [{ "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "barpolar" }], "pie": [{ "automargin": true, "type": "pie" }] }, "layout": { "autotypenumbers": "strict", "colorway": ["#636efa", "#EF553B", "#00cc96", "#ab63fa", "#FFA15A", "#19d3f3", "#FF6692", "#B6E880", "#FF97FF", "#FECB52"], "font": { "color": "#2a3f5f" }, "hovermode": "closest", "hoverlabel": { "align": "left" }, "paper_bgcolor": "white", "plot_bgcolor": "#E5ECF6", "polar": { 
"bgcolor": "#E5ECF6", "angularaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "radialaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "ternary": { "bgcolor": "#E5ECF6", "aaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "baxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "caxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "coloraxis": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "colorscale": { "sequential": [[0.0, "#0d0887"], [0.1111111111111111, "#46039f"], [0.2222222222222222, "#7201a8"], [0.3333333333333333, "#9c179e"], [0.4444444444444444, "#bd3786"], [0.5555555555555556, "#d8576b"], [0.6666666666666666, "#ed7953"], [0.7777777777777778, "#fb9f3a"], [0.8888888888888888, "#fdca26"], [1.0, "#f0f921"]], "sequentialminus": [[0.0, "#0d0887"], [0.1111111111111111, "#46039f"], [0.2222222222222222, "#7201a8"], [0.3333333333333333, "#9c179e"], [0.4444444444444444, "#bd3786"], [0.5555555555555556, "#d8576b"], [0.6666666666666666, "#ed7953"], [0.7777777777777778, "#fb9f3a"], [0.8888888888888888, "#fdca26"], [1.0, "#f0f921"]], "diverging": [[0, "#8e0152"], [0.1, "#c51b7d"], [0.2, "#de77ae"], [0.3, "#f1b6da"], [0.4, "#fde0ef"], [0.5, "#f7f7f7"], [0.6, "#e6f5d0"], [0.7, "#b8e186"], [0.8, "#7fbc41"], [0.9, "#4d9221"], [1, "#276419"]] }, "xaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "automargin": true, "zerolinewidth": 2 }, "yaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "automargin": true, "zerolinewidth": 2 }, "scene": { "xaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white", "gridwidth": 2 }, "yaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": 
"white", "gridwidth": 2 }, "zaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white", "gridwidth": 2 } }, "shapedefaults": { "line": { "color": "#2a3f5f" } }, "annotationdefaults": { "arrowcolor": "#2a3f5f", "arrowhead": 0, "arrowwidth": 1 }, "geo": { "bgcolor": "white", "landcolor": "#E5ECF6", "subunitcolor": "white", "showland": true, "showlakes": true, "lakecolor": "white" }, "title": { "x": 0.05 }, "mapbox": { "style": "light" } } }, "margin": { "l": 0, "r": 0, "t": 80, "b": 0 }, "font": { "family": "Helvetica Neue, Arial, sans-serif", "color": "#90caf9" }, "title": { "text": "\u003cb\u003eLiverpool shots (StatsBomb 22912)\u003c\u002fb\u003e", "x": 0.5, "xanchor": "center" }, "hoverlabel": { "font": { "family": "Helvetica Neue, Arial, sans-serif", "size": 16, "color": "white" }, "bgcolor": "rgba(255,255,255,0.15)", "bordercolor": "rgba(144,202,249,0.6)" }, "shapes": [{ "line": { "color": "#90caf9" }, "type": "rect", "x0": 0, "x1": 105.0, "y0": 0, "y1": 68.0 }, { "line": { "color": "#90caf9" }, "type": "rect", "x0": 52.5, "x1": 52.5, "y0": 68.0, "y1": 0.0 }, { "line": { "color": "#90caf9" }, "type": "rect", "x0": 0.0, "x1": 15.75, "y0": 52.699999999999996, "y1": 15.299999999999999 }, { "line": { "color": "#90caf9" }, "type": "rect", "x0": 89.25, "x1": 105.0, "y0": 52.699999999999996, "y1": 15.299999999999999 }, { "line": { "color": "#90caf9" }, "type": "rect", "x0": 0.0, "x1": 5.25, "y0": 42.5, "y1": 25.5 }, { "line": { "color": "#90caf9" }, "type": "rect", "x0": 99.75, "x1": 105.0, "y0": 42.5, "y1": 25.5 }, { "line": { "color": "#90caf9" }, "type": "circle", "x0": 44.49375, "x1": 60.50625, "y0": 25.99375, "y1": 42.00625 }], "plot_bgcolor": "#0f1e2e", "paper_bgcolor": "#0f1e2e", "showlegend": false, "xaxis": { "range": [-5, 110.0], "scaleanchor": "y", "constrain": "domain", "fixedrange": true, "showgrid": false, "zeroline": false, "showticklabels": false, "visible": false }, 
"yaxis": { "range": [-5, 73.0], "constrain": "domain", "fixedrange": true, "showgrid": false, "zeroline": false, "showticklabels": false, "visible": false }, "annotations": [{ "font": { "color": "#90caf9", "family": "Helvetica Neue, Arial, sans-serif", "size": 16 }, "showarrow": false, "text": "Hover for player + shot location", "x": 0.5, "xanchor": "center", "xref": "paper", "y": 1.04, "yanchor": "top", "yref": "paper" }] }, { "responsive": true, "displayModeBar": false, "displaylogo": false }) };            &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;&lt;br/&gt;&lt;/p&gt;
&lt;h2 id="example-layering-plots"&gt;Example: Layering Plots&lt;/h2&gt;
&lt;p&gt;One of the most powerful features of the new &lt;code&gt;Pitch&lt;/code&gt; API is &lt;strong&gt;layering&lt;/strong&gt; - the ability to stack multiple plot types in a single interactive view.&lt;/p&gt;
&lt;p&gt;You can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Start with a clean base pitch.&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;scatter plots&lt;/strong&gt; for individual events.&lt;/li&gt;
&lt;li&gt;Overlay &lt;strong&gt;heatmaps&lt;/strong&gt; or &lt;strong&gt;kernel density estimates&lt;/strong&gt; for spatial intensity.&lt;/li&gt;
&lt;li&gt;Draw &lt;strong&gt;arrows&lt;/strong&gt; to indicate passes or runs.&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;“comet” trails&lt;/strong&gt; to visualise player or ball movement.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each layer:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Is fully customisable in colour, opacity, and tooltip content.&lt;/li&gt;
&lt;li&gt;Can be shown, hidden, or reordered at any time.&lt;/li&gt;
&lt;li&gt;Combines with the other layers to create rich, multi-dimensional visualisations.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# Flow and its helpers; the penaltyblog.matchflow import path is assumed here&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog.matchflow&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;get_field&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;where_equals&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog.viz&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Pitch&lt;/span&gt;

&lt;span class="c1"&gt;# 1) Pull and prep the data with Flow&lt;/span&gt;
&lt;span class="n"&gt;flow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Flow&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;statsbomb&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;22912&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;optimize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;where_equals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;type.name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Pass&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;where_equals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team.name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Liverpool&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hover_text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;get_field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;player.name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;: &amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2) Create the pitch&lt;/span&gt;
&lt;span class="n"&gt;pitch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Pitch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;statsbomb&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;orientation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;horizontal&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;view&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;full&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;theme&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;night&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;show_axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;show_legend&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Liverpool Passes (StatsBomb 22912)&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;subtitle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Hover for player + location&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3) Plot a smooth KDE (continuous) with fine grid&lt;/span&gt;
&lt;span class="n"&gt;pitch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot_heatmap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;location.0&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;location.1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;show_colorbar&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;colorscale&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Viridis&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;opacity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4) Overlay the raw points &lt;/span&gt;
&lt;span class="n"&gt;pitch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot_scatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;location.0&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;location.1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    
    &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;white&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;hover&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;hover_text&amp;quot;&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 5) Show the result&lt;/span&gt;
&lt;span class="n"&gt;pitch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
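&lt;p&gt;Judging by the figure data embedded in this post, the pitch is drawn on a 105 x 68 grid, StatsBomb locations (120 x 80, with y measured from the opposite touchline) are rescaled onto it, and the heatmap layer counts passes in cells of 10.5 x 8.5. The sketch below reproduces that arithmetic in plain Python. It is an illustration under those assumptions, not penaltyblog's actual implementation, and the helper names &lt;code&gt;to_pitch&lt;/code&gt;, &lt;code&gt;bin_index&lt;/code&gt; and &lt;code&gt;count_bins&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
# Sketch only: constants are inferred from the embedded Plotly data,
# and the helper names are hypothetical (not part of penaltyblog).
SB_LENGTH, SB_WIDTH = 120.0, 80.0        # StatsBomb coordinate space
PITCH_LENGTH, PITCH_WIDTH = 105.0, 68.0  # plotted pitch
BIN_X, BIN_Y = 10.5, 8.5                 # heatmap cell size


def to_pitch(x, y):
    """Rescale a StatsBomb location and flip y so the origin is bottom-left."""
    px = x * PITCH_LENGTH / SB_LENGTH
    py = (SB_WIDTH - y) * PITCH_WIDTH / SB_WIDTH
    return px, py


def bin_index(px, py):
    """Return the (column, row) heatmap cell containing a pitch point."""
    n_cols = round(PITCH_LENGTH / BIN_X)
    n_rows = round(PITCH_WIDTH / BIN_Y)
    col = min(int(px // BIN_X), n_cols - 1)  # clamp points on the far edge
    row = min(int(py // BIN_Y), n_rows - 1)
    return col, row


def count_bins(locations):
    """Count events per heatmap cell for an iterable of StatsBomb (x, y) pairs."""
    counts = {}
    for x, y in locations:
        cell = bin_index(*to_pitch(x, y))
        counts[cell] = counts.get(cell, 0) + 1
    return counts
```

&lt;p&gt;For example, &lt;code&gt;to_pitch(108.2, 40.1)&lt;/code&gt; lands at roughly (94.675, 33.915), which matches where that Salah shot is plotted in the earlier shot map. The scatter layer keeps every individual point, while the heatmap layer only needs the per-cell counts, which is why the two stack cleanly on the same pitch.&lt;/p&gt;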

&lt;div&gt;
        &lt;script type="text/javascript"&gt;window.PlotlyConfig = { MathJaxConfig: 'local' };&lt;/script&gt;
        &lt;script charset="utf-8" src="https://cdn.plot.ly/plotly-3.0.1.min.js"&gt;&lt;/script&gt;
        &lt;div id="75a6df89-a4d1-4330-a5fe-3b4576c19c51" class="plotly-graph-div" style="height:68vh; width:100%;"&gt;&lt;/div&gt;
        &lt;script
                type="text/javascript"&gt;                window.PLOTLYENV = window.PLOTLYENV || {}; if (document.getElementById("75a6df89-a4d1-4330-a5fe-3b4576c19c51")) { Plotly.newPlot("75a6df89-a4d1-4330-a5fe-3b4576c19c51", [{ "hoverinfo": "skip", "line": { "color": "#90caf9" }, "mode": "lines", "showlegend": false, "x": { "dtype": "f8", "bdata": "AAAAAACAL0BorbUxn9gvQIjkujh0FzBAu8wqlWRBMEBuoBemF2owQE3GwqeEkTBANCulHKO3MEDVU0PPatwwQIfq8dPT\u002fzBAzWeKitYhMUCLdw+ga0IxQGPAQBCMYTFASLYdJzF\u002fMUDqJFeCVJsxQIYirxLwtTFAJB9HHf7OMUCAyNs8eeYxQHZ+7mJc\u002fDFA6hfc2KIQMkA1vOBASCMyQASZCJdINDJA8kANMqBDMkATgx\u002fES1EyQBKQnVtIXTJAnEW1Y5NnMkAdffKkKnAyQEA\u002fuUUMdzJAz8GryjZ8MkD8GfwWqX8yQIiRqWxigTJAiJGpbGKBMkD8GfwWqX8yQM\u002fBq8o2fDJAQD+5RQx3MkAdffKkKnAyQJxFtWOTZzJAEpCdW0hdMkATgx\u002fES1EyQPJADTKgQzJABJkIl0g0MkA1vOBASCMyQOoX3NiiEDJAdn7uYlz8MUCByNs8eeYxQCQfRx3+zjFAhiKvEvC1MUDqJFeCVJsxQEi2HScxfzFAY8BAEIxhMUCLdw+ga0IxQM1niorWITFAiOrx09P\u002fMEDVU0PPatwwQDQrpRyjtzBATcbCp4SRMEBuoBemF2owQLzMKpVkQTBAieS6OHQXMEBorbUxn9gvQAAAAAAAgC9A" }, "y": { "dtype": "f8", "bdata": "MvWTMpP0O0CiCQ3fMxw8QHJkvT8ZRTxAgMQQhjpvPECnpGufjpo8QBFoHzcMxzxAO8FsuKn0PEDd5JNQXSM9QO8W8vAcUz1ANx0sUd6DPUDyH2XxlrU9QIl+gRw86D1Aqxt16sIbPkCkoZxCIFA+QIU9Id5IhT5AVE1mSjG7PkBoe4DrzfE+QAq\u002ftf4SKT9AgbgFnfRgP0Dy3Lm9Zpk\u002fQMLk\u002fDhd0j9AynY85eUFQEDS3\u002f0J0yJAQGVSENDvP0BALp9\u002f8jVdQED+Ym8jn3pAQEGSdg0lmEBApKT8VMG1QEC8FZiZbdNAQDXvbXcj8UBAyxCSiNwOQUBD6mdmkixBQFxbA6s+SkFAv22J8tpnQUACnZDcYIVBQNJggA3KokFAm63vLxDAQUAuIAL2LN1BQDaJwxoa+kFAn42BY9EWQkCHESOhTDNCQMAjfbGFT0JAeyClgHZrQkBMwj8KGYdCQFbZzFpnokJAPmHvkFu9QkAur7He79dCQCpyxYoe8kJAu0C\u002f8eELQ0AHcE2HNCVDQGTxadcQPkNAiPSGh3FWQ0CRDbZXUW5DQGKfySOrhUNA+Etw5HmcQ0CsLUqwuLJDQMCd97xiyENAx00hYHPdQ0Ave3kQ5vFDQGcFtma2BURA" }, "type": "scatter" }, { "hoverinfo": "skip", "line": { "color": "#90caf9" }, "mode": "lines", "showlegend": false, "x": { "dtype": "f8", "bdata": 
"AAAAAABQVkBTSskZ7ERWQN5G0fEiOlZA0Uy12qYvVkDlF3oWeiVWQG1OD9aeG1ZAM7XWOBcSVkALKy9M5QhWQF6FAwsLAFZADWZdXYr3VUAdIvwXZe9VQOfP7\u002fuc51VAbpI4tjPgVUDGNmrfKtlVQF83VPuD0lVANziueEDMVUDgDcmwYcZVQGJgROfowFVABfrISde7VUDz0MfvLbdVQL\u002fZPdrtslVAxK988xevVUA7H\u002fgOratVQPubGOmtqFVAma4SJxumVUC5YMNW9aNVQDCwke48olVAjA9VTfKgVUCB+UC6FaBVQJ6b1WSnn1VAnpvVZKefVUCB+UC6FaBVQIwPVU3yoFVAMLCR7jyiVUC5YMNW9aNVQJmuEicbplVA+5sY6a2oVUA7H\u002fgOratVQMSvfPMXr1VAv9k92u2yVUDz0MfvLbdVQAX6yEnXu1VAYmBE5+jAVUDgDcmwYcZVQDc4rnhAzFVAXzdU+4PSVUDFNmrfKtlVQG6SOLYz4FVA58\u002fv+5znVUAdIvwXZe9VQA1mXV2K91VAXoUDCwsAVkALKy9M5QhWQDO11jgXElZAbU4P1p4bVkDkF3oWeiVWQNFMtdqmL1ZA3kbR8SI6VkBTSskZ7ERWQAAAAAAAUFZA" }, "y": { "dtype": "f8", "bdata": "ZwW2ZrYFREAve3kQ5vFDQMdNIWBz3UNAwJ33vGLIQ0CsLUqwuLJDQPdLcOR5nENAYp\u002fJI6uFQ0CSDbZXUW5DQIn0hodxVkNAZfFp1xA+Q0AIcE2HNCVDQLxAv\u002fHhC0NAK3LFih7yQkAur7He79dCQD5h75BbvUJAVtnMWmeiQkBMwj8KGYdCQHsgpYB2a0JAvyN9sYVPQkCHESOhTDNCQJ+NgWPRFkJANonDGhr6QUAuIAL2LN1BQJqt7y8QwEFA0mCADcqiQUACnZDcYIVBQMBtifLaZ0FAXFsDqz5KQUBE6mdmkixBQMsQkojcDkFANe9tdyPxQEC9FZiZbdNAQKSk\u002fFTBtUBAQJJ2DSWYQED+Ym8jn3pAQC6ff\u002fI1XUBAZlIQ0O8\u002fQEDS3\u002f0J0yJAQMp2POXlBUBAwuT8OF3SP0Dz3Lm9Zpk\u002fQIK4BZ30YD9AC7+1\u002fhIpP0Boe4DrzfE+QFRNZkoxuz5AhT0h3kiFPkCkoZxCIFA+QKsbderCGz5AiX6BHDzoPUDyH2XxlrU9QDcdLFHegz1A7hby8BxTPUDc5JNQXSM9QDrBbLip9DxAEWgfNwzHPECnpGufjpo8QIDEEIY6bzxAcmS9PxlFPECiCQ3fMxw8QDP1kzKT9DtA" }, "type": "scatter" }, { "hoverinfo": "skip", "marker": { "color": "#90caf9", "size": 8 }, "mode": "markers", "showlegend": false, "x": { "dtype": "f8", "bdata": "AAAAAAAAJUA=" }, "y": { "dtype": "f8", "bdata": "AAAAAAAAQUA=" }, "type": "scatter" }, { "hoverinfo": "skip", "marker": { "color": "#90caf9", "size": 8 }, "mode": "markers", "showlegend": false, "x": { "dtype": "f8", "bdata": "AAAAAACgV0A=" }, "y": { "dtype": "f8", "bdata": "AAAAAAAAQUA=" }, "type": "scatter" }, { "hoverinfo": "skip", "marker": { "color": "#90caf9", "size": 8 }, "mode": "markers", "showlegend": false, "x": { "dtype": "f8", 
"bdata": "AAAAAABASkA=" }, "y": { "dtype": "f8", "bdata": "AAAAAAAAQUA=" }, "type": "scatter" }, { "colorbar": { "len": 0.6782608695652174, "lenmode": "fraction", "y": 0.5, "yanchor": "middle" }, "colorscale": [[0.0, "#440154"], [0.1111111111111111, "#482878"], [0.2222222222222222, "#3e4989"], [0.3333333333333333, "#31688e"], [0.4444444444444444, "#26828e"], [0.5555555555555556, "#1f9e89"], [0.6666666666666666, "#35b779"], [0.7777777777777778, "#6ece58"], [0.8888888888888888, "#b5de2b"], [1.0, "#fde725"]], "hovertemplate": "x: %{x:.1f}\u003cbr\u003ey: %{y:.1f}\u003cbr\u003ez: %{z}\u003cextra\u003e\u003c\u002fextra\u003e", "opacity": 0.75, "showscale": true, "x": [53.375, 31.587500000000002, 57.39999999999999, 65.8, 52.0625, 56.612500000000004, 56.262499999999996, 92.83749999999999, 12.6875, 24.675, 39.4625, 38.237500000000004, 43.050000000000004, 36.4, 27.3875, 38.0625, 43.75, 54.425000000000004, 50.75, 74.6375, 70.96249999999999, 80.325, 91.96249999999999, 72.97500000000001, 1.8375000000000001, 8.6625, 1.8375000000000001, 20.212500000000002, 29.574999999999996, 6.125, 6.825, 7.174999999999999, 31.0625, 11.725, 21.175, 105.0, 36.225, 13.2125, 27.912499999999998, 27.212500000000002, 56.262499999999996, 50.050000000000004, 22.224999999999998, 25.724999999999998, 22.837500000000002, 36.050000000000004, 37.362500000000004, 6.125, 12.512500000000001, 9.362499999999999, 21.0, 14.349999999999998, 12.6875, 35.525, 20.5625, 25.724999999999998, 24.849999999999998, 36.574999999999996, 67.46249999999999, 63.96249999999999, 67.46249999999999, 48.9125, 56.78750000000001, 43.050000000000004, 10.85, 19.162499999999998, 36.4, 38.0625, 45.9375, 50.050000000000004, 30.537499999999998, 23.0125, 7.3500000000000005, 10.325000000000001, 31.237500000000004, 57.39999999999999, 60.637499999999996, 61.25, 51.1, 24.849999999999998, 13.5625, 30.887499999999996, 56.08749999999999, 79.97500000000001, 81.46249999999999, 105.0, 80.325, 91.6125, 74.9875, 68.77499999999999, 80.14999999999999, 
86.27499999999999, 95.46249999999999, 88.9875, 98.7, 102.02499999999999, 80.5, 76.825, 81.1125, 103.16250000000001, 105.0, 102.2, 80.2375, 31.237500000000004, 47.6, 43.574999999999996, 43.050000000000004, 18.900000000000002, 43.225, 68.075, 72.45, 71.3125, 68.77499999999999, 41.5625, 20.212500000000002, 27.5625, 56.08749999999999, 61.949999999999996, 73.9375, 68.25, 69.125, 32.025, 36.4, 26.5125, 24.849999999999998, 11.200000000000001, 100.8, 12.512500000000001, 61.425000000000004, 68.77499999999999, 96.33749999999999, 88.1125, 89.16250000000001, 30.887499999999996, 10.5, 19.6875, 105.0, 39.375, 66.58749999999999, 60.637499999999996, 91.78750000000001, 72.10000000000001, 86.0125, 94.14999999999999, 79.625, 86.0125, 88.28750000000001, 91.96249999999999, 97.64999999999999, 105.0, 105.0, 67.1125, 50.050000000000004, 49.7875, 50.925000000000004, 43.75, 42.875, 73.58749999999999, 25.724999999999998, 24.2375, 8.8375, 36.574999999999996, 38.5875, 20.7375, 19.6875, 9.1875, 18.900000000000002, 7.5249999999999995, 35.699999999999996, 46.725, 6.125, 63.612500000000004, 31.587500000000002, 31.412499999999998, 1.6624999999999999, 25.287499999999998, 48.125, 57.137499999999996, 60.28750000000001, 19.5125, 32.199999999999996, 42.7875, 55.2125, 36.4, 37.275, 38.15, 46.199999999999996, 55.824999999999996, 61.77499999999999, 72.71249999999999, 87.14999999999999, 72.53750000000001, 12.775, 22.3125, 36.050000000000004, 50.8375, 49.35, 50.225, 44.5375, 2.625, 14.4375, 8.575000000000001, 36.8375, 42.7875, 35.7875, 74.89999999999999, 102.375, 97.47500000000001, 91.175, 76.125, 29.3125, 47.6, 79.71249999999999, 80.325, 73.41250000000001, 85.39999999999999, 15.924999999999999, 6.0375000000000005, 29.487500000000004, 42.7875, 53.112500000000004, 5.25, 11.549999999999999, 3.15, 44.1, 65.1875, 29.487500000000004, 36.487500000000004, 35.612500000000004, 60.28750000000001, 66.2375, 5.25, 61.77499999999999, 82.075, 95.2, 55.475, 36.487500000000004, 38.762499999999996, 59.41250000000001, 
47.425000000000004, 35.875, 43.8375, 62.64999999999999, 69.64999999999999, 11.725, 3.6750000000000003, 25.1125, 41.9125, 47.1625, 36.6625, 10.0625, 23.537499999999998, 37.449999999999996, 36.487500000000004, 60.987500000000004, 56.52499999999999, 83.5625, 105.0, 91.96249999999999, 95.8125, 91.52499999999999, 30.275000000000002, 48.9125, 83.91250000000001, 93.0125, 10.0625, 25.025000000000002, 6.0375000000000005, 102.1125, 71.75, 82.425, 48.9125, 11.375, 6.3, 28.875, 46.637499999999996, 60.987500000000004, 66.2375, 73.2375, 86.53750000000001, 7.7875000000000005, 43.050000000000004, 24.412499999999998, 35.612500000000004, 42.0, 37.5375, 11.375, 49.35, 55.300000000000004, 60.987500000000004, 5.25, 23.537499999999998, 60.987500000000004, 11.2875, 37.7125, 51.887499999999996, 48.475, 60.550000000000004, 7.7875000000000005, 38.324999999999996, 105.0, 80.41250000000001, 57.137499999999996, 3.6750000000000003, 63.262499999999996, 10.85, 74.6375, 105.0, 91.175, 5.25, 60.112500000000004, 70.4375, 47.6, 51.275, 38.762499999999996, 14.4375, 2.625, 22.05, 12.6, 13.0375, 35.7875, 55.824999999999996, 85.39999999999999, 81.8125, 34.737500000000004, 49.35], "xbins": { "end": 105.0, "size": 10.5, "start": 0 }, "y": [33.915, 30.599999999999998, 55.92999999999999, 51.51, 51.339999999999996, 52.36, 47.345, 48.11, 61.455, 43.775, 44.71, 67.915, 64.345, 64.345, 29.41, 12.154999999999998, 23.970000000000002, 18.445, 48.279999999999994, 24.904999999999998, 25.245, 12.75, 19.889999999999997, 0.0, 11.645000000000001, 11.815000000000005, 23.8, 0.0, 34.0, 37.315, 55.92999999999999, 37.4, 33.235, 31.62, 42.415, 67.915, 40.375, 30.599999999999998, 9.690000000000005, 3.2299999999999973, 10.370000000000003, 22.354999999999997, 67.915, 43.265, 65.28, 43.095, 50.915, 30.514999999999997, 56.865, 36.125, 24.310000000000002, 17.935000000000002, 48.449999999999996, 61.625, 43.775, 24.139999999999997, 35.19, 14.875, 67.915, 43.095, 37.23, 1.784999999999995, 2.7200000000000024, 10.795000000000002, 
28.220000000000002, 41.989999999999995, 64.515, 22.185000000000002, 25.075, 31.279999999999998, 43.605, 54.74, 29.325, 31.62, 53.805, 53.464999999999996, 50.915, 57.035, 63.24, 41.65, 31.279999999999998, 42.754999999999995, 15.895000000000001, 11.134999999999994, 23.46, 67.915, 0.0, 50.065, 60.095, 61.879999999999995, 54.48499999999999, 50.574999999999996, 58.309999999999995, 64.515, 0.0, 10.029999999999998, 16.15, 15.045000000000002, 49.3, 50.574999999999996, 67.915, 47.515, 50.404999999999994, 40.8, 67.915, 31.11, 13.939999999999998, 19.889999999999997, 0.9349999999999952, 35.614999999999995, 32.045, 34.68, 54.145, 22.354999999999997, 44.54, 46.154999999999994, 7.904999999999998, 1.784999999999995, 4.334999999999995, 13.770000000000001, 20.229999999999997, 0.0, 9.179999999999998, 2.209999999999995, 67.915, 57.035, 2.7200000000000024, 29.495, 1.4450000000000023, 3.059999999999995, 1.4450000000000023, 9.35, 1.6150000000000049, 54.315, 40.375, 26.264999999999997, 0.0, 29.325, 19.720000000000002, 17.51, 62.22, 62.9, 47.94, 55.42, 67.915, 58.309999999999995, 23.8, 18.955, 8.245000000000003, 0.0, 67.915, 67.915, 47.345, 28.985, 11.475, 25.075, 42.16, 3.4, 30.770000000000003, 13.770000000000001, 28.220000000000002, 58.05499999999999, 43.435, 41.65, 19.125, 30.770000000000003, 49.3, 32.555, 7.904999999999998, 33.235, 30.514999999999997, 45.05, 31.45, 41.14, 3.2299999999999973, 67.915, 64.25999999999999, 61.625, 56.695, 66.215, 67.915, 67.915, 65.11, 54.91, 63.495, 55.75999999999999, 67.915, 19.04, 0.0, 0.0, 6.034999999999995, 8.159999999999995, 28.474999999999998, 43.86, 1.784999999999995, 18.020000000000003, 20.825, 13.854999999999997, 6.120000000000002, 56.1, 61.795, 57.8, 67.915, 65.96, 65.11, 3.5700000000000025, 58.14, 48.279999999999994, 56.525, 53.04, 45.475, 38.42, 26.435000000000002, 17.764999999999997, 26.945, 53.04, 29.495, 2.125, 29.495, 21.675, 37.654999999999994, 30.599999999999998, 55.25, 39.949999999999996, 12.070000000000002, 6.034999999999995, 45.645, 
64.515, 47.089999999999996, 43.435, 26.775, 37.4, 0.0, 0.0, 21.25, 18.36, 30.514999999999997, 54.23, 65.535, 59.584999999999994, 46.239999999999995, 12.579999999999997, 24.564999999999998, 17.34, 19.55, 30.429999999999996, 20.060000000000002, 25.585, 20.995, 57.715, 38.675, 57.715, 50.32, 65.535, 48.96, 36.89, 18.529999999999998, 0.0, 0.0, 0.0, 2.125, 67.915, 64.68499999999999, 22.099999999999998, 29.07, 26.349999999999998, 41.989999999999995, 32.98, 20.229999999999997, 67.915, 59.584999999999994, 38.42, 20.995, 35.955, 43.605, 65.36500000000001, 64.25999999999999, 9.095000000000002, 45.9, 22.014999999999997, 19.04, 63.665000000000006, 48.875, 39.949999999999996, 54.315, 61.795, 23.46, 3.2299999999999973, 1.784999999999995, 0.0, 30.599999999999998, 37.23, 28.05, 37.654999999999994, 65.19500000000001, 62.05, 65.705, 55.504999999999995, 23.885, 2.55, 0.0, 30.26, 38.25, 27.795, 62.9, 41.565, 63.07, 0.0, 36.635, 30.599999999999998, 40.97, 46.324999999999996, 21.25, 38.504999999999995, 22.695, 34.0, 18.955, 29.41, 37.654999999999994, 37.23, 22.525, 21.504999999999995, 15.895000000000001, 19.804999999999996, 23.885, 17.595000000000002], "ybins": { "end": 68.0, "size": 8.5, "start": 0 }, "type": "histogram2d" }, { "hoverinfo": "text", "hovertext": ["Jordan Brian Henderson", "Joël Andre Job Matip", "Fábio Henrique Tavares", "Jordan Brian Henderson", "Virgil van Dijk", "Georginio Wijnaldum", "Jordan Brian Henderson", "Sadio Mané", "Andrew Robertson", "Fábio Henrique Tavares", "Fábio Henrique Tavares", "Andrew Robertson", "Sadio Mané", "Andrew Robertson", "Joël Andre Job Matip", "Trent Alexander-Arnold", "Fábio Henrique Tavares", "Jordan Brian Henderson", "Fábio Henrique Tavares", "Sadio Mané", "Jordan Brian Henderson", "Sadio Mané", "Mohamed Salah", "Trent Alexander-Arnold", "Trent Alexander-Arnold", "Trent Alexander-Arnold", "Alisson Ramsés Becker", "Trent Alexander-Arnold", "Andrew Robertson", "Alisson Ramsés Becker", "Virgil van Dijk", "Alisson Ramsés Becker", "Fábio 
Henrique Tavares", "Alisson Ramsés Becker", "Virgil van Dijk", "Trent Alexander-Arnold", "Andrew Robertson", "Alisson Ramsés Becker", "Joël Andre Job Matip", "Andrew Robertson", "Trent Alexander-Arnold", "Virgil van Dijk", "Andrew Robertson", "Virgil van Dijk", "Andrew Robertson", "Andrew Robertson", "Andrew Robertson", "Alisson Ramsés Becker", "Virgil van Dijk", "Alisson Ramsés Becker", "Fábio Henrique Tavares", "Joël Andre Job Matip", "Virgil van Dijk", "Andrew Robertson", "Virgil van Dijk", "Joël Andre Job Matip", "Virgil van Dijk", "Joël Andre Job Matip", "Andrew Robertson", "Fábio Henrique Tavares", "Jordan Brian Henderson", "Trent Alexander-Arnold", "Jordan Brian Henderson", "Joël Andre Job Matip", "Alisson Ramsés Becker", "Virgil van Dijk", "Andrew Robertson", "Joël Andre Job Matip", "Fábio Henrique Tavares", "Jordan Brian Henderson", "Fábio Henrique Tavares", "Trent Alexander-Arnold", "Jordan Brian Henderson", "Alisson Ramsés Becker", "Andrew Robertson", "Georginio Wijnaldum", "Sadio Mané", "Georginio Wijnaldum", "Andrew Robertson", "Virgil van Dijk", "Alisson Ramsés Becker", "Virgil van Dijk", "Jordan Brian Henderson", "Mohamed Salah", "Roberto Firmino Barbosa de Oliveira", "Trent Alexander-Arnold", "Trent Alexander-Arnold", "Roberto Firmino Barbosa de Oliveira", "Sadio Mané", "Andrew Robertson", "Roberto Firmino Barbosa de Oliveira", "Sadio Mané", "Georginio Wijnaldum", "Andrew Robertson", "Trent Alexander-Arnold", "Jordan Brian Henderson", "Georginio Wijnaldum", "Fábio Henrique Tavares", "Andrew Robertson", "Sadio Mané", "Trent Alexander-Arnold", "Jordan Brian Henderson", "Andrew Robertson", "Virgil van Dijk", "Andrew Robertson", "Trent Alexander-Arnold", "Trent Alexander-Arnold", "Joël Andre Job Matip", "Trent Alexander-Arnold", "Trent Alexander-Arnold", "Mohamed Salah", "Trent Alexander-Arnold", "Andrew Robertson", "Joël Andre Job Matip", "Virgil van Dijk", "Virgil van Dijk", "Trent Alexander-Arnold", "Trent Alexander-Arnold", "Mohamed Salah", "Jordan 
Brian Henderson", "Fábio Henrique Tavares", "Trent Alexander-Arnold", "Georginio Wijnaldum", "Trent Alexander-Arnold", "Andrew Robertson", "Sadio Mané", "Jordan Brian Henderson", "Alisson Ramsés Becker", "Trent Alexander-Arnold", "Jordan Brian Henderson", "Mohamed Salah", "Roberto Firmino Barbosa de Oliveira", "Mohamed Salah", "Andrew Robertson", "Alisson Ramsés Becker", "Joël Andre Job Matip", "Trent Alexander-Arnold", "Virgil van Dijk", "Mohamed Salah", "Jordan Brian Henderson", "Sadio Mané", "Andrew Robertson", "Georginio Wijnaldum", "Mohamed Salah", "Andrew Robertson", "Roberto Firmino Barbosa de Oliveira", "Roberto Firmino Barbosa de Oliveira", "Jordan Brian Henderson", "Trent Alexander-Arnold", "Trent Alexander-Arnold", "Mohamed Salah", "Andrew Robertson", "Virgil van Dijk", "Joël Andre Job Matip", "Trent Alexander-Arnold", "Joël Andre Job Matip", "Virgil van Dijk", "Trent Alexander-Arnold", "Georginio Wijnaldum", "Trent Alexander-Arnold", "Alisson Ramsés Becker", "Andrew Robertson", "Jordan Brian Henderson", "Virgil van Dijk", "Joël Andre Job Matip", "Alisson Ramsés Becker", "Virgil van Dijk", "Alisson Ramsés Becker", "Joël Andre Job Matip", "Mohamed Salah", "Alisson Ramsés Becker", "Sadio Mané", "Joël Andre Job Matip", "Virgil van Dijk", "Trent Alexander-Arnold", "Andrew Robertson", "Georginio Wijnaldum", "Roberto Firmino Barbosa de Oliveira", "Sadio Mané", "Andrew Robertson", "Andrew Robertson", "Andrew Robertson", "Roberto Firmino Barbosa de Oliveira", "Virgil van Dijk", "Andrew Robertson", "Fábio Henrique Tavares", "Andrew Robertson", "Joël Andre Job Matip", "Trent Alexander-Arnold", "Trent Alexander-Arnold", "Roberto Firmino Barbosa de Oliveira", "Fábio Henrique Tavares", "Joël Andre Job Matip", "Alisson Ramsés Becker", "Trent Alexander-Arnold", "Roberto Firmino Barbosa de Oliveira", "Fábio Henrique Tavares", "Jordan Brian Henderson", "Trent Alexander-Arnold", "Andrew Robertson", "Sadio Mané", "Andrew Robertson", "Andrew Robertson", "Sadio Mané", 
"Andrew Robertson", "Trent Alexander-Arnold", "Sadio Mané", "Georginio Wijnaldum", "Roberto Firmino Barbosa de Oliveira", "Andrew Robertson", "Joël Andre Job Matip", "Georginio Wijnaldum", "Georginio Wijnaldum", "Jordan Brian Henderson", "Fábio Henrique Tavares", "Andrew Robertson", "Jordan Brian Henderson", "Joël Andre Job Matip", "Virgil van Dijk", "Jordan Brian Henderson", "Sadio Mané", "Alisson Ramsés Becker", "Virgil van Dijk", "Alisson Ramsés Becker", "Joël Andre Job Matip", "Jordan Brian Henderson", "Virgil van Dijk", "Andrew Robertson", "Virgil van Dijk", "Sadio Mané", "Jordan Brian Henderson", "Alisson Ramsés Becker", "Trent Alexander-Arnold", "Trent Alexander-Arnold", "Mohamed Salah", "Fábio Henrique Tavares", "Virgil van Dijk", "James Philip Milner", "Andrew Robertson", "James Philip Milner", "Virgil van Dijk", "Joël Andre Job Matip", "Divock Okoth Origi", "Jordan Brian Henderson", "Virgil van Dijk", "Alisson Ramsés Becker", "Virgil van Dijk", "Joël Andre Job Matip", "Joël Andre Job Matip", "Andrew Robertson", "Alisson Ramsés Becker", "Andrew Robertson", "Fábio Henrique Tavares", "James Philip Milner", "Divock Okoth Origi", "Sadio Mané", "Trent Alexander-Arnold", "Trent Alexander-Arnold", "Trent Alexander-Arnold", "Trent Alexander-Arnold", "Trent Alexander-Arnold", "Andrew Robertson", "Divock Okoth Origi", "Sadio Mané", "Mohamed Salah", "Joël Andre Job Matip", "Virgil van Dijk", "Alisson Ramsés Becker", "Mohamed Salah", "Andrew Robertson", "Sadio Mané", "Fábio Henrique Tavares", "Alisson Ramsés Becker", "Alisson Ramsés Becker", "Fábio Henrique Tavares", "Andrew Robertson", "Sadio Mané", "Mohamed Salah", "Andrew Robertson", "Mohamed Salah", "Alisson Ramsés Becker", "Andrew Robertson", "Andrew Robertson", "Fábio Henrique Tavares", "Sadio Mané", "Andrew Robertson", "Alisson Ramsés Becker", "James Philip Milner", "Trent Alexander-Arnold", "Trent Alexander-Arnold", "Alisson Ramsés Becker", "Virgil van Dijk", "Divock Okoth Origi", "Alisson Ramsés Becker", 
"Andrew Robertson", "Sadio Mané", "Andrew Robertson", "Divock Okoth Origi", "Alisson Ramsés Becker", "James Philip Milner", "James Philip Milner", "Mohamed Salah", "Andrew Robertson", "Alisson Ramsés Becker", "Sadio Mané", "Alisson Ramsés Becker", "Andrew Robertson", "James Philip Milner", "Joël Andre Job Matip", "Alisson Ramsés Becker", "Jordan Brian Henderson", "Divock Okoth Origi", "Trent Alexander-Arnold", "Divock Okoth Origi", "Trent Alexander-Arnold", "Alisson Ramsés Becker", "Joe Gomez", "Fábio Henrique Tavares", "Alisson Ramsés Becker", "Alisson Ramsés Becker", "Fábio Henrique Tavares", "Divock Okoth Origi", "Mohamed Salah", "Divock Okoth Origi", "Joël Andre Job Matip", "Trent Alexander-Arnold"], "marker": { "color": "white", "size": 4 }, "mode": "markers", "showlegend": false, "x": [53.375, 31.587500000000002, 57.39999999999999, 65.8, 52.0625, 56.612500000000004, 56.262499999999996, 92.83749999999999, 12.6875, 24.675, 39.4625, 38.237500000000004, 43.050000000000004, 36.4, 27.3875, 38.0625, 43.75, 54.425000000000004, 50.75, 74.6375, 70.96249999999999, 80.325, 91.96249999999999, 72.97500000000001, 1.8375000000000001, 8.6625, 1.8375000000000001, 20.212500000000002, 29.574999999999996, 6.125, 6.825, 7.174999999999999, 31.0625, 11.725, 21.175, 105.0, 36.225, 13.2125, 27.912499999999998, 27.212500000000002, 56.262499999999996, 50.050000000000004, 22.224999999999998, 25.724999999999998, 22.837500000000002, 36.050000000000004, 37.362500000000004, 6.125, 12.512500000000001, 9.362499999999999, 21.0, 14.349999999999998, 12.6875, 35.525, 20.5625, 25.724999999999998, 24.849999999999998, 36.574999999999996, 67.46249999999999, 63.96249999999999, 67.46249999999999, 48.9125, 56.78750000000001, 43.050000000000004, 10.85, 19.162499999999998, 36.4, 38.0625, 45.9375, 50.050000000000004, 30.537499999999998, 23.0125, 7.3500000000000005, 10.325000000000001, 31.237500000000004, 57.39999999999999, 60.637499999999996, 61.25, 51.1, 24.849999999999998, 13.5625, 30.887499999999996, 
56.08749999999999, 79.97500000000001, 81.46249999999999, 105.0, 80.325, 91.6125, 74.9875, 68.77499999999999, 80.14999999999999, 86.27499999999999, 95.46249999999999, 88.9875, 98.7, 102.02499999999999, 80.5, 76.825, 81.1125, 103.16250000000001, 105.0, 102.2, 80.2375, 31.237500000000004, 47.6, 43.574999999999996, 43.050000000000004, 18.900000000000002, 43.225, 68.075, 72.45, 71.3125, 68.77499999999999, 41.5625, 20.212500000000002, 27.5625, 56.08749999999999, 61.949999999999996, 73.9375, 68.25, 69.125, 32.025, 36.4, 26.5125, 24.849999999999998, 11.200000000000001, 100.8, 12.512500000000001, 61.425000000000004, 68.77499999999999, 96.33749999999999, 88.1125, 89.16250000000001, 30.887499999999996, 10.5, 19.6875, 105.0, 39.375, 66.58749999999999, 60.637499999999996, 91.78750000000001, 72.10000000000001, 86.0125, 94.14999999999999, 79.625, 86.0125, 88.28750000000001, 91.96249999999999, 97.64999999999999, 105.0, 105.0, 67.1125, 50.050000000000004, 49.7875, 50.925000000000004, 43.75, 42.875, 73.58749999999999, 25.724999999999998, 24.2375, 8.8375, 36.574999999999996, 38.5875, 20.7375, 19.6875, 9.1875, 18.900000000000002, 7.5249999999999995, 35.699999999999996, 46.725, 6.125, 63.612500000000004, 31.587500000000002, 31.412499999999998, 1.6624999999999999, 25.287499999999998, 48.125, 57.137499999999996, 60.28750000000001, 19.5125, 32.199999999999996, 42.7875, 55.2125, 36.4, 37.275, 38.15, 46.199999999999996, 55.824999999999996, 61.77499999999999, 72.71249999999999, 87.14999999999999, 72.53750000000001, 12.775, 22.3125, 36.050000000000004, 50.8375, 49.35, 50.225, 44.5375, 2.625, 14.4375, 8.575000000000001, 36.8375, 42.7875, 35.7875, 74.89999999999999, 102.375, 97.47500000000001, 91.175, 76.125, 29.3125, 47.6, 79.71249999999999, 80.325, 73.41250000000001, 85.39999999999999, 15.924999999999999, 6.0375000000000005, 29.487500000000004, 42.7875, 53.112500000000004, 5.25, 11.549999999999999, 3.15, 44.1, 65.1875, 29.487500000000004, 36.487500000000004, 35.612500000000004, 
60.28750000000001, 66.2375, 5.25, 61.77499999999999, 82.075, 95.2, 55.475, 36.487500000000004, 38.762499999999996, 59.41250000000001, 47.425000000000004, 35.875, 43.8375, 62.64999999999999, 69.64999999999999, 11.725, 3.6750000000000003, 25.1125, 41.9125, 47.1625, 36.6625, 10.0625, 23.537499999999998, 37.449999999999996, 36.487500000000004, 60.987500000000004, 56.52499999999999, 83.5625, 105.0, 91.96249999999999, 95.8125, 91.52499999999999, 30.275000000000002, 48.9125, 83.91250000000001, 93.0125, 10.0625, 25.025000000000002, 6.0375000000000005, 102.1125, 71.75, 82.425, 48.9125, 11.375, 6.3, 28.875, 46.637499999999996, 60.987500000000004, 66.2375, 73.2375, 86.53750000000001, 7.7875000000000005, 43.050000000000004, 24.412499999999998, 35.612500000000004, 42.0, 37.5375, 11.375, 49.35, 55.300000000000004, 60.987500000000004, 5.25, 23.537499999999998, 60.987500000000004, 11.2875, 37.7125, 51.887499999999996, 48.475, 60.550000000000004, 7.7875000000000005, 38.324999999999996, 105.0, 80.41250000000001, 57.137499999999996, 3.6750000000000003, 63.262499999999996, 10.85, 74.6375, 105.0, 91.175, 5.25, 60.112500000000004, 70.4375, 47.6, 51.275, 38.762499999999996, 14.4375, 2.625, 22.05, 12.6, 13.0375, 35.7875, 55.824999999999996, 85.39999999999999, 81.8125, 34.737500000000004, 49.35], "y": [33.915, 30.599999999999998, 55.92999999999999, 51.51, 51.339999999999996, 52.36, 47.345, 48.11, 61.455, 43.775, 44.71, 67.915, 64.345, 64.345, 29.41, 12.154999999999998, 23.970000000000002, 18.445, 48.279999999999994, 24.904999999999998, 25.245, 12.75, 19.889999999999997, 0.0, 11.645000000000001, 11.815000000000005, 23.8, 0.0, 34.0, 37.315, 55.92999999999999, 37.4, 33.235, 31.62, 42.415, 67.915, 40.375, 30.599999999999998, 9.690000000000005, 3.2299999999999973, 10.370000000000003, 22.354999999999997, 67.915, 43.265, 65.28, 43.095, 50.915, 30.514999999999997, 56.865, 36.125, 24.310000000000002, 17.935000000000002, 48.449999999999996, 61.625, 43.775, 24.139999999999997, 35.19, 14.875, 67.915, 
43.095, 37.23, 1.784999999999995, 2.7200000000000024, 10.795000000000002, 28.220000000000002, 41.989999999999995, 64.515, 22.185000000000002, 25.075, 31.279999999999998, 43.605, 54.74, 29.325, 31.62, 53.805, 53.464999999999996, 50.915, 57.035, 63.24, 41.65, 31.279999999999998, 42.754999999999995, 15.895000000000001, 11.134999999999994, 23.46, 67.915, 0.0, 50.065, 60.095, 61.879999999999995, 54.48499999999999, 50.574999999999996, 58.309999999999995, 64.515, 0.0, 10.029999999999998, 16.15, 15.045000000000002, 49.3, 50.574999999999996, 67.915, 47.515, 50.404999999999994, 40.8, 67.915, 31.11, 13.939999999999998, 19.889999999999997, 0.9349999999999952, 35.614999999999995, 32.045, 34.68, 54.145, 22.354999999999997, 44.54, 46.154999999999994, 7.904999999999998, 1.784999999999995, 4.334999999999995, 13.770000000000001, 20.229999999999997, 0.0, 9.179999999999998, 2.209999999999995, 67.915, 57.035, 2.7200000000000024, 29.495, 1.4450000000000023, 3.059999999999995, 1.4450000000000023, 9.35, 1.6150000000000049, 54.315, 40.375, 26.264999999999997, 0.0, 29.325, 19.720000000000002, 17.51, 62.22, 62.9, 47.94, 55.42, 67.915, 58.309999999999995, 23.8, 18.955, 8.245000000000003, 0.0, 67.915, 67.915, 47.345, 28.985, 11.475, 25.075, 42.16, 3.4, 30.770000000000003, 13.770000000000001, 28.220000000000002, 58.05499999999999, 43.435, 41.65, 19.125, 30.770000000000003, 49.3, 32.555, 7.904999999999998, 33.235, 30.514999999999997, 45.05, 31.45, 41.14, 3.2299999999999973, 67.915, 64.25999999999999, 61.625, 56.695, 66.215, 67.915, 67.915, 65.11, 54.91, 63.495, 55.75999999999999, 67.915, 19.04, 0.0, 0.0, 6.034999999999995, 8.159999999999995, 28.474999999999998, 43.86, 1.784999999999995, 18.020000000000003, 20.825, 13.854999999999997, 6.120000000000002, 56.1, 61.795, 57.8, 67.915, 65.96, 65.11, 3.5700000000000025, 58.14, 48.279999999999994, 56.525, 53.04, 45.475, 38.42, 26.435000000000002, 17.764999999999997, 26.945, 53.04, 29.495, 2.125, 29.495, 21.675, 37.654999999999994, 30.599999999999998, 
55.25, 39.949999999999996, 12.070000000000002, 6.034999999999995, 45.645, 64.515, 47.089999999999996, 43.435, 26.775, 37.4, 0.0, 0.0, 21.25, 18.36, 30.514999999999997, 54.23, 65.535, 59.584999999999994, 46.239999999999995, 12.579999999999997, 24.564999999999998, 17.34, 19.55, 30.429999999999996, 20.060000000000002, 25.585, 20.995, 57.715, 38.675, 57.715, 50.32, 65.535, 48.96, 36.89, 18.529999999999998, 0.0, 0.0, 0.0, 2.125, 67.915, 64.68499999999999, 22.099999999999998, 29.07, 26.349999999999998, 41.989999999999995, 32.98, 20.229999999999997, 67.915, 59.584999999999994, 38.42, 20.995, 35.955, 43.605, 65.36500000000001, 64.25999999999999, 9.095000000000002, 45.9, 22.014999999999997, 19.04, 63.665000000000006, 48.875, 39.949999999999996, 54.315, 61.795, 23.46, 3.2299999999999973, 1.784999999999995, 0.0, 30.599999999999998, 37.23, 28.05, 37.654999999999994, 65.19500000000001, 62.05, 65.705, 55.504999999999995, 23.885, 2.55, 0.0, 30.26, 38.25, 27.795, 62.9, 41.565, 63.07, 0.0, 36.635, 30.599999999999998, 40.97, 46.324999999999996, 21.25, 38.504999999999995, 22.695, 34.0, 18.955, 29.41, 37.654999999999994, 37.23, 22.525, 21.504999999999995, 15.895000000000001, 19.804999999999996, 23.885, 17.595000000000002], "type": "scatter" }], { "template": { "data": { "histogram2dcontour": [{ "type": "histogram2dcontour", "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [[0.0, "#0d0887"], [0.1111111111111111, "#46039f"], [0.2222222222222222, "#7201a8"], [0.3333333333333333, "#9c179e"], [0.4444444444444444, "#bd3786"], [0.5555555555555556, "#d8576b"], [0.6666666666666666, "#ed7953"], [0.7777777777777778, "#fb9f3a"], [0.8888888888888888, "#fdca26"], [1.0, "#f0f921"]] }], "choropleth": [{ "type": "choropleth", "colorbar": { "outlinewidth": 0, "ticks": "" } }], "histogram2d": [{ "type": "histogram2d", "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [[0.0, "#0d0887"], [0.1111111111111111, "#46039f"], [0.2222222222222222, "#7201a8"], [0.3333333333333333, 
"#9c179e"], [0.4444444444444444, "#bd3786"], [0.5555555555555556, "#d8576b"], [0.6666666666666666, "#ed7953"], [0.7777777777777778, "#fb9f3a"], [0.8888888888888888, "#fdca26"], [1.0, "#f0f921"]] }], "heatmap": [{ "type": "heatmap", "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [[0.0, "#0d0887"], [0.1111111111111111, "#46039f"], [0.2222222222222222, "#7201a8"], [0.3333333333333333, "#9c179e"], [0.4444444444444444, "#bd3786"], [0.5555555555555556, "#d8576b"], [0.6666666666666666, "#ed7953"], [0.7777777777777778, "#fb9f3a"], [0.8888888888888888, "#fdca26"], [1.0, "#f0f921"]] }], "contourcarpet": [{ "type": "contourcarpet", "colorbar": { "outlinewidth": 0, "ticks": "" } }], "contour": [{ "type": "contour", "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [[0.0, "#0d0887"], [0.1111111111111111, "#46039f"], [0.2222222222222222, "#7201a8"], [0.3333333333333333, "#9c179e"], [0.4444444444444444, "#bd3786"], [0.5555555555555556, "#d8576b"], [0.6666666666666666, "#ed7953"], [0.7777777777777778, "#fb9f3a"], [0.8888888888888888, "#fdca26"], [1.0, "#f0f921"]] }], "surface": [{ "type": "surface", "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [[0.0, "#0d0887"], [0.1111111111111111, "#46039f"], [0.2222222222222222, "#7201a8"], [0.3333333333333333, "#9c179e"], [0.4444444444444444, "#bd3786"], [0.5555555555555556, "#d8576b"], [0.6666666666666666, "#ed7953"], [0.7777777777777778, "#fb9f3a"], [0.8888888888888888, "#fdca26"], [1.0, "#f0f921"]] }], "mesh3d": [{ "type": "mesh3d", "colorbar": { "outlinewidth": 0, "ticks": "" } }], "scatter": [{ "fillpattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 }, "type": "scatter" }], "parcoords": [{ "type": "parcoords", "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } } }], "scatterpolargl": [{ "type": "scatterpolargl", "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } } }], "bar": [{ "error_x": { "color": "#2a3f5f" }, "error_y": { "color": "#2a3f5f" }, "marker": { "line": { 
"color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "bar" }], "scattergeo": [{ "type": "scattergeo", "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } } }], "scatterpolar": [{ "type": "scatterpolar", "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } } }], "histogram": [{ "marker": { "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "histogram" }], "scattergl": [{ "type": "scattergl", "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } } }], "scatter3d": [{ "type": "scatter3d", "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } } }], "scattermap": [{ "type": "scattermap", "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } } }], "scattermapbox": [{ "type": "scattermapbox", "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } } }], "scatterternary": [{ "type": "scatterternary", "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } } }], "scattercarpet": [{ "type": "scattercarpet", "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } } }], "carpet": [{ "aaxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "baxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "type": "carpet" }], "table": [{ "cells": { "fill": { "color": "#EBF0F8" }, "line": { "color": "white" } }, "header": { "fill": { "color": "#C8D4E3" }, "line": { "color": "white" } }, "type": "table" }], "barpolar": [{ "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "barpolar" }], "pie": [{ "automargin": true, "type": "pie" }] }, "layout": { "autotypenumbers": "strict", "colorway": ["#636efa", "#EF553B", "#00cc96", "#ab63fa", "#FFA15A", "#19d3f3", "#FF6692", 
"#B6E880", "#FF97FF", "#FECB52"], "font": { "color": "#2a3f5f" }, "hovermode": "closest", "hoverlabel": { "align": "left" }, "paper_bgcolor": "white", "plot_bgcolor": "#E5ECF6", "polar": { "bgcolor": "#E5ECF6", "angularaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "radialaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "ternary": { "bgcolor": "#E5ECF6", "aaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "baxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "caxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "coloraxis": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "colorscale": { "sequential": [[0.0, "#0d0887"], [0.1111111111111111, "#46039f"], [0.2222222222222222, "#7201a8"], [0.3333333333333333, "#9c179e"], [0.4444444444444444, "#bd3786"], [0.5555555555555556, "#d8576b"], [0.6666666666666666, "#ed7953"], [0.7777777777777778, "#fb9f3a"], [0.8888888888888888, "#fdca26"], [1.0, "#f0f921"]], "sequentialminus": [[0.0, "#0d0887"], [0.1111111111111111, "#46039f"], [0.2222222222222222, "#7201a8"], [0.3333333333333333, "#9c179e"], [0.4444444444444444, "#bd3786"], [0.5555555555555556, "#d8576b"], [0.6666666666666666, "#ed7953"], [0.7777777777777778, "#fb9f3a"], [0.8888888888888888, "#fdca26"], [1.0, "#f0f921"]], "diverging": [[0, "#8e0152"], [0.1, "#c51b7d"], [0.2, "#de77ae"], [0.3, "#f1b6da"], [0.4, "#fde0ef"], [0.5, "#f7f7f7"], [0.6, "#e6f5d0"], [0.7, "#b8e186"], [0.8, "#7fbc41"], [0.9, "#4d9221"], [1, "#276419"]] }, "xaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "automargin": true, "zerolinewidth": 2 }, "yaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "automargin": true, "zerolinewidth": 2 }, "scene": { "xaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "linecolor": "white", "showbackground": true, "ticks": 
"", "zerolinecolor": "white", "gridwidth": 2 }, "yaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white", "gridwidth": 2 }, "zaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white", "gridwidth": 2 } }, "shapedefaults": { "line": { "color": "#2a3f5f" } }, "annotationdefaults": { "arrowcolor": "#2a3f5f", "arrowhead": 0, "arrowwidth": 1 }, "geo": { "bgcolor": "white", "landcolor": "#E5ECF6", "subunitcolor": "white", "showland": true, "showlakes": true, "lakecolor": "white" }, "title": { "x": 0.05 }, "mapbox": { "style": "light" } } }, "margin": { "l": 0, "r": 0, "t": 80, "b": 0 }, "font": { "family": "Helvetica Neue, Arial, sans-serif", "color": "#90caf9" }, "title": { "text": "\u003cb\u003eLiverpool Passes (StatsBomb 22912)\u003c\u002fb\u003e", "x": 0.5, "xanchor": "center" }, "hoverlabel": { "font": { "family": "Helvetica Neue, Arial, sans-serif", "size": 16, "color": "white" }, "bgcolor": "rgba(255,255,255,0.15)", "bordercolor": "rgba(144,202,249,0.6)" }, "shapes": [{ "line": { "color": "#90caf9" }, "type": "rect", "x0": 0, "x1": 105.0, "y0": 0, "y1": 68.0 }, { "line": { "color": "#90caf9" }, "type": "rect", "x0": 52.5, "x1": 52.5, "y0": 68.0, "y1": 0.0 }, { "line": { "color": "#90caf9" }, "type": "rect", "x0": 0.0, "x1": 15.75, "y0": 52.699999999999996, "y1": 15.299999999999999 }, { "line": { "color": "#90caf9" }, "type": "rect", "x0": 89.25, "x1": 105.0, "y0": 52.699999999999996, "y1": 15.299999999999999 }, { "line": { "color": "#90caf9" }, "type": "rect", "x0": 0.0, "x1": 5.25, "y0": 42.5, "y1": 25.5 }, { "line": { "color": "#90caf9" }, "type": "rect", "x0": 99.75, "x1": 105.0, "y0": 42.5, "y1": 25.5 }, { "line": { "color": "#90caf9" }, "type": "circle", "x0": 44.49375, "x1": 60.50625, "y0": 25.99375, "y1": 42.00625 }], "plot_bgcolor": "#0f1e2e", "paper_bgcolor": "#0f1e2e", 
"showlegend": false, "xaxis": { "range": [-5, 110.0], "scaleanchor": "y", "constrain": "domain", "fixedrange": true, "showgrid": false, "zeroline": false, "showticklabels": false, "visible": false }, "yaxis": { "range": [-5, 73.0], "constrain": "domain", "fixedrange": true, "showgrid": false, "zeroline": false, "showticklabels": false, "visible": false }, "annotations": [{ "font": { "color": "#90caf9", "family": "Helvetica Neue, Arial, sans-serif", "size": 16 }, "showarrow": false, "text": "Hover for player name", "x": 0.5, "xanchor": "center", "xref": "paper", "y": 1.04, "yanchor": "top", "yref": "paper" }] }, { "responsive": true, "displayModeBar": false, "displaylogo": false }) };            &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;&lt;br/&gt;&lt;/p&gt;
&lt;h2 id="pitch-documentation"&gt;Pitch Documentation&lt;/h2&gt;
&lt;p&gt;You can explore the full range of &lt;code&gt;Pitch&lt;/code&gt; features in the &lt;code&gt;penaltyblog&lt;/code&gt; &lt;a href="https://penaltyblog.readthedocs.io/en/latest/viz/index.html"&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This is the first release of &lt;code&gt;Pitch&lt;/code&gt;, so there may be a few edge cases I haven’t encountered yet. If you run into any unexpected behavior, please:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Open an issue on &lt;a href="https://github.com/martineastwood/penaltyblog"&gt;GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Or send me a message directly &lt;a href="https://pena.lt/y/contact"&gt;via the blog&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Your feedback will help refine &lt;code&gt;Pitch&lt;/code&gt; and shape future updates.&lt;/p&gt;
&lt;h1 id="blazingly-fast-goal-models"&gt;Blazingly Fast Goal Models&lt;/h1&gt;
&lt;p&gt;Goal model fitting in &lt;code&gt;penaltyblog&lt;/code&gt; is now &lt;strong&gt;5–10x faster&lt;/strong&gt; thanks to two major changes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Analytical gradients&lt;/strong&gt; (the Jacobian of the objective) are now passed to the optimiser, replacing slower numerical approximations.&lt;/li&gt;
&lt;li&gt;More core routines are implemented in &lt;strong&gt;Cython&lt;/strong&gt; for low-level performance.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With exact gradient information, the optimiser converges more quickly and with greater numerical stability. In practice, you can fit models on large datasets in a fraction of the time, without sacrificing the accuracy or robustness of the original implementation.&lt;/p&gt;
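&lt;p&gt;To make the difference concrete, here is a minimal sketch that compares an exact gradient with a finite-difference approximation. It uses a toy single-parameter Poisson likelihood and is not the &lt;code&gt;penaltyblog&lt;/code&gt; implementation: the function names and the goal counts are made up for illustration.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import math


def neg_log_likelihood(params, goals):
    # Toy model (an assumption): every match has the same rate exp(params[0]).
    # The log link keeps the Poisson rate positive.
    rate = math.exp(params[0])
    return sum(rate - g * params[0] + math.lgamma(g + 1) for g in goals)


def analytical_gradient(params, goals):
    # Derivative of sum(exp(theta) - g * theta) with respect to theta.
    rate = math.exp(params[0])
    return [sum(rate - g for g in goals)]


def numerical_gradient(func, params, goals, eps=1e-6):
    # Central finite differences: two extra function evaluations per parameter.
    grad = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += eps
        down[i] -= eps
        grad.append((func(up, goals) - func(down, goals)) / (2 * eps))
    return grad


goals = [0, 1, 1, 2, 3, 0, 2]  # made-up goal counts
theta = [0.1]
exact = analytical_gradient(theta, goals)
approx = numerical_gradient(neg_log_likelihood, theta, goals)
print(exact, approx)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The two results agree closely, but the numerical version needs two extra likelihood evaluations for every parameter on every call, which adds up quickly when a model has an attack and a defence parameter per team.&lt;/p&gt;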
&lt;h2 id="more-control-over-optimisation"&gt;More Control Over Optimisation&lt;/h2&gt;
&lt;p&gt;The updated fitting API also exposes a &lt;code&gt;minimizer_options&lt;/code&gt; parameter, allowing you to pass arguments directly to &lt;code&gt;scipy.optimize.minimize&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;This means you can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Increase &lt;code&gt;maxiter&lt;/code&gt; for tricky fits.&lt;/li&gt;
&lt;li&gt;Tighten &lt;code&gt;gtol&lt;/code&gt; or &lt;code&gt;ftol&lt;/code&gt; for higher precision (which tolerances apply depends on the solver method).&lt;/li&gt;
&lt;li&gt;Fine-tune convergence settings without modifying library code.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Combined with the new Cython-powered gradients, this flexibility makes it easier to adapt training to &lt;strong&gt;different datasets, convergence requirements, and performance constraints&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="example-dixoncoles-model-with-custom-optimisation"&gt;Example: Dixon–Coles Model with Custom Optimisation&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pb&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DixonColesGoalModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gh&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ga&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;th&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;use_gradient&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                &lt;span class="c1"&gt;# optional; can be False for back-compat&lt;/span&gt;
    &lt;span class="n"&gt;minimizer_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;               &lt;span class="c1"&gt;# optional; passes to `scipy.optimize.minimize`&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;maxiter&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;              &lt;span class="c1"&gt;# more iterations if needed&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;gtol&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1e-8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                 &lt;span class="c1"&gt;# gradient tolerance&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;ftol&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1e-9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                 &lt;span class="c1"&gt;# function tolerance&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h1 id="improved-footballprobabilitygrid"&gt;Improved &lt;code&gt;FootballProbabilityGrid&lt;/code&gt;&lt;/h1&gt;
&lt;p&gt;Every &lt;code&gt;penaltyblog&lt;/code&gt; goal model returns a &lt;code&gt;FootballProbabilityGrid&lt;/code&gt; from &lt;code&gt;.predict()&lt;/code&gt;. It contains the &lt;strong&gt;full scoreline probability matrix&lt;/strong&gt; for a match and makes it easy to calculate &lt;strong&gt;market-ready probabilities&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The latest update adds normalization, expanded markets, better Asian Handicap handling, and richer representation, all while maintaining backwards compatibility.&lt;/p&gt;
&lt;p&gt;The new implementation includes:&lt;/p&gt;
&lt;h2 id="1-optional-normalization"&gt;1. Optional Normalization&lt;/h2&gt;
&lt;p&gt;You can now pass &lt;code&gt;normalize=True&lt;/code&gt; (default) to &lt;code&gt;.predict()&lt;/code&gt; to ensure the probability grid sums to &lt;strong&gt;exactly 1.0&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Why it matters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Without normalization&lt;/strong&gt;: Keeps the raw Poisson/Dixon–Coles probabilities, so the mass beyond &lt;code&gt;max_goals&lt;/code&gt; is simply left off the grid (purist approach). As a result, markets such as 1X2 or over/under totals may sum to slightly less than 1.0.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;With normalization&lt;/strong&gt;: Rescales probabilities so all markets are internally consistent, which matters for pricing, trading, and backtesting.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For a more detailed discussion, see the &lt;a href="https://penaltyblog.readthedocs.io/en/latest/models/football_prob_grid.html"&gt;documentation&lt;/a&gt;.&lt;/p&gt;
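&lt;p&gt;As a rough illustration of what normalization does, the sketch below builds a truncated score grid and rescales it. It assumes independent Poisson goals with made-up rates, and &lt;code&gt;score_grid&lt;/code&gt; is a hypothetical helper, not part of &lt;code&gt;penaltyblog&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import math


def poisson_pmf(k, rate):
    return math.exp(-rate) * rate ** k / math.factorial(k)


def score_grid(home_rate, away_rate, max_goals=6, normalize=True):
    # grid[h][a] = P(home scores h, away scores a), truncated at max_goals.
    grid = [
        [poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
         for a in range(max_goals + 1)]
        for h in range(max_goals + 1)
    ]
    if normalize:
        # Spread the missing tail mass proportionally over the grid.
        total = sum(map(sum, grid))
        grid = [[p / total for p in row] for row in grid]
    return grid


raw = score_grid(1.6, 1.1, normalize=False)
norm = score_grid(1.6, 1.1, normalize=True)
print(sum(map(sum, raw)))   # slightly below 1.0: the tail beyond max_goals is missing
print(sum(map(sum, norm)))  # 1.0, up to floating-point rounding
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The smaller &lt;code&gt;max_goals&lt;/code&gt; is relative to the expected goals, the more mass falls outside the grid and the more the rescaling changes each cell.&lt;/p&gt;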
&lt;h2 id="2-expanded-market-calculations"&gt;2. Expanded Market Calculations&lt;/h2&gt;
&lt;p&gt;The grid now computes more markets out of the box:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;1X2&lt;/strong&gt;: &lt;code&gt;home_win&lt;/code&gt;, &lt;code&gt;draw&lt;/code&gt;, &lt;code&gt;away_win&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Convenience&lt;/strong&gt;: &lt;code&gt;home_draw_away&lt;/code&gt; → &lt;code&gt;[home_win, draw, away_win]&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Both Teams to Score (BTTS)&lt;/strong&gt;: &lt;code&gt;btts_yes&lt;/code&gt;, &lt;code&gt;btts_no&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Totals&lt;/strong&gt;: &lt;code&gt;total_goals(over_under, strike)&lt;/code&gt; for any goal line (e.g. 2.5, 3.0)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Asian Handicap&lt;/strong&gt;: &lt;code&gt;asian_handicap&lt;/code&gt; and &lt;code&gt;asian_handicap_probs&lt;/code&gt; with push handling&lt;/li&gt;
&lt;/ul&gt;
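&lt;p&gt;Each of these markets is just a different way of summing cells of the scoreline matrix. The sketch below shows the idea on a toy 3x3 grid with made-up probabilities; the variable names and the &lt;code&gt;under_push_over&lt;/code&gt; helper are illustrative only and are not the library's implementation.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# Toy grid (illustrative numbers): rows are home goals 0-2, columns are away goals 0-2.
grid = [
    [0.10, 0.08, 0.02],
    [0.20, 0.15, 0.05],
    [0.20, 0.15, 0.05],
]
n = len(grid)

# 1X2: cells below the diagonal, on it, and above it.
home_win = sum(grid[h][a] for h in range(n) for a in range(h))
draw = sum(grid[k][k] for k in range(n))
away_win = sum(grid[h][a] for a in range(n) for h in range(a))

# BTTS: every cell where both teams score at least once.
btts_yes = sum(grid[h][a] for h in range(1, n) for a in range(1, n))
btts_no = 1.0 - btts_yes  # valid because this toy grid sums to 1

# Totals: collapse the grid onto total goals, then split around the line.
total_dist = [0.0] * (2 * n - 1)
for h in range(n):
    for a in range(n):
        total_dist[h + a] += grid[h][a]


def under_push_over(line):
    """Hypothetical helper: (under, push, over) for a whole or half-goal line."""
    whole = int(line)
    if line == whole:
        return sum(total_dist[:whole]), total_dist[whole], sum(total_dist[whole + 1:])
    return sum(total_dist[:whole + 1]), 0.0, sum(total_dist[whole + 1:])


print(home_win, draw, away_win)
print(under_push_over(2.0), under_push_over(2.5))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A whole-number line such as 2.0 has a push outcome (exactly two goals), while a half-goal line such as 2.5 never does.&lt;/p&gt;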
&lt;h2 id="3-push-aware-asian-handicap-calculations"&gt;3. Push-Aware Asian Handicap Calculations&lt;/h2&gt;
&lt;p&gt;The updated &lt;code&gt;asian_handicap&lt;/code&gt; method correctly handles:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Full win&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Half win / Half loss&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Push&lt;/strong&gt; (stake returned)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It supports &lt;strong&gt;quarter, half, and full goal handicaps&lt;/strong&gt;, making it possible to price complex AH markets directly from the grid — no manual matrix work required.&lt;/p&gt;
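&lt;p&gt;For readers who want to see the mechanics, here is a rough sketch of how a quarter-goal handicap can be priced from a grid. It assumes the usual convention that a quarter line splits the stake equally between the two neighbouring lines, and it reports stake-weighted win/push/lose probabilities from the home side's point of view. &lt;code&gt;settle&lt;/code&gt; and &lt;code&gt;ah_probs&lt;/code&gt; are hypothetical helpers, and the library's own method may define its outputs differently.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import math

# Toy grid (illustrative numbers): rows are home goals 0-2, columns are away goals 0-2.
grid = [
    [0.10, 0.08, 0.02],
    [0.20, 0.15, 0.05],
    [0.20, 0.15, 0.05],
]


def settle(margin, line):
    """Outcome of a home bet on a whole or half-goal line for a given goal margin."""
    adjusted = margin + line
    if adjusted == 0:
        return "push"
    return "win" if math.copysign(1, adjusted) == 1 else "lose"


def ah_probs(grid, line):
    """Hypothetical helper: home-side (win, push, lose) for an Asian Handicap line."""
    if (line * 4) % 2 == 1:
        # Quarter line (assumed convention): half the stake goes on each
        # neighbouring line, so average the two sets of probabilities.
        low = ah_probs(grid, line - 0.25)
        high = ah_probs(grid, line + 0.25)
        return tuple((x + y) / 2 for x, y in zip(low, high))
    probs = {"win": 0.0, "push": 0.0, "lose": 0.0}
    for h, row in enumerate(grid):
        for a, p in enumerate(row):
            probs[settle(h - a, line)] += p
    return probs["win"], probs["push"], probs["lose"]


print(ah_probs(grid, -0.5))   # half-goal line: no push possible
print(ah_probs(grid, -0.25))  # quarter line: a draw is half push, half loss
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;On the -0.25 line a draw returns half the stake and loses the other half, which is why the draw probability ends up shared between the push and lose columns in this sketch.&lt;/p&gt;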
&lt;h2 id="4-richer-representation"&gt;4. Richer Representation&lt;/h2&gt;
&lt;p&gt;Calling &lt;code&gt;repr()&lt;/code&gt; now shows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Model name&lt;/li&gt;
&lt;li&gt;Home/Away goal expectations&lt;/li&gt;
&lt;li&gt;Core 1X2 probabilities&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Great for quick inspection in interactive sessions.&lt;/p&gt;
&lt;h2 id="5-backwards-compatibility"&gt;5. Backwards Compatibility&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;All previous methods still work.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;normalize&lt;/code&gt; defaults to &lt;code&gt;True&lt;/code&gt; for consistency, but you can disable it for raw probabilities.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="example-predicting-using-the-grid"&gt;Example: Predicting &amp;amp; Using the Grid&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# Fit your goals model&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Arsenal&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Chelsea&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_goals&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Core markets (1x2)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;P(Home win), P(Draw), P(Away win):&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;home_draw_away&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;P(Home win):&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;home_win&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;P(Draw):&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;draw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;P(Away win):&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;away_win&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Goal expectancy&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Home xG:&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;home_goal_expectation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Away xG:&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;away_goal_expectation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Both teams to score&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;BTTS (Yes):&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;btts_yes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;BTTS (No):&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;btts_no&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Totals&lt;/span&gt;
&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;totals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Totals 2.0  -&amp;gt; Under, Push, Over:&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;totals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;2.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Totals 2.5  -&amp;gt; Under, Push, Over:&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;P(Over 2.5):&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_goals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;over&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# Asian Handicaps&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;AH Home -0.5  (win prob only):&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;asian_handicap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;AH Home -0.25 (Win/Push/Lose):&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;asian_handicap_probs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;AH Away +1.0  (Win/Push/Lose):&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;asian_handicap_probs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# Double chance and DNB&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Double chance 1X:&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;double_chance_1x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Double chance X2:&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;double_chance_x2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Double chance 12:&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;double_chance_12&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;DNB Home (conditional win prob):&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;draw_no_bet_home&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;DNB Away (conditional win prob):&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;draw_no_bet_away&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Exact scores and distributions&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;P(Exact score 2-1):&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exact_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Home goal distribution (P(H=k)):&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;home_goal_distribution&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Away goal distribution (P(A=k)):&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;away_goal_distribution&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Total goals distribution (P(T=k)):&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_goals_distribution&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With the improved &lt;code&gt;FootballProbabilityGrid&lt;/code&gt;, you can go from a fitted model to fully priced betting markets in one step - no direct NumPy work required.&lt;/p&gt;
&lt;h1 id="flow-query-dsl"&gt;Flow Query DSL&lt;/h1&gt;
&lt;p&gt;&lt;code&gt;Flow.query(expr)&lt;/code&gt; lets you filter records with a &lt;strong&gt;compact, readable string expression&lt;/strong&gt;. It’s &lt;strong&gt;safe&lt;/strong&gt; (parsed via Python’s AST, not &lt;code&gt;eval&lt;/code&gt;), validated, and compiled into an efficient predicate.&lt;/p&gt;
&lt;p&gt;Use it to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Prototype filters quickly.&lt;/li&gt;
&lt;li&gt;Keep pipelines readable.&lt;/li&gt;
&lt;li&gt;Let end users define filters without writing Python.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="how-it-works"&gt;How It Works&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The query string is parsed into an &lt;strong&gt;AST&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;The AST is validated for safety.&lt;/li&gt;
&lt;li&gt;It’s compiled into a fast predicate function.&lt;/li&gt;
&lt;li&gt;Variables from your &lt;strong&gt;caller’s local scope&lt;/strong&gt; can be injected with &lt;code&gt;@var&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
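&lt;p&gt;The parse, validate, and compile steps above can be sketched with Python's standard &lt;code&gt;ast&lt;/code&gt; module. This illustrates the general technique and is not the actual &lt;code&gt;Flow.query&lt;/code&gt; implementation: &lt;code&gt;compile_query&lt;/code&gt; is a hypothetical helper that only understands &lt;code&gt;and&lt;/code&gt;/&lt;code&gt;or&lt;/code&gt; and single comparisons between a field name and a constant, and it rejects everything else.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import ast
import operator

# Whitelisted comparison operators; any other node type is rejected.
OPS = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Lt: operator.lt, ast.LtE: operator.le,
}


def compile_query(expr):
    """Hypothetical helper: parse expr, validate it, return a predicate over dicts."""
    tree = ast.parse(expr, mode="eval").body

    def build(node):
        if isinstance(node, ast.BoolOp):
            parts = [build(value) for value in node.values]
            combine = all if isinstance(node.op, ast.And) else any
            return lambda rec: combine(part(rec) for part in parts)
        if isinstance(node, ast.Compare) and len(node.ops) == 1:
            op = OPS.get(type(node.ops[0]))
            left, right = node.left, node.comparators[0]
            if op and isinstance(left, ast.Name) and isinstance(right, ast.Constant):
                return lambda rec: op(rec[left.id], right.value)
        # Function calls, attribute access, arithmetic, etc. never get this far.
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return build(tree)


liverpool_scored = compile_query("home_team == 'Liverpool' and goals_home != 0")
matches = [
    {"home_team": "Liverpool", "goals_home": 2},
    {"home_team": "Liverpool", "goals_home": 0},
    {"home_team": "Everton", "goals_home": 3},
]
kept = [m for m in matches if liverpool_scored(m)]
print(kept)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Because the string is only ever walked as a syntax tree and never executed, an input that tries to call a function raises a &lt;code&gt;ValueError&lt;/code&gt; in this sketch instead of running.&lt;/p&gt;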
&lt;h2 id="supported-syntax"&gt;Supported syntax&lt;/h2&gt;
&lt;h3 id="comparisons"&gt;Comparisons&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;==&lt;/code&gt;, &lt;code&gt;!=&lt;/code&gt;, &lt;code&gt;&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;gt;=&lt;/code&gt;, &lt;code&gt;&amp;lt;&lt;/code&gt;, &lt;code&gt;&amp;lt;=&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Chained comparisons&lt;/strong&gt; are supported and expanded internally (e.g., &lt;code&gt;0 &amp;lt;= goals &amp;lt;= 5&lt;/code&gt;).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;goals_home &amp;gt;= 2&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;0 &amp;lt;= goals_home &amp;lt;= 5&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3 id="logical-operators"&gt;Logical operators&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;and&lt;/code&gt;, &lt;code&gt;or&lt;/code&gt;, &lt;code&gt;not&lt;/code&gt; (use parentheses to control precedence)&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home_team == &amp;#39;Liverpool&amp;#39; and goals_home &amp;gt; goals_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;not (home_team == &amp;#39;Arsenal&amp;#39; or away_team == &amp;#39;Arsenal&amp;#39;)&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3 id="field-access-dot-notation"&gt;Field access (dot notation)&lt;/h3&gt;
&lt;p&gt;Access nested fields with &lt;code&gt;.&lt;/code&gt; (resolved via &lt;code&gt;get_field&lt;/code&gt;):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;venue.city == &amp;#39;London&amp;#39;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;player.stats.minutes &amp;gt;= 60&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3 id="membership"&gt;Membership&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;in&lt;/code&gt;, &lt;code&gt;not in&lt;/code&gt; - note that the &lt;strong&gt;field must be on the left-hand side&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home_team in [&amp;#39;Chelsea&amp;#39;, &amp;#39;Tottenham&amp;#39;]&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;league not in [&amp;#39;Premier League&amp;#39;, &amp;#39;La Liga&amp;#39;]&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# &amp;quot;Man City&amp;quot; in home_team  ← not currently supported&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3 id="null-missing-checks"&gt;NULL / missing checks&lt;/h3&gt;
&lt;p&gt;Identity comparisons with &lt;code&gt;None&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;player.injury_status is None&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;player.injury_status is not None&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3 id="string-transforms-inside-comparisons"&gt;String transforms (inside comparisons)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;len(x)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;.lower()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;.upper()&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home_team.lower() == &amp;#39;manchester united&amp;#39;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;len(home_team) &amp;lt; 8&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;These must be used in a comparison (e.g., &lt;code&gt;field.lower() == 'x'&lt;/code&gt;). Standalone &lt;code&gt;field.lower()&lt;/code&gt; as a predicate will raise an error.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="predicate-style-string-methods-standalone"&gt;Predicate-style string methods (standalone)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;.contains(substring)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;.startswith(prefix)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;.endswith(suffix)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;.regex(pattern, flags)&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home_team.contains(&amp;#39;united&amp;#39;)&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;away_team.startswith(&amp;#39;West&amp;#39;)&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;player.name.regex(&amp;#39;^Mo&amp;#39;)&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3 id="regex-flags"&gt;Regex flags&lt;/h3&gt;
&lt;p&gt;Pass flags from the &lt;code&gt;re&lt;/code&gt; module via a local variable:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;re&lt;/span&gt;

&lt;span class="n"&gt;pattern&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;liverpool&amp;quot;&lt;/span&gt;
&lt;span class="n"&gt;flags&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IGNORECASE&lt;/span&gt;
&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home_team.regex(@pattern, @flags)&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3 id="variables-from-python-var"&gt;Variables from Python (&lt;code&gt;@var&lt;/code&gt;)&lt;/h3&gt;
&lt;p&gt;Inject values from your local scope with &lt;code&gt;@var&lt;/code&gt;:&lt;/p&gt;
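&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;

&lt;span class="n"&gt;team&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Liverpool&amp;quot;&lt;/span&gt;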
&lt;span class="n"&gt;cutoff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2023&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home_team == @team and match_date &amp;gt;= @cutoff&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Variables must exist in the &lt;strong&gt;local&lt;/strong&gt; namespace where you call &lt;code&gt;.query(...)&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
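&lt;p&gt;One common way to implement this kind of lookup in Python is to inspect the calling frame, which also explains why only the caller's local names are visible. The sketch below assumes that approach; &lt;code&gt;resolve_vars&lt;/code&gt; is a hypothetical helper, and &lt;code&gt;penaltyblog&lt;/code&gt; may resolve &lt;code&gt;@var&lt;/code&gt; differently.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import inspect
import re


def resolve_vars(expr):
    """Hypothetical helper: map every @name in expr to its value in the caller's locals."""
    caller_locals = inspect.currentframe().f_back.f_locals
    names = re.findall(r"@(\w+)", expr)
    missing = [name for name in names if name not in caller_locals]
    if missing:
        raise NameError(f"undefined query variable(s): {missing}")
    return {name: caller_locals[name] for name in names}


def demo():
    team = "Liverpool"  # local to demo(), so visible to resolve_vars below
    return resolve_vars("home_team == @team")


print(demo())
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In this sketch a name that is not defined where the call is made raises a &lt;code&gt;NameError&lt;/code&gt; up front, rather than failing later while records are being filtered.&lt;/p&gt;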
&lt;h3 id="date-datetime-literals"&gt;Date &amp;amp; datetime literals&lt;/h3&gt;
&lt;p&gt;Create date/datetime values inside the expression:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;match_date &amp;gt; date(2024, 6, 30)&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;kickoff &amp;gt;= datetime(2025, 8, 11, 19, 45)&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Supported literal constructors: &lt;code&gt;date(Y, M, D)&lt;/code&gt;, &lt;code&gt;datetime(Y, M, D, h, m, s)&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="common-patterns"&gt;Common Patterns&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
&lt;span class="s2"&gt;    league == &amp;#39;NLD Eredivisie&amp;#39;&lt;/span&gt;
&lt;span class="s2"&gt;    and season == &amp;#39;2024-2025&amp;#39;&lt;/span&gt;
&lt;span class="s2"&gt;    and total_goals &amp;gt;= 3&lt;/span&gt;
&lt;span class="s2"&gt;&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# NOTE: arithmetic isn&amp;#39;t supported directly; see tip below&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# Use an assigned field for arithmetic first:&lt;/span&gt;
&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_goals&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;goals_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;goals_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; \
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;total_goals &amp;gt;= 3&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# Matches involving one team since a start date (with a local variable)&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;

&lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2025&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;(team_home == &amp;#39;PSV&amp;#39; or team_away == &amp;#39;PSV&amp;#39;) and match_date &amp;gt;= @start&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# Regex with flags&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;re&lt;/span&gt;

&lt;span class="n"&gt;pattern&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;^[A-Z][a-z]+$&amp;quot;&lt;/span&gt;
&lt;span class="n"&gt;flags&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IGNORECASE&lt;/span&gt;
&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;player.name.regex(@pattern, @flags)&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id="performance-notes"&gt;Performance notes&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Each call compiles the expression once, then applies the predicate efficiently.&lt;/li&gt;
&lt;li&gt;Filtering is fastest when you reduce nested lookups early (e.g., call &lt;code&gt;.flatten()&lt;/code&gt; if appropriate).&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;.assign(...)&lt;/code&gt; to precompute values you’ll filter on (e.g., &lt;code&gt;total_goals&lt;/code&gt;), then query on that key.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="gotchas-limitations"&gt;Gotchas &amp;amp; limitations&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Arithmetic inside queries is not currently supported. Do the math in &lt;code&gt;.assign(...)&lt;/code&gt; or &lt;code&gt;.map(...)&lt;/code&gt; first, then filter on the derived field.&lt;/li&gt;
&lt;li&gt;For &lt;code&gt;in&lt;/code&gt; / &lt;code&gt;not in&lt;/code&gt;, the field must be on the left.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;.lower()&lt;/code&gt; / &lt;code&gt;.upper()&lt;/code&gt; / &lt;code&gt;len()&lt;/code&gt; must be used within a comparison.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;@var&lt;/code&gt; substitution uses the caller’s local variables only.&lt;/li&gt;
&lt;li&gt;Only the following literal functions are currently supported inside queries: &lt;code&gt;date(...)&lt;/code&gt;, &lt;code&gt;datetime(...)&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="quick-reference"&gt;Quick reference&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Comparisons: &lt;code&gt;== != &amp;gt; &amp;gt;= &amp;lt; &amp;lt;=&lt;/code&gt; (chained comparisons allowed)&lt;/li&gt;
&lt;li&gt;Logic: &lt;code&gt;and&lt;/code&gt; · &lt;code&gt;or&lt;/code&gt; · &lt;code&gt;not&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Membership: &lt;code&gt;in&lt;/code&gt; · &lt;code&gt;not in&lt;/code&gt; (field on LHS)&lt;/li&gt;
&lt;li&gt;Nulls: &lt;code&gt;is None&lt;/code&gt; · &lt;code&gt;is not None&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Transforms (in comparisons): &lt;code&gt;len(x)&lt;/code&gt; · &lt;code&gt;.lower()&lt;/code&gt; · &lt;code&gt;.upper()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Predicates: &lt;code&gt;.contains()&lt;/code&gt; · &lt;code&gt;.startswith()&lt;/code&gt; · &lt;code&gt;.endswith()&lt;/code&gt; · &lt;code&gt;.regex()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Literals: &lt;code&gt;date(Y, M, D)&lt;/code&gt; · &lt;code&gt;datetime(Y, M, D, h, m, s)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Variables: &lt;code&gt;@var&lt;/code&gt; (from caller’s locals)&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id="conclusions"&gt;Conclusions&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;v1.5.0&lt;/strong&gt; is all about &lt;strong&gt;speed, clarity, and control&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;From the new &lt;code&gt;Flow.query&lt;/code&gt; DSL for filtering with clean, Pythonic expressions, to the upgraded &lt;code&gt;FootballProbabilityGrid&lt;/code&gt; that’s market-ready out of the box, to the &lt;code&gt;Pitch&lt;/code&gt; API for building rich, interactive visualisations - every change in this release helps you move from &lt;strong&gt;raw data&lt;/strong&gt; to &lt;strong&gt;reliable, actionable insights&lt;/strong&gt; faster than ever.&lt;/p&gt;
&lt;p&gt;Whether you’re exploring match data, pricing markets, or building visuals for analysis, &lt;code&gt;penaltyblog&lt;/code&gt; v1.5.0 gives you the tools to work &lt;strong&gt;smarter and faster&lt;/strong&gt;.&lt;/p&gt;</content><category term="Data"></category><category term="Visualization"></category><category term="MatchFlow"></category></entry><entry><title>How Accurate Are Soccer Odds? A Data Dive into 250 Million Betting Lines</title><link href="2025/07/16/how-accurate-are-soccer-odds/" rel="alternate"></link><published>2025-07-16T19:30:00+00:00</published><updated>2025-07-16T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2025-07-16:2025/07/16/how-accurate-are-soccer-odds/</id><summary type="html">&lt;p&gt;A data-driven deep dive into how accurately bookmakers price global soccer markets...&lt;/p&gt;</summary><content type="html">&lt;h1 id="introduction"&gt;Introduction&lt;/h1&gt;
&lt;p&gt;The global soccer betting market is a high-volume predictive system, where bookmakers generate and update millions of probability estimates in real time. But how accurate are these predictions, and how do different bookmakers stack up against each other?&lt;/p&gt;
&lt;p&gt;To explore these questions, I’ve collected a dataset of over &lt;strong&gt;250 million betting lines&lt;/strong&gt; from &lt;strong&gt;1.25 million soccer fixtures worldwide&lt;/strong&gt;, spanning multiple years, leagues, and market types. This gives us a unique dataset to evaluate bookmaker performance on a large scale.&lt;/p&gt;
&lt;p&gt;For this article, I’ll be narrowing the focus to just the &lt;strong&gt;major European leagues&lt;/strong&gt; and analyzing the &lt;strong&gt;moneyline (1X2) markets&lt;/strong&gt; - the outcome of the match: home win, draw, or away win.&lt;/p&gt;
&lt;p&gt;We'll be digging into:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Overrounds&lt;/strong&gt; and how bookmakers build in margin&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Bookmaker accuracy&lt;/strong&gt; using Ranked Probability Scores (RPS)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Calibration curves&lt;/strong&gt; to test how well odds translate to real-world outcomes&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id="what-is-the-overround"&gt;What Is the Overround?&lt;/h1&gt;
&lt;p&gt;In any betting market, the overround is the bookmaker’s built-in profit margin: the amount by which the implied probabilities of all outcomes, added together, exceed 100%.&lt;/p&gt;
&lt;p&gt;For example, in a simple 3-way soccer market (Home Win / Draw / Away Win), the odds might be:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Home Win: 2.00&lt;/li&gt;
&lt;li&gt;Draw: 3.60&lt;/li&gt;
&lt;li&gt;Away Win: 3.60&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To convert these odds into implied probabilities, we take the inverse:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Home Win: 1 / 2.00 = 0.500&lt;/li&gt;
&lt;li&gt;Draw: 1 / 3.60 ≈ 0.278&lt;/li&gt;
&lt;li&gt;Away Win: 1 / 3.60 ≈ 0.278&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total: 0.500 + 0.278 + 0.278 = 1.056, or 105.6%&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That extra 5.6% is the &lt;strong&gt;overround&lt;/strong&gt;, or the bookmaker’s edge. In a perfectly fair market with no margin, the total implied probabilities would sum to 100%.&lt;/p&gt;
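&lt;p&gt;The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the analysis: the function names are made up for the example, and the proportional normalisation at the end is just one simple way (assumed here) of stripping the margin back out.&lt;/p&gt;

```python
# Decimal odds from the worked example above
odds = {"home": 2.00, "draw": 3.60, "away": 3.60}


def implied_probabilities(decimal_odds):
    """Invert each decimal price to get its implied probability."""
    return {outcome: 1.0 / price for outcome, price in decimal_odds.items()}


def overround(decimal_odds):
    """Amount by which the implied probabilities sum past 1.0."""
    return sum(implied_probabilities(decimal_odds).values()) - 1.0


probs = implied_probabilities(odds)
margin = overround(odds)

# Assumption: remove the margin by scaling every probability equally
fair = {outcome: p / (1.0 + margin) for outcome, p in probs.items()}

print({outcome: round(p, 3) for outcome, p in probs.items()})
print(f"overround: {margin:.1%}")
print({outcome: round(p, 3) for outcome, p in fair.items()})
```

&lt;p&gt;After normalisation the three probabilities sum to 1, which is what you need before scoring them as forecasts. Other margin-removal methods exist and will give slightly different fair probabilities.&lt;/p&gt;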
&lt;h2 id="overrounds-by-competition-and-season"&gt;Overrounds by Competition and Season&lt;/h2&gt;
&lt;p&gt;The table below shows average overrounds for moneyline (1X2) markets across the major European leagues, broken down by season. Several patterns stand out immediately:&lt;/p&gt;
&lt;p&gt;Top-tier leagues like the Premier League, La Liga, Serie A, and the Bundesliga consistently have the lowest overrounds - typically between 4.1% and 5.1%. These markets are the most competitive, liquid, and efficient, so bookmakers tend to apply tighter margins.&lt;/p&gt;
&lt;p&gt;Second divisions and lower leagues, such as League One, Serie B, or La Liga 2, feature significantly higher overrounds, often pushing 7% or more. These markets may be less efficient and attract less volume, giving bookmakers more pricing power.&lt;/p&gt;
&lt;p&gt;Also, there’s been a noticeable upward trend in overrounds over time across most competitions. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The Premier League has crept from 4.12% to 4.65% over five seasons.&lt;/li&gt;
&lt;li&gt;Serie B rose from 6.51% to 7.18%.&lt;/li&gt;
&lt;li&gt;La Liga 2 shows a particularly sharp increase: 6.77% to 7.59%.&lt;/li&gt;
&lt;/ul&gt;
&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Season&lt;/th&gt;
      &lt;th&gt;2020/2021&lt;/th&gt;
      &lt;th&gt;2021/2022&lt;/th&gt;
      &lt;th&gt;2022/2023&lt;/th&gt;
      &lt;th&gt;2023/2024&lt;/th&gt;
      &lt;th&gt;2024/2025&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;Belgium: Jupiler League&lt;/th&gt;
      &lt;td&gt;6.04&lt;/td&gt;
      &lt;td&gt;6.15&lt;/td&gt;
      &lt;td&gt;6.14&lt;/td&gt;
      &lt;td&gt;6.34&lt;/td&gt;
      &lt;td&gt;6.43&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;England: Championship&lt;/th&gt;
      &lt;td&gt;5.06&lt;/td&gt;
      &lt;td&gt;5.15&lt;/td&gt;
      &lt;td&gt;5.43&lt;/td&gt;
      &lt;td&gt;5.32&lt;/td&gt;
      &lt;td&gt;5.63&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;England: League One&lt;/th&gt;
      &lt;td&gt;6.27&lt;/td&gt;
      &lt;td&gt;6.25&lt;/td&gt;
      &lt;td&gt;6.45&lt;/td&gt;
      &lt;td&gt;6.51&lt;/td&gt;
      &lt;td&gt;7.00&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;England: League Two&lt;/th&gt;
      &lt;td&gt;6.32&lt;/td&gt;
      &lt;td&gt;6.35&lt;/td&gt;
      &lt;td&gt;6.65&lt;/td&gt;
      &lt;td&gt;6.61&lt;/td&gt;
      &lt;td&gt;6.94&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;England: Premier League&lt;/th&gt;
      &lt;td&gt;4.12&lt;/td&gt;
      &lt;td&gt;4.21&lt;/td&gt;
      &lt;td&gt;4.31&lt;/td&gt;
      &lt;td&gt;4.36&lt;/td&gt;
      &lt;td&gt;4.65&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;France: Ligue 1&lt;/th&gt;
      &lt;td&gt;5.12&lt;/td&gt;
      &lt;td&gt;5.06&lt;/td&gt;
      &lt;td&gt;5.03&lt;/td&gt;
      &lt;td&gt;5.11&lt;/td&gt;
      &lt;td&gt;5.19&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;France: Ligue 2&lt;/th&gt;
      &lt;td&gt;6.83&lt;/td&gt;
      &lt;td&gt;6.87&lt;/td&gt;
      &lt;td&gt;6.90&lt;/td&gt;
      &lt;td&gt;7.08&lt;/td&gt;
      &lt;td&gt;7.10&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Germany: 2. Bundesliga&lt;/th&gt;
      &lt;td&gt;5.98&lt;/td&gt;
      &lt;td&gt;5.95&lt;/td&gt;
      &lt;td&gt;6.28&lt;/td&gt;
      &lt;td&gt;6.39&lt;/td&gt;
      &lt;td&gt;6.57&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Germany: Bundesliga&lt;/th&gt;
      &lt;td&gt;4.72&lt;/td&gt;
      &lt;td&gt;4.63&lt;/td&gt;
      &lt;td&gt;4.71&lt;/td&gt;
      &lt;td&gt;4.76&lt;/td&gt;
      &lt;td&gt;5.02&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Italy: Serie A&lt;/th&gt;
      &lt;td&gt;4.74&lt;/td&gt;
      &lt;td&gt;4.72&lt;/td&gt;
      &lt;td&gt;4.94&lt;/td&gt;
      &lt;td&gt;5.02&lt;/td&gt;
      &lt;td&gt;5.12&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Italy: Serie B&lt;/th&gt;
      &lt;td&gt;6.51&lt;/td&gt;
      &lt;td&gt;6.50&lt;/td&gt;
      &lt;td&gt;6.55&lt;/td&gt;
      &lt;td&gt;6.76&lt;/td&gt;
      &lt;td&gt;7.18&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Netherlands: Eredivisie&lt;/th&gt;
      &lt;td&gt;5.58&lt;/td&gt;
      &lt;td&gt;5.64&lt;/td&gt;
      &lt;td&gt;5.81&lt;/td&gt;
      &lt;td&gt;5.64&lt;/td&gt;
      &lt;td&gt;5.90&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Portugal: Liga Portugal&lt;/th&gt;
      &lt;td&gt;6.05&lt;/td&gt;
      &lt;td&gt;6.08&lt;/td&gt;
      &lt;td&gt;5.96&lt;/td&gt;
      &lt;td&gt;5.72&lt;/td&gt;
      &lt;td&gt;6.19&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Spain: La Liga&lt;/th&gt;
      &lt;td&gt;4.53&lt;/td&gt;
      &lt;td&gt;4.56&lt;/td&gt;
      &lt;td&gt;4.72&lt;/td&gt;
      &lt;td&gt;4.76&lt;/td&gt;
      &lt;td&gt;4.95&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Spain: La Liga 2&lt;/th&gt;
      &lt;td&gt;6.77&lt;/td&gt;
      &lt;td&gt;6.86&lt;/td&gt;
      &lt;td&gt;7.07&lt;/td&gt;
      &lt;td&gt;7.11&lt;/td&gt;
      &lt;td&gt;7.59&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;em&gt;Table 1: Overrounds by Competition and Season&lt;/em&gt;&lt;/p&gt;
&lt;h2 id="premier-league-overrounds-by-bookmaker"&gt;Premier League Overrounds by Bookmaker&lt;/h2&gt;
&lt;p&gt;Zooming in on the English Premier League, we can see clear differences in how various bookmakers price their moneyline markets. Table 2 below shows average overrounds for each bookmaker over the last five seasons.&lt;/p&gt;
&lt;p&gt;Several key themes emerge:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pinnacle consistently has the lowest overrounds&lt;/strong&gt;, averaging between &lt;strong&gt;2.39% and 3.00%&lt;/strong&gt;. This aligns with Pinnacle’s reputation as a low-margin, high-volume bookmaker that attracts sharper bettors. Their pricing reflects high market efficiency.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;1xBet and Unibet&lt;/strong&gt; also offer relatively tight margins, typically around or below &lt;strong&gt;3%&lt;/strong&gt;, though Unibet shows a notable spike to &lt;strong&gt;5.18%&lt;/strong&gt; in &lt;strong&gt;2024/2025&lt;/strong&gt;, possibly indicating a change in pricing strategy.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Mainstream bookmakers&lt;/strong&gt; like bet365, bwin, and William Hill tend to operate with higher overrounds, often between &lt;strong&gt;5% and 6%&lt;/strong&gt;, with William Hill even reaching &lt;strong&gt;7.32%&lt;/strong&gt; in 2023/2024. These firms serve a broader customer base and tend to target recreational bettors.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;888sport, Betfair, and Betway&lt;/strong&gt; show a broad upward trend in margins over the seasons, though not in every year, suggesting a shift toward more conservative pricing.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;10bet&lt;/strong&gt; jumps dramatically to &lt;strong&gt;6.18%&lt;/strong&gt; in 2024/2025, up from &lt;strong&gt;4.23%&lt;/strong&gt; the season before and a relatively sharp &lt;strong&gt;3.78%&lt;/strong&gt; in 2020/2021, suggesting a drastic repositioning in the market.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When we break down overrounds in the Premier League by bookmaker, a clear divide emerges between two groups:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Low-margin operators&lt;/strong&gt; like Pinnacle and 1xBet, which consistently offer tighter pricing.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Mainstream bookmakers&lt;/strong&gt; such as William Hill, bet365, and bwin, which price with significantly higher margins.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This split reflects two fundamentally different business models:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Mainstream bookmakers&lt;/strong&gt; like William Hill, bet365, and bwin typically operate with overrounds in the &lt;strong&gt;5–7%&lt;/strong&gt; range. These firms primarily cater to recreational bettors - users who are betting more for entertainment than profit. Many of these customers are less likely to shop around for better odds or understand how bookmaker margin erodes long-term returns. As such, these bookmakers can afford to be less competitive on price, and may even benefit from it since tighter pricing can attract sharp bettors they’d rather avoid.&lt;/p&gt;
&lt;p&gt;In contrast, &lt;strong&gt;low-margin operators&lt;/strong&gt; such as Pinnacle or 1xBet attract price-sensitive or professional bettors. Their overrounds typically sit at around &lt;strong&gt;3%&lt;/strong&gt; or below, even in top-tier markets. These firms rely on a &lt;strong&gt;volume-based model&lt;/strong&gt;, welcoming sharp action and managing risk through efficient pricing and faster line movement, rather than blocking or limiting users.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Season&lt;/th&gt;
      &lt;th&gt;2020/2021&lt;/th&gt;
      &lt;th&gt;2021/2022&lt;/th&gt;
      &lt;th&gt;2022/2023&lt;/th&gt;
      &lt;th&gt;2023/2024&lt;/th&gt;
      &lt;th&gt;2024/2025&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;10bet&lt;/th&gt;
      &lt;td&gt;3.78&lt;/td&gt;
      &lt;td&gt;4.49&lt;/td&gt;
      &lt;td&gt;4.01&lt;/td&gt;
      &lt;td&gt;4.23&lt;/td&gt;
      &lt;td&gt;6.18&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1xBet&lt;/th&gt;
      &lt;td&gt;3.10&lt;/td&gt;
      &lt;td&gt;3.08&lt;/td&gt;
      &lt;td&gt;2.81&lt;/td&gt;
      &lt;td&gt;2.95&lt;/td&gt;
      &lt;td&gt;3.03&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;888sport&lt;/th&gt;
      &lt;td&gt;4.19&lt;/td&gt;
      &lt;td&gt;4.69&lt;/td&gt;
      &lt;td&gt;5.56&lt;/td&gt;
      &lt;td&gt;6.03&lt;/td&gt;
      &lt;td&gt;5.81&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Betfair&lt;/th&gt;
      &lt;td&gt;4.03&lt;/td&gt;
      &lt;td&gt;4.47&lt;/td&gt;
      &lt;td&gt;5.38&lt;/td&gt;
      &lt;td&gt;5.68&lt;/td&gt;
      &lt;td&gt;5.87&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Betway&lt;/th&gt;
      &lt;td&gt;5.43&lt;/td&gt;
      &lt;td&gt;5.00&lt;/td&gt;
      &lt;td&gt;5.08&lt;/td&gt;
      &lt;td&gt;5.83&lt;/td&gt;
      &lt;td&gt;6.07&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Pinnacle&lt;/th&gt;
      &lt;td&gt;2.39&lt;/td&gt;
      &lt;td&gt;2.43&lt;/td&gt;
      &lt;td&gt;2.46&lt;/td&gt;
      &lt;td&gt;2.93&lt;/td&gt;
      &lt;td&gt;3.00&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Unibet&lt;/th&gt;
      &lt;td&gt;2.56&lt;/td&gt;
      &lt;td&gt;2.59&lt;/td&gt;
      &lt;td&gt;2.56&lt;/td&gt;
      &lt;td&gt;2.27&lt;/td&gt;
      &lt;td&gt;5.18&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;William Hill&lt;/th&gt;
      &lt;td&gt;5.11&lt;/td&gt;
      &lt;td&gt;6.05&lt;/td&gt;
      &lt;td&gt;6.25&lt;/td&gt;
      &lt;td&gt;7.32&lt;/td&gt;
      &lt;td&gt;5.57&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;bet365&lt;/th&gt;
      &lt;td&gt;5.47&lt;/td&gt;
      &lt;td&gt;5.54&lt;/td&gt;
      &lt;td&gt;5.41&lt;/td&gt;
      &lt;td&gt;5.39&lt;/td&gt;
      &lt;td&gt;5.52&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;bwin&lt;/th&gt;
      &lt;td&gt;5.10&lt;/td&gt;
      &lt;td&gt;5.63&lt;/td&gt;
      &lt;td&gt;5.78&lt;/td&gt;
      &lt;td&gt;5.64&lt;/td&gt;
      &lt;td&gt;5.77&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;em&gt;Table 2: Premier League Overrounds by Bookmaker&lt;/em&gt;&lt;/p&gt;
&lt;h1 id="bookmaker-accuracy-by-league-ranked-probability-scores"&gt;Bookmaker Accuracy by League: Ranked Probability Scores&lt;/h1&gt;
&lt;p&gt;To evaluate how well bookmaker odds align with actual match outcomes, we use the &lt;strong&gt;Ranked Probability Score (RPS)&lt;/strong&gt; - a strictly proper scoring rule that measures the accuracy of probabilistic forecasts across ordered outcomes (like home/draw/away in football). Lower values indicate better predictive performance, with perfect accuracy scoring 0.&lt;/p&gt;
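&lt;p&gt;For a single match, RPS compares the cumulative forecast probabilities against the cumulative observed outcome. The sketch below uses the standard formula for three ordered outcomes; it is illustrative only, not the exact code behind the tables. The example forecast is the earlier 2.00 / 3.60 / 3.60 odds with the margin removed by simple proportional scaling, which is an assumption about preprocessing.&lt;/p&gt;

```python
from itertools import accumulate


def ranked_probability_score(probs, outcome_index):
    """RPS for one match; probs are ordered (home, draw, away)."""
    observed = [1.0 if i == outcome_index else 0.0 for i in range(len(probs))]
    cum_probs = list(accumulate(probs))
    cum_obs = list(accumulate(observed))
    # The final cumulative term is always zero, so it is left out
    squared = [(p - o) ** 2 for p, o in zip(cum_probs[:-1], cum_obs[:-1])]
    return sum(squared) / (len(probs) - 1)


# Margin-free probabilities for home win, draw, away win
forecast = [0.474, 0.263, 0.263]

home_win = ranked_probability_score(forecast, 0)
draw = ranked_probability_score(forecast, 1)
away_win = ranked_probability_score(forecast, 2)

print(round(home_win, 3), round(draw, 3), round(away_win, 3))
```

&lt;p&gt;Because the outcomes are ordered, an away win is penalised more heavily than a draw when the forecast leaned toward the home side. A season-level figure is then simply the mean of these per-match scores.&lt;/p&gt;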
&lt;p&gt;Table 3 below shows average RPS values for moneyline markets across major European leagues, broken down by season.&lt;/p&gt;
&lt;p&gt;Several key observations stand out:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Top-tier leagues tend to have lower RPS values, indicating more accurate and better-calibrated odds. For example:&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;Netherlands' Eredivisie, Portugal’s Liga Portugal, and Italy’s Serie A&lt;/strong&gt; consistently sit at the top of the accuracy table, with RPS values as low as 0.175–0.185.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;Premier League, La Liga, and Bundesliga&lt;/strong&gt; also show strong performance, generally ranging between 0.18 and 0.20.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;In contrast, &lt;strong&gt;lower-tier leagues&lt;/strong&gt; such as &lt;strong&gt;England’s League One/Two, Serie B, and La Liga 2&lt;/strong&gt; tend to produce higher RPS values, mostly between 0.20 and 0.22, suggesting greater uncertainty and less precise odds.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Accuracy varies season-to-season&lt;/strong&gt; within leagues, but not that wildly, suggesting relative stability in how bookmakers price different competitions. Some leagues (like France's Ligue 1) show very consistent RPS values year over year.&lt;/li&gt;
&lt;/ul&gt;
&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Season&lt;/th&gt;
      &lt;th&gt;2020/2021&lt;/th&gt;
      &lt;th&gt;2021/2022&lt;/th&gt;
      &lt;th&gt;2022/2023&lt;/th&gt;
      &lt;th&gt;2023/2024&lt;/th&gt;
      &lt;th&gt;2024/2025&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;Belgium: Jupiler League&lt;/th&gt;
      &lt;td&gt;0.223&lt;/td&gt;
      &lt;td&gt;0.196&lt;/td&gt;
      &lt;td&gt;0.201&lt;/td&gt;
      &lt;td&gt;0.195&lt;/td&gt;
      &lt;td&gt;0.196&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;England: Championship&lt;/th&gt;
      &lt;td&gt;0.219&lt;/td&gt;
      &lt;td&gt;0.217&lt;/td&gt;
      &lt;td&gt;0.221&lt;/td&gt;
      &lt;td&gt;0.212&lt;/td&gt;
      &lt;td&gt;0.207&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;England: League One&lt;/th&gt;
      &lt;td&gt;0.220&lt;/td&gt;
      &lt;td&gt;0.200&lt;/td&gt;
      &lt;td&gt;0.207&lt;/td&gt;
      &lt;td&gt;0.216&lt;/td&gt;
      &lt;td&gt;0.209&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;England: League Two&lt;/th&gt;
      &lt;td&gt;0.223&lt;/td&gt;
      &lt;td&gt;0.214&lt;/td&gt;
      &lt;td&gt;0.214&lt;/td&gt;
      &lt;td&gt;0.224&lt;/td&gt;
      &lt;td&gt;0.219&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;England: Premier League&lt;/th&gt;
      &lt;td&gt;0.211&lt;/td&gt;
      &lt;td&gt;0.191&lt;/td&gt;
      &lt;td&gt;0.198&lt;/td&gt;
      &lt;td&gt;0.180&lt;/td&gt;
      &lt;td&gt;0.193&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;France: Ligue 1&lt;/th&gt;
      &lt;td&gt;0.204&lt;/td&gt;
      &lt;td&gt;0.201&lt;/td&gt;
      &lt;td&gt;0.198&lt;/td&gt;
      &lt;td&gt;0.206&lt;/td&gt;
      &lt;td&gt;0.203&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;France: Ligue 2&lt;/th&gt;
      &lt;td&gt;0.204&lt;/td&gt;
      &lt;td&gt;0.209&lt;/td&gt;
      &lt;td&gt;0.208&lt;/td&gt;
      &lt;td&gt;0.221&lt;/td&gt;
      &lt;td&gt;0.211&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Germany: 2. Bundesliga&lt;/th&gt;
      &lt;td&gt;0.220&lt;/td&gt;
      &lt;td&gt;0.217&lt;/td&gt;
      &lt;td&gt;0.213&lt;/td&gt;
      &lt;td&gt;0.220&lt;/td&gt;
      &lt;td&gt;0.219&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Germany: Bundesliga&lt;/th&gt;
      &lt;td&gt;0.198&lt;/td&gt;
      &lt;td&gt;0.202&lt;/td&gt;
      &lt;td&gt;0.202&lt;/td&gt;
      &lt;td&gt;0.186&lt;/td&gt;
      &lt;td&gt;0.202&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Italy: Serie A&lt;/th&gt;
      &lt;td&gt;0.185&lt;/td&gt;
      &lt;td&gt;0.193&lt;/td&gt;
      &lt;td&gt;0.197&lt;/td&gt;
      &lt;td&gt;0.184&lt;/td&gt;
      &lt;td&gt;0.184&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Italy: Serie B&lt;/th&gt;
      &lt;td&gt;0.206&lt;/td&gt;
      &lt;td&gt;0.204&lt;/td&gt;
      &lt;td&gt;0.211&lt;/td&gt;
      &lt;td&gt;0.209&lt;/td&gt;
      &lt;td&gt;0.204&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Netherlands: Eredivisie&lt;/th&gt;
      &lt;td&gt;0.183&lt;/td&gt;
      &lt;td&gt;0.186&lt;/td&gt;
      &lt;td&gt;0.184&lt;/td&gt;
      &lt;td&gt;0.176&lt;/td&gt;
      &lt;td&gt;0.185&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Portugal: Liga Portugal&lt;/th&gt;
      &lt;td&gt;0.188&lt;/td&gt;
      &lt;td&gt;0.175&lt;/td&gt;
      &lt;td&gt;0.177&lt;/td&gt;
      &lt;td&gt;0.179&lt;/td&gt;
      &lt;td&gt;0.182&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Spain: La Liga&lt;/th&gt;
      &lt;td&gt;0.190&lt;/td&gt;
      &lt;td&gt;0.195&lt;/td&gt;
      &lt;td&gt;0.202&lt;/td&gt;
      &lt;td&gt;0.182&lt;/td&gt;
      &lt;td&gt;0.190&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Spain: La Liga 2&lt;/th&gt;
      &lt;td&gt;0.214&lt;/td&gt;
      &lt;td&gt;0.206&lt;/td&gt;
      &lt;td&gt;0.201&lt;/td&gt;
      &lt;td&gt;0.210&lt;/td&gt;
      &lt;td&gt;0.202&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;em&gt;Table 3: Bookmaker Accuracy by League: Ranked Probability Scores&lt;/em&gt;&lt;/p&gt;
&lt;h1 id="accuracy-by-bookmaker-rps-in-the-premier-league"&gt;Accuracy by Bookmaker: RPS in the Premier League&lt;/h1&gt;
&lt;p&gt;Having seen how bookmaker margins differ, it’s natural to ask: &lt;strong&gt;Do tighter margins translate to more accurate odds?&lt;/strong&gt; One way to answer that is by comparing &lt;strong&gt;Ranked Probability Scores (RPS)&lt;/strong&gt; across bookmakers in the Premier League.&lt;/p&gt;
&lt;p&gt;At first glance, the differences in RPS are &lt;strong&gt;remarkably narrow&lt;/strong&gt;: within any given season the bookmakers sit within roughly &lt;strong&gt;0.01&lt;/strong&gt; of each other, with values ranging from about &lt;strong&gt;0.18 to 0.21&lt;/strong&gt; across seasons. That’s perhaps expected in a highly liquid, competitive market like the Premier League, where bookmakers converge on efficient prices.&lt;/p&gt;
&lt;p&gt;Still, some patterns emerge:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;No clear link between margin and accuracy:&lt;/strong&gt; Despite having the &lt;strong&gt;lowest overrounds&lt;/strong&gt;, Pinnacle does not clearly outperform higher-margin books like William Hill or bet365 in terms of RPS. In fact, William Hill occasionally posts the lowest (i.e., best) RPS, such as 0.188 in 2021/2022.
This supports the idea that bookmaker &lt;strong&gt;margin and forecast accuracy are not tightly correlated&lt;/strong&gt;. Higher-margin books may simply be padding prices, not offering worse forecasts.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;1xBet, Betway, and Pinnacle&lt;/strong&gt; show near-identical performance, suggesting they may be operating with similar models or reacting similarly to market movement.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So while Pinnacle leads on pricing (low overround), and books like William Hill apply large margins, &lt;strong&gt;there’s no strong evidence here that sharp-friendly books are forecasting significantly better&lt;/strong&gt;, at least not when measured by RPS alone in the Premier League.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Season&lt;/th&gt;
      &lt;th&gt;2020/2021&lt;/th&gt;
      &lt;th&gt;2021/2022&lt;/th&gt;
      &lt;th&gt;2022/2023&lt;/th&gt;
      &lt;th&gt;2023/2024&lt;/th&gt;
      &lt;th&gt;2024/2025&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;1xBet&lt;/th&gt;
      &lt;td&gt;0.211&lt;/td&gt;
      &lt;td&gt;0.191&lt;/td&gt;
      &lt;td&gt;0.197&lt;/td&gt;
      &lt;td&gt;0.180&lt;/td&gt;
      &lt;td&gt;0.193&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;888sport&lt;/th&gt;
      &lt;td&gt;0.212&lt;/td&gt;
      &lt;td&gt;0.199&lt;/td&gt;
      &lt;td&gt;0.197&lt;/td&gt;
      &lt;td&gt;0.180&lt;/td&gt;
      &lt;td&gt;0.187&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Betway&lt;/th&gt;
      &lt;td&gt;0.211&lt;/td&gt;
      &lt;td&gt;0.191&lt;/td&gt;
      &lt;td&gt;0.197&lt;/td&gt;
      &lt;td&gt;0.181&lt;/td&gt;
      &lt;td&gt;0.193&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Pinnacle&lt;/th&gt;
      &lt;td&gt;0.211&lt;/td&gt;
      &lt;td&gt;0.192&lt;/td&gt;
      &lt;td&gt;0.197&lt;/td&gt;
      &lt;td&gt;0.180&lt;/td&gt;
      &lt;td&gt;0.192&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Unibet&lt;/th&gt;
      &lt;td&gt;0.212&lt;/td&gt;
      &lt;td&gt;0.191&lt;/td&gt;
      &lt;td&gt;0.198&lt;/td&gt;
      &lt;td&gt;0.179&lt;/td&gt;
      &lt;td&gt;0.194&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;William Hill&lt;/th&gt;
      &lt;td&gt;0.211&lt;/td&gt;
      &lt;td&gt;0.188&lt;/td&gt;
      &lt;td&gt;0.198&lt;/td&gt;
      &lt;td&gt;0.181&lt;/td&gt;
      &lt;td&gt;0.193&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;bet365&lt;/th&gt;
      &lt;td&gt;0.212&lt;/td&gt;
      &lt;td&gt;0.189&lt;/td&gt;
      &lt;td&gt;0.196&lt;/td&gt;
      &lt;td&gt;0.180&lt;/td&gt;
      &lt;td&gt;0.193&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;bwin&lt;/th&gt;
      &lt;td&gt;0.212&lt;/td&gt;
      &lt;td&gt;0.194&lt;/td&gt;
      &lt;td&gt;0.197&lt;/td&gt;
      &lt;td&gt;0.181&lt;/td&gt;
      &lt;td&gt;0.193&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;em&gt;Table 4: Accuracy by Bookmaker: RPS in the Premier League&lt;/em&gt;&lt;/p&gt;
&lt;h1 id="are-bookmakers-well-calibrated"&gt;Are Bookmakers Well-Calibrated?&lt;/h1&gt;
&lt;p&gt;A final test of accuracy is &lt;strong&gt;calibration&lt;/strong&gt;: how well do predicted probabilities match observed outcomes? If a bookmaker assigns a 70% probability to an outcome, that outcome should occur roughly 70% of the time for the model to be considered well-calibrated.&lt;/p&gt;
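&lt;p&gt;A calibration check boils down to grouping forecasts into probability bins and comparing the average predicted probability in each bin with how often the outcome actually happened. The sketch below shows the idea on a handful of made-up home-win forecasts; the equal-width binning and the toy data are assumptions for illustration, not the exact procedure or data used for the plot.&lt;/p&gt;

```python
from collections import defaultdict


def calibration_table(predictions, outcomes, n_bins=10):
    """Group forecasts into equal-width bins: predicted vs observed."""
    bins = defaultdict(list)
    for p, hit in zip(predictions, outcomes):
        # Clamp so that a probability of exactly 1.0 lands in the last bin
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, hit))
    rows = []
    for index in sorted(bins):
        pairs = bins[index]
        mean_predicted = sum(p for p, _ in pairs) / len(pairs)
        observed_rate = sum(hit for _, hit in pairs) / len(pairs)
        rows.append((mean_predicted, observed_rate, len(pairs)))
    return rows


# Hypothetical home-win probabilities and whether the home side won (1) or not (0)
predictions = [0.72, 0.74, 0.75, 0.71, 0.31, 0.35, 0.38, 0.33]
outcomes = [1, 1, 0, 1, 0, 1, 0, 0]

rows = calibration_table(predictions, outcomes)
for mean_predicted, observed_rate, count in rows:
    print(f"predicted {mean_predicted:.2f}  observed {observed_rate:.2f}  n={count}")
```

&lt;p&gt;Plotting the first column against the second gives one calibration curve; points near the diagonal mean the forecasts are well calibrated. Bins with few matches are noisy, which is why the extremes of such a curve tend to wobble.&lt;/p&gt;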
&lt;p&gt;The plot below shows &lt;strong&gt;calibration curves&lt;/strong&gt; for bookmaker probabilities across &lt;strong&gt;home wins, draws, and away wins&lt;/strong&gt;. The dashed black line represents &lt;strong&gt;perfect calibration&lt;/strong&gt; where predicted probability equals observed frequency.&lt;/p&gt;
&lt;p&gt;The results are impressive:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;All three outcome types - &lt;strong&gt;Home Win, Draw, and Away Win&lt;/strong&gt; - track the diagonal line closely, indicating that bookmakers’ odds are &lt;strong&gt;well-calibrated overall&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;There's &lt;strong&gt;no major systematic bias:&lt;/strong&gt; bookmakers aren't consistently under- or overestimating the likelihood of any specific outcome across probability ranges.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deviations are small and mostly in the extremes&lt;/strong&gt;, where data can be noisier due to fewer data points.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Taken together with the low RPS scores in previous sections, this reinforces a key conclusion:&lt;/p&gt;
&lt;p&gt;Bookmakers aren’t just padding prices; their forecasts are statistically sound and well-aligned with real-world outcomes, especially in major markets like the Premier League.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20250715_calibration_curve.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Plot 1: Calibration Curves for Bookmaker Probabilities&lt;/em&gt;&lt;/p&gt;
&lt;h1 id="beating-the-bookmaker-in-obscure-leagues"&gt;Beating the Bookmaker in Obscure Leagues?&lt;/h1&gt;
&lt;p&gt;A common belief among many bettors is that focusing on &lt;strong&gt;lower-profile leagues&lt;/strong&gt;, where information is scarce, offers an edge. The logic is simple: if you know more than the bookmaker, you can find value. However, the data tells a more cautionary story.&lt;/p&gt;
&lt;p&gt;Let's take the &lt;strong&gt;Seychelles and Mongolian Premier Leagues&lt;/strong&gt; as examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Seychelles Premier League: Overround = 9.76%&lt;/li&gt;
&lt;li&gt;Mongolia Premier League: Overround = 12.03%&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These are massive compared to the &lt;strong&gt;4-5% typical in top European leagues&lt;/strong&gt;. But high margins alone don’t mean sloppy forecasts. In fact, the &lt;strong&gt;RPS scores tell a different story:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Seychelles RPS = 0.230&lt;/li&gt;
&lt;li&gt;Mongolia RPS = 0.192&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Mongolia, in particular, is priced &lt;strong&gt;as accurately as many major European leagues&lt;/strong&gt; despite the bookmaker charging a hefty margin. Even Seychelles, with a worse RPS, sits only just above the range seen in second-tier European competitions, where the highest values in Table 3 are around 0.22.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;So while obscure leagues may feel like an edge, the data suggests bookmakers are doing a surprisingly good job and charging extra for the uncertainty.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If anything, the takeaway is this: &lt;strong&gt;the more obscure the league, the higher the margin - not necessarily the worse the forecast&lt;/strong&gt;. Bookmakers are aware of their informational disadvantage and &lt;strong&gt;price accordingly to protect themselves&lt;/strong&gt;. You’re not beating the market by knowing more about Tuv Azarganuud's starting XI, but you are paying a premium for the privilege of trying.&lt;/p&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Across millions of betting lines, one thing is clear: &lt;strong&gt;bookmakers, especially in major markets, are very good at what they do&lt;/strong&gt;. Their odds are well-calibrated, their forecasts consistently accurate, and their margins finely tuned to the level of competition and market liquidity. Even in obscure leagues, they’re cautious but not careless.&lt;/p&gt;
&lt;p&gt;But that doesn’t mean the market is unbeatable. If anything, it shows that finding an edge requires more than intuition - &lt;strong&gt;it demands scale, rigor, and nuance&lt;/strong&gt;. That’s exactly what a data-driven approach is designed to uncover. &lt;/p&gt;
&lt;p&gt;Whether you're looking to evaluate specific bookmakers, spot outliers, or understand where pricing deviates from probability, I’ll be continuing to dig into this dataset and sharing anything interesting I find.&lt;/p&gt;
&lt;p&gt;For those looking for signal in a noisy market, I hope this is an interesting start.&lt;/p&gt;</content><category term="Data"></category><category term="Betting"></category><category term="Data"></category></entry><entry><title>MatchFlow 1.4.0: Optimizing, Visualizing, and Validating your Data Pipelines</title><link href="2025/06/10/matchflow-1.4.0/" rel="alternate"></link><published>2025-06-10T19:30:00+00:00</published><updated>2025-06-10T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2025-06-10:2025/06/10/matchflow-1.4.0/</id><summary type="html">&lt;p&gt;MatchFlow just got smarter, friendlier, and more powerful for optimizing your pipelines, visualizing your data flow, and keeping your data clean...&lt;/p&gt;</summary><content type="html">&lt;h1 id="introduction"&gt;Introduction&lt;/h1&gt;
&lt;p&gt;Over the past few weeks, I've been diving into the world of query plan optimization. It's a fascinating (but complex) rabbit hole that has led to several new features in the upcoming MatchFlow release. Here's what's new, why I built it, and how it can help you.&lt;/p&gt;
&lt;h2 id="query-plan-optimization-optional-but-awesome"&gt;Query Plan Optimization (optional, but awesome!)&lt;/h2&gt;
&lt;p&gt;As data pipelines become more complex, I realized MatchFlow could benefit from smarter optimization. Enter the internal &lt;code&gt;FlowOptimizer&lt;/code&gt; class. This is an optional feature you can enable to automatically rewrite your pipeline plans, pushing filters and limits closer to data sources, fusing steps, and generally improving efficiency.&lt;/p&gt;
&lt;p&gt;You can opt in by passing &lt;code&gt;optimize=True&lt;/code&gt; when you create a flow:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog.matchflow&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flow&lt;/span&gt;

&lt;span class="n"&gt;flow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_jsonl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;events.jsonl&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;optimize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Why make it optional? Correctness and safety are my top priorities. Since this is the initial release of the optimizer, I'm making it opt-in while it gets thoroughly battle-tested. &lt;/p&gt;
&lt;p&gt;Once I'm confident it handles edge cases correctly, I'll make it the default. For now, enabling it takes a single argument, and the choice remains yours.&lt;/p&gt;
&lt;p&gt;Importantly, the optimizer is also transparent and explainable. Use &lt;code&gt;.explain(compare=True)&lt;/code&gt; to see exactly what optimizations have taken place.&lt;/p&gt;
&lt;h2 id="improved-plan-visualization-with-plot_plan"&gt;Improved Plan Visualization with &lt;code&gt;.plot_plan&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;Understanding exactly what your pipeline does is crucial. With &lt;code&gt;Flow.plot_plan&lt;/code&gt;, you can now generate visual representations of your pipelines. Quickly visualize steps, identify bottlenecks, and simplify debugging. You can also set &lt;code&gt;compare=True&lt;/code&gt; to compare the plan before and after the optimizer has potentially rewritten it.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog.matchflow&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flow&lt;/span&gt;


&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;A&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;score&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;B&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;score&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;A&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;score&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;B&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;score&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;flow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_records&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;optimize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;score&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;score&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot_plan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;compare&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

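&lt;p&gt;&lt;img alt="Flow plan before and after optimization, with the limit step moved closer to the data source" src="../../../../images/flow_limit_plan.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 1: Flow plan before and after the optimizer pushes the limit toward the data source&lt;/em&gt;&lt;/p&gt;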
&lt;p&gt;You can see from the example above how the optimizer has pushed the &lt;code&gt;limit&lt;/code&gt; step closer to the data source, reducing the amount of data that needs to be processed. This can lead to significant performance improvements, especially for larger datasets.&lt;/p&gt;
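&lt;p&gt;To see why moving a limit matters, here is a minimal plain-Python sketch. It is not MatchFlow's implementation: the &lt;code&gt;double_score&lt;/code&gt; helper and the call counter are illustrative assumptions, and the comparison assumes an eager pipeline where the transform runs over every record before the limit is applied. The rewrite is safe here because the transform emits exactly one record per input record.&lt;/p&gt;

```python
from itertools import islice

records = [
    {"team": "A", "score": 1},
    {"team": "B", "score": 2},
    {"team": "A", "score": 3},
    {"team": "B", "score": 4},
]

# Count how many times the transform runs in each plan.
calls = {"naive": 0, "pushed": 0}


def double_score(record, plan):
    """Hypothetical per-record transform, mirroring the assign step above."""
    calls[plan] += 1
    return {**record, "x": record["score"] * 2}


# Limit applied last: every record is transformed, then two are kept.
naive = [double_score(r, "naive") for r in records][:2]

# Limit pushed ahead of the transform: only two records are transformed.
pushed = [double_score(r, "pushed") for r in islice(records, 2)]

print(naive == pushed)
print(calls)
```

&lt;p&gt;Both plans return the same two records, but the second one runs the transform half as often on this tiny dataset. The gap grows with the size of the source.&lt;/p&gt;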
&lt;p&gt;Let's look at another example:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog.matchflow&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flow&lt;/span&gt;


&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;A&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;score&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;B&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;score&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;A&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;score&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;B&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;score&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;flow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_records&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;optimize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;A&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;score&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot_plan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;compare&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

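&lt;p&gt;&lt;img alt="Flow plan before and after optimization, with the two filter steps fused into one" src="../../../../images/flow_filter_plan.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 2: Flow plan before and after the optimizer fuses the two filters&lt;/em&gt;&lt;/p&gt;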
&lt;p&gt;This time the optimizer has recognized that the two consecutive filters can be combined and has fused them into a single filter step. Each record then passes through one step rather than two, which reduces per-record overhead.&lt;/p&gt;
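&lt;p&gt;Filter fusion can be sketched in a few lines of plain Python. The &lt;code&gt;fuse&lt;/code&gt; helper below is a hypothetical illustration of the idea, not the optimizer's actual code: it combines several predicates into one that passes a record only if every predicate passes.&lt;/p&gt;

```python
records = [
    {"team": "A", "score": 1},
    {"team": "B", "score": 2},
    {"team": "A", "score": 3},
    {"team": "B", "score": 4},
]


def is_team_a(record):
    return record["team"] == "A"


def has_high_score(record):
    return record["score"] > 2


def fuse(*predicates):
    """Return one predicate that is true only when all predicates are true."""

    def fused(record):
        # all() stops at the first failing predicate, so evaluation order
        # matches running the filters one after another.
        return all(p(record) for p in predicates)

    return fused


# Two separate filter steps.
two_steps = list(filter(has_high_score, filter(is_team_a, records)))

# One fused filter step.
one_step = list(filter(fuse(is_team_a, has_high_score), records))

print(two_steps == one_step)
print(one_step)
```

&lt;p&gt;The output is identical either way; the fused version simply has fewer steps to schedule and run.&lt;/p&gt;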
&lt;h2 id="rolling-summaries-for-groups"&gt;Rolling Summaries for Groups&lt;/h2&gt;
&lt;p&gt;Sports data often needs rolling summaries such as moving averages or cumulative sums. MatchFlow now supports this natively using a sliding window:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog.matchflow&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;where_equals&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;datetime&lt;/span&gt;

&lt;span class="n"&gt;flow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;statsbomb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16023&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;where_equals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;type.name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Shot&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;minutes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;minute&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; 
            &lt;span class="n"&gt;seconds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;second&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;group_by&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team.name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort_by&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;timestamp&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rolling_summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
         &lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;5m&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                                  
         &lt;span class="n"&gt;aggregators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s2"&gt;&amp;quot;xg&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;sum&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;shot.statsbomb_xg&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="s2"&gt;&amp;quot;shot_count&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;count&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;shot.statsbomb_xg&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
         &lt;span class="p"&gt;},&lt;/span&gt;
         &lt;span class="n"&gt;time_field&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;timestamp&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                                                          
      &lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team.name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;timestamp&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;xg&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;shot_count&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3 id="what-it-does"&gt;What it does&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Groups your stream by one or more keys (e.g. team).&lt;/li&gt;
&lt;li&gt;Expects each group to already be sorted by timestamp; it does not sort for you.&lt;/li&gt;
&lt;li&gt;Slides a fixed window over each group: either a duration (5m here) or a number of records.&lt;/li&gt;
&lt;li&gt;Computes aggregations (moving average, running sum, count, etc.) within each window.&lt;/li&gt;
&lt;/ul&gt;
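&lt;p&gt;The steps above can be sketched for a single, pre-sorted group in plain Python. This is an illustration rather than MatchFlow's implementation: the &lt;code&gt;rolling_sum_count&lt;/code&gt; helper, the sample shots, and the decision to treat the window as the trailing interval that excludes its oldest edge are all assumptions.&lt;/p&gt;

```python
from collections import deque
from datetime import timedelta


def rolling_sum_count(rows, window, time_field, value_field):
    """Yield (time, sum, count) over a trailing window of pre-sorted rows."""
    buffer = deque()
    total = 0.0
    for row in rows:
        now = row[time_field]
        buffer.append(row)
        total += row[value_field]
        # Evict rows that are a full window (or more) older than the current row.
        while now - buffer[0][time_field] >= window:
            total -= buffer.popleft()[value_field]
        yield now, round(total, 6), len(buffer)


# Hypothetical shots for one team, already sorted by timestamp.
shots = [
    {"timestamp": timedelta(minutes=1), "xg": 0.1},
    {"timestamp": timedelta(minutes=3), "xg": 0.3},
    {"timestamp": timedelta(minutes=7), "xg": 0.2},
    {"timestamp": timedelta(minutes=8), "xg": 0.5},
]

results = list(rolling_sum_count(shots, timedelta(minutes=5), "timestamp", "xg"))
for when, xg, shot_count in results:
    print(when, xg, shot_count)
```

&lt;p&gt;Because rows are only ever evicted from the front of the buffer, the sketch silently produces wrong totals if the input is not sorted, which is exactly the failure mode described in the next section.&lt;/p&gt;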
&lt;h3 id="why-pre-sorting-matters"&gt;Why pre-sorting matters&lt;/h3&gt;
&lt;p&gt;To guarantee correct sliding-window results, you &lt;strong&gt;must&lt;/strong&gt; call &lt;code&gt;.sort_by()&lt;/code&gt; before &lt;code&gt;.rolling_summary()&lt;/code&gt;. If records arrive at the window out of order, you risk skewed or invalid metrics.&lt;/p&gt;
&lt;h3 id="automatic-safety-checks"&gt;Automatic safety checks&lt;/h3&gt;
&lt;p&gt;I debated with myself a lot about auto-sorting inside &lt;code&gt;.rolling_summary()&lt;/code&gt;, but eventually decided against it because it would:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Add overhead when data is already sorted&lt;/li&gt;
&lt;li&gt;Hide errors when upstream sources emit out-of-order events&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Instead, I settled on MatchFlow’s new optimizer emitting a warning whenever it detects that &lt;code&gt;.rolling_summary()&lt;/code&gt; follows a stream that has not been sorted. This reminds you to verify the order without adding sorting overhead to pipelines whose data is already sorted.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Flow plan for the rolling summary pipeline before and after optimization" src="../../../../images/flow_rolling_summary_plan.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 3: Rolling summary flow plan before and after optimizer&lt;/em&gt;&lt;/p&gt;
&lt;h2 id="time-buckets"&gt;Time Buckets&lt;/h2&gt;
&lt;p&gt;If you’d rather have one summary per uniform time interval, say total xG every 5 minutes, you can use &lt;code&gt;.time_bucket()&lt;/code&gt;: &lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog.matchflow&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;where_equals&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;datetime&lt;/span&gt;


&lt;span class="n"&gt;flow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;statsbomb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16023&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;where_equals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;type.name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Shot&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;minutes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;minute&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; 
            &lt;span class="n"&gt;seconds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;second&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;group_by&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team.name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;time_bucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
         &lt;span class="n"&gt;freq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;5m&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                                 
         &lt;span class="n"&gt;aggregators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s2"&gt;&amp;quot;xg&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;sum&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;shot.statsbomb_xg&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="s2"&gt;&amp;quot;shot_count&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;count&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;shot.statsbomb_xg&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
         &lt;span class="p"&gt;},&lt;/span&gt;
         &lt;span class="n"&gt;time_field&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;timestamp&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                       
         &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;left&amp;quot;&lt;/span&gt;                                
      &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3 id="what-it-does_1"&gt;What it does&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Non-overlapping bins: divides time into [0–5), [5–10), … minute windows.&lt;/li&gt;
&lt;li&gt;Single row per bucket: each group emits exactly one record per interval, with your chosen aggregations.&lt;/li&gt;
&lt;li&gt;Flexible labeling: choose &lt;code&gt;label="left"&lt;/code&gt; (interval start) or &lt;code&gt;label="right"&lt;/code&gt; (interval end), and name that column with &lt;code&gt;bucket_name&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
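&lt;p&gt;As a rough sketch of the bucketing idea for a single group, the plain-Python helper below assigns each row to a non-overlapping interval using floor division on &lt;code&gt;timedelta&lt;/code&gt; values. The name &lt;code&gt;time_bucket_sum_count&lt;/code&gt; and the sample shots are assumptions, and unlike the description above this sketch only emits buckets that contain at least one row.&lt;/p&gt;

```python
from collections import defaultdict
from datetime import timedelta


def time_bucket_sum_count(rows, freq, time_field, value_field, label="left"):
    """Return sorted (bucket_label, sum, count) tuples for fixed-width buckets."""
    buckets = defaultdict(lambda: [0.0, 0])
    for row in rows:
        # timedelta // timedelta gives the integer index of the bucket.
        index = row[time_field] // freq
        start = index * freq
        key = start if label == "left" else start + freq
        buckets[key][0] += row[value_field]
        buckets[key][1] += 1
    return [
        (key, round(total, 6), count)
        for key, (total, count) in sorted(buckets.items())
    ]


# Hypothetical shots for one team.
shots = [
    {"timestamp": timedelta(minutes=1), "xg": 0.1},
    {"timestamp": timedelta(minutes=3), "xg": 0.3},
    {"timestamp": timedelta(minutes=7), "xg": 0.2},
    {"timestamp": timedelta(minutes=8), "xg": 0.5},
]

left = time_bucket_sum_count(shots, timedelta(minutes=5), "timestamp", "xg")
right = time_bucket_sum_count(
    shots, timedelta(minutes=5), "timestamp", "xg", label="right"
)
print(left)
print(right)
```

&lt;p&gt;Note that the input order does not matter here, since each row is placed by its own timestamp rather than by its position in the stream.&lt;/p&gt;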
&lt;h3 id="quick-tips"&gt;Quick Tips&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Always &lt;code&gt;.assign()&lt;/code&gt; your timestamp field as a proper &lt;code&gt;datetime&lt;/code&gt; or &lt;code&gt;timedelta&lt;/code&gt; when using time-based windows.&lt;/li&gt;
&lt;li&gt;Don't forget to use &lt;code&gt;.sort_by()&lt;/code&gt; before &lt;code&gt;.rolling_summary()&lt;/code&gt; to ensure correct results.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;.time_bucket()&lt;/code&gt; when you need uniform reporting intervals; use &lt;code&gt;.rolling_summary()&lt;/code&gt; for sliding-window analytics.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="schema-validation"&gt;Schema Validation&lt;/h2&gt;
&lt;p&gt;Messy data can be frustrating, so I've also introduced &lt;code&gt;.with_schema()&lt;/code&gt; to help catch data issues early:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dt_str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strptime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dt_str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;%Y-%m-&lt;/span&gt;&lt;span class="si"&gt;%d&lt;/span&gt;&lt;span class="s2"&gt; %H:%M:%S&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;with_schema&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team.name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;score&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;timestamp&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;parse_datetime&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;strict&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;drop_extra&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can set &lt;code&gt;strict=True&lt;/code&gt; to raise an error if any data doesn't match the schema, and &lt;code&gt;drop_extra=True&lt;/code&gt; to drop any fields that are not defined in the schema.&lt;/p&gt;
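&lt;p&gt;One plausible way to picture this is a per-record function that runs each schema callable over its field. The sketch below is an assumption about the general technique, not MatchFlow's code: &lt;code&gt;apply_schema&lt;/code&gt; is a hypothetical helper, it works on flat keys only, and setting failed fields to &lt;code&gt;None&lt;/code&gt; when &lt;code&gt;strict=False&lt;/code&gt; is my own choice for illustration.&lt;/p&gt;

```python
from datetime import datetime


def parse_datetime(dt_str):
    return datetime.strptime(dt_str, "%Y-%m-%d %H:%M:%S")


def apply_schema(record, schema, strict=True, drop_extra=False):
    """Cast each schema field with its callable (hypothetical helper)."""
    # Start empty when dropping extras, otherwise keep every original field.
    out = {} if drop_extra else dict(record)
    for field, caster in schema.items():
        try:
            out[field] = caster(record[field])
        except (KeyError, TypeError, ValueError) as exc:
            if strict:
                raise ValueError(f"field {field!r} failed schema check") from exc
            # Assumed non-strict behaviour: keep the row, blank the field.
            out[field] = None
    return out


schema = {"team": str, "score": int, "timestamp": parse_datetime}
row = {
    "team": "A",
    "score": "3",
    "timestamp": "2025-06-10 19:30:00",
    "note": "not in the schema",
}

clean = apply_schema(row, schema, strict=True, drop_extra=True)
print(clean)
```

&lt;p&gt;With a bad value such as &lt;code&gt;"score": "three"&lt;/code&gt;, the strict version raises immediately, which is usually what you want at the start of a pipeline.&lt;/p&gt;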
&lt;h2 id="quick-data-previews"&gt;Quick Data Previews&lt;/h2&gt;
&lt;p&gt;Because I'm often working in notebooks or terminals, I've added a simple &lt;code&gt;.show()&lt;/code&gt; method to quickly preview data in a nice, readable format:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id="quality-of-life-improvements"&gt;Quality-of-Life Improvements&lt;/h2&gt;
&lt;p&gt;I've also added some quality-of-life improvements to existing methods, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Optional progress bars on &lt;code&gt;.collect()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Improved &lt;code&gt;.explain()&lt;/code&gt; for grouped pipelines&lt;/li&gt;
&lt;li&gt;Documentation updates&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id="conclusions"&gt;Conclusions&lt;/h1&gt;
&lt;p&gt;These improvements reflect my ongoing plan to make MatchFlow as powerful as possible while keeping it simple.&lt;/p&gt;
&lt;p&gt;MatchFlow is still new and I'm sure there are edge cases I haven't discovered yet, so if you find any issues or have any feedback, please do let me know. I'm always open to collaboration and excited about new possibilities.&lt;/p&gt;
&lt;h1 id="whats-next"&gt;What's Next?&lt;/h1&gt;
&lt;p&gt;As a quick teaser, the new optimizer opens the door to exciting future developments like predicate pushdown and parallel processing. I'm also currently experimenting with a custom binary data format designed for faster loading and even more efficient filtering. Lots of great stuff on the roadmap so stay tuned!&lt;/p&gt;</content><category term="Data"></category><category term="MatchFlow"></category><category term="Data"></category></entry><entry><title>Introducing MatchFlow: a JSON-native query engine for football data.</title><link href="2025/05/25/matchflow-introduction/" rel="alternate"></link><published>2025-05-25T19:30:00+00:00</published><updated>2025-05-25T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2025-05-25:2025/05/25/matchflow-introduction/</id><summary type="html">&lt;p&gt;MatchFlow is a JSON-native query engine for football data - no flattening, no fuss...&lt;/p&gt;</summary><content type="html">&lt;h1 id="matchflow-a-json-native-query-engine-for-football-data"&gt;MatchFlow: A JSON-Native Query Engine for Football Data&lt;/h1&gt;
&lt;p&gt;Over the years, I’ve spent a lot of time wrangling football data - from open data projects to internal club work. A lot of it comes as JSON these days: events, lineups, matches, tracking, metadata. And while that format is powerful and expressive, the tools we use to work with it often aren’t.&lt;/p&gt;
&lt;p&gt;What’s the first step in most workflows?&lt;/p&gt;
&lt;p&gt;Flatten everything. Hope the structure is consistent. Write some glue code to handle edge cases. Cross fingers that &lt;code&gt;json_normalize()&lt;/code&gt; doesn’t choke. And repeat, every time.&lt;/p&gt;
&lt;p&gt;I got tired of that. So I started building &lt;strong&gt;MatchFlow&lt;/strong&gt;, a Python library that lets me work with football data more naturally.&lt;/p&gt;
&lt;h2 id="why-i-built-matchflow"&gt;Why I Built MatchFlow&lt;/h2&gt;
&lt;p&gt;I don’t want to spend hours figuring out how to flatten the data just to get started on a new project. I want to load the data, explore it, and see what’s there. And that’s where MatchFlow really helps.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Show me all the shots in a match&lt;/li&gt;
&lt;li&gt;Group events by player&lt;/li&gt;
&lt;li&gt;Compute some summary stats&lt;/li&gt;
&lt;li&gt;Build a dataset for modelling&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The problem is, doing that with nested, JSON-style data often means jumping through hoops: flattening things, renaming fields, re-shaping arrays, cleaning as you go. It gets in the way of thinking clearly.&lt;/p&gt;
&lt;p&gt;MatchFlow is my attempt to fix that, or at least make it less annoying.&lt;/p&gt;
&lt;p&gt;It’s built around a simple idea: &lt;strong&gt;don’t fight the structure&lt;/strong&gt;. Just let it flow through, and shape it as you go.&lt;/p&gt;
&lt;h2 id="what-matchflow-does"&gt;What MatchFlow Does&lt;/h2&gt;
&lt;p&gt;MatchFlow builds a directed acyclic graph (DAG) of operations - a chain of lazy, composable steps that can be optimized and executed efficiently. This architecture makes MatchFlow fast, composable, and ready for upcoming features like pushdown filtering, transform fusion, and automatic caching.&lt;/p&gt;
&lt;p&gt;You can load data from JSON files, folders, or even directly from APIs like StatsBomb, and then apply transformations step by step:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog.matchflow&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;where_equals&lt;/span&gt;

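&lt;span class="c1"&gt;# `model` is assumed to be a fitted model object defined elsewhere&lt;/span&gt;
&lt;span class="n"&gt;flow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;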
    &lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_folder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;data/events/&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;where_equals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;type&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Shot&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;player.name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;xT&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;location&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;shots.json&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Nothing is loaded into memory until you ask for it. You can stream, group, filter, join, and export. And most importantly, you can do all of that without flattening everything first.&lt;/p&gt;
&lt;h1 id="it-works-with-the-data-we-actually-get"&gt;It Works with the Data We Actually Get&lt;/h1&gt;
&lt;p&gt;Football data is full of nested structures:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Events have players, tags, locations&lt;/li&gt;
&lt;li&gt;Matches contain teams, lineups, substitutions&lt;/li&gt;
&lt;li&gt;Freeze-frames are stored as arrays of objects inside a single field&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Flattening all that up front means you lose some of the meaning, or have to write extra code to glue it back together. MatchFlow lets you work with it in-place, using dot notation (&lt;code&gt;player.name&lt;/code&gt;, &lt;code&gt;player.location.0&lt;/code&gt;, etc.) and stream-like transformations.&lt;/p&gt;
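&lt;p&gt;To give a feel for what dot-notation access means, here is a minimal pure-Python sketch. The &lt;code&gt;get_nested&lt;/code&gt; helper and the sample &lt;code&gt;event&lt;/code&gt; record are hypothetical and only illustrate the idea - this is not MatchFlow's actual implementation.&lt;/p&gt;

```python
def get_nested(record, path, default=None):
    """Resolve a dotted path such as 'player.location.0' against nested dicts and lists."""
    current = record
    for key in path.split("."):
        if isinstance(current, dict):
            if key not in current:
                return default
            current = current[key]
        elif isinstance(current, (list, tuple)):
            # Numeric path segments index into sequences
            if not key.isdigit() or int(key) >= len(current):
                return default
            current = current[int(key)]
        else:
            # Reached a scalar before the path was exhausted
            return default
    return current


# Hypothetical event record, loosely shaped like nested football event data
event = {
    "type": {"name": "Shot"},
    "player": {"name": "Messi", "location": [102.5, 34.0]},
}

print(get_nested(event, "player.name"))
print(get_nested(event, "player.location.0"))
print(get_nested(event, "player.team.name", default="unknown"))
```

&lt;p&gt;The record stays nested the whole time; the path is only resolved at the moment a field is needed, which is why nothing has to be flattened up front.&lt;/p&gt;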
&lt;p&gt;MatchFlow isn’t trying to replace dataframes - it’s built for a different job: helping you explore, transform, and build pipelines over nested, unstructured data without flattening everything first.&lt;/p&gt;
&lt;p&gt;Whether that means pulling down a bunch of JSON files and exploring them interactively, or turning them into a repeatable data engineering pipeline, MatchFlow makes that easy without needing to reshape everything upfront.&lt;/p&gt;
&lt;h2 id="a-quick-example-with-statsbomb"&gt;A Quick Example (with StatsBomb)&lt;/h2&gt;
&lt;p&gt;Here’s how you might pull shots from a StatsBomb match using the built-in API client:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog.matchflow&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;where_equals&lt;/span&gt;

&lt;span class="n"&gt;shots&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;statsbomb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;match_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;22912&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;where_equals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;type.name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Shot&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;player.name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;location&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;shot.statsbomb_xg&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;collect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;No flattening. No munging. Just a stream of records that you can work with directly.&lt;/p&gt;
&lt;h2 id="whats-included-in-v1"&gt;What's Included in v1&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Read from JSON files, folders, generators, or the StatsBomb API&lt;/li&gt;
&lt;li&gt;Chainable methods like &lt;code&gt;.filter()&lt;/code&gt;, &lt;code&gt;.select()&lt;/code&gt;, &lt;code&gt;.assign()&lt;/code&gt;, &lt;code&gt;.group_by()&lt;/code&gt;, &lt;code&gt;.join()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Export to &lt;code&gt;.to_json()&lt;/code&gt;, &lt;code&gt;.to_jsonl()&lt;/code&gt;, &lt;code&gt;.to_pandas()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Nested field access using dot notation&lt;/li&gt;
&lt;li&gt;Lazy evaluation - everything streams until you call &lt;code&gt;.collect()&lt;/code&gt; or &lt;code&gt;.to_pandas()&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="whats-coming-next"&gt;What’s Coming Next&lt;/h2&gt;
&lt;h3 id="performance"&gt;🚀 Performance&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;A custom file format with partitioning and fast reading&lt;/li&gt;
&lt;li&gt;Predicate pushdown using file-level indexes and Bloom filters&lt;/li&gt;
&lt;li&gt;Compilation of query steps into native Python functions for speed&lt;/li&gt;
&lt;li&gt;Cython/JIT-optimized inner loops for group-by and sorting&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="features"&gt;🔧 Features&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Support for Wyscout, Opta, and remote data (S3, GCS)&lt;/li&gt;
&lt;li&gt;Built-in operations like &lt;code&gt;extract_shots()&lt;/code&gt; and &lt;code&gt;extract_passes()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;DSL for natural query expressions like &lt;code&gt;flow.query("player.name == 'Messi'")&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="ergonomics"&gt;🛠 Ergonomics&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Plotting helpers for quick exploration&lt;/li&gt;
&lt;li&gt;Command-line tools for scripting pipelines&lt;/li&gt;
&lt;li&gt;Caching for faster re-runs&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id="try-it-out"&gt;Try It Out&lt;/h1&gt;
&lt;p&gt;If you want to give it a spin:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;penaltyblog
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can also check out the &lt;a href="https://penaltyblog.readthedocs.io/en/master/index.html"&gt;docs&lt;/a&gt; - there are guides, an API reference, and examples of some common tasks.&lt;/p&gt;
&lt;h1 id="final-thoughts"&gt;Final Thoughts&lt;/h1&gt;
&lt;p&gt;I don’t think MatchFlow is for everyone. If you're happy working in SQL or pandas and it's doing what you need then great, stick with it. But if you’ve ever found yourself spending too much time writing flatteners, normalizers, or trying to keep your data in shape just to ask a basic question, MatchFlow might help.&lt;/p&gt;
&lt;p&gt;It's early days, and I'm sure there are still edge cases I haven't uncovered. But this approach has saved me a lot of time over the years, and hopefully it can do the same for you.&lt;/p&gt;
&lt;p&gt;I'd love to hear how you use it, or where it breaks. That’s how it gets better 😁&lt;/p&gt;</content><category term="Data"></category><category term="MatchFlow"></category><category term="Data"></category></entry><entry><title>Better Metrics for Football Forecasts: Moving Beyond the Ranked Probability Score</title><link href="2025/05/01/better-metrics-for-football-forecasts-moving-beyond-the-ranked-probability-score/" rel="alternate"></link><published>2025-05-01T19:30:00+00:00</published><updated>2025-05-01T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2025-05-01:2025/05/01/better-metrics-for-football-forecasts-moving-beyond-the-ranked-probability-score/</id><summary type="html">&lt;p&gt;Why the Ranked Probability Score might be misleading your football model evaluations, and what to use instead....&lt;/p&gt;</summary><content type="html">&lt;h1 id="introduction"&gt;Introduction&lt;/h1&gt;
&lt;p&gt;What if the metric you've been using to evaluate football forecasts is quietly rewarding worse predictions?&lt;/p&gt;
&lt;p&gt;Probabilistic forecasting is a crucial tool in sports analytics - whether it's to inform betting strategies, guide player and tactical decisions, or simply enhance the way we follow the sport. As these models have grown in popularity, however, one important question has often been overlooked: how do we evaluate the quality of our forecasts?&lt;/p&gt;
&lt;p&gt;Despite the increase in use of football (soccer) prediction models, the discussion around how best to score and compare them has been relatively limited. In many cases, evaluation methods are taken for granted without much scrutiny. This is a significant gap because without the right tools for measuring forecast quality, it's easy to draw misleading conclusions about model performance.&lt;/p&gt;
&lt;p&gt;On my blog, I've often used the &lt;strong&gt;Ranked Probability Score (RPS)&lt;/strong&gt; to assess the accuracy of predictive models. RPS has an intuitive appeal as it rewards forecasts not just for getting the result right, but also for placing higher probability on outcomes &lt;em&gt;close&lt;/em&gt; to the actual result. A forecast that leaned towards a draw when the match ended as a narrow home win was treated better than one that backed an away win.&lt;/p&gt;
&lt;p&gt;For a long time, that logic seemed sound. However, I've recently been questioning whether RPS is really the best tool for evaluating football forecasts - especially when our ultimate goal is to identify the most informative models as efficiently and fairly as possible.&lt;/p&gt;
&lt;p&gt;In this article, I’ll explain why RPS might not be the optimal choice, introduce alternative scoring metrics like Log Loss (also known as Ignorance Score) and the multiclass Brier score, and share some experiments and ideas I've tested to explore which metrics are best suited for evaluating football predictive models.&lt;/p&gt;
&lt;h1 id="background-how-we-score-probabilistic-football-forecasts"&gt;Background: How We Score Probabilistic Football Forecasts&lt;/h1&gt;
&lt;p&gt;When evaluating a model that outputs probabilities, such as predicting a 45% chance of a home win, 30% draw, and 25% away win, we need a way to judge how good those probabilities actually were once the match result is known. That's where scoring rules come in.&lt;/p&gt;
&lt;p&gt;A scoring rule is simply a mathematical method that assigns a numerical score to a forecast based on the eventual outcome. A good scoring rule rewards forecasts that were well-calibrated and honest, and penalizes forecasts that placed high confidence on the wrong results.&lt;/p&gt;
&lt;h2 id="the-ranked-probability-score-rps"&gt;The Ranked Probability Score (RPS)&lt;/h2&gt;
&lt;p&gt;The Ranked Probability Score (RPS) has become a popular choice for evaluating football forecasts. Its appeal lies in how it measures how close a forecast was to the actual outcome, taking the &lt;strong&gt;ordering&lt;/strong&gt; of results into account.&lt;/p&gt;
&lt;p&gt;In football, match outcomes - home win, draw, away win - can be thought of as ordered, with a draw being &lt;strong&gt;closer&lt;/strong&gt; to a home win than an away win is. The RPS rewards forecasts that spread probability sensibly across outcomes near the correct result, not just those that hit the exact outcome.&lt;/p&gt;
&lt;p&gt;Technically, RPS calculates the squared differences between the cumulative forecast probabilities and the cumulative actual outcomes across all possible results. It sums these squared differences to produce a final score, with lower scores indicating better forecasts.&lt;/p&gt;
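&lt;p&gt;As a rough sketch, the calculation for a single match could look like the code below. The &lt;code&gt;rps_single&lt;/code&gt; function is my own illustrative helper, and dividing by the number of outcomes minus one is an assumption based on the commonly used normalised form of RPS.&lt;/p&gt;

```python
import numpy as np


def rps_single(probs, outcome):
    """Ranked Probability Score for one match (lower is better).

    probs   : forecast probabilities ordered as [home, draw, away]
    outcome : index of the result that occurred (0, 1 or 2)
    """
    probs = np.asarray(probs, dtype=float)
    actual = np.zeros_like(probs)
    actual[outcome] = 1.0

    # Difference between cumulative forecast and cumulative outcome
    cum_diff = np.cumsum(probs) - np.cumsum(actual)

    # The final cumulative term is always zero (both sum to one), so drop it
    return float(np.sum(cum_diff[:-1] ** 2) / (len(probs) - 1))


forecast = [0.45, 0.30, 0.25]
print("home win:", round(rps_single(forecast, 0), 4))
print("away win:", round(rps_single(forecast, 2), 4))
```

&lt;p&gt;With this forecast, a home win scores roughly 0.18 while an away win scores roughly 0.38, because most of the cumulative probability sat at the home end of the ordering.&lt;/p&gt;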
&lt;p&gt;In summary:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Correct predictions get rewarded.&lt;/li&gt;
&lt;li&gt;Predictions that &lt;em&gt;almost&lt;/em&gt; got it right (e.g., favouring a draw when it was a home win) are penalized less harshly than predictions that were way off (e.g., favouring an away win).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This &lt;strong&gt;sensitivity to distance&lt;/strong&gt; has often been cited as a key advantage of RPS but, as we will explore, it also introduces potential problems.&lt;/p&gt;
&lt;h1 id="alternative-scoring-rules-brier-score-and-log-loss"&gt;Alternative Scoring Rules: Brier Score and Log Loss&lt;/h1&gt;
&lt;p&gt;While RPS rewards proximity to the correct outcome, there are other scoring rules that take a different approach:&lt;/p&gt;
&lt;h2 id="the-brier-score-multiclass-version"&gt;The Brier Score (Multiclass Version)&lt;/h2&gt;
&lt;p&gt;The Brier score measures the squared difference between the predicted probability and the actual outcome, across all possible results but without considering the &lt;strong&gt;ordering&lt;/strong&gt; between outcomes. A wrong prediction for an away win is penalized the same as a wrong prediction for a draw if the match ends in a home win.&lt;/p&gt;
&lt;p&gt;In formula terms, it’s simply the mean squared error between the forecast probabilities and the actual result (coded as 1 for the outcome that occurred, 0 for those that did not).&lt;/p&gt;
&lt;p&gt;Like RPS, lower Brier scores indicate better forecasts. However, the Brier score is insensitive to distance: it treats all incorrect outcomes as equally wrong.&lt;/p&gt;
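&lt;p&gt;Here is a minimal sketch of the multiclass Brier score for a single match. The &lt;code&gt;brier_single&lt;/code&gt; helper is illustrative only, and it assumes the convention of summing the squared errors across the three outcomes; some implementations also divide by the number of outcomes, which only rescales the score.&lt;/p&gt;

```python
import numpy as np


def brier_single(probs, outcome):
    """Multiclass Brier score for one match (lower is better)."""
    probs = np.asarray(probs, dtype=float)
    actual = np.zeros_like(probs)
    actual[outcome] = 1.0
    # Squared error on every outcome, with no notion of ordering
    return float(np.sum((probs - actual) ** 2))


HOME = 0
leans_draw = [0.2, 0.7, 0.1]  # wrong, but backed the "nearby" result
leans_away = [0.2, 0.1, 0.7]  # wrong, and backed the "far" result

print("leans draw:", round(brier_single(leans_draw, HOME), 4))
print("leans away:", round(brier_single(leans_away, HOME), 4))
```

&lt;p&gt;Both forecasts receive the same score for a home win, since swapping the draw and away probabilities does not change the sum of squared errors.&lt;/p&gt;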
&lt;h2 id="log-loss-ignorance-score"&gt;Log Loss (Ignorance Score)&lt;/h2&gt;
&lt;p&gt;The log loss, also known as the ignorance score, takes an even sharper approach. It focuses only on the probability assigned to the outcome that actually occurred, ignoring how probability was distributed across other outcomes.&lt;/p&gt;
&lt;p&gt;Log loss measures how &lt;em&gt;surprised&lt;/em&gt; the forecast was by the actual result using concepts from information theory.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you assigned a high probability to the correct result, you get a low (good) score.&lt;/li&gt;
&lt;li&gt;If you assigned a low probability to the correct result, you're heavily penalized.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Smaller log loss values (closer to zero) indicate better forecasts. Importantly, log loss is a local scoring rule, meaning it doesn't reward or penalize probability placed on outcomes that didn’t happen.&lt;/p&gt;
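&lt;p&gt;In code, log loss for a single match boils down to one line. The sketch below uses the natural logarithm, which is an assumption on my part; using base 2 instead would express the score in bits and only rescale it. The &lt;code&gt;log_loss_single&lt;/code&gt; name is illustrative.&lt;/p&gt;

```python
import numpy as np


def log_loss_single(probs, outcome):
    """Log loss for one match: only the probability of the observed outcome matters."""
    return float(-np.log(probs[outcome]))


forecast = [0.60, 0.25, 0.15]
for label, outcome in [("home win", 0), ("draw", 1), ("away win", 2)]:
    print(label, round(log_loss_single(forecast, outcome), 4))
```

&lt;p&gt;The penalty grows quickly as the probability on the observed result shrinks: about 0.51 if the 60% favourite wins, but about 1.90 if the 15% outsider does.&lt;/p&gt;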
&lt;h2 id="key-properties-of-a-good-scoring-rule"&gt;Key Properties of a Good Scoring Rule&lt;/h2&gt;
&lt;p&gt;When choosing how to evaluate football forecasts, a few key properties matter:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Propriety:&lt;/strong&gt; A scoring rule is proper if it encourages honest forecasting - assigning probabilities that truly reflect beliefs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Locality:&lt;/strong&gt; A scoring rule is local if it only depends on the probability of the outcome that occurred (log loss is local; RPS and Brier are not).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sensitivity to Distance:&lt;/strong&gt; Some rules (like RPS) reward probability placed on outcomes "near" the correct one; others (like Brier and log loss) do not.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These properties have deep implications for how fairly and effectively different metrics evaluate forecasts, and, as we'll see, they raise important questions about whether RPS is really the best choice.&lt;/p&gt;
&lt;h1 id="whats-wrong-with-the-ranked-probability-score"&gt;What’s Wrong with the Ranked Probability Score?&lt;/h1&gt;
&lt;p&gt;At first glance, the RPS seems like an ideal tool for evaluating football forecasts. It rewards accurate predictions and gives partial credit to forecasts that are &lt;em&gt;almost right&lt;/em&gt;, reflecting the natural ordering of outcomes like home win, draw, and away win.&lt;/p&gt;
&lt;p&gt;However, when we dig deeper into the properties of RPS, issues emerge, especially when we think carefully about what we actually want a scoring rule to do.&lt;/p&gt;
&lt;h2 id="sensitivity-to-distance-isnt-always-useful"&gt;Sensitivity to Distance Isn't Always Useful&lt;/h2&gt;
&lt;p&gt;RPS assumes that outcomes have a natural order, and that being &lt;em&gt;close&lt;/em&gt; to the right result is better than being completely wrong.
But in practice, this sensitivity to distance may not make sense for evaluating probabilistic forecasts.&lt;/p&gt;
&lt;p&gt;Consider a match where a team wins 3-0 at home. From a probability standpoint, a forecast that heavily backed a draw is no closer to reality than one that backed an away win. The match was decisively a home win, and both alternative forecasts were wrong. Yet RPS would treat the forecast favoring a draw more kindly even though both forecasts misrepresented what actually happened.&lt;/p&gt;
&lt;p&gt;In probabilistic forecasting, what matters most is the probability placed on the outcome that actually occurred, not how close other outcomes seemed. Sensitivity to distance risks rewarding forecasts that were still wrong, simply because they were &lt;em&gt;less wrong&lt;/em&gt; in some arbitrary way.&lt;/p&gt;
&lt;h2 id="non-locality-dilutes-the-signal"&gt;Non-Locality Dilutes the Signal&lt;/h2&gt;
&lt;p&gt;A second issue is that &lt;strong&gt;RPS is non-local&lt;/strong&gt;: it takes into account the entire probability distribution, not just the probability assigned to the true outcome.&lt;/p&gt;
&lt;p&gt;This means a forecast can achieve a better RPS score by adjusting probabilities on outcomes that did not happen. In theory, a forecast could place relatively little probability on the actual outcome but still score well by distributing probabilities nicely across nearby results.&lt;/p&gt;
&lt;p&gt;This dilutes the signal we care about most: how much belief did the forecast put on the correct outcome? A good evaluation metric should focus sharply on that and not get distracted by how probabilities were assigned to events that never occurred.&lt;/p&gt;
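&lt;p&gt;To make the last two points concrete, here is a small sketch comparing two forecasts that both put 20% on the home win that actually happened, but distribute the rest differently. The helper functions and the example probabilities are my own illustration, with RPS normalised by the number of outcomes minus one.&lt;/p&gt;

```python
import numpy as np


def rps(probs, outcome):
    """Ranked Probability Score for one match."""
    probs = np.asarray(probs, dtype=float)
    actual = np.zeros_like(probs)
    actual[outcome] = 1.0
    diff = np.cumsum(probs) - np.cumsum(actual)
    return float(np.sum(diff[:-1] ** 2) / (len(probs) - 1))


def log_loss(probs, outcome):
    """Log loss for one match (natural logarithm)."""
    return float(-np.log(probs[outcome]))


HOME = 0
leans_draw = [0.2, 0.7, 0.1]
leans_away = [0.2, 0.1, 0.7]

for name, forecast in [("leans draw", leans_draw), ("leans away", leans_away)]:
    print(
        name,
        "RPS:", round(rps(forecast, HOME), 4),
        "log loss:", round(log_loss(forecast, HOME), 4),
    )
```

&lt;p&gt;RPS comes out at roughly 0.33 for the draw-leaning forecast and 0.57 for the away-leaning one, even though both gave the home win the same 20% chance. Log loss scores them identically, because it only looks at the probability placed on the result that occurred.&lt;/p&gt;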
&lt;h2 id="inefficiency-in-identifying-better-forecasts"&gt;Inefficiency in Identifying Better Forecasts&lt;/h2&gt;
&lt;p&gt;From a practical standpoint, another downside of RPS is that it can require more data - as in more matches and outcomes - to reliably identify which forecasting model is actually better.&lt;/p&gt;
&lt;p&gt;In simulation experiments (including those I’ll describe later), scoring rules like log loss tend to distinguish better models more quickly and more reliably than RPS. This matters because in the real world, we often have limited sample sizes, especially when evaluating new models, seasonal predictions, or niche competitions.&lt;/p&gt;
&lt;p&gt;A scoring rule that uses the available information more efficiently gives us a better chance of recognizing model quality earlier.&lt;/p&gt;
&lt;h2 id="rps-summary"&gt;RPS Summary&lt;/h2&gt;
&lt;p&gt;While the Ranked Probability Score has attractive theoretical features, it also carries significant practical and conceptual drawbacks.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It can reward forecasts that are still substantially wrong.&lt;/li&gt;
&lt;li&gt;It spreads focus across outcomes that didn’t happen.&lt;/li&gt;
&lt;li&gt;It may be slower and less reliable at identifying genuinely better models.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Given these issues, it's worth asking whether we should move beyond RPS when evaluating football forecasts and whether alternatives like log loss (Ignorance Score) and the multiclass Brier score might offer a better foundation.&lt;/p&gt;
&lt;p&gt;In the next sections, I'll show some ideas that explore exactly that question.&lt;/p&gt;
&lt;h1 id="experiment-one-rps-can-be-slower-to-identify-better-forecasts"&gt;Experiment One: RPS Can be Slower to Identify Better Forecasts&lt;/h1&gt;
&lt;p&gt;To compare how well different scoring rules distinguish better forecasts in football, I ran a simple simulation.&lt;/p&gt;
&lt;p&gt;For each match:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Forecast A represented a "better" forecast - more accurate and better calibrated.&lt;/li&gt;
&lt;li&gt;Forecast B represented a "worse" forecast - slightly less accurate and less confident.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Forecast A assigned probabilities of 60% home win, 25% draw, and 15% away win.&lt;/li&gt;
&lt;li&gt;Forecast B assigned 50% home win, 30% draw, and 20% away win - a slightly more spread-out, less confident forecast.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In each simulation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I simulated match outcomes by randomly drawing results according to the better forecast (Forecast A).&lt;/li&gt;
&lt;li&gt;For each match, I scored both forecasts using three different metrics:&lt;ul&gt;
&lt;li&gt;Log Loss (Ignorance Score)&lt;/li&gt;
&lt;li&gt;Multiclass Brier Score&lt;/li&gt;
&lt;li&gt;Ranked Probability Score (RPS)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;I then compared the total scores for Forecast A and Forecast B.&lt;ul&gt;
&lt;li&gt;If a metric gave a lower (better) score to Forecast A, it was counted as a "correct" selection for that metric.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I repeated this process &lt;strong&gt;1,000,000 times&lt;/strong&gt; for each sample size tested - ranging from just 10 matches up to 1,000 matches - and calculated the proportion of simulations where each scoring rule correctly identified Forecast A as better.&lt;/p&gt;
&lt;p&gt;This setup mimics a realistic football forecasting evaluation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Limited numbers of matches.&lt;/li&gt;
&lt;li&gt;Models that are not drastically different - a common scenario in practice.&lt;/li&gt;
&lt;li&gt;Need for scoring rules to efficiently and reliably identify better forecasts even when differences are subtle.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The code for this is shown below. Because the outcomes are sampled randomly and no seed is set, your exact numbers will differ from mine, but the general pattern should be similar.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pb&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simulate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_repeats&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;forecast_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mf"&gt;0.6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.15&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;forecast_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="n"&gt;outcomes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_repeats&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_size&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;forecast_a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;brier&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;logloss&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;rps&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_repeats&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;outcomes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# Broadcast forecasts&lt;/span&gt;
        &lt;span class="n"&gt;probs_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;repeat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;forecast_a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;newaxis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:],&lt;/span&gt; &lt;span class="n"&gt;sample_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;probs_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;repeat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;forecast_b&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;newaxis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:],&lt;/span&gt; &lt;span class="n"&gt;sample_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;score_a_brier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;multiclass_brier_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;probs_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;score_b_brier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;multiclass_brier_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;probs_b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;score_a_logloss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ignorance_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;probs_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;score_b_logloss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ignorance_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;probs_b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;score_a_rps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rps_average&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;probs_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;score_b_rps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rps_average&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;probs_b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;brier&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;score_a_brier&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;score_b_brier&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;logloss&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;score_a_logloss&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;score_b_logloss&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;rps&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;score_a_rps&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;score_b_rps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;/=&lt;/span&gt; &lt;span class="n"&gt;n_repeats&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_experiment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_sizes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_repeats&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000000&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;metric&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;brier&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;logloss&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;rps&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sample_sizes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;simulate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_repeats&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;metric&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;

&lt;span class="n"&gt;run_experiment&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;1000000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id="results"&gt;Results&lt;/h2&gt;
&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Sample Size&lt;/th&gt;
      &lt;th&gt;Log Loss&lt;/th&gt;
      &lt;th&gt;Brier Score&lt;/th&gt;
      &lt;th&gt;RPS&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;td&gt;10&lt;/td&gt;&lt;td&gt;62.9%&lt;/td&gt;&lt;td&gt;&lt;strong&gt;63.0%&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;61.3%&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;25&lt;/td&gt;&lt;td&gt;70.4%&lt;/td&gt;&lt;td&gt;&lt;strong&gt;72.9%&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;67.7%&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;50&lt;/td&gt;&lt;td&gt;76.9%&lt;/td&gt;&lt;td&gt;&lt;strong&gt;77.1%&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;76.5%&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;100&lt;/td&gt;&lt;td&gt;84.9%&lt;/td&gt;&lt;td&gt;&lt;strong&gt;87.0%&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;84.1%&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;200&lt;/td&gt;&lt;td&gt;&lt;strong&gt;93.3%&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;92.6%&lt;/td&gt;&lt;td&gt;92.1%&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;500&lt;/td&gt;&lt;td&gt;&lt;strong&gt;99.0%&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;98.8%&lt;/td&gt;&lt;td&gt;&lt;strong&gt;99.0%&lt;/strong&gt;&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;1000&lt;/td&gt;&lt;td&gt;&lt;strong&gt;100.0%&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;100.0%&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;100.0%&lt;/strong&gt;&lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;em&gt;Table 1: Proportion of simulations where each scoring rule correctly identified the better forecast&lt;/em&gt;&lt;/p&gt;
&lt;h2 id="interpreting-the-results"&gt;Interpreting the Results&lt;/h2&gt;
&lt;p&gt;With one million repetitions per sample size, the results paint a stable and revealing picture. At smaller sample sizes (10–100 matches), RPS consistently underperforms relative to both Log Loss and Brier Score. For example, at just 25 matches, RPS correctly identifies the better forecast only 67.7% of the time, compared to 70.4% for Log Loss and 72.9% for Brier Score.&lt;/p&gt;
&lt;p&gt;At sample sizes up to 100 matches, Brier Score slightly edges out Log Loss in terms of raw accuracy, most visibly at 25 and 100 matches; from 200 matches onwards Log Loss matches or slightly exceeds it. Log Loss remains nearly as effective at the smaller sizes, and its theoretical strengths as a strictly proper local scoring rule make it a more principled choice for model evaluation, especially when distinguishing subtle differences between forecasts.&lt;/p&gt;
&lt;p&gt;These results reinforce the idea that &lt;strong&gt;RPS is the least efficient&lt;/strong&gt; of the three: it requires more data to reach the same level of confidence in model comparison, making it a less suitable option for practical use in football analytics.&lt;/p&gt;
&lt;h1 id="experiment-two-rps-can-favour-a-forecast-that-believes-less-in-the-truth"&gt;Experiment Two: RPS Can Favour a Forecast That Believes Less in the Truth&lt;/h1&gt;
&lt;p&gt;One of the key problems with the Ranked Probability Score is that it's &lt;strong&gt;non-local&lt;/strong&gt;, meaning it evaluates forecasts based on the entire cumulative distribution, not just the outcome that actually occurred. This can lead to unintuitive results. The table below compares two forecasts for a single match that ended in a home win.&lt;/p&gt;
&lt;h2 id="results_1"&gt;Results&lt;/h2&gt;
&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th rowspan="2"&gt;Forecast&lt;/th&gt;
      &lt;th colspan="3"&gt;Probabilities&lt;/th&gt;
      &lt;th rowspan="2"&gt;P(Correct)&lt;/th&gt;
      &lt;th rowspan="2"&gt;RPS&lt;/th&gt;
      &lt;th rowspan="2"&gt;Log Loss&lt;/th&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Home&lt;/th&gt;
      &lt;th&gt;Draw&lt;/th&gt;
      &lt;th&gt;Away&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;A&lt;/td&gt;
      &lt;td&gt;0.70&lt;/td&gt;
      &lt;td&gt;0.10&lt;/td&gt;
      &lt;td&gt;0.20&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;0.70&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;0.065&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;0.515&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;B&lt;/td&gt;
      &lt;td&gt;0.65&lt;/td&gt;
      &lt;td&gt;0.30&lt;/td&gt;
      &lt;td&gt;0.05&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;0.65&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;0.062&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;0.621&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;em&gt;Table 2: RPS Can Favour a Forecast That Believes Less in the Truth&lt;/em&gt;&lt;/p&gt;
&lt;h2 id="interpreting-the-results_1"&gt;Interpreting the Results&lt;/h2&gt;
&lt;p&gt;Although Forecast A assigns a higher probability to the correct outcome, a home win (70% vs 65%), the Ranked Probability Score (RPS) gives a better (lower) score to Forecast B. This happens because RPS evaluates the entire cumulative distribution over the ordered outcomes: Forecast B puts most of its remaining probability on the draw, the outcome adjacent to the home win, so it is treated as &lt;em&gt;closer&lt;/em&gt; to the true outcome in cumulative terms.&lt;/p&gt;
&lt;p&gt;In contrast, Log Loss (a local scoring rule) correctly favours Forecast A, since it focuses solely on the probability assigned to the observed result. This example highlights a key flaw of RPS: it can reward a worse forecast simply because its misplaced probability sits nearer the observed outcome, even when it expresses less belief in what actually happened.&lt;/p&gt;
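&lt;p&gt;As a rough check, the figures in Table 2 can be reproduced by hand. The sketch below is not the penaltyblog implementation: the helper names &lt;code&gt;rps&lt;/code&gt; and &lt;code&gt;ignorance&lt;/code&gt; are made up for this example, and it assumes the match ended in a home win and that Log Loss is measured in bits (base 2), which is what the table values suggest.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import math


def rps(probs, outcome):
    """Ranked Probability Score for one forecast over ordered outcomes."""
    cum_forecast = 0.0
    cum_observed = 0.0
    total = 0.0
    # Compare cumulative forecast and cumulative outcome over the first K-1
    # categories (the final cumulative term is always 1 - 1 = 0).
    for k in range(len(probs) - 1):
        cum_forecast += probs[k]
        cum_observed += 1.0 if k == outcome else 0.0
        total += (cum_forecast - cum_observed) ** 2
    return total / (len(probs) - 1)


def ignorance(probs, outcome):
    """Log Loss in bits: minus log2 of the probability of the observed outcome."""
    return -math.log2(probs[outcome])


forecast_a = [0.70, 0.10, 0.20]  # home, draw, away
forecast_b = [0.65, 0.30, 0.05]
home_win = 0  # assumed observed outcome

for name, probs in (('A', forecast_a), ('B', forecast_b)):
    print(name, round(rps(probs, home_win), 4), round(ignorance(probs, home_win), 3))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Forecast B's cumulative probabilities (0.65, 0.95) sit closer to the observed (1, 1) than Forecast A's (0.70, 0.80) at the second step, which is enough to outweigh its larger miss at the first step.&lt;/p&gt;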
&lt;h1 id="experiment-3-a-real-world-test-using-bookmaker-odds"&gt;Experiment 3: A Real-World Test Using Bookmaker Odds&lt;/h1&gt;
&lt;p&gt;The previous experiments in this article used synthetic forecasts - carefully constructed examples designed to highlight how different scoring rules behave under controlled conditions. While these are useful for understanding the theoretical properties of metrics like RPS, Log Loss, and Brier Score, real-world forecasting is rarely so tidy.&lt;/p&gt;
&lt;p&gt;To evaluate how these scoring rules perform in more practical settings, I next used historical bookmaker odds - widely regarded as one of the most accurate publicly available sources of football match probabilities - and intentionally distorted them to simulate worse forecasts. &lt;/p&gt;
&lt;p&gt;The goal was to see which scoring rules could reliably detect the better forecast under more realistic, subtly challenging conditions.&lt;/p&gt;
&lt;p&gt;Here's how the experiment was structured:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Bookmaker odds&lt;/strong&gt; from Bet365 (sourced via &lt;a href="https://football-data.co.uk/"&gt;football-data.co.uk&lt;/a&gt;) were converted into implied probabilities with the overround removed. The code below does this with Shin's method (&lt;code&gt;pb.implied.shin&lt;/code&gt;) rather than by simply inverting and normalizing the odds.&lt;/li&gt;
&lt;li&gt;These bookmaker-derived probabilities were treated as the "true" forecast (&lt;strong&gt;Forecast A&lt;/strong&gt;).&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;slightly degraded forecast (Forecast B)&lt;/strong&gt; was created by distorting the original probabilities using &lt;strong&gt;temperature scaling&lt;/strong&gt;, a method that flattens or sharpens the distribution to simulate reduced model quality.&lt;/li&gt;
&lt;li&gt;For each match, an outcome was &lt;strong&gt;simulated from Forecast A&lt;/strong&gt;, and both forecasts were scored using:&lt;ul&gt;
&lt;li&gt;Log Loss (Ignorance Score)  &lt;/li&gt;
&lt;li&gt;Brier Score  &lt;/li&gt;
&lt;li&gt;Ranked Probability Score (RPS)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The process was repeated &lt;strong&gt;one million times&lt;/strong&gt;, each time on a randomly sampled match, and I tracked how often each scoring rule correctly favoured the higher-quality (bookmaker) forecast.&lt;/li&gt;
&lt;/ul&gt;
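&lt;p&gt;To give a feel for what temperature scaling does to a forecast, here is a small standalone sketch. The function name &lt;code&gt;temperature_scale&lt;/code&gt; and the example probabilities are hypothetical and chosen purely for illustration; the only assumption carried over from the experiment is the form of the transformation (raise each probability to the power 1/T, then renormalize).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;def temperature_scale(probs, temperature):
    # Raise each probability to the power 1/T, then renormalize to sum to 1.
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]


probs = [0.60, 0.25, 0.15]  # hypothetical home/draw/away probabilities

flatter = temperature_scale(probs, 1.25)  # T above 1 pulls values towards uniform
sharper = temperature_scale(probs, 0.80)  # T below 1 exaggerates the favourite

print([round(p, 3) for p in flatter])
print([round(p, 3) for p in sharper])
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Either direction moves the forecast away from the probabilities that generate the outcomes, so the distorted forecast is worse by construction while still looking plausible.&lt;/p&gt;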
&lt;p&gt;If you want to have a go at repeating this, the code is shown below. You may get slightly different results due to the random sampling, but the general pattern should be similar.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pb&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pd&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;distort_probs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;probs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.25&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;&amp;quot;&amp;quot;&amp;quot;Distort probability distribution using temperature scaling.&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
    &lt;span class="n"&gt;scaled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;power&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;probs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;scaled&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;scaled&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_betting_experiment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;odds_list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_repeats&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;distort_fn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;distort_probs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;&amp;quot;&amp;quot;&amp;quot;Compare scoring rules on bookmaker vs distorted forecasts.&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
    &lt;span class="n"&gt;correct&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;logloss&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;brier&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;rps&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_repeats&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Sample match odds from the bookmaker data&lt;/span&gt;
        &lt;span class="n"&gt;odds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;odds_list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;odds_list&lt;/span&gt;&lt;span class="p"&gt;))]&lt;/span&gt;
        &lt;span class="n"&gt;true_probs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;implied&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;odds&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;implied_probabilities&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# Distort to create worse forecast&lt;/span&gt;
        &lt;span class="n"&gt;distorted_probs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;distort_fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;true_probs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Simulate an outcome based on true_probs&lt;/span&gt;
        &lt;span class="n"&gt;outcome&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;true_probs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Score both forecasts&lt;/span&gt;
        &lt;span class="n"&gt;log_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ignorance_score&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;true_probs&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;outcome&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;log_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ignorance_score&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;distorted_probs&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;outcome&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="n"&gt;brier_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;multiclass_brier_score&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;true_probs&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;outcome&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;brier_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;multiclass_brier_score&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;distorted_probs&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;outcome&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="n"&gt;rps_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rps_average&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;true_probs&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;outcome&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;rps_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rps_average&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;distorted_probs&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;outcome&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="c1"&gt;# Compare which forecast scored better&lt;/span&gt;
        &lt;span class="n"&gt;correct&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;logloss&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log_a&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;log_b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;correct&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;brier&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;brier_a&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;brier_b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;correct&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;rps&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rps_a&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;rps_b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Normalize to proportion&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;correct&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;correct&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;/=&lt;/span&gt; &lt;span class="n"&gt;n_repeats&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;correct&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2020-2021&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; 
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2021-2022&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; 
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2022-2023&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;    
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2023-2024&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2024-2025&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;home&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;b365_h&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;draw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;b365_d&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;away&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;b365_a&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;odds_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;home&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;draw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;away&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;run_betting_experiment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;odds_list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_repeats&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id="results_2"&gt;Results&lt;/h2&gt;
&lt;p&gt;After running the simulation one million times, the proportion of cases in which each scoring rule correctly identified the better forecast was:&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Scoring Rule&lt;/th&gt;
      &lt;th&gt;% Correct&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Log Loss&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;58.6%&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Brier Score&lt;/td&gt;
      &lt;td&gt;57.8%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;RPS&lt;/td&gt;
      &lt;td&gt;56.1%&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;em&gt;Table 3: Proportion of cases in which each scoring rule correctly identified the better forecast.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id="interpreting-the-results_2"&gt;Interpreting the Results&lt;/h2&gt;
&lt;p&gt;Even in this more realistic setting, where the differences between forecasts were subtle and the "ground truth" forecasts came from actual bookmaker odds, &lt;strong&gt;Log Loss outperformed&lt;/strong&gt; the other metrics. Although all three scoring rules perform similarly, Log Loss identified the better forecast most often (58.6%, against 57.8% for Brier Score and 56.1% for RPS).&lt;/p&gt;
&lt;p&gt;The gap may seem small, but in practice, these marginal improvements in sensitivity and reliability matter, especially when comparing models in leagues with fewer matches or when evaluating forecasts from similar models. &lt;strong&gt;RPS, once again, proved less effective at distinguishing higher-quality forecasts&lt;/strong&gt;, consistent with the findings from the earlier synthetic experiments.&lt;/p&gt;
&lt;h1 id="conclusion"&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;Across both synthetic and real-world tests, &lt;strong&gt;Log Loss and Brier Score consistently proved to be more sensitive and reliable metrics&lt;/strong&gt; for evaluating football forecasts than the Ranked Probability Score (RPS). While all three scoring rules eventually converge with enough data, RPS lags behind at smaller sample sizes and can even reward worse forecasts under certain conditions as a direct consequence of its non-locality. Although Brier Score sometimes outperforms Log Loss in raw accuracy, particularly at moderate sample sizes, Log Loss’s sharper theoretical grounding and consistency make it my preferred choice.&lt;/p&gt;
&lt;p&gt;In real football analytics and betting, we rarely have huge sample sizes to evaluate models with. We often deal with limited data, closely matched forecasts, and the need for efficient, trustworthy evaluation tools. Based on this evidence, Log Loss stands out as the most appropriate scoring rule for comparing football predictive models, offering sharper discrimination, stronger theoretical foundations, and more reliable guidance when it matters most.&lt;/p&gt;</content><category term="Prediction"></category><category term="Metrics"></category><category term="RPS"></category><category term="Log Loss"></category><category term="Brier Score"></category></entry><entry><title>Pi Ratings: The Smarter Way to Rank Football Teams</title><link href="2025/04/14/pi-ratings-the-smarter-way-to-rank-football-teams/" rel="alternate"></link><published>2025-04-14T19:30:00+00:00</published><updated>2025-04-14T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2025-04-14:2025/04/14/pi-ratings-the-smarter-way-to-rank-football-teams/</id><summary type="html">&lt;p&gt;A smarter, football-focused alternative to Elo — using Pi Ratings to track team strength and predict matches...&lt;/p&gt;</summary><content type="html">&lt;h1 id="introduction"&gt;Introduction&lt;/h1&gt;
&lt;p&gt;Football analytics has come a long way in recent years, moving from simple league tables to more sophisticated methods of quantifying team performance. If you’ve ever looked at Elo ratings or FIFA rankings, you know that rating systems attempt to provide a clearer picture of how good a team really is, beyond just the wins and losses. But are these systems as accurate as they could be?&lt;/p&gt;
&lt;p&gt;Imagine two teams: Team A beats Team B 1-0 in a closely fought match, while Team C thrashes Team D 5-0. Should Team A and Team C gain the same rating boost? Many traditional rating systems don't differentiate much between these results, even though one clearly signals a more dominant performance. This is where Pi Ratings come in — a dynamic rating system designed to better reflect team ability by considering score discrepancies, home vs. away performances, and recent form.&lt;/p&gt;
&lt;p&gt;Pi Ratings were first introduced by &lt;a href="https://www.degruyter.com/document/doi/10.1515/jqas-2012-0036/html"&gt;Constantinou &amp;amp; Fenton&lt;/a&gt; in their research on dynamic football team ratings. Their study showed that Pi Ratings not only provided a more accurate measure of team strength compared to traditional systems like Elo, but also demonstrated profitability against bookmaker odds over five English Premier League seasons. &lt;/p&gt;
&lt;p&gt;In this article, we’ll explore:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Where traditional rating systems, like Elo, lack the nuance for football’s specific demands&lt;/li&gt;
&lt;li&gt;How Pi Ratings improve upon them&lt;/li&gt;
&lt;li&gt;Some real-world results comparing the two&lt;/li&gt;
&lt;li&gt;How you can use Pi Ratings in your own football analytics work&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you’re interested in football (soccer) data or just want a better way to evaluate your team’s strength, this is for you. Let’s dive in.&lt;/p&gt;
&lt;h1 id="why-traditional-ratings-systems-like-elo-fall-short"&gt;Why Traditional Ratings Systems (Like Elo) Fall Short&lt;/h1&gt;
&lt;p&gt;Rating systems play an important role in football analytics, providing a way to compare teams and predict future performance. One of the most widely used methods is the &lt;a href="https://en.wikipedia.org/wiki/Elo_rating_system"&gt;Elo rating&lt;/a&gt; system, originally developed for ranking chess players and later adapted for sports, including football. While Elo has proven useful in capturing team strength over time, it has several key limitations when applied to football.&lt;/p&gt;
&lt;h2 id="elo-ratings-a-brief-overview"&gt;Elo Ratings: A Brief Overview&lt;/h2&gt;
&lt;p&gt;Elo ratings work by assigning a numerical value to each team, which is adjusted after every match based on the result. If a higher-rated team wins, it gains only a small increase in its rating, whereas an underdog victory leads to a more significant rating adjustment. The formula accounts for the expected probability of winning, meaning that an upset results in a greater shift than an expected victory.&lt;/p&gt;
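&lt;p&gt;To make the update rule concrete, here is a minimal sketch of a generic Elo update. The function names, the K-factor of 20 and the 400-point scale are illustrative assumptions, not values taken from any particular football implementation.&lt;/p&gt;

```python
def elo_expected_score(rating_a, rating_b, scale=400.0):
    """Expected score for team A against team B on the logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a, rating_b, score_a, k=20.0):
    """Return both updated ratings.

    score_a is 1 for a win, 0.5 for a draw (the "half a win" convention)
    and 0 for a loss. The k value is an assumed, illustrative K-factor.
    """
    expected_a = elo_expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Whatever team A gains, team B loses, so the total stays constant.
    return rating_a + delta, rating_b - delta


# A favourite (1600) beating an underdog (1400) gains only a little...
fav_win = elo_update(1600, 1400, score_a=1.0)
# ...whereas the underdog winning produces a much larger swing.
upset = elo_update(1600, 1400, score_a=0.0)
print(fav_win, upset)
```

&lt;p&gt;Note that the score line never enters the calculation: only the result does, which is the root of several of the limitations discussed in this article.&lt;/p&gt;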
&lt;p&gt;The appeal of Elo lies in its simplicity: teams are ranked on a single scale, and their relative strength is updated dynamically based on match outcomes. However, despite its widespread use, Elo has several shortcomings that reduce its effectiveness in football.&lt;/p&gt;
&lt;h3 id="draws-are-not-handled"&gt;Draws Are Not Handled&lt;/h3&gt;
&lt;p&gt;A significant limitation of the standard Elo system is that it is built around a two-outcome expected score, so it doesn't natively model draws as a separate result. While this works well in sports like tennis, where every match has a winner, it's a major drawback in football, where draws are a common and meaningful outcome.&lt;/p&gt;
&lt;p&gt;Some football adaptations of Elo attempt to account for this by treating draws as "half a win," but prediction-wise, Elo doesn't model draws probabilistically — it simply doesn't output a draw probability. We'll return to this point later when we compare how Pi and Elo predictions perform head-to-head.&lt;/p&gt;
&lt;h3 id="score-margins-are-ignored"&gt;Score Margins Are Ignored&lt;/h3&gt;
&lt;p&gt;Elo ratings consider only whether a team wins, loses, or draws, but do not take into account the margin of victory. A 1-0 win is treated the same as a 5-0 win, even though the latter provides a much stronger indication of dominance. Since goal differences carry valuable information about team strength, ignoring them can lead to inaccurate assessments of performance.&lt;/p&gt;
&lt;h3 id="home-and-away-performances-are-not-handled-separately"&gt;Home and Away Performances Are Not Handled Separately&lt;/h3&gt;
&lt;p&gt;Football teams often exhibit significantly different performances at home and away due to factors such as crowd support, pitch familiarity, and travel fatigue. Traditional Elo ratings apply the same formula regardless of match location, failing to account for these home and away discrepancies. Some adaptations of Elo introduce a fixed home advantage adjustment, but this is often static and uniform across teams, whereas in reality, home advantage could vary by club and competition.&lt;/p&gt;
&lt;h3 id="slow-adaptation-to-recent-form"&gt;Slow Adaptation to Recent Form&lt;/h3&gt;
&lt;p&gt;Elo ratings update dynamically, but the changes are incremental and cumulative. This means that a team experiencing a sudden surge or decline in form may not have its rating adjust quickly enough to reflect its current strength. For instance, a team suffering from injuries to key players or undergoing a managerial change may take several matches before its Elo rating adequately reflects the shift in performance.&lt;/p&gt;
&lt;h1 id="what-are-pi-ratings"&gt;What Are Pi Ratings?&lt;/h1&gt;
&lt;p&gt;Pi Ratings (&lt;a href="https://www.degruyter.com/document/doi/10.1515/jqas-2012-0036/html"&gt;Constantinou &amp;amp; Fenton (2013)&lt;/a&gt;) are a dynamic rating system designed specifically for football, addressing several key shortcomings of traditional methods like Elo. &lt;/p&gt;
&lt;h2 id="understanding-pi-ratings"&gt;Understanding Pi Ratings&lt;/h2&gt;
&lt;p&gt;Pi Ratings are built on three key principles:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Score Margins Matter:&lt;/strong&gt; A team winning 5-0 should receive a greater rating boost than a team winning 1-0, as the larger margin suggests a stronger performance. Conversely, a narrow loss may not indicate a major decline in ability, whereas a heavy defeat should trigger a more significant rating drop.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Home and Away Ratings Are Separate:&lt;/strong&gt; Instead of using a fixed home advantage adjustment, Pi Ratings maintain distinct ratings for home and away performances. This allows the system to recognize that some teams perform significantly better at home than away, while others are more consistent across venues.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Recent Performance Is More Important:&lt;/strong&gt; A team’s current form is more relevant than performances from months ago. Pi Ratings incorporate a learning rate that ensures recent matches influence a team’s rating more strongly than older results - you can think of this as being somewhat similar to the Dixon and Coles decay rate we discussed in my &lt;a href="https://pena.lt/y/2025/03/10/which-model-should-you-use-to-predict-football-matches/#incorporating-time-weighted-data-enhancing-predictions-with-the-dixon-and-coles-approach"&gt;previous article on predicting football match outcomes&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="interpreting-pi-ratings"&gt;Interpreting Pi Ratings&lt;/h2&gt;
&lt;p&gt;Each team begins with a rating of 0, which represents the level of an average team in the data. The system is zero-centered, meaning that when one team gains rating points, the other team loses the same amount. This ensures that all ratings are relative — a team with a rating of +1.0 is one goal better, on average, than the typical opponent. This property also makes it possible to compare teams across leagues or seasons.&lt;/p&gt;
&lt;h2 id="how-pi-ratings-are-calculated"&gt;How Pi Ratings Are Calculated&lt;/h2&gt;
&lt;p&gt;Pi Ratings update dynamically after each match by comparing what was expected to happen with what actually happened — specifically, in terms of the goal difference. Here's how the process works:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;All teams start with a neutral rating of 0&lt;/strong&gt;, representing the average team in the dataset. Ratings rise or fall based on performance over time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Before a match, an expected goal difference is calculated&lt;/strong&gt; using the home team’s Pi Rating and the away team’s Pi Rating.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;After the match, the actual goal difference is compared to this expectation&lt;/strong&gt;. If a team overperforms or underperforms significantly, the rating adjustment is larger.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The greater the surprise, the bigger the adjustment&lt;/strong&gt;. For example, if a team was expected to lose 2–0 but instead wins 3–0, that’s a major signal of strength and leads to a sharp increase in rating.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Home and away performances are rated separately&lt;/strong&gt;. A strong home result boosts the home rating, and vice versa for away games. However, each update also slightly nudges the other (home affects away, and away affects home), allowing the model to learn cross-context performance with a catch-up learning rate.&lt;/li&gt;
&lt;/ul&gt;
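&lt;p&gt;The steps above can be sketched in a few lines of Python. This is a simplified illustration loosely based on the update described by Constantinou and Fenton, not the code used inside &lt;code&gt;penaltyblog&lt;/code&gt;. The helper names, the constants (&lt;code&gt;lam&lt;/code&gt;, &lt;code&gt;gamma&lt;/code&gt;, &lt;code&gt;b&lt;/code&gt;, &lt;code&gt;c&lt;/code&gt;) and the log-damped error are all assumptions made for the example.&lt;/p&gt;

```python
import math


def expected_goals(rating, b=10.0, c=3.0):
    """Map a rating to an expected goal difference against an average team."""
    return math.copysign(b ** (abs(rating) / c) - 1.0, rating)


def pi_update(home, away, goal_diff, lam=0.035, gamma=0.7, c=3.0):
    """Update two teams' ratings after one match (illustrative sketch).

    home and away are dicts with "home" and "away" ratings; goal_diff is
    home goals minus away goals. Returns new dicts, leaving inputs untouched.
    """
    expected_diff = expected_goals(home["home"]) - expected_goals(away["away"])
    error = goal_diff - expected_diff
    # Diminishing returns: bigger surprises count for more, but not linearly.
    weighted = math.copysign(c * math.log10(1.0 + abs(error)), error)

    home_step = lam * weighted    # home side's home rating follows the error
    away_step = -lam * weighted   # away side's away rating moves the other way
    new_home = {
        "home": home["home"] + home_step,
        "away": home["away"] + gamma * home_step,  # cross-venue catch-up nudge
    }
    new_away = {
        "home": away["home"] + gamma * away_step,
        "away": away["away"] + away_step,
    }
    return new_home, new_away


team_a = {"home": 0.0, "away": 0.0}
team_b = {"home": 0.0, "away": 0.0}
narrow = pi_update(team_a, team_b, goal_diff=1)  # 1-0 home win
heavy = pi_update(team_a, team_b, goal_diff=5)   # 5-0 home win
print(narrow[0]["home"], heavy[0]["home"])
```

&lt;p&gt;With two evenly rated teams, the 5-0 win moves the home rating further than the 1-0 win, but by less than five times as much, because the error is damped before the learning rate is applied.&lt;/p&gt;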
&lt;p&gt;Now that we’ve covered the fundamentals of how Pi Ratings work, let’s see them in action. In the rest of the article, we’ll apply the Pi system to recent football seasons and compare its performance to an Elo model, to see whether Pi Ratings provide a more accurate representation of team strength.&lt;/p&gt;
&lt;h1 id="installing-the-penaltyblog-python-package"&gt;Installing the penaltyblog Python Package&lt;/h1&gt;
&lt;p&gt;If you've not used the &lt;a href="https://github.com/martineastwood/penaltyblog"&gt;penaltyblog&lt;/a&gt; Python package before, you can install it using &lt;strong&gt;pip&lt;/strong&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;penaltyblog
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h1 id="calculating-pi-ratings-using-penaltyblog"&gt;Calculating Pi Ratings Using &lt;code&gt;penaltyblog&lt;/code&gt;&lt;/h1&gt;
&lt;p&gt;Let’s start by downloading some historical match data using &lt;a href="https://github.com/martineastwood/penaltyblog"&gt;penaltyblog&lt;/a&gt;, which provides an easy way to import football results from &lt;a href="https://football-data.co.uk/"&gt;football-data.co.uk&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pb&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;matplotlib&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;mpl&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;matplotlib.dates&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;mdates&lt;/span&gt;
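&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pd&lt;/span&gt;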


&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2018-2019&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2019-2020&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2020-2021&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2021-2022&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2022-2023&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2023-2024&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2024-2025&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2025-04-01&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Next, we'll loop through all the historical results from oldest to newest and update each team's Pi Rating based on the score line.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;pi_ratings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ratings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PiRatingSystem&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iterrows&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;goal_diff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;goals_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;goals_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;pi_ratings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;update_ratings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;goal_diff&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
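
&lt;p&gt;Once the loop has run, a natural next step is to pull out each team's most recent rating. The sketch below shows one way to do that using only plain Python. It assumes history records shaped like the ones used in the plots in this article (&lt;code&gt;team&lt;/code&gt;, &lt;code&gt;date&lt;/code&gt;, &lt;code&gt;home_rating&lt;/code&gt;, &lt;code&gt;away_rating&lt;/code&gt;). The records themselves are made-up placeholder values, and averaging the home and away ratings into a single overall figure is an assumption made for illustration.&lt;/p&gt;

```python
# Hypothetical records, shaped like the rating history entries used in this
# article. The team names and numbers are placeholders, not real ratings.
history = [
    {"team": "Team A", "date": "2025-01-04", "home_rating": 0.90, "away_rating": 0.70},
    {"team": "Team B", "date": "2025-01-04", "home_rating": -0.20, "away_rating": -0.40},
    {"team": "Team A", "date": "2025-01-11", "home_rating": 1.00, "away_rating": 0.80},
    {"team": "Team B", "date": "2025-01-11", "home_rating": -0.10, "away_rating": -0.30},
]

# Walk the records in date order so later entries overwrite earlier ones.
latest = {}
for record in sorted(history, key=lambda r: r["date"]):
    latest[record["team"]] = record

# Average home and away ratings into one overall figure and rank the teams.
overall = [
    (team, (r["home_rating"] + r["away_rating"]) / 2)
    for team, r in latest.items()
]
table = sorted(overall, key=lambda item: item[1], reverse=True)

for team, rating in table:
    print(f"{team:10s} {rating:+.2f}")
```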

&lt;h1 id="manchester-citys-pi-rating"&gt;Manchester City's Pi Rating&lt;/h1&gt;
&lt;p&gt;Let's take a look at Manchester City's Pi Rating over time.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;ratings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pi_ratings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rating_history&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Man City&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;ratings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ratings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ratings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;ratings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home_rating&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linewidth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ratings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;ratings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;away_rating&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linewidth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Pi Rating&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;xticks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yticks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;legend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tight_layout&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20250404_mcfc_pi.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 1: Manchester City's Pi Rating over time&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;We can see that it took about a year of data for Manchester City’s Pi Rating to stabilise after starting at zero, followed by a long stretch of dominance. A noticeable dip occurred during the first half of the 2023/2024 season, before a strong run of form saw them recover and eventually claim the title. &lt;/p&gt;
&lt;p&gt;The dramatic downturn in the 2024/2025 season (at least by Manchester City’s usual standards) is clearly reflected in the ratings, with the lowest point arriving at the end of 2024. Having watched a lot of Manchester City over the years, I find the ratings align closely with how the team’s form felt at the time.&lt;/p&gt;
&lt;h1 id="chelseas-pi-rating"&gt;Chelsea's Pi Rating&lt;/h1&gt;
&lt;p&gt;Let's take a look at Chelsea's Pi Rating next and overlay the dates they changed managers.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;ratings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pi_ratings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rating_history&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Chelsea&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;ratings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ratings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ratings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;ratings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home_rating&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linewidth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ratings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;ratings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;away_rating&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linewidth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Pi Rating&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;xticks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yticks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;legend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;date_num&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mdates&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datestr2num&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;2024-07-01&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vlines&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;date_num&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ymin&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ymax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linestyles&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;dotted&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;colors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;black&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_num&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ylim&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Maresca&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;left&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;va&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;top&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rotation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 

&lt;span class="n"&gt;date_num&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mdates&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datestr2num&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;2023-07-01&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vlines&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;date_num&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ymin&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ymax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linestyles&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;dotted&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;colors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;black&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_num&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ylim&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Pochettino&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;left&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;va&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;top&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rotation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 

&lt;span class="n"&gt;date_num&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mdates&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datestr2num&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;2023-04-06&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vlines&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;date_num&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ymin&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ymax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linestyles&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;dotted&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;colors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;black&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_num&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ylim&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Lampard&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;left&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;va&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;top&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rotation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 

&lt;span class="n"&gt;date_num&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mdates&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datestr2num&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;2022-09-08&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vlines&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;date_num&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ymin&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ymax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linestyles&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;dotted&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;colors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;black&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_num&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ylim&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Potter&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;left&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;va&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;top&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rotation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 

&lt;span class="n"&gt;date_num&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mdates&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datestr2num&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;2021-01-26&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vlines&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;date_num&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ymin&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ymax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linestyles&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;dotted&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;colors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;black&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_num&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ylim&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Tuchel&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;left&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;va&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;top&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rotation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 

&lt;span class="n"&gt;date_num&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mdates&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datestr2num&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;2019-07-04&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vlines&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;date_num&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ymin&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ymax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linestyles&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;dotted&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;colors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;black&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_num&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ylim&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Lampard&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;left&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;va&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;top&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rotation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;legend&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tight_layout&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20250404_chelsea_pi.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 2: Chelsea's Pi Rating over time&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Figure 2 tells a familiar story for Chelsea fans: manager changes often followed periods of declining performance, as reflected in sharp dips in the Pi Ratings. The chart clearly shows how Graham Potter and Frank Lampard oversaw particularly difficult spells, with both home and away ratings dropping significantly. Mauricio Pochettino, by contrast, made a noticeable impact with the Pi Ratings climbing steadily during his tenure. Based on how quickly Pochettino turned Chelsea's rating around, it's perhaps a shame that he was only able to stay for one season.&lt;/p&gt;
&lt;h1 id="using-pi-ratings-to-predict-matches"&gt;Using Pi Ratings to Predict Matches&lt;/h1&gt;
&lt;p&gt;Beyond rating teams, Pi Ratings can also be used to estimate the probability of a home win, draw, or away win between two sides. In my &lt;a href="https://pena.lt/y/2025/03/10/which-model-should-you-use-to-predict-football-matches/"&gt;previous article&lt;/a&gt;, we explored how well more detailed goals-based models performed on this dataset. Now, let’s put Pi Ratings to the test on the same matches and see how their predictive power compares.&lt;/p&gt;
&lt;p&gt;Let's start off by downloading the same data we used in the previous article.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pb&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;tqdm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tqdm&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;NLD Eredivisie&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2015-2016&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;NLD Eredivisie&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2016-2017&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;NLD Eredivisie&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2017-2018&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;NLD Eredivisie&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2018-2019&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;NLD Eredivisie&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2019-2020&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;NLD Eredivisie&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2020-2021&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;NLD Eredivisie&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2021-2022&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;NLD Eredivisie&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2022-2023&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;NLD Eredivisie&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2023-2024&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;NLD Eredivisie&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2024-2025&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;date&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;date&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ftr_map&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;H&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;D&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;A&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;ftr_numeric&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;ftr&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ftr_map&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Next, we loop through the test data, using the Pi Ratings to predict each fixture's outcome. After making a prediction, we update the model with the actual result. To evaluate how well the model performs, we use the &lt;a href="http://constantinou.info/downloads/papers/solvingtheproblem.pdf"&gt;Ranked Probability Score&lt;/a&gt; (RPS), which measures the accuracy of probabilistic predictions.&lt;/p&gt;
&lt;p&gt;I explained more about the code and the RPS metric in my &lt;a href="https://pena.lt/y/2025/03/10/which-model-should-you-use-to-predict-football-matches/"&gt;previous article&lt;/a&gt;, so I won’t repeat everything here, but if you're new to these concepts, I recommend giving it a read first.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;start_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;season == &amp;#39;2023-2024&amp;#39;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;min&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;run_dates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;start_date&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unique&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;pi_ratings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ratings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PiRatingSystem&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;observed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tqdm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_dates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Processing dates&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;homes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
            &lt;span class="n"&gt;aways&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
            &lt;span class="n"&gt;fthg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;fthg&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
            &lt;span class="n"&gt;ftag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ftag&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
            &lt;span class="n"&gt;outcomes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ftr_numeric&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;

            &lt;span class="n"&gt;probabilities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pi_ratings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;calculate_match_probabilities&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;homes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;aways&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;probabilities&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home_win&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;probabilities&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;draw&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;probabilities&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;away_win&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;
            &lt;span class="n"&gt;observed&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outcomes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

            &lt;span class="n"&gt;goal_diff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fthg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;ftag&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;pi_ratings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;update_ratings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;homes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;aways&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;goal_diff&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="ne"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Skip any fixture that raises an error and move on to the next one&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;

&lt;span class="n"&gt;rps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rps_average&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;observed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;RPS: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;rps&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;RPS:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;.19905621086882397
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The average RPS for Pi Ratings on this dataset was 0.199, which gives us a solid benchmark to compare against other models. While this number doesn’t mean much in isolation, lower values indicate more accurate probabilistic predictions — so the real test is how it stacks up next to Elo and the goals-based models from the &lt;a href="https://pena.lt/y/2025/03/10/which-model-should-you-use-to-predict-football-matches/"&gt;previous article&lt;/a&gt;.&lt;/p&gt;
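&lt;p&gt;For readers who want to see what the metric measures, here is a minimal sketch of the Ranked Probability Score for a single match. It is not penaltyblog's implementation; the function names and the outcome encoding (0 = home win, 1 = draw, 2 = away win) are assumptions made for illustration.&lt;/p&gt;

```python
def ranked_probability_score(probs, outcome_index):
    """RPS for one match; probs are ordered [home win, draw, away win]."""
    n = len(probs)
    cum_pred = 0.0
    cum_obs = 0.0
    total = 0.0
    # Compare the cumulative forecast with the cumulative outcome over the
    # first n - 1 categories (the final cumulative difference is always 0).
    for i in range(n - 1):
        cum_pred += probs[i]
        cum_obs += 1.0 if i == outcome_index else 0.0
        total += (cum_pred - cum_obs) ** 2
    return total / (n - 1)


def rps_mean(predictions, observed):
    # Average the per-match scores; hypothetical helper, not the library call.
    scores = [ranked_probability_score(p, o) for p, o in zip(predictions, observed)]
    return sum(scores) / len(scores)
```

&lt;p&gt;Under these assumptions, a forecast of [0.5, 0.3, 0.2] scores 0.145 if the home side wins and 0.445 if the away side wins, so being confidently wrong is penalised more heavily than being roughly right.&lt;/p&gt;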
&lt;h1 id="comparing-elo-and-pi-ratings"&gt;Comparing Elo and Pi Ratings&lt;/h1&gt;
&lt;p&gt;As mentioned earlier, standard Elo Ratings only handle wins and losses. This makes them less suitable for football, where draws are common and carry meaningful information — especially when it comes to making probabilistic predictions.&lt;/p&gt;
&lt;p&gt;To allow for a fair comparison with Pi Ratings, I extended the standard Elo system in &lt;a href="https://github.com/martineastwood/penaltyblog"&gt;penaltyblog&lt;/a&gt; to handle draws in both rating updates and match predictions. In the rating updates, draws are treated as being worth half a win (0.5), a common adaptation in football Elo models.&lt;/p&gt;
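&lt;p&gt;The half-a-win treatment can be sketched with the textbook Elo update. This is an illustration rather than penaltyblog's code: the function names, the K-factor of 20, and the absence of any home-advantage term are all assumptions.&lt;/p&gt;

```python
def elo_expected_home(rating_home, rating_away):
    # Standard logistic Elo expectation on the usual 400-point scale.
    return 1.0 / (1.0 + 10.0 ** ((rating_away - rating_home) / 400.0))


def elo_update(rating_home, rating_away, result, k=20.0):
    """result: 1.0 home win, 0.5 draw, 0.0 away win. k=20 is illustrative."""
    expected = elo_expected_home(rating_home, rating_away)
    delta = k * (result - expected)
    # Whatever one side gains, the other loses.
    return rating_home + delta, rating_away - delta
```

&lt;p&gt;With this update, a draw between equally rated teams changes nothing, while a draw against a weaker opponent costs the favourite a few points.&lt;/p&gt;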
&lt;p&gt;Standard Elo also has no native way to predict a draw, so for match predictions I added a Gaussian-shaped draw probability model. This approach assumes draws are most likely when teams are closely matched and become less likely as the rating gap widens.&lt;/p&gt;
&lt;p&gt;The draw probability is computed as:&lt;/p&gt;
&lt;p&gt;$$\text{p_draw} = \text{draw_base} \cdot \exp\left(-\frac{(\text{elo_diff})^2}{2 \cdot (\text{draw_width})^2}\right)$$&lt;/p&gt;
&lt;p&gt;Where:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;$\text{draw_base}$ is the peak draw probability, reached when the two teams are equally rated (default 0.3).&lt;/li&gt;
&lt;li&gt;$\text{elo_diff}$ is the difference in Elo ratings between the two teams.&lt;/li&gt;
&lt;li&gt;$\text{draw_width}$ is the width of the Gaussian distribution, controlling how quickly the probability of a draw decreases with increasing Elo rating difference.&lt;/li&gt;
&lt;/ul&gt;
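&lt;p&gt;As a quick sketch, the formula above translates directly into Python. The function name is hypothetical and the &lt;code&gt;draw_width&lt;/code&gt; value of 200 is a placeholder chosen for illustration, not necessarily the library default.&lt;/p&gt;

```python
import math


def draw_probability(elo_diff, draw_base=0.3, draw_width=200.0):
    # draw_width=200.0 is a placeholder for illustration only.
    return draw_base * math.exp(-(elo_diff ** 2) / (2.0 * draw_width ** 2))


# The draw probability peaks at draw_base and decays as the gap widens.
for diff in (0, 100, 200, 400):
    print(diff, round(draw_probability(diff), 4))
```

&lt;p&gt;Because the rating difference is squared, the curve is symmetric: it does not matter which side is the favourite, only how large the gap is.&lt;/p&gt;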
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;start_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;season == &amp;#39;2023-2024&amp;#39;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;min&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;run_dates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;start_date&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unique&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;elo_ratings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ratings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Elo&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;observed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tqdm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_dates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Processing dates&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;homes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
            &lt;span class="n"&gt;aways&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
            &lt;span class="n"&gt;fthg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;fthg&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
            &lt;span class="n"&gt;ftag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ftag&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
            &lt;span class="n"&gt;outcomes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ftr_numeric&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;

            &lt;span class="n"&gt;probabilities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;elo_ratings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict_match_outcome_probs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;homes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;aways&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;probabilities&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home_win&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;probabilities&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;draw&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;probabilities&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;away_win&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;
            &lt;span class="n"&gt;observed&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outcomes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="n"&gt;elo_ratings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;update_ratings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;homes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;aways&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;outcomes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="ne"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;    

&lt;span class="n"&gt;rps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rps_average&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;observed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;RPS: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;rps&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;RPS:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;.2041672568256303
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The results show that Pi Ratings outperform Elo when it comes to predicting football matches. While both systems are simple and fast, Pi Ratings offer several advantages that make them better suited to the dynamics of football. They were designed specifically for the sport, natively handle draws, and update team ratings based on score margins — capturing not just whether a team won or lost, but how convincingly. &lt;/p&gt;
&lt;p&gt;Pi Ratings also separate home and away performances and adapt more quickly to changes in form. These enhancements result in more accurate probability estimates, as reflected in the lower Ranked Probability Score. For anyone working with football data — especially in situations where simplicity and speed matter — Pi Ratings offer a powerful upgrade over traditional Elo.&lt;/p&gt;
&lt;h1 id="comparing-ratings-models-to-full-statistical-models"&gt;Comparing Ratings Models to Full Statistical Models&lt;/h1&gt;
&lt;p&gt;While Pi Ratings offer a clear improvement over traditional Elo in the context of football, it's worth asking how they stack up against more advanced, purpose-built statistical models. These models — like Dixon and Coles, Bivariate Poisson, and Zero-Inflated Poisson — are designed specifically for predicting football scores, often incorporating detailed assumptions about goal distributions, time decay, and team-specific strengths.&lt;/p&gt;
&lt;p&gt;Let’s compare the predictive accuracy of these models against both Pi Ratings and Elo, using the same dataset and evaluation metric (Ranked Probability Score).&lt;/p&gt;
&lt;table&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th&gt;Model&lt;/th&gt;
            &lt;th&gt;Ranked Probability Score (RPS)&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
        &lt;tr&gt;
            &lt;td&gt;Dixon and Coles&lt;/td&gt;
            &lt;td&gt;0.19137780685608083&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Weibull Count&lt;/td&gt;
            &lt;td&gt;0.19141358825225932&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Poisson&lt;/td&gt;
            &lt;td&gt;0.19154229559464445&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Zero-inflated Poisson&lt;/td&gt;
            &lt;td&gt;0.19154043298013113&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Negative Binomial&lt;/td&gt;
            &lt;td&gt;0.19155750459845977&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Bivariate Poisson&lt;/td&gt;
            &lt;td&gt;0.19161764011301444&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Pi Ratings&lt;/td&gt;
            &lt;td&gt;0.19905621086882397&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Elo Ratings&lt;/td&gt;
            &lt;td&gt;0.2041672568256303&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;em&gt;Table 1: Ranked Probability Scores for the different model types&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;While the statistical models clearly lead in predictive accuracy, this comes with trade-offs. They often require more data, greater model complexity, and longer computation times — along with domain-specific assumptions that aren’t always easy to explain or adapt. &lt;/p&gt;
&lt;p&gt;Pi Ratings, by contrast, strike an effective middle ground. They’re more accurate than Elo, require only basic match results, and are fast and interpretable — making them ideal for many real-world applications where simplicity, speed, and solid predictive power matter more than squeezing out marginal gains.&lt;/p&gt;
&lt;h1 id="conclusions"&gt;Conclusions&lt;/h1&gt;
&lt;p&gt;If you’re rating chess players, Elo is exactly what you need — it’s simple, elegant, and perfectly suited to win/loss outcomes. But football is a different game: draws are common, scorelines matter, and form fluctuates quickly. That’s where Pi Ratings shine.&lt;/p&gt;
&lt;p&gt;They offer a smarter, football-specific alternative to Elo — capturing performance nuances, handling home and away differences, and adapting more quickly to change. While not as accurate as heavyweight statistical models, Pi Ratings strike a valuable middle ground: fast, interpretable, and amazingly powerful for something so simple.&lt;/p&gt;
&lt;p&gt;If you're building a football model, running simulations, or just want a better way to track your team’s strength — start with Pi.&lt;/p&gt;</content><category term="Prediction"></category><category term="Ratings"></category><category term="Ranking"></category><category term="Pi-ratings"></category><category term="Elo"></category></entry><entry><title>Football Prediction Models: Which Ones Work the Best?</title><link href="2025/03/10/which-model-should-you-use-to-predict-football-matches/" rel="alternate"></link><published>2025-03-10T19:30:00+00:00</published><updated>2025-03-10T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2025-03-10:2025/03/10/which-model-should-you-use-to-predict-football-matches/</id><summary type="html">&lt;p&gt;Comparing football goals models — Poisson, Dixon-Coles, and more — to see which predicts best and how to optimize them...&lt;/p&gt;</summary><content type="html">&lt;h1 id="introduction"&gt;Introduction&lt;/h1&gt;
&lt;p&gt;I've recently released version &lt;code&gt;1.1.0&lt;/code&gt; of my &lt;a href="https://github.com/martineastwood/penaltyblog"&gt;penaltyblog&lt;/a&gt; Python package, bringing significant improvements to the speed and predictive performance of football (soccer) goals models. With this update, I thought it would be a great opportunity to compare the different models available — such as Poisson, Dixon and Coles, and more — exploring how they work, how to optimize their parameters, and how they perform on real-world data. &lt;/p&gt;
&lt;p&gt;Let's start off with a high-level look at the different models available, looking at how they work, what their strengths are and what their weaknesses are.&lt;/p&gt;
&lt;h2 id="what-is-the-poisson-goals-model"&gt;What is the Poisson Goals Model?&lt;/h2&gt;
&lt;p&gt;One of the most widely used models for predicting football scores is the Poisson goals model. It assumes that the number of goals scored by each team follows a &lt;a href="https://en.wikipedia.org/wiki/Poisson_distribution"&gt;Poisson distribution&lt;/a&gt; — essentially meaning that goals occur randomly over time, but with a certain average rate (known as λ). This rate is determined by factors like a team’s attacking strength, their opponent’s defensive ability, and whether they are playing at home or away.&lt;/p&gt;
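&lt;p&gt;To make this concrete, the sketch below turns two scoring rates into home win, draw, and away win probabilities by summing over a grid of scorelines. The rates are taken as given inputs here (in a fitted model they would come from the attack, defence, and home advantage parameters), and the function names and the cut-off of 10 goals are assumptions for illustration.&lt;/p&gt;

```python
import math


def poisson_pmf(k, lam):
    # Probability of exactly k goals when the average rate is lam.
    return math.exp(-lam) * lam ** k / math.factorial(k)


def match_outcome_probs(lam_home, lam_away, max_goals=10):
    """Sum an independent-Poisson score grid into (home, draw, away)."""
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return home, draw, away


print(match_outcome_probs(1.6, 1.1))
```

&lt;p&gt;Truncating the grid at 10 goals per team loses a negligible amount of probability for typical football scoring rates.&lt;/p&gt;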
&lt;h4 id="strengths-of-the-poisson-goals-model"&gt;Strengths of the Poisson Goals Model&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Simple and efficient – It’s easy to understand and requires only a few parameters, making it quick to calculate.&lt;/li&gt;
&lt;li&gt;Decent predictive power – Despite its simplicity, it often performs well in forecasting match outcomes.&lt;/li&gt;
&lt;li&gt;Useful for betting and analytics – Many bookmakers and analysts use variations of the Poisson model as a baseline.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="weaknesses-of-the-poisson-goals-model"&gt;Weaknesses of the Poisson Goals Model&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Overestimates extreme results – Because it assumes goal events are independent, it sometimes predicts too many high-scoring games.&lt;/li&gt;
&lt;li&gt;Doesn’t handle low-scoring bias well – Football has more 0-0 and 1-1 draws than the model naturally expects.&lt;/li&gt;
&lt;li&gt;Ignores game dynamics – It doesn’t account for strategic shifts (e.g., teams playing more defensively when leading).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To address some of these limitations, more sophisticated models have been developed.&lt;/p&gt;
&lt;h2 id="what-is-the-dixon-and-coles-goals-model"&gt;What is the Dixon and Coles Goals Model?&lt;/h2&gt;
&lt;p&gt;The Dixon and Coles goals model is an improvement on the standard Poisson approach for predicting football scores. While the Poisson model assumes goals are scored independently, Dixon and Coles recognized that this assumption doesn’t always hold in real-world matches. In particular, low-scoring games, especially 0-0, 1-0, and 1-1, happen more often than a basic Poisson model would predict. To correct this, they introduced an additional adjustment factor that modifies the probabilities of low-score outcomes, making the model more accurate for practical football forecasting.&lt;/p&gt;
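&lt;p&gt;The adjustment factor can be sketched as follows, using the correction term from Dixon and Coles' 1997 paper applied on top of independent Poisson probabilities. The function names and the example value of &lt;code&gt;rho&lt;/code&gt; are illustrative assumptions; in practice &lt;code&gt;rho&lt;/code&gt; is estimated alongside the other parameters.&lt;/p&gt;

```python
import math


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def dc_tau(home_goals, away_goals, lam_home, lam_away, rho):
    """Low-score correction factor; equals 1 for every other scoreline."""
    if home_goals == 0 and away_goals == 0:
        return 1.0 - lam_home * lam_away * rho
    if home_goals == 0 and away_goals == 1:
        return 1.0 + lam_home * rho
    if home_goals == 1 and away_goals == 0:
        return 1.0 + lam_away * rho
    if home_goals == 1 and away_goals == 1:
        return 1.0 - rho
    return 1.0


def dc_score_prob(home_goals, away_goals, lam_home, lam_away, rho):
    # Independent Poisson probability, rescaled by the correction factor.
    base = poisson_pmf(home_goals, lam_home) * poisson_pmf(away_goals, lam_away)
    return dc_tau(home_goals, away_goals, lam_home, lam_away, rho) * base


print(dc_score_prob(0, 0, 1.6, 1.1, rho=-0.1))
```

&lt;p&gt;With a negative &lt;code&gt;rho&lt;/code&gt;, the two low-scoring draws (0-0 and 1-1) become more likely, and the four adjustments cancel each other out, so the score grid still sums to one.&lt;/p&gt;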
&lt;h4 id="strengths-of-the-dixon-and-coles-goals-model"&gt;Strengths of the Dixon and Coles Goals Model&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;More realistic score predictions – It better accounts for the observed tendency of football matches to produce more low-score draws.&lt;/li&gt;
&lt;li&gt;Improves predictive accuracy – By adjusting goal probabilities, it refines match outcome predictions, especially for betting and analytics.&lt;/li&gt;
&lt;li&gt;Still relatively simple – It builds on the Poisson model without adding excessive complexity, making it practical to implement.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="weaknesses-of-the-dixon-and-coles-goals-model"&gt;Weaknesses of the Dixon and Coles Goals Model&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Less effective for high-scoring matches – Since it primarily corrects low-score probabilities, it may not offer much improvement for teams or leagues where high-scoring games are common.&lt;/li&gt;
&lt;li&gt;Parameter estimation can be trickier – The extra adjustment introduces an additional parameter that needs to be optimized, making implementation more complex.&lt;/li&gt;
&lt;li&gt;Assumes the same adjustment applies to all matches – The correction factor is applied uniformly, meaning it doesn’t adapt dynamically to different match contexts or team styles.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Overall, the Dixon and Coles model offers a small but meaningful refinement over Poisson, making it a better choice when predicting match results, particularly in low-scoring leagues or competitions where draws are common.&lt;/p&gt;
&lt;h2 id="what-is-the-bivariate-poisson-goals-model"&gt;What is the Bivariate Poisson Goals Model?&lt;/h2&gt;
&lt;p&gt;The Bivariate Poisson goals model is a more advanced approach to predicting football match scores that builds on the standard Poisson goals model by introducing correlation between the number of goals scored by each team. &lt;/p&gt;
&lt;p&gt;Unlike the basic Poisson goals model, which assumes that each team’s goal count is independent of the other, the Bivariate Poisson goals model acknowledges that certain match factors, such as overall game tempo, attacking intent, or defensive frailties, can influence both teams simultaneously, so the two teams' scores are correlated rather than independent.&lt;/p&gt;
&lt;p&gt;To achieve this, the model introduces a shared dependency term, which captures the extent to which goal-scoring events are linked. For example, in high-tempo matches where both teams play aggressively, a high-scoring game (like 3-2) may be more likely than two independent Poisson processes would suggest. Conversely, in defensive matchups, both teams might struggle to score, increasing the likelihood of a low-scoring draw.&lt;/p&gt;
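&lt;p&gt;One common way to build such a shared term is to give each team its own Poisson component and add a third Poisson component to both scores. The sketch below follows that construction; the function and parameter names are assumptions, and penaltyblog's implementation may be parameterised differently.&lt;/p&gt;

```python
import math


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def bivariate_poisson_pmf(home_goals, away_goals, lam_home, lam_away, lam_shared):
    """P(home_goals, away_goals) when each score = own Poisson + shared Poisson."""
    total = 0.0
    # k is the part of both scores that comes from the shared component.
    for k in range(min(home_goals, away_goals) + 1):
        total += (
            poisson_pmf(k, lam_shared)
            * poisson_pmf(home_goals - k, lam_home)
            * poisson_pmf(away_goals - k, lam_away)
        )
    return total


print(bivariate_poisson_pmf(1, 1, 1.2, 0.9, 0.2))
```

&lt;p&gt;In this form the covariance between the two scores equals &lt;code&gt;lam_shared&lt;/code&gt;, so the construction can only represent zero or positive correlation, and setting &lt;code&gt;lam_shared&lt;/code&gt; to zero recovers two independent Poisson models.&lt;/p&gt;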
&lt;h4 id="strengths-of-the-bivariate-poisson-goals-model"&gt;Strengths of the Bivariate Poisson Goals Model&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Accounts for correlation in goal-scoring – Unlike simpler models, it recognizes that teams don’t score in isolation; match dynamics often affect both sides.&lt;/li&gt;
&lt;li&gt;Improves predictions across all scorelines – While Dixon and Coles only corrects for low-score biases, the Bivariate Poisson goals model affects high-score predictions as well.&lt;/li&gt;
&lt;li&gt;More flexible than Poisson and Dixon and Coles – it extends the standard Poisson goals framework without imposing arbitrary score adjustments.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="weaknesses-of-the-bivariate-poisson-goals-model"&gt;Weaknesses of the Bivariate Poisson Goals Model&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;More complex to estimate – the additional correlation parameter makes it harder to fit the model to data, requiring more advanced statistical techniques.&lt;/li&gt;
&lt;li&gt;Less interpretable than simpler models – while Poisson-based models are easy to explain, the dependency structure in Bivariate Poisson makes it more abstract.&lt;/li&gt;
&lt;li&gt;Can be unnecessary for low-scoring leagues – if most matches end with 0-0, 1-0, or 1-1 scorelines, the extra complexity might not give noticeable accuracy gains over simpler models.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Overall, the Bivariate Poisson goals model is a strong choice for capturing the interaction between teams, particularly in leagues or matchups where attacking styles, game tempo, or defensive weaknesses lead to both high and low-scoring outcomes. However, its added complexity means it may not always be worth the effort compared with a simpler model like Dixon and Coles.&lt;/p&gt;
&lt;h2 id="what-is-the-zero-inflated-poisson-goals-model"&gt;What is the Zero-inflated Poisson Goals Model?&lt;/h2&gt;
&lt;p&gt;The Zero-Inflated Poisson (ZIP) goals model is an extension of the standard Poisson goals model that accounts for the fact that football matches often produce more zero-goal tallies than a standard Poisson process predicts, whether that is a goalless 0-0 or one side failing to score in a 1-0 or 0-1. In simple terms, the ZIP model assumes that some matches have a high probability of producing zero goals due to defensive tactics, lack of attacking quality, or other match-specific factors that the standard Poisson goals model does not capture.&lt;/p&gt;
&lt;p&gt;This is done by introducing an extra parameter, which represents the probability that a match belongs to a special zero-inflated category rather than following the usual Poisson goal distribution. If a match is in this category, its goal count is forced to be zero. Otherwise, the number of goals is drawn from a standard Poisson distribution. This allows the model to explicitly account for the excess number of goalless matches while still modeling other scorelines using a Poisson process.&lt;/p&gt;
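&lt;p&gt;The mixture described above can be written in a few lines. This is a sketch of the generic zero-inflated Poisson distribution for a single goal count; the names &lt;code&gt;zip_pmf&lt;/code&gt; and &lt;code&gt;pi_zero&lt;/code&gt; are assumptions for illustration.&lt;/p&gt;

```python
import math


def zip_pmf(k, lam, pi_zero):
    """Zero-inflated Poisson: extra probability pi_zero placed on k == 0."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Either the forced-zero category, or an ordinary Poisson zero.
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson


print(zip_pmf(0, 1.4, 0.1), zip_pmf(1, 1.4, 0.1))
```

&lt;p&gt;Setting &lt;code&gt;pi_zero&lt;/code&gt; to zero gives back the plain Poisson distribution, which is why the model can never fit worse than Poisson on the training data.&lt;/p&gt;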
&lt;h4 id="strengths-of-the-zero-inflated-poisson-goals-model"&gt;Strengths of the Zero-Inflated Poisson Goals Model&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Better at modeling goalless matches – it accounts for the tendency of football matches to have more 0-0 results than a pure Poisson process would predict.&lt;/li&gt;
&lt;li&gt;Improves accuracy for defensive teams – useful in competitions or teams where ultra-defensive tactics result in frequent low-scoring games.&lt;/li&gt;
&lt;li&gt;Still relatively simple – it only adds one extra parameter to the Poisson model.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="weaknesses-of-the-zero-inflated-poisson-goals-model"&gt;Weaknesses of the Zero-Inflated Poisson Goals Model&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Only adjusts for excess zero-score games – the ZIP goals model specifically corrects for too many goalless matches but does not address other common Poisson goals model issues, such as overestimating extreme scorelines (e.g., 4-3 or 5-2 results).&lt;/li&gt;
&lt;li&gt;Assumes zero-inflation is the same across all matches – the model applies a single probability for zero-goal inflation, meaning it doesn’t adapt dynamically to different teams, leagues, or match conditions (e.g., some teams naturally play more defensive football than others).&lt;/li&gt;
&lt;li&gt;May not improve predictions in high-scoring leagues – in competitions where 0-0 results are not unusually frequent, the extra complexity of zero-inflation is unnecessary and may not lead to better forecasts.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Overall, the Zero-Inflated Poisson goals model is a useful model when working with leagues or teams that have a higher-than-expected number of goalless results; however, it doesn’t adjust for other scoreline distortions.&lt;/p&gt;
&lt;h2 id="what-is-the-negative-binomial-goals-model"&gt;What is the Negative Binomial Goals Model?&lt;/h2&gt;
&lt;p&gt;The Negative Binomial goals model is an extension of the standard Poisson goals model that addresses the issue of overdispersion, where the variance in goal counts is greater than the mean. The Poisson model cannot handle this properly because its variance always equals its mean. In football data, overdispersion often occurs because real-world scorelines include more variability than the simple Poisson process predicts.&lt;/p&gt;
&lt;p&gt;The Negative Binomial goals model attempts to solve this by introducing an extra parameter, which allows the variance to be larger than the mean, making it more flexible in capturing a wider range of goal distributions. Instead of assuming a fixed rate of goal-scoring for each team, it accounts for additional variability in scoring ability between different matches. This makes it particularly useful for leagues or teams where results can fluctuate significantly from game to game.&lt;/p&gt;
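&lt;p&gt;As a rough sketch of how the extra parameter works, the function below parameterises the Negative Binomial by its mean and a dispersion value, giving a variance of mean + mean² / dispersion. The function and parameter names are assumptions, and other libraries may use a different parameterisation.&lt;/p&gt;

```python
import math


def neg_binomial_pmf(k, mean, dispersion):
    """Negative Binomial with variance mean + mean**2 / dispersion."""
    # Work in log space so the gamma functions stay numerically stable.
    log_coef = (
        math.lgamma(k + dispersion) - math.lgamma(dispersion) - math.lgamma(k + 1)
    )
    p = dispersion / (dispersion + mean)
    return math.exp(log_coef + dispersion * math.log(p) + k * math.log(1.0 - p))


print(neg_binomial_pmf(4, 1.5, 5.0))
```

&lt;p&gt;As the dispersion value grows, the extra variance term shrinks and the distribution approaches a Poisson with the same mean.&lt;/p&gt;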
&lt;h4 id="strengths-of-the-negative-binomial-goals-model"&gt;Strengths of the Negative Binomial Goals Model&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Handles overdispersion effectively – it allows for greater variance in goal counts, making it more accurate for leagues or teams with unpredictable scorelines.&lt;/li&gt;
&lt;li&gt;More realistic high-score predictions – unlike the standard Poisson model, it doesn’t underestimate the frequency of extreme results (e.g., 4-2, 5-3), making it more reliable for goal-heavy leagues.&lt;/li&gt;
&lt;li&gt;Still interpretable and relatively simple – it’s a natural extension of Poisson, meaning it retains much of the intuitiveness and ease of implementation while improving accuracy.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="weaknesses-of-the-negative-binomial-goals-model"&gt;Weaknesses of the Negative Binomial Goals Model&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Doesn’t model goal correlation between teams – while it improves goal variance, it still treats each team’s goal count as independent, meaning it doesn’t account for game dynamics where both teams’ performances are linked (unlike the Bivariate Poisson goals model).&lt;/li&gt;
&lt;li&gt;Can be unnecessary for low-scoring leagues – if a competition has mostly 0-0, 1-0, or 1-1 matches, the added flexibility of the Negative Binomial goals model may not provide a significant advantage over simpler models like Poisson or Dixon and Coles goal models.&lt;/li&gt;
&lt;li&gt;Requires an extra parameter to estimate – while not overly complex, it adds another layer of statistical estimation, which can make optimization and model fitting slightly more challenging compared to a basic Poisson model.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The Negative Binomial goals model is particularly useful in high-scoring leagues or tournaments where teams display inconsistent scoring patterns. It’s a strong alternative when Poisson-based models struggle with overdispersion, but it won’t help much in low-scoring leagues where correcting draw probabilities (as Dixon-Coles does) is more important.&lt;/p&gt;
&lt;h2 id="what-is-the-weibull-count-copula-goals-model"&gt;What is the Weibull Count + Copula Goals Model?&lt;/h2&gt;
&lt;p&gt;Instead of assuming that goals follow a Poisson or Negative Binomial distribution, this approach uses a Weibull count distribution, which allows for more flexible goal distributions. This can be useful since real-world goal counts often don't follow a Poisson distribution. The Weibull goals model may be able to better capture the empirical shape of goal distributions, accommodating overdispersion and other nuances in scoring patterns.&lt;/p&gt;
&lt;p&gt;The model also incorporates a &lt;a href="https://en.wikipedia.org/wiki/Copula_(statistics)"&gt;copula&lt;/a&gt;, which allows for goal-scoring correlation between teams. Unlike, say, the standard Bivariate Poisson model, which assumes a specific form of dependence, the copula framework is more flexible and can capture different types of relationships between teams' goal counts, such as how a team's attacking performance influences their opponent's defensive response.&lt;/p&gt;
&lt;h4 id="strengths-of-the-weibull-count-copula-goals-model"&gt;Strengths of the Weibull Count + Copula Goals Model&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;More flexible than Poisson-based models – the Weibull count model does not assume a fixed mean-variance relationship, making it better at handling overdispersion and improving accuracy for extreme scorelines.&lt;/li&gt;
&lt;li&gt;Accounts for goal correlation in a more general way – unlike the Bivariate Poisson model, which assumes a specific type of dependency, the copula approach allows for a wider range of correlations between teams' goal counts.&lt;/li&gt;
&lt;li&gt;Better predictive power in certain contexts – the Weibull Count + Copula approach can often outperform traditional Poisson-based models, particularly in leagues with more complex goal-scoring patterns.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="weaknesses-of-the-weibull-count-copula-goals-model"&gt;Weaknesses of the Weibull Count + Copula Goals Model&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Significantly more complex to implement – Unlike simpler Poisson-based models, this approach requires some heavy mathematical techniques to estimate the Weibull parameters and copula dependency structure, making it harder to apply in practice.&lt;/li&gt;
&lt;li&gt;More computationally expensive – The added flexibility comes at a cost: fitting the model requires more intensive calculations, which can be impractical for large-scale applications compared to Poisson-based models.&lt;/li&gt;
&lt;li&gt;Not always a clear improvement over simpler models – while it offers greater flexibility, in many cases simpler models like Dixon and Coles or Negative Binomial goals models still perform well enough without the added complexity, making this model potentially unnecessary for certain leagues or datasets.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This model is most useful in situations where traditional Poisson-based models struggle, for example, in leagues where goal distributions exhibit strong overdispersion or where team interactions significantly influence each other's scoring potential. However, it is significantly more time consuming to fit and may not always provide a clear improvement over simpler models.&lt;/p&gt;
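&lt;p&gt;A quick way to see whether overdispersion is likely to matter for your data is to compare the variance of the goal counts with their mean. The sketch below is only an illustration: &lt;code&gt;dispersion_index&lt;/code&gt; is a hypothetical helper rather than part of &lt;code&gt;penaltyblog&lt;/code&gt;, and the goal counts are simulated with NumPy instead of taken from real fixtures. A value close to 1 is what a Poisson model expects, while values well above 1 suggest overdispersion.&lt;/p&gt;

```python
import numpy as np


def dispersion_index(goals):
    """Variance-to-mean ratio of a set of goal counts (hypothetical helper)."""
    goals = np.asarray(goals, dtype=float)
    return float(goals.var(ddof=1) / goals.mean())


rng = np.random.default_rng(42)
mean_goals = 1.5

# Simulated Poisson goals: variance should be roughly equal to the mean
poisson_goals = rng.poisson(lam=mean_goals, size=10_000)

# Simulated Negative Binomial goals with the same mean but extra variance
# (variance = mean + mean**2 / n, so about 2.25 here)
n = 3
nb_goals = rng.negative_binomial(n=n, p=n / (n + mean_goals), size=10_000)

print("Poisson dispersion index:", round(dispersion_index(poisson_goals), 3))
print("Negative Binomial dispersion index:", round(dispersion_index(nb_goals), 3))
```

&lt;p&gt;On real data you would pass in the home and away goal columns instead of the simulated arrays. If the index sits near 1, the simpler Poisson-based models are probably a reasonable starting point.&lt;/p&gt;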
&lt;p&gt;The table below provides a quick reference to help you decide which model best suits your needs based on factors like scoring patterns, goal correlation, and computational complexity.&lt;/p&gt;
&lt;table&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th&gt;Model&lt;/th&gt;
            &lt;th&gt;Strengths&lt;/th&gt;
            &lt;th&gt;Weaknesses&lt;/th&gt;
            &lt;th&gt;Best Used For&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;strong&gt;Poisson&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;Simple, efficient, widely used&lt;/td&gt;
            &lt;td&gt;Overpredicts high scores, doesn’t handle low-score bias well&lt;/td&gt;
            &lt;td&gt;General forecasting, fast model training&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;strong&gt;Dixon and Coles&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;Corrects low-score biases, better real-world accuracy&lt;/td&gt;
            &lt;td&gt;Assumes fixed adjustment across all matches, extra parameter tuning needed&lt;/td&gt;
            &lt;td&gt;Low-scoring leagues, competitions with many draws&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;strong&gt;Bivariate Poisson&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;Models goal correlation between teams, useful for high-scoring matches&lt;/td&gt;
            &lt;td&gt;Complex, harder to interpret, computationally expensive&lt;/td&gt;
            &lt;td&gt;High-scoring leagues, capturing match dynamics&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;strong&gt;Zero-Inflated Poisson&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;Better at modeling 0-0 draws, improves accuracy for defensive teams&lt;/td&gt;
            &lt;td&gt;Only addresses goalless matches, assumes fixed zero-inflation probability&lt;/td&gt;
            &lt;td&gt;Ultra-defensive teams, low-scoring competitions&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;strong&gt;Negative Binomial&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;Handles overdispersion, more realistic high-score predictions&lt;/td&gt;
            &lt;td&gt;Still assumes independent goal counts, may be unnecessary for low-scoring leagues&lt;/td&gt;
            &lt;td&gt;High-scoring leagues, competitions with unpredictable scorelines&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;strong&gt;Weibull Count + Copula&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;More flexible goal distribution, accounts for complex scoring relationships&lt;/td&gt;
            &lt;td&gt;Highly complex, computationally expensive, difficult to implement&lt;/td&gt;
            &lt;td&gt;Leagues with extreme goal-scoring patterns, advanced predictive modeling&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;Table 1&lt;/strong&gt;: Comparison of the different model types currently available in &lt;code&gt;penaltyblog&lt;/code&gt;&lt;/p&gt;
&lt;h1 id="installing-the-penaltyblog-python-package"&gt;Installing the penaltyblog Python Package&lt;/h1&gt;
&lt;p&gt;Now we've got the theory out of the way, let's look at how to use these models via the &lt;a href="https://github.com/martineastwood/penaltyblog"&gt;penaltyblog&lt;/a&gt; Python package. If you've not used it before, you can install it using &lt;strong&gt;pip&lt;/strong&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;penaltyblog
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h1 id="downloading-football-data-the-easy-way"&gt;Downloading Football Data the Easy Way&lt;/h1&gt;
&lt;p&gt;Next, we need some data to fit the models to. We'll use &lt;a href="https://github.com/martineastwood/penaltyblog"&gt;penaltyblog's&lt;/a&gt; built-in scraper to download fixtures and results from &lt;a href="https://football-data.co.uk/"&gt;football-data.co.uk&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We start off by downloading the data, then we sort it by the &lt;code&gt;date&lt;/code&gt; column and set that column as the index to make filtering the dataframe easier later on. Finally, we create a new column called &lt;code&gt;ftr_numeric&lt;/code&gt; which maps the &lt;code&gt;ftr&lt;/code&gt; (full-time result) column to a numeric value, so we can score the models' predictions against it more easily.&lt;/p&gt;
&lt;p&gt;We're going to use the &lt;strong&gt;Dutch Eredivisie&lt;/strong&gt; for this example, but you can use any league you want by changing the &lt;code&gt;league&lt;/code&gt; and &lt;code&gt;season&lt;/code&gt; arguments in the code.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pb&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;tqdm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tqdm&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;NLD Eredivisie&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2015-2016&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;NLD Eredivisie&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2016-2017&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;NLD Eredivisie&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2017-2018&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;NLD Eredivisie&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2018-2019&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;NLD Eredivisie&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2019-2020&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;NLD Eredivisie&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2020-2021&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;NLD Eredivisie&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2021-2022&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;NLD Eredivisie&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2022-2023&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;NLD Eredivisie&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2023-2024&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;NLD Eredivisie&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2024-2025&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;date&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;date&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ftr_map&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;H&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;D&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;A&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;ftr_numeric&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;ftr&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ftr_map&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;To evaluate and compare the different models in our package, we employ a &lt;strong&gt;rolling time-based validation approach&lt;/strong&gt;. Instead of training each model on the entire dataset at once, we simulate how they would perform in a real-world betting or forecasting scenario. &lt;/p&gt;
&lt;p&gt;The process involves iterating through the dataset one date at a time, fitting the model only to the data available &lt;strong&gt;up to that date&lt;/strong&gt;, and then using it to predict the outcomes of fixtures on that specific day. &lt;/p&gt;
&lt;p&gt;This method ensures that our predictions are made without knowledge of future results, mimicking the conditions under which these models would be used in practice. By avoiding data leakage and preserving the natural temporal structure of football matches, we obtain a fairer and more realistic assessment of each model’s predictive power.&lt;/p&gt;
&lt;h1 id="fitting-the-goals-models"&gt;Fitting the Goals Models&lt;/h1&gt;
&lt;p&gt;Let's start off by getting all the unique dates since the start of the &lt;strong&gt;2023-2024 season&lt;/strong&gt; to test the models on.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;start_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;season == &amp;#39;2023-2024&amp;#39;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;min&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;run_dates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;start_date&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unique&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_dates&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The next block of code implements our rolling time-based validation approach to evaluate a &lt;strong&gt;Dixon and Coles&lt;/strong&gt; model by predicting match outcomes over time. The process iterates through each date in &lt;code&gt;run_dates&lt;/code&gt;, simulating a real-world forecasting scenario where only past data is available when making predictions.&lt;/p&gt;
&lt;p&gt;For each date:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A training set is created, including matches from the previous three years up to (but not including) the current date.&lt;/li&gt;
&lt;li&gt;A test set is created, containing matches that occur on the current date.&lt;/li&gt;
&lt;li&gt;The model is initialized and trained using the training set.&lt;/li&gt;
&lt;li&gt;If the model successfully fits, predictions are made for the test fixtures.&lt;/li&gt;
&lt;li&gt;The predicted probabilities (&lt;code&gt;home_draw_away&lt;/code&gt;) and actual outcomes are stored for evaluation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To ensure robustness:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The code skips dates with no fixtures.&lt;/li&gt;
&lt;li&gt;Errors during prediction are caught and ignored to prevent the loop from breaking.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Once the loop has finished, the average Ranked Probability Score (RPS) across all the stored predictions is computed to measure model performance. We can repeat this process for each of the model types we're interested in to get a sense of how well they perform on our dataset.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;observed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tqdm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_dates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Processing dates&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;lookback&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DateOffset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;years&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;lookback&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="n"&gt;test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;clf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DixonColesGoalModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;goals_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;goals_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;homes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
            &lt;span class="n"&gt;aways&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
            &lt;span class="n"&gt;outcomes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ftr_numeric&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;

            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
                &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;prediction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;homes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;aways&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
                    &lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prediction&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;home_draw_away&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;observed&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outcomes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
                &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="ne"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;continue&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;continue&lt;/span&gt;

&lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rps_average&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;observed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h1 id="measuring-performance-of-the-goals-models"&gt;Measuring Performance of the Goals Models&lt;/h1&gt;
&lt;p&gt;We will use the &lt;a href="http://constantinou.info/downloads/papers/solvingtheproblem.pdf"&gt;Ranked Probability Score&lt;/a&gt; (RPS) to evaluate the accuracy of the models' forecasts. RPS measures how well a model’s predicted probability distribution aligns with the actual outcome.&lt;/p&gt;
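&lt;p&gt;For a single match, RPS is calculated by summing the squared differences between the cumulative predicted probabilities and the cumulative observed outcome over the ordered results (home, draw, away), then dividing by the number of possible results minus one. It ranges from 0 to 1, where lower values indicate better predictive performance.&lt;/p&gt;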
&lt;p&gt;In football modeling, a model with a lower RPS assigns higher probabilities to correct outcomes while distributing probabilities meaningfully across alternatives, making it more reliable for decision-making.&lt;/p&gt;
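&lt;p&gt;To make the calculation concrete, here is a minimal standalone sketch of RPS for a single match. It is an illustration rather than the code behind &lt;code&gt;pb.metrics.rps_average&lt;/code&gt;: the &lt;code&gt;rps&lt;/code&gt; function and the example forecast are made up for this post, and the outcome coding (0 = home, 1 = draw, 2 = away) follows the &lt;code&gt;ftr_map&lt;/code&gt; used earlier.&lt;/p&gt;

```python
import numpy as np


def rps(probs, outcome):
    """Ranked Probability Score for one match (illustrative sketch).

    probs   : predicted probabilities ordered [home, draw, away]
    outcome : index of the observed result (0=home, 1=draw, 2=away)
    """
    probs = np.asarray(probs, dtype=float)
    observed = np.zeros_like(probs)
    observed[outcome] = 1.0
    # Compare the cumulative distributions; the final cumulative term is
    # always 1 - 1 = 0, so it contributes nothing to the sum.
    diff = np.cumsum(probs) - np.cumsum(observed)
    return float(np.sum(diff ** 2) / (len(probs) - 1))


forecast = [0.5, 0.3, 0.2]  # hypothetical home / draw / away probabilities

print("RPS if the home team wins:", round(rps(forecast, 0), 3))
print("RPS if the away team wins:", round(rps(forecast, 2), 3))
```

&lt;p&gt;The same forecast scores much worse when the away team wins, because the cumulative probabilities are further from the observed outcome at every step. This is what makes RPS sensitive to the ordering of results: predicting a draw when the away team wins is penalised less than predicting a home win.&lt;/p&gt;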
&lt;p&gt;&lt;strong&gt;Table 2&lt;/strong&gt; below shows the results for all the different models against our Eredivisie dataset. We can see that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Dixon and Coles performed the best with the lowest RPS.&lt;/li&gt;
&lt;li&gt;Weibull Count followed closely behind.&lt;/li&gt;
&lt;li&gt;Bivariate Poisson had the highest RPS, indicating the weakest performance on this set of data.&lt;/li&gt;
&lt;/ul&gt;
&lt;table&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th&gt;Model&lt;/th&gt;
            &lt;th&gt;Ranked Probability Score (RPS)&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
        &lt;tr&gt;
            &lt;td&gt;Dixon and Coles&lt;/td&gt;
            &lt;td&gt;0.19137780685608083&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Weibull Count&lt;/td&gt;
            &lt;td&gt;0.19141358825225932&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Zero-inflated Poisson&lt;/td&gt;
            &lt;td&gt;0.19154043298013113&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Poisson&lt;/td&gt;
            &lt;td&gt;0.19154229559464445&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Negative Binomial&lt;/td&gt;
            &lt;td&gt;0.19155750459845977&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Bivariate Poisson&lt;/td&gt;
            &lt;td&gt;0.19161764011301444&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;em&gt;Table 2: Ranked Probability Scores for the different model types&lt;/em&gt;&lt;/p&gt;
&lt;h1 id="optimising-the-goal-models-lookback-window"&gt;Optimising the Goal Model's Lookback Window&lt;/h1&gt;
&lt;p&gt;In the previous section, we used a fixed three-year lookback window to train our models before making predictions. However, the optimal amount of historical data to use is not always clear — too much data might include outdated information, while too little could lead to overfitting to recent trends. &lt;/p&gt;
&lt;p&gt;Next, we'll optimise this lookback window by systematically varying the amount of past data used for training. By looping through different window sizes (e.g., 1 year, 2 years, 3 years, etc.), we can assess how the model’s predictive performance changes based on the amount of historical data it has access to and see if we can improve the RPS further.&lt;/p&gt;
&lt;p&gt;For simplicity, we'll just optimise the Dixon and Coles model since it performed best, but ideally you'd repeat this process for all the models you're interested in to find the best performing model on your dataset. &lt;/p&gt;
&lt;p&gt;We'll use the same rolling time-based validation approach as before, but this time we'll vary the lookback window by changing the number of years passed to &lt;code&gt;pd.DateOffset&lt;/code&gt; when setting the &lt;code&gt;lookback&lt;/code&gt; variable. For example, the code below gives us a lookback window of two years.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;lookback&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DateOffset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;years&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
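&lt;p&gt;One way to organise the search is to pull the train/test split into a small function that takes the number of years as an argument. The sketch below is an assumption about how you might structure it, not code from &lt;code&gt;penaltyblog&lt;/code&gt;: &lt;code&gt;split_for_date&lt;/code&gt; is a hypothetical helper and the toy dataframe stands in for the real fixtures.&lt;/p&gt;

```python
import pandas as pd


def split_for_date(df, date, years):
    """Return (train, test) for one prediction date and lookback length."""
    date = pd.Timestamp(date)
    lookback = date - pd.DateOffset(years=years)
    # inclusive="left" keeps the lookback date itself but excludes the
    # prediction date, so no same-day results leak into the training set
    train = df[df["date"].between(lookback, date, inclusive="left")]
    test = df[df["date"].eq(date)]
    return train, test


# Toy fixture list with one match on 1 August each year (made-up data)
toy = pd.DataFrame({"date": pd.to_datetime(
    ["2019-08-01", "2020-08-01", "2021-08-01", "2022-08-01", "2023-08-01"]
)})

for years in [1, 2, 3, 4]:
    train, test = split_for_date(toy, "2023-08-01", years)
    print(f"{years} year(s): {len(train)} training rows, {len(test)} test rows")
```

&lt;p&gt;In the full experiment, the rolling validation loop from the previous section would sit inside the loop over &lt;code&gt;years&lt;/code&gt;, giving one RPS value per lookback window.&lt;/p&gt;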

&lt;p&gt;&lt;strong&gt;Figure 1&lt;/strong&gt; below shows the results for different lookback windows. We can see that initially adding more data improves the model's performance, but after around four seasons the RPS starts to increase again and the model's performance starts to degrade.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Dixon and Coles RPS using different lookback windows" src="../../../../images/20250309_lookback.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 1: Dixon and Coles RPS using different lookback windows&lt;/em&gt;&lt;/p&gt;
&lt;h1 id="incorporating-time-weighted-data-enhancing-predictions-with-the-dixon-and-coles-approach"&gt;Incorporating Time-Weighted Data: Enhancing Predictions with the Dixon and Coles Approach&lt;/h1&gt;
&lt;p&gt;Next, we'll optimise the weighting applied to the data based on the methodology proposed by Dixon and Coles in their &lt;a href="https://academic.oup.com/jrsssc/article-abstract/46/2/265/6990546?redirectedFrom=PDF"&gt;seminal paper&lt;/a&gt;. Their approach acknowledges that more recent matches carry greater predictive value than older ones when modelling football outcomes, so they should be given more importance when fitting the model.&lt;/p&gt;
&lt;p&gt;This is done by introducing an exponential decay function so that older games contribute less to the model’s parameter estimates, allowing it to adapt more effectively to recent team performances. This weighting method helps balance the trade-off between using sufficient historical data and ensuring that the model remains responsive to current trends. &lt;/p&gt;
&lt;h4 id="how-the-dixon-and-coles-weights-are-calculated"&gt;How the Dixon and Coles Weights are Calculated&lt;/h4&gt;
&lt;p&gt;The weighting function typically follows this form:&lt;/p&gt;
&lt;p&gt;$$w_t = e^{-\xi (T - t)}$$&lt;/p&gt;
&lt;h4 id="explanation-of-variables"&gt;Explanation of Variables&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;$w_t$: Weight assigned to a match played at time $t$&lt;/li&gt;
&lt;li&gt;$T$: The current date&lt;/li&gt;
&lt;li&gt;$t$: The date the match is played&lt;/li&gt;
&lt;li&gt;$\xi$: The decay factor (higher values cause faster decay)&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="how-it-works"&gt;How it Works&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;If $\xi$ = 0, all matches are weighted equally.&lt;/li&gt;
&lt;li&gt;If $\xi$ is small (e.g., 0.01), older matches retain some influence.&lt;/li&gt;
&lt;li&gt;If $\xi$ is large (e.g., 0.03), older matches lose influence faster.&lt;/li&gt;
&lt;/ul&gt;
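&lt;p&gt;To make the formula concrete, here is a minimal sketch that computes the weights by hand with &lt;code&gt;numpy&lt;/code&gt;. The function name &lt;code&gt;exponential_decay_weights&lt;/code&gt; is hypothetical, and two choices are my own assumptions rather than anything taken from the paper or from &lt;code&gt;penaltyblog&lt;/code&gt;: $T - t$ is measured in days, and $T$ defaults to the most recent match date in the data.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import numpy as np
import pandas as pd


def exponential_decay_weights(dates, xi, reference_date=None):
    # Hypothetical helper implementing w_t = exp(-xi * (T - t))
    dates = pd.to_datetime(pd.Series(dates))
    if reference_date is None:
        # Assumption: use the latest match date as T
        reference_date = dates.max()
    # Assumption: the age of each match is measured in days
    age_in_days = (pd.Timestamp(reference_date) - dates).dt.days
    return np.exp(-xi * age_in_days.to_numpy())


# Toy dates, invented for illustration only
dates = [&amp;quot;2024-01-01&amp;quot;, &amp;quot;2024-07-01&amp;quot;, &amp;quot;2025-01-01&amp;quot;]
weights = exponential_decay_weights(dates, xi=0.001)
print(weights)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The most recent match gets a weight of one, and each older match gets a progressively smaller weight.&lt;/p&gt;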
&lt;p&gt;To help visualise the effect of the weights, I've plotted a range of $\xi$ values in &lt;strong&gt;Figure 2&lt;/strong&gt; below. With a $\xi$ of zero, no weighting is applied and all historical fixtures carry the same importance. As we increase $\xi$, older matches carry less and less weight in the model's calculations and the model becomes more responsive to recent team performance trends.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20250303_weights.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 2: Example Dixon and Coles weights&lt;/em&gt;&lt;/p&gt;
&lt;h4 id="optimising-the-value-of-xi"&gt;Optimising the Value of $\xi$&lt;/h4&gt;
&lt;p&gt;All of the models in the &lt;code&gt;penaltyblog&lt;/code&gt; package support applying weights to the data being trained on, so we can easily optimise the value of $\xi$ by looping through a range of values and selecting the one that gives the lowest RPS.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;penaltyblog&lt;/code&gt; also provides a function that creates the weights automatically using the Dixon and Coles approach, as shown in the example below, but you can also pass in your own weights if you prefer a different weighting system.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;xi&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.001&lt;/span&gt;
&lt;span class="n"&gt;weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dixon_coles_weights&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;xi&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;clf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ZeroInflatedPoissonGoalsModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;goals_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;goals_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Now we just need to update our rolling time-based validation from before to loop through a range of $\xi$ values and select the one that gives the lowest RPS.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;xi&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.005&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.0005&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;observed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tqdm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_dates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Processing dates&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;lookback&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DateOffset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;years&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;lookback&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
        &lt;span class="n"&gt;test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="n"&gt;weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dixon_coles_weights&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;xi&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;clf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ZeroInflatedPoissonGoalsModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;goals_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;goals_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;homes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
            &lt;span class="n"&gt;aways&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
            &lt;span class="n"&gt;outcomes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ftr_numeric&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;

            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
                &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;prediction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;homes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;aways&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
                    &lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prediction&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;home_draw_away&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;observed&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outcomes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
                &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="ne"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="c1"&gt;# skip fixtures the model cannot predict (e.g. a team missing from the training window)&lt;/span&gt;
                    &lt;span class="k"&gt;continue&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;decay&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;xi&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;rps&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rps_average&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;observed&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;decay&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;rps&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Decay&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;RPS&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Figure 3&lt;/strong&gt; below shows the results for different values of $\xi$. We can see that the optimal value is around 0.001, reducing our RPS to &lt;code&gt;0.1890924613913105&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20250309_xi.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 3: Dixon and Coles RPS using different values of $\xi$&lt;/em&gt;&lt;/p&gt;
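&lt;p&gt;Rather than reading the minimum off the chart, the best value can also be picked out of the &lt;code&gt;results&lt;/code&gt; list programmatically. The sketch below assumes each entry is a dictionary with &lt;code&gt;decay&lt;/code&gt; and &lt;code&gt;rps&lt;/code&gt; keys, as built in the loop above. The helper name &lt;code&gt;best_decay&lt;/code&gt; is hypothetical, and the numbers are deliberately unrealistic placeholders rather than results from this analysis.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;def best_decay(results):
    # Lower RPS is better, so return the entry with the smallest score
    return min(results, key=lambda r: r[&amp;quot;rps&amp;quot;])


# Placeholder values for illustration only, not real results
example_results = [
    {&amp;quot;decay&amp;quot;: 0.000, &amp;quot;rps&amp;quot;: 0.30},
    {&amp;quot;decay&amp;quot;: 0.001, &amp;quot;rps&amp;quot;: 0.20},
    {&amp;quot;decay&amp;quot;: 0.002, &amp;quot;rps&amp;quot;: 0.25},
]

best = best_decay(example_results)
print(best[&amp;quot;decay&amp;quot;], best[&amp;quot;rps&amp;quot;])
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;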
&lt;h1 id="conclusions"&gt;Conclusions&lt;/h1&gt;
&lt;p&gt;In this article, we've explored &lt;code&gt;penaltyblog's&lt;/code&gt; predictive models from the Poisson goal model to more advanced approaches like Dixon and Coles, Bivariate Poisson, and Weibull Count Copula goals models. By applying a rolling time-based validation approach, we evaluated their real-world predictive performance using the Ranked Probability Score (RPS). &lt;/p&gt;
&lt;p&gt;We then optimized key parameters, including the lookback window, finding that using around four seasons of historical data strikes the best balance between capturing enough information and staying responsive to recent trends. Finally, we applied Dixon-Coles time weighting, tuning the decay factor to further improve prediction accuracy, reducing the RPS even more.&lt;/p&gt;
&lt;h4 id="key-takeaways"&gt;Key Takeaways&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;More advanced models often outperform the basic Poisson goals model (although not always), particularly in handling low-scoring biases and goal correlation.&lt;/li&gt;
&lt;li&gt;Optimizing historical data usage prevents outdated matches from degrading predictions.&lt;/li&gt;
&lt;li&gt;Applying time-weighting enhances accuracy, ensuring the model remains adaptable to recent team performances.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you are interested in finding out what else the &lt;code&gt;penaltyblog&lt;/code&gt; package can do then you can read the documentation &lt;a href="https://penaltyblog.readthedocs.io/en/master/index.html"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Thanks for reading!&lt;/p&gt;</content><category term="Prediction"></category><category term="Poisson"></category><category term="Dixon and Coles"></category><category term="Bivariate Poisson"></category><category term="Weibull Count Copula"></category><category term="Betting"></category><category term="Zero-inflated Poisson"></category><category term="Negative Binomial"></category></entry><entry><title>Calculating Expected Threat in Python Using Linear Algebra</title><link href="2025/01/08/calculating-expected-threat-in-python-using-linear-algebra/" rel="alternate"></link><published>2025-01-08T19:30:00+00:00</published><updated>2025-01-08T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2025-01-08:2025/01/08/calculating-expected-threat-in-python-using-linear-algebra/</id><summary type="html">&lt;p&gt;This article walks through how to calculate expected threat in Python using linear algebra instead of the original convergence method...&lt;/p&gt;</summary><content type="html">&lt;h1 id="introduction"&gt;Introduction&lt;/h1&gt;
&lt;p&gt;Imagine you're watching a soccer match and your team's midfielder has the ball at the halfway line. 
How dangerous is this position? What about if they dribble forward 10 yards? Or make a pass to the wing? 
Expected Threat (xT), originally developed by &lt;a href="https://en.wikipedia.org/wiki/Sarah_Rudd"&gt;Sarah Rudd&lt;/a&gt; and popularised by &lt;a href="https://karun.in/blog/expected-threat.html"&gt;Karun Singh&lt;/a&gt;, attempts to 
answer these questions by quantifying the offensive value of every position on the pitch.&lt;/p&gt;
&lt;p&gt;Unlike simpler metrics such as expected goals (xG) that only measure shot quality, xT evaluates both immediate shooting opportunities 
and the potential for creating future scoring chances. This makes it useful for analyzing buildup play and measuring 
contributions from those players who don't directly create shots.&lt;/p&gt;
&lt;h1 id="computational-approach"&gt;Computational Approach&lt;/h1&gt;
&lt;p&gt;Karun Singh's implementation of expected threat relies on iterative calculations that gradually converge to the final xT values. 
While this approach works, we can actually take the more elegant approach of using linear algebra to calculate the xT values directly. By reformulating the problem as a system of linear equations, we can achieve the following benefits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Calculate all values in a single step rather than multiple iterations&lt;/li&gt;
&lt;li&gt;Eliminate the need to check for convergence&lt;/li&gt;
&lt;li&gt;Improve processing speed by replacing nested loops with efficient matrix operations&lt;/li&gt;
&lt;li&gt;Reduce code complexity&lt;/li&gt;
&lt;li&gt;Express the model in standard linear algebra notation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In this article, I'll show how to use linear algebra to calculate xT efficiently, and walk through the process 
in Python, providing a step-by-step guide for implementing this approach in practice.&lt;/p&gt;
&lt;h1 id="understanding-expected-threat"&gt;Understanding Expected Threat&lt;/h1&gt;
&lt;p&gt;Let's start off by understanding the theory behind xT, which can be described by the following equation:&lt;/p&gt;
&lt;script type="math/tex"&gt;
  \mathrm{xT}_{x,y} = \underbrace{s_{x,y} \cdot g_{x,y}}_{\text{shot threat}} + \underbrace{m_{x,y} \cdot \sum_{(z,w) \in G} T_{(x,y)\to (z,w)} \cdot \mathrm{xT}_{z,w}}_{\text{move threat}}
  &lt;/script&gt;

&lt;p&gt;At its core, xT breaks down the value of any position on the pitch into two components: the immediate threat of scoring (shot threat) 
and the potential to create better opportunities through movement of the ball (move threat). Let's examine each component in more detail:&lt;/p&gt;
&lt;h2 id="shot-threat-component"&gt;Shot Threat Component&lt;/h2&gt;
&lt;p&gt;The first term represents the immediate threat of scoring from position $(x,y)$:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;$s_{x,y}$&lt;/strong&gt; represents the probability of taking a shot from this position&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;$g_{x,y}$&lt;/strong&gt; represents the probability of scoring if a shot is taken&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="move-threat-component"&gt;Move Threat Component&lt;/h2&gt;
&lt;p&gt;The second term captures the potential threat from moving the ball:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;$m_{x,y}$&lt;/strong&gt; is the probability of moving the ball (instead of shooting)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;$T_{(x,y)\to (z,w)}$&lt;/strong&gt; represents the probability of moving the ball from position $(x,y)$ to position $(z,w)$&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;$\mathrm{xT}_{z,w}$&lt;/strong&gt; is the expected threat at the destination position&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So, the equation attempts to measure a fundamental concept in soccer: the value of a position comes both from the immediate 
scoring opportunity and the potential to create better opportunities through moving the ball. 
For example, when Phil Foden carries the ball from the wing into the penalty area, he's moving from a low-xT position (around 0.01) 
to a much more dangerous position (around 0.3).&lt;/p&gt;
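&lt;p&gt;As a small worked example, the sketch below evaluates the right-hand side of the equation for a single zone, assuming the xT values of the destination zones are already known. The function name &lt;code&gt;zone_xt&lt;/code&gt; and all of the numbers are hypothetical and chosen only to make the arithmetic easy to follow.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import numpy as np


def zone_xt(shot_prob, goal_prob, move_prob, transition_row, xt_destinations):
    # Shot threat: chance of shooting times chance of scoring from that shot
    shot_threat = shot_prob * goal_prob
    # Move threat: chance of moving times the expected xT of where the ball ends up
    move_threat = move_prob * np.dot(transition_row, xt_destinations)
    return shot_threat + move_threat


# Hypothetical zone: shoots 10% of the time, scores 20% of those shots,
# and otherwise moves the ball to one of two zones with equal probability
value = zone_xt(
    shot_prob=0.1,
    goal_prob=0.2,
    move_prob=0.9,
    transition_row=np.array([0.5, 0.5]),
    xt_destinations=np.array([0.02, 0.30]),
)
print(value)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here the shot threat is $0.1 \times 0.2 = 0.02$ and the move threat is $0.9 \times (0.5 \times 0.02 + 0.5 \times 0.30) = 0.144$, giving an xT of $0.164$ for the zone.&lt;/p&gt;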
&lt;p&gt;The recursive nature of this equation, where xT appears on both sides, is what typically necessitates an iterative solution. 
However, by recognizing that the xT values form a system of linear equations, we can solve for them directly and efficiently 
using Python's linear algebra libraries.&lt;/p&gt;
&lt;h1 id="data"&gt;Data&lt;/h1&gt;
&lt;p&gt;To calculate xT, we need event data capturing how the ball moves around the pitch. While the model can incorporate various 
types of ball movements, for simplicity we'll focus on three fundamental events in this article:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Shots &lt;/li&gt;
&lt;li&gt;Goals&lt;/li&gt;
&lt;li&gt;Passes&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="data-structure"&gt;Data Structure&lt;/h2&gt;
&lt;p&gt;For each event, we require the following information:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Starting position &lt;strong&gt;$(x, y)$&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Ending position of passes &lt;strong&gt;$(\text{end_x}, \text{end_y})$&lt;/strong&gt; &lt;/li&gt;
&lt;li&gt;Event type (shot, goal, or pass)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here's a sample of what my data looks like - note that these were events collected for the English Premier League between 2017 and 2023, giving us around 3,000,000 relevant events to fit the model to. Unfortunately, I can't share the data here but there are plenty of free data sources that you can use to replicate this.&lt;/p&gt;
&lt;table class="table table-striped table-condensed"&gt;
  &lt;thead&gt;
    &lt;tr style="text-align: right;"&gt;
      &lt;th&gt;x&lt;/th&gt;
      &lt;th&gt;y&lt;/th&gt;
      &lt;th&gt;end_x&lt;/th&gt;
      &lt;th&gt;end_y&lt;/th&gt;
      &lt;th&gt;event_type&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;45.8&lt;/td&gt;
      &lt;td&gt;21.8&lt;/td&gt;
      &lt;td&gt;53.3&lt;/td&gt;
      &lt;td&gt;26.7&lt;/td&gt;
      &lt;td&gt;pass&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;52.3&lt;/td&gt;
      &lt;td&gt;11.3&lt;/td&gt;
      &lt;td&gt;45.6&lt;/td&gt;
      &lt;td&gt;8.6&lt;/td&gt;
      &lt;td&gt;pass&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;38.4&lt;/td&gt;
      &lt;td&gt;50.1&lt;/td&gt;
      &lt;td&gt;40.0&lt;/td&gt;
      &lt;td&gt;64.8&lt;/td&gt;
      &lt;td&gt;pass&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;2.3&lt;/td&gt;
      &lt;td&gt;23.6&lt;/td&gt;
      &lt;td&gt;32.4&lt;/td&gt;
      &lt;td&gt;17.3&lt;/td&gt;
      &lt;td&gt;pass&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;62.9&lt;/td&gt;
      &lt;td&gt;9.4&lt;/td&gt;
      &lt;td&gt;63.8&lt;/td&gt;
      &lt;td&gt;1.6&lt;/td&gt;
      &lt;td&gt;pass&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;em&gt;Table 1: Example of the data used here to calculate xT&lt;/em&gt;&lt;/p&gt;
&lt;h1 id="calculating-xt"&gt;Calculating xT&lt;/h1&gt;
&lt;h2 id="the-grid-approach"&gt;The Grid Approach&lt;/h2&gt;
&lt;p&gt;The first step in calculating xT is dividing the pitch into a grid. This transforms the continuous space of the 
soccer / football pitch into discrete zones that we can analyze. Each zone in the grid will have its own xT value, 
representing the expected threat when the ball is in that position.&lt;/p&gt;
&lt;h2 id="iterative-vs-matrix-approach"&gt;Iterative vs Matrix Approach&lt;/h2&gt;
&lt;p&gt;The original methodology calculates xT through iteration:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Initialize each zone with base shooting probabilities&lt;/li&gt;
&lt;li&gt;Update xT values by considering contributions from neighboring zones through ball movements&lt;/li&gt;
&lt;li&gt;Repeat the process until xT values stabilize (typically requiring 5 iterations)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;While effective, this method requires tracking convergence and is computationally more intensive for larger datasets.&lt;/p&gt;
&lt;p&gt;Instead, we can reformulate the problem as a matrix equation:&lt;/p&gt;
&lt;p&gt;$$\mathbf{X} = \mathbf{S} + \mathbf{M}\mathbf{T}\mathbf{X}$$&lt;/p&gt;
&lt;p&gt;Where:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;$\mathbf{X}$:&lt;/strong&gt; vector of xT values we want to find&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;$\mathbf{S}$:&lt;/strong&gt; vector of direct shooting threat for each zone, i.e. $s_{x,y} \cdot g_{x,y}$&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;$\mathbf{M}$:&lt;/strong&gt; diagonal matrix containing the move probability $m_{x,y}$ for each zone&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;$\mathbf{T}$:&lt;/strong&gt; transition matrix capturing the probabilities of moving between zones&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This approach allows us to solve for all xT values simultaneously, eliminating the need for iteration.&lt;/p&gt;
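&lt;p&gt;To see why no iteration is needed, move the $\mathbf{X}$ terms to one side, which gives $(\mathbf{I} - \mathbf{M}\mathbf{T})\mathbf{X} = \mathbf{S}$, a standard linear system. The sketch below solves a made-up three-zone example with &lt;code&gt;np.linalg.solve&lt;/code&gt; and compares the answer with repeatedly applying the update. All of the numbers are invented for illustration, and setting the move probability to one minus the shot probability is a simplifying assumption of mine.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import numpy as np

# Made-up probabilities for three zones, ordered from deep to advanced
s = np.array([0.00, 0.05, 0.30])  # probability of shooting
g = np.array([0.00, 0.05, 0.30])  # probability a shot is scored
m = 1.0 - s  # assumption: every action is either a shot or a move

# Row i holds the probabilities of moving from zone i to each zone.
# Assumption: rows sum to less than one to allow for unsuccessful moves.
T = np.array(
    [
        [0.50, 0.30, 0.05],
        [0.20, 0.40, 0.20],
        [0.10, 0.30, 0.30],
    ]
)

S = s * g  # shot threat vector
M = np.diag(m)  # diagonal matrix of move probabilities

# Direct solution of (I - M T) X = S
X = np.linalg.solve(np.eye(3) - M @ T, S)

# Fixed-point iteration of X = S + M T X for comparison
X_iter = np.zeros(3)
for _ in range(200):
    X_iter = S + M @ T @ X_iter

print(X)
print(np.allclose(X, X_iter))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Both routes arrive at the same values, but the direct solve gets there in one call and there is no stopping criterion to tune.&lt;/p&gt;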
&lt;h2 id="choosing-grid-resolution"&gt;Choosing Grid Resolution&lt;/h2&gt;
&lt;p&gt;The choice of grid resolution for the pitch depends on the trade-offs between computational efficiency and 
spatial precision. Some common resolutions and their pros and cons are outlined below:&lt;/p&gt;
&lt;table class="table table-striped table-condensed"&gt;
  &lt;thead&gt;
    &lt;tr style="text-align: right;"&gt;
            &lt;th&gt;Resolution&lt;/th&gt;
            &lt;th&gt;Advantages&lt;/th&gt;
            &lt;th&gt;Disadvantages&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
        &lt;tr&gt;
            &lt;td&gt;$12 \times 8$ (Coarse)&lt;/td&gt;
            &lt;td&gt;More data per zone, faster computation, more stable estimates&lt;/td&gt;
            &lt;td&gt;Less spatial detail&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;$16 \times 12$ (Fine)&lt;/td&gt;
            &lt;td&gt;Better spatial precision, finer tactical insights&lt;/td&gt;
            &lt;td&gt;Sparser data, longer computation&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;$20 \times 15$ (Very Fine)&lt;/td&gt;
            &lt;td&gt;Highly detailed spatial patterns&lt;/td&gt;
            &lt;td&gt;Even sparser data, computationally intensive&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;em&gt;Table 2: Grid resolution trade-offs&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;For this article, we'll use a $16 \times 12$ grid. This resolution strikes a balance between capturing detailed spatial patterns, 
such as distinguishing the penalty spot from the edge of the box, and maintaining sufficient data in each zone for 
reliable estimates. Additionally, the computational requirements are manageable with this resolution.&lt;/p&gt;
&lt;p&gt;Next, let's dive into the Python implementation, showing how to construct the required matrices efficiently.&lt;/p&gt;
&lt;h1 id="computing-transition-probabilities"&gt;Computing Transition Probabilities&lt;/h1&gt;
&lt;p&gt;Before we can calculate xT, we need to understand how players use each zone of the pitch. This involves computing 
three key probabilities:&lt;/p&gt;
&lt;h2 id="required-probabilities"&gt;Required Probabilities&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Shot Probability ($s_{x,y}$):&lt;/strong&gt; The likelihood of a shot being taken from each zone, calculated as the ratio of shots taken 
to all events in that zone.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Goal Probability ($g_{x,y}$):&lt;/strong&gt; The conversion rate of shots from each zone, determined as the ratio of goals scored to shots 
taken from that zone.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Transition Probability ($T_{(x,y)\to(z,w)}$):&lt;/strong&gt; The likelihood of moving the ball from zone $(x, y)$ to zone $(z, w)$, based on 
observed transitions in the data.&lt;/li&gt;
&lt;/ul&gt;
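&lt;p&gt;As a minimal sketch of these three calculations, the example below works through a tiny hand-made table with &lt;code&gt;pandas&lt;/code&gt;. It assumes the events have already been assigned to zones (the &lt;code&gt;zone&lt;/code&gt; and &lt;code&gt;end_zone&lt;/code&gt; columns are hypothetical names) and that a goal is also counted as a shot. The eight events are invented for illustration, so treat this as one possible way of counting rather than a definitive implementation.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import pandas as pd

# Toy events in two zones, invented for illustration only
events = pd.DataFrame(
    {
        &amp;quot;zone&amp;quot;: [0, 0, 0, 0, 1, 1, 1, 1],
        &amp;quot;end_zone&amp;quot;: [1, 1, 0, None, 0, None, None, None],
        &amp;quot;event_type&amp;quot;: [
            &amp;quot;pass&amp;quot;, &amp;quot;pass&amp;quot;, &amp;quot;pass&amp;quot;, &amp;quot;shot&amp;quot;,
            &amp;quot;pass&amp;quot;, &amp;quot;shot&amp;quot;, &amp;quot;shot&amp;quot;, &amp;quot;goal&amp;quot;,
        ],
    }
)

# Assumption: a goal is also counted as a shot
is_shot = events[&amp;quot;event_type&amp;quot;].isin([&amp;quot;shot&amp;quot;, &amp;quot;goal&amp;quot;])
is_goal = events[&amp;quot;event_type&amp;quot;].eq(&amp;quot;goal&amp;quot;)
is_pass = events[&amp;quot;event_type&amp;quot;].eq(&amp;quot;pass&amp;quot;)

events_per_zone = events.groupby(&amp;quot;zone&amp;quot;).size()
shots_per_zone = is_shot.groupby(events[&amp;quot;zone&amp;quot;]).sum()
goals_per_zone = is_goal.groupby(events[&amp;quot;zone&amp;quot;]).sum()

# Shot probability: shots divided by all events in the zone
shot_prob = shots_per_zone / events_per_zone

# Goal probability: goals divided by shots, with zero where no shots were taken
goal_prob = (goals_per_zone / shots_per_zone).fillna(0.0)

# Transition probability: share of passes from each zone ending in each zone
passes = events[is_pass]
transition = pd.crosstab(
    passes[&amp;quot;zone&amp;quot;], passes[&amp;quot;end_zone&amp;quot;].astype(int), normalize=&amp;quot;index&amp;quot;
)

print(shot_prob)
print(goal_prob)
print(transition)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;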
&lt;h2 id="mapping-to-grid-coordinates"&gt;Mapping to Grid Coordinates&lt;/h2&gt;
&lt;p&gt;To calculate xT, we need to map raw pitch coordinates onto our $16 \times 12$ grid system. Here’s how to map any position on the pitch to its corresponding grid cell:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;mplsoccer&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Pitch&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;map_coordinates_to_grid_vectorized&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;grid_width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;grid_height&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
&lt;span class="sd"&gt;    Vectorized function to convert pitch coordinates (x, y) into grid cell indices,&lt;/span&gt;
&lt;span class="sd"&gt;    handling missing values.&lt;/span&gt;

&lt;span class="sd"&gt;    Parameters:&lt;/span&gt;
&lt;span class="sd"&gt;        x (pd.Series or np.array): X-coordinates on the pitch (normalized to 0-100).&lt;/span&gt;
&lt;span class="sd"&gt;        y (pd.Series or np.array): Y-coordinates on the pitch (normalized to 0-100).&lt;/span&gt;
&lt;span class="sd"&gt;        grid_width (int): Number of grid cells across the width of the pitch.&lt;/span&gt;
&lt;span class="sd"&gt;        grid_height (int): Number of grid cells across the height of the pitch.&lt;/span&gt;

&lt;span class="sd"&gt;    Returns:&lt;/span&gt;
&lt;span class="sd"&gt;        tuple: Two arrays (grid_x, grid_y) representing grid cell indices. Missing values&lt;/span&gt;
&lt;span class="sd"&gt;               will be represented as None.&lt;/span&gt;
&lt;span class="sd"&gt;    &amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
    &lt;span class="n"&gt;PITCH_WIDTH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
    &lt;span class="n"&gt;PITCH_HEIGHT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;

    &lt;span class="c1"&gt;# Handle missing values (NaN or None)&lt;/span&gt;
    &lt;span class="n"&gt;valid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Initialize arrays with None for invalid coordinates&lt;/span&gt;
    &lt;span class="n"&gt;grid_x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;full_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;grid_y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;full_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Compute grid indices only for valid rows&lt;/span&gt;
    &lt;span class="n"&gt;grid_x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;valid&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;floor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;valid&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;grid_width&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;PITCH_WIDTH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;grid_width&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;grid_y&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;valid&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;floor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;valid&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;grid_height&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;PITCH_HEIGHT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;grid_height&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;grid_x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;grid_y&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;path/to/my/data.csv&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Map starting coordinates to grid cells&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;grid_x&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;grid_y&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;map_coordinates_to_grid_vectorized&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;x&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;y&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Map ending coordinates to grid cells, handling missing values&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;grid_x_end&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;grid_y_end&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;map_coordinates_to_grid_vectorized&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;end_x&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;end_y&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;table class="table table-striped table-condensed"&gt;
    &lt;thead&gt;    
        &lt;tr style="text-align: right;"&gt; 
            &lt;th&gt;x&lt;/th&gt;     
            &lt;th&gt;y&lt;/th&gt;     
            &lt;th&gt;end_x&lt;/th&gt;      
            &lt;th&gt;end_y&lt;/th&gt;      
            &lt;th&gt;grid_x&lt;/th&gt;      
            &lt;th&gt;grid_y&lt;/th&gt;     
            &lt;th&gt;event_type&lt;/th&gt;  
        &lt;/tr&gt;  
    &lt;/thead&gt;  
    &lt;tbody&gt;    
        &lt;tr&gt;          
            &lt;td&gt;45.8&lt;/td&gt;      
            &lt;td&gt;21.8&lt;/td&gt;      
            &lt;td&gt;53.3&lt;/td&gt;      
            &lt;td&gt;26.7&lt;/td&gt;      
            &lt;td&gt;7&lt;/td&gt;      
            &lt;td&gt;2&lt;/td&gt;      
            &lt;td&gt;pass&lt;/td&gt;
        &lt;/tr&gt; 
        &lt;tr&gt;          
            &lt;td&gt;52.3&lt;/td&gt;      
            &lt;td&gt;11.3&lt;/td&gt;      
            &lt;td&gt;45.6&lt;/td&gt;      
            &lt;td&gt;8.6&lt;/td&gt;      
            &lt;td&gt;8&lt;/td&gt;      
            &lt;td&gt;1&lt;/td&gt;      
            &lt;td&gt;pass&lt;/td&gt;
        &lt;/tr&gt;    
        &lt;tr&gt;      
            &lt;td&gt;38.4&lt;/td&gt;
            &lt;td&gt;50.1&lt;/td&gt;     
            &lt;td&gt;40.0&lt;/td&gt;     
            &lt;td&gt;64.8&lt;/td&gt;     
            &lt;td&gt;6&lt;/td&gt;     
            &lt;td&gt;6&lt;/td&gt;     
            &lt;td&gt;pass&lt;/td&gt;    
        &lt;/tr&gt;    
        &lt;tr&gt;        
            &lt;td&gt;2.3&lt;/td&gt;
            &lt;td&gt;23.6&lt;/td&gt;      
            &lt;td&gt;32.4&lt;/td&gt;      
            &lt;td&gt;17.3&lt;/td&gt;      
            &lt;td&gt;0&lt;/td&gt;      
            &lt;td&gt;2&lt;/td&gt;      
            &lt;td&gt;pass&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;em&gt;Table 3: Mapping coordinates to grid cells&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;By mapping our continuous pitch coordinates into discrete grid zones, we’ve converted our raw event data 
to a structured format suitable for xT calculations. For instance, a position at (50, 50) would map to grid 
zone (8, 6) in our $16 \times 12$ grid.&lt;/p&gt;
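&lt;p&gt;As a quick sanity check of the arithmetic, the sketch below applies the same floor-and-clip rule to a few coordinates, including the pitch edges. The helper &lt;code&gt;to_cell&lt;/code&gt; is a hypothetical stand-alone function written for this check, not part of the code above, and it assumes coordinates normalised to 0-100:&lt;/p&gt;

```python
import numpy as np


# Hypothetical stand-alone helper mirroring the mapping arithmetic used above.
def to_cell(coord, n_cells, pitch_size=100):
    """Map coordinates in [0, pitch_size] to cell indices in [0, n_cells - 1]."""
    idx = np.floor(np.asarray(coord, dtype=float) * n_cells / pitch_size)
    # Clipping keeps a coordinate of exactly pitch_size inside the last cell.
    return idx.clip(0, n_cells - 1).astype(int)


xs = to_cell([0, 50, 99.9, 100], n_cells=16)
ys = to_cell([0, 50, 99.9, 100], n_cells=12)
print(xs)  # [ 0  8 15 15]
print(ys)  # [ 0  6 11 11]
```

&lt;p&gt;Without the clip, a coordinate of exactly 100 would land in a cell index one past the end of the grid.&lt;/p&gt;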
&lt;p&gt;Next, we need to calculate three core matrices:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;pass_count&lt;/strong&gt;: Stores the number of passes originating from each grid zone.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;shot_count&lt;/strong&gt;: Records the number of shots taken from each grid zone.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;goal_count&lt;/strong&gt;: Tracks the number of goals scored from each grid zone.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These matrices will be used to calculate the probabilities ($s_{x,y}$, $g_{x,y}$, and $T_{(x,y)\to(z,w)}$) 
used in the xT computation.&lt;/p&gt;
&lt;h2 id="populating-event-count-matrices"&gt;Populating Event Count Matrices&lt;/h2&gt;
&lt;p&gt;Using our grid coordinates, we'll now populate the matrices that count the number of passes, shots, and goals in each grid zone. 
The following code does this by grouping the events by grid cell, counting each group, and writing the counts into the corresponding matrix entries:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# Initialize the empty grids&lt;/span&gt;
&lt;span class="n"&gt;GRID_HEIGHT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;GRID_WIDTH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;

&lt;span class="n"&gt;pass_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;GRID_HEIGHT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;GRID_WIDTH&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;shot_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;GRID_HEIGHT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;GRID_WIDTH&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;goal_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;GRID_HEIGHT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;GRID_WIDTH&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# Filter events by type&lt;/span&gt;
&lt;span class="n"&gt;pass_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;event_type == &amp;#39;pass&amp;#39;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;goals_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;event_type == &amp;#39;goal&amp;#39;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;shots_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;event_type == &amp;#39;shot&amp;#39;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Group by grid coordinates and count events&lt;/span&gt;
&lt;span class="n"&gt;pass_counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pass_df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;grid_y&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;grid_x&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;shot_counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;shots_df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;grid_y&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;grid_x&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;goal_counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;goals_df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;grid_y&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;grid_x&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Fill in the grids using the grouped data&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pass_counts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;pass_count&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;shot_counts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;shot_count&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;goal_counts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;goal_count&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
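&lt;p&gt;To illustrate where these counts are heading, here is a minimal sketch of turning count matrices into the shot and goal probabilities defined earlier. It uses small toy matrices rather than the real data, and it rests on two assumptions that may not hold for every data feed: that the events in a zone are its passes plus its shots, and that &lt;code&gt;shot_count&lt;/code&gt; already includes the shots that ended in goals:&lt;/p&gt;

```python
import numpy as np

# Toy 2 x 3 count matrices standing in for the 12 x 16 grids built above.
pass_count = np.array([[10.0, 5.0, 0.0], [8.0, 2.0, 1.0]])
shot_count = np.array([[0.0, 5.0, 0.0], [2.0, 2.0, 3.0]])
goal_count = np.array([[0.0, 1.0, 0.0], [0.0, 1.0, 3.0]])

# Assumption: a zone's events are its passes plus its shots,
# and shot_count already includes the shots that became goals.
total_events = pass_count + shot_count

# Divide only where the denominator is non-zero; empty zones stay at 0.
shot_prob = np.divide(shot_count, total_events,
                      out=np.zeros_like(shot_count), where=total_events != 0)
goal_prob = np.divide(goal_count, shot_count,
                      out=np.zeros_like(goal_count), where=shot_count != 0)

print(shot_prob[0, 1])  # 0.5
print(goal_prob[1, 2])  # 1.0
```

&lt;p&gt;Passing &lt;code&gt;out&lt;/code&gt; and &lt;code&gt;where&lt;/code&gt; to &lt;code&gt;np.divide&lt;/code&gt; avoids division-by-zero warnings in zones with no recorded events.&lt;/p&gt;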

&lt;h2 id="visualizing-event-count-matrices"&gt;Visualizing Event Count Matrices&lt;/h2&gt;
&lt;p&gt;To ensure the data looks correct, let’s visualize the count matrices as heatmaps. Each heatmap provides a spatial representation 
of the frequency of each event type across the pitch.&lt;/p&gt;
&lt;h3 id="shot-count-heatmap"&gt;Shot Count heatmap&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;statistic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;shot_count&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;statistic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;
&lt;span class="n"&gt;x_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;cy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_grid&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_grid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y_grid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;statistic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;statistic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_grid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_grid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cx&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;pitch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Pitch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pitch_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;opta&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line_zorder&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pitch_color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;#22312b&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line_color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;#efefef&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pitch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;draw&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.125&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;pcm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pitch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;heatmap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cmap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;viridis&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;img alt="Shot count heatmap" src="../../../../images/xt_shot_count.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 1: Shot Count heatmap&lt;/em&gt;&lt;/p&gt;
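&lt;p&gt;The heatmap blocks in this section repeat the same grid set-up for each matrix. One way to avoid the repetition is to wrap it in a small function. The sketch below uses a hypothetical helper, &lt;code&gt;heatmap_stats&lt;/code&gt;, which is not part of &lt;code&gt;mplsoccer&lt;/code&gt;; it simply builds the same dictionary layout as the code above and assumes coordinates normalised to 0-100:&lt;/p&gt;

```python
import numpy as np


def heatmap_stats(statistic, pitch_length=100, pitch_width=100):
    """Hypothetical helper: wrap a (rows, cols) matrix in the dict layout used above."""
    n_rows, n_cols = statistic.shape
    x_grid = np.linspace(0, pitch_length, n_cols + 1)  # cell edges along x
    y_grid = np.linspace(0, pitch_width, n_rows + 1)   # cell edges along y
    cx = x_grid[:-1] + 0.5 * (x_grid[1] - x_grid[0])   # cell centres along x
    cy = y_grid[:-1] + 0.5 * (y_grid[1] - y_grid[0])   # cell centres along y
    return dict(statistic=statistic, x_grid=x_grid, y_grid=y_grid, cx=cx, cy=cy)


stats = heatmap_stats(np.zeros((12, 16)))
print(stats["x_grid"].shape, stats["cy"].shape)  # (17,) (12,)
```

&lt;p&gt;The returned dictionary could then be passed to &lt;code&gt;pitch.heatmap(stats, ax=ax, cmap='viridis')&lt;/code&gt; in the same way as in the blocks in this section.&lt;/p&gt;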
&lt;h3 id="goal-count-heatmap"&gt;Goal Count heatmap&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;statistic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;goal_count&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;statistic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;
&lt;span class="n"&gt;x_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;cy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_grid&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_grid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y_grid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;statistic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;statistic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_grid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_grid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cx&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;pitch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Pitch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pitch_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;opta&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line_zorder&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pitch_color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;#22312b&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line_color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;#efefef&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pitch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;draw&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.125&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;pcm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pitch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;heatmap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cmap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;viridis&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/xt_goal_count.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 2: Goal Count heatmap&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="pass-count-heatmap"&gt;Pass Count heatmap&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;statistic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pass_count&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;statistic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;
&lt;span class="n"&gt;x_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;cy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_grid&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_grid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y_grid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;statistic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;statistic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_grid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_grid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cx&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;pitch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Pitch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pitch_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;opta&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line_zorder&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pitch_color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;#22312b&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line_color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;#efefef&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pitch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;draw&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.125&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;pcm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pitch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;heatmap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cmap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;viridis&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/xt_pass_count.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 3: Pass Count heatmap&lt;/em&gt;&lt;/p&gt;
&lt;h1 id="calculate-event-probabilities"&gt;Calculate Event Probabilities&lt;/h1&gt;
&lt;p&gt;Now that we have the event counts for each grid cell, we can calculate the probabilities of each event type occurring in a 
given cell. Cells with no recorded events would cause a division by zero, so we replace any resulting NaN or infinite values with 0.&lt;/p&gt;
&lt;p&gt;Notice that we also scale the move probability by &lt;code&gt;0.79&lt;/code&gt;. At the moment, our move probabilities presume a 100% chance that a pass successfully moves the ball to its target location. This is clearly not true as passes can be misplaced or intercepted, so we scale the move probability by the average pass accuracy in the dataset. This prevents us from overestimating the value of a pass when we solve for xT values using the linear algebra approach. There are more sophisticated ways to handle this, but for now, this approach does a good job of balancing the value of a pass versus that of a shot.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;errstate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;divide&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;ignore&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;invalid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;ignore&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;move_probability&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pass_count&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pass_count&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;shot_count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;shot_probability&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;shot_count&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pass_count&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;shot_count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;goal_probability&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;goal_count&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;shot_count&lt;/span&gt;
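    &lt;span class="c1"&gt;# Replace NaN/inf values from empty cells with 0 in every probability grid&lt;/span&gt;
    &lt;span class="n"&gt;move_probability&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nan_to_num&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;move_probability&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nan&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;posinf&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;neginf&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;shot_probability&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nan_to_num&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shot_probability&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nan&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;posinf&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;neginf&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;goal_probability&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nan_to_num&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;goal_probability&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nan&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;posinf&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;neginf&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Scale by the average pass accuracy in the dataset&lt;/span&gt;
    &lt;span class="n"&gt;move_probability&lt;/span&gt; &lt;span class="o"&gt;*=&lt;/span&gt; &lt;span class="mf"&gt;0.79&lt;/span&gt;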
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
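&lt;p&gt;The &lt;code&gt;0.79&lt;/code&gt; scaling factor is the average pass accuracy in the dataset. As a minimal sketch of how such a figure could be derived, the example below uses a made-up list of pass outcomes; the variable names and values are illustrative assumptions, not taken from the data used in this post:&lt;/p&gt;

```python
# Toy pass outcomes (hypothetical data): True means the pass reached a teammate
toy_pass_outcomes = [True, True, False, True, True]

# The share of completed passes is the average pass accuracy
pass_accuracy = sum(toy_pass_outcomes) / len(toy_pass_outcomes)

print(f"Average pass accuracy: {pass_accuracy:.2f}")
```

&lt;p&gt;Computing the factor from the data, rather than hard-coding it, keeps the scaling consistent if you switch to a different league or season.&lt;/p&gt;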

&lt;h1 id="pass-transition-matrices"&gt;Pass Transition Matrices&lt;/h1&gt;
&lt;p&gt;We are now ready to calculate the pass transition matrix, which describes the probability of moving the ball from one grid zone 
to another. This matrix will have dimensions equal to the total number of grid cells (e.g., $192 \times 192$ for a $16 \times 12$ grid). 
Each element at row $i$ and column $j$ represents the probability of transitioning from grid zone $i$ to grid zone $j$ based 
on the observed transitions from our event data.&lt;/p&gt;
&lt;p&gt;We start off by aggregating the event data by the source and target grid zones. This will give us a count of the number of passes
from each source zone to each target zone, which we will use to calculate the transition probabilities between grid zones.&lt;/p&gt;
&lt;p&gt;These probabilities are then assigned to the appropriate indices in the corresponding sub-matrix of &lt;code&gt;t_matrices&lt;/code&gt;, mapping the transition probabilities from each starting grid zone to all possible ending zones. This process encodes the spatial patterns of ball movement across the pitch into a format that can be used more easily for the xT computation.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# Aggregate passes by start and end grid locations&lt;/span&gt;
&lt;span class="n"&gt;pass_end_aggs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;pass_df&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;grid_x&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;grid_y&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;grid_x_end&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;grid_y_end&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;count_end&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Count passes for each start/end location pair&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;pass_start_aggs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;pass_df&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;grid_x&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;grid_y&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;count_start&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Count passes starting from each location&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Merge the start and end aggregates and calculate pass probabilities&lt;/span&gt;
&lt;span class="n"&gt;pass_aggs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pass_end_aggs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pass_start_aggs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;grid_x&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;grid_y&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;how&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;left&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pass_aggs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;pass_prob&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pass_aggs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;count_end&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;pass_aggs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;count_start&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Drop unnecessary columns&lt;/span&gt;
&lt;span class="n"&gt;pass_aggs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pass_aggs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;count_end&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;count_start&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize the transition matrices&lt;/span&gt;
&lt;span class="n"&gt;GRID_HEIGHT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;GRID_WIDTH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;
&lt;span class="n"&gt;t_matrices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;GRID_HEIGHT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;GRID_WIDTH&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;GRID_HEIGHT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;GRID_WIDTH&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# Populate the transition matrices&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pass_aggs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iterrows&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;start_x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start_y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;grid_x&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;grid_y&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;end_x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end_y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;grid_x_end&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;grid_y_end&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;t_matrices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;start_y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start_x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end_y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end_x&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;pass_prob&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
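&lt;p&gt;The same transition probabilities can also be built without &lt;code&gt;iterrows&lt;/code&gt;. The sketch below uses NumPy's &lt;code&gt;np.add.at&lt;/code&gt; on a toy grid of 2 rows by 3 columns with made-up passes (the array names and values are illustrative assumptions, not part of the pipeline above) and checks that the probabilities out of each starting zone sum to 1:&lt;/p&gt;

```python
import numpy as np

GRID_H, GRID_W = 2, 3  # toy grid, much smaller than the 12 x 16 grid above

# Made-up passes: start cell (x, y) and end cell (x_end, y_end)
start_x = np.array([0, 0, 0, 1, 2])
start_y = np.array([0, 0, 0, 1, 1])
end_x = np.array([1, 1, 2, 2, 2])
end_y = np.array([0, 0, 1, 1, 0])

# Count passes for every (start, end) pair in one vectorized step
counts = np.zeros((GRID_H, GRID_W, GRID_H, GRID_W))
np.add.at(counts, (start_y, start_x, end_y, end_x), 1)

# Divide by the number of passes leaving each start cell;
# start cells with no passes keep a probability of 0
totals = counts.sum(axis=(2, 3), keepdims=True)
toy_t = np.divide(counts, totals, out=np.zeros_like(counts), where=totals != 0)

# Probabilities out of each start cell should sum to 1 (or 0 if it has no passes)
row_sums = toy_t.sum(axis=(2, 3))
print(toy_t[0, 0])
print(row_sums)
```

&lt;p&gt;Checking the sums is a cheap way to catch indexing mistakes, such as swapping the x and y axes, before the matrices are used to solve for xT.&lt;/p&gt;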

&lt;h1 id="the-xt-surface-map"&gt;The xT Surface Map&lt;/h1&gt;
&lt;p&gt;Now that we have the pass transition matrices, we can finally calculate the expected threat (xT) for each grid zone. Rather than using the original convergence method, we will take the more efficient approach of using linear algebra to solve for xT in one step.&lt;/p&gt;
&lt;p&gt;The code below constructs a system of linear equations to represent how xT flows across the pitch. Here's how it works at a high level:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Input probabilities:&lt;/strong&gt; The probability matrices are flattened into vectors, representing the likelihood of events at each grid zone.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Matrix Setup ($\mathbf{A}$):&lt;/strong&gt; $\mathbf{A}$ is initialized as the identity matrix, representing the default relationship where each grid cell depends only on itself. Each row is then adjusted by subtracting the transition probabilities from &lt;code&gt;t_matrices&lt;/code&gt;, scaled by that zone's move probability, which describes how xT flows from one grid zone to others during possession.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Payoff Vector ($\mathbf{b}$):&lt;/strong&gt; The payoff vector $\mathbf{b}$ is the product of the shot and goal probabilities for each grid cell, reflecting its immediate scoring threat.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Solving the System:&lt;/strong&gt; The equation $\mathbf{A} \cdot \mathbf{x} = \mathbf{b}$ can now be solved using NumPy's &lt;code&gt;np.linalg.solve&lt;/code&gt; to find $\mathbf{x}$, the expected threat (xT) for each grid zone.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reshaping the Result:&lt;/strong&gt; Finally, the solution vector $\mathbf{x}$ is reshaped back into a 2D grid format, aligning with the original pitch layout.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Our output now represents how both movement and shooting probabilities contribute to the scoring potential at every location on the pitch, producing the xT surface map.&lt;/p&gt;
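&lt;p&gt;Written out, the linear system follows directly from the xT definition. As a sketch, using symbols introduced here purely for illustration: $s_i$, $g_i$ and $m_i$ are the shot, goal and scaled move probabilities for zone $i$, and $T_{ij}$ is the probability that a pass from zone $i$ ends in zone $j$:&lt;/p&gt;
&lt;p&gt;$$x_i = s_i g_i + m_i \sum_{j} T_{ij} x_j \quad\Longrightarrow\quad \left(\mathbf{I} - \operatorname{diag}(\mathbf{m})\,\mathbf{T}\right)\mathbf{x} = \mathbf{s} \odot \mathbf{g}$$&lt;/p&gt;
&lt;p&gt;so that $\mathbf{A} = \mathbf{I} - \operatorname{diag}(\mathbf{m})\,\mathbf{T}$ and $\mathbf{b} = \mathbf{s} \odot \mathbf{g}$, where $\odot$ denotes element-wise multiplication.&lt;/p&gt;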
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;N_CELLS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;GRID_WIDTH&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;GRID_HEIGHT&lt;/span&gt;

&lt;span class="c1"&gt;# Flatten input probabilities&lt;/span&gt;
&lt;span class="n"&gt;shot_prob_flat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;shot_probability&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;flatten&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;goal_prob_flat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;goal_probability&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;flatten&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;move_prob_flat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;move_probability&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;flatten&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize matrix A as an identity matrix&lt;/span&gt;
&lt;span class="n"&gt;A&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;eye&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;N_CELLS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Compute adjustments to A for movement probabilities and transitions&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;N_CELLS&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Map flat index to grid coordinates&lt;/span&gt;
    &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;divmod&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;GRID_WIDTH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Subtract the scaled transition matrix from A&lt;/span&gt;
    &lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;move_prob_flat&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;t_matrices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;flatten&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Create vector b (shooting payoff)&lt;/span&gt;
&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;shot_prob_flat&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;goal_prob_flat&lt;/span&gt;

&lt;span class="c1"&gt;# Solve the linear system A * x = b&lt;/span&gt;
&lt;span class="n"&gt;xt_grid_flat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linalg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;solve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Reshape the result into the grid format&lt;/span&gt;
&lt;span class="n"&gt;xt_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xt_grid_flat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;GRID_HEIGHT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;GRID_WIDTH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
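&lt;p&gt;To see that the one-step solve agrees with the original convergence method, here is a small self-contained check on a toy three-zone pitch. All probabilities are made up for illustration, and the iteration count is an arbitrary assumption chosen to be comfortably large enough to converge:&lt;/p&gt;

```python
import numpy as np

# Made-up probabilities for three zones (illustrative values only)
shot_p = np.array([0.05, 0.20, 0.60])
goal_p = np.array([0.02, 0.10, 0.30])
move_p = 0.79 * (1.0 - shot_p)  # scaled move probability, as in the text

# Toy transition matrix: T[i, j] = P(pass from zone i ends in zone j)
T = np.array([
    [0.6, 0.3, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
])

# Linear algebra approach: solve (I - diag(move_p) @ T) x = shot_p * goal_p
A = np.eye(3) - move_p[:, None] * T
b = shot_p * goal_p
xt_solved = np.linalg.solve(A, b)

# Convergence approach: start from zero and apply the xT update repeatedly
xt_iter = np.zeros(3)
for _ in range(200):
    xt_iter = b + move_p * (T @ xt_iter)

print(np.round(xt_solved, 4))
print(np.allclose(xt_solved, xt_iter))
```

&lt;p&gt;Because the move probabilities are all below 1, each pass through the loop shrinks the remaining error, so the iteration settles on the same values that &lt;code&gt;np.linalg.solve&lt;/code&gt; returns directly.&lt;/p&gt;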

&lt;p&gt;Let's now visualize the xT surface map, to check the expected threat at every location on the pitch.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;statistic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xt_grid&lt;/span&gt;

&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;statistic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;
&lt;span class="n"&gt;x_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;cy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_grid&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_grid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y_grid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;statistic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;statistic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_grid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_grid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cx&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;pitch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Pitch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pitch_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;opta&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line_zorder&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pitch_color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;#22312b&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line_color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;#efefef&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pitch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;draw&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.125&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;pcm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pitch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;heatmap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cmap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;viridis&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;statistic&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cx&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s1"&gt;.2f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;center&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;va&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;center&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;white&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/xt_grid.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 4: xT Grid&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The results look sensible, with the highest expected threat being in the zones closest to the opponent's goal and the xT decreasing as you move away from the goal. Let's compare this to the xT grid created using the original convergence method shown on the &lt;a href="https://soccermatics.readthedocs.io/en/latest/gallery/lesson4/plot_ExpectedThreat.html"&gt;Soccermatics&lt;/a&gt; website. There are some minor differences in values, most likely caused by the different datasets used, but the general shape of our xT surface map is reassuringly similar.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/soccermatics_xt_grid.webp"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 5: Soccermatics xT Grid calculated using the convergence method&lt;/em&gt;&lt;/p&gt;
&lt;h1 id="calculating-player-xt"&gt;Calculating Player xT&lt;/h1&gt;
&lt;p&gt;Now that we’ve calculated the xT grid for every cell on the pitch, we can take the analysis a step further by evaluating individual player contributions. By mapping players' actions onto the xT grid, we’ll quantify the offensive impact of each player and their ability to create and capitalize on scoring opportunities.&lt;/p&gt;
&lt;p&gt;There are several ways to approach this analysis. For instance, we could focus solely on successful on-ball events, consider only events with positive xT values, or even calculate the net xT by subtracting the xT of negative events from positive ones. For simplicity, we’ll focus on successful passes with positive xT values. This allows us to quantify each player’s offensive impact while avoiding penalizing players for negative events. Additionally, it accounts for context — teams like Manchester City, known for their high possession and intricate passing style, often make backward passes to maintain control without any negative intent. Including such passes as negative contributions could unfairly skew the analysis.&lt;/p&gt;
&lt;p&gt;For this article, we’ll also focus exclusively on passes rather than including goals. While it’s perfectly valid to incorporate goals, our interest here lies in evaluating players involved in the build-up play leading to scoring opportunities. Goals can often be better analyzed using a separate expected goals (xG) model, which is specifically designed to quantify the quality and likelihood of scoring from individual shots. By isolating passes, we can concentrate on the contributions players make during the progression and creation phases of an attack.&lt;/p&gt;
&lt;p&gt;Let's take a look at the top players by xT for the English Premier League 2023/24 season.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# Load the data&lt;/span&gt;
&lt;span class="n"&gt;player_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;path/to/player/data.csv&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Map the coordinates to the grid&lt;/span&gt;
&lt;span class="n"&gt;player_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;grid_x&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;player_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;grid_y&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;map_coordinates_to_grid_vectorized&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;player_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;x&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;player_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;y&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;player_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;grid_x_end&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;player_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;grid_y_end&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;map_coordinates_to_grid_vectorized&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;player_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;end_x&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;player_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;end_y&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Calculate the xT difference per action&lt;/span&gt;
&lt;span class="n"&gt;player_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;xt_diff&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;player_df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;xt_grid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;grid_y_end&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;grid_x_end&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;xt_grid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;grid_y&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;grid_x&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])],&lt;/span&gt;
    &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Filter for positive xT differences and aggregate by player&lt;/span&gt;
&lt;span class="n"&gt;top_players&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;player_df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;xt_diff &amp;gt; 0&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;player_id&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;as_index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;xt_diff&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;xt_diff&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ascending&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;table class="table table-striped table-condensed"&gt;
    &lt;thead&gt;    
        &lt;tr style="text-align: right;"&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;Name&lt;/th&gt;
      &lt;th&gt;xT&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
&lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;td&gt;Pascal Groß&lt;/td&gt;
      &lt;td&gt;21.5&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;2&lt;/th&gt;
      &lt;td&gt;Rodri&lt;/td&gt;
      &lt;td&gt;21.3&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;3&lt;/th&gt;
      &lt;td&gt;Bruno Fernandes&lt;/td&gt;
      &lt;td&gt;18.7&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;4&lt;/th&gt;
      &lt;td&gt;Martin Ødegaard&lt;/td&gt;
      &lt;td&gt;18.4&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;5&lt;/th&gt;
      &lt;td&gt;Trent Alexander-Arnold&lt;/td&gt;
      &lt;td&gt;17.3&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;6&lt;/th&gt;
      &lt;td&gt;Kieran Trippier&lt;/td&gt;
      &lt;td&gt;15.2&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;7&lt;/th&gt;
      &lt;td&gt;Declan Rice&lt;/td&gt;
      &lt;td&gt;15.1&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;8&lt;/th&gt;
      &lt;td&gt;Bruno Guimarães&lt;/td&gt;
      &lt;td&gt;14.3&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;9&lt;/th&gt;
      &lt;td&gt;Morgan Gibbs-White&lt;/td&gt;
      &lt;td&gt;13.6&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;10&lt;/th&gt;
      &lt;td&gt;James Ward-Prowse&lt;/td&gt;
      &lt;td&gt;13.2&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Table 4: Top 10 players by xT for the English Premier League 2023/24 season&lt;/p&gt;
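&lt;p&gt;The row-by-row &lt;code&gt;apply&lt;/code&gt; used to compute &lt;code&gt;xt_diff&lt;/code&gt; can become slow on a full season of events. As a rough sketch, the same lookup can be done in one step with NumPy integer-array indexing. The tiny grid and DataFrame below are made-up values purely for illustration; only the column names match the ones used above.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Toy 2x3 xT grid (rows = grid_y, columns = grid_x); values are invented
xt_grid = np.array([
    [0.01, 0.05, 0.20],
    [0.02, 0.06, 0.25],
])

# Toy events that have already been mapped onto grid cells
player_df = pd.DataFrame({
    "player_id": [1, 1, 2],
    "grid_x": [0, 1, 2],
    "grid_y": [0, 0, 1],
    "grid_x_end": [1, 2, 0],
    "grid_y_end": [0, 1, 0],
})

# Look up every start and end cell at once instead of looping over rows
start_xt = xt_grid[
    player_df["grid_y"].to_numpy(dtype=int),
    player_df["grid_x"].to_numpy(dtype=int),
]
end_xt = xt_grid[
    player_df["grid_y_end"].to_numpy(dtype=int),
    player_df["grid_x_end"].to_numpy(dtype=int),
]

# xT gained (or lost) by each action
player_df["xt_diff"] = end_xt - start_xt
print(player_df)
```

&lt;p&gt;The result should match the &lt;code&gt;apply&lt;/code&gt; version, so the filtering and aggregation steps can stay exactly as they are.&lt;/p&gt;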
&lt;h1 id="conclusion"&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;By using linear algebra, we’ve introduced an efficient and elegant way to quantify player contributions to their team’s offensive efforts. This approach not only simplifies the computation of xT but also opens the door for more scalable analyses using larger datasets. &lt;/p&gt;
&lt;p&gt;It’s also worth noting that this article has been intentionally simplified to (hopefully) make it more accessible and easy to understand. There are numerous ways to refine and expand upon this approach, which I'm planning on exploring in a future blog post.&lt;/p&gt;</content><category term="Expected Threat"></category><category term="Expected Threat"></category></entry><entry><title>Estimating Goal Expectancy From Bookmaker's Odds</title><link href="2022/12/02/goal-expectancy-from-bookmakers-odds-using-python/" rel="alternate"></link><published>2022-12-02T19:30:00+00:00</published><updated>2022-12-02T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2022-12-02:2022/12/02/goal-expectancy-from-bookmakers-odds-using-python/</id><summary type="html">&lt;p&gt;This article walks through how to estimate goal expectancies from bookmaker's odds using Python...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;I've explored lots of different ways of predicting football matches so far on this blog, and they have all been backward-looking: they take in historical data and try to use it to predict how many goals will be scored.&lt;/p&gt;
&lt;p&gt;This goal expectancy is then used to work out the probabilities for the match, e.g. what's the probability the home team wins, or that it's a draw, what are the prices for the over / under markets etc.&lt;/p&gt;
&lt;p&gt;This time we're going to do things the opposite way round and use the bookmaker's odds to predict goal expectancy.&lt;/p&gt;
&lt;h2 id="data"&gt;Data&lt;/h2&gt;
&lt;p&gt;Let's start off by getting some bookmaker's odds from &lt;a href="https://football-data.co.uk/"&gt;football-data.co.uk&lt;/a&gt;. This is really easy to do using the scraper from the &lt;code&gt;penaltyblog&lt;/code&gt; Python package, which can be installed via &lt;a href="https://pypi.org/project/penaltyblog/"&gt;pip&lt;/a&gt; if you don't already have it.&lt;/p&gt;
&lt;p&gt;Once we have the data, we filter out the columns we don't need to make it a little easier to work with. The &lt;code&gt;psh&lt;/code&gt;, &lt;code&gt;psd&lt;/code&gt; and &lt;code&gt;psa&lt;/code&gt; columns represent Pinnacle's home, draw and away odds for each fixture.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pb&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2022-2023&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;cols&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;team_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;team_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;goals_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;goals_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;psh&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;psd&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;psa&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;cols&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;table class="table table-striped table-condensed"&gt;
  &lt;thead&gt;
    &lt;tr style="text-align: right;"&gt;
      &lt;th&gt;date&lt;/th&gt;
      &lt;th&gt;team_home&lt;/th&gt;
      &lt;th&gt;team_away&lt;/th&gt;
      &lt;th&gt;goals_home&lt;/th&gt;
      &lt;th&gt;goals_away&lt;/th&gt;
      &lt;th&gt;psh&lt;/th&gt;
      &lt;th&gt;psd&lt;/th&gt;
      &lt;th&gt;psa&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;2022-08-05&lt;/td&gt;
      &lt;td&gt;Crystal Palace&lt;/td&gt;
      &lt;td&gt;Arsenal&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;4.50&lt;/td&gt;
      &lt;td&gt;3.65&lt;/td&gt;
      &lt;td&gt;1.89&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;2022-08-06&lt;/td&gt;
      &lt;td&gt;Bournemouth&lt;/td&gt;
      &lt;td&gt;Aston Villa&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;3.93&lt;/td&gt;
      &lt;td&gt;3.58&lt;/td&gt;
      &lt;td&gt;2.04&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;2022-08-06&lt;/td&gt;
      &lt;td&gt;Everton&lt;/td&gt;
      &lt;td&gt;Chelsea&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;6.04&lt;/td&gt;
      &lt;td&gt;4.06&lt;/td&gt;
      &lt;td&gt;1.63&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;2022-08-06&lt;/td&gt;
      &lt;td&gt;Fulham&lt;/td&gt;
      &lt;td&gt;Liverpool&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;11.20&lt;/td&gt;
      &lt;td&gt;6.22&lt;/td&gt;
      &lt;td&gt;1.28&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;2022-08-06&lt;/td&gt;
      &lt;td&gt;Leeds&lt;/td&gt;
      &lt;td&gt;Wolves&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;2.39&lt;/td&gt;
      &lt;td&gt;3.33&lt;/td&gt;
      &lt;td&gt;3.30&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 id="the-overround"&gt;The Overround&lt;/h2&gt;
&lt;p&gt;The odds in the &lt;code&gt;psh&lt;/code&gt;, &lt;code&gt;psd&lt;/code&gt; and &lt;code&gt;psa&lt;/code&gt; columns are currently in decimal notation. Let's take the first fixture as an example and convert the odds into probabilities instead by taking the reciprocal.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;example&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;psh&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;psh&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;psd&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;psd&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;psa&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;psa&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;table class="table table-striped table-condensed"&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;date&lt;/th&gt;
      &lt;td&gt;2022-08-05 00:00:00&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;team_home&lt;/th&gt;
      &lt;td&gt;Crystal Palace&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;team_away&lt;/th&gt;
      &lt;td&gt;Arsenal&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;goals_home&lt;/th&gt;
      &lt;td&gt;0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;goals_away&lt;/th&gt;
      &lt;td&gt;2&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;psh&lt;/th&gt;
      &lt;td&gt;0.222222&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;psd&lt;/th&gt;
      &lt;td&gt;0.273973&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;psa&lt;/th&gt;
      &lt;td&gt;0.529101&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Hmmm, something's not right here. Our fixture only has three possible outcomes (either the home team wins, the away team wins or it's a draw) so the probabilities should add up nicely to 1.0 but &lt;code&gt;example["psh"] + example["psd"] + example["psa"]&lt;/code&gt; adds up to &lt;code&gt;1.025&lt;/code&gt;. Somehow, we're out by 2.5%.&lt;/p&gt;
&lt;p&gt;The reason is the overround - this is an extra bit of margin the bookmaker has factored into the odds to help guarantee them a profit regardless of the fixture's outcome.&lt;/p&gt;
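&lt;p&gt;To make the arithmetic explicit, here is a small sketch that computes the overround for the Crystal Palace v Arsenal odds shown earlier. The variable names are just for illustration.&lt;/p&gt;

```python
# Decimal odds for Crystal Palace v Arsenal, taken from the table above
odds = {"home": 4.50, "draw": 3.65, "away": 1.89}

# The reciprocal of decimal odds is the bookmaker's raw implied probability
raw_probs = {outcome: 1.0 / price for outcome, price in odds.items()}

# Anything above 1.0 is the bookmaker's margin (the overround)
total = sum(raw_probs.values())
overround = total - 1.0

print(f"total = {total:.4f}, overround = {overround:.2%}")
```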
&lt;p&gt;There are a whole load of different ways we can try and remove this overround and get back to the original odds before the bookmaker skewed them in their favour. We're going to use the &lt;code&gt;power&lt;/code&gt; method included in the &lt;code&gt;penaltyblog&lt;/code&gt; package.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pd&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;overround&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;odds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;psh&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;psd&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;psa&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;odds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;implied&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;power&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;odds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;odds&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;implied_probabilities&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;psh&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;psd&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;psa&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;overround&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;table class="table table-striped table-condensed"&gt;
  &lt;thead&gt;
    &lt;tr style="text-align: right;"&gt;
      &lt;th&gt;date&lt;/th&gt;
      &lt;th&gt;team_home&lt;/th&gt;
      &lt;th&gt;team_away&lt;/th&gt;
      &lt;th&gt;goals_home&lt;/th&gt;
      &lt;th&gt;goals_away&lt;/th&gt;
      &lt;th&gt;psh&lt;/th&gt;
      &lt;th&gt;psd&lt;/th&gt;
      &lt;th&gt;psa&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;2022-08-05&lt;/td&gt;
      &lt;td&gt;Crystal Palace&lt;/td&gt;
      &lt;td&gt;Arsenal&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;0.214017&lt;/td&gt;
      &lt;td&gt;0.265241&lt;/td&gt;
      &lt;td&gt;0.520742&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;2022-08-06&lt;/td&gt;
      &lt;td&gt;Bournemouth&lt;/td&gt;
      &lt;td&gt;Aston Villa&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0.246553&lt;/td&gt;
      &lt;td&gt;0.271239&lt;/td&gt;
      &lt;td&gt;0.482208&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;2022-08-06&lt;/td&gt;
      &lt;td&gt;Everton&lt;/td&gt;
      &lt;td&gt;Chelsea&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;0.157612&lt;/td&gt;
      &lt;td&gt;0.237040&lt;/td&gt;
      &lt;td&gt;0.605349&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;2022-08-06&lt;/td&gt;
      &lt;td&gt;Fulham&lt;/td&gt;
      &lt;td&gt;Liverpool&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;0.079856&lt;/td&gt;
      &lt;td&gt;0.147753&lt;/td&gt;
      &lt;td&gt;0.772391&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;2022-08-06&lt;/td&gt;
      &lt;td&gt;Leeds&lt;/td&gt;
      &lt;td&gt;Wolves&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;0.411107&lt;/td&gt;
      &lt;td&gt;0.293087&lt;/td&gt;
      &lt;td&gt;0.295806&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Brilliant, we've now got the implied probabilities for each fixture without the overround messing them up.&lt;/p&gt;
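&lt;p&gt;For intuition, the general idea behind the power method is to raise each raw probability to a common exponent &lt;code&gt;k&lt;/code&gt;, chosen so that the results sum to exactly 1. The sketch below is my own minimal version using &lt;code&gt;scipy.optimize.brentq&lt;/code&gt;, not the code inside &lt;code&gt;penaltyblog&lt;/code&gt;; the function name &lt;code&gt;power_method&lt;/code&gt; and the search bracket are assumptions made for illustration.&lt;/p&gt;

```python
import numpy as np
from scipy.optimize import brentq


def power_method(odds):
    """Rescale raw probabilities r_i to r_i ** k so that they sum to 1."""
    raw = 1.0 / np.asarray(odds, dtype=float)
    # sum(raw ** k) falls as k grows, so a root-finder can locate k.
    # The bracket [1, 10] assumes the odds carry a positive overround.
    k = brentq(lambda k: np.sum(raw ** k) - 1.0, 1.0, 10.0)
    return raw ** k


# Crystal Palace v Arsenal: home, draw, away
probs = power_method([4.50, 3.65, 1.89])
print(probs, probs.sum())
```

&lt;p&gt;For this fixture the values should land very close to the first row of the table above, which is a handy sanity check.&lt;/p&gt;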
&lt;h2 id="goal-expectancy"&gt;Goal Expectancy&lt;/h2&gt;
&lt;p&gt;Now we've got the data we need, how do we convert it into goal expectancies?&lt;/p&gt;
&lt;p&gt;If you remember back to &lt;a href="http://www.pena.lt/y/2021/06/18/predicting-football-results-using-the-poisson-distribution/"&gt;previous&lt;/a&gt; &lt;a href="http://www.pena.lt/y/2021/06/24/predicting-football-results-using-python-and-dixon-and-coles/"&gt;articles&lt;/a&gt;, we used the predicted number of goals to work out the probabilities for a home win, away win or a draw occurring.&lt;/p&gt;
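&lt;p&gt;As a quick reminder of that forward direction, here is a minimal sketch that turns a pair of goal expectancies into outcome probabilities, assuming home and away goals follow independent Poisson distributions. The function name &lt;code&gt;outcome_probabilities&lt;/code&gt; and the expectancies of 1.5 and 1.2 are hypothetical values chosen for illustration.&lt;/p&gt;

```python
import numpy as np
from scipy.stats import poisson


def outcome_probabilities(home_exp, away_exp, max_goals=15):
    """Return (home win, draw, away win) probabilities for two goal expectancies."""
    goals = np.arange(0, max_goals + 1)
    # mat[i, j] = P(home scores i) * P(away scores j), assuming independence
    mat = np.outer(poisson(home_exp).pmf(goals), poisson(away_exp).pmf(goals))
    home = np.sum(np.tril(mat, -1))  # cells where home goals exceed away goals
    draw = np.sum(np.diag(mat))      # cells where both teams score the same
    away = np.sum(np.triu(mat, 1))   # cells where away goals exceed home goals
    return home, draw, away


home, draw, away = outcome_probabilities(1.5, 1.2)
print(home, draw, away)
```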
&lt;p&gt;This time though, we've got the probabilities for the outcomes but not the goal expectancies. So, in theory all we need to do is try different combinations of home and away goal expectancies to see which ones give us those probabilities back.&lt;/p&gt;
&lt;p&gt;We could do this through brute force by trying lots and lots of different goal expectancies, but that's slow and tedious, so we're going to use &lt;code&gt;scipy&lt;/code&gt;'s minimizer to speed things up.&lt;/p&gt;
&lt;p&gt;Instead of randomly trying lots of values, &lt;code&gt;scipy&lt;/code&gt; will look at the error between our guesses and the bookmaker's probabilities to work out which values to try next and hopefully find us the answer more quickly.&lt;/p&gt;
&lt;p&gt;To do this, we're going to need a function to measure the error between us and the bookmakers. We're going to use something called &lt;a href="https://en.wikipedia.org/wiki/Mean_squared_error"&gt;Mean Squared Error&lt;/a&gt; (MSE) as shown below.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;scipy.stats&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;poisson&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;np&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_mse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;home&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;draw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;away&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;exp_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;mu1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;poisson&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exp_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pmf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;mu2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;poisson&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exp_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pmf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;mat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;outer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mu1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mu2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tril&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="c1"&gt;# home&lt;/span&gt;
        &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;diag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mat&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="c1"&gt;# draw&lt;/span&gt;
        &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;triu&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))])&lt;/span&gt; &lt;span class="c1"&gt;# away&lt;/span&gt;

    &lt;span class="n"&gt;obs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;home&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;draw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;away&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="n"&gt;mse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;obs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;mse&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;_mse&lt;/code&gt; function takes an argument called &lt;code&gt;params&lt;/code&gt;, which holds the home and away goal expectations to test on a log scale, and exponentiates them so that the values are always positive. These values are then plugged into a Poisson distribution to get the probabilities for each team scoring 0-15 goals. We then take the &lt;a href="https://en.wikipedia.org/wiki/Outer_product"&gt;outer product&lt;/a&gt; of these two sets of probabilities to get a matrix giving the probability of each possible scoreline.&lt;/p&gt;
&lt;p&gt;This matrix is a bit like a spreadsheet, so in the example below we have a probability of &lt;code&gt;0.01041474&lt;/code&gt; for nil-nil, &lt;code&gt;0.1207623&lt;/code&gt; for 3-0, &lt;code&gt;0.00436915&lt;/code&gt; for 1-2 and so on.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Goal Probability Matrix" src="../../../../images/goal_matrix.png"&gt;&lt;/p&gt;
&lt;p&gt;The sum of all the probabilities on the diagonal of this matrix gives us the overall probability of a draw, the sum of all the probabilities above the diagonal is the probability of an away win and the sum of all the probabilities below the diagonal is the probability of a home win.&lt;/p&gt;
&lt;p&gt;We then just take the difference between our estimated probabilities and the bookmaker's probabilities, square them (to ensure they are not negative) and return the average back to the minimizer.&lt;/p&gt;
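&lt;p&gt;If it helps to see what those sums are doing, here's a minimal sketch of the same calculation written as explicit loops using only the standard library. It is an illustration rather than the code used in this article: the helper names, the goal rates and the target probabilities are all made up for the example.&lt;/p&gt;

```python
import math


def poisson_pmf(rate, max_goals=15):
    """Poisson probabilities for 0..max_goals goals."""
    return [math.exp(-rate) * rate ** k / math.factorial(k) for k in range(max_goals + 1)]


def outcome_probabilities(home_rate, away_rate):
    """Sum the scoreline grid into home win, draw and away win probabilities."""
    home_pmf = poisson_pmf(home_rate)
    away_pmf = poisson_pmf(away_rate)
    home_win = draw = away_win = 0.0
    for h, p_home in enumerate(home_pmf):      # rows: home goals
        for a, p_away in enumerate(away_pmf):  # columns: away goals
            p = p_home * p_away                # one cell of the outer product
            if h > a:
                home_win += p                  # below the diagonal
            elif h == a:
                draw += p                      # on the diagonal
            else:
                away_win += p                  # above the diagonal
    return [home_win, draw, away_win]


def mse(pred, obs):
    """Mean of the squared differences between two equal-length sequences."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred)


# hypothetical goal rates and target probabilities, for illustration only
pred = outcome_probabilities(1.5, 1.0)
error = mse(pred, [0.50, 0.25, 0.25])
```

&lt;p&gt;The three values in &lt;code&gt;pred&lt;/code&gt; should sum to very nearly one, with only a negligible amount of probability lost by stopping at 15 goals.&lt;/p&gt;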
&lt;p&gt;The job of the minimizer is then to find out what combination of expected home and away goals returns the lowest error. Let's set up the minimizer next.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;scipy.optimize&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;minimize&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;goal_expectation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;home&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;draw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;away&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;maxiter&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;disp&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;    

    &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;minimize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;fun&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;_mse&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;x0&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;home&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;draw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;away&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;home_exp&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;x&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;away_exp&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;x&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;error&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;fun&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;success&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;success&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;      

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Our &lt;code&gt;goal_expectation&lt;/code&gt; function takes the bookmaker's home, draw and away probabilities as inputs and returns the number of home and away goals they imply.&lt;/p&gt;
&lt;p&gt;The first part of the function just sets up some options to tell the minimizer to give up if it's not found a decent answer after 1000 attempts and not to print out a whole load of messages to the screen while it's doing it.&lt;/p&gt;
&lt;p&gt;Next, we run the minimizer, passing in the function whose error we want to minimise. We also pass in some starting values for the home and away goals as &lt;code&gt;x0&lt;/code&gt; to give the minimizer a head start. These are on the same log scale as &lt;code&gt;params&lt;/code&gt;, which is why one of them can be negative. Finally, we pass in the bookmaker's probabilities and the options we set up. We wrap things up by formatting the output from the minimizer to just return the information we're interested in.&lt;/p&gt;
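&lt;p&gt;As a quick sanity check on those starting values, the short sketch below shows what they look like once exponentiated. It only uses the standard library, and the variable names are just for illustration.&lt;/p&gt;

```python
import math

# starting values passed as x0 in the article, interpreted on the log scale
x0 = [0.5, -0.5]

# exponentiating maps any real number to a strictly positive goal rate
start_home, start_away = (math.exp(value) for value in x0)

# even a large negative step by the optimiser still yields a valid rate
extreme_rate = math.exp(-10.0)
```

&lt;p&gt;So the minimizer starts from roughly 1.65 home goals and 0.61 away goals, and whatever values it tries next can never turn into a negative goal expectation.&lt;/p&gt;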
&lt;p&gt;Let's give it a go!&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iterrows&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;goal_expectation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;psh&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;psd&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;psa&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="n"&gt;tmp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;team_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;team_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;home_exp&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home_exp&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;away_exp&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;away_exp&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;       
        &lt;span class="s2"&gt;&amp;quot;success&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;success&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;error&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;error&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;table class="table table-striped table-condensed"&gt;
  &lt;thead&gt;
    &lt;tr style="text-align: right;"&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;team_home&lt;/th&gt;
      &lt;th&gt;team_away&lt;/th&gt;
      &lt;th&gt;home_exp&lt;/th&gt;
      &lt;th&gt;away_exp&lt;/th&gt;
      &lt;th&gt;success&lt;/th&gt;
      &lt;th&gt;error&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;0&lt;/th&gt;
      &lt;td&gt;Crystal Palace&lt;/td&gt;
      &lt;td&gt;Arsenal&lt;/td&gt;
      &lt;td&gt;0.837309&lt;/td&gt;
      &lt;td&gt;1.470508&lt;/td&gt;
      &lt;td&gt;True&lt;/td&gt;
      &lt;td&gt;8.111775e-12&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;td&gt;Bournemouth&lt;/td&gt;
      &lt;td&gt;Aston Villa&lt;/td&gt;
      &lt;td&gt;0.921695&lt;/td&gt;
      &lt;td&gt;1.406204&lt;/td&gt;
      &lt;td&gt;True&lt;/td&gt;
      &lt;td&gt;6.030584e-10&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;2&lt;/th&gt;
      &lt;td&gt;Everton&lt;/td&gt;
      &lt;td&gt;Chelsea&lt;/td&gt;
      &lt;td&gt;0.727779&lt;/td&gt;
      &lt;td&gt;1.693048&lt;/td&gt;
      &lt;td&gt;True&lt;/td&gt;
      &lt;td&gt;6.227654e-12&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;3&lt;/th&gt;
      &lt;td&gt;Fulham&lt;/td&gt;
      &lt;td&gt;Liverpool&lt;/td&gt;
      &lt;td&gt;0.680463&lt;/td&gt;
      &lt;td&gt;2.510449&lt;/td&gt;
      &lt;td&gt;True&lt;/td&gt;
      &lt;td&gt;2.981267e-11&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;4&lt;/th&gt;
      &lt;td&gt;Leeds&lt;/td&gt;
      &lt;td&gt;Wolves&lt;/td&gt;
      &lt;td&gt;1.187333&lt;/td&gt;
      &lt;td&gt;0.960074&lt;/td&gt;
      &lt;td&gt;True&lt;/td&gt;
      &lt;td&gt;2.442655e-09&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The &lt;code&gt;success&lt;/code&gt; column is &lt;code&gt;True&lt;/code&gt; for every row, meaning the minimizer converged each time, and the &lt;code&gt;error&lt;/code&gt; column contains only tiny values (around 1e-9 or smaller), meaning the fitted goal expectations reproduce the bookmaker's probabilities very closely &amp;#128515;&lt;/p&gt;
&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;This technique should give you decent estimates but there's still some room for improvement. If you go back to my article on the &lt;a href="http://www.pena.lt/y/2021/06/24/predicting-football-results-using-python-and-dixon-and-coles/"&gt;Dixon and Coles&lt;/a&gt; model, I discussed how the Poisson distribution can struggle somewhat with the probabilities for low scoring games and how Dixon and Coles used an adjustment factor to try and alleviate this.&lt;/p&gt;
&lt;p&gt;We can also include this adjustment in our loss function to try and reduce our error even further. I haven't added it in here as I wanted to keep this article simple but you can try the Dixon and Coles adjusted version in my &lt;code&gt;penaltyblog&lt;/code&gt; package by using the &lt;code&gt;pb.models.goal_expectancy&lt;/code&gt; function.&lt;/p&gt;
&lt;h2 id="addendum"&gt;Addendum&lt;/h2&gt;
&lt;p&gt;Just to make clear for anybody who is unsure, the goal expectancy discussed here is different to expected goals. Goal expectancy is how many goals the bookmaker is expecting based on their 1x2 odds, whereas expected goals is how many goals a team is expected to score based on the shots they have taken. They both have similar names but quite different meanings.&lt;/p&gt;</content><category term="Poisson"></category><category term="Scraping"></category><category term="Data"></category><category term="Betting"></category></entry><entry><title>Penaltyblog Python Package Updated to v0.5.1</title><link href="2022/11/04/penaltyblog-python-package-v5/" rel="alternate"></link><published>2022-11-04T19:30:00+00:00</published><updated>2022-11-04T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2022-11-04:2022/11/04/penaltyblog-python-package-v5/</id><summary type="html">&lt;p&gt;My penaltyblog python package has been updated to v0.5.1 to include new Bayesian football (soccer) models and web scrapers...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;My &lt;a href="https://github.com/martineastwood/penaltyblog"&gt;penaltyblog&lt;/a&gt; Python package has recently been updated to v0.5.1, so let's take a look at some of the new features.&lt;/p&gt;
&lt;h2 id="python-37-support"&gt;Python 3.7 Support&lt;/h2&gt;
&lt;p&gt;By popular request, &lt;code&gt;penaltyblog&lt;/code&gt; is now compatible with Python 3.7. The main reason for doing this is to allow it to run on Google Colab, which at the time of writing is still stuck on what is now a fairly old version of Python.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;penaltyblog&lt;/code&gt; isn't included in Colab by default but can easily be installed via &lt;code&gt;pip&lt;/code&gt; by running the command below within one of your notebook's cells:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="sx"&gt;!pip install penaltyblog==0.5.1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Once it's installed, you can then import &lt;code&gt;penaltyblog&lt;/code&gt; as normal and use all of its functions.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pb&lt;/span&gt;

&lt;span class="n"&gt;understat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Understat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2022&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fixtures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;understat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id="bayesian-hierarchical-goals-model"&gt;Bayesian Hierarchical Goals Model&lt;/h2&gt;
&lt;p&gt;Another exciting update is the addition of a new goals model based on Bayesian hierarchical modelling. I explained the theory behind this approach in a &lt;a href="http://pena.lt/y/2021/08/25/predicting-football-results-using-bayesian-statistics-with-python-and-pymc3/"&gt;previous article&lt;/a&gt; so I won't go into the theory here but this model is now included in the package.&lt;/p&gt;
&lt;p&gt;The hierarchical model follows the same API as all the other goals models, so you can optionally apply a decay weighting to the data so that more recent fixtures are considered more important when fitting the model.&lt;/p&gt;
&lt;p&gt;Here's a quick example to get you started:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pb&lt;/span&gt;

&lt;span class="n"&gt;fd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2021-2022&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fixtures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;weights&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dixon_coles_weights&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mf"&gt;0.001&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BayesianHierarchicalGoalModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;goals_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;goals_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;weights&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;prediction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Man City&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Chelsea&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prediction&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
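&lt;p&gt;The &lt;code&gt;0.001&lt;/code&gt; passed to &lt;code&gt;dixon_coles_weights&lt;/code&gt; above controls how quickly older fixtures are down-weighted. As a rough illustration of the idea, and not necessarily how the package implements it, the sketch below applies the exponential decay described by Dixon and Coles, where a match played &lt;code&gt;t&lt;/code&gt; days ago gets a weight of &lt;code&gt;exp(-xi * t)&lt;/code&gt;. The function name &lt;code&gt;time_decay_weights&lt;/code&gt; and the dates are made up for this example.&lt;/p&gt;

```python
import math
from datetime import date


def time_decay_weights(match_dates, xi=0.001, today=None):
    """Exponential down-weighting of older fixtures: weight = exp(-xi * age_in_days)."""
    reference = today or max(match_dates)
    return [math.exp(-xi * (reference - d).days) for d in match_dates]


# hypothetical fixture dates, oldest first
dates = [date(2021, 8, 14), date(2022, 1, 1), date(2022, 5, 22)]
weights = time_decay_weights(dates)
```

&lt;p&gt;The most recent fixture gets a weight of one and everything older gets progressively less. A larger &lt;code&gt;xi&lt;/code&gt; forgets old results faster, while a value of zero weights every fixture equally.&lt;/p&gt;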

&lt;h2 id="bayesian-bivariate-poisson-goals-model"&gt;Bayesian Bivariate Poisson Goals Model&lt;/h2&gt;
&lt;p&gt;As well as the hierarchical model, I've also added a Bayesian bivariate Poisson model. I'll probably write a separate article at some point to explain the theory behind the modelling so again I won't go into too many details here.&lt;/p&gt;
&lt;p&gt;However, as I've mentioned in &lt;a href="https://pena.lt/y/2021/06/18/predicting-football-results-using-the-poisson-distribution/"&gt;previous&lt;/a&gt; &lt;a href="https://pena.lt/y/2021/06/24/predicting-football-results-using-python-and-dixon-and-coles/"&gt;articles&lt;/a&gt; there is a common issue with Poisson-based models where they treat both teams' scores as independent of each other.&lt;/p&gt;
&lt;p&gt;This doesn't reflect reality though, where each team's goals scored / conceded are likely not independent. For example, if the score is 0-0 with 15 minutes to go then the underdog may settle for a draw and not push to score. Or if a team goes a goal down early on they may park the bus to prevent a more humiliating scoreline.&lt;/p&gt;
&lt;p&gt;The bivariate model attempts to account for this by modelling the underlying Poisson distributions as a bivariate function. So instead of having separate Poisson distributions for the home and away teams, we have one combined distribution.&lt;/p&gt;
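&lt;p&gt;To give a flavour of what a combined distribution looks like, here's a minimal sketch of the standard textbook bivariate Poisson, where both scores share a common Poisson component. This is an assumption for illustration only: the model in the package may be parameterised differently, and the function name and rates below are made up.&lt;/p&gt;

```python
import math


def bivariate_poisson_pmf(x, y, lam1, lam2, lam3):
    """P(home=x, away=y) when both scores share a common Poisson(lam3) component."""
    base = (
        math.exp(-(lam1 + lam2 + lam3))
        * lam1 ** x / math.factorial(x)
        * lam2 ** y / math.factorial(y)
    )
    ratio = lam3 / (lam1 * lam2)
    shared = sum(
        math.comb(x, k) * math.comb(y, k) * math.factorial(k) * ratio ** k
        for k in range(min(x, y) + 1)
    )
    return base * shared


# hypothetical rates chosen only for illustration
lam1, lam2, lam3 = 1.2, 0.9, 0.2
grid = range(16)
joint = {(x, y): bivariate_poisson_pmf(x, y, lam1, lam2, lam3) for x in grid for y in grid}

total = sum(joint.values())
mean_home = sum(x * p for (x, y), p in joint.items())
mean_away = sum(y * p for (x, y), p in joint.items())
covariance = sum(x * y * p for (x, y), p in joint.items()) - mean_home * mean_away
```

&lt;p&gt;In this construction the covariance between the two scores equals the shared rate &lt;code&gt;lam3&lt;/code&gt;, and setting it to zero collapses back to two independent Poisson distributions.&lt;/p&gt;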
&lt;p&gt;Here's another quick example to get you started. Notice how similar the code is to the hierarchical example above - all we have to do is change one word to switch out the model, making it easy to try out different approaches.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pb&lt;/span&gt;

&lt;span class="n"&gt;fd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2021-2022&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fixtures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;weights&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dixon_coles_weights&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mf"&gt;0.001&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BayesianBivariateGoalModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;goals_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;goals_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;weights&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;prediction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Man City&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Chelsea&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prediction&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;So which model should you use? &amp;#129335;&lt;/p&gt;
&lt;p&gt;Unfortunately, there's no simple answer here. If you want something fast and reliable then go with the &lt;a href="http://docs.pena.lt/y/models/dixon_coles.html"&gt;Dixon and Coles model&lt;/a&gt;. Otherwise, if you've got more time / computational power then try out the Bayesian models. The only real way of knowing though is backtesting them on your data to find out which ones work best for your particular use case.&lt;/p&gt;
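&lt;p&gt;If you're wondering what scoring a backtest might look like, one common choice for home / draw / away predictions is the ranked probability score, where lower is better. The sketch below is only an illustration of that metric in plain Python: the function name, the two sets of model predictions and the results are all made up.&lt;/p&gt;

```python
def ranked_probability_score(probs, outcome_index):
    """RPS for one match; probs is [home, draw, away], lower is better."""
    observed = [1.0 if i == outcome_index else 0.0 for i in range(len(probs))]
    cumulative_pred = cumulative_obs = 0.0
    total = 0.0
    # compare cumulative distributions over the ordered outcomes
    for p, o in zip(probs[:-1], observed[:-1]):
        cumulative_pred += p
        cumulative_obs += o
        total += (cumulative_pred - cumulative_obs) ** 2
    return total / (len(probs) - 1)


# hypothetical predictions from two models for the same three matches
results = [0, 2, 1]  # 0 = home win, 1 = draw, 2 = away win
model_a = [[0.6, 0.25, 0.15], [0.2, 0.3, 0.5], [0.3, 0.4, 0.3]]
model_b = [[0.4, 0.3, 0.3], [0.4, 0.3, 0.3], [0.4, 0.3, 0.3]]

score_a = sum(ranked_probability_score(p, r) for p, r in zip(model_a, results)) / len(results)
score_b = sum(ranked_probability_score(p, r) for p, r in zip(model_b, results)) / len(results)
```

&lt;p&gt;Here the first model ends up with the lower average score because it put more probability on what actually happened. In a real backtest you'd fit each model only on fixtures played before the matches being predicted.&lt;/p&gt;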
&lt;h2 id="so-fifa"&gt;So Fifa&lt;/h2&gt;
&lt;p&gt;The scrapers in &lt;code&gt;penaltyblog&lt;/code&gt; have also been updated to include &lt;a href="https://sofifa.com/"&gt;So Fifa&lt;/a&gt;. The &lt;code&gt;get_players&lt;/code&gt; function essentially scrapes the front page of the website, which contains top-level player data. You can control the number of pages to scrape and how the data should be sorted to make it easier to just get the top-ranked players if that's all you're interested in.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pb&lt;/span&gt;

&lt;span class="n"&gt;sofifa&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SoFifa&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;player_info&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sofifa&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_players&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_pages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sort_by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;potential&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;player_info&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can then use the &lt;code&gt;get_player&lt;/code&gt; function to get more detailed stats about players you're interested in based on So Fifa's player ID.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pb&lt;/span&gt;
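&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;time&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sleep&lt;/span&gt;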

&lt;span class="n"&gt;sofifa&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SoFifa&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;player_info&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sofifa&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_players&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_pages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sort_by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;value&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;players&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;id_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;player_info&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;tmp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sofifa&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_player&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;players&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;players&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;players&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;players&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Remember to scrape nicely though, please don't crash someone's website by scraping too much / too fast.&lt;/p&gt;
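&lt;p&gt;One way to keep yourself honest about this is a small throttle that enforces a minimum gap between requests. The &lt;code&gt;Throttle&lt;/code&gt; class below is a hypothetical helper written for this example, not something &lt;code&gt;penaltyblog&lt;/code&gt; provides, and the interval is only kept short so the demo runs quickly.&lt;/p&gt;

```python
import time


class Throttle:
    """Ensure at least min_interval seconds pass between calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock  # injectable so the class can be tested without real delays
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        # Record when this call was released, after any sleeping
        self._last = self._clock()


# Short interval for the demo only; be gentler against a real website
throttle = Throttle(min_interval=0.2)
for page in range(3):
    throttle.wait()
    # A scraper call, such as sofifa.get_player(id_), would go here
    print("fetching page", page)
```

&lt;p&gt;Unlike a fixed &lt;code&gt;sleep(1)&lt;/code&gt;, this only waits for whatever is left of the interval, so slow responses don't get penalised twice.&lt;/p&gt;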
&lt;h2 id="whats-next"&gt;What's Next?&lt;/h2&gt;
&lt;p&gt;My TODO list has plenty more modelling approaches to try and more websites to scrape but if there's anything else you think would be good to include then let me know.&lt;/p&gt;
&lt;p&gt;Thanks for reading!&lt;/p&gt;</content><category term="Scraping"></category><category term="Scraping"></category><category term="Data"></category></entry><entry><title>Ten Years of pena.lt/y/blog</title><link href="2022/10/18/ten-year-anniversary/" rel="alternate"></link><published>2022-10-18T19:30:00+00:00</published><updated>2022-10-18T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2022-10-18:2022/10/18/ten-year-anniversary/</id><summary type="html">&lt;p&gt;It's been ten years since I started this blog...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Today marks the blog's ten year anniversary and what a journey it's been!&lt;/p&gt;
&lt;p&gt;It all started whilst I was sat in Manchester Airport waiting for a flight to Dubrovnik to present at a scientific conference. I'd been watching Match of the Day the night before and was pondering some comment Alan Shearer had made that baffled me at the time. I can't remember what it was that he'd said but I do remember thinking it couldn't possibly be true. &lt;/p&gt;
&lt;p&gt;When I got to my hotel in Dubrovnik, Alan Shearer's comment was still bugging me so I connected to the hotel's internet and started doing some Googling. I soon stumbled upon Simon Gleave's &lt;a href="https://scoreboardjournalism.wordpress.com/"&gt;Scoreboard Journalism&lt;/a&gt; blog and was inspired to have a go at football analytics myself.&lt;/p&gt;
&lt;h2 id="career-changing"&gt;Career Changing&lt;/h2&gt;
&lt;p&gt;Within a year of starting the blog, I was contacted by a start-up company called Onside Analysis. Based largely on my blog, they offered me a job developing predictive models for the gambling industry and to try and help them expand into working with football clubs. I accepted the role and left the world of scientific research / clinical statistics behind forever to become a data scientist.&lt;/p&gt;
&lt;p&gt;Sadly, the role at Onside Analysis didn't last long as they were bought out by another company and I moved on as redundancy was looming. However, I still work within data science now and a large part of my career change was down to this blog.&lt;/p&gt;
&lt;h2 id="the-highs-and-the-lows"&gt;The Highs and the Lows&lt;/h2&gt;
&lt;p&gt;Over the years, I've made friends, met interesting people and worked with football clubs and governing bodies all through this blog, which has given me some fantastic (and not so fantastic) experiences.&lt;/p&gt;
&lt;p&gt;Some of the highs include my work being discussed by Rafa Benitez on his personal blog, being featured in an article by John Burn-Murdoch in the Daily Telegraph and receiving feedback from a large governing body that Sir Alex Ferguson was impressed by a piece of analysis I'd done for them &lt;em&gt;"despite Sir Alex not normally liking that football analytics sort of thing"&lt;/em&gt; &amp;#128518;&lt;/p&gt;
&lt;p&gt;I've also presented at three Opta Pro conferences, as well as writing a fourth presentation that somebody else gave there on my behalf.&lt;/p&gt;
&lt;p&gt;There have been some lows as well. I don't think I'll ever forget the humiliating taxi ride back to the train station after pretty much being laughed out of the room at a Premier League football club. They were a regular lower half team at the time and as part of a discussion about helping them with some work they challenged me to prove myself by recommending some attackers that were suitable transfer targets for them.&lt;/p&gt;
&lt;p&gt;I enthusiastically made some suggestions based on a model I'd been developing and was promptly laughed at by the entire room, who took great pleasure in telling me none of my suggestions would ever be good enough for the Premier League. They then cancelled the rest of the day's meetings we had planned and I left for home never to be contacted by them again.&lt;/p&gt;
&lt;p&gt;I can look back on it now with a wry smile but at the time I was quite disappointed as I'd taken a day off work, paid for an expensive train ticket and spent a lot of time preparing for the meeting only to be mocked. I think history is on my side though as all my suggestions went on to have much more successful careers than the player they eventually went on to sign. &lt;sup&gt;1&lt;/sup&gt; &amp;#128514;&lt;/p&gt;
&lt;h2 id="whats-next"&gt;What's Next?&lt;/h2&gt;
&lt;p&gt;I released a new update to the &lt;a href="https://github.com/martineastwood/penaltyblog"&gt;penaltyblog python package&lt;/a&gt; last week, which adds in some new models and scrapers so keep an eye out on the blog for an article on those over the next few weeks. &lt;/p&gt;
&lt;p&gt;After that, I've got a giant list of ideas to write about when I can find the time and data for them.&lt;/p&gt;
&lt;p&gt;Here's to the next ten years of pena.lt/y/blog!&lt;/p&gt;
&lt;h2 id="footnotes"&gt;Footnotes&lt;/h2&gt;
&lt;p&gt;Because people are bound to ask, here are the players I recommended plus the teams they played for at the time, bearing in mind that the team in question regularly finished in the bottom half of the table:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Arkadiusz Milik (Ajax)&lt;/li&gt;
&lt;li&gt;Michy Batshuayi (Marseille)&lt;/li&gt;
&lt;li&gt;Wissam Ben Yedder (Toulouse)&lt;/li&gt;
&lt;li&gt;Nabil Fekir (Lyon)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Milik moved to Napoli later that year and Batshuayi moved to Chelsea. Fekir almost moved to Liverpool in 2018 but the deal fell through due to injury and Ben Yedder went on to have a highly successful career with Monaco and France. &lt;/p&gt;
&lt;p&gt;I have no regrets about my recommendations &amp;#128516;&lt;/p&gt;</content><category term="Misc"></category><category term="Misc"></category></entry><entry><title>Scraping Football Data Using the penaltyblog Python Package</title><link href="2022/08/05/scraping-football-data-using-penaltyblog-python-package/" rel="alternate"></link><published>2022-08-05T19:30:00+00:00</published><updated>2022-08-05T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2022-08-05:2022/08/05/scraping-football-data-using-penaltyblog-python-package/</id><summary type="html">&lt;p&gt;This article shows how to use the penaltyblog python package to scrape football data...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;If you're interested in football analytics or trying to predict football results and odds for betting markets then you are going to need data. This article will walk through how to use the &lt;code&gt;penaltyblog&lt;/code&gt; python package to scrape football (soccer) data from different websites and how to join data from different sources together.&lt;/p&gt;
&lt;h2 id="installing-penaltyblog"&gt;Installing &lt;code&gt;penaltyblog&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;The first thing you need to do is install the &lt;code&gt;penaltyblog&lt;/code&gt; package from &lt;a href="https://pypi.org/project/penaltyblog/"&gt;pypi&lt;/a&gt; using pip. Provided you've got Python 3 and pip already installed then you just need to run the command below in a terminal. If you've not got Python installed yet, then head over to &lt;a href="https://www.anaconda.com/products/distribution"&gt;Anaconda&lt;/a&gt; and install that first.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;penaltyblog
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id="the-scrapers"&gt;The Scrapers&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;penaltyblog&lt;/code&gt; contains scrapers for a number of different websites, including Understat, ESPN and football-data.co.uk. For consistency, each of the scrapers takes the same arguments when it is created: a standardized competition name and a season. For example:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pb&lt;/span&gt;

&lt;span class="n"&gt;understat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Understat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2021-2022&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2021-2022&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;espn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ESPN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2021-2022&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;It doesn't matter which data source you are scraping: competition names always comprise the country's three-letter code followed by the competition's name, and the season is &lt;code&gt;start_year-end_year&lt;/code&gt;. The scraper then maps this to whatever the data source is calling the competition so you don't need to remember whether you are scraping &lt;code&gt;La Liga&lt;/code&gt;, &lt;code&gt;LaLiga&lt;/code&gt;, &lt;code&gt;La Liga Primera Division&lt;/code&gt; or whatever else a particular website calls it.&lt;/p&gt;
&lt;p&gt;You can get a list of available competitions for each scraper by calling its &lt;code&gt;list_competitions()&lt;/code&gt; function:&lt;/p&gt;
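&lt;p&gt;To make the naming convention concrete, here is a small sketch that splits these strings into their parts. Both helpers are hypothetical and written only for this example; they show the format described above and are not how the scrapers parse their arguments internally.&lt;/p&gt;

```python
def parse_competition(name):
    """Split 'ENG Premier League' into ('ENG', 'Premier League')."""
    country, _, league = name.partition(" ")
    valid_code = len(country) == 3 and country.isalpha() and country.isupper()
    if not valid_code or not league:
        raise ValueError(f"expected 'XXX Competition Name', got {name!r}")
    return country, league


def parse_season(season):
    """Split '2021-2022' into (2021, 2022)."""
    start, _, end = season.partition("-")
    if not (len(start) == 4 and len(end) == 4 and start.isdigit() and end.isdigit()):
        raise ValueError(f"expected 'start_year-end_year', got {season!r}")
    return int(start), int(end)


print(parse_competition("ENG Premier League"))  # ('ENG', 'Premier League')
print(parse_season("2021-2022"))  # (2021, 2022)
```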
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pb&lt;/span&gt;

&lt;span class="n"&gt;understat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Understat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;list_competitions&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;DEU Bundesliga 1&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;ENG Premier League&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;ESP La Liga&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;FRA Ligue 1&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;ITA Serie A&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;RUS Premier League&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id="fixtures"&gt;Fixtures&lt;/h2&gt;
&lt;p&gt;Let's go ahead and scrape ourselves some fixtures from Understat for the English Premier League:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pb&lt;/span&gt;

&lt;span class="n"&gt;understat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Understat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2021-2022&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;under_fixtures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;understat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;under_fixtures&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div style="overflow-x:auto;"&gt;
&lt;table class="table table-striped table-condensed"&gt;
  &lt;thead&gt;
    &lt;tr style="text-align: right;"&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;understat_id&lt;/th&gt;
      &lt;th&gt;datetime&lt;/th&gt;
      &lt;th&gt;team_home&lt;/th&gt;
      &lt;th&gt;team_away&lt;/th&gt;
      &lt;th&gt;goals_home&lt;/th&gt;
      &lt;th&gt;goals_away&lt;/th&gt;
      &lt;th&gt;xg_home&lt;/th&gt;
      &lt;th&gt;xg_away&lt;/th&gt;
      &lt;th&gt;forecast_w&lt;/th&gt;
      &lt;th&gt;forecast_d&lt;/th&gt;
      &lt;th&gt;forecast_l&lt;/th&gt;
      &lt;th&gt;season&lt;/th&gt;
      &lt;th&gt;competition&lt;/th&gt;
      &lt;th&gt;date&lt;/th&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;id&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;1628812800---brentford---arsenal&lt;/th&gt;
      &lt;td&gt;16376&lt;/td&gt;
      &lt;td&gt;2021-08-13 19:00:00&lt;/td&gt;
      &lt;td&gt;Brentford&lt;/td&gt;
      &lt;td&gt;Arsenal&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;1.888180&lt;/td&gt;
      &lt;td&gt;1.023850&lt;/td&gt;
      &lt;td&gt;0.6289&lt;/td&gt;
      &lt;td&gt;0.2287&lt;/td&gt;
      &lt;td&gt;0.1424&lt;/td&gt;
      &lt;td&gt;2021-2022&lt;/td&gt;
      &lt;td&gt;ENG Premier League&lt;/td&gt;
      &lt;td&gt;2021-08-13&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1628899200---burnley---brighton&lt;/th&gt;
      &lt;td&gt;16378&lt;/td&gt;
      &lt;td&gt;2021-08-14 14:00:00&lt;/td&gt;
      &lt;td&gt;Burnley&lt;/td&gt;
      &lt;td&gt;Brighton&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;1.795480&lt;/td&gt;
      &lt;td&gt;1.685300&lt;/td&gt;
      &lt;td&gt;0.3894&lt;/td&gt;
      &lt;td&gt;0.2877&lt;/td&gt;
      &lt;td&gt;0.3229&lt;/td&gt;
      &lt;td&gt;2021-2022&lt;/td&gt;
      &lt;td&gt;ENG Premier League&lt;/td&gt;
      &lt;td&gt;2021-08-14&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1628899200---chelsea---crystal_palace&lt;/th&gt;
      &lt;td&gt;16379&lt;/td&gt;
      &lt;td&gt;2021-08-14 14:00:00&lt;/td&gt;
      &lt;td&gt;Chelsea&lt;/td&gt;
      &lt;td&gt;Crystal Palace&lt;/td&gt;
      &lt;td&gt;3&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;1.187090&lt;/td&gt;
      &lt;td&gt;0.321701&lt;/td&gt;
      &lt;td&gt;0.6405&lt;/td&gt;
      &lt;td&gt;0.2822&lt;/td&gt;
      &lt;td&gt;0.0773&lt;/td&gt;
      &lt;td&gt;2021-2022&lt;/td&gt;
      &lt;td&gt;ENG Premier League&lt;/td&gt;
      &lt;td&gt;2021-08-14&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1628899200---everton---southampton&lt;/th&gt;
      &lt;td&gt;16380&lt;/td&gt;
      &lt;td&gt;2021-08-14 14:00:00&lt;/td&gt;
      &lt;td&gt;Everton&lt;/td&gt;
      &lt;td&gt;Southampton&lt;/td&gt;
      &lt;td&gt;3&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;2.388630&lt;/td&gt;
      &lt;td&gt;0.580601&lt;/td&gt;
      &lt;td&gt;0.8359&lt;/td&gt;
      &lt;td&gt;0.1234&lt;/td&gt;
      &lt;td&gt;0.0407&lt;/td&gt;
      &lt;td&gt;2021-2022&lt;/td&gt;
      &lt;td&gt;ENG Premier League&lt;/td&gt;
      &lt;td&gt;2021-08-14&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1628899200---leicester---wolverhampton_wanderers&lt;/th&gt;
      &lt;td&gt;16381&lt;/td&gt;
      &lt;td&gt;2021-08-14 14:00:00&lt;/td&gt;
      &lt;td&gt;Leicester&lt;/td&gt;
      &lt;td&gt;Wolverhampton Wanderers&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0.668082&lt;/td&gt;
      &lt;td&gt;1.327140&lt;/td&gt;
      &lt;td&gt;0.1683&lt;/td&gt;
      &lt;td&gt;0.2750&lt;/td&gt;
      &lt;td&gt;0.5567&lt;/td&gt;
      &lt;td&gt;2021-2022&lt;/td&gt;
      &lt;td&gt;ENG Premier League&lt;/td&gt;
      &lt;td&gt;2021-08-14&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;

&lt;p&gt;If you look at the table above (you may need to scroll the table horizontally if you're on a small screen) then you'll notice that 
the column names have a consistent style to them, e.g. all in lowercase, formatted as snake case etc to make them easier to work with.&lt;/p&gt;
&lt;p&gt;Where possible, the column names are also consistent between data sources. For example, the column for the home team is always called &lt;code&gt;team_home&lt;/code&gt; whatever the site you're scraping. This may seem trivial but it's a huge time saving not having to try and remember what each different data source calls the same things. &lt;/p&gt;
&lt;p&gt;Columns are always named as &lt;code&gt;team_home&lt;/code&gt;, &lt;code&gt;goals_home&lt;/code&gt;, &lt;code&gt;xg_home&lt;/code&gt; etc so that if you print out the column names then related columns, 
such as &lt;code&gt;team_home&lt;/code&gt; and &lt;code&gt;team_away&lt;/code&gt;, appear next to each other. No more searching through a giant list of columns to try and find the one you want.&lt;/p&gt;
&lt;p&gt;The data also comes with an &lt;code&gt;id&lt;/code&gt; column as the dataframe's index so every row has a unique key associated with it comprising the timestamp plus the team names.&lt;/p&gt;
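&lt;p&gt;Judging by the ids in the table above, the key looks like the Unix timestamp of the match date at midnight UTC, followed by the lowercased team names with spaces swapped for underscores. The sketch below rebuilds an id on that assumption; &lt;code&gt;make_fixture_id&lt;/code&gt; is a hypothetical helper inferred from the output, and the package's real implementation may well differ.&lt;/p&gt;

```python
import calendar
from datetime import date


def make_fixture_id(match_date, team_home, team_away):
    """Build an id shaped like '1628812800---brentford---arsenal'."""
    # Unix timestamp of the match date at midnight UTC (assumed from the table)
    timestamp = calendar.timegm(match_date.timetuple())

    def slug(name):
        # Lowercase and replace spaces with underscores
        return name.strip().lower().replace(" ", "_")

    return f"{timestamp}---{slug(team_home)}---{slug(team_away)}"


print(make_fixture_id(date(2021, 8, 13), "Brentford", "Arsenal"))
# 1628812800---brentford---arsenal
```

&lt;p&gt;Because the key only depends on the date and the two team names, any two sources that agree on those will produce the same id for the same fixture.&lt;/p&gt;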
&lt;h2 id="merging-data-sources"&gt;Merging Data Sources&lt;/h2&gt;
&lt;p&gt;Combining scraped data from multiple data sources can be tricky but the &lt;code&gt;penaltyblog&lt;/code&gt; scrapers can help with this too. As a somewhat contrived example, let's try and combine Understat's xG scores with football-data.co.uk's betting odds for Bet365.&lt;/p&gt;
&lt;p&gt;The first thing we need to do is scrape the data from both sites. &lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pb&lt;/span&gt;

&lt;span class="n"&gt;under&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Understat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2021-2022&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ufix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;under&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;ufix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ufix&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;team_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;xg_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;xg_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;

&lt;span class="n"&gt;fb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2021-2022&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fbfix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;fbfix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fbfix&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;team_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;b365_h&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;b365_d&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;b365_a&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In theory, we should be able to merge the two datasets by joining on the team names. Let's give it a go and see what happens...&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;merged&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ufix&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fbfix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;team_away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;how&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;inner&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;merged&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Oh dear, we only have 240 fixtures instead of the 380 we would expect for a full season of Premier League fixtures &amp;#128533;. &lt;/p&gt;
&lt;p&gt;Unfortunately, both data sources use different team names. For example, Understat uses &lt;code&gt;Manchester City&lt;/code&gt; whereas football-data uses &lt;code&gt;Man City&lt;/code&gt; so we can't join on them since they don't match.&lt;/p&gt;
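&lt;p&gt;A quick way to see which names are causing the trouble is to compare the sets of team names from each source. The names below are a small illustrative sample rather than the full scraped data, and &lt;code&gt;unmatched_names&lt;/code&gt; is a helper made up for this example.&lt;/p&gt;

```python
def unmatched_names(left, right):
    """Return the names that appear in only one of the two collections."""
    left, right = set(left), set(right)
    return sorted(left - right), sorted(right - left)


# Illustrative sample; with real data you could pass in something like
# ufix["team_home"] and fbfix["team_home"] instead
understat_teams = ["Manchester City", "Manchester United", "Arsenal", "Chelsea"]
football_data_teams = ["Man City", "Man United", "Arsenal", "Chelsea"]

only_understat, only_football_data = unmatched_names(understat_teams, football_data_teams)
print(only_understat)      # ['Manchester City', 'Manchester United']
print(only_football_data)  # ['Man City', 'Man United']
```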
&lt;p&gt;To get around this problem, all of the &lt;code&gt;penaltyblog&lt;/code&gt; scrapers come with the ability to remap team names. &lt;code&gt;penaltyblog&lt;/code&gt; doesn't have mappings for all the world's football teams (yet..) but there's enough for this example and you can easily extend the mappings with your own team names.&lt;/p&gt;
&lt;p&gt;The mappings themselves are just a standard python dictionary, with the key as the team name you want to end up with and the value as a list of possible choices to remap. The example below maps both &lt;code&gt;Man Utd&lt;/code&gt; and &lt;code&gt;Man United&lt;/code&gt; to &lt;code&gt;Manchester United&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="s2"&gt;&amp;quot;Manchester United&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Man Utd&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Man United&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
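
&lt;p&gt;If you're wondering how a dictionary in this shape can be applied, one simple approach is to invert it into an alias-to-name lookup. This is only a sketch of the idea with hypothetical helper names; I'm not claiming it mirrors what the scrapers do internally.&lt;/p&gt;

```python
def build_lookup(mappings):
    """Invert {canonical: [aliases]} into {alias: canonical}."""
    lookup = {}
    for canonical, aliases in mappings.items():
        lookup[canonical] = canonical  # the canonical name maps to itself
        for alias in aliases:
            lookup[alias] = canonical
    return lookup


def remap(name, lookup):
    """Return the canonical name, leaving unknown names unchanged."""
    return lookup.get(name, name)


mappings = {
    "Manchester United": ["Man Utd", "Man United"],
}
lookup = build_lookup(mappings)

print(remap("Man Utd", lookup))     # Manchester United
print(remap("Man United", lookup))  # Manchester United
print(remap("Arsenal", lookup))     # Arsenal
```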

&lt;p&gt;Let's try our join again but with the example team name mappings included. Notice how we can now simply join using the &lt;code&gt;id&lt;/code&gt; column since it's unique per fixture and will be identical across both datasets now we've mapped the team names.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pb&lt;/span&gt;

&lt;span class="n"&gt;mappings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_example_team_name_mappings&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;under&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Understat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2021-2022&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mappings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ufix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;under&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;fb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FootballData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2021-2022&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mappings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fbfix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_fixtures&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;merged&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ufix&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fbfix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;left_index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;right_index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;merged&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Success! With just 12 lines of code (including the blank lines), we've scraped both Understat and football-data.co.uk, then merged the two data sets on the unique &lt;code&gt;id&lt;/code&gt; index the scrapers automatically created for us.&lt;/p&gt;
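&lt;p&gt;To see why a shared &lt;code&gt;id&lt;/code&gt; index makes the merge so simple, here is a minimal sketch on toy data. The column names (&lt;code&gt;xg_home&lt;/code&gt;, &lt;code&gt;goals_home&lt;/code&gt;) and the second and third ids are made up for illustration and are not what the scrapers return.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import pandas as pd

# Toy stand-ins for the two scrapers' output (hypothetical columns and ids)
understat = pd.DataFrame(
    {'xg_home': [1.9, 0.7]},
    index=pd.Index(
        ['1628812800---brentford---arsenal', '1628899200---burnley---brighton'],
        name='id',
    ),
)
football_data = pd.DataFrame(
    {'goals_home': [2, 3]},
    index=pd.Index(
        ['1628812800---brentford---arsenal', '1628899200---chelsea---crystal_palace'],
        name='id',
    ),
)

# merge defaults to an inner join, so only ids present in both frames survive
merged = understat.merge(football_data, left_index=True, right_index=True)
print(merged)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Only the Brentford fixture appears in both toy frames, so the merged result has a single row. With real data it is worth comparing &lt;code&gt;merged.shape&lt;/code&gt; against the two inputs to spot fixtures whose ids failed to line up.&lt;/p&gt;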
&lt;h2 id="what-else-can-the-scrapers-do"&gt;What Else Can the Scrapers Do?&lt;/h2&gt;
&lt;p&gt;Depending on the data source, the scrapers have additional functions for collecting extra data. For example, the Understat scraper can get you shot data, including the XY coordinates, and the ESPN scraper can get you player- and team-level data too.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pb&lt;/span&gt;

&lt;span class="n"&gt;under&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scrapers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Understat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ENG Premier League&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2021-2022&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;shots&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;under&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_shots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;16376&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div style="overflow-x:auto;"&gt;
&lt;table class="table table-striped table-condensed"&gt;
&lt;thead&gt;
    &lt;tr style="text-align: right;"&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;competition&lt;/th&gt;
      &lt;th&gt;season&lt;/th&gt;
      &lt;th&gt;datetime&lt;/th&gt;
      &lt;th&gt;minute&lt;/th&gt;
      &lt;th&gt;result&lt;/th&gt;
      &lt;th&gt;x&lt;/th&gt;
      &lt;th&gt;y&lt;/th&gt;
      &lt;th&gt;x_g&lt;/th&gt;
      &lt;th&gt;player&lt;/th&gt;
      &lt;th&gt;h_a&lt;/th&gt;
      &lt;th&gt;player_id&lt;/th&gt;
      &lt;th&gt;situation&lt;/th&gt;
      &lt;th&gt;shot_type&lt;/th&gt;
      &lt;th&gt;match_id&lt;/th&gt;
      &lt;th&gt;team_home&lt;/th&gt;
      &lt;th&gt;team_away&lt;/th&gt;
      &lt;th&gt;goals_home&lt;/th&gt;
      &lt;th&gt;goals_away&lt;/th&gt;
      &lt;th&gt;date&lt;/th&gt;
      &lt;th&gt;player_assisted&lt;/th&gt;
      &lt;th&gt;last_action&lt;/th&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;id&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;1628812800---brentford---arsenal&lt;/th&gt;
      &lt;td&gt;ENG Premier League&lt;/td&gt;
      &lt;td&gt;2021-2022&lt;/td&gt;
      &lt;td&gt;2021-08-13 19:00:00&lt;/td&gt;
      &lt;td&gt;10&lt;/td&gt;
      &lt;td&gt;MissedShots&lt;/td&gt;
      &lt;td&gt;0.913&lt;/td&gt;
      &lt;td&gt;0.539&lt;/td&gt;
      &lt;td&gt;0.053&lt;/td&gt;
      &lt;td&gt;Frank Onyeka&lt;/td&gt;
      &lt;td&gt;h&lt;/td&gt;
      &lt;td&gt;9681&lt;/td&gt;
      &lt;td&gt;OpenPlay&lt;/td&gt;
      &lt;td&gt;Head&lt;/td&gt;
      &lt;td&gt;16376&lt;/td&gt;
      &lt;td&gt;Brentford&lt;/td&gt;
      &lt;td&gt;Arsenal&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;2021-08-13&lt;/td&gt;
      &lt;td&gt;None&lt;/td&gt;
      &lt;td&gt;Aerial&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1628812800---brentford---arsenal&lt;/th&gt;
      &lt;td&gt;ENG Premier League&lt;/td&gt;
      &lt;td&gt;2021-2022&lt;/td&gt;
      &lt;td&gt;2021-08-13 19:00:00&lt;/td&gt;
      &lt;td&gt;11&lt;/td&gt;
      &lt;td&gt;ShotOnPost&lt;/td&gt;
      &lt;td&gt;0.908&lt;/td&gt;
      &lt;td&gt;0.315&lt;/td&gt;
      &lt;td&gt;0.118&lt;/td&gt;
      &lt;td&gt;Bryan Mbeumo&lt;/td&gt;
      &lt;td&gt;h&lt;/td&gt;
      &lt;td&gt;6552&lt;/td&gt;
      &lt;td&gt;OpenPlay&lt;/td&gt;
      &lt;td&gt;RightFoot&lt;/td&gt;
      &lt;td&gt;16376&lt;/td&gt;
      &lt;td&gt;Brentford&lt;/td&gt;
      &lt;td&gt;Arsenal&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;2021-08-13&lt;/td&gt;
      &lt;td&gt;Ivan Toney&lt;/td&gt;
      &lt;td&gt;Throughball&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1628812800---brentford---arsenal&lt;/th&gt;
      &lt;td&gt;ENG Premier League&lt;/td&gt;
      &lt;td&gt;2021-2022&lt;/td&gt;
      &lt;td&gt;2021-08-13 19:00:00&lt;/td&gt;
      &lt;td&gt;21&lt;/td&gt;
      &lt;td&gt;Goal&lt;/td&gt;
      &lt;td&gt;0.874&lt;/td&gt;
      &lt;td&gt;0.698&lt;/td&gt;
      &lt;td&gt;0.052&lt;/td&gt;
      &lt;td&gt;Sergi Canos&lt;/td&gt;
      &lt;td&gt;h&lt;/td&gt;
      &lt;td&gt;1078&lt;/td&gt;
      &lt;td&gt;OpenPlay&lt;/td&gt;
      &lt;td&gt;RightFoot&lt;/td&gt;
      &lt;td&gt;16376&lt;/td&gt;
      &lt;td&gt;Brentford&lt;/td&gt;
      &lt;td&gt;Arsenal&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;2021-08-13&lt;/td&gt;
      &lt;td&gt;Ethan Pinnock&lt;/td&gt;
      &lt;td&gt;BallRecovery&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1628812800---brentford---arsenal&lt;/th&gt;
      &lt;td&gt;ENG Premier League&lt;/td&gt;
      &lt;td&gt;2021-2022&lt;/td&gt;
      &lt;td&gt;2021-08-13 19:00:00&lt;/td&gt;
      &lt;td&gt;27&lt;/td&gt;
      &lt;td&gt;MissedShots&lt;/td&gt;
      &lt;td&gt;0.812&lt;/td&gt;
      &lt;td&gt;0.478&lt;/td&gt;
      &lt;td&gt;0.066&lt;/td&gt;
      &lt;td&gt;Sergi Canos&lt;/td&gt;
      &lt;td&gt;h&lt;/td&gt;
      &lt;td&gt;1078&lt;/td&gt;
      &lt;td&gt;OpenPlay&lt;/td&gt;
      &lt;td&gt;RightFoot&lt;/td&gt;
      &lt;td&gt;16376&lt;/td&gt;
      &lt;td&gt;Brentford&lt;/td&gt;
      &lt;td&gt;Arsenal&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;2021-08-13&lt;/td&gt;
      &lt;td&gt;Frank Onyeka&lt;/td&gt;
      &lt;td&gt;Pass&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1628812800---brentford---arsenal&lt;/th&gt;
      &lt;td&gt;ENG Premier League&lt;/td&gt;
      &lt;td&gt;2021-2022&lt;/td&gt;
      &lt;td&gt;2021-08-13 19:00:00&lt;/td&gt;
      &lt;td&gt;29&lt;/td&gt;
      &lt;td&gt;MissedShots&lt;/td&gt;
      &lt;td&gt;0.892&lt;/td&gt;
      &lt;td&gt;0.357&lt;/td&gt;
      &lt;td&gt;0.081&lt;/td&gt;
      &lt;td&gt;Bryan Mbeumo&lt;/td&gt;
      &lt;td&gt;h&lt;/td&gt;
      &lt;td&gt;6552&lt;/td&gt;
      &lt;td&gt;OpenPlay&lt;/td&gt;
      &lt;td&gt;RightFoot&lt;/td&gt;
      &lt;td&gt;16376&lt;/td&gt;
      &lt;td&gt;Brentford&lt;/td&gt;
      &lt;td&gt;Arsenal&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;2021-08-13&lt;/td&gt;
      &lt;td&gt;Kristoffer Ajer&lt;/td&gt;
      &lt;td&gt;Chipped&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;

&lt;p&gt;Take a look at the &lt;a href="http://docs.pena.lt/y/"&gt;penaltyblog documentation&lt;/a&gt; for more details and examples of each scraper.&lt;/p&gt;
&lt;h2 id="whats-next"&gt;What's Next?&lt;/h2&gt;
&lt;p&gt;There are plenty of other websites to add to the scrapers, with FBref and WhoScored next up on the TODO list, but let me know if there are any others that should be included too.&lt;/p&gt;
&lt;p&gt;After that, it's back to the modelling to add more techniques for predicting football results and betting markets.&lt;/p&gt;
&lt;p&gt;Thanks for reading!&lt;/p&gt;</content><category term="Scraping"></category><category term="Scraping"></category><category term="Data"></category></entry><entry><title>Predicting Football Results Using Bayesian Modelling with Python and PyMC3</title><link href="2021/08/25/predicting-football-results-using-bayesian-statistics-with-python-and-pymc3/" rel="alternate"></link><published>2021-08-25T19:30:00+00:00</published><updated>2021-08-25T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2021-08-25:2021/08/25/predicting-football-results-using-bayesian-statistics-with-python-and-pymc3/</id><summary type="html">&lt;p&gt;This article looks at how to predict football results using a Bayesian hierarchical model built in Python and PyMC3...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;So far in this series of articles, we started off by building a &lt;a href="http://www.pena.lt/y/2021/06/18/predicting-football-results-using-the-poisson-distribution/"&gt;basic model for predicting football results&lt;/a&gt; using two independent Poisson variables, inspired by &lt;a href="http://www.90minut.pl/misc/maher.pdf"&gt;M. J. Maher&lt;/a&gt;. This simple model worked surprisingly well but had some issues correctly predicting low scores and draws due to the independence of the two Poissons.&lt;/p&gt;
&lt;p&gt;In the &lt;a href="http://pena.lt/y/2021/06/24/predicting-football-results-using-python-and-dixon-and-coles/"&gt;second part of the series&lt;/a&gt;, we used ideas from &lt;a href="http://web.math.ku.dk/~rolf/teaching/thesis/DixonColes.pdf"&gt;Dixon and Coles&lt;/a&gt; to add some dependence back into our model via &lt;code&gt;rho&lt;/code&gt;, which improved its performance on those tricky low scores and draws. We also added a decay weighting to the data so that older fixtures were considered less important than more recent ones when fitting the model.&lt;/p&gt;
&lt;p&gt;Next, we're going to implement &lt;a href="https://discovery.ucl.ac.uk/id/eprint/16040/1/16040.pdf"&gt;Baio and Blangiardo's&lt;/a&gt; Bayesian hierarchical model to see how well that performs compared with our models so far.&lt;/p&gt;
&lt;p&gt;If you've not read through those previous articles yet, I suggest starting there first as they will give you the background to what we're doing here.&lt;/p&gt;
&lt;p&gt;Let's get started!&lt;/p&gt;
&lt;h2 id="the-data"&gt;The Data&lt;/h2&gt;
&lt;p&gt;The aim of the model we're building is to predict the number of goals each football (soccer) team will score when they play each other. As always, let's start off by downloading some historical fixtures from &lt;a href="https://www.football-data.co.uk"&gt;football-data.co.uk&lt;/a&gt;. Baio and Blangiardo's publication uses the Italian Serie A but we're going to see how their model fares on the English Premier League.&lt;/p&gt;
&lt;p&gt;The code below uses the &lt;a href="https://pypi.org/project/penaltyblog/"&gt;penaltyblog Python package&lt;/a&gt; to download three seasons' worth of data into a pandas dataframe and renames the columns to match the notation used by Baio and Blangiardo. The &lt;code&gt;home_team&lt;/code&gt; and &lt;code&gt;away_team&lt;/code&gt; columns contain the team names, and &lt;code&gt;yg1&lt;/code&gt; and &lt;code&gt;yg2&lt;/code&gt; are the number of goals scored by the home team and away team, respectively. We've also kept the date column and the result of each fixture.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pb&lt;/span&gt;

&lt;span class="n"&gt;fixtures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;footballdata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fetch_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;england&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2018&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;footballdata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fetch_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;england&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2019&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;footballdata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fetch_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;england&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2020&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;HomeTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;home_team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;AwayTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;away_team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;FTHG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;yg1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;FTAG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;yg2&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;FTR&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;result&amp;quot;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;fixtures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;home_team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;away_team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;yg1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;yg2&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;result&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;

&lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;table class="table table-striped table-condensed"&gt;
&lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;Date&lt;/th&gt;
      &lt;th&gt;home_team&lt;/th&gt;
      &lt;th&gt;away_team&lt;/th&gt;
      &lt;th&gt;yg1&lt;/th&gt;
      &lt;th&gt;yg2&lt;/th&gt;
      &lt;th&gt;result&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;0&lt;/th&gt;
      &lt;td&gt;2018-08-10&lt;/td&gt;
      &lt;td&gt;Man United&lt;/td&gt;
      &lt;td&gt;Leicester&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;H&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;td&gt;2018-08-11&lt;/td&gt;
      &lt;td&gt;Bournemouth&lt;/td&gt;
      &lt;td&gt;Cardiff&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;H&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;2&lt;/th&gt;
      &lt;td&gt;2018-08-11&lt;/td&gt;
      &lt;td&gt;Fulham&lt;/td&gt;
      &lt;td&gt;Crystal Palace&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;A&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;3&lt;/th&gt;
      &lt;td&gt;2018-08-11&lt;/td&gt;
      &lt;td&gt;Huddersfield&lt;/td&gt;
      &lt;td&gt;Chelsea&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;3&lt;/td&gt;
      &lt;td&gt;A&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;4&lt;/th&gt;
      &lt;td&gt;2018-08-11&lt;/td&gt;
      &lt;td&gt;Newcastle&lt;/td&gt;
      &lt;td&gt;Tottenham&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;A&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;&lt;/table&gt;

&lt;p&gt;We need to give each team a unique ID, as this will make the data easier to handle later on.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;np&lt;/span&gt;

&lt;span class="n"&gt;n_teams&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home_team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unique&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="n"&gt;teams&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home_team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;drop_duplicates&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home_team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;team_index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_teams&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home_team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;left_on&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home_team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;right_on&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_index&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;hg&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;left_on&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;away_team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;right_on&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_index&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;ag&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;table class="table table-striped table-condensed"&gt;
&lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;Date&lt;/th&gt;
      &lt;th&gt;home_team&lt;/th&gt;
      &lt;th&gt;away_team&lt;/th&gt;
      &lt;th&gt;yg1&lt;/th&gt;
      &lt;th&gt;yg2&lt;/th&gt;
      &lt;th&gt;result&lt;/th&gt;
      &lt;th&gt;hg&lt;/th&gt;
      &lt;th&gt;ag&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;0&lt;/th&gt;
      &lt;td&gt;2018-08-10&lt;/td&gt;
      &lt;td&gt;Man United&lt;/td&gt;
      &lt;td&gt;Leicester&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;H&lt;/td&gt;
      &lt;td&gt;15&lt;/td&gt;
      &lt;td&gt;12&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;505&lt;/th&gt;
      &lt;td&gt;2018-08-11&lt;/td&gt;
      &lt;td&gt;Watford&lt;/td&gt;
      &lt;td&gt;Brighton&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;H&lt;/td&gt;
      &lt;td&gt;21&lt;/td&gt;
      &lt;td&gt;3&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;932&lt;/th&gt;
      &lt;td&gt;2018-08-11&lt;/td&gt;
      &lt;td&gt;Bournemouth&lt;/td&gt;
      &lt;td&gt;Cardiff&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;H&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;5&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;881&lt;/th&gt;
      &lt;td&gt;2018-08-11&lt;/td&gt;
      &lt;td&gt;Huddersfield&lt;/td&gt;
      &lt;td&gt;Chelsea&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;3&lt;/td&gt;
      &lt;td&gt;A&lt;/td&gt;
      &lt;td&gt;10&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;290&lt;/th&gt;
      &lt;td&gt;2018-08-11&lt;/td&gt;
      &lt;td&gt;Fulham&lt;/td&gt;
      &lt;td&gt;Crystal Palace&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;A&lt;/td&gt;
      &lt;td&gt;9&lt;/td&gt;
      &lt;td&gt;7&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The &lt;code&gt;hg&lt;/code&gt; column now contains the ID for the home team and the &lt;code&gt;ag&lt;/code&gt; column is the ID for the away team. So for example, Manchester United are team &lt;code&gt;15&lt;/code&gt; and Brighton are team &lt;code&gt;3&lt;/code&gt;.&lt;/p&gt;
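&lt;p&gt;The same ID assignment can be sketched on a toy dataframe with a plain dictionary lookup. This is an illustrative alternative rather than the code used in this article, and the three fixtures below are made up. It builds the lookup from the teams seen in either column, then maps both columns through it so a team gets the same ID whether it plays at home or away.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import pandas as pd

# Toy fixtures standing in for the real dataframe (made-up match-ups)
toy = pd.DataFrame(
    {
        'home_team': ['Man United', 'Brighton', 'Arsenal'],
        'away_team': ['Arsenal', 'Man United', 'Brighton'],
    }
)

# One alphabetical list of every team seen at home or away
team_names = sorted(set(toy['home_team']) | set(toy['away_team']))
team_index = {name: i for i, name in enumerate(team_names)}

# Map both columns through the same lookup so the IDs agree
toy['hg'] = toy['home_team'].map(team_index)
toy['ag'] = toy['away_team'].map(team_index)
print(toy)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Because the IDs run from 0 upwards with no gaps, they can later be used directly as positions in an array of per-team parameters.&lt;/p&gt;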
&lt;p&gt;Next, we split the data into &lt;code&gt;train&lt;/code&gt; and &lt;code&gt;test&lt;/code&gt; data sets. The &lt;code&gt;train&lt;/code&gt; data is used to fit the model, while we hold back the most recent 250 fixtures to &lt;code&gt;test&lt;/code&gt; the model's performance on. We also extract the team IDs and goals scored from the &lt;code&gt;train&lt;/code&gt; data into NumPy arrays as it makes them a little easier to work with later on.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;TEST_SIZE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;250&lt;/span&gt;
&lt;span class="n"&gt;train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;TEST_SIZE&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;TEST_SIZE&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;

&lt;span class="n"&gt;goals_home_obs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;yg1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
&lt;span class="n"&gt;goals_away_obs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;yg2&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
&lt;span class="n"&gt;home_team&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;hg&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
&lt;span class="n"&gt;away_team&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ag&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id="defining-the-model"&gt;Defining the Model&lt;/h2&gt;
&lt;p&gt;The model itself is the same as the previous Poisson models we've created. We are still assuming that the number of goals a team scores is a function of their attack strength combined with the opposition's defence strength. In other words, teams with better attacks should score more and teams with worse defences should concede more. We also include home field advantage.&lt;/p&gt;
&lt;p&gt;So the model still looks like this:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;goals_home = home_advantage + home_attack + defence_away&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;goals_away = away_attack + defence_home&lt;/code&gt;&lt;/p&gt;
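&lt;p&gt;As a rough numerical sketch of those two lines, the snippet below assumes the sums sit on the log scale, as in Baio and Blangiardo's paper, so they are exponentiated to give expected goals. The parameter values and the &lt;code&gt;expected_goals&lt;/code&gt; helper are invented purely for illustration and are not fitted ratings.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import numpy as np

# Made-up parameter values on the log scale (not fitted to any data)
home_advantage = 0.25
attack = {'Arsenal': 0.30, 'Brentford': -0.10}
defence = {'Arsenal': -0.20, 'Brentford': 0.05}


def expected_goals(home, away):
    # Add the effects on the log scale, then exponentiate so rates stay positive
    log_home = home_advantage + attack[home] + defence[away]
    log_away = attack[away] + defence[home]
    return np.exp(log_home), np.exp(log_away)


rate_home, rate_away = expected_goals('Brentford', 'Arsenal')
print(round(rate_home, 3), round(rate_away, 3))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Each rate is the mean of a Poisson distribution for that team's goals. In this sketch a negative defence value means the team concedes less, because it lowers the opposition's rate.&lt;/p&gt;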
&lt;p&gt;Where it changes this time though is how we implement it as we're going &lt;a href="https://en.wikipedia.org/wiki/Bayesian_statistics"&gt;Bayesian&lt;/a&gt;...&lt;/p&gt;
&lt;h2 id="bayesian-statistics"&gt;Bayesian Statistics&lt;/h2&gt;
&lt;p&gt;Bayesian statistics is a very deep subject area, which I certainly can't do justice to here, so if you really want to dig into the details then I wholeheartedly recommend reading &lt;a href="https://www.amazon.co.uk/Statistical-Rethinking-Bayesian-Examples-Chapman/dp/1482253445"&gt;Statistical Rethinking&lt;/a&gt; by Richard McElreath and &lt;a href="https://www.amazon.co.uk/Doing-Bayesian-Data-Analysis-Tutorial/dp/0124058884/ref=pd_sbs_5/260-7118632-8200329?pd_rd_w=ieuWf&amp;amp;pf_rd_p=a3a7088f-4aec-4dbd-97cc-9a059581fe7b&amp;amp;pf_rd_r=RP8TT2FABSCC9KEWVKSE&amp;amp;pd_rd_r=6a8eb32e-e73c-4176-9553-4436adf47fd0&amp;amp;pd_rd_wg=Zy58R&amp;amp;pd_rd_i=0124058884&amp;amp;psc=1"&gt;Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan&lt;/a&gt; by John Kruschke. I'm by no means an expert in the subject but I've learnt a huge amount from these two books alone.&lt;/p&gt;
&lt;p&gt;But at a very high level for us building our model (and apologies to any real Bayesians reading this), the biggest difference with a Bayesian approach is that we get a distribution of values for each model parameter rather than the single values we had previously.&lt;/p&gt;
&lt;p&gt;Bayesian modelling also lends itself well to using a technique known as &lt;a href="https://en.wikipedia.org/wiki/Bayesian_hierarchical_modeling"&gt;hierarchical modelling&lt;/a&gt; (also known as multilevel modelling or mixed-effects modelling). This allows us to account for relationships between variables by considering them to come from a common distribution.&lt;/p&gt;
&lt;p&gt;What that means for us is that we can avoid the problems the independent Poisson variables caused in our previous two models. By using a hierarchical model containing two conditionally independent Poisson variables, correlation between the two is naturally taken into account as our observable variables are mixed at an upper level in the model.&lt;/p&gt;
&lt;p&gt;In other words, we don't need to bother applying the Dixon and Coles adjustment to our model's output anymore. Hurrah!&lt;/p&gt;
&lt;h2 id="building-the-model"&gt;Building the Model&lt;/h2&gt;
&lt;p&gt;We're using the awesome &lt;a href="https://docs.pymc.io/"&gt;PyMC3&lt;/a&gt; library in Python to build our model, as shown below.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;home&lt;/code&gt; parameter refers to the advantage teams get from playing at home. This is a fixed effect that assumes the advantage is constant across all teams and across the entire time the dataset covers. We assign an uninformative flat prior to it, meaning that its value comes from a uniform distribution ranging anywhere from -infinity to +infinity.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;atts&lt;/code&gt; and &lt;code&gt;defs&lt;/code&gt; represent each team's attack and defence ratings. Although each team has its own unique ratings, they are estimated from a common distribution. Whilst each football team is clearly different to each other, they are all playing in the same league so we make the reasonable assumption that their ratings come from the same distribution. This allows us to pool related data together, which can often improve model estimates. Our prior here is that the ratings are normally distributed.&lt;/p&gt;
&lt;p&gt;It's worth pointing out that we are subtracting the mean ratings from both the attack and defence ratings to ensure identifiability. There are countless combinations of parameters that could potentially give us the same results so this forces the model to give us a reproducible output and keeps the parameters in a realistic range.&lt;/p&gt;
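&lt;p&gt;To see why the constraint is needed, here is a small NumPy sketch using made-up ratings (the variable names mirror the model code but the numbers are random, not fitted). Shifting every attack rating up and every defence rating down by the same amount gives identical goal expectancies, so without centring the sampler would have no way to choose between them:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
n_teams = 4

# Hypothetical raw (uncentred) ratings, standing in for atts_star and def_star
atts_star = rng.normal(0.0, 0.5, n_teams)
def_star = rng.normal(0.0, 0.5, n_teams)


def rates(atts, defs, home=0.3, home_team=0, away_team=1):
    """Goal expectancies for a single fixture between two team indices."""
    home_rate = np.exp(home + atts[home_team] + defs[away_team])
    away_rate = np.exp(atts[away_team] + defs[home_team])
    return home_rate, away_rate


# Shifting every attack up and every defence down by the same constant
# leaves the predicted rates unchanged, so the raw ratings are not unique.
shift = 5.0
original = rates(atts_star, def_star)
shifted = rates(atts_star + shift, def_star - shift)
print(np.allclose(original, shifted))

# Centring pins the ratings to a mean of zero, so the shifted and
# unshifted attack ratings collapse to the same centred values.
atts = atts_star - atts_star.mean()
atts_from_shifted = (atts_star + shift) - (atts_star + shift).mean()
print(np.allclose(atts, atts_from_shifted))
```

&lt;p&gt;Both checks should print &lt;code&gt;True&lt;/code&gt;: the shift is invisible to the likelihood, and centring removes it.&lt;/p&gt;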
&lt;p&gt;The final chunk of code, where we calculate &lt;code&gt;home_theta&lt;/code&gt; and &lt;code&gt;away_theta&lt;/code&gt;, looks very similar to our previous models. We just sum up the ratings and home advantage to calculate the goal expectancy for each team and apply that to a Poisson distribution.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pymc3&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pm&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;theano.tensor&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;tt&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;pm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Model&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# home advantage&lt;/span&gt;
    &lt;span class="n"&gt;home&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Flat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# attack ratings&lt;/span&gt;
    &lt;span class="n"&gt;tau_att&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Gamma&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;tau_att&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;atts_star&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Normal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;atts_star&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tau&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tau_att&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;n_teams&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# defence ratings&lt;/span&gt;
    &lt;span class="n"&gt;tau_def&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Gamma&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;tau_def&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;def_star&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Normal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;def_star&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tau&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tau_def&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;n_teams&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# apply sum zero constraints&lt;/span&gt;
    &lt;span class="n"&gt;atts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Deterministic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;atts&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;atts_star&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;tt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;atts_star&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;defs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Deterministic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;defs&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;def_star&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;tt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;def_star&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# calulate theta&lt;/span&gt;
    &lt;span class="n"&gt;home_theta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;home&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;atts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;home_team&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;defs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;away_team&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;away_theta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;atts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;away_team&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;defs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;home_team&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# goal expectation&lt;/span&gt;
    &lt;span class="n"&gt;home_points&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Poisson&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home_goals&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;home_theta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;observed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;goals_home_obs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;away_points&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Poisson&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;away_goals&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;away_theta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;observed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;goals_away_obs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id="fitting-the-model"&gt;Fitting the Model&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Drum roll...&lt;/strong&gt; now we fit the model by calling PyMC3's &lt;code&gt;sample&lt;/code&gt; function. We're drawing 2000 samples from each of six chains (one per core), which gives us 12,000 samples in total to play with. We've also set &lt;code&gt;tune&lt;/code&gt; to 1000, which means each chain first runs 1000 tuning steps that are then thrown away. The sampler uses these to adapt itself, and such early draws may not yet be representative of the posterior.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;trace&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sample&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tune&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cores&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_inferencedata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Let's check how well the model has done by looking at its fitted parameters. We'll start off with the home advantage.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;pm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot_trace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;var_names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20210825_pymc3_home_advantage.png"&gt;&lt;/p&gt;
&lt;p&gt;As I mentioned above, with Bayesian models we get a distribution of possible parameter values rather than a single point estimate. Reassuringly, we've got a smooth curve here centred somewhere around 0.3, which is pretty close to the home advantage we saw for our Poisson models. Our Dixon and Coles model in the last article gave us a value of 0.294.&lt;/p&gt;
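&lt;p&gt;Because the home advantage is added inside an exponential, it acts as a multiplier on expected goals. A quick worked example, taking 0.3 as an approximate value read off the plot rather than an exact posterior summary:&lt;/p&gt;

```python
import math

home_advantage = 0.3  # approximate centre of the posterior, read off the plot

# On the log scale an additive effect becomes a multiplier on expected goals
multiplier = math.exp(home_advantage)
print(f"Home teams are expected to score about {multiplier - 1:.0%} more goals")
```

&lt;p&gt;So a home advantage of around 0.3 corresponds to roughly a third more goals than the same team would be expected to score without it.&lt;/p&gt;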
&lt;p&gt;We can also take a look at the attack and defence ratings.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;pm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot_trace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;var_names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;atts&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;defs&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20210825_pymc3_att_def_trace.png"&gt;&lt;/p&gt;
&lt;p&gt;Again, we've got nice-looking curves, which is reassuring, but the plot is a little busy with 20 different attack and defence curves so let's replot the data a little differently.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;arviz&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;az&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;plt&lt;/span&gt;

&lt;span class="n"&gt;atts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;az&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hdi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;atts&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;lower_hdi&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;upper_hdi&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;median&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quantile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;atts&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;left_index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;right_on&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_index&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_index&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lower&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;median&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;lower_hdi&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;upper&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;upper_hdi&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;median&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;median&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ascending&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;errorbar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;atts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;median&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;atts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;xerr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;atts&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;lower&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;upper&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;o&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20210825_pymc3_atts.png"&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;defs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;az&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hdi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;defs&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;lower_hdi&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;upper_hdi&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;median&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quantile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;defs&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;left_index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;right_on&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_index&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team_index&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lower&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;median&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;lower_hdi&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;upper&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;upper_hdi&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;median&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;median&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ascending&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;errorbar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;defs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;median&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;defs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;xerr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;defs&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;lower&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;upper&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;o&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20210825_pymc3_defs.png"&gt;&lt;/p&gt;
&lt;p&gt;These last two plots show us the median value for each team's attack and defence rating (the dot in the middle), plus the highest density interval or HDI (the horizontal line). You can think of the HDI as being the Bayesian counterpart to a confidence interval; note that ArviZ's &lt;code&gt;hdi&lt;/code&gt; function returns a 94% interval by default rather than 95%.&lt;/p&gt;
&lt;p&gt;Again, the results pass the eye test - Manchester City and Liverpool have the highest attack rating (meaning the model thinks they score the most goals) and the lowest defence rating (meaning they concede the fewest goals). Plus, we have Huddersfield and Norwich being terrible at both ends of the pitch.&lt;/p&gt;
&lt;p&gt;The width of the HDI represents the model's uncertainty for the parameter, where the wider the bar the more uncertain we are of the parameter's true value. Typically, it tends to be wider here for teams that have been promoted / relegated as we have less data available for them in the data set.&lt;/p&gt;
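&lt;p&gt;If you're curious what an HDI calculation involves, the sketch below finds the narrowest interval that contains a given share of the samples. It is a simplified illustration for a single-peaked distribution, not ArviZ's actual implementation, and the draws are synthetic stand-ins for one team's rating. The default of 0.94 is chosen to match ArviZ's default.&lt;/p&gt;

```python
import numpy as np


def hdi(samples, prob=0.94):
    """Narrowest interval containing `prob` of the samples (unimodal case)."""
    ordered = np.sort(np.asarray(samples, dtype=float))
    n = ordered.size
    n_inside = int(np.ceil(prob * n))  # draws each candidate interval must hold
    n_candidates = n - n_inside + 1
    # Width of every window of n_inside consecutive sorted draws
    widths = ordered[n_inside - 1:] - ordered[:n_candidates]
    start = int(np.argmin(widths))
    return ordered[start], ordered[start + n_inside - 1]


# Synthetic draws standing in for one team's posterior attack rating
rng = np.random.default_rng(42)
draws = rng.normal(loc=0.2, scale=0.1, size=12_000)
lower, upper = hdi(draws)
print(lower, upper)
```

&lt;p&gt;For a symmetric distribution like this one the HDI is close to the usual quantile-based interval; the two differ more when the posterior is skewed.&lt;/p&gt;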
&lt;h2 id="predicting-results"&gt;Predicting Results&lt;/h2&gt;
&lt;p&gt;So far, so good. We've fit our model and its parameters look feasible so the next stage is to predict some results.&lt;/p&gt;
&lt;p&gt;Things are slightly different with this model though as we don't have a specific value for each of our parameters. Instead, we have a distribution of values - in fact, we have 12,000 values for each parameter (remember that's how many samples we got back when we fit the model).&lt;/p&gt;
&lt;p&gt;To simplify things though we'll just collapse our distributions down to a single value by taking the average of each parameter, which we can then use to calculate the goal expectation in a similar way to how we did it for our previous Poisson models.&lt;/p&gt;
&lt;p&gt;This makes me sad though as one of the major advantages of Bayesian models is having these distributions so it seems a shame to throw them away and just keep their average. However, I seem to get the same results doing this compared with other approaches I've tried so let's go with it.&lt;/p&gt;
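&lt;p&gt;For comparison, here is a rough sketch of what keeping the distributions could look like: compute the goal expectancy for every posterior draw and only summarise at the end. The draws below are synthetic and purely illustrative (they are not taken from the fitted trace), and the variable names are my own.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 12_000

# Synthetic posterior draws for one fixture (hypothetical, not from the fitted trace)
home = rng.normal(0.30, 0.05, n_draws)
atts_home = rng.normal(0.40, 0.10, n_draws)
defs_away = rng.normal(0.10, 0.10, n_draws)

# Approach used in this article: collapse each parameter to its mean first
rate_from_means = np.exp(home.mean() + atts_home.mean() + defs_away.mean())

# Alternative: compute the rate for every draw, then summarise
rate_per_draw = np.exp(home + atts_home + defs_away)
rate_mean = rate_per_draw.mean()
rate_low, rate_high = np.quantile(rate_per_draw, [0.03, 0.97])

print(rate_from_means, rate_mean, rate_low, rate_high)
```

&lt;p&gt;In this synthetic setup the two point estimates differ only slightly, which fits with averaging the parameters giving similar results, but the per-draw version also gives a range for the goal expectancy.&lt;/p&gt;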
&lt;p&gt;The chunk of code below shows how to do this. The first few lines just pull out the average parameters from the model's output for a given home and away team. We then use these values to calculate the goal expectation. If you've read the previous two articles in this series then there's nothing particularly unexpected here.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;goal_expectation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;home_team_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;away_team_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;   
    &lt;span class="c1"&gt;# get parameters&lt;/span&gt;
    &lt;span class="n"&gt;home&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;atts_home&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;home_team_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;atts&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;
    &lt;span class="n"&gt;atts_away&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;away_team_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;atts&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;
    &lt;span class="n"&gt;defs_home&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;home_team_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;defs&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;
    &lt;span class="n"&gt;defs_away&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;away_team_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;defs&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;

    &lt;span class="c1"&gt;# calculate theta&lt;/span&gt;
    &lt;span class="n"&gt;home_theta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;home&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;atts_home&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;defs_away&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;away_theta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;atts_away&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;defs_home&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# return the average per team&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;home_theta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;away_theta&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We then just need to pass in the model's output and the relevant team IDs to get our predicted number of goals. Let's give it a try for Manchester City vs Manchester United.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;goal_expectation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;2.40817399509211&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.9015500181975004&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;As a Manchester City fan, this makes me happy as it has City winning by 2.4 goals to 0.9 goals :)&lt;/p&gt;
&lt;p&gt;Let's flip it around and have Manchester United as the home team.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;goal_expectation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1.2138865328696127&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.7885438632930635&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Nice, even without the home advantage Manchester City are still expected to outscore United, by roughly 1.8 goals to 1.2.&lt;/p&gt;
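&lt;p&gt;One thing to note is that &lt;code&gt;goal_expectation&lt;/code&gt; averages each parameter first and only then exponentiates, so it returns a single point estimate. Since the trace holds a full set of posterior draws, a possible alternative is to compute the expectation for every draw and summarise afterwards, which also gives a rough credible interval. The sketch below assumes the attack and defence entries of the trace can be treated as arrays of shape (draws, teams); the function name &lt;code&gt;goal_expectation_draws&lt;/code&gt; and the synthetic trace are made up for illustration and are not part of the original code.&lt;/p&gt;

```python
import numpy as np


def goal_expectation_draws(trace, home_team_id, away_team_id):
    """Return expected goals for every posterior draw (hypothetical helper)."""
    home = np.asarray(trace["home"])  # assumed shape: (draws,)
    atts = np.asarray(trace["atts"])  # assumed shape: (draws, teams)
    defs = np.asarray(trace["defs"])  # assumed shape: (draws, teams)

    # exponentiate draw by draw instead of averaging the parameters first
    home_theta = np.exp(home + atts[:, home_team_id] + defs[:, away_team_id])
    away_theta = np.exp(atts[:, away_team_id] + defs[:, home_team_id])
    return home_theta, away_theta


# synthetic stand-in for a fitted trace: 500 draws, 3 teams, made-up numbers
rng = np.random.default_rng(0)
fake_trace = {
    "home": rng.normal(0.3, 0.05, size=500),
    "atts": rng.normal(0.0, 0.1, size=(500, 3)),
    "defs": rng.normal(0.0, 0.1, size=(500, 3)),
}

home_draws, away_draws = goal_expectation_draws(fake_trace, 0, 1)

# summarise the draws: posterior mean plus a 95% interval for the home side
home_low, home_high = np.percentile(home_draws, [2.5, 97.5])
print(home_draws.mean(), home_low, home_high)
```

&lt;p&gt;Because the exponential function is convex, the mean of the per-draw expectations will be at least as large as the point estimate built from averaged parameters, although the gap is usually small.&lt;/p&gt;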
&lt;h2 id="is-our-bayesian-model-any-good"&gt;Is Our Bayesian Model Any Good?&lt;/h2&gt;
&lt;p&gt;Now it's time to assess whether these predictions are actually any good or not. We'll do this by predicting the probability of home win / draw / away win for each fixture in our test data set and compare it to what actually happened using &lt;a href="http://constantinou.info/downloads/papers/solvingtheproblem.pdf"&gt;Rank Probability Scores&lt;/a&gt; to measure the error. I explained why we use Rank Probability Scores in the last article so take a look &lt;a href="http://pena.lt/y/2021/06/24/predicting-football-results-using-python-and-dixon-and-coles/"&gt;here&lt;/a&gt; if you want to know more.&lt;/p&gt;
&lt;p&gt;To do this, we're going to need to convert our predicted score lines to win / draw / loss probabilities. This is exactly the same as how we did it in the previous articles. We just use the Poisson distribution to get the probability of scoring 0-10 goals for each team, then take the outer product of the two to get the probabilities for each possible scoreline.&lt;/p&gt;
&lt;p&gt;In the resulting matrix, rows index the home team's goals and columns index the away team's goals. The sum of the upper triangle is therefore the probability of an away win, the sum of the diagonal is the probability of a draw and the sum of the lower triangle is the probability of a home win.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;scipy.stats&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;poisson&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;win_draw_loss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;home_expectation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;away_expectation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_goals&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;poisson&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pmf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_goals&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;home_expectation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;poisson&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pmf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_goals&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;away_expectation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    
    &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;outer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;home&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tril&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;away&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;triu&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;draw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;diag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;home&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;draw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;away&lt;/span&gt;   
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
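&lt;p&gt;As a quick sanity check on the triangle logic, here is a minimal standalone sketch that rebuilds the score matrix for the Manchester City vs Manchester United expectations from earlier (rounded to 2.41 and 0.90). It uses a small hand-written &lt;code&gt;poisson_pmf&lt;/code&gt; helper, which is my own stand-in so the snippet does not depend on SciPy, and confirms that the three outcome probabilities add up to almost exactly one.&lt;/p&gt;

```python
import math

import numpy as np


def poisson_pmf(k, mu):
    """Probability of exactly k goals given an expectation of mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)


max_goals = 10
home_exp, away_exp = 2.41, 0.90  # rounded from the output shown earlier

h = np.array([poisson_pmf(k, home_exp) for k in range(max_goals + 1)])
a = np.array([poisson_pmf(k, away_exp) for k in range(max_goals + 1)])

# m[i, j] = P(home scores i) * P(away scores j)
m = np.outer(h, a)

home_win = np.tril(m, -1).sum()  # below the diagonal: home goals exceed away goals
draw = np.trace(m)               # diagonal: equal scores
away_win = np.triu(m, 1).sum()   # above the diagonal: away goals exceed home goals

total = home_win + draw + away_win
print(home_win, draw, away_win, total)
```

&lt;p&gt;The total falls fractionally short of one because score lines with more than ten goals for either team are cut off, which is negligible for realistic goal expectations.&lt;/p&gt;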

&lt;p&gt;And here's the function to calculate the Rank Probability Score of our model on our test data. It's pretty simple: we loop through the fixtures and work out whether the actual result was a home win, draw or away win. We then get the probabilities for each outcome from our model and calculate the Rank Probability Score for our predictions versus what actually happened. Finally, we return the average error.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_rps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;rps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iterrows&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;result&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;H&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;outcome&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;result&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;D&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;outcome&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;result&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;A&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;outcome&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;

        &lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;goal_expectation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;hg&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ag&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;win_draw_loss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;rps&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;outcome&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
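&lt;p&gt;If you're curious what the score is doing under the hood, the sketch below is a minimal hand-rolled version based on the usual definition: compare the cumulative predicted probabilities with the cumulative observed outcome and average the squared differences. The function name &lt;code&gt;ranked_probability_score&lt;/code&gt; is hypothetical, and I'm not claiming this matches the &lt;code&gt;penaltyblog&lt;/code&gt; implementation line for line.&lt;/p&gt;

```python
import numpy as np


def ranked_probability_score(probs, outcome):
    """RPS for one fixture; outcome is the index of the observed result."""
    probs = np.asarray(probs, dtype=float)

    # one-hot vector for what actually happened (0 = home, 1 = draw, 2 = away)
    observed = np.zeros_like(probs)
    observed[outcome] = 1.0

    # difference between cumulative predictions and cumulative outcome
    cum_diff = np.cumsum(probs) - np.cumsum(observed)

    # the final cumulative term is always zero, so it is left out
    return np.sum(cum_diff[:-1] ** 2) / (len(probs) - 1)


perfect = ranked_probability_score([1.0, 0.0, 0.0], 0)
worst = ranked_probability_score([0.0, 0.0, 1.0], 0)
uniform = ranked_probability_score([1 / 3, 1 / 3, 1 / 3], 1)
print(perfect, worst, uniform)
```

&lt;p&gt;A perfect prediction scores zero and lower is better, which is why we treat the averaged score as an error.&lt;/p&gt;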

&lt;p&gt;Let's give it a go and see what happens.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;calculate_rps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="mf"&gt;0.21976530250997778&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We've got our average error but is this any good or not? Let's run our Dixon and Coles model over the same data and compare the errors to see which performs better.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_rps_dc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;rps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iterrows&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;result&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;H&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;outcome&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;result&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;D&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;outcome&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;result&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;A&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;outcome&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;

        &lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home_team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;away_team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;home_draw_away&lt;/span&gt;

        &lt;span class="n"&gt;rps&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;outcome&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;weight&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;poisson&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dixon_coles_weights&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  &lt;span class="mf"&gt;0.001&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;dc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;poisson&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DixonColesGoalModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;yg1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;yg2&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home_team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;away_team&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;weight&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;dc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;calculate_rps_dc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="mf"&gt;0.21589324705157467&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Oh dear, the average error for our Bayesian model is actually slightly higher than for our Dixon and Coles model for this particular set of data :(&lt;/p&gt;
&lt;p&gt;Why is this?&lt;/p&gt;
&lt;p&gt;The first reason is that our Dixon and Coles model has a decay weighting built into it. This means that it considers more recent data to be more important than older data when estimating the model's parameters. Currently, our Bayesian model lacks this, so it considers all data to be equally important.&lt;/p&gt;
&lt;p&gt;We know that teams' strengths are dynamic and vary over time due to form, injuries, transfers, etc., so this is definitely a limitation of our current Bayesian approach.&lt;/p&gt;
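&lt;p&gt;To give a feel for what a decay weighting looks like, here is a minimal sketch of an exponential time decay, where each fixture's weight shrinks with its age in days. The helper name &lt;code&gt;decay_weights&lt;/code&gt; is hypothetical, and measuring age relative to the most recent fixture in the data is an assumption on my part, so treat it as an illustration rather than the exact behaviour of &lt;code&gt;dixon_coles_weights&lt;/code&gt;.&lt;/p&gt;

```python
import numpy as np


def decay_weights(dates, xi=0.001):
    """Exponential time-decay weights (hypothetical helper)."""
    dates = np.asarray(dates, dtype="datetime64[D]")

    # age of each fixture in days, measured from the most recent one
    days_ago = (dates.max() - dates).astype("int64")

    # the newest fixture gets weight 1, older fixtures decay towards 0
    return np.exp(-xi * days_ago)


weights = decay_weights(["2017-08-11", "2018-02-11", "2018-05-13"])
print(weights)
```

&lt;p&gt;A larger &lt;code&gt;xi&lt;/code&gt; makes the model forget old fixtures more quickly, while &lt;code&gt;xi = 0&lt;/code&gt; weights every fixture equally, which is effectively what our Bayesian model is doing at the moment.&lt;/p&gt;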
&lt;h2 id="overshrinkage"&gt;Overshrinkage&lt;/h2&gt;
&lt;p&gt;The second reason is something called overshrinkage. This is a well-known problem associated with hierarchical models where more extreme values get pulled back towards the overall average of the data.&lt;/p&gt;
&lt;p&gt;For us, it means that the best teams, such as Liverpool and Manchester City, will not appear as strong as they should be whilst the really bad teams, such as Huddersfield, will appear better than they should. Essentially, our parameter estimates get regressed towards the mean, which reduces the model's performance.&lt;/p&gt;
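&lt;p&gt;The pull towards the mean is easiest to see in a toy example. The sketch below uses a simple normal model with known variances, not the Poisson model from this article, and every number in it is made up. In that simplified setting, the pooled estimate is a weighted average of the team's own average and the league average, and the weight on the team's own data grows with the number of games played.&lt;/p&gt;

```python
def shrunk_estimate(team_mean, league_mean, n_games, noise_var, prior_var):
    """Partial-pooling estimate under a toy normal model (illustrative only)."""
    # weight on the team's own data: more games or a wider prior means less shrinkage
    weight = prior_var / (prior_var + noise_var / n_games)
    return weight * team_mean + (1 - weight) * league_mean


league_mean = 1.4  # made-up league average goals per game

# a strong and a weak team, each observed over a 38-game season
strong = shrunk_estimate(2.8, league_mean, 38, noise_var=1.5, prior_var=0.05)
weak = shrunk_estimate(0.7, league_mean, 38, noise_var=1.5, prior_var=0.05)
print(strong, weak)
```

&lt;p&gt;Both estimates move towards the league average, and the tighter the prior variance shared across teams, the harder the extremes get squeezed.&lt;/p&gt;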
&lt;p&gt;Baio and Blangiardo discuss this issue in their publication and propose a solution whereby they restructure their model to put teams into one of three different groups: top teams, mid-table teams or bottom teams. The model then uses this grouping within the hierarchical model so that parameters are estimated from a distribution of similar quality teams to try and reduce the overshrinkage.&lt;/p&gt;
&lt;p&gt;We've covered a lot of ground in this article already though so I'm going to save digging any further into the overshrinkage for a future blog post.&lt;/p&gt;
&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;We've built a Bayesian model using Python and PyMC3 that's capable of predicting football results. Its performance is pretty close to our previous Dixon and Coles model despite being a simpler model in many ways - it doesn't have a decay weighting included yet and it doesn't require the &lt;code&gt;rho&lt;/code&gt; adjustment either, plus we've not accounted for the overshrinkage issue - so we've got a pretty good base model here to build on.&lt;/p&gt;
&lt;p&gt;Thanks for reading!&lt;/p&gt;</content><category term="Prediction"></category><category term="Bayesian"></category><category term="Python"></category><category term="PyMC3"></category><category term="Prediction"></category><category term="Dixon and Coles"></category><category term="Bivariate Poisson"></category><category term="Weibull Count Copula"></category></entry><entry><title>Predicting Football Results Using Python and the Dixon and Coles Model</title><link href="2021/06/24/predicting-football-results-using-python-and-dixon-and-coles/" rel="alternate"></link><published>2021-06-24T19:30:00+00:00</published><updated>2021-06-24T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2021-06-24:2021/06/24/predicting-football-results-using-python-and-dixon-and-coles/</id><summary type="html">&lt;p&gt;Building on the last article, we upgrade our Poisson model with the Dixon and Coles adjustment and time decay...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In the &lt;a href="http://www.pena.lt/y/2021/06/18/predicting-football-results-using-the-poisson-distribution/"&gt;last article&lt;/a&gt;, we built a model based on the Poisson distribution using Python that could predict the results of football (soccer) matches. Whilst the model worked fairly well, it struggled to predict some of the lower score lines, such as 0-0, 1-0 and 0-1.&lt;/p&gt;
&lt;p&gt;In this article we'll look at how &lt;a href="http://web.math.ku.dk/~rolf/teaching/thesis/DixonColes.pdf"&gt;Dixon and Coles&lt;/a&gt; added in an adjustment factor to improve the model's performance.&lt;/p&gt;
&lt;p&gt;All the code here builds on my &lt;a href="http://www.pena.lt/y/2021/06/18/predicting-football-results-using-the-poisson-distribution/"&gt;last article&lt;/a&gt; so if you've not read that yet then I highly recommend going back and reading that one first.&lt;/p&gt;
&lt;p&gt;Okay, let's get coding!&lt;/p&gt;
&lt;h2 id="the-data"&gt;The Data&lt;/h2&gt;
&lt;p&gt;Just like last time, our model will work by predicting the number of goals scored / conceded by each team when they play each other based on their historical performances. Let's grab the same data as last time from &lt;a href="https://www.football-data.co.uk"&gt;football-data.co.uk&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pd&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;https://www.football-data.co.uk/mmz4281/1718/E0.csv&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;HomeTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;AwayTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;FTHG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;FTAG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;table class="table table-striped table-condensed"&gt;
&lt;thead&gt;
  &lt;tr&gt;
    &lt;th title="Field #1"&gt;&lt;/th&gt;
    &lt;th title="Field #2"&gt;Date&lt;/th&gt;
    &lt;th title="Field #3"&gt;HomeTeam&lt;/th&gt;
    &lt;th title="Field #4"&gt;AwayTeam&lt;/th&gt;
    &lt;th title="Field #5"&gt;FTHG&lt;/th&gt;
    &lt;th title="Field #6"&gt;FTAG&lt;/th&gt;
  &lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
  &lt;td&gt;0&lt;/td&gt;
  &lt;td&gt;11/08/2017&lt;/td&gt;
  &lt;td &gt;Arsenal&lt;/td&gt;
  &lt;td &gt;Leicester&lt;/td&gt;
  &lt;td &gt;4&lt;/td&gt;
  &lt;td &gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;1&lt;/td&gt;
  &lt;td&gt;12/08/2017&lt;/td&gt;
  &lt;td &gt;Brighton&lt;/td&gt;
  &lt;td &gt;Man City&lt;/td&gt;
  &lt;td &gt;0&lt;/td&gt;
  &lt;td &gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;2&lt;/td&gt;
  &lt;td&gt;12/08/2017&lt;/td&gt;
  &lt;td &gt;Chelsea&lt;/td&gt;
  &lt;td &gt;Burnley&lt;/td&gt;
  &lt;td &gt;2&lt;/td&gt;
  &lt;td &gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;3&lt;/td&gt;
  &lt;td&gt;12/08/2017&lt;/td&gt;
  &lt;td &gt;Crystal Palace&lt;/td&gt;
  &lt;td &gt;Huddersfield&lt;/td&gt;
  &lt;td &gt;0&lt;/td&gt;
  &lt;td &gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;4&lt;/td&gt;
  &lt;td&gt;12/08/2017&lt;/td&gt;
  &lt;td &gt;Everton&lt;/td&gt;
  &lt;td &gt;Stoke&lt;/td&gt;
  &lt;td &gt;1&lt;/td&gt;
  &lt;td &gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;

&lt;h2 id="building-the-model"&gt;Building the model&lt;/h2&gt;
&lt;p&gt;The model itself doesn't change from last time. We are still assuming that the number of goals a team scores is a function of their attack strength combined with the opposition's defence strength. In other words, teams with better attacks should score more and teams with worse defences should concede more. We also include home field advantage.&lt;/p&gt;
&lt;p&gt;So the model still looks like this:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;goals_home = home_advantage + home_attack + defence_away&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;goals_away = away_attack + defence_home&lt;/code&gt;&lt;/p&gt;
&lt;h2 id="finding-the-optimal-parameters"&gt;Finding the Optimal parameters&lt;/h2&gt;
&lt;p&gt;Now we finally get to the part where things start to change. Dixon and Coles recognised that the Poisson model was struggling with the 0-0, 1-0, 0-1 and 1-1 score lines and proposed adding an extra parameter (known as $\rho$ or &lt;em&gt;rho&lt;/em&gt;) to the model that introduces dependence between these scores. This is done in a way that ensures the marginal distributions remain Poisson.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;rho_correction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;goals_home&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;goals_away&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;home_exp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;away_exp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rho&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;goals_home&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;goals_away&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;home_exp&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;away_exp&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;rho&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;goals_home&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;goals_away&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;home_exp&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;rho&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;goals_home&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;goals_away&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;away_exp&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;rho&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;goals_home&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;goals_away&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;rho&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
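&lt;p&gt;To get a feel for what this function returns, here is a minimal sketch that evaluates it for the four low-scoring results. The goal expectations of &lt;code&gt;1.5&lt;/code&gt; and &lt;code&gt;1.0&lt;/code&gt; and the &lt;em&gt;rho&lt;/em&gt; of &lt;code&gt;-0.1&lt;/code&gt; are made-up illustrative values rather than fitted parameters, and &lt;code&gt;poisson_pmf&lt;/code&gt; is a small helper written just for this example so it runs without any other dependencies.&lt;/p&gt;

```python
import math


def rho_correction(goals_home, goals_away, home_exp, away_exp, rho):
    # Same piecewise adjustment as above, repeated so the snippet runs on its own
    if goals_home == 0 and goals_away == 0:
        return 1 - (home_exp * away_exp * rho)
    elif goals_home == 0 and goals_away == 1:
        return 1 + (home_exp * rho)
    elif goals_home == 1 and goals_away == 0:
        return 1 + (away_exp * rho)
    elif goals_home == 1 and goals_away == 1:
        return 1 - rho
    return 1.0


def poisson_pmf(k, mu):
    # Hypothetical helper: plain Poisson probability of k goals given expectation mu
    return math.exp(-mu) * mu**k / math.factorial(k)


# Illustrative values only, not fitted parameters
home_exp, away_exp, rho = 1.5, 1.0, -0.1

# Multiplier applied to each of the four low-scoring results
factors = {
    (h, a): rho_correction(h, a, home_exp, away_exp, rho)
    for h in range(2)
    for a in range(2)
}

# Total probability of those four scores before and after the adjustment
before = sum(poisson_pmf(h, home_exp) * poisson_pmf(a, away_exp) for h, a in factors)
after = sum(
    poisson_pmf(h, home_exp) * poisson_pmf(a, away_exp) * factor
    for (h, a), factor in factors.items()
)

for score, factor in factors.items():
    print(score, round(factor, 3))
print(round(before, 6), round(after, 6))
```

&lt;p&gt;With a negative &lt;em&gt;rho&lt;/em&gt;, the 0-0 and 1-1 draws are scaled up and the 1-0 and 0-1 results are scaled down, while the combined probability of the four scores stays the same. The adjustment only moves probability between those four results.&lt;/p&gt;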

&lt;p&gt;Next, we need to tweak our &lt;code&gt;log_likelihood&lt;/code&gt; function from the last article to calculate the new &lt;code&gt;rho_correction&lt;/code&gt; and add its log on to our log-likelihood.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;log_likelihood&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;goals_home_observed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;goals_away_observed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;home_attack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;home_defence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;away_attack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;away_defence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;home_advantage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;rho&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;goal_expectation_home&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;home_attack&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;away_defence&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;home_advantage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;goal_expectation_away&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;away_attack&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;home_defence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;home_llk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;poisson&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pmf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;goals_home_observed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;goal_expectation_home&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;away_llk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;poisson&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pmf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;goals_away_observed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;goal_expectation_away&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;adj_llk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rho_correction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;goals_home_observed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;goals_away_observed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;goal_expectation_home&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;goal_expectation_away&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;rho&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;goal_expectation_home&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;goal_expectation_away&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;adj_llk&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;

    &lt;span class="n"&gt;log_llk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;home_llk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;away_llk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;adj_llk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;log_llk&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We also need to make some minor updates to the code we wrapped around &lt;code&gt;scipy&lt;/code&gt;'s optimizer to add in &lt;em&gt;rho&lt;/em&gt;. Compared with the version from the last article, we've added an extra value to &lt;code&gt;params&lt;/code&gt; for &lt;em&gt;rho&lt;/em&gt; and updated &lt;code&gt;_fit&lt;/code&gt; to pass &lt;em&gt;rho&lt;/em&gt; through to the &lt;code&gt;log_likelihood&lt;/code&gt; function. We're also printing out &lt;code&gt;res["fun"]&lt;/code&gt;, which is the value of the negative log-likelihood that our optimised set of parameters gives us.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;pprint&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pprint&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;scipy.optimize&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;minimize&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;scipy.stats&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;poisson&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fit_poisson_model&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;teams&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unique&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concatenate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;HomeTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;AwayTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]])))&lt;/span&gt;
    &lt;span class="n"&gt;n_teams&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concatenate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_teams&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;  &lt;span class="c1"&gt;# attack strength&lt;/span&gt;
            &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_teams&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;  &lt;span class="c1"&gt;# defence strength&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  &lt;span class="c1"&gt;# home advantage&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="c1"&gt;# rho&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;attack_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;n_teams&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
        &lt;span class="n"&gt;defence_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;n_teams&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;n_teams&lt;/span&gt;&lt;span class="p"&gt;)]))&lt;/span&gt;
        &lt;span class="n"&gt;home_advantage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;rho&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="n"&gt;llk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iterrows&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;tmp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;log_likelihood&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;FTHG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;FTAG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;attack_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;HomeTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
                &lt;span class="n"&gt;defence_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;HomeTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
                &lt;span class="n"&gt;attack_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;AwayTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
                &lt;span class="n"&gt;defence_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;AwayTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
                &lt;span class="n"&gt;home_advantage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;rho&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;llk&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;maxiter&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;disp&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;constraints&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;type&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;eq&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;fun&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;n_teams&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;n_teams&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;

    &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;minimize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;_fit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;constraints&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;constraints&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;model_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;attack_&amp;quot;&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;defence_&amp;quot;&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home_adv&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;rho&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;x&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Log Likelihood: &amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;fun&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;model_params&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Let's run the code and see what the model's parameters look like.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;model_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fit_poisson_model&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;pprint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="s1"&gt;&amp;#39;Log Likelihood:&amp;#39;&lt;/span&gt; &lt;span class="mf"&gt;1050.8007455859752&lt;/span&gt;

&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;attack_Arsenal&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.4475806767563462&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Bournemouth&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.9564134429072025&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Brighton&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.6846752119668259&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Burnley&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.6983313339318459&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Chelsea&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.2572225786245756&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Crystal Palace&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.9493623391542453&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Everton&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.9377478892934925&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Huddersfield&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.48934382402256327&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Leicester&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.189901145845123&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Liverpool&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.5643592069690049&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Man City&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.786002855268045&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Man United&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.3309393567717684&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Newcastle&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7670321482410185&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Southampton&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7651647511032018&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Stoke&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7195758814775219&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Swansea&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.46645544795474325&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Tottenham&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.4273383910058157&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Watford&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.9338520316567531&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_West Brom&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.5837325635190684&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_West Ham&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0449689235308348&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Arsenal&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.9058108004220589&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Bournemouth&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.758464198660216&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Brighton&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.8946000243344525&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Burnley&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1.2266994646301788&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Chelsea&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1.2203046589466593&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Crystal Palace&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.8536543328557012&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Everton&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.8097571454511648&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Huddersfield&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.8264395373062519&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Leicester&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.7548204664366124&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Liverpool&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1.1755563808950777&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Man City&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1.515837101040477&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Man United&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1.5183987188316523&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Newcastle&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1.0403853901493407&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Southampton&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.8498059001250292&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Stoke&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.6625948352647537&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Swansea&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.865693751408734&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Tottenham&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1.272001837824072&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Watford&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.7091269693877006&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_West Brom&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.8701857668132795&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_West Ham&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.6432122607551816&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;home_adv&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.29444537714494584&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;rho&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.1285194379447786&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Reassuringly, these look pretty similar to the parameters we got in the &lt;a href="http://www.pena.lt/y/2021/06/18/predicting-football-results-using-the-poisson-distribution/"&gt;last article&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Even better, our negative log-likelihood is &lt;code&gt;1050.80&lt;/code&gt;, which is lower than the &lt;code&gt;1052.34&lt;/code&gt; we get if we re-run the model from the previous article. Since the optimiser is minimising this value, lower is better, so the new model fits the data slightly better :)&lt;/p&gt;
&lt;h2 id="but-what-does-rho-actually-do"&gt;But What Does &lt;em&gt;rho&lt;/em&gt; Actually Do?&lt;/h2&gt;
&lt;p&gt;So how has adding &lt;em&gt;rho&lt;/em&gt; into the model managed to improve things?&lt;/p&gt;
&lt;p&gt;If you remember all the way back to the last article, I mentioned that one of the issues with our Poisson model was that it predicted each team's goals independently from each other.&lt;/p&gt;
&lt;p&gt;In reality though, the goals each team scores are not independent of each other. For example, if the score is 0-0 with five minutes to go then the underdog may settle for a draw and not push to score. Or if a team goes a goal down early on they may park the bus to prevent a more humiliating score line.&lt;/p&gt;
&lt;p&gt;Whilst goals are generally Poisson distributed, there are situations where their Poisson-ness breaks down somewhat. Dixon and Coles' adjustment using &lt;em&gt;rho&lt;/em&gt; is an attempt to fix this by adding some dependence between the score lines.&lt;/p&gt;
&lt;p&gt;As an example, let's look at the difference between the probabilities of the 0-0, 1-0 and 0-1 scores before and after applying the Dixon and Coles adjustment.&lt;/p&gt;
&lt;p&gt;We start off by getting the model's parameters and using them to calculate the number of goals we expect each team to score. Then we use this to get the probability of 0 and 1 goals being scored (the Dixon and Coles adjustment only affects those score lines of 0 and 1 goals so we won't bother going any higher).&lt;/p&gt;
&lt;p&gt;We then take the outer product of the two sets of probabilities to create a matrix, apply the Dixon and Coles adjustment and print out how the probabilities changed.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;home_team&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Arsenal&amp;quot;&lt;/span&gt;
&lt;span class="n"&gt;away_team&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Chelsea&amp;quot;&lt;/span&gt;

&lt;span class="n"&gt;home_attack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;attack_&amp;quot;&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;home_team&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;home_defence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;defence_&amp;quot;&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;home_team&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;away_attack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;attack_&amp;quot;&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;away_team&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;away_defence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;defence_&amp;quot;&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;away_team&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;home_advantage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home_adv&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;rho&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;rho&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;home_goal_expectation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;home_attack&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;away_defence&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;home_advantage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;away_goal_expectation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;away_attack&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;home_defence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;home_probs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;poisson&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pmf&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;home_goal_expectation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;away_probs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;poisson&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pmf&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;away_goal_expectation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;probability_matrix_before&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;outer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;home_probs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;away_probs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;probability_matrix_after&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;probability_matrix_before&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;probability_matrix_after&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;home_goal_expectation&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;away_goal_expectation&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;rho&lt;/span&gt;
&lt;span class="n"&gt;probability_matrix_after&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;home_goal_expectation&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;rho&lt;/span&gt;
&lt;span class="n"&gt;probability_matrix_after&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;away_goal_expectation&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;rho&lt;/span&gt;
&lt;span class="n"&gt;probability_matrix_after&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;rho&lt;/span&gt;

&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;probability_matrix_after&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;probability_matrix_before&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt; &lt;span class="mf"&gt;0.01377996&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.01377996&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
       &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.01377996&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;0.01377996&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The top-left value has increased by 0.01377996, which has been taken from the top-right value. Rows are the home team's goals and columns are the away team's, so this means the Dixon and Coles adjustment has increased the likelihood of a 0-0 score line occurring at the expense of 0-1.&lt;/p&gt;
&lt;p&gt;The opposite has also occurred in the bottom row, where the bottom-right value has increased at the expense of the bottom-left value. This means the probability of a 1-1 score line has increased whilst 1-0 has decreased.&lt;/p&gt;
&lt;p&gt;So the Dixon and Coles model increases the probability of low score draws compared with our basic Poisson model. It's not a massive change to the probabilities but it's certainly enough to improve the model's overall performance.&lt;/p&gt;
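&lt;p&gt;As a rough, self-contained sketch of the same idea, the snippet below wraps the four multipliers used above in a single function and checks that the changes cancel out. The function names &lt;code&gt;dc_tau&lt;/code&gt; and &lt;code&gt;poisson_pmf&lt;/code&gt;, and the goal expectations and &lt;em&gt;rho&lt;/em&gt; value, are hypothetical and chosen purely for illustration; they are not the fitted parameters from the model.&lt;/p&gt;

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals for a given Poisson rate."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def dc_tau(home_goals, away_goals, home_exp, away_exp, rho):
    """Dixon and Coles multiplier for one score line (1.0 outside the 2x2 block)."""
    if home_goals == 0 and away_goals == 0:
        return 1 - home_exp * away_exp * rho
    if home_goals == 0 and away_goals == 1:
        return 1 + home_exp * rho
    if home_goals == 1 and away_goals == 0:
        return 1 + away_exp * rho
    if home_goals == 1 and away_goals == 1:
        return 1 - rho
    return 1.0


# Hypothetical values, for illustration only (not fitted parameters)
home_exp, away_exp, rho = 1.5, 1.1, -0.1

changes = {}
for h in range(2):
    for a in range(2):
        before = poisson_pmf(h, home_exp) * poisson_pmf(a, away_exp)
        after = before * dc_tau(h, a, home_exp, away_exp, rho)
        changes[(h, a)] = after - before
        print(f"{h}-{a}: {before:.4f} to {after:.4f}")

# The four changes cancel out, so the total probability is preserved
print(math.isclose(sum(changes.values()), 0.0, abs_tol=1e-12))
```

&lt;p&gt;With a negative &lt;em&gt;rho&lt;/em&gt;, the 0-0 and 1-1 entries go up and the 0-1 and 1-0 entries go down by the same amount, which is the pattern seen in the output above.&lt;/p&gt;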
&lt;h2 id="time-decay"&gt;Time Decay&lt;/h2&gt;
&lt;p&gt;Now that we've sorted out the issue with the dependence between scores, we have a second problem we need to handle.&lt;/p&gt;
&lt;p&gt;At the moment we're considering teams' attack and defence ratings to be static. But in reality they actually change over time. Think about how Sheffield United finished 9th in 2019/2020 yet were bottom of the league in 2020/2021. If we were to train our model using both seasons' worth of data then we'd probably end up overrating how many goals we'd expect Sheffield United to have scored in 2020/2021.&lt;/p&gt;
&lt;p&gt;We could solve this by only training our model over a short time window to ensure we just use the most recent fixtures for a team. But then we'd be at risk of the model reacting to short term trends in form rather than a team's true ability.&lt;/p&gt;
&lt;p&gt;So how do we determine how much data to train the model on?&lt;/p&gt;
&lt;p&gt;Dixon and Coles proposed using exponential decay to down-weight the importance of fixtures based on how long it is since they were played. This means the model can be trained on lots of historical data, placing more importance on the most recent fixtures without completely losing the influence of older fixtures.&lt;/p&gt;
&lt;p&gt;Here's the function they suggested:&lt;/p&gt;
&lt;p&gt;$$\phi(t) = \exp(-\xi t)$$&lt;/p&gt;
&lt;p&gt;And here's its implementation in Python, where &lt;code&gt;xi&lt;/code&gt; controls the strength of the down-weighting and &lt;code&gt;t&lt;/code&gt; is the time since the fixture was played.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;dc_decay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xi&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;xi&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Let's plot out a few different values of &lt;code&gt;xi&lt;/code&gt; to see what the down-weighting looks like, where the lower the weight the less important the fixture becomes.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;seaborn&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;sns&lt;/span&gt;

&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;xis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.003&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.005&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;xi&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;xis&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dc_decay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xi&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xi&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;legend&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Number of Days Ago&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Weight&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20210621_dixon_coles_weighting.png"&gt;&lt;/p&gt;
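&lt;p&gt;One way to put numbers on those curves is the half-life: setting &lt;code&gt;exp(-xi * t) = 0.5&lt;/code&gt; and solving for &lt;code&gt;t&lt;/code&gt; gives &lt;code&gt;t = ln(2) / xi&lt;/code&gt;. The short sketch below works this out for the three values plotted above. The helper name &lt;code&gt;half_life_days&lt;/code&gt; is made up for this example, and it assumes &lt;code&gt;t&lt;/code&gt; is measured in days as in the plot.&lt;/p&gt;

```python
import math


def half_life_days(xi):
    """Days until exp(-xi * t) falls to 0.5, i.e. t = ln(2) / xi."""
    return math.log(2) / xi


for xi in [0.001, 0.003, 0.005]:
    print(f"xi={xi}: weight halves after about {half_life_days(xi):.0f} days")
```

&lt;p&gt;That works out at roughly 693, 231 and 139 days respectively, so even the largest of these values still gives a fixture from a few months ago more than half the weight of one played today.&lt;/p&gt;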
&lt;h2 id="optimising-the-time-decay"&gt;Optimising the Time Decay&lt;/h2&gt;
&lt;p&gt;So the larger the value of xi then the more extreme the down-weighting is. But what value should xi be?&lt;/p&gt;
&lt;p&gt;Ideally, you would optimize your model in a similar way to how we found our attack / defence parameters above and work out the exact value of xi that gives you the best performance. To keep the code a little simpler here though, we're just going to loop through a few different values to see the effect of tweaking xi.&lt;/p&gt;
&lt;p&gt;Whilst we're still using log likelihood to optimise the model itself, we're going to need to use a different metric to compare the values of xi to find out which one is best. We'll do this using &lt;a href="http://constantinou.info/downloads/papers/solvingtheproblem.pdf"&gt;Ranked Probability Scores&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Ranked Probability Scores were originally proposed by &lt;a href="https://journals.ametsoc.org/view/journals/apme/8/6/1520-0450_1969_008_0985_assfpf_2_0_co_2.xml?tab_body=pdf"&gt;Epstein&lt;/a&gt; back in 1969 as a way to compare probabilistic forecasts against categorical data. Their main advantage here is that as well as looking at accuracy, they also account for distance in the predictions, e.g. how far out inaccurate predictions are from what actually happened. This means we can measure the accuracy of our win / draw / loss probabilities against each match's outcome and measure the overall error.&lt;/p&gt;
&lt;p&gt;To keep things simple, we'll download the &lt;a href="https://pypi.org/project/penaltyblog/"&gt;penaltyblog&lt;/a&gt; Python package and use the &lt;code&gt;rps&lt;/code&gt; function from there - the idea of this article is to focus on the Dixon and Coles model itself rather than get sidetracked with implementing different metrics.&lt;/p&gt;
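&lt;p&gt;For intuition, here is a minimal sketch of how a Ranked Probability Score can be calculated for a single match, using the standard definition based on cumulative probabilities. The function name &lt;code&gt;ranked_probability_score&lt;/code&gt; and the forecast values are made up for illustration, and this is not necessarily how any particular library implements it.&lt;/p&gt;

```python
from itertools import accumulate


def ranked_probability_score(probs, outcome_index):
    """RPS for one match; probs must be ordered, e.g. [home win, draw, away win]."""
    observed = [1.0 if i == outcome_index else 0.0 for i in range(len(probs))]
    cum_probs = list(accumulate(probs))
    cum_observed = list(accumulate(observed))
    squared = [(p - o) ** 2 for p, o in zip(cum_probs, cum_observed)]
    # The last cumulative term is zero when probs sum to 1,
    # so the score is averaged over (number of categories - 1)
    return sum(squared) / (len(probs) - 1)


forecast = [0.5, 0.3, 0.2]  # made-up home / draw / away probabilities
for label, idx in [("home win", 0), ("draw", 1), ("away win", 2)]:
    print(label, round(ranked_probability_score(forecast, idx), 3))
```

&lt;p&gt;Lower scores are better. In this example the away win gets the worst score, because the forecast placed most of its probability at the opposite end of the ordered outcomes.&lt;/p&gt;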
&lt;p&gt;Install the &lt;a href="https://pypi.org/project/penaltyblog/"&gt;penaltyblog&lt;/a&gt; package by running the command below in your terminal.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;penaltyblog
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We'll then use the &lt;a href="https://pypi.org/project/penaltyblog/"&gt;penaltyblog&lt;/a&gt; package to download more data from &lt;a href="https://www.football-data.co.uk"&gt;football-data.co.uk&lt;/a&gt;, giving us five seasons' worth of fixtures to play with.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pb&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pd&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;footballdata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fetch_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;england&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2016&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;footballdata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fetch_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;england&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2017&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;footballdata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fetch_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;england&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2018&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;footballdata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fetch_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;england&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2019&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;footballdata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fetch_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;england&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2020&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;   
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The next thing we need to do is to update our &lt;code&gt;log_likelihood&lt;/code&gt; function to apply the weighting. It's a fairly simple change: all we do is pass in the weight for the fixture as an additional argument and then multiply the log likelihood by the weight.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;log_likelihood&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;goals_home_observed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;goals_away_observed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;home_attack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;home_defence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;away_attack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;away_defence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;home_advantage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;rho&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;weight&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;goal_expectation_home&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;home_attack&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;away_defence&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;home_advantage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;goal_expectation_away&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;away_attack&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;home_defence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;home_llk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;poisson&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pmf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;goals_home_observed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;goal_expectation_home&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;away_llk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;poisson&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pmf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;goals_away_observed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;goal_expectation_away&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;adj_llk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rho_correction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;goals_home_observed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;goals_away_observed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;goal_expectation_home&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;goal_expectation_away&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;rho&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;goal_expectation_home&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;goal_expectation_away&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;adj_llk&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;

    &lt;span class="n"&gt;log_llk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;weight&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;home_llk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;away_llk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;adj_llk&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;log_llk&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
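&lt;p&gt;To get a feel for the weights that end up being passed in, the sketch below applies the decay formula to three made-up fixture dates, measuring each one's age in days relative to the most recent fixture. The dates and the value of &lt;code&gt;xi&lt;/code&gt; are hypothetical and only there for illustration.&lt;/p&gt;

```python
import math
from datetime import date

xi = 0.001  # hypothetical decay rate, for illustration only
fixture_dates = [date(2020, 8, 1), date(2021, 1, 1), date(2021, 5, 23)]  # made-up dates

# Age of each fixture in days, relative to the most recent one
latest = max(fixture_dates)
days_since = [(latest - played).days for played in fixture_dates]

# Same formula as dc_decay: exp(-xi * t)
weights = [math.exp(-xi * t) for t in days_since]

for played, t, w in zip(fixture_dates, days_since, weights):
    print(played, t, round(w, 3))
```

&lt;p&gt;The most recent fixture always gets a weight of 1, and every older fixture contributes a little less to the total log likelihood.&lt;/p&gt;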

&lt;p&gt;Now we need to update the code that wraps the call to the optimizer. The main change here is that we pass in our value for xi and create weights for the fixtures based on how long ago they were played.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fit_poisson_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;xi&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0001&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;teams&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unique&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concatenate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;HomeTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;AwayTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]])))&lt;/span&gt;
    &lt;span class="n"&gt;n_teams&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;days_since&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;days&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;weight&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dc_decay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xi&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;days_since&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concatenate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_teams&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;  &lt;span class="c1"&gt;# attack strength&lt;/span&gt;
            &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_teams&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;  &lt;span class="c1"&gt;# defence strength&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  &lt;span class="c1"&gt;# home advantage&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  &lt;span class="c1"&gt;# rho&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;attack_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;n_teams&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
        &lt;span class="n"&gt;defence_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;n_teams&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;n_teams&lt;/span&gt;&lt;span class="p"&gt;)]))&lt;/span&gt;
        &lt;span class="n"&gt;home_advantage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;rho&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="n"&gt;llk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iterrows&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;tmp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;log_likelihood&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;FTHG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;FTAG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;attack_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;HomeTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
                &lt;span class="n"&gt;defence_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;HomeTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
                &lt;span class="n"&gt;attack_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;AwayTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
                &lt;span class="n"&gt;defence_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;AwayTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
                &lt;span class="n"&gt;home_advantage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;rho&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;weight&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;llk&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;maxiter&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;disp&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;constraints&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;type&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;eq&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;fun&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;n_teams&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;n_teams&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;

    &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;minimize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;_fit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;constraints&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;constraints&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;model_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;attack_&amp;quot;&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;defence_&amp;quot;&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home_adv&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;rho&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;x&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;model_params&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Once we've fit our model, we'll use it to calculate the probabilities of a home win, draw or away win for each fixture, and then measure the ranked probability score of our predictions against what actually happened.&lt;/p&gt;
&lt;p&gt;We will also split the data so that we train on the first four seasons and predict on the final season. This means the model is evaluated on data it was not trained on, which helps guard against &lt;a href="https://en.wikipedia.org/wiki/Overfitting"&gt;overfitting&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here's the updated &lt;code&gt;predict&lt;/code&gt; function that returns home win, draw, away win probabilities.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;home_team&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;away_team&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;home_attack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;attack_&amp;quot;&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;home_team&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;home_defence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;defence_&amp;quot;&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;home_team&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;away_attack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;attack_&amp;quot;&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;away_team&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;away_defence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;defence_&amp;quot;&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;away_team&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;home_advantage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home_adv&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;rho&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;rho&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;home_goal_expectation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;home_attack&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;away_defence&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;home_advantage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;away_goal_expectation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;away_attack&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;home_defence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;home_probs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;poisson&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pmf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;home_goal_expectation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;away_probs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;poisson&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pmf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;away_goal_expectation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;outer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;home_probs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;away_probs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;home_goal_expectation&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;away_goal_expectation&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;rho&lt;/span&gt;
    &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;home_goal_expectation&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;rho&lt;/span&gt;
    &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;away_goal_expectation&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;rho&lt;/span&gt;
    &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;rho&lt;/span&gt;    

    &lt;span class="n"&gt;home&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tril&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;draw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;diag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;away&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;triu&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;home&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;draw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;away&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
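&lt;p&gt;As a quick sanity check on the approach, here is a minimal standard-library sketch of the same score-matrix calculation. The goal expectations (1.5 and 1.1) and the rho value (-0.1) are made-up inputs rather than fitted parameters, and &lt;code&gt;poisson_pmf&lt;/code&gt; and &lt;code&gt;outcome_probs&lt;/code&gt; are hypothetical helper names used only for illustration. Because the four Dixon and Coles adjustments cancel each other out, the three outcome probabilities should still sum to almost exactly one, with the small shortfall coming from truncating the matrix at ten goals.&lt;/p&gt;

```python
import math


def poisson_pmf(k, mu):
    """Probability of exactly k goals given an expected goal count mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)


def outcome_probs(home_exp, away_exp, rho, max_goals=10):
    # Score matrix: rows are home goals, columns are away goals
    m = [
        [poisson_pmf(h, home_exp) * poisson_pmf(a, away_exp) for a in range(max_goals)]
        for h in range(max_goals)
    ]

    # Dixon and Coles adjustment to the four low-scoring cells
    m[0][0] *= 1 - home_exp * away_exp * rho
    m[0][1] *= 1 + home_exp * rho
    m[1][0] *= 1 + away_exp * rho
    m[1][1] *= 1 - rho

    # Home win: cells below the diagonal (home goals exceed away goals)
    home = sum(m[h][a] for h in range(max_goals) for a in range(h))
    # Draw: cells on the diagonal
    draw = sum(m[k][k] for k in range(max_goals))
    # Away win: cells above the diagonal
    away = sum(m[h][a] for h in range(max_goals) for a in range(h + 1, max_goals))
    return home, draw, away


# Hypothetical inputs for illustration, not fitted values
home, draw, away = outcome_probs(home_exp=1.5, away_exp=1.1, rho=-0.1)
total = home + draw + away
print(round(home, 3), round(draw, 3), round(away, 3), round(total, 5))
```

&lt;p&gt;If the total drifts noticeably away from one for your own parameters, the goal expectations are probably large enough that ten goals is too tight a cut-off for the matrix.&lt;/p&gt;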

&lt;p&gt;And finally, here's the function that calculates the ranked probability score from those probabilities.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_rps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;rps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iterrows&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;FTR&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;H&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;outcome&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;FTR&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;D&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;outcome&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;FTR&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;A&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;outcome&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;  

        &lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;HomeTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;AwayTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;rps&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;outcome&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
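&lt;p&gt;To make the metric itself less of a black box, the sketch below writes out the ranked probability score for a single fixture using its standard definition: the mean squared difference between the cumulative predicted and cumulative observed probabilities over the ordered outcomes (home, draw, away). The helper name &lt;code&gt;ranked_probability_score&lt;/code&gt; is hypothetical, and &lt;code&gt;pb.metrics.rps&lt;/code&gt; may differ in its implementation details.&lt;/p&gt;

```python
def ranked_probability_score(probs, outcome):
    """RPS for one fixture.

    probs   -- predicted probabilities in order (home, draw, away)
    outcome -- index of the observed result (0 = home, 1 = draw, 2 = away)
    """
    n = len(probs)
    cum_pred = 0.0
    cum_obs = 0.0
    total = 0.0
    # The final cumulative term is always zero, so only n - 1 terms are needed
    for i in range(n - 1):
        cum_pred += probs[i]
        cum_obs += 1.0 if i == outcome else 0.0
        total += (cum_pred - cum_obs) ** 2
    return total / (n - 1)


# Made-up forecast: 50% home win, 30% draw, 20% away win
forecast = [0.5, 0.3, 0.2]
rps_home_win = ranked_probability_score(forecast, 0)
rps_away_win = ranked_probability_score(forecast, 2)
print(rps_home_win, rps_away_win)
```

&lt;p&gt;The same forecast scores much worse when the away team wins than when the home team wins, because the score penalises probability placed further from the observed outcome.&lt;/p&gt;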

&lt;p&gt;All we have to do now is loop through some values for xi and see what happens. Warning: you may want to go and make a coffee at this point, as the loop takes a while to run. At the moment the code is written for simplicity rather than speed, but I'll show you a faster way to do this later on via the &lt;a href="https://pypi.org/project/penaltyblog/"&gt;penaltyblog&lt;/a&gt; Python package.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;xis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.00005&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.0001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.0025&lt;/span&gt;&lt;span class="p"&gt;,]&lt;/span&gt;
&lt;span class="n"&gt;rps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;xi&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;xis&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;380&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# first four seasons&lt;/span&gt;
    &lt;span class="n"&gt;test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;380&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;  &lt;span class="c1"&gt;# final season (380 fixtures)&lt;/span&gt;

    &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;days_since&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;days&lt;/span&gt;
    &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;weight&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dc_decay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xi&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;days_since&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fit_poisson_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;xi&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;rps&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;calculate_rps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ticklabel_format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;useOffset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;xi&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;RPS&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20210621_xi.png"&gt;&lt;/p&gt;
&lt;p&gt;Ranked probability scores are a measure of the error in our predictions, so the smaller the value, the better the model is performing. It looks like we are doing best with an xi of roughly 0.001. Smaller xi values are likely giving too much influence to the older data, whilst larger xi values are giving too much influence to the latest data.&lt;/p&gt;
&lt;p&gt;In Dixon and Coles' paper, the authors report using an xi value of 0.0065. However, their time units are half weeks whilst ours are days. If we divide 0.0065 by 3.5 (the number of days in a half week) then we get 0.00186, which is of the same order as our value of 0.001.&lt;/p&gt;
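&lt;p&gt;The unit conversion is easy to check by hand. The sketch below repeats the arithmetic and also expresses each xi as a half-life, assuming the exponential weighting &lt;code&gt;exp(-xi * t)&lt;/code&gt; with t in days. The helpers &lt;code&gt;decay_weight&lt;/code&gt; and &lt;code&gt;half_life_days&lt;/code&gt; are hypothetical names for illustration and are not part of the code above.&lt;/p&gt;

```python
import math

XI_HALF_WEEKS = 0.0065     # value reported by Dixon and Coles (per half week)
DAYS_PER_HALF_WEEK = 3.5

# Convert the paper's xi to a per-day rate
xi_days = XI_HALF_WEEKS / DAYS_PER_HALF_WEEK


def decay_weight(xi, days_since):
    """Weight given to a fixture played days_since days ago (assumed exponential decay)."""
    return math.exp(-xi * days_since)


def half_life_days(xi):
    """Number of days after which a fixture's weight has halved."""
    return math.log(2) / xi


for label, xi in [("paper (converted)", xi_days), ("this article", 0.001)]:
    print(
        f"{label}: xi={xi:.5f}, "
        f"half-life={half_life_days(xi):.0f} days, "
        f"weight after one year={decay_weight(xi, 365):.2f}"
    )
```

&lt;p&gt;Under that assumption, the converted paper value halves a fixture's weight after roughly 373 days, whereas an xi of 0.001 takes roughly 693 days, so our fit leans a little more heavily on older results.&lt;/p&gt;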
&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;This has been a long article with a lot of code so thanks for reading it all the way to the end!&lt;/p&gt;
&lt;p&gt;With relatively minor modifications, we've upgraded our Poisson model to use the Dixon and Coles adjustment to improve its predictions for the 0-0, 0-1, 1-0 and 1-1 score lines.&lt;/p&gt;
&lt;p&gt;We've also added in time decay so that we down-weight the influence of older data when we fit the model and give greater influence to more recent fixtures.&lt;/p&gt;
&lt;p&gt;This isn't the end of the road though. There are plenty of other things we can do to improve our model, which I'll cover in future articles.&lt;/p&gt;
&lt;h2 id="addendum"&gt;Addendum&lt;/h2&gt;
&lt;p&gt;I mentioned above that the function we were optimizing was slow, as the code was written for simplicity rather than speed. I've neatened up the code and added it to my &lt;a href="https://github.com/martineastwood/penaltyblog"&gt;penaltyblog&lt;/a&gt; Python package if you want to try out a faster version.&lt;/p&gt;
&lt;p&gt;Using &lt;a href="https://github.com/martineastwood/penaltyblog"&gt;penaltyblog&lt;/a&gt; we can repeat everything we've done above in just a few lines of code and have it run in minutes rather than hours.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;penaltyblog
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;penaltyblog&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pb&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;np&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_mean_rps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;rps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iterrows&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;FTR&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;H&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;outcome&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;FTR&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;D&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;outcome&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;FTR&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;A&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;outcome&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;

        &lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;HomeTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;AwayTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;home_draw_away&lt;/span&gt;

        &lt;span class="n"&gt;rps&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;outcome&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;footballdata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fetch_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;england&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2016&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;footballdata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fetch_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;england&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2017&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;footballdata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fetch_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;england&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2018&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;footballdata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fetch_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;england&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2019&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;footballdata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fetch_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;england&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2020&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;   
&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;rps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;xi&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.0001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.02&lt;/span&gt;&lt;span class="p"&gt;,]:&lt;/span&gt;
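    &lt;span class="n"&gt;train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# copy so the weight column is added to an independent frame, not a slice of df&lt;/span&gt;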
    &lt;span class="n"&gt;test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;

    &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;weight&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;poisson&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dixon_coles_weights&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;xi&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;dc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;poisson&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DixonColesGoalModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;FTHG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;FTAG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;HomeTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;AwayTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;weight&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;dc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;rps&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;calculate_mean_rps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.2214667529243473&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.22109214369155702&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.21811162166084946&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.22953214528180002&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</content><category term="Prediction"></category><category term="Poisson"></category><category term="Prediction"></category><category term="Dixon and Coles"></category><category term="Betting"></category></entry><entry><title>Predicting Football Results With the Poisson Distribution</title><link href="2021/06/18/predicting-football-results-using-the-poisson-distribution/" rel="alternate"></link><published>2021-06-18T19:30:00+00:00</published><updated>2021-06-18T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2021-06-18:2021/06/18/predicting-football-results-using-the-poisson-distribution/</id><summary type="html">&lt;p&gt;A tutorial about predicting football results using Python and the Poisson distribution...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;This article is going to walk through building a model to predict football results using the Poisson distribution. As we'll find out, the model is fairly simplistic and struggles at points. However, it provides us with a great base to improve from over the next few articles in this series.&lt;/p&gt;
&lt;p&gt;Let's get started!&lt;/p&gt;
&lt;h2 id="the-data"&gt;The Data&lt;/h2&gt;
&lt;p&gt;Our model will work by predicting the number of goals scored / conceded by each team when they play each other based on their historical performances. To work that out we're going to need some data so let's grab some football results from the awesome &lt;a href="https://www.football-data.co.uk"&gt;football-data.co.uk website&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pd&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;https://www.football-data.co.uk/mmz4281/1718/E0.csv&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;HomeTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;AwayTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;FTHG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;FTAG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;table class="table table-striped table-condensed"&gt;
&lt;thead&gt;
  &lt;tr&gt;
    &lt;th title="Field #1"&gt;&lt;/th&gt;
    &lt;th title="Field #2"&gt;Date&lt;/th&gt;
    &lt;th title="Field #3"&gt;HomeTeam&lt;/th&gt;
    &lt;th title="Field #4"&gt;AwayTeam&lt;/th&gt;
    &lt;th title="Field #5"&gt;FTHG&lt;/th&gt;
    &lt;th title="Field #6"&gt;FTAG&lt;/th&gt;
  &lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
  &lt;td&gt;0&lt;/td&gt;
  &lt;td&gt;11/08/2017&lt;/td&gt;
  &lt;td &gt;Arsenal&lt;/td&gt;
  &lt;td &gt;Leicester&lt;/td&gt;
  &lt;td &gt;4&lt;/td&gt;
  &lt;td &gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;1&lt;/td&gt;
  &lt;td&gt;12/08/2017&lt;/td&gt;
  &lt;td &gt;Brighton&lt;/td&gt;
  &lt;td &gt;Man City&lt;/td&gt;
  &lt;td &gt;0&lt;/td&gt;
  &lt;td &gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;2&lt;/td&gt;
  &lt;td&gt;12/08/2017&lt;/td&gt;
  &lt;td &gt;Chelsea&lt;/td&gt;
  &lt;td &gt;Burnley&lt;/td&gt;
  &lt;td &gt;2&lt;/td&gt;
  &lt;td &gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;3&lt;/td&gt;
  &lt;td&gt;12/08/2017&lt;/td&gt;
  &lt;td &gt;Crystal Palace&lt;/td&gt;
  &lt;td &gt;Huddersfield&lt;/td&gt;
  &lt;td &gt;0&lt;/td&gt;
  &lt;td &gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;4&lt;/td&gt;
  &lt;td&gt;12/08/2017&lt;/td&gt;
  &lt;td &gt;Everton&lt;/td&gt;
  &lt;td &gt;Stoke&lt;/td&gt;
  &lt;td &gt;1&lt;/td&gt;
  &lt;td &gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;

&lt;p&gt;We've now got the team names, the number of goals scored by the home team (FTHG) at full time and the number of goals scored by the away team (FTAG) at full time, which is everything we need to get started.&lt;/p&gt;
&lt;h2 id="home-advantage"&gt;Home Advantage&lt;/h2&gt;
&lt;p&gt;Let's dig into the data to get a better understanding of what we're trying to model.  We'll start off simple by looking at the average goals scored.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;FTHG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;FTAG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nx"&gt;FTHG&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="m m-Double"&gt;1.531579&lt;/span&gt;
&lt;span class="nx"&gt;FTAG&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="m m-Double"&gt;1.147368&lt;/span&gt;
&lt;span class="nx"&gt;dtype&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;float64&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;On average, the home team scores around 0.38 more goals per match than the away team, so it looks like our model will need to handle home advantage.&lt;/p&gt;
&lt;p&gt;Next, we'll plot out the distribution of goals scored by the home and away teams.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;seaborn&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;sns&lt;/span&gt;

&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;max_goals&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;FTHG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;FTAG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_goals&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;density&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;xticks&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_goals&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_goals&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Goals&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Proportion of matches&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;legend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;upper right&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Number of Goals Scored Per Match&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fontweight&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;bold&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20210608_goals_scored_home_away.png"&gt;&lt;/p&gt;
&lt;p&gt;The shape of that distribution looks rather similar to the &lt;a href="https://en.wikipedia.org/wiki/Poisson_distribution"&gt;Poisson distribution&lt;/a&gt;. We can confirm this by overlaying the number of goals we'd expect from the Poisson distribution based on the average goals scored that we calculated above.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;scipy.stats&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;poisson&lt;/span&gt;

&lt;span class="n"&gt;home_poisson&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;poisson&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pmf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;FTHG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;away_poisson&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;poisson&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pmf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;FTAG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="n"&gt;max_goals&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;FTHG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;FTAG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_goals&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Home&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Away&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;density&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_goals&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
    &lt;span class="n"&gt;home_poisson&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;linestyle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;-&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;marker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;o&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Home Poisson&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_goals&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
    &lt;span class="n"&gt;away_poisson&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;linestyle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;-&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;marker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;o&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Away Poisson&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;xticks&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_goals&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_goals&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Goals&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Proportion of matches&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;legend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;upper right&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Number of Goals Scored Per Match&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fontweight&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;bold&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20210608_goals_scored_home_away_poisson.png"&gt;&lt;/p&gt;
&lt;p&gt;It's not perfect, but the two sets of data look pretty close to each other. This is good as it means we can approximate the number of goals scored using the Poisson distribution - this assumption is going to form the basis of our predictive model.&lt;/p&gt;
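&lt;p&gt;For reference, the Poisson probability of seeing exactly &lt;code&gt;k&lt;/code&gt; goals when the average is &lt;code&gt;lam&lt;/code&gt; is &lt;code&gt;lam**k * exp(-lam) / k!&lt;/code&gt;. The sketch below is a minimal hand-rolled version of that formula, fed with the two averages printed earlier. The helper name &lt;code&gt;poisson_pmf&lt;/code&gt; is just an illustrative choice; in practice &lt;code&gt;scipy.stats.poisson.pmf&lt;/code&gt; does the same job.&lt;/p&gt;

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


# Average goals per match, copied from the output earlier in this article
home_mean = 1.531579
away_mean = 1.147368

# Probability of the home / away side scoring 0, 1, 2, 3 or 4 goals
for goals in range(5):
    home_p = poisson_pmf(goals, home_mean)
    away_p = poisson_pmf(goals, away_mean)
    print(goals, round(home_p, 3), round(away_p, 3))
```

&lt;p&gt;These are the values the two overlaid lines in the plot are tracing out, one point per goal count.&lt;/p&gt;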
&lt;h2 id="building-the-model"&gt;Building the Model&lt;/h2&gt;
&lt;p&gt;To plot the Poisson distributions above, we used the average number of goals scored home and away, but to predict actual matches we're going to need to know how many goals we expect each individual team to score against each other.&lt;/p&gt;
&lt;p&gt;For this, we're going to assume that the number of goals a team scores is a function of their attack strength combined with the opposition's defence strength. In other words, teams with better attacks should score more and teams with worse defences should concede more.&lt;/p&gt;
&lt;p&gt;We also need to remember to account for the home advantage too.&lt;/p&gt;
&lt;p&gt;So our model is going to look like this:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;goals_home = home_advantage + home_attack + defence_away&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;goals_away = away_attack + defence_home&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;So far, so good. But where do we get the values for those model parameters from???&lt;/p&gt;
&lt;p&gt;We're going to use good old trial and error to test out a whole bunch of different numbers and see what works best. We are not going to do it by hand though as that will be tedious. Instead, we'll let the computer do the hard work for us.&lt;/p&gt;
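&lt;p&gt;To make the two formulas above a little more concrete, here is a rough sketch of how a set of parameters turns into expected goals for one fixture. The parameter values are made up purely for illustration, and the helper &lt;code&gt;expected_goals&lt;/code&gt; is hypothetical. It assumes the sums live on the log scale and are passed through an exponential, which is what the log-likelihood function further down does, so the expected goals can never go negative.&lt;/p&gt;

```python
import math

# Made-up parameter values, purely for illustration (not fitted to any data).
# A negative defence value means the team concedes fewer goals than average.
home_advantage = 0.3
attack = {"Arsenal": 0.4, "Stoke": -0.2}
defence = {"Arsenal": -0.3, "Stoke": 0.1}


def expected_goals(home, away):
    """Return (home, away) expected goals, assuming a log-scale additive model."""
    goals_home = math.exp(home_advantage + attack[home] + defence[away])
    goals_away = math.exp(attack[away] + defence[home])
    return goals_home, goals_away


goals_home, goals_away = expected_goals("Arsenal", "Stoke")
print(round(goals_home, 2), round(goals_away, 2))
```

&lt;p&gt;With real data we don't know these numbers in advance, which is exactly the problem the next section tackles.&lt;/p&gt;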
&lt;h2 id="finding-the-optimal-parameters"&gt;Finding the Optimal Parameters&lt;/h2&gt;
&lt;p&gt;For our computer to find the best set of &lt;code&gt;attack&lt;/code&gt; and &lt;code&gt;defence&lt;/code&gt; parameters, we need to give it a value to optimise. For a problem like this, the usual approach is &lt;a href="https://en.wikipedia.org/wiki/Maximum_likelihood_estimation"&gt;Maximum Likelihood Estimation&lt;/a&gt;, which identifies the parameter values that maximise the likelihood of the model producing the data that were actually observed. In other words, it will work out which set of &lt;code&gt;attack&lt;/code&gt; and &lt;code&gt;defence&lt;/code&gt; parameters gets closest to reproducing the historical fixtures we downloaded.&lt;/p&gt;
&lt;p&gt;Unfortunately, for this example calculating the likelihood involves multiplying lots of small numbers together, which can cause issues with floating point precision and &lt;a href="https://en.wikipedia.org/wiki/Arithmetic_underflow"&gt;underflows&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Instead, we'll maximise the log-likelihood: we can simply sum the logs of the likelihoods together rather than multiplying them, which avoids all those awkward precision issues.&lt;/p&gt;
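&lt;p&gt;As a quick illustration of the underflow problem, the sketch below multiplies together 2,000 small probabilities and then sums their logs instead. The probabilities are invented for the example (every "match" is given the same likelihood of 0.05), so only the behaviour matters, not the numbers themselves.&lt;/p&gt;

```python
import math

# 2,000 identical, made-up per-match likelihoods, purely for illustration
probabilities = [0.05] * 2000

# Multiplying them directly: the true value is far smaller than a float can hold
likelihood = math.prod(probabilities)

# Summing the logs instead keeps everything in a comfortable numeric range
log_likelihood = sum(math.log(p) for p in probabilities)

print(likelihood)
print(log_likelihood)
```

&lt;p&gt;The product collapses to zero, which tells the optimiser nothing, whereas the sum of logs stays a perfectly ordinary negative number that can still be compared between parameter sets.&lt;/p&gt;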
&lt;p&gt;And actually, instead of maximising the log-likelihood, we're going to minimise the negative of the log-likelihood because Python's &lt;code&gt;scipy&lt;/code&gt; library comes with an optimiser that minimises rather than maximises. If this is confusing, just remember that making the negative of the log-likelihood as small as possible is equivalent to making the log-likelihood itself as big as possible.&lt;/p&gt;
&lt;p&gt;Now we just need the function to calculate the log-likelihood.&lt;/p&gt;
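&lt;p&gt;Before tackling the full model, here is a toy version of the same idea with a single parameter. It is only a sketch: the goal counts are made up, and a crude grid search stands in for &lt;code&gt;scipy&lt;/code&gt;'s optimiser so the trial and error is easy to see. It tries lots of candidate scoring rates and keeps the one with the smallest negative log-likelihood.&lt;/p&gt;

```python
import math

# A handful of made-up goal counts, purely for illustration
goals = [0, 1, 1, 2, 3, 1, 0, 2]


def negative_log_likelihood(rate, observed):
    """Negative Poisson log-likelihood of the observed counts for a given rate."""
    return -sum(
        k * math.log(rate) - rate - math.lgamma(k + 1) for k in observed
    )


# Crude trial and error: try rates 0.01, 0.02, ..., 5.00 and keep the best one
candidates = [i / 100 for i in range(1, 501)]
best_rate = min(candidates, key=lambda r: negative_log_likelihood(r, goals))

# For a single Poisson rate, the best candidate should match the sample mean
print(best_rate, sum(goals) / len(goals))
```

&lt;p&gt;The real model works the same way, just with many more parameters to juggle at once, which is why a proper optimiser is worth having.&lt;/p&gt;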
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;log_likelihood&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;goals_home_observed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;goals_away_observed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;home_attack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;home_defence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;away_attack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;away_defence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;home_advantage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;goal_expectation_home&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;home_attack&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;away_defence&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;home_advantage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;goal_expectation_away&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;away_attack&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;home_defence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;goal_expectation_home&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;goal_expectation_away&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;    

    &lt;span class="n"&gt;home_llk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;poisson&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pmf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;goals_home_observed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;goal_expectation_home&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;away_llk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;poisson&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pmf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;goals_away_observed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;goal_expectation_away&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;log_llk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;home_llk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;away_llk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;log_llk&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;That function is less complicated than it initially looks. First, we put the &lt;code&gt;attack&lt;/code&gt;, &lt;code&gt;defence&lt;/code&gt; and &lt;code&gt;home_advantage&lt;/code&gt; parameters into the model we defined above to get the number of goals we expect from the two teams playing each other. As a safeguard, if either expected number of goals is less than zero then we return a really big number to tell the optimizer that the parameters it is testing are rubbish, because teams can't score less than zero goals in real life. In practice, the exponential in the model always gives a positive value, so this check should never trigger.&lt;/p&gt;
&lt;p&gt;If we have zero or more goals, we use the Poisson distribution to work out the likelihood of our parameters giving us the actual number of goals scored by each team. We then take the log of each likelihood and add the two together. Finally, because we are minimizing instead of maximizing, we return the negative of the log-likelihood.&lt;/p&gt;
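&lt;p&gt;As an aside, taking the log of a probability can cause trouble when the probability is so small that it rounds down to zero. Here's a minimal sketch of one way around that, assuming &lt;code&gt;scipy.stats.poisson&lt;/code&gt; is available: ask for the log-probability directly using &lt;code&gt;logpmf&lt;/code&gt;. The goal counts and expectations below are made-up values purely for illustration, not numbers from our data.&lt;/p&gt;

```python
import numpy as np
from scipy.stats import poisson

# Made-up example values, not taken from the data set
goals_observed = 2
goal_expectation = 1.4

# Two routes to the same log-probability
via_pmf = np.log(poisson.pmf(goals_observed, goal_expectation))
via_logpmf = poisson.logpmf(goals_observed, goal_expectation)

# An extreme (hypothetical) case where the plain probability is tiny
# and may underflow to zero before the log is taken
with np.errstate(divide="ignore"):
    extreme_via_pmf = np.log(poisson.pmf(100, 0.01))
extreme_via_logpmf = poisson.logpmf(100, 0.01)

print(via_pmf, via_logpmf)
print(extreme_via_pmf, extreme_via_logpmf)
```

&lt;p&gt;For everyday scorelines the two approaches agree, so this only matters if the optimizer wanders off to some very unlikely parameters.&lt;/p&gt;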
&lt;p&gt;Now we need to wrap that code to pass into the optimizer.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;scipy.optimize&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;minimize&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fit_poisson_model&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;teams&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unique&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concatenate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;HomeTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;AwayTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]])))&lt;/span&gt;
    &lt;span class="n"&gt;n_teams&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concatenate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_teams&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;  &lt;span class="c1"&gt;# attack strength&lt;/span&gt;
            &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_teams&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;  &lt;span class="c1"&gt;# defence strength&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  &lt;span class="c1"&gt;# home advantage&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;attack_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;n_teams&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
        &lt;span class="n"&gt;defence_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;n_teams&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;n_teams&lt;/span&gt;&lt;span class="p"&gt;)]))&lt;/span&gt;
        &lt;span class="n"&gt;home_advantage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="n"&gt;llk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iterrows&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;tmp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;log_likelihood&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;FTHG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;FTAG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;attack_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;HomeTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
                &lt;span class="n"&gt;defence_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;HomeTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
                &lt;span class="n"&gt;attack_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;AwayTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
                &lt;span class="n"&gt;defence_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;AwayTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
                &lt;span class="n"&gt;home_advantage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;llk&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;maxiter&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;disp&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;constraints&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;type&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;eq&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;fun&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;n_teams&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;n_teams&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;

    &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;minimize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;_fit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;constraints&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;constraints&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;model_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;attack_&amp;quot;&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;defence_&amp;quot;&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home_adv&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;x&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;model_params&lt;/span&gt;

&lt;span class="n"&gt;model_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fit_poisson_model&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;That's a big chunk of code so let's step through it to see what it's doing.&lt;/p&gt;
&lt;p&gt;First of all, we extract the teams from the historical fixtures we are fitting the model to, as we're going to need them later on to index into the model's parameters.&lt;/p&gt;
&lt;p&gt;Next, we create the initial model parameters. The optimizer uses these as its starting point, so the closer they are to the actual values, the easier (and therefore quicker) it will be for the optimizer to do its job.&lt;/p&gt;
&lt;p&gt;We then create a function called &lt;code&gt;_fit&lt;/code&gt;, which is a wrapper around the function for calculating the log-likelihood. This loops through our historical fixtures, calculates the negative log-likelihood for each one and then adds them all up. It's this final number that the optimizer is using to work out what the best set of parameters is.&lt;/p&gt;
&lt;p&gt;Just a small warning here that this function is rather slow. I've tried to keep the code as simple as possible by just using a &lt;code&gt;for loop&lt;/code&gt; here but there are ways to rewrite this to speed it up massively, which I'll point you towards later.&lt;/p&gt;
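&lt;p&gt;To give a flavour of what a speed-up can look like, here's a rough sketch of one possible approach: swap the team names for integer indices and let NumPy handle every fixture at once. The toy fixtures, the parameter values and the function names below are all hypothetical and only there to show the idea, with a fixture-by-fixture version alongside so the two totals can be compared.&lt;/p&gt;

```python
import numpy as np
from scipy.stats import poisson

# Hypothetical toy fixtures: team indices and full-time goals
home_idx = np.array([0, 1, 2, 0])
away_idx = np.array([1, 2, 0, 2])
goals_home = np.array([2, 1, 0, 3])
goals_away = np.array([0, 1, 1, 1])
n_teams = 3

# Same layout as params in the post: attacks, then defences, then home advantage
params = np.array([1.2, 0.9, 0.9, -1.0, -0.8, -0.7, 0.25])


def neg_log_likelihood_vectorised(params):
    attack = params[:n_teams]
    defence = params[n_teams : 2 * n_teams]
    home_advantage = params[-1]

    # Fancy indexing looks up every fixture's parameters in one go
    goal_expectation_home = np.exp(attack[home_idx] + defence[away_idx] + home_advantage)
    goal_expectation_away = np.exp(attack[away_idx] + defence[home_idx])

    log_llk = poisson.logpmf(goals_home, goal_expectation_home) + poisson.logpmf(
        goals_away, goal_expectation_away
    )
    return -np.sum(log_llk)


def neg_log_likelihood_loop(params):
    # Fixture-by-fixture version, for comparison with the vectorised one
    total = 0.0
    for h, a, gh, ga in zip(home_idx, away_idx, goals_home, goals_away):
        mu_home = np.exp(params[h] + params[n_teams + a] + params[-1])
        mu_away = np.exp(params[a] + params[n_teams + h])
        total -= np.log(poisson.pmf(gh, mu_home)) + np.log(poisson.pmf(ga, mu_away))
    return total


print(neg_log_likelihood_vectorised(params))
print(neg_log_likelihood_loop(params))
```

&lt;p&gt;Both functions should print the same total. The vectorised one just avoids calling into Python once per fixture, which is where most of the time goes on a full season of data.&lt;/p&gt;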
&lt;p&gt;We then set up some additional options for the optimizer to tell it to give up if it has not found a solution after 100 iterations and to hide the default output.&lt;/p&gt;
&lt;p&gt;Next, we set up a constraint to tell the optimizer that the sum of the attack parameters must equal the total number of teams. There are countless combinations of parameters that could potentially give us the same results so this forces the optimizer to give us a reproducible output and keeps the parameters in a realistic range by making them average around 1.&lt;/p&gt;
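&lt;p&gt;If it's not obvious why different parameters can give the same results, here's a small sketch using made-up values: adding a constant to a team's attack and taking the same constant off the opposition's defence leaves the expected goals unchanged. The numbers and the &lt;code&gt;shift&lt;/code&gt; variable are purely illustrative.&lt;/p&gt;

```python
import math

# Hypothetical parameter values for two teams, for illustration only
home_attack, away_defence, home_advantage = 1.4, -0.9, 0.25
shift = 0.5  # any constant would do

original = math.exp(home_attack + away_defence + home_advantage)
# Add the constant to the attack and take it off the defence
shifted = math.exp((home_attack + shift) + (away_defence - shift) + home_advantage)

print(original, shifted)
```

&lt;p&gt;Without the constraint the optimizer has no reason to prefer one of these equivalent solutions over another, so pinning down the sum of the attack parameters removes that ambiguity.&lt;/p&gt;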
&lt;p&gt;After all that, we finally run the optimizer and return the parameters it has decided are the best fit.&lt;/p&gt;
&lt;p&gt;Let's take a look at the parameters and check they look sensible:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;pprint&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pprint&lt;/span&gt;

&lt;span class="n"&gt;pprint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;attack_Arsenal&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.4472981861257104&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Bournemouth&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.9586519453041221&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Brighton&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.6704348922643556&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Burnley&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7124518794812125&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Chelsea&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.2560897724037843&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Crystal Palace&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.9524125006581355&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Everton&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.9329845923970115&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Huddersfield&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.4800186720177229&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Leicester&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.1770358138298533&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Liverpool&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.5606892885829151&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Man City&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.782353073898856&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Man United&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.3383146669687525&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Newcastle&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.8007419035953682&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Southampton&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7572219737186204&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Stoke&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7139419884005881&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Swansea&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.4779863899533932&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Tottenham&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.4314120522353588&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_Watford&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.9392553101070111&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_West Brom&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.5799340702765279&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;attack_West Ham&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0307710277807003&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Arsenal&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.904617084909674&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Bournemouth&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.7555648468890336&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Brighton&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.8891290180796632&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Burnley&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1.213060870426372&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Chelsea&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1.2124734969715354&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Crystal Palace&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.8593840991466305&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Everton&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.8071894597751467&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Huddersfield&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.8236837487885708&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Leicester&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.7605866654074676&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Liverpool&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1.1893484944870034&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Man City&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1.5087555344502186&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Man United&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1.5123413324174197&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Newcastle&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1.0231224233993557&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Southampton&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.8496091601384902&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Stoke&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.6570589447421037&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Swansea&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.8588218644055267&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Tottenham&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1.254167039595323&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_Watford&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.7084540000589359&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_West Brom&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.8557753688634717&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;defence_West Ham&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.6434205443618255&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="s1"&gt;&amp;#39;home_adv&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.2888264579592506&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;That looks reasonable to me. Huddersfield and West Brom both have low attack parameters, reflecting that they didn't score many goals, while Manchester City and Chelsea have much higher attack parameters. West Ham and Bournemouth both have poor defence parameters, which again matches what actually happened in our historical data.&lt;/p&gt;
&lt;p&gt;Our home advantage parameter is also pretty close to what we got by just looking at the difference in average goals scored home and away.&lt;/p&gt;
&lt;h2 id="predicting-scores"&gt;Predicting Scores&lt;/h2&gt;
&lt;p&gt;Now for the fun bit, let's start making some predictions :)&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;home_team&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;away_team&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_goals&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;home_attack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;attack_&amp;quot;&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;home_team&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;home_defence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;defence_&amp;quot;&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;home_team&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;away_attack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;attack_&amp;quot;&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;away_team&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;away_defence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;defence_&amp;quot;&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;away_team&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;home_advantage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;home_adv&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;home_goal_expectation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;home_attack&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;away_defence&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;home_advantage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;away_goal_expectation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;away_attack&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;home_defence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;home_probs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;poisson&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pmf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_goals&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="n"&gt;home_goal_expectation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;away_probs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;poisson&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pmf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_goals&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;away_goal_expectation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;probability_matrix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;outer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;home_probs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;away_probs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;probability_matrix&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The start of this function looks similar to the &lt;code&gt;log_likelihood&lt;/code&gt; function we used when fitting the model. We start off by getting the parameters and using them to calculate how many goals we expect the home and away team to score against each other.&lt;/p&gt;
&lt;p&gt;We then use that with the Poisson distribution to get the probability of each team scoring 0, 1, 2, 3 etc goals.&lt;/p&gt;
&lt;p&gt;Then take the &lt;a href="https://en.wikipedia.org/wiki/Outer_product"&gt;outer product&lt;/a&gt; of those two sets of probabilities to get a matrix containing the probability of all the possible scores.&lt;/p&gt;
&lt;p&gt;Let's give it a go and get the probabilities for all scores up to four goals!&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;probs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Man City&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Stoke&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pprint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;probs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt;&lt;span class="mf"&gt;0.01041474&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.00470398&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.00106231&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.00015994&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.00001806&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
       &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.04283444&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.01934684&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.00436915&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.0006578&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.00007428&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
       &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.0880862&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.03978549&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.00898487&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.00135272&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.00015274&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
       &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.1207623&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.05454416&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.01231786&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.00185452&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.00020941&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
       &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.12416985&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.05608323&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.01266543&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.00190685&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.00021531&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The rows show the probability of Manchester City (the home team) scoring a given number of goals, whilst the columns are the probabilities for Stoke. For example, the probability of 0-0 is 0.01041474 and the probability of 3-1 is 0.05454416.&lt;/p&gt;
&lt;p&gt;We can now use this matrix to get the probabilities for various betting markets. For example, the sum of the diagonal of the matrix gives us the probability of a draw, the sum of the lower triangle is the probability of a home win and the sum of the upper triangle is the probability of an away win.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# draw&lt;/span&gt;
&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;diag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;probs&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="mf"&gt;0.04081627039389976&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# home win&lt;/span&gt;
&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tril&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;probs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="mf"&gt;0.5531558007746874&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# away win&lt;/span&gt;
&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;triu&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;probs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="mf"&gt;0.01276037463224738&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Note that those probabilities don't add up to one (they sum to roughly 0.61) as we've only calculated the probabilities for scores of up to 4 goals per team. By the time you go up to 10 goals you're pretty much at 1.0.&lt;/p&gt;
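&lt;p&gt;As a rough check on that truncation effect, here's a minimal sketch using only the standard library. The expected goals of 4.113 and 0.452 are my approximations, back-calculated from the ratios of adjacent cells in the matrix printed above (e.g. 0.04283444 / 0.01041474), rather than the fitted parameters themselves, and the function names are just for illustration.&lt;/p&gt;

```python
import math


def poisson_pmf(goals, expected_goals):
    """Probability of scoring exactly `goals` under a Poisson distribution."""
    return math.exp(-expected_goals) * expected_goals ** goals / math.factorial(goals)


def outcome_probabilities(home_expected, away_expected, max_goals):
    """Return (home win, draw, away win) summed over scores up to max_goals each."""
    size = max_goals + 1
    home_probs = [poisson_pmf(k, home_expected) for k in range(size)]
    away_probs = [poisson_pmf(k, away_expected) for k in range(size)]
    # Lower triangle: home goals exceed away goals.
    home_win = sum(home_probs[h] * away_probs[a] for h in range(size) for a in range(h))
    # Diagonal: both teams score the same number of goals.
    draw = sum(home_probs[k] * away_probs[k] for k in range(size))
    # Upper triangle: away goals exceed home goals.
    away_win = sum(home_probs[h] * away_probs[a] for a in range(size) for h in range(a))
    return home_win, draw, away_win


# Assumed expected goals, approximated from the matrix above (not the fitted values).
home_expected, away_expected = 4.113, 0.452

for max_goals in (4, 10):
    total = sum(outcome_probabilities(home_expected, away_expected, max_goals))
    print(max_goals, round(total, 4))
```

&lt;p&gt;With a cap of four goals a large chunk of the probability is missing, mostly from big home wins, whereas a cap of ten captures almost all of it.&lt;/p&gt;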
&lt;h2 id="something-smells-a-bit-fishy"&gt;Something Smells a Bit Fishy&lt;/h2&gt;
&lt;p&gt;We now have a model and can make predictions, but are those predictions any good?&lt;/p&gt;
&lt;p&gt;If we look at the average probabilities over our data set then our model expects around 46% of games to finish as a home win, 23% as a draw and 31% as an away win.&lt;/p&gt;
&lt;p&gt;However, if we look at what actually happened that season then 46% of games ended as a home win, 26% as a draw and 28% as an away win - that's pretty good but we're clearly under-predicting draws and over-predicting away wins. If you're planning on using this model to beat the bookie then you're likely to be disappointed at this stage.&lt;/p&gt;
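&lt;p&gt;For anyone wanting to run the same comparison on their own data, here's a minimal sketch of the idea. The four predictions and results below are made-up placeholders rather than real fixtures; in practice you'd have one row per match in the season.&lt;/p&gt;

```python
# Hypothetical per-match (home win, draw, away win) probabilities from a model.
predictions = [
    (0.55, 0.25, 0.20),
    (0.30, 0.28, 0.42),
    (0.45, 0.27, 0.28),
    (0.62, 0.21, 0.17),
]
# Hypothetical actual results for the same matches: H = home win, D = draw, A = away win.
results = ["H", "D", "A", "H"]

n_matches = len(predictions)

# Average predicted probability for each outcome across all matches.
mean_predicted = [sum(p[i] for p in predictions) / n_matches for i in range(3)]

# Share of matches that actually finished with each outcome.
observed = [results.count(outcome) / n_matches for outcome in ("H", "D", "A")]

for label, predicted, actual in zip(("home", "draw", "away"), mean_predicted, observed):
    print(label, round(predicted, 3), round(actual, 3))
```

&lt;p&gt;If the model is well calibrated overall, the two columns should sit close to each other once there are enough matches.&lt;/p&gt;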
&lt;p&gt;The problem is that we're predicting the number of goals each team scores independently from each other. But goals in a football match are not independent. For example, the team playing away may settle for a draw if the game is 0-0 in the second half and attack less. A team that's gone three goals down may park the bus to try and avoid a more humiliating score line.&lt;/p&gt;
&lt;p&gt;So there are situations where teams aren't necessarily playing to score and that negatively impacts our Poisson assumption.&lt;/p&gt;
&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;All of this is not to say our model is useless though. It actually gives us a really good base model to build upon.&lt;/p&gt;
&lt;p&gt;In the next article in this series, we'll look at how Dixon and Coles propose an adjustment to our Poisson model to try and alleviate these issues with under-predicting draws and improve our accuracy.  &lt;/p&gt;
&lt;h2 id="addendum"&gt;Addendum&lt;/h2&gt;
&lt;p&gt;I mentioned above that the function we were using to optimize was a little slow as the code was written for simplicity rather than speed.&lt;/p&gt;
&lt;p&gt;I've neatened up the code and added it to my &lt;a href="https://github.com/martineastwood/penaltyblog"&gt;penaltyblog&lt;/a&gt; Python package if you want to try out a faster version. This package also includes some other useful functions for working with football and betting data.&lt;/p&gt;
&lt;p&gt;It's as simple as:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;penaltyblog
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;footballdata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fetch_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;England&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2017&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Date&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;HomeTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;AwayTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;FTHG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;FTAG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;pois&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;poisson&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PoissonGoalsModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;FTHG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;FTAG&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;HomeTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;AwayTeam&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;pois&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;probs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pois&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Man City&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Man United&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;probs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;home_draw_away&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.5645798158431763&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.23488303272721173&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.20053715077601225&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Have fun!&lt;/p&gt;</content><category term="Prediction"></category><category term="Poisson"></category><category term="Betting"></category></entry><entry><title>Which Young Players Will be Stars?</title><link href="2021/05/19/which-young-players-will-be-stars/" rel="alternate"></link><published>2021-05-19T19:30:00+00:00</published><updated>2021-05-19T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2021-05-19:2021/05/19/which-young-players-will-be-stars/</id><summary type="html">&lt;p&gt;Using my Player Ratings model to identify the best footballing prospects...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Over the years, I've shared results from my &lt;a href="http://www.pena.lt/y/2021/05/03/introducing-career-trajectories/"&gt;PlayerRatings model&lt;/a&gt; with lots of people, including a number of professional football clubs, and everybody always comes back with the same question - who are the best young players?&lt;/p&gt;
&lt;p&gt;Let's take a look!&lt;/p&gt;
&lt;h2 id="the-best-players-aged-21-or-under"&gt;The Best Players Aged 21 or Under&lt;/h2&gt;
&lt;p&gt;Let's start off with the top 25 players aged 21 or under at the time of writing this article:&lt;/p&gt;
&lt;table class="table table-striped table-condensed"&gt;
&lt;thead&gt;
  &lt;tr&gt;
    &lt;th title="Field #1"&gt;&lt;/th&gt;
    &lt;th title="Field #2"&gt;Name&lt;/th&gt;
    &lt;th title="Field #3"&gt;Team&lt;/th&gt;
    &lt;th title="Field #4"&gt;Age&lt;/th&gt;
    &lt;th title="Field #5"&gt;TransferMarkt Value £&lt;/th&gt;
  &lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Matthijs de Ligt&lt;/td&gt;
&lt;td &gt;Juventus&lt;/td&gt;
&lt;td &gt;21&lt;/td&gt;
&lt;td &gt;75.00m&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Phil Foden&lt;/td&gt;
&lt;td &gt;Manchester City&lt;/td&gt;
&lt;td &gt;20&lt;/td&gt;
&lt;td &gt;70.00m&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Alphonso Davies&lt;/td&gt;
&lt;td &gt;Bayern Munich&lt;/td&gt;
&lt;td &gt;20&lt;/td&gt;
&lt;td &gt;75.00m&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Jadon Sancho&lt;/td&gt;
&lt;td &gt;Borussia Dortmund&lt;/td&gt;
&lt;td &gt;21&lt;/td&gt;
&lt;td &gt;100.00m&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Erling Håland&lt;/td&gt;
&lt;td &gt;Borussia Dortmund&lt;/td&gt;
&lt;td &gt;20&lt;/td&gt;
&lt;td &gt;110.00m&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Perr Schuurs&lt;/td&gt;
&lt;td &gt;Ajax&lt;/td&gt;
&lt;td &gt;21&lt;/td&gt;
&lt;td &gt;8.00m&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Sergiño Dest&lt;/td&gt;
&lt;td &gt;Barcelona&lt;/td&gt;
&lt;td &gt;20&lt;/td&gt;
&lt;td &gt;25.00m&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Ryan Gravenberch&lt;/td&gt;
&lt;td &gt;Ajax&lt;/td&gt;
&lt;td &gt;18&lt;/td&gt;
&lt;td &gt;28.00m&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Mitchel Bakker&lt;/td&gt;
&lt;td &gt;Paris Saint Germain&lt;/td&gt;
&lt;td &gt;20&lt;/td&gt;
&lt;td &gt;10.00m&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Pedri&lt;/td&gt;
&lt;td &gt;Barcelona&lt;/td&gt;
&lt;td &gt;18&lt;/td&gt;
&lt;td &gt;70.00m&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;João Félix&lt;/td&gt;
&lt;td &gt;Atlético Madrid&lt;/td&gt;
&lt;td &gt;21&lt;/td&gt;
&lt;td &gt;80.00m&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;Callum Hudson-Odoi&lt;/td&gt;
&lt;td &gt;Chelsea&lt;/td&gt;
&lt;td &gt;20&lt;/td&gt;
&lt;td &gt;35.00m&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;Mason Greenwood&lt;/td&gt;
&lt;td &gt;Manchester United&lt;/td&gt;
&lt;td &gt;19&lt;/td&gt;
&lt;td &gt;50.00m&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;Kai Havertz&lt;/td&gt;
&lt;td &gt;Chelsea&lt;/td&gt;
&lt;td &gt;21&lt;/td&gt;
&lt;td &gt;50.00m&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;Bukayo Saka&lt;/td&gt;
&lt;td &gt;Arsenal&lt;/td&gt;
&lt;td &gt;19&lt;/td&gt;
&lt;td &gt;50.00m&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;Ferrán Torres&lt;/td&gt;
&lt;td &gt;Manchester City&lt;/td&gt;
&lt;td &gt;21&lt;/td&gt;
&lt;td &gt;50.00m&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;Justin Kluivert&lt;/td&gt;
&lt;td &gt;RB Leipzig (loan from Roma)&lt;/td&gt;
&lt;td &gt;21&lt;/td&gt;
&lt;td &gt;14.00m&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Moussa Diaby&lt;/td&gt;
&lt;td &gt;Bayer Leverkusen&lt;/td&gt;
&lt;td &gt;21&lt;/td&gt;
&lt;td &gt;38.00m&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;Jonathan David&lt;/td&gt;
&lt;td &gt;Lille&lt;/td&gt;
&lt;td &gt;21&lt;/td&gt;
&lt;td &gt;30.00m&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;Jens Petter Hauge&lt;/td&gt;
&lt;td &gt;Milan&lt;/td&gt;
&lt;td &gt;21&lt;/td&gt;
&lt;td &gt;10.00m&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;Óscar Mingueza&lt;/td&gt;
&lt;td &gt;Barcelona&lt;/td&gt;
&lt;td &gt;21&lt;/td&gt;
&lt;td &gt;10.00m&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;Aurélien Tchouaméni&lt;/td&gt;
&lt;td &gt;Monaco&lt;/td&gt;
&lt;td &gt;21&lt;/td&gt;
&lt;td &gt;25.00m&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;td&gt;Antony&lt;/td&gt;
&lt;td &gt;Ajax&lt;/td&gt;
&lt;td &gt;21&lt;/td&gt;
&lt;td &gt;25.00m&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;Myron Boadu&lt;/td&gt;
&lt;td &gt;AZ Alkmaar&lt;/td&gt;
&lt;td &gt;20&lt;/td&gt;
&lt;td &gt;14.00m&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;Owen Wijndal&lt;/td&gt;
&lt;td &gt;AZ Alkmaar&lt;/td&gt;
&lt;td &gt;21&lt;/td&gt;
&lt;td &gt;15.00m&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;

&lt;p&gt;So that looks pretty reasonable to me. You may not agree with the exact list but there's nothing too controversial in there. They are all players who are getting regular first team minutes in some of the biggest leagues around Europe. Many of them are even full internationals despite still being 21 or under.&lt;/p&gt;
&lt;p&gt;They are also all already at big teams so if you are looking to recruit players with potential, then you are too late. These players have already been discovered and are probably on big contracts.&lt;/p&gt;
&lt;p&gt;Let's try searching a bit younger and see if we can find the future stars before they hit the big time.&lt;/p&gt;
&lt;h2 id="the-best-players-aged-17-or-under"&gt;The Best Players Aged 17 or Under&lt;/h2&gt;
&lt;p&gt;Below I've got the top 50 players aged 17 or under at the time of writing this article. A lot of these players won't be getting first team minutes yet and it may well be a few years until they make a name for themselves. Some of them may not even make it in the game.&lt;/p&gt;
&lt;p&gt;So, since it's going to take a while for this list to mature I'm going to keep the players anonymised to start with to avoid the temptation of judging the predictions too early. If / when players become well known, get bought up by bigger clubs or their value on &lt;a href="https://www.transfermarkt.com/"&gt;Transfermarkt&lt;/a&gt; goes to £10.00m or above I'll reveal their names.&lt;/p&gt;
&lt;p&gt;A few of the players already meet these criteria so are de-anonymised to start off with.&lt;/p&gt;
&lt;table class="table table-striped table-condensed"&gt;
&lt;thead&gt;
  &lt;tr&gt;
    &lt;th title="Field #1"&gt;&lt;/th&gt;
    &lt;th title="Field #2"&gt;Name&lt;/th&gt;
    &lt;th title="Field #3"&gt;Team&lt;/th&gt;
    &lt;th title="Field #4"&gt;Age&lt;/th&gt;
    &lt;th title="Field #5"&gt;TransferMarkt Value £&lt;/th&gt;
  &lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;

&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Florian Wirtz&lt;/td&gt;
&lt;td &gt;Bayer Leverkusen&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;45.00m&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Youssoufa Moukoko&lt;/td&gt;
&lt;td &gt;Borussia Dortmund&lt;/td&gt;
&lt;td &gt;16&lt;/td&gt;
&lt;td &gt;10.00m&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;0.8m&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;1.8m&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;15&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;0.54m&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;4.5m&lt;/td&gt;
&lt;/tr&gt;


&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;16&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;0.18m&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;Jude Bellingham&lt;/td&gt;
&lt;td &gt;Borussia Dortmund&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;35.00m&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;1.35m&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;0.9m&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;3.15m&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;0.45m&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;1.08m&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;33&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;34&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;37&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;16&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;38&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;39&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;16&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;16&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;



&lt;tr&gt;
&lt;td&gt;41&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;0.27m&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;42&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;1.35m&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;43&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;44&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;16&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;46&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;47&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;16&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;0.585m&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;49&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;Anonymous&lt;/td&gt;
&lt;td &gt;-&lt;/td&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td &gt;1.08m&lt;/td&gt;
&lt;/tr&gt;

&lt;/tbody&gt;&lt;/table&gt;

&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;I have no idea what a good hit rate is here. One out of the 50 going on to have a good career? Five? Ten? More?&lt;/p&gt;
&lt;p&gt;The model is already flagging the players who are well known (to me at least), such as Florian Wirtz and Jude Bellingham, which is reassuring. However, the model only has information on youth football for many of the players so I'm really intrigued as to how predictive these types of fixtures will actually be!  &lt;/p&gt;</content><category term="PlayerRating"></category><category term="PlayerRating"></category><category term="Player Analytics"></category><category term="Career Trajectories"></category></entry><entry><title>Predicting Player Career Trajectories</title><link href="2021/05/03/introducing-career-trajectories/" rel="alternate"></link><published>2021-05-03T19:30:00+00:00</published><updated>2021-05-03T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2021-05-03:2021/05/03/introducing-career-trajectories/</id><summary type="html">&lt;p&gt;Predicting what player's potential career trajectories look like...&lt;/p&gt;</summary><content type="html">&lt;h2 id="tldr"&gt;tl;dr&lt;/h2&gt;
&lt;p&gt;I've updated my &lt;a href="https://pena.lt/y/2016/12/20/opta-pro-forum-2016/"&gt;Player Ratings&lt;/a&gt; model to show football players' potential career trajectories.&lt;/p&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Back in 2016, I gave a &lt;a href="https://pena.lt/y/2016/12/20/opta-pro-forum-2016/"&gt;presentation at the Opta Pro Forum&lt;/a&gt; discussing how to rate / rank footballers using a metric somewhat related to a &lt;a href="https://en.wikipedia.org/wiki/Plus%E2%80%93minus"&gt;plus/minus score&lt;/a&gt;. The presentation went down well with the audience and probably generated more feedback than all my other Opta Pro Forum appearances combined. A few years on, I've finally got around to refining the idea further to come up with version 2.0 of the algorithm and smooth out some of its wrinkles.&lt;/p&gt;
&lt;h2 id="how-does-it-work"&gt;How Does it Work?&lt;/h2&gt;
&lt;p&gt;Although I've given the model a complete overhaul, the principles behind it are the same so I'm not going to &lt;a href="https://pena.lt/y/2016/12/20/opta-pro-forum-2016/"&gt;repeat myself too much&lt;/a&gt;. Essentially, the model looks at how well teams perform when a player is in the team compared with when they are out of it.&lt;/p&gt;
&lt;p&gt;Unfortunately though, we can't just naively add up a team's goal difference with and without a specific player to calculate their plus/minus score. For example, stick an average player in Real Madrid's first team and they'd likely come out with a much better plus/minus than they would playing for Sheffield United.&lt;/p&gt;
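&lt;p&gt;To make that problem concrete, here's a toy sketch of the naive approach, simplified to just the goal difference while a player is on the pitch. The players, line-ups and scores are entirely made up: "average_joe" and "average_jim" are meant to be equally good, but one plays alongside stars and the other doesn't.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical matches: (line-up, goal difference for that line-up's team).
matches = [
    (("star_a", "star_b", "average_joe"), 3),
    (("star_a", "star_b", "average_joe"), 2),
    (("journeyman_a", "journeyman_b", "average_jim"), -1),
    (("journeyman_a", "journeyman_b", "average_jim"), -2),
]

# Naive plus/minus: credit every player with the team's goal difference
# for each match they were on the pitch.
naive_plus_minus = defaultdict(int)
for lineup, goal_difference in matches:
    for player in lineup:
        naive_plus_minus[player] += goal_difference

print(naive_plus_minus["average_joe"], naive_plus_minus["average_jim"])
```

&lt;p&gt;The two "average" players end up with very different scores purely because of who they play with, which is exactly the bias the naive sum can't remove.&lt;/p&gt;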
&lt;p&gt;So we need to account for the strength of a player's team mates and their opposition. This makes things a bit more complicated though, because to calculate the current player's score we now need to know the scores of the other players around them, but to calculate those other players' scores we need to know the score of the current player. Circular!&lt;/p&gt;
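&lt;p&gt;One generic way of breaking this kind of circularity is to estimate everybody's rating at the same time, repeatedly updating each player while holding the others fixed until the numbers settle. The sketch below does that for a toy 2-v-2 example. To be clear, this is a textbook-style regularised plus/minus illustration and not the actual model; the segments, the player names and the shrinkage value are all assumptions made up for the example.&lt;/p&gt;

```python
# Hypothetical 2-v-2 segments: (home players, away players, home goal difference).
segments = [
    (("A", "B"), ("C", "D"), 3.0),
    (("A", "C"), ("B", "D"), 1.0),
    (("A", "D"), ("B", "C"), 0.0),
]

players = sorted({p for home, away, _ in segments for p in home + away})
ratings = {p: 0.0 for p in players}
shrinkage = 1.0  # assumed penalty pulling every rating towards zero


def side(player, home, away):
    """Return +1 if the player is on the home side, -1 if away, 0 if absent."""
    if player in home:
        return 1
    if player in away:
        return -1
    return 0


for _ in range(200):  # repeated sweeps until the ratings stop changing
    for player in players:
        numerator = 0.0
        appearances = 0
        for home, away, goal_difference in segments:
            sign = side(player, home, away)
            if sign == 0:
                continue
            # Goal difference explained by everyone else on the pitch.
            others = sum(ratings[q] for q in home if q != player) - sum(
                ratings[q] for q in away if q != player
            )
            # Whatever is left over is attributed to this player.
            numerator += sign * (goal_difference - others)
            appearances += 1
        ratings[player] = numerator / (appearances + shrinkage)

for player in players:
    print(player, round(ratings[player], 3))
```

&lt;p&gt;Each update uses the latest ratings of the other players, so the circular dependency is resolved by iteration rather than needing any one player's score up front. In this toy data the ratings settle with A ahead of B, then C, then D.&lt;/p&gt;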
&lt;p&gt;Plus there are relatively few substitutions in football so we often don't get much data around how well teams perform with different combinations of players. To help alleviate this, the model incorporates a number of &lt;a href="https://en.wikipedia.org/wiki/Prior_probability"&gt;statistical priors&lt;/a&gt; that are used to inform the predictions where there is a lack of data.&lt;/p&gt;
&lt;p&gt;As an example, imagine watching a footballer play for the first time - there’s a probability they may be the next Lionel Messi, there’s a probability they may be the next &lt;a href="https://en.wikipedia.org/wiki/Lee_Bradbury"&gt;Lee &lt;strike&gt;Badbuy&lt;/strike&gt; Bradbury&lt;/a&gt; and there’s a probability they may be average and be somewhere between the two.&lt;/p&gt;
&lt;p&gt;This is essentially how the model works. Based on all the available data for each player, the model constructs a set of priors and uses them in conjunction with the observed data to estimate the player’s true talent compared with all the other footballers in the world. As the model gains more information about a player over the course of their career, the influence of these priors diminishes relative to the observed data.&lt;/p&gt;
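&lt;p&gt;To make the circularity and the role of the priors a little more concrete, here is a minimal sketch. It is not the actual model: it assumes a simple regularised plus/minus where every player's prior is "average" (a rating of zero), and the toy data, the &lt;code&gt;rate_players&lt;/code&gt; function and the &lt;code&gt;prior_strength&lt;/code&gt; parameter are all hypothetical names made up for illustration. Each sweep re-estimates one player at a time while holding everybody else's current rating fixed, which is one way of breaking the circle.&lt;/p&gt;

```python
# Hypothetical toy data: each row is a spell of play with a fixed line-up.
# +1 = player on the pitch for the home side, -1 = for the away side,
# 0 = not involved. Columns are four made-up players: A, B, C, D.
stints = [
    [1, 1, -1, 0],
    [1, 0, -1, -1],
    [0, 1, -1, -1],
    [1, 1, 0, -1],
]
# Goal difference (home minus away) observed during each spell.
goal_diff = [1.0, 0.0, -1.0, 2.0]


def rate_players(stints, goal_diff, prior_strength, sweeps=200):
    """Estimate ratings by repeatedly updating one player at a time.

    prior_strength is an assumed knob: the larger it is, the harder each
    rating is pulled back towards the prior of zero (an average player).
    """
    n_players = len(stints[0])
    ratings = [0.0] * n_players  # everyone starts at the prior: average
    for _ in range(sweeps):
        for j in range(n_players):
            numerator = 0.0
            exposure = 0.0
            for row, observed in zip(stints, goal_diff):
                if row[j] == 0:
                    continue
                # Goal difference explained by everyone except player j,
                # using the current estimates of their ratings.
                others = sum(
                    row[k] * ratings[k] for k in range(n_players) if k != j
                )
                numerator += row[j] * (observed - others)
                exposure += row[j] ** 2
            ratings[j] = numerator / (exposure + prior_strength)
    return ratings


weak_prior = rate_players(stints, goal_diff, prior_strength=0.1)
strong_prior = rate_players(stints, goal_diff, prior_strength=10.0)
```

&lt;p&gt;With a strong prior the ratings stay close to zero, and as a player's exposure grows the same prior matters less and less, which is the behaviour described above.&lt;/p&gt;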
&lt;h2 id="career-trajectories"&gt;Career Trajectories&lt;/h2&gt;
&lt;p&gt;If we calculate these scores over the course of a player's career, we can then visualise their career trajectory. As an example, here's the UK's future Prime Minister &lt;a href="https://en.wikipedia.org/wiki/Marcus_Rashford#Charity_and_activism"&gt;Marcus Rashford&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/marcus_rashford_trajectory_to_date.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 1: Marcus Rashford's career trajectory to date&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The blue line shows Rashford's rating by the model over the course of his career. The scale of the rating is somewhat arbitrary as the number doesn't refer to anything specific, such as goals scored etc. Rather, it's used to compare footballers against each other and Rashford's current rating puts him comfortably in the top 200 footballers in the world. This is pretty impressive as there are currently around 100,000 players he's being compared against.&lt;/p&gt;
&lt;p&gt;This chart is only looking at what has happened in his career so far though. Wouldn't it be more useful to get an idea about how the rest of his career is going to look?&lt;/p&gt;
&lt;p&gt;Predicting how a player's career is going to pan out is tricky though. We have no idea about what will happen with injuries, managers, teams, transfers, motivation etc so we can't just give a single prediction. Instead, we need to provide a range of realistic possibilities that cover our uncertainty. The model does this by comparing the player with historical players to find those with the closest matching career trajectories to date.&lt;/p&gt;
&lt;p&gt;As an example, here is Marcus Rashford again (in blue) alongside the closest matching career trajectories (in grey). Now, this isn't to say that Rashford will follow any particular one of those grey lines. All footballers are unique snowflakes and will all have their own unique career trajectory but it does give us a realistic range in which the player's career could develop.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20210519_rashford_trajectories.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 2: Marcus Rashford's potential career trajectories&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;There's quite a wide range of potential trajectories there for Rashford, which shows how uncertain players' careers can be. At the top end, he is closest to &lt;a href="https://en.wikipedia.org/wiki/Ivan_Rakiti%C4%87"&gt;Ivan Rakitic&lt;/a&gt; but at the bottom end there is &lt;a href="https://en.wikipedia.org/wiki/Manuel_Fernandes_(footballer,_born_1986)"&gt;Manuel Fernandes&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To be clear though, that doesn't mean Marcus Rashford and Ivan Rakitic or Manuel Fernandes are similar in how they play, but that they are similar (or could be similar) in the impact they have on their team's results.&lt;/p&gt;
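&lt;p&gt;The matching step could be sketched roughly as follows. This assumes ratings are stored as one value per year of age and that "closest" means the smallest root mean square gap over the ages the current player has reached so far; the post doesn't say which distance measure the real model uses, and all the names and numbers here are invented for illustration.&lt;/p&gt;

```python
import math

# Hypothetical ratings, one value per year of age starting from the same age.
current = [0.10, 0.25, 0.40, 0.55]
historical = {
    "player_a": [0.12, 0.24, 0.41, 0.52, 0.60, 0.63, 0.58],
    "player_b": [0.05, 0.10, 0.12, 0.15, 0.20, 0.22, 0.21],
    "player_c": [0.11, 0.27, 0.38, 0.57, 0.50, 0.42, 0.35],
}


def distance_to_date(current, past):
    """Root mean square gap over the ages the current player has reached."""
    overlap = past[:len(current)]
    squared = [(a - b) ** 2 for a, b in zip(current, overlap)]
    return math.sqrt(sum(squared) / len(squared))


def closest_careers(current, historical, n_matches):
    """Names of the historical players whose careers started most similarly."""
    ranked = sorted(
        historical,
        key=lambda name: distance_to_date(current, historical[name]),
    )
    return ranked[:n_matches]


matches = closest_careers(current, historical, n_matches=2)
# The rest of each matched career is one plausible path for the future.
futures = {name: historical[name][len(current):] for name in matches}
```

&lt;p&gt;Each entry in &lt;code&gt;futures&lt;/code&gt; plays the role of one grey line in the charts: not a prediction in itself, but one member of a realistic range.&lt;/p&gt;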
&lt;h2 id="erling-braut-haaland"&gt;Erling Braut Haaland&lt;/h2&gt;
&lt;p&gt;Here's another example, this time for the in-demand Erling Braut Haaland.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/erling_haaland_trajectories.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 3: Erling Haaland's potential career trajectories&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;We can already see that Haaland is at the upper end of all the similar trajectories, showing what a generational talent he potentially is. There are very few players whose careers have started as strongly as his so we don't have many careers to compare him with.&lt;/p&gt;
&lt;p&gt;The most similar career is actually Mesut Özil's, who the model considers to have been one of the greatest players I have data on. Again, this doesn't mean Haaland and Özil are similar in how they play, but that they are similar in the impact they have on their team's results.&lt;/p&gt;
&lt;h2 id="neymar"&gt;Neymar&lt;/h2&gt;
&lt;p&gt;At the opposite end of the career spectrum is Neymar who the model has been thoroughly unimpressed with recently. I don't follow Paris Saint Germain particularly closely but at one point Neymar was vying with Messi to be ranked the number one player in the world overall. However, it seems his impact has waned significantly over the past few years.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20210519_neymar_trajectories.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 4: Neymar's potential career trajectories&lt;/em&gt;&lt;/p&gt;
&lt;h2 id="andrew-robertson"&gt;Andrew Robertson&lt;/h2&gt;
&lt;p&gt;And as a final example, here's Liverpool's Andrew Robertson. The reason for including Robertson is that it's a nice example showing a player that has probably reached his peak.&lt;/p&gt;
&lt;p&gt;Although Robertson has steadily improved over the course of his career, based on the trajectories of similar players, it's likely that Liverpool will start to see him decline from here onwards. A slight positive from this though is that the expected drop off in impact looks to be fairly gradual compared with Neymar!&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20210519_robertson_trajectories.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 5: Andrew Robertson's potential career trajectories&lt;/em&gt;&lt;/p&gt;
&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;This is a fairly short post just to introduce the idea of visualising players' potential career trajectories.&lt;/p&gt;
&lt;p&gt;I'll expand on this topic in future articles but in the meantime, thanks for reading!&lt;/p&gt;</content><category term="PlayerRating"></category><category term="PlayerRating"></category><category term="Player Analytics"></category><category term="Career Trajectories"></category></entry><entry><title>Sharing xG Using Multi-touch Attribution Modelling</title><link href="2019/11/23/multitouch-attributed-xg/" rel="alternate"></link><published>2019-11-23T19:30:00+00:00</published><updated>2019-11-23T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2019-11-23:2019/11/23/multitouch-attributed-xg/</id><summary type="html">&lt;p&gt;Reattributing xG using multi-touch attribution modelling...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;A question that often comes up in data science is how you determine the performance of different marketing channels. For example, somebody might land on your website by clicking an advert on Google. They may find something they want to buy but leave the site only to return a few minutes later via an affiliate link with a discount code. They then add something to their shopping basket but never complete the purchase so you send them an email to remind them and finally they convert.&lt;/p&gt;
&lt;p&gt;Which channel drove the conversion here, was it the final channel which they converted from? Was it the first one as it brought them onto the site in the first place? Or was it one of the other ones in-between that helped guide them to the conversion?&lt;/p&gt;
&lt;p&gt;Okay, this is a blog about football analytics so why am I writing about marketing? Well, consider this next example - Aymeric Laporte tackles the opposition's striker to win the ball and passes it to Fernandinho. Fernandinho then plays a short ball to Bernardo Silva, who then knocks it out wide to Raheem Sterling. Raz runs down the wing, beats the fullback and crosses the ball to Sergio Agüero who then scores. Which player was responsible for the goal here? Was it Laporte for winning possession in the first place, was it Agüero since he scored or was it one of the players in-between who helped move the ball down the pitch to Agüero so he could score the goal?&lt;/p&gt;
&lt;p&gt;Hopefully you've spotted that both examples are effectively the same problem - how do you determine the value of all the events leading up to a conversion?&lt;/p&gt;
&lt;h2 id="heuristics"&gt;Heuristics&lt;/h2&gt;
&lt;p&gt;Traditionally, this has been 'solved' using simple heuristics. For example the last player who touches the ball is awarded the goal or the last marketing channel a customer interacts with gets the credit for their purchase. Depending on the metric being measured, people will occasionally give credit to the first event in the sequence instead, or perhaps share everything out equally because it sounds fairer. If they are feeling really adventurous they may even apply some sort of curve to it but there's no real scientific rationale being used, it's typically just somebody's personal preference.&lt;/p&gt;
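&lt;p&gt;For reference, the three heuristics mentioned above (last touch, first touch and an equal share) are simple enough to write in a few lines. This is only an illustrative sketch using the move from the introduction; the function names and the value of 0.3 are made up.&lt;/p&gt;

```python
def last_touch(events, value):
    """All of the credit goes to the final event in the sequence."""
    credit = {event: 0.0 for event in events}
    credit[events[-1]] += value
    return credit


def first_touch(events, value):
    """All of the credit goes to the event that started the sequence."""
    credit = {event: 0.0 for event in events}
    credit[events[0]] += value
    return credit


def equal_share(events, value):
    """The credit is split evenly across every event in the sequence."""
    credit = {event: 0.0 for event in events}
    for event in events:
        credit[event] += value / len(events)
    return credit


# The move from the introduction, with a hypothetical value to share out.
move = ["Laporte", "Fernandinho", "Bernardo Silva", "Sterling", "Agüero"]
by_last = last_touch(move, 0.3)
by_first = first_touch(move, 0.3)
by_equal = equal_share(move, 0.3)
```

&lt;p&gt;None of these look at the data at all, which is the problem: the split is decided before a ball has been kicked.&lt;/p&gt;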
&lt;h2 id="multi-touch-attribution-models"&gt;Multi-touch Attribution Models&lt;/h2&gt;
&lt;p&gt;A more scientific approach taken from the world of marketing analytics is to use multi-touch attribution modelling to quantify the importance of each event in the sequence and assign a fractional amount of credit to it based on how much it drives the final outcome.&lt;/p&gt;
&lt;p&gt;There are lots of different ways of doing this, including using &lt;a href="https://en.wikipedia.org/wiki/Markov_chain"&gt;Markov chains&lt;/a&gt;. These are mathematical systems that can be used to model the probability of sequences transitioning from one event to another. For example, the probability of a customer clicking through to a company's home page from a tweet, followed by the probability of that being their last interaction (and therefore failing to convert) or the probability they move onto some other interaction with the company.&lt;/p&gt;
&lt;p&gt;We can apply this same principle to football, e.g. if Sergio Agüero is in possession of the football then what is the probability he passes, what is the probability he scores, what is the probability the sequence of possession ends with him?&lt;/p&gt;
&lt;h2 id="attributing-xg-using-markov-chains"&gt;Attributing xG Using Markov Chains&lt;/h2&gt;
&lt;p&gt;To apply multi-touch attribution modelling to football and &lt;a href="http://pena.lt/y/category/expected-goals.html"&gt;Expected Goals&lt;/a&gt; (xG), I created a dataset of possession sequences from the Premier League. Each sequence contained the players involved plus a terminating True / False flag designating whether it ended with a shot or not, and I used these sequences to train a Markov chain.&lt;/p&gt;
&lt;p&gt;[Ederson, Laporte, Stones, Sterling, True]&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 1: Example possession sequence used to train the Markov chain&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The trained Markov Chain was then used to simulate possessions by picking a starting player and taking a random walk through the probabilities until it hit a True / False event. From here we can calculate the importance of each player in terms of shot generation - essentially, each possession sequence's propensity to lead to a shot changes as different players become involved. These differences in shot propensity can then be used to reattribute the xG from a given shot across all the players in the possession leading up to it - the more important a player is the more xG is awarded to them even if they didn't take the shot.&lt;/p&gt;
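&lt;p&gt;Here is a rough sketch of how this kind of attribution can work. It makes a few assumptions that go beyond what is described above: it uses a first-order chain, it measures each player's importance with the "removal effect" that is common in marketing attribution (how much the shot probability drops if every pass to that player ends the possession), and it calculates the shot probability by repeated averaging rather than by simulating random walks. The possessions and the xG value are invented, and the function names are hypothetical.&lt;/p&gt;

```python
from collections import defaultdict

START, SHOT, NO_SHOT = "start", "shot", "no_shot"

# Hypothetical possessions in the same format as Figure 1.
possessions = [
    ["Ederson", "Laporte", "Stones", "Sterling", True],
    ["Ederson", "Stones", "Laporte", False],
    ["Laporte", "Sterling", True],
    ["Ederson", "Laporte", "Sterling", False],
    ["Stones", "Laporte", False],
]


def transition_probs(possessions):
    """Count who the ball moves to next and turn the counts into probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for *players, ended_in_shot in possessions:
        path = [START] + players + [SHOT if ended_in_shot else NO_SHOT]
        for here, there in zip(path, path[1:]):
            counts[here][there] += 1
    return {
        here: {there: n / sum(nexts.values()) for there, n in nexts.items()}
        for here, nexts in counts.items()
    }


def shot_probability(probs, removed=None, sweeps=500):
    """Chance that a possession beginning at START ends in a shot.

    If `removed` is given, that player's value is never updated and stays
    at zero, so any pass to them is treated as the end of the possession.
    """
    value = defaultdict(float)
    value[SHOT] = 1.0
    for _ in range(sweeps):
        for here, nexts in probs.items():
            if here == removed:
                continue
            value[here] = sum(p * value[there] for there, p in nexts.items())
    return value[START]


def attribute_xg(players, shot_xg, probs):
    """Share one shot's xG in proportion to each player's removal effect."""
    baseline = shot_probability(probs)
    effects = {
        name: baseline - shot_probability(probs, removed=name)
        for name in players
    }
    total = sum(effects.values())
    return {name: shot_xg * effect / total for name, effect in effects.items()}


probs = transition_probs(possessions)
shares = attribute_xg(["Ederson", "Laporte", "Stones", "Sterling"], 0.4, probs)
```

&lt;p&gt;In this toy data every shot goes through Sterling, so removing him costs the most and he keeps the largest share, but the players behind him are still credited with part of the xG.&lt;/p&gt;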
&lt;h2 id="results"&gt;Results&lt;/h2&gt;
&lt;p&gt;The table below shows the attributed xG (axG) for Manchester City's 2018/2019 season. The first thing to note is that the most attacking players, such as Agüero and Jesus, have lower axG compared with xG. This is to be expected as traditional xG models will credit them with 100% of the value of the shot whereas axG takes some of that xG and reassigns it to the players involved in the build up play. They still come out with the highest axG scores overall though as these are the players taking the majority of the shots generating the xG so their presence in the possession sequences is important in terms of shot generation.&lt;/p&gt;
&lt;table class="table table-striped table-condensed"&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th title="Field #1"&gt;name&lt;/th&gt;
&lt;th  title="Field #2"&gt;xg&lt;/th&gt;
&lt;th  title="Field #3"&gt;axg&lt;/th&gt;
&lt;th  title="Field #4"&gt;axg:xg&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;sergio agüero&lt;/td&gt;
&lt;td &gt;19.92&lt;/td&gt;
&lt;td &gt;15.86&lt;/td&gt;
&lt;td &gt;0.8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;raheem sterling&lt;/td&gt;
&lt;td &gt;13.14&lt;/td&gt;
&lt;td &gt;12.2&lt;/td&gt;
&lt;td &gt;0.93&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;david silva&lt;/td&gt;
&lt;td &gt;8.07&lt;/td&gt;
&lt;td &gt;8.79&lt;/td&gt;
&lt;td &gt;1.09&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;bernardo silva&lt;/td&gt;
&lt;td &gt;6.74&lt;/td&gt;
&lt;td &gt;7.78&lt;/td&gt;
&lt;td &gt;1.15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gabriel jesus&lt;/td&gt;
&lt;td &gt;8.76&lt;/td&gt;
&lt;td &gt;7.22&lt;/td&gt;
&lt;td &gt;0.82&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;leroy sané&lt;/td&gt;
&lt;td &gt;5.52&lt;/td&gt;
&lt;td &gt;5.84&lt;/td&gt;
&lt;td &gt;1.06&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;riyad mahrez&lt;/td&gt;
&lt;td &gt;5.72&lt;/td&gt;
&lt;td &gt;5.37&lt;/td&gt;
&lt;td &gt;0.94&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ilkay gündogan&lt;/td&gt;
&lt;td &gt;4.52&lt;/td&gt;
&lt;td &gt;4.74&lt;/td&gt;
&lt;td &gt;1.05&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;aymeric laporte&lt;/td&gt;
&lt;td &gt;3.36&lt;/td&gt;
&lt;td &gt;4.23&lt;/td&gt;
&lt;td &gt;1.26&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;fernandinho&lt;/td&gt;
&lt;td &gt;1.74&lt;/td&gt;
&lt;td &gt;2.58&lt;/td&gt;
&lt;td &gt;1.48&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;kevin de bruyne&lt;/td&gt;
&lt;td &gt;1.99&lt;/td&gt;
&lt;td &gt;2.26&lt;/td&gt;
&lt;td &gt;1.14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;kyle walker&lt;/td&gt;
&lt;td &gt;0.45&lt;/td&gt;
&lt;td &gt;1.66&lt;/td&gt;
&lt;td &gt;3.69&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nicolás otamendi&lt;/td&gt;
&lt;td &gt;1.39&lt;/td&gt;
&lt;td &gt;1.4&lt;/td&gt;
&lt;td &gt;1.01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;phil foden&lt;/td&gt;
&lt;td &gt;1.72&lt;/td&gt;
&lt;td &gt;1.06&lt;/td&gt;
&lt;td &gt;0.62&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;john stones&lt;/td&gt;
&lt;td &gt;0.51&lt;/td&gt;
&lt;td &gt;0.92&lt;/td&gt;
&lt;td &gt;1.8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;oleksandr zinchenko&lt;/td&gt;
&lt;td &gt;0.19&lt;/td&gt;
&lt;td &gt;0.9&lt;/td&gt;
&lt;td &gt;4.74&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;vincent kompany&lt;/td&gt;
&lt;td &gt;0.35&lt;/td&gt;
&lt;td &gt;0.59&lt;/td&gt;
&lt;td &gt;1.69&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;danilo&lt;/td&gt;
&lt;td &gt;0.44&lt;/td&gt;
&lt;td &gt;0.54&lt;/td&gt;
&lt;td &gt;1.23&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;benjamin mendy&lt;/td&gt;
&lt;td &gt;0.2&lt;/td&gt;
&lt;td &gt;0.36&lt;/td&gt;
&lt;td &gt;1.8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;fabian delph&lt;/td&gt;
&lt;td &gt;0.08&lt;/td&gt;
&lt;td &gt;0.29&lt;/td&gt;
&lt;td &gt;3.62&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ederson&lt;/td&gt;
&lt;td &gt;0&lt;/td&gt;
&lt;td &gt;0.23&lt;/td&gt;
&lt;td &gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;

&lt;p&gt;&lt;em&gt;Table 1: Manchester City axG 2018/2019&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Looking at the ratio of axG to xG shows that the biggest beneficiaries for Manchester City are their defenders, particularly the fullbacks. Kyle Walker has a 3.7 fold increase in xG credited to him and Oleksandr Zinchenko has a 4.7 fold increase. Laporte and Fernandinho have noticeable increases too, reflecting their importance in City's build up play.&lt;/p&gt;
&lt;h2 id="the-importance-of-full-backs"&gt;The Importance of Full Backs&lt;/h2&gt;
&lt;p&gt;It's not just Manchester City's full backs who do well when we reattribute xG, it's pretty common across all other teams too as shown in the table below. These players are typically out wide where they can't take many shots but are important for getting the ball into the danger zones for the attacking players. It's still small volumes of xG compared with attackers but once we start accounting for fullbacks' involvement in the build up play then their xG numbers increase noticeably.&lt;/p&gt;
&lt;table class="table table-striped table-condensed"&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th title="Field #1"&gt;team_name&lt;/th&gt;
&lt;th title="Field #2"&gt;name&lt;/th&gt;
&lt;th title="Field #3"&gt;xg&lt;/th&gt;
&lt;th title="Field #4"&gt;axg&lt;/th&gt;
&lt;th title="Field #5"&gt;axg:xg&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;Burnley&lt;/td&gt;
&lt;td&gt;charlie taylor&lt;/td&gt;
&lt;td &gt;0.09&lt;/td&gt;
&lt;td &gt;0.73&lt;/td&gt;
&lt;td &gt;8.11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arsenal&lt;/td&gt;
&lt;td&gt;stephan lichtsteiner&lt;/td&gt;
&lt;td &gt;0.02&lt;/td&gt;
&lt;td &gt;0.15&lt;/td&gt;
&lt;td &gt;7.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bournemouth&lt;/td&gt;
&lt;td&gt;simon francis&lt;/td&gt;
&lt;td &gt;0.03&lt;/td&gt;
&lt;td &gt;0.22&lt;/td&gt;
&lt;td &gt;7.33&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;West Ham&lt;/td&gt;
&lt;td&gt;arthur masuaku&lt;/td&gt;
&lt;td &gt;0.08&lt;/td&gt;
&lt;td &gt;0.42&lt;/td&gt;
&lt;td &gt;5.25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manchester City&lt;/td&gt;
&lt;td&gt;oleksandr zinchenko&lt;/td&gt;
&lt;td &gt;0.19&lt;/td&gt;
&lt;td &gt;0.9&lt;/td&gt;
&lt;td &gt;4.74&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Everton&lt;/td&gt;
&lt;td&gt;lucas digne&lt;/td&gt;
&lt;td &gt;0.46&lt;/td&gt;
&lt;td &gt;2.16&lt;/td&gt;
&lt;td &gt;4.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manchester City&lt;/td&gt;
&lt;td&gt;kyle walker&lt;/td&gt;
&lt;td &gt;0.45&lt;/td&gt;
&lt;td &gt;1.66&lt;/td&gt;
&lt;td &gt;3.69&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arsenal&lt;/td&gt;
&lt;td&gt;sead kolasinac&lt;/td&gt;
&lt;td &gt;0.45&lt;/td&gt;
&lt;td &gt;1.63&lt;/td&gt;
&lt;td &gt;3.62&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tottenham&lt;/td&gt;
&lt;td&gt;kieran trippier&lt;/td&gt;
&lt;td &gt;0.4&lt;/td&gt;
&lt;td &gt;1.41&lt;/td&gt;
&lt;td &gt;3.52&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bournemouth&lt;/td&gt;
&lt;td&gt;diego rico&lt;/td&gt;
&lt;td &gt;0.08&lt;/td&gt;
&lt;td &gt;0.28&lt;/td&gt;
&lt;td &gt;3.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Liverpool&lt;/td&gt;
&lt;td&gt;trent alexander-arnold&lt;/td&gt;
&lt;td &gt;0.66&lt;/td&gt;
&lt;td &gt;2.07&lt;/td&gt;
&lt;td &gt;3.14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manchester United&lt;/td&gt;
&lt;td&gt;ashley young&lt;/td&gt;
&lt;td &gt;0.47&lt;/td&gt;
&lt;td &gt;1.42&lt;/td&gt;
&lt;td &gt;3.02&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;West Ham&lt;/td&gt;
&lt;td&gt;pablo zabaleta&lt;/td&gt;
&lt;td &gt;0.17&lt;/td&gt;
&lt;td &gt;0.51&lt;/td&gt;
&lt;td &gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Watford&lt;/td&gt;
&lt;td&gt;josé holebas&lt;/td&gt;
&lt;td &gt;0.6&lt;/td&gt;
&lt;td &gt;1.79&lt;/td&gt;
&lt;td &gt;2.98&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Liverpool&lt;/td&gt;
&lt;td&gt;andrew robertson&lt;/td&gt;
&lt;td &gt;1.18&lt;/td&gt;
&lt;td &gt;2.83&lt;/td&gt;
&lt;td &gt;2.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Newcastle United&lt;/td&gt;
&lt;td&gt;javier manquillo&lt;/td&gt;
&lt;td &gt;0.07&lt;/td&gt;
&lt;td &gt;0.16&lt;/td&gt;
&lt;td &gt;2.29&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Burnley&lt;/td&gt;
&lt;td&gt;matthew lowton&lt;/td&gt;
&lt;td &gt;0.33&lt;/td&gt;
&lt;td &gt;0.75&lt;/td&gt;
&lt;td &gt;2.27&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Leicester&lt;/td&gt;
&lt;td&gt;ricardo pereira&lt;/td&gt;
&lt;td &gt;1.35&lt;/td&gt;
&lt;td &gt;2.81&lt;/td&gt;
&lt;td &gt;2.08&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Leicester&lt;/td&gt;
&lt;td&gt;ben chilwell&lt;/td&gt;
&lt;td &gt;1.05&lt;/td&gt;
&lt;td &gt;2.14&lt;/td&gt;
&lt;td &gt;2.04&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bournemouth&lt;/td&gt;
&lt;td&gt;nathaniel clyne&lt;/td&gt;
&lt;td &gt;0.07&lt;/td&gt;
&lt;td &gt;0.14&lt;/td&gt;
&lt;td &gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manchester United&lt;/td&gt;
&lt;td&gt;luke shaw&lt;/td&gt;
&lt;td &gt;1.1&lt;/td&gt;
&lt;td &gt;2.15&lt;/td&gt;
&lt;td &gt;1.95&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;

&lt;p&gt;&lt;em&gt;Table 2: Fullback axG 2018/2019&lt;/em&gt;&lt;/p&gt;
&lt;h2 id="top-25-players-by-axg"&gt;Top 25 Players by axG&lt;/h2&gt;
&lt;table class="table table-striped table-condensed"&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th title="Field #1"&gt;team_name&lt;/th&gt;
&lt;th title="Field #2"&gt;name&lt;/th&gt;
&lt;th title="Field #3"&gt;xg&lt;/th&gt;
&lt;th title="Field #4"&gt;axg&lt;/th&gt;
&lt;th title="Field #5"&gt;axg:xg&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;Liverpool&lt;/td&gt;
&lt;td&gt;mohamed salah&lt;/td&gt;
&lt;td &gt;18.83&lt;/td&gt;
&lt;td &gt;18.99&lt;/td&gt;
&lt;td &gt;1.01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manchester United&lt;/td&gt;
&lt;td&gt;paul pogba&lt;/td&gt;
&lt;td &gt;15.6&lt;/td&gt;
&lt;td &gt;17.06&lt;/td&gt;
&lt;td &gt;1.09&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arsenal&lt;/td&gt;
&lt;td&gt;pierre-emerick aubameyang&lt;/td&gt;
&lt;td &gt;20.84&lt;/td&gt;
&lt;td &gt;16.92&lt;/td&gt;
&lt;td &gt;0.81&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manchester City&lt;/td&gt;
&lt;td&gt;sergio agüero&lt;/td&gt;
&lt;td &gt;19.92&lt;/td&gt;
&lt;td &gt;15.86&lt;/td&gt;
&lt;td &gt;0.8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fulham&lt;/td&gt;
&lt;td&gt;aleksandar mitrovic&lt;/td&gt;
&lt;td &gt;15.14&lt;/td&gt;
&lt;td &gt;15.52&lt;/td&gt;
&lt;td &gt;1.03&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wolverhampton Wanderers&lt;/td&gt;
&lt;td&gt;raúl jiménez&lt;/td&gt;
&lt;td &gt;15.46&lt;/td&gt;
&lt;td &gt;14.48&lt;/td&gt;
&lt;td &gt;0.94&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Leicester&lt;/td&gt;
&lt;td&gt;jamie vardy&lt;/td&gt;
&lt;td &gt;15.59&lt;/td&gt;
&lt;td &gt;13.93&lt;/td&gt;
&lt;td &gt;0.89&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chelsea&lt;/td&gt;
&lt;td&gt;eden hazard&lt;/td&gt;
&lt;td &gt;10.5&lt;/td&gt;
&lt;td &gt;13.57&lt;/td&gt;
&lt;td &gt;1.29&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Brighton&lt;/td&gt;
&lt;td&gt;glenn murray&lt;/td&gt;
&lt;td &gt;11.87&lt;/td&gt;
&lt;td &gt;13.11&lt;/td&gt;
&lt;td &gt;1.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tottenham&lt;/td&gt;
&lt;td&gt;harry kane&lt;/td&gt;
&lt;td &gt;13.64&lt;/td&gt;
&lt;td &gt;12.66&lt;/td&gt;
&lt;td &gt;0.93&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Everton&lt;/td&gt;
&lt;td&gt;gylfi sigurdsson&lt;/td&gt;
&lt;td &gt;11.94&lt;/td&gt;
&lt;td &gt;12.41&lt;/td&gt;
&lt;td &gt;1.04&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bournemouth&lt;/td&gt;
&lt;td&gt;joshua king&lt;/td&gt;
&lt;td &gt;13.07&lt;/td&gt;
&lt;td &gt;12.34&lt;/td&gt;
&lt;td &gt;0.94&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Newcastle United&lt;/td&gt;
&lt;td&gt;salomón rondón&lt;/td&gt;
&lt;td &gt;11.86&lt;/td&gt;
&lt;td &gt;12.32&lt;/td&gt;
&lt;td &gt;1.04&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manchester City&lt;/td&gt;
&lt;td&gt;raheem sterling&lt;/td&gt;
&lt;td &gt;13.14&lt;/td&gt;
&lt;td &gt;12.2&lt;/td&gt;
&lt;td &gt;0.93&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Liverpool&lt;/td&gt;
&lt;td&gt;sadio mané&lt;/td&gt;
&lt;td &gt;14.54&lt;/td&gt;
&lt;td &gt;11.7&lt;/td&gt;
&lt;td &gt;0.8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arsenal&lt;/td&gt;
&lt;td&gt;alexandre lacazette&lt;/td&gt;
&lt;td &gt;12.3&lt;/td&gt;
&lt;td &gt;11.23&lt;/td&gt;
&lt;td &gt;0.91&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Liverpool&lt;/td&gt;
&lt;td&gt;roberto firmino&lt;/td&gt;
&lt;td &gt;11.96&lt;/td&gt;
&lt;td &gt;11.04&lt;/td&gt;
&lt;td &gt;0.92&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Burnley&lt;/td&gt;
&lt;td&gt;ashley barnes&lt;/td&gt;
&lt;td &gt;11.25&lt;/td&gt;
&lt;td &gt;10.83&lt;/td&gt;
&lt;td &gt;0.96&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bournemouth&lt;/td&gt;
&lt;td&gt;callum wilson&lt;/td&gt;
&lt;td &gt;12.24&lt;/td&gt;
&lt;td &gt;10.18&lt;/td&gt;
&lt;td &gt;0.83&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Southampton&lt;/td&gt;
&lt;td&gt;danny ings&lt;/td&gt;
&lt;td &gt;9.51&lt;/td&gt;
&lt;td &gt;10&lt;/td&gt;
&lt;td &gt;1.05&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Watford&lt;/td&gt;
&lt;td&gt;troy deeney&lt;/td&gt;
&lt;td &gt;9.56&lt;/td&gt;
&lt;td &gt;9.71&lt;/td&gt;
&lt;td &gt;1.02&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Everton&lt;/td&gt;
&lt;td&gt;richarlison&lt;/td&gt;
&lt;td &gt;9.37&lt;/td&gt;
&lt;td &gt;8.91&lt;/td&gt;
&lt;td &gt;0.95&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manchester City&lt;/td&gt;
&lt;td&gt;david silva&lt;/td&gt;
&lt;td &gt;8.07&lt;/td&gt;
&lt;td &gt;8.79&lt;/td&gt;
&lt;td &gt;1.09&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manchester United&lt;/td&gt;
&lt;td&gt;romelu lukaku&lt;/td&gt;
&lt;td &gt;10.18&lt;/td&gt;
&lt;td &gt;8.57&lt;/td&gt;
&lt;td &gt;0.84&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manchester City&lt;/td&gt;
&lt;td&gt;bernardo silva&lt;/td&gt;
&lt;td &gt;6.74&lt;/td&gt;
&lt;td &gt;7.78&lt;/td&gt;
&lt;td &gt;1.15&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;

&lt;p&gt;&lt;em&gt;Table 3: Top 25 Players by axG 2018/2019&lt;/em&gt;&lt;/p&gt;
&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;It's worth clarifying that this is not an expected possession value (EPV) model. It's taking the output from a shots-based xG model and redistributing it across the players involved in the build up play based on the propensity of a shot occurring from that particular group of players.&lt;/p&gt;
&lt;p&gt;In many ways, the output of the model is closer to a &lt;a href="https://en.wikipedia.org/wiki/Shapley_value"&gt;Shapley Value&lt;/a&gt; in that it's looking at all the different combinations of players in the possession sequences to quantify how much each player contributed to the propensity of a shot occurring. In fact, this is something I want to play around with further to see what other uses it has.&lt;/p&gt;
&lt;p&gt;Whilst the approach described here is perhaps not as complex as some EPV models, it has a couple of advantages. First of all, it's quick to process the data, but most importantly it's easy to explain to stakeholders. This isn't some complicated and uninterpretable black box that senior management need to take a leap of faith to trust, multi-touch attribution is just sharing out xG more fairly based on the probabilities of shots occurring during the sequence of play and for me that's a big win. A simpler approach that people can relate to often has a bigger impact in a business than a bigger model that's much more complicated to get buy in for.&lt;/p&gt;
&lt;p&gt;Multi-touch attribution has pretty much achieved this in marketing analytics now, it's a significant improvement from the simple heuristics without being so complex it scares the &lt;a href="https://www.investopedia.com/terms/c/c-suite.asp"&gt;C-Suite&lt;/a&gt; off. Perhaps it could also play a similar role in football as a step up from xG without ostracizing the more data-reluctant coaches?&lt;/p&gt;</content><category term="Expected Goals"></category><category term="Player Expected Goals"></category></entry><entry><title>VAR, What is it Good For?</title><link href="2019/08/22/var-what-is-it-good-for/" rel="alternate"></link><published>2019-08-22T19:30:00+00:00</published><updated>2019-08-22T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2019-08-22:2019/08/22/var-what-is-it-good-for/</id><summary type="html">&lt;p&gt;As Edwin Starr famously said, VAR, what is it good for? Well, let's find out...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;As &lt;a href="https://www.youtube.com/watch?v=dpWmlRNfLck"&gt;Edwin Starr&lt;/a&gt; famously said, VAR, what is it good for? Well, let's find out...&lt;/p&gt;
&lt;h2 id="player-speed"&gt;Player Speed&lt;/h2&gt;
&lt;p&gt;To get an idea of what VAR can and can't do we need to know how fast the players are moving on the pitch. According to &lt;a href="https://www.express.co.uk/sport/football/1130585/Fastest-footballer-Premier-League-fastest-players-Man-United-Liverpool-speed-footballers"&gt;this&lt;/a&gt; article in The Express and &lt;a href="https://www.skysports.com/football/news/11096/11427011/which-world-cup-stars-have-hit-the-fastest-speeds-in-russia"&gt;this&lt;/a&gt; one by Sky Sports, it's not uncommon for players to be able to reach speeds of 33km/hour upwards.&lt;/p&gt;
&lt;h2 id="what-about-the-var-cameras"&gt;What About the VAR Cameras?&lt;/h2&gt;
&lt;p&gt;VAR has access to the broadcast footage of the match so the exact specification of the cameras will vary depending on the broadcaster, but in the case of Sky Sports and BT here in the UK they record at a frame rate of &lt;a href="https://sport.bt.com/watch-now/programmes/bt-sport-ultimate-everything-you-need-to-know-S11364379497031"&gt;50 frames per second&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="what-does-this-mean"&gt;What Does This Mean?&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;50 frames per second means an image is captured by the camera every 0.02 seconds&lt;/li&gt;
&lt;li&gt;A footballer running at 33km/h is moving 18cm every 0.02 seconds&lt;/li&gt;
&lt;li&gt;Therefore, a footballer can potentially move 18cm between two frames being captured&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Now imagine VAR is assessing an offside decision by looking at the earliest frame where the ball was clearly played forwards. Chances are the ball was not played at the exact moment the frame was captured, but at some point between the current frame and the previous one 0.02 seconds earlier. Since we know that players can move 18cm between frames, the frame being viewed by VAR is potentially out by up to 18cm.&lt;/p&gt;
&lt;p&gt;It gets worse than this though as we haven't accounted for the defender moving. As a worst case scenario the defender could be sprinting in the opposite direction to the attacker, whilst trying to play the offside trap for example.&lt;/p&gt;
&lt;p&gt;This means the image being viewed by VAR to determine whether a player is offside or not has potentially occurred late enough after the ball has been played for the defender and attacker to have moved apart by 36cm.&lt;/p&gt;
&lt;p&gt;Therefore if the attacking player is shown to be offside by 36cm or less then VAR has absolutely no idea whether they were really offside at the time the ball was actually played. This is a fairly hefty margin of error, which effectively renders VAR unsuitable for any close offside decisions, particularly those where a player has an arm offside.&lt;/p&gt;
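&lt;p&gt;The arithmetic behind those numbers is short enough to check in a few lines. The function name is just for illustration; the speed and frame rate are the ones quoted above.&lt;/p&gt;

```python
def metres_per_frame(speed_kmh, frames_per_second):
    """Distance covered between two consecutive frames, in metres."""
    metres_per_second = speed_kmh * 1000 / 3600
    return metres_per_second / frames_per_second


# One player sprinting at 33km/h, filmed at 50 frames per second.
one_player_cm = metres_per_frame(33, 50) * 100

# Worst case: attacker and defender sprinting in opposite directions,
# so the gap between them changes twice as fast.
worst_case_cm = 2 * one_player_cm
```

&lt;p&gt;Unrounded, one player covers roughly 18.3cm per frame, so the worst case is very slightly above the 36cm quoted here.&lt;/p&gt;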
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/raheem_offside_var.jpg"&gt;
&lt;em&gt;VAR making a decision it doesn't have the accuracy to do&lt;/em&gt;&lt;/p&gt;
&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;VAR, what is it good for? Certainly not offside decisions!&lt;/p&gt;
&lt;h2 id="addendum"&gt;Addendum&lt;/h2&gt;
&lt;p&gt;Apparently, in some situations VAR has access to super slow motion footage recorded at 120 frames per second. This reduces the potential error to around 15cm, which is still to large for those close offside calls.&lt;/p&gt;</content><category term="Misc"></category><category term="VAR"></category></entry><entry><title>Opta Pro Forum 2018</title><link href="2018/11/21/opta-pro-forum-2018/" rel="alternate"></link><published>2018-11-21T19:30:00+00:00</published><updated>2018-11-21T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2018-11-21:2018/11/21/opta-pro-forum-2018/</id><summary type="html">&lt;p&gt;I finally wrote about the presentation I gave at the Opta Pro 2018 Forum....&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;I presented a poster back in 2017 at the Opta Pro forum looking at using tracking data to evaluate footballers’ decisions in and around the box, which I've previously written about on &lt;a href="https://www.optasportspro.com/about/optapro-blog/posts/2017/blog-analysing-footballers%E2%80%99-decisions-in-and-around-the-penalty-box/"&gt;Opta Pro’s blog&lt;/a&gt; if you want to find out more about the original idea. Opta then kindly invited me back in 2018 to expand on the work and talk about it on the main stage.&lt;/p&gt;
&lt;p&gt;Rather than turn up in 2017 with just a poster, I’d created a web app that could animate the tracking data so I could present some real world examples and make it more exciting than just me stood next to a giant piece of paper. I decided to do something similar this time around and do without any slides. Instead, I just talked whilst the web app animated the tracking data Opta had provided. This was rather scary to say the least as you never know when your computer is going to update or crash, or even be compatible with the conference’s audio-visual facilities. Plus, I had no slides to remind me what to say and had to get my timings right so I was talking over the correct section of the match. Thankfully, it all seemed to go smoothly and my computer behaved itself.&lt;/p&gt;
&lt;h2 id="demo"&gt;Demo&lt;/h2&gt;
&lt;p&gt;Opta have published videos of some of the presentations from that day but as far as I’m aware mine was never released (it could be said I have a face for blogs rather than vlogs 🙂) so I’ve added a quick run through of some of the features below. Please note that as part of the agreement to use the tracking data all analyses had to be anonymised, which is why team names and player names are all removed in the video.&lt;/p&gt;
&lt;iframe width="560" height="315" src="https://www.youtube.com/embed/AX29UgNUOtU" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen&gt;&lt;/iframe&gt;

&lt;h2 id="shapes"&gt;Shapes&lt;/h2&gt;
&lt;p&gt;After logging in and selecting a match, the first thing I did in the video above was to add a bounding box around each team’s outfield players. This really helps highlight the changes in a team's shape over the course of a game. The demo also calculates the area of the boxes surrounding each team and the ratio between the two so you can quickly see how much of the pitch each team is controlling.&lt;/p&gt;
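&lt;p&gt;As a minimal sketch of the bounding box calculation (not the web app's actual code), assuming each player's position is an (x, y) pair in metres. The positions below are made up for illustration.&lt;/p&gt;

```python
def bounding_box_area(players):
    """Area of the axis-aligned box around a list of (x, y) positions."""
    xs = [x for x, _ in players]
    ys = [y for _, y in players]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))

# Hypothetical outfield positions in metres, not real tracking data.
home = [(10, 20), (25, 48), (30, 12), (42, 35)]
away = [(50, 15), (60, 40), (75, 30), (68, 22)]

home_area = bounding_box_area(home)
away_area = bounding_box_area(away)
area_ratio = home_area / away_area
print(home_area, away_area, round(area_ratio, 2))
```

&lt;p&gt;Running this once per frame gives a time series of areas and ratios for the whole match.&lt;/p&gt;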
&lt;p&gt;Alternatively, instead of just drawing a box around the players we can be a bit more sophisticated and use a convex hull (at 00:29 in the video). This is the smallest convex polygon that encloses all the players and personally I feel it shows up changes in a team’s shape more clearly than the bounding box, especially if you have a player drifting out of position.&lt;/p&gt;
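&lt;p&gt;One common way to compute a convex hull is Andrew's monotone chain algorithm. The sketch below is an illustration of that technique with made-up positions; I'm not claiming it is how the demo does it.&lt;/p&gt;

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if 2 >= len(pts):
        return pts
    lower, upper = [], []
    for p in pts:
        # Pop while the last turn is not a strict counter-clockwise turn.
        while len(lower) >= 2 and 0 >= cross(lower[-2], lower[-1], p):
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and 0 >= cross(upper[-2], upper[-1], p):
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    total = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# Hypothetical positions: four players on the corners, two inside the shape.
players = [(0, 0), (10, 0), (10, 10), (0, 10), (5, 5), (3, 7)]
hull = convex_hull(players)
print(hull, polygon_area(hull))
```

&lt;p&gt;Players inside the shape drop out of the hull, which is why a single player drifting wide changes the polygon so visibly.&lt;/p&gt;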
&lt;p&gt;Next up was the central box (at 00:42 in the video), which is an idea that came about when I ran through an early version of my presentation with some of the analysts from Brighton and Hove Albion. They suggested plotting a box between the central midfielders and central defenders to monitor the space between them. Apparently, this is an area of the pitch that Tony Pulis is very keen on so if it's good enough for Tony, it’s good enough for me to include.&lt;/p&gt;
&lt;p&gt;And finally in this section was the Voronoi (at 00:52 in the video). It's become somewhat of a cliché that somebody always presents a Voronoi at the Opta Pro Forum and this year it was my turn. Cliché or not, I really like them and apart from being somewhat hypnotic they are a great way of visualising the space around each player.&lt;/p&gt;
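&lt;p&gt;A Voronoi diagram assigns every point on the pitch to its nearest player. A simple way to approximate it, sketched below, is to sample the pitch on a grid and hand each cell to the closest player. The 105m x 68m pitch, the player names and their positions are all assumptions for illustration.&lt;/p&gt;

```python
def nearest_player(point, players):
    """Name of the player closest to a pitch location."""
    px, py = point
    return min(
        players,
        key=lambda name: (players[name][0] - px) ** 2 + (players[name][1] - py) ** 2,
    )

def controlled_area(players, length=105, width=68, step=1.0):
    """Approximate each player's Voronoi cell area by sampling grid cell centres."""
    areas = {name: 0.0 for name in players}
    for i in range(int(length / step)):
        for j in range(int(width / step)):
            centre = ((i + 0.5) * step, (j + 0.5) * step)
            areas[nearest_player(centre, players)] += step * step
    return areas

# Two hypothetical players, one in each half of an assumed 105m x 68m pitch.
players = {"A": (26.25, 34.0), "B": (78.75, 34.0)}
areas = controlled_area(players)
print(areas)
```

&lt;p&gt;A smaller step gives a closer approximation to the true cell areas at the cost of more distance calculations per frame.&lt;/p&gt;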
&lt;h2 id="offensive"&gt;Offensive&lt;/h2&gt;
&lt;p&gt;I then moved on to talking about some ideas around offensive play, particularly using expected goals (xG). The idea is that if you can track the expected goals for each attacking player’s location on the pitch then you can look at whether players are making the correct decision in who they choose to pass to.&lt;/p&gt;
&lt;p&gt;The first overlay here at 1:10 in the video shows the expected goals for each attacking player if they were to take a shot from where they were stood based on distance, angle, location and pressure from opposition players near to them. If a player passed to a team mate with a 2% chance of scoring when there’s somebody else with a 10% chance of scoring you could argue that they’ve not made the optimal decision.&lt;/p&gt;
&lt;p&gt;This is somewhat simplistic though as it doesn’t account for the difficulty of making that pass in the first place. This is solved by the xgoals (pass) overlay at 1:23 in the video that shows the combined probability of the player with the ball successfully passing to a team mate and that team mate scoring from their current location based on distance, angle, location, opposition pressure etc.&lt;/p&gt;
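&lt;p&gt;The combined probability can be sketched as the chance of completing the pass multiplied by the receiver's xG. This assumes the two events are independent, which is a simplification, and the probabilities below are invented for illustration rather than taken from any model.&lt;/p&gt;

```python
def best_option(options):
    """Return (name, value) for the pass with the highest combined probability."""
    # Value of a pass = P(pass completed) * P(team mate scores from there).
    values = {name: p_pass * xg for name, (p_pass, xg) in options.items()}
    best = max(values, key=values.get)
    return best, values[best]

# Hypothetical (pass completion probability, xG) pairs for three team mates.
options = {
    "winger": (0.90, 0.02),      # easy pass, poor shooting position
    "striker": (0.40, 0.10),     # difficult pass, good shooting position
    "midfielder": (0.75, 0.04),
}

name, value = best_option(options)
print(name, round(value, 3))
```

&lt;p&gt;In this made-up example the difficult pass to the striker is still the best choice, even though it is the least likely to be completed.&lt;/p&gt;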
&lt;p&gt;There’s also a line of sight overlay at 1:47 in the video that draws a triangle from the player with the ball to the opposition’s goal posts and calculates the number of opposing players inside that triangle blocking the attacker’s view of the goal.&lt;/p&gt;
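&lt;p&gt;Counting the players inside the triangle is a standard point-in-triangle test. Here is a minimal sketch, assuming a 105m x 68m pitch with a 7.32m wide goal centred on the goal line. All of the coordinates are hypothetical.&lt;/p&gt;

```python
def _side(p, a, b):
    """Signed area test: which side of the line through a and b the point p is on."""
    return (p[0] - b[0]) * (a[1] - b[1]) - (a[0] - b[0]) * (p[1] - b[1])

def in_triangle(p, a, b, c):
    """True if p lies inside, or on the edge of, triangle abc."""
    d1, d2, d3 = _side(p, a, b), _side(p, b, c), _side(p, c, a)
    has_negative = 0 > min(d1, d2, d3)
    has_positive = max(d1, d2, d3) > 0
    return not (has_negative and has_positive)

def count_blockers(ball, left_post, right_post, opponents):
    """Number of opponents inside the ball-to-posts triangle."""
    return sum(in_triangle(o, ball, left_post, right_post) for o in opponents)

# Hypothetical coordinates in metres.
ball = (85.0, 34.0)
left_post, right_post = (105.0, 30.34), (105.0, 37.66)
opponents = [(95.0, 34.0), (104.0, 36.0), (90.0, 45.0), (60.0, 34.0)]

blockers = count_blockers(ball, left_post, right_post, opponents)
print(blockers)
```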
&lt;h2 id="defensive"&gt;Defensive&lt;/h2&gt;
&lt;p&gt;The next section of my presentation moved on to looking at some defensive ideas. The first one was just plotting the offside line to be able to quickly identify players drifting offside. It’s not particularly exciting but certainly comes in useful.&lt;/p&gt;
&lt;p&gt;What's more interesting though is to plot a line through each team’s defenders (at 2:18 in the video). You can then measure how straight the defensive line is and whether particular players are getting pulled out of position. You can also measure the distance between the defenders and look at how evenly spaced out they are, which is the dispersion metric in the video - the higher the dispersion the more uneven the distance between neighbouring defenders is.&lt;/p&gt;
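&lt;p&gt;One plausible way to calculate these metrics is sketched below. It assumes dispersion is the standard deviation of the gaps between neighbouring defenders and that straightness is the standard deviation of their depths, so treat both definitions as illustrative rather than the exact ones used in the video.&lt;/p&gt;

```python
from statistics import pstdev

def line_metrics(defenders):
    """defenders: list of (depth, width) positions in metres."""
    ordered = sorted(defenders, key=lambda d: d[1])  # order across the pitch
    gaps = [b[1] - a[1] for a, b in zip(ordered, ordered[1:])]
    dispersion = pstdev(gaps)                        # uneven spacing gives a higher value
    straightness = pstdev([d[0] for d in defenders]) # 0.0 means a perfectly flat line
    return gaps, dispersion, straightness

# Hypothetical back fours.
even = [(30.0, 10.0), (30.0, 26.0), (30.0, 42.0), (30.0, 58.0)]
uneven = [(30.0, 10.0), (33.0, 14.0), (30.0, 42.0), (28.0, 58.0)]

for back_four in (even, uneven):
    gaps, dispersion, straightness = line_metrics(back_four)
    print(gaps, round(dispersion, 2), round(straightness, 2))
```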
&lt;h2 id="match-summaries"&gt;Match Summaries&lt;/h2&gt;
&lt;p&gt;As well as watching the animation to see these metrics changing in real time, it’s often useful to get a view of how they changed over the full game so I also talked through a few summaries of the data generated by the app (shown at 2:46 in the video).&lt;/p&gt;
&lt;p&gt;The first chart shows the area of the home team’s bounding box with blue lines showing when they conceded shots. As you can clearly see they tended to concede shots when their area was the smallest. This is somewhat to be expected as teams naturally compact when they are defending and are pushed back but it would be interesting to analyse the differences between those decreases in area where a shot wasn't conceded and those where it was.&lt;/p&gt;
&lt;p&gt;The second chart shows the expected goals for one of the attackers during the match based on his location on the pitch and the defending players around him. During the first half he was constantly in areas where he was threatening the goal, whereas in the second half there are only three occasions where he was potentially in a goal scoring position. As an analyst, you probably wouldn’t want to go and show this chart directly to a coach but it certainly gives you something you should go and analyse the causes of.&lt;/p&gt;
&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;Most of the ideas I talked about were intentionally fairly simple as I wanted the presentation to be approachable to the analysts in the audience without getting bogged down in technical explanations. There’s loads of potential to expand on these ideas though. For example I didn’t get time to talk about the distance between the defenders / midfielders or midfielders / attackers and how this correlates with shots. I also skipped over talking about predicting a player’s probability of scoring from a future position rather than their current one so you can analyse the value of players’ runs or through balls etc.&lt;/p&gt;
&lt;p&gt;So how was my presentation received? I honestly have no idea as outside a couple of questions from the audience I had no feedback whatsoever. I tried to talk with analysts from a few different clubs after my presentation to get their feedback but was pretty much blanked by all of them. I can only presume teams are either not interested in tracking data, are too scared to talk about it in case they give something away or the ideas I presented just stink.&lt;/p&gt;
&lt;p&gt;Personally, I feel there’s a huge amount of insight to be gained from working with tracking data. It’s much more difficult to work with though and it’s much harder to put an engaging presentation / blog together when everything has to be anonymised. Being able to talk about say David Silva’s movement on the pitch would be much more interesting to an audience than talking about Anonymous Player X.&lt;/p&gt;
&lt;p&gt;It’s also difficult as an outsider to the football industry as there is no public access to tracking data that I know of. I’m lucky enough to have accumulated a few matches worth of TRACAB data through HackMCFC and the Opta Pro Forum but I still have nowhere near enough to attempt the majority of the ideas I have. And, even if I did, I’m not sure if I could ever publicly write about it.&lt;/p&gt;
&lt;p&gt;Finally, because I want to end on a positive note, I've thoroughly enjoyed presenting at the Opta Pro Forum each time I’ve been there. It’s always a pleasure to catch up with the guys from Opta and all the bloggers I’ve got to know over the years, and I'm really looking forward to 2019's Forum.&lt;/p&gt;
&lt;p&gt;Thanks for reading!&lt;/p&gt;</content><category term="Opta Pro Forum"></category><category term="Player Analytics"></category><category term="Expected Goals"></category><category term="Opta Pro Forum"></category></entry><entry><title>Automated Feature Engineering</title><link href="2018/05/25/deep-feature-synthesis-automated-feature-engineering/" rel="alternate"></link><published>2018-05-25T19:30:00+00:00</published><updated>2018-05-25T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2018-05-25:2018/05/25/deep-feature-synthesis-automated-feature-engineering/</id><summary type="html">&lt;p&gt;I recently gave a presentation on automated feature engineering at MancML....&lt;/p&gt;</summary><content type="html">&lt;h2 id="automated-feature-engineering-using-deep-feature-synthesis"&gt;Automated Feature Engineering Using Deep Feature Synthesis&lt;/h2&gt;
&lt;p&gt;I recently gave a presentation at MancML about using Deep Feature Synthesis to automate Feature Engineering and how it fits into a machine learning pipeline - the slides are below for anyone interested in finding out more.&lt;/p&gt;
&lt;iframe src="//www.slideshare.net/slideshow/embed_code/key/FPYziJ96H7wr3y" width="595" height="485" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen&gt; &lt;/iframe&gt;
&lt;div style="margin-bottom:5px"&gt; &lt;strong&gt; &lt;a href="//www.slideshare.net/MartinEastwood/deep-feature-synthesis-98745941" title="Deep Feature Synthesis" target="_blank"&gt;Deep Feature Synthesis&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href="https://www.slideshare.net/MartinEastwood" target="_blank"&gt;Martin Eastwood&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;</content><category term="Feature Engineering"></category><category term="Feature Engineering"></category></entry><entry><title>Deep Learning in R Using Keras</title><link href="2017/09/11/deep-learning-in-r-keras/" rel="alternate"></link><published>2017-09-11T19:30:00+00:00</published><updated>2017-09-11T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2017-09-11:2017/09/11/deep-learning-in-r-keras/</id><summary type="html">&lt;p&gt;I recently gave a presentation at the Manchester R Users Group on Deep Learning in R using Keras....&lt;/p&gt;</summary><content type="html">&lt;h2 id="deep-learning-in-r-using-keras"&gt;Deep Learning in R Using Keras&lt;/h2&gt;
&lt;p&gt;I recently gave a presentation at the Manchester R Users Group on Deep Learning in R using Keras. The slides are below for anybody who's interested, but be warned R's Keras package is evolving rapidly so they may soon start to go out of date!&lt;/p&gt;
&lt;iframe src="//www.slideshare.net/slideshow/embed_code/key/FD9Y22SYdRk7fg" width="595" height="485" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen&gt; &lt;/iframe&gt;
&lt;div style="margin-bottom:5px"&gt; &lt;strong&gt; &lt;a href="//www.slideshare.net/MartinEastwood/deep-learning-in-r" title="Deep Learning In R" target="_blank"&gt;Deep Learning In R&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href="https://www.slideshare.net/MartinEastwood" target="_blank"&gt;Martin Eastwood&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;</content><category term="Deep Learning"></category><category term="Deep Learning"></category></entry><entry><title>An Alternative to Radars for Visualising Football Data</title><link href="2017/07/17/radar-alternatives-football-data/" rel="alternate"></link><published>2017-07-17T19:30:00+00:00</published><updated>2017-07-17T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2017-07-17:2017/07/17/radar-alternatives-football-data/</id><summary type="html">&lt;p&gt;I've long been critical of the use of radar plots for visualizing football data and was recently challenged by a reader of this blog to come up with a better alternative so here we go...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;I've long been critical of the use of radar plots for visualizing football data and
was recently challenged by a reader of this blog to come up with a better
alternative so here we go.&lt;/p&gt;
&lt;h2 id="radars"&gt;Radars&lt;/h2&gt;
&lt;p&gt;First of all, why are radar plots so bad?&lt;/p&gt;
&lt;p&gt;Well, as Luke Bornn's recent tweet so excellently shows, the shape of the plot is
controlled by the order of the variables - swap a few of them around and
suddenly the shape you are looking at is completely different, making visual
comparisons difficult and prone to error. Essentially, the shape you are looking at
and judging is meaningless as it's determined purely by the author's choice of layout.&lt;/p&gt;
&lt;blockquote class="twitter-tweet" data-lang="en-gb"&gt;&lt;p lang="en" dir="ltr"&gt;A reminder, blatantly plagiarized from &lt;a href="https://twitter.com/stat_sam"&gt;@stat_sam&lt;/a&gt;, of why radar plots are misleading. Eye focuses on area, not length. &lt;a href="https://t.co/Dk3gcn1GD1"&gt;pic.twitter.com/Dk3gcn1GD1&lt;/a&gt;&lt;/p&gt;&amp;mdash; Luke Bornn (@LukeBornn) &lt;a href="https://twitter.com/LukeBornn/status/864856335191388162"&gt;17 May 2017&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;

&lt;p&gt;Another major issue is that the area of the radar's shape does not increase
linearly. If you double the size of a value on a bar chart then the bar
representing it doubles in area too - this is a good thing as it
means we can compare data easily.&lt;/p&gt;
&lt;p&gt;However, double the size of your values in a radar plot and the area quadruples,
because it grows with the square of the values instead. This magnifies small differences, making them
appear much bigger than they really are and distorting any visual comparisons we
make. This is a bad thing, &lt;strong&gt;a very bad thing.&lt;/strong&gt;&lt;/p&gt;
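&lt;p&gt;Both problems can be checked numerically. A radar's shape is a polygon whose corners sit on equally spaced spokes, so its area is the sum of the triangles between neighbouring spokes. The sketch below uses made-up values to show that reordering the variables changes the area and that doubling every value quadruples it.&lt;/p&gt;

```python
import math

def radar_area(values):
    """Area of the radar polygon for values placed on equally spaced spokes."""
    n = len(values)
    # Each neighbouring pair of spokes forms a triangle: 0.5 * r1 * r2 * sin(angle).
    wedge = math.sin(2 * math.pi / n) / 2
    return wedge * sum(values[i] * values[(i + 1) % n] for i in range(n))

# Hypothetical player metrics on a common scale.
stats = [9, 2, 8, 3, 7, 1]
reordered = [9, 8, 7, 3, 2, 1]       # same numbers, different spoke order
doubled = [2 * v for v in stats]     # every value doubled

print(round(radar_area(stats), 2))
print(round(radar_area(reordered), 2))
print(round(radar_area(doubled) / radar_area(stats), 2))
```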
&lt;h2 id="the-alternatives"&gt;The Alternatives&lt;/h2&gt;
&lt;p&gt;Neil Charles showed some player comparisons at the Opta Pro Forum a couple of
years back. These were based on bullet graphs but with the central bar replaced
with a strip plot to try and highlight the distribution of the data.&lt;/p&gt;
&lt;blockquote class="twitter-tweet" data-lang="en"&gt;&lt;p lang="en" dir="ltr"&gt;Gylfi, because he&amp;#39;s topical. First four boxes raise serious questions (which others inc. &lt;a href="https://twitter.com/mixedknuts"&gt;@mixedknuts&lt;/a&gt; have answered in detail) &lt;a href="https://twitter.com/hashtag/Swans?src=hash"&gt;#Swans&lt;/a&gt; &lt;a href="https://twitter.com/hashtag/EFC?src=hash"&gt;#EFC&lt;/a&gt; &lt;a href="https://t.co/yzowruuywW"&gt;pic.twitter.com/yzowruuywW&lt;/a&gt;&lt;/p&gt;&amp;mdash; Neil Charles (@neilcharles_uk) &lt;a href="https://twitter.com/neilcharles_uk/status/885495541722951683"&gt;July 13, 2017&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;

&lt;p&gt;I remember being really impressed with these at the time but always felt that
the central strip plot didn't quite work as it was one-dimensional and the
overlap of the points obscured the true distribution of the data. If you look at
the example above you can see it all kind of merges into one grey line down the
centre of the plot.&lt;/p&gt;
&lt;p&gt;Another idea that has been gaining a little bit of traction recently is &lt;a href="https://github.com/clauswilke/ggjoy"&gt;Joyplots&lt;/a&gt;.
These are great as they show the distribution of the data really clearly
provided they are plotted at a reasonable size. However, as soon as you shrink
them to fit a few onto a page or overlap them then it obscures the data, making
visual comparisons tricky.&lt;/p&gt;
&lt;blockquote class="twitter-tweet" data-lang="en"&gt;&lt;p lang="en" dir="ltr"&gt;Crosby, McDavid, and the Greatest Player in the World... &lt;a href="https://t.co/VbZAm372ck"&gt;pic.twitter.com/VbZAm372ck&lt;/a&gt;&lt;/p&gt;&amp;mdash; Ryan Stimson (@RK_Stimp) &lt;a href="https://twitter.com/RK_Stimp/status/885857761845813248"&gt;July 14, 2017&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;

&lt;h2 id="swarm-plot"&gt;Swarm Plot&lt;/h2&gt;
&lt;p&gt;After much consideration I settled on going with a &lt;a href="http://yaikhom.com/2013/04/05/implementing-a-beeswarm-plot.html"&gt;swarm plot&lt;/a&gt;.
The swarm plot takes the central strip used by Neil Charles but instead of
flattening it into a single dimension where points can obstruct each other,
it applies just enough jitter to each point to separate them and prevent
any overlaps.&lt;/p&gt;
&lt;p&gt;This can create a chart that looks a little like a swarm of bees, hence the name,
but importantly it also means that you can clearly see the distribution of the
underlying data without any overlapping points obscuring each other.&lt;/p&gt;
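&lt;p&gt;The idea can be sketched with a simple greedy layout: place each point on the centre line if it fits, otherwise nudge it sideways until it no longer overlaps anything already placed. This is an illustrative simplification, not the algorithm from the linked article or from any plotting library.&lt;/p&gt;

```python
def swarm_offsets(values, diameter=1.0):
    """Return (value, offset) pairs with no two markers closer than one diameter."""
    placed = []
    for v in sorted(values):
        step = 0
        while True:
            # Try offsets 0, +d, -d, +2d, -2d, ... until the marker fits.
            candidates = [0.0] if step == 0 else [step * diameter, -step * diameter]
            free = [
                c for c in candidates
                if all((v - pv) ** 2 + (c - pc) ** 2 >= diameter ** 2 for pv, pc in placed)
            ]
            if free:
                placed.append((v, free[0]))
                break
            step += 1
    return placed

# Hypothetical metric values with two tight clusters.
layout = swarm_offsets([1.0, 1.2, 1.3, 5.0, 5.1])
for value, offset in layout:
    print(value, offset)
```

&lt;p&gt;Isolated points stay on the centre line, so the width of the swarm at any value reflects how many players are bunched around it.&lt;/p&gt;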
&lt;p&gt;I kept Neil Charles' approach of coloring the data by quintiles so you
can quickly see what group the player falls within. For example, if a
player is in the red section then they are in the bottom 20% of players for that
particular metric (bad), if they are in the yellow then they are in the middle 20% (average)
and if they are dark green then they are in the top 20% of players (yay!).&lt;/p&gt;
&lt;p&gt;I've also added a player rating in too, which is just the average of all the
quintiles. I'm not fully convinced of the merits of this but I've shown this style
of chart to a few people and pretty much everybody asked what the player's average
was so I've included it on the chart.  &lt;/p&gt;
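&lt;p&gt;The quintile colouring and the rating can be sketched as follows. The metric names and league values are invented, and the sketch assumes that a higher value is always better, which would need flipping for metrics where lower is better.&lt;/p&gt;

```python
def quintile(value, population):
    """Quintile of value within population: 1 = bottom 20%, 5 = top 20%."""
    below = sum(1 for v in population if value > v)
    # Integer arithmetic avoids floating point surprises at the boundaries.
    return min(below * 5 // len(population) + 1, 5)

# Hypothetical league-wide values for three metrics.
league = {
    "passes": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    "tackles": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "shots": [0, 1, 1, 2, 2, 3, 3, 4, 5, 6],
}
player = {"passes": 95, "tackles": 3, "shots": 3}

quintiles = {metric: quintile(player[metric], league[metric]) for metric in league}
rating = sum(quintiles.values()) / len(quintiles)
print(quintiles, round(rating, 2))
```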
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/kevin_de_bruyne_73084_13796_167.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/Gabriel_Jesus_279379_13796_167.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/raheem_sterling_97692_13796_167.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/aleksandar_kolarov_12267_13796_167.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/nicolás_otamendi_75691_13796_167.png"&gt;&lt;/p&gt;
&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;As ever, let me know what you think.&lt;/p&gt;</content><category term="Visualisation"></category><category term="Player Analytics"></category><category term="Visualisation"></category></entry><entry><title>Opta Pro Forum 2017</title><link href="2017/03/11/opta-pro-forum-2017/" rel="alternate"></link><published>2017-03-11T19:30:00+00:00</published><updated>2017-03-11T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2017-03-11:2017/03/11/opta-pro-forum-2017/</id><summary type="html">&lt;p&gt;I wrote up the poster presentation I gave at the 2017 Opta Pro Forum for the Opta Pro blog looking at using machine learning to quantify footballer's decisions....&lt;/p&gt;</summary><content type="html">&lt;h2 id="analyzing-footballers-decisions-in-and-around-the-penalty-box"&gt;Analyzing Footballer’s Decisions in and Around the Penalty Box&lt;/h2&gt;
&lt;p&gt;I wrote up the poster presentation I gave at the 2017 Opta Pro Forum for the Opta Pro blog looking at using machine learning to quantify footballer's decisions. You can read all about it &lt;a href="http://www.optasportspro.com/about/optapro-blog/posts/2017/blog-analysing-footballers%E2%80%99-decisions-in-and-around-the-penalty-box/"&gt;here&lt;/a&gt;&lt;/p&gt;</content><category term="Opta Pro Forum"></category><category term="Player Analytics"></category><category term="Expected Goals"></category><category term="Opta Pro Forum"></category></entry><entry><title>Opta Pro Forum 2016</title><link href="2016/12/20/opta-pro-forum-2016/" rel="alternate"></link><published>2016-12-20T19:30:00+00:00</published><updated>2016-12-20T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2016-12-20:2016/12/20/opta-pro-forum-2016/</id><summary type="html">&lt;p&gt;With the 2017 Opta Pro Forum rapidly approaching, I thought it was about time I transcribed my presentation from the previous event....&lt;/p&gt;</summary><content type="html">&lt;h2 id="a-mathematical-approach-to-evaluating-footballers"&gt;A Mathematical Approach to Evaluating Footballers&lt;/h2&gt;
&lt;p&gt;With the 2017 Opta Pro Forum rapidly approaching, I thought it was about time I transcribed my presentation from the previous event discussing a mathematical model I’d been developing for rating and ranking footballers.&lt;/p&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;An obvious starting point for developing a model like this was the plus / minus score you often see in American sports. This simply measures a team’s goal difference when a player is on the pitch compared with off it, where the greater the plus / minus score the better the player’s impact.&lt;/p&gt;
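&lt;p&gt;As a minimal sketch of a raw plus / minus, here the score is the team's goal difference per 90 minutes with the player on the pitch minus the same figure with the player off it. The match segments are hypothetical and the per-90 scaling is my assumption about how to make the two figures comparable.&lt;/p&gt;

```python
def plus_minus(segments):
    """segments: (minutes, goals_for, goals_against, player_on_pitch) tuples."""
    def per90(on_pitch):
        rows = [s for s in segments if s[3] == on_pitch]
        minutes = sum(s[0] for s in rows)
        goal_diff = sum(s[1] - s[2] for s in rows)
        return 90.0 * goal_diff / minutes if minutes else 0.0
    return per90(True) - per90(False)

# Hypothetical spells of play across three matches.
segments = [
    (70, 2, 0, True),    # player on for 70 minutes, team won that spell 2-0
    (20, 0, 1, False),   # substituted, team conceded once
    (90, 1, 1, True),
    (90, 0, 2, False),   # missed the whole match
]

score = plus_minus(segments)
print(round(score, 2))
```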
&lt;p&gt;Whilst this may seem like a good idea, it’s unfortunately heavily biased towards the better teams. For example, take Sergio Agüero and put him in Stockport County’s first team and his plus / minus score would inevitably drop due to the quality of the players around him.&lt;/p&gt;
&lt;p&gt;You can try and improve on this by moving to the adjusted plus / minus, which tries to account for the talent of the other players on the pitch. However, this doesn’t work well for football as the sport’s relative lack of substitutions doesn’t provide enough data to model all the combinations of players accurately enough and you end up with large errors that make the results meaningless.&lt;/p&gt;
&lt;h2 id="going-bayesian"&gt;Going Bayesian&lt;/h2&gt;
&lt;p&gt;An alternative approach is to use Bayesian statistics, which allows you to incorporate prior beliefs into your mathematical model to help inform the predictions where there is a lack of data, such as for young players who haven’t played much before.&lt;/p&gt;
&lt;p&gt;Imagine watching a footballer play for the first time - there’s a probability they may be the next Lionel Messi, there’s a probability they may be the next Bebé and there’s a probability they may be average and be somewhere between the two.&lt;/p&gt;
&lt;p&gt;This is essentially how the model works. Based on all the available data for each player, the model constructs a set of Priors and uses them to estimate the player’s true talent compared with all the other footballers in the world. As the model gains more information about a player over the course of their career, the influence of these Priors diminishes and the credible intervals narrow around the player’s true talent level as the model gains more confidence in its predictions.&lt;/p&gt;
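&lt;p&gt;The way a prior fades as data accumulates can be illustrated with a textbook normal-normal update. To be clear, this is a toy example rather than the rating model itself: the prior, the noise level and the observed performances are all made-up numbers.&lt;/p&gt;

```python
import math

def posterior(prior_mean, prior_sd, observations, noise_sd):
    """Normal prior with normally distributed observations: (mean, sd) of the posterior."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = len(observations) / noise_sd ** 2
    total_precision = prior_precision + data_precision
    weighted_sum = prior_precision * prior_mean + sum(observations) / noise_sd ** 2
    return weighted_sum / total_precision, math.sqrt(1.0 / total_precision)

# Hypothetical: prior belief is an average player (100), but he keeps performing at 120.
PRIOR_MEAN, PRIOR_SD, NOISE_SD = 100.0, 10.0, 20.0

for matches in (0, 4, 36):
    mean, sd = posterior(PRIOR_MEAN, PRIOR_SD, [120.0] * matches, NOISE_SD)
    low, high = mean - 1.96 * sd, mean + 1.96 * sd
    print(matches, round(mean, 1), round(low, 1), round(high, 1))
```

&lt;p&gt;With no matches the estimate is just the prior. As matches are added the estimate moves towards the observed level and the 95% credible interval narrows.&lt;/p&gt;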
&lt;h2 id="talent-distribution"&gt;Talent Distribution&lt;/h2&gt;
&lt;p&gt;Figure One shows the distribution of player ratings in the English Premier League. To put these ratings into context they were calculated in February 2016 and at the lower end of the scale was Fabricio Coloccini, a once decent footballer whose legs were rapidly failing him. In the middle was Ryan Mason who’s pretty much the epitome of averageness for the Premier League, and at the top end was Mesut Özil, one of the highest rated players to have ever graced the Premier League.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20161220_figure_1.png"&gt;
&lt;strong&gt;Figure One: Distribution of player ratings in the English Premier League&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="player-career-trajectories"&gt;Player Career Trajectories&lt;/h2&gt;
&lt;p&gt;If you take these ratings and plot them out by age you can get an idea of what the average Premier League player’s career looks like (Figure Two). Players start to break into the first team aged around 18, then steadily improve until the age of 25. They then plateau until around the age of 28, at which point the effects of age start to creep in and the player starts to decline before dropping out of the Premier League by the time they hit 35.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20161220_figure_2.png"&gt;
&lt;strong&gt;Figure Two: Average career of a Premier League Footballer&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;To put this into perspective, Figure Three shows Wayne Rooney’s career trajectory up until February 2016. The model rated Wayne Rooney as being massively superior to the average footballer from a very young age but it also suggests that he has been in serious decline since his mid-twenties and that he’s probably playing like a footballer five years older than he really is.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20161220_figure_3.png"&gt;
&lt;strong&gt;Figure Three: Wayne Rooney’s career trajectory&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Figure Four below also shows the career trajectory for Micah Richards who signed for Aston Villa back in the summer of 2015. On the surface, it looked like a pretty good deal as Villa were getting an England international at his peak age for free. However, Richards’ profile shows that he has been in decline since the age of 25, and that by the time his four-year contract is up he’s likely to be a very average Championship defender at best.  &lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20161220_figure_4.png"&gt;
&lt;strong&gt;Figure Four: Career trajectories for Wayne Rooney, James Milner, Micah Richards, and Michael Owen&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="validating-the-model"&gt;Validating the Model&lt;/h2&gt;
&lt;p&gt;Whenever I discuss this concept with people their first question is inevitably to ask who the top players in the world are, and at the time of writing the presentation for Opta back in February 2016 the results were:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Lionel Messi&lt;/li&gt;
&lt;li&gt;Thomas Müller&lt;/li&gt;
&lt;li&gt;Cristiano Ronaldo&lt;/li&gt;
&lt;li&gt;Manuel Neuer&lt;/li&gt;
&lt;li&gt;Busquets&lt;/li&gt;
&lt;li&gt;Toni Kroos&lt;/li&gt;
&lt;li&gt;Neymar&lt;/li&gt;
&lt;li&gt;Robert Lewandowski&lt;/li&gt;
&lt;li&gt;Jordi Alba&lt;/li&gt;
&lt;li&gt;David Alaba&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Admittedly, it doesn’t take a complex mathematical model to know that Lionel Messi is the greatest footballer in the world but it’s reassuring to see that the results look feasible, and it hopefully provides some confidence in the model’s utility. This is important because these types of ratings are highly subjective and difficult to validate as there’s no right or wrong answer - ask 100 football fans who the top players in the world are and you’re likely to get 100 slightly different answers.&lt;/p&gt;
&lt;h2 id="false-negatives"&gt;False Negatives&lt;/h2&gt;
&lt;p&gt;The next step in terms of validating the model was to look at false negatives. If the model’s ratings were to be trusted, then you’d expect the best players in the world to have shown up as some of the best players in their age group when they were younger. If not, then the model is potentially missing something and ranking players lower than they should be.&lt;/p&gt;
&lt;p&gt;To investigate this, I looked back at how the top players in the world were ranked on their 21st birthday compared with all the other players aged 21 or under at the time (Figure Five shows the top ten). Again, the results looked feasible with all the players ranked highly amongst their peers. The lowest ranked was Jordi Alba, who was playing at Gimnàstic de Tarragona in the Spanish 2nd tier at the time.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20161220_figure_5.png"&gt;
&lt;strong&gt;Figure Five: Rankings of the top ten players world-wide when aged 21&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="false-positives"&gt;False Positives&lt;/h2&gt;
&lt;p&gt;We can also test the model by looking forwards in time to see whether youngsters rated highly tend to go on and have successful careers. As an example, Figure Six shows the rankings on the 1st February 2016 for the top ten ranked 21-year-olds on the 1st February 2010.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20161220_figure_6.png"&gt;
&lt;strong&gt;Figure Six: Rankings of the top ten 21-year-olds on the 1st February 2010&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;So back on the 1st February 2010, Holger Badstuber was ranked as the number one player worldwide aged 21 or under and by the 1st February 2016, he was ranked as the 229th best footballer in the world overall. To put this ranking into perspective, there are around 75,000 active players in the model’s database so it puts Badstuber in the top 1% of all footballers worldwide and in the top 10% of all footballers across the big five European leagues (England, Spain, Germany, Italy and France).&lt;/p&gt;
&lt;p&gt;I’ve only shown a snapshot of data here as it’s not possible to display comparisons for all 75,000 players. However, for the current top 5,000 players in the world the average difference in rank at age 21 to now is around 150 places suggesting the model has decent predictive power for identifying which youngsters are likely to go on to have successful careers.&lt;/p&gt;
&lt;h2 id="player-ratings-and-team-quality"&gt;Player Ratings and Team Quality&lt;/h2&gt;
&lt;p&gt;These player ratings are also highly correlated with how well a team performs over the course of a season, where the higher a team’s average player rating, the more points they are likely to achieve (Figure Seven).&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20161220_figure_7.png"&gt;
&lt;strong&gt;Figure Seven: Correlation between average team rating and points achieved per season&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;For those of you with a mathematical background, the r-squared value here is 0.77, meaning that 77% of the variability in the number of points a team achieves per season can be explained by the average rating of their players.&lt;/p&gt;
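&lt;p&gt;For a straight-line fit with a single predictor, r-squared is the square of the correlation between the two variables. Here is a minimal sketch of the calculation using made-up ratings and points, not the data behind Figure Seven.&lt;/p&gt;

```python
def r_squared(xs, ys):
    """Squared Pearson correlation, equal to R-squared for a simple linear fit."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Hypothetical average team ratings and points totals for six teams.
ratings = [101, 104, 108, 112, 118, 125]
points = [33, 41, 47, 52, 70, 86]

r2 = r_squared(ratings, points)
print(round(r2, 3))
```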
&lt;p&gt;An interesting use of this is to predict the impact of injuries on a team’s expected points. For example, Sergio Agüero missed 12 matches of the 2015 / 2016 English Premier League due to injury and was predominantly replaced by Wilfried Bony. This reduced Manchester City’s average team rating by the equivalent of three points over the additional minutes played by Bony during Agüero’s injuries.&lt;/p&gt;
&lt;p&gt;You can also extend this idea to assess the value of potential transfers on your team. For example, if you are in the position to sign a new player who will increase your predicted points next season by two points then how much is that worth to you as a club in terms of transfer expenditure?&lt;/p&gt;
&lt;h2 id="player-ratings-and-league-position"&gt;Player Ratings and League Position&lt;/h2&gt;
&lt;p&gt;A team’s average player rating also correlates with their final league position (Figure Eight).&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20161220_figure_8.png"&gt;
&lt;strong&gt;Figure Eight: Correlation between average team rating and league position&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If you’ve not seen this type of chart before, it’s known as a Box and Whiskers plot. The thick black line across the centre of each box shows the median, the vertical lines show the most common range of values and the dots are outliers.&lt;/p&gt;
&lt;p&gt;So, to finish first in the English Premier League your team needs an average player rating of 125, although the champions have ranged between 115 and 130.&lt;/p&gt;
&lt;p&gt;To finish in the top four and reach the Champions League, teams need an average rating of 118, although in 2011 / 2012 Tottenham managed fourth with a rating of 103.&lt;/p&gt;
&lt;p&gt;To avoid relegation, teams need a rating of 105. The outlier here is Liverpool in 2009 / 2010 who managed to finish in seventh place despite having the second highest rated team that season, with players like Javier Mascherano, Fernando Torres and Steven Gerrard in their squad.&lt;/p&gt;
&lt;p&gt;Finally, the teams that get relegated have average ratings of around 101, noticeably below the League’s average player rating of 108.&lt;/p&gt;
&lt;p&gt;Interestingly though, there is a large overlap between the relegation and safety groups. This shows that the worst teams don’t always get relegated, perhaps due to reasons like bad luck, injuries or employing Tony Pulis…&lt;/p&gt;
&lt;h2 id="applications-for-football-teams"&gt;Applications for Football Teams&lt;/h2&gt;
&lt;p&gt;I’m not advocating that teams should be signing players purely on the output of a mathematical model, but using data like this can help football teams in many ways, such as identifying potential transfer targets, exploring players’ career trajectories and quantifying the potential impact of injuries.&lt;/p&gt;
&lt;h2 id="appendix"&gt;Appendix&lt;/h2&gt;
&lt;iframe src="//www.slideshare.net/slideshow/embed_code/key/3qY08ZR0e4dBlr" width="595" height="485" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen&gt; &lt;/iframe&gt;
&lt;div style="margin-bottom:5px"&gt; &lt;strong&gt; &lt;a href="//www.slideshare.net/MartinEastwood/2016-opta-pro-forum" title="2016 Opta Pro Forum" target="_blank"&gt;2016 Opta Pro Forum&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a target="_blank" href="//www.slideshare.net/MartinEastwood"&gt;Martin Eastwood&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;</content><category term="PlayerRating"></category><category term="Recruitment"></category><category term="Player Analytics"></category><category term="Opta Pro Forum"></category></entry><entry><title>A Footballer Recommendation Engine</title><link href="2016/06/30/footballer-recommendation-engine/" rel="alternate"></link><published>2016-06-30T19:30:00+00:00</published><updated>2016-06-30T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2016-06-30:2016/06/30/footballer-recommendation-engine/</id><summary type="html">&lt;p&gt;With the transfer window well under way I thought I'd discuss my footballer recommendation engine for identifying potential transfer targets....&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;With the transfer window well under way I thought I'd discuss my footballer recommendation engine for identifying potential transfer targets.&lt;/p&gt;
&lt;h2 id="recommendation-engines"&gt;Recommendation Engines&lt;/h2&gt;
&lt;p&gt;Recommendation engines have become increasingly popular over the past few years as a way for companies to personalize content. Whether it's Amazon recommending books, Twitter suggesting people to follow or Netflix suggesting what films to watch, they are typically generating recommendations using versions of a technique called &lt;a href="https://en.wikipedia.org/wiki/Collaborative_filtering"&gt;Collaborative Filtering&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Collaborative Filtering takes information about users' behaviors and uses it to calculate their personal preferences. We then assume that if users exhibit the same behaviors they'll likely agree about other things. For example, if a group of people like watching the same films as you, then you'll probably like watching the films they've seen that you've not.&lt;/p&gt;
&lt;p&gt;There are countless different approaches for carrying out collaborative filtering, each of which adds its own unique flavor to the recommendations it produces. Recommendation engines have been extensively studied over the past couple of decades and many of the algorithms are well documented, but the key to making your recommendations successful is often finding an approach that works for your particular domain. For example, quantifying the similarities across sports data and recommending footballers requires a different approach to how you would recommend books or films.&lt;/p&gt;
&lt;p&gt;As an aside, Amazon have some interesting &lt;a href="https://www.google.co.uk/search?tbm=pts&amp;amp;hl=en&amp;amp;q=amazon+recommendations&amp;amp;gws_rd=ssl"&gt;patents&lt;/a&gt; discussing recommendations that are well worth a read if you are interested in the topic. Patents are normally full of lexicon-mangling legalese but to their credit Amazon's are actually really accessible and easy to read.&lt;/p&gt;
&lt;h2 id="recommendations"&gt;Recommendations&lt;/h2&gt;
&lt;p&gt;Okay, let's take a look at some of the recommendations.&lt;/p&gt;
&lt;h3 id="pablo-zabaleta"&gt;Pablo Zabaleta&lt;/h3&gt;
&lt;p&gt;I'm a Manchester City fan so let's start there as there's likely to be plenty of transfers this summer - in fact, the next transfer looks like the departure of Pablo Zabaleta to Roma. With Zabaleta gone City's only other senior right fullback is 33-year-old Bacary Sagna so unless there's some crazy change in tactics from Pep Guardiola it seems reasonable to assume they'll be looking for a replacement.&lt;/p&gt;
&lt;p&gt;In his prime Zabaleta was one of the greatest fullbacks to have graced the Premier League so let's look at recommendations for similar players to him.  We don't want similar players to last season's Zabaleta though as his legs were clearly slowing down and he was struggling to keep up with the game. Instead, let's look at the Pablo Zabaleta who was Manchester City's player of the year back in 2012/2013 and see which modern day players compare with him.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20160702_zaba_recs.png"&gt;
&lt;strong&gt;Figure One: Players Similar To Pablo Zabaleta&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Reassuringly, the top ten players recommended are all right fullbacks. At no point do I define players' positions in the algorithm; the recommendation engine learns this implicitly from the data and incorporates it into the recommendations. Also, to my eye they all look to be attacking fullbacks too, which matches Zabaleta's style of bursting down the wing on the overlap to support the attack.&lt;/p&gt;
&lt;p&gt;The players are ranked by how similar the recommendations are to the real thing, ranging from zero where there is no similarity to one where they are identical to each other. The top recommendation here is Paris Saint-Germain's Gregory van der Wiel with a similarity of 0.87, making him a very close match. Typically, the recommendations don't really go above 0.8 that often so it's quite rare to get such a close match.&lt;/p&gt;
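&lt;p&gt;I haven't said here which similarity measure sits behind these scores, so treat the following purely as a sketch. One common choice that gives a zero-to-one score for non-negative features is cosine similarity, and the example below applies it to some made-up per-90 player profiles. The player names, the feature columns and the helper &lt;code&gt;most_similar&lt;/code&gt; are all hypothetical.&lt;/p&gt;

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors.

    For non-negative features the result lies between 0 (nothing in
    common) and 1 (identical profile, ignoring overall scale).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar(target, candidates, top_n=3):
    """Rank candidate players by similarity to the target profile."""
    scores = {name: cosine_similarity(target, vec) for name, vec in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]

# Hypothetical per-90 profiles: [tackles, crosses, dribbles, key passes]
target = [3.1, 1.8, 1.2, 0.9]
candidates = {
    "player_a": [2.9, 1.7, 1.0, 1.0],
    "player_b": [0.8, 0.2, 3.5, 2.4],
    "player_c": [3.4, 0.4, 0.3, 0.2],
}
for name, score in most_similar(target, candidates):
    print(name, round(score, 2))
```

&lt;p&gt;Because the score ignores overall scale, a player who does everything half as often but in the same proportions would still come out as a perfect match, which is one reason the choice of features matters so much.&lt;/p&gt;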
&lt;h3 id="yaya-toure"&gt;Yaya Touré&lt;/h3&gt;
&lt;p&gt;Another player who is (hopefully) on his way out is Yaya Touré. Yaya has been immense over the years for Manchester City, but like Zabaleta his legs are fading fast and he is struggling / can't be bothered &lt;em&gt;(delete as appropriate)&lt;/em&gt; to keep up with the game so who's his closest like-for-like replacement?&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20160702_yaya_recs.png"&gt;
&lt;strong&gt;Figure Two: Players Similar To Yaya Touré&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The first thing to notice here is that the closest match only has a similarity of 0.65. This is low - Yaya really is a unique player and there is just nobody around who is similar to him. Interestingly though, the top recommendation is Ilkay Gündogan who Manchester City have already signed this summer. So while there is no like-for-like Yaya replacement out there, City have already managed to buy the closest match there is. Top marks to City's scouting department there!&lt;/p&gt;
&lt;h3 id="riyad-mahrez"&gt;Riyad Mahrez&lt;/h3&gt;
&lt;p&gt;Let's take a quick look at a few other interesting players too. Leicester City have done a good job of holding on to their title-winning team so far so who would you sign if you want Riyad Mahrez but can't tempt him away from the KP Stadium?&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20160702_mahrez_recs.png"&gt;
&lt;strong&gt;Figure Three: Players Similar To Riyad Mahrez&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Paulo Dybala is the closest match, with a decent similarity of 0.81. Yeah, good luck trying to convince him to leave Juventus. Second in the list is Nathan Redmond, who's recently been snapped up by Southampton for a bargain £10 million. Third in the list is 19-year-old Milot Rashica who's been getting decent reviews playing in Vitesse's midfield and is rumored to be a target for Napoli. There's also Leroy Sané on there who's hopefully on his way to Manchester City this summer, and Ousmane Dembele who's recently moved to Borussia Dortmund for the crazy low fee of €15 million, so it's a pretty strong list.&lt;/p&gt;
&lt;h3 id="neymar"&gt;Neymar&lt;/h3&gt;
&lt;p&gt;What if you want to sign Neymar for your team but can't afford the real thing?
&lt;img alt="Pelican" src="../../../../images/20160702_neymar_recs.png"&gt;
&lt;strong&gt;Figure Four: Players Similar To Neymar&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The closest match to Neymar is Liverpool's Philippe Coutinho. The similarity is below 0.8 so it's not super high but it's still a fairly reasonable match. Second on the list is Ajax's Amin Younes with a virtually identical score to Coutinho - by the way, it's pretty amazing how often Ajax players crop up in these recommendations. You can pretty much pick any elite player and guarantee Ajax have someone in their teens / early twenties who profiles like them!&lt;/p&gt;
&lt;h3 id="lionel-messi"&gt;Lionel Messi&lt;/h3&gt;
&lt;p&gt;And finally, because people are bound to ask - here's Lionel Messi.
&lt;img alt="Pelican" src="../../../../images/20160702_messi_recs.png"&gt;
&lt;strong&gt;Figure Five: Players Similar To Lionel Messi&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;*Insert usual disclaimer that I'm not advocating signing players based purely on data science / analytics / machine learning / statistics and that the goal should be to combine all of the above with the domain knowledge from professionals inside the game yada yada yada.*&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Collaborative filtering is a useful technique for taking large amounts of data and filtering it down to a single value - in this case how similar different players are to each other. This information can then help refine shortlists for potential transfer targets and be used alongside more traditional scouting.&lt;/p&gt;
&lt;p&gt;It can also be used to identify interesting youngsters. For example, there is currently a 20-year-old out there with a similarity to Gareth Bale of 0.81, a 21-year-old goalkeeper with a similarity to Manuel Neuer of 0.88 and two players in their early twenties with similarities of 0.85 to Sergio Agüero.&lt;/p&gt;
&lt;p&gt;Much like Yaya Touré, many of the real elite players are pretty unique in what they do and there are very few of these youngsters out there who profile like them, at least outside of Ajax anyway!&lt;/p&gt;</content><category term="Recruitment"></category><category term="Recruitment"></category><category term="Recommendations"></category><category term="Player Analytics"></category></entry><entry><title>Expected Goals and Uncertainty</title><link href="2016/04/29/expected-goals-and-uncertainty/" rel="alternate"></link><published>2016-04-29T19:30:00+00:00</published><updated>2016-04-29T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2016-04-29:2016/04/29/expected-goals-and-uncertainty/</id><summary type="html">&lt;p&gt;My Twitter feed seems to be increasingly taken up with discussions of Expected Goals in football yet there always seems to be something important missing from the discussion, and that's uncertainty...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;My Twitter feed seems to be increasingly taken up with discussions of Expected Goals in football, yet there always seems to be something important missing from the discussion, and that's &lt;em&gt;uncertainty&lt;/em&gt;.&lt;/p&gt;
&lt;h2 id="what-are-expected-goals"&gt;What Are Expected Goals&lt;/h2&gt;
&lt;p&gt;Due to the way most expected goals models have been developed, when we refer to expected goals we are typically talking about the probability of an average player scoring a goal from an average shot taken from a certain situation on the pitch - for example taking a header at location x, y, from a chance originating from a through ball.&lt;/p&gt;
&lt;p&gt;While this provides a reasonable estimate as to what happens over the long term for that type of shot, it fails to accurately represent the individual shot we are attempting to measure. &lt;/p&gt;
&lt;p&gt;Pierre-Simon Laplace suggested back in 1814 in his &lt;a href="https://archive.org/details/philosophicaless00lapliala"&gt;Philosophical Essay on Probabilities&lt;/a&gt; that if you know the precise location and momentum of every atom in the universe, their past and future values can be calculated. But in the real world we don't have that level of detail, just a bunch of data scraped off the internet. We have no idea of the ball's momentum, whether it's spinning, what the positions of the defenders are, how well the goalkeeper is positioned, whether the attacking player is off balance, the direction of the wind, whether the attacking player is nursing an injury, how good the attacking player is at shooting, and so on.&lt;/p&gt;
&lt;p&gt;As well as the uncertainty due to this lack of information, there is also the uncertainty of the model itself. We are not training the model on every shot that's ever been taken, rather we are using a subset of shots and that in itself adds to our uncertainty. This is known as the &lt;a href="https://en.wikipedia.org/wiki/Sampling_error"&gt;sampling error&lt;/a&gt;, and is the difference between the true value for a statistic and the value we've estimated from our sample of data.&lt;/p&gt;
&lt;p&gt;No two shots taken from the same situation are ever &lt;em&gt;exactly&lt;/em&gt; the same, and even if they were we cannot be certain enough of the estimate from our model to assign a single, absolute value to the output.&lt;/p&gt;
&lt;h2 id="uncertainty"&gt;Uncertainty&lt;/h2&gt;
&lt;p&gt;This doesn't necessarily mean that expected goals are useless, but that more care needs to be taken of their usage. Instead of a single value, expected goals need to convey their uncertainty using techniques such as &lt;a href="https://en.wikipedia.org/wiki/Confidence_interval"&gt;confidence intervals&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Confidence intervals provide an estimated range likely to contain the true value at a set confidence level. So instead of just saying a shot is worth 0.25 expected goals, we would say the shot was worth 0.25 ± 0.1 expected goals at the 95% confidence level. This essentially means that we are 95% confident the true value of expected goals for that particular shot lies somewhere in the range of 0.15 - 0.35.&lt;/p&gt;
&lt;p&gt;Sure, it's not so snappy but it conveys so much more information than the single value does. There is huge variability in expected goals through lack of information, sample sizes and the general randomness of football. By just using the central estimate you are missing out on a lot of information, and potentially sharing misleading numbers too.&lt;/p&gt;
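&lt;p&gt;To give a feel for where a range like that can come from, here is a minimal sketch using the textbook normal-approximation interval for a conversion rate. It is not necessarily how the intervals in this article were calculated, and the sample of 18 goals from 72 comparable shots is made up for illustration.&lt;/p&gt;

```python
import math

def conversion_interval(goals, shots, z=1.96):
    """Normal-approximation (Wald) interval for a shot conversion rate.

    z=1.96 corresponds to the 95% confidence level.
    """
    rate = goals / shots
    std_err = math.sqrt(rate * (1.0 - rate) / shots)
    return rate, rate - z * std_err, rate + z * std_err

# Hypothetical sample: 18 goals from 72 comparable shots
rate, low, high = conversion_interval(18, 72)
print(f"{rate:.2f} expected goals, 95% interval {low:.2f} to {high:.2f}")
```

&lt;p&gt;The width of the interval shrinks with the square root of the number of shots, so quadrupling the sample only halves the uncertainty.&lt;/p&gt;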
&lt;p&gt;For example, take a look at Figure One below showing the cumulative expected goals by minute from a match between Norwich City and Sunderland. &lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20160429_expg_ci_example.png"&gt;
&lt;strong&gt;Figure One: Cumulative Expected Goals By Minute&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The lines show the central estimate for the total expected goals surrounded by the 95% confidence intervals. Around minute 68 Sunderland's central estimate is higher than Norwich's, suggesting they have been the better team in terms of expected goals. If you look closer though, due to the high variance associated with Sunderland's shots the bottom of their confidence interval actually stretches below the bottom of Norwich's, suggesting it's feasible for Sunderland to actually have a lower expected goals total than Norwich at that point of the game.&lt;/p&gt;
&lt;p&gt;Even when you move towards larger sample sizes and look at expected goals by team over the course of a season (Figure Two), you still see huge variability that could mislead you if you ignore the uncertainty. Plus, there's some really interesting information in there, like how Crystal Palace's goals against total has a narrower range than West Bromwich Albion's. Who'd have thought that?&lt;/p&gt;
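&lt;p&gt;One simple way to build a running band like this, assuming each shot comes with its own standard error and that those errors are independent, is to add up the variances as the match goes on. The sketch below does that for two made-up teams; the shot values, the errors and the function name &lt;code&gt;cumulative_xg_band&lt;/code&gt; are hypothetical rather than taken from the match in Figure One.&lt;/p&gt;

```python
import numpy as np

def cumulative_xg_band(xg, std_err, z=1.96):
    """Running xG total with an approximate 95% band.

    Assumes the per-shot errors are independent, so their variances add.
    """
    xg = np.asarray(xg, dtype=float)
    std_err = np.asarray(std_err, dtype=float)
    total = np.cumsum(xg)
    total_err = np.sqrt(np.cumsum(std_err ** 2))
    return total, total - z * total_err, total + z * total_err

# Hypothetical shots: xG estimates and the standard error of each estimate
team_a = cumulative_xg_band([0.10, 0.35, 0.20], [0.03, 0.15, 0.10])
team_b = cumulative_xg_band([0.25, 0.30], [0.04, 0.05])

# Final central estimate and lower bound for each team
print("A:", round(float(team_a[0][-1]), 2), round(float(team_a[1][-1]), 2))
print("B:", round(float(team_b[0][-1]), 2), round(float(team_b[1][-1]), 2))
```

&lt;p&gt;In this toy example team A finishes with the higher central estimate but the lower bottom edge, which is the same pattern as Sunderland and Norwich above.&lt;/p&gt;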
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20160429_team_xg.png"&gt;
&lt;strong&gt;Figure Two: Expected Goals By Team For 2014/2015&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;The variance associated with expected goals, especially at the level of individual matches (let alone individual shots!!), is such that the uncertainty needs to be clearly accounted for. Without this information, expected goals are at best an inaccurate measure and at worst misleading or wrong.&lt;/p&gt;
&lt;p&gt;Embrace the uncertainty and include confidence intervals!&lt;/p&gt;</content><category term="Expected Goals"></category><category term="Expected Goals"></category></entry><entry><title>Ranking Football Teams Using Google's Page Rank Algorithm</title><link href="2016/03/21/football-pagerank/" rel="alternate"></link><published>2016-03-21T19:30:00+00:00</published><updated>2016-03-21T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2016-03-21:2016/03/21/football-pagerank/</id><summary type="html">&lt;p&gt;I've discussed various techniques for ranking football teams on my blog before, such as using &lt;a href="http://pena.lt/y/2014/11/27/english-premier-league/"&gt;Massey Ratings&lt;/a&gt; to account for strength of schedule, but I've not covered Google's PageRank yet...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;I've discussed various techniques for ranking football teams on my blog before, such as using &lt;a href="http://pena.lt/y/2014/11/27/english-premier-league/"&gt;Massey Ratings&lt;/a&gt; to account for strength of schedule, but I've not covered the most famous ranking algorithm of them all yet, Google's PageRank.&lt;/p&gt;
&lt;h2 id="google-pagerank"&gt;Google PageRank&lt;/h2&gt;
&lt;p&gt;The PageRank algorithm (Figure One) was initially developed by Larry Page and Sergey Brin back in the mid nineties whilst working on a research project at &lt;a href="https://www.stanford.edu/"&gt;Stanford University&lt;/a&gt;. When Page and Brin later founded Google, PageRank became the cornerstone for how their search engine ranked webpages and determined the most relevant set of results for a user's query. &lt;/p&gt;
&lt;p&gt;$PR(A) = (1-d) + d(PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))$&lt;/p&gt;
&lt;p&gt;&lt;em&gt;where:&lt;/em&gt; &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PR(A)&lt;/strong&gt; is the PageRank of page A&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PR(Ti)&lt;/strong&gt; is the PageRank of pages Ti which link to page A&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;C(Ti)&lt;/strong&gt; is the number of outbound links on page Ti&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;d&lt;/strong&gt; is a damping factor ranging between 0 and 1&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Figure One: Google's PageRank Algorithm&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Google's search algorithm has evolved considerably over the years since then, with updates such as &lt;a href="https://en.wikipedia.org/wiki/Google_Panda"&gt;Panda&lt;/a&gt;, &lt;a href="https://en.wikipedia.org/wiki/Google_Hummingbird"&gt;Hummingbird&lt;/a&gt; and &lt;a href="https://en.wikipedia.org/wiki/RankBrain"&gt;RankBrain&lt;/a&gt; brought in to help deal with &lt;a href="https://en.wikipedia.org/wiki/Content_farm"&gt;content farms&lt;/a&gt; and to better understand ambiguous queries. However, PageRank still remains the central method for determining a web page's rank in the search results.&lt;/p&gt;
&lt;h2 id="how-does-it-work"&gt;How Does It Work?&lt;/h2&gt;
&lt;p&gt;The PageRank algorithm essentially counts links on the web and treats them like votes of support. The more links there are leading to a specific web page, the more votes there are for that page being of high quality. Not all votes are counted equally though. Votes coming from pages themselves considered high quality count for much more than votes from pages with few links leading to them.&lt;/p&gt;
&lt;p&gt;This puts us in a bit of a tricky situation though as it means the rank of a web page is dependent on the ranks of all the pages linking to it, which are themselves dependent on the ranks of all the pages pointing to them and so on. Plus, when you factor in that two web pages can both link to each other then you end up with enough circularity to make this initially seem an impossible calculation.&lt;/p&gt;
&lt;p&gt;It turns out that this problem can be solved fairly easily though through brute-force iteration. We start off by giving each webpage a default score we can use to start calculating its PageRank and then iteratively move through the system updating all the pages' ranks based on the ranks of all the pages linking back to them until the whole system converges and the ranks settle down to their true values (or at least close enough we don't mind the remaining error).&lt;/p&gt;
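&lt;p&gt;The iteration described above can be sketched in a few lines of Python. This is a minimal illustration of the formula in Figure One rather than anything resembling Google's production code, and the damping factor of 0.85, the tolerance and the three-page example web are all assumptions I've picked for the demo.&lt;/p&gt;

```python
import math

def pagerank(links, d=0.85, tol=1e-9, max_iter=1000):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) until the ranks settle.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    ranks = {page: 1.0 for page in pages}  # default starting score
    for _ in range(max_iter):
        new_ranks = {page: 1.0 - d for page in pages}
        for source, targets in links.items():
            if not targets:
                continue  # a page with no outbound links casts no vote
            share = d * ranks[source] / len(targets)
            for target in targets:
                new_ranks[target] += share
        settled = all(math.isclose(new_ranks[p], ranks[p], abs_tol=tol) for p in pages)
        ranks = new_ranks
        if settled:
            break
    return ranks

# Tiny hypothetical web: A and C both link to B, B links back to A
web = {"A": ["B"], "B": ["A"], "C": ["B"]}
for page, rank in sorted(pagerank(web).items()):
    print(page, round(rank, 3))
```

&lt;p&gt;Page C has nothing linking to it, so it is left with just the baseline score of 1 - d, while B comes out on top thanks to receiving votes from both of the other pages.&lt;/p&gt;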
&lt;p&gt;If you want to be more elegant though, you can actually skip this brute force approach and solve the whole system using linear algebra but I'm going to leave that for a future article.&lt;/p&gt;
&lt;h2 id="how-does-this-apply-to-football"&gt;How Does This Apply To Football?&lt;/h2&gt;
&lt;p&gt;Instead of using links as votes of support from one web page to another we can use goals as votes of support from one team to another, where the more goals a team concedes, the stronger their vote of support for the opposition that scored against them.&lt;/p&gt;
&lt;p&gt;A handy feature of the PageRank algorithm is that web pages only get one vote that ends up being shared out equally between all the other pages they are linking to. When we apply this to football it means that the more goals you score against a team, the greater the share of their vote you receive.&lt;/p&gt;
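&lt;p&gt;Here is a minimal sketch of how that idea might look in code, with each team's single vote split between its opponents in proportion to the goals it conceded to them. The team names and scores are made up, the damping factor of 0.85 is an assumption, and &lt;code&gt;football_pagerank&lt;/code&gt; is a hypothetical helper rather than the exact code behind the table below.&lt;/p&gt;

```python
import math
from collections import defaultdict

def football_pagerank(results, d=0.85, tol=1e-9, max_iter=1000):
    """Rank teams by treating each goal conceded as a vote for the scorer.

    results is a list of (scorer, conceder, goals) tuples. Each team's single
    vote is split between opponents in proportion to goals conceded to them.
    """
    conceded = defaultdict(lambda: defaultdict(float))
    teams = set()
    for scorer, conceder, goals in results:
        teams.update((scorer, conceder))
        conceded[conceder][scorer] += goals
    ranks = {team: 1.0 for team in teams}
    for _ in range(max_iter):
        new_ranks = {team: 1.0 - d for team in teams}
        for conceder, scorers in conceded.items():
            total = sum(scorers.values())
            if total == 0:
                continue  # no goals conceded, so no vote to hand out
            for scorer, goals in scorers.items():
                new_ranks[scorer] += d * ranks[conceder] * goals / total
        settled = all(math.isclose(new_ranks[t], ranks[t], abs_tol=tol) for t in teams)
        ranks = new_ranks
        if settled:
            break
    return ranks

# Made-up results: (scoring team, conceding team, goals)
results = [
    ("Reds", "Blues", 3), ("Blues", "Reds", 1),
    ("Reds", "Greens", 2), ("Greens", "Reds", 1),
    ("Blues", "Greens", 2), ("Greens", "Blues", 1),
]
table = sorted(football_pagerank(results).items(), key=lambda kv: kv[1], reverse=True)
for team, rank in table:
    print(team, round(rank, 3))
```

&lt;p&gt;One quirk worth knowing about is that a team's whole vote is always handed out, so a side that concedes just once all season gives its entire vote to whoever scored that goal.&lt;/p&gt;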
&lt;h2 id="results"&gt;Results&lt;/h2&gt;
&lt;p&gt;Table One below shows the rankings for the top 100 European teams over the past year based on domestic league and European fixtures (domestic cup competitions are not currently included) as calculated using Google's PageRank algorithm. &lt;/p&gt;
&lt;table class="table"&gt;
                &lt;thead&gt;
                    &lt;tr&gt;
                        &lt;th&gt;Rank&lt;/th&gt;
                        &lt;th&gt;Team&lt;/th&gt;
                    &lt;/tr&gt;
                &lt;/thead&gt;
                &lt;tbody&gt;
                    &lt;tr&gt;
                        &lt;td&gt;1&lt;/td&gt;
                        &lt;td&gt;paris saint-germain&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;2&lt;/td&gt;
                        &lt;td&gt;fc barcelona&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;3&lt;/td&gt;
                        &lt;td&gt;bayern münchen&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;4&lt;/td&gt;
                        &lt;td&gt;atlético madrid&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;5&lt;/td&gt;
                        &lt;td&gt;borussia dortmund&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;6&lt;/td&gt;
                        &lt;td&gt;real madrid&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;7&lt;/td&gt;
                        &lt;td&gt;sevilla fc&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;8&lt;/td&gt;
                        &lt;td&gt;manchester city&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;9&lt;/td&gt;
                        &lt;td&gt;arsenal fc&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;10&lt;/td&gt;
                        &lt;td&gt;olympique lyon&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;11&lt;/td&gt;
                        &lt;td&gt;valencia cf&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;12&lt;/td&gt;
                        &lt;td&gt;vfl wolfsburg&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;13&lt;/td&gt;
                        &lt;td&gt;tottenham hotspur&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;14&lt;/td&gt;
                        &lt;td&gt;juventus&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;15&lt;/td&gt;
                        &lt;td&gt;sl benfica&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;16&lt;/td&gt;
                        &lt;td&gt;villarreal cf&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;17&lt;/td&gt;
                        &lt;td&gt;cska moskva&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;18&lt;/td&gt;
                        &lt;td&gt;athletic bilbao&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;19&lt;/td&gt;
                        &lt;td&gt;chelsea fc&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;20&lt;/td&gt;
                        &lt;td&gt;shakhtar donetsk&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;21&lt;/td&gt;
                        &lt;td&gt;as roma&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;22&lt;/td&gt;
                        &lt;td&gt;celta vigo&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;23&lt;/td&gt;
                        &lt;td&gt;inter&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;24&lt;/td&gt;
                        &lt;td&gt;ssc napoli&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;25&lt;/td&gt;
                        &lt;td&gt;bayer leverkusen&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;26&lt;/td&gt;
                        &lt;td&gt;as monaco&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;27&lt;/td&gt;
                        &lt;td&gt;bor. mönchengladbach&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;28&lt;/td&gt;
                        &lt;td&gt;stade reims&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;29&lt;/td&gt;
                        &lt;td&gt;as saint-étienne&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;30&lt;/td&gt;
                        &lt;td&gt;lazio roma&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;31&lt;/td&gt;
                        &lt;td&gt;fc lorient&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;32&lt;/td&gt;
                        &lt;td&gt;zenit st. petersburg&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;33&lt;/td&gt;
                        &lt;td&gt;fk krasnodar&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;34&lt;/td&gt;
                        &lt;td&gt;1899 hoffenheim&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;35&lt;/td&gt;
                        &lt;td&gt;toulouse fc&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;36&lt;/td&gt;
                        &lt;td&gt;acf fiorentina&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;37&lt;/td&gt;
                        &lt;td&gt;everton fc&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;38&lt;/td&gt;
                        &lt;td&gt;hellas verona&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;39&lt;/td&gt;
                        &lt;td&gt;malmö ff&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;40&lt;/td&gt;
                        &lt;td&gt;sampdoria&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;41&lt;/td&gt;
                        &lt;td&gt;olympique marseille&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;42&lt;/td&gt;
                        &lt;td&gt;lille osc&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;43&lt;/td&gt;
                        &lt;td&gt;rsc anderlecht&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;44&lt;/td&gt;
                        &lt;td&gt;celtic fc&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;45&lt;/td&gt;
                        &lt;td&gt;rb salzburg&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;46&lt;/td&gt;
                        &lt;td&gt;dinamo moskva&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;47&lt;/td&gt;
                        &lt;td&gt;olympiakos piräus&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;48&lt;/td&gt;
                        &lt;td&gt;real sociedad&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;49&lt;/td&gt;
                        &lt;td&gt;afc ajax&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;50&lt;/td&gt;
                        &lt;td&gt;liverpool fc&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;51&lt;/td&gt;
                        &lt;td&gt;montpellier hsc&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;52&lt;/td&gt;
                        &lt;td&gt;manchester united&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;53&lt;/td&gt;
                        &lt;td&gt;fc porto&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;54&lt;/td&gt;
                        &lt;td&gt;fc augsburg&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;55&lt;/td&gt;
                        &lt;td&gt;psv eindhoven&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;56&lt;/td&gt;
                        &lt;td&gt;club brugge kv&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;57&lt;/td&gt;
                        &lt;td&gt;leicester city&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;58&lt;/td&gt;
                        &lt;td&gt;asteras tripolis&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;59&lt;/td&gt;
                        &lt;td&gt;west ham united&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;60&lt;/td&gt;
                        &lt;td&gt;sassuolo calcio&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;61&lt;/td&gt;
                        &lt;td&gt;southampton fc&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;62&lt;/td&gt;
                        &lt;td&gt;girondins bordeaux&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;63&lt;/td&gt;
                        &lt;td&gt;crystal palace&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;64&lt;/td&gt;
                        &lt;td&gt;lokomotiv moskva&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;65&lt;/td&gt;
                        &lt;td&gt;brøndby if&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;66&lt;/td&gt;
                        &lt;td&gt;molde fk&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;67&lt;/td&gt;
                        &lt;td&gt;dinamo kiev&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;68&lt;/td&gt;
                        &lt;td&gt;fc københavn&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;69&lt;/td&gt;
                        &lt;td&gt;fenerbahçe&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;70&lt;/td&gt;
                        &lt;td&gt;paok saloniki&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;71&lt;/td&gt;
                        &lt;td&gt;ogc nice&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;72&lt;/td&gt;
                        &lt;td&gt;fc schalke 04&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;73&lt;/td&gt;
                        &lt;td&gt;werder bremen&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;74&lt;/td&gt;
                        &lt;td&gt;stoke city&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;75&lt;/td&gt;
                        &lt;td&gt;terek grozniy&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;76&lt;/td&gt;
                        &lt;td&gt;sporting cp&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;77&lt;/td&gt;
                        &lt;td&gt;espanyol barcelona&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;78&lt;/td&gt;
                        &lt;td&gt;kaa gent&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;79&lt;/td&gt;
                        &lt;td&gt;bsc young boys&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;80&lt;/td&gt;
                        &lt;td&gt;torino fc&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;81&lt;/td&gt;
                        &lt;td&gt;galatasaray&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;82&lt;/td&gt;
                        &lt;td&gt;dnipro dnipropetrovsk&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;83&lt;/td&gt;
                        &lt;td&gt;hamburger sv&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;84&lt;/td&gt;
                        &lt;td&gt;stade rennes&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;85&lt;/td&gt;
                        &lt;td&gt;levante ud&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;86&lt;/td&gt;
                        &lt;td&gt;fc twente&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;87&lt;/td&gt;
                        &lt;td&gt;rosenborg bk&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;88&lt;/td&gt;
                        &lt;td&gt;gd estoril&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;89&lt;/td&gt;
                        &lt;td&gt;ac milan&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;90&lt;/td&gt;
                        &lt;td&gt;fc basel&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;91&lt;/td&gt;
                        &lt;td&gt;fc nantes&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;92&lt;/td&gt;
                        &lt;td&gt;hannover 96&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;93&lt;/td&gt;
                        &lt;td&gt;panathinaikos&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;94&lt;/td&gt;
                        &lt;td&gt;az alkmaar&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;95&lt;/td&gt;
                        &lt;td&gt;sunderland afc&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;96&lt;/td&gt;
                        &lt;td&gt;sm caen&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;97&lt;/td&gt;
                        &lt;td&gt;rubin kazan&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;98&lt;/td&gt;
                        &lt;td&gt;fk ural&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;99&lt;/td&gt;
                        &lt;td&gt;1. fsv mainz 05&lt;/td&gt;
                    &lt;/tr&gt;
                    &lt;tr&gt;
                        &lt;td&gt;100&lt;/td&gt;
                        &lt;td&gt;standard liège&lt;/td&gt;
                    &lt;/tr&gt;
                &lt;/tbody&gt;
            &lt;/table&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Table One: Top 100 European Teams As Ranked By The Google PageRank Algorithm&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The initial results seem pretty feasible, with the top five spots comprising Paris Saint-Germain, Barcelona, Bayern München, Atlético Madrid and Borussia Dortmund. &lt;/p&gt;
&lt;p&gt;Olympique Lyon are something of a surprise in position ten, but they beat Paris Saint-Germain earlier in the season and have been playing in the Champions League, so perhaps they are better than my preconceptions suggested. FK Krasnodar also appear higher than I was expecting, but looking back at their results they did quite well in the Europa League this season, including beating Borussia Dortmund, so it's perhaps not unreasonable for them to appear in the top half of the rankings too.&lt;/p&gt;
&lt;p&gt;There are a number of ideas for improving this concept further, such as adding a decay into the data so more recent results carry greater importance in the rankings or adding in home field advantage (which is currently missing), so no doubt there'll be an update to this blog in the future once I've refined things.&lt;/p&gt;</content><category term="PageRank"></category><category term="Ranking"></category><category term="PageRank"></category></entry><entry><title>It’s Not Just The Money: Quality of Signings Suggest China Could Become a Major Power in Club Football</title><link href="2016/02/29/its-not-just-the-money-china-football/" rel="alternate"></link><published>2016-02-29T19:30:00+00:00</published><updated>2016-02-29T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2016-02-29:2016/02/29/its-not-just-the-money-china-football/</id><summary type="html">&lt;p&gt;John Burn-Murdoch and the Financial Times have used my PlayerRatings model to analyse the recent flux of players moving to teams in the Chinese Super League - you can read the full article &lt;a href="http://blogs.ft.com/ftdata/2016/02/28/china-football-signings-transfer-window/"&gt;here&lt;/a&gt;...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;&lt;a href="http://blogs.ft.com/ftdata/author/johnburnmurdoch/"&gt;John Burn-Murdoch&lt;/a&gt; and the &lt;a href="http://www.ft.com/home/uk"&gt;Financial Times&lt;/a&gt; have used my &lt;a href="http://pena.lt/y/2015/02/26/playerrating-a-bayesian-method-for-evaluating-football-players/"&gt;PlayerRatings model&lt;/a&gt; to analyse the recent flux of players moving to teams in the Chinese Super League - you can read the full article &lt;a href="http://blogs.ft.com/ftdata/2016/02/28/china-football-signings-transfer-window/"&gt;here&lt;/a&gt;&lt;/p&gt;</content><category term="PlayerRating"></category><category term="PlayerRating"></category></entry><entry><title>Frequency of Draws in Football</title><link href="2015/12/12/frequency-of-draws-in-football/" rel="alternate"></link><published>2015-12-12T19:30:00+00:00</published><updated>2015-12-12T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2015-12-12:2015/12/12/frequency-of-draws-in-football/</id><summary type="html">&lt;p&gt;There has been some discussion and misunderstanding around the low frequency of draws in football on my Twitter feed recently so I thought I'd just give a quick recap around why the probabilities of draws are so low...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;There has been some discussion and misunderstanding around the low frequency of draws in football on my Twitter feed recently so I thought I'd just give a quick recap around why the probabilities of draws are so low.&lt;/p&gt;
&lt;h2 id="the-bookmakers"&gt;The Bookmakers&lt;/h2&gt;
&lt;p&gt;Just to give you an idea around how infrequently draws occur in football, the figure below shows Bet365's odds of a draw for all English Premier League matches going back to the 2009/2010 season. The maximum probability they gave during this time was 33.3%, the minimum was 9.1% and the mean was 26.1% with a standard deviation of 4.1%. You can actually see that 33.3% value as it's so rare it stands out from all the other points on the chart. In fact, since 2009/2010 only eight matches have even gone above 31%. &lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20151229_draws.png"&gt;&lt;/p&gt;
&lt;h2 id="why-so-low"&gt;Why So Low?&lt;/h2&gt;
&lt;p&gt;Bet365's odds of a draw occurring are really narrow at 26.1% ± 4.1%. Is this range really plausible? &lt;/p&gt;
&lt;p&gt;Well, first of all let's start off by making the assumption that the betting industry know what they are doing and that their odds are reflective of what actually happens in football. Presuming this narrow range of draw probabilities is correct, let's work through some examples and find out why they are so low.&lt;/p&gt;
&lt;p&gt;Goals scored in football matches pretty much follow a &lt;a href="https://en.wikipedia.org/wiki/Poisson_distribution"&gt;Poisson distribution&lt;/a&gt; as shown in the figure below. Sure, it's not perfect, but as you can see it's very close and the bits where it doesn't quite match up can be accounted for in our calculations later on.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20151229_poisson_goals.png"&gt;&lt;/p&gt;
&lt;p&gt;Since we know the probability distribution that describes goal scoring it's actually fairly trivial to calculate the probability of a team scoring 0, 1, 2, 3 goals etc based on how many goals they score on average. For example, the figure below shows the probabilities for each goal from 0-10 for a match where the home team scores two goals on average and the away team scores one goal on average.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20151229_poisson.png"&gt;&lt;/p&gt;
&lt;p&gt;And since we now know the probabilities of how many goals each team will score, we can calculate the probabilities of each individual score line that could feasibly happen and put it into a table as shown in the figure below.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20151229_matrix1.png"&gt;&lt;/p&gt;
&lt;p&gt;To find out the probability of a draw, we can then sum up the probabilities of all the tied score lines (which I've highlighted in yellow for clarity)  giving us a 21.1% chance of a draw, rising to 24.7% once we account for football's divergence from the true Poisson distribution. &lt;/p&gt;
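&lt;p&gt;As a rough sketch of the calculation above, the snippet below builds the score line grid from two Poisson distributions and sums the diagonal. It assumes the two teams' goal counts are independent and caps the grid at ten goals each. The function names are purely illustrative, and it leaves out the adjustment for football's divergence from a true Poisson distribution, so it should only land close to the unadjusted figure.&lt;/p&gt;

```python
from math import exp, factorial


def poisson_pmf(goals, mean):
    """Probability of scoring exactly `goals` when the average is `mean`."""
    return exp(-mean) * mean ** goals / factorial(goals)


def score_matrix(home_mean, away_mean, max_goals=10):
    """Grid of score line probabilities: rows are home goals, columns are away goals."""
    return [
        [poisson_pmf(h, home_mean) * poisson_pmf(a, away_mean)
         for a in range(max_goals + 1)]
        for h in range(max_goals + 1)
    ]


def draw_probability(home_mean, away_mean, max_goals=10):
    """Sum the diagonal of the grid, i.e. 0-0, 1-1, 2-2 and so on."""
    grid = score_matrix(home_mean, away_mean, max_goals)
    return sum(grid[k][k] for k in range(max_goals + 1))


# Home side averaging two goals per match, away side averaging one.
p_draw = draw_probability(2.0, 1.0)
print(f"Unadjusted Poisson draw probability: {p_draw:.1%}")
```

&lt;p&gt;Swapping in different averages for &lt;code&gt;home_mean&lt;/code&gt; and &lt;code&gt;away_mean&lt;/code&gt; is a quick way to explore other match-ups.&lt;/p&gt;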
&lt;p&gt;This is pretty close to Bet365's average but still seems low overall so let's do the same again but for two identical teams both expected to score two goals each. Surely the chances of a draw will increase massively right?&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20151229_matrix2.png"&gt;&lt;/p&gt;
&lt;p&gt;Actually no, even with two identical teams the draw probability is still only around 25%, with home and away win probabilities of 37.5% each. Yes, even for two identical teams the least likely outcome is still the draw. &lt;/p&gt;
&lt;p&gt;Think about it: of all the possible score lines that could happen in a match, comparatively few lead to ties. There are many more score lines that result in a home or away win, so the probability of a draw is always low and, at realistic scoring rates, a draw will not be the most likely outcome.&lt;/p&gt;
&lt;p&gt;The chance of a draw does increase though for teams that score fewer goals as there is greater potential for both teams to fail to score, resulting in more nil-nils. If we change the example above from teams averaging two goals each to one goal each then the draw percentage increases from 25% to nearer 30%. &lt;/p&gt;
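&lt;p&gt;The trend is easy to check with a few lines of code. This is only a sketch using a plain Poisson model for two identical, independent teams, so the raw numbers will not match the adjusted percentages quoted in the text, but the direction of travel should be the same: the fewer goals the teams score, the more likely the draw.&lt;/p&gt;

```python
from math import exp, factorial


def draw_probability(mean_goals, max_goals=10):
    """Unadjusted Poisson draw probability for two identical teams."""
    pmf = [exp(-mean_goals) * mean_goals ** k / factorial(k)
           for k in range(max_goals + 1)]
    # Both teams share the same distribution, so a k-k draw has probability pmf[k] squared.
    return sum(p * p for p in pmf)


# Average goals per team mapped to the draw probability.
draws = {mean: draw_probability(mean) for mean in (1.0, 1.5, 2.0, 2.5)}

for mean, p in draws.items():
    # By symmetry the remaining probability is split evenly between the two sides.
    print(f"{mean:.1f} goals each: draw {p:.1%}, each win {(1 - p) / 2:.1%}")
```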
&lt;p&gt;There is a limit to how low scoring even the worst teams are though, which is why there is a natural limit on how high the bookmakers' draw probabilities will ever reach and why it's extremely rare you'll ever see anything above Bet365's previous maximum of 33.3%.&lt;/p&gt;
&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;Football is difficult to forecast accurately and draws can provide a quick way to sense check a model's feasibility - if you are seeing draw probabilities regularly breaking the 31% barrier then your model's accuracy is likely very poor and does not accurately reflect the true chances of seeing a draw. Plus, if your draw probabilities are inaccurate then your home / away win probabilities must be inaccurate too...&lt;/p&gt;
&lt;p&gt;Be very wary of high draw probabilities!&lt;/p&gt;</content><category term="Poisson"></category><category term="Poisson"></category></entry><entry><title>Updated Massey Ratings</title><link href="2015/11/24/updated-massey-rating/" rel="alternate"></link><published>2015-11-24T19:30:00+00:00</published><updated>2015-11-24T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2015-11-24:2015/11/24/updated-massey-rating/</id><summary type="html">&lt;p&gt;Updated Massey ratings for the English Premier League showing how well teams are really doing when you account for their strength of schedule...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;I've explained the theory behind Massey Ratings on the blog before (&lt;a href="http://pena.lt/y/2014/11/27/english-premier-league/"&gt;here&lt;/a&gt; and &lt;a href="http://pena.lt/y/2014/12/04/massey-ratings-for-football-part-two/"&gt;here&lt;/a&gt;) but in case you haven't come across them, they are a way of rating teams that takes into account the strength of the schedule they've faced. Teams get an overall rating that can also be split into separate defensive and offensive ratings to allow you to see where a team's strengths and weaknesses are.&lt;/p&gt;
&lt;p&gt;I've had quite a few requests to update the results from the previous posts so here are the team ratings for the English Premier League so far this season.&lt;/p&gt;
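&lt;p&gt;For anyone who wants a feel for the mechanics, here is a minimal sketch of the textbook Massey set-up for overall ratings: each match says the difference between two ratings should equal the goal margin, and the ratings are the least-squares solution with their sum fixed at zero. The function names and the three made-up results are purely illustrative, it ignores the defensive / offensive split, and it isn't necessarily how the figures in this post were produced.&lt;/p&gt;

```python
def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    rows = [row[:] + [value] for row, value in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


def massey_ratings(teams, results):
    """results is a list of (home, away, home_goals, away_goals) tuples."""
    index = {team: i for i, team in enumerate(teams)}
    n = len(teams)
    matrix = [[0.0] * n for _ in range(n)]
    margins = [0.0] * n
    for home, away, home_goals, away_goals in results:
        h, a = index[home], index[away]
        matrix[h][h] += 1   # games played by each side
        matrix[a][a] += 1
        matrix[h][a] -= 1   # games between the two sides
        matrix[a][h] -= 1
        margins[h] += home_goals - away_goals
        margins[a] += away_goals - home_goals
    # The system is singular on its own, so force the ratings to sum to zero.
    matrix[-1] = [1.0] * n
    margins[-1] = 0.0
    return dict(zip(teams, solve(matrix, margins)))


# Made-up results, purely for illustration.
ratings = massey_ratings(
    ["A", "B", "C"],
    [("A", "B", 2, 0), ("B", "C", 1, 0), ("A", "C", 3, 0)],
)
print(ratings)
```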
&lt;h2 id="overall-ratings"&gt;Overall Ratings&lt;/h2&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20151124_massey.png"&gt;
&lt;strong&gt;Figure One: EPL Massey Ratings&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="defensive-ratings"&gt;Defensive Ratings&lt;/h2&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20151124_massey_defensive.png"&gt;
&lt;strong&gt;Figure Two: EPL Defensive Massey Ratings&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="offensive-ratings"&gt;Offensive Ratings&lt;/h2&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20151124_massey_offensive.png"&gt;
&lt;strong&gt;Figure Three: EPL Offensive Massey Ratings&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;At the time of writing this article Leicester City were top of the English Premier League after thirteen matches; however, Massey only has them as sixth best, suggesting their schedule may have been somewhat easier than other teams so far. &lt;/p&gt;
&lt;p&gt;Massey actually rates Tottenham Hotspur as the best team in the league, slightly above both Arsenal and Manchester City so their promising start to the season is potentially even better than the league table shows.&lt;/p&gt;
&lt;p&gt;Poor Chelsea though are rated as sixth worst overall, similar in overall ability to Swansea City. Their attack is doing okay, with a rating sandwiched somewhere between Southampton and Liverpool. It's their defence that is letting them down, with a rating even worse than Aston Villa's!&lt;/p&gt;
&lt;p&gt;I'll be posting updated Massey ratings throughout the season so it will be interesting to track how they change as the football season progresses.&lt;/p&gt;</content><category term="Ratings"></category><category term="EPL"></category><category term="Massey Ratings"></category><category term="Ranking"></category></entry><entry><title>Mathematically Optimising Your Fantasy Football Team: Redux</title><link href="2015/08/07/mathematically-optimising-fantasy-football-team-redux/" rel="alternate"></link><published>2015-08-07T19:30:00+00:00</published><updated>2015-08-07T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2015-08-07:2015/08/07/mathematically-optimising-fantasy-football-team-redux/</id><summary type="html">&lt;p&gt;The Premier League’s fantasy football is back ready for the new season so I thought I’d run through an example of how linear programming can help you mathematically select your team.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Due to popular request, here's an updated version of a &lt;a href="http://pena.lt/y/2014/07/24/mathematically-optimising-fantasy-football-teams"&gt;blog I wrote&lt;/a&gt; around this time last year where I used mathematical optimisation to select the best fantasy football team possible.&lt;/p&gt;
&lt;h2 id="the-results"&gt;The Results&lt;/h2&gt;
&lt;p&gt;The team the linear solver selected using the latest Premier League data is shown in the table below – this is the squad with the highest possible number of points that can be achieved within the constraints we are working with (e.g. a maximum of three players from one team, a budget of £100 million, etc.).&lt;/p&gt;
&lt;table class=table&gt;
&lt;tr&gt; &lt;th&gt; Position &lt;/th&gt; &lt;th&gt; Team &lt;/th&gt; &lt;th&gt; Points &lt;/th&gt; &lt;th&gt; Name &lt;/th&gt; &lt;th&gt; Cost(£) &lt;/th&gt;  &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Goalkeeper &lt;/td&gt; &lt;td&gt; Swansea &lt;/td&gt; &lt;td align="center"&gt; 151 &lt;/td&gt; &lt;td&gt; Fabianski &lt;/td&gt; &lt;td align="center"&gt; 5.00 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Goalkeeper &lt;/td&gt; &lt;td&gt; Liverpool &lt;/td&gt; &lt;td align="center"&gt; 149 &lt;/td&gt; &lt;td&gt; Mignolet &lt;/td&gt; &lt;td align="center"&gt; 5.00 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Defender &lt;/td&gt; &lt;td&gt; Chelsea &lt;/td&gt; &lt;td align="center"&gt; 179 &lt;/td&gt; &lt;td&gt; Ivanovic &lt;/td&gt; &lt;td align="center"&gt; 7.00 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Defender &lt;/td&gt; &lt;td&gt; Chelsea &lt;/td&gt; &lt;td align="center"&gt; 177 &lt;/td&gt; &lt;td&gt; Terry &lt;/td&gt; &lt;td align="center"&gt; 7.00 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Defender &lt;/td&gt; &lt;td&gt; Everton &lt;/td&gt; &lt;td align="center"&gt; 142 &lt;/td&gt; &lt;td&gt; Jagielka &lt;/td&gt; &lt;td align="center"&gt; 5.50 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Defender &lt;/td&gt; &lt;td&gt; Liverpool &lt;/td&gt; &lt;td align="center"&gt; 142 &lt;/td&gt; &lt;td&gt; Clyne &lt;/td&gt; &lt;td align="center"&gt; 5.50 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Defender &lt;/td&gt; &lt;td&gt; Southampton &lt;/td&gt; &lt;td align="center"&gt; 140 &lt;/td&gt; &lt;td&gt; Bertrand &lt;/td&gt; &lt;td align="center"&gt; 5.50 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Midfielder &lt;/td&gt; &lt;td&gt; Chelsea &lt;/td&gt; &lt;td align="center"&gt; 233 &lt;/td&gt; &lt;td&gt; Hazard &lt;/td&gt; &lt;td align="center"&gt; 11.50 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Midfielder &lt;/td&gt; &lt;td&gt; Liverpool &lt;/td&gt; &lt;td align="center"&gt; 162 &lt;/td&gt; &lt;td&gt; Henderson &lt;/td&gt; &lt;td align="center"&gt; 7.00 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Midfielder &lt;/td&gt; &lt;td&gt; Spurs &lt;/td&gt; &lt;td align="center"&gt; 160 &lt;/td&gt; &lt;td&gt; Chadli &lt;/td&gt; &lt;td align="center"&gt; 7.00 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Midfielder &lt;/td&gt; &lt;td&gt; Crystal Palace &lt;/td&gt; &lt;td align="center"&gt; 132 &lt;/td&gt; &lt;td&gt; Bolasie &lt;/td&gt; &lt;td align="center"&gt; 6.50 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Midfielder &lt;/td&gt; &lt;td&gt; Swansea &lt;/td&gt; &lt;td align="center"&gt; 129 &lt;/td&gt; &lt;td&gt; Ki Sung-yueng &lt;/td&gt; &lt;td align="center"&gt; 5.50 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Forward &lt;/td&gt; &lt;td&gt; Spurs &lt;/td&gt; &lt;td align="center"&gt; 191 &lt;/td&gt; &lt;td&gt; Kane &lt;/td&gt; &lt;td align="center"&gt; 9.50 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Forward &lt;/td&gt; &lt;td&gt; West Brom &lt;/td&gt; &lt;td align="center"&gt; 148 &lt;/td&gt; &lt;td&gt; Berahino &lt;/td&gt; &lt;td align="center"&gt; 6.50 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Forward &lt;/td&gt; &lt;td&gt; Leicester &lt;/td&gt; &lt;td align="center"&gt; 130 &lt;/td&gt; &lt;td&gt; Ulloa &lt;/td&gt; &lt;td align="center"&gt; 6.00 &lt;/td&gt; &lt;/tr&gt;
 &lt;/table&gt;
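
&lt;p&gt;To give a flavour of what the constraints do, here is a toy sketch of the same selection problem. It is not the linear solver used for the table above: it simply brute-forces every possible squad from a tiny made-up player pool and keeps the highest scoring one that fits the budget and the per-team limit. The player data, names and limits are all invented for illustration, and position quotas are left out to keep it short.&lt;/p&gt;

```python
from collections import Counter
from itertools import combinations

# Made-up toy pool: (name, team, points, cost). Not real player data.
PLAYERS = [
    ("P1", "Red", 200, 11.0),
    ("P2", "Red", 180, 9.0),
    ("P3", "Red", 150, 6.0),
    ("P4", "Blue", 160, 7.0),
    ("P5", "Blue", 140, 5.5),
    ("P6", "Green", 130, 5.0),
]


def best_squad(players, squad_size, budget, max_per_team):
    """Exhaustively search for the highest scoring squad meeting the constraints."""
    best, best_points = None, -1
    for squad in combinations(players, squad_size):
        cost = sum(p[3] for p in squad)
        team_counts = Counter(p[1] for p in squad)
        # Skip squads that break the budget or the per-team limit.
        if cost > budget or max(team_counts.values()) > max_per_team:
            continue
        points = sum(p[2] for p in squad)
        if points > best_points:
            best, best_points = squad, points
    return best, best_points


squad, points = best_squad(PLAYERS, squad_size=3, budget=22.0, max_per_team=2)
print([p[0] for p in squad], points)
```

&lt;p&gt;Brute force only works for tiny pools like this one; with hundreds of real players the number of possible squads explodes, which is why a proper linear solver is the tool for the real thing.&lt;/p&gt;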

&lt;h2 id="limitations"&gt;Limitations&lt;/h2&gt;
&lt;p&gt;I'm going to point out the obvious limitations here before I get an inbox filled with grumpy messages. First of all, the new football season hasn't started so I’m using the points totals from last season. This means all the players at the promoted teams and any new signings to the Premier League will have zero points and so will not get selected. I’ll be running this script each week throughout the season for my own use, so as these players gain points they will start to get selected by the linear solver if they perform well enough. &lt;/p&gt;
&lt;p&gt;Also, this method doesn't account for injuries, suspensions, strength of the opposition, captains, substitutions, etc. However, it's worth noting that last season I finished in the top 1.5% of all players in the Premier League's competition using this idea to help inform my decisions. I didn't follow its recommendations rigidly but the data certainly guided my transfers and helped me finish as high up as I did.&lt;/p&gt;
&lt;p&gt;Finally, as I noted last year, it still seems that spending your money evenly across the squad works better than splashing a heap of money on a couple of star players and then making up the numbers with a load of budget players. Apart from the Chelsea players, it's a fairly run-of-the-mill squad that doesn't look too impressive at first glance but it's actually the highest scoring squad you could have purchased.&lt;/p&gt;
&lt;p&gt;Good luck!&lt;/p&gt;
&lt;h2 id="appendix"&gt;Appendix&lt;/h2&gt;
&lt;p&gt;All code is available on &lt;a href="https://github.com/martineastwood/penalty/tree/master/fantasy_football_optimiser"&gt;GitHub&lt;/a&gt;&lt;/p&gt;</content><category term="Fantasy Football"></category><category term="Fantasy Football"></category></entry><entry><title>Expected Goals And Support Vector Machines</title><link href="2015/07/13/expected-goals-svm/" rel="alternate"></link><published>2015-07-13T19:30:00+00:00</published><updated>2015-07-13T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2015-07-13:2015/07/13/expected-goals-svm/</id><summary type="html">&lt;p&gt;I've written about expected goals on this website before but I've changed approach recently so I thought I'd write up some of the different ideas I've been playing around with...&lt;/p&gt;</summary><content type="html">&lt;h3 id="tldr"&gt;tl;dr&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Support Vector Machines provide a viable approach for calculating expected goals&lt;/li&gt;
&lt;li&gt;Forecasting expected goals conceded seems more difficult than expected goals scored&lt;/li&gt;
&lt;li&gt;Expected goals have lots of variability associated with them so calculate confidence intervals rather than just point estimates&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="introduction"&gt;Introduction&lt;/h3&gt;
&lt;p&gt;I've written about &lt;a href="http://pena.lt/y/category/expected-goals.html"&gt;expected goals&lt;/a&gt; on this website before but I've changed approach recently so I thought I'd write up some of the different ideas I've been playing around with.&lt;/p&gt;
&lt;p&gt;My previous expected goals model was very basic. It was essentially a non-linear regression using an exponential decay curve fitted against shot xy coordinates. Sure, there were different curves for foot shots, headed shots etc but that was about as sophisticated as it got.&lt;/p&gt;
&lt;p&gt;An alternative approach though is to consider expected goals as a binary problem. Shots either end up in the net and score a goal or they don't. There is no other outcome, no middle ground, no half way, it's either a goal or it's not a goal. So, since this gives us just two distinct outcomes to predict, we can view expected goals as a classification problem rather than regression. &lt;/p&gt;
&lt;p&gt;What's the difference? Well, classification and regression are inherently linked but, to put it simply, classification forecasts whether something will happen whereas regression forecasts how much something will happen.&lt;/p&gt;
&lt;p&gt;This means we are interested in predicting the probability a shot produces a goal rather than how much of a goal a shot is worth (which is what my previous model did). It may sound a somewhat subtle difference between the two but it opens up the opportunity to use a number of different modelling techniques.&lt;/p&gt;
&lt;h3 id="support-vector-machines"&gt;Support Vector Machines&lt;/h3&gt;
&lt;p&gt;Support vector machines (SVM) are models often used in machine learning for data classification. They have the ability to analyse data sets and identify patterns that can then be used to forecast classes for new data points.&lt;/p&gt;
&lt;p&gt;SVMs do this by identifying the hyperplane that separates the different categories within a training data set by the largest margin. Okay, that sounds fairly complex so as a simple example, imagine having a collection of apples and oranges sat on a table top. You could separate the different fruit by drawing a line between them so that all the apples and oranges were on opposite sides of the line. Next, imagine the fruit are now floating at different heights in the air. That simple line we drew before isn't going to cut it so instead we slide a sheet of paper between the fruit to separate the apples from the oranges. As we add in more and more dimensions the sheet of paper becomes too simple to represent the separation between the fruit so we move to using a &lt;a href="https://en.wikipedia.org/wiki/Hyperplane"&gt;hyperplane&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;SVMs also have the added benefits that they can handle non-linear data and calculate probabilities rather than just output binary predictions. So, in terms of expected goals we've got a technique that can handle the &lt;a href="http://pena.lt/y/category/expected-goals.html"&gt;non-linearities in the shot distances&lt;/a&gt; and provide probabilities of shots resulting in goals. Sounds promising!&lt;/p&gt;
&lt;h3 id="the-data"&gt;The Data&lt;/h3&gt;
&lt;p&gt;Support Vector Machines rely on a technique known as supervised learning. This means we have to train the model using a set of shots labelled with whether they led to a goal or not. This training set was created from approximately 30,000 shots taken from the English Premier League's 2012/2013 and 2013/2014 seasons. This included open-play shots taken with head or feet, as well as free kicks and penalties.&lt;/p&gt;
&lt;p&gt;The model was then validated using shots taken during the English Premier League's 2014/2015 season. The reason for separating the data out like this is that it helps avoid &lt;a href="https://en.wikipedia.org/wiki/Overfitting"&gt;overfitting&lt;/a&gt; and provides a much more realistic estimation of how well the model really works. We want to create a model that generalises well and can forecast future shots. If we test the model on the same data we trained it on then we run the risk of optimising towards noise in the data and creating something that only forecasts well over the dataset it was trained with.&lt;/p&gt;
&lt;h3 id="initial-results"&gt;Initial Results&lt;/h3&gt;
&lt;p&gt;Let's start off simple and see if the preliminary results look feasible. In the test dataset there were 942 goals scored, while the trained SVM predicted 939 goals. This means we're out by just 3 goals, which is around 0.3% of the total. That looks like a pretty good start but we need to be careful here. We could be making really bad forecasts and just be lucky that all the errors are stacking up nicely and masking how bad we are.  &lt;/p&gt;
&lt;p&gt;We can find out the error by looking at the &lt;a href="https://en.wikipedia.org/wiki/Root-mean-square_deviation"&gt;root mean square error (RMSE)&lt;/a&gt;. This metric aggregates the differences between the model's forecasts and what actually happened to provide a single value representing the average error we made. For our SVM this came out at 0.269.&lt;/p&gt;
&lt;p&gt;On its own this RMSE doesn't really mean much. Is 0.269 good, bad or average? RMSE isn't really meant to be used on its own though; its strength lies in providing a simple metric that can be used to compare the predictive power of different models. So what do we compare with? Well, a recent article on &lt;a href="http://regressing.deadspin.com/why-soccers-most-popular-advanced-stat-kind-of-sucks-1685563075"&gt;Deadspin criticising expected goals&lt;/a&gt; suggested that ignoring shot location and just using the average conversion rate gives equivalent results so let's use that as our baseline to compare against.&lt;/p&gt;
&lt;p&gt;This naïve model, which is effectively just measuring total shots, predicted 978 goals for the test data set, with an RMSE of 0.294. So, on both metrics the SVM outperformed the baseline, meaning we are adding something worthwhile to the predictions.&lt;/p&gt;
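&lt;p&gt;To make the comparison concrete, here is a minimal sketch of how the two metrics above might be calculated. The shot outcomes and probabilities are made-up toy values rather than the data used in this article, the baseline uses the toy sample's own conversion rate for simplicity, and the function and variable names are my own assumptions for illustration.&lt;/p&gt;

```python
import math


def rmse(predictions, outcomes):
    """Root mean square error between goal probabilities and 0/1 outcomes."""
    squared_errors = [(p - y) ** 2 for p, y in zip(predictions, outcomes)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy data: 1 = goal, 0 = no goal (not the shots from the article)
outcomes = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]

# Made-up per-shot probabilities standing in for a location-aware model
model_probs = [0.05, 0.10, 0.60, 0.05, 0.10, 0.05, 0.40, 0.05, 0.05, 0.05]

# Naive baseline: every shot is given the same average conversion rate
conversion_rate = sum(outcomes) / len(outcomes)
naive_probs = [conversion_rate] * len(outcomes)

for name, probs in [("model", model_probs), ("naive", naive_probs)]:
    total_expected_goals = sum(probs)
    print(name, round(total_expected_goals, 2), round(rmse(probs, outcomes), 3))
```

&lt;p&gt;Summing the probabilities gives the total expected goals, while the RMSE penalises each individual shot's error, which is why a model can match the goal total and still have a poor RMSE.&lt;/p&gt;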
&lt;h3 id="are-you-certain-about-that"&gt;Are You Certain About That?&lt;/h3&gt;
&lt;p&gt;So far all the forecasts have been individual point estimates. We can do better than that though and provide a range of values that our expected goals forecast likely falls within.&lt;/p&gt;
&lt;p&gt;A common way to do this for SVMs is &lt;a href="https://en.wikipedia.org/wiki/Bootstrapping_(statistics)"&gt;Bootstrapping&lt;/a&gt;. Instead of using our training set to create a single model, we repeatedly resample the shots to create thousands of different combinations of our data and train different SVMs with them. We can then take the average prediction from our set of SVMs as the point estimate and use the distribution of all the individual predictions to calculate a confidence interval representing the uncertainty associated with our forecast.&lt;/p&gt;
&lt;p&gt;Doing this with 1,000 SVMs each trained on 30,000 resampled shots gives us our expected goals forecast above of 939 goals, with a 95% confidence interval of 820 - 1057 goals (approximately 12% in either direction, Figure One).&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20150713_expg_distribution.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure One: Distribution of expected goals&lt;/em&gt; &lt;/p&gt;
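&lt;p&gt;The resampling procedure described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the code behind Figure One: to keep it dependency-free, the SVM is swapped for a much simpler stand-in model (the goal rate per shot type), the shots are invented toy data, and all function and variable names are hypothetical.&lt;/p&gt;

```python
import random


def fit_conversion_rates(shots):
    """Stand-in for training an SVM: the goal rate for each shot type."""
    totals, goals = {}, {}
    for shot_type, is_goal in shots:
        totals[shot_type] = totals.get(shot_type, 0) + 1
        goals[shot_type] = goals.get(shot_type, 0) + is_goal
    return {t: goals[t] / totals[t] for t in totals}


def bootstrap_expected_goals(train, test, n_models=1000, seed=42):
    """Return (point estimate, lower, upper) for total expected goals."""
    rng = random.Random(seed)
    overall_rate = sum(is_goal for _, is_goal in train) / len(train)
    forecasts = []
    for _ in range(n_models):
        # Resample the training shots with replacement, same size as original
        sample = rng.choices(train, k=len(train))
        rates = fit_conversion_rates(sample)
        # Total expected goals this resampled model forecasts for the test shots
        forecasts.append(sum(rates.get(t, overall_rate) for t, _ in test))
    forecasts.sort()
    point = sum(forecasts) / n_models
    # 2.5th and 97.5th percentiles give an approximate 95% interval
    lower = forecasts[int(0.025 * n_models)]
    upper = forecasts[int(0.975 * n_models) - 1]
    return point, lower, upper


# Invented toy shots as (shot type, 1 if goal else 0)
train_shots = (
    [("foot", 1)] * 30 + [("foot", 0)] * 270
    + [("head", 1)] * 8 + [("head", 0)] * 92
    + [("penalty", 1)] * 15 + [("penalty", 0)] * 5
)
test_shots = [("foot", 0)] * 100 + [("head", 0)] * 30 + [("penalty", 0)] * 5

point, lower, upper = bootstrap_expected_goals(train_shots, test_shots)
print(round(point, 1), round(lower, 1), round(upper, 1))
```

&lt;p&gt;Swapping the stand-in for a real classifier only changes the fitting step; the resampling loop and the percentile calculation stay the same.&lt;/p&gt;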
&lt;h3 id="penalties"&gt;Penalties&lt;/h3&gt;
&lt;p&gt;Another quick check of the SVM's feasibility is how well it performs on penalties. In the test data set there were 83 penalties taken of which 20 were missed, making a penalty worth 0.76 goals on average. Using the Bootstrapped SVM, a penalty is valued at 0.75 expected goals, with a 95% confidence interval of 0.75 - 0.79. The confidence interval looks pretty narrow and neatly encompasses the expected value so again our SVM approach looks to be on the right track.&lt;/p&gt;
&lt;p&gt;I previously posted an &lt;a href="http://pena.lt/y/2014/08/28/expected-goals-foot-shots-versus-headers/"&gt;example from my exponential decay model&lt;/a&gt; showing that a headed shot taken from the penalty spot had a value of around 0.08 expected goals. Moving this example to the Bootstrapped SVM gives 0.081 expected goals, with a 95% confidence interval of 0.076 - 0.085. The exponential decay model worked reasonably well at the time (with some limitations) so it's reassuring to see the results are still in line with each other.&lt;/p&gt;
&lt;h3 id="expected-goals-by-team"&gt;Expected Goals By Team&lt;/h3&gt;
&lt;p&gt;This all looks good so far but we're not generally interested in predicting the total expected goals for a league - expected goals per team are of much more interest. Figure Two below shows expected goals scored versus actual goals scored for the English Premier League 2014/2015 season. The line denotes the linear regression between the two values and the shaded region the 95% confidence interval for the regression. For those interested, the r&lt;sup&gt;2&lt;/sup&gt; for the regression was 0.822 and the RMSE was 7.21. For comparison, the naïve model had an r&lt;sup&gt;2&lt;/sup&gt; of 0.707 and an RMSE of 9.16, both of which were inferior to the Bootstrapped SVM.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20150713_expg_for.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure Two: Expected versus actual goals for&lt;/em&gt; &lt;/p&gt;
&lt;p&gt;Figure Three below shows the same plot but for expected goals conceded versus actual goals conceded. Notice how much wider the shaded region is, meaning we have much less certainty forecasting goals conceded. This matches what &lt;a href="https://2plus2equals11.wordpress.com/2015/05/31/great-expectations/"&gt;Will Gurpinar-Morgan&lt;/a&gt; previously reported (incidentally, if you haven't read Will's article on expected goals it's well worth a read as he goes into the uncertainty aspect in more detail than I have here). As well as the confidence interval widening, the r&lt;sup&gt;2&lt;/sup&gt; dropped to 0.521 and the RMSE increased to 16.33. For comparison, the r&lt;sup&gt;2&lt;/sup&gt; (0.486) and RMSE (20.57) of the naïve model also worsened, and again were inferior to the SVM.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20150713_expg_away.png"&gt;&lt;/p&gt;
&lt;h3 id="expected-goals-by-player"&gt;Expected Goals By Player&lt;/h3&gt;
&lt;p&gt;And just for completeness, Figure Four shows expected goals versus actual goals at the player level. As expected, since we're slicing the data to a much more granular level, the r&lt;sup&gt;2&lt;/sup&gt; is slightly lower than for team goals at 0.786, with an RMSE of 1.70. It looks better than I initially expected though as the 95% confidence interval looks surprisingly narrow.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20150713_player_expg.png"&gt;&lt;/p&gt;
&lt;h3 id="discussion"&gt;Discussion&lt;/h3&gt;
&lt;p&gt;This article is already way too long so I'm going to draw it to a close here (congratulations if you made it all the way through!) but I am planning on writing more about expected goals and SVMs in future posts.&lt;/p&gt;
&lt;p&gt;The standout points for me though are that SVMs seem a viable approach to calculating expected goals and comfortably outperform the naïve model that ignores shot locations. The SVM model also handles both foot and head shots, as well as free kicks and penalties, which my previous exponential decay model could not.&lt;/p&gt;
&lt;p&gt;The fact that expected goals conceded forecasts are so much poorer than those for expected goals scored is intriguing and warrants further study. Presumably, there is some factor having a noticeable impact on whether the defending team concedes that the model does not currently account for and which has less of an impact on whether the attacking team scores.&lt;/p&gt;
&lt;p&gt;Finally, since there is a reasonable amount of uncertainty associated with expected goals forecasts I strongly advise calculating confidence intervals rather than just point estimates as it adds much more context to the metric.&lt;/p&gt;
&lt;p&gt;Thanks for reading!&lt;/p&gt;</content><category term="Expected Goals"></category><category term="Player Expected Goals"></category></entry><entry><title>PlayerRatings And False Negatives</title><link href="2015/04/23/playerratings-and-false-negatives/" rel="alternate"></link><published>2015-04-23T19:30:00+00:00</published><updated>2015-04-23T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2015-04-23:2015/04/23/playerratings-and-false-negatives/</id><summary type="html">&lt;p&gt;My &lt;a href="http://pena.lt/y/2015/04/09/backtesting-playerratings/"&gt;last article&lt;/a&gt; looked at how well my &lt;a href="http://www.pena.lt/y/category/playerrating.html"&gt;PlayerRatings&lt;/a&gt;  model predicted which young players would go on to have successful careers. This time we explore false negatives - which top players may have wrongly had low PlayerRatings in their youth...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;My &lt;a href="http://pena.lt/y/2015/04/09/backtesting-playerratings/"&gt;last article&lt;/a&gt; looked at how well my &lt;a href="http://www.pena.lt/y/category/playerrating.html"&gt;PlayerRatings&lt;/a&gt; model predicted which young players would go on to have successful careers. The results looked really exciting, with the majority of players aged 21 or under who were flagged by the model as having high potential going on to become world class players.&lt;/p&gt;
&lt;p&gt;Although it's fantastic that the model is correctly predicting these players' careers, there's something else we need to test and that's false negatives - how many talented players did the model miss and incorrectly flag as being unlikely to make the grade?&lt;/p&gt;
&lt;h2 id="false-negatives"&gt;False Negatives&lt;/h2&gt;
&lt;p&gt;False negatives here refer to those players who were initially ranked lowly by their PlayerRating score but went on to become world class players anyway. One way to identify these players is to work backwards from the top players of today to see how they were rated early on in their careers compared with their peers.&lt;/p&gt;
&lt;p&gt;The table below shows the top 25 players in the world today as judged by their PlayerRating score (incidentally, I'd be interested to hear any feedback on this list. Does it seem feasible? Are there any major names you think are missing? Anyone who doesn't deserve to be in the Top 25?).&lt;/p&gt;
&lt;table class="table"&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Player&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Rank&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Lionel Messi&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cristiano Ronaldo&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sergio Ramos&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Thomas Müller&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cesc Fàbregas&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rafinha&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bastian Schweinsteiger&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manuel Neuer&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mesut Özil&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Philipp Lahm&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wayne Rooney&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Piqué&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Karim Benzema&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Marcelo&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Iker Casillas&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Iniesta&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Jérôme Boateng&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Javier Mascherano&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Toni Kroos&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dani Alves&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ángel Di María&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maicon&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;David Silva&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arjen Robben&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vincent Kompany&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 1: The world's top 25 players by PlayerRating&lt;/em&gt;&lt;/p&gt;
&lt;h2 id="results"&gt;Results&lt;/h2&gt;
&lt;p&gt;Okay, now we've identified the best players in the world, let's take a look at how they ranked on their 21st birthday compared with all the other players aged 21 or under at that time.&lt;/p&gt;
&lt;p&gt;If a player's ranking was considerably lower at age 21 compared with their peers, yet they've still gone on to become world class, it suggests their initial PlayerRating score was too low and so they can be classified as a false negative.&lt;/p&gt;
&lt;table class="table"&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Player&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Rank At Age 21&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Lionel Messi&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cristiano Ronaldo&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sergio Ramos&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Thomas Müller&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cesc Fàbregas&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rafinha&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bastian Schweinsteiger&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manuel Neuer&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mesut Özil&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Philipp Lahm&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wayne Rooney&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Piqué&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Karim Benzema&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Marcelo&lt;/td&gt;
&lt;td&gt;283&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Iker Casillas&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Iniesta&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Jérôme Boateng&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Javier Mascherano&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Toni Kroos&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dani Alves&lt;/td&gt;
&lt;td&gt;71&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ángel Di María&lt;/td&gt;
&lt;td&gt;33&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maicon&lt;/td&gt;
&lt;td&gt;123&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;David Silva&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arjen Robben&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vincent Kompany&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 2: Players' ranks at age 21 compared with all other players aged 21 and under at the time&lt;/em&gt;&lt;/p&gt;
&lt;h2 id="discussion"&gt;Discussion&lt;/h2&gt;
&lt;p&gt;Again, the results look pretty impressive - the mode (or most frequently occurring rank) was position one, with a median rank of five suggesting that the overwhelming majority of today's world class players were correctly flagged as having high potential by the time they were 21.&lt;/p&gt;
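&lt;p&gt;For anyone wanting to check these summary statistics, they can be reproduced from the ranks in the second table with a few lines of Python. The variable names are my own, and the top-ten count is an extra figure derived from the same table rather than one quoted above.&lt;/p&gt;

```python
from statistics import median, mode

# Rank at age 21 for each of the top 25 players, in the order listed in the table
ranks_at_21 = [
    1, 2, 4, 1, 1, 16, 2, 26, 1, 4, 1, 21, 3,
    283, 2, 28, 8, 9, 6, 71, 33, 123, 7, 1, 5,
]

print(mode(ranks_at_21))    # most frequently occurring rank: 1
print(median(ranks_at_21))  # middle rank once sorted: 5

# How many of today's top 25 were already ranked in the top ten at age 21
top_ten = sum(1 for rank in ranks_at_21 if rank in range(1, 11))
print(top_ten, "of", len(ranks_at_21))  # 17 of 25
```

&lt;p&gt;The median is a better summary than the mean here because a handful of very low ranks, such as Marcelo's 283, would otherwise dominate the average.&lt;/p&gt;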
&lt;p&gt;The results aren't perfect though, especially for Maicon, Marcelo, Dani Alves and perhaps Ángel Di María but there is an obvious connection between these players - they all started their careers playing for South American teams.&lt;/p&gt;
&lt;p&gt;Unfortunately, since we are going back 8-10 years in many cases, there was less data publicly available documenting the earliest stages of these players' careers, hence their reduced ratings. As I've mentioned in previous articles, minutes-played is an important factor in determining how confident the PlayerRating model is in its predictions. Without an adequate volume of data, particularly in the early years of a player's career, the model tends to be cautious in its recommendations.&lt;/p&gt;
&lt;p&gt;Going forwards though, this becomes a non-issue as all the match data needed to calculate PlayerRatings is now freely available (albeit with a fair bit of hard work...) but it does help reinforce the point that any model is only ever as good as the data available to train it on.&lt;/p&gt;</content><category term="PlayerRating"></category><category term="Player Analytics"></category><category term="Recruitment"></category></entry><entry><title>Backtesting PlayerRatings</title><link href="2015/04/09/backtesting-playerratings/" rel="alternate"></link><published>2015-04-09T19:30:00+00:00</published><updated>2015-04-09T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2015-04-09:2015/04/09/backtesting-playerratings/</id><summary type="html">&lt;p&gt;Following my &lt;a href="http://pena.lt/y/2015/03/09/playerrating-and-team-quality/"&gt;last article&lt;/a&gt; discussing my &lt;a href="http://www.pena.lt/y/2015/02/26/playerrating-a-bayesian-method-for-evaluating-football-players/"&gt;PlayerRating model&lt;/a&gt; for quantifying footballers, JackIO challenged me to test the model by looking at what young players it recommended...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Following my &lt;a href="http://pena.lt/y/2015/03/09/playerrating-and-team-quality/"&gt;last article&lt;/a&gt; discussing my &lt;a href="http://www.pena.lt/y/2015/02/26/playerrating-a-bayesian-method-for-evaluating-football-players/"&gt;PlayerRating model&lt;/a&gt; for quantifying footballers, JackIO challenged me to test the model by looking at what young players it recommended. Rather than looking at what players score highly now though, I thought it would be more interesting to go back in time and look at what players it rated ten years ago. This way we can see whether the PlayerRating model was accurate and whether these players actually went on to have successful careers or not.&lt;/p&gt;
&lt;h2 id="a-decade-ago"&gt;A Decade Ago&lt;/h2&gt;
&lt;p&gt;Okay, let's go back a decade to April 2005. &lt;a href="https://www.youtube.com/watch?v=Xx8l5l1g0wA&amp;amp;spfreload=10"&gt;Tony Christie and Peter Kay&lt;/a&gt; were at number one in the music charts, Tony Blair and George Bush were at war with Iraq and &lt;a href="http://www.premierleague.com/en-gb/matchday/league-table.html?season=2004-2005&amp;amp;month=APRIL&amp;amp;timelineView=date&amp;amp;toDate=1112396400000&amp;amp;tableView=CURRENT_STANDINGS"&gt;Bolton Wanderers&lt;/a&gt; were five points away from the Champions League with just seven matches to play.&lt;/p&gt;
&lt;p&gt;So who were the top players to sign back then? The table below shows the top twenty rated players aged under 21 on the 1st April 2005. I've also added in the players' values at that point according to &lt;a href="http://www.transfermarkt.co.uk/"&gt;transfermarkt&lt;/a&gt;, the peak value they reached, how many full international caps they've had to date and the team they were playing for at the time.&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;b&gt;Player&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Nationality&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Age&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Team&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Value (£m)&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Peak (£m)&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Caps&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Alexis Ruano Delgado&lt;/td&gt;
&lt;td&gt;Spain&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;Málaga&lt;/td&gt;
&lt;td&gt;1.75&lt;/td&gt;
&lt;td&gt;8.8&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Wesley Sneijder&lt;/td&gt;
&lt;td&gt;Netherlands&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;AFC Ajax&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;39.6&lt;/td&gt;
&lt;td&gt;113&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Bastian Schweinsteiger&lt;/td&gt;
&lt;td&gt;Germany&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;Bayern Munich&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;35.2&lt;/td&gt;
&lt;td&gt;109&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Llorente&lt;/td&gt;
&lt;td&gt;Spain&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;Atlético Madrid&lt;/td&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;26.4&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;David Clarkson&lt;/td&gt;
&lt;td&gt;Scotland&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;Motherwell FC&lt;/td&gt;
&lt;td&gt;0.15&lt;/td&gt;
&lt;td&gt;0.8&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Nigel de Jong&lt;/td&gt;
&lt;td&gt;Netherlands&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;AFC Ajax&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;18.5&lt;/td&gt;
&lt;td&gt;80&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Michael Rensing&lt;/td&gt;
&lt;td&gt;Germany&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;Bayern Munich&lt;/td&gt;
&lt;td&gt;0.75&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Gonzalo Rodríguez&lt;/td&gt;
&lt;td&gt;Argentina&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;Villarreal CF&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;12.3&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Cristiano Ronaldo&lt;/td&gt;
&lt;td&gt;Portugal&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;Manchester United&lt;/td&gt;
&lt;td&gt;20.1&lt;/td&gt;
&lt;td&gt;105.6&lt;/td&gt;
&lt;td&gt;119&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Glen Johnson&lt;/td&gt;
&lt;td&gt;England&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;Chelsea FC&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;54&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Christian Lell&lt;/td&gt;
&lt;td&gt;Germany&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;Bayern Munich&lt;/td&gt;
&lt;td&gt;0.6&lt;/td&gt;
&lt;td&gt;3.5&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Javier Mascherano&lt;/td&gt;
&lt;td&gt;Argentina&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;River Plate&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;26.4&lt;/td&gt;
&lt;td&gt;111&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Iniesta&lt;/td&gt;
&lt;td&gt;Spain&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;Barcelona&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;61.6&lt;/td&gt;
&lt;td&gt;102&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Leighton Baines&lt;/td&gt;
&lt;td&gt;England&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;Wigan&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Wayne Rooney&lt;/td&gt;
&lt;td&gt;England&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;Manchester United&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;57&lt;/td&gt;
&lt;td&gt;103&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Fernando Torres&lt;/td&gt;
&lt;td&gt;Spain&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;Atlético Madrid&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;44&lt;/td&gt;
&lt;td&gt;110&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Obafemi Martins&lt;/td&gt;
&lt;td&gt;Nigeria&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;Inter Milan&lt;/td&gt;
&lt;td&gt;8.8&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;39&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Andreas Ottl&lt;/td&gt;
&lt;td&gt;Germany&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;Bayern Munich&lt;/td&gt;
&lt;td&gt;0.4&lt;/td&gt;
&lt;td&gt;2.2&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Piotr Trochowski&lt;/td&gt;
&lt;td&gt;Germany&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;Hamburger SV&lt;/td&gt;
&lt;td&gt;0.75&lt;/td&gt;
&lt;td&gt;8.8&lt;/td&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Per Mertesacker&lt;/td&gt;
&lt;td&gt;Germany&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;Hannover 96&lt;/td&gt;
&lt;td&gt;4.4&lt;/td&gt;
&lt;td&gt;16.7&lt;/td&gt;
&lt;td&gt;104&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;

&lt;p&gt;Reassuringly, of these 20 players 16 went on to receive full international caps for their countries and the overwhelming majority (if not all depending on your football geekiness) are recognizable names such as Ronaldo, Iniesta, Schweinsteiger, Sneijder etc. There's also a nice mixture of positions suggesting the model isn't necessarily biased to certain positions.&lt;/p&gt;
&lt;p&gt;A somewhat surprising inclusion on the list is David Clarkson, who's had something of a journeyman career to date. Looking back at his &lt;a href="http://en.wikipedia.org/wiki/David_Clarkson"&gt;playing history&lt;/a&gt; though, he broke into Motherwell's first team shortly after his 17th birthday and went on to score 14 goals the following season, so his inclusion back in 2005 seems valid. I don't know enough about Scottish football though to know whether that was a fluke season but his time afterwards in English football was far from successful, culminating with him being part of the Bristol Rovers squad that was relegated to non-league football at the end of the 2013/2014 season. If anyone knows more about him I'd be intrigued to hear!&lt;/p&gt;
&lt;p&gt;Another point worth highlighting is that all the players on the list are aged 19 or 20 and one reason for this is that minutes-played is an important factor in how the model determines a player's true rating - the more game time a player has then the more confidence the model has in its estimation of the player's true ability. Also, the majority of players in the top 20 are seemingly already at big teams by this age. It's alright if you're Chelsea and you have the finances to Hoover up all the young talent from other teams but for smaller teams looking to find promising players you've potentially missed the boat by that age.&lt;/p&gt;
&lt;p&gt;Because of this, I also took a quick look at what players aged 18 or under the model recommended too. I fully expected these results to look worse because these players have likely had less game time meaning there is less data for the PlayerRating model to work with but even so the results still look promising.&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;b&gt;Player&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Nationality&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Age&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Team&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Value (£m)&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Peak (£m)&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Caps&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Vincent Kompany&lt;/td&gt;
&lt;td&gt;Belgium&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;RSC Anderlecht&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Yoan Gouffran&lt;/td&gt;
&lt;td&gt;France&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Caen&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2.5&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Aiden McGeady&lt;/td&gt;
&lt;td&gt;Ireland&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Celtic&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;75&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Tony McMahon&lt;/td&gt;
&lt;td&gt;England&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Middlesbrough&lt;/td&gt;
&lt;td&gt;0.04&lt;/td&gt;
&lt;td&gt;1.3&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Sergio Ramos&lt;/td&gt;
&lt;td&gt;Spain&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Sevilla FC&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;td&gt;126&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Cesc Fàbregas&lt;/td&gt;
&lt;td&gt;Spain&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;Arsenal FC&lt;/td&gt;
&lt;td&gt;6.6&lt;/td&gt;
&lt;td&gt;48.4&lt;/td&gt;
&lt;td&gt;94&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Igor Akinfeev&lt;/td&gt;
&lt;td&gt;Russia&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;CSKA Moscow&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;17.6&lt;/td&gt;
&lt;td&gt;78&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Tom Huddlestone&lt;/td&gt;
&lt;td&gt;England&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Derby County&lt;/td&gt;
&lt;td&gt;0.3&lt;/td&gt;
&lt;td&gt;10.6&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Yoann Gourcuff&lt;/td&gt;
&lt;td&gt;France&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Rennes&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;21.1&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Fernando Gago&lt;/td&gt;
&lt;td&gt;Argentina&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Boca Juniors&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;57&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Quincy Owusu-Abeyie&lt;/td&gt;
&lt;td&gt;Ghana&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Arsenal FC&lt;/td&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;2.5&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Manuel Neuer&lt;/td&gt;
&lt;td&gt;Germany&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Schalke 04&lt;/td&gt;
&lt;td&gt;0.06&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;td&gt;58&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Christian Fuchs&lt;/td&gt;
&lt;td&gt;Austria&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;SV Mattersburg&lt;/td&gt;
&lt;td&gt;0.18&lt;/td&gt;
&lt;td&gt;6.6&lt;/td&gt;
&lt;td&gt;65&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Edin Džeko&lt;/td&gt;
&lt;td&gt;Bosnia-Herzegovina&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Zeljeznicar&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;28.2&lt;/td&gt;
&lt;td&gt;71&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;James Morrison&lt;/td&gt;
&lt;td&gt;Scotland&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Middlesbrough&lt;/td&gt;
&lt;td&gt;0.7&lt;/td&gt;
&lt;td&gt;6.6&lt;/td&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Ibrahim Afellay&lt;/td&gt;
&lt;td&gt;Netherlands&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;PSV Eindhoven&lt;/td&gt;
&lt;td&gt;0.1&lt;/td&gt;
&lt;td&gt;12.3&lt;/td&gt;
&lt;td&gt;49&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Ryan Babel&lt;/td&gt;
&lt;td&gt;Netherlands&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;AFC Ajax&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;15.2&lt;/td&gt;
&lt;td&gt;43&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Marco Motta&lt;/td&gt;
&lt;td&gt;Italy&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Atalanta&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Charles NZogbia&lt;/td&gt;
&lt;td&gt;France&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Newcastle Utd&lt;/td&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;10.6&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Stéphane Mbia&lt;/td&gt;
&lt;td&gt;Cameroon&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Rennes&lt;/td&gt;
&lt;td&gt;0.2&lt;/td&gt;
&lt;td&gt;13.2&lt;/td&gt;
&lt;td&gt;52&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;

&lt;p&gt;Anyone fancy signing an 18-year-old Manuel Neuer for £60k? How about Christian Fuchs for £180k? Tony McMahon for £40k? Who? To be fair to Tony McMahon, back in 2005 he was aged 18, had represented England at under-16, under-17 and under-19 level, captained Middlesbrough to the FA Youth championship, was playing regular first team football and had appeared in the UEFA Cup. It's an impressive start to anyone's career, and who knows what would have happened had he not knackered his knees and broken his leg from pretty much that point onwards.&lt;/p&gt;
&lt;p&gt;I know the results of all this are certainly not perfect (or even scientific) and there are most likely some really good players who were ranked outside the top 20 and so haven't made this list (e.g. a very young Lionel Messi at position 28) but even so the results so far still look feasible, especially considering the young age of the players. Had anyone taken a punt on this group of players then they could have put together a really exciting squad for a tiny price that would have made a hefty long-term profit. Sounds good to me!&lt;/p&gt;
&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;Okay, let's get one thing straight before my inbox is filled with people moaning - I'm not advocating teams should sign players based just on numbers, let alone my PlayerRating model - I firmly believe football scouting should be a mixture of the qualitative and quantitative. Also, I know this article is somewhat subjective about what counts as successful / not successful and just because Transfermarkt says Manuel Neuer was worth £60k back in 2005 doesn't mean he was actually available for that price. I'm pretty sure Schalke would have twigged he was rather good at football by then and not wanted to sell him off quite so cheaply!!!&lt;/p&gt;
&lt;p&gt;However, considering the paucity of data available for young players and the high attrition rate of young players making the grade I think the results of the PlayerRating model look really promising - I haven't tweaked anything here, these are the actual players the model would have rated back then based purely on the data available up until April 2005. I've still got lots more ideas for improving the PlayerRating system further but even so I'm really excited by the results so far.&lt;/p&gt;</content><category term="PlayerRating"></category><category term="Player Analytics"></category><category term="Recruitment"></category></entry><entry><title>PlayerRatings And Team Quality</title><link href="2015/03/09/playerrating-and-team-quality/" rel="alternate"></link><published>2015-03-09T19:30:00+00:00</published><updated>2015-03-09T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2015-03-09:2015/03/09/playerrating-and-team-quality/</id><summary type="html">&lt;p&gt;My previous article introduced PlayerRatings, a mathematical model I’ve been working on over the past few months to quantify the ability of individual footballers. One of the nice characteristics of this approach is that player ratings can be aggregated together to create team ratings...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Last week I introduced &lt;a href="http://www.pena.lt/y/2015/02/26/playerrating-a-bayesian-method-for-evaluating-football-players/"&gt;PlayerRatings&lt;/a&gt;, a mathematical model I’ve been working on over the past few months to quantify the ability of individual footballers.&lt;/p&gt;
&lt;h2 id="aggregable"&gt;Aggregable&lt;/h2&gt;
&lt;p&gt;One of the nice characteristics of this approach is that player ratings can be aggregated together to create team ratings that correlate strongly with their team’s performance. For example, Figure One below shows the relationship between the average rating of a team’s players and how many points the team achieved in the English Premier League for the past eight seasons.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20150309_playerrating_vs_points.png"&gt;&lt;/p&gt;
&lt;p&gt;The r&lt;sup&gt;2&lt;/sup&gt; value for this correlation is 0.73, essentially meaning that 73% of the variance associated with how many points a team achieves is captured by the average ratings of its players. To put this number into context, &lt;a href="http://www.pena.lt/y/2013/04/02/understanding-total-shot-ratio-in-football/"&gt;Total Shot Ratio&lt;/a&gt; (TSR), which is widely used among the analytics community to assess a team's performance, has an r&lt;sup&gt;2&lt;/sup&gt; value of around 0.68 when correlated with points.&lt;/p&gt;
&lt;p&gt;This means that something as simple as taking a team’s average PlayerRating potentially provides us with more information than its Total Shot Ratio does. There is also scope to potentially improve this further as there is no doubt a more elegant solution than just taking the average, e.g. factoring in substitute appearances, injuries, opposition quality etc.**&lt;/p&gt;
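&lt;p&gt;As a rough illustration of the aggregation step, the sketch below computes a minutes-weighted average rating for a squad (the weighting described in the addendum) and the r&lt;sup&gt;2&lt;/sup&gt; between a set of team ratings and points. The function names and every number in it are made up for the example; none of it is taken from the actual PlayerRating data or code.&lt;/p&gt;

```python
def team_rating(players):
    """Average player rating weighted by minutes played.

    players: list of (rating, minutes) tuples, all hypothetical values.
    """
    total_minutes = sum(minutes for _, minutes in players)
    return sum(rating * minutes for rating, minutes in players) / total_minutes


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov ** 2 / (var_x * var_y)


# Made-up squad: (rating, minutes played)
squad = [(150, 3000), (120, 1500), (100, 500)]
print(team_rating(squad))  # 136.0

# Made-up team ratings and season points
ratings = [141, 133, 120, 111, 106]
points = [86, 75, 60, 45, 33]
print(round(r_squared(ratings, points), 3))
```

&lt;p&gt;Weighting by minutes means a fringe player with a handful of appearances barely moves the team figure, while the regulars dominate it.&lt;/p&gt;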
&lt;h2 id="baselines"&gt;Baselines&lt;/h2&gt;
&lt;p&gt;Since PlayerRatings correlate well with points we can use this data to set approximate baselines for what quality of squad is needed to win the league, qualify for Europe, avoid relegation etc (Figure Two).&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20150309_playerrating_baselines.png"&gt;&lt;/p&gt;
&lt;p&gt;For example, over the past few seasons the Premier League champions have had an average PlayerRating of around 141 +/- 11, teams reaching the Champions League through positions two, three or four have averaged a PlayerRating of 133 +/- 11, teams in positions five to seventeen have averaged a PlayerRating of 111 +/- 7 and teams relegated have averaged a PlayerRating of 106 +/- 4.&lt;/p&gt;
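&lt;p&gt;To make the baselines concrete, here is a minimal sketch that checks which of the bands quoted above a given team rating falls into. It assumes each +/- figure can be read as a simple range either side of the average, which is my simplification for the example; the function name is hypothetical too.&lt;/p&gt;

```python
# Bands quoted above: (label, average rating, plus/minus range)
BASELINES = [
    ("champions", 141, 11),
    ("Champions League places", 133, 11),
    ("positions five to seventeen", 111, 7),
    ("relegated", 106, 4),
]


def matching_bands(team_rating):
    """Return the label of every band whose range contains the rating."""
    return [
        label
        for label, average, spread in BASELINES
        if spread >= abs(team_rating - average)
    ]


print(matching_bands(145))  # ['champions']
print(matching_bands(108))  # ['positions five to seventeen', 'relegated']
```

&lt;p&gt;The bands overlap, so a single rating can sit in more than one group, which is exactly why a mid-table squad can still look like a relegation candidate.&lt;/p&gt;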
&lt;p&gt;So, as it stands this season, the only teams whose PlayerRatings look good enough to win the league are Chelsea and Manchester City (no surprises there then). At the bottom of the table though Hull, Sunderland, Aston Villa, Queens Park Rangers, Leicester, Crystal Palace and West Bromwich Albion all have team ratings that fall within the typical range of relegation candidates. &lt;/p&gt;
&lt;p&gt;This is reassuring as it matches the clear lack of parity we see in the English Premier League - there are far fewer teams capable of winning the league than there are at risk of relegation, which matches &lt;a href="http://www.pena.lt/y/2012/11/20/disparity-in-european-football-leagues/"&gt;previous observations I've made&lt;/a&gt;. I'm really interested in the distributions of these different groups so I'm planning on looking at this in more detail over the coming weeks.&lt;/p&gt;
&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;As well as allowing individual players to be quantified, PlayerRatings can also be aggregated together to create team ratings that correlate more strongly with points achieved over the course of a season than TSR. These team ratings could then be used to assess squad quality or evaluate whether potential signings are worthwhile. For example, will that expensive new striker on a five-year contract actually increase your squad's average rating and push it closer to the level required to challenge for Europe?&lt;/p&gt;
&lt;h2 id="addendum"&gt;Addendum&lt;/h2&gt;
&lt;p&gt;**To be a bit more detailed, it’s actually the average of all the team’s players who started a league fixture that season weighted by the number of minutes they played in total.&lt;/p&gt;</content><category term="PlayerRating"></category><category term="Player Analytics"></category><category term="TSR"></category></entry><entry><title>PlayerRating: A Bayesian Method For Evaluating Football Players</title><link href="2015/02/26/playerrating-a-bayesian-method-for-evaluating-football-players/" rel="alternate"></link><published>2015-02-26T19:30:00+00:00</published><updated>2015-02-26T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2015-02-26:2015/02/26/playerrating-a-bayesian-method-for-evaluating-football-players/</id><summary type="html">&lt;p&gt;I originally submitted the idea behind this article to the recent Opta Pro Forum and although it was turned down I thought I’d write it up anyway incase anyone else was interested in the results...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;I originally submitted the idea behind this article to the recent &lt;a href="http://www.optasportspro.com/about/optapro-blog/posts/2014/news-optapro-analytics-forum-opens-for-entries.aspx"&gt;Opta Pro Forum&lt;/a&gt; and although it was turned down I thought I’d write it up anyway in case anyone else was interested in the results.&lt;/p&gt;
&lt;h2 id="plusminus"&gt;Plus/Minus&lt;/h2&gt;
&lt;p&gt;The premise of my abstract was that while the &lt;a href="http://en.wikipedia.org/wiki/Plus-minus"&gt;plus/minus score&lt;/a&gt; is popular in the analysis of many sports, such as NHL and MLB, it hasn't taken off in football. And there is a good reason for that - it’s hard to do.&lt;/p&gt;
&lt;p&gt;For anyone who hasn’t come across them before, plus/minus scores measure a team's goal difference while an individual player is on the pitch. Players with a positive score are considered to have a favourable effect on the team’s overall performance while those with a negative score are causing the team to perform worse.&lt;/p&gt;
&lt;p&gt;It’s a simple concept that sounds feasible enough but it has a big flaw in that it treats all players equally, so it is biased towards players on good teams. Think what would happen if you put me into Barcelona’s first team: they’d probably still win more matches than they’d lose and I’d have a positive plus/minus, making me look like a great footballer. In reality, I’d have been flailing around hopelessly and would have been lucky to have even touched the ball, let alone made a positive contribution.&lt;/p&gt;
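&lt;p&gt;Here is a minimal sketch of the raw plus/minus calculation, using invented data for a single team split into stints between substitutions. The player names, the stint format and the function name are all assumptions made for the example.&lt;/p&gt;

```python
from collections import defaultdict


def plus_minus(stints):
    """Sum each player's team goal difference over the stints they played.

    stints: list of (players_on_pitch, goals_for, goals_against) tuples
    for a single team; the data below is invented for illustration.
    """
    scores = defaultdict(int)
    for players, goals_for, goals_against in stints:
        for player in players:
            scores[player] += goals_for - goals_against
    return dict(scores)


stints = [
    ({"Star", "Passenger"}, 3, 0),  # both on the pitch, team goes 3-0 up
    ({"Star", "Sub"}, 1, 1),        # Passenger replaced, 1-1 for the rest
]
print(plus_minus(stints))
```

&lt;p&gt;In this toy data the passenger ends up with exactly the same +3 as the star, simply for being on the pitch at the same time, which is the bias described above.&lt;/p&gt;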
&lt;h2 id="the-adjusted-plusminus"&gt;The Adjusted Plus/Minus&lt;/h2&gt;
&lt;p&gt;One solution to this bias is the adjusted plus/minus. This incorporates a linear regression into the calculation to account for the effect of all the other players on the pitch during the match in order to avoid a player’s score being inflated by his team mates.&lt;/p&gt;
&lt;p&gt;However, as &lt;a href="http://www.soccermetrics.net/player-performance/adjusted-plus-minus-deep-analysis"&gt;Howard Hamilton&lt;/a&gt; has previously shown on his blog, the adjusted plus/minus doesn’t work well for football. With only three substitutes per team and 38 league matches per season there is little data available to cover all the possible combinations of players. On top of that, some players, goalkeepers in particular, play such a large portion of the available minutes that it is virtually impossible to distinguish the true effect of removing them from the team. And with football being such a low scoring game there is a lot of noise in the data, which increases the regression’s prediction errors.&lt;/p&gt;
&lt;p&gt;As an example, here’s the current top 10 best players as rated by adjusted plus/minus scores for the English Premier League so far this season.&lt;/p&gt;
&lt;table class="table"&gt;
    &lt;tr&gt;
        &lt;td&gt;&lt;strong&gt;Player&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Joleon Lescott&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Ahmed El Mohamadi&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Tom Heaton&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Jonas Olsson&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Ashley Williams&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Gareth Barry&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Wes Morgan&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Victor Moses&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Oussama Assaidi&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Steven NZonzi&lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;Now, I don’t know about you but I’m pretty sure Joleon Lescott is not the league’s best player. And what about some of the Premier League’s big stars? Well, Cesc Fàbregas is rated as the 86th best player, Diego Costa is 199th and Sergio Agüero is down in position 256. In fact the error for Sergio Agüero’s plus/minus score is so high that we can’t even tell whether he has a positive or negative rating. Yep, according to the adjusted plus/minus Sergio Agüero may actually be having a negative effect on Manchester City’s performances this season. Okaaay then.&lt;/p&gt;
&lt;h2 id="regularised-adjusted-plusminus"&gt;Regularised Adjusted Plus/Minus&lt;/h2&gt;
&lt;p&gt;The next step from here is to try and reduce the errors by moving from a standard linear regression to a &lt;a href="http://en.wikipedia.org/wiki/Tikhonov_regularization"&gt;ridge regression&lt;/a&gt;. I’m not going to go into too much detail as again &lt;a href="http://www.soccermetrics.net/player-performance/adjusted-plus-minus-deep-analysis"&gt;Howard Hamilton&lt;/a&gt; has a great article on this but the idea is that ridge regression helps minimise the errors associated with the player’s plus/minus scores. As with everything in life though, there is no such thing as a free lunch and by making the regression behave better we incorporate some bias into the results. But is it worth it? Nope, the results using ridge regression still have too much error to be useful. Hands up if you think Chris Smalling is the Premier League’s best player. Nobody? Right, let's move on then.&lt;/p&gt;
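&lt;p&gt;For anyone curious what the regression looks like in practice, below is a small pure-Python sketch of a ridge-regularised adjusted plus/minus. Each row of the design matrix is a stint, with +1 for players on one side, -1 for players on the other and the goal difference as the target. The stint data, the function names (&lt;code&gt;solve&lt;/code&gt;, &lt;code&gt;ridge_plus_minus&lt;/code&gt;) and the penalty value are all made up for illustration; this is not the code behind the results discussed here.&lt;/p&gt;

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * y for x, y in zip(m[row], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_plus_minus(stints, players, lam):
    """Solve (X'X + lam * I) beta = X'y for the player ratings.

    stints: list of (side_one_players, side_two_players, goal_difference).
    lam: ridge penalty; larger values shrink ratings harder towards zero.
    """
    idx = {player: i for i, player in enumerate(players)}
    n = len(players)
    xtx = [[0.0] * n for _ in range(n)]
    xty = [0.0] * n
    for side_one, side_two, goal_diff in stints:
        row = [0.0] * n
        for player in side_one:
            row[idx[player]] = 1.0
        for player in side_two:
            row[idx[player]] = -1.0
        for i in range(n):
            xty[i] += row[i] * goal_diff
            for j in range(n):
                xtx[i][j] += row[i] * row[j]
    for i in range(n):
        xtx[i][i] += lam  # the ridge penalty sits on the diagonal
    return dict(zip(players, solve(xtx, xty)))


# Toy data: A and B always play together against C
stints = [(["A", "B"], ["C"], 2), (["A", "B"], ["C"], 0)]
ratings = ridge_plus_minus(stints, ["A", "B", "C"], lam=1.0)
print(ratings)
```

&lt;p&gt;Because A and B are never separated, a plain regression has no unique answer for them. The penalty makes the system solvable and simply splits the credit evenly, at the cost of pulling every rating towards zero, which is the bias mentioned above.&lt;/p&gt;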
&lt;h2 id="next-steps"&gt;Next Steps&lt;/h2&gt;
&lt;p&gt;So now what? One of the major problems we have is a lack of data for many of the players so let's take a more Bayesian approach and add in something called a Prior. A Prior is basically a probability distribution covering some aspect of what we want to predict that expresses our uncertainty before we account for the evidence. Where we don’t have much data this Prior helps inform our predictions but as we accumulate more data the Prior’s influence decreases and the real evidence holds more weight.&lt;/p&gt;
&lt;p&gt;Okay, that probably sounds a bit complicated if you don’t have a maths background so here’s an example: imagine you’re watching a footballer play for the first time, there is a chance the player may be as good as Lionel Messi, there is a chance they may be as bad as Tom Cleverley, and there is a chance they may be somewhere in-between and be average. As the game progresses you see them play and form your conclusion as to whether they're any good or not.&lt;/p&gt;
&lt;p&gt;This is essentially how my PlayerRating model works. Based on preliminary data it constructs a set of Priors and estimates the probability of the player being world class, average or stealing a living in the sport. As the player’s career progresses the model gains more data about them and the estimates iteratively move away from the Prior towards the Player’s true rating.&lt;/p&gt;
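&lt;p&gt;To show the general idea of a Prior losing its grip as evidence arrives, here is a tiny sketch using the textbook normal-normal update. To be clear, this is only a stand-in chosen because it fits in a few lines: the distributions, the numbers and the function name are my assumptions for the example and say nothing about how PlayerRating is actually implemented.&lt;/p&gt;

```python
def posterior(prior_mean, prior_var, observations, obs_var):
    """Normal-normal conjugate update: returns (posterior mean, variance)."""
    n = len(observations)
    precision = 1.0 / prior_var + n / obs_var
    mean = (prior_mean / prior_var + sum(observations) / obs_var) / precision
    return mean, 1.0 / precision


prior_mean, prior_var = 0.0, 1.0  # "probably average" before any evidence
obs_var = 4.0                     # assumed noise in a single performance

one_game = [2.0]                  # one excellent performance
many_games = [2.0] * 50           # fifty of them

print(posterior(prior_mean, prior_var, one_game, obs_var))
print(posterior(prior_mean, prior_var, many_games, obs_var))
```

&lt;p&gt;After one great game the estimate only creeps from 0 to 0.4 because the Prior still dominates. After fifty it sits close to the observed level of 2 and the uncertainty has shrunk considerably.&lt;/p&gt;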
&lt;h2 id="playerrating"&gt;PlayerRating&lt;/h2&gt;
&lt;p&gt;The PlayerRating model works by combining a number of factors for each player into a single rating. The raw ratings are typically very small numbers so to try and keep things a bit more understandable they get rescaled to centre them around 100 and make them look a little bit like a percentage. It’s not really a percentage but since everyone is familiar with that kind of number it's hopefully a bit less scary.&lt;/p&gt;
&lt;p&gt;So what do the results look like? Well, as a starting point here is the current top twenty rated players:&lt;/p&gt;
&lt;table class="table"&gt;
    &lt;tr&gt;
        &lt;td&gt;&lt;strong&gt;Player&lt;/strong&gt;&lt;/td&gt;
        &lt;td&gt;&lt;strong&gt;Rating&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Lionel Messi&lt;/td&gt;
        &lt;td&gt;174&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Cristiano Ronaldo&lt;/td&gt;
        &lt;td&gt;173&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Sergio Ramos&lt;/td&gt;
        &lt;td&gt;163&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Cesc F&amp;agrave;bregas&lt;/td&gt;
        &lt;td&gt;162&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Thomas M&amp;uuml;ller&lt;/td&gt;
        &lt;td&gt;160&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Philipp Lahm&lt;/td&gt;
        &lt;td&gt;158&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Bastian Schweinsteiger&lt;/td&gt;
        &lt;td&gt;157&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Rafinha&lt;/td&gt;
        &lt;td&gt;157&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Manuel Neuer&lt;/td&gt;
        &lt;td&gt;156&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Wayne Rooney&lt;/td&gt;
        &lt;td&gt;155&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Mesut &amp;Ouml;zil&lt;/td&gt;
        &lt;td&gt;155&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Piqu&amp;eacute;&lt;/td&gt;
        &lt;td&gt;155&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Karim Benzema&lt;/td&gt;
        &lt;td&gt;153&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Iker Casillas&lt;/td&gt;
        &lt;td&gt;151&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Iniesta&lt;/td&gt;
        &lt;td&gt;150&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Javier Mascherano&lt;/td&gt;
        &lt;td&gt;148&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Toni Kroos&lt;/td&gt;
        &lt;td&gt;148&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;J&amp;eacute;r&amp;ocirc;me Boateng&lt;/td&gt;
        &lt;td&gt;147&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;&amp;Aacute;ngel Di Mar&amp;iacute;a&lt;/td&gt;

        &lt;td&gt;145&lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;And just for the fun of it here are Lionel Messi and Cristiano Ronaldo’s careers to date:
&lt;img alt="Pelican" src="../../../../images/messi_ronaldo_pr.png"&gt;&lt;/p&gt;
&lt;h2 id="what-next"&gt;What Next?&lt;/h2&gt;
&lt;p&gt;This is still really early stages and the work is far from finished but I wanted to get something up on the blog as it will encourage me to keep working on it and to document its progress. The next step is to dig through the data further to gain a better understanding of where this approach is working / not working so well and start to refine things. For example, goalkeepers are currently treated the same as outfield players and I suspect their ratings may be improved by having their own set of Priors.&lt;/p&gt;
&lt;p&gt;After that there are lots of other things I want to take a look at, such as how well the ratings predict the trajectory of the player’s remaining career, how to extract confidence intervals, what's the effect of swapping an individual player out of a team and so on. My todo list is growing at a rapid rate!&lt;/p&gt;
&lt;p&gt;At some point I’m also going to need to optimise things if I decide to continue with this idea as the ratings are pretty intense to compute. Currently, they are updated in monthly intervals and each month takes around twelve hours to process so it’s not exactly quick to tweak parameters and see the effect! There are some obvious steps to speed things up, such as distributing the processing across multiple cores or computers etc that'll provide some easy wins but no doubt the underlying maths can be optimised too. Plus, it’s all in R which doesn’t help so it may be time to dust off my C++ compiler for bits of the code...&lt;/p&gt;
&lt;p&gt;Anyway, let me know what you think. Good idea? Bad idea? Waste of time? Do the results look feasible? Ruining football with numbers (again)?!?! (actually, if you think the last one you really don’t need to let me know!!!)&lt;/p&gt;</content><category term="PlayerRating"></category><category term="Player Analytics"></category><category term="Recruitment"></category></entry><entry><title>Mathematically Optimising Your Fantasy Football Team: Update</title><link href="2015/01/21/mathematically-optimising-fantasy-football-teams-update/" rel="alternate"></link><published>2015-01-21T19:30:00+00:00</published><updated>2015-01-21T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2015-01-21:2015/01/21/mathematically-optimising-fantasy-football-teams-update/</id><summary type="html">&lt;p&gt;It's transfer window time so here's your mathematically optimised fantasy football team...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;At the start of this season &lt;a href="http://pena.lt/y/2014/07/24/mathematically-optimising-fantasy-football-teams/"&gt;I showed how to use linear programming&lt;/a&gt; to mathematically optimise your fantasy football team in order to get as many points as possible for your transfer budget.&lt;/p&gt;
&lt;p&gt;The winter transfer window is now well under way, meaning we all get a bonus wildcard to play in the &lt;a href="http://fantasy.premierleague.com/"&gt;Premier League's Fantasy Football&lt;/a&gt; that gives unlimited transfers for a week. Since I've been inundated&lt;sup&gt;*&lt;/sup&gt; with requests over the past few days for an update here is the 'best' squad you can currently buy for your £100 million.&lt;/p&gt;
&lt;h2 id="the-optimised-squad"&gt;The Optimised Squad&lt;/h2&gt;
&lt;table class="table"&gt;
&lt;tr&gt; &lt;th align="center"&gt; Position &lt;/th&gt; &lt;th align="center"&gt; Team &lt;/th&gt; &lt;th align="center"&gt; Points &lt;/th&gt; &lt;th align="center"&gt; Name &lt;/th&gt; &lt;th align="center"&gt; Cost(£) &lt;/th&gt;  &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Goalkeeper &lt;/td&gt; &lt;td&gt; Man Utd &lt;/td&gt; &lt;td align="center"&gt;89&lt;/td&gt; &lt;td&gt; de Gea &lt;/td&gt; &lt;td align="center"&gt; 5.80 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Goalkeeper &lt;/td&gt; &lt;td&gt; Swansea &lt;/td&gt; &lt;td align="center"&gt;80&lt;/td&gt; &lt;td&gt; Fabianski &lt;/td&gt; &lt;td align="center"&gt; 5.20 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Defender &lt;/td&gt; &lt;td&gt; Chelsea &lt;/td&gt; &lt;td align="center"&gt;108&lt;/td&gt; &lt;td&gt; Terry &lt;/td&gt; &lt;td align="center"&gt; 6.70 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Defender &lt;/td&gt; &lt;td&gt; Southampton &lt;/td&gt; &lt;td align="center"&gt;103&lt;/td&gt; &lt;td&gt; Bertrand &lt;/td&gt; &lt;td align="center"&gt; 5.80 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Defender &lt;/td&gt; &lt;td&gt; Southampton &lt;/td&gt; &lt;td align="center"&gt;94&lt;/td&gt; &lt;td&gt; Clyne &lt;/td&gt; &lt;td align="center"&gt; 5.80 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Defender &lt;/td&gt; &lt;td&gt; Everton &lt;/td&gt; &lt;td align="center"&gt;93&lt;/td&gt; &lt;td&gt; Baines &lt;/td&gt; &lt;td align="center"&gt; 7.10 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Defender &lt;/td&gt; &lt;td&gt; Southampton &lt;/td&gt; &lt;td align="center"&gt;89&lt;/td&gt; &lt;td&gt; Fonte &lt;/td&gt; &lt;td align="center"&gt; 5.60 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Midfielder &lt;/td&gt; &lt;td&gt; Arsenal &lt;/td&gt; &lt;td align="center"&gt;144&lt;/td&gt; &lt;td&gt; Sánchez &lt;/td&gt; &lt;td align="center"&gt; 11.50 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Midfielder &lt;/td&gt; &lt;td&gt; West Ham &lt;/td&gt; &lt;td align="center"&gt;116&lt;/td&gt; &lt;td&gt; Downing &lt;/td&gt; &lt;td align="center"&gt; 6.60 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Midfielder &lt;/td&gt; &lt;td&gt; Spurs &lt;/td&gt; &lt;td align="center"&gt;111&lt;/td&gt; &lt;td&gt; Eriksen &lt;/td&gt; &lt;td align="center"&gt; 8.20 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Midfielder &lt;/td&gt; &lt;td&gt; Swansea &lt;/td&gt; &lt;td align="center"&gt;104&lt;/td&gt; &lt;td&gt; Sigurdsson &lt;/td&gt; &lt;td align="center"&gt; 6.70 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Midfielder &lt;/td&gt; &lt;td&gt; Spurs &lt;/td&gt; &lt;td align="center"&gt;102&lt;/td&gt; &lt;td&gt; Chadli &lt;/td&gt; &lt;td align="center"&gt; 6.60 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Forward &lt;/td&gt; &lt;td&gt; QPR &lt;/td&gt; &lt;td align="center"&gt;109&lt;/td&gt; &lt;td&gt; Austin &lt;/td&gt; &lt;td align="center"&gt; 6.50 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Forward &lt;/td&gt; &lt;td&gt; Spurs &lt;/td&gt; &lt;td align="center"&gt;90&lt;/td&gt; &lt;td&gt; Kane &lt;/td&gt; &lt;td align="center"&gt; 5.90 &lt;/td&gt; &lt;/tr&gt;
  &lt;tr&gt; &lt;td&gt; Forward &lt;/td&gt; &lt;td&gt; West Brom &lt;/td&gt; &lt;td align="center"&gt;88&lt;/td&gt; &lt;td&gt; Berahino &lt;/td&gt; &lt;td align="center"&gt; 5.50 &lt;/td&gt; &lt;/tr&gt;
   &lt;/table&gt;

&lt;p&gt;&lt;sup&gt;*&lt;/sup&gt;I had four requests, watch out readers this blog's going big time :-)&lt;/p&gt;</content><category term="Fantasy Football"></category><category term="Fantasy Football"></category></entry><entry><title>Massey Ratings For Football Part Two</title><link href="2014/12/04/massey-ratings-for-football-part-two/" rel="alternate"></link><published>2014-12-04T19:30:00+00:00</published><updated>2014-12-04T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2014-12-04:2014/12/04/massey-ratings-for-football-part-two/</id><summary type="html">&lt;p&gt;In part one I introduced Massey Ratings and how they can be used to rank football teams in a way that accounts for their strength of schedule. Next, we’ll take a look at how Massey Ratings can be extended further to look at team’s attack and defence strength separately.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In &lt;a href="http://pena.lt/y/2014/11/27/english-premier-league"&gt;part one&lt;/a&gt; I introduced Massey Ratings and how they can be used to rank football teams in a way that accounts for their strength of schedule. Next, we’ll take a look at how Massey Ratings can be extended further to look at a team’s attack and defence strength separately.&lt;/p&gt;
&lt;h2 id="massey-ratings"&gt;Massey Ratings&lt;/h2&gt;
&lt;p&gt;The idea behind Massey Ratings is that they rate teams such that the difference between any two teams is equal to the expected margin of victory between them. For example, if a team rated -1.0 played a team rated +1.0 then we’d expect the average goal difference between them to be two goals.&lt;/p&gt;
&lt;p&gt;Since Massey Ratings look at goal difference rather than goals scored or conceded they account for a team’s overall strength and combine both their attack and defence strengths together into a single value. This means with a bit of mathematics we should be able to decompose a Massey Rating to split out these two constituent parts.&lt;/p&gt;
&lt;h2 id="attack-and-defence"&gt;Attack And Defence&lt;/h2&gt;
&lt;p&gt;In Part One we originally defined the Massey Rating as shown below in Equation One:&lt;/p&gt;
&lt;p&gt;$y = r_a - r_b$&lt;/p&gt;
&lt;p&gt;where &lt;em&gt;y&lt;/em&gt; is the margin of victory for a fixture, &lt;em&gt;r&lt;sub&gt;a&lt;/sub&gt;&lt;/em&gt; is the rating of team &lt;em&gt;a&lt;/em&gt; and &lt;em&gt;r&lt;sub&gt;b&lt;/sub&gt;&lt;/em&gt; is the rating of team &lt;em&gt;b&lt;/em&gt;. Let’s take this a step further and define the total goals a team should score in a match as Equation Two below:&lt;/p&gt;
&lt;p&gt;$y_a = o_a - d_b$&lt;/p&gt;
&lt;p&gt;where &lt;em&gt;y&lt;sub&gt;a&lt;/sub&gt;&lt;/em&gt; is the number of goals team &lt;em&gt;a&lt;/em&gt; is expected to score, &lt;em&gt;o&lt;sub&gt;a&lt;/sub&gt;&lt;/em&gt; is team &lt;em&gt;a&lt;/em&gt;’s attack strength and &lt;em&gt;d&lt;sub&gt;b&lt;/sub&gt;&lt;/em&gt; is team &lt;em&gt;b&lt;/em&gt;’s defence strength.&lt;/p&gt;
&lt;p&gt;Extending this further we can say the total goals a given team should score over the course of a season is therefore equal to its attack strength multiplied by the number of matches played minus the sum of the defence strength of all its opponents. Since we know what the teams’ overall ratings are, how many matches they’ve played, how many goals were scored and who their opponents were we’re getting pretty close to getting what we need.&lt;/p&gt;
&lt;h2 id="decompose-the-massey-matrix"&gt;Decompose The Massey Matrix&lt;/h2&gt;
&lt;p&gt;Next we need to decompose the Massey Matrix we created in Part One into its diagonal and off-diagonal elements to give us two new matrices, G and P, which we use in Equation Three below:&lt;/p&gt;
&lt;p&gt;$(G - P)r = p$&lt;/p&gt;
&lt;p&gt;where &lt;em&gt;G&lt;/em&gt; is total games played, &lt;em&gt;P&lt;/em&gt; is the number of pairwise matchups each team has played, &lt;em&gt;r&lt;/em&gt; are the team’s Massey Ratings and &lt;em&gt;p&lt;/em&gt; is a vector of the team’s goal differentials.&lt;/p&gt;
&lt;p&gt;From here, Ken Massey uses some clever algebra to derive the equivalent of Equation Four below:&lt;/p&gt;
&lt;p&gt;$(G + P)d = Gr - f$&lt;/p&gt;
&lt;p&gt;where &lt;em&gt;G&lt;/em&gt; is total games played, &lt;em&gt;P&lt;/em&gt; is the number of pairwise matchups each team has played, &lt;em&gt;d&lt;/em&gt; is the defensive rating and &lt;em&gt;f&lt;/em&gt; is the number of goals scored.&lt;/p&gt;
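&lt;p&gt;For anyone who wants the missing step, here is a sketch of how Equation Four can be recovered from Equation Two. It relies on a team's overall rating being the sum of its attack and defence ratings, $r = o + d$, which is what makes the difference between two Equation Two terms collapse back into Equation One. The vector $o$ of attack ratings is notation I've added for this sketch. Summing Equation Two over every match a team has played gives its total goals scored:&lt;/p&gt;
&lt;p&gt;$f = Go - Pd$&lt;/p&gt;
&lt;p&gt;Substituting $o = r - d$ and collecting the defensive terms gives:&lt;/p&gt;
&lt;p&gt;$f = G(r - d) - Pd = Gr - (G + P)d$&lt;/p&gt;
&lt;p&gt;which rearranges to $(G + P)d = Gr - f$. Once the defensive ratings are known, the attack ratings follow from $o = r - d$.&lt;/p&gt;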
&lt;p&gt;If you are interested in finding out more about the mathematics behind this then I heartily recommend taking a look through Ken Massey’s thesis where he explains it in much more detail than I’ve gone in to here.&lt;/p&gt;
&lt;h2 id="calculating-the-ratings"&gt;Calculating The Ratings&lt;/h2&gt;
&lt;p&gt;Finally, we can now solve this linear system to get the attack and defence ratings for each team.&lt;/p&gt;
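&lt;p&gt;As a rough end-to-end sketch, the pure-Python example below builds G and P from a handful of invented results, solves Equation Three for the overall ratings, then Equation Four for the defence ratings and finally takes attack as rating minus defence. Two things here are assumptions rather than anything shown in this article: the function names, and the trick of swapping the last row of the Massey Matrix for a row of ones so that the ratings sum to zero and the system has a unique solution.&lt;/p&gt;

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * y for x, y in zip(m[row], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def massey(matches):
    """matches: list of (home, away, home_goals, away_goals) tuples."""
    teams = sorted({t for home, away, _, _ in matches for t in (home, away)})
    idx = {team: i for i, team in enumerate(teams)}
    n = len(teams)
    games = [0] * n                        # diagonal of G
    pairs = [[0] * n for _ in range(n)]    # P: pairwise matchup counts
    goal_diff = [0] * n                    # p
    scored = [0] * n                       # f
    for home, away, home_goals, away_goals in matches:
        h, a = idx[home], idx[away]
        games[h] += 1
        games[a] += 1
        pairs[h][a] += 1
        pairs[a][h] += 1
        goal_diff[h] += home_goals - away_goals
        goal_diff[a] += away_goals - home_goals
        scored[h] += home_goals
        scored[a] += away_goals

    # Equation Three, (G - P) r = p, with the last row replaced by ones
    massey_matrix = [
        [games[i] if i == j else -pairs[i][j] for j in range(n)]
        for i in range(n)
    ]
    massey_matrix[-1] = [1] * n
    r = solve(massey_matrix, goal_diff[:-1] + [0])

    # Equation Four, (G + P) d = G r - f
    g_plus_p = [
        [games[i] if i == j else pairs[i][j] for j in range(n)]
        for i in range(n)
    ]
    d = solve(g_plus_p, [games[i] * r[i] - scored[i] for i in range(n)])

    return {
        team: {"rating": r[i], "attack": r[i] - d[i], "defence": d[i]}
        for team, i in idx.items()
    }


# Invented results: A 2-0 B, B 1-1 C, A 3-1 C
result = massey([("A", "B", 2, 0), ("B", "C", 1, 1), ("A", "C", 3, 1)])
for team, values in result.items():
    print(team, {k: round(v, 3) for k, v in values.items()})
```

&lt;p&gt;With this toy data team A's attack of 5/3 minus team B's defence of -1/3 gives the two goals A actually scored against B, so the decomposition reproduces the results it was fitted to. Real seasons are noisier and the fit will only be approximate.&lt;/p&gt;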
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20141204_def_massey.png"&gt;
&lt;strong&gt;Figure One: Defensive Massey Ratings&lt;/strong&gt;  &lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20141204_off_massey.png"&gt;
&lt;strong&gt;Figure Two: Offensive Massey Ratings&lt;/strong&gt;  &lt;/p&gt;
&lt;p&gt;It’s no surprise that Manchester City and Chelsea rate high for offensive strength but Everton are somewhat surprisingly rated the third best offensive team even though they only rank mid-table in the league. Everton may only have a goal difference of +2 at the moment but they are actually the joint third highest goal scorers in the Premier League. They are performing well offensively; it’s their defence that is letting them down, and it is actually ranked worse than relegation-threatened Burnley’s.&lt;/p&gt;
&lt;p&gt;QPR also rate pretty high in terms of attacking strength for a team in the relegation zone. Looking at their results for this season though they managed to score two against Manchester City, scored against Chelsea and are one of the few teams to actually get a goal against Southampton so they are performing well offensively against the league’s stronger teams. Like Everton though, their defence is performing poorly and dragging down their overall performance.&lt;/p&gt;
&lt;p&gt;What’s that at the bottom of the offensive chart in red? Why it’s Aston Villa whose attack is so poor it actually gets a negative rating! I’ve mentioned in my last two articles about how Aston Villa’s Pythagorean and Massey Ratings show them to be seriously over-placed in the league and once again here’s another metric showing how poor they are. Bizarrely, Villa are somehow in twelfth place having managed a pitiful eight goals from fourteen matches. Although they are mid-table in the league and their defensive rating is pretty good, from an offensive point of view Aston Villa’s numbers suggest they are perhaps rather fortuitous to be so far away from the relegation zone…&lt;/p&gt;
&lt;h2 id="further-improvements"&gt;Further Improvements&lt;/h2&gt;
&lt;p&gt;So far the Massey Ratings have considered each match a team plays equally but Ken Massey suggests they can be improved further by weighting matches based on their importance. For example, playing a cup match against a team from a lower division is probably less relevant to calculating the ratings than say a league match against a close rival. By weighting matches appropriately we can reduce the influence less relevant matches have on a team’s ratings and potentially improve their accuracy.&lt;/p&gt;
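&lt;p&gt;One way this weighting could be implemented is with weighted least squares, where each match’s row is scaled by its weight before the Massey matrix is built. The R sketch below is only an illustration under that assumption and not necessarily the exact scheme Ken Massey uses: the three teams, the four results and the weights in &lt;code&gt;w&lt;/code&gt; are all invented, and the margin is taken as home goals minus away goals.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# Invented toy data: pretend the third match is a cup tie we want to down-weight
home = c("A", "B", "C", "A")
away = c("B", "C", "A", "C")
margin = c(2, -1, 0, 3)   # home goals minus away goals
w = c(1, 1, 0.5, 1)       # hypothetical match weights

teams = sort(unique(c(home, away)))
n = length(teams)
m = length(margin)

# One row per match: +1 for the home team, -1 for the away team
X = matrix(0, nrow = m, ncol = n)
X[cbind(1:m, match(home, teams))] = 1
X[cbind(1:m, match(away, teams))] = -1

# Weighted normal equations: w * X scales row i of X by w[i]
M = t(X) %*% (w * X)
p = t(X) %*% (w * margin)

# Overwrite the last row with ones (and the last element of p with zero)
# so the system has a unique solution and the ratings sum to zero
M[n, ] = 1
p[n] = 0

ratings = setNames(as.numeric(solve(M, p)), teams)
print(ratings)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Setting every weight to 1 recovers the unweighted ratings, so the weights can be tuned gradually to see how sensitive the ratings are to them.&lt;/p&gt;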
&lt;h2 id="example-code"&gt;Example Code&lt;/h2&gt;
&lt;p&gt;If you are interested in having a go with Massey Ratings then I’ve put some example R code on  &lt;a href="https://github.com/martineastwood/penalty/tree/master/massey"&gt;GitHub&lt;/a&gt;. You’ll need to add your own data though as I’ve stripped out the section where it connects to my database for security reasons.&lt;/p&gt;</content><category term="Ratings"></category><category term="EPL"></category><category term="Massey Ratings"></category><category term="Ranking"></category></entry><entry><title>Massey Ratings For Football Part One</title><link href="2014/11/27/english-premier-league/" rel="alternate"></link><published>2014-11-27T19:30:00+00:00</published><updated>2014-11-27T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2014-11-27:2014/11/27/english-premier-league/</id><summary type="html">&lt;p&gt;We all know the league table can lie and one of the common causes of this is strength of schedule. Take Southampton, at the time of writing they are currently second in the Premier League twelve matches in yet still haven’t played...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;We all know the league table can lie and one of the common causes of this is strength of schedule. Take Southampton: at the time of writing they are second in the Premier League twelve matches in, yet still haven’t played Chelsea, Manchester City, Manchester United or Arsenal. Without wishing to be dismissive of Southampton, who undoubtedly are a very talented team, there’s a pretty decent chance that they’d currently be lower down the league table had these fixtures come up earlier in the season instead of Leicester, Hull or Aston Villa.&lt;/p&gt;
&lt;h2 id="massey-ratings"&gt;Massey Ratings&lt;/h2&gt;
&lt;p&gt;So if we can’t rely on the league table to tell us which teams are performing best what do we do? One alternative is to use Massey Ratings. This is a method devised by Ken Massey back in 1997 for his honours thesis that rates teams based on what opposition they’ve played. The system was originally designed for American Football but it can be adapted to football fairly trivially.&lt;/p&gt;
&lt;p&gt;The idea behind Massey Ratings is that they rate teams such that the difference between any two teams is equal to the expected margin of victory between them, as shown in Equation One below:&lt;/p&gt;
&lt;p&gt;$y=r_a-r_b$&lt;/p&gt;
&lt;p&gt;where $y$ is the margin of victory for the fixture, $r_a$ is the rating of team &lt;em&gt;a&lt;/em&gt; and $r_b$ is the rating of team &lt;em&gt;b&lt;/em&gt;.&lt;/p&gt;
&lt;h2 id="error"&gt;Error&lt;/h2&gt;
&lt;p&gt;In an ideal world we’d have enough data to calculate true ratings for each team. However, with players moving from one team to another and football seasons typically lasting just 38 matches, we never have sufficient data for that, so we have to settle for approximating ratings based on previous match results. This means we need to modify Equation One by adding an error term to account for any unexplained variation in the outcome of games (Equation Two below).&lt;/p&gt;
&lt;p&gt;$y=r_a-r_b+e$&lt;/p&gt;
&lt;p&gt;where $y$ is the margin of victory, $r_a$ is the rating of team &lt;em&gt;a&lt;/em&gt;, $r_b$ is the rating of team &lt;em&gt;b&lt;/em&gt; and $e$ is the remaining error in the model.&lt;/p&gt;
&lt;p&gt;So far so good, but how do we know what $r_a$ and $r_b$ should equal? Well, to start with we want the error term we added into Equation Two to be as small as possible, so we use a technique called Least Squares to find the set of ratings that minimises the sum of the squared errors across the past data we have.&lt;/p&gt;
&lt;h2 id="the-matrix"&gt;The Matrix&lt;/h2&gt;
&lt;p&gt;Things get slightly trickier here but let’s say our past data comprises m matches involving n teams. We know what the margin of victory was for each match and who won, but not the ratings for each team, so we have m equations we need to solve to find the n unknown rating values, which we can write as Equation Three below:&lt;/p&gt;
&lt;p&gt;$y=Xr+e$&lt;/p&gt;
&lt;p&gt;where &lt;em&gt;y&lt;/em&gt; is the vector of margins of victory, &lt;em&gt;r&lt;/em&gt; is the vector of ratings we are trying to find, &lt;em&gt;e&lt;/em&gt; is the remaining error and &lt;em&gt;X&lt;/em&gt; is an m x n sized matrix of coefficients where each row represents a matchup containing a 1 for the winning team, -1 for the losing team and 0 for everybody else. Unfortunately though, this gives us a very &lt;a href="http://en.wikipedia.org/wiki/Sparse_matrix"&gt;sparse matrix&lt;/a&gt; that is likely to be highly &lt;a href="http://en.wikipedia.org/wiki/Overdetermined_system"&gt;over-determined&lt;/a&gt;, making it difficult to solve directly for a unique set of ratings.&lt;/p&gt;
&lt;h2 id="the-massey-matrix"&gt;The Massey Matrix&lt;/h2&gt;
&lt;p&gt;Thankfully Massey discovered that you can modify the matrix, which amounts to multiplying both sides of Equation Three by the transpose of &lt;em&gt;X&lt;/em&gt;, such that the diagonal elements equal the number of games each team has played and the off-diagonal elements equal the negation of the number of matchups teams have played against each other, giving Equation Four below:&lt;/p&gt;
&lt;p&gt;$p=Mr$&lt;/p&gt;
&lt;p&gt;where &lt;em&gt;M&lt;/em&gt; is the modified Massey Matrix, &lt;em&gt;p&lt;/em&gt; is a vector of each team’s total score differential and &lt;em&gt;r&lt;/em&gt; is the vector of unknown ratings.&lt;/p&gt;
&lt;p&gt;We are getting closer now but the matrix still doesn’t necessarily have a unique set of ratings, so Massey modifies it further by setting every element of the bottom row to one and the corresponding element of &lt;em&gt;p&lt;/em&gt; to zero. This constraint creates a &lt;a href="http://en.wikipedia.org/wiki/Rank_%28linear_algebra%29"&gt;full rank matrix&lt;/a&gt; for us and forces the ratings to sum to zero.&lt;/p&gt;
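&lt;p&gt;As a rough illustration of how these steps fit together, here is a minimal R sketch using four made-up results between three hypothetical teams (A, B and C). It assumes each row of &lt;em&gt;X&lt;/em&gt; holds +1 for the home team and -1 for the away team, with the margin measured as home goals minus away goals, which produces the same Massey matrix as coding winners and losers. The final row of &lt;em&gt;M&lt;/em&gt; is overwritten with ones and the final element of &lt;em&gt;p&lt;/em&gt; with zero, which is what pins the ratings to sum to zero. The data and variable names are invented for the example rather than taken from the code behind the figures in this article.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# Toy results, invented purely for illustration: one element per match
home = c("A", "B", "C", "A")
away = c("B", "C", "A", "C")
margin = c(2, -1, 0, 3)   # home goals minus away goals

teams = sort(unique(c(home, away)))
n = length(teams)   # number of teams
m = length(margin)  # number of matches

# X has one row per match and one column per team
X = matrix(0, nrow = m, ncol = n)
X[cbind(1:m, match(home, teams))] = 1
X[cbind(1:m, match(away, teams))] = -1

# Massey matrix: games played on the diagonal,
# minus the number of meetings between each pair off the diagonal
M = t(X) %*% X
# Total score differential for each team
p = t(X) %*% margin

# Constrain the ratings to sum to zero so the solution is unique
M[n, ] = 1
p[n] = 0

ratings = setNames(as.numeric(solve(M, p)), teams)
print(ratings)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With these toy results team A comes out at 1.2, team B at -1 and team C at -0.2, so the difference between any two ratings can be read as the expected margin between those teams.&lt;/p&gt;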
&lt;h2 id="massey-ratings-for-the-english-premier-league"&gt;Massey Ratings For The English Premier League&lt;/h2&gt;
&lt;p&gt;Finally, using some linear algebra we can solve the system and get the ratings for each team, shown below in Figure One.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/2014_11_27_massey.png"&gt;
&lt;strong&gt;Figure One: EPL Massey Ratings&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;It’s no surprise that Chelsea are ranked far ahead of anybody else in first place but Southampton do actually get ranked in second place, showing that even accounting for their easier schedule to date they deserve to be second in the league at the moment.&lt;/p&gt;
&lt;p&gt;Interestingly, Swansea get ranked fourth rather than their current position of seventh in the league. However, Swansea have already played five of the six teams above them so their Massey Rating shows they are performing better than their raw points tally would suggest.&lt;/p&gt;
&lt;p&gt;At the bottom of the table it’s not looking good for Aston Villa. I showed in my &lt;a href="http://pena.lt/y/2014/11/04/english-premier-league-pythagorean/"&gt;last article&lt;/a&gt; how their Pythagorean suggested they were over-performing to be even as high as they are, and this is now backed up by their Massey Rating ranking them in one of the relegation spots.&lt;/p&gt;
&lt;h2 id="next-steps"&gt;Next Steps&lt;/h2&gt;
&lt;p&gt;In my next article I’ll show how we can take Massey Ratings a step further and decompose teams’ overall ratings into separate ratings for both attack and defence. I’ll also add some example code too so you can have a go calculating them yourself.&lt;/p&gt;
&lt;p&gt;In the meantime, if you are interested in finding out more about the maths behind Massey Ratings then take a look at &lt;a href="http://pena.lt/y/2014/11/04/english-premier-league-pythagorean/"&gt;Ken Massey’s honours thesis&lt;/a&gt; which goes into the theory in much more depth than my brief overview here.&lt;/p&gt;</content><category term="Ratings"></category><category term="EPL"></category><category term="Massey Ratings"></category><category term="Ranking"></category></entry><entry><title>English Premier League Pythagorean</title><link href="2014/11/04/english-premier-league-pythagorean/" rel="alternate"></link><published>2014-11-04T19:30:00+00:00</published><updated>2014-11-04T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2014-11-04:2014/11/04/english-premier-league-pythagorean/</id><summary type="html">&lt;p&gt;I’ve not posted this for a while so here is the latest Pythagorean for the English Premier League.&lt;/p&gt;</summary><content type="html">&lt;p&gt;I’ve not posted this for a while so here is the latest Pythagorean for the English Premier League.&lt;/p&gt;
&lt;h2 id="football-pythagorean"&gt;Football Pythagorean&lt;/h2&gt;
&lt;p&gt;If you’ve not seen this before, it’s an adaptation of the baseball Pythagorean that allows you to estimate how many points a team would be expected to achieve on average based on the number of goals they have scored and conceded. It’s a simple equation but it is surprisingly accurate.&lt;/p&gt;
&lt;p&gt;Take a look at my &lt;a href="http://pena.lt/y/pythagorean.html"&gt;previous blog posts&lt;/a&gt; if you want to find out more about the theory behind it, how it was tested and what the equation itself actually looks like.&lt;/p&gt;
&lt;h2 id="the-season-so-far"&gt;The Season So Far&lt;/h2&gt;
&lt;p&gt;Figure One below shows the difference between how many points teams have achieved in the English Premier League and how many points their Pythagorean record predicts they should have.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/2014_11_04_pythagorean.png"&gt;
&lt;strong&gt;Figure One: EPL Pythagorean Results So Far&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Worryingly for Aston Villa they’ve currently got five points more than would be expected based on their goal record. These points were all from their crazy start to the season in which they were undefeated in their first four matches, somehow coming out with ten points even though they scored just four goals. However, they appear to have regressed somewhat since then managing just a single goal and zero points from their last six matches. It’s not looking good…&lt;/p&gt;
&lt;p&gt;Chelsea are also up five points more than expected but in contrast things could not be looking better. They are playing well and gaining more points than their goal scoring record suggests. All the signs of potential champions – a good team that are exceeding their expected points. If you want to win the league you have to be good and lucky!&lt;/p&gt;</content><category term="Pythagorean"></category><category term="EPL"></category><category term="Pythagorean"></category></entry><entry><title>Predicting Football Using R</title><link href="2014/11/02/predicting-football-using-r/" rel="alternate"></link><published>2014-11-02T19:30:00+00:00</published><updated>2014-11-02T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2014-11-02:2014/11/02/predicting-football-using-r/</id><summary type="html">&lt;p&gt;I recently gave a presentation to the Manchester R Users' Group discussing how to predict football results using R. My presentation gave a brief overview of how to create a Poisson model in R and apply the Dixon and Coles adjustment to it to account for dependence in the scores.&lt;/p&gt;</summary><content type="html">&lt;p&gt;I recently gave a presentation to the Manchester R Users' Group discussing how to predict football results using R. My presentation gave a brief overview of how to create a Poisson model in R and apply the Dixon and Coles adjustment to it to account for dependence in the scores.&lt;/p&gt;
&lt;p&gt;The slides are below for anybody interested and contain enough example R code to get you started. Unfortunately, there are no slide notes, but hopefully the slides are descriptive enough to get you going!&lt;/p&gt;
&lt;p&gt;Example code from the presentation can be found at my &lt;strong&gt;&lt;a href="https://github.com/martineastwood/penalty/tree/master/poisson_example"&gt;GitHub account&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;iframe src="//www.slideshare.net/slideshow/embed_code/41024430" width="800" height="400" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen&gt; &lt;/iframe&gt;
&lt;div style="margin-bottom:5px"&gt; &lt;strong&gt; &lt;a href="//www.slideshare.net/MartinEastwood/predicting-football-using-r" title="Predicting Football Using R" target="_blank"&gt;Predicting Football Using R&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href="//www.slideshare.net/MartinEastwood" target="_blank"&gt;Martin Eastwood&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;</content><category term="Poisson"></category><category term="Poisson"></category><category term="Prediction"></category><category term="R"></category></entry><entry><title>Expected Goals: Foot Shots Versus Headers</title><link href="2014/08/28/expected-goals-foot-shots-versus-headers/" rel="alternate"></link><published>2014-08-28T19:30:00+00:00</published><updated>2014-08-28T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2014-08-28:2014/08/28/expected-goals-foot-shots-versus-headers/</id><summary type="html">&lt;p&gt;My last article on expected goals introduced the concept of using exponential decay to estimate the probability of scoring based on the shooter’s distance from the goal. The article received lots of feedback (thanks everyone!!), with a couple of common comments standing out that I wanted to address.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;My last article on expected goals introduced the concept of using exponential decay to estimate the probability of scoring based on the shooter’s distance from the goal. The article received lots of feedback (thanks everyone!!), with a couple of common comments standing out that I wanted to address.&lt;/p&gt;
&lt;h2 id="simplifying-the-model"&gt;Simplifying The Model&lt;/h2&gt;
&lt;p&gt;One common theme was whether the model was at risk of over-fitting and this is certainly something I was concerned about myself. In fact, I have since simplified the model to the equation below to help minimise this risk:&lt;/p&gt;
&lt;p&gt;$\text{expg}=\exp(-\text{distance}/a)$&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 1: Simplified Expected Goals Equation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As well as reducing the complexity of the model and making it easier to calculate the expected goals, the new equation has fewer parameters so the potential for overfitting is lower. The correlation between actual / expected goals has fallen slightly from 0.98 to 0.97 but the advantages of the simpler equation far outweigh such a minimal change.&lt;/p&gt;
&lt;h2 id="headers-versus-foot-shots"&gt;Headers Versus Foot Shots&lt;/h2&gt;
&lt;p&gt;Another common question was whether it was important to split out headers and foot shots into separate models as the previous articles have so far ignored headers due to lack of data.&lt;/p&gt;
&lt;p&gt;To investigate this I have been busy all summer collecting more shot data. I’m up to 45,000 shots in total now, including around 7,500 headers so I’m at the point where I’m happy to start the preliminary work comparing foot / headed shots although I certainly want more headers before drawing any definite conclusions.&lt;/p&gt;
&lt;p&gt;I’ve run through all the curve fitting again for both headers and foot shots and plotted the resulting probability curves in Figure Two below.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20140828_headers_versus_shots_expg.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 2: Expected Goals: Shots Versus Headers&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As you can see, headers have a noticeably lower chance of leading to a goal. The gap between head and foot shots appears largest around the ten metre mark, where foot shots have pretty much twice the probability of scoring. By 22 metres the chance of scoring from a header is virtually zero, while foot shots don’t reach this level until around 40 metres out.&lt;/p&gt;
&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;But is this difference significant and do we actually need to bother creating separate expected goals models for headers and foot shots?&lt;/p&gt;
&lt;p&gt;Well, if we compare the two probability curves against each other then the p value comes out at 0.064. Typically we take p values of 0.05 or lower to signify significance, so by that count the difference between the two is not statistically significant.&lt;/p&gt;
&lt;p&gt;However, p values should never be about some absolute cut off where &amp;lt;= 0.05 equals significance and everything else can just be ignored.&lt;/p&gt;
&lt;p&gt;Having a value close to significance suggests that there may be a real difference there, especially when the sample size for headers is still limited, so it’s certainly possible that headers and foot shots will warrant separate models. Luckily, with the current equation this is really simple to do as we just need to alter the value of &lt;em&gt;a&lt;/em&gt;, as shown below in the appendix. This is an area I’ll be exploring in more detail as I add more headers to my database.&lt;/p&gt;
&lt;h2 id="appendix-using-the-expected-goals-model"&gt;Appendix: Using the Expected Goals Model&lt;/h2&gt;
&lt;p&gt;To use the expected goals model you just need two numbers:&lt;/p&gt;
&lt;p&gt;x = distance from goal in metres along x axis&lt;/p&gt;
&lt;p&gt;y = distance from centre of goal in metres along y axis&lt;/p&gt;
&lt;p&gt;These can then be used to calculate the total distance the shot is taken from:&lt;/p&gt;
&lt;p&gt;$\text{distance}=\sqrt{x^2+y^2}$&lt;/p&gt;
&lt;p&gt;The expected goals for the shot is then just:&lt;/p&gt;
&lt;p&gt;$\text{expected goals}=\exp(-\text{distance}/a)$&lt;/p&gt;
&lt;p&gt;where a = 4.4 for headers and 7.1 for foot shots&lt;/p&gt;
&lt;h2 id="example"&gt;Example&lt;/h2&gt;
&lt;p&gt;Here’s an example for a player taking a header from the penalty spot.&lt;/p&gt;
&lt;p&gt;x = 11 as penalty spots are roughly 11 metres from the goals (equal to 12 yards)&lt;/p&gt;
&lt;p&gt;y = 0 as penalty spots should be level with the centre of the goal&lt;/p&gt;
&lt;p&gt;$\text{distance}=\sqrt{11^2+0^2}=11$&lt;/p&gt;
&lt;p&gt;$\text{expected goals}=\exp(-11/4.4)\approx 0.08$&lt;/p&gt;
&lt;p&gt;So on average, a header from the penalty spot would be worth around 0.08 goals.&lt;/p&gt;
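&lt;p&gt;The same calculation can be sketched in a few lines of R. The function name &lt;code&gt;expected_goals&lt;/code&gt; and the variable names are just placeholders chosen for illustration; the two values of &lt;em&gt;a&lt;/em&gt; are the ones quoted in the appendix.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# Decay constants quoted in the appendix
a_header = 4.4
a_foot = 7.1

# expected_goals() is a hypothetical helper, not part of any package
# x: distance from goal in metres along the x axis
# y: distance from the centre of the goal in metres along the y axis
# a: decay constant for the type of shot
expected_goals = function(x, y, a) {
  distance = sqrt(x^2 + y^2)
  exp(-distance / a)   # note the negative distance
}

# Header and foot shot from the penalty spot (x = 11, y = 0)
round(expected_goals(11, 0, a_header), 2)   # 0.08
round(expected_goals(11, 0, a_foot), 2)     # 0.21
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Because the arithmetic is vectorised, &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt; can also be whole columns of shot coordinates rather than single numbers.&lt;/p&gt;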
&lt;p&gt;Easy, just don’t forget you need to use negative distance inside the exponential!&lt;/p&gt;</content><category term="Expected Goals"></category><category term="Expected Goals"></category></entry><entry><title>Mathematically Optimising Your Fantasy Football Team</title><link href="2014/07/24/mathematically-optimising-fantasy-football-teams/" rel="alternate"></link><published>2014-07-24T19:30:00+00:00</published><updated>2014-07-24T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2014-07-24:2014/07/24/mathematically-optimising-fantasy-football-teams/</id><summary type="html">&lt;p&gt;The Premier League’s fantasy football is back ready for the new season so I thought I’d run through an example of how linear programming can help you mathematically select your team.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;The &lt;a href="http://fantasy.premierleague.com/"&gt;Premier League Fantasy Football&lt;/a&gt; is back ready for the new season so I thought I’d run through an example of how &lt;a href="http://en.wikipedia.org/wiki/Linear_programming"&gt;linear programming&lt;/a&gt;
can help you select your team. If you haven’t come across linear programming before, it’s a mathematical optimisation technique that can be used to maximise the total number of points your team is worth within a set of constraints,
e.g. staying within budget and not signing too many players from the same team.&lt;/p&gt;
&lt;h2 id="collecting-the-data"&gt;Collecting The Data&lt;/h2&gt;
&lt;p&gt;The first thing we are going to need to do is scrape some data to optimise our team with, so let’s fire up &lt;a href="http://www.r-project.org/"&gt;R&lt;/a&gt;. We are going to need the names of all the players that are available, what team they play for, how much they cost to sign and, most importantly, how many points they are worth. Conveniently, we can exploit the structure of the Premier League’s website to get the data and use it as a pseudo &lt;a href="http://en.wikipedia.org/wiki/Application_programming_interface"&gt;API&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;DISCLAIMER:&lt;/strong&gt; there is a fine line between scraping someone’s web site and creating a &lt;a href="http://en.wikipedia.org/wiki/Denial-of-service_attack"&gt;denial-of-service attack&lt;/a&gt; so make sure you spread out your calls to the website. Trying to scrape all the data in quick succession can put unnecessary strain on the site’s servers. If you scrape somebody’s data please ensure you do it in a way that does not impact the service they are providing!&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;#load libraries&lt;/span&gt;
&lt;span class="nf"&gt;library&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lpSolve&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;library&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stringr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;library&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RCurl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;library&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jsonlite&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;library&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;plyr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# scrape the data&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;ldply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="m"&gt;521&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="c1"&gt;# Scrape responsibly kids, we don&amp;#39;t want to ddos&lt;/span&gt;
&lt;span class="c1"&gt;# the Fantasy Premier League&amp;#39;s website&lt;/span&gt;
&lt;span class="nf"&gt;Sys.sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;2.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;http://fantasy.premierleague.com/web/api/elements/%s/?format=json&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;fromJSON&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;getURL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;now_cost&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;now_cost&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;
&lt;span class="nf"&gt;data.frame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;names&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;%in%&lt;/span&gt;
&lt;span class="nf"&gt;c&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;web_name&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#39;team_name&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#39;type_name&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#39;now_cost&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#39;total_points&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id="constraints"&gt;Constraints&lt;/h2&gt;
&lt;p&gt;Now we have the data we need to think about the constraints we will have to build into the linear system. For example, we can only spend a maximum of £100 million, we cannot have more than three players from the same team and are restricted to two goalkeepers, five defenders, five midfielders and three forwards.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;#Create the constraints&lt;/span&gt;
&lt;span class="n"&gt;num_gk&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;
&lt;span class="n"&gt;num_def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;5&lt;/span&gt;
&lt;span class="n"&gt;num_mid&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;5&lt;/span&gt;
&lt;span class="n"&gt;num_fwd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;
&lt;span class="n"&gt;max_cost&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;100&lt;/span&gt;
&lt;span class="c1"&gt;# Create vectors to constrain by position&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;Goalkeeper&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;ifelse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;type_name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;Goalkeeper&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;Defender&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;ifelse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;type_name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;Defender&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;Midfielder&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;ifelse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;type_name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;Midfielder&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;Forward&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;ifelse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;type_name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;Forward&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Create vector to constrain by max number of players allowed per team&lt;/span&gt;
&lt;span class="n"&gt;team_constraint&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;unlist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;lapply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;unique&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;team_name&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="nf"&gt;ifelse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;team_name&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# next we need the constraint directions&lt;/span&gt;
&lt;span class="n"&gt;const_dir&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;c&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;=&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;=&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;=&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;=&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;rep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&amp;lt;=&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;21&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id="the-objective"&gt;The Objective&lt;/h2&gt;
&lt;p&gt;We also need to create the vector defining our objective, which is to maximise the number of points the team is worth within the constraints we are setting.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# The vector to optimize against&lt;/span&gt;
&lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;total_points&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id="solving-the-matrix"&gt;Solving The Matrix&lt;/h2&gt;
&lt;p&gt;Finally, we put all the constraints into a matrix and let R solve the resulting linear programme to give us our mathematically optimised team selection.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# Put the complete matrix together&lt;/span&gt;
&lt;span class="n"&gt;const_mat&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;c&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;Goalkeeper&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;Defender&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;Midfielder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;Forward&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;now_cost&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;team_constraint&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="n"&gt;nrow&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;length&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;unique&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;team_name&lt;/span&gt;&lt;span class="p"&gt;))),&lt;/span&gt;
&lt;span class="n"&gt;byrow&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;TRUE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;const_rhs&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;c&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_gk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;num_def&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;num_mid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;num_fwd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;max_cost&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;rep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;20&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# And solve the linear system&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;lp &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;max&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;const_mat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;const_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;const_rhs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;all.bin&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;TRUE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;all.int&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;TRUE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;arrange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;which&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;solution&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Goalkeeper&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Defender&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Midfielder&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Forward&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_points&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
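&lt;p&gt;To see the moving parts in isolation, here is a minimal sketch of the same kind of binary programme on a tiny pool of six players. It assumes the &lt;code&gt;lp&lt;/code&gt; function used above is the one from the &lt;code&gt;lpSolve&lt;/code&gt; package. The points and costs are borrowed from the results table below, while the squad shape (one goalkeeper, two defenders), the budget of 18.5 and all the &lt;code&gt;toy_&lt;/code&gt; names are invented purely for illustration.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;
```r
library(lpSolve)

# Hypothetical toy pool: two goalkeepers followed by four defenders
points = c(160, 144, 180, 172, 157, 149)
cost   = c(5.5, 5.0, 7.0, 6.5, 6.0, 5.5)
is_gk  = c(1, 1, 0, 0, 0, 0)
is_def = 1 - is_gk

# One row per constraint: goalkeeper count, defender count, total cost
toy_mat = matrix(c(is_gk, is_def, cost), nrow=3, byrow=TRUE)
toy_dir = c("=", "=", "&amp;lt;=")
toy_rhs = c(1, 2, 18.5)

# all.bin=TRUE restricts every variable to 0 (not picked) or 1 (picked)
toy = lp("max", points, toy_mat, toy_dir, toy_rhs, all.bin=TRUE)

print(toy$status)                # 0 means the solver found an optimal solution
print(which(toy$solution == 1))  # positions of the selected players in the pool
print(toy$objval)                # total points of the selected players
```
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Each row of the constraint matrix lines up with one entry in the direction and right-hand-side vectors, which is exactly how the full squad problem is laid out, just with more rows.&lt;/p&gt;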

&lt;h2 id="the-results"&gt;The Results&lt;/h2&gt;
&lt;p&gt;The team the linear solver selected is shown in the table below. This is the team with the highest possible number of points that can be achieved within the constraints we have set.&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Position&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Team&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Points&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Name&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Cost (£)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Goalkeeper&lt;/td&gt;
&lt;td&gt;Everton&lt;/td&gt;
&lt;td&gt;160&lt;/td&gt;
&lt;td&gt;Howard&lt;/td&gt;
&lt;td&gt;5.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Goalkeeper&lt;/td&gt;
&lt;td&gt;Crystal Palace&lt;/td&gt;
&lt;td&gt;144&lt;/td&gt;
&lt;td&gt;Speroni&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Defender&lt;/td&gt;
&lt;td&gt;Everton&lt;/td&gt;
&lt;td&gt;180&lt;/td&gt;
&lt;td&gt;Coleman&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Defender&lt;/td&gt;
&lt;td&gt;Chelsea&lt;/td&gt;
&lt;td&gt;172&lt;/td&gt;
&lt;td&gt;Terry&lt;/td&gt;
&lt;td&gt;6.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Defender&lt;/td&gt;
&lt;td&gt;Arsenal&lt;/td&gt;
&lt;td&gt;157&lt;/td&gt;
&lt;td&gt;Mertesacker&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Defender&lt;/td&gt;
&lt;td&gt;Arsenal&lt;/td&gt;
&lt;td&gt;155&lt;/td&gt;
&lt;td&gt;Koscielny&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Defender&lt;/td&gt;
&lt;td&gt;Southampton&lt;/td&gt;
&lt;td&gt;149&lt;/td&gt;
&lt;td&gt;Fonte&lt;/td&gt;
&lt;td&gt;5.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Midfielder&lt;/td&gt;
&lt;td&gt;Man City&lt;/td&gt;
&lt;td&gt;241&lt;/td&gt;
&lt;td&gt;Yaya Touré&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Midfielder&lt;/td&gt;
&lt;td&gt;Liverpool&lt;/td&gt;
&lt;td&gt;205&lt;/td&gt;
&lt;td&gt;Gerrard&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Midfielder&lt;/td&gt;
&lt;td&gt;Crystal Palace&lt;/td&gt;
&lt;td&gt;131&lt;/td&gt;
&lt;td&gt;Puncheon&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Midfielder&lt;/td&gt;
&lt;td&gt;Stoke&lt;/td&gt;
&lt;td&gt;126&lt;/td&gt;
&lt;td&gt;Sidwell&lt;/td&gt;
&lt;td&gt;5.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Midfielder&lt;/td&gt;
&lt;td&gt;West Ham&lt;/td&gt;
&lt;td&gt;125&lt;/td&gt;
&lt;td&gt;Noble&lt;/td&gt;
&lt;td&gt;5.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Forward&lt;/td&gt;
&lt;td&gt;Arsenal&lt;/td&gt;
&lt;td&gt;187&lt;/td&gt;
&lt;td&gt;Giroud&lt;/td&gt;
&lt;td&gt;8.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Forward&lt;/td&gt;
&lt;td&gt;Liverpool&lt;/td&gt;
&lt;td&gt;179&lt;/td&gt;
&lt;td&gt;Lambert&lt;/td&gt;
&lt;td&gt;7.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Forward&lt;/td&gt;
&lt;td&gt;Aston Villa&lt;/td&gt;
&lt;td&gt;106&lt;/td&gt;
&lt;td&gt;Weimann&lt;/td&gt;
&lt;td&gt;5.5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 id="limitations"&gt;Limitations&lt;/h2&gt;
&lt;p&gt;Now, before the internet gets grumpy and starts trolling me (whenever I’ve mentioned using mathematics for fantasy football people seem to get very irate) there are a few obvious limitations worth pointing out. First of all the new football season hasn’t started so I’m using the points totals from last season. This means all the players at the promoted teams and any new signings to the Premier League will have zero points and so will not get selected. I’m planning on running this script regularly throughout the coming season though to help guide my transfers, so as these players gain points they will start to get selected by the linear solver if they perform well enough.&lt;/p&gt;
&lt;p&gt;Also, we’ve set the constraints to optimise for the best squad. You may want to spend all your money on the best possible first eleven and go for budget substitutes instead. For example, the table below shows what happens if you optimise for eleven players playing 1-3-4-3 at a total price of £82 million (this leaves enough to buy four substitutes at £4.5 million each).&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Position&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Team&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Points&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Name&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Cost (£)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Goalkeeper&lt;/td&gt;
&lt;td&gt;Everton&lt;/td&gt;
&lt;td&gt;160&lt;/td&gt;
&lt;td&gt;Howard&lt;/td&gt;
&lt;td&gt;5.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Defender&lt;/td&gt;
&lt;td&gt;Everton&lt;/td&gt;
&lt;td&gt;180&lt;/td&gt;
&lt;td&gt;Coleman&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Defender&lt;/td&gt;
&lt;td&gt;Chelsea&lt;/td&gt;
&lt;td&gt;172&lt;/td&gt;
&lt;td&gt;Terry&lt;/td&gt;
&lt;td&gt;6.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Defender&lt;/td&gt;
&lt;td&gt;Southampton&lt;/td&gt;
&lt;td&gt;149&lt;/td&gt;
&lt;td&gt;Fonte&lt;/td&gt;
&lt;td&gt;5.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Midfielder&lt;/td&gt;
&lt;td&gt;Man City&lt;/td&gt;
&lt;td&gt;241&lt;/td&gt;
&lt;td&gt;Yaya Touré&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Midfielder&lt;/td&gt;
&lt;td&gt;Liverpool&lt;/td&gt;
&lt;td&gt;205&lt;/td&gt;
&lt;td&gt;Gerrard&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Midfielder&lt;/td&gt;
&lt;td&gt;Liverpool&lt;/td&gt;
&lt;td&gt;178&lt;/td&gt;
&lt;td&gt;Lallana&lt;/td&gt;
&lt;td&gt;8.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Midfielder&lt;/td&gt;
&lt;td&gt;Stoke&lt;/td&gt;
&lt;td&gt;126&lt;/td&gt;
&lt;td&gt;Sidwell&lt;/td&gt;
&lt;td&gt;5.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Forward&lt;/td&gt;
&lt;td&gt;Arsenal&lt;/td&gt;
&lt;td&gt;187&lt;/td&gt;
&lt;td&gt;Giroud&lt;/td&gt;
&lt;td&gt;8.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Forward&lt;/td&gt;
&lt;td&gt;Liverpool&lt;/td&gt;
&lt;td&gt;179&lt;/td&gt;
&lt;td&gt;Lambert&lt;/td&gt;
&lt;td&gt;7.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Forward&lt;/td&gt;
&lt;td&gt;Southampton&lt;/td&gt;
&lt;td&gt;152&lt;/td&gt;
&lt;td&gt;Rodriguez&lt;/td&gt;
&lt;td&gt;7.5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Interestingly (for me at least), the cost of the players is fairly evenly spread across the team. Typically, when I select my fantasy football teams I tend to splash the cash on the big name strikers and then go for cheap defenders. Based on these results, though, that’s looking like a bad decision so this season I’m going to follow the data and actually sign some decent defenders. Wish me luck…&lt;/p&gt;
&lt;h2 id="appendix"&gt;Appendix&lt;/h2&gt;
&lt;p&gt;All code is available on &lt;a href="https://github.com/martineastwood/penalty/tree/master/fantasy_footballl_optimiser"&gt;GitHub&lt;/a&gt;&lt;/p&gt;</content><category term="Fantasy Football"></category><category term="Fantasy Football"></category></entry><entry><title>Expected Goals And Exponential Decay</title><link href="2014/04/22/expected-goals-and-exponential-decay/" rel="alternate"></link><published>2014-04-22T19:30:00+00:00</published><updated>2014-04-22T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2014-04-22:2014/04/22/expected-goals-and-exponential-decay/</id><summary type="html">&lt;p&gt;In my last article on expected goals I showed how to incorporate the distance from goal along the Y axis into the expected goal model using Pythagoras’ Theorem. This all worked pretty well, giving us an r squared value of 0.95. However, while the r squared value was good there was still a flaw in the model we need to fix.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In my &lt;a href="http://pena.lt/y/2014/04/16/expected-goals-the-y-axis/"&gt;last article&lt;/a&gt; on expected goals I showed how to incorporate the distance from goal along the Y axis into the expected goal model using &lt;a href="http://en.wikipedia.org/wiki/Pythagorean_theorem"&gt;Pythagoras' Theorem&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This all worked pretty well, giving us an r squared value of 0.95. However, while the r squared value was good there was still a flaw in the model we need to fix.&lt;/p&gt;
&lt;h2 id="better-than-ronaldo"&gt;Better than Ronaldo&lt;/h2&gt;
&lt;p&gt;Eagle-eyed readers will have noticed that the fit of the curve broke down for very short distances, meaning the probability of scoring from zero metres was actually slightly above one. And as reader Benjamin Lindqvist commented, not even Ronaldo will score more than 100% of the time, not even from the goal line. Benjamin also had a good suggestion to improve this: adding an exponential decay function into the model to make it behave better around zero.&lt;/p&gt;
&lt;h2 id="exponential-decay"&gt;Exponential Decay&lt;/h2&gt;
&lt;p&gt;If you aren’t familiar with exponential decay it basically means that a value decreases at a rate proportional to its current value. It’s a phenomenon that crops up fairly frequently in science and the natural world. For example, air pressure decays exponentially as you go higher up into the Earth’s atmosphere and radioactivity decreases exponentially over time.&lt;/p&gt;
&lt;p&gt;A general equation for exponential decay is shown in Figure 1, where y(t) is the value at time t, a is the starting value, k is the (positive) decay constant and t is time.&lt;/p&gt;
&lt;p&gt;$y(t)=ae^{-kt}$&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 1: Exponential Decay&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;So how do we apply this to football? Well, the first thing to do is replace time with metres and assume that the probability of scoring a goal decreases exponentially based upon the distance from goal the shot is taken from.&lt;/p&gt;
&lt;p&gt;Next we need to find the correct value for the decay constant as this controls the shape of the curve. Rather than doing this manually through trial and error, we can use something such as R’s &lt;a href="http://stat.ethz.ch/R-manual/R-devel/library/stats/html/optim.html"&gt;optim&lt;/a&gt; function to find it for us. We can also tweak the equation to add in a multiplier for the independent variable and an intercept as found in a traditional regression model giving us the fit shown in Figure 2.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20140422_exgp_exp_decay.png"&gt;
&lt;strong&gt;Figure 2: Shots Versus Distance From Goal&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Notice how the orange line now hits the Y axis just below 1.0? This fixes the problem we had before where it was possible to score more than one goal from a single shot. In fact, if you’re standing on the goal line the model now predicts around 0.96 expected goals, so very likely to score but with a small chance of screwing up (yes Edin Džeko I’m looking at you).&lt;/p&gt;
&lt;p&gt;The new curve fit also pushes the r squared value up to 0.9883, meaning 98.83% of the variance for the probability of scoring from a shot can be accounted for using just distance from goal along the X and Y axes.&lt;/p&gt;
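&lt;p&gt;As a rough sketch of that fitting step, the R code below fits a curve of the same form with &lt;code&gt;optim&lt;/code&gt;. It assumes a sum-of-squared-errors loss, which may not be what was actually minimised, and it uses simulated points generated from the final equation given later in this article instead of the real shot data. The names &lt;code&gt;metres&lt;/code&gt;, &lt;code&gt;p_goal&lt;/code&gt; and &lt;code&gt;sse&lt;/code&gt; are made up for the example.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;
```r
# Simulated stand-in for the real shot data (an assumption for this sketch):
# scoring probability at 1 to 30 metres from the published curve, plus noise
set.seed(1)
metres = 1:30
p_goal = exp(-metres / 4.79) * 0.921985 + 0.036212 + rnorm(length(metres), sd=0.005)

# Sum of squared errors for the curve b * exp(-k * metres) + c,
# where par holds k, b and c in that order
sse = function(par, metres, p_goal) {
  pred = par[2] * exp(-par[1] * metres) + par[3]
  sum((p_goal - pred)^2)
}

# Rough starting guesses for k, b and c; optim defaults to Nelder-Mead
fit = optim(c(0.1, 1, 0), sse, metres=metres, p_goal=p_goal)
print(fit$par)
```
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The fitted decay constant comes back as a rate per metre, so its reciprocal should land close to the 4.79 used to simulate the data.&lt;/p&gt;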
&lt;p&gt;The final equation (Figure 3) is slightly more complicated now but it’s still pretty simple to use.&lt;/p&gt;
&lt;p&gt;$expg=e^{-d/4.79}*0.921985+0.036212$&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 3: Expected Goals Equation Incorporating Exponential Decay&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;where:&lt;/p&gt;
&lt;p&gt;$d=\sqrt{dx^2+dy^2}$&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 4: Equation for d&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;and dx, dy are the difference between the x coordinates and y coordinates in metres for the shot location and the goal location.&lt;/p&gt;
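&lt;p&gt;For anyone who wants to plug numbers in, the two equations translate into a few lines of R. This is just a sketch: the function name &lt;code&gt;exp_goals_decay&lt;/code&gt; is made up for illustration, and the constants are copied straight from Figure 3.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;
```r
# Expected goals for a shot, given its x and y offsets from the goal in metres
exp_goals_decay = function(dx, dy) {
  d = sqrt(dx^2 + dy^2)                 # straight-line distance (Figure 4)
  exp(-d / 4.79) * 0.921985 + 0.036212  # decay curve (Figure 3)
}

exp_goals_decay(0, 0)  # on the goal line: 0.958197
exp_goals_decay(3, 4)  # five metres out: roughly 0.36
```
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;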
&lt;p&gt;As ever, let me know what you think!&lt;/p&gt;</content><category term="Expected Goals"></category><category term="Expected Goals"></category></entry><entry><title>Expected Goals: The Y Axis</title><link href="2014/04/16/expected-goals-the-y-xis/" rel="alternate"></link><published>2014-04-16T19:30:00+00:00</published><updated>2014-04-16T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2014-04-16:2014/04/16/expected-goals-the-y-xis/</id><summary type="html">&lt;p&gt;Expected goals are one of the hot topics in the football analytics community at the moment and it’s a topic I’ve previously written a number of articles on discussing how to calculate them. If you haven’t read those pieces yet it’s probably worth taking a quick look to set the context for the rest of this article.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Expected goals are one of the hot topics in the football analytics community at the moment and it’s a topic I’ve previously written a &lt;a href="http://localhost:8000/category/expected-goals.html"&gt;number of articles&lt;/a&gt; discussing how to calculate them. If you haven’t read those pieces yet it’s probably worth taking a quick look to set the context for the rest of this article.&lt;/p&gt;
&lt;h2 id="the-story-so-far"&gt;The Story So Far&lt;/h2&gt;
&lt;p&gt;A few weeks back I published a simple equation for calculating expected goals that received a lot of positive feedback from readers as it was easy to use and was pretty accurate based on its r squared value of 0.86. This effectively means the equation is capable of explaining 86% of the variance in the shots data I have collected from &lt;a href="http://www.squawka.com/"&gt;Squawka&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For such a basic equation this is a really good result. I’d purposely tried to keep things simple so that the equation was easy enough for non-mathematicians to use in order to try and encourage its adoption by other people. Rather than keep these sorts of things to myself I’d much rather share them around and see them get used elsewhere.&lt;/p&gt;
&lt;p&gt;One of the restrictions I’d set myself for this was to only use the distance the player shooting was from the goal along the X axis so that the equation only needed data along one dimension. However, I received a lot of messages through &lt;a href="https://twitter.com/penaltyblog"&gt;Twitter&lt;/a&gt; and on the blog asking about the Y axis so let’s take a look…&lt;/p&gt;
&lt;h2 id="the-y-axis"&gt;The Y Axis&lt;/h2&gt;
&lt;p&gt;So the first question to ask was whether the Y axis was even worth bothering with, after all the r squared value when just using distance along the X axis was already 0.86 which only left around 14% of the variance in the data to account for.&lt;/p&gt;
&lt;p&gt;Well, it turns out that how far away you are from the goal along the Y axis does have an impact (Figure 1). Unsurprisingly the further away you are then the less likely you are to score. Before you ask, the r squared value is 0.88 (I have learnt now to include r squared values for pretty much all charts otherwise I get bombarded by requests for them :-)).&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20140416_y_axis.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 1: Shots Versus Distance From Goal Along Y Axis&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="adding-the-y-axis-into-the-equation"&gt;Adding The Y Axis Into The Equation&lt;/h2&gt;
&lt;p&gt;Okay, we know the Y axis has an effect on expected goals but how do we factor this into my previous equation? There are a number of mathematical techniques we can use to solve for multiple dimensions. However, I am keen to try and make this as simple as possible so that the lay-person can use it so let’s keep it basic and go with Pythagoras’ Theorem, a topic most people have touched on at High School at some point.&lt;/p&gt;
&lt;p&gt;If we know the xy coordinates of the player taking the shot and the xy coordinates of the goal then using Pythagoras’ Theorem we can calculate the total distance between the two points. Figure 2 shows the equation for this where dx is the distance between the two x coordinates, dy is the distance between the two y coordinates and AB is the total distance the player is from the goal.&lt;/p&gt;
&lt;p&gt;$AB=\sqrt{dx^2+dy^2}$&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 2: Calculating the distance between two points&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I did this for all 17,000 shots I have collected so far from Squawka (excluding penalties) to get their total distances from goal and calculated the probability of scoring from different distances based on the number of shots taken versus goals scored (Figure 3).&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20140416_exp_goals_y_axis.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 3: Shots Versus Total Distance From Goal&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As before, I’m using a power curve to fit the line through the data and as you can see it’s a pretty good fit. So what is the effect of adding in the Y axis? Well the r squared value has changed from 0.86 to…&lt;/p&gt;
&lt;p&gt;&lt;em&gt;drumroll&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;0.95&lt;/p&gt;
&lt;p&gt;Yep, including both the X and Y axes in the expected goals model accounts for 95% of the variance in the data. This barely leaves any room for the shooting player’s talent to have any effect or even for defensive pressure to play a part.&lt;/p&gt;
&lt;p&gt;At first I thought this seemed a bit odd but thinking about it in more detail it actually seems logical. It doesn’t make much difference whether you are shooting from five metres out against a strong defence or a weak one, you still have the same chance of scoring from that particular position.&lt;/p&gt;
&lt;p&gt;However, playing against a strong defence will likely mean you will get into that good position less often so your overall expected goals will be lower. Conversely, better players will be able to get into those good positions more often than weaker players so their overall expected goals will be higher.&lt;/p&gt;
&lt;p&gt;In other words, at the individual shot level expected goals seems to be all about a player’s position in respect to the goal when they shoot. Other factors, such as player talent, defensive pressure etc are probably not visible until you start looking at larger samples, such as expected goals per fixture or even per season.&lt;/p&gt;
&lt;p&gt;Anyway, here’s the final equation:&lt;/p&gt;
&lt;p&gt;$ExpG=Distance^{-1.33796}*10^{0.4720605}$&lt;/p&gt;
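&lt;p&gt;As a quick sketch of how the equation could be applied in R, the function below combines the Pythagoras step with the power curve. The name &lt;code&gt;exp_goals_power&lt;/code&gt; is hypothetical; the exponent and multiplier are the ones from the equation above.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;
```r
# Expected goals from the x and y distances (in metres) between shot and goal
exp_goals_power = function(dx, dy) {
  distance = sqrt(dx^2 + dy^2)          # total distance from goal
  distance^(-1.33796) * 10^0.4720605    # fitted power curve
}

exp_goals_power(6, 8)  # a shot ten metres from goal
exp_goals_power(1, 0)  # one metre out: the power curve returns a value above one
```
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The second call is a reminder that a power curve is not bounded at one, so values for shots taken from very close to goal should be treated with caution.&lt;/p&gt;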
&lt;p&gt;Let me know what you think!&lt;/p&gt;</content><category term="Expected Goals"></category><category term="Expected Goals"></category></entry><entry><title>English Premier League Pythagorean Update</title><link href="2014/04/04/english-premier-league-pythagorean-update/" rel="alternate"></link><published>2014-04-04T19:30:00+00:00</published><updated>2014-04-04T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2014-04-04:2014/04/04/english-premier-league-pythagorean-update/</id><summary type="html">&lt;p&gt;I’ve not posted an update on the Pythagorean for the English Premier League (EPL) for a while so the latest figures are below.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;I’ve not posted an update on the Pythagorean for the English Premier League (EPL) for a while so the latest figures are below.&lt;/p&gt;
&lt;h2 id="football-pythagorean"&gt;Football Pythagorean&lt;/h2&gt;
&lt;p&gt;In case you haven’t seen it before, my football Pythagorean is an adaptation of the &lt;a href="http://en.wikipedia.org/wiki/Pythagorean_expectation"&gt;baseball pythagorean&lt;/a&gt; that allows you to quickly estimate how many points a team would be expected to achieve on average based on the number of goals they have scored and conceded. It’s a pretty simple little equation but it is surprisingly accurate.&lt;/p&gt;
&lt;p&gt;Take a look at my previous blog posts &lt;a href="http://www.pena.lt/y/category/pythagorean.html"&gt;here&lt;/a&gt; about it if you want to find out more about the theory behind it, how it was tested and what the equation itself actually looks like.&lt;/p&gt;
&lt;h2 id="the-season-so-far"&gt;The Season So Far&lt;/h2&gt;
&lt;p&gt;Figure One below shows the difference between the actual points each Premier League team has achieved this season and how many my Pythagorean predicts they should have on average. For teams in green the difference is positive, meaning they have more points than expected, while those teams in red have fewer points than expected based on the number of goals they have scored and conceded.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20140404_English_Premier_League_pythag.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure One: EPL Pythagorean Results So Far&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Once again Tottenham are way ahead of where they would be expected to be, with an astonishing 15 points extra. Either Spurs are doing something fantastically efficient this season or they are extremely lucky to be where they are in the league. Take those 15 points away and they drop down to 10th place just ahead of Stoke. This season has been a bit of a write off for Spurs compared with pre-season expectations but it could / should have been so much worse based on their Pythagorean.&lt;/p&gt;
&lt;p&gt;Down at the other end of the table Hull should probably be feeling quite pleased with themselves as they are looking a pretty safe bet to avoid relegation even with their Pythagorean of -5.&lt;/p&gt;
&lt;p&gt;Poor Swansea though have the lowest Pythagorean in the league. On average teams with their goal record would expect to have achieved roughly nine more points than their current total. In fact if Swansea and Tottenham both had the average points their goals suggest then the Swans would actually be the higher placed of the two teams!&lt;/p&gt;
&lt;p&gt;Let’s see what happens if / when &lt;a href="http://en.wikipedia.org/wiki/Regression_toward_the_mean"&gt;regression towards the mean&lt;/a&gt; starts to kick in…&lt;/p&gt;</content><category term="Pythagorean"></category><category term="Pythagorean"></category><category term="EPL"></category></entry><entry><title>Expected Goals Updated</title><link href="2014/03/01/expected-goals-updated/" rel="alternate"></link><published>2014-03-01T19:30:00+00:00</published><updated>2014-03-01T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2014-03-01:2014/03/01/expected-goals-updated/</id><summary type="html">&lt;p&gt;When I introduced my Expected Goals model a few weeks back a number of people commented on the bump in the curve where I had included penalty shots in the data set used to fit the model...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;When I introduced my Expected Goals model a few weeks back a number of people commented on the bump in the curve where I had included penalty shots in the data set used to fit the model. The reason I’d originally left penalties in was I felt their number was too few to have an impact on the fit of the model and at the time I hadn’t actually tracked which shots were and were not from penalties.&lt;/p&gt;
&lt;p&gt;Since that decision seemed to cause quite a kerfuffle I have since gone back to the raw data, removed all the penalties and refitted the curve. While I was at it I also added in more shots I had collected and rescaled all the co-ordinates to use a larger pitch (105 x 68m) as Claus Moeller had suggested my estimate of Premier League pitch size was too small.&lt;/p&gt;
&lt;p&gt;As expected, the difference in the fit of the curve is very small (Figure 1) but it has pushed the r squared value up to 0.86 from 0.84, meaning that 86% of the variance in goal scoring is explained by the distance from goal the shot is taken from and just 14% by other factors, such as player talent, defensive pressure, the goalkeeper etc.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20140301_expected_goals_update.png"&gt;
&lt;strong&gt;Figure 1: Shots Versus Distance From Goal&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The equation for expected goals is now updated to -1.014718 for the coefficient and 0.05082859 for the intercept so for my previous example a shot from 8 metres gives:&lt;/p&gt;
&lt;p&gt;$8^{-1.014718}*10^{0.05082859}=0.1362846$ expected goals&lt;/p&gt;</content><category term="Expected Goals"></category><category term="Expected Goals"></category></entry><entry><title>Actual Goals Versus Expected Goals</title><link href="2014/02/15/actual-goals-versus-expected-goals/" rel="alternate"></link><published>2014-02-15T19:30:00+00:00</published><updated>2014-02-15T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2014-02-15:2014/02/15/actual-goals-versus-expected-goals/</id><summary type="html">&lt;p&gt;Since my last post about how to calculate expected goals one question has come up more than any other and that is about the correlation between expected goals and actual goals..&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Since my &lt;a href="http://pena.lt/y/2014/02/12/expected-goals-for-all/"&gt;last article&lt;/a&gt; about how to calculate expected goals one question has come up more than any other and that is about the correlation between expected goals and actual goals so here you go:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20140215_goals_for.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 1: Expected Goals For Versus Actual Goals For&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20140215_goals_away.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 2: Expected Goals Away Versus Actual Goals Away&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The correlations look pretty good, 0.86 for goals for and 0.72 for goals away. I’m not sure yet why the correlations differ slightly for home / away and whether it means anything or is just down to noise in the data but I’ll keep an eye on that as I collect more shots over the course of the season.&lt;/p&gt;
&lt;p&gt;Another question that popped up a few times was whether my expected goals correlated with actual goals better than Total Shot Ratio (TSR) does and the answer is yes it does.&lt;/p&gt;
&lt;p&gt;This is to be expected really as expected goals account for shot location while TSR considers all shots to be equal when clearly they are not – a shot from one metre out is vastly more likely to lead to a goal than a shot from 20 metres out.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20140215_tsr_for.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 3: TSR Versus Actual Goals For&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20140215_tsr_away.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 4: TSR Versus Actual Goals Away&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;There is still a heap of work to do to improve / optimise / characterise the expected goals model further but it is a promising start. I’ll post more updates as I progress with the model’s development over the coming weeks.&lt;/p&gt;</content><category term="Expected Goals"></category><category term="Expected Goals"></category></entry><entry><title>Expected Goals For All</title><link href="2014/02/12/expected-goals-for-all/" rel="alternate"></link><published>2014-02-12T19:30:00+00:00</published><updated>2014-02-12T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2014-02-12:2014/02/12/expected-goals-for-all/</id><summary type="html">&lt;p&gt;It seems that everybody has their own expected goals models for football nowadays but they all seem to be top secret and all appear to give different results so I thought I’d post a quick example of one technique here to try and stimulate a bit of chat about the best way to model them.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;It seems that everybody has their own expected goals models for football nowadays but they all seem to be top secret and all appear to give different results so I thought I’d post a quick example of one technique here to try and stimulate a bit of chat about the best way to model them.&lt;/p&gt;
&lt;h2 id="the-data"&gt;The Data&lt;/h2&gt;
&lt;p&gt;Over the past few weeks I have tediously collected several thousand xy co-ordinates for shot locations from &lt;a href="http://www.squawka.com"&gt;Squawka&lt;/a&gt; and converted them into approximate distances from goal in metres, assuming that an average football pitch is 100m x 65m.&lt;/p&gt;
&lt;h2 id="goals-versus-distance"&gt;Goals Versus Distance&lt;/h2&gt;
&lt;p&gt;Figure 1 below shows the relationship between the probability of scoring a goal and how far from the goal line the shot is taken.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20140212_exp_scatter.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 1: Shots Versus Distance From Goal&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;There seems to be a little bit of noise in the data, particularly around the 12-13m mark but overall I was pleasantly surprised how neat the data looks – there seems to be a pretty clear non-linear relationship between the likelihood of scoring and how far away from the goal the shot is taken from.&lt;/p&gt;
&lt;p&gt;So how do we model this relationship? Obviously we cannot just stick a linear regression through it as the relationship is clearly not linear, so one possibility is to use a polynomial instead of a straight line (Figure 2).&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20140212_exp_poly.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 2: Fitting a Polynomial&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Unfortunately, this does not give particularly good results as low order polynomials (the orange line) do not fit tightly enough to the non-linearity in the relationship while higher-order polynomials (the red line) start to fit to the noise in the data leading to problems with over-fitting.&lt;/p&gt;
&lt;p&gt;So what do we do now? Well, looking closer, the shape of the curve appears exponential so one option is to fit a power function to it. We can do this pretty easily by taking the log (base 10) of both the distance and the scoring probability, fitting a linear regression to the logged data and plotting the result against our non-logged data (Figure 3).&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20140212_exp_power.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 3: Power Curve&lt;/strong&gt;&lt;/p&gt;
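&lt;p&gt;As a rough illustration of the log-then-regress step, here is a minimal Python sketch. The function name and the data points are made up for the example (they are not my shot data), and it assumes every distance and probability is greater than zero so the logs are defined.&lt;/p&gt;

```python
import math


def fit_power_curve(distances, probabilities):
    """Least-squares fit of log10(p) = coef * log10(d) + intercept."""
    xs = [math.log10(d) for d in distances]
    ys = [math.log10(p) for p in probabilities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least squares for a single predictor
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    coef = sxy / sxx
    intercept = mean_y - coef * mean_x
    return coef, intercept


# Illustrative points lying exactly on p = 1.2 * d ** -1.0 (not real shots)
distances = [2.0, 4.0, 8.0, 16.0, 32.0]
probabilities = [1.2 / d for d in distances]

coef, intercept = fit_power_curve(distances, probabilities)

# Back on the original scale the fitted curve is d ** coef * 10 ** intercept
fitted = [d ** coef * 10 ** intercept for d in distances]
print(coef, intercept)
```

&lt;p&gt;Because the regression is done on the logged values, the fitted line becomes a power curve once you transform back, which is why the intercept ends up as a power of 10.&lt;/p&gt;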
&lt;p&gt;This gives an extremely good fit with the data and seems a plausible choice. We know goal scoring is Poisson distributed so it would seem natural to fit expected goals using an exponential shaped curve since Poisson and exponential distributions are inherently linked – the exponential distribution in fact describes the time taken between individual events occurring in a Poisson process.&lt;/p&gt;
&lt;p&gt;If we calculate the r squared value for the fit of the Power curve then we get a value of 0.84, meaning 84% of the variance in goal scoring can be attributed to how far away the player taking the shot is from the goal. This is pretty impressive as it leaves just 16% attributed to other reasons, such as the angle of the shot, goalkeeper positioning, defensive pressure, the shooting player’s talent etc.&lt;/p&gt;
&lt;p&gt;Before you ask, over the coming weeks I’ll be looking at whether adding these additional factors into the model can improve it or whether the added complexity is not worth it just to chase the remaining 16%.&lt;/p&gt;
&lt;h2 id="using-the-expected-goals-model"&gt;Using the Expected Goals Model&lt;/h2&gt;
&lt;p&gt;But how do we use the model? Although everybody else’s models seem to be top secret I’m going to give mine away. The coefficient for the regression is $-1.036884$ and the intercept is $0.05950286$.&lt;/p&gt;
&lt;p&gt;To put this into action all you need to do is raise the distance away from the goal in metres to the power of the coefficient and multiply by 10 to the power of the intercept. For example, a shot from 8 metres gives:&lt;/p&gt;
&lt;p&gt;$8^{-1.036884} * 10^{0.05950286} = 0.132771$ expected goals&lt;/p&gt;
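&lt;p&gt;If you would rather see that as code, here is a small Python sketch of the same calculation. The function name and the list of shot distances are just for illustration; the coefficient and intercept are the values given above.&lt;/p&gt;

```python
COEFFICIENT = -1.036884
INTERCEPT = 0.05950286


def expected_goals(distance_m, coefficient=COEFFICIENT, intercept=INTERCEPT):
    """Expected goals for a single shot taken distance_m metres from goal.

    The distance must be positive, otherwise the power is undefined.
    """
    return distance_m ** coefficient * 10 ** intercept


# The worked example from the text: a shot from 8 metres
print(round(expected_goals(8), 6))

# A team's expected goals is the sum over its shots
# (these distances are made up for the example)
shot_distances = [6.0, 11.0, 18.5, 25.0]
team_exp_g = sum(expected_goals(d) for d in shot_distances)
print(round(team_exp_g, 2))
```

&lt;p&gt;One thing to be aware of is that the curve is only sensible for positive distances, and for shots within about a metre of the goal it returns values above 1, so you may want to cap it.&lt;/p&gt;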
&lt;p&gt;So how about we give it a proper test and try it out on this season’s English Premier League to date? The results are shown in Table 1 and overall give a root mean square error of 8.2 goals, which seems a pretty reasonable starting point for developing the model further.&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Team&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Goals&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;expG&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Residual&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td &gt;1&lt;/td&gt;
&lt;td&gt;Man City&lt;/td&gt;
&lt;td &gt;68.00&lt;/td&gt;
&lt;td &gt;46.90&lt;/td&gt;
&lt;td &gt;21.10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td &gt;2&lt;/td&gt;
&lt;td&gt;Liverpool&lt;/td&gt;
&lt;td &gt;63.00&lt;/td&gt;
&lt;td &gt;42.56&lt;/td&gt;
&lt;td &gt;20.44&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td &gt;3&lt;/td&gt;
&lt;td&gt;Arsenal&lt;/td&gt;
&lt;td &gt;48.00&lt;/td&gt;
&lt;td &gt;35.26&lt;/td&gt;
&lt;td &gt;12.74&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td &gt;4&lt;/td&gt;
&lt;td&gt;Chelsea&lt;/td&gt;
&lt;td &gt;48.00&lt;/td&gt;
&lt;td &gt;42.56&lt;/td&gt;
&lt;td &gt;5.44&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td &gt;5&lt;/td&gt;
&lt;td&gt;Man Utd&lt;/td&gt;
&lt;td &gt;41.00&lt;/td&gt;
&lt;td &gt;35.26&lt;/td&gt;
&lt;td &gt;5.74&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td &gt;6&lt;/td&gt;
&lt;td&gt;Southampton&lt;/td&gt;
&lt;td &gt;37.00&lt;/td&gt;
&lt;td &gt;29.05&lt;/td&gt;
&lt;td &gt;7.95&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td &gt;7&lt;/td&gt;
&lt;td&gt;Everton&lt;/td&gt;
&lt;td &gt;37.00&lt;/td&gt;
&lt;td &gt;33.31&lt;/td&gt;
&lt;td &gt;3.69&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td &gt;8&lt;/td&gt;
&lt;td&gt;Newcastle&lt;/td&gt;
&lt;td &gt;32.00&lt;/td&gt;
&lt;td &gt;30.32&lt;/td&gt;
&lt;td &gt;1.68&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td &gt;9&lt;/td&gt;
&lt;td&gt;Swansea&lt;/td&gt;
&lt;td &gt;32.00&lt;/td&gt;
&lt;td &gt;26.89&lt;/td&gt;
&lt;td &gt;5.11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td &gt;10&lt;/td&gt;
&lt;td&gt;Tottenham&lt;/td&gt;
&lt;td &gt;32.00&lt;/td&gt;
&lt;td &gt;31.21&lt;/td&gt;
&lt;td &gt;0.79&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td &gt;11&lt;/td&gt;
&lt;td&gt;WBA&lt;/td&gt;
&lt;td &gt;30.00&lt;/td&gt;
&lt;td &gt;31.17&lt;/td&gt;
&lt;td &gt;-1.17&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td &gt;12&lt;/td&gt;
&lt;td&gt;West Ham&lt;/td&gt;
&lt;td &gt;28.00&lt;/td&gt;
&lt;td &gt;26.69&lt;/td&gt;
&lt;td &gt;1.31&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td &gt;13&lt;/td&gt;
&lt;td&gt;Aston Villa&lt;/td&gt;
&lt;td &gt;27.00&lt;/td&gt;
&lt;td &gt;24.20&lt;/td&gt;
&lt;td &gt;2.80&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td &gt;14&lt;/td&gt;
&lt;td&gt;Stoke&lt;/td&gt;
&lt;td &gt;26.00&lt;/td&gt;
&lt;td &gt;25.29&lt;/td&gt;
&lt;td &gt;0.71&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td &gt;15&lt;/td&gt;
&lt;td&gt;Sunderland&lt;/td&gt;
&lt;td &gt;25.00&lt;/td&gt;
&lt;td &gt;25.63&lt;/td&gt;
&lt;td &gt;-0.63&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td &gt;16&lt;/td&gt;
&lt;td&gt;Hull&lt;/td&gt;
&lt;td &gt;25.00&lt;/td&gt;
&lt;td &gt;23.95&lt;/td&gt;
&lt;td &gt;1.05&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td &gt;17&lt;/td&gt;
&lt;td&gt;Fulham&lt;/td&gt;
&lt;td &gt;24.00&lt;/td&gt;
&lt;td &gt;24.39&lt;/td&gt;
&lt;td &gt;-0.39&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td &gt;18&lt;/td&gt;
&lt;td&gt;Cardiff&lt;/td&gt;
&lt;td &gt;19.00&lt;/td&gt;
&lt;td &gt;24.67&lt;/td&gt;
&lt;td &gt;-5.67&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td &gt;19&lt;/td&gt;
&lt;td&gt;Norwich&lt;/td&gt;
&lt;td &gt;19.00&lt;/td&gt;
&lt;td &gt;27.61&lt;/td&gt;
&lt;td &gt;-8.61&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td &gt;20&lt;/td&gt;
&lt;td&gt;Crystal Palace&lt;/td&gt;
&lt;td &gt;18.00&lt;/td&gt;
&lt;td &gt;25.03&lt;/td&gt;
&lt;td &gt;-7.03&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;Table 1: Expected Goals For The English Premier League To Date&lt;/strong&gt;&lt;/p&gt;
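&lt;p&gt;For anyone wanting to check the root mean square error, the sketch below recomputes it in Python from the goals and expected goals in Table 1. The variable names are my own for this example and the result comes out close to the 8.2 quoted above, with any small difference down to rounding in the table.&lt;/p&gt;

```python
import math

# (team, actual goals, expected goals) copied from Table 1
table_1 = [
    ("Man City", 68, 46.90), ("Liverpool", 63, 42.56),
    ("Arsenal", 48, 35.26), ("Chelsea", 48, 42.56),
    ("Man Utd", 41, 35.26), ("Southampton", 37, 29.05),
    ("Everton", 37, 33.31), ("Newcastle", 32, 30.32),
    ("Swansea", 32, 26.89), ("Tottenham", 32, 31.21),
    ("WBA", 30, 31.17), ("West Ham", 28, 26.69),
    ("Aston Villa", 27, 24.20), ("Stoke", 26, 25.29),
    ("Sunderland", 25, 25.63), ("Hull", 25, 23.95),
    ("Fulham", 24, 24.39), ("Cardiff", 19, 24.67),
    ("Norwich", 19, 27.61), ("Crystal Palace", 18, 25.03),
]

# Residual = actual goals minus expected goals, as in the table
residuals = [goals - exp_g for _, goals, exp_g in table_1]

# Root mean square error across all 20 teams
rmse = math.sqrt(sum(r ** 2 for r in residuals) / len(residuals))
print(round(rmse, 2))
```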
&lt;p&gt;You can also see a pretty clear pattern in that the teams at the top of the league have generally over-performed the goal expectancy while those towards the bottom end have under-performed it. This would seem reasonable as we are predicting average goal expectancy and the top teams are obviously above average so should perhaps do better with their chances, while the lower teams are below average so would be expected to perform worse?&lt;/p&gt;
&lt;h2 id="what-next"&gt;What Next?&lt;/h2&gt;
&lt;p&gt;I’m not claiming this to be the only way of calculating expected goals, or even the best way but hopefully it will encourage more discussion of how to calculate expected goals rather than a lot of secret black boxes all giving different results.&lt;/p&gt;
&lt;p&gt;I hope to write more about expected goals over the coming weeks in order to test this equation to see how well it really works, to hopefully improve it further and to try and understand what the metric can and cannot tell us.&lt;/p&gt;
&lt;p&gt;In the meantime, feel free to use my equation to calculate expected goals, all I ask is that you don’t try and pass the equation off as your own (you know who you are!!) and that if you use it then please acknowledge me and link back to my site.&lt;/p&gt;
&lt;p&gt;Be warned though it’s a work in progress so is subject to change as and when I improve things…&lt;/p&gt;
&lt;p&gt;Enjoy!&lt;/p&gt;</content><category term="Expected Goals"></category><category term="Expected Goals"></category></entry><entry><title>Comparing Players Using Cluster Analysis</title><link href="2014/02/10/comparing-players-using-cluster-analysis/" rel="alternate"></link><published>2014-02-10T19:30:00+00:00</published><updated>2014-02-10T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2014-02-10:2014/02/10/comparing-players-using-cluster-analysis/</id><summary type="html">&lt;p&gt;As there were a couple of presentations at the recent Opta Pro Forum talking about identifying player similarities I thought I’d give a quick example of how to do something similar using k-means cluster analysis.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;As there were a couple of presentations at the recent Opta Pro Forum talking about identifying player similarities I thought I’d give a quick example of how to do something similar using k-means cluster analysis.&lt;/p&gt;
&lt;h2 id="the-data"&gt;The Data&lt;/h2&gt;
&lt;p&gt;All the data used in the analysis was taken from public websites, such as &lt;a href="http://www.whoscored.com"&gt;whoscored&lt;/a&gt;, &lt;a href="http://www.squawka.com"&gt;squawka&lt;/a&gt;, &lt;a href="http://www.transfermarkt.co.uk/"&gt;transfermarkt&lt;/a&gt; etc and painstakingly matched together to try and get as much information on each player as possible.&lt;/p&gt;
&lt;p&gt;The first stage of analysis was to normalize the data so it was all in the same range to avoid biasing the clustering. If you think about how many goals a typical player scores per match compared with how many passes they play then the scale is quite different. Since k-means clustering uses &lt;a href="http://en.wikipedia.org/wiki/Euclidean_distance"&gt;Euclidean Distance&lt;/a&gt; the clusters formed are influenced strongly by the magnitudes of the variables, especially by outliers. By normalizing all data into the same range this bias can be avoided.&lt;/p&gt;
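&lt;p&gt;To make the scaling problem concrete, here is a minimal Python sketch of one common way to put variables into the same range, min-max scaling to between 0 and 1. The exact method and the numbers are assumptions for the example rather than a record of what I ran.&lt;/p&gt;

```python
def min_max_normalise(column):
    """Rescale a list of numbers so the smallest is 0 and the largest is 1."""
    lowest, highest = min(column), max(column)
    if highest == lowest:
        # A constant column carries no information, so map it all to 0
        return [0.0 for _ in column]
    return [(value - lowest) / (highest - lowest) for value in column]


# Made-up per-match figures for four players
goals_per_match = [0.1, 0.5, 0.0, 0.9]
passes_per_match = [20.0, 35.0, 60.0, 80.0]

scaled_goals = min_max_normalise(goals_per_match)
scaled_passes = min_max_normalise(passes_per_match)
print(scaled_goals)
print(scaled_passes)
```

&lt;p&gt;Before scaling, a difference of 40 passes would swamp a difference of 0.5 goals in the Euclidean distance; afterwards both variables run from 0 to 1 and contribute on equal terms.&lt;/p&gt;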
&lt;h2 id="principal-component-analysis"&gt;Principal Component Analysis&lt;/h2&gt;
&lt;p&gt;While normalizing the data, I also performed &lt;a href="http://en.wikipedia.org/wiki/Principal_component_analysis"&gt;Principal Component Analysis&lt;/a&gt; (PCA) on it too. This step isn’t essential but it is a handy way of reducing the dimensions in the data down to a more manageable size by squashing all the data together into new variables known as principal components.&lt;/p&gt;
&lt;p&gt;These principal components are created in such a way that the first one accounts for as much of the variance in the data as possible, the second one then accounts for as much of the remaining variance as possible and so on.&lt;/p&gt;
&lt;p&gt;As you can see in Figure 1 below, the first component accounts for roughly 70% of all the variance in the data, with each additional component accounting for less and less. This means we can represent around 80% of the information in the data using just two components, and pretty much all of it using just five.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20140210_scree.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 1: PCA scree plot showing amount of variance accounted for by each principal component&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="clustering-the-players"&gt;Clustering The Players&lt;/h2&gt;
&lt;p&gt;The next step was to then run the k-means clustering algorithm on the data. As shown in Figure 2 the players split relatively neatly into five distinct coloured clusters when plotted by the first two principal components.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20140210_clusters.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 2: Players split into different clusters by colour&lt;/strong&gt;&lt;/p&gt;
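&lt;p&gt;For anyone unfamiliar with how k-means works, here is a bare-bones Python sketch of the standard algorithm (often called Lloyd's algorithm). It is a toy version with made-up points and hand-picked starting centroids, not the code used for the figures; in practice you would use a library implementation and choose the starting centroids more carefully.&lt;/p&gt;

```python
import math


def k_means(points, initial_centroids, iterations=20):
    """Assign each point to its nearest centroid, then move each centroid
    to the mean of its assigned points, and repeat."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Four made-up players described by their first two principal components
points = [(0.0, 0.1), (0.1, 0.0), (0.9, 1.0), (1.0, 0.9)]
labels, centroids = k_means(points, [(0.0, 0.0), (1.0, 1.0)])
print(labels)
print(centroids)
```

&lt;p&gt;The number of clusters is something you have to choose up front, which here is simply the number of starting centroids you pass in.&lt;/p&gt;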
&lt;h2 id="goalkeepers"&gt;Goalkeepers&lt;/h2&gt;
&lt;p&gt;As a quick test we can look at the grey cluster located at the bottom of the image in more detail to see which players are contained within it (Figure 3). If you click the image to zoom in on it you can see it’s done a pretty good job of pulling out the goalkeepers from the rest of the players. This is to be expected since goalkeepers’ stats should be pretty distinct from those of outfield players but it’s reassuring to check the technique passes this first simple test before we move on.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20140210_keepers.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 3: The grey cluster up close&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="vincent-kompany"&gt;Vincent Kompany&lt;/h2&gt;
&lt;p&gt;Now that we have separated out the goalkeepers we can take a look at how well the technique copes with outfield players, starting with Manchester City’s central defender Vincent Kompany located at the centre of Figure 4. The results are pretty good, with Kompany surrounded by players predominantly considered to be defenders. As you move up the image the players start to get a bit more attacking, with people like David Luiz, Phil Jones and Fabian Delph starting to appear.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20140210_kompany.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 4: Clustering of Vincent Kompany&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="adnan-januzaj"&gt;Adnan Januzaj&lt;/h2&gt;
&lt;p&gt;Next up is Adnan Januzaj, one of the few Manchester United players to be having anything resembling a decent season this year. Again the results look pretty plausible (Figure 5), with Januzaj surrounded by predominantly attacking midfielders. There are a couple of slightly surprising results in there though, such as Manchester City’s strikers Álvaro Negredo and Edin Džeko.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20140210_januzaj.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 5: Clustering of Adnan Januzaj&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="mikel-arteta"&gt;Mikel Arteta&lt;/h2&gt;
&lt;p&gt;Finally, I added in Arsenal’s midfielder Mikel Arteta (Figure 6). This one was probably the most surprising of all the players I’ve looked at as there seems to be quite a mix of players around Arteta, including both offensive and defensive players, although perhaps this is actually representative of Arteta’s role at Arsenal?&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20140210_arteta.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 6: Clustering Mikel Arteta&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="next-steps"&gt;Next Steps&lt;/h2&gt;
&lt;p&gt;For a first go the results are pretty promising but there are plenty of ways the technique could be improved. At the moment I have used all the data I had available for each player but I suspect more specific results could be obtained by filtering the data.&lt;/p&gt;
&lt;p&gt;For example, there may be specific attributes of a player you want to match on e.g. looking for attackers by just their creative output may be more useful than including their tackles, interceptions etc, which may be of minor importance to their role.&lt;/p&gt;
&lt;p&gt;Finally, all the data used here are aggregated. A really interesting next step would be to include xy co-ordinates for shot locations, interceptions, passes etc to cluster players based on the locations of their actions on the pitch (donations of xy data will be gratefully accepted :)).&lt;/p&gt;</content><category term="Cluster Analysis"></category><category term="Player Analytics"></category><category term="Cluster Analysis"></category></entry><entry><title>EPL 2013/2014: Football Pythagorean So Far</title><link href="2014/01/20/epl-20132014-football-pythagorean-so-far/" rel="alternate"></link><published>2014-01-20T19:30:00+00:00</published><updated>2014-01-20T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2014-01-20:2014/01/20/epl-20132014-football-pythagorean-so-far/</id><summary type="html">&lt;p&gt;Welcome back! Now that I'm no longer part of Onside Analysis I'm free to start blogging again so let's start off by taking a look at how my football Pythagorean is doing for the English Premier League so far this season.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Welcome back! Now that I'm no longer part of Onside Analysis I'm free to start blogging again so let's start off by taking a look at how my football Pythagorean is doing for the English Premier League so far this season.&lt;/p&gt;
&lt;h2 id="football-pythagorean"&gt;Football Pythagorean&lt;/h2&gt;
&lt;p&gt;In case you haven’t seen it before, my &lt;a href="http://pena.lt/y/pythagorean.html"&gt;football pythagorean&lt;/a&gt; is an adaptation of the baseball Pythagorean that allows you to quickly estimate how many points a team would be expected to achieve on average based on the number of goals they have scored and conceded. It’s a pretty simple little equation but it is surprisingly accurate!&lt;/p&gt;
&lt;h2 id="the-season-so-far"&gt;The Season So Far&lt;/h2&gt;
&lt;p&gt;Figure One below shows the difference between the actual points each Premier League team has achieved this season and how many my Pythagorean predicts they should have on average. For teams in green the difference is positive so they actually have more points than expected, while those in red have gained fewer points than would be expected based on the number of goals they have scored and conceded.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/20140120_epl_pythag.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure One: EPL Pythagorean Results So Far&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The stand-out team here is obviously Tottenham, who have somehow managed to end up with eleven points more than would be expected based on their goals. Spurs’ Pythagorean has looked pretty big for a while now so I suppose you could look at this two ways – either they have developed an extremely effective and efficient system or they have been lucky to get as many points as they have. I’ll let you decide on the answer to that one…&lt;/p&gt;
&lt;p&gt;Interestingly, Manchester City are pretty close to their expected points total despite their enormous goal difference. One reason for this is that my football Pythagorean is not linear so as you score more goals they become less valuable to help account for high scoring matches, such as most of City’s home games this season! This helps prevent over-prediction of expected points for teams scoring heavily – having a good goal difference is obviously helpful but whether you win by one goal or five goals you still only get three points from the match.&lt;/p&gt;
&lt;p&gt;As it stands though Manchester City are in second place behind Arsenal who have acquired six points more than expected, meaning that typically we would not expect Arsenal to be top based on their results so far this season.&lt;/p&gt;
&lt;h2 id="how-will-the-season-end"&gt;How Will The Season End?&lt;/h2&gt;
&lt;p&gt;As well as looking at how teams are doing so far, we can also extrapolate the results and predict how the teams will end up at the end of the season (Table One). This is a very simplistic prediction, for example it does not take into account strength of schedules, but it is fairly accurate – the r squared value for Pythagorean predicted points versus actual points across multiple leagues worldwide was 0.938 with an average error of less than four points – so it should give a reasonable estimate of how the Premier League will finish next May.&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Team&lt;/th&gt;
&lt;th&gt;Points&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Manchester City&lt;/td&gt;
&lt;td&gt;84.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Arsenal&lt;/td&gt;
&lt;td&gt;83.59&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Chelsea&lt;/td&gt;
&lt;td&gt;80.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Liverpool&lt;/td&gt;
&lt;td&gt;73.75&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Everton&lt;/td&gt;
&lt;td&gt;72.20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Tottenham Hotspur&lt;/td&gt;
&lt;td&gt;65.89&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Manchester United&lt;/td&gt;
&lt;td&gt;62.56&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Newcastle United&lt;/td&gt;
&lt;td&gt;59.32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Southampton&lt;/td&gt;
&lt;td&gt;54.42&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Aston Villa&lt;/td&gt;
&lt;td&gt;41.52&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;Hull City&lt;/td&gt;
&lt;td&gt;40.97&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;West Bromwich Albion&lt;/td&gt;
&lt;td&gt;40.74&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;Swansea City&lt;/td&gt;
&lt;td&gt;39.66&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;Stoke City&lt;/td&gt;
&lt;td&gt;36.85&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;Norwich City&lt;/td&gt;
&lt;td&gt;35.78&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;West Ham United&lt;/td&gt;
&lt;td&gt;33.91&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;Sunderland&lt;/td&gt;
&lt;td&gt;32.28&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Crystal Palace&lt;/td&gt;
&lt;td&gt;31.30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;Fulham&lt;/td&gt;
&lt;td&gt;30.65&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;Cardiff City&lt;/td&gt;
&lt;td&gt;29.29&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;Table One: Pythagorean Predicting Final Standings For the English Premier League 2013/2014&lt;/strong&gt;&lt;/p&gt;</content><category term="Pythagorean"></category><category term="EPL"></category><category term="Pythagorean"></category></entry><entry><title>UEFA Champions League – Route To The Final</title><link href="2013/09/30/uefa-champions-league-route-to-the-final/" rel="alternate"></link><published>2013-09-30T19:30:00+00:00</published><updated>2013-09-30T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-09-30:2013/09/30/uefa-champions-league-route-to-the-final/</id><summary type="html">&lt;p&gt;With the UEFA Champions League group stage now underway I took a look at what it typically takes for teams to reach the final.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;With the UEFA Champions League group stage now underway I took a quick look at what it typically takes for teams to reach the final.&lt;/p&gt;
&lt;p&gt;I started off by looking at how well teams from the major six domestic leagues (England, France, Italy, Spain, Germany and Portugal) performed in the UEFA Champions League based on what position they qualified in domestically (Figure 1) as this affects at what point they enter the competition…&lt;/p&gt;
&lt;p&gt;Find out more by reading the full article on the Onside Analysis blog &lt;a href="http://www.onsideanalysis.com/uefa-champions-league-route-final/"&gt;here&lt;/a&gt;.&lt;/p&gt;</content><category term="UEFA Champions League"></category><category term="UCL"></category></entry><entry><title>Analysing Football Teams Using Cluster Analysis and Principal Component Analysis</title><link href="2013/08/30/analysing-football-teams-using-cluster-analysis-and-principal-component-analysis/" rel="alternate"></link><published>2013-08-30T19:30:00+00:00</published><updated>2013-08-30T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-08-30:2013/08/30/analysing-football-teams-using-cluster-analysis-and-principal-component-analysis/</id><summary type="html">&lt;p&gt;The amount of football data available is growing rapidly – with every passing week of the season more matches are played and even more data gets collected. This is great as it allows us to increase our understanding of the game but it also means we quickly end up with more information than could ever be analysed manually.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;The amount of football data available is growing rapidly – with every passing week of the season more matches are played and even more data gets collected. This is great as it allows us to increase our understanding of the game but it also means we quickly end up with more information than could ever be analysed manually.&lt;/p&gt;
&lt;p&gt;Instead, we can use techniques such as cluster analysis and principal component analysis (PCA) to critically analyse these large sets of football data to identify important patterns and relationships that can help explain a team’s performances.&lt;/p&gt;
&lt;p&gt;Find out more by reading the full article on the Onside Analysis blog &lt;a href="http://www.onsideanalysis.com/analysing-football-teams-using-cluster-analysis-principal-component-analysis/"&gt;here&lt;/a&gt;.&lt;/p&gt;</content><category term="Cluster Analysis"></category><category term="Cluster Analysis"></category></entry><entry><title>Anouncement</title><link href="2013/06/19/anouncement/" rel="alternate"></link><published>2013-06-19T19:30:00+00:00</published><updated>2013-06-19T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-06-19:2013/06/19/anouncement/</id><summary type="html">&lt;p&gt;You may have noticed that my blogging has slowed down over the past few weeks and the reason is that I have joined Onside Analysis as a computational statistician.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;You may have noticed that my blogging has slowed down over the past few weeks and the reason is that I have joined Onside Analysis as a computational statistician.&lt;/p&gt;
&lt;p&gt;I am really excited about my new role as it means that I will be working on football analysis full time instead of trying to squeeze it in around my day job, family, sleep etc. I am not sure exactly what this means for my blog here, but the plan is that I will be contributing to the Onside Analysis blog, so keep an eye on that if you’re interested in what I have been writing about so far.&lt;/p&gt;
&lt;p&gt;I’ll also still be around on &lt;a href="https://twitter.com/penaltyblog"&gt;Twitter&lt;/a&gt; so please keep in touch :)&lt;/p&gt;</content><category term="Misc"></category></entry><entry><title>Betting With The Eastwood Index And Kelly Criterion</title><link href="2013/05/23/betting-with-the-eastwood-index-and-kelly-criterion/" rel="alternate"></link><published>2013-05-23T19:30:00+00:00</published><updated>2013-05-23T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-05-23:2013/05/23/betting-with-the-eastwood-index-and-kelly-criterion/</id><summary type="html">&lt;p&gt;I demonstrated in my last post that the odds calculated using the Eastwood Index were slightly more accurate than the bookmakers over the course of the football season. My next goal is to work out the optimal way of using this edge to make a profit, starting off with the Kelly Criterion...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;I demonstrated in my last post that the odds calculated using the Eastwood Index were slightly more accurate than the bookmakers over the course of the football season. My next goal is to work out the optimal way of using this edge to make a profit, starting off with the Kelly Criterion.&lt;/p&gt;
&lt;h2 id="the-kelly-criterion"&gt;The Kelly Criterion&lt;/h2&gt;
&lt;p&gt;The first port of call for any staking plan is the Kelly Criterion, a method developed by John Larry Kelly Jr. to determine the optimal bet size based on how far the odds are perceived to be in your favour.&lt;/p&gt;
&lt;p&gt;The equation used to calculate the Kelly Criterion is shown in Figure 1, where $p$ is your expected probability of winning, $b$ is the decimal odds offered and $f$ is the Kelly Criterion, or recommended fraction of your bankroll to bet.&lt;/p&gt;
&lt;p&gt;$f=(pb-1)/(b-1)$&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 1: Kelly Criterion&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Let’s run through a quick example using Fulham’s last match of the season against Swansea City. Bet365 offered Fulham to win at odds of 4.75, which is equivalent to an expected win probability of around 21%, while let’s say you think the probability of Fulham winning is actually closer to 24%.&lt;/p&gt;
&lt;p&gt;$f = (0.24 \times 4.75 - 1) / (4.75 - 1)$&lt;/p&gt;
&lt;p&gt;$f = 0.14 / 3.75$&lt;/p&gt;
&lt;p&gt;$f = 0.0373$&lt;/p&gt;
&lt;p&gt;$f = 3.73\%$&lt;/p&gt;
&lt;p&gt;So according to the Kelly Criterion we should be willing to risk 3.73% of our bankroll on this bet.&lt;/p&gt;
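&lt;p&gt;The calculation above can be reproduced in a few lines of Python. This is only a minimal sketch: the function name &lt;code&gt;kelly_fraction&lt;/code&gt; is an illustrative choice, and it assumes decimal odds as in the worked example.&lt;/p&gt;

```python
def kelly_fraction(p, decimal_odds):
    """Fraction of bankroll to stake: f = (p*b - 1) / (b - 1), with b as decimal odds."""
    return (p * decimal_odds - 1) / (decimal_odds - 1)


# Fulham example from the text: odds of 4.75, estimated win probability of 24%
f = kelly_fraction(0.24, 4.75)
print(f"Bookmaker implied probability: {1 / 4.75:.1%}")
print(f"Kelly stake: {f:.2%}")
```

&lt;p&gt;A negative result means the odds offer no value at your estimated probability, so no bet would be placed.&lt;/p&gt;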
&lt;h2 id="applying-the-kelly-criterion-to-the-eastwood-index"&gt;Applying the Kelly Criterion to the Eastwood Index&lt;/h2&gt;
&lt;p&gt;So what is the best way of applying the Kelly Criterion to the Eastwood Index? There are numerous different strategies that could be used but to start off with I’ve gone purely with value bets.&lt;/p&gt;
&lt;p&gt;For each match I calculated the Kelly Criterion based on the Home, Draw and Away odds from Bet365 and looked for the outcome where the recommended bet was the largest. The reason for this was that the larger the recommended bet then the greater the difference between my probabilities and the bookmaker’s odds so the greater the potential value of the bet.&lt;/p&gt;
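&lt;p&gt;As a rough sketch of that selection step, the snippet below computes the Kelly fraction for each outcome and keeps the largest. The function names are hypothetical, the home and draw odds are made up for illustration, and skipping matches where no outcome has a positive fraction is an assumption rather than something stated above.&lt;/p&gt;

```python
def kelly_fraction(p, decimal_odds):
    """Kelly fraction for decimal odds: f = (p*b - 1) / (b - 1)."""
    return (p * decimal_odds - 1) / (decimal_odds - 1)


def best_value_bet(model_probs, bookmaker_odds):
    """Return (outcome, fraction) for the outcome with the largest Kelly fraction,
    or None when no outcome has a positive fraction (assumed behaviour)."""
    fractions = {
        outcome: kelly_fraction(model_probs[outcome], bookmaker_odds[outcome])
        for outcome in model_probs
    }
    outcome = max(fractions, key=fractions.get)
    if fractions[outcome] > 0:
        return outcome, fractions[outcome]
    return None


# Illustrative match: only the away odds of 4.75 come from the example above
model_probs = {"home": 0.46, "draw": 0.30, "away": 0.24}
bookmaker_odds = {"home": 2.10, "draw": 3.40, "away": 4.75}
print(best_value_bet(model_probs, bookmaker_odds))
```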
&lt;p&gt;Figure Two shows the results over the course of the season. Starting with a bankroll of £100, there was a slight loss over the first half of the season followed by pretty steady growth to finish with £114 in the bank. This gave a return on investment (ROI) of 14% for the Eastwood Index based on the 2012–2013 Premier League season.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/130523-Fraction-Kelly.png"&gt;
&lt;strong&gt;Figure 2: Bankroll over 2012-2013 Season&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="fractional-kelly"&gt;Fractional Kelly&lt;/h2&gt;
&lt;p&gt;Being relatively risk averse, I’ve found that using a fractional Kelly Criterion is preferable for me. Although using the full Kelly Criterion is optimal for maximizing growth of the bankroll long term, there is more risk of being caught out by variance and an unlucky streak wiping out your bank balance.&lt;/p&gt;
&lt;p&gt;Betting a fractional value, such as half the recommended amount, slows down growth but helps protect you from volatility. As a comparison, take a look at Figure Three, where I bet the full Kelly Criterion on each match: at its peak the ROI reaches 73%, yet by the end of the season variance has pulled the bankroll down below its starting value, causing an overall loss.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/130523-Full-Kelly.png"&gt;
&lt;strong&gt;Figure 3: Volatility Betting The Full Kelly Criterion&lt;/strong&gt;&lt;/p&gt;
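&lt;p&gt;The difference between full and fractional staking can be sketched as below. The sequence of bets is entirely hypothetical and the names &lt;code&gt;settle_bet&lt;/code&gt; and &lt;code&gt;kelly_multiplier&lt;/code&gt; are illustrative; the point is only that a multiplier of 0.5 halves every stake, which shrinks both the gains and the losses.&lt;/p&gt;

```python
def kelly_fraction(p, decimal_odds):
    """Kelly fraction for decimal odds: f = (p*b - 1) / (b - 1)."""
    return (p * decimal_odds - 1) / (decimal_odds - 1)


def settle_bet(bankroll, p, decimal_odds, won, kelly_multiplier=0.5):
    """Stake a multiple of the full Kelly amount and return the new bankroll."""
    stake = bankroll * max(kelly_fraction(p, decimal_odds), 0.0) * kelly_multiplier
    if won:
        return bankroll + stake * (decimal_odds - 1)
    return bankroll - stake


# Hypothetical sequence of (model probability, decimal odds, bet won?)
bets = [(0.24, 4.75, False), (0.50, 2.20, True), (0.35, 3.10, False)]
for multiplier in (1.0, 0.5):
    bankroll = 100.0
    for p, odds, won in bets:
        bankroll = settle_bet(bankroll, p, odds, won, multiplier)
    print(f"Kelly multiplier {multiplier}: final bankroll {bankroll:.2f}")
```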
&lt;h2 id="early-season-dip"&gt;Early Season Dip&lt;/h2&gt;
&lt;p&gt;One intriguing aspect of Figure Two is that the bankroll grew so much more in the second half of the season than in the first. This may partly be due to random variance, but another possibility is betting on recently promoted teams.&lt;/p&gt;
&lt;p&gt;At the moment promoted teams take over the EI ratings of the equivalent relegated teams: the team promoted as champions takes the EI rating of the team relegated third from bottom, the team promoted second takes over the rating of the team relegated second from bottom, and the team promoted via the playoffs gets the rating of the team that finished bottom.&lt;/p&gt;
&lt;p&gt;These ratings will not be exactly correct for the promoted teams but should move towards the right levels over time to reflect the teams’ performances. Looking through the history of the bets made, though, those involving a promoted team during the first half of the season lost money overall, while those not involving promoted teams made a profit.&lt;/p&gt;
&lt;p&gt;By the end of the season I had made a profit out of the promoted teams, suggesting that the teams’ ratings had sufficiently corrected themselves. This means, though, that I could improve the ROI even further by avoiding bets on the promoted teams early in the season or by improving the way the promoted teams’ ratings are handled, such as correcting their EI ratings faster.&lt;/p&gt;
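&lt;p&gt;In code, that hand-over of ratings might look something like the sketch below. The team names and rating values are placeholders, and &lt;code&gt;inherit_ratings&lt;/code&gt; is a hypothetical helper rather than part of the actual EI implementation.&lt;/p&gt;

```python
def inherit_ratings(relegated, promoted, ratings):
    """Give each promoted team the rating of the equivalent relegated team.

    relegated: teams ordered third from bottom, second from bottom, bottom.
    promoted:  teams ordered champions, runners-up, play-off winners.
    """
    new_ratings = {team: r for team, r in ratings.items() if team not in relegated}
    for old_team, new_team in zip(relegated, promoted):
        new_ratings[new_team] = ratings[old_team]
    return new_ratings


# Placeholder teams and ratings, for illustration only
ratings = {"Team A": 2100, "Team R1": 1650, "Team R2": 1580, "Team R3": 1500}
relegated = ["Team R1", "Team R2", "Team R3"]
promoted = ["Champions", "Runners-up", "Play-off winners"]
print(inherit_ratings(relegated, promoted, ratings))
```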
&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;This is only a quick look at a very simple strategy for placing bets using the Eastwood Index; there are still plenty of changes that could improve results further. Yet even with this relatively naive approach the ROI is 14%, which is much more than I would have made sticking my money into a bank account.&lt;/p&gt;
&lt;p&gt;Applying the Eastwood Index to betting is also a great way to identify the strengths and weaknesses of the model, as the ROI gives a clear indicator of what works, what doesn’t work and what can be optimized further.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Addendum&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;I was asked on Twitter what the ROI works out at per bet for the Eastwood Index without betting the Kelly Criterion. Using a fixed bet of £1 per bet gave an overall profit for the season of £17 over 380 matches, which works out at an ROI of around 4.5% per bet.&lt;/em&gt;&lt;/p&gt;</content><category term="EI"></category><category term="EI"></category><category term="EPL"></category></entry><entry><title>Did The Eastwood Index Beat The Bookmakers?</title><link href="2013/05/21/did-the-eastwood-index-beat-the-bookmakers/" rel="alternate"></link><published>2013-05-21T19:30:00+00:00</published><updated>2013-05-21T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-05-21:2013/05/21/did-the-eastwood-index-beat-the-bookmakers/</id><summary type="html">&lt;p&gt;It’s the end of the season so it’s time to review how the Eastwood Index performed over the year and how it compared with the bookmakers...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;It’s the end of the season so it’s time to review how the Eastwood Index performed over the year and how it compared with the bookmakers.&lt;/p&gt;
&lt;h2 id="ranked-probability-scores"&gt;Ranked Probability Scores&lt;/h2&gt;
&lt;p&gt;One of the most important aspects to me is how accurate the forecasts were and I’ve assessed this using &lt;a href="http://pena.lt/y/2013/03/21/how-accurate-are-the-ei-football-predictions/"&gt;Ranked Probability Scores&lt;/a&gt;, as recommended by &lt;a href="http://www.eecs.qmul.ac.uk/~norman/papers/assessing_probabilistic_football_forecast_models.pdf"&gt;Constantinou and Fenton&lt;/a&gt;.  I’ve discussed Ranked Probability Scores on the blog before but for people new to them they measure the difference between the forecasts and what really happened. Scores range between 0–1 and represent the amount of error in the predictions so lower Ranked Probability Scores are better and signify greater accuracy.&lt;/p&gt;
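&lt;p&gt;For readers who want to compute the score themselves, here is a minimal sketch of the Ranked Probability Score for a single match with outcomes ordered home, draw, away. The function name is illustrative, and the probabilities in the example are just sample values.&lt;/p&gt;

```python
def ranked_probability_score(probs, outcome_index):
    """RPS for one match; probs are ordered (home, draw, away) and
    outcome_index is the position of the result that actually happened."""
    observed = [1.0 if i == outcome_index else 0.0 for i in range(len(probs))]
    cumulative_diff = 0.0
    total = 0.0
    # Sum the squared differences between cumulative forecast and cumulative outcome
    for p, o in zip(probs[:-1], observed[:-1]):
        cumulative_diff += p - o
        total += cumulative_diff ** 2
    return total / (len(probs) - 1)


forecast = (0.46, 0.30, 0.24)
for index, label in enumerate(("home win", "draw", "away win")):
    print(f"{label}: RPS = {ranked_probability_score(forecast, index):.4f}")
```

&lt;p&gt;A perfect forecast scores 0, while putting all the probability on a home win when the away team wins scores 1.&lt;/p&gt;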
&lt;h2 id="comparison-with-bookmakers"&gt;Comparison With Bookmakers&lt;/h2&gt;
&lt;p&gt;Looking at Figure 1 you can see that the Eastwood Index has consistently outperformed the bookmakers all season – and this isn’t just one bookmaker that the Eastwood Index has beaten but the combined knowledge of the industry as I’ve aggregated multiple bookmakers’ odds together and stripped out the overround to make the comparison as tough as possible.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/130521-EI-vs-BM1.png"&gt;
&lt;strong&gt;Figure 1: Eastwood Index Vs Aggregated Bookmakers&lt;/strong&gt;&lt;/p&gt;
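&lt;p&gt;The exact method used to strip out the overround and combine the bookmakers is not described here. One common approach, assumed purely for illustration, is to scale each bookmaker’s implied probabilities so they sum to one and then average across bookmakers. The odds below are made up.&lt;/p&gt;

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds to probabilities that sum to 1 by proportional scaling."""
    raw = [1 / o for o in decimal_odds]
    overround = sum(raw)  # greater than 1 when the bookmaker has a margin
    return [r / overround for r in raw]


def aggregate_bookmakers(odds_by_bookmaker):
    """Average the normalised probabilities across bookmakers."""
    normalised = [implied_probabilities(odds) for odds in odds_by_bookmaker]
    n = len(normalised)
    return [sum(column) / n for column in zip(*normalised)]


# Hypothetical (home, draw, away) decimal odds from two bookmakers
odds_by_bookmaker = [(2.10, 3.40, 3.60), (2.15, 3.30, 3.50)]
print([round(p, 3) for p in aggregate_bookmakers(odds_by_bookmaker)])
```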
&lt;p&gt;Interestingly, the difference in accuracy seems to be greatest at both ends of the season. I expected the start of the season to be difficult to forecast as new teams have been promoted, players have been bought and sold, and managers may have changed clubs, but the Eastwood Index seems to have coped with these variables better than the bookmakers’ odds have.&lt;/p&gt;
&lt;p&gt;Over the course of the season the bookmakers’ forecasts improved until there was very little difference between them and the Eastwood Index but I was somewhat surprised to see how far out their accuracy drifted over the final few weeks of the season.&lt;/p&gt;
&lt;p&gt;In theory these should be the easiest matches to forecast as we have the most information, but in reality they can be tricky as teams’ motivations change. For example, Manchester United have been playing their reserve goalkeeper so he gets enough appearances to earn his winner’s medal, while Swansea’s players may as well have been on holiday since they won the League Cup.&lt;/p&gt;
&lt;p&gt;These changes seem to have thrown the bookmakers’ odds out quite noticeably while the Eastwood Index’s accuracy has remained constant. In fact, it suggests that bookmakers may be over-compensating for these apparent end-of-season effects as the Eastwood Index does not currently take them into account and has not struggled because of it.&lt;/p&gt;
&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;Overall, I am pleased with the Eastwood Index’s debut season. I was slightly reticent to publish the forecasts at first in case the model did not hold up but it has remained accurate throughout the year. The next stage of its development is to identify any patterns as to where its forecasts differ from the bookmakers and how that could be combined with various staking strategies as well as looking at expanding to cover other leagues too.&lt;/p&gt;</content><category term="EI"></category><category term="EI"></category><category term="EPL"></category></entry><entry><title>EI Match Probabilities for the English Premier League</title><link href="2013/05/17/ei-match-probabilities-for-the-english-premier-league-5/" rel="alternate"></link><published>2013-05-17T19:30:00+00:00</published><updated>2013-05-17T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-05-17:2013/05/17/ei-match-probabilities-for-the-english-premier-league-5/</id><summary type="html">&lt;p&gt;We have finally reached the end of the season so for the last time in 2012-2013 here are the Eastwood Index’s (EI) probabilities for the English Premier League...&lt;/p&gt;</summary><content type="html">&lt;p&gt;We have finally reached the end of the season so for the last time in 2012-2013 here are the Eastwood Index’s (EI) probabilities for the English Premier League.&lt;/p&gt;
&lt;p&gt;Once the season is over and done with I’ll be looking back at how the EI has performed and how well its predictions compare with the bookmakers, so look out for that next week!&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Home Team&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Away Team&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Home (%)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Draw (%)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Away (%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chelsea&lt;/td&gt;
&lt;td&gt;Everton&lt;/td&gt;
&lt;td&gt;52&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Liverpool&lt;/td&gt;
&lt;td&gt;QPR&lt;/td&gt;
&lt;td&gt;71&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Man City&lt;/td&gt;
&lt;td&gt;Norwich&lt;/td&gt;
&lt;td&gt;77&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Newcastle&lt;/td&gt;
&lt;td&gt;Arsenal&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Southampton&lt;/td&gt;
&lt;td&gt;Stoke&lt;/td&gt;
&lt;td&gt;42&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Swansea&lt;/td&gt;
&lt;td&gt;Fulham&lt;/td&gt;
&lt;td&gt;46&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tottenham&lt;/td&gt;
&lt;td&gt;Sunderland&lt;/td&gt;
&lt;td&gt;67&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;West Brom&lt;/td&gt;
&lt;td&gt;Man United&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;57&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;West Ham&lt;/td&gt;
&lt;td&gt;Reading&lt;/td&gt;
&lt;td&gt;49&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wigan&lt;/td&gt;
&lt;td&gt;Aston Villa&lt;/td&gt;
&lt;td&gt;43&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;</content><category term="EI"></category><category term="EI"></category><category term="EPL"></category></entry><entry><title>EI Match Probabilities for the English Premier League</title><link href="2013/05/10/ei-match-probabilities-for-the-english-premier-league-4/" rel="alternate"></link><published>2013-05-10T19:30:00+00:00</published><updated>2013-05-10T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-05-10:2013/05/10/ei-match-probabilities-for-the-english-premier-league-4/</id><summary type="html">&lt;p&gt;Here are the latest match probabilities for the English Premier League calculated using the Eastwood Index (EI)...&lt;/p&gt;</summary><content type="html">&lt;p&gt;Here are the latest match probabilities for the English Premier League calculated using the Eastwood Index (EI).&lt;/p&gt;
&lt;p&gt;Somewhat surprisingly, Liverpool are only just favorites to beat Fulham with the odds so close that a draw would seem the likely outcome.&lt;/p&gt;
&lt;p&gt;Down at the bottom of the table Newcastle versus QPR and Norwich versus West Brom look likely to finish tied, while Sunderland are slight favorites against Southampton meaning Wigan desperately need to take points off Arsenal to stand any chance of avoiding relegation.&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Home Team&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Away Team&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Home (%)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Draw (%)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Away (%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aston Villa&lt;/td&gt;
&lt;td&gt;Chelsea&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;47&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stoke&lt;/td&gt;
&lt;td&gt;Tottenham&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;44&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Everton&lt;/td&gt;
&lt;td&gt;West Ham&lt;/td&gt;
&lt;td&gt;67&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fulham&lt;/td&gt;
&lt;td&gt;Liverpool&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;33&lt;/td&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Norwich&lt;/td&gt;
&lt;td&gt;West Brom&lt;/td&gt;
&lt;td&gt;37&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QPR&lt;/td&gt;
&lt;td&gt;Newcastle&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sunderland&lt;/td&gt;
&lt;td&gt;Southampton&lt;/td&gt;
&lt;td&gt;49&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Man United&lt;/td&gt;
&lt;td&gt;Swansea&lt;/td&gt;
&lt;td&gt;76&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arsenal&lt;/td&gt;
&lt;td&gt;Wigan&lt;/td&gt;
&lt;td&gt;71&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reading&lt;/td&gt;
&lt;td&gt;Man City&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;</content><category term="EI"></category><category term="EI"></category><category term="EPL"></category></entry><entry><title>MLS Player Salaries: 2013</title><link href="2013/05/10/mls-player-salaries-2013/" rel="alternate"></link><published>2013-05-10T19:30:00+00:00</published><updated>2013-05-10T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-05-10:2013/05/10/mls-player-salaries-2013/</id><summary type="html">&lt;p&gt;The latest Major League Soccer (MLS) salaries were released recently by the MLS Players’ Union so I thought I would post a quick summary of the data...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;The latest Major League Soccer (MLS) salaries were released recently by the MLS Players’ Union so I thought I would post a quick summary of the data.&lt;/p&gt;
&lt;h2 id="average-salary-by-team"&gt;Average Salary By Team&lt;/h2&gt;
&lt;p&gt;The first thing I was interested in was average salaries per team and whether there had been any changes compared with previous seasons (Figure 1).&lt;/p&gt;
&lt;p&gt;The trend over the past few years has been pretty constant, with LA Galaxy and New York Red Bulls having the highest outgoings on wages, which again continues for 2013.&lt;/p&gt;
&lt;p&gt;Toronto have typically followed in a distant third place but this season sees them overtaken by Seattle following the addition of Obafemi Martins to their roster.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;click the legend headers to show / hide each season’s data and hover the data points for more information&lt;/em&gt;&lt;/p&gt;
&lt;iframe src="http://pena.lt/y/highcharts/mls_average_by_club.html" frameborder="0" scrolling="no" width="100%" height="750"&gt;&lt;/iframe&gt;

&lt;h2 id="number-of-players"&gt;Number of Players&lt;/h2&gt;
&lt;p&gt;Next I looked at the number of players currently playing in each position. The results are pretty much the same as 2012, with a marginal gain in the number of forwards and defenders registered for this season (Figure 2).&lt;/p&gt;
&lt;p&gt;&lt;em&gt;click the legend headers to show / hide each season’s data and hover the data points for more information&lt;/em&gt;&lt;/p&gt;
&lt;iframe src="http://pena.lt/y/highcharts/mls_players_pos.html" frameborder="0" scrolling="no" width="100%" height="500"&gt;&lt;/iframe&gt;

&lt;h2 id="average-salary-by-position"&gt;Average Salary By Position&lt;/h2&gt;
&lt;p&gt;Next I looked at average salary by position and it is probably no surprise that forwards receive the most remuneration (Figure 3). In fact, the higher up the field you are, the more money you earn, with goalkeepers earning the least followed by defenders, midfielders, attacking midfielders and then forwards.&lt;/p&gt;
&lt;p&gt;The only players outside this trend are defensive midfielders, who earn even less than goalkeepers. In terms of salary this appears to be the least appreciated position by quite a large margin. If you are out to make money then you are much better off specializing as either a clear-cut defender or midfielder rather than something between the two. Or even better, learn to score goals…&lt;/p&gt;
&lt;p&gt;&lt;em&gt;click the legend headers to show / hide each season’s data and hover the data points for more information&lt;/em&gt;&lt;/p&gt;
&lt;iframe src="http://pena.lt/y/highcharts/mls_salary_pos.html" frameborder="0" scrolling="no" width="100%" height="500"&gt;&lt;/iframe&gt;

&lt;h2 id="the-big-earners"&gt;The Big Earners&lt;/h2&gt;
&lt;p&gt;Although the average salaries show that forwards earn noticeably more than any other position, the actual value is skewed by a few high-profile players earning big bucks. Table 1 shows the top ten earners in the MLS compared with the overall league average. Of the ten players, eight are forwards and two are midfielders. The highest paid defender is Toronto’s Darren O’Dea, ranked 18th overall and the highest paid goalkeeper is Portland’s Donovan Ricketts, ranked just 41st overall.&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Club&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Last Name&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;First Name&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Pos&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Base Salary&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Compensation&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NY&lt;/td&gt;
&lt;td&gt;Henry&lt;/td&gt;
&lt;td&gt;Thierry&lt;/td&gt;
&lt;td&gt;F&lt;/td&gt;
&lt;td&gt;$3,750,000&lt;/td&gt;
&lt;td&gt;$4,350,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LA&lt;/td&gt;
&lt;td&gt;Keane&lt;/td&gt;
&lt;td&gt;Robbie&lt;/td&gt;
&lt;td&gt;F&lt;/td&gt;
&lt;td&gt;$4,000,000&lt;/td&gt;
&lt;td&gt;$4,333,333&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NY&lt;/td&gt;
&lt;td&gt;Cahill&lt;/td&gt;
&lt;td&gt;Tim&lt;/td&gt;
&lt;td&gt;M&lt;/td&gt;
&lt;td&gt;$3,500,000&lt;/td&gt;
&lt;td&gt;$3,625,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LA&lt;/td&gt;
&lt;td&gt;Donovan&lt;/td&gt;
&lt;td&gt;Landon&lt;/td&gt;
&lt;td&gt;F&lt;/td&gt;
&lt;td&gt;$2,500,000&lt;/td&gt;
&lt;td&gt;$2,500,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MTL&lt;/td&gt;
&lt;td&gt;Di Vaio&lt;/td&gt;
&lt;td&gt;Marco&lt;/td&gt;
&lt;td&gt;F&lt;/td&gt;
&lt;td&gt;$1,000,008&lt;/td&gt;
&lt;td&gt;$1,937,508&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SEA&lt;/td&gt;
&lt;td&gt;Martins&lt;/td&gt;
&lt;td&gt;Obafemi&lt;/td&gt;
&lt;td&gt;F&lt;/td&gt;
&lt;td&gt;$1,600,000&lt;/td&gt;
&lt;td&gt;$1,725,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TOR&lt;/td&gt;
&lt;td&gt;Koevermans&lt;/td&gt;
&lt;td&gt;Danny&lt;/td&gt;
&lt;td&gt;F&lt;/td&gt;
&lt;td&gt;$1,250,000&lt;/td&gt;
&lt;td&gt;$1,663,323&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VAN&lt;/td&gt;
&lt;td&gt;Miller&lt;/td&gt;
&lt;td&gt;Kenny&lt;/td&gt;
&lt;td&gt;F&lt;/td&gt;
&lt;td&gt;$1,114,992&lt;/td&gt;
&lt;td&gt;$1,132,492&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SEA&lt;/td&gt;
&lt;td&gt;Montero&lt;/td&gt;
&lt;td&gt;Fredy&lt;/td&gt;
&lt;td&gt;F&lt;/td&gt;
&lt;td&gt;$700,000&lt;/td&gt;
&lt;td&gt;$856,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DAL&lt;/td&gt;
&lt;td&gt;Ferreira&lt;/td&gt;
&lt;td&gt;David&lt;/td&gt;
&lt;td&gt;M-F&lt;/td&gt;
&lt;td&gt;$625,000&lt;/td&gt;
&lt;td&gt;$730,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;League Average&lt;/td&gt;
&lt;td&gt;$141,903&lt;/td&gt;
&lt;td&gt;$159,849&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Since these star players are skewing the averages, we can analyse the &lt;a href="http://en.wikipedia.org/wiki/Median"&gt;median&lt;/a&gt; salary instead using box and whiskers plots (Figure 4). These show the distribution of the different salaries for each position where the thick line across the center of the box is the median salary, the top and bottom of the box are the 75th and 25th &lt;a href="http://en.wikipedia.org/wiki/Percentile"&gt;percentiles&lt;/a&gt; and the whiskers represent 1.5x the &lt;a href="http://en.wikipedia.org/wiki/Interquartile_range"&gt;interquartile range&lt;/a&gt;. Outliers outside of this range are then plotted as dots.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/Rplot01.png"&gt;&lt;/p&gt;
&lt;p&gt;Looking at the median salaries there is actually very little difference between the outfield players. The average MLS player’s salary is also clearly nothing like that of the league’s star players. In fact, if we remove the top twenty earners then the overall league average falls from $159,849 to $113,516, with a median of $83,000 and a &lt;a href="http://en.wikipedia.org/wiki/Mode_(statistics)"&gt;mode&lt;/a&gt; of $46,500, which is the league minimum for first teamers (roster positions 1-24).&lt;/p&gt;
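&lt;p&gt;A small illustration of why a handful of star salaries pulls the mean away from the median: the list below is a made-up squad (not the real MLS data) containing one very high earner.&lt;/p&gt;

```python
from statistics import mean, median, mode

# Hypothetical salaries in dollars: six typical players and one star
salaries = [46_500, 46_500, 65_000, 83_000, 120_000, 200_000, 4_350_000]

print(f"Mean:   {mean(salaries):,.0f}")
print(f"Median: {median(salaries):,.0f}")
print(f"Mode:   {mode(salaries):,.0f}")

# Dropping the star player moves the mean a lot but the median only a little
without_star = salaries[:-1]
print(f"Mean without star:   {mean(without_star):,.0f}")
print(f"Median without star: {median(without_star):,.0f}")
```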
&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;This is only a quick overview of the data and there is still a lot more to explore so feel free to get in touch if there is anything in particular you want to have a look at.&lt;/p&gt;</content><category term="Misc"></category><category term="Chance"></category><category term="MLS"></category></entry><entry><title>The Eastwood Index, MLS and Parity</title><link href="2013/05/07/the-eastwood-index-mls-and-luck/" rel="alternate"></link><published>2013-05-07T19:30:00+00:00</published><updated>2013-05-07T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-05-07:2013/05/07/the-eastwood-index-mls-and-luck/</id><summary type="html">&lt;p&gt;I showed in my last post how Major League Soccer (MLS) is a much more closely matched league than the English Premier League (EPL), with the wage cap and draft system increasing the parity between teams...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;I showed in my &lt;a href="http://pena.lt/y/2013/05/02/how-much-does-luck-affect-mls/"&gt;last post&lt;/a&gt; how Major League Soccer (MLS) is a much more closely matched league than the English Premier League (EPL), with the wage cap and draft system increasing the parity between teams.&lt;/p&gt;
&lt;h2 id="the-eastwood-index"&gt;The Eastwood Index&lt;/h2&gt;
&lt;p&gt;This high level of parity can also be seen using the &lt;a href="http://pena.lt/y/2013/02/21/rating-teams-and-predicting-football-matches-using-the-ei-index/"&gt;Eastwood Index&lt;/a&gt; (EI), a rating system designed to calculate odds of match outcomes when different teams play each other.&lt;/p&gt;
&lt;p&gt;The Eastwood Index rates teams so that the average rating is 2000 and the higher the rating the better a team is compared with the rest of the league.&lt;/p&gt;
&lt;p&gt;EI ratings increase when teams win matches or draw against superior opposition and decrease when teams lose matches or draw against weaker opposition. The size of the gain or loss in ratings is linked to the quality of the opposition so that beating a superior team is worth more than winning against a lower ranked team.&lt;/p&gt;
&lt;p&gt;The change in EI rating is also weighted by the goal difference in the match, so the greater the difference in goals scored or conceded, the greater the change in ratings. Home advantage is also included in the calculations, so teams are expected to perform better at home than away.&lt;/p&gt;
&lt;h2 id="major-league-soccer-ei-ratings"&gt;Major League Soccer EI Ratings&lt;/h2&gt;
&lt;p&gt;Currently, the highest rated team in MLS is LA Galaxy, with an EI of 2506 (Table 2), while the lowest is Toronto FC, with an EI of just 1303 (Table 1). Outside of this, the majority of teams are fairly evenly matched in MLS and are rated around 1880 – 2300, demonstrating the parity in the league.&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Position&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Club&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;EI Rating&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;New York Red Bulls&lt;/td&gt;
&lt;td&gt;2225&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Sporting Kansas City&lt;/td&gt;
&lt;td&gt;2374&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Houston Dynamo&lt;/td&gt;
&lt;td&gt;2210&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Montreal Impact&lt;/td&gt;
&lt;td&gt;2052&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Columbus Crew&lt;/td&gt;
&lt;td&gt;2082&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Philadelphia Union&lt;/td&gt;
&lt;td&gt;1809&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;New England Revolution&lt;/td&gt;
&lt;td&gt;1610&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Chicago Fire&lt;/td&gt;
&lt;td&gt;2063&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Toronto FC&lt;/td&gt;
&lt;td&gt;1303&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Table 1: MLS Eastern Conference EI Ratings&lt;/strong&gt;&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Position&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Club&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;EI Rating&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;FC Dallas&lt;/td&gt;
&lt;td&gt;2150&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;LA Galaxy&lt;/td&gt;
&lt;td&gt;2506&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Real Salt Lake&lt;/td&gt;
&lt;td&gt;2271&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Portland Timbers&lt;/td&gt;
&lt;td&gt;1804&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Colorado Rapids&lt;/td&gt;
&lt;td&gt;1871&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Chivas USA&lt;/td&gt;
&lt;td&gt;1433&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;San Jose Earthquakes&lt;/td&gt;
&lt;td&gt;2267&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Vancouver Whitecaps&lt;/td&gt;
&lt;td&gt;1690&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Seattle Sounders FC&lt;/td&gt;
&lt;td&gt;2392&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Table 2: MLS Western Conference EI Ratings&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;It is still a bit early in the season to draw too many conclusions but if we combine the two MLS conferences together then Columbus currently come out as mid-table, with an EI rating of 2082. This is just 424 lower than the top team (LA Galaxy) and 779 higher than the bottom of the league (Toronto FC), and close to the theoretical league average EI of 2000.&lt;/p&gt;
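&lt;p&gt;The combined picture can be checked directly from the ratings in Tables 1 and 2. The sketch below simply merges the two conferences and reports the league mean along with the gaps from Columbus to the top and bottom teams; the variable names are illustrative.&lt;/p&gt;

```python
from statistics import mean

# EI ratings copied from Table 1 (Eastern) and Table 2 (Western)
east = {
    "New York Red Bulls": 2225, "Sporting Kansas City": 2374, "Houston Dynamo": 2210,
    "Montreal Impact": 2052, "Columbus Crew": 2082, "Philadelphia Union": 1809,
    "New England Revolution": 1610, "Chicago Fire": 2063, "Toronto FC": 1303,
}
west = {
    "FC Dallas": 2150, "LA Galaxy": 2506, "Real Salt Lake": 2271,
    "Portland Timbers": 1804, "Colorado Rapids": 1871, "Chivas USA": 1433,
    "San Jose Earthquakes": 2267, "Vancouver Whitecaps": 1690, "Seattle Sounders FC": 2392,
}
ratings = {**east, **west}

top = max(ratings, key=ratings.get)
bottom = min(ratings, key=ratings.get)
columbus = ratings["Columbus Crew"]

print(f"League mean EI: {mean(ratings.values()):.0f}")
print(f"Gap from Columbus up to {top}: {ratings[top] - columbus}")
print(f"Gap from Columbus down to {bottom}: {columbus - ratings[bottom]}")
```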
&lt;h2 id="mls-compared-with-epl"&gt;MLS Compared with EPL&lt;/h2&gt;
&lt;p&gt;Compare this with the EPL (Table 3) and you can see an immediate difference in the level of parity. Taking the average of West Ham and Stoke to be the middle of the table, a mid-placed EPL team’s EI is below the theoretical average at 1634, just 264 better than QPR at the bottom of the table and a gigantic 1431 away from Manchester United. The top of the EPL has been very much a league-within-a-league for a while now, with average teams vastly more likely to be relegated than they are to win anything or even reach the European qualification spots.&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Position&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Club&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;EI Rating&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Manchester United&lt;/td&gt;
&lt;td&gt;3064&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Manchester City&lt;/td&gt;
&lt;td&gt;2909&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Chelsea&lt;/td&gt;
&lt;td&gt;2598&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Arsenal&lt;/td&gt;
&lt;td&gt;2627&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Tottenham Hotspur&lt;/td&gt;
&lt;td&gt;2514&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Everton&lt;/td&gt;
&lt;td&gt;2351&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Liverpool&lt;/td&gt;
&lt;td&gt;2291&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;West Bromwich Albion&lt;/td&gt;
&lt;td&gt;1883&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Swansea City&lt;/td&gt;
&lt;td&gt;1797&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;West Ham United&lt;/td&gt;
&lt;td&gt;1520&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;Stoke City&lt;/td&gt;
&lt;td&gt;1747&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;Fulham&lt;/td&gt;
&lt;td&gt;1806&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;Aston Villa&lt;/td&gt;
&lt;td&gt;1704&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;Southampton&lt;/td&gt;
&lt;td&gt;1611&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;Sunderland&lt;/td&gt;
&lt;td&gt;1741&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;Norwich City&lt;/td&gt;
&lt;td&gt;1607&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;Newcastle United&lt;/td&gt;
&lt;td&gt;1814&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Wigan Athletic&lt;/td&gt;
&lt;td&gt;1650&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;Reading&lt;/td&gt;
&lt;td&gt;1397&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;Queens Park Rangers&lt;/td&gt;
&lt;td&gt;1369&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Table 3: EPL EI Ratings&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="parity"&gt;Parity&lt;/h2&gt;
&lt;p&gt;Compared with the EPL, MLS is a very evenly matched league where the margins between the top and bottom of the conferences are small, making it a really exciting league to follow as virtually any team is in with a chance of reaching the playoffs at the start of the season.&lt;/p&gt;</content><category term="Chance"></category><category term="Chance"></category><category term="MLS"></category></entry><entry><title>EI Match Probabilities for the English Premier League</title><link href="2013/05/03/ei-match-probabilities-for-the-english-premier-league-3/" rel="alternate"></link><published>2013-05-03T19:30:00+00:00</published><updated>2013-05-03T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-05-03:2013/05/03/ei-match-probabilities-for-the-english-premier-league-3/</id><summary type="html">&lt;p&gt;Here are the latest match probabilities for the English Premier League calculated using the Eastwood Index (EI)...&lt;/p&gt;</summary><content type="html">&lt;p&gt;Here are the latest match probabilities for the English Premier League calculated using the Eastwood Index (EI).&lt;/p&gt;
&lt;p&gt;Please note that I have not included next week’s mid-week matches yet as the odds for those will change depending on how this weekend’s matches finish. I’ll try and add those on Tuesday once I have all the data available.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Edit – Table 1 is now updated with the odds for this week’s mid-week matches.&lt;/em&gt;&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Home Team&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Away Team&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Home (%)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Draw (%)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Away (%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fulham&lt;/td&gt;
&lt;td&gt;Reading&lt;/td&gt;
&lt;td&gt;59&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Norwich&lt;/td&gt;
&lt;td&gt;Aston Villa&lt;/td&gt;
&lt;td&gt;44&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Swansea&lt;/td&gt;
&lt;td&gt;Man City&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;54&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tottenham&lt;/td&gt;
&lt;td&gt;Southampton&lt;/td&gt;
&lt;td&gt;68&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;West Brom&lt;/td&gt;
&lt;td&gt;Wigan&lt;/td&gt;
&lt;td&gt;54&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;West Ham&lt;/td&gt;
&lt;td&gt;Newcastle&lt;/td&gt;
&lt;td&gt;37&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QPR&lt;/td&gt;
&lt;td&gt;Arsenal&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;57&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Liverpool&lt;/td&gt;
&lt;td&gt;Everton&lt;/td&gt;
&lt;td&gt;44&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Man United&lt;/td&gt;
&lt;td&gt;Chelsea&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sunderland&lt;/td&gt;
&lt;td&gt;Stoke&lt;/td&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Man City&lt;/td&gt;
&lt;td&gt;West Brom&lt;/td&gt;
&lt;td&gt;72&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wigan&lt;/td&gt;
&lt;td&gt;Swansea&lt;/td&gt;
&lt;td&gt;41&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chelsea&lt;/td&gt;
&lt;td&gt;Tottenham&lt;/td&gt;
&lt;td&gt;47&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;</content><category term="EI"></category><category term="EI"></category><category term="EPL"></category></entry><entry><title>How Much Does Luck Affect MLS?</title><link href="2013/05/02/how-much-does-luck-affect-mls/" rel="alternate"></link><published>2013-05-02T19:30:00+00:00</published><updated>2013-05-02T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-05-02:2013/05/02/how-much-does-luck-affect-mls/</id><summary type="html">&lt;p&gt;Following my recent article for Betting Expert quantifying how large a role luck plays in the English Premier League (EPL) I thought it would be interesting to look at Major League Soccer (MLS) too...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Following my recent article for &lt;a href="http://www.bettingexpert.com/blog/football-luck"&gt;Betting Expert&lt;/a&gt; quantifying how large a role luck plays in the English Premier League (EPL) I thought it would be interesting to look at Major League Soccer (MLS) too.&lt;/p&gt;
&lt;h2 id="mls-structure"&gt;MLS Structure&lt;/h2&gt;
&lt;p&gt;MLS is structured differently to the EPL as it has followed other North American sports in implementing wage caps and player drafts. Unlike the current salary free-for-all in the EPL, MLS clubs are currently limited to spending a maximum of $2.95 million in wages over their first 20 roster spots, with up to three additional designated players paid (partially) outside of this salary cap.&lt;/p&gt;
&lt;p&gt;MLS also has a draft system that takes place each January during which teams can sign players graduating from college or otherwise signed by the league. The draft is split into three rounds and is designed to give priority to the league’s weaker teams allowing them first choice of players ahead of the more successful teams.&lt;/p&gt;
&lt;p&gt;MLS also has a shorter season than the EPL, with teams playing just 34 matches compared with the EPL’s 38. This is important because the more matches that are played, the more opportunity talent has to overcome luck.&lt;/p&gt;
&lt;p&gt;Overall, this all works towards increasing parity in MLS and making it a more evenly balanced league, which in turn should enhance the role luck plays.&lt;/p&gt;
&lt;h2 id="results"&gt;Results&lt;/h2&gt;
&lt;p&gt;Using &lt;a href="http://www.bettingexpert.com/blog/football-luck"&gt;my adaptation&lt;/a&gt; of &lt;a href="http://www.tangotiger.net/"&gt;Tom ‘Tango’ Tiger’s&lt;/a&gt; &lt;a href="http://blog.philbirnbaum.com/2006/08/on-correlation-r-and-r-squared.html"&gt;baseball equation&lt;/a&gt; I calculated the average win rate in MLS going back to 2004 and the variance of the win rate. I then calculated the variance expected due to luck and subtracted one from the other to get the amount of variance attributed to talent.&lt;/p&gt;
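&lt;p&gt;As a rough sketch of that subtraction, the snippet below assumes each match is an independent win or no-win trial, so the variance expected from luck alone is the binomial &lt;code&gt;p * (1 - p) / n&lt;/code&gt;. The function name &lt;code&gt;luck_share&lt;/code&gt; and the win rates are made up for illustration, and the adaptation for football described in the linked article is not reproduced here.&lt;/p&gt;

```python
from statistics import mean, pvariance


def luck_share(win_rates, matches_per_team):
    """Split the spread in win rates into luck and talent shares.

    Assumption: every match is an independent win / no-win trial, so the
    variance expected from luck alone is the binomial p * (1 - p) / n.
    """
    p = mean(win_rates)                         # league-average win rate
    observed_var = pvariance(win_rates)         # spread actually seen
    luck_var = p * (1 - p) / matches_per_team   # spread expected by chance
    luck = min(luck_var / observed_var, 1.0)    # cap at 100% for tiny samples
    return luck, 1.0 - luck


# Hypothetical win rates for six teams over a 34-match season
rates = [0.20, 0.30, 0.40, 0.40, 0.50, 0.60]
luck, talent = luck_share(rates, matches_per_team=34)
print(f"luck: {luck:.0%}, talent: {talent:.0%}")
```

&lt;p&gt;The talent share is simply whatever is left once the luck variance has been taken out of the observed variance, which is why a shorter season (a smaller &lt;code&gt;matches_per_team&lt;/code&gt;) pushes the luck share up.&lt;/p&gt;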
&lt;p&gt;Luck accounts for around 35% of a team’s win rate in the EPL and I was expecting MLS to be higher, but it initially came out at a staggering 82% for MLS. Instinctively this seems too high and I suspect it is inaccurate due to the changes in MLS’s structure over the years. For example, back in 2004 there were only ten teams and one conference while there are currently 19 teams and two conferences. There have also been changes to the level of the salary cap and the number of designated players allowed over this time period too.&lt;/p&gt;
&lt;p&gt;So I went back and reprocessed the results using just the 2010–2012 data. Although this reduces the sample size considerably it leaves us with data more representative of the current state of MLS. And the results this time? Luck accounted for around 57% of a team’s win percentage compared with just 43% for talent.&lt;/p&gt;
&lt;p&gt;So compared with the EPL, the structure of MLS does appear to increase parity and enhance the influence luck has in deciding the league champions. In fact, being lucky is probably the more important of the two, although luck on its own is not enough – you need to be a talented team with luck on its side to win the MLS Cup.&lt;/p&gt;</content><category term="Chance"></category><category term="Chance"></category><category term="MLS"></category></entry><entry><title>How Much Does Luck Affect Football?</title><link href="2013/04/30/how-much-does-luck-affect-football/" rel="alternate"></link><published>2013-04-30T19:30:00+00:00</published><updated>2013-04-30T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-04-30:2013/04/30/how-much-does-luck-affect-football/</id><summary type="html">&lt;p&gt;I’ve written a new article for Betting Expert quantifying how much luck affects football...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;I’ve written a new article for Betting Expert quantifying how much luck affects football. Take a look &lt;a href="http://www.bettingexpert.com/blog/football-luck"&gt;here&lt;/a&gt; as it is probably more than you are expecting!&lt;/p&gt;</content><category term="Chance"></category><category term="Chance"></category></entry><entry><title>What Is A Meaningful Sample Size?</title><link href="2013/04/28/what-is-a-meaningful-sample-size/" rel="alternate"></link><published>2013-04-28T19:30:00+00:00</published><updated>2013-04-28T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-04-28:2013/04/28/what-is-a-meaningful-sample-size/</id><summary type="html">&lt;p&gt;I had an article published at Betting Expert last week looking at how to determine statistically how much data you need to make accurate predictions...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;I had an article published at Betting Expert last week looking at how to determine statistically how much data you need to make accurate predictions.&lt;/p&gt;
&lt;p&gt;Find out more by reading the rest of this article &lt;a href="http://www.bettingexpert.com/blog/how-much-data"&gt;here&lt;/a&gt;.&lt;/p&gt;</content><category term="Misc"></category></entry><entry><title>EI Match Probabilities for the English Premier League</title><link href="2013/04/26/ei-match-probabilities-for-the-english-premier-league-2/" rel="alternate"></link><published>2013-04-26T19:30:00+00:00</published><updated>2013-04-26T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-04-26:2013/04/26/ei-match-probabilities-for-the-english-premier-league-2/</id><summary type="html">&lt;p&gt;It’s been a busy day but I’ve finally got the probabilities for this weekend’s matches completed....&lt;/p&gt;</summary><content type="html">&lt;p&gt;It’s Friday again so here are this weekend’s match probabilities for the English Premier League.&lt;/p&gt;
&lt;p&gt;I was a little surprised to see Manchester United come out as favourites against Arsenal, even though they are away from home, but the odds are so close that it looks like a potential draw. However, it all depends on Sir Alex Ferguson’s squad selection: with the league won, will he rest the bigger stars and let some of the second-string players reach enough appearances to be eligible for a winners’ medal? Anders Lindegaard, for example, still needs to play another two matches this season to claim his medal.&lt;/p&gt;
&lt;p&gt;Other possible draws include Southampton Vs West Brom and Newcastle Vs Liverpool, while you’d hope Reading Vs QPR will not be a draw as a single point is useless for either team.&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Home Team&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Away Team&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Home (%)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Draw (%)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Away (%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Man City&lt;/td&gt;
&lt;td&gt;West Ham&lt;/td&gt;
&lt;td&gt;78&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Everton&lt;/td&gt;
&lt;td&gt;Fulham&lt;/td&gt;
&lt;td&gt;58&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Southampton&lt;/td&gt;
&lt;td&gt;West Brom&lt;/td&gt;
&lt;td&gt;39&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stoke&lt;/td&gt;
&lt;td&gt;Norwich&lt;/td&gt;
&lt;td&gt;47&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wigan&lt;/td&gt;
&lt;td&gt;Tottenham&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Newcastle&lt;/td&gt;
&lt;td&gt;Liverpool&lt;/td&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;33&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reading&lt;/td&gt;
&lt;td&gt;QPR&lt;/td&gt;
&lt;td&gt;44&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chelsea&lt;/td&gt;
&lt;td&gt;Swansea&lt;/td&gt;
&lt;td&gt;65&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arsenal&lt;/td&gt;
&lt;td&gt;Man United&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;37&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aston Villa&lt;/td&gt;
&lt;td&gt;Sunderland&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;</content><category term="EI"></category><category term="EI"></category><category term="EPL"></category></entry><entry><title>EI Match Probabilities for the English Premier League</title><link href="2013/04/19/ei-match-probabilities-for-the-english-premier-league/" rel="alternate"></link><published>2013-04-19T19:30:00+00:00</published><updated>2013-04-19T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-04-19:2013/04/19/ei-match-probabilities-for-the-english-premier-league/</id><summary type="html">&lt;p&gt;It’s been a busy day but I’ve finally got the probabilities for this weekend’s matches completed....&lt;/p&gt;</summary><content type="html">&lt;p&gt;It’s been a busy day but I’ve finally got the probabilities for this weekend’s matches completed.&lt;/p&gt;
&lt;p&gt;There are some pretty close games, with QPR Vs Stoke, Tottenham Vs Man City and Liverpool Vs Chelsea all looking like potential draws. Plus you could maybe throw Sunderland Vs Everton and even West Ham Vs Wigan into that group too.&lt;/p&gt;
&lt;p&gt;The only clear favourites are Manchester United and Norwich so it’s going to be a tricky week to call.&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Home Team&lt;/td&gt;
&lt;td&gt;Away Team&lt;/td&gt;
&lt;td&gt;Home (%)&lt;/td&gt;
&lt;td&gt;Draw (%)&lt;/td&gt;
&lt;td&gt;Away (%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fulham&lt;/td&gt;
&lt;td&gt;Arsenal&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;42&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Norwich&lt;/td&gt;
&lt;td&gt;Reading&lt;/td&gt;
&lt;td&gt;53&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QPR&lt;/td&gt;
&lt;td&gt;Stoke&lt;/td&gt;
&lt;td&gt;37&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sunderland&lt;/td&gt;
&lt;td&gt;Everton&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;33&lt;/td&gt;
&lt;td&gt;39&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Swansea&lt;/td&gt;
&lt;td&gt;Southampton&lt;/td&gt;
&lt;td&gt;49&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;West Brom&lt;/td&gt;
&lt;td&gt;Newcastle&lt;/td&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;West Ham&lt;/td&gt;
&lt;td&gt;Wigan&lt;/td&gt;
&lt;td&gt;41&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tottenham&lt;/td&gt;
&lt;td&gt;Man City&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;33&lt;/td&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Liverpool&lt;/td&gt;
&lt;td&gt;Chelsea&lt;/td&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Man United&lt;/td&gt;
&lt;td&gt;Aston Villa&lt;/td&gt;
&lt;td&gt;79&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;</content><category term="EI"></category><category term="EI"></category><category term="EPL"></category></entry><entry><title>EI Match Predictions for the English Premier League</title><link href="2013/04/12/ei-match-predictions-for-the-english-premier-league-6/" rel="alternate"></link><published>2013-04-12T19:30:00+00:00</published><updated>2013-04-12T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-04-12:2013/04/12/ei-match-predictions-for-the-english-premier-league-6/</id><summary type="html">&lt;p&gt;Here we go with this week’s predictions from the Eastwood Index (EI)!...&lt;/p&gt;</summary><content type="html">&lt;p&gt;Here we go with this week’s predictions from the Eastwood Index (EI)!&lt;/p&gt;
&lt;p&gt;The EI doesn’t hold out much chance for Wigan or West Ham getting three points off the two Manchester clubs this week, with the lowest odds I think the EI has ever produced. Interestingly, both of Fulham’s matches look like possible draws.&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Home Team&lt;/td&gt;
&lt;td&gt;Away Team&lt;/td&gt;
&lt;td&gt;Home (%)&lt;/td&gt;
&lt;td&gt;Draw (%)&lt;/td&gt;
&lt;td&gt;Away (%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arsenal&lt;/td&gt;
&lt;td&gt;Norwich&lt;/td&gt;
&lt;td&gt;70&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aston Villa&lt;/td&gt;
&lt;td&gt;Fulham&lt;/td&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;33&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Everton&lt;/td&gt;
&lt;td&gt;QPR&lt;/td&gt;
&lt;td&gt;70&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reading&lt;/td&gt;
&lt;td&gt;Liverpool&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;46&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Southampton&lt;/td&gt;
&lt;td&gt;West Ham&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Newcastle&lt;/td&gt;
&lt;td&gt;Sunderland&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stoke&lt;/td&gt;
&lt;td&gt;Man United&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;61&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arsenal&lt;/td&gt;
&lt;td&gt;Everton&lt;/td&gt;
&lt;td&gt;51&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Man City&lt;/td&gt;
&lt;td&gt;Wigan&lt;/td&gt;
&lt;td&gt;77&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;West Ham&lt;/td&gt;
&lt;td&gt;Man United&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;66&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fulham&lt;/td&gt;
&lt;td&gt;Chelsea&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;</content><category term="EI"></category><category term="EI"></category><category term="EPL"></category></entry><entry><title>EI Match Predictions for the English Premier League</title><link href="2013/04/05/ei-match-predictions-for-the-english-premier-league-5/" rel="alternate"></link><published>2013-04-05T19:30:00+00:00</published><updated>2013-04-05T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-04-05:2013/04/05/ei-match-predictions-for-the-english-premier-league-5/</id><summary type="html">&lt;p&gt;A couple of weeks back I demonstrated how the EI is more accurate than the bookies based on rank probability scores but a few people have asked if I can do something a bit simpler so...&lt;/p&gt;</summary><content type="html">&lt;p&gt;A couple of weeks back I demonstrated how the EI is more accurate than the bookies based on rank probability scores but a few people have asked if I can do something a bit simpler so Figure 1 shows how often the EI picked the winner as being the favourite compared with aggregated bookmaker’s odds. It’s pretty close but the EI seems to have a small but reasonably constant margin over the bookmaker so far this season.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/EI_BM_2012_2013.png"&gt;
&lt;strong&gt;Figure 1: EI Versus Bookmakers&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Last week turned out to be a pretty good week with the EI managing to correctly predict the winner in eight out of the ten matches played. I’d made a few minor tweaks before posting last week’s odds to try and enhance the way draws and away wins are calculated so hopefully the EI will be able to maintain its edge over the bookmakers.&lt;/p&gt;
&lt;p&gt;Anyway, here are this week’s odds:&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Home Team&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Away Team&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Home (%)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Draw (%)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Away (%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reading&lt;/td&gt;
&lt;td&gt;Southampton&lt;/td&gt;
&lt;td&gt;38&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Norwich&lt;/td&gt;
&lt;td&gt;Swansea&lt;/td&gt;
&lt;td&gt;41&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stoke&lt;/td&gt;
&lt;td&gt;Aston Villa&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;West Brom&lt;/td&gt;
&lt;td&gt;Arsenal&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;42&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Liverpool&lt;/td&gt;
&lt;td&gt;West Ham&lt;/td&gt;
&lt;td&gt;66&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tottenham&lt;/td&gt;
&lt;td&gt;Everton&lt;/td&gt;
&lt;td&gt;49&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chelsea&lt;/td&gt;
&lt;td&gt;Sunderland&lt;/td&gt;
&lt;td&gt;66&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Newcastle&lt;/td&gt;
&lt;td&gt;Fulham&lt;/td&gt;
&lt;td&gt;44&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QPR&lt;/td&gt;
&lt;td&gt;Wigan&lt;/td&gt;
&lt;td&gt;39&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Man United&lt;/td&gt;
&lt;td&gt;Man City&lt;/td&gt;
&lt;td&gt;51&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;</content><category term="EI"></category><category term="EI"></category><category term="EPL"></category></entry><entry><title>Understanding Total Shot Ratio in Football</title><link href="2013/04/02/understanding-total-shot-ratio-in-football/" rel="alternate"></link><published>2013-04-02T19:30:00+00:00</published><updated>2013-04-02T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-04-02:2013/04/02/understanding-total-shot-ratio-in-football/</id><summary type="html">&lt;p&gt;The use of Total Shot Ratio (or TSR) seems to have slowly been gaining ground so I thought it would be worth analyzing the statistic in more detail to see what it can and cannot do.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;The use of Total Shot Ratio (or TSR) seems to have slowly been gaining ground so I thought it would be worth analyzing the statistic in more detail to see what it can and cannot do.&lt;/p&gt;
&lt;h2 id="what-is-total-shot-ratio"&gt;What is Total Shot Ratio?&lt;/h2&gt;
&lt;p&gt;Put simply, Total Shot Ratio is the proportion of shots taken by one team compared with another. It can be calculated by dividing the number of shots taken by a team by the total number of shots taken by both teams (Figure 1).&lt;/p&gt;
&lt;p&gt;$TSR = \frac{ShotsFor}{ShotsFor + ShotsAgainst}$&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 1: Total Shot Ratio&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;It is often used as a surrogate for dominance as the presumption is that the team taking the majority of the shots will be controlling the match and possibly limiting the opposition’s ability to shoot at goal.&lt;/p&gt;
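&lt;p&gt;For a concrete feel of the calculation, here is a minimal sketch. The function name &lt;code&gt;total_shot_ratio&lt;/code&gt;, the shot counts and the handling of a match with no shots at all are assumptions made for illustration.&lt;/p&gt;

```python
def total_shot_ratio(shots_for, shots_against):
    """Share of all shots in a match that were taken by one team."""
    total = shots_for + shots_against
    if total == 0:
        return 0.5  # assumption: treat a match with no shots as an even split
    return shots_for / total


# Hypothetical match: the home team takes 15 shots, the away team takes 5
home_shots, away_shots = 15, 5
home_tsr = total_shot_ratio(home_shots, away_shots)
away_tsr = total_shot_ratio(away_shots, home_shots)
print(home_tsr, away_tsr)  # prints 0.75 0.25
```

&lt;p&gt;The two ratios for a match always sum to one, so one team’s gain is exactly the other team’s loss.&lt;/p&gt;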
&lt;h2 id="total-shot-ratio-data"&gt;Total Shot Ratio Data&lt;/h2&gt;
&lt;p&gt;Using data taken from the football-data.co.uk website I calculated the Total Shot Ratios for all matches from the English Premier League going back to the 2001-2002 season, giving a total of 8360 data points, which are normally distributed (Figure 2).&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/130402-TSR-Distribution.png"&gt;
&lt;strong&gt;Figure 2: Distribution of Total Shot Ratios&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The average Total Shot Ratio is always 0.5, because for every value above 0.5 you always have an equivalent value below it for the opposition. For example, if the home team’s Total Shot Ratio is 0.75 then the away team’s ratio must be 0.25.&lt;/p&gt;
&lt;p&gt;$(0.75 + 0.25) / 2 = 0.5$&lt;/p&gt;
&lt;p&gt;The standard deviation, which is a measure of the dispersion of the data around the average value, was 0.166.&lt;/p&gt;
&lt;h2 id="total-shot-ratio-correlation-with-goals-scored"&gt;Total Shot Ratio: Correlation With Goals Scored&lt;/h2&gt;
&lt;p&gt;Since Total Shot Ratios are being used to show dominance in a match it makes sense to assess the correlation with the number of goals scored. The higher a team’s Total Shot Ratio, the more shots it is having compared with the opposition, so the expectation would be that it scores more goals. However, this does not seem to be the case as the relationship between the two is extremely weak (Figure 3; $r^2 = 0.079$).&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/130402-TSR-Goals.png"&gt;
&lt;strong&gt;Figure 3: Correlation Between Total Shot Ratio and Goals Scored&lt;/strong&gt;&lt;/p&gt;
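&lt;p&gt;For readers who want to run this kind of check on their own data, the sketch below computes the squared Pearson correlation in plain Python. The helper name &lt;code&gt;r_squared&lt;/code&gt; and the six sample matches are hypothetical and are not taken from the dataset used in this article.&lt;/p&gt;

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov ** 2 / (var_x * var_y)


# Hypothetical per-match Total Shot Ratios and goals scored
tsr = [0.35, 0.48, 0.52, 0.61, 0.70, 0.44]
goals = [1, 0, 2, 1, 3, 2]
print(round(r_squared(tsr, goals), 3))
```

&lt;p&gt;A value near zero means the ratio explains almost none of the match-to-match variation in goals, while a value near one would mean it explains nearly all of it.&lt;/p&gt;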
&lt;h2 id="total-shot-ratio-correlation-with-goal-difference"&gt;Total Shot Ratio: Correlation With Goal Difference&lt;/h2&gt;
&lt;p&gt;So how about the relationship between Total Shot Ratio and goal difference instead? Since teams with higher Total Shot Ratios are thought to be dominating matches, perhaps they are more likely to have a higher goal difference in the match as they may also be less likely to concede goals? Again though the correlation is weak (Figure 4; $r^2 = 0.11$).&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/130402-TSR-Goal-Diff.png"&gt;
&lt;strong&gt;Figure 4: Correlation Between Total Shot Ratio and Goal Difference&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="total-shot-ratio-correlation-with-match-outcomes"&gt;Total Shot Ratio: Correlation With Match Outcomes&lt;/h2&gt;
&lt;p&gt;Additionally, the relationship between Total Shot Ratio and match outcome is also poor ($r^2 = 0.066$), suggesting that Total Shot Ratio also has very little influence on the likelihood of a team winning a particular match. Just because you are taking a greater proportion of the shots does not mean you are any more likely to win.&lt;/p&gt;
&lt;h2 id="total-shot-ratio-correlation-with-points-per-season"&gt;Total Shot Ratio: Correlation With Points Per Season&lt;/h2&gt;
&lt;p&gt;Although the match-by-match correlations above are weak there is the suggestion of a trend, so it may be that Total Shot Ratio is heavily luck driven in the short term and that we need more matches before we can see the overall effects of a higher ratio. For example, looking at the correlation between Total Shot Ratio and points over an entire season shows a pretty decent relationship between the two (Figure 5). This suggests that, over the long term, possessing a higher Total Shot Ratio is in fact associated with fewer matches being lost per season.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/130402-TSR-Points.png"&gt;
&lt;strong&gt;Figure 5: Correlation Between Total Shot Ratio and Points&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="total-shot-ratio-how-much-data-is-enough"&gt;Total Shot Ratio: How Much Data is Enough?&lt;/h2&gt;
&lt;p&gt;So if Total Shot Ratio is only becoming meaningful over longer periods of time then how much data do we actually need before it becomes a useful metric? To look at this I calculated the overall Total Shot Ratio per season by team and then randomized the order of each match that season. I then looked at how the deviation changed over the course of a season compared with the overall ratio, e.g. after five matches, ten matches etc. (Figure 6).&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/130402-TSR-Deviation.png"&gt;
&lt;strong&gt;Figure 6: Deviation in Total Shot Ratio by Sample Size&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As more data is used to calculate the Total Shot Ratio it moves closer towards its true value and the deviation decreases as the effect of any outlier matches becomes less influential. With fewer matches being used to calculate the Total Shot Ratio there is more dispersion and variability in the calculated value. Interestingly, there is still a reduction in the deviation moving from 30 matches to 38 matches, suggesting that we may need at least a full season’s worth of data to get an accurate measure of a team’s Total Shot Ratio.&lt;/p&gt;
&lt;h2 id="total-shot-ratio-calculated-sample-size"&gt;Total Shot Ratio: Calculated Sample Size&lt;/h2&gt;
&lt;p&gt;Another option to find out how much data we need is to calculate the sample size required to identify specific differences in Total Shot Ratio. There are a number of different methods for this but the commonly used t-test sample size estimation suggests that to be 95% certain that two teams with a difference in Total Shot Ratio of 0.1 are actually different from each other takes 45 matches.&lt;/p&gt;
&lt;p&gt;So, to be statistically certain that a team with a Total Shot Ratio of 0.6 actually has a higher ratio than a team with a Total Shot Ratio of 0.5 rather than it just being down to random variability requires over a season’s worth of matches to be played.&lt;/p&gt;
&lt;p&gt;As the differences become smaller, the number of matches required increases even further – to identify a difference in Total Shot Ratio of 0.05 takes nearly five seasons’ worth of matches!&lt;/p&gt;
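&lt;p&gt;The jump from 45 matches to nearly five seasons follows from how sample size scales with the size of the difference. As a rough sketch, assuming the usual t-test approximation in which the required sample size is proportional to one over the squared difference (with the variance, significance level and power held fixed), the 45-match figure can be rescaled in Python. The function name and the 38-match season length are assumptions for illustration rather than part of the original calculation.&lt;/p&gt;

```python
def scaled_sample_size(known_n, known_difference, new_difference):
    """Rescale a known sample size to a new detectable difference.

    Assumes n is proportional to 1 / difference**2, which holds for the
    standard t-test sample size approximation when everything else is fixed.
    """
    return known_n * (known_difference / new_difference) ** 2


# 45 matches were needed for a difference of 0.1 (figure quoted in the text).
matches_needed = scaled_sample_size(45, 0.1, 0.05)
# Assumption: a 38-match Premier League season.
seasons_needed = matches_needed / 38
print(round(matches_needed), round(seasons_needed, 1))
```

&lt;p&gt;Under that assumption, halving the difference quadruples the number of matches needed, which is where the figure of nearly five seasons comes from.&lt;/p&gt;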
&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;In the short term, Total Shot Ratio appears to be virtually meaningless in terms of goals scored or match outcomes as its variability is so high.&lt;/p&gt;
&lt;p&gt;Over the long term though, skill outweighs luck and Total Shot Ratio becomes increasingly correlated with outcomes. However, it may take a long time for this to occur, and Total Shot Ratio may be less accurate than other available statistics if you are interested in predicting performance.&lt;/p&gt;
&lt;p&gt;Finally, this article is not intended to say “do not use Total Shot Ratio” as it is still an interesting metric. Rather, make sure that you are aware of its abilities and limitations if you are planning on using it for analysis.&lt;/p&gt;</content><category term="TSR"></category><category term="TSR"></category></entry><entry><title>EI Match Predictions for the English Premier League</title><link href="2013/03/28/ei-match-predictions-for-the-english-premier-league-4/" rel="alternate"></link><published>2013-03-28T19:30:00+00:00</published><updated>2013-03-28T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-03-28:2013/03/28/ei-match-predictions-for-the-english-premier-league-4/</id><summary type="html">&lt;p&gt;After last week’s international matches, domestic football is finally back so here are this weekend’s match predictions using my EI predictive model. Let’s see if it can keep up its good form and continue to beat the bookmakers!&lt;/p&gt;</summary><content type="html">&lt;p&gt;After last week’s international matches, domestic football is finally back so here are this weekend’s match predictions using my EI predictive model. Let’s see if it can keep up its good form and continue to beat the bookmakers!&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Home Team&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Away Team&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Home (%)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Draw (%)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Away (%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sunderland&lt;/td&gt;
&lt;td&gt;Man Utd&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;61&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arsenal&lt;/td&gt;
&lt;td&gt;Reading&lt;/td&gt;
&lt;td&gt;74&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Man City&lt;/td&gt;
&lt;td&gt;Newcastle&lt;/td&gt;
&lt;td&gt;70&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Southampton&lt;/td&gt;
&lt;td&gt;Chelsea&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Swansea&lt;/td&gt;
&lt;td&gt;Tottenham&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;West Ham&lt;/td&gt;
&lt;td&gt;West Brom&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;37&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wigan&lt;/td&gt;
&lt;td&gt;Norwich&lt;/td&gt;
&lt;td&gt;43&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Everton&lt;/td&gt;
&lt;td&gt;Stoke City&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aston Villa&lt;/td&gt;
&lt;td&gt;Liverpool&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;42&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fulham&lt;/td&gt;
&lt;td&gt;QPR&lt;/td&gt;
&lt;td&gt;59&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;</content><category term="EI"></category><category term="EI"></category><category term="EPL"></category></entry><entry><title>How Accurate Are The EI Football Predictions?</title><link href="2013/03/21/how-accurate-are-the-ei-football-predictions/" rel="alternate"></link><published>2013-03-21T19:30:00+00:00</published><updated>2013-03-21T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-03-21:2013/03/21/how-accurate-are-the-ei-football-predictions/</id><summary type="html">&lt;p&gt;Unfortunately time caught up with me last week and I was unable to post any predictions from my Eastwood Index. However, since then I have been busy validating the results to see how accurate the predictions really are using the 296 matches played in the English Premier League so far this season.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Unfortunately time caught up with me last week and I was unable to post any predictions from my Eastwood Index. However, since then I have been busy validating the results to see how accurate the predictions really are using the 296 matches played in the English Premier League so far this season.&lt;/p&gt;
&lt;h2 id="ranked-probability-scores"&gt;Ranked Probability Scores&lt;/h2&gt;
&lt;p&gt;I have previously discussed the problems of trying to determine the accuracy of probability-based models and Jonas posted a suggestion in the comments section recommending the use of ranked probability scores, which turned out to be a really interesting idea.&lt;/p&gt;
&lt;p&gt;Ranked probability scores were originally proposed by &lt;a href="http://www.inmet.gov.br/documentos/cursoI_INMET_IRI/Climate_Information_Course/References/Epstein_1969.pdf"&gt;Epstein&lt;/a&gt; back in 1969 as a way to compare probabilistic forecasts against categorical data. Their main advantage over other techniques is that, as well as looking at accuracy, they also account for distance in the predictions, i.e. how far out inaccurate predictions are from what actually happened.&lt;/p&gt;
&lt;p&gt;They are also easy to calculate. The equation for ranked probability scores is shown in Figure 1 for those of a mathematical disposition, where $K$ is the number of possible outcomes, and $CDF_{fc}$ and $CDF_{obs}$ are the cumulative predicted and observed probabilities up to outcome $k$.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/rpseqn.png"&gt;&lt;/p&gt;
&lt;h2 id="interpreting-ranked-probability-scores"&gt;Interpreting Ranked Probability Scores&lt;/h2&gt;
&lt;p&gt;Ranked probability scores range from 0 to 1 and are negatively orientated, meaning that the lower the result the better. For simplicity, you can think of them as representing the amount of error in the predictions, where a score of zero means your predictions are perfect.&lt;/p&gt;
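&lt;p&gt;For anyone who prefers code to equations, here is a minimal Python sketch of the score for a single match. It assumes the commonly used form that sums the squared differences between the cumulative forecast and the cumulative observation and divides by $K - 1$, which keeps the score between zero and one; the equation in the figure may differ in detail. The function name and the example forecast are hypothetical.&lt;/p&gt;

```python
def ranked_probability_score(forecast, observed_index):
    """Ranked probability score for a single match.

    forecast: probabilities in outcome order, e.g. (home, draw, away).
    observed_index: position of the outcome that actually happened.
    Assumes the score is normalized by K - 1 so it lies between 0 and 1.
    """
    n_outcomes = len(forecast)
    cumulative_forecast = 0.0
    cumulative_observed = 0.0
    total = 0.0
    # The final cumulative term is always 1 - 1 = 0, so it can be skipped.
    for k in range(n_outcomes - 1):
        cumulative_forecast += forecast[k]
        if k == observed_index:
            cumulative_observed = 1.0
        total += (cumulative_forecast - cumulative_observed) ** 2
    return total / (n_outcomes - 1)


# Hypothetical forecast: 50% home win, 30% draw, 20% away win.
forecast = (0.5, 0.3, 0.2)
for outcome, label in enumerate(("home win", "draw", "away win")):
    print(label, round(ranked_probability_score(forecast, outcome), 3))
```

&lt;p&gt;With this forecast the away win gets the worst score because the prediction leaned towards the home side, which is the distance effect described above. Averaging the per-match scores gives an overall figure for a set of predictions.&lt;/p&gt;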
&lt;h2 id="the-results"&gt;The Results&lt;/h2&gt;
&lt;p&gt;I started off looking at how well I would have done if I had just guessed at random for each match in the English Premier League this season rather than using the Eastwood Index and obtained a ranked probability score of 0.231.&lt;/p&gt;
&lt;p&gt;Next, I looked at how well the bookmakers’ odds predicted matches. To do this I aggregated the odds from multiple bookmakers, partly to reduce the comparisons needed and partly because aggregating data often improves predictions and I wanted to give the Eastwood Index the toughest test possible. This gave a ranked probability score of 0.193 for the bookmakers.&lt;/p&gt;
&lt;p&gt;Finally I calculated the score for the Eastwood Index and got…&lt;/p&gt;
&lt;p&gt;&lt;em&gt;drum roll please&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;a ranked probability score of 0.191. Okay, it is not much lower than the bookmakers but it does mean that so far this season the Eastwood Index has been more accurate at predicting football matches than the combined odds of the gaming industry which is really pleasing for me.&lt;/p&gt;
&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;Most importantly though, this suggests that the Eastwood Index works. I had originally set myself the target of being able to compete with the bookmakers as I consider them to be the gold standard for football prediction. These are large companies employing professional odds compilers to generate their odds, so for me to be able to beat them, even by a small amount, using a bunch of equations is a big success for the Eastwood Index.&lt;/p&gt;
&lt;p&gt;It is still early days and it is still a relatively small number of predictions (n=296) so I will be continuing to monitor the results to check the accuracy doesn’t change over time. It is a fantastic start though and great inspiration to continue developing and improving the Eastwood Index further!&lt;/p&gt;</content><category term="EI"></category><category term="EI"></category></entry><entry><title>Is Brendan Rodgers Improving Liverpool?</title><link href="2013/03/13/is-brendan-rogers-improving-liverpool/" rel="alternate"></link><published>2013-03-13T19:30:00+00:00</published><updated>2013-03-13T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-03-13:2013/03/13/is-brendan-rogers-improving-liverpool/</id><summary type="html">&lt;p&gt;As well as using my EI Index to predict future matches, it can also be used to look back at how a team’s performance has changed over time. An interesting example is Liverpool, who sacked Kenny Dalglish at the end of the 2011–2012 season to bring in Brendan Rodgers from Swansea City.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;As well as using my &lt;a href="http://pena.lt/y/2013/02/21/rating-teams-and-predicting-football-matches-using-the-ei-index/"&gt;EI Index&lt;/a&gt; to predict future matches, it can also be used to look back at how a team’s performance has changed over time. An interesting example is Liverpool, who sacked Kenny Dalglish at the end of the 2011–2012 season to bring in Brendan Rodgers from Swansea City.&lt;/p&gt;
&lt;p&gt;The green line in Figure One shows the weekly EI rating for Kenny Dalglish’s Liverpool team over the course of the 2011–2012 season, with the black line showing the moving three-match average. Up until around Christmas time Liverpool were making decent progress in terms of EI, improving from a rating of 2127 up to a peak of 2247 following their 3-1 victory against Newcastle United.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/130313-LFC-EI.png"&gt;&lt;/p&gt;
&lt;p&gt;However, Liverpool’s form plummeted soon after that, with 11 losses out of their remaining 19 matches dragging Liverpool’s EI back down rapidly. Their worst performances in terms of EI were losses against Bolton Wanderers and Wigan Athletic, both of which Liverpool’s EI ratings suggested they should have had a good chance of winning. Despite a small flurry at the end of the season, Liverpool still finished with an EI lower than they started with.&lt;/p&gt;
&lt;p&gt;In contrast, the red line in Figure One shows Liverpool’s weekly EI ratings under Brendan Rodgers, with the black line again showing the moving three-match average.&lt;/p&gt;
&lt;p&gt;The first few matches of the season did not go particularly well for Rodgers and Liverpool’s EI dropped even lower than under Dalglish. The obvious narrative here is that Liverpool may have needed time to adjust to Rodgers’ tactical changes, but it’s also worth noting they had a tough start to the season, with fixtures against Manchester City, Arsenal and Manchester United all within the opening few weeks.&lt;/p&gt;
&lt;p&gt;Since then, Liverpool has shown a pretty steady increase in EI over the rest of the season. There have been a few drops along the way due to unexpected losses against teams such as Aston Villa and Stoke but their EI rating is currently on course to exceed Dalglish’s peak by the end of the season.&lt;/p&gt;
&lt;p&gt;To put these numbers into context, Chelsea are currently in fourth position with an EI of 2464 while Tottenham Hotspur finished fourth last season with an EI of 2329. While Liverpool’s EI isn’t quite that high yet, if they can maintain their current rate of improvement then their EI rating suggests they have a decent chance of challenging for a Champions League place next season.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;addendum&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In case anyone wonders why Brendan Rodgers’ starting EI is lower than Kenny Dalglish’s final EI – Brendan Rodgers lost his first match against West Bromwich Albion so the difference between the two is the loss in EI caused by that particular match.&lt;/p&gt;</content><category term="EI"></category><category term="EI"></category></entry><entry><title>EI Match Predictions for the English Premier League</title><link href="2013/03/08/ei-match-predictions-for-the-english-premier-league-3/" rel="alternate"></link><published>2013-03-08T19:30:00+00:00</published><updated>2013-03-08T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-03-08:2013/03/08/ei-match-predictions-for-the-english-premier-league-3/</id><summary type="html">&lt;p&gt;Here we go again!&lt;/p&gt;</summary><content type="html">&lt;p&gt;Here we go again!&lt;/p&gt;
&lt;p&gt;&lt;a href="http://pena.lt/y/2013/03/08/ei-match-predictions-for-the-english-premier-league-2/"&gt;Last Week&lt;/a&gt; was another success for the EI, with seven out of the ten predicted favourites winning their matches. It is still a bit early to be drawing too many conclusions but so far that is 14 out of 19 for the EI, which seems a pretty good start to me!&lt;/p&gt;
&lt;p&gt;&lt;a href="http://pena.lt/y/2013/02/28/how-did-the-ei-predictions-do/"&gt;Jonas&lt;/a&gt; commented on my recent post discussing the difficulties of assessing probability-based models to suggest trying &lt;a href="http://en.wikipedia.org/wiki/Statistical_distance"&gt;Ranked Probability Scores&lt;/a&gt; which looks like a really good idea so look out for that once I have a bit more data to play with.&lt;/p&gt;
&lt;p&gt;It is only a small gameweek this week due to the FA Cup but here are the predictions anyway.&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Home Team&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Away Team&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Home (%)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Draw (%)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Away (%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Norwich&lt;/td&gt;
&lt;td&gt;Southampton&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QPR&lt;/td&gt;
&lt;td&gt;Sunderland&lt;/td&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;37&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reading&lt;/td&gt;
&lt;td&gt;Aston Villa&lt;/td&gt;
&lt;td&gt;42&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;West Brom&lt;/td&gt;
&lt;td&gt;Swansea&lt;/td&gt;
&lt;td&gt;46&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Newcastle&lt;/td&gt;
&lt;td&gt;Stoke&lt;/td&gt;
&lt;td&gt;49&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Liverpool&lt;/td&gt;
&lt;td&gt;Tottenham&lt;/td&gt;
&lt;td&gt;38&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;</content><category term="EI"></category><category term="EI"></category><category term="EPL"></category></entry><entry><title>EI Match Predictions for the English Premier League</title><link href="2013/03/01/ei-match-predictions-for-the-english-premier-league-2/" rel="alternate"></link><published>2013-03-01T19:30:00+00:00</published><updated>2013-03-01T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-03-01:2013/03/01/ei-match-predictions-for-the-english-premier-league-2/</id><summary type="html">&lt;p&gt;Since last week’s predictions turned out to be so popular I thought I would continue testing the EI index in public so here are this week’s predictions. Fingers crossed they turn out well again!...&lt;/p&gt;</summary><content type="html">&lt;p&gt;Since last week’s predictions turned out to be so popular I thought I would continue testing the EI index in public so here are this week’s predictions. Fingers crossed they turn out well again!&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Home Team&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Away Team&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Home (%)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Draw (%)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Away (%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chelsea&lt;/td&gt;
&lt;td&gt;West Brom&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Everton&lt;/td&gt;
&lt;td&gt;Reading&lt;/td&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Man United&lt;/td&gt;
&lt;td&gt;Norwich&lt;/td&gt;
&lt;td&gt;76&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Southampton&lt;/td&gt;
&lt;td&gt;QPR&lt;/td&gt;
&lt;td&gt;49&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stoke&lt;/td&gt;
&lt;td&gt;West Ham&lt;/td&gt;
&lt;td&gt;56&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sunderland&lt;/td&gt;
&lt;td&gt;Fulham&lt;/td&gt;
&lt;td&gt;42&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Swansea&lt;/td&gt;
&lt;td&gt;Newcastle&lt;/td&gt;
&lt;td&gt;42&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wigan&lt;/td&gt;
&lt;td&gt;Liverpool&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;42&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tottenham&lt;/td&gt;
&lt;td&gt;Arsenal&lt;/td&gt;
&lt;td&gt;44&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aston Villa&lt;/td&gt;
&lt;td&gt;Man City&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;61&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;</content><category term="EI"></category><category term="EI"></category><category term="EPL"></category></entry><entry><title>EI Match Predictions for the English Premier League</title><link href="2013/02/28/how-did-the-ei-predictions-do/" rel="alternate"></link><published>2013-02-28T19:30:00+00:00</published><updated>2013-02-28T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-02-28:2013/02/28/how-did-the-ei-predictions-do/</id><summary type="html">&lt;p&gt;Last week was a big test for my new EI index – it had finally reached the point where I was confident it was working well enough to post its predictions in public.&lt;/p&gt;</summary><content type="html">&lt;h2 id="the-ei"&gt;The EI&lt;/h2&gt;
&lt;p&gt;Last week was a big test for my new &lt;a href="http://pena.lt/y/2013/02/21/rating-teams-and-predicting-football-matches-using-the-ei-index/"&gt;EI Index&lt;/a&gt; – it had finally reached the point where I was confident it was working well enough to post its predictions in public.&lt;/p&gt;
&lt;p&gt;For those people who haven’t come across it before, the EI index is a mathematical system I have been developing for ranking football teams and predicting the outcomes of matches. Using the EI it is possible to predict the odds for each team winning, drawing or losing its match.&lt;/p&gt;
&lt;h2 id="results"&gt;Results&lt;/h2&gt;
&lt;p&gt;So how did the EI do?&lt;/p&gt;
&lt;p&gt;Well, this is the tricky bit as the EI is a probability model. For linear models it is relatively simple to assess accuracy as you get an R-squared value showing you how well your predictions match the observed result. The higher the R-squared then the better you did.&lt;/p&gt;
&lt;p&gt;For a probability-based model though you cannot do this. An obvious alternative is to just look at whether the model’s predicted favourites won their matches. On this measure the EI performed fantastically well, with seven of its nine predicted favourites winning, giving it an overall success rate of 78%.&lt;/p&gt;
&lt;p&gt;But we need to be careful here as this can be a misleading way of looking at accuracy. Just because Manchester City had a 54% probability of beating Chelsea doesn’t mean they will win the match purely because it is the most probable result.  Instead, it means that if this match was played 100 times then Manchester City would be expected to win 54 of them and not win 46 of them.&lt;/p&gt;
&lt;p&gt;Rather than looking at the accuracy of the predicted favourite winning we really need to look at the accuracy of the predicted probabilities.  Are teams ranked with a 50% chance of winning actually winning 50% of the time? Are teams ranked with a 25% chance of winning actually winning 25% of the time?&lt;/p&gt;
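&lt;p&gt;One way to run that kind of check, sketched here in Python, is to group predictions into probability bands and compare the average predicted probability in each band with how often the outcome actually happened. The function name, the ten bands and the example data are all made up for illustration and are not EI results.&lt;/p&gt;

```python
from collections import defaultdict


def calibration_table(predictions, n_bins=10):
    """Group (probability, happened) pairs into bins and compare averages.

    Returns a list of (bin_start, mean_predicted, observed_rate, count) rows.
    """
    totals = defaultdict(lambda: [0, 0.0, 0])  # count, sum of probabilities, hits
    for probability, happened in predictions:
        # Clamp so that a probability of exactly 1.0 falls in the last bin.
        index = min(int(probability * n_bins), n_bins - 1)
        totals[index][0] += 1
        totals[index][1] += probability
        totals[index][2] += 1 if happened else 0
    rows = []
    for index in sorted(totals):
        count, probability_sum, hits = totals[index]
        rows.append((index / n_bins, probability_sum / count, hits / count, count))
    return rows


# Hypothetical predictions: (predicted win probability, did the team win?)
example = [(0.52, True), (0.55, False), (0.25, False),
           (0.27, True), (0.22, False), (0.28, False)]
for row in calibration_table(example):
    print(row)
```

&lt;p&gt;If the model is well calibrated, the predicted and observed columns should be close to each other in every band once there are enough matches in it.&lt;/p&gt;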
&lt;p&gt;This can only be done by making lots and lots of predictions, so over the coming weeks I will keep making them until I have enough predictions to get an estimate of how good they are.&lt;/p&gt;
&lt;p&gt;Overall though it is a very exciting start for the EI, bring on next week!&lt;/p&gt;</content><category term="EI"></category><category term="EI"></category><category term="EPL"></category></entry><entry><title>EI Match Predictions for the English Premier League</title><link href="2013/02/21/ei-match-predictions-for-the-english-premier-league/" rel="alternate"></link><published>2013-02-21T19:30:00+00:00</published><updated>2013-02-21T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-02-21:2013/02/21/ei-match-predictions-for-the-english-premier-league/</id><summary type="html">&lt;p&gt;For a bit of fun, here is a trial run at predicting this weekend’s EPL matches using my EI ratings. I haven’t compared these with anyone else’s odds yet but they generally look about what I would have expected....&lt;/p&gt;</summary><content type="html">&lt;p&gt;For a bit of fun, here is a trial run at predicting this weekend’s EPL matches using my EI ratings. I haven’t compared these with anyone else’s odds yet but they generally look about what I would have expected.&lt;/p&gt;
&lt;p&gt;Poor QPR don’t seem to have much chance of holding out against Manchester United. Even playing at home, they are only rated as having a 9% chance of winning.&lt;/p&gt;
&lt;p&gt;It looks like it could be a good weekend for Arsenal to bounce back from their Champions League defeat as they have a massive 67% chance of beating Aston Villa.&lt;/p&gt;
&lt;p&gt;Personally, I am surprised Newcastle are rated quite so highly against Southampton. I wonder if this may be due to Newcastle being so strong last season while Southampton only have this season’s data for generating EI ratings from? If so, it may be that I need to go back and tweak the equation weightings slightly to account for situations like this.&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Match&lt;/td&gt;
&lt;td&gt;Home (%)&lt;/td&gt;
&lt;td&gt;Draw (%)&lt;/td&gt;
&lt;td&gt;Away (%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fulham Vs Stoke&lt;/td&gt;
&lt;td&gt;46&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arsenal Vs Aston Villa&lt;/td&gt;
&lt;td&gt;67&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Norwich Vs Everton&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;43&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QPR Vs Man United&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;td&gt;68&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reading Vs Wigan&lt;/td&gt;
&lt;td&gt;43&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;West Brom Vs Sunderland&lt;/td&gt;
&lt;td&gt;47&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Man City Vs Chelsea&lt;/td&gt;
&lt;td&gt;54&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Newcastle Vs Southampton&lt;/td&gt;
&lt;td&gt;53&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;West Ham Vs Tottenham&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;54&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;</content><category term="EI"></category><category term="EI"></category><category term="EPL"></category></entry><entry><title>Introducing the Eastwood Index</title><link href="2013/02/21/rating-teams-and-predicting-football-matches-using-the-ei-index/" rel="alternate"></link><published>2013-02-21T19:30:00+00:00</published><updated>2013-02-21T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-02-21:2013/02/21/rating-teams-and-predicting-football-matches-using-the-ei-index/</id><summary type="html">&lt;p&gt;On the whole, the Elo system works okay but it was not designed with football in mind and so there are some issues with it - for example it can only handle two distinct outcomes – winning and losing...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;My past couple of articles have focused on Elo ratings and how they can be applied to football teams to rank them against each other and to estimate win probabilities.&lt;/p&gt;
&lt;h2 id="problems-with-elo-ratings"&gt;Problems with Elo Ratings&lt;/h2&gt;
&lt;p&gt;On the whole, the Elo system works okay but it was not designed with football in mind and so there are some issues with it. For example, it can only handle two distinct outcomes – winning and losing.&lt;/p&gt;
&lt;p&gt;Elo ratings try to get around this problem by considering each draw to be half a win and half a loss. However, this means that the win probabilities calculated using the Elo equation are actually the probability of winning or drawing versus the probability of losing or drawing, which isn’t particularly useful.&lt;/p&gt;
&lt;p&gt;For a game such as chess, which Elo ratings were originally developed for, this may not be too much of an issue as tied matches are comparatively rare but in football draws are a common occurrence so we really need to be able to model three outcomes – win, loss and draw.&lt;/p&gt;
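&lt;p&gt;To make the problem concrete, here is a small Python sketch of the standard Elo expected-score equation, using the conventional scale of 400 and two hypothetical ratings. The function name is an assumption for illustration.&lt;/p&gt;

```python
def elo_expected_score(rating_a, rating_b, scale=400.0):
    """Standard Elo expected score for team A against team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Hypothetical ratings for two teams.
expected_a = elo_expected_score(2100, 2000)
expected_b = elo_expected_score(2000, 2100)
print(round(expected_a, 2), round(expected_b, 2))
```

&lt;p&gt;The two expected scores always add up to one, and each is the chance of a win plus half the chance of a draw. A single number per team cannot be split back into separate win, draw and loss probabilities without extra assumptions.&lt;/p&gt;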
&lt;h2 id="the-eastwood-index"&gt;The Eastwood Index&lt;/h2&gt;
&lt;p&gt;So instead of combining draws with wins and losses, we need to be able to calculate their probabilities individually. To do this I have been developing my own ranking system, which for want of a better name I am currently calling the Eastwood Index, or EI for short (it feels rather pretentious to be naming it after myself so if anyone has any better names for it then feel free to let me know!)&lt;/p&gt;
&lt;p&gt;The Eastwood Index allows football teams to be ranked using a mathematical rating system that evaluates relative strength based on previous performances weighted so that more recent matches have a greater impact on a team’s ranking.&lt;/p&gt;
&lt;h2 id="methodology"&gt;Methodology&lt;/h2&gt;
&lt;p&gt;Teams’ EI ratings are scaled so that the average rating is 2000. The higher the rating, the better a team is compared with the rest of the league.&lt;/p&gt;
&lt;p&gt;EI ratings increase when a team wins a match or draws against superior opposition. Conversely, EI ratings decrease when teams lose matches or draw against weaker opposition. The size of this increase or decrease in ratings is linked to the quality of the opposition. For example, beating a superior team is worth more than winning against a lower ranked team.&lt;/p&gt;
&lt;p&gt;The change in EI rating is also weighted by the score line so that the greater the difference in goals scored or conceded, the greater the change in ratings. Home advantage is also included in the calculations so that teams are expected to perform better at home than away.&lt;/p&gt;
&lt;p&gt;So far this all sounds similar to an Elo rating. However, the Eastwood Index has a major advantage over Elo in that it is multinomial, meaning it can function with multiple outcomes. This makes it possible to accurately calculate the probabilities of teams winning, drawing or losing matches.&lt;/p&gt;
&lt;p&gt;A further advantage of the Eastwood Index is that it does not rely on the Logistic distribution the same way Elo ratings do. The use of the Logistic distribution in Elo ratings originates from chess where it was considered to predict chess outcomes reasonably well. Football and chess are different games with different outcomes so instead the Eastwood Index uses custom curves developed using football data. This means that predictions for football should be more accurate using the EI compared with Elo ratings.&lt;/p&gt;
&lt;h2 id="example"&gt;Example&lt;/h2&gt;
&lt;p&gt;The underlying mathematics for the EI is completely different to how an Elo rating is calculated but rather than wade through a list of equations it is simpler to show how it works using the recent Liverpool versus Swansea match.&lt;/p&gt;
&lt;p&gt;Prior to the game Liverpool had an EI rating of 2151 compared with Swansea’s rating of 1891. Team performances are considered to be normally distributed around their rating so on any given day a team may play above or below their true skill level. By comparing the distribution curves for the two teams we can then calculate the probabilities of each outcome of the match before it is played.&lt;/p&gt;
&lt;p&gt;Although both teams have similar ratings, Liverpool has the home advantage, giving them overall a 52% chance of a win compared with a 25% chance of Swansea winning and a 23% chance of a draw (Figure 1).&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/LvsS.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 1: Predicting Liverpool Versus Swansea City&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We can also use these probabilities to calculate the expected points from the match. If these two teams were to play the same match repeatedly then on average Liverpool would be expected to earn (0.52 * 3) + (0.23 * 1) = 1.79 points while Swansea would be expected to earn (0.25 * 3) + (0.23 * 1) = 0.98 points.&lt;/p&gt;
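&lt;p&gt;The expected points arithmetic can be checked with a few lines of Python. This is only a minimal sketch: the function name &lt;code&gt;expected_points&lt;/code&gt; is made up for illustration, and the probabilities are the ones quoted above.&lt;/p&gt;

```python
def expected_points(p_win, p_draw):
    """Expected league points from one match: 3 for a win, 1 for a draw."""
    return 3 * p_win + 1 * p_draw


# Probabilities quoted above for Liverpool versus Swansea
liverpool = expected_points(p_win=0.52, p_draw=0.23)
swansea = expected_points(p_win=0.25, p_draw=0.23)

print(round(liverpool, 2), round(swansea, 2))  # 1.79 0.98
```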
&lt;p&gt;Once we know the actual result we can then update the EI for each team based on their current ratings and the score line, which was Liverpool 5 – 0 Swansea. Since Liverpool already had a higher EI rating and had beaten somewhat lesser opposition they would expect only a small rise in their EI but taking into account their high score in the match Liverpool’s rating moves up to 2183 while Swansea’s falls to 1859.&lt;/p&gt;
&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;The EI offers a potentially superior way of rating football teams compared with other ranking systems, with the advantage that it can predict wins, losses and draws, and uses mathematics specifically designed to model football data.&lt;/p&gt;
&lt;p&gt;I will be discussing the EI in more detail in future posts and showing how it can be used to analyse and predict football matches.&lt;/p&gt;
&lt;p&gt;As ever, get in touch if you have any comments or questions!&lt;/p&gt;</content><category term="EI"></category><category term="EI"></category><category term="Elo Ratings"></category></entry><entry><title>Understanding Elo Ratings Part Two</title><link href="2013/02/07/applying-elo-ratings-to-football/" rel="alternate"></link><published>2013-02-07T19:30:00+00:00</published><updated>2013-02-07T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-02-07:2013/02/07/applying-elo-ratings-to-football/</id><summary type="html">&lt;p&gt;Now that we understand the theory behind Elo ratings, let’s take a look at how to calculate them and how to make them more relevant to football...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Now that we understand the theory behind &lt;a href="http://pena.lt/y/2013/01/31/understanding-elo-ratings/"&gt;Elo Ratings&lt;/a&gt;, let’s take a look at how to calculate them and how to make them more relevant to football.&lt;/p&gt;
&lt;h2 id="calculating-elo-ratings"&gt;Calculating Elo Ratings&lt;/h2&gt;
&lt;p&gt;The equation for calculating a team’s Elo rating is shown below in Figure 1, where $Ra_{new}$ is the team’s new Elo rating after a match, $Ra_{old}$ is the team’s previous Elo rating before the match and $k$ is a weighting factor. $Sa$ is the outcome of the match normalised to the range 0–1 so that 0 is a loss, 0.5 is a draw and 1 is a win.&lt;/p&gt;
&lt;p&gt;$Ra_{new}=Ra_{old}+k(Sa-Ea)$&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 1: Elo Rating equation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;$Ea$ is the expected probability of the team winning the match and is calculated using the equation in Figure 2 where $Rb-Ra$ is the difference in Elo ratings between the two teams.&lt;/p&gt;
&lt;p&gt;$Ea=\frac{1}{1+10^{(Rb-Ra)/400}}$&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 2: Expected win probability equation&lt;/strong&gt;&lt;/p&gt;
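&lt;p&gt;As a rough illustration, the two equations translate directly into Python. The function names, the example ratings and the value &lt;code&gt;k=20&lt;/code&gt; are placeholders chosen for this sketch rather than anything prescribed by the Elo system.&lt;/p&gt;

```python
def expected_score(rating_a, rating_b):
    """Expected result for team A (Figure 2), on a 0 to 1 scale."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def update_rating(rating_a, rating_b, score_a, k=20):
    """New rating for team A (Figure 1); score_a is 1 win, 0.5 draw, 0 loss."""
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))


# Hypothetical ratings: a 1500-rated team beats a 1600-rated team
e_a = expected_score(1500, 1600)
new_a = update_rating(1500, 1600, score_a=1)

print(round(e_a, 3), round(new_a, 1))  # roughly 0.36 and 1512.8
```

&lt;p&gt;Because the lower-rated team was only expected to score about 0.36, winning earns it most of the available &lt;code&gt;k&lt;/code&gt; points.&lt;/p&gt;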
&lt;h2 id="win-expectancy"&gt;Win Expectancy&lt;/h2&gt;
&lt;p&gt;The calculation for $Ea$ is actually slightly different from the original Elo equation as it uses a logistic distribution for player performances rather than a normal distribution. The use of the logistic distribution stems from the chess community, who suggested that it fit player performances better than the normal distribution did. In effect, the differences between the two are relatively minor, with the logistic curve placing more performances in the tails of the distribution, meaning players are slightly more likely to over- or under-perform (Figure 3).&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/130702-Logistic-Vs-Normal.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 3: Comparison of logistic and normal distributions&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="weighting-factor"&gt;Weighting Factor&lt;/h2&gt;
&lt;p&gt;The constant $k$ in the equation controls how many points are gained or lost each match. Increasing $k$ will apply more weight to recent matches while lowering it will allow historic matches to have more of an effect on a team’s Elo rating. Therefore, using an inappropriate value for $k$ may lead to inaccurate Elo ratings being calculated.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.eloratings.net/system.html"&gt;Eloratings.net&lt;/a&gt; is a website that applies Elo ratings to international football. They use a weighting of 60 for a World Cup final, 50 for continental championship finals and major intercontinental tournaments, 40 for World Cup and continental qualifiers, 30 for all other tournament matches and 20 for international friendly matches. However, since none of these weightings apply directly to domestic football, and since &lt;a href="http://www.eloratings.net/system.html"&gt;Eloratings.net&lt;/a&gt; does not explain how they were determined, I decided to calculate my own.&lt;/p&gt;
&lt;p&gt;Using &lt;a href="http://en.wikipedia.org/wiki/Least_squares"&gt;Least Squares&lt;/a&gt; I optimized the value of $k$ to minimize the error of the predicted outcomes versus the actual match results using data from the English Premier League. Overall, the most accurate predictions were obtained using a value of 15 for $k$.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/130702-K-Weighting.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 4: Effect of k on error of Elo prediction&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="goal-difference"&gt;Goal Difference&lt;/h2&gt;
&lt;p&gt;Another modification we can make so that the Elo ratings are more applicable to football is to take into account the number of goals scored, so that beating the opposition by two goals, for example, is better than winning by just one.&lt;/p&gt;
&lt;p&gt;We can do this by scaling $k$ by the goal difference so that the larger the difference the more points are gained by the victor and the more lost by the loser. There are a number of ways this can be done but in my method each additional goal a team scores becomes increasingly less important. For example, going from 1–0 to 2–0 is much more critical in terms of winning a game than going from 8–0 to 9–0.&lt;/p&gt;
&lt;p&gt;Eloratings.net used a similar approach where their scaling reduces the weightings for goal differences of two and three. However, for goal differences of four upwards their scale (intentionally or unintentionally) becomes linear and from then on applies equal weightings to each additional goal scored. Instead, I have used a sigmoid function to smoothly reduce the weightings of each goal scored to create the curve shown in Figure 5, which is then used to produce the scaling factors shown in Table 1.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/130702-Goal-Differential.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 5: Goal difference scaling factor smoothed using a sigmoid&lt;/strong&gt;&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Goal Difference&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Scaling Factor&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10.00&lt;/td&gt;
&lt;td&gt;2.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9.00&lt;/td&gt;
&lt;td&gt;2.88&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8.00&lt;/td&gt;
&lt;td&gt;2.77&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7.00&lt;/td&gt;
&lt;td&gt;2.64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6.00&lt;/td&gt;
&lt;td&gt;2.49&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5.00&lt;/td&gt;
&lt;td&gt;2.32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4.00&lt;/td&gt;
&lt;td&gt;2.11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3.00&lt;/td&gt;
&lt;td&gt;1.85&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2.00&lt;/td&gt;
&lt;td&gt;1.51&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;Table 1: Goal difference scaling factors&lt;/strong&gt;&lt;/p&gt;
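&lt;p&gt;One way to apply these factors is a simple lookup that multiplies $k$ by the value from Table 1. The sketch below assumes the base value of 15 found earlier; how draws and margins above ten goals are handled is my own assumption for the example, as is the name &lt;code&gt;scaled_k&lt;/code&gt;.&lt;/p&gt;

```python
# Scaling factors copied from Table 1 (goal difference -> multiplier)
GD_SCALING = {1: 1.00, 2: 1.51, 3: 1.85, 4: 2.11, 5: 2.32,
              6: 2.49, 7: 2.64, 8: 2.77, 9: 2.88, 10: 2.99}


def scaled_k(goal_difference, k=15):
    """Return k multiplied by the Table 1 factor for this margin.

    Draws (goal difference 0) keep the base k, and margins above 10
    reuse the factor for 10; both choices are assumptions.
    """
    margin = abs(goal_difference)
    if margin == 0:
        return float(k)
    return k * GD_SCALING[min(margin, 10)]


print(scaled_k(1), round(scaled_k(3), 2), round(scaled_k(5), 2))
```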
&lt;h2 id="home-advantage"&gt;Home Advantage&lt;/h2&gt;
&lt;p&gt;If two teams with equal Elo ratings play each other then in theory they should both have an identical chance of winning the match; however, in football the home team typically has a noticeable advantage.&lt;/p&gt;
&lt;p&gt;Looking back at the 2011–2012 English Premier League season, home wins accounted for 47% of results compared with just 24% for away wins. The remainder of the results are draws, which Elo ratings consider to be half a win, so including these gives us a final win expectancy of 61% for the home team and 39% for the away team.&lt;/p&gt;
&lt;p&gt;To account for this we can give the home team’s Elo a temporary boost of 75 points. For two equally matched teams this then raises the win expectancy for the home team from 50% to 61%, matching what we see in the English Premier League.&lt;/p&gt;
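&lt;p&gt;Both numbers can be reproduced with a short sketch. It assumes the logistic win expectancy formula from earlier in the post; the rating of 1500 is just a convenient example, since only the difference between the two ratings matters.&lt;/p&gt;

```python
def expected_score(rating_a, rating_b):
    """Logistic Elo win expectancy for team A."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


HOME_ADVANTAGE = 75

# Win expectancy from 2011-2012 results, counting a draw as half a win
home_wins, away_wins = 0.47, 0.24
draws = 1 - home_wins - away_wins
observed_home = home_wins + draws / 2

# Two equally rated teams, with the boost applied to the home side
model_home = expected_score(1500 + HOME_ADVANTAGE, 1500)

print(round(observed_home, 3), round(model_home, 3))  # about 0.615 and 0.606
```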
&lt;h2 id="relegations-and-promotions"&gt;Relegations and Promotions&lt;/h2&gt;
&lt;p&gt;Another issue to consider is how to deal with relegations and promotions. We could calculate Elo ratings for each tier of the league so that a team already has a rating when it gets promoted or alternatively we could award each promoted team the average Elo rating of 1500. A nice feature of Elo ratings is that they are self-correcting so although these arbitrary ratings may not be accurate they would gradually alter to the correct level.&lt;/p&gt;
&lt;p&gt;This does have the unfortunate side effect of skewing the other teams’ Elo values though. The gain and loss of Elo points is zero sum, meaning that for every Elo point a team gains another team has to lose one. So adding in teams with different Elo ratings would distort the other teams’ ratings by altering the overall number of points available in the league.&lt;/p&gt;
&lt;p&gt;The simplest way to deal with this problem is to give the promoted teams the equivalent relegated team’s Elo rating. So the best promoted team takes the Elo rating of the best relegated team, the second best promoted team takes the Elo rating of the second best relegated team, and so on.  This then keeps the correct number of Elo points in the league and maintains the parity in points between teams.&lt;/p&gt;
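&lt;p&gt;A minimal sketch of this swap is shown below. The team names and ratings are invented, and &lt;code&gt;swap_ratings&lt;/code&gt; is a hypothetical helper that expects both lists ordered from best to worst.&lt;/p&gt;

```python
def swap_ratings(ratings, relegated, promoted):
    """Hand relegated teams' ratings to promoted teams, best to best.

    `relegated` and `promoted` are lists ordered from best to worst.
    """
    updated = {team: r for team, r in ratings.items() if team not in relegated}
    for old_team, new_team in zip(relegated, promoted):
        updated[new_team] = ratings[old_team]
    return updated


# Hypothetical league with made-up ratings
ratings = {"A": 1620, "B": 1510, "C": 1440, "D": 1430}
new_ratings = swap_ratings(ratings, relegated=["C", "D"], promoted=["X", "Y"])

print(new_ratings)
print(sum(ratings.values()) == sum(new_ratings.values()))  # True
```

&lt;p&gt;The final check confirms that the total number of Elo points in the league is unchanged by the swap.&lt;/p&gt;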
&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;Elo ratings are a really quick and easy way to compare teams directly and calculate win expectancies. While techniques like the Pythagorean Expectation look at how teams perform over a long period of time, Elo ratings can be used to look at teams on a match-by-match basis.&lt;/p&gt;</content><category term="Elo Ratings"></category><category term="Elo Ratings"></category></entry><entry><title>Understanding Elo Ratings</title><link href="2013/01/31/understanding-elo-ratings/" rel="alternate"></link><published>2013-01-31T19:30:00+00:00</published><updated>2013-01-31T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-01-31:2013/01/31/understanding-elo-ratings/</id><summary type="html">&lt;p&gt;The Elo rating system was originally devised by Arpad Elo as a way to calculate the relative skill levels of chess players. Although the system was created specifically for chess it has also been adapted to many other games and sports, including international football...&lt;/p&gt;</summary><content type="html">&lt;h2 id="what-are-elo-ratings"&gt;What are Elo Ratings?&lt;/h2&gt;
&lt;p&gt;The Elo rating system was originally devised by Arpad Elo as a way to calculate the relative skill levels of chess players. Although the system was created specifically for chess it has also been adapted to many other games and sports, including &lt;a href="http://www.eloratings.net/"&gt;international football&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="how-do-they-work"&gt;How Do They Work?&lt;/h2&gt;
&lt;p&gt;The fundamental principle behind Elo ratings is that the performance of a team in each match can be considered a random variable sampled from a normally distributed population centred on the team’s true skill level. Although performances will vary from match to match, the true skill level of the team is likely to change only slowly over time, so it can be considered to be the mean value of all their performance values.&lt;/p&gt;
&lt;p&gt;For example, Figure 1 shows a team with an Elo rating of 1500. On any given day their actual performance could vary from anywhere below 1000 to above 2000. But over a reasonable period of time their performances will average out to 1500.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/130131-Elo-Distribution.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 1: Possible performances for a team with Elo of 1500&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="why-are-elo-ratings-useful"&gt;Why are Elo Ratings Useful?&lt;/h2&gt;
&lt;p&gt;Elo ratings have no units and taken in isolation their specific values are of little interest. However, they become useful when comparing teams together as they can be used to determine the expected outcome between two teams based on the difference between their Elo ratings.&lt;/p&gt;
&lt;p&gt;The range used for Elo ratings is somewhat arbitrary with Elo himself suggesting they should be scaled so that a difference of two hundred points equates to the higher ranked team having a win probability of 75%. In addition, Elo ratings are generally scaled so that an average team has a rating of 1500.&lt;/p&gt;
&lt;h2 id="predicting-match-results-using-elo"&gt;Predicting Match Results Using Elo&lt;/h2&gt;
&lt;p&gt;Plotting two teams’ Elo distributions together gives a nice way of visualizing their expected performances. Figure 2 shows Team 1 with an Elo rating of 1100 compared with Team 2 with an Elo of 1500. The most likely outcome is that both teams will play to their average ratings and so Team 2 will win overall as they have the higher rating. However, the two teams’ performance distributions overlap, so it is possible for Team 1 to outperform Team 2 and win the match.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/130131-Elo-Comparison.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 2: Comparison of two teams’ Elo performance probabilities&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The more these performance distributions overlap, the greater the chance of the lower rated team winning the match. The actual probability of victory can then be calculated from these two distributions by subtracting one from the other to get the normal difference distribution between the two (Figure 3).&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/130131-ELO-Difference-Distribution.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 3: Probability of Elo differential occurring&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The centre of this new distribution is equal to the difference between the two ratings (1500 – 1100), meaning the most likely outcome is that Team 2 play like a team with an Elo rating of four hundred higher than Team 1. As we move further to the left the difference between the two teams decreases until we reach a negative differential at which point Team 1 actually start to play better than Team 2, albeit with a low probability of occurrence.&lt;/p&gt;
&lt;p&gt;The actual probability of this occurring can be plotted using cumulative frequency to show the overall chance of winning based on the Elo differential (Figure 4). So for our example above, Team 1 with its differential of -400 actually has around a 9% chance of winning the match while Team 2 has a 91% chance of winning.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/130131-Elo-Frequency1.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 4: Probability of winning based on Elo differential&lt;/strong&gt;&lt;/p&gt;
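&lt;p&gt;The calculation can be sketched in Python using the normal difference distribution. The spread of 200 rating points per team is an assumed value for illustration, so the result (roughly 8%) is close to, but not exactly, the figure of around 9% quoted above; the exact number depends on the spread and on the curve used.&lt;/p&gt;

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


def win_probability(rating_a, rating_b, sd=200.0):
    """Chance that team A outperforms team B on the day.

    Each team's performance is normal around its rating with the same
    spread `sd` (an assumed value), so the difference A - B is normal
    with mean rating_a - rating_b and spread sd * sqrt(2).
    """
    diff_mean = rating_a - rating_b
    diff_sd = sd * math.sqrt(2)
    # P(A - B > 0) = 1 - CDF(0)
    return 1 - normal_cdf(0, mean=diff_mean, sd=diff_sd)


p_team1 = win_probability(1100, 1500)
print(round(p_team1, 3), round(1 - p_team1, 3))  # about 0.079 and 0.921
```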
&lt;p&gt;So now we understand the theory behind Elo ratings, my next post will look at how they can be calculated and applied to football teams.&lt;/p&gt;</content><category term="Elo Ratings"></category><category term="Elo Ratings"></category></entry><entry><title>Predicting Football Matches Using Shot Data Part Two</title><link href="2013/01/25/predicting-football-matches-using-shots-on-target/" rel="alternate"></link><published>2013-01-25T19:30:00+00:00</published><updated>2013-01-25T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-01-25:2013/01/25/predicting-football-matches-using-shots-on-target/</id><summary type="html">&lt;p&gt;Having found that the correlation between goals scored and shots on target was the strongest of the various shooting variables I had available to me, I decided to see how well they could predict the outcome of a football match....&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Having found that the correlation between goals scored and shots on target was the strongest of the various shooting variables I had available to me, I decided to see how well they could predict the outcome of a football match.&lt;/p&gt;
&lt;h2 id="creating-the-model"&gt;Creating The Model&lt;/h2&gt;
&lt;p&gt;The obvious approach would have been to just do a linear regression for goals scored against number of shots on target and then predict the average number of goals each team would be expected to score. This doesn’t provide much insight though. The average score line might be of interest if each team was going to play each other 20 or 30 times a season but for a single game it is pretty much irrelevant.&lt;/p&gt;
&lt;p&gt;What is of more use is to predict the actual odds for each possible outcome between the teams. In other words what is the probability of each team winning, drawing or losing?&lt;/p&gt;
&lt;p&gt;To do this I looked at how many shots on target each team achieved and conceded each match compared with the league’s average to estimate how many they would be expected to have against each other. This was then mapped to the distribution of their shots on target over the season so far, and their shot conversion rate was used to calculate the probabilities of the different numbers of goals they could score. Each match was then played one million times as part of a Monte Carlo simulation to see what the likely outcomes were.&lt;/p&gt;
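&lt;p&gt;To give a feel for the simulation step, here is a heavily simplified sketch rather than the actual model. It assumes shots on target are drawn from a normal distribution floored at zero, that every shot on target converts with a fixed probability, and it uses invented inputs and far fewer simulations than one million to keep it quick.&lt;/p&gt;

```python
import random


def simulate_match(home_sot, away_sot, home_conv, away_conv,
                   n_sims=20000, seed=1):
    """Estimate home win, draw and away win probabilities by simulation.

    home_sot / away_sot: expected shots on target for each team
    home_conv / away_conv: chance that a shot on target becomes a goal
    """
    rng = random.Random(seed)

    def goals(expected_sot, conversion):
        # Draw a shot count around the expectation (normal, floored at 0),
        # then convert each shot to a goal with a fixed probability.
        shots = max(0, round(rng.gauss(expected_sot, expected_sot ** 0.5)))
        return sum(1 for _ in range(shots) if conversion > rng.random())

    counts = {"home": 0, "draw": 0, "away": 0}
    for _ in range(n_sims):
        h, a = goals(home_sot, home_conv), goals(away_sot, away_conv)
        if h > a:
            counts["home"] += 1
        elif a > h:
            counts["away"] += 1
        else:
            counts["draw"] += 1
    return {k: v / n_sims for k, v in counts.items()}


# Hypothetical inputs, not taken from real match data
probs = simulate_match(home_sot=6.0, away_sot=4.0, home_conv=0.3, away_conv=0.3)
print(probs)
```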
&lt;h2 id="are-the-predictions-accurate"&gt;Are the Predictions Accurate?&lt;/h2&gt;
&lt;p&gt;One difficulty with a model like this is assessing its accuracy. With a traditional linear model you can just look at the $r^2$ value to see how well your predictions match the actual results. The higher the $r^2$ value, the better your model is.&lt;/p&gt;
&lt;p&gt;But with a probability model this doesn’t work. For example take the situation where the probability model predicts Team A have a 75% chance of beating Team B. Even if the model has calculated these odds perfectly then Team A will still lose 25% of the time, making it look like the prediction was incorrect.&lt;/p&gt;
&lt;p&gt;One alternative is to identify what the most probable outcome for each match was – win, draw or loss – and compare that with what actually happened to see if they match. To do this I applied the model retrospectively to all the matches from the 2011–2012 English Premier League season and overall the proportions of outcomes predicted did match closely what actually happened (Figure 1).&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/130121-SOT-Proportions.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 1: Proportion of outcomes predicted compared with actual results for 2011–2012 English Premier League season&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Another test we can do is to compare the Shot on Target model with other models to see how well it performs. Again I picked the most probable outcome from my odds and this time compared it with those from Bet365 for the entire 2011–2012 English Premier League season. I also guessed the outcome of each match at random to see how the model compared with pure luck.&lt;/p&gt;
&lt;h2 id="prediction-results"&gt;Prediction Results&lt;/h2&gt;
&lt;p&gt;Overall, the Shot on Target model’s most probable outcome correctly matched what actually happened for 43% of the matches tested compared with 52% for Bet365 and 33% from randomly guessing.&lt;/p&gt;
&lt;p&gt;Interestingly, even the bookies only managed to get the odds correct for around half the matches so the Shot on Target model is doing pretty well at 43% and isn’t that far behind the professional odds compilers. Also, this is only the first stage of the model, there are still plenty of ways it can be tweaked to try and improve its accuracy further.&lt;/p&gt;</content><category term="Shots"></category><category term="Shots"></category></entry><entry><title>What Is The Chance of Bradford City reaching Wembley?</title><link href="2013/01/22/what-is-the-chance-of-bradford-city-reaching-wembley/" rel="alternate"></link><published>2013-01-22T19:30:00+00:00</published><updated>2013-01-22T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-01-22:2013/01/22/what-is-the-chance-of-bradford-city-reaching-wembley/</id><summary type="html">&lt;p&gt;With League Two’s Bradford City only one match away from playing at Wembley in the League Cup final I thought it would be interesting to see what the chances were of them getting this far...&lt;/p&gt;</summary><content type="html">&lt;p&gt;With League Two’s Bradford City only one match away from playing at Wembley in the League Cup final I thought it would be interesting to see what the chances were of them getting this far.&lt;/p&gt;
&lt;p&gt;It has been an unbelievable cup run for Bradford City as they have had to play against teams in higher divisions nearly every step of the way. The first round of the cup pitted them against League One team Notts County who they managed to defeat in extra time. This was then followed a couple of weeks later with a 2–1 victory away at Championship side Watford.&lt;/p&gt;
&lt;p&gt;The third round was somewhat kinder to them as they played at home against Burton Albion, a team from their own division. However, having seen off Burton the fourth round then took them to Premier League side Wigan Athletic who they managed to defeat on penalties.&lt;/p&gt;
&lt;p&gt;The quarter final again put Bradford against another Premier League team, with Arsenal this time Bradford’s victims. Next up was Aston Villa, who lost 3–1 to Bradford at Valley Parade, although Villa did come away with an away goal, which could prove to be critical for them.&lt;/p&gt;
&lt;p&gt;To work out the probability of Bradford’s cup run I collected the odds for each match from www.oddsportal.com and removed the overround to get the true odds. The overround is the bookies’ profit margin, created by offering odds lower than the actual true odds of the event occurring. To remove it we just need to scale the probabilities implied by the odds so that they add up to exactly 100%.&lt;/p&gt;
&lt;p&gt;Once we have the true odds we can then work out the cumulative probability of the cup run by multiplying the probabilities together (note that bookies’ odds generally refer to what happens over the first ninety minutes of the match, so Bradford City beating Notts County in extra time is actually classed as a draw rather than an away win).&lt;/p&gt;
&lt;p&gt;$P(\text{cup run}) = P(\text{draw with Notts County}) \times P(\text{beating Watford}) \times \ldots$&lt;/p&gt;
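&lt;p&gt;Removing the overround and chaining the results together might look something like the sketch below. The decimal odds are invented for illustration and are not the real prices for Bradford’s matches; &lt;code&gt;remove_overround&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
def remove_overround(decimal_odds):
    """Convert bookmaker decimal odds into probabilities that sum to 1."""
    implied = [1 / o for o in decimal_odds]
    total = sum(implied)  # greater than 1 by the size of the overround
    return [p / total for p in implied]


# Hypothetical home/draw/away odds for two matches, not the real prices
match_1 = remove_overround([2.10, 3.30, 3.60])
match_2 = remove_overround([1.50, 4.00, 7.00])

# Chance of a draw in match 1 followed by an away win in match 2
p_run = match_1[1] * match_2[2]

print(round(sum(match_1), 6), round(p_run, 4))
```

&lt;p&gt;Multiplying the probabilities like this treats each match as independent, which is the same assumption made in the equation above.&lt;/p&gt;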
&lt;p&gt;Overall, the probability of Bradford City’s current cup run so far is 0.008%. If we take into account tonight’s match then the chances of Bradford’s cup run taking them all the way to Wembley is around 0.001% or 1 in 100,000. It’s not quite a lottery win, which is around 100 times less likely again, but it is a fantastic achievement for Bradford City and is likely to be a once in a lifetime experience for their fans.&lt;/p&gt;</content><category term="Chance"></category><category term="Chance"></category></entry><entry><title>Predicting Football Matches Using Shot Data</title><link href="2013/01/21/predicting-football-matches-using-shot-data/" rel="alternate"></link><published>2013-01-21T19:30:00+00:00</published><updated>2013-01-21T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-01-21:2013/01/21/predicting-football-matches-using-shot-data/</id><summary type="html">&lt;p&gt;Previously on this blog I have discussed my attempts at using the Poisson distribution to predict the number of goals scored in football matches. So far, the results have been...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Previously on this blog I have discussed my attempts at using the Poisson distribution to predict the number of goals scored in football matches. So far, the results have been disappointing as the mathematical model I constructed under-predicted the number of draws that occurred. This is something I intend to go back and address at some point by adding in the Dixon and Coles adjustment but in the meantime I thought I would try predicting the outcome of matches using shots instead.&lt;/p&gt;
&lt;h2 id="shot-data"&gt;Shot Data&lt;/h2&gt;
&lt;p&gt;There were a number of reasons for working with shots instead of using goals directly. First of all, shots and goals are inherently linked together. For every goal scored there has to be a shot taken. Secondly, not every shot taken leads to a goal, giving us a much larger data set to work with compared with just goals alone. Thirdly, the number of shots taken in a match is pretty much normally distributed (Figure 1) whereas the number of goals scored is closer to a Poisson distribution. This is useful as many statistical tests rely on a normal distribution of data.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/130121-Total-Shot-Histogram.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 1: Frequency of total shots in English Premier League matches 2009–2012&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The first stage of developing the model was to determine what variables to use for it. Looking at data over a whole season showed a decent correlation between goals scored and total shots taken ($r^2$=0.62), shots on target ($r^2$=0.76), shots blocked ($r^2$=0.59) and shots wide ($r^2$=0.32; Figure 2).&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/130121-Shot-Correlations.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 2: Correlation between goals scored and various shooting parameters for the 2011–2012 English Premier League Season&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="variability"&gt;Variability&lt;/h2&gt;
&lt;p&gt;Unfortunately when you start looking at the data match-by-match the correlations become much weaker. Over the course of an entire season a lot of the variability in the data starts to even out but over a single match it is not the case and variables such as luck can play a much bigger role. For example it is likely that the teams with the most shots on target will score the most goals overall per season as skill would start to dominate over luck. However, this isn’t always the case for an individual game – we have all seen matches where a team has scored a lucky goal and then managed to hold on for the win even though the opposition has showered their goal with shots for ninety minutes.&lt;/p&gt;
&lt;p&gt;Because of this I decided to exclude many of the variables as they have little value over a single match. Instead, I focussed on using just shots on target data as this had the highest correlation with goals match-by-match. As with the total number of shots taken, the data is also roughly normally distributed although it is skewed towards zero (Figure 3), as obviously no matter how bad a team is it cannot achieve fewer than zero shots on target in a match (although Blackburn Rovers came close, going an entire match against Tottenham Hotspur in 2012 without taking a single shot, let alone getting one on target!).&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/130121-SOT-Histogram.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 3: Frequency of Shots on Target in English Premier League matches 2011–2012&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In my next post I will explain more about how the Shot on Target model works and discuss its accuracy.&lt;/p&gt;</content><category term="Shots"></category><category term="Shots"></category></entry><entry><title>Predicting The Premier League Using The Refined Pythagorean Equation</title><link href="2013/01/18/predicting-the-premier-league-betting-expert/" rel="alternate"></link><published>2013-01-18T19:30:00+00:00</published><updated>2013-01-18T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-01-18:2013/01/18/predicting-the-premier-league-betting-expert/</id><summary type="html">&lt;p&gt;New article for Betting Expert looking at the current Premier League standings compared with the predictions from my refined version of the Pythagorean Expectation.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;New article for &lt;a href="http://www.bettingexpert.com/blog/pythagorean-expectation-and-football"&gt;Betting Expert&lt;/a&gt; looking at the current Premier League standings compared with the predictions from my refined version of the Pythagorean Expectation.&lt;/p&gt;</content><category term="Pythagorean"></category><category term="Pythagorean"></category></entry><entry><title>How Early In The Season Can Pythagorean Predictions Be Made?</title><link href="2013/01/02/how-early-in-the-season-can-pythagorean-predictions-be-made/" rel="alternate"></link><published>2013-01-02T19:30:00+00:00</published><updated>2013-01-02T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2013-01-02:2013/01/02/how-early-in-the-season-can-pythagorean-predictions-be-made/</id><summary type="html">&lt;p&gt;The next stage for developing my refined version of the Pythagorean equation is to characterise how many weeks of data it actually needs to make accurate football predictions...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;The next stage for developing my refined version of the Pythagorean equation (known as the MPE) is to characterise how much data it actually needs to make accurate football predictions.&lt;/p&gt;
&lt;h2 id="methodology"&gt;Methodology&lt;/h2&gt;
&lt;p&gt;To investigate this I selected Manchester City, Swansea City and Wolverhampton Wanderers from the English Premier League’s 2011–2012 season. The reason for choosing these teams was that they represented the top, middle and bottom of the league so I could test the MPE equation across teams of varying quality and league position.&lt;/p&gt;
&lt;p&gt;I then used the MPE equation to predict the total points at the end of the season for each team week-by-week to see how the prediction changed throughout the year. Figure 1 shows the difference between the predicted points and the actual points achieved at the end of the season for each of the three teams.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/130102_pythag_by_week.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 1: Difference Between Predicted and Actual Points&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="results"&gt;Results&lt;/h2&gt;
&lt;p&gt;The prediction settled down very quickly for Manchester City and from match three onwards the root mean square error (RMSE) was just 1.96 points. This means that after just three games the MPE equation was predicting how many points Manchester City would have at the end of the season to within about two points.&lt;/p&gt;
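&lt;p&gt;For anyone unfamiliar with the RMSE, it can be calculated as in the short sketch below. The weekly predictions and the final total are made-up numbers for illustration, not the values behind Figure 1.&lt;/p&gt;

```python
import math


def rmse(predicted, actual):
    """Root mean square error between predictions and a single actual value."""
    squared = [(p - actual) ** 2 for p in predicted]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical weekly end-of-season predictions against a final total of 89
weekly_predictions = [91, 88, 87, 90, 89, 92]
print(round(rmse(weekly_predictions, actual=89), 2))  # 1.78
```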
&lt;p&gt;For Swansea City the prediction was slightly more problematic as they didn’t score during their first four matches and the MPE equation needs goals to have been scored before a valid prediction can be made. Swansea City finally scored in their fifth match in a 3–0 victory over West Bromwich Albion and from then on the prediction steadily improved and was within three points of their actual total after their next six matches.&lt;/p&gt;
&lt;p&gt;Wolverhampton Wanderers’ season was an interesting one to predict as they had a very misleading start, with two wins and a draw in their first three matches giving a predicted points total of 83. At this point, though, it all went disastrously wrong for them and they lost their next five matches in a row, by which time their predicted points had dropped all the way down to 30. Wolverhampton Wanderers eventually finished bottom of the league with 25 points.&lt;/p&gt;
&lt;p&gt;Overall, the MPE equation appears to give stable results and the only real requirement is that goals have been scored. Based on the data in Figure 1 accurate predictions can be made early in the season as there is very little change in the predictions from week ten of the season onwards.&lt;/p&gt;</content><category term="Pythagorean"></category><category term="Pythagorean"></category></entry><entry><title>What Has Caused Dimitar Berbatov’s Recent Lack of Goals?</title><link href="2012/12/15/what-has-caused-dimitar-berbatovs-recent-lack-of-goals/" rel="alternate"></link><published>2012-12-15T19:30:00+00:00</published><updated>2012-12-15T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2012-12-15:2012/12/15/what-has-caused-dimitar-berbatovs-recent-lack-of-goals/</id><summary type="html">&lt;p&gt;Up until week 12 of the season, Dimitar Berbatov was one of the English Premier League’s top goal scorers and goal creators. However, since then he has gone 450 minutes without registering either a goal or an assist, coinciding with Bryan Ruiz’s injury...&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Up until week 12 of the season, Dimitar Berbatov was one of the English Premier League’s top goal scorers and goal creators. However, since then he has gone 450 minutes without registering either a goal or an assist, coinciding with Bryan Ruiz’s injury. Check out my &lt;a href="http://www.bettingexpert.com/blog/what-has-caused-berbatovs-goal-drought"&gt;guest article&lt;/a&gt; in which I analyse the effect the absence of Ruiz has had on Berbatov’s performances.&lt;/p&gt;</content><category term="Misc"></category></entry><entry><title>Using the Pythagorean Expectation Across Leagues Worldwide</title><link href="2012/12/10/using-the-pythagorean-expectation-across-leagues-wordwide/" rel="alternate"></link><published>2012-12-10T19:30:00+00:00</published><updated>2012-12-10T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2012-12-10:2012/12/10/using-the-pythagorean-expectation-across-leagues-wordwide/</id><summary type="html">&lt;p&gt;The next stage for my Pythagorean's development is to test whether it can be applied to leagues outside the EPL. Having one Pythagorean equation that could be used globally is preferable to having to create specific equations for each league.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;I showed in my &lt;a href="http://pena.lt/y/2012/12/03/applying-the-pythagorean-expectation-to-football-part-two/"&gt;last post&lt;/a&gt; that my initial version of the Pythagorean Expectation (MPE) predicted total points for the English Premier League (EPL) pretty well, with an RMSE of approximately four points over the course of a whole season (see &lt;a href="http://en.wikipedia.org/wiki/Root-mean-square_deviation"&gt;here&lt;/a&gt; for an explanation of using RMSE to measure the error of the predictions). The next stage for the equation’s development is to see whether it can be applied to other leagues too. Having one MPE equation that could be used globally across leagues is preferable to having to create specific equations for each league.&lt;/p&gt;
&lt;h2 id="the-eredivisie"&gt;The Eredivisie&lt;/h2&gt;
&lt;p&gt;At the recommendation of &lt;a href="http://scoreboardjournalism.wordpress.com/"&gt;Scoreboard Journalism's&lt;/a&gt; Simon Gleave I started with the Eredivisie, the top-flight division in Holland. The reason for choosing the Eredivisie is that it is a unique league, with high rates of goal scoring and a number of results in recent years that appear as potential outliers. For example, in the 2009–2010 season Ajax scored 43 goals more than Twente and conceded three fewer yet still finished second to them in the league. At the other end of the table Willem II finished 15th in 2007–2008 with a goal difference of -9 while the two teams immediately above them had goal differences of -30 and -24, respectively. These sorts of results make the Eredivisie difficult to predict and so provide a good stress test for the MPE equation.&lt;/p&gt;
&lt;p&gt;Applying the MPE to the final Eredivisie standings from 1999–2000 to 2011–2012 worked surprisingly well, with an overall RMSE of 4.35 points. It is slightly higher than the 4.08 previously obtained for the EPL but this is perhaps to be expected since the original MPE equation was generated using just data from the EPL.&lt;/p&gt;
&lt;p&gt;To see whether the Dutch league needed its own version of MPE I recreated the equation based on just Eredivisie data and the overall error dropped to 4.21, a decrease of around 3%. Such a minor improvement suggests that the equation may be stable across leagues and so we will not need league-specific versions.&lt;/p&gt;
&lt;p&gt;To test this hypothesis further I collected 223 league tables from around the world and optimised the MPE against this larger data set. The reason for this was three-fold. Firstly, the original equation I published was created just from EPL data so any peculiarities to the EPL could bias results for other leagues.&lt;/p&gt;
&lt;p&gt;Secondly, the previous data set was smaller so any outliers in the data could have a large effect on the finalised results. By using a larger data set the influence of any outliers will be minimised.&lt;/p&gt;
&lt;p&gt;Thirdly, and perhaps most importantly, this gave enough data to &lt;a href="http://en.wikipedia.org/wiki/Cross-validation_(statistics)"&gt;cross-validate&lt;/a&gt; the equation by randomly splitting the league tables up into training and validation sets. Initially, the MPE had been trained and tested using the same data. Now it has been tested on different data from those it was optimised against, reducing the risk of &lt;a href="http://en.wikipedia.org/wiki/Testing_hypotheses_suggested_by_the_data"&gt;Type III errors&lt;/a&gt; occurring.&lt;/p&gt;
&lt;p&gt;Figure 1 shows the RMSE for the predictions for fifteen leagues randomly selected as a validation set. The overall RMSE across the entire validation set is 3.88 points and is plotted as the vertical dotted line. The overall RMSE is now reduced to below four points and this new version of the MPE equation appears suitable for use globally across different leagues.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/121210_MPE_Training_Results.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 1: Results For Validation of MPE Equation&lt;/strong&gt;&lt;/p&gt;
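&lt;p&gt;The random split into training and validation sets described above can be sketched in a few lines of Python. This is only an illustration: the helper name &lt;code&gt;split_tables&lt;/code&gt;, the integer placeholder IDs and the fixed seed are all assumptions, and I am treating the validation set as 15 of the 223 league tables.&lt;/p&gt;

```python
import random

def split_tables(table_ids, n_validation, seed=0):
    """Randomly hold out n_validation tables; the rest are used for fitting."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    validation = set(rng.sample(table_ids, n_validation))
    training = [t for t in table_ids if t not in validation]
    return training, sorted(validation)

# Placeholder IDs standing in for the 223 league tables (hypothetical).
table_ids = list(range(223))
training, validation = split_tables(table_ids, n_validation=15)
print(len(training), len(validation))  # 208 15
```

&lt;p&gt;The equation is then optimised on the training tables only, and the RMSE is reported on the held-out tables that played no part in the fit.&lt;/p&gt;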
&lt;p&gt;The finalised MPE equation is shown in Figure 2. Based on the data shown here this new version of the MPE equation is suitable for use across multiple leagues worldwide, with an average error of less than 4 points per season.&lt;/p&gt;
&lt;p&gt;$\text{predicted points} = \frac{\text{goals for}^{1.2299}}{\text{goals for}^{1.16793} + \text{goals against}^{1.20053}} \times 2.29761 \times \text{number of games played}$&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 2: MPE Equation&lt;/strong&gt;&lt;/p&gt;</content><category term="Pythagorean"></category><category term="Pythagorean"></category></entry><entry><title>Applying the Pythagorean Expectation to Football: Part Two</title><link href="2012/12/03/applying-the-pythagorean-expectation-to-football-part-two/" rel="alternate"></link><published>2012-12-03T19:30:00+00:00</published><updated>2012-12-03T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2012-12-03:2012/12/03/applying-the-pythagorean-expectation-to-football-part-two/</id><summary type="html">&lt;p&gt;Part two of guide to applying the baseball pythagorean to football.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In my &lt;a href="http://pena.lt/y/2012/11/26/applying-the-pythagorean-expectation-to-football-part-one/"&gt;previous article&lt;/a&gt;, I discussed how to apply the baseball Pythagorean expectation to football and how to measure the error of the predictions using &lt;a href="http://en.wikipedia.org/wiki/Root-mean-square_deviation"&gt;RMSE&lt;/a&gt;. This second article will demonstrate how to optimize the equation further to improve its accuracy.&lt;/p&gt;
&lt;h2 id="accuracy"&gt;Accuracy&lt;/h2&gt;
&lt;p&gt;One of the major reasons for the error in the predictions is the occurrence of draws in football. The Pythagorean expectation only looks at wins and losses and presumes that if a team scores zero goals then it will achieve zero points. This is of course incorrect: it is perfectly feasible for a team to fail to score but still gain a point through a nil-nil draw, so we need to take this into account.&lt;/p&gt;
&lt;p&gt;Howard Hamilton of &lt;a href="http://www.soccermetrics.net/"&gt;Soccermetrics&lt;/a&gt; has published an updated &lt;a href="http://hhamilton.typepad.com/files/pythag_mit_sa_2010.pdf"&gt;Soccer Pythagorean&lt;/a&gt; equation that does just that, and it does a good job of it. For the 2011–2012 season, Howard Hamilton reports an RMSE of 3.81 compared with the RMSE of 5.65 I reported for my previous version of the Pythagorean equation. The downside to Howard Hamilton’s equation though is that it is rather complicated. While the original Pythagorean equation is simple enough to be used by any football fan, Howard Hamilton’s equation requires a decent understanding of mathematics to use it.&lt;/p&gt;
&lt;p&gt;Because of that, I thought I would tweak the original Pythagorean formula a bit further to try and improve its accuracy without adding too much extra complexity to it. One easy way to do this is to scale the points scored per match to take into account the occurrence of draws. Applying least squares to this reduces the RMSE for the 2011–2012 season to 4.04 points, just 6% higher than Howard Hamilton’s equation. This is based on only one season’s data though, so to get a true idea of how well my enhanced Pythagorean expectation works (abbreviated to MPE), I optimized the equation based on a much larger data set and applied it to the last 10 English Premier League (EPL) seasons (Figure 1).&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/121203_pthagorean_seasons.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 1: MPE Prediction by Season in the EPL&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The MPE works well, with an average residual (the difference between predicted points and actual points) of 4.08 points. This compares nicely with Howard Hamilton’s published value of 3.81 and is less than half of the error the original Pythagorean Expectation equation gave. It is also worth noting that Howard Hamilton’s RMSE of 3.81 is for just one season, and of the ten seasons analysed here using the MPE, two actually have an RMSE lower than 3.81.&lt;/p&gt;
&lt;p&gt;Plotting the MPE predicted points versus actual points for the last ten EPL seasons shows visually how well the MPE equation works (Figure 2). The correlation between the predicted and actual points scored is excellent, with an $r^2$ value of 0.938.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/121203_mpe_scatter.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 2: MPE Predicted Points Versus Actual Points in the EPL&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;So based on the initial work so far I am pleased that the MPE version of the Pythagorean expectation gives results comparable to Howard Hamilton’s more detailed and advanced derivation but without quite as much added complexity. The final equation for anybody who wants to give it a try is shown below in Figure 3.&lt;/p&gt;
&lt;p&gt;$\text{predicted points} = \frac{\text{goals for}^{1.22777}}{\text{goals for}^{1.072388} + \text{goals against}^{1.127248}} \times 2.499973 \times \text{number of games played}$&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 3: MPE Equation&lt;/strong&gt;&lt;/p&gt;</content><category term="Pythagorean"></category><category term="Pythagorean"></category></entry><entry><title>Applying the Pythagorean Expectation to Football: Part One</title><link href="2012/11/26/applying-the-pythagorean-expectation-to-football-part-one/" rel="alternate"></link><published>2012-11-26T19:30:00+00:00</published><updated>2012-11-26T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2012-11-26:2012/11/26/applying-the-pythagorean-expectation-to-football-part-one/</id><summary type="html">&lt;p&gt;Introduction to applying the baseball pythagorean to football.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;The &lt;a href="http://en.wikipedia.org/wiki/Pythagorean_expectation"&gt;Baseball Pythagorean Expectation&lt;/a&gt; is a formula originally derived by Bill James to estimate how many games a baseball team could be expected to win over a season based on the number of runs they score and concede (Figure 1). Teams winning fewer games than their Pythagorean prediction are considered to have been unlucky while those outperforming the prediction are thought to have had luck on their side.&lt;/p&gt;
&lt;p&gt;$\text{win fraction} = \frac{\text{runs scored}^2}{\text{runs scored}^2 + \text{runs allowed}^2}$&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 1: The Baseball Pythagorean Expectation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The formula works well for baseball, giving predictions generally within three games of what actually happens. The Pythagorean expectation has also been applied successfully to other sports, including American football and basketball. However, so far the equation has not worked particularly well for predicting football matches.&lt;/p&gt;
&lt;p&gt;Table 1 shows goals scored and conceded in the English Premier League (EPL) during the 2011–2012 season, along with the actual points and Pythagorean predicted points. Looking at the difference between predicted and actual points it is clear that the Pythagorean expectation is over-predicting at the top of the table and under-predicting at the bottom.&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Team&lt;/td&gt;
&lt;td&gt;GF&lt;/td&gt;
&lt;td&gt;GA&lt;/td&gt;
&lt;td&gt;Pts&lt;/td&gt;
&lt;td&gt;Pythag Pts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manchester City&lt;/td&gt;
&lt;td&gt;93&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;89&lt;/td&gt;
&lt;td&gt;104&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manchester United&lt;/td&gt;
&lt;td&gt;89&lt;/td&gt;
&lt;td&gt;33&lt;/td&gt;
&lt;td&gt;89&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arsenal&lt;/td&gt;
&lt;td&gt;74&lt;/td&gt;
&lt;td&gt;49&lt;/td&gt;
&lt;td&gt;70&lt;/td&gt;
&lt;td&gt;79&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tottenham Hotspur&lt;/td&gt;
&lt;td&gt;66&lt;/td&gt;
&lt;td&gt;41&lt;/td&gt;
&lt;td&gt;69&lt;/td&gt;
&lt;td&gt;82&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Newcastle United&lt;/td&gt;
&lt;td&gt;56&lt;/td&gt;
&lt;td&gt;51&lt;/td&gt;
&lt;td&gt;65&lt;/td&gt;
&lt;td&gt;62&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chelsea&lt;/td&gt;
&lt;td&gt;65&lt;/td&gt;
&lt;td&gt;46&lt;/td&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;td&gt;76&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Everton&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;td&gt;56&lt;/td&gt;
&lt;td&gt;70&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Liverpool&lt;/td&gt;
&lt;td&gt;47&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;td&gt;52&lt;/td&gt;
&lt;td&gt;66&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fulham&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;td&gt;51&lt;/td&gt;
&lt;td&gt;52&lt;/td&gt;
&lt;td&gt;54&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;West Bromwich Albion&lt;/td&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;td&gt;52&lt;/td&gt;
&lt;td&gt;47&lt;/td&gt;
&lt;td&gt;49&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Swansea City&lt;/td&gt;
&lt;td&gt;44&lt;/td&gt;
&lt;td&gt;51&lt;/td&gt;
&lt;td&gt;47&lt;/td&gt;
&lt;td&gt;49&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Norwich City&lt;/td&gt;
&lt;td&gt;52&lt;/td&gt;
&lt;td&gt;66&lt;/td&gt;
&lt;td&gt;47&lt;/td&gt;
&lt;td&gt;44&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sunderland&lt;/td&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;td&gt;46&lt;/td&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;td&gt;56&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stoke City&lt;/td&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;td&gt;53&lt;/td&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wigan Athletic&lt;/td&gt;
&lt;td&gt;42&lt;/td&gt;
&lt;td&gt;62&lt;/td&gt;
&lt;td&gt;43&lt;/td&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aston Villa&lt;/td&gt;
&lt;td&gt;37&lt;/td&gt;
&lt;td&gt;53&lt;/td&gt;
&lt;td&gt;38&lt;/td&gt;
&lt;td&gt;37&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Queens Park Rangers&lt;/td&gt;
&lt;td&gt;43&lt;/td&gt;
&lt;td&gt;66&lt;/td&gt;
&lt;td&gt;37&lt;/td&gt;
&lt;td&gt;34&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bolton Wanderers&lt;/td&gt;
&lt;td&gt;46&lt;/td&gt;
&lt;td&gt;77&lt;/td&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Blackburn Rovers&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;td&gt;78&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wolverhampton Wanderers&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;td&gt;82&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;RMSE&lt;/td&gt;
&lt;td&gt;8.4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Table 1: Pythagorean Expectation for the EPL 2011–2012&lt;/strong&gt;&lt;/p&gt;
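&lt;p&gt;As a rough illustration, the Pythag Pts column can be approximated with a few lines of Python. This is only a sketch: the function name &lt;code&gt;pythagorean_points&lt;/code&gt; and the way the win fraction is converted into points (three points per match over a 38-game season, with no allowance for draws) are assumptions made for the example.&lt;/p&gt;

```python
def pythagorean_points(goals_for, goals_against, games, exponent=2.0, points_per_win=3):
    """Convert a Pythagorean win fraction into a season points total."""
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    win_fraction = gf / (gf + ga)
    # Assumption: every match is worth three points and draws are ignored.
    return win_fraction * points_per_win * games

# (team, goals for, goals against, actual points) taken from Table 1
rows = [
    ("Manchester City", 93, 29, 89),
    ("Arsenal", 74, 49, 70),
    ("Wolverhampton Wanderers", 40, 82, 25),
]
for team, gf, ga, pts in rows:
    predicted = pythagorean_points(gf, ga, games=38)
    print(f"{team}: predicted {predicted:.0f}, actual {pts}")
```

&lt;p&gt;For these three rows the rounded predictions come out at 104, 79 and 22 points, in line with the values in Table 1.&lt;/p&gt;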
&lt;p&gt;We can quantify this error by calculating the root-mean-square error (RMSE). This technique basically squares the difference between the predicted and actual points and then takes the square root of the average. It sounds complicated but all the squares and square roots do is make all the numbers positive. Imagine if we predicted just two values and were -10 points out for the first and +10 points out on the second. If we just averaged these two numbers then the average error would be zero, making it look like our prediction was perfect when it obviously was not. Instead, if we square the numbers first and then take the square root of the average we get the correct error of ten points. Doing this calculation for Table 1 gives us an RMSE of 8.4 points, meaning that on average the Pythagorean expectation was around eight points out for the 2011–2012 season.&lt;/p&gt;
&lt;p&gt;The more accurate the predictions, the lower the RMSE will be. One way to improve the prediction is to alter the exponent used in the equation. In other words, instead of raising goals scored and conceded to the power of two we use different values. Figure 2 shows what happens to the RMSE as the exponent is changed from 0.1 to 3. Looking at the chart, the RMSE is lowest using an exponent of 1.35, giving an average error of 5.75, nearly three points lower than before.&lt;/p&gt;
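&lt;p&gt;The two-prediction example above is easy to check in Python. This is a minimal sketch; the function name &lt;code&gt;rmse&lt;/code&gt; is just a label chosen for the illustration.&lt;/p&gt;

```python
import math

def rmse(errors):
    """Root-mean-square error: square, average, then take the square root."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

errors = [-10, 10]  # one prediction ten points too low, one ten points too high
mean_error = sum(errors) / len(errors)
print(mean_error)    # 0.0, which wrongly looks like a perfect prediction
print(rmse(errors))  # 10.0, the size of the typical miss
```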
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/121126_pthagorean_rmse.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 2: Effect of Altering Exponent on RMSE&lt;/strong&gt;&lt;/p&gt;
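&lt;p&gt;A sweep like the one in Figure 2 could be sketched as follows, using the goals and points from Table 1. The step size of 0.05, the helper name &lt;code&gt;season_rmse&lt;/code&gt; and the three-points-per-match scaling are assumptions for the example, so the exact minimum it reports may differ a little from the values quoted above.&lt;/p&gt;

```python
import math

# (goals for, goals against, actual points) for each team in Table 1
TABLE_1 = [
    (93, 29, 89), (89, 33, 89), (74, 49, 70), (66, 41, 69), (56, 51, 65),
    (65, 46, 64), (50, 40, 56), (47, 40, 52), (48, 51, 52), (45, 52, 47),
    (44, 51, 47), (52, 66, 47), (45, 46, 45), (36, 53, 45), (42, 62, 43),
    (37, 53, 38), (43, 66, 37), (46, 77, 36), (48, 78, 31), (40, 82, 25),
]

def season_rmse(exponent, games=38):
    """RMSE of Pythagorean points against actual points for one exponent."""
    squared = []
    for gf, ga, pts in TABLE_1:
        # Assumption: win fraction scaled by three points per match.
        predicted = gf ** exponent / (gf ** exponent + ga ** exponent) * 3 * games
        squared.append((predicted - pts) ** 2)
    return math.sqrt(sum(squared) / len(squared))

# Candidate exponents 0.10, 0.15, ..., 3.00
candidates = [round(0.1 + 0.05 * i, 2) for i in range(59)]
best = min(candidates, key=season_rmse)
print(best, round(season_rmse(best), 2))
```

&lt;p&gt;A simple grid search is enough here because there is only one parameter to tune; once each part of the equation gets its own exponent, a proper optimiser becomes the more practical choice.&lt;/p&gt;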
&lt;p&gt;The next logical step to improve the prediction further is to try using a different exponent for each part of the equation. This makes the formula harder to optimize but by applying a technique called least squares to it we come up with optimal exponents of 1.39, 1.43 and 0.98. Unfortunately this has little effect on the RMSE, reducing it by just 0.1 to 5.65 points.&lt;/p&gt;
&lt;p&gt;So far the predictions are still nearly six points out but in part two of this article I will discuss why the error is high and show how to improve it further to increase the accuracy of the predictions.&lt;/p&gt;</content><category term="Pythagorean"></category><category term="Pythagorean"></category></entry><entry><title>Disparity in European Football Leagues</title><link href="2012/11/20/disparity-in-european-football-leagues/" rel="alternate"></link><published>2012-11-20T19:30:00+00:00</published><updated>2012-11-20T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2012-11-20:2012/11/20/disparity-in-european-football-leagues/</id><summary type="html">&lt;p&gt;Having mentioned the effect disparity plays on determining the league champions in previous posts I thought it would be interesting to look at the actual levels of disparity currently present in football.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Having mentioned the effect disparity plays on determining the league champions in previous posts I thought it would be interesting to look at the actual levels of disparity currently present in football.&lt;/p&gt;
&lt;h2 id="english-premier-league"&gt;English Premier League&lt;/h2&gt;
&lt;p&gt;I started off looking at the English Premier League (EPL) over the past decade and plotted the points achieved each season as a Tukey Box-and-Whiskers plot (Figure 1). Looking at Figure 1, the spread of points across the league each season is broadly consistent. There have been a few years where individual teams have done particularly well, such as Chelsea in 2004–2005, or particularly badly, such as Derby County in 2007–2008, but there are no obvious changes over time.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/121120_epl_disparity_box_whiskers.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 1: Points Scored in EPL Per Season&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;One noticeable feature, however, is that the median value for every season (the thick black line in the middle of each box) is lower than the overall average (plotted as the horizontal dotted line), suggesting the data is skewed. Looking at the 2010–2011 season as an example, half the teams scored fewer than 47 points while half scored 47 or more. In comparison, the average points scored that season was 51.5. This means that an average mid-table EPL team is closer to relegation than it is to winning the league. To put it into perspective, West Ham finished bottom that season scoring just 18.5 points fewer than the average while Manchester United won the league with 28.5 points more than the average.&lt;/p&gt;
&lt;h2 id="other-european-leagues"&gt;Other European Leagues&lt;/h2&gt;
&lt;p&gt;A similar pattern can be seen across all the major leagues in Europe (Figure 2) where the median points achieved was also lower than the average. The median points for the Bundesliga and Eredivisie were furthest from the average but it is worth bearing in mind that these two leagues play fewer matches than the EPL, La Liga and Ligue 1 so this is perhaps to be expected.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/121120_euro_disparity_box_whiskers.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 2: Points Scored in 2010-2011&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;La Liga and Ligue 1 both show two teams that are classified as statistical outliers. It is no surprise that the two outliers in La Liga are Real Madrid and Barcelona who both finished more than twenty points ahead of the rest of the league. In the case of Ligue 1, the champions Lille and relegated Arles-Avignon are both classed as outliers. A major reason for this is how close the middle of Ligue 1 finished that season – Monaco were relegated with 44 points, only seven points fewer than Bordeaux who finished in seventh place.&lt;/p&gt;
&lt;p&gt;Since leagues play different numbers of matches it is difficult to compare them directly so I also looked at the difference in points scored per match by the top team and the middle team, and the middle team and bottom team (Table 1). The results show that La Liga was the least competitive of the leagues, with the champions scoring 1.737 points more per match than the bottom team. The EPL came out as the most competitive league, with the lowest difference between the top and bottom teams. However, Ligue 1 appears the most balanced, with the smallest difference between the top and bottom of the league compared with the middle. Interestingly, the Eredivisie appears unbalanced in the opposite way to most other leagues, with the bottom team further away from mid-table than the champions are from mid-table.&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;League&lt;/td&gt;
&lt;td&gt;Top/Middle&lt;/td&gt;
&lt;td&gt;Middle/Bottom&lt;/td&gt;
&lt;td&gt;Total&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EPL&lt;/td&gt;
&lt;td&gt;0.868&lt;/td&gt;
&lt;td&gt;0.368&lt;/td&gt;
&lt;td&gt;1.237&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ligue 1&lt;/td&gt;
&lt;td&gt;0.711&lt;/td&gt;
&lt;td&gt;0.763&lt;/td&gt;
&lt;td&gt;1.474&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;La Liga&lt;/td&gt;
&lt;td&gt;1.289&lt;/td&gt;
&lt;td&gt;0.447&lt;/td&gt;
&lt;td&gt;1.737&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bundesliga&lt;/td&gt;
&lt;td&gt;0.912&lt;/td&gt;
&lt;td&gt;0.441&lt;/td&gt;
&lt;td&gt;1.353&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Eredivisie&lt;/td&gt;
&lt;td&gt;0.765&lt;/td&gt;
&lt;td&gt;0.941&lt;/td&gt;
&lt;td&gt;1.706&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;</content><category term="Chance"></category><category term="Chance"></category></entry><entry><title>Analysis of André Villas-Boas Vs Harry Redknapp</title><link href="2012/11/18/analysis-of-andre-villas-boas-vs-harry-redknapp/" rel="alternate"></link><published>2012-11-18T19:30:00+00:00</published><updated>2012-11-18T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2012-11-18:2012/11/18/analysis-of-andre-villas-boas-vs-harry-redknapp/</id><summary type="html">&lt;p&gt;Since taking over as manager of Tottenham Hotspur, André Villas-Boas has been trapped in former Spurs manager Harry Redknapp’s shadow. Every tactical decision or team selection Villas-Boas makes is seemingly compared with Redknapp’s previous achievements. And after Tottenham’s apparent slow start to the season, Villas-Boas has come under heavy criticism from the media whose narrative seems to be that Tottenham are performing poorly. But is this criticism fair and are Tottenham really performing any worse than last season under Harry Redknapp?&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Since taking over as manager of Tottenham Hotspur, André Villas-Boas has been trapped in former Spurs manager Harry Redknapp’s shadow. Every tactical decision or team selection Villas-Boas makes is seemingly compared with Redknapp’s previous achievements. And after Tottenham’s apparent slow start to the season, Villas-Boas has come under heavy criticism from the media whose narrative seems to be that Tottenham are performing poorly. But is this criticism fair and are Tottenham really performing any worse than last season under Harry Redknapp?&lt;/p&gt;
&lt;p&gt;Find out by reading the rest of this article &lt;a href="http://www.bettingexpert.com/blog/andre-villas-boas-vs-harry-redknapp"&gt;here&lt;/a&gt;.&lt;/p&gt;</content><category term="Misc"></category></entry><entry><title>Effect of Season Length on Deciding the League Champion</title><link href="2012/11/12/effect-of-season-length-on-league-champions/" rel="alternate"></link><published>2012-11-12T19:30:00+00:00</published><updated>2012-11-12T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2012-11-12:2012/11/12/effect-of-season-length-on-league-champions/</id><summary type="html">&lt;p&gt;In my previous article I looked at the interplay between luck and skill in determining the league champions. There is another parameter though that also interacts with luck and that is the structure of the league itself.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In my &lt;a href="http://pena.lt/y/2012/11/08/how-often-does-the-best-team-win-the-league/"&gt;previous article&lt;/a&gt; I looked at the interplay between luck and skill in determining the league champions. There is another parameter though that also interacts with luck and that is the structure of the league itself. How many times have you heard the same tired, old cliché from football managers about how luck evens itself out over a season? But does it? Is a football season really long enough for the effects of chance to be cancelled out?&lt;/p&gt;
&lt;p&gt;I used the same mathematical model as before to simulate 10,000 seasons of a league containing 20 teams. Skill levels were randomly assigned to each team from a normally distributed population with a mean of 0.5 and a standard deviation of 0.1. The length of the season was then altered to see how frequently the team with the highest skill level won the league depending on the number of matches played – teams either played each other once, twice, four times or eight times per season.&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Number of Teams&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Frequency Teams Meet&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Mean Win %&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Best Team Win %&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;60.2&lt;/td&gt;
&lt;td&gt;32.11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;75.3&lt;/td&gt;
&lt;td&gt;45.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;82.2&lt;/td&gt;
&lt;td&gt;48.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;85.2&lt;/td&gt;
&lt;td&gt;50.8&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;Table One: Effect of Season Length on League Champions&lt;/strong&gt;&lt;/p&gt;
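&lt;p&gt;To give a feel for how such a simulation might be put together, here is a minimal Python sketch. It is not the model used to produce Table One: the match rule (a team wins with probability proportional to its skill, with no draws), the random tie-break, the redrawing of skills every season and the function names are all assumptions, and it runs far fewer than 10,000 seasons to keep it quick, so its numbers should not be expected to match the table.&lt;/p&gt;

```python
import random

def simulate_season(skills, meetings, rng):
    """Play every pairing `meetings` times; return the index of the champion."""
    n = len(skills)
    points = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            # Assumed match model: win probability proportional to skill, no draws.
            winners = rng.choices((i, j), weights=(skills[i], skills[j]), k=meetings)
            for winner in winners:
                points[winner] += 3
    # Ties on points are broken at random (an assumption).
    return max(range(n), key=lambda t: (points[t], rng.random()))

def best_team_title_rate(n_teams=20, meetings=2, seasons=300, seed=1):
    """Fraction of simulated seasons won by the most skilful team."""
    rng = random.Random(seed)
    titles = 0
    for _ in range(seasons):
        # Skill drawn from a normal distribution with mean 0.5 and sd 0.1, floored at 0.01.
        skills = [max(0.01, rng.gauss(0.5, 0.1)) for _ in range(n_teams)]
        best = max(range(n_teams), key=lambda t: skills[t])
        titles += simulate_season(skills, meetings, rng) == best
    return titles / seasons

for meetings in (1, 2, 4, 8):
    print(meetings, best_team_title_rate(meetings=meetings))
```

&lt;p&gt;Changing &lt;code&gt;n_teams&lt;/code&gt; or &lt;code&gt;meetings&lt;/code&gt; is all that is needed to try out other league structures.&lt;/p&gt;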
&lt;h2 id="results"&gt;Results&lt;/h2&gt;
&lt;p&gt;The results in Table One show that as the length of the season increases the probability of the team with the highest skill rating winning the league increases too. The champions also win a greater percentage of their matches. Therefore, the more matches that are played the less of an influence chance seems to play in determining the overall league champion.&lt;/p&gt;
&lt;p&gt;The second row of Table One matches the structure of four of the major leagues in Europe – Premier League, Serie A, La Liga and Ligue 1 – which all contain 20 teams that play each other twice per season. The Eredivisie and Bundesliga only contain 18 teams though, so what effect does this have? Rerunning the mathematical model with 18 teams gives a lower frequency for the best team winning the league of 28.8%. This suggests that the smaller size of these two leagues makes them somewhat more competitive as there are fewer matches over which luck can even out.&lt;/p&gt;
&lt;p&gt;The Scottish Premier League (SPL) is smaller again, containing just 12 teams. The structure of the league is fairly unique in Europe, with teams playing each other three times, either twice away and once at home or vice versa. The league then splits in half and teams play a further match against the remaining five teams in their half of the league. If we apply the mathematical model to this structure then we come out with a frequency of 19.3% for the best team winning the league. This means the SPL should be one of the most competitive leagues in Europe, yet it has only ever been won by two teams – Celtic and Rangers. This is likely due to the large disparity in talent between Glasgow’s two largest teams and the rest of the league cancelling out the effect of chance.&lt;/p&gt;
&lt;p&gt;Interestingly, with Rangers now relegated from the SPL for financial irregularities, the league is the closest it has ever been. It was thought that without Rangers present in the SPL, Celtic would go on to dominate a very one-sided division. Yet with Hibernian currently sitting top of the league, the reduction in disparity from the loss of Rangers may actually make it the most competitive and exciting year in the SPL’s history.&lt;/p&gt;</content><category term="Chance"></category><category term="Chance"></category></entry><entry><title>How Often Does The Best Team Win The League?</title><link href="2012/11/08/how-often-does-the-best-team-win-the-league/" rel="alternate"></link><published>2012-11-08T19:30:00+00:00</published><updated>2012-11-08T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2012-11-08:2012/11/08/how-often-does-the-best-team-win-the-league/</id><summary type="html">&lt;p&gt;How often does the best team win the league? Probably not as often as you think as it is not just talent that is required for success; a decent amount of luck is needed too.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;How often does the best team win the league? Probably not as often as you think as it is not just talent that is required for success; a decent amount of luck is needed too.&lt;/p&gt;
&lt;h2 id="methodology"&gt;Methodology&lt;/h2&gt;
&lt;p&gt;To investigate how big a role luck plays compared with ability I created a mathematical simulation based on the English Premier League (EPL) containing 20 teams that play each other twice per season. Each team was randomly assigned a skill level drawn from a normally distributed population with a mean of 0.5 and a standard deviation linked to the spread of talent across the league so that the disparity between the top and bottom clubs could be controlled. The simulation was then run for 10,000 seasons at various disparity levels and the number of times the team with the highest skill level won the league was measured.&lt;/p&gt;
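&lt;p&gt;A minimal Python sketch of this kind of simulation is shown below. It is not the original code: the article does not say how individual matches were decided, so the rule used here (a team's chance of winning is its skill divided by the sum of both skills, with no draws) is an assumption, as are the function names, the floor on skill values and the choice to break ties at random.&lt;/p&gt;

```python
import random


def simulate_season(skills, rng):
    """Play a double round-robin and return the number of wins per team."""
    wins = [0] * len(skills)
    for home in range(len(skills)):
        for away in range(len(skills)):
            if home == away:
                continue
            # Assumed match rule (not from the article): win chance is
            # proportional to skill, and draws are ignored.
            p_home = skills[home] / (skills[home] + skills[away])
            if p_home > rng.random():
                wins[home] += 1
            else:
                wins[away] += 1
    return wins


def best_team_title_rate(n_teams=20, disparity=0.06, n_seasons=1000, seed=1):
    """Fraction of simulated seasons won by the most skilful team."""
    rng = random.Random(seed)
    titles = 0
    for _ in range(n_seasons):
        # Skill ~ Normal(mean 0.5, sd = disparity), floored to stay positive.
        skills = [max(0.01, rng.gauss(0.5, disparity)) for _ in range(n_teams)]
        wins = simulate_season(skills, rng)
        most_wins = max(wins)
        # Ties at the top are broken at random (an assumption).
        champion = rng.choice([t for t, w in enumerate(wins) if w == most_wins])
        best = max(range(n_teams), key=lambda t: skills[t])
        if champion == best:
            titles += 1
    return titles / n_seasons
```

&lt;p&gt;Calling &lt;code&gt;best_team_title_rate&lt;/code&gt; with a small and then a larger &lt;code&gt;disparity&lt;/code&gt; should show the same qualitative pattern as the table, although the exact percentages depend on the assumed match rule.&lt;/p&gt;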
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mean Skill Level&lt;/td&gt;
&lt;td&gt;Disparity&lt;/td&gt;
&lt;td&gt;Mean Win %&lt;/td&gt;
&lt;td&gt;Best Team Win %&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;65.5&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;0.02&lt;/td&gt;
&lt;td&gt;65.9&lt;/td&gt;
&lt;td&gt;10.3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;0.04&lt;/td&gt;
&lt;td&gt;67.4&lt;/td&gt;
&lt;td&gt;22.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;0.06&lt;/td&gt;
&lt;td&gt;69&lt;/td&gt;
&lt;td&gt;32.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;0.08&lt;/td&gt;
&lt;td&gt;72.1&lt;/td&gt;
&lt;td&gt;41.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;0.1&lt;/td&gt;
&lt;td&gt;76.4&lt;/td&gt;
&lt;td&gt;46.4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;Table 1: Effect of Disparity on League Champions&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The first row in Table One shows what would happen if all teams in the league were identical. Each team has a skill level of 0.5, meaning that they would each be expected to win 50% of their matches and lose 50% (ignoring draws to keep the model simple). Due to random chance though some teams will win more than 50% and some will lose more than 50%. You can see that the average number of matches won by the league champions over 10,000 seasons was 65.5% so in an evenly matched EPL you would just need to be lucky enough to win an extra 15% of matches to be champions.&lt;/p&gt;
&lt;p&gt;As the disparity increases though, the influence of chance decreases and the best team wins the league more often, winning more of their matches in the process. Take this season’s EPL: while it is possible that QPR could fluke wins in all their remaining matches and go on to win the league, it would take a colossal amount of luck compared with the amount Manchester City would need to finish ahead of Manchester United, since their skill levels are closer.&lt;/p&gt;
&lt;p&gt;This leads to the question though of what is preferable, an evenly matched, competitive league in which luck is a major determining factor in winning or a league that is perhaps fairer as it has enough disparity that the best team is predominantly likely to win?&lt;/p&gt;</content><category term="Chance"></category><category term="Chance"></category></entry><entry><title>The Poisson Model So Far</title><link href="2012/11/02/the-poisson-model-so-far/" rel="alternate"></link><published>2012-11-02T19:30:00+00:00</published><updated>2012-11-02T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2012-11-02:2012/11/02/the-poisson-model-so-far/</id><summary type="html">&lt;p&gt;In my last article I wrote about my experiences using the Poisson distribution to predict the outcome of football matches. The results so far have been rather disappointing so I thought I would have a look at where things were going wrong.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In my &lt;a href="http://pena.lt/y/2012/10/29/using-poisson-to-predict-football-matches/"&gt;last article&lt;/a&gt; I wrote about my experiences using the Poisson distribution to predict the outcome of football matches. The results so far have been rather disappointing so I thought I would have a look at where things were going wrong.&lt;/p&gt;
&lt;h2 id="probabilities"&gt;Probabilities&lt;/h2&gt;
&lt;p&gt;The first place I decided to look was at the probabilities generated for the matches predicted correctly compared with those predicted incorrectly. I suspected that maybe the model was struggling with matches between more evenly matched teams. For example, for last week’s match between Stoke and Sunderland the predicted outcome was a home win with a probability of 51%. This still leaves us with a 49% chance though that the game will finish with an away win or a draw instead making it potentially difficult to predict accurately.&lt;/p&gt;
&lt;p&gt;Overall, the average probability for games correctly predicted was 64% compared with 56% in the games where the prediction failed. At first look it would therefore appear that the model does struggle somewhat with games between more closely matched teams. However, once the variability in the data is taken into account, the two percentages cannot be distinguished from each other (Figure 1). In fact, comparing the data sets using analysis of variance (ANOVA) gives a p-value of 0.32, suggesting no statistically significant difference between the two percentages based on the current data.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/121102_poissonprob.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 1: Average probabilities of matches correctly / incorrectly predicted by the Poisson model&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Next I looked at which outcomes were being incorrectly predicted and a problem immediately became apparent. So far the model has predicted 50 matches of which 58% were predicted to be home wins, 34% as away wins and 8% as draws. Looking at what really happened though, of those 50 matches 42% were actually home wins, 30% away wins and 28% were draws (Figure 2). This suggests the model is under-predicting the likelihood of draws by quite a large margin and is actually predicting them as home wins.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/121102_poissonproportions.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 2: Proportion of Match Outcomes - Poisson vs Actual&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;A quick Google revealed two possible fixes. Karlis and Ntzoufras recommend replacing the independent Poisson with a bivariate Poisson to add an element of correlation between the home and away team’s scores. However, even with this they still needed to inflate the diagonal of the score matrix to try and improve the prediction of draws, suggesting that moving to the bivariate Poisson is not necessarily much of an improvement. An alternative proposal by Dixon and Coles was to stick with the two independent Poisson calculations but add in an additional parameter to modify the probabilities of 0-0, 1-1, 1-0 and 0-1 scores occurring.&lt;/p&gt;
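&lt;p&gt;As a rough illustration of the Dixon and Coles idea, the Python sketch below rescales the four low-scoring cells of an independent Poisson score matrix using a single dependence parameter. The correction factors follow the commonly cited form of their adjustment; the function names and any value chosen for &lt;code&gt;rho&lt;/code&gt; are illustrative assumptions, not fitted values.&lt;/p&gt;

```python
import math


def poisson_pmf(goals, mean):
    """Probability of exactly `goals` goals under a Poisson distribution."""
    return math.exp(-mean) * mean ** goals / math.factorial(goals)


def dixon_coles_tau(home_goals, away_goals, home_mean, away_mean, rho):
    """Correction factor applied to the 0-0, 0-1, 1-0 and 1-1 score lines."""
    if home_goals == 0 and away_goals == 0:
        return 1.0 - home_mean * away_mean * rho
    if home_goals == 0 and away_goals == 1:
        return 1.0 + home_mean * rho
    if home_goals == 1 and away_goals == 0:
        return 1.0 + away_mean * rho
    if home_goals == 1 and away_goals == 1:
        return 1.0 - rho
    return 1.0  # every other score line is left unchanged


def adjusted_probability(home_goals, away_goals, home_mean, away_mean, rho):
    """Independent Poisson probability multiplied by the correction factor."""
    base = poisson_pmf(home_goals, home_mean) * poisson_pmf(away_goals, away_mean)
    return base * dixon_coles_tau(home_goals, away_goals, home_mean, away_mean, rho)
```

&lt;p&gt;With a negative &lt;code&gt;rho&lt;/code&gt; the 0-0 and 1-1 draws become more likely and the 1-0 and 0-1 results less likely, while the total probability across the four cells is unchanged.&lt;/p&gt;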
&lt;p&gt;So where does this leave the current Poisson model? For me, it is time to move on to other ideas. The Poisson model is one of the most widely used models for predicting football outcomes so I will return to it in the future to try out the Karlis and Ntzoufras and Dixon and Coles adjustments, but I have a few other ideas to write about first.&lt;/p&gt;</content><category term="Poisson"></category><category term="Poisson"></category><category term="R"></category></entry><entry><title>Using Poisson to Predict Football Matches</title><link href="2012/10/29/using-poisson-to-predict-football-matches/" rel="alternate"></link><published>2012-10-29T19:30:00+00:00</published><updated>2012-10-29T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2012-10-29:2012/10/29/using-poisson-to-predict-football-matches/</id><summary type="html">&lt;p&gt;The Power Of Goals recently blogged about using the Poisson distribution to predict the outcome of football matches. I have been evaluating the predictive ability of the Poisson for the English Premier League (EPL) this season so I thought I would share my experiences too.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;The &lt;a href="http://thepowerofgoals.blogspot.co.uk/"&gt;Power Of Goals&lt;/a&gt; recently blogged about using the Poisson distribution to predict the outcome of football matches. I have been evaluating the predictive ability of the Poisson for the English Premier League (EPL) this season so I thought I would share my experiences too.&lt;/p&gt;
&lt;p&gt;For anyone who is unaware, the number of goals scored by each team in a football match roughly follows a Poisson distribution. As you can see in Figure 1, it is not exact though as the Poisson distribution underestimates the likelihood of no goals being scored and overestimates one, two and three goals being scored. By four goals and upwards the Poisson starts to underestimate again. The actual difference between the Poisson and what is observed in the EPL is reasonably small though so it just requires a small fudge factor to bring the two into line.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/121029_poissonvsactual.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 1: Poisson Distribution vs Observed&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;To carry out the predictions I have written a script in &lt;a href="http://www.r-project.org/"&gt;R&lt;/a&gt; that scrapes the Premier League table directly from the BBC’s website. The script then calculates attack and defence coefficients for each team by comparing their goals scored and conceded with the overall EPL average home and away. The predicted number of goals scored in a particular match can then be calculated by scaling the EPL’s average goals by the two teams’ attack and defence coefficients. This can then be mapped to the Poisson distribution to generate a probability matrix for each particular score line (Table 1). From this, the probabilities can be summed to find the odds that each match will end as a home win, draw or away win.&lt;/p&gt;
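&lt;p&gt;The calculation described above can be sketched in Python (the original script was written in R and is not reproduced here). The function names, the simple multiplicative form of the expected-goals step and the example means of 1.85 and 2.08 are assumptions; the two means were picked only because they give values close to those in Table 1, and indexing the matrix rows by home goals is a choice made for this sketch.&lt;/p&gt;

```python
import math


def expected_goals(league_average, attack, opponent_defence):
    """Scale the league average by one team's attack and the other's defence."""
    return league_average * attack * opponent_defence


def poisson_pmf(goals, mean):
    """Probability of exactly `goals` goals under a Poisson distribution."""
    return math.exp(-mean) * mean ** goals / math.factorial(goals)


def score_matrix(home_mean, away_mean, max_goals=8):
    """matrix[h][a] = P(home scores h, away scores a), assuming independence."""
    return [
        [poisson_pmf(h, home_mean) * poisson_pmf(a, away_mean)
         for a in range(max_goals + 1)]
        for h in range(max_goals + 1)
    ]


def outcome_probabilities(matrix):
    """Sum the score-line probabilities into home win, draw and away win."""
    home = draw = away = 0.0
    for h, row in enumerate(matrix):
        for a, p in enumerate(row):
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return home, draw, away


# Illustrative means only; they are not taken from the author's script.
matrix = score_matrix(home_mean=1.85, away_mean=2.08)
home, draw, away = outcome_probabilities(matrix)
print(f"0-0: {matrix[0][0]:.2%}  home {home:.1%}  draw {draw:.1%}  away {away:.1%}")
```

&lt;p&gt;Because the matrix is truncated at eight goals per team, the three outcome probabilities sum to very slightly less than 100%.&lt;/p&gt;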
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Goals&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1.96&lt;/td&gt;
&lt;td&gt;4.08&lt;/td&gt;
&lt;td&gt;4.24&lt;/td&gt;
&lt;td&gt;2.94&lt;/td&gt;
&lt;td&gt;1.53&lt;/td&gt;
&lt;td&gt;0.64&lt;/td&gt;
&lt;td&gt;0.22&lt;/td&gt;
&lt;td&gt;0.07&lt;/td&gt;
&lt;td&gt;0.02&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3.63&lt;/td&gt;
&lt;td&gt;7.56&lt;/td&gt;
&lt;td&gt;7.86&lt;/td&gt;
&lt;td&gt;5.45&lt;/td&gt;
&lt;td&gt;2.83&lt;/td&gt;
&lt;td&gt;1.18&lt;/td&gt;
&lt;td&gt;0.41&lt;/td&gt;
&lt;td&gt;0.12&lt;/td&gt;
&lt;td&gt;0.03&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3.36&lt;/td&gt;
&lt;td&gt;7.00&lt;/td&gt;
&lt;td&gt;7.27&lt;/td&gt;
&lt;td&gt;5.04&lt;/td&gt;
&lt;td&gt;2.62&lt;/td&gt;
&lt;td&gt;1.09&lt;/td&gt;
&lt;td&gt;0.38&lt;/td&gt;
&lt;td&gt;0.11&lt;/td&gt;
&lt;td&gt;0.03&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2.08&lt;/td&gt;
&lt;td&gt;4.32&lt;/td&gt;
&lt;td&gt;4.49&lt;/td&gt;
&lt;td&gt;3.11&lt;/td&gt;
&lt;td&gt;1.62&lt;/td&gt;
&lt;td&gt;0.67&lt;/td&gt;
&lt;td&gt;0.23&lt;/td&gt;
&lt;td&gt;0.07&lt;/td&gt;
&lt;td&gt;0.02&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.96&lt;/td&gt;
&lt;td&gt;2.00&lt;/td&gt;
&lt;td&gt;2.08&lt;/td&gt;
&lt;td&gt;1.44&lt;/td&gt;
&lt;td&gt;0.75&lt;/td&gt;
&lt;td&gt;0.31&lt;/td&gt;
&lt;td&gt;0.11&lt;/td&gt;
&lt;td&gt;0.03&lt;/td&gt;
&lt;td&gt;0.01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.36&lt;/td&gt;
&lt;td&gt;0.74&lt;/td&gt;
&lt;td&gt;0.77&lt;/td&gt;
&lt;td&gt;0.53&lt;/td&gt;
&lt;td&gt;0.28&lt;/td&gt;
&lt;td&gt;0.12&lt;/td&gt;
&lt;td&gt;0.04&lt;/td&gt;
&lt;td&gt;0.01&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.11&lt;/td&gt;
&lt;td&gt;0.23&lt;/td&gt;
&lt;td&gt;0.24&lt;/td&gt;
&lt;td&gt;0.16&lt;/td&gt;
&lt;td&gt;0.09&lt;/td&gt;
&lt;td&gt;0.04&lt;/td&gt;
&lt;td&gt;0.01&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;7&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.03&lt;/td&gt;
&lt;td&gt;0.06&lt;/td&gt;
&lt;td&gt;0.06&lt;/td&gt;
&lt;td&gt;0.04&lt;/td&gt;
&lt;td&gt;0.02&lt;/td&gt;
&lt;td&gt;0.01&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;8&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.01&lt;/td&gt;
&lt;td&gt;0.01&lt;/td&gt;
&lt;td&gt;0.01&lt;/td&gt;
&lt;td&gt;0.01&lt;/td&gt;
&lt;td&gt;0.01&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;Table 1: Example Goal Probabilities (%)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Since the predictions are based on past performance this season, I waited until week five of the EPL to start testing the model so I had at least a month’s worth of previous results to work with. The first week went well, with the model correctly predicting the outcome of six of the ten matches that weekend. Table 2 shows the predicted outcome of each match together with its probability (%). From this I also calculated the odds and compared them with those available from Betfair.&lt;/p&gt;
&lt;table class="table"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Home&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Away&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Prediction&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Probability (%)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Odds&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Betfair&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Result&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Swansea&lt;/td&gt;
&lt;td&gt;Everton&lt;/td&gt;
&lt;td&gt;HOME&lt;/td&gt;
&lt;td&gt;56.3&lt;/td&gt;
&lt;td&gt;1.78&lt;/td&gt;
&lt;td&gt;3.35&lt;/td&gt;
&lt;td&gt;AWAY&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chelsea&lt;/td&gt;
&lt;td&gt;Stoke City&lt;/td&gt;
&lt;td&gt;HOME&lt;/td&gt;
&lt;td&gt;63.4&lt;/td&gt;
&lt;td&gt;1.58&lt;/td&gt;
&lt;td&gt;1.39&lt;/td&gt;
&lt;td&gt;HOME&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Southampton&lt;/td&gt;
&lt;td&gt;Aston Villa&lt;/td&gt;
&lt;td&gt;AWAY&lt;/td&gt;
&lt;td&gt;49.2&lt;/td&gt;
&lt;td&gt;2.03&lt;/td&gt;
&lt;td&gt;3.1&lt;/td&gt;
&lt;td&gt;HOME&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;West Brom&lt;/td&gt;
&lt;td&gt;Reading&lt;/td&gt;
&lt;td&gt;HOME&lt;/td&gt;
&lt;td&gt;41.1&lt;/td&gt;
&lt;td&gt;2.43&lt;/td&gt;
&lt;td&gt;1.82&lt;/td&gt;
&lt;td&gt;HOME&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;West Ham&lt;/td&gt;
&lt;td&gt;Sunderland&lt;/td&gt;
&lt;td&gt;HOME&lt;/td&gt;
&lt;td&gt;35.7&lt;/td&gt;
&lt;td&gt;2.80&lt;/td&gt;
&lt;td&gt;2.24&lt;/td&gt;
&lt;td&gt;DRAW&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wigan&lt;/td&gt;
&lt;td&gt;Fulham&lt;/td&gt;
&lt;td&gt;AWAY&lt;/td&gt;
&lt;td&gt;40.1&lt;/td&gt;
&lt;td&gt;2.49&lt;/td&gt;
&lt;td&gt;3.25&lt;/td&gt;
&lt;td&gt;AWAY&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Liverpool&lt;/td&gt;
&lt;td&gt;Man Utd&lt;/td&gt;
&lt;td&gt;AWAY&lt;/td&gt;
&lt;td&gt;75.6&lt;/td&gt;
&lt;td&gt;1.32&lt;/td&gt;
&lt;td&gt;2.82&lt;/td&gt;
&lt;td&gt;AWAY&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Newcastle&lt;/td&gt;
&lt;td&gt;Norwich&lt;/td&gt;
&lt;td&gt;HOME&lt;/td&gt;
&lt;td&gt;82.9&lt;/td&gt;
&lt;td&gt;1.21&lt;/td&gt;
&lt;td&gt;1.84&lt;/td&gt;
&lt;td&gt;HOME&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Man City&lt;/td&gt;
&lt;td&gt;Arsenal&lt;/td&gt;
&lt;td&gt;AWAY&lt;/td&gt;
&lt;td&gt;37.1&lt;/td&gt;
&lt;td&gt;2.70&lt;/td&gt;
&lt;td&gt;1.78&lt;/td&gt;
&lt;td&gt;DRAW&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tottenham&lt;/td&gt;
&lt;td&gt;QPR&lt;/td&gt;
&lt;td&gt;HOME&lt;/td&gt;
&lt;td&gt;41.1&lt;/td&gt;
&lt;td&gt;2.43&lt;/td&gt;
&lt;td&gt;1.51&lt;/td&gt;
&lt;td&gt;HOME&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;Table 2: EPL Week 5 Predictions&lt;/strong&gt;&lt;/p&gt;
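&lt;p&gt;The Odds column in Table 2 is consistent with decimal odds calculated as 100 divided by the predicted probability in percent. A small Python sketch of that conversion, and of the reverse step for a bookmaker price, is given below; the function names are hypothetical and the reverse step ignores any margin built into the price.&lt;/p&gt;

```python
def decimal_odds(probability_percent):
    """Fair decimal odds implied by a probability given in percent."""
    return round(100.0 / probability_percent, 2)


def implied_probability(odds):
    """Probability in percent implied by decimal odds (margin ignored)."""
    return round(100.0 / odds, 1)


# Swansea v Everton: model probability, then the Betfair price from Table 2.
print(decimal_odds(56.3))
print(implied_probability(3.35))
```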
&lt;p&gt;Since then, the Poisson model has correctly predicted between 30% and 60% of matches each week (Figure 2). So far, the average accuracy is 46%, which is somewhat higher than the 33% we could expect from randomly guessing each result.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/121029_poissoncorrect.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 2: Weekly Performance of Poisson Predictive Model&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I am hopeful the model’s success rate will improve over the course of the season as it gets more data to work with. There are also further improvements that can be made. For example, the model currently treats the goals scored by each team as independent events. However, it may be that the two should be correlated, as it seems intuitive that the more goals one team scores the less likely the opposition is to score. At the moment though I wouldn’t place too much faith in the Poisson model.&lt;/p&gt;</content><category term="Poisson"></category><category term="Poisson"></category><category term="R"></category></entry><entry><title>Influence Of Clean Sheets</title><link href="2012/10/26/influence-of-clean-sheets/" rel="alternate"></link><published>2012-10-26T19:30:00+00:00</published><updated>2012-10-26T19:30:00+00:00</updated><author><name>Martin Eastwood</name></author><id>tag:None,2012-10-26:2012/10/26/influence-of-clean-sheets/</id><summary type="html">&lt;p&gt;To make much sense of the statistics available for football we need to have an understanding of their context so I am planning on starting off simple by looking at baselines for various events and statistics while I build up the information required to start a mathematical model.&lt;/p&gt;</summary><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;To make much sense of the statistics available for football we need to have an understanding of their context so I am planning on starting off simple by looking at baselines for various events and statistics while I build up the information required to start a mathematical model.&lt;/p&gt;
&lt;h2 id="clean-sheets"&gt;Clean Sheets&lt;/h2&gt;
&lt;p&gt;While most football analytics seems to focus heavily on goals, I am going to start off with defending and the all important clean sheet. Clean sheets have been fairly consistent throughout the English Premier League’s (EPL) history, occurring in around 27% of matches between 1993 and 2011 (Figure 1). The data shows some variability around the mean with perhaps the slightest hint of an upwards trend, but in general the total number of clean sheets per season has remained constant.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/121026_cleansheets.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 1: Total English Premier League Clean Sheets&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="home-and-away"&gt;Home And Away&lt;/h2&gt;
&lt;p&gt;If we split the data by home and away then we can immediately see a significant difference (Figure 2; &lt;em&gt;p&amp;lt;0.001&lt;/em&gt;). On average, the home team will keep a clean sheet 33% of the time while the away team will only manage it in 22% of their matches. Interestingly, both sets of data appear to follow broadly similar patterns with peaks and troughs occurring in the same years. I hope to explore this in more detail in the future.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/121026_homeawaycleansheets.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 2: Clean Sheets Home and Away&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Clean sheets are valuable commodities as they guarantee you a minimum of one point. As the cliche goes, if you keep a clean sheet you cannot lose. Looking back over the EPL’s history shows that a clean sheet at home is actually worth 2.1 points on average. This means that over the course of a season obtaining a clean sheet in 33% of home matches would be expected to generate 13.2 points. Away from home a clean sheet is of lower value, generating just 1.8 points on average. Over the course of a season, keeping a clean sheet in 22% of away matches would therefore bring in an additional 7.5 points.&lt;/p&gt;
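&lt;p&gt;The season totals quoted above follow from multiplying the number of matches by the clean-sheet rate and the average points per clean sheet. The Python sketch below assumes 19 home and 19 away matches, as in a 20-team league; the function name is hypothetical.&lt;/p&gt;

```python
def expected_clean_sheet_points(matches, clean_sheet_rate, points_per_clean_sheet):
    """Expected season points earned in matches where a clean sheet is kept."""
    return matches * clean_sheet_rate * points_per_clean_sheet


# 19 home and 19 away matches assumes a 20-team league.
home_points = expected_clean_sheet_points(19, 0.33, 2.1)
away_points = expected_clean_sheet_points(19, 0.22, 1.8)
print(round(home_points, 1), round(away_points, 1))
```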
&lt;h2 id="the-english-premier-league"&gt;The English Premier League&lt;/h2&gt;
&lt;p&gt;We can use these baselines to examine how teams are performing in terms of clean sheets home and away. Figure 3 shows the proportion of matches in which each team in the EPL obtained clean sheets for the 2011-2012 season. The teams in the upper right quadrant all achieved an above average number of clean sheets both home and away. In comparison, West Brom’s defence performed very well at home yet they struggled to obtain clean sheets away from the Hawthorns. Liverpool were the opposite of West Brom, keeping clean sheets away from Anfield but struggling at home. Bolton, Blackburn and Wolves all generated very low numbers of clean sheets home and away and were all relegated from the EPL. Norwich are an interesting exception as they possessed the worst away record for clean sheets yet managed to finish in a respectable 12th position last year.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/121026_homeawaycomparison.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 3: Proportion of Matches With Clean Sheets Home and Away&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="league-position"&gt;League Position&lt;/h2&gt;
&lt;p&gt;If we carry out linear regression on the 2011-2012 data (Figure 4) we can see the correlation between the number of clean sheets a team kept over the season and their final league position. The r&lt;sup&gt;2&lt;/sup&gt; value of 0.72 for the regression shows that the two are strongly correlated, so a team that does not keep clean sheets could be expected to finish lower down the league table. This does not bode well for current champions Manchester City, who have conceded goals in seven of their eight EPL matches this season.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pelican" src="../../../../images/121026_cleansheetregression.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 4: Correlation of Final League Position to Number of Clean Sheets 2011/2012&lt;/strong&gt;&lt;/p&gt;</content><category term="Misc"></category><category term="EPL"></category></entry></feed>