Introduction
The latest release of the penaltyblog Python package brings a major upgrade to your football analytics toolkit.
Highlights include:
- Interactive plotting for rich, dynamic pitch visualisations.
- Faster goal-model fitting with improved stability.
- More accurate goal-expectancy estimation.
- Powerful Flow query tools for streamlined data access.
Pitch - The Interactive Football Plotting Library
The standout feature in this release is the brand-new Pitch plotting API - a fully interactive, Plotly-powered framework for building rich football pitch visualisations in Python.
Key capabilities:
- Multiple built-in pitch dimensions and themes.
- Horizontal or vertical layouts.
- Flexible view modes for zooming into specific areas.
- Layering of scatter points, heatmaps, kernel-density surfaces, comets, arrows, and more.
- Custom hover tooltips, colour schemes, opacity, and orientation controls.
You can:
- Display charts inline in Jupyter notebooks.
- Save as static images.
- Export as standalone HTML for embedding anywhere.
In short, it’s a single, versatile tool for producing publication-ready, explorable football data visuals.
Example: Shot Map
The example below uses the new Pitch API to visualise every shot taken by Liverpool in match ID 22912 from StatsBomb.
Process:
- Data is pulled directly from StatsBomb via the Flow API.
- We filter for Liverpool’s shots.
- We render them as an interactive Plotly chart.
Each marker shows a shot’s location, with hover tooltips displaying the player’s name and exact coordinates.
from penaltyblog.viz import Pitch
from penaltyblog.matchflow import Flow, where_equals, get_field
# 1) Pull and prep the data with Flow
def fmt(v):  # stringifier (avoids None issues)
    return "" if v is None else str(v)
flow = (
    Flow
    .statsbomb
    .events(22912, optimize=True)
    .filter(where_equals("type.name", "Shot"))
    .filter(where_equals("team.name", "Liverpool"))
    .assign(
        hover_text=lambda x: f"{fmt(get_field(x, 'player.name'))}: "
                             f"({fmt(get_field(x, 'location.0'))}, "
                             f"{fmt(get_field(x, 'location.1'))})"
    )
)
# 2) Create the pitch
pitch = Pitch(
    provider="statsbomb",
    orientation="horizontal",
    view="full",
    theme="night",
    show_axis=False,
    show_legend=False,
    title="Liverpool shots (StatsBomb 22912)",
    subtitle="Hover for player + shot location",
)
# 3) Plot the shot map
pitch.plot_scatter(flow, "location.0", "location.1", hover="hover_text")
pitch.show()
Example: Layering Plots
One of the most powerful features of the new Pitch API is layering - the ability to stack multiple plot types in a single interactive view.
You can:
- Start with a clean base pitch.
- Add scatter plots for individual events.
- Overlay heatmaps or kernel density estimates for spatial intensity.
- Draw arrows to indicate passes or runs.
- Use “comet” trails to visualise player or ball movement.
Each layer:
- Is fully customisable in colour, opacity, and tooltip content.
- Can be shown, hidden, or reordered at any time.
- Works seamlessly together to create rich, multi-dimensional visualisations.
from penaltyblog.viz import Pitch
from penaltyblog.matchflow import Flow, where_equals, get_field
# 1) Pull and prep the data with Flow
flow = (
    Flow
    .statsbomb
    .events(22912, optimize=True)
    .filter(where_equals("type.name", "Pass"))
    .filter(where_equals("team.name", "Liverpool"))
    # Fall back to an empty string if the player name is missing
    .assign(hover_text=lambda x: (get_field(x, "player.name") or "") + ": ")
)
# 2) Create the pitch
pitch = Pitch(
    provider="statsbomb",
    orientation="horizontal",
    view="full",
    theme="night",
    show_axis=False,
    show_legend=False,
    title="Liverpool Passes (StatsBomb 22912)",
    subtitle="Hover for player + location",
)
# 3) Plot a heatmap of pass locations as the base layer
pitch.plot_heatmap(
    flow,
    x="location.0",
    y="location.1",
    show_colorbar=True,
    colorscale="Viridis",
    opacity=0.75,
)
# 4) Overlay the raw points
pitch.plot_scatter(
    flow,
    x="location.0",
    y="location.1",
    size=4,
    color="white",
    hover="hover_text",
)
# 5) Show the result
pitch.show()
Pitch Documentation
You can explore the full range of Pitch features in the penaltyblog documentation.
This is the first release of Pitch, so there may be a few edge cases I haven’t encountered yet. If you run into any unexpected behaviour, please:
- Open an issue on GitHub
- Or send me a message directly via the blog
Your feedback will help refine Pitch and shape future updates.
Blazingly Fast Goal Models
Goal model fitting in penaltyblog is now 5–10x faster thanks to two major changes:
- Analytical Jacobian gradients are now used during optimisation, replacing slower numerical approximations.
- More core routines are implemented in Cython for low-level performance.
With exact gradient information, the optimiser converges more quickly and with greater numerical stability. In practice, you can fit models on large datasets in a fraction of the time, without sacrificing the accuracy or robustness of the original implementation.
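To make the difference between exact and approximated gradients concrete, here is a minimal sketch. It uses a toy one-parameter Poisson likelihood rather than penaltyblog's actual goal models, and every function name in it is illustrative only. The analytical gradient needs a single pass over the data, while the finite-difference version needs two extra likelihood evaluations per parameter and only approximates the true slope.

```python
import math

def neg_log_likelihood(log_rate, goals):
    """Negative Poisson log-likelihood with scoring rate exp(log_rate)."""
    rate = math.exp(log_rate)
    return sum(rate - g * log_rate + math.lgamma(g + 1) for g in goals)

def analytical_gradient(log_rate, goals):
    """Exact derivative with respect to log_rate: sum(rate - g)."""
    rate = math.exp(log_rate)
    return sum(rate - g for g in goals)

def numerical_gradient(log_rate, goals, eps=1e-6):
    """Central finite difference: two extra likelihood evaluations."""
    up = neg_log_likelihood(log_rate + eps, goals)
    down = neg_log_likelihood(log_rate - eps, goals)
    return (up - down) / (2 * eps)

goals = [0, 1, 1, 2, 3, 0, 2, 1]  # made-up goal counts for illustration
theta = 0.3
exact = analytical_gradient(theta, goals)
approx = numerical_gradient(theta, goals)
print(f"analytical: {exact:.8f}  numerical: {approx:.8f}")
```

With dozens of team parameters, the cost of the extra evaluations grows with the number of parameters, which is where an exact gradient pays off most.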
More Control Over Optimisation
The updated fitting API also exposes a minimizer_options parameter, allowing you to pass arguments directly to scipy.optimize.minimize.
This means you can:
- Increase maxiter for tricky fits.
- Tighten gtol or ftol for higher precision.
- Fine-tune convergence settings without modifying library code.
Combined with the new Cython-powered gradients, this flexibility makes it easier to adapt training to different datasets, convergence requirements, and performance constraints.
Example: Dixon–Coles Model with Custom Optimisation
import penaltyblog as pb
model = pb.models.DixonColesGoalModel(gh, ga, th, ta, weights=weights)
model.fit(
    use_gradient=True,  # optional; can be False for back-compat
    minimizer_options={  # optional; passes to `scipy.optimize.minimize`
        "maxiter": 3000,  # more iterations if needed
        "gtol": 1e-8,  # gradient tolerance
        "ftol": 1e-9,  # function tolerance
    },
)
Improved FootballProbabilityGrid
Every penaltyblog goals model returns a FootballProbabilityGrid from .predict().
It contains the full scoreline probability matrix for a match and makes it easy to calculate market-ready probabilities.
The latest update adds normalization, expanded markets, better Asian Handicap handling, and richer representation, all while maintaining backwards compatibility.
The new implementation includes:
1. Optional Normalization
You can now pass normalize=True (default) to .predict() to ensure the probability grid sums to exactly 1.0.
Why it matters:
- Without normalization: Preserves the true Poisson/Dixon–Coles mass beyond max_goals (purist approach), but markets like 1X2 + totals may not sum exactly to 1.0.
- With normalization: Rescales probabilities so all markets are internally consistent - important for pricing, trading, and backtesting.
For a more detailed discussion, see the documentation.
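The effect of truncation and rescaling can be seen in a small standalone sketch. It assumes independent Poisson scoring with made-up rates and a grid covering 0 to max_goals - 1 goals per side. The helper names are hypothetical and this is not penaltyblog's internal code.

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate**k / math.factorial(k)

def score_grid(home_rate, away_rate, max_goals):
    """Scoreline probabilities for 0..max_goals-1 goals per side."""
    return [
        [poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
         for a in range(max_goals)]
        for h in range(max_goals)
    ]

def normalize(grid):
    """Rescale the grid so that all cells sum to 1."""
    total = sum(map(sum, grid))
    return [[p / total for p in row] for row in grid]

raw = score_grid(1.6, 1.1, max_goals=6)  # illustrative rates only
raw_mass = sum(map(sum, raw))
norm_mass = sum(map(sum, normalize(raw)))
print(f"raw mass: {raw_mass:.6f}  normalized mass: {norm_mass:.6f}")
```

The raw grid falls slightly short of 1 because scorelines beyond the cut-off are missing. The normalized grid spreads that missing mass proportionally across the remaining cells.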
2. Expanded Market Calculations
The grid now computes more markets out of the box:
- 1X2: home_win, draw, away_win
- Convenience: home_draw_away → [home_win, draw, away_win]
- Both Teams to Score (BTTS): btts_yes, btts_no
- Totals: total_goals(over_under, strike) for any goal line (e.g. 2.5, 3.0)
- Asian Handicap: asian_handicap and asian_handicap_probs with push handling
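All of these markets are sums over different regions of the scoreline matrix. The sketch below shows the idea on a tiny hand-made 3x3 grid (rows are home goals, columns are away goals). The helper names are hypothetical and the numbers are invented for illustration, so treat it as a picture of the arithmetic rather than the library's implementation.

```python
# Hand-made grid that sums to 1: GRID[h][a] = P(home scores h, away scores a)
GRID = [
    [0.10, 0.08, 0.04],
    [0.15, 0.14, 0.06],
    [0.12, 0.18, 0.13],
]

def market_probability(grid, condition):
    """Sum the probability of every scoreline (h, a) that meets `condition`."""
    return sum(
        p
        for h, row in enumerate(grid)
        for a, p in enumerate(row)
        if condition(h, a)
    )

def totals(grid, line):
    """Return (under, push, over) probabilities for a total-goals line."""
    under = market_probability(grid, lambda h, a: h + a < line)
    push = market_probability(grid, lambda h, a: h + a == line)
    over = market_probability(grid, lambda h, a: h + a > line)
    return under, push, over

home_win = market_probability(GRID, lambda h, a: h > a)
draw = market_probability(GRID, lambda h, a: h == a)
away_win = market_probability(GRID, lambda h, a: h < a)
btts_yes = market_probability(GRID, lambda h, a: h > 0 and a > 0)

print("1X2:", home_win, draw, away_win)
print("BTTS yes:", btts_yes)
print("Totals 2.0 (under, push, over):", totals(GRID, 2.0))
```

A whole-number line such as 2.0 can push, whereas a half line such as 2.5 never does, which is why the push probability is reported separately.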
3. Push-Aware Asian Handicap Calculations
The updated asian_handicap method correctly handles:
- Full win
- Half win / Half loss
- Push (stake returned)
It supports quarter, half, and full goal handicaps, making it possible to price complex AH markets directly from the grid - no manual matrix work required.
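For readers unfamiliar with how quarter lines settle, here is a rough sketch of the standard convention: a quarter-goal handicap is treated as two half-stake bets on the neighbouring lines. The function names and the small grid are hypothetical, and this is not necessarily how penaltyblog computes the market internally.

```python
# Hand-made grid that sums to 1: GRID[h][a] = P(home scores h, away scores a)
GRID = [
    [0.10, 0.08, 0.04],
    [0.15, 0.14, 0.06],
    [0.12, 0.18, 0.13],
]

LABELS = {2: "win", 1: "half_win", 0: "push", -1: "half_loss", -2: "loss"}

def settle(margin, line):
    """Return +1 (win), 0 (push) or -1 (loss) for a whole or half line."""
    adjusted = margin + line
    return (adjusted > 0) - (adjusted < 0)

def home_handicap_outcomes(grid, line):
    """Settlement probabilities for a home-side handicap `line`.

    Quarter lines are split into two half-stake legs on line - 0.25 and
    line + 0.25; other lines use the same leg twice.
    """
    is_quarter = (line * 4) % 2 == 1
    legs = [line - 0.25, line + 0.25] if is_quarter else [line, line]
    outcomes = {label: 0.0 for label in LABELS.values()}
    for h, row in enumerate(grid):
        for a, p in enumerate(row):
            score = sum(settle(h - a, leg) for leg in legs)
            outcomes[LABELS[score]] += p
    return outcomes

print("Home -0.5: ", home_handicap_outcomes(GRID, -0.5))
print("Home -0.25:", home_handicap_outcomes(GRID, -0.25))
print("Home +1.0: ", home_handicap_outcomes(GRID, 1.0))
```

On the -0.25 line a draw is a half loss (one leg loses, the other pushes), while on the +1.0 line a one-goal defeat returns the stake.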
4. Richer Representation
Calling repr() now shows:
- Model name
- Home/Away goal expectations
- Core 1X2 probabilities
Great for quick inspection in interactive sessions.
5. Backwards Compatibility
- All previous methods still work.
- normalize defaults to True for consistency, but you can disable it for raw probabilities.
Example: Predicting & Using the Grid
# Fit your goals model
model.fit()
pred = model.predict("Arsenal", "Chelsea", max_goals=15, normalize=True)
# Core markets (1x2)
print("P(Home win), P(Draw), P(Away win):", pred.home_draw_away)
print("P(Home win):", pred.home_win)
print("P(Draw):", pred.draw)
print("P(Away win):", pred.away_win)
# Goal expectancy
print("Home xG:", pred.home_goal_expectation)
print("Away xG:", pred.away_goal_expectation)
# Both teams to score
print("BTTS (Yes):", pred.btts_yes)
print("BTTS (No):", pred.btts_no)
# Totals
u, p, o = pred.totals(2.0)
print("Totals 2.0 -> Under, Push, Over:", (u, p, o))
u, p, o = pred.totals(2.5)
print("Totals 2.5 -> Under, Push, Over:", (u, p, o))
print("P(Over 2.5):", pred.total_goals("over", 2.5))
# Asian Handicaps
print("AH Home -0.5 (win prob only):", pred.asian_handicap("home", -0.5))
print("AH Home -0.25 (Win/Push/Lose):", pred.asian_handicap_probs("home", -0.25))
print("AH Away +1.0 (Win/Push/Lose):", pred.asian_handicap_probs("away", +1.0))
# Double chance and DNB
print("Double chance 1X:", pred.double_chance_1x)
print("Double chance X2:", pred.double_chance_x2)
print("Double chance 12:", pred.double_chance_12)
print("DNB Home (conditional win prob):", pred.draw_no_bet_home)
print("DNB Away (conditional win prob):", pred.draw_no_bet_away)
# Exact scores and distributions
print("P(Exact score 2-1):", pred.exact_score(2, 1))
print("Home goal distribution (P(H=k)):", pred.home_goal_distribution())
print("Away goal distribution (P(A=k)):", pred.away_goal_distribution())
print("Total goals distribution (P(T=k)):", pred.total_goals_distribution())
With the improved FootballProbabilityGrid, you can go from a fitted model to fully priced betting markets in one step - no direct NumPy work required.
Flow Query DSL
Flow.query(expr) lets you filter records with a compact, readable string expression. It’s safe (parsed via Python’s AST, not eval), validated, and compiled into an efficient predicate.
Use it to:
- Prototype filters quickly.
- Keep pipelines readable.
- Let end users define filters without writing Python.
How It Works
- The query string is parsed into an AST.
- The AST is validated for safety.
- It’s compiled into a fast predicate function.
- Variables from your caller’s local scope can be injected with @var.
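The parse, validate, and compile steps above can be sketched with Python's standard ast module. This is a deliberately small illustration that handles only comparisons, and/or/not, field names, and constants over flat dict records. The names compile_query, _build, and _value are hypothetical and the real Flow.query implementation will differ.

```python
import ast
import operator

COMPARATORS = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Lt: operator.lt, ast.LtE: operator.le,
}

def compile_query(expr):
    """Parse `expr`, reject unsupported syntax, return a record predicate."""
    tree = ast.parse(expr, mode="eval")
    return _build(tree.body)

def _value(node):
    """Turn a leaf node into a function that extracts a value from a record."""
    if isinstance(node, ast.Constant):
        return lambda rec: node.value
    if isinstance(node, ast.Name):
        return lambda rec: rec[node.id]
    raise ValueError(f"Unsupported operand: {type(node).__name__}")

def _build(node):
    """Recursively convert a whitelisted AST node into a predicate."""
    if isinstance(node, ast.BoolOp):
        parts = [_build(v) for v in node.values]
        combine = all if isinstance(node.op, ast.And) else any
        return lambda rec: combine(part(rec) for part in parts)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        inner = _build(node.operand)
        return lambda rec: not inner(rec)
    if isinstance(node, ast.Compare):
        for op in node.ops:
            if type(op) not in COMPARATORS:
                raise ValueError(f"Unsupported operator: {type(op).__name__}")
        ops = [COMPARATORS[type(op)] for op in node.ops]
        getters = [_value(n) for n in [node.left, *node.comparators]]

        def predicate(rec):
            values = [get(rec) for get in getters]
            # A chained comparison a <= b <= c becomes (a <= b) and (b <= c)
            return all(op(a, b) for op, a, b in zip(ops, values, values[1:]))

        return predicate
    raise ValueError(f"Unsupported syntax: {type(node).__name__}")

pred = compile_query("home_team == 'Liverpool' and 0 <= goals_home <= 5")
print(pred({"home_team": "Liverpool", "goals_home": 3}))
```

Because nothing is ever passed to eval, an expression such as a function call is rejected at compile time instead of being executed.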
Supported syntax
Comparisons
==, !=, >, >=, <, <=
Chained comparisons are supported and expanded internally (e.g., 0 <= goals <= 5).
flow.query("goals_home >= 2")
flow.query("0 <= goals_home <= 5")
Logical operators
and, or, not (use parentheses to control precedence)
flow.query("home_team == 'Liverpool' and goals_home > goals_away")
flow.query("not (home_team == 'Arsenal' or away_team == 'Arsenal')")
Field access (dot notation)
Access nested fields with . (resolved via get_field):
flow.query("venue.city == 'London'")
flow.query("player.stats.minutes >= 60")
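As a rough picture of what resolving a dotted path involves, the sketch below walks a path such as 'location.0' through nested dicts and lists. The helper resolve_path is hypothetical and written for illustration. It is not the library's get_field, whose behaviour for missing keys may differ.

```python
def resolve_path(record, path, default=None):
    """Walk a dotted path through nested dicts and lists.

    Numeric segments index into lists; any missing step returns `default`.
    """
    current = record
    for key in path.split("."):
        if isinstance(current, dict):
            if key not in current:
                return default
            current = current[key]
        elif (
            isinstance(current, (list, tuple))
            and key.isdigit()
            and int(key) < len(current)
        ):
            current = current[int(key)]
        else:
            return default
    return current

event = {"player": {"name": "Mohamed Salah"}, "location": [102.5, 34.0]}
print(resolve_path(event, "player.name"))
print(resolve_path(event, "location.0"))
print(resolve_path(event, "player.stats.minutes"))
```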
Membership
in, not in - note that the field must be on the left-hand side
flow.query("home_team in ['Chelsea', 'Tottenham']")
flow.query("league not in ['Premier League', 'La Liga']")
# "Man City" in home_team ← not currently supported
NULL / missing checks
Identity comparisons with None:
flow.query("player.injury_status is None")
flow.query("player.injury_status is not None")
String transforms (inside comparisons)
len(x)
.lower()
.upper()
flow.query("home_team.lower() == 'manchester united'")
flow.query("len(home_team) < 8")
These must be used in a comparison (e.g., field.lower() == 'x'). Standalone field.lower() as a predicate will raise an error.
Predicate-style string methods (standalone)
.contains(substring)
.startswith(prefix)
.endswith(suffix)
.regex(pattern, flags)
flow.query("home_team.contains('united')")
flow.query("away_team.startswith('West')")
flow.query("player.name.regex('^Mo')")
Regex flags
Pass flags from the re module via a local variable:
import re
pattern = r"liverpool"
flags = re.IGNORECASE
flow.query("home_team.regex(@pattern, @flags)")
Variables from Python (@var)
Inject values from your local scope with @var:
from datetime import date
team = "Liverpool"
cutoff = date(2023, 1, 1)
flow.query("home_team == @team and match_date >= @cutoff")
Variables must exist in the local namespace where you call .query(...).
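One plausible way to look up @var names in the caller's scope is to inspect the calling stack frame. The sketch below is an assumption about the general technique, not a description of penaltyblog's code. The helper collect_vars is hypothetical, it relies on the CPython-specific sys._getframe, and its simple regex would also match an @ inside a quoted string.

```python
import re
import sys
from datetime import date

def collect_vars(expr):
    """Map each @name in `expr` to its value in the caller's local scope."""
    caller_locals = sys._getframe(1).f_locals  # frame of whoever called us
    names = re.findall(r"@([A-Za-z_]\w*)", expr)
    missing = [name for name in names if name not in caller_locals]
    if missing:
        raise NameError(f"Undefined query variables: {missing}")
    return {name: caller_locals[name] for name in names}

def example():
    team = "Liverpool"
    cutoff = date(2023, 1, 1)
    return collect_vars("home_team == @team and match_date >= @cutoff")

print(example())
```

Binding the collected values by name, rather than pasting their text into the expression, keeps the query string itself free of user data.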
Date & datetime literals
Create date/datetime values inside the expression:
flow.query("match_date > date(2024, 6, 30)")
flow.query("kickoff >= datetime(2025, 8, 11, 19, 45)")
Supported literal constructors:
date(Y, M, D), datetime(Y, M, D, h, m, s).
Common Patterns
flow.query("""
league == 'NLD Eredivisie'
and season == '2024-2025'
and total_goals >= 3
""") # NOTE: arithmetic isn't supported directly; see tip below
# Use an assigned field for arithmetic first:
flow.assign(total_goals=lambda r: r["goals_home"] + r["goals_away"]) \
    .query("total_goals >= 3")
# Team in a rolling window (with local variable)
from datetime import date
start = date(2025, 6, 1)
flow.query("(team_home == 'PSV' or team_away == 'PSV') and match_date >= @start")
# Regex with flags
import re
pattern = r"^[A-Z][a-z]+$"
flags = re.IGNORECASE
flow.query("player.name.regex(@pattern, @flags)")
Performance notes
- Each call compiles the expression once, then applies the predicate efficiently.
- Filtering is fastest when you reduce nested lookups early (e.g., call .flatten() if appropriate).
- Use .assign(...) to precompute values you’ll filter on (e.g., total_goals), then query on that key.
Gotchas & limitations
- Arithmetic inside queries is not currently supported.
- Do the math in .assign(...) or .map(...), then filter on the derived field.
- For in / not in, the field must be on the left.
- .lower() / .upper() / len() must be used within a comparison.
- @var substitution uses the caller’s local variables only.
- Only the following literal functions are currently supported inside queries: date(...), datetime(...).
Quick reference
- Comparisons: == != > >= < <= (chained comparisons allowed)
- Logic: and · or · not
- Membership: in · not in (field on LHS)
- Nulls: is None · is not None
- Transforms (in comparisons): len(x) · .lower() · .upper()
- Predicates: .contains() · .startswith() · .endswith() · .regex()
- Literals: date(Y, M, D) · datetime(Y, M, D, h, m, s)
- Variables: @var (from caller’s locals)
Conclusions
v1.5.0 is all about speed, clarity, and control.
From the new Flow.query DSL for filtering with clean, Pythonic expressions, to the upgraded FootballProbabilityGrid that’s market-ready out of the box, to the Pitch API for building rich, interactive visualisations - every change in this release helps you move from raw data to reliable, actionable insights faster than ever.
Whether you’re exploring match data, pricing markets, or building visuals for analysis, penaltyblog v1.5.0 gives you the tools to work smarter and faster.