Introduction

The latest release of the penaltyblog Python package, v1.5.0, brings a major upgrade to your football analytics toolkit.

Highlights include:

Interactive plotting for rich, dynamic pitch visualisations.
Faster goal-model fitting with improved stability.
More accurate goal-expectancy estimation.
Powerful Flow query tools for streamlined data access.

Pitch - The Interactive Football Plotting Library

The standout feature in this release is the brand-new Pitch plotting API - a fully interactive, Plotly-powered framework for building rich football pitch visualisations in Python.

Key capabilities:

Multiple built-in pitch dimensions and themes.
Horizontal or vertical layouts.
Flexible view modes for zooming into specific areas.
Layering of scatter points, heatmaps, kernel-density surfaces, comets, arrows, and more.
Custom hover tooltips, colour schemes, opacity, and orientation controls.

You can:

Display charts inline in Jupyter notebooks.
Save as static images.
Export as standalone HTML for embedding anywhere.

In short, it’s a single, versatile tool for producing publication-ready, explorable football data visuals.

Example: Shot Map

The example below uses the new Pitch API to visualise every shot taken by Liverpool in match ID 22912 from StatsBomb.

Process:

Data is pulled directly from StatsBomb via the Flow API.
We filter for Liverpool’s shots.
We render them as an interactive Plotly chart.

Each marker shows a shot’s location, with hover tooltips displaying the player’s name and exact coordinates.

from penaltyblog.viz import Pitch
from penaltyblog.matchflow import Flow, where_equals, get_field


# 1) Pull and prep the data with Flow
def fmt(v):  # stringifier (avoids None issues)
    return "" if v is None else str(v)

flow = (
    Flow
    .statsbomb
    .events(22912, optimize=True)
    .filter(where_equals("type.name", "Shot"))
    .filter(where_equals("team.name", "Liverpool"))
    .assign(
        hover_text=lambda x: f"{fmt(get_field(x,'player.name'))}: "
                             f"({fmt(get_field(x,'location.0'))}, "
                             f"{fmt(get_field(x,'location.1'))})"
    )
)

# 2) Create the pitch
pitch = Pitch(
    provider="statsbomb",
    orientation="horizontal",
    view="full",
    theme="night",
    show_axis=False,
    show_legend=False,
    title="Liverpool shots (StatsBomb 22912)",
    subtitle="Hover for player + shot location",
)

# 3) Plot the shot map
pitch.plot_scatter(flow, "location.0", "location.1", hover="hover_text")

pitch.show()

Example: Layering Plots

One of the most powerful features of the new Pitch API is layering - the ability to stack multiple plot types in a single interactive view.

You can:

Start with a clean base pitch.
Add scatter plots for individual events.
Overlay heatmaps or kernel density estimates for spatial intensity.
Draw arrows to indicate passes or runs.
Use “comet” trails to visualise player or ball movement.

Each layer:

Is fully customisable in colour, opacity, and tooltip content.
Can be shown, hidden, or reordered at any time.
Works seamlessly together to create rich, multi-dimensional visualisations.

from penaltyblog.viz import Pitch
from penaltyblog.matchflow import Flow, where_equals, get_field

# 1) Pull and prep the data with Flow
flow = (
    Flow
    .statsbomb
    .events(22912, optimize=True)
    .filter(where_equals("type.name", "Pass"))
    .filter(where_equals("team.name", "Liverpool"))
    # "or ''" guards against events with no player attached
    .assign(
        hover_text=lambda x: f"{get_field(x, 'player.name') or ''}: "
                             f"({get_field(x, 'location.0')}, "
                             f"{get_field(x, 'location.1')})"
    )
)

# 2) Create the pitch
pitch = Pitch(
    provider="statsbomb",
    orientation="horizontal",
    view="full",
    theme="night",
    show_axis=False,
    show_legend=False,
    title="Liverpool Passes (StatsBomb 22912)",
    subtitle="Hover for player + location",
)

# 3) Plot a heatmap of pass locations as the base layer
pitch.plot_heatmap(
    flow,
    x="location.0",
    y="location.1",
    show_colorbar=True,
    colorscale="Viridis",
    opacity=0.75,
)

# 4) Overlay the raw points
pitch.plot_scatter(
    flow,
    x="location.0",
    y="location.1",
    size=4,
    color="white",
    hover="hover_text",
)

# 5) Show the result
pitch.show()

Pitch Documentation

You can explore the full range of Pitch features in the penaltyblog documentation.

This is the first release of Pitch, so there may be a few edge cases I haven’t encountered yet. If you run into any unexpected behaviour, please:

Open an issue on GitHub
Or send me a message directly via the blog

Your feedback will help refine Pitch and shape future updates.

Blazingly Fast Goal Models

Goal model fitting in penaltyblog is now 5–10x faster thanks to two major changes:

Analytical gradients (the Jacobian of the objective function) are now used during optimisation, replacing slower numerical approximations.
More core routines are implemented in Cython for low-level performance.

With exact gradient information, the optimiser converges more quickly and with greater numerical stability. In practice, you can fit models on large datasets in a fraction of the time, without sacrificing the accuracy or robustness of the original implementation.
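
To make the gradient point concrete, here is a minimal sketch (not penaltyblog's actual code) using a one-parameter Poisson likelihood. It compares a hand-derived gradient with a finite-difference approximation, which needs two extra likelihood evaluations per parameter. The function names and goal counts are made up for illustration.

```python
import math


def poisson_nll(log_rate, goals):
    """Negative log-likelihood of Poisson counts with rate exp(log_rate)."""
    rate = math.exp(log_rate)
    return sum(rate - g * log_rate + math.lgamma(g + 1) for g in goals)


def analytic_grad(log_rate, goals):
    """Exact derivative of poisson_nll with respect to log_rate."""
    rate = math.exp(log_rate)
    return sum(rate - g for g in goals)


def numeric_grad(log_rate, goals, eps=1e-6):
    """Central finite difference: two likelihood evaluations per parameter."""
    upper = poisson_nll(log_rate + eps, goals)
    lower = poisson_nll(log_rate - eps, goals)
    return (upper - lower) / (2 * eps)


goals = [0, 1, 1, 2, 3, 0, 2, 1]  # made-up goal counts
theta = 0.3

exact = analytic_grad(theta, goals)
approx = numeric_grad(theta, goals)
print("analytic:", exact, "numeric:", approx)
```

The two values agree closely, but the analytic version costs a single pass over the data and carries no step-size error. That difference grows with the number of parameters, since a numerical gradient needs extra likelihood evaluations for each one.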

More Control Over Optimisation

The updated fitting API also exposes a minimizer_options parameter, allowing you to pass arguments directly to scipy.optimize.minimize.

This means you can:

Increase maxiter for tricky fits.
Tighten gtol or ftol for higher precision (which tolerances apply depends on the SciPy solver in use).
Fine-tune convergence settings without modifying library code.

Combined with the new Cython-powered gradients, this flexibility makes it easier to adapt training to different datasets, convergence requirements, and performance constraints.
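
If you are unsure what these settings actually control, the toy gradient-descent loop below may help. It is a stand-in written for this post, not SciPy's optimiser and not penaltyblog's fitting code: toy_minimise, its step size, and the goal counts are all hypothetical. It only illustrates the roles of maxiter (iteration cap) and gtol (stop once the gradient is small enough).

```python
import math

GOALS = [0, 1, 1, 2, 3, 0, 2, 1]  # made-up goal counts


def grad(log_rate):
    """Gradient of a Poisson negative log-likelihood w.r.t. the log rate."""
    return sum(math.exp(log_rate) - g for g in GOALS)


def toy_minimise(grad_fn, x0, step=0.05, maxiter=100, gtol=1e-5):
    """Plain gradient descent with an iteration cap and a gradient tolerance."""
    x = x0
    for nit in range(maxiter):
        g = grad_fn(x)
        if abs(g) < gtol:  # converged: gradient is small enough
            return {"x": x, "nit": nit, "success": True}
        x -= step * g
    # Ran out of iterations; report whether we happened to converge anyway
    return {"x": x, "nit": maxiter, "success": abs(grad_fn(x)) < gtol}


capped = toy_minimise(grad, 0.0, maxiter=5)                # too few iterations
tight = toy_minimise(grad, 0.0, maxiter=200, gtol=1e-10)   # stricter tolerance

print("capped converged:", capped["success"])
print("tight converged:", tight["success"], "after", tight["nit"], "iterations")
```

The capped run stops before the gradient tolerance is met, which is the situation where raising maxiter helps. The tight run takes more iterations in exchange for a more precise estimate.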

Example: Dixon–Coles Model with Custom Optimisation

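import penaltyblog as pb

# gh, ga: home/away goals; th, ta: home/away team names; weights: per-match
# weights (all assumed to be prepared from your fixtures data beforehand)
model = pb.models.DixonColesGoalModel(gh, ga, th, ta, weights=weights)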

model.fit(
    use_gradient=True,                # optional; can be False for back-compat
    minimizer_options={               # optional; passes to `scipy.optimize.minimize`
        "maxiter": 3000,              # more iterations if needed
        "gtol": 1e-8,                 # gradient tolerance
        "ftol": 1e-9,                 # function tolerance
    }
)

Improved `FootballProbabilityGrid`

Every penaltyblog goals model returns a FootballProbabilityGrid from .predict(). It contains the full scoreline probability matrix for a match and makes it easy to calculate market-ready probabilities.

The latest update adds normalization, expanded markets, better Asian Handicap handling, and richer representation, all while maintaining backwards compatibility.

The new implementation includes:

1. Optional Normalization

You can now pass normalize=True (default) to .predict() to ensure the probability grid sums to exactly 1.0.

Why it matters:

Without normalization: Preserves the true Poisson/Dixon–Coles mass beyond max_goals (purist approach), but the truncated grid sums to slightly less than 1.0, so markets such as 1X2 or totals may not sum to exactly 1.0.
With normalization: Rescales probabilities so all markets are internally consistent - important for pricing, trading, and backtesting.
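
The effect is easy to reproduce by hand. The sketch below builds a truncated independent-Poisson grid and rescales it. It is an illustration rather than the library's implementation: score_grid and normalise are hypothetical helpers, max_goals is treated as inclusive here (my own convention), and it is set deliberately low so the missing mass is visible.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def score_grid(home_xg, away_xg, max_goals):
    """Scoreline grid: rows = home goals, columns = away goals, 0..max_goals."""
    return [
        [poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
         for a in range(max_goals + 1)]
        for h in range(max_goals + 1)
    ]


def normalise(grid):
    """Rescale every cell so the grid sums to 1."""
    total = sum(map(sum, grid))
    return [[p / total for p in row] for row in grid]


raw = score_grid(1.8, 1.2, max_goals=4)
raw_total = sum(map(sum, raw))

norm = normalise(raw)
norm_total = sum(map(sum, norm))

print("raw grid sums to:", raw_total)
print("normalised grid sums to:", norm_total)
```

With a realistic max_goals the gap is tiny, but it is still enough to make derived markets disagree by a fraction of a percent unless the grid is rescaled.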

For a more detailed discussion, see the documentation.

2. Expanded Market Calculations

The grid now computes more markets out of the box:

1X2: home_win, draw, away_win
Convenience: home_draw_away → [home_win, draw, away_win]
Both Teams to Score (BTTS): btts_yes, btts_no
Totals: total_goals(over_under, strike) for any goal line (e.g. 2.5, 3.0)
Asian Handicap: asian_handicap and asian_handicap_probs with push handling
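
For intuition, each of these markets is just a sum over particular cells of the scoreline matrix. The sketch below shows the idea on a toy 3x3 grid. The grid values are illustrative numbers rather than model output, and the helper functions are hypothetical stand-ins, not the FootballProbabilityGrid methods themselves.

```python
# Toy scoreline grid: rows = home goals, columns = away goals.
# Illustrative numbers only, not output from a fitted model.
GRID = [
    [0.10, 0.08, 0.04],
    [0.15, 0.14, 0.07],
    [0.17, 0.15, 0.10],
]


def cells(grid):
    """Yield (home_goals, away_goals, probability) for every scoreline."""
    for home, row in enumerate(grid):
        for away, prob in enumerate(row):
            yield home, away, prob


def one_x_two(grid):
    """Home win, draw and away win probabilities."""
    home = sum(p for h, a, p in cells(grid) if h > a)
    draw = sum(p for h, a, p in cells(grid) if h == a)
    away = sum(p for h, a, p in cells(grid) if h < a)
    return home, draw, away


def btts_yes(grid):
    """Both teams score at least once."""
    return sum(p for h, a, p in cells(grid) if h > 0 and a > 0)


def totals(grid, line):
    """Under, push and over probabilities for a total-goals line."""
    under = sum(p for h, a, p in cells(grid) if h + a < line)
    push = sum(p for h, a, p in cells(grid) if h + a == line)
    over = sum(p for h, a, p in cells(grid) if h + a > line)
    return under, push, over


def draw_no_bet_home(grid):
    """Home win probability conditional on the match not being drawn."""
    home, _, away = one_x_two(grid)
    return home / (home + away)


print("1X2:", one_x_two(GRID))
print("BTTS yes:", btts_yes(GRID))
print("Totals 2.0:", totals(GRID, 2.0))
print("Totals 2.5:", totals(GRID, 2.5))
print("DNB home:", draw_no_bet_home(GRID))
```

Note how a whole-number line such as 2.0 has a non-zero push probability, while a half line such as 2.5 never can.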

3. Push-Aware Asian Handicap Calculations

The updated asian_handicap method correctly handles:

Full win
Half win / Half loss
Push (stake returned)

It supports quarter, half, and full goal handicaps, making it possible to price complex AH markets directly from the grid - no manual matrix work required.
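
If the settlement rules are unfamiliar, the sketch below shows one way to work them out from a scoreline grid. A quarter line is treated as a stake split across the two neighbouring lines, which is where half wins and half losses come from. This is my own illustration on a toy grid: ah_outcomes and its five-way result are hypothetical, and the library's own methods may return results in a different shape.

```python
# Toy scoreline grid: rows = home goals, columns = away goals.
# Illustrative numbers only, not output from a fitted model.
GRID = [
    [0.10, 0.08, 0.04],
    [0.15, 0.14, 0.07],
    [0.17, 0.15, 0.10],
]


def settle(margin):
    """Settle one leg of the bet from the handicap-adjusted goal margin."""
    if margin > 0:
        return "win"
    if margin < 0:
        return "lose"
    return "push"


def ah_outcomes(grid, side, line):
    """Probability of each settlement outcome for `side` at handicap `line`."""
    # A quarter line (e.g. -0.25) splits the stake across the two adjacent lines
    is_quarter = round(line * 4) % 2 != 0
    legs = (line - 0.25, line + 0.25) if is_quarter else (line, line)

    out = {"win": 0.0, "half_win": 0.0, "push": 0.0, "half_loss": 0.0, "lose": 0.0}
    for home, row in enumerate(grid):
        for away, prob in enumerate(row):
            diff = home - away if side == "home" else away - home
            results = {settle(diff + leg) for leg in legs}
            if results == {"win", "push"}:
                out["half_win"] += prob
            elif results == {"lose", "push"}:
                out["half_loss"] += prob
            else:
                out[results.pop()] += prob  # both legs settled the same way
    return out


print("Home -0.5 :", ah_outcomes(GRID, "home", -0.5))
print("Home -0.25:", ah_outcomes(GRID, "home", -0.25))
print("Away +1.0 :", ah_outcomes(GRID, "away", 1.0))
```

On this grid, a draw turns the home -0.25 bet into a half loss, while the away +1.0 bet pushes whenever the home side wins by exactly one goal.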

4. Richer Representation

Calling repr() now shows:

Model name
Home/Away goal expectations
Core 1X2 probabilities

Great for quick inspection in interactive sessions.

5. Backwards Compatibility

All previous methods still work.
normalize defaults to True for consistency, but you can disable it for raw probabilities.

Example: Predicting & Using the Grid

# Fit your goals model
model.fit()
pred = model.predict("Arsenal", "Chelsea", max_goals=15, normalize=True)

# Core markets (1x2)
print("P(Home win), P(Draw), P(Away win):", pred.home_draw_away)
print("P(Home win):", pred.home_win)
print("P(Draw):", pred.draw)
print("P(Away win):", pred.away_win)

# Goal expectancy
print("Home xG:", pred.home_goal_expectation)
print("Away xG:", pred.away_goal_expectation)

# Both teams to score
print("BTTS (Yes):", pred.btts_yes)
print("BTTS (No):", pred.btts_no)

# Totals
u, p, o = pred.totals(2.0)
print("Totals 2.0  -> Under, Push, Over:", (u, p, o))

u, p, o = pred.totals(2.5)
print("Totals 2.5  -> Under, Push, Over:", (u, p, o))

print("P(Over 2.5):", pred.total_goals("over", 2.5))

# Asian Handicaps
print("AH Home -0.5  (win prob only):", pred.asian_handicap("home", -0.5))
print("AH Home -0.25 (Win/Push/Lose):", pred.asian_handicap_probs("home", -0.25))
print("AH Away +1.0  (Win/Push/Lose):", pred.asian_handicap_probs("away", +1.0))

# Double chance and DNB
print("Double chance 1X:", pred.double_chance_1x)
print("Double chance X2:", pred.double_chance_x2)
print("Double chance 12:", pred.double_chance_12)
print("DNB Home (conditional win prob):", pred.draw_no_bet_home)
print("DNB Away (conditional win prob):", pred.draw_no_bet_away)

# Exact scores and distributions
print("P(Exact score 2-1):", pred.exact_score(2, 1))
print("Home goal distribution (P(H=k)):", pred.home_goal_distribution())
print("Away goal distribution (P(A=k)):", pred.away_goal_distribution())
print("Total goals distribution (P(T=k)):", pred.total_goals_distribution())

With the improved FootballProbabilityGrid, you can go from a fitted model to fully priced betting markets in one step - no direct NumPy work required.

Flow Query DSL

Flow.query(expr) lets you filter records with a compact, readable string expression. It’s safe (parsed via Python’s AST, not eval), validated, and compiled into an efficient predicate.

Use it to:

Prototype filters quickly.
Keep pipelines readable.
Let end users define filters without writing Python.

How It Works

The query string is parsed into an AST.
The AST is validated for safety.
It’s compiled into a fast predicate function.
Variables from your caller’s local scope can be injected with @var.
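
As a rough illustration of that pipeline, here is a stripped-down sketch built on Python's ast module. It is not the Flow.query implementation: compile_query is a hypothetical name, it only handles comparisons and and/or/not on flat records, and variables are passed in explicitly rather than read from the caller's scope.

```python
import ast
import operator
import re

_COMPARATORS = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Lt: operator.lt, ast.LtE: operator.le,
}


def compile_query(expr, variables=None):
    """Parse expr, reject anything off the whitelist, return a predicate."""
    # "@name" is not valid Python, so rewrite it to a reserved identifier first
    # (a real parser would also need to skip "@" inside string literals).
    source = re.sub(r"@(\w+)", r"__var_\1", expr.strip())
    tree = ast.parse(source, mode="eval")
    return _build(tree.body, variables or {})


def _build(node, variables):
    """Turn a boolean-valued AST node into a function of one record."""
    if isinstance(node, ast.BoolOp):
        parts = [_build(value, variables) for value in node.values]
        combine = all if isinstance(node.op, ast.And) else any
        return lambda rec: combine(part(rec) for part in parts)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        inner = _build(node.operand, variables)
        return lambda rec: not inner(rec)
    if isinstance(node, ast.Compare):
        if any(type(op) not in _COMPARATORS for op in node.ops):
            raise ValueError("unsupported comparison operator")
        ops = [_COMPARATORS[type(op)] for op in node.ops]
        operands = [_value(n, variables) for n in [node.left, *node.comparators]]

        def predicate(rec):
            vals = [get(rec) for get in operands]
            # A chained comparison a <= b <= c becomes (a <= b) and (b <= c)
            return all(op(lhs, rhs) for op, lhs, rhs in zip(ops, vals, vals[1:]))

        return predicate
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def _value(node, variables):
    """Turn a value-producing AST node into a function of one record."""
    if isinstance(node, ast.Constant):
        return lambda rec: node.value
    if isinstance(node, ast.Name):
        if node.id.startswith("__var_"):
            value = variables[node.id[len("__var_"):]]
            return lambda rec: value
        return lambda rec: rec.get(node.id)
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


is_match = compile_query(
    "home_team == @team and 0 <= goals_home <= 5",
    variables={"team": "Liverpool"},
)
print(is_match({"home_team": "Liverpool", "goals_home": 3}))
print(is_match({"home_team": "Everton", "goals_home": 3}))
```

Because only whitelisted node types are ever turned into code, something like a function call to `__import__` is rejected at compile time instead of being executed, which is the key advantage over eval.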

Supported syntax

Comparisons

==, !=, >, >=, <, <=

Chained comparisons are supported and expanded internally (e.g., 0 <= goals <= 5).

flow.query("goals_home >= 2")
flow.query("0 <= goals_home <= 5")

Logical operators

and, or, not (use parentheses to control precedence)

flow.query("home_team == 'Liverpool' and goals_home > goals_away")
flow.query("not (home_team == 'Arsenal' or away_team == 'Arsenal')")

Field access (dot notation)

Access nested fields with . (resolved via get_field):

flow.query("venue.city == 'London'")
flow.query("player.stats.minutes >= 60")
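
Conceptually, resolving a dotted path means walking the record one key at a time. The helper below is a hypothetical stand-in for that idea, not penaltyblog's get_field. It treats purely numeric parts as list indices (as in 'location.0') and returns a default when any step is missing. The sample record is made up.

```python
_MISSING = object()


def resolve_path(record, path, default=None):
    """Walk a dotted path through nested dicts and lists."""
    current = record
    for part in path.split("."):
        if isinstance(current, dict):
            current = current.get(part, _MISSING)
        elif (
            isinstance(current, (list, tuple))
            and part.isdigit()
            and int(part) < len(current)
        ):
            current = current[int(part)]  # numeric part indexes into a list
        else:
            return default
        if current is _MISSING:
            return default
    return current


# Made-up event record for illustration
event = {
    "player": {"name": "Example Player", "stats": {"minutes": 74}},
    "location": [102.5, 34.0],
}

print(resolve_path(event, "player.stats.minutes"))
print(resolve_path(event, "location.0"))
print(resolve_path(event, "venue.city"))
```

Returning a default for missing paths, instead of raising, keeps a filter from failing on the first record that lacks an optional field.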

Membership

in, not in - note that the field must be on the left-hand side

flow.query("home_team in ['Chelsea', 'Tottenham']")
flow.query("league not in ['Premier League', 'La Liga']")
# "Man City" in home_team  ← not currently supported

NULL / missing checks

Identity comparisons with None:

flow.query("player.injury_status is None")
flow.query("player.injury_status is not None")

String transforms (inside comparisons)

len(x)
.lower()
.upper()

flow.query("home_team.lower() == 'manchester united'")
flow.query("len(home_team) < 8")

These must be used in a comparison (e.g., field.lower() == 'x'). Standalone field.lower() as a predicate will raise an error.

Predicate-style string methods (standalone)

.contains(substring)
.startswith(prefix)
.endswith(suffix)
.regex(pattern, flags)

flow.query("home_team.contains('united')")
flow.query("away_team.startswith('West')")
flow.query("player.name.regex('^Mo')")

Regex flags

Pass flags from the re module via a local variable:

import re

pattern = r"liverpool"
flags = re.IGNORECASE
flow.query("home_team.regex(@pattern, @flags)")

Variables from Python (`@var`)

Inject values from your local scope with @var:

from datetime import date

team = "Liverpool"
cutoff = date(2023, 1, 1)

flow.query("home_team == @team and match_date >= @cutoff")

Variables must exist in the local namespace where you call .query(...).
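
One way a library can see the caller's variables is by inspecting the calling stack frame. The sketch below shows that general technique with the standard inspect module. It is an assumption about the mechanism, not a description of penaltyblog's internals, and resolve_vars is a hypothetical helper.

```python
import inspect
import re


def resolve_vars(expr):
    """Return {name: value} for every @name in expr, read from the caller's locals."""
    caller = inspect.currentframe().f_back  # the frame that called us
    try:
        caller_locals = caller.f_locals
        names = re.findall(r"@(\w+)", expr)
        missing = [name for name in names if name not in caller_locals]
        if missing:
            raise NameError(f"undefined query variables: {missing}")
        return {name: caller_locals[name] for name in names}
    finally:
        del caller  # avoid keeping a reference cycle to the frame alive


def build_filter():
    team = "Liverpool"
    min_goals = 2
    return resolve_vars("home_team == @team and goals_home >= @min_goals")


print(build_filter())
```

This also shows why the variable has to live in the scope where the query is called: a name defined in some other function's frame is simply not visible to the lookup.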

Date & datetime literals

Create date/datetime values inside the expression:

flow.query("match_date > date(2024, 6, 30)")
flow.query("kickoff >= datetime(2025, 8, 11, 19, 45)")

Supported literal constructors: date(Y, M, D), datetime(Y, M, D, h, m, s).

Common Patterns

flow.query("""
    league == 'NLD Eredivisie'
    and season == '2024-2025'
    and total_goals >= 3
""")  # NOTE: arithmetic isn't supported directly; see tip below

# Use an assigned field for arithmetic first:
(
    flow
    .assign(total_goals=lambda r: r["goals_home"] + r["goals_away"])
    .query("total_goals >= 3")
)

# Matches involving a team since a start date (with local variable)
from datetime import date

start = date(2025, 6, 1)
flow.query("(team_home == 'PSV' or team_away == 'PSV') and match_date >= @start")

# Regex with flags
import re

pattern = r"^[A-Z][a-z]+$"
flags = re.IGNORECASE
flow.query("player.name.regex(@pattern, @flags)")

Performance notes

Each call compiles the expression once, then applies the predicate efficiently.
Filtering is fastest when you reduce nested lookups early (e.g., call .flatten() if appropriate).
Use .assign(...) to precompute values you’ll filter on (e.g., total_goals), then query on that key.

Gotchas & limitations

Arithmetic inside queries is not currently supported.
Do the math in .assign(...) or .map(...), then filter on the derived field.
For in / not in, the field must be on the left.
.lower() / .upper() / len() must be used within a comparison.
@var substitution uses the caller’s local variables only.
Only the following literal functions are currently supported inside queries: date(...), datetime(...).

Quick reference

Comparisons: == != > >= < <= (chained comparisons allowed)
Logic: and · or · not
Membership: in · not in (field on LHS)
Nulls: is None · is not None
Transforms (in comparisons): len(x) · .lower() · .upper()
Predicates: .contains() · .startswith() · .endswith() · .regex()
Literals: date(Y, M, D) · datetime(Y, M, D, h, m, s)
Variables: @var (from caller’s locals)

Conclusions

v1.5.0 is all about speed, clarity, and control.

From the new Flow.query DSL for filtering with clean, Pythonic expressions, to the upgraded FootballProbabilityGrid that’s market-ready out of the box, to the Pitch API for building rich, interactive visualisations - every change in this release helps you move from raw data to reliable, actionable insights faster than ever.

Whether you’re exploring match data, pricing markets, or building visuals for analysis, penaltyblog v1.5.0 gives you the tools to work smarter and faster.

Penaltyblog v1.5.0: Faster Models, Smarter Queries, and a Sharper Edge