Introduction
Fantasy Premier League is back for the new season, so I thought I’d run through an example of how linear programming can help you select your team. If you haven’t come across linear programming before, it’s a mathematical optimisation technique that can be used to maximise the total number of points your team is worth within a set of constraints, e.g. staying within budget and not signing too many players from the same team.
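To make the idea concrete, here is a minimal sketch with made-up numbers. The four players, their points and their prices are all hypothetical. The goal is to pick exactly two players, spend at most £10m and maximise total points. It uses the same `lp()` function from the lpSolve package that the script below relies on.

```r
library(lpSolve)

# Hypothetical toy problem: four players, pick exactly two,
# spend at most 10 (£m), maximise total points
points <- c(120, 150, 90, 200)
cost   <- c(5, 6, 4, 7)

const_mat <- rbind(rep(1, 4),  # row 1: number of players picked
                   cost)       # row 2: total cost
const_dir <- c("=", "<=")
const_rhs <- c(2, 10)

# all.bin = TRUE makes each decision variable 0 (not picked) or 1 (picked)
sol <- lp("max", points, const_mat, const_dir, const_rhs, all.bin = TRUE)
sol$solution
sol$objval
```

The solver picks players 2 and 3 for 240 points. The highest scorer (200 points, £7m) is left out because no second player fits alongside them within the £10m budget. A "pick the best player first" approach gets exactly this kind of trade-off wrong.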
Collecting The Data
The first thing we need to do is scrape some data to optimise our team with, so let’s fire up R. We need the names of all the available players, the team each one plays for, how much they cost to sign and, most importantly, how many points they are worth. Conveniently, we can exploit the structure of the Premier League’s website to get the data, using it as a pseudo API.
DISCLAIMER: there is a fine line between scraping someone’s website and creating a denial-of-service attack, so make sure you spread out your calls to the website. Trying to scrape all the data in quick succession can put unnecessary strain on the site’s servers. If you scrape somebody’s data, please make sure you do it in a way that does not affect the service they are providing!
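One way to put this into practice is to wrap the fetch function so that consecutive requests are always at least a fixed number of seconds apart. The helper below, `make_throttled`, is hypothetical and not part of the original script. Unlike a plain `Sys.sleep()` before every call, it only waits for whatever remains of the gap since the previous request.

```r
# Hypothetical helper (not part of the original script): wraps any fetch
# function so that consecutive calls are at least `min_gap` seconds apart
make_throttled <- function(fetch, min_gap = 2.5) {
  last_call <- -Inf
  function(...) {
    # Sleep only for whatever is left of the gap since the previous call
    wait <- min_gap - (as.numeric(Sys.time()) - last_call)
    if (wait > 0) Sys.sleep(wait)
    last_call <<- as.numeric(Sys.time())
    fetch(...)
  }
}
```

In the script below you could, for example, create `polite_get <- make_throttled(getURL, min_gap = 2.5)` and call `polite_get(url)` in place of `getURL(url)`.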
#load libraries
library(lpSolve)
library(stringr)
library(RCurl)
library(jsonlite)
library(plyr)
# scrape the data
df = ldply(1:521, function(x){
# Scrape responsibly kids, we don't want to ddos
# the Fantasy Premier League's website
Sys.sleep(2.5)
url = sprintf("http://fantasy.premierleague.com/web/api/elements/%s/?format=json", x)
json = fromJSON(getURL(url))
# now_cost is stored in tenths of a million, so convert it to £m
json$now_cost = json$now_cost / 10
# keep only the fields we need for the optimisation
data.frame(json[names(json) %in%
c('web_name', 'team_name', 'type_name', 'now_cost', 'total_points')])
})
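For reference, here is a sketch of what the per-player step does, run offline on two hand-made records. The records are hypothetical. Their field names mirror the ones the script keeps, the values come from the results table further down, the extra `id` field stands in for anything we discard, and prices are assumed to arrive in tenths of a million (hence the division by 10). Base R's `do.call(rbind, ...)` stands in for plyr's `ldply` to stack the rows.

```r
keep <- c("web_name", "team_name", "type_name", "now_cost", "total_points")

# Turn one parsed record into a one-row data frame
to_row <- function(json) {
  json$now_cost <- json$now_cost / 10   # assumed tenths of a million -> £m
  as.data.frame(json[names(json) %in% keep], stringsAsFactors = FALSE)
}

# Hypothetical records shaped like the parsed JSON; `id` is a field we drop
howard  <- list(id = 1, web_name = "Howard", team_name = "Everton",
                type_name = "Goalkeeper", now_cost = 55, total_points = 160)
speroni <- list(id = 2, web_name = "Speroni", team_name = "Crystal Palace",
                type_name = "Goalkeeper", now_cost = 50, total_points = 144)

# Stack the rows, much as ldply does in the script above
players <- do.call(rbind, lapply(list(howard, speroni), to_row))
players
```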
Constraints
Now that we have the data, we need to think about the constraints to build into the linear system. We can spend at most £100 million, we cannot have more than three players from the same team, and the squad must contain exactly two goalkeepers, five defenders, five midfielders and three forwards.
#Create the constraints
num_gk = 2
num_def = 5
num_mid = 5
num_fwd = 3
max_cost = 100
# Create vectors to constrain by position
df$Goalkeeper = ifelse(df$type_name == "Goalkeeper", 1, 0)
df$Defender = ifelse(df$type_name == "Defender", 1, 0)
df$Midfielder = ifelse(df$type_name == "Midfielder", 1, 0)
df$Forward = ifelse(df$type_name == "Forward", 1, 0)
# Create vector to constrain by max number of players allowed per team
team_constraint = unlist(lapply(unique(df$team_name), function(x, df){
ifelse(df$team_name==x, 1, 0)
}, df=df))
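# next we need the constraint directions: "=" for the four positions,
# then "<=" for the budget and for each team
const_dir <- c("=", "=", "=", "=", rep("<=", 1 + length(unique(df$team_name))))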
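To see how the per-team rows come out, the sketch below runs the same `team_constraint` construction on a hypothetical four-player, three-club data frame. Because the indicator vectors are concatenated team by team, filling a matrix with `byrow = TRUE` gives one constraint row per club. It also shows the rule the direction vector has to follow: four `=` rows for the positions, then one `<=` row for the budget plus one per team (21 in total for a 20-club league).

```r
# Hypothetical toy data: four players from three clubs
df <- data.frame(team_name = c("Arsenal", "Everton", "Arsenal", "Chelsea"),
                 stringsAsFactors = FALSE)
teams <- unique(df$team_name)

# Same construction as the script: one 0/1 indicator vector per team, concatenated
team_constraint <- unlist(lapply(teams, function(x, df) {
  ifelse(df$team_name == x, 1, 0)
}, df = df))

# Filled row by row, each team becomes one constraint row (columns = players)
team_rows <- matrix(team_constraint, nrow = length(teams), byrow = TRUE,
                    dimnames = list(teams, NULL))
team_rows
# rows: Arsenal = 1 0 1 0, Everton = 0 1 0 0, Chelsea = 0 0 0 1

# Directions: 4 position rows, then 1 budget row plus 1 row per team
const_dir <- c(rep("=", 4), rep("<=", 1 + length(teams)))
```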
The Objective
We also need to create the vector defining our objective, which is to maximise the number of points the team is worth within the constraints we are setting.
# The vector to optimize against
objective = df$total_points
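Written out in full, the objective and the constraints from the previous section form a 0-1 integer program. Here $x_i = 1$ if player $i$ is selected and $0$ otherwise, $p_i$ is the player's `total_points`, $c_i$ is their `now_cost` in £m, and $n$ is the number of players scraped:

```latex
\begin{aligned}
\max_{x} \quad & \sum_{i=1}^{n} p_i x_i \\
\text{s.t.} \quad
  & \sum_{i:\,\mathrm{pos}_i = \mathrm{GK}} x_i = 2, \quad
    \sum_{i:\,\mathrm{pos}_i = \mathrm{DEF}} x_i = 5, \quad
    \sum_{i:\,\mathrm{pos}_i = \mathrm{MID}} x_i = 5, \quad
    \sum_{i:\,\mathrm{pos}_i = \mathrm{FWD}} x_i = 3, \\
  & \sum_{i=1}^{n} c_i x_i \le 100, \\
  & \sum_{i:\,\mathrm{team}_i = t} x_i \le 3 \quad \text{for each team } t, \\
  & x_i \in \{0, 1\}, \quad i = 1, \dots, n.
\end{aligned}
```

Each line maps onto the R code: the first line is `objective`, the four position sums are the `df$Goalkeeper` to `df$Forward` rows with `"="`, the cost sum is the `df$now_cost` row, and the per-team sums are the `team_constraint` rows.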
Solving The Matrix
Finally, we put all the constraints into a matrix and let R solve the linear system to create our mathematically optimised team selection.
# Put the complete matrix together
const_mat = matrix(c(df$Goalkeeper, df$Defender, df$Midfielder, df$Forward,
df$now_cost, team_constraint),
nrow=(5 + length(unique(df$team_name))),
byrow=TRUE)
const_rhs = c(num_gk, num_def, num_mid, num_fwd, max_cost, rep(3, length(unique(df$team_name))))
# And solve the linear system (all.bin=TRUE makes every decision variable 0 or 1)
x = lp("max", objective, const_mat, const_dir, const_rhs, all.bin=TRUE)
print(arrange(df[which(x$solution==1),], desc(Goalkeeper), desc(Defender), desc(Midfielder), desc(Forward), desc(total_points)))
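Because the scraper depends on a web API, here is a sketch that runs the same model end to end on a made-up pool of 60 players from six clubs, then checks the selected squad against every rule. All names, points and prices are invented. At least five clubs are needed, because 15 players with at most three per club cannot come from fewer, and the prices are chosen so that a valid squad fits under £100m. The matrix is built with `rbind()` and `sapply()` instead of a flat vector plus `nrow`. It encodes the same constraints, just without counting rows by hand.

```r
library(lpSolve)

# Made-up player pool: 6 clubs x 10 players
pos_per_team <- c("Goalkeeper", "Defender", "Defender", "Defender",
                  "Midfielder", "Midfielder", "Midfielder",
                  "Forward", "Forward", "Goalkeeper")
i <- 1:60
df <- data.frame(team_name = rep(paste("Team", LETTERS[1:6]), each = 10),
                 type_name = rep(pos_per_team, times = 6),
                 total_points = 50 + (i * 37) %% 150,  # invented points
                 now_cost = 4 + (i %% 7) / 2,          # invented prices, £4m to £7m
                 stringsAsFactors = FALSE)

positions <- c("Goalkeeper", "Defender", "Midfielder", "Forward")
teams <- unique(df$team_name)

# One row per constraint, one column per player
const_mat <- rbind(
  t(sapply(positions, function(p) as.numeric(df$type_name == p))),
  df$now_cost,
  t(sapply(teams, function(tm) as.numeric(df$team_name == tm))))
const_dir <- c(rep("=", 4), rep("<=", 1 + length(teams)))
const_rhs <- c(2, 5, 5, 3, 100, rep(3, length(teams)))

x <- lp("max", df$total_points, const_mat, const_dir, const_rhs, all.bin = TRUE)
squad <- df[x$solution > 0.5, ]

# Sanity checks against the rules
table(squad$type_name)
sum(squad$now_cost)
max(table(squad$team_name))
```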
The Results
The team the linear solver selected is shown in the table below. It is the squad with the highest possible points total (based on last season’s scores) that satisfies all of the constraints above.
Position | Team | Points | Name | Cost (£m) |
Goalkeeper | Everton | 160 | Howard | 5.5 |
Goalkeeper | Crystal Palace | 144 | Speroni | 5 |
Defender | Everton | 180 | Coleman | 7 |
Defender | Chelsea | 172 | Terry | 6.5 |
Defender | Arsenal | 157 | Mertesacker | 6 |
Defender | Arsenal | 155 | Koscielny | 6 |
Defender | Southampton | 149 | Fonte | 5.5 |
Midfielder | Man City | 241 | Yaya Touré | 11 |
Midfielder | Liverpool | 205 | Gerrard | 9 |
Midfielder | Crystal Palace | 131 | Puncheon | 6 |
Midfielder | Stoke | 126 | Sidwell | 5.5 |
Midfielder | West Ham | 125 | Noble | 5.5 |
Forward | Arsenal | 187 | Giroud | 8.5 |
Forward | Liverpool | 179 | Lambert | 7.5 |
Forward | Aston Villa | 106 | Weimann | 5.5 |
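As a quick sanity check, the squad from the table can be typed back into R and tested against the rules. The spend comes to exactly £100m, Arsenal supply the maximum of three players, and the totals by position show how the budget is divided. All values below are copied from the table.

```r
# Squad from the table above (cost in £m)
squad <- data.frame(
  position = rep(c("Goalkeeper", "Defender", "Midfielder", "Forward"),
                 c(2, 5, 5, 3)),
  team = c("Everton", "Crystal Palace",
           "Everton", "Chelsea", "Arsenal", "Arsenal", "Southampton",
           "Man City", "Liverpool", "Crystal Palace", "Stoke", "West Ham",
           "Arsenal", "Liverpool", "Aston Villa"),
  points = c(160, 144, 180, 172, 157, 155, 149,
             241, 205, 131, 126, 125, 187, 179, 106),
  cost = c(5.5, 5, 7, 6.5, 6, 6, 5.5, 11, 9, 6, 5.5, 5.5, 8.5, 7.5, 5.5),
  stringsAsFactors = FALSE)

sum(squad$cost)                          # 100
sum(squad$points)                        # 2417
max(table(squad$team))                   # 3
tapply(squad$cost, squad$position, sum)  # Defender 31, Forward 21.5, Goalkeeper 10.5, Midfielder 37
```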
Limitations
Now, before the internet gets grumpy and starts trolling me (whenever I’ve mentioned using mathematics for fantasy football, people seem to get very irate), there are a few obvious limitations worth pointing out. First of all, the new season hasn’t started yet, so I’m using the points totals from last season. This means all the players at the promoted teams, and any players new to the Premier League, will have zero points and so will not get selected. However, I’m planning to run this script regularly throughout the coming season to help guide my transfers, so as these players build up points they will start to be selected by the linear solver if they perform well enough.
Also, we’ve set the constraints to optimise the whole squad. You may prefer to spend most of your money on the best possible first eleven and go for budget substitutes instead. For example, the table below shows what happens if you optimise for eleven players in a 1-3-4-3 formation (one goalkeeper, three defenders, four midfielders and three forwards) at a total price of £82 million. That leaves exactly enough to buy four substitutes at £4.5 million each (4 × £4.5m = £18m).
Position | Team | Points | Name | Cost (£m) |
Goalkeeper | Everton | 160 | Howard | 5.5 |
Defender | Everton | 180 | Coleman | 7 |
Defender | Chelsea | 172 | Terry | 6.5 |
Defender | Southampton | 149 | Fonte | 5.5 |
Midfielder | Man City | 241 | Yaya Touré | 11 |
Midfielder | Liverpool | 205 | Gerrard | 9 |
Midfielder | Liverpool | 178 | Lallana | 8.5 |
Midfielder | Stoke | 126 | Sidwell | 5.5 |
Forward | Arsenal | 187 | Giroud | 8.5 |
Forward | Liverpool | 179 | Lambert | 7.5 |
Forward | Southampton | 152 | Rodriguez | 7.5 |
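Presumably the first-eleven variant only changes the input values at the top of the script. The sketch below shows those values, which are an assumption about how the run was set up, and checks the table above against them. The £82m budget is simply £100m minus four £4.5m substitutes.

```r
# Assumed inputs for the first-eleven variant (1-3-4-3); the rest of the
# script (const_rhs, the constraint matrix and the lp() call) is re-run as is
num_gk = 1
num_def = 3
num_mid = 4
num_fwd = 3
max_cost = 100 - 4 * 4.5   # keep £18m back for four £4.5m substitutes

# Check the eleven from the table above
xi_cost <- c(5.5, 7, 6.5, 5.5, 11, 9, 8.5, 5.5, 8.5, 7.5, 7.5)
xi_team <- c("Everton", "Everton", "Chelsea", "Southampton",
             "Man City", "Liverpool", "Liverpool", "Stoke",
             "Arsenal", "Liverpool", "Southampton")
max_cost             # 82
sum(xi_cost)         # 82
max(table(xi_team))  # 3 (Liverpool)
```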
What’s interesting (to me at least) is that the cost of the players is fairly evenly spread across the team. When I select my fantasy football teams, I tend to splash the cash on big-name strikers and then go for cheap defenders. Based on these results, though, that looks like a bad decision, so this season I’m going to follow the data and actually sign some decent defenders. Wish me luck…
Appendix
All code is available on GitHub