Premier League Fantasy Football is back and ready for the new season, so I thought I’d run through an example of how linear programming can help you select your team. If you haven’t come across linear programming before, it’s a mathematical optimisation technique that can be used to maximise the total number of points your team is worth within a set of constraints, e.g. staying within budget and not signing too many players from the same team.
The first thing we need to do is scrape some data to optimise our team with, so let’s fire up R. We need the names of all the available players, the team each one plays for, how much they cost to sign and, most importantly, how many points they are worth. Conveniently, we can exploit the structure of the Premier League’s website to get the data and use it as a pseudo-API.
DISCLAIMER: there is a fine line between scraping someone’s website and creating a denial-of-service attack, so make sure you spread out your calls to the website. Trying to scrape all the data in quick succession can put unnecessary strain on the site’s servers. If you scrape somebody’s data, please ensure you do it in a way that does not impact the service they are providing!
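```r
# Load libraries
library(lpSolve)   # lp(): integer linear programming solver
library(stringr)
library(RCurl)     # getURL(): HTTP requests
library(jsonlite)  # fromJSON(): parse the JSON responses
library(plyr)      # ldply() and arrange()

# Scrape the data, one request per player ID
df = ldply(1:521, function(x){
  # Scrape responsibly kids, we don't want to DDoS
  # the Fantasy Premier League's website
  Sys.sleep(2.5)
  url = sprintf("http://fantasy.premierleague.com/web/api/elements/%s/?format=json", x)
  json = fromJSON(getURL(url))
  # Prices are stored in units of £0.1 million, so 55 becomes 5.5
  json$now_cost = json$now_cost / 10
  # Keep only the fields we need for the optimisation
  data.frame(json[names(json) %in%
    c('web_name', 'team_name', 'type_name', 'now_cost', 'total_points')])
})
```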
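The loop above pauses between requests, but if any single request fails or returns something that isn’t valid JSON (for example, an ID with no player behind it), the resulting error would stop the whole `ldply()` call. Below is a minimal base R sketch of a more forgiving loop, assuming you pass in your own `fetch` function (such as the body of the anonymous function above). `polite_fetch_all()` and `fake_fetch()` are hypothetical helpers for illustration, not part of any package.

```r
# Hypothetical helper (not from any package): call fetch(id) for each id,
# pausing between requests and skipping ids whose request fails.
polite_fetch_all <- function(ids, fetch, delay = 2.5) {
  results <- lapply(seq_along(ids), function(i) {
    if (i > 1) Sys.sleep(delay)  # spread requests out over time
    # Any error (network failure, invalid JSON, ...) becomes NULL
    tryCatch(fetch(ids[i]), error = function(e) NULL)
  })
  # Drop the failed requests and stack the rest into one data frame
  do.call(rbind, Filter(Negate(is.null), results))
}

# Offline usage example with a hypothetical fake fetch: ID 2 "fails"
fake_fetch <- function(id) {
  if (id == 2) stop("no player with this ID")
  data.frame(id = id, total_points = id * 10)
}
out <- polite_fetch_all(1:3, fake_fetch, delay = 0)
print(out)  # two rows, for IDs 1 and 3
```

Using `lapply()` rather than assigning into a pre-allocated list keeps the `NULL` placeholders for failed IDs in place until `Filter()` removes them.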
Now that we have the data, we need to think about the constraints we will have to build into the linear system. We can spend a maximum of £100 million, we cannot have more than three players from the same team, and the squad must contain exactly two goalkeepers, five defenders, five midfielders and three forwards.
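Written out, the selection problem is a 0-1 integer linear program. In the notation introduced here (it does not appear in the code), $x_i$ is 1 if player $i$ is picked and 0 otherwise, $p_i$ is their points total, $c_i$ their cost in £ millions, $G$, $D$, $M$ and $F$ are the sets of goalkeepers, defenders, midfielders and forwards, and $T_t$ is the set of players at team $t$:

$$
\begin{aligned}
\max_{x}\quad & \sum_{i} p_i x_i \\
\text{subject to}\quad & \sum_{i \in G} x_i = 2,\quad \sum_{i \in D} x_i = 5,\quad \sum_{i \in M} x_i = 5,\quad \sum_{i \in F} x_i = 3, \\
& \sum_{i} c_i x_i \le 100, \\
& \sum_{i \in T_t} x_i \le 3 \quad \text{for every team } t, \\
& x_i \in \{0, 1\} \quad \text{for every player } i.
\end{aligned}
$$

Each line of constraints becomes one or more rows of the constraint matrix built in the code below, and the objective is the vector of points totals.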
```r
# Create the constraints
num_gk = 2
num_def = 5
num_mid = 5
num_fwd = 3
max_cost = 100
# Create 0/1 vectors to constrain by position
df$Goalkeeper = ifelse(df$type_name == "Goalkeeper", 1, 0)
df$Defender = ifelse(df$type_name == "Defender", 1, 0)
df$Midfielder = ifelse(df$type_name == "Midfielder", 1, 0)
df$Forward = ifelse(df$type_name == "Forward", 1, 0)
# Create vector to constrain by max number of players allowed per team:
# one 0/1 indicator vector per team, concatenated end to end
team_constraint = unlist(lapply(unique(df$team_name), function(x, df){
  ifelse(df$team_name==x, 1, 0)
}, df=df))
# Next we need the constraint directions: "=" for the four positions,
# then "<=" for the budget and for each team
const_dir <- c("=", "=", "=", "=", rep("<=", 1 + length(unique(df$team_name))))
```
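Because `team_constraint` is every team’s indicator vector glued end to end, it only makes sense once it is reshaped into a matrix. The small illustration below, on four made-up players (illustrative names and teams, not real data), uses the same `ifelse()` recipe and the `byrow = TRUE` layout the full constraint matrix relies on: one row per team, one column per player.

```r
# Four made-up players (illustrative data, not real FPL numbers)
toy <- data.frame(
  web_name  = c("A", "B", "C", "D"),
  team_name = c("Arsenal", "Chelsea", "Arsenal", "Everton"),
  stringsAsFactors = FALSE
)
teams <- unique(toy$team_name)
# Same recipe as team_constraint: one 0/1 vector per team, concatenated
toy_team_constraint <- unlist(lapply(teams, function(x, df) {
  ifelse(df$team_name == x, 1, 0)
}, df = toy))
# byrow = TRUE turns each team's block into one row; columns are players
team_rows <- matrix(toy_team_constraint, nrow = length(teams), byrow = TRUE,
                    dimnames = list(teams, toy$web_name))
print(team_rows)  # Arsenal row is 1 0 1 0: players A and C

# Multiplying by a 0/1 selection vector counts picks per team,
# which is the quantity the "<= 3" constraint caps
picked <- c(1, 1, 1, 0)  # select A, B and C
print(team_rows %*% picked)  # Arsenal 2, Chelsea 1, Everton 0
```

The same reasoning applies to the position rows and the cost row: each row times the selection vector gives a count (or a total spend) that the solver compares against the right-hand side.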
We also need to create the vector defining our objective, which is to maximise the number of points the team is worth within the constraints we are setting.
```r
# The vector to optimise against: each player's points total
objective = df$total_points
```
Finally, we put all the constraints into a matrix and let R solve the linear system to create our mathematically optimised team selection.
```r
# Put the complete matrix together: one row per constraint,
# one column per player
const_mat = matrix(c(df$Goalkeeper, df$Defender, df$Midfielder, df$Forward,
                     df$now_cost, team_constraint),
                   nrow=(5 + length(unique(df$team_name))),
                   byrow=TRUE)
# Right-hand sides: squad size per position, the budget,
# and at most three players from each team
const_rhs = c(num_gk, num_def, num_mid, num_fwd, max_cost,
              rep(3, length(unique(df$team_name))))
# And solve the linear system; all.bin=TRUE makes every decision
# variable 0 (not selected) or 1 (selected)
x = lp("max", objective, const_mat, const_dir, const_rhs, all.bin=TRUE)
print(arrange(df[which(x$solution==1),], desc(Goalkeeper), desc(Defender),
              desc(Midfielder), desc(Forward), desc(total_points)))
```
The team selected by the linear solver is shown in the table below. This is the team with the highest possible number of points that can be achieved within the constraints we are working with.
| Position | Team | Points | Name | Cost (£m) |
|---|---|---|---|---|
| Goalkeeper | Everton | 160 | Howard | 5.5 |
| Goalkeeper | Crystal Palace | 144 | Speroni | 5 |
| Defender | Everton | 180 | Coleman | 7 |
| Defender | Chelsea | 172 | Terry | 6.5 |
| Defender | Arsenal | 157 | Mertesacker | 6 |
| Defender | Arsenal | 155 | Koscielny | 6 |
| Defender | Southampton | 149 | Fonte | 5.5 |
| Midfielder | Man City | 241 | Yaya Touré | 11 |
| Midfielder | Liverpool | 205 | Gerrard | 9 |
| Midfielder | Crystal Palace | 131 | Puncheon | 6 |
| Midfielder | Stoke | 126 | Sidwell | 5.5 |
| Midfielder | West Ham | 125 | Noble | 5.5 |
| Forward | Arsenal | 187 | Giroud | 8.5 |
| Forward | Liverpool | 179 | Lambert | 7.5 |
| Forward | Aston Villa | 106 | Weimann | 5.5 |
Now, before the internet gets grumpy and starts trolling me (whenever I’ve mentioned using mathematics for fantasy football, people seem to get very irate), there are a few obvious limitations worth pointing out. First of all, the new football season hasn’t started, so I’m using the points totals from last season. This means that all the players at the promoted teams, and any new signings to the Premier League, will have zero points and so will not get selected. I’m planning to run this script regularly throughout the coming season, though, to help guide my transfers, so as these players gain points they will start to be selected by the linear solver if they perform well enough.
Also, we’ve set the constraints to optimise for the best squad of fifteen players. You may prefer to spend most of your money on the best possible starting eleven and go for budget substitutes instead. For example, the table below shows what happens if you optimise for eleven players in a 1-3-4-3 formation (one goalkeeper, three defenders, four midfielders and three forwards) with a total price of £82 million. This leaves enough to buy four substitutes at £4.5 million each (£82m + 4 × £4.5m = £100m).
| Position | Team | Points | Name | Cost (£m) |
|---|---|---|---|---|
| Goalkeeper | Everton | 160 | Howard | 5.5 |
| Defender | Everton | 180 | Coleman | 7 |
| Defender | Chelsea | 172 | Terry | 6.5 |
| Defender | Southampton | 149 | Fonte | 5.5 |
| Midfielder | Man City | 241 | Yaya Touré | 11 |
| Midfielder | Liverpool | 205 | Gerrard | 9 |
| Midfielder | Liverpool | 178 | Lallana | 8.5 |
| Midfielder | Stoke | 126 | Sidwell | 5.5 |
| Forward | Arsenal | 187 | Giroud | 8.5 |
| Forward | Liverpool | 179 | Lambert | 7.5 |
| Forward | Southampton | 152 | Rodriguez | 7.5 |
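The squad and the starting eleven differ only in formation, budget and number of players, so the set-up can be wrapped in a function. The sketch below is a hypothetical `pick_team()` helper (not part of the original script) that builds the same position, budget and per-team constraints with `sapply()` and `rbind()` and solves them with `lpSolve::lp()`. It assumes a data frame with the same `team_name`, `type_name`, `now_cost` and `total_points` columns as the scraped `df`, and is run here on a made-up player pool with illustrative numbers only.

```r
library(lpSolve)

# Hypothetical helper: best-scoring selection for a given formation
# (named vector of players per position), budget and per-team cap
pick_team <- function(df, formation, budget, max_per_team = 3) {
  positions <- names(formation)
  teams <- unique(df$team_name)
  # Rows: one per position, one for the budget, one per team; columns: players
  pos_rows  <- t(sapply(positions, function(p) as.numeric(df$type_name == p)))
  team_rows <- t(sapply(teams, function(tm) as.numeric(df$team_name == tm)))
  const_mat <- rbind(pos_rows, df$now_cost, team_rows)
  const_dir <- c(rep("=", length(positions)), rep("<=", 1 + length(teams)))
  const_rhs <- c(unname(formation), budget, rep(max_per_team, length(teams)))
  sol <- lp("max", df$total_points, const_mat, const_dir, const_rhs,
            all.bin = TRUE)
  # lp() reports status 0 on success; anything else means no usable solution
  if (sol$status != 0) stop("no feasible selection for these constraints")
  df[sol$solution > 0.5, ]
}

# Made-up player pool (illustrative numbers only)
toy <- data.frame(
  web_name     = c("GK1", "GK2", "D1", "D2", "D3", "F1", "F2"),
  team_name    = c("Arsenal", "Chelsea", "Arsenal", "Arsenal", "Chelsea",
                   "Arsenal", "Everton"),
  type_name    = c("Goalkeeper", "Goalkeeper", "Defender", "Defender",
                   "Defender", "Forward", "Forward"),
  now_cost     = c(5, 4.5, 6, 5.5, 5, 9, 7),
  total_points = c(100, 90, 120, 110, 80, 150, 140),
  stringsAsFactors = FALSE
)
# One goalkeeper, two defenders, one forward, at most two per team
team <- pick_team(toy, c(Goalkeeper = 1, Defender = 2, Forward = 1),
                  budget = 25, max_per_team = 2)
print(team)  # GK2, D1, D2 and F2: 460 points for £23m
```

On the toy pool the per-team cap does real work: the two best defenders and the best forward all play for the same made-up team, so the solver swaps in the second-best forward. With the scraped data, the starting-eleven variant would be `pick_team(df, c(Goalkeeper = 1, Defender = 3, Midfielder = 4, Forward = 3), budget = 82)`, and the full squad uses `c(Goalkeeper = 2, Defender = 5, Midfielder = 5, Forward = 3)` with `budget = 100`.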
What’s interesting (to me at least) is that the cost of the players is fairly evenly spread across the team. Typically, when I select my fantasy football teams, I tend to splash the cash on big-name strikers and then go for cheap defenders. Based on these results, though, that’s looking like a bad decision, so this season I’m going to follow the data and actually sign some decent defenders. Wish me luck…
All code is available on GitHub.
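A quick way to check the “evenly spread” observation is to total the spend by position, using the costs from the first (15-player) table. The snippet below only re-types those published figures; nothing is scraped.

```r
# Positions and costs (£ millions) copied from the 15-player squad table
squad <- data.frame(
  position = rep(c("Goalkeeper", "Defender", "Midfielder", "Forward"),
                 times = c(2, 5, 5, 3)),
  cost = c(5.5, 5,                   # goalkeepers
           7, 6.5, 6, 6, 5.5,        # defenders
           11, 9, 6, 5.5, 5.5,       # midfielders
           8.5, 7.5, 5.5),           # forwards
  stringsAsFactors = FALSE
)
total_spend <- tapply(squad$cost, squad$position, sum)   # £m per position
mean_spend  <- tapply(squad$cost, squad$position, mean)  # £m per player
print(total_spend)  # Defender 31, Forward 21.5, Goalkeeper 10.5, Midfielder 37
print(round(mean_spend, 2))
print(sum(squad$cost))  # 100, the whole budget
```

On these numbers the average price per player sits between £5.25m (goalkeepers) and £7.4m (midfielders), with defenders at £6.2m, so no single position soaks up the budget.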
**Peer - July 25, 2014**
Hi. Interesting article.
I would be interested to know whether the code can be used to pick the best available players with the added variable of points achieved per minute played.
**Martin Eastwood - July 25, 2014**
Sure, it’s just a question of constructing the necessary constraints for the solver to optimise against.
**Neal Thurman - July 26, 2014**
The theory is fine, but for what you’re recommending to have any utility it needs to account for changes in situation from last season. Lambert isn’t likely to start now that he’s at Liverpool, Rodriguez is currently recovering from an injury, and Sidwell isn’t likely to be the focal point at Stoke that he was at Fulham. What would be interesting is to see what the top 20 or 50 configurations look like, and whether the “spread the money around” strategy dominates buying a few very expensive players and filling in with bargains, or whether spreading it around just happened to be the best outcome last season while a “galactico” strategy was almost as good. Regardless, interesting food for thought. Cheers – Neal
**EV - July 29, 2014**
Good work. Will follow this. Possibly try different objective functions, like points per minute played? And maybe adjust for this season’s schedule?
This approach has real potential for greatness.
**jester112358 - July 30, 2014**
Loved the post.
I didn’t read the code, but in the best-eleven scenario, have you set the team to play 3-4-3? Otherwise, why does the code leave out Sagna, who has more points than Sidwell for the same price?
**Martin Eastwood - July 30, 2014**
Yes, I set the constraints to use a 3-4-3 formation.
**Luis Pacheco - August 7, 2014**
Thank you so much for the script! I was doing this with Excel and it is not as easy as just clicking enter.
I found that with an £85 million budget the best starting eleven for points was 5-3-2, followed by 4-4-2. I’ve always played 3-4-3. Now I’m going to change it!
**Martin Eastwood - August 8, 2014**
That’s really interesting, it looks like the trusty 3-4-3 I’ve been using for the past few years may not be the optimal formation!
**Shalin - August 8, 2014**
Hi Martin,
As someone with a beginner’s knowledge of analytics and related tools, I had a few questions about obtaining the data required for this solver. It seems you scrape the data directly into R from the FPL API. Are there any other reliable data sources you would recommend?
I believe it would also be possible to run such a solver in Excel. Your thoughts?
**Martin Eastwood - August 8, 2014**
If you want player data, then Squawka and WhoScored are probably the best places to look. Yes, Excel and OpenOffice both have solvers built in, so I expect it’s possible to do something similar with them.
**Pete - August 15, 2014**
Interesting – I always wondered what the perfect optimisation of value would look like. I was thinking of trying it out, but Rodriguez and Lallana are out and Coleman might be. Give it another go without the injured players – quick! Haha
**Brendan - August 25, 2014**
This is very cool.
Is it possible to make the objective function take into account the presence of a captain? I can’t think of a way of doing this that keeps it a linear constraint.
**Martin Eastwood - August 25, 2014**
Thanks, it’s something I’d like to include, but I’ve not come up with a suitable way to add it in yet.
**marko - December 17, 2014**
Teams of 12 players, with a requirement that one player is played twice? Maybe…
Any chance of running the algorithm with points so far this season and current values?
**Martin Eastwood - December 17, 2014**
Good idea, I will post a follow-up with the updated team recommendation when I get a chance. Thanks!