2024 League of Ireland Player Profiles
Learning new skills through the 2024 League of Ireland Premier Division Season
Introduction
This tutorial focuses on creating player profiles for the 2024 League of Ireland Premier Division season. The data comes from the 2024 season, and more information on the league is available on the League of Ireland website. The data includes player statistics such as goals, assists, minutes played, player ratings, big chances, and shots, to name a few. The goal of this tutorial is to create player profiles that summarise the performance of each player in the league. The profiles should also serve as a quick comparison tool for the top-performing players in the league.
Dataset
The data used for this tutorial is stored as an Excel file and can be made available on request. The data is stored in a wide format, with each row representing a player and each column representing a player statistic. The first step is to load the data and take a look at the first few rows.
# Load the data
data <- readxl::read_excel("all_player_loi.xlsx")
# Display the first few rows of the data
head(data)
Data Cleaning
The data is in a wide format and needs to be converted to a long format for analysis. The column names also contain upper-case letters and spaces, which clean_names() from the janitor package converts to lower-case names separated by underscores. The data cleaning process is shown below.
# Load the necessary libraries
library(tidyverse)
library(janitor)
# Convert the data to a long format
data_long <- data %>%
  clean_names() %>%
  pivot_longer(cols = -c(player, team), names_to = "statistic", values_to = "value")
# Display the first few rows of the data
head(data_long)
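To make the reshaping concrete, here is a minimal sketch on a made-up two-player table. The player names, teams, and column names are invented for illustration and are not taken from the real dataset.

```r
library(dplyr)
library(tidyr)
library(janitor)

# Hypothetical wide data: one row per player, messy column names
toy_wide <- tibble(
  Player = c("Player A", "Player B"),
  Team = c("Team X", "Team Y"),
  Goals = c(10, 3),
  `Shots per 90` = c(2.5, 1.1)
)

toy_long <- toy_wide %>%
  clean_names() %>% # Player -> player, `Shots per 90` -> shots_per_90
  pivot_longer(cols = -c(player, team), names_to = "statistic", values_to = "value")

toy_long
```

Each player now contributes one row per statistic, so two players with two statistics give four rows.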
Data Analysis
The data is now in a long format and is ready for analysis. The next step is to create two functions that will be used to build the player profiles. The first function calculates a Z score for each player statistic and the second converts a Z score into a T score.

A Z score, also known as a standard score, measures how many standard deviations (SD) a value lies from the mean. These scores are useful in sports science because they allow different performance metrics to be compared on a standardised scale. A Z score of 0 indicates that a player's value is equal to the mean of the reference group, a Z score of 1 indicates a value 1 SD above the mean, and a Z score of -1 indicates a value 1 SD below the mean. If the data is normally distributed, roughly 68% of players will fall between -1 and 1.

T scores are an alternative, easier-to-interpret standardised score. A T score is calculated by multiplying the Z score by 10 and adding 50, so the mean sits at 50 and each SD is worth 10 points. T scores are not bounded in principle, but a value below 0 or above 100 would require a Z score beyond ±5, so the function below clamps them to a 0 - 100 scale. The Z score and T score functions are shown below.
Z and T Score Functions
# Create a Z score function: standardise x against its own mean and SD
z_score <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}
# Create a T score function: rescale a Z score to mean 50 and SD 10
t_score <- function(x) {
  t <- (x * 10) + 50
  # Clamp the result to the 0 - 100 range
  pmin(pmax(t, 0), 100)
}
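A quick worked example helps to check the two functions. The numbers below are invented so that the mean is 4 and the SD is 2; the function definitions are compact equivalents of the ones above, repeated only so the snippet runs on its own.

```r
# Equivalent to the definitions above, repeated so this snippet is self-contained
z_score <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
t_score <- function(x) pmin(pmax(x * 10 + 50, 0), 100)

goals <- c(2, 4, 6) # hypothetical values: mean = 4, SD = 2

z <- z_score(goals)
t <- t_score(z)

print(z) # -1 0 1
print(t) # 40 50 60

# Extreme Z scores are clamped to the 0 - 100 range
print(t_score(c(-7, 7))) # 0 100
```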
The next step is to create the player profiles using the Z score and T score functions. As we are only interested in the final T scores, only the T scores appear in the final player profiles. The pivot_wider function from the tidyverse (specifically the tidyr package) pivots the data_long data frame from a long format back to a wide format. The mutate_at function then applies the Z score and T score functions to all columns except the player and team columns. The final data frame contains the T scores for all player statistics and is stored as a new data frame called player_profiles_t. The code to create the player profiles is shown below.
# Load the necessary libraries
library(gt)
# Create the player profiles
player_profiles_t <- data_long %>%
  pivot_wider(names_from = statistic, values_from = value) %>%
  mutate_at(vars(-player, -team), z_score) %>% # Create Z scores
  mutate_at(vars(-player, -team), t_score) # Create T scores from Z scores
Now that we have this new player_profiles_t data frame we can view the data using the head() function. This will display the first few rows of the data frame.
# Display the first few rows of the player profiles
head(player_profiles_t)
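mutate_at still works, but it has been superseded in recent versions of dplyr by across(). The sketch below shows the same two-step scoring written with across() on a made-up wide table. The toy data and player names are assumptions for illustration, and the result is intended to match what the mutate_at version produces.

```r
library(dplyr)

# Compact equivalents of the scoring functions, repeated for self-containment
z_score <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
t_score <- function(x) pmin(pmax(x * 10 + 50, 0), 100)

# Hypothetical wide data, one row per player
toy_wide <- tibble(
  player = c("Player A", "Player B", "Player C"),
  team = c("Team X", "Team Y", "Team Z"),
  goals = c(2, 4, 6),
  assists = c(1, 1, 4)
)

# Apply Z then T scoring to every column except player and team
toy_t <- toy_wide %>%
  mutate(across(-c(player, team), ~ t_score(z_score(.x))))

toy_t
```

Each statistic is standardised against its own column, so every scored column has a mean of 50 unless clamping has kicked in.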
Pivoting the Data Frame
Next, we need to pivot the data frame back to a long format so that we can create the player profiles in a more detailed fashion. The code to pivot the data frame back to a long format is shown below.
# Pivot the data frame back to a long format
player_profiles_long <- player_profiles_t %>%
  pivot_longer(cols = -c(player, team), names_to = "statistic", values_to = "t_score")
# Display the first few rows of the player profiles
head(player_profiles_long)
Categorisation of Statistics
Next, let's create separate data frames for each statistic category:
- Attacking: Goals, Goals per 90, Goals and Assists, Big Chances Missed, Shots on Target per 90, and Shots per 90.
- Creativity: Assists, Accurate Long Balls per 90, Accurate Passes per 90, Big Chances Created, Chances Created, Successful Dribbles per 90, and Penalties Won.
- Defending: Interceptions per 90, Successful Tackles per 90, Blocks per 90, Clearances per 90, and Possession Won in the Attacking 3rd per 90.
- Discipline: Yellow Cards, Red Cards, Penalties Conceded, and Fouls Committed per 90.
- Goalkeeping: Clean Sheets, Goals Conceded per 90, Save Percentage, and Saves per 90.
The code to create these data frames is shown below.
# Create separate data frames for each statistic category
attacking <- player_profiles_long %>%
  filter(statistic %in% c("goals", "goals_per_90", "goals_and_assists", "big_chances_missed", "shots_on_target_per_90", "shots_per_90"))
creativity <- player_profiles_long %>%
  filter(statistic %in% c("assists", "accurate_long_balls_per_90", "accurate_passes_per_90", "big_chances_created", "chances_created", "successful_dribbles_per_90", "penalties_won"))
defending <- player_profiles_long %>%
  filter(statistic %in% c("interceptions_per_90", "successful_tackles_per_90", "blocks_per_90", "clearances_per_90", "possession_won_attacking_3rd_per_90"))
discipline <- player_profiles_long %>%
  filter(statistic %in% c("yellow_cards", "red_cards", "penalties_conceded", "fouls_committed_per_90"))
goalkeeping <- player_profiles_long %>%
  filter(statistic %in% c("clean_sheets", "goals_conceded_per_90", "save_percentage", "saves_per_90"))
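filter() with %in% silently drops any name that does not match, so a misspelt or differently cleaned column name would simply disappear from its category. A small check like the one below can catch that early. It is only a sketch: check_stats is a hypothetical helper, and the toy data stands in for player_profiles_long.

```r
# Hypothetical helper: report requested statistics that are absent from the data
check_stats <- function(data, wanted) {
  setdiff(wanted, unique(data$statistic))
}

# Toy stand-in for player_profiles_long
toy_long <- data.frame(
  player = c("Player A", "Player A", "Player B", "Player B"),
  team = c("Team X", "Team X", "Team Y", "Team Y"),
  statistic = c("goals", "shots_per_90", "goals", "shots_per_90"),
  t_score = c(60, 55, 40, 45)
)

missing_stats <- check_stats(toy_long, c("goals", "goals_per_90", "shots_per_90"))
print(missing_stats) # "goals_per_90" is requested but not present in the toy data
```

Running the helper on player_profiles_long with each category's vector of names should return an empty character vector if every statistic was found.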
Total Score of Athleticism (TSA)
Now that we have separate data frames for each statistic category, we can create a composite performance score for each category for each player. This is based on the concept of a Total Score of Athleticism (TSA) @turner2019, a composite score that combines an athlete's performance across multiple metrics. The TSA is the average of the T scores of those metrics, so it provides a single score that summarises overall performance. The code to create the TSA for each player is shown below.
# Create the Total Score of Athleticism (TSA) for each player
attacking_tsa <- attacking %>%
  group_by(player, team) %>%
  summarise(tsa = mean(t_score), .groups = "drop") # "drop" removes the grouping information
creativity_tsa <- creativity %>%
  group_by(player, team) %>%
  summarise(tsa = mean(t_score), .groups = "drop")
defending_tsa <- defending %>%
  group_by(player, team) %>%
  summarise(tsa = mean(t_score), .groups = "drop")
discipline_tsa <- discipline %>%
  group_by(player, team) %>%
  summarise(tsa = mean(t_score), .groups = "drop")
goalkeeping_tsa <- goalkeeping %>%
  group_by(player, team) %>%
  summarise(tsa = mean(t_score), .groups = "drop")
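Because the five blocks repeat the same calculation, the TSA for every category can also be computed in one pass. The following is a sketch of one alternative, not a replacement for the code above: it stacks the category data frames with bind_rows(), using a hypothetical category column, and then averages the T scores per player within each category. The toy data frames are invented.

```r
library(dplyr)

# Toy stand-ins for two of the category data frames
attacking_toy <- tibble(
  player = c("Player A", "Player A", "Player B", "Player B"),
  team = c("Team X", "Team X", "Team Y", "Team Y"),
  statistic = c("goals", "shots_per_90", "goals", "shots_per_90"),
  t_score = c(60, 50, 40, 50)
)
defending_toy <- tibble(
  player = c("Player A", "Player B"),
  team = c("Team X", "Team Y"),
  statistic = c("blocks_per_90", "blocks_per_90"),
  t_score = c(45, 55)
)

# Stack the categories; the list names become the `category` column
all_tsa <- bind_rows(
  list(attacking = attacking_toy, defending = defending_toy),
  .id = "category"
) %>%
  group_by(category, player, team) %>%
  summarise(tsa = mean(t_score), .groups = "drop")

all_tsa
```

Note that mean() returns NA if any of a player's T scores is missing. Adding na.rm = TRUE would average the remaining scores instead, which may or may not be what you want.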
Top Players
Now that we have created the TSA for each player, we can see the top 20 performing players in each category. The code to display the top 20 performing players in each category is shown below.
# Display the top 20 performing players in the attacking category
attacking_tsa %>%
  arrange(desc(tsa)) %>%
  head(20) %>%
  gt(rownames_to_stub = TRUE) %>%
  tab_header(title = "Top 20 Attacking Players") %>%
  cols_label(player = "Player", team = "Team", tsa = "TSA") %>%
  fmt_number(columns = tsa, decimals = 2) %>% # vars() is deprecated for gt column selection
  tab_spanner(label = "Attacking", columns = c(player, team, tsa))
# Display the top 20 performing players in the creativity category
creativity_tsa %>%
  arrange(desc(tsa)) %>%
  head(20) %>%
  gt(rownames_to_stub = TRUE) %>%
  tab_header(title = "Top 20 Creative Players") %>%
  cols_label(player = "Player", team = "Team", tsa = "TSA") %>%
  fmt_number(columns = tsa, decimals = 2) %>%
  tab_spanner(label = "Creativity", columns = c(player, team, tsa))
# Display the top 20 performing players in the defending category
defending_tsa %>%
  arrange(desc(tsa)) %>%
  head(20) %>%
  gt(rownames_to_stub = TRUE) %>%
  tab_header(title = "Top 20 Defending Players") %>%
  cols_label(player = "Player", team = "Team", tsa = "TSA") %>%
  fmt_number(columns = tsa, decimals = 2) %>%
  tab_spanner(label = "Defending", columns = c(player, team, tsa))
# Display the top 20 performing players in the discipline category
# Lower scores mean fewer cards, fouls, and penalties conceded, so sort ascending
discipline_tsa %>%
  arrange(tsa) %>%
  head(20) %>%
  gt(rownames_to_stub = TRUE) %>%
  tab_header(title = "Top 20 Disciplined Players") %>%
  cols_label(player = "Player", team = "Team", tsa = "TSA") %>%
  fmt_number(columns = tsa, decimals = 2) %>%
  tab_spanner(label = "Discipline", columns = c(player, team, tsa))
# Display the top 20 performing players in the goalkeeping category
goalkeeping_tsa %>%
  arrange(desc(tsa)) %>%
  head(20) %>%
  gt(rownames_to_stub = TRUE) %>%
  tab_header(title = "Top 20 Goalkeeping Players") %>%
  cols_label(player = "Player", team = "Team", tsa = "TSA") %>%
  fmt_number(columns = tsa, decimals = 2) %>%
  tab_spanner(label = "Goalkeeping", columns = c(player, team, tsa))
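The five table blocks differ only in the data frame, the labels, and the sort direction, so they could be wrapped in a small helper. This is a sketch under assumptions: top_table and its arguments are hypothetical names, not part of gt, and the toy data is invented.

```r
library(dplyr)
library(gt)

# Hypothetical helper: a gt table of the n best players by TSA
top_table <- function(tsa_data, title, label, n = 20, descending = TRUE) {
  ranked <- if (descending) {
    slice_max(tsa_data, tsa, n = n, with_ties = FALSE)
  } else {
    slice_min(tsa_data, tsa, n = n, with_ties = FALSE) # e.g. discipline
  }
  ranked %>%
    gt(rownames_to_stub = TRUE) %>%
    tab_header(title = title) %>%
    cols_label(player = "Player", team = "Team", tsa = "TSA") %>%
    fmt_number(columns = tsa, decimals = 2) %>%
    tab_spanner(label = label, columns = c(player, team, tsa))
}

# Invented TSA values for three players
toy_tsa <- tibble(
  player = c("Player A", "Player B", "Player C"),
  team = c("Team X", "Team Y", "Team Z"),
  tsa = c(55.2, 61.8, 47.5)
)

toy_table <- top_table(toy_tsa, title = "Top 2 Attacking Players", label = "Attacking", n = 2)
toy_table
```

For the discipline category the call would pass descending = FALSE, mirroring the ascending sort used above.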
Next, we want to use the TSA scores to select the top 20 players in the league for each category. To do this we need to pivot each category's data frame to a wide format, merge the TSA scores, select the top 20 players, and then pivot the data frame back to a long format. The code to do this is shown below.
# Pivot the attacking data frame to a wide format
attacking_wide <- attacking %>%
  pivot_wider(names_from = statistic, values_from = t_score)
# Merge the attacking data frame with the attacking TSA scores
attacking_wide <- attacking_wide %>%
  left_join(attacking_tsa, by = c("player", "team"))
# Select the top 20 attacking players
top_attacking <- attacking_wide %>%
  arrange(desc(tsa)) %>%
  head(20)
# Pivot the top attacking data frame back to a long format
top_attacking_long <- top_attacking %>%
  pivot_longer(cols = -c(player, team, tsa), names_to = "statistic", values_to = "value") %>%
  select(player, team, statistic, value, tsa)
# Display the first few rows for the top 20 attacking players
head(top_attacking_long)
# Pivot the creativity data frame to a wide format
creativity_wide <- creativity %>%
  pivot_wider(names_from = statistic, values_from = t_score)
# Merge the creativity data frame with the creativity TSA scores
creativity_wide <- creativity_wide %>%
  left_join(creativity_tsa, by = c("player", "team"))
# Select the top 20 creative players
top_creative <- creativity_wide %>%
  arrange(desc(tsa)) %>%
  head(20)
# Pivot the top creative data frame back to a long format
top_creative_long <- top_creative %>%
  pivot_longer(cols = -c(player, team, tsa), names_to = "statistic", values_to = "value") %>%
  select(player, team, statistic, value, tsa)
# Display the first few rows for the top 20 creative players
head(top_creative_long)
# Pivot the defending data frame to a wide format
defending_wide <- defending %>%
  pivot_wider(names_from = statistic, values_from = t_score)
# Merge the defending data frame with the defending TSA scores
defending_wide <- defending_wide %>%
  left_join(defending_tsa, by = c("player", "team"))
# Select the top 20 defending players
top_defending <- defending_wide %>%
  arrange(desc(tsa)) %>%
  head(20)
# Pivot the top defending data frame back to a long format
top_defending_long <- top_defending %>%
  pivot_longer(cols = -c(player, team, tsa), names_to = "statistic", values_to = "value") %>%
  select(player, team, statistic, value, tsa)
# Display the first few rows for the top 20 defending players
head(top_defending_long)
# Pivot the discipline data frame to a wide format
discipline_wide <- discipline %>%
  pivot_wider(names_from = statistic, values_from = t_score)
# Merge the discipline data frame with the discipline TSA scores
discipline_wide <- discipline_wide %>%
  left_join(discipline_tsa, by = c("player", "team"))
# Select the top 20 disciplined players (lowest TSA first)
top_discipline <- discipline_wide %>%
  arrange(tsa) %>%
  head(20)
# Pivot the top disciplined data frame back to a long format
top_discipline_long <- top_discipline %>%
  pivot_longer(cols = -c(player, team, tsa), names_to = "statistic", values_to = "value") %>%
  select(player, team, statistic, value, tsa)
# Display the first few rows for the top 20 disciplined players
head(top_discipline_long)
# Pivot the goalkeeping data frame to a wide format
goalkeeping_wide <- goalkeeping %>%
  pivot_wider(names_from = statistic, values_from = t_score)
# Merge the goalkeeping data frame with the goalkeeping TSA scores
goalkeeping_wide <- goalkeeping_wide %>%
  left_join(goalkeeping_tsa, by = c("player", "team"))
# Select the top 20 goalkeeping players
top_goalkeeping <- goalkeeping_wide %>%
  arrange(desc(tsa)) %>%
  head(20)
# Pivot the top goalkeeping data frame back to a long format
top_goalkeeping_long <- top_goalkeeping %>%
  pivot_longer(cols = -c(player, team, tsa), names_to = "statistic", values_to = "value") %>%
  select(player, team, statistic, value, tsa)
# Display the first few rows for the top 20 goalkeeping players
head(top_goalkeeping_long)
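The pivot, join, and pivot steps above can be shortened. Since the category data frames are already in long format, one alternative is to pick the top players from the TSA table and join their scores straight onto the long data. The sketch below assumes a hypothetical helper called top_long and uses invented toy data; it is intended to give the same columns as the top_*_long data frames above.

```r
library(dplyr)

# Hypothetical helper: long-format T scores for the n best players by TSA
top_long <- function(category_long, category_tsa, n = 20, descending = TRUE) {
  best <- if (descending) {
    slice_max(category_tsa, tsa, n = n, with_ties = FALSE)
  } else {
    slice_min(category_tsa, tsa, n = n, with_ties = FALSE)
  }
  # Start from `best` so the rows stay in ranked order
  best %>%
    inner_join(category_long, by = c("player", "team")) %>%
    select(player, team, statistic, value = t_score, tsa)
}

# Invented T scores for three players and two statistics
toy_long <- tibble(
  player = rep(c("Player A", "Player B", "Player C"), each = 2),
  team = rep(c("Team X", "Team Y", "Team Z"), each = 2),
  statistic = rep(c("goals", "shots_per_90"), times = 3),
  t_score = c(60, 50, 40, 50, 70, 60)
)
toy_tsa <- toy_long %>%
  group_by(player, team) %>%
  summarise(tsa = mean(t_score), .groups = "drop")

toy_top <- top_long(toy_long, toy_tsa, n = 2)
toy_top
```

In the toy data Player C has the highest TSA (65) and Player A the second highest (55), so only their rows are returned.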
Data Visualisation
Now that we have all the data cleaned and stored as we need, we can create polar plots to build player profiles based on our TSA calculations. To make this easier and keep the plot outputs consistent, we can create a plotting function called single_polar_plot that creates a polar plot for a single player. The function takes the following inputs:
- data: The data frame containing the player profiles.
- player: The player for which the profile will be created.
- subtitle: The subtitle for the plot.
- caption: The caption for the plot.
Radar Plot Functions
We will use the geomtextpath package, whose coord_curvedpolar function draws the polar plot with curved axis text. The rlang package is used to evaluate the player input correctly. Inside filter(), a bare player refers to the player column of the data frame, so the function argument of the same name has to be unquoted with the !! symbol to force its value to be used instead. Without it, the column would be compared with itself and no rows would be filtered out.
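The effect of !! is easiest to see side by side. In the sketch below, both functions have an argument named player, which is also the name of a column. The function names and toy data are hypothetical.

```r
library(dplyr)
library(rlang)

toy <- tibble(
  player = c("Player A", "Player B", "Player C"),
  value = c(40, 50, 60)
)

# Without !!: `player` on both sides refers to the column, so every row is kept
filter_masked <- function(data, player) {
  data %>% filter(player == player)
}

# With !!: the value of the function argument is injected into the comparison
filter_unquoted <- function(data, player) {
  data %>% filter(player == !!player)
}

nrow(filter_masked(toy, "Player B"))   # 3
nrow(filter_unquoted(toy, "Player B")) # 1
```

The .env pronoun, as in filter(player == .env$player), is another way to express the same thing.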
Single Player Radar Plot
library(ggplot2)
library(dplyr)
library(rlang)
library(geomtextpath)
single_polar_plot <- function(data, player, subtitle = "Subtitle", caption = "Caption") {
  data %>%
    # !! unquotes the function argument so it is not mistaken for the player column
    filter(player == !!player) %>%
    ggplot(aes(x = statistic, y = value, group = player)) +
    # Faint full-height bars mark the 0 - 100 scale for each statistic
    geom_bar(aes(y = 100, fill = statistic, colour = statistic), stat = "identity", alpha = 0.2) +
    # The player's T scores
    geom_bar(
      stat = "identity",
      fill = "purple",
      colour = "green",
      alpha = 0.5
    ) +
    geom_label(aes(label = round(value, 0)), nudge_y = -5, colour = "black", size = 2) +
    coord_curvedpolar() +
    # Dashed reference line at the league mean (T score of 50)
    geom_hline(yintercept = 50, linewidth = .5, linetype = "dashed") +
    theme(
      legend.position = "none",
      plot.title = element_text(hjust = .5, colour = "gray20", face = "bold"),
      plot.subtitle = element_text(hjust = .5, colour = "gray20"),
      plot.background = element_rect(fill = "grey", color = "grey"),
      panel.background = element_rect(fill = "grey", color = "grey"),
      panel.grid = element_blank(),
      axis.text.y = element_blank(),
      axis.ticks = element_blank(),
      axis.text = element_text(face = "bold", colour = "gray20"),
      axis.title = element_blank(),
      axis.text.x = element_text(face = "bold"),
      strip.text = element_text(face = "bold", colour = "gray20")
    ) +
    labs(
      title = player,
      subtitle = subtitle,
      caption = caption
    )
}
Now that we have created the single_polar_plot function we can use it to create a radar chart for a single player. Let's create a radar plot for the top-performing player in each category. The code to create the radar plots is shown below. Because we want to plot multiple statistics at once for each player, we need both sets of data frames: the _tsa data frames identify the top player and the _long data frames supply that player's individual statistics.
# Find the top attacking player
att <- attacking_tsa %>%
  arrange(desc(tsa)) %>%
  head(1)
# Create a radar plot for the top attacking player
single_polar_plot(
  data = top_attacking_long,
  player = att$player,
  subtitle = "Attacking Performance",
  caption = "2024 League of Ireland Player Profiles"
)
# Find the top creative player
cre <- creativity_tsa %>%
  arrange(desc(tsa)) %>%
  head(1)
# Create a radar plot for the top creative player
single_polar_plot(
  data = top_creative_long,
  player = cre$player,
  subtitle = "Creative Performance",
  caption = "2024 League of Ireland Player Profiles"
)
# Find the top defending player
def <- defending_tsa %>%
  arrange(desc(tsa)) %>%
  head(1)
# Create a radar plot for the top defending player
single_polar_plot(
  data = top_defending_long,
  player = def$player,
  subtitle = "Defending Performance",
  caption = "2024 League of Ireland Player Profiles"
)