This is a conqueror’s proof assessment.
Warm-up
- load packages tidyverse, lubridate, hexbin and modelr
- import CSV files for football matches and team ratings into tibbles football_matches and football_ratings
- glance at the imported tibbles. Read the field notes for football_matches. For football_ratings, the variables are: Team (the team name), V (number of victories), P (number of draws), S (number of defeats), GF (goals for), GS (goals against).
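A minimal sketch of the warm-up steps, assuming the four packages are installed. The real CSV file names are not given here, so the example writes a tiny stand-in ratings file to a temporary path (ratings_path and the file contents are made up for illustration) and imports it with read_csv. The same pattern would apply to football_matches.

```r
library(tidyverse)  # attaches readr, dplyr, ggplot2, tibble, ...
library(lubridate)
library(hexbin)
library(modelr)

# Hypothetical stand-in for the real ratings file (invented rows, same columns)
ratings_path <- tempfile(fileext = ".csv")
write_lines(
  c("Team,V,P,S,GF,GS",
    "Alpha,2,1,0,5,2",
    "Beta,0,1,2,2,5"),
  ratings_path
)

# Import the CSV into a tibble and take a quick look at its columns
football_ratings <- read_csv(ratings_path, show_col_types = FALSE)
glimpse(football_ratings)
```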
Query
- compare descriptive statistics (mean, median, max and sd) of home and away goals
- compute the absolute number of teams, the relative number of teams, and the average number of points of teams grouped by region and arrange the result by absolute number of teams in decreasing order
- group matches by goal spread
- retrieve the busy months (those with more than 40 matches)
- retrieve the matches during the busy months
- retrieve matches played at home by teams that qualified for the champions league
- retrieve matches played between teams from the south
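Two of the queries above could be sketched as follows. The column names Date, HomeTeam, AwayTeam, FTHG (home goals) and FTAG (away goals) are assumptions, since the field notes for football_matches are not reproduced here, and the four matches are invented. The threshold is lowered from 40 to 2 so that the toy data yields a busy month.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Invented matches; column names are assumptions, not the real field names
football_matches <- tibble(
  Date     = ymd(c("2020-09-05", "2020-09-12", "2020-09-19", "2020-10-03")),
  HomeTeam = c("Alpha", "Beta", "Gamma", "Alpha"),
  AwayTeam = c("Beta", "Gamma", "Alpha", "Gamma"),
  FTHG     = c(2, 1, 0, 3),
  FTAG     = c(1, 1, 2, 0)
)

# Descriptive statistics of home and away goals, one row per side
goal_stats <- football_matches %>%
  pivot_longer(c(FTHG, FTAG), names_to = "side", values_to = "goals") %>%
  group_by(side) %>%
  summarise(mean = mean(goals), median = median(goals),
            max = max(goals), sd = sd(goals))

# Busy months: months with more matches than the threshold (40 in the exercise)
busy_threshold <- 2
busy_months <- football_matches %>%
  count(month = month(Date)) %>%
  filter(n > busy_threshold)

# Matches played during the busy months
busy_matches <- football_matches %>%
  filter(month(Date) %in% busy_months$month)
```

Note that month(Date) pools the same calendar month across years. If the data spans several seasons and months should be kept apart by year, floor_date(Date, "month") may be a better grouping key.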
Visualize
- plot goals against points
- add variable region to the previous plot
- add global smoothed line to the previous plot
- make a barplot with variable region
- make a barplot with variables region and league
- make a boxplot with variables region and league and reorder region by median
- add the mean to the boxplot and reorder region by mean
- use stat_summary to display min, max and mean and reorder region by mean
- plot goals against points faceting over region
- plot goals against points faceting over league and region
- plot count over region and league
- plot histograms of home and away team goals as well as goal spread
- plot shots on target versus goals
- plot fouls committed versus yellow cards
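A few of the plots above might look like the sketch below. It assumes football_ratings carries columns PT (points) and region, which are not listed among the variables in the warm-up, and it uses invented numbers. "Goals against points" is read here as goals for (GF) on the y axis and points on the x axis. A straight-line smoother is used because the toy data is too small for the default one.

```r
library(ggplot2)

# Invented ratings; PT and region are assumed column names
football_ratings <- data.frame(
  Team   = c("Alpha", "Beta", "Gamma", "Delta", "Epsilon", "Zeta"),
  region = c("north", "north", "center", "center", "south", "south"),
  GF     = c(60, 45, 50, 38, 42, 30),
  PT     = c(80, 55, 62, 40, 48, 28)
)

# Scatter plot coloured by region with one global fitted line:
# colour is mapped only inside geom_point, so geom_smooth sees a single group
p_scatter <- ggplot(football_ratings, aes(x = PT, y = GF)) +
  geom_point(aes(colour = region)) +
  geom_smooth(method = "lm", se = FALSE)

# Same relationship, one panel per region
p_facet <- ggplot(football_ratings, aes(x = PT, y = GF)) +
  geom_point() +
  facet_wrap(~ region)

# Boxplot of points with regions ordered by their median, plus the mean as a cross
p_box <- ggplot(football_ratings,
                aes(x = reorder(region, PT, FUN = median), y = PT)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", shape = 4) +
  labs(x = "region")

# Min, max and mean per region, ordered by mean (default geom is a pointrange)
p_summary <- ggplot(football_ratings,
                    aes(x = reorder(region, PT, FUN = mean), y = PT)) +
  stat_summary(fun = mean, fun.min = min, fun.max = max) +
  labs(x = "region")
```

Printing any of these objects, for example print(p_box), renders the plot.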
Program
- program a function that computes team ratings (victories, draws, defeats, points, goals for, goals against) from the team matches
- program a function that computes foresight prediction accuracy of match result using team points
- add the home field advantage (HFA) to the previous function. Is the accuracy improved with HFA? Visualize the accuracy with and without HFA
- program a function that computes team points after every match for a given team and visualize the temporal evolution of points for two given teams
- visualize the temporal evolution of points for all teams
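One possible shape for the ratings function is sketched below. The function name compute_ratings and the match columns HomeTeam, AwayTeam, FTHG and FTAG are assumptions, as is the usual scoring of 3 points for a victory and 1 for a draw. The idea is to stack each match twice, once from the home side and once from the away side, and then summarise per team.

```r
library(dplyr)

# Hypothetical helper: team ratings from a table of matches
compute_ratings <- function(matches) {
  # One row per team per match, seen from that team's side
  home <- matches %>% transmute(Team = HomeTeam, gf = FTHG, gs = FTAG)
  away <- matches %>% transmute(Team = AwayTeam, gf = FTAG, gs = FTHG)

  bind_rows(home, away) %>%
    group_by(Team) %>%
    summarise(
      V  = sum(gf > gs),   # victories
      P  = sum(gf == gs),  # draws
      S  = sum(gf < gs),   # defeats
      GF = sum(gf),        # goals for
      GS = sum(gs),        # goals against
      .groups = "drop"
    ) %>%
    mutate(PT = 3 * V + P, DG = GF - GS) %>%  # assumed scoring: 3 / 1 / 0
    arrange(desc(PT), desc(DG))
}

# Invented matches to try the function on
football_matches <- tibble(
  HomeTeam = c("Alpha", "Beta", "Gamma", "Alpha"),
  AwayTeam = c("Beta", "Gamma", "Alpha", "Gamma"),
  FTHG     = c(2, 1, 0, 3),
  FTAG     = c(1, 1, 2, 0)
)

ratings <- compute_ratings(football_matches)
```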
Model
- model PT (points) in terms of GF using linear regression. Plot residuals and sort teams by residuals. Which are the top-ranked and bottom-ranked teams?
- do the same for GS and DG (goal difference). Which model is the best?
- model PT in terms of GF and GS using multiple linear regression. What is the difference with respect to the model of PT in terms of DG? Which of GF and GS contributes more to PT? Use the answer to suggest a market strategy for a team.
- model PT in terms of V, P and S using multiple linear regression. Explain the outcome
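As a starting point for the modelling tasks, the sketch below fits the single-predictor models and ranks teams by residual. It assumes football_ratings has columns PT and DG in addition to GF and GS, and the numbers are invented, so the resulting ranking and R-squared values say nothing about the real data. Comparing models by R-squared is one reasonable choice among several.

```r
library(dplyr)
library(modelr)

# Invented ratings; PT and DG are assumed column names
football_ratings <- tibble(
  Team = c("Alpha", "Beta", "Gamma", "Delta", "Epsilon", "Zeta"),
  GF   = c(60, 45, 50, 38, 42, 30),
  GS   = c(25, 40, 35, 50, 45, 60),
  PT   = c(80, 55, 62, 40, 48, 28)
) %>%
  mutate(DG = GF - GS)

# Simple linear regressions of points on each predictor
mod_gf <- lm(PT ~ GF, data = football_ratings)
mod_gs <- lm(PT ~ GS, data = football_ratings)
mod_dg <- lm(PT ~ DG, data = football_ratings)

# Teams sorted by residual: positive means more points than the model expects
ranked <- football_ratings %>%
  add_residuals(mod_gf) %>%  # modelr adds a column named resid
  arrange(desc(resid))

# One way to compare the three models
r_squared <- c(
  GF = summary(mod_gf)$r.squared,
  GS = summary(mod_gs)$r.squared,
  DG = summary(mod_dg)$r.squared
)
```

The multiple regressions follow the same pattern with formulas such as PT ~ GF + GS or PT ~ V + P + S.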