Italian Soccer League

This is a conqueror’s proof assessment.

Warm-up

load packages tidyverse, lubridate, hexbin and modelr
import CSV files for football matches and team ratings into tibbles football_matches and football_ratings
glance at the imported tibbles. Read the field notes for football_matches. For football_ratings, the variables are: Team (the team name), V (number of victories), P (number of draws), S (number of defeats), GF (goals for), GS (goals against).
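
The import step could look roughly like the sketch below. The real file names are not given here, so the code writes a tiny stand-in CSV to a temporary path; the columns Div, FTHG, FTAG and FTR are assumptions about the layout of the matches file, not something this sheet specifies.

```r
library(tidyverse)   # readr, dplyr, tibble, ggplot2, ...
library(lubridate)   # hexbin and modelr would be loaded the same way with library()

# Hypothetical stand-in for the real matches file (made-up rows).
matches_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR",
  "I1,20/08/16,TeamA,TeamB,2,1,H",
  "I1,20/08/16,TeamC,TeamD,4,0,H"
), matches_path)

# Import into a tibble; Date is kept as character so it can be parsed later.
football_matches <- read_csv(matches_path,
                             col_types = cols(Date = col_character()))

# Glance at the tibble: one line per column, with its type and first values.
glimpse(football_matches)
```

football_ratings would be imported the same way from its own file.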

select variables of interest, that is those from Date to AR
create Date objects from the Date variable with the dmy function
get team names and arrange them in alphabetical order
add to football_matches columns HomeTeamId and AwayTeamId with numeric identifiers for the teams
in football_matches, move columns Date, HomeTeam, AwayTeam, HomeTeamId, and AwayTeamId to the front, before all other columns
add to football_ratings variables DG for the difference between goals for and goals against and PT for the number of points (a victory is 3 points, a draw is 1 point, a defeat is 0 points)
add to football_ratings variable region with the region of Italy (north, center, or south) of the team's city
arrange football_ratings by points
add to football_ratings variable league with values: champions (from rank 1 to 3), europa (from rank 4 to 5), nothing (from rank 6 to 17), and retro (from rank 18 to 20)
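
A minimal sketch of the wrangling steps above, on made-up data with three teams. The column names FTHG and FTAG (full-time home and away goals) and the lookup vector region_of are assumptions introduced for illustration; the real tibbles have more rows and columns.

```r
library(tidyverse)
library(lubridate)

# Toy matches; Div, FTHG and FTAG are assumed column names.
football_matches <- tibble(
  Div = "I1",
  Date = c("20/08/16", "21/08/16", "27/08/16"),
  HomeTeam = c("Roma", "Juventus", "Napoli"),
  AwayTeam = c("Napoli", "Roma", "Juventus"),
  FTHG = c(2, 1, 0), FTAG = c(1, 1, 3), AR = c(0, 0, 1)
)

# Team names in alphabetical order; a team's position is its numeric id.
teams <- sort(unique(c(football_matches$HomeTeam, football_matches$AwayTeam)))

football_matches <- football_matches %>%
  select(Date:AR) %>%                        # keep Date through AR
  mutate(Date = dmy(Date),                   # day/month/year strings to Date
         HomeTeamId = match(HomeTeam, teams),
         AwayTeamId = match(AwayTeam, teams)) %>%
  select(Date, HomeTeam, AwayTeam, HomeTeamId, AwayTeamId, everything())

# Toy ratings consistent with the three matches above.
football_ratings <- tibble(
  Team = teams,                              # Juventus, Napoli, Roma
  V = c(1, 0, 1), P = c(1, 0, 1), S = c(0, 2, 0),
  GF = c(4, 1, 3), GS = c(1, 5, 2)
)

# Hypothetical lookup from team to region.
region_of <- c(Juventus = "north", Napoli = "south", Roma = "center")

football_ratings <- football_ratings %>%
  mutate(DG = GF - GS,
         PT = 3 * V + P,                     # 3 per victory, 1 per draw
         region = unname(region_of[Team])) %>%
  arrange(desc(PT)) %>%
  mutate(rank = row_number(),
         league = case_when(rank <= 3  ~ "champions",
                            rank <= 5  ~ "europa",
                            rank <= 17 ~ "nothing",
                            TRUE       ~ "retro"))
```

With only three toy teams every row lands in champions; on the full 20-team table the four bands apply as listed.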

compare descriptive statistics (mean, median, max and sd) of home and away goals
compute the absolute number of teams, the relative number of teams, and the average number of points of teams grouped by region and arrange the result by absolute number of teams in decreasing order
group matches by goal spread
retrieve the busy months (those with more than 40 matches)
retrieve the matches during the busy months
retrieve matches played at home by teams that qualified for the champions league
retrieve matches played between teams from the south
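
The summaries and filters above might be sketched as follows. All values are made up, FTHG and FTAG are assumed column names for home and away goals, and the busy-month threshold is lowered from 40 to 2 so that the toy data has something to show.

```r
library(tidyverse)
library(lubridate)

football_matches <- tibble(
  Date = as.Date(c("2016-08-20", "2016-09-10", "2016-09-11", "2016-09-17")),
  HomeTeam = c("Roma", "Juventus", "Napoli", "Palermo"),
  AwayTeam = c("Napoli", "Roma", "Palermo", "Juventus"),
  FTHG = c(2, 1, 3, 0), FTAG = c(1, 1, 0, 1)
)
football_ratings <- tibble(
  Team = c("Juventus", "Roma", "Napoli", "Palermo"),
  PT = c(9, 7, 6, 1),                         # toy values
  region = c("north", "center", "south", "south"),
  league = c("champions", "champions", "champions", "retro")
)

# Mean, median, max and sd of home and away goals, side by side.
goal_stats <- football_matches %>%
  summarise(across(c(FTHG, FTAG),
                   list(mean = mean, median = median, max = max, sd = sd)))

# Absolute and relative number of teams and average points per region.
by_region <- football_ratings %>%
  group_by(region) %>%
  summarise(n = n(), avg_points = mean(PT)) %>%
  mutate(freq = n / sum(n)) %>%
  arrange(desc(n))

# Matches grouped by goal spread (home goals minus away goals).
by_spread <- football_matches %>% count(spread = FTHG - FTAG)

# Busy months and the matches played in them.
busy_threshold <- 2                           # the exercise uses 40
busy_months <- football_matches %>%
  count(month = month(Date)) %>%
  filter(n > busy_threshold)
busy_matches <- football_matches %>%
  filter(month(Date) %in% busy_months$month)

# Home matches of champions-league teams; matches between southern teams.
champions <- football_ratings %>% filter(league == "champions") %>% pull(Team)
champions_home <- football_matches %>% filter(HomeTeam %in% champions)

south <- football_ratings %>% filter(region == "south") %>% pull(Team)
south_matches <- football_matches %>%
  filter(HomeTeam %in% south, AwayTeam %in% south)
```

Pulling the team names into a plain vector and filtering with %in% is one option; a join against football_ratings would work as well.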

program a function that computes team ratings (victories, draws, defeats, points, goals for, goals against) from the team matches
program a function that computes foresight prediction accuracy of match result using team points
add the home field advantage (HFA) to the previous function. Is the accuracy improved with HFA? Visualize the accuracy with and without HFA
program a function that computes team points after every match for a given team and visualize the temporal evolution of points for two given teams
visualize the temporal evolution of points for all teams
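
One possible shape for these functions is sketched below on made-up matches. Several choices are assumptions, not part of the exercise text: the column names FTHG and FTAG, the helper team_view, reading "foresight" as predicting each match from the points both teams had collected before it, and modelling HFA as a fixed bonus (here 1 point) added to the home team's points.

```r
library(tidyverse)

football_matches <- tibble(
  Date = as.Date(c("2016-08-20", "2016-09-10", "2016-09-11", "2016-09-17")),
  HomeTeam = c("Roma", "Juventus", "Napoli", "Palermo"),
  AwayTeam = c("Napoli", "Roma", "Palermo", "Juventus"),
  FTHG = c(2, 1, 3, 0), FTAG = c(1, 1, 0, 1)
)

# Hypothetical helper: one row per team per match, seen from that team's side.
team_view <- function(matches) {
  bind_rows(
    matches %>% transmute(Date, Team = HomeTeam, GF = FTHG, GS = FTAG),
    matches %>% transmute(Date, Team = AwayTeam, GF = FTAG, GS = FTHG)
  ) %>%
    mutate(points = case_when(GF > GS ~ 3, GF == GS ~ 1, TRUE ~ 0))
}

# Team ratings: victories, draws, defeats, goals for/against, points.
compute_ratings <- function(matches) {
  team_view(matches) %>%
    group_by(Team) %>%
    summarise(V = sum(GF > GS), P = sum(GF == GS), S = sum(GF < GS),
              GF = sum(GF), GS = sum(GS), PT = sum(points)) %>%
    arrange(desc(PT))
}

# Share of matches whose result is predicted correctly from prior points.
prediction_accuracy <- function(matches, hfa = 0) {
  matches <- arrange(matches, Date)
  teams <- union(matches$HomeTeam, matches$AwayTeam)
  pts <- setNames(numeric(length(teams)), teams)   # points earned so far
  hits <- logical(nrow(matches))
  for (i in seq_len(nrow(matches))) {
    h <- matches$HomeTeam[i]
    a <- matches$AwayTeam[i]
    predicted <- sign(pts[[h]] + hfa - pts[[a]])   # 1 home, 0 draw, -1 away
    actual <- sign(matches$FTHG[i] - matches$FTAG[i])
    hits[i] <- predicted == actual
    # Update the running points with the actual result.
    pts[[h]] <- pts[[h]] + c(0, 1, 3)[actual + 2]
    pts[[a]] <- pts[[a]] + c(3, 1, 0)[actual + 2]
  }
  mean(hits)
}

# Cumulative points of one team after each of its matches.
points_over_time <- function(matches, team) {
  team_view(matches) %>%
    filter(Team == team) %>%
    arrange(Date) %>%
    mutate(match_no = row_number(), PT = cumsum(points)) %>%
    select(Date, Team, match_no, PT)
}

# Accuracy with and without HFA, and the point curves of two teams.
accuracy <- tibble(
  model = c("no HFA", "HFA = 1"),
  accuracy = c(prediction_accuracy(football_matches),
               prediction_accuracy(football_matches, hfa = 1))
)
p_accuracy <- ggplot(accuracy, aes(model, accuracy)) + geom_col()

evolution <- bind_rows(points_over_time(football_matches, "Roma"),
                       points_over_time(football_matches, "Juventus"))
p_points <- ggplot(evolution, aes(match_no, PT, colour = Team)) +
  geom_line() + geom_point()
```

For all teams at once, the same plot can be built by binding points_over_time over every team name, or by grouping team_view by Team before the cumulative sum.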

model PT in terms of GF using linear regression. Plot residuals and sort teams by residuals. Which are the top-ranked and bottom-ranked teams?
do the same for GS and DG. Which model is the best?
model PT in terms of GF and GS using multiple linear regression. How does it differ from the model of PT in terms of DG? Which of GF and GS contributes more to PT? Use the answer to suggest a market strategy for a team.
model PT in terms of V, P and S using multiple linear regression. Explain the outcome.
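
The modelling steps can be set up along these lines. The ratings below are made up (six hypothetical teams, ten matches each), so the fitted numbers say nothing about the real league; comparing models by R squared is one possible criterion, chosen here only for illustration.

```r
library(tidyverse)
library(modelr)

# Toy ratings: every team has V + P + S = 10 matches.
football_ratings <- tibble(
  Team = c("A", "B", "C", "D", "E", "F"),
  V = c(7, 6, 4, 3, 2, 1), P = c(2, 2, 4, 3, 3, 0), S = c(1, 2, 2, 4, 5, 9),
  GF = c(20, 18, 14, 11, 9, 6), GS = c(8, 10, 11, 13, 15, 21)
) %>%
  mutate(DG = GF - GS, PT = 3 * V + P)

# Simple and multiple linear regressions of points on goals.
mod_gf    <- lm(PT ~ GF, data = football_ratings)
mod_gs    <- lm(PT ~ GS, data = football_ratings)
mod_dg    <- lm(PT ~ DG, data = football_ratings)
mod_gf_gs <- lm(PT ~ GF + GS, data = football_ratings)

# Residuals of the GF model: teams on top earned more points than their
# goals scored would suggest, teams at the bottom fewer.
by_resid <- football_ratings %>%
  add_residuals(mod_gf) %>%
  arrange(desc(resid)) %>%
  select(Team, PT, GF, resid)

p_resid <- ggplot(by_resid, aes(GF, resid)) +
  geom_hline(yintercept = 0) +
  geom_point()

# Share of variance explained by each model.
r2 <- c(GF = summary(mod_gf)$r.squared,
        GS = summary(mod_gs)$r.squared,
        DG = summary(mod_dg)$r.squared,
        GF_GS = summary(mod_gf_gs)$r.squared)

# Coefficients of GF and GS can be compared directly (same unit: goals).
coef(mod_gf_gs)

# Points on victories, draws and defeats.
mod_vps <- lm(PT ~ V + P + S, data = football_ratings)
coef(mod_vps)
```

In this toy table every team played the same number of matches, so S is determined by V and P and lm reports NA for its coefficient; whether the same happens on the real data depends on whether all teams there have the same V + P + S.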