The goal of this exercise is to read complex data on wins and losses for all World Series games.
- Read the documentation for function scan. In particular, pay attention to the arguments what, skip, and nlines.
- Use scan to read data on wins and losses for all World Series games. Make a numeric vector for years and a character vector for the patterns of wins and losses.
- scan reads from left to right, but the dataset is organized by columns, so the years appear in a strange order. Use function order to order the data chronologically.
# Read the dataset with function scan
world_series <- scan("http://lib.stat.cmu.edu/datasets/wseries",
___, # - Skip the first 35 lines
___, # - Then read 23 lines of data
___) # - The data occurs in pairs: a year (numeric) and a pattern (character)
# find the permutation that sorts the years (use function order)
perm <- order(___)
# using the sorting permutation, make a data frame with the years and patterns in chronological order
world_series <- data.frame(year = ___, pattern = ___, stringsAsFactors = ___)
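The way what, skip, and nlines work together, and how order undoes the row-by-row reading, can be sketched on a small in-memory example. This is only an illustration under stated assumptions: the text in toy_text is made up and is not the real wseries file, and the names toy_text, toy, and toy_sorted are hypothetical.

```r
# Toy input (made up for illustration, not the real wseries file):
# one header line to skip, three data lines holding (year, pattern) pairs
# laid out in two column blocks, and a trailing line that must not be read.
toy_text <- "header line to skip
2001 WLW 2004 LLW
2002 WWL 2005 WLL
2003 LWW 2006 WWW
trailing line that is not data"

# what = list(...) tells scan that fields alternate: numeric, then character
toy <- scan(text = toy_text,
            what = list(year = numeric(), pattern = character()),
            skip = 1,     # skip the header line
            nlines = 3,   # read only the three data lines
            quiet = TRUE)

toy$year  # years come out row by row: 2001 2004 2002 2005 2003 2006

# order() returns the index permutation that sorts the years
perm <- order(toy$year)
toy_sorted <- data.frame(year = toy$year[perm],
                         pattern = toy$pattern[perm],
                         stringsAsFactors = FALSE)
toy_sorted
```

Applying the same permutation to both vectors keeps each pattern attached to its year, which is why the exercise asks for order rather than sort.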
The readr package guesses the type of each column with a heuristic: it reads the first 1000 rows and applies some (moderately conservative) rules to decide on a type. The file challenge.csv illustrates some problems with this approach.
- Read the file with read_csv(). You'll see some problems. Print the data frame and notice the types of the columns.
- Fix the problems by specifying the column types, or by using the guess_max parameter.
# SOLUTION
library(readr)
# read with no column spec
challenge <- read_csv("challenge.csv")
# print challenge
challenge
# read with column spec
challenge <- read_csv("challenge.csv", col_types = cols(x = col_double(), y = col_date()))
# print challenge
challenge
# another solution
read_csv("challenge.csv", guess_max = 1001)
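Why guessing from a fixed number of leading rows can go wrong is easy to see in a rough base-R analogy. The sketch below does not use readr and is not how readr implements its guessing; it relies on type.convert from base R, and both the values vector and the helper guess_from_first are made up for illustration.

```r
# Made-up column: several numeric-looking values followed by one date-like string
values <- c(rep("1", 5), "2015-01-16")

# Hypothetical helper: guess a column type from the first n values only
guess_from_first <- function(x, n) {
  class(type.convert(x[seq_len(n)], as.is = TRUE))
}

guess_from_first(values, 5)  # "integer"
guess_from_first(values, 6)  # "character"
```

A guess based on the first five values misses the later value that does not fit. This is the kind of situation that an explicit column specification or a larger guess_max is meant to handle.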