I will teach you how to organize, transform, analyse and visualize data, and how to effectively communicate the outcomes of the workflow, with a strong focus on network data.

The course will be multi-task (learn, make, use, watch, glance, read, dig, listen; see the legend below) and multi-teacher (I will be assisted by other real and virtual teachers). Some basic knowledge of programming, linear algebra and matrix theory, and statistics is desirable.

**A not-so-short introduction to the science of data**

`watch` The Joy of Stats by Hans Rosling: trailer / documentary

- Introduction to R

`use` R

`use` RStudio

`watch` Getting started with R and RStudio

`watch` R tutorial

`learn` A hasty tour inside R (markup, markdown)

`make` Applications

`watch` A focus on data frames: Part 1 / Part 2

`learn` Data frames vs. tibbles

`read` Style

`make` DataCamp introduction to R

`glance` Cheatsheet. Base R

`glance` Cheatsheet. RStudio

- Data import and tidy

`watch` Data import in base R

`learn` Data import in base R. Read chapter 4 (in particular 4.12) in book T11

`learn` Data import with readr

`dig` Tidy data

- Data transformation

`learn` Relational model and relational algebra

`learn` Data transformation in base R. Read chapters 5.18 to 5.31 in book T11, as well as the documentation of the functions subset, order, transform, aggregate, and merge

`learn` Data transformation with dplyr

`learn` Joins with dplyr

`read` DBI, RSQLite, and dbplyr

`make` SQL vs. R

`make` nycflights13 with SQL, DBI and RSQLite

`read` Choosing R or Python for data analysis? An infographic

`glance` Cheatsheet. Data transformation: dplyr

- Data visualization

`read` The Great Wave off Kanagawa

`watch` Plotting with base R. Part 1 / Part 2 / Part 3

`learn` Data visualization in base R. Read chapter 10 in book T11

`watch` Plotting with ggplot

`learn` Data visualization with ggplot

`learn` Perfection is in the details

`glance` The R Graph Gallery

`glance` Cheatsheet. Data visualization: ggplot2

- Exploratory data analysis

`watch` Correlation and covariance in R

`glance` The corrplot package

`watch` EDA with base R

`learn` EDA with dplyr and ggplot

`dig` Linear regression in R

`dig` Regression modelling

**Network science**

`listen` Notes on linear algebra and matrix theory. Invited teacher: Enrico Bozzo.

`learn` Notes on graph theory

- The igraph package
- The ggraph package
- Real-world networks

`watch` The power of networks

`watch` A visual history of human knowledge

`glance` Gallery: Gorgeous networks that help us understand the world

`glance` Visual complexity

`glance` Networkism

`watch` From trees to rhizomes. Picture show and caption text

`learn` Technological networks

`learn` Social networks

`learn` Information networks

`learn` Biological networks

- Centrality

`learn` Degree

`learn` Eigenvector, Katz, and PageRank centralities

`dig` PageRank: Standing on the shoulders of giants

`learn` Closeness

`learn` Betweenness

`dig` Current-flow centralities

- Power

`learn` A measure of power in networks

`dig` A theory on power in networks

`dig` Bargaining and power in networks. Chapter 12 in EK10

- Similarity and heterogeneity

`learn` Similarity

`learn` Heterogeneity

- Community detection

`learn` Modularity

`learn` Spectral community detection

`learn` Hierarchical clustering

`learn` Other methods

- Structure

`learn` Network models

`learn` Components and resilience

`make` Components and resilience in R (markup, markdown)

`watch` The science of six degrees of separation

`read` Chains, by Frigyes Karinthy

`read` Erdős number

`watch` The strength of weak ties

`learn` Small-world networks

`make` Small-world networks in R (markup, markdown)

`learn` Degree distribution

`make` Degree distribution in R (markup, markdown)

`read` Power-law distribution

`learn` Transitivity and reciprocity

`learn` Assortative mixing

**Communication**

`read` Professional ethics for the data scientist

- Git and GitHub

`glance` Git and GitHub

`learn` RStudio, Git and GitHub

- R Markdown

`learn` R Markdown

`learn` R Markdown formats

`glance` Cheatsheet. R Markdown

- Interactivity

`glance` Shiny
  - helloShiny
  - helloWidgets
  - helloReactivity
  - helloCache (app, helpers)
  - visCentrality (run)
  - corCentrality (run)
  - visResilience (run)
  - visCommunity (run)

`glance` HTML widgets

`glance` widgets showcase (markup, markdown)

`dig` visNetwork

`dig` networkD3

`glance` Dashboards

`make` Interstellar Bureau of Investigation (markup, markdown)

- Processing and Arduino

`use` Processing

`watch` Hello Processing

`learn` BubbleNet (run, code)

`learn` Force-directed network visualization

`use` Arduino

`watch` Hello Arduino

`watch` Wired // Arduino

`learn` From Arduino to Processing and back

`learn` Visualizing real-time data

You will go through different tasks: learn, make, use, watch, glance, read, dig, listen. The legend follows:

`learn`: I teach, you listen (and hopefully learn).

`make`: I give you an assignment, and you complete it during the class. We discuss the solutions during the next class.

`use`: You use a piece of software: download, install and run it for the first time. I give you a brief practical introduction to it.

`watch`: We watch a video together. By and large, the video acts as a teaser, introducing the next topic in an informal and attractive way.

`glance`: You take a brief, quick look at something, generally an informative website. I steer you towards the most important sections.

`read`: You read a story, typically at home. We discuss it together during the following class.

`dig`: You read an in-depth theoretical treatment of the current topic, normally at home. We talk about it during one of the next classes.

`listen`: The class is given by an invited speaker, an expert in the field.

- N10: **Networks**. Mark Newman. Oxford University Press, 2010.
- EK10: **Networks, Crowds, and Markets**. David Easley and Jon Kleinberg. Cambridge University Press, 2010.
- WG17: **R for Data Science**. Hadley Wickham and Garrett Grolemund. O’Reilly, 2017.
- T11: **R Cookbook**. Paul Teetor. O’Reilly Media, 2011.
- W14: **Advanced R**. Hadley Wickham. Chapman and Hall/CRC, 2014.

Download the bibliography

Data challenges have three components:

**Input**, which consists of:

- a dataset of raw data. No data model is assumed. The data should be open so that it can be freely distributed.
- a set of data questions and challenges, formulated in natural language, whose answers may (but need not) be hidden in the raw data. Questions should be general and compelling enough to attract the attention and curiosity of scholars.

**Analysis notebook**: a stream of analyses and visualizations aimed at addressing the given data questions and challenges. Ideally, the notebook is written in a popular, free language (such as R or Python) and is self-contained, so that other scholars can easily distribute, execute and modify it. Readability, conciseness, elegance and efficiency of the notebook are relevant, although not crucial.

**Output**: the suggested answers to the given data questions and challenges. Answers might be partial (not definitive), and the same question can be answered by different notebooks. A (modest) degree of subjectivity in interpreting the answers is expected.

The following are examples of data challenges you are invited to try:

- Are female dolphins more social than male dolphins? (markup, markdown)
- Which are the most powerful countries in the European natural gas market? (markup, markdown)
- Detect the most dangerous terrorists involved in the Madrid train bombing attack of 2004 (markup, markdown)
- Discover the most interdisciplinary and autarchic disciplines in science (markup, markdown)
- Detect communities in a Karate club friendship network (markup, markdown)
- Attack the resilience of the Madrid train bombing terror network (markup, markdown)
- Are relationships among dolphins assortative by sex? And by degree? (markup, markdown)
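The following is a hedged illustration of how an analysis notebook for one of these challenges might begin. It is a minimal R sketch of community detection on the karate club network. It assumes the igraph package is installed and uses igraph's built-in copy of Zachary's karate club (`make_graph("Zachary")`) rather than a downloaded dataset. Greedy modularity optimisation (`cluster_fast_greedy()`) is our illustrative choice among the several methods igraph provides. The challenge does not prescribe it.

```r
# Minimal sketch, assuming the igraph package is installed
library(igraph)

# Zachary's karate club friendship network, bundled with igraph as a named graph
g <- make_graph("Zachary")

# Greedy modularity optimisation: an illustrative choice among several methods
comm <- cluster_fast_greedy(g)

# Community label assigned to each club member
print(membership(comm))

# Number of members in each detected community
print(sizes(comm))

# Modularity of the partition (higher means more within-community edges than expected at random)
print(modularity(comm))
```

A natural next step for the notebook is to compare the detected communities with the club's known split. You could also swap in another igraph method, such as `cluster_louvain()`, and compare the resulting modularity.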

The exam consists of a project and an oral examination. The project consists of two data challenges chosen by the student (see the examples above), one focused on data science and one focused on network science.

The project must be done individually. It must use methods, languages and software tools seen during the course (not necessarily all, but most of them) in an integrated and fluent way. The project must contain:

- A brief report (about 15 pages) that describes the dataset, the objectives, the analyses and the results obtained. The report must present the results of the analysis in the form of tables and figures, but not the R code that generated them
- R code in an R Markdown document and its HTML version
- Any other code (for instance, Processing/Arduino)
- The dataset used

All materials must be sent to the teacher at least one week before the date of the oral exam as a ZIP archive. Send it by e-mail if the archive is smaller than 10 MB. Otherwise, upload it to a file-transfer service (for example, WeTransfer) and send the download link.

During the oral examination, students present and discuss their project in at most 30 minutes, using a presentation on their own laptop (bring adapters). The presentation is open to the public. Both the project and the presentation skills will be evaluated.

Exam dates follow the academic calendar. Exams take place in Udine (typically in the “Sala Riunioni” of the Department of Mathematics, Computer Science and Physics).