I will teach you how to organize, transform, analyse and visualize data, both small and big, and how to communicate the outcomes of the workflow effectively. This learning path starts with Edgar Frank Codd, passes through Hadley Wickham, and arrives at Giorgia Lupi.

The course will be multi-task (learn, make, use, watch, glance, read, dig, listen; see more below) and multi-teacher (I will be assisted by other real and virtual teachers). Some basic knowledge of programming and statistics is desirable.

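I will partially adopt reverse instructional design, the educational counterpart of test-driven development in software: just as tests are written before the code that must pass them, the assessments are defined before the lessons that prepare you for them.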

Play

  1. Relational Databases
  2. Data Science
  3. Network Science
  4. Semistructured Data and XML
  5. Text Mining
  6. Big Data and Internet of Things
  7. Collaborate and Communicate

Task-tag legend

You will carry out different kinds of tasks: learn, make, use, watch, glance, read, dig, and listen. The legend is below:

Software

Packages

Books

E-learning

As an academic professional, I can sign my class up for an entire semester for free via DataCamp for the Classroom. This has some benefits:

  1. you can learn by doing on the DataCamp platform;
  2. I can assign particular courses or chapters, and see who finished on time and who missed the deadline;
  3. I can track student progress, grade automatically, and download reports.

Resources

Datasets

Data challenges

Data challenges have 3 components:

The following are examples of data challenges you are invited to try:

  1. Which is the best team ever in Italian soccer? challenge
  2. What are the performance classes in the latest season of the Italian Serie A soccer league? challenge
  3. Is child mortality decreasing over time? challenge
  4. What are the qualities of diamonds? challenge
  5. Is there a first-mover advantage in chess? challenge
  6. How does life expectancy change over time in each country? html + Rmd
  7. Who were the most important terrorists in the 2004 Madrid train bombings? html + Rmd
  8. Are female dolphins more social than male dolphins? challenge
  9. Discover the most interdisciplinary and autarchic disciplines in science html + Rmd
  10. Phyllotaxis: draw flowers using R html + Rmd

Assessments

  1. Beginner’s luck HTML
  2. Conqueror’s proof HTML

Exam

The exam consists of a written exam and a project with oral presentation.

The written part consists of a list of questions, either open questions or exercises, covering the whole syllabus, and lasts 90 minutes. During the written exam, students may use only cheat sheets on language syntax (such as the dplyr cheat sheet).

The project consists of one significant data challenge chosen by the student. It is done individually and uses methods, languages and software tools seen during the course (not necessarily all, but most of them) in an integrated and fluent way. The student must deliver:

At least one week before the date of the oral exam, upload all the material to GitHub and send the teacher an e-mail with the link to the repository.

During the oral examination, students will present and discuss the project in at most 20 minutes, using a presentation on their own laptop (bring adapters). The presentation is open to the public. Both the project and the presentation skills will be evaluated.

The final mark is a weighted average of the written and project parts of the exam. The weight of the project is \(\varphi^{-1} \approx 0.618\), where \(\varphi = (1 + \sqrt{5})/2 \approx 1.618\) is the golden ratio; the written part therefore weighs \(1 - \varphi^{-1} = \varphi^{-2} \approx 0.382\). Passing marks range from 18 to 30. Excellent projects earn a praise bonus of 1 to 3 points, which is added to the weighted average. The final mark is rounded to the nearest integer.
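As an illustration only, the grading rule can be sketched in a few lines of Python (the course itself works in R). `final_mark` is a hypothetical helper, not an official grading tool. The sketch follows the text in adding the bonus to the weighted average before rounding. Three points are assumptions, because the text does not specify them: ties are rounded up, results are not capped at 30, and the 18-point passing threshold is not enforced.

```python
import math

# Golden ratio: phi = (1 + sqrt(5)) / 2, about 1.618
PHI = (1 + math.sqrt(5)) / 2
PROJECT_WEIGHT = 1 / PHI              # phi^-1, about 0.618
WRITTEN_WEIGHT = 1 - PROJECT_WEIGHT   # equals phi^-2, about 0.382


def final_mark(written: float, project: float, bonus: int = 0) -> int:
    """Hypothetical helper: weighted average plus praise bonus, rounded.

    Assumptions (not stated in the syllabus): halves are rounded up,
    the result is not capped at 30, and the passing threshold is not checked.
    """
    if not 0 <= bonus <= 3:
        raise ValueError("bonus must be between 0 and 3")
    average = WRITTEN_WEIGHT * written + PROJECT_WEIGHT * project
    # Round half up explicitly; Python's round() would round halves to even.
    return math.floor(average + bonus + 0.5)


# Usage: 24 on the written part and 28 on the project
print(final_mark(24, 28))  # 0.382 * 24 + 0.618 * 28 = 26.47..., printed as 26
```

The sketch uses `math.floor(x + 0.5)` instead of the built-in `round()` because `round()` sends exact halves to the nearest even integer, which may not match how "closest integer" is meant here.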

The written and oral (project) parts of the exam can be taken at different times and in either order. Exam dates follow the academic calendar, and exams take place in Trieste. Register only for the part you take last (the second one).