Social Networks

Social networks are networks in which the nodes are people, or sometimes groups of people, and the edges represent some form of social interaction between them, such as friendship. Sociologists have developed their own language for discussing networks: they refer to the vertices, the people, as actors and the edges as ties.

To most people the words "social networks" mean social networking services such as Facebook and Twitter. In fact, the study of social networks goes back much further. Among researchers who study networks, sociologists have the longest and best-established tradition of quantitative, empirical study.

The true foundation of the field is attributed to the psychiatrist Jacob Moreno, a Romanian immigrant to America who in the 1930s became interested in the dynamics of social interactions within groups of people. In 1934 Moreno published a book entitled Who Shall Survive?, which contained the seeds of the field of sociometry, later to become social network analysis. Moreno called his diagrams sociograms rather than social networks. The first example of a social network was a hand-drawn image depicting friendship patterns among the boys and girls in a class of schoolchildren. The figure reveals many friendships between two boys or between two girls, but few between a boy and a girl.

Another early study of social networks is the so-called Southern Women Study, published in 1941 in a book entitled Deep South. Davis, Gardner and Gardner used newspaper reports of the public appearances of society women to study a social network of 18 women in a city in the American South. They took a sample of 14 social events attended by the women in question and recorded which women attended which events. Women in this network may be considered linked if they attended the same event. An alternative and more complete representation of the data is an affiliation network or bipartite graph: a network with two types of vertex, representing the women and the events, with edges connecting each woman to the events she attended. The researchers found that the women split into two subgroups, tightly knit clusters of acquaintances with only rather loose interaction between the clusters.
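The two representations described above can be sketched in a few lines of Python. This is a minimal illustration, not the original study's data: the names, the events, and the helper functions bipartite_edges and project_onto_actors are all hypothetical.

```python
from itertools import combinations

# Hypothetical attendance records (NOT the Deep South data):
# each event maps to the set of women who attended it.
attendance = {
    "E1": {"Ann", "Beth", "Cora"},
    "E2": {"Beth", "Cora"},
    "E3": {"Dora", "Eve"},
    "E4": {"Cora", "Dora"},
}

def bipartite_edges(attendance):
    """Affiliation network: one (woman, event) edge per attendance."""
    return sorted((w, e) for e, women in attendance.items() for w in women)

def project_onto_actors(attendance):
    """One-mode projection: two women are linked if they share an event.
    The weight of a link counts how many events they attended together."""
    weights = {}
    for women in attendance.values():
        for a, b in combinations(sorted(women), 2):
            weights[(a, b)] = weights.get((a, b), 0) + 1
    return weights

edges = bipartite_edges(attendance)
projection = project_onto_actors(attendance)
print(projection[("Beth", "Cora")])  # 2: they share events E1 and E2
```

The projection discards information: it records that Beth and Cora met twice, but not at which events. This is why the bipartite form is the more complete representation.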

Since Moreno and Davis et al., social network analysis has been applied to a wide variety of communities, including friendship and acquaintance patterns in the general population and among students and schoolchildren; contacts between business people and other professionals; boards of directors of companies; collaborations of scientists, movie actors, and musicians; sexual contact networks and dating patterns; criminal networks such as networks of drug users and terrorists; historical networks; online social networks; and social networks of animals.

Measuring a social network

A crucial issue in the study of social networks is the empirical method used to gather data on the network. Two techniques are most widely used: direct questioning of subjects and the use of archival records.

The most common method is simply to ask people questions. If you are interested in friendship networks, for instance, then you ask people who their friends are. The asking may take the form of direct interviews with participants or the completion of questionnaires by participants. The main disadvantages of network studies based on direct questioning are that they are laborious and prone to inaccuracy. Administering interviews or questionnaires and collating the responses is a demanding job. For this reason, most studies have been limited to a few tens or at most hundreds of actors. Moreover, answers given by respondents are always, to some extent, subjective. If you ask people who their friends are, different people will interpret friendship in different ways and thus give different kinds of answers.

An increasingly important, voluminous, and often highly reliable source of social network data is archival records. Such records are often impressive in their scale, allowing us to construct networks of large size. An important special case of the reconstruction of networks from archival records is the affiliation network. An affiliation network is a network in which actors are connected via co-membership of groups of some kind. The most complete representation of an affiliation network is a bipartite graph, in which the network has two types of vertex, representing the actors and the groups, with edges connecting the actors to the groups to which they belong. Perhaps the best-known example is the collaboration network of film actors, in which the actors in the network sense are the actors in the dramatic sense, and the groups to which they belong are the film casts. The network is the basis of a popular parlor game, sometimes called the Six Degrees of Kevin Bacon, in which one tries to connect an actor to Kevin Bacon via chains of intermediate costars. Another example of a large affiliation network is the collaboration network of academics. In this network an actor is an academic author and a group is the set of authors of a learned paper. Excellent and very comprehensive archives exist in many academic fields, from which large collaboration networks can be assembled and studied.
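The parlor game mentioned above amounts to finding a shortest chain of costars, which a breadth-first search can compute. The following is a minimal sketch under stated assumptions: the cast lists are invented placeholders (with "Hub" standing in for the target actor), and costar_graph and chain_lengths are hypothetical helper names.

```python
from collections import deque
from itertools import combinations

# Hypothetical film casts (placeholder names, not real filmographies).
casts = {
    "Film1": ["Hub", "A"],
    "Film2": ["A", "B"],
    "Film3": ["B", "C"],
    "Film4": ["D"],
}

def costar_graph(casts):
    """Link two actors if they appear together in at least one cast."""
    adj = {}
    for cast in casts.values():
        for actor in cast:
            adj.setdefault(actor, set())
        for a, b in combinations(cast, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def chain_lengths(adj, source):
    """Breadth-first search: length of the shortest costar chain
    from source to every actor reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

dist = chain_lengths(costar_graph(casts), "Hub")
print(dist["C"])  # 3: Hub - A - B - C
```

Actors with no chain to the source, such as "D" here, simply do not appear in the result.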

Another way to measure a network is direct observation: simply by watching interactions between actors one can, over a period of time, form a picture of the network of unseen ties that exists between them. One arena in which direct observation is essentially the only viable experimental technique is the study of the social networks of animals, since animals clearly cannot be surveyed using interviews or questionnaires. Informative studies have been performed for monkeys, kangaroos, and dolphins.

A network may change over time, and sometimes network data are time-resolved: the date of each interaction between a pair of vertices, which forms an edge of the network, is recorded. For instance, collaboration network data are often time-resolved, since bibliographies contain at least the year of each recorded academic publication. Hence, collaboration links between authors can be stamped with the year in which the collaboration took place. Time-resolved network studies, or longitudinal studies as they are called in sociology, allow for a temporal analysis of the network, that is, an analysis of how network properties change over time.
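As a small illustration of a temporal analysis, time-stamped edges can be cut into cumulative yearly snapshots and a property tracked across them. This is only a sketch: the collaboration records are invented, and snapshot and degree are hypothetical helper names. Treating the network as cumulative (a link persists once formed) is an assumption; a sliding time window would be an equally reasonable choice.

```python
# Hypothetical time-stamped collaborations: (author, author, year).
collaborations = [
    ("A", "B", 2001),
    ("B", "C", 2003),
    ("A", "C", 2003),
    ("C", "D", 2006),
]

def snapshot(edges, year):
    """Cumulative network: undirected edges formed in or before `year`."""
    return {frozenset((u, v)) for u, v, t in edges if t <= year}

def degree(edge_set, vertex):
    """Number of edges in the snapshot that touch `vertex`."""
    return sum(1 for e in edge_set if vertex in e)

years = sorted({t for _, _, t in collaborations})
degree_of_c = {y: degree(snapshot(collaborations, y), "C") for y in years}
print(degree_of_c)  # {2001: 0, 2003: 2, 2006: 3}
```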

Sampling social networks

There are two main techniques to sample a population: snowball sampling, which recalls a breadth-first search of a graph, and random-walk sampling, which is similar to a depth-first search of a graph.

In snowball sampling, the investigators probe the population by getting some of the members to provide contact details for the others. You find one initial member of the population of interest and interview them. Then, upon gaining their confidence, you invite them also to name other members of the target population with whom they are acquainted. Then you interview all those acquaintances asking them to name further contacts, and so forth through a succession of waves of sampling.
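The wave structure of snowball sampling can be sketched as a breadth-first traversal that is stopped after a fixed number of waves. In this illustration the dictionary contacts stands in for the answers that interviewees would give; it, and the function name snowball_sample, are hypothetical.

```python
# Hypothetical contact lists: who each person would name when interviewed.
contacts = {
    "s": ["a", "b"],
    "a": ["s", "c"],
    "b": ["s", "c", "d"],
    "c": ["a", "b", "e"],
    "d": ["b"],
    "e": ["c"],
}

def snowball_sample(contacts, seed, waves):
    """Return the sample as a list of waves. Wave 0 is the seed; each
    later wave holds the newly named contacts of the previous wave."""
    sampled = {seed}
    history = [[seed]]
    current_wave = [seed]
    for _ in range(waves):
        next_wave = []
        for person in current_wave:          # interview everyone in the wave
            for friend in contacts[person]:  # record everyone they name
                if friend not in sampled:
                    sampled.add(friend)
                    next_wave.append(friend)
        if not next_wave:                    # nobody new was named: stop
            break
        history.append(next_wave)
        current_wave = next_wave
    return history

print(snowball_sample(contacts, "s", 2))  # [['s'], ['a', 'b'], ['c', 'd']]
```

Every member of a wave is interviewed, so the size of the sample can grow very quickly from one wave to the next.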

An alternative to snowball sampling is random-walk sampling. In this method one again starts with a single member of the target community, interviews them, and determines their contacts. Then, however, instead of interviewing all of those contacts, one chooses one of them at random and interviews only that one at each step. If the person in question cannot be found or declines to be interviewed, one simply chooses another contact. It is very important, however, that one really does determine all the contacts of each individual, even though most of the time only one of them is pursued. This is because, for the method to work correctly, one must make a random choice among all the contacts. The resulting sequence of contacts corresponds to a random walk on the social network of interest.
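The procedure can be sketched as follows. The contact lists, the function name random_walk_sample, and the available callback (which models people who cannot be found or decline to be interviewed) are all hypothetical. Note that the full contact list of each interviewee is recorded, so that the degree of every sampled person is known.

```python
import random

# Hypothetical contact lists: who each person would name when interviewed.
contacts = {
    "s": ["a", "b"],
    "a": ["s", "c"],
    "b": ["s", "c", "d"],
    "c": ["a", "b", "e"],
    "d": ["b"],
    "e": ["c"],
}

def random_walk_sample(contacts, start, steps, rng, available=lambda p: True):
    """Interview `start`, then repeatedly move to one randomly chosen
    contact. Returns the walk and the degree of each person interviewed."""
    walk = [start]
    degrees = {start: len(contacts[start])}
    current = start
    for _ in range(steps):
        candidates = list(contacts[current])  # ALL contacts are determined
        rng.shuffle(candidates)               # random order of approach
        # Take the first contact who can be interviewed; skip the others.
        chosen = next((c for c in candidates if available(c)), None)
        if chosen is None:                    # no contact can be interviewed
            break
        current = chosen
        walk.append(current)
        degrees[current] = len(contacts[current])
    return walk, degrees

rng = random.Random(0)  # fixed seed only to make the sketch reproducible
walk, degrees = random_walk_sample(contacts, "s", 10, rng)
```

Shuffling the contacts and taking the first available one is one simple way to choose uniformly at random among those who can be interviewed.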

Both sampling methods introduce bias in the choice of the sample. In random-walk sampling, however, this bias is easy to correct. The asymptotic probability that a random walk samples a vertex is simply proportional to the degree of the vertex: vertices with high degree, that is, with many neighbours, are more likely to be visited because there are more ways to reach them. In snowball sampling, by contrast, in the limit of a large number of sampling waves actors are sampled with probability proportional to their eigenvector centrality. Vertices with many neighbours, but also those with few neighbours that are themselves highly central, are more likely to be reached by a snowball sampling process. While we can determine vertex degree as part of the sampling process, computing eigenvector centrality requires complete knowledge of the network, which by definition we do not have. Furthermore, in snowball sampling the sample size grows exponentially with the number of sampling waves, so one typically performs only a logarithmic number of waves, which is not enough for the sampling process to reach equilibrium. In random-walk sampling the sample size grows linearly with the number of steps, and hence the asymptotic regime is reached quite quickly, at relatively small sample sizes.
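One common way to apply the degree correction, offered here as an assumption since the text does not specify an estimator, is to weight each sampled actor by the inverse of their degree when averaging a quantity over the sample. The sketch below uses an invented four-person network and an idealised sample in which each person appears exactly as many times as their degree, which is the limiting behaviour of a long random walk. The function name degree_corrected_mean is hypothetical.

```python
# Hypothetical network with edges a-b, b-c, b-d, c-d.
degree = {"a": 1, "b": 3, "c": 2, "d": 2}
# Hypothetical quantity measured on each person (e.g. a survey answer).
value = {"a": 5, "b": 9, "c": 2, "d": 4}

# Idealised random-walk sample: each person appears in proportion
# to their degree (1 : 3 : 2 : 2).
sample = ["a", "b", "b", "b", "c", "c", "d", "d"]

def naive_mean(sample, value):
    """Plain average over the sample; biased towards high-degree people."""
    return sum(value[p] for p in sample) / len(sample)

def degree_corrected_mean(sample, value, degree):
    """Weight each sampled person by 1/degree to undo the bias
    of sampling with probability proportional to degree."""
    weighted_sum = sum(value[p] / degree[p] for p in sample)
    total_weight = sum(1 / degree[p] for p in sample)
    return weighted_sum / total_weight

true_mean = sum(value.values()) / len(value)             # 5.0
biased = naive_mean(sample, value)                       # 5.5
corrected = degree_corrected_mean(sample, value, degree)
```

In this idealised case the corrected estimate recovers the true population mean, up to floating-point rounding, while the naive average is pulled upward by the over-represented high-degree vertex "b". The correction needs only the degrees of the sampled people, which the interviews already provide.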