Are the different layers of a social network conveying the same information?
Abstract
Comprehensive and quantitative investigations of social theories and phenomena increasingly benefit from the vast breadth of data describing human social relations that is now available within the realm of computational social science. Such data are, however, typically proxies for one of the many interaction layers composing social networks, which can be defined in many ways and are composed of communication of various types (e.g., phone calls, face-to-face communication, etc.). As a result, many studies focus on one single layer, corresponding to the data at hand. Several studies have however shown that these layers are not interchangeable, despite the presence of a certain level of correlation between them. Here, we investigate whether different layers of interactions among individuals lead to similar conclusions with respect to the presence of homophily patterns in a population; homophily is one of the most widely studied phenomena in social networks. To this aim, we consider a data set describing interactions and links of various nature in a population of Asian students with diverse nationalities, first languages and genders. We study homophily patterns, as well as their temporal evolution, in each layer of the social network. To facilitate our analysis, we put forward a general method to assess whether the homophily patterns observed in one layer inform us about patterns in another layer. In particular, our study reveals that three network layers (cell phone communications, questionnaires about friendship, and trust relations) lead to similar and consistent results despite some minor discrepancies. The homophily patterns of the copresence network layer, however, do not yield any meaningful information about other network layers.
Keywords
Social networks; Multilayer networks; Temporal homophily
List of abbreviations
CC: classroom community
CCDF: complementary cumulative distribution function
C.I.: confidence interval
GPA: grade point average
LS: loneliness scale
QAP: quadratic assignment procedure
S: strong
SACQ: student adaptation to college questionnaire
VS: very strong
W: weak
1 Introduction
Mining and analyzing social networks in various contexts yield important insights towards a better fundamental knowledge and understanding of human behavior [1]. Data on social networks have allowed researchers to investigate social theories and effects such as homophily, influence, triadic closure, etc. Data also help design data-driven models of human interactions, which can be used to describe the many processes taking place in a given population, such as information spreading, coordination, consensus formation, or spread of infectious diseases [2]. Accurate descriptions of social interactions are therefore crucial to shed light on the most relevant mechanisms at work in these processes, and for instance to understand the factors determining if a rumor will spread, or what are the best measures to contain the spread of a disease.
Within a given population, however, several networks of social interactions can be defined: e.g., friendship relations, patterns of communications, copresence, face-to-face interactions. These different types of relations form a multilayer network [3, 4], for which each layer can be explored using possibly different methods. Friendship relations are typically mined through surveys, physical interactions and proximity by diaries or more recently using wearable sensors [5, 6], and communication patterns are extracted from mobile phone call records [7, 8, 9]. In recent times in particular, technological developments have allowed researchers to gather increasing amounts of digital data on face-to-face contacts, phone communication patterns and online relationships, at widely different scales in terms of population size, space and time resolution. These data have been widely used to investigate the structure of social networks, the patterns of social interactions and social theories, such as the strength of weak ties [7], homophily patterns (the tendency of individuals to have social links with similar individuals, with respect to gender, nationality, social class, etc. [10]) [11, 12, 13, 14, 15, 16], mechanisms of link formation and persistence [11, 12, 17], social strategies linked to limited attention capacity [8], etc.
In recent years, a number of data gathering efforts have moreover managed to access simultaneously more than one layer of interactions in various population groups, leading to multilayer network data [18, 19]. The issue then arises of how to deal with the resulting increased complexity of the data sets, as different types of ties are not interchangeable [20]. In fact, it has been shown in a number of cases that these layers are correlated but not equivalent [4, 21, 22, 23, 24, 25, 26, 27, 28]. For instance, a comparison between face-to-face contacts measured by sensors and friendship relations obtained through surveys has shown that the distributions of contact durations are broad both for pairs of friends and pairs of non-friends, even if the longest contacts occur between friends [24]. In addition, a comparison between proximity events and online social links has shown that a simple thresholding procedure retaining only the strongest proximity links is not enough to determine online friendship [23]. Furthermore, a recent study of communication, online links, and proximity events has highlighted that these layers differ and cannot be reduced to a single channel of interaction [28]. Several approaches have thus been put forward to manage multilayer social networks, such as blockmodelling for multiple relations [29], stochastic actor-oriented models dealing with more than one layer [30], or dimensional reduction based on structural similarities of layers to define composite network measures [31].
In most cases however, studies of social networks are still based on data describing one specific layer of the multilayer network characterizing social interactions, and consider this layer as a proxy of “the” social network of the population under study, despite the well-accepted and known differences between the “social networks” defined through different proxies [3]. Indeed, many authors have argued that close relationships correspond to both higher frequencies of face-to-face contacts and phone communication [9, 11, 14, 16, 32, 33]. It is, for instance, often assumed that the most important relationship of an individual can be captured by his/her mobile phone records, and that the “best friend” of an individual is the person he or she is in most contact with. Some evidence to support this assumption has come from surveys [32, 34] or from comparison between surveys and mobile phone records [9], which are, however, rarely available for the same population.
It is thus important to gather and investigate data sets containing multiple layers of social interactions, to better ground such an assumption and assess the extent of its validity. It is worth highlighting that the number of data sets offering multiple layers of interactions, enriched with metadata describing individual characteristics, remains extremely limited. Moreover, it is crucial to investigate whether, given that the layers of interactions are correlated but not equivalent, socially relevant patterns and theories can be reliably assessed from one layer only. If this is indeed the case, then for a given population the data that is most conveniently accessible or that offers the best resolution can safely be used to explore such issues. Here, we focus on homophily along a range of individual characteristics, as one of the most explored patterns structuring social networks [10]. A recent study has shown some notable differences in the strength of homophilous patterns in different communication channels in a population of European students [4]. We investigate this particular issue in a diverse population of Asian students of various nationalities in a university in Singapore, for which we have access to phone communication records, copresence events, and friendship and trust relations over one full calendar year. Detailed metadata about gender, nationality, first spoken language, academic performance and psychological traits are also available, allowing us to assess homophily and its temporal evolution along multiple traits and multiple layers of social relationships. We put forward a methodology to systematically compare homophily patterns across layers, as observed through different indicators and with respect to different attributes, and apply this methodology to our dataset.
In this case, we show that patterns of homophily in the copresence layer do not inform us on the patterns in other layers, while the patterns observed in the communication network and in the networks of friendship and trust obtained from surveys, although not equal, are informative of each other.
2 Data and methods
We consider data collected in a Singapore university during one full academic year (three consecutive terms separated by short breaks) and concerning 35 participating students, of whom 15 were from one cohort class and 20 from another cohort class, studying on the same campus and staying at the same on-campus hostel. Each cohort class comprised between 45 and 50 students, based on the university policy. There were no inclusion criteria for this study. The data consist of several types of relationships between students, as well as metadata about each student.
Each participant was given a mobile phone (models included Samsung Galaxy S3, Samsung Nexus, and Sony Xperia, all having equivalent features and supporting the state-of-the-art Android system at that time, namely 4.2/4.3 Jelly Bean) to use for the duration of the study. This smartphone was preinstalled with specially developed software capable of recording and sending phone usage data and colocation information to a server located in the university premises, as described in [35]. Raw data collected by the software therefore consist of all call events between participating students, with timestamp and duration of the call, and timestamped colocation events. Specifically, copresence events were detected by periodic Bluetooth scanning at 5-minute intervals. If two participants were discovered in copresence, there would be one copresence event registered for each participant, thus a total of two copresence events for the dyad [35]. Automated location data collection by each phone was turned off each night from 12:00 a.m. to 7:00 a.m. for energy saving.
All 35 participants reported in this paper completed all components of the study. Participants were also reminded to always carry the phone with them and use it as their own at the beginning of each term in order to get meaningful data. All participants agreed to participate in this study on a voluntary basis, where each participant was compensated with SGD$30 for participation and completion of all survey questions. Besides the 35 participants who completed the study, there were another two students who participated but withdrew from the study (one discontinued after 1 day of participation, and the other one pulled out from the study at the end of the first term because of school transfer).
The resulting data is conveniently represented as 2 temporal networks, the communication and the copresence ones, in which nodes represent students and events correspond to a phone call communication or to a copresence event. Each communication event is directed, represented by the calling node, the receiving node, the starting time and the duration of the call. Each copresence event is instead undirected, represented by two nodes, a starting time and a duration.
Each temporal network can be aggregated on any arbitrary time window. We have considered on the one hand communication and copresence aggregated over the full study (one year), and on the other hand shorter periods of four months corresponding to the university terms: Term 1 (T1: May to August), Term 2 (T2: September to December) and Term 3 (T3: January to April). Each aggregated communication network relates nodes, representing students, by directed links: a directed link is drawn from student i to student j if i placed at least one call to j during the aggregation time window. Each directed link can be weighted in two different ways: the weight can be either (1) the number of calls \(n^{c}_{i\to j}\) from i to j, or (2) the total duration \(d^{c}_{i\to j}\) of these calls. We also consider an undirected version of these communication networks in which the weight of a link between i and j is simply the sum of the weights from i to j and from j to i, \(w^{s}_{ij} = w_{i\to j} + w_{j\to i}\) (with \(w=n^{c}\) or \(d^{c}\)).
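As an illustration of this aggregation step, the following minimal Python sketch builds the weighted directed links for one time window and their undirected counterparts. The record layout `(caller, callee, start, duration)`, the function names and the toy events are assumptions made for the example, not the format used in the study.

```python
from collections import defaultdict


def aggregate_calls(events, t_start, t_end):
    """Aggregate call events with t_start <= start < t_end into directed weights.

    events: iterable of (caller, callee, start, duration) tuples
            (hypothetical record layout).
    Returns (n_calls, duration), two dicts keyed by the directed pair (i, j).
    """
    n_calls = defaultdict(int)
    duration = defaultdict(float)
    for caller, callee, start, dur in events:
        if t_start <= start < t_end:
            n_calls[(caller, callee)] += 1
            duration[(caller, callee)] += dur
    return dict(n_calls), dict(duration)


def symmetrize(directed):
    """Undirected weights: w^s_ij = w_{i->j} + w_{j->i}."""
    undirected = defaultdict(float)
    for (i, j), w in directed.items():
        undirected[frozenset((i, j))] += w
    return dict(undirected)


# Toy events (made up): start times and durations in seconds.
events = [("a", "b", 0, 60), ("b", "a", 10, 30), ("a", "c", 200, 20), ("a", "b", 500, 10)]
n_calls, duration = aggregate_calls(events, 0, 300)
n_sym = symmetrize(n_calls)
```

Keying undirected links by a `frozenset` of the two endpoints is one simple way to make the pair order irrelevant; the last toy call falls outside the window and is therefore ignored.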
Table 1 Properties of the communication and copresence networks
(a) Aggregation period: 1 year
Network properties  Communication  Copresence
Number of nodes  33  35
Number of edges  282  435
Average degree  8.5  24.85
Average clustering coefficient  0.45  0.858
Edge weight (total number)  2125  14,249
Edge weight (average number)  64.4  407.1
Edge weight (total duration, s)  93,580  11,966,878
Edge weight (average duration, s)  2835.8  341,911.0
(b) Aggregation period: 4 months (Term)
Network properties  Communication T1  Communication T2  Communication T3  Copresence T1  Copresence T2  Copresence T3
Number of nodes  33  33  33  35  34  31
Number of edges  162  150  129  347  292  145
Average degree  9.81  9.09  7.81  19.82  17.17  9.35
Average clustering coefficient  0.30  0.34  0.23  0.80  0.80  0.65
Edge weight (total number)  766  822  555  5387  7057  1791
Edge weight (average number)  23.2  24.9  16.8  153.9  207.5  57.8
Edge weight (total duration, s)  29,969  37,553  26,550  3,981,721  6,412,726  1,493,623
Edge weight (average duration, s)  908.1  1138  804.5  113,763.4  188,609.5  48,181.3
In addition, questionnaires were used to assess self-reported relations among students. Each participant indicated his/her friendship tie strength with all other participants by answering individually the following two questions: (Q1) “How strong is your relationship with this person?” and (Q2) “How would you feel asking this friend to loan you $100 or more?”. For each question, a 9-point scale was used where 1 indicates for Q1 that they barely know each other (resp., for Q2 that they would never ask), while 9 indicates they are close to each other (resp., for Q2, that they would feel comfortable). These questionnaires were answered by the students at the start of the study (T0) to establish baseline values, and subsequently at the end of every term (T1, T2, T3). At each such time, we obtain therefore two questionnaire networks (one for each question asked). Both networks are fully connected, directed, and weighted, where the weight \(W_{i \to j}\) of an edge from student i to student j ranges from 0.1 to 1.0 (9 points) indicating the reported strength of the friendship (Q1) or trust (Q2) relationship of i towards j.
Table 2 Demography table: Number of participants by nationality and gender
Nationality  Male  Female  Total
Singaporean/permanent resident (PR)  7  15  22
Indian  2  0  2
PRC Chinese  4  1  5
Malaysian  4  0  4
Vietnamese  2  0  2
Total  19  16  35
For each attribute, the population under study was divided into two groups. For gender and cohort class, the division is straightforward. For nationality, the participants were divided into two groups (Singaporeans and foreigners), although several nationalities are represented (see Table 2). With respect to the first spoken language, in order to avoid confounding effects with respect to nationality, we focus only on Singaporean students, whose first language is either English or Chinese. For academic performance (GPA) and the psychological indices, the participants were again divided into two groups to facilitate the analysis of the results: one group with above-the-median values, and the other with below-the-median values.
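The median split used for GPA and the psychological indices can be sketched as follows. This is only an illustrative Python sketch: the function name and the scores are made up, and the treatment of values exactly equal to the median (assigned here to the "below" group) is an assumption, since the text does not specify it.

```python
import statistics


def median_split(scores):
    """Assign each participant to the 'above' or 'below' median group.

    scores: dict mapping participant -> numeric score.
    Assumption: values exactly equal to the median go to 'below'.
    """
    med = statistics.median(scores.values())
    return {p: ("above" if s > med else "below") for p, s in scores.items()}


# Made-up scores for four hypothetical participants.
scores = {"s1": 3.2, "s2": 3.8, "s3": 2.9, "s4": 3.5}
groups = median_split(scores)
```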
2.1 Measuring homophily
Homophily in a social network can be assessed in a number of ways. It is possible for instance to investigate the fraction of ties between individuals with similar versus different characteristics, but also higher-order structures such as triads [36], and even temporal patterns or motifs [15]. Given the weighted nature of the networks at hand, with possibly broad distributions of weights as often encountered in human interaction networks, taking edge weights into consideration is crucial [4].
 Dyadic homophily: we first consider homophily at the basic dyad level, i.e., considering the basic element forming the network, that is, the edge. We compute the total fraction of weights carried by edges between nodes with the same value of the attribute A (directed networks being converted to their undirected versions):
$$ D = \frac{\sum_{i,j / A_{i} = A_{j}} w^{s}_{ij}}{ \sum_{i,j} w^{s}_{ij} } . \tag{1} $$
 Triadic homophily: closed triangles describe the smallest nontrivial structure in a social network. For a given attribute A, that can take only two values, triangles can either be formed by three individuals with equal value of the attribute, or by a group of 2 individuals different from the third. We therefore compute the ratio of the weights of triangles formed by individuals with the same attribute value to the total weight carried by triangles:
$$ T = \frac{\sum^{\Delta}_{i,j,k / A_{i} = A_{j} = A_{k}} {( w^{s}_{ij} + w^{s}_{ik} + w^{s}_{jk} )} }{ \sum^{\Delta}_{i,j,k} {( w^{s}_{ij} + w^{s}_{ik} + w^{s}_{jk} )} } , \tag{2} $$
where the sums \(\sum^{\Delta}\) are conditioned on \(ijk\) being a closed triangle. To compute this index, we convert directed networks to their undirected versions.

Social preference: for each node i, we can rank his/her neighbors j according to the value of the corresponding edge weight \(w_{i \to j}\). As it was found in [9] that a large fraction of communication is typically allocated by each individual to a small number of top-ranked alters, it is indeed of interest to check if the individual and these top-ranked alters share common attributes. We focus here on comparing the attributes of i and of his/her first-ranked neighbor and compute the fraction of individuals for which these attributes are equal (we have performed the same test for the second-ranked neighbors, but omit the results in order to avoid accumulating too many indicators). An agent whose strongest contact has the same attribute shows indeed homophilic behaviour, so the fraction of such agents gives an indication of the existence of homophily in the population. We can moreover compute these fractions separately for all nodes i with a given value of the attribute A. For instance, we can compute separately the fraction of male students and of female students for whom the strongest link is towards a male student, which makes it possible to detect whether homophilic trends are different for individuals with different characteristics.

Temporal motifs: as put forward in [15], the availability of time-resolved data makes it possible to investigate homophily in temporal patterns of interactions by considering events concerning the same set of nodes and close enough in time. As in [15], we consider sets of events separated by at most 10 minutes and involving the same 2 or 3 individuals, and investigate the similarity (or difference) of their attributes. For the sake of simplicity and given the lack of statistics for motifs involving more than 2 nodes in our data, we limit the evidence shown to reciprocal and repeated calls (within the time window of 10 minutes) between two nodes: we consider all such patterns and compute the fraction involving nodes with equal attributes.
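The dyadic index of Eq. (1), the triadic index of Eq. (2) and the social-preference fraction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the data structures (undirected weights keyed by `frozenset` pairs, directed weights keyed by `(i, j)` tuples), the function names and the toy network are hypothetical, and ties for the strongest link are broken by keeping the first one encountered, which the text does not specify.

```python
from itertools import combinations


def dyadic_homophily(w, attr):
    """Eq. (1): share of total weight on edges joining same-attribute nodes.

    w: dict mapping frozenset({i, j}) -> undirected weight w^s_ij.
    attr: dict mapping node -> attribute value.
    """
    total = sum(w.values())
    same = sum(x for e, x in w.items() if len({attr[n] for n in e}) == 1)
    return same / total


def triadic_homophily(w, attr):
    """Eq. (2): weight share of closed triangles whose nodes share the attribute."""
    nodes = sorted({n for e in w for n in e})
    same = total = 0.0
    for i, j, k in combinations(nodes, 3):
        edges = [frozenset((i, j)), frozenset((i, k)), frozenset((j, k))]
        if all(e in w for e in edges):  # keep closed triangles only
            tri_w = sum(w[e] for e in edges)
            total += tri_w
            if attr[i] == attr[j] == attr[k]:
                same += tri_w
    return same / total if total else float("nan")


def top_alter_same_fraction(w_dir, attr):
    """Fraction of nodes whose strongest out-link points to a same-attribute node.

    w_dir: dict mapping (i, j) -> directed weight w_{i->j}.
    Assumption: on ties, the first strongest link encountered is kept.
    """
    best = {}
    for (i, j), x in w_dir.items():
        if i not in best or x > best[i][1]:
            best[i] = (j, x)
    hits = [attr[i] == attr[j] for i, (j, _) in best.items()]
    return sum(hits) / len(hits)


# Toy network (made up): three nodes of one group, one node of the other.
attr = {"a": "F", "b": "F", "c": "F", "d": "M"}
w = {frozenset(("a", "b")): 3.0, frozenset(("a", "c")): 1.0, frozenset(("b", "c")): 2.0,
     frozenset(("c", "d")): 4.0, frozenset(("b", "d")): 1.0}
w_dir = {("a", "b"): 5, ("a", "d"): 2, ("b", "a"): 1, ("c", "d"): 4, ("c", "a"): 1, ("d", "c"): 3}
D = dyadic_homophily(w, attr)
T = triadic_homophily(w, attr)
P = top_alter_same_fraction(w_dir, attr)
```

The sums over ordered pairs in Eq. (1) count every edge twice in both numerator and denominator, so summing once per undirected edge gives the same ratio. These raw values are only meaningful once compared with a null model.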
Null model: The measure of the above-defined quantities is not enough in itself to assess the presence of homophily in the data. For instance, if a population is divided into two groups, with one group much larger than the other, then one would observe more links within the larger group than between the two groups even if links were created totally at random. One thus needs to compare the data with a baseline corresponding to a null hypothesis of absence of homophily. To this aim, a well-known and often used way to assess homophily is to compare the values obtained in the data with those obtained in a proper null model. Several possibilities have been considered in the literature. For instance, one can consider an ensemble of random networks in which each individual has the same number of links as in the real data, i.e., an ensemble of networks with fixed degree sequence, sampling this ensemble by simply reshuffling links at random [37]. Such a procedure was used for instance in [4, 13]. In this ensemble however, structures and correlations in the network are not fixed (they are indeed destroyed by the reshuffling procedure), while they might be relevant, in particular in social contexts. For instance, the number of triangles is not fixed in this ensemble, so that this procedure is not suited to test for triadic homophily. One possibility would then be to use as null model an ensemble of random graphs in which, for each node, its degree and the number of triangles to which it belongs are fixed, as defined in [38]. Such a null model however still disregards higher-order structures and correlations such as communities or groups of individuals.
To deal with this issue, several authors have used, instead of ensembles of random networks that keep only a specific set of properties of the original network, a null model in which the network structure is kept completely intact, but in which each possible permutation of the attributes among the nodes is equally probable: this null model is sampled by randomly reshuffling the attributes among nodes, equivalently to the permutations used in QAP procedures [39, 40, 41]. Homophily has been measured in this way for instance with respect to gender in school children [13], with respect to academic performance in students [42], for temporal motifs in communication networks [15] and with respect to gender in online relationships, using dyadic and triadic measures [36]. We consider here this standard null model and reshuffling procedure to sample it. In addition, we show in the Supporting Information (SI) (Additional file 1) an example of results obtained when considering instead as null model an ensemble of random graphs with fixed degrees and numbers of triangles for each node [38]. In each case, we sample the null model by performing 100 reshufflings and compute the homophily indices for each. The empirical value is then compared to the resulting distribution (shown in figures as a boxplot, with the box extremities representing the 25th and 75th percentiles of the distributions, and whiskers at the 5th, 10th, 90th and 95th percentiles). It is considered that the data reveals an absence of homophily if the data point falls within the box (“No”), and that we have respectively weak (“W”), strong (“S”) and very strong (“VS”) degrees of homophily if the data point lies respectively between the 75th and the 90th percentiles, between the 90th and the 95th percentiles, and above the 95th percentile. In addition, we find in a few cases evidence for heterophily, i.e., the tendency to have fewer homophilic dyads, triads or motifs with respect to the null model.
Similarly to the homophily patterns, we consider that we have respectively weak (“W_{het}”), strong (“S_{het}”) and very strong (“VS_{het}”) degrees of heterophily when the data point lies respectively between the 10th and the 25th percentiles, between the 5th and the 10th percentiles, and below the 5th percentile of the null model distribution. (Note that the use of these percentile values is obviously somewhat arbitrary, even if the ones we use are quite usual, but we recall that the main goal of our paper is to assess whether homophily patterns are exhibited consistently across different layers of the social network: the main requirement is thus to have a consistent way of measuring homophily in the different layers.)
Finally, and for the sake of simplicity, we will also consider a coarser classification of patterns, in which we group the cases “W”, “No” and “W_{het}” together (as no evidence for either homophily or heterophily), and we consider as evidence for homophily (resp. heterophily) both “S” and “VS” cases (resp. “S_{het}” and “VS_{het}”).
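The attribute-reshuffling null model and the percentile-based classification described above can be sketched as follows. This is a hedged illustration, not the authors' code: the function names and the toy network are hypothetical, and instead of interpolated percentiles the sketch uses the mid-rank of the empirical value within the null sample (the fraction of null values below it, plus half of the ties), which is an assumption about how boundary cases and ties are handled.

```python
import random


def dyadic_homophily(w, attr):
    """Share of total weight on edges joining same-attribute nodes (Eq. 1)."""
    total = sum(w.values())
    same = sum(x for (i, j), x in w.items() if attr[i] == attr[j])
    return same / total


def permutation_null(index_fn, w, attr, n_shuffles=100, seed=0):
    """Null sample: network kept intact, attributes reshuffled among nodes."""
    rng = random.Random(seed)
    nodes = list(attr)
    values = [attr[n] for n in nodes]
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(values)
        null.append(index_fn(w, dict(zip(nodes, values))))
    return null


def classify(empirical, null):
    """Map the position of the empirical value in the null sample to a label."""
    below = sum(v < empirical for v in null)
    ties = sum(v == empirical for v in null)
    r = (below + 0.5 * ties) / len(null)  # mid-rank (assumed tie handling)
    if r >= 0.95:
        return "VS"
    if r >= 0.90:
        return "S"
    if r >= 0.75:
        return "W"
    if r > 0.25:
        return "No"
    if r > 0.10:
        return "W_het"
    if r > 0.05:
        return "S_het"
    return "VS_het"


# Coarser grouping: only S/VS cases count as evidence.
COARSE = {"VS": "homophily", "S": "homophily", "W": "none", "No": "none",
          "W_het": "none", "S_het": "heterophily", "VS_het": "heterophily"}

# Toy undirected network (made up), edges keyed by (i, j) tuples.
attr = {"a": "F", "b": "F", "c": "F", "d": "M"}
w = {("a", "b"): 3.0, ("a", "c"): 1.0, ("b", "c"): 2.0, ("c", "d"): 4.0, ("b", "d"): 1.0}
null = permutation_null(dyadic_homophily, w, attr)
label = classify(dyadic_homophily(w, attr), null)
```

Any other index (triadic, social preference, temporal motifs) can be passed as `index_fn`, provided it takes the network and the attribute assignment as arguments.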
2.2 Networks comparison

We quantify the similarity of two networks through two measures:
 The Pearson correlation coefficient between the weights of links between individuals within the two considered networks. If one of the networks is directed and the other undirected, we first convert the directed one into its undirected counterpart: for each pair of nodes \((i,j)\), the resulting weight is the sum of the weights on the directed edges \(i\to j\) and \(j\to i\).
 The cosine similarity for each node i, which measures the similarity between the neighborhoods of this node in the two networks. If \(w_{ij,1}\) and \(w_{ij,2}\) denote the weights on the links from i to j respectively in networks 1 and 2, the cosine similarity of i is defined as
$$ \operatorname{sim}_{1,2}(i) = \frac{\sum_{j} w_{ij,1} w_{ij,2} }{ \sqrt {\sum_{j} w_{ij,1}^{2}} \sqrt{\sum_{j} w_{ij,2}^{2} } } . \tag{3} $$
We compute the distribution of \(\operatorname{sim}_{1,2}(i)\) for a pair of networks and compare it with two null models: in the first one, we keep the link structure and reshuffle the weights on the links; in the second, we reshuffle the links while keeping the degree of each node fixed [37].
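Both measures can be sketched in plain Python. The function names and toy weights below are hypothetical. For the Pearson coefficient, the sketch assumes that the correlation is taken over node pairs linked in at least one of the two networks, with a weight of 0 where the link is absent; the text does not state which pairs are included, so this is only one possible reading.

```python
import math


def cosine_similarity(w1, w2, i):
    """Eq. (3) for node i; w1 and w2 map (i, j) -> weight in networks 1 and 2."""
    nbrs1 = {j: x for (a, j), x in w1.items() if a == i}
    nbrs2 = {j: x for (a, j), x in w2.items() if a == i}
    dot = sum(x * nbrs2.get(j, 0.0) for j, x in nbrs1.items())
    norm1 = math.sqrt(sum(x * x for x in nbrs1.values()))
    norm2 = math.sqrt(sum(x * x for x in nbrs2.values()))
    if norm1 == 0 or norm2 == 0:
        return float("nan")  # node i has no links in one of the networks
    return dot / (norm1 * norm2)


def pearson_weights(w1, w2):
    """Pearson correlation of link weights over pairs present in either network.

    Assumption: a link absent from one network has weight 0 there.
    """
    pairs = sorted(set(w1) | set(w2))
    x = [w1.get(p, 0.0) for p in pairs]
    y = [w2.get(p, 0.0) for p in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one set of weights is constant
    return cov / math.sqrt(vx * vy)


# Toy weights (made up): network 2 is network 1 with doubled weights.
w1 = {("a", "b"): 1.0, ("a", "c"): 2.0}
w2 = {("a", "b"): 2.0, ("a", "c"): 4.0}
w3 = {("a", "d"): 3.0}
sim_same = cosine_similarity(w1, w2, "a")
sim_disjoint = cosine_similarity(w1, w3, "a")
```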
While these measures give us an idea of the topological similarity of networks, our goal here is also to provide a way to estimate whether homophily patterns are exhibited consistently across different networks. To this aim, we tabulate for each network and each homophily index used—dyadic homophily, triadic homophily, etc.—the occurrences corresponding to an absence of homophily, weak, strong, or very strong evidence of homophily (or heterophily). We then compute the number of concordant and discordant cases for each pair of networks. For instance, we track the number of indices for which no evidence of homophily is found in one network, while strong evidence is uncovered in the second network. This gives us a first indication with respect to whether homophily patterns are similar across two networks. Moreover, we compare these numbers to a null model defined as follows: for each network and each homophily index, we reshuffle the “No”, “W”, “S”, “VS”, “W_{het}”, “S_{het}”, “VS_{het}” cases, keeping their number fixed, and compute again the number of concordant and discordant indices. If the empirical number of concordant cases falls outside the confidence interval of the resulting distribution for the null model, it indicates that the number of concordant cases obtained is not just due for instance to a large majority of “No” cases. Thus, it is a strong indication that the homophily patterns between networks are similar enough so that information on homophily can be obtained from either.
Note that this comparison procedure can be performed independently of the way in which homophily (or lack thereof) is assessed, as long as this way is consistent across layers. Note also that it can be applied to arbitrary numbers of layers, of attributes and of homophily indicators.
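The tabulation of concordant and discordant cases for a pair of networks can be sketched as below. The dictionary layout (one label per indicator, attribute and term), the function names and the labels themselves are made up for illustration and do not reproduce results from the study.

```python
from collections import Counter

# Coarser grouping: only S/VS cases count as evidence.
COARSE = {"VS": "homophily", "S": "homophily", "W": "none", "No": "none",
          "W_het": "none", "S_het": "heterophily", "VS_het": "heterophily"}


def concordance_table(labels_1, labels_2):
    """Count (label in network 1, label in network 2) over shared indicators.

    labels_k: dict mapping (indicator, attribute, term) -> label.
    """
    shared = labels_1.keys() & labels_2.keys()
    return Counter((labels_1[k], labels_2[k]) for k in shared)


def n_concordant(table):
    """Number of cases on the diagonal of the table."""
    return sum(c for (x, y), c in table.items() if x == y)


def coarsen(labels):
    """Apply the coarser three-way classification."""
    return {k: COARSE[v] for k, v in labels.items()}


# Made-up labels for two hypothetical layers.
labels_a = {("dyadic", "gender", "T1"): "VS", ("dyadic", "gender", "T2"): "S",
            ("triadic", "gender", "T1"): "No", ("triadic", "GPA", "T1"): "W"}
labels_b = {("dyadic", "gender", "T1"): "VS", ("dyadic", "gender", "T2"): "No",
            ("triadic", "gender", "T1"): "No", ("triadic", "GPA", "T1"): "W_het"}
fine = concordance_table(labels_a, labels_b)
coarse = concordance_table(coarsen(labels_a), coarsen(labels_b))
```

In this toy case the fine classification gives two concordant cases out of four, and the coarse one gives three, because "W" and "W_het" are both grouped with "No".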
3 Results
3.1 Description of network characteristics
We first present an overview of some descriptive characteristics of the data under investigation.
3.1.1 Comparison between successive terms
Table 3 Pearson correlation coefficients between term-aggregated networks
(a) Communication vs. communication (p = 0 for all entries)  

DurationT1  DurationT2  DurationT3  NumberT1  NumberT2  NumberT3  
DurationT1  1.0  0.57  0.59  0.89  0.49  0.59 
DurationT2  0.57  1.0  0.89  0.6  0.9  0.88 
DurationT3  0.59  0.89  1.0  0.52  0.77  0.9 
NumberT1  0.89  0.6  0.52  1.0  0.56  0.57 
NumberT2  0.49  0.9  0.77  0.56  1.0  0.9 
NumberT3  0.59  0.88  0.9  0.57  0.9  1.0 
(b) Copresence vs. copresence  

DurationT1  DurationT2  DurationT3  NumberT1  NumberT2  NumberT3  
DurationT1  1.0 (p = 0.0)  0.4 (p = 0.0)  0.44 (p = 0.0)  0.93 (p = 0.0)  0.37 (p = 0.0)  0.2 (p = 0.04) 
DurationT2  0.4 (p = 0.0)  1.0 (p = 0.0)  0.41 (p = 0.0)  0.32 (p = 0.0)  0.91 (p = 0.0)  0.3 (p = 0.002) 
DurationT3  0.44 (p = 0.0)  0.41 (p = 0.0)  1.0 (p = 0.0)  0.48 (p = 0.0)  0.43 (p = 0.0)  0.87 (p = 0.0) 
NumberT1  0.93 (p = 0.0)  0.32 (p = 0.0)  0.48 (p = 0.0)  1.0 (p = 0.0)  0.35 (p = 0.0)  0.3 (p = 0.002) 
NumberT2  0.37 (p = 0.0)  0.91 (p = 0.0)  0.43 (p = 0.0)  0.35 (p = 0.0)  1.0 (p = 0.0)  0.44 (p = 0.0) 
NumberT3  0.2 (p = 0.04)  0.3 (p = 0.002)  0.87 (p = 0.0)  0.3 (p = 0.002)  0.44 (p = 0.0)  1.0 (p = 0.0) 
(c) Questionnaire vs. questionnaire (p = 0 for all entries)  

Q1T0  Q1T1  Q1T2  Q1T3  Q2T0  Q2T1  Q2T2  Q2T3  
Q1T0  1.0  0.65  0.58  0.51  0.62  0.55  0.52  0.43 
Q1T1  0.65  1.0  0.81  0.77  0.47  0.71  0.7  0.63 
Q1T2  0.58  0.81  1.0  0.77  0.37  0.6  0.73  0.59 
Q1T3  0.51  0.77  0.77  1.0  0.28  0.52  0.63  0.73 
Q2T0  0.62  0.47  0.37  0.28  1.0  0.58  0.49  0.34 
Q2T1  0.55  0.71  0.6  0.52  0.58  1.0  0.71  0.59 
Q2T2  0.52  0.7  0.73  0.63  0.49  0.71  1.0  0.65 
Q2T3  0.43  0.63  0.59  0.73  0.34  0.59  0.65  1.0 
3.1.2 Comparison between the communication, copresence, friendship and trust networks
We found no significant correlation between the weights of edges in the yearly or term-aggregated communication and copresence networks, showing that these networks correspond potentially to quite different interaction patterns (the cosine similarities between these networks also show quite low values). On the other hand, both communication and copresence weights show weak but significant correlations with the weights resulting from the two questionnaires Q1 and Q2. The values of the cosine similarities of neighborhoods of nodes (i) between communication and questionnaires, and (ii) between copresence and questionnaires, are moreover much larger than in the null models with reshuffled weights or edges. Finally, in each term, the weights reported in Q1 and Q2 are strongly correlated (but distinct), and the cosine similarities of neighborhoods of nodes in the two questionnaire networks are close to 1 (see Supporting Information).
3.2 Homophily patterns in yearly-aggregated networks
With respect to these attributes, various homophily patterns are thus observed when aggregating over the whole dataset of one year without taking into account the timing of communication events, but also when considering sequences of calls separated by short time windows.
3.3 Evolution of homophily in communication across terms
Homophily with respect to nationality is also very strong and stable across terms as measured by dyads. It weakens, however, in the third term as measured by triads. In terms of social preference, interesting distinct patterns are found: homophily decreases strongly and becomes weak or absent in the third term for Singaporean students, but instead remains very strong and in fact increases for foreigners (see Supporting Information).
The tendency toward homophily with respect to GPA remains rather weak across all terms with respect to all indicators, except in the first term for triads and in the third term for dyads. On the other hand, several instances of heterophilic tendencies are found with respect to the first spoken language. Finally, we find no clear tendency toward homophilous behavior of students with respect to their scores in the three psychological questionnaires (see Supporting Information). Some tendency toward heterophilous behavior is even observed in some cases, in particular in the social preference of the students with loneliness index below median.
3.4 Comparison between homophily in various networks
As discussed in the introduction, an important issue, besides the evidence for homophily (or the lack thereof) in each layer of interaction or relations available for analysis, is whether the same or different conclusions are reached when investigating these different layers. As made clear from the comparison reported above, there are indeed significant correlations between communication and friendship or trust networks, and the students linked in the communication network tend also to have spent more time in copresence. However, these networks are very distinct both in terms of structure and weights.
Summary of the dyadic homophily patterns found in the different networks, with respect to the various attributes considered
Comparison of homophily in networks. Each table corresponds to a pair of networks and gives at row X and column Y the number of cases in which an indicator gives a result X in the first network and a result Y in the second. The intervals correspond to the confidence intervals of the null model described in the text, and empirical numbers are emphasized in boldface if they lie outside this interval
A first assessment of the results gathered in Table 5 indicates that concordant cases (on the diagonals) are far more numerous than discordant ones. It is, however, important to deepen our analysis, as this overall observation might simply be due to the large number of indicators showing an absence of homophilous patterns. Indeed, if we consider a large number of attributes and a large number of indicators, and only a few of them show evidence for homophily, then many concordant cases will automatically be observed, even if the few cases of homophily are very different in distinct network layers. To check whether this is the case, we resort to a comparison with the following null model: for each layer and each homophily indicator (dyadic, triadic or social preference), we reshuffle at random the answers (“VS”, “S”, “W”, “No”, “W_{het}”, “S_{het}” and “VS_{het}”) across terms and attributes, and compute for each reshuffling the number of concordant and discordant cases. We present in Table 5 the confidence intervals (C.I.) defined by the 5th and 95th percentiles of this null model. We emphasize in boldface the cases in which the empirical numbers are outside the C.I., and we color in particular the cells in which the numbers of concordant cases are above the C.I.
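The reshuffling procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the analysis: the function names, the grouping of the seven answers into three outcomes (`OUTCOME`), the number of reshufflings, the nearest-rank percentile rule, and the toy answers for the two layers are all hypothetical.

```python
import random
from collections import Counter

# Assumed coarse-graining of the seven answers into three outcomes.
OUTCOME = {"VS": "hom", "S": "hom", "W": "hom", "No": "none",
           "W_het": "het", "S_het": "het", "VS_het": "het"}

def contingency(layer_a, layer_b):
    """Count (outcome in A, outcome in B) pairs over all indicators and positions."""
    counts = Counter()
    for indicator in layer_a:
        for x, y in zip(layer_a[indicator], layer_b[indicator]):
            counts[(OUTCOME[x], OUTCOME[y])] += 1
    return counts

def reshuffled(layer, rng):
    """Shuffle the answers across terms and attributes, separately per indicator."""
    return {ind: rng.sample(answers, len(answers)) for ind, answers in layer.items()}

def null_interval(layer_a, layer_b, cell, n_samples=1000, seed=0):
    """5th and 95th percentiles (nearest rank) of one table cell under the null model."""
    rng = random.Random(seed)
    values = sorted(
        contingency(reshuffled(layer_a, rng), reshuffled(layer_b, rng))[cell]
        for _ in range(n_samples)
    )
    return values[int(0.05 * (n_samples - 1))], values[int(0.95 * (n_samples - 1))]

# Toy answers (made up, not from the dataset): one entry per (term, attribute) slot.
comm = {"dyadic": ["VS", "S", "No", "No", "W", "No"],
        "triadic": ["S", "No", "No", "No", "W_het", "No"]}
friend = {"dyadic": ["S", "S", "No", "No", "No", "No"],
          "triadic": ["W", "No", "No", "No", "No", "No"]}

observed = contingency(comm, friend)
interval = null_interval(comm, friend, ("hom", "hom"))
print(observed[("hom", "hom")], interval)
```

Shuffling within each indicator keeps the number of homophilous answers per layer and indicator fixed, so the interval reflects only how much agreement would be expected given that amount of homophily.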
For the comparison between the two questionnaire networks, as well as between the communication network and the questionnaire networks, the numbers of concordant cases with and without homophily are both much larger than the upper bound of the confidence intervals of the null model, while the numbers of cases in which one network shows homophily while the other does not are smaller than the lower bound of the C.I. These three networks have therefore overall similar homophily patterns, despite discrepancies occurring in a number of specific cases.
On the other hand, comparisons involving the copresence network lead mostly to numbers of concordant and discordant cases within the C.I. of the null model. This means that, even if the copresence network displays a similar “amount” of evidence for homophilous behavior with respect to the other layers of the social network, the homophily patterns are no more similar than random, given this amount. Hence, the copresence homophily patterns do not inform us about which specific attributes and which specific indicators exhibit homophily patterns in the other networks.
4 Discussion
The increased availability of data providing proxies for human behavior and social relationships, often in digital form, has led to a surge in the number of studies of social theories and effects. Most such studies are, however, based on the analysis of one specific layer (e.g., phone call communications) of the population's social network, which is best represented as a multilayer network. It is now well established that the various network layers bear some level of correlation but are far from being equivalent. However, it is still unclear to what extent one can infer general conclusions from the study of only one layer. In this paper, we have considered this issue, with a particular focus on homophily patterns, through the lens of a dataset providing data on several layers of the same population, namely a communication layer, a copresence layer, and two questionnaires describing friendship and trust relationships. The population under scrutiny is formed of first-year students in an Asian university. Notably, the diversity of students in the population allows us to investigate homophily patterns along several dimensions: gender, nationality, first spoken language, GPA and psychological indices assessed by questionnaires. It is worth adding that most studies about homophily reported in the literature are concerned with populations having a homogeneous composition in terms of nationality and first language [4, 11, 12, 13, 14, 15, 16].
In terms of direct comparison between networks, we found no correlation between the weights of links in the copresence and communication networks, but significant correlations between the communication or copresence networks and the questionnaire networks. We also found a clear correlation between communication (number and call volume) and reported friendship strength, confirming results of other authors with other types of population [9, 11, 14, 16, 32, 33]. This latter point stands in stark contrast with the absence of correlation between the amount of copresence and friendship strength.
The strongest uncovered evidence of homophily is with respect to gender and nationality in several indicators and layers, while weaker evidence concerns homophily with respect to academic performance as measured by the GPA. No homophily was found with respect to the first spoken language or the psychological indices (similarly to [4], even if for different indices).
Most importantly, we have put forward here a systematic way of comparing homophily patterns with respect to a heterogeneous group of attributes in the different layers of a social network. This methodology is based on counting the numbers of concordant and discordant indicators of homophily in each pair of networks. As a large number of concordances might simply be due to a scarcity of indicators showing homophily, a crucial point is to compare these numbers with a null model in which the results of the indicators are reshuffled within each network and type of indicator. If the observed number of concordant (resp. discordant) cases lies above (resp. below) the confidence interval of this null model, it means that both networks yield an overall concordant picture of the homophily patterns in the studied social network, in a way that is not simply due to an overall lack of homophily. On the other hand, if the observed number of concordant cases falls within the confidence interval of the null model, we can conclude that one cannot extract information about homophily patterns in one network from the patterns in the other network.
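The decision rule in this paragraph can be written down explicitly. The sketch below is illustrative only: the function names, the three verdict labels, and the numbers in the usage lines are assumptions introduced here and are not taken from Table 5.

```python
def position(observed, interval):
    """Locate an observed count relative to a null-model interval (lo, hi)."""
    lo, hi = interval
    if observed > hi:
        return "above"
    if observed < lo:
        return "below"
    return "within"

def verdict(concordant, discordant):
    """Each argument is a list of (observed_count, (lo, hi)) pairs."""
    conc = [position(obs, ci) for obs, ci in concordant]
    disc = [position(obs, ci) for obs, ci in discordant]
    # Concordant counts above and discordant counts below the interval:
    # the two layers give an overall concordant picture.
    if all(p == "above" for p in conc) and all(p == "below" for p in disc):
        return "similar patterns"
    # Concordant counts inside the interval: agreement is no better than chance.
    if all(p == "within" for p in conc):
        return "not informative"
    return "mixed"

# Made-up numbers for illustration only.
pair_a = verdict(concordant=[(9, (2, 6)), (30, (20, 26))], discordant=[(3, (6, 12))])
pair_b = verdict(concordant=[(4, (2, 6)), (23, (20, 26))], discordant=[(9, (6, 12))])
print(pair_a, "|", pair_b)
```

The "mixed" label is a convenience for cases the paragraph does not cover, such as one concordant count above the interval and another within it; such cases would need to be inspected individually.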
In the specific case under study, we found that the communication and questionnaire layers lead to similar conclusions in many cases—even if some minor discrepancies are observed—and more than expected from the null model. This means that the communication layer allows us to obtain information about homophilous trends in the friendship and trust networks of this social network. On the other hand, the copresence network cannot be used to assess homophily patterns occurring in the other layers.
Our work has several limitations that are worth mentioning. First, from the experimental viewpoint, it could be argued that providing the participants with a new device might have influenced some user behaviors in the early stages of the study. For instance, participants might have spent time exploring the features of the new device. However, it is unlikely that the target behaviors of interest (i.e., chatting on a mobile phone, copresence activity) were changed. In support of this, the communication networks in different terms showed moderate to strong correlations. Moreover, the copresence networks in different terms were also significantly correlated with each other. Another obvious limitation is the fact that our study is based on one single dataset of a specific population of limited size. The population was, however, largely isolated, and data are available for a whole year, allowing the analysis of the evolution of the homophily patterns along the year, as well as the comparison with the evolution in the other layers. Moreover, we could not reliably use messaging data, although messages nowadays represent a fair amount of communication between individuals. Furthermore, we did not have access to any online social network on which messages are also exchanged. The copresence data had limited spatial resolution owing to the particular choice of the Bluetooth technology. It might be that with another technology yielding a higher spatial resolution, data on face-to-face interactions would lead to different conclusions, and correspond to a larger similarity of homophily patterns with the communication and questionnaire networks.
To conclude, we note that the methodology put forward to assess the similarity of homophily patterns in different layers of a social network is general and can be applied to any dataset composed of several layers of interactions or relationships between individuals, and to any set of attributes for which homophily patterns are of interest. We therefore hope that the present study will stimulate further similar dataset collections and investigations into this crucial issue.
Notes
Acknowledgements
We would like to thank Ms. Xiaoqian Li for her assistance in making the database available to us, in answering multiple rounds of questions regarding the dataset, and in reviewing the manuscript.
Availability of data and materials
All data is provided as Additional file 2.
Authors’ contributions
WQY designed the network smartphone study and data collection. AB and RB conceived and designed the study of homophily for the social network, with consultations from WQY. AM performed the statistical data analysis. AB, AM, and RB wrote the manuscript. WQY reviewed the manuscript. All authors read and approved the final manuscript.
Funding
This work was partially supported by the SUTD-MIT International Design Center (IDC) under Grants IDG31100106 and IDD41100104 (AM, WQY, and RB). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests
The authors declare that they have no competing interests.
References
1. Wasserman S, Faust K (1994) Social network analysis: methods and applications. Cambridge University Press, Cambridge
2. Barrat A, Barthelemy M, Vespignani A (2008) Dynamical processes on complex networks. Cambridge University Press, Cambridge
3. De Choudhury M, Mason WA, Hofman JM, Watts DJ (2010) Inferring relevant social networks from interpersonal communication. In: WWW ’10: proceedings of the 19th international conference on world wide web. ACM, New York, pp 301–310. ISBN 9781605587998
4. Mollgaard A, Zettler I, Dammeyer J, Jensen MH, Lehmann S, Mathiesen J (2016) Measure of node similarity in multilayer networks. PLoS ONE 11(6):e0157436
5. Read JM, Edmunds WJ, Riley S, Lessler J, Cummings DAT (2012) Close encounters of the infectious kind: methods to measure social mixing behaviour. Epidemiol Infect 140:2117–2130
6. Barrat A, Cattuto C (2015) Face-to-face interactions. In: Gonçalves B, Perra N (eds) Social phenomena: from data analysis to models. Springer, Cham, pp 37–57
7. Onnela JP, Saramäki J, Hyvönen J, Szabó G, Lazer D, Kaski K, Kertész J, Barabási AL (2007) Structure and tie strengths in mobile communication networks. Proc Natl Acad Sci USA 104(18):7332–7336
8. Miritello G, Lara R, Cebrian M, Moro E (2013) Limited communication capacity unveils strategies for human interaction. Sci Rep 3:1950
9. Saramäki J, Leicht EA, López E, Roberts SGB, Reed-Tsochas F, Dunbar RIM (2014) Persistence of social signatures in human communication. Proc Natl Acad Sci USA 111(3):942–947
10. McPherson M, Smith-Lovin L, Cook JM (2001) Birds of a feather: homophily in social networks. Annu Rev Sociol 27:415–445
11. Kossinets G, Watts DJ (2009) Origins of homophily in an evolving social network. Am J Sociol 115:405–450
12. Aiello LM, Barrat A, Cattuto C, Ruffo G, Schifanella R (2010) Link creation and profile alignment in the aNobii social network. In: SocialCom ’10: proceedings of the second IEEE international conference on social computing, pp 249–256
13. Stehlé J, Charbonnier F, Picard T, Cattuto C, Barrat A (2013) Gender homophily from spatial behavior in a primary school: a sociometric study. Soc Netw 35:604–613
14. Palchykov V, Kaski K, Kertész J, Barabási AL, Dunbar RIM (2012) Sex differences in intimate relationships. Sci Rep 2:370
15. Kovanen L, Kaski K, Kertész J, Saramäki J (2013) Temporal motifs reveal homophily, gender-specific patterns, and group talk in call sequences. Proc Natl Acad Sci USA 110(45):18070–18075
16. Jo HH, Saramäki J, Dunbar RIM, Kaski K (2014) Spatial patterns of close relationships across the lifespan. Sci Rep 4:6988
17. Navarro H, Miritello G, Canales A, Moro E (2017) Temporal patterns behind the strength of persistent ties. arXiv:1706.06188
18. Nicosia V, Latora V (2015) Measuring and modeling correlations in multiplex networks. Phys Rev E 92:032805
19. Aleta A, Moreno Y (2018) Multilayer networks in a nutshell. arXiv:1804.03488
20. Borgatti SP, Mehra A, Brass DJ, Labianca G (2009) Network analysis in the social sciences. Science 323(5916):892–895
21. Szell M, Lambiotte R, Thurner S (2010) Multirelational organization of large-scale social networks in an online world. Proc Natl Acad Sci USA 107:13636
22. Stopczynski A, Sekara V, Sapiezynski P, Cuttone A, Larsen JE, Lehmann S (2014) Measuring large-scale social networks with high resolution. PLoS ONE 9(4):e95978
23. Sekara V (2014) The strength of friendship ties in proximity sensor data. PLoS ONE 9(7):e100915
24. Mastrandrea R, Fournet J, Barrat A (2015) Contact patterns in a high school: a comparison between data collected using wearable sensors, contact diaries and friendship surveys. PLoS ONE 10(9):e0136497
25. Leecaster M, Toth DJA, Pettey WBP, Rainey JJ, Gao H, Uzicanin A, Samore M (2016) Estimates of social contact in a middle school based on self-report and wireless sensor data. PLoS ONE 11(4):e0153690
26. Smieszek T, Castell S, Barrat A, Cattuto C, White PJ, Krause G (2016) Contact diaries versus wearable proximity sensors in measuring contact patterns at a conference: method comparison and participants’ attitudes. BMC Infect Dis 16(1):341
27. Boonstra TW, Larsen ME, Townsend S, Christensen H (2017) Validation of a smartphone app to map social networks of proximity. arXiv:1706.08777
28. Mones E, Stopczynski A, Lehmann S (2017) Contact activity and dynamics of the social core. EPJ Data Sci 6(1):6
29. Dabkowski M, Breiger R, Szidarovszky F (2015) Simultaneous-direct blockmodeling for multiple relations in Pajek. Soc Netw 40:1–16
30. Snijders TAB, Lomi A, Torló VJ (2013) A model for the multiplex dynamics of two-mode and one-mode networks, with an application to employment preference, friendship, and advice. Soc Netw 35(2):265–276
31. Vörös A, Snijders TAB (2017) Cluster analysis of multiplex networks: defining composite network measures. Soc Netw 49:93–112
32. Hill RA, Dunbar RIM (2003) Social network size in humans. Hum Nat 14(1):53–72
33. Roberts SGB, Dunbar RIM, Pollet TV, Kuppens T (2009) Exploring variation in active network size: constraints and ego characteristics. Soc Netw 31(2):138–146
34. Roberts SGB, Dunbar RIM (2011) Communication in social networks: effects of kinship, network size, and emotional closeness. Pers Relatsh 18(3):439–452
35. Yow WQ, Li X, Hung WY, Goldring M, Cheng L, Gu Y (2014) Predicting social networks and psychological outcomes through mobile phone sensing. In: 2014 IEEE international conference on communications (ICC), pp 3925–3931
36. Laniado D, Volkovich Y, Kappler K, Kaltenbrunner A (2016) Gender homophily in online dyadic and triadic relationships. EPJ Data Sci 5(1):19
37. Maslov S, Sneppen K, Zaliznyak A (2004) Detection of topological patterns in complex networks: correlation profile of the Internet. Physica A 333:529–540
38. Newman MEJ (2009) Random graphs with clustering. Phys Rev Lett 103:058701
39. Hubert LJ (1987) Assignment methods in combinatorial data analysis. Marcel Dekker, New York
40. Krackhardt D (1987) QAP partialling as a test of spuriousness. Soc Netw 9:171–186
41. Krackhardt D (1987) Predicting with networks: nonparametric multiple regression analysis of dyadic data. Soc Netw 10:359–381
42. Smirnov I (2017) Formation of homophily in academic performance: students change their friends rather than performance. PLoS ONE 12(8):e0183473
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.