Description of network characteristics
We first present an overview of some descriptive characteristics of the data under investigation.
Figure 2 shows the normalized number of call events as a function of the hour of the day, summed over all days of data collection, and as a function of the day of the week, summed over all weeks. As expected, communication events display clear daily and weekly patterns, with almost no calls at night, an increase during the day, and a peak around 6–7 p.m., close to the end of class time. It is worth adding that all participants lived on campus from Monday to Friday as part of their residential program requirements. Fewer calls were placed during weekends, and more on Fridays and Mondays. We show in the SI the timelines for co-presence events. Interestingly, we observe in this case a peak on Thursdays, which may be attributed to the fact that most Singaporean students left the campus on Friday evenings. During weekends, co-presence peaks in the evenings, especially on Sundays, when students come back to campus in preparation for school the next day. Finally, we show in the SI the full timeline of numbers and aggregated durations of communication and co-presence events at a weekly resolution.
As expected in this type of network, edge weights (number and aggregated durations of events) show broad distributions spanning several orders of magnitude (see Supporting Information). On the other hand, node degree distributions are narrow, as the population under investigation is relatively small (35 students). We note that, even considering yearly aggregation, the networks are far from being fully connected, especially the communication network: each student communicated on average with fewer than 10 other students, and the maximal degree is 22, in line with results on limited communication capacities observed in larger systems. Finally, Fig. 3 displays the distribution of weights in the questionnaire networks. Most links carry the minimum possible weight in all cases, but this tendency decreases over time for both questions (see Sect. 2 for the exact phrasing), while the fraction of strong friendships tends to increase, and the distribution tends towards a bimodal shape.
Comparison between successive terms
Table 3 and Fig. 4 illustrate the temporal evolution of the different networks at the term level. The communication networks aggregated in the second and third terms are very strongly correlated, while they are only moderately correlated with the first term network. On the other hand, the co-presence networks in different terms show weak correlations. For both networks, the cosine similarity distribution extends over a quite broad range (Fig. 4), and shows larger values than in the two null models considered, with the lowest median value for the similarities between the non-successive terms T1 and T3. Finally, for each type of questionnaire question, the correlation between the weights decreases as the time between questionnaires increases. In particular, the network constructed from the questionnaire answered at the start of the study shows the weakest correlation with the subsequent questionnaires, which may be attributed to the fact that the students did not know each other well at that stage. Cosine similarities between different terms take very large values, much larger than in the null model with reshuffled weights (Fig. 4).
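As an illustration of the neighborhood comparison used above, the following minimal sketch computes the cosine similarity between the weighted neighborhoods of one node in two term-aggregated networks. The data layout (a dict keyed by unordered pairs), the function name and the convention of returning 0 for a node without neighbors are assumptions made for this example, not the definitions given in the Methods section.

```python
import math


def neighborhood_cosine(weights_a, weights_b, node):
    """Cosine similarity between the weighted neighborhoods of `node`
    in two networks.

    Each network is a dict mapping frozenset({i, j}) to an edge weight
    (hypothetical layout; self-loops are assumed to be absent).
    """
    def neighborhood(weights):
        # Map each neighbor of `node` to the weight of the connecting edge.
        vec = {}
        for pair, w in weights.items():
            if node in pair:
                (other,) = pair - {node}
                vec[other] = w
        return vec

    va, vb = neighborhood(weights_a), neighborhood(weights_b)
    dot = sum(w * vb.get(j, 0.0) for j, w in va.items())
    norm_a = math.sqrt(sum(w * w for w in va.values()))
    norm_b = math.sqrt(sum(w * w for w in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for nodes without neighbors
    return dot / (norm_a * norm_b)


# Toy networks for two terms: same neighbors, weights doubled in the second.
t1 = {frozenset({"a", "b"}): 3, frozenset({"a", "c"}): 1}
t2 = {frozenset({"a", "b"}): 6, frozenset({"a", "c"}): 2}
similarity = neighborhood_cosine(t1, t2, "a")
```

In this toy case the two neighborhoods are proportional, so the similarity is maximal; a node whose neighbors change completely between terms would instead obtain a value of 0.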
Comparison between the communication, co-presence, friendship and trust networks
We found no significant correlation between the weights of edges in the yearly- or term-aggregated communication and co-presence networks, showing that these networks potentially correspond to quite different interaction patterns (the cosine similarities between these networks also show quite low values). On the other hand, both communication and co-presence weights show weak but significant correlations with the weights resulting from the two questionnaires Q1 and Q2. Moreover, the cosine similarities of neighborhoods of nodes (i) between communication and questionnaires, and (ii) between co-presence and questionnaires, take values much larger than in the null models with reshuffled weights or edges. Finally, in each term, the weights reported in Q1 and Q2 are strongly correlated (but distinct), and the cosine similarities of neighborhoods of nodes in the two questionnaire networks are close to 1 (see Supporting Information).
To explore in more detail the comparison between pairs of networks, we consider the properties of links either (i) common to two networks or (ii) present only in one of two networks. Figure 5 displays the complementary cumulative distribution function (CCDF) of edge weights for links common to the communication and co-presence networks, as well as the CCDF of weights for links present in only one of the two networks. Note that many links are present only in the co-presence network, while few are present only in the call network, which is not surprising given the much denser nature of the co-presence network. A clear difference is observed between the distributions of co-presence weights, with broader distributions for links common to both networks than for links present only in the co-presence network: students who communicated by phone calls also tended to spend more time in co-presence, but a broad distribution is obtained even for the links between students who did not communicate by phone. On the other hand, no clear difference is observed in the communication weights between pairs of students who were at least once in co-presence and pairs who were not, possibly because of the lack of statistics for the latter: very few pairs of students indeed communicated but were never detected in co-presence.
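The comparison described in this paragraph can be sketched as follows: the weights of one network are split according to whether the link is also present in the other network, and an empirical CCDF is computed for each group. The storage of pairs as sorted tuples, the function names and the convention CCDF(x) = fraction of weights greater than or equal to x are assumptions of this example.

```python
def ccdf(values):
    """Empirical CCDF: for each distinct value x, the fraction of
    samples that are >= x (convention assumed for this sketch)."""
    n = len(values)
    return [(x, sum(v >= x for v in values) / n) for x in sorted(set(values))]


def split_by_presence(weights, other):
    """Split the weights of one network into links also present in
    `other` and links present only in the first network."""
    common = [w for pair, w in weights.items() if pair in other]
    exclusive = [w for pair, w in weights.items() if pair not in other]
    return common, exclusive


# Toy data: pairs are stored as sorted tuples in both networks (assumption).
copresence = {("a", "b"): 120, ("a", "c"): 30, ("b", "c"): 10, ("c", "d"): 30}
calls = {("a", "b"): 5}

common, exclusive = split_by_presence(copresence, calls)
ccdf_common = ccdf(common)
ccdf_exclusive = ccdf(exclusive)
```

Plotting `ccdf_common` and `ccdf_exclusive` on logarithmic axes would give a toy analogue of the curves compared in Fig. 5.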
We also compare the communication links and weights for the various weight categories in the questionnaires as shown in Fig. 6. As the questionnaire weight w increases, the fraction of links with that weight that are also present in the communication network increases strongly, from almost 0 for low weights to 60–70% for the strongest weights. This result confirms earlier findings that stronger friendship relations correspond to more probable communication. Interestingly, however, the average number or duration of these communications does not depend on the questionnaire weight category, except for the largest weight category, for which larger average number and duration of communications are observed: the pairs of closest friends communicate more frequently and for longer than other pairs of students. It is also worth highlighting that no such clear tendency is observed when comparing questionnaire weights and co-presence patterns: the fraction of links corresponding to co-presence barely increases with the questionnaire weight, and the corresponding average co-presence duration (or number of events) does not show any clear trend (not shown).
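A minimal sketch of the first quantity discussed above, the fraction of questionnaire links of a given weight that also appear in the communication network, is given below. It assumes, for illustration only, that both networks are stored as dicts keyed by the same sorted pairs and that questionnaire weights are discrete categories; the function name is hypothetical.

```python
from collections import defaultdict


def presence_fraction_by_weight(questionnaire, communication):
    """For each questionnaire weight category, return the fraction of
    links with that weight that also exist in the communication network."""
    total = defaultdict(int)
    present = defaultdict(int)
    for pair, w in questionnaire.items():
        total[w] += 1
        if pair in communication:
            present[w] += 1
    return {w: present[w] / total[w] for w in sorted(total)}


# Toy data with two weight categories (1 = weakest, 5 = strongest).
questionnaire = {("a", "b"): 5, ("a", "c"): 1, ("b", "c"): 1, ("c", "d"): 5}
communication = {("a", "b"): 12}
fractions = presence_fraction_by_weight(questionnaire, communication)
```

The average number or duration of calls per category could be obtained in the same loop by accumulating the communication weights of the links that are present.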
Homophily patterns in yearly-aggregated networks
We first present a brief study of the homophily patterns for the globally aggregated networks. We focus here mostly on the communication network, data for the co-presence network being shown in the Supporting Information. Figure 7 gives a first indication of the presence of homophily in the communication and co-presence networks, by comparing the distribution of the number of shared attributes for individuals connected by a link with the same distribution in the null model in which attributes are reshuffled across nodes. Here, we consider the following six attributes: cohort class, age, gender, nationality, GPA, and first spoken language. Large values of the number of shared attributes are over-represented with respect to the null model: in particular, a much larger fraction of links connect nodes sharing all these attributes than in the null model, while the fraction of links connecting nodes with no common attribute is smaller than in the null model.
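The comparison with the attribute-reshuffling null model can be sketched as follows. Whether the null model permutes whole attribute profiles across nodes (as done here) or each attribute independently is not specified in this section, so the choice below is an assumption, as are the function names and the toy data.

```python
import random
from collections import Counter


def shared_attribute_distribution(edges, attributes):
    """Fraction of links whose endpoints share k attribute values."""
    counts = Counter()
    for i, j in edges:
        shared = sum(a == b for a, b in zip(attributes[i], attributes[j]))
        counts[shared] += 1
    total = sum(counts.values())
    return {k: c / total for k, c in sorted(counts.items())}


def reshuffle_attributes(attributes, rng):
    """Null model (assumed form): permute whole attribute profiles
    across nodes, keeping the network structure fixed."""
    nodes = list(attributes)
    profiles = [attributes[n] for n in nodes]
    rng.shuffle(profiles)
    return dict(zip(nodes, profiles))


def null_mean_distribution(edges, attributes, n_samples=200, seed=0):
    """Average shared-attribute distribution over null-model samples."""
    rng = random.Random(seed)
    sums = Counter()
    for _ in range(n_samples):
        shuffled = reshuffle_attributes(attributes, rng)
        sums.update(shared_attribute_distribution(edges, shuffled))
    return {k: s / n_samples for k, s in sorted(sums.items())}


# Toy data with two attributes per node (gender, nationality).
attributes = {"a": ("M", "SG"), "b": ("M", "SG"),
              "c": ("F", "SG"), "d": ("F", "FR")}
edges = [("a", "b"), ("a", "c"), ("c", "d")]
observed = shared_attribute_distribution(edges, attributes)
expected = null_mean_distribution(edges, attributes)
```

Homophily would show up as `observed` placing more mass on large numbers of shared attributes than `expected`.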
Figure 8 goes further by showing the CCDF of edge weights in the communication network, separately for edges between individuals with similar and different values for these six attributes. All distributions are broad: both weak and strong links are observed in each case, showing that one cannot separate these easily in two groups and guess from the weight of a link if the two connected individuals share an attribute. On the other hand, the distributions tend to be broader for edges linking nodes with the same value for several attributes, and the largest weights link nodes with the same nationality, gender, age and class.
Figures 9 and 10 show the homophily patterns with respect to gender, nationality, first spoken language and GPA uncovered by investigating the fraction of weight carried respectively by links and triangles between individuals with the same attribute, as described in the Methods section. Very strong homophily patterns are found with respect to gender and nationality, not only at the dyadic level but also for triangles: gender and nationality homophily determine which triangles, and not only which links, carry more weight in the network. Homophily with respect to GPA is on the other hand absent or at most very weak, while heterophilic patterns are observed for the first language.
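For the dyadic case, the indicator can be illustrated by the fraction of the total edge weight carried by links between individuals with the same attribute value, compared with the values obtained when attribute values are reshuffled across nodes. The exact definitions are those of the Methods section; the sketch below is only an assumed, simplified version, with hypothetical function names and toy data, and it does not cover triangles.

```python
import random


def same_attribute_weight_fraction(weights, attribute):
    """Fraction of total edge weight on links whose two endpoints have
    the same attribute value."""
    total = sum(weights.values())
    same = sum(w for (i, j), w in weights.items()
               if attribute[i] == attribute[j])
    return same / total


def null_fractions(weights, attribute, n_samples=500, seed=0):
    """Same fraction computed after reshuffling attribute values across
    nodes (assumed form of the null model)."""
    rng = random.Random(seed)
    nodes = list(attribute)
    values = [attribute[n] for n in nodes]
    samples = []
    for _ in range(n_samples):
        rng.shuffle(values)
        shuffled = dict(zip(nodes, values))
        samples.append(same_attribute_weight_fraction(weights, shuffled))
    return samples


# Toy communication network in which the heaviest link joins two
# students of the same gender.
weights = {("a", "b"): 8, ("a", "c"): 1, ("c", "d"): 1}
gender = {"a": "F", "b": "F", "c": "M", "d": "M"}
fraction = same_attribute_weight_fraction(weights, gender)
null = null_fractions(weights, gender)
```

Comparing `fraction` with the spread of `null` gives a rough indication of whether the observed value is compatible with the absence of homophily.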
Figure 11 investigates the social preference homophily patterns of each group of individuals. Both male and female students show a clear homophily pattern in their preferred communication partner. Similarly, both Singaporean and foreign students display homophilous social preferences. On the other hand, homophily with respect to GPA shows contrasting trends: individuals with an above-median GPA do not show homophily in their preferred communication partner, while individuals with a below-median GPA do (more so in terms of aggregated duration of communication than in terms of number of calls). For first spoken language, a weak tendency toward heterophily is observed for non-Chinese-speaking students.
Finally, Fig. 12 exhibits strong homophily patterns observed in reciprocal and repeated call motifs, both for gender and nationality. Only weak homophily is further observed with respect to GPA. In the first spoken language case, we also observe some tendency toward homophily, in contrast with the other indexes described above.
With respect to these attributes, various homophily patterns are thus observed when aggregating over the whole dataset of one year without taking into account the timing of communication events, but also when considering sequences of calls separated by short time windows.
Evolution of homophily in communication across terms
We now turn to the study of how homophily patterns evolve across the year in the group of students. To this aim, since questionnaire networks were collected once in each term, and also to work with sufficient statistics, we consider term-aggregated networks of communication. We show here the results corresponding to homophily patterns in dyads, while figures for triadic homophily and social preference are shown in the Supporting Information. Gender homophily as revealed by the weight carried by dyads with the same gender is very strong in all terms, and exhibits a clear increasing trend (Fig. 13(a) and (b)). The same increasing trend is observed in the weight carried by homophilic triads, even if the evidence for homophily is only weak with respect to the null model in the first term. In terms of social preference patterns, homophily increases for males, from absent or weak in the first two terms to very strong in the last term, while it is very strong in all terms for females (see Supporting Information).
Homophily with respect to nationality is also very strong and stable across terms as measured by dyads. It weakens, however, in the third term as measured by triads. In terms of social preference, interestingly distinct patterns are found: homophily decreases strongly and becomes weak or absent in the third term for Singaporean students, but instead remains very strong and in fact increases for foreigners (see Supporting Information).
The tendency toward homophily with respect to GPA remains rather weak across all terms with respect to all indicators, except in the first term for triads and in the third term for dyads. On the other hand, several instances of heterophilic tendencies are found with respect to the first spoken language. Finally, we find no clear tendency toward homophilous behavior of students with respect to their scores in the three psychological questionnaires (see Supporting Information). Some tendency toward heterophilous behavior is even observed in some cases, in particular in the social preference of the students with loneliness index below median.
Comparison between homophily in various networks
As discussed in the introduction, an important issue, besides the evidence for homophily (or the lack thereof) in each layer of interaction or relations available for analysis, is whether the same or different conclusions are reached when investigating these different layers. As made clear from the comparison reported above, there are indeed significant correlations between communication and friendship or trust networks, and the students linked in the communication network tend also to have spent more time in co-presence. However, these networks are very distinct both in terms of structure and weights.
In order to investigate if the layers are similar enough in terms of the homophily patterns they exhibit, it is possible to thoroughly compare the results provided in the previous section for the communication network and in the Supporting Information for other networks. Examples of such comparisons are given in Figs. 13 and 14: in these figures, we can visually check if a given indicator shows homophily in each term for different networks. For instance, in Fig. 13, we notice that there is dyadic homophily in all terms for gender and nationality in the communication network, while in the co-presence network there is homophily only in the first term for nationality, and in the first and third terms for gender. Such a visual investigation, also found for instance in , is however limited to only one type of indicator in each figure (e.g., one figure for dyadic homophily, one for triadic, etc.), and only a few attributes. Overall, a systematic side-by-side comparison of the figures showing whether homophily is present, for all pairs of layers and all possible indicators of homophily, would be difficult and tedious to carry out and would not improve the holistic analysis of homophily. A first improved visual way enabling a more holistic comparison of homophily across layers is given by Table 4 (see also Supporting Information). In this table, we summarize the evidence for homophily or heterophily in the different layers and terms, with respect to all the considered attributes. The use of colors highlights cases in which the same answer is obtained in different layers (e.g., gender and nationality homophily in communication and in both questionnaire networks). However, this table is still not easy to apprehend globally, and moreover one needs to draw a separate table for each type of homophily measure.
We thus perform one more summarizing step in order to reach more easily interpretable results: for each pair of networks, we count the number of cases in which one network gives a certain answer while the other network gives another answer, where by “case” we mean “one homophily measure on one attribute for one term”. We tabulate these numbers for each pair of networks and show the full tables in the Supporting Information. In Table 5, we show the outcome of a simplified counting procedure in which we group “No”, “W” and “Whet” as evidence for “No homophily nor heterophily pattern” on the one hand and “S” and “VS” (resp. “Shet” and “VShet”) as evidence for homophily (resp. heterophily) on the other hand. Note that this methodology could easily be adapted to answer more detailed comparisons, for instance by separating attributes into different groups (e.g., considering only homophily with respect to psychological indices), or conversely by including an arbitrary number of homophily indicators and of attributes.
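The simplified counting procedure can be sketched as follows. The grouping of labels follows the text; the representation of a case as a (measure, attribute, term) tuple, the short group names and the function name are assumptions made for this example, and the toy answers are not taken from the data.

```python
from collections import Counter

# Grouping of the seven possible answers into three outcomes, as in the text.
GROUP = {
    "VS": "hom", "S": "hom",
    "W": "none", "No": "none", "Whet": "none",
    "Shet": "het", "VShet": "het",
}


def contingency(answers_a, answers_b):
    """Count, for two networks, how many cases fall into each pair of
    grouped outcomes. Each argument maps a case, here a tuple
    (measure, attribute, term), to one of the seven labels."""
    table = Counter()
    for case in answers_a.keys() & answers_b.keys():
        table[(GROUP[answers_a[case]], GROUP[answers_b[case]])] += 1
    return table


# Toy answers for two layers (illustrative values only).
layer_a = {
    ("dyadic", "gender", "T1"): "VS",
    ("dyadic", "gender", "T2"): "S",
    ("dyadic", "GPA", "T1"): "W",
    ("dyadic", "language", "T1"): "Shet",
}
layer_b = {
    ("dyadic", "gender", "T1"): "S",
    ("dyadic", "gender", "T2"): "No",
    ("dyadic", "GPA", "T1"): "No",
    ("dyadic", "language", "T1"): "No",
}
table = contingency(layer_a, layer_b)
concordant = sum(n for (a, b), n in table.items() if a == b)
```

Concordant cases are the diagonal entries of `table`; restricting the cases to a subset of attributes or measures before counting gives the more detailed comparisons mentioned above.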
A first assessment of the results gathered in Table 5 indicates that concordant cases (on the diagonals) are far more numerous than discordant ones. It is, however, important to deepen our analysis as this overall observation might simply be due to the large number of indicators showing an absence of homophilous patterns. Indeed, if we consider a large number of attributes and a large number of indicators, and only few of them show evidence for homophily, then many concordant cases will be automatically observed, even if the few cases of homophily are very different in distinct network layers. To check if this is indeed the case, we resort to a comparison with the following null model: for each layer and each homophily indicator (dyadic, triadic or social preference), we reshuffle at random the answers (“VS”, “S”, “W”, “No”, “Whet”, “Shet” and “VShet”) across terms and attributes, and compute for each reshuffling the number of concordant and discordant cases. We present in Table 5 the confidence intervals (C.I.) defined by the 5th and 95th percentiles of this null model. We emphasize in boldface the cases in which the empirical numbers are outside the C.I., and we color in particular the cells in which the numbers of concordant cases are above the C.I.
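A minimal sketch of this null model is given below, restricted for brevity to the total number of concordant cases. It assumes that cases are (indicator, attribute, term) tuples, that both layers contain the same cases, and that percentiles are read off the sorted samples by a simple index rule; these choices, the function names and the number of reshufflings are assumptions of the example.

```python
import random

GROUP = {
    "VS": "hom", "S": "hom",
    "W": "none", "No": "none", "Whet": "none",
    "Shet": "het", "VShet": "het",
}


def concordant(answers_a, answers_b):
    """Number of cases with the same grouped outcome in both layers
    (both dicts are assumed to contain the same cases)."""
    return sum(GROUP[answers_a[c]] == GROUP[answers_b[c]] for c in answers_a)


def shuffled_within_indicator(answers, rng):
    """Reshuffle the labels of one layer across terms and attributes,
    separately for each indicator (first element of the case tuple)."""
    by_indicator = {}
    for case in answers:
        by_indicator.setdefault(case[0], []).append(case)
    shuffled = {}
    for cases in by_indicator.values():
        labels = [answers[c] for c in cases]
        rng.shuffle(labels)
        shuffled.update(zip(cases, labels))
    return shuffled


def null_interval(answers_a, answers_b, n_samples=1000, seed=0):
    """5th and 95th percentiles of the number of concordant cases under
    the null model (simple index-based percentile estimate)."""
    rng = random.Random(seed)
    samples = sorted(
        concordant(shuffled_within_indicator(answers_a, rng),
                   shuffled_within_indicator(answers_b, rng))
        for _ in range(n_samples)
    )
    low = samples[int(0.05 * (n_samples - 1))]
    high = samples[int(0.95 * (n_samples - 1))]
    return low, high
```

An empirical number of concordant cases above `high` would correspond to a colored cell in Table 5; applying the same procedure to each entry of the contingency table gives the intervals for the discordant cases as well.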
For the comparison between the two questionnaire networks, as well as between the communication network and the questionnaire networks, the numbers of concordant cases with and without homophily are both much larger than the upper bound of the confidence intervals of the null model, while the numbers of cases in which one network shows homophily while the other does not are smaller than the lower bound of the C.I. These three networks have therefore overall similar homophily patterns, despite discrepancies occurring in a number of specific cases.
On the other hand, comparisons involving the co-presence network lead mostly to numbers of concordant and discordant cases within the C.I. of the null model. This means that, even if the co-presence network displays a similar “amount” of evidence for homophilous behavior with respect to the other layers of the social network, the homophily patterns are no more similar than random, given this amount. Hence, the co-presence homophily patterns do not inform us about which specific attributes and which specific indicators exhibit homophily patterns in the other networks.