1 Introduction

Synchronization is a process that allows the automatic coordination of units and events in time. Across many domains in nature, it is a mechanism that reduces uncertainty and risk without the need for centralized control. It is a widespread phenomenon, observed from animals [1] to neurons [2] and heart cells [3], and up to more complex entities such as human beings [4, 5].

In humans, synchronization emerges as a spontaneous coordination mechanism that benefits groups and the individuals who live within them [6]. From an evolutionary perspective, synchronization increases the probability of group survival by reducing the individual costs of engaging in coordinated and cooperative action [7]: under multilevel selection, a group of cooperators has a higher chance of evolutionary success than a group of defectors. The positive effect of synchronization is also found in the behavior of people within groups, where synchronous activity has been found to enhance cooperativeness [8], even without muscular bonding [9] or shared positive emotions [10, 11]. Synchronized groups should therefore, in principle, be more cooperative, and by comparing the level of synchronization of different groups we may be able to measure their relative level of cooperativeness.

In the present study, we propose two synchronization indices: (i) within synchronization, representing the relative level of cooperation within a close proximity-based community (i.e., municipality level), and (ii) between synchronization, representing the level of cooperation among different communities in a larger geographical area (i.e., province level). More specifically, these indices capture the synchronization of human activity in an area through mobile phone data. Mobile phone data capture rich information about human activities and the structure of the social interactions therein [12]. They have been used to estimate the socioeconomic status of territories [13] and individuals [14], to analyze the dynamics of cities [15], to model the spreading of diseases [16], and to predict crime levels [17].

Our hypothesis is that the two synchronization indices, which capture the degree of cooperativeness among human activities, can describe traditional measures of social capital, the form of capital that facilitates cooperation through shared social norms [18].

The relevance of social capital for economic growth is widely acknowledged [19]: it reduces the transaction costs associated with formal coordination mechanisms [20], predicts strong economic performance [21] and financial development [22], and reduces corruption by inducing political and civic participation [23, 24].

An important distinction in the social capital literature is the one between bonding and bridging patterns of relations [25]. In his work, the political scientist Putnam states that bonding social capital provides emotional support and a sense of belonging in which the members of a community sustain each other [25]. This form of social capital is usually observed in homogeneous groups with strong cooperation, such as families or circles of close friends. Bridging social capital, instead, stems from relations between groups, that is, between individuals from heterogeneous backgrounds [25]. A community exploring novel interactions and co-operation with other communities can be considered to have a high amount of bridging social capital [26]. This form of social capital has been described as potentially useful for achieving instrumental goals since a larger variety of resources becomes available by interacting with people of diverse status, occupation or ethnicity [26].

Previous research on capturing bonding and bridging social capital, and their effect on economic prosperity, from mobile phone and social media data has focused on the role played by different structural properties of the network (e.g., topological network diversity and network density) [13, 27]. To the best of our knowledge, the current work is the first study that analyzes whether and to what extent synchronization aspects of human communication are associated with traditional social capital metrics (i.e., Referendum turnout, Blood donations, and Association density).

Several studies have highlighted the role and benefits of the synchronization of activities among individuals and groups. Indeed, synchronization is argued to improve cooperation and trust in a community [5, 8]. Hence, we expect that communities with strong synchronization may experience richer opportunities for cooperation, decreased costs of market interactions, less reliance on formal business regulations and increased informal money circulation and investments, all aspects enabled by high levels of trust [5, 8, 28]. Thus, our first hypothesis is that high levels of synchronization of call activity in a small area (which we associate with a municipality) are likely to reflect bonding patterns, as people interact and communicate within a close proximity-based social group. In particular, high levels of within synchronization in a proximity-based community capture frequent communication patterns and connections among people living in this community.

Interactions among diverse groups of individuals and communities have been linked to a greater exploration of possibilities, thus promoting the flow of information and novel ideas that affect economic prosperity [2, 6]. Following Paxton [29], bridging social capital occurs when members of one group connect with members of other groups to seek access or support, or to gain information. On this basis, our second hypothesis is that the interaction of a given community (i.e., a given municipality) with many different communities is reflected in the high synchronization of their communication patterns. In particular, we expect that municipalities that are more synchronized with other municipalities may communicate with a more diverse array of communities (i.e., have bridging ties spreading to many different municipalities), gain novel ideas and information, and thus show higher levels of bridging social capital.

Interestingly, our results show that a synchronization-based approach correlates well with traditional social capital measures (i.e., Referendum turnout, Blood donations, and Association density), and can also characterize the different roles played by high synchronization within a close proximity-based community and high synchronization among different communities.

2 Materials and methods

For this study we use an aggregated and anonymized Call Detail Records (CDRs) dataset provided by the largest Italian mobile phone operator (34% market share), covering a period of one month, from March 31, 2015 to April 30, 2015. CDRs are collected for billing purposes by mobile network operators: every time a phone interacts with the network, a CDR recording the time and location (in terms of the cell network’s antenna) of the user is created (see Footnote 1). The data we use are spatially aggregated and completely anonymized by the mobile phone operator, so that it is not possible to connect different calls of the same user.

Italy is an ideal playground in this domain because Italian regions present very different levels of economic development, although they have shared the same formal institutions, laws, language and currency for many years now. Many scholars have identified the root of this persistent divergence in differential endowments of social capital [30, 31]. For these reasons, Italy has been widely studied in the economic literature on social capital [23, 25]. As a byproduct, there are several survey-based data sources for obtaining social capital measures that can be used as ground truth. More specifically, following examples in the economics literature [22, 25, 32], we use Referendums turnout, Association density and Blood donations as our ground truth. Referendums turnout is usually considered a proxy for the desire for civic participation, as voting in referendums is not mandatory in Italy and the issues on the ballot are less related to local interests. Association density is defined as the number of associations per 100,000 inhabitants. Associations can be cultural, leisure, artistic, sports, environmental, or any other kind of nonprofit association, with the exclusion of professional and religious associations [19]. Blood donations are measured as the number of donations per 1000 inhabitants.

In our analysis, we select both large provinces (NUTS-3 regions) with more than one million inhabitants, and smaller provinces known for high and low levels of social capital (according to the aforementioned survey-based social capital measures). The indicators used to select small NUTS-3 regions (defined here as those with a population between 200,000 and 500,000 inhabitants) are the data available for Italy on association density, referendum participation and blood donations [30, 33, 34]. Specifically, the NUTS-3 regions considered are:

  • Turin, Milan, Venice, Rome, Naples, Bari, Palermo (large NUTS-3 regions);

  • Caltanissetta, Siracusa, Benevento, Campobasso (defined as low-social capital NUTS-3 regions [34]);

  • Siena, Ravenna, Ferrara, Asti, Modena (defined as high-social capital NUTS-3 regions [34]).

These areas represent the smallest areal units available for social capital data. NUTS-3 regions are therefore our unit of analysis. The choice of these NUTS-3 regions is partly data-driven, but we select them also as they exhibit different levels of social capital. Figure 1 shows the map of Italy with the NUTS-3 regions under analysis.

Figure 1 Analyzed data from large NUTS-3 regions (\(>1\mathrm{M}\) inhabitants), and medium NUTS-3 regions known for high/low levels of social capital [34]. (Right inset) Enlargement of the Rome NUTS-3 region highlighting municipalities (LAU-2 regions). Data are collected at a sub-municipality resolution

The area of each region is spatially divided into an irregular grid, provided by the mobile phone operator, based on the size of the underlying antennas’ coverage area. The cells have an area ranging from 0.04 \(\mathrm{km}^{2}\) in the city center to 40 \(\mathrm{km}^{2}\) in the suburbs.

For each cell, we aggregate the number of CDRs at an hourly time scale to obtain a time series recording the level of activity on an hourly basis.

We normalize each ith cell’s time series \(x^{i}_{t=\text{day},h}\) with a z-score computed on an hourly basis. \(\mu^{i}_{h}\) and \(\sigma^{i}_{h}\) are the 24 means and standard deviations of \(x^{i}_{\text{day},h}\) for each hour. Thus, we obtain: \(z^{i}_{\text{day},h} = (x^{i}_{\text{day},h} - \mu^{i}_{h}) / \sigma^{i}_{h}\). Using different \(\mu^{i}_{h}\) and \(\sigma^{i}_{h}\) for different hours is very important because otherwise the circadian trend in our data would notably bias the synchronization among the time series (i.e., all time series would be highly synchronized because the day-night trend would cover more subtle differences).
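The hour-by-hour normalization described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `hourly_zscore`, the toy data, the use of the population standard deviation, and the handling of zero-variance hours are all assumptions.

```python
import numpy as np

def hourly_zscore(x):
    """Standardize one cell's activity separately for each hour of the day.

    x: array of shape (n_days, 24) holding hourly CDR counts for a single cell.
    Returns an array of the same shape with z-scores.
    """
    x = np.asarray(x, dtype=float)
    mu = x.mean(axis=0)      # 24 hourly means
    sigma = x.std(axis=0)    # 24 hourly standard deviations (population form, an assumption)
    # Assumption: hours with zero variance are mapped to 0 instead of dividing by zero.
    safe = np.where(sigma > 0, sigma, 1.0)
    return np.where(sigma > 0, (x - mu) / safe, 0.0)

# Hypothetical toy data: 30 days with a strong circadian profile plus noise.
rng = np.random.default_rng(0)
profile = 100 + 80 * np.sin(np.linspace(0, 2 * np.pi, 24, endpoint=False))
counts = profile + rng.normal(0, 5, size=(30, 24))
z = hourly_zscore(counts)
```

After this step every hour of the day has zero mean and unit variance, so the shared day-night cycle no longer dominates the comparison between cells.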

The resulting time series (see Fig. 2) highlight deviations from the mean activity at different hours of the day and, at the same time, are sufficiently stationary to apply standard statistics for measuring the correlation (i.e., synchronization) of two time series.

Figure 2 Example of daily rhythm in a mobile phone cell. (A) Original behaviour extracted from mobile phone data; (B) z-score scaled behaviour extracted from mobile phone data

For each NUTS-3 region, we compute two synchronization metrics: within synchronization is the average daily synchronization among cells assigned to the same municipality; between synchronization is the average daily synchronization among cells assigned to different municipalities (cells are assigned to municipalities based on the size of their overlapping area). Specifically, for each pair of cells i and j, we compute the average daily Mutual Information between \(z^{i}_{\text{day},h}\) and \(z^{j}_{\text{day},h}\): \(\frac{1}{N}\sum_{\text{day}=1}^{N} I(z^{i}_{\text{day},h};z^{j}_{\text{day},h})\).

Mutual information is a natural measure of non-linear dependence quantifying the amount of information obtained about one time-series through the other one. Therefore, it measures how synchronized the two series are, and it is computed as:

$$I\bigl(z^{i}_{\text{day},h};z^{j}_{\text{day},h}\bigr) = \int_{z^{i}_{\text{day},h}} \int_{z^{j}_{\text{day},h}} p\bigl(z^{i}_{\text{day},h},z^{j}_{\text{day},h} \bigr)\log \biggl(\frac {p(z^{i}_{\text{day},h},z^{j}_{\text{day},h})}{p(z^{i}_{\text{day},h})p(z^{j}_{\text{day},h})} \biggr)\,dz^{j}_{\text{day},h}\,dz^{i}_{\text{day},h}. $$
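As an illustration of the quantity above, the following sketch estimates the mutual information of two series with a simple histogram (plug-in) estimator and averages it over days. The text does not state which estimator was used, so the binning approach, the number of bins, and the function names are assumptions.

```python
import numpy as np

def mutual_information(a, b, bins=4):
    """Plug-in estimate of I(a; b) in nats from a 2-D histogram (assumed estimator)."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()                 # empirical joint distribution
    p_a = p_ab.sum(axis=1, keepdims=True)      # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)      # marginal of b, shape (1, bins)
    nz = p_ab > 0                              # skip empty bins (0 * log 0 = 0)
    return float(np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz])))

def average_daily_mi(z_i, z_j, bins=4):
    """Average the per-day MI of two cells; z_i and z_j have shape (n_days, 24)."""
    return float(np.mean([mutual_information(d_i, d_j, bins)
                          for d_i, d_j in zip(z_i, z_j)]))

# Two tiny checks with known answers.
same = mutual_information([0, 0, 1, 1], [0, 0, 1, 1], bins=2)    # identical series
indep = mutual_information([0, 0, 1, 1], [0, 1, 0, 1], bins=2)   # independent series

# Hypothetical z-scored cells: one compared with itself and with an unrelated cell.
rng = np.random.default_rng(0)
z_a = rng.normal(size=(30, 24))
z_b = rng.normal(size=(30, 24))
mi_self = average_daily_mi(z_a, z_a)
mi_other = average_daily_mi(z_a, z_b)
```

With only 24 hourly samples per day, a histogram estimate is coarse and biased upward, so the number of bins matters; a nearest-neighbour estimator would be a reasonable alternative.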

This approach computes a single average (within and between) synchronization for the whole time of observation (one month with our data). So, even if short-term events can spur sudden synchronization, the average value reflects longer-term trends in the behavioral patterns in the regions.

Figure 3 shows the distribution of between and within synchronization for the NUTS-3 regions under analysis. We consider the mean (among cells) of between and within synchronization as the reference value for each region (to be used in the regression model described below).
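One way to turn pairwise synchronization values into cell-level and region-level indices is sketched below. The function name, the toy matrix, and the exact averaging scheme (per-cell means over same-municipality and different-municipality partners) are assumptions made for illustration; the text only specifies that the regional value is the mean among cells.

```python
import numpy as np

def cell_level_sync(sync, municipality):
    """Per-cell within and between synchronization.

    sync: symmetric (n_cells, n_cells) matrix of pairwise average daily MI.
    municipality: length-n_cells sequence of municipality labels.
    Returns two arrays: mean synchronization of each cell with cells of the
    same municipality (within) and with cells of other municipalities (between).
    """
    sync = np.asarray(sync, dtype=float)
    labels = np.asarray(municipality)
    within, between = [], []
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False                  # exclude the cell itself
        other = labels != labels[i]
        if same.any():
            within.append(sync[i, same].mean())
        if other.any():
            between.append(sync[i, other].mean())
    return np.array(within), np.array(between)

# Hypothetical region with four cells in two municipalities, A and B.
sync = [[0.0, 0.8, 0.2, 0.1],
        [0.8, 0.0, 0.3, 0.2],
        [0.2, 0.3, 0.0, 0.6],
        [0.1, 0.2, 0.6, 0.0]]
within, between = cell_level_sync(sync, ["A", "A", "B", "B"])
region_within, region_between = within.mean(), between.mean()
```

The per-cell arrays correspond to the kind of distribution a violin plot would show, while the two means are the single values per region that would enter a regression.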

Figure 3 Violin plots, ordered by the median within synchronization, showing the average between and within synchronization of each city

As stated in the Introduction, we postulate that:

  • High levels of within synchronization reflect the tendency of people to communicate together within their spatial cluster (i.e., municipality).

  • High levels of between synchronization reflect instead the tendency of people to communicate together across different spatial clusters (i.e., municipalities).

We therefore use these two synchronization measures, computed from passively collected human behavioural data, to describe traditional proxies for social capital used in economics literature such as Referendums turnout, Association density and Blood donations.

In summary, for each of the 16 NUTS-3 regions under analysis, we compute the respective synchronization indices (i.e., within and between synchronization) and extract the traditional proxies for social capital. We check via Moran’s I test that the obtained variables are not spatially auto-correlated, then we apply the linear regression analysis described in the following section.
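For reference, Moran's I for a variable observed over a set of regions can be computed as in the sketch below. The spatial weight matrix used in the study is not specified, so the binary adjacency weights and the toy values here are assumptions, and the significance test (for example by permutation) is omitted.

```python
import numpy as np

def morans_i(x, w):
    """Moran's I of values x under the spatial weight matrix w (zero diagonal)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    d = x - x.mean()                                   # deviations from the mean
    return float(len(x) / w.sum() * (d @ w @ d) / (d @ d))

# Hypothetical example: four regions on a line, each adjacent to its neighbours.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
i_trend = morans_i([1, 2, 3, 4], w)          # smooth spatial trend
i_alternating = morans_i([1, -1, 1, -1], w)  # neighbours always differ
```

Values near zero indicate no spatial autocorrelation, positive values indicate that neighbouring regions resemble each other, and negative values indicate that they differ.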

2.1 Regression analysis

To validate our hypotheses, we describe the three social capital measures (i.e., Referendums turnout, Blood donations, and Association density) by means of three Ordinary Least Squares (OLS) models where the independent variables are: (i) within synchronization, (ii) between synchronization, and (iii) per-capita income. In principle many factors could affect the level of social capital and thus affect our estimation: the quality of institutions, the level of education, and the degree of income inequality, to name a few. Following Alesina et al. [35] and Guiso et al. [36], we here consider per-capita income as the sole covariate for the regression, to keep our estimates parsimonious, and use the level of per-capita income as a general proxy for these factors. Indeed, higher per-capita income has been shown to be related to the strength of local institutions [37] and to the quality of education systems [18]. In Appendix C we report an additional set of regression analyses using the fraction of illiterate population, a good proxy for the level of education, as the sole covariate for the regression.
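A model of this form can be sketched with a plain least-squares fit. The code below is an illustration under stated assumptions, not the authors' pipeline: `ols_fit` is a hypothetical helper, and the 16 rows of data are randomly generated stand-ins for the regional values of within synchronization, between synchronization and per-capita income.

```python
import numpy as np

def ols_fit(X, y):
    """OLS with intercept. Returns (coefficients, R^2, adjusted R^2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])              # prepend the intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)    # penalize the number of regressors
    return coef, float(r2), float(adj)

# Hypothetical data: 16 regions, 3 regressors, known coefficients plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(16, 3))
true_coef = np.array([0.5, -1.0, 2.0])
y_exact = 1.0 + X @ true_coef
y_noisy = y_exact + rng.normal(scale=0.5, size=16)

coef_exact, r2_exact, _ = ols_fit(X, y_exact)
coef, r2, r2_adj = ols_fit(X, y_noisy)
```

With 16 observations and three regressors the adjustment factor is 15/12, so the adjusted value is noticeably lower than the raw \(R^{2}\) unless the fit is very good.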

Between and within synchronization across NUTS-3 regions are highly correlated (\({\rho= 0.9}\)), raising multicollinearity issues. Having correlated regressors, we have to rely on multiple metrics to illustrate the statistical significance and importance of the variables in our model [38]. Thus, we report and discuss the variable importance through the beta weights, structure coefficients [39], commonality analysis components [40], dominance analysis [41] and Lindeman, Merenda, and Gold’s (LMG) method [42].

Beta weights are often relied on to assess regressors’ importance [39]. A beta weight indicates the expected increase or decrease in the dependent variable (e.g., Referendums turnout), expressed in standard deviation units, given a one standard deviation increase in the corresponding independent variable with all other independent variables held constant. However, relying solely on beta weights to interpret the contribution of each independent variable is justified only when the independent variables are perfectly uncorrelated [43]. In fact, the beta weight of one regressor may receive credit for explained variance shared with other regressors, while the beta weights of the other regressors are not given credit for this shared variance [43]. Therefore, the contribution of the other regressors to the regression effect may not be fully captured. Moreover, beta weights are of limited use in identifying suppression effects, that is, cases in which a regressor contributes little or no variance to the dependent variable but has a large non-zero beta weight because it purifies one or more regressors of their irrelevant variance, thereby increasing its or their predictive power [44].

Structure coefficients quantify the strength of the bi-variate relationship between each regressor and the dependent variable in isolation from other correlations between regressors and dependent variable. Hence, they are a useful measure of the direct effect of a regressor [39]. Being only a measure of direct effect, they are unable to identify regressors sharing explained variance in the dependent variable, and thus to quantify the amount of this shared variance [39]. Instead, the LMG measure can be thought of as the improvement in \(R^{2}\) obtained by adding regressor \(X_{1}\), averaged over all models of size s without \(X_{1}\) and then over all sizes s [42].
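The difference between beta weights and structure coefficients can be made concrete with a small sketch. It uses the textbook definitions (beta weights from a regression on standardized variables, structure coefficients as correlations between each regressor and the fitted values); the helper name and the randomly generated data, built so that two regressors are strongly correlated, are assumptions.

```python
import numpy as np

def beta_and_structure(X, y):
    """Beta weights and structure coefficients of an OLS model."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize the regressors
    yz = (y - y.mean()) / y.std()                 # standardize the response
    # All variables are centered, so no intercept column is needed.
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    y_hat = Xz @ beta
    r_s = np.array([np.corrcoef(Xz[:, k], y_hat)[0, 1] for k in range(X.shape[1])])
    return beta, r_s

# Hypothetical data: 16 observations, x1 and x2 strongly correlated, x3 independent.
rng = np.random.default_rng(1)
x1 = rng.normal(size=16)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9 ** 2) * rng.normal(size=16)
x3 = rng.normal(size=16)
X = np.column_stack([x1, x2, x3])
y = 0.5 * x1 - 1.0 * x2 + 0.8 * x3 + rng.normal(scale=0.5, size=16)

beta, r_s = beta_and_structure(X, y)
r_xy = np.array([np.corrcoef(X[:, k], y)[0, 1] for k in range(X.shape[1])])
r2 = float(beta @ r_xy)   # identity: R^2 equals the sum of beta_k * r(y, x_k)
```

Each structure coefficient equals the zero-order correlation between regressor and response divided by the multiple correlation \(R\), so with correlated regressors its sign can differ from the sign of the corresponding beta weight.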

In order to quantify the contribution that each regressor shares with every other set of regressors, we also perform a commonality analysis [40]. This technique decomposes \(R^{2}\), and thus the total effect (\(\mathit{Tot}_{\mathrm{CA}}\)), into its unique (\(U_{\mathrm{CA}}\)) and common (\(C_{\mathrm{CA}}\)) effects. Unique effects indicate how much variance is uniquely accounted for by a single regressor; while common effects indicate how much variance is common to each set of regressors [40]. It is worth noting that if the regressors are all uncorrelated, the contributions of all regressors are unique effects, as no variance is shared between independent variables in the prediction of the dependent variable.
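For intuition, the decomposition can be written out for the simplest case of two regressors \(X_{1}\) and \(X_{2}\). The notation \(R^{2}_{y\cdot S}\) for the \(R^{2}\) of the model containing the regressors in S is introduced here only for illustration:

$$\begin{aligned} U_{\mathrm{CA}}(X_{1}) &= R^{2}_{y\cdot X_{1}X_{2}} - R^{2}_{y\cdot X_{2}}, \\ U_{\mathrm{CA}}(X_{2}) &= R^{2}_{y\cdot X_{1}X_{2}} - R^{2}_{y\cdot X_{1}}, \\ C_{\mathrm{CA}}(X_{1},X_{2}) &= R^{2}_{y\cdot X_{1}} + R^{2}_{y\cdot X_{2}} - R^{2}_{y\cdot X_{1}X_{2}}, \\ \mathit{Tot}_{\mathrm{CA}}(X_{1}) &= U_{\mathrm{CA}}(X_{1}) + C_{\mathrm{CA}}(X_{1},X_{2}) = R^{2}_{y\cdot X_{1}}. \end{aligned}$$

The two unique effects and the common effect sum to \(R^{2}_{y\cdot X_{1}X_{2}}\). The common effect is negative whenever the joint model explains more variance than the two single-regressor models combined, which is the signature of suppression.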

Moreover, we use dominance analysis [41] to determine the importance of a regressor based on comparisons of the unique variance contributions of all pairs of independent variables to regression equations involving all possible subsets of regressors. Interestingly, dominance analysis is a technique able to quantify (i) the direct effect of a regressor in isolation from other regressors, as the subset containing no other regressors includes squared zero-order correlations, (ii) the total effect, as it compares the unique variance contributions of the regressors when all of them are included in the model, and (iii) the partial effect, as it compares the unique variance contributions of the regressors for all the possible subsets of them.
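Both the LMG measure and dominance analysis rest on comparing \(R^{2}\) across all subsets of regressors, which the following sketch makes explicit. It is an illustration under assumptions: the helper names and the toy data are hypothetical, LMG is computed by brute force over all entry orders, and only complete dominance is checked.

```python
import numpy as np
from itertools import combinations, permutations

def r2(X, y, cols):
    """R^2 of an OLS fit (with intercept) of y on the selected columns of X."""
    if len(cols) == 0:
        return 0.0
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = y - y.mean()
    return float(1.0 - (resid @ resid) / (centered @ centered))

def lmg(X, y):
    """Average sequential R^2 gain of each regressor over all entry orders."""
    p = X.shape[1]
    shares = np.zeros(p)
    orders = list(permutations(range(p)))
    for order in orders:
        entered = []
        for k in order:
            shares[k] += r2(X, y, entered + [k]) - r2(X, y, entered)
            entered.append(k)
    return shares / len(orders)

def completely_dominates(X, y, a, b):
    """True if regressor a adds more R^2 than b to every subset of the other regressors."""
    others = [k for k in range(X.shape[1]) if k not in (a, b)]
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            s = list(subset)
            # Same baseline subset on both sides, so comparing R^2 compares the gains.
            if r2(X, y, s + [a]) <= r2(X, y, s + [b]):
                return False
    return True

# Hypothetical data: 16 observations, x1 and x2 correlated, x3 independent.
rng = np.random.default_rng(2)
x1 = rng.normal(size=16)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9 ** 2) * rng.normal(size=16)
x3 = rng.normal(size=16)
X = np.column_stack([x1, x2, x3])
y = 0.3 * x1 + 2.0 * x3 + rng.normal(scale=0.5, size=16)

shares = lmg(X, y)
full_r2 = r2(X, y, [0, 1, 2])
dom_31 = completely_dominates(X, y, 2, 0)
dom_13 = completely_dominates(X, y, 0, 2)
```

The LMG shares are non-negative and add up to the \(R^{2}\) of the full model, which is what makes them readable as percentages of importance. With three regressors there are only six orders and eight subsets, so brute-force enumeration is cheap.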

3 Results

Results of the OLS models are shown in Table 1, where we report the adjusted \(R^{2}\) (\(R^{2}_{\mathrm{adj}}\); see Footnote 2) of the OLS models using between synchronization, within synchronization and per-capita income as covariates.

Table 1 Referendums turnout, Blood donations, Association density represented by between and within synchronization, controlled for per-capita income were tested using commonality analysis. As for statistical significance of the beta weights, we use the following notation: \({}^{*}p<0.05\), \({}^{**}p<0.01\)

The variable importance of the independent variables is reported through the Beta weights, the structure coefficients [39], the commonality analysis components [40], the dominance analysis [41] and the Lindeman, Merenda, and Gold’s (LMG) method [42]. Figure 4 summarizes the results of two of the most used variable importance metrics.

Figure 4 (Upper) Lindeman, Merenda and Gold relative importance of the independent variables we used in our model; (lower) total, common and unique contribution of the independent variables we used in our model. (BS): between synchronization. (I): per-capita income. (WS): within synchronization

Here we provide a detailed analysis of each social capital proxy used in economics literature.

Referendums turnout. The first group of rows of Table 1 shows that between synchronization contributes the most to the regression equation (\(\beta= -0.12\)), while holding all other regressors constant. It is the variable most correlated with the predicted Referendums turnout (\(r_{s} = -0.76\)) and the major contributor to the regression effect (\(\mathit{Tot}_{\mathrm{CA}} = 0.43\)), where 27.2% of the regression effect is unique and 16.2% is shared with the other variables. The relative importance of between synchronization (\(\mathit{Tot}_{\mathrm{CA}} = 0.43\) and \(\mathrm{LMG} = 0.38\)) is close to that of per-capita income (\(\mathit{Tot}_{\mathrm{CA}} = 0.42\) and \(\mathrm{LMG} = 0.40\)). Dominance analysis confirms this importance (see Table 2).

Table 2 Referendums turnout: Dominance analysis output. The ✓ symbol represents the dominance of a variable A on B. The × symbol represents the dominance of a variable B on A. In empty cells dominance could not be established between regressors

The second most important beta weight is that of within synchronization which, despite its positive value, is negatively correlated with Referendums turnout (\(r_{s} = -0.63\)). This may indicate that the regression effect was confounded by all the variables included in the model, although they all contribute substantially to the explanation of Referendums turnout (all \(C_{\mathrm{CA}}\) and \(\mathit{Tot}_{\mathrm{CA}}\) values are greater than zero).

Blood donations. From the second group of rows of Table 1 we observe that between synchronization holds the highest contribution to the regression in all the metrics, accounting for 52% of the importance in the model (\(\beta= -24.91\)), highest total (\(\mathit{Tot}_{\mathrm{CA}} = 0.40\)) and unique contribution (\(U_{\mathrm{CA}} = 0.36\)).

The second most important beta weight is that of within synchronization which, despite its positive value, is negatively correlated with Blood donations (\(r_{s} = -0.580\)). This may indicate that the regression effect was confounded by all the variables included in the model, although they all contribute substantially to the explanation of Blood donations (all \(C_{\mathrm{CA}}\) and \(\mathit{Tot}_{\mathrm{CA}}\) values are greater than zero). The importance of within synchronization is very close to the importance of per-capita income, but the Dominance analysis (see Table 3) shows that per-capita income has a minor role in the regression.

Table 3 Blood donations: Dominance analysis output. The ✓ symbol represents the dominance of a variable A on B

Associations density. The last group of rows in Table 1 shows that within synchronization and between synchronization obtained the largest beta weights (\(\beta= 22.96\) and \(\beta= -21.88\) respectively), demonstrating the most important contributions to the regression equation, while holding all other regressors constant. Despite this, per-capita income accounts for 42% of the importance in the model, having also the highest total (\(\mathit{Tot}_{\mathrm{CA}} = 0.42\)) and unique contribution (\(U_{\mathrm{CA}} = 0.41\)). From the Dominance analysis (see Table 4) it is possible to see that the most important variable is indeed per-capita income, followed by between synchronization and within synchronization.

Table 4 Association density: Dominance analysis output. The ✓ symbol represents the dominance of a variable A on B. The × symbol represents the dominance of a variable B on A

In particular, despite the positive value of its beta weight, within synchronization is negatively correlated with Association density (\(r_{s} = -0.31\)). Together, the very small squared structure coefficient (\(r^{2}_{s} = 0.09\)) and the negative common effect (\(C_{\mathrm{CA}} = -0.21\)) may indicate [45] that within synchronization acts as a suppressor in the regression, purifying the variance explained by the other variables.

4 Discussion

Taken together, our results show that the models explain 68% of the variation in Referendums turnout (\(R^{2}_{\mathrm{adj}} = 0.68\)), 55% of the variation in Blood donations (\(R^{2}_{\mathrm{adj}} = 0.55\)) and 52% of the variation in Association density (\(R^{2}_{\mathrm{adj}} = 0.52\)). Figure 5 shows the distribution of the fitted points.

Figure 5 (A) Relation between actual referendums turnout (as reported in the official ISTAT statistics) and predicted referendums turnout (as inferred from mobile phone data); (B) relation between actual association density and predicted association density; (C) relation between actual blood donations and predicted blood donations

In particular, within synchronization is positively associated, through its beta weights, with the social capital metrics (\(\beta=0.09\) for Referendums turnout, \(\beta =19.49\) for Blood donations, and \(\beta=22.96\) for Association density). Thus, this indicator informs us about the intensity of cohesion within close-proximity groups and communities, which approximates “…the instantiated informal norm that promotes co-operation between two or more individuals…” [18].

According to Larssen et al., individuals with strong social bonding (i.e., association and trust among neighbors) are more likely to take civic action.

Our second indicator, between synchronization, captures the tendency of a given community (i.e., a given municipality) to communicate with many different communities (i.e., other municipalities). Thus, more between synchronization implies more interaction among multiple groups (i.e., municipalities), while less between synchronization implies less interaction and more isolation among groups. Interestingly, our results show that a high level of between synchronization is negatively associated with standard social capital metrics (\(\beta =-0.12\) for Referendums turnout, \(\beta=-24.91\) for Blood donations, and \(\beta=-21.88\) for Association density). These findings are in line with a number of theoretical and empirical works claiming that diversity undermines a sense of community and social cohesion [20, 35, 46–49]. For example, Alesina and La Ferrara [46] have studied whether and how much the degree of heterogeneity in communities influences the amount of participation in different types of groups. Using survey data on group membership and data on localities in the United States, they found that, after controlling for many individual characteristics, participation in associations (e.g., religious groups, hobby clubs, youth and sport groups) is significantly lower in more heterogeneous, unequal, and racially or ethnically fragmented localities.

Our results are obtained by including per-capita income in the regressions, similarly to what is done in the literature [22, 35], thus controlling for wealth at the level of the NUTS-3 regions. The role of per-capita income is indeed important. We find that per-capita income is highly relevant in describing Association density, while it plays a minor role in explaining Referendums turnout and Blood donations.

5 Conclusion

In this paper, we have introduced two novel synchronization metrics (i.e., within and between synchronization) that represent an innovative and efficient way to describe traditional social capital measures (i.e., Referendum turnout, Blood donations, and Association density). The proposed approach is, to the best of our knowledge, the first one that combines synchronization metrics and mobile phone data, which are always up to date and available for a very large fraction of the world population. A further merit of our approach is the ability to identify and analyze separately the role played by the level of cooperation within a close proximity-based community (i.e., within synchronization), and the one played by the level of cooperation among different communities in a larger geographical area (i.e., between synchronization). Moreover, our approach does not need individual-level data, which are rarely shared by telecommunication operators in order to preserve data confidentiality. It is also worth noting that our synchronization-based approach can easily be extended to other sources of information, such as activity on social media platforms or mobility routines captured from transportation data.

Social capital is a key determinant for understanding neighborhood stability for crime prevention, for strengthening social cohesion (e.g., immigrant integration), and for creating integration tools in addition to language and culture training. Thus, the geographical characterization of areas with differential levels of social capital is an important tool in the hands of policy makers designing specific incentive policies, which are clearly more or less effective depending on the underlying social capital types and levels.