Constant state of change: engagement inequality in temporal dynamic networks
Abstract
The temporal changes in complex systems of interactions have excited the research community in recent years, as they offer insight into the dynamics and evolution of these systems. From the collective dynamics of organizations and online communities to the spreading of information and fake news, temporal dynamics are fundamental to the understanding of complex systems. In this work, we quantify the level of engagement in dynamic complex systems of interactions, modeled as networks. We focus on interaction networks for which the dynamics of the interactions are coupled with that of the topology, such as online messaging, forums, and emails. We define two indices to capture the temporal level of engagement: the Temporal Network (edge) Intensity index, and the Temporal Dominance Inequality index. Our surprising result is that these measures are stationary for most measured networks, regardless of vast fluctuations in the size of the networks over time. Moreover, more than 80% of the weekly changes in the index values are smaller than 10%. The indices are stable throughout the temporal evolution of a network but differ between networks, and a classifier can determine the network to which the temporal indices belong with high success. We find an exception in the Enron management email exchange during the year before its disintegration, in which both indices show high volatility throughout the inspected period.
Keywords
Engagement indices Interactions intensity Dominance Gini-index inequality
Abbreviations
- PDF
Probability Distribution Function
- CDF
Cumulative distribution function
- CPD
Change point detection
Introduction
Dynamic complex systems of interactions are often modeled as a sequence of snapshots of networks in time (Holme and Saramäki 2012). While this is a rather simplistic representation, it is widely accepted that the structural properties of a network play a significant role in determining its actors’ behavior (Granovetter 1983; Burt 2000; Haynie 2001; Spencer 2003; Snijders 2005; Kossinets and Watts 2006; Perra et al. 2012). The last decade’s abundance of temporal information paved the path to a further understanding of the dynamics of networks (Lazer et al. 2009; Artime et al. 2017) and the effect on their actors (Fowler et al. 2008; Phelps 2010; Hellmann and Staudigl 2014; Ilany et al. 2015).
The intensity of interactions, also referred to as tie strength, has long been recognized as a fundamental property (Barrat et al. 2007). Human contacts are of different durations (Barabasi 2005; Onnela et al. 2007); human relationships are of varying strength (Granovetter 1977); human flight fluxes differ across routes (Opsahl et al. 2008), and more. The heterogeneity in edge intensity, i.e., the duration, strength, or capacity of the above interactions, has been modeled utilizing edge weights and weighted networks (Barrat et al. 2007; Barrat et al. 2004; Newman 2004; Opsahl et al. 2010). The intensity of interactions (Opsahl et al. 2008) has been used in a variety of applications, such as aiding the assessment of the level of conflict within organizations (Nelson 1989) and the understanding of human communication patterns (Gilbert and Karahalios 2009; Miritello et al. 2011).
Here, we utilize weighted-network modeling to study temporal indices of engagement, such as average intensity and participation inequality, in online person-to-person interaction networks, termed connection networks (Holme and Saramäki 2012). Connection networks may refer to organizational email networks, online forums and messaging apps, and online discussions (Eckmann et al. 2004; Sun et al. 2016).
Temporal measures of engagement are of interest as they give a measure of member participation, interest, influence, dominance, and more. In organizations, where frequent changes were found to be the norm (Burke 2017), following the temporal intensity and dominance of the interactions can help in identifying fluctuations in involvement and engagement before, during, and after a planned organizational change, as well as in assessing the reactions to a shock. These temporal measures are also of interest for engagement in online social networks, where participation was found to be dominated by a few (Nielsen 2006). Recent studies, however, found that participants change their active role in the network and their engagement over time (Sonnenbichler 2010). Currently, it is unclear whether these changes affect the temporal measures of network activity.
To study the temporal behavior of a network, we define indices of average connection intensity and nodal dominance inequality in temporal networks and measure these quantities over several real-world networks. Surprisingly, we find a stationary behavior of networks over time, regardless of massive fluctuations in their size. Our results demonstrate that networks converge to a steady-state of engagement, irrespective of significant variations in the number of participants. Deviations from the steady state are rare and do not correlate with a change in size.
Of specific interest is the case of the Enron managers email network. The dataset was released by court order after the company had disintegrated, and has recently been used, together with the known set of events, to evaluate change point detection (CPD) schemes (Peel and Clauset 2015; Miller and Mokryn 2018). Unlike anomaly detection techniques, which scan for temporal fluctuations from the norm, CPD schemes try to infer the points in time at which networks change their norm; these are termed points of change. We find that throughout the period inspected in the Enron managers email network, neither index can be seen as stationary, and the fluctuations in the network’s temporal indices are significantly higher than the ones we found for all other networks.
Our results show that networks differ in the engagement indices we defined, and can be differentiated by them. To further verify this result, we ran a classification experiment over the weekly indices, and found that the classifier can assign the index tuples to their corresponding network with high accuracy.
Our surprising yet robust results have implications for the inference of the behavior of complex systems over time and for the dynamics of networks. Of interest is the understanding of the origin of the different engagement indices between networks, and whether they can be utilized to characterize networks. The robustness of the result across size changes in the networks is of importance for the understanding of stationary properties, and their implications for dynamic systems and collective behavior.
Related work
Complex systems of interacting elements, from human (social and organizational) to physical and biological ones, can be modeled as interaction networks, with nodes representing the elements and edges representing their interactions. When the interactions are dynamic, e.g., in human and social interactions, a complete model that captures the longitudinal evolution of the system comprises a sequence of networks, each portraying a snapshot of the system at a single point in time. Other models do exist (Holme and Saramäki 2012). In this work, we follow the modeling of sequential periods, similarly to Pan and Saramäki (2011).
Temporal networks are viewed in recent years as a natural way to investigate dynamic systems (Holme and Saramäki 2012; Artime et al. 2017; Sekara et al. 2016; Li et al. 2017), where “the system under study should consist of agents that interact pairwise, so that the interactions have both some degree of randomness and some regularity” (Gautreau et al. 2009; Holme and Saramäki 2012). Dynamic online interactions have been studied to model conflicts (Yasseri et al. 2012), temporal ego networks and strength of links over time (Karsai et al. 2014).
In this work, we model the dynamics of electronic one-to-one communication such as emails and instant messages. Online forums can be considered a form of one-to-many communication (Holme and Saramäki 2012); in this work, however, they are modeled through the replies, and hence also as a form of one-to-one communication.
Temporal networks of electronic messages have been investigated mainly in the context of information spreading and contagion (Rodriguez et al. 2011; Gomez-Rodriguez et al. 2012; Rosvall et al. 2014; Nadini et al. 2018). Structural dynamics and properties of temporal networks have also received much attention, such as temporal path length, centrality, community, and motif measures (Pan and Saramäki 2011; Perra et al. 2012; Kovanen et al. 2011; Taylor et al. 2017).
Complex networks of interactions are dynamic and heterogeneous by nature (Corrado 2019). One of the cornerstones of heterogeneity is the nodal degree or, in weighted networks, the node intensity. Intensity patterns are heterogeneous, with a few nodes having a significantly higher degree or intensity level and hence being more dominant in the network (Barrat et al. 2004; Barrat et al. 2007; Opsahl et al. 2008; Corrado 2019). Dominance in systems mostly refers to the dominant role of their members. In social networks of interactions, groups of roles are inferred by analyzing the structure of the networks (Gupte et al. 2017; Costa and Ortale 2018). Role groups differ in size: Nielsen (2006) found that most online communities have highly unequal role group sizes, with 90% of members never contributing, 9% contributing little, and a group of active influencers, merely 1% of the members, accounting for almost all network activity. Interestingly, roles are temporal, and members often transition between roles (Sonnenbichler 2010).
Hence, we proceed to define measures of engagement in networks and explore their temporal nature. For a suggested organizational change, for example, such measures can determine the level of engagement in the change: if communication inequality is low, then many participate in the discussions; if inequality is high, only a few dominate the conversation and are actively involved. The intensity of the conversations can be assessed by comparing it to the intensity in other periods.
Network intensity measures
Average interaction intensity
Here, N is the total number of nodes in the network, and w_{ij} is the non-zero weight (strength) of an edge that emanates from the focal node i.
The tuning parameter, α, determines the relative importance of the number of edges and of their weights. When α=0 the edge strength is ignored, and only the existence of an edge is taken into account, resulting in a measure that is similar to the one in Freeman (1978). Conversely, when α=1 only the edge weights are considered, while the binary structure is not (Opsahl et al. 2010).
ϕ is a scalar metric that, depending on the chosen value of the tuning parameter α, describes the weighted sum of the node degrees in the network. Specifically, when α is set to zero, ϕ_{α=0} corresponds to the number of edges in the graph; when α is set to one, ϕ_{α=1} corresponds to the sum of all edge weights in the network, that is, the overall intensity of interactions in the network.
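The displayed equations of this subsection are not present in this copy of the text. The block below is a hedged reconstruction from the surrounding definitions, assuming the tunable degree measure of Opsahl et al. (2010) that the text cites. The symbols x_{ij}, k_i, s_i and C_i^{α}, and the form of ψ as a ratio, are assumptions introduced for illustration and may differ from the original typesetting.

```latex
% Assumed notation: x_{ij} = 1 if w_{ij} > 0, and 0 otherwise.
k_i = \sum_{j=1}^{N} x_{ij}, \qquad s_i = \sum_{j=1}^{N} w_{ij}

% Node-level measure with tuning parameter \alpha (Opsahl et al. 2010):
C_i^{\alpha} = k_i^{\,1-\alpha} \, s_i^{\,\alpha}

% Network-level sum and the intensity index (assumed to be the ratio below):
\phi_{\alpha} = \sum_{i=1}^{N} C_i^{\alpha}, \qquad
\psi = \frac{\phi_{\alpha=1}}{\phi_{\alpha=0}}
```

Under these assumptions, ϕ_{α=0} sums the degrees and ϕ_{α=1} sums the strengths, so ψ is the average weight per edge. If each undirected edge is counted at both endpoints, both sums pick up the same factor of two, which cancels in the ratio.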
ψ≥1 holds for all graphs. In the case where edge weights are based on a ratio scale (Opsahl and Panzarasa 2009) then ψ is bounded by that ratio. Otherwise, it is unbounded. When ψ∼1 the network intensity level is very low, and the vast majority of edges have a low weight. In social networks of interactions, low intensity corresponds to a low number of interactions between any two members in the network. Accordingly, when ψ>>1, the network intensity level is high. High intensity, in this case, implies the existence of edges representing interactions of high volume, also referred to as strong ties.
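The behavior of ϕ at the two extremes of α, and the resulting intensity ψ, can be illustrated with a short computation. This is a minimal sketch, assuming the node-level form k_i^(1-α) · s_i^α of Opsahl et al. (2010) and assuming ψ = ϕ_{α=1}/ϕ_{α=0}; the function names and the toy edge map are hypothetical and are not taken from the measurements reported here.

```python
from collections import defaultdict


def phi(edges, alpha):
    """Sum over nodes of k_i^(1-alpha) * s_i^alpha (assumed Opsahl-style form).

    `edges` maps (source, target) pairs to positive edge weights.
    """
    degree = defaultdict(int)      # k_i: number of edges leaving node i
    strength = defaultdict(float)  # s_i: total weight of edges leaving node i
    for (src, _dst), weight in edges.items():
        if weight > 0:
            degree[src] += 1
            strength[src] += weight
    return sum(degree[n] ** (1 - alpha) * strength[n] ** alpha for n in degree)


def intensity(edges):
    """psi = phi(alpha=1) / phi(alpha=0), i.e. the average weight per edge."""
    n_edges = phi(edges, 0)
    return phi(edges, 1) / n_edges if n_edges else float("nan")


# Hypothetical toy snapshot: (sender, receiver) -> number of messages
toy = {("a", "b"): 3, ("a", "c"): 1, ("b", "c"): 2}
print(phi(toy, 0), phi(toy, 1), intensity(toy))
```

For the toy snapshot there are three edges carrying six messages in total, so ψ equals 2. When every edge carries a single message, ψ equals 1, the lower bound discussed above for count-valued weights.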
In this work, we did not take the direction of the interactions into account, yet clearly, the intensity index can be computed for in-degrees and out-degrees separately. In organizations, for example, these correspond to the members disseminating information and those on the receiving side; in online forums, to conversation initiators and responders, respectively.
Temporal network intensity index
Here, G_{τ}, τ∈[1..T], is a sequence of graphs representing consecutive network snapshots over a period T.
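One way to obtain such a sequence of snapshots and a per-snapshot intensity value is sketched below. It is a minimal illustration under stated assumptions: events are hypothetical (sender, receiver, unix_time) tuples, snapshots are weekly, the edge weight is the number of messages exchanged on that edge within the week, and the intensity of a snapshot is taken to be its average weight per edge. None of the names below come from the original measurements.

```python
from collections import defaultdict

WEEK_SECONDS = 7 * 24 * 3600  # assumed snapshot length: one week


def weekly_snapshots(events, period=WEEK_SECONDS):
    """Group (sender, receiver, unix_time) events into per-period edge maps."""
    if not events:
        return []
    start = min(t for _, _, t in events)
    snapshots = defaultdict(lambda: defaultdict(int))
    for src, dst, t in events:
        # Edge weight = number of messages on this edge within the period
        snapshots[(t - start) // period][(src, dst)] += 1
    last = max(snapshots)
    # Periods without any event yield an empty snapshot
    return [dict(snapshots.get(tau, {})) for tau in range(last + 1)]


def temporal_intensity(events, period=WEEK_SECONDS):
    """Average edge weight per snapshot (NaN for periods without edges)."""
    series = []
    for edges in weekly_snapshots(events, period):
        if edges:
            series.append(sum(edges.values()) / len(edges))
        else:
            series.append(float("nan"))
    return series


# Hypothetical events: three messages in the first week, one in the second
events = [("a", "b", 0), ("a", "b", 10), ("b", "c", 20),
          ("a", "c", WEEK_SECONDS + 5)]
print(temporal_intensity(events))
```

The sketch keeps the (sender, receiver) direction in the edge key for simplicity; an undirected variant would normalize each pair, for example by sorting it, before counting.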
Interactions indicate how information flows in a network. Understanding the flow of information in a network over time is fundamental in the research of social networks and organizations. The proposed temporal intensity metric adds a layer of knowledge on the flow of information, as it gives a measure of volume. It captures interactions that occur during a measured period and do not change the structure, but still carry additional information on the behavior of the complex system. It thus enriches our understanding of the network’s temporal complexity. For example, today’s organizations are in a constant state of change (Burke 2017). Following the temporal intensity of the interactions in an organization can help in identifying fluctuations in the level of intensity before, during, and after a planned organizational change.
Network dominance inequality index
Complex networks are heterogeneous with a few dominant nodes. We explore here the measure and extent of this inequality. Measuring the disparity in the level of communication, for example, enables an understanding of the variance in the level of members’ engagement in a network.
We proceed to study the inequality in nodal dominance in a graph while considering the intensity of the nodes’ interactions. In organizations, for example, when a change is introduced, intense interactions can be found among its supporters and opponents. Members that have yet to make up their minds would exhibit less intensity in their interactions (Burke 2017). In this case, understanding the level of inequality in the intensity of participation can aid in understanding the balance between members who are involved in the change and those who are not.
We measure the inequality in nodal interaction dominance utilizing the Gini inequality index (Gini 1921; Atkinson 1970), originally devised for measuring income inequality. The Gini index is a normalized measure of the mean absolute difference, and in our case, the difference is in nodal engagement, i.e., weighted degree. To follow the temporal changes of dominance in a network, we use a temporal measure of this index per period, which we term Dominance Inequality.
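The Gini computation over weighted degrees can be sketched as follows. This is a minimal illustration, assuming the standard mean-absolute-difference definition of the Gini index, Σ_i Σ_j |x_i − x_j| / (2 n² x̄), and assuming undirected weighted degrees in which each edge contributes its weight to both endpoints. The function names and the toy graph are hypothetical.

```python
from collections import defaultdict


def gini(values):
    """Gini index: sum_i sum_j |x_i - x_j| / (2 * n^2 * mean)."""
    xs = sorted(float(v) for v in values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Sorted-form identity, equivalent to the pairwise definition above
    weighted = sum((2 * i - n + 1) * x for i, x in enumerate(xs))
    return weighted / (n * total)


def dominance_inequality(edges):
    """Gini index of the weighted degrees in one snapshot.

    `edges` maps (u, v) pairs to weights; direction is ignored, so every
    edge adds its weight to both endpoints.
    """
    strength = defaultdict(float)
    for (u, v), weight in edges.items():
        strength[u] += weight
        strength[v] += weight
    return gini(strength.values())


# Hypothetical star-shaped snapshot: one hub exchanging one message with each leaf
star = {("hub", "a"): 1, ("hub", "b"): 1, ("hub", "c"): 1}
print(dominance_inequality(star))
```

A value of 0 means that all nodes engage equally, and values approaching 1 mean that a few nodes account for nearly all of the interaction volume. Note that only nodes that appear in at least one edge of the snapshot enter the computation; whether inactive members should be counted with a weighted degree of zero is a modeling choice that this sketch does not settle.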
Measuring temporal intensity and dominance inequality in real networks
Table 1 Real-world dataset descriptions
Name | #Nodes (min, max) | #Edges (min, max) | Duration (weeks) |
---|---|---|---|
AskUbuntu (Paranjape et al. 2017) | 1458, 2832 | 2108, 4325 | 198 |
Facebook wall posts (Viswanath et al. 2009) | 1566, 11325 | 1514, 13384 | 124 |
Wikipedia conflict (Brandes et al. 2009) | 2011, 7250 | 3749, 33623 | 156 |
Wikipedia talk (Sun et al. 2016) | 15154, 53236 | 26494, 73356 | 132 |
Manufacturing emails (Michalski et al. 2011) | 104, 148 | 587, 1335 | 38 |
EU Research institutional emails (Paranjape et al. 2017) | 52, 667 | 46, 3197 | 74 |
Enron management emails (Klimt and Yang 2004) | 33, 107 | 27, 212 | 78 |
Robustness of the temporal network intensity
Robustness of temporal dominance inequality
A network nearing its end: Enron emails
An important question is how the devised metrics behave for the Enron dataset. We digress here for a paragraph to give the needed background on the once billion-dollar company, known for its bankruptcy in December 2001 and its disintegration in the following year. Enron, originally a gas company, “created Enron Online (EOL) in October 1999, an electronic trading website that focused on commodities. Enron was the counterparty to every transaction on EOL; it was either the buyer or the seller. [...] When the recession hit in 2000, Enron had significant exposure to the most volatile parts of the market. By the fall of 2000, Enron was starting to crumble under its own weight” (Segal 2019). Shortly after its demise, the company’s entire email exchange was released by court order.
Given the known set of events and their timeline, and the availability of the entire management email corpus, Enron’s emails are used to evaluate change point detection algorithms, whose detected change points are compared with the actual events (Peel and Clauset 2015; Miller and Mokryn 2018).
Overall, during the entire inspected period, both the Temporal Network Intensity and the Dominance Inequality indices exhibit large weekly changes and, unlike in the rest of the networks, cannot be considered stationary.
Predicting a network from its engagement
The networks examined were characterized by stability in the two indices, the Temporal Network Intensity (activity) index and the Dominance Inequality (Gini) index. This stability shows both in the range of the values measured for each network over time and in the size of the changes in the indices between successive periods. The distribution of the changes between successive periods shows that volatility of up to 0.5 in the Temporal Network Intensity covers over 90% of a network’s operating time. Similar results were obtained for the Dominance Inequality index. The index values measured for different networks, however, differ by 0.4 to 0.7.
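The share of period-to-period changes that stay within a given bound can be computed along the following lines. This is a minimal sketch, assuming that a change is measured as the absolute relative difference between successive weekly index values; that definition, the function names, and the example series are assumptions for illustration and are not the measured data.

```python
def relative_changes(series):
    """Absolute relative change between each pair of successive periods."""
    return [abs(cur - prev) / abs(prev)
            for prev, cur in zip(series, series[1:]) if prev != 0]


def share_within(series, bound):
    """Fraction of successive-period changes that do not exceed `bound`."""
    changes = relative_changes(series)
    if not changes:
        return float("nan")
    return sum(c <= bound for c in changes) / len(changes)


# Hypothetical weekly index values with a single large jump
weekly_index = [2.0, 2.1, 2.0, 2.6, 2.5]
print(share_within(weekly_index, 0.10))
```

In the hypothetical series, three of the four week-to-week changes are below 10%, so the reported share is 0.75. Evaluating the same function over a range of bounds gives an empirical cumulative distribution of the changes.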
To examine how characteristic the Temporal Network Intensity and the Temporal Dominance Inequality indices are of each network, we perform a classification task over the temporal indices, with the target of identifying the class (dataset) that produced them. We perform the experiment over all seven datasets listed in Table 1.
Table 2 Classification results according to the Intensity and Inequality features
Classifier | Balancing | Accuracy (avg., std) | Precision (avg., std) | Recall (avg., std) | F1 (avg., std) |
---|---|---|---|---|---|
KNN | Stratified fold | 0.83, 0.04 | 0.83, 0.03 | 0.85, 0.04 | 0.86, 0.01 |
KNN | Data multiplication | 0.87, 0.01 | 0.88, 0.01 | 0.87, 0.01 | 0.87, 0.01 |
Decision Tree | Stratified fold | 0.84, 0.03 | 0.84, 0.03 | 0.85, 0.02 | 0.85, 0.01 |
Decision Tree | Data multiplication | 0.90, 0.01 | 0.90, 0.01 | 0.90, 0.01 | 0.89, 0.01 |
Random Forest | Stratified fold | 0.86, 0.03 | 0.86, 0.03 | 0.87, 0.03 | 0.86, 0.03 |
Random Forest | Data multiplication | 0.95, 0.01 | 0.95, 0.01 | 0.95, 0.01 | 0.95, 0.01 |
Our classes (datasets) are not equal in size in terms of the number of weeks (see Table 1). We therefore employ two known balancing techniques. The first multiplies (replicates) the samples of the small datasets to balance the size of the classes; the other, Stratified Folds, preserves the class distribution in every fold (Kohavi et al. 1995).
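The two balancing techniques can be illustrated with a small, library-free sketch. These are hand-written stand-ins under stated assumptions, not the routines used for the reported experiments: the function names are hypothetical, the fold assignment deals each class round-robin after shuffling, and replication repeats the rows of the smaller classes until each class matches the largest one.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds, keeping class proportions per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal each class round-robin
    return folds


def replicate_to_balance(rows, labels):
    """Repeat rows of smaller classes until each class matches the largest."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        copies = -(-target // len(group))  # ceiling division
        out_rows.extend((group * copies)[:target])
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical labels: 10 weekly samples of network "A", 4 of network "B"
labels = ["A"] * 10 + ["B"] * 4
print([len(fold) for fold in stratified_folds(labels, 2)])
print(replicate_to_balance(list(range(14)), labels)[1].count("B"))
```

With replication, identical copies of a sample can end up in both the training and the test part of a split if the data are replicated before splitting, which tends to raise the measured scores; replicating only the training part of each split avoids this.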
Classification Results: We present the results for each classification algorithm and each balancing method in Table 2.
All algorithms were able to infer the correct network dataset from its weekly indices with high accuracy and with F1 in the range of [0.85, 0.95] over all folds (Table 2). To test the dependency of the success on individual classes, we repeatedly re-ran the tests while excluding one class (dataset) at a time, and compared the overall results. The difference in the results was insignificant across all experiments, showing that the overall result is robust across the datasets.
Discussion and conclusions
In this work, we set out to understand how temporal engagement in networks changes with time. To that end, we defined two indices to capture the temporal network activity. The first, Temporal Network Intensity, can be roughly described as the average edge intensity in the network over a period. The second, Dominance Inequality, is a measure of the inequality in engagement. Our surprising result is that for most of the email and forum networks checked, the indices were stationary, implying a steady state. For Enron, a network known to be nearing disintegration, the indices were volatile.
A similar stationary value was found in Gautreau et al. (2009) for the average degree of the flux of people from airports. However, airports’ physical limitations may give a plausible explanation for this measure. In the datasets examined in this work these limitations do not exist. Interestingly, both our indices can be derived utilizing the average degree. We believe that these findings need to be further researched over a wider variety of networks exhibiting different dynamics.
The robustness of the indices, regardless of significant changes in the size of the underlying network over time, is itself intriguing. For example, when the size of the network decreases in a process of preferential detachment, the level of engagement, and hence the indices, would be expected to be affected as well. We intend to further research this counter-intuitive result.
We focus here on the complex temporal interactions and utilize them to gain an understanding of the system’s temporal behavior. By moving from a nodal-centric view to an interaction-centric view, we suggest a novel understanding of the dynamics of complex networks. Lastly, our results show that the indices we devised fluctuated significantly in a network that was dealing with an unstable situation that led to the company’s disintegration. In future research, we intend to further understand the behavior of the indices for different network models and dynamics.
Footnotes
- 1. The Enron Management dataset is discussed in detail in the “A network nearing its end: Enron emails” section.
Notes
Acknowledgments
This research was supported by the Israel Science Foundation (grant No. 328/17).
Authors’ contributions
HM and OM designed the experiments and analyzed the results. HM conducted the experiments. OM and HM wrote the manuscript. Both authors read and approved the final manuscript.
Funding
This work was partially supported by the Israel Science Foundation Grant #328/17.
Competing interests
The authors declare that they have no competing interests.
