Keep your friends close, and your enemies closer: structural properties of negative relationships on Twitter

Tacchi, Jack; Boldrini, Chiara; Passarella, Andrea; Conti, Marco

doi:10.1140/epjds/s13688-024-00485-y

Research
Open access
Published: 09 August 2024

Volume 13, article number 55, (2024)

EPJ Data Science

Jack Tacchi¹,² (ORCID: orcid.org/0009-0009-4119-5725),
Chiara Boldrini¹,
Andrea Passarella¹ &
Marco Conti¹

Abstract

The Ego Network Model (ENM) is a model for the structural organisation of relationships, rooted in evolutionary anthropology, that is found ubiquitously in social contexts. It takes the perspective of a single user (Ego) and organises their contacts (Alters) into a series of (typically 5) concentric circles of decreasing intimacy and increasing size. Alters are sorted based on their tie strength to the Ego; however, this is difficult to measure directly. Traditionally, the interaction frequency has been used as a proxy, but this misses the qualitative aspects of connections, such as signs (i.e. polarity), which have been shown to provide extremely useful information. However, the sign of an online social relationship is usually an implicit piece of information, which needs to be estimated from interaction data from Online Social Networks (OSNs), making sign prediction in OSNs a research challenge in and of itself. This work aims to bring the ENM into the signed networks domain by investigating the interplay of signed connections with the ENM. This paper delivers 2 main contributions: firstly, a new and data-efficient method of signing relationships between individuals using sentiment analysis and, secondly, an in-depth look at the properties of Signed Ego Networks (SENs), using 9 Twitter datasets of various categories of users. We find that negative connections are generally over-represented in the active part of the Ego Networks, suggesting that Twitter greatly over-emphasises negative relationships with respect to “offline” social networks. Further, users who use social networks for professional reasons have an even greater share of negative connections. Despite this, we also found weak signs that less negative users tend to allocate more cognitive effort to individual relationships and thus have smaller ego networks on average.
All in all, even though structurally ENMs are known to be similar in both offline and online social networks, our results indicate that relationships on Twitter tend to nurture more negativity than offline contexts.

1 Introduction

Online social networks (OSNs) can be seen as a social microscope to investigate the properties of our social interactions in the online world. The increasing global connectivity underscores the significance of understanding social networks and the interactions that occur within them. Social network analysis has extensively employed graph-based models to study the structural characteristics of relationships. One such representation, the Ego Network Model (ENM), is rooted in evolutionary anthropology research on how humans structure their social networks [1]. The ENM is centred around a single user, the Ego, and portrays all their immediate connections, named Alters, based on their relationship strength to the Ego. This results in a series of concentric circles with increasing size but decreasing intimacy, as illustrated in Fig. 1. The number and sizes of the circles are generally consistent, with an average of around 5, 15, 50, and 150 Alters [2]. The size ratio between them is also quite consistent, with a value close to 3 [3]. Note that an ENM only contains meaningful relationships, i.e. those the Ego spends some time nurturing regularly.

The importance of the ENM is due in large part to its omnipresence in social networks. Indeed, its structure is prevalent across an extremely diverse range of social communities, including traditional hunter-gatherer groups, small-scale horticultural societies, ancient Roman armies and modern-day military units [4]. The ENM is so prevalent that it can even be observed in many non-human primate species, although with smaller group sizes [5]. The Social Brain Hypothesis proposed by Dunbar explains this pervasiveness, positing that primates have a cognitive limit that restricts the size and complexity of social groups they can maintain. For humans, this limit is approximately 150, also known as Dunbar’s number. When the limit is exceeded, social groups tend to become unstable and fragment into smaller, more manageable groups [6]. Although one might assume that the ease of online communication would require less cognitive effort and therefore allow for larger social networks to be maintained, the ENM structure remains largely consistent in online contexts. The only notable difference is the occasional presence of an additional innermost circle, with an average size of around 1.5 Alters [7]. While this has been postulated for offline networks as well, quantities of data sufficient to confirm its existence in offline contexts have never been available.

Furthermore, because each individual in a social network can be viewed as an Ego, the entire network itself can be thought of as a collection of interconnected Ego Networks. Thus, observing a network from the perspectives of the individual Egos can reveal insights that are only visible at a microscopic scale, yet have far-reaching consequences across the entire network. Indeed, the structural properties of the ENM have been shown to influence a number of social behaviours, such as collaboration and information diffusion [8].

Despite its ability to provide many insights, the ENM does have some notable limitations. One such drawback is how the tie strength between Egos and Alters is measured, which has traditionally been done by measuring their frequency of interactions. While this has been shown to be a good proxy measure for the strength of a relationship [9], not all relationships can be differentiated merely by their strength. For example, an individual with a supportive coworker and an angry neighbour will have two very different relationships: even though the interaction frequencies may be very similar, the former relationship will be far more positive than the latter. One way to include some of the important qualitative information that is being lost is to use a signed representation of the network, known as a signed network. Each connection in a signed network has a polarity (+/−) indicating either a positive or negative link. The former denotes friendship, trust, and similarity, whilst the latter is associated with hatred and distrust. Positive and negative relationships play different roles in a network and can be leveraged to improve network-related tasks, such as community detection [10] and opinion dynamics [11]. Negative links are more informative than positive ones because, among other things, they are usually located along social divisions in a network, such as between two communities, and they can therefore reveal important information about the structure of the overall network [12]. Thus, the inclusion of signs may improve our understanding of the ENM and social networks in general. However, the sign of an OSN relationship is an implicit piece of information, which typically needs to be estimated by interaction data, making sign prediction in OSNs a research challenge in and of itself.

1.1 Contributions

In this work, we set out to extend the Ego Network Model with information about the signs of relationships. To this aim, we propose a novel method, grounded in quantitative results from psychology [13], of inferring signed relationships in unsigned network data (which are typically used to build ego networks), allowing an unsigned network to be converted into a signed one. This method (i) requires only text-based interactions to sign a relationship (hence, it can be applied to any network in which users interact principally via text, i.e. in the vast majority of popular OSNs), (ii) is designed for the short texts typical of OSN interactions, and (iii) requires only data about the interactions over the links we want to sign (hence, it scales linearly with them). Note that, while signing individual interactions between users simply boils down to attaching a sentiment to the interaction (typically with a sentiment classifier), signing relationships is more nuanced, as it implies deciding on an overall sentiment that captures the whole relationship. For this sign to reflect human perception, we decided to ground our approach in psychology. This methodology is then shown to be robust to the chosen sentiment classifier for individual interactions and to produce results that are consistent with Structural Balance Theory [14].

The second original contribution is the analysis of Signed Ego Networks (SENs), i.e. Ego Networks where edges have a polarity. This was done by obtaining unsigned Ego Networks, for 9 Twitter datasets, and applying the aforementioned method of generating signs to them. The unsigned and signed versions of the networks are analysed, including the distribution of signed links across the various circles of the SEN. The main findings are that: (i) Twitter users engage in much more negative relationships than expected in the Active Networks (illustrated in Fig. 1), (ii) specialised users (e.g. journalists) do so to an even higher extent, (iii) negative relationships are particularly present in the intimate Ego Network layers of specialised users, and (iv) there is evidence for a potential weak effect of negativity leading to a slightly higher-than-average number of distinct connections, but fewer interactions in each relationship. All in all, the results confirm the popular notion that higher engagement in online social interactions results in being exposed to increasingly negative relationships and sentiments. They also extend beyond this with the surprising revelation that negative relationships tend to be proportionally more present in the social circles of the Ego Networks closer to the Ego.

Some preliminary results on the Signed Ego Network Model (SENM) were first presented in [15]. These were then expanded on in [16], where the generalisability of the SENM was observed across several cultures and types of communities. The main extensions of this current work are the following. First, the robustness of the method of signing relationships is tested using 4 different sentiment analysis models for labelling individual interactions (Sect. 5.1). The results show that the proportions of positive and negative relationships were similar for all 4 of the models. Furthermore, the models agreed on the signs of around 70-80% of the relationships and when the models did disagree, the disagreements tended to be very close to the threshold used for signing the relationships (i.e. when the models disagreed, they tended to only disagree slightly). Next, the method of signing relationships is further validated via triad analysis (Sect. 5.2). Specifically, repeated analysis of the signed triads produced by each of the 4 models shows that the distribution of signs produced by this method fitted expectations of known psychological effects in social networks (i.e. Structural Balance Theory). These distributions are also extremely and significantly different from what would be obtained by chance. Finally, we have included an analysis of the impact of negative social relationships on the cognitive effort of the Ego (Sect. 5.7).

2 Background

2.1 Ego Network Model

As previously mentioned, the ENM is centred around an individual Ego, who is surrounded by their Alters, organised in a series of concentric circles. The ENM stems from the anthropological Social Brain Hypothesis [5], which posits that the social capabilities of primates are constrained by the sizes of their neocortices. Based on the size of our own neocortex, the maximum social group size that can be maintained by a human is estimated to be around 150 (the famous Dunbar’s number). Note that these 150 contacts with whom a person engages do not include acquaintances; rather, they are exclusively relationships that are regularly nurtured. Traditionally, this has been defined as a minimum interaction frequency of at least once a year; for example, exchanging annual holiday wishes. These relationships constitute the so-called active part of the Ego Network.

Of course, the frequency and importance of the interactions generated by each relationship vary significantly from Alter to Alter. Indeed, by arranging the Alters based on their tie strength to the Ego, the aforementioned concentric structure will typically emerge [2, 3], with each subsequent circle containing the Alters of the previous ones (thus, the size of the active part of the Ego Network is equivalent to its outermost circle). Both the number of circles (approximately 4 or 5) and their sizes (1.5, 5, 15, 50, 150) are fairly regular, in offline and online social networks [7].

As the tie strength between Ego and Alter directly determines which circle the Alters are placed into, this is obviously a core concept of the ENM. Tie strength was defined by Granovetter as the equally weighted combination of 4 elements in a relationship: the time spent maintaining it, its emotional intensity, its level of intimacy and the reciprocal services it generates [17]. This definition can be a crucial consideration for understanding how various users interact socially. For example, individuals who engage in OSNs for professional purposes may devote more time to social platforms, thereby generating more reciprocal services and investing greater amounts of time in maintaining relationships. Indeed, it has previously been suggested that journalists are likely to be more cognitively engaged with Twitter than other types of users [18]. While the time spent maintaining a relationship is just one of the tie strength dimensions described by Granovetter, it has largely been the sole focus of the related literature on Ego Networks due to its widespread availability and ease of computation (using the number of interactions as its proxy). Therefore, the objective of this work is to advance the state of the art by exploring the hitherto underrepresented qualitative aspects of tie strength, in addition to the traditional metric of the time spent maintaining them.

2.2 Signed networks

In contrast to unsigned networks, whose connections are either binary (i.e. a connection between two users either exists or doesn’t) or weighted (usually based on tie strength), signed networks feature connections that can be further distinguished as either positive or negative (sometimes referred to as the polarity of edges [19]). Positive links indicate positive relationships and are used to infer trust and homogeneity [20]. On the other hand, negative links indicate negative relationships, distrust, and dissimilarities. Therefore, signed networks contain additional information that can be leveraged to enhance the performance of many tasks, such as community detection [21] and information diffusion [22].

Previous research on networks with publicly available signed connections has revealed that negative connections are significantly less prevalent than positive connections, accounting for approximately 15.0% to 22.6% of the total connections within a network [14]. In these networks, the users’ awareness of link polarity may intensify social pressure and effects such as social capital [23], whereby relationships between individuals who have many relationships in common are more likely to be positive due to social pressure from the surrounding community to get along. Conversely, even if an unsigned network contains implicit positive and negative relationships, the lack of explicitly visible negative links results in lower social pressure. Therefore, we can anticipate that networks without explicit signed relationships will have a higher proportion of negative relations than those with explicitly signed ones. We will investigate this hypothesis further in Sect. 5.

Despite the added advantages of signed networks, they are rarely the focus of research because the vast majority of popular social platforms do not allow users to create explicitly negative links. This makes it very difficult to obtain signed network data in sufficient quantities for in-depth analysis. Nevertheless, some exceptions do exist, most notably Slashdot and Epinions, which have provided two of the most widely used benchmark datasets for signed networks [19]. Unfortunately, these datasets do not provide information on interaction frequencies and therefore cannot be used for Ego Network analysis. ENM studies typically use Twitter data (due to their public nature and easy access via the Twitter API), but Twitter does not provide explicit relationship signs between users. However, just as with real-world relationships, relationships that take place online usually contain implicit information about their polarity, which can potentially be gleaned from the interactions they produce [20].

Several approaches have been developed to predict the signs of unsigned networks. However, most of these focus on the structural aspects of the surrounding network in order to deduce the sign of a connection (e.g. by leveraging topological notions like the clustering coefficient [24]), which is an indirect way of extracting signs, without looking directly at how people communicate with each other. Classification algorithms, trained on preexisting datasets with known signs, have also been used to compute the signs of novel networks [25]. All these techniques have taken a top-down perspective, viewing the network’s features as a whole and inferring signs based on the structure of the connections. However, if the inverse approach is taken, viewing the problem from the bottom up, then it is possible to take into consideration the more tacit aspects of connections that have largely gone uninvestigated, as we discuss below.

The basic building blocks that form a relationship are the interactions and exchanges between users and their corresponding sentiments. Sentiment analysis for individual exchanges is extremely well established [26]. This allows signs to be obtained for these singular interactions with an extremely high degree of confidence. However, methods for extending the signs of these bottom-level interactions to whole series of interactions, or relationships, have not received anywhere near the same level of scientific interest. One study [27] that has previously examined this problem trained a Support Vector Machine (SVM) on a manually-annotated dataset of relationships in discussion forums. The SVM took in 4 user features and 3 interaction features and achieved an accuracy of 0.835 on a subset of annotated data. Unfortunately, this approach cannot be directly replicated for Twitter interactions due to their very short and unstructured nature compared to discussion forums. In addition, there is a lack of publicly available ground truth data for Twitter relationships. In response to these problems, we propose an alternative approach that is specifically designed for dealing with short texts and can leverage models that have been established within the previous literature in order to obtain the sentiment of individual interactions.

2.3 Structural balance theory

Signed networks are known to conform to certain properties and configurations. A theory that lays out such a set of informative expectations is Structural Balance Theory [28, 29], a psychological theory which postulates that certain configurations of signed triads (i.e. groups of three individuals who are all interconnected by signed edges) should be more common than others when observed across a social network.^{Footnote 1} This is because connections are not independent but rather influenced by the other connections in the surrounding network. With regard to signed triads, those with odd numbers of positive connections, i.e. one and three, are considered plausible, or “balanced” (see T₃ and T₁ in Fig. 2), while those with even numbers of positive connections, i.e. two or zero, are considered implausible, or “unbalanced” (see T₂ and T₀ in Fig. 2). This is because these latter configurations correspond to socially problematic situations: the first, where one individual has two friends who are enemies, and the second, where all three individuals are hostile to one another and none of them decide to pair up against the third. However, a more lenient variant of this theory, commonly known as Weak Structural Balance Theory, argues that it should not be unexpected to have a situation in which three enemies refuse to team up (T₀) or for two friends to have a common enemy (T₁). Therefore, one should only expect triads with exactly two positive connections (T₂) to be underrepresented and only triads with three positive connections (T₃) to be overrepresented, with no expectations for T₁ or T₀ [30].

Given the expectations of Structural Balance Theory, it is possible to validate the predicted signs of a network by analysing the resulting triads [27] and comparing them to the expected numbers of each triad if the signs were distributed at random. This is indeed the approach we use to validate our method for signing relationships. Previously, it has been found that the expectations of the weaker version of Structural Balance Theory tend to fit online datasets better than those of the original theory [14], so this is the version we use in our analysis. While, ideally, the results would also have been validated using a manually-annotated ground truth or a known model, as discussed at the end of the previous section, validation via Structural Balance Theory has been shown to perform more than adequately in the literature [27]. The exact methodology used for this is given in Sect. 3.3.

3 Methodology

This section outlines the methodology for obtaining Signed Ego Networks, assuming that the input data is taken from Twitter (Twitter being the de facto standard for data in the relevant literature [7, 18, 31, 32]). Our methodology comprises three steps: first, we attach a sign to each relationship based on the signs of individual interactions (Sects. 3.1 and 3.2); then, we validate the obtained relationship signs against Structural Balance Theory (Sect. 3.3); finally, we enrich the standard Ego Network Model by transposing the sign information onto it (Sect. 3.4). Afterwards, in Sect. 3.5, we discuss how to measure the burden of negative relationships on overall social cognitive capacity.

In order to construct Ego Networks, it is necessary to acquire Tweets that involve direct communications between Twitter users. These communications occur when users explicitly reply to another’s post (Replies), mention another user using the “@” symbol (Mentions) or share another user’s Tweet (Retweets). This latter case is sometimes accompanied by an additional piece of text made by the sharing user (Quote Retweets). Each of these directed Tweets corresponds to an interaction between an Ego and an Alter. While some of these interactions may involve the wider network beyond the specific Alter, they nonetheless reflect a cognitive involvement of the Ego towards the Alter, which is the most critical characteristic for mapping an interaction to a specific social relationship [5].

3.1 Signing relationships

As anticipated in the introduction, in this work we take a bottom-up approach to sign extraction, inferring signs from the sentiment of individual interactions. Indeed, the effects of positive and negative exchanges have been studied in a variety of contexts. One such observation that is particularly relevant here is that a ratio of around 1 negative interaction for every 5 positive interactions, or roughly 17%, appears to be an important tipping point for numerous different types of relationships. Once this threshold is crossed, marriages become significantly less likely to last [13] and, for parent-child relationships, children are more likely to underperform at school and have developmental problems [33].

This ratio, which we will refer to as the golden interaction threshold, is leveraged for our proposed method for signing relationships, which culminates in a binary classification (positive or negative) for each Ego-Alter pair. More precisely, our method consists of 2 main steps:

Step 1: label single interactions – First, sentiment analysis is carried out to obtain a positive, neutral or negative label for each text-based communication Tweet made by an Ego towards one of their Alters.^{Footnote 2} The models used for the sentiment analysis of single interactions are discussed in Sect. 3.2.

The sentiment analysis was done for Replies, Mentions and Quote Retweets. Regular Retweets are instead always classified as neutral because they were not originally written by the Ego and, therefore, do not reflect the same level of cognitive effort. Returning to Granovetter’s definition, these regular Retweets can be regarded instead as a reciprocal service generated by a relationship because they correspond to an Ego’s desire to share the content of an Alter. In addition, automatically assigning a neutral sentiment to regular Retweets reduces their relative impact on the overall sign of a relationship without completely ignoring it. This is also consistent with the lower relative cognitive and temporal costs required for clicking the Retweet button compared to composing a Quote Retweet, Reply or Mention. Neutral interactions are treated the same as positive interactions at the moment of signing the relationships. This is because the time spent on a relationship is directly correlated to its strength, as per Granovetter’s definition. Therefore, any active effort made by an individual to communicate with another should, intuitively, be considered positively unless there is reason to think otherwise.

Step 2: label relationships – Next, a sign is computed for each relationship based on the ratio of negative interactions produced by the relationship. Specifically, by applying the golden interaction threshold [13], we classify relationships exhibiting more than 17% negative interactions as negative; otherwise, the relationship is classified as positive. According to the psychological literature, the former scenario would indicate an unstable relationship, while the latter corresponds to a stable one.

The use of a threshold for determining the relationship signs in the described manner may be inappropriate for relationships that have very few interactions; namely, fewer than 6, given the 1:5 interaction ratio. This point is addressed in Sect. 5.5, where we observe the numbers of interactions at each level of the ENM.
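
The two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `sign_relationship`, the label strings and the exact threshold value of 1/6 (1 negative for every 5 positive interactions, i.e. 1/(1+5) ≈ 16.7%, rounded to 17% in the text) are all choices made here for illustration. Per-interaction labels are assumed to come from any of the classifiers in Sect. 3.2, with regular Retweets already mapped to neutral.

```python
from collections import Counter

# Assumption: 1 negative per 5 positive interactions, i.e. 1 / (1 + 5)
# of all interactions (the "roughly 17%" quoted in the text).
GOLDEN_THRESHOLD = 1 / 6


def sign_relationship(labels, threshold=GOLDEN_THRESHOLD):
    """Return '+' or '-' for one Ego-Alter relationship.

    `labels` is an iterable of per-interaction sentiment labels
    ('positive', 'neutral' or 'negative'). Neutral interactions sit on
    the non-negative side, as described in Step 1.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("a relationship needs at least one interaction")
    negative_share = counts["negative"] / total
    # Strictly greater than the threshold means a negative relationship.
    return "-" if negative_share > threshold else "+"
```

For instance, a relationship with 5 positive and 1 negative interaction sits exactly on the threshold and is signed positive in this sketch, whereas 4 neutral Retweets plus 1 negative Reply (20% negative) is signed negative.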

3.2 Choice of sentiment classifier for individual interactions

To check how susceptible the relationship signs are to the choice of model used to label the individual interactions, 4 sentiment analysis models were selected to be compared. Recently, there has been a strong shift towards the use of transformer-based methods for Natural Language Processing (NLP). This is largely due to transformers’ robustness and improved ability to process the sequential aspects of language. Reflecting this shift in focus, and in order to include a variety of high-performing and diverse models, representative of the various approaches proposed in the literature, the models chosen for this study consist of a more traditional, lexicon- and rule-based model and 3 transformer-based models. Indeed, models from these two approaches (VADER and BERT) have previously been compared using Twitter data and were found to have similar F1 performances (0.88 and 0.92 respectively) [34].

All the models were used to obtain relationship signs for the largest of the datasets used in this paper (that being the Snowball dataset, see Sect. 4). The numbers of each label predicted by the 4 models, as well as how often they agreed with each other can be seen in Sect. 5.1.

3.2.1 VADER

The first model, VADER (Valence Aware Dictionary and sEntiment Reasoner), is a well-established sentiment analysis tool developed specifically for use with social media data [35]. VADER provides a compound sentiment score between −1 and 1 for a given text. This score can be converted into a positive label if it is above 0.05, negative if it is below −0.05 or neutral if it is between these values [35]. VADER was compared to 7 state-of-practice alternatives, as well as individual human annotators, using a test set of 4200 Tweets. It obtained an F1 score of 0.96, outperforming all other models and humans [35].
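
As a small illustration of the conversion just described, the sketch below maps a compound score to a label. The function name `vader_label` is hypothetical, and the compound score is assumed to have been computed beforehand (for example with the `SentimentIntensityAnalyzer.polarity_scores` method of the `vaderSentiment` package, which returns a dictionary containing a `"compound"` entry). Scores exactly at ±0.05 are treated as neutral here, which is one reading of "between these values".

```python
def vader_label(compound, cutoff=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound > cutoff:
        return "positive"
    if compound < -cutoff:
        return "negative"
    # Anything in [-cutoff, cutoff] is treated as neutral in this sketch.
    return "neutral"
```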

3.2.2 BERTweet

The first BERT-based model used in this paper is BERTweet [36], a version of BERT [37] that has been purposefully optimised for Twitter data. Specifically, it was fine-tuned for the task of sentiment classification using a corpus of 850 million English Tweets collected between January 2012 and March 2020. BERTweet was tested using the SemEval 2017 (Task 4) corpus [38], a common benchmark dataset for sentiment classification, which contains around 50,000 English Tweets; BERTweet achieved an F1 score of 0.73 [36].

3.2.3 XLM-T

The next model is XLM-T [39], a fine-tuned version of XLM-RoBERTa [40]. This latter model is a general NLP model that was trained on 2.5TB of CommonCrawl data, containing 100 languages, which had been filtered following pre-established guidelines based on perplexity [41]. The former was then further trained specifically for sentiment classification using 198 million Tweets from over 60 languages. XLM-T’s performance varies from language to language, but it attained a mean F1 score of 0.69 when tested across monolingual datasets for 8 languages (Arabic, English, French, German, Hindi, Italian, Portuguese and Spanish). The F1 scores for 7 of these languages were between 0.69 and 0.78; however, Hindi only reached 0.56, highlighting the model’s difficulty when dealing with certain languages. The English F1 score, 0.71, was obtained using a subset of 3033 Tweets from the SemEval 2017 dataset; thus, this model’s performance seems to be similar to that of BERTweet.

3.2.4 BERT-C

The final model is a downstream version of BERTweet, also fine-tuned for sentiment classification, this time on a classified dataset. This model was released by HuggingFace [42] and it is referred to here as the BERT Classified (BERT-C) model. Although we have no prior metrics for estimating the performance of this model, it is assumed that it will have a performance comparable to that of the original BERTweet model.

3.3 Triad analysis

As previously mentioned (in Sect. 2.3), signed connections in a social network are known to follow certain patterns, predicted by Structural Balance Theory. Thus, in this work, we leverage these expected patterns to validate the relationship signs obtained with our method. In order to form the triads, an interconnected network of users is required.

This is different from the standard data used for computing Ego Networks, where only the interactions between the Ego and the Alters are of interest. For triad analysis, we also need Alter-Alter interactions. The Snowball dataset described in Sect. 4 satisfies this requirement. Thus, each edge of the graph is assigned a sign with the methodology described in Sect. 3.1. The final step entails counting the triad types in the resulting signed graph. This makes it possible to obtain an idea of how under- or overrepresented each triad is and, thus, whether or not the predictions match the expectations of Structural Balance Theory.
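
The counting step can be sketched as follows. This is an illustrative sketch rather than the authors' code: it assumes the signed graph has been reduced to undirected edges with a single sign each (how directed Ego-to-Alter signs are combined is not specified here), and the function name `count_signed_triads` and the input format are hypothetical.

```python
def count_signed_triads(signed_edges):
    """Count triads T0..T3, indexed by their number of positive edges.

    `signed_edges` maps an undirected pair (u, v) to +1 or -1.
    Self-loops are ignored.
    """
    sign = {
        frozenset(pair): s
        for pair, s in signed_edges.items()
        if pair[0] != pair[1]
    }
    neighbours = {}
    for pair in sign:
        u, v = tuple(pair)
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    counts = {"T0": 0, "T1": 0, "T2": 0, "T3": 0}
    for pair in sign:
        u, v = tuple(pair)
        # Every common neighbour w closes a triangle (u, v, w).
        for w in neighbours[u] & neighbours[v]:
            edges = (pair, frozenset((u, w)), frozenset((v, w)))
            positives = sum(1 for e in edges if sign[e] > 0)
            counts[f"T{positives}"] += 1
    # Each triangle is visited once per edge, i.e. three times.
    return {name: c // 3 for name, c in counts.items()}
```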

In order to rule out that the same sign distribution could have been produced at random from the same background distribution of positive and negative signs, we compare the triad counts in the signed graph above with those obtained after shuffling the signs [14]. For statistical reliability, the random shuffling was repeated 10 times and the final results use the mean values. The further the quantities observed in the real signed graph are from the random ones, the more “surprise” there is and the lower the likelihood that the predictions arose by random chance. Here, surprise is defined as the number of standard deviations by which the observed number of Triad i differs from that of the randomly shuffled network with the same proportion of positive and negative signs.

The precise formula (taken from [14]) used for calculating the level of surprise $s(T_{i})$ for the observed number of Triad i is given in Equation (1).

$$ s(T_{i}) = \frac{T_{i} - \mathbb{E}[T_{i}]}{\sqrt{\Delta p_{0}(T_{i})(1 - p_{0}(T_{i}))}} $$

(1)

Here, Δ is the total number of triads in the dataset, $p_{0}(T_{i})$ is the fraction of $T_{i}$ triads to be expected in the network given a random distribution of signs, and $\mathbb{E}[T_{i}]$ is the expected number of triads $T_{i}$ in the randomly shuffled model. $s(T_{i})$ effectively measures the number of standard deviations by which the actual quantity of $T_{i}$ triads differs from the expected number under the randomly shuffled model. The denominator in Equation (1) corresponds to the standard deviation of a binomial distribution where the success probability is $p_{0}(T_{i})$ and the number of trials is Δ.
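As an illustration of the procedure described above, the following is a minimal Python sketch of triad counting, sign shuffling and the surprise score of Equation (1). It is not the implementation used in this work. The function names, the edge representation (a dictionary keyed by node pairs `(u, v)` with `u < v`) and the toy graph are assumptions made for the example. In particular, the sketch estimates $p_{0}(T_{i})$ as the mean shuffled count divided by Δ, which is one possible reading of the text; triads are indexed here by their number of positive edges.

```python
import math
import random


def count_triads(signs):
    """Count closed triads by their number of positive edges (index 0..3).

    `signs` maps an undirected edge (u, v), stored with u < v, to +1 or -1.
    """
    adjacency = {}
    for u, v in signs:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    counts = [0, 0, 0, 0]
    for u, v in signs:
        # Requiring w > v (> u) visits each triangle exactly once.
        for w in adjacency[u] & adjacency[v]:
            if w > v:
                triad = (signs[(u, v)], signs[(u, w)], signs[(v, w)])
                counts[sum(s > 0 for s in triad)] += 1
    return counts


def surprise(signs, n_shuffles=10, seed=0):
    """Return observed counts, mean shuffled counts and surprise per triad type."""
    observed = count_triads(signs)
    total = sum(observed)  # Delta: total number of triads
    rng = random.Random(seed)
    edges = list(signs)
    values = list(signs.values())
    sums = [0, 0, 0, 0]
    for _ in range(n_shuffles):
        rng.shuffle(values)  # keeps the overall share of + and - signs fixed
        for k, c in enumerate(count_triads(dict(zip(edges, values)))):
            sums[k] += c
    expected = [s / n_shuffles for s in sums]
    scores = []
    for t, e in zip(observed, expected):
        p0 = e / total if total else 0.0  # assumption: p0 estimated from shuffles
        sd = math.sqrt(total * p0 * (1.0 - p0))
        scores.append((t - e) / sd if sd > 0 else float("nan"))
    return observed, expected, scores


# Hypothetical toy graph: a complete graph on 4 nodes with 2 negative edges.
toy_signs = {(1, 2): 1, (1, 3): -1, (2, 3): -1, (1, 4): 1, (2, 4): 1, (3, 4): 1}
observed, expected, scores = surprise(toy_signs)
print(observed, expected, scores)
```

Shuffling the sign values over a fixed set of edges keeps both the graph topology and the proportion of positive and negative signs unchanged, so only the placement of the signs is randomised. A triad type that cannot occur under shuffling has zero variance, and the sketch returns NaN for its score.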

3.4 Computation of Signed Ego Networks

The computation of the Ego Networks is achieved by first computing the frequency of interaction between each Ego-Alter pair and then clustering the Alters based on these frequencies. This method is well-established and has previously been applied using a variety of clustering algorithms, including k-means [43], DBSCAN [44] and MeanShift [45]. MeanShift is used for this paper as it is one of the most commonly used algorithms and it also automatically finds the optimum number of clusters (corresponding to the number of circles in the Ego Network, into which the Alters are organised). The signs of the Ego Network relationships are computed separately, in the manner previously described. These signs are then matched to each Ego-Alter relationship in the Ego Networks, resulting in Signed Ego Networks.
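To make the clustering step concrete, the sketch below applies a simple one-dimensional, flat-kernel mean shift to a list of Ego-Alter interaction frequencies and then attaches a sign to each Alter. It is a simplified stand-in rather than the implementation used for this paper: the function name, the fixed `bandwidth` argument, the frequencies and the signs are all hypothetical. Library implementations such as scikit-learn’s `MeanShift` can estimate the bandwidth from the data, whereas here it is passed in by hand.

```python
def mean_shift_1d(values, bandwidth, max_iter=300, tol=1e-6):
    """Flat-kernel mean shift on 1-D data.

    Returns (labels, centres); centres are sorted in decreasing order, so
    label 0 is the cluster with the highest interaction frequency.
    """
    modes = []
    for x in values:
        m = float(x)
        for _ in range(max_iter):
            window = [v for v in values if abs(v - m) <= bandwidth]
            if not window:
                break
            new_m = sum(window) / len(window)  # shift to the local mean
            converged = abs(new_m - m) < tol
            m = new_m
            if converged:
                break
        modes.append(m)
    # Merge modes that converged to (almost) the same location.
    centres = []
    for m in sorted(modes, reverse=True):
        if not centres or abs(centres[-1] - m) > bandwidth:
            centres.append(m)
    labels = [min(range(len(centres)), key=lambda k: abs(centres[k] - m))
              for m in modes]
    return labels, centres


# Hypothetical interaction frequencies and relationship signs for one Ego.
frequencies = [50, 48, 20, 19, 21, 5, 4, 6, 5]
alter_signs = [+1, -1, +1, +1, -1, +1, +1, -1, +1]

labels, centres = mean_shift_1d(frequencies, bandwidth=3)
# Match each Alter's cluster with its separately computed sign.
signed_ego_network = list(zip(labels, alter_signs))
print(centres)             # [49.0, 20.0, 5.0]
print(signed_ego_network)
```

Each cluster found this way corresponds to a ring of Alters with similar interaction frequency; the concentric circles of the ENM are obtained by cumulatively merging the rings from the innermost (highest frequency) outwards.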

3.5 Negativity metrics

Given the obvious differences in the effects that positive and negative interactions can have on a relationship, an additional investigation was conducted to examine whether interactions and relationships of differing sentiments demand different amounts of cognitive effort. Given that negative information is generally harder and more time-consuming for humans to process [46], one would expect negative relationships to be more cognitively demanding than positive ones. Therefore, the hypothesis we tested is that greater numbers of negative relationships are associated with smaller active Ego Networks. For this analysis, the mean active Ego Network sizes of users with an optimum number of circles equal to 5 were compared. Note that it is standard practice in ENM research [32, 47, 48] to focus analyses on nodes with 5 circles. This choice is justified by its frequent occurrence as the optimum number of circles, which ensures a robust sample size for statistical reliability, as seen in related studies. It is also justified for this paper because the analysis of the optimal number of layers across the users in the datasets (see Fig. 8 in Sect. 5.5) shows that the 5-circle case is the most common.

The users’ levels of negativity were measured using 3 different metrics. Before introducing their formal definitions, let us denote with $\mathcal{A}_{i}$ the set of Alters in the active Ego Network of Ego i. Considering the signs of the relationships with the Alters, we can also split $\mathcal{A}_{i}$ into $\mathcal{A}_{i}^{+}$ and $\mathcal{A}_{i}^{-}$, for Alters whose relationship with Ego i is positive and negative, respectively. Further, we denote with $n_{i,j}^{+}$ and $n_{i,j}^{-}$ the number of positive and negative interactions between Ego i and Alter j, and we denote their sum as $n_{i,j}$. Leveraging this notation, the first negativity metric $l_{1}$ corresponds to the proportion of negative relationships, i.e. the number of negative relationships that each Ego had, divided by their total number of relationships:

$$ l_{1}(i) = \frac{ \vert \mathcal{A}_{i}^{-} \vert }{ \vert \mathcal{A}_{i} \vert }. $$

(2)

The second negativity metric measures the proportion of negative interactions, even if they belong to positive relationships, i.e. the number of negative interactions for each Ego divided by their total number of interactions:

$$ l_{2}(i) = \frac{\sum_{j \in \mathcal{A}_{i}} n_{i,j}^{-}}{\sum_{j \in \mathcal{A}_{i}} n_{i,j}}. $$

(3)

Finally, the third negativity metric measures the proportion of interactions that belong to negative relationships, even if the interaction itself is positive, i.e. the number of each Ego’s interactions that correspond to a negative relationship divided by their total number of interactions:

$$ l_{3}(i) = \frac{\sum_{j \in \mathcal{A}_{i}^{-}} n_{i,j}}{\sum_{j \in \mathcal{A}_{i}} n_{i,j}}. $$

(4)

When compared against the Ego Network size, the first of these metrics directly investigates the cognitive effects of maintaining negative relationships regardless of how often we interact with said negative contacts. The latter two metrics take a more fine-grained look at the role of interactions. Indeed, the second metric gauges whether negative interactions, rather than relationships, have a different impact on cognitive effort, even if the negative interaction is with someone we have a positive relationship with. The third metric checks whether interacting with negative relationships elicits a different level of cognitive effort, even if some of the interactions are positive.
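The three metrics of Equations (2)–(4) can be computed for a single Ego as in the following minimal Python sketch. The function name, the input layout (one tuple of relationship sign, positive interaction count and negative interaction count per Alter) and the example values are assumptions made for illustration and are not taken from the datasets.

```python
def negativity_metrics(alters):
    """Compute (l1, l2, l3) for one Ego.

    `alters` maps an Alter id to (relationship_sign, n_pos, n_neg), where
    relationship_sign is +1 or -1 and n_pos / n_neg are interaction counts.
    """
    total_interactions = sum(p + n for _, p, n in alters.values())
    if not alters or total_interactions == 0:
        raise ValueError("the Ego needs at least one Alter and one interaction")
    negative_alters = [a for a in alters.values() if a[0] < 0]
    # l1: share of relationships that are negative (Equation 2).
    l1 = len(negative_alters) / len(alters)
    # l2: share of interactions that are negative, whatever the
    # sign of the relationship they belong to (Equation 3).
    l2 = sum(n for _, _, n in alters.values()) / total_interactions
    # l3: share of interactions that belong to negative relationships,
    # whatever the sign of the interaction itself (Equation 4).
    l3 = sum(p + n for _, p, n in negative_alters) / total_interactions
    return l1, l2, l3


# Hypothetical active Ego Network with four Alters.
example = {
    "a": (+1, 8, 2),
    "b": (-1, 1, 4),
    "c": (+1, 5, 0),
    "d": (-1, 2, 3),
}
print(negativity_metrics(example))  # l1 = 2/4, l2 = 9/25, l3 = 10/25
```

In this example, Alter "a" contributes 2 negative interactions to $l_{2}$ despite being a positive relationship, while all 5 interactions with Alter "d" count towards $l_{3}$ even though 2 of them are positive.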

The values of the metrics are defined between 0 and 1 (inclusive). The Egos in each dataset were grouped into bins based on their negativity values for each of the 3 negativity metrics, in such a way that all the bins of a given dataset contain similar numbers of Egos, although this does mean that the bin boundaries change between datasets and metrics. The Egos’ negativities were then compared to the sizes of their Ego Networks (the results are discussed in Sect. 5.7).
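One way to obtain bins with similar numbers of Egos is equal-frequency (rank-based) binning, sketched below in Python. This is an assumption about how such bins could be built rather than a description of the exact procedure used here; the function names, the number of bins and the values are hypothetical and carry no empirical meaning.

```python
def equal_frequency_bins(values, n_bins):
    """Assign each value a bin index so that bins hold similar numbers of items.

    Bins are formed by rank, so their boundaries depend on the data. Tied
    values may be split across neighbouring bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // len(values)
    return labels


def mean_size_per_bin(negativity, sizes, n_bins):
    """Mean active Ego Network size of the Egos in each negativity bin."""
    labels = equal_frequency_bins(negativity, n_bins)
    grouped = {}
    for label, size in zip(labels, sizes):
        grouped.setdefault(label, []).append(size)
    return {b: sum(s) / len(s) for b, s in sorted(grouped.items())}


# Hypothetical negativity values and active Ego Network sizes for six Egos.
negativity = [0.1, 0.5, 0.2, 0.9, 0.4, 0.7]
sizes = [40, 55, 35, 50, 45, 60]
print(equal_frequency_bins(negativity, 3))      # [0, 1, 0, 2, 1, 2]
print(mean_size_per_bin(negativity, sizes, 3))  # {0: 37.5, 1: 50.0, 2: 55.0}
```

Because the bins are defined by rank, applying the same function to a different dataset or a different negativity metric generally yields different bin boundaries, which matches the behaviour described above.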

4 Datasets

All of the data used in this paper were collected from Twitter using the official Twitter Developer API. Twitter has long been a reliable source of Ego Network data because of its vast and active userbase and because most of its data are public. At the time of collection, the standard Twitter API allowed the most recent 3200 public Tweets created by a given user to be collected. These Tweets are referred to collectively as the user’s Timeline. Although this may not correspond to all the Tweets a user has created, it has been shown to be a sufficient quantity of information to generate meaningful Ego Networks (e.g. [7, 31, 49]).

In total, 9 datasets were used. These were collected from previous works and represent a mixture of specialised users, who use Twitter mainly for professional reasons, and generic users, who use the platform primarily for social reasons. The distinction between these two types of users is important as they have been observed to exhibit differing behaviours in certain online contexts [18]. Information describing these datasets in terms of the numbers of Egos, Alters, relationships and interactions they contain can be seen in Table 1 and Table 2, the former containing all collected users and the latter containing only the users that remained after the preprocessing steps detailed in Sect. 4.3.

Table 1 Number of Egos, Alters, relationships and interactions in the full Ego Networks, before removing unengaged users (as described in Sect. 4.3)
