1 Introduction

Online social networks (OSNs) can be seen as a social microscope to investigate the properties of our social interactions in the online world. The increasing global connectivity underscores the significance of understanding social networks and the interactions that occur within them. Social network analysis has extensively employed graph-based models to study the structural characteristics of relationships. One such representation, the Ego Network Model (ENM), is rooted in evolutionary anthropology research on how humans structure their social networks [1]. The ENM is centred around a single user, the Ego, and portrays all their immediate connections, named Alters, based on their relationship strength to the Ego. This results in a series of concentric circles with increasing size but decreasing intimacy, as illustrated in Fig. 1. The number and sizes of the circles are generally consistent, with an average of around 5, 15, 50, and 150 Alters [2]. The size ratio between them is also quite consistent, with a value close to 3 [3]. Note that an ENM only contains meaningful relationships, i.e. those the Ego spends some time nurturing regularly.

Figure 1

The Ego Network Model, with the names and expected sizes of each subgroup for social networks of humans

The importance of the ENM is due in large part to its omnipresence in social networks. Indeed, its structure is prevalent across an extremely diverse range of social communities, including traditional hunter-gatherer groups, small-scale horticultural societies, ancient Roman armies and modern-day military units [4]. The ENM is so prevalent that it can even be observed in many non-human primate species, although with smaller group sizes [5]. The Social Brain Hypothesis proposed by Dunbar explains this pervasiveness, positing that primates have a cognitive limit that restricts the size and complexity of social groups they can maintain. For humans, this limit is approximately 150, also known as Dunbar’s number. When the limit is exceeded, social groups tend to become unstable and fragment into smaller, more manageable groups [6]. Although one might assume that the ease of online communication would require less cognitive effort and therefore allow for larger social networks to be maintained, the ENM structure remains largely consistent in online contexts. The only notable difference is the occasional presence of an additional innermost circle, with an average size of around 1.5 Alters [7]. While this has been postulated for offline networks as well, sufficient data to confirm its existence in offline contexts have never been available.

Furthermore, because each individual in a social network can be viewed as an Ego, the entire network itself can be thought of as a collection of interconnected Ego Networks. Thus, observing a network from the perspectives of the individual Egos can reveal insights that are only visible at a microscopic scale, yet have far-reaching consequences across the entire network. Indeed, the structural properties of the ENM have been shown to influence a number of social behaviours, such as collaboration and information diffusion [8].

Despite its ability to provide many insights, the ENM does have some notable limitations. One such drawback is how the tie strength between Egos and Alters is measured, which has traditionally been done by measuring their frequency of interactions. While this has been shown to be a good proxy measure for the strength of a relationship [9], not all relationships can be differentiated merely by their strength. For example, an individual with a supportive coworker and an angry neighbour will have two very different relationships: even though the interaction frequencies may be very similar, the former relationship will be far more positive than the latter. One way to include some of the important qualitative information that is being lost is to use a signed representation of the network, known as a signed network. Each connection in a signed network has a polarity (+/−) indicating either a positive or negative link. The former denotes friendship, trust, and similarity, whilst the latter is associated with hatred and distrust. Positive and negative relationships play different roles in a network and can be leveraged to improve network-related tasks, such as community detection [10] and opinion dynamics [11]. Negative links are more informative than positive ones because, among other things, they are usually located along social divisions in a network, such as between two communities, and they can therefore reveal important information about the structure of the overall network [12]. Thus, the inclusion of signs may improve our understanding of the ENM and social networks in general. However, the sign of an OSN relationship is an implicit piece of information, which typically needs to be estimated from interaction data, making sign prediction in OSNs a research challenge in and of itself.

1.1 Contributions

In this work, we set out to extend the Ego Network Model with information about the signs of relationships. To this aim, we propose a novel method, grounded in quantitative results from psychology [13], of inferring signed relationships in unsigned network data (which are typically used to build ego networks), allowing an unsigned network to be converted into a signed one. This method (i) requires only text-based interactions to sign a relationship (hence, it can be applied to any network in which users interact principally via text, i.e. in the vast majority of popular OSNs), (ii) is designed for the short texts typical of OSN interactions, and (iii) requires only data about the interactions over the links we want to sign (hence scales linearly with them). Note that, while signing individual interactions between users simply boils down to attaching a sentiment to the interaction (typically with a sentiment classifier), signing relationships is more nuanced, as it implies deciding on an overall sentiment that captures the whole relation. For this sign to reflect human perception, we decided to ground our approach in psychology. This methodology is then shown to be robust to the chosen sentiment classifier for individual interactions and produces results that are consistent with Structural Balance Theory [14].

The second original contribution is the analysis of Signed Ego Networks (SENs), i.e. Ego Networks where edges have a polarity. This was done by obtaining unsigned Ego Networks for 9 Twitter datasets and applying the aforementioned method of generating signs to them. The unsigned and signed versions of the networks are analysed, including the distribution of signed links across the various circles of the SEN. The main findings are that: (i) Twitter users engage in many more negative relationships than expected in the Active Networks (illustrated in Fig. 1), (ii) specialised users (e.g. journalists) do so to an even higher extent, (iii) negative relationships are particularly present in the intimate Ego Network layers of specialised users, and (iv) there is evidence for a potential weak effect of negativity leading to a slightly higher-than-average number of distinct connections, but fewer interactions in each relationship. All in all, the results confirm the popular notion that higher engagement in online social interactions results in being exposed to increasingly negative relationships and sentiments. They also extend beyond this with the surprising revelation that negative relationships tend to be proportionally more present in the social circles of the Ego Networks closer to the Ego.

Some preliminary results on the Signed Ego Network Model (SENM) were first presented in [15]. These were then expanded on in [16], where the generalisability of the SENM was observed across several cultures and types of communities. The main extensions of this current work are the following. First, the robustness of the method of signing relationships is tested using 4 different sentiment analysis models for labelling individual interactions (Sect. 5.1). The results show that the proportions of positive and negative relationships were similar for all 4 of the models. Furthermore, the models agreed on the signs of around 70-80% of the relationships and when the models did disagree, the disagreements tended to be very close to the threshold used for signing the relationships (i.e. when the models disagreed, they tended to only disagree slightly). Next, the method of signing relationships is further validated via triad analysis (Sect. 5.2). Specifically, repeated analysis of the signed triads produced by each of the 4 models shows that the distribution of signs produced by this method fitted expectations of known psychological effects in social networks (i.e. Structural Balance Theory). These distributions are also extremely and significantly different from what would be obtained by chance. Finally, we have included an analysis of the impact of negative social relationships on the cognitive effort of the Ego (Sect. 5.7).

2 Background

2.1 Ego Network Model

As previously mentioned, the ENM is centred around an individual Ego, who is surrounded by their Alters, organised in a series of concentric circles. The ENM stems from the anthropological Social Brain Hypothesis [5], which posits that the social capabilities of primates are constrained by the sizes of their neocortices. Based on the size of our own neocortex, the maximum social group size that can be maintained by a human is estimated to be around 150 (the famous Dunbar’s number). Note that these 150 contacts with whom a person engages do not include acquaintances; rather, they are exclusively relationships that are regularly nurtured. Traditionally, this has been defined as a minimum interaction frequency of at least once a year; for example, exchanging annual holiday wishes. These relationships constitute the so-called active part of the Ego Network.

Of course, the frequency and importance of the interactions generated by each relationship varies significantly from Alter to Alter. Indeed, by arranging the Alters based on their tie strength to the Ego, the aforementioned concentric structure will typically emerge [2, 3], with each subsequent circle containing the Alters of the previous ones (thus, the size of the active part of the Ego Network is equivalent to its outermost circle). Both the number of circles (approximately 4 or 5) and their sizes – 1.5, 5, 15, 50, 150 – are fairly regular, in offline and online social networks [7].
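As a quick arithmetic check, using only the nominal circle sizes quoted above, the ratios between consecutive circles all stay close to 3:

$$ \frac{5}{1.5} \approx 3.3, \qquad \frac{15}{5} = 3, \qquad \frac{50}{15} \approx 3.3, \qquad \frac{150}{50} = 3 $$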

As the tie strength between Ego and Alter directly determines which circle the Alters are placed into, this is obviously a core concept of the ENM. Tie strength was defined by Granovetter as the equally weighted combination of 4 elements in a relationship: the time spent maintaining it, its emotional intensity, its level of intimacy and the reciprocal services it generates [17]. This definition can be a crucial consideration for understanding how various users interact socially. For example, individuals who engage in OSNs for professional purposes may devote more time to social platforms, thereby generating more reciprocal services and investing greater amounts of time in maintaining relationships. Indeed, it has previously been suggested that journalists are likely to be more cognitively engaged with Twitter than other types of users [18]. While the time spent maintaining a relationship is just one of the tie strength dimensions described by Granovetter, it has largely been the sole focus of the related literature on Ego Networks due to its widespread availability and ease of computation (using the number of interactions as its proxy). Therefore, the objective of this work is to advance the state of the art by exploring the hitherto underrepresented qualitative aspects of tie strength, in addition to the traditional metric of the time spent maintaining them.

2.2 Signed networks

In contrast to unsigned networks, whose connections are either binary (i.e. a connection between two users either exists or doesn’t) or weighted (usually based on tie strength), signed networks feature connections that can be further distinguished as either positive or negative (sometimes referred to as the polarity of edges [19]). Positive links indicate positive relationships and are used to infer trust and homogeneity [20]. On the other hand, negative links indicate negative relationships, distrust, and dissimilarities. Therefore, signed networks contain additional information that can be leveraged to enhance the performance of many tasks, such as community detection [21] and information diffusion [22].

Previous research on networks with publicly available signed connections has revealed that negative connections are significantly less prevalent than positive connections, accounting for approximately 15.0% to 22.6% of the total connections within a network [14]. In these networks, the users’ awareness of link polarity may intensify social pressure and effects such as social capital [23], whereby relationships between individuals who have many relationships in common are more likely to be positive due to social pressure from the surrounding community to get along. Conversely, even if an unsigned network contains implicit positive and negative relationships, the lack of explicitly visible negative links results in lower social pressure. Therefore, we can anticipate that networks without explicit signed relationships will have a higher proportion of negative relations than those with explicitly signed ones. We will investigate this hypothesis further in Sect. 5.

Despite the added advantages of signed networks, they are rarely the focus of research because the vast majority of popular social platforms do not allow users to create explicitly negative links. This makes it very difficult to obtain signed network data in sufficient quantities for in-depth analysis. Nevertheless, some exceptions do exist, most notably Slashdot and Epinions, which have provided two of the most widely used benchmark datasets for signed networks [19]. Unfortunately, these datasets do not provide information on interaction frequencies and therefore cannot be used for Ego Network analysis. ENM studies typically use Twitter data (due to their public nature and easy access via the Twitter API) but Twitter does not provide explicit relationship signs between users. However, just as with real-world relationships, relationships that take place online usually contain implicit information about their polarity, which can potentially be gleaned from the interactions they produce [20].

Several approaches have been developed to predict the signs of unsigned networks. However, most of these focus on the structural aspects of the surrounding network in order to deduce the sign of a connection (e.g. by leveraging topological notions like the clustering coefficient [24]), which is an indirect way of extracting signs, without looking directly at how people communicate with each other. Classification algorithms, trained on preexisting datasets with known signs, have also been used to compute the signs of novel networks [25]. All these techniques have taken a top-down perspective, viewing the network’s features as a whole and inferring signs based on the structure of the connections. However, if the inverse approach is taken, viewing the problem from the bottom up, then it is possible to take into consideration the more tacit aspects of connections that have largely gone uninvestigated, as we discuss below.

The basic building blocks that form a relationship are the interactions and exchanges between users and their corresponding sentiments. Sentiment analysis for individual exchanges is extremely well established [26]. This allows signs to be obtained for these singular interactions with an extremely high degree of confidence. However, methods for extending the signs of these bottom-level interactions to whole series of interactions, or relationships, have not received anywhere near the same level of scientific interest. One study [27] that has previously examined this problem trained a Support Vector Machine (SVM) on a manually-annotated dataset of relationships in discussion forums. The SVM took in 4 user features and 3 interaction features and achieved an accuracy of 0.835 on a subset of annotated data. Unfortunately, this approach cannot be directly replicated for Twitter interactions due to their very short and unstructured nature compared to discussion forums. In addition, there is a lack of publicly available ground truth data for Twitter relationships. In response to these problems, we propose an alternative approach that is specifically designed for dealing with short texts and can leverage models that have been established within the previous literature in order to obtain the sentiment of individual interactions.

2.3 Structural balance theory

Signed networks are known to conform to certain properties and configurations. A theory that lays out such a set of informative expectations is Structural Balance Theory [28, 29], a psychological theory which postulates that certain configurations of signed triads (i.e. groups of three individuals who are all interconnected by signed edges) should be more common than others when observed across a social network (see Footnote 1). This is because connections are not independent but rather influenced by the other connections in the surrounding network. With regards to signed triads, those with odd numbers of positive connections, i.e. one and three, are considered plausible, or “balanced” (see T3 and T1 in Fig. 2), while those with even numbers of positive connections, i.e. two or zero, are considered implausible, or “unbalanced” (see T2 and T0 in Fig. 2). This is because these latter configurations correspond to socially problematic situations: the first, where one individual has two friends who are enemies, and the second, where all three individuals are hostile to one another and none of them decide to pair up against the third. However, a more lenient variant of this theory, commonly known as Weak Structural Balance Theory, argues that it should not be unexpected to have a situation in which three enemies refuse to team up (T0) or for two friends to have a common enemy (T1). Therefore, one should only expect triads with exactly two positive connections (T2) to be underrepresented and only triads with three positive connections (T3) to be overrepresented, with no expectations for T1 or T0 [30].

Figure 2

All four possible signed triads, as per Structural Balance Theory. The subscript number following the “T” corresponds to the number of positive connections for that triad

Given the expectations of Structural Balance Theory, it is possible to validate the predicted signs of a network by analysing the resulting triads [27] and comparing them to the expected numbers of each triad if the signs were distributed at random. This is indeed the approach we use to validate our method for signing relationships. Previously, it has been found that the expectations of the weaker version of Structural Balance Theory tend to fit online datasets better than those of the original theory [14], so this is the version we use in our analysis. While, ideally, the results would also have been validated using a manually-annotated ground truth or a known model, as discussed at the end of the previous section, validation via Structural Balance Theory has been shown to perform more than adequately in the literature [27]. The exact methodology used for this is given in Sect. 3.3.

3 Methodology

This section outlines the methodology for obtaining Signed Ego Networks, assuming that the input data is taken from Twitter (Twitter being the de-facto standard for data in the relevant literature [7, 18, 31, 32]). Our methodology comprises three steps: first, we attach a sign to each relationship based on the signs of individual interactions (Sects. 3.1 and 3.2); then, we validate the obtained relationship signs against Structural Balance Theory (Sect. 3.3); finally, we enrich the standard Ego Network Model by transposing the sign information onto it (Sect. 3.4). Afterwards, in Sect. 3.5, we discuss how to measure the burden of negative relationships on overall social cognitive capacity.

In order to construct Ego Networks, it is necessary to acquire Tweets that involve direct communications between Twitter users. These communications occur when users explicitly reply to another’s post (Replies), mention another user using the “@” symbol (Mentions) or share another user’s Tweet (Retweets). This latter case is sometimes accompanied by an additional piece of text made by the sharing user (Quote Retweets). Each of these directed Tweets corresponds to an interaction between an Ego and an Alter. While some of these interactions may involve the wider network beyond the specific Alter, they nonetheless reflect a cognitive involvement of the Ego towards the Alter, which is the most critical characteristic for mapping an interaction to a specific social relationship [5].

3.1 Signing relationships

As anticipated in the introduction, in this work we take a bottom-up approach to sign extraction, inferring signs from the sentiment of individual interactions. Indeed, the effects of positive and negative exchanges have been studied in a variety of contexts. One such observation that is particularly relevant here is that a ratio of around 1 negative interaction for every 5 positive interactions, or roughly 17%, appears to be an important tipping point for numerous different types of relationships. Once this threshold is crossed, marriages become significantly less likely to last [13] and, for parent-child relationships, children are more likely to underperform at school and have developmental problems [33].

This ratio, which we will refer to as the golden interaction threshold, is leveraged for our proposed method for signing relationships, which culminates in a binary classification (positive or negative) for each Ego-Alter pair. More precisely, our method consists of 2 main steps:

Step 1: label single interactions – First, sentiment analysis is carried out to obtain a positive, neutral or negative label for each text-based Tweet made by an Ego towards one of their Alters (see Footnote 2). The models used for the sentiment analysis of single interactions are discussed in Sect. 3.2.

The sentiment analysis was done for Replies, Mentions and Quote Retweets. Regular Retweets are instead always classified as neutral because they were not originally written by the Ego and, therefore, do not reflect the same level of cognitive effort. Returning to Granovetter’s definition, these regular Retweets can be regarded instead as a reciprocal service generated by a relationship because they correspond to an Ego’s desire to share the content of an Alter. In addition, automatically assigning a neutral sentiment to regular Retweets reduces their relative impact on the overall sign of a relationship without completely ignoring it. This is also consistent with the lower relative cognitive and temporal costs required for clicking the Retweet button compared to composing a Quote Retweet, Reply or Mention. Neutral interactions are treated the same as positive interactions at the moment of signing the relationships. This is because the time spent on a relationship is directly correlated to its strength, as per Granovetter’s definition. Therefore, any active effort made by an individual to communicate with another should, intuitively, be considered positively unless there is reason to think otherwise.

Step 2: label relationships – Next, a sign is computed for each relationship based on the proportion of negative interactions produced by the relationship. Specifically, by applying the golden interaction threshold [13], we classify a relationship as negative if more than 17% of its interactions are negative; otherwise, the relationship is classified as positive. According to the psychological literature, the former scenario would indicate an unstable relationship, while the latter corresponds to a stable one.

The use of a threshold for determining the relationship signs in the described manner may be inappropriate for relationships that have very few interactions; namely, fewer than 6, given the 1:5 interaction ratio. This point is addressed in Sect. 5.5, where we observe the numbers of interactions at each level of the ENM.
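The two steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used for the results: the threshold is taken as exactly 1/6 (the text quotes roughly 17%), the interaction type names and both function names are hypothetical, and the per-interaction labels are assumed to come from whichever sentiment classifier is chosen.

```python
from collections import Counter

# Assumption: the golden interaction threshold is taken as exactly 1 negative
# in every 6 interactions (1 negative per 5 positive), i.e. roughly 17%.
GOLDEN_THRESHOLD = 1 / 6


def label_interaction(kind, classifier_label):
    """Return the label used for a single interaction.

    kind: 'reply', 'mention', 'quote_retweet' or 'retweet' (hypothetical names).
    classifier_label: 'positive', 'neutral' or 'negative' from any sentiment model.
    """
    # Regular Retweets are always treated as neutral, whatever the classifier says.
    return "neutral" if kind == "retweet" else classifier_label


def sign_relationship(labels, threshold=GOLDEN_THRESHOLD):
    """Return '+' or '-' for one Ego-Alter pair from its interaction labels.

    Neutral interactions count towards the total, so they weigh against
    negativity in the same way as positive interactions.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("a relationship needs at least one interaction")
    negative_share = counts["negative"] / total
    # Strictly greater than the threshold means a negative relationship.
    return "-" if negative_share > threshold else "+"
```

For instance, a relationship with 4 positive and 1 negative interaction has a negative share of 20% and would be signed as negative, while one with 9 neutral Retweets and 1 negative Reply (10%) would be signed as positive.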

3.2 Choice of sentiment classifier for individual interactions

To check how susceptible the relationship signs are to the choice of model used to label the individual interactions, 4 sentiment analysis models were selected to be compared. Recently, there has been a strong shift towards the use of transformer-based methods for Natural Language Processing (NLP). This is largely due to transformers’ robustness and improved ability to process the sequential aspects of language. Reflecting this shift in focus, and in order to include a variety of high-performing and diverse models, representative of the various approaches proposed in the literature, the models chosen for this study consist of a more traditional, lexicon- and rule-based model and 3 transformer-based models. Indeed, models from these two approaches (VADER and BERT) have previously been compared using Twitter data and were found to have similar F1 performances (0.88 and 0.92 respectively) [34].

All the models were used to obtain relationship signs for the largest of the datasets used in this paper (that being the Snowball dataset, see Sect. 4). The numbers of each label predicted by the 4 models, as well as how often they agreed with each other can be seen in Sect. 5.1.

3.2.1 VADER

The first model, VADER (Valence Aware Dictionary and sEntiment Reasoner), is a well-established sentiment analysis tool developed specifically for use with social media data [35]. VADER provides a compound sentiment score between −1 and 1 for a given text. This score can be converted into a positive label if it is above 0.05, negative if it is below −0.05 or neutral if it is between these values [35]. VADER was compared to 7 state-of-practice alternatives, as well as individual human annotators, using a test set of 4200 Tweets. It obtained an F1 score of 0.99, outperforming all other models and humans [35].
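The conversion from compound score to label described above can be written as a small helper. This is a sketch only: the function name is hypothetical, and treating scores that fall exactly on ±0.05 as neutral is an assumption made here to follow the wording "above" and "below".

```python
def vader_label(compound, cutoff=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label.

    Assumption: scores exactly equal to +cutoff or -cutoff are treated as neutral.
    """
    if not -1.0 <= compound <= 1.0:
        raise ValueError("compound score must lie in [-1, 1]")
    if compound > cutoff:
        return "positive"
    if compound < -cutoff:
        return "negative"
    return "neutral"
```

In practice the compound score would be produced by a VADER implementation (for example, the `compound` entry returned by `polarity_scores` in the `vaderSentiment` package), and the resulting labels would then feed into the relationship signing step of Sect. 3.1.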

3.2.2 BERTweet

The first BERT-based model used in this paper is BERTweet [36], a version of BERT [37] that has been purposefully optimised for Twitter data. Specifically, it was fine-tuned for the task of sentiment classification using a corpus of 850 million English Tweets collected between January 2012 and March 2020. BERTweet was tested using the SemEval 2017 (Task 4) corpus [38], a common benchmark dataset for sentiment classification, which contains around 50,000 English Tweets; BERTweet achieved an F1 score of 0.73 [36].

3.2.3 XLM-T

The next model is XLM-T [39], a fine-tuned version of XLM-RoBERTa [40]. This latter model is a general NLP model that was trained on 2.5TB of CommonCrawl data, containing 100 languages, which had been filtered following pre-established guidelines based on perplexity [41]. The former was then further trained specifically for sentiment classification using 198 million Tweets from over 60 languages. XLM-T’s performance varies from language to language, but attained a mean F1 score of 0.69 when tested across monolingual datasets for 8 languages (Arabic, English, French, German, Hindi, Italian, Portuguese and Spanish). The F1 scores for 7 of these languages were between 0.69 and 0.78; however, Hindi only reached 0.56, highlighting the model’s difficulty when dealing with certain languages. The English F1 score, 0.71, was obtained using a subset of 3033 Tweets from the SemEval 2017 dataset; thus, this model’s performance seems to be similar to that of BERTweet.

3.2.4 BERT-C

The final model is a downstream version of BERTweet, also fine-tuned for sentiment classification, this time on a classified dataset. This model was released by HuggingFace [42] and it is referred to here as the BERT Classified (BERT-C) model. Although we have no prior metrics for estimating the performance of this model, it is assumed that it will have a performance comparable to that of the original BERTweet model.

3.3 Triad analysis

As previously mentioned (in Sect. 2.3), signed connections in a social network are known to follow certain patterns, predicted by Structural Balance Theory. Thus, in this work, we leverage these expected patterns to validate the relationship signs obtained with our method. In order to form the triads, an interconnected network of users is required.

This is different from the standard data used for computing Ego Networks, where only the interactions between the Ego and the Alters are of interest. For triad analysis, we also need Alter-Alter interactions. The Snowball dataset described in Sect. 4 satisfies this requirement. Thus, each edge of the graph is assigned a sign with the methodology described in Sect. 3.1. The final step entails counting the triad types in the resulting signed graph. This makes it possible to obtain an idea of how under- or overrepresented each triad is and, thus, whether or not the predictions match the expectations of Structural Balance Theory.
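The counting step can be sketched as follows. This is an illustrative sketch rather than the code used for the analysis: it assumes that each relationship has already been reduced to a single undirected signed edge (how directed Ego-Alter signs are combined is not specified here), and the function name and edge representation are hypothetical.

```python
def count_signed_triads(signed_edges):
    """Count triads T0..T3 by their number of positive edges.

    signed_edges: dict mapping frozenset({u, v}) to +1 or -1.
    Assumption: edges are undirected and node identifiers are sortable.
    """
    # Build an adjacency map so that triangles can be found via common neighbours.
    neighbours = {}
    for edge in signed_edges:
        u, v = tuple(edge)
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    counts = {"T0": 0, "T1": 0, "T2": 0, "T3": 0}
    for u in neighbours:
        for v in neighbours[u]:
            if v <= u:
                continue
            # Requiring u < v < w visits each triangle exactly once.
            for w in neighbours[u] & neighbours[v]:
                if w <= v:
                    continue
                pairs = ((u, v), (u, w), (v, w))
                positives = sum(1 for p in pairs if signed_edges[frozenset(p)] > 0)
                counts[f"T{positives}"] += 1
    return counts
```

Running the same function on copies of the graph with shuffled signs would give the random baseline against which the observed counts are compared.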

In order to rule out that the same sign distribution could have been produced at random from the same background distribution of positives and negatives, we compare the triad counts in the signed graph above with those obtained after shuffling the signs [14]. For statistical reliability, the random shuffling was repeated 10 times and the final results use the mean values. The further away the quantities observed in the real signed graph are from the random ones, the more “surprise” there is and the lower the likelihood of the predictions occurring due to random chance. Here, surprise is defined as the number of standard deviations by which the observed number of Triad i differs from that of the randomly shuffled network with the same proportion of positive and negative signs.

The precise formula (taken from [14]) used for calculating the level of surprise \(s(T_{i})\) for the observed number of Triad i is given in Equation (1).

$$ s(T_{i}) = \frac{T_{i} - \mathbb{E}[T_{i}]}{\sqrt{\Delta p_{0}(T_{i})(1 - p_{0}(T_{i}))}} $$
(1)

Here, Δ is the total number of triads in the dataset, \(p_{0}(T_{i})\) is the fraction of \(T_{i}\) triads to be expected in the network given a random distribution of signs, and \(\mathbb{E}[T_{i}]\) is the expected number of triads \(T_{i}\) in the randomly shuffled model. \(s(T_{i})\) effectively measures the number of standard deviations by which the actual quantity of \(T_{i}\) triads differs from the expected number under the randomly shuffled model. The denominator in Equation (1) corresponds to the standard deviation of a binomial distribution where the success probability is \(p_{0}(T_{i})\) and the number of trials is Δ.
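Equation (1) translates directly into code. The sketch below is illustrative only; both function names are hypothetical. The helper for \(p_{0}(T_{i})\) assumes that signs are assigned independently at random with a fixed fraction of positive edges, which is one common way to obtain these probabilities and is not necessarily how they were computed for the results reported here.

```python
import math


def random_triad_probabilities(p_positive):
    """p0(T_i) for each triad type, assuming independent random signs.

    p_positive: fraction of positive edges in the network.
    """
    p, q = p_positive, 1.0 - p_positive
    return {"T3": p ** 3, "T2": 3 * p * p * q, "T1": 3 * p * q * q, "T0": q ** 3}


def surprise(observed, expected, total_triads, p0):
    """Surprise s(T_i) as in Equation (1).

    observed: number of T_i triads in the real signed graph.
    expected: E[T_i], e.g. the mean count over the shuffled graphs.
    total_triads: the total number of triads (Delta).
    p0: expected fraction of T_i triads under random signs.
    """
    variance = total_triads * p0 * (1.0 - p0)
    if variance <= 0:
        raise ValueError("surprise is undefined when the binomial variance is zero")
    return (observed - expected) / math.sqrt(variance)
```

For example, with 100 triads, \(p_{0} = 0.5\), an expected count of 50 and an observed count of 60, the standard deviation is 5 and the surprise is 2.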

3.4 Computation of Signed Ego Networks

The computation of the Ego Networks is achieved by first computing the frequency of interaction between each Ego-Alter pair and then clustering the Alters based on these frequencies. This method is well-established and has previously been done using a variety of different clustering algorithms, including k-means [43], DBSCAN [44] and MeanShift [45]. MeanShift is used for this paper as it is one of the most commonly used algorithms and it also automatically finds the optimum number of clusters (corresponding to the number of circles in the Ego Network, into which the Alters are organised). The signs of the Ego Network relationships are computed separately, in the manner previously described. These signs are then matched to each Ego-Alter relationship in the Ego Networks, resulting in Signed Ego Networks.
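The step from clusters to concentric circles can be illustrated as below. This sketch covers only the post-clustering bookkeeping and assumes that cluster labels have already been produced by some clustering algorithm (such as MeanShift); the function name and input format are hypothetical, and ordering clusters by mean interaction frequency is an assumption made for illustration.

```python
def concentric_circles(frequencies, labels):
    """Turn clustered Alters into cumulative circles, innermost first.

    frequencies: dict mapping each Alter to its interaction frequency with the Ego.
    labels: dict mapping each Alter to a cluster id from any clustering algorithm.
    """
    # Group the Alters by their cluster id.
    clusters = {}
    for alter, label in labels.items():
        clusters.setdefault(label, []).append(alter)

    # Assumption: the cluster with the highest mean frequency is the innermost one.
    def mean_frequency(members):
        return sum(frequencies[a] for a in members) / len(members)

    ordered = sorted(clusters.values(), key=mean_frequency, reverse=True)

    # Each circle contains all Alters of the circles inside it.
    circles, seen = [], set()
    for members in ordered:
        seen |= set(members)
        circles.append(set(seen))
    return circles
```

The size of the last circle returned is then the size of the active Ego Network, and attaching the relationship signs from Sect. 3.1 to each Alter yields the signed version.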

3.5 Negativity metrics

Given the obvious differences in the effects that positive and negative interactions can have on a relationship, an additional investigation was conducted to examine whether interactions and relationships of differing sentiments demand different amounts of cognitive effort. Given that negative information is generally harder and more time-consuming for humans to process [46], one would expect negative relationships to be more cognitively demanding than positive ones. Therefore, the hypothesis we tested is that greater numbers of negative relationships are associated with smaller active Ego Networks. For this analysis, the mean active Ego Network sizes of users with an optimum number of circles equal to 5 were compared. Note that it is standard practice in ENM research [32, 47, 48] to focus analyses on nodes with 5 circles. This choice is justified by its frequent occurrence as the optimum number of circles, ensuring a robust sample size for statistical reliability, as seen in related studies. This was also done for this paper because the analysis of the optimal number of layers across the users in the datasets (see Fig. 8 in Sect. 5.5) shows that the 5-circle case is the most common. The users’ levels of negativity were measured using 3 different metrics. Before introducing their formal definitions, let us denote with \(\mathcal{A}_{i}\) the set of Alters in the active Ego Network of Ego i. Considering the signs of the relationships with the Alters, we can also split \(\mathcal{A}_{i}\) into \(\mathcal{A}_{i}^{+}\) and \(\mathcal{A}_{i}^{-}\), for Alters whose relationship with the Ego i is positive and negative, respectively. Further, we denote with \(n_{i,j}^{+}\) and \(n_{i,j}^{-}\) the number of positive and negative interactions between Ego i and Alter j. We denote their sum as \(n_{i,j}\). Leveraging this notation, the first negativity metric \(l_{1}\) corresponds to the proportion of negative relationships, i.e. the number of negative relationships that each Ego had, divided by their total number of relationships:

$$ l_{1}(i) = \frac{ \vert \mathcal{A}_{i}^{-} \vert }{ \vert \mathcal{A}_{i} \vert }. $$
(2)

The second negativity metric measures the proportion of negative interactions, even if they belong to positive relationships, i.e. the number of negative interactions for each Ego divided by their total number of interactions:

$$ l_{2}(i) = \frac{\sum_{j \in \mathcal{A}_{i}} n_{i,j}^{-}}{\sum_{j \in \mathcal{A}_{i}} n_{i,j}}. $$
(3)

Finally, the third negativity metric follows the proportion of interactions that belong to negative relationships, even if the interaction itself is positive, i.e. the number of each Ego’s interactions that correspond to a negative relationship divided by their total number of interactions:

$$ l_{3}(i) = \frac{\sum_{j \in \mathcal{A}_{i}^{-}} n_{i,j}}{\sum_{j \in \mathcal{A}_{i}} n_{i,j}}. $$
(4)

When compared against the Ego Network size, the first of these metrics directly investigates the cognitive effects of maintaining negative relationships regardless of how often we interact with said negative contacts. The latter two metrics take a more fine-grained look at the role of interactions. Indeed, the second metric gauges whether negative interactions, rather than relationships, have a different impact on cognitive effort, even if the negative interaction is with someone we have a positive relationship with. The third metric checks whether interacting with negative relationships elicits a different level of cognitive effort, even if some of the interactions are positive.
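The three metrics in Equations (2) to (4) can be computed for a single Ego as in the sketch below. The input layout (one tuple per Alter holding the positive count, the negative count and the relationship sign) and the function name are assumptions made for illustration.

```python
def negativity_metrics(alters):
    """Compute (l_1, l_2, l_3) for one Ego.

    alters is a list of (n_pos, n_neg, is_negative) tuples, one per Alter j
    in the active Ego Network: n_pos and n_neg are the positive and negative
    interaction counts, is_negative is True when the relationship is negative.
    """
    total_relationships = len(alters)
    negative_relationships = sum(1 for _, _, neg in alters if neg)
    total_interactions = sum(p + n for p, n, _ in alters)
    negative_interactions = sum(n for _, n, _ in alters)
    interactions_in_negative = sum(p + n for p, n, neg in alters if neg)

    l1 = negative_relationships / total_relationships      # Equation (2)
    l2 = negative_interactions / total_interactions        # Equation (3)
    l3 = interactions_in_negative / total_interactions     # Equation (4)
    return l1, l2, l3

# Toy Ego with three Alters, one of which is a negative relationship.
l1, l2, l3 = negativity_metrics([(9, 1, False), (2, 3, True), (5, 0, False)])
```

For this toy Ego, one of three relationships is negative, 4 of 20 interactions are negative, and 5 of 20 interactions belong to the negative relationship, which shows how the three metrics can diverge for the same user.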

The values of the metrics are defined between 0 and 1 (inclusive). For each of the 3 negativity metrics, the Egos in each dataset were grouped into bins based on their negativity values, such that all the bins of a given dataset contain similar numbers of Egos, although this does mean that the bin boundaries change between datasets and metrics. The Egos’ negativities were then compared to the sizes of their Ego Networks (the results are discussed in Sect. 5.7).
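One way to obtain bins with similar numbers of Egos is equal-frequency binning, sketched below. This is an assumed reading of the binning step rather than a description of the exact procedure; the number of bins and the function name are hypothetical.

```python
def equal_frequency_bins(values, n_bins):
    """Split indices of `values` into n_bins groups of near-equal size.

    Egos are ranked by negativity and assigned to bins by rank, so the bin
    boundaries depend on the data rather than being fixed in advance.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [[] for _ in range(n_bins)]
    for rank, i in enumerate(order):
        bins[rank * n_bins // len(values)].append(i)
    return bins

# Toy negativity values for six Egos, split into three bins.
toy_negativity = [0.1, 0.9, 0.5, 0.3, 0.7, 0.2]
toy_bins = equal_frequency_bins(toy_negativity, n_bins=3)
```

Because the assignment is rank-based, each bin receives two Egos here regardless of how unevenly the negativity values are spread over the interval from 0 to 1.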

4 Datasets

All of the data used in this paper were collected from Twitter using the official Twitter Developer API. Twitter has long been a reliable source of Ego Network data due to its vast and active userbase as well as providing mostly public data. At the time of collection, the standard Twitter API allowed the most recent 3200 public Tweets created by a given user to be collected. These Tweets are referred to collectively as the user’s Timeline. Although this may not correspond to all the Tweets a user has created, it has been shown to be a sufficient quantity of information to generate meaningful Ego Networks (e.g. [7, 31, 49]).

In total, 9 datasets were used. These were collected from previous works and represent a mixture of specialised users, who use Twitter mainly for professional reasons, and generic users, who use the platform primarily for social reasons. The distinction between these two types of users is important as they have been observed to exhibit differing behaviours in certain online contexts [18]. Information describing these datasets in terms of the numbers of Egos, Alters, relationships and interactions they contain can be seen in Table 1 and Table 2, the former containing all collected users and the latter containing only the users that remained after the preprocessing steps detailed in Sect. 4.3.

Table 1 Number of Egos, Alters, relationships and interactions in the full Ego Networks, before removing unengaged users (as described in Sect. 4.3)
Table 2 Number of Egos, Alters, relationships and interactions in the active networks of each dataset, after removing unengaged users (as described in Sect. 4.3)

4.1 Specialised users

Journalists

The first set of specialised users contains data from journalists. This set consists of 3 datasets that were originally collected during a previous study, which observed the Ego Networks of journalists from 17 different countries across the globe [18]. Unfortunately, many of these datasets contained, either entirely or in large part, non-English Tweets. The sentiment analysis of non-English tweets would introduce an additional level of complexity (since the vast majority of tools are trained and optimised for the English language) without contributing to the scope of the paper. Therefore, only data from anglophone countries were included in the present study; specifically: the United States of America, Australia and the United Kingdom. The American and Australian datasets were collected in May 2018 and the British dataset was collected in January 2018, using existing lists of Twitter journalists (validated in [47]).

In addition to these, another set of journalist data was taken from a different study [50]. This dataset was collected from a list of New York Times journalists, created by the New York Times itself. All the users from this list were downloaded in February 2018. This dataset will be referred to as NYT Journalists.

Science Writers

The next dataset of specialised users contains science writers. Again, these are users who use Twitter for professional means, albeit to a potentially different extent compared to journalists. This dataset was collected using a curated list of science writers, created by a writer at Scientific American, Jennifer Frazer. Its Timelines were gathered in June 2018, as part of a previous study [50].

British Members of Parliament (MPs)

The final specialised dataset was collected during the preliminary investigation of SENMs [15]. This one includes the Timelines of members of the British Parliament, taken from a publicly available list provided by UKinbound [51]. These Timelines were collected in March 2022. At the time of collection of this dataset, Twitter allowed academics to retrieve full user Timelines (i.e. not just the first 3200 Tweets). However, for the sake of comparison with previous work, we limited our analysis to include only the first 3200 Tweets for each user.

4.2 Generic users

Monday Motivation

The first generic dataset consisted of users who tweeted in English using the hashtag #MondayMotivation on 16th January 2020. The Timelines of these users were then collected in January 2020, during a previous study [50].

UK Users

The second generic dataset came from a random sample of all users who tweeted in English from the United Kingdom on February 11th 2020. These users’ Timelines were collected in February 2020, as part of a previous study [50].

Snowball

The final dataset, taken from a cross-cultural analysis of SENMs [16] (in which it was referred to as Baseline), consists of a collection of interconnected Ego Networks, collected using a snowball sampling methodology. Specifically, an initial set of 31 interconnected seed users were selected, pseudorandomly to ensure a degree of interconnectivity between the seeds, from another preexisting dataset, which itself was collected using a snowball sampling starting from Barack Obama [52]. The Timelines of these users were then collected, followed by those of their Alters and then of their Alters’ Alters. This means that Egos have common Alters and can be themselves Alters for other Egos, which is an important distinction as it is a requirement for carrying out Structural Balance analysis (see Sect. 3.3). The Timelines for the Snowball dataset were collected between April and May 2022. As with the British MPs dataset, the full Timelines of each user were accessible at the time of collection. However, they were limited to 3200 Tweets per user during our analyses to ensure comparability with the other datasets.

4.3 Preprocessing

The first step of preprocessing was required to remove any undesired types of users from the data, namely by filtering out any user accounts that are not owned by individual humans. This is an important consideration as, for example, bots and other types of automated accounts will not have any cognitive constraints. As the specialised user datasets were gathered from verified lists of Twitter users, this step was only necessary for the generic datasets: Monday Motivation, UK Users and Snowball. A Support Vector Machine (SVM) [53] was trained on a set of 500 Twitter users that were manually classified as either “people” or “other”. This classifier and the training set are established in ENM research [52] and an accuracy of 81.3% was achieved using k-fold cross-validation (with \(k= 5\)). Any user accounts that were labelled as “other” by the SVM were removed by the original authors of each dataset.

Next, before conducting any analyses on the ENMs, it was necessary to filter out inactive and irregular users for all the datasets. This is because such users are unlikely to be engaged enough with Twitter to have fully developed Ego Networks on the platform. For this, Egos were removed if their timeline consisted of fewer than 2000 Tweets total, spanned a period of fewer than 6 months (from the first to the last Tweet in their Timeline) or if they tweeted less than once every 3 days for more than 50% of the months that they were active. The main rationale behind these choices is to keep only Twitter users that are active and engage regularly with Twitter. These filtration parameters are in line with those of previous work on Ego Networks [18, 49], to which we refer for further details.
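The three filtering criteria can be sketched as a single predicate over a user's Tweet timestamps. Several details are assumptions of this sketch rather than statements about the original pipeline: a month is taken to be a calendar month, every calendar month between the first and last Tweet counts as a month of activity, 6 months is approximated as 180 days, and the name `is_engaged` is hypothetical.

```python
import calendar
from collections import Counter
from datetime import datetime, timedelta

def is_engaged(timestamps, min_tweets=2000, min_span_days=180):
    """Return True if a Timeline passes the three activity filters."""
    if len(timestamps) < min_tweets:
        return False
    first, last = min(timestamps), max(timestamps)
    if (last - first).days < min_span_days:  # assumption: 6 months ~ 180 days
        return False
    per_month = Counter((t.year, t.month) for t in timestamps)
    # Enumerate every calendar month from the first to the last Tweet.
    months = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        months.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    # A month is sparse if it holds fewer than one Tweet per 3 days.
    sparse = sum(
        1 for (y, m) in months
        if per_month[(y, m)] < calendar.monthrange(y, m)[1] / 3
    )
    return sparse <= 0.5 * len(months)

# Toy Timelines: six Tweets a day for a year, and a two-day burst.
start = datetime(2020, 1, 1)
daily = [start + timedelta(days=d, hours=h) for d in range(365) for h in range(6)]
burst = [start + timedelta(minutes=i) for i in range(2500)]
```

With these toy Timelines, `is_engaged(daily)` passes all three checks, whereas `is_engaged(burst)` fails on the time span even though it contains more than 2000 Tweets.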

5 Results

In this section, we report our experimental findings. First, we conduct 2 tests: one to investigate the impact of the choice of sentiment analysis model on the interaction and relationship labels (Sect. 5.1) and one to support the validity of said labels (Sect. 5.2). Next, we investigate the properties of the Signed Ego Networks of the 9 selected datasets extracted according to the methodology discussed in Sect. 3.4. Recalling from Sect. 2.1 that an Ego Network is composed of an active and inactive part, we study how negative relationships are distributed in the full vs active network in Sect. 5.3. Then, in Sect. 5.4, we discuss the differences between specialised and generic users and, in Sects. 5.5 and 5.6, analyse the circle structure of the Ego Networks and how positive and negative relationships are distributed across the social circles. Finally, in Sect. 5.7 we investigate the effects of negativity on cognitive effort by observing the correlations between users’ Ego Network sizes and their level of negativity, using the 3 negativity metrics defined in Sect. 3.5.

5.1 Sensitivity of signing method to sentiment classifier

In Sect. 3.1, we have introduced our method for signing social relationships from unsigned social network data. It comprises two steps: labelling of individual interactions (using a state-of-the-art sentiment classifier) and labelling of relationships applying the psychology-grounded golden interaction ratio. Here, we investigate the sensitivity of the proposed relationship signing method to the choice of sentiment classifier, selected among the ones discussed in Sect. 3.2. The Snowball dataset was chosen as the focus of this comparison as it is the largest dataset in this paper; it is also the only dataset that can be used for the Triad Analysis in the next section.
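The second step, signing a relationship from its labelled interactions, can be sketched as a simple threshold rule. This is a minimal sketch under stated assumptions: the threshold is taken as 1/6 (one negative per five positive interactions, reported as 0.17 later in this section), only positive and negative interactions are counted, a fraction exactly at the threshold is treated as positive, and the function names are hypothetical.

```python
GOLDEN_NEGATIVE_FRACTION = 1 / 6  # assumption: 1:5 ratio expressed as a fraction
MIN_INTERACTIONS = 6              # fewest interactions for a reliable sign

def sign_relationship(n_positive, n_negative):
    """Return 'negative', 'positive', or None when there are no interactions."""
    total = n_positive + n_negative
    if total == 0:
        return None
    if n_negative / total > GOLDEN_NEGATIVE_FRACTION:
        return "negative"
    return "positive"

def is_reliable(n_positive, n_negative):
    """True when the relationship has enough interactions to apply the ratio."""
    return n_positive + n_negative >= MIN_INTERACTIONS
```

For example, a relationship with 5 positive and 1 negative interactions sits exactly at the ratio and is signed positive here, while 4 positive and 2 negative interactions yield a negative sign.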

We first compare the sentiment classifiers on the task of labelling single interactions. For the interaction labels (Fig. 3), the models show a fair degree of variability, with around 30 to 45% for positive, 35 to 50% for neutral and 20 to 30% for negative. However, when looking at the relationship labels (Fig. 4), there is a very tight percentage range for 3 of the models (VADER, BERTweet and BERT-C): between 60.71% and 63.53% positive (39.29% and 36.47% negative). By contrast, XLM-T, while still not far from the others (Footnote 3), leans towards almost equal numbers of positive and negative relationships (52.48% positive to 47.52% negative).

Figure 3

Percentages of positive, neutral and negative interaction labels estimated by each model (95% confidence intervals)

Figure 4

Percentages of positive and negative relationship labels estimated by each model (95% confidence intervals)

Overall, these observations suggest that even though the models may have significant variations in their predicted labels for interactions, these differences shrink when it comes to labelling relationships. As we verify at the end of this section, given the use of a threshold for signing relationships, this finding is due to the models disagreeing on interactions in relationships that are either very positive or very negative (i.e. where the signs of a few interactions could change without changing the sign of the relationship). Thus, the golden interaction threshold approach of signing relationships appears to achieve very similar results with three of the models used for signing the individual interactions and reasonably close results for the fourth. Effectively, this robustness is due to the threshold-based nature of the relationship signing method, which can tolerate a certain degree of disagreement.

Note, as an additional remark, that the percentages in Fig. 4 are more negative than the aforementioned observations of previous research (between 15.0% and 22.6% negative [14]). However, as mentioned in Sect. 2.2, those results were observed in networks with publicly visible signed links, meaning that the number of negative links could have been suppressed due to the effects of Social Capital [23]. Thus, it is expected that datasets without explicit signs that are disclosed to the users (as is the case for all datasets used in this paper) would be more negative than these previous findings.

Next, the level of agreement between each pair of the models was calculated using the proportion of predicted labels that matched exactly: i.e. the likelihood of the models agreeing on a randomly selected interaction or relationship. This was done to verify that the models are not just displaying similar amounts of negative relationships but are actually agreeing on the signs of specific interactions and relationships. Matrices displaying these proportions for the individual interaction labels and the relationship labels can be seen in Table 3 and Table 4, respectively. For the relationships, only those with 6 or more interactions are included as this is the minimum number required for the relationship signs to be considered reliable. This is due to the 1:5 golden interaction ratio used for signing the relationships (see Sect. 3.1). For interactions, the models display somewhat high levels of agreement, ranging from 0.56 to 0.73. Moreover, when looking at relationships, the models tend to agree much more: between 0.66 and 0.84. While the models’ strong agreement does not explicitly give an indication of their performance for the task of signing relationships, it does further illustrate that the relationship labels obtained are reasonably independent of the model. Thus, the method of signing relationships proposed in this paper can work irrespective of the choice of model used to analyse the sentiments of individual interactions.
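The pairwise agreement measure amounts to the fraction of shared items on which two models assign the same label. A minimal sketch follows; the data layout (a dictionary of labels per model) and the toy model names are assumptions made for illustration.

```python
from itertools import combinations

def pairwise_agreement(labels_by_model):
    """Proportion of exactly matching labels for each pair of models.

    labels_by_model maps model name -> {item id: label}, where an item is
    an interaction or a relationship. Only items labelled by both models
    are compared.
    """
    agreement = {}
    for a, b in combinations(sorted(labels_by_model), 2):
        labels_a, labels_b = labels_by_model[a], labels_by_model[b]
        shared = labels_a.keys() & labels_b.keys()
        matches = sum(labels_a[k] == labels_b[k] for k in shared)
        agreement[(a, b)] = matches / len(shared)
    return agreement

# Toy labels from three hypothetical models on four relationships.
toy_labels = {
    "A": {1: "pos", 2: "neg", 3: "pos", 4: "neg"},
    "B": {1: "pos", 2: "pos", 3: "pos", 4: "neg"},
    "C": {1: "neg", 2: "neg", 3: "pos", 4: "neg"},
}
toy_agreement = pairwise_agreement(toy_labels)
```

Applying the same function to the interaction labels and to the relationship labels (restricted to relationships with at least 6 interactions) would produce matrices of the kind reported in Table 3 and Table 4.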

Table 3 The proportions of interactions that each pair of sentiment analysis models agree upon
Table 4 The proportions of relationships that each pair of sentiment analysis models agree upon. Only relationships with at least 6 interactions are included

In order to gain a better understanding of the degree to which the models disagree with each other, we then investigated the percentage of negative interactions in the relationships that pairs of models disagreed on (i.e. the percentages that are used in combination with the golden interaction ratio to determine a relationship’s sign). Again, only relationships with at least 6 interactions are included. By plotting these negativity percentages for pairs of models, it is possible to visualise where the models are disagreeing, as in the example Fig. 5 (Footnote 4). However, given the fractional nature of these values, there are many points that overlap with one another. To combat this, and to gain a more precise, numerical perspective, we then look at where the quantiles of these disagreements are. Specifically, for each relationship marked as positive when using model X (meaning that the corresponding fraction of negative interactions is below 0.17) and as negative when using model Y (meaning that the fraction \(\gamma _{Y}\) of negative interactions is above 0.17), we compute the distribution of \(\gamma _{Y}\). If our hypothesis is correct, we expect \(\gamma _{Y}\) to be concentrated in the area close to 0.17. The exact values of the quantiles corresponding to the distribution of disagreements in the bottom-right area (Footnote 5) of the example Fig. 5 are displayed in Table 5, along with those of the other combinations of models. The associated figures can be found in Appendix A. These numbers show that the vast majority of disagreements are indeed happening in the area immediately above the 0.17 golden interaction ratio. This suggests that, even when the models do disagree, they usually do not disagree by very much. Even the model that disagrees the most strongly with the others, BERT-C, has its third quartiles, i.e. the values below which 75% of its disagreements fall, under or around 40%, which covers only approximately 30% of the disagreement range \((17, 100)\), since \((40 - 17)/(100 - 17) \approx 0.28\).
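The quantile computation described above can be sketched as follows. The function name and the toy fractions are hypothetical, the threshold is taken as 1/6 (the text rounds it to 0.17), and the quartile convention is that of Python's `statistics.quantiles`, which may differ slightly from the convention used for Table 5.

```python
import statistics

def disagreement_quartiles(gamma_x, gamma_y, threshold=1 / 6):
    """Quartiles of gamma_Y over relationships that X signs positive and Y negative.

    gamma_x and gamma_y hold, per relationship, the fraction of negative
    interactions according to model X and model Y respectively.
    Requires at least two disagreements (statistics.quantiles raises otherwise).
    """
    picked = [gy for gx, gy in zip(gamma_x, gamma_y) if gx <= threshold < gy]
    return statistics.quantiles(picked, n=4)

# Toy fractions for six relationships under two hypothetical models.
toy_gamma_x = [0.10, 0.05, 0.30, 0.15, 0.12, 0.16]
toy_gamma_y = [0.20, 0.25, 0.50, 0.30, 0.10, 0.50]
toy_quartiles = disagreement_quartiles(toy_gamma_x, toy_gamma_y)
```

If the disagreements cluster just above the threshold, as hypothesised, all three quartiles will sit close to 0.17 rather than being spread over the whole range up to 1.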

Figure 5

Example disagreement scatter plot. Each point corresponds to a relationship where two target models (here, VADER and BERTweet) disagree. The x-coordinate of the point corresponds to the percentage of negative interactions in the relationship according to VADER, and the y-coordinate to the percentage of negative interactions in the relationship according to BERTweet. Only relationships with at least 6 interactions are included

Table 5 Disagreement quantiles. The model giving a positive label is on the top and the model giving a negative label is on the left

5.2 Validation via triad analysis

The results in the previous section have shown that the signing method is sufficiently robust to the choice of classifier but they do not tell us anything about the soundness of the obtained signs. In order to validate the assigned signs, we leverage triad analysis as discussed in Sect. 3.3. Recall that there are four types of triads (as illustrated in Fig. 2), depending on the number i of positive edges in them, with \(T_{i}\) denoting triads with i positive edges. As a triad requires interconnected users, most of the datasets included in this work are unsuitable for this analysis, as they contain data from a series of largely disconnected users. The one exception to this is the Snowball dataset, which, due to its snowball collection methodology, contains interconnected users. Therefore, the analysis of the signed triads was only conducted for the Snowball dataset. Fortunately, this is the largest dataset included in this study and is therefore the most likely to produce reliable results.

Four sets of signed triads were obtained, one from each of the four sentiment analysis classifiers. These were then compared against the signed triads extracted from their corresponding null models where the signs are randomly shuffled, as explained in Sect. 3.3. The triad counts and proportions, as well as the mean expectations and surprise levels (calculated using Equation (1)), can be seen in Table 6. The main focus for this analysis is the surprise (rightmost column), which indicates the number of standard deviations by which the predicted number of each triad differs from that of the randomly shuffled version. According to the weaker version of Structural Balance Theory, \(T_{3}\) should be overrepresented and \(T_{2}\) should be underrepresented, and this is indeed the case for all 4 of the models. This qualitatively confirms that the patterns of the extracted signs are compatible with what is observed in explicitly signed human social networks. Additionally, the surprisingly abundant \(T_{0}\) provides an initial glimpse at the higher prevalence of negative relationships on Twitter, which we explore further in the subsequent sections.

Table 6 Results of the triad analysis, with the counts and proportions of the observed triads from each model, along with the expected proportions (for a random distribution of signs) and the level of surprise (as described in Sect. 3.3)

Before moving on, it is important to note that, quantitatively, this triad analysis does not provide a means of comparison between the models. In other words, the magnitude of the surprise in the expected direction (e.g., \(T_{3}\) being overrepresented) is not a measure of how good the model is (because there is no such numerical notion of “correct amount of surprise”).

In the interest of time, all subsequent analyses were conducted using only the signs of a single model. As all the models met the expectations of Structural Balance Theory, they are all equally appropriate. However, given that VADER is well-established and known to annotate individual Tweets more accurately than individual humans [35], this was the model that was selected.

5.3 Negative relationships in full and active networks

We now investigate how the signs are distributed inside the Ego Networks. The percentages of negative relationships in the full and active Ego Networks were compared for each of the 9 datasets. Recall from Sect. 2.1 that the active Ego Network is defined as the set of Alters with whom the Ego engages meaningfully (at least one interaction a year, as per the anthropological definition). These percentages are displayed in Fig. 6.
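The active-network criterion can be illustrated with a short filter. This is a sketch under assumptions: the contact frequency is taken as the number of interactions divided by the number of years over which the relationship was observed, and the function name and input layout are hypothetical.

```python
def active_alters(interaction_counts, years_observed, min_per_year=1.0):
    """Return the Alters contacted at least `min_per_year` times a year.

    interaction_counts maps Alter -> number of interactions with the Ego.
    years_observed maps Alter -> duration of the relationship in years
    (how this duration is measured is an assumption of the sketch).
    """
    return {
        alter
        for alter, count in interaction_counts.items()
        if count / years_observed[alter] >= min_per_year
    }

# Toy Ego: "b" is contacted once in three years and so falls outside.
toy_active = active_alters(
    {"a": 10, "b": 1, "c": 3},
    {"a": 2.0, "b": 3.0, "c": 3.0},
)
```

The full Ego Network keeps every Alter, whereas the active network keeps only the subset returned here, which is why the two can show different proportions of negative relationships.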

Figure 6

Percentages of relationships that are negative for the full and active networks of each dataset (95% confidence intervals)

For the full networks, the datasets display levels of negativity within and slightly above the previously observed range of 15.0% to 22.6% [14] (mentioned in Sect. 2.2). Specifically, the full Ego Network negativities all fall between 16.45% (Monday Motivation) and 31.58% (NYT Journalists). Given that the signs of the links in the current datasets are not explicitly visible to the users, and that, therefore, social pressure towards having positive links will likely be reduced, these observations are very much in line with a priori expectations.

By contrast, the active networks show significantly higher, albeit more varied, levels of negativity, between 21.83% (Monday Motivation) and 54.89% (NYT Journalists). This increase in negativity from the full to active networks suggests that individuals have proportionally greater numbers of negative relationships amongst close contacts with whom they engage frequently than amongst acquaintances. Messages containing or eliciting negative emotions have previously been shown to elicit stronger responses [46] and to spread faster [54] than positive ones. Therefore, one explanation for the higher negativities of the active networks could be that, because the users of the active networks are communicating more frequently, any negative content that enters a user’s Ego Network is more likely to be dispersed along the more active connections. The connections of the active networks may thus display higher negativities because they have an elevated risk of being exposed to and spreading negativity. In other words, the more engaged an individual is, the greater the likelihood, seemingly, that their relationships are negative.

In addition, although the increase in negativity from full to active network is most pronounced for the journalist datasets and science writers, this change is observable for all 9 of the included datasets. Therefore, rather than being a unique feature of any specific community, it appears that increased negativity is an inevitable byproduct of engaging with Twitter. Investigating whether this phenomenon is observable for other social platforms, as well as how the effects differ, could be an interesting avenue for future research.

5.4 Negative relationships of specialised and generic users

After observing the full and active networks, the negativities of the specialised and generic users were compared. As can be seen in Fig. 6, most of the specialised users display higher percentages of negative relationships, compared to the generic users. However, this difference is fairly small for the full networks, with Snowball and the generic UK Users dataset actually containing more negative relationships (24.05% and 24.22% respectively) than the British MPs (19.24%) and nearly as much as the Science Writers (25.62%). By comparison, the difference for the active networks is much starker. With the only exception of the British MPs (whose change in negativity better matches those of the generic datasets), the least negative specialised dataset, Science Writers (45.23%), was nearly 5 percentage points more negative than the most negative generic dataset, Snowball (40.31%).

The greater negativities of specialised users also support the hypothesis that more engaged users are more likely to have a greater number of negative relationships, mentioned in the previous subsection.

5.5 Circle-by-circle analysis of the ENM

As previously mentioned, the ENM is concentric, meaning that each of its circles contains all the Alters of the circles that come before it. In this section, we briefly analyse the circle sizes and the scaling ratios in the ENMs of our datasets, before proceeding with the SENM discussion (Footnote 6). It is important to note that the size of Ego Networks tends to vary slightly from Ego to Ego due to various social differences between individuals, as can be seen in Fig. 7. Because of these common variations, and in order to standardise the results of any analysis performed on the circles, it is standard practice to focus on Egos who have a common number of circles [18, 47]. Usually, the chosen number of circles is 5 as it is the most common number for OSN data [32] and, as can be seen in Fig. 8, 5 is the closest whole number to the mean for all except 2 of the datasets, the exceptions being NYT Journalists and British MPs (with mean circle numbers of 5.53 and 6.00 respectively). Further, the mode of all of the datasets is 5, except for NYT Journalists and British MPs (which were both 6), so there is, indeed, a concentration of values around 5. Therefore, only Egos with 5 circles were considered for the subsequent circle-by-circle analyses.

Figure 7

Mean active Ego Network sizes of users with 5 circles in each dataset (95% confidence intervals)

Figure 8

Mean number of circles for each dataset (95% confidence intervals)

The first part of the circle-by-circle analysis is to examine the mean sizes of the circles. As expected from previous studies, the sizes are close to those of Dunbar’s expected values, i.e. 1.5, 5, 15, 50, 150 (with the typical scaling factor of roughly 3) [7]. The exact numbers can be seen in Table 7 (along with the remaining number of Egos after considering only those with 5 circles for each dataset). Note that the difference between the numbers in column “Circle 5” of Table 7 and the values displayed in Fig. 7 is due to the fact that, in the former case, we only consider Egos with five circles, while all Egos are included in the latter. Some of the datasets (such as NYT Journalists, UK Users and Monday Motivation) become somewhat distant from the expected numbers in the outermost circle. However, this has also been observed in previous research [18, 31]. Moreover, the scaling ratio of roughly 3 is clearly visible in Table 8, with 3 being the closest whole number to every single one of the ratios between subsequent circles as well as for the overall means of each dataset.

Table 7 Mean circle sizes and number of Egos with 5 circles
Table 8 Scaling ratios between circle sizes
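As a worked illustration of how scaling ratios such as those in Table 8 are obtained, the sketch below divides each circle size by the size of the circle before it. It uses the expected sizes 1.5, 5, 15, 50 and 150 rather than the values from Table 7, and the function name is hypothetical.

```python
def scaling_ratios(circle_sizes):
    """Ratio of each circle's size to that of the circle immediately inside it."""
    return [outer / inner for inner, outer in zip(circle_sizes, circle_sizes[1:])]

# Expected circle sizes for OSN Ego Networks (innermost to outermost).
expected_sizes = [1.5, 5, 15, 50, 150]
ratios = scaling_ratios(expected_sizes)
mean_ratio = sum(ratios) / len(ratios)
```

For the expected sizes the ratios alternate between roughly 3.33 and 3, giving a mean of about 3.17, which is why a scaling factor of roughly 3 is treated as the reference value.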

Next, before considering the relationship signs at each level of the SENM, we gauge the appropriateness of the threshold method of signing relationships (described in Sect. 3.1) for our data. Indeed, given that psychological research has found the golden interaction ratio to be 1:5, we consider that a relationship requires a minimum of 6 interactions in order to be signed reliably. Therefore, we investigate the Egos’ mean numbers of interactions per Alter at each level of the ENM, in order to verify that we have enough data to properly apply the threshold. The results, summarised in Table 9, show that circles 1 to 4 have mean numbers of interactions that are equal to or greater than the required 6. Indeed, only the outermost circle tends to have numbers that are lower than necessary. This means that for circles 1 to 4, there is enough data, on average, to properly estimate the signs.

Table 9 Mean number of interactions per Alter at each level of the ENM

Beyond validating the application of the golden interaction ratio to relationships in circles 1 to 4, Table 9 also shows that journalists tend to interact about half as much per Alter compared to generic users. While this finding is initially counter-intuitive (given that journalists are generally considered to be more engaged with Twitter), a follow-up examination of the different types of interactions sent from the Egos revealed that this is actually in line with the findings of previous works. Essentially, specialised users, such as journalists, tend to generate more Mentions and Retweets, and fewer Replies, than generic users. Based on the conclusions of previous work [55], this suggests that specialised users generally spread their cognitive effort across slightly more distinct connections than generic users, while generic users tend to spend slightly more cognition on each individual relationship. This is supported by the slightly higher active ego network sizes of the specialised users in Table 7 (see Circle 5 column). While this investigation is important for properly understanding the results of Table 9, its findings are only tangentially related to the main focus of this paper. Consequently, the full details have been placed in Appendix B.

5.6 Circle-by-circle analysis of the SENM

Next, moving on to the analysis of the SENM, we observe the mean numbers and percentages of negative relationships for each circle, which can be seen in Table 10. The proportions of negative relationships are found to be disproportionately higher at the innermost circles of the ENM, especially for specialised users, and decrease steadily towards the outer layers. The negative percentages of all journalist datasets are above 61% at the innermost circle and below 55% at the outermost. This is very surprising, as the inner sections of the ENM should be associated with an individual’s most trusted and similar connections. Indeed, one of the four components of Granovetter’s definition of tie strength is reciprocal services [17], and reciprocity is thought to be very closely related to trust [56]. What makes these findings even more surprising is that the aforementioned effect of social capital, which creates a bias towards maintaining positive connections, would be expected to be strongest in the innermost circles, where individuals are expected to be the most tightly knit.

Table 10 Mean number and percentage of negative relationships at each level of the Signed Ego Network (for Egos with 5 circles). In bold, the most negative circle of each dataset

Despite these observed differences in the proportions of negative relationships across the circles, the numbers of negative relationships scale from one circle to the next with a fairly consistent ratio, similar to that of the circle sizes, as can be seen in Table 11. The mean value of this negativity ratio is marginally lower than that of the circle sizes, at roughly 2.8 (see the mean column), but it is still close to 3.
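For clarity, a scaling ratio of this kind can be sketched as the count in one circle divided by the count in the circle immediately inside it. The function name and the example counts below are hypothetical and chosen only to illustrate the arithmetic; they are not values from Table 10 or Table 11.

```python
def scaling_ratios(counts):
    """Ratios between consecutive circles, from innermost to outermost.

    `counts` holds the mean number of negative relationships in each circle,
    ordered from circle 1 outwards (hypothetical input layout).
    """
    if any(c <= 0 for c in counts):
        raise ValueError("counts must be strictly positive")
    return [outer / inner for inner, outer in zip(counts, counts[1:])]


# Made-up counts for illustration only.
toy_counts = [1.0, 3.0, 8.4, 24.0]
toy_ratios = scaling_ratios(toy_counts)
toy_mean_ratio = sum(toy_ratios) / len(toy_ratios)
```

Averaging the per-step ratios, as in `toy_mean_ratio`, is one simple way to obtain a single summary value comparable to the ratio of roughly 3 reported for circle sizes.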

Table 11 Scaling ratios of negative relationship counts between circles

In Table 10, we can compare the proportions of negative relationships between the different types of users. Once again, there appears to be a divide between specialised and generic users. This difference becomes even more noticeable when the journalists are compared to the non-journalists. Indeed, the variations in negativity across the circles appear to be much greater for journalists than for any of the other datasets. The most stable journalist dataset (British Journalists) drops by 13.56 percentage points from circle 1 to circle 5. By contrast, the biggest variation for the non-journalists is 8.08 percentage points (British MPs).

Again, these observations lend support to the notion that increased levels of engagement with Twitter lead to increased levels of negativity. Egos engage the most with their innermost circles, and this is where the strongest concentration of negative relationships is found. Moreover, the difference between the negativity at this innermost level and that of the outermost level is greatest for the most engaged category of users (journalists). In other words, the most negativity is found at the highest levels of engagement, and this is true at every level of the Ego Networks as well as between different types of users. This could also explain why the \(T_{0}\) triads in Sect. 5.2 were so prevalent.

5.7 Negativity metrics

As discussed in Sect. 3.5, a final analysis was carried out to investigate whether maintaining negative relationships is more cognitively demanding than maintaining positive ones. For this, 3 different metrics were computed: the proportion of negative relationships, the proportion of negative interactions and the cognitive effort spent on negative relationships (details in Sect. 3.5). These metrics were then compared, both statistically and graphically, to the sizes of the users’ active Ego Networks and to the users’ numbers of interactions.

For the statistical comparisons, Pearson’s R was used. Our hypothesis is that an increase in negativity may correspond to an increase in cognitive effort, and hence to smaller active Ego Networks and fewer interactions (the latter expectation is based on our observations in Sect. 5.5, which showed that specialised users, who show higher negativity levels, tend to have roughly half as many interactions per Alter as generic users). Thus, a 1-tailed analysis was employed. The results showed no significant correlations for any of the datasets for either the active Ego Network sizes (p>.523 for all cases) or the number of interactions (p>.531 for all cases). This suggests that, on average, negativity does not decrease the size of Ego Networks.
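To make the directional test concrete, the sketch below computes Pearson’s correlation coefficient and a 1-tailed p-value for the hypothesis of a negative association. Note that the p-value here comes from a permutation test, which we substitute for the usual parametric test so that the example needs only the standard library; it is not necessarily the procedure used for the results above. All function names, parameters and toy values are assumptions made for illustration.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    return sxy / sqrt(sxx * syy)


def one_tailed_negative_test(x, y, n_perm=2000, seed=0):
    """Return (r, p), where p estimates P(r_perm <= r_observed) under shuffling.

    A small p supports a negative association between x and y.
    """
    r_obs = pearson_r(x, y)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if pearson_r(x, y_perm) <= r_obs:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return r_obs, (hits + 1) / (n_perm + 1)


# Toy example: network size falls as negativity rises (made-up values).
toy_negativity = [0, 1, 2, 3, 4, 5, 6, 7]
toy_sizes = [8 - v for v in toy_negativity]
r_neg, p_neg = one_tailed_negative_test(toy_negativity, toy_sizes)
r_pos, p_pos = one_tailed_negative_test(toy_negativity, toy_negativity)
```

In the toy example, the decreasing series yields a small p-value, whereas the increasing series yields a p-value close to 1, which is the behaviour expected of a 1-tailed test in the negative direction.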

Next, binned boxplots were made to visualise the interplay between negativity and cognitive effort, for different classes of Ego negativity. We binned the Egos into quantiles with respect to the negativity metrics (as described in Sect. 3.5), and then analysed the distributions of the active Ego Network sizes and the number of interactions in each bin. The corresponding boxplots for the 2 largest datasets in terms of Egos, Snowball and Monday Motivation, can be seen in Figs. 9 and 10. The complete set of boxplots is available in Appendix C. For the majority of the datasets, the means, medians, boxes and whiskers of the boxplots are fairly flat across the bins (as expected given the non-significant correlations). However, the Snowball dataset shows a smaller active Ego Network for the first quantile and numbers of interactions that steadily decrease from the first to the fourth quantile. These 2 observations are seen for all 3 of the negativity metrics.
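The binning step can be sketched as follows, assuming 4 equally populated bins (in line with the first to fourth quantiles mentioned above) and a simple rank-based split. The function names, the tie handling and the toy values are assumptions made for illustration; the exact procedure is the one described in Sect. 3.5.

```python
from statistics import mean, median


def bin_by_quantile(metric, outcome, n_bins=4):
    """Group `outcome` values into bins defined by the rank of `metric`.

    Bin 0 holds the Egos with the lowest metric values, and the last bin
    those with the highest. Ties are broken by input order (an assumption).
    """
    if len(metric) != len(outcome):
        raise ValueError("metric and outcome must have the same length")
    order = sorted(range(len(metric)), key=lambda i: metric[i])
    bins = [[] for _ in range(n_bins)]
    for rank, i in enumerate(order):
        bins[rank * n_bins // len(order)].append(outcome[i])
    return bins


def summarise(bins):
    """Mean and median of each bin, i.e. two of the statistics shown in a boxplot."""
    return [(mean(b), median(b)) for b in bins]


# Made-up negativity scores and active network sizes for 8 Egos.
toy_metric = [0.1, 0.9, 0.5, 0.3, 0.7, 0.2, 0.8, 0.6]
toy_outcome = [10, 80, 50, 30, 70, 20, 90, 60]
toy_bins = bin_by_quantile(toy_metric, toy_outcome)
toy_summary = summarise(toy_bins)
```

Each inner list of `toy_bins` corresponds to one box in a binned boxplot, so flat means and medians across `toy_summary` would correspond to the fairly flat plots described above.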

Figure 9

Boxplots for active Ego Network size (left column) and number of interactions (right column) against the 3 negativity metrics (top, middle and bottom) for the Snowball dataset. For each group of binned Egos, the boxplots display mean (orange line), median (green triangle), first to third quartile (box), 1.5 times the interquartile range beyond the box (whiskers) and outliers (black circles)

Figure 10

Boxplots for active Ego Network size (left column) and number of interactions (right column) against the 3 negativity metrics (top, middle and bottom) for the Monday Motivation dataset. For each group of binned Egos, the boxplots display mean (orange line), median (green triangle), first to third quartile (box), 1.5 times the interquartile range beyond the box (whiskers) and outliers (black circles)

We followed up on these observations by conducting t-tests between pairs of bins for each dataset, with the null hypothesis being that there are no differences between them. This was done for both the Ego Network sizes and the number of interactions. The resulting p-values are displayed in Tables 12 and 13, respectively, and the t-scores are available in Tables 15 and 16 in Appendix D. The only dataset that displays consistently significant values is Snowball, which shows significant differences for all comparisons involving the first bin for the Ego Network sizes, and for all comparisons involving the last bin for the number of interactions. The Snowball results would suggest that users with many positive relationships are likely to have slightly smaller Ego Networks (i.e. fewer connections) and those with many negative relationships are likely to have more overall interactions. Given that the Snowball dataset is significantly larger than the others, this may be a relatively weak effect that is only statistically significant when observing a very large sample.
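The pairwise comparisons can be sketched as below. The sketch computes only t-scores, using Welch’s unequal-variance form; whether the reported tests used this form or Student’s pooled-variance form is not stated here, so this choice is an assumption. Obtaining p-values additionally requires the t-distribution, for instance via `scipy.stats.ttest_ind`. Function names and toy values are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-score for two samples (each needs at least 2 values)."""
    se_a = variance(a) / len(a)  # squared standard error of the mean of a
    se_b = variance(b) / len(b)
    return (mean(a) - mean(b)) / sqrt(se_a + se_b)


def pairwise_t_scores(bins):
    """t-score for every pair of bins, keyed by 1-based bin numbers."""
    return {
        (i + 1, j + 1): welch_t(bins[i], bins[j])
        for i, j in combinations(range(len(bins)), 2)
    }


# Made-up Ego Network sizes in three bins, for illustration only.
toy_bins = [[1, 2, 3], [2, 3, 4], [10, 11, 12]]
toy_scores = pairwise_t_scores(toy_bins)
```

A negative score for the pair `(1, 2)` indicates that the first bin has the smaller mean, which is the direction reported for the first bin of the Snowball dataset.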

Table 12 The p-values from the pairwise comparisons between bins for Ego network sizes and negativity. Statistically significant values (<0.05) are displayed in bold
Table 13 The p-values from the pairwise comparisons between bins for number of interactions and negativity. Statistically significant values (<0.05) are displayed in bold

These two results together would suggest that more positive users tend to have fewer connections and interact less frequently overall, but more intimately with the connections they do have (at least on the Twitter platform). While more negative users have more, yet less intimate, connections with whom they interact less frequently compared to the positive users, they still end up interacting the most overall. In other words, these results suggest that nurturing positive relationships in online social networks is more cognitively engaging, resulting in smaller Ego Networks for more positive users. However, while these results seem very promising, given some of the limitations of the negativity metrics analysis (i.e. the observations were only found to be significant for the Snowball dataset), it would be pertinent to further investigate the interplay between these effects.

6 Conclusion

The present study introduces a novel method for the inference of signs in unsigned networks, which leverages text-based communications between individual pairs of users. The proposed method is founded on solid theoretical underpinnings and enables the application of signed network techniques to non-signed networks in future research, even in situations where data about the global network topology is scarce or unavailable (hence, topology-based tools cannot be applied). The method was shown to be robust to the choice of the underlying sentiment classifier and to reproduce a sign distribution that matches the expectation of the well-known Structural Balance Theory. To demonstrate its effectiveness, this approach was used to generate signed relationships and Ego Networks across 9 distinct datasets. The resulting signed networks were then systematically examined and compared against their unsigned counterparts. This resulted in 4 main findings: (i) somewhat unexpectedly, percentages of negative relationships tend to be higher for active networks than for full networks and this is more pronounced for specialised users than for generic users; (ii) specialised users display a higher propensity towards having negative relationships than generic users; (iii) very surprisingly, negative relationships are found disproportionately more at the more intimate levels of the ENM; (iv) having and maintaining negative relationships appears to have a weak detrimental effect on the number of interactions an individual creates and a weak incremental effect on the distinct number of individuals one interacts with. On top of these core findings, a consolidated signed version of the ENM is also established, with a scaling ratio of negative relationships that decreases slightly from the inner circles to the outer circles for most types of users and which has an overall value that is slightly lower than that of the original model’s circle sizes (i.e. roughly 2.8).

The overall message is that OSNs, while generating structurally similar Ego Networks with respect to offline relationships (i.e. not mediated by social platforms), tend to drastically overemphasise negativity, leading to unexpectedly high percentages of negative relationships. On the other hand, our results also provide weak signs of a more positive use of online social platforms, as users who allocate more cognitive effort to individual relationships tend to enjoy more positive relationships than average.

These contributions open several avenues for further research. For instance, future work could investigate the observed effects in other OSNs such as Reddit or Mastodon, or examine the interplay between “positive connections that share negative content” and “actually negative relationships”. The latter would greatly increase our understanding of what it means to have and interact with negative relationships, as well as how sharing negative content online can affect the polarity of communications over time.

Figure 11

The Ego Network Model, with the names and expected sizes of each subgroup for social networks of humans

Figure 12

Percentages of positive, neutral and negative interaction labels estimated by each model (95% confidence intervals)

Figure 13

Percentages of positive and negative relationship labels estimated by each model (95% confidence intervals)

Figure 14

Percentages of relationships that are negative for the full and active networks of each dataset (95% confidence intervals)