Introduction

Eating disorders (ED) such as anorexia and bulimia are complex mental illnesses that can cause serious health consequences and have the highest mortality rate of any mental illness (American Psychiatric Association and others 2013; Arcelus et al. 2011). Despite such negative effects, there are many online pro-ED (often known as pro-anorexia and pro-ana) communities that actively promote ED as a legitimate lifestyle choice rather than a dangerous disease (Wilson et al. 2006; Overbeke 2008; Mulveen and Hepworth 2006). These pro-ED communities can negatively affect people with and without ED, through promoting unrealistic ideals of thinness, encouraging disordered eating behaviors, and sharing harmful tips on how to develop and maintain ED (known as “thinspiration” or “thinspo”) (Harper et al. 2008; Mabe et al. 2014; Overbeke 2008; Rodgers et al. 2016; Syed-Abdul et al. 2013). As a public concern, pro-ED communities draw widespread criticism, particularly by so-called pro-recovery communities that aim to raise awareness of ED and offer support for people to recover (Lyons et al. 2006; Oksanen et al. 2015; De Choudhury 2015; Chancellor et al. 2016; Wang et al. 2018). Under pressures from these pro-recovery communities and the general public, several social media platforms have adopted censorship-based interventions for pro-ED communities, e.g., banning pro-ED content and user accounts on TumblrFootnote 1 and InstagramFootnote 2 (Chanchellor et al. 2016, 2017; Casilli et al. 2013). However, the efficacy of these interventions is still uncertain. Concerns about this are heightened by recent findings that censoring pro-ED content leads to a wide spread of more harmful alternatives to such content (e.g., content sharing self-harm) (Chancellor et al. 2016; Chancellor et al. 2017), and banning pro-ED users makes these individuals more “invisible”, less reachable by health care providers and recovery-oriented information (Casilli et al. 2013). These findings highlight the importance of understanding how different types of information (not only pro-ED content but also ED-related content more generally) flow through an online ED community and how these information flows correlate with one another, before introducing interventions.

However, our understanding of information flows in online ED communities is limited, as prior studies in this field have often focused on content analysis and largely ignored interaction patterns. Examples of these analyses are examining the types of content shared in online ED communities (Juarascio et al. 2010; Borzekowski et al. 2010; Teufel et al. 2013; Sowles et al. 2018; Wick and Harriger 2018; Branley and Covey 2017), characterizing linguistic styles of individuals in online self-presentation (Lyons et al. 2006; De Choudhury 2015), identifying diagnostic information from language use in pro-ED content (Chancellor et al. 2016; Chancellor et al. 2016), detecting lexical variation of pro-ED content (Chancellor et al. 2016; Chancellor et al. 2017), and measuring people’s attitudes on pro-ED and pro-recovery information based users’ emotional expressions in online comments (Syed-Abdul et al. 2013; Oksanen et al. 2015). Yet, social interactions in information exchange and the resulting communication networks have been largely under-explored. As a result, little is known about the organizational structure of communication in online ED communities and the functional roles of individuals in communicating different types of information.

Although recent studies have turned attention from content analysis to network analysis (Yom-Tov et al. 2012; Arseniev-Koehler et al. 2016; Moessner et al. 2018; Tiggemann et al. 2018), they either focus solely on a single type of communication (e.g., sharing pro-ED content (Tiggemann et al. 2018)) or do not distinguish different types of information shared in online ED communities (Yom-Tov et al. 2012; Arseniev-Koehler et al. 2016; Moessner et al. 2018). It remains unclear how different types of communication correlate with one another in these communities. Insights into the correlations among different types of communication can facilitate predictions of an community’s responses to interventions. For example, if users’ activities in two types of communication have a highly positive correlation, blocking one type of communication is likely to promote the other type of communication.

In this work, we address these research gaps by using a multilayer network approach to systematically characterizing communication networks for a broad range of types of content in an online ED community. We analyze a large set of Twitter conversations (i.e., tweets with a “mention” or “reply”) among individuals who self-identified by having ED in their Twitter profile descriptions and their online friends, involving 2,206,919 tweets posted by 55,164 users over 7 years (from March 2009 and March 2016). Three major research questions guide our analysis: (i) what types of content are often discussed in an online ED community? (ii) how do different types of content flow through interpersonal communication networks? and (iii) whether and how do different types of communication correlate with one another? The main contributions of this work are as follows.

First, we demonstrate the use of unsupervised clustering methods to identify the types of content discussed in online ED communities. Unlike previous studies that assume a type of content with predetermined features (e.g., a set of keywords) (Syed-Abdul et al. 2013; Oksanen et al. 2015; De Choudhury 2015; Chancellor et al. 2016; Yom-Tov et al. 2012; Arseniev-Koehler et al. 2016; Tiggemann et al. 2018), our approach allows themes of content to emerge from the data, which can reduce bias due to predetermined assumptions and provide an overall view of the full range of topics discussed in an online ED community.

Second, we propose to represent types of communication in online ED communities by a multilayer network in which each layer is a network representing interactions among the same set of users in discussing a specific topic. Compared to traditional monolayer networks (Newman 2010), multilayer networks provide (i) a more natural representation of a communication system by capturing the multiplex nature of human interactions (Nicosia and Latora 2015; Kivelä et al. 2014; Boccaletti et al. 2014), and (ii) a more elegant and flexible way for incorporating multidimensional information. Based on this multilayer representation, we (i) characterize different types of communication by measuring structures of single-layer networks in the multilayer communication network, and (ii) examine interdependencies of different communication by measuring structural correlations of inter-layer networks.

Finally, we study dynamics of user communication and reveal underlying processes that lead to correlations of communication on different topics. By measuring structural changes and stability in a sequence of temporal, multilayer networks that are built based on users’ conversations over time, we find that (i) actors previously engaged in pro-recovery communication are likely to engage in pro-ED communication in the future and (ii) actors engaged in sharing pro-ED content have frequent entries into and exits from the corresponding communication network.

Data

Our dataset is collected from Twitter, a microblogging platform that allows millions of people to interact by exchanging short tweet messages. Whereas many social media sites restrict pro-ED content (Chancellor et al. 2016), Twitter has not yet enforced any restrictions (Arseniev-Koehler et al. 2016). This makes Twitter a unique platform to examine communication naturally happening within online ED communities in a non-reactive way. All data used in this study is publicly accessible information on Twitter; no personally identifiable information is used. Next, we provide details about the dataset we used.

Collecting user sample

We use a snowball sampling method (Wang et al. 2017) to gather data about individuals affected by ED on Twitter. We first search for ED-related tweets by a set of keywords (e.g., “eating disorder”, “anorexia” and “bulimia”) via the Twitter application programming interfaces (APIs). From authors of 1169 ED-related tweets, we identified 33 users who self-reported both ED-diagnosis information (e.g., “eating disorder”, “edprob” and “proana”) and personal bio-information (e.g., height and weight) in their Twitter profile descriptions. Starting with these seed users, we use a snowball sampling procedure through users’ who-follows-whom networks to expand the user set. This results in 3380 ED users who self-identified as disordered in their profile descriptions. Our data validations show that 95.2% of the ED users are likely to be affected by ED (i.e., a high precision, see (Wang et al. 2017) for more details). However, the above process does not ensure a high recall, as we miss users who did not disclose their disorders in Twitter profile descriptions.

To obtain a more representative sample of online ED communities, we further collect ED users’ Twitter friends (including followees and followers) who posted ED-related content in tweets. To this end, we first crawl all friends of each ED user on Twitter, yielding 208,065 users (including the 3380 ED users). For each user, we retrieve up to 3200 (the limit returned from the Twitter APIs) of their most recent tweets, resulting in 241,243,043 tweets. This collection process finished on March 2, 2016. Then, we search for users who posted an ED-related hashtag in their historical tweets (see Additional file 1: Section 1 for details), resulting in 41,456 ED-related users.

Tracking interpersonal conversations

The other task of our data collection is to track interpersonal communication of ED-related users. We focus on users’ communication via the “mention” and “reply” interactions as these interactions are the two main ways to conduct direct communication on TwitterFootnote 3. Also, as users can discuss a topic by sending and replying to tweets over several rounds, a single tweet message often cannot provide complete context to understand human communication. For example, it may be hard to recognize that user A might dissuade user B from committing suicide based on a single tweet “@XXX please don’t do it, I love you so much!”, without considering that this tweet is user A’s reply to user B’s tweet “9 30pm on the 8th of July 2012 I will hopefully die, so n/r going to write my suicide note”. Thus, to obtain a relatively complete context in a discussion, we shift attention from single tweets to conversations, i.e., aggregations of successive tweets in a discussion (Alvarez-Melis and Saveski 2016).

Specifically, for each user, we search for their tweets that contain a mention or reply. Then, we aggregate tweets into conversations based on the “in_reply_to_status_id” field returned by the Twitter APIs. Each conversation consists of a seed tweet, all tweets in reply to it, and replies to the replies, which can involve several tweets and users. Mentions that do not receive any replies are considered as individual conversations. We obtain 1,044,573 conversations consisting of 2,206,919 tweets. All re-tweets are excluded, since the mentions or replies in a re-tweet are conducted by the original author of the re-tweet, not by users who re-tweet it. Detailed statistics of these conversations are presented in Additional file 1: Section 2.

Results

In this section, we present our analyses of Twitter conversations in online ED communities. These analyses involve three steps: (i) characterizing the types of content in users’ conversations; (ii) examining how different types of content flow through interpersonal interactions and measuring structural correlations among different types of communication; and (iii) exploring how different types of communication correlate by analyzing temporal information.

Content analysis

Common methods for characterizing the types of textual content are topic modeling (e.g., latent Dirichlet allocation models (Blei et al. 2003; Alvarez-Melis and Saveski 2016)) and content-based clustering methods (e.g., bag-of-words and word/document embeddings (Mikolov et al. 2013; Le and Mikolov 2014)). However, these methods generated topics that were hard to interpret in our preliminary experiments. Previous studies have shown that these methods perform poorly when applied to short and noisy tweets (Alvarez-Melis and Saveski 2016). Although we aggregate short tweets into a conversation, most conversations are still short (on average 21.9 words in each conversation) and they are often dominated by general chats (e.g., “why do you follow me?”). Inspired by prior work (Weng and Menczer 2015), we here characterize the types of users’ conversations by identifying topics of hashtags used in these conversations. As hashtags are often used to annotate the theme of a tweet, these clusters of hashtags have been shown to effectively indicate the underlying topics in tweets (Wang et al. 2014; Steinskog et al. 2017; Weng and Menczer 2015).

We detect topics of hashtags by performing community detection in co-occurrence networks of hashtags. We build an undirected, weighted hashtag network based on the co-occurrences of hashtags in the tweets of users’ conversations, where an edge is weighted by the co-occurrence count of hashtags. To filter out noise, only tags used by more than three distinct users and used in more than three tweets are considered. The resulting network contains 65,756 nodes and 109,663 edges, partitioned in 672 connected components, where 5791 nodes are in the giant component and 4 in the second largest component. Due to the dominance, we focus on analyzing the giant component and obtain 26 topic clusters of hashtags by applying the Louvain method (Blondel et al. 2008) to this networkFootnote 4. The resulting modularity is Q=0.51 (z=6.63 compared to a random configuration model (Newman and Girvan 2004), p<0.001 in a two-tailed test), indicating a clustered topic structure in the hashtag co-occurrence network. By examining the numbers of tweets and users related to each of the 26 topics identified above, we find that users have consistently high levels of engagement in five topics with IDs 2, 4, 8, 16 and 22 respectively, whereas other topics are much less popular (see Additional file 1: Section 3). To avoid analyzing topics of interest to a specific subgroup of online ED communities, we focus on the five popular topics in this work.

Figure 1a-e show the most frequent hashtags and their co-occurrence networks for each of the five popular topics. As shown in Fig. 1a, topic 2 is dominated by “#eatingdisorders”, “#mentalhealth”, “#recovery” and “#bellletstalk”Footnote 5, showing a clear tendency to support recovery from ED and promote mental health (Yom-Tov et al. 2012; Chancellor et al. 2016). We label this topic mental. In contrast, topic 4 (Fig. 1b) is dominated by a single tag “#ff” which is likely to be an abbreviation of “#followfriday”, given frequent co-occurrences between the two tags. These tags are often used in a weekly social events where people recommend their followers to follow more people on TwitterFootnote 6. We thus label this topic social. Figure 1c shows that topic 8 is mainly concerned with fitness activities and diet (thus labeled fitness). Topic 16 (Fig. 1d) is about “#picslip” which is often used by users to post a picture of themselvesFootnote 7. Other tags that highly co-occur with “#picslip” are “#bodyslip”, “#fat”, “#selfharm” and “#failure”, indicating a theme of body image and body dissatisfaction, thereby labeled body. As shown in Fig. 1e, topic 22 is mainly about thinspiration (or pro-ED) content (e.g., “#thinspo” and “#proana”) which is designed to inspire people to lose weight and stay extremely thin (Juarascio et al. 2010; Borzekowski et al. 2010). We label this topic thinspo. Moreover, to illustrate the relationships of these popular topics, we visualize a co-occurrence network of popular topics in Fig. 1f.

Fig. 1
figure 1

Topics in hashtag co-occurrence networks. a-e Co-occurrence networks of the most frequent hashtags in each of the five popular topics, where each node denotes a hashtag. The size of a node is proportional to the number of tweets with a hashtag and edge width is proportional to the number of co-occurrences between tags. f Co-occurrences of popular topics involving more than 1000 users, where each node denotes a topic as labeled. The node size is proportional to the number of tweets mentioning a topic and edge width is proportional to the ratio of the number of tweets mentioning both topics over the number of tweets mentioning at least one topic

To verify our results, we check (i) if the topics found above cover real-world events in ED communities, and (ii) if the relationships of these topics align with findings in prior qualitative studies on online ED content (Juarascio et al. 2010; Borzekowski et al. 2010). Results of these checks strongly confirm the reliability of our content analysis (see Additional file 1: Section 4).

Network analysis

We proceed to explore how different types of content flow through interpersonal interactions using network analysis methods. To do this, we first categorize users’ conversations based on the topics of hashtags found above. Given a conversation, we track the sequence of hashtags used in the conversation and annotate the topics of this conversation with the topic labels of these hashtags. To avoid ambiguous annotations, we only consider conversations that are labeled with only one unique topic; those with multiple topics or without a hashtag are excluded in our analysisFootnote 8. This results in 102,554 conversations consisting of 201,155 unique tweets. Then, we represent information about who interacts with whom and on which topic in users’ conversations via a multilayer network with N=55,164 nodes representing users and M=5 layers representing topics. The multilayer network can be described by a set of M adjacency matrices, one for each layer, \(G = \left [A^{[1]}, A^{[2]},..., A^{[M]}\right ] \in {\mathbb {R}}^{N\times N \times M}\). Each layer A[α] is a directed, weighted network, in which a link \(a_{ij}^{[\alpha ]}\) runs from a node representing user i to a node representing user j if i mentions or replies to j in the conversations on topic α=1,2,...,M, weighted by the frequency of these mentions and replies.

Based on this multilayer representation, we characterize communication patterns in online ED communities by quantifying structural properties of the multilayer network. First, we measure structural properties of single-layer networks (i.e., each layer is considered as a separated network) to examine organizational features of each type of communication. Second, we measure inter-layer dependencies in the multilayer network (i.e., structural correlations between inter-layer networks) to explore associations of different types of communication.

Structures of single-layer networks

We first examine structures of single-layer networks to explore the organization of an online ED community in a type of communication. Figure 2 shows cumulative in- and out-strength distributions of each single-layer network, and Table 1 gives details about structural properties of these networks. The key results are as follows.

Fig. 2
figure 2

Distributions of in-/out-strength (sin/sout) at each layer and the full aggregated network (AGG.). All values of s are shifted by 1 to account for nodes with zero values on the log-log plots. Lines fit a power-law distribution P(s)=sλ using the maximum likelihood estimator and a p-value for the goodness of fit is obtained using a bootstrapping procedure (Clauset et al. 2009). The mean values and standard deviations of exponents λ are shown in the legends, and p-values obtained via 1000 bootstrap replications are reported in parenthesis

Table 1 Statistics of single-layer networks and the aggregated (AGG.) network, including 1. the number of active nodes N[α], i.e., nodes that are connected by at least one in-/out-link (Nicosia and Latora 2015); 2. the total number of edges E[α]; 3. the average strength 〈s[α]〉; 4. density D[α] measuring the ratio of the number of edges to maximum possible number of edges; 5. fraction of nodes in the giant weakly connected component %G[α]; 6. reciprocity r[α] quantifying the likelihood of nodes with mutual links; 7. the Kendall’s τ correlation between in- and out-strengths \(\tau \left (s_{in}^{[\alpha ]}, s_{out}^{[\alpha ]}\right)\). 8. global clustering coefficient C[α] which measures the extent that two neighbors of a node are connected; 9. assortativity coefficient by strength \(A_{s}^{[\alpha ]}\), i.e., the correlation between the out-strengths of source nodes and the in-strengths of destination nodes (Newman 2003)

Users’ engagement levels in posting harmful content have skewed distributions. Figures 2a-e show two distinct behaviors in in- and out-strength (sin and sout) distributions of single-layer networks. Specifically, the distributions of sin are more skewed than those of sout in the mental, social and fitness layers, while the distributions of sout are more skewed in the body and thinspo layers. These behaviors can be quantified by fitting a power-law function P(s)=sλ. We find that all networks have comparable values of λ in sin distributions, indicating similar patterns of popularity ranking for actors in different interactions. However, the sout distributions in the body and thinspo layers (λ≈3) have a larger value of λ than those in the mental, social and fitness layers (λ≈2). As exposure to thin-ideal content (thinspo and body) is associated with higher risks of ED (Hargreaves and Tiggemann 2003; Yu 2014), this implies that the fractions of users who actively post harmful content are relatively small.

Private communication takes place in small groups. As shown in Table 1, mental and body layers have lower fractions of nodes in the giant weakly connected component %G[α] than other layers, revealing that users tend to form smaller communities when discussing mental health and body image. This may be related to the private nature of these topics—due to fear of rejection and feelings of shame (Becker et al. 2010; Swanson et al. 2011), people are more likely to talk about their illnesses and body image to someone they can trust rather than any friends online.

Interactions related to body image are reciprocal. Table 1 shows that interactions on body image and appearance management (fitness, body and thinspo) have higher degrees of reciprocity r[α] than those on other types of content (mental and social). High reciprocity indicates a tendency to reciprocate the interactions received from others, which can reward and reinforce these interactions (Fehr and Gächter 2000). The high degrees of reciprocity of interactions in the fitness, body and thinspo layers are confirmed by positive correlations between in- and out-strengths (τ>0), while τ<0 implies a suppression of reciprocity in the mental and social layers.

Users in general communication cluster. While the clustering coefficients C[α] are low in each network (Table 1), the value of z(C[α]) in general communication (social) is larger than those in communication on specific topics (thinspo and mental). Higher values of z(C[α]) indicate that users are more likely to cluster together, compared to a baseline of random clustering. Such high value of z(C[α]) in general communication may be due to the fact that more general topics tend to be of interest to a wider variety of individuals, and a higher level of individuals sharing common interests leads to a more cohesive social community (Lim and Datta 2013).

Private communication forms a weakly disassortative network. As Table 1 shows, all networks are characterized by disassortative mixing by strength \(\left (A_{s}^{[\alpha ]} < 0\right)\), i.e., hubs tend to be attached to peripheral nodes, which aligns with prior evidence on online social networks (Hu and Wang 2009). Compared to null models, the disassortative strengths in private communication (body and mental) are relatively weaker than those in other communication (social, fitness and thinspo). This implies that people tend to discuss private topics with others who have similar social-status characteristics in a community.

The independent analysis of single-layer networks described above shows different organizational structures in different types of communication, highlighting the multiplex nature of human interactions (Szell et al. 2010; Lewis et al. 2012). To demonstrate the disadvantage of not distinguishing types of communication, we include the statistics for the aggregated network (i.e., aggregating all single-layer networks in a single network) in Fig. 2 and Table 1. We see that ignoring the differences of interactions leads to the loss of essential information and a misrepresentation of the system, e.g., losing information on differential network structures between harmful and healthy communication.

Dependencies between inter-layer networks

We next extend the independent analysis of single-layer networks to analysis of interdependencies between these networks. The aim of this interdependency analysis is to examine the correlations of individuals’ activities and their functional roles in different types of communication. We consider the following measures.

  • Activity correlation: the tendency of users to be involved in one type of communication if they are involved in another type of communication. This can be measured by multiplexity, i.e., the fraction of nodes that are active at both layers α and β in all nodes of a multilayer network (Nicosia and Latora 2015).

  • Role correlation: the extent to which hubs (e.g., those users who have high popularity or active engagement) in one type of communication are also hubs in another type of communication. We measure this by the Kendall’s τ rank correlations of nodes’ in-/out-strengths between two layers of the multilayer communication network. To avoid bias due to a low degree of multiplexity in real-world networks (Nicosia and Latora 2015), we only consider nodes that are active in both layers.

  • Link overlap: the tendency that user i connects to user j in both types of communication. This can be measured by the Jaccard coefficient between two sets of links (binary links) at two layers (Szell et al. 2010).

  • Link-strength correlation: the extent to which user i has frequent interactions with user j in two types of communication. We measure this by Kendall’s τ correlation of link strengths between two layers. Due to the sparseness of connections in real-world networks (see Table 1), we only consider links between two nodes that are present in the two layers.

These measures alone, however, are not adequate for evaluating inter-layer correlations. This is because the values of these measures are influenced by the size and connectivity of each single-layer network, which can be related to the processes of data collection and content categorization discussed in the previous sections. For a reliable evaluation, we need to assess the statistical significance of a correlation result. A standard statistical approach for distinguishing patterns of networks from those generated by chance is null models (Connor et al. 2017; Newman and Girvan 2004). A null model generates patterns by randomizing an observed network many times under proper constraints; an observed pattern that differs from the distribution of randomly generated patterns is potentially derived from meaningful processes rather than chance (Gotelli and Ulrich 2012; Paul and Chen 2016). According to the null hypothesis in question, null models can have different constraints and randomization processes. Here, we consider four null models for testing hypotheses of interest (see Table 2). In each model, randomized networks in each layer have the same sizes (i.e., the numbers of active nodes and edges) as the original ones, so as to control for the effects of data collection and content categorization on inter-layer correlations. Details of these null models are introduced in Additional file 1: Section 5.

Table 2 Null hypotheses on correlations of individuals’ activities and roles in types of communication

We generate 1000 randomized multilayer networks for each null model, and measure a z-score for the empirical value of an inter-layer correlation measured in the original network x as z(x)=(x−〈x〉)/σx, where 〈x〉 and σx are the mean and standard deviation of the values of x measured in randomized networks respectively. The results are shown in Fig. 3, which can be summarized as follows.

Fig. 3
figure 3

Deviations of empirical pairwise correlations of inter-layer networks to null models. a Multiplexity, b-d in-/out-strength correlations, e link overlaps and (f) correlations of link strengths, where p<0.05 when z<−1.96 or z>1.96 with assumptions of normality

Social networks in thebody layer bridge those in themental andthinspo layers. Figure 3a shows z-scores of inter-layer multiplexity compared to a hypergeometric model (Nicosia and Latora 2015). The largest z-score occurs between body and thinspo layers, indicating that the correlation of users’ activities in sharing thinspo and body topics is much stronger than expected at random. On the other hand, while the overlap of actors in mental and thinspo layers is not significantly different from randomness, actors in the mental layer have a pronounced overlap with those in the body layer. These results imply that the group of users who engage in sharing body may bridge two groups who engage in sharing mental health and thinspo content.

Actors play different roles in healthy and harmful communication. Figure 3b-d show z-scores for in- and out-strength correlations of nodes in pairwise layers, as compared to an independent multilayer node-permutation model (Croft et al. 2011). In most pairwise layers, nodes with higher in-/out-strengths in a layer tend to have higher in-/out-strengths in the other layer (Fig. 3b-c), which indicates that popular/active users in a field are likely to be popular/active in the other field. However, nodes’ positions in the mental layer are not significantly correlated with those in the body layer, implying that actors may play a different role in these types of communication. Surprisingly, this pattern is absent between the mental and thinspo layers, i.e., nodes’ positions in these layers are significantly correlated. A possible reason for such correlations is that pro-recovery users who actively post mental health may send healthy information to pro-ED users who post thinspo content as interventions (Yom-Tov et al. 2012). This can be illustrated by the results in Fig. 3d that nodes with higher out-strengths in the thinspo layer are likely to have higher in-strengths in the mental layer. That is, users who post more thinspo content tend to receive more content on mental health. Figure 3d also reveals users’ responses when receiving different content. For example, nodes with higher in-strengths in the fitness and body layers tend to have higher out-strengths in the thinspo layer and lower out-strengths in the mental layer. This indicates that receiving more fitness and body content may reinforce users’ engagement in posting thinspo content and reduce their engagement in posting mental health.

People often connect to the same friends in different types of communication. Figure 3e shows z-scores for overlaps of links in pairwise layers, as compared to an independent multilayer configuration model (Paul and Chen 2016). We see that high z-scores show in each pair of layers, indicating that users generally tend to connect to the same friends when discussing different topics. This aligns with prior evidence that people are often surrounded by a relatively stable social network (Viswanath et al. 2009).

Strengths of interactions onmental health generally have no significant correlations with those on other content. Figure 3f shows z-scores for correlations of link strengths, compared to an independent directed-weight reshuffling model (Opsahl et al. 2008). A notable pattern is that users who often exchange content of mental health have no significant tendencies to frequently discuss other topics such as social, fitness and body. This can arise from two different processes: (i) actors in the mental layer exclusively focus on discussing mental health, while largely ignoring interactions on other topics; and (ii) actors who previously engaged in other topics are less likely to engage in discussing mental health later. Distinguishing the two processes requires detailed time information on different interactions, which will be discussed in the next section.

Analysis of temporal patterns

To better understand the relationships among different types of communication, we further consider the time dimension of Twitter conversations and examine the dynamics of communication networks over time. Compared to the above analysis on static networks, dynamic analysis on temporal networks allows to explore how users start/stop to engage in a topic and change interests from one topic to others, yielding further insights into the correlation patterns of different types of communication.

To this aim, we represent temporal information about who interacts with whom on which topic and when in Twitter conversations using temporal multilayer networks. Specifically, we divide users’ conversations into multiple sub-sets over time periods 1,...,T, based on the posting timestamp of a tweet. To reduce potential bias due to intermittent posting activities of users and temporal popularity of topics onlineFootnote 9, we build temporal networks by a fixed number of tweets instead of a fixed time interval. We rank all tweets by a chronological ordering and partition the tweets into subsets with a fixed number of tweets. The number of subsets is estimated by the Freedman-Diaconis rule (Freedman and Diaconis 1981), resulting in 55 subsets. For conversations in a subset at period t∈[1,...,T], we build a temporal multilayer network \(G_{t} = [A_{t}^{[1]}, A_{t}^{[2]},..., A_{t}^{[M]} ] \in {\mathbb {R}}^{N\times N \times M}\) in the same way that we build the static multilayer network, where M layers representing M topics and N nodes representing N users are fixed over time. Detailed statistics for these temporal multilayer networks are reported in Additional file 1: Section 6.

Based on these temporal networks, we study the dynamics for users’ communication in two ways. First, we measure the likelihood of users engaging in a type of communication given that they have engaged in other types of communication. Clarifying such likelihood is not only useful to understand how the above correlation patterns appear among different types of communication, but also helps to identify signs suggestive of engagement in a type of communication, e.g., risk factors for engaging in harmful communication. Second, we examine the stability of a community of users who engage in a type of communication over time, particularly on investigating the presence of hardcore actors who have long-standing involvement in a type of communication. Evidence from this investigation can give insights into what strategies are likely to achieve quality, cost-effective outcomes in interventions. For example, if a type of communication is mainly carried out by a fixed set of hardcore actors, banning a small number of these actors can lead to serious damage to the connectivity of the communication network (Kirman and Lawson 2009) and reduce the efficacy of the network in shaping individual cognition and behavior (Kilduff et al. 2006), while banning a larger number of actors at random may have limited influence on the network (Albert et al. 2000).

Transition of engagement activities

We first examine how users change their engagement between types of communication by measuring transitions of nodes’ activities across layers in temporal multilayer networks. As users can engage in discussing multiple topics at the same time period, following prior work (Nicosia and Latora 2015), we represent the activity state of node i across layers at time t by a node-activity vector \(b_{i,t} = \left (b_{i,t}^{[1]},..., b_{i,t}^{[M]}\right)\), where \(b_{i,t}^{[\alpha ]}=1\) if node i is active at layer α of Gt (i.e., user i engages in topic α at time t) and \(b_{i,t}^{[\alpha ]}=0\) otherwise. For computational efficiency, each binary vector \(b_{i,t} = \left (b_{i,t}^{[1]},..., b_{i,t}^{[M]}\right)\) is encoded as a decimal integer \(R_{i,t} = \sum _{m=1}^{M} b_{i,t}^{[m]} \cdot 2^{M-m}\), where Ri,t=0 indicates that node i has no interaction with others at time t and Ri,t=2M−1 indicates that node i interacts with others in discussing all topics at time t. Then, we measure the transitions of users’ engagement from a set of topics to another set by the period-to-period transition probability of node i from state Rt=x to state Rt+1=y asFootnote 10:

$$ P(R_{t+1}=y|R_{t} = x) = \frac{\sum_{t=1}^{T-1} \sum_{i=1}^{N} I(R_{i,t} = x, R_{i, t+1}=y) }{\sum_{t=1}^{T-1}\sum_{i=1}^{N} I(R_{i,t} = x)}. $$
(1)

where I(Ri,t=x,Ri,t+1=y) is an indicator function denoting whether node i has both an activity state Ri,t=x at time t and a state Ri,t+1=y at t+1, defined as:

$$ I(R_{i,t} = x, R_{i, t+1}=y) = \left\{\begin{array}{ll} 1 & \text{if}\ R_{i,t} = x\ \text{and}\ R_{i, t+1}=y \\ 0 & \text{otherwise}. \end{array}\right. $$
(2)

Similarly, I(Ri,t=x)=1 if node i has an activity state Ri,t=x at time t and I(Ri,t=x)=0 otherwise.

Figure 4a shows results of transition probabilities P(Rt+1|Rt) in our data, where we only consider nodes that are active in at least one of the two successive periods, i.e., Ri,t+Ri,t+1>0. These results reveal the following patterns.

Fig. 4
figure 4

Transitions of engagement in different topics. a Transition probabilities of topic engagement between two subsequent observations Rt and Rt+1; b Fractions of users who posted content on topic α earlier will post β later (note that \(\sum _{\beta } P(\beta |\alpha)\) is not necessarily equal to 1 as a user can post multiple different topics β after posting α); c Transition probabilities of topic engagement between the beginning Rb and the end Re of participation

Users tend to shift engagement from healthy communication to other communication. One notable pattern in Fig. 4a is that the probability values in region I, namely P(Rt+1>16|Rt<16), are smaller than those in other regions. Since Rt≥16 and Rt<16 denote whether nodes are active in the mental layer or not respectively, this result indicates that users who previously engaged in other topics are less likely to discuss mental health subsequently. In contrast, the values of P(Rt+1<16|Rt>16) in region IV are relatively high, showing that users who previously talked about mental health tend to change to talk about other topics like thinspo (i.e., Rt+1=1). Together, these results imply that users are more likely to shift engagement from pro-recovery to pro-ED communication than vice versa.

To reinforce the above argument on users’ engagement between pro-recovery and pro-ED communication, we inspect users’ historical tweets and compute the probability that users post content on topic β after posting content on topic α. The results are shown in Fig. 4b. We see that 17% of users who posted mental earlier will post thinspo later, while only 10% of users who posted thinspo earlier will post mental later, which confirms that users are more likely to shift engagement from pro-recovery to pro-ED communication. Also, the probabilities in the last row of Fig. 4b are relatively low (P(β|α)≤0.12 when αβ), indicating that users previously engaged in posting other content are less likely to engage in posting mental health. This explains why the link strengths in the mental layer are less correlated with those at other layers (Fig. 3f). Moreover, the highest value of P(β|α)=0.26 occurs when users post body content after posting thinspo. This explains the significant inter-layer correlations between body and thinspo (“Dependencies between inter-layer networks” section), and also confirms that individuals are likely to engage in comparison of body image after viewing thinspo content (Bardone-Cone and Cass 2007).

Users interested in a specific topic earlier tend to engage in the same topic later. Another notable pattern in Fig. 4a is the relatively high values of P(Rt+1|Rt) when Rt+1=1,2,4,8,16 and Rt=0. Since Rt=0 denotes users having no engagement in the communication system at time t and Rt+1=1,2,4,8,16 denotes users engaging in a single topic at t+1, this result suggests that a relatively large number of new users (and those who restore to active state after an inactive period) join the communication system by discussing a single topic. Similarly, the results of P(Rt+1|Rt) when Rt+1=0 show the statuses of users’ engagement in topics before they leave the system. We see that the values of P(Rt+1|Rt) when Rt+1=0 and Rt=1,2,4,8,16 are generally high, indicating that users have high dropout rates when discussing only a single topic. Thus, a natural question is whether users have constant interests in the same topics at the beginning and the end of participation in the communication system.

To explore this question, we use the same method described above to measure the beginning-to-end transition probabilities P(Re|Rb), where Rb and Re are nodes’ activities across layers at the beginning and the end of participation, respectively. To avoid overestimation of P(Re|Rb) for users who are observed only in one time periodFootnote 11, we only consider nodes that are active at least in two different temporal networks (i.e., 1≤b<eT). The results are shown in Fig. 4c. As expected, high probabilities of P(Re|Rb) appear when Re=Rb (highlighted in a red line) and Rb=1,2,4,8,16, indicating that users who engage in a single topic earlier are more likely to engage in the same topic later. However, this pattern is absent when users engage in more than one topic at an early stage, i.e., Re=Rb but Rb≠1,2,4,8,16. Measuring Cohen’s κ between Ri,b and Ri,e for each user i confirms that the consistency between the beginning and end of participation for users with Ri,b=1,2,4,8,16 (κ=0.34) is higher than that with Ri,b≠1,2,4,8,16 (κ=0.01).

The diversity of users’ interests decreases over time. We also notice that the probabilities in region II of Fig. 4c are higher than those in region I. In region II, Rb>Re, meaning that users who engage in a wide range of topics at the beginning of participation tend to focus on a small number of specific topics at the end of participation. To verify this pattern, we measure the diversity of users’ interests in tweets over sliding windows. Again, we set sliding windows by a fixed number of tweets rather than a fixed time interval. This is to avoid bias from intermittent activities of users, e.g., as a user becomes less active in posting content, the number of tweets posted in a fixed time interval decreases and the diversity of topics in these tweets will also decrease over time. Given user u posting n distinct tweets on topics T1,...,Tn (with repetition), the diversity of posting interests of u in window i∈[1,nk+1] is measured by the entropy of topics Ti,...,Ti+k:

$$ H_{i} (u) = -\sum_{T_{j}\in \mathbb{T}_{u,i}} P(T_{j})\log P(T_{j}), $$
(3)

where \(\mathbb {T}_{u,i}\) is the set of distinct topics among Ti,...,Ti+k. P(Tj)=C(Tj)/k in which C(Tj) counts the frequency of Tj in Ti,...,Ti+k. A larger value of Hi(u) indicates a higher degree of diversity in users’ interests. In a similar way, we also measure the diversity of user interests based on tweets that are received from other users.

Figure 5 shows the mean entropy 〈Hi〉 and 95% confidence intervals (CI) of topics in tweets posted and received by users over sliding windows, where a window size of k=10 is used. Inactive users who have posted or received less than 20 tweets are excluded to avoid noise. Both plots show that the diversity of user interests has a decreasing trend over time. Results of linear regression models that relate 〈Hi〉 to a function of i confirm negative correlations between 〈Hi〉 and i, with p<0.001 in both models. Robustness checks using other window sizes and thresholds for excluding inactive users produce similar results. These findings strongly support the hypothesis that users tend to focus on a small number of specific topics as they engage more online. Moreover, the diversity of interests in received tweets declines more slowly, as compared to that in posted tweets. This hints a time-lag between the two trends, likely because a user may continue to receive information on a topic from other users even when the user loses interests in posting the topic.

Fig. 5
figure 5

Mean entropy 〈Hi〉 and 95% CI of topics in (a) posted and (b) received tweets over sliding windows with a size of k=10. CI become wider due to the decreased sample sizes of users having a large number of tweets. Legends report estimated coefficients B and p-values in a linear regression model: 〈Hi〉=Bi+ε, where estimations are based on windows over i∈[1,150] due to relatively small mean errors in this range

Stability of communities

While “dynamics” typically imply changes, another important direction in studying the evolution of social networks is to examine the stability or sustainability of a network over time (Kilduff et al. 2006; Becke 2014; Onnela et al. 2018). The core concept of such research is that members of a social network retain and remember historical ties to former members of the network. This ongoing strengthening of relationships will affect the extent to which the interpersonal network can shape cognition and behavior (Kilduff et al. 2006). As such, we now turn our focus from analyzing changes in topics of conversations to studying stability of a community of users involved in a type of communication. We measure the stability of a community by overlaps of users who engage in the same type of communication over time, i.e., the overlaps of active nodes in the same layer α in different pairs of temporal multilayer networks Gt and Gt+Δt. This can be computed by the Jaccard similarity of nodes that are active in \(G_{t}^{[\alpha ]}\) and \(G_{t + \Delta t}^{[\alpha ]}\) as:

$$ J^{[\alpha]} (t, t + \Delta t) = \frac{N^{[\alpha]}_{11}}{N^{[\alpha]}_{01} + N^{[\alpha]}_{11} + N^{[\alpha]}_{10}}, $$
(4)

where Δt∈[1,T−1] is the time interval between two networks Gt and Gt+Δt. \(N^{[\alpha ]}_{11}\) is the number of nodes that are active at both \(G_{t}^{[\alpha ]}\) and \(G_{t + \Delta t}^{[\alpha ]}\), \(N^{[\alpha ]}_{01}\) is the number of nodes that are active at \(G_{t + \Delta t}^{[\alpha ]}\) but not in \(G_{t}^{[\alpha ]}\), and \(N^{[\alpha ]}_{10}\) is the number of nodes that are active at \(G_{t}^{[\alpha ]}\) but not in \(G_{t + \Delta t}^{[\alpha ]}\). Then, we calculate the mean similarity across intervals Δt, and can obtain the overlaps of actors as a function of Δt:

$$ O^{[\alpha]} (\Delta t) = \frac{1}{T-\Delta t} \sum_{t=1}^{T-\Delta t} J^{[\alpha]} (t, t + \Delta t). $$
(5)

Figure 6a shows results of O[α](Δt) in each layer of temporal networks. As only a small number of observations are available for large values of Δt, we only consider results of Δt∈[1,40] to reduce noise. The key findings are summarized as follows.

Fig. 6
figure 6

a Mean overlaps of active users in temporal networks across interval Δt. Error bars give 95% CI and yellow region highlights the stable stages in the mental, social and fitness layers. b Average numbers of periods that users post content on each topic. Error bars show 95% CI

Limited numbers of hardcore members engage in harmful communication. As shown in Fig. 6a, users engaged in discussing mental health have the largest overlaps over time, indicating strong stability of pro-recovery communities. Moreover, the overlaps of actors in the mental, social and fitness layers tend to be relatively stable (see the highlighted region in Fig. 6a), suggesting the presence of a large set of hardcore users who have a constantly high level of engagement in exchanging these types of content. In contrast, the overlaps in the thinspo and body layers continue to decline as the interval Δt increases. This indicates that members of pro-ED communities have frequent entries into and exits from the system, revealing a high level of fluctuation in these communities.

Individuals engage in harmful communication while organizations engage in healthy communication. To better understand the results in Fig. 6a, we examine users’ posting activities in more detail and compute the number of time periods that a user posts a topic. Figure 6b shows the average number of posting time periods of users on each topic. We see that users on average share body and thinspo content in 1.63 and 1.74 time periods respectively, less frequently than sharing other content. This aligns with our results that active nodes in the body and thinspo layers are highly fluctuating in Fig. 6a. Inspecting the most active users in sharing each topic, we find that active users in sharing mental health are often charities and organizations that aim to prevent ED and mental illnesses, such as @HealingFromBPD, @beatED and @NEDAstaff. Similarly, active users in sharing social and fitness often show a brand-promoting or marketing purpose, e.g., @WWE for social and @Reebok for fitness. In contrast, most active users in sharing thinspo and body content tend to be personal users. Compared to professional organizations and marketing accounts, personal users are less likely to keep continuously active engagement online due to their limited time and attention. Thus, it is not surprising that the thinspo and body layers have less overlaps of active nodes over time than other layers. This may also explain why the thinspo and body layers have a more skewed distribution of nodes’ out-strengths than other layers in Fig. 2d-e.

Discussion

In this study, we have investigated patterns of communication revolved around topics in online ED communities through a large set of conversations among users who self-identified with ED and their friends on Twitter. Applying clustering algorithms to textual content of these Twitter conversations, we find that members of online ED communities are interested in discussing specific topics. By projecting interpersonal interactions in exchanging different topics into a multilayer communication network, we show that different types of communication have distinct network structures and people play different roles in different types of communication. We further incorporate an additional dimension, namely time, into the multilayer network and reveal dynamic characteristics of multiplex communication in online ED communities.

We show that online ED communities largely focus on discussing mental health, general social activities, fitness, body image and thinspo content, which aligns with previous qualitative studies on the content in these communities (Juarascio et al. 2010; Borzekowski et al. 2010; Tiggemann et al. 2018). Beyond such content analysis, we further find that different types of content are diffused in different ways, e.g., conversations on private content often take place within small groups and actors in sharing general topics tend to cluster. This multiplex feature of communication cannot be observed through a single-layer network obtained by aggregating all different types of communication, highlighting the importance of considering multiplex patterns in studying human interactions (Nicosia and Latora 2015; Szell et al. 2010).

In line with evidence on other social media platforms (De Choudhury 2015; Borzekowski et al. 2010; Chancellor et al. 2016; Yom-Tov et al. 2012), we find the presence of two communities with distinct stances on Twitter: (i) a pro-recovery community in which members discuss their health problems and support sufferers to recover from ED and (ii) a pro-ED community in which members often encourage people to lose weight and stay thin. We observe that a small number of users engage in exchanging both pro-ED and pro-recovery content, as indicated by the low value of multiplexity between mental and thinspo layers. This aligns with prior evidence that social networks of pro-ED communities have small overlaps with those of pro-recovery communities on Flickr (Yom-Tov et al. 2012) and YouTube (Oksanen et al. 2015). Despite these small direct overlaps, our results suggest that both pro-ED and pro-recovery communities have pronounced overlaps with communities of users who engage in exchanging content on body image, revealing an indirect connection of social networks between pro-ED and pro-recovery communities. Moreover, we find that users who receive more content on body image are likely to post more thinspo content and less content on mental health. This confirms a conceptual model based on social comparison theory (Yu 2014) where people who are exposed to images of others’ bodies tend to compare their appearance with others, which can lead to a negative view of their own bodies and social pressures to have a thin body that can promote the development of ED.

Our results show that users are more likely to engage in pro-ED communication after pro-recovery communication than vice versa. A possible reason for this is that pro-recovery communities tend to post comments on pro-ED content as an intervention for pro-ED communities (Yom-Tov et al. 2012). We also find that people tend to focus their communication on narrow, specific topics over time. This can be explained as follows: an individual’s time and attention are finite resources, and hence each individual must make a choice about how best to use them given the priority of personal preferences, interests and needs (Gonçalves et al. 2011). Prior studies have shown that focusing on a single topic and posting creative or insightful content on the topic can help people to gain influence online (Cha et al. 2010), and the perception of being valued and respected by others can further motivate people to do so (Ryan and Deci 2000). Moreover, our results suggest that pro-ED communities have a limited number of hardcore actors, with strongly fluctuating membership in the periphery of the communities. A recent study has also shown that individuals of ED communities have short periods of activity in posting content on Twitter (Wang et al.). This unstable community structure aligns with views of the pro-ED communities as hidden, secretive groups with frequent migrations (Casilli et al. 2013), which can make it hard to monitor and track the positions/roles of individuals (e.g., influential cores) in these communities (Sekara et al. 2016). Such fluctuating characteristics is likely to be reinforced by the banning actions of pro-ED content (Casilli et al. 2013; Tiggemann et al. 2018), making pro-ED communities less reachable by health care professionals on social media sites.

Our findings have implications for public health. To prevent ED and minimize the negative impact of pro-ED content online, many social media sites have begun to ban thinspo content. Our results show that pro-ED communities may engage in disseminating other content that is related to thinspo but has not been banned online, e.g., body image. Exposure to such content can potentially reinforce individuals’ engagement in pro-ED communication and weaken their engagement in pro-recovery communication, which may be used as alternatives to the thinspo content to avoid censorship (Boepple et al. 2016; Chancellor et al. 2016). Thus, to enhance health outcomes, content-based interventions should account for the relationships of types of content which can be extracted automatically as we did in this study. Another common intervention strategy in public health is network-based intervention which focuses on using social network data to promote organizational well-being (Valente 2012). One typical approach in this strategy is identifying community opinion leaders to accelerate behavior change (Valente and Pumpuang 2007; Laranjo et al. 2014). We show that people can have different roles in different types of interactions and these roles can change over time. Thus, network-based interventions should account for the multiplex and dynamic nature of social interactions for identifying appropriate opinion leaders for a targeted community.

Future work will focus on exploring effective intervention strategies, tailored to the structural and dynamic characteristics of different interactions. For example, one could identify important/central actors in disseminating harmful content, such that removing or isolating these actors can cause maximum damage to the communication of harmful content but with minimum impact on the communication of other content in online ED communities. However, traditional measures of centrality in monolayer networks may not be able to identify these actors, since actors that are not central in each single-layer network might be important for flows of information across layers and have a high centrality score in a multilayer network (De Domenico et al. 2015). It is thus necessary to consider the multilayer structure in measuring actors’ roles in diffusive processes. Another direction of future work would be to identify individuals’ traits, e.g., personality, emotions, positions in social networks, that can predict the interaction behaviors and dynamics in online ED communities, as well as proper models for characterizing and evaluating dynamic patterns in multilayer networks, so as to enhance our understanding of these communities. In particular, our results indicate that actors’ activities and positions across different layers of the multilayer communication network are correlated in different ways. Thus, to build models that predict a user’s engagement in sharing a type of information or topic, we should not only consider the user’s historical engagement in this topic but also her/his engagement in other topics. Also, we will explore whether our findings are applicable to other online communities based on different social media platforms (e.g., Facebook and Instagram), types of multimedia content (e.g., images and videos), and a wider range of health problems.

In conclusion, our investigation of communication behaviors in online ED communities has uncovered distinct patterns in different types of communication on Twitter. The rich information in our data allows us to explore the effects of multi-dimensional interactions on the structure and evolution of a large-scale social network, thereby establishing the first empirical basis for modeling multiplex and dynamic communication in online health communities. Our findings can guide public health officials to design advanced online interventions to prevent harmful content from reaching risky individuals and promote organizational well-being.