1 Introduction

After the East Japan Great Earthquake on 11 March, 2011, Twitter users reacted quickly and discussed a variety of topics both real and imaginary. An example is the rumor about an explosion at a petrochemical complex owned by Cosmo Oil Co., Ltd. Stories of oil tanks exploding and releasing harmful substances into the air caused widespread panic until official government announcement released on the following day. Social media has the potential to be a source of both help and trouble during disasters. Constructing strategies for disaster mitigation requires addressing issues that arise from social media as well.

Analysis and modeling of popularity dynamics of an online content has been an active area of research  [1,2,3,4,5,6,7,8,9,10,11]. A popular method for extracting a topic is to collect all the tweets that mentioned a specific word (keyword) or hashtag and analyze the temporal patterns  [1, 2, 7, 10]. While this approach makes it easy to extract the emergence of topics, we can often identify various “sub-topics” within a topic intuitively. The diversity of the content may vary greatly depending on topics.

We focus on a subtopic obtained by clustering the extracted tweet data related to a keyword (i.e., a topic). We study the topic diversity defined as the number of subtopics in a topic (discussed in more detail in Sect. 3.3). The topic models including Latent Dirichlet Allocation (LDA) [12] are popular method for discovering an abstract “topic” from a documents and so have been applied to social media analysis for discovering topics or sub-topics  [2, 7, 10, 11]. In spite of their simplicity and usefulness, most topic model algorithms require users to specify the number of subtopics in advance. In this study, we cannot apply these models because we are interested in inferring the number of subtopics from data.

In this study, we investigate how topic diversity depends on the truthfulness of a topic (whether it is a rumor or a non-rumor) and how the topic diversity changes in time after a disaster. As a first step, we develop a method for quantifying the topic diversity of the tweet data based on text content. Our method is based on clustering a tweet graph using Data polishing [13, 14] that automatically determines the number of subtopics. Then, the proposed method is applied to a Twitter dataset before and after the Great East Japan Earthquake of 2011. We find that the temporal patterns in topic diversity differ between rumor and non-rumor topics. Finally, we evaluate the performance of the method and compare its performance with several baselines.

The contributions of this paper are as follows:

  • We propose a method for analyzing topic diversity (i.e., homogeneity of opinions in a topic) based on graph clustering.

  • We compare topic diversity between rumor and non-rumor topics by applying the method to Twitter dataset before and after the Great East Japan Earthquake of 2011.

  • We compare the performance of the proposed method to other existing methods.

This paper is organized as follows. Section 2 introduces related work. Section 3 describes our proposed method. We apply the proposed method to a large Twitter dataset, evaluate our method and compare its performance with other existing method in Sect. 4. Finally, in Sect. 5, we conclude and discuss a direction for future research.

2 Related work

It is essential to discover a higher-level “topic” underlying the online content for summarizing and finding subtopics of collected data (e.g., tweets or hashtags). There are three approaches for topic identification: (1) topic models, (2) clustering word features and (3) graph clustering. Topic models extract a latent topic from the frequency of words in a tweet by using a probabilistic model. LDA [12] and its extensions [15, 16] have been used to identify topics to social media data in a number of approaches [2, 7]. These algorithms have the additional disadvantage of requiring the user to specify the number of topics as a parameter. Clustering word features classifies online contents by applying clustering algorithms to feature vectors. In Rosa et al. [2], the authors developed a method for discovering a topic by applying K-means algorithm to TF-IDF vectors calculated from the tweets and showed their method performed better than LDA. Graph clustering classifies a large amount of online content by finding community structures. In Tanev et al. [17], the authors showed that the utilization of word co-occurrence graph clustering improved the performance of linking news events to tweets. While previous work identified a topic of each tweet and analyzed the temporal dynamics, our work additionally investigates the diversity of the content by calculating the number of topics. For this purpose, we exploit a graph clustering approach that does not require the number of topics to be specified in advance.

Recently, it has been argued that search engines and social networks potentially facilitate “filter bubble” effects, in which machine-learning algorithms amplify ideological segregation by recommending content targeted toward a user’s preexisting opinions or biases [18, 19]. In Puschmann 2019, the author pointed out that a political party or political candidate is able to exert a great influence on search results, which decreases the diversity of the content people will see when they search for information about that candidate [20]. Some work has developed algorithms to address the diversity issue [21, 22]. Interestingly, Stoyanovich et al. [21] proposed ranking algorithms that achieved the diversity and fairness of the results, usually with modest costs in terms of quality. While these works focus on the diversity of the input to a user on social media, we focus on the topic diversity of a population of tweets.

In this study, we find the difference in the temporal pattern of the topic diversity between rumor and non-rumor topics, which is applicable to rumor detection algorithms. Rumor detection is an active research area, and several methods have been proposed [23,24,25,26,27]. While these methods are based on the machine learning methods, our method is based on clustering a graph of word co-occurrence. Unfortunately, it is difficult to select most relevant features in machine learning problems when there exist a large number of feature variables [28, 29]. The graph-based approach has three potential benefits: (1) it is simple to implement, (2) it is applicable to large data set, and (3) the result is easy to interpret.

Finally, we discuss the relation between this work and our previous one [10]. In the previous work, we proposed a method for visualizing temporal topic transition based on graph clustering. Since the method was proposed for visualization, we demonstrated the topic transition of a few topics. We neither analyzed the temporal patterns quantitatively nor examined the performance of the method. In this work, the method has been extended to quantify the content diversity of a topic by using correlation analysis and quantile regression. We also compare the temporal patterns of the topic diversity between rumor and non-rumor topics. Furthermore, we evaluate the performance of the method and demonstrate its improvement on the runtime for data processing over existing methods.

3 Proposed method

Our method consists of three steps (Fig. 1): (a) Construction of tweet-word matrices, (b) Tweet graph generation, (c) Graph clustering using Data polishing.

Fig. 1
figure 1

Proposed method. Our method consists of three steps. a Construction of tweet-word matrices, b tweet graph generation, c clustering tweet graph

3.1 Construction of tweet-word matrices: Fig. 1a

We analyze tweet data posted from 00:00 Japan Standard Time (JST) on March 11 to 0:00 JST on March 14, a total of 72 h. The tweet data were divided into 144 (\(= 72 \div 0.5\)) groups based on their posted time and a fixed time window of half an hour. We construct a sequence of tweet-word matrix for each window: \(\langle { W }_1, W_2, \ldots W_{144} \rangle\). Tweet-word matrix (W) denotes the constituent word of the tweets

$$\begin{aligned} W= \left( \begin{array}{ccccc} { w }_{11}&{}{ w }_{12} &{}\cdots &{}{ w }_{1n} &{}{}\\ { w }_{21}&{}{ w }_{22} &{}\cdots &{}{ w }_{2n} &{}{}\\ \vdots &{}\vdots &{}{} &{}\ddots &{} \vdots \\ { w }_{m1}&{}{ w }_{m2}&{}\cdots &{}{ w }_{mn} &{}{}\\ \end{array}\right) \end{aligned}$$

where m and n are the number of tweets and words during a time window, respectively. The element \({ w }_{ij}\) is 1 if ith tweet contains the jth word, and 0 otherwise. We used the morphological analyzer MeCab [30] to perform Japanese word segmentation.

3.2 Tweet graph generation: Fig. 1b

We generate tweet graph from the tweet data in each time window. We define a tweet graph as an undirected graph, in which a node represents a tweet and an edge represents that the connected tweets are similar (Fig. 1): If the Jaccard coefficient [31] of two tweets is larger than the edge threshold \(\theta _E\), these nodes (tweets) are connected by an edge. The threshold was set to \(\theta _E= 0.3\).

The main result (Fig. 3) is robust for the small change of the threshold: The result did not change qualitatively for different thresholds \(\theta _E= 0.2\).

3.3 Clustering tweet graph: Fig. 1c

We here briefly describe our Data polishing algorithm for clustering a tweet graph (see [13, 14] for more detail). This algorithm iteratively increases the density of dense subgraphs and makes sparse subgraphs sparser. As a consequence, we obtain a graph whose dense subgraphs are all cliques and can thus easily be enumerated by a maximal clique enumeration algorithm.

The iteration is described as follows. For arbitrary two vertices u and v, we consider the condition

$$\begin{aligned} \left| N[u] \cap N[v]\right| {/} \left| N[u] \cup N[v]\right| > \theta _P, \end{aligned}$$

This intuitively means that u and v belong to the same cluster, since u and v’s neighbor sets must overlap considerably when they belong to the same cluster. We construct a new graph by using this condition so that two vertices are connected when this condition is met in the original graph. In the other words, two vertices are connected in the new graph if and only if the vertices seem to belong to the same cluster. Data polishing applies this graph reconstruction iteratively until it does not change. Maximal clique enumeration is performed with an algorithm such as MACE[32], to obtain the clusters of the resulting graph. The threshold \(\theta _P\) was set to 0.2. The threshold value affects the clustering result. The size of clusters increases as the threshold \(\theta _P\) decreases.

In this paper, we analyze the Tweet graph related to a topic (e.g., non-rumor topic such as “I’m OK” and rumor topics). The resulting cluster is interpreted as a subtopic, and the number of clusters is interpreted as the diversity of the topic (topic diversity). For instance, suppose that there are two topics, and Topic A consists of 30 subtopics (clusters) and Topic B consists of 5 subtopics (Fig. 2). In this case, we interpret that Topic A is more diverse than Topic B.

Fig. 2
figure 2

Interpretation of the number of the clusters (# of clusters)

4 Experiments

We first describe the Twitter dataset. Second, we analyze the temporal pattern of the topic diversity for rumor and non-rumor topics. Finally, we compare the performance of the proposed method with the existing clustering methods.

4.1 Dataset

Our data set consists of tweets posted around the time of the Great East Japan Earthquake that happened at 14:47 JST on March 11, 2011. This dataset was obtained from the social media monitoring company Hotto link Inc. [33], which tracked users who used one of 43 hashtags (for example, #jishin, #nhk, and #prayforjapan) or one of 21 keywords related to the disaster. Later, Hottolink collected all tweets posted by all of these users between March 9th (2 days prior to the earthquake) and March 29th. The total number of is around 200 million tweets, which offers one of a largest data set for users’ responses to a disaster. We focused on the dataset from 00:00 JST on March 11 to 0:00 JST on March 14, a total of 72 h.

We picked out 10 topics that contain the following keywords: “do” (“suru” in Japanese), “I’m OK”, “go home”, “important”, “safe”, “damage”, “Fukushima”, “Miyagi”, “Cosmo Oil”, and “Isodine”. The collected topics consists of eight non-rumor topics (“do”, “I’m OK”, “go home”, “important”, “safe”, “damage”, “Fukushima”, and “Miyagi”) and two rumor topics (“Cosmo Oil” and “Isodine”). Note that the word “suru” (an auxiliary verb often translated as “do” or ignored when translating to English) is a common function word; a user’s usage of “suru” does not depend on the topic that they are tweeting about. Analyzing the usage of “suru” gives us a baseline for a general topic. The other words were heavily used in tweets after the East Japan Great Earthquake. Particularly, the two rumor topics (“Cosmo Oil” and “Isodine”) were related to well-known rumors spread after the earthquake [34]. A detailed description of the rumors follows.

  • “Cosmo Oil” An explosion at the Cosmo Oil plant released harmful substances into the air.

    “Cosmo Oil” is the name of a Japanese oil company. The rumor about the oil tank explosion was diffused and frightened people. A rumor topic about the explosion at the Cosmo Oil petrochemical complex progressed through the four stages:

    1. 1.

      Fact Around 15:00 JST on March 11 (just after the quake), the petrochemical complex in Chiba caught fire.

    2. 2.

      Rumor Around 19:00 JST on March 11, the following two tweets were posted and retweeted:

      • Radiation and harmful chemicals are leaking into the air from the petrochemical complex. Be careful!

      • Don’t go out! The rain contains radiation and harmful materials from the petrochemical complex explosion.

    3. 3.

      Correction Around 15:00 JST on March 12 (the day after the earthquake), the industry’s Web site and the local government’s twitter officially corrected the rumor.

    4. 4.

      Disappearance At night on March 12, the topic disappeared.

  • “Isodine” Isodine was good for protecting from radiation.

    “Isodine” is the brand name of a mouthwash that includes iodopovidone. A rumor emerged that isodine protects people from radiation. It progressed in the following four stages:

    1. 1.

      Fact After the nuclear plant explosion occurred, twitter users expressed fear of radiative contamination.

    2. 2.

      Rumor Around 7:00 JST on March 12 (the day after the earthquake), the rumors about isodine’s protective benefits emerged.

    3. 3.

      Correction Around 15:00 JST on March 12 (the day after the earthquake), the government and isodine’s manufacturer corrected the rumor.

    4. 4.

      Disappearance At night on March 12, the topic gradually disappeared.

4.2 Analyzing temporal patterns of topic diversity

We examine the temporal patterns of topic diversity of tweets including 10 keywords (e.g., topics) from 0:00 JST, 11 th March to 0:00 JST, 14 th March. We calculated the topic diversity for each 30 minutes widow by the following procedure. First, the word count matrices were constructed from the collected tweets (Sect. 3.1). Second, the tweet graph was generated (Sect. 3.2) and the topic diversity (i.e., the number of subtopics) was obtained by applying Data polishing algorithm to each graph (Sect. 3.3).

Figure 3 shows log–log scatter plot of the number of tweets and the topic diversity for 10 topics and for all the dataset. Each circle represents the number of tweets and the topic diversity at a time window. The color in a circle represents the time: The color changes from white to dark blue as the time passes. For all the dataset (Fig. 3a) and non-rumor topics (Fig. 3b–i), we observe that the topic diversity is highly correlated with the number of tweets (Pearson correlation coefficient ranged from 0.953 to 0.996). This result suggests the power law relationship between the topic diversity and the number of tweets. For rumor topics (Fig. 3j, k), we observe that the scatter plot disperses from the linear fit compared to the non-rumor ones. The topic diversity is less correlated with the number of tweets (Pearson correlation coefficient 0.958 and 0.938 for “Cosmo Oil” and “Isodine”, respectively).

To quantify the dispersion in the scatter plot, we applied the quantile regression[35, 36] that fits the top and bottom \(5\%\) data point. Figure 3 depicts the top and bottom \(5\%\) regression lines (in blue and red, respectively) and their slopes. These slopes are close for all the dataset (Fig. 3a) and non-rumor topics (Fig. 3b–i). In contrast, these slopes are different rumor topics (Fig. 3j, k): 0.73 vs 0.59 and 0.83 vs 0.60 for “Cosmo Oil” and “Isodine”, respectively. This result indicates that the scatter plot of the rumor topics disperses compared the non-rumor topics, which is consistent with the correlation analysis. In addition, we found that the slopes of the general topic “do” (“suru” in Japanese) was smaller than that of the specific ones. A possible reason could be that 1) there are a huge difference in the number of tweets between “do” and the other topic and 2) the slope tends to decrease when the number of tweets is large. Thus, the slope may not be useful for comparison of two topics when the number of tweets in each topic is not comparable.

Fig. 3
figure 3

The number of tweets (# of Tweets) vs the number of subtopics (# of Subtopics) for all the dataset (a) and for 10 keywords (bk). The slope of the top 5% and bottom 5% regression lines are shown in blue and red, respectively (color figure online)

Next, we observe how the topic diversity of the rumor topics changed before and after the earthquake. Figure 4a shows the time course of the topic diversity of the rumor about “Cosmo Oil”. We can see that the topic was not popular before the earthquake. While the number of tweets increased dramatically just after the earthquake, the topic diversity does not increase much, showing that the topic burst with relatively homogenous opinions, or low diversity. After the rumor correction was issued, both the number of tweets and the topic diversity increased and the topic diversity grew higher than that before the rumor correction. Figure 4b shows the time course of the topic diversity of the rumor about “Isodine”. We can see that the temporal pattern of the topic diversity is similar to that of “Cosmo Oil”. Interestingly, both rumor topics spread with low topic diversity and they were corrected with high topic diversity.

Fig. 4
figure 4

Time course of topic diversity. Cosmo oil (a) and isodines (b)

4.3 Performance evaluation

In this section, we examine the runtime and the size of subtopics generated by the proposed method and existing methods.

4.3.1 Runtime

We examine the runtime for processing the tweet dataset by the proposed method based on Data polishing (Sect. 3.3) and compare the performance with four existing methods for clustering subtopics: LDA [12], K-means [37], MeanShift [38] and Agglomerative clustering [39]. LDA is the most popular topic model algorithm based on word frequency across documents. K-means, MeanShift and Agglomerative clustering are general clustering algorithms that were applied to word vectors \(\mathbf {w}_j= (w_{j1}, w_{j2}, \ldots , w_{jn})\) (\(j= 1, 2, \ldots , m\)) (Sect. 3.1). Data polishing is a graph clustering approach and was applied to the tweet graph (Sect. 3.2). We used Nysol Python [40]’s implementation of Data polishing, and scikit-learn’s implementations of K-means, MeanShift, and Agglomerative Clustering [41], and LDA implemented in Python Gensim [42]. All our experiments were performed on a 2018 13-inch MacBook Pro, with a 2.7 GHz Intel Core i7 with 16 GB 2133 MHz of memory.

Figure 5 illustrates the performance of the algorithms, as the number of tweets was increased from 2000 to 30,000. These tweets were randomly extracted from the Tweet data posted during 18:30–19:00 JST on March 11. The runtime was measured by executing each method for three times and calculating their average. Note that we stopped the measurement if the runtime reached 2 h (7200 s). While Data polishing and MeanShift can automatically determine the number of subtopics from the data, the other methods cannot do it. For these methods, the number of subtopics was assigned as the number that Data polishing determined. The numbers of subtopics estimated by Data polishing were smaller than those by MeanShift. The result shows that Data polishing is more efficient than the existing algorithms. The second most efficient one is LDA. The proposed method is five times faster than LDA for the sample size of 20,000 and at least six times faster than LDA for the sample size of 30,000. LDA code was terminated because the runtime exceeded 2 h.

Fig. 5
figure 5

Performance evaluation. The runtime of the proposed method (data polishing: blue) was compared with four existing methods (MeanShift: orange, K-means: black, agglomerative: green, and LDA: yellow). We stopped the measurement if the runtime reached 2 h (7200 s). Please note that the color version is available at the online version of this article (color figure online)

4.3.2 Number of tweets in top 5 largest subtopics

We randomly extracted 10,000 tweets posted during 18:30–19:00 JST on March 11 and analyzed the Tweet data by using the proposed method and the existing methods (LDA, K-means, MeanShift, and Agglomerative clustering). Table 1 shows the number of tweets in the top 5 largest subtopics. K-means and MeanShift each generated one large cluster (more than 14% of the total tweet) as a largest subtopic. Typically, these method tend to result in one huge subtopic and many small subtopics, which often leads to a trivial clustering or difficulty for interpreting the results. In contrast, the proposed method (Data polishing) generated the smallest cluster as the largest subtopic (\(\sim\) 4% of the total tweets). Though LDA and Agglomerative clustering’s largest subtopics were smaller than K-means’ and MeanShift’s, they require specifying the number of subtopics ahead of time.

Table 1 The number of tweets for top 5 largest subtopics

5 Conclusion

In this paper, we have proposed a method for analyzing the topic diversity using Graph Clustering. After generating the tweet graph based on the similarity between tweets, we use Data polishing algorithm to obtain the clusters in the graph. We interpret a cluster as a subtopic and the number of clusters as the topic diversity, i.e., the number of subtopics in a target topic. This method was applied to a dataset of millions of tweets posted before and after the Great East Japan Earthquake of 2011. The proposed method is useful for detecting a low topic diversity situations, and the rumor diffusion associates with the burst with low topic diversity or homogeneous opinions. We have confirmed that our approach outperformed other existing clustering approaches (LDA, K-means, MeanShift, and Agglomerative clustering) in running time and results. Our method has significant applications in that it could, for example, companies or celebrities about newly circulating rumors, and help them avoid bad publicity on social media.

The main limitation is that we focused on only 10 topics in this study. We have performed a case study of rumor diffusion on Twitter during a disaster rather than a general study of rumor diffusion. We are planning to apply the proposed method to another rumor dataset [43] and systematically investigate temporal patterns in topic diversity toward the development of a practical rumor detection algorithm.