Making sense of tweets using sentiment analysis on closely related topics

Abstract

Microblogging has grown considerably in recent years. With the growth of microblogging websites like Twitter, people have started to share more of their opinions about various pressing issues on such online social networks. A broader understanding of the domain in question is required to make an informed decision. With this motivation, our study focuses on finding the overall sentiments of topics related to a given topic. We propose an architecture that combines sentiment analysis and community detection to obtain the overall sentiment of related topics. We apply the model to the following topics: shopping, politics, covid19 and electric vehicles, to understand emerging trends and issues and their possible marketing, business and political implications.

Introduction

Online social networks (OSNs) have been burgeoning in recent years (Alamsyah et al. 2021). This rapid growth of social networks, combined with easily accessible data and discussions on a multitude of topics, provides great research potential for customer analysis, product analysis, sector analysis and digital marketing. Different data science and machine learning techniques, such as clustering, association rule mining, ensemble models, deep learning and sentiment analysis, are used in conjunction with digital marketing and product analysis (Saura 2020; Alsini et al. 2018). People use social networks to discuss wide-ranging topics and share opinions on them (Wu et al. 2011). Given the scale of information on OSNs, there arises a need to apply different data mining techniques to get actionable insights from it (Davenport 2014; Saura et al. 2019).

OSNs such as Facebook, Twitter and Instagram encourage people to participate and collaborate, forming virtual online communities (Leskovec et al. 2008; Reyes-Menendez et al. 2018). This encouragement takes various forms such as likes, shares, retweets, hashtags, comments and mentions. In these OSNs, authors write about their lives, share opinions on various topics and discuss wide-ranging issues (Wu et al. 2011). The collaboration features discussed above also facilitate studies like community detection by allowing the formation of multidimensional networks based on the friend/follower network (Deitrick and Hu 2013), on hashtags (Xiao et al. 2014; Lorenz-Spreen et al. 2018), on sentiment (Xu et al. 2011) and so on. Multidimensional networks are networks that may have multiple connections between any pair of nodes (Berlingerio et al. 2013); multidimensional analysis is therefore required to gain valuable insights from them. Among the different OSNs, Twitter is one of the most studied for social network research (Kumar and Sebastian 2012). One of the main advantages of platforms like Twitter for research is that users are organized in networks, which makes it possible to investigate groups of people, or communities, united by common interests, rather than individual profiles or personalities. This is enabled by the extensive use of hashtags, mentions and retweets, which form a complex network (Hubert et al. 2017) that is in turn important for big data analysis and digital marketing (Karataş and Şahin 2018; Saura 2020; Hu et al. 2013).

To gauge the profitability of a product or a business, one needs to consider two main things: (1) the attractiveness of the business or product and (2) the level of competition (Chevalier-Roignant and Trigeorgis 2012). Finding the attractiveness of a product is important because trends change with time. These changes in trends often demand changes to existing business models in order to survive in the market and to mitigate the inevitable risks involved (Direction 2021). As an example, the recent trend toward sustainable energy led to the growth of electric vehicles, shifting the focus away from petrol- or diesel-based vehicles (Hsieh et al. 2020; Hall and Lutsey 2020). A growing trend is generally accompanied by positive opinion about that topic; consequently, text analysis techniques are often used to identify public opinion about a trend (Hassani et al. 2020). In many cases, it is important to get a broader understanding of the topic to identify the key players and the overall opinion for that sector. A broader understanding of a topic also allows us to better understand emerging trends and public opinion about them. For this purpose, traditional methods tend to be time-consuming, as they involve finding relevant topics and then applying sentiment analysis to them (Chandrasekaran et al. 2020). This takes time because topic modeling is a slow process and often involves qualitative human intervention (Reyes-Menendez et al. 2020). Also, existing topic modeling methods do not allow changes to the generality of the found topics (Boon-Itt and Skunkan 2020; Chandrasekaran et al. 2020; Reyes-Menendez et al. 2020; Saura and Bennett 2019). With this as our motivation, in this paper we propose a framework that can be used to get related trends and topics accompanied by their overall public sentiment, along with a parameter that can be used to change the generality of the found topics. Finding recent trends and topics is important for businesses, politicians and marketing agencies alike.
We can use the results from our proposed model to answer questions such as: What is the overall sentiment for a given topic? What are the emerging issues and the public opinion about them? How is a product faring compared to other products? What is the general market trend? Who are the key players for a given trend? For this purpose, we apply our model to wide-ranging topics such as shopping, electric vehicles, covid19 and politics. We then compare our results with recent exploratory analyses of these topics.

In summary, the contributions of this paper include: (1) an overall topic sentiment classifier model; (2) an evaluation of different trends based on our model's findings. Our goal is to propose a model that can be used to effectively find related topics, along with their overall sentiment, from a given topic on the Twitter platform. Furthermore, our model provides a key parameter that can be used to vary how general the resultant related topics are with respect to the given topic. We also note that user-generated content is qualitative and, therefore, should be used for exploratory analysis (Kim et al. 2013; Pfeffer et al. 2018).

Background

Rapid growth of OSNs and massive data flow through social networks have given rise to research on the analysis of social networks (Alamsyah et al. 2021; Tavakolifard and Almeroth 2012; Kulshrestha et al. 2015). OSNs have also changed the dynamics of how consumers buy products and interact with one another (Lăzăroiu et al. 2020). This change of dynamics combined with modern data mining techniques has led to its use in digital marketing and targeted customer analysis (Saura 2020). In particular, sentiment analysis for opinion mining (Diamantini et al. 2019) and community detection for customer targeting, segmentation and topic modeling (Karataş and Şahin 2018) are widely studied. We use sentiment analysis to understand how people orient themselves about a topic given a piece of text (Yadav and Vishwakarma 2020). Particularly, we aim to determine whether the given text is of positive connotation, negative connotation or neutral connotation (Kontopoulos et al. 2013). It helps us understand public opinion given a text corpus of a given topic. As an example, we can try to identify public opinion about covid-19 based on tweets about it (Boon-Itt and Skunkan 2020).

Another task for understanding broader market dynamics is to retrieve closely related subtopics on which we can perform the above-mentioned analysis (Chandrasekaran et al. 2020). To get closely related subtopics, we can use community detection over a topic-based network (Lorenz-Spreen et al. 2018). We define community detection as the task of finding a set of clusters that maximizes the connections within each cluster and minimizes the connections between clusters (Fortunato 2018). Another possible approach is to use latent Dirichlet allocation (LDA) over a text corpus to find the topics embedded in it (Saura and Bennett 2019). Although LDA is a good choice for detecting the themes discussed in a text corpus, it fails to incorporate the intrinsic Twitter feature ‘hashtag’, which is inherently used to express the topic that a particular tweet is about (Kumar and Sebastian 2012; Davidov et al. 2010). Furthermore, LDA offers no means of introducing a generality parameter when finding related topics for the given tweets.

In the real world, networks are often multidimensional. To get actionable insights from such networks, we require multidimensional analysis to distinguish among different kinds of interactions or, equivalently, to look at interactions from different perspectives. Dimensions can be either explicit, directly reflecting interactions such as the friend-follower network, or implicit, reflecting interesting qualities of the interactions that can be inferred from the available data, for instance, the hashtag network (Berlingerio et al. 2013). In our work, we focus on a multidimensional network with two explicit dimensions: (1) hashtags and (2) opinions about topics. There can be different interactions between two users. They can be connected through the same set of topics with similar opinions, through the same set of topics with different opinions, or through a partially shared set of topics with the same or different opinions.

Organization

The rest of the paper is organized as follows. In Sect. 2, we discuss prior work on community detection and sentiment analysis. In Sect. 3, we take a closer look at community detection and sentiment analysis. The architecture of the overall topic sentiment classifier (OTSC) is described in Sect. 4; we present our results in Sect. 5 and discuss them in Sect. 6. In Sect. 7, we conclude by stating our contributions and discussing managerial implications and practical/social implications for marketers. Finally, we discuss limitations and future research in Sect. 8.

Related work

It is important to understand emerging trends and the public opinion about them in order to make informed business and political decisions (Bello-Orgaz et al. 2020; Ansari et al. 2020; Puthussery 2020). Interactions among people in OSNs lead to the formation of a multidimensional complex network. Berlingerio et al. (2013) lay the foundations of multidimensional networks and their analysis.

Among the different OSNs, Twitter is one of the most studied (Hubert et al. 2017). Pak and Paroubek (2010), Kouloumpis et al. (2011) and Kumar and Sebastian (2012) analyze tweets using sentiment analysis. Furthermore, many works develop and discuss different sentiment analysis models (Kontopoulos et al. 2013; Bhatnagar et al. 2020; Zhang et al. 2021; Yadav and Vishwakarma 2020). These models can be used to understand the sentiment of a given text. To train a model for classification purposes, we need labeled data. For this purpose, some automatic data collection methods have been researched; for instance, Read (2005) used emoticons to collect and label data, while Davidov et al. (2010) used hashtags for the same purpose.

Another major topic of research on OSNs is community detection (Karataş and Şahin 2018). Community detection is not a well-defined problem and requires some degree of arbitrariness and/or common sense. Given that, Fortunato (2010) performed a thorough review of community detection algorithms. Karataş and Şahin (2018) studied various applications of community detection such as criminology, public health, politics, customer segmentation, smart advertising, targeted marketing, network summarization, social network analysis, recommendation systems, link prediction and community evolution prediction. Furthermore, in rare instances, community detection and sentiment analysis have been combined; for instance, Deitrick and Hu (2013) focus on using sentiment analysis to enhance community detection. They use different Twitter-specific features to further enhance the detected communities.

Saura and Bennett (2019) proposed a three-stage method for text mining that uses LDA for topic modeling, followed by sentiment analysis, followed by the application of text mining techniques. Applications of a similar procedure can be found in Reyes-Menendez et al. (2018), Saura et al. (2019), Chandrasekaran et al. (2020) and Boon-Itt and Skunkan (2020). Reyes-Menendez et al. (2020) used the model proposed in Saura and Bennett (2019) to analyze and understand the business implications of #metoo on Twitter, further highlighting key takeaways for businesses and advertisers. Liu et al. (2017) also proposed a framework that integrates LDA and sentiment analysis and used it to answer several brand-related questions. Liu et al. (2019) used the model proposed in Liu et al. (2017) in part to get trendiness metrics for the analysis of luxury brands.

All these works use LDA as the base algorithm for finding topics in a text corpus. They then apply different text analysis tasks to those topics. Using LDA on Twitter is a poor choice, as it fails to incorporate intrinsic Twitter features such as hashtags, fails to provide control over the generality of topics and requires human intervention. To our knowledge, the model proposed in this paper (OTSC) is the first that identifies these shortcomings and addresses them.

Community detection and sentiment analysis

Community detection and sentiment analysis are core components of our architecture. There are various preprocessing tasks required for both stages, and in this section, along with discussing our choice of algorithm, we will also look at preprocessing steps involved for that particular stage.

Community detection

In this section, we discuss community detection briefly in the context of social network analysis. For a more detailed introduction to community detection, refer to Fortunato (2010).

Preprocessing for community detection

To apply a community detection algorithm, we need to model tweets as a mathematical graph. Generally, a friend/follower network is selected, because community detection is typically used to detect closely related groups (Solomon et al. 2019; Luo et al. 2020). We are more interested in closely related topics than groups; hence, we use hashtags to model our network (Xiao et al. 2014). Hashtags by nature represent the topic of discussion in a tweet, thus giving considerable information about that tweet (Kumar and Sebastian 2012). To form communities based on hashtags, we first extract the hashtags from each tweet and lowercase them so that their meaning is retained irrespective of capitalization. We then form all pairs (combinations of two) of the hashtags in the tweet and link each pair with an edge. This process is repeated over all tweets to build a network of hashtags (the hashmap). To build the hashmap, we need a general topic G. G is used to collect data from Twitter such that the hashmap generated from it covers wide-ranging topics.
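The hashmap construction described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `build_hashmap`, the regular expression and the choice of co-occurrence counts as edge weights are assumptions made for the example.

```python
import re
from collections import Counter
from itertools import combinations

HASHTAG = re.compile(r"#(\w+)")

def build_hashmap(tweets):
    """Return {(tag_a, tag_b): weight} for hashtags that co-occur in a tweet."""
    edges = Counter()
    for text in tweets:
        # Lowercase so that #Summer and #summer map to the same node.
        tags = sorted({tag.lower() for tag in HASHTAG.findall(text)})
        # Link every pair of hashtags that appear in the same tweet.
        for pair in combinations(tags, 2):
            edges[pair] += 1  # assumption: edge weight = co-occurrence count
    return edges

tweets = [
    "Beach day! #Summer #shopping #OOTD",
    "New bag #shopping #summer",
    "no hashtags here",
]
hashmap = build_hashmap(tweets)
print(hashmap[("shopping", "summer")])  # pair seen in two tweets
```

Sorting the tags before pairing keeps each undirected edge under a single key, so (#shopping, #summer) and (#summer, #shopping) are counted together.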

Community detection algorithm

Communities are defined as sets of vertices that are densely connected to each other and sparsely connected to the rest of the vertices (Parés et al. 2017). There are various community detection algorithms, such as Newman's leading eigenvector method (Newman 2006), label propagation (Raghavan et al. 2007), the Louvain method (Blondel et al. 2008), Infomap (Rosvall and Bergstrom 2008) and many more (Chunaev 2020). For our purposes, we want to find exactly k communities rather than a number determined by the algorithm. Using k, we can change the generality of our result: the higher the value of k, the more specific each community is. This is because k denotes the number of communities found, and the fewer communities there are, the more general each community is. Hence, to find k communities, we use the fluid communities algorithm, as it allows us to gain insights into the graph structure at different levels of granularity (Parés et al. 2017).

Fluid communities is a propagation-based algorithm capable of identifying a variable number of communities in a network. It is based on the idea of introducing a number of communities within a non-homogeneous environment, where the communities expand and compete until a stable state is reached. Given a graph \(G = (V,E)\), where V is the set of vertices and E is the set of edges, the fluid communities algorithm initializes k communities, i.e., \(C= \{c_1,c_2,\ldots ,c_k\}\), where \(0< k < |V|\). Each community is initialized at a different random vertex and is associated with a density d, as described in Eq. (1).

$$\begin{aligned} d = \frac{1}{|v \in c|} \end{aligned}$$
(1)

The fluid communities algorithm operates in supersteps, updating the community of each vertex with an update rule until the assignment of vertices to communities does not change for two consecutive supersteps (Parés et al. 2017).
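To make the superstep idea concrete, the following is a simplified sketch of one common reading of the update rule in Parés et al. (2017): each vertex joins the community with the largest summed density among itself and its neighbours, keeping its current community on ties. It is an illustration under that assumption, not the authors' implementation; the function name, the adjacency-dict input and the tie tolerance are hypothetical choices.

```python
import random
from collections import defaultdict

def fluid_communities(adj, k, max_iter=100, seed=0):
    """adj: {vertex: set of neighbours} for a connected undirected graph."""
    rng = random.Random(seed)
    vertices = list(adj)
    if not 0 < k <= len(vertices):
        raise ValueError("k must be between 1 and |V|")
    rng.shuffle(vertices)
    # Each community starts at a different random vertex.
    community = {v: i for i, v in enumerate(vertices[:k])}
    size = {i: 1 for i in range(k)}

    for _ in range(max_iter):
        changed = False
        rng.shuffle(vertices)
        for v in vertices:
            # Sum the density d = 1/|c| over v and its neighbours, per community.
            score = defaultdict(float)
            for u in [v, *adj[v]]:
                if u in community:
                    c = community[u]
                    score[c] += 1.0 / size[c]
            if not score:
                continue  # no community has reached this vertex yet
            best = max(score.values())
            candidates = [c for c, s in score.items() if best - s < 1e-9]
            current = community.get(v)
            if current in candidates:
                continue  # ties are resolved in favour of the current community
            new = rng.choice(candidates)
            if current is not None:
                size[current] -= 1
            community[v] = new
            size[new] += 1
            changed = True
        if not changed:
            break  # a superstep without changes means the state is stable

    groups = defaultdict(set)
    for v, c in community.items():
        groups[c].add(v)
    return list(groups.values())

def clique(nodes):
    return {u: {v for v in nodes if v != u} for u in nodes}

# Two 4-cliques joined by a single edge (3, 4).
adj = {**clique(range(0, 4)), **clique(range(4, 8))}
adj[3].add(4)
adj[4].add(3)
communities = fluid_communities(adj, k=2, seed=1)
```

Because the densities of a community's members sum to at most 1, a community holding a single vertex is never outscored, so exactly k non-empty communities are returned. In practice, a library implementation such as `asyn_fluidc` in NetworkX may be preferable to a hand-written version.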

Sentiment analysis

In this section, we take a look at sentiment analysis and the preprocessing steps required for performing sentiment analysis on tweets.

Preprocessing for sentiment analysis

To train the sentiment classifier, we require labeled tweets, which we gather using the method described in Go et al. (2009). In this method, we use emoticons to collect tweets and label them based on the polarity of the emoticon. We assume that tweets containing happy emoticons like ‘:-), :), :D’ are positive and that tweets containing sad emoticons like ‘:-(, :(, =(’ are negative. After gathering the data, we first clean the tweets by converting the text to lowercase and removing all hashtags, retweet designations (‘RT’), usernames and URLs. After this step, we remove all stopwords listed in the NLTK stopword corpus, tokenize the tweets and remove punctuation.
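The cleaning steps above can be sketched as below. This is only an illustration: the small `STOPWORDS` set is a stand-in for the NLTK English stopword list, the regular expressions are assumptions, and whitespace splitting is used as a simple tokenizer.

```python
import re
import string

# Assumption: a tiny stand-in for the NLTK English stopword list.
STOPWORDS = {"a", "an", "the", "is", "to", "and", "of", "in", "i", "my"}

def preprocess(tweet):
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # URLs
    text = re.sub(r"[@#]\w+", " ", text)       # usernames and hashtags
    text = re.sub(r"\brt\b", " ", text)        # retweet designation
    # Strip punctuation, then split on whitespace as a simple tokenizer.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOPWORDS]

tokens = preprocess("RT @user: I love the new phone! #happy https://t.co/abc")
print(tokens)
```

Note that stripping punctuation also removes the emoticons that were used for labeling, so the classifier cannot simply learn the label from the emoticon itself.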

Sentiment analysis algorithm

After preprocessing, we need to extract features from our tweets; for that, we apply TF-IDF vectorization [Eq. (4)] (Singh and Shashi 2019). In Eq. (2), \(f_{t,d}\) is the raw count of term t in document d. In Eq. (3), N is the total number of documents, i.e., |D|. The resulting features are passed to a multinomial Naive Bayes classifier (Kibriya et al. 2004), which classifies tweets as positive or negative.

$$\begin{aligned}&\mathrm {tf}(t,d) = \frac{f_{t,d}}{{\sum _{t' \in d}{f_{t',d}}}} \end{aligned}$$
(2)
$$\begin{aligned}&\mathrm {idf}(t, D) = \log \frac{N}{|\{d \in D: t \in d\}|} \end{aligned}$$
(3)
$$\begin{aligned}&\mathrm {tfidf}(t,d,D) = \mathrm {tf}(t,d) \cdot \mathrm {idf}(t, D) \end{aligned}$$
(4)
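Eqs. (2) to (4) can be evaluated directly on a toy corpus, as in the sketch below. The natural logarithm is an assumption, since the equations do not fix the base, and the three token lists are invented for illustration.

```python
import math
from collections import Counter

def tf(term, doc):
    # Eq. (2): count of the term divided by the number of tokens in the document.
    return Counter(doc)[term] / len(doc)

def idf(term, docs):
    # Eq. (3): log of N over the number of documents that contain the term.
    containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / containing)

def tfidf(term, doc, docs):
    # Eq. (4): product of the two quantities above.
    return tf(term, doc) * idf(term, docs)

docs = [["love", "new", "phone"], ["hate", "new", "phone"], ["love", "summer"]]
common = tfidf("new", docs[0], docs)    # (1/3) * log(3/2)
rare = tfidf("summer", docs[2], docs)   # (1/2) * log(3)
```

A term that appears in fewer documents, such as "summer" here, receives a higher weight than one spread across the corpus. The `idf` function as written fails for a term that occurs in no document, so it should only be called on terms from the corpus vocabulary.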

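The Naive Bayes classifier is based on Bayes' theorem (Anthony 2007), given in Eq. (5), where s is a sentiment and M is a Twitter message. Because our training set contains equal numbers of positive and negative tweets, the prior P(s) is the same for both sentiments and can be dropped as a constant factor, which simplifies the equation to Eqs. (6) and (7):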

$$\begin{aligned}&P(s|M) = \frac{P(s) \cdot P(M|s)}{P(M)} \end{aligned}$$
(5)
$$\begin{aligned}&P(s|M) = \frac{P(M|s)}{P(M)} \end{aligned}$$
(6)
$$\begin{aligned}&P(s|M) \sim P(M|s) \end{aligned}$$
(7)
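The decision rule implied by Eq. (7) can be illustrated with a small multinomial Naive Bayes sketch. It is a simplification under stated assumptions: it works on raw token counts rather than TF-IDF features, uses add-one smoothing (not mentioned in the text) to avoid zero probabilities, and the training examples are invented.

```python
import math
from collections import Counter

def train(labelled_docs):
    """labelled_docs: list of (tokens, label). Returns per-label word counts and the vocabulary."""
    counts = {}
    for tokens, label in labelled_docs:
        counts.setdefault(label, Counter()).update(tokens)
    vocab = {w for c in counts.values() for w in c}
    return counts, vocab

def log_likelihood(tokens, label, counts, vocab):
    # log P(M|s) under the naive independence assumption, with add-one smoothing.
    total = sum(counts[label].values())
    return sum(
        math.log((counts[label][w] + 1) / (total + len(vocab)))
        for w in tokens
        if w in vocab
    )

def classify(tokens, counts, vocab):
    # Eq. (7): with equal priors, pick the sentiment s that maximises P(M|s).
    return max(counts, key=lambda s: log_likelihood(tokens, s, counts, vocab))

data = [
    (["love", "new", "phone"], "positive"),
    (["great", "day"], "positive"),
    (["hate", "new", "phone"], "negative"),
    (["awful", "day"], "negative"),
]
counts, vocab = train(data)
print(classify(["love", "day"], counts, vocab))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.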

After being trained on our training data, the sentiment classifier achieves an accuracy of 77%. Its F1 score is 76% for positive labels and 78% for negative labels.

Overall topic sentiment classifier

Fig. 1 Architecture for overall topic sentiment classifier model

Figure 1 gives a general overview of our architecture. The first step involves collecting data and preprocessing it, both for training our sentiment classifier and for generating a hashmap based on a general topic G (see Sect. 3.1.1). After preprocessing, we use the hashmap to detect communities with the fluid communities algorithm, which takes a parameter k that determines how many communities are formed (Parés et al. 2017). By default, we divide our hashmap into ten communities (i.e., \(k=10\)). This hyperparameter can be fine-tuned based on our requirements. After finding the communities and training the classifier, the trained sentiment classifier and the detected communities (C) are passed on for analyzing the overall sentiments of related topics. In this step, we pass a topic t that is used along with C to find related topics and perform sentiment analysis on those topics.

Analyzing overall sentiments

This module takes a general topic G, the detected communities C, the hashmap from the previous step and a topic t. It first uses the hashmap and t to find the most valued directly related topics \(R_t\): we look at the neighboring nodes of t and pick at most ten topics with the highest weight. We then apply Eq. (8) to find a suitable community (\(S_t\)) for t.

$$\begin{aligned} S_t= & {} \max (R_t \cap c)\nonumber \\&\quad \forall c \in C \end{aligned}$$
(8)

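In Eq. (8), C is the set of communities detected by the FluidC algorithm, and \(R_t\) is the set of directly connected topics selected using the hashmap and topic t. In other words, Eq. (8) selects as \(S_t\) the community that shares the most topics with \(R_t\).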
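The selection of \(R_t\) and \(S_t\) can be sketched as follows, reading Eq. (8) as choosing the community with the largest overlap with \(R_t\). The function names, the edge-dictionary format of the hashmap and the toy data are assumptions made for illustration.

```python
def related_topics(hashmap, t, limit=10):
    """R_t: up to `limit` neighbours of t with the highest edge weight."""
    weights = {}
    for (a, b), w in hashmap.items():
        if t == a:
            weights[b] = w
        elif t == b:
            weights[a] = w
    # Sort by descending weight; break ties alphabetically for reproducibility.
    ranked = sorted(weights, key=lambda tag: (-weights[tag], tag))
    return set(ranked[:limit])

def suitable_community(r_t, communities):
    """S_t: the community sharing the most topics with R_t (one reading of Eq. 8)."""
    return max(communities, key=lambda c: len(r_t & c))

# Hypothetical hashmap: undirected edges keyed by a hashtag pair, valued by weight.
hashmap = {
    ("shopping", "summer"): 5,
    ("beach", "summer"): 3,
    ("summer", "swimsuit"): 2,
    ("politics", "summer"): 1,
}
communities = [
    {"summer", "beach", "swimsuit", "ootd"},
    {"shopping", "politics", "news"},
]
r_t = related_topics(hashmap, "summer", limit=3)
s_t = suitable_community(r_t, communities)
```

Here the first community is chosen because it contains two of the three related topics, even though the single strongest neighbour, shopping, lies in the other community.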

$$\begin{aligned} Q_f = \frac{w * d}{w + d} \end{aligned}$$
(9)

After this step, we apply Eq. (9) to all the nodes within \(S_t\) and pick the 10% of nodes with the highest quality factor \(Q_f\). This ensures that the topics we pick from a community are of high quality, i.e., that they have a high degree and a high combined weight on their adjacent edges. In Eq. (9), w is the total weight of the edges between the node and its adjacent nodes, and d is the degree of the node under consideration. We pass the selected topics, along with a value n, to our sentiment classifier; n denotes the number of tweets to fetch for each topic for sentiment analysis. To demonstrate, we take \(n=1000\) and use the Twitter API to fetch tweets for the selected topics. The sentiment classifier then classifies the sentiment of each tweet. To keep track of the overall sentiment, we initialize T to 0, increment it by 1 for every positive tweet and decrement it by 1 for every negative tweet. To normalize, we divide T by n. The result lies in the range -1 to 1, where 1 denotes that every tweet encountered is positive and -1 denotes that every tweet encountered is negative. The greater the output, the more positive the overall sentiment.
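A short sketch of the quality factor of Eq. (9) and of the normalized score T/n is given below. The function names, the rounding rule for the 10% cut (round down, keep at least one topic) and the example values are assumptions; fetching tweets through the Twitter API is left out.

```python
def quality_factor(w, d):
    # Eq. (9): w = total weight of the node's edges, d = degree of the node.
    return (w * d) / (w + d)

def top_topics(node_stats, percent=10):
    """node_stats: {topic: (w, d)} for the nodes of S_t. Returns the top share by Q_f."""
    ranked = sorted(
        node_stats, key=lambda t: quality_factor(*node_stats[t]), reverse=True
    )
    # Assumption: round down, but always keep at least one topic.
    keep = max(1, len(ranked) * percent // 100)
    return ranked[:keep]

def overall_sentiment(labels):
    """labels: one 'positive' or 'negative' label per fetched tweet. Returns T / n."""
    T = sum(1 if label == "positive" else -1 for label in labels)
    return T / len(labels)

stats = {"ootd": (20, 5), "beach": (12, 4), "swimsuit": (6, 3)}
selected = top_topics(stats)
score = overall_sentiment(["positive", "positive", "positive", "negative"])
```

With three positive tweets and one negative tweet, T is 2 and the normalized score is 0.5, which sits halfway between neutral and fully positive.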

Fig. 2 Resultant closely related topics for \(G=\) #shopping and \(t=\) #summer (\(k=10\), \(k=15\) and \(k=20\))

Fig. 3 Resultant closely related topics for \(G=\) #covid19 and \(t=\) #vaccine (\(k=10\), \(k=15\) and \(k=20\))

Table 1 Overall sentiment table for G = ‘#shopping,’ t = ‘#summer’ and G = ‘#covid19,’ t = ‘#vaccine’

Results

Twitter users post messages about a range of topics, unlike sites that are designed for a specific topic. Users use hashtags (#) to mark the topics a tweet talks about or is related to (Kumar and Sebastian 2012). We propose an OTSC model that uses this feature of Twitter to build a hashmap of a general topic G and uses this hashmap to find topics closely related to a given topic t. Furthermore, it finds the overall sentiments of the top 10% of the found topics. We apply our proposed OTSC model to G = #shopping and t = #summer with k set to 10, 15 and 20. Similarly, we apply our model to G = #covid19 and t = #vaccine, G = #politics and t = #issues and G = #electricvehicles and t = #tesla with k set to 10, 15 and 20.

The found topics are shown in Figs. 2, 3, 4 and 5. In each of these figures, we added three nodes: k10, k15 and k20. All the topics connected to k10 are found when \(k=10\); similarly, k15 corresponds to topics found when \(k=15\) and k20 to topics found when \(k=20\). The sentiments of the corresponding topics are listed in Tables 1 and 2. In Fig. 2, node k10 is connected with 18 topics, k15 is connected with 8 topics, and k20 is connected with 6 topics. k10 and k15 have 1 topic in common, k10 and k20 have 1 topic in common, whereas k15 and k20 have 2 topics in common. In Fig. 3, node k10 is connected with 12 topics, k15 is connected with 10 topics, and k20 is connected with 8 topics. k10 and k15 have 4 topics in common, k10 and k20 have 4 topics in common, whereas k15 and k20 have 7 topics in common. In Fig. 4, node k10 is connected with 16 topics, k15 is connected with 10 topics, and k20 is connected with 7 topics. k10 and k15 have 3 topics in common, k10 and k20 have 1 topic in common, whereas k15 and k20 have 1 topic in common. Finally, in Fig. 5, node k10 is connected with 17 topics, k15 is connected with 11 topics and k20 is connected with 9 topics. k10 and k15 have 10 topics in common, k10 and k20 have 8 topics in common, whereas k15 and k20 have 7 topics in common.

We find that most topics are viewed positively in Fig. 2 and most topics are viewed negatively in Fig. 3 (see Table 1). In Fig. 4, some topics are positive and some are negative, whereas in Fig. 5 most topics are positive and a few are negative (see Table 2).

Fig. 4 Resultant closely related topics for G = #politics and t = #issues (\(k=10\), \(k=15\) and \(k=20\))

Fig. 5 Resultant closely related topics for G = #electricvehicles and t = #tesla (\(k=10\), \(k=15\) and \(k=20\))

Table 2 Overall sentiment table for G = ‘#politics,’ t = ‘#issues’ and G = ‘#electricvehicles’ and t = ‘#tesla’

Discussion

It is important to understand emerging trends and public opinion about them to make informed decisions and gain actionable insights (Saura 2020; Alsini et al. 2018). These trends can be used for marketing analysis, as done by Reyes-Menendez et al. (2020) to understand the implications of an emerging trend and how advertisements can be designed with it in mind. Similarly, much research has been done to analyze emerging trends and their marketing implications (Iyengar et al. 2011; Alalwan et al. 2017; Saura et al. 2019; Kim et al. 2013; Boon-Itt and Skunkan 2020; Reyes-Menendez et al. 2018; Lorenz-Spreen et al. 2018; Chandrasekaran et al. 2020). We can also observe governments changing policy to incorporate and promote sustainable development (Li et al. 2016; Hishan et al. 2019; Elkerbout et al. 2020).

From this research, we can understand the importance of analyzing emerging trends. Most of the work done to find topics is based on latent Dirichlet allocation (LDA) (Saura and Bennett 2019), which works well but fails to incorporate hashtags, which are primarily used for marking topics in a tweet (Kumar and Sebastian 2012). Furthermore, LDA provides no means of changing the generality of the found topics. To include these features, we use the fluid communities algorithm, which finds k communities and thereby lets us change the granularity of the resultant communities (Parés et al. 2017).

We applied our model to find trends in the following (G, t) pairs: (#shopping, #summer), (#covid19, #vaccine), (#politics, #issues) and (#electricvehicles, #tesla) for \(k=10\), 15 and 20. In each case, we found the maximum number of topics for \(k=10\) and the minimum number of topics for \(k=20\). From this, we can infer that the found topics tend to be more specific as we increase k (the number of communities), because the number of nodes in individual communities shrinks (Parés et al. 2017).

In Fig. 2, we found some interesting topics such as #ootd, which stands for outfit of the day; its sentiment (see Table 1) is among the highest. This provides a potential advertising keyword and technique for marketing agencies, which is also supported by Dar and Tariq (2021). Emerging topics such as swimsuit, beach, accessories and handmade, all with above-average sentiment, suggest that people may tend to buy items related to these topics. This also opens up business opportunities in accessories, handmade products, bags and custom designs made using zazzle. Topics such as love, cute, shopsmall, weship, mothersday and retail therapy can be used by marketing agencies to promote their products, as they correspond to positive public opinion. That said, the keyword ‘retail therapy’ should be used with care, as its sentiment is below average.

For topics relating to (#covid19, #vaccine), we found a generally negative sentiment, particularly for #vaccine, which might point toward vaccine hesitancy. Rosenbaum (2021) suggests that about 31% of Americans wish to take a wait-and-see approach and about 20% remain quite reluctant. Another reason for the negative sentiment might be that several European nations are suspending the use of the AstraZeneca covid-19 vaccine (Mahase 2021). Furthermore, a negative sentiment for the topic patients might indicate an increasing number of covid patients. A positive travel sentiment paired with #yeg (Edmonton International Airport) and #yyc (Calgary International Airport) might suggest an easing of lockdown and a possible investment opportunity in the travel sector (Nayak et al. 2021). A positive sentiment for #health suggests that people are health conscious, which provides opportunities related to healthcare and organic products (Tandon et al. 2020). A neutral sentiment for #wearamask suggests that many people have a negative sentiment about it. Our results match similar research which states that, among 4099 respondents, only 53.3% of symptomatic participants reported wearing a mask in the preceding week and about 62% of people without symptoms did not wear a mask in the prior week (Egan et al. 2021), pointing toward a need for education about the importance of masks.

Results from (#politics, #issues) give us a list of pressing issues such as racism, shootings and free speech. They also point to anti-Asian hate (#stopasianhatecrimes), which recently emerged during the covid19 pandemic (Xu et al. 2021). Additionally, a positive viewpoint on OSN shows that people tend to view it positively, suggesting that schemes promoting equality might be viewed positively (Reyes-Menendez et al. 2020). Podcasts being connected to all three nodes (k10, k15 and k20) might suggest that people often use podcasts to listen to or share opinions on politics and pressing issues. Advertisers might want to use that medium to advertise related content.

Finally, Fig. 5 is one of the graphs with the most overlap among the topics we explored. These overlaps mostly point toward competitors of tesla such as renault, bmw, daimler, volvo and vw (volkswagen). Looking at the sentiments, we can infer that tesla might be one of the most positively viewed vehicles in the electric vehicle domain (Thomas and Maine 2019). Furthermore, the occurrence of topics such as stocks and stockmarket indicates that people who talk about tesla or electric vehicles might also be part of the investment community. A positive sentiment for #climateactionnow and #eugreendeal indicates that people are optimistic about sustainable alternatives (O’Riordan 2004), which can be taken into account when entering a business domain or creating an advert. It also indicates that government work on related policies might be viewed positively (Li et al. 2016; Hishan et al. 2019; O’Riordan 2004).

It is also important to note that, in most cases, overlapping topics are of greater importance than non-overlapping topics, e.g., beach, swimsuit and accessories in Fig. 2; vaccine, health and patients in Fig. 3; podcasts and government in Fig. 4; and automakers such as renault, bmw, daimler, volvo and volkswagen in Fig. 5.

Conclusion

Analysis of emerging trends is important to both businesses and policy makers. In this paper, we propose an OTSC model (Fig. 1) that can be used to get emerging trends along with their public sentiment based on tweets. We propose using the fluid communities algorithm instead of the commonly used LDA for finding related topics on Twitter. This is because LDA fails to incorporate Twitter's intrinsic features such as hashtags and does not provide a parameter for changing the generality of the found topics. With the help of the fluid communities algorithm, our model is capable of changing the granularity of the underlying communities, thereby changing the generality of the found topics. We demonstrate this by applying our model to the following (G, t) pairs: (1) (#shopping, #summer), (2) (#covid19, #vaccine), (3) (#politics, #issues) and (4) (#electricvehicles, #tesla) for \(k=10,15\) and 20. The resulting topics are presented in Figs. 2, 3, 4 and 5, and their corresponding sentiments are in Tables 1 and 2. We found that the number of topics found was highest for \(k=10\) and lowest for \(k=20\), indicating that as the number of communities increases, the topics tend to be more specific and thereby less general. We further analyzed our results in the context of business analysis and policy analysis and found that we can answer questions about emerging trends, positively viewed keywords, key competitors, pressing and emerging issues and the general sentiment around a topic. This information can then be used to make informed business and political decisions.

Managerial implications

Discovering emerging trends and issues has several applications in business analysis and management. Our research proposes an OTSC model to discover emerging trends and analyze public sentiment about them. Emerging trends can be used to understand the broader market picture, potentially helping with business and managerial changes. One example we discussed is emerging trends for shopping in summer, which helps us understand that people are more positive about handmade products, swimwear and potentially fitness and sports products during summer. Furthermore, we can identify declining trends, such as sweater, relative to other trends for the same query. A closer look at electric vehicles and tesla shows that clean energy is viewed positively. One can apply this knowledge to create a positive customer/client outlook by promoting an environmentally friendly approach. Given this information, it is important to understand how it might affect the business, and prior knowledge of the domain is required for this. Prior knowledge is also necessary to separate, among the found topics, those relevant for business use from keywords suited to adverts. Our model gives a broader view of the subject, but to make decisions, one needs to understand the specifics of the topic of interest while keeping its broader implications in mind.

Practical/social implications for marketers

Interactivity on the internet shifts the ways in which users perceive advertising. This research has practical implications for how advertisers can use interactions among users to identify keywords that might improve the perception of their adverts. For instance, we found ootd, love, cute, retailtherapy and shopsmall for shopping in summer. Based on user interactions, our model might also suggest relevant advertising channels (e.g., podcasts for political advertisements), as found in Table 2. In addition, the sentiment associated with a keyword can help advertisers understand how that keyword might affect an advertisement. For instance, retailtherapy has a below-average sentiment score compared with the other found topics. This may indicate that advertisers should use this keyword with caution and that a better understanding of its usage might be required.
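The below-average check described above can be sketched in a few lines. The scores here are placeholder values chosen for illustration only, not the values reported in Tables 1 and 2, and the helper name `below_average` and the assumed score range are hypothetical.

```python
from statistics import mean

# Placeholder sentiment scores, assumed to lie in [-1, 1].
# These are NOT the values reported in Tables 1 and 2.
keyword_sentiment = {
    "ootd": 0.42,
    "love": 0.55,
    "cute": 0.48,
    "retailtherapy": 0.12,
    "shopsmall": 0.40,
}


def below_average(scores):
    """Return the mean score and the keywords scoring strictly below it."""
    avg = mean(scores.values())
    flagged = sorted(k for k, s in scores.items() if s < avg)
    return avg, flagged


avg, flagged = below_average(keyword_sentiment)
print(f"mean sentiment: {avg:.3f}")
print("keywords to use with caution:", flagged)
```

A keyword flagged this way is only a candidate for closer inspection; whether it should be avoided in an advert still depends on domain knowledge.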

Future work and limitations

Our proposed model uses the fluid community detection algorithm, which does not always return the same result on each run (Sun et al. 2020). Finding a suitable value of k requires trial and error, and results may vary between runs. We hand-picked the topics for analysis, which might introduce some bias. Furthermore, the topics found are open to interpretation and hence subjective. In future, we can try an iterative model that uses OTSC as a base and applies it for different values of k. This work can also be extended to the analysis of different trends, similar to the following works (Boon-Itt and Skunkan 2020; Reyes-Menendez et al. 2018; Lorenz-Spreen et al. 2018; Chandrasekaran et al. 2020), and to better understand the importance of overlapping topics for different values of k.
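The run-to-run variability and the idea of sweeping over k can be illustrated with a small sketch. It assumes NetworkX's `asyn_fluidc` as a stand-in for the fluid community detection step (the implementation used in this study may differ) and a toy hashtag co-occurrence graph whose edges are invented for illustration. The helper `run_fluid` and the small k values are likewise assumptions.

```python
import networkx as nx
from networkx.algorithms.community import asyn_fluidc

# Toy hashtag co-occurrence edges (illustrative, not the paper's data).
# asyn_fluidc requires a connected graph, which this edge list provides.
edges = [
    ("tesla", "electricvehicles"), ("tesla", "stocks"),
    ("stocks", "stockmarket"), ("tesla", "stockmarket"),
    ("electricvehicles", "bmw"), ("electricvehicles", "volvo"),
    ("bmw", "volvo"), ("electricvehicles", "climateactionnow"),
    ("climateactionnow", "eugreendeal"), ("eugreendeal", "electricvehicles"),
]
G = nx.Graph(edges)


def run_fluid(graph, k, seed):
    """Run fluid communities once and return a canonical, sorted partition."""
    return sorted(sorted(c) for c in asyn_fluidc(graph, k, seed=seed))


results = {}
for k in (2, 3, 4):
    # A fixed seed makes one run repeatable; different seeds may disagree.
    results[k] = [run_fluid(G, k, seed) for seed in range(5)]
    distinct = {tuple(map(tuple, part)) for part in results[k]}
    print(f"k={k}: {len(distinct)} distinct partition(s) across 5 seeds")
```

Counting the distinct partitions per k gives a rough sense of how stable the result is, which could guide the trial-and-error choice of k.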

References

  • Alalwan AA, Rana NP, Dwivedi YK, Algharabat R (2017) Social media in marketing: a review and analysis of the existing literature. Telemat Inform 34(7):1177–1190

  • Alamsyah A, Rahardjo B et al (2021) Social network analysis taxonomy based on graph representation. ArXiv preprint arXiv:2102.08888

  • Alsini A, Datta A, Huynh DQ, Li J (2018) Community aware personalized hashtag recommendation in social networks. In: Australasian conference on data mining. Springer, pp 216–227

  • Ansari MZ, Aziz M, Siddiqui M, Mehra H, Singh K (2020) Analysis of political sentiment orientations on twitter. Proced Comput Sci 167:1821–1828

  • Anthony JH (2007) Probability and statistics for engineers and scientists. Thomson Brooks/Cole

  • Bello-Orgaz G, Mesas RM, Zarco C, Rodriguez V, Cordón O, Camacho D (2020) Marketing analysis of wineries using social collective behavior from users’ temporal activity on twitter. Inf Process Manag 57(5):102220

  • Berlingerio M, Coscia M, Giannotti F, Monreale A, Pedreschi D (2013) Multidimensional networks: foundations of structural analysis. World Wide Web 16(5–6):567–593

  • Bhatnagar S, Dixit M, Prasad N (2020) A review of common approaches to sentiment analysis and community detection. Int J Comput Appl 975:8887

  • Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech: Theory Exp 10:P10008

  • Boon-Itt S, Skunkan Y (2020) Public perception of the covid-19 pandemic on twitter: sentiment analysis and topic modeling study. JMIR Public Health Surveill 6(4):e21978

  • Chandrasekaran R, Mehta V, Valkunde T, Moustakas E (2020) Topics, trends, and sentiments of tweets about the covid-19 pandemic: temporal infoveillance study. J Med Internet Res 22(10):e22624

  • Chevalier-Roignant B, Trigeorgis L (2012) Competitive strategy: options and games, vol. 1, 1 edn. MIT Press. https://EconPapers.repec.org/RePEc:mtp:titles:0262015994

  • Chunaev P (2020) Community detection in node-attributed social networks: a survey. Comput Sci Rev 37:100286

  • Dar TM, Tariq N (2021) Celebrities and influencers: have they changed the game of online marketing? Eur J Bus Manag Res 6(1):106–111

  • Davenport T (2014) Big data at work: dispelling the myths, uncovering the opportunities. Harvard Business Review Press

  • Davidov D, Tsur O, Rappoport A (2010) Enhanced sentiment learning using twitter hashtags and smileys. In: Coling 2010: Posters, pp 241–249

  • Deitrick W, Hu W (2013) Mutually enhancing community detection and sentiment analysis on twitter networks. J Data Anal Inf Process 1(3):19–29. https://doi.org/10.4236/jdaip.2013.13004

  • Diamantini C, Mircoli A, Potena D, Storti E (2019) Social information discovery enhanced by sentiment analysis techniques. Fut Gener Comput Syst 95:816–828

  • Direction S (2021) Firm capacity to manage new trends: business model innovation can increase resilience. Strateg Dir 37(4):15–18. https://doi.org/10.1108/SD-01-2021-0009

  • Egan M, Acharya A, Sounderajah V, Xu Y, Mottershaw A, Phillips R, Ashrafian H, Darzi A (2021) Evaluating the effect of infographics on public recall, sentiment and willingness to use face masks during the covid-19 pandemic: a randomised internet-based questionnaire study. BMC Public Health 21(1):1–10

  • Elkerbout M, Egenhofer C, Núñez Ferrer J, Catuti M, Kustova I, Rizos V et al (2020) The European green deal after corona-implications for EU climate policy. No. 26869. Centre Eur Policy Stud

  • Fortunato S (2018) Community structure in complex networks. In: EGC, pp 5–6

  • Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174

  • Go A, Bhayani R, Huang L (2009) Twitter sentiment classification using distant supervision. CS224N Proj Rep, Stanford 1(12):2009

  • Hall D, Lutsey N (2020) Electric vehicle charging guide for cities. Consulting Report; The International Council on Clean Transportation: Washington, DC, USA. Available online: https://theicct.org/publications/city-EV-charging-guide

  • Hassani H, Beneki C, Unger S, Mazinani MT, Yeganegi MR (2020) Text mining in big data analytics. Big Data Cogn Comput 4(1):1

  • Hishan SS, Khan A, Ahmad J, Hassan ZB, Zaman K, Qureshi MI et al (2019) Access to clean technologies, energy, finance, and food: environmental sustainability agenda and its implications on sub-saharan african countries. Environ Sci Pollut Res 26(16):16503–16518

  • Hsieh IYL, Pan MS, Green WH (2020) Transition to electric vehicles in china: implications for private motorization rate and battery market. Energy Policy 144:111654

  • Hu X, Tang L, Tang J, Liu H (2013) Exploiting social relations for sentiment analysis in microblogging. In: Proceedings of the 6th ACM international conference on Web search and data mining, pp 537–546

  • Hubert M, Linzmajer M, Riedl R, Hubert M, Kenning P, Weber B (2017) The use of psycho-physiological interaction analysis with fMRI-data in IS research: a guideline. Commun Assoc Inf Syst (CAIS) 40(9):181–217

  • Iyengar R, Van den Bulte C, Valente TW (2011) Opinion leadership and social contagion in new product diffusion. Mark Sci 30(2):195–212

  • Karataş A, Şahin S (2018) Application areas of community detection: a review. In: 2018 international congress on big data, deep learning and fighting cyber terrorism (IBIGDELFT). IEEE, pp 65–70

  • Kibriya AM, Frank E, Pfahringer B, Holmes G (2004) Multinomial naive bayes for text categorization revisited. In: Australasian joint conference on artificial intelligence. Springer, pp 488–499

  • Kim AE, Hansen HM, Murphy J, Richards AK, Duke J, Allen JA (2013) Methodological considerations in analyzing twitter data. J Natl Cancer Inst Monogr 47:140–146

  • Kontopoulos E, Berberidis C, Dergiades T, Bassiliades N (2013) Ontology-based sentiment analysis of twitter posts. Expert Syst Appl 40(10):4065–4074

  • Kouloumpis E, Wilson T, Moore J (2011) Twitter sentiment analysis: the good the bad and the omg! In: 5th international AAAI conference on weblogs and social media. Citeseer

  • Kulshrestha J, Zafar M, Noboa L, Gummadi K, Ghosh S (2015) Characterizing information diets of social media users. In: Proceedings of the international AAAI conference on web and social media, vol 9

  • Kumar A, Sebastian TM (2012) Sentiment analysis on twitter. Int J Comput Sci Issues (IJCSI) 9(4):372

  • Lăzăroiu G, Neguriţă O, Grecu I, Grecu G, Mitran PC (2020) Consumers’ decision-making process on social commerce platforms: online trust, perceived risk, and purchase intentions. Front Psychol 11:890

  • Leskovec J, Lang KJ, Dasgupta A, Mahoney MW (2008) Statistical properties of community structure in large social and information networks. In: Proceedings of the 17th international conference on World Wide Web, pp 695–704

  • Li Y, Zhan C, de Jong M, Lukszo Z (2016) Business innovation and government regulation for the promotion of electric vehicle use: lessons from Shenzhen, China. J Clean Prod 134:371–383

  • Liu X, Burns AC, Hou Y (2017) An investigation of brand-related user-generated content on twitter. J Advert 46(2):236–247

  • Liu X, Shin H, Burns AC (2019) Examining the impact of luxury brand’s social media marketing on customer engagement: using big data analytics and natural language processing. J Bus Res 125:815–826

  • Lorenz-Spreen P, Wolf F, Braun J, Ghoshal G, Conrad ND, Hövel P (2018) Tracking online topics over time: understanding dynamic hashtag communities. Comput Soc Netw 5(1):1–18

  • Luo L, Liu K, Guo B, Ma J (2020) User interaction-oriented community detection based on cascading analysis. Inf Sci 510:70–88

  • Mahase E (2021) Covid-19: who says rollout of astrazeneca vaccine should continue, as europe divides over safety. BMJ 372:n728. https://doi.org/10.1136/bmj.n728

  • Nayak J, Mishra M, Naik B, Swapnarekha H, Cengiz K, Shanmuganathan V (2021) An impact study of covid-19 on six different industries: automobile, energy and power, agriculture, education, travel and tourism and consumer electronics. Expert Syst 2021:1–32. https://doi.org/10.1111/exsy.12677

  • Newman ME (2006) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E 74(3):036104

  • O’Riordan T (2004) Environmental science, sustainability and politics. Trans Inst Brit Geogr 29(2):234–247

  • Pak A, Paroubek P (2010) Twitter as a corpus for sentiment analysis and opinion mining. LREC 10:1320–1326

  • Parés F, Gasulla DG, Vilalta A, Moreno J, Ayguadé E, Labarta J, Cortés U, Suzumura T (2017) Fluid communities: a competitive, scalable and diverse community detection algorithm. In: International conference on complex networks and their applications. Springer, pp 229–240

  • Pfeffer J, Mayer K, Morstatter F (2018) Tampering with twitter’s sample API. EPJ Data Sci 7(1):50

  • Puthussery A (2020) Digital marketing: an overview. Notion Press

  • Raghavan UN, Albert R, Kumara S (2007) Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E 76(3):036106

  • Read J (2005) Using emoticons to reduce dependency in machine learning techniques for sentiment classification. In: Proceedings of the ACL student research workshop, pp 43–48

  • Reyes-Menendez A, Saura JR, Alvarez-Alonso C (2018) Understanding #worldenvironmentday user opinions in twitter: a topic-based sentiment analysis approach. Int J Environ Res Public Health 15(11):2537

  • Reyes-Menendez A, Saura JR, Filipe F (2020) Marketing challenges in the #metoo era: gaining business insights using an exploratory sentiment analysis. Heliyon 6(3):e03626

  • Rosenbaum L (2021) Escaping catch-22—overcoming covid vaccine hesitancy. New England J Med 384(14):1367–1371

  • Rosvall M, Bergstrom CT (2008) Maps of random walks on complex networks reveal community structure. Proc Natl Acad Sci 105(4):1118–1123

  • Saura JR (2020) Using data sciences in digital marketing: framework, methods, and performance metrics. J Innov Knowl 6(2):92–102

  • Saura JR, Bennett DR (2019) A three-stage method for data text mining: using UGC in business intelligence analysis. Symmetry 11(4):519

  • Saura JR, Reyes-Menendez A, Bennett DR (2019) How to extract meaningful insights from UGC: a knowledge-based method applied to education. Appl Sci 9(21):4603. https://doi.org/10.3390/app9214603

  • Singh AK, Shashi M (2019) Vectorization of text documents for identifying unifiable news articles. Int J Adv Comput Sci Appl 10:305–310

  • Solomon RS, Srinivas P, Das A, Gamback B, Chakraborty T (2019) Understanding the psycho-sociological facets of homophily in social network communities. IEEE Comput Intell Maga 14(2):28–40

  • Sun Z, Sun Y, Chang X, Wang Q, Yan X, Pan Z, Li ZP (2020) Community detection based on the matthew effect. Knowl-Based Syst 205:106256

  • Tandon A, Dhir A, Kaur P, Kushwah S, Salo J (2020) Why do people buy organic food? the moderating role of environmental concerns and trust. J Retail Consum Serv 57:102247

  • Tavakolifard M, Almeroth KC (2012) Social computing: an intersection of recommender systems, trust/reputation systems, and social networks. IEEE Netw 26(4):53–58

  • Thomas V, Maine E (2019) Market entry strategies for electric vehicle start-ups in the automotive industry-lessons from tesla motors. J Clean Prod 235:653–663

  • Wu S, Hofman JM, Mason WA, Watts DJ (2011) Who says what to whom on twitter. In: Proceedings of the 20th international conference on World wide web, pp 705–714

  • Xiao F, Noro T, Tokuda T (2014) Finding news-topic oriented influential twitter users based on topic related hashtag community detection. J Web Eng 13(5 & 6):405–429

  • Xu J, Sun G, Cao W, Fan W, Pan Z, Yao Z, Li H (2021) Stigma, discrimination, and hate crimes in chinese-speaking world amid covid-19 pandemic. Asian J Criminol 16(1):1–24

  • Xu K, Li J, Liao SS (2011) Sentiment community detection in social networks. In: Proceedings of the 2011 iConference, pp 804–805

  • Yadav A, Vishwakarma DK (2020) Sentiment analysis using deep learning architectures: a review. Artif Intell Rev 53(6):4335–4385

  • Zhang Q, Zhang Z, Yang M, Zhu L (2021) Exploring coevolution of emotional contagion and behavior for microblog sentiment analysis: a deep learning architecture. Complexity 2021:10

Author information

Corresponding author

Correspondence to Sarvesh Bhatnagar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bhatnagar, S., Choubey, N. Making sense of tweets using sentiment analysis on closely related topics. Soc. Netw. Anal. Min. 11, 44 (2021). https://doi.org/10.1007/s13278-021-00752-0

  • DOI: https://doi.org/10.1007/s13278-021-00752-0

Keywords

  • Community detection
  • Sentiment analysis
  • Social network analysis
  • Online social networks
  • Trend analysis