1 Introduction

Traditionally, the general public was given little opportunity to take part in public policy decisions made by government agencies, and bureaucrats largely had the final say (Severo et al. 2016). This led to a sharp decline in public confidence in, and reliance on, the government in general and its policies in particular (Hibbing and Theiss-Morse 2001; James and Van Ryzin 2017; Rosenstone et al. 1993; Schneider et al. 2010). The perception gap between the government and the public widened dramatically, and both are currently struggling to maintain a harmonious relationship (Janssen et al. 2018). Electronic-Government (E-Government) is a powerful tool that holds all the essentials for improving the relationship between the general public and the government (Dwivedi et al. 2017a, b; Rana et al. 2016, 2017; Shareef et al. 2016), as it puts strong emphasis on maintaining transparency, enhancing public participation and upgrading the quality of service (Dwivedi et al. 2016). However, the required Information and Communication Technology (ICT) infrastructure and the implementation and operational costs have remained a major bottleneck to implementing e-government in its true sense. Cloud computing is a popular and reliable technology that can provide an effective solution to this issue (Mohammed et al. 2016). Its inbuilt advantages, such as on-demand scalability and pay-as-use pricing, have motivated many countries to adopt cloud computing based e-government (Sharma et al. 2012). However, public participation in the policy making exercise still remains a distant dream. To close this gap, governments all around the world have started using social media to acquire feedback from various sections of society on the effectiveness of the new public policies being formulated (Grubmüller et al. 2013).

Social media can thus overcome limitations of e-government such as low accessibility and limited public participation (Pieterson and Johnson 2011), thereby improving the relationship between the general public and the government. But with more than 2 billion active users on social media (Aladwani and Dwivedi 2018), the volume of data generated is very large (i.e. big data). Traditional systems are not sufficient to handle such a large volume of data because of the high infrastructural requirements for storing and processing it. As discussed above, cloud based e-government can be used to overcome these infrastructure related problems. Earlier studies (Cotterill and King 2007; Dwivedi et al. 2017; Kaigo and Okura 2016; Mossberger et al. 2013) have highlighted the role of social media in e-government, but they did not address the infrastructural barrier to its implementation.

This paper proposes an innovative and smart approach which utilizes the capabilities of two important contemporary technologies, cloud computing and social media analytics, for efficient monitoring and controlling of government initiatives for the general public. The main objective of this research is to envisage the effect of increased public participation in policy making decisions. We have applied our proposed system to a recent policy implemented by the Government of India, the Goods and Services Tax (GST), which was intended to unify all the taxes in the country in a single domain and bring rationalization (Ahmad and Poddar 2009; Garima 2018; Jha 2018; Joseph et al. 2017). The earlier Indian tax system had a complex structure in which both state governments and the central government imposed their own taxes on a specific commodity. Because of the variable tax rates imposed by different state governments, the same commodity was sold at different prices in different states. To overcome this complex problem, multiple solutions had been suggested from time to time. Eventually, in 1999, a proposal was initiated by the council of ministers under the aegis of the then Prime Minister of India, Mr. Atal Bihari Vajpayee (Singh et al. 2019), to standardize the tax structure throughout India. It took another 18 long years for GST to become a reality (Constitution 2017) and finally, on July 1, 2017, GST was implemented pan India. As with every other government policy, it was the common man who faced the initial hardships. Misconceptions and confusion related to its implementation created a faulty image of the policy among the common people, although GST has many benefits in store for them (GST Benefits 2019). The initial hardships and the resolution provided by the Ministry of Finance (MoF) motivated us to use this GST data for experimentation in our proposed system. The results have been observed to be quite encouraging, which establishes that the proposed approach can be instrumental in efficient evaluation of policy making decisions, including future policy formulation.

The paper is structured into eight sections. The second section briefly reviews the literature on cloud based e-government and social media. The third section discusses the proposed methodology. Section four presents the implementation and results. Section five gives a performance analysis of the system, while section six discusses the economic feasibility of the system along with limitations and future research directions. Finally, we conclude in the last section.

2 Review of Literature

The literature review is divided into two subsections. The first highlights the use of cloud computing in e-government, while the second highlights the use of social media for policy making.

2.1 Use of Cloud Computing in E-Government

Governments around the world are promoting multiple services to serve their citizens in a better and more transparent way. E-Government is one such mechanism that enables the government to perform its day to day tasks and provide a better service to its citizens (Mohammed et al. 2016; Zwattendorfer et al. 2013). E-Government is the use of ICT and other web technologies to provide citizens and employees with access to effective, efficient and transparent public services (Jeong 2006; Rana et al. 2013). Although the use of ICT can provide several advantages, the required technical infrastructure, implementation cost and need for skilled staff become major obstacles to E-Government implementation (Rana et al. 2013). With the emergence of cloud computing, these challenges can be addressed to a fair degree of satisfaction for all stakeholders (Mohammed et al. 2016). Cloud computing consists of a large shared pool of computing resources which provides features like on-demand scalability and pay-as-use pricing (Sadiku et al. 2014). These advantages have been a decisive factor in motivating the governments of many countries to migrate from the traditional, costly E-Government model to a cost efficient and scalable cloud based E-Government model (Sharma et al. 2012). As a result of this adoption, cloud computing has become a new channel for delivering improved government services (Liang et al. 2011; Smitha et al. 2012). Cloud based E-Government services provide advantages like reduced operational cost, distributed data storage, scalability and security management (Smitha et al. 2012), which in turn improve the relationship between government and public. Moreover, the cloud based E-Government model is also building a strong foundation for smart cities (Clohessy et al. 2014).

2.2 Use of Social Media for Policy Making

Traditionally, policy making had always been based upon official statistics (data) generated by government agencies and international bodies (Severo et al. 2016). However, decision makers commonly reported flaws in this form of data, such as publication delays, a top down approach and insufficient coverage of topics of interest (Pfeffermann et al. 2015). To overcome this traditional problem of data collection, political scientists and policy makers turned their attention towards social media platforms like Twitter and Facebook to gather more authentic user data in real time. Hence, social media has become an efficient tool for enhancing transparency in government working and increasing communication between citizens and policy makers by bringing out transparent data analytic statistics (Chung and Zeng 2016; Lee and Kwak 2012). Nowadays, social media has become an integral part of everyday life, irrespective of the status of any individual (AlAlwan et al. 2017; Dwivedi et al. 2015; Kapoor et al. 2018; Shiau et al. 2017, 2018). This virtual world provides a platform for people from all around the world to discuss topics of common interest such as sports, entertainment and even politics. At least 33% of social media users comment on, discuss or post about politics on these platforms (Hossain et al. 2018). Governments, too, have realized the potential of social media. Consequently, various government agencies have started using social media platforms to connect and engage with the general public (Aladwani and Dwivedi 2018; Alryalat et al. 2017; Bertot et al. 2012; Rana et al. 2015). As social media helps to increase the interaction between public and government, it facilitates public participation (Ceron and Negri 2016). People often post content about government, politics or policies, intentionally or unintentionally (Hossain et al. 2018), and this content can be utilized by government for formulating more effective public policy as well as designing and delivering better services to its citizens (Androutsopoulou et al. 2018; Joseph et al. 2017; Park et al. 2016).

As shown in the two subsections above, considerable work has been done on cloud computing based e-government and on the use of social media for policy making, each with its own advantages and benefits. Hence, cloud computing based e-government and social media can be considered influential paradigms to be combined for controlling and monitoring public policies. The large number of people engaging in discussions related to public policies on social media sites generates a huge amount of data. There is a need for a cloud based system that can utilize this large data and map useful public opinion in the form of issues, concerns, solutions, proposals, advantages and disadvantages (Androutsopoulou et al. 2018) towards a public policy at an early stage, so that appropriate steps can be taken to satisfy the public for whom the policy has been formulated (Bertot et al. 2012). To the best of our knowledge, no effort has so far been made to combine both these services. Hence, this paper aims to utilize the capabilities of cloud computing and social media analytics for efficient monitoring and controlling of public policies. With this research, we try to answer the following questions that may arise in the minds of readers:

  (a) Can social media act as an effective tool for opinion mining on issues of national importance?

  (b) Can a cloud based environment be successfully employed in implementing a national policy in a cost efficient manner?

3 Research Methodology

The main aim of this study is to combine the capabilities of Cloud Computing and Social Media Analytics for efficient monitoring and controlling of public policies. For this purpose, a cloud based system is proposed. Figure 1 shows the architecture of the proposed system. The proposed system comprises three main components: (a) Data acquisition component, (b) Monitoring component and (c) Controlling component. The system is broken down into these three components to highlight separately the task performed by each. The details of each component are explained as follows:

Fig. 1 Architecture of proposed system

  (a) Data acquisition component: Data acquisition is the first component of our proposed system. This component is responsible for data collection from various social media platforms like Facebook, Twitter, Instagram etc. The data collection task is performed on a continuous basis at specified time intervals. This component works in four phases. Figure 2 gives an overview of these phases. Phase-1 deals with identifying all the keywords (search terms) on the basis of which data will be extracted from various social media sites. Once all the keywords have been identified, phase-2 comes into play. Phase-2 deals with extraction of data (tweets, posts etc.) from various social media websites (Facebook, Twitter, Instagram etc.). Phase-2 returns data in unstructured JSON format, so in phase-3 this JSON data is converted into a structured format (Excel format). In phase-4 this Excel file is stored in a cloud database so that computations can be performed effectively. The details of each phase are explained in Section 4.

    Fig. 2 Different phases of data acquisition component

  (b) Monitoring Component: This component is responsible for performing the monitoring activities. These activities are performed by applying various social media analytics techniques to the user content in order to extract important decision making information (Stieglitz and Dang-Xuan 2013a). Figure 3 shows the various social media analytics techniques. Generally, social media analytics is divided into four broad categories, namely descriptive analysis, content analysis, network analysis and geospatial analysis (Singh et al. 2018a). Descriptive analysis gives descriptive statistics about the number of tweets, number of tweet senders, hashtags, @mentions, word cloud etc. (Chae 2015; Ganis and Kohirkar 2015; McNaught and Lam 2010; Purohit et al. 2013). Content analysis deals with obtaining semantic content from the given text. This is achieved using techniques like sentiment analysis, topic modeling etc. (Kassarjian 1977; Kayser and Blind 2017; Llewellyn et al. 2015; Zhang et al. 2016). Network analysis allows us to identify different communities and clusters of users based upon their opinion towards an entity (Abascal-Mena et al. 2015; Herdağdelen et al. 2013; Stieglitz and Dang-Xuan 2013a, b). Finally, geospatial analysis deals with location based analysis and time series analysis (Attu and Terras 2017; Saboo et al. 2016; Singh et al. 2018b; Stephens and Poorthuis 2015). All these techniques give different results that are useful for appropriate decision making, and these results are again stored in the cloud database.

    Fig. 3 Different Social Media Analytics Techniques (Source: Adapted from Grover et al. 2017)

  (c) Controlling Component: This component is responsible for decision making based upon the results generated by the monitoring component and for providing appropriate alerts to the government. The alerts are generated periodically, and the time interval can be fixed anywhere from hours to days or even weeks and months. In addition, it sends the computed information to the government so that appropriate control measures can be taken based upon the results generated by the monitoring component.

4 Implementation and Results

Considering its numerous advantages, the cloud forms the backbone of our proposed system. Although the system can be implemented without cloud computing, the various features associated with a cloud environment make a cloud based system more versatile, which motivated us to integrate the cloud within our proposed system (Buyya et al. 2013; IBM 2019). The benefits associated with cloud computing are explained in Table 1.

Table 1 Benefits of cloud computing (Buyya et al. 2013, IBM 2019)

For implementation on the cloud we have used Amazon EC2 (AWS Amazon-EC2 2017), while for performing computations on the cloud we have used RStudio Server. There was no specific reason for selecting Amazon EC2 for implementation; any alternative cloud service provider like Microsoft Azure (2018) or Google Cloud (Google Cloud 2018) could have been used. The other integral part of our system is data from social media sites. All social media sites provide authorized APIs (Application Programming Interfaces) for fetching data from these sites in an authenticated manner. In order to fetch data from these social media sites, we need to identify the keywords on which the search operation will be based. This section is divided into various sub-sections, each depicting a specific operation.

4.1 Data Collection

Data for experimentation has been collected from Twitter. Although the proposed system (See Section 3) is capable of using multiple social media platforms for data retrieval, only Twitter data is used for this experimentation. As discussed above, data was fetched based upon specific #hashtags (#GST, #GSTForNewIndia and #OneNationOneTax). For identification of #hashtags, a team of three independent experts was constituted. These experts shortlisted #hashtags that were linked with GST. The #hashtags identified in common by all three experts were selected for the tweet fetching operation (Singh et al. 2018c). The Twitter API used for fetching tweets provides various search parameters, such as the language of tweets, the type of tweets (original, re-tweets or both) and the date range over which the fetching operation needs to be performed. Using these search parameters, we performed the fetching operation in a more targeted manner.

In total, 41,823 tweets were collected over a span of 24 days, from June 23, 2017 to July 16, 2017. Since the data was fetched using APIs, it was in unstructured form, i.e. JSON format. In order to use this data for further analysis, it was necessary to convert it into structured data, i.e. Excel format. Table 2 shows the various attributes that were fetched during data collection from the Twitter API.

Table 2 Various attributes fetched during data collection
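The conversion from JSON to a tabular format described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the field names (`id`, `created_at`, `user.screen_name`, `lang`, `text`) are assumed for the example and do not reproduce the attribute list of Table 2, and CSV is used here as a simple stand-in for the Excel format.

```python
import csv
import io
import json

# Hypothetical column names chosen for illustration only.
COLUMNS = ["id", "created_at", "user", "lang", "text"]

def flatten_tweet(tweet: dict) -> dict:
    """Map one nested tweet object to a flat row with a fixed set of columns."""
    return {
        "id": tweet.get("id"),
        "created_at": tweet.get("created_at"),
        "user": tweet.get("user", {}).get("screen_name"),
        "lang": tweet.get("lang"),
        "text": tweet.get("text", "").replace("\n", " "),
    }

def tweets_to_csv(raw_json: str) -> str:
    """Convert a JSON array of tweet objects into CSV text."""
    rows = [flatten_tweet(t) for t in json.loads(raw_json)]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Made-up sample record.
sample = json.dumps([
    {"id": 1, "created_at": "2017-07-01", "lang": "en",
     "user": {"screen_name": "alice"}, "text": "#GST rollout\ntoday"},
])
csv_text = tweets_to_csv(sample)
```

Flattening to a fixed set of columns keeps every row the same shape, which is what makes the later descriptive and content analyses straightforward to run.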

4.2 Tweet Statistics

In total, 41,823 tweets were collected from 35,400 different users from India. Of these, 2873 users posted more than one tweet, contributing 6423 tweets beyond their first (41,823 − 35,400 = 6423). For better interpretation of the results, the tweet collection period was broken into three phases (Pre-GST, In-GST and Post-GST) of eight days each. The breakup of these phases is shown in Fig. 4, while Fig. 5 shows daily tweet collection.

Fig. 4 Tweet collection details

Fig. 5 Daily tweet collection

The shaded part (orange color) in Fig. 5 indicates the In-GST period, when Twitter traffic was highest compared to the other two phases. The details of tweet statistics are given in Table 3.

Table 3 Tweet statistics

Inference:

These statistics help the government in understanding the scale and impact of the issues. The data clearly indicates that the social engagement created by GST was on a large scale. As evident from the data, the number of tweets increased by 45% from phase-1 to phase-2, indicating that the issue gained prominence and a larger number of people joined the conversation to put forward their views regarding this policy. This acts as an indicator for the government to look into such scenarios and act accordingly.

Similarly, the average tweets per sender are 1.11, 1.25 and 1.15 for the three phases respectively. This indicates that the collected data is evenly distributed across senders and not biased towards a few highly active users. If the average tweets per sender were higher, the data could be biased and would need to be normalized. We can normalize the data by using a technique similar to that used by Bornmann and Haunschild (2016). We can also restrict the number of tweets to one per user by adopting a technique similar to that used by Singh et al. (2017). Since the average tweets per sender is close to 1, neither of these techniques was required in these experiments.
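The average tweets per sender, and the option of keeping one tweet per user, can be illustrated with a short sketch. The toy data and the function names are assumptions made for this example; the first-tweet rule is a simple stand-in and is not claimed to be the technique of Singh et al. (2017). The last line uses only the overall counts reported above.

```python
from collections import Counter

def tweets_per_sender(senders):
    """Return (total tweets, distinct senders, average tweets per sender)."""
    counts = Counter(senders)
    total = sum(counts.values())
    return total, len(counts), total / len(counts)

def one_tweet_per_sender(tweets):
    """Keep only the first tweet seen from each sender."""
    seen = set()
    kept = []
    for sender, text in tweets:
        if sender not in seen:
            seen.add(sender)
            kept.append((sender, text))
    return kept

# Toy data, made up for illustration: (sender, text) pairs.
tweets = [("u1", "a"), ("u2", "b"), ("u1", "c"), ("u3", "d")]
total, senders, average = tweets_per_sender(s for s, _ in tweets)
deduplicated = one_tweet_per_sender(tweets)

# Overall average from the reported totals: 41,823 tweets by 35,400 users.
overall_average = 41823 / 35400
```

With the reported totals, the overall average is about 1.18 tweets per sender, which lies within the range of the three phase-wise values.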

4.3 (#)Hashtag Analysis

Hashtag analysis deals with the various hashtags that occur among the collected tweets. In total, 11,499 unique hashtags were found, which appeared 81,054 times. Of the 41,823 tweets, 19,114 contained more than one hashtag. The analysis shows that the hashtag “#GST” had the maximum number of occurrences, i.e. 18,925. Figure 6 shows the top 15 hashtags by number of occurrences. For better visualization, “#GST” was excluded from the results in Fig. 6. Figure 7 shows the association among popular hashtags. The association is plotted using the Fruchterman-Reingold layout (Fruchterman and Reingold 1991; Csardi and Nepusz 2006). Similarly, Table 4 shows the hashtag adjacency matrix. Both Fig. 7 and Table 4 show the relationship among different hashtags.
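As a rough sketch of how hashtag frequencies and a hashtag adjacency (co-occurrence) structure can be derived from tweet text, consider the following. The regular expression, the lower-casing of hashtags and the rule that a hashtag counts once per tweet are assumptions of this example, not details confirmed by the study.

```python
import re
from collections import Counter
from itertools import combinations

HASHTAG = re.compile(r"#\w+")

def hashtags_in(text):
    """Return the distinct lower-cased hashtags of one tweet, sorted."""
    return sorted({tag.lower() for tag in HASHTAG.findall(text)})

def hashtag_statistics(tweets):
    """Count hashtag occurrences and pairwise co-occurrences across tweets."""
    frequency = Counter()
    co_occurrence = Counter()
    for text in tweets:
        tags = hashtags_in(text)
        frequency.update(tags)
        # Each unordered pair of hashtags in the same tweet is one co-occurrence.
        co_occurrence.update(combinations(tags, 2))
    return frequency, co_occurrence

# Made-up tweets for illustration.
tweets = [
    "#GST is live #OneNationOneTax",
    "Need help with #GST #GSTConfusion",
    "#gst #onenationonetax again",
]
frequency, co_occurrence = hashtag_statistics(tweets)
top = frequency.most_common(1)
```

The pair counts in `co_occurrence` are the off-diagonal entries of an adjacency matrix, which a graph library can then draw with a force-directed layout.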

Fig. 6 Top 15 (#)Hashtags

Fig. 7 Association among frequent hashtags

Table 4 Hashtag adjacency matrix

Inference:

Hashtags allow the government to understand the latest trends among the people. Any new policy will lead to an increase in people's participation in discussion on Twitter (social media). This discussion can be in favour of as well as against the new policy. So, hashtag analysis enables the government to keep a check on any trends emerging among the general public and to take appropriate measures to resolve the issues. Hashtag analysis shows that, overall, people were in support of GST; this is evident from the fact that the most popular hashtags were “#gstforcommonman”, “#gstindia”, “#gstfornewindia”, “#onenationonetaxonemarket” etc.

However, for a brief period during phase-2 (In-GST Period), hashtags such as “#Confussion”, “#GSTConfusion” and “#Clarification” were trending, indicating that people were not clear about the GST rules and were raising their voice using these hashtags on Twitter. But as soon as the government took preventive measures, like introducing an online course for GST training (NDTV News 2017) and restricting retail sellers from charging more than the MRP (Business Standard News 2017a, b), the trends were again in favour of GST.

4.4 Sentiment Analysis

Sentiment analysis is often regarded as the most effective tool to map public response towards an entity (Joseph et al. 2017). Sentiment analysis is defined as a text analytic technique which deals with the extraction of sentiment from a given piece of text (Liu 2012; Mishra and Singh 2016). Unlike the previous analyses, the upcoming analyses require preprocessing of data. Data preprocessing is an important task, which aims to prepare data for various data mining tasks (Van Broeck et al. 2005; García et al. 2016; Liu et al. 2015). Since the Twitter data contained a lot of noise and unwanted content, data preprocessing was performed (Haddi et al. 2013). The tweets were preprocessed by converting them to lower case, removing punctuation, removing special characters and finally removing web links. Sentiment analysis consists of two sub-operations: (a) E-Motion Analysis (Mohammad and Turney 2010; Ou et al. 2014) and (b) Polarity Analysis (Saif et al. 2013; Yuan et al. 2017).
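The preprocessing steps listed above can be sketched with a few regular expressions. This is an illustrative sketch under stated assumptions: the patterns are chosen for the example, only ASCII letters and digits are kept (so it suits English tweets only), and web links are removed before punctuation because stripping punctuation first would break the link pattern.

```python
import re

URL = re.compile(r"https?://\S+|www\.\S+")
NON_ALPHANUMERIC = re.compile(r"[^a-z0-9\s]")
WHITESPACE = re.compile(r"\s+")

def preprocess(tweet: str) -> str:
    """Lower-case a tweet and strip links, punctuation and special characters."""
    text = tweet.lower()
    text = URL.sub(" ", text)               # remove web links while they are still intact
    text = NON_ALPHANUMERIC.sub(" ", text)  # remove punctuation and special characters
    return WHITESPACE.sub(" ", text).strip()

# Made-up tweet for illustration.
cleaned = preprocess("GST is LIVE!!! Details: https://example.com/gst #OneNationOneTax :)")
```

Note that the hash sign is stripped as a special character, so a hashtag such as `#OneNationOneTax` survives only as the plain token `onenationonetax`.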

  (a) E-Motion Analysis: It is a sentiment analysis operation in which the given data is classified according to an emotion lexicon comprising words associated with eight emotions (Trust, Surprise, Sadness, Joy, Fear, Disgust, Anticipation and Anger). The result of e-motion analysis is shown in Fig. 8.

Fig. 8 Results of E-Motion analysis

Inference:

Emotions such as trust, anticipation and joy dominated phase-1 (Pre-GST Period), indicating people's support for GST. These results were completely reversed during phase-2 (In-GST Period), when emotions like fear and sadness dominated, showing unrest among citizens and alerting the government to take appropriate steps. Once the government had taken appropriate steps, the results of phase-3 (Post-GST Period) started to fall in line with those of phase-1 (Pre-GST Period), indicating support for GST once again.
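Lexicon based emotion classification of the kind described in (a) can be illustrated with a toy example. The five-word lexicon below is invented purely for illustration and is not the lexicon used in the study; a real emotion lexicon covers thousands of words.

```python
from collections import Counter

# Toy lexicon invented for illustration: word -> associated emotions.
TOY_LEXICON = {
    "hope": {"anticipation", "joy", "trust"},
    "support": {"trust"},
    "worried": {"fear", "sadness"},
    "confusion": {"fear"},
    "cheated": {"anger", "disgust"},
}

def emotion_counts(tweets):
    """Tally how many lexicon word occurrences are associated with each emotion."""
    counts = Counter()
    for text in tweets:
        for word in text.lower().split():
            counts.update(TOY_LEXICON.get(word, ()))
    return counts

# Made-up tweets standing in for two of the phases.
pre_gst = emotion_counts(["we support gst", "hope prices fall"])
in_gst = emotion_counts(["traders worried", "total confusion on rates"])
```

Comparing the tallies per phase gives the kind of contrast reported above, with trust leading in the first toy set and fear in the second.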

  (b) Polarity Analysis: Polarity analysis deals with polarity identification, i.e. positive or negative. The result of polarity analysis is shown in Fig. 9, while the daily number of positive tweets is shown in Fig. 10 and the daily number of negative tweets in Fig. 11. The shaded portion (orange color) in Figs. 10 and 11 shows the In-GST period. Similarly, a treemap of positive and negative words is shown in Fig. 12.

Fig. 9 Results of polarity analysis

Fig. 10 Number of positive tweets (day wise)

Fig. 11 Number of negative tweets (day wise)

Fig. 12 Treemap of positive and negative words

For better interpretation of the results we use a threshold value (μ) as given in Eq. 1. This threshold value acts as an indicator for the government to act when the situation starts getting out of control, i.e. when the opinion of people towards the new policy is on the negative side.

$$ \mu = \frac{\sum \text{Negative Tweets}}{\sum \text{Tweets}} \tag{1} $$

For our calculations we have taken a threshold value of 50%, i.e. whenever μ > 50%, this is an alarming signal for the government that citizens are not happy with the policy and that appropriate measures are required to overcome this unrest. The warning signals have been further classified as primary, secondary and tertiary. The details are shown in Table 5. Depending upon the type of warning, the government needs to take appropriate steps.

Table 5 Classification of warning signals
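The threshold check of Eq. 1 can be written as a small function. This is a sketch with made-up daily counts and hypothetical function names; it covers only the 50% rule stated in the text and does not attempt to reproduce the primary, secondary and tertiary cut-offs of Table 5.

```python
def negative_share(negative_tweets: int, total_tweets: int) -> float:
    """Return mu, the fraction of tweets that are negative (Eq. 1)."""
    if total_tweets <= 0:
        raise ValueError("total_tweets must be positive")
    return negative_tweets / total_tweets

def needs_attention(negative_tweets: int, total_tweets: int,
                    threshold: float = 0.5) -> bool:
    """True when mu strictly exceeds the threshold (50% in the text)."""
    return negative_share(negative_tweets, total_tweets) > threshold

# Made-up daily counts: (day label, negative tweets, total tweets).
daily = [("day 1", 300, 1000), ("day 2", 620, 1000), ("day 3", 500, 1000)]
alerts = [day for day, negative, total in daily
          if needs_attention(negative, total)]
```

Because the comparison is strict, a day on which exactly half of the tweets are negative does not raise an alert.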

Inference:

During phase-1 (Pre-GST Period) the overall value of μ was below 50%, showing that citizens were in favour of GST, as it ended the multi tax system and gave hope that prices of fast moving consumer goods (FMCG) would go down (Economic Times 2017). However, as soon as GST was actually implemented on July 1, 2017 (In-GST Period), μ rose above 50%, showing unrest among citizens. This unrest was due to various problems encountered by the public as soon as GST was implemented. From the consumers' perspective, negative sentiment arose because traders were charging GST over and above the maximum retail price (MRP). Similarly, traders were unhappy because of confusion about GST rates on various goods, and because they needed to update their inventory systems, which led to increased expenses. As discussed, μ > 50% is an alarming signal for the government to take appropriate steps. Acting upon this, the government soon issued warnings to traders that they could not charge more than the MRP for any good (Business Standard News 2017a, b). In addition, the government introduced various courses to train traders and familiarize them with GST filing and other related processes (NDTV News 2017). As a result of these measures, μ again fell below 50% during phase-3 (Post-GST). This provides evidence that the actions taken by the government after phase-2 helped to win the trust of citizens.

4.5 Topic Modeling

Topic modelling identifies the main themes among the captured tweets (Blei 2012). A topic is defined by the set of words that have a high probability of belonging to it. Topic modelling is performed using Latent Dirichlet Allocation (LDA) (Arun et al. 2010; Deveaud et al. 2014). To decide the optimum number of topics, we calculated the coherence score for 2 to 30 topics. The scores are listed in Table 6 and shown pictorially in Fig. 13. The results show that 29 topics are optimum. The results of topic modelling are shown in Fig. 14, in the form of an intertopic distance map.

Table 6 Coherence score for different number of topics

Fig. 13 Coherence score for different number of topics

Fig. 14 Intertopic distance map

Inference:

The intertopic distance map helps to better understand the results. The size of each circle represents the number of tweets associated with the topic. Topic-1 is the largest, indicating a large portion of positive tweets. Although there were 29 different topics, only topic-1 (representing positive tweets) and topic-2 (representing negative tweets) were visible as distinct ones. The remaining topics (3 to 29) overlap each other, indicating that they all relate to tweets in which people express some sort of confusion or demand some sort of clarification on GST implementation. To better interpret these results, we calculated the cosine similarity (Dehak et al. 2010) of tweets in topics 3 to 29. The results of cosine similarity are shown in Fig. 15. They show a moderate relationship among the words of tweets in topics 3 to 29. If we zoom in on a particular portion, as shown in Fig. 16, we can see words like “clarification”, “filling”, “gstreturn” etc., indicating that people wanted clarification on GST filing. So topic modelling can help the government to understand the main topics (themes) that are prominent among people and lead to engagement on social media (Twitter). As the results indicate, people here wanted clarification regarding GST filing and GST registration.
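Cosine similarity between two pieces of text can be illustrated as follows. The study does not state how tweets were turned into vectors, so the raw word-count representation used here is an assumption of this sketch, as are the example sentences.

```python
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the word-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Made-up texts for illustration.
same = cosine_similarity("need clarification on gst return",
                         "need clarification on gst return")
partial = cosine_similarity("need clarification on gst return",
                            "gst return filing help")
disjoint = cosine_similarity("need clarification", "prices went down")
```

Identical texts score 1, texts with no words in common score 0, and partially overlapping texts fall in between, which is the sense in which a "moderate relationship" can be read from such scores.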

Fig. 15 Results of cosine similarity

Fig. 16 Zoomed results of a particular portion

4.6 Community Analysis

Community analysis is a part of network analysis which detects the various communities taking an active part in a particular discussion (Ding 2011). The polarity wise (positive and negative) results of community analysis are shown in Fig. 17 for all three phases. Here, green nodes indicate the community which is happy with GST implementation, while red nodes indicate the community which has a negative opinion towards GST. The results clearly indicate that the number of people with a negative opinion towards GST increased during phase-2 (In-GST Period). But once the preventive steps were taken, the positive community again had the upper hand.

Fig. 17 Phase wise results (polarity) of community analysis

The overall results of community analysis are shown in Fig. 18. The results show that there are four prominent communities. Community-1 (Blue), the largest community, consists of people who support GST. Community-2 (Red) consists of people who are against GST. Finally, community-3 (Green) and community-4 (Yellow) consist of people who have some confusion and require some clarification regarding GST.

Fig. 18 Overall results of community analysis

Inference:

Community analysis helps the government to identify prominent communities in order to check whether the newly implemented policy forms a large positive community. Further, it can also assist the government in identifying communities that have some sort of confusion or problem regarding the newly implemented policy, so that appropriate steps can be taken.

4.7 Geo-Location Analysis

Location based analysis is a crucial tool for gathering information while mapping public response towards an entity (Amirkhanyan and Meinel 2017). Although not all tweets contain the location from which they were posted, those that do cannot be ignored, as they provide important information about the actual location and hence help policy makers to target the audience while finding a solution to their problem. Since the previous section showed that during phase-2 (In-GST Period) the overall sentiment was on the negative side, it is essential to see which states and cities were worst affected so that an appropriate solution can be found. Figure 19 shows the results of zone wise analysis, Fig. 20 shows the results of state wise analysis, and Fig. 21 shows location wise analysis of two markets of Mumbai city from where negative tweets were sent.

Fig. 19 Results of zone wise analysis

Fig. 20 Results of state wise analysis

Fig. 21 Results of location based analysis

Inference:

This location based analysis is extremely important, as it identifies the target audience which is unhappy with the policy, so that the government can take appropriate steps keeping in mind the demands or problems of this group. Since the two points in Fig. 21 indicate markets of Mumbai city, the traders of these markets appear to be unhappy, and the government needs to take appropriate steps accordingly.
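Aggregating negative sentiment by region, as done for the zone wise and state wise views, can be sketched as a simple group-by. The records below are made up for illustration, and the region labels and the `"negative"` polarity tag are assumptions of this example.

```python
from collections import defaultdict

def negative_share_by_region(tweets):
    """Return {region: share of negative tweets} for tweets that carry a region."""
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for region, polarity in tweets:
        if region is None:  # tweet without location information
            continue
        totals[region] += 1
        if polarity == "negative":
            negatives[region] += 1
    return {region: negatives[region] / totals[region] for region in totals}

# Made-up records: (region, polarity).
records = [
    ("Maharashtra", "negative"), ("Maharashtra", "negative"),
    ("Maharashtra", "positive"),
    ("Delhi", "positive"), ("Delhi", "negative"),
    (None, "negative"),
]
shares = negative_share_by_region(records)
worst = max(shares, key=shares.get)
```

Tweets without a location are skipped rather than guessed, so the shares describe only the geotagged subset of the data.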

5 Performance Analysis

Since we propose this system for real time implementation, it is important to check its performance in a cloud based environment. As discussed earlier, the entire experimentation was performed on Amazon EC2 using a compute optimised c4.large instance (containing 2 processors and 8GB of RAM). Since topic modelling is computationally very expensive (Grover et al. 2017), the performance analysis was carried out on topic modelling only. The collected data was split into subsets, and performance was measured from 5000 tweets up to 40,000 tweets, in increments of 5000 tweets per iteration. The results are shown in Figs. 22 and 23 for resource utilization and execution time respectively. The results clearly indicate that both resource utilization and execution time increase with the number of tweets (data).

Fig. 22 Results of resource utilization

Fig. 23 Results of execution time (seconds)
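The measurement procedure of this section (5000 to 40,000 tweets in steps of 5000) can be sketched as a simple timing harness. The sketch below is in Python and only illustrates the loop structure: `run_topic_model` is a hypothetical stand-in workload (a word count), not the topic modelling used in this study, and the corpus is synthetic. It measures execution time only; resource utilization would need separate instrumentation.

```python
import time
from collections import Counter

def run_topic_model(tweets):
    """Hypothetical stand-in workload: a word count, NOT real topic modelling."""
    counts = Counter()
    for text in tweets:
        counts.update(text.lower().split())
    return counts.most_common(10)

def benchmark(tweets, start=5000, stop=40000, step=5000):
    """Time the workload on growing prefixes of the data set."""
    timings = []
    for size in range(start, stop + 1, step):
        subset = tweets[:size]
        began = time.perf_counter()
        run_topic_model(subset)
        timings.append((size, time.perf_counter() - began))
    return timings

# Synthetic placeholder corpus, not the collected #GST tweets.
corpus = ["gst rollout helps small traders #GST"] * 40000
results = benchmark(corpus)
for size, seconds in results:
    print(f"{size:>6} tweets: {seconds:.4f} s")
```

Replacing the stand-in with the actual topic modelling call would yield the kind of size versus execution time series plotted in Fig. 23.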

6 Discussion

Methodologically, this study presents a cloud based system in which user generated social media content (tweets) is collected, and shows how this data can be utilized for efficient monitoring and controlling of public policies using social media analytics such as descriptive analysis, content analysis, network analysis and geospatial analysis.

This study maps public opinion towards the implementation of GST by the Indian Government. The data was collected from Twitter using the hashtag “#GST”. Different social media analytics were applied in a cloud based environment (RStudio Server on Amazon EC2) to gain insight into the GST policy from the public perspective. The study of tweet statistics (Purohit et al. 2013) helped in understanding the impact and scale of the issues caused by implementation of the new policy. Trends were detected using hashtag (#) analysis (Chae 2015), which also helped in understanding the association between various hashtags. Sentiment analysis (Saif et al. 2013) gave the polarity and emotions associated with tweets, which helped in understanding people's opinion of GST. The sentiment analysis also served as an indicator for calculating the threshold value on which the warning signals generated for the government were based (see Section 4.4, Sub-section (b)). Themes were identified by topic modeling (Llewellyn et al. 2015). Network analysis (Walther and Kaisser 2013) helped in detecting the prominent communities engaged in the GST discussion. Finally, geo-location analysis (Singh et al. 2018b) helped in detecting the target audience that had a negative opinion of GST. The threshold based warning signals and the location based analysis, through which the unhappy target audience can be mapped down to market level, are the unique contributions of this study.

The results clearly indicate that at the start (phase-1) there was strong support for GST, but as soon as GST became a reality (phase-2) and people started facing hardship, support for GST dipped and negative opinion rose. Observing the dip in the popularity of GST, the government acted fast and took steps to resolve the confusion and hardship faced by the public. As soon as these measures were taken, the overall opinion again changed to positive in the final phase (phase-3).
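The idea of a threshold based warning signal driven by sentiment can be sketched as follows. This Python sketch does not reproduce the threshold calculation of Section 4.4: the rule used here (flag a period when the share of negative tweets exceeds a fixed `threshold`), the default value 0.5, the function name `warning_signals` and all counts are assumptions chosen only to illustrate the mechanism.

```python
def warning_signals(period_counts, threshold=0.5):
    """Return (period, negative share) pairs whose negative share exceeds the threshold."""
    flagged = []
    for period, counts in period_counts.items():
        total = counts["positive"] + counts["negative"] + counts["neutral"]
        if total == 0:  # no tweets in this period, nothing to evaluate
            continue
        share = counts["negative"] / total
        if share > threshold:
            flagged.append((period, round(share, 2)))
    return flagged

# Made-up illustrative counts, NOT results from the study.
phases = {
    "phase-1": {"positive": 60, "negative": 25, "neutral": 15},
    "phase-2": {"positive": 30, "negative": 55, "neutral": 15},
    "phase-3": {"positive": 55, "negative": 30, "neutral": 15},
}

alerts = warning_signals(phases)
print(alerts)
```

With these placeholder counts only the second period is flagged, mirroring the qualitative pattern described above in which negative opinion peaked once the policy took effect.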

6.1 Economic Feasibility Analysis

The main reason for integrating cloud computing into our proposed system was cost reduction: we pay only for what we use, and the initial cost of installation and the later cost of maintenance are eliminated. Cost plays an important role in the implementation of such projects and often becomes a major bottleneck in developing countries (like India), so it is extremely important to evaluate the economic feasibility of our proposed system. The most basic instance (t2.nano with 1 processor and 0.5GB RAM) available at Amazon costs around $0.0062 per hour (Amazon EC2 Instance Price 2018), while the instance used for our entire experimentation, c4.large (containing 2 processors and 8GB of RAM), costs around $0.10 per hour. Depending upon the hardware requirements for computation, we can switch to any instance. Another cost to consider is the API cost of social media sites. Depending upon use, the government might opt for a monthly or annual contract (Twitter API Price 2018). Paid APIs give access to various additional parameters, such as the location of all tweets, which are not available in free APIs. From the above, we conclude that the proposed system is highly cost effective.
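To put the hourly prices quoted above in perspective, the following Python sketch converts them into an approximate monthly compute cost. It assumes an instance running continuously (24 hours a day) for a 30-day month and deliberately ignores storage, data transfer and social media API fees, which are not quantified in this paper; the function name `monthly_cost` is illustrative.

```python
# Hourly on-demand prices in USD as quoted in the text.
HOURLY_RATE_USD = {"t2.nano": 0.0062, "c4.large": 0.10}

def monthly_cost(instance, hours_per_day=24, days=30):
    """Compute-only cost; storage, network and API fees are excluded (assumption)."""
    return HOURLY_RATE_USD[instance] * hours_per_day * days

for name in HOURLY_RATE_USD:
    print(f"{name}: ${monthly_cost(name):.2f} per 30-day month")
```

Under these assumptions the c4.large instance comes to about $72 per month and the t2.nano to about $4.46 per month; running the instance only while analyses execute would lower the figure proportionally.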

6.2 Implications for Practice and Policy

The implications of the study for practice can be divided into three subsections: (a) the government, which will implement a policy; (b) the general public, for whom the policy is implemented; and (c) cloud service providers. These are discussed briefly below.

(a) Government: The government is one of the main stakeholders in this system, as it implements the policy for the public and then monitors it using the proposed cloud based system. The first and foremost thing the government needs to do is to spread awareness among people before implementing any policy, so that people passively (Androutsopoulou et al. 2018) start discussing the new policy on social media platforms. Since the data is collected from social media platforms, if people are not discussing the policy there will be no data available for the government to monitor.

For this, the government can create a page on Facebook or highlight the advantages of the new policy through tweets on Twitter. Early discussion on social media platforms will help the government to better monitor the policy before actual implementation by considering the suggestions raised by people on social media. Another important insight the government can gain is which factors lead to polarization among various social media groups towards a policy (Grover et al. 2018). If the government receives too much negative opinion towards the policy, it might consider delaying implementation or seek expert opinion to improve it. If the government starts receiving negative opinion after the policy is implemented, as was the case with GST, it should take appropriate steps to overcome the hardship faced by the public.

(b) Public/People: The general public is the other main stakeholder in this system, as it is the public for whom the policy is implemented. As our results reveal, the public should engage in social media discussions regarding the new policy (Hossain et al. 2018). Discussion on social media will help the public understand the merits as well as the demerits of the policy in a better way. People can further raise their concerns or suggest recommendations that will help the government formulate new policies and improve existing ones.

(c) Cloud Service Providers: Cloud service providers such as Amazon Web Services (AWS - Amazon EC2), Microsoft Azure (2018), Google Cloud (Google Cloud 2018) etc. can enter into long term commitments with the government to provide the cloud platform for implementing such systems. Since it will be a long term commitment, these service providers can gain long term benefits. Further, they can provide additional data analytics services to boost their revenues.

6.3 Limitation and Future Research Directions

Though the proposed system has given some encouraging results, it suffers from certain limitations that need to be addressed. Firstly, data (tweets) for experimentation was fetched only from Twitter, while other social media platforms like Facebook, Instagram etc. were not utilized. Secondly, only original tweets were considered for analysis, and no emphasis was given to re-tweets. Re-tweets constitute up to a third of the entire Twitter traffic (Holmberg 2014), hence a considerable amount of Twitter traffic was ignored in our analysis. Similarly, bot detection (Chu et al. 2010) was not performed in our system. Social media bots can adversely affect the analysis by biasing the results, so it is very important to address them. Finally, although cloud computing has many advantages, it also has limitations such as network communication cost, unreliable latency, lack of mobility support etc. (Stojmenovic and Wen 2014). We can overcome these limitations by extending our system to Fog computing (Yi et al. 2015). All of the above issues can be addressed in future work.

7 Conclusions

Technology is advancing at a rapid pace, and governments are proactive in adopting newer technologies to provide enhanced services to their citizens. Multiple efforts are being made to bridge the perception gap between policy makers and the general public, for whom these policies are actually designed and implemented. Cloud computing and social media platforms have emerged as two powerful tools for governments all around the world to communicate with and provide quality services to their citizens. However, to the best of our knowledge, all studies conducted to date have utilized the advantages of these tools independently, and no effort has been made towards a combined implementation of these technologies. We started our research work to combine selected capabilities of cloud computing and social media analytics for efficient monitoring and controlling of public policies. We proposed a new cloud based approach, which captures the public response to a new policy implementation through data gathered from social media platforms. For monitoring public policies, various social media analytics techniques are applied to this captured data (tweets). Based upon the results of these techniques, the appropriate controlling operation is performed. We tested our proposed system with data collected on GST implementation by the Indian government.

The initial results indicate strong support for GST; however, as soon as GST became a reality, traders and consumers faced hardship and the overall sentiment dipped towards the negative side, raising a warning signal for the government to take appropriate action. Based on the post implementation reaction, the government took various controlling measures, which again resulted in an increase in positive sentiment among citizens. A prominent contribution of this research is the location based analysis, which can help the government identify the target audience affected by a new public policy and take control measures accordingly. These implementation results are encouraging, suggesting that the system proposed and evaluated in this study can be used for efficient monitoring and controlling of public policies.