Introduction

Social media is a boon for any type of information: it acts as a gateway to real-time mechanisms that provide timely and effective information. Even though the data is useful, a major chunk of social media content is conversational and is not produced in response to a search. Social media data are huge, noisy, unstructured [1, 2], dynamic and distributed. The present focus of research is how to make the best use of the social media data that is being accumulated in databases. This article attempts to examine informative tweets from social media, especially during natural calamities, in order to provide safety and relief measures for the affected people. Nowadays, the micro-blogging site Twitter broadcasts information to people all over the world. It is a digital focal point where people converge for information, especially during a natural calamity. Information from Twitter can generally be obtained directly or indirectly: directly from the people involved in the calamity, or indirectly from reports or pointers to information available elsewhere. However, the information on a natural calamity in social media can be clouded by conversation and incomplete reports.

Analysis [3,4,5] of informative data allows a deeper understanding of the information in tweets, which helps to reveal the trends and extent of a disaster. When data sources and their derived branches are mapped, the users (nodes) can be identified together with the @ replies they sent (outbound) and the retweets they received. This helps to identify numerous patterns, including the logistics or focal points of disasters [6]. Relief efforts can be expedited if there is a system that filters informative content out of conversational tweets; combining geo-locations with sentiment analysis and informative social media filtering supports correct decisions that reduce the suffering of victims and save lives during natural calamities. In this work, an attempt is made to identify the locations where natural disasters occur so that rescue teams can be sent to those locations.

Two work phases are suggested, namely data streaming [7] from Twitter and mining knowledge through R-Studio. For these two operations, the methods used are the Twitter API and sentiment analysis [8] through R. The Twitter application requests a connection to the Twitter database; when the connection is established, authentication is granted and the natural disaster of interest is provided as the search key. A data frame (DF) of tweets is generated and converted into a comma separated values (CSV) file, which can then be visualized on a map.

Big data analytics

The process of analyzing and mining big data [9], known as big data analytics, can provide operational and business knowledge with clarity and uniqueness. One of the important tasks of big data is to analyse the data collected by business houses for trends that can be leveraged [10]. The innovative techniques of big data lie primarily in data storage, processing and analysis [11]. These methods reduce storage cost and power consumption, so data centers benefit from such cost-reduction techniques, and cloud computing further increases efficiency. Newly developed frameworks such as Hadoop support computation in distributed mode, and processing large volumes of data in parallel is an important measure [12]. Big data analytics differs from traditional analytics in several respects. Tools such as Clojure, Scala, Python, Hadoop and Java can be used for Natural Language Processing (NLP) [13] and text mining, while R and MATLAB can be used in data analytics.

Twitter

Social media has become a very important tool for staying in touch with friends and for marketing any subject of interest.

Fast communication in short form is the key feature of the social network Twitter, which started in 2006. It is globally popular and is one of the most viewed websites in the world. Twitter messages are called tweets. For subscribers to Twitter all over the world, the messages show up as microblogs, which are brief in the amount of text posted in comparison with regular blogs. The Twitter limit is 140 characters. Tweets often contain links to online resources such as web pages, images or videos, and they may refer to other users, called mentions. When a message is posted, all the users who see the update take note of the author of the message, namely the submitter.

All posts on Twitter are made public, but messages are not received unless the submitter is followed. Messages can be located by keywords or topics. Twitter follows its own conventions that make it distinct from other textual messages. For effective knowledge applications, understanding this language and terminology plays a key role.

The following are some of the terms used on Twitter.

1. A message posted on Twitter with a maximum of 140 characters is called a tweet [14].
2. A tweet may contain text, photos, links, and videos.
3. A Twitter user name appears after the "@" symbol.
4. The hashtag symbol "#" is used in a tweet to mark keywords or topics; it categorises a message.
5. Other Twitter users mentioned with "@" receive alerts.

A reply is used to respond to a tweet; answering a tweet builds up a personal relationship among followers and friends in a conversation. When a tweet from another user is chosen and tweeted again, it is a retweet. This can be done with the retweet button or by adding one's own message with "RT" before the retweeted text. Hashtags and mentions provide an easy way of identifying people and topics, which allows searching and filtering information on any subject of interest.

Natural disasters in India

A number of natural disasters occur regularly in India, among them earthquakes, cyclones, landslides, cloudbursts, storms, floods, tsunamis, volcanic eruptions, heat waves, and cold waves.

In earlier days, natural disaster information was communicated to others by phone call or telegram, direct observation or personal interview. This process delayed help and relief operations, and when relief operations are delayed, human and animal mortality rises and the suffering of the people increases.

Natural calamities such as floods, tsunamis and storms occur almost every year, costing thousands of human lives and millions of dollars, including the loss of animals and damage to property. The internet technology now available can be used, to some extent, to reduce the suffering of the victims of these tragedies. To observe the movement of natural disasters in India, we have therefore targeted information from social network websites like Twitter and performed sentiment analysis on a sample of live tweets about natural disasters among the Indian population.

Tweets are quick, real-time sources of information, and information on natural disasters, once tweeted, reaches the rest of the world faster than through any other source. In many countries of the world, Twitter information is used to manage natural disasters. The damage caused by calamities, in terms of human lives and property, is greater in India than in countries that manage Twitter information on natural disasters effectively.

In the Indian situation, the reporting of natural disasters through tweets is still very limited, since social media is only slowly gaining popularity. A database of natural calamity occurrences in India has to be prepared using sources such as print and electronic news media, social media, search engine data and Twitter data.

Disaster response and relief data have to be streamlined for effective relief operations, as is done through systems such as TweetTracker and Tweedr [15] built in the USA. Earlier research in this direction has proposed a location inventory store emphasizing location tagging and information from news agencies. A wider source structure has to be proposed for India.

After identifying the different types of natural disasters occurring in India, the places of their occurrence and their frequency have to be documented. Geographical data such as latitude, longitude and any other landmarks may be recorded for easy identification. Information on the reachability of vulnerable disaster areas by road, water, and air should also be documented. These places should be monitored constantly, and all available geo-information may be provided to print media, electronic media, the internet and social media.

In case of emergencies, these sources can react by reporting the news and helping relief measures for victims from Non-Governmental Organizations (NGOs) and other service organizations in addition to Government services. Asia tops the continents in terms of the number of disaster events: close to 60% of the disasters in Asia occur in South Asia, and 40% [1] occur in India.

Sentiment analysis

Sentiment analysis is the process of using text analytics to mine various sources of data for opinions. Often, sentiment analysis is done on data collected from the internet and from various social media platforms. Politicians and governments often use sentiment analysis to understand how people feel about them and their policies.

With the advent of social media, data is captured from different sources, such as mobile devices and web browsers, and is stored in various data formats. Because social media content is unstructured with respect to traditional storage systems (such as an RDBMS, Relational Database Management System), tools are needed that can process and analyze this disparate data. Big data technology is designed to handle the different sources and formats of structured and unstructured data. In this article, we describe how to use big data tools to capture data for storage and to process the data for sentiment analysis.

Architecture

It consists of:
1. The Twitter environment.
2. R-Studio.
3. R packages.

Environment of twitter

The OAuth mechanism, provided through the ROAuth package of R, is used for the Twitter API authentication procedure. Figure 1 shows the steps involved in using OAuth to access the Twitter API.

Fig. 1 Flow of steps to access the Twitter API

1. Upon registration of the user's application with Twitter, a key and a secret key are provided, which are required for application authentication.
2. The authentication process is initiated with the help of these keys, which are used to create a Twitter link. Twitter verifies the user's identity and issues a PIN called the verifier; this PIN is required by the Twitter application.
3. The PIN is used to request an Access Token and Access Secret from the Twitter API, which are exclusive to the particular individual and are needed to continue the application process.
4. GetUserAccessKeySecret contains the token and secret key, whose information is required for further use.
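A minimal sketch of this handshake using the twitteR and ROAuth packages is given below; the credential strings are placeholders that must be replaced with the values issued for the registered application.

library(twitteR)
library(ROAuth)

# Placeholder credentials obtained when the application is registered on Twitter
consumer_key    <- "YOUR_CONSUMER_KEY"
consumer_secret <- "YOUR_CONSUMER_SECRET"
access_token    <- "YOUR_ACCESS_TOKEN"
access_secret   <- "YOUR_ACCESS_SECRET"

# Authenticate the current R session with the Twitter API
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)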

R-studio

The R-Studio environment is suitable for statistical and graphical analysis of large datasets; more than 8000 packages are available for R. R-Studio provides four window panes. The bottom-left work area is the R console, where R scripts are executed. The top-left area, where R scripts are written, is the script area. The top-right area, where variables are defined and data sets are read, is the Global Environment. Charts [16] of the data are displayed in the bottom-right plot area.

R packages

R packages are collections of R functions, compiled code and sample data, and these functions make up the R library environment. At the time of installation, R installs some packages by default; other R packages are installed and loaded separately as required by the particular application. A number of packages, namely twitteR, ROAuth, plyr, stringr, ggplot2, RColorBrewer and devtools, are used in the implementation of this paper; a short installation and loading sketch is given after the list below.

1. twitteR: mainly used to provide an interface to the Twitter API.
2. ROAuth: provides users a means of authenticating to a server via OAuth.
3. plyr: with this package a big problem can be divided into small pieces, each piece solved, and the results put back together.
4. stringr: provides easy-to-use string functions in R; these functions can also handle zero-length characters and NAs.
5. ggplot2: graphics in R can be implemented using ggplot2 functions; it supports multiple data sources and is useful as an alternative to both base and lattice graphics.
6. RColorBrewer: can be used for drawing nice maps shaded according to a variable through colour palettes.
7. devtools: helps the developer by providing functions that simplify many common tasks.
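A minimal sketch of how these packages are typically installed from CRAN and loaded into a session is given below; leaflet is included here as an addition because it is used later for the map visualization, and packages that are already installed can simply be loaded.

# One-time installation of the packages used in this work
install.packages(c("twitteR", "ROAuth", "plyr", "stringr",
                   "ggplot2", "RColorBrewer", "devtools", "leaflet"))

# Load the packages into the current R session
library(twitteR)
library(ROAuth)
library(plyr)
library(stringr)
library(ggplot2)
library(RColorBrewer)
library(devtools)
library(leaflet)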

An R script is simply a text file containing (almost) the same commands that you would enter on the command line of R. "Almost" refers to the fact that if you are using sink() to send the output to a file, you will have to enclose some commands in print() to get the same output as on the command line.
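For example, a small hypothetical script illustrating this point (the file name results.txt is arbitrary):

# Redirect subsequent output from the console to a file
sink("results.txt")

# In a sourced script, bare expressions are not auto-printed,
# so print() is used explicitly to capture the output in the file
print(summary(cars))   # cars is a built-in example data set

# Restore normal output to the console
sink()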

Methodology

In this study, a methodology is proposed to find locations using the social media platform Twitter. The proposed system constitutes the following steps for Twitter analysis.

1. Creation of a Twitter application.
2. Execution of Twitter API code through R-Studio.
3. Importing tweets through the Twitter API.
4. Standardizing the data.
5. Classification of the data.
6. Getting scores.
7. Establishing R maps to view the results.

Creation of twitter application

A portion of the recent tweets on Twitter is accessed through the Twitter Search API. The collected tweets are cleansed for use in the research work. To perform these tasks, the creation of a Twitter application is essential.

Executing Twitter API code using R-Studio

The R console executes the Twitter Search Application Program Interface (API) code. A connection to the Twitter website is established to interface with the tweets, and the retrieved tweets are written to a comma separated values (CSV) file. Several R packages have to be installed through R commands as part of the Twitter API process.

natural_disasters.list <- searchTwitter('natural disasters', n = 1000, lang = "en")

The above command returns tweets on natural disasters posted over the past couple of days.

Import tweets through twitter API

This is one of the introductory operations with Twitter. Later, we retrieve the latest tweets matching the area keyword. The final phase of downloading tweets from the timeline [17] is done by the searchTwitter function. These lists [17] of tweets are converted into a data frame (DF), which in turn is written out as a .csv file.
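A minimal sketch of this step, assuming the list of tweets returned by the searchTwitter command above is stored in natural_disasters.list and that tweets.csv is an arbitrary output file name:

# Convert the list of status objects returned by searchTwitter into a data frame
tweetFrame <- twListToDF(natural_disasters.list)

# Persist the data frame as a CSV file for later processing and visualization
write.csv(tweetFrame, file = "tweets.csv", row.names = FALSE)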

Standardizing the data

The tweets are converted into a useful, uniform form by applying cleaning functions to them. This procedure is called standardizing the data. The burden of classification is reduced by removing extra symbols that do not add any meaning to the tweet.
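A sketch of a typical cleaning function of this kind, applied to the tweetFrame data frame built in the previous step, is shown below; the exact patterns removed are an assumption and can be adjusted to the data.

# Remove symbols and artefacts that carry no meaning for classification
clean_tweet <- function(text) {
  text <- gsub("\\bRT\\b", "", text)          # retweet marker
  text <- gsub("@\\w+", "", text)             # @ mentions
  text <- gsub("http\\S+", "", text)          # links
  text <- gsub("[[:punct:]]", "", text)       # punctuation
  text <- gsub("[[:digit:]]", "", text)       # digits
  text <- gsub("[[:space:]]+", " ", text)     # extra whitespace
  tolower(trimws(text))
}

tweetFrame$clean_text <- clean_tweet(tweetFrame$text)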

Classification of the data

Sentiment analysis consists of matching the words of the tweets against related word lists. To perform this, the word lists have to be downloaded and saved to the working directory. Two additional packages, plyr and stringr, are required to manipulate strings in the sentiment analysis.
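As a sketch, assuming the word lists are saved in the working directory as plain-text files named positive-words.txt and negative-words.txt (one word per line; the file names are an assumption), they can be read as follows:

library(plyr)
library(stringr)

# Read the opinion word lists from the working directory
pos.words <- scan("positive-words.txt", what = "character", comment.char = ";")
neg.words <- scan("negative-words.txt", what = "character", comment.char = ";")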

Getting scores

Each individual tweet is scored by a sentiment function [18]; comparing its words with the related word lists yields the score [19, 20]. Table 1 shows the longitude and latitude information of the imported tweets, and Table 2 shows the natural disaster tweets for India imported using R-Studio.

Table 1 Longitude and latitude information of the imported tweets on natural disasters in India
Table 2 Imported tweets on natural disasters in India in R-Studio
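The sentiment function itself is not spelled out in the cited works; a minimal word-matching scorer along these lines, following the common plyr/stringr approach and using the pos.words and neg.words lists loaded above, could look like this sketch:

# Score each tweet as (# matched positive words) - (# matched negative words)
score.sentiment <- function(sentences, pos.words, neg.words) {
  laply(sentences, function(sentence) {
    words <- unlist(str_split(sentence, "\\s+"))          # split the cleaned tweet into words
    pos.matches <- sum(!is.na(match(words, pos.words)))   # count positive hits
    neg.matches <- sum(!is.na(match(words, neg.words)))   # count negative hits
    pos.matches - neg.matches
  })
}

tweetFrame$score <- score.sentiment(tweetFrame$clean_text, pos.words, neg.words)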

Results

The status of public opinion on natural disasters in India is visualized [21] in Fig. 2. The sentiments of the users can be visualized by creating visual maps, which is accomplished through the addMarkers function [22]. All the tweets are selected and a world map shapefile is made.

Fig. 2 Natural disasters in India: tweets and tweet counts

We maintain an event-related word list for tweets in a directory. The dictionary includes a set of standard words that depict, in context, the natural disasters affecting India. It identifies natural disasters such as earthquake, cyclone, landslide, cloudburst, storm, flood, tsunami, volcanic eruption, heat wave, and cold wave as used in social media. These word lists can be obtained from the NDMA (National Disaster Management Authority, Government of India), updated regularly, and integrated into our analysis logic.
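A sketch of how such a dictionary can be matched against the cleaned tweets from the earlier steps (the terms shown are taken from the list above; in practice they would be read from the NDMA word list file):

# Dictionary of disaster-related terms
disaster.words <- c("earthquake", "cyclone", "landslide", "cloudburst",
                    "storm", "flood", "tsunami", "volcanic eruption",
                    "heat wave", "cold wave")

# Keep only tweets that mention at least one disaster term
pattern <- paste(disaster.words, collapse = "|")
disasterTweets <- tweetFrame[grepl(pattern, tweetFrame$clean_text), ]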

Free-text descriptions of locations are converted into geographical identifiers, namely longitude and latitude. The following methods were used.

with(locations, plot(lon, lat))

map <- leaflet() %>% addTiles() %>%
  addMarkers(locations$lon, locations$lat, popup = tweetFrame$text)
print(map)
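The locations data frame assumed by these calls can be built from the geo-coordinates attached to the tweets; tweets without coordinates would need a separate geocoding step. A minimal sketch, assuming the longitude and latitude columns produced by twListToDF:

library(leaflet)

# Coordinates arrive as text in the longitude/latitude columns of tweetFrame;
# keep only tweets that actually carry geo-information
locations <- data.frame(lon = as.numeric(tweetFrame$longitude),
                        lat = as.numeric(tweetFrame$latitude))
has.coords <- !is.na(locations$lon) & !is.na(locations$lat)
locations  <- locations[has.coords, ]
tweetFrame <- tweetFrame[has.coords, ]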

Conclusion and future works

It is a challenging task to analyze the large volumes of data emanating from social media with existing data mining tools. Our aim was to access Twitter and analyze the data with R-Studio, and the analysis of large data for decision making was carried out with these two tools. Sentiment analysis of the "Natural Disaster in India" data retrieved from Twitter has shown the opinion of the people. This leads to the conclusion that the R statistical tool is adequate for the analysis of big data. The application of Python to the analysis of big data can be explored in future work.