1 Introduction

Ever since its discovery in Wuhan, Hubei Province, China, in late 2019, the novel coronavirus has been spreading at an alarming rate [23]. The virus, referred to as nCoV-2019 or COVID-19 and associated with human respiratory disease [44], is believed to have originated either from bats [46] or from the Huanan seafood market in China [15]. Severe acute respiratory syndrome (SARS), Middle East respiratory syndrome (MERS), and coronavirus disease 2019 (COVID-19) are all associated with bronchitis, pneumonia, or severe respiratory illness [30]; however, owing to the rapidly increasing number of cases, the World Health Organization declared COVID-19 a pandemic [6] within three months of its discovery. The outbreak has infected more than ten million people across the globe and caused half a million deaths worldwide [7]. The effects of COVID-19 are not restricted to the number of infections and deaths but also extend to businesses across the world. The travel industry suffered heavily due to severe travel restrictions [24]. The stock market and oil prices were hit critically [38]. Other industries affected by the pandemic include tourism, sports, and restaurants [32]. A major contributing factor to the uncontrollable spread of infections is the lack of an appropriate vaccine for COVID-19. Although many researchers have suggested a variety of treatments and proposed medicines, the results are not very promising [13, 18, 36, 37]. Since an established cure could not be developed even six months after the discovery of the virus, social distancing and other lockdown strategies were suggested by various government bodies all over the world [3, 26]. With the lockdown imposed across many parts of the world, restrictions were placed on major facilities and services, such that the general public was not able to avail of these amenities [22, 41].

Human beings are social animals [39]; hence, they need to interact among themselves as well as with the environment. The growing number of cases all over the world, the inability to socialize, and the lack of access to the environment may lead to psychological distress among people [5, 17, 45]. As mentioned before, the lockdown has also resulted in the closing down of several businesses. Thus, people whose livelihood depended solely on these businesses may also be left economically helpless. Many suicide incidents have also been reported following the lockdown [10]. Worldwide curfews and confinement to their homes have made many people dependent on multimedia devices. There is an increase in the number of devices in use, especially smartphones [14]. Moreover, students and employees depend on applications such as Zoom for attending virtual classes and meetings [25]. As humans become more and more dependent on multimedia devices, the increased usage of such devices may also have a drastic impact on people’s minds, leading to reduced brain activity, loss of productivity, and device addiction. This makes it imperative to analyze the state of mind of people all over the world. While people around the globe may be dealing with several emotions like anxiety, depression, fear, stress, panic, and hope, there is a need to identify how positive or negative people are feeling in a lockdown situation, that is, to identify their sentiments. While some past research works explore sentiment analysis during the COVID-19 lockdown, many of them are limited to country-specific sentiment analysis. Moreover, the data is collected over a single specific time period. This study addresses these limitations by considering a global dataset and performing sentiment analysis on two different days during the COVID-19 lockdown.
One of the most used and popular global social networking platforms is Twitter. It is a microblogging system that is used for sending and receiving short posts [42]. We analyze tweets posted on two of the lockdown days and perform a sentiment analysis on them. Sentiment analysis is used to determine the emotional tone behind a series of words. It gives us information about the attitude, opinion, and emotion behind a set of words. The idea is to explore the emotions of people all over the world with respect to the lockdown during the COVID-19 outbreak. The key contributions of the study are as follows:

  1. A number of research works related to the COVID-19 outbreak and cure have been proposed before. However, little has been done with respect to the after-effects of the lockdown on human psychology.

  2. In this paper, we perform sentiment analysis (positive, negative, and neutral sentences), which is an unexplored territory with respect to lockdown situations during the COVID-19 outbreak, for analyzing the state of mind of people.

  3. The data collected is not specific to a given country or region, but is taken from a popular global social networking forum, i.e., Twitter. Some of the features of the dataset considered for this study are ‘created_at’, ‘text’, ‘favourites_count’, ‘retweet_count’, ‘country_code’ and ‘lang’.

  4. By the beginning of April, most parts of the world had witnessed lockdown [19]. Hence it would be interesting to explore the sentiments of people after a month of lockdown. It may be essentially useful in designing appropriate lockdown strategies and crisis management.

  5. Existing research works consider data for either a specific date or a range of dates. In this study, we have considered two dates, i.e., sentiments of people following two weeks and four weeks of the lockdown, in order to perform an in-depth analysis of their sentiments. This assists us in comparing how their sentiments vary over multiple weeks of the lockdown.

  6. We compare the results with two other datasets for validating the study.

The rest of the paper is organized as follows. Section 2 presents materials and methods where we discuss the related works and the proposed methodology. Section 3 highlights the experimental analysis incorporating the datasets and the results. Section 4 discusses the overall results obtained, and Section 5 takes into account the conclusion and future work.

2 Materials and methods

This section comprises two parts. In the first part, we present some past related works with respect to the pandemic. In the second part, we discuss the proposed method of our work.

2.1 Related works

In this section, we explore the past research works that have been performed with respect to the COVID-19 outbreak. We also perform a critical analysis of the previous works so as to justify the novelty of our research work.

[11] performed a Twitter sentiment analysis during the COVID-19 outbreak for specific countries such as Australia, China, India, and the United States of America. The study considers eight basic emotions, i.e., fear, joy, anticipation, anger, disgust, sadness, surprise, and trust. Tweets related to anger are highest in France, and anticipation is highest in Germany. The USA reports the maximum number of tweets related to disgust, and Switzerland reports the most fear. India reports the highest percentage of joy, and Switzerland reports the maximum amount of sadness. The highest number of tweets depicting surprise and trust are from Belgium. The limitation of this study is that it does not take into account the sentiments of people across the globe and is confined to a few countries. [12] presented a report on the analysis and forecast of the COVID-19 outbreak for China, Italy, and France. The analysis has been performed using a susceptible-infected-recovered-deaths model, indicating that the rate of recovery is independent of the country, while death rates and infections vary. The case-fatality ratio is shown to be between 4% and 8% in Italy and between 1% and 3% in China. According to the study, 2500 ventilation units are a fair figure for COVID-19 strategic management in Italy. Although the research highlights how the case-fatality ratio can be decreased, its limitation is that it is narrow and considers a very small fraction of individuals from only three countries. [4] provided a risk assessment strategy for the novel coronavirus in the form of a computational tool. The number of cases, the connectivity of destination countries with China, and the efficacy of control measures in destination countries have been taken into account for performing the risk assessment. The study states that countries with low connectivity to China and more effective control measures can reduce the risks better. The limitation of the study is that it considers destination countries only from China’s perspective, whereas COVID-19 is a global issue.

[32] performed an analysis of the outbreak of the COVID-19 and presented a comparison of the cases confirmed, recovered, and deaths with respect to China using data visualization. The study also stated some industries that were severely affected by the pandemic, like tourism, travel, and the sports industry. The study fails to discuss many other global industries like real estate, retail, supply chain, etc. that may have also been affected due to the pandemic. [1] presented an infoveillance study highlighting the top concerns of tweeters during the COVID-19 pandemic. The analysis identified specific themes, i.e., origin of the virus, its sources, its impact on people, countries, and the economy, ways of mitigating the risk of infection. Most of the tweets were classified as positive. The negative tweets were based on deaths due to COVID-19 and increased racism. The highest mean of tweets was for economic loss and the lowest mean of tweets was for travel bans and warnings. The analysis has been performed with respect to the COVID-19 pandemic, and not with respect to the psychology of the public following the lockdown.

[16] proposed a strategy for real-time estimation of the risk of death from COVID-19 infection. The study relies on the exponential growth rate of incidence to estimate the basic reproduction number and the confirmed case fatality risk. The results estimated a growing number of cases by the end of January, suggesting that the infection could culminate in a pandemic. The study takes into account limited empirical data and is confined to analyzing only the case fatality risk. The veracity of the assumed date of onset is questionable, and the detection time window is uncertain. [31] suggested a binary classification and regression analysis methodology for investigating the COVID-19 outbreak for the sustainable development process. The classification model produces accurate results with respect to confirmed cases. The regression analysis has been used for comparing the fluctuations of parameters like wind, humidity, and average temperature. The results show that maximum daily temperature and relative humidity have the highest impact on confirmed cases. The case study is restricted to China; hence, the number of cases considered is a fraction of the total number of cases, and the results may be specific to that region. [29] conducted a sentiment analysis of students’ responses to Synchronous Online Delivery of Instruction in the Philippines as a result of community quarantine during the COVID-19 pandemic. Most of the respondents anticipated that they may face issues, and many respondents were worried about network connectivity. The predominantly negative sentiment (66%) indicates that most of the students may not be able to adapt to the new trends of education. The limitation of this study is that the dataset considered is very small and focuses on a specific institution. [34] recommended a quicker method of identifying COVID-19 using an artificial intelligence framework. The framework is based on a mobile phone survey through which data may be collected during the quarantine. The data comprises information related to an individual’s location details, demographic information, travel information, signs, symptoms, etc. The only limitation of the study is that it depends on a device and a survey-based application. [20] presented a review of modern technologies that may assist in tackling the COVID-19 pandemic. The methods mentioned in the study are diagnosis using radiology images, disease tracking, prediction of patients’ health outcomes, computational biology and medicine perspectives, drug discovery, protein structure prediction, and awareness through the internet. The methods prescribed are certainly interesting; however, according to past research works, the accuracy of the suggested methods is not 100%. [33] presented an analysis of the ‘Fear Injury Kidney’ theory, according to which long-term or excessive fear could damage the neuro-endocrine-immune system. This in turn leads to diseases, and the influence of stress and fear on the body, observed in several groups, cannot be ignored. [27] analyzed the psychological impact of COVID-19 on the elderly population in China. More than fifteen hundred elderly people of various ages were considered for the study. More than 37% of seniors experienced depression and anxiety due to the pandemic. The study also shows that women experience more anxiety and depression than men. The suggestions presented for such a situation included calling psychological hotlines, concentrating on seniors’ mental health, and considering counseling and psychological crisis intervention. [9] presented a study on the impact of COVID-19 on the psychology of more than fifteen thousand university students. Epidemic-related stressors are observed to be related to the level of anxiety. The stressors revolve around the impact of the pandemic on the economy, daily lives, academic delays, and social support. [40] explored the effect of the COVID-19 lockdown on the psychology of people based on law enforcement. The enforcement of law led a large population to follow rules and restrictions during emergency situations and to quarantine people affected by the disease. The laws stated that citizens not abiding by them would be penalized. All these impositions have been observed to cause distress among the general public (Table 1).

Table 1 Summary of existing related works

In this section, we presented the past relevant works performed with respect to the COVID-19 outbreak. Based on the previous work, we can say that COVID-19 research is multi-faceted. Most of the research work done aims at analyzing the outbreak and finding a cure for the same. Very little work with respect to human psychology has been done in the past. One way to explore the domain is by performing sentiment analysis. Another shortcoming of the previous research works is that many research works fail to conduct the studies globally, and many times the study is limited to specific regions and a specific number of cases. In our research, we will be handling this shortcoming by targeting data from all over the globe on a social networking platform.

2.2 Methodology

As is evident from the previous sections, a significant number of research works related to COVID-19 outbreak analysis and clinical trials have been proposed before. However, the after-effects of the lockdown on human psychology due to the pandemic still need to be explored. To explore the psychological effects of the pandemic on people’s minds, we consider the natural language processing technique of sentiment analysis. Sentiment analysis explores subjective opinions and feelings with respect to a particular subject by collecting data from various sources. Sentiment analysis can be of various types, like fine-grained, emotion detection, aspect-based, and intent analysis. For our study, we determine the polarity of the sentences and perform aspect-based sentiment analysis. It is a classification technique that categorizes sentences into positive, negative, and neutral categories based on their polarity. Polarity or orientation is the emotion expressed in a sentence. For analyzing the sentiment of texts, there is a need to analyze the sentiment of individual words or phrases. A sentence may contain several positive words and several negative words, so representing its emotions and sentiments is a challenge. Words like ‘good’ seem positive, but bear a neutral meaning when used in phrases like ‘Good Morning.’ Positive and negative words are thus sprinkled throughout phrases. When a text is long, positive and negative sentiments tend to average towards neutral. Hence it is important to separate genuine sentiments from basic phrases. Polar words and phrases are used to denote strong and clearly defined sentiments. On the other hand, non-polar words and phrases are used in everyday language. Thus polarity approximates the sentiments within a given text. In order to determine polarity, we rely on the Python library TextBlob, which is a tool built on top of the Natural Language Toolkit.
Based on the polarity, it is possible to determine whether a text is positive, negative, or neutral. For TextBlob, if the polarity is greater than 0, the text is considered positive, while a text is considered negative if the polarity is less than 0. If the polarity of a text is equal to zero, the text is considered neutral. The following operations have been conducted for performing the sentiment analysis.
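The thresholding rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: the function names `label_from_polarity` and `polarity_of` are hypothetical, and the scores passed in at the end are hand-picked values chosen only to exercise the three branches.

```python
def label_from_polarity(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to a sentiment label."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def polarity_of(text: str) -> float:
    """Return TextBlob's polarity for `text` (requires the textblob package)."""
    # Imported lazily so the thresholding rule above runs without the package.
    from textblob import TextBlob
    return TextBlob(text).sentiment.polarity


# Hand-picked scores, used only to exercise the three branches of the rule.
labels = [label_from_polarity(p) for p in (0.5, -0.25, 0.0)]
print(labels)
```

With TextBlob installed, a tweet would be labeled with `label_from_polarity(polarity_of(tweet_text))`.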

  Step 1: Data Collection - The data is collected from appropriate sources.

  Step 2: Text Preprocessing - This step refers to cleaning and preparing text data. It may include several operations like noise removal (removing unwanted data), tokenization (breaking the text into smaller components for accessing each word), stopword removal (removing common words in the language that do not provide relevant information), etc.

  Step 3: Calculating polarity - Using TextBlob, we calculate the polarity of the words.

  Step 4: Calculating the total sentiment score of the text.

  Step 5: Sentiment Results.

Figure 1 depicts the steps followed to perform the analysis.

Fig. 1 Steps for performing sentiment analysis

The data has been collected from a popular global social networking forum, i.e. Twitter, and is not specific to any country or region. We consider the dates April 16, 2020, and April 30, 2020, i.e., the tweets after two weeks and four weeks of the lockdown. The steps for pre-processing of the dataset are as follows:

2.2.1 Noise removal

Once the data is collected, we observe that there are a total of 22 columns, namely ‘status_id’, ‘user_id’, ‘created_at’, ‘screen_name’, ‘text’, ‘source’, ‘reply_to_status_id’, ‘reply_to_user_id’, ‘reply_to_screen_name’, ‘is_quote’, ‘is_retweet’, ‘favourites_count’, ‘retweet_count’, ‘country_code’, ‘place_full_name’, ‘place_type’, ‘followers_count’, ‘friends_count’, ‘account_lang’, ‘account_created_at’, ‘verified’, and ‘lang’. For our study, we do not require all these columns; hence, we preprocess the data, i.e., perform noise removal. Once we remove the unwanted data, we are left with six columns, namely ‘created_at’, ‘text’, ‘favourites_count’, ‘retweet_count’, ‘country_code’ and ‘lang’. Since the most commonly used language on Twitter is English, we set the language preference to English. Figure 2a and b depict the Twitter data for April 16, 2020, and April 30, 2020, respectively.
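The column selection and language filter described above can be sketched as follows. This is an illustrative sketch, not the study's code: the helper `remove_noise` is hypothetical, the two sample rows are made up, and the language code `"en"` for English is an assumption about how the ‘lang’ column is encoded.

```python
KEPT_COLUMNS = ("created_at", "text", "favourites_count",
                "retweet_count", "country_code", "lang")


def remove_noise(rows, language="en"):
    """Keep only the columns of interest and the rows in the given language."""
    cleaned = []
    for row in rows:
        if row.get("lang") != language:
            continue  # drop tweets that are not in the preferred language
        cleaned.append({col: row.get(col) for col in KEPT_COLUMNS})
    return cleaned


# Two made-up rows standing in for the raw data (only some columns shown).
sample_rows = [
    {"status_id": "1", "created_at": "2020-04-16T00:00:00Z", "text": "Stay safe",
     "favourites_count": "3", "retweet_count": "1", "country_code": "IN",
     "lang": "en", "verified": "False"},
    {"status_id": "2", "created_at": "2020-04-16T00:00:01Z", "text": "Quedate en casa",
     "favourites_count": "0", "retweet_count": "0", "country_code": "ES",
     "lang": "es", "verified": "False"},
]
cleaned_rows = remove_noise(sample_rows)
```

Rows read with `csv.DictReader` from the downloaded file would have the same dictionary shape as `sample_rows`.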

Fig. 2 a. Tweets for April 16, 2020. b. Tweets for April 30, 2020

2.2.2 Favorited tweets

We extract the most favorited tweets from the ‘favourites_count’ column.

Three most favorited tweets on April 16, 2020.

figure a

Three most favorited tweets on April 30, 2020.

figure b

2.2.3 Retweet count

We extract the most retweeted tweets from the ‘retweet_count’ column.

The three most retweeted tweets on April 16, 2020.

figure c

The three most retweeted tweets on April 30, 2020.

figure d
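Extracting the most favorited and the most retweeted tweets amounts to a top-k selection on one numeric column. A minimal sketch is given below; the helper `top_tweets` and the sample rows are hypothetical, and the counts are assumed to be stored as strings, as they would be when read from a CSV file.

```python
import heapq


def top_tweets(rows, column, k=3):
    """Return the k rows with the largest integer value in `column`."""
    return heapq.nlargest(k, rows, key=lambda row: int(row[column]))


# Made-up rows; the counts are illustrative only.
sample_rows = [
    {"text": "tweet A", "favourites_count": "12", "retweet_count": "40"},
    {"text": "tweet B", "favourites_count": "950", "retweet_count": "3"},
    {"text": "tweet C", "favourites_count": "87", "retweet_count": "310"},
    {"text": "tweet D", "favourites_count": "401", "retweet_count": "25"},
]
most_favorited = [r["text"] for r in top_tweets(sample_rows, "favourites_count")]
most_retweeted = [r["text"] for r in top_tweets(sample_rows, "retweet_count")]
```

The same function serves both sections because only the column name changes.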

2.2.4 Number of tweets per hour

Within the span of 24 h, we can analyze the number of tweets generated.

The number of tweets per hour for April 16, 2020, and April 30, 2020, are depicted as follows (Fig. 3).

Fig. 3 Number of tweets per hour for April 16, 2020, and April 30, 2020

From the number of tweets generated, it is evident that there are fewer tweets in the first few hours and that the number of tweets increases between hours 10 and 20 of the day. For April 16, 2020, the maximum number of tweets per hour is close to 20,000, whereas, for April 30, 2020, the maximum number of tweets per hour is slightly more than 14,000.
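The hourly counts plotted above can be obtained by bucketing each tweet's ‘created_at’ value by its hour. The sketch below is illustrative: the function `tweets_per_hour` is hypothetical, and the timestamp layout in `TIMESTAMP_FORMAT` is an assumption about how ‘created_at’ is stored, so it may need adjusting for the actual file.

```python
from collections import Counter
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed layout of 'created_at'


def tweets_per_hour(timestamps):
    """Count tweets for each hour of the day (0-23)."""
    counts = Counter(datetime.strptime(ts, TIMESTAMP_FORMAT).hour
                     for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


# Made-up timestamps for a single day.
sample_timestamps = [
    "2020-04-16T00:05:10Z",
    "2020-04-16T14:20:00Z",
    "2020-04-16T14:59:59Z",
]
hourly = tweets_per_hour(sample_timestamps)
```

The returned list has one entry per hour, so it can be passed directly to a line or bar plot.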

2.2.5 Overall WordCloud

Word clouds are used to depict the important words in a collection of texts. One of the popular ways of communicating important information at a glance is by using data visualization techniques such as charts, infographics, and graphs. When the raw data is text-based, however, it may be difficult to discern which points are the most important. In such situations, word cloud generators can simplify the process. Word clouds, also called text clouds or tag clouds, are images of the words used in specific texts or subjects, such that the size of each word indicates its frequency or importance. The more often a specific word appears in a source of textual data, the bigger and bolder it appears in the word cloud. They are appropriate for pulling out the most pertinent parts of textual data. Word clouds may also be used to compare and contrast two different pieces of text to find similarities and differences between the two. In this study, we use word cloud visualization for determining which words were most frequently used during the COVID-19 lockdown period across social media. Since words have their own polarity, this is beneficial in understanding the sentiments behind the tweets and of the overall population. The most important words for April 16, 2020, and April 30, 2020, are as follows (Fig. 4).

Fig. 4 Word Clouds depicting the important words from tweets for April 16, 2020, and April 30, 2020

From the word clouds generated on April 16, 2020, and April 30, 2020, we observe that some words that have the highest frequency are g20, health, relief, COVID19, lockdown, pandemic, etc.

2.2.6 Data cleaning by removing stopwords

Stopwords are common words of a language (such as ‘the’, ‘is’, and ‘at’) that carry little information and are filtered out before natural language processing. In order to determine the polarity of the texts, there is a need to remove the stopwords. After removing stopwords, the data for April 16, 2020, and April 30, 2020, is as follows.

Data for April 16, 2020.

figure e

Data for April 30, 2020.

figure f
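The cleaning step can be sketched as a small preprocessing function that lowercases a tweet, strips URLs and user mentions, tokenizes it, and drops stopwords. This is an illustrative sketch under stated assumptions: the stopword set below is a tiny hand-written sample rather than the list used in the study, and the function name `preprocess` and both regular expressions are hypothetical choices.

```python
import re

# A tiny illustrative stopword list; a real run would use a fuller list.
STOPWORDS = {"a", "an", "and", "are", "at", "in", "is", "of", "the", "to", "we"}

URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
TOKEN = re.compile(r"[a-z0-9]+")


def preprocess(text):
    """Lowercase, drop URLs and mentions, tokenize, and remove stopwords."""
    text = URL_OR_MENTION.sub(" ", text.lower())
    return [tok for tok in TOKEN.findall(text) if tok not in STOPWORDS]


tokens = preprocess("We are staying at home in the #lockdown @WHO https://example.org")
print(tokens)
```

Hashtag words are kept (without the ‘#’ sign) because they often carry the topic of the tweet.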

3 Experimental analysis

This section is divided into two parts. In the first section, we describe the dataset and in the second section, we perform the experimental analysis.

3.1 Datasets

The dataset considered for the study has been acquired from Kaggle and is referred to as ‘Coronavirus (covid19) Tweets - late April Tweets using hashtags associated with Coronavirus’. The data collected is not specific to a given country or region; rather, it is taken from Twitter. The dataset incorporates the tweets of users carrying the following hashtags: #covid19, #coronavirus, #covid_19, #ihavecorona, #coronavirusoutbreak, #coronavirusPandemic, #StayHomeStaySafe, #epitwitter, #TestTraceIsolate. In order to conduct the analysis, we consider global tweets from Twitter on April 16, 2020, and April 30, 2020. These dates were chosen deliberately. Lockdown restrictions were imposed almost all over the world by the end of March or the beginning of April; hence, April 16, 2020, marks the end of roughly two weeks after the lockdown was imposed, and April 30, 2020, marks the end of roughly four weeks. April 30, 2020, also marks the end of the first quadrimester of 2020, i.e., the end of the first four months of the year. Since the coronavirus was discovered in late December 2019, it would be interesting to analyze people’s state of mind four months after the outbreak. We use statistical learning methods to analyze the tweets.

In order to validate the study, we consider two more datasets from Kaggle, namely Coronavirus tweets NLP - Text Classification, and COVID19 Tweets, respectively. We refer to them as Dataset 2 and Dataset 3. While Dataset 2 incorporates Corona virus tagged data, Dataset 3 considers tweets with the hashtag #covid19. Dataset 2 has a total of 87,446 unique tweets, while Dataset 3 has 44,955 tweets. Some of the features for Dataset 2 are UserName, ScreenName, Location, TweetAt, OriginalTweet, etc. Dataset 3 is characterized by user_name, user_location, user_description, user_created, user_followers, user_friends, etc.

3.2 Experimental results

In this section, we present the analysis results for dates April 16, 2020, and April 30, 2020, respectively.

3.2.1 Calculating the polarity of sentences

In order to classify sentences as positive, negative, or neutral, there is a need to calculate the polarity of the texts. The polarity has been determined using TextBlob. If the polarity is less than 0, texts are classified as negative. If the polarity is greater than 0, texts are classified as positive. If the polarity is equal to 0, texts are classified as neutral. After calculating the polarity, the revised data for April 16, 2020, and April 30, 2020, is as follows (Figs. 5a and 5b).

Fig. 5 a: Data depicting polarity for April 16, 2020 tweets. b: Data depicting polarity for April 30, 2020 tweets

3.2.2 Displaying data categorically

Based on the analysis, we find that on April 16, 2020, the number of positive tweets is 140,084, negative tweets are 57,274, and neutral tweets are 100,855. For April 30, 2020, the number of positive tweets is 96,812, negative tweets are 36,562, and neutral tweets are 65,205. In order to depict the data categorically, count plots have been used.
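Because the two days contain different numbers of tweets, the raw counts are easier to compare once they are converted into shares of each day's total. The short calculation below uses only the counts reported above; the helper `shares` is a hypothetical name introduced for illustration.

```python
counts = {
    "April 16, 2020": {"positive": 140_084, "negative": 57_274, "neutral": 100_855},
    "April 30, 2020": {"positive": 96_812, "negative": 36_562, "neutral": 65_205},
}


def shares(day_counts):
    """Convert per-label counts into percentages of the day's total."""
    total = sum(day_counts.values())
    return {label: round(100 * n / total, 2) for label, n in day_counts.items()}


totals = {day: sum(c.values()) for day, c in counts.items()}
percentages = {day: shares(c) for day, c in counts.items()}
for day in counts:
    print(day, totals[day], percentages[day])
```

The totals are 298,213 tweets for April 16 and 198,579 for April 30. In relative terms, the split is roughly 47% positive, 19% negative, and 34% neutral on April 16, and roughly 49% positive, 18% negative, and 33% neutral on April 30, so the proportions are very similar on the two days even though the absolute counts differ.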

From Fig. 6, it is evident that there are more positive tweets in both cases, i.e., for April 16, 2020, and April 30, 2020. Positive tweets are followed by neutral tweets and then negative tweets in both cases.

Fig. 6 Classification of tweets for April 16, 2020, and April 30, 2020

3.2.3 Sentiment distribution of data

A sentiment distribution plot shows how polarity is distributed. The range varies from −1 to +1. The sentiment distributions for April 16, 2020, and April 30, 2020, are as follows (Fig. 7).

Fig. 7 Sentiment distribution of data for April 16, 2020, and April 30, 2020

We observe that the sentiment distribution graph for both days is almost the same. It is evident that there are more data points between 0 and +1, denoting a majority of positive texts. For both days, the number of negative tweets (between 0 and −1) is smaller. This may be verified from the number of positive and negative tweets listed above for the two days.
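A sentiment distribution of this kind is essentially a histogram of polarity scores over the interval from −1 to +1. The sketch below shows one way to compute such a histogram; the function `polarity_histogram`, the number of bins, and the sample scores are all hypothetical and serve only to illustrate the binning.

```python
def polarity_histogram(polarities, bins=10):
    """Count scores in `bins` equal-width bins covering [-1, 1]."""
    width = 2.0 / bins
    counts = [0] * bins
    for p in polarities:
        # A score of exactly 1.0 is placed in the last bin.
        index = min(int((p + 1.0) / width), bins - 1)
        counts[index] += 1
    return counts


# Made-up scores; with 4 bins the edges are -1, -0.5, 0, 0.5, 1.
sample_scores = [-0.6, -0.1, 0.0, 0.0, 0.2, 0.5, 0.8, 1.0]
histogram = polarity_histogram(sample_scores, bins=4)
print(histogram)
```

Note that with this binning a score of exactly 0 falls into the first bin to the right of zero, so neutral texts should be counted separately if the plot is meant to distinguish them from weakly positive ones.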

3.2.4 Word cloud for positive texts

The most important words in the tweets (positive texts) for April 16, 2020, and April 30, 2020, are depicted as follows (Fig. 8).

Fig. 8 Word cloud of positive words for tweets corresponding to April 16, 2020, and April 30, 2020

We observe that words like infection, people, health, Coronavirus, etc. are popular for April 16, 2020 tweets. Words like covid19, Coronavirus, china, grocery, seattle, etc. are popular for April 30, 2020 tweets.

3.2.5 Word cloud for negative texts

The most important words in the tweets (negative texts) for April 16, 2020, and April 30, 2020, are depicted as follows (Fig. 9).

Fig. 9 Word cloud of negative words for tweets corresponding to April 16, 2020, and April 30, 2020

We observe that words like long, sad, gone, nothing, sick, wuhan, etc. are popular for April 16, 2020 tweets. Words like covid19, million, companies, quarantine, doctors, china, etc. are popular for April 30, 2020 tweets.

3.2.6 Word cloud for neutral texts

The most important words in the tweets (neutral texts) for April 16, 2020, and April 30, 2020, are depicted as follows (Fig. 10).

Fig. 10 Word cloud of neutral words for tweets corresponding to April 16, 2020, and April 30, 2020

We observe that words like countries, world, poorest, g20, ebola, COVID19, etc. are popular for April 16, 2020 tweets. Words like COVID19, pandemic, lockdown, cancelled, coronavirus, microsoft, etc. are popular for April 30, 2020 tweets.

3.2.7 Most frequent words

Based on the analysis, we identified the top five common words on April 16, 2020, and April 30, 2020, respectively (Fig. 11).

Fig. 11 Most common words for April 16, 2020, and April 30, 2020

The analysis shows that ‘covid19’ and ‘coronavirus’ are the most common words for both days and have the maximum frequencies. For April 16, 2020, the other common words are ‘covid’, ‘19’, and ‘s’ respectively. For April 30, 2020, the other common words are ‘s’,‘covid’, and ‘amp’ respectively.
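Counting the most frequent words reduces to tokenizing every tweet and tallying the tokens. The sketch below is illustrative and does not reproduce the study's tokenizer: the function `most_common_words`, the regular expression, and the sample tweets are hypothetical. It also suggests one plausible explanation, not confirmed by the source, for fragments such as ‘s’, ‘19’, and ‘amp’ appearing among the frequent words: splitting on non-alphanumeric characters breaks ‘COVID-19’ into ‘covid’ and ‘19’, a possessive into a trailing ‘s’, and the HTML entity ‘&amp;amp;’ into ‘amp’.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9]+")


def most_common_words(texts, k=5):
    """Return the k most frequent lowercase alphanumeric tokens."""
    counter = Counter()
    for text in texts:
        counter.update(TOKEN.findall(text.lower()))
    return counter.most_common(k)


# Made-up tweets; note the raw HTML entity and the hyphenated spelling.
sample_texts = [
    "#COVID19 cases rise &amp; #coronavirus spreads",
    "COVID-19: today's #covid19 update",
]
top_words = most_common_words(sample_texts, k=3)
tokens_seen = {word for text in sample_texts
               for word in TOKEN.findall(text.lower())}
```

If such fragments are undesirable, they can be removed by decoding HTML entities before tokenization and by adding short tokens to the stopword list.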

3.2.8 Bar plot depicting most frequent words

The following is a barplot depicting the frequent words for April 16, 2020, and April 30, 2020, respectively.

Figure 12a and b depict the bar plots for the frequency of words based on our Twitter dataset for April 16, 2020, and April 30, 2020. While the most frequent words for April 16, 2020, are covid19, coronavirus, covid, 19, s, amp, etc., the most frequent words for April 30, 2020, are covid19, coronavirus, s, covid, amp, 19, will, etc.

Fig. 12 a: Barplot depicting the frequency of words for April 16, 2020. b: Barplot depicting the frequency of words for April 30, 2020

3.2.9 Validating the study on other datasets

In order to validate the study we consider Datasets 2 and 3 and compare the Twitter sentiments on April 16, 2020, and April 30, 2020, with both the datasets (Fig. 13a and b).

Fig. 13 a: Comparison of tweets on April 16, 2020, with Dataset 2 and Dataset 3. b: Comparison of tweets on April 30, 2020, with Dataset 2 and Dataset 3

Figure 13a and b show that for Dataset 2, the number of positive tweets is the highest, followed by the number of negative tweets and neutral tweets, respectively. We also observe that for Dataset 3, the number of positive tweets is the highest followed by the number of neutral tweets and negative tweets respectively. In both the datasets, i.e., Dataset 2 and Dataset 3, the number of positive tweets is the highest. This coincides with the initial study performed in which the number of positive tweets is the highest with respect to negative tweets and neutral tweets for April 16, 2020, and April 30, 2020, respectively. Thus, the study is validated.

4 Discussions

Based on the experimental analysis, we examined the Twitter data for April 16, 2020, and April 30, 2020. While the dataset contains more tweets for April 16, 2020, than for April 30, 2020, positive tweets take the lead in both cases. A total of 140,084 positive tweets were observed on April 16, 2020, compared with 96,812 positive tweets on April 30, 2020. For both dates, negative tweets are low with respect to positive tweets: 57,274 on April 16, 2020, and 36,562 on April 30, 2020. Therefore, given the large difference between the number of positive and negative tweets on both dates, it is appropriate to say that people have a positive sentiment towards the pandemic in the situation of the lockdown. The same is verified on Dataset 2 and Dataset 3, respectively. Although the number of positive tweets is lower on April 30, 2020, the number of negative tweets also decreases significantly. The negative sentiments that exist in such a situation may be due to several causes, ranging from the inability to socialize to unemployment during the lockdown. Hence a greater number of positive sentiments is indicative of a healthy state of mind of the public. We provide a comparative analysis of our proposed work with some existing works (Table 2).

Table 2 Comparative analysis of our proposed work with existing works

Based on the comparative analysis, we assert the following:

  • While many research works target prediction and cure as their objectives, little is mentioned about human psychology and sentiment analysis. In this paper, we attempt to understand human psychology during the lockdown using sentiment analysis.

  • The analysis has been conducted by acquiring tweets from the global social networking platform, Twitter. Since the most common language used over this platform is English, the tweets we are analyzing are also in English.

  • We have specifically chosen the dates April 16, 2020, and April 30, 2020, which are approximately two and four weeks after the worldwide lockdown was imposed. Thus we were able to compare the sentiments of people using the two dates.

  • COVID-19 is a global issue, yet most of the research works pertaining to COVID-19 sentiment analysis are confined to specific countries. To the best of our knowledge, this paper is the first to perform sentiment analysis on global data.
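
To make the comparison between the two dates concrete, the positive and negative counts quoted at the start of this section can be reduced to a positive-to-negative ratio and a relative change between dates. The following is a minimal sketch in Python, not part of the original analysis pipeline; the function names `pos_neg_ratio` and `relative_change` are illustrative assumptions, and neutral tweets are left out because their counts are not quoted in this section.

```python
# Tweet counts quoted in the Discussion (main dataset only).
# Neutral counts are not quoted in this section, so they are omitted.
counts = {
    "2020-04-16": {"positive": 140_084, "negative": 57_274},
    "2020-04-30": {"positive": 96_812, "negative": 36_562},
}


def pos_neg_ratio(day):
    """Positive tweets per negative tweet for one date (hypothetical helper)."""
    return day["positive"] / day["negative"]


def relative_change(old, new):
    """Fractional change from old to new; a negative value means a drop."""
    return (new - old) / old


first = counts["2020-04-16"]
second = counts["2020-04-30"]

ratio_first = pos_neg_ratio(first)
ratio_second = pos_neg_ratio(second)
drop_positive = relative_change(first["positive"], second["positive"])
drop_negative = relative_change(first["negative"], second["negative"])

print(f"Positive/negative ratio, April 16: {ratio_first:.2f}")
print(f"Positive/negative ratio, April 30: {ratio_second:.2f}")
print(f"Change in positive tweets: {drop_positive:.1%}")
print(f"Change in negative tweets: {drop_negative:.1%}")
```

Under these counts, the ratio rises from about 2.45 to about 2.65, and negative tweets fall by roughly 36% against roughly 31% for positive tweets. This is consistent with the observation that negative tweets decrease more sharply than positive tweets between the two dates.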

Based on the analysis, with positive texts forming the largest class in all the datasets considered, it appears that people exhibited a positive attitude towards the pandemic during the lockdown. It is not surprising that the effects of the lockdown can be seen in the general public devoting much of their time to multimedia devices. With restrictions on socializing, many people spend their time playing games or watching videos during the lockdown. This may give rise to feelings of loneliness and depression in individuals. However, multimedia also offers various methods of socializing and interacting virtually through gaming platforms, social media sites, and video platforms. Although increased use of multimedia may affect people psychologically, the larger number of positive tweets is indicative of an optimistic attitude that people have exhibited during the lockdown. Some limitations of the study are as follows.

  1. The study is restricted to comprehending the psychological state of people who use Twitter. Other social media platforms and networking sites may include members of the public who are not Twitter users.

  2. The study does not consider the psychology and sentiments of other communities and groups, such as daily wage workers or healthcare staff, if they do not use social media.

  3. Other languages, such as Chinese, Spanish, and Arabic, are frequently used on social media platforms like Twitter, and a large population of Twitter users relies on these languages. We have analyzed only the tweets in English for this study.

5 Conclusion and future works

In this paper, we performed sentiment analysis of Twitter data posted during the lockdown due to the COVID-19 pandemic. We chose two dates, April 16, 2020, and April 30, 2020, which fall approximately two and four weeks after the worldwide lockdown was imposed. The idea behind choosing dates two weeks apart was to analyze how the sentiments of people all over the world evolved during the lockdown. The lockdown led to many businesses closing down, economic crises, and suicides among the general public. Moreover, people who depend on multimedia devices to pass their time during the lockdown may also suffer severe psychological effects such as loneliness and depression. Hence, there is a need to analyze the psychology of the human mind in a situation like this. Based on the sentiment analysis conducted, we observed that positive tweets were the most numerous on both dates, indicating a positive attitude of people towards the pandemic during the lockdown. In the future, we would like to broaden the study and perform detection of emotions such as anger, disgust, fear, joy, anxiety, panic, sadness, and sarcasm using global data. Moreover, an integrated analysis may be conducted to associate sentiments with a psychological profile, behaviors, demographic characteristics, transactions, events, and/or other data.