Introduction

In 2020, the world was rocked by the COVID-19 pandemic, and global concerns related to two major issues arose: health and the economy. Stock markets around the world collapsed under the uncertainty generated by the SARS-CoV-2 virus, and something similar happened with the H1N1 pandemic. In this regard, different studies have been conducted in attempts to understand the effects of pandemics in different areas. In particular, this article aims to study the relationship between the sentiments generated on Twitter and the stock market through sentic computing techniques.

Sentic computing is a recent line of research that is supported by computational techniques, neuroscience, sociology, psychology and mathematics that have the potential to be applied to different areas of knowledge [1]. Some sophisticated artificial intelligence techniques allow for the analysis of different types of data, whether text, images, audio, or video, to determine the sentiment, emotions, or meaning of the content of big data [2]. Areas such as computer science, mathematics, neuroscience, business, administration, accounting, engineering, arts and humanities, earth sciences, psychology, economics, econometrics and finance make use of and are complemented by sentiment analysis [3].

Additionally, stock market prediction is one of the most challenging problems and has concerned both researchers and financial analysts for more than half a century [4, 5]. Research on this topic during pandemics is practically non-existent. However, Khatua and colleagues [6] studied the 2014 Ebola and 2016 Zika outbreaks, finding that smaller domain-specific input corpora from the Twitter corpus are better in extracting meaningful semantic relationships than generic pre-trained Word2Vec or GloVe. In this regard, financial markets are based on expectations [7], and investors buy or sell shares depending on the sentiments of fear or desire caused by some event [8]. Investors bet on the credibility of the media outlets that publish rumours despite knowing that the information is not always reliable [9, 10]. However, this trend is growing with the use of online social media networks such as Twitter, Facebook, Instagram and YouTube [11]. Exposure to topics and ideas is thus increased through these platforms, affecting the stock market and investors’ emotions. Emergencies such as earthquakes, pandemics, political events and terrorism are special cases where the amount of available information is reduced, and decisions have to be made in uncertain circumstances [12].

Indeed, the COVID-19 pandemic has changed our lives, altered the balance within stock markets around the world and created an unstable environment for at least several weeks depending on the country [13]. A similar situation arose 11 years ago, with the pandemic caused by the H1N1 virus [14]; these kinds of events affect the behaviour, emotions, sentiments and feelings of investors. In this regard, emotions and sentiments are inherent to investors’ reactions and decision-making, learning, communication and awareness [15]. Understanding emotions is one of the most important aspects of personal development and growth and is paramount to the emulation of human intelligence. The processing of emotions is important for the advancement of artificial intelligence and a task closely related to the detection of polarity and emotions [16]. The potential for this type of artificial intelligence technique has extended to different fields, such as social media and finance, where sentic computing has been applied to find patterns of behaviour, meaning identification, emotions, trends and even the induction of people’s decisions.

Research on sentic computing and finance has grown, but very few studies have analysed social media datasets during pandemics. It is important to identify signs or patterns that guide investors when selecting the assets that will constitute their portfolios. Investors try to ensure that the prices of selected stocks have a trend that is either very similar to or higher than the corresponding indices. Therefore, investors need tools that allow them to factor as much of the available information as possible into technical and fundamental analysis. This helps them to reduce risk and uncertainty as far as possible and to maximize returns [17].

The information shared on social media generates sentiments and reactions in investors that influence their decisions to buy or sell financial assets, especially on the stock market [18]. News shared on social media, whether true or not, causes changes in the trends of international stock market indices [19]. Some studies confirm the relationship between sentiments generated by social media in investors and market trends, and the current study continues this path by exploring pandemic periods with a lexicon-based approach to detect polarity in financial news on Twitter. This study aims to address this gap in the literature. To fulfil this aim, the study seeks to compare similar emergencies, as well as the use of social media platforms that amplify the dissemination of rumours and news that alter investors’ perceptions of the stock exchange. Although there are different techniques and applications for computational finance, it is not known for how long an event will affect the financial market, specifically whether posts on social media, such as Twitter, affect financial indicators and vice versa. This is especially true during pandemics such as H1N1 and COVID-19. To investigate this, we analysed shifted correlations between the polarity generated by Twitter accounts that influence the financial sector and the most important financial indices worldwide.

The following question guides the study: How does the polarity generated by Twitter posts influence the behaviour of financial indices during pandemics? To answer this question, our analysis proposes a methodology based on the financial sentiment analysis of influential Twitter accounts in the financial field and shifted correlations with the behaviour of important financial indices. The document is divided into five sections. In the first section, we present the state of the art related to sentic computing and finance. In the second section, we show our methodology, which is based on fundamental and technical financial analysis combined with a lexicon-based approach on financial Twitter accounts and shifted correlations during pandemics. The third section describes the findings of the shifted correlations between the polarities of financial market indicators and posts on Twitter. The fourth section is the discussion of results. Finally, we present the conclusions of the study and suggestions for future work.

State of the Art

This section consists of two parts. The first presents a brief introduction to sentic computing; the second describes the state of the art of sentic computing and finance based on the 13 scientific papers found in Google Scholar, Scopus, Web of Science, IEEE Xplore, Springer Link and ACM Digital Library.

A Brief Introduction to Sentic Computing

Emotions are fundamental for successful and effective communication between human beings [20, 21]. In fact, in many everyday situations, emotional intelligence is more important than the intelligence quotient (IQ) for successful interactions [16], and social media is a massive modern source for detecting people’s emotions and sentiments [22]. For instance, content posted on Facebook and Twitter shows the opinions, emotions and sentiments of people about any kind of event [23].

Affective computing and sentiment analysis are multidisciplinary areas that include academics and professionals from different disciplines, such as computer science, cognitive science, political science, economics and finance [24,25,26,27]. In particular, sentiment analysis is an area of artificial intelligence that allows us to extract and analyse data stored on social media in images, sounds and videos. Sentiment analysis allows the identification of patterns or characteristics in large data sets. This can be very useful for decision-making in organizations, political movements, business strategies, marketing campaigns and product preferences, among others [1, 15, 24]. Sentiment analysis reveals personal opinions towards entities such as products, services, or events. This helps organizations and companies to improve their marketing, communication, production and acquisition [28].

Cambria and colleagues [16] argued that sentiment analysis has been confused with the task of polarity detection (detecting negative and positive sentiments). However, this is only one of many natural language processing (NLP) problems that must be solved to achieve human-like performance in sentiment analysis. Thus, when performing sentiment analysis, we encounter a complex problem of NLP that can be analysed by different emotional or psychological models [1, 29].

The opportunity to automatically capture the sentiments of the public regarding social events, political movements, marketing campaigns and product preferences has led to growing interest both in the scientific community and in the business world. The scientific community is attracted to exciting open challenges, and the business world is attracted based on the notable consequences for marketing and the incorporation of sentiments into financial market predictions [29, 30]. This has led to the emergence of the fields of affective computing and sentiment analysis, which take advantage of human–computer interactions, information retrieval and multimodal signal processing to identify people’s emotions from the growing amount of social data online [31].

In this regard, the main goal of sentic computing is to create emotion detection models supported by different disciplines. It proposes the application of artificial intelligence and semantic web techniques for the representation of knowledge, mathematics for making inferences, linguistics for analysing discourse and pragmatics, psychology for cognitive and affective modelling, sociology for understanding the dynamics of social networks and social influence and ethics for understanding the nature of the mind [1].

The first wave of sentic computing identified as the state of the art dates from 2009. Cambria, Hussain, Havasi and Eckl [32] published a seminal study related to the detection of emotions in texts for the development of intelligent systems called AffectiveSpace. Today, there are 1154 documents on the Internet regarding sentic computing and the second version of AffectiveSpace, a vector space model that allows for reasoning by analogy on natural language concepts [25]. In 2017, for instance, 169 papers were published on this topic (Fig. 1).

Fig. 1 Sentic computing papers per year

On Google Scholar, there are 876 papers, including reports and conference proceedings. Regarding scientific databases such as Scopus, Web of Science, IEEE Xplore, Springer Link and ACM Digital Library, there are a total of 278 documents, including articles, book chapters, books and proceedings from international conferences related to the application of sentic computing in different areas (Table 1).

Table 1 Sentic computing papers on the Internet

Sentic Computing and Finance

It is widely accepted that public mood is correlated with financial markets. In this regard, sentic computing has great potential to improve the capacity of customer relationship management and referral systems [25]. For example, sentic computing helps to reveal what characteristics clients enjoy or which items to exclude from referrals based on what previously received a negative response. Business and financial intelligence are also major factors in companies’ interest in affective computing and sentiment analysis [16].

Investors have always been interested in stock price forecasting. However, little research has examined the effects of social media on forecast results specifically during pandemics. In this regard, Day and Lee [33] argued that financial sentiment analysis is an important area of financial technology research (FinTech).

Ahmed [34] stated that investor reactions to a financial event, in the form of stock price movement, reflect the severity of the event. He also argued that the words used in published news show a similar degree of emotion.

Computational techniques applied to financial analysis are related to natural language processing, sentiment analysis, text mining, artificial neural networks, machine learning, biologically inspired computational approaches and vine-growing algorithms. Regarding the literature on sentic computing and finance, we identified 13 scientific papers that show the application of financial sentiment analysis (Table 2).

Table 2 Scientific papers related to sentic computing and finance

The first study on the state of the art of sentic computing and finance was conducted in 2013 using a biologically inspired computational approach to solve the problems of narrative financial disclosures [35]. In 2017, Ahmed [34] proposed a lexicon useful to identify investor emotions in financial news. He assumed that the words used in the news to describe an event also exhibit a similar degree of emotion. In 2018, Xing, Cambria and Welsch [39] published a study of articles that harness NLP techniques to make financial market predictions by identifying the research field of natural language-based financial forecasting (NLFF).

Atzeni, Dridi and Reforgiato [36] considered that user-generated data in blogs and on social media have become a valuable resource for sentiment analysis in the financial domain. Therefore, they suggested a fine-grained approach that returns a continuous score in the range of [−1, +1] to identify polarity. Malandri and colleagues [40] investigated whether public mood ascertained from social media and online news is correlated with or predictive of portfolio returns, introducing the framework of sentiment-driven portfolio allocation. This novel approach automatically produces an optimal online portfolio allocation strategy.

Xing, Cambria and Welsch [37] investigated the role of market sentiment in asset allocation problems. They proposed the computation of sentiment time series from social media data, finding that the sentiment time series obtained through sentic computing is comparable to that obtained through some commercial tools. Picasso and colleagues [38] combined both technical and fundamental analysis through the application of data science and machine learning techniques. They found a robust predictive model able to forecast the trend of a portfolio composed of the twenty most capitalized companies listed in the NASDAQ100 index.

Also in 2019, Xing, Cambria and Welsch [44] proposed leveraging prior semantic knowledge of assets to find a suitable vine structure for financial portfolio optimization. Their findings suggest that the construction of a semantic vine improves on the arbitrary vine-growing method in the context of robust correlation estimation and multi-period asset allocation. Merello and colleagues [43] showed that the return predictions generated by a regression approach are more meaningful concerning the “buy” or “sell” signals provided by classification approaches during trading. They found that the application of transfer learning and sample weighting over different market fluctuations has been instrumental in enhancing performance. This is especially true for the largest and most important returns. Xing, Cambria and Zhang [45] proposed a novel model termed sentiment-aware volatility forecasting (SAVING). This model incorporates market sentiment for stock return fluctuation prediction and aims to provide a more accurate estimation of the temporal variances of asset returns by better capturing the bidirectional interaction between movements of asset prices and market sentiment. Upreti and colleagues [41] provided an overview of recent developments in the domain of financial news analytics, with a special focus on sentiment analysis and event detection. They observed that robo-readers and sentiment analysis techniques capable of making automated decisions are still in their infancy.

Dridi, Atzeni and Recupero [42] proposed a supervised method and found that semantic features and semantic frames can be applied successfully to sentiment analysis within the financial domain, thus leading to better results. Finally, Akhtar, Ekbal and Cambria [46] proposed a stacked ensemble method for predicting the degree of intensity for emotion and sentiment by combining the outputs obtained from several deep learning and classical feature-based models using a multi-layer perceptron network. To achieve this, Akhtar and colleagues focused on emotion analysis in the generic domain and sentiment analysis in the financial domain. Their model achieved better performance than existing state-of-the-art systems.

In summary, the application of sentic computing in finance has focused on sentiment analysis generated by news related to the stock market. Other applications have focused on the prediction of the behaviour of financial investment portfolios and the detection of useful patterns for decision-making (Fig. 2).

Fig. 2 Word cloud of scientific papers related to sentic computing and finance

Methodology

This section describes the methodology of the study based on the comparison of different lexicons to determine the polarity of Twitter posts and their relation to the behaviour of financial indices. This section is divided into two stages. First, we present the fundamental and technical financial analysis of the stock market and Twitter data. Second, we describe the process of extracting and analysing data based on the lexicon approach and date-shifted correlations.

Fundamental and Technical Analyses

The methodology is based on sentiment analysis applied to Twitter accounts that influence stock market behaviour. To achieve the study objective, we downloaded data from Twitter at two important moments: during the H1N1 pandemic and during the COVID-19 pandemic. For the H1N1 pandemic, the period for data collection was from June to July 2009. For the COVID-19 pandemic, the period for data collection was from January to May 2020. The above periods were selected based on the time when the first stock market index peaked and began to fall, i.e., the point at which the pandemics started to influence financial indices. Pandemics cause prices to fall to a minimum and lead to a period of financial disruption and uncertainty. Since the COVID-19 pandemic has been the most damaging to the financial market, we extended the Twitter data analysis period to May 2020.

We selected the financial indices because they are the most representative and influential for each continent. Additionally, we included companies from different sectors, which provided a general perspective of their performance and behaviour. The financial data used for this study consisted of adjusted closing prices, where each of the financial indices in the world reached their maximum and minimum. This was first done for the H1N1 influenza period and then for the COVID-19 period. The analysed stock market indices were IPC (in Spanish, Índice de Precios y Cotizaciones), S&P 500 (Standard & Poor’s 500), NASDAQ 100 (National Association of Securities Dealers Automated Quotation 100), Dow Jones, FTSE 100 (Financial Times Stock Exchange 100), BOVESPA (in Portuguese, BOlsa de Valores do Estado de São PAulo), CAC 40 (in French, Cotation Assistée en Continu), DAX (in German, Deutscher Aktienindex), Hang Seng, Nikkei 225 (Nikkei Heikin Kabuka 225) and SSE Composite (Shanghai Stock Exchange Composite) (Table 3).

Table 3 Description of financial indicators

Once the dates for the maximum and minimum prices were identified in the stock market behaviour, the prices from January 2009 to May 2020 were downloaded from the Yahoo Finance site and Investing.com, which is a common practice of stock market analysts. Then, the percentage loss of each of the indices was calculated (Table 4). Adjusted closing prices were selected because they represented the closing price and the corporate earnings to which each one was entitled once the share was acquired. The adjusted closing price was used when analysing historical returns and was calculated in terms of the stock exchange to which it belonged.

Table 4 Maximum and minimum prices of the stock exchange indices
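As a rough illustration of the percentage loss computed for each index, the sketch below assumes that the loss is measured relative to the maximum (peak) price. The function name and the sample figures are hypothetical and are not taken from Table 4.

```python
def percentage_loss(max_price, min_price):
    """Percentage drop from the peak price to the trough price.

    Assumption: the loss is expressed relative to the maximum price.
    """
    if max_price <= 0:
        raise ValueError("max_price must be positive")
    return (max_price - min_price) / max_price * 100.0


# Illustrative values only: an index falling from 200 to 150 loses 25%.
loss = percentage_loss(200.0, 150.0)
```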

We selected Twitter posts from financial markets, finance and the economy for this study, collecting data from media accounts recommended by financial experts [47]. Another source of data was Internet sites that published general and international news that affected the world’s population and therefore affected the sentiment of investors. Twitter accounts selected for sentiment analysis consisted of personal, company, or organizational accounts, as well as news broadcasts that were influential or important for finance. Some financial influencers do not have a Twitter account; for this reason, we downloaded Tweets from the company or organization they run, when available. Another important issue is that some Twitter users had an account in 2020 but not in 2009 due to the variation in the adoption and use of social media (NA*) (Table 5).

Table 5 Analysed Twitter accounts

Data Extraction and Analysis

Tweets were analysed to determine semantic orientation by using a lexicon-based approach. In this approach, texts were tokenized by splitting the text into individual words. The stemming of the words was then computed. Next, stop words, punctuation symbols, numbers and other special characters were removed. A lexicon containing both words and a measure of polarity was used to infer emotion in texts. Lexicon-based analysis is a very simple technique to implement, and it yields good results in most cases. In this paper, data (posts on Twitter) were pre-processed according to the steps shown in Algorithm 1.

Algorithm 1 Pre-processing of Twitter posts

We then analysed the cleaned data by applying the lexicon-based approach.
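The pre-processing steps described above can be sketched in plain Python as follows. This is only a minimal illustration and not a reproduction of Algorithm 1: the stop-word list is a tiny hypothetical sample, `crude_stem` is a rough stand-in for a real stemmer (such as Porter's), and stop words are dropped before stemming so that the stemmer cannot distort them.

```python
import re

# Tiny illustrative stop-word list (assumption; a real run would use a fuller list).
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "in", "is",
              "of", "on", "the", "this", "to"}


def crude_stem(word):
    """Very rough suffix stripping; a placeholder for a proper stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(tweet):
    """Tokenize a tweet, drop noise and stop words, and stem what remains."""
    text = re.sub(r"https?://\S+", " ", tweet.lower())  # remove URLs
    stems = []
    for token in text.split():                  # whitespace tokenization
        cleaned = re.sub(r"[^a-z]", "", token)  # drop punctuation, digits, symbols
        if not cleaned or cleaned in STOP_WORDS:
            continue
        stems.append(crude_stem(cleaned))
    return stems
```

For example, `preprocess("Markets are falling 12% today! https://t.co/abc")` keeps only the stems of "markets", "falling" and "today".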

Lexicon-Based Analysis

We used the following lexicons to determine the sentiment of a text: Bing Liu [26], Sentiment 140 [48], NRC [49], AFINN [50] and SenticNet [51]. Each of these lexicons has, as part of its structure, the elements word and value. The former (word) is a list of terms that are commonly used on social networks. The latter (value) is a measure of polarity that corresponds to the word used. Bing Liu’s Opinion Lexicon uses the values “positive” and “negative” to rate words. Sentiment 140 classifies the polarity of words as follows: 0 for negative, 2 for neutral and 4 for positive. The NRC lexicon uses a set of sentiments instead of numeric values to determine the polarity of words. The sentiments used are trust (+), fear (−), negative (−), sadness (−), anger (−), surprise (+), positive (+), disgust (−), joy (+) and anticipation (+). In this set, the symbols “−” and “+” refer to negative and positive polarity, respectively. In the AFINN lexicon, terms are rated for valence with an integer between −5 (most negative polarity) and +5 (most positive polarity). SenticNet provides an API that allows for the easy application of sentic computing. We used this API, and the polarity of each word was computed by invoking the polarity_intense() method. During analysis, the polarity values were transformed into numerical values. This process is shown in Table 6.

Table 6 Transformation of values of polarity into real numbers
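The idea of mapping each lexicon's native rating onto a real number can be sketched as below. The numeric values are assumptions chosen for illustration (±1 for categorical labels, AFINN scaled by 5) and may differ from those of Table 6; only the signs attached to the NRC sentiments follow the description above. The function `to_number` and the lexicon keys are hypothetical names.

```python
# Hypothetical numeric mappings; the actual values are those given in Table 6.
BING = {"positive": 1.0, "negative": -1.0}
S140 = {0: -1.0, 2: 0.0, 4: 1.0}
NRC = {
    "trust": 1.0, "surprise": 1.0, "positive": 1.0, "joy": 1.0,
    "anticipation": 1.0,
    "fear": -1.0, "negative": -1.0, "sadness": -1.0, "anger": -1.0,
    "disgust": -1.0,
}


def to_number(lexicon, value):
    """Convert a lexicon-specific rating into a signed real number."""
    if lexicon == "bing":
        return BING[value]
    if lexicon == "s140":
        return S140[value]
    if lexicon == "nrc":
        return NRC[value]
    if lexicon == "afinn":
        return value / 5.0  # assumption: rescale the -5..+5 integers to [-1, 1]
    raise ValueError(f"unknown lexicon: {lexicon}")
```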

Algorithm 2 shows the procedure used to retrieve the value of the polarity intensity of a text.

Algorithm 2 Retrieval of the polarity intensity of a text

For each tweet, the strength of polarity—positivity and negativity—was computed and normalized according to the Euclidean norm or 2-norm, assuming that values near zero are neutral (see Eqs. (1) and (2)):

$$\mathrm{positivity}=\frac{\sum_{i} w_{pi}}{\sqrt{\left(\sum_{i} w_{pi}\right)^{2}+\left(\sum_{i} w_{ni}\right)^{2}}}$$
(1)
$$\mathrm{negativity}=\frac{\sum_{i} w_{ni}}{\sqrt{\left(\sum_{i} w_{pi}\right)^{2}+\left(\sum_{i} w_{ni}\right)^{2}}}$$
(2)

where

\(w_{pi}\): positive weight of ith word in text T.

\(w_{ni}\): negative weight of ith word in text T.

We defined positivity = 0 and negativity = 0 for neutral tweets, which allowed us to avoid division by 0.

In some cases, more than one tweet was published by a person or institution on the same date; for this reason, we decided to average the polarities of these tweets.
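Equations (1) and (2), the neutral-tweet convention and the per-date averaging can be sketched as follows. The sketch assumes that each tweet is given as a list of signed word weights and that negative weights enter the sums as magnitudes, so that both strengths lie in [0, 1]; the function names and the input layout are hypothetical.

```python
import math
from collections import defaultdict


def polarity_strength(weights):
    """Return (positivity, negativity) normalized by the Euclidean norm."""
    pos = sum(w for w in weights if w > 0)
    neg = sum(-w for w in weights if w < 0)  # assumption: use magnitudes
    norm = math.hypot(pos, neg)
    if norm == 0:
        return 0.0, 0.0  # neutral tweet: avoids division by zero
    return pos / norm, neg / norm


def daily_average(scored_tweets):
    """Average polarities of tweets sharing a date.

    scored_tweets: iterable of (date, positivity, negativity) tuples.
    """
    grouped = defaultdict(list)
    for day, pos, neg in scored_tweets:
        grouped[day].append((pos, neg))
    return {
        day: (sum(p for p, _ in vals) / len(vals),
              sum(n for _, n in vals) / len(vals))
        for day, vals in grouped.items()
    }
```

With word weights 3 and −4, for instance, the norm is 5, so the tweet has positivity 0.6 and negativity 0.8.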

Date-Shifted Correlations

We computed the correlation between the financial indices and polarities of posts published on Twitter. The value of the correlation was used as an indicator of the latent relationship between the behaviours of these two variables. The relationships between the financial indices on day A and the polarity of sentiments in tweets published on day B can be discovered by applying a date shift on day B. This allowed the detection of correlations that appear a number of days before or after a Twitter post.

To identify whether the Twitter posts made on a given date may have had any correlation with the financial indices reported on a date different from the post, we decided to simulate a (mathematical) translation of the date of the post. In a translation, all dates of posts were moved (forward or backward) to the same number of days in the same direction. We called this translation date shift.

The result provided a shifted value. Based on the price of a financial index reported on \(\mathit{Date}_{i}\) and the text of a tweet created on \(\mathit{Date}_{i}+\mathit{shift}\), we built a vector V with the components shown below:

$$V=\left[\mathit{Date}_{i}+\mathit{shift},\ \mathit{Adjprice}_{i},\ S140^{+},\ S140^{-},\ \mathit{Bing}^{+},\ \mathit{Bing}^{-},\ \mathit{NRC}^{+},\ \mathit{NRC}^{-},\ \mathit{AFINN}^{+},\ \mathit{AFINN}^{-},\ \mathit{SN}\right]$$

where:

shift: number of days by which \(\mathit{Date}_{i}\) is translated; forward if shift > 0, backward if shift < 0, no date translation otherwise.

\(\mathit{Adjprice}_{i}\): the adjusted closing price of the financial index on \(\mathit{Date}_{i}\).

\(S140^{+}\), \(S140^{-}\): positivity and negativity of a tweet according to the Sentiment 140 lexicon, respectively.

\(\mathit{Bing}^{+}\), \(\mathit{Bing}^{-}\): positivity and negativity of a tweet according to Bing Liu’s lexicon, respectively.

\(\mathit{NRC}^{+}\), \(\mathit{NRC}^{-}\): positivity and negativity of a tweet according to the NRC lexicon, respectively.

\(\mathit{AFINN}^{+}\), \(\mathit{AFINN}^{-}\): positivity and negativity of a tweet according to the AFINN lexicon, respectively.

\(\mathit{SN}\): polarity computed with SenticNet.

Since more than one text was usually posted by the same Twitter user, these form a matrix Mpricesentic of vectors V. Table 7 shows a hypothetical example of such a matrix formed with three texts published on different dates.

Table 7 Hypothetical example of a matrix Mpricesentic of dates, adjusted prices and polarities of posts
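The construction of such a matrix can be sketched as below, assuming prices and polarities are held in dictionaries keyed by date. Each row follows the layout of V: the shifted date, the adjusted price and the polarity columns. The function name, the data layout and the sample numbers are hypothetical.

```python
from datetime import date, timedelta


def build_matrix(prices, polarities, shift):
    """Pair each price on Date_i with the polarities of Date_i + shift.

    prices:     {date: adjusted closing price}
    polarities: {date: tuple of polarity columns}
    Rows with no tweet on the shifted date are skipped.
    """
    rows = []
    for day in sorted(prices):
        tweet_day = day + timedelta(days=shift)
        if tweet_day in polarities:
            rows.append((tweet_day, prices[day]) + tuple(polarities[tweet_day]))
    return rows


# Illustrative data only: prices on May 12-13 paired with tweets one week earlier.
example_prices = {date(2020, 5, 12): 100.0, date(2020, 5, 13): 101.0}
example_polarities = {date(2020, 5, 5): (0.6, 0.8), date(2020, 5, 6): (1.0, 0.0)}
example_rows = build_matrix(example_prices, example_polarities, shift=-7)
```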

The correlations corresponding to the columns of positivity with an adjusted price and those of negativity with an adjusted price are calculated. Algorithm 3 shows this procedure:

Algorithm 3 Correlation of the positivity and negativity columns with the adjusted price

To identify the Twitter posts that had the greatest correlation with the financial indices, we performed a search to find the value of the offset that produced the most significant correlation in an absolute value (see Eq. (3)):

$$\underset{\text{shift-date}}{\max}\ \left|r(\mathit{Mpricesentic})\right|$$
(3)
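The search in Eq. (3) can be sketched for a single polarity column as follows. The sketch assumes that r is the Pearson correlation coefficient, which the text does not state explicitly, and it returns None (standing in for NA) when the shifted dates do not overlap or a column is constant. All function names are hypothetical.

```python
import math
from datetime import timedelta


def pearson(xs, ys):
    """Pearson correlation, or None when it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def shifted_correlation(prices, polarity, shift):
    """Correlate prices on Date_i with polarity on Date_i + shift."""
    xs, ys = [], []
    for day, price in prices.items():
        tweet_day = day + timedelta(days=shift)
        if tweet_day in polarity:
            xs.append(price)
            ys.append(polarity[tweet_day])
    return pearson(xs, ys)


def best_shift(prices, polarity, shifts):
    """Return (shift, r) with the largest |r|, or None if every r is NA."""
    best = None
    for s in shifts:
        r = shifted_correlation(prices, polarity, s)
        if r is not None and (best is None or abs(r) > abs(best[1])):
            best = (s, r)
    return best
```

Note that a shift leaving only two overlapping dates always yields |r| = 1, so in practice a minimum number of overlapping observations may be worth enforcing before comparing shifts.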

When the dates for adjusted closing prices did not correspond to the components of the polarity vector with shifted dates, the correlation was defined as not available (NA). The analysis of data was performed on a computer with the following characteristics: 2.2 GHz Intel Core i7 processor, 8 GB RAM, macOS 10.15.4. The method was implemented in the Python programming language.

Results

Due to the large number of computed shifted correlations between the daily polarity of Twitter accounts and the daily behaviour of financial indices, Table 8 shows an example of the polarity vectors (\(\mathit{Date}, S140, \mathit{Bing}^{+}, \mathit{Bing}^{-}, \mathit{NRC}^{+}, \mathit{NRC}^{-}, \mathit{AFINN}, \mathit{SenticNet}\)). These polarity vectors were found in the Bloomberg data set, i.e., the @business Twitter account. Neutral polarity was determined when the S140 lexicon was used. Almost the same positive and negative polarity strengths were found when Bing Liu’s lexicon was used, whereas with the NRC lexicon, the positive polarity strength was found to be greater than the negative polarity strength. With the AFINN sentiment lexicon, a slightly negative polarity was computed. The SenticNet lexicon detected the highest polarity. The date shift was equal to −7. This means that the original dates of publication of the tweets were May 5, 2020, and May 6, 2020, whereas the values of the adjusted closing prices occurred on May 12, 2020, and May 13, 2020, respectively. The adjusted closing prices corresponded to the SSE Composite, the financial index of Shanghai, China.

Table 8 Example of polarity vector with a date shift = −7

An example of the calculated correlations between the polarity and the adjusted closing price, \(r(\mathit{Mpricesentic})\), is shown in Table 9. This example shows a perfect correlation between the polarities and financial indices found in the Bloomberg data set.

Table 9 Correlations found for the Bloomberg data set, date shift = −7

After calculating the correlations of each financial indicator with the polarity generated by the proposed method, \(r(\mathit{Mpricesentic})\), we identified the best correlations to understand financial market behaviour and sentiment analysis (Table 10). With SenticNet, we always found correlations between the polarity on Twitter and the behaviour of financial indices. However, there were some days when we did not find correlations with the lexicon-based analysis; for this reason, we decided to extend the search interval.

Table 10 Most significant correlations in absolute value for each data set

Despite extending the number of days of the search interval to find correlations with lexicon-based analysis, the SenticNet lexicon always generated the best results.

Discussion

Sentic computing has the potential to be an important tool to improve decision-making in finance because it extracts and identifies patterns on Twitter [1, 15]. The analysis of polarity on Twitter has the potential to predict the behaviour of the financial market [16]. The information shared on Twitter affects investors’ reactions and therefore affects stock indices, either positively or negatively or in terms of upward or downward trends. Negative sentiments lead to lower risk tolerance.

One of the applications of understanding the emotions of social media users consists of predicting the behaviour of financial indices. This reduces risk and uncertainty for investors. Investment portfolios can be formed with shares that follow the tendency of the indices, diversifying the selection of shares of different world markets. This can lead to an increase in the availability of more accurate and efficient sentiment measures and more wide-ranging studies.

Some of the analysed Twitter accounts were unrelated to financial indices due to several factors: (1) they had few followers; (2) they published very few posts or did not publish at all; (3) the Twitter posts were very specific and could affect only one sector and therefore were not perceived in the financial indices; and (4) the Twitter account did not publish relevant information for financial markets. For these reasons, some Twitter accounts were not included in the correlation analysis.

High correlations were found for the Investing.com, Bloomberg and CNN Business accounts. One of the reasons for these high correlations was their considerable number of followers (Investing.com, 168,000; CNN Business, 1.8 million; and Bloomberg, 6.4 million), together with an average of 10 posts per day. Another Twitter account with high correlations was The New York Times (46.8 million followers), which covers financial and economic topics as well as music, culture, sports, art and entertainment. Different types of content led to different kinds of investor sentiment.

We found that in 2009, information posted on Twitter about H1N1 was practically non-existent; this was not the case with COVID-19 in 2020. Therefore, our findings show that some Twitter posts had a more significant effect in the COVID-19 era than in the H1N1 period. Moreover, this effect appeared after fewer days than the more moderate effect observed in 2009. This was because the use of Twitter was more widespread in 2020, and more accounts published information regarding COVID-19 and its effects on finances. Investors therefore had more information for decision-making.

The sentiment of investors could also predict the evolution of indices a few days in advance. This investigation revealed that it took 0 to 10 days for markets to react to information shared and disseminated on Twitter during the COVID-19 pandemic. During the H1N1 pandemic, this period was from 0 to 15 days [52]. We found correlations not only in the positive shift values but also in the negative ones (from − 11 to − 1). This indicates that the behaviour of the stock market affects the reactions of Twitter users. In the case of H1N1, this took 1 to 11 days, and in the case of COVID-19, this took 1 to 6 days (see Appendix).

Some studies reflect the importance of investors’ emotions in financial markets [15]. In this research, we confirmed the argument of Ahmed [34] concerning investors’ reactions to a financial event in terms of stock price movements. When applied to Twitter, this effect appears days later in the reactions generated by posts on the topic of financial indices. An important factor influencing the sentiments found on Twitter is the sharp drop in the financial market caused by COVID-19, in addition to the fact that its duration and impact on the economy and health have been much greater than those of H1N1.

We confirm that the number of followers on Twitter influences the performance of financial indices. Regardless of whether a publication is accurate, it alters the sentiment of a larger number of investors, forcing them to seek a risk-reducing strategy. We verified that The New York Times is an important source for accurately predicting the behaviour of the stock market. Moreover, we found three more sources: CNN Business, Bloomberg and Investing.com. We also found higher shifted correlations with the SenticNet lexicon, and we detected that this method also produces a large number of cases in which correlations can be observed.

Conclusions

This research aimed to study the effect of polarity in Twitter posts on the behaviour of world financial indices during pandemics through a lexicon-based approach. Our research question was as follows: How does the polarity generated by Twitter posts influence the behaviour of financial indices during pandemics? Our analysis shows sufficient evidence supporting the notion that Twitter posts have influenced financial indices during both pandemics, with remarkable influences in the case of COVID-19. We found that the effect of social media publications was more significant during the COVID-19 pandemic than during the H1N1 pandemic.

Our contributions are as follows. First, we found that a lexicon-based approach is enhanced by combining it with a shifted correlation analysis, as latent or hidden correlations can be found in the data. Second, sentiments on Twitter have an important effect on financial indices, and this effect is observed a few days after the information is posted on Twitter. Third, our research confirms that SenticNet performed better than the other lexicons. Fourth, we proposed a correlation matrix that contains the polarity of Twitter posts and the behaviour of financial markets to study their relationship when the date is shifted in terms of financial sentiment analysis and the adjusted closing prices. Fifth, our data show some important effects days after and days before the posts were made. Sixth, we also confirm that social media propagation (more Twitter accounts) over this period of 11 years has a direct impact on the indices’ behaviour. Seventh, we presented the state of the art on sentic computing and finance.

Our findings confirm that the drop in stock prices during the COVID-19 era was more dramatic than during the H1N1 period because there were more speculation, rumours and negative news. This investigation has additional value because it was conducted in the non-normal context of pandemics, whereas similar research has been conducted under normal or traditional conditions. This research can be extended by analysing other indices, markets, or products, such as cryptocurrencies, FOREX and futures; moreover, further analysis of investor reactions is needed. Such research can make use of sophisticated techniques such as sentic computing and explore different kinds of emotions. Finally, it is possible to analyse data from other social media platforms such as Facebook and WhatsApp in real time.