Introduction

Financial market behavior is one of the most studied areas in finance. For years, researchers have investigated financial asset movements, the factors influencing stock markets, and their prediction (Zhou and Lu 2023; Papadamou et al. 2022; Subramaniam and Chakraborty 2021; Nguyen et al. 2020; Li and Yu 2012). Financial markets fluctuate according to investors’ decisions, which are shaped by the data sources available to them. Vlastakis and Markellos (2012) noted that “Information is the most valuable and highly sought asset in financial markets.” Accordingly, with Internet uptake, financial markets have globalized, allowing the speedy exchange of information among investors and creating new sources of knowledge, such as online newspapers, advertising, and social media. Thus, investors are exposed to a vast amount of data that affects their trading decisions (Gao et al. 2023; Shen and Wang 2023; Swamy and Dharani 2019). When more information is available, investors are expected to improve their decision-making. However, additional information is useful only if they can analyze and reason about it (Filippou et al. 2023; Barber and Odean 2008). Furthermore, investors’ attention is limited (Xu et al. 2023; Akarsu and Süer 2022; Kahneman 1973), and they cannot process all the available information when making trading decisions. Thus, investors are selective about financial data: they pay attention only to assets that capture their interest and overlook unattractive ones. In line with this, Barber and Odean (2008) established the Price Pressure Hypothesis, claiming that investors are net buyers of attention-grabbing stocks and that an increase in the attention paid to the market exerts temporary upward pressure on stock prices and, therefore, on their returns and liquidity (Filippou et al. 2023; Chen et al. 2022).

Over the past several years, the research community has focused on accurately measuring investor attention. Some authors have proposed indirect financial measures of investor attention, such as abnormal trading volumes (Wright and Swidler 2023; Hou et al. 2009), advertising expenses (Mayer 2021; Lou 2014), and high stock returns (Barber and Odean 2008), which reflect the information supply from the market. Nevertheless, these indirect proxies do not capture investors’ information demand; thus, it is not certain whether the investor actually receives the information supplied (Shen and Wang 2023; Da et al. 2011). For instance, publishing an economic article about a company does not guarantee that investors will read it; therefore, the attention paid to that company, and specifically to its stock, cannot be measured through published articles. Alternatives for measuring market attention based on investors’ information demand have recently emerged (Gao et al. 2023; Adekoya et al. 2022; Petropoulos et al. 2021). Analyzing what investors search for provides fresh insights into what interests them and, as Joseph et al. (2011) noted, “… results in a database of intentions”. Da et al. (2011) and Mondria et al. (2010) pioneered the financial applications of online queries. Da et al. (2011) introduced a novel and direct measure of investor attention based on online search data provided by Google, known as the Google Search Volume Index (GSVI). According to the authors, the GSVI provides fundamental data on firms and financial markets and captures investors’ active information demand. In other words, if an investor looks for information related to a stock, he or she is paying attention to it, and this attention will be reflected in the market. Further contributions come from Chen and Lo (2019), Yoshinaga and Rocco (2020), Akarsu and Süer (2022), and Vozlyublennaia (2014), who relate investor attention, as measured by the GSVI, to variables such as abnormal returns, volatility, and trading volume to study its effects on stock markets.

Google is the most relevant, popular, and widely used search website globally (Khosrowjerdi et al. 2023). Search query data are provided by Google Trends, a tool that allows users to download historical search-volume data for terms or groups of words searched on Google and to filter them by country and region (Shen and Wang 2023). GSVI-based studies treat investor attention as the frequency of keyword queries reported by Google Trends. According to Da et al. (2011), if the GSVI is considered a tool for estimating investor attention, it can be useful for many other applications in finance. This has led researchers to explore the potential use of search frequencies as stock indicators for financial market forecasts and nowcasts. The search queries used to build the GSVI are diverse and include company names (Salisu and Vo 2021; Ramos et al. 2020), stock tickers (Nguyen et al. 2020; Sifat and Thaker 2020; Tan and Taş 2019), finance-related terms (Smales 2021; Ding et al. 2020), and even the name of the stock index studied (Basistha et al. 2019; Škrinjarić 2019). This wide range of options is a double-edged sword. On the one hand, it allows for a wider study of investors’ interests and attention. On the other hand, it entails more discrepancies among the inputs used, so the selection of keywords is crucial for the validity of the studies. Similarly, different geographical locations have been studied, such as the US (Ouadghiri et al. 2021; Tang and Zhu 2017; Arditi et al. 2015), Europe (Ramos et al. 2020; Kim et al. 2019; Oliveira-Brochado 2019), and Asian markets (Sifat and Thaker 2020; Adachi et al. 2017). There is no consensus in the academic literature regarding the results obtained. Some studies conclude that the GSVI can predict stock markets (Lai et al. 2022; Yoshinaga and Rocco 2020; Perlin et al. 2017; Bijl et al. 2016; Preis et al. 2013), whereas others conclude that it cannot (Sifat and Thaker 2020; Basistha et al. 2019; Lobão et al. 2017). Moreover, the literature reports both positive and negative relationships between the GSVI and financial variables. Bijl et al. (2016), Preis et al. (2013), and Perlin et al. (2017) found significant and negative relationships between the GSVI and stock returns. By contrast, Lai et al. (2022), Da et al. (2011), Swamy et al. (2019), Ahundjanov et al. (2020), and Nguyen et al. (2020) found a significant and positive relationship between the same variables. Although their conclusions differ depending on the approach and variables used, these studies share a common view: the GSVI is a potential variable for explaining stock market movements.

The number of published articles that use the GSVI as a proxy for investor attention and for stock market forecasting has increased sharply. It is crucial to note that Google search data have only been available since 2004, so they constitute a recent database, and the interest paid to them has increased since 2009. To our knowledge, no systematic review article has focused solely on the potential use of the GSVI as a proxy variable for investor attention and a predictor of stock market indicators. Hence, owing to the large number of mixed results obtained, the multiple search inputs applied in the study of the GSVI, and the manifold contributions for future research, the present systematic review was conducted. The main purpose of this study is to review the current literature on the GSVI as a proxy variable for investor attention and its relationship to stock market forecasting. The scope of this study includes 56 articles dealing with investor attention, volatility, volume, stock price prediction, risk diversification, and trading strategies regarding Google search volume. Our contributions to the literature are threefold. First, we extend the literature on investor attention to the international level and show that the impact of investor attention on stock market performance is not consistent internationally. Second, we provide valuable insights into the debate on the effectiveness of the GSVI as a proxy for investor attention in forecasting critical financial metrics in stock markets. Third, we offer a new understanding of stock market behavior to help academic researchers and retail and institutional investors with their trading activities.

The remainder of this paper is organized as follows. Section 2 provides a theoretical framework for the concepts related to the study area and a literature review. Section 3 describes the methodology of this study and presents a descriptive analysis of the selected articles. Section 4 classifies the research by analyzing how the GSVI composition criteria (keyword, region, and frequency) affect the financial variables of return, volatility, and trading volume. Finally, Section 5 presents the contributions and discusses the limitations, main challenges, and suggestions for future research.

Overview of the GSVI and its applications

This section explains key concepts that provide the foundation for this study and reviews the related literature.

Google Trends

Google Trends is a free public service offered by Google that supplies frequency time-series data for a particular word or group of words. The search volume offered by Google Trends covers data from 2004 to the present and can be restricted by country or region. Therefore, it is considered a recent tool: studies cannot cover earlier periods, and data before 2008 are regarded as unreliable (Challet and Ayed 2013). The data provided by Google Trends are considered unstructured inputs because text-mining techniques are necessary to transform them into inputs for models (Bustos and Pomares-Quimbaya 2020). Google search volumes can be obtained at monthly, weekly, daily, and even minute frequencies. Most studies use weekly search volumes in their methodologies (Akarsu and Süer 2022; Ouadghiri et al. 2021), although there is some disagreement about the most suitable frequency for measuring the data (Hamid and Heiden 2015). The data provided by Google Trends range from 0 to 100. To obtain this range, the search volume for a term in each period is divided by the total search volume for that period; therefore, the data are normalized (Choi and Varian 2012).
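The normalization described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Google's actual procedure: Google does not publish raw query counts, so the counts below are invented, the function name normalize_search_volume is hypothetical, and the final rescaling (peak share set to 100, values rounded to integers) is an assumption made to reproduce the 0 to 100 range.

```python
def normalize_search_volume(term_counts, total_counts):
    """Turn raw counts into a 0-100 index (illustrative assumption only).

    term_counts:  hypothetical number of searches for the term in each period
    total_counts: hypothetical number of all searches in the same periods
    """
    if len(term_counts) != len(total_counts):
        raise ValueError("both series must cover the same periods")

    # Step 1: share of the term in total search volume for each period.
    shares = [term / total for term, total in zip(term_counts, total_counts)]

    # Step 2 (assumed): rescale so that the period with the highest share is 100.
    peak = max(shares)
    if peak == 0:
        return [0 for _ in shares]
    return [round(100 * share / peak) for share in shares]


# Invented example data for four periods.
term_counts = [120, 300, 150, 90]
total_counts = [10_000, 12_000, 10_000, 9_000]
index = normalize_search_volume(term_counts, total_counts)
print(index)
```

Because each value is a share rescaled to the peak of the requested window, two downloads covering different windows are not directly comparable, which is one reason the reviewed studies transform the raw index before using it.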

Ettredge et al. (2005) and Cooper et al. (2005) introduced the use of web search volume as input data, measuring its relationship with the US unemployment rate and cancer-related topics, respectively. Google Trends was not used until 2009, when Ginsberg et al. (2009) estimated weekly influenza activity in the US from the incidence of influenza-related Google search queries. In economics, Choi and Varian (2012) described a novel methodology using Google search volume for “nowcasting” economic indicators related to employment, motor vehicle sales, travel, and consumer confidence. The authors proposed baselines for future research and familiarized readers with Google Trends. The application of Google search queries in the financial field began with Preis et al. (2010), who studied the link between search volume data and financial markets, and Da et al. (2011), who were the first to provide empirical evidence of the Google search volume index as a direct proxy for measuring investor attention.

The GSVI and investor attention

Traditional asset-pricing models rely on the assumption that markets are a continuous source of information released in real time. This requires investors to devote attention to collecting such information to improve their knowledge when making investment decisions. In this field, gathering, interpreting, and connecting data are central cognitive tasks that require memory retrieval and action planning. Peng and Xiong (2006) state that these cognitive processes are relevant to determining stock prices.

However, attention is not an unlimited resource (Shen and Wang 2023; Kahneman 1973). As Pashler et al. (2001) pointed out, there is evidence that the central cognitive processing capacity of the human brain is limited. Accordingly, investors cannot process all the information relevant to their trading decisions. They have access to more data than ever; however, with so many sources, it is becoming increasingly difficult to focus on a specific piece of information. This paradox was illustrated by Nobel laureate economist Herbert Alexander Simon (1971), who stated that “a wealth of information creates a poverty of attention.” Consequently, investors are only able to retain and process a fraction of the information at hand, which implies that they do not pay attention to information that goes unnoticed (Shen and Wang 2023; Mondria et al. 2010) and, therefore, their economic decisions are affected by this information bias (Ramos et al. 2020). Specifically, as shown by different authors, limited attention plays an important role in investor sentiment toward stock market movements (Filippou et al. 2023; Akarsu and Süer 2022; Drake et al. 2017; Goddard et al. 2015).

Most studies on investor attention rely on the Price Pressure Hypothesis presented by Barber and Odean (2008). These authors claim that investors face a search problem when looking for assets to purchase and that increases in stock prices and volatility are caused by increased investor attention. When investors seek to sell, they are constrained to the securities they already hold. However, when they buy, they must choose from the vast number of stocks available in the market. As investor attention is limited, investors look for information on stocks that attract their attention. Therefore, investor attention translates into buying behavior, and the resulting buying pressure temporarily pushes up prices and liquidity.

Several studies address the most accurate approach to measuring investor attention (Ben-Rephael et al. 2017). Barber and Odean (2008) suggest an indirect measure of investor attention based on unusual trading, extreme returns, and firm news. Takeda and Yamazaki (2006) studied the effect of mass media on the stock prices of companies advertised during a well-known Japanese TV program. Da et al. (2011) explored the relationship between investor attention and stock market prices, presenting a novel direct proxy based on Google Search queries that represents investors’ information demand. According to the authors, when an investor looks for information about a stock, as measured by the search volume index, he or she is paying attention to that company. In their empirical work, Da et al. (2011) studied the performance of the Russell 3000 index, concluding that Google search queries capture investor attention and that stocks with a higher search volume experience an increase in stock prices over the next two weeks. Therefore, their research supports the Price Pressure Hypothesis, relating investor attention to buying behavior.

Over the last decade, limited attention theories have been studied together with the use of Google Search queries by retail investors, because institutional investors commonly use different information channels and systems. Da et al. (2011) explained that when investors look for information on a certain stock, they pay attention to that term and carry out a decision-making process that includes that firm. In this context, the higher the Google search volume of a term, the higher the attention it draws from investors. The basis for building the GSVI is similar across the literature. However, the different methodologies are worth explaining to better understand how to approach this index.

Investor attention is measured by the frequency of keyword searches captured by the GSVI. However, the available data frequency depends on the length of the sample period. For example, Google Trends offers daily data only for windows of up to 270 days, whereas weekly and monthly data are available over longer time windows. Hence, the frequency of the data provided by the search engine decreases as the interval length increases, which complicates the construction of the data sample. For example, Smales (2021) and Pereira et al. (2018) rearranged the data to form daily time series. Another data-processing constraint is the time retrieval bias, as GSVI values obtained from different intervals are not directly comparable. Da et al. (2011) proposed the Abnormal Search Volume Index (ASVI), defined as the logarithm of the GSVI during the current week minus the logarithm of the median GSVI during the previous eight weeks. In this manner, the index is robust to recent jumps, removes time trends and low-frequency seasonality, and makes the investor attention variable comparable across stocks in the cross-section (Ramos et al. 2020; Lyócsa et al. 2020; Tan and Taş 2019; Tang and Zhu 2017). A different approach was proposed by other authors (Swamy et al. 2019; Swamy and Dharani 2019; Bijl et al. 2016; Dimpfl and Jank 2016), who standardized the GSVI (SGSVI) to make it more comparable across firms. The SGSVI is the weekly raw GSVI minus its average over the past 52 weeks, divided by its standard deviation over the previous year. Kim et al. (2019) compared both methods and concluded that standardization was more convenient because using logarithms resulted in very low values.
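The two transformations described above can be sketched as follows. This is a minimal illustration rather than the code used in any of the cited studies. The function names asvi and sgsvi are hypothetical, and several details are assumptions: both windows exclude the current week, the sample standard deviation is used for the SGSVI, and None is returned when there is too little history or when the calculation is undefined (a zero GSVI value for the logarithm, or a zero standard deviation).

```python
import math
import statistics


def asvi(series, t, window=8):
    """Abnormal search volume: log GSVI in week t minus the log of the
    median GSVI over the previous `window` weeks (current week excluded)."""
    if t < window:
        return None  # not enough history
    current = series[t]
    baseline = statistics.median(series[t - window:t])
    if current <= 0 or baseline <= 0:
        return None  # the logarithm is undefined for zero search volume
    return math.log(current) - math.log(baseline)


def sgsvi(series, t, window=52):
    """Standardized GSVI: week-t value minus the mean of the previous
    `window` weeks, divided by their sample standard deviation."""
    if t < window:
        return None  # not enough history
    past = series[t - window:t]
    spread = statistics.stdev(past)
    if spread == 0:
        return None  # a constant history cannot be standardized
    return (series[t] - statistics.mean(past)) / spread


# Invented weekly GSVI values: eight quiet weeks followed by a spike.
weekly = [40, 42, 38, 41, 39, 40, 43, 37, 80]
print(asvi(weekly, 8))             # log(80) - log(40), i.e. log(2)
print(sgsvi(weekly, 8, window=8))  # shortened window, for the example only
```

In the example the median of the eight preceding weeks is 40, so a value of 80 gives an ASVI of log(2), roughly 0.69. The shortened SGSVI window is used only because the invented series is short; the description in the text corresponds to window=52.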

Despite methodological differences in building the GSVI, there is a consensus on the link between the GSVI and investor attention. Empirical evidence has related the GSVI to market variables such as stock returns, trading volume, and volatility to measure the impact of investor attention on stock markets. The following subsections provide deeper insights into these variables.

The GSVI and financial variables

Return and the GSVI

Stock returns are key variables in explaining the potential behavioral signals of stock markets. Therefore, most studies that analyze the effects of investor attention on stock market behavior use them as their main variable.

Joseph et al. (2011) built a trading strategy based on search volumes for companies’ tickers and concluded that previous search intensity was related to abnormal stock returns in the current period. Bank et al. (2011) obtained the same results by sorting portfolios according to Google search volume, although their sample was based on company names. The positive relationship between the GSVI and current and future stock returns found in these two studies is consistent with the propositions stemming from the work of Barber and Odean (2008), as the results point to short-term buying pressure accompanied by an increase in stock prices. According to Bank et al. (2011) and Takeda and Wakao (2014), the positive relationship between investor attention and returns is stronger for smaller firms.

Bijl et al. (2016) used sample data from the 2008–2013 period and claimed that Google search volume, as a measure of investor attention, is negatively related to returns. Furthermore, the authors concluded that a trading strategy based on this relationship was not profitable once transaction costs were considered. Akarsu and Süer (2022) developed a cross-country analysis that includes 31 countries from the Americas, Asia–Pacific, and Europe, concluding that the impact of investor attention on stock returns differs among countries. Nguyen et al. (2019) and Salisu and Vo (2021) found evidence of a positive long-term relationship between search intensity and stock returns in the Vietnamese stock market. These findings are similar to those of Adachi et al. (2017), who studied the Japanese startup market. These authors conclude that the positive effect of the GSVI on stock returns is not temporary and, therefore, conveys long-term effects.

Volatility and the GSVI

Another variable used to measure the impact of the GSVI on stock markets is volatility. Vlastakis and Markellos (2012) show that investor attention measured by the GSVI significantly affects volatility for both firm-specific data, measured by company names, and market-related data. Dimpfl and Jank (2016) found that high volatility precedes increased information demand from investors. In their empirical approach, they use daily market-level data with the keyword “DOW” for search frequency. In contrast, Hamid and Heiden (2015) suggest that daily frequency does not improve volatility predictions, although they find a positive relationship with investor attention over short horizons. Ramos et al. (2020) explore European markets by analyzing the investor attention effect in the EUROSTOXX50 index, concluding that an increase in search queries precedes a short-lived increase in volatility that is reversed two weeks later. González-Velasco and González-Fernandez (2023) base their work on the assumption that investors’ decisions are influenced by sentiment (including fear) and assess the effect of the fear response to the COVID-19 pandemic on stock market volatility. Working from a different angle, Pereira et al. (2018) find that investor behavior is influenced by events related to Donald Trump’s presidency. The authors observe a positive correlation between searches and volatility in markets such as Mexico, Japan, and Australia.

Trading volume and the GSVI

Finally, the effect of investor attention, as measured by information demand, has been related to trading volume. Lai et al. (2022), Bank et al. (2011), and Takeda and Wakao (2014) claim that the higher the GSVI, the greater the abnormal trading volume. These results were supported by Aouadi et al. (2013), although they found that the correlation between the GSVI and trading volume was stronger at the market level than at the firm level. Chen and Lo (2019), Joseph et al. (2011), and Nguyen et al. (2020) claim that the GSVI is significantly and positively correlated with abnormal trading volume, which provides evidence that the GSVI is a direct proxy for investor attention. Desagre and D’Hondt (2019) studied the relationship between investor attention and trading activity in a sample of 455 stocks and concluded that this relationship was positive but not stronger for purchases than for sales, which does not support the Price Pressure Hypothesis. According to this hypothesis, attention influences buying behavior more than selling behavior because investors have a wider range of options when purchasing than when selling. Hence, increased attention (an increase in the GSVI) temporarily pressures prices upward, and the affected financial assets become more heavily traded (Barber and Odean 2008).

The GSVI and stock market forecasting

To study the link between the GSVI and stock market forecasting, it is vital to introduce the Efficient Market Hypothesis (EMH) developed by Fama (1991). The EMH assumes that security prices reflect all available information; therefore, it is impossible to build prediction models that beat the market (Fama 1991). Accordingly, selecting portfolios based on undervalued stocks is pointless because no profit opportunities are associated with such trading strategies. Nevertheless, alongside the large body of knowledge supporting the EMH, the academic literature on market prediction has developed many techniques and methods, which fall into two main approaches: technical and fundamental analysis. Technical analysis uses historical prices and volume values as input data for asset forecasting (Bazán-Palomino and Svogun 2023; Mustafa et al. 2022; Ahmadi et al. 2018; Laboissiere et al. 2015), claiming that these values already contain all the information analyzed by the fundamental approach (Bustos and Pomares-Quimbaya 2020). In contrast, fundamental analysis obtains input data from economic and financial factors that could affect companies in order to predict the future intrinsic value of assets (Tajmazinani et al. 2022; Zhang et al. 2018; Checkley et al. 2017). The fact that investors do not have access to all available market information contradicts the EMH (Takeda and Wakao 2014) and its assumption that prices reflect all available information. Google search volume captures investors’ interests, and this information is a key element for interpreting financial markets. If attention can be measured, it could provide evidence of future investor trading decisions and be useful for determining how security prices change (Huang et al. 2020). In other words, the GSVI can be useful for stock market forecasting. Several empirical studies have analyzed the use of the GSVI to predict key variables in stock markets.

According to Da et al. (2011), the GSVI can forecast stock returns more accurately than other variables, such as news about a company, because the latter information is slowly incorporated into stock prices. They studied the predictive use of search intensity with the Russell 3000 index and concluded that a higher GSVI predicts an increase in stock prices in the next two weeks, which reverses within a year. Joseph et al. (2011) found the same results, although their input data for measuring search queries are company tickers instead of company names. Lai et al. (2022) observed that positive shocks drive the GSVI; hence, excess returns and abnormal trading volumes are predicted positively. Preis et al. (2013) noted a negative correlation between the stock returns of the Dow Jones Index and the search volume of finance-related terms. These results are supported by Perlin et al. (2017), who extend Preis et al.’s (2013) strategy across four countries to study the forecasting approach on a broader scope. Moreover, the authors showed that an increase in the GSVI is followed by an increase in stock market volatility, a finding also supported by Dimpfl and Jank (2016). Tan and Taş (2019) explored the predictive capability of the search volume of 313 stock tickers in Turkey’s financial market. They observed a positive relationship between search frequency and future stock returns and established that this relationship persisted for over two weeks. Swamy and Dharani (2019) reported analogous results for the Indian stock market, although they focused on the impact of the GSVI on excess returns for each company. Other studies, such as those of Sifat and Thaker (2020) and Vozlyublennaia (2014), suggest that although a relationship exists between the GSVI and stock market variables, its predictive power is low or diminishes because increased information demand improves market efficiency.

Research methodology

A systematic review establishes a starting point for practitioners, providing an unbiased synthesis and quality appraisal of existing knowledge on a topic (Okoli and Schabram 2010). Moreover, systematic reviews contribute to the research community by updating expertise, making the development of future research easier (Tranfield et al. 2003). The purpose of this methodological approach was consistent with the state-of-the-art topics examined in this study.

After introducing a theoretical framework that provides the foundation for this research, the systematic review was structured into three steps (Fig. 1) based on Okoli and Schabram (2010). First, data were collected by setting the selection criteria for the literature searches and establishing a quality appraisal process (Fig. 2). Second, the articles were analyzed systematically: they were classified according to the GSVI composition criteria, and the impact of the GSVI on the financial variables was then analyzed. Third, the findings of the selected articles were discussed.

Fig. 1 Design structure of the research

Fig. 2 Selection process of the articles reviewed

Data collection

Selected articles were obtained from the Web of Science (WoS) and Science Direct (SD) databases for the period between 2010 and 2021. The data were collected between October 2020 and March 2021. The WoS and SD search engines were chosen to ensure quality. To improve reliability, only articles from peer-reviewed journals were accepted (Schlosser 2007). The literature search used keywords related to the main terms discussed in the previous section. The following keywords were searched in the title, abstract, and keywords of articles: “Google Trends” AND (“Financial Market” OR “Stock Market”); “Google Trends” AND (prediction OR forecasting); “Google search volume” AND “Investor attention”; “Google search volume” AND “limited attention.” A second search was conducted to ensure the review’s validity and to avoid discrepancies in the results. The search retrieved 119 articles from WoS and 604 from SD, of which 134 appeared in both databases.

Selection criteria

The following inclusion criteria were adopted to reduce the risk of bias:

  1. The studies found in the literature search will fit the present systematic review if they contribute to any of these objectives. Therefore, papers whose main contribution was the influence of other economic indices (GDP, unemployment, etc.) or other fields (health, sociology, etc.) were not selected.

  2. The articles accepted are those whose main analysis field is based on stock indexes, excluding those that use other financial assets, such as cryptocurrencies, commodities, exchange rates, fixed income assets, or CFDs, as the principal data input.

  3. We included papers exclusively focused on Google Trends data. In order to isolate the effect of the GSVI, we decided not to consider articles that rely on any other online data sources (e.g., Wikipedia or Twitter).

The studies included in this review fulfilled all the inclusion criteria. Each article was read comprehensively to determine which articles were relevant to the present study.

Selection process

The selection process is illustrated in Fig. 2. The search retrieved 119 and 604 articles from WoS and SD, respectively. After the primary search, 196 papers were excluded because they were not directly related to studying financial market behavior using the GSVI. After applying the second exclusion criterion, 227 articles from both databases were discarded, and a further 112 articles were not considered because they used other text-mining techniques in their methodology. This left 188 papers, of which 134 were duplicates retrieved from both bibliographic databases. Of the remaining 54 articles, three did not analyze the variables selected for data synthesis. Finally, 51 studies were reviewed. The journals in which the reviewed articles were published are listed in Table 1.
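The counts in the selection process can be checked with a short calculation. The sketch below only restates the figures given in the text; the variable names and the short exclusion labels are illustrative.

```python
# Articles retrieved per database, as reported in the text.
retrieved = {"WoS": 119, "SD": 604}

# Exclusions applied in sequence, with the counts stated in the text.
exclusions = [
    ("not directly related to financial markets and the GSVI", 196),
    ("second exclusion criterion", 227),
    ("other text-mining techniques", 112),
    ("retrieved from both databases", 134),
    ("did not analyze the variables selected for synthesis", 3),
]

remaining = sum(retrieved.values())
trail = [remaining]  # number of articles left after each step
print(f"retrieved in total: {remaining}")
for reason, count in exclusions:
    remaining -= count
    trail.append(remaining)
    print(f"after removing {count:>3} ({reason}): {remaining}")
```

The running totals are 723, 527, 300, 188, 54, and 51, which matches the 188 papers, the 54 articles, and the 51 reviewed studies reported in the selection process.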

Table 1 Publication outlet

To ensure the quality of the sampled articles, they were exclusively collected from journals indexed in the Journal Citation Reports and SCImago Journal Rank. As a signal of the relevance of the studies included in this systematic review, 65% of the journals are ranked on the Journal Quality List, 69th Edition (Harzing 2022) in Finance and Accounting, Economics, and International Business. Furthermore, they are included in the Top 50 of the SCImago Journal Rank (2022) in finance, multidisciplinary science, and artificial intelligence. Regardless of the area in which they are listed, all the selected papers have a financial approach and, consequently, are completely aligned with the scope of this study.

Although Google Trends first became publicly available in 2006, it offers time-series data from 2004 onward. However, the first studies on applying the GSVI to financial markets and investor attention (Da et al. 2011; Preis et al. 2010) were not published until 2010. The publication trend (Fig. 3) shows the growth in academic publications on applying the GSVI in financial markets since 2010. As the time series have lengthened, the academic attention paid to this discipline has increased at the same pace as data availability.

Fig. 3 Publication trend of the articles reviewed

Findings

This section analyzes the predictive capacity of the GSVI for stock market movement. We measure stock market movements using returns, volatility, and trading volumes. The predictive capacity of the GSVI for these variables depends on its composition. In other words, it depends on the criteria used to develop the data search. The main criteria for the reviewed articles corresponded to three classifications: keywords, region or study market, and data search frequency. Hence, the articles discussed here are organized according to the criteria applied to the GSVI composition and analyze how these differences influence the predictive capacity of the GSVI for the financial variables of return, volatility, and trading volume. Table 2 presents the data classification according to financial variables and criteria. Table 3 summarizes the papers reviewed based on the region of study, stock market, financial study variables, keywords, frequency, and main approaches.

Table 2 Classification of articles according to the financial variable studied and the GSVI building criteria applied
Table 3 Classification of the reviewed articles about the GSVI and stock market behavior

Financial variables

According to the reviewed articles, market movements are measured mainly through the variables of returns (R), volatility (Vol), and trading volume (TV) (see Table 4). Together, these variables reveal not only how investors’ searches affect asset prices but also how they relate to market turmoil and to variation in financial transactions across stock markets. Nearly half of the studies (n = 24) analyzed more than one financial variable simultaneously to study its relationship with search engines, obtaining a wider analysis of the stock market and investor behavior. Of the 56 studies in the systematic review, 86% focused on the effect on asset returns, while 37.5% and 34% focused on volatility and trading volume, respectively. The predominance of returns could be explained by the fact that GSVI information is incorporated rapidly into stock prices; therefore, the effect of Google search queries can be reflected in stock returns within a shorter period (Da et al. 2011).

Table 4 Articles’ classification according to the financial variable

The papers reviewed are divided into two main categories according to the approach followed: nowcasting or forecasting the values of stock market variables. Choi and Varian (2012) stated that the GSVI can help in contemporaneous forecasting, understood as a forecast of the present or the short term. In this sense, the nowcasting approach studies whether past search volume data forecast the present financial market value. Regarding the forecasting approach, the academic literature has studied how the GSVI can help test market efficiency theories (Fama 1991) and develop portfolio strategies that outperform the market. Of the articles reviewed, 41% (n = 23) surveyed the direct nowcasting relationship between search queries and financial market values, analyzing the short-term predictive capacity of the GSVI and its relationship with investor attention. Conversely, the remaining studies adopted a long-term predictive approach.
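
The distinction between the two approaches can be illustrated with a lagged correlation. The sketch below is not the method of any reviewed article: the series are synthetic, the function names are hypothetical, and treating a short lag as the nowcasting setting and longer lags as the forecasting setting is an assumption made for illustration.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(gsvi, target, lag):
    """Correlate gsvi[t - lag] with target[t]; lag=0 is contemporaneous."""
    if lag < 0 or lag >= len(gsvi) - 1:
        raise ValueError("lag must leave at least two observations")
    if lag == 0:
        return pearson(gsvi, target)
    return pearson(gsvi[:-lag], target[lag:])


# Synthetic weekly data: trading volume echoes the previous week's searches.
gsvi = [10, 12, 30, 11, 9, 25, 10, 12, 28, 11]
volume = [5] + [2 * g for g in gsvi[:-1]]

same_week = lagged_correlation(gsvi, volume, 0)
one_week_lag = lagged_correlation(gsvi, volume, 1)
print(round(same_week, 3), round(one_week_lag, 3))
```

In this constructed example, the one-week lag correlation is essentially perfect by design, whereas the same-week correlation is weaker. With real data, the reviewed studies typically rely on regression or time-series models rather than a raw correlation.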

In the following paragraphs, the main outcomes related to the criteria adopted in each article are discussed.

Keywords

Keywords classification

Based on the articles read, we divided the keywords used to obtain the GSVI into four classifications: company names (CN), ticker (TICKER), related term (TERM), and index name (INDEX). Opinions differ on which keywords are least ambiguous for GSVI construction. As stated in Broder (2002), the data obtained from search traffic can give a general view of what the seeker is looking for but cannot reveal exactly the “need behind the query.” Da et al. (2011) and Joseph et al. (2011) identified two main problems with using company names as keywords in the GSVI. First, a search for a company name does not necessarily indicate investment intention and could reflect other motives, such as online shopping or finding a store location. Second, firm names can be spelled in various ways, including abbreviations. Hence, the authors propose using the firm’s ticker, as someone searching for that term is highly likely to have investing intentions, which would make the GSVI data more accurate. However, Kristoufek (2013) reported difficulties in using stock tickers because the search frequency of some tickers is too low, which can reduce the size of the data sample. Along the same lines, Tan and Taş (2019) manually excluded some tickers because they coincide with general Turkish words, which could generate noise; this exclusion also diminishes the data sample.

Despite discrepancies regarding the use of firm names, many articles (34%) built a GSVI using company name data. According to Vlastakis and Markellos (2012), the noise described by Da et al. (2011) and Joseph et al. (2011) is random and does not influence the GSVI data sample. To avoid the second potential problem, several authors have proposed different keyword selection processes. Takeda and Wakao (2014) and Adachi et al. (2017) developed a complete procedure for Japanese companies, excluding general-meaning abbreviations and removing parts of the name considered irrelevant to the company. Preis et al. (2010), Ramos et al. (2020), and Bijl et al. (2016) adapted company names, simplifying the search term and eliminating parts of the name such as “Inc.”. Moussa et al. (2017a, b) applied a service called “Adwords,” which improves the keyword search by restricting the search term to the “investment” category. Furthermore, Google Trends allows query data to be delimited by region, reducing the bias caused by different company names in other languages.
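
The kind of name simplification described above, such as dropping “Inc.” from a company name, can be sketched as follows. This is only an illustration: the suffix list and the function name are hypothetical and do not reproduce the procedure of any reviewed article.

```python
# Hypothetical list of legal-form suffixes; the reviewed studies do not publish one.
LEGAL_SUFFIXES = ("inc", "corp", "corporation", "co", "ltd", "plc", "llc")


def simplify_company_name(name):
    """Drop trailing legal-form suffixes such as "Inc." from a company name."""
    tokens = name.replace(",", " ").split()
    while tokens and tokens[-1].rstrip(".").lower() in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


for raw in ["Apple Inc.", "Alphabet, Inc.", "Exxon Mobil Corporation"]:
    print(raw, "->", simplify_company_name(raw))
```

A rule of this kind addresses only spelling variants; it does not resolve the first problem, namely that a search for the simplified name may have no investment purpose.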

Of the total number of papers found, 24% used related terms as keywords. Preis et al. (2013), Perlin et al. (2017), and Oliveira-Brochado (2019) use financial terms as keywords to construct the GSVI. However, researchers have used not only financial terms to measure the relationship with stock market performance but also terms related to the subject under study. Ahundjanov et al. (2020), Ding et al. (2020), Lyócsa et al. (2020), and Smales (2021) establish search terms related to COVID-19 to measure investor attention to the pandemic and its effects on stock markets. Dzielinski (2012) uses the term “economy” to measure economic uncertainty based on the GSVI, volatility, and returns of the S&P500. Other authors who follow the same term selection process are Piñeiro et al. (2020) for the water sector index, El Ouadghiri and Peillex (2018) for investor attention to Islamic terrorism in stock markets, Ouadghiri et al. (2021) for the climate change sector, and Pereira et al. (2018) for the “Donald Trump” effect worldwide.

Finally, 19% of the articles used index names as keywords. Aouadi et al. (2013) argue that trading volume and investor attention have a stronger correlation at the market level (measured by the index name) than at the firm-specific level. Accordingly, Tantaopas et al. (2016) analyzed the index level, claiming it is more representative than individual firms.

Keyword impact of financial variables

We begin our analysis by classifying each keyword according to the financial variables studied. As shown in Table 5, the company name is the most used category, followed by ticker, related terms, and index names. The total number of articles was 62 instead of 56 because four (Fan et al. 2021; Sifat and Thaker 2020; Chen 2017; Aouadi et al. 2013) applied more than one classification in their investigations. Returns were the most studied variable in three of the four categories. Volatility is the most widely studied financial variable in the index name category. This could be because these articles measure the GSVI effect at the index level rather than the firm-specific level, where the effect is captured more successfully through market value fluctuations. The GSVI shows mixed results for the return variable, independent of the keywords used for the data sample. Hence, depending on the study, a variation in the search volume of a keyword is either positively or negatively correlated with the returns of the stocks or indexes used.

Table 5 Article classification according to keywords and financial variables

Concerning volatility, the ticker and related term classifications present a small and low-significance sample. By contrast, articles that obtain the GSVI data sample from company names and index names amount to nearly 80% of all the samples that measure volatility. These studies show a positive relationship between the GSVI and volatility. Hence, an increase in investor attention to a specific stock or market is followed by an increase in price fluctuations (Kim et al. 2019; Dimpfl and Jank 2016; Hamid and Heiden 2015; Aouadi et al. 2013). The effect of the GSVI on trading volume is measured mainly through the index name classification (52% of the total sample). As with volatility, the search volume shows a positive relationship with trading volume in nearly the entire sample. Furthermore, these articles also examine the relationship between the GSVI and returns, and they find a positive relationship for both variables.

Concerning the results of the forecasting approach, 90% of the articles supported the predictive capacity of the GSVI for stock market movements, regardless of the keyword used to build the search index. According to Da et al. (2011) and Joseph et al. (2011), a lack of predictive capacity could be caused by the keywords used to build the GSVI, because company names can introduce noise into the data. Hence, the authors claimed that other keywords were more useful for measuring the predictive capacity of the GSVI. The ticker is the only classification that does not include any article rejecting the predictive capacity of the GSVI; therefore, it could be considered the most suitable option for researchers testing GSVI forecasting models.

Region

Region classification

Another key factor in this research is the geographical region of the study. The articles were classified into five groups: US, Europe, Asia, South America, and multiple countries. The largest share of the research (41.1%) uses US data samples and concerns US financial markets, which are the most well-organized stock markets and offer more available data than those of other countries. Only 30% of the articles were published by North American authors. In contrast, despite European academic productivity (48%), only 20% of the articles reviewed investigate European financial market values. The S&P500 (29%) is the most studied stock index, either by itself or in a cross-country analysis with another index. Together with the DJIA (20%), it accounts for around half of the investigations (49%). As the US stock market is one of the most traded stock exchanges in the world, companies listed on it show how market variables such as returns, trading volume, and volatility react to news. Furthermore, when the market is hot, investors show more interest in and motivation toward stock markets and search for more information (Yoshinaga and Rocco 2020). Hence, researchers can obtain a stronger sample from these markets, reflecting investors’ interests, attention, and behavior. Moreover, eight cross-country studies provide stronger evidence of the efficiency of the GSVI for investor attention and forecasting over a wider range. However, Internet penetration is not equal in all the areas covered by the search traffic sample in cross-country studies (Sifat and Thaker 2020). Another regional constraint is the existence of other search engines, such as Baidu, which is commonly used in Asia instead of Google. Therefore, research conducted in Asia is limited by the number of Google search queries. Despite these restrictions, Asian markets represent approximately 21% of the studies reviewed.

Similarly, the leading European stock index, EUROSTOXX50, and other major indexes in the European region, such as CAC40 and FTSE100, account for 20% of all the literature in this study. Finally, the South American classification includes two studies conducted on the stock markets of Brazil (Yoshinaga and Rocco 2020; Ramos et al. 2017), one of the BRICS economies and a representative of the world’s emerging countries.

Region criteria impact on financial variables

Table 6 shows the article classification by study region and the relationship with the different financial variables, measuring their positive or negative relationship with the GSVI. Not all the articles report a positive or negative relationship between the GSVI and each financial variable, as a minority find no universal relationship between the variables (Sifat and Thaker 2020; Basistha et al. 2019; Lobão et al. 2017; Kristoufek 2015). The three main market regions were the US (41.1%), Europe (19.6%), and Asia (21.4%), accounting for 82.1% of the articles reviewed. The South American region contributes only two research works, making it a small and non-representative sample compared with the rest of the regions. Similarly, cross-country studies accounted for 14.3% of the global sample and were less representative than each of the three leading regions.

Table 6 Article classification according to region criteria and financial variables

Studies conducted in the US encompassed nearly half of the total sample. The results were diverse with respect to both the variables used to measure the GSVI effect and the sign of the relationship. When the effect of the GSVI is measured by returns, the results differ from when it is measured by volatility or trading volume, even for the same index. Only two studies in the United States, Basistha et al. (2019) and Kristoufek (2015), found no universal relationship between the GSVI and stock market movements. The most studied variable in the US is returns. Interestingly, this was the only variable with a similar number of articles reporting positive and negative relationships with the GSVI. Regarding volatility and trading volume, there is a clear trend toward a positive relationship with the GSVI; that is, an increase in search volume is followed by higher price fluctuations and a greater number of financial transactions. Hence, a surge in investor attention, measured by the GSVI, increases volatility and trading volume. Moreover, 57% of the US articles agreed on the forecasting capacity of the GSVI for financial markets. Joseph et al. (2011), Huang et al. (2020), and Chronopoulos et al. (2018) claim that the GSVI improves stock market prediction models.

Similar results were obtained in Europe and Asia. As in the US, the most analyzed financial variable is returns. In general, there is a positive correlation between all financial variables and the GSVI. In Europe, no article presents a negative relationship between the GSVI and returns, although this relationship has been studied more from a forecasting approach. Most studies agree on the predictive capacity of the GSVI, even though it is more related to future values than to current activity (Kim et al. 2019). In Asia, the sample of studies forecasting stock market activity is small and insignificant.

Search frequency

Frequency classification

The reviewed articles used data at a weekly, daily, monthly, or yearly frequency. Google Trends offers data at different frequencies according to the length of the period. Data from Google Trends are available from 2004, and daily data can be obtained for windows of up to nine months. The longer the period, the lower the frequency provided (weekly or monthly). The choice of the appropriate frequency is crucial, although the research community has not yet studied it in detail. Some contributions, such as that of Ekinci and Bulut (2021), state that weekly data cannot clarify exactly what causes the movement. For instance, if the GSVI increases on a Monday and the return increases on the following Wednesday, then the growth in the GSVI is followed by a surge in returns. However, if the return of a stock increases on a Thursday and the GSVI increases on the Friday, the order is reversed, and weekly data cannot distinguish between the two cases. Similarly, Hamid and Heiden (2015) claim that the daily frequency of the GSVI is inappropriate for volatility forecasting. Nevertheless, 40% of the daily frequency studies have included volatility as a financial variable.
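
The ordering problem raised by Ekinci and Bulut (2021) can be made concrete with a small sketch. The two five-day series below are synthetic, and the function names and the unit "jump" values are hypothetical; the point is only that both scenarios collapse to identical weekly observations.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]

# Scenario A: search interest jumps on Monday, the return follows on Wednesday.
gsvi_a = [1, 0, 0, 0, 0]
ret_a = [0, 0, 1, 0, 0]

# Scenario B: the return jumps on Thursday, search interest follows on Friday.
gsvi_b = [0, 0, 0, 0, 1]
ret_b = [0, 0, 0, 1, 0]


def first_move(series):
    """Index of the first day with a non-zero change."""
    return next(i for i, value in enumerate(series) if value != 0)


def leader(gsvi, ret):
    """Say which series moves first within the week."""
    g, r = first_move(gsvi), first_move(ret)
    if g < r:
        return "GSVI leads"
    if r < g:
        return "return leads"
    return "same day"


def weekly(series):
    """Aggregate a week of daily changes into one weekly observation."""
    return sum(series)


print(leader(gsvi_a, ret_a), leader(gsvi_b, ret_b))
print((weekly(gsvi_a), weekly(ret_a)) == (weekly(gsvi_b), weekly(ret_b)))
```

The daily series reveal opposite lead-lag orderings, yet the weekly aggregates are identical, which is why weekly data alone cannot establish the direction of the effect.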

Shortening the sample period would allow a higher frequency; however, given the sample periods of most studies, authors usually use weekly data. Hence, most reviewed papers used weekly frequencies for search queries and financial data (66.1%). Daily and monthly frequencies were used in nearly equal proportions, and only one study used a yearly frequency (Nguyen et al. 2019), although the author first obtained monthly data and then transformed them to a yearly basis. Ekinci and Bulut (2021) state that information leakage can occur depending on the data frequency selected. Asset market prices change quickly, with booms and busts occurring within seconds. If the data sample is measured at a higher frequency (weeks rather than years), more information can be obtained from it, increasing the number of observations and making the results more robust. Hence, obtaining high-frequency data that capture market fluctuations is essential (Araújo et al. 2015).

Frequency selection impact on financial variables

Table 7 summarizes the articles reviewed by the frequency used to obtain the search volume data and the relationship between this variable and the financial variables of return, volatility, and trading volume. The most representative frequency was weekly, used in 66.1% of all articles reviewed. This was followed by daily and monthly frequencies, which amounted to 17.9% and 14.3% of all the data samples, respectively. The yearly frequency was used in only one study, which makes it a small and non-representative sample compared to the rest of the frequencies. Generally, the attention paid to data frequency is low compared to that paid to market or keyword selection.

Table 7 Article classification according to frequency criteria and financial variables

As with the keyword and region criteria, the relationship between the GSVI and returns is diverse. An increase in the GSVI correlates positively with increased returns in most studies, regardless of the frequency used. Nevertheless, because the weekly frequency has the largest sample, it also contains the most articles reporting a negative relationship, and these mainly concern US markets. When measuring volatility and trading volume, there is a clear tendency toward a positive relationship with the GSVI at any frequency. As with the other classification criteria, increasing investor attention toward a specific term increases volatility and the number of transactions.

Regarding daily frequency, 40% of these articles (Smales 2021; Ahundjanov et al. 2020; Ding et al. 2020; Lyócsa et al. 2020) study COVID-related searches and their effect on investor attention and stock market reaction. Because the COVID-19 pandemic is a recent topic, the sample periods are shorter than one year; therefore, data can be obtained daily, and there is no reason to use another frequency. The results obtained for all three financial variables show a positive relationship with the search queries. The predictive capacity of the GSVI was analyzed using all frequency categories except yearly. Most articles provide evidence that the GSVI improves the prediction of stock market movements and price variations.

Conclusion

This study presents a systematic review of the existing literature on the predictive capacity of the GSVI for stock market movements. It uses 56 articles from 2010 to 2021 obtained from the WoS and Science Direct databases, which use the GSVI as a proxy variable for measuring investor attention and its relationship to stock market forecasting. The analysis of these articles results in three different classification criteria for building the GSVI and three main financial variables to measure the effect on stock markets. The classification criteria were keywords, market regions, and data frequency. Financial variables are summarized in terms of returns, volatility, and trading volumes.

After reviewing the most relevant papers in the field, we conclude that regardless of the keyword, region, or frequency used to build the GSVI, there is evidence that the GSVI is positively related to volatility and trading volume. Hence, increasing investor attention to a specific financial term increases volatility and trading volumes. However, regarding returns, the effect of GSVI depends on the building criteria. In particular, negative relations are found in the US markets using an economy-related term as a search topic. Furthermore, we conclude that the GSVI forecasts financial market movements irrespective of the data sample’s market region, keywords, or frequency.

This study sheds light on the growing significance of the information the GSVI conveys. In addition, this systematic review highlights the challenges and opportunities for future research. First, longitudinal studies should be conducted to compare the predictive capacity of the GSVI at different points in time. These studies are needed because Google Trends information has become available relatively recently. Second, one of the most challenging issues to address in future research is the selection of keywords, as capturing the financial purpose behind a search query is crucial for the validity of a study. Although many articles analyze more than one stock index, few comparative works have examined the keywords used to build the GSVI and quantified its consistency. Third, little attention has been paid to the practical implications of the GSVI as a predictive variable associated with profitable trading strategies and portfolio diversification. Future research should continue to survey the implementation of investment strategies that build portfolios according to the search volume of financial assets to beat the market. However, the net profitability of such strategies can be absorbed by the transaction costs of modifying the portfolio in each period.

Finally, introducing emerging data technologies such as artificial intelligence (AI) and linguistic modeling has a crucial impact on data processing. Google has recently provided access to new generative AI capabilities embedded in Google searches. Future research should examine these factors and their potential effects on the GSVI. Although this study is the first to consolidate the current knowledge in this area, the inclusion criteria for selecting the articles reviewed exclude other analyses based on fixed-income assets and cryptocurrencies. Future studies should consider other systematic reviews that include these types of assets. Moreover, an analysis based on the prediction models used in each article should be considered in future research to obtain clear insights into the most suitable model for each situation.