1 Introduction

Academic literature highlights the existence of cycles in economic thought linked with the ebbs and flows of economic activities (see, e.g. Kufenko and Geiger 2016). The referenced paper undertakes a bibliometric analysis of the scientific literature on business cycles, delving into the association between economic-related keywords and the evolution of economic activities.

The present paper contributes to this discussion through a statistical analysis of a comprehensive collection of official political speeches, focussing on economically significant keywords. Specifically, we examine whether and how the Standard and Poor’s 500 (S&P 500 henceforth) relates to the economic terminology used by US Presidents in their public addresses. We operate under the premise that words used on official occasions carry distinct significance and weight.

Our study rests on recognising the US President as a key global influencer, whose carefully crafted speeches significantly affect the socio-economic milieu. For instance, Lee and Myers (2004) discuss the influence of political viewpoints on Enterprise Resource Planning (ERP) changes, observing that “In the political school of thought within strategic management, strategy formation and implementation are seen as being shaped by power and politics”. Conversely, we highlight the effect of factual developments and contingencies on the choice of themes in US Presidents’ speeches, suggesting a bidirectional interaction.

Our research uses two datasets. The first comprises textual data, identical to those in Cinelli et al. (2021), Ficcadenti et al. (2019) and Ficcadenti et al. (2020), encompassing US Presidential speeches from George Washington’s tenure in April 1789 to Donald Trump’s in February 2017. The second includes daily prices, volumes, and returns of the S&P 500 index. The former originates from the Miller Center, a renowned political research institution affiliated with the University of Virginia; the latter is sourced from the freely accessible “Yahoo!Finance” website (the rationale for selecting these sources is detailed in Sect. 3). The analysis period for the S&P 500 data spans from April 10, 1951, to March 1, 2017, aligning with the dates of the speech dataset.

Our exploration addresses the interplay between textual content (speech transcripts) and financial markets, offering a novel perspective on this research area; we refer the reader to Gupta et al. (2020) and Fisher et al. (2016), where extensive literature reviews are presented. Specific examples that further contextualise our motives are Kalyani et al. (2016), where sentiment classifications of Apple’s financial news are used to predict stock trends, and Renault (2020), which examines the correlation between investor mood and stock returns through sentiment analysis of messages on key financial topics. The use of named-entity recognition in finance is exemplified by Sagheer and Sukkar (2019), where texts from Arabic media outlets are analysed using the Al Khalil lexicon in the context of oil production. Furthermore, Qiu et al. (2013) conducted Chinese document modelling to assess the disclosure quality of annual reports. These studies predominantly leverage Machine Learning and AI tools, especially in the field of Natural Language Processing (NLP), which often rely on large tagged dictionaries. Research focusing explicitly on economic semantic tagged dictionaries includes Consoli et al. (2020), centred on sentiment analysis of economic texts, and Loughran and McDonald (2011), where 4000 words taken from 10-K reports are classified into 7 sentiment categories. These studies underscore the widespread use of AI and Machine Learning, despite their often limited explicability (Lipton, 2018). A key motivation for our research is the use of a fully explicable approach, based on a predefined dictionary, to examine the connection between the economic content of US political speeches and the S&P 500. Additionally, applying a pre-trained model to a politically focused corpus spanning 1951 to 2017 would require careful calibration, because language has changed over the years.

The scholarly debate in this realm has primarily focused on the interplay between Central Banks (through alternative data) and financial markets (refer to the discussion on the “evolution of information processing technology” in Kuroda 2017), with less attention to politicians’ communications. For instance, Born et al. (2014) conducted a comprehensive textual analysis of communications from Central Bank governors across over 20 countries and their respective stock markets.

Studies on the impact of US politicians’ speeches on financial markets present a scattered view; recent works like Marinč et al. (2021) have explored the role of the Presidents’ (and candidates’) Tweets from 2007 to 2020, and Ajjoub et al. (2020) focused on President Trump’s Tweets. Thus, our second motivation is to examine the historical relationship between US Presidents’ speeches and the US stock market.

To demonstrate the exploratory potential of our proposed method (see Makri and Neely 2021; Saunders et al. 2009, where exploratory research is defined and showcased), we treat the speeches as market news, employing a bag-of-words approach with an economic glossary sourced from Bishop (2009) and Wikipedia’s economics glossary, consistent with Cinelli et al. (2021). We then correlate the presence of economic terms in the speeches with S&P 500 variables one day before, on the same date as, and one day after the speeches, which allows us to detect anticipatory or lagged effects of the discourse on the market, and vice versa. Our goal is neither to quantify the effect of one variable on another nor to establish the direction of the relationship between the presence of economic words in US presidential speeches and S&P 500 market dynamics. Rather, we aim to identify whether a systematic relationship between these elements exists.

Methodologically, we examine the relationship between the economic content of the speeches and the S&P 500 index through a statistical analysis from two macroscopic perspectives. Firstly, we employ Kendall’s \(\tau \) correlation (introduced in Kendall 1938) to measure the ordinal association between the frequency of economic terms in the speeches and the financial variables of the index (for more on rank-rank correlation, see Melucci 2007; Gibbons 1993). In doing so, we assess whether a rank-level relation exists between the occurrences of economic terms in the talks and the daily prices, returns and volumes. Secondly, we adopt a geometric and information-theoretic perspective, calculating max, min, Manhattan, and Euclidean distances between the time series of economic term frequencies and S&P 500 observations. These distances offer diverse insights into the relationship between the economic lexicon of the Presidents and the evolution of the S&P 500. Additionally, we compute Shannon entropy (see Shannon 1948) to compare the distributional properties of the series. This multifaceted approach enables a comprehensive understanding of the relationship between political economic messages and the S&P 500.

The unique nature of our research subject further informs our choice of methodology. The US presidential speeches in our dataset are distributed irregularly across a lengthy time span, encompassing major global events from wars to financial crises (some changes in the Presidency are described in Eroukhmanoff 2018). Quantifying the impact of word frequencies on the stock market, or vice versa, would require accounting for an undefined number of variables, potentially limiting our scope. Moreover, discerning the direction and consistency of variable impacts over time is challenging, given the evolving nature of presidential leadership and of the institution itself (Rutledge & Larsen Price, 2014). The growth of the institutional Presidency and the resulting dynamics between the President and Congress, as discussed by Dickinson and Lebo (2007), highlight the complexities and changing credibility of the Presidency as an institution.

The paper is structured as follows: Sect. 2 lists comparable and complementary works related to our analytical methods. Section 3 details the datasets and their key statistical characteristics. In Sect. 4, we describe the methodological tools employed. Section 5 presents the results and related discussion. Finally, Sect. 6 offers concluding remarks.

2 Review of the literature

Textual data are attracting growing interest, particularly in fields where relevant information is disseminated through written documents, such as pathology diagnoses in medicine (Huang et al. 2023), or in the social sciences. Examples include Adamopoulos et al. (2018), where the authors used social media users’ latent personality traits, extracted from unstructured textual data, to assess the impact of “word of mouth” on shaping their behaviours and preferences, and the notable case of Wei et al. (2002), which presented advancements in document-category management for internet-sourced content.

Getting closer to the core of our study, we mention contributions that address the relationship of US presidencies and presidential elections with financial markets, to give further context for our study and to provide the reader with additional views on the matter. Santa-Clara and Valkanov (2003) investigated the relationship between political cycles and stock market performance, focusing on different US presidencies and election periods. Their findings revealed that presidential elections significantly affect stock market returns, suggesting a potential influence of speeches during these times, although such speeches are less directly market-related than Federal Reserve announcements (Hayo et al., 2012). Santa-Clara and Valkanov’s methodology included a range of control variables, such as the log dividend-price ratio, the term spread, the default spread, and the relative interest rate, using data from the CRSP portfolio and the DRI database. They employed linear regression to analyse the correlation between “political variables” and “financial variables”, also incorporating time shifts, albeit on a monthly scale. Owing to data distribution challenges and limited test power, they used a Monte Carlo approach but faced interpretational difficulties, which led them not to present results after Bonferroni correction (see Leamer and Leamer 1978). Kräussl et al. (2014) conducted a more detailed examination of the presidential cycle puzzle. Focusing on the relationship around election dates, they found no statistically significant evidence of a consistent presidential election cycle effect on stock returns. Their methodology, which used the S&P 500’s log-returns and dummy variables for presidential periods and Senate majorities, aimed to assess how far political variables explain stock market returns. 
They observed a business cycle pattern, suggesting various potential influences, including presidential actions on monetary policy, taxation, or spending. These actions could be communicated through legislative measures and presidential announcements, prompting our focus on the economic content of these speeches. Recent studies, like Brans and Scholtens (2020), have explored the impact of social media, particularly presidential tweets, on the stock market. They found that President Donald Trump’s tweets did not significantly impact stock returns unless sentiment was included in the analysis. Their treatment of presidential tweets aligns with our approach to presidential speeches, and they did not use control variables. Additionally, Kiessling et al. (2017) highlighted the influence of presidential signaling and rhetoric over a 20-year period, showing that presidential addresses on economic policy can affect market movements. Their findings, derived from a shorter list of economic words and a 20-year analysis period, complement our more extensive word list and longer study period. Abolghasemi and Dimitrov (2020) explored the causality between US presidential prediction markets and global financial markets, finding that US election-related market fluctuations can impact global financial indices. This suggests that presidential speeches, as vehicles for party positions and voter influence, can indirectly affect financial markets in various countries, underlining the US’s unique and central position. Li and Born (2006) examined how presidential election uncertainty affects US stock returns, finding substantial evidence of political outcomes, including elections, influencing the business cycle and stock market. This supports the notion that presidential speeches can have wider financial market impacts beyond the US. Shaikh (2019) studied the 2012 and 2016 US presidential elections’ effects on investor sentiment and stock market performance. 
The empirical findings indicated a strong and significant relationship, including the influence of debates and speeches, on investor behavior and market dynamics, using indicators such as the CBOE VIX implied volatility index, which is computed from S&P 500 index options. Thus, while some studies indicate that presidential speeches may not systematically influence financial markets, there is also evidence that they can affect global investor sentiment, market volatility, and stock market returns, for example via electoral campaign messages. The relationship between presidential speeches and financial markets is complex and depends on various factors, such as the specific context and timing of the speeches.

In terms of methodological approach, our data treatment procedure is similar to those found in Tsai and Wang (2017), Alfaro et al. (2016), Maligkris (2017). However, the work most akin to our method is Goel and Chengalur-Smith (2010), which analysed organisational (public, private, educational, etc.) information security policies through content analysis of policy documents. Quoting the paper: “Content analysis relies extensively on word frequencies to determine the themes (focus areas) and relative frequencies of the themes (or emphasis on focus areas).” As stated in the Introduction and akin to Goel and Chengalur-Smith (2010), Maligkris (2017), we assess the economic content level in analysed speeches by referencing selected economic glossaries.

Our approach is comparable to Lavrenko et al. (2000), where the authors aligned news releases with current or immediately subsequent trends. However, whereas that paper aimed at building language models to predict stock market trends, we start from a glossary to investigate the co-influence between the speeches and the market. Furthermore, our study differs from Maligkris (2017) in that we do not limit our analysis to speeches by presidential candidates, and from Santa-Clara and Valkanov (2003) in that it is more exploratory, focusing on trends and synchronicities between variables. We investigate the presence of an interaction between the economic content of U.S. presidential speeches and financial market movements, while recognising challenges similar to those these authors encountered in estimating the magnitude of such an interaction.

Building on Santa-Clara and Valkanov (2003)’s findings, which demonstrated that the correlation between stock returns and political variables is not merely an indirect relation with business cycle factors, we posit that a synchronic behaviour between presidential economic addresses and market dynamics might exist. However, as they noted, the mechanism through which political variables impact stock returns remains an open question.

The procedure we propose relies on rank-rank correlation. This technique is commonly employed in text analysis (for example, Baron et al. 2009) and is frequently used in sentiment analysis and finance studies (see Tsai and Wang 2017, Pak and Paroubek 2011, Laih 2014). Our methodological choice builds on the studies cited above, which together provide a rich background for understanding the relationship between US presidential speeches, political cycles, investor sentiment, and the US stock market, and offer sufficient evidence to support an exploratory approach. Accordingly, since our objective is neither to quantify the relationship between the entities at play nor to explain its direction, we carry out the analysis with the tools described in Sect. 4.

The number of studies demonstrating a correlation between the stock market and political speeches using text mining techniques and rank-rank correlation is limited. A particularly relevant study is Preis et al. (2013), which, while not focusing on political messages, used Kendall’s \(\tau \) to measure the relationship between Google query volumes for finance-related search terms and the stock market.

In the realm of machine learning algorithms, distance measures are extensively used (see Nassirtoussi et al. 2014, for an extensive review of machine learning applications in stock market prediction through text mining methods). In news classification and impact studies, distances are a fundamental component. For example, Fung et al. (2002) employed Euclidean distance to segment stock time series and compare them with various classes of news grouped by keyword collections.

Many trading and prediction algorithms for news exploration rely on distance measurements and text analysis; for further context, we refer to Shynkevich et al. (2015) and Schumaker and Chen (2009).

Finally, Shannon entropy is commonly used in text mining, especially in text categorisation (see e.g. Largeron et al. 2011) and authorship attribution (see Rosso et al. 2009). In the financial sector, it is employed to analyse financial time series features or design portfolio selection strategies (see Bentes et al. 2008; Mendes et al. 2016; Tessmer et al. 1993). A notable review in finance is Zhou et al. (2013). Generally, entropy’s use has a broad and longstanding history in information theory (see, e.g., Gray 2011). To our knowledge, this paper is the first to employ such a combination of tools to assess the similarity between the economic content of political speeches and the patterns of financial markets.

3 Data

We chose the Standard and Poor’s 500 for our analysis because it is among the most representative indices of the US stock market. The S&P 500 is often regarded as the quintessential stock index: it is recognised as the foremost representation of the “most significant financial markets” (the US exchanges) and is extensively referenced in authoritative financial literature (see, e.g., Golez and Koudijs 2018; Patton and Weller 2020). As the primary benchmark for large-cap US equities, the S&P 500 has approximately 13.5 trillion US dollars indexed or benchmarked to it. It comprises over 500 leading companies, representing around 80% of the available market capitalisation (S&P Dow Jones Indices, 2022). Consequently, the S&P 500 holds a pivotal position in the socio-economic and political landscape of the US and of the whole world. This prominence places it at the heart of US Presidents’ interests, providing a clear and direct motivation to explore the interplay between the economic content of their speeches and the performance of the index. We therefore collected daily index closing prices and volumes from January 3rd, 1950, the earliest records available from “Yahoo!Finance”. The dataset extends to March 1st, 2017, one day after President Donald Trump’s Address to Joint Session of Congress on February 28th, 2017, so that the latest presidential speech in the dataset is included in our analysis.

The corpus of Presidents’ speeches analysed in this study comprises 951 addresses, as catalogued in Ficcadenti et al. (2019) and Cinelli et al. (2021). This collection spans from George Washington’s tenure in 1789 to Donald Trump’s in February 2017. We restricted this corpus to speeches from January 3rd, 1950 onwards, aligning it with the start of the S&P 500 series available on “Yahoo!Finance”. The earliest speech in this adjusted corpus is Harry S. Truman’s “Report to the American People on Korea”, dated April 11th, 1951.

We compiled a list of economic terms from Bishop (2009), where the author elucidates “the most important economic terms and concepts.” His work focuses on economics as it relates to jobs, prices, trade, and its impact on everyday life, drawing on articles from The Economist as a prime source. This list was expanded by incorporating terms from https://en.wikipedia.org/wiki/Glossary_of_economics. Following the methodology in Cinelli et al. (2021), we processed this compilation to derive a glossary of 383 economic terms. The presence and frequency of these terms in the speeches were analysed to gauge the economic emphasis of each address. We calculated the absolute frequency of economic terms in each speech and determined their relative frequency by dividing these counts by the total word count of the respective speeches.
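A minimal Python sketch of this counting step is given below. It assumes lowercase alphabetic tokenisation and counts each glossary entry, including multi-word entries, wherever its token sequence occurs; the helper names, the tokenisation rule and the toy glossary are illustrative assumptions rather than the preprocessing of Cinelli et al. (2021).

```python
import re
from typing import Iterable


def tokenize(text: str) -> list[str]:
    # Assumed tokenisation: lowercase alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())


def economic_frequencies(speech: str, glossary: Iterable[str]) -> tuple[int, float]:
    """Return the absolute and relative frequency of glossary terms in a speech."""
    tokens = tokenize(speech)
    if not tokens:
        return 0, 0.0
    absolute = 0
    for term in glossary:
        term_tokens = tokenize(term)
        n = len(term_tokens)
        if n == 0:
            continue  # skip entries with no alphabetic content
        # Count every position where the term's token sequence starts.
        absolute += sum(
            1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == term_tokens
        )
    # Relative frequency: glossary hits divided by the speech's total word count.
    return absolute, absolute / len(tokens)


# Hypothetical usage with a toy three-term glossary.
print(economic_frequencies("The interest rate rose and inflation fell.",
                           ["inflation", "interest rate", "interest"]))
```

In this sketch, overlapping entries such as “interest” and “interest rate” are both counted, so the two-word passage “interest rate” scored against the glossary ["interest", "rate", "interest rate"] yields 3 hits over 2 words; this illustrates one way in which a ratio of this kind can exceed one.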

A minor bias exists in our definition of relative frequency, as the proportion of economic words in a speech is not necessarily bounded above by one. However, the effect of this bias is negligible and does not detract from the validity of our analysis; a similar approach is adopted in Goel and Chengalur-Smith (2010), where the authors used this method to evaluate the significance of Breadth in security policy-related documents.

Figure 1 illustrates the relative frequencies of economic terms in the speeches over time.

Fig. 1 Percentage of economic terms in each speech over the years, calculated as the proportion of economic terms relative to the total word count in the speech

This procedure aligns with the methodologies of Baker et al. (2016), Tsai and Wang (2017), and others, as discussed in Sect. 2. For example, Baker et al. identified a set of terms related to economic insecurity to create an index of economic policy uncertainty, while Tsai and Wang utilised word lists to analyse financial reports of firms. Similarly, Wei et al. (2015) selected keywords to identify relevant studies in climate change modelling.

For comparative analysis, we normalised all data, including the absolute frequencies of economic terms in speeches and the S&P 500’s daily closing prices, volumes, and returns. Let \(\{x_1, \dots , x_N\}\) represent a time series with N realisations, where \(t=1, \dots , N\) corresponds to a trading day for financial variables and a speech date for US Presidents’ talks. The normalised value of \(x_t\) is given by

$$\begin{aligned} x_t' = \dfrac{x_t - min(x)}{max(x) - min(x)}, \qquad t=1, \dots , N, \end{aligned}$$
(1)

where

$$\begin{aligned} max(x)=\max \limits _{s=1, \dots , N} x_s \end{aligned}$$

and

$$\begin{aligned} min(x)=\min \limits _{s=1, \dots , N} x_s \end{aligned}$$

are the maximum and minimum values across the series. This normalization allows for a direct comparison between variables with differing scales, making it ideal for our context. Note that the relative frequencies of economic terms fall in [0, 1) by definition and thus do not require normalization.
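Eq. (1) translates directly into the short Python sketch below. The function name is ours, and mapping a constant series (for which max(x) = min(x) and Eq. (1) is undefined) to zeros is an assumption, as the text does not address that case.

```python
def min_max_normalise(series: list[float]) -> list[float]:
    """Rescale a series to [0, 1] following Eq. (1)."""
    lo, hi = min(series), max(series)
    span = hi - lo
    if span == 0:
        # Eq. (1) is undefined for a constant series; returning zeros is an assumption.
        return [0.0 for _ in series]
    return [(x - lo) / span for x in series]


# Hypothetical closing prices: the minimum maps to 0 and the maximum to 1.
print(min_max_normalise([10.0, 20.0, 30.0]))  # [0.0, 0.5, 1.0]
```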

To compare the speeches with the S&P 500 variables, the two datasets must be aligned. As mentioned, talks delivered before January 3rd, 1950 (the date of the first S&P 500 closing price in the considered dataset) are excluded. Since the first available speech after January 3rd, 1950, dates back to April 11th, 1951, the financial information is considered from April 10th, 1951: we start “one day before” the first speech to allow the analysis of the financial data recorded before it. The resulting set comprises 376 speeches.

To compare the speeches’ economic content with the S&P 500, information must be available for both. Some speeches were delivered on weekends or on days when the market was closed, so the number of days with information for both is 327. As we will see in Sect. 4, we compare the speeches’ economic content against the S&P 500 realisations registered “one day before”, “contemporaneously” and “one day after” the speech dates, so the number of usable speeches differs across these three cases. To explain how these differences arise, we use the following two transcripts as an example:

  • Saturday, 13/03/1965 – Press Conference at the White House, Lyndon B. Johnson

  • Monday, 15/03/1965 – Speech Before Congress on Voting Rights, Lyndon B. Johnson

In the “contemporaneous” analysis, the speech of 13/03/1965 is excluded because no financial data exist for that day (a Saturday). However, when its economic content is assessed against the S&P 500 data from the day before, the financial data from 12/03/1965 are used. For the speech of 15/03/1965, the market data from the day before are again those of 12/03/1965, as the 13th and 14th were non-trading days. Thus, both speeches are included in this case, and the same applies when comparing with financial data from the day after the speech.
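The alignment rule illustrated by this example can be sketched in Python as follows. The function name is hypothetical, and the trading calendar is an illustrative excerpt that assumes 11, 12, 15 and 16 March 1965 were trading days; “one day before” and “one day after” are read as the nearest trading day strictly before or after the speech date, which reproduces the two cases just described.

```python
from bisect import bisect_left, bisect_right
from datetime import date
from typing import Optional


def aligned_trading_day(speech_day: date, trading_days: list[date], offset: int) -> Optional[date]:
    """Match a speech date to a trading day for offset -1, 0 or +1.

    `trading_days` must be sorted in ascending order. Returns None when no
    matching trading day exists, i.e. the speech is dropped for that offset.
    """
    if offset == 0:
        # Same day: only usable if the market was open on the speech date.
        i = bisect_left(trading_days, speech_day)
        found = i < len(trading_days) and trading_days[i] == speech_day
        return speech_day if found else None
    if offset == -1:
        # Day before: last trading day strictly earlier than the speech.
        i = bisect_left(trading_days, speech_day)
        return trading_days[i - 1] if i > 0 else None
    if offset == 1:
        # Day after: first trading day strictly later than the speech.
        i = bisect_right(trading_days, speech_day)
        return trading_days[i] if i < len(trading_days) else None
    raise ValueError("offset must be -1, 0 or 1")


# Hypothetical calendar excerpt (13 and 14 March 1965 fell on a weekend).
calendar = [date(1965, 3, 11), date(1965, 3, 12), date(1965, 3, 15), date(1965, 3, 16)]
for speech in (date(1965, 3, 13), date(1965, 3, 15)):
    print(speech, [aligned_trading_day(speech, calendar, k) for k in (-1, 0, 1)])
```

Counting, for each offset, the speeches with a non-None match gives the number of observations available at that offset, which is why it can differ between the three cases.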

There are 41 speeches delivered on weekends (see Table 1). The difference between 376 and 327, i.e. 49 speeches (see Tables 4 and 5), consists of speeches delivered on non-trading days, which include weekday market holidays as well as weekends. For instance, President Barack Obama’s Second Inaugural Address on Monday, 21/01/2013 (Martin Luther King Jr. Day), was delivered on a day when the market was closed, although it was not a weekend.

Table 1 The second column indicates the number of speeches delivered on each day of the week from April 30, 1789, to February 28, 2017. The third column reflects the number from April 11, 1951, to February 28, 2017

4 Methodology

To investigate the interplay between the economic terminology in Presidential speeches and the S&P 500 index, we considered several scenarios, each pairing a market variable with a glossary variable. They are outlined below for ease of reference:

  (a) Normalised S&P 500 returns observed on the day before, the day of, and the day after the President’s speeches, each paired with the relative frequencies of the selected economic terms.

  (b) Normalised S&P 500 returns observed on the day before, the day of, and the day after the President’s speeches, each paired with the normalised absolute frequencies of the economic terms.

  (c) Normalised S&P 500 closing prices observed on the day before, the day of, and the day after the President’s speeches, each paired with the relative frequencies of the economic terms.

  (d) Normalised S&P 500 closing prices observed on the day before, the day of, and the day after the President’s speeches, each paired with the normalised absolute frequencies of the economic terms.

  (e) Normalised S&P 500 volumes observed on the day before, the day of, and the day after the President’s speeches, each paired with the relative frequencies of the economic terms.

  (f) Normalised S&P 500 volumes observed on the day before, the day of, and the day after the President’s speeches, each paired with the normalised absolute frequencies of the economic terms.

The enumerated cases (a) through (f) encapsulate six investigated relationships, combining two variables from the economic glossary (relative and absolute frequencies) with three market variables (closing prices, returns, and volumes).
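The design behind cases (a) to (f) is a full cross of three market variables with two frequency variables, each examined at three offsets. The Python sketch below enumerates it; the string labels are hypothetical identifiers, not names used by the authors.

```python
from itertools import product

MARKET_VARIABLES = ("returns", "closing_prices", "volumes")
FREQUENCY_VARIABLES = ("relative", "normalised_absolute")
OFFSETS = (-1, 0, 1)  # one day before, same day, one day after the speech

# Cases (a)-(f): every market variable paired with every frequency variable,
# in the order in which the cases are listed.
cases = dict(zip("abcdef", product(MARKET_VARIABLES, FREQUENCY_VARIABLES)))

# Each case is examined at three offsets: 6 x 3 = 18 pairings in total.
pairings = [(label, market, freq, k)
            for label, (market, freq) in cases.items()
            for k in OFFSETS]

print(len(pairings))  # 18
print(cases["c"])     # ('closing_prices', 'relative')
```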

Table 2 A synopsis of the variables considered in this study. The market variables analysed are the S&P 500’s normalised daily returns, closing prices, and volumes. From the speeches, we examine both the normalised absolute and the relative frequencies of the economic glossary terms, as described in Sect. 3

The analysis of the days preceding and following the speeches aims to discern whether the market reacts to the speeches (day after) or moves ahead of them, seemingly pre-empting the economic themes they address (day before).

We employed Kendall’s \(\tau \) rank-rank correlation for our correlation analysis. This indicator is computed for two jointly observed series of equal length, denoted \(\{(k_i,h_i)\}_{i = 1,\dots , N}\). The procedure begins by ranking the ks and the hs in ascending (or descending) order; the ranks are then paired according to the original joint observations. The Kendall rank correlation coefficient \(\tau \) is defined as:

$$\begin{aligned} \tau = \dfrac{(\text {number of concordant pairs}) -(\text {number of discordant pairs})}{N(N-1)/2} \end{aligned}$$
(2)

A pair of observations \((k_i,h_i)\) and \((k_j,h_j)\), with \(i<j\), is concordant if the orderings of k and h agree, that is, if

$$\begin{aligned} {\left\{ \begin{array}{ll} k_i< k_j \\ h_i < h_j \\ \end{array}\right. } \qquad \text {or, alternatively,} \qquad {\left\{ \begin{array}{ll} k_i> k_j \\ h_i > h_j \\ \end{array}\right. } \end{aligned}$$
(3)

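Pairs whose orderings disagree, i.e. \(k_i < k_j\) and \(h_i > h_j\) or vice versa, are classified as discordant; in the absence of ties, these are exactly the pairs that do not satisfy equation (3).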
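Eqs. (2) and (3) can be sketched in Python as below. Since the sign of a difference between two values equals the sign of the difference between their ranks, the pairwise comparison can be made on the raw values without an explicit ranking step. This O(N^2) version follows Eq. (2) literally and counts tied pairs as neither concordant nor discordant (an assumption where the text is silent); library routines such as `scipy.stats.kendalltau` default to the tau-b variant, which adjusts the denominator for ties and coincides with Eq. (2) when there are none.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(k: Sequence[float], h: Sequence[float]) -> float:
    """Kendall's tau as in Eq. (2): (concordant - discordant) / (N(N-1)/2)."""
    if len(k) != len(h):
        raise ValueError("the two series must have equal length")
    n = len(k)
    if n < 2:
        raise ValueError("at least two joint observations are needed")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign_product = (k[i] - k[j]) * (h[i] - h[j])
        if sign_product > 0:    # Eq. (3): both orderings agree
            concordant += 1
        elif sign_product < 0:  # orderings disagree
            discordant += 1
        # sign_product == 0: tie in k or in h, counted as neither
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau([1, 2, 3], [1, 3, 2]))  # 2 concordant, 1 discordant -> 1/3
```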

We chose Kendall’s rank-rank correlation because we want to measure the ordinal association between the appearance of economic content in the speeches (the frequency of economic locutions) and the considered financial variables. Namely, we look for a rank-level relation between the occurrences of economic terms in the talks and the S&P 500’s daily prices, returns and volumes. Working with ranks avoids the distortion that the very different magnitudes of these quantities would otherwise introduce into the correlation, so Kendall’s \(\tau \) is suitable for measuring the strength of association between such radically different variables. This device also helps to visualise the synchronicity between the occurrence of economic content in the speeches and the S&P 500 levels: Kendall’s \(\tau \) captures both the sign and the strength of the co-influence of the elements under analysis. Moreover, being based on ranks, the correlation does not require assumptions on the data’s empirical distributions (Mata & Fuerst, 1997).
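The scale argument can be checked numerically. The Python sketch below uses made-up frequencies and volumes and compares Kendall’s \(\tau \) with Pearson’s correlation (a measure the text does not use, included only for contrast) before and after a strictly increasing rescaling of the volumes: \(\tau \) is unchanged because the ranks are unchanged, whereas Pearson’s coefficient shifts with the magnitudes.

```python
import math
from itertools import combinations


def sign(x: float) -> int:
    return (x > 0) - (x < 0)


def kendall_tau(k, h):
    # Eq. (2): sum of sign products over all pairs, divided by N(N-1)/2.
    n = len(k)
    s = sum(sign(k[i] - k[j]) * sign(h[i] - h[j]) for i, j in combinations(range(n), 2))
    return s / (n * (n - 1) / 2)


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Made-up data: relative frequencies and volumes on five speech dates.
freq = [0.01, 0.03, 0.02, 0.05, 0.04]
volume = [1.0, 2.0, 4.0, 3.0, 1.5]
# A strictly increasing change of scale (different units plus skew).
rescaled = [1e6 * v ** 3 for v in volume]

print(kendall_tau(freq, volume), kendall_tau(freq, rescaled))  # identical
print(pearson(freq, volume), pearson(freq, rescaled))          # different
```

With these made-up numbers, Pearson’s coefficient even changes sign after rescaling, while \(\tau \) stays the same.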

The association between the frequencies of a set of words and another variable of interest can reveal whether emphasising a theme in a speech is synchronised with that variable. From this perspective, ranks are more informative than sizes: ranks are obtained by comparing the frequencies of words across the whole considered history, whereas sizes are absolute quantities. Working with ranks thus exploits the information conveyed by the speeches over the period, including possible imitative behaviours among Presidents, to gain insights into the performance of the S&P 500, the main US financial index. This offers immediate and intuitive information on the role of the economic content of Presidents’ speeches in their interaction with the financial markets, and explains why rank-rank correlation is often employed in text analysis (see, for example, Baron et al. 2009; Teevan et al. 2018). For an extensive review of Kendall’s \(\tau \) and similar measures, see Kruskal (1958).

We also explored the datasets using various distance measures between the normalised frequencies of economic glossary terms and the S&P 500 data, for all cases (a) to (f) listed above. These measures are:

$$ \begin{aligned}&d_{mx}\left(f, S \& P 500^{(k)}\right) = \max_{t} \left| f_t - S \& P 500_{t+k}\right| \end{aligned} \tag{4}$$

$$ \begin{aligned}&d_{mn}\left(f, S \& P 500^{(k)}\right) = \min_{t} \left| f_t - S \& P 500_{t+k}\right| \end{aligned} \tag{5}$$

$$ \begin{aligned}&d_{am}\left(f, S \& P 500^{(k)}\right) = \frac{1}{T^{(k)}} \sum_{t=1}^{T^{(k)}} \left| f_t - S \& P 500_{t+k}\right| \end{aligned} \tag{6}$$

$$ \begin{aligned}&d_{ec}\left(f, S \& P 500^{(k)}\right) = \sqrt{\sum_{t=1}^{T^{(k)}} \left(f_t - S \& P 500_{t+k}\right)^{2}} \end{aligned} \tag{7}$$

where \(k=-1,0,1\); \(S \& P 500^{(k)}\) denotes the normalised S&P 500 data (volume, closing price or return) observed one day before, on the day of, and one day after the speech for \(k=-1\), \(0\) and \(1\), respectively. \(T^{(k)}\) denotes the total number of observations, which varies with the time selection. Finally, \(f_{t}\) is the summed (normalised absolute or relative) frequency of the economic terms in the speech delivered at time t.
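A minimal Python sketch of distances (4), (5), (6) and (7) follows. It assumes the two input lists are already aligned for a given k, so that element t of `f` is paired with \(S \& P 500_{t+k}\). It also assumes that “normalised” means min-max scaling to [0, 1]. The normalisation scheme is not restated in this section, so treat it as an assumption. The function names `min_max` and `distances` are hypothetical.

```python
from math import sqrt


def min_max(series):
    """Scale a series to [0, 1] (assumed normalisation; not confirmed by the text)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


def distances(f, sp):
    """Distances (4)-(7) between aligned series f_t and S&P500_{t+k}."""
    if len(f) != len(sp) or not f:
        raise ValueError("series must be non-empty and of equal length")
    dev = [abs(a - b) for a, b in zip(f, sp)]  # |f_t - S&P500_{t+k}|
    return {
        "d_mx": max(dev),                       # (4) largest deviation
        "d_mn": min(dev),                       # (5) smallest deviation
        "d_am": sum(dev) / len(dev),            # (6) mean deviation, T^(k) = len(dev)
        "d_ec": sqrt(sum(d * d for d in dev)),  # (7) Euclidean distance
    }


# Hypothetical raw inputs, already aligned for one value of k
freq_raw = [2, 6, 4]       # e.g. economic-term counts per speech
volume_raw = [10, 30, 50]  # e.g. S&P 500 volumes on the matched days
print(distances(min_max(freq_raw), min_max(volume_raw)))
# {'d_mx': 0.5, 'd_mn': 0.0, 'd_am': 0.3333333333333333, 'd_ec': 0.7071067811865476}
```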

Furthermore, we computed and compared the Shannon entropy of each data series to quantify the information it contains. To this end, the range of each series is divided into N intervals of equal width, and the entropy is defined as

$$\begin{aligned} H= - \sum _{j=1}^N p_j \log _2 p_j \end{aligned} \tag{8}$$

where \(p_j\) is the probability that an observation falls within class j. In our analysis, N is set to 320 on the basis of empirical considerations and trials. For further details, see Shannon (2001).
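Formula (8) with N equal-width classes can be sketched in Python as follows. Assigning the series maximum to the last class is an implementation choice, not something stated in the text. The function name `shannon_entropy` is hypothetical, and the default `n_bins=320` mirrors the value of N reported above.

```python
from collections import Counter
from math import log2


def shannon_entropy(series, n_bins=320):
    """Entropy (8) in bits, using n_bins equal-width classes over the series' range."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return 0.0  # a constant series puts every observation in one class
    width = (hi - lo) / n_bins
    # Class index j for each observation; the maximum falls in the last class
    counts = Counter(min(int((x - lo) / width), n_bins - 1) for x in series)
    n = len(series)
    # Empty classes contribute p log p -> 0, so only occupied classes are summed
    return -sum((c / n) * log2(c / n) for c in counts.values())


print(shannon_entropy([0, 1, 2, 3], n_bins=4))  # 2.0: four equally likely classes
```

With N = 320, the entropy of any series is bounded above by \(\log_2 320 \approx 8.32\) bits, reached only when observations spread evenly over all classes.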

5 Results and discussion

In this section, we present and discuss the results of the analysis of three S&P 500 variables (normalised returns, closing prices and volumes) and two variables derived from the economic glossary (relative and normalised absolute frequencies). The analysis considers three distinct temporal instances relative to each speech: “one day before”, “contemporaneous” and “one day after”. A synopsis of the variables and instances considered is presented in Table 2.

Table 3 Statistical summary of the variables used for evaluating the relationship between the Presidents’ speeches and the S&P 500 observations on the same days the talks are delivered. * The inverse of the coefficient of variation is reported; see Kalkur T. and Rao (2017) for a methodological proposal for estimating both the coefficient of variation and its inverse
Table 4 Statistical summary of the variables used for evaluating the relationship between the Presidents’ speeches and the S&P 500 observations on the day before the talks’ delivery dates
Table 5 Statistical summary of the variables used for evaluating the relationship between the Presidents’ speeches and the S&P 500 observations on the day after the talks’ delivery dates

Tables 3, 4 and 5 offer detailed statistical summaries. They compare the relative and normalised absolute frequencies of economic terms with the normalised S&P 500 variables for each time frame: contemporaneous with, one day before, and one day after the speeches.
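The tables themselves are not reproduced here. As a hedged illustration of the kind of summary they report, a Python sketch follows. The statistics shown are means, medians, variances, skewness, kurtosis and the inverse coefficient of variation flagged in Table 3. The moment-based definitions (population variance, non-excess kurtosis) are assumptions, since the exact estimators are not stated in this section. The function name `summary` is hypothetical.

```python
from math import sqrt
from statistics import mean, median


def summary(series):
    """Moment-based summary statistics (estimator choices are assumptions)."""
    n = len(series)
    m = mean(series)
    dev = [x - m for x in series]
    m2 = sum(d ** 2 for d in dev) / n  # population variance
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    sd = sqrt(m2)
    return {
        "mean": m,
        "median": median(series),
        "variance": m2,
        "skewness": m3 / sd ** 3 if sd else float("nan"),  # > 0 for right tails
        "kurtosis": m4 / m2 ** 2 if m2 else float("nan"),  # 3 for a normal law
        "inv_cv": m / sd if sd else float("nan"),  # inverse coefficient of variation
    }


# A right-skewed toy series, like the word-frequency histograms
print(summary([1, 1, 1, 10]))
```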

We have also computed Kendall’s \(\tau \) coefficient to evaluate the rank-rank correlation for pairs (a), (b), (c), (d), (e) and (f). The results are reported in Table 6.

To complement this analysis, we have calculated the distances defined by formulas (4), (5), (6) and (7); the results are reported in Table 7. Lastly, Table 8 reports the Shannon entropies defined by formula (8), which describe the informational content of the data.

Figure 2 presents the histograms of the relative frequencies of economic terms. They are asymmetric with positive skewness, indicating right-tailed distributions (see also Tables 3, 4 and 5), and their kurtosis values indicate leptokurtic behaviour. Outliers are clearly present: for example, in some speeches more than 7.5% of the terms relate to economics. Figures 2 and 3 display the frequencies of economic terms aligned with the S&P 500 data according to the specified instances. That is, we analyse the economic content of speeches in relation to financial data recorded one day before, on the same day as, and one day after the speeches. The differences among these histograms result primarily from the exclusion of speeches for which corresponding financial data were unavailable, which precludes proper alignment (see Sect. 4 for more details). Together with Tables 1, 3, 4 and 5, these histograms provide insights into the empirical distribution of the observed variables and into how it varies across instances.
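The alignment and exclusion step described above might be sketched in Python as follows. The sketch pairs each speech with the S&P 500 observation k days away and drops speeches that lack one. It assumes k counts calendar days, which this section does not state; the study may instead use trading days. Under that assumption, a Monday speech with k = -1 would look for a Sunday observation and be dropped. The function `align`, the speeches and the values are hypothetical toy inputs, not data from the study.

```python
from datetime import date, timedelta


def align(speech_freq, sp500, k):
    """Pair f_t with S&P500_{t+k}; drop speeches with no observation on day t+k.

    speech_freq: {date: economic-term frequency}
    sp500:       {date: S&P 500 value (price, return or volume)}
    k:           -1, 0 or 1, interpreted here as calendar days (an assumption)
    """
    f, s = [], []
    for day in sorted(speech_freq):
        target = day + timedelta(days=k)
        if target in sp500:  # no observation (e.g. market closed): speech excluded
            f.append(speech_freq[day])
            s.append(sp500[target])
    return f, s  # len(f) plays the role of T^(k), which depends on k


# Hypothetical speeches; 26 Feb 2017 was a Sunday
speeches = {date(2017, 2, 26): 0.012, date(2017, 2, 28): 0.031}
closes = {date(2017, 2, 27): 1.0, date(2017, 2, 28): 2.0, date(2017, 3, 1): 3.0}
print(align(speeches, closes, 0))  # ([0.031], [2.0]): the Sunday speech is dropped
print(align(speeches, closes, 1))  # ([0.012, 0.031], [1.0, 3.0])
```

The changing output length illustrates why the number of observations \(T^{(k)}\) differs across the three instances.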

Figure 3 shows the histograms of normalised absolute frequencies. The skewness values in Tables 3, 4 and 5 reveal highly skewed distributions, and the kurtosis values are almost three times those of the previous case. This is due to the higher concentration of observations on the left side of the distributions and to the presence of outliers.

Table 6 Kendall’s \(\tau \) estimates. Each pair of columns reports results for a different sampling date of the S&P 500 observations

Figures 4 and 5 illustrate the empirical distributions of the normalised daily closing prices and volumes. These should be considered alongside Tables 3, 4, and 5. The data suggest that on the days preceding and following presidential speeches, prices tend to hover near the series’ minimum (i.e., the minimum before normalisation), except on the days of the speeches themselves, where prices close slightly further from their minimum. A similar trend is observed in trading volumes, with an increased number of transactions on the days of the speeches. In contrast, returns (as depicted in Fig. 6) tend to cluster around the average on the day before and on the day of the speeches.

Table 7 The distance measures described in Sect. 4; the first column identifies each measure
Table 8 Shannon entropy for each series. The columns distinguish the data selections made according to the selected dates. The third column refers to the S&P 500 observations recorded on the same days the Presidents delivered their talks. The second and the last columns refer to the S&P 500 observations recorded on the days before and after the talks, respectively

A visual examination of Figs. 4 and 5 reveals asymmetric distributions, with less pronounced right tails than in the previous cases. Kurtosis is more pronounced for normalised volumes: the tallest bin of their histograms contains the majority of values, clustered around zero. This concentration affects all the location indicators and yields variances smaller than those calculated for normalised prices. Outliers are also prominent in these cases.

Figure 6 shows a different behaviour. The normalised returns have more symmetric distributions, with skewness values very close to zero and means and medians almost coinciding. Their variances are tiny because the values are highly concentrated around the centres of the distributions. Visual inspection shows outliers in all the histograms, with a slightly fatter right tail on the day after the Presidents’ talks.

Fig. 2

Histograms illustrating the relative frequencies of economic terms, divided into sub-figures according to the dates selected. In the “contemporaneous” case, speeches without corresponding S&P 500 observations are omitted. For example, a speech delivered on a Sunday, when the financial market is closed, has no S&P 500 data. Such speeches are excluded from the analysis to ensure an accurate comparison with market activity

Fig. 3

Histograms depicting the normalised absolute frequencies of economic terms. Variations in the histograms arise from the different dates selected. In the “contemporaneous” scenario, speeches lacking corresponding S&P 500 observations are excluded. For instance, speeches delivered on Sundays are omitted, as there are no S&P 500 observations on those days due to market closure. This exclusion ensures that only speeches with relevant financial market data are considered in the analysis

Fig. 4

Histograms of the S&P 500 normalised daily closing prices

Fig. 5

Histograms of the S&P 500 normalised daily volumes

Fig. 6

Histograms of the S&P 500 normalised daily returns

Table 6 shows that the \(\tau \) estimates are not statistically significant when they involve the S&P 500 normalised returns, as in pairs (a) and (b). The same holds for the pairs of normalised closing prices and relative frequencies of economic terms, as in scenario (c).

Conversely, the other cases show high levels of statistical significance. In (d), there is a positive correlation between the prevalence of economic expressions and the normalised closing prices. The weakest positive rank correlation is observed on the day preceding the speeches, while the correlations on the day of the speech and the following day are marginally higher. In (f), the p-values are substantially lower than those in (d) and (e), indicating highly significant \(\tau \) estimates. For (e), the result on the day before the speeches is inconclusive, with a p-value of 0.051. However, the latter two columns of Table 6 show statistically significant positive correlations for this case. The largest positive \(\tau \) value, 10%, is observed between the normalised frequency of economic terms and the trading volume on the day after the speeches. For (f), the lowest correlation is 9.6%, with a minor increment in the correlation between the presence of economic words and the volume on the day before the speech.
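A \(\tau \) of only about 0.1 can be highly significant because significance also depends on the sample size. The section does not say which test produced the p-values in Table 6. As a rough sketch, the two-sided p-value below uses the standard large-sample normal approximation for \(\tau \) under independence and assumes no ties. The function name `kendall_p_value` is hypothetical.

```python
from math import erfc, sqrt


def kendall_p_value(tau, n):
    """Two-sided p-value for Kendall's tau via the normal approximation (no ties)."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    var = 2 * (2 * n + 5) / (9 * n * (n - 1))  # variance of tau under independence
    z = tau / sqrt(var)
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))


# The same tau becomes more significant as the number of speeches grows
for n in (100, 500, 2000):
    print(n, kendall_p_value(0.1, n))
```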

The positive rank-rank correlations between the use of economic locutions in the speeches and the S&P 500 range from a minimum of 8.5% in (d) to a maximum of 10% in (f). The minimum is observed when normalised closing prices are recorded a day before the speeches, and the maximum when S&P 500 volumes are recorded on the day after the speeches. These correlations corroborate the intuition that, given the considerable influence of both US Presidents and the stock market, their spheres of influence inevitably intersect. The estimated correlations thus suggest that the economic discourse of US Presidents both influences and is influenced by stock market dynamics. In this context, the positive correlations between trading volume and the mention of economic terms (cases (e) and (f)) may indicate that traders pay attention to the economic content of US Presidents’ speeches. The higher the incidence of economic terminology in a speech, the higher the trading volume around it tends to be.

This analysis partially explains, in terms of the economic substance of the Presidents’ addresses, the price variations observed from one day before to one day after the speeches. Notably, the correlations change over time, increasing on the days of the speeches (the “at the date” comparison) in scenario (d) and, partially, (c). On these occasions, the influence of Presidents’ speeches on prices may be attributed to several factors. These include rumours (since speeches are typically delivered after market close), the content of the messages, or simply the anticipation that the President will speak, signalled in advance by, for example, tweets.

On the day before a speech, the price appears to mirror traders’ tentative expectations about its economic content. For instance, when a speech is on the political agenda, market participants formulate trading strategies based on their assumptions about the information the President will deliver. Price adjustments then also result from changes in traders’ expectations after the speech. Both price and volume change as traders revise their views and strategies to incorporate the newly acquired information, as the increase in the correlations for the observed S&P 500 variables on the day after the speeches indicates. The correlations measured for volume, especially in case (f), further support this interpretation.

Let us now examine the distances reported in Table 7. Distances (4) and (5) identify the extremes of the range of deviations between series. The estimates from equation (4) are considerably large and homogeneous across both the columns and the pairs of comparisons. By contrast, equation (5) reveals notable disparities between the estimates, including some null values. Notably, pairs (a) and (b), which involve returns, display the highest minimum distances across the columns.

Distances (6) and (7) highlight two distinct facets. The former is a plain mean of the pointwise absolute deviations. The latter squares deviations that lie in the (0,1) range, which shrinks every such deviation, small ones proportionally more than large ones, so the resulting Euclidean distance never exceeds the sum of the absolute deviations. Thus, (6) provides a balanced measure, while (7) yields a more conservative aggregate of the differences between the series in question.
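For normalised series, distances (6) and (7) can be related more precisely. Write \(\delta_t\) for the pointwise absolute deviation, a symbol introduced here only for convenience. The Cauchy–Schwarz inequality, \(\sum_t \delta_t \le \sqrt{T^{(k)}}\,\sqrt{\sum_t \delta_t^2}\), gives the lower bound below. The fact that \(\delta_t^2 \le \delta_t\) on [0, 1] gives the upper bound:

$$ \begin{aligned}&\delta_t = \left| f_t - S \& P 500_{t+k}\right| \in [0,1] \;\Longrightarrow\; \sqrt{T^{(k)}}\, d_{am} \le d_{ec} \le \sqrt{T^{(k)}\, d_{am}} \end{aligned}$$

Unlike (6), distance (7) is not divided by \(T^{(k)}\). For a given mean deviation it grows at least like \(\sqrt{T^{(k)}}\), which is worth bearing in mind when comparing its values across columns with different numbers of observations.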

For the distances computed via equation (6), pairs (a), (c) and (d) show minimal variation across the columns, whereas the other pairs change more markedly. Excluding case (b), a consistent pattern emerges: the distances are notably greater in the contemporaneous case than when S&P 500 data are recorded one day before or after the speeches. Moreover, the distances recorded one day after exceed those recorded one day before. These differences are larger when volumes are included in the analysis, as in pairs (e) and (f). This suggests that the selected variables have different sensitivities to each other, which also vary with the chosen time window. Under the mean distance (6), the smallest distance across the columns of Table 7 is observed for the pair of relative frequencies of economic terms and normalised volumes. These results corroborate the Kendall’s \(\tau \) findings (see Table 6) and further emphasise the association between S&P 500 volume and the presence of economic terms in the speeches.

Distances calculated with formula (7) yield interesting findings for pairs (a), (b), (c), (d) and (f). Minimum values are observed for the “contemporaneous” observation dates (the “at the date” case), and the maximum occurs when S&P 500 data are taken from the day after the speeches. Values differ only slightly when the data are recorded a day before. This pattern may reflect traders adjusting their positions in anticipation of the speeches, thereby influencing both price and volume. The minimum Euclidean distance obtained for pair (e) reaffirms that S&P 500 volumes are closely aligned with the frequency of economic terms, in line with the \(\tau \) estimates. Notably, this pair follows a distinct trajectory across the time windows.

Under both formulas (6) and (7), the series farthest from the frequencies of economic locutions (normalised absolute and relative) is the normalised daily returns. This finding is consistent with the limited statistical significance of the \(\tau \) estimates involving returns. The distances measured with equation (5) reinforce this conclusion, as the pairs involving normalised returns show the largest departures from zero.

Table 8 reports the Shannon entropies calculated with formula (8). The series of relative frequencies of economic terms has the highest entropy, whereas the S&P 500 volume series has the lowest.

In terms of entropy, the relative frequency series of locutions is close to the returns, and the normalised absolute frequencies of locutions are close to the S&P 500 normalised closing prices. These observations indicate that the degrees of disorder of these series are similar pairwise, and the parallel holds one day before, on the day of, and one day after the speeches. Combined with the results from Kendall’s \(\tau \) and the distances, this suggests that the frequencies of economic locutions might serve as a proxy for the distributions and statistical characteristics of returns and closing prices, albeit with non-simultaneous behaviours. However, the entropy analysis does not support any assertion about deviations or convergences in the paths of returns, prices and frequencies of economic terms. The similarity in entropy is stronger for normalised absolute frequencies than for relative frequencies, particularly with respect to normalised closing prices. The largest divergences are found between the frequencies of locutions and the volumes. In the light of the earlier discussion of Kendall’s \(\tau \) and the distances, the volume series therefore show positive rank-rank correlations with the frequencies even though the empirical distributions of the two differ in shape. Consequently, while the frequencies of economic terms can serve as proxies for analysing the evolution of volumes, they are less suited to examining the statistical attributes of the corresponding empirical distributions.

6 Concluding remarks

The presence of economic terminology in US Presidents’ speeches shows a discernible association with the Standard and Poor’s 500 index, and the strength of this association depends on the S&P 500 variable considered. According to the evidence presented, S&P 500 volume is the variable most sensitive to the presence of economic expressions in these addresses. Initial indications of this are perceptible in Fig. 5, but it is Kendall’s \(\tau \) coefficient that more convincingly shows a positive and significant correlation. The distance measurements further corroborate a high level of synchronisation between these series. The Shannon entropy values, by contrast, show that the empirical distributions of volumes and of economic-term frequencies differ markedly, so the relationship concerns co-movement rather than the shape of the distributions.

For the S&P 500 closing prices, the \(\tau \) coefficient shows no statistically significant correlation with the relative frequencies of economic locutions. With normalised absolute frequencies, by contrast, Kendall’s \(\tau \) shows positive and significant correlations. The distances in these scenarios are higher than those obtained for volumes, while the entropy values of the closing price series are closer to those of the economic word frequency series. The mild differences in the shapes of the empirical distributions suggested by Fig. 4 intuitively support this observation. Thus, the interplay between closing prices and the economic content of Presidential speeches resembles that observed for volumes, albeit with lower intensity. The disorder of the distributions, as measured by Shannon’s entropy, indicates that the closing price series and the frequencies of economic terms share common characteristics; this similarity is markedly weaker for trading volumes.

In contrast, the S&P 500 daily returns do not exhibit significant co-movements with the economic locutions within the speeches, particularly when analysing Kendall’s \(\tau \). Various distance measurements confirm that the returns series consistently diverge from the series of glossary terms’ frequencies, which may partially explain the lack of statistical significance in the \(\tau \) estimations. However, when assessing Shannon’s entropy, the return series’ entropies are akin to those of the economic terms’ frequencies. This finding implies that, although an instantaneous relationship may be absent, a connection at the level of empirical distributions could exist.

In light of these findings, several practical implications emerge:

  • Presidents may deliberately emphasise economic themes in their speeches to stimulate market trading activity.

  • A President’s speech scheduled in the near future is associated with market movement, mainly when the speech is expected to have economic content. Announcing the economic content of a speech beforehand may therefore be a way to interact with the financial market.

  • The relationship between returns, prices and the economic content of speeches is limited (and positive). In terms of correlation, it is not significant for returns and is significant for prices only when economic content is measured by normalised absolute frequencies. Consequently, the economic content of Presidential speeches is not a reliable policy instrument for influencing daily index levels and changes.

These results indicate that the economic content of Presidential speeches is more closely intertwined, both as a driver and as a reflection, with the S&P 500’s daily volumes than with its prices and returns. The findings therefore point to an immediate communication-based strategy for interacting with some features of the market.

Furthermore, this study raises the question of whether our findings and methodology could be extended to other countries. We posit that the methodology could be applied in different national contexts, although evidence of its effectiveness in these settings is currently lacking. We anticipate that specific effects may vary across cultural, economic and political environments, but the overall approach should remain pertinent.

There is scope for a research agenda that explores the speeches of leaders from countries with substantial global or regional financial influence, such as the United Kingdom, Germany, or China. Such an expansion would enable comparative analysis and facilitate the establishment of theoretical foundations underpinning the interactions between political rhetoric focussed on economics and financial markets.