COVID-19 forecasts via stock market indicators

Liang, Yi; Unwin, James

doi:10.1038/s41598-022-15897-x


Article
Open access
Published: 01 August 2022

Volume 12, article number 13197, (2022)

Scientific Reports




Abstract

We propose that technical analysis tools developed to give buy/sell signals in asset trading can be applied to analyze time series datasets in the natural sciences, and we show this explicitly for a study of WHO COVID-19 data. Notably, reliable short-term forecasting can provide potentially lifesaving insights into logistical planning, and in particular, into the optimal allocation of resources such as hospital staff and equipment. By reinterpreting COVID-19 daily cases in terms of candlesticks, we are able to apply some of the most popular stock market technical indicators to obtain predictive power over the course of the pandemic. By providing a quantitative assessment of MACD, RSI, and candlestick analyses, we show their statistical significance in making predictions for both stock market data and WHO COVID-19 data. In particular, we show the utility of this novel approach by considering the identification of the beginnings of subsequent waves of the pandemic. Finally, our new methods are used to assess whether current health policies are impacting the growth in new COVID-19 cases.


Introduction

Logistical planning can be the difference between life and death during a pandemic, such as the ongoing COVID-19 crisis. Here we identify new techniques which can be applied during pandemics to assist in the optimal allocation of resources, and to aid in the evaluation of current health policies. Specifically, we repurpose a number of tools developed as stock market strategies and demonstrate that these techniques can be applied to predict future trends in the number of daily new cases of COVID. Notably, these tools can be used to

Identify the peak of a pandemic wave;
Forecast the start of new waves.

While fluctuations in the number of new COVID cases and the prices of stocks may naively seem disconnected, both systems can be described as non-stationary random walks, i.e. a time series which exhibits random fluctuations around a longer-term trend. In the context of stocks, the random walk hypothesis can be formulated with the daily rate of returns in the stock market randomly drawn from a Gaussian or Laplace distribution¹. On the other hand, the daily increase in COVID-19 cases can be modeled as a random walk due to the complex nature of human interactions, with an overall trend set by whether the infectious disease is in a spreading phase or a controlled phase. Drawing on this connection we hypothesized that strategies developed for predicting stock price movements can be repurposed to forecast changes in the trend of the number of new COVID cases, and more generally any system that is well described as a non-stationary random walk. We propose here that these techniques, collectively known as “technical analysis”, developed to give buy/sell signals in the stock market (in particular ‘momentum’ indicators) can be used to provide predictions for other time series data, such as COVID-19 forecasts. Specifically, we show that these technical indicators do indeed provide accurate predictions of the COVID-19 pandemic, in the sense that they are statistically significant.

Traders look to identify continuations or reversals in stock market trends to profit from short-term trades, and there are several well-established sets of techniques for forecasting future stock movements, known as “technical indicators”. Popular technical indicators include candlestick patterns², the Moving Average Convergence Divergence (MACD) indicator³, and the Relative Strength Index (RSI) indicator⁴. The robustness of these techniques has been debated⁵. In particular, the efficient market hypothesis states that if the market is efficient, then any profitable information from technical indicators would already be incorporated into prices, and thus it should be impossible to gain abnormal profits⁶. Nevertheless, it has been shown that several of these technical indicators are indeed statistically significant. Specifically, statistical tests have been conducted to examine the effectiveness and profitability of technical indicators in the stock markets of Taiwan, Thailand, USA, and Brazil^7,8,9,10,11. These studies have shown that some techniques do have predictive power, providing evidence for market inefficiency, while other indicators are not successful. For further recent studies of technical indicators applied to stock market returns the reader may also wish to compare with e.g.^12,13,14,15.

After introducing the notion of trends and candlestick representations for time series data, we provide mathematical definitions for the various technical indicators in “Technical indicators”. To emphasize the novelty of our techniques, we use the World Health Organization (WHO) COVID-19 data throughout the paper and highlight the application of these technical analysis techniques outside of the domain of the financial markets. By performing statistical tests on technical indicators in “Statistical methods”, we show that a selection of these indicators correctly predict reversals in existing trends at rates which are statistically significant. Our analysis improves on aspects of earlier analyses of stock market data and, moreover, extends the study to these indicators outside of stock market forecasts, as illustrated in “Predicting new cases of COVID-19” for COVID-19. Our results have the following important implications:

Technical indicators have predictive power beyond forecasting future asset prices.
These tools can be used to identify the beginnings of subsequent waves of a pandemic.
These tools provide new methods to assess whether current health policies are impacting the growth in new cases.

For completeness, “Application in stock markets” gives an analysis using stock market data. “Concluding remarks” gives some concluding remarks.

Technical indicators

Technical analysis aims to forecast future price movements in the stock market, and its indicators will be the key tools that we use to analyze COVID-19 data. In this section, we first introduce the notion of a trend (“Trends”) and candlestick representations of data (“Candlestick representations”), and then outline some of the main technical indicators:

Candlestick Patterns (“Candlestick patterns”)
RSI (“Relative strength index”)
MACD (“Moving average convergence divergence”)

In subsequent sections, we will investigate the predictive power of these technical indicators on COVID-19 data.

Trends

While stock prices and daily COVID cases may fluctuate on shorter time frames, they have an observed tendency to evolve in the same direction for extended periods of time. The cause for long term trends in the stock market may be linked to macroeconomic factors such as monetary policy, or in the case of individual companies trends may be due to particular news or sentiment which result in the continual increases or decreases of the stock prices. For COVID-19, the growth of new cases is due to the fact that the virus is very infectious and the population was (and remains) highly vulnerable, which led to a significant initial uptrend. As a result, there is typically a well defined notion of “trends” in time series such as these. A good approach to identify trends in a time series is through a simple regression procedure (as we will detail below) and Fig. 1 presents an example in terms of the daily change in asset prices.

A period for which the gradient of the regression line has a fixed sign indicates a trend in the time series, which can be either an uptrend or a downtrend. Identifying such trends can provide insight into the likely subsequent behavior of the time series. Technical analysis was originally developed to provide signals for the start and end of trends in stock market data, and here we apply these techniques to alternative time series data sets.

We introduce a time variable which is integer valued and in particular we focus on daily and weekly intervals, counting from some particular start date. A natural question to ask is whether for a given date D a particular evolution in the time series exhibits a preexisting trend over the last $\delta $ days. If such a trend exists we will say that the data exhibits a $\delta $-interval trend. Additionally:

A trend is called bullish if the gradient of the trend is strictly positive.
Conversely, a trend is said to be bearish if its gradient is strictly negative.

In what follows we shall often partition the time series data into daily and weekly intervals. Over the course of an interval the value of the time series will vary. Following the conventions of stock market analyses, we track a number of characteristic features of each time interval, in particular:

The initial value for a given time interval is called the opening value and is denoted $O_t$.
The final value for a given time interval is called the closing value and is denoted $C_t$.

The subscript t is the index of the time interval. Since the data is discretized, typically $C_t\ne O_{t+1}$. It is also useful to define an average value for a given time interval

$$\begin{aligned} M_t := \frac{O_t+C_t}{2}. \end{aligned}$$

(1)

To identify potential trends we apply a linear regression fitted to $M_t$ over the range of dates $D-\delta \le t \le D$, with residuals $\gamma _t$ of the regression defined by

$$\begin{aligned} \gamma _t := \mathrm{abs}[l(t)-M_t] , \end{aligned}$$

(2)

where l(t) is the value of the linear regression at time t. We then define a trend function $T(\cdot )$ which takes the set of $\{M_t\}$ as input and returns $+1$ for an uptrend and $-1$ for a downtrend, as follows

$$\begin{aligned} T(\cdot ):= {\left\{ \begin{array}{ll} ~1, &{} k \ge 0.005\cdot \mu , \qquad \gamma _t<0.02 \cdot \mu \\ -1, &{}k \le -0.005\cdot \mu , \quad \gamma _t<0.02 \cdot \mu \\ ~0, &{} \mathrm{otherwise} \end{array}\right. }, \end{aligned}$$

(3)

where k denotes the slope of the regression and the mean is given by $\mu = \mathrm{mean}(\{M_t\})$. The requirement on k corresponds to an increase or decrease per interval of at least half a percent of the mean. The restriction on $\gamma _t$ evaluates the goodness of the linear regression fit, requiring that each $M_t$ lie within $2\%$ of the mean from the trend line. The function returns zero if either requirement fails, indicating that there is no clear trend.
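As an illustration of Eqs. (2) and (3), the following is a minimal Python sketch of the trend function, not the authors' R implementation. The function name `trend` and the parameter names `slope_frac` and `resid_frac` are hypothetical, the time index is assumed to run over 0, 1, 2, ... within the window, and the mean of the midpoints is assumed to be positive.

```python
from statistics import mean


def trend(m, slope_frac=0.005, resid_frac=0.02):
    """Return +1 (uptrend), -1 (downtrend) or 0 (no clear trend) for midpoints m.

    m is the list of M_t values in the window D - delta <= t <= D (at least two).
    """
    t = list(range(len(m)))
    t_bar, mu = mean(t), mean(m)
    # Ordinary least-squares slope k and intercept of the regression line l(t).
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    k = sum((ti - t_bar) * (mi - mu) for ti, mi in zip(t, m)) / sxx
    intercept = mu - k * t_bar
    # Residuals gamma_t = |l(t) - M_t|; each must be below resid_frac * mu.
    gamma = [abs(intercept + k * ti - mi) for ti, mi in zip(t, m)]
    if max(gamma) >= resid_frac * mu:
        return 0
    if k >= slope_frac * mu:
        return 1
    if k <= -slope_frac * mu:
        return -1
    return 0
```

For example, `trend([100, 101, 102, 103, 104])` gives 1, since the slope of 1 exceeds 0.5% of the mean and every point lies on the fitted line.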

Candlestick representations

While price movements in the stock market can be represented as a continuous curve that is smoothed by time averaging over some period (be it seconds, minutes, hours, or days), candlesticks were proposed as a tool to better visualize the movements. Candlesticks provide a summary of prices using four numbers - open, close, high, low - in a given period. In addition to the opening $O_t$ and closing $C_t$ values defined above, we now introduce:

The highest value $H_t$ in a given interval is the high;
The lowest value $L_t$ in a given interval is the low.

Typical lengths of the periods that candlesticks describe are one day, an hour, 30 minutes, and 5 minutes. Specifically, given a time series over a certain period, a candlestick ${\mathcal {I}}_t$ for the interval t is defined by the quadruple

$$\begin{aligned} {\mathcal {I}}_t=(O_t, C_t, H_t, L_t). \end{aligned}$$

(4)

Taking the period to be a single day, this implies that $C_t-O_t$ is the change in value over the day. For $O_t>C_t$ this indicates a decrease in the value of the time series during the day, while $C_t>O_t$ implies an increase. Moreover, $C_{t+6}-O_t$ is the change in value over the week beginning on day t.

A visualization of how a single candlestick is constructed from the data in the intervening period is shown in Fig. 2. Following common practice, we color the candlesticks red if $O_t>C_t$ and green if $C_t>O_t$, so that the color indicates whether the value decreased or increased over the period of the candlestick.

Each candlestick comprises three parts: the real body, and its lower and upper shadows. The real body $r_t$ at time t is the absolute difference between the opening value and the closing value

$$\begin{aligned} \begin{aligned} r_t = \text {abs}(O_t-C_t). \end{aligned} \end{aligned}$$

(5)

This is represented as the central solid rectangle in the visualization of Fig. 2. The lower shadows $l_t$ and upper shadows $u_t$ at time t are defined by

$$\begin{aligned} l_t= & {} \min (O_t, C_t)- L_t, \end{aligned}$$

(6)

$$\begin{aligned} u_t= & {} H_t- \max (O_t, C_t). \end{aligned}$$

(7)

These are represented as the thin lines which extend above and below the real body in the visualization of Fig. 2. Note that in some cases the shadows may have vanishing extent, for instance the lower shadow vanishes for $O_t=L_t$ with $C_t> O_t$.
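The quantities in Eqs. (4) to (7) can be computed directly from the quadruple. The following Python sketch is illustrative only; the names `Candle`, `candle_parts` and `candle_color` are hypothetical, and the "neutral" label for $O_t=C_t$ is an assumption, since the text only defines red and green.

```python
from typing import NamedTuple


class Candle(NamedTuple):
    """Candlestick quadruple (O_t, C_t, H_t, L_t) of Eq. (4)."""
    open: float
    close: float
    high: float
    low: float


def candle_parts(c):
    """Return (real body, lower shadow, upper shadow) following Eqs. (5)-(7)."""
    real_body = abs(c.open - c.close)
    lower_shadow = min(c.open, c.close) - c.low
    upper_shadow = c.high - max(c.open, c.close)
    return real_body, lower_shadow, upper_shadow


def candle_color(c):
    """Green if the value rose over the interval, red if it fell."""
    if c.close > c.open:
        return "green"
    if c.open > c.close:
        return "red"
    return "neutral"  # assumption: not specified in the text
```

For the candle `Candle(open=10, close=14, high=15, low=10)` the real body is 4, the lower shadow vanishes, and the upper shadow is 1.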

In the context of the stock market, asset traders often choose to utilize these discrete candlesticks to visualize the data, as this representation provides substantially more information than simpler line graphs of stock prices. Traders have developed a number of visual cues based on this candlestick representation, known as candlestick patterns, which are thought to forecast future asset price moves, as we discuss next.

Candlestick patterns

Candlestick patterns typically involve the relative magnitude of the high, low, open, and close values of one or two consecutive candlesticks. There is a widespread use of these patterns within the trading community, with the belief that specific configurations of candlesticks can be used to forecast future price movements²:

If a pattern predicts an uptrend will reverse to a downtrend, it is called a bearish reversal pattern.
Conversely, a bullish reversal pattern predicts a reversal of a downwards trend to an uptrend.

In this work we focus our analysis on three bearish reversal patterns (Bearish Engulfing, Hanging Man, and Dark Cloud Over) and two bullish reversal patterns (Bullish Engulfing and Hammer). For the mathematical definition of the candlestick patterns we use the definitions proposed in⁷, which place restrictions on the $O_t, C_t,L_t, H_t$ values and require a pre-existing trend. These patterns are shown graphically in Figs. 3 and 4 and are defined mathematically in Fig. 5.

The indices appearing in the definitions of Fig. 5 denote the time ordering such that the first candle of each pattern occurs with time stamp $D=0$, with the second candle (if any) for $D=1$. We require a trend for the preceding $\delta $ intervals, such that there is an appropriate trend over the period $D-\delta \le t \le D$ as outlined in “Trends”.

Using R, we implemented code which takes time-series data, outputs candlestick representations, and then scans the output for specific patterns. Some example candlestick patterns identified by our code when applied to the S&P 500 Index (GSPC) daily data are presented in Fig. 4. We show the signal event which indicates the candlestick pattern in a lighter shade. The regression line for the center of the four candlesticks preceding the candlestick patterns is shown to confirm the trend (note that we vary the required trend period in later sections).

Moving average convergence divergence

The Moving Average Convergence Divergence (MACD)³ provides an alternative set of bullish/bearish market signals which can be repurposed for general time series data. The MACD is calculated using two exponential moving averages (EMAs), calculated over two periods of differing length n. Specifically, for a given dataset of length n, usually the closing values $\{C_1,C_2, \ldots C_{n}\}$, the EMA $V_n$ is calculated recursively via

$$\begin{aligned} \begin{aligned} V_i [C_i]:= {\left\{ \begin{array}{ll} C_1 &{}i =1\\ s C_i + (1-s) V_{i-1} &{} i>1 \end{array}\right. }, \end{aligned} \end{aligned}$$

(8)

where $s = \frac{2}{n+1}$ is the smoothing factor. Thus, $V_n$ can be seen as the exponential average over n intervals, which by repeated substitution in the recursive formula can be expressed as

$$\begin{aligned} \begin{aligned} V_n = s[C_n + (1-s)C_{n-1} + \ldots + (1-s)^{n-2}C_2] + (1-s)^{n-1}C_1. \end{aligned} \end{aligned}$$

(9)

Observe that the coefficient of each term decreases exponentially for earlier values in the time series, thus giving greater weighting to more recent data, hence the name. Given the EMA, the MACD is defined as the difference between a shorter-period average (over $n_1$ intervals) and a longer-period average (over $n_2$ intervals, thus by convention $n_1<n_2$), as follows

$$\begin{aligned} \begin{aligned} \text {MACD}(n_1,n_2)= V_{n_1}- V_{n_2}. \end{aligned} \end{aligned}$$

(10)

A common choice for $(n_1, n_2)$ is (12, 26), which corresponds to the number of trading days in roughly two weeks and a month. The MACD is then interpreted as follows (Fig. 6):

When the MACD has large positive values, it indicates that the values have risen more in the recent $n_1$ observations when compared with the last $n_2$ observations, signifying a strong uptrend.
Conversely, when MACD is negative, the price has fallen more in the last $n_1$ observations, signifying a recent downtrend.

MACD analysis provides signals based on the “momentum” of the time series. To identify buy and sell signals, the MACD is compared to the so-called Signal Line S, defined by

$$\begin{aligned} \begin{aligned} S = V_{n_3}[\text {MACD}(n_1,n_2)]. \end{aligned} \end{aligned}$$

(11)

A common value for $n_3$ is 9, signifying a week and a half trading period.

There are many ways to use the signal line. In this paper we will focus on crossovers between the MACD and S, illustrated in Fig. 6, which are described below:

When MACD crosses from below to above the signal line, it serves as a bullish signal because the crossing signifies a strong uptrend in MACD, meaning the short-term momentum has risen faster than the long term momentum.
Conversely, when MACD crosses from above to below the signal line, it serves as a bearish signal forecasting a downturn in values.
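The MACD, signal line and crossover rules of Eqs. (8), (10) and (11) can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used in the paper: the function names `ema` and `macd_signals` are hypothetical, and the EMA is computed in running form over the whole series (seeded with the first value), which is a common convention assumed here.

```python
def ema(values, n):
    """Running exponential moving average, Eq. (8), with s = 2 / (n + 1)."""
    s = 2.0 / (n + 1)
    out = [values[0]]  # V_1 = C_1
    for x in values[1:]:
        out.append(s * x + (1 - s) * out[-1])
    return out


def macd_signals(closes, n1=12, n2=26, n3=9):
    """Return (macd, signal, events); events are (index, 'bullish' or 'bearish')."""
    fast, slow = ema(closes, n1), ema(closes, n2)
    macd = [f - g for f, g in zip(fast, slow)]  # Eq. (10)
    signal = ema(macd, n3)                      # Eq. (11)
    events = []
    for i in range(1, len(macd)):
        prev = macd[i - 1] - signal[i - 1]
        curr = macd[i] - signal[i]
        if prev <= 0 < curr:      # MACD crosses from below to above the signal line
            events.append((i, "bullish"))
        elif prev >= 0 > curr:    # MACD crosses from above to below the signal line
            events.append((i, "bearish"))
    return macd, signal, events
```

Because both series start from the same seed value, the first few entries are dominated by the initialization; in practice one might discard crossovers from an initial warm-up window of roughly $n_2$ intervals.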

Relative strength index

The Relative Strength Index (RSI) quantifies the momentum in the time series data through the average rates of increase and decrease in value (see Fig. 7). The indicator is constructed by dividing the closing values $\{C_t\}$ over some period into two sets:

The set $\{G_t\}$ in which the series increased:
$$\begin{aligned} \{G_t\} := \frac{C_t-C_{t-1}}{C_t}, \qquad C_t>C_{t-1}. \end{aligned}$$
(12)
The set $\{D_t\}$ in which the series decreased:
$$\begin{aligned} \{D_t\}:= \frac{C_{t-1}-C_t}{C_t}, \qquad C_t<C_{t-1}. \end{aligned}$$
(13)

From the above sets one can compute the averages ${\bar{G}}_t$ and ${\bar{D}}_t $ using the EMA $V_n$ over n periods with a smoothing factor $s = \frac{1}{n}$, leading to

$$\begin{aligned} \begin{aligned} {\bar{G}}_t = V_n[G_t]~~~\mathrm{and}~ ~~{\bar{D}}_t = V_n[D_t]. \end{aligned} \end{aligned}$$

(14)

Then, the RSI$_t$ indicator at time t is defined as follows

$$\begin{aligned} \begin{aligned} \text {RSI}_t := 100- \frac{100}{1+\frac{{\bar{G}}_t}{{\bar{D}}_t}}. \end{aligned} \end{aligned}$$

(15)

In the stock market the RSI is used to signal when an asset has become overbought (meaning it has appreciated more rapidly than is thought to be typically sustainable) or oversold. In particular, a high RSI is thought to indicate that one should anticipate a reversal from an uptrend to a downtrend in the near-term. A low RSI is interpreted by market traders as an asset being oversold, and predicts a near-term increase in prices. We set the thresholds for high and low RSI$_t$ to be 75 and 25. When the RSI reaches 25 it serves as a bullish signal and conversely, when the RSI reaches 75 this gives a bearish signal. Figure 7 gives two examples of accurate RSI signals.
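A minimal Python sketch of Eqs. (12) to (15) and the 25/75 thresholds is given below, under several stated assumptions. The names `rsi` and `rsi_signal` are hypothetical; the default `n=14` is a common market convention and not a value taken from the text; an interval without a gain (loss) contributes zero to the gain (loss) series; and the values returned when there are no losses (100, or 50 for a flat series) are conventions chosen here to avoid division by zero. Closing values are assumed to be nonzero.

```python
def rsi(closes, n=14):
    """RSI series following Eqs. (12)-(15); entry i corresponds to closes[i + 1]."""
    s = 1.0 / n  # smoothing factor used for the RSI averages
    avg_gain = avg_loss = None
    out = []
    for prev, curr in zip(closes, closes[1:]):
        change = (curr - prev) / curr  # relative change, normalised by C_t as in the text
        gain, loss = max(change, 0.0), max(-change, 0.0)
        if avg_gain is None:           # seed the EMA with the first value, as in Eq. (8)
            avg_gain, avg_loss = gain, loss
        else:
            avg_gain = s * gain + (1 - s) * avg_gain
            avg_loss = s * loss + (1 - s) * avg_loss
        if avg_loss == 0:
            out.append(100.0 if avg_gain > 0 else 50.0)  # assumed convention
        else:
            out.append(100.0 - 100.0 / (1.0 + avg_gain / avg_loss))
    return out


def rsi_signal(value, low=25.0, high=75.0):
    """Map an RSI value to 'bullish', 'bearish' or None using the 25/75 thresholds."""
    if value <= low:
        return "bullish"
    if value >= high:
        return "bearish"
    return None
```

A steadily rising series saturates the indicator at 100 (a bearish reading), while a steadily falling series drives it to 0 (a bullish reading).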

Statistical methods

Following a standard hypothesis testing protocol, a given procedure tests a null hypothesis $H_0$ against an alternative hypothesis $H_1$. The testing framework then either rejects, or fails to reject, the null hypothesis. Specifically, to test whether or not these technical indicators can correctly predict trends, we formulate a testing procedure using the Wilcoxon Signed-Rank Test.

Wilcoxon signed-rank test

The Wilcoxon test¹⁶ is a nonparametric test for the median of a distribution, as we outline below. Previous studies by Goo et al.⁷ utilized the t test as a possible way to confirm the predictive powers of candlestick patterns. However, the t test is a parametric test, meaning that it assumes the normality of the observations. The t test examines the mean of the given sample, which makes it reliable only for normally distributed samples. Notably, when⁷ was published (2007), the rate of return in stock markets was commonly believed to be normally distributed; however, more recent studies have suggested that Laplace distributions provide a better fit to the daily returns of stock markets¹⁷. Given that the n day return cannot be assumed to be normally distributed, a non-parametric test for the median, i.e. the 50th percentile of the distribution, is preferable.

Specifically, here we employ a One-Sample Wilcoxon Test to test for a hypothesized median. Suppose that at time t one observes a signal from one of the technical indicators outlined in the previous section, i.e. a candlestick pattern, a high/low RSI$_t$ reading, or a crossover in the MACD. We record the close on the day of the signal, denoted $C_{t}$, together with the closes for the n days following the signal, and form a vector of values $\mathbf {C} = \{C_{t+1}, C_{t+2}, \ldots , C_{t+n}\} $. From this we can calculate the rate of return $R_i$ for i days after the observation of the signal event as follows

$$\begin{aligned} R_i = \frac{C_{i+t}- C_{t}}{C_{t}}. \end{aligned}$$

(16)

Thus we can also define a vector of rates of return following the signal event: $\mathbf {R} = \{R_1, R_2, \ldots , R_n\}$.
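Eq. (16) amounts to a one-line computation. The helper below is a hypothetical Python sketch (the name `returns_after_signal` is not from the paper); it assumes the close at the signal index is nonzero and silently truncates the vector if the series ends before n intervals have elapsed.

```python
def returns_after_signal(closes, t, n):
    """Rates of return R_1..R_n relative to the close at signal index t, Eq. (16)."""
    base = closes[t]
    return [
        (closes[t + i] - base) / base
        for i in range(1, n + 1)
        if t + i < len(closes)  # drop horizons that run past the end of the data
    ]
```

For instance, with closes `[100, 110, 90, 120]` and a signal at index 0, the three-interval return vector is `[0.1, -0.1, 0.2]`.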

To proceed, we take the n-day rate of returns vector $\mathbf {R} = \{R_1, R_2, \ldots , R_n\}$ and denote the median of the set by ${\tilde{R}}$. Then we define $\{d_1, d_2, \ldots , d_n\}$ to be the difference of each $R_t$ from the median ${\tilde{R}}$ such that

$$\begin{aligned} d_t = (R_t-{\tilde{R}}). \end{aligned}$$

(17)

The null hypothesis $H_0$ and alternative hypothesis $H_1$ for the pooled sample of the rate of returns on a given day $\mathbf {R}$ can be formulated as follows. Hypothesis $H_1$ holds that at the occurrence of a bullish (bearish) signal event there should be a positive (negative) rate of return in the near-term future, i.e. ${\tilde{R}} > 0$ (conversely, ${\tilde{R}} < 0$). The null hypothesis $H_0$ holds that the rate of return should be uncorrelated with the signal events, implying that the indicator under examination fails to provide accurate forecasts.

To implement the one-sample Wilcoxon test we assign to each $d_t$ a sequential integer $R(d_t)\in {\mathbb {Z}}$ (a rank), assigning $R=1$ to the $d_t$ with the smallest absolute value, $R=2$ for the next smallest absolute value, and so forth, such that the rank of the $d_t$ with the largest absolute value is $R=n$. Then we define $W_1$ and $W_2$ as follows

$$\begin{aligned} \begin{aligned} W_1 = \sum _{d_t>0} R(d_t),~~~\mathrm{and}~~~ W_2 = \sum _{d_t<0} R(d_t). \end{aligned} \end{aligned}$$

(18)

The Wilcoxon test statistic $W'$ is then defined to be

$$\begin{aligned} W'= W_1-W_2. \end{aligned}$$

(19)

Note that the statistic $W'$ is essentially the difference between the summed ranks of the observations above the hypothesized median ($W_1$) and the summed ranks of the observations below the hypothesized median ($W_2$). It is a robust way to test a median since, if the true median is the hypothesized median, the distribution of samples should be symmetric about the median. Thus, when we rank the differences of the samples from the median, about half should be positive and about half should be negative, and the sums of their ranks should cancel.

The distribution of expected outcomes assuming $H_0$ is true, the null distribution, is centered on ${\tilde{R}} = 0$. In contrast, suppose that $H_1$ is true, then for a bullish reversal ${\tilde{R}} >0$ and there should be fewer observations below 0 (the null hypothesis median). In this case the true distribution is shifted toward the new median ${\tilde{R}} >0$ and, as a result, the test statistic $W'$ would tend to be greater.

If the true median is sufficiently different from zero, we reject the claim that its median is 0. This statement can be reformulated in terms of a random variable following the null distribution W, such that for bullish reversals the null hypothesis $H_0$ is rejected for

$$\begin{aligned} \begin{aligned} P(W> W')=\int _{W'}^\infty p(t)~\mathrm{d}t \le \alpha , \end{aligned} \end{aligned}$$

(20)

where p(t) is the Wilcoxon distribution and $\alpha $ is a constant threshold which signifies the significance level; commonly $\alpha $ is taken to be 0.05 or 0.10. Conversely, for bearish reversals $H_0$ is rejected for $P(W< W') \le \alpha $.
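To make Eqs. (17) to (20) concrete, here is a small pure-Python sketch of the signed-rank statistic and an exact one-sided p value, intended as an illustration rather than a reproduction of the authors' R code. The function names are hypothetical. Two conventions are assumptions not stated in the text: differences equal to zero are dropped, and tied absolute values receive average ranks. The exact p value enumerates the null distribution of $W_1$ (equivalent to that of $W'$, since $W' = 2W_1 - n(n+1)/2$) and is only valid for n untied, nonzero differences.

```python
import math


def wilcoxon_statistic(returns, hypothesized_median=0.0):
    """Return (W1, W2, W') of Eqs. (18)-(19) for differences from the hypothesized median."""
    d = [r - hypothesized_median for r in returns if r != hypothesized_median]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average rank shared by a group of ties
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w1 = sum(r for r, x in zip(ranks, d) if x > 0)
    w2 = sum(r for r, x in zip(ranks, d) if x < 0)
    return w1, w2, w1 - w2


def exact_upper_p(w1, n):
    """P(W1 >= w1) under H0, where each of the ranks 1..n is positive with probability 1/2."""
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum  # counts[v]: number of rank subsets whose ranks sum to v
    for rank in range(1, n + 1):
        for v in range(max_sum, rank - 1, -1):
            counts[v] += counts[v - rank]
    return sum(counts[math.ceil(w1):]) / 2 ** n
```

As a worked example, the returns `[0.1, 0.2, -0.05, 0.3, 0.15]` give $W_1=14$, $W_2=1$ and $W'=13$. Of the 32 equally likely sign assignments only two have $W_1 \ge 14$, so the one-sided p value is 2/32 = 0.0625, which is below $\alpha = 0.10$ but not below $\alpha = 0.05$.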

Calculating p values

Given the above we can compute the p value from Eq. (20) to quantify the statistical significance of positive correlations. The p value is the probability that the obtained statistic (or a more extreme statistic) occurs given that the null hypothesis is true. A low p value, below the significance level $\alpha $, indicates that the observed statistic would be unlikely if the null hypothesis were true. Typical values for $\alpha $ are 0.05 and 0.10; we will adopt the latter value going forward. For an observed p value p such that $p\le \alpha =0.1$, we reject the null hypothesis in favor of the alternative hypothesis. In our testing procedure, a rejection of the null hypothesis would indicate that a given technical indicator has predictive power.

We wish to quantify whether a signal event identified by one of the technical indicators discussed above makes predictions which are statistically significant. Specifically, the signal events under consideration are

Occurrences of candlestick patterns;
MACD crossover events;
RSI values of 25 or 75.

Our R code scans the data for such signal events and then records each occurrence along with the $O_t,C_t,H_t,L_t$ values for a range of days or weeks around each occurrence. We use the first $\delta $ time intervals prior to the signal to establish the initial trend; the value of $\delta $ is stated in the analysis. We then take the $\Delta t$ time intervals after the signal events to assess whether the signal correctly predicted the future evolution of the time series.

As an initial approach towards quantifying the statistical significance of the predictions we took the set of signals and calculated the p value multiple times for different choices of $\Delta t$. This analysis is intuitive and insightful (and we present results arising from this in the Supplementary Material), however, there are a number of critical issues:

It gives multiple p values for each indicator (which can be conflicting).
The p values for each $\Delta t$ are not independent.
For 9 indicators and 10 choices of $\Delta t$ one calculates 90 p values, thus one anticipates false positives.

Hence, below we outline a more sophisticated analysis. We start with the record of all occurrences of signals for a given indicator identified by our code. Then, to calculate a global p value for a given indicator, we randomly subdivide the data corresponding to these occurrences into n subsets. Given that each signal typically occurs ${\mathcal {O}}(100)$ times in the data series (cf. Table 1), we will use $n=3$ subsets. We then calculate a p value for each subset, using a different $\Delta t$ for each.

Notably, these subsets will be independent of each other, allowing us to calculate independent p values for each subset. Since these p values are independent we can then use the standard Fisher method¹⁸ to combine the three p values $p_i$ (with $i=1,2,3$) into the following statistic

$$\begin{aligned} \begin{aligned} X=-2\mathrm{log}(p_1)-2\mathrm{log}(p_2)-2\mathrm{log}(p_3). \end{aligned} \end{aligned}$$

(21)

The statistic X has a chi-squared distribution with 6 degrees of freedom (more generally 2n for n subdivisions of the dataset). To obtain the global p value for each indicator we then calculate the area under the chi-squared curve (with 6 degrees of freedom) which lies to the right of the value of X. Following this procedure, we report our findings for the case of new COVID infections in Table 2 and in the context of the stock market in Table 4.
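The random subdivision and the Fisher combination of Eq. (21) can be sketched as follows. This Python fragment is illustrative; the names `random_subsets` and `fisher_combine` and the fixed `seed` are hypothetical choices. The tail probability uses the closed form available for a chi-squared distribution with an even number 2n of degrees of freedom, $P(X > x) = e^{-x/2}\sum_{k=0}^{n-1} (x/2)^k/k!$, so no statistics library is needed.

```python
import math
import random


def random_subsets(items, n=3, seed=0):
    """Randomly partition the signal occurrences into n disjoint subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n] for i in range(n)]


def fisher_combine(p_values):
    """Return (X, global p value) for independent p values, following Eq. (21)."""
    n = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    half = x / 2.0
    # Upper tail of the chi-squared distribution with 2n degrees of freedom.
    tail = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(n))
    return x, tail
```

As a rough check, three subset p values of 0.1 each combine to a global p value of about 0.032, so three individually marginal results can together be significant at the 0.05 level.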

Predicting new cases of COVID-19

With the technical indicators defined in “Technical indicators” and the statistical analysis of “Statistical methods”, we are now prepared to examine whether technical analysis can provide statistically significant predictions when adapted to study the near-term changes in the number of new COVID-19 daily cases. Following this we will outline and evaluate two specific use cases for these indicators, namely, identifying the peak of a wave of infections, and the commencement of subsequent pandemic waves.

Statistical significance

To investigate whether these technical indicators could be of use during a pandemic, we undertook an analysis of World Health Organization (WHO) COVID-19 data. Specifically, we used data on the daily reported cases for 237 countries from January 3, 2020 to July 29, 2021¹⁹. For a given country, the starting date of the pandemic was defined to be the identification of the first case.

We then grouped the observations of new cases into weekly candlesticks $\{P_c\}$ by setting the open values as the number of new cases on the first day of each 7-day period and the close values as the number of new cases on the last day of each 7-day period. The high and low values of each candle were set to the highest and lowest number of new daily cases during the corresponding 7-day period. The data was organized into 17,140 candlesticks with about 70 candlesticks for each country.
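The weekly grouping described above can be sketched in Python as follows. The function name `weekly_candles` is hypothetical, the 7-day periods are assumed to be aligned to the first element of the input list, and an incomplete final period is dropped, which is an assumption since the text does not say how partial weeks were handled.

```python
def weekly_candles(daily, period=7):
    """Group daily values into (open, close, high, low) tuples, one per full period."""
    candles = []
    for start in range(0, len(daily) - period + 1, period):
        week = daily[start:start + period]
        # Open and close are the first and last daily counts of the period.
        candles.append((week[0], week[-1], max(week), min(week)))
    return candles
```

For example, fifteen daily counts produce two weekly candlesticks, with the fifteenth observation left out until its week is complete.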

Our code identified occurrences of the various signals relating to the technical indicators under consideration across all countries, and we present the number of observations for each signal in Table 1. To calculate the pre-existing trends needed to identify candlestick patterns, we use the two preceding weekly candlesticks. Following “Statistical methods”, after identifying the occurrences of each indicator in the COVID datasets for each country, we pooled the occurrences together to proceed with the statistical analysis. We dropped any indicator with fewer than 50 occurrences from our analysis.

Table 1 Summary of the number of observations of technical analysis signals in the pooled 17,140 weekly candlesticks of COVID-19 data for 237 countries from January 3, 2020 to July 29, 2021, with data obtained from WHO¹⁹.


Table 2 Statistical significance of each technical indicator is shown in terms of their global p value (averaging over multiple values of $\Delta t$).


Since we only have daily data (and not hourly) we choose to construct weekly candlesticks. For the analysis of all of the indicators we calculated the p value on a weekly time scale, thus taking $\Delta t$ to be some ${\mathcal {O}}(1)$ number weeks following a signal observation. We subdivided our data as described in “Statistical methods” and calculated the global p value for each indicator by averaging over three choices for $\Delta t$, specifically we used $\Delta t=3,5,7$ weeks. The combination of the individual p values is described in “Statistical methods”. We present the global p values from our analysis in Table 2. Additionally, as a preliminary analysis we calculated the p value while varying $\Delta t$ to see the impact, this analysis is presented in the Supplementary Material (however, as alluded to in “Statistical methods”, while this can be insightful, it encounters some technical drawbacks).

From Table 2 it can be seen that some indicators are statistically significant predictors of future COVID cases, while others are not. Specifically, the Bullish Engulfing and (bullish) Hammer candlestick patterns, as well as both MACD indicators, are all seen to be statistically significant. Notably, the p value of the Bearish MACD signal implies that this is a highly accurate indicator. Note that the Dark Cloud Cover and Bullish RSI indicators had only ${\mathcal {O}}(10)$ occurrences. This is a relatively small sample which could lead to an erroneous conclusion, and thus these indicators were omitted from our analysis. We also highlight that these global p values are corroborated by the cruder (although perhaps more intuitive) local p value analysis in the Supplementary Material.

Having concluded that a subset of technical indicators can indeed provide insights into the near-term progression of the pandemic, we next explore how these indicators might be applied to gain insights into trends during an ongoing pandemic. Specifically, we consider two particularly important use cases:

Identifying the peak of a pandemic wave.
Forecasting the start of a new wave of a pandemic.

Peaks of pandemic waves

In the early stages of a pandemic (such as the current COVID-19 crisis) new daily cases grow steadily from week to week, perhaps with some small daily fluctuations. At this stage the number of COVID cases exhibits a clear uptrend. Accordingly, governments and health officials put in place policies and funding to endeavour to reduce the spread of infections. A major milestone in controlling the pandemic is to identify a peak in the daily cases. While peaks are simple to identify in retrospect, at the height of a pandemic it is far from obvious whether a decline in cases is a fluctuation or a local top. The signals we have considered each imply reversals in the trend; thus, if COVID cases are growing and one observes a bearish signal using weekly data, this is a prediction that cases will begin falling over the next few weeks.

Therefore, the peak in the number of cases corresponds to a change in the trend of the pandemic, and this is precisely what the indicators that we have been studying are designed to detect. We now apply the Bearish MACD indicator in an effort to identify the peaks of the COVID-19 pandemic in a number of case studies. Notably, the Bearish MACD was the sole indicator for bearish reversals that we found to be statistically significant in Table 2. In other words, we propose that a crossing of the MACD line below the signal line indicates a peak in infections. More specifically, this indicates the end of the first wave, and one expects such crossing events to coincide with each of the peaks of the pandemic.

In Fig. 8 we present COVID-19 daily cases for Japan, South Africa, and the UK in the form of weekly candlestick charts along with the corresponding plots of the MACD indicator. Observe that a change from uptrend to downtrend does indeed coincide with the MACD line crossing the signal line from above. Moreover, one can interpret the convergence of the MACD and signal lines as an indication that cases are nearing a peak, which suggests that current health policies are likely to be effective in reducing the rate of infections.
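
The crossing events discussed here can be located programmatically. The following is a minimal sketch, not the code used for Fig. 8: it assumes the conventional MACD spans (12 and 26 periods, with a 9-period signal line) applied to weekly close values, seeds each exponential moving average with the first data point, and uses hypothetical function names. The spans actually used in the study are not restated in this section.

```python
from typing import List, Tuple


def ema(values: List[float], span: int) -> List[float]:
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    if not values:
        return []
    alpha = 2.0 / (span + 1)
    out = [values[0]]  # seed with the first value (one common convention)
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd_crossings(closes: List[float], fast: int = 12, slow: int = 26,
                   signal: int = 9) -> List[Tuple[int, str]]:
    """Return (index, kind) for each crossing of the MACD and signal lines."""
    macd = [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
    sig = ema(macd, signal)
    diff = [m - s for m, s in zip(macd, sig)]
    events = []
    for i in range(1, len(diff)):
        if diff[i - 1] > 0 and diff[i] <= 0:
            events.append((i, "bearish"))  # MACD falls below the signal line
        elif diff[i - 1] < 0 and diff[i] >= 0:
            events.append((i, "bullish"))  # MACD rises above the signal line
    return events


# Synthetic single wave: weekly closes rise for 40 weeks, then fall.
wave = list(range(1, 41)) + list(range(40, 0, -1))
print(macd_crossings(wave))
```

On this synthetic wave the only event reported is a bearish crossing shortly after the peak, which is the behaviour the text describes for the end of a wave.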

Additional waves of the pandemic

Table 3 Summary of the number of observations of technical analysis signals in a pooled sample of size 120,000 covering 28 stocks over 10 years, as used in our study. Data obtained from Yahoo Finance²⁰.

As evident from the COVID-19 crisis, pandemics can exhibit multiple waves of infections. A second wave refers to the case in which after an initial peak in infections there is a period in which new cases are in decline, then a subsequent reversal with daily cases growing once again.

By inspection of Fig. 8 we can see that the transition from declining cases to increasing new cases can be discerned through the observation of the Bullish MACD signal. We know that this is a statistically significant predictor of new cases, and this is supported by the case studies of Fig. 8, where one can clearly see an apparent correlation between the crossing event and the commencement of a second wave. The observation of a Bullish Engulfing or Hammer pattern in the candlesticks would also be indicative of subsequent waves.
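
As an illustration of how a candlestick pattern might be flagged in weekly case data, the sketch below checks for a Bullish Engulfing pattern using a common textbook definition: a falling candle followed by a rising candle whose open-to-close range covers that of the falling one. The precise definitions used in this study are given elsewhere in the paper and may differ; in particular, the downtrend test here (the close falls across the two candles preceding the pattern) is an assumption, as are the function and key names.

```python
from typing import Dict, List

Candle = Dict[str, float]


def is_bullish_engulfing(candles: List[Candle], i: int) -> bool:
    """True if candle i completes a Bullish Engulfing pattern (textbook form)."""
    if i < 3:
        return False  # need two trend candles plus the engulfed candle
    trend_a, trend_b = candles[i - 3], candles[i - 2]
    prev, cur = candles[i - 1], candles[i]
    # Assumed downtrend test: closes fall across the two preceding candles.
    downtrend = trend_a["close"] > trend_b["close"]
    prev_falls = prev["close"] < prev["open"]
    cur_rises = cur["close"] > cur["open"]
    # The rising candle's open-to-close range covers the falling candle's.
    engulfs = cur["open"] <= prev["close"] and cur["close"] >= prev["open"]
    return downtrend and prev_falls and cur_rises and engulfs


# Invented weekly open/close values: cases decline, then rebound sharply.
weeks = [
    {"open": 100, "close": 90},
    {"open": 90, "close": 80},
    {"open": 80, "close": 70},
    {"open": 65, "close": 85},
]
print(is_bullish_engulfing(weeks, 3))
```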

These tools have significant value for predicting the broad strokes of the future course of the pandemic and can be used to identify when a relaxation of health restrictions (such as ending social distancing or mask mandates) is leading to a new wave of infections. The MACD analysis for Japan (Fig. 8, left) is a particularly good example of how this indicator gives clear signals of subsequent peaks. The observation of a bullish signal should be used as an indicator that health restrictions must be re-established in order to regain a downtrend in new cases.

Since Fig. 8 includes data up to November 2021, this provides a COVID-19 forecast for Winter 2021. For instance, this suggests that a new wave of infections is not imminent for Japan, whereas, since the Korean MACD is nearly crossing, one anticipates there could be a growth in new cases in early 2022. The proximity of the MACD and signal lines in the UK plot is ambiguous and thus the near-term future is unclear, but this should be taken as an indicator that health restrictions should be strengthened to mitigate the risk of increasing infections.

Application in stock markets

Finally, we also apply our statistical methods to a pool of stock market data. While other such studies have been undertaken, we believe that our use of the Wilcoxon signed-rank test and careful averaging over multiple time periods in calculating the p value make our analysis more robust. Following the methodology of “Statistical methods”, we applied our code to a pool of stock market data based on 28 stocks and indices, including companies such as Google and Amazon and indices such as the S&P 500 (see Supplementary Material and the Data Availability statement for full details). The data was all sampled with daily and weekly candlesticks obtained from Yahoo Finance²⁰. Table 3 summarizes the number of observations for the aforementioned signals in the compiled data set. We dropped any indicator with fewer than 50 occurrences from our analysis.

Table 4 Global statistical significance of each technical indicator for stock market data.

Table 4 gives the p value of each signal using the one-sample Wilcoxon signed-rank test. While for the COVID study $\Delta t$ was measured in weeks, as we have much more data for stock prices we undertake our analysis at the time scale of both weeks and days. We take statistical significance to mean p values less than 0.10.
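
For readers unfamiliar with the test, the sketch below computes a one-sample Wilcoxon signed-rank p value using the large-sample normal approximation, without tie or continuity corrections. It is an illustration under stated assumptions rather than the study's implementation: we assume the sample consists of the changes in value observed $\Delta t$ after each signal, tested against a median of zero, and the function name and example numbers are hypothetical. With at least 50 occurrences per indicator, the normal approximation should be reasonable.

```python
import math
from typing import Sequence


def wilcoxon_signed_rank_p(sample: Sequence[float],
                           alternative: str = "greater") -> float:
    """One-sample Wilcoxon signed-rank p value (normal approximation)."""
    nonzero = [x for x in sample if x != 0]  # zeros carry no sign information
    n = len(nonzero)
    if n == 0:
        raise ValueError("sample has no non-zero values")
    # Rank the absolute values, giving tied values their average rank.
    order = sorted(range(n), key=lambda j: abs(nonzero[j]))
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        while (end + 1 < n and
               abs(nonzero[order[end + 1]]) == abs(nonzero[order[pos]])):
            end += 1
        average_rank = (pos + end) / 2 + 1  # mean of 1-based ranks pos+1..end+1
        for j in order[pos:end + 1]:
            ranks[j] = average_rank
        pos = end + 1
    w_plus = sum(r for r, x in zip(ranks, nonzero) if x > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    upper = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) for standard normal
    if alternative == "greater":
        return upper
    if alternative == "less":
        return 1.0 - upper
    return min(1.0, 2 * min(upper, 1.0 - upper))  # two-sided


# Hypothetical changes in value following a bullish signal.
changes = [5, -2, 8, 3, -1, 7, 4, 6, -3, 9, 2, 10]
print(wilcoxon_signed_rank_p(changes, alternative="greater"))
```

For a bearish signal one would test the opposite direction by passing `alternative="less"`.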

Our results indicate that for financial data the MACD and RSI signals, as well as the Bullish Engulfing pattern, are statistically significant on the daily timeframe. However, only the Bullish MACD signal is found to be significant for financial data analysed on the weekly timeframe. We note that our findings disagree with those of ref.⁷, which used a different methodology for its analysis.

Concluding remarks

The world has struggled with the COVID-19 pandemic for the past two years and increasing attention has been given to the forecasting of infectious diseases. This paper has shown that technical analysis used in asset trading can be repurposed to forecast changes in the number of new cases of COVID-19 and future pandemics. Here we analysed WHO data of the daily new COVID-19 cases for all countries and identified a number of technical indicators that make statistically significant predictions.

Since financial data and COVID data arise from very different systems, it is notable that technical analysis can provide predictions in both settings. We conjecture that these indicators work across these two systems since both can be modeled as non-stationary random walks. It is conceivable that these indicators can identify underlying trends in the time series. Moreover, some groups have expressed doubts regarding whether technical analysis has any intrinsic predictive power, and these results in relation to the pandemic provide some evidence that technical indicators are genuinely predictive.

This work presents new tools for evaluating the effectiveness of policies and practices employed in reducing the impact of the current and future pandemics. This was demonstrated in "Peaks of pandemic waves" and "Additional waves of the pandemic" where it was seen that through observations of the weekly MACD indicator one could identify both the peaks of each wave of the pandemic, as well as the onset of subsequent waves of infections. Importantly, reliable short term forecasting can provide potentially lifesaving insights into logistical planning, in particular when and where to allocate additional resources such as hospital staff and equipment.

Data availability

The datasets analysed in this study are available in the Harvard Dataverse repository at: https://doi.org/10.7910/DVN/DYGZBQ.

References

1. Malkiel, B. A Random Walk Down Wall Street (W.W. Norton & Company, 2019).
2. Bulkowski, T. Encyclopedia of Candlestick Charts (Wiley, 2013).
3. Appel, G. Technical Analysis Power Tools for Active Investors. Financial Times 166 (Prentice Hall, 2005).
4. Wilder, J. New Concepts in Technical Trading Systems (Hunter Pub, 1978).
5. Irwin, S. & Park, C.-H. What do we know about the profitability of technical analysis?. J. Econ. Surv. 20, 20 (2007).
6. Fama, E. Efficient capital markets: A review of theory and empirical work. J. Financ. 25(2), 383–417 (1970).
7. Goo, Y., Chen, D. & Chang, Y. The application of Japanese trading strategies in Taiwan. Invest. Manage. Financ. Innov. 4, 4 (2007).
8. Tharavanij, P., Siraprapasiri, V. & Rajchamaha, K. Profitability of candlestick charting patterns in the stock exchange of Thailand. SAGE Open 7(4), 215824401773679 (2017).
9. Prado, H., Ferneda, E., Morais, L., Luiz, A. & Matsura, E. On the effectiveness of candlestick chart analysis for the Brazilian Stock Market. Proced. Comput. Sci. 22, 1136–1145 (2013).
10. Caginalp, G. & Laurent, H. The predictive power of price patterns. Appl. Math. Financ. 5, 181 (1998).
11. Anghel, G. Stock market efficiency and the MACD. Evidence from countries around the world. Proced. Econ. Financ. 32, 1414–1431 (2015).
12. Neely, C. J., Rapach, D. E., Tu, J. & Zhou, G. Forecasting the equity risk premium: The role of technical indicators. Manage. Sci. 60(7), 1772–1791 (2014).
13. Dai, Z., Zhu, H. & Kang, J. New technical indicators and stock returns predictability. Int. Rev. Econ. Financ. 71, 127–142 (2021).
14. Dai, Z. F., Li, T. & Yang, M. Forecasting stock return volatility: The role of shrinkage approaches in a data-rich environment. J. Forecast. 20, 1–17 (2022).
15. Dai, Z. F. & Zhu, H. Y. Time-varying spillover effects and investment strategies between WTI crude oil, Natural Gas and Chinese stock markets related to Belt and Road initiative. Energy Econ. 107, 105883 (2022).
16. Wilcoxon, F. Individual comparisons by ranking methods. Biometrics Bull. 1(6), 80 (1945).
17. Toth, D. & Jones, B. Against the Norm: Modeling Daily Stock Returns with the Laplace Distribution. arXiv:1906.10325 (preprint) (2021).
18. Fisher, R. A. Questions and answers #14. Am. Stat. 2(5), 30–31 (1948).
19. Data.humdata.org. Coronavirus (COVID-19) Cases and Deaths. https://data.humdata.org/dataset/coronavirus-covid-19-cases-and-deaths. Accessed 04 Sep 2021 (2021).
20. Yahoo Finance. https://finance.yahoo.com/.

Acknowledgements

This work was completed as part of the MIT PRIMES program. We are grateful to Laura Schaposnik for her thoughtful insights and help, and to Kent Vashaw for their comments on a draft.

Author information

Authors and Affiliations

Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
Yi Liang
Department of Physics, University of Illinois at Chicago, Chicago, IL, 60607, USA
James Unwin

Contributions

Y.L. and J.U. together conceived the project and completed the analysis and interpretation, contributing equally.

Corresponding author

Correspondence to James Unwin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Liang, Y., Unwin, J. COVID-19 forecasts via stock market indicators. Sci Rep 12, 13197 (2022). https://doi.org/10.1038/s41598-022-15897-x

Received: 31 December 2021
Accepted: 30 June 2022
Published: 01 August 2022
DOI: https://doi.org/10.1038/s41598-022-15897-x
Springer Nature Limited
