Entropy and Efficiency of the ETF Market

Abstract

We investigate the relative information efficiency of financial markets by measuring the entropy of high-frequency price time series. Our tool for measuring efficiency is the Shannon entropy, applied to 2-symbol and 3-symbol discretisations of the data. Analysing 1-min and 5-min price series of 55 Exchange Traded Funds traded at the New York Stock Exchange, we develop a methodology to isolate residual inefficiencies from other sources of regularity, such as the intraday pattern, volatility clustering and microstructure effects. The first two are modelled as multiplicative factors, while the microstructure is modelled as an ARMA noise process. Combining analytical and empirical approaches, we find a strong relationship between low entropy and high relative tick size, and we find that volatility is responsible for the largest share of regularity, accounting on average for 62% of the total regularity, against 18% for the intraday pattern and 20% for the microstructure.


Notes

  1. The \(\delta / 2\) lowest and the \(\delta / 2\) highest observations are discarded.

  2. A (forward or reverse) stock split is a change, decided by the company, in both the number of its shares and the price of a single share, such that the market capitalisation remains constant. A stock split is said to be m-for-n if m new shares are issued for every n old ones, with a price adjustment from p to \(\frac{n}{m} p\). If \(m > n\) it is called a forward stock split, while if \(m < n\) we have a reverse stock split (or stock merge).

References

  • Aït-Sahalia, Y., Mykland, P. A., & Zhang, L. (2011). Ultra high frequency volatility estimation with dependent microstructure noise. Journal of Econometrics, 160(1), 160–175.

  • Andersen, T. G., & Bollerslev, T. (1997). Intraday periodicity and volatility persistence in financial markets. Journal of Empirical Finance, 4, 115–158.

  • Barndorff-Nielsen, O. E., & Shephard, N. (2004). Power and bipower variation with stochastic volatility and jumps. Journal of Financial Econometrics, 2, 1–48.

  • Brownlees, C. T., & Gallo, G. M. (2006). Financial econometric analysis at ultra-high frequency: Data handling concerns. Computational Statistics & Data Analysis, 51, 2232–2245.

  • Cajueiro, D. O., & Tabak, B. M. (2004). Ranking efficiency for emerging markets. Chaos, Solitons and Fractals, 22, 349–352.

  • Giglio, R., Matsushita, R., Figueiredo, A., Gleria, I., & Da Silva, S. (2008). Algorithmic complexity theory and the relative efficiency of financial markets. EPL, 84, 48005.

  • Grassberger, P. (2008). Entropy estimates from insufficient samplings. arXiv:physics/0307138v2.

  • Oh, G., Kim, S., & Eom, C. (2007). Market efficiency in foreign exchange markets. Physica A, 382, 209–212.

  • Risso, W. A. (2009). The informational efficiency: The emerging markets versus the developed markets. Applied Economics Letters, 16, 485–487.

  • Shmilovici, A., Alon-Brimer, Y., & Hauser, S. (2003). Using a stochastic complexity measure to check the efficient market hypothesis. Computational Economics, 22, 273–284.

  • Shmilovici, A., Kahiri, Y., Ben-Gal, I., & Hauser, S. (2009). Measuring the efficiency of the intraday forex market with a universal data compression algorithm. Computational Economics, 33(2), 131–154.

  • Taylor, S. J. (2011). Asset price dynamics, volatility, and prediction. Princeton: Princeton University Press.

  • Weiss, G. (1975). Time-reversibility of linear stochastic processes. Journal of Applied Probability, 12, 831–836.

Author information

Correspondence to Lucio Maria Calcagnile.

Additional information


LMC acknowledges support from the Scuola Normale Superiore (grant number GR12CALCAG). LMC also acknowledges LIST S.p.A. for the funding of his research associate grant. SM acknowledges support from UnicreditBank through the Dynamics and Information Research Institute at the Scuola Normale Superiore.

Appendices

A Entropy Estimation

Suppose that N points are randomly distributed into M boxes according to probabilities \(p_1, \ldots , p_M\). The simplest way to estimate the entropy \(H = - \sum _{i = 1}^M p_i \log p_i\) is to replace the \(p_i\)’s with the observed frequencies \(\frac{n_i}{N}\), \(i = 1, \ldots , M\), where \(n_i\) is the number of points in box i. Unfortunately, the estimator

$$\begin{aligned} {\hat{H}}^{\text {naive}} = - \sum _{i = 1}^M \frac{n_i}{N} \log \frac{n_i}{N} \end{aligned}$$

is strongly biased, meaning that it would introduce a large systematic error into the entropy estimates. A much more accurate estimator, derived by Grassberger (2008), is defined by

$$\begin{aligned} {\hat{H}}^{\text {G}} = - \sum _{i = 1}^M \frac{n_i}{N} \log \frac{\mathrm {e}^{G_{n_i}}}{N} , \end{aligned}$$
(22)

where the terms \(G_n\) are defined by \(G_{2n+1} = G_{2n} = - \gamma - \log 2 + \frac{2}{1} + \frac{2}{3} + \frac{2}{5} + \ldots + \frac{2}{2n-1}\), with \(\gamma = 0.577215\ldots \) the Euler–Mascheroni constant. The bias \(\varDelta {\hat{H}}^{\text {G}} = \mathbb {E}[{\hat{H}}^{\text {G}}] - H\) of this estimator satisfies

$$\begin{aligned} 0< - \varDelta {\hat{H}}^{\text {G}} < 0.1407\ldots \times \frac{M}{N} . \end{aligned}$$

To clarify how the notation of this section relates to that of the rest of the paper, we remark that we use Grassberger’s estimator to estimate the entropies of order k of sources with a binary or ternary alphabet, so that we typically have \(M = 2^k\) or \(M = 3^k\). The number N and the numbers \(n_i\) represent, respectively, the number of non-overlapping k-blocks in the observed symbolic sequence and the number of occurrences of each of the symbolic strings of length k.
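As a concrete illustration, the following sketch (ours, not code from the paper; function names are illustrative) implements both the naive estimator and Grassberger’s estimator (22), with `block_counts` computing the \(n_i\)’s over non-overlapping k-blocks as described above.

```python
import numpy as np
from collections import Counter

EULER_GAMMA = 0.5772156649015329

def G(n):
    # G_{2m} = G_{2m+1} = -gamma - log 2 + sum_{j=1}^{m} 2 / (2j - 1)
    m = n // 2
    return -EULER_GAMMA - np.log(2.0) + 2.0 * sum(1.0 / (2 * j - 1) for j in range(1, m + 1))

def entropy_naive(counts):
    # Biased plug-in estimator: -sum (n_i/N) log(n_i/N)
    c = np.array([n for n in counts if n > 0], dtype=float)
    N = c.sum()
    return float(-np.sum(c / N * np.log(c / N)))

def entropy_grassberger(counts):
    # Eq. (22): -sum (n_i/N) log(e^{G_{n_i}} / N) = sum (n_i/N) (log N - G_{n_i})
    c = [n for n in counts if n > 0]
    N = sum(c)
    return sum(n / N * (np.log(N) - G(n)) for n in c)

def block_counts(symbols, k):
    # Occurrences n_i of each length-k string over non-overlapping k-blocks
    blocks = (tuple(symbols[i:i + k]) for i in range(0, len(symbols) - k + 1, k))
    return list(Counter(blocks).values())

# Example: 4-block entropy of an i.i.d. binary sequence (true value 4 log 2 ~ 2.773)
rng = np.random.default_rng(0)
s = rng.integers(0, 2, size=5_000)
print(entropy_naive(block_counts(s, 4)), entropy_grassberger(block_counts(s, 4)))
```

On such a sample the naive estimate falls visibly short of \(4 \log 2\), while Grassberger’s estimate lands much closer, consistent with the bias bound above.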

B Details on the Shannon Entropy of the Processes AR(1) and MA(1)

1.1 A Geometric Characterisation of the Shannon Entropies \(H_k^{AR(1)}\)

In this section we give a general characterisation of the Shannon entropies \(H_k^{AR(1)}\), for all \(k = 1, 2, \ldots \), in terms of the entropy of a partition of the unit sphere \(\mathbb {S}^{k-1} = \{ \mathbf {x}\in \mathbb {R}^k \, | \, ||\mathbf {x}|| = 1 \}\). In principle, the same path could be followed to obtain an analogous characterisation of the entropies \(H_k^{MA(1)}\). In practice, however, this does not seem feasible, since for the MA(1) process the general formulas for the conditional distributions of \(X_k\), given \(X_1, X_2, \ldots , X_{k-1}\), are not as simple as those for the AR(1) process, which is Markov.

Let \(s_1^k \in \{ 0,1 \} ^k\) be one of the \(2^k\) binary strings of length k. According to the symbolisation (8), it corresponds to the event \(\{ X_1 \in I_1, \ldots , X_k \in I_k \}\), where \(I_i = (-\infty ,0)\) if \(s_i = 0\) and \(I_i = (0,\infty )\) if \(s_i = 1\). For the process AR(1) we have

$$\begin{aligned} X_1 \sim {\mathcal {N}} \Big ( 0, \frac{\sigma ^2}{1-\phi ^2} \Big ) , \qquad X_t \, | \, X_{t-1} \sim {\mathcal {N}} \big ( \phi X_{t-1}, \sigma ^2 \big ) , \quad t \ge 2 , \end{aligned}$$

and therefore

$$\begin{aligned}&\mu (s_1^k) = \int \limits _{I_1} \! \! \frac{1}{\sqrt{2 \pi } \frac{\sigma }{\sqrt{1-\phi ^2}}} e^{-\frac{1}{2} \left( \frac{X_1}{\frac{\sigma }{\sqrt{1-\phi ^2}}} \right) ^2} \! \! \! \int \limits _{I_2} \! \! \frac{1}{\sqrt{2 \pi } \sigma } e^{-\frac{1}{2} \left( \frac{X_2 - \phi X_1}{\sigma } \right) ^2} \! \ldots \nonumber \\&\quad \! \ldots \! \int \limits _{I_k} \! \! \frac{1}{\sqrt{2 \pi } \sigma } e^{-\frac{1}{2} \left( \frac{X_k - \phi X_{k-1}}{\sigma } \right) ^2} \mathrm {d}X_k \, \ldots \, \mathrm {d}X_2 \, \mathrm {d}X_1 . \end{aligned}$$
(23)

Let us now consider the normalising linear transformation

$$\begin{aligned} \left\{ \begin{array}{rcl} Y_1 &{} = &{} \frac{1}{\frac{\sigma }{\sqrt{1-\phi ^2}}} X_1\\ Y_2 &{} = &{} \frac{X_2 - \phi X_1}{\sigma }\\ &{} \vdots &{}\\ Y_k &{} = &{} \frac{X_k - \phi X_{k-1}}{\sigma } \end{array} \right. , \end{aligned}$$
(24)

described in matrix form by \(Y = A_\phi X\), with

$$\begin{aligned} A_\phi = \frac{1}{\sigma } \begin{pmatrix} \sqrt{1-\phi ^2} &{}\quad 0 &{}\quad 0 &{}\quad \ldots &{}\quad 0 &{}\quad 0\\ -\phi &{}\quad 1 &{}\quad 0 &{}\quad \ldots &{}\quad 0 &{}\quad 0\\ 0 &{}\quad -\phi &{}\quad 1 &{}\quad \ldots &{}\quad 0 &{}\quad 0\\ \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots &{}\quad \vdots \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad \ldots &{}\quad 1 &{}\quad 0\\ 0 &{}\quad 0 &{}\quad 0 &{}\quad \ldots &{}\quad -\phi &{}\quad 1\\ \end{pmatrix} . \end{aligned}$$

The random variables \(Y_t\) are independent \({\mathcal {N}} (0,1)\) and Eq. (23) can be written as

$$\begin{aligned} \mu (s_1^k) = \int \limits _{I'} \! \! \frac{1}{(2 \pi )^\frac{k}{2}} e^{-\frac{1}{2} \left( Y_1^2 + Y_2^2 + \ldots + Y_k^2 \right) } \, \mathrm {d}Y_1 \, \ldots \, \mathrm {d}Y_k , \end{aligned}$$
(25)

where \(I' = A_\phi (I_1 \times I_2 \times \ldots \times I_k)\). The integral in Eq. (25) equals the fraction of the k-dimensional solid angle determined by the cone \(I'\), or, equivalently, the fraction of the hypersphere \(\frac{\lambda (I' \cap \mathbb {S}^{k-1})}{\lambda (\mathbb {S}^{k-1})}\), where \(\lambda \) denotes the Lebesgue measure.

The \(2^k\) solid angles of the form \(I'\), corresponding to the strings of k binary symbols, are those that result from sectioning the k-dimensional Euclidean space with the hyperplanes \(\pi _1\), \(\pi _2\), \(\pi _3\), ..., \(\pi _k\), the images under \(A_\phi \) of the coordinate hyperplanes, of equations

$$\begin{aligned} \pi _1 :\; x_1 = 0 , \qquad \pi _i :\; \frac{\phi ^{i-1}}{\sqrt{1-\phi ^2}} x_1 + \sum _{j=2}^{i} \phi ^{i-j} x_j = 0 , \quad i = 2, \ldots , k . \end{aligned}$$

The problem of calculating the measures \(\mu (s_1^k)\) in Eq. (2) has thus been translated into a purely geometric one: calculating the solid angles in \(\mathbb {R}^k\) cut by the hyperplanes \(\pi _i\), \(i = 1, \ldots , k\). The entropy of Eq. (2) is then nothing else than the entropy of the partition of \(\mathbb {S}^{k-1}\) determined by the hyperplanes \(\pi _i\).
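As a numerical sanity check (our sketch, not from the paper), the block measure \(\mu (s_1^k)\) estimated from a simulated AR(1) path can be compared with the Gaussian solid-angle fraction of Eq. (25):

```python
import numpy as np

rng = np.random.default_rng(1)

def mu_from_path(phi, bits, n=500_000, sigma=1.0):
    # Empirical mu(s_1^k): frequency of the sign pattern `bits` along
    # (overlapping) k-windows of a long stationary AR(1) path
    k = len(bits)
    eps = rng.normal(0.0, sigma, n)
    x = np.empty(n)
    x[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - phi**2))  # stationary start
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    windows = np.lib.stride_tricks.sliding_window_view(x > 0, k)
    return float(np.mean((windows == np.array(bits, dtype=bool)).all(axis=1)))

def mu_from_angles(phi, bits, n=500_000, sigma=1.0):
    # Solid-angle fraction of Eq. (25): probability that a standard Gaussian
    # vector Y lies in I' = A_phi(I_1 x ... x I_k), i.e. that its preimage
    # X = A_phi^{-1} Y has the sign pattern `bits`
    k = len(bits)
    A = (np.eye(k) - phi * np.eye(k, k=-1)) / sigma
    A[0, 0] = np.sqrt(1.0 - phi**2) / sigma
    X = rng.standard_normal((n, k)) @ np.linalg.inv(A).T
    return float(np.mean(((X > 0) == np.array(bits, dtype=bool)).all(axis=1)))

# The two estimates should agree up to Monte Carlo error, e.g. for mu(011):
print(mu_from_path(0.4, (0, 1, 1)), mu_from_angles(0.4, (0, 1, 1)))
```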

1.2 Proofs of the Propositions of Sect. 2.3

Proof

(Proof of Proposition 1) First note that, if \(\{ \epsilon _t \}_t\) is a Gaussian white noise, then \(\{ \epsilon _t^\prime \}_t = \{ (-1)^t \epsilon _t \}_t\) is also a Gaussian white noise, and indeed the same process as \(\{ \epsilon _t \}_t\), since a Gaussian random variable \(\epsilon _t\) has the same distribution as its opposite \(- \epsilon _t\). The AR(1) process defined by \(X_t^\prime = - \phi X_{t-1}^\prime + \epsilon _t^\prime \) has the MA(\(\infty \)) form

$$\begin{aligned} X_t^\prime = \sum _{i=0}^\infty (- \phi )^i \epsilon _{t-i}^\prime = \sum _{i=0}^\infty (-1)^i \phi ^i (-1)^{t-i} \epsilon _{t-i} = \sum _{i=0}^\infty (-1)^t \phi ^i \epsilon _{t-i} . \end{aligned}$$

Thus we have \(X_t^\prime = (-1)^t X_t\) for all t. This relation between the two continuous-state processes \(\{ X_t^\prime \}\) and \(\{ X_t \}\) translates into an analogous one for the binary processes \(S^\prime = \{ s_t^\prime \}_t\) and \(S = \{ s_t \}_t\) defined by the discretisation (8). This means that a single realisation of the process \(\{ X_t \}_t\) (or, equivalently, of the process \(\{ \epsilon _t \}_t\)) produces two binary sequences s and \(s^\prime \) such that \(s_t = s_t^\prime \) for even t, while \(s_t\) and \(s_t^\prime \) are opposite symbols for odd t. We therefore have a bijective, measure-preserving correspondence between realisations of \(\{ s_t \}\) and of \(\{ s_t^\prime \}\), that is,

$$\begin{aligned} \mu _S (s_{t_1}^{t_k}) = \mu _{S^\prime } (s_{t_1}^{t_k \prime }) , \quad \text {for all } k \text { and all } t_1 \le \ldots \le t_k . \end{aligned}$$
(26)

From Eq. (26) it follows that \(H_k^{AR(1)} (\phi ) = H_k^{AR(1)} (- \phi )\), which means that (i) is proved.

Equality (ii) is proved in the very same way as for (i), by noting that the MA(1) process defined by \(X_t^\prime = \epsilon _t^\prime - \theta \epsilon _{t-1}^\prime \), with \(\epsilon _t^\prime = (-1)^t \epsilon _t\) for all t, is isomorphic to that defined by \(X_t = \epsilon _t - \theta \epsilon _{t-1}\).

Finally, (iii) and (v) follow immediately from (i), while (iv) and (vi) follow immediately from (ii). \(\square \)

Proof

(Proof of Proposition 2) If \(\{ \epsilon _t \}_t\) is a Gaussian white noise process defining the process \(\{ X_t \}_t\) (either AR(1) or MA(1)), the white noise \({\bar{\epsilon }} = \{ - \epsilon _t \}_t\) defines the process \({\bar{X}} = \{ - X_t \}_t\). This is in fact isomorphic to the process X itself, since the random variables \(\epsilon _t\) have the same distributions as their opposites. The processes S and \({\bar{S}}\), the discretised versions of the processes X and \({\bar{X}}\), are therefore isomorphic, and the claim follows. \(\square \)

Proof

(Proof of Proposition 5) The quantities \(\mu (0 \cdot ^i 0)\) and \(\mu (0 \cdot ^i 1)\) are the probabilities of the events \(\{ X_1< 0 \} \cap \{ X_{i+2} < 0 \}\) and \(\{ X_1 < 0 \} \cap \{ X_{i+2} > 0 \}\), respectively. Recall that \(X_{i+2} | X_1 \sim {\mathcal {N}} \big (\phi ^{i+1} X_1,\frac{1-\phi ^{2(i+1)}}{1-\phi ^2} \sigma ^2 \big )\). Thus, proceeding as in Sect. B.1, we are left with calculating the measures of the subsets of \(\mathbb {S}^1\) cut by the lines in \(\mathbb {R}^2\) given by equations \(x_1 = 0\) and \(\frac{\phi ^{i+1}}{\sqrt{1-\phi ^{2(i+1)}}} x_1 + x_2 = 0\). Equalities (9) and (10) follow immediately. \(\square \)

Proof

(Proof of Proposition 6) Just as in Proposition 5, \(\mu (00)\) and \(\mu (01)\) are the probabilities of the events \(\{ X_1< 0 \} \cap \{ X_2 < 0 \}\) and \(\{ X_1 < 0 \} \cap \{ X_2 > 0 \}\), respectively. Since the conditional distribution of \(X_2 | X_1\) is \({\mathcal {N}} \big ( \frac{\theta }{1+\theta ^2} X_1, \frac{1+\theta ^2+\theta ^4}{1+\theta ^2} \sigma ^2 \big )\), we have that \(\mu (00)\) and \(\mu (01)\) are the relative measures of the subsets of \(\mathbb {S}^1\) cut by the lines of equations \(x_1 = 0\) and \(\frac{\theta }{\sqrt{1+\theta ^2+\theta ^4}} x_1 + x_2 = 0\). Expressions (11) and (12) follow straightforwardly.

Finally, equality (13) is easily proved by noting that the conditional distribution of a random variable \(X_t\), given \(X_{t-i}\) with \(i \ge 2\), is the same as its unconditional distribution because \(X_t\) and \(X_{t-i}\) (\(i \ge 2\)) are independent. \(\square \)

C Data Cleaning

1.1 Outliers

To detect and remove outliers, which may appear in high-frequency data because of transmission errors, for example, we use the algorithm proposed by Brownlees and Gallo (2006). Although it was originally developed for tick-by-tick prices, we apply it to 1-min data by setting the parameters suitably. The algorithm identifies and removes price records that are too distant from a mean value calculated in their neighbourhood. More precisely, a price \(p_i\) in the price series is removed if

$$\begin{aligned} |p_i - {\bar{p}}_i (k)| \ge c \, s_i (k) + \gamma , \end{aligned}$$
(27)

where \({\bar{p}}_i (k)\) and \(s_i (k)\) are, respectively, the \(\delta \)-trimmed (Note 1) sample mean and sample standard deviation of the k price records closest to time i, c is a constant amplifying the standard deviation, and \(\gamma \) is a parameter that guards against cases of zero variance (e.g., when there are k equal consecutive prices).

We take \(k = 20\), \(\delta = 10\%\), \(c = 5\), \(\gamma = 0.05\). This outlier detection procedure leads to the removal of between 9 and 310 1-min observations across the 55 ETFs. Their distribution over the time of day shows that these observations occur overwhelmingly at the very beginning and very end of the trading day, suggesting that the algorithm spuriously flags as outliers some genuine observations where high variability is physiological. However, the number of 1-min observations detected as outliers is so small (about one every three days in the worst case) that even spurious removal has a negligible impact on the results.
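As an illustration, here is a minimal NumPy sketch of the filter in Eq. (27); it is ours, not the authors' code, and the centring of the window at the series boundaries and the exclusion of \(p_i\) from its own neighbourhood are our assumptions:

```python
import numpy as np

def trimmed_mean_std(x, delta=0.10):
    # delta-trimmed statistics: the delta/2 lowest and highest values are discarded (Note 1)
    x = np.sort(x)
    cut = int(round(len(x) * delta / 2.0))
    if cut > 0:
        x = x[cut:len(x) - cut]
    return x.mean(), x.std(ddof=1)

def outlier_mask(prices, k=20, delta=0.10, c=5.0, gamma=0.05):
    # Flag p_i when |p_i - pbar_i(k)| >= c * s_i(k) + gamma, as in Eq. (27)
    p = np.asarray(prices, dtype=float)
    n = len(p)
    flagged = np.zeros(n, dtype=bool)
    for i in range(n):
        lo = max(0, min(i - k // 2, n - k - 1))   # window of k+1 records around i
        nb = np.delete(p[lo:lo + k + 1], i - lo)  # the k neighbours, p_i excluded
        pbar, s = trimmed_mean_std(nb, delta)
        flagged[i] = abs(p[i] - pbar) >= c * s + gamma
    return flagged

# cleaned = prices[~outlier_mask(prices)]
```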

1.2 Stock Splits

Price data made available by data providers are generally adjusted for stock splits (Note 2). To detect possible unadjusted splits, we check the condition

$$\begin{aligned} |r| > 0.2 \end{aligned}$$

in the return series. This procedure would detect, for example, a 3-for-2 split or a 4-for-5 merge.

In our data we find four unadjusted splits, which we remove pointwise from the return series.
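A toy version of this check (ours; the returns are assumed to be log returns):

```python
import numpy as np

def unadjusted_split_idx(returns, thresh=0.2):
    # An unadjusted m-for-n split appears as one huge return: a 3-for-2 split
    # scales the price by 2/3, and |log(2/3)| ~ 0.405 > 0.2 triggers the flag
    return np.flatnonzero(np.abs(returns) > thresh)

# pointwise removal of the flagged returns:
# returns = np.delete(returns, unadjusted_split_idx(returns))
```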

1.3 Intraday Volatility Pattern

As is well known, the volatility of intraday returns exhibits periodic behaviour. It is higher near the opening and the closing of the market, showing a typical U-shaped profile every day. Moreover, recurring events, such as news releases at fixed times or the opening of another market, contribute to a specific intraday pattern that forms the basic volatility structure of each day. We filter the intraday volatility pattern out of the return series by using the following simple model with intraday volatility factors. If \(R_{d,t}\) is the raw return of day d and intraday time t, we define the rescaled return

$$\begin{aligned} {\tilde{R}}_{d,t} = \frac{R_{d,t}}{\zeta _t} , \end{aligned}$$
(28)

where

$$\begin{aligned} \zeta _t = \frac{1}{N_{\text {days}}} \sum _{d'} \frac{|R_{d',t}|}{s_{d'}} , \end{aligned}$$
(29)

with \(N_{\text {days}}\) the number of days in the sample and \(s_{d'}\) the standard deviation of the absolute intraday returns of day \(d'\).

In this paper, we refer to the rescaled returns \({\tilde{R}}\) defined by Eq. (28) as deseasonalised returns.
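A compact sketch of Eqs. (28)–(29) (ours, not the authors' code), assuming the returns are arranged in a complete days × intraday-times matrix:

```python
import numpy as np

def deseasonalise(R):
    # R: raw returns R_{d,t}, shape (n_days, n_times)
    s = np.abs(R).std(axis=1, ddof=1)                # s_{d'}: std of |returns| of day d'
    zeta = np.mean(np.abs(R) / s[:, None], axis=0)   # Eq. (29): intraday factors zeta_t
    return R / zeta[None, :]                         # Eq. (28): deseasonalised returns
```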

As an example, we report in Fig. 10 the intraday volatility profile of the DIA 1-min return series.

Fig. 10 Intraday volatility profile of 1-min returns of the DIA ETF

1.4 Heteroskedasticity

Deseasonalised return series, as defined by Eqs. (28) and (29), have no residual intraday volatility structure, but they are still heteroskedastic, since different days can have different levels of volatility. To remove this heteroskedasticity, we estimate the time series of local volatility \(\sigma _t\) and define the standardised returns by

$$\begin{aligned} r_t = \frac{{\tilde{R}}_t}{\sigma _t} . \end{aligned}$$
(30)

As a proxy for the local volatility, we use the realised absolute variation (see Andersen and Bollerslev 1997; Barndorff-Nielsen and Shephard 2004). Let the logarithmic price p(t) be generated by the process

$$\begin{aligned} \mathrm {d}p (t) = \mu (t) \, \mathrm {d}t + \sigma (t) \, \mathrm {d}W (t) , \end{aligned}$$
(31)

where \(\mu (t)\) is a finite variation process, \(\sigma (t)\) is a càdlàg volatility process and W(t) is a standard Brownian motion. Divide the interval [0, t] into subintervals of equal length \(\delta \) and denote by \(r_i = p (i \delta ) - p ((i-1) \delta )\) the return over the i-th subinterval. Then the following limit in probability holds:

$$\begin{aligned} \text {p}-\lim _{\delta \searrow 0} \delta ^{\frac{1}{2}} \sum _{i=1}^{\lfloor t/\delta \rfloor } |r_i| = \mu _1 \int _0^t \sigma (s) \, \mathrm {d}s , \end{aligned}$$

where \(\mu _1 = \mathbb {E}(|u|) = \sqrt{\frac{2}{\pi }} \simeq 0.797885\), \(u \sim {\mathcal {N}} (0,1)\).
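Heuristically, over a small subinterval \(r_i \approx \sigma ((i-1)\delta ) \sqrt{\delta } \, u_i\) with \(u_i \sim {\mathcal {N}} (0,1)\), so that, in expectation,

$$\begin{aligned} \delta ^{\frac{1}{2}} \sum _{i=1}^{\lfloor t/\delta \rfloor } |r_i| \approx \mu _1 \sum _{i=1}^{\lfloor t/\delta \rfloor } \sigma ((i-1)\delta ) \, \delta \; \longrightarrow \; \mu _1 \int _0^t \sigma (s) \, \mathrm {d}s \quad \text {as } \delta \searrow 0 , \end{aligned}$$

the sum being a Riemann sum for the integral.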

Our estimator of the local volatility is based on this quantity and is defined by the exponentially weighted moving average

$$\begin{aligned} {\hat{\sigma }}_{\text {abs},t} = \mu _1^{-1} \alpha \sum _{i > 0} (1-\alpha )^{i-1} |r_{t-i}| , \end{aligned}$$
(32)

where \(\alpha \) is the parameter of the exponential average. We take \(\alpha = 0.05\) for the 1-min data and \(\alpha = 0.25\) for the 5-min data, corresponding to a half-life of nearly 14 min.
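A recursive sketch of Eqs. (30) and (32) (ours; the warm-up initialisation of the average is our assumption):

```python
import numpy as np

MU1 = np.sqrt(2.0 / np.pi)  # E|u| for u ~ N(0,1), about 0.797885

def standardise(r_tilde, alpha=0.05, warmup=30):
    # Eq. (32) via the recursion s_t = (1 - alpha) * s_{t-1} + alpha * |r_{t-1}|,
    # so that s_t = alpha * sum_{i>0} (1 - alpha)^{i-1} |r_{t-i}| and
    # sigma_hat_t = s_t / MU1; then Eq. (30): r_t = r_tilde_t / sigma_hat_t
    r = np.asarray(r_tilde, dtype=float)
    s = np.abs(r[:warmup]).mean()   # warm-up level (our assumption)
    out = np.empty_like(r)
    for t in range(len(r)):
        out[t] = r[t] / (s / MU1)                  # standardised return
        s = (1.0 - alpha) * s + alpha * abs(r[t])  # |r_t| enters the estimate at t+1
    return out
```

With \(\alpha = 0.05\), the weight \((1-\alpha )^{i-1}\) halves after \(\log 2 / \log \frac{1}{0.95} \approx 13.5\) lags, consistent with the half-life quoted above for 1-min data.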

Filtering out the heteroskedasticity by means of Eq. (30), with the volatility estimated by Eq. (32), considerably reduces the excess kurtosis of the return distribution for all the ETFs, which shows the method is effective. For instance, for the FXI ETF the excess kurtosis of 1-min returns is 11.87 before removing the heteroskedasticity and 0.88 after. Figure 11 shows the histograms of FXI intraday 1-min returns, with the intraday pattern already removed, before and after the removal of heteroskedasticity by means of Eq. (30). As can be seen, there is a spike at 0 representing the large number of zero returns, due to the discreteness of prices.

Fig. 11 Histograms of 1-min returns of the FXI ETF, before (left) and after (right) removing the heteroskedasticity. The intraday pattern has already been filtered out in both series


Cite this article

Calcagnile, L.M., Corsi, F. & Marmi, S. Entropy and Efficiency of the ETF Market. Comput Econ 55, 143–184 (2020). https://doi.org/10.1007/s10614-019-09885-z
