What Can We Learn from Statistical Regularities in Stock Returns? Insights from An Entropy-Constrained Framework

  • Conference paper
Crises and Uncertainty in the Economy

Abstract

In this chapter, we investigate the dynamics of the US stock market over bull markets, bear markets, and corrections through the lens of statistical equilibrium. Using an entropy-constrained framework, we build a theoretical model to recover the cross-sectional distribution of daily returns of individual companies listed on the S&P 500 over the period 1988–2019. The results of the model shed light on the microscopic as well as macroscopic behavior of the stock market, and provide insights into the distribution of stock returns and investors’ behavior.

Notes

  1. Woo et al. (2020) provide an extensive literature review of models of stock returns starting from 1960.

  2. This is further exacerbated when we move to longer holding-period returns, in the order of decades, which still pose considerable econometric challenges (see Campbell et al. 1997, pp. 59–80; Fama and French 2018). In this case, not only is the number of available observations drastically reduced, thus jeopardizing the reliability of the estimates themselves, but the difficulty of distinguishing between a permanent and a transitory component prevents us from detecting potential sources of predictability in stock returns.

  3. The roots of information theory can be traced back to the algorithmic complexity theory developed by Kolmogorov (1968) and Chaitin (1966). According to this theory, a series of symbols is considered unpredictable if the information embodied in it cannot be reduced to a more compact form.

  4. Given that the inflation rate in the United States has been relatively constant over this period, we consider only the nominal rate of return.

  5. The standard errors for the skewness and kurtosis estimates under the null hypothesis of normality are \(\sqrt {\frac {6}{3{,}004{,}150}}=0.0014\) and \(\sqrt {\frac {24}{3{,}004{,}150}}=0.0028\), respectively.

  6. An interactive version of the model is available here: https://emanuelecitera.shinyapps.io/Shiny/.

  7. Note that symmetry in the distribution is not a necessary criterion for statistical equilibrium. However, it can be interpreted as an indicator of rapid appreciation/depreciation of stock prices in certain sub-markets before or after a crisis.

  8. See Appendix 2 for the analytical derivation of the model.

  9. For a detailed discussion of how conventions regulate financial markets, see Citera and Sau (2021).

  10. See Appendix 2 for further detail.

  11. It is important to note that there are other ways to derive the above result through entropy maximization, and this is just one of them. Foley (2020a) provides a thorough analysis of entropy-constrained behavior and its applications to economic theory.

  12. The optimization algorithm for the KL minimization is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm, with the following initial conditions for the parameters: μ = 0, T = 1, α = 0, S = 1.

References

Acknowledgements

I am indebted to Duncan K. Foley, whose continuous support and valuable help have been crucial in working out the details of this research. I also wish to thank Mark Setterfield, Willi Semmler, and Ellis Scharfenaker for their helpful remarks.

Author information

Corresponding author

Correspondence to Emanuele Citera.

Appendices

Appendix 1

The Maximum Entropy Principle

The cumbersome task social scientists face is the mutual dependence of individual (subjective) actions and observable (objective) outcomes, which results from complex dynamics that are not directly observable in the data. This makes the problem at hand underdetermined, and it raises the need for a method of inference with which to reconstruct the interaction of human actions and outcomes, as expressed by their joint probability distribution.

One powerful approach to underdetermined problems of this type is the maximum entropy method, championed in particular by the physicist E. T. Jaynes (2003). Jaynes’ general idea is to maximize the entropy of the unknown and underdetermined distribution subject to constraints expressing whatever information from observation or theory is relevant. Due to the strict concavity of the entropy function, as long as the constraints describing the available information define a non-empty convex set in the space of distributions, the maximum entropy program will yield a maximizing distribution that is a candidate statistical equilibrium of the model. The substantive interest of the resulting maximum entropy distribution depends on the information expressed by the constraints. At this stage, an important qualification is necessary.

Generally speaking, maximizing entropy maximizes the uncertainty of the system and yields the least informative state, with no additional assumptions beyond the researcher’s existing knowledge and the observed data. However, constrained maximum entropy can be used either as a method of “rational inference” (Golan 2018), in which maximizing entropy resolves the residual uncertainty once relevant information has been introduced via constraints, or as a “statistical equilibrium theory”, which assumes that the system is observed in a state of equilibrium characterized by the decay of transients and has reached maximum entropy subject to constraints. Even though the formalism is the same for both methods, the interpretation of the results can be rather different. In our case, we deploy the maximum entropy principle as a statistical equilibrium theory.

The foundational concept in information theory is informational, or Shannon (1948), entropy. In economic applications of information theory, entropy is most often described as a measure of uncertainty, but it is more helpful to think of it as a lack of predictability. Given a random variable x ∈ X with probability distribution f[x], the Shannon entropy \(\mathcal {H}[x]\) is defined as follows:

$$\displaystyle \begin{aligned} \mathcal{H}[x] = \mathbb{E}[-\log[f[x]]] = - \sum_{x \in X} f[x]\log[f[x]] \end{aligned} $$
(7.11)
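
As a small numerical illustration (not part of the chapter), the Shannon entropy of a discrete distribution can be computed directly from Eq. (7.11); the probabilities below are arbitrary.

    import numpy as np

    def shannon_entropy(p):
        """Shannon entropy H[x] = -sum_x p(x) log p(x), in nats."""
        p = np.asarray(p, dtype=float)
        p = p[p > 0]                 # terms with p(x) = 0 contribute nothing
        return -np.sum(p * np.log(p))

    # A uniform distribution over four outcomes maximizes entropy (log 4 ~ 1.386);
    # a degenerate distribution has zero entropy.
    print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))   # ~ 1.386
    print(shannon_entropy([1.0, 0.0, 0.0, 0.0]))       # 0.0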

Let the rate of return on a stock be \(r\in \mathbb {R}\), and let \(a\in \mathbb {A}\) denote a quantal action from the set of selling and buying stocks, \(\mathbb {A} = \{sell, buy\}\). Our objective is to determine the equilibrium joint distribution f[a, r] with the marginal and conditional frequencies f[r], f[a], f[a|r], f[r|a] such that:

$$\displaystyle \begin{aligned} f[r]=\sum_{a}{f[a,r]} \, , \quad f[a]=\int_{r}f[a,r]\,dr \,;\end{aligned}$$
$$\displaystyle \begin{aligned} f[a|r]=\frac{f[a,r]}{f[r]} \,\, \text{if} \, f[r]>0 \, , \quad f[r|a]=\frac{f[a,r]}{f[a]} \,\, \text{if} \, f[a]>0. \end{aligned}$$

We write sums over outcomes, r, as integrals with the understanding that in theoretical applications r is treated as real-valued; in empirical applications, however, measurements will inevitably be coarse-grained into a finite number of bins. We also omit the limits of integration with the understanding that the integrals run over the range of r.
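
To make these relations concrete, the following sketch (not part of the chapter) computes marginal and conditional frequencies from a small, purely hypothetical joint frequency table f[a, r] with two actions and three return bins.

    import numpy as np

    # Hypothetical joint frequency table f[a, r]: rows index actions {sell, buy},
    # columns index coarse-grained return bins. The values are illustrative only.
    f_ar = np.array([[0.10, 0.15, 0.20],    # f[sell, r]
                     [0.20, 0.20, 0.15]])   # f[buy,  r]

    f_r = f_ar.sum(axis=0)                  # marginal over actions:  f[r] = sum_a f[a, r]
    f_a = f_ar.sum(axis=1)                  # marginal over returns:  f[a]
    f_a_given_r = f_ar / f_r                # conditional f[a | r], defined where f[r] > 0
    f_r_given_a = f_ar / f_a[:, None]       # conditional f[r | a], defined where f[a] > 0

    print(f_r, f_a)
    print(f_a_given_r.sum(axis=0))          # each column of f[a|r] sums to one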

If we were to maximize the entropy of the joint distribution f[a, r], \(-\int \sum _{a}f[a,r]\log [f[a,r]]\,dr\), without the introduction of any constraint (except the normalization of the sum of the joint frequencies to unity), we would find that the entropy is maximized when the aggregate outcome r is independent of the individual actions a. Since this result sheds no light on the process through which actions determine the outcome (it just returns the information already known through the observation of f[r]), we need to construct a theory of investors’ behavior and how it impacts the social outcome by expressing it in terms of moment constraints.

Finally, we should note that Shannon entropy is only one of the possible entropy measures that can be adopted. Indeed, the financial literature has also adopted the Rényi (1961) and Tsallis (1988) measures of entropy, which are generalizations of Shannon entropy. However, there are two reasons to prefer Shannon entropy over the generalized measures. First, both the Rényi and Tsallis measures are parametric entropy measures that attach completely different weights to extremely rare and to regular events; as suggested by Batra and Taneja (2020), they might not be entirely appropriate for analyzing stock market data series. Second, Shannon entropy has been most successful in the treatment of equilibrium systems, which is the intended purpose of our analysis.

1.1 The Derivation of Investor’s Behavior

Let us assume that the typical agent’s response probability f[a|r] depends on the payoff u for choosing an action, defined as the difference between the expected outcome variable r and the agent’s expected average payoff, or fundamental valuation of r, which we call μ; thus we can write the payoff function as u[a, r] = r − μ.

If an investor chooses a mixed strategy \(f[a|r]: \mathbb {A} \times \mathbb {R} \rightarrow (0,1)\) to maximize the expected payoff \(\sum _{a} f[a|r]u[a,r]\), then the informational entropy is:

$$\displaystyle \begin{aligned} \mathcal{H}[f[a|r]] = -\sum_{a} f[a|r]\log[f[a|r]] \end{aligned} $$
(7.12)

The entropy maximization program reads:

$$\displaystyle \begin{aligned} \max_{f[a|r]} \quad & -\sum_{a} f[a|r]\log[f[a|r]] \\ \text{subject to} \quad & \sum_{a}f[a|r] = 1 \,, \quad \sum_{a} f[a|r]u[a,r] \geq U_{min} \end{aligned} $$
(7.13)

Here we are imposing a constraint on our uncertainty about f[a|r] for a set of agents, subject to the condition that individuals have a minimum expected payoff for acting (a sort of “satisficing” behavior à la Simon 1955). The associated Lagrangian has the following form:

$$\displaystyle \begin{aligned} \mathcal{L}= -\sum_{a} f[a|r]\log[f[a|r]] - \lambda \left(\sum_{a}f[a|r]-1 \right) + \beta \left( \sum_{a} f[a|r]u[a,r]- U_{min}\right) \end{aligned} $$
(7.14)

The solution to this program gives the maximum entropy distribution, which turns out to be the logit quantal response distribution (with \(\beta =\frac {1}{T}\); see footnote 11):

$$\displaystyle \begin{aligned} f[buy|r]&=\frac{1}{1+e^{\frac{u[a,r]}{T}}} & f[sell|r]&=\frac{1}{1+e^{-\frac{u[a,r]}{T}}} \end{aligned} $$
(7.15)

Our assumption about the agents’ behavior implies that choice decisions are best described as a probabilistic phenomenon, as opposed to the deterministic rational theory of choice, which assumes choices are always associated with probabilities equal to unity. In this sense, an interesting feature of the informational entropy constrained model is that it gives meaning to the observed dispersion of behavior in terms of the relative payoffs of different actions, thus generating an “entropy-constrained behavior”. Interestingly enough, entropy-constrained behavior leads to the logit quantal response distribution without imposing any prior distributional assumption on the errors that affect the decision-making process.
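
As a numerical sketch of Eq. (7.15) (not part of the chapter), the response probabilities can be evaluated directly for the linear payoff u[a, r] = r − μ introduced above; the parameter values are arbitrary.

    import numpy as np

    def quantal_response(r, mu=0.0, T=1.0):
        """Logit quantal response probabilities of Eq. (7.15) with payoff u = r - mu."""
        u = np.asarray(r, dtype=float) - mu
        f_buy = 1.0 / (1.0 + np.exp(u / T))
        f_sell = 1.0 / (1.0 + np.exp(-u / T))
        return f_buy, f_sell

    r = np.linspace(-3, 3, 7)
    f_buy, f_sell = quantal_response(r, mu=0.0, T=1.0)
    print(np.allclose(f_buy + f_sell, 1.0))   # the two response probabilities sum to one
    # As T grows large both probabilities approach 1/2 (pure noise); as T -> 0 the
    # responses become deterministic, recovering the standard rational-choice limit.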

Appendix 2

Model Derivation and Inference

The maximum entropy problem that incorporates the behavioral and feedback constraints on the joint distributions reads as follows:

[Equation (7.16): the maximum entropy program for the joint distribution f[a, r], subject to the normalization, behavioral, and feedback constraints]
(7.16)

To solve this maximum entropy problem, it is convenient to write the joint entropy as the entropy of the marginal distribution plus the average entropy of the conditional distribution and solve for f[r]:

$$\displaystyle \begin{aligned} \mathcal{H}[a,r] =& \ \mathcal{H}[r]+\int f[r]\mathcal{H}_{T,\mu}[r] \, dr\\ =&-\int f[r]\log[f[r]]\,dr-\int f[r]\sum_a f[a|r]\log[f[a|r]] \, dr \end{aligned} $$
(7.17)

where \(\mathcal {H}_{T,\mu }[r]\) denotes the binary entropy function:

$$\displaystyle \begin{aligned} \mathcal{H}_{T,\mu}[r] &= -\sum_{a} f[a|r]\log[f[a|r]] \\&= -\left( \frac{1}{1+e^{-\frac{r-\mu}{2T}}} \log \left[\frac{1}{1+e^{-\frac{r-\mu}{2T}}}\right] + \frac{1}{1+e^{\frac{r-\mu}{2T}}} \log\left[\frac{1}{1+e^{\frac{r-\mu}{2T}}}\right] \right) \end{aligned} $$

The final maximum entropy program reads:

$$\displaystyle \begin{aligned} \max_{f[r]} \quad & \mathcal{H}[r] + \int f[r]\mathcal{H}_{T,\mu}[r]\,dr \\ \text{subject to} \quad & \int f[r]\,dr = 1 \,, \quad \int \tanh\left[\frac{r-\mu}{2T}\right]\left(\frac{r-\alpha}{S}\right) f[r]\,dr \leq \delta \end{aligned} $$
(7.18)

This programming problem has the following associated Lagrangian:

$$\displaystyle \begin{aligned} \mathcal{L}[f[r],\lambda,\gamma] &= \mathcal{H}[r] + \int f[r]\mathcal{H}_{T,\mu}[r]\,dr - \lambda \left(\int f[r]\,dr - 1\right) \\ & \quad - \left(\int \tanh \left[\frac{r-\mu}{2T}\right] \left(\frac{r-\alpha}{S}\right) f[r] \,dr - \delta \right) \end{aligned} $$
(7.19)

The first-order conditions for maximizing entropy of the joint and conditional frequencies require:

$$\displaystyle \begin{aligned} \frac{\partial{\mathcal{L}}}{\partial{f[r]}} = -\log[f[r]]-1-\lambda+\mathcal{H}_{T,\mu}[r]-\tanh\left[\frac{r-\mu}{2T}\right]\left(\frac{r-\alpha}{S}\right) = 0 \end{aligned} $$
(7.20)
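
Solving Eq. (7.20) for f[r] gives \(f[r] = e^{-(1+\lambda )}\,e^{\mathcal {H}_{T,\mu }[r] - \tanh \left [\frac {r-\mu }{2T}\right ]\left (\frac {r-\alpha }{S}\right )}\), and the normalization constraint fixes the constant \(e^{-(1+\lambda )}\) as the reciprocal of the integral of the exponential term.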

The solution to this maximum entropy problem gives the most probable distribution of outcomes, that is, the marginal distribution \(\hat {f}[r]\), which satisfies the constraints and has the following form:

$$\displaystyle \begin{aligned} \hat{f}[r] = \frac{e^{\mathcal{H}_{T,\mu}[r] - \tanh\left[\frac{r-\mu}{2T}\right]\left(\frac{r-\alpha}{S}\right)}}{\int e^{\mathcal{H}_{T,\mu}[r] - \tanh\left[\frac{r-\mu}{2T}\right]\left(\frac{r-\alpha}{S}\right)}\, dr} \end{aligned} $$
(7.21)

As we can see, in the marginal distribution we introduced another parameter, S, which represents the market temperature (scale) of the feedback constraint.
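
The following sketch (not part of the chapter) evaluates the kernel of Eq. (7.21) on a grid of returns and normalizes it numerically, using the expression for \(\mathcal {H}_{T,\mu }[r]\) given above; all parameter values are purely illustrative.

    import numpy as np

    def binary_entropy(r, mu, T):
        """Binary entropy H_{T,mu}[r] of the conditional action frequencies."""
        p = 1.0 / (1.0 + np.exp(-(r - mu) / (2.0 * T)))   # one branch of f[a|r]
        p = np.clip(p, 1e-15, 1.0 - 1e-15)                # keep log() finite at the tails
        q = 1.0 - p                                       # the complementary branch
        return -(p * np.log(p) + q * np.log(q))

    def predicted_density(r, mu, T, alpha, S):
        """Kernel of Eq. (7.21) evaluated on the grid r and normalized numerically."""
        kernel = np.exp(binary_entropy(r, mu, T)
                        - np.tanh((r - mu) / (2.0 * T)) * (r - alpha) / S)
        return kernel / (kernel.sum() * (r[1] - r[0]))    # so the density integrates to ~1

    r = np.linspace(-0.2, 0.2, 401)                       # grid of daily returns
    f_hat = predicted_density(r, mu=0.0, T=0.01, alpha=0.0, S=0.02)
    print(f_hat.sum() * (r[1] - r[0]))                    # ~ 1.0 after normalization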

The predicted marginal distribution \(\hat {f}[r]\) from Eq. (7.10) is a kernel of the maximum entropy program and, together with the parameters μ, T, γ, α, and S, provides a multinomial distribution for the model. From a Bayesian perspective, the empirical marginal distribution \(\bar {f}[r]\) can be thought of as a sample from a multinomial model with frequencies f[r] determined by Eq. (7.10). We use the Kullback-Leibler (KL) divergence as an approximation to the log posterior probability for the multinomial model, since it allows us to make posterior inferences about the parameter estimates (Scharfenaker and Foley 2017, p. 15).

The KL divergence measures the discrepancy between the empirical marginal frequencies \(\bar {f}[r]\) and the predicted marginal frequencies \(\hat {f}[r;\mu ,T,\alpha ,S]\) inferred from the maximum entropy kernel:

$$\displaystyle \begin{aligned} D_{KL}[\hat{f}[r]||\bar{f}[r]] = \sum{\hat{f}[r] \log \left[\frac{\hat{f}[r]}{\bar{f}[r]}\right]} \end{aligned} $$
(7.22)

As a result, the KL divergence provides us with a tool to compare the observed marginal frequency distribution with the predicted one. If \(\hat {f}[r]=\bar {f}[r]\), then \(D_{KL}\) is zero, indicating that the two distributions coincide. Therefore, the smaller the KL divergence, the closer the observed distribution is to the predicted one, and the better the fit.
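
A minimal sketch of Eq. (7.22) for two binned frequency vectors follows; restricting the sum to bins where both frequencies are positive is an implementation choice, not something specified in the text.

    import numpy as np

    def kl_divergence(f_hat, f_bar):
        """D_KL[f_hat || f_bar] of Eq. (7.22), summed over bins with positive frequencies."""
        f_hat = np.asarray(f_hat, dtype=float)
        f_bar = np.asarray(f_bar, dtype=float)
        mask = (f_hat > 0) & (f_bar > 0)
        return np.sum(f_hat[mask] * np.log(f_hat[mask] / f_bar[mask]))

    print(kl_divergence([0.5, 0.5], [0.5, 0.5]))   # 0.0 when the distributions coincide
    print(kl_divergence([0.7, 0.3], [0.5, 0.5]))   # > 0: the divergence grows with the mismatch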

The set of parameters θ = {μ, T, α, S} is estimated jointly by minimizing the KL divergence (see footnote 12). To measure the closeness of the model fit, we use the information distinguishability (ID) criterion introduced by Soofi and Retzer (2002). The ID measure shows approximately how much of the informational content of the observed frequencies is captured by the results of the maximum entropy program and is defined as follows:

$$\displaystyle \begin{aligned} ID[\hat{f}[r]:\bar{f}[r]] = 1 - e^{-D_{KL}[\hat{f}[r]||\bar{f}[r]]} \end{aligned} $$
(7.23)
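
The estimation step can be sketched as follows, reusing the predicted_density and kl_divergence helpers from the sketches above; the data are simulated, the binning is an arbitrary choice, and only the BFGS method and the initial values μ = 0, T = 1, α = 0, S = 1 come from footnote 12.

    import numpy as np
    from scipy.optimize import minimize

    def fit_parameters(returns, bins=200):
        """Estimate theta = (mu, T, alpha, S) by minimizing the KL divergence with BFGS."""
        counts, edges = np.histogram(returns, bins=bins)
        f_bar = counts / counts.sum()                    # empirical bin frequencies
        r = 0.5 * (edges[:-1] + edges[1:])               # bin midpoints
        dr = edges[1] - edges[0]                         # common bin width

        def objective(theta):
            mu, T, alpha, S = theta
            if T <= 0 or S <= 0:                         # keep the two scale parameters positive
                return np.inf
            f_hat = predicted_density(r, mu, T, alpha, S) * dr   # density -> bin frequencies
            return kl_divergence(f_hat, f_bar)

        theta0 = np.array([0.0, 1.0, 0.0, 1.0])          # mu=0, T=1, alpha=0, S=1 (footnote 12)
        result = minimize(objective, theta0, method="BFGS")
        d_kl = result.fun
        info_dist = 1.0 - np.exp(-d_kl)                  # ID measure of Eq. (7.23)
        return result.x, d_kl, info_dist

    # Illustrative run on simulated (not actual S&P 500) daily returns:
    rng = np.random.default_rng(0)
    theta_hat, d_kl, info_dist = fit_parameters(rng.laplace(scale=0.01, size=10_000))
    print(theta_hat, d_kl, info_dist)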

Finally, to make posterior inferences about the vector of the model parameters θ, we compute the conditional distribution of each parameter holding all others at their maximum posterior probability estimate \(\bar {\theta }\). Following Scharfenaker and Foley (2017), we approximate the conditional posterior probability of the parameters as follows:

$$\displaystyle \begin{aligned} P[i|\bar{\theta}_{-i}] \sim e^{-nD_{KL}[\hat{f}[r;\,\mu,\,T,\,\alpha,\,S]\,||\,\bar{f}[r]]} \,, \end{aligned} $$
(7.24)

where i denotes each element in the parameter set and n the number of observations. In the case of ζ, we vary only μ by holding \(\bar {\theta }_{-\mu }\) at their maximum posterior estimates.
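
Finally, a sketch of the posterior approximation in Eq. (7.24) for the parameter μ, holding the remaining parameters at their point estimates; it again reuses the hypothetical helpers defined in the earlier sketches.

    import numpy as np

    def conditional_posterior_mu(returns, theta_hat, mu_grid, bins=200):
        """Approximate P[mu | theta_{-mu}] ~ exp(-n * D_KL), as in Eq. (7.24)."""
        n = len(returns)
        counts, edges = np.histogram(returns, bins=bins)
        f_bar = counts / counts.sum()
        r = 0.5 * (edges[:-1] + edges[1:])
        dr = edges[1] - edges[0]
        _, T, alpha, S = theta_hat                        # hold T, alpha, and S at their estimates
        log_post = np.array([
            -n * kl_divergence(predicted_density(r, mu, T, alpha, S) * dr, f_bar)
            for mu in mu_grid
        ])
        post = np.exp(log_post - log_post.max())          # subtract the maximum for stability
        return post / post.sum()                          # normalized over the grid of mu values

    # Example (continuing the illustrative run above, with `returns` the simulated sample):
    # posterior = conditional_posterior_mu(returns, theta_hat, np.linspace(-0.01, 0.01, 101))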

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Citera, E. (2022). What Can We Learn from Statistical Regularities in Stock Returns? Insights from An Entropy-Constrained Framework. In: Ben Ameur, H., Ftiti, Z., Louhichi, W., Prigent, J.-L. (eds) Crises and Uncertainty in the Economy. Springer, Singapore. https://doi.org/10.1007/978-981-19-3296-0_7
