How to combine a billion alphas

Kakushadze, Zura; Yu, Willie

doi:10.1057/s41260-016-0004-9

Original Article
Published: 14 July 2016

Volume 18, pages 64–80 (2017)
Journal of Asset Management

Abstract

We give an explicit algorithm and source code for computing optimal weights for combining a large number N of alphas. This algorithm does not cost \({\mathcal {O}}(N^3)\) or even \({\mathcal {O}}(N^2)\) operations but is much cheaper; in fact, the number of required operations scales linearly with N. We discuss how, in the absence of binary or quasi-binary “clustering” of alphas (which is not observed in practice), the optimization problem simplifies when N is large. Our algorithm does not require computing principal components or inverting large matrices, nor does it require iterations. The number of risk factors it employs, which typically is limited by the number of historical observations, can be sizably enlarged by using position data for the underlying tradables.

Notes

Here “alpha”, following common trader lingo, generally means any reasonable “expected return” that one may wish to trade on; it is not necessarily the same as the “academic” alpha. In practice, detailed information about how alphas are constructed may not even be available; e.g., the only available data could be position data, in which case an “alpha” is a set of instructions to achieve certain stock (or other instrument) holdings by some times \(t_1,t_2,\dots \).
In the olden days the alpha weights would have to be nonnegative. In many practical applications this is no longer the case as only the “mega-alpha” is traded, not the individual alphas.
With no position bounds, trading costs, etc. – these do not affect the point we make here.
By binary “clusters” we mean that each alpha would belong to one and only one “cluster”. By quasi-binary clusters we mean that this would be mostly the case but a (small) fraction of alphas could possibly belong to multiple (at most several) “clusters”. This is analogous to binary and quasi-binary (i.e., where we have some conglomerates belonging to multiple industries, sub-industries, etc., depending on the naming conventions) industry classifications in the case of stocks.
There is also the issue of stability. Stocks rarely, if ever, jump industries, which renders well-constructed industry classifications quite stable. Alphas, however, are ephemeral objects and therefore make poor candidates for classification into any kind of stable binary or quasi-binary “clusters”.
More precisely, we can do that, but in the 0th – and very good – approximation we need not.
R Package for Statistical Computing, http://www.r-project.org. The source code given in Appendix 1 is not written to be “fancy” or optimized for speed or in any other way. Its sole purpose is to illustrate the algorithms described in the main text in a simple-to-understand fashion. Some legalese relating to this code is given in Appendix 2.
The overall normalization of \(C_{ij}\) does not affect the weights \(w_i\) in (1), so the difference between the unbiased estimate with M in the denominator vs. the maximum likelihood estimate with \(M+1\) in the denominator is immaterial for our purposes here. Also, in most applications \(M \gg 1\).
For equity multifactor models, see, e.g., Grinold and Kahn (2000) and references therein. A multifactor model approach for alphas was set forth and discussed in detail in Kakushadze (2014).
Without the intercept unless it is subsumed in a linear combination of the columns of \({\widetilde{\beta }}_{iA}\).
In shrinkage, when translated into our language here, one simply takes \({\widetilde{\phi }}_{ss^\prime } = \left( 1-\zeta \right) \phi _{ss^\prime }\) and (“shrinkage target”) \(\Delta _{ij} = \zeta ~\Gamma _{ij}\), where the weight (“shrinkage constant”) \(0\le \zeta \le 1\), and \(\Gamma _{ij}\) (usually chosen as a diagonal matrix or a K-factor model with low K) is such that \(\Gamma _{ii}= C_{ii}\).
This is ZK’s own translation of an aphorism from a stanza in Rustaveli’s sole known epic poem, whose title is erroneously translated as “The Knight in the Panther’s Skin” (or similar). In ZK’s humble opinion, it is not only the greatest masterpiece of Georgian literature but also one of the greatest literary works of all time. It consists of over 1600 perfectly rhymed shairi, or Rustavelian quatrains, with every line containing \(16 = 8 + 8\) syllables and a caesura between the 8th and 9th syllables. How a human brain can come up with such perfection is mind-boggling, especially considering that this poem tells an extremely complex story complete with dialogs, aphorisms, etc.
Nontrivial algorithms are required to ensure that all \(\xi _i^2\) so computed are positive and consistent with FCM. Such algorithms and source code are given in Kakushadze and Yu (2016a) (see below).
This is shrinkage with a diagonal “shrinkage target” (see footnote 11).
As in Ledoit and Wolf (2004).
To wit, the M columns in \(X_{is}\), \(s=1,\dots ,M\), plus a single column equal to \(\sigma _i\). Usually, there is a high correlation between the latter and a linear combination of the former (see below). In fact, we will argue below that the factor proportional to \(\sigma _i\) should be taken out altogether, i.e., removed from the factor loadings matrix \({\widehat{\Omega }}_{i\alpha }\), irrespective of how the latter is constructed (see "A Refinement" section).
Such a classification has a factor loadings matrix (or a subset of its columns) of the form \(\Omega _{iA} = \omega _i~\delta _{G(i), A}\), where \(G:\{1,\dots ,N\}\rightarrow \{1,\dots ,K\}\) maps our N returns to K “clusters”. Note, however, that the “weights” \(\omega _i\) [not to be confused with the portfolio weights \(w_i\) in (1)] can be arbitrary (including negative) and need not be proportional to the unit N-vector \(u_i\equiv 1\).
Recall that \({\widetilde{C}}_{ii} = C_{ii}\), and \(C_{ij} = \sigma _i\sigma _j\sum _{a=1}^M \lambda ^{(a)}~V_i^{(a)}~V_j^{(a)}\).
Note that \(\xi _i^2/\zeta \sigma _i^2 = 1 - \sum _{a=1}^K \lambda ^{(a)}~[V_i^{(a)}]^2\); the \(a>1\) terms are weighted by smaller eigenvalues.
This costs \({\mathcal {O}}(n_{iter}MN)\) operations, where the number of iterations \(n_{iter} \gg K\). As K increases, at some point \(n_{tot}\gtrsim M\) and it makes more sense to use the next method.
This costs \({\mathcal {O}}(M^2N)\) operations.
I.e., in the 0th approximation \(\xi _i^2\) based on principal components are still close to rescaled \(C_{ii}\).
Assuming momentum is positive; otherwise, we can use momentum over volatility, so its distribution is not too skewed. If momentum equals realized return, then this is the Sharpe ratio.
However, capacity is difficult to implement and it is unclear if it adds value.
Following Kakushadze (2015d), we define \(\nu _i = \ln (\sigma _i/\mu )\), where \(\mu \) is such that \(\nu _i\) has zero mean. We define three symmetric tensor combinations \(x_{ij}=u_i u_j\), \(y_{ij}=u_i\nu _j + u_j\nu _i\), and \(z_{ij}=\nu _i\nu _j\) (\(u_i\equiv 1\) is the unit N-vector). We further define a composite index \(\{a\}=\{(i,j)|i>j\}\), which takes \(L=N(N-1)/2\) values, i.e., we pull the off-diagonal lower-triangular elements of a general symmetric matrix \(G_{ij}\) into a vector \(G_a\). This way we can construct four L-vectors \(\Psi _a\), \(x_a\), \(y_a\) and \(z_a\). Now we can run a linear regression of \(\Psi _a\) over \(x_a\), \(y_a\) and \(z_a\). Note that \(x_a \equiv 1\) is simply the intercept (the unit L-vector), so this is a regression of \(\Psi _a\) over \(y_a\) and \(z_a\) with the intercept. The results, based on the same data as in Kakushadze (2015d), are summarized in Table 1 and Figure 1, confirming our conclusion above.
This does not necessarily mean that “hockey-stick” alphas (i.e., those that have performed well in the past but have “flat-lined”) are not used in constructing a portfolio of alphas.
If we replace volatility by momentum, defined as the realized return over the entire period of the data sample in Kakushadze (2015d), the log of momentum also turns out to be a poor predictor of pair-wise correlations. Table 2 and Figure 2 summarize the regression results.
See, e.g., Kakushadze (2015c) or Kakushadze and Yu (2016a) for a more detailed discussion.
Many alphas can be more ephemeral than that.
The overall normalization factor is immaterial and included for aesthetic reasons.
The distribution of \(\sigma _i\) is skewed and roughly log-normal, and so is that of \(E_i\), so the distribution of \({\widetilde{E}}_i\) is not very skewed and is roughly normal; see Kakushadze and Tulchinsky (2016).
See, e.g., Bouchaud and Potters (2011), which reviews applications of random matrix theory to modeling a sample correlation matrix for equities, and references therein.
Here we deliberately use slightly different notations than above.
As before, \(i=1,\dots ,N\) labels the alphas; \(s=1,\dots ,M+1\) labels the times \(t_s\).
Their normalization is immaterial in what follows.
This step removes the “overall” mode and can be skipped if so desired. Then we would also skip the next step 7) below.
Without the intercept.
Which can be, e.g., (17) or a union thereof with \(Y_{is}\) defined in step 5) above (with any linearly dependent columns removed).
The fact that \(\Upsilon _{ss^\prime }\) has a factor form does not help as \(N \gg M\), in fact, in practice \(N\gg M^2\).
We can keep step (6) above; alternatively, we can simply drop the first principal component, which will give a slightly different set of \(w_i\).
However, not all stocks in the trading universe may be optionable and have implied volatilities readily available. Also, it is unclear whether implied volatilities add value (Kakushadze, 2015c).
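The note above on the normalization of \(C_{ij}\) can be checked numerically. The following is a minimal Python/NumPy sketch. It assumes that (1) is the usual mean-variance solution \(w_i \propto \sum_j C^{-1}_{ij} E_j\), followed by the normalization \(\sum_i |w_i| = 1\) used in Appendix 1. The function name and the random data are illustrative only.

```python
import numpy as np

def normalized_weights(cov, expected):
    """w proportional to C^{-1} E, rescaled so that sum(|w|) = 1 (assumed form of (1))."""
    w = np.linalg.solve(cov, expected)
    return w / np.abs(w).sum()

rng = np.random.default_rng(0)
N, M = 5, 40
returns = rng.normal(size=(N, M + 1))                  # N alphas, M + 1 observations
demeaned = returns - returns.mean(axis=1, keepdims=True)
cov_unbiased = demeaned @ demeaned.T / M               # M in the denominator
cov_ml = demeaned @ demeaned.T / (M + 1)               # M + 1 in the denominator
expected = rng.normal(size=N)                          # hypothetical expected returns E_i

w_unbiased = normalized_weights(cov_unbiased, expected)
w_ml = normalized_weights(cov_ml, expected)
print(np.allclose(w_unbiased, w_ml))                   # the two sets of weights coincide
```

Rescaling \(C_{ij}\) by a constant rescales the unnormalized weights by its inverse, and the final normalization removes that factor.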
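Translated into matrix form, the shrinkage note above amounts to \({\widetilde{C}}_{ij} = (1-\zeta )\,C_{ij} + \zeta \,\Gamma _{ij}\). Below is a minimal sketch with the diagonal target \(\Gamma _{ij} = C_{ii}\,\delta _{ij}\), which is one of the choices the note mentions. It uses random data with \(N > M\). The sketch shows that the diagonal is preserved and that the shrunk matrix is invertible even though the sample covariance matrix is singular. The function name is hypothetical.

```python
import numpy as np

def shrink_to_diagonal(cov, zeta):
    """(1 - zeta) * C + zeta * Gamma, with the diagonal target Gamma_ii = C_ii."""
    if not 0.0 <= zeta <= 1.0:
        raise ValueError("shrinkage constant zeta must lie in [0, 1]")
    gamma = np.diag(np.diag(cov))                 # diagonal shrinkage target
    return (1.0 - zeta) * cov + zeta * gamma

rng = np.random.default_rng(1)
N, M = 50, 20                                     # more alphas than observations
x = rng.normal(size=(N, M + 1))
x -= x.mean(axis=1, keepdims=True)                # serially demeaned returns
cov = x @ x.T / M                                 # rank at most M < N, hence singular
shrunk = shrink_to_diagonal(cov, zeta=0.3)
print(np.linalg.matrix_rank(cov), np.linalg.matrix_rank(shrunk))
```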
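The binary-classification loadings \(\Omega _{iA} = \omega _i~\delta _{G(i), A}\) from the note above can be built directly from the map G. The minimal sketch below uses 0-based cluster labels instead of \(1,\dots ,K\). The assignment and the weights \(\omega _i\) (arbitrary, including negative) are made-up illustrative values.

```python
import numpy as np

def binary_cluster_loadings(cluster_of, omega, n_clusters):
    """Omega_{iA} = omega_i * delta_{G(i), A}; cluster_of[i] = G(i) uses 0-based labels."""
    cluster_of = np.asarray(cluster_of)
    n = cluster_of.shape[0]
    loadings = np.zeros((n, n_clusters))
    loadings[np.arange(n), cluster_of] = omega   # exactly one nonzero entry per row
    return loadings

G = [0, 2, 1, 0, 2]                              # hypothetical assignment of N = 5 alphas to K = 3 clusters
omega = np.array([0.5, -1.2, 2.0, 1.0, 0.3])     # arbitrary weights; negatives are allowed
Omega = binary_cluster_loadings(G, omega, n_clusters=3)
print(Omega)
```

A quasi-binary classification would allow a few rows with more than one nonzero entry, which this construction does not cover.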
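The ratio \(\xi _i^2/\zeta \sigma _i^2 = 1 - \sum _{a=1}^K \lambda ^{(a)}~[V_i^{(a)}]^2\) from the notes above can be evaluated once the eigenpairs of the sample correlation matrix \(\Psi _{ij} = C_{ij}/\sigma _i\sigma _j\) are known, consistent with \(C_{ij} = \sigma _i\sigma _j\sum _{a=1}^M \lambda ^{(a)}~V_i^{(a)}~V_j^{(a)}\). The minimal sketch below assumes that a full eigendecomposition via numpy.linalg.eigh is affordable. For large N this is exactly the expensive step the paper avoids. The data are synthetic.

```python
import numpy as np

def specific_variance_ratio(corr, k):
    """1 - sum over the top-K eigenpairs of lambda_a * V_ia^2, for a correlation matrix."""
    eigvals, eigvecs = np.linalg.eigh(corr)          # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]              # indices of the K largest eigenvalues
    return 1.0 - (eigvecs[:, top] ** 2) @ eigvals[top]

rng = np.random.default_rng(2)
N, M = 30, 10
x = rng.normal(size=(N, M + 1))
x -= x.mean(axis=1, keepdims=True)
sigma = np.sqrt((x ** 2).sum(axis=1) / M)
normed = x / sigma[:, None]
corr = normed @ normed.T / M                         # Psi_ii = 1 by construction
ratio = specific_variance_ratio(corr, k=2)           # xi_i^2 / (zeta * sigma_i^2)
```

Because \(\Psi _{ii} = 1\) equals the sum over all eigenpairs, the ratio is the contribution of the remaining \(a > K\) terms, which carry the smaller eigenvalues.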
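The operation counts in the notes above (\({\mathcal {O}}(n_{iter}MN)\) for the iterative method) rely on never forming the \(N\times N\) matrix explicitly. The minimal sketch below illustrates the iterative route using plain power iteration, the method of von Mises and Pollaczek-Geiringer listed in the references. It computes only the leading eigenpair of \(\Psi = YY^T/M\) for an \(N\times M\) matrix Y. Further components would need deflation or a block variant, which is not shown. The function name and the synthetic one-factor data are assumptions.

```python
import numpy as np

def top_eigenpair(y, n_iter=500, seed=0):
    """Power iteration for the leading eigenpair of Psi = y @ y.T / M, with y of shape N x M.

    Each step costs O(M N) because Psi is applied as y @ (y.T @ v).
    """
    n, m = y.shape
    v = np.random.default_rng(seed).normal(size=n)
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = y @ (y.T @ v) / m                  # apply Psi without building it
        v /= np.linalg.norm(v)
    lam = v @ (y @ (y.T @ v)) / m              # Rayleigh quotient of the unit vector v
    return lam, v

rng = np.random.default_rng(3)
N, M = 200, 30
factor = rng.normal(size=M)
y = np.outer(np.ones(N), factor) + 0.5 * rng.normal(size=(N, M))   # strong common mode
lam, v = top_eigenpair(y)
```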
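The regression of \(\Psi _a\) over \(x_a\), \(y_a\) and \(z_a\) described in the notes above can be set up with lower-triangular indexing. In the minimal sketch below, the composite index \(a = (i,j)\), \(i > j\), is produced by numpy.tril_indices. The check runs on synthetic \(\Psi _{ij}\) built from known coefficients, so the fit is exact by construction. It says nothing about real alphas, for which the notes report that these variables are poor predictors. The function name and the coefficients are hypothetical.

```python
import numpy as np

def correlation_vs_log_vol(psi, sigma):
    """Regress Psi_a (i > j) on x_a = 1, y_a = nu_i + nu_j, z_a = nu_i * nu_j."""
    nu = np.log(sigma)
    nu -= nu.mean()                                  # nu_i = ln(sigma_i / mu) with zero mean
    i, j = np.tril_indices(len(sigma), k=-1)         # composite index a = (i, j), i > j
    psi_a = psi[i, j]
    design = np.column_stack([np.ones_like(psi_a), nu[i] + nu[j], nu[i] * nu[j]])
    coef, *_ = np.linalg.lstsq(design, psi_a, rcond=None)
    resid = psi_a - design @ coef
    centered = psi_a - psi_a.mean()
    r_squared = 1.0 - (resid @ resid) / (centered @ centered)
    return coef, r_squared

rng = np.random.default_rng(4)
sigma = np.exp(rng.normal(size=40))                  # roughly log-normal volatilities
nu = np.log(sigma) - np.log(sigma).mean()
psi = 0.1 + 0.05 * (nu[:, None] + nu[None, :]) - 0.02 * np.outer(nu, nu)  # synthetic
np.fill_diagonal(psi, 1.0)
coef, r2 = correlation_vs_log_vol(psi, sigma)
```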

References

Bouchaud, J.-P. and Potters, M. (2011) Financial applications of random matrix theory: a short review. In: G. Akemann, J. Baik and P. Di Francesco (eds.) The Oxford Handbook of Random Matrix Theory, Oxford, United Kingdom: Oxford University Press.
Grinold, R.C. and Kahn, R.N. (2000) Active Portfolio Management, New York, NY: McGraw-Hill.
Kakushadze, Z. (2014) Factor models for alpha streams. The Journal of Investment Strategies 4(1):83–109.
Kakushadze, Z. (2015a) Combining alpha streams with costs. The Journal of Risk 17(3):57–78.
Kakushadze, Z. (2015b) Combining alphas via bounded regression. Risks 3(4):474–490.
Kakushadze, Z. (2015c) Heterotic risk models. Wilmott Magazine 2015(80):40–55.
Kakushadze, Z. (2015d) 101 formulaic alphas. Wilmott Magazine (forthcoming). http://ssrn.com/abstract=2701346.
Kakushadze, Z. (2016) Shrinkage = factor model. Journal of Asset Management 17(2):69–72.
Kakushadze, Z. and Tulchinsky, I. (2016) Performance vs. turnover: a story by 4,000 alphas. The Journal of Investment Strategies 5(2):75–89.
Kakushadze, Z. and Yu, W. (2016a) Multifactor risk models and heterotic CAPM. The Journal of Investment Strategies 5(4) (forthcoming). http://ssrn.com/abstract=2722093.
Kakushadze, Z. and Yu, W. (2016b) Statistical risk models. Working Paper. http://ssrn.com/abstract=2732453.
Ledoit, O. and Wolf, M. (2004) Honey, I shrunk the sample covariance matrix. The Journal of Portfolio Management 30(4):110–119.
Markowitz, H. (1952) Portfolio selection. The Journal of Finance 7(1):77–91.
Mises, R.V. and Pollaczek-Geiringer, H. (1929) Praktische Verfahren der Gleichungsauflösung. ZAMM – Zeitschrift für Angewandte Mathematik und Mechanik 9(2):152–164.
Sharpe, W.F. (1994) The Sharpe ratio. The Journal of Portfolio Management 21(1):49–58.

Author information

Authors and Affiliations

Quantigic® Solutions LLC, 1127 High Ridge Road #135, Stamford, CT, 06905, USA
Zura Kakushadze
Business School & School of Physics, Free University of Tbilisi, 240, David Agmashenebeli Alley, 0159, Tbilisi, Georgia
Zura Kakushadze
Duke-NUS Medical School, Centre for Computational Biology, 8 College Road, Singapore, 169857, Singapore
Willie Yu

Corresponding author

Correspondence to Zura Kakushadze.

Additional information

Disclaimer: The address linked to the corresponding author is used by him for no purpose other than to indicate his professional affiliation as is customary in publications. In particular, the contents of this paper are not intended as an investment, legal, tax or any other such advice, and in no way represent views of Quantigic\(^\circledR \) Solutions LLC, the website www.quantigic.com or any of their other affiliates.

Appendices

R Code for Alpha Weights

In this appendix we give the R source code for calculating the alpha weights based on a regression. The code below is essentially self-explanatory and straightforward as it simply follows the algorithm and formulas in "Summary of Regression Procedure" section. It consists of a single function calc.opt.weights(e.r, ret, y = 0, s = 0, rm.overall = T); e.r is an N-vector of expected returns we wish to optimize; N is the number of the underlying returns (e.g., alphas); ret is an \(N\times (M+1)\) matrix of returns; \(M+1\) is the number of data points in the time series (e.g., days); y is an \(N\times K\) factor loadings matrix \({\widetilde{\Omega }}_{iA}\), \(A=1,\dots ,K\), pre-computed, e.g., via (17); otherwise, if the default y = 0 is used, the code computes the factor loadings matrix \(Y_{is}\) (\(s=1,\dots ,M-1\) or \(s=1,\dots ,M\) depending on whether rm.overall = T or rm.overall = F – see below) based on the time series ret via the algorithm of "Summary of Regression Procedure" section; s is an N-vector of specific risks \({\widetilde{\xi }}_i\) pre-computed, e.g., via (15) or (16); otherwise, if the default s = 0 is used, the code computes s as the square root of the sample variances; rm.overall, if TRUE (default), implies that the “overall” mode is taken out; otherwise, it is kept (see "Summary of Regression Procedure" section). The output is an N-vector \(w_i\) of the optimized alpha weights normalized such that \(\sum _{i=1}^N\left| w_i\right| = 1\). The code can be easily modified, e.g., to combine (via cbind()) the pre-computed factor-loadings matrix \({\widetilde{\Omega }}_{iA}\) as in (17) with the factor loadings matrix \(Y_{is}\) it already computes based on the time series ret.
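The R code itself is not reproduced in this section. As a rough illustration only, here is a Python/NumPy sketch of the procedure as we read it from the description above and the notes. The steps are:

- Serially demean the returns.
- Take \(\sigma _i\) as the square root of the sample variance, with M in the denominator.
- Use the normalized returns \(X_{is}/\sigma _i\) (first M columns) as regressors.
- Optionally remove the “overall” mode by cross-sectional demeaning and keep M−1 columns.
- Regress \({\widetilde{E}}_i = E_i/\sigma _i\) on the regressors without an intercept.
- Set \(w_i \propto \varepsilon _i/\sigma _i\) with \(\sum _i |w_i| = 1\).

The last step (dividing the residuals \(\varepsilon _i\) by \(\sigma _i\)) is our assumption, motivated by the large-factor-variance limit of \(C^{-1}E\). The function names are hypothetical, and this is not the Quantigic\(^\circledR \) code.

```python
import numpy as np

def time_series_regressors(ret, rm_overall=True):
    """Normalized regressors from an N x (M+1) return matrix (our reading of the steps)."""
    m = ret.shape[1] - 1
    x = ret - ret.mean(axis=1, keepdims=True)        # serially demeaned returns X_is
    sigma = np.sqrt((x ** 2).sum(axis=1) / m)        # sample volatilities sigma_i
    y = (x / sigma[:, None])[:, :m]                  # keep the first M columns
    if rm_overall:
        y = y - y.mean(axis=0, keepdims=True)        # cross-sectional demeaning ("overall" mode)
        y = y[:, : m - 1]                            # M - 1 columns, as in the R code's description
    return y, sigma

def regression_weights(e_r, regressors, xi):
    """w_i proportional to eps_i / xi_i; eps are residuals of E_i / xi_i on the regressors."""
    target = e_r / xi                                # normalized expected returns
    coef, *_ = np.linalg.lstsq(regressors, target, rcond=None)   # no intercept
    eps = target - regressors @ coef
    w = eps / xi
    return w / np.abs(w).sum()                       # normalization sum_i |w_i| = 1

rng = np.random.default_rng(5)
N, M = 1000, 20
ret = 0.01 * rng.normal(size=(N, M + 1))             # synthetic returns
e_r = 0.001 * rng.normal(size=N)                     # hypothetical expected returns
Y, sigma = time_series_regressors(ret)
w = regression_weights(e_r, Y, sigma)
```

Each least-squares solve costs \({\mathcal {O}}(M^2N)\) operations, so the cost grows linearly with N. To use a precomputed loadings matrix \({\widetilde{\Omega }}_{iA}\) and specific risks \({\widetilde{\xi }}_i\) instead, one would (again as an assumption) pass \({\widetilde{\Omega }}_{iA}/{\widetilde{\xi }}_i\) as the regressors and \({\widetilde{\xi }}_i\) as xi.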

Disclaimers

Wherever the context so requires, the masculine gender includes the feminine and/or neuter, and the singular form includes the plural and vice versa. The author of this paper (“Author”) and his affiliates including without limitation Quantigic\(^\circledR \) Solutions LLC (“Author’s Affiliates” or “his Affiliates”) make no implied or express warranties or any other representations whatsoever, including without limitation implied warranties of merchantability and fitness for a particular purpose, in connection with or with regard to the content of this paper including without limitation any code or algorithms contained herein (“Content”).

The reader may use the Content solely at his/her/its own risk and the reader shall have no claims whatsoever against the Author or his Affiliates and the Author and his Affiliates shall have no liability whatsoever to the reader or any third party whatsoever for any loss, expense, opportunity cost, damages or any other adverse effects whatsoever relating to or arising from the use of the Content by the reader including without any limitation whatsoever: any direct, indirect, incidental, special, consequential or any other damages incurred by the reader, however caused and under any theory of liability; any loss of profit (whether incurred directly or indirectly), any loss of goodwill or reputation, any loss of data suffered, cost of procurement of substitute goods or services, or any other tangible or intangible loss; any reliance placed by the reader on the completeness, accuracy or existence of the Content or any other effect of using the Content; and any and all other adversities or negative effects the reader might encounter in using the Content irrespective of whether the Author or his Affiliates is or are or should have been aware of such adversities or negative effects.

The R code included in Appendix 1 hereof is part of the copyrighted R code of Quantigic\(^\circledR \) Solutions LLC and is provided herein with the express permission of Quantigic\(^\circledR \) Solutions LLC. The copyright owner retains all rights, title and interest in and to its copyrighted source code included in Appendix 1 hereof and any and all copyrights therefor.

About this article

Cite this article

Kakushadze, Z., Yu, W. How to combine a billion alphas. J Asset Manag 18, 64–80 (2017). https://doi.org/10.1057/s41260-016-0004-9

Issue Date: January 2017
DOI: https://doi.org/10.1057/s41260-016-0004-9