Abstract
The information coefficient (IC), defined as the correlation coefficient between a stock's return and its factor exposures (the predictor variables), is one of the most commonly used statistics in quantitative financial analysis. In this paper, we establish consistency and asymptotic normality of the time series average of cross-sectional sample ICs when the true underlying ICs between the risk-adjusted residual return and the standardized factor exposures are time varying. We use those results to show that the time series average of the cross-sectional sample ICs divided by its sample standard deviation converges to the ex ante expected portfolio information ratio (IR) as derived in Ding and Martin (2017). A simulation study based on a true factor model shows that the finite sample results are strikingly close to what the theory suggests. We also conduct empirical simulations using actual stock returns and quantitative factor exposures, and we find that the logarithm of the estimated IR can be explained very well by a function of the IC mean, the IC standard deviation, and the sample size in exactly the same way as predicted by our theory built on a linear factor model with time varying ICs.
Introduction
The information coefficient (IC) is one of the most commonly used statistics in quantitative financial analysis. It can be defined along either the time series dimension or the cross section dimension. In the former case, the IC for a given security is defined to be the time series correlation coefficient between the return of this security and its factor exposure prediction. In the latter case, the IC is defined to be the cross-sectional correlation coefficient between the predicted returns of a group of securities for a given time period and the actual returns subsequently realized in that time period. According to modern portfolio theory, individual securities in a portfolio should receive a weight proportional to their predicted returns. As a result, the return of the so-constructed portfolio is a function of the cross-sectional IC between predicted and actual (risk-adjusted) residual returns. As pointed out by Clarke et al. (2002), “performance in any given period is related to the cross-sectional correlation between the active security weights and realized residual returns.” As such, we focus on the cross-sectional IC in this paper.
Most theoretical work to date (see, for example, Grinold (1989), Grinold and Kahn (2000), Clarke et al. (2002, 2006)) assumes that the cross-sectional ICs are constant over time, but this is not consistent with reality. In this paper, we present strong empirical evidence that the cross-sectional ICs are time varying, and their uncertainty is an important risk in factor investing. The realized \({\text{IC}}_{t}\) for a time period can be very different from its ex ante prediction. In fact, even their signs can be different. The portfolio will perform well when the realized factor IC has the same sign as the ex ante prediction and will perform badly if the realized factor IC has the opposite sign of its prediction. Investors will need to incorporate this risk into their quantitative investment process in order to control their portfolio risk properly. Ding and Martin (2017) develop a comprehensive framework for such a quantitative factor investing process when the factor ICs are time varying vectors, whose dimension is equal to the number of factors in the cross-section factor model. The risk estimate in their model is directly related to their return prediction model. As their new fundamental law of active management (FLAM), Ding and Martin (2017) show that the performance of an active portfolio manager should be compared to the portfolio ex ante expected information ratio (IR), a new measure of the added value in every unit of risk added. This new FLAM includes Grinold and Kahn (2000) and Qian and Hua (2004) as special cases.
The work of Ding and Martin (2017) is built on the assumption that the vector \({\text{IC}}_{t}\) is a stationary time series whose true mean vector and covariance matrix are known, and in that case, the IR is shown to be a quadratic form determined by the mean vector and the inverse of the covariance matrix. In reality, these quantities are not known and need to be estimated from observed data. In this paper, we show that a simple multifactor linear regression can be applied to the risk-adjusted residual returns and the standardized factor exposures to obtain the cross-sectional \({\text{IC}}_{t}\) vector estimator at each time t. We further show how to use natural estimates of the mean and covariance of \({\text{IC}}_{t}\) to compute an estimator \(\widehat{\mathrm{IR}}\) of the ex ante information ratio IR. We prove in an appendix that under reasonable regularity conditions \(\widehat{{{\text{IR}}}}\) is a consistent estimator of IR, and it has an asymptotically normal distribution with an asymptotic variance having the same form as that derived by Lo (2002) for the Sharpe ratio in the case of normality. One can use the consistency and asymptotic normality of \(\widehat{{{\text{IR}}}}\) to compare different factor investment strategies and choose the optimal one to use.
Our asymptotic result is developed for a general multifactor model, and the single factor case can be easily derived as a special case. For a single factor model, the result shows that one can simply use the sample IC average divided by the sample IC standard deviation as an estimator of the ex ante expected IR of any single factor investing strategy. This is indeed what many practitioners do in their work, and our research shows that this is a valid approach and puts any portfolio backtest along this line on a rigorous footing.
To lend some support to our theoretical results in finite samples, we first conduct a simulation study based on a single factor model. This study shows that the distributional properties of the cross-sectional IC from our simulated single factor data are strikingly close to what our theory suggests. Furthermore, we conduct empirical studies for both the single factor model and the multifactor model using eight popular quantitative factors in practice. Our empirical result shows that the relationship among the estimated ex ante IR, the IC mean, the IC standard deviation, and the sample size is well predicted by a linear single factor model with time varying ICs. For the multifactor model, our empirical study compares the performances of multifactor models with up to three factors. As in the single factor case, the result shows that the relationship among the estimated ex ante IR, the IC mean vector, the IC covariance matrix, and the sample size is well modeled by a linear multifactor model with time varying ICs. As predicted by our fundamental law for multifactor models, the IR becomes larger as we increase the number of factors. This result allows us to assess the relative IR performance of various subsets of candidate factors, and choose the factor model with the best IR performance.
The factor model and time varying information coefficients
In this section, we describe the factor models introduced by Ding and Martin (2017), upon which our main theoretical results here are based. Let \(\{ r_{it}^{{{\text{Total}}}} ,\;i = 1, \ldots ,N{\text{ and }}t = 1, \ldots ,T\}\) be the set of returns in excess of the risk-free rate for N securities over T periods. Based on the CAPM, we have
\[ r_{it}^{{{\text{Total}}}} = \beta_{i} r_{B,t} + r_{it} , \]
where \(\beta_{i}\) is the beta of security i with respect to the market proxy benchmark excess return \(r_{B,t}\), and \(r_{it}\) is the residual return. Under the CAPM, we have \(E(r_{it} ) = 0\). Although we do not assume the CAPM, it is assumed throughout that the residual return has unconditional mean zero. Let \(\sigma_{{r_{it} }}^{2} = var(r_{it} |r_{i,\,t - 1}^{o} )\) be the conditional variance of the residual return \(r_{it}\) given its past returns \(r_{i,\,t - 1}^{o} :\, = \{ r_{i,\,t - 1} , \ldots ,\,r_{i,\,1} \}\). The risk-adjusted residual return is defined to be
\[ \tilde{r}_{it} = \frac{r_{it}}{\sigma_{r_{it}}} , \]
which has a conditional variance \(var\left( {\tilde{r}_{it} |{r}_{i,t - 1}^{o} } \right)\) equal to 1. It is further assumed that \(E\left( {\tilde{r}_{it} |r_{i,\,t - 1}^{o} } \right) = 0\) so that the past residual returns alone are not useful as a predictor for the risk-adjusted residual return in the next period (see Note 1). This assumption ensures that the unconditional variance of \(\tilde{r}_{it}\) is also equal to 1 (see Note 2):
\[ var\left( {\tilde{r}_{it} } \right) = 1. \]
Our factor model has the form
\[ \tilde{r}_{it} = {\mathbf{z^{\prime}}}_{i,\,t - 1} {{\varvec{\uprho}}}_{t} + e_{it} , \qquad (1) \]
where \({{\varvec{\uprho}}}_{t}\) is the vector of factor returns at time t, \({\mathbf{z}}_{i,\,t - 1} = (z_{i1,\,t - 1} , \ldots ,\,z_{iK,\,t - 1} )^{\prime}\) is the \(K \times 1\) vector of lagged factor exposures one is betting on, and for \(i = 1, \ldots ,\,N\) and \(t = 1, \ldots ,\,T,\) the vector \({\mathbf{z}}_{i,\,t - 1}\) is standardized to have zero mean \(E({\mathbf{z}}_{i,\,t - 1} ) = {\mathbf{0}}\) and an identity covariance matrix \(E({\mathbf{z}}_{i,\,t - 1} {\mathbf{z^{\prime}}}_{i,\,t - 1} ) = {\mathbf{I}}_{K}\). Examples of quantitative factors are value factors, such as the earnings-to-price ratio and the cash-flow-to-price ratio, and various momentum factors. Here we use the risk-adjusted residual return \(\tilde{r}_{it}\) as the dependent variable in the factor model, which differs from the factor models used by other researchers where raw residual returns are used. Since the residual returns and the factor exposure vectors are both standardized, the factor return vector \({{\varvec{\uprho}}}_{t}\) is also the vector of correlation coefficients between the lagged factor exposures and the risk-adjusted residual returns. As such, it is the time-varying information coefficient \({\text{IC}}_{t}\). An important feature of our factor model is that one unit of risk-adjusted exposure is rewarded (paid off) with the same amount of risk-adjusted residual return across securities.
The time series of factor returns \(\left\{ {{{\varvec{\uprho}}}_{t} } \right\}_{t = 1}^{T}\) are assumed to be independent of \({\mathbf{z}}_{i,\,t - 1}\) and have constant mean vector \({{\varvec{\uprho}}} = {{\varvec{\uprho}}}_{{{\text{IC}}}}\) and covariance matrix \({{\varvec{\Sigma}}} = {{\varvec{\Sigma}}}_{{{\text{IC}}}}\). The idiosyncratic returns in our factor model, \(e_{it} ,\;i = 1, \ldots ,N,\) are assumed to satisfy the standard assumptions of a linear factor model with zero cross-correlation between \(e_{it}\) and \(e_{jt}\) when \(i \ne j\), and it is shown in Ding and Martin (2017) that they have constant variance \(\sigma_{e}^{2}\) given our model assumptions.
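The constant idiosyncratic variance can be seen from a short variance decomposition. The following is only a sketch, not the argument given in Ding and Martin (2017); it assumes in addition that \(e_{it}\) has mean zero and is uncorrelated with \({\mathbf{z^{\prime}}}_{i,\,t - 1} {{\varvec{\uprho}}}_{t}\), and it uses the unit variance of \(\tilde{r}_{it}\), the standardization of the exposures, and the independence of \({{\varvec{\uprho}}}_{t}\) and \({\mathbf{z}}_{i,\,t - 1}\) stated above.

```latex
\begin{aligned}
1 = var(\tilde{r}_{it})
  &= var\left(\mathbf{z}'_{i,t-1}\boldsymbol{\rho}_t\right) + var(e_{it}) \\
  &= E\left[\boldsymbol{\rho}_t'\, E\left(\mathbf{z}_{i,t-1}\mathbf{z}'_{i,t-1}\right) \boldsymbol{\rho}_t\right] + \sigma_e^2
     \qquad \text{(since } E(\mathbf{z}'_{i,t-1}\boldsymbol{\rho}_t) = 0\text{)} \\
  &= E\left[\boldsymbol{\rho}_t'\boldsymbol{\rho}_t\right] + \sigma_e^2 \\
  &= \boldsymbol{\rho}_{\mathrm{IC}}'\boldsymbol{\rho}_{\mathrm{IC}}
     + \mathrm{tr}\left(\boldsymbol{\Sigma}_{\mathrm{IC}}\right) + \sigma_e^2 ,
\end{aligned}
\qquad\Longrightarrow\qquad
\sigma_e^2 = 1 - \boldsymbol{\rho}_{\mathrm{IC}}'\boldsymbol{\rho}_{\mathrm{IC}}
             - \mathrm{tr}\left(\boldsymbol{\Sigma}_{\mathrm{IC}}\right).
```

For a single factor this reads \(\sigma_{e}^{2} = 1 - \rho_{{{\text{IC}}}}^{2} - \sigma_{{{\text{IC}}}}^{2}\), which agrees with the constant-IC value \(1 - \rho^{2}\) used later in the paper when \(\sigma_{{{\text{IC}}}}^{2} = 0\).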
A key feature of the Ding and Martin model is that the cross-sectional information coefficients are time varying, which is strongly supported by empirical financial data. For example, Fig. 1 plots the sample cross-sectional correlation \(\hat{\rho }_{t,\,N}\) (defined in Eq. (4) in the next section) between the risk-adjusted residual return and the standardized 12-month momentum factor from January 1979 to June 2010 for stocks in the Russell 1000 universe. Similarly, Fig. 2 plots \(\hat{\rho }_{t,\,N}\) between the risk-adjusted residual return and the standardized book-to-price ratio factor. For such single factor cases, analysts who make the mistake of assuming that \(\rho_{t}\) is constant, and that the estimates \(\hat{\rho }_{t,\,N}\) contain only estimation error, are likely to incorrectly assume that the standard deviation of \(\hat{\rho }_{t,\,N}\) is approximately \(\sqrt {\left( {1 - \hat{\rho }_{N}^{2} } \right)/N}\) where \(\hat{\rho }_{N} = \sum\nolimits_{t = 1}^{T} {\hat{\rho }_{t,\,N} /T}\). The red solid lines in each of the two figures are located at the values of \(\hat{\rho }_{N}\) , and the red dotted lines are the 95% confidence bounds for the time varying IC values \(\hat{\rho }_{t,\,N}\) based on the above standard deviation formula. One would expect 95% of the cross-sectional IC values to be within the red dotted lines if they made the erroneous assumption that the cross-sectional IC values were constant. However, for both factors only about 25% of the IC estimates are within the corresponding bounds.
Given the above data generating process for the cross-section of security residual returns, Ding and Martin (2017, Proposition 2) show that the ex ante expected information ratio (IR) of the optimal mean-variance dollar-neutral portfolio that is based on a bet on factor \({\mathbf{z}}_{t - 1}\) can be approximated very well by the formula
\[ {\text{IR}} \approx \sqrt {{\mathbf{\rho^{\prime}}}_{{{\text{IC}}}} \left( {{{\varvec{\Sigma}}}_{{{\text{IC}}}} + \frac{{\sigma_{e}^{2} }}{N}{\mathbf{I}}_{K} } \right)^{ - 1} {{\varvec{\uprho}}}_{{{\text{IC}}}} } . \qquad (2) \]
In the special case when there is a single factor so that \(K = 1,\) we have
\[ {\text{IR}} \approx \frac{{\rho_{{{\text{IC}}}} }}{{\sqrt {\sigma_{{{\text{IC}}}}^{2} + \sigma_{e}^{2} /N} }} , \qquad (3) \]
where we have used \(\sigma_{{{\text{IC}}}}^{2}\) to denote \({{\varvec{\Sigma}}}_{{{\text{IC}}}}\) when \(K = 1.\) The above two formulae extend the original work by Grinold (1989), Clarke et al. (2002), Qian and Hua (2004), and Ye (2008) by allowing for time varying information coefficients.
Focusing on the single factor model case, we note that the portfolio manager faces two sources of risk: one is the non-diversifiable strategy risk, captured by \(\sigma_{{{\text{IC}}}}^{2} ,\) and the other is the sampling error risk, captured by \(\sigma_{e}^{2} /N,\) which decreases with increasing sample size. All strategies carry an underlying risk, reflected in the fact that one always has \(\sigma_{{{\text{IC}}}}^{2} \ne 0\); correspondingly, one should expect the portfolio to underperform randomly from time to time. However, for a well-chosen factor with \(\rho_{{{\text{IC}}}} > 0,\) patience is rewarded with a risk premium for bearing the strategy risk. Of course, if one could predict the sign of each \(\rho_{t}\) and bet accordingly, the payoff to such a skill would be very large. However, as pointed out by Asness (2016), “(performances of) such timing strategies to be very weak historically, and some tests of their long-term power to be exaggerated and/or inapplicable.”
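The interplay of the two risk sources can be illustrated numerically. The sketch below assumes the single-factor form \({\text{IR}} = \rho_{{{\text{IC}}}} /\sqrt {\sigma_{{{\text{IC}}}}^{2} + \sigma_{e}^{2} /N}\) and, as a further assumption, sets \(\sigma_{e}^{2} = 1 - \rho_{{{\text{IC}}}}^{2} - \sigma_{{{\text{IC}}}}^{2}\) so that the risk-adjusted residual return has unit variance. The function name and the parameter values are illustrative only.

```python
import math


def expected_ir(rho_ic, sigma_ic, n):
    """Single-factor ex ante IR for a universe of n securities (illustrative)."""
    # Assumption: unit-variance risk-adjusted returns pin down sigma_e^2.
    sigma_e2 = 1.0 - rho_ic**2 - sigma_ic**2
    strategy_var = sigma_ic**2      # non-diversifiable strategy risk
    sampling_var = sigma_e2 / n     # sampling error risk, shrinks with n
    return rho_ic / math.sqrt(strategy_var + sampling_var)


# The IR rises with n but is capped by rho_IC / sigma_IC.
for n in (50, 500, 5000):
    print(n, round(expected_ir(0.05, 0.15, n), 4))
print("limit", round(0.05 / 0.15, 4))
```

Only the sampling term can be diversified away by enlarging the universe; the strategy risk term sets the ceiling.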
Consistent IC and IR estimators
The formulae in (2) and (3) are simple but depend on unknown parameters. To estimate the IR, one needs to estimate the model parameters using historical data. The most common way of estimating the \({{\varvec{\uprho}}}_{t}\) at each time period is to use the OLS cross-section regression estimators
\[ {\hat{\mathbf{\rho }}}_{t,\,N} = \left( {\sum\nolimits_{i = 1}^{N} {{\mathbf{z}}_{i,\,t - 1} {\mathbf{z^{\prime}}}_{i,\,t - 1} } } \right)^{ - 1} \sum\nolimits_{i = 1}^{N} {{\mathbf{z}}_{i,\,t - 1} \tilde{r}_{it} } . \qquad (4) \]
Since \({{\varvec{\uprho}}}_{{{\text{IC}}}} = E\left( {{{\varvec{\uprho}}}_{t} } \right)\), one naturally estimates \({{\varvec{\uprho}}}_{{{\text{IC}}}}\) with the sample mean of the \({\hat{\mathbf{\rho }}}_{t,\,N}\)
\[ {\hat{\mathbf{\rho }}}_{N} = \frac{1}{T}\sum\nolimits_{t = 1}^{T} {{\hat{\mathbf{\rho }}}_{t,\,N} } , \qquad (5) \]
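and then estimates the covariance matrix of the IC estimates with the sample covariance matrix \({\hat{\mathbf{\Sigma }}}_{\rho ,\,N}\):
\[ {\hat{\mathbf{\Sigma }}}_{\rho ,\,N} = \frac{1}{T - 1}\sum\nolimits_{t = 1}^{T} {\left( {{\hat{\mathbf{\rho }}}_{t,\,N} - {\hat{\mathbf{\rho }}}_{N} } \right)\left( {{\hat{\mathbf{\rho }}}_{t,\,N} - {\hat{\mathbf{\rho }}}_{N} } \right)^{\prime} } . \qquad (6) \]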
Our proposed estimator of the ex ante expected portfolio IR is
\[ \widehat{\text{IR}} = \sqrt {{\mathbf{\hat{\rho }^{\prime}}}_{N} {\hat{\mathbf{\Sigma }}}_{\rho ,\,N}^{ - 1} {\hat{\mathbf{\rho }}}_{N} } . \qquad (7) \]
It is shown in Appendix A that under weak regularity conditions \({\hat{\mathbf{\rho }}}_{t,\,N}\) and \({\hat{\mathbf{\rho }}}_{N}\) are consistent estimators, that is
\[ {\hat{\mathbf{\rho }}}_{t,\,N} \to^{p} {{\varvec{\uprho}}}_{t} \qquad (8) \]
and
\[ {\hat{\mathbf{\rho }}}_{N} \to^{p} {{\varvec{\uprho}}}_{{{\text{IC}}}} , \qquad (9) \]
where \(\to^{p}\) stands for convergence in probability. We define the total risk covariance matrix as
\[ {{\varvec{\Omega}}}_{N} = {{\varvec{\Sigma}}}_{{{\text{IC}}}} + \frac{{\sigma_{e}^{2} }}{N}{\mathbf{I}}_{K} . \]
Appendix A proves the consistency result (see Note 3)
\[ {\hat{\mathbf{\Sigma }}}_{\rho ,\,N} \to^{p} {{\varvec{\Omega}}}_{N} , \qquad (10) \]
that is, the sample IC covariance matrix \({\hat{\mathbf{\Sigma }}}_{\rho ,\,N}\) is approximately equal to the total risk covariance matrix in the expected IR expression. When \(K = 1\), the result in (10) reduces to
\[ \hat{\sigma }_{\rho ,\,N}^{2} \to^{p} \sigma_{{{\text{IC}}}}^{2} + \frac{{\sigma_{e}^{2} }}{N} . \qquad (11) \]
The above result shows that in the case of time varying ICs, one should use (11) to get a proper confidence interval under a single factor model. The two green dashed lines in Figs. 1 and 2 provide the corrected confidence intervals for the time-varying ICs of the Momentum factor and the Book-to-Price Ratio factor. It can be seen that the correct confidence intervals are much wider than the naïve confidence intervals when assuming the ICs are constant over time. With the corrected confidence intervals that we develop in this paper, about 6.3% of the estimated ICs for the Momentum factor and 5.3% of the estimated ICs for the Book-to-Price Ratio factor are outside the 95% confidence intervals.
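To see why the naïve bounds capture so few IC estimates, one can compare the two standard deviations directly. The sketch below is an illustration, not the paper's calculation: it assumes the estimated ICs are approximately normal around \(\rho_{{{\text{IC}}}}\) with total variance \(\sigma_{{{\text{IC}}}}^{2} + \sigma_{e}^{2} /N\), takes \(\sigma_{e}^{2} = 1 - \rho_{{{\text{IC}}}}^{2} - \sigma_{{{\text{IC}}}}^{2}\), and uses hypothetical parameter values.

```python
import math


def naive_ic_sd(mean_ic, n):
    """Std dev of a sample correlation if the true IC were constant."""
    return math.sqrt((1.0 - mean_ic**2) / n)


def total_ic_sd(rho_ic, sigma_ic, n):
    """Std dev of the estimated IC when the true IC varies over time."""
    sigma_e2 = 1.0 - rho_ic**2 - sigma_ic**2  # assumption: unit-variance returns
    return math.sqrt(sigma_ic**2 + sigma_e2 / n)


def naive_band_coverage(rho_ic, sigma_ic, n, z=1.96):
    """Share of IC estimates expected inside the naive band (normal approx.)."""
    ratio = naive_ic_sd(rho_ic, n) / total_ic_sd(rho_ic, sigma_ic, n)
    return math.erf(z * ratio / math.sqrt(2.0))


# Hypothetical inputs: rho_IC = 0.05, sigma_IC = 0.15, N = 1000.
print(round(naive_band_coverage(0.05, 0.15, 1000), 3))
```

With \(\sigma_{{{\text{IC}}}} = 0\) the two standard deviations coincide and the function returns the nominal 95%; any positive \(\sigma_{{{\text{IC}}}}\) pushes the coverage of the naïve band well below that level.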
In the quite unrealistic case where the underlying population cross-sectional ICs are constant over time so that \(\sigma_{{{\text{IC}}}}^{2} = 0\), we have \(\sigma_{e}^{2} = 1 - \rho^{2}\) and so
\[ \hat{\sigma }_{\rho ,\,N}^{2} \to^{p} \frac{{1 - \rho^{2} }}{N} . \qquad (12) \]
This is a very familiar result for the sample cross-correlation coefficient between two random variables based on an iid sample. See, for example, Casella and Berger (2016) and Qian et al. (2007).
Combining the results in (9) and (10), we have the following consistency result
\[ \widehat{\text{IR}} = \sqrt {{\mathbf{\hat{\rho }^{\prime}}}_{N} {\hat{\mathbf{\Sigma }}}_{\rho ,\,N}^{ - 1} {\hat{\mathbf{\rho }}}_{N} } \to^{p} \sqrt {{\mathbf{\rho^{\prime}}}_{{{\text{IC}}}} {{\varvec{\Omega}}}_{N}^{ - 1} {{\varvec{\uprho}}}_{{{\text{IC}}}} } = {\text{IR}}. \qquad (13) \]
In the special case with \(K = 1,\) we have that \(\widehat{\text{IR}} = \hat{\rho }_{N} /\hat{\sigma }_{\rho ,\,N}\) is a consistent estimator of the expected IR of the alpha factor portfolio.
It should be emphasized that the portfolio is constructed using a selection universe of N securities, and each sample regression coefficient \({\hat{\mathbf{\rho }}}_{t,\,N}\) is also calculated using the same N observations. The expected information ratio (\({\text{IR}}\)) depends on the size of the universe N. A portfolio constructed using a universe of 1000 stocks will have a different (smaller) expected IR from that of a portfolio constructed using a universe of 3000 stocks. The increased sample size improves the accuracy of parameter estimation, which in turn improves the portfolio performance.
To demonstrate the above consistency results, we generate data from a single factor model as follows. We first generate 240 random \(\rho_{t}\) from a normal distribution with \(\rho_{{{\text{IC}}}} = 0.05\) and \(\sigma_{{{\text{IC}}}} = 0.15\). We then generate a total cross-section of 2000 risk-adjusted residual returns for each of 240 time periods (\(T = 240\)) using Eq. (1) with normally distributed factor exposures \(z_{i,t - 1}\) and idiosyncratic returns \(e_{it}\). After the factor exposures and risk-adjusted residual returns are generated, we randomly draw N observations for \(5 \le N \le 1000\). We calculate the cross-sectional IC estimates \(\hat{\rho }_{t,N}\) over 240 time periods, get the sample mean \(\hat{\rho }_{N}\) and the sample standard deviation \(\hat{\sigma }_{\rho ,N}\) of these IC estimates, and then estimate the expected portfolio IR using \(\hat{\rho }_{N} /\hat{\sigma }_{\rho ,N}\). The procedure is simulated 100 times. More specifically, for each N, we draw 100 random samples of size N with replacement from the 2000 residual returns. An average of these simulated IR estimates is calculated and is shown as the blue line in Fig. 3. The red line is the theoretical IR as in Eq. (3) using the true parameters. The dotted line is the maximum IR (i.e., \({\text{IR}}_{\max } = \rho_{{{\text{IC}}}} /\sigma_{{{\text{IC}}}}\)) one can reach when N goes to infinity.
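The simulation design can be sketched in a few lines. This is a simplified illustration rather than the authors' code: it draws fresh cross-sections for each N instead of resampling from a fixed pool of 2000 securities, fits the no-intercept regression slope as the cross-sectional IC, assumes \(\sigma_{e}^{2} = 1 - \rho_{{{\text{IC}}}}^{2} - \sigma_{{{\text{IC}}}}^{2}\), and uses only a handful of replications to keep the run time short. All function names are hypothetical.

```python
import math
import random
import statistics


def simulate_ir_hat(rho_ic, sigma_ic, n, t_periods, rng):
    """One simulated IR estimate: mean of the sample ICs over their std dev."""
    sigma_e = math.sqrt(1.0 - rho_ic**2 - sigma_ic**2)  # keeps var(return) = 1
    ics = []
    for _ in range(t_periods):
        rho_t = rng.gauss(rho_ic, sigma_ic)      # time-varying true IC
        szz = szr = 0.0
        for _ in range(n):
            z = rng.gauss(0.0, 1.0)              # standardized exposure
            r = rho_t * z + rng.gauss(0.0, sigma_e)
            szz += z * z
            szr += z * r
        ics.append(szr / szz)                    # no-intercept OLS slope
    return statistics.fmean(ics) / statistics.stdev(ics)


def theoretical_ir(rho_ic, sigma_ic, n):
    """Single-factor IR formula evaluated at the true parameters."""
    sigma_e2 = 1.0 - rho_ic**2 - sigma_ic**2
    return rho_ic / math.sqrt(sigma_ic**2 + sigma_e2 / n)


rng = random.Random(2024)
for n in (20, 200):
    draws = [simulate_ir_hat(0.05, 0.15, n, 240, rng) for _ in range(5)]
    print(n, round(statistics.fmean(draws), 3),
          round(theoretical_ir(0.05, 0.15, n), 3))
```

With so few replications the simulated averages remain noisy; increasing the number of replications should bring them closer to the theoretical values.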
Figure 3 shows that the average IR estimate is remarkably close to what our theory suggests. When the cross-sectional sample size (N) is small, the estimation error may not be very small even after simulation averaging. This suggests that it is necessary to adjust the standard error using \(\sigma_{e}^{2} /N\) when N is not large. Results not reported show that the IR estimate for each sample is also close to what our theory predicts. We use simulation averaging only to average out some of the simulation noise so that what we plot reflects a systematic pattern that is not due to pure chance.
Asymptotic normality of the IC and IR estimators
With the consistency results in (9) and (10), we show in Appendix B that
\[ \sqrt T \left( {{\hat{\mathbf{\rho }}}_{N} - {{\varvec{\uprho}}}_{{{\text{IC}}}} } \right) \to^{d} N\left( {{\mathbf{0}},\,{{\varvec{\Omega}}}_{N} } \right), \qquad (14) \]
that is, the distribution of \(\sqrt T \left( {{\hat{\mathbf{\rho }}}_{N} - {{\varvec{\uprho}}}_{{{\text{IC}}}} } \right)\) can be approximated by a normal distribution with mean zero and variance \({{\varvec{\Omega}}}_{N}\). Furthermore, we show in Appendix B that under some additional conditions, the IR estimator in (7) has an asymptotically normal distribution:
\[ \sqrt T \left( {\widehat{\text{IR}} - {\text{IR}}} \right) \to^{d} N\left( {0,\,V_{{{\text{IR}}}} } \right), \qquad (15) \]
where
\[ V_{{{\text{IR}}}} = 1 + \frac{1}{2}{\text{IR}}^{2} \qquad (16) \]
is the asymptotic variance of \(\widehat{\text{IR}}\). This result has the same form as the asymptotic normality of the Sharpe ratio established by Lo (2002) when returns are iid and normally distributed. As in the case of the Sharpe ratio, a standard error (SE) for the information ratio estimator is computed as
\[ {\text{SE}}\left( {\widehat{\text{IR}}} \right) = \sqrt {\frac{{1 + \frac{1}{2}\widehat{\text{IR}}^{2} }}{T}} . \qquad (17) \]
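The t statistic for testing the null hypothesis that the information ratio is greater than or equal to \({\text{IR}}_{0}\) is
\[ t = \frac{{\widehat{\text{IR}} - {\text{IR}}_{0} }}{{{\text{SE}}\left( {\widehat{\text{IR}}} \right)}} . \qquad (18) \]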
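The standard error and t statistic are straightforward to compute once \(\widehat{\text{IR}}\) and T are available. The sketch below assumes the Lo (2002) form \(1 + {\text{IR}}^{2} /2\) for the asymptotic variance, which the text states applies in the iid normal case, and works in per-period (not annualized) units. The function names and the one-sided p-value convention are illustrative assumptions.

```python
import math


def ir_standard_error(ir_hat, t_periods):
    """Standard error of the IR estimate under the iid normal variance form."""
    return math.sqrt((1.0 + 0.5 * ir_hat**2) / t_periods)


def ir_t_stat(ir_hat, ir_null, t_periods):
    """t statistic for the null hypothesis IR >= ir_null."""
    return (ir_hat - ir_null) / ir_standard_error(ir_hat, t_periods)


def lower_tail_p(t_stat):
    """Normal lower-tail probability; small values speak against IR >= ir_null."""
    return 0.5 * (1.0 + math.erf(t_stat / math.sqrt(2.0)))


# Hypothetical example: IR estimate of 0.30 from 240 monthly ICs, null IR_0 = 0.40.
se = ir_standard_error(0.30, 240)
t = ir_t_stat(0.30, 0.40, 240)
print(round(se, 4), round(t, 3), round(lower_tail_p(t), 3))
```

If the ICs are serially correlated or non-normal, this standard error should be treated as a rough guide only.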
The asymptotic variance formula in (16) relies on Assumption 2 in Appendix B which states that the asymptotic variance of \(1/T\sum\nolimits_{t = 1}^{T} {({{\varvec{\uprho}}}_{t} - {{\varvec{\uprho}}}_{{{\text{IC}}}} )({{\varvec{\uprho}}}_{t} - {{\varvec{\uprho}}}_{{{\text{IC}}}} )^{\prime} - {{\varvec{\Sigma}}}_{{{\text{IC}}}} }\) depends on the variance of \({{\varvec{\uprho}}}_{t}\) only. Such an assumption will hold if \({{\varvec{\uprho}}}_{t}\) is iid normal. In reality \({{\varvec{\uprho}}}_{t}\) will be not only serially correlated but also non-normal. In the case when \({{\varvec{\uprho}}}_{t}\) is iid but non-normal with nonzero skewness \(k_{3}\) and kurtosis \(k_{4}\), we can show that the asymptotic variance becomes
A standard error for the IR estimate is obtained by plugging in estimates of IR, \(k_{3}\), and \(k_{4}\), dividing the result by T, and taking the square root. A formula of a similar form was obtained for the Sharpe ratio by Zhang et al. (2021, Eqs. (27) and (44)).
Empirical simulation results for single factor models
The results in the previous section show that data generated from our theoretical models have the desired properties as predicted by the asymptotic theory. The crucial question is: how relevant are our theoretical models in the real world? Quantitative models are often built on different factors, such as value factors, momentum factors, etc. Researchers usually assume a linear relationship like in Eq. (1) between security returns and factor exposures. It will be interesting to see if the simulated IR from universes of different sample sizes has the same features as the theoretically simulated factors above. This can be a check for model specification (if a theoretical model has a certain property but the empirical data does not have this property, then we can be sure that the theoretical model does not fit the data well). It can also provide a guide to portfolio managers on how to choose factors that lead to the best IR performance. Here we focus on choosing a single factor model that has the best IR performance among a set of factors.
The eight quantitative factors we study here are:
(1) Book to Price Ratio,
(2) Cash Flow to Price Ratio,
(3) Earnings to Enterprise Value Ratio,
(4) Sales to Price Ratio,
(5) 12-month Momentum,
(6) Share Buyback as a Percent of Total Shares Outstanding,
(7) Return on Capital,
(8) Short as a Percent of Total Shares Floating.
The raw exposures for all factors run from 1979:01 to 2010:06, except for the Cash Flow to Price Ratio, which runs from 1990:02 to 2010:06, and Short as a Percent of Total Shares Floating, which runs from 1988:02 to 2010:06.
The raw factor exposures are cross-sectionally Winsorized and then standardized. The residual returns are calculated and standardized using time series estimated residual return volatility (see Note 4) so that they have zero mean and unit variance as well. The details of the standardization for both the returns and the factor exposures were provided in Ding and Martin (2017). At each time period, we randomly draw N companies with returns and factor exposures and calculate the cross-sectional regression coefficient \(\hat{\rho }_{t,N}\) for that time period. We then calculate the sample mean \(\hat{\rho }_{N}\) and the sample standard deviation \(\hat{\sigma }_{\rho ,N}\) over time and get an IR estimator (\(\widehat{\text{IR}} = \hat{\rho }_{N} /\hat{\sigma }_{\rho ,N}\)). This is simulated 100 times, and we then get the average IR from these 100 repeated samples. We do this for \(N = 50{\text{ to 3000}}\) companies.
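The volatility scaling of the residual returns can be sketched with the fixed-parameter GARCH(1,1) recursion described in the Notes (ARCH parameter 0.1, GARCH parameter 0.81, constant equal to 0.09 times the sample unconditional variance). Several details below are assumptions made for illustration and are not taken from the paper: the unconditional variance is estimated as the mean of squared residual returns before period t, the recursion starts at that value, and a 12-period warm-up is used. The function name is hypothetical.

```python
import math
import random

ARCH, GARCH, CONST_SCALE = 0.10, 0.81, 0.09  # fixed parameters from the Notes


def risk_adjust(residual_returns, warmup=12):
    """Scale one security's residual returns by a GARCH(1,1) volatility."""
    adjusted = []
    sum_sq = 0.0                 # running sum of squared returns (mean taken as 0)
    prev_r = prev_h = None
    for t, r in enumerate(residual_returns):
        if t >= warmup:
            uncond_var = sum_sq / t              # uses periods before t only
            if prev_h is None:
                h = uncond_var                   # assumption: start at this level
            else:
                h = CONST_SCALE * uncond_var + ARCH * prev_r**2 + GARCH * prev_h
            adjusted.append(r / math.sqrt(h))
            prev_h = h
        sum_sq += r * r
        prev_r = r
    return adjusted


# Hypothetical input: 600 periods of iid residual returns with 2% volatility.
rng = random.Random(7)
sample = [rng.gauss(0.0, 0.02) for _ in range(600)]
scaled = risk_adjust(sample)
print(len(scaled), round(sum(x * x for x in scaled) / len(scaled), 3))
```

Because 0.09 + 0.1 + 0.81 = 1, the constant term amounts to variance targeting: the recursion reverts toward the running unconditional variance.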
Figure 4 displays the empirical simulation results. It can be seen that the actual data has the shape we expect for all the factors considered. So, at least from this perspective, we can say that the linear models specified in Eq. (1) give a quite good description of the relationship between security returns and factor exposures.
Note that when \(K = 1\), we have
As an alternative method to evaluate the performance of \(\widehat{{{\text{IR}}}}\), we can fit the following nonlinear regression model:
based on the observations \(\left\{ {\widetilde{\mathrm{IR}}_N {\text{ for }}N = 50,51, \ldots ,3000} \right\}\) where each \(\widetilde{\text{IR}}_N\) is the average of 100 simulated \(\widehat{\mathrm{IR}}_N\)’s. For each factor, we can estimate \(\rho_{{{\text{IC}}}}\) and \(\sigma_{{{\text{IC}}}}\) by nonlinear least squares. The estimated results are shown in Table 1. The table shows that the above empirical model captures the relationship between the expected IR and the three measures \(\rho_{{{\text{IC}}}} ,\) \(\sigma_{{{\text{IC}}}}^{{2}}\) and N well. The \(R^{2}\)’s from these regressions are very close to 1 in almost all the cases. This implies that the error in (21) is very small relative to the true value of \(\log ({\text{IR}})\). We note that the results are qualitatively similar when we do not use simulation averaging, although the \(R^{2}\)’s are somewhat smaller.
In Table 1 and Fig. 4 below, the value of \(N_{90}\) for each factor denotes the number of stocks needed to construct a portfolio that achieves 90% of the maximum possible IR. For example, for the Momentum factor, only 363 stocks are needed, and for the Return on Capital factor, 2483 stocks are needed in order to reach 90% of the maximum possible IRs, respectively.
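One plausible way to obtain such a number from fitted parameters is to invert the single-factor IR formula. The sketch below assumes \({\text{IR}} = \rho_{{{\text{IC}}}} /\sqrt {\sigma_{{{\text{IC}}}}^{2} + \sigma_{e}^{2} /N}\) with \(\sigma_{e}^{2} = 1 - \rho_{{{\text{IC}}}}^{2} - \sigma_{{{\text{IC}}}}^{2}\); the function names and the parameter values are hypothetical and are not the Table 1 estimates.

```python
import math


def ir_fraction(rho_ic, sigma_ic, n):
    """IR(N) divided by the maximum IR, rho_IC / sigma_IC."""
    sigma_e2 = 1.0 - rho_ic**2 - sigma_ic**2
    return math.sqrt(sigma_ic**2 / (sigma_ic**2 + sigma_e2 / n))


def stocks_needed(rho_ic, sigma_ic, fraction=0.9):
    """Smallest N with IR(N) >= fraction * IR_max.

    Solving fraction^2 = sigma_IC^2 / (sigma_IC^2 + sigma_e^2 / N) for N gives
    N = fraction^2 * sigma_e^2 / ((1 - fraction^2) * sigma_IC^2).
    """
    sigma_e2 = 1.0 - rho_ic**2 - sigma_ic**2
    f2 = fraction**2
    return math.ceil(f2 * sigma_e2 / ((1.0 - f2) * sigma_ic**2))


n90 = stocks_needed(0.05, 0.15)
print(n90, round(ir_fraction(0.05, 0.15, n90), 4))
```

Under this formula the required N grows with the ratio \(\sigma_{e}^{2} /\sigma_{{{\text{IC}}}}^{2}\), so factors with low IC volatility need many more stocks to approach their maximum IR.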
Empirical simulation results for multifactor models
In order to demonstrate the great relevance of our asymptotic results for multifactor models, we also carried out simulations for all 6 subsets of a three-factor model. As in the single-factor-model simulations, we randomly draw N companies at each time period with returns and factor exposures and calculate the cross-sectional multifactor regression coefficient vectors \({\hat{\mathbf{\rho }}}_{t,N}\) for \(t = 1,2, \ldots ,T\). We then calculate the sample mean vector \({\hat{\mathbf{\rho }}}_{N}\) and the sample covariance matrix \({\hat{\mathbf{\Sigma }}}_{\rho ,N}\) for the regression coefficient vectors \({\hat{\mathbf{\rho }}}_{t,N}\). The final portfolio IR using N companies is then estimated as \(\widehat{\mathrm{IR}}_N = \sqrt {{\mathbf{\hat{\rho }^{\prime}}}_{N} {\hat{\mathbf{\Sigma }}}_{\rho ,\,N}^{ - 1} {\hat{\mathbf{\rho }}}_{N} } .\) We conducted this simulation 15 times for N = 50 to 1000 companies. The final IR curves shown in Fig. 5 are the average of the 15 simulations.
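The final step, turning a time series of estimated IC vectors into \(\widehat{\mathrm{IR}}_N\), can be sketched without external libraries. The code below is an illustration under stated assumptions: it uses the \(T - 1\) divisor for the sample covariance matrix and a small hand-written linear solver, and the input values are made up.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, k))
        x[i] = (aug[i][k] - tail) / aug[i][i]
    return x


def multifactor_ir(ic_vectors):
    """sqrt(mean' * cov^{-1} * mean) for a list of K-dimensional IC estimates."""
    t, k = len(ic_vectors), len(ic_vectors[0])
    mean = [sum(v[j] for v in ic_vectors) / t for j in range(k)]
    cov = [[sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in ic_vectors) / (t - 1)
            for j in range(k)] for i in range(k)]
    weights = solve_linear(cov, mean)            # cov^{-1} * mean
    return math.sqrt(sum(m * w for m, w in zip(mean, weights)))


# Made-up IC estimates for two factors over five periods.
ics = [[0.10, 0.02], [0.20, -0.01], [-0.05, 0.03], [0.15, 0.05], [0.00, 0.01]]
print(round(multifactor_ir(ics), 3), round(multifactor_ir([[v[0]] for v in ics]), 3))
```

For a fixed series of IC vectors, dropping a component cannot raise this quadratic form, so comparisons across models of different sizes should be read with the over-fitting concern in mind.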
The bottom three lines are the simulated IRs for the single factor portfolios which are the same as those shown in Fig. 4. The middle three lines are the simulated IRs for the portfolios based on the three different two-factor models, BP+EEV, BP+MOM, and EEV+MOM. It is clear that the IRs based on the two-factor models are all larger than the IRs of the single factor models by a substantial amount. Finally, the top line shows the IRs from using all three factors (BP+EEV+MOM). It is interesting to note that the best performing two-factor model is based on the two best performing single factor models.
There is no convenient analog to Eq. (21) for the above multifactor model. However, one can imagine a single-factor model, not yet discovered, whose IR values are the same as those of a given multifactor model. It is then of interest to know what the parameters of that single-factor model are. Against this backdrop, we fit Eq. (21) to the IR values for each of the multifactor models under consideration. We display the results in Table 2, where the first three rows are taken from Table 1 for ease of comparison. The very high R-squared values in Table 2 indicate that the nonlinear least squares fits are quite good. While the IC volatility (\(\hat{\sigma }_{{{\text{IC}}}}\)) is sometimes higher for a multifactor model than for a single factor model that is a special case of it, the IC mean (\(\hat{\rho }_{{{\text{IC}}}}\)) is always considerably larger, resulting in a considerably larger IR for the multifactor model compared to each subset single factor model.
Conclusion
Building on the Ding and Martin (2017) model, this paper has studied the estimation and inference of the IC mean and the IR in the presence of time varying ICs. Our theoretical results can be summarized as follows: (a) We have established the consistency and asymptotic normality of the time series sample mean of the IC vectors; (b) We have established the consistency and asymptotic normality of the IR estimator based on the sample mean and the sample covariance of the empirical IC’s. In both cases, we have provided the asymptotic variance formula. In particular, the asymptotic variance of the IR estimator is shown to take the same form as the asymptotic variance of the Sharpe ratio, allowing the portfolio manager to easily compute the standard error of the IR estimate.
Our empirical results for single-factor models show that the behavior of the IR estimator, as a function of cross-section sample size N, is as predicted by the theoretical results of Ding and Martin (2017). The empirical results for a variety of factor models reveal that different factor models have different maximum IR values \({\text{IR}}_{\max }\), and approach \({\text{IR}}_{\max }\) at different rates as a function of N. Furthermore, the IR estimators under different factor models have different standard errors. Thus, a portfolio manager now has some new tools for analytic evaluation and decision-making regarding the choice of the “best” single-factor models.
Our empirical studies for a 3-factor model and each of its 6 subset models show that, as anticipated, higher values of IR are obtained with multifactor models that contain more factors, and that there is a clear IR-based ranking for the three 2-factor models. Thus, we have a tool for subset selection of multifactor models, which with current high-performance computers can be applied to select among more than 10 factors. An open question is how we may design a good penalty to control for over-fitting.
It is well known that the Sharpe ratio is optimistic when returns are serially dependent, either with positive serial autocorrelation or as an uncorrelated but serially dependent GARCH process, and in such cases a correction is needed to obtain a valid standard error. See, for example, Lo (2002). The IR takes the same form as the Sharpe ratio but with returns replaced by the IC time series. The last column in Table 1, with the column title “ACF(1)”, reports the first-order autocorrelation for the estimated IC series for various single-factor models. The magnitude of autocorrelation indicates that the IC may be serially correlated, and the information ratio may need to be modified to account for the extra information that a fund manager can exploit. However, in finite samples, a fund manager may choose to explore only some low-order dependence. In this case, inferences on the newly derived IR will need to account for residual serial dependence in the ICs. To design a reliable inference procedure, we can use the ideas from the large econometric literature on long-run variance estimation and HAC (Heteroscedasticity and Autocorrelation Consistent) inference. Chen and Martin (2021) and Christidis and Martin (2021) use the HAC method (see Note 5), but we may employ the most recent HAR (Heteroscedasticity and Autocorrelation Robust) method to obtain more accurate and trustworthy inferences. See, for example, Kiefer and Vogelsang (2005) and Sun (2013). We leave this for future research.
Notes
The assumption \(E\left( {\tilde{r}_{it} |r_{i,\,t - 1}^{o} } \right) = 0\) here is stronger than \(E\left( {\tilde{r}_{it} } \right) = 0\) and is needed but not explicitly stated in Ding and Martin (2017) for their fundamental law of active management to hold.
This follows from the equalities \(var\left( {\tilde{r}_{it} } \right) = E\left[ {var\left( {\tilde{r}_{it} |{r}_{i,t - 1}^{o} } \right)} \right] + var\left[ {E\left( {\tilde{r}_{it} |{r}_{i,t - 1}^{o} } \right)} \right] = E\left[ {var\left( {\tilde{r}_{it} |{r}_{i,t - 1}^{o} } \right)} \right] = 1\).
More rigorously, this should be understood as \({\hat{\mathbf{\Sigma }}}_{\rho ,\,N} - {{\varvec{\Omega}}}_{N} \to^{p} {\mathbf{0}}\). We follow the same convention hereafter.
The residual return volatility is estimated using a GARCH(1,1) model with the ARCH parameter equal to 0.1 and the GARCH parameter equal to 0.81. The constant term in the GARCH model is then decided by 0.09 times the sample unconditional variance estimated using data up to that time point.
Their procedures have been implemented in the R package RPESE, available on CRAN (https://cran.r-project.org/web/packages/RPESE/index.html).
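The GARCH(1,1) volatility recursion described in the note on residual return volatility can be sketched as follows. Because 1 − 0.1 − 0.81 = 0.09, setting the constant to 0.09 times the sample variance amounts to variance targeting. This is a hedged illustration only: the function name, the seeding of the recursion, the use of a demeaned expanding-window variance, and the choice to use only observations strictly before time t are assumptions that the note does not specify.

```python
import numpy as np

ARCH, GARCH = 0.10, 0.81            # parameter values stated in the note
TARGET_WEIGHT = 1.0 - ARCH - GARCH  # 0.09, the multiplier on the sample variance

def garch_volatility(resid, min_obs=2):
    """Expanding-window GARCH(1,1) volatility with a variance-targeted constant."""
    resid = np.asarray(resid, dtype=float)
    T = resid.size
    sigma2 = np.empty(T)
    # Assumption: seed the recursion with the variance of the first min_obs points.
    sigma2[:min_obs] = np.var(resid[:min_obs])
    for t in range(min_obs, T):
        uncond = np.var(resid[:t])     # sample variance from data before time t
        omega = TARGET_WEIGHT * uncond # constant term, re-estimated at each t
        sigma2[t] = omega + ARCH * resid[t - 1] ** 2 + GARCH * sigma2[t - 1]
    return np.sqrt(sigma2)

# Usage with hypothetical residual returns.
rng = np.random.default_rng(1)
vol = garch_volatility(0.02 * rng.standard_normal(250))
print(vol[-5:])
```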
References
Asness, C.S. 2016. The Siren Song of Factor Timing aka “Smart Beta Timing” aka “Style Timing”. The Journal of Portfolio Management 42 (5): 1–6.
Casella, George, and Roger L. Berger. 2016. Statistical Inference, 2nd ed. Thomson Learning.
Chen, X., and R.D. Martin. 2021. Standard Errors of Risk and Performance Measure Estimators for Serially Correlated Returns. Journal of Risk 23 (2): 1–41.
Christidis, A.A., and R.D. Martin. 2021. RPESE: Risk and Performance Estimators Standard Errors with Serially Dependent Data. The R Journal 13 (2): 697–712.
Clarke, Roger, Harindra de Silva, and Steven Thorley. 2002. Portfolio Constraints and the Fundamental Law of Active Management. Financial Analysts Journal 58 (5): 48–66.
Clarke, Roger, Harindra de Silva, and Steven Thorley. 2006. The Fundamental Law of Active Management. The Journal of Investment Management 4 (3): 54–72.
Ding, Zhuanxin, and R. Douglas Martin. 2017. The Fundamental Law of Active Management: Redux. Journal of Empirical Finance 43: 91–114.
Grinold, Richard C. 1989. The Fundamental Law of Active Management. The Journal of Portfolio Management 15 (3): 30–38.
Grinold, Richard C., and Ronald N. Kahn. 2000. Active Portfolio Management, 2nd ed. New York: McGraw-Hill.
Kiefer, N.M., and T.J. Vogelsang. 2005. A New Asymptotic Theory for Heteroskedasticity-Autocorrelation Robust Tests. Econometric Theory 21: 1130–1164.
Lo, Andrew W. 2002. The Statistics of Sharpe Ratios. Financial Analysts Journal 58 (4): 36–52.
Magnus, Jan R., and Heinz Neudecker. 2002. Matrix Differential Calculus with Applications in Statistics and Econometrics, revised ed. New York: John Wiley & Sons.
Qian, Edward, and Ronald Hua. 2004. Active Risk and Information Ratio. The Journal of Investment Management 2 (3): 20–34.
Qian, E., R. Hua, and E.H. Sorensen. 2007. Quantitative Equity Portfolio Management: Modern Techniques and Applications. London: CRC Press.
Sun, Y. 2013. A Heteroskedasticity and Autocorrelation Robust F Test Using an Orthonormal Series Variance Estimator. The Econometrics Journal 16: 1–26.
Ye, Jia. 2008. How Variation in Signal Quality Affects Performance. Financial Analysts Journal 64 (4): 48–61.
Zhang, S.Y., R.D. Martin, and A.A. Christidis. 2021. Influence Functions for Risk and Performance Estimators. Journal of Mathematical Finance 11: 15–47.
Acknowledgments
We are very grateful to Doug Martin for his inspiring discussions, detailed comments, constructive suggestions, and other considerable inputs. We very much appreciate the contributions from two anonymous reviewers. Their very careful reviews and suggestions led to a substantial revision of the paper to its present form.
Appendices
Appendix A
Consistent IC and IR estimators
We consider the asymptotics under which \(N \to \infty\) and \(T \to \infty\). We maintain the following high-level assumption, which is plausible under sufficient moment and weak dependence conditions.
Assumption 1
(a) Uniform convergence:
(b) Central Limit Theorem (CLT)
where \({{\varvec{\upxi}}}\sim N({\mathbf{0}},{\mathbf{I}}_{K} ),{{\varvec{\upeta}}}\sim N({\mathbf{0}},{{\varvec{\Sigma}}}_{{{\text{IC}}}} )\) and \({{\varvec{\upxi}}}\) and \({{\varvec{\upeta}}}\) are independent.
(c) Law of Large Numbers (LLN)
and \({{\varvec{\Omega}}}_{N} :\, = {{\varvec{\Sigma}}}_{{{\text{IC}}}} + \sigma_{e}^{2} /N \cdot {\mathbf{I}}_{K}\) is nonsingular.
(d) Stochastic boundedness:
In the above formula, \(O_{p} (1)\) denotes a term whose absolute value is stochastically bounded, and \(o_{p} \left( 1 \right)\) denotes a term converging to zero in probability.
Assumption 1(a) is a standard uniform convergence condition. For each time period, we may obtain \(N^{ - 1} \sum\nolimits_{i = 1}^{N} {\mathbf{z}}_{i,\,t - 1} {\mathbf{z^{\prime}}}_{i,\,t - 1} - {\mathbf{I}}_{K} = O_{p} (1/\sqrt N )\), which can be verified by using the Markov inequality and a moment bound. Of course, this holds trivially if a CLT holds jointly for all elements of \(N^{ - 1/2} \sum\nolimits_{i = 1}^{N} {\left( {{\mathbf{z}}_{i,\,t - 1} {\mathbf{z^{\prime}}}_{i,\,t - 1} - {\mathbf{I}}_{K} } \right)}\). Assumption 1(a) attempts to control \(N^{ - 1/2} \sum\nolimits_{i = 1}^{N} {\left( {{\mathbf{z}}_{i,\,t - 1} {\mathbf{z^{\prime}}}_{i,\,t - 1} - {\mathbf{I}}_{K} } \right)}\) uniformly over \(t = 1, \ldots ,\,T\) by looking at its maximum over \(t = 1, \ldots ,\,T\). It is well known that the maximum of T independent standard normals is of order \(\sqrt {\log T}\). Hence, Assumption 1(a) is plausible when we have enough moment and weak dependence conditions. Assumption 1(b) is a fairly standard CLT in time series analysis. Note that it does not follow from the assumptions imposed in the main text. For a sequence of observations that are not iid, a CLT typically needs a finite “\(2 + \delta\)” moment for some \(\delta > 0\). Assumption 1(c) is an LLN for time series data. Assumption 1(d) can be proved if each summand has enough moments and the time series and cross-sectional dependence is weak enough.
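The \(\sqrt {\log T}\) order of the maximum of T independent standard normals can be illustrated with a short simulation. The sketch below compares the simulated mean of the maximum with the classical benchmark \(\sqrt {2\log T}\); the function name, the number of replications, and the grid of T values are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_max_of_normals(T, reps=400):
    """Monte Carlo estimate of E[max of T independent N(0, 1) draws]."""
    draws = rng.standard_normal((reps, T))
    return draws.max(axis=1).mean()

for T in (10, 100, 1000, 10000):
    simulated = mean_max_of_normals(T)
    benchmark = np.sqrt(2.0 * np.log(T))
    print(f"T={T:6d}  mean max={simulated:.3f}  sqrt(2 log T)={benchmark:.3f}")
```

The simulated mean stays below the benchmark at every T shown and grows only slowly with T, which is why a uniform bound over \(t = 1, \ldots ,\,T\) costs no more than a \(\sqrt {\log T}\) factor in the Gaussian case.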
We first show that \({\hat{\mathbf{\rho }}}_{t,\,N}\) is a consistent estimator for \({{\varvec{\uprho}}}_{t}\). By Assumption 1(b), \(\operatorname{plim}_{N \to \infty } \tfrac{1}{N}\sum\nolimits_{i = 1}^{N} {{\mathbf{z}}_{i,\,t - 1} e_{it}} = {\mathbf{0}}\) for each given t, and so we have
This shows that (8) holds, and that the cross-sectional regression estimator at time t is consistent for \({{\varvec{\uprho}}}_{t} .\)
Next, we show that (9), (10), and (14) hold under Assumption 1. Recall that \({{\varvec{\uprho}}}_{{{\text{IC}}}} :\, = E({{\varvec{\uprho}}}_{t} )\). We have
In the above, “\(\mathop \approx \limits^{d}\)” denotes asymptotic equivalence in distribution, that is, the asymptotic distribution of \(\sqrt T ({\hat{\mathbf{\rho }}}_{N} - {{\varvec{\uprho}}}_{{{\text{IC}}}} )\) is the same as the asymptotic distribution of \(\frac{{\sigma_{e} }}{\sqrt N }{{\varvec{\upxi}}} + {{\varvec{\upeta}}}\). Here we have used Assumption 1(a).
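Because \({{\varvec{\upxi}}}\) and \({{\varvec{\upeta}}}\) are independent, the limit \(\frac{{\sigma_{e} }}{\sqrt N }{{\varvec{\upxi}}} + {{\varvec{\upeta}}}\) has variance \({{\varvec{\Sigma}}}_{{{\text{IC}}}} + \sigma_{e}^{2} /N \cdot {\mathbf{I}}_{K} = {{\varvec{\Omega}}}_{N}\). The following single-factor (K = 1) simulation is a rough sketch of this variance decomposition. The data-generating process (Gaussian exposures, Gaussian time-varying ICs, a no-intercept cross-sectional regression) and all parameter values are assumptions chosen for illustration, not the design of the simulation study in the main text.

```python
import numpy as np

def simulate_ic_series(T, N, rho_ic, sigma_ic, sigma_e, seed=0):
    """Simulate T cross-sectional regression estimates of a time-varying IC."""
    rng = np.random.default_rng(seed)
    rho_t = rho_ic + sigma_ic * rng.standard_normal(T)  # true IC in each period
    z = rng.standard_normal((T, N))                     # standardized exposures
    e = sigma_e * rng.standard_normal((T, N))           # idiosyncratic noise
    r = rho_t[:, None] * z + e                          # residual returns
    # No-intercept cross-sectional OLS slope in each period.
    return (z * r).sum(axis=1) / (z * z).sum(axis=1)

T, N = 5000, 200
rho_ic, sigma_ic, sigma_e = 0.05, 0.10, 1.0
rho_hat = simulate_ic_series(T, N, rho_ic, sigma_ic, sigma_e)

omega_n = sigma_ic**2 + sigma_e**2 / N  # scalar version of Omega_N
ir_hat = rho_hat.mean() / rho_hat.std(ddof=1)
ir_target = rho_ic / np.sqrt(omega_n)

print("sample variance of IC estimates:", rho_hat.var(ddof=1))
print("Omega_N:", omega_n)
print("sample IR:", ir_hat, " rho_IC / sqrt(Omega_N):", ir_target)
```

The sample variance of the estimated ICs combines the true IC variance with the cross-sectional estimation noise \(\sigma_{e}^{2} /N\), so the ratio of the IC mean to the IC standard deviation is governed by \({{\varvec{\Omega}}}_{N}\) rather than by \({{\varvec{\Sigma}}}_{{{\text{IC}}}}\) alone.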
Next,
Note that under Assumption 1(a) we have
where the term \(O_{p} \left( {\sqrt {\log (T)/N} } \right)\) holds uniformly over \(t = 1, \ldots ,T\). Using this, we then have
where we have used the first part of Assumption 1(d). Also,
where the last equality holds because \({{\varvec{\uprho}}}_{{{\text{IC}}}} - {\hat{\mathbf{\rho }}}_{N} = O_{p} (1/\sqrt T )\) and so \(({{\varvec{\uprho}}}_{{{\text{IC}}}} - {\hat{\mathbf{\rho }}}_{N} )({{\varvec{\uprho}}}_{{{\text{IC}}}} - {\hat{\mathbf{\rho }}}_{N} )^{\prime} = O_{p} \left( {1/T} \right)\). Furthermore, using Assumption 1(b) and the second condition in Assumption 1(d), we have
Noting that \(\sqrt {(\log T)/N} = o(1)\) under Assumption 1(a), we then have
Hence
So, by Assumption 1(c) we have
Appendix B
Asymptotic normality of the IC and IR estimators
We will maintain the following assumption:
Assumption 2
(a) CLT
where \({{\varvec{\upeta}}}\sim N({\mathbf{0}},{{\varvec{\Sigma}}}_{{{\text{IC}}}} ),\;{{\varvec{\upzeta}}}\sim N\left[ {{\mathbf{0}},2({{\varvec{\Sigma}}}_{{{\text{IC}}}} \otimes {{\varvec{\Sigma}}}_{{{\text{IC}}}} )} \right]\) and \({{\varvec{\upeta}}}\) and \({{\varvec{\upzeta}}}\) are independent.
(b) Rate condition: \(\frac{\sqrt T }{N}\sqrt {\frac{\log T}{N}} = O(1)\).
A sufficient condition for Assumption 2(a) is that \({{\varvec{\uprho}}}_{t}\) is iid \(N({{\varvec{\uprho}}}_{{{\text{IC}}}} ,\,{{\varvec{\Sigma}}}_{{{\text{IC}}}} )\). This is clearly stronger than necessary. Note that the first part of Assumption 2(a) (i.e., the marginal convergence of \(T^{ - 1/2} \sum\nolimits_{t = 1}^{T} {\left( {{{\varvec{\uprho}}}_{t} - {{\varvec{\uprho}}}_{{{\text{IC}}}} } \right)}\)) is the same as that in Assumption 1(b). Here we impose an additional weak convergence condition, which is necessary for establishing the asymptotic distribution of \(\widehat{\text{IR}}\). Assumption 2(b) is a technical condition of a kind that often appears in double-index asymptotics. It requires that N grow at least as fast as \(\left( {T\log T} \right)^{1/3}\), which appears to be a mild requirement when there are many securities available.
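To see where the \(\left( {T\log T} \right)^{1/3}\) rate comes from, the left-hand side of Assumption 2(b) can be rewritten as

\[
\frac{\sqrt T }{N}\sqrt {\frac{\log T}{N}} = \frac{\sqrt {T\log T} }{N^{3/2} } = \left( {\frac{\left( {T\log T} \right)^{1/3} }{N}} \right)^{3/2} ,
\]

which is bounded exactly when \(N \ge c\left( {T\log T} \right)^{1/3}\) for some constant \(c > 0\). As a purely illustrative calculation (the sample length is an assumption, and the constant c is left unspecified by the condition), \(T = 240\) monthly periods gives \(\left( {T\log T} \right)^{1/3} \approx 11\), far below the several hundred securities in a typical equity universe.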
Now we are ready to prove (14) and (15). With the result in (A1), we have
by Assumption 2(b). This implies that
and
Using Assumption 2 and the decomposition of \({\hat{\mathbf{\Sigma }}}_{\rho ,\,N}\) in Appendix A, we have
This implies that \({\hat{\mathbf{\Sigma }}}_{\rho ,\,N} - {{\varvec{\Omega}}}_{N} = O_{p} (1/\sqrt T ).\) It then follows that
and
We next give an approximation to \(\widehat{{{\text{IR}}}}\):
For \(Q_{4}\), we have
For \(Q_{5}\), we use \(\sqrt {1 + x} = 1 + \tfrac{1}{2}x + O\left( {x^{2} } \right)\) as \(\left| x \right| \to 0\) to obtain
Hence,
and so
Using \({\hat{\mathbf{\Sigma }}}_{\rho ,\,N} - {{\varvec{\Omega}}}_{N} = O_{p} (1/\sqrt T )\) and taking an expansion of \(\sqrt {{\mathbf{\rho^{\prime}}}_{{{\text{IC}}}} {\hat{\mathbf{\Sigma }}}_{{{{\varvec{\uprho}}},N}}^{ - 1} {{\varvec{\uprho}}}_{{{\text{IC}}}} }\) , we have
In view of \(\sqrt {1 - x} = 1 - \tfrac{1}{2}x + o\left( x \right)\) as \(\left| x \right| \to 0,\) we then have
Therefore,
Finally, under Assumption 2 in this appendix, both \(\sqrt T ({\hat{\mathbf{\rho }}}_{N} - {{\varvec{\uprho}}}_{{{\text{IC}}}} )^{\prime}{{\varvec{\Omega}}}_{N}^{ - 1} {{\varvec{\uprho}}}_{{{\text{IC}}}}\) and \(- \tfrac{1}{2}{\mathbf{\rho^{\prime}}}_{{{\text{IC}}}} {{\varvec{\Omega}}}_{N}^{ - 1} \sqrt T ({\hat{\mathbf{\Sigma }}}_{\rho ,N} - {{\varvec{\Omega}}}_{N} ){{\varvec{\Omega}}}_{N}^{ - 1} {{\varvec{\uprho}}}_{{{\text{IC}}}}\) are asymptotically normal. The asymptotic variance of the first term is the limit of
For the second term, note that it is a scalar, and so
where we have used \(vec(a) = a\) for a scalar a and \(vec(ABC) = (C^{\prime} \otimes A)vec(B)\) with \(A = C^{\prime} = ({{\varvec{\Omega}}}_{N}^{ - 1} {{\varvec{\uprho}}}_{{{\text{IC}}}} )^{\prime}\) and \(B = \sqrt T ({\hat{\mathbf{\Sigma }}}_{\rho ,N} - {{\varvec{\Omega}}}_{N} )\). The asymptotic variance of the second term is then equal to the limit of
where we have used the following rules involving Kronecker products: \((A \otimes B)^{\prime} = A^{\prime} \otimes B^{\prime}\) and \((A \otimes B)(C \otimes D) = (AC) \otimes (BD)\). See, for example, Magnus and Neudecker (2002, p. 28).
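The vec and Kronecker product identities used in this step can be checked numerically. The sketch below verifies them on random matrices of arbitrary, hypothetical dimensions, together with the scalar special case \(w^{\prime}Bw = (w^{\prime} \otimes w^{\prime})vec(B)\) that corresponds to \(w = {{\varvec{\Omega}}}_{N}^{ - 1} {{\varvec{\uprho}}}_{{{\text{IC}}}}\). The names and sizes are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def vec(M):
    """Stack the columns of M into a single vector (column-major order)."""
    return M.reshape(-1, order="F")

# vec(ABC) = (C' kron A) vec(B)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 5))
vec_rule = np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))

# (P kron R)' = P' kron R'
P = rng.standard_normal((2, 3))
R = rng.standard_normal((4, 2))
transpose_rule = np.allclose(np.kron(P, R).T, np.kron(P.T, R.T))

# (P kron R)(Q kron S) = (PQ) kron (RS)
Q = rng.standard_normal((3, 2))
S = rng.standard_normal((2, 3))
product_rule = np.allclose(np.kron(P, R) @ np.kron(Q, S), np.kron(P @ Q, R @ S))

# Scalar case: w' M w = (w' kron w') vec(M) for a square matrix M
w = rng.standard_normal(3)
M = rng.standard_normal((3, 3))
scalar_rule = np.isclose(w @ M @ w, np.kron(w, w) @ vec(M))

print(vec_rule, transpose_rule, product_rule, scalar_rule)
```

Note that `vec` must stack columns, hence `order="F"`; NumPy's default row-major flattening would correspond to a different form of the identity.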
Hence,
as desired.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ding, Z., Sun, Y. The statistics of time varying cross-sectional information coefficients. J Asset Manag 24, 1–15 (2023). https://doi.org/10.1057/s41260-022-00295-9
Keywords
- Information coefficient (IC)
- Asymptotic distribution
- Information ratio (IR)
- Factor model
- The fundamental law of active management (FLAM)