Introduction

The information coefficient (IC) is one of the most commonly used statistics in quantitative financial analysis. It can be defined along either the time series dimension or the cross section dimension. In the former case, the IC for a given security is defined to be the time series correlation coefficient between the return of this security and its factor exposure prediction. In the latter case, the IC is defined to be the cross-sectional correlation coefficient between the predicted returns of a group of securities for a given time period and the actual returns subsequently realized in that time period. According to modern portfolio theory, individual securities in a portfolio should receive a weight proportional to their predicted returns. As a result, the return of the so-constructed portfolio is a function of the cross-sectional IC between predicted and actual (risk-adjusted) residual returns. As pointed out by Clarke et al. (2002), “performance in any given period is related to the cross-sectional correlation between the active security weights and realized residual returns.” As such, we focus on the cross-sectional IC in this paper.

Most theoretical work to date (see, for example, Grinold (1989), Grinold and Kahn (2000), Clarke et al. (2002, 2006)) assumes that the cross-sectional ICs are constant over time, but this is not consistent with reality. In this paper, we present strong empirical evidence that the cross-sectional ICs are time varying, and their uncertainty is an important risk in factor investing. The realized \({\text{IC}}_{t}\) for a time period can be very different from its ex ante prediction. In fact, even their signs can be different. The portfolio will perform well when the realized factor IC has the same sign as the ex ante prediction and will perform badly if the realized factor IC has the opposite sign of its prediction. Investors will need to incorporate this risk into their quantitative investment process in order to control their portfolio risk properly. Ding and Martin (2017) develop a comprehensive framework for such a quantitative factor investing process when the factor ICs are time varying vectors, whose dimension is equal to the number of factors in the cross-section factor model. The risk estimate in their model is directly related to their return prediction model. As their new fundamental law of active management (FLAM), Ding and Martin (2017) show that the performance of an active portfolio manager should be compared to the portfolio ex ante expected information ratio (IR), a new measure of the added value in every unit of risk added. This new FLAM includes Grinold and Kahn (2000) and Qian and Hua (2004) as special cases.

The work of Ding and Martin (2017) is built on the assumption that the vector \({\text{IC}}_{t}\) is a stationary time series whose true mean vector and covariance matrix are known, and in that case, the IR is shown to be a quadratic form determined by the mean vector and the inverse of the covariance matrix. In reality, these quantities are not known and need to be estimated from observed data. In this paper, we show that a simple multifactor linear regression can be applied to the risk-adjusted residual returns and the standardized factor exposures to obtain the cross-sectional \({\text{IC}}_{t}\) vector estimator at each time t. We further show how to use natural estimates of the mean and covariance of \({\text{IC}}_{t}\) to compute an estimator \(\widehat{\mathrm{IR}}\) of the ex ante information ratio IR. We prove in an appendix that under reasonable regularity conditions \(\widehat{{{\text{IR}}}}\) is a consistent estimator of IR, and it has an asymptotically normal distribution with an asymptotic variance having the same form as that derived by Lo (2002) for the Sharpe ratio in the case of normality. One can use the consistency and asymptotic normality of \(\widehat{{{\text{IR}}}}\) to compare different factor investment strategies and choose the optimal one to use.

Our asymptotic result is developed for a general multifactor model, and the single factor case can be easily derived as a special case. For a single factor model, the result shows that one can simply use the sample IC average divided by the sample IC standard deviation as an estimator of the ex ante expected IR of any single factor investing strategy. This is indeed what many practitioners do in their work, and our research shows that this is a valid approach and puts any portfolio backtest along this line on a rigorous footing.

To lend some support to our theoretical results in finite samples, we first conduct a simulation study based on a single factor model. This study shows that the distributional properties of the cross-sectional IC from our simulated single factor data are strikingly close to what our theory suggests. Furthermore, we conduct empirical studies for both the single factor model and the multifactor model using eight popular quantitative factors in practice. Our empirical result shows that the relationship among the estimated ex ante IR, the IC mean, the IC standard deviation, and the sample size is well predicted by a linear single factor model with time varying ICs. For the multifactor model, our empirical study compares the performances of multifactor models with up to three factors. As in the single factor case, the result shows that the relationship among the estimated ex ante IR, the IC mean vector, the IC covariance matrix, and the sample size is well modeled by a linear multifactor model with time varying ICs. As predicted by our fundamental law for multifactor models, the IR becomes larger as we increase the number of factors. This result allows us to assess the relative IR performance of various subsets of candidate factors, and choose the factor model with the best IR performance.

The factor model and time varying information coefficients

In this section, we describe the factor models introduced by Ding and Martin (2017), upon which our main theoretical results here are based. Let \(\{ r_{it}^{{{\text{Total}}}} ,\;i = 1, \ldots ,N{\text{ and }}t = 1, \ldots ,T\}\) be the set of returns in excess of the risk-free rate for N securities over T periods. Based on the CAPM, we have

$$ r_{it}^{{{\text{Total}}}} = \beta_{i} \,r_{B,t} + r_{it} , $$

where \(\beta_{i}\) is the beta of security i with respect to the market proxy benchmark excess return \(r_{B,t}\), and \(r_{it}\) is the residual return. Under the CAPM, we have \(E(r_{it} ) = 0\). Although we do not assume the CAPM, it is assumed throughout that the residual return has unconditional mean zero. Let \(\sigma_{{r_{it} }}^{2} = var(r_{it} |r_{i,\,t - 1}^{o} )\) be the conditional variance of the residual return \(r_{it}\) given its past returns \(r_{i,\,t - 1}^{o} :\, = \{ r_{i,\,t - 1} , \ldots ,\,r_{i,\,1} \}\). The risk-adjusted residual return is defined to be

$$ \tilde{r}_{it} = \frac{{r_{it} }}{{\sigma_{{r_{it} }} }}, $$

which has a conditional variance \(var\left( {\tilde{r}_{it} |{r}_{i,t - 1}^{o} } \right)\) equal to 1. It is further assumed that \(E\left( {\tilde{r}_{it} |r_{i,\,t - 1}^{o} } \right) = 0\) so that the past residual returns alone are not useful as a predictor for the risk-adjusted residual return in the next period (Footnote 1). This assumption ensures that the unconditional variance of \(\tilde{r}_{it}\) is also equal to 1 (Footnote 2):

$$ var\left( {\tilde{r}_{it} } \right) = 1. $$

Our factor model has the form

$$ \tilde{r}_{it} = {\mathbf{z^{\prime}}}_{i,t - 1} {{\varvec{\uprho}}}_{t} + e_{it} ,\,\;t = 1, \ldots ,\,T $$
(1)

where \({{\varvec{\uprho}}}_{t}\) is the factor returns vector at time t, \({\mathbf{z}}_{i,\,t - 1} = (z_{i1,\,t - 1} , \ldots ,\,z_{iK,\,t - 1} )^{\prime}\) is the \(K \times 1\) vector of lagged factor exposures one is betting on, and for \(i = 1, \ldots ,\,N\) and \(t = 1, \ldots ,\,T,\) the \({\mathbf{z}}_{i,\,t - 1}\) is standardized to have zero mean \(E({\mathbf{z}}_{i,\,t - 1} ) = {\mathbf{0}}\) and an identity covariance matrix \(E({\mathbf{z}}_{i,\,t - 1} {\mathbf{z^{\prime}}}_{i,\,t - 1} ) = {\mathbf{I}}_{K}\). Examples of quantitative factors are value factors, such as the earnings-to-price ratio, the cash-flow-to-price ratio, and various momentum factors. Here we use the risk-adjusted residual return \(\tilde{r}_{it}\) as the dependent variable in the factor model, which differs from the factor models used by other researchers where raw residual returns are used. Since the residual returns and the factor exposures vectors are both standardized, the factor returns \({{\varvec{\uprho}}}_{t}\) is also the correlation coefficient between the lagged factor exposures and the risk-adjusted residual returns. As such, it is the time-varying information coefficient \({\text{IC}}_{t}\). An important feature of our factor model is that one unit of risk-adjusted exposure will be rewarded (paid off) with the same amount of risk-adjusted residual returns across securities.

The time series of factor returns \(\left\{ {{{\varvec{\uprho}}}_{t} } \right\}_{t = 1}^{T}\) are assumed to be independent of \({\mathbf{z}}_{i,\,t - 1}\) and have constant mean vector \({{\varvec{\uprho}}} = {{\varvec{\uprho}}}_{{{\text{IC}}}}\) and covariance matrix \({{\varvec{\Sigma}}} = {{\varvec{\Sigma}}}_{{{\text{IC}}}}\). The idiosyncratic returns in our factor model, \(e_{it} ,\;i = 1, \ldots ,N,\) are assumed to satisfy the standard assumptions of a linear factor model with zero cross-correlation between \(e_{it}\) and \(e_{jt}\) when \(i \ne j\), and it is shown in Ding and Martin (2017) that they have constant variance \(\sigma_{e}^{2}\) given our model assumptions.

A key feature of the Ding and Martin model is that the cross-sectional information coefficients are time varying, which is strongly supported by empirical financial data. For example, Fig. 1 plots the sample cross-sectional correlation \(\hat{\rho }_{t,\,N}\) (defined in Eq. (4) in the next section) between the risk-adjusted residual return and the standardized 12-month momentum factor from January 1979 to June 2010 for stocks in the Russell 1000 universe. Similarly, Fig. 2 plots \(\hat{\rho }_{t,\,N}\) between the risk-adjusted residual return and the standardized book-to-price ratio factor. For such single factor cases, analysts who make the mistake of assuming that \(\rho_{t}\) is constant, and that the estimates \(\hat{\rho }_{t,\,N}\) contain only estimation error, are likely to incorrectly assume that the standard deviation of \(\hat{\rho }_{t,\,N}\) is approximately \(\sqrt {\left( {1 - \hat{\rho }_{N}^{2} } \right)/N}\) where \(\hat{\rho }_{N} = \sum\nolimits_{t = 1}^{T} {\hat{\rho }_{t,\,N} /T}\). The red solid lines in each of the two figures are located at the values of \(\hat{\rho }_{N}\) , and the red dotted lines are the 95% confidence bounds for the time varying IC values \(\hat{\rho }_{t,\,N}\) based on the above standard deviation formula. One would expect 95% of the cross-sectional IC values to be within the red dotted lines if they made the erroneous assumption that the cross-sectional IC values were constant. However, for both factors only about 25% of the IC estimates are within the corresponding bounds.

Fig. 1 Cross-sectional IC estimates for the 12-month momentum factor

Fig. 2 Cross-sectional IC estimates for the Book-to-Price Ratio factor

Given the above data generating process for the cross-section of security residual returns, Ding and Martin (2017, Proposition 2) show that the ex ante expected information ratio (IR) of the optimal mean-variance dollar-neutral portfolio that is based on a bet on factor \({\mathbf{z}}_{t - 1}\) can be approximated very well by the formula

$$ {\text{IR}} = \sqrt {{\mathbf{\rho^{\prime}}}_{{{\text{IC}}}} \left( {{{\varvec{\Sigma}}}_{{{\text{IC}}}} + \sigma_{e}^{2} /N \cdot {\mathbf{I}}_{K} } \right)^{ - 1} {{\varvec{\uprho}}}_{{{\text{IC}}}} }. $$
(2)

In the special case when there is a single factor so that \(K = 1,\) we have

$$ {\text{IR}} = \frac{{\rho_{{{\text{IC}}}} }}{{\sqrt {\sigma_{{{\text{IC}}}}^{2} + \sigma_{e}^{2} /N} }}, $$
(3)

where we have used \(\sigma_{{{\text{IC}}}}^{2}\) to denote \({{\varvec{\Sigma}}}_{{{\text{IC}}}}\) when \(K = 1.\) The above two formulae extend the original work by Grinold (1989), Clarke et al. (2002), Qian and Hua (2004), and Ye (2008) by allowing for time varying information coefficients.
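As a minimal sketch of how Eqs. (2) and (3) can be evaluated numerically, the function below computes the ex ante IR from given parameter values. The function name `expected_ir` and its argument names are hypothetical, and the numerical inputs in the usage note are illustrative values rather than estimates from the data.

```python
import numpy as np


def expected_ir(rho_ic, sigma_ic, sigma_e2, n):
    """Ex ante IR as in Eq. (2).

    rho_ic   : IC mean vector (length K), or a scalar when K = 1
    sigma_ic : IC covariance matrix (K x K), or the IC variance when K = 1
    sigma_e2 : idiosyncratic variance sigma_e^2
    n        : number of securities N in the universe
    """
    rho = np.atleast_1d(np.asarray(rho_ic, dtype=float))
    sigma = np.atleast_2d(np.asarray(sigma_ic, dtype=float))
    k = rho.shape[0]
    # Total risk covariance: strategy risk plus sampling error risk
    omega = sigma + (sigma_e2 / n) * np.eye(k)
    # Quadratic form rho' omega^{-1} rho, computed without an explicit inverse
    return float(np.sqrt(rho @ np.linalg.solve(omega, rho)))
```

For example, `expected_ir(0.05, 0.15**2, 0.975, 1000)` evaluates the single factor formula (3) and gives a value of roughly 0.33, while letting `n` grow very large moves the result toward the ratio of the IC mean to the IC standard deviation.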

Focusing on the single factor model case, we note that the portfolio manager faces two sources of risk: one is the non-diversifiable strategy risk, captured by \(\sigma_{{{\text{IC}}}}^{2} ,\) and the other is the sampling error risk, captured by \(\sigma_{e}^{2} /N,\) which decreases with increasing sample size. All strategies carry an underlying risk, reflected in the fact that one always has \(\sigma_{{{\text{IC}}}}^{2} \ne 0\), and correspondingly one should expect the portfolio to underperform randomly from time to time. However, for a well-chosen factor with \(\rho_{{{\text{IC}}}} > 0,\) patience will be paid off as a risk premium for taking the strategy risk. Of course, if one could predict the sign of each \(\rho_{t}\) and bet accordingly, the payoff to such a skill would be overwhelming. However, Asness (2016) finds “(performances of) such timing strategies to be very weak historically, and some tests of their long-term power to be exaggerated and/or inapplicable.”

Consistent IC and IR estimators

The formulae in (2) and (3) are simple but depend on unknown parameters. To estimate the IR, one needs to estimate the model parameters using historical data. The most common way of estimating the \({{\varvec{\uprho}}}_{t}\) at each time period is to use the OLS cross-section regression estimators

$$ {\hat{\mathbf{\rho }}}_{t,\,N} = \left( {\frac{1}{N}\mathop \sum \limits_{i = 1}^{N} {\mathbf{z}}_{i,\,t - 1} {\mathbf{z^{\prime}}}_{i,\,t - 1} } \right)^{ - 1} \left( {\frac{1}{N}\mathop \sum \limits_{i = 1}^{N} {\mathbf{z}}_{i,\,t - 1} \tilde{r}_{it} } \right),\,\;t = 1, \ldots ,T. $$
(4)

Since \({{\varvec{\uprho}}}_{{{\text{IC}}}} = E\left( {{{\varvec{\uprho}}}_{t} } \right)\), one naturally estimates \({{\varvec{\uprho}}}_{{{\text{IC}}}}\) with the sample mean of the \({\hat{\mathbf{\rho }}}_{t,\,N}\)

$$ {\hat{\mathbf{\rho }}}_{N} = \frac{1}{T}\mathop \sum \limits_{t = 1}^{T} \,{\hat{\mathbf{\rho }}}_{t,\,N} $$
(5)

and then estimates the unknown covariance matrix of the IC estimates with the sample covariance matrix \({\hat{\mathbf{\Sigma }}}_{\rho ,\,N}\):

$$ {\hat{\mathbf{\Sigma }}}_{\rho ,\,N} = \frac{1}{T}\mathop \sum \limits_{t = 1}^{T} \left( {{\hat{\mathbf{\rho }}}_{t,\,N} - {\hat{\mathbf{\rho }}}_{N} } \right)\,\left( {{\hat{\mathbf{\rho }}}_{t,\,N} - {\hat{\mathbf{\rho }}}_{N} } \right)^{\prime } . $$
(6)

Our proposed estimator of the ex ante expected portfolio IR is

$$ \widehat{\text{IR}} = \sqrt {{\mathbf{\hat{\rho }^{\prime}}}_{N} {\hat{\mathbf{\Sigma }}}_{\rho ,\,N}^{ - 1} {\hat{\mathbf{\rho }}}_{N} } . $$
(7)
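The estimators in Eqs. (4) to (7) can be sketched in a few lines of NumPy. This is only an illustration under the model assumptions above: the function names `cross_sectional_ic` and `ir_estimate` are hypothetical, the exposures are assumed to be already standardized, and the returns are assumed to be already risk adjusted.

```python
import numpy as np


def cross_sectional_ic(z, r):
    """Eq. (4): no-intercept OLS of returns r (length N) on exposures z (N x K)."""
    n = z.shape[0]
    return np.linalg.solve(z.T @ z / n, z.T @ r / n)


def ir_estimate(ic_series):
    """Eqs. (5)-(7) applied to a (T x K) array of cross-sectional IC estimates."""
    ic = np.asarray(ic_series, dtype=float)
    if ic.ndim == 1:                      # single factor case, K = 1
        ic = ic[:, None]
    rho_bar = ic.mean(axis=0)             # Eq. (5): time series mean
    dev = ic - rho_bar
    sigma_hat = dev.T @ dev / ic.shape[0] # Eq. (6): divisor T, not T - 1
    ir_hat = float(np.sqrt(rho_bar @ np.linalg.solve(sigma_hat, rho_bar)))  # Eq. (7)
    return rho_bar, sigma_hat, ir_hat
```

In the single factor case the last line reduces to the sample IC mean divided by the sample IC standard deviation.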

It is shown in Appendix A that under weak regularity conditions \({\hat{\mathbf{\rho }}}_{t,\,N}\) and \({\hat{\mathbf{\rho }}}_{N}\) are consistent estimators, that is

$$ {\hat{\mathbf{\rho }}}_{t,\,N} \to^{p} {{\varvec{\uprho}}}_{t} $$
(8)

and

$$ {\hat{\mathbf{\rho }}}_{N} \to^{p} {{\varvec{\uprho}}}_{{{\text{IC}}}} $$
(9)

where \(\to^{p}\) stands for convergence in probability. We define the total risk covariance matrix as

$$ {{\varvec{\Omega}}}_{N} :\, = {{\varvec{\Sigma}}}_{{{\text{IC}}}} + \sigma_{e}^{2} /N \cdot {\mathbf{I}}_{K} .$$

Appendix A proves the consistency result (Footnote 3)

$$ {\hat{\mathbf{\Sigma }}}_{\rho ,\,N} \to^{p} {{\varvec{\Omega}}}_{N}, $$
(10)

that is, the sample IC covariance matrix \({\hat{\mathbf{\Sigma }}}_{\rho ,\,N}\) is approximately equal to the total risk covariance matrix in the expected IR expression. When \(K = 1\), the result in (10) reduces to

$$ \hat{\sigma }_{\rho ,\,N} \to^{p} \sqrt {\sigma_{{{\text{IC}}}}^{2} + \sigma_{e}^{2} /N} . $$
(11)

The above result shows that in the case of time varying ICs, one should use (11) to get a proper confidence interval under a single factor model. The two green dashed lines in Figs. 1 and 2 provide the corrected confidence intervals for the time-varying ICs of the Momentum factor and the Book-to-Price Ratio factor. It can be seen that the correct confidence intervals are much wider than the naïve confidence intervals when assuming the ICs are constant over time. With the corrected confidence intervals that we develop in this paper, about 6.3% of the estimated ICs for the Momentum factor and 5.3% of the estimated ICs for the Book-to-Price Ratio factor are outside the 95% confidence intervals.

In the quite unrealistic case where the underlying population cross-sectional ICs are constant over time so that \(\sigma_{{{\text{IC}}}}^{2} = 0\), we have \(\sigma_{e}^{2} = 1 - \rho^{2}\) and so

$$ \hat{\sigma }_{\rho ,\,N} \to^{p} \sqrt {\frac{{1 - \rho^{2} }}{N}} . $$
(12)

This is a very familiar result for the sample cross-correlation coefficient between two random variables based on an iid sample. See, for example, Casella and Berger (2016) and Qian et al. (2007).
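To make the gap between (11) and (12) concrete, the following sketch compares the two standard deviations. It assumes \(\sigma_{e}^{2} = 1 - \rho_{{{\text{IC}}}}^{2} - \sigma_{{{\text{IC}}}}^{2}\), which follows from the unit variance of the risk-adjusted residual return and reduces to \(1 - \rho^{2}\) when the IC is constant. The function names and the numerical inputs are illustrative assumptions, not values estimated from the figures.

```python
import math


def naive_ic_sd(rho, n):
    """Eq. (12): standard deviation of the IC estimate if the IC were constant."""
    return math.sqrt((1.0 - rho**2) / n)


def corrected_ic_sd(rho_ic, sigma_ic, n):
    """Eq. (11): standard deviation of the IC estimate under a time varying IC."""
    sigma_e2 = 1.0 - rho_ic**2 - sigma_ic**2   # assumed unit total variance
    return math.sqrt(sigma_ic**2 + sigma_e2 / n)


naive = naive_ic_sd(0.05, 1000)
corrected = corrected_ic_sd(0.05, 0.15, 1000)
print(f"naive sd = {naive:.4f}, corrected sd = {corrected:.4f}")
```

With these illustrative inputs the corrected standard deviation is close to five times the naive one, which is the kind of widening of the confidence band described above.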

Combining the results in (9) and (10), we have the following consistency result

$$ \widehat{\text{IR}} \to^{p} {\text{ IR}} .$$
(13)

In the special case with \(K = 1,\) we have that \(\widehat{\text{IR}} = \hat{\rho }_{N} /\hat{\sigma }_{\rho ,\,N}\) is a consistent estimator of the expected IR of the alpha factor portfolio.

It should be emphasized that the portfolio is constructed using a selection universe of N securities, and each sample regression coefficient \({\hat{\mathbf{\rho }}}_{t,\,N}\) is also calculated using the same N observations. The expected information ratio (\({\text{IR}}\)) depends on the size of the universe N. A portfolio constructed using a universe of 1000 stocks will have a different (smaller) expected IR from that of a portfolio constructed using a universe of 3000 stocks. The increased sample size improves the accuracy of parameter estimation, which in turn improves the portfolio performance.

To demonstrate the above consistency results, we generate data from a single factor model as follows. We first generate 240 random \(\rho_{t}\) from a normal distribution with \(\rho_{{{\text{IC}}}} = 0.05\) and \(\sigma_{{{\text{IC}}}} = 0.15\). We then generate a total cross-section of 2000 risk-adjusted residual returns for each of 240 time periods (\(T = 240\)) using Eq. (1) with normally distributed factor exposures \(z_{i,t - 1}\) and idiosyncratic returns \(e_{it}\). After the factor exposures and risk-adjusted residual returns are generated, we randomly draw N observations for \(5 \le N \le 1000\). We calculate the cross-sectional IC estimates \(\hat{\rho }_{t,N}\) over 240 time periods, get the sample mean \(\hat{\rho }_{N}\) and the sample standard deviation \(\hat{\sigma }_{\rho ,N}\) of these IC estimates, and then estimate the expected portfolio IR using \(\hat{\rho }_{N} /\hat{\sigma }_{\rho ,N}\). The procedure is simulated 100 times. More specifically, for each N, we draw 100 random samples of size N with replacement from the 2000 residual returns. An average of these simulated IR estimates is calculated and is shown as the blue line in Fig. 3. The red line is the theoretical IR as in Eq. (3) using the true parameters. The dotted line is the maximum IR (i.e., \({\text{IR}}_{\max } = \rho_{{{\text{IC}}}} /\sigma_{{{\text{IC}}}}\)) one can reach when N goes to infinity.

Fig. 3 Theoretical and simulated information ratios as functions of N under time-varying cross-sectional information coefficients

Figure 3 shows that the average IR estimate is remarkably close to what our theory suggests. When the cross-sectional sample size (N) is small, the estimation error can remain non-negligible even after simulation averaging. This suggests that it is necessary to adjust the standard error using \(\sigma_{e}^{2} /N\) when N is not large. Results not reported show that the IR estimate for each sample is also close to what our theory predicts. We use simulation averaging only to average out some of the simulation noise so that what we plot reflects a systematic pattern that is not due to pure chance.
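A simplified version of this simulation can be sketched as follows. It is not the exact procedure described above: instead of drawing subsamples from a fixed pool of 2000 securities, it generates N fresh observations per period, and it assumes \(\sigma_{e}^{2} = 1 - \rho_{{{\text{IC}}}}^{2} - \sigma_{{{\text{IC}}}}^{2}\) so that the risk-adjusted returns have unit variance. The function names and the seed handling are hypothetical.

```python
import numpy as np


def simulate_ir(n, t=240, rho_ic=0.05, sigma_ic=0.15, seed=0):
    """One simulated IR estimate from a single factor model with time varying IC."""
    rng = np.random.default_rng(seed)
    sigma_e = np.sqrt(1.0 - rho_ic**2 - sigma_ic**2)   # assumed unit total variance
    rho_t = rng.normal(rho_ic, sigma_ic, size=t)       # time varying ICs
    ic_hat = np.empty(t)
    for s in range(t):
        z = rng.standard_normal(n)                     # standardized exposures
        e = rng.normal(0.0, sigma_e, size=n)           # idiosyncratic returns
        r = z * rho_t[s] + e                           # Eq. (1) with K = 1
        ic_hat[s] = (z @ r) / (z @ z)                  # Eq. (4) with K = 1
    return ic_hat.mean() / ic_hat.std()                # sample mean over sample sd


def theoretical_ir(n, rho_ic=0.05, sigma_ic=0.15):
    """Eq. (3) evaluated at the true parameters."""
    sigma_e2 = 1.0 - rho_ic**2 - sigma_ic**2
    return rho_ic / np.sqrt(sigma_ic**2 + sigma_e2 / n)
```

Averaging `simulate_ir(n, seed=s)` over several seeds for a grid of `n` values and plotting the result against `theoretical_ir(n)` gives a curve comparable in spirit to the one described for Fig. 3.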

Asymptotic normality of the IC and IR estimators

With the consistency results in (9) and (10), we show in Appendix B that

$$ \sqrt T \left( {{\hat{\mathbf{\rho }}}_{N} - {{\varvec{\uprho}}}_{{{\text{IC}}}} } \right)\mathop \approx \limits^{d} N({\mathbf{0}},\,{{\varvec{\Omega}}}_{N} ), $$
(14)

that is, the distribution of \(\sqrt T \left( {{\hat{\mathbf{\rho }}}_{N} - {{\varvec{\uprho}}}_{{{\text{IC}}}} } \right)\) can be approximated by a normal distribution with mean zero and variance \({{\varvec{\Omega}}}_{N}\). Furthermore, we show in Appendix B that under some additional conditions, the IR estimator in (7) has an asymptotically normal distribution:

$$ \sqrt T (\widehat{\text{IR}} - {\text{IR}})\mathop \approx \limits^{d} N(0,\,V_{{{\text{norm}}}} ) $$
(15)

where

$$ V_{\text{norm}} = 1 + \frac{1}{2}{\text{IR}}^{2} $$
(16)

is the asymptotic variance of \(\widehat{\text{IR}}\). This result has the same form as the asymptotic normality of the Sharpe ratio established by Lo (2002) when returns are iid and normally distributed. As in the case of the Sharpe ratio, a standard error (SE) for the information ratio estimator is computed as

$$ {\text{SE}}(\widehat{\text{IR}}) = \sqrt {\frac{1}{T}\left( {1 + \frac{1}{2}\widehat{\text{IR}}^{2} } \right)} . $$
(17)

The t statistic for testing the null hypothesis that the information ratio has a value greater than or equal to \({\text{IR}}_{0}\) is

$$ t = \frac{{\sqrt T (\widehat{\text{IR}} - {\text{IR}}_{0} )}}{{\sqrt {\left( {1 + \frac{1}{2}\widehat{\text{IR}}^{2} } \right)} }}. $$
(18)
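Equations (17) and (18) translate directly into code. The sketch below uses hypothetical function names, and the inputs in the usage lines are illustrative numbers rather than results from our data.

```python
import math


def ir_standard_error(ir_hat, t):
    """Eq. (17): standard error of the IR estimate based on T periods."""
    return math.sqrt((1.0 + 0.5 * ir_hat**2) / t)


def ir_t_stat(ir_hat, ir_0, t):
    """Eq. (18): t statistic for the null hypothesis value ir_0."""
    return (ir_hat - ir_0) / ir_standard_error(ir_hat, t)


se = ir_standard_error(0.3, 240)
t_stat = ir_t_stat(0.3, 0.0, 240)
print(f"SE = {se:.4f}, t = {t_stat:.2f}")
```

Here the IR is measured per period, so an estimate of 0.3 over 240 periods has a standard error of about 0.066 under the normality assumption behind (16).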

The asymptotic variance formula in (16) relies on Assumption 2 in Appendix B which states that the asymptotic variance of \(1/T\sum\nolimits_{t = 1}^{T} {({{\varvec{\uprho}}}_{t} - {{\varvec{\uprho}}}_{{{\text{IC}}}} )({{\varvec{\uprho}}}_{t} - {{\varvec{\uprho}}}_{{{\text{IC}}}} )^{\prime} - {{\varvec{\Sigma}}}_{{{\text{IC}}}} }\) depends on the variance of \({{\varvec{\uprho}}}_{t}\) only. Such an assumption will hold if \({{\varvec{\uprho}}}_{t}\) is iid normal. In reality \({{\varvec{\uprho}}}_{t}\) will be not only serially correlated but also non-normal. In the case when \({{\varvec{\uprho}}}_{t}\) is iid but non-normal with nonzero skewness \(k_{3}\) and kurtosis \(k_{4}\), we can show that the asymptotic variance becomes

$$ V_{{{\text{IID}}}} = 1 - k_{3} {\text{IR}} + \frac{{k_{4} - 1}}{4}{\text{IR}}^{2}. $$
(19)

A standard error for the IR estimate is obtained by plugging estimates of IR, \(k_{3}\), and \(k_{4}\) into (19) and taking the square root of the result divided by T. A formula of a similar form was obtained for the Sharpe ratio by Zhang et al. (2021, Eqs. (27) and (44)).
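A minimal sketch of this plug-in calculation for the single factor case is given below. It assumes that \(k_{3}\) and \(k_{4}\) are estimated by the sample skewness and the sample (raw, not excess) kurtosis of the IC estimate series, which is one natural choice rather than a prescription from the text; with \(k_{3} = 0\) and \(k_{4} = 3\) the variance reduces to (16). All function names are hypothetical.

```python
import math


def iid_asymptotic_variance(ir, skew, kurt):
    """Eq. (19); kurt is the raw kurtosis, equal to 3 under normality."""
    return 1.0 - skew * ir + 0.25 * (kurt - 1.0) * ir**2


def sample_skew_kurt(x):
    """Moment-based sample skewness and raw kurtosis (divisor n throughout)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m3 / m2**1.5, m4 / m2**2


def ir_standard_error_iid(ic_series):
    """Plug-in standard error of the IR estimate for an iid, possibly non-normal IC."""
    n = len(ic_series)
    mean = sum(ic_series) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in ic_series) / n)
    skew, kurt = sample_skew_kurt(ic_series)
    variance = iid_asymptotic_variance(mean / sd, skew, kurt)
    return math.sqrt(max(variance, 0.0) / n)   # guard against rounding below zero
```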

Empirical simulation results for single factor models

The results in the previous section show that data generated from our theoretical models have the properties predicted by the asymptotic theory. The crucial question is: how relevant are our theoretical models in the real world? Quantitative models are often built on different factors, such as value factors and momentum factors. Researchers usually assume a linear relationship, as in Eq. (1), between security returns and factor exposures. It is therefore of interest to see whether the simulated IR from universes of different sample sizes has the same features as the theoretically simulated factors above. This can serve as a check for model specification (if a theoretical model has a certain property but the empirical data do not, then we can be sure that the theoretical model does not fit the data well). It can also provide a guide to portfolio managers on how to choose factors that lead to the best IR performance. Here we focus on choosing a single factor model that has the best IR performance among a set of factors.

The eight quantitative factors we study here are:

(1) Book to Price Ratio,
(2) Cash Flow to Price Ratio,
(3) Earnings to Enterprise Value Ratio,
(4) Sales to Price Ratio,
(5) 12-month Momentum,
(6) Share Buyback as a Percent of Total Shares Outstanding,
(7) Return on Capital,
(8) Short as a Percent of Total Shares Floating.

All the raw exposures start from 1979:01 to 2010:06 except the Cash Flow to Price Ratio, which is from 1990:02 to 2010:06, and Short as a Percent of Total Shares Floating, which is from 1988:02 to 2010:06.

The raw factor exposures are cross-sectionally Winsorized and then standardized. The residual returns are calculated and standardized using time series estimated residual return volatility (Footnote 4) so that they have zero mean and unit variance as well. The details of the standardization for both the returns and the factor exposures were provided in Ding and Martin (2017). At each time period, we randomly draw N companies with returns and factor exposures and calculate the cross-sectional regression coefficient \(\hat{\rho }_{t,N}\) for that time period. We then calculate the sample mean \(\hat{\rho }_{N}\) and the sample standard deviation \(\hat{\sigma }_{\rho ,N}\) over time and get an IR estimator (\(\widehat{\text{IR}} = \hat{\rho }_{N} /\hat{\sigma }_{\rho ,N}\)). This is simulated 100 times, and we then get the average IR from these 100 repeated samples. We do this for \(N = 50{\text{ to 3000}}\) companies.

Figure 4 displays the empirical simulation results. It can be seen that the actual data has the shape we expect for all the factors considered. So, at least from this perspective, we can say that the linear models specified in Eq. (1) give a quite good description of the relationship between security returns and factor exposures.

Fig. 4 Simulated IR using empirical factors

Note that when \(K = 1\), we have

$$ \log ({\text{IR}}) = \log \left( {\rho_{{{\text{IC}}}} } \right) - 0.5\log \left[ {\left( {1 - \rho_{{{\text{IC}}}}^{2} - \sigma_{{{\text{IC}}}}^{2} } \right)/N + \sigma_{{{\text{IC}}}}^{2} } \right]. $$
(20)

As an alternative method to evaluate the performance of \(\widehat{{{\text{IR}}}}\), we can fit the following nonlinear regression model:

$$ \log (\widetilde{\text{IR}}_{N} ) = \log \left( {\rho_{{{\text{IC}}}} } \right) - 0.5\log \left[ {\left( {1 - \rho_{{{\text{IC}}}}^{2} - \sigma_{{{\text{IC}}}}^{2} } \right)/N + \sigma_{{{\text{IC}}}}^{2} } \right] + error $$
(21)

based on the observations \(\left\{ {\widetilde{\mathrm{IR}}_N {\text{ for }}N = 50,51, \ldots ,3000} \right\}\) where each \(\widetilde{\text{IR}}_N\) is the average of 100 simulated \(\widehat{\mathrm{IR}}_N\)’s. For each factor, we can estimate \(\rho_{{{\text{IC}}}}\) and \(\sigma_{{{\text{IC}}}}\) by nonlinear least squares. The estimated results are shown in Table 1. The table shows that the above empirical model captures the relationship between the expected IR and the three measures \(\rho_{{{\text{IC}}}} ,\) \(\sigma_{{{\text{IC}}}}^{{2}}\) and N well. The \(R^{2}\)’s from these regressions are very close to 1 in almost all the cases. This implies that the error in (21) is very small relative to the true value of \(\log ({\text{IR}})\). We note that the results are qualitatively similar when we do not use simulation averaging, although the \(R^{2}\)’s are somewhat smaller.
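As a rough cross-check on such a fit, note that Eq. (20) implies \(1/{\text{IR}}^{2} = a + b/N\) with \(a = \sigma_{{{\text{IC}}}}^{2} /\rho_{{{\text{IC}}}}^{2}\) and \(b = (1 - \rho_{{{\text{IC}}}}^{2} - \sigma_{{{\text{IC}}}}^{2} )/\rho_{{{\text{IC}}}}^{2}\), so that \(\rho_{{{\text{IC}}}}^{2} = 1/(1 + a + b)\). The sketch below exploits this with an ordinary straight-line fit. It is a linearised alternative, not the nonlinear least squares fit of the log model in (21), and it weights the errors differently; the function name is hypothetical and \(\rho_{{{\text{IC}}}} > 0\) is assumed.

```python
import numpy as np


def fit_ic_parameters(n_values, ir_values):
    """Recover (rho_IC, sigma_IC) from IR values observed at universe sizes N.

    Fits 1/IR^2 = a + b/N by least squares, then inverts
    a = sigma^2/rho^2 and b = (1 - rho^2 - sigma^2)/rho^2.
    """
    x = 1.0 / np.asarray(n_values, dtype=float)
    y = 1.0 / np.asarray(ir_values, dtype=float) ** 2
    slope, intercept = np.polyfit(x, y, 1)        # highest degree first
    rho_ic = 1.0 / np.sqrt(1.0 + intercept + slope)
    sigma_ic = np.sqrt(intercept) * rho_ic        # NaN if the fitted intercept is negative
    return float(rho_ic), float(sigma_ic)
```

With noise-free IR values generated from Eq. (20) the function returns the generating parameters; with averaged simulated IRs it gives starting values that a nonlinear fit of (21) could refine.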

Table 1 FLAM model fit relating the estimated IR to the IC mean, the IC standard deviation, and the sample size N

In Table 1 and Fig. 4, the value of \(N_{90}\) for each factor denotes the number of stocks needed to construct a portfolio that achieves 90% of the maximum possible IR. For example, only 363 stocks are needed for the Momentum factor, whereas 2483 stocks are needed for the Return on Capital factor, to reach 90% of their respective maximum possible IRs.
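Given values of \(\rho_{{{\text{IC}}}}\) and \(\sigma_{{{\text{IC}}}}\), a threshold such as \(N_{90}\) can be obtained in closed form from Eq. (3): setting \({\text{IR}}/{\text{IR}}_{\max } = q\) and solving for N gives \(N = \sigma_{e}^{2} q^{2} /[\sigma_{{{\text{IC}}}}^{2} (1 - q^{2} )]\). The sketch below is one way to do the calculation under the assumption \(\sigma_{e}^{2} = 1 - \rho_{{{\text{IC}}}}^{2} - \sigma_{{{\text{IC}}}}^{2}\); the function name is hypothetical, the inputs are illustrative rather than fitted values from Table 1, and the values reported in the table may have been computed differently.

```python
import math


def stocks_for_ir_fraction(rho_ic, sigma_ic, fraction=0.9):
    """Smallest N at which the IR in Eq. (3) reaches `fraction` of IR_max."""
    sigma_e2 = 1.0 - rho_ic**2 - sigma_ic**2          # assumed unit total variance
    n = sigma_e2 * fraction**2 / (sigma_ic**2 * (1.0 - fraction**2))
    return math.ceil(n)


n_90 = stocks_for_ir_fraction(0.05, 0.15)
print(n_90)
```

Factors with a small IC volatility relative to the idiosyncratic variance need many more stocks to approach their maximum IR, since the sampling error term \(\sigma_{e}^{2} /N\) then dominates for longer.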

Empirical simulation results for multifactor models

In order to demonstrate the relevance of our asymptotic results for multifactor models, we also carried out simulations for a three-factor model and all 6 of its subset models (three single-factor and three two-factor models). As in the single-factor-model simulations, we randomly draw N companies at each time period with returns and factor exposures and calculate the cross-sectional multifactor regression coefficient vectors \({\hat{\mathbf{\rho }}}_{t,N}\) for \(t = 1,2, \ldots ,T\). We then calculate the sample mean vector \({\hat{\mathbf{\rho }}}_{N}\) and the sample covariance matrix \({\hat{\mathbf{\Sigma }}}_{\rho ,N}\) for the regression coefficient vectors \({\hat{\mathbf{\rho }}}_{t,N}\). The final portfolio IR using N companies is then estimated as \(\widehat{\mathrm{IR}}_N = \sqrt {{\mathbf{\hat{\rho }^{\prime}}}_{N} {\hat{\mathbf{\Sigma }}}_{\rho ,\,N}^{ - 1} {\hat{\mathbf{\rho }}}_{N} } .\) We conducted this simulation 15 times for N = 50 to 1000 companies. The final IR curves shown in Fig. 5 are the average of the 15 simulations.

Fig. 5 Simulated IR using multifactor combination

The bottom three lines are the simulated IRs for the single factor portfolios which are the same as those shown in Fig. 4. The middle three lines are the simulated IRs for the portfolios based on the three different two-factor models, BP+EEV, BP+MOM, and EEV+MOM. It is clear that the IRs based on the two-factor models are all larger than the IRs of the single factor models by a substantial amount. Finally, the top line shows the IRs from using all three factors (BP+EEV+MOM). It is interesting to note that the best performing two-factor model is based on the two best performing single factor models.
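The comparison across factor subsets can be sketched as a loop over all non-empty combinations of factors, re-running the cross-sectional regression for each subset. The function name, the array layout, and the use of the labels BP, EEV, and MOM as dictionary keys are assumptions made for illustration, and the sketch omits the random subsampling of N companies and the averaging over repeated simulations.

```python
from itertools import combinations

import numpy as np


def subset_ir_estimates(z, r, names):
    """IR estimate for every non-empty subset of factors.

    z     : (T, N, K) array of lagged standardized exposures
    r     : (T, N) array of risk-adjusted residual returns
    names : list of K factor labels
    """
    t_len, _, k = z.shape
    results = {}
    for size in range(1, k + 1):
        for cols in combinations(range(k), size):
            idx = list(cols)
            ics = np.empty((t_len, size))
            for t in range(t_len):
                zt = z[t][:, idx]
                # Cross-sectional OLS on the chosen subset of factors
                ics[t] = np.linalg.solve(zt.T @ zt, zt.T @ r[t])
            mean = ics.mean(axis=0)
            dev = ics - mean
            cov = dev.T @ dev / t_len
            label = "+".join(names[j] for j in idx)
            results[label] = float(np.sqrt(mean @ np.linalg.solve(cov, mean)))
    return results
```

Sorting the returned dictionary by value gives an IR-based ranking of the candidate models. Because the number of subsets grows as \(2^{K} - 1\), the loop is practical only for a moderate number of factors.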

There is no convenient analog to Eq. (21) for the above multifactor model. However, one can imagine a single-factor model, not yet discovered, whose IR values are the same as those of a given multifactor model. It is then of interest to know what the parameters of that single-factor model are. Against this backdrop, we fit Eq. (21) to the IR values for each of the multifactor models under consideration. We display the results in Table 2, where the first three rows are taken from Table 1 for ease of comparison. The very high R-squared values in Table 2 indicate that the nonlinear least squares fits are quite good. While the IC volatility (\(\hat{\sigma }_{{{\text{IC}}}}\)) is sometimes higher for a multifactor model than for a single factor model nested within it, the IC mean (\(\hat{\rho }_{{{\text{IC}}}}\)) is always considerably larger, resulting in a considerably larger IR for the multifactor model compared to each subset single factor model.

Table 2 FLAM model fit relating the estimated IR to the IC mean, the IC standard deviation, and the sample size N for multifactor combined portfolios

Conclusion

Building on the Ding and Martin (2017) model, this paper has studied the estimation and inference of the IC mean and the IR in the presence of time varying ICs. Our theoretical results can be summarized as follows: (a) We have established the consistency and asymptotic normality of the time series sample mean of the IC vectors; (b) We have established the consistency and asymptotic normality of the IR estimator based on the sample mean and the sample covariance of the empirical IC’s. In both cases, we have provided the asymptotic variance formula. In particular, the asymptotic variance of the IR estimator is shown to take the same form as the asymptotic variance of the Sharpe ratio, allowing the portfolio manager to easily compute the standard error of the IR estimate.

Our empirical results for single-factor models show that the behavior of the IR estimator, as a function of cross-section sample size N, is as predicted by the theoretical results of Ding and Martin (2017). The empirical results for a variety of factor models reveal that different factor models have different maximum IR values \({\text{IR}}_{\max }\), and approach \({\text{IR}}_{\max }\) at different rates as a function of N. Furthermore, the IR estimators under different factor models have different standard errors. Thus, a portfolio manager now has some new tools for analytic evaluation and decision-making regarding the choice of the “best” single-factor models.

Our empirical studies for a 3-factor model and each of its 6 subset models show that, as anticipated, higher values of IR are obtained with multifactor models that contain more factors, and that there is a clear IR-based ranking for the three 2-factor models. Thus, we have a tool for subset selection of multifactor models, which with current high-performance computers can be applied to select among more than 10 factors. An open question is how we may design a good penalty to control for over-fitting.

It is well known that the Sharpe ratio is optimistic when returns are serially dependent, whether through positive serial autocorrelation or through an uncorrelated but serially dependent GARCH process. In such cases, a correction is needed to obtain a valid standard error; see, for example, Lo (2002). The IR takes the same form as the Sharpe ratio but with returns replaced by the IC time series. The last column in Table 1, titled “ACF(1)”, reports the first-order autocorrelation of the estimated IC series for the various single-factor models. The magnitude of the autocorrelation indicates that the IC may be serially correlated, and the information ratio may need to be modified to account for the extra information that a fund manager can exploit. However, in finite samples, a fund manager may choose to exploit only some low-order dependence. In this case, inferences on the newly derived IR will need to account for residual serial dependence in the ICs. To design a reliable inference procedure, we can use the ideas from the large econometric literature on long-run variance estimation and HAC (Heteroscedasticity and Autocorrelation Consistent) inference. Chen and Martin (2021) and Christidis and Martin (2021) use the HAC method (Footnote 5), but we may employ the more recent HAR (Heteroscedasticity and Autocorrelation Robust) method to obtain more accurate and trustworthy inferences. See, for example, Kiefer and Vogelsang (2005) and Sun (2013). We leave this for future research.