
The development of what has become known as the Capital Asset Pricing Model (CAPM) by Jack Treynor and others was a watershed event in financial economics. [Footnotes 1, 2] It marked the birth of modern asset pricing because, for the first time, financial economists had a formal method for estimating the expected return of a risky investment opportunity. Soon after the model was developed, researchers set about collecting stock market data to determine whether the model actually worked. Early results were encouraging: beta appeared to explain cross-sectional differences in realized returns. [Footnote 3] However, as researchers subjected the model to more powerful tests, cracks began to appear. In particular, researchers were able to group assets into portfolios using variables such as size, book-to-market, and past returns (momentum) and show that even though these portfolios displayed large cross-sectional variation in realized returns, this was not mirrored in an equivalent cross-sectional variation in beta. In response to these empirical shortcomings, a number of extensions to the original model have been proposed. The most notable are the factor specifications proposed by Fama and French (1993) and Carhart (1997), which are motivated by theoretical developments in Ross (1976) and Merton (1973).

The new models have been subjected to the same level of empirical scrutiny as the CAPM, and for the most part they have fared better. This has led many financial economists to conclude that, to adjust for risk, the CAPM should be replaced by one of the new models. The problem with this logic is that it ignores a subtle distinction between the models. The CAPM was developed using theory alone—stock market data did not exist in electronic format when Jack wrote his original paper. The new models were developed only after observing the data. Indeed, they were actually derived with one goal in mind, to fit return data better. What that means is that they have an inherent advantage over the CAPM when they are evaluated using the data they were designed to explain. This is true even if you only test the model using data collected after the models were first proposed.

The easiest way to understand why tests that compare the new models to the CAPM are biased in favor of the new models is to consider the following analogy. Early astronomers could not reconcile the motion of the planets with the dominant theory of the day, the Ptolemaic theory that had the Earth at the center of the universe. Rather than look for an alternative theory, these astronomers reacted to the inability of the Ptolemaic theory to predict the motion of the planets by “fixing” each observational inconsistency. Just as modern financial economists added new risk factors to the CAPM, the early astronomers added epicycles to the theory. For example, because the Ptolemaic theory did not account for the motion of the Earth, it could not explain the fact that, when viewed from the Earth, the planets sometimes move backwards. An epicycle fixes this problem by adding a circular orbit within another circular orbit. The net result was that by the time Copernicus proposed the correct theory that the Earth revolved around the Sun, the Ptolemaic theory had been fixed so many times that it explained the motion of the planets better than the Copernican system did. [Footnote 4] The lesson here is that if you test a theory using the data it was designed to explain, it should not be surprising to find that the theory works. A real test of a theory is whether it explains data it was not designed to explain.

Although the extensions to the CAPM better explain the cross section of stock returns, it is hard to know, using traditional tests, whether these extensions represent true progress towards a better measure of risk or simply the asset pricing equivalent of an epicycle. To determine whether any extension to the CAPM better explains risk, one needs to confront the models with facts they were not specifically designed to explain. At first glance it might appear that this approach is a lost cause. How can you test a model of risk without using stock return data? The answer is that instead of looking at stock returns, we look at what investors actually do. That is, we infer what risk model investors use by observing their investment decisions.

To understand the basis of our new test, it is helpful to recall how prices and returns are determined in any risk model. All models of risk assume that investors compete with each other to find attractive investment opportunities. When investors find such opportunities, they react by submitting buy or sell orders. Prices are then determined so that the market clears, that is, total demand equals total supply. As a consequence of this competition, equilibrium prices are set so that the expected return of every asset is solely a function of its risk. Consequently these buy and sell orders reveal the preferences of investors and therefore they reveal which risk model investors are using. By observing these orders we can infer whether investors price risk at all, and if so, which risk model they are using.

There are two criteria that are required to implement this idea. First, one needs a mechanism that identifies attractive investment opportunities. Second, one needs to observe investor reactions to these opportunities. We can satisfy both criteria if we implement the method using mutual fund data. Using this dataset we infer, from a set of candidate models, the model that is closest to the risk model investors are actually using. We will restrict attention to the time period after the new models were developed, that is, 1996–2011. Here we follow the lead of Guerard Jr, Deng, Gillam, Markowitz, Wang, and Xu (2015), who test Bloch, Guerard Jr, Markowitz, Todd, and Xu (1993) in the 1997–2014 period.

What we find is somewhat of a triumph for economic theory. Even without the benefit of the last 50 years of data, we find that the model derived in the early 1960s, the CAPM, is the best description of investor behavior. None of the extensions that have been proposed do better. Importantly, the CAPM better explains investor behavior than no model at all, indicating that investors do price risk. Most surprisingly, the CAPM also outperforms a naive model in which investors ignore beta and simply chase any outperformance relative to the market portfolio. Investors’ capital allocation decisions reveal that they adjust for risk using the CAPM beta. The poor performance of the extensions to the CAPM implies that although these extensions might better explain cross sectional variation in realized returns, they do not help explain how investors measure risk. In short, we are no closer to understanding the risk-return relation today than we were when the CAPM was originally developed more than half a century ago.

18.1 Methodology and Data

In earlier work we explain how the mutual fund market equilibrates. [Footnote 5] When investors find an investment in a particular mutual fund to be attractive, they invest capital in the fund. As the fund grows, the expected return of the fund declines as the fund manager’s attractive investment ideas are exhausted. The flow of capital ceases when the expected return the mutual fund delivers to its investors is solely a function of the risk of the fund. That is, competition between investors drives the fund’s net alpha to zero. What this implies is that the flows of capital in and out of mutual funds are the buy and sell orders mentioned in the introduction. Thus, the flow of funds reveals which investment opportunities mutual fund investors considered to be attractive.

Notice that when the market is in equilibrium, all mutual funds have a zero net alpha. Now consider what happens when new information arrives that allows investors to make a better inference about a fund’s alpha. One example of new information is the fund’s return itself. If the fund’s return exceeds the risk adjusted return predicted by the risk model investors are using, investors will positively update their beliefs about the skill level of the fund’s manager and infer that at the fund’s current size, the alpha is positive. Similarly, if the fund’s realized return is less than the risk adjusted return predicted by the risk model, investors will negatively update their beliefs about the skill level of the manager and infer that at the fund’s current size, the alpha is negative. In short, the fund’s realized return reveals attractive investment opportunities, and the subsequent flow of funds reveals investor reactions to these opportunities.

We are now ready to describe our test. Each risk model we consider uniquely determines which funds outperform and which funds underperform. We then observe the subsequent flow of funds. The model for which outperformance best drives capital flows is the model that comes closest to the model that investors are actually using to price risk. We use the mutual fund data set described in Berk and van Binsbergen (2015). Because the focus of this article is to ensure that we test all models on an equal footing, we will only conduct our test using data that was not available at the time all the models were developed. In practice that means we restrict attention to the time period from 1996 to 2011. [Footnote 6]

We implement this idea as follows. We compute the fraction of times we observe an inflow when the fund’s realized return exceeds the risk adjusted return and the fraction of times we observe an outflow when the fund’s realized return is less than the risk adjusted return, as defined by the risk model. Our measure of fit is the average of these two fractions. We show in Berk and van Binsbergen (2016b) that this average can also be estimated by running a simple linear regression of the sign of flows on the sign of outperformance. The latter approach is preferable because, as we show in the same paper, the t-statistic of this regression is an accurate measure of statistical significance. In particular, if the coefficient using one risk model statistically significantly exceeds the coefficient using a second risk model, then we can say the first model is closer to the risk model investors are actually using.
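The fit measure described above can be sketched in a few lines. This is an illustration only, not the authors' code: the function name `flow_performance_fit`, the ±1 encoding of the inputs, and the choice to drop zero or missing observations are all assumptions. The sketch computes the average of the two fractions directly and also the OLS slope of the sign of flows on the sign of outperformance. For ±1-valued variables the slope equals the sum of the two fractions minus one, so (slope + 1)/2 recovers the same average. The double-clustered standard errors used in the chapter are not shown.

```python
import numpy as np

def flow_performance_fit(flow_sign, outperf_sign):
    """Return (average of the two fractions, OLS slope).

    flow_sign    : array of +1 (inflow) / -1 (outflow)
    outperf_sign : array of +1 (outperformance) / -1 (underperformance)
    Zero or NaN observations are dropped (an assumption of this sketch).
    """
    f = np.asarray(flow_sign, dtype=float)
    a = np.asarray(outperf_sign, dtype=float)
    keep = ~np.isnan(f) & ~np.isnan(a) & (f != 0) & (a != 0)
    f, a = f[keep], a[keep]

    # Fraction of inflows among outperformers, outflows among underperformers
    p_in_given_up = np.mean(f[a > 0] > 0)
    p_out_given_down = np.mean(f[a < 0] < 0)
    average = 0.5 * (p_in_given_up + p_out_given_down)

    # OLS slope (with intercept) of sign(flow) on sign(outperformance)
    slope = np.cov(a, f, bias=True)[0, 1] / np.var(a)
    return average, slope

# Toy data: 8 fund-period observations
outperf = [1, 1, 1, 1, -1, -1, -1, -1]
flows = [1, 1, 1, -1, -1, -1, 1, 1]
average, slope = flow_performance_fit(flows, outperf)
print(average, (slope + 1) / 2)  # both are 0.625 for this toy sample
```

In the toy sample, three of four outperforming observations see inflows and two of four underperforming observations see outflows, so the average is (0.75 + 0.5)/2 = 0.625.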

18.2 Results

There are two practical issues that we need to confront in order to run this test. The first concerns what a flow actually is. A fund’s assets under management change for two reasons: either the prices of the underlying stocks change, or investors invest or withdraw capital. Although both mechanisms change assets under management, they are unlikely to equally affect the fund’s alpha. For example, increases in fund sizes that result from inflation are unlikely to affect the alpha generating process. Similarly, the fund’s alpha generating process is unlikely to be affected by changes in fund size that result from changes in the price level of the market as a whole. Consequently, in our empirical specification, we only consider capital flows into and out of funds net of what would have happened had investors not invested or withdrawn capital and had the fund manager adopted a purely passive strategy and invested in Vanguard index funds. That is, we measure the flow of funds as

$$\operatorname{SIGN}\bigl(q_{it} - q_{i,t-T}\,(1 + R_{it}^{V})\bigr), \tag{18.1}$$

where $q_{it}$ is the size of fund $i$ at time $t$, and $R_{it}^{V}$ is the cumulative return, over the horizon from $t-T$ to $t$, to investors of the collection of available Vanguard index funds that comes closest to matching the fund under consideration. Under this definition of capital flows, we are assuming that, in making their capital allocation decisions, investors take into account changes in the size of the fund that result from returns due to managerial outperformance alone. That said, all of our results are robust to replacing $R_{it}^{V}$ with the fund’s own return in (18.1).
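As an illustration of (18.1), the sketch below computes the sign of the flow for a single fund from a series of fund sizes and benchmark returns. The function name `flow_sign`, the input layout, and the timing convention (entry `t` of the return series is the return earned over the period ending at `t`) are assumptions made for this example. The matching of each fund to its closest set of Vanguard index funds is taken as given and is not shown.

```python
import numpy as np

def flow_sign(q, benchmark_return, horizon):
    """Sign of the flow in (18.1) for one fund.

    q                : fund size (AUM) at each date
    benchmark_return : per-period return on the matched Vanguard benchmark;
                       entry t covers the period ending at date t (assumed)
    horizon          : T, the number of periods over which the flow is measured
    """
    q = np.asarray(q, dtype=float)
    r = np.asarray(benchmark_return, dtype=float)
    signs = np.full(q.shape, np.nan)  # first T dates have no flow measure
    for t in range(horizon, len(q)):
        # Cumulative benchmark return from t-T to t
        cumulative = np.prod(1.0 + r[t - horizon + 1 : t + 1]) - 1.0
        # Size change net of what passive growth alone would have produced
        signs[t] = np.sign(q[t] - q[t - horizon] * (1.0 + cumulative))
    return signs

sizes = [100.0, 110.0, 120.0, 118.0]
bench = [0.0, 0.05, 0.02, 0.01]
print(flow_sign(sizes, bench, horizon=1))  # nan, then +1, +1, -1
```

In the last period the fund shrinks from 120 to 118 while passive growth alone would have taken it to 121.2, so the measured flow is negative.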

The second practical issue that we need to confront is the horizon length over which to measure the effects. For most of our sample, funds report their AUMs monthly; however, in the early part of the sample many funds report their AUMs only quarterly. In order not to introduce a selection bias by dropping these funds, the shortest horizon we will consider is three months. If investors react to new information immediately, then flows should immediately respond to performance and the appropriate horizon to measure the effect would be the shortest horizon possible. But in reality, there is evidence that some investors do not respond immediately. For this reason, we also consider longer horizons (up to four years). The downside of using longer horizons is that longer horizons tend to put less weight on investors who update immediately, and these investors are also the investors more likely to be marginal in setting prices.

Table 18.1 Flow of funds outperformance relationship (1996–2011): The table reports the average of the fraction of times we observe an inflow when the fund’s realized return exceeds the risk adjusted return and the fraction of times we observe an outflow when the fund’s realized return is less than the risk adjusted return. Each row corresponds to a different risk model. The first two rows report the results for the market model (CAPM) using the CRSP value-weighted index and the S&P 500 index as the market portfolio. The next three rows report the results of using three rules of thumb as the benchmark return: (1) the fund’s actual return, (2) the fund’s return in excess of the risk-free rate, and (3) the fund’s return in excess of the return on the market as measured by the CRSP value-weighted index. The next two rows are the FF and FFC factor specifications. The largest value in each column is shown in boldface.

We will consider the following models of risk. Because the market portfolio is not observable, we will test two versions of the CAPM that correspond to two different market proxies, the CRSP value weighted index of stocks and the S&P 500 index. We will also test the factor models proposed in Fama and French (1993), hereafter the FF factor specification and Carhart (1997), hereafter the FFC factor specification. In addition we will consider three “no model” benchmarks. The first uses the actual return of the fund, which corresponds to investors using no model at all. The second uses the return of the fund in excess of the risk free return. Investors would use this measure of risk if they were risk neutral. Finally, we will consider a model where the performance of the fund is just the fund’s return minus the return of the market (as measured by the CRSP value weighted index). Although similar to the CAPM, in this model investors ignore beta. All they care about is outperformance relative to the market.
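To make the distinction between these benchmarks concrete, the sketch below computes the sign of outperformance for one fund under the three “no model” benchmarks and under the CAPM. It is a simplified illustration: the function name `outperformance_signs` and the dictionary keys are hypothetical, beta is treated as a known input (its estimation is not shown), and the FF and FFC specifications, which would subtract further factor loadings times factor returns, are omitted.

```python
import numpy as np

def outperformance_signs(fund_ret, mkt_ret, rf, beta):
    """Sign of outperformance under four candidate benchmarks.

    fund_ret, mkt_ret, rf : per-period fund, market, and risk-free returns
    beta                  : the fund's market beta (assumed known here)
    """
    fund_ret, mkt_ret, rf = (
        np.asarray(x, dtype=float) for x in (fund_ret, mkt_ret, rf)
    )
    return {
        # No model at all: investors react to the raw return
        "return": np.sign(fund_ret),
        # Risk-neutral investors: return in excess of the risk-free rate
        "excess_return": np.sign(fund_ret - rf),
        # Market benchmark that ignores beta
        "excess_market": np.sign(fund_ret - mkt_ret),
        # CAPM: return in excess of the beta-adjusted market return
        "capm": np.sign(fund_ret - rf - beta * (mkt_ret - rf)),
    }

signs = outperformance_signs(
    fund_ret=[0.025, 0.01, -0.02],
    mkt_ret=[0.02, 0.02, -0.04],
    rf=[0.002, 0.002, 0.002],
    beta=1.5,
)
print(signs["excess_market"])  # +1, -1, +1
print(signs["capm"])           # -1, -1, +1
```

In the first period this high-beta fund beats the market (2.5% against 2%) but falls short of its beta-adjusted benchmark, so the two models disagree about whether it outperformed. Disagreements of this kind are what allow flows to discriminate between the models.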

Which model best approximates the true asset pricing model? Table 18.1 reports the average of the fraction of times we observe an inflow when the fund’s realized return exceeds the risk adjusted return and the fraction of times we observe an outflow when the fund’s realized return is less than the risk adjusted return. If flows and outperformance are unrelated, we would expect this average to equal 50%. The first takeaway from Table 18.1 is that none of our candidate models can be rejected [Footnote 7], implying that regardless of the risk adjustment, a flow-performance relation exists. On the other hand, none of the models perform better than 63%. It appears that a large fraction of flows remain unexplained. Investors appear to be using other criteria to make a non-trivial fraction of their investment decisions.
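The 50% benchmark can be checked with a short calculation. The notation here is introduced only for this illustration and assumes that flows are never exactly zero. Let $a$ be the probability of an inflow given outperformance and $b$ the probability of an outflow given underperformance. If flows are independent of outperformance, conditioning changes nothing, so $a = \Pr[\text{inflow}]$ and $b = \Pr[\text{outflow}] = 1 - \Pr[\text{inflow}]$, and therefore

$$\frac{a + b}{2} = \frac{\Pr[\text{inflow}] + 1 - \Pr[\text{inflow}]}{2} = \frac{1}{2}.$$

This holds whatever the unconditional probability of an inflow, which is why the average of the two fractions, rather than either fraction alone, is compared with 50%.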

The CAPM with the CRSP value weighted index as the market proxy performs best at the 3- and 6-month horizons, and the FFC model performs best at the 1-year horizon. To assess whether the difference in performance between the CAPM and the other models is statistically significant, we report, in Table 18.2, the double-clustered (by fund and time) t-statistics. Recall that because the new models nest the CAPM, for researchers to reject the CAPM in favor of those models, the new models must statistically outperform the CAPM. Yet as Table 18.2 shows, no model statistically outperforms the CAPM at any horizon.

Table 18.2 Tests of statistical significance: The first column in the table reports the average of the fraction of times we observe an inflow when the fund’s realized return exceeds the risk adjusted return and the fraction of times we observe an outflow when the fund’s realized return is less than the risk adjusted return. The second column provides the t-statistic of the test of whether this average is significantly different from 50%. The rest of the columns provide the statistical significance of the pairwise test of whether the models are better approximations of the true asset pricing model. For each model in a column, the table displays the t-statistic of the test that the model in the row is a better approximation of the true asset pricing model. The rows (and columns) are ordered by the probabilities in the first column, with the best performing model on top. All t-statistics are double clustered by fund and time (see Thompson (2011)).

To assess the relative performance of the models, we begin by first focusing on the behavioral model that investors just react to past returns without adjusting for risk, the column marked “Ret” in the table. By looking down that column in Table 18.2, one can see that the factor models all statistically significantly outperform this model at horizons of less than two years. For example, the t-statistic reported in Table 18.2 that the CAPM outperforms this no model benchmark at the 3-month horizon is 4.98, indicating that we can reject the hypothesis that the behavioral model is a better approximation of the true model than the CAPM. Based on these results, we can reject the hypothesis that investors just react to past returns. The next possibility is that investors are risk neutral. In an economy with risk-neutral investors, we would find that the excess return (the difference between the fund’s return and the risk free rate) best explains flows, so the performance of this model can be assessed by looking at the columns labeled “Ex. ret.” Notice that all the risk models nest this model, so to conclude that a risk model better approximates the true model, the risk model must statistically outperform this model. For horizons less than 2 years, all the risk models satisfy this criterion. Finally, one might hypothesize that investors benchmark their investments relative to the market portfolio alone, that is, they do not adjust for any risk differences (beta) between their investment and the market. The performance of this model is reported in the column labeled “Ex. mkt.” The CAPM statistically significantly outperforms this model at all horizons—investors’ actions reveal that they use betas to allocate resources.

Next, we use our method to discriminate between the risk models. Recall that both the FF and FFC factor specifications nest the CAPM (the first factor in each specification is the market), so to conclude that either factor model better approximates the true model, it must statistically significantly outperform the CAPM. The test of this hypothesis is in the columns labeled “CAPM.” Neither factor model statistically outperforms the CAPM at any horizon, implying that the additional factors add no explanatory power for flows.

It is also informative to compare the tests of statistical significance across horizons. The ability to statistically discriminate between the models deteriorates as the horizon increases. This is what one would expect to observe if investors instantaneously moved capital in response to the information in realized returns. Thus, this evidence is consistent with the idea that capital does in fact move quickly to attractive investment opportunities.

18.3 Conclusion

Our empirical finding that no model outperforms the CAPM is, in some sense, startling. The model was developed at a time when the mutual fund sector was tiny. In 1962, there were just 172 equity mutual funds in existence. In the interim an entirely new sector of investing developed, so that today there are more funds than there are stocks. The other models we evaluated were all developed after the mutual fund sector started to experience explosive growth. Yet none of those models are better able to explain investor behavior. That Jack Treynor was able to predict behavior in a sector that essentially did not exist when he first developed the CAPM is a remarkable achievement in the field of economics. His subsequent application of the CAPM and beta to mutual fund performance measurement (Treynor (1965) and Treynor and Mazuy (1966)) was a great innovation in financial economics.

Yet the fact remains that the CAPM does a poor job explaining cross-sectional variation in expected returns. The profession’s answer to this shortcoming has been to attempt to improve the CAPM. What our results show is that the reason these “improved” models better explain cross-sectional variation is simply that this is what they have been designed to do. We are therefore no closer to explaining what, if any, other factors determine expected returns than we were when the CAPM was first developed.

This raises a number of possibilities about the relation between risk and return. The first possibility, and the one most often considered in the existing literature, is that this finding does not invalidate the neoclassical paradigm that requires expected returns to be a function solely of risk. Instead, it merely indicates that the CAPM is not the correct model of risk, and, more importantly, a better model of risk exists.

The second possibility is that the poor performance of the CAPM is a consequence of the fact that there is no relation between risk and return. That is, that expected returns are determined by non-risk based effects. The final possibility is that risk only partially explains expected returns, and that other, non-risk based factors, also explain expected returns. The results in this paper shed new light on the relative likelihood of these possibilities.

The fact that we find that the factor models all statistically significantly outperform our “no model” benchmarks implies that the second possibility is unlikely. That leaves the question of whether the failure of the CAPM to explain the cross section of expected stock returns results because a better model of risk exists, or because factors other than risk also explain expected returns. To conclude that a better risk model exists, one has to show that the part of the variation in asset returns not explained by the CAPM can be explained by variation in risk. This is what the flow of funds data allow us to do. If variation in asset returns that is not explained by the CAPM attracts flows, as is the case for the extensions of the CAPM we tested, then one can conclude that this variation is not compensation for risk.