INTRODUCTION

Financial returns, and especially hedge fund returns, are contaminated by errors-in-variables. If neglected, this problem may completely invalidate the results obtained when estimating financial models such as the market model, the Fama and French (F&F)1, 2, 3 model or any standard linear factor model. Nevertheless, researchers seldom tackle the problem of errors-in-variables in their empirical work, perhaps because the econometric methods for correcting it are less advanced than those in other fields of financial econometrics. Some papers4 have proposed correction techniques for errors-in-variables in a capital asset pricing model (CAPM) setting, but these studies are few and much work remains to be done.

In this article, we develop a new empirical version of the F&F model based on a new version of the Hausman5 specification test.6 This empirical model incorporates correction factors for risk exposure, which deal with the problem of errors-in-variables. Our model also includes another innovation: it uses higher moments as instruments to correct the problem of errors-in-variables. The methods of financial econometrics were traditionally well suited to estimating financial models that, like the CAPM, postulate a linear relationship between return and risk. But such a relation is valid only if risk is not very high or under very special assumptions, such as Gaussian returns or a quadratic utility function; the latter is not very realistic because it implies increasing absolute risk aversion. Investor prudence,7 which is associated with the third derivative of the utility function, and fat-tail risk, which is related to the fourth derivative, introduce nonlinearities in the relation between return and risk. In this article, we reformulate the problem of correcting errors-in-variables in the setting of a nonlinear relation between return and risk. Higher-order moments and cumulants of returns become very important in this analysis because of their link with nonlinear risk (Racicot and Théoret8, 9).

According to Campbell et al,10 the errors-in-variables problem in asset pricing models can be addressed in two ways. The first (Fama and MacBeth11) is to reduce the problem by grouping stocks into portfolios. The second (Shanken12) is to explicitly adjust the standard errors of the coefficients to reduce the biases resulting from the errors-in-variables. A more recent approach, proposed by Kandel and Stambaugh,13 uses generalized least squares instead of ordinary least squares (OLS) as the estimator. In this approach, however, the covariance matrix required to weight the observations must itself be estimated, which might be a problem.

Durbin,14 Pal15 and, more recently, Racicot (1993) and Dagenais and Dagenais16, 17 have proposed an estimation method to tackle the problem of errors-in-variables based on an optimal combination of estimators built on the cumulants of the explanatory variables. These cumulants are used as instruments to reduce the problem of errors-in-variables. In this article, we use a variant of this method, based on higher moments, to correct the errors-in-variables problem that might bias the estimation of the well-known augmented F&F model.

As shown in this article, using higher moments to tackle errors-in-variables opens the way to a synthesis between modern asset pricing theory and the financial econometric treatment of errors-in-variables. Admittedly, Durbin,14 Pal,15 Racicot18 and Dagenais and Dagenais16, 17 did not aim at transposing their technique to asset pricing theory and risk measures. But it is now widely recognized that the first two moments of returns, that is, the mean and the variance, are largely insufficient to measure the risk of a portfolio. Huang and Litzenberger,19 Ingersoll20 and Levy21 have pointed out that the portfolio selection paradigm based on the first two moments of the returns distribution maximizes the expected utility of a representative agent in only one of two situations: his utility function is quadratic or the returns distribution is normal. Of course, these two postulates are violated in the real world. Following this observation, Samuelson,22 Rubinstein,23 Kraus and Litzenberger,24 Friend and Westerfield25 and Sears and Wei26 laid the foundations of the higher-moment approach to pricing financial instruments. These theoretical developments gave birth to the three-moment and the four-moment CAPM.23

The theoretical developments related to the use of higher moments as measures of financial risk are linked to other branches of risk theory, but a synthesis has yet to be done. The theory of stochastic dominance27 has a long history. It considers an increasing number of higher moments of a return distribution to evaluate whether one portfolio is superior to another. This theory has delivered new risk measures, such as the risk of shortfall, which is based on the probability distribution of returns and on transfers of probability mass between high-wealth and low-wealth states of nature. Along the same lines, Scott and Horvath28 argue that the odd moments of the returns distribution, such as a positive mean and a positive skewness, provide positive marginal utility to an investor, whereas the even moments, such as variance and kurtosis, entail negative marginal utility. These developments lay the foundations of the modern theory of risk, which is still under construction.

The new empirical version of the F&F model that we propose, based on the Hausman specification test, is in line with this new risk literature. Our contribution is twofold. First, we add to the F&F model factors that correct for risk exposure; this correction is required by a potential problem of errors-in-variables, which our new version of the Hausman specification test is designed to detect. Second, we use the higher moments of the regressors of the F&F equation as instruments. These instruments are not only technical tools but also risk measures that anchor our estimation method in the modern theory of risk analysis, which is essentially nonlinear. Our procedure has another advantage: the regressors (factors) of the F&F equation are long–short portfolios and are consequently similar to hedge funds. Indeed, these long–short portfolios mimic market anomalies: they are long in one category of returns (for example, returns of small firms) and short in the opposite category (for example, returns of big firms). Consequently, they behave like option portfolios that incorporate many nonlinearities, and higher-order moments are required to account for these nonlinearities. Taleb29 even suggests using moments of order higher than four to measure the risk of an option, odd moments being measures of asymmetry and even moments being measures of convexity.30 The empirical work shows that our choice of instruments is judicious: the correlation of a regressor with moments of order higher than two is often much stronger than its correlation with conventional lower-order moments.

This article is organized as follows. The next section explains the methodology used to carry out our empirical analysis. First, we discuss the theoretical aspects of errors-in-variables and the choice of instruments, which consist of the higher moments of the regressors of the F&F model. Second, we present the estimation method used to correct the problem of errors-in-variables in the F&F model. This gives rise to our new empirical version of the F&F model, which features a new two-stage least-squares estimator incorporating a Hausman specification test based on an artificial regression. The subsequent section describes the data used in this study and reports the empirical results. Concluding remarks are given in the final section.

METHODOLOGY

The biases caused by errors-in-variables

The errors-in-variables problem is well known in the econometric literature but is often overlooked in finance. Consider31 a regression model in which the observed variables y and x are measured with errors. The true values of these variables are y* and x*. The relations between the observed variables and their true counterparts are:

$$ y = y^* + u \tag{1} $$

$$ x = x^* + v \tag{2} $$

where u and v are error vectors. These error vectors have zero means and variances $E(uu') = \sigma_u^2 I$ and $E(vv') = \sigma_v^2 I$. The vectors u and v are assumed orthogonal to each other and to the true value x*.

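The exact linear relationship between the two unobserved variables is:

$$ y^* = \beta x^* \tag{3} $$

As y* and x* are unobservable, we substitute equations (1) and (2) in equation (3):

$$ y = \beta x + \varepsilon \tag{4} $$

where $\varepsilon = u - \beta v$. Consequently, by equation (2), x is correlated with the error term ε, and this creates a bias. To compute it, we solve equation (4) for $\hat\beta$, the estimated value of parameter β, by the method of least squares:

$$ \hat\beta = \frac{\sum x y}{\sum x^2} \tag{5} $$

Substituting equation (4) in equation (5), we obtain:

$$ \hat\beta = \beta + \frac{\sum x \varepsilon}{\sum x^2} \tag{6} $$

Equation (6) shows that $\hat\beta$ is biased. And in large samples, it is not consistent:

$$ \operatorname{plim} \hat\beta = \beta - \frac{\beta \sigma_v^2}{\sigma_{x^*}^2 + \sigma_v^2} \tag{7} $$

where $\sigma_{x^*}^2 = \operatorname{plim} \frac{1}{n} \sum x^{*2}$. We can rewrite expression (7) as:

$$ \operatorname{plim} \hat\beta = \beta (1 - \lambda), \qquad \lambda = \frac{\sigma_v^2}{\sigma_{x^*}^2 + \sigma_v^2} \tag{8} $$

where $0 \le \lambda < 1$.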

According to equation (8), errors-in-variables bias $\hat\beta$ toward zero, so that a positive β is underestimated, the degree of underestimation being governed by λ. The closer λ is to 1, the more serious the problem of errors-in-variables; in the limit, as λ tends to 1, $\operatorname{plim} \hat\beta = 0$. When only y is plagued with measurement errors, as in equation (1), there is no bias because x remains uncorrelated with the innovation of the equation. The problem appears when x is measured with error: this creates a correlation between x and the innovation term and therefore a bias.
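A small simulation can make equation (8) concrete. The sketch below is illustrative only: the parameter values, the Gaussian draws and the function name `attenuation_demo` are assumptions, not part of the original study. It generates data according to equations (1) to (3), fits equation (5) and compares the slope with $\beta(1-\lambda)$.

```python
import numpy as np


def attenuation_demo(beta=1.5, var_xstar=1.0, var_v=0.5, var_u=0.3, n=200_000, seed=0):
    """Simulate equations (1)-(3); return (OLS slope, beta * (1 - lambda)).

    Illustrative sketch: Gaussian variables and the default values are assumptions.
    """
    rng = np.random.default_rng(seed)
    x_star = rng.normal(0.0, np.sqrt(var_xstar), n)  # true regressor x*
    y_star = beta * x_star                            # exact relation, equation (3)
    x = x_star + rng.normal(0.0, np.sqrt(var_v), n)   # observed x, equation (2)
    y = y_star + rng.normal(0.0, np.sqrt(var_u), n)   # observed y, equation (1)
    beta_hat = np.sum(x * y) / np.sum(x * x)          # least squares, equation (5)
    lam = var_v / (var_xstar + var_v)                 # attenuation factor lambda
    return beta_hat, beta * (1.0 - lam)


b_hat, b_limit = attenuation_demo()
print(f"OLS slope {b_hat:.3f} vs beta*(1-lambda) {b_limit:.3f}")
```

With the default values λ = 1/3, so the fitted slope should settle near 1.0 rather than the true 1.5; calling `attenuation_demo(var_v=0.0)`, which leaves the error only in y, should bring it back near 1.5, as the paragraph above states.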

Unfortunately, if a model has more than one explanatory variable, we cannot know a priori the direction and relative size of the impact of errors-in-variables on the estimates: some parameters will be overstated and others understated. But, as shown below, the Hausman test, and more precisely its version based on artificial regressions, not only helps detect errors-in-variables but also gives more information about the incidence of this problem on the estimated parameters.

The choice of instrumental variables to estimate the augmented F&F model

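The augmented F&F (1992, 1993 and 1997) model is a purely empirical model that may be written as:

$$ R_{pt} - R_{ft} = \alpha + \beta_1 (R_{mt} - R_{ft}) + \beta_2 SMB_t + \beta_3 HML_t + \beta_4 UMD_t + \varepsilon_t \tag{9} $$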

with:

$R_{pt} - R_{ft}$ :

the excess return of a portfolio, $R_{ft}$ being the risk-free return;

$R_{mt} - R_{ft}$ :

the market risk premium;

SMB :

a portfolio that mimics the ‘small firm anomaly’, which is long in the returns of selected small firms and short in the returns of selected big firms;

HML :

a portfolio that mimics the ‘income stock anomaly’, which is long in returns of stocks of selected firms having a high (book value/market value) ratio (income stocks) and short in selected stocks having a low (book value/market value) ratio (growth stocks);

UMD :

a portfolio that mimics the ‘momentum anomaly’, which is long in returns of selected stocks having a persistent upward trend and short in stocks having a persistent downward trend.

To explain the return of a stock or of a portfolio of stocks, the F&F model adds to the single factor of the CAPM, the market risk premium, three other factors assumed to represent market anomalies: the small firm anomaly, the book value to market value anomaly and the momentum anomaly.32

We postulate that the three mimicking portfolios SMB, HML and UMD might be measured with errors. They are thus possibly correlated with the innovation term in equation (9), and the OLS estimators of the parameters of this equation are consequently biased and inconsistent. To purge these coefficients of these biases, we must, in a first pass, regress the independent variables on instrumental variables. The estimation method used in this article, which is based on the Hausman test, is explained below. The problem lies in the choice of these instruments.

As said previously, it is difficult to find valuable instruments for the excess returns of the mimicking portfolios. Being long in some stocks and short in others, these portfolios have cash flows similar to those of hedge funds, and higher moments of returns, such as asymmetry and kurtosis, might have a great influence on their returns. This suggests using higher moments of the variables on the right-hand side (RHS) of equation (9) as instrumental variables. An econometric theory is indeed being developed on this subject. Following Durbin14 and Pal,15 Dagenais and Dagenais16, 17 showed that higher moments33 of the independent variables of a regression might be valid instruments to remove errors-in-variables. But instead of defining higher moments as in these papers, we adopt a method closer to asset pricing theory, which defines the higher moments of returns as powers of these returns.

The method of asset pricing based on higher moments is not new. Rubinstein23 and Kraus and Litzenberger24 laid the foundations of the three-moment and four-moment CAPM. The three-moment CAPM integrated the asymmetry of returns into the analysis, while the four-moment CAPM added kurtosis.

Some authors, like Kraus and Litzenberger,24 use a general utility function to derive the moment-CAPM. Others use a Taylor expansion of the utility function, which, following Samuelson22 and Rubinstein,23 allows expressing utility in terms of the higher moments of returns.
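For reference, the standard Taylor expansion of U around expected wealth $\bar{W}$ shows how higher moments enter expected utility. The fourth-order truncation below is illustrative (the text works with n moments), and the signs discussed after it rely on the usual assumptions $U' > 0$, $U'' < 0$, $U''' > 0$ and $U^{(4)} < 0$:

$$ E[U(W)] \approx U(\bar{W}) + \frac{U''(\bar{W})}{2}\,\sigma_W^2 + \frac{U'''(\bar{W})}{3!}\,E[(W - \bar{W})^3] + \frac{U^{(4)}(\bar{W})}{4!}\,E[(W - \bar{W})^4] $$

The first-order term vanishes because $E[W - \bar{W}] = 0$. Under these sign assumptions, a larger variance or fourth central moment lowers expected utility, while a larger third central moment (positive asymmetry) raises it, which is the pattern of preferences over moments discussed in the Introduction.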

Let us assume that the expected utility of wealth, E[U(W)], is a function of the first n moments of the distribution of wealth:

$$ E[U(W)] = f(\bar{W}, \sigma_W, skew_W, kur_W, \ldots, sm_{nW}) \tag{10} $$

with $\bar{W}$ the expected value of wealth; $\sigma_W$, the volatility of wealth; $skew_W$, its skewness; $kur_W$, its kurtosis; and $sm_{nW}$, the nth moment of the distribution of wealth. We also incorporate moments of order 5, particularly because they might be important to explain the returns of long–short portfolios such as the mimicking portfolios on which the F&F model is built.

The expected utility of end-of-period wealth is maximized over a one-period horizon subject to the initial wealth constraint:

$$ w_0 = a_0 + \sum_{i=1}^{n} a_i \tag{11} $$

According to equation (11), the initial wealth $w_0$ is allocated between the risk-free asset $a_0$ and n other risky assets designated by $a_i$. To maximize the utility of end-of-period wealth subject to (11), we form the usual Lagrangian:

Taking the first-order conditions for a maximum, we have:

with $\phi_x = \partial E[U(X)] / \partial x$.

End-of-period expected wealth is equal to:

We thus have:

We can therefore express the moments of wealth in terms of the moments of returns of the portfolio as:

with $\sigma_p$ the volatility of the return of the portfolio of risky assets; $\beta_{ip} = E[(R_i - \bar{R}_i)(R_p - \bar{R}_p)] / \sigma_p^2$, the beta of risky asset i with the investor's portfolio of risky assets; $sk_p$ the asymmetry of the return of the portfolio; and $\gamma_{ip} = E[(R_i - \bar{R}_i)(R_p - \bar{R}_p)^2] / sk_p^3$, the gamma of risky asset i with the investor's portfolio of risky assets. All the other co-moments are generated by following the same pattern.
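The co-moments just defined can be estimated by their sample analogues. The sketch below assumes that $sk_p^3$ denotes the third central moment of the portfolio return, $E[(R_p - \bar{R}_p)^3]$ (our reading of the notation), and uses divide-by-T averages; the function name is hypothetical.

```python
import numpy as np


def beta_gamma(r_i, r_p):
    """Sample beta_ip and gamma_ip of asset i with portfolio p (sketch)."""
    r_i = np.asarray(r_i, dtype=float)
    r_p = np.asarray(r_p, dtype=float)
    d_i = r_i - r_i.mean()            # R_i - mean(R_i)
    d_p = r_p - r_p.mean()            # R_p - mean(R_p)
    sigma_p2 = np.mean(d_p ** 2)      # sigma_p^2
    sk_p3 = np.mean(d_p ** 3)         # assumed meaning of sk_p^3 (third central moment)
    beta = np.mean(d_i * d_p) / sigma_p2
    gamma = np.mean(d_i * d_p ** 2) / sk_p3
    return beta, gamma
```

Gamma is undefined when the portfolio's third central moment is zero, for instance for a perfectly symmetric sample. Higher co-moments follow the same pattern with higher powers of `d_p`.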

Equating equations (13) and (14) to eliminate φ and taking into account equations (18) to (21), we arrive at an expression for $E(R_i)$ in terms of the moments of the distribution of the return of the investor's portfolio:

with $\phi_{moment} / \phi_{\bar{W}}$ being the investor's marginal rate of substitution between expected wealth and a specific moment. According to Scott and Horvath,28 these marginal rates of substitution are positive for odd moments, such as the mean and positive asymmetry, and negative for even moments, such as variance and kurtosis. Hence, in equation (22), an asset's contribution to the odd moments of the portfolio lowers, ceteris paribus, the expected return investors require on it, whereas its contribution to the even moments raises that required return because even moments represent risk.

Moving from equation (22) to the condition of market equilibrium for E(R i ) requires making, according to Kraus and Litzenberger,24 the strong assumption of homogeneous expectations for investors. Following this assumption, equation (22) becomes, assuming that p is the market portfolio:

The terms in brackets in expression (23) are the slopes of the efficient frontiers whose arguments are expected wealth and the respective moment. We obtain finally the n-moment CAPM:

We might use directly expression (24) to define our instruments for removing errors-in-variables by the methods of higher moments in the F&F model. Assume we want to correct the mimicking portfolio SMB for errors-in-variables. In the first pass of our regressions, we would regress this variable on the co-moments of the lagged excess return of the market portfolio. The variable SMB corrected for errors-in-variables would be:

In fact, we would also have to introduce the co-moments of the mimicking portfolios. This approach would be laborious and would require computing rolling windows of co-moments. But there is a way to simplify equation (25). Kraus and Litzenberger24, 34 have shown that a three-moment CAPM is consistent with the following quadratic form:

$$ R_{it} - R_{ft} = \alpha_0 + \alpha_1 (R_{mt} - R_{ft}) + \alpha_2 (R_{mt} - R_{ft})^2 + \varepsilon_{it} \tag{26} $$

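and, consequently, an n-moment CAPM can be written as:

$$ R_{it} - R_{ft} = \alpha_0 + \alpha_1 (R_{mt} - R_{ft}) + \alpha_2 (R_{mt} - R_{ft})^2 + \alpha_3 (R_{mt} - R_{ft})^3 + \cdots + \alpha_{n-1} (R_{mt} - R_{ft})^{n-1} + \varepsilon_{it} \tag{27} $$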

A test on α2 is a test of skewness preferences in asset pricing, a test on α3 a test of kurtosis preferences, and so on. In this approach, the higher moments are consequently represented by powers of returns. We therefore use a financial theory, the n-moment CAPM, to give a theoretical foundation to the method of Dagenais and Dagenais17 for correcting errors-in-variables. Let us return to the variable SMB, which we want to correct for errors-in-variables. In the first pass of the regression, this variable is regressed on:

where $F_i$ are the variables on the RHS of the F&F equation (equation (9)), including SMB, and their powers stand for the higher moments of these variables: $F_{i,t-1}^2$ stands for the skewness of factor $F_i$, $F_{i,t-1}^3$ for its kurtosis, and so on. The variables appearing on the RHS of equation (28) serve as instrumental variables in the first pass of the Hausman test, as explained in the following section.
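A minimal sketch of how such an instrument set could be assembled, under stated assumptions: the four factors are stored column-wise in a (T × 4) array, only the first lag is used, and powers run up to 4, which, with the mapping above, reaches the moment of order 5. The function name and the default `max_power` are hypothetical.

```python
import numpy as np


def power_instruments(factors, max_power=4):
    """Instrument matrix [1, F_{t-1}, F_{t-1}^2, ..., F_{t-1}^max_power].

    `factors` is a (T, k) array whose columns are the F&F factors.
    Row s of the result corresponds to observation t = s + 1, so the
    regressors to be instrumented must be aligned as factors[1:].
    """
    F = np.asarray(factors, dtype=float)
    lagged = F[:-1]                        # F_{i,t-1} for every factor i
    cols = [np.ones(lagged.shape[0])]      # constant
    for j in range(1, max_power + 1):
        cols.extend((lagged ** j).T)       # F_{i,t-1}^j, one column per factor
    return np.column_stack(cols)
```

Regressing one aligned factor column, for example `factors[1:, 1]` if SMB is stored second, on this matrix gives the first-pass fitted values and residuals; additional candidate instruments can simply be appended as extra columns.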

Hausman specification test and errors-in-variables

To detect errors-in-variables in our sample of hedge funds, we could use the original Hausman h test.35 To explain this test, let us consider the following classical model:

$$ Y = X\beta + \varepsilon \tag{29} $$

with Y an (n × 1) vector representing the dependent variable; X, an (n × k) matrix of explanatory variables; β, a (k × 1) vector of parameters; and $\varepsilon \sim iid(0, \sigma^2)$.

Hausman compares two sets of estimates of the parameter vector: $\beta_{OLS}$, the least-squares (OLS) estimator, and $\beta_A$, an alternative estimator that can take a variety of forms but which, for our purposes, is the instrumental variables estimator, designated by $\beta_{IV}$. The hypotheses to test are H0, in our case the absence of errors-in-variables, and H1, the presence of errors-in-variables. The vector of estimates $\beta_{IV}$ is consistent under both H0 and H1, whereas $\beta_{OLS}$ is consistent under H0 but inconsistent under H1. Under H0, however, $\beta_{IV}$ is less efficient than $\beta_{OLS}$.

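Hausman wants to verify whether the ‘endogeneity’ of some variables,36 the variables measured with errors in our case, has any significant effect on the estimation of the vector of parameters. To do so, he defines the vector of contrasts $\hat\beta_{IV} - \hat\beta_{OLS}$. The test statistic may be written as follows:

$$ H = (\hat\beta_{IV} - \hat\beta_{OLS})' \left[ Var(\hat\beta_{IV}) - Var(\hat\beta_{OLS}) \right]^{-1} (\hat\beta_{IV} - \hat\beta_{OLS}) \sim \chi^2(g) \tag{30} $$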

with $Var(\hat\beta_{IV})$ and $Var(\hat\beta_{OLS})$ being consistent estimates of the covariance matrices of $\hat\beta_{IV}$ and $\hat\beta_{OLS}$, and g the number of potentially endogenous regressors, that is, the variables measured with errors in our case. Under H0, the statistic is asymptotically distributed as a $\chi^2$ with g degrees of freedom. H0 is rejected if the P-value of this test is less than α, the significance level of the test, say 5 per cent.
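Equation (30) translates directly into code. In this sketch (function name hypothetical), the pseudo-inverse is used so that a singular difference matrix does not stop the computation; it does not cure the deeper problem raised next in the text, a difference matrix that is not positive definite. The χ² critical values are the standard 5 per cent values for g = 1 to 4.

```python
import numpy as np

# 5 per cent critical values of the chi-square distribution, g = 1..4
CHI2_CRIT_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def hausman_h(b_iv, b_ols, v_iv, v_ols):
    """Contrast-based Hausman statistic of equation (30)."""
    q = np.asarray(b_iv, dtype=float) - np.asarray(b_ols, dtype=float)  # vector of contrasts
    V = np.asarray(v_iv, dtype=float) - np.asarray(v_ols, dtype=float)  # Var(IV) - Var(OLS)
    return float(q @ np.linalg.pinv(V) @ q)


# Toy numbers for illustration only, with g = 2 compared coefficients
h = hausman_h([1.0, 2.0], [0.5, 2.0], np.diag([0.5, 0.3]), np.diag([0.25, 0.1]))
print(h, h > CHI2_CRIT_5PCT[2])
```

If the difference matrix has negative eigenvalues, the statistic can even come out negative, which is the practical symptom of the difficulty described below.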

According to MacKinnon,37 this test might run into difficulties if the matrix $[Var(\hat\beta_{IV}) - Var(\hat\beta_{OLS})]$ that weights the vector of contrasts is not positive definite. Fortunately, there is an alternative, much easier way to perform the Hausman test. It goes as follows.

Assume a two-variable linear model:

$$ y_t = \beta_0 + \beta_1 x^*_{1t} + \beta_2 x^*_{2t} + \varepsilon_t \tag{31} $$

with $\varepsilon_t \sim N(0, \sigma^2)$.

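The variables $x^*_{1t}$ and $x^*_{2t}$ are observed with errors,38 that is:

$$ x_{1t} = x^*_{1t} + \upsilon_{1t} \tag{32} $$

$$ x_{2t} = x^*_{2t} + \upsilon_{2t} \tag{33} $$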

with $x_{1t}$ and $x_{2t}$ the corresponding observed variables, which are measured with errors. By substituting equations (32) and (33) in equation (31), we have:

$$ y_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + \varepsilon^*_t \tag{34} $$

with $\varepsilon^*_t = \varepsilon_t - \beta_1 \upsilon_{1t} - \beta_2 \upsilon_{2t}$. As seen before, estimating the coefficients of equation (34) by OLS yields biased and inconsistent estimates because the explanatory variables are correlated with the innovation.

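Consistent estimators can be found if we can identify a vector of instruments $z_t$ that is correlated with every explanatory variable but not with the innovation $\varepsilon^*_t$ of equation (34), that is, neither with $\varepsilon_t$ nor with the measurement errors. We then regress the two explanatory variables on $z_t$:

$$ x_{1t} = \hat{x}_{1t} + \hat{w}_{1t} \tag{35} $$

$$ x_{2t} = \hat{x}_{2t} + \hat{w}_{2t} \tag{36} $$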

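with $\hat{x}_{it}$ the value of $x_{it}$ fitted with the vector of instruments and $\hat{w}_{it}$ the residuals of the regression of $x_{it}$ on $z_t$. Substituting equations (35) and (36) into equation (34), we have:39

$$ y_t = \beta_0 + \beta_1 \hat{x}_{1t} + \beta_2 \hat{x}_{2t} + \beta_1 \hat{w}_{1t} + \beta_2 \hat{w}_{2t} + \varepsilon^*_t \tag{37} $$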

The explanatory variables of this equation are, on the one hand, the fitted values of $x_{1t}$ and $x_{2t}$, obtained by regressing these two variables on the vector of instruments $z_t$, and, on the other hand, the residuals of these regressions. Equation (37) is therefore an augmented version of equation (34), which may be called an auxiliary or artificial regression.

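We can show that, if the measurement errors are mutually uncorrelated and uncorrelated with $\varepsilon_t$, $x^*_{1t}$ and $z_t$:

$$ \operatorname{plim} \frac{1}{n} \sum_t \hat{w}_{1t} \varepsilon^*_t = -\beta_1 \sigma^2_{\upsilon_1} \tag{38} $$

and the same holds for $\hat{w}_{2t}$. When there is no measurement error, $\sigma^2_{\upsilon_1} = 0$, so that $\hat{w}_{1t}$ is asymptotically uncorrelated with $\varepsilon^*_t$ and OLS yields a consistent estimator of the coefficient of $\hat{w}_{1t}$ in equation (37), that is, β1. When there are measurement errors, $\sigma^2_{\upsilon_1} > 0$ and this estimator is therefore not consistent.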

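We can therefore write the following test to detect the presence of errors-in-variables. As we do not know a priori whether there are errors-in-variables, we replace the coefficients of $\hat{w}_{1t}$ and $\hat{w}_{2t}$ in equation (37) by γ1 and γ2. We have:

$$ y_t = \beta_0 + \beta_1 \hat{x}_{1t} + \beta_2 \hat{x}_{2t} + \gamma_1 \hat{w}_{1t} + \gamma_2 \hat{w}_{2t} + \varepsilon^*_t \tag{39} $$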

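But following equations (35) and (36), $\hat{x}_{1t} = x_{1t} - \hat{w}_{1t}$ and $\hat{x}_{2t} = x_{2t} - \hat{w}_{2t}$. We can therefore rewrite equation (39) as follows:

$$ y_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + (\gamma_1 - \beta_1) \hat{w}_{1t} + (\gamma_2 - \beta_2) \hat{w}_{2t} + \varepsilon^*_t \tag{40} $$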

If there is no measurement error in either $x_{1t}$ or $x_{2t}$, then γ1 = β1 and γ2 = β2. If there are measurement errors, $\gamma_i \neq \beta_i$ and the coefficients $(\gamma_i - \beta_i)$ of the residual terms $\hat{w}_{it}$ will not be zero.

Equation (40) yields further information. Since $x_{it} = \hat{x}_{it} + \hat{w}_{it}$, the OLS coefficient of $x_{it}$ blends the coefficients of $\hat{x}_{it}$ and $\hat{w}_{it}$, namely $\beta_i$ and $\gamma_i$. Hence, if the estimated coefficient $(\gamma_i - \beta_i)$ is significantly positive, the estimated coefficient of the corresponding explanatory variable $x_{it}$ is overstated in the OLS run, and the estimated coefficient of this variable will decrease in equation (40). Conversely, if the estimated coefficient $(\gamma_i - \beta_i)$ is significantly negative, the estimated coefficient of $x_{it}$ is understated in the OLS run, and the estimated coefficient of this variable will increase in equation (40). These effects of errors-in-variables revealed by equation (40) are very informative. Below, we transpose these results to the F&F model.

We must note that the coefficients $\beta_i$ estimated by equation (40) are identical to those produced by a two-stage least squares (TSLS) procedure using the same instruments. Equation (40) is therefore another way to perform TSLS. But in view of the useful information it produces, this equation opens the door to new financial models. We therefore prefer this formulation to standard TSLS to estimate the augmented F&F model, and we thus obtain a new empirical formulation of the F&F model.

We therefore proceed as follows to test for errors-in-variables. First, we regress the observed explanatory variables $x_{it}$ on the vector of instruments to obtain the residuals $\hat{w}_{it}$. Then we regress $y_t$ on the observed explanatory variables $x_{it}$ and on the residuals $\hat{w}_{it}$. This is an auxiliary or artificial regression. If the coefficient of the residuals of an explanatory variable is significantly different from 0, we may conclude that this explanatory variable is measured with error. We may use the Wald test (F test) to determine whether the whole set of $(\gamma_i - \beta_i)$ coefficients is significantly different from zero.
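The two-step procedure above can be sketched on simulated data. Everything in this example is an assumption made for illustration: three Gaussian instruments, measurement error on $x_{1t}$ only, the true coefficients and the helper names. With $\beta_1 > 0$, the attenuation described by equation (8) suggests that the coefficient of $\hat{w}_{1t}$ should come out significantly negative (OLS understates $\beta_1$), while that of $\hat{w}_{2t}$ should be insignificant.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients and conventional t statistics."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    s2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return coef, coef / se


def simulate_artificial_regression(n=5000, seed=1):
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, 3))                                    # instruments z_t
    x1_star = z @ np.array([1.0, 1.0, 1.0]) + rng.normal(size=n)
    x2_star = z @ np.array([0.5, -1.0, 0.5]) + rng.normal(size=n)
    y = 1.0 + 2.0 * x1_star - 1.0 * x2_star + rng.normal(size=n)  # equation (31)
    x1 = x1_star + rng.normal(size=n)  # measured with error, equation (32)
    x2 = x2_star                       # no measurement error in this simulation
    Z = np.column_stack([np.ones(n), z])
    # Step 1: regress each observed regressor on the instruments, keep residuals w_hat
    w1 = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
    w2 = x2 - Z @ np.linalg.lstsq(Z, x2, rcond=None)[0]
    # Step 2: artificial regression (40): y on 1, x1, x2, w1_hat, w2_hat
    X_art = np.column_stack([np.ones(n), x1, x2, w1, w2])
    return ols(y, X_art)


coef, t = simulate_artificial_regression()
print("gamma1 - beta1:", coef[3], "t =", t[3])
print("gamma2 - beta2:", coef[4], "t =", t[4])
```

The coefficient of `x1` in the artificial regression is the TSLS estimate of β1 and should be close to the true value 2, whereas a plain OLS regression of `y` on `x1` and `x2` would understate it.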

We can generalize the preceding procedure to the case of k explanatory variables that potentially suffer from errors-in-variables. Let X be an (n × k) matrix of explanatory variables potentially measured with errors and let Z be an (n × s) matrix of instruments (s > k). To perform the Hausman test based on an artificial regression, we first regress X on Z to obtain $\hat{X}$, that is:

$$ \hat{X} = Z(Z'Z)^{-1}Z'X = P_Z X \tag{41} $$

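where $P_Z = Z(Z'Z)^{-1}Z'$ is the ‘predicted value maker’, the orthogonal projection onto the column space of Z. Having performed this regression, we compute the matrix of residuals $\hat{w}$:

$$ \hat{w} = X - \hat{X} = (I - P_Z) X \tag{42} $$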

Then we perform the following artificial regression:

$$ y = X\beta + \hat{w}\lambda + \varepsilon \tag{43} $$

An F test on the λ coefficients will indicate whether they are significant as a group. A t test on individual coefficients will indicate whether the corresponding β is understated or overstated, as discussed previously.
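Equations (41) to (43) can be sketched in matrix form as follows (function names hypothetical). One practical detail is an assumption of this sketch rather than something stated in the text: columns of X that are also instruments, such as the constant, have identically zero residuals, so only the columns listed in `suspect` are augmented. The helper `tsls` implements the textbook two-stage least-squares formula so that the two β vectors can be compared directly.

```python
import numpy as np


def projection(Z):
    """P_Z = Z (Z'Z)^{-1} Z', the 'predicted value maker'."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)


def hausman_artificial(y, X, Z, suspect):
    """Artificial regression (43); returns (beta, lambda)."""
    W_hat = (X - projection(Z) @ X)[:, suspect]  # residuals (42) of the suspect columns
    A = np.column_stack([X, W_hat])
    coef = np.linalg.solve(A.T @ A, A.T @ y)     # OLS of y on [X, w_hat]
    k = X.shape[1]
    return coef[:k], coef[k:]


def tsls(y, X, Z):
    """Two-stage least squares: (X' P_Z X)^{-1} X' P_Z y."""
    P = projection(Z)
    return np.linalg.solve(X.T @ P @ X, X.T @ P @ y)


# Simulated data for illustration only
rng = np.random.default_rng(0)
n = 60
z = rng.normal(size=(n, 3))
Z = np.column_stack([np.ones(n), z])             # the constant is its own instrument
x = z @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 0.3 + 1.2 * x + rng.normal(size=n)
beta_art, lam = hausman_artificial(y, X, Z, suspect=[1])
print(np.allclose(beta_art, tsls(y, X, Z)))
```

The equality holds exactly in algebra because $[X, \hat{w}]$ spans the same space as $[\hat{X}, \hat{w}]$ and $\hat{X}$ is orthogonal to $\hat{w}$, so the coefficient on X equals the coefficient obtained by regressing y on $\hat{X}$ alone.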

The vector β estimated by equation (43) is identical to the TSLS estimates, that is:

$$ \hat\beta = (X' P_Z X)^{-1} X' P_Z y \tag{44} $$

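To detect errors-in-variables in the augmented F&F model, we run two sets of regressions. First, we run the OLS regression:

$$ R_{pt} - R_{ft} = \alpha + \beta_1 (R_{mt} - R_{ft}) + \beta_2 SMB_t + \beta_3 HML_t + \beta_4 UMD_t + \varepsilon_t \tag{45} $$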

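Then we run the artificial regression explained previously:

$$ R_{pt} - R_{ft} = \alpha^* + \beta^*_1 (R_{mt} - R_{ft}) + \beta^*_2 SMB_t + \beta^*_3 HML_t + \beta^*_4 UMD_t + \phi_1 \hat{w}_{1t} + \phi_2 \hat{w}_{2t} + \phi_3 \hat{w}_{3t} + \phi_4 \hat{w}_{4t} + \varepsilon^*_t \tag{46} $$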

The estimated coefficients $\phi_i$ make it possible to detect errors-in-variables, and their signs indicate whether the exposure to the corresponding variable is overstated or understated in the OLS regression.

As said previously, the $\beta^*$ estimated by equation (46) are equivalent to the TSLS estimates. But we prefer equation (46) because it gives more information on the problem of errors-in-variables. Equation (46) is thus our new empirical version of the augmented F&F model. The $\phi_i$ are genuine correction factors for the exposure of a fund to the ith risk factor. If $\phi_i$ is positive, the exposure to the ith risk factor is overstated in the OLS regression, and the β associated with this factor will thus decrease in the artificial regression; the reverse holds if $\phi_i$ is negative. Moreover, according to our previous developments, we expect a high positive correlation between $(\hat\beta_i - \hat\beta^*_i)$, the estimated error on the coefficient of factor i, and $\hat\phi_i$, the estimated coefficient of the corresponding artificial variable $\hat{w}_i$.
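The reading of the correction factors can be written as a small helper. This is an illustrative sketch: the 1.96 cut-off (a two-sided 5 per cent normal critical value), the function name and the idea of pooling fund–factor pairs to compute the correlation are assumptions; the inputs are the estimates of equations (45) and (46).

```python
import numpy as np


def read_corrections(beta_ols, beta_star, phi, t_phi, t_crit=1.96):
    """Label each exposure with the sign rule for phi_i and return the
    correlation between (beta_ols - beta_star) and phi over all entries."""
    beta_ols = np.asarray(beta_ols, dtype=float).ravel()
    beta_star = np.asarray(beta_star, dtype=float).ravel()
    phi = np.asarray(phi, dtype=float).ravel()
    t_phi = np.asarray(t_phi, dtype=float).ravel()
    labels = []
    for p, t in zip(phi, t_phi):
        if abs(t) < t_crit:
            labels.append("no significant correction")
        elif p > 0:
            labels.append("overstated by OLS")   # beta falls in equation (46)
        else:
            labels.append("understated by OLS")  # beta rises in equation (46)
    corr = np.corrcoef(beta_ols - beta_star, phi)[0, 1]
    return labels, corr
```

Passing the arrays for all funds and all four factors at once gives a single pooled correlation; passing one factor at a time across funds gives a factor-by-factor view instead.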

EMPIRICAL RESULTS AND ANALYSIS

Our sample of hedge fund returns comprises the monthly returns of 20 Greenwich-Van US hedge fund indexes classified by category or group of categories. The appendix lists these funds with their chosen symbols. The observation period runs from January 1995 to November 2005, for a total of 131 observations. The risk factors that appear in the F&F equation, that is, the market risk premium and the three mimicking portfolios SMB, HML and UMD, are drawn from Kenneth French's website.40

Table 1 gives a first look at our hedge fund sample and reports the descriptive statistics of the ‘average’ fund. At 11.6 per cent, the annualized mean return of this sample is quite high, and its standard deviation is relatively moderate. But it is well known that standard deviation is a reliable measure of risk only if risk is small or under very special conditions. Otherwise, we must consider the higher moments of the distributions of excess returns. The skewness of the average fund is close to 0, but its mean kurtosis, at 6.4, is quite high, the kurtosis of a normal distribution being 3. Indeed, a large number of equity-oriented hedge fund strategies exhibit payoffs similar to those of a short position in a put option written on the market index,41 and therefore bear significant left-tail risk, a risk that is ignored by the commonly used mean-variance framework. This may imply that rare events are more frequent than under a normal distribution and that the nonlinearities of payoffs are quite important. Incidentally, 18 of the 20 funds have a kurtosis exceeding three, and 17 have non-Gaussian returns according to the Jarque–Bera test.
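The statistics discussed in this paragraph can be computed as in the following sketch. Assumptions: the annualized mean is twelve times the monthly mean (the text does not say which annualization is used), kurtosis is the raw (non-excess) measure, so that a normal distribution gives 3, and the Jarque–Bera statistic is compared with 5.991, the 5 per cent critical value of a χ²(2). The function name is hypothetical.

```python
import numpy as np


def describe_returns(r, periods_per_year=12):
    """Annualized mean, skewness, raw kurtosis and Jarque-Bera statistic (sketch)."""
    r = np.asarray(r, dtype=float)
    n = r.size
    d = r - r.mean()
    m2 = np.mean(d ** 2)
    skew = np.mean(d ** 3) / m2 ** 1.5
    kurt = np.mean(d ** 4) / m2 ** 2                 # about 3 for normally distributed data
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return {
        "annual_mean": periods_per_year * r.mean(),  # arithmetic annualization (assumed)
        "skewness": skew,
        "kurtosis": kurt,
        "jarque_bera": jb,
        "non_gaussian_5pct": bool(jb > 5.991),       # chi-square(2) 5% critical value
    }
```

Applied fund by fund to the monthly excess returns, the `kurtosis` and `non_gaussian_5pct` entries correspond to the two counts reported at the end of the paragraph.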

Table 1 Descriptive statistics of the sample of 20 hedge funds returns

We now define the instruments necessary to perform the Hausman test. Table 2 gives the correlations of the F&F factors with themselves and their instruments from January 1995 to November 2005. In addition to the instruments discussed before, we add other macroeconomic variables: the monthly and annual American inflation rate, IPC_MENS and IPC_ANN, and the monthly and annual growth rate of the American industrial production, PROD_MENS and PROD_ANN.

Table 2 Correlation between the four factors and their instruments

According to Table 2, the F&F factors are more or less correlated with conventional instruments, such as the first lag of the factor, or with the macroeconomic variables. Regarding the macroeconomic variables, we observe that their correlation with the risk factors located at the top of the columns is quite low. Only the annual growth of industrial production has a moderate correlation with three of these factors.42

With respect to the risk factors of equation (45), we noted before that these mimicking portfolios, which are similar to hedge fund portfolios, contain many nonlinearities, and that these nonlinearities might be captured by the higher moments of these factors. Corroborating this assumption, Table 2 shows that the risk factors are usually more correlated or cross-correlated with the higher moments of the first lag of a risk factor than with the first lag itself. For instance, the market risk premium is more related to the higher moments of UMD(-1) than to UMD(-1) itself. Indeed, the correlation between RM_RF and UMD(-1) is quite low, at 0.03, but it is equal to 0.22 for UMD(-1),7 that is, the higher moment of UMD(-1) of order 5. The same is true for the factor SMB and the higher moments of SMB(-1). Consequently, higher moments of lagged variables may constitute quite good instruments.9

Before discussing the results, let us note that we performed a Wald test on the whole set of the four correction coefficients $\phi_i$ associated with the risk factors in equation (46). For this test, the null hypothesis H0 is:

$$ H_0: \phi_1 = \phi_2 = \phi_3 = \phi_4 = 0 $$

If this hypothesis is not rejected, we cannot detect errors-in-variables for the four factors considered as a group. This hypothesis was not rejected for any fund, and consequently it seems that there is no problem of errors-in-variables in our sample of hedge funds when we consider the four factors as a group. But as we will see, individual t tests reveal that in some cases, the bias caused by the presence of errors in variables may be quite serious.
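The joint test on the $\phi_i$ can be computed by comparing the residual sums of squares of equation (45) (restricted) and equation (46) (unrestricted). The sketch below uses the classical F form of the Wald test under homoskedastic errors, which is an assumption; a heteroskedasticity-robust version would differ. The function name is hypothetical.

```python
import numpy as np


def wald_f(y, X_restricted, X_unrestricted):
    """F statistic for dropping the extra columns of X_unrestricted (here the w_hat terms)."""
    def ssr(X):
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        e = y - X @ b
        return e @ e

    n, k_u = X_unrestricted.shape
    q = k_u - X_restricted.shape[1]   # number of restrictions (4 for phi_1..phi_4)
    ssr_r, ssr_u = ssr(X_restricted), ssr(X_unrestricted)
    # Compare with the F(q, n - k_u) critical value, e.g. via
    # scipy.stats.f.sf(F, q, n - k_u) if SciPy is available.
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - k_u))
```

With equation (45) as the restricted design (constant and four factors) and equation (46) as the unrestricted one (the same columns plus the four residual series), `q` equals 4.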

Tables 3 and 4 give a first grasp of the estimates of equations (45) and (46) over our whole sample of hedge funds. These tables report the number of coefficients significant at the 95 per cent confidence level and the mean t statistics computed over the 20 funds for both estimation methods. Table 3 indicates that the constant and the coefficients of the risk premium and SMB are significant for the majority of the funds in our sample. The variable UMD is significant for approximately 50 per cent of the sample, while the variable HML is more problematic, being significant for only a minority of funds. Table 4, which reports the average t statistics for the constant and the four factors computed over the 20 funds, confirms these preliminary observations. Let us note that equation (46) produces fewer significant coefficients than equation (45), although the difference is small. It is well known that coefficients estimated by TSLS tend to have a larger variance than those resulting from the corresponding OLS regression, and, as seen previously, the artificial regression given by equation (46) is equivalent to TSLS. On the other hand, the average R2 of the two estimation methods is quite comparable: 0.55 for the OLS estimation and 0.54 for the artificial regression.

Table 3 Count of significant coefficients for the constant and the four factors for the OLS and the artificial Hausman regressiona
Table 4 Average level of the t statistics for the constant and the four factors for the OLS and the artificial Hausman regressiona

Table 5 reports the mean levels of the coefficients estimated by the OLS method and by the artificial regression over the whole set of funds. The mean constant, which corresponds to the alpha, is practically the same for both methods. The beta appears to be overstated by the OLS regression, which produces biased coefficients when there are errors in variables: the average beta is 0.25 with OLS and 0.23 with the artificial regression. In contrast, the impact of SMB tends to be understated by the OLS regression, its average coefficient being 0.17 with OLS and 0.21 with the artificial regression. The average impact of the UMD factor is quite similar for the two methods, but, as we will see, this similarity hides a high dispersion across funds. Finally, the influence of HML is low with both methods and moderately overstated by the OLS regression.

Table 5 Mean level of the estimated coefficients for the constant and the four factors for the OLS and the Hausman artificial regression

As the F&F model is purely empirical, there is no theory predicting the signs of its four factor coefficients, except perhaps for the market risk premium, whose coefficient the CAPM predicts to be positive. But if we regard the three mimicking portfolios SMB, HML and UMD as risk factors, it is reasonable to expect their estimated coefficients to be generally positive. Admittedly, a fund can short a risk factor, just as it can short the market index. A hedge fund that sells the market short will have a negative beta. Similarly, a fund that short sells the mimicking portfolio SMB, say, will have a negative coefficient for this factor. But such behaviour seems to be the exception rather than the rule. Risk factors in return equations should therefore generally carry positive coefficients.

Table 6 gives information on this matter for our sample. For both estimation methods, the estimated coefficients of the four factors are positive for the majority of funds. This supports our view that the mimicking portfolios are risk factors rather than market anomalies, as they were viewed in the past.

Table 6 Count of positive signs for the constant and the four factors for both estimation methods

To conclude these preliminary observations, the problem of errors in variables does not seem to be very serious for the group of funds surveyed in this study. However, average behaviour may hide a great dispersion at the individual level. As we will see in the following paragraphs, the biases caused by errors in variables may be severe for some funds.

In our sample, HML is the factor that seems to suffer most from errors in variables. In the artificial regression, the coefficient of its residuals is significant at the 95 per cent confidence level for four funds. For SMB and UMD, the coefficient of the residuals is significant for three funds each.

We gain a better grasp of our estimations by looking at the individual results. Table 7 gives the estimated betas, that is, the coefficients of the risk premium (Rm − Rf), for each fund and for both estimation methods. It also reports the corresponding coefficient ϕ in the artificial regression (equation (46)), shown in bold when significant at the 95 per cent confidence level.43

Table 7 Spread (error) between the OLS and the Hausman beta for each fund of the study

Table 7 shows that for five funds the spread between the OLS and the Hausman estimates, which measures the error on this coefficient, is quite high. For these funds, the OLS method overstates the beta in four cases and understates it in one. When the beta of a fund is overstated, Table 7 reveals that the artificial coefficient ϕ associated with the residuals of the market risk premium in equation (46) is positive, as it must be. For the ‘aggressive growth’ fund (ag), the OLS regression greatly understates the beta, and the corresponding ϕ is therefore negative. The correlation between the spread (error) column in our tables and the corresponding ϕ column is 0.98. The association between the measurement error on a coefficient and the corresponding ϕ is thus positive and almost perfect.44 Moreover, the regression of the error of the ith fund, that is, the spread (β̂_i,OLS − β̂_i,HAUS), on the corresponding artificial coefficient ϕ̂_i gives the following result:

For this regression, the adjusted R² is 0.96. The level of ϕ is therefore a very good indicator of the measurement error on the corresponding coefficient. It gives valuable information on this error, which is why we prefer the Hausman version of the F&F model to the equivalent TSLS one. We do not repeat this regression for the other risk factors of the F&F model because the results are very similar to those obtained for the risk premium.
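The close link between the spread and ϕ has a simple algebraic source. Assume the first-stage regressions include a constant and the augmented regression is estimated by OLS. The spreads then satisfy exactly

β̂_OLS − β̂_HAUS = (X̃′X̃)⁻¹ V̂′V̂ ϕ̂,

where X̃ is the matrix of demeaned factors and V̂ the matrix of first-stage residuals. With a single factor this reduces to

β̂_OLS − β̂_HAUS = ϕ̂ · v̂′v̂ / x̃′x̃.

This is a positive multiple of ϕ̂, which is why a positive ϕ goes with an overstated coefficient. With several correlated factors, each spread also depends on the ϕ of the other factors. That may help explain why the correlation and adjusted R² reported above are high but not perfect. The Python sketch below checks the identity numerically, using a hypothetical helper and simulated data. It is our own illustration, not a derivation taken from the paper.

```python
import numpy as np


def spread_and_phi(y, X, Z):
    """OLS-minus-Hausman spreads and the spreads implied by phi (hypothetical helper).

    y: (n,) fund excess returns; X: (n, k) factors; Z: (n, m) instruments with m >= k.
    """
    n, k = X.shape
    ones = np.ones((n, 1))
    Zc = np.hstack([ones, Z])
    gamma, *_ = np.linalg.lstsq(Zc, X, rcond=None)
    V = X - Zc @ gamma                                        # first-stage residuals
    b_ols = np.linalg.lstsq(np.hstack([ones, X]), y, rcond=None)[0][1:]
    coef = np.linalg.lstsq(np.hstack([ones, X, V]), y, rcond=None)[0]
    b_haus, phi = coef[1:k + 1], coef[k + 1:]                 # Hausman loadings and phi
    Xd = X - X.mean(axis=0)                                   # demeaned factors
    implied = np.linalg.solve(Xd.T @ Xd, V.T @ V @ phi)       # (X~'X~)^{-1} V'V phi
    return b_ols - b_haus, implied, phi


# Simulated data (hypothetical): two factors driven by four instruments plus noise.
rng = np.random.default_rng(42)
n = 200
Z = rng.standard_normal((n, 4))
X = Z[:, :2] @ np.array([[1.0, 0.3], [0.2, 1.0]]) + 0.6 * rng.standard_normal((n, 2))
y = 0.005 + X @ np.array([0.25, 0.2]) + 0.1 * rng.standard_normal(n)

spread, implied, phi = spread_and_phi(y, X, Z)
print(spread, implied)  # the two vectors agree up to rounding error
```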

Table 8 provides the same information as Table 7 for the SMB factor. The measurement error is high for five funds. For these funds, the OLS regression understates the coefficient of this variable in four cases and overstates it in one. When the impact of SMB is understated, the corresponding ϕ is negative, and when it is overstated, the corresponding ϕ is positive, as expected. The SMB coefficient is particularly understated for the emerging markets fund (em), whose associated ϕ is therefore negative and very high in absolute value.

Table 8 Spread (error) between the OLS and the Hausman SMB estimated coefficients for each fund of the study

Table 9 reveals that the estimation of the HML coefficient is quite problematic, the ϕ coefficients for this variable being very high for some funds. The OLS regression greatly overstates the impact of this variable on the returns of four funds. We attribute this overstatement to a possibly serious problem of errors in variables.

Table 9 Spread (error) between the OLS and the Hausman HML estimated coefficients for each fund of the study

Finally, according to Table 10, the OLS regression substantially overstates the impact of the UMD factor in three cases and moderately understates it in two. Moreover, the correlation between the OLS- and Hausman-estimated coefficients for this factor is only 0.23. The corresponding correlations for the three other factors (Rm − Rf, SMB and HML) are 0.99, 0.91 and 0.80, respectively. There is thus a great divergence between the OLS and Hausman results for the UMD factor that is not observed for the other factors.
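The cross-fund correlations quoted above can be computed factor by factor. The sketch assumes they are ordinary Pearson correlations, taken across funds, between the OLS and Hausman coefficients. The function name and the toy inputs are hypothetical.

```python
import numpy as np


def cross_fund_correlations(b_ols, b_haus):
    """Pearson correlation across funds between OLS and Hausman coefficients, per factor.

    b_ols, b_haus: (n_funds, n_factors) arrays of estimated coefficients (hypothetical helper).
    """
    b_ols = np.asarray(b_ols, dtype=float)
    b_haus = np.asarray(b_haus, dtype=float)
    return np.array([np.corrcoef(b_ols[:, j], b_haus[:, j])[0, 1]
                     for j in range(b_ols.shape[1])])


# Toy input: three funds, two factors (illustrative numbers only).
corr = cross_fund_correlations([[1.0, 2.0], [2.0, 4.0], [3.0, 1.0]],
                               [[2.0, -2.0], [4.0, -4.0], [6.0, -1.0]])
print(corr.round(6).tolist())  # [1.0, -1.0]
```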

Table 10 Spread (error) between the OLS and the Hausman UMD estimated coefficients for each fund of the study

Tables 11 and 12 report, respectively, the OLS and artificial regressions for the funds that seem to suffer most from errors in variables. The adjusted R² varies greatly across these funds, from a low of 0.02 for the futures funds to a high of 0.85 for the short selling funds. The artificial regressions (Table 12) were performed using the higher moments of the predetermined variables of our model as instruments.

Table 11 Estimation of funds plagued with errors in variables by the OLS method
Table 12 Estimation of funds plagued with errors in variables by the Hausman method with higher moments as instruments

Table 12 reports our preferred empirical version of the F&F model. This version has not been proposed before, so it is not only an artificial regression but a new empirical model. It includes the estimated coefficients of the residuals. As we have seen, these coefficients provide a great deal of information about the correction of the funds' exposures to the risk factors, a correction required by the problem of errors in variables. Table 13 shows that the coefficients of the regressions in Table 12 are identical to those obtained by TSLS using the same instruments as equation (46). But the TSLS form is less informative than the artificial regression, which constitutes a new empirical model.
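The equality between the coefficients of Tables 12 and 13 is an algebraic property rather than a coincidence. Let P_Z denote the projection on the instruments, constant included. Since X = P_Z X + V̂, and V̂ is orthogonal to P_Z X, regressing y on [1, X, V̂] gives the same coefficients on [1, X] as TSLS with instruments [1, Z]. The Python sketch below checks this numerically, using hypothetical helper names and simulated data. Only the point estimates coincide. The standard errors reported by the two procedures need not be the same.

```python
import numpy as np


def tsls(y, X, Z):
    """Two-stage least squares with a constant in both regressors and instruments."""
    n = len(y)
    W = np.column_stack([np.ones(n), X])
    Zc = np.column_stack([np.ones(n), Z])
    W_hat = Zc @ np.linalg.lstsq(Zc, W, rcond=None)[0]   # first-stage fitted values P_Z W
    return np.linalg.solve(W_hat.T @ W, W_hat.T @ y)     # (W' P_Z W)^{-1} W' P_Z y


def artificial_regression(y, X, Z):
    """Augmented OLS of y on [1, X, V], V = first-stage residuals of X on [1, Z]."""
    n = len(y)
    Zc = np.column_stack([np.ones(n), Z])
    V = X - Zc @ np.linalg.lstsq(Zc, X, rcond=None)[0]
    return np.linalg.lstsq(np.column_stack([np.ones(n), X, V]), y, rcond=None)[0]


# Simulated data (hypothetical): three factors, five instruments.
rng = np.random.default_rng(7)
n, k = 150, 3
Z = rng.standard_normal((n, 5))
X = Z[:, :k] + 0.5 * rng.standard_normal((n, k))
y = 0.01 + X @ np.array([0.3, 0.1, -0.2]) + 0.2 * rng.standard_normal(n)

b_tsls = tsls(y, X, Z)
b_art = artificial_regression(y, X, Z)[: k + 1]   # constant and factor loadings only
print(np.allclose(b_tsls, b_art))  # True
```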

Table 13 Estimation of funds plagued with errors in variables by the TSLS method with higher moments as instruments

To check the relevance of higher moments as instruments, we repeat the estimations of Table 13 without them. The instruments are then standard predetermined variables, that is, exogenous variables or lagged endogenous or exogenous variables not raised to a power. Table 14 reports this estimation. The results are clearly poorer than those obtained by TSLS with higher moments as instruments. Higher moments are therefore good candidates for instruments: they take into account the nonlinearities of hedge fund payoffs, which classic instruments neglect.
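The comparison between Tables 13 and 14 turns on instrument relevance, which can be gauged by the first-stage R² of a factor on its instruments. The sketch below is illustrative only. It assumes a hypothetical construction in which the higher-moment instruments are the squared and cubed deviations of a factor from its mean; the paper's exact construction may differ. It compares these with a one-period lag of the factor, used as a classic instrument.

Because the simulated returns are i.i.d., the lagged instrument is weak by construction. The example therefore shows the mechanics of the check, not evidence for the paper's result. It also says nothing about instrument validity, which is a separate question.

```python
import numpy as np


def first_stage_r2(x, Z):
    """R^2 of the first-stage regression of a factor x on a constant and instruments Z."""
    n = len(x)
    Zc = np.column_stack([np.ones(n), Z])
    resid = x - Zc @ np.linalg.lstsq(Zc, x, rcond=None)[0]
    xd = x - x.mean()
    return 1.0 - (resid @ resid) / (xd @ xd)


def higher_moment_instruments(x):
    """Hypothetical higher-moment instruments: squared and cubed deviations from the mean."""
    d = x - x.mean()
    return np.column_stack([d ** 2, d ** 3])


rng = np.random.default_rng(1)
x = rng.standard_normal(501)                 # simulated factor returns, i.i.d. by construction
r2_moments = first_stage_r2(x[1:], higher_moment_instruments(x[1:]))
r2_lagged = first_stage_r2(x[1:], x[:-1])    # classic instrument: the lagged factor
print(round(r2_moments, 3), round(r2_lagged, 3))
```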

Table 14 Estimation of funds plagued with errors in variables by the TSLS method with classic instruments

SUMMARY AND CONCLUSION

In this article, we have developed a new empirical version of the F&F model aimed at simultaneously testing and correcting for errors-in-variables. The model includes variables that account for the biases in the estimated exposures to risk factors. This correction reduces the errors-in-variables biases in the estimated coefficients of the risk factors. These correction factors, based on the artificial regression of the Hausman specification test, are informative in two ways. They indicate whether risk is understated or overstated as a result of errors-in-variables, and they shed light on the kind of specification error involved.

Our new empirical version of the F&F model provides another novelty. It uses the higher moments of the distribution of the mimicking portfolios' returns as instruments to correct the F&F model for errors-in-variables. That these instruments should be highly related to the risk factors is explained by the nonlinearities embedded in the factor-mimicking portfolios. These nonlinearities cannot be captured by a standard CAPM or APT model, which postulates a linear relation between the returns to be explained and their risk factors. The F&F model is also linear in its factors, but the risk factors SMB, HML and UMD partly account for these nonlinearities. The strong relation between these factors and the instruments we propose, namely their higher moments, tends to support this interpretation. Moreover, TSLS run on hedge fund returns with conventional instruments performed very poorly compared with TSLS using higher-moment instruments.

Many recent articles criticize the F&F model.45 Our article instead provides arguments supporting it. The three F&F factors SMB, HML and UMD are justified as explanations of risk because they aggregate the many nonlinearities present in the distribution of returns. They are not market anomalies, as they were considered in the past, but rather ‘réservoirs’ of moment and co-moment risks.

Our study also sheds light on the idiosyncratic behaviour of hedge funds. The beta of hedge funds is generally low, not exceeding 0.25 on average. The short sellers differ in this respect, with a mean beta near −1. For hedge funds, the three factors SMB, HML and UMD are definitely risk factors, their estimated coefficients being usually positive and significant in the regressions of fund returns on these factors. After the market risk premium, SMB exerts the most prominent positive impact on the returns of the hedge funds in our sample. This factor helps identify overvalued and undervalued securities, which is useful information for the investor.

Overall, the problem of errors-in-variables does not seem to be too serious in our sample of hedge funds. It leads to an overstatement of the impact of the risk premium and an understatement of the influence of SMB. The impact of HML on hedge fund returns is quite low, while the impact of UMD is more questionable. In the case of UMD, the correlation between the coefficients adjusted and unadjusted for errors-in-variables is very low (0.23).

In summary, our new empirical version of the F&F model, which relies on a new version of the TSLS technique incorporating a Hausman test, seems quite promising. Another procedure to account for errors-in-variables is to build the instruments from optimal combinations of cumulant estimators46 instead of higher moments, in addition to the combination of Durbin and Pal estimators we use in Racicot and Théoret.8, 9 In finance, we usually rely on moments to quantify risks, but cumulants are quite promising in this respect as they generalize the theory of moments.