1 Introduction

Mean–variance analysis developed by Markowitz (1952) has long played an important role in a number of areas in Finance. One of these areas is the testing of asset pricing models. Roll (1977) shows that the central prediction of the capital asset pricing model (CAPM) is that the market portfolio lies on the ex ante mean–variance frontier. Chamberlain (1983) and Grinblatt and Titman (1987) show that for multifactor models, a combination of the K factor portfolios lies on the mean–variance frontier. Ferson (2019) points out that any candidate stochastic discount factor model, whether linear or nonlinear, implies that the portfolio with the maximum squared correlation to the stochastic discount factor lies on the mean–variance frontier (Hansen and Richard (1987)).

The classic test of mean–variance efficiency in the presence of a risk-free asset was developed by Gibbons et al. (1989) (GRS). The GRS test examines the mean–variance efficiency of a linear factor model relative to the efficient frontier where the optimal strategies are fixed-weight portfolios (passive mean–variance (PMV) frontier). The heart of the GRS test compares the maximum squared Sharpe (1966) performance of the factors to the maximum squared Sharpe performance of the factors and test assets to see if there is a significant shift. Barillas and Shanken (2017) extend this analysis and show that when it comes to relative model comparison tests, the choice of test assets is irrelevant and models can be compared in terms of the maximum squared Sharpe measures of the factors in each model. The better models are the ones with higher maximum squared Sharpe measures.

Ferson and Siegel (2009) extend the mean–variance efficiency tests of Gibbons et al. (1989) to allow dynamic trading strategies through the optimal use of conditioning information, building on the work by Hansen and Richard (1987), and Ferson and Siegel (2001). Hansen and Richard (1987) define the unconditional mean–variance frontier (UMV) in the presence of conditioning information where an investor can follow a dynamic trading strategy (Footnote 1) to maximize the unconditional risk and return trade-off. Ferson and Siegel (2001, 2015) derive the closed-form solutions to UMV optimal portfolios.

Ferson and Siegel (2009) show that every asset pricing model makes a prediction about a portfolio (or combination of portfolios) that lies on the UMV frontier. Testing UMV efficiency represents a higher hurdle for asset pricing models to pass, as models are required to correctly price not only fixed-weight portfolio strategies but also all portfolio strategies (satisfying the budget constraint) that can depend upon conditioning information. This approach compares the maximum squared Sharpe measure of the factors to the maximum squared Sharpe measure of the UMV frontier of the test assets and factors. Ferson et al. (2024) also extend the arguments of Barillas and Shanken (2017) and show that the maximum squared Sharpe measure of the UMV frontier of the factors of different models can be used in relative model comparison tests.

Ferson and Siegel (2009) use simulation analysis to test the UMV efficiency of factor models in U.S. stock returns and are able to reject unconditional and conditional versions of the CAPM and Fama and French (1993) models. Ferson et al. (2024) derive the asymptotic distribution of tests based on the maximum squared Sharpe measures of the UMV frontier, and the corresponding standard errors. These can be used to calculate t-statistics of the UMV efficiency tests of linear factor models, and to conduct relative model comparison tests.

This study examines the UMV efficiency of ten multifactor models in U.K. stock returns and conducts relative model comparison tests. A focus on the U.K. is important for a number of reasons. A recent study by Pukthuanthong et al. (2023) finds that the factors in the best model from a Bayesian model scan can be country specific. Dimson et al. (2015) find that the industrial compositions of the U.K. and U.S. markets can vary. The mining, oil, and gas sectors play a bigger role in the U.K. and the technology sector plays a smaller role relative to the U.S. market. The study of UMV efficiency of factor models is important in the evaluation of the performance of U.K. equity mutual funds. Ferson (2013) shows that a UMV efficient portfolio is an “Appropriate Benchmark” to use for clients with quadratic utility functions (Footnote 2). The results of the study indicate whether any of the factor models are Appropriate Benchmarks in this context. This is the first study to examine the UMV efficiency of linear factor models in U.K. stock returns (Footnote 3), and complements the studies in U.S. stock returns such as Ferson and Siegel (2009), Penaranda (2016), and Ferson et al. (2024). Recent studies by Harvey (2019) and Hou et al. (2020) highlight the importance of replication studies in Finance.

The sample period is between July 1983 and December 2022. The models include the three-factor model of Fama and French (1993), the five-factor model of Fama and French (2015), the six-factor model of Fama and French (2018), the four-factor model of Hou et al. (2015), the three-factor model of Clarke (2022), the two-factor model of Frazzini and Pedersen (2014), the four-factor model of Stambaugh and Yuan (2017), and the best factor models drawn from Bayesian model scan studies of Barillas and Shanken (2018), Chib and Zeng (2020), and Chib et al. (2024). I use two sets of test assets: 16 size/book-to-market (BM) portfolios and 15 volatility/momentum portfolios.

There are three main findings in my study. First, the UMV efficiency of all the factor models is rejected. When dynamic trading is allowed in the factors, the UMV efficiency is no longer rejected for some of the models using the volatility/momentum portfolios as the test assets. Second, the rejection of UMV efficiency is driven mainly by allowing dynamic trading in the test assets and factors. Third, the best performing model in the relative model comparison tests using the UMV frontiers is the Chib and Zeng (2020) model.

The paper is organized as follows. Section 2 presents the research method. Section 3 describes the data used in my study. Section 4 reports the empirical results. The final section concludes.

2 Research method

Ross (1978), Harrison and Kreps (1979), and Hansen and Richard (1987) show that if the Law of One Price (LOP) holds in financial markets, then there exists a stochastic discount factor (Footnote 4) m_{t+1} such that:

$${{\text{P}}}_{{\text{t}}}={\text{E}}\left({{\text{m}}}_{{\text{t}}+1}{{\text{X}}}_{{\text{t}}+1}\left|{{\text{Z}}}_{{\text{t}}}\right.\right)$$
(1)

where P_t are the costs of the N test assets at t, X_{t+1} are the payoffs of the N test assets at t + 1, and Z_t is the information set of investors at time t. When there are no arbitrage (NA) opportunities available in financial markets, m_{t+1} > 0 (Cochrane (2005)) (Footnote 5). When the payoffs are gross returns (1 + returns), Eq. (1) becomes:

$$1={\text{E}}\left({{\text{m}}}_{{\text{t}}+1}{{\text{R}}}_{{\text{t}}+1}\left|{{\text{Z}}}_{{\text{t}}}\right.\right)$$
(2)

Ferson and Siegel (2009) show that if we restrict portfolio strategies such that the weights sum to 1 at each point in time, and take unconditional expectations, then Eq. (2) becomes:

$$\begin{array}{lc}\mathrm E\left({\mathrm m}_{\mathrm t+1}\mathrm x'\left({\mathrm Z}_{\mathrm t}\right){\mathrm R}_{\mathrm t+1}\right)=1&\mathrm{for}\;\mathrm{all}\;\mathrm x'\left({\mathrm Z}_{\mathrm t}\right)\mathrm e=1\end{array}$$
(3)

where x(Z_t) is an (N,1) vector of portfolio weights that can depend upon Z_t, and e is an (N,1) vector of ones. In Eq. (3), asset pricing models are required to price not only the test assets but also all dynamic trading strategies that trade on Z_t subject to the restriction that the weights sum to 1.

Ferson and Siegel (2009) show that if a candidate stochastic discount factor model satisfies Eq. (3), then it implies that a certain portfolio lies on the UMV frontier in the presence of conditioning information. Following Hansen and Richard (1987), a portfolio R_{p,t+1} lies on the UMV frontier if:

$$\mathrm{Var}\left({\mathrm R}_{\mathrm p,\mathrm t+1}\right)\;\leq\;\mathrm{Var}\left(\mathrm x'\left({\mathrm Z}_{\mathrm t}\right){\mathrm R}_{\mathrm t+1}\right)\;\mathrm{if}\;\mathrm E\left({\mathrm R}_{\mathrm p,\mathrm t+1}\right)\;=\;\mathrm E\left(\mathrm x'\left({\mathrm Z}_{\mathrm t}\right){\mathrm R}_{\mathrm t+1}\right)\;\mathrm{and}\;\mathrm x'\left({\mathrm Z}_{\mathrm t}\right)\mathrm e\;=\;1$$
(4)

The candidate stochastic discount factor models examined in this study are linear factor models given by:

$${{\text{m}}}_{{\text{t}}+1}={\text{a}}+ {{\text{b}}}_{{\text{K}}}{{\text{f}}}_{{\text{t}}+1}$$
(5)

where f_{t+1} is a (K,1) vector of the K excess factor returns at time t + 1, and b_K is a (1,K) vector of slope coefficients on the K factors in the model. The individual slope coefficients in b_K tell us whether each factor is important for pricing the test assets given the other factors in the model. Proposition 2 in Ferson and Siegel (2009) shows that if Eq. (3) is satisfied by a linear factor model in Eq. (5), then there will be a combination of the K factor portfolios that lies on the UMV frontier.

The UMV efficiency of a linear factor model can be tested by comparing squared Sharpe (1966) measures. Define r as the N test assets, f as the K factors in a model, and Sh2 as the squared Sharpe measure. The null hypothesis is given by:

$${\text{Sh}}2\mathrm{ Diff}={{\text{Sh}}2}_{{\text{umv}}}\left({\text{r}},{\text{f}}\right)- {{\text{Sh}}2}_{{\text{pmv}}}\left({\text{f}}\right)=0$$
(6)

where Sh2_umv(r,f) is the maximum squared Sharpe measure from the UMV frontier of the test assets and factors, and Sh2_pmv(f) is the maximum squared Sharpe measure from the PMV frontier of the factors. To estimate the Sharpe measures, a zero-beta return is required (Footnote 6), and it is assumed in this study to be equal to the average return of the one-month Treasury Bill as in Ferson and Siegel (2009) and Ferson et al. (2024).
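As a minimal sketch of the fixed-weight building block in Eq. (6), the sample maximum squared Sharpe measure of a set of excess returns can be computed as the quadratic form of the mean vector in the inverse covariance matrix. The function name, the simulated data, and the dimensions below are illustrative assumptions and not the data or code used in this study, and no bias adjustment is applied.

```python
import numpy as np

def max_sq_sharpe(excess):
    """Sample maximum squared Sharpe measure, mu' S^{-1} mu, over fixed-weight portfolios.

    excess: (T, n) array of returns in excess of the zero-beta (or risk-free) rate.
    """
    mu = excess.mean(axis=0)
    cov = np.atleast_2d(np.cov(excess, rowvar=False))
    return float(mu @ np.linalg.solve(cov, mu))

rng = np.random.default_rng(0)
T, K, N = 474, 3, 4                        # hypothetical sizes; 474 months = July 1983 to December 2022
f = rng.normal(0.005, 0.04, size=(T, K))   # simulated excess factor returns (assumption)
r = rng.normal(0.004, 0.05, size=(T, N))   # simulated excess test-asset returns (assumption)

sh2_pmv_f = max_sq_sharpe(f)                    # PMV frontier of the factors
sh2_pmv_rf = max_sq_sharpe(np.hstack([r, f]))   # PMV frontier of test assets and factors
pmv_shift = sh2_pmv_rf - sh2_pmv_f              # fixed-weight analogue of the Sh2 Diff measure
```

In sample, adding assets cannot lower the maximum squared Sharpe measure, so `pmv_shift` is never negative; the tests ask whether such a shift is statistically significant.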

Ferson and Siegel (2001, 2015) derive the closed-form solutions of the optimal weights of the UMV frontier. Define u_t as an (N,1) vector of the conditional expected returns of the assets based on information at time t, V_t as the (N,N) conditional covariance matrix, Z_t as an (L,1) vector of lagged information variables (including a constant), and L_t as the (N,N) inverse conditional second moment matrix, equal to (V_t + u_t u_t')^{-1}. Define D_t = L_t − (L_t e e' L_t)/(e' L_t e), α_1 = E(1/(e' L_t e)), α_2 = E((e' L_t u_t)/(e' L_t e)), and α_3 = E(u_t' D_t u_t). The optimal weights are given by:

$${\text{x}}\left({{\text{Z}}}_{{\text{t}}}\right)= \left({{\text{L}}}_{{\text{t}}}{\text{e}}/\mathrm{e{\prime}}{{\text{L}}}_{{\text{t}}}{\text{e}}\right)+ \left(\left({{\text{u}}}_{{\text{p}}}-{\mathrm{\alpha }}_{2}\right)/{\mathrm{\alpha }}_{3}\right){{\text{D}}}_{{\text{t}}}{{\text{u}}}_{{\text{t}}}$$
(7)

where u_p is the target expected return (Footnote 7).

The first term in Eq. (7) is the minimum conditional second moment portfolio where the weights sum to 1. The second term is the mean–variance component, an excess return whose weights sum to zero. By varying the target u_p, any point on the UMV frontier can be selected (Footnote 8). Ferson and Siegel (2001) show that investors with a quadratic utility function will select UMV portfolios. Ferson and Siegel (2001) point out that the UMV optimal portfolio weights are conservative for extreme values of Z_t. A client who does not observe the information of the fund manager would want the fund manager to hold the UMV portfolio (Footnote 9).
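The structure of the weights in Eq. (7) can be illustrated with a short sketch. It assumes a constant conditional covariance matrix and simulated conditional means of gross returns; the function name and all inputs are hypothetical. The third constant is computed as the time average of u_t' D_t u_t, which is the form under which the weights in Eq. (7) deliver the target mean.

```python
import numpy as np

def umv_alphas_and_weights(U, V, target):
    """U: (T, N) conditional mean gross returns; V: (N, N) constant conditional covariance.

    Returns (alpha1, alpha2, alpha3) and the (T, N) array of Eq. (7) weights.
    """
    T, N = U.shape
    e = np.ones(N)
    base = np.empty((T, N))   # first term of Eq. (7): L_t e / (e' L_t e)
    tilt = np.empty((T, N))   # D_t u_t, whose elements sum to zero
    a1, a2, a3 = np.empty(T), np.empty(T), np.empty(T)
    for t in range(T):
        u = U[t]
        L = np.linalg.inv(V + np.outer(u, u))   # inverse conditional second moment matrix
        Le = L @ e
        eLe = e @ Le
        D = L - np.outer(Le, Le) / eLe
        base[t] = Le / eLe
        tilt[t] = D @ u
        a1[t] = 1.0 / eLe
        a2[t] = (Le @ u) / eLe
        a3[t] = u @ tilt[t]
    alpha1, alpha2, alpha3 = a1.mean(), a2.mean(), a3.mean()   # sample analogues of the expectations
    weights = base + ((target - alpha2) / alpha3) * tilt
    return (alpha1, alpha2, alpha3), weights

rng = np.random.default_rng(1)
T, N = 240, 3                                           # hypothetical sample size and asset count
Z = rng.normal(size=(T, 1))                             # one simulated information variable
U = 1.005 + 0.01 * Z @ np.array([[1.0, -0.5, 0.3]])     # simulated conditional means (assumption)
A = rng.normal(scale=0.04, size=(N, N))
V = A @ A.T + 0.001 * np.eye(N)                         # positive definite covariance (assumption)
alphas, X = umv_alphas_and_weights(U, V, target=1.01)
```

In this sketch the weights sum to 1 at every date, and the time average of x(Z_t)'u_t equals the chosen target, which is a quick check that the two terms of Eq. (7) play the roles described above.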

Given a model of conditional moments, the optimal weights can be estimated, and the corresponding squared Sharpe measures can be calculated. Ferson et al. (2024) show that the squared Sharpe measure on the UMV frontier can be calculated as Sh2 = a − 2b r_z + c r_z^2, where a = (α_2^2 + α_1 α_3)/(α_1(1 − α_3) − α_2^2), b = α_2/(α_1(1 − α_3) − α_2^2), and c = (1 − α_3)/(α_1(1 − α_3) − α_2^2). Ferson and Siegel (2009), and Ferson et al. (2024) use a predictive regression of the asset returns on Z_t to model the conditional moments. The conditional expected returns are the fitted values from the regression, and the conditional covariance matrix is assumed constant and given by the residual covariance matrix from the regression. Ferson and Siegel (2009) point out that the tests are robust to using the wrong model of conditional moments: the UMV portfolio is still a valid portfolio strategy but no longer the optimal one, which leads to a loss in power. Ferson and Siegel (2009) use simulations to test the UMV efficiency of a factor model. Ferson et al. (2024) derive the asymptotic distribution of the test of Eq. (6) through Theorem I and Corollary I (Footnote 10). These can be used to calculate the standard error of the Sh2 Diff measure, and the corresponding t-statistic to evaluate the null hypothesis.
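The closed-form expression for the squared Sharpe measure can be written as a small function of the three constants. In this sketch rz is read as the zero-beta return, which is an assumption since the symbol is not defined explicitly in the text, and the input values are round numbers chosen for hand-checking rather than estimates.

```python
def umv_sq_sharpe(alpha1, alpha2, alpha3, rz):
    """Squared Sharpe measure on the UMV frontier: Sh2 = a - 2*b*rz + c*rz**2."""
    den = alpha1 * (1.0 - alpha3) - alpha2 ** 2   # common denominator of a, b and c
    a = (alpha2 ** 2 + alpha1 * alpha3) / den
    b = alpha2 / den
    c = (1.0 - alpha3) / den
    return a - 2.0 * b * rz + c * rz ** 2

# Illustrative inputs only: den = 0.5, a = 3, b = 2, c = 1.5, so Sh2 = 3 - 4 + 1.5.
sh2_example = umv_sq_sharpe(alpha1=2.0, alpha2=1.0, alpha3=0.25, rz=1.0)
```

Viewed as a function of rz, the expression is a parabola whose minimum, α_3/(1 − α_3), is attained at rz = α_2/(1 − α_3); this follows from the algebra of a, b, and c and gives a convenient sanity check on estimated constants.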

To provide further insight into the UMV efficiency tests, Ferson et al. (2024) consider two decompositions. The first decomposition is given by:

$${\text{Sh}}2\mathrm{ Diff}={\text{Sh}}2\mathrm{ Diff}1+{\text{Sh}}2\mathrm{ Diff}2$$
(8)

where Sh2 Diff1 = Sh2_pmv(r,f) − Sh2_pmv(f), Sh2 Diff2 = Sh2_umv(r,f) − Sh2_pmv(r,f), and Sh2_pmv(r,f) is the maximum squared Sharpe measure from the PMV frontier of the test assets and factors. The first term in the decomposition in Eq. (8) is a test of PMV efficiency of the factor model, and the second term captures the impact of allowing dynamic trading in the test assets and factors. The second decomposition is given by:

$${\text{Sh}}2\mathrm{ Diff}={\text{Sh}}2\mathrm{ Diff}3+{\text{Sh}}2\mathrm{ Diff}4$$
(9)

where Sh2 Diff3 = Sh2_umv(r,f) − Sh2_umv(f), Sh2 Diff4 = Sh2_umv(f) − Sh2_pmv(f), and Sh2_umv(f) is the maximum squared Sharpe measure from the UMV frontier of the factors. The first term in the decomposition in Eq. (9) is the UMV efficiency test of the factor model, where dynamic trading is allowed in the factors. Ferson and Siegel (2009) point out that this is a test of dynamic mean–variance intersection along the lines of Huberman and Kandel (1987). The second term captures the impact of dynamic trading in the factors, and estimates the increase in the maximum squared Sharpe performance of the factors through the optimal use of conditioning information. The Sh2 Diff measures can be calculated for the four terms, and the corresponding t-statistics using Theorem I and relevant Corollaries in Ferson et al. (2024).
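The two decompositions are accounting identities in four squared Sharpe measures, which the following sketch makes explicit. The function name and the input values are hypothetical and are not estimates from this study.

```python
def decompose_sh2_diff(sh2_pmv_f, sh2_pmv_rf, sh2_umv_f, sh2_umv_rf):
    """Split Sh2 Diff = Sh2_umv(r,f) - Sh2_pmv(f) into the terms of the two decompositions."""
    return {
        "total": sh2_umv_rf - sh2_pmv_f,
        "diff1": sh2_pmv_rf - sh2_pmv_f,    # PMV efficiency of the factor model
        "diff2": sh2_umv_rf - sh2_pmv_rf,   # dynamic trading in test assets and factors
        "diff3": sh2_umv_rf - sh2_umv_f,    # dynamic mean-variance intersection
        "diff4": sh2_umv_f - sh2_pmv_f,     # dynamic trading in the factors only
    }

# Hypothetical squared Sharpe measures, chosen only to illustrate the arithmetic.
d = decompose_sh2_diff(sh2_pmv_f=0.10, sh2_pmv_rf=0.15, sh2_umv_f=0.25, sh2_umv_rf=0.40)
```

Both diff1 + diff2 and diff3 + diff4 add up to the same total, so the two decompositions attribute one overall shift to different sources.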

Ferson et al. (2024) extend the analysis of Barillas and Shanken (2017) to relative model comparison tests using the UMV frontier. Better models have higher maximum squared Sharpe measures from the UMV frontier. Define two factor models f_A and f_B. The null hypothesis in relative model comparison tests is:

$${\text{Sh}}2\mathrm{ Diff}={{\text{Sh}}2}_{{\text{umv}}}\left({{\text{f}}}_{{\text{A}}}\right)-{{\text{Sh}}2}_{{\text{umv}}}\left({{\text{f}}}_{{\text{B}}}\right)=0$$
(10)

The t-statistic of the Sh2 Diff measure in the null hypothesis in Eq. (10) can be calculated using Theorem I and the relevant Corollaries in Ferson et al. (2024).

One issue that arises when using the maximum squared Sharpe measures to test and compare factor models is that there is a large upward bias in the sample maximum squared Sharpe measure (Jobson and Korkie (1980)). Ferson and Siegel (2003) in their study of Hansen and Jagannathan (1991) volatility bounds use a bias adjusted maximum squared Sharpe measure given by:

$${{\text{Sh}}2}_{{\text{b}}}= {{\text{Sh}}2}^{*}\left(\left({\text{T}}-{\text{N}}-2\right)/{\text{T}}\right)-{\text{N}}/{\text{T}}$$
(11)

where Sh2* is the sample maximum squared Sharpe measure, T is the number of time-series observations, and N is the number of assets. The adjusted Sh2_b works well when evaluating models using the PMV frontier. Ferson and Siegel (2003) find that the bias adjustment works less well when using the UMV frontier. Proposition II in Ferson et al. (2024) derives a bias adjustment of the maximum squared Sharpe measures of UMV portfolios based on the method of statistical differentials (see Siegel and Woodgate (2007)). Simulation evidence in Ferson et al. (2024) suggests that their bias adjustment works well in testing factor models in U.S. stock returns, and performs better than alternative bias adjustment methods. In this study, I use the adjusted maximum squared Sharpe measures for UMV portfolios based on Proposition II in Ferson et al. (2024), and Sh2_b for the PMV frontier.
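A minimal sketch of the adjustment in Eq. (11) for the PMV frontier follows. It reads T as the number of time-series observations and N as the number of assets spanning the frontier, which is an assumption about the notation, and the input values are hypothetical. The adjustment for UMV portfolios from Proposition II in Ferson et al. (2024) is not reproduced here.

```python
def bias_adjusted_sh2(sh2_sample, T, N):
    """Eq. (11): shrink the sample maximum squared Sharpe measure and subtract N/T."""
    return sh2_sample * (T - N - 2) / T - N / T

# Hypothetical inputs: 474 monthly observations and 19 assets (e.g. 16 test assets plus 3 factors).
sh2_adj = bias_adjusted_sh2(0.30, T=474, N=19)
```

The adjusted measure is always below the sample measure, and the gap widens as N grows relative to T; for a small sample measure the adjusted value can be negative.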

3 Data

3.1 Test assets

The sample period covers July 1983 to December 2022. I use two sets of test assets in the study. Details on the formation of the test assets are included in the Appendix. The first set is 16 size/BM portfolios, where the stocks are sorted by market value (Small to Big), and the BM ratio (Growth to Value). The portfolios are re-formed annually and their returns are value weighted. The data for forming the size/BM portfolios is collected from the London Share Price Database (LSPD) provided by the London Business School, and Refinitiv Worldscope.

The second set of test assets is motivated by Kirby and Ostdiek (2012a, b), and consists of 15 portfolios sorted by volatility (Low to High), and momentum (Losers to Winners). The volatility/momentum portfolios are formed monthly and their returns are value weighted. The data for forming the volatility/momentum portfolios is collected from LSPD. I use the return on the one-month U.K. Treasury Bill as the risk-free asset, which I collect from LSPD and Datastream.

3.2 Factor models

I consider the performance of ten different linear factor models. Fama and French (2018) argue for using a small number of linear factor models in relative model comparison tests to mitigate the impact of data dredging issues. The factors are formed using data on LSPD and Worldscope. Details on how the factor models are formed are included in the Appendix. The following factor models are used.

  1. Fama and French (1993) (FF3).

    The FF3 model is a three-factor model, which includes the excess market returns, and two zero-cost portfolios that capture the size (SMB), and value (HML) effects in stock returns.

  2. Fama and French (2015) (FF5).

    The FF5 model is a five-factor model, which includes the FF3 factors and adds two zero-cost portfolios that capture the profitability (RMW), and investment (CMA) effects in stock returns.

  3. Fama and French (2018) (FF6).

    The FF6 model is a six-factor model, which includes the FF5 factors, and a zero-cost portfolio that captures the momentum (MOM) effect in stock returns.

  4. Clarke (2022) (LSC).

    The LSC model is a three-factor model, which includes the excess returns of the Level, Slope, and Curve factors in stock returns (Footnote 11).

  5. Hou et al. (2015) (HXZ).

    The HXZ model is a four-factor model, which includes the excess market return and three zero-cost portfolios that capture the size (ME), profitability (ROE), and investment (IA) effects in stock returns.

  6. Frazzini and Pedersen (2014) (FP).

    The FP model is a two-factor model, which includes the excess market returns, and the Betting against Beta (BAB) factor.

  7. Stambaugh and Yuan (2017) (SY).

    The SY model is a four-factor model, which includes the excess market return, and zero-cost portfolios for the size (SIZE), management (MGMT), and performance (PERF) factors.

    The final three models are selected from recent Bayesian model scan studies of Barillas and Shanken (2018), Chib and Zeng (2020), and Chib et al. (2024) in U.S. stock returns. The model scan searches for the best model, which has the highest posterior probability (log Marginal Likelihood) among a set of factors.

  8. Barillas and Shanken (2018) (BS).

    The BS model is a six-factor model, and includes the excess market return, and zero-cost portfolios for the size (SMB), value (HMLT) (Footnote 12), profitability (ROE), investment (IA), and momentum (MOM) factors.

  9. Chib and Zeng (2020) (CZ).

    The CZ model is an eight-factor model (Footnote 13). The model includes the excess market returns, and zero-cost portfolios including the SMB, HMLT, RMW, ROE, MOM, BAB, and the Quality minus Junk (QMJ) (Footnote 14) factors.

  10. Chib et al. (2024) (CZZ).

    The CZZ model is a seven-factor model. The model includes the excess market returns, and zero-cost portfolios including the SMB, ROE, MOM, MGMT, PERF, and Post Earnings Announcement Drift (PEAD) (Footnote 15) factors.

Table 1 reports summary statistics of the test assets and the factors between July 1983 and December 2022. Panel A of Table 1 includes the average excess return (%), standard deviation (Std Dev), and the t-statistic of the null hypothesis that the average excess factor returns are equal to zero for the different factors. Panel B of Table 1 reports the average excess returns (%) of the size/BM, and volatility/momentum portfolios.

Table 1 Summary statistics of test assets and factors

Panel A of Table 1 shows that most of the factors have significant positive average excess returns. The main exceptions are the size factors (SMB, ME, SIZE), and the ROE and HMLT factors. The MOM factor has the largest average excess return across factors at 0.760%, highlighting the strong momentum effect in U.K. stock returns, followed by the BAB factor at 0.590%. The MGMT and PERF factors in the SY model also have substantial average excess returns. There is a significant investment effect in the FF5 and HXZ models, using the CMA, and IA factors. It is only the CMA and MOM factors that have a t-statistic larger than 3, which is the recommended cut-off t-statistic by Harvey et al. (2016) to control for multiple testing.

Panel B of Table 1 shows that there is a wide spread in average excess returns for both sets of test assets. The average excess returns of the size/BM portfolios range between 0.141% (Small/Growth), and 0.722% (Small/Value). The value effect is stronger in smaller companies, which is consistent with Fama and French (2012). There is a small size effect in the Value portfolios, and a reverse size effect in the Growth portfolios.

The spread in average excess returns in panel B of Table 1 is a lot wider in the volatility/momentum portfolios compared to the size/BM portfolios. The average excess returns of the volatility/momentum portfolios range between -0.751% (High/Losers), and 0.914% (4/Winners). There is a large momentum effect in average excess returns across all volatility groups. There is likewise a volatility effect in average excess returns for the Losers and 2 portfolios, where the Low volatility portfolio has a much higher average excess return than the High volatility portfolio. The volatility effect is a lot stronger when we look at the standard deviations of the volatility/momentum portfolios.

3.3 Lagged information variables

I use four lagged information variables that earlier studies have found to have some predictive ability for future stock returns (Footnote 16). The lagged information variables are the one-month lagged annualized dividend yield (DY) of the U.K. market index (Fama and French (1988)), the lagged return on the one-month U.K. Treasury Bill (Rf) (Fama and Schwert (1977), Ferson (1989)), the one-month lagged term spread (Term), given by the difference between the annualized yields of long-term government bonds (International Financial Statistics) and the three-month U.K. Treasury Bill (LSPD), and the one-month lagged excess return on the U.K. market index. The lagged DY is formed using data from LSPD.

To examine the predictive ability of the lagged information variables, I run predictive regressions for both sets of test assets of the excess asset returns on a constant and the four lagged information variables in unreported tests (Footnote 17). It is only for the size/BM portfolios that the Wald test rejects the null hypothesis that the slope coefficients on the lagged information variables are jointly equal to zero. The magnitude of the predictability is small in statistical terms: the highest adjusted R2 is 8.13% (Small/Value) in the size/BM portfolios, and 2.74% (High/2) in the volatility/momentum portfolios.
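A predictive regression of this kind can be sketched with ordinary least squares. The function name, the simulated data, and the dimensions are illustrative assumptions; the sketch returns the fitted values and residual covariance matrix, which serve as the conditional moments in the research method, together with the adjusted R2 for each asset. The Wald test is not shown.

```python
import numpy as np

def predictive_regression(R, Z_lag):
    """Regress each column of R (T, N) on a constant and Z_lag (T, n_z), already lagged one period."""
    T = R.shape[0]
    X = np.column_stack([np.ones(T), Z_lag])
    B, *_ = np.linalg.lstsq(X, R, rcond=None)   # OLS coefficients, one column per asset
    fitted = X @ B                              # conditional expected returns
    resid = R - fitted
    k = X.shape[1]
    resid_cov = resid.T @ resid / (T - k)       # constant conditional covariance matrix
    r2 = 1.0 - resid.var(axis=0) / R.var(axis=0)
    adj_r2 = 1.0 - (1.0 - r2) * (T - 1) / (T - k)
    return fitted, resid_cov, adj_r2

rng = np.random.default_rng(2)
T, N, n_z = 474, 5, 4                           # hypothetical dimensions
Z_lag = rng.normal(size=(T, n_z))               # simulated lagged information variables
beta = rng.normal(scale=0.005, size=(n_z, N))   # weak simulated predictability (assumption)
R = 0.005 + Z_lag @ beta + rng.normal(scale=0.05, size=(T, N))
fitted, resid_cov, adj_r2 = predictive_regression(R, Z_lag)
```

Because the slopes in the simulated data are small relative to the noise, the adjusted R2 values are low, in the spirit of the modest predictability reported above.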

4 Empirical results

I begin the empirical analysis by testing the UMV efficiency of the linear factor models. Table 2 reports the difference in adjusted maximum squared Sharpe measures (Sh2 Diff) between the UMV frontier of the test assets and factors and the PMV frontier of the factors, and the corresponding t-statistics. An earlier version of Ferson et al. (2024) points out that the increase in squared Sharpe performance can be interpreted in economic terms using maximum quadratic utilities (Kan and Zhou (2007)). The Certainty Equivalent (CE) excess return is given by (1/(2γ)) Sh2 Diff, where γ is the risk aversion level. The CE in Table 2 assumes a risk aversion level of 5 as in Ferson et al. (2024). Panel A includes the results using the size/BM portfolios as the test assets, and panel B includes the results using the volatility/momentum portfolios as the test assets.

Table 2 UMV Efficiency tests of linear factor models

Table 2 shows that the UMV efficiency of each factor model is strongly rejected in both sets of test assets. There is a large significant increase in the adjusted maximum squared Sharpe performance between the UMV frontier of the test assets and factors, and the PMV frontier of the factors. The Sh2 Diff measures range between 0.28 (BS) and 0.377 (CZ) for the size/BM portfolios, and 0.117 (SY) and 0.268 (CZ) for the volatility/momentum portfolios. The magnitude of the CE excess returns is greater than 2.79% for all models using the size/BM portfolios, and greater than 1.17% for all models using the volatility/momentum portfolios. The rejection of the UMV efficiency of the factor models is consistent with Ferson and Siegel (2009), and Ferson et al. (2024) in U.S. stock returns.
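The mapping from Sh2 Diff to the CE excess return is simple arithmetic and can be checked against the values quoted above. The function name and dictionary labels in this sketch are hypothetical; the Sh2 Diff values are the ones reported in the text, and the risk aversion level of 5 follows the paper.

```python
def ce_excess_return(sh2_diff, gamma=5.0):
    """Certainty Equivalent excess return: Sh2 Diff divided by 2 * gamma."""
    return sh2_diff / (2.0 * gamma)

# Sh2 Diff values quoted in the text for the extreme models in each set of test assets.
reported = {
    "BS, size/BM": 0.28,
    "CZ, size/BM": 0.377,
    "SY, vol/mom": 0.117,
    "CZ, vol/mom": 0.268,
}
ce = {label: ce_excess_return(value) for label, value in reported.items()}
```

With a risk aversion level of 5 the mapping is a division by 10, so the Sh2 Diff of 0.117 for the SY model corresponds to a CE of 0.0117, that is 1.17%, matching the figure quoted for the volatility/momentum portfolios.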

I next explore what drives the rejection of UMV efficiency by estimating the decompositions of Ferson et al. (2024) in Eqs. (8) and (9). Tables 3 and 4 report the two decompositions of UMV efficiency tests using the size/BM portfolios as the test assets (Table 3), and volatility/momentum portfolios as the test assets (Table 4). The first decomposition is in panel A of each table, and the second decomposition is in panel B. The tables report the differences in adjusted maximum squared Sharpe measures (Sh2 Diff1, Sh2 Diff2, Sh2 Diff3, Sh2 Diff4), and the corresponding t-statistics.

Table 3 UMV Efficiency decomposition tests: size/bm portfolios
Table 4 UMV Efficiency decomposition tests: volatility/momentum portfolios

Panel A of Table 3 shows that, using the size/BM portfolios as the test assets, it is the dynamic trading in both the test assets and factors that drives the rejection of UMV efficiency of the factor models. The Sh2 Diff2 measures are a lot larger than the Sh2 Diff1 measures and all are highly statistically significant. The Sh2 Diff1 measures reject the passive mean–variance efficiency of all models at the 10% level, except for the BS and CZ models. The finding that dynamic trading drives the rejection of UMV efficiency of the factor models is similar to Ferson et al. (2024).

Panel B of Table 3 shows that allowing dynamic trading in the factors leads to a significant increase in the maximum adjusted squared Sharpe measures of the factors, as reflected in the significant positive Sh2 Diff4 measures. This is especially the case for the CZ model. This result provides support for the optimal use of conditioning information in the factors, which is consistent with Ferson and Siegel (2009), Abhyankar et al. (2012), Penaranda (2016), and Ferson et al. (2024). Although allowing dynamic trading in the factors leads to a significant increase in squared Sharpe performance of the factors, the UMV efficiency of each model is still rejected. The Sh2 Diff3 measures are all large in economic terms and highly statistically significant. This result rejects the dynamic mean–variance intersection of all the models (Huberman and Kandel (1987)).

When using the volatility/momentum portfolios as the test assets, panel A of Table 4 shows again that it is the dynamic trading in the test assets and factors that drives the rejection of the UMV efficiency of the models in most cases. This is especially the case for the FF6, BS, CZ, and CZZ models. All of the Sh2 Diff2 measures are significantly positive at the 10% level. In contrast, the Sh2 Diff1 measures are only significantly positive for the FF3, FF5, LSC, HXZ, FP, and SY models. It is interesting to note that the PMV efficiency is not rejected for the BS and CZ models in either set of test assets. Panel B of Table 4 shows that, allowing dynamic trading in the factors, the UMV efficiency of the FF6, BS, CZ, and CZZ models is no longer rejected. For these models, the hypothesis of dynamic mean–variance intersection is not rejected.

Tables 3 and 4 provide some support for the FF6, BS, CZ, and CZZ models when allowing dynamic trading in the factors. I next examine the relative model comparison tests using the maximum adjusted squared Sharpe measures from the PMV and UMV frontiers of the factors. Tables 5 and 6 report the difference between the adjusted maximum squared Sharpe measures (Sh2 Diff) of two factor models (panel A), and corresponding t-statistics (panel B). Table 5 reports the relative model comparison tests using the PMV frontier, and Table 6 reports the relative model comparison tests using the UMV frontier. The Sh2 Diff measures in Tables 5 and 6 are the differences between the adjusted maximum squared Sharpe measures of the model in the column and the model in the row.

Table 5 PMV Efficiency model comparison tests
Table 6 UMV Efficiency model comparison tests

Table 5 shows that there are a large number of significant differences in the adjusted maximum squared Sharpe measures using the PMV frontier between the factor models. The FF3 model has a significantly lower adjusted squared Sharpe measure than the FF5, FF6, BS, CZ, and CZZ models. The FF3, LSC, HXZ, FP, and SY models have similar adjusted squared Sharpe measures. These models all significantly underperform the FF6, BS, CZ, and CZZ models. Among the FF6, BS, CZ, and CZZ models, there are no significant differences in the adjusted squared Sharpe measures. These are the best performing models in the relative model comparison tests based on the PMV frontiers.

Table 6 shows that allowing dynamic trading in the factors has an impact on the relative model comparison tests. There is a sizeable increase in the magnitude of the Sh2 Diff measures between models and a larger number of significant Sh2 Diff measures. The FF3 model continues to perform poorly in relative model comparison tests, with a significantly lower adjusted maximum squared Sharpe measure relative to the FF5, FF6, FP, BS, CZ, and CZZ models. The FP, LSC, HXZ, and SY models have similar performance to one another. The FF5 model significantly outperforms the LSC, HXZ, and SY models but significantly underperforms the FF6, and CZ models. Among the FF6, BS, CZ, and CZZ models, the CZ model has a significantly higher adjusted squared Sharpe measure than the FF6 and CZZ models. The CZ model does have a sizeably higher adjusted squared Sharpe measure than the BS model but the difference is not statistically significant. The findings in Table 6 suggest that the CZ model is the best performing model, and complement the empirical results in Chib and Zeng (2020).

My study has used a standard set of portfolios as the test assets. However, even with these test assets, the UMV efficiency of the models is rejected. It is likely that if a more challenging set of test assets were used, such as the anomaly portfolios of Jensen et al. (2023) or the approach used by Bryzgalova et al. (2023b), the rejection of the UMV efficiency of the factor models would be even stronger. I have also used a standard set of lagged information variables. One of the attractions of the Ferson and Siegel (2009) approach is that the dimensions of the conditional covariance matrix remain fixed no matter how many lagged information variables are used. I repeat the tests by replacing the lagged excess market returns with a lagged default spread. The benefits of the optimal use of conditioning information are a lot weaker in all the factor models, and the CZ model no longer significantly outperforms the FF6, BS, and CZZ models. Although the results can be sensitive to the choice of lagged information variables, the results in the paper are likely to be conservative given that a much broader set of lagged information variables can be used.

5 Conclusions

This paper examines the UMV efficiency of ten multifactor models in U.K. stock returns, and conducts relative model comparison tests. There are three main findings in the study.

First, the UMV efficiency of all the factor models is strongly rejected in both sets of test assets. There is a significant increase in the maximum adjusted squared Sharpe performance in moving from the PMV frontier of the factors to the UMV frontier of the test assets and factors. When dynamic trading in the factors is allowed, UMV efficiency is rejected for all factor models using the size/BM portfolios as the test assets. This result implies that the dynamic mean–variance intersection hypothesis (Huberman and Kandel (1987)) is rejected for each factor model. In contrast, using the volatility/momentum portfolios, the dynamic mean–variance intersection hypothesis is rejected only for the FF3, FF5, LSC, HXZ, FP, and SY models. The rejection of UMV efficiency of the multifactor models is consistent with Ferson and Siegel (2009) and Ferson et al. (2024).

Second, the rejection of the UMV efficiency of the factor models is driven mainly by the dynamic trading in the test assets and factors. All of the Sh2 Diff2 measures are significantly positive and in most cases much higher than the Sh2 Diff1 measures. For the BS and CZ models, the PMV efficiency cannot be rejected in either set of test assets. Allowing dynamic trading in the factors leads to a significant increase in the maximum adjusted squared Sharpe measures, with significantly positive Sh2 Diff4 measures for all models. This is especially the case with the CZ model. The importance of dynamic trading in evaluating factor models is consistent with Ferson and Siegel (2009) and Ferson et al. (2024).

Third, allowing dynamic trading in the factors has a significant impact on the relative model comparison tests. In most cases, there is a sizeable increase in the Sh2 Diff measures using the UMV frontier relative to the PMV frontier, and more of the Sh2 Diff measures are statistically significant. The CZ model has the highest maximum adjusted squared Sharpe measure and significantly outperforms all other models except the BS model. The difference in adjusted squared Sharpe measures between the CZ and BS models is sizeable but not statistically significant. The superior performance of the CZ model stems from the large increase in its maximum adjusted squared Sharpe measure in moving from the PMV to the UMV frontier, which complements the evidence in Chib and Zeng (2020).

My study suggests that testing the UMV efficiency of linear factor models poses a much greater challenge for asset pricing models, and that it also has a significant impact on relative model comparison tests. The rejection of UMV efficiency suggests that none of the factor models is an “Appropriate Benchmark” to evaluate U.K. equity fund managers for clients with a quadratic utility function. My study has assumed that the zero-beta return is given by the average return of the one-month U.K. Treasury Bill. An interesting extension would be to conduct model comparison tests where the optimal zero-beta return is estimated along the lines suggested by Ferson et al. (2024). My study has focused on multifactor models. Another interesting extension would be to look at alternative stochastic discount factor models based on nonlinear models like the consumption CAPM, or at conditional factor models following the approach in Ferson et al. (2024). Recent studies by Ehsani and Linnainmaa (2022) and Chib et al. (2023) suggest alternative ways of forming the factors. An examination of the UMV efficiency of these factor models is also of interest. I leave these issues to future research.

The table reports summary statistics of test assets and factors between July 1983 and December 2022. Panel A of the table includes the average excess returns (%) and standard deviation (Std Dev) of the factors. The t-statistic column is the t-statistic of the null hypothesis that the average excess factor returns are equal to zero. Panel B of the table includes the average excess returns (%) of the 16 size/BM portfolios, and 15 volatility/momentum portfolios.

The table reports the UMV efficiency tests of ten linear factor models in U.K. stock returns, and corresponding t-statistics. The Sh2 Diff measure is given by the difference between the adjusted maximum squared Sharpe measures of the UMV frontier of the test assets and factors and the PMV frontier of the factors. The t-statistic comes from Ferson et al. (2024). The CE is the Certainty Equivalent excess return and is given by (1/(2γ)) Sh2 Diff, where γ is set equal to 5. In panel A, the test assets are 16 size/BM portfolios, and in panel B the test assets are 15 volatility/momentum portfolios. The zero-beta return is assumed to be given by the average return of the one-month U.K. Treasury Bill. The sample period is between July 1983 and December 2022.
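As a minimal sketch of the CE calculation described in this note, the conversion from a squared Sharpe difference to a certainty equivalent can be written as follows. The function name and the input value are hypothetical illustrations and are not taken from the tables; the CE is expressed in the same return units and frequency as the returns behind the Sh2 Diff measure.

```python
def certainty_equivalent(sh2_diff: float, gamma: float = 5.0) -> float:
    """Certainty equivalent excess return implied by a squared Sharpe gain.

    Implements CE = (1 / (2 * gamma)) * Sh2 Diff, with gamma = 5 as the default
    risk aversion stated in the table note.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return sh2_diff / (2.0 * gamma)


# Hypothetical Sh2 Diff of 0.10 (illustrative only, not a reported estimate).
example_ce = certainty_equivalent(0.10)
```

With γ = 5, the CE is simply one tenth of the Sh2 Diff measure, so the ranking of models by CE is identical to the ranking by Sh2 Diff.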

The table reports the decompositions of the UMV efficiency tests of Ferson et al. (2024). The first decomposition in panel A reports Sh2 Diff1 = Sh2pmv(r,f) – Sh2pmv(f), Sh2 Diff2 = Sh2umv(r,f) – Sh2pmv(r,f), and the corresponding t-statistics. The second decomposition in panel B reports Sh2 Diff3 = Sh2umv(r,f) – Sh2umv(f), and Sh2 Diff4 = Sh2umv(f) – Sh2pmv(f), and the corresponding t-statistics. The test assets are 16 size/BM portfolios, and the zero-beta return is given by the average return of the one-month U.K. Treasury Bill. The t-statistics are estimated from Ferson et al. (2024). The sample period is between July 1983 and December 2022.

The table reports the decompositions of the UMV efficiency tests of Ferson et al. (2024). The first decomposition in panel A reports Sh2 Diff1 = Sh2pmv(r,f) – Sh2pmv(f), Sh2 Diff2 = Sh2umv(r,f) – Sh2pmv(r,f), and the corresponding t-statistics. The second decomposition in panel B reports Sh2 Diff3 = Sh2umv(r,f) – Sh2umv(f), and Sh2 Diff4 = Sh2umv(f) – Sh2pmv(f), and the corresponding t-statistics. The test assets are 15 volatility/momentum portfolios, and the zero-beta return is given by the average return of the one-month U.K. Treasury Bill. The t-statistics are estimated from Ferson et al. (2024). The sample period is between July 1983 and December 2022.
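The two decompositions in these notes split the same overall difference, Sh2umv(r,f) – Sh2pmv(f), in two different ways. The sketch below illustrates that arithmetic under the definitions stated above. The function and field names are hypothetical, and the four input values are made up for illustration rather than taken from the tables.

```python
from typing import NamedTuple


class Sh2Decomposition(NamedTuple):
    diff1: float  # Sh2pmv(r,f) - Sh2pmv(f): adding test assets, fixed weights
    diff2: float  # Sh2umv(r,f) - Sh2pmv(r,f): dynamic trading in assets and factors
    diff3: float  # Sh2umv(r,f) - Sh2umv(f): adding test assets, dynamic trading
    diff4: float  # Sh2umv(f) - Sh2pmv(f): dynamic trading in the factors only
    total: float  # Sh2umv(r,f) - Sh2pmv(f): the overall Sh2 Diff measure


def decompose_sh2(
    sh2_pmv_f: float, sh2_pmv_rf: float, sh2_umv_f: float, sh2_umv_rf: float
) -> Sh2Decomposition:
    """Compute both decompositions from four adjusted squared Sharpe measures."""
    return Sh2Decomposition(
        diff1=sh2_pmv_rf - sh2_pmv_f,
        diff2=sh2_umv_rf - sh2_pmv_rf,
        diff3=sh2_umv_rf - sh2_umv_f,
        diff4=sh2_umv_f - sh2_pmv_f,
        total=sh2_umv_rf - sh2_pmv_f,
    )


# Hypothetical inputs (illustrative only).
d = decompose_sh2(sh2_pmv_f=0.05, sh2_pmv_rf=0.08, sh2_umv_f=0.12, sh2_umv_rf=0.20)
# Both panels add up to the same overall difference.
panel_a_sum = d.diff1 + d.diff2
panel_b_sum = d.diff3 + d.diff4
```

The additivity holds for the point estimates only; the t-statistics of the individual components do not sum in this way.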

The table reports relative model comparison tests between the linear factor models using the PMV frontier. Panel A includes the difference between the adjusted maximum squared Sharpe measures of the model in the column and the model in the row (Sh2 Diff). Panel B includes the corresponding t-statistics which are estimated from Ferson et al. (2024). The zero-beta return is given by the average return of the one-month U.K. Treasury Bill. The sample period is between July 1983 and December 2022.

The table reports relative model comparison tests between the linear factor models using the UMV frontier. Panel A includes the difference between the adjusted maximum squared Sharpe measures of the model in the column and the model in the row (Sh2 Diff). Panel B includes the corresponding t-statistics which are estimated from Ferson et al. (2024). The zero-beta return is given by the average return of the one-month U.K. Treasury Bill. The sample period is between July 1983 and December 2022.
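The layout of panel A in the two model comparison notes (column model minus row model) can be sketched as below. This is only an illustration of the sign convention: the function name is hypothetical, the squared Sharpe values are made up, and the t-statistics in panel B require the asymptotic theory of Ferson et al. (2024), which is not reproduced here.

```python
def pairwise_sh2_diff(sh2: dict[str, float]) -> dict[str, dict[str, float]]:
    """Return table[row][col] = sh2[col] - sh2[row] for every pair of models.

    A positive entry means the column model has the higher adjusted maximum
    squared Sharpe measure.
    """
    return {row: {col: sh2[col] - sh2[row] for col in sh2} for row in sh2}


# Hypothetical adjusted maximum squared Sharpe measures (illustrative only).
sh2_by_model = {"FF3": 0.05, "FF5": 0.10, "CZ": 0.30}
table = pairwise_sh2_diff(sh2_by_model)

# Reading the table: row FF3, column CZ gives Sh2(CZ) - Sh2(FF3).
cz_minus_ff3 = table["FF3"]["CZ"]
```

The resulting table is antisymmetric with zeros on the diagonal, so only the entries above (or below) the diagonal carry distinct information.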