1 Introduction

Return predictability and portfolio selection are two of the most relevant topics in finance. The ability to forecast returns has important implications for portfolio asset allocation, as highlighted by Campbell et al. (2003) and McMillan (2021). The possibility of return predictability motivates the use of more robust estimation techniques and predictor selection to detect and use all the pertinent information available in the data. This matters most notably because, as is well known, financial markets are subject to permanent shocks and structural breaks and experience volatility clustering. Hence, well-defined investment strategies should use flexible forecasting models that accommodate those features of financial data.

Recent research has analyzed the predictability of stock and bond returns by considering several macro and microeconomic variables, such as inflation rates (Ludvigson & Ng, 2009), interest rates (Bandi et al., 2019), macroeconomic attention indices (Ma et al., 2022a), geopolitical risk (Ma et al., 2022b), climate change news risk (Huynh & Xia, 2020), financial stress index (Xu et al., 2023), valuation ratios, and dividend-price ratio (Cochrane, 2007; Golez & Koudijs, 2018). Typically, these studies rely on specific sets of predictors to forecast multiple-asset returns (Gao & Nardari, 2018). For instance, Welch and Goyal (2008) and Rapach et al. (2010) consider fifteen predictors, most of them financial variables. Hjalmarsson (2010) uses four financial variables (stock price ratios and interest rates) to predict excess stock returns in 14 countries, Golez and Koudijs (2018) use only two valuation ratios as predictors of stock returns in four countries, and Zhang et al. (2019) apply only the short interest index and the aligned investor sentiment to forecast the returns of the S&P 500. There is no consensus on the predictor space, but its definition is of utmost importance, since inadequate predictors reduce the predictive ability and, consequently, the performance of asset allocation strategies built upon those models.

Traditionally, research on return predictability has mainly accumulated in-sample empirical evidence. However, more recent literature has highlighted the power and robustness of out-of-sample analyses (Fisher et al., 2020; Welch & Goyal, 2008). The debate on the advantages and disadvantages of in-sample versus out-of-sample analyses has focused on different aspects, such as data snooping, data mining, spurious regressions, and instability in return predictability (Wu et al., 2013; Dichtl et al., 2021). This paper adopts an out-of-sample analysis, as we believe that it provides more reliable insights.

There is an extensive body of empirical literature comparing the predictive power of different models. Typically, asset returns are forecasted within a Vector Autoregressive (VAR) framework (Guidolin & Hyde, 2012). Despite their popularity, VAR models entail the danger of over-parameterization, leading to unreliable predictions. Nowadays, the toolbox of applied econometrics includes numerous modeling and forecasting tools that prevent the proliferation of parameters and reduce model uncertainty. Prominent among these tools are time-varying parameter models, forecast combinations, model averaging, and model selection techniques, whose use has been fueled by noticeable advances in computational power. Although the existing literature proposes numerous extensions to VAR models, there is still no consensus on which framework is best suited to forecasting multiple assets. According to Fisher et al. (2020), not a single feature alone but an ensemble of them is required to handle the uncertainty and instability of financial markets and to make good predictions. Hence, in line with the existing literature, this study analyses different model specifications and features, such as time-varying parameters, model/forecast combinations, and dynamic model selection/averaging, to jointly obtain dynamic out-of-sample forecasts of three risky asset classes and use these forecasts to guide the investment.

Typically, the literature has concentrated on stocks and bonds. However, other assets, such as Real Estate Investment Trusts (REITs), have increasingly gained the interest of academics and practitioners. Recently, REITs have been seen as an alternative investment vehicle because they provide diversification benefits, improve the risk-return trade-off, and offer a non-negligible dividend income (see, for instance, Ling et al., 2020; Zhu & Lizieri, 2022). In this study, we focus on three classes of assets: Stocks, Bonds, and REITs. More precisely, the main goal of this study is to obtain the best portfolio of Stocks, Bonds, and REITs in the US from January 1976 to December 2021. The US is the world’s largest economy, accounting for a quarter of the global Gross Domestic Product (GDP), and has the largest stock market capitalization in the world (Footnote 1). Besides that, we choose the US because long and up-to-date time series are available and to ensure a higher level of comparability with most of the literature.

The forecasting schemes used in this study are in line with several papers, namely Banbura et al. (2010), Rapach et al. (2010), and Koop and Korobilis (2013). The methodological framework includes combinations of forecasts from conventional VARs, namely means and forecasts weighted by the Mean Squared Forecast Error (WMSFE) of the best models, and Bayesian Dynamic Model Selection (DMS) and Dynamic Model Averaging (DMA) with different sets of predictors. This paper contributes to the existing literature in three respects.

First, it includes REITs in the asset space. US REITs are firms that own or finance income-producing real estate across 13 property sectors. Investment into REITs can be performed through the purchase of individual company stocks, mutual funds, or exchange-traded funds (ETF). According to the NAREIT website (https://www.reit.com), currently “REITs of all types collectively own more than $4.5 trillion in gross assets across the US, with public REITs owning around $3 trillion in assets. Approximately 150 million (roughly 45%) American households invested in REITs.” By September 2022, US-listed REITs had an equity market capitalization of more than $1.4 trillion, almost 4% of the total capitalization of the S&P 500.

Second, the study begins with a large set of predictors that surpasses most of those used in related studies. Besides the lagged returns of the assets, we consider 155 additional predictors (19 for Stocks, 122 for Bonds, and 14 for REITs). A Genetic Algorithm (GA) is used to select the predictor space beyond returns. Usually, the choice of the variables is made through ad hoc methods, which can potentially exclude pertinent variables. The GA search technique has been applied successfully in diverse optimization problems commonly associated with Machine Learning models to forecast returns and develop trading strategies (e.g., Bauer, 1994; Karathanasopoulos et al., 2016; Leigh et al., 2002; Ozcalici & Bumin, 2022). However, to the best of our knowledge, this is one of the first papers to apply GA to select the predictor space.

Third, the use of up-to-date data allows the study of recent years (the out-of-sample period begins in October 2006), when financial markets were characterized by abnormal turbulence due to the global financial crisis of 2008–2009 (triggered by the US subprime crisis), the subsequent sovereign crises, and the COVID-19 pandemic. The World Health Organization (WHO) identified the outbreak as a Public Health Emergency of International Concern (PHEIC) on January 30, 2020, and as a pandemic on March 11, 2020 (see, for instance, Ganie et al., 2022; Szczygielski et al., 2023).

Based on the literature, we believe that predictability is time-varying. Both the best predictor space and coefficients change over time, and therefore there is a need to update the forecasting models. Combining or selecting forecasts from different models may increase the forecasting accuracy and strengthen the forecasts against misspecification bias and measurement errors in the data (Timmermann, 2006). Thus, we expect from the beginning that the Bayesian frameworks will produce better results.

There are two main results of this study to highlight. First, although the forecasting accuracy is low, especially for Stocks, the lagged returns of the S&P 500 present some ability to forecast the other markets. Second, the best portfolio strategy is the DMA/DMS with only the lagged returns in the predictor space, regardless of the level of risk aversion of an investor with Constant Relative Risk Aversion (CRRA). Our results support the use of methodologies that incorporate dynamic modeling and parameter uncertainty and make use of combinations or selections of several models. Most importantly, they highlight that the inclusion of other assets increases the overall predictability and improves the performance of the portfolios.

In a nutshell, we intend to provide additional evidence that may help investors and practitioners in devising reliable and robust investment strategies. The paper focuses on different forecasting models, provides a method for selecting the predictor space, and highlights the need to consider different types of assets.

The remainder of this study is structured into five sections. Section 2 presents a brief literature review. Section 3 describes and provides a preliminary analysis of the data. Section 4 outlines the basic theoretical concepts and the specifications of the models, and presents the methodology used to measure forecast accuracy, construct the portfolios, and assess their performance. Section 5 shows the statistical and economic results obtained from different models. Finally, Section 6 highlights the main conclusions.

2 Literature Review

Throughout the years, several studies have analyzed the predictability of financial returns using different predictor spaces (e.g., Kothari & Shanken, 1997; Campbell & Shiller, 1988; Pontiff & Schall, 1998; Baker & Wurgler, 2000; Goetzmann et al., 2001; Lettau & Ludvigson, 2001; Guo, 2006; van Binsbergen & Koijen, 2010; Ferreira & Santa-Clara, 2011; Rapach et al., 2013; Neely et al., 2014; Maio & Santa-Clara, 2015; Golez & Koudijs, 2018; Jagannathan & Liu, 2019; le Bris et al., 2019; Bandi et al., 2019; Piatti & Trojani, 2019; Dai et al., 2021).

Goetzmann et al. (2001) analyze the US aggregate stock market and find little evidence of stock return predictability during most of the nineteenth century until 1925. Ferreira and Santa-Clara (2011) address the predictability of international stock returns using dividend-price ratios, earnings growth, and price-earnings ratio growth from 1927 to 2007, finding substantial predictability and hence concluding that it would have been possible to profitably “time the market”. Golez and Koudijs (2018) show that dividend-price ratios did not predict US stock returns from 1871 to 1945. However, there was some forecasting ability afterwards, until 2015. Piatti and Trojani (2019) reach a similar conclusion. Dai et al. (2021) find evidence of stock return predictability using technical indicators from 1989 to 2018.

Although mainstream research has focused solely on stocks and bonds, more recently, other types of assets, such as Real Estate Investment Trusts (REITs), have attracted the attention of researchers. According to Habbab et al. (2022), one of the most significant advantages of investing in REITs is to benefit from the real estate sector without having to pay a substantial amount or manage the underlying assets. Bhuyan et al. (2014) reinforce that REITs have been an alternative investment vehicle since the 1980s. Historically, REITs have been a desirable financial asset by providing diversification benefits, improving the risk-return trade-off, and supplying a non-negligible dividend income (see, for instance, Ling et al., 2020). Since REITs tend to adjust quickly to the cost of living, they also provide a hedge against inflation, making their real return relatively stable. Furthermore, REIT returns show high predictability since their income comes from the underlying commercial real estate with long-term lease periods (Bhuyan et al., 2014; Fugazza et al., 2015). Beracha et al. (2019) point out that REITs had a high market valuation of 4% annually in the last ten years before the study was made.

There is extensive research on which variables are more suitable for predicting returns of stocks and bonds. In the early 1960s, several studies examined the forecasting power of technical indicators, such as moving averages, filter rules, and momentum oscillators. This line of research was recently revived by some authors, such as Neely et al. (2014), Gao et al. (2018), Zhang et al. (2019), and Dai et al. (2021). Besides these indicators, the literature has provided a broad list of predictors, such as the dividend-price ratio (Campbell & Shiller, 1988; Cochrane, 2007), earnings-price ratio (Campbell & Shiller, 1988), book-to-market ratio (Kothari & Shanken, 1997), accruals (Hirshleifer et al., 2009), nominal interest rate and interest rate spread (Fama, 1990; Rapach et al., 2016), volatility and downside risk (Bollerslev et al., 2014; Guo, 2006; Kilic & Shaliastovich, 2019), lagged industry returns (Hong et al., 2007), oil prices and oil-related variables (Driesprong et al., 2008; Nonejad, 2018), investor sentiment (Huang et al., 2015), manager sentiment (Jiang et al., 2019), expected business conditions (Campbell & Diebold, 2009), labor income (Santos & Veronesi, 2006), aggregate output (Rangvid, 2006), output gap (Cooper & Priestly, 2009), inflation rate (Ludvigson & Ng, 2009), and main macroeconomic indicators (Wang et al., 2018). Welch and Goyal (2008) summarize a list of several variables that have been used in the literature with positive results. The present study considers not only this list but also other variables that have been used for predicting returns in a multi-asset framework.

Earlier studies mostly report in-sample evidence on return predictability. The predominance of in-sample studies could be justified by using all available data, which increases the power of econometric tests (Neely et al., 2014). As argued by the authors, in-sample estimations tend to produce efficient and precise estimates of the parameters. However, in-sample tests may be biased if the predictor and return innovations are correlated and the predictor is highly persistent (Stambaugh, 1999). That bias potentially leads to substantial size distortions in the usual t-tests on the significance of the variables.

The focus on in-sample predictability has been gradually shifting to out-of-sample predictability (Feunou et al., 2018; Dai et al., 2020). For instance, Welch and Goyal (2008) and Thornton and Valente (2012) show that although some variables successfully predicted returns in-sample, they were not significant out-of-sample. Predictions based on these variables failed to consistently outperform the simple historical average benchmark forecast in terms of Mean Squared Forecast Error (MSFE).

Whether returns are predictable out-of-sample or not is still an ongoing debate. According to Wu et al. (2013), the conflicting empirical results presented in the literature may be related to problems such as data mining, spurious regressions, and instability of return predictability. Hence, recent studies have provided adaptive methods that improve forecasting in a dynamic setup. Examples include time-varying parameters or time-varying volatility (Dangl & Halling, 2012), diffusion indexes (Ludvigson & Ng, 2009), combinations of many potential return predictors (Rapach et al., 2013; Fisher et al., 2020; Zhang et al., 2018; Bahrami et al., 2019; Gargano et al., 2019), and the inclusion of regime shifts (Hammerschmid & Lohre, 2018). Nevertheless, some recent studies have argued in favor of traditional predictive regressions, showing that these methods, updated with schemes to resolve parameter uncertainty and instability, outperform the historical average forecast in out-of-sample experiments (see, for instance, Rapach et al., 2013; Koop & Korobilis, 2013; Fisher et al., 2020).

Most commonly, asset returns are forecasted in a Vector Autoregressive (VAR) framework (Guidolin & Hyde, 2012). VAR models provide a coherent way to generate internally consistent multiperiod forecasts that account for concurrent and dynamic correlations across the variables (Elliott & Timmermann, 2008). VAR models are a valuable tool for a small number of assets and additional predictors. However, expanding the asset or predictor space increases the number of parameters, which may inflate the estimation error. Different methodologies have been developed to deal with this issue, such as Bayesian methods that make use of the high computational power that is now available to researchers. Koop and Korobilis (2013) and Dangl and Halling (2012) are two examples of applications of such methodologies with positive results.

According to Parslow et al. (2013), the main advantage of Bayesian methods is their potential to systematically incorporate previous knowledge of models and parameters. Additionally, Bayesian frameworks allow the influence of prior information to be moderated (see, for instance, Barberis, 2000; Fugazza et al., 2015). Bayesian models may also include numerous efficient shrinkage tools to prevent the proliferation of parameters and reduce parameter/model uncertainty. Two examples are Bayesian model averaging and selection applied to VAR and time-varying VAR (Dangl & Halling, 2012; Elliott et al., 2013; Koop & Korobilis, 2013).

The Bayesian-based dynamic models have four main advantages. First, they do not require Markov Chain Monte Carlo simulations. Instead, they rely on estimated discount factors that characterize the degree of variation of the VAR coefficients. Second, they allow Dynamic Model Selection (DMS), which mitigates over-parameterization. This method selects a model from a set of models of different dimensions based on the past predictive likelihoods of the dependent variables. Third, they allow for time-varying parameters. Usually, forecasting models assume that coefficients are constant over time, although there is plenty of evidence of instability in the relationship between asset returns and predictors. Fourth, Bayesian setups may average forecasts from different models over time (Dynamic Model Averaging, DMA). Several studies have highlighted the substantial benefits of combining forecasts across different models, such as the improvement in predictive performance (Tian et al., 2021), the strengthening of the forecast against misspecification bias and measurement errors in the data (Timmermann, 2006), the handling of uncertainty and instability of financial markets (Fisher et al., 2020), and the provision of diversification benefits (Atiya, 2020).

Most academic studies regarding asset allocation focus on myopic portfolio optimization problems (see, for instance, DeMiguel et al., 2009; Daskalaki & Skiadopoulos, 2011; Cenesizoglu & Timmermann, 2012). However, recent literature has shown substantial utility benefits when investors incorporate return predictability and model uncertainty into their investment decisions (Diris et al., 2015; Rapach & Zhou, 2013). Hence, several studies, such as Johannes et al. (2014), Gargano et al. (2019), Gao and Nardari (2018), and Fisher et al. (2020), have implemented Bayesian dynamic approaches to portfolio strategies with positive results.

3 Data Description and Preliminary Analysis

3.1 Asset Classes

This study considers three US-based asset classes: Stocks, Bonds, and REITs. The period under scrutiny spans from January 1976 to December 2021 (553 monthly observations). The three initial months were used to compute some predictors. Hence the sample was reduced to 550 observations, from March 1976 to December 2021. Stocks are proxied by the S&P 500 Total Return Index, and Bonds are proxied by the Barclays Capital US Aggregate Bond Index, both collected from the Refinitiv Eikon database. REITs are proxied by the FTSE Nareit US Real Estate All Equity REITs Index, collected from the NAREIT website (https://www.reit.com/data-research). Hence, we only consider publicly available stocks of REITs. The risk-free rate is proxied by the monthly yield-to-maturity of 3-month US Treasury Bills.

Figure 1 plots the cumulative returns of the total return indexes of the Stocks, Bonds, and REITs. All series are notably more volatile after 2000, and Stocks and REITs are more sensitive to the business cycle than Bonds. However, the cumulative returns of Stocks and REITs dominate those of Bonds throughout the overall period, especially after March 2009. In the previous 19 months, from September 2007 until March 2009, the Stocks and REITs indexes returned −58.53% and −93.26%, respectively, while Bonds gained 20.08%, as a result of the subprime crisis. Nevertheless, a Buy-and-Hold investment during the overall period would earn an annual rate of return of approximately 7.32% and 6.91% for REITs and Stocks, respectively, while this figure for Bonds is only 3.66%.

Fig. 1

Cumulative log-returns of the total return indexes of the Stocks, Bonds, and REITs, computed as \({I}_{t}={\text{exp}}\left({\sum }_{\tau =1}^{t}{r}_{i,\tau }\right)\) from March 1976 to December 2021 (550 monthly observations). Stocks are proxied by the S&P 500 Total Return Index, Bonds are proxied by the Barclays Capital US Aggregate Bond Index, both collected from the Refinitiv Eikon database, and REITs are proxied by the FTSE Nareit US Real Estate All Equity REITs Index, collected from the NAREIT website (https://www.reit.com/data-research). The vertical line separates the in-sample (IS) period (2/3 of the sample, corresponding to 367 months) and the out-of-sample (OOS) period (1/3 of the sample, corresponding to 183 months)

To analyze the predictability of the returns of these three classes of assets, we divided the sample into an in-sample (IS) period (2/3 of the sample, corresponding to 367 months) and an out-of-sample (OOS) period (1/3 of the sample, corresponding to 183 months). The analysis is conducted out-of-sample recursively. More specifically, for each month out-of-sample, beginning at \({t}_{0}+1=368\) until \(T=550\), the returns are forecasted using all the information until the previous month, i.e., the forecast at \({t}_{0}+1\) is obtained with the previous 367 months, the forecast at \({t}_{0}+2\) is obtained with the previous 368 months, and so forth until month \(T\), to which the forecast is obtained using the previous 549 months. The recursive (expanding) estimation window to generate out-of-sample forecasts is usually used in the literature on return predictability (see, e.g., Rapach et al., 2010, 2016; Neely et al., 2014; Zhang et al., 2019).
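The recursive scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation: the function names, the simulated returns, and the historical-mean forecaster are all hypothetical placeholders standing in for the actual data and models.

```python
import numpy as np

def expanding_window_forecasts(returns, t0, fit_and_forecast):
    """One-step-ahead forecasts for months t0+1, ..., T with an expanding window.

    returns          : (T, m) array, row 0 holds month 1.
    t0               : length of the in-sample period (367 in the text).
    fit_and_forecast : hypothetical callable mapping the history available
                       so far to an (m,) forecast for the next month.
    """
    T, m = returns.shape
    forecasts = np.empty((T - t0, m))
    for step, t in enumerate(range(t0, T)):
        history = returns[:t]          # months 1..t, used to forecast month t+1
        forecasts[step] = fit_and_forecast(history)
    return forecasts

def historical_mean(history):
    # Placeholder forecaster (assumption): the mean of all past returns.
    return history.mean(axis=0)

# Simulated data with the dimensions given in the text (550 months, 3 assets).
rng = np.random.default_rng(0)
simulated = rng.normal(0.005, 0.04, size=(550, 3))
oos_forecasts = expanding_window_forecasts(simulated, t0=367,
                                           fit_and_forecast=historical_mean)
```

With these dimensions the first forecast uses 367 months and the last uses 549 months, giving 183 out-of-sample forecasts per asset.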

The forecasts are then used to allocate the investment into portfolios formed by Stocks, Bonds, REITs, and the riskless asset (3-month US Treasury Bills). Accordingly, the portfolios are rebalanced monthly. Almadi et al. (2014) analyzed the performance of dynamic portfolios rebalanced at the same frequency as the forecast horizon showing that monthly rebalancing provides the best performance in the presence of transaction costs of 0.5%.

Table 1 reports the summary statistics of Stocks, Bonds, and REITs returns for the overall sample, IS, and OOS periods. In the overall sample, the mean and median monthly stock returns are the highest (0.59% and 0.96%, respectively). Most of the means are significantly different from zero. The variability of these returns, measured by the standard deviation and range, is high but is surpassed by REITs. The means of Stocks and Bonds are higher OOS, while the median is higher for Stocks and REITs. The variability measured by the standard deviation and range is also, loosely speaking, higher OOS, especially for REITs, for which the standard deviation almost doubled. Bonds are the asset class that has the lowest mean, median, and variability. Bonds have positive skewness, whilst Stocks and especially REITs are negatively skewed. The three asset classes present mild excess kurtosis, and REITs stand out as the one with the most leptokurtic distribution, especially in the OOS period. All series are non-normal, as indicated by the Jarque–Bera test.

Table 1 Descriptive statistics of log-returns

3.2 Additional Predictors

The initial database compiles a comprehensive set of predictive variables documented in the literature on asset return predictability. The Appendix lists these variables (19 for stocks, 122 for Bonds, and 14 for REITs), presenting their abbreviations, summary descriptions, online sources, and transformations made to obtain the variables used in the literature and to correct non-stationarity.

It is well known that the profusion of predictive variables in asset returns regressions leads to overfitting and a poor performance out-of-sample. Thus, to avoid this problem, a Genetic Algorithm (GA) is used for selecting the most relevant predictors. The Darwinian process of natural selection that drives the evolution of species inspired this method, which may solve a wide array of optimization problems. It has proved valuable in variable selection for multivariate regressions, as it achieves good performance and its execution time is lower than that of alternative algorithms (Leardi & Gonzáles, 1998; Leardi et al., 1992). The GA begins by randomly creating an initial population of candidate solutions (chromosomes). Then, the performance of each chromosome is evaluated. In the next stage, the chromosomes are selected based on their performance and are combined using crossover operations. Also, some chromosomes are mutated according to the mutation probability. These operations lead to the creation of a new population. The procedure is repeated until the termination condition is reached (for further details, see, for example, Leardi et al., 1992).

In this research, the GA is implemented using the Matlab Regression toolbox (see Consonni et al., 2021, for further details). The initial population has 50 individuals, and the GA is run for 200 generations. The crossover and mutation probabilities are set to 0.5 and 0.01, respectively. The performance of each chromosome is assessed using a fivefold cross-validated Root Mean Squared Error (RMSE). A maximum of 10 predictors is imposed on the algorithm.
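To make the procedure concrete, the following Python sketch runs a GA-based predictor selection with the settings quoted above as defaults. It is not the Matlab Regression toolbox routine used in the study. The binary tournament selection, uniform crossover, elitist replacement, contiguous cross-validation folds, and the repair step that enforces the 10-predictor cap are illustrative assumptions, and all function names are hypothetical.

```python
import numpy as np

def cv_rmse(X, y, mask, n_folds=5):
    """Cross-validated RMSE of an OLS regression on the columns selected by mask."""
    if not mask.any():
        return np.inf
    n = len(y)
    sq_err = 0.0
    for test_idx in np.array_split(np.arange(n), n_folds):  # contiguous folds (assumption)
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        Z_tr = np.column_stack([np.ones(train.sum()), X[train][:, mask]])
        Z_te = np.column_stack([np.ones(len(test_idx)), X[test_idx][:, mask]])
        coef, *_ = np.linalg.lstsq(Z_tr, y[train], rcond=None)
        sq_err += np.sum((y[test_idx] - Z_te @ coef) ** 2)
    return np.sqrt(sq_err / n)

def repair(mask, max_vars, rng):
    """Randomly switch off predictors until at most max_vars remain (assumption)."""
    on = np.flatnonzero(mask)
    if len(on) > max_vars:
        mask[rng.choice(on, size=len(on) - max_vars, replace=False)] = False
    return mask

def ga_select(X, y, pop_size=50, n_gen=200, p_cross=0.5, p_mut=0.01,
              max_vars=10, seed=0):
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    # Random initial population of binary chromosomes (True = predictor included).
    pop = [repair(rng.random(p) < max_vars / p, max_vars, rng) for _ in range(pop_size)]
    fit = np.array([cv_rmse(X, y, c) for c in pop])
    for _ in range(n_gen):
        children = []
        while len(children) < pop_size:
            # Binary tournament: the better of two random chromosomes becomes a parent.
            a, b = (min(rng.choice(pop_size, 2, replace=False), key=lambda i: fit[i])
                    for _ in range(2))
            child = pop[a].copy()
            if rng.random() < p_cross:            # uniform crossover with parent b
                swap = rng.random(p) < 0.5
                child[swap] = pop[b][swap]
            child ^= rng.random(p) < p_mut        # bit-flip mutation
            children.append(repair(child, max_vars, rng))
        child_fit = np.array([cv_rmse(X, y, c) for c in children])
        # Elitist replacement: keep the best pop_size of parents and children.
        merged, merged_fit = pop + children, np.concatenate([fit, child_fit])
        keep = np.argsort(merged_fit)[:pop_size]
        pop, fit = [merged[i] for i in keep], merged_fit[keep]
    best = int(np.argmin(fit))
    return np.flatnonzero(pop[best]), fit[best]

# Small simulated example (smaller population and fewer generations to run quickly).
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 30))
y = 2.0 * X[:, 3] - 1.5 * X[:, 17] + rng.normal(scale=0.5, size=300)
selected, rmse = ga_select(X, y, pop_size=20, n_gen=20)
```

The repair step is only one way of imposing a cap on the number of predictors; penalizing oversized chromosomes in the fitness function would be an alternative design.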

Table 2 displays the predictors selected by the GA and highlights, in bold, the best predictor for each asset class.

Table 2 Selected predictors for each asset class

Table 3 presents the correlations of asset returns and the three most important additional predictors. In the full sample, only the lagged predictor of bonds is significantly correlated with that asset class. Also, there is no significant predictor for Stocks, but Bonds and REITs are significantly correlated with three predictors, implying that these two asset classes probably present some degree of predictability. The additional predictors are not significantly cross-correlated, which is a desirable feature, as Zhang et al. (2019) highlight.

Table 3 Autocorrelations and cross-correlations

The absence of significant correlations for Stocks remains OOS. In this period, the number and significance of correlations for Bonds, and especially for REITs, decrease. In the OOS period, the correlations between Bonds and lagged Stocks and lagged REITs returns are significant at the 1% and 5% levels, respectively. Lagged stock returns are also correlated with REITs (at the 5% level). It seems that, OOS, the conservative minus aggressive factor (CMA) may help predict REITs.

In sum, this analysis points out that probably Stocks are not forecastable, but the information on the stock market may help predict REITs and especially Bonds.

4 Methodology

This section presents the basic theoretical concepts and model specifications. It begins by presenting the Vector Autoregressive model (VAR) and the procedures used to obtain forecasts based on the combinations of VAR estimated with different predictor spaces. Next, it presents the Time-Varying Parameter Vector Autoregressive model (TVP-VAR), the procedures used to estimate these models with forgetting factors, and the methods to combine or select these models. Finally, it shows the metrics of forecasting accuracy and portfolio economic performance from the perspective of a CRRA investor.

4.1 Vector Autoregressive Models (VAR) and Forecast Combinations

Since its introduction by Sims (1980), VAR has been widely used for forecasting purposes. These models are a straightforward multivariate generalization of univariate autoregressions and can generate dynamic forecasts that ensure consistency across different endogenous variables and forecasting horizons. Many researchers have used large VAR with tens of dependent variables (see, among many others, Banbura et al., 2010; Carriero et al., 2009, and Koop & Korobilis, 2013).

As in Campbell et al. (2003), this study implements first-order vector autoregressive models, VAR(1), to capture the linear dynamics of asset returns and other predictors (Footnote 2), such that:

$${\mathbf{y}}_{t}={\mathbf{X}}_{t}\mathbf{B}+{{\varvec{\upepsilon}}}_{t},$$
(1)

where \({\mathbf{y}}_{t}\) is a column vector containing the observations on \(m={m}_{1}+ {m}_{2}\) time series. This vector includes a \(({m}_{1}\times 1)\) vector of excess log-returns at time \(t\), \({\mathbf{r}}_{t}\), and a \(({m}_{2}\times 1)\) vector of other endogenous variables, \({\mathbf{u}}_{t}\). In our empirical application, \({\mathbf{r}}_{t}=\left[{r}_{1,t} {r}_{2,t} {r}_{3,t}\right]{\prime}\), where \({r}_{1,t}\), \({r}_{2,t}\), and \({r}_{3,t}\) are the excess log-returns over the risk-free rate of Stocks, Bonds, and REITs, respectively. \({\mathbf{X}}_{t}\) is an \(\left(m\times k\right)\) matrix such that:

$${\mathbf{X}}_{t}=\left[\begin{array}{cccc}{\mathbf{x}^{\prime}}_{t}& 0& \cdots & 0\\ 0& {\mathbf{x}^{\prime}}_{t}& \dots & \vdots \\ \vdots & \vdots & \vdots & 0\\ 0& \cdots & 0& {\mathbf{x}^{\prime}}_{t}\end{array}\right],$$
(2)

where \({\mathbf{x}}_{t}\) is a column vector containing an intercept and one lag of each of the \(m\) variables, and therefore \(k=m(1+m)\). \(\mathbf{B}\) is the coefficient matrix, and \({{\varvec{\upepsilon}}}_{t}\) is a vector of shocks i.i.d. \(\mathcal{N}\left(0,{\varvec{\Sigma}}\right).\) Notice that shocks can be cross-sectionally correlated.
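The block structure of \({\mathbf{X}}_{t}\) in Eq. (2) is a Kronecker product of an identity matrix with \({\mathbf{x}^{\prime}}_{t}\), which the sketch below illustrates together with a one-step-ahead VAR(1) forecast. It is a minimal illustration under stated assumptions: the coefficients are estimated by equation-by-equation OLS and stacked into a \((k\times 1)\) vector so that the product \({\mathbf{X}}_{t}\mathbf{B}\) is conformable, the data are simulated, and the function names are hypothetical.

```python
import numpy as np

def build_X(y_prev):
    """X_t = I_m kron x_t', with x_t = [1, y_{t-1}']'. Shape: (m, m * (1 + m))."""
    m = y_prev.shape[0]
    x_t = np.concatenate(([1.0], y_prev))
    return np.kron(np.eye(m), x_t[None, :])

def fit_var1_ols(Y):
    """Equation-by-equation OLS for a VAR(1) with intercept.

    Y is a (T, m) array. Returns the coefficients stacked equation by
    equation into a vector of length k = m * (1 + m).
    """
    T, m = Y.shape
    Z = np.column_stack([np.ones(T - 1), Y[:-1]])      # regressors: intercept and one lag
    coef, *_ = np.linalg.lstsq(Z, Y[1:], rcond=None)   # (1 + m, m), column i = equation i
    return coef.T.reshape(-1)

def forecast_next(Y, beta):
    """One-step-ahead forecast X_{T+1} B using the last observed row of Y."""
    return build_X(Y[-1]) @ beta

# Simulated example with m = 3 series.
rng = np.random.default_rng(1)
Y = rng.normal(size=(200, 3))
beta = fit_var1_ols(Y)
y_hat = forecast_next(Y, beta)
```

For \(m=3\), `build_X` returns a \(3\times 12\) matrix, which matches \(k=m(1+m)=12\).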

In the present study, the VAR(1) includes 3 asset classes and 3 predictors (one predictor for each asset class). Thus, there is a total of \(8\times 6\times 3=144\) different model specifications corresponding to all possible combinations of predictors in \({\mathbf{u}}_{t}\).

The forecasts of the excess log-returns vector are obtained using several methods for combining the individual forecasts from models \({\mathcal{M}}_{j}\), with \(j=1, 2, \dots , n\). More specifically, the individual model forecasts, \({\mathbb{E}}\left({\mathbf{r}}_{t}|{\mathcal{M}}_{j},{\mathcal{F}}_{t-1}\right)\), and covariance matrix forecasts, \(\widehat{Cov}\left({\mathbf{r}}_{t}|{\mathcal{M}}_{j},{\mathcal{F}}_{t-1}\right)\), which use information up to time \(t-1\), \({\mathcal{F}}_{t-1}\), are then used to compute the Mean, i.e., the average of the forecasts of individual models, and the Weighted Mean Squared Forecasting Errors (WMSFE) forecasts of \({\mathbf{r}}_{t}\). For each model \(j\), the WMSFE is computed as

$$W{MSFE}_{j}=\frac{1}{T-{t}_{0} }\sum_{t={t}_{0}+1}^{T}{\mathbf{e}}_{j,t}^{\prime}[\widehat{Cov}\left({\mathbf{r}}_{t}\right){]}^{-1}{\mathbf{e}}_{j,t},$$
(3)

where \({\mathbf{e}}_{j,t}\) is the column vector of forecast errors of model \(j\) at time \(t\) in the out-of-sample period, with \(t={t}_{0}+1, {t}_{0}+2, \dots , T\). The forecast error of asset \(i\) in model \(j\) is the difference between the realized excess log-return, \({r}_{i,t}\), and the one-step-ahead forecast, \({\widehat{r}}_{i,j,t}\), i.e., \({e}_{i,j,t}={r}_{i,t}-{\widehat{r}}_{i,j,t}.\) \(\widehat{Cov}\left({\mathbf{r}}_{t}\right)\) is the sample estimate of the unconditional covariance matrix of the asset excess log-returns (see Rapach et al., 2010).

The combinations based on the WMSFE heighten certain individual forecasts due to the covariance matrix. This procedure attributes higher penalties to forecast errors with lower variability, that is, those on which the investor is highly confident, and penalizes more lightly diffuse forecasts. Also, penalties are lower when forecast errors have the same (opposite) sign for positively (negatively) correlated assets than when the reverse pattern holds. Furthermore, combinations based on the WMSFE allow the weights on individual forecasting models to reflect their past predictive accuracy. More specifically, the weight of model \(j\) at \(t\) is given by its WMSFE in the period before, \(t-1\), that is:

$${\omega }_{j,t}=\frac{{\varphi }_{j,t}^{-1}}{\sum_{l=1}^{n}{\varphi }_{l,t}^{-1}},$$
(4)

where \({\varphi }_{j,t}\) is the WMSFE of model \(j\) at time \(t\), with the models sorted by their WMSFE. The weight \({\omega }_{j,t}\) may be computed using all models or just a subset of them. In this study, we consider the 10%, 20%, 30%, 40%, and 50% best models.
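The computations in Eqs. (3) and (4) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function names `wmsfe` and `wmsfe_weights` and the argument `top_fraction` are hypothetical, and the way the "best" subset size is rounded is an assumption.

```python
import numpy as np

def wmsfe(errors, cov):
    """Eq. (3): average of e_t' Cov^{-1} e_t over the out-of-sample periods.

    errors: array of shape (T_oos, m) with the forecast errors of one model.
    cov:    (m, m) unconditional covariance matrix of the excess log-returns.
    """
    cov_inv = np.linalg.inv(cov)
    quad_forms = np.einsum("ti,ij,tj->t", errors, cov_inv, errors)
    return float(np.mean(quad_forms))

def wmsfe_weights(scores, top_fraction=1.0):
    """Eq. (4): inverse-WMSFE weights, restricted to the best models.

    Models outside the selected subset receive a zero weight.
    The rounding of the subset size is an assumption of this sketch.
    """
    scores = np.asarray(scores, dtype=float)
    n_keep = max(1, int(round(top_fraction * scores.size)))
    keep = np.argsort(scores)[:n_keep]  # lowest WMSFE first
    weights = np.zeros_like(scores)
    inverse = 1.0 / scores[keep]
    weights[keep] = inverse / inverse.sum()
    return weights

# Toy example with three models and made-up WMSFE values.
print(wmsfe_weights([1.0, 2.0, 4.0]))
```

Passing, for instance, `top_fraction=0.10` would mimic the combination based on the 10% best models.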

Regardless of the method used, there are several advantages to using forecast combinations. They can be seen as a diversification strategy in asset allocation, may capture different aspects of business conditions, and may reflect variations in the predictive power of the models through time (Bates & Granger, 1969; Rapach et al., 2010). For instance, if the correlation between individual forecasts is weak, their combination may produce more stable forecasts, reducing forecast risk and improving forecast performance under model instability and uncertainty (Rapach & Zhou, 2013).

4.2 Time-Varying Parameter Vector Autoregressive (TVP-VAR)

A Time-Varying Parameter-Vector Autoregressive model of order 1, TVP-VAR(1), may be represented as follows:

$${\mathbf{y}}_{t} = {\mathbf{X}}_{t} {\mathbf{B}}_{t} + {{\varvec{\upvarepsilon}}}_{t} ,\;{\text{and}}\;{\mathbf{B}}_{t} = {\mathbf{B}}_{t - 1} + {{\varvec{\upnu}}}_{t} ,$$
(5)

where \({{\varvec{\upvarepsilon}}}_{t}\) is i.i.d. \(\mathcal{N}(0,{{\varvec{\Sigma}}}_{t})\) and \({{\varvec{\upnu}}}_{t}\) is i.i.d. \(\mathcal{N}(0,{\mathbf{Q}}_{t})\). \({{\varvec{\upvarepsilon}}}_{\tau }\) and \({{\varvec{\upnu}}}_{t}\) are independent for all \(\tau\) and \(t\).

Traditionally, TVP-VAR models were estimated using forgetting factors (also known as discount factors). This method is still used in recent applications due to its simplicity and fast tracking (see, for instance, Dangl & Halling, 2012; Koop & Korobilis, 2013).

Let \( {\mathbf{Y}}_{{t - 1}} = \left( {{\mathbf{y}}_{1} , \ldots ,{\mathbf{y}}_{{t - 1}} } \right)^{\prime } \) be all the observations until \(t-1\). Using the Kalman filter, the Bayesian inference on \({\mathbf{B}}_{t-1}\) is

$${\mathbf{B}}_{t-1}|{\mathbf{Y}}_{t-1}\sim \mathcal{N}\left({\mathbf{B}}_{t-1|t-1},{\mathbf{V}}_{t-1|t-1}\right).$$
(6)

The distribution of the state vector in the next period, using the same information set, is:

$${\mathbf{B}}_{t} |{\mathbf{Y}}_{t - 1} \sim{\mathcal{N}}\left( {{\mathbf{B}}_{t|t - 1} ,{\mathbf{V}}_{t|t - 1} } \right),\;{\text{with}}\;{\mathbf{V}}_{t|t - 1} = {\mathbf{V}}_{t - 1|t - 1} + {\mathbf{Q}}_{t}$$
(7)

By replacing \({\mathbf{Q}}_{t}=\left({\lambda }^{-1}-1\right){\mathbf{V}}_{t-1|t-1}\) into the previous equation one obtains

$${\mathbf{V}}_{t|t-1}=\frac{1}{\lambda }{\mathbf{V}}_{t-1|t-1},$$
(8)

where \(\lambda\) denotes the forgetting factor, with \(0<\lambda \le 1\). This implies that observations \(h\)-periods in the past have a weight \({\lambda }^{h}\) in the filtered estimate of \({\mathbf{B}}_{t}\) and that the variance of the coefficient vector increases by a factor of \(1/\lambda\) per period. Dangl and Halling (2012) apply two granularity choices for \(\lambda\): \(\lambda \in \{0.96, 0.98, 1.00\}\) and \(\lambda \in \{0.96, 0.97, 0.98, 0.99, 1.00\}\) and conclude that \(\lambda\) should have a lower bound of 0.98 to mitigate the variability of the coefficients. Alternatively, Koop and Korobilis (2013) implement a more robust technique. Instead of simply setting \(\lambda\) equal to a fixed value, they estimate:

$${\lambda }_{t}={\lambda }_{min}+\left(1-{\lambda }_{min}\right){L}^{{\theta }_{t}},$$
(9)

where \({\theta }_{t}=-NINT({\widetilde{{\varvec{\upvarepsilon}}}^{\prime}}_{t-1}{\widetilde{{\varvec{\upvarepsilon}}}}_{t-1})\), NINT rounds to the nearest integer, and \({\widetilde{{\varvec{\upvarepsilon}}}}_{t}={\mathbf{y}}_{t}-{\mathbf{X}}_{t}{\mathbf{B}}_{t|t-1}\) is the one-step-ahead prediction error produced by the Kalman filter. Following Koop and Korobilis (2013), we set \({\lambda }_{min}=0.96\) and \(L=1.1\) to obtain values between \(0.96\) and \(1\) for the forgetting factor.
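A minimal sketch of Eq. (9) may help to see how the forgetting factor reacts to the size of the last prediction error. The function name `forgetting_factor` is hypothetical, and Python's built-in `round` (which resolves exact .5 ties to the nearest even integer) is used here as a stand-in for NINT.

```python
def forgetting_factor(pred_error, lam_min=0.96, L=1.1):
    """Eq. (9): lambda_t = lam_min + (1 - lam_min) * L ** theta_t.

    pred_error: the previous one-step-ahead prediction error vector.
    theta_t is minus the rounded sum of squared prediction errors.
    """
    theta = -round(sum(e * e for e in pred_error))
    return lam_min + (1.0 - lam_min) * L ** theta

# Small errors leave lambda_t at 1 (no forgetting);
# large errors push lambda_t towards lam_min (faster adaptation).
print(forgetting_factor([0.01, -0.02, 0.03]))
print(forgetting_factor([3.0, 4.0, 0.0]))
```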

To eliminate the need to simulate the multivariate stochastic volatility in the measurement equation we use the Exponentially Weighted Moving Average (EWMA) estimator for the error covariance matrix:

$${\widehat{{\varvec{\Sigma}}}}_{t}=\kappa {\widehat{{\varvec{\Sigma}}}}_{t-1}+\left(1-\kappa \right){\widetilde{{\varvec{\upvarepsilon}}}}_{t}{\widetilde{{\varvec{\upvarepsilon}}}^{\prime}}_{t},$$
(10)

where \(\kappa\) is the decay factor. Our choice of \(\kappa\) draws on the RiskMetrics technical note (JP Morgan/Reuters, 1996), which computes the value of \(\kappa\) that minimizes the root mean squared prediction error for the variance using more than 480 series. Based on their estimates, they suggest using a decay factor in the interval (0.94, 0.98) for monthly data. In this research, we choose the value in the middle of the proposed range (0.96). The computation of \({\widehat{{\varvec{\Sigma}}}}_{t}\) also requires the choice of an initial condition for \({{\varvec{\Sigma}}}_{0}\), which is set equal to the sample covariance matrix of \({\mathbf{y}}_{{t}_{0}}\).
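The EWMA recursion in Eq. (10) amounts to a one-line update, sketched below with NumPy. The function name `ewma_cov_update` and the toy inputs are illustrative assumptions.

```python
import numpy as np

def ewma_cov_update(sigma_prev, pred_error, kappa=0.96):
    """Eq. (10): Sigma_t = kappa * Sigma_{t-1} + (1 - kappa) * e_t e_t'."""
    e = np.asarray(pred_error, dtype=float).reshape(-1, 1)
    return kappa * np.asarray(sigma_prev, dtype=float) + (1.0 - kappa) * (e @ e.T)

# Toy example: start from an identity matrix and apply one update.
sigma_1 = ewma_cov_update(np.eye(2), [1.0, 2.0])
print(sigma_1)
```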

Models such as TVP-VAR are designed to accommodate gradual changes in the coefficients and are unable to adjust to abrupt changes, which reduces their performance. A way to deal with the possibility of significant changes is to allow the switching between different models. Thus, we enable the TVP-VAR(1) to change dimensions over time by using a Dynamic Model Selection (DMS) procedure. This procedure requires the estimation of the forgetting factors, the definition of the priors, and the definition of various dimensions of the combinations of the TVP-VAR models. We use the DMS to select the optimal values for the VAR shrinkage parameter in a time-varying manner. In this way, the DMS is a recursive algorithm where the important recursions are similar to the forecasting and updating equations of the Kalman filter. Following Koop and Korobilis (2013), the model prediction and updating equations using a forgetting factor \(\alpha\) are derived from:

$${\pi }_{j,t|t-1}=\frac{{\pi }_{j,t-1|t-1}^{\alpha }}{\sum_{l=1}^{n}{\pi }_{l,t-1|t-1}^{\alpha }},$$
(11)

which is the probability that model \(j\) will be chosen, given the information up to \(t-1\), and,

$${\pi }_{j,t|t}=\frac{{\pi }_{j,t|t-1}{p}_{j}({{\varvec{y}}}_{t}|{{\varvec{y}}}_{t-1})}{\sum_{l=1}^{n}{\pi }_{l,t|t-1 }{p}_{l}({{\varvec{y}}}_{t}|{{\varvec{y}}}_{t-1})},$$
(12)

where \({p}_{j}({{\varvec{y}}}_{t}|{{\varvec{y}}}_{t-1})\) is the predictive likelihood (the predictive density of model \(j\) calculated at \({{\varvec{y}}}_{t}\)). The probability used to select models can be written as:

$${\pi }_{j,t|t-1}\propto \prod_{k=1}^{t-1}{\left[{p}_{j}\left({{\varvec{y}}}_{t-k}|{{\varvec{y}}}_{t-k-1}\right)\right]}^{{\alpha }^{k}}.$$
(13)

The symbol \(\propto\) means that the probability is proportional to the expression on the right. Model \(j\) receives more weight at time \(t\) according to the accuracy of its forecasts in the recent past. The weight of past predictive densities is controlled by the forgetting factor, \(\alpha\), which has similar features to the forgetting factor defined before, \(\lambda\).

In our study, we set \(\alpha =0.99\) as in Koop and Korobilis (2013), which implies that the forecast performance 60 periods ago receives 55% as much weight as the forecast performance in the last period. We do not define any other values for \(\alpha\). As Koop and Korobilis (2013) showed, choosing \(\alpha\) between 0.95 and 1 has a minor impact on the results. Furthermore, Dangl and Halling (2012) and Hill and Rodrigues (2022) show that models with \(\alpha\) close to 1 tend to outperform models that forget past information faster. In sum, we consider \(\alpha =0.99\), and \(\kappa = 0.96\).
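The recursions in Eqs. (11) and (12) can be sketched in a few lines. The function names and the predictive likelihood values below are hypothetical and serve only to illustrate the mechanics; the last line checks the relative weight implied by \(\alpha =0.99\) after 60 periods.

```python
def predict_probs(posterior, alpha=0.99):
    """Eq. (11): model probabilities at t given information up to t-1."""
    powered = [p ** alpha for p in posterior]
    total = sum(powered)
    return [p / total for p in powered]

def update_probs(prior, pred_lik):
    """Eq. (12): update with each model's predictive likelihood at y_t."""
    joint = [p * lik for p, lik in zip(prior, pred_lik)]
    total = sum(joint)
    return [j / total for j in joint]

n = 3
probs = [1.0 / n] * n                 # equal initial probabilities
prior = predict_probs(probs)          # prediction step
post = update_probs(prior, [2.0, 1.0, 1.0])  # made-up likelihood values

dms_choice = max(range(n), key=lambda j: post[j])  # DMS: most probable model
relative_weight_60 = 0.99 ** 60       # about 0.55, as stated in the text
print(post, dms_choice, round(relative_weight_60, 3))
```

DMA would instead use the probabilities as weights to average the forecasts of the individual models.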

Our approach does not require the estimation of \({\mathbf{Q}}_{t}\) and, as mentioned above, it uses an EWMA estimator of \({{\varvec{\Sigma}}}_{t}\) that requires prior information on \({\mathbf{B}}_{0}\). In the literature, it is usual to use training sample priors to produce hyperparameters that control the degree of shrinkage when working with large VAR or TVP-VAR models (Banbura et al., 2010). However, Koop and Korobilis (2013) used a different approach that allows for the estimation of the shrinkage hyperparameter in a time-varying manner. To do so, they applied an automatic updating procedure, which is less demanding computationally since it does not require the re-estimation of the shrinkage priors or the model at each point in time.

This study uses a normal prior for \({\mathbf{B}}_{0}\), similar to a Minnesota prior. The prior mean is set as \({\mathbb{E}}({\mathbf{B}}_{0 })=0\), and the prior covariance matrix of \({\mathbf{B}}_{0}\) is diagonal, such that the \(i\)-th diagonal element is equal to \(\underline{a}\) for the intercepts and \(\delta\) for the lag coefficients. Hence, \(\delta\) controls the degree of shrinkage on the VAR coefficients, and \(\underline{a}={10}^{3}\), so that the prior on the intercepts is uninformative.

A large degree of shrinkage is needed to obtain good forecasting performances in large VAR and TVP-VAR. To do so, we estimate \(\delta\) at each point in time using a method similar to the one used for the forgetting and decay factors. As in Koop and Korobilis (2013), we use a grid for \(\delta\), such that \(\delta \in \{{10}^{-10},{10}^{-5}, 0.001, 0.005, 0.01, 0.05, 0.1\}\).

In sum, the DMS implies choosing the model with the highest value of \({\pi }_{j,t|t-1}\) to obtain the forecast at time \(t\). Since \({\pi }_{j,t|t-1}\) varies over time, the forecasting model may change, allowing the model switching feature. Besides the DMS, we also consider Dynamic Model Averaging (DMA), which uses \({\pi }_{j,t|t-1}\) as the weighting scheme.

We also consider different dimensions (i.e., different state space sets) when implementing the DMS and DMA procedures. These models are denoted as follows: (1) “Small”, which only considers the first lags of the excess returns, (2) “Medium”, which includes three additional predictors (the best one for each asset chosen by the GA), (3) “Large”, which includes the 17 predictors chosen by the GA, and (4) “Full”, which is selected or averaged, for the DMS and DMA, respectively, across the Small, Medium, and Large models.

A crucial point of TVP-VAR selection and averaging is the calculation of \({\pi }_{j,t|t-1}\). When forecasting at time \(t\), this probability is evaluated for each model \(j\), and the value of \(\delta\) and the dimension of the TVP-VAR(1) that maximize it are used. This is done recursively using Eqs. (11) and (12) and setting the initial probability of selecting each model equal to \({\pi }_{j,0|0}=1/n\) for all models. However, when dealing with TVP-VAR models with different dimensions, we have different predictive densities, \({p}_{j}({{\varvec{y}}}_{t}|{{\varvec{y}}}_{t-1})\), since \({\mathbf{y}}_{t}\) has different dimensions, rendering them incomparable. A possible solution is to use the same predictive densities for all dimensions. So, we use the predictive density of the Small model, since the variables in this model are common to all models. In other words, the DMS is determined by the joint predictive likelihood of the three asset excess returns.

4.3 Forecasting Accuracy, Asset Allocation, and Portfolio Performance

The accuracy of the point forecasts of each forecasting scheme is measured using the Mean-Squared Forecast Errors (MSFE). These schemes are compared against the recursive historical average, which is the average of past asset returns up to the date on which the prediction is made. The forecasting schemes and the resulting portfolio strategies are hereafter denoted by \(s\). Following Campbell and Thompson (2008), the additional predictive power of scheme \(s\) for asset \(i\) can be measured by the out-of-sample pseudo-\({R}^{2}\):

$${\text{pseudo-}}{R}_{i,s}^{2} = 1 - \frac{{MSFE}_{i,s}}{{MSFE}_{i,ha}}=1-\frac{\sum_{t={t}_{0}+1}^{T}{e}_{i,s,t}^{2}}{\sum_{t={t}_{0}+1}^{T}{e}_{i,ha,t}^{2}}.$$
(14)

\({e}_{i,s,t}={r}_{i,t} -{\widehat{r}}_{i,s,t}\) is the forecast error of \(s\), and \({e}_{i, ha,t}={r}_{i,t}-{\overline{r} }_{i,t}\) is the forecast error in relation to \({\overline{r} }_{i,t}\) (the recursive historical average) of asset \(i\) at time \(t\). A scheme \(s\) produces better predictions than the historical average if the pseudo-\({R}_{i,s}^{2}\) is positive.
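As an illustration of Eq. (14), the following sketch computes the out-of-sample pseudo-\({R}^{2}\) from three aligned series. The function name `pseudo_r2_oos` and the toy numbers are assumptions made for the example.

```python
def pseudo_r2_oos(realized, forecast, hist_avg):
    """Eq. (14): 1 - SSE(scheme) / SSE(recursive historical average)."""
    sse_scheme = sum((r - f) ** 2 for r, f in zip(realized, forecast))
    sse_ha = sum((r - h) ** 2 for r, h in zip(realized, hist_avg))
    return 1.0 - sse_scheme / sse_ha

# Made-up monthly excess log-returns, scheme forecasts and historical averages.
realized = [0.02, -0.01, 0.03, 0.00]
forecast = [0.015, -0.005, 0.02, 0.005]
hist_avg = [0.01, 0.01, 0.01, 0.01]
print(pseudo_r2_oos(realized, forecast, hist_avg))  # positive: scheme beats the benchmark
```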

This analysis is complemented by the adj-MSFE test of Clark and West (2007). This test uses a modified version of the MSFE statistic that is approximately normal, since the authors show that the test based on the unadjusted statistic is undersized. The null hypothesis is that the MSFE of \(s\) and the historical average are equal, whereas the alternative hypothesis is that the \(s\) predictions are more accurate. The most convenient way to implement this one-sided test is to compute:

$${\widehat{f}}_{i,s,t} = {\left({r}_{i,t}-{\overline{r} }_{i,t}\right)}^{2}-\left[{\left({r}_{i,t}-{\widehat{r}}_{i,s,t}\right)}^{2}-{\left({\overline{r} }_{i,t}- {\widehat{r}}_{i,s,t}\right)}^{2}\right],$$
(15)

and then regress \({\widehat{f}}_{i,s,t}\) on a constant and use the resulting t-statistic.
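Regressing \({\widehat{f}}_{i,s,t}\) on a constant is equivalent to a t-test on its sample mean, which the sketch below implements. It assumes a plain OLS standard error (no autocorrelation or heteroskedasticity correction), and the function name `clark_west_tstat` and the data are hypothetical.

```python
import math

def clark_west_tstat(realized, forecast, hist_avg):
    """t-statistic of the constant in a regression of f_t on a constant (Eq. 15)."""
    f = [
        (r - h) ** 2 - ((r - p) ** 2 - (h - p) ** 2)
        for r, p, h in zip(realized, forecast, hist_avg)
    ]
    n = len(f)
    mean_f = sum(f) / n
    var_f = sum((x - mean_f) ** 2 for x in f) / (n - 1)
    return mean_f / math.sqrt(var_f / n)

# Made-up series; the statistic is compared with one-sided normal critical values.
realized = [0.02, -0.01, 0.03, 0.00]
forecast = [0.015, -0.005, 0.02, 0.005]
hist_avg = [0.01, 0.01, 0.01, 0.01]
cw_stat = clark_west_tstat(realized, forecast, hist_avg)
print(cw_stat)
```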

To analyze the asset allocation amongst the three risky asset classes and the riskless asset, it is assumed that the investor has a CRRA utility function \(U\left({W}_{t}\right)=\frac{{W}_{t}^{1-\gamma }}{1-\gamma }\), where \({W}_{t}= {\text{exp}}\left\{{r}_{p,t}\right\}\) denotes the investor’s wealth at time \(t\), and \(\gamma\), with \(\gamma >1\), is the relative risk aversion coefficient. At each point in time, the investor chooses the allocation amongst these assets that maximizes the 1-period-ahead expected utility \({\mathbb{E}}_{t-1}[U\left({W}_{t}\right)]\). The optimal weights of \(s\) are given by the solution of the following constrained maximization problem, where the investor maximizes the difference between the portfolio’s simple expected excess return and the portfolio’s variance multiplied by half the coefficient of relative risk aversion (Fisher et al., 2020):

$${\text{arg}}\underset{{\mathbf{w}}_{s,t-1}}{{\text{max}}}\;{\mathbf{w}}_{s,t-1}^{\prime}\left({\widehat{{\varvec{\upmu}} }}_{s,t|t-1}+\frac{1}{2}diag\left({\widehat{{\varvec{\Sigma}}}}_{s,t|t-1}\right)\right)-\frac{\gamma }{2}{\mathbf{w}}_{s,t-1}^{\prime}{\widehat{{\varvec{\Sigma}}}}_{s,t|t-1 }{\mathbf{w}}_{s,t-1},\quad s.t.: (1)\; {\mathbf{w}}_{s,t-1}^{\prime}{\varvec{\upiota}}\le 1\;\mathrm{ and }\;(2)\; {\mathbf{w}}_{s,t-1}\ge 0,\;\forall s,t.$$
(16)

The vector \({\mathbf{w}}_{s,t-1}\) denotes the weights of the risky assets in the portfolio given strategy \(s\), \({\widehat{{\varvec{\upmu}} }}_{s,t|t-1}={\mathbb{E}}({\mathbf{r}}_{t}|s,{\mathcal{F}}_{t-1})\) and \({\widehat{{\varvec{\Sigma}}}}_{s,t|t-1}=\widehat{cov}({\mathbf{r}}_{t}|s,{\mathcal{F}}_{t-1})\) are the mean and covariance of the predictive density of the vector of risky asset returns, \({\mathbf{r}}_{t}\), computed using the information available at time \(t-1\) according to \(s\), and \({\varvec{\upiota}}\) is a conformable vector of ones. Following much of the asset allocation literature, short-selling (i.e., negative portfolio weights) is ruled out. The investment portfolio is rebalanced each month. We have opted for a monthly rebalancing frequency as a strategic choice aimed at mitigating the impact of transaction costs. Also, the data have a monthly periodicity; hence it seems adequate to rebalance the portfolio once the information set is updated. This idea is present in Almadi et al. (2014) and Maeso and Lionel (2020).
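To make the optimization in Eq. (16) concrete, the sketch below evaluates the objective on a grid of long-only weights whose sum does not exceed one. The grid search is a simplification chosen for transparency and is not the solver used in the paper; the function names, the step size, and the inputs are assumptions.

```python
from itertools import product

def objective(w, mu, sigma, gamma):
    """Eq. (16): w'(mu + diag(Sigma)/2) - (gamma/2) w' Sigma w."""
    m = len(w)
    mean_term = sum(w[i] * (mu[i] + 0.5 * sigma[i][i]) for i in range(m))
    var_term = sum(w[i] * sigma[i][j] * w[j] for i in range(m) for j in range(m))
    return mean_term - 0.5 * gamma * var_term

def grid_search_weights(mu, sigma, gamma, step=0.05):
    """Best weights on a grid with w >= 0 and sum(w) <= 1 (no short-selling)."""
    n_steps = int(round(1.0 / step))
    best_w, best_val = None, float("-inf")
    for combo in product(range(n_steps + 1), repeat=len(mu)):
        if sum(combo) > n_steps:
            continue  # budget constraint violated
        w = [c * step for c in combo]
        val = objective(w, mu, sigma, gamma)
        if val > best_val:
            best_w, best_val = w, val
    return best_w

# Made-up monthly forecasts for three risky assets.
mu = [0.05, 0.0, 0.0]
sigma = [[0.0025, 0.0, 0.0], [0.0, 0.0025, 0.0], [0.0, 0.0, 0.0025]]
w_opt = grid_search_weights(mu, sigma, gamma=3)
print(w_opt)
```

Whatever is not allocated to the risky assets, \(1-{\mathbf{w}}^{\prime}{\varvec{\upiota}}\), is held in the riskless asset.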

Assuming that the excess returns of the risky assets are log-normally distributed, following Campbell et al. (2003), the portfolio log-return resulting from \(s\) at time \(t\) is:

$${r}_{s,t}={r}_{F,t}+{\widehat{\mathbf{w}}}_{s,t-1}^{\prime}\left({\mathbf{r}}_{t}-{r}_{F,t}{\varvec{\upiota}}\right)+\frac{1}{2}{\widehat{\mathbf{w}}}_{s,t-1}^{\prime}diag\left({\widehat{{\varvec{\Sigma}}}}_{s,t|t-1}\right)-\frac{1}{2}{\widehat{\mathbf{w}}}_{s,t-1}^{\prime}{\widehat{{\varvec{\Sigma}}}}_{s,t|t-1}{\widehat{\mathbf{w}}}_{s,t-1},$$
(17)

where \({r}_{F,t}\) represents the continuously compounded risk-free rate.

Besides presenting the descriptive statistics of the discrete returns of the portfolios (mean, standard deviation, skewness, and kurtosis), the performance of the portfolios is also studied using three metrics. The discrete returns are computed as \(R={\text{exp}}\left(r\right)-1\).

The Certainty Equivalent Return (CER) is the risk-free rate that the investor is willing to accept rather than adopting the risky portfolio. Considering monthly data, the annualized CER can be expressed as follows:

$${\widehat{CER}}_{s}={\left(\frac{1}{T-{t}_{0}}\sum_{t={t}_{0}+1}^{T}{\widehat{W}}_{s,t}^{1-\gamma }\right)}^{\frac{12}{1-\gamma }}-1$$
(18)

where \({\widehat{W}}_{s,t}=1+{\widehat{R}}_{s,t}\) is the realized wealth at time \(t\) resulting from \(s\), \(T\) is the total number of periods, and \({t}_{0}+1\) is the first out-of-sample observation.

The annualized Sharpe ratio (SR) measures the desirability of a risky investment strategy by dividing the portfolio average excess discrete return by the standard deviation of the excess discrete return. In other words, the SR measures the reward per unit of variability:

$${\widehat{SR}}_{s}=\frac{\sqrt{12}\left({\widehat{\mu }}_{{R}_{s}}-{\mu }_{{R}_{F}}\right)}{\sqrt{\frac{1}{T-{t}_{0}}\sum_{t={t}_{0}+1}^{T}{\left\{\left({\widehat{R}}_{s,t}-{R}_{F,t}\right)-\left({\widehat{\mu }}_{{R}_{s}}-{\mu }_{{R}_{F}}\right)\right\}}^{2}}}.$$
(19)

The annualized Sortino ratio (SOR) considers the downside risk, that is, the negative deviations from a certain target point \(B\) that constitutes the minimum acceptable rate of return:

$${\widehat{SOR}}_{s}=\frac{\sqrt{12}\left({\widehat{\mu }}_{{R}_{s}}-B\right)}{\sqrt{\frac{1}{Q}\sum_{{\widehat{R}}_{s,t}<B}{\left\{min\left({\widehat{R}}_{s,t}-B,0\right)\right\}}^{2}}}.$$
(20)

In the computation of the SOR, it is considered that \(B=0\), and therefore the denominator does not depend on the sample mean. Finally, we also present the Conditional Value-at-Risk (CVaR) at 5%.
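The three performance metrics in Eqs. (18) to (20) can be sketched as below for monthly discrete returns. The function names are hypothetical, and \(Q\) is taken here to be the number of observations below the target \(B\), which is an assumption since the text does not define it.

```python
import math

def annualized_cer(returns, gamma):
    """Eq. (18): certainty equivalent return from realized wealth 1 + R."""
    mean_util = sum((1.0 + r) ** (1.0 - gamma) for r in returns) / len(returns)
    return mean_util ** (12.0 / (1.0 - gamma)) - 1.0

def annualized_sharpe(returns, risk_free):
    """Eq. (19): mean excess return over its standard deviation, times sqrt(12)."""
    excess = [r - f for r, f in zip(returns, risk_free)]
    mean_ex = sum(excess) / len(excess)
    var_ex = sum((x - mean_ex) ** 2 for x in excess) / len(excess)
    return math.sqrt(12.0) * mean_ex / math.sqrt(var_ex)

def annualized_sortino(returns, target=0.0):
    """Eq. (20): mean return over downside deviation (Q = count below target, assumed)."""
    mean_r = sum(returns) / len(returns)
    shortfalls = [min(r - target, 0.0) for r in returns if r < target]
    downside = math.sqrt(sum(s * s for s in shortfalls) / len(shortfalls))
    return math.sqrt(12.0) * (mean_r - target) / downside

# Made-up monthly discrete returns and risk-free rates.
monthly = [0.03, -0.01, 0.02, 0.01]
rf = [0.001, 0.001, 0.001, 0.001]
print(annualized_cer(monthly, gamma=3), annualized_sharpe(monthly, rf), annualized_sortino(monthly))
```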

These analyses are conducted without and with transaction costs. The transaction costs are incorporated considering a proportional cost rate of 0.5%. This figure is commonly used in the literature (see, for instance, DeMiguel et al., 2009, and Almadi et al., 2014). One may argue that in this framework a transaction cost of 0.5% understates the real transaction costs, but one should notice that there are publicly available ETFs on the three indexes under scrutiny. Notice that at time \(t\) the weights are different from the ones estimated at time \(t-1\) due to the changes in prices. The new weight for each risky asset \(i\), after the price change but before rebalancing, is given by \({w}_{i,s,t}^{+}=\frac{{\widehat{w}}_{i,s,t-1}exp\left\{{r}_{i,t-1}\right\}}{\sum_{k}{\widehat{w}}_{k,s,t-1}exp\left\{{r}_{k,t-1}\right\}+\left(1-{\sum }_{k}{\widehat{w}}_{k,s,t-1}\right)}\), where \(\left(1-{\sum }_{k}{\widehat{w}}_{k,s,t-1}\right)\) represents the weight of the risk-free asset before the price change. Therefore, the rebalancing at time \(t\) has a transaction cost of \({tc}_{s,t}=0.5\%\sum_{k}|{\widehat{w}}_{k,s,t}-{w}_{k,s,t}^{+}|\). The portfolio discrete return after transaction costs at time \(t\) is given by \({R}_{s,t}^{tc}=\left(1-{tc}_{s,t}\right){R}_{s,t}\).
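The drifted weights and the proportional transaction cost described above can be sketched as follows. The function names and the numerical inputs are illustrative assumptions; the formulas follow the expressions for \({w}_{i,s,t}^{+}\) and \({tc}_{s,t}\) in the text.

```python
import math

def drifted_weights(w_prev, log_returns):
    """Risky-asset weights after the price change but before rebalancing."""
    grown = [w * math.exp(r) for w, r in zip(w_prev, log_returns)]
    total = sum(grown) + (1.0 - sum(w_prev))  # risky assets plus risk-free weight
    return [g / total for g in grown]

def transaction_cost(w_new, w_drift, rate=0.005):
    """Proportional cost: rate times the sum of absolute weight changes."""
    return rate * sum(abs(a - b) for a, b in zip(w_new, w_drift))

# Made-up previous weights, log-returns and new target weights.
w_prev = [0.4, 0.3, 0.2]
w_plus = drifted_weights(w_prev, [0.02, -0.01, 0.00])
tc = transaction_cost([0.5, 0.3, 0.1], w_plus)
net_return = (1.0 - tc) * 0.012  # discrete return after costs, as in the text
print(w_plus, tc, net_return)
```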

These metrics are compared with those of two benchmark portfolios: the portfolio based on recursive historical means and unconditional covariances (hereafter called Historical) and the equally weighted portfolio that only includes the risky assets (hereafter denoted by 1/N). According to DeMiguel et al. (2009), the latter is a very hard portfolio to outperform, especially if transaction costs are taken into account.

5 Empirical Results

This section presents the results of the forecasting accuracy of the various model combination schemes for each asset class and the performance metrics for the portfolios created and rebalanced using the different strategies.

Table 4 reports the pseudo-\({R}^{2}\) and the significance of the adj-MSFE test. Although there is only one model for each of the Small, Medium, and Large sets, the selection/averaging over \(\delta\) is still carried out among the 7 possible values reported in Sect. 4. The results of the DMS and DMA are the same for the Small, Medium, and Large models (the same situation occurs in Koop & Korobilis, 2013).

Table 4 Out-of-sample pseudo-\({R}^{2}\)

Generally, Table 4 reports low predictability of the forecasting schemes when compared with the historical mean, except for the Mean and WMSFE combinations for Bonds, which have a pseudo-\({R}^{2}\) that is statistically significant at the 5% level, ranging from 3.00% to 3.73%. For Stocks and REITs, the DMS and DMA of the Medium models present the best results, with pseudo-\({R}^{2}\) of 0.28% and 3.76%, respectively, but neither is significant. These mixed results highlight the problem of model uncertainty faced when forecasting returns (Fugazza et al., 2015). The good results for Bonds are in line with Fisher et al. (2020).

We continue the analysis by studying the portfolios of CRRA investors with three risk aversion coefficients (\(\gamma =3, 5, \mathrm{and }10\)). We have not discarded any forecasting scheme or any asset. Table 5 reports the mean, standard deviation, skewness, kurtosis, and Jarque–Bera normality test of the discrete returns of the portfolio strategies. The portfolio returns are analyzed without transaction costs and with proportional transaction costs of 0.5%.

Table 5 Statistics of out-of-sample portfolio discrete returns for different risk aversion coefficients

Several patterns emerge from Table 5. First, the consideration of transaction costs does not alter the order of the statistics, most notably for the strategies with the best results. Transaction costs obviously decrease the mean returns for all strategies, but they also decrease the standard deviation and the skewness and increase the kurtosis for most strategies. Second, the returns of the portfolios are non-normal, except for the portfolios resulting from the DMA Full with \(\gamma =5\). Third, the best results in terms of standard deviation, skewness, and kurtosis are scattered among the strategy/\(\gamma\) pairs, but most of the differences are small. Fourth, increasing the risk aversion coefficient decreases the mean (with the exceptions of the DMS and DMA Small and Medium and DMA Full strategies, when this coefficient increases from 3 to 5) and naturally decreases the standard deviation for all strategies, while the results are mixed for the skewness and kurtosis, as in Diris et al. (2015). Fifth, the inclusion of more models in the WMSFE strategy improves the portfolios in terms of mean returns and skewness at the cost of an increase in the standard deviation and, in some cases, in the kurtosis, but the strategy based on the mean forecast presents poor results. Sixth, without transaction costs, most strategies outperform the Historical strategy regarding the mean, all have lower standard deviation and higher skewness, and most have lower kurtosis. This trend is more marked when transaction costs are considered, in which case all strategies are superior in terms of mean, standard deviation, and skewness. Seventh, the 1/N strategy is better than the Historical (the only exception is the mean return of the Historical for \(\gamma =5\) without transaction costs), and its superiority clearly increases with transaction costs, with a visible impact on the mean returns. This implies that the 1/N strategy is more difficult to outperform, especially in the presence of transaction costs. Eighth, although the majority of strategy/\(\gamma\) pairs have higher means than the 1/N portfolio without transaction costs, even at the expense of a higher standard deviation, this number reduces significantly when transaction costs are considered. In this case, only the DMS and DMA Small, Medium, and Full present better results in terms of mean returns. Finally, and most strikingly, the comparison of the best results suggests that, given the small differences in terms of standard deviation, skewness, and kurtosis, the best strategy is the Small, due to its markedly higher mean returns, especially for \(\gamma =5\). On average, across the three risk aversion coefficients, the Small strategy has a mean return approximately 21% higher than the second-best model, the Medium strategy, with and without transaction costs; 159% higher than the Historical and 1/N strategies without transaction costs; and 2,041% and 121% higher than the Historical and 1/N strategies, respectively, with transaction costs.

The superiority of the Small strategy is quite remarkable. This can be visualized in Fig. 2 which shows the cumulative returns of all strategies with and without transaction costs for \(\gamma =\) 3, 5 and 10.

Fig. 2 Cumulative returns of the different portfolio strategies

The fact that the Small strategy, which only considers the first lags of the excess returns, outperforms the other strategies is in line with Welch and Goyal (2008), who conclude that none of the additional variables used has worked in forecasting equity excess returns. This is reinforced by Goyal et al. (2023). However, on the one hand, these good results may also imply that traditional predictive regressions, updated with schemes to resolve parameter uncertainty and instability, work out-of-sample (see, for instance, Rapach et al., 2013; Koop & Korobilis, 2013; Fisher et al., 2020) and, on the other hand, that the inclusion of REITs improves the risk-return trade-off, as suggested by Ling et al. (2020) and Zhu and Lizieri (2022).

Table 6 shows the out-of-sample performance metrics of the different portfolio strategies for the three risk aversion coefficients. For each strategy/\(\gamma\) pair, we conduct a robust comparison with the Historical and 1/N portfolios using bootstrap p-values. The results presented in this table reinforce what has been discussed previously. The Mean and WMSFE strategies show poor performance for any of the risk aversion coefficients considered, but the picture is completely different for the DMS and DMA strategies. In these strategies, only the CER and SOR of the DMA Large for \(\gamma =3\), without transaction costs, are not significantly greater than those of the Historical strategy at the 10% level. The statistical significance of the difference of these metrics against the 1/N strategy is generally lower, namely for the Large, DMS Full, and DMA Full, especially after transaction costs. The second-best strategy, the Medium portfolio, presents a better CER, SR, and SOR than the 1/N strategy at the 1% level, except for the SR and CVaR without transaction costs. But undoubtedly, the “winning horse” is the Small strategy (see, for instance, Koop & Korobilis, 2013). With and without transaction costs, this strategy has a higher CER, SR, and SOR than the Historical and 1/N strategies at any \(\gamma\) considered, all significant at the 1% level. Its CVaR is lower than that of the 1/N portfolio, but the difference is not significant for \(\gamma =3\) and is significant at the 10% level for \(\gamma =5\). The differences in the SR and SOR without transaction costs are remarkable and show an increasing trend with \(\gamma\). Just to give a general idea, on average, across the three risk aversion coefficients, the Small strategy has an SR which is approximately 304% and 148% higher, and a SOR which is 323% and 182% higher, than those of the Historical and 1/N strategies, respectively, without transaction costs. With transaction costs, the average SR of the Historical portfolios is even negative. The average SR of the Small strategy is 116% higher than that of the 1/N portfolio. With transaction costs, the average SOR of the Small strategy is higher than those of the Historical and 1/N portfolios by 3,967% and 144%, respectively.

Table 6 Portfolio performance out-of-sample for different risk aversion coefficients

6 Conclusion

This research aims to study the predictability of monthly returns and portfolio allocation among three US-based assets, namely Stocks, proxied by the S&P 500 Total Return Index, Bonds, proxied by the Barclays Capital US Aggregate Bond Index, and REITs, proxied by the FTSE Nareit US Real Estate All Equity REITs Index. The period spans from March 1976 to December 2021; however, the analysis is conducted out-of-sample, beginning in October 2006 (last 183 months), which corresponds to a period of high turbulence in the stock and REITs markets, due to the global financial crisis and the impact of the Covid-19 pandemic.

We use several procedures to obtain the forecasts of the three classes of assets: the mean and the Weighted Mean Squared Forecasting Errors (WMSFE) combinations of different VAR(1) models, and Dynamic Model Selection (DMS) and Dynamic Model Averaging (DMA) applied to TVP-VAR(1) models, which differ according to the predictor space used. We begin with a large set of predictors, a total of 155 (19 for Stocks, 122 for Bonds, and 14 for REITs), and resort to a Genetic Algorithm to shrink the predictor space. The Genetic Algorithm renders a set of 17 predictors (8 for Stocks, 6 for Bonds, and 3 for REITs).

The results are negative for Stocks, but some positive results are obtained for REITs and especially for Bonds, for which the Mean and WMSFE present significant pseudo-\({R}^{2}\) at the 5% level. A possible explanation for these results is that the lagged stock returns are correlated with Bonds and REITs, the lagged returns of REITs are correlated with Bonds, and an additional predictor from the stock market (the Conservative minus Aggressive factor) is correlated with REITs. In sum, although we did not find any predictability in Stocks, this class of assets helps predict REITs and, most notably, Bonds.

Then we devised several portfolio strategies for CRRA investors considering three risk aversion parameters, 3, 5, and 10. These portfolios are rebalanced monthly, considering the forecasts obtained from recursive estimations.

Although the portfolios with the best results in terms of standard deviation, skewness, and kurtosis are scattered among the different portfolio strategies, the performance of the DMS and DMA portfolios that use only the lagged returns of the three asset classes as predictors is remarkable, independently of the risk aversion coefficient. Without considering transaction costs, the annual mean returns of this strategy are 23.04%, 23.64%, and 20.52%, while those of the portfolio based on the historical mean and covariances are 9.24%, 8.88%, and 7.80% for the risk aversion parameters of 3, 5, and 10, respectively. The equally weighted portfolio only presents an annual mean return of 8.64%, while the corresponding figure with proportional transaction costs of 0.5% is 8.52%. The inclusion of proportional trading costs of 0.5% greatly penalizes the mean returns of the historical portfolio, which presents the values of 1.44%, 0.96%, and -0.24% for the three risk aversion parameters. With transaction costs, the best portfolio strategy, although more penalized than the equally weighted portfolio, maintains its superiority, with mean returns of 18.96%, 19.92%, and 17.64%.

The aforementioned mean return discrepancies translate into the superiority of the best strategy in terms of Certainty Equivalent, Sharpe ratio, and Sortino ratio, which are all significantly higher at the 1% level than the corresponding figures for the historical and equally weighted portfolios, with and without transaction costs, for all risk aversion parameters considered (except the Sharpe ratio with transaction costs and a risk aversion parameter of 3, which is significant at the 5% level). Additionally, even the CVaR at 5% of the best strategy is lower than those of the benchmark portfolios, with most of the differences statistically significant. Just to put the above discussion in context, with transaction costs and a risk aversion parameter of 3, the best strategy shows a CER of 17.13%, an SR of 127.57%, a SOR of 289.59%, and a CVaR of 7.53%, while the equally weighted portfolio shows a CER of 6.46%, an SR of 64.64%, a SOR of 139.10%, and a CVaR of 7.86%.

In a nutshell, our paper presents two main conclusions. First, it supports the literature that argues in favor of methodologies that incorporate dynamic modelling and parameter uncertainty and use combinations or selection of several models. Second, predictability mainly comes from the information flowing from the stock market to other markets; hence, including other assets in the portfolio tends to improve its performance. Therefore, our research points out that a promising line of research is to include other assets in the analysis, for instance, commodities, industry indexes, or other international markets. In future work, we intend to explore this line of research.