Introduction

Factor models play a crucial role in empirical asset pricing by providing a framework for explaining the cross-sectional variance in stock returns and identifying systematic risk factors (Fama and French 2020). Within this framework, abnormal returns derived from these factor models represent the proportion of stock returns that cannot be explained by systematic risk factors included in factor models. These abnormal returns can be influenced by a myriad of factors, including firms’ idiosyncratic characteristics, market efficiencies, macroeconomic conditions, financial market dynamics, and investor sentiment. Understanding abnormal returns holds significant value for researchers, as it provides insights into the efficacy of factor models, the presence of market efficiency, and various other facets of the financial market and the real economy. Meanwhile, excess returns measure the returns of stocks above a risk-free rate, such as the return on Treasury Bills. They represent the additional compensation that investors receive to shoulder the risks associated with investing in the stock market.

In the empirical asset pricing literature, to resolve the challenges posed by high dimensionality, non-linear relationships, and interaction effects, Cochrane (2011) and Giglio et al. (2022) have prompted researchers to explore alternative methodologies that can improve the accuracy of stock return predictions. In recent years, machine learning (ML) methods have gained popularity in empirical asset pricing research, referring to Israel et al. (2020). These methods offer powerful capabilities for analyzing complex financial data and capturing non-linear relationships and interaction effects that may be overlooked by traditional statistical approaches. Although ML methods have predominantly been applied to predict excess stock returns, it is crucial to acknowledge the economic significance of abnormal stock returns derived from various factor models. Therefore, this study concentrates on the use of neural network models for stock return prediction, assesses their performance in predicting different measures of stock returns, investigates variations in feature importance, and explores the interplay between firm-specific variables and macroeconomic factors when using neural network models to predict these different measures of stock returns.

This empirical study focuses exclusively on the US stock market and covers the period from 1971 to 2021. This study employs 49 stock characteristic variables to capture stock fundamentals and includes 14 distinct macroeconomic variables from three dimensions: financial market, real economic activities, and investor sentiment. Given the dataset spans over five decades, the factor models considered in this study have evolved from the capital asset pricing model (CAPM) to the Fama–French five-factor model. To thoroughly compare and analyze abnormal returns, this study extensively examines abnormal returns derived from various factor models, including the CAPM model, the Fama–French three-factor model (FF3), and the Fama–French five-factor model (FF5). By considering different factor models and their corresponding abnormal return estimates, this study elucidates the impact of factor selection on abnormal return analysis and its implications for empirical asset pricing research.

Prior to the main analysis using the neural network model, a univariate long–short portfolio analysis was conducted. Stocks are sorted into deciles based on 45 firm-specific characteristic feature variables (four variables cannot be cut into deciles), and long–short portfolios are constructed by a long top decile and a short bottom decile. Most stock characteristic feature variables exhibiting predictive power, with portfolios achieving higher-than-average mean returns and Sharpe ratios. Next, the full sample is divided into three subsamples based on the Chicago Fed National Activity Index (CFNAI): recession, normal, and expansion periods. Interaction effects have been observed between stock characteristic variables and macroeconomic conditions when predicting stock returns. Portfolios realized higher mean return and Sharpe ratio in the recession and expansion periods, and lower mean return and Sharpe ratio in the normal period when the market was less volatile regardless of different measures of stock returns. It is important to note that the analysis of univariate long–short portfolios represents only the first step in exploring the relationship between our candidate predictors and stock returns. This approach does not account for the non-linear relationship between predictors and stock returns or the interaction effects among firm-specific characteristic variables.

In the main analysis, only 49 firm-specific characteristic feature variables are used to predict stock returns as a first step. The empirical analysis reveals that the performance of neural network models in predicting abnormal returns is almost equal to their performance in predicting excess returns. The out-of-sample R-squared value in the testing dataset for predicting abnormal returns based on the CAPM model is the highest at 0.825%, while the lowest R-squared value comes from predicting abnormal returns based on the FF5 model, which is 0.664%. However, there is no significant difference in prediction accuracy in terms of the R-squared values for predicting various measures of stock returns, which are around 0.7% and in a range of less than 0.18%.

Next, 14 macroeconomic variables were incorporated into the model. These variables provide a summary of the macroeconomic conditions from different perspectives. The results reveal a significant improvement in the models’ performance for predicting excess stock returns in terms of R-squared values. The addition of these three macroeconomic variables from the financial market increases the out-of-sample R-squared value for excess stock returns by 89.6%, from 0.78 to 1.479%. Furthermore, after adding five macroeconomic variables from real economic activities to the model, the out-of-sample R-squared value for excess stock returns more than doubled to 2.458%. Finally, after adding the remaining six macroeconomic variables from the investor sentiment dataset, the R-squared value reached 5.474%. Similarly, the model’s performance in predicting abnormal stock returns derived from various factor models also shows an improvement in prediction accuracy, although the magnitude of the improvement is not as pronounced as that observed for predicting excess stock returns, as reflected in the increased R-squared values.

The performance of value-weighted portfolios based on model predictions vary significantly. The distribution of the portfolios’ CARs is nearly symmetrical, with the spectrum evenly allocated on both sides of the zero line. Investing in the top decile could yield up to eight times the abnormal returns for the entire period, whereas investing in the bottom decile could result in a loss of the same magnitude. In contrast to the spectrum of abnormal stock returns, the portfolios’ cumulative excess returns exhibit an upward trend for most deciles. Investing in the top decile can yield nearly 12 times the excess return, whereas investing in the bottom decile can result in a loss close to double the excess return. After incorporating the macroeconomic variables, there are slight improvements in distinguishing the performances of the different portfolios. This result is consistent with the improvements in R-squared values observed after including these additional variables.

The relative importance of firm characteristics and macroeconomic variables is then investigated to predict various stock return measures using neural network models. To quantify their contributions, we employ SHAP values developed by Lundberg et al. (2018). This analysis reveals that the relative importance rankings of these variables differ across the different stock return measures. When predicting abnormal stock returns derived from the FF3 and FF5 models, the firm characteristic variables from the Momentum and Trading Fractions groups emerge as the most important contributors, whereas the role of macroeconomic variables is less significant. Conversely, when predicting excess stock returns, macroeconomic variables are the most important contributors, particularly those from the investor sentiment dimension. However, when predicting abnormal stock returns derived from the CAPM, feature importance presents a mixed pattern, with macroeconomic variables from investor sentiment data and firm characteristic variables from the Trading Fractions group playing prominent roles. A potential explanation for these findings is that factor models, such as the FF3 and FF5 models, account for and eliminate the variance associated with macroeconomic conditions. By contrast, the CAPM model only includes market returns as a factor, thereby failing to remove the variance linked to macroeconomic conditions.

To gather additional evidence, the interaction effects between the firm characteristic variable short-term reversal (STreversal) and the macroeconomic variable CFNAI are explored to predict different measures of stock returns. These variables are chosen because of their significant roles as feature variables in neural network models for stock return predictions and their economic significance in empirical asset pricing. The results demonstrate a strong interaction effect between short-term reversal and CFNAI when predicting excess and abnormal stock returns derived from the CAPM model. However, this interaction effect is less pronounced when predicting the abnormal returns derived from the FF3 and FF5 models, suggesting that these factor models have already accounted for and removed the periodic variance associated with macroeconomic conditions. Similarly, the interaction effects between short-term reversals and investor sentiment are more evident in predicting excess stock returns and CAPM-derived abnormal returns, but less prominent in predicting abnormal returns derived from the FF3 and FF5 models. These consistent patterns across different macroeconomic variables highlight the importance of considering the information already incorporated and removed by the factor models.

Finally, we investigate whether the portfolios predicted by the neural network model can achieve higher mean returns and Sharpe ratios under different macroeconomic conditions. To do so, we divide the full sample into three subsamples based on the CFNAI index, similar to the univariate long–short portfolio analysis, defining them as recession, normal, and expansion. The results reveal that the long–short portfolios, constructed by buying stocks in the top decile and selling stocks in the bottom decile, consistently achieve the highest mean returns and Sharpe ratios during recession and expansion periods when the market is more volatile. Furthermore, a clear progression of the mean return and Sharpe ratio is observed from the bottom to the top decile across different economic periods for each long-only portfolio. These findings emphasize the importance of market timing, based on macroeconomic conditions, when constructing prediction-based portfolios.

Literature review

In recent years, ML methods have been increasingly used in the field of empirical asset pricing. The goal is to develop and implement models that can predict future returns or pricing of assets; exhaustive reviews in this area have been undertaken by Giglio et al. (2022) and Bagnara (2022). Freyberger et al. (2020) use adaptive grouped LASSO to predict excess stock returns non-parametrically and explore which firm characteristics provide independent information for cross-sectional stock return prediction. One drawback of linear models, such as LASSO and Elastic-net, is that the interaction effects among the feature variables are not accounted for. Bryzgalova et al. (2020) use decision trees to group stocks endogenously and select optimal portfolio splits to span the stochastic discount factor. Non-linear tree models can accommodate interaction effects among feature variables and predict more interpretable results.

As Cochrane (2011) mentions in his AFA (American Finance Association) Presidential Address, the current challenge in asset pricing research is the “curse of high-dimensionality,” and hence, another research direction is to reduce the dimensionality of predictors. Kelly et al. (2019) propose instrumented principal component analysis to predict cross-sectional stock returns. They use observable characteristics that instrument for unobserved dynamic factor loadings such that the formulated latent factors have lower dimensions and process time-varying characteristics. Kozak et al. (2020) impose a prior on SDF coefficients that shrink the low-variance principal components of the firm characteristic factors, and an SDF built by a smaller number of reduced components performs even better than the four- or five-factor models used in the recent literature. There are further studies in this direction, such as Feng et al. (2023), who use the deep factor model to generate latent factors.

Gu et al. (2020) conduct an empirical study that compared several prediction models. They find that ML models, particularly neural network models, can outperform traditional statistical models in terms of accuracy and precision, particularly when dealing with high-dimensional and complex financial data. Further research was conducted using deep neural network models. Gu et al. (2019) propose a new latent conditional asset pricing model, following the seminal research of Kelly et al. (2019). Cong et al. (2021) use deep reinforcement learning to achieve high prediction accuracy and model interpretation. Chen et al. (2019) use deep neural networks, including feed-forward networks, long–short-term memory, and adversarial networks to estimate a pricing kernel model that predicts cross-sectional stock returns based on a large amount of information. The deep neural network learning method outperforms the benchmark approaches in terms of the out-of-sample Sharpe ratio.

While most ML studies on empirical asset pricing focus on predicting excess stock returns, Kaniel et al. (2022) examine the predictive power of neural network models in predicting abnormal fund returns. Their findings suggest novel and substantial interaction effects among market sentiment, fund flows, and momentum. This raises the question of whether abnormal stock returns are more predictable than excess returns, and whether there are significant interaction effects between macroeconomic conditions and stock characteristics.

Stock return prediction models have evolved significantly over the last century, predating the incorporation of ML methods. The CAPM introduced by Sharpe (1964) describes the concept of beta as a measure of an asset’s risk relative to the market. Later, Fama and French (1993) expand the model to include two additional factors, size and value, which explain the cross-section of expected returns. The size factor captures the tendency of smaller firms to have higher expected returns, whereas the value factor captures the tendency of value stocks (i.e., stocks with low price-to-book ratios) to have higher expected returns. More recently, Fama and French (2015) added two factors to create a five-factor model. The profitability factor captures the risk associated with a firm’s profitability, the investment factor captures the risk associated with a firm’s investment policy. Recently, researchers have added that the momentum factor captures the risk associated with a stock’s recent price momentum. These factors help to capture additional sources of risk and returns that are not covered by the original three factors.

Exploring the similarities and differences between stocks’ abnormal returns (i.e., the return that deviates from the expected return based on factor models) and excess returns (i.e., the return that deviates from the risk-free rate) is intriguing. Investigating how macroeconomic variables predict stock returns provides valuable insights for both theoretical and empirical studies. These questions are worth further dissecting and exploring to provide empirical evidence.

Methodology

Neural network model

The artificial neural network is a potent ML modeling method. Our empirical analysis employs a simple feed-forward neural network architecture to capture the complex relationship between stock feature variables and returns. The input layer comprised a set of firm characteristics and macroeconomic variables. One or more hidden layers capture the interactive effects between different variables, and perform non-linear transformations through activation functions on the input variables. The output layer aggregates all information from the last hidden layer to generate the ultimate prediction output. This is illustrated in Fig. 1.

Fig. 1
figure 1

Demonstration of feed-forward neural network. This figure showcases a simple neural network model structure with on hidden layer

As posited by Gu et al. (2020), stock returns can be effectively described using an additive prediction error model represented by Eq. 1. In this equation, individual stocks are indexed as \(i=1,2,...,N\) and the time periods are indexed as \(t=1,2,...,T\). The abnormal return of stock i at time \(t+1\), denoted by \(R^{abr}_{i,t+1}\), can be understood as the expected stock return \(E_t(R^{abr}_{i,t+1})\) based on the conditional information available at time t, along with an error term \(\epsilon _{i,t+1}\).

$$\begin{aligned} R^{abr}_{i,t+1} = E_t(R^{abr}_{i,t+1}) + \epsilon _{i,t+1} \end{aligned}$$
(1)

In most ML studies, the primary objective is to estimate the expected return at time \(t+1\) based on the information available to practitioners at time t. This conditional expectation of returns can be denoted as \(E_t(R^{abr}_{i,t+1}) = g(z;\theta )\). Here, the function \(g(z,\theta )\) represents a mapping function that incorporates a set of predictors z and a set of parameters \(\theta\), which correspond to the weights and biases in each layer of the neural network model. By utilizing a feed-forward neural network, the model learns to establish a mapping between the inputs (predictors) and outputs (expected returns) by adjusting the weights of the nodes in each layer. This adjustment process occurs iteratively in response to errors caused by the model on the training dataset. A neural network model aims to minimize these errors and improve its performance in predicting stock returns. This process can be expressed mathematically as follows:

$$\begin{aligned} g(z;\theta ) = \theta _0 + \sum ^n_{j=1}x_j\theta _j \end{aligned}$$
(2)

Objective function

The objective function (also known as the loss function) quantifies the differences between the predicted and actual outcomes in a neural network and guides the training process to determine the optimal weights. Because of the vast space available for potential weight combinations, achieving a perfect fit is virtually impossible. Instead, the training process was reformulated as an optimization problem. The goal was to search for the best possible weight set that yielded accurate predictions. Neural networks typically employ a stochastic gradient descent (SGD) optimization algorithm. The weights of the models were adjusted using the backpropagation of errors method. Predictions were made using initial weights. Subsequently, the errors between the predicted and actual values were calculated. Based on this error, the SGD algorithm modifies the weights to reduce the errors in future predictions. Commonly used objective functions to measure this error include the mean squared error (MSE), mean absolute error, and Huber loss. The MSE is often preferred for stock return predictions, where large outliers are of particular concerning, MSE is often preferred. The MSE loss function is given by

$$\begin{aligned} MSE = \frac{1}{n} \sum _{i=1}^{n} (Y_i - \hat{Y}_i)^2 \end{aligned}$$
(3)

Activation function

Activation functions play an important role in neural networks. The choice of an activation function in the hidden layers dictates the transformation of the weighted sum of the inputs into output from one layer to the next. Meanwhile, the choice of the output layer determines the nature of the model’s predictions. Activation functions, such as softmax, sigmoid, and ReLU, offer a range of choices. The architecture of a neural network ensures the capture of the interaction effects between variables. Simultaneously, the correct activation function can account for the non-linearity of the individual predictors of the expected returns. In this study, a linear function is chosen for the output activation, because the returns we aim to predict have random characteristics and extreme values. For the hidden layers, the rectified linear unit (ReLU) function was adopted as the activation function because of its computational simplicity and alignment with existing literature, as discussed by He et al. (2015). The ReLU function is piecewise linear, delivering the input value if it is positive and zero otherwise, as shown in Eq. 4:

$$\begin{aligned} ReLU(x) = {\left\{ \begin{array}{ll} 0 &{} if \ x < 0 \\ x &{} otherwise. \end{array}\right. } \end{aligned}$$
(4)

Learning rate

Learning rate is a vital hyperparameter in neural networks that influences the extent to which a model adjusts during weight updates. In feed-forward neural networks, the SGD algorithm aims to find the best weights to minimize loss. This adjustment is scaled by the learning rate, which controls the speed of learning; a higher rate means faster learning but potential over-adjustments, whereas a lower rate ensures careful and more precise adjustments, but can be slower. Values typically lie between 0 and 1; a very high rate may cause erratic performance, whereas a very low rate may lead to slow or stagnant optimization. To optimize the training, this study incorporates momentum to adjust the learning rate over time. Momentum considers past updates to influence future updates and, when combined with a dynamic learning rate, can lead to faster and more efficient model training. A common adaptive learning rate strategy is represented in formula 5:

$$\begin{aligned} Learning \ Rate = Initial \ Rate \times \frac{1}{1 + decay \times iteration} \end{aligned}$$
(5)

This strategy, shown in Fig. 2, often outperforms static rates by better adapting to the training process.

Fig. 2
figure 2

Learning rate schedule. This figure shows how the learning rate changes in a learning rate schedule after each iteration. The initial learning rate is set to be 0.01, with decay rates of 0.1, 0.001, 0.001, and 0.0001 respectively

Fix overfitting with regularization

Overfitting is a common challenge in neural networks, where models may perform exceptionally well on training data but underperform on unseen test data. The primary goal of ML is to generalize training data to new, unseen instances, and overfitting compromises this ability. There are two fundamental approaches for mitigating overfitting: enlarging the training dataset and adjusting the complexity of the model. While expanding a dataset can enhance a neural network’s performance to its maximum learning capacity, adjusting the complexity involves either tweaking the network’s structure, such as the number of nodes or layers, or altering its parameters, primarily the weight values. Numerous regularization techniques such as activity regularization, weight constraints, dropout, and noise addition have been developed to address overfitting. Among these, weight regularization is prominent. If left unchecked during prolonged training, the weights may adapt overly to the training data, causing overfitting. Large weights can make the model overly reactive to minor changes in test data, leading to inconsistent predictions. Weight regularization combines this by imposing penalties based on the magnitude of the weights, ensuring that they remain controlled. Various penalties like L1 (sum of absolute weight values) and L2 (sum of squared weight values) exist. This study specifically utilizes L1 penalties on network weights, following the LASSO approach in linear regression, a method commonly adopted in many studies.

Enhancing predictions with ensemble learning

Although adept at generalizing complex mapping functions from inputs to predicted outputs, neural networks exhibit high flexibility. This implies that they may yield different weights each time they are trained even for identical datasets. This characteristic, referred to as their low bias and high variance nature, is due to their sensitivity to initial conditions, such as initial random weights and specific nuances in the training dataset. Consequently, different initializations can lead neural networks to converge to different weights, causing variations in the predictions of new data. This variability poses a challenge for the development of a consistent predictive model. A widely adopted solution to this challenge is ensemble learning, in which multiple models are trained, and their predictions are aggregated to produce a more stable and potentially more accurate output. Although ensembling may introduce bias, it counteracts the high variance often observed in stand-alone models. Thus, ensembled models, by virtue of being less reliant on training data specifics and initial conditions, not only mitigate the variance of individual models but can also outperform single models, especially if the aggregated models excel in diverse ways and their prediction errors are uncorrelated.

Finally, the “He initializer” is used to avoid the problem of “dying ReLU.” The maximum number of epochs during each training process was set to 100, and early stopping was implemented based on the R-squared value of the validation dataset. To enhance the stability of the training process, we used batch normalization, and to improve the robustness of our predictions, we employed ensemble learning by combining the predictions of the 10 models. The detailed model specifications are presented in Table 1, where the optimal network architecture comprised one hidden layer with 32 hidden nodes. The other hyperparameters were tuned using the validation dataset and are listed in the left panel of the table.

Table 1 Model summary

Stock abnormal return

The objective of this study is to investigate how neural network models behave differently when predicting various stock return measures. Specifically, we concentrate on predicting excess stock returns, which are computed by deducting the one-month treasury bill rate from actual stock returns. Furthermore, we generate abnormal stock returns from various factor models using a method inspired by Kaniel et al. (2022). For each stock, the abnormal return is defined as the difference between the stock’s excess return and the expected return based on various factor models, which consist of the alpha of the factor models and residuals.

To ensure a robust analysis, a criterion is set such that each stock has a minimum of 37 observations. Then, the \(\beta s\) are estimated for each factor using data from the past 36 months by employing Eq. 6 as follows:

$$\begin{aligned} R_{i,t-36:t} = \alpha _{i,t-36:t} + F_{t-36:t} \hat{\beta }_{i,t} + \epsilon _{i, t-36:t} \end{aligned}$$
(6)

in which \(R_{i,t}\) is the excess return of stock i in month t, \(F_t\) is a vector of factors; taking the FF5 model as an example, \(F_t\) would be \([SMB_t + HML_t + RMW_t + CMA_t + (R_m - R_f)_t]\), and \(\hat{\beta }_{i,t}\) is a vector of estimated coefficients accordingly. We then we use the estimated \(\hat{\beta }_{i,t}\) from the past 36 months to predict the stock returns for the 37th month, \(R^E_{i,t+1} = F_{t+1}\hat{\beta }_{i,t}\). Finally, the abnormal return for the stock is obtained as the difference between the excess return and the Fama–French five-factor model’s expected return, using equation 7.

$$\begin{aligned} R^{abn}_{i,t+1} = R_{i,t+1} - R^E_{i,t+1} \end{aligned}$$
(7)

Data

Firm characteristic data

This study deviates from the traditional approach of constructing all variables from raw data typically obtained from databases, such as Compustat or Thomson Reuters. Instead, we use firm characteristics data sourced from open-source asset pricingFootnote 1, a dataset created and maintained by Chen and Zimmermann (2021). There are several reasons for this observation. First, ML studies often involve the analysis of a large number of feature variables. For instance, Kozak et al. (2020) use 50 anomaly characteristics and 80 financial ratios from WRDS, while Freyberger et al. (2020) construct 62 firm characteristic variables based on raw data from Compustat. The process of constructing these variables is time-consuming. Second, the construction of these feature variables based on descriptions in original papers requires meticulous attention. Even slight, subjective variations can result in different datasets, making it challenging to reproduce some studies, let alone achieve results identical to those of the authors. Third, the use of a self-constructed dataset can introduce complexity into replication efforts and hinder comparability with other ML methods.

This study focuses on all publicly treated stocks in the U.S. stock market. The full dataset from Open Source Asset Pricing contains 319 distinct firm characteristic variables based on previous asset pricing studies. Among these, 161 characteristic variables exhibited significant t-statistics in the original papers, 44 showed mixed evidence, and the remaining 114 were found to be insignificant. Not all variables from the dataset were included in this analysis; rather, a specific subset was selected for this study. These selections were drawn from a subset of 161 characteristic variables that exhibited noteworthy t-statistics in the original research. The statistical significance of these variables underscores their relevance for predicting stock returns. Within a pool of 161 characteristic variables, the selection criteria considered the completeness of each variable. The time period ranges from 1971 to 2021, and variables with a significant number of missing observations, specifically those with more than 40% missing data, are excluded. This is related to the approach used for handling missing values in this study, thereby ensuring a maximally sized dataset in terms of observations. Gu et al. (2020), Leippold et al. (2022), and Drobetz and Otto (2021) replace all missing values with the cross-sectional median of each firm-characteristic variable for that month. Although this approach allows us to include as many firm characteristic variables as desired, it requires strong assumptions about the structure of the data and creates artificial time-series fluctuations in the firm characteristic variables. Given the non-random nature of missing data in firm characteristics, as noted by Bryzgalova et al. (2022), we chose not to use artificial imputation methods instead of dropping all missing values to obtain a balanced dataset. This was performed to avoid additional sources of error and to maintain the dependency structure in the dataset. Consequently, the final selection consists of 49 firm characteristic variables, neatly categorized into seven groups: Value, Investment, Trading Fractions, Profitability, Momentum, Intangible, and Risk. These categories are similar to those employed by Chen et al. (2019), as Table 2 presents. It is noteworthy that the Open Source Asset Pricing database lacks such variables as “short-term reversal,” “size,” and “return,” which are obtained from the Center for Research in Security Prices (CRSP) and merged using the unique stock identifier PERMNO and date.

Table 2 Firm characteristics by category

Macroeconomic data

Together with the incorporation of firms’ characteristic variables, an extensive array of macroeconomic variables was employed to capture the broader economic landscape in favor of stock return prediction. The macroeconomic data used in this study are sourced from three principal dimensions: the financial market, real economic activities, and investor sentiment. Within the financial market dimension, this study coincides with that of He et al. (2021, 2023), who demonstrate the significance of macroeconomic variables in predicting bond returns. Specifically, such metrics as the S&P 500 dividend-to-price ratio (S&P500_divprc), S&P 500 PE ratio (S&P500_pe), and S&P 500 index return variance (S&P500_var) are selected to quantify the state of the financial market, these variables initially used by Welch and Goyal (2008), are important in assessing market conditions. The monthly trend of each indicator is plotted in Fig. 3.

Fig. 3
figure 3

Macroeconomic variables from the financial market. This graphic plots the historical trends of macroeconomic variables from the financial market dimension in conjunction with the National Bureau of Economic Research (NBER) suggested recession periods. The data for S&P 500 index return variance is computed as the variance of daily index returns within a given month

From the real economic activities dimension, the selection of macroeconomic indicators includes the CFNAI and its subsectional macroeconomic indexes, including production and income (P_I), employment, unemployment, hours (EU_H), personal consumption and housing (C_H), and sales, orders, and inventories (SO_I). These variables serve as proxies for real economic activities and are commonly used to quantify the overall macroeconomic conditions in the United States, as discussed by Evans et al. (2002) and Brave et al. (2009). The CFNAI, reported monthly by the Federal Reserve Bank of Chicago, holds significance as a leading indicator of economic growth or recession and is closely monitored in both academic and policy circles. Figure 4 illustrates the comprehensive CFNAI index’s historical trends, and provides a detailed representation of the subsectional macroeconomic indexes, both in conjunction with the NBER’s suggested recession indicators.

Fig. 4
figure 4

Macroeconomic variables from real economic activities. This graphic plots the historical CFNAI index from 1971 to 2022 together with NBER suggested recession periods, in 2020 the CFNAI index together with the other four subsectional indices fell sharply due to the economic impact of the COVID-19 pandemic

For the investor sentiment dimension, the selection of macroeconomic data originates from Baker and Wurgler (2007), who find that market sentiment, reflecting investors’ collective disposition towards the market, significantly influences stock returns. To quantify market sentiment, they compiled a sentiment index based on five key indicators that encompass various facets of investor sentiment. Figure 5 presents the historical trend chart of investor sentiment and the components used to construct the investor sentiment index.

Fig. 5
figure 5

Macroeconomic variables from investor sentiment. This plot showcases the historical trends in investor sentiment and its five key indicators that encompass various facets of investor sentiment, drawing from data primarily referenced Baker and Wurgler (2007) and Baker and Wurgler (2006)

Fama-French 5 factor data

In this study, the Fama–French five-factor dataset and risk-free rates were incorporated from the Fama–French data library and merged with the primary dataset. A summary of the variables can be found in Table 3, whereas an in-depth discussion of their construction and economic rationale can be found in Fama and French (2015) work. The collected data span from July 1963 to the present, with a monthly frequency.

Table 3 Factors from Fama–French five-factor model

Stock abnormal return

Over the years, the field of finance has witnessed significant changes in the prevalence of factor models used for asset pricing. Starting from the classic CAPM (1964), the evolution of asset pricing models led to the intertemporal capital asset pricing model (ICAPM) by Merton (1973). The FF3 proposed by Fama and French (1992) has gained popularity, because it can better explain a cross-section of stock returns than the CAPM. However, it has been replaced by the more recent FF5 introduced by the same authors, Fama and French (2015). Moreover, researchers have proposed unofficial extensions to the FF5 model, such as the Fama–French five-factor model with momentum (FF5F+M) and the Fama–French six-factor model (FF6), which includes a volatility factor, as suggested by Harvey et al. (2016). To ensure that the study covers historical trends without drifting too much from the main academic consensus, we aim to compare the abnormal returns obtained using the CAPM, FF3, and FF5 models, given the long time horizon of our analysis spanning over five decades.

Figure 6 graphically represents the key features of excess stocks and abnormal returns. The left panel shows the cumulative monthly average excess and abnormal returns, together with the shaded NBER suggested recession periods. The cumulative excess return displays a clear upward slope, indicating a considerable increase of over four times from the first month to the last month. Several drawdowns occurred during or near the NBER suggested recession period. By contrast, the cumulative abnormal returns from the CAPM model oscillate within the range of [0,1], whereas those from the three- and five-factor models fluctuate alongside the zero bar. Among the abnormal stock returns derived from different factor models, the abnormal return generated by the five-factor model deviates the least from the zero line, suggesting that it is the best factor model for predicting stock returns among other factor models. The right panel of Fig. 6 displays the monthly standard deviations of excess stocks and abnormal returns. Generally, the monthly standard deviations of excess and abnormal stock returns overlap. However, the standard deviation of the abnormal stock returns derived from the FF5 model was the most dispersed. Table 4 presents a summary of the statistics for various measurements of stock returns.

Table 4 Summary statistics of stock returns
Fig. 6
figure 6

Excess and abnormal returns. The left panel of this figure shows the monthly average excess and abnormal return accumulated for the whole time period in our dataset. The right panel presents the standard deviation of stock excess and abnormal returns in each month

Result

Univariate long-short Portfolios’ return

Before conducting the main analysis, we examine the predictive power of the firm characteristic variables with respect to different stock return measures. To do so, we construct univariate long-short portfolios based on each of the 45 firm characteristic feature variables. Four firm characteristic variables (ShareIss1Y,ShareRepurchase, DelFINL, and DelLTI) cannot be sorted into deciles and thus, are excluded from this analysis. This approach involves sorting stocks into deciles based on the remaining 45 firm characteristic variables. We then purchase stocks in the top decile and sell them in the bottom deciles to build the long–short portfolios. To ensure comparability, all firm feature variables are signed to ensure that the top decile exhibits higher returns than the bottom deciles, on average. Portfolios are subject to periodic rebalancing contingent on the update frequency of the respective firm characteristic variables. For instance, variables such as investment in property, plants, & equipment (InvestPPEInv) and the change in net operating assets (dNoa) undergo annual updates, whereas others such as 12-month momentum (Mom12m) and long-run reversal (LRreversal) receive monthly updates. Transaction costs are an important consideration, referring to Avramov et al. (2023) and Jensen et al. (2022), particularly when it comes to rebalancing the portfolio based on the update frequency of each specific firm characteristic variable. However, the primary objective of this analysis is to demonstrate the predictive power of each firm’s characteristic variables. Therefore, the exact cumulative returns achieved by each portfolio were less critical in the context of this analysis. Additionally, the varying update frequencies of these characteristic variables, ranging from monthly to annual, would complicate the consideration of transaction costs and potentially obscure the comparison of their predictive powers of these characteristic variables. Hence, the consideration of transaction costs was relaxed in this study.Footnote 2 It is worth mentioning that the univariate long–short portfolio strategy solely concentrates on the predictive ability of individual firm characteristic variables. It disregards the time-varying patterns in the panel data, does not consider potential interaction effects among different variables, and overlooks the non-linear relationships between firm characteristic variables and returns.

Figure 7 shows the top and bottom three long–short univariate portfolios in terms of cumulative returns over the entire sample period using various measures of stock returns. The main finding is that regardless of the different measures of stock returns employed, the most successful univariate long–short portfolios are those based on short-term reversal (STreversal). This finding aligns with the conclusions of Welch and Goyal (2008), who demonstrate that historical average excess stock returns provide more accurate forecasts of future excess stock returns than do regressions of excess returns on predictor variables. By contrast, the bottom three long–short portfolios exhibit cumulative returns that oscillate around the zero line, indicating a lack of predictive power in the associated firm characteristic variables for stock returns. The cumulative returns of portfolios based on other firm characteristics are not shown in this figure, although most achieve significant cumulative returns at the end of the sample period. It is important to highlight that the worst performing portfolios differ depending on the specific measures of stock returns used. Nonetheless, it is essential to acknowledge that this analysis does not consider potential non-linear relationships and interaction effects among the different characteristic variables.

Fig. 7
figure 7

Univariate long–short Portfolios’ Cumulative Return. This graphic showcases the cumulative returns of univariate long-short portfolios sorted based on firm characteristics over the entire sample period. Each panel represents different measures of stock returns, and within each panel, we highlight the four best-performing and four worst-performing portfolios in terms of cumulative returns throughout the entire sample period

Subsequently, an analysis of univariate long–short portfolio performance across various macroeconomic conditions was conducted. In this study, the analysis involves 14 macroeconomic factors from three dimensions, collectively capturing the overall economic conditions. Similar to firm characteristic variables, these macroeconomic factors display varying predictive power with respect to stock returns. In the initial exploration of the interaction between the macroeconomic and firm characteristic variables, we focused solely on the interaction effect of the CFNAI. The CFNAI indicates that a positive index reading corresponds to growth above the trend, whereas a negative index reading corresponds to growth below the trend. Using the CFNAI index, data were sorted into tertiles, defined as expansion, normal, and recession periods from top to bottom, based on their CFNAI values. In Table 5, the first two columns present the Sharpe ratio and mean return for each univariate long–short portfolio across the entire sample period, based on abnormal stock returns calculated using the FF5 model. Notably, approximately half of these long–short portfolios display an average monthly return exceeding 0.1%, with the highest Sharpe ratio reaching 0.035. This observation aligns with the findings in Fig. 7, where certain firm characteristic variables demonstrate higher predictive power for stock returns, whereas others do not.

In Table 5, columns 3–8 provide information on the Sharpe ratio and mean return of univariate long–short portfolios under different macroeconomic conditions. A comparison of these numbers across three distinct macroeconomic scenarios is presented in Fig. 8, that is, a horizontal comparison of the portfolios formulated by the same firm characteristic variable but under different macroeconomic conditions. First, we compare each long–short portfolio’s Sharpe ratio under these three macroeconomic conditions and subsequently rank the macroeconomic conditions based on the portfolios’ Sharpe ratios. Panel A of Fig. 8 presents the analysis results. Among the 45 long–short portfolios, 17 achieved their highest Sharpe ratios during the expansion period, whereas 25 portfolios exhibited their lowest Sharpe ratios during normal periods. With the same methodology, Panel B presents the comparison of different macroeconomic conditions based on the mean returns of the portfolios. Among the 45 long–short portfolios, 21 achieved their highest mean returns during recession periods, whereas 26 portfolios exhibit their lowest mean returns during normal periods. This initial exploration demonstrates that univariate long–short portfolios tend to achieve higher Sharpe ratios and mean returns during recession and expansion periods when the market is more volatile. Conversely, they tended to achieve lower Sharpe ratios and mean returns when the market was relatively stable during normal periods.Footnote 3 These findings lead to the conclusion that an interaction effect exists between firm characteristics and macroeconomic conditions when predicting stock returns.

Fig. 8
figure 8

Portfolio performance and macroeconomic conditions. This graphic showcases firm characteristics sorted long–short univariate portfolios’ performance across different macroeconomic conditions. We compare univariate long–short portfolios’ Sharpe ratio and mean return in recession, normal, and expansion periods, which is the horizontal comparison among the last 6 columns in table 5

Table 5 FF5 abnormal return: univariate long–short Portfolios

Predicting with neural network model

In the main analysis, neural network models are used for stock return prediction, with the aim of capturing non-linear relationships and interaction effects. Using the 49 firm characteristics listed in Table 2, we specify the methodology employed in this study as follows. First, we randomly split the sample into three equal-sized subsamples: training, validation, and test datasets. The neural network model is then trained using the training dataset. Subsequently, we conduct a meticulous tuning process to optimize all relevant hyperparameters, with the primary objective of maximizing the R-squared value in the validation dataset. A summary of the hyperparameters and final model settings can be found in Table 1. In the final stage, the fitted model, along with different sets of feature variables, is used to predict stock returns for the following month in the testing dataset. This ensured that all the predictions were based on out-of-sample data. In the context of using ML methods for stock return prediction in finance, “out-of-sample” refers to the data that have not been used during the model training and parameter tuning phases and are instead reserved to test the model’s performance. Predicting “unseen data” can provide a more realistic evaluation of the potential predictive accuracy in real-world scenarios.

Predicting performance

We consistently adopt this methodology to predict different measures of stock returns, aiming for a comparative analysis. The results are summarized in Table 6. In the first row of the table, the R-squared value in the testing dataset is presented, using the 49 firm characteristic variables exclusively. It is important to emphasize that the performance of the model displays minimal variation across different stock return measures. This is shown by the R-squared values for the different target variables, which remain near 0.7%. Among these measures, the model demonstrates the highest accuracy when predicting CAPM abnormal returns, with an R-squared value of 0.825% in the testing dataset. Conversely, the model’s predictive accuracy was lowest when forecasting FF5 abnormal returns, yielding an R-squared value of 0.664% for the testing dataset.

Table 6 Full sample R-squared values

As extensively addressed in the literature, including studies by Cochrane et al. (2005), Welch and Goyal (2008), and as examined in  “Univariate long–short Portfolios’ return” section, macroeconomic variables exhibit predictive capabilities for stock returns. Moreover, interaction effects exist between firm characteristic feature variables and macroeconomic conditions in the context of stock return predictions. In the following analysis, to encompass a comprehensive view of the economic landscape in stock return prediction, we gradually incorporate 14 macroeconomic factors from three distinct dimensions. These 14 distinct macroeconomic variables are from the financial market, real economic activities, and investor sentiment, with the expectation of capturing macroeconomic conditions as comprehensively as possible. In Table 6, each subsequent row reports the R-squared values with all the firm characteristic variables, along with the addition of one more macroeconomic variable. The rows progress from the top, which includes only the firm characteristic variables, to the bottom, where all 49 firm characteristic variables are combined with all 14 macroeconomic factors for prediction. As shown in the table, there is a consistent increase in the predictive power of the neural network models, as indicated by the rising R-squared values for predicting stock returns. The R-squared values for predicting ’excess stock returns increased from 0.78 to 5.47%, while those for predicting abnormal stock returns based on FF5 models increased from 0.66 to 0.95% after incorporating all macroeconomic factors.

Fig. 9
figure 9

Full sample R-squared values with macroeconomic variables. This graphic shows the R-squared values in stock return prediction through neural network models, with the impact of an increasing number of macroeconomic factors. Moving from the far left to the far right, it demonstrates the R-squared values, starting with only firm characteristic variables and gradually incorporating all 14 macroeconomic factors together

Although the addition of macroeconomic variables led to a significant increase in the predictive power of the neural network models, the extent of improvement varied when predicting different stock return measures. To compare the performances of the models in predicting different stock return measures, we visualize the results in Table 6 and Fig. 9. As the graph shows, there was a substantial enhancement in the prediction of excess stock returns. The model’s predictive power increases more than seven times when all firm characteristics and macroeconomic variables are incorporated, elevating the R-squared values from 0.78 to 5.47%. By contrast, improvements in predicting abnormal stock returns are relatively moderate. Notably, the neural network models behave most similarly in predicting abnormal stock returns to FF3 and FF5 models, as the increase in R-squared values follows a consistent pattern as more macroeconomic variables are added. Even though the CAPM model is the simplest factor model, the abnormal returns derived from it still exhibit significant differences from excess stock returns, but behave more similarly to abnormal returns from the FF3 and FF5 models.

Neural network Portfolios

Next, we build value-weighted portfolios based on model predictions. Each month we sort stocks into deciles based on their predicted returns, in each decile we assign weights to each component stock based on their market capitalization. It is important to note that this portfolio formulation process involves rebalancing portfolios every month and does not consider transaction costs for the same reason as mentioned before. Figure 10 shows the portfolios’ cumulative returns with respect to various measures of stock returns throughout the sample period using only firm characteristic variables for the prediction. The prediction-based portfolios can be differentiated from good to bad regardless of the kind of measures used for stock returns. By following this investing strategy, we could achieve approximately seven times the cumulative abnormal returns across the sample period while avoiding potential losses at the same level. The portfolio spectrum is almost symmetrically allocated on both sides of the zero line for abnormal stock returns derived from the different factor models. Panel (A) of this figure depicts the cumulative value-weighted portfolios based on the prediction of excess stock returns. By contrast to abnormal returns, most portfolios present upward slopes across time instead of being nearly symmetrically distributed on both sides of the zero line. The model can better differentiate between good and bad portfolios in terms of cumulative returns; investing in the top decile can achieve nearly 12 times the excess return, while investing in the bottom decile would result in a negative return across the entire sample period.

Fig. 10
figure 10

Cumulative return with only firm features. This graphic presents the cumulative return of value-weighted portfolios based on predictions made solely with firm characteristic variables using neural network models. The portfolios are re-balanced each month based on model predictions without considering transaction costs

We employ a similar strategy to construct value-weighted portfolios, using predictions based on firm characteristic variables in conjunction with all 14 macroeconomic variables. The cumulative returns of these portfolios are shown in Fig. 11. As previously explored, the highest R-squared value in the testing dataset, listed in Table 6, is achieved when predicting using a combination of firm characteristic features and all 14 macroeconomic variables, indicating the highest level of prediction accuracy. A comparison between the cumulative returns of value-weighted portfolios based on all the variables and those based solely on firm features corroborates our previous findings. In Fig. 11, we can better distinguish the performance of different portfolios sorted based on the model predictions, especially for excess stock returns. The portfolio formulated using the top decile of the predictions achieves 13 cumulative returns across the entire sample period, whereas the portfolio formulated using the bottom decile of the predictions results in a loss of more than five times. It is important to note that these improvements extend beyond predicting excess stock returns; there are also enhancements in predicting abnormal stock returns. While linear factor models capture the linear relationship of macroeconomic conditions, they may overlook the interaction effects and non-linear relationships, which are effectively captured by neural network models.

Fig. 11
figure 11

Cumulative return with all features. This graphic presents the cumulative return of value-weighted portfolios based on predictions made with firm characteristic variables together with all 14 macroeconomic variables using neural network models. The portfolios are re-balanced each month based on model predictions without considering transaction costs

Long–short Portfolios based on prediction

In the univariate long–short portfolio analysis, the most successful portfolio was created using the short-term reversal (STreversal) factor. Over the entire sample period, this portfolio demonstrates outstanding performance, achieving more than 12 cumulative excess returns, nearly 14 abnormal returns based on the CAPM factor model, and over 17 and 20 abnormal returns derived from the FF3 and FF5 models, respectively. To compare the performance of long–short portfolios based on neural network model predictions with those constructed from univariate sorted portfolios, we initially sorted model-predicted returns into deciles. Subsequently, we select the top decile for a long position and the bottom decile for a short position, thereby forming equally weighted long–short portfolios based on predictions. We present the cumulative returns of these long–short portfolios for various measures with different information sets in Fig. 12. Cumulative returns can exceed 25 times for portfolios formulated using neural network model predictions that incorporate both firm characteristics and macroeconomic variables, regardless of the specific measure used for stock returns.

It is important to note that even in Fig. 7, the best-performing univariate long–short portfolio based solely on short-term reversal (STreversal) achieved only 20 abnormal returns derived from the FF5 model. Most other univariate long–short portfolios do not generate cumulative returns that exceed 10 times. When we examine each panel in Fig. 12, we observe that the red line, representing prediction-based long–short portfolios with both firm features and all 14 macroeconomic variables, outperforms most portfolios based on predictions with only firm features or a limited set of macroeconomic variables. This is also evident from the blue lines, which represent model prediction-based long–short portfolios with only firm characteristic variables that have achieved the lowest cumulative returns, regardless of which measure of stock returns is considered. Again, when comparing the different measures of stock returns, the addition of macroeconomic variables has the most significant impact on the cumulative returns of long–short portfolios, particularly excess stock returns. As previously discussed, linear factor models already capture macroeconomic conditions, and their addition does not lead to significant improvements in abnormal returns compared to predicting excess stock returns.

Fig. 12
figure 12

Cumulative return of long–short Portfolios based on prediction. This graphic showcases the cumulative returns of equal-weighted long-short portfolios formulated using neural network models’ predictions based on various predictor combinations. The graphic consists of four panels, each illustrating cumulative portfolio returns based on different measures of stock returns. Within each panel, we present cumulative long-short portfolio returns generated by neural network models’ predictions using different sets of predictor variables

Feature importance

As we show in the univariate long–short portfolio analysis, some predictors exhibit stronger predictive power, whereas others are less influential in predicting stock returns. In the subsequent analysis, we assessed the relative importance of firm characteristics and macroeconomic variables in predicting various measures of stock returns. To quantify the significance of features in the neural network model, we employ SHAP (SHapley Additive exPlanation) values, a novel approach introduced by Lundberg and Lee (2017). SHAP values track how the predicted outputs change when conditioning specific variables. Figure 13 illustrates the relative importance rankings of firm characteristics and macroeconomic variables for predicting various measures of stock returns. This includes the importance of each specific variable and the cumulative feature importance within each group. For a comprehensive list of these groups, please refer to Table 2.

The importance of feature variables varies in predictions when different measures of stock returns are compared. For instance, when predicting abnormal stock returns using the FF3 and FF5 models, the most critical feature variables belong to a group of Momentum and Trading Fractions, such as short-term reversals and past trading volumes (DolVol). Although macroeconomic variables contribute to the prediction, their relative importance is low. Conversely, macroeconomic variables play a dominant role in predicting excess stock returns. Among the macroeconomic variables, those from the investor sentiment and financial market groups rank highest in relative importance within the neural network. Similarly, when predicting abnormal stock returns derived from the CAPM, macroeconomic variables from investor sentiment take the lead, which is consistent with the prediction of excess stock returns. The most influential variables are from the Trading Fractions and Momentum groups, aligned with predicting abnormal stock returns using the FF3 and FF5 models. Abnormal returns derived from the simplest factor model, the CAPM, exhibit similarities with both excess stock returns and abnormal returns derived from FF3 and FF5 models.

As depicted in Fig. 6, abnormal stock returns are adjusted using factor models to remove periodic fluctuations related to the macroeconomic conditions inherent in stock returns. Only non-linear relationships and interactive effects from macroeconomic variables contribute to the prediction of abnormal stock returns because linear relationships have already been captured by linear factor models. Consequently, the relative importance of the macroeconomic variables in predicting abnormal stock returns is low. However, macroeconomic variables contain valuable information that contributes the most to predicting excess stock returns, which has not been touched upon by factor models. Hence, the macroeconomic variables rank highest in terms of relative feature importance for predicting excess stock returns. Finally, when predicting abnormal stock returns using the CAPM, the relative feature importance shows mixed evidence. This is because the original CAPM solely employs market returns to capture macroeconomic conditions, which fail to capture a substantial portion of macroeconomic conditions and neglect the periodic fluctuations associated with macroeconomic conditions. In summary, the importance of macroeconomic variables in predicting different measures of stock returns depends on the specific measure being considered and the adjustment made to the returns.

Fig. 13
figure 13

Shap feature importance. This figure shows the importance ranking for predictor variables and variable groups in predicting stock abnormal returns derived from factor models and stock excess return. The ranking is the average of the feature variables’ Shap force. The variable importance measures are evaluated on the testing dataset. Group Shap importance is the feature importance’s sum value within each group

Interaction effects

In the analysis of interaction effects, we explore how a short-term reversal (STreversal) interacts with two macroeconomic indicators: the CFNAI and the investor sentiment index. We specifically chose these variables because of the pronounced influence of short-term reversals and macroeconomic indicators, which typically act as feature variables in neural network models dedicated to stock return predictions. In addition, the CFNAI and investor sentiment index are comprehensive indicators of the overall health of real economic activities and investors’ enthusiasm levels, respectively. However, it is important to note that the interaction effects observed in the neural network model do not imply a causal relationship between the predictor variable and outcomes, nor do they offer actionable investment insights into practical scenarios. Instead, these findings are specific to the context of stock return predictions within specific modeling frameworks.

Figure 14 illustrates the interaction effect between short-term reversals and the macroeconomic variable CFNAI. Notably, we observe a strong interplay between these variables in predicting both excess and abnormal stock returns derived from the CAPM. However, when predicting abnormal stock returns using the FF3 and FF5 models, the interaction effects were less pronounced. This finding reinforces our previous results, indicating that abnormal stock returns derived from the CAPM exhibit similar properties to excess stock returns when fitted to neural network models. Meanwhile, abnormal stock returns derived from the FF3 and FF5 models lack the periodic variance associated with macroeconomic conditions, as they have been accounted for and removed by the factor models. Consequently, the interaction effects of the macroeconomic variables are not as prominent in these cases.

Fig. 14
figure 14

Interaction effect between STreveral and CFNAI. This figure shows the sensitivity of expected monthly returns (vertical axis) to interaction effects for short-term reversal and macroeconomic variable the CFNAI index in the Neural network model (holding all other covariates fixed at their median values, while the macroeconomic variable the CNFAI index is taken at 10%, 30%, 50%, and 70% level)

Figure 15 presents the interaction effects between short-term reversals and the macroeconomic variable of the investor sentiment index. Consistent with our previous findings, we observe that the interaction effects are more pronounced when predicting excess and abnormal stock returns derived from the CAPM. Conversely, the interaction effects are less evident when predicting abnormal stock returns derived from the FF3 and FF5. These results further reinforce our earlier conclusions, highlighting the consistent patterns observed across different macroeconomic variables. Stronger interaction effects in predicting excess stock returns and CAPM-derived abnormal returns suggest that short-term reversals and investor sentiment synergistically contribute to these stock return measures. However, the diminished interaction effects in predicting abnormal returns derived from the FF3 and FF5 models indicate that these factor models have already accounted for and incorporated the influence of investor sentiment, thus diminishing its additional impact.

Fig. 15
figure 15

Interact effect between STreveral and investor sentiment. This figure shows the sensitivity of expected monthly returns (vertical axis) to interaction effects for short-term reversal and macroeconomic variable the investor sentiment index in the Neural network model (holding all other covariates fixed at their median values, while the macroeconomic variable the investor sentiment index is taken at 10%, 30%, 50%, and 70% level

Market timing based on macroeconomic conditions

In the previous sections, we find a clear interaction effect between the macroeconomic variable CFNAI and the firm characteristic variable short-term reversal (STreversal). We also confirm that the inclusion of macroeconomic variables substantially improves the predictive accuracy of the neural network models. This prompts us to consider whether we can identify the optimal market timing to enhance the performance of our ML-driven portfolios. In this section, we extend the approach outlined in  “Neural network Portfolios” section to construct prediction-based portfolios by using a neural network model. However, instead of constructing portfolios using the entire sample, we follow the strategy used in univariate long–short portfolio analysis and divide the entire sample into tertiles based on the macroeconomic variable CFNAI. Following the methodology described in “Univariate long–short Portfolios’ return” section, we define each tertile as representing the expansion, normal, and recession periods. Within each economic period, we sort the stocks into deciles on a monthly basis. Table 7 presents the monthly mean returns and Sharpe ratios for each portfolio. The final row of each panel reports the mean return and Sharpe ratio of the long–short portfolio. The long–short portfolios are constructed by buying stocks in the top decile and selling stocks in the bottom decile.

By segmenting the sample data based on CFNAI, our objective is to capture the varying performance of prediction-based portfolios across different economic conditions. Interestingly, we consistently observe that, more often than not, both the long–short portfolios and long-only portfolios yield higher mean returns and Sharpe ratios during recession and expansion periods. It is notable that only a few long-only portfolios managed to achieve higher mean returns or Sharpe ratios during the “normal” period, characterized by CFNAI index values within the middle range, which signifies a more stable market environment with lower volatility. Moreover, for each long-only portfolio, we observe a distinct progression in the mean return and Sharpe ratio from the top to the bottom decile across different economic periods, regardless of the specific measure of stock return being examined. In conclusion, ML-based portfolios tend to exhibit improved performance, with higher mean returns and Sharpe ratios during recession and expansion periods, when the CFNAI reflects extreme values characterized by higher market volatility.

Table 7 Market timing based on CFNAI

Conclusion

This study offers an in-depth exploration of the network models across various stock return measures. When relying solely on firm characteristic variables, neural network models exhibit consistent performances across different stock return measures. However, including macroeconomic variables significantly boosts the predictive accuracy for excess stock returns, as evidenced by the substantial increase in out-of-sample R-squared values. By contrast, the enhancement in predicting abnormal stock returns derived from different factor models is modest.

The analysis of feature importance in predicting various stock return measures reveals distinct patterns. Macroeconomic variables emerged as the primary drivers of excess stock returns. However, when predicting stock abnormal returns derived from the FF3 and FF5 models, the variables from firm characteristic groups Momentum and Trading Fractions dominate. In the context of predicting abnormal returns based on the CAPM model, both macroeconomic variables from the investor sentiment dataset and firm characteristic variables from the Trading Fractions group take precedence. This divergence can be attributed to the nuances of the factor models, in which advanced factor models effectively adjust for and remove the periodic fluctuations associated with macroeconomic conditions. By contrast, the primitive CAPM relies solely on market returns to account for macroeconomic variations, missing a significant amount of macroeconomic information and disregarding associated periodic fluctuations. This makes its predictions similar to those of excess stock returns.

Furthermore, this study uncovers strong interaction effects between macroeconomic variables and firm-specific characteristics in predicting excess and abnormal stock returns derived from the CAPM. This interplay highlights the significance of considering macroeconomic conditions and their interactions with firm-specific factors to understand stock performance. These findings also offer valuable guidance for market timing strategies, suggesting that portfolios informed by neural network models tend to outperform during both recession and expansion periods, particularly when market volatility is heightened.

In summary, this study contributes to the advancement of empirical asset pricing research by exploring the capabilities of neural network models in predicting different measures of stock returns. The insights gained from this study can inform future research and assist market participants in making informed investment decisions by comprehensively considering the firm-specific characteristics, macroeconomic factors, and interaction effects. Despite its contributions, this study had several limitations. First, we chose a 36-month data window to estimate \(\beta s\) to compute abnormal returns; different choices of rolling windows may introduce potential bias. Additionally, neural network models are highly dynamic, and variations in the model structure or hyperparameters may yield slightly different results, especially concerning the interaction effects highlighted in this study. Second, although our methodology can be readily applied to other stock markets or different periods, we did not conduct a comprehensive robustness analysis. Thirdly, the factor models examined are not exhaustive, omitting emerging factors like Momentum (Carhart 1997), Liquidity (Acharya and Pedersen 2005), and Green-Brown (Pástor et al. 2022), etc. from recent asset pricing literature for example. Finally, although we identified the underlying reasons for the similarities and variances across different measures of stock returns, formal empirical validation of these hypotheses remains lacking. Addressing these nuances and embarking on pragmatic hypothesis testing will undoubtedly be valuable for future studies.