Recurrent double-conditional factor model

Fieberg, Christian; Liedtke, Gerrit; Poddig, Thorsten

doi:10.1007/s00291-024-00771-1


Original Article
Open access
Published: 02 July 2024

OR Spectrum (2024)


Abstract

In economic applications, the behavior of objects (e.g., individuals, firms, or households) is often modeled as a function of microeconomic and/or macroeconomic conditions. While macroeconomic conditions are common to all objects and change only over time, microeconomic conditions are object-specific and thus vary both among objects and through time. The simultaneous modeling of microeconomic and macroeconomic conditions has proven to be extremely difficult for these applications due to the mismatch of dimensions, potential interactions, and the high number of parameters to estimate. By marrying recurrent neural networks with conditional factor models, we propose a new white-box machine learning method, the recurrent double-conditional factor model (RDCFM), which allows for the modeling of the simultaneous and combined influence of micro- and macroeconomic conditions while being parsimoniously parameterized. Due to the low degree of parameterization, the RDCFM generalizes well and estimation remains feasible even if the time-series and the cross-section are large. We demonstrate the suitability of our method using an application from the financial economics literature.


1 Introduction

Modeling how the behavior of objects, for example, individuals, households, or firms ($\varvec{y}_t$), depends on macroeconomic ($\varvec{z}_t$) and/or microeconomic ($\varvec{C}_{t}$) conditions is a common interest in business studies, economics, and other research areas. Macroeconomic conditions, in a broader sense, are environmental variables that are common to all objects and thus have only a time-series dimension, while microeconomic conditions are object-specific and have both a time-series and a cross-sectional dimension. While the former affect the time-series variability of all objects simultaneously (although to varying degrees), the latter drive the cross-sectional variation among the objects as well as their individual temporal evolution, provided that these object-specific conditions change. A well-known example of such applications in academic research and business practice is the prediction of customer shopping behavior (e.g., Scholdra et al. 2021). For example, Ma et al. (2011) analyze the role of macroeconomic variables $\varvec{z}_t$ (e.g., gasoline price and gross domestic product (GDP) growth), and Kamakura and Du (2012) examine how the microeconomic conditions (i.e., characteristics) of households $\varvec{C}_t$ (e.g., a household’s net income or demographics) explain dispersion in households’ total spending ($\varvec{y}_t$). Another application that is frequently studied in accounting, finance, and operations research is the prediction of financial distress or bankruptcy (e.g., Liang et al. 2016; Abid et al. 2022). Hernandez Tinoco and Wilson (2013) study how the probability of a firm becoming financially distressed or bankrupt ($\varvec{y}_t$) depends on macroeconomic $\varvec{z}_t$ (e.g., retail price index or treasury bill rates) and microeconomic $\varvec{C}_t$ (e.g., liabilities-to-assets and equity price) variables.
The list of further potential applications is long; therefore, we summarize further examples from various research fields in Table 1.

Table 1 Selected examples


Table 1 shows that modeling the influences of both macroeconomic and microeconomic variables on response variables is a common problem in a wide variety of economic applications. The mismatch between the dimensions of the macroeconomic and microeconomic variables and the potential existence of interaction effects pose challenges in modeling. In general, there are two major approaches to modeling $\varvec{y}_{t}$, namely, cross-sectional and time-series approaches. A common methodological choice to model a large cross-section over time is to stack cross-sectional and time-series observations into panel matrices and estimate the relationships using methods such as pooled ordinary least squares (POLS) or feed-forward neural networks (FFNNs). Potential interactions of micro and macro variables are accounted for by creating interaction terms. However, this approach faces two drawbacks. First, interacting macro- and microeconomic variables increases the dimensionality, which requires the estimation of many parameters.^{Footnote 1} Second, because the data set is stacked, POLS or FFNNs treat the entire data set as a cross-section; thus, no temporal dynamics can be accounted for. In time-series analyses, methods such as vector autoregressive (VAR) models (Sims 1980) or recurrent neural networks (RNNs) (Rumelhart et al. 1986b; Elman 1990) are used to capture the temporal dependencies in the data. However, both approaches suffer considerably from the curse of dimensionality, meaning that only a few response variables and no microeconomic conditions can be considered.^{Footnote 2} Accordingly, it is desirable to have a method that can extract the relevant time-series and cross-sectional information from a potentially large number of macroeconomic and microeconomic variables, respectively.
Furthermore, the method should be able to model the simultaneous and combined influence of both types of information and to capture the temporal dynamics of the macroeconomic variables. At the same time, the dimensionality of the parameter space should not be inflated, both to keep interpretation and estimation feasible and to achieve better generalization (i.e., out-of-sample performance).

In this paper, we present a novel approach that models the time-series and the cross-section simultaneously, satisfying the above-mentioned desired properties. Specifically, we assume that the response variables $\varvec{y}_{t}$ follow a factor structure with factors $\varvec{f}_{t}$ and factor loadings $\varvec{B}_{t}$, where the factors follow a VAR process. The time-series factors represent common state variables that drive the common temporal variation in the response variables, and the loadings explain the “reaction” of each response to changes in the factors. Accordingly, factor loadings explain the cross-sectional dispersion among the responses. Both factors and factor loadings are unobservable; however, we assume that macroeconomic variables $\varvec{z}_{t}$ are informative for identifying the factors and that microeconomic variables $\varvec{C}_{t}$ are informative for explaining the differences in factor sensitivities. Thus, our model is based on three model features, each of which has been applied successfully in the literature on its own. Because we condition both factors and factor loadings on observable variables and allow for an autonomous dynamic factor process, we call our model the recurrent double-conditional factor model (RDCFM).

Each of the three model features can be motivated both economically and technically. We discuss both perspectives throughout the paper and therefore highlight only the central arguments at this point. The first model feature represents the basic element of our model and is inspired by conditional factor models, as, for example, studied in Ferson and Harvey (1991, 1993) or Maio (2013). That is, we condition unobservable factors $\varvec{f}_{t}$, which are assumed to drive the common time-series variation in the response variables, on exogenous macroeconomic variables $\varvec{z}_{t}$. This compresses the relevant information from a potentially large number of noisy macroeconomic conditions into a small number of common factors. In comparison with other dimensionality reduction approaches, such as principal component analysis (PCA), the assumption that factors depend on exogenous variables drastically reduces the number of parameters to estimate and also makes the factors easier to interpret.

The second model feature accounts for a potential autonomous evolution of the conditional factors $\varvec{f}_{t}$ through time by allowing them to follow a VAR process. This turns static conditional factor models into dynamic factor models, which are particularly popular in macroeconomic applications (e.g., Kose et al. 2003; Del Negro and Otrok 2008; Jungbacker et al. 2011; Bai and Wang 2015). The dynamics of the factors are achieved by marrying conditional factor models with RNNs, which have been shown to capture the temporal evolution of dynamic systems effectively (Rumelhart et al. 1986b; Elman 1990). As a result, the conditional factors have an internal memory that makes it possible to retain information on current macroeconomic conditions over several periods. From a technical perspective, the combination of a conditional factor model and an RNN allows for efficient parameter estimation using well-established methods from the machine learning literature.

The third model feature allows factor loadings $\varvec{B}_{t}$ to be conditional on observed microeconomic conditions $\varvec{C}_{t}$, as, for example, suggested in Vermeulen et al. (1993) and Avramov and Chordia (2006). More recently, Kelly et al. (2019a) proposed instrumented principal component analysis (IPCA), a PCA variant that also assumes conditional factor loadings. By conditioning factor loadings on microeconomic variables, we model a response’s “reaction” to changes in the state variables (i.e., factors) as a function of its microeconomic conditions. The estimation of conditional factor loadings has several advantages. First, in large cross-sections, modeling loadings as a function of the responses’ microeconomic conditions reduces the number of parameters to estimate compared to unconditional loadings. This makes model estimation feasible in high-dimensional datasets and increases the generalizability of the model. Second, if microeconomic conditions change over time, the factor loadings also become time-varying, avoiding parameter-rich specifications of time-varying loadings (e.g., Bali et al. 2009, 2017; Su and Wang 2017). Third, as opposed to unconditional loadings, estimating a response’s conditional factor loadings does not require a long time-series for that response, as only its current microeconomic conditions need to be known. Last, conditional loadings allow one to analyze why loadings differ across responses. While unconditional loadings are “black boxes”, modeling them as a function of microeconomic conditions allows one to analyze which variables account the most for cross-sectional variation.

In a broader sense, the RDCFM can be seen as a framework that nests a family of conditional factor models, all of which are applicable to both regression and classification problems. Depending on the specific application at hand, the model features can be enabled or disabled to obtain a model that may be better suited to the respective research application. The only empirical requirement is that the response variables $\varvec{y}_{t}$ and macroeconomic conditions $\varvec{z}_{t}$ are observed over time. At the model’s lowest level, factors are assumed to be conditional on macroeconomic variables, and the additional model features (i.e., VAR process, conditional factor loadings) are deactivated. Accordingly, factor loadings are assumed to be unconditional. We refer to this model as the conditional factor model (CFM). Additionally, one can allow the factors to follow a VAR process to obtain the recurrent conditional factor model (RCFM). When estimating conditional factor loadings, the CFM and RCFM become the double-conditional factor model (DCFM) and RDCFM, respectively. The model feature to estimate conditional loadings is particularly interesting when analyzing a large cross-section because it drastically reduces the number of parameters to estimate. None of these modifications substantially changes the parameter estimation procedure, making this a one-size-fits-all estimator. Depending on the specific application, a model feature may be relevant or irrelevant. Using the Clark and West (2007) test, we show how researchers can identify the relevant model features. In summary, no a priori considerations are required to identify the most suitable model for the specific application at hand. As a result, the RDCFM and its variants can be used in numerous regression and classification applications (see Table 1) and are thus of interest to researchers in various disciplines.

As a hybrid of a conditional factor model and an RNN, the RDCFM itself can also be considered a machine learning method; however, it contributes to the literature on “white-box” machine learning (Loyola-Gonzalez 2019; Nagel 2021) because it is highly interpretable. We demonstrate this using an application from the financial economics literature, namely, the prediction of stock returns. We choose this application because the dataset (1) is publicly available, (2) has been widely used in the finance literature, allowing us to directly compare the performance of the RDCFM with that of existing factor models, and (3) is characterized by a large time-series and cross-section, which makes this an ideal application for the RDCFM. Specifically, we condition the factors on empirical asset pricing factors $\varvec{z}_t$ (e.g., Fama and French 2015) and the factor loadings on stock characteristics $\varvec{C}_t$, such as financial indicators of firms (e.g., Banz 1981; Rosenberg et al. 1985). First, using the Clark and West (2007) test, we show how to identify the relevant model features (i.e., conditional factors, VAR process, and conditional factor loadings). This not only provides transparency into the model’s inner workings but also allows for the identification of crucial information sources driving the predictions. For the application at hand, we find that all model features contribute to the prediction of stock returns. Second, having identified the most appropriate model for the respective application, we show how to identify the relevant drivers within a specific model feature. That is, we show how to assess the importance and statistical significance of the macroeconomic and microeconomic variables. Furthermore, using Granger (1969) causality and impulse-response analysis, we analyze how time-series information is processed in the model.
Within our application, we find that only a few empirical asset pricing factors contribute significantly to model performance, while many asset characteristics explain loadings on conditional factors. Impulse-response analysis reveals that lagged information in empirical asset pricing factors captures relevant information for future returns that is neglected by previous models. Last, we find that the RDCFM outperforms the prevailing models in the literature because it processes a large amount of information while remaining parsimoniously parameterized. Specifically, the RDCFM estimates only 0.5% of the number of parameters estimated by an RNN or PCA and only 9% of the number required by IPCA.

The remainder of this paper is organized as follows. Section 2 describes the fusion of an RNN and conditional factor models to form the RDCFM and discusses the estimation of the model parameters. In Sect. 3, we present our empirical application. Finally, Sect. 4 concludes the paper.

2 Methodology

2.1 Recurrent double-conditional factor model

Assume that the realizations of the response variables $i = 1, \dots , N$ at time $t = 1, \dots , T$ follow a low-dimensional factor structure with $j = 1,\dots , K$ common factors:

$$\begin{aligned} y_{i,t} = f_{1,t} \beta _{i,1,t} + f_{2,t}\beta _{i,2,t} + \cdots + f_{K,t}\beta _{i,K,t} + e_{i,t} \end{aligned}$$

(1)

where $y_{i,t}$ denotes the outcome of the i-th response variable at time t, $f_{j,t}$ denotes the realization of the j-th factor at time t, $\beta _{i,j,t}$ is the individual sensitivity of the i-th response variable to the j-th factor at time t, and $e_{i,t}$ represents the component of $y_{i,t}$ that is not captured by the low-dimensional factor structure for which we assume that $\text {COV}\left( e_{i,t},f_{j,t}\right) = 0 \; \; \forall i,j,t$. Note that, for simplicity and without loss of generality, we drop the intercept in Eq. (1) and in our subsequent derivations. However, the intercept can easily be included by adding a constant factor. In matrix notation, we can rewrite Eq. (1) as follows:

$$\begin{aligned} \varvec{y}_{t} = \varvec{f}_{t} \varvec{B}_{t}^{'} + \varvec{e}_t \end{aligned}$$

(2)

with $\varvec{y}_t$ denoting a $1 \times N$ vector of the response variables, $\varvec{f}_t$ a $1 \times K$ vector of latent factors, and $\varvec{B}_t$ the $N \times K$ matrix of factor loadings at time t; $\varvec{e}_t$ is a $1 \times N$ vector of residuals. The factors $\varvec{f}_{t}$ are common to all objects; however, the sensitivities to these factors $\varvec{B}_{t}$ differ among them. Accordingly, the factors capture global shocks that drive the common time-series variation of all responses, and the factor loadings explain the cross-sectional differences among them. The true factors and factor loadings are not observable in reality; however, we assume that we observe two sets of variables, i.e., macroeconomic $\varvec{z}_t$ and microeconomic $\varvec{C}_t$ variables, that are informative about the factors and the factor loadings, respectively. We first describe the estimation of the conditional factors and then turn to the modeling of conditional factor loadings.
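As a quick check of the row-vector notation, the following NumPy sketch (hypothetical dimensions and random values; the variable names are illustrative, not taken from the authors' code) verifies that the matrix form in Eq. (2) reproduces the element-wise sum in Eq. (1) and that an intercept can be absorbed by appending a constant factor, as noted after Eq. (1).

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 5, 2                                  # hypothetical: 5 responses, 2 factors

f_t = rng.normal(size=(1, K))                # 1 x K row vector of factors
B_t = rng.normal(size=(N, K))                # N x K matrix of loadings at time t
e_t = rng.normal(scale=0.1, size=(1, N))     # 1 x N residuals

# Eq. (2): y_t = f_t B_t' + e_t, a 1 x N row vector
y_matrix = f_t @ B_t.T + e_t

# Eq. (1): y_{i,t} = sum_j f_{j,t} beta_{i,j,t} + e_{i,t}, response by response
y_scalar = np.array([[sum(f_t[0, j] * B_t[i, j] for j in range(K)) + e_t[0, i]
                      for i in range(N)]])

# Intercept via a constant factor: append 1 to f_t and the intercepts to B_t
alpha = rng.normal(size=(N, 1))              # hypothetical response-specific intercepts
f_aug = np.hstack([f_t, np.ones((1, 1))])
B_aug = np.hstack([B_t, alpha])
y_with_intercept = f_aug @ B_aug.T + e_t
```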

The basic building block of our model is the extraction of latent factors from a potentially large number of macroeconomic conditions (e.g., Ferson and Harvey 1991, 1993; Maio 2013). Denote $\varvec{z}_t$ as a $1 \times L$ vector of L macroeconomic variables. For notational simplicity, we implicitly assume that $\varvec{z}_t$ may contain a constant (i.e., a constant environment) as well as lagged variables up to any order, that is, $\varvec{z}_t = [\varvec{z}_{t}^{*}, \varvec{z}_{t-1}^{*},\ldots , \varvec{z}_{t-p}^{*}]$, where p is the maximum lag order and $\varvec{z}_{t}^{*}$ denotes raw, unlagged macroeconomic variables. We assume that $\varvec{z}_t$ is informative about the factor realizations $\varvec{f}_t$, i.e., the factors are conditional on $\varvec{z}_t$. Specifically, the factors are linear combinations of $\varvec{z}_t$, as described in Eq. (3):

$$\begin{aligned} f_{j,t}(\varvec{z}_{t}) = \omega _{1,j} z_{1,t} + \omega _{2,j} z_{2,t} + \cdots + \omega _{L,j}z_{L,t} \end{aligned}$$

(3)

where $f_{j,t}$ denotes the j-th factor at time t, conditional on factor instruments $\varvec{z}_t$. The term $\omega _{l,j}$ represents the time-invariant coefficient of the l-th factor instrument on the j-th factor. In matrix notation, we can write:

$$\begin{aligned} \varvec{f}_t(\varvec{z}_{t}) = \varvec{z}_{t} \varvec{\Omega } \end{aligned}$$

(4)

with $\varvec{\Omega }$ denoting an $L \times K$ matrix of coefficients of the factor instruments. The elements of $\varvec{\Omega }$ are the only free parameters for latent factor extraction, and we choose them such that the factor structure in Eq. (2) best explains the response variables. For details on the estimation procedure, we refer to Sect. 2.3. Technically, the construction of the factors can be conceptualized as dimensionality reduction from a potentially large number of (noisy) macroeconomic predictors to a smaller number of (denoised) macroeconomic state variables that capture only the relevant information of the macroeconomic variables.
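A minimal sketch of Eqs. (3) and (4), assuming a time-first array layout. The helper `build_instruments` is hypothetical: it assembles $\varvec{z}_t$ from a constant and lags 0 to p of the raw variables $\varvec{z}_{t}^{*}$ and drops the first p periods, for which lags are unavailable (one possible convention; the text does not prescribe how to treat the initial periods). The factors then follow from a single matrix product, and the only free parameters are the $L \times K$ entries of $\varvec{\Omega}$.

```python
import numpy as np

def build_instruments(z_raw, p):
    """Hypothetical helper: rows [1, z*_t, z*_{t-1}, ..., z*_{t-p}].

    z_raw: (T_raw, L_raw) array of raw, unlagged macro variables z*_t.
    Returns a (T_raw - p, 1 + L_raw * (p + 1)) array. The first p periods
    are dropped because their lags are not available.
    """
    T_raw = z_raw.shape[0]
    lags = [z_raw[p - lag: T_raw - lag] for lag in range(p + 1)]
    const = np.ones((T_raw - p, 1))
    return np.hstack([const] + lags)

rng = np.random.default_rng(1)
z_raw = rng.normal(size=(12, 3))   # hypothetical: 12 periods, 3 raw macro variables
Z = build_instruments(z_raw, p=1)  # shape (11, 7): constant + 3 current + 3 lagged
L, K = Z.shape[1], 2
Omega = rng.normal(size=(L, K))    # L x K coefficients, the only free parameters here
F = Z @ Omega                      # Eq. (4): row t is f_t = z_t Omega
```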

In time-series analyses, e.g., in macroeconomics, dynamic factor models have proven to be superior to static models; therefore, there have been many attempts to allow for VAR factors (e.g., Kose et al. 2003; Del Negro and Otrok 2008; Jungbacker et al. 2011; Bai and Wang 2015). Dynamic factors capture temporal dependencies in the data, and the model can easily be iterated into the future to obtain more accurate predictions. We therefore transform the static conditional factor representation in Eq. (4) into a dynamic representation by allowing the factors to follow an autonomous dynamic process. Inspired by research on RNNs (Rumelhart et al. 1986b; Elman 1990), we implement dynamic factors as described in Eq. (5):

$$\begin{aligned} \varvec{f}_t(\varvec{z}_{t}, \varvec{f}_{t-1}) = \varvec{z}_{t} \varvec{\Omega } + \tanh \left( \varvec{f}_{t-1} \right) \varvec{AR} \end{aligned}$$

(5)

with $\varvec{AR}$ being a $K \times K$ matrix of VAR(1) coefficients. Accordingly, the factor realizations are conditional on exogenous time-series variables and on their own past realizations, enabling information to persist for multiple periods. The tanh transformation of past factor realizations ensures that the VAR process is stable (i.e., produces stationary factors): because $|\tanh(x)| < 1$, the autoregressive term $\tanh(\varvec{f}_{t-1})\varvec{AR}$ remains bounded for any $\varvec{AR}$. This avoids imposing constraints on the coefficients of the $\varvec{AR}$ matrix, thus increasing computational efficiency, and allows the model to capture complex nonlinear temporal dependencies. The factor specification in Eq. (5) links the conditional factor model to RNNs (e.g., Rumelhart et al. 1986b; Elman 1990), which is a key property of the RDCFM for efficient parameter optimization. We discuss this relationship in Sect. 2.2.
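The recursion in Eq. (5) takes only a few lines of NumPy. The sketch below (hypothetical dimensions and values; `dynamic_factors` and its `squash` switch are our own illustrative names) also shows why the tanh transformation keeps the factors in check: each element of $\tanh(\varvec{f}_{t-1})\varvec{AR}$ is bounded by the corresponding absolute column sum of $\varvec{AR}$, so the factors stay bounded even for an $\varvec{AR}$ matrix that would make a linear VAR(1) explode. This demonstrates boundedness only; it is not a formal proof of stationarity.

```python
import numpy as np

def dynamic_factors(Z, Omega, AR, f0, squash=True):
    """Eq. (5): f_t = z_t Omega + tanh(f_{t-1}) AR, iterated from f_0.

    Z: (T, L) factor instruments, Omega: (L, K), AR: (K, K), f0: (K,).
    squash=False drops the tanh to mimic a plain linear VAR(1), for comparison.
    """
    T, K = Z.shape[0], Omega.shape[1]
    F = np.empty((T, K))
    f_prev = f0
    for t in range(T):
        lagged = np.tanh(f_prev) if squash else f_prev
        F[t] = Z[t] @ Omega + lagged @ AR
        f_prev = F[t]
    return F

rng = np.random.default_rng(2)
T, L, K = 200, 4, 3                    # hypothetical sizes
Z = rng.normal(size=(T, L))
Omega = rng.normal(size=(L, K))
AR = 3.0 * np.eye(K)                   # hypothetical: explosive for a linear VAR(1)

F = dynamic_factors(Z, Omega, AR, f0=np.zeros(K))
F_linear = dynamic_factors(Z, Omega, AR, f0=np.zeros(K), squash=False)

# Element-wise bound implied by |tanh(x)| < 1: |f_tj| <= |(z_t Omega)_j| + sum_k |AR_kj|
bound = np.abs(Z @ Omega) + np.abs(AR).sum(axis=0)
```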

Frequently, estimates for factor loadings are obtained via an ordinary least squares (OLS) regression of the response variable on the factors, which implicitly assumes that loadings do not change over time. However, when modeling dynamic systems, this assumption might not hold. Some attempts in the literature to model factor loadings dynamically are presented in Del Negro and Otrok (2008) and Bali et al. (2009, 2017). However, these approaches require the estimation of many parameters, increasing the complexity of the model. We follow the approach proposed in Vermeulen et al. (1993) and Kelly et al. (2019a, b) and condition factor loadings on microeconomic conditions (i.e., characteristics of the responses), which allows theoretical or empirical evidence to be incorporated into the modeling of factor loadings. For example, Kose et al. (2003) find that small and volatile countries are less sensitive to fluctuations in a global business cycle than large and stable countries. At the firm level, Fama and French (1993) find that small stocks with high book-to-market equity ratios have higher loadings on a distress factor than large stocks with low book-to-market equity ratios. By incorporating such information directly into the modeling of factor loadings, we can obtain more precise estimates and are also able to explain differences in factor loadings in a model-driven manner. Furthermore, we obtain time-varying loadings if the conditioning variables evolve over time.

Denote $\varvec{C}_t$ as an $N \times M$ matrix of characteristics of the responses at time t, with M denoting the number of characteristics. As for the factor instruments, we implicitly assume that $\varvec{C}_t$ may contain lagged characteristics up to any order and a constant characteristic that explains the part of the factor loadings that is independent of all other characteristics. In line with Vermeulen et al. (1993) and Kelly et al. (2019a, b), we obtain conditional factor loadings as linear combinations of these characteristics. Specifically, we estimate an $M \times K$ matrix $\varvec{\Gamma }_{\beta }$ that assigns, for each factor, a fixed coefficient to each characteristic. Accordingly, the conditional sensitivity $\beta _{i,j,t}$ of a response i to a factor j at time t can be written as:

$$\begin{aligned} \beta _{i,j,t}\left( \varvec{C}_t\right) = \gamma _{1,j} c_{i,1,t} + \gamma _{2,j} c_{i,2,t} +\cdots + \gamma _{M,j} c_{i,M,t} \end{aligned}$$

(6)

where $\gamma _{m,j}$ denotes the coefficient of the m-th characteristic for the loading on the j-th factor. The term $c_{i,m,t}$ refers to the m-th characteristic of response i at time t. We rewrite Eq. (6) in matrix notation and obtain the matrix of conditional factor loadings $\varvec{B}_t$ as follows:

$$\begin{aligned} \varvec{B}_t \left( \varvec{C}_t\right) = \varvec{C}_t \varvec{\Gamma }_{\beta } \end{aligned}$$

(7)

By conditioning the factor loadings on observable characteristics, we achieve a drastic parameter reduction compared to unconditional factor loadings if $M < N$. Furthermore, we also allow for time variation in loadings, provided that the characteristics change over time. As a result, the factor loadings not only explain the cross-sectional dispersion among the responses but also model an individual temporal evolution due to changes in the response’s microeconomic conditions.
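The parameter savings from Eq. (7) are easy to quantify. In the following sketch (hypothetical sizes, random data), conditional loadings for N = 1,000 responses, M = 20 characteristics (including a constant), and K = 5 factors require only the M × K = 100 entries of $\varvec{\Gamma}_{\beta}$, whereas freely estimated loadings would need N × K = 5,000 parameters for a single period. The sketch also checks Eq. (7) against Eq. (6) for one response/factor pair and shows that the loadings change with the characteristics while $\varvec{\Gamma}_{\beta}$ stays fixed.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M, K = 1000, 20, 5                  # hypothetical: many responses, few characteristics

C_t = rng.normal(size=(N, M))          # N x M characteristics at time t
C_t[:, 0] = 1.0                        # constant characteristic
Gamma = rng.normal(size=(M, K))        # M x K coefficients Gamma_beta

B_t = C_t @ Gamma                      # Eq. (7): N x K conditional loadings

# Eq. (6) for a single response i and factor j
i, j = 7, 2
beta_ij = sum(Gamma[m, j] * C_t[i, m] for m in range(M))

# New characteristics in the next period give new loadings with the same Gamma
C_next = C_t + rng.normal(scale=0.1, size=(N, M))
C_next[:, 0] = 1.0
B_next = C_next @ Gamma

params_conditional = Gamma.size        # M * K
params_unconditional = N * K           # one free loading per response and factor
```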

After deriving the construction of both the dynamic conditional factors and the conditional factor loadings, we can write the RDCFM as an extended version of Eq. (2) as follows:

$$\begin{aligned} \varvec{y}_{t} = \underbrace{\Big ( \varvec{z}_{t}\varvec{\Omega } + \tanh (\varvec{f}_{t-1}) \varvec{AR} \Big )}_{\varvec{f}_t} \underbrace{\Big (\varvec{C}_{t} \varvec{\Gamma }_{\beta }\Big )^{'}}_{\varvec{B}_{t}^{'}} + \varvec{e}_{t} \end{aligned}$$

(8)

The RDCFM nests several model variants that can be obtained by enabling and disabling certain model features. For example, when estimating unconditional instead of conditional factor loadings, we obtain either a conditional factor model (CFM) in the spirit of Ferson and Harvey (1991, 1993) ($\varvec{AR} = 0$ and unconditional loadings) or a recurrent conditional factor model (RCFM) ($\varvec{AR} \ne 0$ and unconditional loadings). Additionally, we can also estimate a double-conditional factor model (DCFM) with static conditional factors and conditional factor loadings ($\varvec{AR} = 0$ and conditional loadings).
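To make the nesting concrete, the sketch below computes the fitted values of Eq. (8) with switches for the two optional features: a zero $\varvec{AR}$ matrix yields static factors (CFM/DCFM), and a fixed $N \times K$ loading matrix in place of $\varvec{\Gamma}_{\beta}$ yields unconditional loadings (CFM/RCFM). The function names, the time-first array layout, and the parameter-count helper are our own illustrative choices; counting $\varvec{f}_0$ as free only in the recurrent variants is an assumption on our part.

```python
import numpy as np

def rdcfm_predict(Z, Omega, AR, f0, C=None, Gamma=None, B=None):
    """Fitted values of Eq. (8) and its nested variants.

    Z: (T, L) factor instruments; Omega: (L, K); AR: (K, K); f0: (K,).
    Conditional loadings: pass C (T, N, M) and Gamma (M, K)  -> DCFM / RDCFM.
    Unconditional loadings: pass a fixed B (N, K)            -> CFM / RCFM.
    AR = 0 gives static conditional factors f_t = z_t Omega.
    """
    T, K = Z.shape[0], Omega.shape[1]
    F = np.empty((T, K))
    f_prev = f0
    for t in range(T):
        F[t] = Z[t] @ Omega + np.tanh(f_prev) @ AR
        f_prev = F[t]
    if Gamma is not None:
        loadings = C @ Gamma                           # (T, N, K): B_t = C_t Gamma
    else:
        loadings = np.broadcast_to(B, (T,) + B.shape)  # same B in every period
    Y_hat = np.einsum("tk,tnk->tn", F, loadings)       # row t is f_t B_t'
    return F, Y_hat

def n_params(L, K, N, M, recurrent, conditional_loadings):
    """Free parameters of each variant (f_0 counted only when recurrent)."""
    count = L * K                                      # Omega
    if recurrent:
        count += K * K + K                             # AR and f_0
    count += M * K if conditional_loadings else N * K
    return count

rng = np.random.default_rng(5)
T, N, L, M, K = 24, 50, 6, 8, 3                        # hypothetical sizes
Z = rng.normal(size=(T, L))
C = rng.normal(size=(T, N, M))
Omega, Gamma = rng.normal(size=(L, K)), rng.normal(size=(M, K))
AR, f0 = 0.5 * np.eye(K), np.zeros(K)

F_rdcfm, Y_rdcfm = rdcfm_predict(Z, Omega, AR, f0, C=C, Gamma=Gamma)               # RDCFM
F_dcfm, Y_dcfm = rdcfm_predict(Z, Omega, np.zeros((K, K)), f0, C=C, Gamma=Gamma)   # DCFM
```

For a setting with L = 10, K = 3, N = 500, and M = 15, `n_params` gives 1,530 (CFM), 1,542 (RCFM), 75 (DCFM), and 87 (RDCFM) parameters, which illustrates why the conditional loadings dominate the parameter savings in large cross-sections.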

The preceding derivations and the model formulation in Eq. (8) address the solution of regression problems. As discussed in “Appendix 1.2”, the concepts described here can be adapted to classification problems with only a few methodological changes.

2.2 Relation to recurrent neural networks

This section highlights the similarities between RDCFMs and RNNs and discusses the advantages of our model. We focus on “standard” RNNs as originally presented in Rumelhart et al. (1986a) and Elman (1990). An RNN identifies a current hidden system state $\varvec{s}_{t}$ and estimates the output by mapping this state to the responses via an arbitrary activation function g(x). When solving regression problems, it is common practice to assume a linear activation function:

$$\begin{aligned} \varvec{y}_t = \varvec{s}_t \varvec{W_y}' + \varvec{\epsilon }_{t} \end{aligned}$$

(9)

where $\varvec{W_y}$ is a weight matrix of dimension $N \times K$, with K denoting the number of hidden states. The $1 \times K$ vector of hidden states is obtained by:

$$\begin{aligned} \varvec{s}_t = \tanh \left( \varvec{z}_t \varvec{W_z} + \varvec{s}_{t-1} \varvec{W_s} + \varvec{b} \right) \end{aligned}$$

(10)

where $\varvec{W_s}$ is a $K \times K$ weight matrix that describes the transition of the previous state to the next state and $\varvec{W_z}$ is an $L \times K$ weight matrix to map the input variables $\varvec{z}_t$ onto the hidden states. The $1 \times K$ vector $\varvec{b}$ denotes the bias of the hidden states (i.e., a constant state).

The weight matrices $\varvec{W_z}$ and $\varvec{W_s}$ are equivalent to the RDCFM parameter matrices $\varvec{\Omega }$ and $\varvec{AR}$, respectively. Assuming that the factor instruments include a constant factor instrument, the bias term $\varvec{b}$ in Eq. (10) is equivalent to the $\varvec{\Omega }$ coefficient on the constant factor instrument. Accordingly, the hidden states of an RNN are analogous to the latent factors of the RDCFM with the difference that we apply the tanh transformation in the RDCFM only to the past factor realizations. We do so to be more in line with linear conditional factor models; however, if one wishes to be more in line with RNNs, then conditional factors may also be extracted according to Eq. (10). As a result, the extraction of latent factors would be nonlinear. Interpreting the hidden states as latent factors implies that the weight matrix $\varvec{W_y}$ can be interpreted as a matrix of (time-invariant) factor loadings.
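One way to see the correspondence is to track the RNN's pre-activation $\varvec{a}_t = \varvec{z}_t \varvec{W_z} + \varvec{s}_{t-1} \varvec{W_s} + \varvec{b}$, so that $\varvec{s}_t = \tanh(\varvec{a}_t)$. Substituting $\varvec{s}_{t-1} = \tanh(\varvec{a}_{t-1})$ gives $\varvec{a}_t = \varvec{z}_t \varvec{W_z} + \tanh(\varvec{a}_{t-1}) \varvec{W_s} + \varvec{b}$, which has exactly the form of Eq. (5) with $\varvec{AR} = \varvec{W_s}$ and with $\varvec{b}$ appended to $\varvec{W_z}$ as the $\varvec{\Omega}$ row of a constant instrument. In this reading, the RDCFM factors coincide with the RNN pre-activations and the RNN hidden states are their tanh transforms; the two models differ in whether the output layer receives $\varvec{f}_t$ or $\tanh(\varvec{f}_t)$. The sketch below checks this numerically with random, hypothetical weights.

```python
import numpy as np

rng = np.random.default_rng(4)
T, L, K = 20, 3, 2                          # hypothetical sizes
Z = rng.normal(size=(T, L))
W_z = rng.normal(size=(L, K))
W_s = 0.5 * rng.normal(size=(K, K))
b = rng.normal(size=K)
a0 = rng.normal(size=K)                     # hypothetical initial pre-activation

# Standard RNN, Eq. (10): s_t = tanh(a_t) with a_t = z_t W_z + s_{t-1} W_s + b
s_prev = np.tanh(a0)
A, S = np.empty((T, K)), np.empty((T, K))
for t in range(T):
    A[t] = Z[t] @ W_z + s_prev @ W_s + b
    S[t] = np.tanh(A[t])
    s_prev = S[t]

# RDCFM factors, Eq. (5), with a constant instrument carrying the bias
Z_const = np.hstack([Z, np.ones((T, 1))])   # append the constant factor instrument
Omega = np.vstack([W_z, b])                 # last row of Omega plays the role of b
AR = W_s
F = np.empty((T, K))
f_prev = a0
for t in range(T):
    F[t] = Z_const[t] @ Omega + np.tanh(f_prev) @ AR
    f_prev = F[t]
```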

Figure 1 illustrates the RDCFM in analogy to the graphical representation of RNNs as presented in, for example, Zimmermann et al. (2005). The RDCFM is unfolded in time for three time steps, with the first two time steps ($(t-1)$ and t) indicating in-sample periods and the last time step ($t+1$) denoting an out-of-sample time step. The dots indicate that there may be several previous and subsequent time steps $(t-r)$ and $(t+h)$, respectively. The extraction of conditional factors in the RDCFM is equivalent to the construction of the hidden states in an RNN. However, instead of estimating a parameter-rich output weight matrix $\varvec{W_{y}}$, the RDCFM uses information in $\varvec{C}_{t}$ to estimate a time-varying conditional output weight matrix (i.e., factor loadings). The advantage of conditioning the factor loadings increases as the difference between the number of response variables and the number of characteristics grows. We will highlight this throughout our empirical analysis. To summarize, one may regard the RDCFM as an extension of conditional factor models (originating in econometrics), as an extension of RNNs (originating in machine learning), or as a unifying model connecting these two distinct strands of research. It is worth mentioning that, up to the placement of the tanh transformation discussed above, the RNN is included as a special case of the RDCFM when estimating unconditional factor loadings, i.e., the RCFM.

2.3 Parameter estimation

This subsection describes the numerical estimation of the RDCFM parameters for solving regression problems. We discuss the methodological changes related to the solution of classification problems in “Appendix 1.2”. In addition to the model parameters $\varvec{\Gamma }_{\beta }$, $\varvec{\Omega }$, and $\varvec{AR}$, we treat the initialization of the factors $\varvec{f}_0$ for the VAR process as a set of free parameters. Because this adds only K parameters, it does not significantly increase the number of free parameters as long as the number of factors is small. We estimate the parameters by least squares; that is, we minimize the sum of squared residuals:

$$\begin{aligned} \min _{\varvec{\Omega }, \varvec{AR}, \varvec{f}_0, \varvec{\Gamma }_{\beta }} \sum _{t=1}^{T} \sum _{i=1}^{N} \Big ( y_{i,t} - \big ( \underbrace{\left[ \varvec{z}_{t}\varvec{\Omega } + \tanh (\varvec{f}_{t-1}) \varvec{AR}\right] }_{\varvec{f}_t} \underbrace{(\varvec{c}_{i,t} \varvec{\Gamma }_{\beta })^{'}}_{\varvec{\beta }_{i,t}^{'}} \big ) \Big )^2 \end{aligned}$$

(11)

Given a matrix of latent factors, we can solve for $\varvec{\Gamma }_{\beta }$ analytically via linear least squares. However, because the objective contains products of factors and factor loadings and the factors follow a (nonlinear) VAR process, we must solve for $\varvec{\Omega }$, $\varvec{AR}$, and $\varvec{f}_{0}$ numerically via nonlinear least squares. We begin by describing the estimation of the factor loading coefficients $\varvec{\Gamma }_{\beta }$ and continue with the more complex estimation of $\varvec{\Omega }$, $\varvec{AR}$, and $\varvec{f}_0$.^{Footnote 3}

According to Vermeulen et al. (1993) and Kelly et al. (2019a, b), conditional factor loadings can be obtained via a POLS regression of the pooled response variables $\varvec{\widetilde{Y}}$ on a matrix of pooled cross-sectional instruments $\varvec{\widetilde{C}}$ interacted with latent factors $\varvec{\widetilde{F}}$^{Footnote 4}:

$$\begin{aligned} \varvec{\widetilde{Y}} = \left( \varvec{\widetilde{C}} \otimes \varvec{\widetilde{F}}\right) \cdot \text {vec}\left( \varvec{\Gamma }_{\beta }^{'}\right) \end{aligned}$$

(12)

We denote stacked variables with $(\widetilde{\bullet })$, i.e., $\varvec{\widetilde{Y}}$ is an $(NT)\times 1$ vector, $\varvec{\widetilde{C}}$ is an $(NT)\times M$ matrix, and $\varvec{\widetilde{F}}$ is an $(NT)\times K$ matrix, where each observation of the factors is repeated N times. The Kronecker product in Eq. (12) is taken row by row, i.e., each row of $\varvec{\widetilde{C}}$ is interacted with the corresponding row of $\varvec{\widetilde{F}}$. Specifically, the least-squares estimate for $\text {vec}\left( \varvec{\Gamma }_{\beta }^{'}\right)$ is:

$$\begin{aligned} \text {vec}(\varvec{\Gamma }_{\beta }^{'}) = \left( \sum _{t=1}^{\tau } \varvec{C}_{t}^{'} \varvec{C}_{t} \otimes \varvec{f}_{t}^{'}\varvec{f}_{t}\right) ^{-1}\left( \sum _{t=1}^{\tau } \varvec{y}_{t} \left[ \varvec{C}_{t} \otimes \varvec{f}_{t}\right] \right) ^{'} \end{aligned}$$

(13)

The estimates of $\varvec{\Gamma }_{\beta }$ are arranged as an $(MK) \times 1$ vector; therefore, the vector has to be reshaped into an $M\times K$ matrix.
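To make Eq. (13) concrete, the following is a minimal NumPy sketch of the pooled least-squares step for $\varvec{\Gamma }_{\beta }$, assuming a balanced panel (every object observed in every period) and factors stored as row vectors $\varvec{f}_t$. The function name `estimate_gamma_beta` is hypothetical. Accumulating the two sums period by period avoids building the $(NT)$-row pooled design of Eq. (12).

```python
import numpy as np


def estimate_gamma_beta(C, F, Y):
    """Pooled least-squares estimate of Gamma_beta following Eq. (13).

    C : (T, N, M) array of lagged loading instruments c_{i,t}
    F : (T, K) array of factor realizations f_t (one row per period)
    Y : (T, N) array of responses y_{i,t}
    Returns Gamma_beta as an (M, K) array.
    """
    T, _, M = C.shape
    K = F.shape[1]
    lhs = np.zeros((M * K, M * K))
    rhs = np.zeros(M * K)
    for t in range(T):
        f = F[t][None, :]                       # f_t as a 1 x K row vector
        lhs += np.kron(C[t].T @ C[t], f.T @ f)  # C_t'C_t (x) f_t'f_t
        rhs += Y[t] @ np.kron(C[t], f)          # y_t [C_t (x) f_t]
    vec_gamma_t = np.linalg.solve(lhs, rhs)     # vec(Gamma_beta'), length MK
    # vec(Gamma_beta') stacks the rows of Gamma_beta, so a row-major reshape recovers it.
    return vec_gamma_t.reshape(M, K)
```

In an unbalanced panel, $\varvec{C}_t$ and $\varvec{y}_t$ would simply contain the $N_t$ objects observed in period t; the row-major reshape at the end reflects that $\text{vec}(\varvec{\Gamma }_{\beta }^{'})$ stacks the rows of $\varvec{\Gamma }_{\beta }$.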

Given any estimate of $\varvec{\Omega }$, $\varvec{AR}$, and $\varvec{f}_0$, we can construct the corresponding factors and thus obtain the estimate of $\varvec{\Gamma }_{\beta }$. Therefore, the objective function from Eq. (11) can be reduced to depend only on $\varvec{\Omega }$, $\varvec{AR}$, and $\varvec{f}_0$. Because our objective still contains dot products between factors and factor loadings and further includes a (nonlinear) VAR factor process, we solve for $\varvec{\Omega }$, $\varvec{AR}$, and $\varvec{f}_0$ numerically via a nonlinear least squares algorithm. The problem may be solved by any derivative-free solver or by a solver that approximates the first derivatives by finite differences. However, as the sizes of the time-series and cross-section increase, the use of a derivative-free solver or finite difference approximation becomes computationally infeasible. One way to address this problem is to analytically calculate the gradient of the loss function with respect to the unknown parameters. However, as each factor depends on previous factor realizations, the gradient calculation becomes more complex. By taking advantage of the link between the RDCFM and RNNs, we address this problem by using backpropagation through time (BPTT) to analytically compute the gradient (Rumelhart et al. 1986b; Werbos 1988, 1990). BPTT accumulates the gradient over time, taking into account the temporal interconnections of the factors. To save space, we present the derivation of the gradient calculation in “Appendix 1.1”. Given the gradient with respect to the model parameters, we update the parameters using the well-established Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm (Broyden 1970; Fletcher 1970; Goldfarb 1970; Shanno 1970), a quasi-Newton method that builds an approximation of the second-order curvature from successive gradients. We choose BFGS because it has been extensively researched in operations research and provides fast and computationally efficient solutions.
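The backward recursion behind BPTT can be illustrated with a small NumPy sketch. It assumes the factor process $\varvec{f}_t = \varvec{z}_t\varvec{\Omega } + \tanh (\varvec{f}_{t-1})\varvec{AR}$ from Eq. (11) and holds the loadings $\varvec{B}_t = \varvec{C}_t\varvec{\Gamma }_{\beta }$ fixed, so it shows only the gradient with respect to $\varvec{\Omega }$, $\varvec{AR}$, and $\varvec{f}_0$. It ignores the VIF and norm constraints described in Sect. 3.1 and is not the derivation given in “Appendix 1.1”; all function names are hypothetical. If $\varvec{\Gamma }_{\beta }$ is set to its least-squares value from Eq. (13), the envelope theorem implies that this partial gradient coincides with the gradient of the reduced objective.

```python
import numpy as np


def factor_path(Z, Omega, AR, f0):
    """Run f_t = z_t Omega + tanh(f_{t-1}) AR for t = 1..T; row 0 holds f_0."""
    T, K = Z.shape[0], Omega.shape[1]
    F = np.empty((T + 1, K))
    F[0] = f0
    for t in range(1, T + 1):
        F[t] = Z[t - 1] @ Omega + np.tanh(F[t - 1]) @ AR
    return F


def sse_and_grad(Z, Y, B, Omega, AR, f0):
    """Sum of squared residuals and its BPTT gradient, holding B_t fixed.

    Z : (T, L) factor instruments, Y : (T, N) responses, B : (T, N, K) loadings.
    """
    F = factor_path(Z, Omega, AR, f0)
    T = Z.shape[0]
    loss = 0.0
    g_Omega, g_AR = np.zeros_like(Omega), np.zeros_like(AR)
    g_next = np.zeros(F.shape[1])  # dLoss/df_{t+1}; zero beyond the last period
    for t in range(T, 0, -1):      # walk backward through time
        resid = Y[t - 1] - B[t - 1] @ F[t]
        loss += resid @ resid
        # Total gradient w.r.t. f_t: direct residual term plus the term from f_{t+1}.
        g = -2.0 * resid @ B[t - 1] + (g_next @ AR.T) * (1.0 - np.tanh(F[t]) ** 2)
        g_Omega += np.outer(Z[t - 1], g)         # f_t depends on Omega through z_t
        g_AR += np.outer(np.tanh(F[t - 1]), g)   # and on AR through tanh(f_{t-1})
        g_next = g
    g_f0 = (g_next @ AR.T) * (1.0 - np.tanh(F[0]) ** 2)  # f_0 enters only via f_1
    return loss, g_Omega, g_AR, g_f0
```

Comparing the output with central finite differences on a small random problem is a cheap way to validate the recursion. To feed such a gradient to BFGS, the three parameter blocks can be flattened into one vector and passed to a routine such as `scipy.optimize.minimize(..., method="BFGS", jac=True)`, which expects the objective to return the loss and its gradient together.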

3 Empirical application

In Sect. 2.1, we show that the building blocks of the RDCFM architecture can be enabled and disabled to obtain a family of models, including the DCFM, RCFM, and CFM. This property makes the RDCFM a flexible model that can be adapted to a wide variety of applications.

For illustration purposes, we apply the RDCFM to the prediction of stock returns. We choose this application because the data set (1) is publicly available, (2) is characterized by a large time-series and cross-section, and (3) has been widely used in the finance literature, allowing us to directly compare the performance of the RDCFM with that of existing financial models. In Sect. 3.1, we introduce the asset pricing application by describing the data set. We proceed with the details on model estimation and validation in Sect. 3.2. The presentation of our empirical results in Sect. 3.3 is threefold. First, we demonstrate how researchers can identify which model features of the RDCFM (i.e., conditional factors, conditional loadings, and VAR process) are important in a specific application (Sect. 3.3.1). Second, we show how to identify the performance drivers within a model feature by using permutation importance, Granger causality, and impulse-response analysis (Sect. 3.3.2). Last, we compare the performance of the RDCFM with that of prevalent benchmark models in Sect. 3.3.3.

3.1 Data

For comparability with the previous results reported in the literature and because the data are publicly available, we use the same dataset as studied in Kelly et al. (2019a). The sample spans the period from July 1963 to May 2014, covering 599 months and 11,452 individual stocks. The total number of available firm-month return observations is 1.4 million.

The assumption that stock returns $\varvec{y}_{t}$ follow a low-dimensional factor structure is in line with economic theory (Ross 1976). We condition the factors $\varvec{f}_{t}$ on a constant to account for a permanent environment and on ten empirical asset pricing factors that are well known to explain the time-series variation in stock returns (e.g., Fama and French 1993, 2015). The literature suggests that these empirical factors may proxy for future GDP growth (Liew and Vassalou 2000) or innovations in state variables (Petkova 2006); therefore, we use these as conditioning variables for the factors. We provide a detailed description of the factor instruments in “Appendix 2.1”. To be consistent with the asset pricing research, we do not lag the factor instruments, such that factor realizations explain the common contemporaneous time-series variation in the returns. We also follow the asset pricing literature when imposing an (approximate) orthogonality constraint on the estimated factors (Ross 1976) by requiring the variance inflation factors (VIFs) of the conditional factors to be less than 2.^{Footnote 5} Note that this restriction is finance-specific and may not be relevant in other applications. We further restrict the Euclidean norm of the factors to control their magnitude, i.e., we require the maximum norm of the factors, scaled by the number of observations, to be less than one.^{Footnote 6}
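One way to check the VIF restriction for a candidate set of factors is sketched below, assuming the factor realizations are stored as a $T\times K$ array. It uses the standard identity that the VIFs equal the diagonal of the inverse correlation matrix; how the constraint enters the optimization (e.g., as a penalty or a hard constraint) is not specified here, and the helper names are hypothetical.

```python
import numpy as np


def variance_inflation_factors(F):
    """VIF of each column of F (T x K) as the diagonal of the inverse correlation matrix.

    This equals 1 / (1 - R_j^2), where R_j^2 comes from regressing factor j
    on the remaining factors with an intercept.
    """
    corr = np.corrcoef(F, rowvar=False)
    return np.diag(np.linalg.inv(corr))


def satisfies_vif_constraint(F, threshold=2.0):
    """True if every factor's VIF is below the threshold (2 in the text)."""
    return bool(np.all(variance_inflation_factors(F) < threshold))
```

A VIF of exactly one corresponds to a factor that is uncorrelated with all others, so the threshold of 2 permits mild, but not strong, correlation among the factors.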

A common assumption is that the sensitivities to risk factors $\varvec{B}_{t}$ are static (e.g., Fama and French 1993). In an asset pricing context, these sensitivities to risk factors can be interpreted as asset-specific systematic risks.^{Footnote 7} Because a firm may evolve from a small to a large business, assuming that these risks are constant over the life cycle of a firm may be an oversimplification and may thus lead to model misspecification. The literature finds that stock returns vary with the spectrum of firm-specific characteristics $\varvec{C}_{t}$, e.g., market value (Banz 1981) or the book-to-market equity ratio (Rosenberg et al. 1985). Based on these findings, Fama and French (1993) argue that small firms with high book-to-market equity ratios earn higher returns because they have higher loadings on a distress factor. Accordingly, the asset-specific risk may be conditional on the asset’s characteristics. The literature identifies hundreds of characteristics that appear to explain the cross-sectional dispersion in (stock) returns (Hou et al. 2020); thus, there is a large number of possible factor loading instruments. We condition the factor loadings on a constant characteristic and on the same 36 firm characteristics as studied in Kelly et al. (2019a). We provide a description in “Appendix 2.2”. We lag each characteristic by one month and apply a cross-sectional rank transformation to the characteristics to map them into the $[-0.5, 0.5]$ interval. Note that this data transformation is finance-specific and, technically, any other (or no) form of data transformation can be used.
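A minimal sketch of a cross-sectional rank transformation for one characteristic in one month follows. The exact scaling, $\text{rank}/(n-1) - 0.5$ with ranks starting at zero, the tie handling (order of appearance), and leaving missing values as NaN are assumptions of this sketch rather than details taken from Kelly et al. (2019a); the function name is hypothetical.

```python
import numpy as np


def rank_transform(x):
    """Map one month's cross-section of a characteristic into [-0.5, 0.5] by rank.

    Missing values (NaN) stay missing; ties are broken by order of appearance.
    """
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, np.nan)
    valid = ~np.isnan(x)
    n = int(valid.sum())
    if n == 0:
        return out
    if n == 1:
        out[valid] = 0.0  # a lone observation is placed at the midpoint
        return out
    # Double argsort turns values into zero-based ranks 0, ..., n - 1.
    ranks = np.argsort(np.argsort(x[valid], kind="stable"), kind="stable")
    out[valid] = ranks / (n - 1) - 0.5
    return out
```

Applied month by month, the transformation removes level shifts and outliers in the raw characteristics, so only each firm's relative position in the cross-section enters the loadings.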

3.2 Model estimation and validation

We use recursive model estimation to evaluate the out-of-sample model performance (Kelly et al. 2019a). Specifically, we start by using the first 120 months of observations for the estimation of the model parameters (i.e., in-sample period) and proceed to estimate the (out-of-sample) one-month ahead returns in $t = 121$. Next, we expand the time horizon and re-estimate the parameters by using in-sample data up to $t = 121$ and estimate the next month’s returns in $t=122$. We continue until the sample ends in period $t = 599$.
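The recursive scheme can be written as a small driver loop. In this sketch, `fit` and `predict` are hypothetical placeholders for re-estimating a model on periods 1 to t and forming the forecast for t + 1; the usage lines plug in the prevailing mean forecast purely for illustration.

```python
def recursive_forecasts(series, fit, predict, first_window=120):
    """Expanding-window (recursive) one-step-ahead forecasts.

    series[t - 1] holds the observation for period t (1-indexed), so the first
    estimation window covers periods 1..first_window and the first forecast
    targets period first_window + 1.
    """
    forecasts = {}
    for last in range(first_window, len(series)):
        model = fit(series[:last])            # in-sample data: periods 1..last
        forecasts[last + 1] = predict(model)  # forecast for period last + 1
    return forecasts


# Usage with the prevailing-mean forecast on a toy series of 599 periods.
toy = [float(t) for t in range(1, 600)]
fc = recursive_forecasts(toy, fit=lambda hist: sum(hist) / len(hist), predict=lambda m: m)
```

For the RDCFM, `fit` would re-estimate $\varvec{\Omega }$, $\varvec{AR}$, $\varvec{f}_0$, and $\varvec{\Gamma }_{\beta }$ on each expanding window, which yields 479 out-of-sample months for a 599-month sample with a 120-month initial window.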

Our central performance metric is the out-of-sample mean squared prediction error (MSPE) because it is the most prevalent metric for regression problems in numerous research disciplines and is directly targeted by the objective function in Eq. (11).^{Footnote 8} We define the MSPE as:

$$\begin{aligned} MSPE = \frac{1}{T-1} \sum _{t=1}^{T-1} \frac{1}{N_{t+1}} \sum _{i=1}^{N_{t+1}} \left( y_{i,t+1} - \hat{\varvec{\lambda }}_{t+1} \varvec{\beta }_{i,t}^{'} \right) ^2 \end{aligned}$$

(14)

with $N_{t+1}$ denoting the number of response variables in $t+1$, and $\varvec{\beta }_{i,t}$ denoting the object-specific factor loadings, with the subscript t indicating that the factor loadings are based on lagged characteristics, i.e., $\varvec{\beta }_{i,t} = \varvec{c}_{i,t} \varvec{\Gamma }_{\beta }$. The term $\hat{\varvec{\lambda }}_{t+1}$ denotes the ex ante factor estimate in $t+1$. Because we do not lag the factor instruments, we use their in-sample mean to obtain the factor estimate, as described in Eq. (15):

$$\begin{aligned} \hat{\varvec{\lambda }}_{t+1} = \bar{\varvec{z}} \varvec{\Omega } + \tanh \left( \hat{\varvec{f}}_{t} \right) \varvec{AR} \end{aligned}$$

(15)

with $\bar{\varvec{z}}$ denoting the $1 \times L$ vector of the (in-sample) means of the factor instruments.
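Eqs. (14) and (15) translate into a few lines of NumPy. The sketch below assumes that responses and lagged characteristics are supplied as lists with one array per out-of-sample month, so the cross-section size $N_{t+1}$ may vary; the function names are hypothetical.

```python
import numpy as np


def ex_ante_factor(z_bar, Omega, AR, f_hat_t):
    """Eq. (15): lambda_hat_{t+1} = z_bar Omega + tanh(f_hat_t) AR, a length-K vector."""
    return z_bar @ Omega + np.tanh(f_hat_t) @ AR


def mspe(y_next, C_lag, Gamma_beta, lambda_next):
    """Eq. (14): average over months of the cross-sectional mean squared error.

    y_next[t]      : (N_{t+1},) realized responses in t + 1
    C_lag[t]       : (N_{t+1}, M) characteristics observed in t
    lambda_next[t] : (K,) ex ante factor estimate for t + 1
    """
    per_period = []
    for y, C, lam in zip(y_next, C_lag, lambda_next):
        beta = C @ Gamma_beta   # row i is beta_{i,t} = c_{i,t} Gamma_beta
        err = y - beta @ lam    # y_{i,t+1} - lambda_hat_{t+1} beta_{i,t}'
        per_period.append(np.mean(err ** 2))
    return float(np.mean(per_period))
```

Averaging within each month before averaging over months gives every month the same weight, regardless of how many objects are observed in it.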

To evaluate the relevance of a certain model feature, we test whether the model’s out-of-sample prediction error significantly decreases when the respective feature is enabled. Similarly, we test whether estimating an additional factor improves model performance. For this purpose, we use the Clark and West (2007) test for nested models.^{Footnote 9} We do not report the details of the implementation here and instead refer to “Appendix 1.4”.
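For intuition, a simplified Clark and West (2007) statistic is sketched below. Because the panel implementation is given in “Appendix 1.4” and not reproduced here, this sketch makes its own assumptions: the MSPE-adjusted loss differential is averaged across objects within each month, and the resulting monthly series is tested with a plain one-sided t-statistic (no autocorrelation correction, normal critical values). The function name is hypothetical.

```python
import math

import numpy as np


def clark_west(y, pred_small, pred_large):
    """Clark-West MSPE-adjusted statistic for nested models.

    Each argument is a list with one array per out-of-sample month.
    Returns the t-statistic and a one-sided normal p-value
    (alternative: the larger model predicts more accurately).
    """
    d = []
    for yt, p1, p2 in zip(y, pred_small, pred_large):
        # Loss of the small model minus the noise-adjusted loss of the large model.
        adj = (yt - p1) ** 2 - ((yt - p2) ** 2 - (p1 - p2) ** 2)
        d.append(np.mean(adj))  # cross-sectional average per month (assumption)
    d = np.asarray(d)
    t_stat = float(d.mean() / (d.std(ddof=1) / math.sqrt(len(d))))
    p_value = 0.5 * math.erfc(t_stat / math.sqrt(2.0))
    return t_stat, p_value
```

The adjustment term $(\hat{y}_1 - \hat{y}_2)^2$ offsets the extra estimation noise of the larger model, which would otherwise bias a plain MSPE comparison in favor of the smaller, nested model.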

3.3 Empirical results

3.3.1 Analyzing model feature importance

Our empirical analysis starts with a performance comparison of the RDCFM and its variants by reporting their out-of-sample MSPEs in Table 2. The MSPE of the one-factor RDCFM is 250.6 and monotonically decreases up to five factors, where the RDCFM achieves its minimum MSPE of 249.3. In comparison, the minimum MSPE of the static DCFM (i.e., VAR process disabled) is 250.1 when estimating six factors. The RDCFM has lower prediction errors than the DCFM in every pairwise comparison, indicating that the VAR component is informative for capturing the variation in stock returns.

Table 2 Out-of-sample comparison of conditional factor models


While the prediction errors of models with conditional loadings (i.e., RDCFM and DCFM) tend to decrease with additional factors, we find the opposite for models with unconditional loadings (i.e., RCFM and CFM). The MSPE for the RCFM increases from 251.0 for the one-factor model to 274.0 when six factors are included. Likewise, the error of the CFM increases with additional factors, although the differences are smaller. This indicates an overfitting effect with respect to the factor loading estimates. Table 3 shows the number of parameters to estimate for the models. Although additional factors only slightly increase the number of model parameters for models with conditional factor loadings, the parameter space for the RCFM and CFM is massively inflated. The number of parameters required to estimate the factors increases at the same rate for all models, but the RCFM and CFM estimate NK parameters for the (unconditional) factor loadings, while the RDCFM and DCFM estimate only MK parameters for the (conditional) factor loadings. To quantify the advantage of conditioning factor loadings in this application, the RDCFM estimates less than 0.5% as many parameters as the RCFM with the same number of factors. Accordingly, by conditioning the factor loadings on microeconomic variables, the RDCFM not only processes more information but is also parsimoniously parameterized, resulting in improved out-of-sample performance.

Table 3 Number of free parameters in conditional factor models


Next, we test which model features in fact lead to a statistically significantly lower prediction error. Specifically, using the Clark and West (2007) test, we test whether (1) the RDCFM has significantly lower prediction errors than the CFM, RCFM, and DCFM, (2) the DCFM has lower errors than the CFM, and (3) the RCFM outperforms the CFM. Table 4 reports the differences in the MSPEs of the models. If the null hypothesis of equal MSPE values can be rejected at the 1%, 5%, or 10% level, we denote this with asterisks. Panel A reports the differences in MSPE when testing the RDCFM against the DCFM, RCFM, and CFM. The RDCFM significantly outperforms all its variants in each specification, suggesting that both the VAR process of the factors and the conditional factor loadings are informative for predicting stock returns. The differences in the MSPE values monotonically increase with additional factors (except for the six-factor RDCFM and DCFM) and are statistically significant at the 1% level. The finding that conditioning factor loadings on observable variables significantly improves the prediction accuracy is also confirmed by comparing the DCFM with the CFM (Panel B). The differences monotonically increase with the number of factors, and again, all differences are statistically significant at the 1% level. The monotonicity in the MSPE differences of the conditional and unconditional models is due to the overfitting of the unconditional models when estimating many parameters for the factor loadings. The results in Panel C show that the RCFM cannot outperform the CFM when three or more factors are estimated. This is due to the large number of free parameters in both models. In the RCFM, overfitting occurs because the VAR process captures random noise in the responses rather than systematic movements. The resulting model is too complex and fails to generalize to unseen data. Combined with the results from Panel A, this suggests that the VAR process of the factors is informative only in combination with the dimensionality reduction due to the estimation of conditional factor loadings.

Table 4 Testing the relevance of the RDCFM features


After determining that all features of the RDCFM are relevant in this application, we ask how many factors are truly relevant. To address this question, we compare the MSPE values of the RDCFM with different numbers of factors and test for statistically significant differences by using the Clark and West (2007) test. Table 5 reports the results. The main interest is the diagonal, as this indicates whether the addition of an extra factor significantly reduces the prediction error. We find that the null hypothesis of equal predictive accuracy can be rejected at the 1% level if up to five factors are estimated. We therefore use this specification in our subsequent analyses.

Table 5 Testing the number of RDCFM factors


3.3.2 Analyzing importance within model features

Having identified that all model features are relevant for our application, we next aim to identify the performance drivers within a specific feature. Specifically, we delve deeper into each model feature of the five-factor RDCFM by examining (1) which microeconomic variables mainly contribute to the conditional factor loadings, (2) which macroeconomic variables are most relevant for explaining the common time-series variation in the responses, and (3) whether the nonlinear VAR process contains valuable information.

Our analysis starts with an assessment of the relevance of microeconomic conditions. Specifically, we use permutation importance, an approach established in the machine learning literature, to assess how microeconomic variables contribute to the model performance (e.g., Breiman 2001). Permutation importance is easy to implement, highly interpretable, and does not depend on the underlying model or application; hence, it can also be used to identify the important variables in the DCFM, RCFM, and CFM. We define the importance of an instrument as the increase in the in-sample mean squared error ($MSE^{IS}$)^{Footnote 10} that results from shuffling the order of its observations while leaving everything else unchanged. The larger the increase in model error, the more important the variable is for the model.

We first obtain the model parameters from our best model (i.e., five-factor RDCFM) found in Table 2, i.e., $\varvec{\Omega }$, $\varvec{AR}$, $\varvec{f_0}$, and $\varvec{\Gamma }_{\beta }$. In each of 100 permutations, we randomly shuffle the observations of an instrument, recalculate the conditional factor loadings as described in Eq. (7), and estimate the response variables (i.e., returns) according to Eq. (2). We rank the instruments according to the average change in $MSE^{IS}$ compared to the original model. A large increase in $MSE^{IS}$ indicates a high level of importance, whereas a small increase indicates low importance. We augment the permutation importance results by testing whether an instrument contributes statistically significantly to the overall model performance, using an F-test based on a double-bootstrap approach, as described in “Appendix 1.3.1”.
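The permutation step can be sketched as follows for the loading instruments, holding the estimated factors and $\varvec{\Gamma }_{\beta }$ fixed. Shuffling an instrument across all pooled object-month observations is an assumption of this sketch (the text does not state whether shuffling is done within or across months), the double-bootstrap F-test is omitted, `n_repeats` plays the role of the 100 permutations, and the function name is hypothetical.

```python
import numpy as np


def permutation_importance(C, F, Y, Gamma_beta, n_repeats=100, seed=0):
    """Average increase in in-sample MSE when one loading instrument is shuffled.

    C : (T, N, M) loading instruments, F : (T, K) factors, Y : (T, N) responses.
    The instrument is shuffled across all pooled object-month observations.
    """
    rng = np.random.default_rng(seed)
    T, N, M = C.shape

    def mse(C_used):
        pred = np.einsum("tnm,mk,tk->tn", C_used, Gamma_beta, F)  # f_t beta_{i,t}'
        return np.mean((Y - pred) ** 2)

    base = mse(C)
    importance = np.zeros(M)
    for m in range(M):
        increases = []
        for _ in range(n_repeats):
            C_perm = C.copy()
            pooled = C_perm[:, :, m].reshape(-1)
            C_perm[:, :, m] = rng.permutation(pooled).reshape(T, N)
            increases.append(mse(C_perm) - base)
        importance[m] = np.mean(increases)
    return importance
```

An instrument whose row of $\varvec{\Gamma }_{\beta }$ is zero has an importance of exactly zero, because shuffling it leaves every loading unchanged.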

Table 6 reports the results, which generally support the findings in the previous literature. Of the 36 firm-level characteristics considered, 17 capture the variation in the cross-section of stock returns, supporting the view that asset returns are multidimensional (Cochrane 2011; Kelly et al. 2019a). By far, the most important firm characteristics are market value (mktcap: 11.4) and total assets (assets: 7.88), both acting as proxies for firm size (Banz 1981). Other important predictors are the market beta (beta: 2.17) (Sharpe 1964; Lintner 1965; Frazzini and Pedersen 2014), twelve-month momentum (mom: 2.04) as documented in Jegadeesh and Titman (1993) and Asness et al. (2013), and assets-to-market (a2me: 1.82) (Fama and French 1992). Interestingly, the ranking matches the importance ranking reported in Kelly et al. (2019a) for IPCA, which also suggests that the short-term reversal (strev: 1.49) and turnover (turn: 0.98) (Datar et al. 1998) characteristics are important. In summary, the findings reveal that many asset characteristics have predictive power for future returns. However, many characteristics frequently proposed for predicting future returns, such as accruals (oa) (Sloan 1996), free cash flow (freecf) (Lakonishok et al. 1994), earnings-to-price (e2p) (Basu 1983), and book-to-market (bm) (Rosenberg et al. 1985), do not account for a substantial share of the variation in returns after controlling for other characteristics.

Table 6 Importance of factor loading instruments


Next, we inspect the conditional factors. Due to the memory effect of the latent factors, a factor instrument may affect factor construction and, thus, the model estimates for several periods. Our subsequent analysis is twofold. We start by using permutation importance to assess the overall relevance of the factor instruments. Because permutation importance breaks the temporal order of the instruments, we not only assess how current instruments contribute to the model performance but also measure the relevance of the instruments through $\varvec{AR}$. However, permutation importance does not reveal how information flows through the RDCFM. We therefore use Granger (1969) causality and impulse-response analysis, two indispensable tools in time-series analyses, to reveal the short- and long-term memory effects of the latent factors.

Table 7 reports the permutation importance results for the factor instruments.^{Footnote 11} Statistical significance of a factor instrument is assessed using the test procedure described in “Appendix 1.3.2”. Among all factor instruments, only five significantly contribute to the model performance, with the most important factor instrument being the market return (Mkt). Shuffling its order yields an increase in the model’s MSE$^{IS}$ of 21.3. This finding is consistent with economic theory and empirical findings that there is a priced market factor in stock returns (Sharpe 1964; Lintner 1965; Fama and French 1993, 2015). The next most important instruments are SMB, WML, HMLm, and ROE, which generally supports the findings in the literature that they capture a substantial fraction of variation in the cross-section of stock returns (Fama and French 1993; Carhart 1997; Barillas and Shanken 2018).

Table 7 Importance of factor instruments


After having identified the relevant microeconomic and macroeconomic variables, we next inspect the temporal dynamics in the RDCFM. The previous performance comparison reveals that the VAR process of the factors captures relevant information for future returns; however, how does information flow through the RDCFM?

We begin the analysis of the temporal dynamics of the factors by using the Granger (1969) causality test. Granger causality is a statistical test that assesses whether lagged values of one time series help predict another time series (Granger 1969, 1980, 1988). Specifically, Granger causality tests whether any lag of a series (with a prespecified lag length p) contributes to the prediction of another series. Therefore, the Granger causality test is based on an F-test of the null hypothesis that the restricted model, i.e., the model that drops the lags of a factor, has similar predictive power to the unrestricted model that includes all lagged factors. In our RDCFM specification, we allow for a lag of length one; hence, we restrict our attention to the first lag, which reduces the Granger causality test to a t-test of the individual lag-one regression coefficients.
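A VAR(1) with lag-one Granger t-tests can be estimated by equation-by-equation OLS, as in the sketch below. It assumes the factor realizations are available as a $T\times K$ array, includes an intercept (an assumption), and uses the normal approximation for two-sided p-values, which is reasonable for samples of several hundred months but is not necessarily the procedure behind Table 8. The function name is hypothetical.

```python
import math

import numpy as np


def var1_granger(F):
    """OLS estimate of a VAR(1) with intercept and lag-one Granger t-tests.

    F : (T, K) factor realizations. Entry [1 + i, j] of the returned arrays refers
    to lagged factor i in the equation for factor j (row 0 is the intercept).
    """
    Y, X = F[1:], np.column_stack([np.ones(len(F) - 1), F[:-1]])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)        # (K + 1, K)
    resid = Y - X @ coef
    sigma2 = (resid ** 2).sum(axis=0) / (len(Y) - X.shape[1])  # per-equation variance
    se = np.sqrt(np.outer(np.diag(np.linalg.inv(X.T @ X)), sigma2))
    t_stat = coef / se
    # Two-sided p-values from the normal approximation to the t distribution.
    p_value = np.vectorize(math.erfc)(np.abs(t_stat) / math.sqrt(2.0))
    return coef, t_stat, p_value
```

With a single lag, the F-test that drops factor i from the equation of factor j equals the square of the corresponding t-statistic, which is why the test reduces to a t-test.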

Table 8 reports the estimated coefficients of a VAR(1) model. The overall findings indicate that there are significant temporal dependencies between the conditional factors. Specifically, the first factor exhibits significant autocorrelation (0.07, p value = 0.14%) and also “Granger-causes” the third (0.11, p value = 0.00%) and fourth ($-0.09$, p value = 0.00%) factors. Interestingly, lagged realizations of the third factor are informative for the first factor ($-0.42$, p value = 2.50%), indicating that there is a direct feedback loop between them. Such direct feedback loops exist if causality is bidirectional, implying complex interdependence among conditional factors. A direct feedback loop can also be observed for the fourth and fifth factors: The fourth factor Granger-causes the fifth factor ($-0.20$, p value = 1.92%), and the fifth factor Granger-causes the fourth factor ($-0.03$, p value = 0.00%), indicating that information from the fourth factor returns to it through the fifth factor after two periods.

Table 8 Granger causality test


The results in Table 8 show that there are significant temporal dependencies between conditional factors; however, direct feedback loops make it difficult to follow the information flow. Moreover, indirect feedback loops may occur, making interpretation even more difficult. As opposed to the direct feedback loops shown above, information returns to a conditional factor over several periods if there are indirect feedback loops. To illustrate this, assume that time series A Granger-causes B and that B Granger-causes C. If C in turn Granger-causes A, there is an indirect feedback loop between A and C through B. As a result, in a system with many temporally interconnected variables, it is challenging to follow the information flow. Furthermore, because only a fraction of past information is shared among the factors, it is unclear how long certain information remains relevant for factor construction and, thus, for the common time-series variation of the response variables. Therefore, we quantify the persistence of system shocks by using impulse-response analysis, a tool frequently applied in time-series analyses (e.g., Potter 2000; Lütkepohl and Schlaak 2019; Lee et al. 2021).

Impulse-response analysis quantifies the flow of information by comparing two sample paths for the variable(s) of interest. One path is subject to an exogenous shock in a system variable at time $t_i$ (the impulse period) and the other is not. By comparing the evolution of these paths through time, we analyze how the shock in $t_i$ affects all system variables in future periods $t_i + 1, t_i + 2, \dots , t_i + R$, with R denoting the length of the forecasting horizon.

In our last in-sample period ($t = 599$), we add a one-standard-deviation shock to the factor realizations of our five-factor RDCFM and estimate the factors for the following $r = 1, \dots , 5$ periods. We use the generalized impulse-response function of Koop et al. (1996) to quantify the magnitude and persistence of exogenous shocks:

$$\begin{aligned} GI_{f_j}(t_{i}+r,\nu _{t_i},\Phi _{t_i-1}) = E\left[ f_{t_{i}+r,j}|\nu _{t_{i},j}, \Phi _{t_i-1}\right] - E\left[ f_{t_{i}+r,j}|\Phi _{t_{i}-1}\right] \; \; \; \; \forall \; r = 1,\ldots ,R \end{aligned}$$

(16)

where $GI_{f_j}(t_{i}+r,\nu _{t_i},\Phi _{t_{i} - 1})$ denotes the impulse-response of the j-th factor in period $t_{i}+r$, $t_{i}$ refers to the shock period, $\nu _{t_{i},j}$ is the one-standard-deviation exogenous shock to the j-th system variable (factor) at $t_i$, and $\Phi _{t_i-1}$ denotes the information available before the shock period. The first term of Eq. (16) thus represents the factor process under an exogenous shock, and the second term denotes the baseline factor process. By analyzing how an isolated impulse in a factor at $t_i$ affects the realizations in subsequent periods, impulse-response analysis may provide a better picture of the factor dynamics.
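Because the factor recursion is deterministic once the instruments are fixed, the expectations in Eq. (16) can be approximated by comparing a shocked and a baseline path. The sketch below assumes the future instrument values are given (for example, their in-sample mean, as in Eq. (15)); with stochastic instruments one would instead average the two paths over simulated instrument draws, as in Koop et al. (1996). The function name is hypothetical.

```python
import numpy as np


def factor_shock_response(f_last, Omega, AR, z_future, shock):
    """Generalized impulse response of all factors to a shock added to f_{t_i}.

    f_last   : (K,) factor realization in the shock period t_i
    z_future : (R, L) instrument values assumed for t_i + 1, ..., t_i + R
    shock    : (K,) exogenous shock, e.g., one standard deviation in factor j
    Returns an (R, K) array: shocked path minus baseline path.
    """
    def path(f):
        out = []
        for z in z_future:
            f = z @ Omega + np.tanh(f) @ AR  # factor recursion of the RDCFM
            out.append(f)
        return np.array(out)

    return path(f_last + shock) - path(f_last)
```

Because of the tanh nonlinearity, the response depends on both the size of the shock and the factor level in the shock period, which is why a generalized impulse-response function is used rather than a linear VAR impulse response.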

Figure 2 presents the responses of each factor in our five-factor model to an isolated shock to each factor. Specifically, we add a one-standard-deviation shock to the j-th factor realization in the last sample period ($t = 599$) and “overshoot” our system for $R = 5$ periods. That is, we calculate factors for periods $t = 600,\ldots ,604$ according to Eq. (5), where $\varvec{f}_{599}$ receives a shock. The x-axis in Fig. 2 denotes the index of the period following the impulse period. The first factor responds most strongly to impulses in the first factor itself, which supports the finding above that this factor exhibits a high degree of autocorrelation. The first factor also responds to impulses in the second and third factors. However, the responses to the impulses in factors one and three approach zero in the second period following the shock (i.e., $t = 601$). Interestingly, the response to impulses in the second factor lasts for two periods, with a change in sign after the first period. However, even the response to the second factor converges to zero at $t = 602$, suggesting that the first factor has only short-term memory. The responses of most factors last at most two periods, with the fourth factor having the longest memory: its responses to impulses in the first and fourth factors last for at least three periods, and its response to the second factor approaches zero only after the fourth period. The large degree of autocorrelation with itself and feedback loops across the factors may cause the fourth factor to have the longest memory.

The impulse-response analysis in Fig. 2 shows that the factors have memory, allowing system shocks to persist for several periods. Since our factors are latent variables constructed from observable variables, such system shocks have their origin in shocks to the factor instruments. Therefore, we perform impulse-response analysis to directly analyze how factor instruments contribute to the factor realizations, not only for the current factor composition but also for the subsequent periods due to the memory of the factors. Again, we estimate all model parameters using the entire data set. We estimate factors for the next $R = 5$ periods and assume that a one-standard-deviation shock hits the system through the time-series instruments at time $t=600$. Because we do not lag the factor instruments, we hold all other factor instruments at their mean (i.e., zero for the centered nonconstant instruments). Thus, the only nonzero elements in the instrument vector at time $t = 600$ are the l-th factor instrument and the constant. The only nonzero element for the subsequent periods is the constant factor instrument.
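The instrument-shock experiment can be sketched in the same way. The sketch assumes centered nonconstant instruments, so the baseline instrument vector contains only the constant, and applies the shock of size `sigma_l` to instrument `l` in the first simulated period only; the function name, the argument names, and the position of the constant (`const_index`) are assumptions. At $r = 0$ the response equals `sigma_l` times the l-th row of $\varvec{\Omega }$, and any later response arises only through $\varvec{AR}$.

```python
import numpy as np


def instrument_shock_response(f_prev, Omega, AR, l, sigma_l, R=5, const_index=0):
    """Factor responses to a one-standard-deviation shock in factor instrument l.

    At the shock period (r = 0) the instrument vector contains only the constant and
    the shocked instrument; afterwards only the constant is nonzero. The baseline
    path uses the constant alone in every period. Returns an (R + 1, K) array.
    """
    L = Omega.shape[0]
    base_z = np.zeros(L)
    base_z[const_index] = 1.0
    shocked_z = base_z.copy()
    shocked_z[l] = sigma_l
    f_base, f_shock, out = f_prev, f_prev, []
    for r in range(R + 1):
        z_s = shocked_z if r == 0 else base_z
        f_base = base_z @ Omega + np.tanh(f_base) @ AR
        f_shock = z_s @ Omega + np.tanh(f_shock) @ AR
        out.append(f_shock - f_base)
    return np.array(out)
```

A factor whose entry in the l-th row of $\varvec{\Omega }$ is zero shows no response at $r = 0$ but can still respond at $r = 1$ if $\varvec{AR}$ links it to a factor that did react.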

The results are shown in Fig. 3. We also report the immediate responses of each factor to shocks in the factor instruments ($r = 0$), which quantify the sensitivity of a factor to changes in the instruments through $\varvec{\Omega }$. For $r > 0$, the responses reflect past shocks in the instruments. In $r = 0$, the first factor strongly reacts to shocks in the market (Mkt) and size (SMB) factors. However, the responses in $r = 1$ are close to zero, indicating that these variables are less important for subsequent periods.

As already indicated by Fig. 2, the fourth factor has the longest memory. Again, its responses show an interesting pattern. In $r=0$, the factor reacts to Mkt, SMB, HML, RMW, and ROE. Given the strong autocorrelation of the fourth factor, it is not surprising that shocks to these variables cause longer-lasting responses. The contemporaneous contribution of WML to the fourth factor is zero, as indicated by the response in $r=0$. However, there is a response in $r = 1$, suggesting that WML enters the fourth factor indirectly, through the lagged factors (i.e., via $\varvec{AR}$). Interestingly, the response to WML in the first period after the impulse is larger (in absolute terms) than that to the market return, RMW, ROE, and HML. A look at the other factors reveals that, with the only exception being the second factor, the responses of all factors with respect to innovations in WML at $r = 1$ are distinguishable from zero, suggesting that lagged WML is relevant for explaining the time-series variation of stock returns.

3.3.3 Comparison with benchmark models

Within the model family, the RDCFM outperforms all other conditional factor models. However, how does it perform compared to benchmark models previously studied in the literature? In the forecasting literature, it is common to compare model forecasts with a naive benchmark model to analyze whether a model provides any valuable information. Therefore, we compare the predictions of the RDCFM with the prevailing (time-series) mean forecast. However, for the prediction of individual stock returns, a naive zero-return forecast often outperforms the mean model (Gu et al. 2020; Cakici et al. 2023a, b; Fieberg et al. 2023); therefore, we additionally compare the RDCFM with the zero-return benchmark. To analyze whether the RDCFM can compete with other factor models from the finance literature, we also compare it with these models. The first set of benchmark models consists of empirical factor models, including the capital asset pricing model (CAPM) and the Fama-French 3-, 4-, 5-, and 6-factor models (Carhart 1997; Fama and French 1993, 2015, 2018). Given the observable factors, we estimate the unconditional factor loadings via OLS and the conditional factor loadings according to Eq. (13). The second set of benchmark models includes statistical factor models based on PCA (Connor and Korajczyk 1988) and IPCA (Kelly et al. 2019a). PCA extracts factors through an eigenvalue decomposition of the covariance matrix of stock returns. However, estimating the covariance matrix of individual stock returns requires a workaround because we have to address the missing data. We follow a large branch of the literature and extract latent factors from grouped data, i.e., portfolios (Kozak et al. 2018). Specifically, we use the same characteristic-managed portfolios as in Kelly et al. (2019a), extract the latent factors from the covariance matrix of portfolio returns, and estimate asset-specific loadings via OLS regression.
The counterpart of PCA with conditional loadings is IPCA (Kelly et al. 2019a, b), which estimates the latent factors and the $\varvec{\Gamma }_{\beta }$ coefficients by alternating least squares. That is, given an estimate of the factors, IPCA estimates $\varvec{\Gamma }_{\beta }$ as described in Eq. (13) and then updates the factor estimates via cross-sectional regressions of characteristic-managed portfolio returns on the estimated $\varvec{\Gamma }_{\beta }$ coefficients; the two steps alternate until convergence.
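The alternating least squares idea behind IPCA can be sketched as follows, assuming the model $\varvec{r}_t = \varvec{Z}_t \varvec{\Gamma }_{\beta } \varvec{f}_t + \varvec{e}_t$ with a balanced panel, no missing values, and instruments `Z[t]` already aligned (lagged) with returns `R[t]`. This is our simplified illustration, not the reference implementation: the factor step uses the cross-sectional OLS solution $(\varvec{\Gamma }_{\beta }' \varvec{Z}_t' \varvec{Z}_t \varvec{\Gamma }_{\beta })^{-1} \varvec{\Gamma }_{\beta }' \varvec{Z}_t' \varvec{r}_t$, written here as a regression of stock returns on the conditional loadings, the identification normalization used by Kelly et al. (2019a) is omitted, and the name `ipca_als` is hypothetical.

```python
import numpy as np

def ipca_als(R, Z, k, n_iter=100, seed=0):
    """Alternating least squares for r_t = Z_t @ Gamma @ f_t + e_t.

    R: T x N returns, Z: T x N x M instruments (no missing values assumed).
    Returns Gamma (M x k) and factors F (T x k); no identification
    normalization is imposed, so Gamma and F are only identified up to
    an invertible k x k rotation.
    """
    T, N, M = Z.shape
    rng = np.random.default_rng(seed)
    gamma = rng.normal(size=(M, k))
    F = np.zeros((T, k))
    for _ in range(n_iter):
        # Factor step: cross-sectional OLS of r_t on the loadings Z_t @ Gamma.
        for t in range(T):
            beta = Z[t] @ gamma                      # N x k conditional loadings
            F[t] = np.linalg.lstsq(beta, R[t], rcond=None)[0]
        # Gamma step: pooled OLS using Z_t Gamma f_t = (f_t' kron Z_t) vec(Gamma),
        # where vec() stacks the columns of Gamma (Fortran order).
        X = np.concatenate([np.kron(F[t][None, :], Z[t]) for t in range(T)])
        g = np.linalg.lstsq(X, R.reshape(-1), rcond=None)[0]
        gamma = g.reshape(M, k, order="F")
    return gamma, F
```

Each half-step solves its least squares problem exactly, so the sum of squared residuals cannot increase from one iteration to the next.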

Table 9 Comparison with benchmark models


Table 9 reports the differences in the MSPEs between the RDCFM and the benchmark models. Positive differences indicate that the benchmark models have higher prediction errors. Note that for the empirical factor models, $K = 1,3,\ldots ,6$ refers to the CAPM and the Fama-French 3-, 4-, 5-, and 6-factor models. Because the benchmark models and the RDCFM are non-nested, we test the statistical significance of the MSPE differences by using the cross-sectional version of the Diebold and Mariano (1995) test (Gu et al. 2020; Fieberg et al. 2023). If the RDCFM significantly outperforms a benchmark model at the 1%, 5%, or 10% level, we denote this with asterisks. We find that in every specification, the RDCFM has a significantly lower prediction error than the prevailing mean and zero-return forecasts and outperforms all prevailing benchmark factor models. The differences are particularly large when the RDCFM is compared to models with unconditional loadings. This finding, once again, indicates that models with unconditional factor loadings suffer from overfitting and/or are misspecified in this application.
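The cross-sectional version of the Diebold and Mariano (1995) test can be sketched as follows. As we understand the construction in Gu et al. (2020), the squared-error difference is first averaged across stocks within each period, and the time-series mean of these averages is divided by its Newey and West (1987) standard error. The lag length and the function names are our assumptions. The sign convention matches Table 9: positive values mean the benchmark has the larger errors.

```python
import numpy as np

def newey_west_se(d, lags):
    """Newey-West (Bartlett kernel) standard error of the mean of d."""
    T = d.size
    u = d - d.mean()
    var = u @ u / T
    for l in range(1, lags + 1):
        w = 1.0 - l / (lags + 1.0)
        var += 2.0 * w * (u[l:] @ u[:-l]) / T
    return np.sqrt(var / T)

def cs_diebold_mariano(err_bench, err_model, lags=0):
    """Cross-sectional Diebold-Mariano statistic.

    err_*: T x N arrays of forecast errors (NaN where a stock is missing).
    Positive values mean the benchmark has the larger squared errors.
    """
    # One cross-sectional average of the loss differential per period.
    d = np.nanmean(err_bench ** 2 - err_model ** 2, axis=1)
    return d.mean() / newey_west_se(d, lags)
```

Averaging across stocks first turns a large, unbalanced panel into a single time series, which sidesteps the strong cross-sectional correlation of individual stock forecast errors.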

Table 10 reports the number of free parameters to estimate for the benchmark factor models. Because the empirical factor models use prespecified factors, each additional factor adds N parameters when unconditional factor loadings are estimated and M parameters when conditional factor loadings are estimated. In addition to the factor loading parameters, PCA and IPCA estimate T parameters for each factor, namely its realizations. For the six-factor model, IPCA estimates 3816 parameters, whereas the RDCFM estimates only 320. The RDCFM achieves this parsimony by constructing its factors from macroeconomic instruments instead of estimating every factor realization freely, which helps explain its superior performance compared to IPCA.
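The counting rules in the paragraph above can be written down directly. The helper below is hypothetical and only encodes those rules; it does not reproduce the RDCFM count, whose composition is not spelled out in this section. Under these rules, six-factor IPCA has 6(M + T) parameters, so the reported 3816 corresponds to M + T = 636.

```python
def n_params(model: str, K: int, N: int, M: int, T: int) -> int:
    """Free parameters of the benchmark factor models (hypothetical helper).

    K: factors, N: assets, M: loading instruments, T: time periods.
    """
    per_factor = {
        "empirical_unconditional": N,  # one loading per asset
        "empirical_conditional": M,    # one Gamma_beta entry per instrument
        "pca": N + T,                  # asset loadings plus factor realizations
        "ipca": M + T,                 # Gamma_beta column plus factor realizations
    }[model]
    return K * per_factor
```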

Table 10 Number of free parameters in the benchmark factor models


4 Conclusion

We propose the RDCFM, a new white-box machine learning method that allows for the simultaneous modeling of the time-series and the cross-section of response variables. By marrying RNNs with economic factor models, our model (1) uses conditioning information included in macroeconomic variables to create common latent factors that drive the common time-series variation in the response variables, (2) allows these factors to follow an internal (nonlinear) VAR process, and (3) uses microeconomic variables to estimate conditional, time-varying loadings on these dynamic conditional factors. Since both factors and factor loadings are linear combinations of observable instrumental variables, we provide a low-parameter specification of dynamic latent factors and time-varying factor loadings, which results in improved parameter stability and generalizability. We demonstrate the use of the RDCFM in an application from the financial economics literature, namely the prediction of stock returns, and find that it outperforms prevailing benchmark factor models.

The RDCFM is not restricted to the conditioning of factors and factor loadings on macroeconomic and microeconomic variables. In a broader sense, the factor instruments may be more accurately described as state variables and the factor loading instruments as object-specific features. Thus, the RDCFM is a general-purpose method that can be applied to numerous regression or classification problems, not only those from business studies and economics. The RDCFM’s only empirical requirement is that the response variables and macroeconomic conditions be observable over time. Beyond that, individual model features can be enabled or disabled to best suit the particular application.

Although the RDCFM is already a powerful tool for factor modeling, we see scope for several extensions. The memory of the factors is short because we restrict attention to a VAR process of lag order one. The concept of conditional factor loadings could be incorporated into modern neural network architectures that allow for longer memory, for example, long short-term memory networks or gated recurrent unit networks. Regarding factor loadings, the estimation of $\varvec{\Gamma }_{\beta }$ may be modified to account for nonlinear relationships (a nonlinear extension of IPCA is proposed by Gu et al. 2021). Furthermore, one may allow for VAR factor loadings to capture (short-term) autocorrelation in the factor loadings.

Notes

For illustration, consider Gu et al. (2020). They predict U.S. stock returns using 94 firm characteristics and 74 industry indicator variables (i.e., microeconomic variables), together with 8 macroeconomic variables. They stack the data and use the 74 indicators together with the 94 characteristics and their interactions with the 8 macroeconomic variables, which results in a total of $94 \times (8+1) + 74 = 920$ variables.
Although there are numerous extensions of VAR models that address this problem [e.g., VAR models with exogenous variables (VARX), the factor-augmented VAR (FAVAR) (Bernanke et al. 2005), and the panel VAR (PVAR) (Canova et al. 2012)], they are highly parameterized, so data limitations remain a problem. For example, Canova et al. (2012) use a PVAR model for six time series in each of ten European countries. They further include four exogenous macroeconomic variables, which results in a total of 3840 parameters.
Note that in Eq. (11) we use the sum of squared residuals as the objective function because it is the most commonly used objective function for regression problems. However, any differentiable objective function can be used. Without methodological changes, the objective function can be exchanged or augmented with regularization terms that induce shrinkage ($L^2$) and sparsity ($L^1$) (Tibshirani 1996; Zou and Hastie 2005). If the $\varvec{\Gamma }_{\beta }$ coefficients can no longer be solved for analytically after changing the objective function, one can solve for them using nonlinear least squares, as we demonstrate in “Appendix 1.2” for the binary cross-entropy objective function used in classification problems. We do not include $L^1$ or $L^2$ penalties in our empirical application because our dimensionality reduction yields a parsimonious model that, as we show in the empirical part of this paper, does not suffer from overfitting.
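To make the regularization option in the note above concrete, the following sketch evaluates a sum-of-squared-residuals objective augmented with elastic-net style penalties. It is illustrative only: which parameters would be penalized (for instance $\varvec{\Omega }$, $\varvec{AR}$, or $\varvec{\Gamma }_{\beta }$) and the penalty weights `l1` and `l2` are our assumptions, and, as the note states, the empirical application uses no penalties. Missing responses are set to zero error so that they contribute nothing to the objective.

```python
import numpy as np

def penalized_ssr(y, y_hat, params, l1=0.0, l2=0.0):
    """Sum of squared residuals plus optional L1 (sparsity) and
    L2 (shrinkage) penalties on a flattened parameter array.

    Missing responses (NaN in y) contribute zero error.
    """
    resid = np.where(np.isnan(y), 0.0, y - y_hat)
    p = np.ravel(params)
    return float(np.sum(resid ** 2) + l1 * np.sum(np.abs(p)) + l2 * np.sum(p ** 2))
```

With `l1 = l2 = 0` the function reduces to the plain sum of squared residuals used in Eq. (11).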
Note that we dropped the intercept in Eq. (1). Technically, it makes no difference whether the intercept is included or excluded, because it can easily be included by adding a constant factor to the matrix of latent factors.
Note that the actual choice of the maximum VIF does not qualitatively change the results. Setting the maximum VIF to 2 ensures that the factors are at most moderately correlated: since $\text{VIF} = 1/(1-R^2)$, a VIF of 2 corresponds to an $R^2$ of 0.5 when a factor is regressed on all other factors. Stricter thresholds may further reduce the correlation among the factors but may also increase the computation time. We find that a maximum VIF of 2 is a good tradeoff between orthogonality and computational effort.
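The VIF referred to in the note above can be computed as follows. This sketch only measures the VIF of each factor; how the constraint is enforced during estimation is not described here, and the function name `vif` is our own.

```python
import numpy as np

def vif(F: np.ndarray) -> np.ndarray:
    """Variance inflation factor of each column of a T x K factor matrix:
    VIF_k = 1 / (1 - R_k^2), where R_k^2 comes from an OLS regression
    (with intercept) of factor k on all other factors."""
    T, K = F.shape
    out = np.empty(K)
    for k in range(K):
        y = F[:, k]
        X = np.column_stack([np.ones(T), np.delete(F, k, axis=1)])
        resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        yc = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (yc @ yc)
        out[k] = 1.0 / (1.0 - r2)
    return out
```

Two factors with a correlation of $1/\sqrt{2}$ give $R^2 = 0.5$ and therefore sit exactly at the threshold of 2, whereas mutually uncorrelated factors have a VIF of 1.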
This is similar to the normalization of eigenvectors in a PCA, which are scaled to be unit vectors. Again, the actual choice of the maximum norm does not significantly affect the results.
Generally, factor models describe not only the first moment but also the second moment, i.e., the variance–covariance matrix of the responses. Instead of computing a full variance–covariance matrix, which requires the estimation of $N(N+1)/2$ free parameters, the variance–covariance matrix can be obtained by the following decomposition (e.g., Moskowitz 2003):
$$\begin{aligned} \varvec{\Sigma }_{t}^{y} = \varvec{B}_{t} \varvec{\Sigma }^{f} \varvec{B}_{t}^{'} + \varvec{D} \end{aligned}$$
where $\varvec{\Sigma }_{t}^{y}$ is the $N \times N$ covariance matrix of the response variables in t, $\varvec{\Sigma }^{f}$ is the $K \times K$ covariance matrix of the factors, and $\varvec{D}$ is an $N \times N$ diagonal matrix of residual variances. For example, Jacobs et al. (2005) show that the assumption of a factor model improves the estimation of the variance–covariance matrix. Since the conditional factor loadings $\varvec{B}_{t}$ are time-varying, the RDCFM models a time-varying covariance matrix without estimating additional parameters.
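The decomposition in the note above translates directly into code. This is a minimal sketch; the function name and the way $\varvec{\Sigma }^{f}$ and the residual variances would be estimated are our assumptions.

```python
import numpy as np

def factor_covariance(B_t: np.ndarray, Sigma_f: np.ndarray,
                      resid_var: np.ndarray) -> np.ndarray:
    """Sigma_t^y = B_t Sigma^f B_t' + D, with D = diag(resid_var).

    B_t: N x K conditional loadings, Sigma_f: K x K factor covariance,
    resid_var: length-N vector of residual variances.
    """
    return B_t @ Sigma_f @ B_t.T + np.diag(resid_var)
```

With N = 500 stocks, a full covariance matrix has 500 · 501 / 2 = 125,250 free parameters, whereas, given $\varvec{B}_{t}$, the decomposition needs only $\varvec{\Sigma }^{f}$ and the diagonal of $\varvec{D}$, i.e., K(K + 1)/2 + N = 521 values for K = 6.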
Because of the way the conditional factors are specified (i.e., factors are conditional on non-lagged instruments), the RDCFM is not only a predictive but also an explanatory model. For the interested (finance) reader, we also evaluate the explanatory power of the factor structure for contemporaneous returns in “Appendices 3.1 (in-sample) and 7.2 (out-of-sample)”. However, the analysis of the model’s explanatory power is very finance-specific. Since this paper aims to present a new general-purpose method rather than its application to financial economics, we present only the results on the predictive performance of the models in the main analysis.
Clark and McCracken (2005) and Clark and West (2007) show that the commonly used Diebold and Mariano (1995) test is undersized when comparing nested models. If we test the relevance of the VAR process, then the DCFM and CFM are nested in the RDCFM and RCFM, respectively. Similarly, if we test the relevance of additional factors, then the higher-dimensional factor models nest the lower-dimensional ones. Technically, the models with unconditional factor loadings are not nested in models with conditional loadings, so the Clark and West (2007) test may be unsuitable for these comparisons. However, in unreported results, we find that the results are qualitatively unchanged when we apply the Diebold and Mariano (1995) test instead.
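The note above mentions the Clark and West (2007) test for nested models. The sketch below computes its MSPE-adjusted statistic for a single forecast series. The function name is ours, the simple t-ratio ignores serial correlation, and how the loss differential is aggregated across stocks in the panel setting of this paper is not stated here, so this is an illustration only.

```python
import numpy as np

def clark_west(y, f_small, f_large):
    """Clark-West MSPE-adjusted statistic for nested forecasts.

    f_small: forecasts of the nested (restricted) model,
    f_large: forecasts of the larger model. One-sided test: large
    positive values favor the larger model (compare with standard
    normal critical values, e.g., 1.645 at the 5% level).
    """
    # Loss differential, adjusted for the noise the larger model adds
    # by estimating parameters that are zero under the null.
    adj = (y - f_small) ** 2 - ((y - f_large) ** 2 - (f_small - f_large) ** 2)
    return adj.mean() / (adj.std(ddof=1) / np.sqrt(adj.size))
```

Algebraically, the adjusted differential simplifies to $2(y - \hat{y}^{small})(\hat{y}^{large} - \hat{y}^{small})$, which shows that the test asks whether the extra forecast component of the larger model points in the direction of the small model's errors.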
“Appendix 3.1” reports the in-sample performance of the competing RDCFM variants, and “Appendix 7.2” reports the corresponding out-of-sample measures. The results suggest that the RDCFM generalizes well, indicating that the identified factor structure is stable. The $MSE^{IS}$ of the five-factor RDCFM is 217.8.
Note that we re-estimate $\varvec{\Gamma }_{\beta }$ after obtaining new factor estimates, using the shuffled factor instruments. In unreported results, we find larger declines in $MSE^{IS}$ when we do not re-estimate $\varvec{\Gamma }_{\beta }$. However, the overall ranking does not change.
Note that missing values in the vector of errors and in the matrix of factor loadings are replaced with zeros, i.e., they do not contribute to the overall error flow.
To ensure that the recurrent system and the unfolded feedforward system are equivalent, unfolding in time requires the weights of the feedforward system to be the same in every layer, i.e., the “shared weights” property.
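Note that the derivative of the sigmoid function $\sigma (x) = 1/(1+e^{-x})$ is $\sigma (x)\left( 1 - \sigma (x) \right)$, i.e., $z \left( 1 - z \right)$ with $z = \sigma (x)$ denoting the sigmoid output.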

References

Abid I, Ayadi R, Guesmi K, Mkaouar F (2022) A new approach to deal with variable selection in neural networks: an application to bankruptcy prediction. Ann Oper Res 313(2):605–623
Agrawal VV, Yücel Ş (2022) Design of electricity demand-response programs. Manag Sci 68(10):7441–7456
Asness CS, Moskowitz TJ, Pedersen LH (2013) Value and momentum everywhere. J Finance 68(3):929–985
Avramov D, Chordia T (2006) Asset pricing models and financial market anomalies. Rev Financ Stud 19(3):1001–1040
Bai J, Wang P (2015) Identification and Bayesian estimation of dynamic factor models. J Bus Econ Stat 33(2):221–240
Bali TG, Cakici N, Tang Y (2009) The conditional beta and the cross-section of expected returns. Financ Manag 38(1):103–137
Bali TG, Engle RF, Tang Y (2017) Dynamic conditional beta is alive and well in the cross section of daily stock returns. Manag Sci 63(11):3760–3779
Banz RW (1981) The relationship between return and market value of common stocks. J Financ Econ 9(1):3–18
Barillas F, Shanken J (2018) Comparing asset pricing models. J Finance 73(2):715–754
Basu S (1983) The relationship between earnings’ yield, market value and return for NYSE common stocks. J Financ Econ 12(1):129–156
Beasley MS (1996) An empirical analysis of the relation between the board of director composition and financial statement fraud. Account Rev 71(4):443–465
Bernanke B, Boivin J, Eliasz P (2005) Measuring monetary policy: a factor augmented vector autoregressive (FAVAR) approach. Q J Econ 120:387–422
Bertsimas D, Bjarnadóttir MV, Kane MA, Kryder JC, Pandey R, Vempala S, Wang G (2008) Algorithmic prediction of health-care costs. Oper Res 56(6):1382–1392
Blasques F, Hoogerkamp MH, Koopman SJ, van de Werve I (2021) Dynamic factor models with clustered loadings: forecasting education flows using unemployment data. Int J Forecast 37(4):1426–1441
Boone T, Ganeshan R, Hicks RL, Sanders NR (2018) Can Google Trends improve your sales forecast? Prod Oper Manag 27(10):1770–1774
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Broyden CG (1970) The convergence of a class of double-rank minimization algorithms. IMA J Appl Math 6(1):76–90
Cakici N, Fieberg C, Metko D, Zaremba A (2023a) Machine learning goes global: cross-sectional return predictability in international stock markets. J Econ Dyn Control 155:104725
Cakici N, Fieberg C, Metko D, Zaremba A (2023b) Predicting returns with machine learning across horizons, firm size, and time. J Financ Data Sci 5(4):119–144
Canova F, Ciccarelli M, Ortega E (2012) Do institutional changes affect business cycles? Evidence from Europe. J Econ Dyn Control 36(10):1520–1533
Carhart MM (1997) On persistence in mutual fund performance. J Finance 52(1):57–82
Cecchini M, Aytug H, Koehler GJ, Pathak P (2010) Detecting management fraud in public companies. Manag Sci 56(7):1146–1160
Chou P, Chuang HHC, Chou YC, Liang TP (2022) Predictive analytics for customer repurchase: interdisciplinary integration of buy till you die modeling and machine learning. Eur J Oper Res 296(2):635–651
Clark TE, McCracken MW (2005) Evaluating direct multistep forecasts. Econom Rev 24(4):369–404
Clark TE, West KD (2007) Approximately normal tests for equal predictive accuracy in nested models. J Econom 138(1):291–311
Cochrane JH (2011) Presidential address: discount rates. J Finance 66(4):1047–1108
Cohen MC, Zhang R, Jiao K (2022) Data aggregation and demand prediction. Oper Res 70:2597–2618
Connor G, Korajczyk RA (1988) Risk and return in an equilibrium APT: application of a new test methodology. J Financ Econ 21:1–64
Datar VT, Naik YN, Radcliffe R (1998) Liquidity and stock returns: an alternative test. J Financ Mark 1(2):203–219
Del Negro M, Otrok CM (2008) Dynamic factor models with time-varying parameters: measuring changes in international business cycles. Working paper
Diebold F, Mariano R (1995) Comparing predictive accuracy. J Bus Econ Stat 13(3):253–263
Elman J (1990) Finding structure in time. Cogn Sci 14(2):179–211
Erişen E, Iyigun C, Tanrısever F (2017) Short-term electricity load forecasting with special days: an analysis on parametric and non-parametric methods. Ann Oper Res. https://doi.org/10.1007/s10479-017-2726-6
Fama EF, French KR (1992) The cross-section of expected stock returns. J Finance 47(2):427–465
Fama EF, French KR (1993) Common risk factors in the returns on stocks and bonds. J Financ Econ 33(1):3–56
Fama EF, French KR (2015) A five-factor asset pricing model. J Financ Econ 116(1):1–22
Fama EF, French KR (2018) Choosing factors. J Financ Econ 128(2):234–252
Ferson WE, Harvey CR (1991) The variation of economic risk premiums. J Polit Econ 99(2):385–415
Ferson WE, Harvey CR (1993) The risk and predictability of international equity returns. Rev Financ Stud 6(3):527–566
Fieberg C, Metko D, Poddig T, Loy T (2023) Machine learning techniques for cross-sectional equity returns’ prediction. OR Spectrum 45(1):289–323
Fletcher R (1970) A new approach to variable metric algorithms. Comput J 13(3):317–322
Frazzini A, Pedersen LH (2014) Betting against beta. J Financ Econ 111(1):1–25
Goldfarb D (1970) A family of variable-metric methods derived by variational means. Math Comput 24(109):23–26
Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37(3):424–438
Granger CWJ (1980) Testing for causality. J Econ Dyn Control 2:329–352
Granger CWJ (1988) Causality, cointegration, and control. J Econ Dyn Control 12(2–3):551–559
Greene WH (2003) Econometric analysis, 5th edn. Prentice Hall, Upper Saddle River
Gross DB, Souleles NS (2002) An empirical analysis of personal bankruptcy and delinquency. Rev Financ Stud 15(1):319–347
Gu S, Kelly B, Xiu D (2020) Empirical asset pricing via machine learning. Rev Financ Stud 33(5):2223–2273
Gu S, Kelly B, Xiu D (2021) Autoencoder asset pricing models. J Econom 222(1):429–450
Gürtler M, Zöllner M (2023) Heterogeneities among credit risk parameter distributions: the modality defines the best estimation method. OR Spectrum 45(1):251–287
Hernandez Tinoco M, Wilson N (2013) Financial distress and bankruptcy prediction among listed companies using accounting, market and macroeconomic variables. Int Rev Financ Anal 30:394–419
Hou K, Xue C, Zhang L (2015) Digesting anomalies: an investment approach. Rev Financ Stud 28(3):650–705
Hou K, Xue C, Zhang L (2020) Replicating anomalies. Rev Financ Stud 33(5):2019–2133
Jacobs BI, Levy KN, Markowitz HM (2005) Portfolio optimization with factors, scenarios, and realistic short positions. Oper Res 53(4):586–599
Jegadeesh N, Titman S (1993) Returns to buying winners and selling losers: implications for stock market efficiency. J Finance 48(1):65–91
Jungbacker B, Koopman SJ, van der Wel M (2011) Maximum likelihood estimation for dynamic factor models with missing data. J Econ Dyn Control 35(8):1358–1368
Kamakura WA, Du RY (2012) How economic contractions and expansions affect expenditure patterns. J Consum Res 39(2):229–247
Kelly BT, Pruitt S, Su Y (2019a) Characteristics are covariances: a unified model of risk and return. J Financ Econ 134(3):501–524
Kelly BT, Pruitt S, Su Y (2019b) Instrumented principal component analysis. Working paper
Kim Y, Krishnan R (2015) On product-level uncertainty and online purchase behavior: an empirical analysis. Manag Sci 61(10):2449–2467
Konchitchki Y, Patatoukas PN (2014) Accounting earnings and gross domestic product. J Account Econ 57(1):76–88
Koop G, Pesaran MH, Potter SM (1996) Impulse response analysis in nonlinear multivariate models. J Econom 74(1):119–147
Kose MA, Otrok C, Whiteman CH (2003) International business cycles: world, region, and country-specific factors. Am Econ Rev 93(4):1216–1239
Kozak S, Nagel S, Santosh S (2018) Interpreting factor models. J Finance 73(3):1183–1223
Kuppelwieser T, Wozabal D (2023) Intraday power trading: toward an arms race in weather forecasting? OR Spectrum 45(1):57–83
Lakonishok J, Shleifer A, Vishny RW (1994) Contrarian investment, extrapolation, and risk. J Finance 49(5):1541–1578
Lee DJ, Kim TH, Mizen P (2021) Impulse response analysis in conditional quantile models with an application to monetary policy. J Econ Dyn Control 127:104102
Liang D, Lu CC, Tsai CF, Shih GA (2016) Financial ratios and corporate governance indicators in bankruptcy prediction: a comprehensive study. Eur J Oper Res 252(2):561–572
Liew J, Vassalou M (2000) Can book-to-market, size and momentum be risk factors that predict economic growth? J Financ Econ 57(2):221–245
Lintner J (1965) The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets. Rev Econ Stat 47(1):13–37
Loyola-Gonzalez O (2019) Black-box vs. white-box: understanding their advantages and weaknesses from a practical point of view. IEEE Access 7:154096–154113
Lütkepohl H, Schlaak T (2019) Bootstrapping impulse responses of structural vector autoregressive models identified through GARCH. J Econ Dyn Control 101:41–61
Ma Y, Ailawadi KL, Gauri DK, Grewal D (2011) An empirical investigation of the impact of gasoline prices on grocery shopping behavior. J Mark 75(2):18–35
Mahbobi M, Kimiagari S, Vasudevan M (2021) Credit risk classification: an integrated predictive accuracy algorithm using artificial and deep neural networks. Ann Oper Res 330:609–637
Maio P (2013) Intertemporal CAPM with conditioning variables. Manag Sci 59(1):122–141
Mestekemper T, Kauermann G, Smith MS (2013) A comparison of periodic autoregressive and dynamic factor models in intraday energy demand forecasting. Int J Forecast 29(1):1–12
Moskowitz TJ (2003) An analysis of covariance risk and pricing anomalies. Rev Financ Stud 16(2):417–457
Nagel S (2021) Machine learning in asset pricing. Princeton lectures in finance. Princeton University Press, New Jersey
Newey WK, West KD (1987) A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55(3):703–708
Petkova R (2006) Do the Fama-French factors proxy for innovations in predictive variables? J Finance 61(2):581–612
Potter SM (2000) Nonlinear impulse response functions. J Econ Dyn Control 24(10):1425–1446
Povel P, Singh R, Winton A (2007) Booms, busts, and fraud. Rev Financ Stud 20(4):1219–1254
Rosenberg B, Reid K, Lanstein R (1985) Persuasive evidence of market inefficiency. J Portf Manag 11(3):9–16
Ross SA (1976) The arbitrage theory of capital asset pricing. J Econ Theory 13(3):341–360
Rumelhart DE, Hinton GE, Williams RJ (1986a) Learning representations by back-propagating errors. Nature 323(6088):533–536
Rumelhart DE, Hinton GE, Williams RJ (1986b) Parallel distributed processing: explorations in the microstructure of cognition, vol 1. MIT Press, Cambridge
Scholdra TP, Wichmann JRK, Eisenbeiss M, Reinartz WJ (2021) Households under economic change: how micro- and macroeconomic conditions shape grocery shopping behavior. J Mark 86:95–117
Shanno DF (1970) Conditioning of quasi-Newton methods for function minimization. Math Comput 24(111):647–656
Sharpe WF (1964) Capital asset prices: a theory of market equilibrium under conditions of risk. J Finance 19(3):425–442
Sims CA (1980) Macroeconomics and reality. Econometrica 48(1):1–48
Sloan RG (1996) Do stock prices fully reflect information in accruals and cash flows about future earnings? Account Rev 71(3):289–315
Su L, Wang X (2017) On time-varying factor models: estimation and testing. J Econom 198(1):84–101
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol) 58(1):267–288
van Nguyen T, Zhou L, Chong AYL, Li B, Pu X (2020) Predicting customer demand for remanufactured products: a data-mining approach. Eur J Oper Res 281(3):543–558
Vermeulen EM, Spronk J, van der Wijst N (1993) A new approach to firm evaluation. Ann Oper Res 45(1):387–403
Vermeulen EM, Spronk J, van der Wijst N (1996) Analyzing risk and performance using the multi-factor concept. Eur J Oper Res 93(1):173–184
Werbos PJ (1988) Generalization of backpropagation with application to a recurrent gas market model. Neural Netw 1(4):339–356
Werbos PJ (1990) Backpropagation through time: what it does and how to do it. Proc IEEE 78(10):1550–1560
Zhou LL, Ampon-Wireko S, Brobbey EW, Dauda L, Owusu-Marfo J, Tetgoum ADK (2020) The role of macroeconomic indicators on healthcare cost. Healthcare (Basel, Switzerland) 8(2):123
Zimmermann HG, Grothmann R, Schäfer AM, Tietz C (2005) Modeling large dynamical systems with dynamical consistent neural networks. In: Zimmermann HG, Grothmann R, Schäfer AM, Tietz C (eds) New directions in statistical signal processing: from systems to brain. MIT Press, Cambridge
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B (Methodol) 67(2):301–320


Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

HSB Hochschule Bremen - City University of Applied Sciences, Bremen, Germany
Christian Fieberg
University of Bremen, Bremen, Germany
Gerrit Liedtke & Thorsten Poddig


Corresponding author

Correspondence to Thorsten Poddig.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Methodological appendix

1.1 Appendix 1.1: Gradient calculation

The BPTT algorithm used to train the RDCFM and its variants is a modified version of the BPTT algorithm presented in Rumelhart et al. (1986b) and Werbos (1988, 1990) because the RDCFM differs from standard RNNs in two ways. First, our nonlinear least squares estimation does not require the estimation of the potentially large weight matrix that maps the hidden states to the responses. Second, we do not apply a tanh transformation to the entire factor process, but only to past factor realizations. For consistency with the neural network literature, however, we begin by describing a BPTT version that applies the tanh transformation to the entire factor process and then note the modifications for our implementation. Applying the tanh transformation to the entire factor process amounts to nonlinear factor extraction. This option is also available in our implementation, but we use the variant that applies the tanh transformation only to past factor realizations to follow the standard literature on linear factor models as closely as possible. To save space, we do not present a full derivation of the BPTT algorithm, which is well documented in the literature (e.g., Werbos 1988, 1990). Instead, we aim to give an intuitive explanation of the BPTT algorithm and to show how the gradient is calculated recursively.

For presentation purposes, we assume that the construction of the conditional factors in the RDCFM is exactly the same as the construction of the hidden states in standard RNNs. Accordingly, the construction of the latent factors in Eq. (5) changes as follows:

$$\begin{aligned} \varvec{f}_t = \tanh \left( \varvec{z}_{t} \varvec{\Omega } + \varvec{f}_{t-1} \varvec{AR} \right) \end{aligned}$$

(17)

All symbols keep their definitions and dimensions as described in Sect. 2.1. Note that we use the terms hidden states and factors interchangeably. In the forward pass, the factors are obtained as described in Eq. (17), and the RDCFM estimates the response variable by multiplying the factors with the $N \times K$ matrix of contemporaneous factor loadings $\varvec{B}_t$, as described in Eq. (18):

$$\begin{aligned} \hat{\varvec{y}}_t = \varvec{f}_{t} \varvec{B}_{t}' \end{aligned}$$

(18)
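Eqs. (17) and (18) can be sketched as a plain NumPy forward pass. This assumes the row-vector conventions of the text, a time-indexed stack of loading matrices $\varvec{B}_t$, and a fixed initial state $\varvec{f}_0$; the number of factor instruments is labelled `P` in the comments purely for illustration, and `rdcfm_forward` is a hypothetical name. It implements the variant of Eq. (17) with tanh applied to the entire factor process, not the variant the authors use for estimation, which applies tanh only to past factor realizations.

```python
import numpy as np

def rdcfm_forward(Z, Omega, AR, B, f0):
    """Forward pass of Eqs. (17)-(18) over t = 1..T.

    Z: T x P factor instruments, Omega: P x K, AR: K x K,
    B: T x N x K conditional loadings, f0: length-K initial factors.
    Returns factors F (T x K) and fitted responses Y_hat (T x N).
    """
    T, K = Z.shape[0], Omega.shape[1]
    F = np.zeros((T, K))
    f_prev = f0
    for t in range(T):
        F[t] = np.tanh(Z[t] @ Omega + f_prev @ AR)  # Eq. (17)
        f_prev = F[t]
    Y_hat = np.einsum("tk,tnk->tn", F, B)            # Eq. (18): f_t B_t'
    return F, Y_hat
```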

To calculate the gradient in a time-dependent system, the system is “unfolded in time”; that is, each time step of the recurrent system (e.g., RNN, RDCFM, and RCFM) is treated as an individual layer in an FFNN (Rumelhart et al. 1986b; Werbos 1990). Once the system is unfolded in time, the error signals of the respective time steps are passed to earlier time steps and the calculated gradients are accumulated over the entire training sequence. Figure 4 presents the backward pass of the error signals in the RDCFM, as specified in Eqs. (17) and (18).

The error backpropagation starts at the last layer, i.e., the last time step of the estimation period, which we denote as $\tau$. As in standard backpropagation, we calculate the derivative of the loss with respect to the hidden states by applying the chain rule. That is, we first differentiate the loss $L_\tau$ with respect to the estimated response vector $\varvec{\hat{y}}_{\tau }$ and then differentiate the estimated outcome with respect to the factors $\varvec{f}_{\tau }$ as follows:

$$\begin{aligned} \frac{\partial L_{\tau }}{\partial \varvec{f}_{\tau }} = \frac{\partial L_{\tau }}{\partial \hat{\varvec{y}}_{\tau }} \frac{\partial \hat{\varvec{y}}_{\tau }}{\partial \varvec{f}_{\tau }} \end{aligned}$$

(19)

With our objective function being the sum of squared residuals, the first term in Eq. (19) is simply $2 \cdot \varvec{e}_{\tau }$, where the $1 \times N$ vector of residuals is defined as $\varvec{e}_{\tau } = \hat{\varvec{y}}_{\tau } - \varvec{y}_{\tau }$. The second term is

$$\begin{aligned} \frac{\partial \hat{\varvec{y}}_{\tau }}{\partial \varvec{f}_{\tau }} = \varvec{B}_{\tau } \text {diag} \left( 1 - \varvec{f}_{\tau }^2\right) \end{aligned}$$

(20)

with diag denoting the operator that transforms a $1 \times K$ vector into a $K \times K$ diagonal matrix. Note that in Eq. (20), the matrix of factor loadings $\varvec{B}_{\tau }$ is time-varying for the RDCFM and DCFM but not for the RCFM and CFM (or standard RNN). Thus, the total error signal at $\tau$, which we denote as $\delta _{\tau }$, is given by Eq. (21)^{Footnote 12}:

$$\begin{aligned} \delta _{\tau } = 2\varvec{e}_{\tau } \varvec{B}_{\tau } \odot \left( 1 - \varvec{f}_{\tau }^2\right) \end{aligned}$$

(21)

Note that multiplying a row vector by a diagonal matrix is equivalent to elementwise multiplication by its diagonal entries, denoted by $\odot$. Equation (21) captures how a change in $\varvec{f}_{\tau }$ changes $\hat{\varvec{y}}_{\tau }$, which in turn changes the error. Applying the chain rule once again, we determine how the error changes with respect to changes in $\varvec{\Omega }$:

$$\begin{aligned} \begin{aligned} \frac{\partial L_{\tau }}{\partial \varvec{\Omega }}&= \frac{\partial \varvec{f}_{\tau }}{\partial \varvec{\Omega }}\delta _{\tau } \\&= \varvec{z}_{\tau }^{'} \delta _{\tau } \end{aligned} \end{aligned}$$

(22)
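As an illustration of Eqs. (21) and (22), the following minimal NumPy sketch computes the last-step error signal $\delta _{\tau }$ and the gradient with respect to $\varvec{\Omega }$, and compares the latter with central finite differences. It assumes the variant in which the tanh is applied to the entire factor, $\varvec{f}_{\tau } = \tanh (\varvec{z}_{\tau }\varvec{\Omega } + \varvec{f}_{\tau -1}\varvec{AR})$, treats $\varvec{f}_{\tau -1}$ as given, and defines the residual as estimate minus observation so that $\partial L_{\tau } / \partial \hat{\varvec{y}}_{\tau } = 2\varvec{e}_{\tau }$. The function `last_step_grad` and the random data are hypothetical.

```python
import numpy as np

def last_step_grad(z, Omega, f_prev, AR, B, y):
    """Error signal (Eq. 21) and Omega gradient (Eq. 22) at the last step tau.

    Row-vector convention: z is 1 x L, f_prev is 1 x K, B is N x K, y is 1 x N.
    """
    f = np.tanh(z @ Omega + f_prev @ AR)        # 1 x K factors
    e = f @ B.T - y                              # 1 x N residuals (estimate minus observation)
    delta = (2.0 * e @ B) * (1.0 - f ** 2)       # Eq. (21): elementwise tanh derivative
    grad_Omega = z.T @ delta                     # Eq. (22): L x K
    return delta, grad_Omega

# Compare against central finite differences on hypothetical random data
rng = np.random.default_rng(0)
L, K, N = 3, 2, 5
z, f_prev, y = rng.normal(size=(1, L)), rng.normal(size=(1, K)), rng.normal(size=(1, N))
Omega, AR, B = rng.normal(size=(L, K)), rng.normal(size=(K, K)), rng.normal(size=(N, K))

def loss(Om):
    f = np.tanh(z @ Om + f_prev @ AR)
    return np.sum((f @ B.T - y) ** 2)

delta, grad = last_step_grad(z, Omega, f_prev, AR, B, y)
eps = 1e-6
grad_num = np.zeros_like(Omega)
for idx in np.ndindex(*Omega.shape):
    step = np.zeros_like(Omega)
    step[idx] = eps
    grad_num[idx] = (loss(Omega + step) - loss(Omega - step)) / (2 * eps)
max_abs_err = np.max(np.abs(grad - grad_num))
```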

Considering the last time step only, the algorithm is exactly the same as standard backpropagation for FFNNs. For the DCFM and CFM, the gradient contributions of the individual time steps, each computed according to Eq. (22), are simply accumulated over the entire training sequence. However, in an ordered (iterative) network such as the RDCFM and RCFM, the factors $\varvec{f}_{\tau }$ depend on the previous factor realizations $\varvec{f}_{\tau -1}$, which in turn depend on $\varvec{f}_{\tau -2}$, and so on back to the factor initialization $\varvec{f}_0$, as shown in Fig. 4. Unfolding in time means that a second FFNN for the previous time step $\tau -1$ is placed to the left of the layer $\tau$ (see Fig. 4).^{Footnote 13} Both layers are connected via the matrix of VAR coefficients $\varvec{AR}$. The difference between BPTT and standard backpropagation is that in BPTT, every layer produces an output and thus receives an error signal not only from the subsequent layer but also from its own output layer. Therefore, the total error signal in the factor layer at $\tau - 1$ is the sum of the output error signal, denoted by $\delta _{\tau -1}^{out}$, and the (total) error signal backpropagated from the subsequent layer, denoted by $\delta _{\tau -1}^{AR}$. The latter is obtained by differentiating $\varvec{f}_{\tau }$ with respect to $\varvec{f}_{\tau -1}$. The total error signal $\delta _{\tau -1}$ is given by Eq. (23) as follows:

$$\begin{aligned} \delta _{\tau -1} = \underbrace{\frac{\partial \varvec{f}_{\tau }}{\partial \varvec{f}_{\tau -1}} \frac{\partial L_{\tau }}{\partial \varvec{f}_{\tau }}}_{\delta _{\tau -1}^{AR}} + \underbrace{\frac{\partial L_{\tau -1}}{\partial \hat{\varvec{y}}_{\tau -1}} \frac{\partial \hat{\varvec{y}}_{\tau -1}}{\partial \varvec{f}_{\tau -1}}}_{\delta _{\tau -1}^{out}} \end{aligned}$$

(23)

The second term of Eq. (23), $\delta _{\tau -1}^{out}$, is calculated analogously to Eq. (21). The backpropagated error signal $\delta _{\tau -1}^{AR}$ is calculated according to Eq. (24):

$$\begin{aligned} \delta _{\tau -1}^{AR} = \left( \varvec{\delta }_{\tau } \varvec{AR}^{'}\right) \odot \left( 1- \varvec{f}_{\tau -1}^2\right) \end{aligned}$$

(24)

Thus, the error is passed backward through the $\varvec{AR}$ matrix and multiplied elementwise by the derivative of the tanh activation function. Combining Eqs. (23) and (24), the total error signal simplifies to:

$$\begin{aligned} \delta _{\tau -1} = \left[ 2\left( \varvec{e}_{\tau -1} \varvec{B}_{\tau -1}\right) + \left( \delta _{\tau } \varvec{AR}^{'} \right) \right] \odot \left( 1- \varvec{f}_{\tau -1}^2\right) \end{aligned}$$

(25)

This equation holds for any time step $0< t < \tau$. At $t=0$, i.e., the initialization of the factors, the factor layer does not receive an output error signal; therefore, the total error signal at $t=0$ is simply the backpropagated error. Once the error signals are available, the gradients with respect to the free parameters are obtained by accumulating the contributions of all time steps, as given in Eqs. (26) to (28):

$$\begin{aligned} \frac{\partial L}{\partial \varvec{\Omega }} = \sum _{t=1}^{T} \varvec{z}_t^{'} \delta _t \end{aligned}$$

(26)

$$\begin{aligned} \begin{aligned} \frac{\partial L}{\partial \varvec{AR}}&= \sum _{t=0}^{T} \frac{\partial \varvec{f}_t}{\partial \varvec{AR}} \delta _t \\&= \sum _{t=1}^{T} \varvec{f}_{t-1}^{'} \delta _t \end{aligned} \end{aligned}$$

(27)

$$\begin{aligned} \frac{\partial L}{\partial \varvec{f}_0}&= \delta _0 \end{aligned}$$

(28)
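The recursion in Eqs. (25) to (27) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tanh is applied to the entire factor, $\varvec{f}_t = \tanh (\varvec{z}_t\varvec{\Omega } + \varvec{f}_{t-1}\varvec{AR})$, the time-varying loadings $\varvec{B}_t$ are held fixed, residuals are defined as estimate minus observation, and $\varvec{f}_0$ is treated as a fixed input, so Eq. (28) is not evaluated. The names `forward`, `bptt`, and `num_grad` are hypothetical; the finite-difference comparison at the end checks the analytic gradients.

```python
import numpy as np

def forward(Z, Omega, AR, f0):
    """Factors f_t = tanh(z_t Omega + f_{t-1} AR); row 0 holds f_0."""
    F = np.zeros((Z.shape[0] + 1, f0.size))
    F[0] = f0
    for t in range(1, Z.shape[0] + 1):
        F[t] = np.tanh(Z[t - 1] @ Omega + F[t - 1] @ AR)
    return F

def loss(Z, Y, Bs, Omega, AR, f0):
    F = forward(Z, Omega, AR, f0)
    return sum(np.sum((F[t] @ Bs[t - 1].T - Y[t - 1]) ** 2) for t in range(1, len(F)))

def bptt(Z, Y, Bs, Omega, AR, f0):
    F = forward(Z, Omega, AR, f0)
    g_Omega, g_AR = np.zeros_like(Omega), np.zeros_like(AR)
    delta_next = np.zeros(f0.size)                   # no signal from beyond tau
    for t in range(Z.shape[0], 0, -1):
        e = F[t] @ Bs[t - 1].T - Y[t - 1]            # residuals at t
        # Eq. (25); with delta_next = 0 this reduces to Eq. (21) at t = tau
        delta = (2.0 * e @ Bs[t - 1] + delta_next @ AR.T) * (1.0 - F[t] ** 2)
        g_Omega += np.outer(Z[t - 1], delta)         # Eq. (26)
        g_AR += np.outer(F[t - 1], delta)            # Eq. (27)
        delta_next = delta
    return g_Omega, g_AR

def num_grad(fun, P, eps=1e-6):
    """Central finite-difference gradient of a scalar function of array P."""
    G = np.zeros_like(P)
    for idx in np.ndindex(*P.shape):
        step = np.zeros_like(P)
        step[idx] = eps
        G[idx] = (fun(P + step) - fun(P - step)) / (2 * eps)
    return G

rng = np.random.default_rng(1)
T, L, K, N = 6, 3, 2, 4
Z, Y = rng.normal(size=(T, L)), rng.normal(size=(T, N))
Bs = rng.normal(size=(T, N, K))                      # time-varying loadings B_t
Omega, AR = 0.5 * rng.normal(size=(L, K)), 0.5 * rng.normal(size=(K, K))
f0 = rng.normal(size=K)
g_Omega, g_AR = bptt(Z, Y, Bs, Omega, AR, f0)
err_Omega = np.max(np.abs(g_Omega - num_grad(lambda P: loss(Z, Y, Bs, P, AR, f0), Omega)))
err_AR = np.max(np.abs(g_AR - num_grad(lambda P: loss(Z, Y, Bs, Omega, P, f0), AR)))
```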

The above equations constitute the BPTT algorithm for training an RDCFM in which the tanh transformation is applied to the entire factor process. To be more in line with the literature on linear factor models, we do not apply the tanh transformation to the factors as a whole, because this would result in a nonlinear factor model. Therefore, as described in Sect. 2.1, we modify the factor construction of conventional RNNs as follows:

$$\begin{aligned} \varvec{f}_t = \varvec{z}_{t} \varvec{\Omega } + \tanh \left( \varvec{f}_{t-1} \right) \varvec{AR} \end{aligned}$$

(29)

For readability, we define the term $\widetilde{\varvec{f}}_{t-1} = \tanh \left( \varvec{f}_{t-1} \right)$ and rewrite Eq. (29) as follows:

$$\begin{aligned} \varvec{f}_t = \varvec{z}_{t} \varvec{\Omega } + \widetilde{\varvec{f}}_{t-1} \varvec{AR} \end{aligned}$$

(30)

Modifying the factor process in this way requires a slight modification of the BPTT algorithm derived above.

Figure 5 illustrates what happens in our application of the RDCFM. For simplicity, we assume that the first time step is $\tau -2$ and that no initialization is required. In the forward pass, factors are constructed from the observable instruments $\varvec{z}_{\tau -2}$. These factors produce an output $\hat{\varvec{y}}_{\tau -2}$ and thus an error $L_{\tau -2}$. Similarly, the factors of the subsequent period, $\varvec{f}_{\tau -1}$, are conditional on $\varvec{z}_{\tau -1}$ and on the tanh-transformed factors $\widetilde{\varvec{f}}_{\tau -2}$. The transformation takes place on a “virtual layer” that only applies the tanh transformation to the raw factors $\varvec{f}_{\tau -2}$, which are passed to it by multiplication with an identity matrix $\varvec{I}$ of appropriate size. The virtual layer passes the transformed factors $\widetilde{\varvec{f}}_{\tau -2}$ on to the next factor layer, which produces the output $\hat{\varvec{y}}_{\tau -1}$. For the backward pass, the distinction between the virtual layer and the factor layer implies that the output error signal does not involve the derivative of the tanh activation. Therefore, the total error signal on the factor layer is:

$$\begin{aligned} \delta _{\tau } = 2 \varvec{e}_{\tau } \varvec{B}_{\tau } \end{aligned}$$

(31)

for the last time step $\tau$ and

$$\begin{aligned} \delta _{t} = \underbrace{\Big (2 \varvec{e}_{t} \varvec{B}_{t}\Big )}_{\delta _t^{out}} + \underbrace{\left( \left( \delta _{t+1} \varvec{AR}^{'}\right) \odot \left( 1 - \tanh \left( \varvec{f}_{t} \right) ^2\right) \right) }_{\delta _t^{AR}} \end{aligned}$$

(32)

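for any time step $0< t < \tau$. At $t=0$, only the backpropagated term applies, i.e., $\delta _{0} = \left( \delta _{1} \varvec{AR}^{'}\right) \odot \left( 1 - \tanh \left( \varvec{f}_{0} \right) ^2\right)$. The general equations for the gradient with respect to $\varvec{\Omega }$ and $\varvec{f}_0$ do not change. However, for $\varvec{AR}$, we have to apply the tanh transformation to the past factors: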

$$\begin{aligned} \frac{\partial L}{\partial \varvec{AR}} = \sum _{t=1}^{T} \tanh \left( \varvec{f}_{t-1}\right) ^{'} \delta _t \end{aligned}$$

(33)

Pseudocode that presents our implementation of the modified BPTT algorithm is given in Algorithm 1.
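The following NumPy code is a minimal sketch of the modified recursion in Eqs. (30) to (33); it is not the authors' Algorithm 1. It computes the gradients with respect to $\varvec{\Omega }$, $\varvec{AR}$, and $\varvec{f}_0$, using only the backpropagated signal at $t=0$. It assumes fixed loadings $\varvec{B}_t$ and residuals defined as estimate minus observation; all function names and data are hypothetical, and the analytic gradients are checked against finite differences.

```python
import numpy as np

def forward(Z, Omega, AR, f0):
    """Raw factors for Eq. (30): f_t = z_t Omega + tanh(f_{t-1}) AR; row 0 holds f_0."""
    F = np.zeros((Z.shape[0] + 1, f0.size))
    F[0] = f0
    for t in range(1, Z.shape[0] + 1):
        F[t] = Z[t - 1] @ Omega + np.tanh(F[t - 1]) @ AR
    return F

def loss(Z, Y, Bs, Omega, AR, f0):
    F = forward(Z, Omega, AR, f0)
    return sum(np.sum((F[t] @ Bs[t - 1].T - Y[t - 1]) ** 2) for t in range(1, len(F)))

def modified_bptt(Z, Y, Bs, Omega, AR, f0):
    F = forward(Z, Omega, AR, f0)
    g_Omega, g_AR = np.zeros_like(Omega), np.zeros_like(AR)
    delta_next = np.zeros(f0.size)
    for t in range(Z.shape[0], 0, -1):
        e = F[t] @ Bs[t - 1].T - Y[t - 1]
        # Eq. (32); at t = tau, delta_next = 0 and this is Eq. (31)
        delta = 2.0 * e @ Bs[t - 1] + (delta_next @ AR.T) * (1.0 - np.tanh(F[t]) ** 2)
        g_Omega += np.outer(Z[t - 1], delta)           # Eq. (26)
        g_AR += np.outer(np.tanh(F[t - 1]), delta)     # Eq. (33)
        delta_next = delta
    # t = 0: no output signal, only the backpropagated one (Eq. 28)
    g_f0 = (delta_next @ AR.T) * (1.0 - np.tanh(F[0]) ** 2)
    return g_Omega, g_AR, g_f0

def num_grad(fun, P, eps=1e-6):
    G = np.zeros_like(P)
    for idx in np.ndindex(*P.shape):
        step = np.zeros_like(P)
        step[idx] = eps
        G[idx] = (fun(P + step) - fun(P - step)) / (2 * eps)
    return G

rng = np.random.default_rng(2)
T, L, K, N = 6, 3, 2, 4
Z, Y = rng.normal(size=(T, L)), rng.normal(size=(T, N))
Bs = rng.normal(size=(T, N, K))
Omega, AR = 0.5 * rng.normal(size=(L, K)), 0.5 * rng.normal(size=(K, K))
f0 = rng.normal(size=K)
g_Omega, g_AR, g_f0 = modified_bptt(Z, Y, Bs, Omega, AR, f0)
errs = [
    np.max(np.abs(g_Omega - num_grad(lambda P: loss(Z, Y, Bs, P, AR, f0), Omega))),
    np.max(np.abs(g_AR - num_grad(lambda P: loss(Z, Y, Bs, Omega, P, f0), AR))),
    np.max(np.abs(g_f0 - num_grad(lambda P: loss(Z, Y, Bs, Omega, AR, P), f0))),
]
```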

1.2 Appendix 1.2: Application to binary classification problems

This appendix demonstrates how the RDCFM algorithm designed for regression problems can be used to solve binary classification problems. The general equations for the latent factors and factor loadings are the same as those for regression problems, i.e.:

$$\begin{aligned} \varvec{f}_t = \varvec{z}_{t} \varvec{\Omega } + \tanh \left( \varvec{f}_{t-1} \right) \varvec{AR} \end{aligned}$$

(34)

and

$$\begin{aligned} \varvec{B}_t = \varvec{C}_t \varvec{\Gamma }_{\beta } \end{aligned}$$

(35)

where $\varvec{f}_t$ denotes a $1 \times K$ vector of factor realizations, $\varvec{z}_{t}$ is a $1 \times L$-dimensional vector of factor instruments, and $\varvec{AR}$ is a $K \times K$ matrix of VAR(1)-coefficients. The $N \times K$ matrix of conditional factor loadings $\varvec{B}_t$ is determined by $\varvec{\Gamma }_{\beta }$, an $M \times K$ matrix of instrument coefficients for the factor loadings, and $\varvec{C}_t$, an $N \times M$ matrix of cross-sectional variables.

Applying Eq. (2) yields a “raw output” $\widetilde{\varvec{y}_t}$:

$$\begin{aligned} \widetilde{\varvec{y}_t} = \varvec{f}_{t} \varvec{B}_{t}^{'} \end{aligned}$$

(36)

The raw output $\widetilde{\varvec{y}_t}$ is continuous and may take any real value, including values below 0 and above 1. To interpret these values as probabilities, we squeeze them into the interval [0,1] by applying a sigmoid transformation to $\widetilde{\varvec{y}_t}$:

$$\begin{aligned} \hat{\varvec{y}}_t = \frac{1}{1 + \exp \left( -\widetilde{\varvec{y}}_{t}\right) } \end{aligned}$$

(37)

Using these estimates as probabilities, we can classify the response variable into classes using a selected threshold.
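A minimal NumPy sketch of Eqs. (35) to (37) for a single period, assuming a threshold of 0.5 (the text leaves the threshold open); the function `classify` and the random inputs are hypothetical.

```python
import numpy as np

def classify(f_t, C_t, Gamma_beta, threshold=0.5):
    """Class probabilities (Eqs. 35-37) and 0/1 labels for one period."""
    B_t = C_t @ Gamma_beta                   # Eq. (35): N x K conditional loadings
    y_raw = f_t @ B_t.T                      # Eq. (36): raw output, any real value
    prob = 1.0 / (1.0 + np.exp(-y_raw))      # Eq. (37): sigmoid
    labels = (prob >= threshold).astype(int)
    return prob, labels

rng = np.random.default_rng(2)
N, M, K = 8, 3, 2
f_t = rng.normal(size=K)
C_t = rng.normal(size=(N, M))
Gamma_beta = rng.normal(size=(M, K))
prob, labels = classify(f_t, C_t, Gamma_beta)
```

With a threshold of 0.5, the rule coincides with classifying by the sign of the raw output $\widetilde{\varvec{y}}_t$; other thresholds shift the balance between false positives and false negatives. For very large negative raw outputs, `np.exp` may overflow (with a warning), which still yields a probability of 0; `scipy.special.expit` is a numerically stable alternative.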

Although the general structure of the RDCFM remains largely unchanged, the nonlinearity introduced by the sigmoid transformation has important implications for the optimization of the parameters. In the regression setting, we solved for $\varvec{\Gamma }_{\beta }$ using linear least squares. For classification problems, however, there is no analytic solution for $\varvec{\Gamma }_{\beta }$, even when the factors are taken as given. Accordingly, we have to solve for $\varvec{\Gamma }_{\beta }$ numerically. Specifically, we minimize the binary cross-entropy:

$$\begin{aligned} BCE = - \sum _{t=1}^{\tau }\sum _{i=1}^{N} \left[ y_{i,t} \log \left( \hat{y}_{i,t} \right) + \left( 1 - y_{i,t} \right) \log \left( 1 - \hat{y}_{i,t} \right) \right] \end{aligned}$$

(38)

with $\hat{y}_{i,t}$ denoting the estimated outcome for the i-th response at time t, as defined in Eq. (37).

The gradient calculation described in Appendix 5.1 is largely unaffected by these modifications of the RDCFM. However, due to the different objective function, the derivative of the loss with respect to the estimated responses differs from that for regression problems. Furthermore, a nonlinear sigmoid function is applied to map the estimate into the [0,1] interval. In the BPTT algorithm for regression problems (e.g., Eq. (19)), we obtain the gradient with respect to the factors by first differentiating the loss with respect to the estimated responses $\hat{\varvec{y}}_t$ and then differentiating the estimate with respect to the factors. Due to the nonlinear sigmoid transformation, we now have to differentiate with respect to the raw estimate $\widetilde{\varvec{y}}_t$ first:

$$\begin{aligned} \frac{\partial L_t}{\partial \widetilde{\varvec{y}}_t} = \frac{\partial L_t}{\partial \hat{\varvec{y}}_t} \frac{\partial \hat{\varvec{y}}_t}{\partial \widetilde{\varvec{y}}_t} \end{aligned}$$

(39)

resulting in^{Footnote 14}:

$$\begin{aligned} \frac{\partial L_t}{\partial \hat{y}_{i,t}} = - \frac{y_{i,t}}{\hat{y}_{i,t}} + \frac{1-y_{i,t}}{ 1 - \hat{y}_{i,t}} \end{aligned}$$

(40)

$$\begin{aligned} \frac{\partial \hat{y}_{i,t}}{\partial \widetilde{y}_{i,t}} = \hat{y}_{i,t} \left( 1 - \hat{y}_{i,t} \right) \end{aligned}$$

(41)

For the subsequent derivations, we denote the product of Eqs. (40) and (41) as $\varvec{d}_t$, a vector of dimension $1 \times N$; multiplying out shows that its elements simplify to $d_{i,t} = \hat{y}_{i,t} - y_{i,t}$. The term $\varvec{d}_t$ replaces $2 \varvec{e}_t$ in the previous gradient calculations. Finally, the gradient with respect to $\varvec{\Gamma }_{\beta }$ is given by Eq. (42):

$$\begin{aligned} \begin{aligned} \frac{\partial L_t}{\partial \varvec{\Gamma }_{\beta }}&= \frac{\partial L_t}{\partial \widetilde{\varvec{y}}_t} \frac{\partial \widetilde{\varvec{y}}_t}{\partial \varvec{\Gamma }_{\beta }} \\&= \left( \varvec{d}_t \varvec{C}_t \right) ^{'} \otimes \varvec{f}_t \end{aligned} \end{aligned}$$

(42)

Algorithm 2 presents a pseudo-algorithm for the gradient calculation for optimizing $\varvec{\Gamma }_{\beta }$.
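The following NumPy sketch evaluates the one-period binary cross-entropy of Eq. (38) and the gradient of Eq. (42), using $\varvec{d}_t = \hat{\varvec{y}}_t - \varvec{y}_t$ (the simplified product of Eqs. (40) and (41)), and checks the result against finite differences. It is not Algorithm 2: the factors are treated as given, and the function name and data are hypothetical. Summing the per-period gradients over t gives the gradient of the full objective.

```python
import numpy as np

def bce_and_grad(f_t, C_t, Gamma_beta, y_t):
    """One-period BCE (Eq. 38) and its gradient w.r.t. Gamma_beta (Eq. 42)."""
    y_raw = f_t @ (C_t @ Gamma_beta).T              # Eq. (36)
    prob = 1.0 / (1.0 + np.exp(-y_raw))             # Eq. (37)
    loss = -np.sum(y_t * np.log(prob) + (1 - y_t) * np.log(1 - prob))
    d_t = prob - y_t                                # product of Eqs. (40) and (41)
    grad = np.outer(d_t @ C_t, f_t)                 # Eq. (42): (d_t C_t)' kron f_t, M x K
    return loss, grad

rng = np.random.default_rng(3)
N, M, K = 10, 3, 2
f_t, C_t = rng.normal(size=K), rng.normal(size=(N, M))
Gamma_beta = 0.5 * rng.normal(size=(M, K))
y_t = rng.integers(0, 2, size=N).astype(float)

loss, grad = bce_and_grad(f_t, C_t, Gamma_beta, y_t)
eps = 1e-6
grad_num = np.zeros_like(Gamma_beta)
for idx in np.ndindex(*Gamma_beta.shape):
    step = np.zeros_like(Gamma_beta)
    step[idx] = eps
    up = bce_and_grad(f_t, C_t, Gamma_beta + step, y_t)[0]
    down = bce_and_grad(f_t, C_t, Gamma_beta - step, y_t)[0]
    grad_num[idx] = (up - down) / (2 * eps)
max_abs_err = np.max(np.abs(grad - grad_num))
```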

1.3 Appendix 1.3: Assessing statistical significance of instruments

This appendix describes how we assess the statistical significance of the microeconomic conditions, i.e., the factor loading instruments (“Appendix A.3.1”), and of the macroeconomic conditions, i.e., the factor instruments (“Appendix 5.3.2”). Note that while the test of the macroeconomic variables is specific to the RDCFM and its variants, the test of the microeconomic variables can be used in any model with conditional factor loadings, for example, IPCA or instrumented factor models.

1.3.1 Appendix 1.3.1: Testing factor loading instruments

This section describes how to test the overall statistical significance of a factor loading instrument. In the following, the same variable labelling is used as in the main text; that is, we observe a $1 \times N$ vector of response variables $\varvec{y}_{t}$ at time t, a $1 \times K$-dimensional vector of factors $\varvec{f}_{t}$, and an $N \times M$ matrix of factor loading instruments $\varvec{C}_{t}$, where N denotes the number of response variables, K the number of factors, and M the number of microeconomic variables.

Our test procedure starts by obtaining an estimate of the $\hat{\varvec{\Gamma }}_{\beta }$ matrix as described in Eq. (43); that is, we create interaction terms of the microeconomic variables and the (estimated) factors, and regress the response variables on these interactions as follows:

$$\begin{aligned} \text {vec}(\hat{\varvec{\Gamma }}_{\beta }^{'}) = \left( \sum _{t=1}^{\tau } \varvec{C}_{t}^{'} \varvec{C}_{t} \otimes \varvec{f}_{t}^{'}\varvec{f}_{t}\right) ^{-1}\left( \sum _{t=1}^{\tau } \varvec{y}_{t} \left[ \varvec{C}_{t} \otimes \varvec{f}_{t}\right] \right) ^{'} \end{aligned}$$

(43)

Conceptually, we assess the statistical significance of an instrument by testing whether all coefficients pertaining to that instrument are jointly zero (i.e., whether all coefficients in the corresponding row of $\hat{\varvec{\Gamma }}_{\beta }$ are zero), which can be done with an F-test for joint parameter restrictions. If the null hypothesis that all coefficients pertaining to the m-th instrument are zero cannot be rejected, the instrument is considered insignificant. The F-test is repeated for each instrument, i.e., M times ($m = 1,\dots , M$).

Formally, testing the m-th instrument involves K joint hypotheses. Denoting by $\varvec{R}_{m}$ the $K \times \left( M \cdot K \right)$-dimensional restriction matrix of instrument m, we test the null hypothesis:

$$\begin{aligned} H_{0}: \varvec{R}_{m} \text {vec}(\hat{\varvec{\Gamma }}_{\beta }^{'}) = \varvec{r} \end{aligned}$$

(44)

against the alternative

$$\begin{aligned} H_{1}: \varvec{R}_{m} \text {vec}(\hat{\varvec{\Gamma }}_{\beta }^{'}) \ne \varvec{r} \end{aligned}$$

(45)

with $\varvec{r}$ being a $K \times 1$ vector of values to which the linear combination of coefficients $\varvec{R}_{m} \text {vec}(\hat{\varvec{\Gamma }}_{\beta }^{'})$ is constrained. If we test for the overall significance of an instrument, $\varvec{r}$ is a vector of zeros and the restriction matrix consists of zeros and an identity matrix for the respective factor loading instrument. Specifically, the restriction matrix $\varvec{R}_{m}$ is defined as follows:

$$\begin{aligned} \begin{aligned} \varvec{R}_{1}&= \left[ \varvec{I}_{K \times K}, \varvec{0}_{K \times \left( K \cdot (M-1)\right) } \right] \\ \varvec{R}_{2}&= \left[ \varvec{0}_{K \times K}, \varvec{I}_{K \times K}, \varvec{0}_{K \times \left( K \cdot (M-2)\right) } \right] \\&\;\;\vdots \\ \varvec{R}_{M}&= \left[ \varvec{0}_{K \times \left( K \cdot (M-1)\right) }, \varvec{I}_{K \times K} \right] \end{aligned} \end{aligned}$$

Under the assumption that the regressors in Eq. (43), i.e., the factors, are exogenous or stochastic variables with independent and identically distributed (i.i.d.) errors (Greene 2003), an unbiased estimate of the variance–covariance matrix of $\text {vec}(\hat{\varvec{\Gamma }}_{\beta }^{'})$ is obtained by:

$$\begin{aligned} \hat{\varvec{\Sigma }} = \hat{\sigma }^2 \left( \sum _{t=1}^{\tau } \varvec{C}_{t}^{'} \varvec{C}_{t} \otimes \varvec{f}_{t}^{'}\varvec{f}_{t}\right) ^{-1} \end{aligned}$$

(46)

with $\hat{\varvec{\Sigma }}$ representing the variance–covariance matrix of $\text {vec}(\hat{\varvec{\Gamma }}_{\beta }^{'})$ and $\hat{\sigma }^2$ being the residual variance of the regression in Eq. (43). The F-statistic is given by:

$$\begin{aligned} F_{m} = \frac{ \left( \varvec{R}_{m} \text {vec}\left( \hat{\varvec{\Gamma }}_{\beta }^{'}\right) - \varvec{r} \right) ^{'} \left( \varvec{R}_{m} \hat{\varvec{\Sigma }} \varvec{R}_{m}^{'} \right) ^{-1} \left( \varvec{R}_{m} \text {vec}\left( \hat{\varvec{\Gamma }}_{\beta }^{'}\right) - \varvec{r} \right) }{K} \sim F(K, TN-MK) \end{aligned}$$

(47)

The test statistic follows an F-distribution with K and $TN-MK$ degrees of freedom, where $TN - MK$ is the number of observations minus the number of estimated coefficients.
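A minimal NumPy sketch of Eqs. (43) to (47): the interaction regressors $\varvec{C}_t \otimes \varvec{f}_t$ are stacked over time, $\text {vec}(\hat{\varvec{\Gamma }}_{\beta }^{'})$ is estimated by pooled OLS (with column-stacking vec, this equals $\hat{\varvec{\Gamma }}_{\beta }$ flattened row by row), and the F-statistic for instrument m is computed with $\varvec{r} = \varvec{0}$. Assumptions of the sketch: instruments are indexed from zero, $\hat{\sigma }^2$ uses the usual OLS residual degrees of freedom $TN - MK$, and the simulated data, in which instrument 0 is irrelevant, are hypothetical. The function names are illustrative.

```python
import numpy as np

def estimate_vec_gamma(Ys, Cs, Fs):
    """Pooled OLS of Eq. (43). Ys: T x N, Cs: T x N x M, Fs: T x K.

    Returns vec(Gamma_beta') (Gamma_beta flattened row by row) and
    the matrix sum_t (C_t'C_t kron f_t'f_t).
    """
    M, K = Cs.shape[2], Fs.shape[1]
    XtX, Xty = np.zeros((M * K, M * K)), np.zeros(M * K)
    for y, C, f in zip(Ys, Cs, Fs):
        X = np.kron(C, f[None, :])          # N x MK interactions, row i = c_i kron f_t
        XtX += X.T @ X                       # equals C_t'C_t kron f_t'f_t
        Xty += X.T @ y
    return np.linalg.solve(XtX, Xty), XtX

def f_test(Ys, Cs, Fs, m):
    """F-statistic of Eq. (47) for instrument m (zero-based) with r = 0."""
    T, N = Ys.shape
    M, K = Cs.shape[2], Fs.shape[1]
    g, XtX = estimate_vec_gamma(Ys, Cs, Fs)
    ssr = sum(np.sum((y - np.kron(C, f[None, :]) @ g) ** 2) for y, C, f in zip(Ys, Cs, Fs))
    sigma2 = ssr / (T * N - M * K)           # residual variance
    Sigma = sigma2 * np.linalg.inv(XtX)      # Eq. (46)
    R = np.zeros((K, M * K))
    R[:, m * K:(m + 1) * K] = np.eye(K)      # restriction matrix R_m
    rg = R @ g
    return rg @ np.linalg.solve(R @ Sigma @ R.T, rg) / K, g

# Hypothetical example: instrument 0 irrelevant, instruments 1 and 2 relevant
rng = np.random.default_rng(4)
T, N, M, K = 50, 40, 3, 2
Gamma = np.array([[0.0, 0.0], [1.0, -1.0], [0.5, 0.5]])
Cs, Fs = rng.normal(size=(T, N, M)), rng.normal(size=(T, K))
Ys = np.einsum("tnm,mk,tk->tn", Cs, Gamma, Fs) + rng.normal(size=(T, N))
F0, g = f_test(Ys, Cs, Fs, 0)
F1, _ = f_test(Ys, Cs, Fs, 1)
```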

When the i.i.d. assumption is violated, $\hat{\varvec{\Sigma }}$ is biased; we therefore propose estimating $\varvec{\Sigma }$ via a double-bootstrap approach. Specifically, let $\varvec{q}^{T}$ denote the T-dimensional vector containing all time indices (i.e., $\varvec{q}^{T} = [1, \dots , T]$) and $\varvec{q}^{N}$ the index vector of the response variables (i.e., $\varvec{q}^{N} = \left[ 1, \dots , N\right]$). For $b = 1, \dots , B$, we randomly sample (with replacement) over both indices to create B bootstrap samples; that is, we first sample over $\varvec{q}^{T}$ and then over $\varvec{q}^{N}$. The b-th bootstrap sample is denoted by $\widetilde{\varvec{y}}_{t}^{b}$, $\widetilde{\varvec{C}}_{t}^{b}$, and $\widetilde{\varvec{f}}_{t}^{b}$. For each bootstrap sample, we estimate Eq. (43) using $\widetilde{\varvec{y}}_{t}^{b}$, $\widetilde{\varvec{C}}_{t}^{b}$, and $\widetilde{\varvec{f}}_{t}^{b}$ to obtain the estimate $\text {vec}(\widetilde{\varvec{\Gamma }}_{\beta }^{b})$. Denoting by $\varvec{G}$ the $B \times \left( MK\right)$ matrix of the $\text {vec}(\widetilde{\varvec{\Gamma }}_{\beta }^{b})$ estimates and by $\widetilde{\varvec{G}}$ its column-demeaned counterpart, we obtain the bootstrap-based estimate of $\varvec{\Sigma }$ as follows:

$$\begin{aligned} \widetilde{\varvec{\Sigma }} = \frac{1}{B-1} \widetilde{\varvec{G}}^{'} \widetilde{\varvec{G}} \end{aligned}$$

(48)

Finally, we calculate the test statistic:

$$\begin{aligned} \widetilde{F}_{m} = \frac{ \left( \varvec{R}_{m} \text {vec}\left( \hat{\varvec{\Gamma }}_{\beta }^{'}\right) - \varvec{r} \right) ^{'} \left( \varvec{R}_{m} \widetilde{\varvec{\Sigma }} \varvec{R}_{m}^{'} \right) ^{-1} \left( \varvec{R}_{m} \text {vec}\left( \hat{\varvec{\Gamma }}_{\beta }^{'}\right) - \varvec{r} \right) }{K} \sim F(K, B) \end{aligned}$$

(49)

Algorithm 3 presents a pseudo-algorithm of the proposed test based on the double-bootstrap approach.
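The double-bootstrap variant of Eqs. (48) and (49) can be sketched as follows. This is a minimal illustration, not Algorithm 3: it assumes that one draw of response indices is shared by all resampled periods (so each response keeps its time series), uses 200 bootstrap samples by default, and generates hypothetical example data from a two-factor model in which instrument 0 (zero-based) is irrelevant. Function names are illustrative.

```python
import numpy as np

def vec_gamma(Ys, Cs, Fs):
    """Pooled OLS estimate of vec(Gamma_beta') as in Eq. (43)."""
    X = np.concatenate([np.kron(C, f[None, :]) for C, f in zip(Cs, Fs)])
    return np.linalg.lstsq(X, Ys.ravel(), rcond=None)[0]

def bootstrap_f_test(Ys, Cs, Fs, m, n_boot=200, seed=0):
    """F-statistic of Eq. (49) with the double-bootstrap covariance of Eq. (48)."""
    rng = np.random.default_rng(seed)
    T, N = Ys.shape
    M, K = Cs.shape[2], Fs.shape[1]
    g_hat = vec_gamma(Ys, Cs, Fs)
    G = np.empty((n_boot, M * K))
    for b in range(n_boot):
        ti = rng.integers(0, T, size=T)          # resample time indices with replacement
        ni = rng.integers(0, N, size=N)          # then resample response indices
        G[b] = vec_gamma(Ys[ti][:, ni], Cs[ti][:, ni], Fs[ti])
    G_dm = G - G.mean(axis=0)                    # demeaned matrix G-tilde
    Sigma = G_dm.T @ G_dm / (n_boot - 1)         # Eq. (48)
    R = np.zeros((K, M * K))
    R[:, m * K:(m + 1) * K] = np.eye(K)          # restriction matrix R_m (zero-based m)
    rg = R @ g_hat
    return rg @ np.linalg.solve(R @ Sigma @ R.T, rg) / K

rng = np.random.default_rng(4)
T, N, M, K = 50, 40, 3, 2
Gamma = np.array([[0.0, 0.0], [1.0, -1.0], [0.5, 0.5]])
Cs, Fs = rng.normal(size=(T, N, M)), rng.normal(size=(T, K))
Ys = np.einsum("tnm,mk,tk->tn", Cs, Gamma, Fs) + rng.normal(size=(T, N))
F0 = bootstrap_f_test(Ys, Cs, Fs, 0)
F1 = bootstrap_f_test(Ys, Cs, Fs, 1)
```

The statistic is then compared with the critical values of the F(K, B) distribution, as in Eq. (49).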

To demonstrate the performance of the proposed significance test, we next run a Monte Carlo simulation. Specifically, for $S = 10{,}000$ simulations, we generate data according to an RDCFM as described in Eq. (8):

$$\begin{aligned} \varvec{y}_{t} = \underbrace{\Big ( \varvec{z}_{t}\varvec{\Omega } + \tanh (\varvec{f}_{t-1}) \varvec{AR} \Big )}_{\varvec{f}_t} \underbrace{\Big (\varvec{C}_{t} \varvec{\Gamma }_{\beta }\Big )^{'}}_{\varvec{B}_{t}^{'}} + \varvec{e}_{t} \end{aligned}$$

with all symbols retaining their meaning as in the main text. We assume that the true data-generating process (DGP) is driven by $K = 4$ common factors with parameters $\varvec{\Omega }$, $\varvec{AR}$, $\varvec{f}_{0}$, and $\varvec{\Gamma }_{\beta }$.

The set of simulated factor instruments includes ten (uncorrelated) macroeconomic variables, where the first variable represents a constant and the remaining nine variables are drawn from a multivariate normal distribution with a mean of zero and standard deviation of 0.1; i.e.,

$$\begin{aligned} \varvec{z}_{t} \sim \mathcal {N}\left( 0, 0.1^2 \cdot \varvec{I}_{L \times L} \right) \end{aligned}$$

with $\varvec{I}_{L \times L}$ denoting an $L \times L$ identity matrix. Additionally, we simulate ten factor loading instruments $\varvec{c}_{i,t}$, which are assumed to follow a near-unit-root AR(1) process:

$$\begin{aligned} \varvec{c}_{i, t} = 0.99 \cdot \varvec{c}_{i, t-1} + \varvec{u}_{i, t} \end{aligned}$$

with $\varvec{u}_{i, t}$ denoting the $M \times 1$ vector of innovations in factor loading instruments, drawn from a normal distribution:

$$\begin{aligned} \varvec{u}_{i, t} \sim \mathcal {N}\left( 0, 0.1^2 \cdot \varvec{I}_{M \times M} \right) \end{aligned}$$

with $\varvec{I}_{M\times M}$ denoting an $M \times M$ identity matrix.

To avoid arbitrariness in the choice of the model parameters $\varvec{\Omega }$, $\varvec{AR}$, $\varvec{f}_{0}$, and $\varvec{\Gamma }_{\beta }$, we draw them all from a normal distribution with mean zero and a standard deviation of 0.1. To analyze whether the test correctly identifies the truly relevant microeconomic variables, we set the $\varvec{\Gamma }_{\beta }$ coefficients of the first five instruments to zero, while the remaining coefficients are nonzero. Once the returns are generated as described above, we add noise so that the $R^2$ is approximately 33%. This setting is quite arbitrary but representative of many empirical studies.

In total, assuming the DGP described above, we generate $S = 10{,}000$ data sets, each with $T = 500$ observations, $N = 1{,}000$ response variables, $L = 10$ macroeconomic variables, and $M = 10$ microeconomic variables, assuming $K = 4$ factors.
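A scaled-down NumPy sketch of the simulation design (T = 100 and N = 200 instead of 500 and 1,000, and a single replication). Details the text does not specify are assumptions: the starting values of $\varvec{c}_{i,t}$ are drawn from the innovation distribution, the noise is Gaussian, and the target $R^2 \approx 1/3$ is imposed by scaling the noise variance relative to the variance of the noise-free responses. The function `simulate_rdcfm` is hypothetical.

```python
import numpy as np

def simulate_rdcfm(T=100, N=200, L=10, M=10, K=4, r2=1 / 3, seed=0):
    """Simulate y_t = f_t B_t' + e_t with f_t = z_t Omega + tanh(f_{t-1}) AR, B_t = C_t Gamma."""
    rng = np.random.default_rng(seed)
    # Model parameters: all N(0, 0.1^2); loadings on the first five instruments set to zero
    Omega = rng.normal(0.0, 0.1, size=(L, K))
    AR = rng.normal(0.0, 0.1, size=(K, K))
    f0 = rng.normal(0.0, 0.1, size=K)
    Gamma = rng.normal(0.0, 0.1, size=(M, K))
    Gamma[:5] = 0.0
    # Factor instruments: a constant plus L - 1 independent N(0, 0.1^2) variables
    Z = rng.normal(0.0, 0.1, size=(T, L))
    Z[:, 0] = 1.0
    # Factor loading instruments: near-unit-root AR(1) with N(0, 0.1^2) innovations
    C = np.empty((T, N, M))
    c_prev = rng.normal(0.0, 0.1, size=(N, M))      # assumed starting values
    for t in range(T):
        c_prev = 0.99 * c_prev + rng.normal(0.0, 0.1, size=(N, M))
        C[t] = c_prev
    # Factors and noise-free responses
    F = np.empty((T, K))
    f_prev = f0
    for t in range(T):
        f_prev = Z[t] @ Omega + np.tanh(f_prev) @ AR
        F[t] = f_prev
    signal = np.einsum("tk,tnm,mk->tn", F, C, Gamma)  # f_t (C_t Gamma)'
    # Gaussian noise scaled so that signal variance / total variance is about r2
    noise_sd = np.sqrt(signal.var() * (1.0 - r2) / r2)
    Y = signal + rng.normal(0.0, noise_sd, size=(T, N))
    return Y, signal, Z, C, F, Gamma

Y, signal, Z, C, F, Gamma = simulate_rdcfm()
```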

Since the proposed test is not specific to the RDCFM or DCFM but can be applied to any model with conditional factor loadings, our performance evaluation is twofold. First, we take the true factors and add i.i.d. noise to them. Second, we estimate the RDCFM to obtain the estimates $\hat{\varvec{f}}_{t}$ and $\hat{\varvec{\Gamma }}_{\beta }$, which serve as inputs for the significance test described above.

Table 11 reports the results of the Monte Carlo simulation. Considering first the test with i.i.d. errors in the factors, we observe that the type-I error of the F-test is in line with the assumed significance level of 5%; that is, the false rejection rates range from 3.89% to 4.62%. At the same time, the test has a low type-II error: it correctly identifies the relevant microeconomic variables in more than 98% of all simulations. We can thus conclude that the F-test reliably identifies the true instruments if the factors have i.i.d. errors. In comparison, the type-I error of the bootstrap-based test is considerably below the assumed 5% level. Its power is also lower, although the test still identifies the true factor loading instruments in more than 95% of all simulations.

Table 11 Simulation results for factor loading instruments


Next, we turn to the scenario in which the factor estimates are obtained from the RDCFM. The high rejection rates for instruments one to five suggest that the errors are no longer i.i.d. The F-test thus does not meet the assumed significance level, which makes it unsuitable for identifying the true factor loading instruments in the RDCFM. In contrast, the bootstrap-based test again has a type-I error below the nominal level, yet its power does not suffer from this: the true drivers are identified in more than 98% of all simulations. In summary, the bootstrap-based test rarely rejects the null hypothesis erroneously and, at the same time, reliably identifies the truly relevant instruments. Consequently, there is a high probability that instruments identified as relevant (irrelevant) are actually relevant (irrelevant).

1.3.2 Appendix 1.3.2: Testing factor instruments

This section describes how to test the overall statistical significance of a factor instrument. Since the relationship between the response variables and the factor instruments extends over at least two levels (i.e., from response variables to factors and from factors to factor instruments), and because lagged information may also be relevant due to the nonlinear $\varvec{AR}$ process, the coefficients of the $\varvec{\Omega }$ matrix cannot be tested directly with an F-test. Our testing procedure is therefore based on the idea that if a factor instrument is relevant for modeling the response variables, it should contribute to reducing the model’s error. We start by estimating the RDCFM over the entire sample period to obtain the parameter estimates $\hat{\varvec{\Omega }}$, $\hat{\varvec{AR}}$, $\hat{\varvec{f}}_{0}$, and $\hat{\varvec{\Gamma }}_{\beta }$ (see the respective in-sample performance measures in Table 13). To analyze whether the l-th factor instrument is relevant, we set its observations to their time-series mean and restart the optimization. We use the parameter estimates $\hat{\varvec{\Omega }}$, $\hat{\varvec{AR}}$, $\hat{\varvec{f}}_{0}$, and $\hat{\varvec{\Gamma }}_{\beta }$ as the starting solution to avoid converging to a completely different local minimum. If setting all observations of $\varvec{z}_{l}$ to the mean does not significantly increase the model’s error (or even reduces it), the instrument can be considered irrelevant.

Let $\hat{y}_{i, t}$ denote the model’s estimate of the response variable and $\hat{y}_{i, t}^{-l}$ the estimate obtained when the l-th factor instrument is set to its historical mean. The respective errors are denoted by $\hat{e}_{i, t}$ and $\hat{e}_{i, t}^{-l}$. Under the null hypothesis that the l-th instrument is irrelevant, we expect the two sets of errors to be indistinguishable. That is, we test:

$$\begin{aligned} H_{0}: \Bigl ( \hat{e}_{i, t}^{-l}\Bigr )^2 - \Bigl ( \hat{e}_{i, t}\Bigr )^2 = 0 \end{aligned}$$

(50)

against the alternative that the l-th instrument reduces the model error

$$\begin{aligned} H_{1}: \Bigl ( \hat{e}_{i, t}^{-l}\Bigr )^2 - \Bigl ( \hat{e}_{i, t}\Bigr )^2 > 0 \end{aligned}$$

(51)

Instead of testing the set of hypotheses based on individual errors, we follow Gu et al. (2020) and obtain a time-series of average error differences to account for cross-sectionally correlated errors, i.e.,

$$\begin{aligned} \Delta e_{t} = \frac{1}{n_{t}}\sum _{i = 1}^{n_{t}} \Biggl [ \Bigl ( \hat{e}_{i, t}^{-l}\Bigr )^2 - \Bigl ( \hat{e}_{i, t}\Bigr )^2 \Biggr ] \end{aligned}$$

(52)

with $n_{t}$ being the number of non-missing response variables at time t. In the subsequent Monte Carlo simulation, we compare three ways of computing the test statistic. First, following Diebold and Mariano (1995) and Gu et al. (2020), the test statistic is

$$\begin{aligned} t = \frac{\bar{\Delta e}}{se_{\Delta e} } \sim t \left( T-1\right) \end{aligned}$$

(53)

with $\bar{\Delta e}$ denoting the mean of $\Delta e_{t}$ and $se_{\Delta e}$ being the standard error of this mean, i.e., $\sqrt{ \sigma _{\Delta e}^2/T}$, where $\sigma _{\Delta e}^2$ is the variance of $\Delta e_{t}$. Second, we correct the error differences as proposed in Clark and West (2007) for nested models; see “Appendix 5.4” for details on the implementation. Third, we obtain an estimate of $se_{\Delta e}$ using a time-series bootstrap. That is, for $b = 1, \dots , B$ bootstraps, we randomly sample (with replacement) from $\Delta e_{t}$, compute the mean error difference $\bar{\Delta e}^{b}$, and calculate $se_{\Delta e}$ as the standard deviation of all $\bar{\Delta e}^{b}$.
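The diagnostics in Eqs. (52) and (53) and the time-series bootstrap can be sketched in NumPy as follows. The function name, the use of the sample variance with $T-1$ in the denominator, the 1,000 bootstrap draws, and the example data (in which fixing $\varvec{z}_{l}$ inflates all errors by 20%) are assumptions for illustration; the Clark and West (2007) variant is omitted.

```python
import numpy as np

def instrument_test(E_full, E_restricted, n_boot=1000, seed=0):
    """t-statistics for H1: the l-th factor instrument reduces the squared error.

    E_full, E_restricted: T x N residuals of the full model and of the model with
    z_l fixed at its mean; NaN marks missing responses.
    """
    de = np.nanmean(E_restricted ** 2 - E_full ** 2, axis=1)   # Eq. (52)
    T = de.size
    t_dm = de.mean() / np.sqrt(de.var(ddof=1) / T)             # Eq. (53)
    rng = np.random.default_rng(seed)
    boot_means = de[rng.integers(0, T, size=(n_boot, T))].mean(axis=1)
    t_boot = de.mean() / boot_means.std(ddof=1)                # bootstrap standard error
    return t_dm, t_boot

rng = np.random.default_rng(5)
E_full = rng.normal(size=(200, 50))
E_full[0, :3] = np.nan                      # a few missing responses
E_restricted = 1.2 * E_full                 # hypothetical: fixing z_l inflates errors
t_dm, t_boot = instrument_test(E_full, E_restricted)
```

For a one-sided test at the 5% level with large T, the null hypothesis of an irrelevant instrument is rejected if the t-statistic exceeds roughly 1.645.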

The performance of the factor instrument test is evaluated in the following Monte Carlo simulation. Specifically, for $S = 10{,}000$ simulations, we generate $T=500$ observations, $N = 1{,}000$ response variables, $L = 10$ macroeconomic variables, and $M = 10$ microeconomic variables. The true DGP is driven by a 4-factor RDCFM with parameters $\varvec{\Omega }$, $\varvec{AR}$, $\varvec{f}_{0}$, and $\varvec{\Gamma }_{\beta }$.

As described for the test of factor loading instruments, we assume that the factor instruments are uncorrelated and drawn from a normal distribution and that the factor loading instruments $\varvec{c}_{i,t}$ follow a near unit root AR(1) process. The parameters $\varvec{\Omega }$, $\varvec{AR}$, $\varvec{f}_{0}$, and $\varvec{\Gamma }_{\beta }$ also come from a normal distribution. To analyze whether the test correctly identifies the truly relevant macroeconomic variables, we assume that the first factor instrument is an uninformative constant, the $\varvec{\Omega }$ coefficients of instruments two to five are nonzero, while the remaining coefficients are zero. Once we obtain returns as described above, we add noise to the returns so that the $R^2$ corresponds to $\approx 33\%$.

Table 12 reports the rejection rates of the test. First, we observe that the type-I and type-II error rates are on par across the bootstrap-based, Diebold and Mariano (1995), and Clark and West (2007) tests. The first row shows the test for the constant factor instrument and serves only as a plausibility check: since the instruments are set to their mean values, no rejection is to be expected for the constant. The rows for instruments six to ten show that the type-I error is considerably below the assumed significance level of 5%. However, as for the factor loading instruments, these relatively low rejection rates do not reduce the power of the test. The true factor instruments two to five are identified in more than 98% of all simulations, suggesting that the test correctly distinguishes the truly relevant from the irrelevant instruments.

1.4 Appendix 1.4: Cross-sectional Clark and West (2007) test

In our empirical analysis, we use the Clark and West (2007) test to assess the relevance of a model feature. As discussed in Gu et al. (2020), in stock-level analyses, errors may exhibit strong cross-sectional dependence, which may violate the conditions needed for the asymptotic normality of the test statistic. Therefore, we modify the Clark and West (2007) test and compare the cross-sectional averages of the estimation errors instead of the individual estimation errors.

Denote $\bar{e}_{(0),t+1}^{2}$ as the cross-sectional average of the squared prediction error of the null (i.e., parsimonious) model and $\bar{e}_{(1),t+1}^{2}$ as the cross-sectional average of the squared prediction error of the alternative (i.e., larger) model, defined as:

$$\begin{aligned} \bar{e}_{(q),t+1}^{2} = \frac{1}{N_{t+1}} \sum _{i=1}^{N_{t+1}} \left( y_{i,t+1} - \hat{y}_{(q),i,t+1}\right) ^{2} \end{aligned}$$

(54)

where $\bar{e}_{(q),t+1}^{2}$ is the cross-sectional average squared error of model q at $t+1$ and $\hat{y}_{(q),i,t+1}$ is the q-th model’s estimate of the i-th response variable at $t+1$. We are interested in testing whether the alternative model has a significantly lower error. As discussed in Clark and West (2007), in nested models, the irrelevant additional parameters of the alternative model introduce noise into its estimates; therefore, under the null hypothesis of equal MSEs, the MSE of the alternative model is expected to be larger. Clark and West (2007) adjust the error of the alternative model $\bar{e}_{(1),t+1}^{2}$ to correct for this upward bias in the MSE. Accordingly, we test the null hypothesis:

$$\begin{aligned} H_{0}: \bar{e}_{(0),t+1}^{2} = \left( \bar{e}_{(1),t+1}^{2} - \text {adj}_{t+1} \right) \end{aligned}$$

(55)

against the alternative:

$$\begin{aligned} H_{1}: \bar{e}_{(0),t+1}^{2} > \left( \bar{e}_{(1),t+1}^{2} - \text {adj}_{t+1} \right) \end{aligned}$$

(56)

with $\text {adj}_{t+1}$ denoting an adjustment term defined as the cross-sectional average of the squared differences in the model estimates:

$$\begin{aligned} \text {adj}_{t+1} = \frac{1}{N_{t+1}} \sum _{i=1}^{N_{t+1}} \left( \hat{y}_{(0),i,t+1} - \hat{y}_{(1),i,t+1}\right) ^{2} \end{aligned}$$

(57)

Table 12 Simulation results for factor instruments


Note that the Clark and West (2007) test reduces to the Diebold and Mariano (1995) test if the adjustment term is set to zero. Clark and West (2007) show that testing for equal MSE can be accomplished by regressing the adjusted error difference $\bar{e}_{(0),t+1}^{2} - \left( \bar{e}_{(1),t+1}^{2} - \text {adj}_{t+1} \right)$ on a constant. The null hypothesis is rejected if the t-statistic of the constant exceeds +1.282, +1.645, or +2.326 for a one-sided test at the 10%, 5%, or 1% level, respectively. To account for possible autocorrelation of the errors, we use Newey and West (1987) standard errors with six lags.
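A minimal NumPy sketch of the cross-sectional Clark and West (2007) statistic: it forms the adjusted error difference from Eqs. (54) and (57), regresses it on a constant (which amounts to taking its mean), and computes a Newey and West (1987) standard error with Bartlett weights and six lags. It assumes that missing values, marked by NaN, occur at the same positions in all three input arrays; the function name and example data are hypothetical.

```python
import numpy as np

def clark_west_cs(Y, Yhat0, Yhat1, lags=6):
    """Cross-sectional Clark-West t-statistic (Eqs. 54-57) with Newey-West errors.

    Y, Yhat0, Yhat1: T x N arrays of responses and of the null (parsimonious)
    and alternative (larger) model estimates; NaN marks missing responses.
    """
    e0 = np.nanmean((Y - Yhat0) ** 2, axis=1)           # Eq. (54), null model
    e1 = np.nanmean((Y - Yhat1) ** 2, axis=1)           # Eq. (54), alternative model
    adj = np.nanmean((Yhat0 - Yhat1) ** 2, axis=1)      # Eq. (57)
    d = e0 - (e1 - adj)                                  # adjusted error difference
    T = d.size
    u = d - d.mean()                                     # residuals of regressing d on a constant
    lrv = u @ u / T                                      # long-run variance, lag 0 term
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1.0)                       # Bartlett weight
        lrv += 2.0 * w * (u[j:] @ u[:-j]) / T
    return d.mean() / np.sqrt(lrv / T)                   # t-statistic of the constant

rng = np.random.default_rng(6)
T, N = 120, 30
signal = rng.normal(size=(T, N))
Y = signal + rng.normal(size=(T, N))
Yhat0 = np.zeros((T, N))           # hypothetical null model: predicts zero
Yhat1 = signal                     # hypothetical alternative: knows the signal
t_cw = clark_west_cs(Y, Yhat0, Yhat1)
t_no_hac = clark_west_cs(Y, Yhat0, Yhat1, lags=0)
```

With `lags=0`, the statistic reduces to the ordinary t-statistic of the mean (with the variance computed using T in the denominator). The one-sided test rejects at the 5% level if the statistic exceeds 1.645.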

Appendix 2: Variable description

Appendix 2.1: Factor instruments

First, we include the five factors of the Fama-French 5-factor model (Fama and French 2015), i.e., the market (Mkt) factor, the size-based small-minus-big (SMB) factor, the book-to-market-based high-minus-low (HML) factor, the profitability-based robust-minus-weak (RMW) factor, and the investment-based conservative-minus-aggressive (CMA) factor. Second, we add a momentum-based winner-minus-loser (WML) factor (Carhart 1997; Fama and French 2018). Third, we include the two mispricing factors of ?, i.e., management (MGMT) and performance (PERF). Finally, we consider alternative specifications of the profitability and value factors, that is, a profitability factor based on return on equity (ROE) (Hou et al. 2015) and a monthly book-to-market-based high-minus-low (HMLm) factor (Barillas and Shanken 2018).

Appendix 2.2: Factor loading instruments

As in Kelly et al. (2019a), we condition the factor loadings on a constant characteristic and 36 market- and accounting-based firm characteristics $\varvec{C}_{t}$, i.e., market beta (beta), bid-ask spread (bidask), assets-to-market (a2me), total assets (assets), sales-to-assets (ato), book-to-market (bm), cash-to-short-term-investment (c), capital turnover (cto), capital intensity (d2a), ratio of change in property, plant and equipment to the change in total assets (dpi2a), earnings-to-price (e2p), fixed costs-to-sales (fc2y), cash flow-to-book (freecf), idiosyncratic volatility with respect to the Fama-French 3-factor model (idiovol), investment (investment), leverage (lev), market capitalization (mktcap), turnover (turn), net operating assets (noa), operating accruals (oa), operating leverage (ol), price-to-cost margin (pcm), profit margin (pm), gross profitability (prof), Tobin’s Q (q), price relative to its 52-week high (w52h), return on net operating assets (rna), return on assets (roa), return on equity (roe), 12-month momentum (mom), intermediate momentum (intmom), short-term reversal (strev), long-term reversal (ltrev), sales-to-price (s2p), the ratio of sales and general administrative costs to sales (sga2s), and unexplained volume (suv).
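The instrument matrix described above can be assembled by prepending a column of ones (the constant characteristic) to the 36 characteristics, after which the loadings follow from the relation $\varvec{\beta }_{i,t} = \varvec{C}_{t} \varvec{\Gamma }_{\beta }$ used in Appendix 3.2. The sketch below assumes that the characteristics arrive as an $N_t \times 36$ NumPy array that has already been preprocessed; any transformation the authors apply before estimation is not shown, and the function names are hypothetical.

```python
import numpy as np


def loading_instruments(chars):
    """Prepend a constant characteristic to an (N_t x 36) characteristic matrix,
    giving the (N_t x 37) instrument matrix C_t."""
    chars = np.asarray(chars, dtype=float)
    return np.hstack([np.ones((chars.shape[0], 1)), chars])


def conditional_loadings(C, gamma_beta):
    """Row i holds the loadings beta_{i,t} = c_{i,t} Gamma_beta.

    C          : (N_t x M) instrument matrix
    gamma_beta : (M x K) loading parameters, shared by all objects
    """
    return np.asarray(C, dtype=float) @ np.asarray(gamma_beta, dtype=float)
```

Because $\varvec{\Gamma }_{\beta }$ is shared across objects, the number of loading parameters does not grow with the cross-section.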

Appendix 3: Additional results

Appendix 3.1: In-sample results

This section reports the in-sample results for the RDCFM and its variants. In-sample results are based on parameters estimated from the entire data set; however, for comparison purposes, we evaluate the performance using only data from the out-of-sample period (i.e., $t > 120$). We assess the in-sample performance using the mean squared error (MSE) because it is directly targeted by the objective function described in Eq. (11). We define the MSE$^{IS}$ as:

$$\begin{aligned} MSE^{IS} = \frac{1}{T-1} \sum _{t=1}^{T-1} \frac{1}{N_{t+1}} \sum _{i=1}^{N_{t+1}} \left( y_{i,t+1} - \hat{\varvec{f}}_{t+1} \varvec{\beta }_{i,t} \right) ^2 \end{aligned}$$

(58)
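Eq. (58) can be evaluated with a short loop over periods, sketched here in Python with NumPy. The sketch assumes that each period's lagged instrument matrix is row-aligned with the responses in $t+1$ (the same $N_{t+1}$ objects in the same order) and that the loadings follow $\varvec{\beta }_{i,t} = \varvec{C}_{t} \varvec{\Gamma }_{\beta }$ as stated in Appendix 3.2; restricting the evaluation to $t > 120$ amounts to passing only those periods. The function name `mse_is` is hypothetical.

```python
import numpy as np


def mse_is(y_next, C_lag, gamma_beta, f_next):
    """Time average of cross-sectional mean squared errors, as in Eq. (58).

    y_next     : list of arrays; y_next[t] holds y_{i,t+1} for the N_{t+1} objects
    C_lag      : list of arrays; C_lag[t] is the (N_{t+1} x M) lagged instrument matrix
    gamma_beta : (M x K) loading parameters
    f_next     : (periods x K) array; row t holds the factor estimate f_hat_{t+1}
    """
    gamma_beta = np.asarray(gamma_beta, dtype=float)
    f_next = np.asarray(f_next, dtype=float)
    period_mse = []
    for y, C, f in zip(y_next, C_lag, f_next):
        betas = np.asarray(C, dtype=float) @ gamma_beta  # (N_{t+1} x K) loadings
        y_hat = betas @ f                                 # f_hat_{t+1} beta_{i,t}
        period_mse.append(np.mean((np.asarray(y, dtype=float) - y_hat) ** 2))
    return float(np.mean(period_mse))
```

Averaging within each cross-section first, and only then over time, gives every period equal weight regardless of how many response variables it contains.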

with $N_{t+1}$ denoting the number of response variables in $t+1$ and $\hat{\varvec{f}}_{t+1}$ representing the estimate of the conditional factors in $t+1$:

$$\begin{aligned} \hat{\varvec{f}}_{t+1} = \varvec{z}_{t+1} \varvec{\Omega } + \tanh \left( \hat{\varvec{f}}_{t} \right) \varvec{AR} \end{aligned}$$

(59)
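The recursion in Eq. (59) can be sketched as a loop that carries the previous factor estimate forward through the tanh nonlinearity. The starting state before the first period is not specified in this section, so the sketch defaults to zeros as a labeled assumption; shapes follow the row-vector convention of Eq. (59), and the function name `factor_recursion` is hypothetical.

```python
import numpy as np


def factor_recursion(z, omega, ar, f_init=None):
    """f_hat_{t+1} = z_{t+1} Omega + tanh(f_hat_t) AR, applied period by period.

    z      : (T x L) array; row t holds the factor instruments of that period
    omega  : (L x K) matrix mapping instruments to factors
    ar     : (K x K) recurrent matrix
    f_init : state before the first row (zeros if None; an assumption)
    """
    z = np.asarray(z, dtype=float)
    omega = np.asarray(omega, dtype=float)
    ar = np.asarray(ar, dtype=float)
    k = ar.shape[0]
    f_prev = np.zeros(k) if f_init is None else np.asarray(f_init, dtype=float)
    out = np.empty((z.shape[0], k))
    for t, z_t in enumerate(z):
        f_prev = z_t @ omega + np.tanh(f_prev) @ ar  # Eq. (59)
        out[t] = f_prev
    return out
```

The tanh bounds the recurrent input to $(-1, 1)$ per factor, so the contribution of the previous state through $\varvec{AR}$ stays bounded even if the factor estimates themselves become large.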

The MSE$^{IS}$ quantifies the in-sample explanatory power of the factor structure for contemporaneous returns. Analogous to the out-of-sample results reported in Table 2, we define an in-sample MSPE$^{IS}$ by estimating the contemporaneous factor realizations using only the mean of the factor instruments.

Table 13 In-sample comparison of conditional factor models


For the RDCFM reported in Panel A, the MSE$^{IS}$ decreases from 224.3 for the single-factor model to 217.6 for the six-factor model. Compared to the static DCFM in Panel B, the MSE$^{IS}$ of the dynamic RDCFM is always below that of the corresponding DCFM, although the differences are relatively small. Panels C and D present the results for the RCFM and CFM, respectively, which estimate unconditional factor loadings via OLS. Models with unconditional loadings have lower MSE$^{IS}$ values than models with conditional loadings. For instance, the MSE$^{IS}$ of the six-factor RCFM is only 203.6 while that of the six-factor RDCFM is 217.6.

Appendix 3.2: Further out-of-sample results

This section reports the out-of-sample MSE$^{OOS}$ values, defined analogously to the MSE$^{IS}$ in Eq. (58). The asset-specific factor loadings $\varvec{\beta }_{i,t}$ are based on lagged firm characteristics, i.e., $\varvec{\beta }_{i,t} = \varvec{C}_{t} \varvec{\Gamma }_{\beta }$, whereas the factor estimates $\hat{\varvec{f}}_{t+1}$ are based on the in-sample estimates (i.e., $\varvec{\Omega }$, $\varvec{AR}$, and $\varvec{f}_{t}$) and on the out-of-sample factor instruments $\varvec{z}_{t+1}$. The MSE$^{OOS}$ therefore evaluates the total explanatory power of the factor structure for contemporaneous returns under the assumption that the factor instruments, and thus the factor realizations, in $t+1$ are known. In this sense, the MSE$^{OOS}$ indicates how well the identified factor structure generalizes to unseen data. From a technical perspective, the MSE$^{OOS}$ can also indicate whether the estimation method for the factor loadings is appropriate for the specific application: because the true factor instruments are assumed to be known, the uncertainty in the factor estimates is eliminated, so a high or increasing MSE$^{OOS}$ suggests overfitting in the estimates of the factor loadings.

Table 14 presents the results. The MSE$^{OOS}$ of the one-factor RDCFM is 217.3 and declines with additional factors to 211.7 when five factors are included. The five-factor RDCFM achieves the lowest MSE$^{OOS}$ among all conditional factor models, although the differences between the RDCFM and DCFM become relatively small when six factors are estimated. Whereas the explanatory power of the models with conditional factor loadings improves with additional factors, the MSE$^{OOS}$ values for the RCFM and CFM increase when two or more factors are estimated. Since the factors used to calculate the MSE$^{OOS}$ are based on out-of-sample information while the factor loadings use only in-sample information, this points to overfitting in the estimation of the factor loadings. As discussed in Sect. 3.3, the parameter space for models with unconditional loadings is massively inflated with additional factors: while the RDCFM and DCFM estimate MK parameters for the factor loadings, the RCFM and CFM require the estimation of NK parameters, where M denotes the number of factor loading instruments, N the number of objects, and K the number of factors.
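To make the difference in scale concrete, the two counts can be tabulated for $K = 1, \ldots, 6$ factors. With a constant plus the 36 characteristics of Appendix 2.2, $M = 37$; the cross-section size $N = 1000$ used below is purely hypothetical and chosen for illustration only, not taken from the data. The function name is likewise hypothetical.

```python
def loading_parameter_counts(num_factors: int, num_instruments: int, num_assets: int) -> dict:
    """Number of factor-loading parameters for a model with K factors.

    conditional   (RDCFM, DCFM): Gamma_beta is M x K, i.e. M*K parameters
    unconditional (RCFM, CFM):   one loading vector per object, i.e. N*K parameters
    """
    return {
        "conditional": num_instruments * num_factors,
        "unconditional": num_assets * num_factors,
    }


# M = 37 (constant + 36 characteristics); N = 1000 is a hypothetical cross-section size.
for k in range(1, 7):
    counts = loading_parameter_counts(k, num_instruments=37, num_assets=1000)
    print(f"K={k}: conditional={counts['conditional']}, unconditional={counts['unconditional']}")
```

Both counts grow linearly in K, but the unconditional count scales with the cross-section, so any realistic N makes it far larger than MK.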

Table 14 Out-of-sample comparison of conditional factor models


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Fieberg, C., Liedtke, G. & Poddig, T. Recurrent double-conditional factor model. OR Spectrum (2024). https://doi.org/10.1007/s00291-024-00771-1


Received: 16 December 2022
Accepted: 23 May 2024
Published: 02 July 2024
DOI: https://doi.org/10.1007/s00291-024-00771-1
