1 Introduction

Future cash flows are critical for the survival of corporations, and reliable, accurate cash flow forecasts matter to both academics and practitioners. For example, the value of a firm can be estimated as the sum of the discounted future cash flows it generates over its lifetime, so future cash flows are a primary input to this valuation method. In addition, when firms have larger accruals, more heterogeneous accounting choices than their industry peers, higher earnings volatility, higher capital intensity, or poorer financial health, financial analysts tend to provide cash flow forecasts to help their clients make better investment decisions (DeFond and Hung, 2003). Because cash flows are more difficult than accruals to manipulate through earnings management, they can also be used to monitor earnings transparency (McInnis and Collins, 2011). However, modelling the dynamics of future cash flows is challenging, partly because of limited data and theory.

Before the 1980s, cash flows were estimated indirectly by deducting accruals and non-cash items from earnings, which inevitably introduced measurement errors (Drtina and Largay, 1985). Greenberg et al. (1986) use lagged earnings and lagged cash flows as two independent variables to predict cash flows. They find that lagged earnings perform better than lagged cash flows for both one-year-ahead and multi-year-ahead (two- and three-year) cash flow predictions. Wilson (1987) designs a multiple regression model with several lagged predictors, e.g., earnings, cash flows, capital expenditures, and accruals. After the 1980s, cash flow data became publicly available (Footnote 1), and the number of papers on cash flows grew accordingly. Finger (1994) examines the incremental predictive power of earnings for future cash flows. Unlike previous research, she uses unit-root tests to examine the stationarity of the two time series and finds that, for 75% of the firms, the earnings and cash flow series are nonstationary, i.e., consistent with a random walk process. It may therefore be better to use differences rather than levels of the earnings and cash flow series in predictive studies.

Assuming a random walk sales process, Dechow et al. (1998) (hereafter the DKW model) develop a model of earnings, cash flows, and accruals. They also investigate the correlations among changes in earnings, cash flows, and accruals, as well as their autocorrelations. The key message of their study is that, unlike earnings, cash flows cannot be adequately modelled by a univariate time-series model, because such a model omits other relevant predictors, e.g., accruals. They also find that current earnings outperform current cash flows in predicting future cash flows. Besides the random walk assumption, the DKW model assumes that earnings and working capital accrual items are constant proportions of sales. Under these assumptions, earnings can be shown to be the optimal predictor of future cash flows. However, because of managerial behaviour and other factors, earnings and working capital accruals may not always be linearly related to sales. Within a more empirical framework, Lorek and Willinger (2009) compare two single-variable cash flow prediction models that use, respectively, last-period earnings and last-period cash flow as the predictor. Both models are estimated cross-sectionally and by time series; the cross-sectional estimation imposes the restrictive assumption that the coefficient on the predictor is constant across firms. The models are compared both in-sample and out-of-sample, and using past cash flow as the predictor with firm-specific estimation performs better out of sample. Ball and Nikolaev (2020) also study the predictive power of earnings for future cash flows using various methods. First, they construct the predictor from different definitions of earnings. Second, in model estimation, they include fixed effects in a pooled regression, which is expected to perform better than a simple cross-sectional regression.
Their empirical results support the view that earnings are better predictors of future cash flows, provided that earnings are measured as operating cash flows plus working capital accruals. Gordon et al. (2017) also suggest that the results of cash flow prediction models are sensitive to accounting classification choices.

Building on the DKW model, Barth et al. (2001) (hereafter the BCN model) decompose earnings into cash flows and accruals. However, unlike DKW, who apply the model to individual firms, BCN assume that the profit-generating processes of the sample firms are homogeneous and estimate a pooled regression. BCN show that their revised model fits the empirical data better. Lorek and Willinger (2010) compare the BCN model with the DKW model and find that DKW's firm-by-firm estimation strategy generates more accurate predictions. Cheng and Hollie (1996, 2008) disaggregate cash flows into core and non-core components. The core components are generated from sales, cost of goods sold, and operating and administrative expenses; the non-core components are interest, taxes, and others. Cheng and Hollie find that the core components are more persistent than the non-core components and that, compared with the BCN model, the model with disaggregated cash flow components improves cash flow prediction. Orpurt and Zang (2009) confirm that disaggregating cash flows may provide more useful information, which is further supported by Farshadfar and Monem (2013a, 2013b). Farshadfar and Monem (2011) focus on accruals and separate them into discretionary and non-discretionary components, which are expected to contribute differently to cash flow prediction. Because discretionary accruals are more persistent, discretionary accounting could be employed to enhance the predictive power of earnings for future cash flows; however, their conclusions need further verification. Bostwick et al. (2016) show that adding goodwill impairments to the BCN predictor set further improves the accuracy of cash flow prediction. Nallareddy et al. (2020) also find that disaggregating accruals into sub-components performs better than using aggregated accruals.
More importantly, their study shows that the predictive ability of cash flows and accruals can differ over time.

The literature reviewed above indicates that the cash flow process is complicated. Hence, additional accounting information might be used as exogenous variables to better predict future cash flows. Moreover, prior studies do not consider heterogeneity in firms' business models and operating activities. Instead of firm-by-firm or pooled regression estimation, a panel data model that combines time-series and cross-sectional analysis could help resolve this issue. Existing research commonly uses linear models. Although linear models are easy to interpret, parsimonious, and have predictive power, they cannot capture the nonlinearity of cash flow dynamics. Nonlinear models have more complicated structures and sometimes add computational burden. However, if nonlinear models deliver better forecasting performance in empirical applications, the trade-off between simplicity and accuracy becomes an easy one for researchers to make. In addition, the cash flow generating process may vary with a firm's life cycle. The close link between cash flow patterns and firms' life cycle stages is well documented in financial accounting textbooks and previous literature. For example, Dickinson (2011) argues that operating, investing, and financing cash flows behave differently across life cycle stages. Firms in the introduction stage may generate negative operating cash flows owing to a lack of customers and experience. In the growth and maturity stages, their operating cash flows become positive and grow at different rates. In the decline stage, the growth rates of operating cash flows decrease, and firms may again suffer negative cash flows. Although investing and financing cash flows are not the focus of our study, their patterns may also vary across life cycle stages, which can be explained by economic theories such as managerial optimism and the pecking order theory.
In addition, Hovakimian (2009) suggests that, according to the corporate life cycle hypothesis, cash flow is associated with a firm's life cycle. Firms are typically characterised by low cash flows at their early stage and are often able to generate greater cash flows as they mature; over the life cycle, firms may thus experience growing cash flows and declining growth opportunities. Therefore, existing static models could be extended to dynamic models to improve cash flow prediction. Also, there is no consensus on the criteria for selecting the optimal prediction model. Both in-sample fit and out-of-sample prediction should be considered when evaluating a model's forecasting performance.

This paper attempts to address the research gaps identified above. Its main contributions are as follows. First, building on the DKW and BCN models, we introduce panel data models that allow for heterogeneity in firms' business activities. Second, motivated by the grey-box model developed by Tan and Li (2002), we propose extensions that incorporate dynamic and nonlinear components into the panel data models (Footnote 2). The parameters of the static panel data models are treated as time-varying (TV) state variables, with no linearity restrictions on the TV parameters. The nonlinearity of the parameters is captured by a Padé approximant function, which is driven by two exogenous input variables that are expected to have explanatory power for the parameter process. Third, we apply our models to the U.S. market and show that their predictive performance is better than that of the existing models. Finally, we discuss long-term cash flow forecasting and the criteria used to evaluate model performance. The idea of the vector autoregressive (VAR) model is extended analogously to a nonlinear form, so that our models are applicable in a multiple-period setting. Regarding model selection, although it is difficult to draw a consistent conclusion from multiple criteria, we can at least strike a reasonable balance among them.

The structure of this paper is as follows. Section 2 develops models that fill the gaps in the prior literature. In Sect. 3, various cash flow prediction models are applied to the U.S. dataset, and the empirical results are discussed. Section 4 concludes.

2 Model specification

2.1 Dynamic cash flow prediction model

In this section, we start with DKW's model and then explain why grey-box models are needed. The DKW model makes two major assumptions. First, sales follow a random walk. Second, costs and working capital accruals (i.e., accounts receivable, accounts payable, and inventory) are constant proportions of sales. Under these assumptions, the best forecast of cash flow is:

$$E_{t} [CF_{t + k} ] = EARN_{t} ,k = 1,2,...,n$$
(1)

where CF denotes net operating cash flow and EARN denotes earnings. In their empirical study, DKW use the firm-specific regression below:

$$CF_{i,t + k} = \gamma_{i,0} + \gamma_{i,1} CF_{i,t} + \gamma_{i,2} EARN_{i,t} + \varepsilon_{i,t + k}$$
(2)

where \(\gamma_{i,0}\), \(\gamma_{i,1}\), and \(\gamma_{i,2}\) are time-invariant parameters and the subscript i denotes individual firms. DKW's model suggests that earnings, as a naive predictor, work better than cash flows themselves, because earnings incorporate the information in accruals. However, DKW's random walk assumption for sales may be too strict. As shown in Sect. 3.3, sales do not necessarily follow a random walk but have a predictable growth pattern. To generalise the model, we introduce an additional variable \(r_{t}\), the growth rate of sales, and re-derive the DKW model as follows. First, net operating cash flow is the difference between cash received and cash paid out (Footnote 3):

$$\begin{aligned} CF_{t} & = (SALES_{t} - \Delta AR_{t} ) - (PURCHASE_{t} - \Delta AP_{t} ) \\ &= (SALES_{t} - \Delta AR_{t} ) - (COST_{t} + \Delta INV_{t} - \Delta AP_{t} ) \\& = EARN_{t} - \Delta WC_{t} \\ \end{aligned}$$
(3)
Table 1 Descriptive statistics for the variables in cash flow prediction model

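Here, \(EARN_{t} = SALES_{t} - COST_{t}\) and \(\Delta WC_{t} = \Delta AR_{t} + \Delta INV_{t} - \Delta AP_{t}\). Assume that costs, accounts receivable, accounts payable, and inventory are constant proportions of sales; then so are earnings and working capital accruals: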

$$EARN_{t} = \alpha SALES_{t}$$
(4)
$$WC_{t} = \beta SALES_{t}$$
(5)

where \(\alpha\) and \(\beta\) are constants. Defining \(r_{t}\) as the growth rate of sales, we have:

$$SALES_{t} = (1 + r_{t} )SALES_{t - 1}$$
(6)

The recursive relationships for earnings and working capital accruals can be derived as:

$$\begin{aligned} EARN_{t} & = \alpha SALES_{t} = \alpha (1 + r_{t} )SALES_{{t - 1}} \\& = (1 + r_{t} )EARN_{{t - 1}} \\ \end{aligned}$$
(7)
$$\begin{aligned} \Delta WC_{t} &= \beta \Delta SALES_{t} = \beta r_{t} SALES_{{t - 1}} \hfill \\ &= \beta r_{t} (1 + r_{{t - 1}} )SALES_{{t - 2}} = \left( {\frac{{r_{t} }}{{r_{{t - 1}} }} + r_{t} } \right)\Delta WC_{{t - 1}} \hfill \\ \end{aligned}$$
(8)

Therefore, Eq. (3) can be rewritten as:

$$\begin{gathered} CF_{t} = EARN_{t} - \Delta WC_{t} = (1 + r_{t} )EARN_{{t - 1}} - (\frac{{r_{t} }}{{r_{{t - 1}} }} + r_{t} )\Delta WC_{{t - 1}} \hfill \\ \;\;\;\;\;\; = (1 + r_{t} )EARN_{{t - 1}} - (1 + r_{t} )\Delta WC_{{t - 1}} + (\frac{{r_{{t - 1}} - r_{t} }}{{r_{{t - 1}} }})\Delta WC_{{t - 1}} \hfill \\ \;\;\;\;\;\; = (1 + r_{t} )CF_{{t - 1}} + (\frac{{r_{{t - 1}} - r_{t} }}{{r_{{t - 1}} }})\Delta WC_{{t - 1}} \hfill \\ \end{gathered}$$
(9)

Combining Eqs. (8) and (9) and conditioning on the information set at time \(t\), we obtain:

$${E}_{t}\left(\left[\begin{array}{c}{CF}_{t+1}\\ {\Delta WC}_{t+1}\end{array}\right]\right)={E}_{t}\left(\left[\begin{array}{cc}1+{r}_{t+1}& \frac{{r}_{t}-{r}_{t+1}}{{r}_{t}}\\ 0& \frac{{r}_{t+1}}{{r}_{t}}(1+{r}_{t})\end{array}\right]\right)\times \left[\begin{array}{c}{CF}_{t}\\ {\Delta WC}_{t}\end{array}\right]$$
(10)

When sales follow a random walk, the expected sales growth rate is zero. Because the first row of the matrix in Eq. (10) is linear in \(r_{t+1}\), setting \(E_{t}(r_{t+1}) = 0\) gives \(E_{t}[CF_{t+1}] = CF_{t} + \Delta WC_{t} = EARN_{t}\), which is the DKW model described by Eq. (1). More importantly, because \(r_{t}\) is not constant, Eq. (10) describes a dynamic relationship between future cash flows and accruals. Equation (10) also implies heterogeneity across individual firms in predicting cash flows: the parameters estimated for individual firms may differ, mainly because the firms are at different growth stages. As predicted by Eq. (10), firms in the early stages usually have high sales growth rates, and the difference between the parameters of lagged cash flows and lagged accruals is large, which implies that only a small portion of future cash flows is predicted by lagged accruals. As firms mature, their growth slows down, and the explanatory power of lagged cash flows and lagged accruals gradually converges. This conjecture is supported by the empirical findings in Sect. 3.3. Moreover, at time \(t\), \(r_{t}\) is observable, while \(r_{t+1}\) is unknown. Under our relaxed assumption that sales may follow a predictable growth pattern, \(r_{t+1}\) can be expressed as a linear or nonlinear function of \(r_{t}\). Eq. (10) can therefore be reduced to a version that depends only on \(r_{t}\), which implies that the coefficients of \(CF_{t}\) and \(\Delta WC_{t}\) are essentially governed by \(r_{t}\).
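The algebra behind Eqs. (3) to (10) can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the paper's estimation: it builds sales from a hypothetical path of growth rates, sets earnings and working capital to fixed proportions of sales (hypothetical values \(\alpha = 0.1\) and \(\beta = 0.3\)), and confirms that the Eq. (10) matrix, evaluated at the realised \(r_{t+1}\) instead of its expectation, maps \([CF_t, \Delta WC_t]\) exactly onto \([CF_{t+1}, \Delta WC_{t+1}]\). It also checks the random walk case, where setting \(r_{t+1} = 0\) returns \(EARN_t\). The function names are ours.

```python
import numpy as np


def eq10_matrix(r_t, r_next):
    """Transition matrix of Eq. (10), evaluated at realised growth rates."""
    return np.array([
        [1.0 + r_next, (r_t - r_next) / r_t],
        [0.0, (r_next / r_t) * (1.0 + r_t)],
    ])


def simulate_firm(sales0, growth, alpha, beta):
    """Build EARN_t, dWC_t and CF_t for t = 1..n from Eqs. (3) to (6)."""
    sales = sales0 * np.cumprod(np.concatenate([[1.0], 1.0 + growth]))
    earn = alpha * sales[1:]              # Eq. (4): EARN_t = alpha * SALES_t
    dwc = np.diff(beta * sales)           # Eq. (5) differenced: dWC_t = beta * dSALES_t
    cf = earn - dwc                       # Eq. (3): CF_t = EARN_t - dWC_t
    return earn, dwc, cf


# Hypothetical inputs: sales growth r_1..r_5 and the two proportions.
growth = np.array([0.20, 0.18, 0.15, 0.12, 0.10])
earn, dwc, cf = simulate_firm(100.0, growth, alpha=0.1, beta=0.3)

# One-step check of Eq. (10) for every consecutive pair of periods.
one_step_ok = []
for t in range(len(growth) - 1):
    predicted = eq10_matrix(growth[t], growth[t + 1]) @ np.array([cf[t], dwc[t]])
    one_step_ok.append(bool(np.allclose(predicted, [cf[t + 1], dwc[t + 1]])))

# Random walk case: with r_{t+1} = 0 the first row gives CF_t + dWC_t = EARN_t.
dkw_forecast = (eq10_matrix(growth[0], 0.0) @ np.array([cf[0], dwc[0]]))[0]
print(one_step_ok, dkw_forecast, earn[0])
```

Because the matrix is exact given the realised growth rates, any forecasting error in practice comes only from predicting \(r_{t+1}\).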

The BCN model suggests that the components of accruals have different effects, and it therefore extends Eq. (2) to:

$$\begin{aligned} CF_{{t + 1}} & = \beta _{0} + \beta _{1} CF_{t} + \beta _{2} \Delta INV_{t} + \beta _{3} \Delta AP_{t} + \beta _{4} \Delta AR_{t} \\ & + \beta _{5} DEP_{t} + \beta _{6} AMORT_{t} + \beta _{7} OTHER_{t} + \varepsilon _{{t + 1}} \\ \end{aligned}$$
(11)

where \(DEP\) denotes depreciation, \(AMORT\) denotes amortisation, and \(OTHER\) denotes other accruals. Cheng and Hollie (2008) further extend the BCN model by disaggregating cash flow into components, which involves more regressors. Because of limited data availability, the number of firms with enough observations for individual estimation may be small. This could be one reason why BCN and Cheng and Hollie use cross-sectional regression rather than following the DKW framework, which estimates model parameters firm by firm. Empirically, when individual effects matter, panel data models are more appropriate than cross-sectional regression. Equation (11) is a simple, parsimonious, and static model, which can be used as a benchmark.
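As a point of reference, the sketch below shows what pooled cross-sectional estimation of Eq. (11) amounts to: a single OLS regression over all firm-years with one shared coefficient vector. The data are simulated and the coefficient values in `beta_true` are hypothetical; in an application, the columns would be the deflated \(CF_t\), \(\Delta INV_t\), \(\Delta AP_t\), \(\Delta AR_t\), \(DEP_t\), \(AMORT_t\), and \(OTHER_t\).

```python
import numpy as np

rng = np.random.default_rng(0)
regressors = ["CF", "dINV", "dAP", "dAR", "DEP", "AMORT", "OTHER"]

# Hypothetical "true" intercept and slopes, used only to simulate data.
beta_true = np.array([0.02, 0.60, 0.10, 0.20, -0.10, 0.30, 0.10, 0.05])

n_obs = 5000                                   # pooled firm-year observations
X = rng.normal(size=(n_obs, len(regressors)))  # stand-ins for the accounting variables
design = np.column_stack([np.ones(n_obs), X])  # common intercept beta_0
cf_next = design @ beta_true + rng.normal(scale=0.1, size=n_obs)

# Pooled OLS: one coefficient vector shared by every firm, as in Eq. (11).
beta_hat, *_ = np.linalg.lstsq(design, cf_next, rcond=None)
for name, b in zip(["const"] + regressors, beta_hat):
    print(f"{name:>6}: {b: .3f}")
```

Firm-by-firm (DKW-style) estimation would instead run the same regression separately on each firm's rows, which requires a sufficiently long series for every firm.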

2.2 Panel data method

Panel data analysis deals with data in two dimensions. This study focuses on firm-level accounting variables, so the time-series dimension of the panel is relatively short because firms disclose financial information at low frequency, whereas the cross-sectional dimension is large because there are many firms in the market. A natural question is whether heterogeneity exists across groups, in which case the estimation method should be adapted accordingly. For instance, the fixed effects and random effects models are linear estimators for the case in which heterogeneity appears only in the intercept term across groups, while the slope coefficients on the predictive variables remain homogeneous. Equation (11) then becomes:

$$\begin{aligned} CF_{{i,t + 1}} & = \beta _{{i,0}} + \beta _{1} CF_{{i,t}} + \beta _{2} \Delta INV_{{i,t}} \\& + \beta _{3} \Delta AP_{{i,t}} + \beta _{4} \Delta AR_{{i,t}} + \beta _{5} DEP_{{i,t}} \\ &+ \beta _{6} AMORT_{{i,t}} + \beta _{7} OTHER_{{i,t}} + \varepsilon _{{i,t + 1}} \\ \end{aligned}$$
(12)

Note that the subscript i for individual effects appears only in the intercept \(\beta_{i,0}\). If the firms were completely heterogeneous, there would be no gain from analysing panel data, and individual estimation would be optimal. However, it is also unrealistic to assume homogeneity across firms. Firms vary in size and business scale, so their financial variables are unlikely to follow the same statistical distribution, and firms of different sizes are exposed to different business risks. Nevertheless, companies may share some similarities; for example, managers learn from leading firms regardless of the industries they specialise in.

Heterogeneity in firm size causes a potential problem: the unexplained parts of the dependent variable may not be identically distributed across firms. In Eq. (12), this problem has two aspects: the intercept may differ in level, and the error term may have different variances. Differences in intercepts bias cross-sectional regression estimates of the parameters if the individual effects are correlated with the independent variables. In predictive applications, we need to account for the individual effect of each firm; otherwise, the predictions would be significantly biased. The individual intercepts can be handled by adding dummy variables, as in Barth et al. (2001), which is feasible but introduces too many parameters. Alternatively, the individual intercepts can be eliminated by one of two approaches: demeaning and first differencing.

Before introducing the two methods, note that differing error variances also affect the estimation procedure. A regression-based estimation procedure minimises the sum of squared errors or another error norm, so firms with large error variances tend to dominate the results, especially when the degree of heterogeneity is high. A solution proposed in the BCN paper is to deflate all variables by a size-related variable, such as total assets or shares outstanding. In this paper, average total assets are used as the deflator.

Demean. If Eq. (12) holds, then averaging over time for each individual firm i gives:

$$\begin{gathered} \overline{{CF}} _{i} = \bar{\beta }_{{i,0}} + \beta _{1} \overline{{CF}} _{i} + \beta _{2} \overline{{\Delta INV}} _{i} + \beta _{3} \overline{{\Delta AP}} _{i} \hfill \\ \;\;\;\;\;\; + \beta _{4} \overline{{\Delta AR}} _{i} + \beta _{5} \overline{{DEP}} _{i} + \beta _{6} \overline{{AMORT}} _{i} + \beta _{7} \overline{{OTHER}} _{i} + \bar{\varepsilon }_{i} \hfill \\ \end{gathered}$$
(13)

Subtracting Eq. (13) from Eq. (12) gives:

$$\begin{aligned} CF_{{i,t + 1}} - \overline{{CF}} _{i} & = \beta _{{i,0}} - \bar{\beta }_{{i,0}} + \beta _{1} (CF_{{i,t}} - \overline{{CF}} _{i} ) \\ & + \beta _{2} (\Delta INV_{{i,t}} - \overline{{\Delta INV}} _{i} ) + \beta _{3} (\Delta AP_{{i,t}} - \overline{{\Delta AP}} _{i} ) \\ & + \beta _{4} (\Delta AR_{{i,t}} - \overline{{\Delta AR}} _{i} ) + \beta _{5} (DEP_{{i,t}} - \overline{{DEP}} _{i} ) \\ & + \beta _{6} (AMORT_{{i,t}} - \overline{{AMORT}} _{i} ) \\ & + \beta _{7} (OTHER_{{i,t}} - \overline{{OTHER}} _{i} ) + \varepsilon _{{i,t + 1}} - \bar{\varepsilon }_{i} \\ \end{aligned}$$
(14)

Removing the group mean from each variable eliminates the individual intercept while leaving the other parameters unchanged (although demeaning introduces endogeneity into the model, as discussed below). The individual intercept \(\beta_{i,0}\) can then be recovered from the group means in Eq. (13), using the slopes estimated from Eq. (14):

$$\begin{aligned} \hat{\beta}_{i,0} = {} & \overline{CF}_{i,t} - (\beta_{1} \overline{CF}_{i,t-1} + \beta_{2} \overline{\Delta INV}_{i,t-1} + \beta_{3} \overline{\Delta AP}_{i,t-1} \\ & + \beta_{4} \overline{\Delta AR}_{i,t-1} + \beta_{5} \overline{DEP}_{i,t-1} + \beta_{6} \overline{AMORT}_{i,t-1} + \beta_{7} \overline{OTHER}_{i,t-1}) \end{aligned}$$
(15)
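The demeaning route of Eqs. (13) to (15) can be sketched with a single regressor. In this simulated example (all values hypothetical), the firm intercepts are correlated with the regressor, so pooled OLS with a common intercept is biased, whereas OLS on the demeaned data recovers the slope, and the Eq. (15) formula then recovers the intercepts. The regressor here is strictly exogenous; with a lagged dependent variable such as \(CF_{i,t}\), the demeaned estimator becomes biased in short panels, which is the endogeneity issue noted above.

```python
import numpy as np

rng = np.random.default_rng(1)
n_firms, n_years, beta_true = 500, 10, 0.5

# Firm-specific intercepts (beta_{i,0}) correlated with the regressor.
alpha = rng.normal(size=n_firms)
x = alpha[:, None] + rng.normal(size=(n_firms, n_years))
y_next = alpha[:, None] + beta_true * x + rng.normal(scale=0.5, size=(n_firms, n_years))

# Pooled OLS with one common intercept ignores the firm effects.
design = np.column_stack([np.ones(x.size), x.ravel()])
pooled_slope = np.linalg.lstsq(design, y_next.ravel(), rcond=None)[0][1]

# Eq. (14): subtract each firm's time mean, then OLS without an intercept.
x_dm = x - x.mean(axis=1, keepdims=True)
y_dm = y_next - y_next.mean(axis=1, keepdims=True)
within_slope = (x_dm * y_dm).sum() / (x_dm ** 2).sum()

# Eq. (15): firm intercept = mean of the dependent variable minus fitted part.
alpha_hat = y_next.mean(axis=1) - within_slope * x.mean(axis=1)

print(f"pooled {pooled_slope:.3f}, within {within_slope:.3f}, true {beta_true}")
```

With these settings the pooled slope is pulled towards 1.0, while the within slope stays close to the true 0.5.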

First Difference. An alternative approach is to take the first difference for all the variables, and the model becomes:

$$\begin{aligned} \Delta CF_{{i,t + 1}} & = \beta _{1} \Delta CF_{{i,t}} + \beta _{2} \Delta ^{2} INV_{{i,t}} + \beta _{3} \Delta ^{2} AP_{{i,t}} \\ & + \beta _{4} \Delta ^{2} AR_{{i,t}} + \beta _{5} \Delta DEP_{{i,t}} + \beta _{6} \Delta AMORT_{{i,t}} \\ & + \beta _{7} \Delta OTHER_{{i,t}} + \Delta \varepsilon _{{i,t + 1}} \\ \end{aligned}$$
(16)

where \(\Delta^{2}\) denotes the second-order difference, i.e., \(\Delta^{2} x_{t} = \Delta x_{t} - \Delta x_{t-1}\). Differencing eliminates the intercept, which is assumed to differ across individuals but to be constant over time. However, least squares (LS) estimation of Eqs. (14) and (16) cannot solve the endogeneity problem caused by including the autoregressive (AR) term, i.e., the lagged dependent variable (for details, see, e.g., Cameron and Trivedi, 2005). Endogeneity makes the parameter estimates inconsistent. We therefore apply the generalized method of moments (GMM) to obtain theoretically consistent parameter estimates. The GMM estimator applied to Eq. (16) takes the form:

$$\hat{\boldsymbol{\beta}}_{AB} = \left[ \left( \sum_{i=1}^{N} \tilde{\mathbf{X}}_{i}^{\prime} \mathbf{Z}_{i} \right) \mathbf{W}_{N} \left( \sum_{i=1}^{N} \mathbf{Z}_{i}^{\prime} \tilde{\mathbf{X}}_{i} \right) \right]^{-1} \left( \sum_{i=1}^{N} \tilde{\mathbf{X}}_{i}^{\prime} \mathbf{Z}_{i} \right) \mathbf{W}_{N} \left( \sum_{i=1}^{N} \mathbf{Z}_{i}^{\prime} \tilde{\mathbf{y}}_{i} \right)$$
(17)

where \(\tilde{\mathbf{X}}_{i}\) is a \((T - 2) \times (K + 1)\) matrix of all regressors, including the lagged dependent variable, for the ith company; \(\tilde{\mathbf{y}}_{i}\) is a \((T - 2) \times 1\) vector of the ith company's dependent variable; \(\mathbf{Z}_{i}\) is a \((T - 2) \times r\) matrix of instrumental variables (IV); and \(\mathbf{W}_{N}\) is a weighting matrix. The GMM estimator is used here for the first time in cash flow prediction. We compare it empirically with the demeaning method to see whether this theoretically more consistent estimator also provides better predictions.
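Eq. (17) is a matrix formula that can be coded directly. The sketch below implements it for arbitrary per-firm matrices and applies it to a simulated AR(1) panel with fixed effects (all parameter values hypothetical). For simplicity it uses a single instrument, the twice-lagged level \(y_{i,t-2}\) for \(\Delta y_{i,t-1}\) in the spirit of Anderson and Hsiao, together with the two-stage least squares weight \(\mathbf{W}_N = (\sum_i \mathbf{Z}_i^{\prime}\mathbf{Z}_i)^{-1}\). This instrument set and weight are assumptions for illustration, not necessarily those used in this paper. The within (demeaned) estimator is computed alongside for comparison.

```python
import numpy as np


def gmm_eq17(Xs, Zs, ys, W):
    """Eq. (17): beta = [(sum X'Z) W (sum Z'X)]^{-1} (sum X'Z) W (sum Z'y)."""
    XZ = sum(X.T @ Z for X, Z in zip(Xs, Zs))   # K x r
    Zy = sum(Z.T @ y for Z, y in zip(Zs, ys))   # r
    A = XZ @ W @ XZ.T                           # sum Z'X is the transpose of sum X'Z
    return np.linalg.solve(A, XZ @ W @ Zy)


rng = np.random.default_rng(2)
n_firms, n_keep, n_burn, rho = 5000, 6, 50, 0.5

# Simulate y_it = alpha_i + rho * y_i,t-1 + e_it and keep the last n_keep periods.
alpha = rng.normal(scale=0.5, size=n_firms)
y = np.zeros((n_firms, n_burn + n_keep))
for t in range(1, n_burn + n_keep):
    y[:, t] = alpha + rho * y[:, t - 1] + rng.normal(size=n_firms)
y = y[:, n_burn:]

# First-differenced equation: dy_t = rho * dy_{t-1} + de_t, instrumented by y_{t-2}.
dy = np.diff(y, axis=1)
dep, reg, inst = dy[:, 1:], dy[:, :-1], y[:, :-2]
Xs = [row[:, None] for row in reg]
Zs = [row[:, None] for row in inst]
ys = list(dep)
W = np.linalg.inv(sum(Z.T @ Z for Z in Zs))
rho_gmm = gmm_eq17(Xs, Zs, ys, W)[0]

# Within estimator on levels, which is biased downwards when T is small.
lag, cur = y[:, :-1], y[:, 1:]
lag_dm = lag - lag.mean(axis=1, keepdims=True)
cur_dm = cur - cur.mean(axis=1, keepdims=True)
rho_within = (lag_dm * cur_dm).sum() / (lag_dm ** 2).sum()

print(f"GMM {rho_gmm:.3f}, within {rho_within:.3f}, true {rho}")
```

Adding further regressors only widens each `X` and `Z` row; the function itself does not change.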

2.3 Nonlinear dynamic (grey-box) cash flow prediction model

Equation (10) describes a dynamic model that may predict future cash flows better. This section introduces a nonlinear dynamic model in which the dynamics of the parameters are nonlinear and unknown. We rewrite Eq. (11) to allow for time-varying parameters:

$$\begin{aligned} CF_{{i,t + 1}} & = \beta _{{i,t,0}} + \beta _{{i,t,1}} CF_{{i,t}} + \beta _{{i,t,2}} \Delta INV_{{i,t}} \\ &+ \beta _{{i,t,3}} \Delta AP_{{i,t}} + \beta _{{i,t,4}} \Delta AR_{{i,t}} + \beta _{{i,t,5}} DEP_{{i,t}} \\ &+ \beta _{{i,t,6}} AMORT_{{i,t}} + \beta _{{i,t,7}} OTHER_{{i,t}} + \varepsilon _{{i,t + 1}} \\ \end{aligned}$$
(18)

Each parameter in Eq. (18) is controlled by a process:

$$\beta_{i,t,j} = F\left( {z_{t + 1} ,z_{t} } \right),j = 0, \ldots , 7$$
(19)

where \(\boldsymbol{z}_{t+1}\) and \(\boldsymbol{z}_{t}\) are dynamic processes that determine the dynamics of \(\beta_{i,t,j}\). As shown in Eq. (10), the sales growth rate \(r_{t}\) could serve as one proxy for \(\boldsymbol{z}_{t}\). The form of \(F\) is unknown, but it can be fitted with a nonlinear approximating function, and there are several options. For instance, a neural network is considered a universal approximator because it can approximate any continuous function on a compact domain arbitrarily well (Cybenko, 1989), and Taylor and Fourier series can, under suitable conditions, approximate functions to any degree of accuracy. Because greater complexity usually reduces efficiency, this paper adopts the Padé approximant (Tan and Li, 2002) for \(F\), which requires fewer coefficients while maintaining sufficient accuracy. Besides the sales growth rate, we also use firm age as an input variable of the black-box model; sales growth and firm age are commonly employed as proxies for the life cycle (e.g., Anthony and Ramesh, 1992). As a starting point, we use the sales growth rates \(r_{t+1}\) and \(r_{t}\) as candidates for \(\boldsymbol{z}_{t+1}\) and \(\boldsymbol{z}_{t}\). However, \(r_{t+1}\) is not observable at time \(t\). If we assume that \(r_{t+1}\) is predicted by \(r_{t}\), through either a linear or a nonlinear predictive function, Eq. (19) reduces to a function of the single variable \(r_{t}\):

$$\beta_{i,t,j} = F(r_{i,t}) = \frac{a_{0} + a_{1} r_{i,t} + a_{2} r_{i,t}^{2}}{1 + a_{3} r_{i,t} + a_{4} r_{i,t}^{2}}, \quad j = 0, \ldots, 7$$
(20)

where \(a_{0}\) to \(a_{4}\) are the coefficients that capture the dependence of the parameter \(\beta_{i,t,j}\) on \(r_{t}\); each parameter \(j = 0, \ldots, 7\) has its own set of coefficients. The functional form in Eq. (20) is a Padé approximant of order 2/2, i.e., polynomials in \(r_{t}\) of up to order 2 in both the numerator and the denominator. Because Eq. (18) has 8 parameters, there are 40 Padé approximant coefficients in total to be determined in this model. Increasing the order of the Padé approximant would improve the precision of data fitting but would inevitably require more coefficients to be estimated; we use order 2/2 to balance these gains and costs. A simple example illustrates how Eq. (20) captures the dynamics of \(\beta_{i,t,j}\). Assume that \(\beta_{i,t,1}\) follows exactly the form of Eq. (10) and that a firm's growth rate decays at a constant rate, e.g., \(r_{t+1} = 0.9 r_{t}\). Then \(\beta_{i,t,1} = 1 + r_{t+1}\) can be calibrated in Eq. (20) as \(1 + 0.9 r_{t}\) (i.e., \(a_{0} = 1\), \(a_{1} = 0.9\), and \(a_{2} = a_{3} = a_{4} = 0\)), whose dynamic pattern is plotted in Fig. 1.

Fig. 1

An example of the dynamics of \(\beta_{i,t,1}\) in Eq. (20). Note: the initial growth rate of the firm is assumed to be 20%
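The order-2/2 Padé approximant in Eq. (20) and the calibration behind Fig. 1 take only a few lines. The sketch assumes, as in the example above, a 20% initial growth rate decaying by \(r_{t+1} = 0.9 r_t\), so the only non-zero coefficients are \(a_0 = 1\) and \(a_1 = 0.9\); it should trace the kind of decaying path described for Fig. 1. The function name `pade22` is ours. With non-zero \(a_3\) or \(a_4\), the denominator should be checked to stay away from zero over the observed range of \(r_t\), since a Padé approximant has poles where its denominator vanishes.

```python
import numpy as np


def pade22(r, a):
    """Order-2/2 Padé approximant of Eq. (20); a = (a0, a1, a2, a3, a4)."""
    a0, a1, a2, a3, a4 = a
    return (a0 + a1 * r + a2 * r**2) / (1.0 + a3 * r + a4 * r**2)


# Calibration from the text: beta_{i,t,1} = 1 + r_{t+1} = 1 + 0.9 * r_t.
a_fig1 = (1.0, 0.9, 0.0, 0.0, 0.0)
years = np.arange(10)
r = 0.20 * 0.9 ** years            # growth path: 20% decaying by 10% a year
beta1_path = pade22(r, a_fig1)     # declines from 1.18 towards 1 as growth fades

for year, b in zip(years, beta1_path):
    print(year, round(float(b), 4))
```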

Equations (19) and (20) constitute a black-box model, the opposite of a white-box model. White-box models are most common in physics and engineering, where physical laws are well known and applied with negligible uncertainty. In social science, however, we need to consider uncertainty, structural changes, etc., so the exact functional forms describing the complex interactions among variables may not be available. Nevertheless, data and black-box models can approximate the relationships between variables with reasonable accuracy. In the spirit of Tan and Li (2002), the joint model of Eqs. (18) and (19) forms a grey-box system: the black-box model, Eq. (19), governs the dynamics and nonlinearity of the parameters of the white-box model, Eq. (18). Each of the 8 parameters in Eq. (18) takes the form of Eq. (20), so there are 40 Padé approximant coefficients to be estimated. We estimate them by minimising the sum of squared prediction errors over all observations. With these coefficients, each parameter \(\beta\) can be calculated from the level of \(r_{t}\) or firm age, and the cash flow prediction can then be obtained.
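Minimising the sum of squared prediction errors over the Padé coefficients is a nonlinear least-squares problem. The sketch below illustrates it in a deliberately reduced setting, one time-varying coefficient on a single regressor, \(CF_{i,t+1} = F(r_{i,t}) CF_{i,t} + \varepsilon_{i,t+1}\), on simulated data with hypothetical coefficients. It exploits the fact that, for fixed denominator coefficients \((a_3, a_4)\), the model is linear in \((a_0, a_1, a_2)\), so those are obtained by least squares while \((a_3, a_4)\) are searched over a grid. This grid-plus-least-squares scheme is our simplification; with all 8 parameters and 40 coefficients, a general-purpose nonlinear least-squares optimiser would be the natural tool.

```python
import numpy as np


def pade22(r, a):
    """Order-2/2 Padé approximant of Eq. (20); a = (a0, a1, a2, a3, a4)."""
    a0, a1, a2, a3, a4 = a
    return (a0 + a1 * r + a2 * r**2) / (1.0 + a3 * r + a4 * r**2)


def fit_pade_sse(y, x, r, grid):
    """Minimise the SSE of y = F(r) * x + e over the five Padé coefficients.

    For fixed (a3, a4) the model is linear in (a0, a1, a2), which are found
    by least squares; (a3, a4) are searched over `grid`.
    """
    best_sse, best_a = np.inf, None
    for a3 in grid:
        for a4 in grid:
            d = 1.0 + a3 * r + a4 * r**2
            if np.any(np.abs(d) < 1e-3):        # skip denominators near a pole
                continue
            F = np.column_stack([x / d, x * r / d, x * r**2 / d])
            num, *_ = np.linalg.lstsq(F, y, rcond=None)
            sse = float(np.sum((y - F @ num) ** 2))
            if sse < best_sse:
                best_sse, best_a = sse, (*num, a3, a4)
    return best_a, best_sse


rng = np.random.default_rng(3)
n = 2000
r = rng.uniform(0.0, 0.5, size=n)           # hypothetical sales growth rates r_{i,t}
cf_t = rng.normal(1.0, 0.3, size=n)         # hypothetical lagged cash flows CF_{i,t}
a_true = (1.0, 0.5, -0.3, 0.5, 0.0)         # hypothetical Padé coefficients
cf_next = pade22(r, a_true) * cf_t + rng.normal(scale=0.05, size=n)

grid = np.arange(-10, 11) / 10.0            # a3, a4 in {-1.0, -0.9, ..., 1.0}
a_hat, sse_hat = fit_pade_sse(cf_next, cf_t, r, grid)
sse_true = float(np.sum((cf_next - pade22(r, a_true) * cf_t) ** 2))
print("fitted:", np.round(a_hat, 3), "SSE:", round(sse_hat, 4), "vs", round(sse_true, 4))
```

Because the grid contains the true denominator coefficients, the fitted SSE can be no larger than the SSE at the true coefficients; individual coefficients may still differ, as Padé coefficients are not always sharply identified.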

2.4 Long-term prediction

A natural extension of the one-period BCN model is to increase the lag length for multi-period-ahead predictions. The main drawback of this option is that the maximum forecast horizon is severely limited by data availability. Stock value is considered the sum of all future cash flows discounted back to the present, so in principle cash flows need to be predicted into the infinite future. In univariate time-series models, multi-period forecasts are simple to derive recursively. In multivariate models, however, the effects of the other variables must be considered as well. For simplicity, those variables can be treated as exogenous. Alternatively, long-term forecasts can be obtained by estimating predictive models for all of those variables and forecasting them recursively to any future period. Among linear models, the vector autoregressive (VAR) model (Sims, 1980) is the cornerstone and has the form:

$$\mathbf{y}_{i,t} = \boldsymbol{\beta} \mathbf{y}_{i,t-1} + \boldsymbol{\nu}_{i,t}$$
(21)

where \(\mathbf{y}_{i,t}\) is the vector of all relevant variables in the model, \(\boldsymbol{\beta}\) is the parameter matrix of the predictive system, and \(\boldsymbol{\nu}_{i,t}\) is the disturbance vector. Once \(\boldsymbol{\beta}\) is estimated, all variables can be predicted by the following relationship:

$$\hat{\mathbf{y}}_{i,t+k} = \boldsymbol{\beta}^{k} \mathbf{y}_{i,t}$$
(22)

where \(k\) is the forecast horizon. Because the VAR forecast horizon is not limited by the length of the data sample, it is a more flexible tool. Our dynamic models, whether linear or nonlinear, instead take the form:

$$\mathbf{y}_{i,t} = \boldsymbol{\beta}_{t} \mathbf{y}_{i,t-1} + \boldsymbol{\nu}_{i,t}$$
(23)

Hence, the prediction takes the form of:

$$\hat{\mathbf{y}}_{i,t+k} = \left( \prod_{n=1}^{k} \boldsymbol{\beta}_{t+n} \right) \mathbf{y}_{i,t} = \boldsymbol{\beta}_{t+k} \boldsymbol{\beta}_{t+k-1} \cdots \boldsymbol{\beta}_{t+1} \mathbf{y}_{i,t}$$
(24)
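Eqs. (22) and (24) differ only in whether the transition matrix is fixed. Because matrix products do not commute, the product in Eq. (24) must be applied in time order, so that \(\boldsymbol{\beta}_{t+1}\) acts first. The sketch below codes both forecasts and, as an assumed test case, uses the Eq. (10) matrices for a hypothetical firm whose growth decays by \(r_{t+1} = 0.9 r_t\), with the realised growth rates standing in for their expectations; the k-step forecast then reproduces exactly the cash flow and accrual path implied by the sales model. Function names are ours.

```python
import numpy as np


def forecast_var(beta, y_t, k):
    """Eq. (22): k-step forecast with a constant VAR(1) matrix."""
    return np.linalg.matrix_power(beta, k) @ np.asarray(y_t, dtype=float)


def forecast_tv(betas, y_t):
    """Eq. (24): apply beta_{t+1}, ..., beta_{t+k} in time order."""
    y = np.asarray(y_t, dtype=float)
    for beta in betas:                  # betas[0] is beta_{t+1}
        y = beta @ y
    return y


def eq10_matrix(r_t, r_next):
    """Transition matrix of Eq. (10) at given growth rates."""
    return np.array([[1.0 + r_next, (r_t - r_next) / r_t],
                     [0.0, (r_next / r_t) * (1.0 + r_t)]])


# Hypothetical firm: growth r_1..r_6 decays by 10% a year from 20%.
r = 0.20 * 0.9 ** np.arange(6)
sales = 100.0 * np.cumprod(np.concatenate([[1.0], 1.0 + r]))   # SALES_0..SALES_6
dwc = np.diff(0.3 * sales)                                     # dWC_1..dWC_6
cf = 0.1 * sales[1:] - dwc                                     # CF_1..CF_6

# Four-step forecast from t = 1 using beta_2, ..., beta_5 built from Eq. (10).
betas = [eq10_matrix(r[s], r[s + 1]) for s in range(4)]
y5_hat = forecast_tv(betas, [cf[0], dwc[0]])
print(y5_hat, [cf[4], dwc[4]])

# With a constant matrix, Eq. (24) collapses to Eq. (22).
B = np.array([[0.5, 0.2], [0.1, 0.7]])
same = np.allclose(forecast_tv([B] * 3, [1.0, 2.0]), forecast_var(B, [1.0, 2.0], 3))
```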

2.5 Model performance evaluation

For individual time-series prediction, comparing the performance of different models is straightforward. Error-based criteria, such as the mean squared error (MSE) and the mean absolute error (MAE), are most commonly used, and non-parametric measures are also available; the model that generates smaller prediction errors is preferred. Selecting one or more criteria to evaluate these models is less trivial in the panel data setting, because with many individual firms the comparison results can be contradictory across firms. In practice, the results for specific individuals are of more concern than those for the aggregate group, and firm-specific predictions are of more value.

It is often difficult to judge two models by an aggregated measure such as the sum of squared errors (SSE) across all firms: a model that produces a smaller SSE may well perform worse for half of the sample firms, for example because a few large errors dominate the aggregate. Therefore, the two-dimensional nature of panel data imposes an additional requirement on predictive models in practice, namely generality. A good model should achieve good aggregate accuracy and also outperform its competitors for as many individual firms as possible. Early studies, e.g., Ball and Watts (1972), calculate the average rank of each model in fitting each observation, or each group, as a measure for evaluating the models. This paper also adopts the average rank as a criterion for judging models’ performance.
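The two criteria can be computed as in the following Python sketch (NumPy assumed). Models are ranked for each observation by absolute prediction error, with 1 for the smallest error and tied models sharing the average of their ranks; the tie rule and the function name are our assumptions rather than a prescription from Ball and Watts (1972). The toy data illustrate how a model with a lower MSE can still have a worse average rank.

```python
import numpy as np

def mse_and_average_rank(y_true, preds):
    """Return {model: (MSE, average rank)}.

    preds maps model names to prediction arrays aligned with y_true.
    For every observation, models are ranked by absolute error
    (1 = smallest); tied models receive the mean of the ranks they span.
    """
    names = list(preds)
    y = np.asarray(y_true, dtype=float)
    err = np.abs(np.column_stack([np.asarray(preds[n], float) for n in names]) - y[:, None])
    # less[r, j]: number of models with a strictly smaller error than model j
    less = (err[:, None, :] < err[:, :, None]).sum(axis=2)
    # equal[r, j]: number of models (including j itself) with the same error
    equal = (err[:, None, :] == err[:, :, None]).sum(axis=2)
    ranks = less + (equal + 1) / 2.0
    mse = (err ** 2).mean(axis=0)
    return {n: (mse[j], ranks[:, j].mean()) for j, n in enumerate(names)}

# Model A is very accurate except for one large miss; model B is mediocre everywhere.
y = np.zeros(4)
results = mse_and_average_rank(y, {"A": [0.1, 0.1, 0.1, 10.0], "B": [1.0, 1.0, 1.0, 1.0]})
print(results)  # A: higher MSE but better (lower) average rank than B
```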

3 Empirical study of the U.S. market

We study annual data for U.S. firms because the U.S. is the world’s largest economy, so empirical evidence drawn from U.S. data can serve as a benchmark. All variables used in this study are publicly available financial information taken directly from firms’ annual reports. All accounting data are obtained from the WRDS Compustat database and cover all listed U.S. firms from 1957 to 2013.

Before applying the cash flow prediction models, it is helpful to form a general impression of the cash flow process. Each sample firm has its own cash flow path, and larger firms tend to generate larger cash flows, so the raw cash flows of different firms are not directly comparable. The first step in comparing them is therefore to normalise each firm’s cash flow series. We do this by deflating each firm’s cash flows by its first positive cash flow observation; negative observations at the beginning of a firm’s cash flow series are excluded. In this way, every firm has the same starting point, i.e., one unit of cash flow, regardless of when it starts to operate. This process is less influenced by specific year effects because the years in which firms enter the sample are diversified. The time index is not the calendar year, e.g., 1987, 1990, or 2015, but the number of years since each firm’s starting point. Thus, firms of different sizes and/or ages can be compared, and the results should be more general than those obtained by pooling observations from the same calendar period.
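The normalisation can be sketched in plain Python. We assume each firm’s cash flows are given as a chronological list of annual values with `None` for missing years, and that zero as well as negative leading values are dropped (the text mentions only negative ones); the function names are hypothetical, and the trimming of the 1 percent tails is omitted.

```python
from statistics import mean, median

def normalise_cash_flows(cf):
    """Deflate a firm's annual cash flows by its first positive observation.

    Leading non-positive or missing values are dropped. Index n of the result
    is the number of years since the firm's starting point (n = 0 is 1.0).
    """
    start = next((i for i, x in enumerate(cf) if x is not None and x > 0), None)
    if start is None:
        return []  # the firm never reports a positive cash flow
    base = cf[start]
    return [None if x is None else x / base for x in cf[start:]]

def summarise_by_leading_period(firms):
    """Mean and median of normalised cash flows for each leading period."""
    by_period = {}
    for cf in firms:
        for n, x in enumerate(normalise_cash_flows(cf)):
            if x is not None:
                by_period.setdefault(n, []).append(x)
    return {n: (mean(v), median(v)) for n, v in sorted(by_period.items())}

print(normalise_cash_flows([-2.0, -1.0, 4.0, 6.0, None, 2.0]))  # [1.0, 1.5, None, 0.5]
print(summarise_by_leading_period([[2.0, 4.0], [1.0, 3.0]]))   # {0: (1.0, 1.0), 1: (2.5, 2.5)}
```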

Cash flow disclosure was not compulsory until 1987. The DKW paper estimates cash flow indirectly from the balance sheet and income statement to enlarge the available sample, at the cost of tolerable measurement and calculation errors. In this study, cash flows before 1987 are estimated indirectly following the DKW approach, using the general formula: operating cash flow = net income + depreciation and amortisation − changes in non-cash current assets + changes in current liabilities. In the post-1987 data, the correlation between actual and estimated cash flow is 0.82, which suggests that the level of estimation error is tolerable.
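The indirect estimate is a simple identity. The sketch below applies it to hypothetical firm-year figures and checks its agreement with reported cash flows via a Pearson correlation, mirroring the validation against post-1987 data. The argument names are ours, and all input numbers are made up for illustration (NumPy assumed).

```python
import numpy as np

def estimate_operating_cash_flow(net_income, dep_amort, d_noncash_current_assets, d_current_liabilities):
    """Indirect (DKW-style) operating cash flow:
    net income + D&A - change in non-cash current assets + change in current liabilities.
    Works element-wise on NumPy arrays as well as on scalars."""
    return net_income + dep_amort - d_noncash_current_assets + d_current_liabilities

# Hypothetical firm-year inputs (same units, e.g. $ millions)
ni = np.array([10.0, 5.0, -2.0, 8.0])
da = np.array([3.0, 2.0, 1.0, 4.0])
d_nca = np.array([2.0, -1.0, 0.5, 3.0])
d_cl = np.array([1.0, 0.0, -0.5, 2.0])
reported = np.array([11.5, 8.2, -1.8, 10.5])  # hypothetical disclosed cash flows

estimated = estimate_operating_cash_flow(ni, da, d_nca, d_cl)
corr = np.corrcoef(estimated, reported)[0, 1]  # Pearson correlation
print(estimated, round(corr, 3))
```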

3.1 Trend of cash flow levels

The entire sample contains 21,905 firms with at least one available cash flow observation, either disclosed or indirectly estimated, and 239,835 firm-year cash flow observations. The distribution of the number of observations per firm is shown in Fig. 2. The firm with the most observations is IBM, with 56 observations; most sample firms have fewer than 20. The upper and lower 1 percent of the normalised cash flows are excluded. For each leading period, i.e., the number of periods after the starting time, there is a distribution of cash flow levels. The mean, median and 95% range (from the 2.5th to the 97.5th percentile) are shown in Fig. 3. The figure shows only the first 42 periods after the starting time, because the number of available sample firms falls along the x-axis: results from period 43 onward are based on very few observations (no more than 16) and are not meaningful. The mean and median of the normalised cash flow series both show an obvious, almost monotonic upward trend, with the median lying below the mean. The mean cash flow level in period 42 is nearly 52, which implies an annual growth rate of about 10%. Similarly, the median level in period 42 is 26, implying an annual growth rate of about 8%.
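The implied annual growth rates follow from compounding: a level of L after n years corresponds to a constant annual rate g with (1 + g)^n = L. A minimal check of the figures in the text, assuming geometric growth from the normalised starting value of 1:

```python
def implied_annual_growth(level, years):
    """Constant annual growth rate g such that (1 + g) ** years == level."""
    return level ** (1.0 / years) - 1.0

for label, level in [("mean", 52.0), ("median", 26.0)]:
    print(label, round(implied_annual_growth(level, 42), 4))
# mean ~0.0986 (about 10%), median ~0.0807 (about 8%)
```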

Fig. 2

The distribution of the number of firms with different sample lengths. Note: There are 21,905 firms with at least one available cash flow observation that is either disclosed or indirectly estimated and 239,835 firm-year cash flow observations in total from 1957 to 2013 for the entire sample

Fig. 3

Mean, median and the 2.5–97.5 percentiles of the normalised cash flow series of all firms. Note: For each leading period, i.e., the number of periods after the starting time, there is a distribution of cash flow levels. This figure shows the mean, median and 95% range (from the 2.5th to the 97.5th percentile) up to 42 periods after the starting time. Results from period 43 onward are not shown because they are based on very few observations (no more than 16)

However, because of survivorship bias, it is premature to conclude that U.S. firms have high cash flow growth rates. Fig. 3 clearly shows that the cash flow distribution in each period is asymmetric, and the asymmetry increases with the number of periods. Within 42 years, some firms’ cash flows rise to 250 times their initial level, but no firm’s cash flow falls by a comparable magnitude. Managerial and market discipline tends to rule out a symmetric cash flow distribution: firms that incur losses year after year are not allowed to keep losing indefinitely. Hard decisions may be made; for example, the management might be replaced, or the firm might go bankrupt or be taken over. If a firm cannot recover, it is likely to exit. Therefore, firms’ cash flows can grow permanently to a very high level, but they cannot decline permanently while the firm remains in the sample. We conduct a simulation to illustrate the effect of survivorship bias. Assume that cash flow follows a random walk whose noise term is normally distributed with zero mean and a variance of 16:

$$\begin{aligned} CF_{t} &= CF_{t-1} + \varepsilon_{t}, \\ CF_{0} &= 1, \\ \varepsilon_{t} &\sim N(0,16) \end{aligned}$$
(25)

Theoretically, this process has an expectation of 1, i.e., its initial value, in every period. However, we impose an exit rule: a series stops once it has produced five negative values in a row. The maximum length of each simulated series is 42, and the simulation is run 10,000 times. The distribution of the simulated sample is illustrated in Fig. 4. Figs. 3 and 4 share several features; in particular, the simulated data also show an increasing mean cash flow level, even though they are generated from a random walk with constant expectation. These results suggest that survivorship bias pushes conclusions drawn from the surviving sample upwards, and the true expected cash flow trend may not be as steep as Fig. 3 suggests.
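The simulation can be reproduced along the following lines (NumPy assumed). We interpret the maximum length of 42 as 42 periods after the initial value CF_0 = 1, and we record the fifth consecutive negative value before the series stops; both choices, like the random seed, are our assumptions.

```python
import numpy as np

def simulate_with_exit(n_series=10_000, max_len=42, sd=4.0, exit_run=5, seed=0):
    """Random walk CF_t = CF_{t-1} + eps_t, eps_t ~ N(0, sd**2), CF_0 = 1.

    A series stops after `exit_run` consecutive negative values; periods after
    exit are NaN. Returns an array of shape (n_series, max_len + 1).
    """
    rng = np.random.default_rng(seed)
    shocks = rng.normal(0.0, sd, size=(n_series, max_len))
    cf = np.hstack([np.ones((n_series, 1)), 1.0 + np.cumsum(shocks, axis=1)])
    out = np.full_like(cf, np.nan)
    out[:, 0] = cf[:, 0]
    alive = np.ones(n_series, dtype=bool)
    run = np.zeros(n_series, dtype=int)
    for t in range(1, max_len + 1):
        out[alive, t] = cf[alive, t]              # record only surviving series
        run = np.where(cf[:, t] < 0, run + 1, 0)  # length of current negative run
        alive &= run < exit_run                   # exit rule
    return out

paths = simulate_with_exit()
surv_mean = np.nanmean(paths, axis=0)             # mean over surviving series
survival = np.mean(~np.isnan(paths), axis=0)      # share of series still alive
print(surv_mean[[1, 10, 42]], survival[[1, 10, 42]])
```

Because exited series drop out of `np.nanmean`, the survivors’ mean drifts upwards even though every series has an unconditional expectation of 1.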

Fig. 4

Mean, median and the 2.5–97.5 percentiles of the simulated cash flow series. Note: Each simulated series has a maximum length of 42 periods and stops after five consecutive negative values. The simulation is run for 10,000 times the number of sample firms

To take survivorship bias into account, we calculate the survival rates of the firms. For each period k from 1 to 42, the survival rate is the number of firms that provide an observation k years after their initial time, divided by the number of firms that enter the sample early enough to be able to provide such an observation. For example, 17,043 firms provide observations one year after their initial time. For the 1-year survival rate, the denominator is the number of firms whose first observation appears in or before 2011, which is 18,257, so the 1-year survival rate is 93.35%. For the 42-year survival rate, the numerator is the number of firms that provide observations 42 years after their initial time, i.e., 294, and the denominator is the number of firms that started in or before 1970, i.e., 2,015. The survival rates, depicted in Fig. 5, indicate the proportion of firms that remain listed for a given length of time. They suggest that after 10 years less than half of the U.S. firms remain in the market, and after 42 years less than 15%. Therefore, when the mean of the cash flow series is calculated from the surviving firms, a large proportion of poorly performing firms is missing from the sample. As an adjustment, the mean of the cash flow series is multiplied by the survival rate; this is approximately equivalent to counting exited firms as having zero cash flow, and it should better describe the true unconditional expectation of the cash flow path. The adjusted mean series is depicted in Fig. 6, along with the original mean series. The adjusted mean suggests that, in general, a firm’s cash flow can be expected to grow to 7.58 times, rather than 52 times, its original value in 42 years, which implies an annual growth rate of roughly 5%.
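The survival-rate calculation and the adjustment can be sketched as follows. We assume each firm is represented by the calendar years in which it has an observation, and take the last usable year as a parameter; the cutoffs quoted above (2011 for k = 1 and 1970 for k = 42) are both consistent with a last usable year of 2012. The toy firm data are hypothetical; the final lines only re-check the arithmetic reported in the text.

```python
def survival_rate(firms, k, last_year):
    """Share of eligible firms that have an observation k years after their first one.

    firms: dict mapping firm id -> iterable of years with an observation.
    A firm is eligible if its first year is no later than last_year - k.
    """
    eligible = survived = 0
    for years in firms.values():
        years = set(years)
        first = min(years)
        if first <= last_year - k:
            eligible += 1
            survived += (first + k) in years
    return survived / eligible

def adjust_means(means, rates):
    """Survivorship-adjusted mean: conditional mean times survival rate."""
    return [m * s for m, s in zip(means, rates)]

toy = {"A": [2000, 2001, 2002], "B": [2000], "C": [2001, 2002], "D": [2002]}
print(survival_rate(toy, k=1, last_year=2002))  # A and C survive out of A, B, C

# Arithmetic reported in the text
print(round(17043 / 18257, 4), round(294 / 2015, 4))  # 0.9335 0.1459
print(adjust_means([52.0], [294 / 2015]))             # roughly 7.6, close to the reported 7.58
```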

Fig. 5

Survival rates of U.S. sample firms. Note: This figure depicts the survival rates of the U.S. firms, i.e., the proportion of firms that remain listed for a given number of years after their first observation

Fig. 6

The mean cash flow series adjusted for survivorship bias. Note: To approximate the true unconditional expectation of the cash flow path, the mean of the cash flow series is multiplied by the survival rates. Figure 6 plots the adjusted mean cash flow series along with the originally calculated mean cash flow series

3.2 Parameter estimation in the static model using different methods

From this section on, only publicly disclosed cash flow data are used to model cash flow. Following the criteria in the BCN paper, we exclude observations that belong to any of the following categories:

  • Financial services firms (SIC codes 6000-6999);

  • Sales less than $10 million;

  • Share price less than $1;

  • Earnings or cash flow in the extreme upper and lower 1 percent of their respective distributions.

This provides a sample of 99,845 firm-year observations. Table 1 provides descriptive statistics for cash flow, depreciation and amortisation, changes in accounts receivable, changes in accounts payable, changes in inventory and other accruals, with all variables deflated by each firm’s average total assets. On average, cash flow deflated by average total assets is about 0.07 for the sample firms, but the dispersion is very large, with a standard deviation of 0.13. In some extreme cases, the minimum and maximum cash flow observations exceed the firm’s average total assets in magnitude.
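The exclusion criteria listed above can be sketched in Python as below. The record field names, the assumption that sales are measured in millions of dollars, and the choice to compute the 1st and 99th percentile cutoffs on the full sample before the other filters are applied are all illustrative assumptions; the text does not specify the order of the steps (NumPy assumed).

```python
import numpy as np

def apply_bcn_filters(sample, tail_pct=1.0):
    """Drop firm-years matching any BCN-style exclusion criterion.

    sample: list of dicts with hypothetical keys 'sic', 'sales' ($ millions),
    'price', 'earnings' and 'cf'.
    """
    cuts = {}
    for key in ("earnings", "cf"):
        values = [obs[key] for obs in sample]
        cuts[key] = np.percentile(values, [tail_pct, 100 - tail_pct])
    kept = []
    for obs in sample:
        if 6000 <= obs["sic"] <= 6999:  # financial services firms
            continue
        if obs["sales"] < 10:           # sales below $10 million
            continue
        if obs["price"] < 1:            # share price below $1
            continue
        if any(not (cuts[k][0] <= obs[k] <= cuts[k][1]) for k in cuts):
            continue                    # extreme 1% tails of earnings or cash flow
        kept.append(obs)
    return kept

sample = [{"sic": 2000, "sales": 50, "price": 5, "earnings": float(i), "cf": float(i)}
          for i in range(101)]
sample.append({"sic": 6500, "sales": 50, "price": 5, "earnings": 50.0, "cf": 50.0})
kept = apply_bcn_filters(sample)
print(len(kept))
```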

The equation to be estimated is:

$$\begin{gathered} CF_{i,t + 1} = \beta_{i,0} + \beta_{1} CF_{i,t} + \beta_{2} \Delta INV_{i,t} + \beta_{3} \Delta AP_{i,t} + \beta_{4} \Delta AR_{i,t} \\ + \beta_{5} DA_{i,t} + \beta_{6} OTHER_{i,t} + \varepsilon_{i,t + 1} \\ \end{gathered}$$
(26)

where \(DA\) denotes depreciation and amortisation. Note that in the pooled regression the intercept term is assumed to be identical for all firms. First, for comparison with the results in the BCN paper, pooled regression is applied to data between 1987 and 1996. For the rest of the paper, the whole sample is partitioned into two subsamples: data from 1987 to 2005 are used for in-sample estimation, and data from 2006 to 2013 are used to compare out-of-sample prediction performance. The parameters in Eq. (26) are then estimated using four different methods: pooled regression, demean, first difference, and the Arellano-Bond estimator.
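To see why the estimators can disagree, the following sketch (NumPy assumed) simulates a hypothetical AR(1) panel with firm-specific intercepts and applies pooled OLS, the demean (within) transformation and first differencing to the AR coefficient. It deliberately omits the accrual regressors of Eq. (26) and the Arellano-Bond estimator, and the data-generating values are our assumptions. In this design, pooled OLS is biased upwards, the within estimate lies somewhat below the true value of 0.5 because of a small-T downward bias, and the first-difference OLS estimate is negative.

```python
import numpy as np

def simulate_ar1_panel(n_firms=2000, n_years=20, rho=0.5, burn=50, seed=1):
    """Hypothetical panel y_{i,t} = alpha_i + rho * y_{i,t-1} + e_{i,t}, alpha_i, e ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n_firms)
    y = np.zeros(n_firms)
    out = np.empty((n_firms, n_years))
    for t in range(burn + n_years):
        y = alpha + rho * y + rng.normal(size=n_firms)
        if t >= burn:                      # discard burn-in periods
            out[:, t - burn] = y
    return out

def slope(x, y, intercept=False):
    """OLS slope of y on x (optionally with a common intercept)."""
    X = np.column_stack([np.ones(x.size), x.ravel()]) if intercept else x.ravel()[:, None]
    coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
    return coef[-1]

Y = simulate_ar1_panel()
y, ylag = Y[:, 1:], Y[:, :-1]
pooled = slope(ylag, y, intercept=True)                   # ignores firm effects
within = slope(ylag - ylag.mean(axis=1, keepdims=True),    # demean each firm
               y - y.mean(axis=1, keepdims=True))
dY = np.diff(Y, axis=1)
first_diff = slope(dY[:, :-1], dY[:, 1:])                  # OLS on first differences
print(round(pooled, 3), round(within, 3), round(first_diff, 3))
```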

The estimation results are summarised in Table 2. Numbers in parentheses are t statistics based on heteroskedasticity-robust standard errors. The second column shows the results for the period between 1987 and 1996, based on 27,630 observations. The results are very close to those of the BCN paper: all the selected variables are both statistically and economically significant, and the signs of the parameters are consistent with those reported there. The rest of the table uses the estimation period from 1987 to 2005. Column 3 lists the pooled regression results, which do not deviate much from those obtained with the shorter period. The fourth, fifth and sixth columns report estimators that allow for individual effects, where the intercept terms vary across firms but are constant over time; the intercepts are therefore not shown in the table. Column 4 gives the estimates obtained with the demean method. The AR parameter, i.e., \(\beta_{1}\), differs markedly between this method and pooled regression. Pooled regression ignores the individual effect, which is positively correlated with lagged cash flow, and therefore tends to bias the AR parameter upwards. Accordingly, the demean method yields an AR parameter of 0.392, much lower than the 0.61 in the second column and the 0.69 in the third column, both estimated by pooled regression.

Table 2 The cash flow prediction model parameters estimated with panel data methods.

Column 5 provides the results estimated using the first differences of the variables. This method yields a negative AR parameter, which is statistically significant at the 1% level. The negative estimate results from the endogeneity introduced by first differencing (the differenced lagged cash flow is correlated with the differenced error term), so the estimator is inconsistent. Conclusions drawn from the other parameters are generally consistent with those in the previous three columns, except for the depreciation and amortisation term (\(\beta_{5}\)), which is no longer statistically significant when the first difference estimator is used. The results of the Arellano-Bond estimator, reported in column 6, also indicate that depreciation and amortisation is insignificant. The Arellano-Bond estimator is a GMM method, implemented here by assigning all independent variables as instrumental variables (IV), and it is designed to account for the endogeneity brought about by first differencing. Consistent with this, the AR parameter in column 6 is positive, in contrast with column 5. It is hard to draw a firm conclusion from these results. However, the results in Table 2 suggest that pooled regression, which is widely used in the extant literature, biases the AR parameter upwards and gives readers the false impression that cash flows are very persistent. In addition, there is no clear conclusion about whether depreciation and amortisation is significant in the cash flow model.
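The Arellano-Bond estimator itself is a multi-instrument GMM procedure and is best taken from an econometrics package. As a simpler relative that shows the same idea, the sketch below implements the Anderson-Hsiao instrumental-variable estimator, which instruments the differenced lagged term with the twice-lagged level y_{t-2}. This is a swapped-in illustration, not the estimator used for Table 2, and it runs on a simulated AR(1) panel with firm-specific intercepts whose parameters (true AR coefficient 0.5) are our assumptions (NumPy assumed).

```python
import numpy as np

def simulate_ar1_panel(n_firms=2000, n_years=20, rho=0.5, burn=50, seed=1):
    """Hypothetical panel y_{i,t} = alpha_i + rho * y_{i,t-1} + e_{i,t}."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n_firms)
    y = np.zeros(n_firms)
    out = np.empty((n_firms, n_years))
    for t in range(burn + n_years):
        y = alpha + rho * y + rng.normal(size=n_firms)
        if t >= burn:
            out[:, t - burn] = y
    return out

def anderson_hsiao(Y):
    """Just-identified IV estimate of rho in dy_t = rho * dy_{t-1} + de_t.

    The instrument y_{t-2} is uncorrelated with de_t = e_t - e_{t-1}
    but correlated with dy_{t-1}, so the estimate is consistent as N grows.
    """
    dY = np.diff(Y, axis=1)    # dY[:, j] = y_{j+1} - y_j
    target = dY[:, 1:]         # dy_t
    regressor = dY[:, :-1]     # dy_{t-1}
    instrument = Y[:, :-2]     # y_{t-2}
    return np.sum(instrument * target) / np.sum(instrument * regressor)

Y = simulate_ar1_panel()
dY = np.diff(Y, axis=1)
ols_fd = np.sum(dY[:, :-1] * dY[:, 1:]) / np.sum(dY[:, :-1] ** 2)  # biased OLS on differences
print(round(ols_fd, 3), round(anderson_hsiao(Y), 3))
```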

3.3 Parameter estimation in the grey-box models

The grey-box model does not assume that the parameters follow a linear random process; instead, it attempts to capture the parameters’ dynamics and heterogeneity through a deterministic function of an exogenous variable. The grey-box model can be written as:

$$\begin{aligned} CF_{i,t+1} = {} & \beta_{i,t,0} + \beta_{i,t,1} CF_{i,t} + \beta_{i,t,2} \Delta INV_{i,t} + \beta_{i,t,3} \Delta AP_{i,t} + \beta_{i,t,4} \Delta AR_{i,t} \\ & + \beta_{i,t,5} DA_{i,t} + \beta_{i,t,6} OTHER_{i,t} + \varepsilon_{i,t+1}, \\ \boldsymbol{\beta}_{i,t} = {} & \mathbf{F}(z_{t}) \end{aligned}$$
(27)

Each parameter is assumed to be a function of the variable z, and this function is approximated by a Padé approximant. In the previous sections, we have shown that the sales growth rate might have explanatory power for the parameters’ dynamics. Therefore, the lagged sales growth rate is a candidate for z:

$$\boldsymbol{\beta}_{i,t} = \mathbf{F}(r_{t-1})$$
(28)

Using the in-sample data, the coefficients of the Padé approximant are estimated by minimising the sum of squared prediction errors of the model in Eq. (27). For each parameter, there are 5 coefficients to be determined; together they numerically approximate the unknown functional form of F, which can be displayed graphically. Figure 7 plots how the 7 parameters in Eq. (27) vary with the lagged sales growth rate. The growth rate ranges from −1 (i.e., sales drop to zero) to 1 (i.e., sales double). In theory, sales can more than double and growth has no upper limit, but such cases are rare. All seven parameters show nonlinear patterns. Interestingly, as the sales growth rate rises, the coefficients on all predictors except the AR term tend to decline, so the distances between the AR parameter and the others widen. As the growth rate approaches 0, the parameters on cash flow, changes in accounts payable and changes in accounts receivable converge in absolute value. This finding implies that for mature firms, which have relatively low sales growth rates, the gain from disaggregating earnings into components to predict cash flow is smaller than for growing firms.
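A minimal sketch of the grey-box prediction in Eq. (27) follows, assuming, as in Eq. (29), that each of the seven parameters is a [2/2] Padé approximant with five coefficients, so that 35 coefficients are estimated in total. The function names and the coefficient values are hypothetical (NumPy assumed). Estimation would pass the residual function below to a general nonlinear least-squares routine, for example SciPy’s `scipy.optimize.least_squares`, which is not shown here.

```python
import numpy as np

def pade_2_2(z, a):
    """[2/2] Padé form (a0 + a1 z + a2 z^2) / (1 + a3 z + a4 z^2)."""
    a0, a1, a2, a3, a4 = a
    return (a0 + a1 * z + a2 * z**2) / (1.0 + a3 * z + a4 * z**2)

def grey_box_predict(X, z, A):
    """Eq. (27): prediction = sum_j beta_j(z) * X[:, j].

    X: (n, 7) columns [1, CF, dINV, dAP, dAR, DA, OTHER] at time t.
    z: (n,) input variable (e.g. lagged sales growth or firm age).
    A: (7, 5) Padé coefficients, one row per parameter.
    """
    betas = np.column_stack([pade_2_2(z, a) for a in A])  # (n, 7) varying parameters
    return np.sum(betas * X, axis=1)

def residuals(flat_coefs, X, z, y):
    """Prediction errors; their sum of squares is the estimation objective."""
    return y - grey_box_predict(X, z, flat_coefs.reshape(7, 5))

# With only a0 non-zero, every parameter is constant and the model is linear.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5), rng.normal(size=(5, 6))])
z = np.linspace(-0.5, 0.5, 5)
c = np.array([0.02, 0.6, 0.1, -0.2, 0.3, 0.1, 0.05])  # hypothetical constant parameters
A = np.zeros((7, 5))
A[:, 0] = c
print(np.allclose(grey_box_predict(X, z, A), X @ c))  # True
```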

Fig. 7

The association between parameter values and sales growth rates in the grey-box model. Note: Figure 7 plots how the 7 parameters in Eq. (27) vary with the lagged sales growth rate, which ranges from −1 (an extreme scenario in which sales drop to zero) to 1 (sales double)

There is one drawback in using the lagged sales growth rate in the cash flow prediction model: multi-period-ahead prediction requires forecasting sales growth rates, which adds complexity. Firm age could serve as an alternative proxy for growth. Using firm age in place of the lagged sales growth rate in Eq. (28) gives:

$$\boldsymbol{\beta}_{i,t} = \mathbf{F}(AGE_{i,t}) = \frac{a_{0} + a_{1}AGE_{i,t} + a_{2}AGE_{i,t}^{2}}{1 + a_{3}AGE_{i,t} + a_{4}AGE_{i,t}^{2}}$$
(29)

Firms’ growth rates tend to decline over time, which implies a negative relation between firm age and growth. The empirical results support this conjecture. Using all sample data, the mean sales growth rate is calculated for each age and plotted in Fig. 8. Because firm age is not available in the database, it is measured as the number of years since the firm’s first observation in the sample. Mean sales growth rates show a clear declining trend with firm age. The growth rates gradually fall over the first 10 years; after 20 years, they remain above 5 percent. After 40 years, the growth rates become spiky, probably due to the small sample size, and beyond 60 years no trend can be discerned. Age therefore seems to be an appropriate proxy for growth. Moreover, firm age is simpler and, since it advances by one each year, requires no prediction.

Fig. 8

The relationship between mean sales growth rate and firm age. Note: Fig. 8 depicts the mean sales growth rate for each firm age, where firm age is measured as the number of years since the firm’s first observation in the sample

The functions in Eq. (29) are fitted in the same way as for the lagged growth rate, and the results are plotted in Fig. 9. The parameters do not change monotonically with age: they tend to reach their extreme values at early ages and then approach fixed levels after 20 to 30 years. This implies that for firms older than 20 years, the grey-box model may differ little from the simple pooled regression model in predicting cash flow.

Fig. 9

The evolution of parameter values with firm age in the grey-box model (U.S. listed firms). Note: Fig. 9 reports how the parameter values evolve with firm age (up to age 100) in the grey-box model

3.4 In-sample fit of different models

The above sections have presented the estimation results of various models. The performance of the models can be examined by comparing both their in-sample fit and their out-of-sample performance. This section briefly reports the in-sample results of each model, assessed with two measures: mean squared error and average rank.

In general, more complicated models and/or models with more parameters are expected to fit the in-sample data better, but at the risk of over-fitting. The most complicated model in this study is the grey-box model. However, the linear panel models have more parameters than the grey-box model because they estimate an individual effect for each firm. Nonetheless, the grey-box model has an advantage in predicting for firms whose individual effect is hard to estimate, for instance a firm with only one observation. The models compared are:

  • Model 1: the random walk model;

  • Model 2: the theoretical DKW model, which predicts future cash flow as current cash flow plus the changes in working capital terms;

  • Model 3: the BCN model estimated by pooled regression;

  • Models 4, 5 and 6: the panel model that assumes homogeneous and constant parameters except for the intercept term, estimated using the demean (Model 4), first difference (Model 5) and Arellano-Bond (Model 6) estimators;

  • Models 7 and 8: the grey-box models using the sales growth rate (Model 7) and firm age (Model 8) as additional input variables.

The MSE and average rank of each model are shown in Table 3 (Panel A), based on 62,927 firm-year observations. The second row lists the mean squared errors (MSE), and the third row gives the average rank of each model; for both measures, a smaller number indicates better performance. The panel Models 4 and 6 produce the lowest MSEs because they estimate an individual effect for each firm. Model 5, however, has a higher MSE than Models 4 and 6. Recall that the AR parameter estimated by Model 5 is negative because of the bias introduced by first differencing, and the in-sample results suggest that this biased estimate does not produce proper predictions. Despite their lower MSEs, Models 4 and 6 have among the highest (i.e., worst) average ranks, exceeded only by Model 5. Therefore, the panel models may fit most individual firms less well. Moreover, Model 6, which uses the Arellano-Bond estimator that is considered consistent, does not outperform Model 4. Model 3 assumes complete homogeneity, even for the intercept term, and is estimated simply by pooled regression. Although it has a higher MSE than the panel models, its lower average rank indicates a more general description of the cash flow process. This is inconsistent with expectations from an econometric perspective, because estimates from pooled regression that ignore individual effects are likely to be biased. Models 1 and 2 predict cash flow naively and serve as benchmark models. Their MSEs are relatively large compared with the others, but their average ranks are lower than those of Models 3, 4, 5 and 6. Notably, Model 2 fits the in-sample data worse than Model 1, which implies that including accrual terms may not improve on the simple random walk model. The random walk model performs very well on the average rank criterion, second only to Model 8.
Models 7 and 8 are grey-box models, and they fit the data very well. Their MSEs are comparable with that of pooled regression and lower than those of the models with individual effects. However, the grey-box model with firm age as the input variable has the lowest average rank of all 8 models. Model 7 has the third lowest average rank, behind only Models 1 and 8.

Table 3 The in-sample fit and out-of-sample cash flow prediction performance

In summary, the grey-box models generally fit the in-sample data best compared with the other options, and firm age as the black-box input variable seems to work better than the sales growth rate. The random walk model, despite its high prediction error, may describe the cash flow process better than some parameterised models. To gain deeper insight into the models’ performance, an out-of-sample test is conducted.

3.5 Out-of-sample prediction performance of different models

As shown in Table 3 (Panel A), the grey-box models perform promisingly: not only are their MSEs as low as that of pooled regression, but their average ranks are also among the best. For practical prediction, however, out-of-sample performance is more important, since models that perform well in-sample do not necessarily retain their superiority out of sample. In this application, both one-period-ahead and multi-period-ahead cash flow predictions are important and useful, so this section tests the two types of predictions separately.

3.5.1 One-period-ahead prediction

Data after 2005 are used for the out-of-sample test. Applying the panel models (Models 4, 5 and 6) in the out-of-sample period requires the individual intercept of the target firm to be available, so observations of firms that do not appear before 2005 are excluded.

The MSE and average rank for the whole out-of-sample period are listed in Table 3 (Panel B), based on 17,965 firm-year observations. For each criterion, the best two models are shown in bold. The panel Models 4, 5 and 6 have higher in-sample MSE, and they also perform worst of all 8 models in the one-period-ahead forecast based on average rank. Model 5 gives the poorest predictions. Model 4 has the lowest MSE of the three panel models but the highest average rank of all eight models. Model 6, which applies the Arellano-Bond estimator, does not appear to be a good predictive model on either measure. The grey-box models perform strongly in this comparison, especially Model 8, which has both the lowest MSE and the lowest average rank. Model 7 has the second lowest MSE, and its average rank is in the middle of the eight models. The model with the second-best average rank is the benchmark Model 1, whose MSE is at the same level as that of Model 2. Model 3 provides medium performance, better than the panel models but worse than the grey-box models.

3.5.2 Multi-period-ahead prediction

Predictions beyond one period are made recursively: all predictor variables, not only cash flow, are forecast, and the predicted values are then used as inputs for the following periods’ predictions. The workload of multi-period-ahead prediction therefore increases, as each of the 6 predictors in the model must also be forecast into the future.

In this study, each predictor variable is forecast in the same way as cash flow, i.e., with the same independent variables, model structure and estimation method. Models 1 and 2 are exceptions: their multi-period-ahead predictions are simply the last in-sample observation of cash flow, or cash flow plus the changes in working capital accruals, respectively. Grey-box Model 7 is not suitable for the multi-period task because its input variable, the sales growth rate, would itself need to be forecast, which adds extra complexity. Therefore, only grey-box Model 8 is used, in place of Model 7, so seven models in total compete in out-of-sample multi-period-ahead prediction.
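The recursive scheme can be sketched as follows for models in which every predictor follows the same linear structure. We assume the six predictors are collected in one vector x_t and that a hypothetical function returns the intercepts and coefficient matrix for a given firm age; for the grey-box Model 8 these would depend on age, which simply advances by one each year, while for Models 3 to 6 they would be constant. The age-dependent parameters in the example are made up (NumPy assumed).

```python
import numpy as np

def recursive_forecast(x_t, age_t, horizon, params_for_age):
    """Forecast x_{t+1}, ..., x_{t+horizon} by feeding predictions back in.

    x_t: (m,) predictor values in the last in-sample year.
    params_for_age(age) -> (c, B): intercepts (m,) and coefficients (m, m)
    used to map x at that age into x one year later (hypothetical interface).
    """
    x = np.asarray(x_t, dtype=float)
    path = []
    for h in range(horizon):
        c, B = params_for_age(age_t + h)  # parameters indexed by age at the start of the step
        x = c + B @ x                     # predicted values become next step's inputs
        path.append(x)
    return np.array(path)

# Hypothetical age-dependent persistence that shrinks towards 0.5 as firms age
def params_for_age(age, m=6):
    persistence = 0.5 + 0.3 / (1.0 + age)
    return np.full(m, 0.01), persistence * np.eye(m)

path = recursive_forecast(np.full(6, 0.07), age_t=3, horizon=7, params_for_age=params_for_age)
print(path[:, 0])  # first predictor (cash flow) for years t+1, ..., t+7
```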

The parameters used for prediction are the in-sample estimates, and the predictors take their 2005 values as initial values. Table 3 (Panel C) reports the models’ performance in the multi-period-ahead setting. The predictions are compared with the sample data from 2006 to 2012, so the models’ performance for predictions from one up to seven years ahead can be examined. The results favour the grey-box model, which outperforms all other models on both criteria. The panel models generate higher MSEs than the simpler Models 1, 2 and 3 and the more complicated grey-box model. Counterintuitively, considering individual effects does not seem to help prediction in practice. Model 3 has the second lowest MSE, and Model 1 has the second lowest average rank; although simple, they provide good predictions in practice.

Another comparison excludes the observations in 2006 to focus on multi-period-ahead results; these results are shown in the 3rd and 4th rows of Table 3 (Panel C). The grey-box model still performs best on both criteria. In general, the conclusions for this sub-period do not change much, except that Model 2 outperforms Model 1 when the 2006 data are excluded. This suggests that the DKW assertion, that earnings predict cash flows better than cash flows themselves, is more descriptive in the long run.

3.6 Robustness check

To check the robustness of the proposed model, we examine two more datasets from different economies, the U.K. and China. In both markets, we find that grey-box models provide impressive and encouraging predictions of future cash flow, especially at shorter horizons. The simple pooled regression model also offers competitive performance and accurate predictions. Detailed discussions are presented in the Appendix. The empirical results shown in the Appendix support the model’s robustness across different economic environments.

4 Conclusion

This paper proposes a cash flow forecast model which captures the nonlinearity and dynamics of the cash flow process. Our model incorporates heterogeneity across firms in cash flow prediction by adopting a panel data setting, which has both time-series and cross-sectional dimensions. The nonlinearity is captured numerically by a black-box model, and the linear form is captured by a white-box model. Our model can therefore be regarded as a grey-box model, and it achieves a good balance between prediction accuracy and model complexity. Moreover, to incorporate the dynamics of cash flows, the parameters of the panel data models are treated as time-varying, with no linearity restrictions imposed on them.

To evaluate the performance of our new model, we conduct an empirical analysis of the U.S. data and compare various models’ forecasting performance by using multiple criteria. Both in-sample and out-of-sample performances are examined, the latter of which includes not only the one-period-ahead prediction but also multi-period-ahead predictions. The models used in the study are classified into two categories, i.e., panel data models that consider individual effects and grey-box models. In general, our empirical results show that the proposed grey-box model consistently outperforms other models both in-sample and out-of-sample, especially in multi-period-ahead predictions. In particular, we show that our model performs better not only before the global financial crisis, but also during the crisis period.

This paper helps improve our understanding of the cash flow generating process and highlights the importance of considering the nonlinearity and dynamics of model parameters when predicting cash flows. The empirical results show that the sales growth rate and firm age may be responsible for the variability of model parameters and for changes in the importance of the regressors. In firms’ early stages, when growth rates are high, lagged cash flow has greater predictive power than accruals. As firms mature and their growth rates decline, the predictive power of lagged cash flow and of the accrual terms tends to converge. Our grey-box model benefits practitioners (e.g., investors, analysts, and auditors) and researchers in the following ways: (1) it helps to better forecast future cash flows, which are key inputs of valuation models; and (2) it provides an example of retaining a simple model structure while allowing for dynamics and nonlinearity in the parameters.