
Forecasting US Unemployment with Radial Basis Neural Networks, Kalman Filters and Support Vector Regressions

Abstract

This study investigates the efficiency of radial basis function neural networks in forecasting US unemployment and explores the utility of the Kalman filter and support vector regression as forecast combination techniques. On the one hand, an autoregressive moving average model, a smooth transition autoregressive model and three different neural network architectures, namely a multi-layer perceptron, a recurrent neural network and a psi sigma network, serve as benchmarks for our radial basis function neural network. On the other hand, our forecast combination methods are benchmarked against a simple average and the least absolute shrinkage and selection operator. The statistical performance of our models is evaluated over the period 1972–2012, with the last 7 years reserved for out-of-sample testing. The results show that the radial basis function neural network statistically outperforms all individual models. The forecast combinations are also successful, since both the Kalman filter and support vector regression techniques improve statistical accuracy. Finally, support vector regression emerges as the superior model of the forecasting competition. The empirical evidence of this application is further validated by the use of the modified Diebold–Mariano test.


Notes

  1. The US unemployment rate or civilian unemployment rate represents the number of unemployed as a percentage of the labour force. Labour force data are restricted to people 16 years of age and older, who currently reside in one of the 50 states or the District of Columbia, who do not reside in institutions (e.g., penal and mental facilities, homes for the aged) and who are not on active duty in the Armed Forces. This is the definition provided by FRED.

  2. We also explored autoregressive terms of other US macroeconomic indicators (e.g. the consumer price index, the industrial production index, M1 money stock) as potential inputs. However, the set of inputs presented in Table 2 gave our NNs the best statistical performance in the test sub-period during our sensitivity analysis.

  3. The MDM test statistic follows Student's t-distribution with T-1 degrees of freedom, where T is the number of out-of-sample observations; a minimal sketch of the computation is given below.
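
The MDM statistic can be computed directly from the two competing series of forecast errors. The following is a minimal sketch of the small-sample correction of Harvey et al. (1997), assuming squared-error loss; the function name and signature are illustrative, not taken from the paper.

```python
# Minimal sketch of the modified Diebold-Mariano (MDM) test of
# Harvey, Leybourne and Newbold (1997); squared-error loss assumed.
import numpy as np
from scipy import stats

def mdm_test(e1, e2, h=1):
    """Two-sided MDM test on forecast errors e1, e2 at horizon h."""
    e1, e2 = np.asarray(e1), np.asarray(e2)
    d = e1**2 - e2**2               # loss differential under squared-error loss
    T = d.size
    d_bar = d.mean()
    # long-run variance of d_bar: autocovariances up to lag h-1
    gamma = [np.sum((d[k:] - d_bar) * (d[:T - k] - d_bar)) / T for k in range(h)]
    var_d_bar = (gamma[0] + 2.0 * sum(gamma[1:])) / T
    dm = d_bar / np.sqrt(var_d_bar)
    # Harvey et al. small-sample correction; compare with t_{T-1}
    correction = np.sqrt((T + 1 - 2 * h + h * (h - 1) / T) / T)
    mdm = correction * dm
    p_value = 2 * stats.t.sf(np.abs(mdm), df=T - 1)
    return mdm, p_value
```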

References

  • Barhoumi, K., Darné, O., & Ferrara, L. (2010). Are disaggregate data useful for factor analysis in forecasting French GDP? Journal of Forecasting, 29(1–2), 132–144.

  • Bates, J. M., & Granger, C. W. J. (1969). The combination of forecasts. Operational Research Quarterly, 20(4), 451–468.

  • Broomhead, D. S., & Lowe, D. (1988). Multivariable functional interpolation and adaptive networks. Complex Systems, 2, 321–355.

  • Cao, L. J., Chua, K. S., & Guan, L. K. (2003). C-ascending support vector machines for financial time series forecasting. In Proceedings of the IEEE International Conference on Computational Intelligence for Financial Engineering (pp. 317–323).

  • Chan, K. S., & Tong, H. (1986). On estimating thresholds in autoregressive models. Journal of Time Series Analysis, 7(3), 178–190.

  • Chan, Y. L., Stock, J. H., & Watson, M. W. (1999). A dynamic factor model framework for forecast combination. Spanish Economic Review, 1(2), 91–121.

  • Cherkassky, V., & Ma, Y. (2004). Practical selection of SVM parameters and noise estimation for SVM regression. Neural Networks, 17(1), 113–126.

  • Deutsch, M., Granger, C. W. J., & Teräsvirta, T. (1994). The combination of forecasts using changing weights. International Journal of Forecasting, 10(1), 47–57.

  • Diebold, F. X., & Mariano, R. S. (1995). Comparing predictive accuracy. Journal of Business & Economic Statistics, 13(3), 253–263.

  • Duan, K., Keerthi, S. S., & Poo, A. N. (2003). Evaluation of simple performance measures for tuning SVM hyperparameters. Neurocomputing, 51, 41–59.

  • Fernandes, M., Medeiros, M. C., & Scharth, M. (2014). Modeling and predicting the CBOE market volatility index. Journal of Banking & Finance, 40, 1–10.

  • Harvey, D., Leybourne, S., & Newbold, P. (1997). Testing the equality of prediction mean squared errors. International Journal of Forecasting, 13(2), 281–291.

  • Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). New York: Springer.

  • Hatemi-J, A., & Roca, E. (2006). Calculating the optimal hedge ratio: Constant, time varying and the Kalman filter approach. Applied Economics Letters, 13(5), 293–299.

  • Hiemstra, Y. (1996). Linear regression versus back propagation networks to predict quarterly stock market excess returns. Computational Economics, 9(1), 67–76.

  • Huang, S. C., Wang, N. Y., Li, T. Y., Lee, Y. C., Chang, L. F., & Pan, T. H. (2013). Financial forecasting by modified Kalman filters and kernel machines. Journal of Statistics and Management Systems, 16(2–3), 163–176.

  • Ince, H., & Trafalis, T. B. (2008). Short term forecasting with support vector machines and application to stock price prediction. International Journal of General Systems, 37(6), 677–687.

  • Kapetanios, G., Labhard, V., & Price, S. (2008). Forecast combination and the Bank of England's suite of statistical forecasting models. Economic Modelling, 25(4), 772–792.

  • Kim, H. S., & Sohn, S. Y. (2010). Support vector machines for default prediction of SMEs based on technology credit. European Journal of Operational Research, 201(3), 838–846.

  • Koop, G., & Potter, S. M. (1999). Dynamic asymmetries in U.S. unemployment. Journal of Business & Economic Statistics, 17(3), 298–312.

  • Liang, F. (2005). Bayesian neural networks for nonlinear time series forecasting. Statistics and Computing, 15(1), 13–29.

  • Lin, C. J., & Teräsvirta, T. (1994). Testing the constancy of regression parameters against continuous structural changes. Journal of Econometrics, 62(2), 211–228.

  • Lu, C. J., Lee, T. S., & Chiu, C. C. (2009). Financial time series forecasting using independent component analysis and support vector regression. Decision Support Systems, 47(2), 115–125.

  • Milas, C., & Rothman, P. (2008). Out-of-sample forecasting of unemployment rates with pooled STVECM forecasts. International Journal of Forecasting, 24(1), 101–121.

  • Montgomery, A. L., Zarnowitz, V., Tsay, R. S., & Tiao, G. C. (1998). Forecasting the U.S. unemployment rate. Journal of the American Statistical Association, 93(442), 478–493.

  • Moshiri, S., Cameron, N. E., & Scuse, D. (1999). Static, dynamic, and hybrid neural networks in forecasting inflation. Computational Economics, 14(3), 219–235.

  • Moshiri, S., & Brown, L. (2004). Unemployment variation over the business cycles: A comparison of forecasting models. Journal of Forecasting, 23(7), 497–511.

  • Newbold, P., & Granger, C. W. J. (1974). Experience with forecasting univariate time series and the combination of forecasts. Journal of the Royal Statistical Society: Series A (General), 137(2), 131–165.

  • Olmedo, E. (2014). Forecasting Spanish unemployment using near neighbor and neural net techniques. Computational Economics, 43(2), 183–197.

  • Özkan, F. (2013). Comparing the forecasting performance of neural network and purchasing power parity: The case of Turkey. Economic Modelling, 31, 752–758.

  • Reboredo, J. C., Matías, J. M., & Garcia-Rubio, R. (2012). Nonlinearity in forecasting of high-frequency stock returns. Computational Economics, 40(3), 245–264.

  • Rothman, P. (1998). Forecasting asymmetric unemployment rates. The Review of Economics and Statistics, 80(1), 164–168.

  • Sermpinis, G., Dunis, C., Laws, J., & Stasinakis, C. (2012). Forecasting and trading the EUR/USD exchange rate with stochastic neural network combination and time-varying leverage. Decision Support Systems, 54(1), 316–329.

  • Shapiro, A. F. (2000). A Hitchhiker's guide to the techniques of adaptive nonlinear models. Insurance: Mathematics and Economics, 26(2–3), 119–132.

  • Shin, Y., & Ghosh, J. (1991). The pi–sigma network: An efficient higher-order neural network for pattern classification and function approximation. Proceedings of the International Joint Conference on Neural Networks, 1, 13–18.

  • Skalin, J., & Teräsvirta, T. (2002). Modeling asymmetries and moving equilibria in unemployment rates. Macroeconomic Dynamics, 6(2), 202–241.

  • Sundberg, R. (2006). Shrinkage regression. In A. H. El-Shaarawi & W. W. Piegorsch (Eds.), Encyclopedia of environmetrics (Vol. 4, pp. 1994–1998). New York: Wiley.

  • Suykens, J. A. K., De Brabanter, J., Lukas, L., & Vandewalle, J. (2002). Weighted least squares support vector machines: Robustness and sparse approximation. Neurocomputing, 48(1–4), 85–105.

  • Swanson, N. R., & Zeng, T. (2001). Choosing among competing econometric forecasts: Regression-based forecast combination using model selection. Journal of Forecasting, 20(6), 425–440.

  • Szpiro, G. G. (1997). A search for hidden relationships: Data mining with genetic algorithms. Computational Economics, 10(3), 267–277.

  • Tenti, P. (1996). Forecasting foreign exchange rates using recurrent neural networks. Applied Artificial Intelligence, 10(6), 567–581.

  • Teräsvirta, T., van Dijk, D., & Medeiros, M. C. (2005). Linear models, smooth transition autoregressions, and neural networks for forecasting macroeconomic time series: A re-examination. International Journal of Forecasting, 21(4), 755–774.

  • Tibshirani, R. (2011). Regression shrinkage and selection via the lasso: A retrospective. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(3), 273–282.

  • Vapnik, V. N. (1995). The nature of statistical learning theory. Berlin: Springer.

  • Vasnev, A., Skirtun, M., & Pauwels, L. (2013). Forecasting monetary policy decisions in Australia: A forecast combinations approach. Journal of Forecasting, 32(2), 151–166.

  • Wang, H., Li, G., & Jiang, G. (2007). Robust regression shrinkage and consistent variable selection through the LAD-Lasso. Journal of Business & Economic Statistics, 25(3), 347–355.

  • Wells, C. (1996). The Kalman filter in finance. Dordrecht: Kluwer Academic.

  • Xu, W., Li, Z., Cheng, C., & Zheng, T. (2013). Data mining for unemployment rate prediction using search engine query data. Service Oriented Computing and Applications, 7(1), 33–42.

  • Yu, L., & Yao, X. (2013). A total least squares proximal support vector classifier for credit risk evaluation. Soft Computing, 17(4), 643–650.

  • Zhang, G., Patuwo, B. E., & Hu, M. Y. (1998). Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting, 14(1), 35–62.

  • Zhang, G. P., & Qi, M. (2005). Neural network forecasting for seasonal and trend time series. European Journal of Operational Research, 160(2), 501–514.


Author information

Corresponding author

Correspondence to Charalampos Stasinakis.

Appendices

Appendix 1: NNs’ Structure and Training Characteristics

This appendix briefly describes the structure of the three traditional NNs used to benchmark the RBFNN and summarizes the training characteristics of all four NNs.

Firstly, a typical MLP model is shown in Fig. 4.

Fig. 4 A single-output, fully connected MLP model (bias nodes are not shown for simplicity), where \(x_t^{[n]} \left( {n=1,2,\ldots ,k+1} \right) \) are the inputs at time \(t\) (including the input bias node); \(h_t^{[m]} \left( {m=1,2,\ldots ,j+1} \right) \) are the hidden node outputs; \(\hat{Y}_t \) is the MLP output (target value); \(u_{jk}, w_j \) are the network weights; the hidden transfer function is the sigmoid \(S\left( x \right) =\frac{1}{1+e^{-x}}\); and the output transfer function is linear, \(F\left( x \right) =\sum _i {x_i}\)

The error function to be minimized is:

$$\begin{aligned} E\left( u_{jk}, w_j \right) = \frac{1}{T}\sum _{t=1}^{T} \left( Y_t - \hat{Y}_t \left( u_{jk}, w_j \right) \right)^{2} \end{aligned}$$
(18)
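
For concreteness, the following is a minimal numpy sketch of the Fig. 4 architecture, trained by batch gradient descent on the error of Eq. (18). The layer sizes, learning rate, iteration count and placeholder data are illustrative assumptions, not the settings of the paper (those are summarized in Table 6).

```python
# Minimal single-output MLP matching Fig. 4: sigmoid hidden layer,
# linear output, batch gradient descent on the MSE of Eq. (18).
import numpy as np

rng = np.random.default_rng(0)
k, j = 4, 6                                   # inputs and hidden nodes (illustrative)
U = rng.normal(scale=0.1, size=(j, k + 1))    # input-to-hidden weights u_jk (incl. bias)
w = rng.normal(scale=0.1, size=j + 1)         # hidden-to-output weights w_j (incl. bias)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(X):
    Xb = np.column_stack([X, np.ones(len(X))])   # append input bias node
    H = sigmoid(Xb @ U.T)                        # hidden node outputs h_t
    Hb = np.column_stack([H, np.ones(len(H))])   # append hidden bias node
    return Hb @ w, Xb, Hb                        # linear output Y_hat

X, Y = rng.normal(size=(200, k)), rng.normal(size=200)  # placeholder data
eta = 0.05
for _ in range(2000):
    Y_hat, Xb, Hb = forward(X)
    err = Y_hat - Y                              # gradient direction of Eq. (18)
    w -= eta * (Hb.T @ err) / len(Y)
    dH = np.outer(err, w[:-1]) * Hb[:, :-1] * (1 - Hb[:, :-1])  # backprop through sigmoid
    U -= eta * (dH.T @ Xb) / len(Y)
```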

Secondly, the simple architecture of an RNN is presented in Fig. 5.

Fig. 5 RNN with two nodes in the hidden layer, where \(x_t^{[n]} (n=1,2,\ldots ,k+1), u_t^{[1]}, u_t^{[2]} \) are the RNN inputs at time \(t\) (including the bias node); \(\tilde{y}_t \) is the output of the RNN; \(d_t^{[f]} (f=1,2)\) and \(w_t^{[n]} (n=1,2,\ldots ,k+1)\) are the weights of the network; \(U_t^{[f]}, f=(1,2)\) is the output of the hidden nodes at time \(t\); the hidden transfer function is the sigmoid \(S\left( x \right) =\frac{1}{1+e^{-x}}\); and the output transfer function is linear, \(F\left( x \right) =\sum _i {x_i}\)

The error function to be minimized is:

$$\begin{aligned} E\left( d_t, w_t \right) = \frac{1}{T}\sum _{t=1}^{T} \left( y_t - \tilde{y}_t \left( d_t, w_t \right) \right)^{2} \end{aligned}$$
(19)
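
In the same spirit, here is a minimal sketch of the forward pass of the Fig. 5 architecture, showing how the previous hidden-node outputs re-enter as extra inputs. The weights are random placeholders and the training loop on Eq. (19) is omitted for brevity; all sizes are illustrative assumptions.

```python
# Forward pass of the Elman-style RNN of Fig. 5: the previous hidden-node
# outputs U_{t-1}^{[1]}, U_{t-1}^{[2]} feed back as additional inputs.
import numpy as np

rng = np.random.default_rng(1)
k, f = 4, 2                                      # external inputs, hidden nodes
W = rng.normal(scale=0.1, size=(f, k + 1 + f))   # weights on [x_t, bias, U_{t-1}]
d = rng.normal(scale=0.1, size=f)                # hidden-to-output weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rnn_forward(X):
    U_prev = np.zeros(f)                         # recurrent state, zero-initialised
    outputs = []
    for x_t in X:
        z = np.concatenate([x_t, [1.0], U_prev])  # inputs incl. bias and feedback
        U_prev = sigmoid(W @ z)                   # hidden node outputs U_t
        outputs.append(d @ U_prev)                # linear output y_tilde_t
    return np.array(outputs)

y_tilde = rnn_forward(rng.normal(size=(50, k)))   # placeholder input series
```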

Thirdly, Fig. 6 describes the PSN architecture.

Fig. 6 A PSN with one output layer, where \(x_t^{[n]} (n=1,2,\ldots ,k+1)\) are the model inputs; \(y_t, \tilde{y}_t \) are the PSN target and output respectively; \(w_j (j=1,2,\ldots ,k)\) are the adjustable weights (\(k\) is the desired order of the network); the hidden layer activation function is \(h\left( x \right) =\sum _i {x_i}\); and the output activation function is the sigmoid \(\sigma \left( x \right) =\frac{1}{1+e^{-xc}}\), with \(c\) the adjustable term

The error function to be minimized in this case is:

$$\begin{aligned} E\left( c, w_j \right) = \frac{1}{T}\sum _{t=1}^{T} \left( y_t - \tilde{y}_t \left( w_j, c \right) \right)^{2} \end{aligned}$$
(20)
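
Finally, a minimal sketch of the Fig. 6 forward pass under the same caveats: the network order, input dimension and bias handling are illustrative assumptions, and the training loop on Eq. (20) is omitted.

```python
# Minimal pi-sigma network of order K (Fig. 6): K adjustable summing units
# feed a fixed product unit, passed through a sigmoid with adjustable slope c.
import numpy as np

rng = np.random.default_rng(2)
n, K = 5, 2                                   # inputs (incl. bias column), network order
W = rng.normal(scale=0.1, size=(K, n))        # adjustable weights w_j
c = 1.0                                       # adjustable sigmoid slope

def psn_forward(X):
    S = X @ W.T                               # summing units: h_j = sum_i w_ji * x_i
    P = np.prod(S, axis=1)                    # fixed product over the K summing units
    return 1.0 / (1.0 + np.exp(-c * P))       # output sigmoid with slope c

# usage: 4 external inputs plus a bias column of ones
y_tilde = psn_forward(np.column_stack([rng.normal(size=(50, n - 1)), np.ones(50)]))
```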

Finally, Table 6 summarizes the training characteristics of the four NN architectures used in this forecasting competition.

Table 6 The NNs' training characteristics

Appendix 2: Statistical Performance Measures

The statistical performance measures are calculated as shown in Table 7.

Table 7 Statistical Performance Measures and Calculation
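
Table 7 itself is not shown in this excerpt. Assuming it contains the measures standard in this literature (MAE, MAPE, RMSE and Theil's U), a minimal sketch of their computation, with illustrative function names, is:

```python
# Common statistical performance measures for point forecasts; the exact
# contents of Table 7 are not visible here, so this set is an assumption.
import numpy as np

def performance_measures(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    e = y - y_hat
    mae = np.mean(np.abs(e))
    mape = np.mean(np.abs(e / y)) * 100.0      # assumes y contains no zeros
    rmse = np.sqrt(np.mean(e**2))
    # Theil's U1: bounded in [0, 1], 0 indicating a perfect forecast
    theil_u = rmse / (np.sqrt(np.mean(y**2)) + np.sqrt(np.mean(y_hat**2)))
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "Theil-U": theil_u}
```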

About this article


Cite this article

Stasinakis, C., Sermpinis, G., Theofilatos, K. et al. Forecasting US Unemployment with Radial Basis Neural Networks, Kalman Filters and Support Vector Regressions. Comput Econ 47, 569–587 (2016). https://doi.org/10.1007/s10614-014-9479-y
