# A Bayesian Signals Approach for the Detection of Crises

• Original Article

## Abstract

In this paper, we consider the signals approach as an early-warning system to detect crises. Crisis detection from a signals approach involves Type I and II errors which are handled through a utility function. We provide a Bayesian model and we test the effectiveness of the signals approach in three data sets: (1) 76 currency and 26 banking crises in 15 developing and 5 industrial countries between 1970 and 1995, (2) costly asset price booms using quarterly data ranging from 1970 to 2007, and (3) public debt crises in 11 countries of the European Monetary Union from the introduction of the Euro until November 2011. The Bayesian model relies on a vector autoregression for indicator variables, and incorporates dynamic factors, time-varying weights in the latent composite indicator and special priors to avoid the proliferation of parameters. The Bayesian vector autoregressions are extended to a semi-parametric context to capture non-linearities. Our evidence reveals that our approach is successful as an early-warning mechanism after allowing for breaks and nonlinearities and, perhaps more importantly, the composite indicator is better represented as a flexible nonlinear function of the underlying indicators.


## Notes

1. For an extensive review, see EKS and the references therein.

2. For a variable $$\varvec{x}_{t}$$, $$L^{l} \varvec{x}_{t} : = \varvec{x}_{t - l}$$. Moreover, $$\varvec{A}_{n} \left( L \right)\varvec{f}_{t} : = \varvec{A}_{0} \varvec{f}_{t} + \varvec{A}_{1} \varvec{f}_{t - 1} + \varvec{A}_{2} \varvec{f}_{t - 2} + \cdots + \varvec{A}_{l} \varvec{f}_{t - l}$$, where the matrices $$\varvec{A}_{j}$$ have dimension $$M \times G$$, for $$j = 0,1, \ldots ,l$$.

3. We have $$\varvec{B}\left( L \right)\varvec{f}_{t} = \varvec{B}_{1} \varvec{f}_{t - 1} + \varvec{B}_{2} \varvec{f}_{t - 2} + \cdots$$, and similarly for $$\varvec{C}\left( L \right)$$ and $$\varvec{D}\left( L \right)$$ below.

4. This is, in fact, a gamma prior. Its interpretation is that from a sample of size $$\bar{n}_{\lambda }$$ from $${\mathcal{N}}\left( {0,\lambda^{2} } \right)$$ we obtain a sum of squares equal to $$\bar{q}_{\lambda }$$.

5. Here, we use the $$L_{\infty }$$-norm in $${\Re }^{{d_{\varvec{h}} }}$$.

6. It should be noted that the Bayesian LASSO prior shrinks coefficients toward zero, but it does not set any coefficient exactly to zero.

7. As we have Gaussian hidden units, global approximation results are provided by Hartman et al. (1990).

8. All computations are performed in Fortran 77 making extensive use of netlib and IMSL software libraries. The platform is an Intel® Core™ i9-7900X CPU @ 3.30 GHz with 32 GB RAM, running Windows 10 and the GNU compiler for Fortran 77. With 15,000 MCMC iterations (omitting the first 5000 to mitigate start-up effects) and $$10^{7}$$ particles, computations take, on average, 38.77 min of CPU time.

9. The number of alternative configurations is set to 1000 to minimize computational effort without sacrificing thorough sensitivity analysis.

10. According to Kass and Raftery (1995), these bounds should be $$1/e \cong 0.37$$ and $$e \cong 2.72$$. However, these bounds apply to Bayes factors. Here, they are used as rough bounds to determine the “significance” of RPS.

11. The benefit of MALA over Random-Walk-Metropolis arises when the number of parameters $$n$$ is large. This happens because the scaling parameter $$\lambda$$ is $$O\left( {n^{ - 1/2} } \right)$$ for Random-Walk-Metropolis but it is $$O\left( {n^{ - 1/6} } \right)$$ for MALA, see Roberts et al. (1997) and Roberts and Rosenthal (1998).

## References

• Abiad, A. 2003. Early warning systems: a survey and a regime-switching approach, IMF Working Paper 32.

• Alessi, L., and C. Detken. 2011. Quasi real time early warning indicators for costly asset price Boom/Bust cycles: a role for global liquidity. European Journal of Political Economy 27 (3): 520–533.

• Andrieu, C., and G.O. Roberts. 2009. The pseudo-marginal approach for efficient Monte Carlo computations. The Annals of Statistics 37 (2): 697–725.

• Andrieu, C., A. Doucet, and R. Holenstein. 2010. Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72: 269–342.

• Bai, J., and S. Ng. 2005. Tests for skewness, kurtosis, and normality for time series data. Journal of Business and Economic Statistics 23 (1): 49–60.

• Basak, G.K., P.K. Das, and A. Rohit. 2017. Capital inflow-terms of trade ‘nexus’: does it lead to financial crisis? Economic Modelling 65: 18–29.

• Berg, A., and C. Pattillo. 1999. Predicting currency crises: the indicators approach and an alternative. Journal of International Money and Finance 18 (4): 561–586.

• Billio, M., R. Casarin, M. Costola, and A. Pasqualini. 2016. An entropy-based early warning indicator for systemic risk. Journal of International Financial Markets, Institutions and Money 45: 42–59.

• Borio, C., and M. Drehmann. 2009. Assessing the risk of banking crises–revisited. BIS Quarterly Review, 1–18.

• Bussière, M., and M. Fratzscher. 2006. Towards a new early warning system of financial crises. Journal of International Money and Finance 25 (6): 953–973.

• Caggiano, G., P. Calice, L. Leonida, and G. Kapetanios. 2016. Comparing logit-based early warning systems: does the duration of systemic banking crises matter? Journal of Empirical Finance 37: 104–116.

• Carlin, B.P., and T.A. Louis. 2000. Bayes and empirical bayes methods for data analysis, 2nd ed. London: Chapman & Hall.

• Casarin, R., and J.-M. Marin. 2007. Online data processing: comparison of Bayesian regularized particle filters. University of Brescia, Department of Economics, Working Paper n. 0704.

• Cheng, X., and H. Zhao. 2019. Modeling, analysis and mitigation of contagion in financial systems. Economic Modelling 76: 281–292.

• Caprio, C., and D. Klingebiel. 1996. Bank insolvencies: cross-country experience. World Bank Policy Research Working Paper 1620.

• Cheng, B., and D.M. Titterington. 1994. Neural networks: a review from a statistical perspective. Statistical Science 9 (1): 2–30.

• Lo Duca, M., A. Koban, M. Basten, E. Bengtsson, B. Klaus, P. Kusmierczyk, J.H. Lang, C. Detken, and T. Peltonen. 2017. A new database for financial crises in European countries. Occasional Paper Series, No 194, ECB.

• Doucet, A., S. Godsill, and C. Andrieu. 2000. On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing 10 (3): 197–208.

• Doucet, A., N. de Freitas, and N. Gordon (eds.). 2001. Sequential Monte Carlo methods in practice. Berlin: Springer.

• Drakos, A.A., and G.P. Kouretas. 2015. The conduct of monetary policy in the Eurozone before and after the financial crisis. Economic Modelling 48: 83–92.

• Edison, H. 2003. Do indicators of financial crises work? An evaluation of an early warning system. International Journal of Finance and Economics 8 (1): 11–53.

• El-Shagi, M., T. Knedlik, and G. von Schweinitz. 2013. Predicting financial crises: the (statistical) significance of the signals approach. Journal of International Money and Finance 35: 76–103.

• Fearnhead, P., and P. Clifford. 2003. Online inference for hidden Markov models via particle filters. Journal of the Royal Statistical Society: Series B 65: 887–899.

• Fearnhead, P., O. Papaspiliopoulos, and G.O. Roberts. 2008. Particle filters for partially observed diffusions. Journal of the Royal Statistical Society: Series B 70: 1–28.

• Flury, T., and N. Shephard. 2011. Bayesian inference based only on a simulated likelihood. Econometric Theory 27: 933–956.

• Frankel, J.A., and A.K. Rose. 1996. Currency crashes in emerging markets: an empirical treatment. Journal of International Economics 41 (3–4): 351–366.

• Fuertes, A.-M., and E. Kalotychou. 2006. Early warning systems for sovereign debt crises: the role of heterogeneity. Computational Statistics & Data Analysis 51: 1420–1441.

• Geweke, J., and G. Amisano. 2010. Evaluating the predictive distributions of Bayesian models of asset returns. International Journal of Forecasting 26: 216–230.

• Geweke, J., and G. Amisano. 2011. Optimal prediction pools. Journal of Econometrics 164: 130–141.

• Guerreiro, D. 2014. Is the European debt crisis a mere balance of payments crisis? Economic Modelling 44: S50–S56.

• Gómez-Puig, M., and S. Sosvilla-Rivero. 2016. Causes and hazards of the euro area sovereign debt crisis: pure and fundamentals-based contagion. Economic Modelling 56: 133–147.

• Lopes, H.F., and R.S. Tsay. 2011. Particle filters and bayesian inference in financial econometrics. Journal of Forecasting 30: 168–209.

• Hans, C. 2009. Bayesian lasso regression. Biometrika 96 (4): 835–845.

• Hartman, E.J., J.D. Keeler, and J.M. Kowalski. 1990. Layered neural networks with Gaussian hidden units as universal approximations. Neural Computation 2 (2): 210–215.

• Heryán, T., and P.G. Tzeremes. 2017. The bank lending channel of monetary policy in EU countries during the global financial crisis. Economic Modelling 67: 10–22.

• Hoggarth, G., R. Reis, and V. Saporta. 2002. Costs of banking system instability: some empirical evidence. Journal of Banking & Finance 26: 825–855.

• Hornik, K., M. Stinchcombe, and H. White. 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2 (5): 359–366.

• Kamin, S., J. Schindler, and S. Samuel. 2001. The contribution of domestic and external factors to emerging market devaluation crises: an early warning systems approach. FRB International Finance Discussion Paper 711: 56.

• Kaminsky, G.L., S. Lizondo, and C.M. Reinhart. 1998. Leading indicators of currency crises. IMF Staff Papers 45 (1): 1–48.

• Kaminsky, G.L., and C.M. Reinhart. 1999. The twin crises: the causes of banking and balance-of-payments problems. American Economic Review 89 (3): 473–500.

• Kaminsky, G.L. 2006. Currency crises: Are they all the same? Journal of International Money and Finance 25: 503–527.

• Kass, R.E., and A.E. Raftery. 1995. Bayes factors. Journal of the American Statistical Association 90 (430): 773–795.

• Kittelmann, K., M. Tirpak, M. Schweickert, and L. Vinhas De Souza. 2006. From transition crises to macroeconomic stability? Lessons from a crises early warning system for Eastern European and CIS countries. Comparative Economic Studies 48 (3): 410–437.

• Knedlik, T., and R. Scheufele. 2008. Forecasting currency crises: which methods signaled the South African crisis of June 2006? South African Journal of Economics 76 (3): 367–383.

• Knedlik, T., and G. von Schweinitz. 2012. Macroeconomic imbalances as indicators for debt crises in Europe. Journal of Common Market Studies 50 (5): 726–745.

• Koop, G., and D.J. Poirier. 2004. Bayesian variants of some classical semiparametric regression techniques. Journal of Econometrics 123: 259–282.

• Koop, G., D.J. Poirier, and J. Tobias. 2005. Bayesian semiparametric inference in multiple equation models. Journal of Applied Econometrics 20: 723–747.

• Laeven, L., and F. Valencia. 2012. Systemic banking crises database: an update. IMF Working Paper, No WP/12/163, IMF.

• Lin, F., D. Liang, C.-C. Yeh, and J.C. Huang. 2014. Novel feature selection methods to financial distress prediction. Expert Systems with Applications 41: 2472–2483.

• Liu, J., and M. West. 2001. Combined parameter and state estimation in simulation-based filtering. In Sequential Monte Carlo methods in practice, ed. A. Doucet, N. de Freitas, and N. Gordon. Berlin: Springer.

• Nemeth, C., C. Sherlock, and P. Fearnhead. 2016. Particle metropolis-adjusted Langevin algorithms. Biometrika 103: 701–717.

• Mariano, S., B. Gultekin, S. Ozmucur, T. Shabbir, and C. Alper. 2004. Prediction of currency crises: case of Turkey. Review of Middle East Economics and Finance 2 (2): 87–107.

• Park, T., and G. Casella. 2008. The Bayesian Lasso. Journal of the American Statistical Association 103 (482): 681–686.

• Pitt, M.K., and N. Shephard. 1999. Filtering via simulation: auxiliary particle filters. Journal of the American Statistical Association 94: 590–599.

• Pitt, M.K., R. Silva, P. Giordani, and R. Kohn. 2012. On some properties of Markov chain Monte Carlo simulation methods based on the particle filter. Journal of Econometrics 171 (2): 134–151.

• Qin, X., and C. Luo. 2014. Capital account openness and early warning system for banking crises in G20 countries. Economic Modelling 39: 190–194.

• Ristic, B., S. Arulampalam, and N. Gordon (eds.). 2004. Beyond the kalman filter: particle filters for tracking applications. Boston: Artech House.

• Robert, C.P. 2001. The Bayesian Choice, 2nd ed. New York: Springer.

• Roberts, G.O., A. Gelman, and W. Gilks. 1997. Weak convergence and optimal scaling of the random walk metropolis algorithms. The Annals of Applied Probability 7 (1): 110–120.

• Roberts, G.O., and J.S. Rosenthal. 1998. Optimal scaling of discrete approximations to Langevin diffusions. Journal of the Royal Statistical Society: Series B 60 (1): 255–268.

• Tibshirani, R. 1996. Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society: Series B 58: 267–288.

• Tierney, L. 1994. Markov chains for exploring posterior distributions. Annals of Statistics 22: 1701–1762.

• Ureche-Rangau, L., and A. Burietz. 2013. One crisis, two crises…the subprime crisis and the European sovereign debt problems. Economic Modelling 35: 35–44.

• Wahba, G. 1990. Spline models for observational data. Philadelphia: SIAM.

• Wasserman, L. 2004. All of statistics: a concise course in statistical inference. New York: Springer.

## Author information

### Corresponding author

Correspondence to Panos Xidonas.

### Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Appendices

### Derivation of Likelihood

We summarize the model below.

Conditional on $$1 \le G < n$$ global factors $$\varvec{f}_{t}$$ we have:

$$\varvec{X}_{t}^{\left( n \right)} = \varvec{A}_{n} \left( L \right)\varvec{f}_{t} + \varvec{v}_{t}^{\left( n \right)} ,\,\forall n \in {\mathbb{N}},$$
(34)

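For the error term $$\xi_{t}$$ of the factor equation, $$\varvec{f}_{t} = \varvec{B}\left( L \right)\varvec{f}_{t} + \varvec{C}\left( L \right)X_{t}^{*} + \xi_{t}$$, we assume: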

$$\xi_{t} \sim {\mathcal{N}}\left( {0,\varSigma_{\xi } } \right)\,\forall t \in {\mathbb{T}}.$$
(35)

For the individual country-specific components we assume:

$$\varvec{v}_{t}^{\left( n \right)} = \varvec{D}\left( L \right)\varvec{v}_{t}^{\left( n \right)} + \varvec{c}_{t}^{\left( n \right)} +\varvec{\varepsilon}_{t}^{\left( n \right)} ,$$
(36)

These country-specific effects have the following structure:

$$\varvec{c}_{t}^{\left( n \right)} = \varvec{c}_{t - 1}^{\left( n \right)} + 1_{M} \cdot \gamma_{b}^{\left( n \right)} D_{b,t}^{\left( n \right)} +\varvec{\zeta}_{t}^{\left( n \right)} \,\forall t \in {\mathbb{T}},\forall n \in {\mathbb{N}}$$
(37)

where $$D_{b,t}^{\left( n \right)} = 1$$ if there is a structural break at date $$t$$ for country $$n$$, and zero otherwise.

For the error terms we assume:

$$\zeta_{t}^{\left( n \right)} \sim {\mathcal{N}}\left( {0,\varSigma_{\zeta }^{\left( n \right)} } \right), \varepsilon_{t}^{\left( n \right)} \sim {\mathcal{N}}\left( {0,\varSigma_{\varepsilon }^{\left( n \right)} } \right)\,\forall n \in {\mathbb{N}}.$$
(38)

For the aggregation function we have a stochastic weighting scheme:

$$X_{t}^{*} = \mathop \sum \limits_{{n \in {\mathbb{N}}}} \alpha_{t,n} X_{t}^{\left( n \right)} + u_{t} : = \varvec{\alpha^{\prime}}_{t} \varvec{X}^{\left( t \right)} + u_{t} \,\forall t \in {\mathbb{T}},$$
(39)

where $$\varvec{X}^{\left( t \right)} : = \left\{ {X_{t}^{\left( n \right)} ,n \in {\mathbb{N}}} \right\}$$, the weights $$\alpha_{t,n} \ge 0,\,\forall n \in {\mathbb{N}}$$ and $$\mathop \sum \limits_{{n \in {\mathbb{N}}}} \alpha_{t,n} = 1\,\forall t \in {\mathbb{T}}$$. Dynamics in (12) can be accommodated using an AR($$R$$) process for the error term:

$$u_{t} = \mathop \sum \limits_{r = 1}^{R} \rho_{r} u_{t - r} + \varepsilon_{t} ,\; \varepsilon_{t} \sim {\mathcal{N}}\left( {0,\sigma_{\varepsilon }^{2} } \right)\,\forall t \in {\mathbb{T}}.$$
(40)

For the weights $$\varvec{\alpha}_{t} = \left( {\alpha_{t,n} ,n \in {\mathbb{N}}} \right)$$ we assume that they are time-varying:

$$\varvec{\alpha}_{t} = 2\varvec{\alpha}_{t - 1} -\varvec{\alpha}_{t - 2} + \varvec{v}_{{\varvec{\alpha},t}} ,$$
(41)

where

$$\varvec{v}_{{\varvec{\alpha},t}} \sim {\mathcal{N}}_{n} \left( {0,\lambda^{2} \varvec{I}} \right),\,\forall t \in {\mathbb{T}}.$$
(42)

We derive the posterior for date $$t$$, as we will apply SMC to compute the likelihood/posterior.

From (6) and (9) we have:

$$\varvec{X}_{t}^{\left( n \right)} - \varvec{D}\left( L \right)\varvec{X}_{t}^{\left( n \right)} = \left( {\varvec{I} - \varvec{D}\left( L \right)} \right)\varvec{A}_{n} \left( L \right)\varvec{f}_{t} + \varvec{c}_{t}^{\left( n \right)} +\varvec{\varepsilon}_{t}^{\left( n \right)} ,\,\forall n \in {\mathbb{N}}.$$
(43)
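As a brief worked step (added here for clarity, using only the model equations (34) and (36) restated above), (43) follows by solving (34) for $$\varvec{v}_{t}^{\left( n \right)}$$ and substituting into (36) after collecting the lag terms on the left-hand side:

$$\begin{aligned} \left( {\varvec{I} - \varvec{D}\left( L \right)} \right)\varvec{v}_{t}^{\left( n \right)} &= \varvec{c}_{t}^{\left( n \right)} + \varvec{\varepsilon}_{t}^{\left( n \right)} , \\ \left( {\varvec{I} - \varvec{D}\left( L \right)} \right)\left( {\varvec{X}_{t}^{\left( n \right)} - \varvec{A}_{n} \left( L \right)\varvec{f}_{t} } \right) &= \varvec{c}_{t}^{\left( n \right)} + \varvec{\varepsilon}_{t}^{\left( n \right)} , \\ \varvec{X}_{t}^{\left( n \right)} - \varvec{D}\left( L \right)\varvec{X}_{t}^{\left( n \right)} &= \left( {\varvec{I} - \varvec{D}\left( L \right)} \right)\varvec{A}_{n} \left( L \right)\varvec{f}_{t} + \varvec{c}_{t}^{\left( n \right)} + \varvec{\varepsilon}_{t}^{\left( n \right)} . \end{aligned}$$

Since $$\varvec{\varepsilon}_{t}^{\left( n \right)}$$ is Gaussian by (38), the left-hand side minus the systematic part on the right is Gaussian with covariance $$\varSigma_{\varepsilon }^{\left( n \right)}$$.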

Therefore

$$p\left( {X_{t}^{\left( n \right)} |X_{t,lags}^{\left( n \right)} ,f_{t} ,c_{t}^{\left( n \right)} ,\theta } \right) \propto \left| {\varSigma_{\varepsilon }^{\left( n \right)} } \right|^{{ - \frac{1}{2}}} \exp \left( { - \frac{1}{2} U_{t}^{\left( n \right)} \prime \left( {\varSigma_{\varepsilon }^{\left( n \right)} } \right)^{ - 1} U_{t}^{\left( n \right)} } \right),$$
(44)

where $$\varvec{X}_{t,lags}^{\left( n \right)}$$ denotes lagged values of $$\varvec{X}_{t}^{\left( n \right)}$$ and $$\theta$$ denotes the “structural” parameters in the following matrices/vectors/scalars: $$\varvec{D},\left\{ {\varvec{A}_{n} } \right\},\varvec{\varSigma}_{\varvec{\varepsilon}} ,\varvec{B},\varvec{C},\sigma_{\varepsilon }^{2} ,\varvec{\rho},\lambda ,\left\{ {\gamma_{b}^{\left( n \right)} } \right\},\varvec{\varSigma}_{\varvec{\zeta}}$$, and

$$\varvec{U}_{t}^{\left( n \right)} = \varvec{X}_{t}^{\left( n \right)} - \varvec{D}\left( L \right)\varvec{X}_{t}^{\left( n \right)} - \left( {\varvec{I} - \varvec{D}\left( L \right)} \right)\varvec{A}_{n} \left( L \right)\varvec{f}_{t} - \varvec{c}_{t}^{\left( n \right)}$$
(45)

If we define $$\varvec{X}^{\left( t \right)} = \left[ {\varvec{X}_{t}^{\left( n \right)} ,\forall n \in {\mathbb{N}}} \right]$$, we have

$$p\left( {X^{\left( t \right)} |X_{lags}^{\left( t \right)} ,f_{t} ,\{ c_{t}^{\left( n \right)} ,\forall n \in {\mathbb{N}}\} ,\theta } \right) = \mathop \prod \limits_{{n \in {\mathbb{N}}}} p\left( {X_{t}^{\left( n \right)} |X_{t,lags}^{\left( n \right)} ,f_{t} ,c_{t}^{\left( n \right)} ,\theta } \right).$$
(46)

From (7) we obtain:

$$p\left( {f_{t} |f_{t,lags} ,X_{t}^{*} ,\theta } \right) \propto \left| {\varSigma_{\xi } } \right|^{{ - \frac{n}{2}}} \exp \left( { - \frac{1}{2} \mathop \sum \limits_{{n \in {\mathbb{N}}}} U_{ft}^{\left( n \right)} \prime \varSigma_{\xi }^{ - 1} U_{ft}^{\left( n \right)} } \right),$$
(47)

where $$\varvec{U}_{ft}^{\left( n \right)} = \left[ {\varvec{I} - \varvec{B}\left( L \right)} \right]\varvec{f}_{t} - \varvec{C}\left( L \right)X_{t}^{*}$$.

From (12) we obtain:

$$p\left( {X_{t}^{*} |\alpha_{t} ,X_{t,lags}^{*} ,X^{\left( t \right)} ,\theta } \right) \propto \sigma_{\varepsilon }^{ - 1} \exp \left( { - \frac{1}{{2\sigma_{\varepsilon }^{2} }}\left[ {X_{t}^{*} - \alpha^{\prime}_{t} X^{\left( t \right)} - \rho \left( L \right)X_{t}^{*} + \rho \left( L \right)\alpha^{\prime}_{t} X^{\left( t \right)} } \right]^{2} } \right),$$
(48)

where $$\rho \left( L \right)$$ represents the lag polynomial in (13).

From (14) and (15)

$$p\left( {\alpha_{t} |\alpha_{t,lags} ,\lambda } \right) \propto \left( {\lambda^{2} } \right)^{{ - \frac{n}{2}}} \exp \left( { - \frac{1}{{2\lambda^{2} }}\mathop \sum \nolimits_{{n \in {\mathbb{N}}}} \left[ {\alpha_{t,n} - 2\alpha_{t - 1,n} + \alpha_{t - 2,n} } \right]^{2} } \right).$$
(49)

From (10) we obtain:

$$p\left( {c_{t}^{\left( n \right)} |c_{t,lags}^{\left( n \right)} ,\gamma_{b}^{\left( n \right)} ,D_{b,t}^{\left( n \right)} ,\varSigma_{\zeta }^{\left( n \right)} } \right) \propto \left| {\varSigma_{\zeta }^{\left( n \right)} } \right|^{{ - \frac{1}{2}}} \exp \left( { - \frac{1}{2}U_{ct}^{\left( n \right)} \prime \left( {\varSigma_{\zeta }^{\left( n \right)} } \right)^{ - 1} U_{ct}^{\left( n \right)} } \right),$$
(50)

where $$\varvec{U}_{ct}^{\left( n \right)} = \varvec{c}_{t}^{\left( n \right)} - \varvec{c}_{t - 1}^{\left( n \right)} - 1_{M} \gamma_{b}^{\left( n \right)} D_{b,t}^{\left( n \right)}$$. Suppose $$\varvec{c}^{\left( t \right)} = \left[ {\varvec{c}_{t}^{\left( n \right)} \forall n \in {\mathbb{N}}} \right]$$. Then, we have:

$$p\left( {c^{\left( t \right)} |c^{{\left( {t,lags} \right)}} ,\{ \gamma_{b}^{\left( n \right)} \forall n \in {\mathbb{N}}\} ,\{ D_{b,t}^{\left( n \right)} \forall n \in {\mathbb{N}}\} ,\{ \varSigma_{\zeta }^{\left( n \right)} \forall n \in {\mathbb{N}}\} } \right) \propto \mathop \prod \limits_{{n \in {\mathbb{N}}}} p\left( {c_{t}^{\left( n \right)} |c_{t,lags}^{\left( n \right)} ,\gamma_{b}^{\left( n \right)} ,D_{b,t}^{\left( n \right)} ,\varSigma_{\zeta }^{\left( n \right)} } \right).$$
(51)

The likelihood function is

$$\begin{aligned} \ell \left( {\theta ;Y^{\left( t \right)} ,\varLambda_{t} } \right) \propto {} & p\left( {X^{\left( t \right)} |X_{lags}^{\left( t \right)} ,f_{t} ,\{ c_{t}^{\left( n \right)} ,\forall n \in {\mathbb{N}}\} ,\theta } \right) \\ & \cdot p\left( {f_{t} |f_{t,lags} ,X_{t}^{*} ,\theta } \right) \\ & \cdot p\left( {X_{t}^{*} |\alpha_{t} ,X_{t,lags}^{*} ,X^{\left( t \right)} ,\theta } \right) \\ & \cdot p\left( {\alpha_{t} |\alpha_{t,lags} ,\lambda } \right)p\left( {c^{\left( t \right)} |c^{{\left( {t,lags} \right)}} ,\{ \gamma_{b}^{\left( n \right)} \forall n \in {\mathbb{N}}\} ,\{ D_{b,t}^{\left( n \right)} \forall n \in {\mathbb{N}}\} ,\{ \varSigma_{\zeta }^{\left( n \right)} \forall n \in {\mathbb{N}}\} } \right), \end{aligned}$$
(52)

where all densities have been defined above up to normalizing constants and $$\varvec{Y}^{\left( t \right)}$$ denotes all available data up to time $$t$$.

To compute the likelihood, we have to integrate out the latent variables $$\varvec{\varLambda}_{t} = \{ \varvec{f}_{t} ,\varvec{\alpha}_{t} ,\varvec{c}_{t} \forall t \in {\mathbb{T}}\}$$. The latent variables are integrated out using SMC–particle-filtering. Therefore, we can obtain a consistent (as the number of particles increases) estimator of $$\ell \left( {\theta ;\varvec{Y}^{\left( t \right)} } \right)$$. The overall posterior density is

$$p\left( {\varvec{\theta}|\varvec{Y}} \right) \propto p\left(\varvec{\theta}\right) \cdot \mathop \prod \limits_{{t \in {\mathbb{T}}}} \ell \left( {\varvec{\theta};\varvec{Y}^{\left( t \right)} } \right),$$
(53)

where $$p\left(\varvec{\theta}\right)$$ is the prior, and $$\varvec{Y} = \left\{ {\varvec{Y}^{\left( t \right)} \,\forall t \in {\mathbb{T}}} \right\}$$.

To implement MCMC we use $$\ell \left( {\varvec{\theta};\varvec{Y}^{\left( t \right)} ,\varvec{\varLambda}_{t} } \right)$$ treating $$\varvec{\varLambda}_{t}$$ as a block. Moreover, parameters $$\varvec{\theta}$$ are drawn using the MCMC described in Appendix B.

It remains to determine the number of factors $$G$$ and the structural breaks. Given the dates of structural breaks we compute the posterior $$p\left( {\varvec{\theta}|\varvec{Y}} \right)$$ for several values of $$G \in \left\{ {1, \ldots ,\bar{G}} \right\}$$, where $$\bar{G}$$ is set to 5. In turn, we compute the marginal likelihood $$p(\varvec{Y}|G)$$ which is a by-product of MCMC.

#### Procedure DBR: Determination of Breaks

To determine the dates of structural breaks we proceed as follows. The number of breaks we allow is $$b \in {\mathbb{B}} = \left\{ {1, \ldots ,\bar{B}} \right\}$$ where $$\bar{B} \ll T$$. For a single break, we need to determine the date $$t^{*}$$ at which $$D_{b,t} = 1$$. We compute the marginal likelihood for each candidate $$t^{*} \in \left\{ {T_{o} ,T_{o} + 1, \ldots ,T} \right\}$$, where $$T_{o} > 1$$ is the minimal starting value for $$t$$ given our lag structure, and the maximum value of the marginal likelihood determines $$t^{*}$$. Then we apply SMC again to determine two dates $$t_{1}^{*} < t_{2}^{*}$$ and we recompute the marginal likelihood. If it is lower than the marginal likelihood for a single break at $$t^{*}$$ we stop; otherwise we move on to the case of three breaks $$t_{1}^{*} < t_{2}^{*} < t_{3}^{*}$$. In all cases we examined, we have one or two breaks. This procedure simultaneously determines the optimal number of factors $$G$$ and the timing of the breaks.

Below we provide pseudo code that implements the approach.
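As an illustration only (this is not the authors' listing), the stopping rule of Procedure DBR can be sketched in Python as follows. The function name `select_breaks` and the callable `log_ml`, which is assumed to return the log marginal likelihood for a given tuple of break dates, are hypothetical; in the paper that quantity comes from the SMC/MCMC machinery, which is not reproduced here. The sketch also assumes an exhaustive search over date combinations, which the text does not specify.

```python
import itertools
from typing import Callable, Sequence, Tuple


def select_breaks(log_ml: Callable[[Tuple[int, ...]], float],
                  dates: Sequence[int],
                  max_breaks: int) -> Tuple[Tuple[int, ...], float]:
    """Increase the number of breaks until the best log marginal likelihood stops improving."""
    best_breaks: Tuple[int, ...] = ()
    best_value = float("-inf")
    for b in range(1, max_breaks + 1):
        # All ordered date combinations t1 < ... < tb (assumed exhaustive search).
        candidates = itertools.combinations(sorted(dates), b)
        breaks_b = max(candidates, key=log_ml)
        value_b = log_ml(breaks_b)
        if value_b <= best_value:
            # Adding a break did not raise the marginal likelihood: stop.
            break
        best_breaks, best_value = breaks_b, value_b
    return best_breaks, best_value
```

Because the search over combinations grows quickly with the number of breaks, a practical implementation would likely restrict the candidate dates; that choice is left open here.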


### I. Particle Filtering

The particle filter methodology can be applied to state space models of the form:

$$y_{t} \sim p(y_{t} |s_{t} ),\;s_{t} \sim p(s_{t} |s_{t - 1} ),$$
(54)

where $$s_{t}$$ is a state variable. For general introductions see Andrieu et al. (2010, pp. 272, 277), Doucet et al. (2001) and Ristic et al. (2004). Given the data $$Y_{t}$$, the posterior distribution $$p(s_{t} |Y_{t} )$$ can be approximated by a set of (auxiliary) particles $$\left\{ {s_{t}^{\left( i \right)} ,i = 1, \ldots ,N} \right\}$$ with probability weights $$\left\{ {w_{t}^{\left( i \right)} ,i = 1, \ldots ,N} \right\}$$ where $$\mathop \sum \limits_{i = 1}^{N} w_{t}^{\left( i \right)} = 1$$. The predictive density is approximated by

$$p\left( {s_{t + 1} |Y_{t} } \right) = \smallint p\left( {s_{t + 1} |s_{t} } \right)p\left( {s_{t} |Y_{t} } \right)ds_{t} \simeq \mathop \sum \limits_{i = 1}^{N} p\left( {s_{t + 1} |s_{t}^{\left( i \right)} } \right)w_{t}^{\left( i \right)} ,$$
(55)

where $$w_{t}^{\left( i \right)} = \frac{{p\left( {s_{t}^{\left( i \right)} |Y_{t} } \right)}}{{\mathop \sum \nolimits_{j = 1}^{N} p\left( {s_{t}^{\left( j \right)} |Y_{t} } \right) }}.$$

An approximation for the filtering density is

$$p(s_{t + 1} |Y_{t + 1} ) \propto p(y_{t + 1} |s_{t + 1} )p(s_{t + 1} |Y_{t} ) \simeq p(y_{t + 1} |s_{t + 1} )\mathop \sum \limits_{i = 1}^{N} p(s_{t + 1} |s_{t}^{\left( i \right)} )w_{t}^{\left( i \right)} .$$
(56)

Particle filtering propagates $$\left\{ {s_{t}^{\left( i \right)} ,w_{t}^{\left( i \right)} ,i = 1, \ldots ,N} \right\}$$ to the next step, viz. $$\left\{ {s_{t + 1}^{\left( i \right)} ,w_{t + 1}^{\left( i \right)} ,i = 1, \ldots ,N} \right\}$$, but this often suffers from a weight degeneracy problem. If unknown parameters $$\theta \in \varTheta \subseteq \Re^{k}$$ are present, as is often the case, we follow Liu and West (2001). In this context, parameter learning takes place via a mixture of multivariate normal distributions:

$$p(\varvec{\theta}|Y_{t} ) \simeq \mathop \sum \limits_{i = 1}^{N} w_{t}^{\left( i \right)} N(\varvec{\theta}|a\varvec{\theta}_{t}^{\left( i \right)} + \left( {1 - a} \right)\bar{\varvec{\theta }}_{t} ,b^{2} V_{t} ),$$
(57)

where $$\bar{\varvec{\theta }}_{t} = \mathop \sum \limits_{i = 1}^{N} w_{t}^{\left( i \right)}\varvec{\theta}_{t}^{\left( i \right)}$$, and $$V_{t} = \mathop \sum \limits_{i = 1}^{N} w_{t}^{\left( i \right)} \left( {\varvec{\theta}_{t}^{\left( i \right)} - \bar{\varvec{\theta }}_{t} } \right)(\varvec{\theta}_{t}^{\left( i \right)} - \bar{\varvec{\theta }}_{t} )^{\prime}$$. The constants $$a$$ and $$b$$ are related to shrinkage and are determined via a discount factor $$\delta \in \left( {0,1} \right)$$ as $$a = (1 - b^{2} )^{1/2}$$ and $$b^{2} = 1 - [\left( {3\delta - 1} \right)/2\delta ]^{2}$$. See also Casarin and Marin (2007).

Andrieu and Roberts (2009), Flury and Shephard (2011), and Pitt et al. (2012) suggested the Particle Metropolis–Hastings (PMCMC) technique, which uses an unbiased estimator of the likelihood function, $$\hat{p}_{N} (Y|\varvec{\theta})$$, in place of $$p(Y|\varvec{\theta})$$, as the latter is often not available in closed form. Given the current state of the parameter $$\varvec{\theta}^{\left( j \right)}$$ and the current estimate of the likelihood, say $$L^{j} = \hat{p}_{N} (Y|\varvec{\theta}^{\left( j \right)} )$$, a candidate $$\varvec{\theta}^{c}$$ is drawn from $$q(\varvec{\theta}^{c} |\varvec{\theta}^{\left( j \right)} )$$ yielding $$L^{c} = \hat{p}_{N} (Y|\varvec{\theta}^{c} )$$. Then, we set $$\varvec{\theta}^{{\left( {j + 1} \right)}} =\varvec{\theta}^{c}$$ with the Metropolis–Hastings probability:

$$A = { \hbox{min} }\left\{ {1,\;\frac{{p\left( {\varvec{\theta}^{c} } \right)L^{c} }}{{p\left( {\varvec{\theta}^{\left( j \right)} } \right)L^{j} }}\frac{{q(\varvec{\theta}^{\left( j \right)} |\varvec{\theta}^{c} )}}{{q(\varvec{\theta}^{c} |\varvec{\theta}^{\left( j \right)} )}}} \right\},$$
(58)

otherwise we repeat the current draws: $$\left\{ {\varvec{\theta}^{{\left( {j + 1} \right)}} ,L^{j + 1} } \right\} = \left\{ {\varvec{\theta}^{\left( j \right)} ,L^{j} } \right\}$$. The auxiliary particle filter rests upon the idea that adaptive particle filtering (Pitt et al. 2012) used within PMCMC requires far fewer particles than the standard particle filtering algorithm to approximate $$p(Y|\varvec{\theta})$$. From Pitt and Shephard (1999) we know that auxiliary particle filtering can be implemented easily once we can evaluate the state transition density $$p(s_{t} |s_{t - 1} )$$. When this is not possible, an alternative approach can be applied when, for instance, $$s_{t} = g\left( {s_{t - 1} ,u_{t} } \right)$$ for a certain disturbance $$u_{t}$$. In this case we have

$$p(y_{t} |s_{t - 1} ) = \int p(y_{t} |s_{t} )p(s_{t} |s_{t - 1} )ds_{t} ,$$
(59)
$$p(u_{t} |s_{t - 1} ;y_{t} ) = p(y_{t} |s_{t - 1} ,u_{t} )p(u_{t} |s_{t - 1} )/p(y_{t} |s_{t - 1} ).$$
(60)

If we can evaluate $$p(y_{t} |s_{t - 1} )$$ and simulate from $$p(u_{t} |s_{t - 1} ;y_{t} )$$ the filter would be fully adaptable (Pitt and Shephard 1999). One can use a Gaussian approximation for the first-stage proposal $$g(y_{t} |s_{t - 1} )$$ by matching the first two moments of $$p(y_{t} |s_{t - 1} )$$, so that the approximating density is $$g\left( {y_{t} |s_{t - 1} } \right) = N\left( {{\mathbb{E}}\left( {y_{t} |s_{t - 1} } \right),{\mathbb{V}}\left( {y_{t} |s_{t - 1} } \right)} \right)$$. In the second stage, we know that $$p(u_{t} |y_{t} ,s_{t - 1} ) \propto p(y_{t} |s_{t - 1} ,u_{t} )p\left( {u_{t} } \right)$$. As $$p(u_{t} |y_{t} ,s_{t - 1} )$$ is multimodal, suppose that its $$M$$ modes are $$\hat{u}_{t}^{m}$$, for $$m = 1, \ldots ,M$$. For each mode we can use a Laplace approximation. Let $$l\left( {u_{t} } \right) = \log \left[ {p(y_{t} |s_{t - 1} ,u_{t} )p\left( {u_{t} } \right)} \right]$$. From the Laplace approximation we obtain:

$$l\left( {u_{t} } \right) \simeq l\left( {\hat{u}_{t}^{m} } \right) + \frac{1}{2}(u_{t} - \hat{u}_{t}^{m} )^{\prime}\nabla^{2} l\left( {\hat{u}_{t}^{m} } \right)\left( {u_{t} - \hat{u}_{t}^{m} } \right).$$
(61)

Then we use the mixture approximation:

$$g(u_{t} |y_{t} ,s_{t - 1} ) = \mathop \sum \limits_{m = 1}^{M} \lambda_{m} (2\pi )^{ - d/2} |\varSigma_{m} |^{ - 1/2} \exp \left\{ { - \frac{1}{2}(u_{t} - \hat{u}_{t}^{m} )^{\prime}\varSigma_{m}^{ - 1} (u_{t} - \hat{u}_{t}^{m} )} \right\},$$
(62)

where $$\varSigma_{m} = - \left[ {\nabla^{2} l\left( {\hat{u}_{t}^{m} } \right)} \right]^{ - 1}$$ and $$\lambda_{m} \propto \exp \left\{ {l\left( {\hat{u}_{t}^{m} } \right)} \right\}$$ with $$\mathop \sum \limits_{m = 1}^{M} \lambda_{m} = 1$$. This is done for each particle $$s_{t}^{i}$$. This is known as the Auxiliary Disturbance Particle Filter (ADPF).
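The mixture construction in (61) and (62) can be illustrated with a small Python sketch for a scalar disturbance ($$d = 1$$). It assumes that the modes are already known and that the Hessian is obtained by central finite differences; the names `laplace_mixture` and `mixture_pdf` are hypothetical, and the weights follow the rule $$\lambda_{m} \propto \exp \{ l(\hat{u}_{t}^{m} )\}$$ stated above.

```python
import numpy as np


def laplace_mixture(log_target, modes, h=1e-4):
    """Weights and variances of 1-D Laplace approximations at the given modes."""
    modes = np.asarray(modes, dtype=float)
    # Central finite-difference second derivative of l(u) at each mode.
    d2 = (log_target(modes + h) - 2.0 * log_target(modes)
          + log_target(modes - h)) / h**2
    variances = -1.0 / d2                  # Sigma_m = -[l''(u_m)]^{-1}
    log_w = log_target(modes)              # lambda_m proportional to exp{l(u_m)}
    weights = np.exp(log_w - log_w.max())  # subtract the max for stability
    weights /= weights.sum()
    return weights, variances


def mixture_pdf(u, weights, modes, variances):
    """Evaluate the Gaussian mixture density g(u) at the points u."""
    u = np.atleast_1d(np.asarray(u, dtype=float))[:, None]
    modes = np.asarray(modes, dtype=float)
    dens = (np.exp(-0.5 * (u - modes) ** 2 / variances)
            / np.sqrt(2.0 * np.pi * variances))
    return dens @ weights
```

In higher dimensions the scalar second derivative would be replaced by the full Hessian matrix, and the modes would have to be located numerically; neither step is shown here.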

An alternative is the independent particle filter (IPF) of Lin et al. (2014). In standard particle filtering, particles are simulated through the state density $$p(s_{t}^{i} |s_{t - 1}^{i} )$$ and they are re-sampled with weights determined by the measurement density evaluated at the resulting particle, viz. $$p(y_{t} |s_{t}^{i} )$$.
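The standard scheme just described (propagate through the state density, weight by the measurement density, resample) is the bootstrap particle filter. A minimal Python sketch follows, under the assumption of a toy linear Gaussian model $$s_{t} = \phi s_{t - 1} + \sigma_{s} \eta_{t}$$, $$y_{t} = s_{t} + \sigma_{y} e_{t}$$ that is not the paper's model; the function name `bootstrap_filter` and its parameters are hypothetical. It is the plain bootstrap filter, not the ADPF.

```python
import numpy as np


def bootstrap_filter(y, n_particles, phi, sigma_s, sigma_y, rng):
    """Bootstrap particle filter for a toy AR(1)-plus-noise model (requires |phi| < 1)."""
    # Initial particles from the stationary distribution of the state.
    s = rng.normal(0.0, sigma_s / np.sqrt(1.0 - phi**2), size=n_particles)
    log_lik = 0.0
    filtered_mean = np.empty(len(y))
    for t, y_t in enumerate(y):
        # Propagate through the state density p(s_t | s_{t-1}).
        s = phi * s + sigma_s * rng.normal(size=n_particles)
        # Weight by the measurement density p(y_t | s_t), in logs for stability.
        log_w = (-0.5 * np.log(2.0 * np.pi * sigma_y**2)
                 - 0.5 * ((y_t - s) / sigma_y) ** 2)
        m = log_w.max()
        w = np.exp(log_w - m)
        # Likelihood increment: the average unnormalized weight.
        log_lik += m + np.log(w.mean())
        w /= w.sum()
        filtered_mean[t] = np.sum(w * s)
        # Multinomial resampling to counter weight degeneracy.
        s = s[rng.choice(n_particles, size=n_particles, p=w)]
    return filtered_mean, log_lik
```

The returned `log_lik` is the logarithm of a product of average weights, which is the kind of simulated likelihood that a particle MCMC scheme can plug into its acceptance ratio.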

The ADPF is simple to construct and rests upon the following steps (pseudo-code for Particle Filtering):

(63)

The likelihood estimate from the ADPF is

$$\hat{p}\left( {Y_{1:T} } \right) = \mathop \prod \limits_{t = 1}^{T} \left( {\mathop \sum \limits_{i = 1}^{N} \omega_{t - 1|t}^{i} } \right)\left( {N^{ - 1} \mathop \sum \limits_{i = 1}^{N} \omega_{t}^{i} } \right).$$
(64)

### II. Particle Metropolis Adjusted Langevin filters

Nemeth et al. (2016) provide a particle version of the Metropolis adjusted Langevin algorithm (MALA). In Sequential Monte Carlo we are interested in approximating $$p(s_{t} |Y_{1:t} ,\varvec{\theta})$$, which satisfies the recursion

$$p(s_{t} |Y_{1:t} ,\varvec{\theta}) \propto g(y_{t} |s_{t} ,\varvec{\theta})\smallint f(s_{t} |s_{t - 1} ,\varvec{\theta})p(s_{t - 1} |Y_{1:t - 1} ,\varvec{\theta})ds_{t - 1} ,$$
(65)

where $$g$$ is the measurement density, $$f$$ is the state transition density and $$p(s_{t - 1} |Y_{1:t - 1} ,\varvec{\theta})$$ is the posterior as of time $$t - 1$$. If at time $$t - 1$$ we have a set of particles $$\left\{ {s_{t - 1}^{i} ,i = 1, \ldots ,N} \right\}$$ and weights $$\left\{ {w_{t - 1}^{i} ,i = 1, \ldots ,N} \right\}$$ which form a discrete approximation for $$p(s_{t - 1} |Y_{1:t - 1} ,\varvec{\theta})$$, then substituting it into the recursion gives the approximation:

$$\hat{p}(s_{t} |Y_{1:t} ,\varvec{\theta}) \propto g(y_{t} |s_{t} ,\varvec{\theta})\mathop \sum \limits_{i = 1}^{N} w_{t - 1}^{i} f(s_{t} |s_{t - 1}^{i} ,\varvec{\theta}).$$
(66)

See Doucet et al. (2001, 2014) for reviews. From (66), Fearnhead et al. (2008) make the important observation that the joint probability of sampling particle $$s_{t - 1}^{i}$$ and state $$s_{t}$$ is:

$$\omega_{t} = \frac{{w_{t - 1}^{i} g(y_{t} |s_{t} ,\varvec{\theta})f(s_{t} |s_{t - 1}^{i} ,\varvec{\theta})}}{{\xi_{t}^{i} q(s_{t} |s_{t - 1}^{i} ,y_{t} ,\varvec{\theta})}},$$
(67)

where $$q(s_{t} |s_{t - 1}^{i} ,y_{t} ,\varvec{\theta})$$ is a density function amenable to simulation,

$$\xi_{t}^{i} q(s_{t} |s_{t - 1}^{i} ,y_{t} ,\varvec{\theta}) \simeq cg(y_{t} |s_{t} ,\varvec{\theta})f(s_{t} |s_{t - 1}^{i} ,\varvec{\theta}),$$
(68)

and $$c$$ is the normalizing constant in (65).

In the MALA algorithm of Roberts and Rosenthal (1998) (see footnote 11), we form a proposal

$$\varvec{\theta}^{c} =\varvec{\theta}^{\left( s \right)} + \lambda z + \frac{{\lambda^{2} }}{2}\nabla { \log }p(\varvec{\theta}^{\left( s \right)} |Y_{1:T} ),$$
(69)

where $$z \sim N\left( {0,I} \right)$$ and $$\lambda$$ is a scale parameter. For a suitable choice of $$\lambda$$, the gradient term should result in larger jumps and better mixing, with lower autocorrelations. With $$q$$ denoting the proposal density implied by (69), the acceptance probabilities are

$$a(\varvec{\theta}^{c} |\varvec{\theta}^{\left( s \right)} ) = \min \left\{ {1,\frac{{p(\varvec{\theta}^{c} |Y_{1:T} )q(\varvec{\theta}^{\left( s \right)} |\varvec{\theta}^{c} )}}{{p(\varvec{\theta}^{\left( s \right)} |Y_{1:T} )q(\varvec{\theta}^{c} |\varvec{\theta}^{\left( s \right)} )}}} \right\} = \min \left\{ {1,\frac{{p(Y_{1:T} |\varvec{\theta}^{c} )p\left( {\varvec{\theta}^{c} } \right)q(\varvec{\theta}^{\left( s \right)} |\varvec{\theta}^{c} )}}{{p(Y_{1:T} |\varvec{\theta}^{\left( s \right)} )p\left( {\varvec{\theta}^{\left( s \right)} } \right)q(\varvec{\theta}^{c} |\varvec{\theta}^{\left( s \right)} )}}} \right\}.$$
(70)
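As an illustration of the proposal (69) and the acceptance step (70), here is a minimal scalar MALA sketch. It assumes the log-posterior and its gradient are available in closed form for a hypothetical Gaussian target; in the particle version both would be replaced by particle filter estimates. The names `log_post`, `grad_log_post` and `mala` are illustrative only.

```python
import math
import random

def log_post(theta):
    # Hypothetical log-posterior: N(1, 0.5^2), up to an additive constant.
    return -0.5 * ((theta - 1.0) / 0.5) ** 2

def grad_log_post(theta):
    return -(theta - 1.0) / 0.25

def mala(log_p, grad, theta0, lam, n_iter, seed=0):
    rng = random.Random(seed)

    def log_q(to, frm):
        # Log proposal density q(to | frm), up to a constant:
        # N(frm + (lam^2 / 2) * grad(frm), lam^2).
        mean = frm + 0.5 * lam ** 2 * grad(frm)
        return -0.5 * ((to - mean) / lam) ** 2

    theta, draws, accepted = theta0, [], 0
    for _ in range(n_iter):
        # Langevin proposal: random-walk noise plus a drift along the gradient.
        cand = theta + lam * rng.gauss(0.0, 1.0) + 0.5 * lam ** 2 * grad(theta)
        # Log of the Metropolis-Hastings ratio with the asymmetric proposal.
        log_a = (log_p(cand) + log_q(theta, cand)
                 - log_p(theta) - log_q(cand, theta))
        if rng.random() < math.exp(min(0.0, log_a)):
            theta = cand
            accepted += 1
        draws.append(theta)
    return draws, accepted / n_iter

draws, acc_rate = mala(log_post, grad_log_post, theta0=0.0, lam=0.5, n_iter=5000)
```

The proposal correction `log_q` is needed because the drift makes $$q$$ asymmetric; dropping it would reduce the sampler to an incorrectly weighted random walk.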

Using particle filters it is possible to create an approximation of the score vector using Fisher’s identity:

$$\nabla \log p(Y_{1:T} |\varvec{\theta}) = E\left[ {\nabla \log p(s_{1:T} ,Y_{1:T} |\varvec{\theta})|Y_{1:T} ,\varvec{\theta}} \right],$$
(71)

which corresponds to the expectation of

$$\nabla \log p(s_{1:T} ,Y_{1:T} |\varvec{\theta}) = \nabla \log p(s_{1:T - 1} ,Y_{1:T - 1} |\varvec{\theta}) + \nabla \log g(y_{T} |s_{T} ,\varvec{\theta}) + \nabla \log f(s_{T} |s_{T - 1} ,\varvec{\theta}),$$

over the path $$s_{1:T}$$. The particle approximation to the score vector results from replacing $$p(s_{1:T} |Y_{1:T} ,\varvec{\theta})$$ with a particle approximation $$\hat{p}(s_{1:T} |Y_{1:T} ,\varvec{\theta})$$. With particle $$i$$ at time $$t - 1$$ we can associate a value $$\alpha_{t - 1}^{i} = \nabla \log p(s_{1:t - 1}^{i} ,Y_{1:t - 1} |\varvec{\theta})$$ which can be updated recursively. As we sample $$\kappa_{i}$$ in the APF (the index of the particle at time $$t - 1$$ that is propagated to produce the $$i$$th particle at time $$t$$) we have the update:

$$\alpha_{t}^{i} = \alpha_{t - 1}^{{\kappa_{i} }} + \nabla \log g(y_{t} |s_{t}^{i} ,\varvec{\theta}) + \nabla \log f(s_{t}^{i} |s_{t - 1}^{{\kappa_{i} }} ,\varvec{\theta}).$$
(72)

To avoid problems with increasing variance of the score estimate $$\nabla \log p(Y_{1:t} |\varvec{\theta})$$ we can use the approximation:

$$\alpha_{t - 1}^{i} \sim N\left( {m_{t - 1}^{i} ,V_{t - 1} } \right)$$
(73)

The mean is obtained by shrinking $$\alpha_{t - 1}^{i}$$ towards the mean of $$\alpha_{t - 1}$$ as follows:

$$m_{t - 1}^{i} = \delta \alpha_{t - 1}^{i} + \left( {1 - \delta } \right)\mathop \sum \limits_{j = 1}^{N} w_{t - 1}^{j} \alpha_{t - 1}^{j} ,$$
(74)

where $$\delta \in \left( {0,1} \right)$$ is a shrinkage parameter. Using Rao-Blackwellization one can avoid sampling $$\alpha_{t}^{i}$$ and instead use the following recursion for the means:

$$m_{t}^{i} = \delta m_{t - 1}^{{\kappa_{i} }} + \left( {1 - \delta } \right)\mathop \sum \limits_{j = 1}^{N} w_{t - 1}^{j} m_{t - 1}^{j} + \nabla \log g(y_{t} |s_{t}^{i} ,\varvec{\theta}) + \nabla \log f(s_{t}^{i} |s_{t - 1}^{{\kappa_{i} }} ,\varvec{\theta}),$$
(75)

which yields the final score estimate:

$$\nabla \log \hat{p}(Y_{1:t} |\varvec{\theta}) = \mathop \sum \limits_{i = 1}^{N} w_{t}^{i} m_{t}^{i} .$$
(76)
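The recursions (75) and (76) can be sketched for a toy model as follows. The sketch assumes a scalar state $$s_{t} = \phi s_{t - 1} + \sigma \eta_{t}$$ observed with Gaussian noise, takes $$\varvec{\theta} = \phi$$, and uses a bootstrap proposal (ancestors drawn with the previous weights) rather than a tuned APF; the model and all names are hypothetical. Under these assumptions $$\nabla \log g = 0$$ and $$\nabla \log f = (s_{t} - \phi s_{t - 1} )s_{t - 1} /\sigma^{2}$$.

```python
import math
import random

def simulate_ar1_plus_noise(T, phi, sigma=1.0, tau=1.0, seed=0):
    # Hypothetical data-generating process used only for this illustration.
    rng = random.Random(seed)
    s, ys = 0.0, []
    for _ in range(T):
        s = phi * s + sigma * rng.gauss(0.0, 1.0)
        ys.append(s + tau * rng.gauss(0.0, 1.0))
    return ys

def particle_score(ys, phi, n=300, delta=0.95, sigma=1.0, tau=1.0, seed=1):
    rng = random.Random(seed)
    s = [0.0] * n        # particles at time t-1 (s_0 = 0 is assumed known)
    w = [1.0 / n] * n    # normalised weights w_{t-1}^i
    m = [0.0] * n        # Rao-Blackwellized means m_{t-1}^i
    for y in ys:
        # Weighted average of the previous means: the shrinkage target.
        m_bar = sum(wj * mj for wj, mj in zip(w, m))
        # Ancestor indices kappa_i, sampled with the previous weights.
        kappa = rng.choices(range(n), weights=w, k=n)
        new_s, new_m, logw = [], [], []
        for k in kappa:
            prev = s[k]
            cur = phi * prev + sigma * rng.gauss(0.0, 1.0)
            # Gradient of log f with respect to phi; log g does not depend on phi.
            grad_f = (cur - phi * prev) * prev / sigma ** 2
            new_s.append(cur)
            new_m.append(delta * m[k] + (1.0 - delta) * m_bar + grad_f)
            logw.append(-0.5 * ((y - cur) / tau) ** 2)
        mx = max(logw)
        w = [math.exp(l - mx) for l in logw]
        total = sum(w)
        w = [x / total for x in w]
        s, m = new_s, new_m
    # Final score estimate: weighted average of the means, as in (76).
    return sum(wi * mi for wi, mi in zip(w, m))

ys = simulate_ar1_plus_noise(100, phi=0.8)
score_at_low_phi = particle_score(ys, phi=0.2)
```

Evaluating the score at a value of $$\phi$$ well below the one used to simulate the data should give a positive estimate, which is the direction in which the MALA drift term would push the proposal.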

As a rule of thumb Nemeth et al. (2016) suggest taking $$\delta = 0.95$$. Furthermore, they show the important result that the algorithm should be tuned to the asymptotically optimal acceptance rate of 15.47% and the number of particles must be selected so that the variance of the estimated log-posterior is about 3. Additionally, if measures are not taken to control the error in the variance of the score vector there is no gain over a simple random walk proposal. The marginal likelihood is given by:

$$p(Y_{1:T} |\varvec{\theta}) = p(y_{1} |\varvec{\theta})\mathop \prod \limits_{t = 2}^{T} p(y_{t} |Y_{1:t - 1} ,\varvec{\theta}),$$
(77)

where

$$p(y_{t} |Y_{1:t - 1} ,\varvec{\theta}) = \smallint g(y_{t} |s_{t} ,\varvec{\theta})\smallint f(s_{t} |s_{t - 1} ,\varvec{\theta})p(s_{t - 1} |Y_{1:t - 1} ,\varvec{\theta})ds_{t - 1} ds_{t} ,$$
(78)

provides, in explicit form, the predictive likelihood.


Michaelides, P., Tsionas, M. & Xidonas, P. A Bayesian Signals Approach for the Detection of Crises. J. Quant. Econ. 18, 551–585 (2020). https://doi.org/10.1007/s40953-019-00186-8