1 Introduction

The concept of financial conditions, a summary measure of how easily firms, households, and governments finance themselves, has received wide attention following the 2008–2009 Global Financial Crisis. Financial crises are typically heralded by long periods of low volatility, cheap borrowing rates, high asset prices, and compressed spreads, during which imbalances build up. Loose financial conditions bring debtors close to their borrowing limits, setting the stage for nonlinear effects when financial conditions tighten. In this context, financial condition indices (FCIs) can help monitor the build-up of imbalances and may provide a timely signal of mounting financial stress. Financial conditions are also often mentioned by monetary policy makers, who see asset prices as the first step of the complex transmission mechanism of monetary policy. Adrian and Liang (2018) go as far as opening their paper on monetary policy and financial stability with the sentence: “Monetary policy works by affecting financial condition.” There is ample empirical evidence backing this statement, as monetary policy has been shown to generate large movements in credit costs, mostly via a widening of both term premia and credit spreads (Borio and Zhu 2012; Gertler and Karadi 2015).

A large number of papers have developed FCIs by summarizing in a single indicator the information coming from different segments of the financial sector. A non-exhaustive list of papers on the topic includes Illing and Liu (2006), Hakkio and Keeton (2009), Hatzius et al. (2010), Matheson (2012), Brave and Butters (2012), Hollo et al. (2012), and Koop and Korobilis (2014). Most of these papers borrow their methodological setup from the factor model literature (Forni et al. 2000; Stock and Watson 2002, 2011; Doz et al. 2012) and build on the idea that the relevant information contained in a large dataset can be summarized via data reduction techniques like principal components. The level of sophistication of these indices has increased over time. For instance, Koop and Korobilis (2014) have proposed to use factor models with time-varying loadings and time-varying volatilities to summarize the information contained in a large number of macroeconomic and financial variables. This methodology is designed to account for the fact that the relationship between the financial sector and the real economy is subject to structural breaks over time. Model instability can indeed be a concern. Hatzius et al. (2010), for instance, find that the predictive ability of FCIs for future GDP changes over time.

In this paper, we argue that FCIs based on factor models, either with constant or time-varying parameters, may be prone to some flaws. First, these techniques are designed to reduce information dimensionality in datasets that are characterized by high collinearity. The intuition is that when many series behave in a very similar way, their linear combination efficiently summarizes the information that they convey. Yet, the series that enter popular measures of financial conditions behave very heterogeneously. Table 1 shows the correlation structure of a representative sample of nine macro-financial indicators that are typically used to construct financial condition indices, including credit growth, interest rates, asset prices, volatilities and exchange rates.Footnote 1 Out of the 36 correlations that fill the off-diagonal elements of the table, only 3 are higher than 0.3 in absolute value. Given this heterogeneity and the lack of collinearity, it is very likely that the final composite index will largely reflect the behavior of a limited block of the time series that compose the information set. To illustrate this point in a “large data” context, Fig. 1 shows the correlation between the National Financial Conditions Index (NFCI) for the US economy computed by the Federal Reserve Bank of Chicago and the individual series that compose the index.Footnote 2 The different colors illustrate the block to which the series belong (blue for Spreads and volatilities, violet for Yields, yellow for Credit ratios, orange for Failure rates and delinquencies, green for Lending standards, purple for Issuance and open interests). Visual inspection of Fig. 1 shows that the NFCI loads very heavily on credit spreads, as shown by the large dominance of blue bars at the high end of the correlation spectrum. Some categories, such as Lending standards, display a negligible contribution to the common factor.

Table 1 Correlation across macro-financial indicators
Fig. 1 Correlation of NFCI subcomponents with the final index. For variable definitions and details on the allocation to categories, see Fig. 10 in Appendix 6. Due to data availability, correlations are computed over the period 3 June 2005–29 May 2020 (weekly)

The second problem is that some of these statistical techniques do not give much control over the sign with which the individual components end up contributing to the final indicator. Yet, there is outside information that one might want to use to discipline the direction in which the individual series affect the final index. For instance, exchange rates will move financial conditions in different directions depending on the role that foreign currencies have in the domestic economy. For countries that borrow in foreign currency (typically, emerging economies, EMEs), a depreciation implies an increase of the cost of debt in domestic currency, i.e., a tightening of borrowing conditions, a mechanism Bruno and Shin (2015) refer to as the financial channel of exchange rates. Moreover, EME sovereigns issue local currency debt, but a large share of EME sovereign bonds is held by foreign investors. When the exchange rate depreciates, lenders incur losses that lead to portfolio outflows. In response to these outflows, and to counteract the pass-through of the exchange rate to inflation (particularly strong in EMEs), central banks may end up raising interest rates, amplifying the initial financial tightening.Footnote 3 Appendix 1 illustrates these channels by showing how differently a financial shock originating in the USA affects the exchange rate, asset prices, and monetary policy rates in neighboring advanced and emerging economies, namely Canada and Mexico. The overall economic effect of exchange rate fluctuations then depends on whether this financial channel is stronger than the standard trade channel. Recent research finds that in emerging market economies, a depreciation is associated with weaker cross-border bank flows and lower real investment, pointing to a prevalence of the financial channel (Avdjiev et al. 2019).

The third issue is that the weight that the single indicators receive reflects the nature of past shocks and past crises. It can therefore be the case that some variables that in the past did not cause any crises, yet that ex ante would be interesting to monitor, end up receiving zero weight in a composite index.

We argue that averaging the indicators of interest with equal weights successfully addresses the three issues raised above and produces FCIs that are not inferior to, and often perform better than, those constructed with more sophisticated statistical methods. First, by making sure that no series receives zero weight, the heterogeneity of the underlying components is by definition reflected in the final index. Second, one can judgmentally decide the sign with which the individual components (the exchange rate, for instance) end up contributing to the final indicator. Third, one can make sure that all the variables that one wants to keep in sight actually enter the final indicator. An additional benefit is that analyzing the contribution of the individual components is trivial. Combination of indicators has a long tradition in the forecasting literature, dating back to the seminal work of Bates and Granger (1969). While numerous different weighting schemes have been proposed and analyzed, equal weights combinations have been “the workhorse of the combination literature and have set a benchmark that has proved surprisingly difficult to beat” (Timmermann 2006).Footnote 4

Of course, these advantages need to be traded off against the costs of not using any statistical objective function to aggregate information. This cost, however, can be assessed by checking the performance of different financial condition indices based on sensible criteria. We use two empirical criteria to evaluate the performance of different FCIs. First, in the context of quantile regressions, we examine across different methods the strength of the correlation between tightening in financial conditions and recessions. It is well known that recessions that originate in the financial sector are deeper than standard ones. A desirable property of an FCI is, therefore, to carry strong information about the left tail of the GDP distribution (Adrian et al. 2019). We perform this exercise both in-sample and out-of-sample. The latter exercise is particularly challenging, as recent studies have documented a lack of predictability of tail GDP movements (Hasenzagl et al. 2020). Second, and related to the first, we examine how the alternative FCIs are correlated with future banking and currency crises (broadly speaking, financial crises). Financial crises are somewhat related to deep recessions, so we see this exercise as complementary to the previous one. The results of our empirical analysis show that, for both exercises, FCIs constructed via equal weights averaging perform at least as well as, and often better than, alternative FCIs.

One contribution of our paper is to test comparable FCIs for a large set of countries. This forces us to limit the number of variables underlying each index but also raises the question of whether “large data” alternatives, available for large advanced economies like the USA or the euro area (EA), outperform our simple indices. In the paper, we show that our FCIs outperform two popular alternatives, namely the Chicago Fed’s National Financial Conditions Index (NFCI) for the USA and the Composite Index of Systemic Stress (CISS) by Hollo et al. (2012) for the euro area, in detecting out-of-sample movements in the left tail of GDP.

The question remains as to how to deal with structural changes. As technology, preferences and regulation evolve, so will the relationship between the financial system and the real economy. For instance, prudential regulation led to banks being more capitalized, weakening the relationship between measures of global risk like the VIX and capital flows (Avdjiev et al. 2020). The use of cryptocurrencies could further increase in the future, so that a question on how to account for them in FCIs could arise. Our answer is that econometric models and complex machinery cannot replace the judgment of economists when it comes to understanding the relationship between financial conditions and economic outcomes. Such a judgment will call for a revision of the indicators to be monitored whenever deemed appropriate in the face of structural changes. But simplicity in the aggregation method will continue to provide valuable robustness to a synthetic index of financial conditions.

The paper is organized as follows. Section 2 provides some economic motivation to the exercise by reviewing the role of financial conditions in economic models. Section 3 provides a description of the data and details on the construction of the indices. Section 4 describes the criteria that we employ to assess the performance of our financial condition indices and discusses the empirical results. Section 5 concludes.

2 Financial Conditions, Leverage and the Macroeconomy

Interest in measuring and monitoring financial conditions burst out after the financial crisis in 2008–2009. The magnitude of the real consequences of this financial shock led to a renewed interest in better integrating the financial system in macroeconomic models. In particular, a large effort was devoted to building models that had inherent amplification mechanisms, able to reproduce the nonlinear effect of financial shocks on output. It also led to the search for summary measures of financial conditions that could signal the buildup of risk in the financial sector or signal imminent stress in the financial system.

On the theory side, two different strands are of interest for our purpose. The first is the set of papers that focus on economies with occasionally binding constraints (Deaton 1991; Guerrieri and Lorenzoni 2017). In these economies, credit constraints embody a pecuniary externality that induces private agents to borrow excessively. Individual agents fail to internalize the aggregate consequences of their individual over-borrowing and carry too much debt when they face a tightening of financial conditions (Bianchi 2011). When asset prices fall and their borrowing constraint becomes binding, they need to scale back consumption significantly. This intuition can be generalized to non-financial firms, see, for instance, Jermann and Quadrini (2012) and Liu et al. (2013). These models predict that the loss of net worth due to a large drop in asset prices can lead to long-lasting recessions. In Bernanke and Gertler (1989), for instance, once a shock lowers the net worth of leveraged entrepreneurs, economic activity and profits fall, and it takes entrepreneurs a long time to accumulate sufficient retained earnings to rebuild their net worth. Jorda et al. (2013) provide evidence that financial shocks can lead to protracted slumps.

In the second set of papers, financial intermediaries take center stage. Financially leveraged institutions face a sudden spike in their leverage (debt to asset ratio) when the price of the assets that they hold suddenly falls and their net worth plummets. As they try to return to their leverage target, they generate fire sales, exacerbating the initial price spiral and causing a financial crisis (Brunnermeier et al. 2013). Similar amplification effects emerge in models that include frictions in the financial intermediation process, see for instance Gertler and Kiyotaki (2010), Gertler and Karadi (2011), Brunnermeier and Sannikov (2015), and He and Krishnamurthy (2019).

A common theme in all these papers is the interplay between financial conditions and leverage. On the one hand, loose financial conditions favor the buildup of leverage, as inflated asset prices increase the value of the collateral that borrowers can post to obtain credit as well as the equity of lenders. On the other hand, leverage itself makes the economy vulnerable to sudden gyrations in financial conditions as the unwinding of asset prices can trigger a cascade of margin calls and fire sales, whose effects can be amplified when borrowers deleverage simultaneously. This mechanism is asymmetric: the costs stemming from the materialization of systemic risk are much higher than the benefits stemming from the build-up of systemic risk. This asymmetry is at the center of the results in Adrian et al. (2019) and of the growth-at-risk literature generated by this paper.

While measuring leverage is relatively straightforward, financial conditions are an elusive concept, and the question of how to measure them has spurred a large literature, see Kliesen et al. (2012) for a review. A widely accepted interpretation is that financial conditions reflect the price of risk and that this compensation for risk plays a key role in the fluctuation of asset prices. This is, for instance, the interpretation given in official IMF publications, and is the basis for measuring financial conditions via a weighted average of asset prices, where weights may come from principal components analysis, see, for instance, IMF (2019) and Barajas et al. (2021)Footnote 5, or from a combination of principal components and time-varying parameters, see IMF (2017). Although prima facie appealing, these methods are prone to the shortcomings described in Sect. 1. In the rest of the paper, we show that an approach based on equal weights averaging, which has a long tradition in the forecasting literature, is a valid alternative.

3 Data

Our analysis is based on data for 18 advanced and emerging economies at the monthly frequency from January 1995 to May 2020. As Fig. 2 shows, these countries represent about 70% of the world’s GDP at purchasing power parity (PPP).

Fig. 2 Shares of GDP at purchasing power parity. Blue bars represent advanced economies, and red bars represent emerging market economies. Data in percentages of the world’s total. Source: IMF World Economic Outlook, 2019 data. (Color figure online)

The Information Set. We base the FCIs on a common information set composed of nine variables, namely (i) nominal long-term government bond yields; (ii) a set of four spreads, i.e., sovereign (for emerging economies only), corporate (for advanced economies only), inter-bank and term spreads (for all countries); (iii) realized equity volatility; (iv) the percentage change of equity and real residential house prices; (v) the growth rate of credit to households and non-profit institutions serving households; (vi) the bilateral exchange rate with the US dollar. For details on data sources and description of variables, see Table 5 in Appendix 2. The choice of the indicators is very standard, aimed at covering the main segments of the financial market, and is broadly consistent with IMF (2019). We also include one non-price indicator, i.e., the rate of growth of credit to households, because housing debt constitutes the key vulnerability to a sudden re-pricing of risk and booms in real estate lending greatly heighten the risk of financial crises (Jorda and Taylor 2015). Results available upon request show that excluding this variable from the analysis does not alter the results. All variables are standardized before aggregation.

EW-FCI (Equal Weights-FCI). The equal weights-FCI is constructed by aggregating the chosen indicators as simple (unweighted) averages. Table 2 summarizes the signs that we use to construct the EW-FCI. Given that an increase in the index reflects a tightening, we assign a positive sign to interest rates, spreads, and volatilities and a negative sign to equity prices, house prices, and credit volumes. We let the exchange rate have a different role for indices constructed for advanced and emerging economies. Since emerging economies have a non-negligible part of their debt denominated in US dollars (Bénétrix et al. 2019), when the local currency weakens against the dollar, the cost of debt expressed in national currency rises and financial conditions tighten. For advanced economies, we let the exchange rate work through a traditional trade channel, so that for these countries a weakening of the domestic currency results in an easing of financial conditions.
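The construction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the paper: the indicator names, the sign dictionary (written here for an emerging economy, with the exchange rate measured as local currency units per US dollar), and the toy numbers are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical sign convention for an emerging economy: +1 means that a rise
# in the indicator tightens financial conditions, -1 that it eases them.
SIGNS = {
    "interbank_spread": +1,
    "equity_volatility": +1,
    "equity_price_growth": -1,
    "usd_exchange_rate": +1,  # local currency per USD: a rise is a depreciation
}


def standardize(series):
    """Return z-scores using the population standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    return [(x - mu) / sigma for x in series]


def equal_weights_fci(data, signs):
    """Average the signed, standardized indicators period by period."""
    signed = [[signs[name] * z for z in standardize(values)]
              for name, values in data.items()]
    # zip(*signed) regroups the values by period instead of by indicator.
    return [mean(period) for period in zip(*signed)]


# Made-up numbers for three periods; the second period is the stressed one.
toy = {
    "interbank_spread": [0.2, 1.5, 0.4],
    "equity_volatility": [12.0, 40.0, 15.0],
    "equity_price_growth": [1.0, -8.0, 2.0],
    "usd_exchange_rate": [10.0, 13.0, 10.5],
}
fci = equal_weights_fci(toy, SIGNS)
print([round(v, 2) for v in fci])
```

For an advanced economy, the same sketch would apply with the sign of the exchange rate entry flipped. Because every indicator enters with the same weight, the contribution of each component to the index in a given period is simply its signed z-score divided by the number of indicators.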

Table 2 Set of variables (all FCIs) and signs (EW-FCI)

PCA-FCI (Principal Component Analysis-FCI). As a first alternative, we aggregate the indicators through principal component analysis. We select the first principal component, that is, the one explaining the largest fraction of the variance of the original variables, to be our PCA-FCI. This aggregation method is widely used in the construction of FCIs, see Kliesen et al. (2012) and IMF (2019).
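As a rough illustration of this aggregation step, the sketch below extracts the first principal component of a set of standardized indicators. It is a minimal example under stated assumptions: power iteration is used as a simple stand-in for a full eigen-decomposition, and the function name and toy data are hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize(series):
    """Return z-scores using the population standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    return [(x - mu) / sigma for x in series]


def first_principal_component(columns, iterations=500):
    """Return (scores, loadings, variance share) of the first component.

    `columns` is a list of N equally long series.
    """
    z = [standardize(col) for col in columns]
    n, t = len(z), len(z[0])
    # Correlation matrix of the standardized indicators.
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / t for j in range(n)]
            for i in range(n)]
    w = [1.0] * n  # starting vector; must not be orthogonal to the solution
    for _ in range(iterations):
        w = [sum(corr[i][j] * w[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    # Rayleigh quotient: largest eigenvalue. The eigenvalues sum to n.
    eigenvalue = sum(w[i] * corr[i][j] * w[j]
                     for i in range(n) for j in range(n))
    scores = [sum(w[i] * z[i][k] for i in range(n)) for k in range(t)]
    return scores, w, eigenvalue / n


# Made-up data: two nearly collinear series and a third, weakly related one.
toy_columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [1.2, 1.9, 3.2, 3.8, 5.1, 6.3],
    [2.0, -1.0, 0.5, 1.0, -0.5, 0.0],
]
scores, loadings, share = first_principal_component(toy_columns)
print([round(v, 2) for v in loadings], round(share, 2))
```

Two features of the output are worth noting. The loadings are determined by the data alone, so a block of collinear series can dominate the component. The sign is also arbitrary: multiplying all loadings by -1 gives an equally valid first component, which is why the direction of the index cannot be imposed ex ante.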

TVP-FCI (Time-Varying Parameters-FCI). In the third method, which follows Koop and Korobilis (2014) and Arregui et al. (2018), common dynamics across indicators are summarized through a (single) factor model with time-varying parameters. Time variation in the parameters provides a flexible weighting scheme for the input variables and should accommodate model instability. Appendix 3 provides all the methodological details.

Visual Inspection of the Indices. Figures 3 and 4 compare the three sets of indicators. For a number of countries (USA and France, for instance) the three methods deliver very similar results. For some other countries, however, the measures differ notably and the TVP-FCI and PCA-FCI give results that are hard to interpret. First, for Germany and Japan they both present a visible upward trend, hard to reconcile with falling rates in both countries. Inspection of the model weights reveals that, for these two countries, the first principal component and the factor model assign a dominant weight to long-term interest rates, loading on this variable with a negative sign and with an opposite sign to spreads and volatilities.Footnote 6 This is an issue that may arise when these techniques are applied to variables that present some mild non-stationarity, and it sounds an alarm bell against applying these methods heedlessly. The EW-FCI, on the other hand, correctly detects a peak of financial stress during the Great Recession and an easing of financial conditions thereafter. Second, in many cases a simple correlation analysis reveals that the TVP-FCI and PCA-FCI load heavily on some specific indicators, either inter-bank spreads or realized equity volatility. They therefore fail to capture the broader signal coming from many indicators and instead over-weight a small number of them. As explained in the Introduction, this is likely due to the low collinearity of the data and is a major shortcoming of indicators based on principal components and factor models.Footnote 7 Third, for some countries the behavior of the indices is hard to reconcile with well-known economic narratives. In the case of Italy, for instance, the TVP-FCI and PCA-FCI (unlike the EW-FCI) do not detect a major tightening of financial conditions during the 2008 financial crisis nor the resurgence of stress during the Sovereign Debt crisis in 2011. The second striking case is that of Turkey, an economy that underwent significant financial stress in 2019 following a collapse of the Turkish lira. This episode is ignored by both the TVP-FCI and PCA-FCI but is captured by the EW-FCI.

Fig. 3 Comparison of the FCIs, 10 largest countries. Shaded areas represent NBER recessions (USA). All indicators are standardized. Countries are ordered by GDP PPP shares

Fig. 4 Comparison of the FCIs, continued. Shaded areas represent NBER recessions (USA). All indicators are standardized. Countries are ordered by GDP PPP shares

4 Empirical Analysis

Visual inspection suggests that, in some instances, the PCA-FCI and TVP-FCI can behave counter-intuitively. The question is whether these methods have some statistical properties that make them preferable to a simpler alternative like equal weights averaging. In this section we show that this is not the case. When applied in a growth-at-risk context, the EW-FCI delivers more accurate out-of-sample predictions and it is also a better predictor of banking crises.

4.1 Quantile Regressions

The quantile regression approach provides a framework for estimating the impact of a given variable X on the entire conditional distribution of a dependent variable y. This is achieved through separate coefficients for the various quantiles (see Appendix 4 for more details). Based on this approach, Adrian et al. (2019) find for the USA a close link between current financial conditions and the conditional distribution of future GDP growth. In particular, the lower quantiles of future GDP growth are much more sensitive than the higher ones to current developments in financial conditions. Moreover, the entire distribution of future GDP growth evolves over time. Recessions are associated with left-skewed tails, while during expansions the conditional distribution is broadly symmetric. This asymmetry in the evolution of the conditional tails of the distribution of future GDP growth indicates that downside risks to economic activity vary more strongly over time and react more to developments in financial conditions compared to upside risks. We use quantile regressions to test for the nonlinear impact of the three measures of financial conditions described in Sect. 3 on the different quantiles of industrial productionFootnote 8 (for data availability see Appendix 5). For a large majority of the countries in our sample, we find that (regardless of the FCI used) the impact of financial conditions on the lower quantiles of industrial production is significantly more negative than either on the central tendency or on the upper tails, confirming the results of Adrian et al. (2019) in an international context.
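For reference, the textbook quantile regression estimator can be written as follows. The notation is an assumption made here for illustration (\(\tau\) for the quantile, \(h\) for the forecast horizon, \(x_t\) for the regressors including the FCI); the exact specification used in the paper is the one given in Appendix 4.

$$\hat{\beta}_{\tau} = \arg\min_{\beta} \sum_{t} \rho_{\tau}\left(y_{t+h} - x_{t}'\beta\right), \qquad \rho_{\tau}(u) = u\left(\tau - \mathbb{1}\{u<0\}\right)$$

With \(\tau = 0.05\), negative residuals carry a weight of 0.95 and positive residuals a weight of 0.05, so observations falling below the fitted line are penalized nineteen times more heavily than those above it. This is what pulls the fit toward the lower tail of the distribution.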

Figure 5 compares the impact on the lower quantile (5th percentile) of the distribution among the three FCIs.Footnote 9 The first criterion that we use to rank the three FCIs is their ability to signal downside risk to economic activity. This is of course a subjective criterion. It could be the case that, over this specific sample period, for some of these countries economic recessions have originated from other (than financial) shocks, so that no dynamic relationship between FCIs and economic activity can be established. But if such a relationship exists and is detected by some of the proposed FCIs, then a ranking across indices emerges. Using this metric, the number of countries for which an FCI fails to detect a significant relationship between financial conditions and economic activity is 8 for the TVP-FCI (India, Germany, Russia, Brazil, Italy, Canada, Norway, New Zealand), 7 for the PCA-FCI (India, Russia, Brazil, Mexico, Canada, Norway, New Zealand) and 6 for the EW-FCI (India, Russia, Turkey, Australia, Norway, New Zealand). In the case of India the EW-FCI is the only one for which the coefficient’s point estimate has the right (negative) sign.

For countries for which large datasets of good quality are available, other indices based on different data reduction methods might work better than a simple average. Two obvious alternatives are the Chicago Fed NFCI for the USA and the CISS by Hollo et al. (2012) for the euro area. The former, already described in Sect. 1, is estimated via a large factor model. The latter is obtained by aggregating 13 indicators of financial stress through a time-varying correlation model. We therefore repeat the quantile regression analysis for the USA and the EA including also the NFCI and the CISS as potential competitors. The results of this exercise, shown in Fig. 6, indicate that neither of these indices has superior in-sample predictive power.

Fig. 5 Impact of FCIs on the 5th percentile of the distribution of industrial production. The white line represents the mean impact of FCIs changes on the 5th percentile of industrial production, while the shaded areas represent the 95% confidence intervals around it. The country-specific sample varies according to data availability, see Appendix 5. Countries are ordered by GDP PPP shares

Fig. 6 Impact of FCIs on the 5th percentile of the distribution of industrial production: comparison with Chicago NFCI and euro area CISS. The white line represents the mean impact of FCIs changes on the 5th percentile of industrial production, while the shaded areas represent the 95% confidence intervals around it. The euro area indicators are obtained by aggregating FCIs for Germany, France and Italy using varying GDP PPP annual shares as weights (see Fig. 2, for 2019). For the euro area the data coverage is January 2000–December 2019

Fig. 7 Rank of predictive scores of out-of-sample performance. The figure reports the predictive scores of the Probability Integral Transform (PIT). The out-of-sample predictive scores of the predictive distribution for industrial production growth are conditioned on each FCI at a time, a constant, and a lag of industrial production growth. The color coding defines the score ranking by country: the index that performs best the highest number of times is colored dark blue and ranked 1, the next one lighter blue and ranked 2, and so on with the lowest number of cases being colored the lightest blue and ranked 5. Countries are ordered by GDP PPP shares, except for the USA and the euro area. (Color figure online)

A more stringent test for the usefulness of the FCIs is their ability to detect risks out-of-sample, i.e., to signal coming recessions in real time. We therefore test the out-of-sample predictive accuracy of the FCIs in the quantile regression framework. Following Adrian et al. (2019), we compute predictive scores as the predictive distribution generated by the model and evaluated at the realized value of the time series.Footnote 10 The higher the predictive scores, the more accurate the out-of-sample prediction.

To provide a compact view of the results, we summarize them in a heatmap (Fig. 7) where the rows represent the countries and the columns the different FCIs. The darker the color, the better the average out-of-sample performance, so the dark cells indicate the best performing FCI. We use an ordinal criterion to measure performance, i.e., the best performing model is the one that attains the highest score in most months. This avoids ranking first an FCI that has a much better performance in a few selected quarters but does not outperform the others on a consistent basis.Footnote 11 This exercise gives a very clear ranking, as the EW-FCI is ranked first in 10 out of 19 cases, the PCA-FCI in 5 and the TVP-FCI in 4. Out of these three indices, the TVP-FCI attains by far the worst performance, since it is ranked third in most cases. To provide a more complete picture we also include in this comparison the VIX and the Chicago Fed NFCI for the USA and the CISS for the euro area. Their out-of-sample performance is very poor as they end up being ranked last.Footnote 12
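The ordinal criterion can be made concrete with a small sketch. It is purely illustrative: the function name and the predictive scores below are made up, and ties within a period are resolved in favor of the model listed first.

```python
def rank_by_wins(scores):
    """Rank models by the number of periods in which each has the top score.

    `scores` maps a model name to its list of per-period predictive scores.
    Ties within a period go to the model listed first.
    """
    names = list(scores)
    wins = {name: 0 for name in names}
    for period in zip(*(scores[name] for name in names)):
        best = max(range(len(names)), key=lambda i: period[i])
        wins[names[best]] += 1
    # sorted() is stable, so models with equal wins keep their input order.
    ranking = sorted(names, key=lambda name: wins[name], reverse=True)
    return ranking, wins


# Made-up scores over five periods. The third model has the highest average
# score only because of one exceptional period.
toy_scores = {
    "EW-FCI": [0.30, 0.32, 0.31, 0.29, 0.10],
    "PCA-FCI": [0.28, 0.30, 0.33, 0.27, 0.12],
    "TVP-FCI": [0.10, 0.11, 0.12, 0.10, 0.90],
}
ranking, wins = rank_by_wins(toy_scores)
averages = {name: sum(v) / len(v) for name, v in toy_scores.items()}
print(ranking, wins)
```

In this toy example a ranking by average score would put the third model first, while the count of periods won ranks it no better than second. That is the situation the ordinal criterion is meant to guard against.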

The main takeaway of this analysis is that the EW-FCI performs at least as well as a range of alternative indicators in capturing in-sample growth-at-risk, and significantly better in an out-of-sample context. This finding aligns well with the conclusion in Timmermann (2006) that equal weights combinations are a surprisingly hard model to beat in out-of-sample forecasting.Footnote 13

4.2 FCIs and Crisis Probability

As a second criterion for assessing the informational content of the three competing indices, we consider their ability to predict a set of crises, specifically systemic banking and currency crises.Footnote 14 For this purpose, we collect data on the timing of these crises from Laeven and Valencia (2020) and estimate the following panel probit modelFootnote 15 for each set of crises:

$$\Pr(Y_{t}=1 \mid X_{t-1}) = \int_{-\infty}^{X'_{t-1}\beta} \phi(z)\,dz = \Phi(X'_{t-1}\beta)$$
(1)

where Pr denotes the outcome probability, Y is a binary variable equal to 1 when a crisis occurs and 0 otherwise,Footnote 16 and X is a vector of explanatory variables that influence the outcome. We estimate four different specifications. In the first three, we include each of the competing indicators of financial conditions separately. In the fourth, we include all of them. We also include a set of standard control variables (\(X_t\)), namely the growth rate of inflation and of real GDP, the level of real credit from banks to the private non-financial sector and the growth rate of real domestic and foreign credit.Footnote 17 Since we are more interested in the predictive power rather than in the contemporaneous relationship of the variables, we lag all the regressors by one period.
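To make Eq. (1) concrete, the sketch below evaluates the probit probability for a given set of lagged regressors. The coefficient values and regressors are hypothetical and chosen only for illustration; they are not estimates from Table 3.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def crisis_probability(x_lagged, beta):
    """Probit probability: the normal CDF of the linear index x'beta."""
    index = sum(x * b for x, b in zip(x_lagged, beta))
    return normal_cdf(index)


# Hypothetical coefficients on: constant, lagged FCI, lagged real GDP growth.
beta = [-2.0, 0.5, -0.1]

calm = crisis_probability([1.0, 0.0, 2.0], beta)      # FCI at its mean
stressed = crisis_probability([1.0, 2.0, 2.0], beta)  # FCI two s.d. tighter
print(round(calm, 3), round(stressed, 3))
```

Because the normal CDF is nonlinear, the effect of a given tightening on the crisis probability depends on the starting level of the index, which is why probit coefficients are usually read through their sign and through implied probabilities rather than as constant marginal effects.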

Table 3 Panel probit

Table 3 reports the results. Let us look first at banking crises (panel A). Except for the PCA-FCI, the coefficients associated with the FCIs have a positive sign (i.e., a tightening in financial conditions at time t−1 increases the probability of a banking crisis at time t). However, the only FCI that combines statistical and economic significance is the EW-FCI. In addition, the magnitude of the coefficients, as well as the value of the log-likelihood, suggests that the EW-FCI is the best performing measure. This result is confirmed by the fact that when we include all the indicators simultaneously (column 4), only the coefficient associated with the EW-FCI remains statistically and economically significant. For a graphical comparison of the models, Fig. 8 plots the receiver operating characteristic (ROC) curves for each of the models in Table 3 and for a model that includes only the controls and excludes any type of FCI (which we call model 5 in the chart). Conceptually, the ROC curve plots the true positive rate, i.e., the probability that the model signals a crisis when there is a crisis (known as sensitivity), against the false positive rate, i.e., the probability that the model signals a crisis when there is no crisis (equal to one minus specificity). The ROC curve of a random choice model is the 45-degree line. The area under the ROC curve (AUROC) can be interpreted as a measure of the accuracy of a binary model: the higher the AUROC, the better the model. The top panel of the chart shows that the best performing model is model 4 (in yellow), that is, the model that includes all FCIs. We formally test whether the AUROC associated with model 4 (i.e., the one with the largest AUROC and lowest log likelihood) is statistically different from the AUROC of every other model in pairwise comparisons. The top panel of Table 4 shows the results. For systemic banking crises, model 4 is not statistically different from the one based solely on the EW-FCI, while its performance is significantly better than that of the TVP-FCI and PCA-FCI models.
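As an illustration of how an ROC curve and its AUROC are obtained from fitted crisis probabilities, the sketch below traces the curve threshold by threshold and integrates it with the trapezoidal rule. The function names `roc_points` and `auroc` are hypothetical, and the scores and crisis labels are invented for illustration; they are not taken from Table 3.

```python
def roc_points(scores, labels):
    """Return ROC points (false positive rate, true positive rate).

    A crisis is signalled whenever the score is at or above the threshold;
    thresholds are swept from the highest score to the lowest.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auroc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Invented example (assumption): fitted probabilities and realized crises.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]

area = auroc(roc_points(scores, labels))               # 0.75 for these data
area_flat = auroc(roc_points([0.5] * 8, labels))       # uninformative score: 0.5
```

An uninformative score yields only the points (0, 0) and (1, 1), that is, the 45-degree line with an area of 0.5, while a model that ranks every crisis observation above every non-crisis observation attains an area of 1.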

Moving to currency crises (panel B of Table 3), when indicators are included on their own, the best predictors are the TVP-FCI and the EW-FCI. When all the FCIs are included in the model, the TVP-FCI is the only one whose coefficient remains statistically and economically significant. In this case, however, formal tests do not detect statistically significant differences across the alternative models, as shown in the bottom panel of Table 4.

Fig. 8 Comparison of ROC curves for each model. The numbers in brackets refer to the different models reported in Table 3. Lines represent the ROC curves for each of the models. The blue line refers to the model including TVP-FCI, the red line to the model including PCA-FCI, the green line to the model including EW-FCI, the yellow line to the model including all three FCIs, and the light blue line to the model including only controls and excluding any type of FCIs (the latter is not included in Table 3). (Color figure online)

Table 4 Test for statistical difference of AUROCs

5 Conclusions

In this paper, we evaluate alternative indicators of financial conditions for a large number of advanced and emerging economies.

Our econometric evaluation, based on a large sample of countries, shows that FCIs obtained via equal-weight combinations of financial variables have good statistical properties. They are not inferior to, and in some instances perform better than, FCIs constructed with principal component analysis or with factor models with time-varying parameters. The results hold both in the context of quantile regressions, where they prove useful in anticipating downside risks to economic activity, and in probit models, where they show a stronger correlation with future banking crises. Importantly, for the euro area and for the USA these simple indices outperform popular alternatives based on larger information sets and on different econometric methods, namely the Composite Index of Systemic Stress (CISS) for the euro area and the National Financial Conditions Index (NFCI) published by the Chicago Fed for the USA.

Equal-weight combination has a long tradition in the forecasting literature, does not suffer from parameter estimation uncertainty, and allows for an easy decomposition of the contribution of the underlying indicators to the aggregate index. It is also less fragile in the face of short samples and data irregularities, a pressing concern when working with emerging markets data. We believe the results of this paper will be of particular interest for researchers and institutions interested in monitoring financial conditions in a large number of economies, including those for which data series are too short or present too many irregularities to be treated through more formal econometric methods.
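The decomposition property can be written out explicitly. As a sketch, assume the index is the simple average of \(N\) standardized financial variables \(z_{i,t}\); the symbols \(z_{i,t}\), \(c_{i,t}\), and \(N\) are introduced here only for illustration. The contribution of each variable is then its standardized value divided by \(N\), and the contributions sum to the index with no estimated weights involved:

```latex
\[
\mathrm{FCI}^{EW}_{t} = \frac{1}{N}\sum_{i=1}^{N} z_{i,t}, \qquad
c_{i,t} = \frac{z_{i,t}}{N}, \qquad
\sum_{i=1}^{N} c_{i,t} = \mathrm{FCI}^{EW}_{t}.
\]
```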