1 Introduction

The report to the G20 Finance Ministers and Governors prepared in October 2009 by the International Monetary Fund (IMF), the Financial Stability Board (FSB) and the Bank for International Settlements (BIS) defines systemic risk as “a risk of disruption to financial services that is caused by an impairment of all or parts of the financial system and has the potential to have serious negative consequences for the real economy” (IMF, BIS, FSB, 2009). In December 2009, the European Central Bank (ECB) described systemic risk in similar terms as a risk of financial instability “so widespread that it impairs the functioning of a financial system to the point where economic growth and welfare suffer materially” (ECB 2009). Negative externalities and the significant spillovers to the real economy—the essence of systemic risk—make a case for policy intervention (Russo et al. 2020). However, translating this general insight into practical policies is very difficult (Caruana 2010), and despite a general consensus on the importance of systemic risk and the need to keep it under control, considerable differences between the approach adopted by supervisory authorities and the academic discourse remain. The bucketing of the Global Systemically Important Banks (G-SIBs) is a very relevant issue given the required “add-on” systemic capital buffer (and its possible consequences for credit supply). A 50-basis-point add-on to the Common Equity Tier 1 (CET1) ratio—the difference deriving from being placed in bucket 1 or bucket 2, for instance—means raising several billion euros (or dollars) of extra equity or an even more considerable downsizing of total risk-weighted assets. Given its implications, bucketing should be straightforward and consistent, irrespective of the methodology used.

The goal of this paper is to analyze the consistency of rankings of banks’ systemic riskiness across different methodologies: two academic measures—(1) the Distressed Insurance Premium (DIP) methodology proposed by Huang et al. (2009) and (2) the SRISK of Acharya et al. (2012)—and (3) the regulatory approach. We also examine how the rankings translate into allocation to different buckets and into capital requirement add-ons.

Our paper contributes to the existing literature along a number of dimensions. First, to the best of our knowledge, this is the first attempt to directly and simultaneously compare the academic and regulatory views on bucketing. Moreover, instead of analyzing the financial sector of a specific geographical area, as is common in this literature—for instance, Benoit et al. (2013) use 94 U.S. financial institutions, Lin et al. (2016) use 31 Taiwanese financial institutions, and Nucera et al. (2016) use 113 European Union financial institutions—we examine all listed G-SIBs identified by the FSB and the Basel Committee on Banking Supervision (BCBS), using 2014 as the reference year, thus covering almost the entire population of G-SIBs. Second, we apply the DIP measure and compare the ranking consistency between the DIP and SRISK measures. We show that rankings depend not only on the model used but also on a number of assumptions: e.g., DIP rankings change considerably depending on the drop in aggregate bank liabilities set as the threshold (5%, 15% or 25%) for a crisis to be considered systemic. Third, we build risk buckets with DIP and SRISK and compare them with the official FSB–BCBS buckets. In addition to the assumptions necessary to rank the various institutions, we show that the subsequent allocation to different buckets (and consequently the size of the CET1 add-on) also stems from (i) decisions on the methodology used to define the buckets (e.g., the FSB–BCBS uses a relative measure based on balance sheet data, which means that if all banks increase in size they could all become more systemic while their add-ons, based on relative positioning, remain unchanged) and (ii) the way cut-offs are set (e.g., the FSB–BCBS applies a constant 100-basis-point division into buckets). Using the SRISK ranking, we show that there are considerable differences in bucketing based on diverse cut-off methodologies. We compare the different bucketing methodologies by computing two very simple indices of “diversity” and “harshness”. Indeed, bucketing differs and depends on assumptions, methodology and cut-offs; moreover, we show that the interaction between these elements changes harshness, namely add-ons, considerably and not uniformly.

The paper is structured as follows. In Sect. 2, we give an overview of the supervisors’ approach and review the academic methods for calculating systemic risk, in order to provide a complete view of the background and to identify the main gaps in the literature that our study addresses. Moreover, we explain the different methodologies implemented for measuring financial institutions’ marginal contribution to systemic risk: our DIP approach; the Capital Shortfall, also called SRISK, presented and used in many papers such as Acharya et al. (2012), Acharya and Steffen (2014), and Brownlees and Engle (2017); and the BCBS approach. Section 3 compares the results of our DIP methodology, the SRISK ranking published on the website of the Volatility Institute of New York University’s Stern School of Business, and that of the supervisors. Furthermore, we focus on bucketing and compare the official buckets with the outputs based on the DIP and SRISK approaches. Indeed, the FSB–BCBS, using the BCBS (2011, 2013) methodology, ranks G-SIBs according to their contribution to systemic risk and then allocates them to four (theoretically five, but one bucket is empty) buckets corresponding to different required levels of additional loss absorption capacity. Given the importance of these buckets, which imply different capital requirements, we analyze the differences among the various outputs for the three years from 2014 to 2016 (that is, with 2013, 2014 and 2015 year-end data), as well as the stability of the bucket allocation over time and across different allocation methods. Section 4 discusses a number of implications of our analysis, suggests a combined use of different methodologies and concludes.

2 Background and literature

In order to improve the resilience of the global financial system to systemic risk and moral hazard, the FSB initially proposed the introduction of a specific regulatory framework and enhanced capital requirements for Systemically Important Financial Institutions (SIFIs), i.e., “financial institutions whose distress or disorderly failure, because of their size, complexity and systemic interconnectedness, would cause significant disruption to the wider financial system and economic activity” (FSB 2010). The proposed regulatory framework for SIFIs includes guidelines for more intensive and effective supervision, for effective resolution regimes, and for increasing loss absorption capacity and improving funding profiles.

From an academic perspective, there are various definitions of systemic risk in the literature and a continuously expanding body of research is focused on identifying and measuring the systemic importance of financial institutions (Bisias et al. 2012; Silva et al. 2017; Benoit et al. 2017). A universally accepted definition remains elusive because systemic risk can be defined from several perspectives, depending on how the risk originates and how it is transmitted across different institutions, markets and the “real economy”. The academic literature on measuring systemic risk draws on a great variety of data (from financial statements to financial market data) and a vast array of techniques, ranging from tail risk measures to contingent claim analysis, and from stress-testing approaches to network analysis.

We analyze the ranking consistency between the DIP of Huang et al. (2009) and the SRISK of Acharya et al. (2012) because they are among the most influential systemic risk measures for both academics and supervisors, as reported in Benoit et al. (2017).Footnote 1 However, even though DIP is one of the most influential systemic risk measures, to the best of our knowledge all the papers that study ranking consistency use other measures, in particular MES, SRISK and ΔCoVaR. Some papers use DIP alongside other systemic risk measures without the aim of comparing ranking consistency. For instance, Cai et al. (2018), considering the interconnectedness of banks in the syndicated loan market as a major source of systemic risk, use syndicated loan facilities originated for U.S. firms between 1988 and 2011 to check whether DIP, SRISK and CoVaR are correlated with an interconnectedness measure they build. Moreover, we compare the DIP methodology to the SRISK measure because, among the most popular systemic risk measures, DIP and SRISK differ from the other most relevant market-based measures in that they are not driven by a single market factor but use a mix of market and book data. This makes them more comparable with the BCBS methodology. Balance sheet data are also useful for keeping the ranking more stable, an important feature as explained by Nucera et al. (2016). Lastly, the DIP measure uses the CDS spread among its inputs, and this distinctive feature (compared to measures such as MES, SRISK and ΔCoVaR) could be very useful in light of the findings of Rodríguez-Moreno and Peña (2013), who affirm that “measures based on CDSs outperform measures based on interbank rates or stock market prices”.

In what follows, we explain the different methodologies applied for our analysis.

2.1 Distressed insurance premium (DIP)

We use the DIP methodology proposed by Huang et al. (2009), based on the contribution of Tarashev and Zhu (2008) and applied by Huang et al. (2012a, 2012b), to measure the price of insurance against financial distress for a group of major financial institutions, that is, the price of covering the losses on these banks’ liabilities. We apply a modified version of this methodology to almost the entire population of banking institutions classified as G-SIBs by the FSB and BCBS in 2014: 26 banks for 2014, 2015 and 2016 (using 2013, 2014 and 2015 year-end data), as listed in Table 1, which sets out the 2014 FSB–BCBS G-SIB buckets, CDS spreads and total assets.

Table 1 2014 G-SIBs – sample and selected indicators (balance sheet items in million euro). Compared to the 2014 list (using end-2013 data), composed of 29 banks, we exclude Bank of New York Mellon, Groupe BPCE and State Street because of missing data (stock market returns and/or CDS spreads). Moreover, the FSB and BCBS added Agricultural Bank of China (and deleted BBVA) in 2015, and China Construction Bank in 2016. These two Chinese banks are not in our dataset due to lack of CDS spread data

We use the following parameters as inputs: a default probability measure derived from CDS spreads; a contagion assessment founded on equity return correlations among financial institutions; liabilities as reported in the official financial statements; and a 45% and a 75% loss given default (LGD), i.e., the two loss rates prescribed by the Basel foundation approach. Therefore, for each G-SIB and for every year in our analysis (2013, 2014 and 2015), we use: (a) the CDS spread as at 31 December (source: Bloomberg); (b) the bank’s daily stock market returns (source: Datastream); (c) total liabilities reported in the official financial statements and converted into euros (source: Orbis Bank Focus); (d) the OIS rate as at 31 December as the risk-free rate (source: Bloomberg). These data are collected for all the analyzed years and used to form the systemic risk ranking for the subsequent year (2014, 2015 and 2016).

This methodology rests on the following assumptions: a constant risk-free term structure, the OIS rate as a proxy of the risk-free rate, a flat default intensity term structure, an LGD independent of the probability of default and equal to the values required by the regulatory capital framework, and a multivariate Normal distribution for asset returns.

Our methodology is composed of the following four steps.

  1. Estimation of the risk-neutral PD of each bank implicit in the CDS spread, namely

     $$PD = \frac{a \cdot s}{a \cdot LGD + b \cdot s}$$
     (1)

     where s is the CDS spread, \(a = \int_{t}^{t+T} e^{-r\tau}\,d\tau\), \(b = \int_{t}^{t+T} \tau e^{-r\tau}\,d\tau\), and r is the risk-free rate. The PD implicit in the CDS spread incorporates: (1) compensation for expected default losses; (2) a default risk premium for bearing default risk; and (3) other premium components, such as a market liquidity premium. All these components reflect factors that produce and propagate systemic risk in the banking/financial system.

  2. Calculation of the equity return correlation as a proxy for the asset return correlation.

  3. Construction of a hypothetical portfolio composed of the sum of the liabilities of all investigated G-SIBs.

  4. Computation of the portfolio expected losses with a Monte Carlo simulation. We repeat the following procedure 1,000,000 times:

     a. Extract an N-dimensional vector of numbers (where N is the number of analyzed banks) from a multivariate Normal distribution with the correlation matrix calculated in step 2;

     b. Compare each extracted number with the corresponding PD computed in step 1: if the number is below the PD value, the bank defaults; if it is equal to or above it, the bank remains solvent;

     c. Compute the losses by applying a fixed regulatory LGD to the liabilities of the defaulted banks;

     d. Calculate the portfolio’s expected loss and the expected loss of each bank in the “systemic crisis” events, respectively:

     $$E\left[L \mid L > L_{min}\right]$$
     $$E\left[L_{i} \mid L > L_{min}\right]$$

     where L is the loss for the whole portfolio, \(L_i\) is the loss for the single bank and \(L_{min}\) is the threshold set to define a systemic crisis. In the baseline simulation, we consider a crisis systemic when the total banking sector loss exceeds 15% of aggregate liabilities (the same threshold, arbitrarily chosen, as in Huang et al. 2009), i.e., \(L_{min}\) is 15%.

We modify the approach in two respects: first, we calculate the correlation matrix for bank asset returns using past stock market daily data, and not with forecasted asset returns as in Huang et al. (2009) or with a dynamic conditional correlation (DCC) GARCH model as in Huang et al. (2012a). Second, differently from Tarashev and Zhu (2008), we do not perform a Monte Carlo simulation for the LGD assessment; instead, for each year, we implement the described methodology twice, using the two regulatory LGDs (45% and 75%), which yield very similar results. At the end of each Monte Carlo simulation, we compile a ranking based on the systemic risk of every bank, computed as the expected loss \(E\left[{L}_{i}|L>{L}_{min}\right]\) produced by each bank in case of systemic crisis. A code sketch of the whole procedure is given below.
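For concreteness, the following is a minimal Python sketch of the four steps under our stated assumptions. All inputs (spreads, correlation matrix, liabilities) are hypothetical placeholders, and we adopt the standard Gaussian-copula convention of comparing each correlated Normal draw with \(\Phi^{-1}(PD)\) in step (b).

```python
import numpy as np
from scipy.stats import norm

def implied_pd(s, lgd, r, T=5.0):
    """Step 1: risk-neutral PD from the CDS spread s (Eq. 1).
    a and b are the integrals of e^{-r tau} and tau e^{-r tau} over [0, T]."""
    a = (1.0 - np.exp(-r * T)) / r
    b = (1.0 - (1.0 + r * T) * np.exp(-r * T)) / r ** 2
    return a * s / (a * lgd + b * s)

def dip(pd, corr, liabilities, lgd, l_min=0.15, n_sims=1_000_000, seed=42):
    """Steps 2-4: Monte Carlo estimate of E[L_i | L > L_min], the expected loss
    of each bank conditional on a systemic crisis (L_min as a share of total
    liabilities)."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(corr)        # step 2: correlation structure
    z_default = norm.ppf(pd)               # default thresholds Phi^{-1}(PD_i)
    total = liabilities.sum()              # step 3: hypothetical portfolio
    cond_loss, n_crisis = np.zeros(len(pd)), 0
    for _ in range(n_sims // 100_000):     # step 4: simulate in batches
        z = rng.standard_normal((100_000, len(pd))) @ chol.T
        losses = (z < z_default) * (lgd * liabilities)   # step 4c
        crisis = losses.sum(axis=1) > l_min * total      # "systemic" scenarios
        n_crisis += crisis.sum()
        cond_loss += losses[crisis].sum(axis=0)
    return cond_loss / max(n_crisis, 1)    # step 4d: E[L_i | L > L_min]

# hypothetical three-bank example (CDS spreads, correlations, liabilities)
spreads = np.array([0.009, 0.012, 0.007])
pds = implied_pd(spreads, lgd=0.45, r=0.01)
corr = np.array([[1.0, 0.6, 0.3], [0.6, 1.0, 0.4], [0.3, 0.4, 1.0]])
liab = np.array([1.8e6, 1.2e6, 0.9e6])
ranking = np.argsort(-dip(pds, corr, liab, lgd=0.45))  # most systemic first
```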

2.2 SRISK

We compare the output of our DIP approach with the ranking based on the SRISK measure reported in the NYU Stern Systemic Risk RankingsFootnote 2 and based on the Capital Shortfall proposed by Acharya et al. (2012), who define SRISK as the amount of capital a financial institution would need to raise in order to function normally should another financial crisis occur. This measure uses a “what if” approach similar to the stress tests conducted by most supervisory authorities on a regular basis. The stressed scenario is a 40% drop in the global equity market over six months, and banks must satisfy a capital requirement of equity exceeding 8% of total assets, with a lower 5.5% requirement for banks that apply International Financial Reporting Standards (IFRS). Therefore, in the NYU Stern Systemic Risk Rankings, the capital requirement is 5.5% for European banks and 8% for all the others. Table 7 shows the ranking of the 26 banks at the end of 2013, 2014 and 2015 extracted from the NYU Stern Systemic Risk Rankings. However, we also present the ranking when the capital requirement is 8% for all banks and the related consistency (the Spearman’s correlations in Table 9).
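As a minimal sketch (our reading of the Brownlees–Engle formulation, not the exact V-Lab implementation), SRISK reduces to a one-line capital shortfall computation; the inputs below are hypothetical:

```python
def srisk(book_debt, market_cap, lrmes, k=0.08):
    """Capital shortfall in the stressed scenario: k * debt minus the
    (1 - k) share of the equity expected to survive the crash. k is the
    prudential capital ratio (5.5% for European/IFRS banks in the NYU Stern
    rankings, 8% otherwise); lrmes is the long-run marginal expected
    shortfall, i.e. the expected fractional equity loss given a 40% drop."""
    return k * book_debt - (1.0 - k) * (1.0 - lrmes) * market_cap

# hypothetical European bank under both capital requirements
print(srisk(1.5e6, 8e4, lrmes=0.55, k=0.055))  # 5.5% requirement
print(srisk(1.5e6, 8e4, lrmes=0.55, k=0.08))   # 8% requirement
```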

2.3 The supervisors’ approach

The BCBS (2011) developed an assessment methodology to identify G-SIBs and, starting from that year, the updated list has been published annually. The starting point of the BCBS approach is that global systemic importance should be measured in terms of the impact that the failure of a bank can have on the global financial system and the economy, rather than the risk that a failure can occur. This can be thought of as a global, system-wide loss-given-default (LGD) concept rather than a probability of default (PD) concept. In this perspective, the methodology combines quantitative and qualitative indicators to assess systemic importance, based on five categories of indicators—size, cross-jurisdictional activity, interconnectedness, substitutability/financial institution infrastructure, complexity—each given an equal weight of 20%. These indicators are aimed at measuring the multifaceted dimensions of systemic importance and reflecting the determinants of negative externalities and the characteristics that make a bank critical for the stability of the financial system. The multiple-indicator approach is applied to the data of the previous fiscal year-end, supplied by banks and validated by national authorities. For each bank, the score for a particular indicator is calculated by dividing the bank’s amount by the total for a sample of banks considered a proxy for the global banking sector and identified by the BCBS. Therefore, this approach is based on a relative measure. Because it is relative, it could underestimate required add-ons: ratios could remain similar even if all banks increase their indicators. Moreover, it is based on accounting data, which are not forward looking and could reflect inconsistent accounting practices.

The final score is then mapped to a corresponding bucket using the cut-off score and bucket thresholds.Footnote 3 When a bank’s final score exceeds a cut-off level set by the Committee, the bank is classified as a G-SIB. The assignment to a bucket determines the higher loss absorbency requirement for each G-SIB, which ranges from an add-on of 1% to 3.5% of the CET1 ratio depending on the bank’s systemic importance (with the 3.5% CET1 add-on bucket kept empty as a means to discourage banks from becoming even more systemically important). This “add-on” improves the loss absorbency capacity of the bank and reduces its probability of failure. The list of the 26 G-SIBs and their respective 2014 systemic risk add-ons are set out in Table 1.
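A stylized sketch of the relative scoring and bucket mapping is given below; the indicator names and data layout are illustrative assumptions, while the equal category weights, the 130-basis-point cut-off and the 100-basis-point bucket widths follow the BCBS (2013) methodology:

```python
import numpy as np

# illustrative category -> indicator mapping (names are hypothetical labels)
CATEGORIES = {
    "size": ["total_exposures"],
    "cross_jurisdictional": ["cross_juris_claims", "cross_juris_liabilities"],
    "interconnectedness": ["intra_fin_assets", "intra_fin_liabilities",
                           "securities_outstanding"],
    "substitutability": ["payments", "assets_under_custody", "underwriting"],
    "complexity": ["otc_derivatives", "trading_afs_securities", "level3_assets"],
}

def bcbs_score(bank, sample_totals):
    """Relative score in basis points: each indicator is the bank's amount
    divided by the sample total, averaged within and then across the five
    equally weighted categories."""
    cat_scores = [np.mean([bank[i] / sample_totals[i] for i in inds])
                  for inds in CATEGORIES.values()]
    return 1e4 * np.mean(cat_scores)

def bucket(score_bp, cutoff=130.0, width=100.0):
    """Below the cut-off -> not a G-SIB (0); then one bucket per 100 bp,
    with CET1 add-ons of 1.0/1.5/2.0/2.5/3.5% for buckets 1-5."""
    return 0 if score_bp < cutoff else int((score_bp - cutoff) // width) + 1
```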

3 Comparing the methodologies

3.1 DIP rankings

Table 2 illustrates the rankings that emerge from applying the DIP procedure for 2014, 2015 and 2016 respectively. The analysis of Table 2 shows that, for each year, the two output rankings are very similar in the simulations with LGD equal to 45% and to 75%: for instance, for 2014, none of the banks changes position by more than two places between the two simulations. Indeed, the Spearman’s rho is above 95% in all three years.

Table 2 Ranking based on our modified DIP methodology for 2014, 2015 and 2016
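Throughout the paper, consistency between rankings is measured with Spearman’s rho; a minimal check (with hypothetical rank vectors) looks like:

```python
from scipy.stats import spearmanr

# hypothetical DIP ranks of the same banks under LGD = 45% and LGD = 75%
rank_lgd45 = [1, 2, 3, 4, 5, 6, 7, 8]
rank_lgd75 = [1, 3, 2, 4, 5, 6, 8, 7]
rho, p_value = spearmanr(rank_lgd45, rank_lgd75)  # rho near 1 -> consistent
```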

Moreover, the ranking, and in particular the top positions, is quite stable over the period. Indeed, we observe that the majority of the banks keep a stable position or show small changes in the ranking. For instance, only three banks change by more than three places between 2014 and 2015 in the simulation with LGD equal to 75%. We detect a larger number of changes between 2015 and 2016: in this case, 10 banks change by more than three places in the simulation with LGD equal to 75%, especially in the lower part of the ranking, where the Japanese banks (Mitsubishi UFJ, Sumitomo Mitsui and Mizuho) increase their systemic importance in 2016, mainly due to the growth in their liabilities,Footnote 4 while UBS, Credit Suisse and Nordea decrease theirs, mainly due to their lower position in the correlation ranking. These observations are confirmed by the Spearman’s rho reported in Table 3: it is above 90% between 2014 and 2015 (91.3% and 93.2% for the simulations with LGD equal to 45% and to 75% respectively) and somewhat smaller between 2015 and 2016 (85.4% and 86.5%). Interestingly, the Spearman’s rhos show high ranking stability even over the two-year span between 2014 and 2016: above 80% for both simulations with different LGDs. Observing turnover in the top 10 G-SIBs,Footnote 5 in all four cases (LGD equal to 45% and 75%, both between 2014–2015 and 2015–2016), nine banks remain in the top 10. Even if the ranking is stable, we can observe some relevant changes. For instance, at both LGD levels:

  • JP Morgan Chase increases its systemic importance due to its rise in the CDS spread ranking and in the stock market correlation ranking.

  • Crédit Agricole and ING lower their positions. The drivers of these changes for Crédit Agricole are the CDS spread and the stock market correlation, while for ING they are the CDS spread and the liabilities value.

Table 3 Spearman’s rho: correlation between rankings for the same year at different LGD level, or for different years at the same LGD level

Therefore, examining the inputs, we can highlight that all three determinants are important and interact in a non-linear way, confirming the findings of Huang et al. (2012b). Using 2014 as an example, Table 4 sets out the Spearman’s rho between the rankings in the inputs (liabilities, CDS spreads, correlations, and LGDs) and the output (DIP) ranking. The correlations seem the most influential input in driving our results. Indeed, the first four institutions in the simulations with LGD equal to 75% are three French banks (BNP Paribas, Crédit Agricole and Société Générale) and Deutsche Bank, which are highly correlated among themselves and with the other Euro area banks. The importance of correlation as a systemic risk determinant is confirmed by the bottom of the ranking, where (excluding Wells Fargo) we find the three Japanese banks. They are weakly correlated with all the other institutions (average correlations always below 20%). However, these three banks are highly correlated among themselves, and the ranking changes if we fix a lower threshold on the total liability loss for considering a crisis “systemic”. Indeed, at the end of 2013, the sum of the liabilities of the three Japanese banks amounts to 12.8% of the overall liabilities of the investigated banks and, assuming an LGD equal to 45%, the loss caused by the joint default of these three banks accounts for 5.8% of total liabilities, which is not enough to consider the crisis “systemic”. Therefore, we rerun the simulation for 2014 with a 5% instead of a 15% threshold for considering a crisis “systemic” and find that the Japanese banks all increase their systemic importance. The same holds for the two Chinese banks. Thus, an issue for this approach is that the threshold for considering a crisis “systemic” is discretionary and this choice can influence the ranking. Table 5 illustrates the rankings based on simulations with different thresholds and Table 6 shows the Spearman’s rho between these rankings. Lowering the systemic crisis threshold from 15% to 5% affects the ranking (the Spearman’s rho drops below 80%). The change is less relevant between the 15% and 25% thresholds, showing a non-linear effect of the threshold choice on the output ranking. The liability size and the CDS spread ranking (which drives the PD ranking) seem somewhat less important than the stock market correlation, which shows a Spearman’s rho above 80%, even if they are relevant in determining our results (Spearman’s rho always positive and statistically significant).

Table 4 Spearman’s rho: correlation between input and output rankings for 2014
Table 5 DIP ranking 2014 considering “systemic” a crisis that depletes at least 5%, 15% (baseline) or 25% of aggregate total liabilities, assuming a LGD equal to 45%
Table 6 Spearman’s rho: correlation between DIP ranking 2014 assuming a LGD equal to 45% for different thresholds

3.2 A comparison between DIP and SRISK rankings

Table 7 shows the ranking of the 26 banks at the end of 2013, 2014 and 2015 extracted from the NYU Stern Systemic Risk Rankings. Comparing the DIP and SRISK rankings, we observe that they are very different if we use a 5.5% capital requirement for European banks, while the two rankings are more similar (even if some differences remain) with an 8% capital requirement for European banks. Indeed, we obtain a Spearman’s rho (Table 8) that is always over 40%, and about 50%, when SRISK is computed with an 8% capital requirement for European banks, while it is quite low in the other case (around 30% in 2014 and 2015). The Spearman’s rho is always somewhat larger if we consider the DIP simulations based on an LGD equal to 75% rather than 45%. These values are in line with the Spearman’s rhos found by Lin et al. (2016) or Nucera et al. (2016) for different systemic risk measures, samples and time spans; the low consistency of rankings therefore seems to be a persistent feature. The SRISK approach ranks the Japanese banks and Bank of China in substantially higher positions: they are all in the first eight positions in every year and, in particular, Mitsubishi UFJ is always first, while Mizuho is second in 2015 and 2016 and third in 2014. On the other hand, all European banks seem less systemically important (see, for instance, Santander and BBVA, which fall lower in the ranking every year). With an 8% capital requirement for European banks, the two rankings are more similar. For instance, the top of the ranking is very similar in all years: BNP Paribas is first in both the SRISK and DIP rankings in 2015 and 2016, while in 2014 the first four banks for SRISK are the same as in the DIP ranking computed with an LGD equal to 45%. However, in each year, assuming an 8% capital requirement for all banks, the SRISK approach ranks the Japanese banks and Bank of China in higher positions, even if not as high as with the 5.5% capital requirement for European banks, while the U.S. banks (see, for instance, Citigroup, Goldman Sachs and JP Morgan Chase) and BBVA seem less systemically important compared to our DIP rankings. Obviously, the increase in the capital requirement amplifies the systemic importance of European banks and lowers the ranking of all the other banks, including the Japanese and U.S. ones. This result is consistent with Engle et al. (2015), where the authors state that “the total systemic risk borne by European institutions is much larger than the one borne by US institutions”. Moreover, Engle et al. (2015), analysing 196 European financial firms over the 2000–2012 period, find that the five riskiest institutions are Barclays, BNP Paribas, Crédit Agricole, Deutsche Bank and Royal Bank of Scotland. This result is in line with the output of our DIP methodology: the only difference in our top five European banks is Société Générale instead of Royal Bank of Scotland (although the two periods do not overlap).

Table 7 SRISK ranking for 2014, 2015 and 2016 (extracted on 31 December 2013, 2014 and 2015 from NYU Stern Systemic Risk Rankings) and computed assuming a capital requirement for European banks equal to 5.5% or 8% (8% for all the other banks)
Table 8 Spearman’s rho between SRISK rankings and DIP rankings
Table 9 Spearman’s rho: correlation between rankings for the same year computed assuming a capital requirement for European banks equal to 5.5% and to 8%, and for different years at the same level of capital requirement for European banks

A concern with the SRISK output is the change in rankings when it is computed assuming a 5.5% or an 8% capital requirement for European banks. Indeed, the Spearman’s rho (Table 9), even if quite high, is always below 90% (better than the DIP instability due to the chosen threshold, but worse than the very low DIP instability due to the LGD change), pointing to some relevant differences. For instance, we can highlight the strong volatility of the HSBC position in the SRISK ranking: when the capital requirement is 5.5%, HSBC ranks very low (and lower than in the DIP ranking), while it ranks quite high (and higher than in the DIP ranking) when the capital requirement is 8%.

Observing the dynamics over the three years, SRISK leads to a very stable output, with a Spearman’s rho (Table 9) always around 95% between two consecutive years, even more stable than the DIP one. Indeed, we can see that, differently from our DIP ranking, JP Morgan Chase is stable, Crédit Agricole shows only a minor drop in the ranking and the Japanese banks do not show a rise in systemic importance between 2015 and 2016 (it is therefore possible that the DIP ranking is biased by exchange rate fluctuations). However, some trends are confirmed: ING lowers its systemic influence over the three years, and Credit Suisse, UBS and Nordea lower theirs between 2015 and 2016. The top of the ranking is also very stable: observing turnover in the top 10, in three out of four cases (with a capital requirement for European banks equal to 5.5%, and between 2015 and 2016 using 8%) nine banks remain in the top 10, while in one case (8%, between 2014 and 2015) eight banks remain in the top 10.

3.3 A comparison between official (FSB–BCBS) and academic (DIP and SRISK) bucketing

In this section, we examine whether the FSB–BCBS approach, our version of the DIP methodology of Huang et al. (2009) and the SRISK yield similar results in terms of bucketing and capital add-ons. First, we compare the DIP and the official bucketing (Sect. 3.3.1), then we extend the analysis to include SRISK (Sects. 3.3.2 and 3.3.3).

3.3.1 A comparison between DIP and the official ranking

First of all, we use the ranking based on the DIP approach to divide banks into four buckets. Using the maximum DIP value of a single financial institution as the reference point, we place in bucket 4 the banks with a DIP value above 75% of the maximum DIP, in bucket 3 the banks with a DIP between 75% (included) and 50% (excluded) of the maximum DIP, in bucket 2 the banks with a DIP between 50% (included) and 25% (excluded) of the maximum DIP, and in bucket 1 the remaining ones. The thresholds are computed separately for every year; we choose a different threshold for each year to be consistent with the FSB–BCBS approach.Footnote 6
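This mapping reduces to comparing each bank’s DIP with fractions of the yearly maximum; a minimal sketch follows (the 10%–20%–30%–40% division introduced later for SRISK corresponds to cuts=(0.40, 0.70, 0.90)):

```python
def to_bucket(value, max_value, cuts=(0.25, 0.50, 0.75)):
    """Bucket 1-4 from the measure's value relative to the year's maximum:
    above 75% of the max -> bucket 4, (50%, 75%] -> bucket 3, and so on."""
    frac = value / max_value
    return 1 + sum(frac > c for c in cuts)
```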

The two DIP simulations with LGD equal to 45% and 75% yield similar results, with few banking groups allocated to different buckets and none more than one bucket away, as shown in Tables 10, 11, 12.

Table 10 Buckets and surcharges in 2014 using BCBS, DIP and SRISK. SRISK rankings are based both on a capital requirement for European banks equal to 5.5% or to 8%. Columns with “1” in brackets fix the thresholds with the 25%–25%–25%–25% method, while columns with “2” in brackets fix the thresholds with the 10%–20%–30%–40% method
Table 11 Buckets and surcharges in 2015 using BCBS, DIP and SRISK. SRISK rankings are based both on a capital requirement for European banks equal to 5.5% or to 8%. Columns with “1” in brackets fix the thresholds with the 25%–25%–25%–25% method, while columns with “2” in brackets fix the thresholds with the 10%–20%–30%–40% method
Table 12 Buckets and surcharges in 2016 using BCBS, DIP and SRISK. SRISK rankings are based both on a capital requirement for European banks equal to 5.5% or to 8%. Columns with “1” in brackets fix the thresholds with the 25%–25%–25%–25% method, while columns with “2” in brackets fix the thresholds with the 10%–20%–30%–40% method

Compared to the FSB–BCBS bucketing, the two DIP rankings highlight a stronger systemic importance for Eurozone banks (Banco Santander, BBVA, Crédit Agricole, and Société Générale), while banks such as HSBC seem less systemically relevant. Ideally, bucketing should be consistent irrespective of the methodology used, as it implies different capital requirements for the financial institutions involved. Instead, considerable discrepancies emerge: for instance, HSBC is required to hold an add-on of 2.5% of CET1 in 2014–2015 and of 2% in 2016, while with our DIP estimates the capital add-on is only 1%; conversely, the FSB–BCBS requires an add-on of 1% of CET1 from Crédit Agricole, Banco Santander and Société Générale, while our buckets based on the DIP measure require a 2% add-on (2.5% for Crédit Agricole in 2014).

We think that this result is driven by the large number of correlated Eurozone G-SIBs. Indeed, the 26 G-SIBs include eight Eurozone banking groups (and seven other European ones, for a total of 15 European financial institutions, against only six U.S., three Japanese and two Chinese banks) and the correlation among the stock market returns of the Eurozone banks is very high (the first six banks in the correlation ranking are all from the Eurozone). This feature could affect the DIP rankings for two main reasons:

  1. The threshold on the total liability loss chosen in the DIP methodology to classify a crisis as “systemic” is important in determining the ranking. In short, the composition of the G-SIB sample (i.e., the presence of many European banks) can influence the ranking, especially in connection with the chosen threshold, creating non-linear responses of the ranking to threshold changes. However, we have to point out two further considerations: (i) the Eurozone banking system is relatively larger than the banking system of other countries, such as the US, where market-based finance is more developed; from this perspective, the sample of G-SIBs actually reflects reality; (ii) our DIP ranking is computed on the sample derived from the FSB–BCBS ranking, which is possibly biased by the absence of other relevant banks (Masciantonio 2015).

  2. The low correlation among banks in different geographical areas, emerging from daily stock market returns, can underestimate the “multivariate tail” risk. The problem is the joint default distribution: as explained by Extreme Value Theory and the ample literature on copula models applied to finance,Footnote 7 the use of a multivariate Normal distribution to simulate defaults underestimates the joint tail risk. Indeed, correlation during systemic crises is higher than in relatively calm periods, implying a larger risk of the joint default of many institutions. Therefore, using a different methodology that considers the strong connection among different areas during a crisis (for instance one based on copulas, as in Segoviano and Goodhart 2009, or as in Oh and Patton 2017), the ranking could change, reducing the importance of the Eurozone; see the sketch after this list.
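As an illustration of the second point, a Student-t copula (a minimal sketch; the degrees-of-freedom value and the drop-in use within step (a) of our simulation are illustrative assumptions) injects the joint tail dependence that the Gaussian copula lacks:

```python
import numpy as np
from scipy.stats import t as student_t

def t_copula_uniforms(corr, df, size, rng):
    """Correlated normals divided by a common chi-square mixing variable give
    multivariate-t draws; applying the t CDF yields uniforms with joint tail
    dependence, to compare directly with each bank's PD (default if u < PD)."""
    chol = np.linalg.cholesky(corr)
    z = rng.standard_normal((size, corr.shape[0])) @ chol.T
    w = np.sqrt(df / rng.chisquare(df, size=(size, 1)))  # common shock scaling
    return student_t.cdf(z * w, df)

rng = np.random.default_rng(0)
corr = np.array([[1.0, 0.3], [0.3, 1.0]])
u = t_copula_uniforms(corr, df=4, size=100_000, rng=rng)
defaults = u < 0.02                     # hypothetical 2% PD for both banks
joint = defaults.all(axis=1).mean()     # higher than under a Gaussian copula
```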

However, the considerable differences found between the DIP and the FSB–BCBS rankings cannot be explained only by the correlation issue, but are also driven by other reasons:

  1. The DIP measure uses the size of liabilities, correlation and PD (implicit in CDS spreads), while the FSB–BCBS ranking is based on a larger number of idiosyncratic determinants. The FSB–BCBS approach adopts a completely different perspective: it aims at measuring the systemic importance of a single institution for the purpose of requesting a higher capital buffer to mitigate the effects of a potential crisis, and makes no attempt to evaluate the PD.

  2. The DIP methodology also considers market data, while the FSB–BCBS ranking is based on past balance sheet data. The BCBS uses an ex-post perspective—not very useful for giving early warnings and preventing a financial crisis—and considers measurement as a stepping stone to mitigation via higher add-ons. Instead, many academic methods use up-to-date data such as market data.

  3. Other reasons for the relevant differences between the two rankings are related to the way in which the various methodologies are implemented and their intrinsic shortcomings. First of all, as pointed out by Huang et al. (2009), the computation of the PD relies on assumptions that are often not fulfilled (constant risk-free term structure, flat default intensity term structure) and on an LGD independent of default risk.Footnote 8 Moreover, exchange rate fluctuations can modify the evaluation of the liabilities of banking groups located in different countries. Finally, this methodology faces the same problems common to the other approaches based on default probabilities and contingent claims: (i) they do not directly assess the network topology and therefore cannot identify the effective contagion paths among institutions; (ii) as highlighted by Di Iasio et al. (2013), “several critics argued that these asset prices-based indicators might perform well as thermometers (coincident measures), but not as well as barometers (forward looking indicators)”. The FSB–BCBS ranking presents other problems: it uses “rules of thumb” to compute the systemic importance of each financial institution—for instance, the weight allocated to each indicator is arbitrary and not scientifically grounded; moreover, according to Benoit et al. (2017), its score is dominated by the most volatile categories.

3.3.2 A cross-sectional comparison of bucketing

We extend the analysis to include the SRISK methodology. As in the DIP case, we use the maximum SRISK value of a single financial institution as the reference point, placing in bucket 4 the banks with an SRISK value above 75% of the maximum SRISK, in bucket 3 the banks with an SRISK between 75% (included) and 50% (excluded), and so on. However, applying the same cut-off thresholds to SRISK, we observe a lower number of G-SIBs in bucket 1 (requiring lower add-ons), while the upper buckets are much more crowded. To obtain a distribution of G-SIBs among buckets closer to the official one, we fix another set of cut-off thresholds with a 10%–20%–30%–40% method: using the maximum SRISK value of a single financial institution as the reference point, we place in bucket 4 the banks with an SRISK value above 90% of the maximum SRISK, in bucket 3 the banks with an SRISK between 90% (included) and 70% (excluded) of the maximum SRISK, in bucket 2 the banks with an SRISK between 70% (included) and 40% (excluded) of the maximum SRISK, and in bucket 1 the remaining ones (including those with negative SRISK). In the rest of the paper, we call “method 1” the cut-off thresholds set with the 25%–25%–25%–25% division and “method 2” those set with the 10%–20%–30%–40% division. Tables 10, 11, 12 show the official bucketing and the bucketing using the two academic approaches for 2014, 2015 and 2016 respectively.

First of all, observing the SRISK bucketing, we can highlight the importance of how cut-off thresholds are set, that is, method 1 (25%–25%–25%–25%) against method 2 (10%–20%–30%–40%). Method 1 leads to higher add-ons for a number of G-SIBs, and the change from method 1 to method 2 moves about half the G-SIBs down one bucket. Compared to both FSB–BCBS and DIP, considerable differences emerge in the bucketing based on SRISK. In particular, compared to DIP, the SRISK bucketing is largely different when computed with a 5.5% capital requirement for European banks, consistently with the lower Spearman’s rho values found for the rankings above (see Table 8). However, SRISK is more similar to the DIP than to the official bucketing, confirming some of the divergences between the DIP and FSB–BCBS bucketing.

Given that bucketing and add-ons depend on the approach used, we focus on the differences between the academic approaches and the regulatory one. To quantify the magnitude of the differences, we tabulate all differences by computing the gap between the academic and the FSB–BCBS bucket (Table 13): for instance, if the academic approach places a bank in bucket 1 while the FSB–BCBS places it in bucket 4, that bank is counted in column “−3”, while if the academic approach places the bank in bucket 3 and the FSB–BCBS approach in bucket 2, that bank is counted in column “+1”, and so on. Then, to obtain an immediate measure of diversity, we compute a “diversity index”, which is simply the sum of all the differences in absolute value: for instance, one bank in column “−3” counts |−3|·1 = 3, while four banks in column “+2” count |+2|·4 = 8, and so on.

Table 13 Number of G-SIBs placed in different buckets by the FSB–BCBS approach and each academic approach, for each year. We compute the gap between the academic bucket and the FSB–BCBS bucket. The diversity index is the sum of absolute differences among buckets and the harshness index measures the overall impact of different bucketing

We observe that more than half of the banking groups are placed in different buckets by the academic and the FSB–BCBS approaches. The DIP approach, and in particular the simulation with LGD equal to 75%, is the most similar to the FSB–BCBS bucketing. Even in this case, however, the number of banking groups placed in the same bucket by the FSB–BCBS and DIP rankings is only 11 in 2014 and 2015 and 12 in 2016. In other words, even if many financial institutions are placed in the same bucket (typically institutions in bucket 1), the majority of G-SIBs are placed in different buckets.

Table 13 also shows a “harshness index”, i.e., the aggregate result of positive and negative differences, considering the number of institutions placed in different buckets weighted by the difference in buckets. This index measures the overall impact of differences in bucketing in terms of add-ons for the 26 G-SIBs: e.g., if one bank is placed in the next lower bucket and another rises to the next higher bucket, the different bucketing approaches lead to the same overall basis-point add-ons. These two indices provide a first assessment of whether the various methodologies merely rank differently—i.e., the overall level of add-ons is similar, but it is imposed on different institutions—or whether a methodology is more stringent and leads to overall higher capital surcharges.
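Both indices reduce to simple sums over the per-bank bucket gaps; a minimal sketch (with hypothetical bucket vectors) follows:

```python
import numpy as np

def diversity_and_harshness(academic_buckets, official_buckets):
    """Gap = academic bucket - FSB-BCBS bucket per bank.
    Diversity = sum of |gaps| (how differently the methods bucket);
    harshness = signed sum (whether one method imposes higher overall add-ons)."""
    gap = np.asarray(academic_buckets) - np.asarray(official_buckets)
    return int(np.abs(gap).sum()), int(gap.sum())

# hypothetical five-bank example
diversity, harshness = diversity_and_harshness([1, 3, 2, 4, 1], [2, 2, 2, 3, 1])
```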

The bucketing based on the SRISK measure presents an average diversity index higher than the DIP-based one; the DIP measure therefore seems more in line with the regulatory approach. Moreover, the importance of the cut-off thresholds used for bucketing is confirmed. Indeed, harshness under the SRISK measure depends on the requirement for European banks, but especially on how the cut-off thresholds are set. Using SRISK, assuming a 5.5% requirement for European banks and cut-off method 2 would actually impose lower add-ons than the official approach. Instead, using method 1 we always find a higher “diversity index” than with method 2, with a relevant shift of the G-SIBs towards higher buckets and consequently considerably higher add-ons than with the FSB–BCBS approach.

3.3.3 A comparison of the changes in the bucketing over time

Looking at the evolution of the allocation of the banking groups across the four buckets from 2014 to 2016, we find that with the FSB–BCBS bucketing no banking group moves more than one bucket from one year to the next and there are only nine changes over the two years. With SRISK the number of changes depends on the requirement for European banks and on the cut-off method used, but no bank moves more than one bucket in the period. With the DIP approach, two banks (JP Morgan and Bank of America) move up two buckets over the two years. Indeed, the FSB–BCBS bucketing presents only two changes between 2014 and 2015 (the reduction of the capital requirement for Royal Bank of Scotland, which passes from bucket 2 to bucket 1, and the removal of BBVA from the list) and seven changes between 2015 and 2016 (HSBC, Barclays and Morgan Stanley move down one bucket, while Citigroup, Bank of America, ICBC and Wells Fargo move up one bucket).

Instead, the DIP methodology produces many switches between 2014 and 2015:

  • with LGD equal to 45%, four banking groups (Bank of America, Credit Suisse, JP Morgan Chase and UBS) move up one bucket, and three banking groups (Barclays, Crédit Agricole and Royal Bank of Scotland) move down one bucket;

  • with LGD equal to 75%, two banking groups (Bank of America and JP Morgan Chase) move up one bucket, and three banking groups move down one bucket (Barclays, Crédit Agricole and Royal Bank of Scotland, as with LGD at 45%).

The mobility among buckets is also confirmed between 2015 and 2016: eight changes with LGD at 45% and 10 changes with LGD at 75%. It is interesting to note that with LGD equal to 75%, between 2015 and 2016 there are only movements towards higher buckets. This is due to the fact that we change the cut-off thresholds every year, using as reference the maximum DIP of that year (to remain consistent with the FSB–BCBS approach). However, this choice for the setting of the thresholds does not change our conclusion on the somewhat lower bucketing stability of DIP compared to the other approaches. Indeed, keeping the thresholds fixed for all three years at the level found for the first year, we find (similarly to the simulation with LGD at 45%) that between 2014 and 2015 there are seven switches (4 upwards and 3 downwards), and between 2015 and 2016 there are nine switches (6 upwards and 3 downwards).

The dynamics of the buckets built with the SRISK approach depend on the capital requirement applied to European banks and on the method chosen to fix the thresholds. Using a capital requirement of 8% for European banks and method 2 for the thresholds, the SRISK bucketing is even more stable than the official approach: five switches (1 up and 4 down) between 2014 and 2015 and two switches (1 up and 1 down) between 2015 and 2016 (with an overall requirement very similar to the official approach, as shown by the harshness index in Table 13). Instead, in the other three cases (capital requirement at 8% with method 1 for the thresholds, and capital requirement at 5.5% with both methods) the SRISK approach produces more switches, slightly above the number of changes found with the FSB–BCBS approach and just under the number found with the DIP approach. Another feature of the SRISK methodology is the presence of offsetting back-and-forth changes in the bucket allocation of the same banks. For instance, with a capital requirement for European banks at 5.5% and using method 2 for the thresholds, ICBC passes from bucket 2 in 2014 to bucket 1 in 2015 and returns to bucket 2 in 2016, while JP Morgan Chase is placed in bucket 1 in 2014 and 2016 and in bucket 2 in 2015.

However, there are also some similarities in the rankings. Indeed, all methodologies share some common trends. In particular, between 2015 and 2016, Bank of America moves from bucket 2 to bucket 3 with both the FSB–BCBS and DIP approaches and from bucket 1 to bucket 2 in three out of four of the SRISK bucketings: a strong signal of increased systemic risk. Similarly, in the same time span, Citigroup increases its systemic importance: from bucket 3 to bucket 4 with the BCBS methodology, from bucket 2 to bucket 3 with the DIP approach, and from bucket 1 to bucket 2 with SRISK under a 5.5% capital requirement for European banks and method 2 for the thresholds. Where different methodologies show a similar trend, as in these cases, it can be interpreted as a strong signal of systemic risk growth.

Another characteristic is that, while some differences remain stable over the years (for instance, BNP Paribas is always in bucket 3 with the BCBS methodology, whereas it is always in bucket 4 with the DIP approach), certain discrepancies diminish. In particular, Morgan Stanley moves in the FSB–BCBS ranking towards the bucket previously indicated by the DIP and SRISK approaches, while HSBC seems to “converge”, rising from a low bucket in some of the DIP and SRISK bucketings and falling from bucket 4 to bucket 3 with the FSB–BCBS approach; furthermore, the DIP bucketing of JP Morgan and Goldman Sachs moves towards the BCBS ranking, confirming that neither the BCBS nor the DIP approach can be viewed as an early warning relative to the other. Similarly, the SRISK positions of Crédit Agricole, ING and Unicredit decrease towards bucket 1, in which the FSB–BCBS method places these G-SIBs. Table 14 summarizes our findings.

Table 14 Bucket changes. Column 0 counts the compensations (banks that go back and forth in the two years)

The relative stability of the BCBS approach could underestimate changes in systemic risk, but avoiding unnecessary volatility (and the back and forth observed in the SRISK bucketing) in the capital requirements applied to banking groups is probably even more important, as suggested by Nucera et al. (2016). This trade-off between rigidity and volatility should be studied in future research.

4 Discussion and conclusion

Our comparison of the FSB–BCBS ranking and bucketing with the two academic approaches identifies several differences and complementarities between the various methods. It shows that the determination of the additional capital requirement for each G-SIB is the result of two different phases, ranking and bucketing, and both considerably influence the final outcome:

  • ranking depends on the methodology used and the implicit assumptions that it entails;

  • bucketing reflects how cut-offs are determined.

Both aspects are worthy of further academic reflection and investigation especially related to their shortcomings.

There are at least three areas of concern and therefore of interest related to the methodology used to rank banks: the theoretical model and its foundation, the data, and the model risk. All three are obviously largely interconnected and overlapping.

Nucera et al. (2016) suggest that regulators do not adopt academic measures because of their weak theoretical foundation. The theoretical foundation must address some general fundamental issues and some specific aspects. The fundamental issues are as follows.

  • What is systemic risk? As pointed out by Masciantonio (2015), should we measure the impact that the failure of an institution could have on the global financial system, as in the LGD concept (systemic importance), or should we also consider the PD of an institution (systemic risk contribution)? Should the risk of a bank be measured in terms of its vulnerability to shocks arising from other institutions (exposure) or in terms of its ability to propagate shocks to other institutions (contribution), given that the two concepts are not equivalent, as shown by Drehmann and Tarashev (2011)?

  • Which are the sources of systemic risk and how should they be measured? Indeed, all academic approaches miss many important systemic risk determinants or, even worse, encompass only one facet of systemic risk, as suggested by Benoit et al. (2013, 2017), who show that one-factor linear models explain most of the variability of the systemic risk estimates. This is consistent with Drehmann and Tarashev (2011), who find that simple indicators approximate model-based measures well and suggest, for practical purposes, using an indicator approach composed of simple indicators that are easy to implement and communicate. This could be a theoretical basis for the indicator approach chosen by the BCBS. However, this approach too has a weak theoretical foundation, based on “rules of thumb” (for instance, why are the weights chosen in that way?) and on a debatable choice of indicators (for instance, Cai et al. 2018 highlight that the BCBS interconnectedness measure is based on direct exposures among financial institutions but does not consider the commonality of asset holdings, a very important systemic risk factor in case of fire sales).

Every measure also presents specific drawbacks, or drawbacks typical of a group of measures sharing certain assumptions. Examples of specific drawbacks highlighted in this paper are, for our DIP approach, the strong dependence of the results on the threshold chosen for a crisis to be considered systemic, the PD computed using a constant risk-free term structure and a flat default intensity term structure, the LGD independent of default risk, and the influence of exchange rate fluctuations on the ranking. The SRISK measure is, instead, based on the assumption of a 40% drop of the stock market in six months and is sensitive to other assumptions that may seem minor but nonetheless change the ranking (see, for instance, the impact of setting the capital requirement at 5.5% or 8% for European banks). Other measures, like VaR or ES, depend on the chosen confidence level; in the regulatory approach, there is the ad hoc truncation of the most volatile categories. Moreover, there are drawbacks typical of (or at least more relevant for) whole groups of measures. For instance, in the words of Benoit et al. (2017), the market-based approaches “are rarely theoretically grounded and generally do not permit to clearly identify the source of risk at play”.

The data used are the second problem regarding the methodology, and we can highlight two main issues: (i) which kind of data should we prefer (for instance, market data or accounting data)? (ii) how should we choose the sample of financial institutions to be analyzed? Besides the discussion presented in the previous section, there are two further major problems. First, a higher data frequency (for instance, daily market data) allows sudden shifts in systemic risk to be detected (Benoit et al. 2017) but implies higher ranking volatility (Nucera et al. 2016). Second, some methodologies rely on data disclosed with a lag (such as accounting data), on data not in the public domain (such as interconnectedness) or on data not available for some institutions; the latter problem is obviously related to the sample selection issue. Moreover, the lack of data is a problem even for methodologies based on market data that rely on a large dataset. Indeed, authors such as Danielsson et al. (2016), Hansen (2014), and Löffler and Raupach (2013) point out the low frequency of systemic financial crises.

Lastly, the third problem related to the methodology is model risk. Danielsson et al. (2016) explain that even for the same systemic risk measure we can find different estimates and rankings: for instance, SRISK can change if its procedure uses a different volatility model (historical simulation, GARCH, or extreme value theory) or simply a different parameterization of the same volatility model, for instance various GARCH specifications. They conclude that regulators should determine with caution which banks are G-SIBs, and that a “better understanding of model risk should lead to more robust policymaking”. Similarly, Löffler and Raupach (2013) explain that non-linearities, sampling errors and misspecified estimators can invalidate analyses based on market data. However, some of the highlighted problems are also present for methodologies not based on market data.
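The point is easy to reproduce even without a full GARCH apparatus: the two textbook volatility estimators below (a rolling historical standard deviation and a RiskMetrics-style EWMA, used here as hypothetical stand-ins for the richer alternatives in Danielsson et al. 2016) can already reorder the same banks:

```python
import numpy as np

def hist_vol(returns, window=250):
    """Rolling historical volatility: std of the last `window` daily returns."""
    return np.std(returns[-window:], ddof=1)

def ewma_vol(returns, lam=0.94):
    """RiskMetrics-style EWMA volatility: exponentially weighted squared returns."""
    var = returns[0] ** 2
    for r in returns[1:]:
        var = lam * var + (1.0 - lam) * r ** 2
    return np.sqrt(var)

rng = np.random.default_rng(1)
banks = {name: rng.standard_normal(500) * scale
         for name, scale in [("A", 0.010), ("B", 0.012), ("C", 0.011)]}
rank_hist = sorted(banks, key=lambda b: -hist_vol(banks[b]))
rank_ewma = sorted(banks, key=lambda b: -ewma_vol(banks[b]))
# the two orderings need not coincide: same measure, different model,
# different ranking
```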

Moreover, the bucketing of G-SIBs largely depends on how cut-offs are set. We show that changing the method used to fix the thresholds in the SRISK model moves almost half of the G-SIBs up or down one bucket. Furthermore, we show that some G-SIBs change bucket with the DIP approach depending on whether we reset the threshold every year or keep it fixed over time. However, the threshold issue is common to all methodologies and the following questions should be addressed: should the threshold remain fixed over the years? Should performance be evaluated in a “relative” or in an “absolute” way? We think that the threshold should be kept constant over a number of years and that the excess capital should be evaluated in an absolute way, differently from the BCBS methodology, which, as already explained, computes the score of every indicator by dividing the individual bank amount by the aggregate amount summed across all banks. Indeed, an “absolute” evaluation against a fixed threshold presents two important features: (i) it allows single banks, which have limited information on the overall system, to manage their own systemic importance, and possibly strive to reduce it in order to contain the required capital surcharge; (ii) it allows the financial regulator to perceive changes in the overall systemic risk level. Obviously, the threshold can be adjusted over time, at suitable intervals.

All methodologies present shortcomings, but this is the very reason that makes these measures complementary (Hansen 2014). We therefore suggest a more holistic approach that considers different methodologies. This is supported by the multidimensional nature of the risk involved, as pointed out by Rodríguez-Moreno and Peña (2013), and by the diversity of the financial system, as in Ellis et al. (2014). Complete measurement would require a different and more robust approach; however, adding other complementary measures to the official approach, as control methodologies, seems more feasible in the short run and could be useful for the development of a mixed modelFootnote 9 in the future. The improved performance of mixed models is a feature found in different environments (see, for instance, Kuester et al. 2006, on VaR forecasting). However, proposing a formal combination of models in itself presents drawbacks, as shown by Nucera et al. (2016) and Grundke (2019).Footnote 10

In conclusion, because of its multidimensional nature and the lack of a generally accepted definition, in operational practice and in the academic literature there is no standardized way of measuring systemic risk and, in particular, the systemic risk of individual banks. We compare the regulatory view and academic measures of systemic risk. We find that rankings are affected both by the methodology and by the decisions regarding the assumptions each methodology requires. Then, we divide the analyzed banks into buckets and compare the capital surcharges based on the DIP and SRISK methodologies with the official buckets published by the FSB–BCBS. As envisaged by Danielsson et al. (2016), we find considerable differences in the results. The way cut-offs are set also considerably impacts bucketing and thus add-ons.

In sum, systemic risk is not unambiguously measured by academics, supervisors, and practitioners; consequently, every measure provides different risk rankings that reflect different perspectives and inputs. All methodologies present shortcomings, but this is the very reason that makes these measures complementary. We therefore propose that, in addition to the accounting-based approach already applied, supervisory authorities also consider, as control methods, market-based measures proposed by the academic literature that address the specific shortcomings the authorities may wish to overcome. As pointed out by Benoit et al. (2013), “future risk measures should combine various sources of information” and “should also consider the definition of the perimeter of the financial system”. Indeed, a more comprehensive approach could also encompass network connections and integrate the shadow banking system and other risk factors (such as real estate and sovereign debt) that have played a key role in recent systemic banking crises. Given the importance of the practical implications for banking groups, which face capital surcharges as their systemic importance increases, research in this field should continue to be vigorously pursued.