1 Introduction

The latest financial crisis highlighted the importance of banks and the effects that their failure has on the wider economy. Failure prediction and corporate governance (CG) are two of the most researched areas that contribute to the success of banks. The failure of a bank affects not only the bank itself but also the global economy (Liang et al. 2016). The importance of failure prediction in banks has been highlighted by many researchers (Ravi Kumar and Ravi 2007; Boyacioglu et al. 2009; Wang et al. 2014; López Iturriaga and Sanz 2015; Liang et al. 2016). It is also necessary for a bank to predict its failure as early as possible, because the precautions and preventive procedures that need to be taken depend not only on the probability of the bank’s failure but also on the time horizon of the prediction (López Iturriaga and Sanz 2015; du Jardin 2017).

Failure prediction has been widely researched by using financial ratios. However, papers that study the failure of banks have not given much attention to other variables such as CG characteristics. There are many reasons to believe that incorporating CG characteristics in failure prediction will enhance the accuracy of prediction. First, CG is known for its importance and contribution to the success and failure of firms. Second, other research shows that incorporating non-financial variables has improved the accuracy of prediction models (Ioannidis et al. 2010).

Some studies have incorporated non-financial variables such as market-driven variables. Studies that use non-financial predictors include Cheng et al. (2018) and Charalambakis and Garrett (2016). However, Liang et al. (2016) declare that, even though the importance of CG is well recognised in the literature, little effort has been made to conduct empirical studies that test the contribution of CG indicators in failure prediction along with the financial ratios. They also declare that previously conducted studies have only used some selected features of CG, which suggests the need for a thorough examination of various CG indicators.

The selection of financial ratios is an important step in failure prediction (Wang et al. 2014). Because the financial structure and characteristics of banks differ from those of other sectors (Cielen et al. 2004; Wu 2016), the common financial ratios used in non-financial sectors are not applicable to banks. As a result, ratios based on the CAMEL rating system are adopted as predictors of bank failure. CAMEL is a five-part rating system that evaluates a bank’s overall condition based on its Capital adequacy, Asset quality, Management expertise, Earning strength and Liquidity.

In addition, long-term prediction of bank failure is very important because it affects decision-making, especially lending decisions. The Basel Committee on Banking Supervision (2009) recommended that banks estimate the risk of lending decisions over a long-term period. du Jardin (2017) states that, for this prudential reason, prediction over horizons exceeding one year is very important, especially for banks. The author reviews the prediction time horizons used in prior studies. The review shows that most studies provide predictions up to a three-year horizon, while fewer extend to four- or five-year horizons. It also shows that prediction accuracy is highest one year before failure and declines thereafter: the average accuracy rate is 85% for a one-year horizon and falls to 69.5% for a five-year horizon. Similarly, López Iturriaga and Sanz (2015) state that the reliability of failure prediction is a concern when the time horizon exceeds the short term.

Therefore, this paper contributes to the wide literature on failure prediction by investigating the role of CG variables as non-financial predictors in enhancing the prediction accuracy of US bank failure using financial ratios. There are empirical studies that use CG as a non-financial predictor of failure (Daily and Dalton 1994; Lee and Yeh 2004; Brédart 2014a, b; Liang et al. 2016; Wu 2016; Jones 2017). However, these studies were conducted on non-financial firms, and to the best of our knowledge, CG has not been examined before as a non-financial predictor of the failure of US banks. In addition, we categorise the financial ratios into five categories, namely Capital, Assets, Management, Earnings and Liquidity (CAMEL) and identify the effects of each category on failure prediction. We also show which of these categories is the most significant for banks. These categories are in line with the rating system developed by the Federal Deposit Insurance Corporation (FDIC).

To study bank failure prediction, we use Discriminant Analysis (DA) and check the robustness of the results using Logit Regressions (LR). We also perform an additional out-of-sample analysis to support the accuracy of the prediction model. The results show that adding CG variables to the traditionally used financial ratios improves the accuracy rate and extends the time horizon of prediction. We attribute this to the broader view of the banks’ condition that the non-financial predictors provide. The findings also show that, amongst the CAMEL ratios, earnings and liquidity are the most significant predictors. Amongst the CG variables, CEO pay slice, unequal voting rights and institutional shareholding are the most significant predictors.

The paper is structured as follows: Sect. 2 reviews the literature on predicting bank failure, Sect. 3 describes the data and sample, Sect. 4 presents the methodology and discusses the results, Sect. 5 presents the robustness tests, and the final section concludes.

2 Literature review

Researchers assert that bank failure prediction benefits shareholders, managers and other stakeholders (Ravi Kumar and Ravi 2007; Chauhan et al. 2009; Wang et al. 2014). Ravi Kumar and Ravi (2007) review the statistical and intelligent techniques used in failure prediction studies conducted during 1968–2005. Their review reveals that most studies were conducted on firms rather than banks and mainly focused on the period from 1980 to 2003. This, alongside other reasons, highlights the importance of studying failure prediction in banks. For example, bank failure affects the stability of the whole economy (Boyacioglu et al. 2009), failure prediction enables banks to make appropriate lending decisions (Liang et al. 2016), and bank failure could have been prevented if appropriate failure prediction tools had been used (Kao and Liu 2004). Also, Sinnadurai et al. (2022) find that distressed companies are more likely to recover if their distress is diagnosed at an early stage.

2.1 Failure prediction methodologies

Both statistical and non-statistical models have been used to predict firms’ failures. Among statistical methodologies, the most common is DA, which was initially used by Altman (1968) and then developed and adopted by Boyacioglu et al. (2009), Canbas et al. (2005), Cielen et al. (2004), Cox and Wang (2014), du Jardin (2017), du Jardin (2016), Haslem et al. (1992), Kao and Liu (2004), Karels and Prakash (1987), Ohlson (1980) and Serrano-Cinca and Gutiérrez-Nieto (2013). Other methodologies include LR, used by Boyacioglu et al. (2009), Brédart (2014a), Canbas et al. (2005), Daily and Dalton (1994), du Jardin (2017), du Jardin (2016), Kao and Liu (2004), Lee and Yeh (2004), Ohlson (1980), Serrano-Cinca and Gutiérrez-Nieto (2013), Wang et al. (2014), West (1985) and Wu (2016); Principal Component Analysis (PCA), used by Boyacioglu et al. (2009), Canbas et al. (2005) and Kao and Liu (2004); and PLS-DA, used by Serrano-Cinca and Gutiérrez-Nieto (2013).

Among non-statistical models, artificial intelligence tools are widely used for failure prediction and, in most studies, have proven highly accurate. However, Boyacioglu et al. (2009) use both statistical and artificial intelligence techniques to predict the failure of Turkish banks during the crisis. Their findings show that, while artificial intelligence tools are superior prediction techniques, statistical techniques also provide satisfactory results. Similarly, Jones et al. (2017) show that simple classifiers such as LR and DA perform reasonably well in bankruptcy prediction. In addition, Alaka et al. (2018) use several prediction tools, including two statistical tools (DA and LR) and six artificial intelligence tools, and find that no single tool is predominantly better than the others.

Other studies compare several statistical and intelligent methodologies. Boyacioglu et al. (2009) find that DA and LR perform better than other models, including neural networks, support vector machines and cluster analysis. In assessing banking crises, Davis and Karim (2008a) compare LR with signal extraction in early warning systems, and in another study, Davis and Karim (2008b) compare LR with binomial tree-based early warning systems. The results of both studies suggest that LR performs better than the alternative techniques.

2.2 Financial ratios

Pioneers in failure prediction have utilised financial ratios for the prediction of firm failure using statistical models (Beaver 1966; Altman 1968; Ohlson 1980). Subsequently, studies have mainly incorporated the traditionally used financial ratios but with different feature selection techniques, including Boyacioglu et al. (2009), Chauhan et al. (2009), Cox and Wang (2014), du Jardin (2010, 2016, 2017), Feki et al. (2012), Hosaka (2019), Lin et al. (2011), López Iturriaga and Sanz (2015), Serrano-Cinca and Gutiérrez-Nieto (2013) and Wang et al. (2014).

The selection of financial ratios is an important step in failure prediction (Wang et al. 2014). du Jardin (2017) achieves predictions up to a three-year horizon using variables selected based on prior literature. Because the financial structure and characteristics of banks differ from those of other sectors (Cielen et al. 2004; Wu 2016), the common financial ratios used in non-financial sectors might not apply to banks. As a result, researchers have adopted ratios from the CAMELS rating system as predictors.

CAMELS is a six-part rating system that evaluates a bank’s overall condition based on its Capital adequacy, Asset quality, Management expertise, Earning strength, Liquidity, and Sensitivity to market risk. This rating system, formally the Uniform Financial Institutions Rating System (UFIRS), was adopted in 1979 and is mandated by the Federal Deposit Insurance Corporation Improvement Act (FDICIA) of 1991 (Federal Deposit Insurance Corporation 1997).

Initially, the rating system consisted of only five components: Capital, Assets, Management, Earnings and Liquidity. In 1995, a sixth component, Sensitivity to market risk, was added, forming the CAMELS rating system in use today. Our review of prior failure prediction literature shows that the only sensitivity-to-market variable that contributes significantly to failure prediction is the volatility of stock returns, for which no data are available. For this reason, this study uses the initial CAMEL rating system. Incorporating financial ratios that cover these five components gives an overall picture of the banks’ financial condition.

Studies that use CAMELS include Boyacioglu et al. (2009), Feki et al. (2012) and Kristóf and Virág (2022), which predict failure in Turkish, Tunisian and European banks, respectively. Similarly, López Iturriaga and Sanz (2015) state that their variable selection approach is close to the CAMEL rating system in studying failure prediction in banks. Their model shows that the three financial ratios with the most predictive power are the provision ratio, the risk concentration in the construction industry, and the equity support to loans. In addition, Canbas et al. (2005) aim to construct an early warning system as a decision-support tool for banks. In studying Turkish banks, they find that PCA can be used as an alternative or supportive tool to the CAMELS rating system (Gasbarro et al. 2002).

2.3 Non-financial ratios

Existing studies that examine failure prediction for US banks include Serrano-Cinca and Gutiérrez-Nieto (2013), who use financial ratios to compare Partial Least Square Discriminant Analysis (PLS-DA) with eight other techniques. They assert that the US banking crisis is not over and that the FDIC recognises that many banks are at risk of failure. López Iturriaga and Sanz (2015) also predict the failure of US banks using a variable selection approach that is close to the CAMEL rating system. Other studies that use financial ratios to predict the failure of US banks include Chauhan et al. (2009) and Cox and Wang (2014). However, none of these studies incorporates non-financial variables to predict bank failure.

In studying corporate bankruptcies, Jones (2017) finds that bankruptcy is better explained and predicted in a multi-dimensional setting. The author uses multiple non-financial and financial variables to predict bankruptcy and finds that non-traditional variables, such as ownership structure/concentration and CEO compensation, are among the strongest predictors. Also, Ioannidis et al. (2010) use several financial and non-financial variables to assess banks’ soundness and find that the classification accuracy of models that include only financial variables is poor. These findings give reason to believe that adding non-financial variables, such as CG, to the CAMEL ratios will enhance the accuracy of bank failure prediction.

Some studies have incorporated non-financial variables such as market-driven variables. Studies that use non-financial predictors include Cheng et al. (2018), Beaver et al. (2005), Jones (2017), Shumway (2001) and Charalambakis and Garrett (2016). CG is among the non-financial variables used in prediction (Daily and Dalton 1994; Lee and Yeh 2004; Brédart 2014a, b; Liang et al. 2016; Wu 2016; Jones 2017). However, Liang et al. (2016) declare that, even though the importance of CG is well recognised in the literature, little effort has been made to conduct empirical studies that test the contribution of CG indicators in failure prediction along with the financial ratios. They also declare that previously conducted studies have only used some selected features of CG, which suggests the need for a thorough examination of various CG indicators.

Similarly, Jones (2017) asserts that, despite good theoretical reasons to relate CG indicators to failure, few studies examine them as alternative failure predictors. The Basel Committee on Banking Supervision states that effective CG is critical to the proper functioning of the banking sector and the economy as a whole (Basel Committee on Banking Supervision 2015). Lee and Yeh (2004) and Wu (2016) state that weak CG leads to a reduction in corporate value, but the question remains as to whether it also leads to financial distress. Also, Al-Faryan and Dockery (2021) find that market efficiency improves in the sub-period following CG changes for firms listed on the Saudi Stock Market, and Enache and Hussainey (2020) find that CG has a positive effect on current and future firm performance up to two years ahead. Zhai et al. (2022) find that CG drives the negative effect of bank risk-taking incentives on lending decisions. These arguments and findings give us reason to believe that CG plays an important role in the success of firms.

To study financial distress in listed firms, Lee and Yeh (2004) use both financial ratios and CG indicators, including board and ownership characteristics. They assert that weak CG leads to economic downturns and increases the probability of falling into financial distress. Likewise, Wu (2016) studies the relationship between CG variables and bankruptcy risk in firms and finds that board size and board independence are the most significantly related to bankruptcy risk. The results show that CG variables are strong predictors of failure, but their prediction accuracy increases only nearer the time of bankruptcy. On the other hand, Daily and Dalton (1994) study the characteristics of failed firms and find that less board independence and CEO duality are significantly associated with failure three years before the bankruptcy event. Brédart (2014a, 2014b) finds that board size, CEO ownership and CEO duality are significantly related to a firm’s financial distress.

One of the few studies that use CG as an alternative failure predictor is Liang et al. (2016), who combine financial ratios with CG variables to predict failure. They study non-financial firms in Taiwan using statistical and artificial intelligence techniques. Their results suggest that CG enhances prediction accuracy and improves the performance of all the models used in their study. They caution that their results may not apply to other markets because of differences in the definitions of distressed companies and CG indicators. They find that the most important CG indicators for predicting failure are those related to the board and ownership structure. Jones (2017) uses 91 different predictor variables, including financial and non-financial predictors, and finds that the most significant predictors are ownership structure and CEO compensation, followed by market and accounting variables and, finally, macro-economic variables. Also, the results of Cheng et al. (2018) show that specific types of institutional investors can determine which firms will file for bankruptcy among a set of equally distressed firms. These failure prediction studies include only a few aspects of CG and omit important characteristics such as CEO duality, board meetings and gender diversity.

3 Data and sample

3.1 Variables selection

We follow a two-step variable selection approach for the financial ratios. In the first step, we review prior studies that use financial ratios to predict bankruptcy or failure (shown in Appendix 1), which yields 176 ratios. From these, we keep the ratios found to be significant, which leaves 43 ratios, and then choose 23 of the 43 based on data availability. In the second step, we use the CAMEL rating system as a criterion for categorising the ratios into five groups, namely Capital, Assets, Management, Earnings and Liquidity. Our review showed that the only variable in the Sensitivity to market risk category that contributes significantly to failure prediction is the volatility of stock returns. Because no data were available for this variable, we use the original five-component CAMEL rating system and exclude the Sensitivity to market risk category. The 23 CAMEL ratios are detailed in Panel B in Table 1.

Table 1 Number of banks in the datasets

As for CG variables, we have chosen all variables related to CG available on the Bloomberg database. We started with 72 variables, then eliminated variables with low data availability, and ended up with 23 variables that represent board characteristics, compensation structure, voting rights and ownership structure. The CG variables are detailed in Panel C in Table 1.

To confirm the results of CG in predicting bank failure, we replace the CG variables obtained from Bloomberg with another set of CG variables, which are the governance scores developed by the Institutional Shareholder Services (ISS). The ISS scores are detailed in Panel D in Table 1.

3.2 Data sampling

This study includes samples of failed and non-failed banks insured by the FDIC from 2010 to 2018. The financial data were obtained from the FDIC website and the CG data from the Bloomberg database. Failed banks in the FDIC database include institutions that entered receivership, had their deposits assumed by others, or were merged into others under federal assistance plans (Bell 1997). In this study, however, failed banks are limited to banks that were either delisted or merged according to the Bloomberg database. The models are estimated on five datasets, as detailed in Table 1, namely CAMEL, CG, ISS, CAMEL with CG, and CAMEL with ISS.

Table 2 F-test of banks’ size

The analysis uses matched samples constructed following Altman (1968) and Beaver (1966), pairing the datasets by stratified random sampling in which a non-failed bank of similar size is matched with each failed bank for the corresponding year. The F-test reported in Table 2 reveals that small and large banks have unequal variances, with a higher mean value for large banks. The effect of bank size is therefore controlled for in constructing the sample.

The stratified random sampling technique has been recently used by Hartnett and Shamsuddin (2020), Islam et al. (2019) and Sarhan et al. (2018). This technique avoids a biased sample by ensuring that the samples for both the failed and non-failed banks include the best match. Beaver (1966) declares that this sampling technique controls for factors that might affect the relationship between ratios and failure prediction. In addition, this sampling technique accounts for the class imbalance problem caused by the difference between the number of failed and non-failed cases, which could lead to a degradation in the performance of the prediction (Liang et al. 2016).
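The size-matched pairing described above can be illustrated with a short Python sketch. It assumes a hypothetical record layout (`id`, `year`, `assets`, `failed`), measures size by total assets, and pairs each failed bank greedily with the closest-sized unused non-failed bank in the same year. This deterministic nearest-size rule is an illustrative stand-in for the random draw within each stratum, whose exact details are not specified here.

```python
import math
from collections import defaultdict

def match_by_size(banks):
    """Pair each failed bank with the closest-sized non-failed bank in the same year.

    `banks` is a list of dicts with hypothetical keys 'id', 'year', 'assets'
    (total assets, assumed to be the size measure) and 'failed' (bool).
    Matching is greedy, one-to-one and without replacement within each year stratum.
    """
    strata = defaultdict(lambda: {"failed": [], "healthy": []})
    for bank in banks:
        group = "failed" if bank["failed"] else "healthy"
        strata[bank["year"]][group].append(bank)

    pairs = []
    for year in sorted(strata):
        pool = list(strata[year]["healthy"])
        for failed in strata[year]["failed"]:
            if not pool:
                break  # no unused non-failed bank left in this year
            # Compare sizes on a log scale so the distance is relative rather than absolute.
            target = math.log(failed["assets"])
            best = min(pool, key=lambda b: abs(math.log(b["assets"]) - target))
            pool.remove(best)
            pairs.append((failed["id"], best["id"], year))
    return pairs
```

Matching on log assets treats a 10% size gap the same for small and large banks, which is one way to respect the unequal size variances noted above; any unmatched failed bank simply drops out of the paired sample.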

3.3 Discriminatory power test

We use a Mann–Whitney test to assess the discriminatory power of each variable and ratio by testing for differences between failed and non-failed banks one year before failure. Eight CAMEL ratios and three CG variables show significant discrimination between failed and non-failed banks, as shown in Table 3.
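For readers who want to reproduce this screening step, a minimal stdlib Python sketch of the two-sided Mann–Whitney U test is given below. It uses average ranks for ties and the large-sample normal approximation without tie or continuity correction, which is an assumption; statistical packages such as `scipy.stats.mannwhitneyu` can apply these corrections and may report slightly different p-values.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no corrections).

    x, y: values of one ratio for failed and non-failed banks.
    Returns (U, z, p_value), where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, z, p
```

Because the test works on ranks, it does not require the ratios to be normally distributed, which suits skewed financial ratios.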

Table 3 Mann–Whitney test

The eight CAMEL ratios are PTItoE, ECofNCO and AperE, under the Capital, Assets and Management categories respectively; NIEtoTI, IBEItoA and REtoTA, under the Earnings category; and, finally, NLLtoD and GLtoTD, under the Liquidity category.

The three CG variables are CPS, which represents the CEO’s compensation relative to that of the other executives; UVR, which represents the voting rights of shareholders; and InstitutO, which represents institutional ownership. None of the variables representing board characteristics shows significant discriminatory power.

4 Methodology and results

We examine the prediction of bank failure using several types of predictors: financial ratios (CAMEL), non-financial variables (CG), and combinations of both. The aim is to find predictors that provide better accuracy rates. To predict bank failure, we use five datasets, namely CAMEL, CG, ISS, CAMEL with CG, and CAMEL with ISS. We estimate each model three times, with the explanatory variables lagged by one, two and three years before failure.
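The construction of the one-, two- and three-year lagged samples can be sketched as follows. The panel layout (a dictionary keyed by bank identifier and year) and the rule of measuring each non-failed bank in the same calendar year as its failed pair are assumptions made for illustration. A pair is dropped whenever either bank lacks data for the reference year, so the lagged samples can differ in size.

```python
def lagged_rows(panel, pairs, lag):
    """Build one observation per bank using predictors measured `lag` years before failure.

    panel: dict mapping (bank_id, year) -> dict of predictor values (hypothetical layout).
    pairs: list of (failed_id, matched_id, failure_year) tuples.
    Each matched non-failed bank is measured in the same calendar year as its failed pair.
    Pairs with a missing observation for either bank are dropped to keep the sample matched.
    """
    rows = []
    for failed_id, matched_id, failure_year in pairs:
        ref_year = failure_year - lag
        f = panel.get((failed_id, ref_year))
        m = panel.get((matched_id, ref_year))
        if f is None or m is None:
            continue
        rows.append({"bank": failed_id, "failed": 1, **f})
        rows.append({"bank": matched_id, "failed": 0, **m})
    return rows

def build_horizons(panel, pairs, lags=(1, 2, 3)):
    """Return a dict mapping each lag to its list of observations."""
    return {lag: lagged_rows(panel, pairs, lag) for lag in lags}
```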

4.1 Dataset 1: CAMEL ratios

We investigate the prediction accuracy using only CAMEL ratios; this will enable us to compare the results with the other datasets when CG variables are added. The Mann–Whitney test resulted in eight significant ratios; the discriminant function for the CAMEL ratios is as follows:

$$\begin{aligned} D_{1} & = B_{0} + B_{1} PTItoE + B_{2} ECofNCO + B_{3} AperE + B_{4} NIEtoTI \\ & \quad + B_{5} IBEItoA + B_{6} REtoTA + B_{7} NLLtoD + B_{8} GLtoTD \\ \end{aligned}$$
(1)

where \(D_{1}\) is a discriminant score, \(B_{0}\) is the constant, \(B_{1}\) to \(B_{8}\) are the coefficients. \(PTItoE\) is the Pre-Tax Income to Equity ratio, \(ECofNCO\) is Earnings Coverage of Net Charge Offs, \(AperE\) is the Assets per Employee, \(NIEtoTI\) is the Non-Interest Expenses to Total Income ratio, \(IBEItoA\) is the Income Before Extraordinary Items to Assets ratio, \(REtoTA\) is the Retained Earnings to Total Assets ratio, \(NLLtoD\) is the Net Loans and Leases to Deposits ratio, and \(GLtoTD\) is the Gross Loans to Total Deposits ratio.
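A two-group discriminant function of the form in Eq. (1) can be estimated with Fisher’s linear discriminant. The sketch below is written in plain Python for self-containment; it assumes equal prior probabilities and a pooled within-group covariance matrix, and the helper names are hypothetical. Statistical packages normalise the coefficients differently, so their reported B values are proportional to, not identical with, those returned here; `sklearn.discriminant_analysis.LinearDiscriminantAnalysis` is a library alternative.

```python
def _mean(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def _scatter(rows, mean):
    """Sum of outer products of deviations from the group mean."""
    p = len(mean)
    s = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - mean[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                s[a][b] += d[a] * d[b]
    return s

def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def fit_discriminant(failed, healthy):
    """Return (B0, [B1..Bp]) such that D = B0 + sum(Bi * xi) > 0 indicates the failed group."""
    m1, m0 = _mean(failed), _mean(healthy)
    s1, s0 = _scatter(failed, m1), _scatter(healthy, m0)
    dof = len(failed) + len(healthy) - 2
    p = len(m1)
    pooled = [[(s1[a][b] + s0[a][b]) / dof for b in range(p)] for a in range(p)]
    coefs = _solve(pooled, [m1[j] - m0[j] for j in range(p)])
    # With equal priors the cut-off lies midway between the two group means.
    b0 = -sum(c * (u + v) / 2 for c, u, v in zip(coefs, m1, m0))
    return b0, coefs

def score(b0, coefs, x):
    """Discriminant score D for one bank's predictor vector x."""
    return b0 + sum(c * xi for c, xi in zip(coefs, x))
```

In use, each row would hold the eight CAMEL ratios of one bank in the order of Eq. (1), and a bank is classified as failed when its score is positive.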

The results of measuring the accuracy of failure prediction using the CAMEL ratios are reported in Table 4. The overall accuracy ranges from 60.3% three years before failure to 61.1% one year before failure. The Wilks’ Lambda p-value shows that the discriminant function is significant for the one-, two- and three-year lags, indicating that the function discriminates between failed and non-failed banks. The canonical correlation and the chi-square statistic also show that the models have acceptable discriminant ability.
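For a two-group problem there is a single discriminant function, so the statistics mentioned above are linked through its eigenvalue \(\lambda\): Wilks’ \(\Lambda = 1/(1+\lambda)\), the canonical correlation is \(\sqrt{\lambda/(1+\lambda)}\), and Bartlett’s statistic \(\chi^2 = -[n - 1 - (p+g)/2]\ln\Lambda\) has \(p(g-1)\) degrees of freedom, where \(n\) is the number of banks, \(p\) the number of predictors and \(g = 2\) the number of groups. The Python sketch below computes these quantities and the overall hit ratio; any inputs passed to it are hypothetical and are not the values behind Table 4.

```python
import math

def discriminant_summary(eigenvalue, n, p, groups=2):
    """Summary statistics for a single discriminant function (two-group case)."""
    wilks = 1.0 / (1.0 + eigenvalue)
    canon_corr = math.sqrt(eigenvalue / (1.0 + eigenvalue))
    chi2 = -(n - 1 - (p + groups) / 2) * math.log(wilks)  # Bartlett's approximation
    df = p * (groups - 1)
    return {"wilks_lambda": wilks, "canonical_corr": canon_corr, "chi_square": chi2, "df": df}

def hit_ratio(actual, predicted):
    """Overall accuracy: share of banks classified into their actual group."""
    return sum(a == b for a, b in zip(actual, predicted)) / len(actual)
```

Note that \(\Lambda = 1 - R_c^2\) in the two-group case, so a small canonical correlation and a Wilks’ lambda close to one describe the same weak separation; the p-value follows from the chi-square distribution (e.g. `scipy.stats.chi2.sf(chi_square, df)`).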

Table 4 Discriminant analysis CAMEL ratios

The results show that IBEItoA and REtoTA are significant at 1% in all models, while NIEtoTI decreases from highly significant in the one-year lagged model to insignificant in the three-year lagged model. These ratios represent the earnings of a bank, and their coefficients indicate that banks with lower earnings relative to assets are more likely to fail. These results are in line with Kristóf and Virág (2022), who find that earnings are one of the strongest predictors of bank failure. In addition, NLLtoD and GLtoTD, which represent liquidity, are significant at 5% in all models; they show that failed banks are less liquid, with fewer deposits relative to loans and leases, three years before failure, but are more liquid one year before failure. By contrast, the ratios that represent the capital structure, asset quality and management of banks are not significant in any model. These results are interesting and unexpected, since they show that the earnings and liquidity of a bank are more significant predictors of failure than its capital structure and asset quality. The predictive power of earnings, which extends up to three years before failure, indicates that earnings-related decisions have a long-term effect. These results also imply that the deterioration of earnings in failed banks starts early, which might be due to provisioning for loan losses, which has a direct impact on a bank’s earnings (Gopalan 2010). The increase in liquidity of failed banks implies that they liquidate their assets as failure approaches; it may also reflect government bailouts of failing banks. Finally, the insignificance of the capital structure and asset quality ratios might be due to banks’ ability to increase their capital ratios by reducing lending or selling assets (Gopalan 2010), which conceals the deterioration of capital in failed banks.

4.2 Dataset 2: CG variables

In this model, we investigate the prediction accuracy using CG variables. The discriminant function is as follows:

$$D_{2} = B_{0} + B_{1} CPS + B_{2} UVR + B_{3} InstitutO$$
(2)

where \(D_{2}\) is a discriminant score, \(B_{0}\) is the constant, \(B_{1}\) to \(B_{3}\) are the coefficients. \(CPS\) is the CEO Pay Slice, \(UVR\) is the Unequal Voting Rights, and \(InstitutO\) is the Institutional Ownership.
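CPS is commonly defined, following Bebchuk et al. (2011), as the CEO’s share of the total compensation paid to the top five executives. Assuming the Bloomberg field used here follows that definition (an assumption, since the data provider’s exact construction is not given), it can be computed as in this small Python sketch:

```python
def ceo_pay_slice(ceo_pay, other_top_exec_pay):
    """CEO Pay Slice, assuming the Bebchuk et al. (2011) definition.

    ceo_pay: CEO total compensation.
    other_top_exec_pay: total compensation of the remaining (up to four) top executives.
    Returns the CEO's share of the top-executive team's aggregate pay (CEO included).
    """
    total = ceo_pay + sum(other_top_exec_pay)
    if total <= 0:
        raise ValueError("total top-executive compensation must be positive")
    return ceo_pay / total
```

Under this definition a CPS of 0.2 means the CEO is paid the same as each of the other four executives, and higher values indicate a CEO who captures a larger share of executive pay.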

The results of the DA for the second dataset, which contains the CG variables, are reported in Table 5. The eigenvalue, canonical correlation and chi-square show that all models have good discriminant ability, and the Wilks’ Lambda p-value shows that the discriminant function is statistically significant. The accuracy of predicting bank failure using CG variables is higher than with the CAMEL ratios, ranging from 62.4% one year before failure to 64.1% three years before failure. Moreover, accuracy increases with the lag, which suggests that CG is better than CAMEL for long-term prediction. All variables are significant, notably CPS, which is significant at 1% in all models. CPS is associated with agency problems: banks are more likely to fail if their CEOs receive high compensation relative to the other top executives. These results are in line with Jones (2017), who finds that CEO compensation and ownership structure are among the strongest non-traditional predictors. In addition, CPS reflects CEO power, which has been found to have a negative effect on the monitoring power of boards (Pathan 2009) and on accounting profitability and stock returns (Bebchuk et al. 2011).

Table 5 Discriminant analysis CG variables

In addition, the results show that unequal voting rights and institutional shareholding are the next most significant non-financial predictors, both with positive effects. This is in line with the proposition that the potential costs of a dual-class structure increase over time while its potential benefits decrease, which indicates the importance of sunset provisions (Bebchuk and Kastiel 2017). Also, institutional shareholders pressurise management to deliver short-run performance because they do not internalise the social costs and institutional arrangements of financial institutions’ failures (Erkens et al. 2012; Andreou et al. 2016).

4.3 Dataset 3: CAMEL and CG

In this model, we investigate the prediction accuracy using CAMEL ratios with CG variables together. The discriminant function is as follows:

$$\begin{aligned} D_{3} & = B_{0} + B_{1} PTItoE + B_{2} ECofNCO + B_{3} AperE + B_{4} NIEtoTI + B_{5} IBEItoA \\ & \quad + B_{6} REtoTA + B_{7} NLLtoD + B_{8} GLtoTD + B_{9} CPS + B_{10} UVR + B_{11} InstitutO \\ \end{aligned}$$
(3)

where \(D_{3}\) is a discriminant score, \(B_{0}\) is the constant, \(B_{1}\) to \(B_{11}\) are the coefficients. \(PTItoE,\) \(ECofNCO,\) \(AperE,\) \(NIEtoTI,\) \(IBEItoA,\) \(REtoTA,\) \(NLLtoD,\) and \(GLtoTD\) are the CAMEL ratios. \(CPS,\) \(UVR,\) and \(InstitutO\) are the CG variables.

The results of the DA combining the CAMEL ratios with the CG variables are reported in Table 6. Despite the decrease in the significance of the models, the eigenvalue, canonical correlation and chi-square show that the function has better discriminant ability when the CAMEL ratios and CG variables are combined. The accuracy percentages also increase substantially compared with using either set alone. For example, accuracy for the three-year lagged model rises to 71.4% (from 60.3% using CAMEL ratios and 64.1% using CG variables). Another notable finding is that prediction accuracy increases with the time horizon, comparing one and three years before failure. These results show that CG variables not only enhance accuracy but also extend the time horizon of prediction. This finding confirms the crucial role of CG in ensuring the proper functioning of banks, as suggested by the Basel Committee on Banking Supervision (Basel Committee on Banking Supervision 2015). The increase in prediction accuracy when CG variables are combined with CAMEL ratios is also in line with Brogi and Lagasio (2022), who find that CG is not important by itself. This confirms that a multi-dimensional setting, which includes different aspects of the bank, provides a better prediction of failure (Jones 2017).

Table 6 Discriminant analysis CAMEL ratios and CG variables

The coefficients and their significance also confirm and complement the findings from the first and second datasets. With regard to CAMEL, REtoTA is robustly significant across all models, which confirms that failed banks have lower retained earnings relative to assets. The models also confirm that the capital structure, asset quality and management ratios are not significant predictors, except for PTItoE, which is significant only one year before failure. On the other hand, in contrast to the results for the first dataset, the liquidity ratios are not significant predictors in these models. As for the CG variables, CPS shows robust significance and effects across all models, which confirms that CEOs of failed banks receive a higher share of top-executive remuneration. In addition, unequal voting rights and institutional ownership are less significant and have smaller effects than in the models using the CG variables only.

4.4 Datasets 4 and 5: CAMEL and ISS

In the fourth dataset, we used ISS scores as an alternative measure of CG, but the variables were not significant and the accuracy was much lower. We attribute this to the combination of many variables into four indices that are not suitable for predicting failure. We therefore exclude these results from the paper.

Using the fifth dataset, we investigate the prediction accuracy using CAMEL ratios with the four ISS scores. The discriminant function is as follows:

$$\begin{aligned} D_{5} & = B_{0} + B_{1} PTItoE + B_{2} ECofNCO + B_{3} AperE + B_{4} NIEtoTI + B_{5} IBEItoA \\ & \quad + B_{6} REtoTA + B_{7} NLLtoD + B_{8} GLtoTD + B_{9} ISSB + B_{10} ISSS + B_{11} ISSC + B_{12} ISSA \end{aligned} \tag{4}$$

where \(D_{5}\) is the discriminant score, \(B_{0}\) is the estimated constant and \(B_{1}\) to \(B_{12}\) are the estimated coefficients; \(PTItoE\), \(ECofNCO\), \(AperE\), \(NIEtoTI\), \(IBEItoA\), \(REtoTA\), \(NLLtoD\) and \(GLtoTD\) are the CAMEL ratios; and \(ISSB\), \(ISSS\), \(ISSC\) and \(ISSA\) are the ISS scores that represent CG.
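Once the coefficients of Eq. (4) have been estimated, scoring a bank amounts to a linear combination of its twelve predictors, after which the bank is assigned to the group whose centroid (mean discriminant score) lies closer. The sketch below illustrates this; the coefficient values, group centroids and bank data are hypothetical placeholders rather than estimates from Table 7, and nearest-centroid assignment matches the midpoint cutting score only under equal priors.

```python
# Predictor order follows Eq. (4)
VARIABLES = ["PTItoE", "ECofNCO", "AperE", "NIEtoTI", "IBEItoA", "REtoTA",
             "NLLtoD", "GLtoTD", "ISSB", "ISSS", "ISSC", "ISSA"]

def discriminant_score(b0, coefs, bank):
    """D5 = B0 + sum of B_k * x_k over the twelve predictors of Eq. (4)."""
    return b0 + sum(coefs[name] * bank[name] for name in VARIABLES)

def classify(score, centroid_failed, centroid_nonfailed):
    """Assign the bank to the group whose centroid is closer to its score."""
    if abs(score - centroid_failed) < abs(score - centroid_nonfailed):
        return "failed"
    return "non-failed"

# Hypothetical inputs, for illustration only
b0 = -1.0
coefs = {name: 0.5 for name in VARIABLES}
bank = {name: 0.2 for name in VARIABLES}
score = discriminant_score(b0, coefs, bank)  # about 0.2
label = classify(score, centroid_failed=-1.0, centroid_nonfailed=1.0)
```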

Replacing the CG variables with ISS scores yields broadly similar results for the years 2013 to 2018 (Table 7), which again confirms the early predictive power of CG.

Table 7 Discriminant analysis CAMEL ratios and ISS scores

5 Robustness test

It is worth noting that the compared models use paired samples of different sizes. We therefore re-run the analysis using the same sample size for all models (CAMEL, CG, and CAMEL with CG) to test the robustness of the results. The CG results for both the DA and the LR are relatively similar to those of the analysis using paired samples of different sizes.

Next, to test the robustness of the results further, we re-estimate Table 6 (Discriminant analysis CAMEL ratios and CG variables) using a propensity score matching approach to select the matched samples. The propensity score approach helps alleviate the omitted variable concern and allows for a more accurate analysis (Rosenbaum and Rubin 1983; Heckman et al. 1997; Houston et al. 2014). We match failed banks with non-failed banks using the propensity score and then re-estimate the discriminant analysis using CAMEL ratios and CG variables. The propensity scores are estimated with a logit model whose dependent variable is a dummy that equals one for non-failed banks and zero for failed banks. The independent variables are bank control variables: the log of total assets, return on assets, debt to assets and bank age.
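The matching step can be sketched as follows. The logit specification mirrors the text (a dependent variable equal to one for non-failed banks, with the log of total assets, return on assets, debt to assets and bank age as regressors), but the matching rule is an assumption because the text does not state it: here each failed bank is paired with the non-failed bank nearest to it on the estimated propensity score, one-to-one, without replacement and with no caliper. The logit is fitted by Newton–Raphson on standardised controls, which leaves the fitted probabilities unchanged, and the data are synthetic.

```python
import numpy as np

def fit_logit(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logit via Newton-Raphson; returns fitted probabilities."""
    Z = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-Z @ beta))
        hessian = Z.T @ (Z * (prob * (1.0 - prob))[:, None])
        step = np.linalg.solve(hessian, Z.T @ (y - prob))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return 1.0 / (1.0 + np.exp(-Z @ beta))

def propensity_match(controls, non_failed):
    """Pair each failed bank with its nearest non-failed bank on the propensity score."""
    controls = np.asarray(controls, dtype=float)
    non_failed = np.asarray(non_failed)
    # Standardising the controls improves numerical conditioning without changing the fit
    z = (controls - controls.mean(axis=0)) / controls.std(axis=0)
    score = fit_logit(z, non_failed)
    available = set(np.flatnonzero(non_failed == 1).tolist())
    pairs = []
    for i in np.flatnonzero(non_failed == 0).tolist():
        if not available:
            break
        j = min(available, key=lambda k: abs(score[k] - score[i]))
        pairs.append((i, j))
        available.remove(j)  # matching without replacement
    return pairs, score

# Synthetic illustration with the four controls named in the text
rng = np.random.default_rng(42)
n = 200
controls = np.column_stack([
    rng.normal(14.0, 1.5, n),    # log of total assets
    rng.normal(0.01, 0.005, n),  # return on assets
    rng.normal(0.90, 0.03, n),   # debt to assets
    rng.integers(5, 100, n),     # bank age (years)
])
zc = (controls - controls.mean(axis=0)) / controls.std(axis=0)
true_p = 1.0 / (1.0 + np.exp(-(1.5 + 0.8 * zc[:, 0] + 0.5 * zc[:, 1])))
non_failed = (rng.random(n) < true_p).astype(int)
pairs, score = propensity_match(controls, non_failed)
```

The matched pairs would then form the sample on which the discriminant analysis is re-estimated.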

The results based on propensity score matching reinforce the conclusion that combining CG variables, as non-financial predictors, with financial predictors enhances the accuracy of failure prediction, which confirms the robustness of the results.

In addition, we test the robustness of the results using Logistic Regression (LR) to predict bank failure with the CAMEL ratios and CG variables that were found to have discriminatory power. LR was used to predict bankruptcy by Ohlson (1980), shortly after DA, and is also used by du Jardin (2016), who applies it alongside DA because it has two advantages over the latter: it does not require normality of the explanatory variables, and it allows the use of qualitative variables. We estimate the models on each of the five datasets three times, with the explanatory variables lagged one, two and three years before failure. The model fit for each dataset is measured using the log-likelihood ratio, the chi-square statistic and the pseudo R-squared.
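The fit statistics named above can be computed from the maximised log-likelihood of the fitted model and that of an intercept-only model. The sketch below fits a logit by Newton–Raphson and reports −2 log-likelihood, the likelihood-ratio chi-square (with degrees of freedom equal to the number of predictors) and three common pseudo R-squared measures. The text does not say which pseudo R-squared is used, so the McFadden, Cox–Snell and Nagelkerke versions are all shown; the 0.5 classification cut-off and the synthetic data are further assumptions.

```python
import numpy as np

def logit_fit_statistics(X, y, n_iter=50, tol=1e-10):
    """Fit a logit by Newton-Raphson and return common goodness-of-fit statistics."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    Z = np.column_stack([np.ones(n), X])
    beta = np.zeros(k + 1)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        step = np.linalg.solve(Z.T @ (Z * (p * (1.0 - p))[:, None]), Z.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-Z @ beta))

    ll_model = np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    ybar = y.mean()
    # Intercept-only model: every bank receives the sample failure rate
    ll_null = n * (ybar * np.log(ybar) + (1.0 - ybar) * np.log(1.0 - ybar))

    lr_chi2 = 2.0 * (ll_model - ll_null)  # chi-square with k degrees of freedom
    mcfadden = 1.0 - ll_model / ll_null
    cox_snell = 1.0 - np.exp(2.0 * (ll_null - ll_model) / n)
    nagelkerke = cox_snell / (1.0 - np.exp(2.0 * ll_null / n))
    accuracy = float(np.mean((p > 0.5) == (y == 1.0)))
    return {"minus_2ll": -2.0 * ll_model, "lr_chi2": lr_chi2, "df": k,
            "mcfadden": mcfadden, "cox_snell": cox_snell,
            "nagelkerke": nagelkerke, "accuracy": accuracy, "coefficients": beta}

# Synthetic illustration: two predictors, failure (y = 1) more likely when x1 is high
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = (rng.random(300) < 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * X[:, 0])))).astype(int)
fit = logit_fit_statistics(X, y)
```

Running the same function on one-, two- and three-year lagged predictors gives the horizon-by-horizon comparison described in the text.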

Overall, the robustness test using the LR confirms the findings of the DA, where adding CG as a non-financial predictor to the financial ratios enhances the accuracy of failure prediction and extends the time horizon. These results are in line with the proposition that failure prediction can be improved by using a multi-dimensional setting. The robustness test also confirms that earnings and liquidity are the most significant aspects in CAMEL, while CPS and institutional shareholding are the most significant in CG.

6 Additional analysis

The discriminant analysis combining CAMEL ratios and CG variables showed the best performance in terms of prediction accuracy. To test whether bank failure prediction models with CG variables outperform those without, we conduct an out-of-sample examination of the combined CAMEL and CG model and the CAMEL-only model.

We divide the whole sample period (2010–2018) into two subperiods. The earlier subperiod (2010–2016) forms the in-sample dataset, which is used to develop the prediction model. The later subperiod (2017–2018) forms the out-of-sample dataset, on which we examine the prediction accuracy of the model developed on the in-sample dataset. In constructing the in-sample (training) and out-of-sample (testing) datasets, we take into account the need for a large training sample to provide accurate predictions (Alaka et al. 2018). We therefore choose the last two years of the full period as the test sample, following López Iturriaga and Sanz (2015).
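This design is a split by calendar year rather than a random split. The sketch below shows that logic with a simple Fisher discriminant function estimated on 2010–2016 observations and applied unchanged to 2017–2018 observations. The year boundaries follow the text; the data, the three predictors and the midpoint cutting score (equal priors) are assumptions for illustration.

```python
import numpy as np

def temporal_split(years, train_end=2016, test_start=2017, test_end=2018):
    """Boolean masks: in-sample up to train_end, out-of-sample test_start..test_end."""
    years = np.asarray(years)
    return years <= train_end, (years >= test_start) & (years <= test_end)

def fit_fisher_lda(X, y):
    """Fisher discriminant weights and midpoint cutting score (equal priors)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    pooled = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    w = np.linalg.solve(pooled, m1 - m0)
    return w, w @ (m0 + m1) / 2.0

def predict(model, X):
    w, cutoff = model
    return (X @ w > cutoff).astype(int)  # 1 = predicted failure

# Synthetic bank-year panel covering 2010-2018
rng = np.random.default_rng(7)
n = 400
years = rng.integers(2010, 2019, n)  # upper bound is exclusive
y = rng.integers(0, 2, n)            # 1 = failed (coding assumption)
X = rng.normal(size=(n, 3)) + 0.8 * y[:, None]

train, test = temporal_split(years)
model = fit_fisher_lda(X[train], y[train])  # estimated on 2010-2016 only
out_of_sample_accuracy = float(np.mean(predict(model, X[test]) == y[test]))
```

Keeping the test years strictly after the training years mimics using the model prospectively, which a random split would not.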

The in-sample results in panel A of Table 8 show the development of the prediction model for the CAMEL and CG model based on the earlier subperiod. The results are similar to the main findings in Table 6, showing that the combination of CG variables (non-financial variables) and CAMEL ratios (financial ratios) can predict failure up to three years before failure. To examine the validity of this prediction model, which includes eight CAMEL ratios and three CG variables, we apply it to data on which it has not been trained, namely the later subperiod that forms the out-of-sample dataset.

Table 8 Developing the prediction model (CAMEL and CG)

The out-of-sample results shown in panel A in Table 9 indicate that the combination of CAMEL ratios and CG variables identifies a high number of failures. Therefore, the out-of-sample analysis confirms that the prediction model has a good predictive ability. It also confirms that adding CG variables to the model increases the prediction accuracy as the time horizon extends to three years (72.4% accuracy for three years before failure in comparison to 61.3% for one year before failure).
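The accuracy percentages in Table 9, and the statement that the model identifies a high number of failures, rest on counts of correctly and incorrectly classified banks. The sketch below computes overall accuracy together with the shares of failed and non-failed banks classified correctly. It assumes that failed banks are coded 1 and follows the common convention in the failure-prediction literature of calling a failed bank classified as non-failed a Type I error; the example labels are invented.

```python
def classification_summary(actual, predicted):
    """Hit rates and error rates; 1 = failed bank, 0 = non-failed bank."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # failed, flagged as failed
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # Type I error
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # Type II error
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "failed_correct": tp / (tp + fn),
        "nonfailed_correct": tn / (tn + fp),
        "type_I_rate": fn / (tp + fn),
        "type_II_rate": fp / (tn + fp),
    }

# Invented labels for ten banks, for illustration only
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
summary = classification_summary(actual, predicted)
# accuracy 0.7, failed_correct 0.75, nonfailed_correct about 0.667
```

Reporting the Type I rate alongside overall accuracy matters here because a missed failure is usually costlier to regulators and stakeholders than a false alarm.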

Table 9 Out-of-sample examination

To further support these results, we compare the out-of-sample predictive power of the CAMEL and CG model with that of the CAMEL-only model. We first develop the prediction model using CAMEL ratios only (excluding the CG variables) in the in-sample analysis, shown in panel B of Table 8. The results show that the prediction accuracy of the CAMEL-only model is lower than that of the model that includes CG variables. These results confirm that adding CG variables not only enhances the accuracy but also extends the time horizon of prediction to three years before failure.

Next, we conduct the out-of-sample analysis using the CAMEL ratios only, shown in panel B of Table 9. The results further confirm that bank failure prediction models with CG variables outperform those without. Panel B of Table 9 shows that the accuracy rates of the CAMEL-only model range from 60.20% one year before failure to 55.17% three years before failure, whereas the model that includes the CG variables (panel A of Table 9) achieves accuracy rates ranging from 61.3% one year before failure to 72.4% three years before failure.

7 Conclusion

Existing studies that examine bank failure prediction have restricted their prediction models to financial ratios only. However, this paper shows that adding CG variables (as non-financial predictors) to the traditional financial ratios not only enhances the accuracy of bank failure prediction but also extends its time horizon. These findings imply that incorporating different aspects gives a better view of the bank’s condition and hence improves prediction accuracy. By combining financial and non-financial variables, we were able not only to prevent accuracy rates from dropping dramatically as the horizon lengthens but, in some cases, to improve them. By contrast, other studies that use only financial ratios report decreasing accuracy as the time horizon of prediction increases (du Jardin 2017).

Furthermore, we apply a Mann–Whitney test to identify variables with significant discriminatory power. The test shows that board characteristics and most compensation characteristics have no discriminatory power. We then employ DA and LR with five datasets to compare prediction models that include CG variables with models that do not. The results show that earnings, followed by liquidity, are the key determinants of bank failure, whereas capital, assets and management are insignificant in failure prediction. In addition, the models with added CG variables achieve better prediction accuracy, which increases up to three years before failure. These models also show that CPS, unequal voting rights and institutional ownership structure serve as significant predictors of bank failure.

These results hold in the out-of-sample examination, which confirms the validity of the prediction model. This paper has significant implications for shareholders, stakeholders and regulators, as it provides guidelines that support the success of banks by helping to predict failures and prevent them from happening.